// SPDX-License-Identifier: GPL-2.0-only
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
/*
 * Kernel-based Virtual Machine driver for Linux
 *
 * This module enables machines with Intel VT-x extensions to run virtual
 * machines without emulation or binary translation.
 *
 * Copyright (C) 2006 Qumranet, Inc.
 * Copyright 2010 Red Hat, Inc. and/or its affiliates.
 *
 * Authors:
 *   Avi Kivity   <avi@qumranet.com>
 *   Yaniv Kamay  <yaniv@qumranet.com>
 */
#include <linux/frame.h>
#include <linux/highmem.h>
#include <linux/hrtimer.h>
#include <linux/kernel.h>
#include <linux/kvm_host.h>
#include <linux/module.h>
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
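The filtering this trick relies on can be modeled in a few lines. This is not the kernel's code; it is a sketch of the VMX page-fault error-code matching semantics (the PFEC_MASK/PFEC_MATCH VMCS fields): when the #PF bit in the exception bitmap is set, a page fault causes a VM exit only if the masked error code equals the match value, and the condition inverts when the bit is clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define PFERR_PRESENT_MASK (1u << 0)	/* error-code bit 0: page was present */

/*
 * Model (not kernel code) of VMX page-fault filtering: with the #PF bit
 * set in the exception bitmap, a fault exits iff the masked error code
 * matches; with it clear, the condition inverts.
 */
static bool pf_causes_vmexit(uint32_t error_code, uint32_t pfec_mask,
			     uint32_t pfec_match, bool pf_bit_set)
{
	bool match = (error_code & pfec_mask) == pfec_match;

	return match == pf_bit_set;
}
```

With mask = match = PFERR_PRESENT_MASK and the #PF bit set, a write to a not-present page (error code 0x2) is delivered straight to the guest, while a present fault (0x3, e.g. the reserved-bit violation on a shadow pte that kvm must fix up) still exits.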
#include <linux/moduleparam.h>
#include <linux/mod_devicetable.h>
#include <linux/mm.h>
#include <linux/sched.h>
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
With the following commit:
73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
... the hotplug code attempted to detect when SMT was disabled by BIOS,
in which case it reported SMT as permanently disabled. However, that
code broke a virt hotplug scenario, where the guest is booted with only
primary CPU threads, and a sibling is brought online later.
The problem is that there doesn't seem to be a way to reliably
distinguish between the HW "SMT disabled by BIOS" case and the virt
"sibling not yet brought online" case. So the above-mentioned commit
was a bit misguided, as it permanently disabled SMT for both cases,
preventing future virt sibling hotplugs.
Going back and reviewing the original problems which were attempted to
be solved by that commit, when SMT was disabled in BIOS:
1) /sys/devices/system/cpu/smt/control showed "on" instead of
"notsupported"; and
2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
I'd propose that we instead consider #1 above to not actually be a
problem. Because, at least in the virt case, it's possible that SMT
wasn't disabled by BIOS and a sibling thread could be brought online
later. So it makes sense to just always default the smt control to "on"
to allow for that possibility (assuming cpuid indicates that the CPU
supports SMT).
The real problem is #2, which has a simple fix: change vmx_vm_init() to
query the actual current SMT state -- i.e., whether any siblings are
currently online -- instead of looking at the SMT "control" sysfs value.
So fix it by:
a) reverting the original "fix" and its followup fix:
73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation")
and
b) changing vmx_vm_init() to query the actual current SMT state --
instead of the sysfs control value -- to determine whether the L1TF
warning is needed. This also requires the 'sched_smt_present'
variable to be exported, instead of 'cpu_smt_control'.
Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Mario <jmario@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
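The "query the actual current SMT state" approach can be illustrated by how sibling information looks from sysfs. The helper below is hypothetical (not part of the patch): it only interprets the contents of a cpuN/topology/thread_siblings_list entry, which reads like "0" when no sibling is online and "0,4" or "0-1" when one is.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: a thread_siblings_list entry names more than one
 * thread ("0,4" or "0-1") exactly when a sibling is online; a lone "0"
 * means no sibling has been brought up (yet).
 */
static bool siblings_online(const char *list)
{
	for (; *list && *list != '\n'; list++)
		if (*list == ',' || *list == '-')
			return true;
	return false;
}
```

This captures why the distinction matters: the string can change at runtime when a virt sibling is hotplugged, whereas a "permanently disabled" flag cannot.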
#include <linux/sched/smt.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
#include <linux/slab.h>
#include <linux/tboot.h>
#include <linux/trace_events.h>

#include <asm/apic.h>
#include <asm/asm.h>
#include <asm/cpu.h>
#include <asm/cpu_device_id.h>
#include <asm/debugreg.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
#include <asm/io.h>
#include <asm/irq_remapping.h>
#include <asm/kexec.h>
#include <asm/perf_event.h>
#include <asm/mce.h>
#include <asm/mmu_context.h>
#include <asm/mshyperv.h>
#include <asm/mwait.h>
#include <asm/spec-ctrl.h>
#include <asm/virtext.h>
#include <asm/vmx.h>
#include "capabilities.h"
#include "cpuid.h"
#include "evmcs.h"
#include "irq.h"
#include "kvm_cache_regs.h"
#include "lapic.h"
#include "mmu.h"
#include "nested.h"
#include "ops.h"
#include "pmu.h"
#include "trace.h"
#include "vmcs.h"
#include "vmcs12.h"
#include "vmx.h"
#include "x86.h"
MODULE_AUTHOR("Qumranet");
MODULE_LICENSE("GPL");

#ifdef MODULE
static const struct x86_cpu_id vmx_cpu_id[] = {
	X86_MATCH_FEATURE(X86_FEATURE_VMX, NULL),
	{}
};
MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id);
#endif
bool __read_mostly enable_vpid = 1;
module_param_named(vpid, enable_vpid, bool, 0444);

static bool __read_mostly enable_vnmi = 1;
module_param_named(vnmi, enable_vnmi, bool, S_IRUGO);

bool __read_mostly flexpriority_enabled = 1;
module_param_named(flexpriority, flexpriority_enabled, bool, S_IRUGO);

bool __read_mostly enable_ept = 1;
module_param_named(ept, enable_ept, bool, S_IRUGO);

bool __read_mostly enable_unrestricted_guest = 1;
module_param_named(unrestricted_guest,
			enable_unrestricted_guest, bool, S_IRUGO);

bool __read_mostly enable_ept_ad_bits = 1;
module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO);

static bool __read_mostly emulate_invalid_guest_state = true;
module_param(emulate_invalid_guest_state, bool, S_IRUGO);

static bool __read_mostly fasteoi = 1;
module_param(fasteoi, bool, S_IRUGO);

bool __read_mostly enable_apicv = 1;
module_param(enable_apicv, bool, S_IRUGO);
/*
 * If nested = 1, nested virtualization is supported, i.e., guests may use
 * VMX and be hypervisors for their own guests. If nested = 0, guests may
 * not use VMX instructions.
 */
static bool __read_mostly nested = 1;
module_param(nested, bool, S_IRUGO);
bool __read_mostly enable_pml = 1;
module_param_named(pml, enable_pml, bool, S_IRUGO);

static bool __read_mostly dump_invalid_vmcs = 0;
module_param(dump_invalid_vmcs, bool, 0644);

#define MSR_BITMAP_MODE_X2APIC		1
#define MSR_BITMAP_MODE_X2APIC_APICV	2
#define KVM_VMX_TSC_MULTIPLIER_MAX	0xffffffffffffffffULL

/* Guest_tsc -> host_tsc conversion requires 64-bit division.  */
static int __read_mostly cpu_preemption_timer_multi;
static bool __read_mostly enable_preemption_timer = 1;
#ifdef CONFIG_X86_64
module_param_named(preemption_timer, enable_preemption_timer, bool, S_IRUGO);
#endif
#define KVM_VM_CR0_ALWAYS_OFF (X86_CR0_NW | X86_CR0_CD)
#define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST X86_CR0_NE
#define KVM_VM_CR0_ALWAYS_ON				\
	(KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST |	\
	 X86_CR0_WP | X86_CR0_PG | X86_CR0_PE)

#define KVM_CR4_GUEST_OWNED_BITS				      \
	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR      \
	 | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_TSD)

#define KVM_VM_CR4_ALWAYS_ON_UNRESTRICTED_GUEST X86_CR4_VMXE
#define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE)
#define KVM_RMODE_VM_CR4_ALWAYS_ON (X86_CR4_VME | X86_CR4_PAE | X86_CR4_VMXE)

#define RMODE_GUEST_OWNED_EFLAGS_BITS (~(X86_EFLAGS_IOPL | X86_EFLAGS_VM))

#define MSR_IA32_RTIT_STATUS_MASK (~(RTIT_STATUS_FILTEREN | \
	RTIT_STATUS_CONTEXTEN | RTIT_STATUS_TRIGGEREN | \
	RTIT_STATUS_ERROR | RTIT_STATUS_STOPPED | \
	RTIT_STATUS_BYTECNT))

#define MSR_IA32_RTIT_OUTPUT_BASE_MASK \
	(~((1UL << cpuid_query_maxphyaddr(vcpu)) - 1) | 0x7f)
/*
 * These two parameters are used to configure the controls for Pause-Loop
 * Exiting:
 * ple_gap:    upper bound on the amount of time between two successive
 *             executions of PAUSE in a loop. Also indicates whether PLE is
 *             enabled. According to tests, this time is usually smaller
 *             than 128 cycles.
 * ple_window: upper bound on the amount of time a guest is allowed to
 *             execute in a PAUSE loop. Tests indicate that most spinlocks
 *             are held for less than 2^12 cycles.
 * Time is measured on a counter that runs at the same rate as the TSC;
 * refer to SDM volume 3B, sections 21.6.13 and 22.1.3.
 */
static unsigned int ple_gap = KVM_DEFAULT_PLE_GAP;
module_param(ple_gap, uint, 0444);

static unsigned int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
module_param(ple_window, uint, 0444);

/* Default doubles per-vcpu window every exit. */
static unsigned int ple_window_grow = KVM_DEFAULT_PLE_WINDOW_GROW;
module_param(ple_window_grow, uint, 0444);

/* Default resets per-vcpu window every exit to ple_window. */
static unsigned int ple_window_shrink = KVM_DEFAULT_PLE_WINDOW_SHRINK;
module_param(ple_window_shrink, uint, 0444);

/* Default is to compute the maximum so we can never overflow. */
static unsigned int ple_window_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
module_param(ple_window_max, uint, 0444);

/* Default is SYSTEM mode, 1 for host-guest mode */
int __read_mostly pt_mode = PT_MODE_SYSTEM;
module_param(pt_mode, int, S_IRUGO);

static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond);
static DEFINE_MUTEX(vmx_l1d_flush_mutex);

/* Storage for pre module init parameter parsing */
static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush_param = VMENTER_L1D_FLUSH_AUTO;

static const struct {
	const char *option;
	bool for_parse;
} vmentry_l1d_param[] = {
	[VMENTER_L1D_FLUSH_AUTO]	 = {"auto", true},
	[VMENTER_L1D_FLUSH_NEVER]	 = {"never", true},
	[VMENTER_L1D_FLUSH_COND]	 = {"cond", true},
	[VMENTER_L1D_FLUSH_ALWAYS]	 = {"always", true},
	[VMENTER_L1D_FLUSH_EPT_DISABLED] = {"EPT disabled", false},
	[VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false},
};

#define L1D_CACHE_ORDER 4
static void *vmx_l1d_flush_pages;

static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
{
	struct page *page;
	unsigned int i;

	if (!boot_cpu_has_bug(X86_BUG_L1TF)) {
		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
		return 0;
	}

	if (!enable_ept) {
		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
		return 0;
	}

	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
		u64 msr;

		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
		if (msr & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
			l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
			return 0;
		}
	}

	/* If set to auto use the default l1tf mitigation method */
	if (l1tf == VMENTER_L1D_FLUSH_AUTO) {
		switch (l1tf_mitigation) {
		case L1TF_MITIGATION_OFF:
			l1tf = VMENTER_L1D_FLUSH_NEVER;
			break;
		case L1TF_MITIGATION_FLUSH_NOWARN:
		case L1TF_MITIGATION_FLUSH:
		case L1TF_MITIGATION_FLUSH_NOSMT:
			l1tf = VMENTER_L1D_FLUSH_COND;
			break;
		case L1TF_MITIGATION_FULL:
		case L1TF_MITIGATION_FULL_FORCE:
			l1tf = VMENTER_L1D_FLUSH_ALWAYS;
			break;
		}
	} else if (l1tf_mitigation == L1TF_MITIGATION_FULL_FORCE) {
		l1tf = VMENTER_L1D_FLUSH_ALWAYS;
	}

	if (l1tf != VMENTER_L1D_FLUSH_NEVER && !vmx_l1d_flush_pages &&
	    !boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {
		/*
		 * This allocation for vmx_l1d_flush_pages is not tied to a VM
		 * lifetime and so should not be charged to a memcg.
		 */
		page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
		if (!page)
			return -ENOMEM;
		vmx_l1d_flush_pages = page_address(page);

		/*
		 * Initialize each page with a different pattern in
		 * order to protect against KSM in the nested
		 * virtualization case.
		 */
		for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) {
			memset(vmx_l1d_flush_pages + i * PAGE_SIZE, i + 1,
			       PAGE_SIZE);
		}
	}

	l1tf_vmx_mitigation = l1tf;

	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
		static_branch_enable(&vmx_l1d_should_flush);
	else
		static_branch_disable(&vmx_l1d_should_flush);

	if (l1tf == VMENTER_L1D_FLUSH_COND)
		static_branch_enable(&vmx_l1d_flush_cond);
	else
		static_branch_disable(&vmx_l1d_flush_cond);

	return 0;
}

static int vmentry_l1d_flush_parse(const char *s)
{
	unsigned int i;

	if (s) {
		for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) {
			if (vmentry_l1d_param[i].for_parse &&
			    sysfs_streq(s, vmentry_l1d_param[i].option))
				return i;
		}
	}
	return -EINVAL;
}

static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp)
{
	int l1tf, ret;

	l1tf = vmentry_l1d_flush_parse(s);
	if (l1tf < 0)
		return l1tf;

	if (!boot_cpu_has_bug(X86_BUG_L1TF))
		return 0;

	/*
	 * Has vmx_init() run already? If not then this is the pre init
	 * parameter parsing. In that case just store the value and let
	 * vmx_init() do the proper setup after enable_ept has been
	 * established.
	 */
	if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_AUTO) {
		vmentry_l1d_flush_param = l1tf;
		return 0;
	}

	mutex_lock(&vmx_l1d_flush_mutex);
	ret = vmx_setup_l1d_flush(l1tf);
	mutex_unlock(&vmx_l1d_flush_mutex);
	return ret;
}

static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
{
	if (WARN_ON_ONCE(l1tf_vmx_mitigation >= ARRAY_SIZE(vmentry_l1d_param)))
		return sprintf(s, "???\n");

	return sprintf(s, "%s\n", vmentry_l1d_param[l1tf_vmx_mitigation].option);
}

static const struct kernel_param_ops vmentry_l1d_flush_ops = {
	.set = vmentry_l1d_flush_set,
	.get = vmentry_l1d_flush_get,
};
module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, 0644);

static bool guest_state_valid(struct kvm_vcpu *vcpu);
static u32 vmx_segment_access_rights(struct kvm_segment *var);
static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
							  u32 msr, int type);
void vmx_vmexit(void);

#define vmx_insn_failed(fmt...)		\
do {					\
	WARN_ONCE(1, fmt);		\
	pr_warn_ratelimited(fmt);	\
} while (0)

asmlinkage void vmread_error(unsigned long field, bool fault)
{
	if (fault)
		kvm_spurious_fault();
	else
		vmx_insn_failed("kvm: vmread failed: field=%lx\n", field);
}

noinline void vmwrite_error(unsigned long field, unsigned long value)
{
	vmx_insn_failed("kvm: vmwrite failed: field=%lx val=%lx err=%d\n",
			field, value, vmcs_read32(VM_INSTRUCTION_ERROR));
}

noinline void vmclear_error(struct vmcs *vmcs, u64 phys_addr)
{
	vmx_insn_failed("kvm: vmclear failed: %p/%llx\n", vmcs, phys_addr);
}

noinline void vmptrld_error(struct vmcs *vmcs, u64 phys_addr)
{
	vmx_insn_failed("kvm: vmptrld failed: %p/%llx\n", vmcs, phys_addr);
}

noinline void invvpid_error(unsigned long ext, u16 vpid, gva_t gva)
{
	vmx_insn_failed("kvm: invvpid failed: ext=0x%lx vpid=%u gva=0x%lx\n",
			ext, vpid, gva);
}

noinline void invept_error(unsigned long ext, u64 eptp, gpa_t gpa)
{
	vmx_insn_failed("kvm: invept failed: ext=0x%lx eptp=%llx gpa=0x%llx\n",
			ext, eptp, gpa);
}
static DEFINE_PER_CPU(struct vmcs *, vmxarea);
DEFINE_PER_CPU(struct vmcs *, current_vmcs);

/*
 * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
 * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
 */
static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
/*
 * We maintain a per-CPU linked-list of vCPUs, so in wakeup_handler() we
 * can find which vCPU should be woken up.
 */
static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);

static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
static DEFINE_SPINLOCK(vmx_vpid_lock);

struct vmcs_config vmcs_config;
struct vmx_capability vmx_capability;

#define VMX_SEGMENT_FIELD(seg)					\
	[VCPU_SREG_##seg] = {					\
		.selector = GUEST_##seg##_SELECTOR,		\
		.base = GUEST_##seg##_BASE,			\
		.limit = GUEST_##seg##_LIMIT,			\
		.ar_bytes = GUEST_##seg##_AR_BYTES,		\
	}

static const struct kvm_vmx_segment_field {
	unsigned selector;
	unsigned base;
	unsigned limit;
	unsigned ar_bytes;
} kvm_vmx_segment_fields[] = {
	VMX_SEGMENT_FIELD(CS),
	VMX_SEGMENT_FIELD(DS),
	VMX_SEGMENT_FIELD(ES),
	VMX_SEGMENT_FIELD(FS),
	VMX_SEGMENT_FIELD(GS),
	VMX_SEGMENT_FIELD(SS),
	VMX_SEGMENT_FIELD(TR),
	VMX_SEGMENT_FIELD(LDTR),
};
2020-04-15 14:34:52 -06:00
static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx)
{
	vmx->segment_cache.bitmask = 0;
}
KVM: VMX: Store the host kernel's IDT base in a global variable
Although the kernel may use multiple IDTs, KVM should only ever see the
"real" IDT, e.g. the early init IDT is long gone by the time KVM runs
and the debug stack IDT is only used for small windows of time in very
specific flows.
Before commit a547c6db4d2f1 ("KVM: VMX: Enable acknowledge interupt on
vmexit"), the kernel's IDT base was consumed by KVM only when setting
constant VMCS state, i.e. to set VMCS.HOST_IDTR_BASE. Because constant
host state is done once per vCPU, there was ostensibly no need to cache
the kernel's IDT base.
When support for "ack interrupt on exit" was introduced, KVM added a
second consumer of the IDT base as handling already-acked interrupts
requires directly calling the interrupt handler, i.e. KVM uses the IDT
base to find the address of the handler. Because interrupts are a fast
path, KVM cached the IDT base to avoid having to VMREAD HOST_IDTR_BASE.
Presumably, the IDT base was cached on a per-vCPU basis simply because
the existing code grabbed the IDT base on a per-vCPU (VMCS) basis.
Note, all post-boot IDTs use the same handlers for external interrupts,
i.e. the "ack interrupt on exit" use of the IDT base would be unaffected
even if the cached IDT somehow did not match the current IDT. And as
for the original use case of setting VMCS.HOST_IDTR_BASE, if any of the
above analysis is wrong then KVM has had a bug since the beginning of
time since KVM has effectively been caching the IDT at vCPU creation
since commit a8b732ca01c ("[PATCH] kvm: userspace interface").
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-19 23:50:57 -06:00
static unsigned long host_idt_base;
2009-09-07 02:14:12 -06:00
2007-04-19 05:28:44 -06:00
/*
 * Though SYSCALL is only supported in 64-bit mode on Intel CPUs, kvm
 * will emulate SYSCALL in legacy mode if the vendor string in guest
 * CPUID.0:{EBX,ECX,EDX} is "AuthenticAMD" or "AMDisbetter!" To
 * support this emulation, IA32_STAR must always be included in
 * vmx_msr_index[], even in i386 builds.
 */
2018-12-03 14:53:15 -07:00
const u32 vmx_msr_index[] = {
2006-12-13 01:33:45 -07:00
#ifdef CONFIG_X86_64
2009-09-06 06:55:37 -06:00
	MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
#endif
2010-07-17 07:03:26 -06:00
	MSR_EFER, MSR_TSC_AUX, MSR_STAR,
2019-11-18 10:23:00 -07:00
	MSR_IA32_TSX_CTRL,
};
2018-03-20 08:02:11 -06:00
#if IS_ENABLED(CONFIG_HYPERV)
static bool __read_mostly enlightened_vmcs = true;
module_param(enlightened_vmcs, bool, 0444);
2018-07-19 02:40:23 -06:00
/* check_ept_pointer() should be under protection of ept_pointer_lock. */
static void check_ept_pointer_match(struct kvm *kvm)
{
	struct kvm_vcpu *vcpu;
	u64 tmp_eptp = INVALID_PAGE;
	int i;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (!VALID_PAGE(tmp_eptp)) {
			tmp_eptp = to_vmx(vcpu)->ept_pointer;
		} else if (tmp_eptp != to_vmx(vcpu)->ept_pointer) {
			to_kvm_vmx(kvm)->ept_pointers_match
				= EPT_POINTERS_MISMATCH;
			return;
		}
	}

	to_kvm_vmx(kvm)->ept_pointers_match = EPT_POINTERS_MATCH;
}
2019-01-21 00:27:05 -07:00
static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush,
		void *data)
{
	struct kvm_tlb_range *range = data;

	return hyperv_fill_flush_guest_mapping_list(flush, range->start_gfn,
			range->pages);
}
static inline int __hv_remote_flush_tlb_with_range(struct kvm *kvm,
		struct kvm_vcpu *vcpu, struct kvm_tlb_range *range)
{
	u64 ept_pointer = to_vmx(vcpu)->ept_pointer;

	/*
	 * The FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs the address
	 * of the base of the EPT PML4 table, so strip off the EPT
	 * configuration information.
	 */
	if (range)
		return hyperv_flush_guest_mapping_range(ept_pointer & PAGE_MASK,
				kvm_fill_hv_flush_list_func, (void *)range);
	else
		return hyperv_flush_guest_mapping(ept_pointer & PAGE_MASK);
}
static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
		struct kvm_tlb_range *range)
{
	struct kvm_vcpu *vcpu;
	int ret = 0, i;

	spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);

	if (to_kvm_vmx(kvm)->ept_pointers_match == EPT_POINTERS_CHECK)
		check_ept_pointer_match(kvm);

	if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
		kvm_for_each_vcpu(i, vcpu, kvm) {
			/* If ept_pointer is invalid pointer, bypass flush request. */
			if (VALID_PAGE(to_vmx(vcpu)->ept_pointer))
				ret |= __hv_remote_flush_tlb_with_range(
						kvm, vcpu, range);
		}
	} else {
		ret = __hv_remote_flush_tlb_with_range(kvm,
				kvm_get_vcpu(kvm, 0), range);
	}

	spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
	return ret;
}
2018-12-06 06:21:07 -07:00
static int hv_remote_flush_tlb(struct kvm *kvm)
{
	return hv_remote_flush_tlb_with_range(kvm, NULL);
}
2019-08-22 08:30:21 -06:00
static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
{
	struct hv_enlightened_vmcs *evmcs;
	struct hv_partition_assist_pg **p_hv_pa_pg =
			&vcpu->kvm->arch.hyperv.hv_pa_pg;
	/*
	 * Synthetic VM-Exit is not enabled in current code, so all evmcs in
	 * a single VM share the same assist page.
	 */
	if (!*p_hv_pa_pg)
		*p_hv_pa_pg = kzalloc(PAGE_SIZE, GFP_KERNEL);

	if (!*p_hv_pa_pg)
		return -ENOMEM;

	evmcs = (struct hv_enlightened_vmcs *)to_vmx(vcpu)->loaded_vmcs->vmcs;

	evmcs->partition_assist_page =
		__pa(*p_hv_pa_pg);
	evmcs->hv_vm_id = (unsigned long)vcpu->kvm;
	evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;

	return 0;
}
2018-03-20 08:02:11 -06:00
#endif /* IS_ENABLED(CONFIG_HYPERV) */
2016-06-13 15:19:59 -06:00
/*
 * Comment's format: document - errata name - stepping - processor name.
 * Refer from
 * https://www.virtualbox.org/svn/vbox/trunk/src/VBox/VMM/VMMR0/HMR0.cpp
 */
static u32 vmx_preemption_cpu_tfms[] = {
/* 323344.pdf - BA86   - D0 - Xeon 7500 Series */
0x000206E6,
/* 323056.pdf - AAX65  - C2 - Xeon L3406 */
/* 322814.pdf - AAT59  - C2 - i7-600, i5-500, i5-400 and i3-300 Mobile */
/* 322911.pdf - AAU65  - C2 - i5-600, i3-500 Desktop and Pentium G6950 */
0x00020652,
/* 322911.pdf - AAU65  - K0 - i5-600, i3-500 Desktop and Pentium G6950 */
0x00020655,
/* 322373.pdf - AAO95  - B1 - Xeon 3400 Series */
/* 322166.pdf - AAN92  - B1 - i7-800 and i5-700 Desktop */
/*
 * 320767.pdf - AAP86  - B1 -
 * i7-900 Mobile Extreme, i7-800 and i7-700 Mobile
 */
0x000106E5,
/* 321333.pdf - AAM126 - C0 - Xeon 3500 */
0x000106A0,
/* 321333.pdf - AAM126 - C1 - Xeon 3500 */
0x000106A1,
/* 320836.pdf - AAJ124 - C0 - i7-900 Desktop Extreme and i7-900 Desktop */
0x000106A4,
/* 321333.pdf - AAM126 - D0 - Xeon 3500 */
/* 321324.pdf - AAK139 - D0 - Xeon 5500 */
/* 320836.pdf - AAJ124 - D0 - i7-900 Extreme and i7-900 Desktop */
0x000106A5,
/* Xeon E3-1220 V2 */
0x000306A8,
};
static inline bool cpu_has_broken_vmx_preemption_timer(void)
{
	u32 eax = cpuid_eax(0x00000001), i;

	/* Clear the reserved bits */
	eax &= ~(0x3U << 14 | 0xfU << 28);
	for (i = 0; i < ARRAY_SIZE(vmx_preemption_cpu_tfms); i++)
		if (eax == vmx_preemption_cpu_tfms[i])
			return true;

	return false;
}
2015-07-29 04:05:37 -06:00
static inline bool cpu_need_virtualize_apic_accesses(struct kvm_vcpu *vcpu)
{
	return flexpriority_enabled && lapic_in_kernel(vcpu);
}
2009-04-01 01:52:31 -06:00
static inline bool report_flexpriority(void)
{
	return flexpriority_enabled;
}
2018-12-03 14:53:16 -07:00
static inline int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
{
	int i;

	for (i = 0; i < vmx->nmsrs; ++i)
		if (vmx_msr_index[vmx->guest_msrs[i].index] == msr)
			return i;
	return -1;
}
2018-12-03 14:53:16 -07:00
struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr)
{
	int i;

	i = __find_msr_index(vmx, msr);
	if (i >= 0)
		return &vmx->guest_msrs[i];
	return NULL;
}
2019-11-18 10:23:01 -07:00
static int vmx_set_guest_msr(struct vcpu_vmx *vmx, struct shared_msr_entry *msr, u64 data)
{
	int ret = 0;
	u64 old_msr_data = msr->data;

	msr->data = data;
	if (msr - vmx->guest_msrs < vmx->save_nmsrs) {
		preempt_disable();
		ret = kvm_set_shared_msr(msr->index, msr->data,
					 msr->mask);
		preempt_enable();
		if (ret)
			msr->data = old_msr_data;
	}
	return ret;
}
2015-09-09 16:38:55 -06:00
#ifdef CONFIG_KEXEC_CORE
static void crash_vmclear_local_loaded_vmcss(void)
{
	int cpu = raw_smp_processor_id();
	struct loaded_vmcs *v;

	list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
			    loaded_vmcss_on_cpu_link)
		vmcs_clear(v->vmcs);
}
#endif /* CONFIG_KEXEC_CORE */
2012-12-06 08:43:34 -07:00
KVM: VMX: Keep list of loaded VMCSs, instead of vcpus
In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.
The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may have been last loaded on a different cpu.
So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-24 06:26:10 -06:00
static void __loaded_vmcs_clear(void *arg)
{
	struct loaded_vmcs *loaded_vmcs = arg;
2007-01-05 17:36:23 -07:00
	int cpu = raw_smp_processor_id();
	if (loaded_vmcs->cpu != cpu)
		return; /* vcpu migration can race with cpu offline */
	if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
		per_cpu(current_vmcs, cpu) = NULL;
KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
interrupted a KVM update of the percpu in-use VMCS list.
Because NMIs are not blocked by disabling IRQs, it's possible that
crash_vmclear_local_loaded_vmcss() could be called while the percpu list
of VMCSes is being modified, e.g. in the middle of list_add() in
vmx_vcpu_load_vmcs(). This potential corner case was called out in the
original commit[*], but the analysis of its impact was wrong.
Skipping the VMCLEARs is wrong because it all but guarantees that a
loaded, and therefore cached, VMCS will live across kexec and corrupt
memory in the new kernel. Corruption will occur because the CPU's VMCS
cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
memory on its eviction will overwrite random memory in the new kernel.
The VMCS will live because the NMI shootdown also disables VMX, i.e. the
in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
VMCS cache on VMXOFF.
Furthermore, interrupting list_add() and list_del() is safe due to
crash_vmclear_local_loaded_vmcss() using forward iteration. list_add()
ensures the new entry is not visible to forward iteration unless the
entire add completes, via WRITE_ONCE(prev->next, new). A bad "prev"
pointer could be observed if the NMI shootdown interrupted list_del() or
list_add(), but list_for_each_entry() does not consume ->prev.
In addition to removing the temporary disabling of VMCLEAR, open code
loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
the VMCS is deleted from the list only after it's been VMCLEAR'd.
Deleting the VMCS before VMCLEAR would allow a race where the NMI
shootdown could arrive between list_del() and vmcs_clear() and thus
neither flow would execute a successful VMCLEAR. Alternatively, more
code could be moved into loaded_vmcs_init(), but that gets rather silly
as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
and would need to work around the list_del().
Update the smp_*() comments related to the list manipulation, and
opportunistically reword them to improve clarity.
[*] https://patchwork.kernel.org/patch/1675731/#3720461
Fixes: 8f536b7697a0 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-21 13:37:49 -06:00
	vmcs_clear(loaded_vmcs->vmcs);
	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
		vmcs_clear(loaded_vmcs->shadow_vmcs);
KVM: VMX: Keep list of loaded VMCSs, instead of vcpus
In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.
The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may have been last loaded on a different cpu.
So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested VMX we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-24 06:26:10 -06:00
	list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
2012-11-28 05:54:14 -07:00
	/*
	 * Ensure all writes to loaded_vmcs, including deleting it from its
	 * current percpu list, complete before setting loaded_vmcs->cpu to
	 * -1, otherwise a different cpu can see cpu == -1 first and add
	 * loaded_vmcs to its percpu list before it's deleted from this cpu's
	 * list. Pairs with the smp_rmb() in vmx_vcpu_load_vmcs().
2012-11-28 05:54:14 -07:00
	 */
	smp_wmb();
	loaded_vmcs->cpu = -1;
	loaded_vmcs->launched = 0;
}
2018-12-03 14:53:07 -07:00
void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
2007-02-12 01:54:46 -07:00
{
2012-11-28 05:53:15 -07:00
	int cpu = loaded_vmcs->cpu;

	if (cpu != -1)
		smp_call_function_single(cpu,
				 __loaded_vmcs_clear, loaded_vmcs, 1);
2007-02-12 01:54:46 -07:00
}
2011-04-27 10:42:18 -06:00
static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg,
				       unsigned field)
{
	bool ret;
	u32 mask = 1 << (seg * SEG_FIELD_NR + field);

2019-09-27 15:45:22 -06:00
	if (!kvm_register_is_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS)) {
		kvm_register_mark_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS);
2011-04-27 10:42:18 -06:00
		vmx->segment_cache.bitmask = 0;
	}
	ret = vmx->segment_cache.bitmask & mask;
	vmx->segment_cache.bitmask |= mask;
	return ret;
}
static u16 vmx_read_guest_seg_selector(struct vcpu_vmx *vmx, unsigned seg)
{
	u16 *p = &vmx->segment_cache.seg[seg].selector;

	if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_SEL))
		*p = vmcs_read16(kvm_vmx_segment_fields[seg].selector);
	return *p;
}

static ulong vmx_read_guest_seg_base(struct vcpu_vmx *vmx, unsigned seg)
{
	ulong *p = &vmx->segment_cache.seg[seg].base;

	if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_BASE))
		*p = vmcs_readl(kvm_vmx_segment_fields[seg].base);
	return *p;
}

static u32 vmx_read_guest_seg_limit(struct vcpu_vmx *vmx, unsigned seg)
{
	u32 *p = &vmx->segment_cache.seg[seg].limit;

	if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_LIMIT))
		*p = vmcs_read32(kvm_vmx_segment_fields[seg].limit);
	return *p;
}

static u32 vmx_read_guest_seg_ar(struct vcpu_vmx *vmx, unsigned seg)
{
	u32 *p = &vmx->segment_cache.seg[seg].ar;

	if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_AR))
		*p = vmcs_read32(kvm_vmx_segment_fields[seg].ar_bytes);
	return *p;
}
2018-12-03 14:53:16 -07:00
void update_exception_bitmap(struct kvm_vcpu *vcpu)
2007-05-02 08:57:40 -06:00
{
	u32 eb;

2010-01-20 10:20:20 -07:00
	eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR) |
2017-02-03 22:18:52 -07:00
	     (1u << DB_VECTOR) | (1u << AC_VECTOR);
2018-03-12 05:12:51 -06:00
	/*
	 * Guest access to VMware backdoor ports could legitimately
	 * trigger #GP because of TSS I/O permission bitmap.
	 * We intercept those #GP and allow access to them anyway
	 * as VMware does.
	 */
	if (enable_vmware_backdoor)
		eb |= (1u << GP_VECTOR);
2010-01-20 10:20:20 -07:00
	if ((vcpu->guest_debug &
	    (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)) ==
	    (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP))
		eb |= 1u << BP_VECTOR;
2009-06-09 05:10:45 -06:00
	if (to_vmx(vcpu)->rmode.vm86_active)
2007-05-02 08:57:40 -06:00
		eb = ~0;
2009-03-23 10:26:32 -06:00
	if (enable_ept)
2020-02-26 20:20:54 -07:00
		eb &= ~(1u << PF_VECTOR);
KVM: nVMX: Further fixes for lazy FPU loading
KVM's "Lazy FPU loading" means that sometimes L0 needs to set CR0.TS, even
if a guest didn't set it. Moreover, L0 must also trap CR0.TS changes and
NM exceptions, even if we have a guest hypervisor (L1) who didn't want these
traps. And of course, conversely: If L1 wanted to trap these events, we
must let it, even if L0 is not interested in them.
This patch fixes some existing KVM code (in update_exception_bitmap(),
vmx_fpu_activate(), vmx_fpu_deactivate()) to do the correct merging of L0's
and L1's needs. Note that handle_cr() was already fixed in the above patch,
and that new code introduced in previous patches already handles CR0
correctly (see prepare_vmcs02(), prepare_vmcs12(), and nested_vmx_vmexit()).
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-05-25 14:15:08 -06:00
	/* When we are running a nested L2 guest and L1 specified for it a
	 * certain exception bitmap, we must trap the same exceptions and pass
	 * them to L1. When running L2, we will only handle the exceptions
	 * specified above if L1 did not want them.
	 */
	if (is_guest_mode(vcpu))
		eb |= get_vmcs12(vcpu)->exception_bitmap;
2007-05-02 08:57:40 -06:00
	vmcs_write32(EXCEPTION_BITMAP, eb);
}
2018-02-01 14:59:45 -07:00
/*
 * Check if MSR is intercepted for currently loaded MSR bitmap.
 */
static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
{
	unsigned long *msr_bitmap;
	int f = sizeof(unsigned long);

	if (!cpu_has_vmx_msr_bitmap())
		return true;

	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;

	if (msr <= 0x1fff) {
		return !!test_bit(msr, msr_bitmap + 0x800 / f);
	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
		msr &= 0x1fff;
		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
	}
	return true;
}
2013-11-25 06:37:13 -07:00
static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
		unsigned long entry, unsigned long exit)
2011-10-05 06:01:22 -06:00
{
2013-11-25 06:37:13 -07:00
	vm_entry_controls_clearbit(vmx, entry);
	vm_exit_controls_clearbit(vmx, exit);
2011-10-05 06:01:22 -06:00
}
2019-11-07 22:14:39 -07:00
int vmx_find_msr_index(struct vmx_msrs *m, u32 msr)
2018-06-20 18:11:39 -06:00
{
	unsigned int i;

	for (i = 0; i < m->nr; ++i) {
		if (m->val[i].index == msr)
			return i;
	}
	return -ENOENT;
}
2010-04-28 07:40:38 -06:00
static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
{
2018-06-20 18:11:39 -06:00
	int i;
2010-04-28 07:40:38 -06:00
	struct msr_autoload *m = &vmx->msr_autoload;
2011-10-05 06:01:22 -06:00
	switch (msr) {
	case MSR_EFER:
2018-12-03 14:53:00 -07:00
		if (cpu_has_load_ia32_efer()) {
2013-11-25 06:37:13 -07:00
			clear_atomic_switch_msr_special(vmx,
					VM_ENTRY_LOAD_IA32_EFER,
2011-10-05 06:01:22 -06:00
					VM_EXIT_LOAD_IA32_EFER);
			return;
		}
		break;
	case MSR_CORE_PERF_GLOBAL_CTRL:
2018-12-03 14:53:00 -07:00
		if (cpu_has_load_perf_global_ctrl()) {
2013-11-25 06:37:13 -07:00
			clear_atomic_switch_msr_special(vmx,
2011-10-05 06:01:22 -06:00
					VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
					VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL);
			return;
		}
		break;
2010-12-21 03:54:20 -07:00
	}
2019-11-07 22:14:38 -07:00
	i = vmx_find_msr_index(&m->guest, msr);
2018-06-20 18:11:39 -06:00
	if (i < 0)
2018-06-20 20:00:47 -06:00
		goto skip_guest;
2018-06-20 11:58:37 -06:00
	--m->guest.nr;
	m->guest.val[i] = m->guest.val[m->guest.nr];
	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr);
2010-12-21 03:54:20 -07:00
2018-06-20 20:00:47 -06:00
skip_guest:
2019-11-07 22:14:38 -07:00
	i = vmx_find_msr_index(&m->host, msr);
2018-06-20 20:00:47 -06:00
	if (i < 0)
2010-04-28 07:40:38 -06:00
		return;
2018-06-20 20:00:47 -06:00
	--m->host.nr;
	m->host.val[i] = m->host.val[m->host.nr];
2018-06-20 11:58:37 -06:00
	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);
2010-04-28 07:40:38 -06:00
}
2013-11-25 06:37:13 -07:00
static void add_atomic_switch_msr_special(struct vcpu_vmx *vmx,
		unsigned long entry, unsigned long exit,
		unsigned long guest_val_vmcs, unsigned long host_val_vmcs,
		u64 guest_val, u64 host_val)
2011-10-05 06:01:22 -06:00
{
	vmcs_write64(guest_val_vmcs, guest_val);
2018-09-26 10:23:56 -06:00
	if (host_val_vmcs != HOST_IA32_EFER)
		vmcs_write64(host_val_vmcs, host_val);
2013-11-25 06:37:13 -07:00
	vm_entry_controls_setbit(vmx, entry);
	vm_exit_controls_setbit(vmx, exit);
2011-10-05 06:01:22 -06:00
}
2010-04-28 07:40:38 -06:00
static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
2018-06-20 20:01:22 -06:00
				  u64 guest_val, u64 host_val, bool entry_only)
2010-04-28 07:40:38 -06:00
{
2018-06-20 20:01:22 -06:00
	int i, j = 0;
2010-04-28 07:40:38 -06:00
	struct msr_autoload *m = &vmx->msr_autoload;
2011-10-05 06:01:22 -06:00
	switch (msr) {
	case MSR_EFER:
2018-12-03 14:53:00 -07:00
		if (cpu_has_load_ia32_efer()) {
2013-11-25 06:37:13 -07:00
			add_atomic_switch_msr_special(vmx,
					VM_ENTRY_LOAD_IA32_EFER,
2011-10-05 06:01:22 -06:00
					VM_EXIT_LOAD_IA32_EFER,
					GUEST_IA32_EFER,
					HOST_IA32_EFER,
					guest_val, host_val);
			return;
		}
		break;
	case MSR_CORE_PERF_GLOBAL_CTRL:
2018-12-03 14:53:00 -07:00
		if (cpu_has_load_perf_global_ctrl()) {
2013-11-25 06:37:13 -07:00
			add_atomic_switch_msr_special(vmx,
2011-10-05 06:01:22 -06:00
					VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
					VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL,
					GUEST_IA32_PERF_GLOBAL_CTRL,
					HOST_IA32_PERF_GLOBAL_CTRL,
					guest_val, host_val);
			return;
		}
		break;
KVM: VMX: disable PEBS before a guest entry
Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
would crash if you decided to run a host command that uses PEBS, like
perf record -e 'cpu/mem-stores/pp' -a
This happens because KVM is using VMX MSR switching to disable PEBS, but
SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
isn't safe:
When software needs to reconfigure PEBS facilities, it should allow a
quiescent period between stopping the prior event counting and setting
up a new PEBS event. The quiescent period is to allow any latent
residual PEBS records to complete its capture at their previously
specified buffer address (provided by IA32_DS_AREA).
There might not be a quiescent period after the MSR switch, so a CPU
ends up using host's MSR_IA32_DS_AREA to access an area in guest's
memory. (Or MSR switching is just buggy on some models.)
The guest can learn something about the host this way:
If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results
in #PF where we leak host's MSR_IA32_DS_AREA through CR2.
After that, a malicious guest can map and configure memory where
MSR_IA32_DS_AREA is pointing and can therefore get an output from
host's tracing.
This is not a critical leak as the host must first initiate PEBS tracing,
and I have not been able to get a record from more than one instruction
before vmentry in vmx_vcpu_run() (that place has most registers already
overwritten with guest's).
We could disable PEBS just few instructions before vmentry, but
disabling it earlier shouldn't affect host tracing too much.
We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
optimization isn't worth its code, IMO.
(If you are implementing PEBS for guests, be sure to handle the case
where both host and guest enable PEBS, because this patch doesn't.)
Fixes: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.")
Cc: <stable@vger.kernel.org>
Reported-by: Jiří Olša <jolsa@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04 07:08:42 -07:00
	case MSR_IA32_PEBS_ENABLE:
		/* PEBS needs a quiescent period after being disabled (to write
		 * a record). Disabling PEBS through VMX MSR swapping doesn't
		 * provide that period, so a CPU could write host's record into
		 * guest's memory.
		 */
		wrmsrl(MSR_IA32_PEBS_ENABLE, 0);
2010-12-21 03:54:20 -07:00
	}
2019-11-07 22:14:38 -07:00
	i = vmx_find_msr_index(&m->guest, msr);
2018-06-20 20:01:22 -06:00
	if (!entry_only)
2019-11-07 22:14:38 -07:00
		j = vmx_find_msr_index(&m->host, msr);
2010-04-28 07:40:38 -06:00
2019-11-07 22:14:37 -07:00
	if ((i < 0 && m->guest.nr == NR_LOADSTORE_MSRS) ||
	    (j < 0 && m->host.nr == NR_LOADSTORE_MSRS)) {
2013-10-30 16:34:56 -06:00
		printk_once(KERN_WARNING "Not enough msr switch entries. "
2011-10-05 06:01:24 -06:00
			    "Can't add msr %x\n", msr);
		return;
2010-04-28 07:40:38 -06:00
	}
2018-06-20 20:00:47 -06:00
	if (i < 0) {
2018-06-20 18:11:39 -06:00
		i = m->guest.nr++;
2018-06-20 11:58:37 -06:00
		vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr);
2018-06-20 20:00:47 -06:00
	}
2018-06-20 20:01:22 -06:00
	m->guest.val[i].index = msr;
	m->guest.val[i].value = guest_val;
	if (entry_only)
		return;
2010-04-28 07:40:38 -06:00
2018-06-20 20:00:47 -06:00
	if (j < 0) {
		j = m->host.nr++;
2018-06-20 11:58:37 -06:00
		vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr);
2010-04-28 07:40:38 -06:00
	}
2018-06-20 20:00:47 -06:00
	m->host.val[j].index = msr;
	m->host.val[j].value = host_val;
2010-04-28 07:40:38 -06:00
}
2009-10-29 03:00:16 -06:00
static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
2007-05-20 22:28:09 -06:00
{
KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.
KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault
when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
restarts execution. This will still cause a user write to fault, while
supervisor writes will succeed. User reads will fault spuriously now,
and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads
will be enabled and supervisor writes disabled, going back to the
original situation where supervisor writes fault spuriously.
When SMEP is in effect, however, U=0 will enable kernel execution of
this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0. If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.
The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
control, so they do not use user-return notifiers for EFER---if they did,
EFER.NX would be forced to the same value as the host).
There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08 04:13:39 -07:00
	u64 guest_efer = vmx->vcpu.arch.efer;
	u64 ignore_bits = 0;
2019-10-27 09:23:23 -06:00
	/* Shadow paging assumes NX to be available. */
	if (!enable_ept)
		guest_efer |= EFER_NX;
2009-08-04 03:08:45 -06:00
2007-08-28 18:48:05 -06:00
/*
	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
2007-08-28 18:48:05 -06:00
*/
	ignore_bits |= EFER_SCE;
2007-08-28 18:48:05 -06:00
#ifdef CONFIG_X86_64
	ignore_bits |= EFER_LMA | EFER_LME;
	/* SCE is meaningful only in long mode on Intel */
	if (guest_efer & EFER_LMA)
		ignore_bits &= ~(u64)EFER_SCE;
#endif
2010-04-28 07:42:29 -06:00
x86, kvm, vmx: Always use LOAD_IA32_EFER if available
At least on Sandy Bridge, letting the CPU switch IA32_EFER is much
faster than switching it manually.
I benchmarked this using the vmexit kvm-unit-test (single run, but
GOAL multiplied by 5 to do more iterations):
Test Before After Change
cpuid 2000 1932 -3.40%
vmcall 1914 1817 -5.07%
mov_from_cr8 13 13 0.00%
mov_to_cr8 19 19 0.00%
inl_from_pmtimer 19164 10619 -44.59%
inl_from_qemu 15662 10302 -34.22%
inl_from_kernel 3916 3802 -2.91%
outl_to_kernel 2230 2194 -1.61%
mov_dr 172 176 2.33%
ipi (skipped) (skipped)
ipi+halt (skipped) (skipped)
ple-round-robin 13 13 0.00%
wr_tsc_adjust_msr 1920 1845 -3.91%
rd_tsc_adjust_msr 1892 1814 -4.12%
mmio-no-eventfd:pci-mem 16394 11165 -31.90%
mmio-wildcard-eventfd:pci-mem 4607 4645 0.82%
mmio-datamatch-eventfd:pci-mem 4601 4610 0.20%
portio-no-eventfd:pci-io 11507 7942 -30.98%
portio-wildcard-eventfd:pci-io 2239 2225 -0.63%
portio-datamatch-eventfd:pci-io 2250 2234 -0.71%
I haven't explicitly computed the significance of these numbers,
but this isn't subtle.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[The results were reproducible on all of Nehalem, Sandy Bridge and
Ivy Bridge. The slowness of manual switching is because writing
to EFER with WRMSR triggers a TLB flush, even if the only bit you're
touching is SCE (so the page table format is not affected). Doing
the write as part of vmentry/vmexit, instead, does not flush the TLB,
probably because all processors that have EPT also have VPID. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 19:25:18 -07:00
	/*
	 * On EPT, we can't emulate NX, so we must switch EFER atomically.
	 * On CPUs that support "load IA32_EFER", always switch EFER
	 * atomically, since it's faster than switching it manually.
	 */
	if (cpu_has_load_ia32_efer() ||
	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
		if (!(guest_efer & EFER_LMA))
			guest_efer &= ~EFER_LME;
		if (guest_efer != host_efer)
			add_atomic_switch_msr(vmx, MSR_EFER,
					      guest_efer, host_efer, false);
		else
			clear_atomic_switch_msr(vmx, MSR_EFER);
		return false;
KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.
KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault
when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
restarts execution. This will still cause a user write to fault, while
supervisor writes will succeed. User reads will fault spuriously now,
and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads
will be enabled and supervisor writes disabled, going back to the
original situation where supervisor writes fault spuriously.
When SMEP is in effect, however, U=0 will enable kernel execution of
this page. To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0. If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.
The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
control, so they do not use user-return notifiers for EFER---if they did,
EFER.NX would be forced to the same value as the host).
There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08 04:13:39 -07:00
	} else {
		clear_atomic_switch_msr(vmx, MSR_EFER);
		guest_efer &= ~ignore_bits;
		guest_efer |= host_efer & ignore_bits;

		vmx->guest_msrs[efer_offset].data = guest_efer;
		vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
		return true;
	}
}
#ifdef CONFIG_X86_32
/*
 * On 32-bit kernels, VM exits still load the FS and GS bases from the
 * VMCS rather than the segment table.  KVM uses this helper to figure
 * out the current bases to poke them into the VMCS before entry.
 */
static unsigned long segment_base(u16 selector)
{
	struct desc_struct *table;
	unsigned long v;

	if (!(selector & ~SEGMENT_RPL_MASK))
		return 0;

	table = get_current_gdt_ro();

	if ((selector & SEGMENT_TI_MASK) == SEGMENT_LDT) {
		u16 ldt_selector = kvm_read_ldt();

		if (!(ldt_selector & ~SEGMENT_RPL_MASK))
			return 0;

		table = (struct desc_struct *)segment_base(ldt_selector);
	}
	v = get_desc_base(&table[selector >> 3]);
	return v;
}
#endif
static inline bool pt_can_write_msr(struct vcpu_vmx *vmx)
{
	return vmx_pt_mode_is_host_guest() &&
	       !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN);
}
static inline void pt_load_msr(struct pt_ctx *ctx, u32 addr_range)
{
	u32 i;

	wrmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
	wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
	wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
	wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
	for (i = 0; i < addr_range; i++) {
		wrmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
		wrmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
	}
}

static inline void pt_save_msr(struct pt_ctx *ctx, u32 addr_range)
{
	u32 i;

	rdmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
	rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
	for (i = 0; i < addr_range; i++) {
		rdmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
		rdmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
	}
}
static void pt_guest_enter(struct vcpu_vmx *vmx)
{
	if (vmx_pt_mode_is_system())
		return;

	/*
	 * GUEST_IA32_RTIT_CTL is already set in the VMCS.
	 * Save host state before VM entry.
	 */
	rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
	if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
		wrmsrl(MSR_IA32_RTIT_CTL, 0);
		pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
		pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
	}
}

static void pt_guest_exit(struct vcpu_vmx *vmx)
{
	if (vmx_pt_mode_is_system())
		return;

	if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
		pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
		pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
	}

	/* Reload host state (IA32_RTIT_CTL will be cleared on VM exit). */
	wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
}
void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
			unsigned long fs_base, unsigned long gs_base)
{
	if (unlikely(fs_sel != host->fs_sel)) {
		if (!(fs_sel & 7))
			vmcs_write16(HOST_FS_SELECTOR, fs_sel);
		else
			vmcs_write16(HOST_FS_SELECTOR, 0);
		host->fs_sel = fs_sel;
	}
	if (unlikely(gs_sel != host->gs_sel)) {
		if (!(gs_sel & 7))
			vmcs_write16(HOST_GS_SELECTOR, gs_sel);
		else
			vmcs_write16(HOST_GS_SELECTOR, 0);
		host->gs_sel = gs_sel;
	}
	if (unlikely(fs_base != host->fs_base)) {
		vmcs_writel(HOST_FS_BASE, fs_base);
		host->fs_base = fs_base;
	}
	if (unlikely(gs_base != host->gs_base)) {
		vmcs_writel(HOST_GS_BASE, gs_base);
		host->gs_base = gs_base;
	}
}
void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct vmcs_host_state *host_state;
#ifdef CONFIG_X86_64
	int cpu = raw_smp_processor_id();
#endif
	unsigned long fs_base, gs_base;
	u16 fs_sel, gs_sel;
	int i;
KVM: VMX: use preemption timer to force immediate VMExit
A VMX preemption timer value of '0' is guaranteed to cause a VMExit
prior to the CPU executing any instructions in the guest. Use the
preemption timer (if it's supported) to trigger immediate VMExit
in place of the current method of sending a self-IPI. This ensures
that pending VMExit injection to L1 occurs prior to executing any
instructions in the guest (regardless of nesting level).
When deferring VMExit injection, KVM generates an immediate VMExit
from the (possibly nested) guest by sending itself an IPI. Because
hardware interrupts are blocked prior to VMEnter and are unblocked
(in hardware) after VMEnter, this results in taking a VMExit(INTR)
before any guest instruction is executed. But, as this approach
relies on the IPI being received before VMEnter executes, it only
works as intended when KVM is running as L0. Because there are no
architectural guarantees regarding when IPIs are delivered, when
running nested the INTR may "arrive" long after L2 is running e.g.
L0 KVM doesn't force an immediate switch to L1 to deliver an INTR.
For the most part, this unintended delay is not an issue since the
events being injected to L1 also do not have architectural guarantees
regarding their timing. The notable exception is the VMX preemption
timer[1], which is architecturally guaranteed to cause a VMExit prior
to executing any instructions in the guest if the timer value is '0'
at VMEnter. Specifically, the delay in injecting the VMExit causes
the preemption timer KVM unit test to fail when run in a nested guest.
Note: this approach is viable even on CPUs with a broken preemption
timer, as broken in this context only means the timer counts at the
wrong rate. There are no known errata affecting timer value of '0'.
[1] I/O SMIs also have guarantees on when they arrive, but I have
no idea if/how those are emulated in KVM.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Use a hook for SVM instead of leaving the default in x86.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-27 16:21:12 -06:00
	vmx->req_immediate_exit = false;

	/*
	 * Note that guest MSRs to be saved/restored can also be changed
	 * when guest state is loaded. This happens when guest transitions
	 * to/from long-mode by setting MSR_EFER.LMA.
	 */
	if (!vmx->guest_msrs_ready) {
		vmx->guest_msrs_ready = true;
		for (i = 0; i < vmx->save_nmsrs; ++i)
			kvm_set_shared_msr(vmx->guest_msrs[i].index,
					   vmx->guest_msrs[i].data,
					   vmx->guest_msrs[i].mask);
	}
	if (vmx->nested.need_vmcs12_to_shadow_sync)
		nested_sync_vmcs12_to_shadow(vcpu);

	if (vmx->guest_state_loaded)
		return;

	host_state = &vmx->loaded_vmcs->host_state;

	/*
	 * Set host fs and gs selectors.  Unfortunately, 22.2.3 does not
	 * allow segment selectors with cpl > 0 or ti == 1.
	 */
	host_state->ldt_sel = kvm_read_ldt();

#ifdef CONFIG_X86_64
	savesegment(ds, host_state->ds_sel);
	savesegment(es, host_state->es_sel);

	gs_base = cpu_kernelmode_gs_base(cpu);
	if (likely(is_64bit_mm(current->mm))) {
		save_fsgs_for_kvm();
		fs_sel = current->thread.fsindex;
		gs_sel = current->thread.gsindex;
		fs_base = current->thread.fsbase;
		vmx->msr_host_kernel_gs_base = current->thread.gsbase;
	} else {
		savesegment(fs, fs_sel);
		savesegment(gs, gs_sel);
		fs_base = read_msr(MSR_FS_BASE);
		vmx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
	}

	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
#else
	savesegment(fs, fs_sel);
	savesegment(gs, gs_sel);
	fs_base = segment_base(fs_sel);
	gs_base = segment_base(gs_sel);
#endif

	vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
	vmx->guest_state_loaded = true;
}
static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
{
	struct vmcs_host_state *host_state;

	if (!vmx->guest_state_loaded)
		return;

	host_state = &vmx->loaded_vmcs->host_state;

	++vmx->vcpu.stat.host_state_reload;

#ifdef CONFIG_X86_64
	rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
#endif
	if (host_state->ldt_sel || (host_state->gs_sel & 7)) {
		kvm_load_ldt(host_state->ldt_sel);
#ifdef CONFIG_X86_64
		load_gs_index(host_state->gs_sel);
#else
		loadsegment(gs, host_state->gs_sel);
#endif
	}
	if (host_state->fs_sel & 7)
		loadsegment(fs, host_state->fs_sel);
#ifdef CONFIG_X86_64
	if (unlikely(host_state->ds_sel | host_state->es_sel)) {
		loadsegment(ds, host_state->ds_sel);
		loadsegment(es, host_state->es_sel);
	}
#endif
	invalidate_tss_limit();
#ifdef CONFIG_X86_64
	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
#endif
	load_fixmap_gdt(raw_smp_processor_id());
	vmx->guest_state_loaded = false;
	vmx->guest_msrs_ready = false;
}
KVM: vmx: add dedicated utility to access guest's kernel_gs_base
When lazy save/restore of MSR_KERNEL_GS_BASE was introduced[1], the
MSR was intercepted in all modes and was only restored for the host
when the guest is in 64-bit mode. So at the time, going through the
full host restore prior to accessing MSR_KERNEL_GS_BASE was necessary
to load host state and was not a significant waste of cycles.
Later, MSR_KERNEL_GS_BASE interception was disabled for a 64-bit
guest[2], and then unconditionally saved/restored for the host[3].
As a result, loading full host state is overkill for accesses to
MSR_KERNEL_GS_BASE, and completely unnecessary when the guest is
not in 64-bit mode.
Add a dedicated utility to read/write the guest's MSR_KERNEL_GS_BASE
(outside of the save/restore flow) to minimize the overhead incurred
when accessing the MSR. When setting EFER, only decache the MSR if
the new EFER will disable long mode.
Removing out-of-band usage of vmx_load_host_state() also eliminates,
or at least reduces, potential corner cases in its usage, which in
turn will (hopefully) make it easier to reason about future changes
to the save/restore flow, e.g. optimization of saving host state.
[1] commit 44ea2b1758d8 ("KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx
autoload msr area")
[2] commit 5897297bc228 ("KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE")
[3] commit c8770e7ba63b ("KVM: VMX: Fix host userspace gsbase corruption")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-23 13:32:43 -06:00
#ifdef CONFIG_X86_64
static u64 vmx_read_guest_kernel_gs_base(struct vcpu_vmx *vmx)
{
	preempt_disable();
	if (vmx->guest_state_loaded)
		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
	preempt_enable();
	return vmx->msr_guest_kernel_gs_base;
}
static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data)
{
	preempt_disable();
	if (vmx->guest_state_loaded)
		wrmsrl(MSR_KERNEL_GS_BASE, data);
	preempt_enable();
	vmx->msr_guest_kernel_gs_base = data;
}
#endif
static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
{
	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
	struct pi_desc old, new;
	unsigned int dest;

	/*
	 * In case of hot-plug or hot-unplug, we may have to undo
	 * vmx_vcpu_pi_put even if there is no assigned device.  And we
	 * always keep PI.NDST up to date for simplicity: it makes the
	 * code easier, and CPU migration is not a fast path.
	 */
	if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu)
		return;

	/*
	 * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change
	 * PI.NDST: pi_post_block is the one expected to change PID.NDST and the
	 * wakeup handler expects the vCPU to be on the blocked_vcpu_list that
	 * matches PI.NDST.  Otherwise, a vcpu may not be able to be woken up
	 * correctly.
	 */
	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || vcpu->cpu == cpu) {
		pi_clear_sn(pi_desc);
		goto after_clear_sn;
	}

	/* The full case. */
	do {
		old.control = new.control = pi_desc->control;

		dest = cpu_physical_id(cpu);

		if (x2apic_enabled())
			new.ndst = dest;
		else
			new.ndst = (dest << 8) & 0xFF00;

		new.sn = 0;
	} while (cmpxchg64(&pi_desc->control, old.control,
			   new.control) != old.control);

after_clear_sn:

	/*
	 * Clear SN before reading the bitmap.  The VT-d firmware
	 * writes the bitmap and reads SN atomically (5.2.3 in the
	 * spec), so it doesn't really have a memory barrier that
	 * pairs with this, but we cannot do that and we need one.
	 */
	smp_mb__after_atomic();

	if (!pi_is_pir_empty(pi_desc))
		pi_set_on(pi_desc);
}
void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
			struct loaded_vmcs *buddy)
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	bool already_loaded = vmx->loaded_vmcs->cpu == cpu;
	struct vmcs *prev;
	if (!already_loaded) {
		loaded_vmcs_clear(vmx->loaded_vmcs);
		local_irq_disable();

		/*
		 * Ensure loaded_vmcs->cpu is read before adding loaded_vmcs to
		 * this cpu's percpu list, otherwise it may not yet be deleted
		 * from its previous cpu's percpu list.  Pairs with the
		 * smp_wmb() in __loaded_vmcs_clear().
		 */
		smp_rmb();
		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
			 &per_cpu(loaded_vmcss_on_cpu, cpu));
		local_irq_enable();
	}
	prev = per_cpu(current_vmcs, cpu);
	if (prev != vmx->loaded_vmcs->vmcs) {
		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
		vmcs_load(vmx->loaded_vmcs->vmcs);

		/*
		 * No indirect branch prediction barrier needed when switching
		 * the active VMCS within a guest, e.g. on nested VM-Enter.
		 * The L1 VMM can protect itself with retpolines, IBPB or IBRS.
		 */
		if (!buddy || WARN_ON_ONCE(buddy->vmcs != prev))
			indirect_branch_prediction_barrier();
	}
	if (!already_loaded) {
		void *gdt = get_current_gdt_ro();
		unsigned long sysenter_esp;

		/*
		 * Flush all EPTP/VPID contexts, the new pCPU may have stale
		 * TLB entries from its previous association with the vCPU.
		 */
		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
		/*
		 * Linux uses per-cpu TSS and GDT, so set these when switching
		 * processors.  See 22.2.4.
		 */
		vmcs_writel(HOST_TR_BASE,
			    (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss);
		vmcs_writel(HOST_GDTR_BASE, (unsigned long)gdt);   /* 22.2.4 */
		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
		vmx->loaded_vmcs->cpu = cpu;
	}

	/* Setup TSC multiplier */
	if (kvm_has_tsc_control &&
	    vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
		decache_tsc_multiplier(vmx);
}
/*
 * Switches to specified vcpu, until a matching vcpu_put(), but assumes
 * vcpu mutex is already taken.
 */
static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);

	vmx_vcpu_pi_load(vcpu, cpu);

	vmx->host_debugctlmsr = get_debugctlmsr();
}
static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
{
	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);

	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
	    !irq_remapping_cap(IRQ_POSTING_CAP) ||
	    !kvm_vcpu_apicv_active(vcpu))
		return;

	/* Set SN when the vCPU is preempted */
	if (vcpu->preempted)
		pi_set_sn(pi_desc);
}
static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
{
	vmx_vcpu_pi_put(vcpu);
	vmx_prepare_switch_to_host(to_vmx(vcpu));
}

static bool emulation_required(struct kvm_vcpu *vcpu)
{
	return emulate_invalid_guest_state && !guest_state_valid(vcpu);
}

unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned long rflags, save_rflags;

	if (!kvm_register_is_available(vcpu, VCPU_EXREG_RFLAGS)) {
		kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS);
		rflags = vmcs_readl(GUEST_RFLAGS);
		if (vmx->rmode.vm86_active) {
			rflags &= RMODE_GUEST_OWNED_EFLAGS_BITS;
			save_rflags = vmx->rmode.save_rflags;
			rflags |= save_rflags & ~RMODE_GUEST_OWNED_EFLAGS_BITS;
		}
		vmx->rflags = rflags;
	}

	return vmx->rflags;
}

void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned long old_rflags;

	if (enable_unrestricted_guest) {
		kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS);
		vmx->rflags = rflags;
		vmcs_writel(GUEST_RFLAGS, rflags);
		return;
	}

	old_rflags = vmx_get_rflags(vcpu);
	vmx->rflags = rflags;
	if (vmx->rmode.vm86_active) {
		vmx->rmode.save_rflags = rflags;
		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
	}
	vmcs_writel(GUEST_RFLAGS, rflags);

	if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM)
		vmx->emulation_required = emulation_required(vcpu);
}

u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu)
{
	u32 interruptibility = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
	int ret = 0;

	if (interruptibility & GUEST_INTR_STATE_STI)
		ret |= KVM_X86_SHADOW_INT_STI;

	if (interruptibility & GUEST_INTR_STATE_MOV_SS)
		ret |= KVM_X86_SHADOW_INT_MOV_SS;

	return ret;
}

void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
{
	u32 interruptibility_old = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
	u32 interruptibility = interruptibility_old;

	interruptibility &= ~(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS);

	if (mask & KVM_X86_SHADOW_INT_MOV_SS)
		interruptibility |= GUEST_INTR_STATE_MOV_SS;
	else if (mask & KVM_X86_SHADOW_INT_STI)
		interruptibility |= GUEST_INTR_STATE_STI;

	if (interruptibility != interruptibility_old)
		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, interruptibility);
}

static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned long value;

	/*
	 * Any MSR write that attempts to change bits marked reserved will
	 * cause a #GP fault.
	 */
	if (data & vmx->pt_desc.ctl_bitmask)
		return 1;

	/*
	 * Any attempt to modify IA32_RTIT_CTL while TraceEn is set will
	 * result in a #GP unless the same write also clears TraceEn.
	 */
	if ((vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) &&
	    ((vmx->pt_desc.guest.ctl ^ data) & ~RTIT_CTL_TRACEEN))
		return 1;

	/*
	 * A WRMSR to IA32_RTIT_CTL that sets TraceEn but clears ToPA and
	 * FabricEn will cause a #GP if
	 * CPUID.(EAX=14H, ECX=0):ECX.SNGLRGNOUT[bit 2] = 0.
	 */
	if ((data & RTIT_CTL_TRACEEN) && !(data & RTIT_CTL_TOPA) &&
	    !(data & RTIT_CTL_FABRIC_EN) &&
	    !intel_pt_validate_cap(vmx->pt_desc.caps,
				   PT_CAP_single_range_output))
		return 1;

	/*
	 * Check the MTCFreq, CycThresh and PSBFreq encodings; any MSR
	 * write that uses an encoding marked reserved will cause a #GP
	 * fault.
	 */
	value = intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc_periods);
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc) &&
	    !test_bit((data & RTIT_CTL_MTC_RANGE) >>
		      RTIT_CTL_MTC_RANGE_OFFSET, &value))
		return 1;
	value = intel_pt_validate_cap(vmx->pt_desc.caps,
				      PT_CAP_cycle_thresholds);
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_cyc) &&
	    !test_bit((data & RTIT_CTL_CYC_THRESH) >>
		      RTIT_CTL_CYC_THRESH_OFFSET, &value))
		return 1;
	value = intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_periods);
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_cyc) &&
	    !test_bit((data & RTIT_CTL_PSB_FREQ) >>
		      RTIT_CTL_PSB_FREQ_OFFSET, &value))
		return 1;

	/*
	 * If ADDRx_CFG is reserved, or its encoding is greater than 2, the
	 * write will cause a #GP fault.
	 */
	value = (data & RTIT_CTL_ADDR0) >> RTIT_CTL_ADDR0_OFFSET;
	if ((value && (vmx->pt_desc.addr_range < 1)) || (value > 2))
		return 1;
	value = (data & RTIT_CTL_ADDR1) >> RTIT_CTL_ADDR1_OFFSET;
	if ((value && (vmx->pt_desc.addr_range < 2)) || (value > 2))
		return 1;
	value = (data & RTIT_CTL_ADDR2) >> RTIT_CTL_ADDR2_OFFSET;
	if ((value && (vmx->pt_desc.addr_range < 3)) || (value > 2))
		return 1;
	value = (data & RTIT_CTL_ADDR3) >> RTIT_CTL_ADDR3_OFFSET;
	if ((value && (vmx->pt_desc.addr_range < 4)) || (value > 2))
		return 1;

	return 0;
}

static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
	unsigned long rip, orig_rip;
2019-08-27 15:40:39 -06:00
	/*
	 * Using VMCS.VM_EXIT_INSTRUCTION_LEN on EPT misconfig depends on
	 * undefined behavior: Intel's SDM doesn't mandate the VMCS field be
	 * set when EPT misconfig occurs.  In practice, real hardware updates
	 * VM_EXIT_INSTRUCTION_LEN on EPT misconfig, but other hypervisors
	 * (namely Hyper-V) don't set it due to it being undefined behavior,
	 * i.e. we end up advancing IP with some random value.
	 */
	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
	    to_vmx(vcpu)->exit_reason != EXIT_REASON_EPT_MISCONFIG) {
		orig_rip = kvm_rip_read(vcpu);
		rip = orig_rip + vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
#ifdef CONFIG_X86_64
		/*
		 * We need to mask out the high 32 bits of RIP if not in 64-bit
		 * mode, but just finding out that we are in 64-bit mode is
		 * quite expensive.  Only do it if there was a carry.
		 */
		if (unlikely(((rip ^ orig_rip) >> 31) == 3) && !is_64_bit_mode(vcpu))
			rip = (u32)rip;
#endif
		kvm_rip_write(vcpu, rip);
	} else {
		if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
			return 0;
	}
	/* skipping an emulated instruction also counts */
	vmx_set_interrupt_shadow(vcpu, 0);
x86: kvm: svm: propagate errors from skip_emulated_instruction()
On AMD, kvm_x86_ops->skip_emulated_instruction(vcpu) can, in theory,
fail: in !nrips case we call kvm_emulate_instruction(EMULTYPE_SKIP).
Currently, we only do printk(KERN_DEBUG) when this happens and this
is not ideal. Propagate the error up the stack.
On VMX, skip_emulated_instruction() doesn't fail, we have two call
sites calling it explicitly: handle_exception_nmi() and
handle_task_switch(), we can just ignore the result.
On SVM, we also have two explicit call sites:
svm_queue_exception() and it seems we don't need to do anything there as
we check if RIP was advanced or not. In task_switch_interception(),
however, we are better off not proceeding to kvm_task_switch() in case
skip_emulated_instruction() failed.
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-13 07:53:30 -06:00
KVM: x86: Remove emulation_result enums, EMULATE_{DONE,FAIL,USER_EXIT}
Deferring emulation failure handling (in some cases) to the caller of
x86_emulate_instruction() has proven fragile, e.g. multiple instances of
KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it
being difficult to discern what emulation types can return what result,
and which combination of types and results are handled where.
Now that x86_emulate_instruction() always handles emulation failure,
i.e. EMULATION_FAIL is only referenced in callers, remove the
emulation_result enums entirely. Per KVM's existing exit handling
conventions, return '0' and '1' for "exit to userspace" and "resume
guest" respectively. Doing so cleans up many callers, e.g. they can
return kvm_emulate_instruction() directly instead of having to interpret
its result.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-27 15:40:38 -06:00
	return 1;
}
/*
 * Handles kvm_read/write_guest_virt*() result and either injects #PF or returns
 * KVM_EXIT_INTERNAL_ERROR for cases not currently handled by KVM. Return value
 * indicates whether exit to userspace is needed.
 */
int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
			      struct x86_exception *e)
{
	if (r == X86EMUL_PROPAGATE_FAULT) {
		kvm_inject_emulated_page_fault(vcpu, e);
		return 1;
	}

	/*
	 * In case kvm_read/write_guest_virt*() failed with X86EMUL_IO_NEEDED
	 * while handling a VMX instruction KVM could've handled the request
	 * correctly by exiting to userspace and performing I/O but there
	 * doesn't seem to be a real use-case behind such requests, just return
	 * KVM_EXIT_INTERNAL_ERROR for now.
	 */
	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
	vcpu->run->internal.ndata = 0;
	return 0;
}
/*
 * Recognizes a pending MTF VM-exit and records the nested state for later
 * delivery.
 */
static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
{
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (!is_guest_mode(vcpu))
		return;

	/*
	 * Per the SDM, MTF takes priority over debug-trap exceptions besides
	 * T-bit traps. As instruction emulation is completed (i.e. at the
	 * instruction boundary), any #DB exception pending delivery must be a
	 * debug-trap. Record the pending MTF state to be delivered in
	 * vmx_check_nested_events().
	 */
	if (nested_cpu_has_mtf(vmcs12) &&
	    (!vcpu->arch.exception.pending ||
	     vcpu->arch.exception.nr == DB_VECTOR))
		vmx->nested.mtf_pending = true;
	else
		vmx->nested.mtf_pending = false;
}

static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
	vmx_update_emulated_instruction(vcpu);
	return skip_emulated_instruction(vcpu);
}
static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
{
	/*
	 * Ensure that we clear the HLT state in the VMCS.  We don't need to
	 * explicitly skip the instruction because if the HLT state is set,
	 * then the instruction is already executing and RIP has already been
	 * advanced.
	 */
	if (kvm_hlt_in_guest(vcpu->kvm) &&
	    vmcs_read32(GUEST_ACTIVITY_STATE) == GUEST_ACTIVITY_HLT)
		vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
}
static void vmx_queue_exception(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned nr = vcpu->arch.exception.nr;
	bool has_error_code = vcpu->arch.exception.has_error_code;
	u32 error_code = vcpu->arch.exception.error_code;
	u32 intr_info = nr | INTR_INFO_VALID_MASK;

	kvm_deliver_exception_payload(vcpu);

	if (has_error_code) {
		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
	}

	if (vmx->rmode.vm86_active) {
		int inc_eip = 0;
		if (kvm_exception_is_soft(nr))
			inc_eip = vcpu->arch.event_exit_inst_len;
		kvm_inject_realmode_interrupt(vcpu, nr, inc_eip);
		return;
	}
KVM: VMX: raise internal error for exception during invalid protected mode state
Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
an exception in Protected Mode while emulating guest due to invalid
guest state. Unlike Big RM, KVM doesn't support emulating exceptions
in PM, i.e. PM exceptions are always injected via the VMCS. Because
we will never do VMRESUME due to emulation_required, the exception is
never realized and we'll keep emulating the faulting instruction over
and over until we receive a signal.
Exit to userspace iff there is a pending exception, i.e. don't exit
simply on a requested event. The purpose of this check and exit is to
aid in debugging a guest that is in all likelihood already doomed.
Invalid guest state in PM is extremely limited in normal operation,
e.g. it generally only occurs for a few instructions early in BIOS,
and any exception at this time is all but guaranteed to be fatal.
Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
handled/emulated, while checking for vectored interrupts, e.g. INTR
and NMI, without hitting false positives would add a fair amount of
complexity for almost no benefit (getting hit by lightning seems
more likely than encountering this specific scenario).
Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
exception via the VMCS and emulation_required is true.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-23 10:34:00 -06:00
	WARN_ON_ONCE(vmx->emulation_required);

	if (kvm_exception_is_soft(nr)) {
		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
			     vmx->vcpu.arch.event_exit_inst_len);
		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
	} else
		intr_info |= INTR_TYPE_HARD_EXCEPTION;

	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);

	vmx_clear_hlt(vcpu);
}
/*
 * Swap MSR entry in host/guest MSR entry array.
 */
static void move_msr_up(struct vcpu_vmx *vmx, int from, int to)
{
	struct shared_msr_entry tmp;

	tmp = vmx->guest_msrs[to];
	vmx->guest_msrs[to] = vmx->guest_msrs[from];
	vmx->guest_msrs[from] = tmp;
}
/*
 * Set up the vmcs to automatically save and restore system
 * msrs.  Don't touch the 64-bit msrs if the guest is in legacy
 * mode, as fiddling with msrs is very expensive.
 */
static void setup_msrs(struct vcpu_vmx *vmx)
{
	int save_nmsrs, index;

	save_nmsrs = 0;
#ifdef CONFIG_X86_64
	/*
	 * The SYSCALL MSRs are only needed on long mode guests, and only
	 * when EFER.SCE is set.
	 */
	if (is_long_mode(&vmx->vcpu) && (vmx->vcpu.arch.efer & EFER_SCE)) {
		index = __find_msr_index(vmx, MSR_STAR);
		if (index >= 0)
			move_msr_up(vmx, index, save_nmsrs++);
		index = __find_msr_index(vmx, MSR_LSTAR);
		if (index >= 0)
			move_msr_up(vmx, index, save_nmsrs++);
		index = __find_msr_index(vmx, MSR_SYSCALL_MASK);
		if (index >= 0)
			move_msr_up(vmx, index, save_nmsrs++);
	}
#endif
	index = __find_msr_index(vmx, MSR_EFER);
	if (index >= 0 && update_transition_efer(vmx, index))
		move_msr_up(vmx, index, save_nmsrs++);
	index = __find_msr_index(vmx, MSR_TSC_AUX);
	if (index >= 0 && guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP))
		move_msr_up(vmx, index, save_nmsrs++);
	index = __find_msr_index(vmx, MSR_IA32_TSX_CTRL);
	if (index >= 0)
		move_msr_up(vmx, index, save_nmsrs++);

	vmx->save_nmsrs = save_nmsrs;
	vmx->guest_msrs_ready = false;

	if (cpu_has_vmx_msr_bitmap())
		vmx_update_msr_bitmap(&vmx->vcpu);
}
static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
{
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
	u64 g_tsc_offset = 0;

	/*
	 * We're here if L1 chose not to trap WRMSR to TSC. According
	 * to the spec, this should set L1's TSC; The offset that L1
	 * set for L2 remains unchanged, and still needs to be added
	 * to the newly set TSC to get L2's TSC.
	 */
	if (is_guest_mode(vcpu) &&
	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
		g_tsc_offset = vmcs12->tsc_offset;

	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
				   vcpu->arch.tsc_offset - g_tsc_offset,
				   offset);
	vmcs_write64(TSC_OFFSET, offset + g_tsc_offset);
	return offset + g_tsc_offset;
}
/*
 * nested_vmx_allowed() checks whether a guest should be allowed to use VMX
 * instructions and MSRs (i.e., nested VMX). Nested VMX is disabled for
 * all guests if the "nested" module option is off, and can also be disabled
 * for a single guest by disabling its VMX cpuid bit.
 */
bool nested_vmx_allowed(struct kvm_vcpu *vcpu)
{
	return nested && guest_cpuid_has(vcpu, X86_FEATURE_VMX);
}
static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
						 uint64_t val)
{
	uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;

	return !(val & ~valid_bits);
}

static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
	switch (msr->index) {
	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
		if (!nested)
			return 1;
		return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
	case MSR_IA32_PERF_CAPABILITIES:
		msr->data = vmx_get_perf_capabilities();
		return 0;
	default:
		return 1;
	}
}
/*
 * Reads an msr value (of 'msr_index') into 'pdata'.
 * Returns 0 on success, non-0 otherwise.
 * Assumes vcpu_load() was already called.
 */
static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct shared_msr_entry *msr;
	u32 index;

	switch (msr_info->index) {
#ifdef CONFIG_X86_64
	case MSR_FS_BASE:
		msr_info->data = vmcs_readl(GUEST_FS_BASE);
		break;
	case MSR_GS_BASE:
		msr_info->data = vmcs_readl(GUEST_GS_BASE);
		break;
	case MSR_KERNEL_GS_BASE:
		msr_info->data = vmx_read_guest_kernel_gs_base(vmx);
		break;
#endif
	case MSR_EFER:
		return kvm_get_msr_common(vcpu, msr_info);
	case MSR_IA32_TSX_CTRL:
		if (!msr_info->host_initiated &&
		    !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
			return 1;
		goto find_shared_msr;
	case MSR_IA32_UMWAIT_CONTROL:
		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
			return 1;
		msr_info->data = vmx->msr_ia32_umwait_control;
		break;
	case MSR_IA32_SPEC_CTRL:
		if (!msr_info->host_initiated &&
		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
			return 1;
		msr_info->data = to_vmx(vcpu)->spec_ctrl;
		break;
	case MSR_IA32_SYSENTER_CS:
		msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
		break;
	case MSR_IA32_SYSENTER_EIP:
		msr_info->data = vmcs_readl(GUEST_SYSENTER_EIP);
		break;
	case MSR_IA32_SYSENTER_ESP:
		msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP);
		break;
	case MSR_IA32_BNDCFGS:
		if (!kvm_mpx_supported() ||
		    (!msr_info->host_initiated &&
		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
			return 1;
		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
		break;
	case MSR_IA32_MCG_EXT_CTL:
		if (!msr_info->host_initiated &&
		    !(vmx->msr_ia32_feature_control &
		      FEAT_CTL_LMCE_ENABLED))
			return 1;
		msr_info->data = vcpu->arch.mcg_ext_ctl;
		break;
	case MSR_IA32_FEAT_CTL:
		msr_info->data = vmx->msr_ia32_feature_control;
		break;
	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
		if (!nested_vmx_allowed(vcpu))
			return 1;
		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
				    &msr_info->data))
			return 1;
		/*
		 * Enlightened VMCS v1 doesn't have certain fields, but buggy
		 * Hyper-V versions are still trying to use corresponding
		 * features when they are exposed. Filter out the essential
		 * minimum.
		 */
		if (!msr_info->host_initiated &&
		    vmx->nested.enlightened_vmcs_enabled)
			nested_evmcs_filter_control_msr(msr_info->index,
							&msr_info->data);
		break;
	case MSR_IA32_RTIT_CTL:
		if (!vmx_pt_mode_is_host_guest())
			return 1;
		msr_info->data = vmx->pt_desc.guest.ctl;
		break;
	case MSR_IA32_RTIT_STATUS:
		if (!vmx_pt_mode_is_host_guest())
			return 1;
		msr_info->data = vmx->pt_desc.guest.status;
		break;
	case MSR_IA32_RTIT_CR3_MATCH:
		if (!vmx_pt_mode_is_host_guest() ||
		    !intel_pt_validate_cap(vmx->pt_desc.caps,
					   PT_CAP_cr3_filtering))
			return 1;
		msr_info->data = vmx->pt_desc.guest.cr3_match;
		break;
	case MSR_IA32_RTIT_OUTPUT_BASE:
		if (!vmx_pt_mode_is_host_guest() ||
		    (!intel_pt_validate_cap(vmx->pt_desc.caps,
					    PT_CAP_topa_output) &&
		     !intel_pt_validate_cap(vmx->pt_desc.caps,
					    PT_CAP_single_range_output)))
			return 1;
		msr_info->data = vmx->pt_desc.guest.output_base;
		break;
	case MSR_IA32_RTIT_OUTPUT_MASK:
		if (!vmx_pt_mode_is_host_guest() ||
		    (!intel_pt_validate_cap(vmx->pt_desc.caps,
					    PT_CAP_topa_output) &&
		     !intel_pt_validate_cap(vmx->pt_desc.caps,
					    PT_CAP_single_range_output)))
			return 1;
		msr_info->data = vmx->pt_desc.guest.output_mask;
		break;
	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
		if (!vmx_pt_mode_is_host_guest() ||
		    (index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
					PT_CAP_num_address_ranges)))
			return 1;
		if (index % 2)
			msr_info->data = vmx->pt_desc.guest.addr_b[index / 2];
		else
			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
		break;
	case MSR_TSC_AUX:
		if (!msr_info->host_initiated &&
		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
			return 1;
		goto find_shared_msr;
	default:
	find_shared_msr:
		msr = find_msr_entry(vmx, msr_info->index);
		if (msr) {
			msr_info->data = msr->data;
			break;
		}
		return kvm_get_msr_common(vcpu, msr_info);
	}
	return 0;
}

static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
					     u64 data)
{
#ifdef CONFIG_X86_64
	if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
		return (u32)data;
#endif
	return (unsigned long)data;
}
/*
 * Writes msr value into the appropriate "register".
 * Returns 0 on success, non-0 otherwise.
 * Assumes vcpu_load() was already called.
*/
static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct shared_msr_entry *msr;
	int ret = 0;
	u32 msr_index = msr_info->index;
	u64 data = msr_info->data;
	u32 index;
	switch (msr_index) {
	case MSR_EFER:
		ret = kvm_set_msr_common(vcpu, msr_info);
		break;
#ifdef CONFIG_X86_64
	case MSR_FS_BASE:
		vmx_segment_cache_clear(vmx);
		vmcs_writel(GUEST_FS_BASE, data);
		break;
	case MSR_GS_BASE:
		vmx_segment_cache_clear(vmx);
		vmcs_writel(GUEST_GS_BASE, data);
		break;
	case MSR_KERNEL_GS_BASE:
		/*
		 * Use a dedicated utility to read/write the guest's
		 * MSR_KERNEL_GS_BASE outside of the full host-state
		 * save/restore flow.  When lazy save/restore of this MSR was
		 * introduced in commit 44ea2b1758d8 ("KVM: VMX: Move
		 * MSR_KERNEL_GS_BASE out of the vmx autoload msr area"), the
		 * MSR was intercepted in all modes and only restored for the
		 * host when the guest is in 64-bit mode, so going through the
		 * full host restore before accessing it was necessary and
		 * cheap.  Interception was later disabled for 64-bit guests
		 * in commit 5897297bc228 ("KVM: VMX: Don't intercept
		 * MSR_KERNEL_GS_BASE") and the MSR unconditionally
		 * saved/restored for the host in commit c8770e7ba63b ("KVM:
		 * VMX: Fix host userspace gsbase corruption"), making a full
		 * host-state load overkill, and completely unnecessary when
		 * the guest is not in 64-bit mode.  Avoiding out-of-band use
		 * of vmx_load_host_state() also reduces potential corner
		 * cases in the save/restore flow.
		 */
		vmx_write_guest_kernel_gs_base(vmx, data);
		break;
#endif
	case MSR_IA32_SYSENTER_CS:
		if (is_guest_mode(vcpu))
			get_vmcs12(vcpu)->guest_sysenter_cs = data;
		vmcs_write32(GUEST_SYSENTER_CS, data);
		break;
	case MSR_IA32_SYSENTER_EIP:
		if (is_guest_mode(vcpu)) {
			data = nested_vmx_truncate_sysenter_addr(vcpu, data);
			get_vmcs12(vcpu)->guest_sysenter_eip = data;
		}
		vmcs_writel(GUEST_SYSENTER_EIP, data);
		break;
	case MSR_IA32_SYSENTER_ESP:
		if (is_guest_mode(vcpu)) {
			data = nested_vmx_truncate_sysenter_addr(vcpu, data);
			get_vmcs12(vcpu)->guest_sysenter_esp = data;
		}
		vmcs_writel(GUEST_SYSENTER_ESP, data);
		break;
	case MSR_IA32_DEBUGCTLMSR:
		if (is_guest_mode(vcpu) && get_vmcs12(vcpu)->vm_exit_controls &
						VM_EXIT_SAVE_DEBUG_CONTROLS)
			get_vmcs12(vcpu)->guest_ia32_debugctl = data;

		ret = kvm_set_msr_common(vcpu, msr_info);
		break;
	case MSR_IA32_BNDCFGS:
		if (!kvm_mpx_supported() ||
		    (!msr_info->host_initiated &&
		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
			return 1;
		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
		    (data & MSR_IA32_BNDCFGS_RSVD))
			return 1;
		vmcs_write64(GUEST_BNDCFGS, data);
		break;
	case MSR_IA32_UMWAIT_CONTROL:
		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
			return 1;

		/* The reserved bit 1 and non-32 bit [63:32] should be zero */
		if (data & (BIT_ULL(1) | GENMASK_ULL(63, 32)))
			return 1;

		vmx->msr_ia32_umwait_control = data;
		break;
	case MSR_IA32_SPEC_CTRL:
		if (!msr_info->host_initiated &&
		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
			return 1;

		if (data & ~kvm_spec_ctrl_valid_bits(vcpu))
			return 1;

		vmx->spec_ctrl = data;
		if (!data)
			break;

		/*
		 * For non-nested:
		 * When it's written (to non-zero) for the first time, pass
		 * it through.
		 *
		 * For nested:
		 * The handling of the MSR bitmap for L2 guests is done in
		 * nested_vmx_prepare_msr_bitmap. We should not touch the
		 * vmcs02.msr_bitmap here since it gets completely overwritten
		 * in the merging. We update the vmcs01 here for L1 as well
		 * since it will end up touching the MSR anyway now.
		 */
		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
					      MSR_IA32_SPEC_CTRL,
					      MSR_TYPE_RW);
		break;
	case MSR_IA32_TSX_CTRL:
		if (!msr_info->host_initiated &&
		    !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
			return 1;
		if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
			return 1;
		goto find_shared_msr;
	case MSR_IA32_PRED_CMD:
		if (!msr_info->host_initiated &&
		    !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
			return 1;

		if (data & ~PRED_CMD_IBPB)
			return 1;
		if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL))
			return 1;
		if (!data)
			break;

		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

		/*
		 * For non-nested:
		 * When it's written (to non-zero) for the first time, pass
		 * it through.
		 *
		 * For nested:
		 * The handling of the MSR bitmap for L2 guests is done in
		 * nested_vmx_prepare_msr_bitmap. We should not touch the
		 * vmcs02.msr_bitmap here since it gets completely overwritten
		 * in the merging.
		 */
		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
					      MSR_TYPE_W);
		break;
	case MSR_IA32_CR_PAT:
		if (!kvm_pat_valid(data))
			return 1;

		if (is_guest_mode(vcpu) &&
		    get_vmcs12(vcpu)->vm_exit_controls & VM_EXIT_SAVE_IA32_PAT)
			get_vmcs12(vcpu)->guest_ia32_pat = data;

		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
			vmcs_write64(GUEST_IA32_PAT, data);
			vcpu->arch.pat = data;
			break;
		}
		ret = kvm_set_msr_common(vcpu, msr_info);
		break;
	case MSR_IA32_TSC_ADJUST:
		ret = kvm_set_msr_common(vcpu, msr_info);
		break;
x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation. Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.
Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.
Opportunistically, fix a few other annoyances with the defines:
- Relocate the bit defines so that they immediately follow the MSR
define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
- Add whitespace around the block of feature control defines to make
it clear they're all related.
- Use BIT() instead of manually encoding the bit shift.
- Use "VMX" instead of "VMXON" to match the SDM.
- Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
be consistent with the kernel's verbiage used for all other feature
control bits. Note, the SDM refers to the LMCE bit as LMCE_ON,
likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore
the (literal) one-off usage of _ON, the SDM is simply "wrong".
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com

	case MSR_IA32_MCG_EXT_CTL:
		if ((!msr_info->host_initiated &&
		     !(to_vmx(vcpu)->msr_ia32_feature_control &
		       FEAT_CTL_LMCE_ENABLED)) ||
		    (data & ~MCG_EXT_CTL_LMCE_EN))
			return 1;
		vcpu->arch.mcg_ext_ctl = data;
		break;
	case MSR_IA32_FEAT_CTL:
		if (!vmx_feature_control_msr_valid(vcpu, data) ||
		    (to_vmx(vcpu)->msr_ia32_feature_control &
		     FEAT_CTL_LOCKED && !msr_info->host_initiated))
			return 1;

		vmx->msr_ia32_feature_control = data;
		if (msr_info->host_initiated && data == 0)
			vmx_leave_nested(vcpu);
		break;
	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
		if (!msr_info->host_initiated)
			return 1; /* they are read-only */
		if (!nested_vmx_allowed(vcpu))
			return 1;
		return vmx_set_vmx_msr(vcpu, msr_index, data);
	case MSR_IA32_RTIT_CTL:
		if (!vmx_pt_mode_is_host_guest() ||
		    vmx_rtit_ctl_check(vcpu, data) ||
		    vmx->nested.vmxon)
			return 1;
		vmcs_write64(GUEST_IA32_RTIT_CTL, data);
		vmx->pt_desc.guest.ctl = data;
		pt_update_intercept_for_msr(vmx);
		break;
	case MSR_IA32_RTIT_STATUS:
		if (!pt_can_write_msr(vmx))
			return 1;
		if (data & MSR_IA32_RTIT_STATUS_MASK)
			return 1;
		vmx->pt_desc.guest.status = data;
		break;
	case MSR_IA32_RTIT_CR3_MATCH:
		if (!pt_can_write_msr(vmx))
			return 1;
		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
					   PT_CAP_cr3_filtering))
			return 1;
		vmx->pt_desc.guest.cr3_match = data;
		break;
	case MSR_IA32_RTIT_OUTPUT_BASE:
		if (!pt_can_write_msr(vmx))
			return 1;
		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
					   PT_CAP_topa_output) &&
		    !intel_pt_validate_cap(vmx->pt_desc.caps,
					   PT_CAP_single_range_output))
			return 1;
		if (data & MSR_IA32_RTIT_OUTPUT_BASE_MASK)
			return 1;
		vmx->pt_desc.guest.output_base = data;
		break;
	case MSR_IA32_RTIT_OUTPUT_MASK:
		if (!pt_can_write_msr(vmx))
			return 1;
		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
					   PT_CAP_topa_output) &&
		    !intel_pt_validate_cap(vmx->pt_desc.caps,
					   PT_CAP_single_range_output))
			return 1;
		vmx->pt_desc.guest.output_mask = data;
		break;
	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
		if (!pt_can_write_msr(vmx))
			return 1;
		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
		if (index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
						       PT_CAP_num_address_ranges))
			return 1;
		if (is_noncanonical_address(data, vcpu))
			return 1;
		if (index % 2)
			vmx->pt_desc.guest.addr_b[index / 2] = data;
		else
			vmx->pt_desc.guest.addr_a[index / 2] = data;
		break;
	case MSR_TSC_AUX:
		if (!msr_info->host_initiated &&
		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
			return 1;
		/* Check reserved bit, higher 32 bits should be zero */
		if ((data >> 32) != 0)
			return 1;
		goto find_shared_msr;
	default:
	find_shared_msr:
		msr = find_msr_entry(vmx, msr_index);
		if (msr)
			ret = vmx_set_guest_msr(vmx, msr, data);
		else
			ret = kvm_set_msr_common(vcpu, msr_info);
	}

	return ret;
}
static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
{
	unsigned long guest_owned_bits;

	kvm_register_mark_available(vcpu, reg);

	switch (reg) {
	case VCPU_REGS_RSP:
		vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP);
		break;
	case VCPU_REGS_RIP:
		vcpu->arch.regs[VCPU_REGS_RIP] = vmcs_readl(GUEST_RIP);
		break;
	case VCPU_EXREG_PDPTR:
		if (enable_ept)
			ept_save_pdptrs(vcpu);
		break;
	case VCPU_EXREG_CR0:
		guest_owned_bits = vcpu->arch.cr0_guest_owned_bits;

		vcpu->arch.cr0 &= ~guest_owned_bits;
		vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & guest_owned_bits;
		break;
	case VCPU_EXREG_CR3:
		if (enable_unrestricted_guest || (enable_ept && is_paging(vcpu)))
			vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
		break;
	case VCPU_EXREG_CR4:
		guest_owned_bits = vcpu->arch.cr4_guest_owned_bits;

		vcpu->arch.cr4 &= ~guest_owned_bits;
		vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits;
		break;
	default:
		WARN_ON_ONCE(1);
		break;
	}
}
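The CR0/CR4 cases above merge the cached register value with the live VMCS value: only the bits the guest owns are refreshed from hardware, while host-owned bits keep their cached value. A minimal userspace sketch of that masking (the helper name is ours; the kernel open-codes the two lines inside the switch):

```c
/* Illustrative helper: merge a cached control-register value with the
 * hardware value, taking guest-owned bits from hardware and keeping
 * host-owned bits from the cache.
 */
static unsigned long merge_guest_owned(unsigned long cached,
				       unsigned long hw,
				       unsigned long guest_owned_bits)
{
	cached &= ~guest_owned_bits;		/* drop stale guest-owned bits */
	cached |= hw & guest_owned_bits;	/* refresh them from hardware */
	return cached;
}
```

Host-owned bits never change here, so the cached value stays authoritative for them even when the hardware copy diverges.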
static __init int cpu_has_kvm_support(void)
{
	return cpu_has_vmx();
}
static __init int vmx_disabled_by_bios(void)
{
	return !boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
	       !boot_cpu_has(X86_FEATURE_VMX);
}
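vmx_disabled_by_bios() consults synthetic CPU feature flags that the kernel derives at boot from the IA32_FEAT_CTL MSR (0x3a). A rough sketch of the underlying decision, with the bit layout taken from the Intel SDM (the macro and function names here are ours, not the kernel's):

```c
#include <stdbool.h>

/* IA32_FEAT_CTL bit layout per the Intel SDM; names are illustrative. */
#define FEAT_CTL_LOCKED          (1ULL << 0)
#define FEAT_CTL_VMX_IN_SMX      (1ULL << 1)
#define FEAT_CTL_VMX_OUTSIDE_SMX (1ULL << 2)

/* VMX is usable as-is only if the MSR is locked with the
 * outside-SMX enable bit set. Locked without the enable bit means
 * the BIOS disabled VMX; unlocked means it has not been configured
 * yet (the kernel can then program and lock it itself).
 */
static bool vmx_usable(unsigned long long feat_ctl)
{
	return (feat_ctl & FEAT_CTL_LOCKED) &&
	       (feat_ctl & FEAT_CTL_VMX_OUTSIDE_SMX);
}
```

In the kernel this MSR read and decode happens once in early CPU setup, and the result is cached in the X86_FEATURE_MSR_IA32_FEAT_CTL and X86_FEATURE_VMX flags checked above.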
static int kvm_cpu_vmxon(u64 vmxon_pointer)
{
	u64 msr;

	cr4_set_bits(X86_CR4_VMXE);
	intel_pt_handle_vmx(1);

	asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
			  _ASM_EXTABLE(1b, %l[fault])
			  : : [vmxon_pointer] "m"(vmxon_pointer)
			  : : fault);
	return 0;

fault:
	WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
		  rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
	intel_pt_handle_vmx(0);
	cr4_clear_bits(X86_CR4_VMXE);
	return -EFAULT;
}
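kvm_cpu_vmxon() follows a common kernel idiom: perform the preparatory side effects (set CR4.VMXE, notify Intel PT), attempt the faultable operation, and on failure jump to a label that unwinds those side effects in reverse order. A purely illustrative userspace model of that control flow, with flags standing in for the real CPU state:

```c
#include <stdbool.h>

/* Stand-ins for CR4.VMXE and the Intel PT VMX notification state. */
static int cr4_vmxe, pt_vmx;

/* Model of the set-up / attempt / unwind-in-reverse pattern used by
 * kvm_cpu_vmxon(); 'will_fault' stands in for VMXON faulting.
 */
static int vmxon_like(bool will_fault)
{
	cr4_vmxe = 1;		/* cr4_set_bits(X86_CR4_VMXE) */
	pt_vmx = 1;		/* intel_pt_handle_vmx(1) */

	if (will_fault)
		goto fault;	/* the VMXON instruction faulted */
	return 0;

fault:
	pt_vmx = 0;		/* unwind in reverse order */
	cr4_vmxe = 0;
	return -14;		/* -EFAULT */
}
```

The unwind order mirrors the set-up order reversed, so a partially completed set-up is never left visible to the rest of the system.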
static int hardware_enable(void)
{
	int cpu = raw_smp_processor_id();
	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
	int r;
	if (cr4_read_shadow() & X86_CR4_VMXE)
		return -EBUSY;

	/*
	 * This can happen if we hot-added a CPU but failed to allocate
	 * VP assist page for it.
	 */
	if (static_branch_unlikely(&enable_evmcs) &&
	    !hv_get_vp_assist_page(cpu))
		return -EFAULT;

	r = kvm_cpu_vmxon(phys_addr);
	if (r)
		return r;

	if (enable_ept)
		ept_sync_global();

	return 0;
}
KVM: VMX: Keep list of loaded VMCSs, instead of vcpus
In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.
The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may have been last loaded on a different cpu.
So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-24 06:26:10 -06:00
static void vmclear_local_loaded_vmcss(void)
{
	int cpu = raw_smp_processor_id();
	struct loaded_vmcs *v, *n;
	list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
				 loaded_vmcss_on_cpu_link)
		__loaded_vmcs_clear(v);
}
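vmclear_local_loaded_vmcss() must use list_for_each_entry_safe() because __loaded_vmcs_clear() unlinks the entry being visited; the safe variant caches the successor pointer before the loop body runs. A minimal userspace analogue of that pattern on a singly linked list (all names here are illustrative):

```c
#include <stddef.h>

struct node {
	struct node *next;
	int cleared;
};

/* Clear every node on the list; each node is unlinked as it is
 * visited, so the successor must be cached first, exactly as
 * list_for_each_entry_safe() does.
 */
static void clear_all(struct node **head)
{
	struct node *v = *head, *n;

	while (v) {
		n = v->next;		/* cache successor before unlink */
		v->cleared = 1;
		v->next = NULL;		/* unlink, like __loaded_vmcs_clear() */
		v = n;
	}
	*head = NULL;
}
```

Iterating with a plain next-pointer walk would dereference the node after it has been unlinked, which is the bug the "safe" variant exists to prevent.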
/*
 * Just like cpu_vmxoff(), but with the __kvm_handle_fault_on_reboot()
 * tricks.
 */
static void kvm_cpu_vmxoff(void)
{
KVM/x86: Use assembly instruction mnemonics instead of .byte streams
Recently the minimum required version of binutils was changed to 2.20,
which supports all VMX instruction mnemonics. The patch removes
all .byte #defines and uses real instruction mnemonics instead.
The compiler is now able to pass memory operand to the instruction,
so there is no need for memory clobber anymore. Also, the compiler
adds CC register clobber automatically to all extended asm clauses,
so the patch also removes explicit CC clobber.
The immediate benefit of the patch is removal of many unnecessary
register moves, resulting in 1434 saved bytes in vmx.o:
text data bss dec hex filename
151257 18246 8500 178003 2b753 vmx.o
152691 18246 8500 179437 2bced vmx-old.o
Some examples of improvement include removal of unneeded moves
of %rsp to %rax in front of invept and invvpid instructions:
a57e: b9 01 00 00 00 mov $0x1,%ecx
a583: 48 89 04 24 mov %rax,(%rsp)
a587: 48 89 e0 mov %rsp,%rax
a58a: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
a591: 00 00
a593: 66 0f 38 80 08 invept (%rax),%rcx
to:
a45c: 48 89 04 24 mov %rax,(%rsp)
a460: b8 01 00 00 00 mov $0x1,%eax
a465: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
a46c: 00 00
a46e: 66 0f 38 80 04 24 invept (%rsp),%rax
and the ability to use more optimal registers and memory operands
in the instruction:
8faa: 48 8b 44 24 28 mov 0x28(%rsp),%rax
8faf: 4c 89 c2 mov %r8,%rdx
8fb2: 0f 79 d0 vmwrite %rax,%rdx
to:
8e7c: 44 0f 79 44 24 28 vmwrite 0x28(%rsp),%r8
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-11 11:40:44 -06:00
	asm volatile (__ex("vmxoff"));

	intel_pt_handle_vmx(0);
	cr4_clear_bits(X86_CR4_VMXE);
}

static void hardware_disable(void)
{
	vmclear_local_loaded_vmcss();
	kvm_cpu_vmxoff();
}

/*
 * There is no X86_FEATURE for SGX yet, but anyway we need to query CPUID
 * directly instead of going through cpu_has(), to ensure KVM is trapping
 * ENCLS whenever it's supported in hardware.  It does not matter whether
 * the host OS supports or has enabled SGX.
 */
static bool cpu_has_sgx(void)
{
	return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
}
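The same "check the maximum basic leaf before querying leaf 0x12" pattern can be reproduced from userspace with GCC's cpuid.h; has_sgx_leaf() below is a hypothetical helper (an x86 build with GCC/Clang is assumed), not KVM code:

```c
#include <cpuid.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analogue of cpu_has_sgx(): __get_cpuid(0, ...)
 * reports the highest supported basic CPUID leaf in eax; only if it reaches
 * 0x12 is the SGX capability leaf itself queried.
 */
static bool has_sgx_leaf(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx) || eax < 0x12)
		return false;
	__cpuid_count(0x12, 0, eax, ebx, ecx, edx);
	return eax & 1;	/* CPUID.0x12.0:EAX bit 0: SGX1 functions supported */
}
```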

static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
				      u32 msr, u32 *result)
{
	u32 vmx_msr_low, vmx_msr_high;
	u32 ctl = ctl_min | ctl_opt;

	rdmsr(msr, vmx_msr_low, vmx_msr_high);

	ctl &= vmx_msr_high; /* bit == 0 in high word ==> must be zero */
	ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */

	/* Ensure minimum (required) set of control bits are supported. */
	if (ctl_min & ~ctl)
		return -EIO;

	*result = ctl;
	return 0;
}
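The and/or dance in adjust_vmx_controls() can be checked in isolation. adjust_controls() below is a hypothetical userspace model of the same logic, with the capability MSR's low and high dwords passed in directly instead of read via rdmsr():

```c
#include <stdint.h>

/*
 * Hypothetical standalone model of adjust_vmx_controls(): the capability
 * MSR's high dword has a 0 bit for every control that is not allowed to be
 * 1, and the low dword has a 1 bit for every control that must be 1.
 */
static int adjust_controls(uint32_t ctl_min, uint32_t ctl_opt,
			   uint32_t msr_low, uint32_t msr_high,
			   uint32_t *result)
{
	uint32_t ctl = ctl_min | ctl_opt;

	ctl &= msr_high; /* bit == 0 in high word ==> must be zero */
	ctl |= msr_low;  /* bit == 1 in low word  ==> must be one  */

	if (ctl_min & ~ctl)	/* a required bit could not be kept set */
		return -1;

	*result = ctl;
	return 0;
}
```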

static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
				    struct vmx_capability *vmx_cap)
{
	u32 vmx_msr_low, vmx_msr_high;
	u32 min, opt, min2, opt2;
	u32 _pin_based_exec_control = 0;
	u32 _cpu_based_exec_control = 0;
	u32 _cpu_based_2nd_exec_control = 0;
	u32 _vmexit_control = 0;
	u32 _vmentry_control = 0;

	memset(vmcs_conf, 0, sizeof(*vmcs_conf));
	min = CPU_BASED_HLT_EXITING |
#ifdef CONFIG_X86_64
	      CPU_BASED_CR8_LOAD_EXITING |
	      CPU_BASED_CR8_STORE_EXITING |
#endif
	      CPU_BASED_CR3_LOAD_EXITING |
	      CPU_BASED_CR3_STORE_EXITING |
	      CPU_BASED_UNCOND_IO_EXITING |
	      CPU_BASED_MOV_DR_EXITING |
	      CPU_BASED_USE_TSC_OFFSETTING |
	      CPU_BASED_MWAIT_EXITING |
	      CPU_BASED_MONITOR_EXITING |
	      CPU_BASED_INVLPG_EXITING |
	      CPU_BASED_RDPMC_EXITING;

	opt = CPU_BASED_TPR_SHADOW |
	      CPU_BASED_USE_MSR_BITMAPS |
	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
				&_cpu_based_exec_control) < 0)
		return -EIO;
2007-09-12 04:03:11 -06:00
# ifdef CONFIG_X86_64
if ( ( _cpu_based_exec_control & CPU_BASED_TPR_SHADOW ) )
_cpu_based_exec_control & = ~ CPU_BASED_CR8_LOAD_EXITING &
~ CPU_BASED_CR8_STORE_EXITING ;
# endif
	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) {
		min2 = 0;
		opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
			SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
			SECONDARY_EXEC_WBINVD_EXITING |
			SECONDARY_EXEC_ENABLE_VPID |
			SECONDARY_EXEC_ENABLE_EPT |
			SECONDARY_EXEC_UNRESTRICTED_GUEST |
			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
			SECONDARY_EXEC_DESC |
			SECONDARY_EXEC_RDTSCP |
			SECONDARY_EXEC_ENABLE_INVPCID |
			SECONDARY_EXEC_APIC_REGISTER_VIRT |
			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
			SECONDARY_EXEC_SHADOW_VMCS |
			SECONDARY_EXEC_XSAVES |
			SECONDARY_EXEC_RDSEED_EXITING |
			SECONDARY_EXEC_RDRAND_EXITING |
			SECONDARY_EXEC_ENABLE_PML |
			SECONDARY_EXEC_TSC_SCALING |
KVM: x86: Add support for user wait instructions
UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.
This patch adds support for user wait instructions in KVM. Availability
of the user wait instructions is indicated by the presence of the CPUID
feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may
be executed at any privilege level, and use 32bit IA32_UMWAIT_CONTROL MSR
to set the maximum time.
The behavior of user wait instructions in VMX non-root operation is
determined first by the setting of the "enable user wait and pause"
secondary processor-based VM-execution control bit 26.
If the VM-execution control is 0, UMONITOR/UMWAIT/TPAUSE cause
an invalid-opcode exception (#UD).
If the VM-execution control is 1, treatment is based on the
setting of the "RDTSC exiting" VM-execution control. Because KVM never
enables RDTSC exiting, if the instruction causes a delay, the amount of
time delayed is called here the physical delay. The physical delay is
first computed by determining the virtual delay. If
IA32_UMWAIT_CONTROL[31:2] is zero, the virtual delay is the value in
EDX:EAX minus the value that RDTSC would return; if
IA32_UMWAIT_CONTROL[31:2] is not zero, the virtual delay is the minimum
of that difference and AND(IA32_UMWAIT_CONTROL,FFFFFFFCH).
Because umwait and tpause can put a (physical) CPU into a power-saving
state, by default we don't expose it to kvm and enable it only when
guest CPUID has it.
Detailed information about user wait instructions can be found in the
latest Intel 64 and IA-32 Architectures Software Developer's Manual.
Co-developed-by: Jingqi Liu <jingqi.liu@intel.com>
Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
			SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
			SECONDARY_EXEC_PT_USE_GPA |
			SECONDARY_EXEC_PT_CONCEAL_VMX |
			SECONDARY_EXEC_ENABLE_VMFUNC;
		if (cpu_has_sgx())
			opt2 |= SECONDARY_EXEC_ENCLS_EXITING;
		if (adjust_vmx_controls(min2, opt2,
					MSR_IA32_VMX_PROCBASED_CTLS2,
					&_cpu_based_2nd_exec_control) < 0)
			return -EIO;
	}
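The virtual-delay computation described in the user-wait commit message above can be modeled in plain C. umwait_virtual_delay() is a hypothetical standalone function, not KVM code; deadline stands for the EDX:EAX operand and tsc_now for the value RDTSC would return:

```c
#include <stdint.h>

/*
 * Hypothetical model of the virtual delay for UMWAIT/TPAUSE: the requested
 * delay is the TSC deadline minus the current TSC, capped by bits 31:2 of
 * IA32_UMWAIT_CONTROL when that field is non-zero.
 */
static uint64_t umwait_virtual_delay(uint64_t deadline, uint64_t tsc_now,
				     uint32_t umwait_control)
{
	uint64_t delay = deadline > tsc_now ? deadline - tsc_now : 0;
	uint32_t max = umwait_control & 0xfffffffcu;	/* bits 31:2 */

	if (max != 0 && max < delay)
		delay = max;
	return delay;
}
```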
#ifndef CONFIG_X86_64
	if (!(_cpu_based_2nd_exec_control &
				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
		_cpu_based_exec_control &= ~CPU_BASED_TPR_SHADOW;
#endif

	if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
		_cpu_based_2nd_exec_control &= ~(
				SECONDARY_EXEC_APIC_REGISTER_VIRT |
				SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
				SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
	rdmsr_safe(MSR_IA32_VMX_EPT_VPID_CAP,
		&vmx_cap->ept, &vmx_cap->vpid);

	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
		/* CR3 accesses and invlpg don't need to cause VM Exits when EPT
		   enabled */
		_cpu_based_exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
					     CPU_BASED_CR3_STORE_EXITING |
					     CPU_BASED_INVLPG_EXITING);
	} else if (vmx_cap->ept) {
		vmx_cap->ept = 0;
		pr_warn_once("EPT CAP should not exist if not support "
				"1-setting enable EPT VM-execution control\n");
	}
	if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_VPID) &&
		vmx_cap->vpid) {
		vmx_cap->vpid = 0;
		pr_warn_once("VPID CAP should not exist if not support "
				"1-setting enable VPID VM-execution control\n");
	}

	min = VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_ACK_INTR_ON_EXIT;
#ifdef CONFIG_X86_64
	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
#endif
	opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
	      VM_EXIT_LOAD_IA32_PAT |
	      VM_EXIT_LOAD_IA32_EFER |
	      VM_EXIT_CLEAR_BNDCFGS |
	      VM_EXIT_PT_CONCEAL_PIP |
	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
				&_vmexit_control) < 0)
		return -EIO;

	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
	opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR |
	      PIN_BASED_VMX_PREEMPTION_TIMER;
	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
				&_pin_based_exec_control) < 0)
		return -EIO;

	if (cpu_has_broken_vmx_preemption_timer())
		_pin_based_exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
	if (!(_cpu_based_2nd_exec_control &
		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
		_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;

	min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
	opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
	      VM_ENTRY_LOAD_IA32_PAT |
	      VM_ENTRY_LOAD_IA32_EFER |
	      VM_ENTRY_LOAD_BNDCFGS |
	      VM_ENTRY_PT_CONCEAL_PIP |
	      VM_ENTRY_LOAD_IA32_RTIT_CTL;
	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
				&_vmentry_control) < 0)
		return -EIO;

	/*
	 * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
	 * can't be used due to an errata where VM Exit may incorrectly clear
	 * IA32_PERF_GLOBAL_CTRL[34:32].  Workaround the errata by using the
	 * MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL.
	 */
	if (boot_cpu_data.x86 == 0x6) {
		switch (boot_cpu_data.x86_model) {
		case 26: /* AAK155 */
		case 30: /* AAP115 */
		case 37: /* AAT100 */
		case 44: /* BC86,AAY89,BD102 */
		case 46: /* BA97 */
			_vmentry_control &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
			_vmexit_control &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
			pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
					"does not work properly. Using workaround\n");
			break;
		default:
			break;
		}
	}
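The errata list above reduces to a simple family/model predicate. needs_perf_global_ctrl_workaround() is a hypothetical standalone version of that check, taking the CPUID family and model as arguments instead of reading boot_cpu_data:

```c
/*
 * Hypothetical mirror of the errata check above: family 6 models 26, 30,
 * 37, 44 and 46 (errata AAK155, AAP115, AAT100, BC86/AAY89/BD102, BA97)
 * must avoid the VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL controls.
 */
static int needs_perf_global_ctrl_workaround(unsigned int family,
					     unsigned int model)
{
	if (family != 0x6)
		return 0;
	switch (model) {
	case 26: case 30: case 37: case 44: case 46:
		return 1;
	default:
		return 0;
	}
}
```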

	rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);

	/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
	if ((vmx_msr_high & 0x1fff) > PAGE_SIZE)
		return -EIO;

#ifdef CONFIG_X86_64
	/* IA-32 SDM Vol 3B: 64-bit CPUs always have VMX_BASIC_MSR[48]==0. */
	if (vmx_msr_high & (1u << 16))
		return -EIO;
#endif

	/* Require Write-Back (WB) memory type for VMCS accesses. */
	if (((vmx_msr_high >> 18) & 15) != 6)
		return -EIO;

	vmcs_conf->size = vmx_msr_high & 0x1fff;
	vmcs_conf->order = get_order(vmcs_conf->size);
	vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;

	vmcs_conf->revision_id = vmx_msr_low;

	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
	vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
	vmcs_conf->vmexit_ctrl = _vmexit_control;
	vmcs_conf->vmentry_ctrl = _vmentry_control;

	if (static_branch_unlikely(&enable_evmcs))
		evmcs_sanitize_exec_ctrls(vmcs_conf);

	return 0;
}
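The MSR_IA32_VMX_BASIC checks above all decode fields of the MSR's high dword. decode_vmx_basic_high() is a hypothetical standalone model of that decode, with the field layout taken from the checks themselves (SDM Vol 3B):

```c
#include <stdint.h>

/*
 * Hypothetical decode of the IA32_VMX_BASIC high dword: bits 0-12 of the
 * high dword give the VMCS region size, bit 16 must be 0 on 64-bit CPUs
 * (MSR bit 48), and bits 18-21 give the required memory type (6 == WB).
 */
struct vmx_basic_high {
	uint32_t vmcs_size;	/* bits 0-12 */
	uint32_t mem_type;	/* bits 18-21; 6 means Write-Back */
	int is_64bit_clean;	/* bit 16 clear, as 64-bit CPUs require */
};

static struct vmx_basic_high decode_vmx_basic_high(uint32_t high)
{
	struct vmx_basic_high d;

	d.vmcs_size = high & 0x1fff;
	d.mem_type = (high >> 18) & 15;
	d.is_64bit_clean = !(high & (1u << 16));
	return d;
}
```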

struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags)
{
	int node = cpu_to_node(cpu);
	struct page *pages;
	struct vmcs *vmcs;

	pages = __alloc_pages_node(node, flags, vmcs_config.order);
	if (!pages)
		return NULL;
	vmcs = page_address(pages);
	memset(vmcs, 0, vmcs_config.size);

	/* KVM supports Enlightened VMCS v1 only */
	if (static_branch_unlikely(&enable_evmcs))
		vmcs->hdr.revision_id = KVM_EVMCS_VERSION;
	else
		vmcs->hdr.revision_id = vmcs_config.revision_id;

	if (shadow)
		vmcs->hdr.shadow_vmcs = 1;
2006-12-10 03:21:36 -07:00
	return vmcs;
}
2018-12-03 14:53:07 -07:00
void free_vmcs(struct vmcs *vmcs)
2006-12-10 03:21:36 -07:00
{
2007-07-29 02:07:42 -06:00
	free_pages((unsigned long)vmcs, vmcs_config.order);
2006-12-10 03:21:36 -07:00
}
KVM: VMX: Keep list of loaded VMCSs, instead of vcpus
In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.
The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may have been last loaded on a different cpu.
So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-24 06:26:10 -06:00
/*
 * Free a VMCS, but before that VMCLEAR it on the CPU where it was last loaded.
 */
2018-12-03 14:53:07 -07:00
void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
2011-05-24 06:26:10 -06:00
{
	if (!loaded_vmcs->vmcs)
		return;
	loaded_vmcs_clear(loaded_vmcs);
	free_vmcs(loaded_vmcs->vmcs);
	loaded_vmcs->vmcs = NULL;
2018-01-16 08:51:18 -07:00
	if (loaded_vmcs->msr_bitmap)
		free_page((unsigned long)loaded_vmcs->msr_bitmap);
2016-10-28 09:29:39 -06:00
	WARN_ON(loaded_vmcs->shadow_vmcs != NULL);
2011-05-24 06:26:10 -06:00
}
2018-12-03 14:53:07 -07:00
int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
2018-01-11 04:16:15 -07:00
{
2018-06-22 17:35:12 -06:00
	loaded_vmcs->vmcs = alloc_vmcs(false);
2018-01-11 04:16:15 -07:00
	if (!loaded_vmcs->vmcs)
		return -ENOMEM;
2020-03-21 13:37:50 -06:00
	vmcs_clear(loaded_vmcs->vmcs);
2018-01-11 04:16:15 -07:00
	loaded_vmcs->shadow_vmcs = NULL;
KVM: VMX: Leave preemption timer running when it's disabled
VMWRITEs to the major VMCS controls, pin controls included, are
deceptively expensive. CPUs with VMCS caching (Westmere and later) also
optimize away consistency checks on VM-Entry, i.e. skip consistency
checks if the relevant fields have not changed since the last successful
VM-Entry (of the cached VMCS). Because uops are a precious commodity,
uCode's dirty VMCS field tracking isn't as precise as software would
prefer. Notably, writing any of the major VMCS fields effectively marks
the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
consistency checks, which consumes several hundred cycles.
As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
doubles the latency of the next VM-Entry (and again when/if the flag is
toggled back). In a non-nested scenario, running a "standard" guest
with the preemption timer enabled, toggling the timer flag is uncommon
but not rare, e.g. roughly 1 in 10 entries. Disabling the preemption
timer can change these numbers due to its use for "immediate exits",
even when explicitly disabled by userspace.
Nested virtualization in particular is painful, as the timer flag is set
for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
pin controls to *clear* the flag, since the timer's final state isn't
known until vmx_vcpu_run(). I.e. the majority of nested VM-Enters end
up unnecessarily writing pin controls *twice*.
Rather than toggle the timer flag in pin controls, set the timer value
itself to the largest allowed value to put it into a "soft disabled"
state, and ignore any spurious preemption timer exits.
Sadly, the timer is a 32-bit value and so theoretically it can fire
before the heat death of the universe, i.e. spurious exits are possible.
But because KVM does *not* save the timer value on VM-Exit and because
the timer runs at a slower rate than the TSC, the maximum timer value
is still sufficiently large for KVM's purposes. E.g. on a modern CPU
with a timer that runs at 1/32 the frequency of a 2.4 GHz constant-rate
TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
execution. In other words, spurious VM-Exits are effectively only
possible if the host is completely tickless on the logical CPU, the
guest is not using the preemption timer, and the guest is not generating
VM-Exits for any other reason.
To be safe from bad/weird hardware, disable the preemption timer if its
maximum delay is less than ten seconds. Ten seconds is mostly arbitrary
and was selected in no small part because it's a nice round number.
For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
if the preemption timer is disabled by KVM or userspace. Previously
KVM continued to use the preemption timer to force immediate exits even
when the timer was disabled by userspace. Now that KVM leaves the timer
running instead of truly disabling it, allow userspace to kill it
entirely in the unlikely event the timer (or KVM) malfunctions.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-07 13:18:05 -06:00
	loaded_vmcs->hv_timer_soft_disabled = false;
2020-03-21 13:37:50 -06:00
	loaded_vmcs->cpu = -1;
	loaded_vmcs->launched = 0;
2018-01-16 08:51:18 -07:00
	if (cpu_has_vmx_msr_bitmap()) {
2019-02-11 12:02:52 -07:00
		loaded_vmcs->msr_bitmap = (unsigned long *)
			__get_free_page(GFP_KERNEL_ACCOUNT);
2018-01-16 08:51:18 -07:00
		if (!loaded_vmcs->msr_bitmap)
			goto out_vmcs;
		memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE);
2018-04-16 04:50:33 -06:00
2018-05-25 09:36:17 -06:00
		if (IS_ENABLED(CONFIG_HYPERV) &&
		    static_branch_unlikely(&enable_evmcs) &&
2018-04-16 04:50:33 -06:00
		    (ms_hyperv.nested_features & HV_X64_NESTED_MSR_BITMAP)) {
			struct hv_enlightened_vmcs *evmcs =
				(struct hv_enlightened_vmcs *)loaded_vmcs->vmcs;
			evmcs->hv_enlightenments_control.msr_bitmap = 1;
		}
2018-01-16 08:51:18 -07:00
	}
2018-07-23 13:32:47 -06:00
	memset(&loaded_vmcs->host_state, 0, sizeof(struct vmcs_host_state));
2019-05-07 13:18:00 -06:00
	memset(&loaded_vmcs->controls_shadow, 0,
	       sizeof(struct vmcs_controls_shadow));
2018-07-23 13:32:47 -06:00
2018-01-11 04:16:15 -07:00
	return 0;
2018-01-16 08:51:18 -07:00
out_vmcs:
	free_loaded_vmcs(loaded_vmcs);
	return -ENOMEM;
2018-01-11 04:16:15 -07:00
}
2007-06-01 01:47:13 -06:00
static void free_kvm_area(void)
2006-12-10 03:21:36 -07:00
{
	int cpu;
2009-09-29 15:38:37 -06:00
	for_each_possible_cpu(cpu) {
2006-12-10 03:21:36 -07:00
		free_vmcs(per_cpu(vmxarea, cpu));
2009-09-29 15:38:37 -06:00
		per_cpu(vmxarea, cpu) = NULL;
	}
2006-12-10 03:21:36 -07:00
}

static __init int alloc_kvm_area(void)
{
	int cpu;
2009-09-29 15:38:37 -06:00
	for_each_possible_cpu(cpu) {
2006-12-10 03:21:36 -07:00
		struct vmcs *vmcs;

		vmcs = alloc_vmcs_cpu(false, cpu, GFP_KERNEL);
		if (!vmcs) {
			free_kvm_area();
			return -ENOMEM;
		}
		/*
		 * When eVMCS is enabled, alloc_vmcs_cpu() sets
		 * vmcs->revision_id to KVM_EVMCS_VERSION instead of
		 * revision_id reported by MSR_IA32_VMX_BASIC.
		 *
		 * However, even though not explicitly documented by
		 * TLFS, VMXArea passed as VMXON argument should
		 * still be marked with revision_id reported by
		 * physical CPU.
		 */
		if (static_branch_unlikely(&enable_evmcs))
			vmcs->hdr.revision_id = vmcs_config.revision_id;
		per_cpu(vmxarea, cpu) = vmcs;
	}
	return 0;
}

static void fix_pmode_seg(struct kvm_vcpu *vcpu, int seg,
			  struct kvm_segment *save)
{
	if (!emulate_invalid_guest_state) {
		/*
		 * CS and SS RPL should be equal during guest entry according
		 * to VMX spec, but in reality it is not always so. Since vcpu
		 * is in the middle of the transition from real mode to
		 * protected mode it is safe to assume that RPL 0 is a good
		 * default value.
		 */
		if (seg == VCPU_SREG_CS || seg == VCPU_SREG_SS)
			save->selector &= ~SEGMENT_RPL_MASK;
		save->dpl = save->selector & SEGMENT_RPL_MASK;
		save->s = 1;
	}
	vmx_set_segment(vcpu, save, seg);
}
static void enter_pmode(struct kvm_vcpu *vcpu)
{
	unsigned long flags;
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	/*
	 * Update real mode segment cache. It may be out of date if a segment
	 * register was written while the vcpu was in guest mode.
	 */
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_ES], VCPU_SREG_ES);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_DS], VCPU_SREG_DS);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_FS], VCPU_SREG_FS);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_GS], VCPU_SREG_GS);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_SS], VCPU_SREG_SS);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_CS], VCPU_SREG_CS);

	vmx->rmode.vm86_active = 0;
	vmx_set_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_TR], VCPU_SREG_TR);
	flags = vmcs_readl(GUEST_RFLAGS);
	flags &= RMODE_GUEST_OWNED_EFLAGS_BITS;
	flags |= vmx->rmode.save_rflags & ~RMODE_GUEST_OWNED_EFLAGS_BITS;
	vmcs_writel(GUEST_RFLAGS, flags);

	vmcs_writel(GUEST_CR4, (vmcs_readl(GUEST_CR4) & ~X86_CR4_VME) |
			(vmcs_readl(CR4_READ_SHADOW) & X86_CR4_VME));
	update_exception_bitmap(vcpu);

	fix_pmode_seg(vcpu, VCPU_SREG_CS, &vmx->rmode.segs[VCPU_SREG_CS]);
	fix_pmode_seg(vcpu, VCPU_SREG_SS, &vmx->rmode.segs[VCPU_SREG_SS]);
	fix_pmode_seg(vcpu, VCPU_SREG_ES, &vmx->rmode.segs[VCPU_SREG_ES]);
	fix_pmode_seg(vcpu, VCPU_SREG_DS, &vmx->rmode.segs[VCPU_SREG_DS]);
	fix_pmode_seg(vcpu, VCPU_SREG_FS, &vmx->rmode.segs[VCPU_SREG_FS]);
	fix_pmode_seg(vcpu, VCPU_SREG_GS, &vmx->rmode.segs[VCPU_SREG_GS]);
}

/* Massage a saved segment into a form that is valid for vm86-style real mode. */
static void fix_rmode_seg(int seg, struct kvm_segment *save)
{
	const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
	struct kvm_segment var = *save;

	var.dpl = 0x3;
	if (seg == VCPU_SREG_CS)
		var.type = 0x3;

	if (!emulate_invalid_guest_state) {
		var.selector = var.base >> 4;
		var.base = var.base & 0xffff0;
		var.limit = 0xffff;
		var.g = 0;
		var.db = 0;
		var.present = 1;
		var.s = 1;
		var.l = 0;
		var.unusable = 0;
		var.type = 0x3;
		var.avl = 0;
		if (save->base & 0xf)
			printk_once(KERN_WARNING "kvm: segment base is not "
					"paragraph aligned when entering "
					"protected mode (seg=%d)", seg);
	}
	vmcs_write16(sf->selector, var.selector);
	vmcs_writel(sf->base, var.base);
	vmcs_write32(sf->limit, var.limit);
	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(&var));
}

static void enter_rmode(struct kvm_vcpu *vcpu)
{
	unsigned long flags;
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
	/* Stash the protected-mode segment state for restoration by enter_pmode(). */
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_TR], VCPU_SREG_TR);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_ES], VCPU_SREG_ES);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_DS], VCPU_SREG_DS);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_FS], VCPU_SREG_FS);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_GS], VCPU_SREG_GS);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_SS], VCPU_SREG_SS);
	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_CS], VCPU_SREG_CS);

	vmx->rmode.vm86_active = 1;
	/*
	 * Very old userspace does not call KVM_SET_TSS_ADDR before entering
	 * vcpu. Warn the user that an update is overdue.
	 */
	if (!kvm_vmx->tss_addr)
		printk_once(KERN_WARNING "kvm: KVM_SET_TSS_ADDR need to be "
			     "called before entering vcpu\n");

	vmx_segment_cache_clear(vmx);

	vmcs_writel(GUEST_TR_BASE, kvm_vmx->tss_addr);
	vmcs_write32(GUEST_TR_LIMIT, RMODE_TSS_SIZE - 1);
	vmcs_write32(GUEST_TR_AR_BYTES, 0x008b);

	flags = vmcs_readl(GUEST_RFLAGS);
	vmx->rmode.save_rflags = flags;
	flags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
	vmcs_writel(GUEST_RFLAGS, flags);
	vmcs_writel(GUEST_CR4, vmcs_readl(GUEST_CR4) | X86_CR4_VME);
	update_exception_bitmap(vcpu);

	fix_rmode_seg(VCPU_SREG_SS, &vmx->rmode.segs[VCPU_SREG_SS]);
	fix_rmode_seg(VCPU_SREG_CS, &vmx->rmode.segs[VCPU_SREG_CS]);
	fix_rmode_seg(VCPU_SREG_ES, &vmx->rmode.segs[VCPU_SREG_ES]);
	fix_rmode_seg(VCPU_SREG_DS, &vmx->rmode.segs[VCPU_SREG_DS]);
	fix_rmode_seg(VCPU_SREG_GS, &vmx->rmode.segs[VCPU_SREG_GS]);
	fix_rmode_seg(VCPU_SREG_FS, &vmx->rmode.segs[VCPU_SREG_FS]);

	kvm_mmu_reset_context(vcpu);
}

void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct shared_msr_entry *msr = find_msr_entry(vmx, MSR_EFER);

	if (!msr)
		return;

	vcpu->arch.efer = efer;
	if (efer & EFER_LMA) {
		vm_entry_controls_setbit(to_vmx(vcpu), VM_ENTRY_IA32E_MODE);
		msr->data = efer;
	} else {
		vm_entry_controls_clearbit(to_vmx(vcpu), VM_ENTRY_IA32E_MODE);
		msr->data = efer & ~EFER_LME;
	}
	setup_msrs(vmx);
}

#ifdef CONFIG_X86_64
static void enter_lmode(struct kvm_vcpu *vcpu)
{
	u32 guest_tr_ar;

	vmx_segment_cache_clear(to_vmx(vcpu));
	guest_tr_ar = vmcs_read32(GUEST_TR_AR_BYTES);
	if ((guest_tr_ar & VMX_AR_TYPE_MASK) != VMX_AR_TYPE_BUSY_64_TSS) {
		pr_debug_ratelimited("%s: tss fixup for long mode.\n",
				     __func__);
		vmcs_write32(GUEST_TR_AR_BYTES,
			     (guest_tr_ar & ~VMX_AR_TYPE_MASK)
			     | VMX_AR_TYPE_BUSY_64_TSS);
	}

	vmx_set_efer(vcpu, vcpu->arch.efer | EFER_LMA);
}

static void exit_lmode(struct kvm_vcpu *vcpu)
{
	vm_entry_controls_clearbit(to_vmx(vcpu), VM_ENTRY_IA32E_MODE);
	vmx_set_efer(vcpu, vcpu->arch.efer & ~EFER_LMA);
}

#endif

static void vmx_flush_tlb_all(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	/*
	 * INVEPT must be issued when EPT is enabled, irrespective of VPID, as
	 * the CPU is not required to invalidate guest-physical mappings on
	 * VM-Entry, even if VPID is disabled.  Guest-physical mappings are
	 * associated with the root EPT structure and not any particular VPID
	 * (INVVPID also isn't required to invalidate guest-physical mappings).
	 */
	if (enable_ept) {
		ept_sync_global();
	} else if (enable_vpid) {
		if (cpu_has_vmx_invvpid_global()) {
			vpid_sync_vcpu_global();
		} else {
			vpid_sync_vcpu_single(vmx->vpid);
			vpid_sync_vcpu_single(vmx->nested.vpid02);
		}
	}
}

static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
{
	u64 root_hpa = vcpu->arch.mmu->root_hpa;

	/* No flush required if the current context is invalid. */
	if (!VALID_PAGE(root_hpa))
		return;

	if (enable_ept)
		ept_sync_context(construct_eptp(vcpu, root_hpa));
	else if (!is_guest_mode(vcpu))
		vpid_sync_context(to_vmx(vcpu)->vpid);
	else
		vpid_sync_context(nested_get_vpid02(vcpu));
}
static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
{
	/*
	 * vpid_sync_vcpu_addr() is a nop if vmx->vpid == 0, see the comment in
	 * vmx_flush_tlb_guest() for an explanation of why this is ok.
	 */
	vpid_sync_vcpu_addr(to_vmx(vcpu)->vpid, addr);
}
KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook
Add a dedicated hook to handle flushing TLB entries on behalf of the
guest, i.e. for a paravirtualized TLB flush, and use it directly instead
of bouncing through kvm_vcpu_flush_tlb().
For VMX, change the effective implementation to never do
INVEPT and flush only the current context, i.e. to always flush via
INVVPID(SINGLE_CONTEXT). The INVEPT performed by __vmx_flush_tlb() when
@invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only
flush guest-physical mappings; linear and combined mappings are flushed
by VM-Enter when VPID is disabled, and changes in the guest page tables
do not affect guest-physical mappings.
When EPT and VPID are enabled, doing INVVPID is not required (by Intel's
architecture) to invalidate guest-physical mappings, i.e. TLB entries
that cache guest-physical mappings can live across INVVPID as the
mappings are associated with an EPTP, not a VPID. The intent of
@invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate
gpa mappings", i.e. do INVEPT and not simply INVVPID. Other than nested
VPID handling, which now calls vpid_sync_context() directly, the only
scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is
enabled) is if KVM is flushing TLB entries from the guest's perspective,
i.e. is only required to invalidate linear mappings.
For SVM, flushing TLB entries from the guest's perspective can be done
by flushing the current ASID, as changes to the guest's page tables are
associated only with the current ASID.
Adding a dedicated ->tlb_flush_guest() paves the way toward removing
@invalidate_gpa, which is a potentially dangerous control flag as its
meaning is not exactly crystal clear, even for those who are familiar
with the subtleties of what mappings Intel CPUs are/aren't allowed to
keep across various invalidation scenarios.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-15-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-20 15:28:10 -06:00
static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
{
	/*
	 * vpid_sync_context() is a nop if vmx->vpid == 0, e.g. if enable_vpid == 0
	 * or a vpid couldn't be allocated for this vCPU.  VM-Enter and VM-Exit
	 * are required to flush GVA->{G,H}PA mappings from the TLB if vpid is
	 * disabled (VM-Enter with vpid enabled and vpid == 0 is disallowed),
	 * i.e. no explicit INVVPID is necessary.
	 */
	vpid_sync_context(to_vmx(vcpu)->vpid);
}
static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;

	if (!kvm_register_is_dirty(vcpu, VCPU_EXREG_PDPTR))
		return;

	if (is_pae_paging(vcpu)) {
		vmcs_write64(GUEST_PDPTR0, mmu->pdptrs[0]);
		vmcs_write64(GUEST_PDPTR1, mmu->pdptrs[1]);
		vmcs_write64(GUEST_PDPTR2, mmu->pdptrs[2]);
		vmcs_write64(GUEST_PDPTR3, mmu->pdptrs[3]);
	}
}
void ept_save_pdptrs(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;

	if (WARN_ON_ONCE(!is_pae_paging(vcpu)))
		return;

	mmu->pdptrs[0] = vmcs_read64(GUEST_PDPTR0);
	mmu->pdptrs[1] = vmcs_read64(GUEST_PDPTR1);
	mmu->pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
	mmu->pdptrs[3] = vmcs_read64(GUEST_PDPTR3);

	kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
}
static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
				       unsigned long cr0,
				       struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))
		vmx_cache_reg(vcpu, VCPU_EXREG_CR3);

	if (!(cr0 & X86_CR0_PG)) {
		/* From paging/starting to nonpaging */
		exec_controls_setbit(vmx, CPU_BASED_CR3_LOAD_EXITING |
					  CPU_BASED_CR3_STORE_EXITING);
		vcpu->arch.cr0 = cr0;
		vmx_set_cr4(vcpu, kvm_read_cr4(vcpu));
	} else if (!is_paging(vcpu)) {
		/* From nonpaging to paging */
		exec_controls_clearbit(vmx, CPU_BASED_CR3_LOAD_EXITING |
					    CPU_BASED_CR3_STORE_EXITING);
		vcpu->arch.cr0 = cr0;
		vmx_set_cr4(vcpu, kvm_read_cr4(vcpu));
	}

	if (!(cr0 & X86_CR0_WP))
		*hw_cr0 &= ~X86_CR0_WP;
}
void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned long hw_cr0;

	hw_cr0 = (cr0 & ~KVM_VM_CR0_ALWAYS_OFF);
	if (enable_unrestricted_guest)
		hw_cr0 |= KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST;
	else {
		hw_cr0 |= KVM_VM_CR0_ALWAYS_ON;

		if (vmx->rmode.vm86_active && (cr0 & X86_CR0_PE))
			enter_pmode(vcpu);
		if (!vmx->rmode.vm86_active && !(cr0 & X86_CR0_PE))
			enter_rmode(vcpu);
	}
#ifdef CONFIG_X86_64
	if (vcpu->arch.efer & EFER_LME) {
		if (!is_paging(vcpu) && (cr0 & X86_CR0_PG))
			enter_lmode(vcpu);

		if (is_paging(vcpu) && !(cr0 & X86_CR0_PG))
			exit_lmode(vcpu);
	}
#endif

	if (enable_ept && !enable_unrestricted_guest)
		ept_update_paging_mode_cr0(&hw_cr0, cr0, vcpu);
	vmcs_writel(CR0_READ_SHADOW, cr0);
	vmcs_writel(GUEST_CR0, hw_cr0);
	vcpu->arch.cr0 = cr0;
	kvm_register_mark_available(vcpu, VCPU_EXREG_CR0);

	/* depends on vcpu->arch.cr0 to be set to a new value */
	vmx->emulation_required = emulation_required(vcpu);
}
static int vmx_get_tdp_level(struct kvm_vcpu *vcpu)
{
	if (cpu_has_vmx_ept_5levels() && (cpuid_maxphyaddr(vcpu) > 48))
		return 5;
	return 4;
}

static int get_ept_level(struct kvm_vcpu *vcpu)
{
	if (is_guest_mode(vcpu) && nested_cpu_has_ept(get_vmcs12(vcpu)))
		return vmx_eptp_page_walk_level(nested_ept_get_eptp(vcpu));
	return vmx_get_tdp_level(vcpu);
}

u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
{
	u64 eptp = VMX_EPTP_MT_WB;

	eptp |= (get_ept_level(vcpu) == 5) ? VMX_EPTP_PWL_5 : VMX_EPTP_PWL_4;

	if (enable_ept_ad_bits &&
	    (!is_guest_mode(vcpu) || nested_ept_ad_enabled(vcpu)))
		eptp |= VMX_EPTP_AD_ENABLE_BIT;
	eptp |= (root_hpa & PAGE_MASK);

	return eptp;
}
void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd)
{
	struct kvm *kvm = vcpu->kvm;
	bool update_guest_cr3 = true;
	unsigned long guest_cr3;
	u64 eptp;

	if (enable_ept) {
		eptp = construct_eptp(vcpu, pgd);
		vmcs_write64(EPT_POINTER, eptp);

		if (kvm_x86_ops.tlb_remote_flush) {
			spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
			to_vmx(vcpu)->ept_pointer = eptp;
			to_kvm_vmx(kvm)->ept_pointers_match
				= EPT_POINTERS_CHECK;
			spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
		}

		if (!enable_unrestricted_guest && !is_paging(vcpu))
			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
		else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
			guest_cr3 = vcpu->arch.cr3;
		else /* vmcs01.GUEST_CR3 is already up-to-date. */
			update_guest_cr3 = false;
		ept_load_pdptrs(vcpu);
	} else {
		guest_cr3 = pgd;
	}

	if (update_guest_cr3)
		vmcs_writel(GUEST_CR3, guest_cr3);
}
int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	/*
	 * Pass through host's Machine Check Enable value to hw_cr4, which
	 * is in force while we are in guest mode.  Do not let guests control
	 * this bit, even if host CR4.MCE == 0.
	 */
	unsigned long hw_cr4;

	hw_cr4 = (cr4_read_shadow() & X86_CR4_MCE) | (cr4 & ~X86_CR4_MCE);
	if (enable_unrestricted_guest)
		hw_cr4 |= KVM_VM_CR4_ALWAYS_ON_UNRESTRICTED_GUEST;
	else if (vmx->rmode.vm86_active)
		hw_cr4 |= KVM_RMODE_VM_CR4_ALWAYS_ON;
	else
		hw_cr4 |= KVM_PMODE_VM_CR4_ALWAYS_ON;

	if (!boot_cpu_has(X86_FEATURE_UMIP) && vmx_umip_emulated()) {
		if (cr4 & X86_CR4_UMIP) {
			secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_DESC);
			hw_cr4 &= ~X86_CR4_UMIP;
		} else if (!is_guest_mode(vcpu) ||
			   !nested_cpu_has2(get_vmcs12(vcpu), SECONDARY_EXEC_DESC)) {
			secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_DESC);
		}
	}

	if (cr4 & X86_CR4_VMXE) {
		/*
		 * To use VMXON (and later other VMX instructions), a guest
		 * must first be able to turn on cr4.VMXE (see handle_vmon()).
		 * So basically the check on whether to allow nested VMX
		 * is here.  We operate under the default treatment of SMM,
		 * so VMX cannot be enabled under SMM.
		 */
		if (!nested_vmx_allowed(vcpu) || is_smm(vcpu))
			return 1;
	}
	/*
	 * From "KVM: nVMX: fix checks on CR{0,4} during virtual VMX
	 * operation" (David Matlack): KVM emulates
	 * MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning all
	 * CR0 and CR4 bits are allowed to be 1 during VMX operation.
	 * This does not match real hardware, which disallows the high 32
	 * bits of CR0 to be 1, and disallows reserved bits of CR4 to be 1
	 * (including bits which are defined in the SDM but missing
	 * according to CPUID).  A guest can induce a VM-entry failure by
	 * setting these bits in GUEST_CR0 and GUEST_CR4, despite
	 * MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are valid.  Since
	 * KVM had allowed all bits to be 1 in CR0 and CR4, the checks here
	 * did not verify must-be-0 bits; nested_cr4_valid() now identifies
	 * must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
	 */
	if (vmx->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
		return 1;
	vcpu->arch.cr4 = cr4;
	kvm_register_mark_available(vcpu, VCPU_EXREG_CR4);

	if (!enable_unrestricted_guest) {
		if (enable_ept) {
			if (!is_paging(vcpu)) {
				hw_cr4 &= ~X86_CR4_PAE;
				hw_cr4 |= X86_CR4_PSE;
			} else if (!(cr4 & X86_CR4_PAE)) {
				hw_cr4 &= ~X86_CR4_PAE;
			}
		}

		/*
		 * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
		 * hardware.  To emulate this behavior, SMEP/SMAP/PKU needs
		 * to be manually disabled when guest switches to non-paging
		 * mode.
		 *
		 * If !enable_unrestricted_guest, the CPU is always running
		 * with CR0.PG = 1 and CR4 needs to be modified.
		 * If enable_unrestricted_guest, the CPU automatically
		 * disables SMEP/SMAP/PKU when the guest sets CR0.PG = 0.
		 */
		if (!is_paging(vcpu))
			hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
	}

	vmcs_writel(CR4_READ_SHADOW, cr4);
	vmcs_writel(GUEST_CR4, hw_cr4);

	return 0;
}
void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	u32 ar;

	if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
		*var = vmx->rmode.segs[seg];
		if (seg == VCPU_SREG_TR
		    || var->selector == vmx_read_guest_seg_selector(vmx, seg))
			return;
		var->base = vmx_read_guest_seg_base(vmx, seg);
		var->selector = vmx_read_guest_seg_selector(vmx, seg);
		return;
	}

	var->base = vmx_read_guest_seg_base(vmx, seg);
	var->limit = vmx_read_guest_seg_limit(vmx, seg);
	var->selector = vmx_read_guest_seg_selector(vmx, seg);
	ar = vmx_read_guest_seg_ar(vmx, seg);
	var->unusable = (ar >> 16) & 1;
	var->type = ar & 15;
	var->s = (ar >> 4) & 1;
	var->dpl = (ar >> 5) & 3;
	/*
	 * Some userspaces do not preserve unusable property. Since usable
	 * segment has to be present according to VMX spec we can use present
	 * property to amend userspace bug by making unusable segment always
	 * nonpresent. vmx_segment_access_rights() already marks nonpresent
	 * segment as unusable.
	 */
	var->present = !var->unusable;
	var->avl = (ar >> 12) & 1;
	var->l = (ar >> 13) & 1;
	var->db = (ar >> 14) & 1;
	var->g = (ar >> 15) & 1;
}
static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
{
	struct kvm_segment s;

	if (to_vmx(vcpu)->rmode.vm86_active) {
		vmx_get_segment(vcpu, &s, seg);
		return s.base;
	}

	return vmx_read_guest_seg_base(to_vmx(vcpu), seg);
}

int vmx_get_cpl(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (unlikely(vmx->rmode.vm86_active))
		return 0;
	else {
		int ar = vmx_read_guest_seg_ar(vmx, VCPU_SREG_SS);

		return VMX_AR_DPL(ar);
	}
}
static u32 vmx_segment_access_rights(struct kvm_segment *var)
{
	u32 ar;

	if (var->unusable || !var->present)
		ar = 1 << 16;
	else {
		ar = var->type & 15;
		ar |= (var->s & 1) << 4;
		ar |= (var->dpl & 3) << 5;
		ar |= (var->present & 1) << 7;
		ar |= (var->avl & 1) << 12;
		ar |= (var->l & 1) << 13;
		ar |= (var->db & 1) << 14;
		ar |= (var->g & 1) << 15;
	}

	return ar;
}

void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];

	vmx_segment_cache_clear(vmx);

	if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
		vmx->rmode.segs[seg] = *var;
		if (seg == VCPU_SREG_TR)
			vmcs_write16(sf->selector, var->selector);
		else if (var->s)
			fix_rmode_seg(seg, &vmx->rmode.segs[seg]);
		goto out;
	}

	vmcs_writel(sf->base, var->base);
	vmcs_write32(sf->limit, var->limit);
	vmcs_write16(sf->selector, var->selector);

	/*
	 * Fix the "Accessed" bit in AR field of segment registers for older
	 * qemu binaries.
	 * IA32 arch specifies that at the time of processor reset the
	 * "Accessed" bit in the AR field of segment registers is 1. And qemu
	 * is setting it to 0 in the userland code. This causes invalid guest
	 * state vmexit when "unrestricted guest" mode is turned on.
	 * Fix for this setup issue in cpu_reset is being pushed in the qemu
	 * tree. Newer qemu binaries with that qemu fix would not need this
	 * kvm hack.
	 */
	if (enable_unrestricted_guest && (seg != VCPU_SREG_LDTR))
		var->type |= 0x1; /* Accessed */

	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));

out:
	vmx->emulation_required = emulation_required(vcpu);
}

static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
{
	u32 ar = vmx_read_guest_seg_ar(to_vmx(vcpu), VCPU_SREG_CS);
	*db = (ar >> 14) & 1;
	*l = (ar >> 13) & 1;
}

static void vmx_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
{
	dt->size = vmcs_read32(GUEST_IDTR_LIMIT);
	dt->address = vmcs_readl(GUEST_IDTR_BASE);
}
2010-02-16 01:51:48 -07:00
static void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
{
	vmcs_write32(GUEST_IDTR_LIMIT, dt->size);
	vmcs_writel(GUEST_IDTR_BASE, dt->address);
}
static void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
{
	dt->size = vmcs_read32(GUEST_GDTR_LIMIT);
	dt->address = vmcs_readl(GUEST_GDTR_BASE);
}
static void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
{
	vmcs_write32(GUEST_GDTR_LIMIT, dt->size);
	vmcs_writel(GUEST_GDTR_BASE, dt->address);
}
static bool rmode_segment_valid(struct kvm_vcpu *vcpu, int seg)
{
	struct kvm_segment var;
	u32 ar;

	vmx_get_segment(vcpu, &var, seg);
	var.dpl = 0x3;
	if (seg == VCPU_SREG_CS)
		var.type = 0x3;
	ar = vmx_segment_access_rights(&var);

	if (var.base != (var.selector << 4))
		return false;
	if (var.limit != 0xffff)
		return false;
	if (ar != 0xf3)
		return false;

	return true;
}

static bool code_segment_valid(struct kvm_vcpu *vcpu)
{
	struct kvm_segment cs;
	unsigned int cs_rpl;

	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
	cs_rpl = cs.selector & SEGMENT_RPL_MASK;

	if (cs.unusable)
		return false;
	if (~cs.type & (VMX_AR_TYPE_CODE_MASK|VMX_AR_TYPE_ACCESSES_MASK))
		return false;
	if (!cs.s)
		return false;
	if (cs.type & VMX_AR_TYPE_WRITEABLE_MASK) {
		if (cs.dpl > cs_rpl)
			return false;
	} else {
		if (cs.dpl != cs_rpl)
			return false;
	}
	if (!cs.present)
		return false;
	/* TODO: Add Reserved field check, this'll require a new member in the kvm_segment_field structure */
	return true;
}

static bool stack_segment_valid(struct kvm_vcpu *vcpu)
{
	struct kvm_segment ss;
	unsigned int ss_rpl;

	vmx_get_segment(vcpu, &ss, VCPU_SREG_SS);
	ss_rpl = ss.selector & SEGMENT_RPL_MASK;

	if (ss.unusable)
		return true;
	if (ss.type != 3 && ss.type != 7)
		return false;
	if (!ss.s)
		return false;
	if (ss.dpl != ss_rpl) /* DPL != RPL */
		return false;
	if (!ss.present)
		return false;

	return true;
}

static bool data_segment_valid(struct kvm_vcpu *vcpu, int seg)
{
	struct kvm_segment var;
	unsigned int rpl;

	vmx_get_segment(vcpu, &var, seg);
	rpl = var.selector & SEGMENT_RPL_MASK;

	if (var.unusable)
		return true;
	if (!var.s)
		return false;
	if (!var.present)
		return false;
	if (~var.type & (VMX_AR_TYPE_CODE_MASK|VMX_AR_TYPE_WRITEABLE_MASK)) {
		if (var.dpl < rpl) /* DPL < RPL */
			return false;
	}

	/* TODO: Add other members to kvm_segment_field to allow checking for other access
	 * rights flags
	 */
	return true;
}

static bool tr_valid(struct kvm_vcpu *vcpu)
{
	struct kvm_segment tr;

	vmx_get_segment(vcpu, &tr, VCPU_SREG_TR);

	if (tr.unusable)
		return false;
	if (tr.selector & SEGMENT_TI_MASK)	/* TI = 1 */
		return false;
	if (tr.type != 3 && tr.type != 11) /* TODO: Check if guest is in IA32e mode */
		return false;
	if (!tr.present)
		return false;

	return true;
}

static bool ldtr_valid(struct kvm_vcpu *vcpu)
{
	struct kvm_segment ldtr;

	vmx_get_segment(vcpu, &ldtr, VCPU_SREG_LDTR);

	if (ldtr.unusable)
		return true;
	if (ldtr.selector & SEGMENT_TI_MASK)	/* TI = 1 */
		return false;
	if (ldtr.type != 2)
		return false;
	if (!ldtr.present)
		return false;

	return true;
}

static bool cs_ss_rpl_check(struct kvm_vcpu *vcpu)
{
	struct kvm_segment cs, ss;

	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
	vmx_get_segment(vcpu, &ss, VCPU_SREG_SS);

	return ((cs.selector & SEGMENT_RPL_MASK) ==
		 (ss.selector & SEGMENT_RPL_MASK));
}

/*
 * Check if guest state is valid. Returns true if valid, false if
 * not.
 * We assume that registers are always usable
 */
static bool guest_state_valid(struct kvm_vcpu *vcpu)
{
	if (enable_unrestricted_guest)
		return true;

	/* real mode guest state checks */
	if (!is_protmode(vcpu) || (vmx_get_rflags(vcpu) & X86_EFLAGS_VM)) {
		if (!rmode_segment_valid(vcpu, VCPU_SREG_CS))
			return false;
		if (!rmode_segment_valid(vcpu, VCPU_SREG_SS))
			return false;
		if (!rmode_segment_valid(vcpu, VCPU_SREG_DS))
			return false;
		if (!rmode_segment_valid(vcpu, VCPU_SREG_ES))
			return false;
		if (!rmode_segment_valid(vcpu, VCPU_SREG_FS))
			return false;
		if (!rmode_segment_valid(vcpu, VCPU_SREG_GS))
			return false;
	} else {
		/* protected mode guest state checks */
		if (!cs_ss_rpl_check(vcpu))
			return false;
		if (!code_segment_valid(vcpu))
			return false;
		if (!stack_segment_valid(vcpu))
			return false;
		if (!data_segment_valid(vcpu, VCPU_SREG_DS))
			return false;
		if (!data_segment_valid(vcpu, VCPU_SREG_ES))
			return false;
		if (!data_segment_valid(vcpu, VCPU_SREG_FS))
			return false;
		if (!data_segment_valid(vcpu, VCPU_SREG_GS))
			return false;
		if (!tr_valid(vcpu))
			return false;
		if (!ldtr_valid(vcpu))
			return false;
	}
	/* TODO:
	 * - Add checks on RIP
	 * - Add checks on RFLAGS
	 */

	return true;
}
static int init_rmode_tss(struct kvm *kvm)
{
	gfn_t fn;
	u16 data = 0;
	int idx, r;
	idx = srcu_read_lock(&kvm->srcu);
	fn = to_kvm_vmx(kvm)->tss_addr >> PAGE_SHIFT;
	r = kvm_clear_guest_page(kvm, fn, 0, PAGE_SIZE);
	if (r < 0)
		goto out;
	data = TSS_BASE_SIZE + TSS_REDIRECTION_SIZE;
	r = kvm_write_guest_page(kvm, fn++, &data,
				 TSS_IOPB_BASE_OFFSET, sizeof(u16));
	if (r < 0)
		goto out;
	r = kvm_clear_guest_page(kvm, fn++, 0, PAGE_SIZE);
	if (r < 0)
		goto out;
	r = kvm_clear_guest_page(kvm, fn, 0, PAGE_SIZE);
	if (r < 0)
		goto out;
	data = ~0;
	r = kvm_write_guest_page(kvm, fn, &data,
				 RMODE_TSS_SIZE - 2 * PAGE_SIZE - 1,
				 sizeof(u8));
out:
	srcu_read_unlock(&kvm->srcu, idx);
	return r;
}
static int init_rmode_identity_map(struct kvm *kvm)
{
	struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
	int i, r = 0;
	kvm_pfn_t identity_map_pfn;
	u32 tmp;

	/* Protect kvm_vmx->ept_identity_pagetable_done. */
	mutex_lock(&kvm->slots_lock);
	if (likely(kvm_vmx->ept_identity_pagetable_done))
		goto out;

	if (!kvm_vmx->ept_identity_map_addr)
		kvm_vmx->ept_identity_map_addr = VMX_EPT_IDENTITY_PAGETABLE_ADDR;
	identity_map_pfn = kvm_vmx->ept_identity_map_addr >> PAGE_SHIFT;

	r = __x86_set_memory_region(kvm, IDENTITY_PAGETABLE_PRIVATE_MEMSLOT,
				    kvm_vmx->ept_identity_map_addr, PAGE_SIZE);
	if (r < 0)
		goto out;

	r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
	if (r < 0)
		goto out;
	/* Set up identity-mapping pagetable for EPT in real mode */
	for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
		tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
			_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
		r = kvm_write_guest_page(kvm, identity_map_pfn,
				&tmp, i * sizeof(tmp), sizeof(tmp));
		if (r < 0)
			goto out;
	}
	kvm_vmx->ept_identity_pagetable_done = true;

out:
	mutex_unlock(&kvm->slots_lock);
	return r;
}
static void seg_setup(int seg)
{
	const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
	unsigned int ar;
	vmcs_write16(sf->selector, 0);
	vmcs_writel(sf->base, 0);
	vmcs_write32(sf->limit, 0xffff);
	ar = 0x93;
	if (seg == VCPU_SREG_CS)
		ar |= 0x08; /* code segment */

	vmcs_write32(sf->ar_bytes, ar);
}
static int alloc_apic_access_page(struct kvm *kvm)
{
	struct page *page;
	int r = 0;

	mutex_lock(&kvm->slots_lock);
	if (kvm->arch.apic_access_page_done)
		goto out;
	r = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
				    APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
	if (r)
		goto out;

	page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
	if (is_error_page(page)) {
		r = -EFAULT;
		goto out;
	}

	/*
	 * Do not pin the page in memory, so that memory hot-unplug
	 * is able to migrate it.
	 */
	put_page(page);
	kvm->arch.apic_access_page_done = true;
out:
	mutex_unlock(&kvm->slots_lock);
	return r;
}
int allocate_vpid(void)
{
	int vpid;

	if (!enable_vpid)
		return 0;

	spin_lock(&vmx_vpid_lock);
	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
	if (vpid < VMX_NR_VPIDS)
		__set_bit(vpid, vmx_vpid_bitmap);
	else
		vpid = 0;
	spin_unlock(&vmx_vpid_lock);
	return vpid;
}
void free_vpid(int vpid)
{
	if (!enable_vpid || vpid == 0)
		return;

	spin_lock(&vmx_vpid_lock);
	__clear_bit(vpid, vmx_vpid_bitmap);
	spin_unlock(&vmx_vpid_lock);
}
static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
							  u32 msr, int type)
{
	int f = sizeof(unsigned long);

	if (!cpu_has_vmx_msr_bitmap())
		return;

	if (static_branch_unlikely(&enable_evmcs))
		evmcs_touch_msr_bitmap();

	/*
	 * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
	 * have the write-low and read-high bitmap offsets the wrong way round.
	 * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
	 */
	if (msr <= 0x1fff) {
		if (type & MSR_TYPE_R)
			/* read-low */
			__clear_bit(msr, msr_bitmap + 0x000 / f);

		if (type & MSR_TYPE_W)
			/* write-low */
			__clear_bit(msr, msr_bitmap + 0x800 / f);

	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
		msr &= 0x1fff;
		if (type & MSR_TYPE_R)
			/* read-high */
			__clear_bit(msr, msr_bitmap + 0x400 / f);

		if (type & MSR_TYPE_W)
			/* write-high */
			__clear_bit(msr, msr_bitmap + 0xc00 / f);

	}
}
static __always_inline void vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
							 u32 msr, int type)
{
	int f = sizeof(unsigned long);

	if (!cpu_has_vmx_msr_bitmap())
		return;

	if (static_branch_unlikely(&enable_evmcs))
		evmcs_touch_msr_bitmap();

	/*
	 * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
	 * have the write-low and read-high bitmap offsets the wrong way round.
	 * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
	 */
	if (msr <= 0x1fff) {
		if (type & MSR_TYPE_R)
			/* read-low */
			__set_bit(msr, msr_bitmap + 0x000 / f);

		if (type & MSR_TYPE_W)
			/* write-low */
			__set_bit(msr, msr_bitmap + 0x800 / f);

	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
		msr &= 0x1fff;
		if (type & MSR_TYPE_R)
			/* read-high */
			__set_bit(msr, msr_bitmap + 0x400 / f);

		if (type & MSR_TYPE_W)
			/* write-high */
			__set_bit(msr, msr_bitmap + 0xc00 / f);

	}
}
static __always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
						      u32 msr, int type, bool value)
{
	if (value)
		vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
	else
		vmx_disable_intercept_for_msr(msr_bitmap, msr, type);
}
static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
{
	u8 mode = 0;

	if (cpu_has_secondary_exec_ctrls() &&
	    (secondary_exec_controls_get(to_vmx(vcpu)) &
	     SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
		mode |= MSR_BITMAP_MODE_X2APIC;
		if (enable_apicv && kvm_vcpu_apicv_active(vcpu))
			mode |= MSR_BITMAP_MODE_X2APIC_APICV;
	}

	return mode;
}
static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap,
					 u8 mode)
{
	int msr;

	for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
		unsigned word = msr / BITS_PER_LONG;

		msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0;
		msr_bitmap[word + (0x800 / sizeof(long))] = ~0;
	}

	if (mode & MSR_BITMAP_MODE_X2APIC) {
		/*
		 * TPR reads and writes can be virtualized even if virtual interrupt
		 * delivery is not in use.
		 */
		vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW);
		if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
			vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
			vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
			vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
		}
	}
}
void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
	u8 mode = vmx_msr_bitmap_mode(vcpu);
	u8 changed = mode ^ vmx->msr_bitmap_mode;

	if (!changed)
		return;

	if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV))
		vmx_update_msr_bitmap_x2apic(msr_bitmap, mode);

	vmx->msr_bitmap_mode = mode;
}
void pt_update_intercept_for_msr(struct vcpu_vmx *vmx)
{
	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
	bool flag = !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN);
	u32 i;

	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_STATUS,
							MSR_TYPE_RW, flag);
	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_OUTPUT_BASE,
							MSR_TYPE_RW, flag);
	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_OUTPUT_MASK,
							MSR_TYPE_RW, flag);
	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_CR3_MATCH,
							MSR_TYPE_RW, flag);
	for (i = 0; i < vmx->pt_desc.addr_range; i++) {
		vmx_set_intercept_for_msr(msr_bitmap,
			MSR_IA32_RTIT_ADDR0_A + i * 2, MSR_TYPE_RW, flag);
		vmx_set_intercept_for_msr(msr_bitmap,
			MSR_IA32_RTIT_ADDR0_B + i * 2, MSR_TYPE_RW, flag);
	}
}
static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	void *vapic_page;
	u32 vppr;
	int rvi;

	if (WARN_ON_ONCE(!is_guest_mode(vcpu)) ||
	    !nested_cpu_has_vid(get_vmcs12(vcpu)) ||
	    WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn))
		return false;

	rvi = vmx_get_rvi();

	vapic_page = vmx->nested.virtual_apic_map.hva;
	vppr = *((u32 *)(vapic_page + APIC_PROCPRI));

	return ((rvi & 0xf0) > (vppr & 0xf0));
}
static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
						     bool nested)
{
#ifdef CONFIG_SMP
	int pi_vec = nested ? POSTED_INTR_NESTED_VECTOR : POSTED_INTR_VECTOR;

	if (vcpu->mode == IN_GUEST_MODE) {
		/*
		 * The vector of interrupt to be delivered to vcpu had
		 * been set in PIR before this function.
		 *
		 * Following cases will be reached in this block, and
		 * we always send a notification event in all cases as
		 * explained below.
		 *
		 * Case 1: vcpu keeps in non-root mode. Sending a
		 * notification event posts the interrupt to vcpu.
		 *
		 * Case 2: vcpu exits to root mode and is still
		 * runnable. PIR will be synced to vIRR before the
		 * next vcpu entry. Sending a notification event in
		 * this case has no effect, as vcpu is not in root
		 * mode.
		 *
		 * Case 3: vcpu exits to root mode and is blocked.
		 * vcpu_block() has already synced PIR to vIRR and
		 * never blocks vcpu if vIRR is not cleared. Therefore,
		 * a blocked vcpu here does not wait for any requested
		 * interrupts in PIR, and sending a notification event
		 * which has no effect is safe here.
		 */
		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
		return true;
	}
#endif
	return false;
}
static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
						int vector)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (is_guest_mode(vcpu) &&
	    vector == vmx->nested.posted_intr_nv) {
		/*
		 * If a posted intr is not recognized by hardware,
		 * we will accomplish it in the next vmentry.
		 */
		vmx->nested.pi_pending = true;
		kvm_make_request(KVM_REQ_EVENT, vcpu);
KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2
Consider the following scenario:
1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
to CPU B via virtual posted-interrupt mechanism.
2. CPU B is currently executing L2 guest.
3. vmx_deliver_nested_posted_interrupt() calls
kvm_vcpu_trigger_posted_interrupt() which will note that
vcpu->mode == IN_GUEST_MODE.
4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR
IPI, CPU B exits from L2 to L0 during event-delivery
(valid IDT-vectoring-info).
5. CPU A now sends the physical IPI. The IPI is received in host and
its handler (smp_kvm_posted_intr_nested_ipi()) does nothing.
6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT,
CPU B continues to run in L0 and reach vcpu_enter_guest(). As
KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume
L2 guest.
7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but
it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be
consumed at next L2 entry!
Another scenario to consider:
1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
to CPU B via virtual posted-interrupt mechanism.
2. Assume that before CPU A calls kvm_vcpu_trigger_posted_interrupt(),
CPU B is at L0 and is about to resume into L2. Further assume that it is
in vcpu_enter_guest() after check for KVM_REQ_EVENT.
3. At this point, CPU A calls kvm_vcpu_trigger_posted_interrupt() which
will note that vcpu->mode != IN_GUEST_MODE. Therefore, do nothing and
return false. Then, it will set pi_pending=true and KVM_REQ_EVENT.
4. Now CPU B continues and resumes into L2 guest without processing
the posted-interrupt until next L2 entry!
To fix both issues, we just need to change
vmx_deliver_nested_posted_interrupt() to set pi_pending=true and
KVM_REQ_EVENT before calling kvm_vcpu_trigger_posted_interrupt().
It will fix the first scenario by changing step (6) to note that
KVM_REQ_EVENT and pi_pending=true and therefore process
nested posted-interrupt.
It will fix the second scenario by two possible ways:
1. If kvm_vcpu_trigger_posted_interrupt() is called while CPU B has changed
vcpu->mode to IN_GUEST_MODE, physical IPI will be sent and will be received
when CPU resumes into L2.
2. If kvm_vcpu_trigger_posted_interrupt() is called while CPU B hasn't yet
changed vcpu->mode to IN_GUEST_MODE, then after CPU B will change
vcpu->mode it will call kvm_request_pending() which will return true and
therefore force another round of vcpu_enter_guest() which will note that
KVM_REQ_EVENT and pi_pending=true and therefore process nested
posted-interrupt.
Cc: stable@vger.kernel.org
Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
[Add kvm_vcpu_kick to also handle the case where L1 doesn't intercept L2 HLT
and L2 executes HLT instruction. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
		/* the PIR and ON have been set by L1. */
		if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true))
			kvm_vcpu_kick(vcpu);
		return 0;
	}
	return -1;
}
/*
 * Send interrupt to vcpu via posted interrupt way.
 * 1. If target vcpu is running (non-root mode), send posted interrupt
 * notification to vcpu and hardware will sync PIR to vIRR atomically.
 * 2. If target vcpu isn't running (root mode), kick it to pick up the
 * interrupt from PIR in next vmentry.
 */
static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	int r;

	r = vmx_deliver_nested_posted_interrupt(vcpu, vector);
	if (!r)
		return 0;

	if (!vcpu->arch.apicv_active)
		return -1;

	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
		return 0;
kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
is blocked", 2015-09-18) the posted interrupt descriptor is checked
unconditionally for PIR.ON. Therefore we don't need KVM_REQ_EVENT to
trigger the scan and, if NMIs or SMIs are not involved, we can avoid
the complicated event injection path.
Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
there since APICv was introduced.
However, without the KVM_REQ_EVENT safety net KVM needs to be much
more careful about races between vmx_deliver_posted_interrupt and
vcpu_enter_guest. First, the IPI for posted interrupts may be issued
between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
If that happens, kvm_trigger_posted_interrupt returns true, but
smp_kvm_posted_intr_ipi doesn't do anything about it. The guest is
entered with PIR.ON, but the posted interrupt IPI has not been sent
and the interrupt is only delivered to the guest on the next vmentry
(if any). To fix this, disable interrupts before setting vcpu->mode.
This ensures that the IPI is delayed until the guest enters non-root mode;
it is then trapped by the processor causing the interrupt to be injected.
Second, the IPI may be issued between kvm_x86_ops->sync_pir_to_irr(vcpu)
and vcpu->mode = IN_GUEST_MODE. In this case, kvm_vcpu_kick is called
but it (correctly) doesn't do anything because it sees vcpu->mode ==
OUTSIDE_GUEST_MODE. Again, the guest is entered with PIR.ON but no
posted interrupt IPI is pending; this time, the fix for this is to move
the RVI update after IN_GUEST_MODE.
Both issues were mostly masked by the liberal usage of KVM_REQ_EVENT,
though the second could actually happen with VT-d posted interrupts.
In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
in another vmentry which would inject the interrupt.
This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	/* If a previous notification has sent the IPI, nothing to do. */
	if (pi_test_and_set_on(&vmx->pi_desc))
		return 0;
	if (vcpu != kvm_get_running_vcpu() &&
	    !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
		kvm_vcpu_kick(vcpu);

	return 0;
}
/*
 * Set up the vmcs's constant host-state fields, i.e., host-state fields that
 * will not change in the lifetime of the guest.
 * Note that host-state that does change is set elsewhere. E.g., host-state
 * that is set differently for each CPU is set in vmx_vcpu_load(), not here.
 */
void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
{
	u32 low32, high32;
	unsigned long tmpl;
	unsigned long cr0, cr3, cr4;

	cr0 = read_cr0();
	WARN_ON(cr0 & X86_CR0_TS);
	vmcs_writel(HOST_CR0, cr0);  /* 22.2.3 */

	/*
	 * Save the most likely value for this task's CR3 in the VMCS.
	 * We can't use __get_current_cr3_fast() because we're not atomic.
	 */
	cr3 = __read_cr3();
	vmcs_writel(HOST_CR3, cr3);		/* 22.2.3  FIXME: shadow tables */
	vmx->loaded_vmcs->host_state.cr3 = cr3;

	/* Save the most likely value for this task's CR4 in the VMCS. */
	cr4 = cr4_read_shadow();
	vmcs_writel(HOST_CR4, cr4);			/* 22.2.3, 22.2.5 */
	vmx->loaded_vmcs->host_state.cr4 = cr4;

	vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS);  /* 22.2.4 */
#ifdef CONFIG_X86_64
	/*
	 * Load null selectors, so we can avoid reloading them in
	 * vmx_prepare_switch_to_host(), in case userspace uses
	 * the null selectors too (the expected case).
	 */
	vmcs_write16(HOST_DS_SELECTOR, 0);
	vmcs_write16(HOST_ES_SELECTOR, 0);
#else
	vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
	vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
#endif
	vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
	vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);  /* 22.2.4 */
KVM: VMX: Store the host kernel's IDT base in a global variable
Although the kernel may use multiple IDTs, KVM should only ever see the
"real" IDT, e.g. the early init IDT is long gone by the time KVM runs
and the debug stack IDT is only used for small windows of time in very
specific flows.
Before commit a547c6db4d2f1 ("KVM: VMX: Enable acknowledge interupt on
vmexit"), the kernel's IDT base was consumed by KVM only when setting
constant VMCS state, i.e. to set VMCS.HOST_IDTR_BASE. Because constant
host state is done once per vCPU, there was ostensibly no need to cache
the kernel's IDT base.
When support for "ack interrupt on exit" was introduced, KVM added a
second consumer of the IDT base as handling already-acked interrupts
requires directly calling the interrupt handler, i.e. KVM uses the IDT
base to find the address of the handler. Because interrupts are a fast
path, KVM cached the IDT base to avoid having to VMREAD HOST_IDTR_BASE.
Presumably, the IDT base was cached on a per-vCPU basis simply because
the existing code grabbed the IDT base on a per-vCPU (VMCS) basis.
Note, all post-boot IDTs use the same handlers for external interrupts,
i.e. the "ack interrupt on exit" use of the IDT base would be unaffected
even if the cached IDT somehow did not match the current IDT. And as
for the original use case of setting VMCS.HOST_IDTR_BASE, if any of the
above analysis is wrong then KVM has had a bug since the beginning of
time since KVM has effectively been caching the IDT at vCPU creation
since commit a8b732ca01c ("[PATCH] kvm: userspace interface").
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	vmcs_writel(HOST_IDTR_BASE, host_idt_base);   /* 22.2.4 */
KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
Transitioning to/from a VMX guest requires KVM to manually save/load
the bulk of CPU state that the guest is allowed to directly access,
e.g. XSAVE state, CR2, GPRs, etc... For obvious reasons, loading the
guest's GPR snapshot prior to VM-Enter and saving the snapshot after
VM-Exit is done via handcoded assembly. The assembly blob is written
as inline asm so that it can easily access KVM-defined structs that
are used to hold guest state, e.g. moving the blob to a standalone
assembly file would require generating defines for struct offsets.
The other relevant aspect of VMX transitions in KVM is the handling of
VM-Exits. KVM doesn't employ a separate VM-Exit handler per se, but
rather treats the VMX transition as a mega instruction (with many side
effects), i.e. sets the VMCS.HOST_RIP to a label immediately following
VMLAUNCH/VMRESUME. The label is then exposed to C code via a global
variable definition in the inline assembly.
Because of the global variable, KVM takes steps to (attempt to) ensure
only a single instance of the owning C function, e.g. vmx_vcpu_run, is
generated by the compiler. The earliest approach placed the inline
assembly in a separate noinline function[1]. Later, the assembly was
folded back into vmx_vcpu_run() and tagged with __noclone[2][3], which
is still used today.
After moving to __noclone, an edge case was encountered where GCC's
-ftracer optimization resulted in the inline assembly blob being
duplicated. This was "fixed" by explicitly disabling -ftracer in the
__noclone definition[4].
Recently, it was found that disabling -ftracer causes build warnings
for unsuspecting users of __noclone[5], and more importantly for KVM,
prevents the compiler from properly optimizing vmx_vcpu_run()[6]. And
perhaps most importantly of all, it was pointed out that there is no
way to prevent duplication of a function with 100% reliability[7],
i.e. more edge cases may be encountered in the future.
So to summarize, the only way to prevent the compiler from duplicating
the global variable definition is to move the variable out of inline
assembly, which has been suggested several times over[1][7][8].
Resolve the aforementioned issues by moving the VMLAUNCH+VMRESUME and
VM-Exit "handler" to standalone assembly sub-routines. Moving only
the core VMX transition codes allows the struct indexing to remain as
inline assembly and also allows the sub-routines to be used by
nested_vmx_check_vmentry_hw(). Reusing the sub-routines has a happy
side-effect of eliminating two VMWRITEs in the nested_early_check path
as there is no longer a need to dynamically change VMCS.HOST_RIP.
Note that callers to vmx_vmenter() must account for the CALL modifying
RSP, e.g. must subtract op-size from RSP when synchronizing RSP with
VMCS.HOST_RSP and "restore" RSP prior to the CALL. There are no great
alternatives to fudging RSP. Saving RSP in vmx_enter() is difficult
because doing so requires a second register (VMWRITE does not provide
an immediate encoding for the VMCS field and KVM supports Hyper-V's
memory-based eVMCS ABI). The other more drastic alternative would be
to eschew VMCS.HOST_RSP and manually save/load RSP using a per-cpu
variable (which can be encoded as e.g. gs:[imm]). But because a valid
stack is needed at the time of VM-Exit (NMIs aren't blocked and a user
could theoretically insert INT3/INT1ICEBRK at the VM-Exit handler), a
dedicated per-cpu VM-Exit stack would be required. A dedicated stack
isn't difficult to implement, but it would require at least one page
per CPU and knowledge of the stack in the dumpstack routines. And in
most cases there is essentially zero overhead in dynamically updating
VMCS.HOST_RSP, e.g. the VMWRITE can be avoided for all but the first
VMLAUNCH unless nested_early_check=1, which is not a fast path. In
other words, avoiding the VMCS.HOST_RSP by using a dedicated stack
would only make the code marginally less ugly while requiring at least
one page per CPU and forcing the kernel to be aware (and approve) of
the VM-Exit stack shenanigans.
[1] cea15c24ca39 ("KVM: Move KVM context switch into own function")
[2] a3b5ba49a8c5 ("KVM: VMX: add the __noclone attribute to vmx_vcpu_run")
[3] 104f226bfd0a ("KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()")
[4] 95272c29378e ("compiler-gcc: disable -ftracer for __noclone functions")
[5] https://lkml.kernel.org/r/20181218140105.ajuiglkpvstt3qxs@treble
[6] https://patchwork.kernel.org/patch/8707981/#21817015
[7] https://lkml.kernel.org/r/ri6y38lo23g.fsf@suse.cz
[8] https://lkml.kernel.org/r/20181218212042.GE25620@tassilo.jf.intel.com
Suggested-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Martin Jambor <mjambor@suse.cz>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	vmcs_writel(HOST_RIP, (unsigned long)vmx_vmexit); /* 22.2.5 */

	rdmsr(MSR_IA32_SYSENTER_CS, low32, high32);
	vmcs_write32(HOST_IA32_SYSENTER_CS, low32);
	rdmsrl(MSR_IA32_SYSENTER_EIP, tmpl);
	vmcs_writel(HOST_IA32_SYSENTER_EIP, tmpl);   /* 22.2.3 */

	if (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PAT) {
		rdmsr(MSR_IA32_CR_PAT, low32, high32);
		vmcs_write64(HOST_IA32_PAT, low32 | ((u64) high32 << 32));
	}

	if (cpu_has_load_ia32_efer())
		vmcs_write64(HOST_IA32_EFER, host_efer);
}
void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
{
	vmx->vcpu.arch.cr4_guest_owned_bits = KVM_CR4_GUEST_OWNED_BITS;
	if (enable_ept)
		vmx->vcpu.arch.cr4_guest_owned_bits |= X86_CR4_PGE;
	if (is_guest_mode(&vmx->vcpu))
		vmx->vcpu.arch.cr4_guest_owned_bits &=
			~get_vmcs12(&vmx->vcpu)->cr4_guest_host_mask;
	vmcs_writel(CR4_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr4_guest_owned_bits);
}
u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx)
{
	u32 pin_based_exec_ctrl = vmcs_config.pin_based_exec_ctrl;

	if (!kvm_vcpu_apicv_active(&vmx->vcpu))
		pin_based_exec_ctrl &= ~PIN_BASED_POSTED_INTR;

	if (!enable_vnmi)
		pin_based_exec_ctrl &= ~PIN_BASED_VIRTUAL_NMIS;
KVM: VMX: Leave preemption timer running when it's disabled
VMWRITEs to the major VMCS controls, pin controls included, are
deceptively expensive. CPUs with VMCS caching (Westmere and later) also
optimize away consistency checks on VM-Entry, i.e. skip consistency
checks if the relevant fields have not changed since the last successful
VM-Entry (of the cached VMCS). Because uops are a precious commodity,
uCode's dirty VMCS field tracking isn't as precise as software would
prefer. Notably, writing any of the major VMCS fields effectively marks
the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
consistency checks, which consumes several hundred cycles.
As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
doubles the latency of the next VM-Entry (and again when/if the flag is
toggled back). In a non-nested scenario, running a "standard" guest
with the preemption timer enabled, toggling the timer flag is uncommon
but not rare, e.g. roughly 1 in 10 entries. Disabling the preemption
timer can change these numbers due to its use for "immediate exits",
even when explicitly disabled by userspace.
Nested virtualization in particular is painful, as the timer flag is set
for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
pin controls to *clear* the flag since the timer's final state isn't
known until vmx_vcpu_run(). I.e. the majority of nested VM-Enters end
up unnecessarily writing pin controls *twice*.
Rather than toggle the timer flag in pin controls, set the timer value
itself to the largest allowed value to put it into a "soft disabled"
state, and ignore any spurious preemption timer exits.
Sadly, the timer is a 32-bit value and so theoretically it can fire
before the heat death of the universe, i.e. spurious exits are possible.
But because KVM does *not* save the timer value on VM-Exit and because
the timer runs at a slower rate than the TSC, the maximum timer value
is still sufficiently large for KVM's purposes. E.g. on a modern CPU
with a timer that runs at 1/32 the frequency of a 2.4ghz constant-rate
TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
execution. In other words, spurious VM-Exits are effectively only
possible if the host is completely tickless on the logical CPU, the
guest is not using the preemption timer, and the guest is not generating
VM-Exits for any other reason.
To be safe from bad/weird hardware, disable the preemption timer if its
maximum delay is less than ten seconds. Ten seconds is mostly arbitrary
and was selected in no small part because it's a nice round number.
For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
if the preemption timer is disabled by KVM or userspace. Previously
KVM continued to use the preemption timer to force immediate exits even
when the timer was disabled by userspace. Now that KVM leaves the timer
running instead of truly disabling it, allow userspace to kill it
entirely in the unlikely event the timer (or KVM) malfunctions.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	if (!enable_preemption_timer)
		pin_based_exec_ctrl &= ~PIN_BASED_VMX_PREEMPTION_TIMER;

	return pin_based_exec_ctrl;
}
static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));
	if (cpu_has_secondary_exec_ctrls()) {
		if (kvm_vcpu_apicv_active(vcpu))
			secondary_exec_controls_setbit(vmx,
				      SECONDARY_EXEC_APIC_REGISTER_VIRT |
				      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
		else
			secondary_exec_controls_clearbit(vmx,
					SECONDARY_EXEC_APIC_REGISTER_VIRT |
					SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
	}

	if (cpu_has_vmx_msr_bitmap())
		vmx_update_msr_bitmap(vcpu);
}
u32 vmx_exec_control(struct vcpu_vmx *vmx)
{
	u32 exec_control = vmcs_config.cpu_based_exec_ctrl;

	if (vmx->vcpu.arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)
		exec_control &= ~CPU_BASED_MOV_DR_EXITING;

	if (!cpu_need_tpr_shadow(&vmx->vcpu)) {
		exec_control &= ~CPU_BASED_TPR_SHADOW;
#ifdef CONFIG_X86_64
		exec_control |= CPU_BASED_CR8_STORE_EXITING |
				CPU_BASED_CR8_LOAD_EXITING;
#endif
	}
	if (!enable_ept)
		exec_control |= CPU_BASED_CR3_STORE_EXITING |
				CPU_BASED_CR3_LOAD_EXITING  |
				CPU_BASED_INVLPG_EXITING;
	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
				  CPU_BASED_MONITOR_EXITING);
	if (kvm_hlt_in_guest(vmx->vcpu.kvm))
		exec_control &= ~CPU_BASED_HLT_EXITING;
	return exec_control;
}
static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
{
	struct kvm_vcpu *vcpu = &vmx->vcpu;
	u32 exec_control = vmcs_config.cpu_based_2nd_exec_ctrl;

	if (vmx_pt_mode_is_system())
		exec_control &= ~(SECONDARY_EXEC_PT_USE_GPA | SECONDARY_EXEC_PT_CONCEAL_VMX);
	if (!cpu_need_virtualize_apic_accesses(vcpu))
		exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
	if (vmx->vpid == 0)
		exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
	if (!enable_ept) {
		exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
		enable_unrestricted_guest = 0;
	}
	if (!enable_unrestricted_guest)
		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
	if (kvm_pause_in_guest(vmx->vcpu.kvm))
		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
	if (!kvm_vcpu_apicv_active(vcpu))
		exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT |
				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
	exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;

	/* SECONDARY_EXEC_DESC is enabled/disabled on writes to CR4.UMIP,
	 * in vmx_set_cr4.  */
	exec_control &= ~SECONDARY_EXEC_DESC;

	/* SECONDARY_EXEC_SHADOW_VMCS is enabled when L1 executes VMPTRLD
	   (handle_vmptrld).
	   We can NOT enable shadow_vmcs here because we don't have yet
	   a current VMCS12 */
	exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;

	if (!enable_pml)
		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
	if (vmx_xsaves_supported()) {
		/* Exposing XSAVES only when XSAVE is exposed */
		bool xsaves_enabled =
			boot_cpu_has(X86_FEATURE_XSAVE) &&
			guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
			guest_cpuid_has(vcpu, X86_FEATURE_XSAVES);

		vcpu->arch.xsaves_enabled = xsaves_enabled;

		if (!xsaves_enabled)
			exec_control &= ~SECONDARY_EXEC_XSAVES;

		if (nested) {
			if (xsaves_enabled)
				vmx->nested.msrs.secondary_ctls_high |=
					SECONDARY_EXEC_XSAVES;
			else
				vmx->nested.msrs.secondary_ctls_high &=
					~SECONDARY_EXEC_XSAVES;
		}
	}
	if (cpu_has_vmx_rdtscp()) {
		bool rdtscp_enabled = guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP);

		if (!rdtscp_enabled)
			exec_control &= ~SECONDARY_EXEC_RDTSCP;

		if (nested) {
			if (rdtscp_enabled)
				vmx->nested.msrs.secondary_ctls_high |=
					SECONDARY_EXEC_RDTSCP;
			else
				vmx->nested.msrs.secondary_ctls_high &=
					~SECONDARY_EXEC_RDTSCP;
		}
	}

	if (cpu_has_vmx_invpcid()) {
		/* Exposing INVPCID only when PCID is exposed */
		bool invpcid_enabled =
			guest_cpuid_has(vcpu, X86_FEATURE_INVPCID) &&
			guest_cpuid_has(vcpu, X86_FEATURE_PCID);

		if (!invpcid_enabled) {
			exec_control &= ~SECONDARY_EXEC_ENABLE_INVPCID;
			guest_cpuid_clear(vcpu, X86_FEATURE_INVPCID);
		}

		if (nested) {
			if (invpcid_enabled)
				vmx->nested.msrs.secondary_ctls_high |=
					SECONDARY_EXEC_ENABLE_INVPCID;
			else
				vmx->nested.msrs.secondary_ctls_high &=
					~SECONDARY_EXEC_ENABLE_INVPCID;
		}
	}

	if (vmx_rdrand_supported()) {
		bool rdrand_enabled = guest_cpuid_has(vcpu, X86_FEATURE_RDRAND);

		if (rdrand_enabled)
			exec_control &= ~SECONDARY_EXEC_RDRAND_EXITING;

		if (nested) {
			if (rdrand_enabled)
				vmx->nested.msrs.secondary_ctls_high |=
					SECONDARY_EXEC_RDRAND_EXITING;
			else
				vmx->nested.msrs.secondary_ctls_high &=
					~SECONDARY_EXEC_RDRAND_EXITING;
		}
	}

	if (vmx_rdseed_supported()) {
		bool rdseed_enabled = guest_cpuid_has(vcpu, X86_FEATURE_RDSEED);

		if (rdseed_enabled)
			exec_control &= ~SECONDARY_EXEC_RDSEED_EXITING;

		if (nested) {
			if (rdseed_enabled)
				vmx->nested.msrs.secondary_ctls_high |=
					SECONDARY_EXEC_RDSEED_EXITING;
			else
				vmx->nested.msrs.secondary_ctls_high &=
					~SECONDARY_EXEC_RDSEED_EXITING;
		}
	}
	/*
	 * KVM: x86: Add support for user wait instructions
	 *
	 * UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.
	 * Their availability is indicated by the CPUID feature flag WAITPKG,
	 * CPUID.0x07.0x0:ECX[5].  User wait instructions may be executed at
	 * any privilege level, and use the 32-bit IA32_UMWAIT_CONTROL MSR to
	 * set the maximum wait time.
	 *
	 * The behavior of user wait instructions in VMX non-root operation
	 * is determined first by the setting of the "enable user wait and
	 * pause" secondary processor-based VM-execution control (bit 26).
	 * If that control is 0, UMONITOR/UMWAIT/TPAUSE cause an
	 * invalid-opcode exception (#UD).  If it is 1, treatment is based on
	 * the setting of the "RDTSC exiting" VM-execution control.  Because
	 * KVM never enables RDTSC exiting, if the instruction causes a
	 * delay, the amount of time delayed (the physical delay) is computed
	 * from the virtual delay: if IA32_UMWAIT_CONTROL[31:2] is zero, the
	 * virtual delay is the value in EDX:EAX minus the value that RDTSC
	 * would return; otherwise it is the minimum of that difference and
	 * AND(IA32_UMWAIT_CONTROL, FFFFFFFCH).
	 *
	 * Because UMWAIT and TPAUSE can put a (physical) CPU into a
	 * power-saving state, by default we don't expose them to KVM and
	 * enable them only when the guest CPUID has the feature.  Detailed
	 * information about user wait instructions can be found in the
	 * latest Intel 64 and IA-32 Architectures Software Developer's
	 * Manual.
	 *
	 * Co-developed-by: Jingqi Liu <jingqi.liu@intel.com>
	 * Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
	 * Signed-off-by: Tao Xu <tao3.xu@intel.com>
	 * Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	 */
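The virtual-delay rule quoted above can be sketched as a small standalone helper (illustrative only, not KVM code; the function name and parameters are hypothetical):

```c
#include <stdint.h>

/*
 * Sketch of the virtual-delay rule for UMWAIT/TPAUSE: if
 * IA32_UMWAIT_CONTROL[31:2] is zero, the delay is the TSC deadline in
 * EDX:EAX minus the current TSC; otherwise it is capped at
 * AND(IA32_UMWAIT_CONTROL, FFFFFFFCH).  Bits 1:0 of the MSR are control
 * bits, not part of the maximum wait time.
 */
static uint64_t umwait_virtual_delay(uint64_t tsc_deadline, uint64_t tsc_now,
				     uint32_t umwait_control)
{
	uint64_t delay = tsc_deadline > tsc_now ? tsc_deadline - tsc_now : 0;
	uint64_t cap = umwait_control & ~3u;	/* mask control bits 1:0 */

	if (cap && delay > cap)
		delay = cap;
	return delay;
}
```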
	if (vmx_waitpkg_supported()) {
		bool waitpkg_enabled =
			guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG);

		if (!waitpkg_enabled)
			exec_control &= ~SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;

		if (nested) {
			if (waitpkg_enabled)
				vmx->nested.msrs.secondary_ctls_high |=
					SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
			else
				vmx->nested.msrs.secondary_ctls_high &=
					~SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
		}
	}

	vmx->secondary_exec_control = exec_control;
}
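Every feature handled above follows the same shape: clear the execution control when the guest's CPUID does not advertise the feature, and mirror the result into the controls advertised to a nested hypervisor. A condensed sketch of that pattern (names are illustrative, not the KVM API):

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Generic form of the per-feature update used in
 * vmx_compute_secondary_exec_control(): drop the control bit when the
 * guest lacks the feature, and keep the nested "high" control bits in
 * sync so an L1 hypervisor sees a consistent capability set.
 */
static void update_exec_control(uint32_t *exec_control,
				uint32_t *nested_ctls_high,
				uint32_t bit, bool guest_has_feature,
				bool nested)
{
	if (!guest_has_feature)
		*exec_control &= ~bit;

	if (nested) {
		if (guest_has_feature)
			*nested_ctls_high |= bit;
		else
			*nested_ctls_high &= ~bit;
	}
}
```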

static void ept_set_mmio_spte_mask(void)
{
	/*
	 * EPT Misconfigurations can be generated if the value of bits 2:0
	 * of an EPT paging-structure entry is 110b (write/execute).
	 */
	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE, 0);
}
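The comment above relies on bits 2:0 of an EPT entry encoding read/write/execute permissions, with the write+execute-without-read combination (110b) being an EPT misconfiguration. A minimal sketch of that permission check (illustrative, not KVM code):

```c
#include <stdint.h>
#include <stdbool.h>

/* EPT paging-structure entry permission bits (bits 2:0). */
#define EPT_READ	0x1ULL
#define EPT_WRITE	0x2ULL
#define EPT_EXEC	0x4ULL

/*
 * An entry whose low three bits are 110b (write/execute, no read) is
 * the misconfigured pattern KVM deliberately installs for MMIO pages,
 * so that guest accesses raise an EPT-misconfiguration exit.
 */
static bool is_wx_misconfig(uint64_t epte)
{
	return (epte & (EPT_READ | EPT_WRITE | EPT_EXEC)) ==
	       (EPT_WRITE | EPT_EXEC);
}
```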

#define VMX_XSS_EXIT_BITMAP 0

/*
 * Note that the initialization of the Guest-state area of the VMCS is
 * done in vmx_vcpu_reset().
 */
static void init_vmcs(struct vcpu_vmx *vmx)
{
	if (nested)
		nested_vmx_set_vmcs_shadowing_bitmap();

	if (cpu_has_vmx_msr_bitmap())
		vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap));
	vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */

	/* Control */
	pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));

	exec_controls_set(vmx, vmx_exec_control(vmx));
	if (cpu_has_secondary_exec_ctrls()) {
		vmx_compute_secondary_exec_control(vmx);
		secondary_exec_controls_set(vmx, vmx->secondary_exec_control);
	}

	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
		vmcs_write64(EOI_EXIT_BITMAP0, 0);
		vmcs_write64(EOI_EXIT_BITMAP1, 0);
		vmcs_write64(EOI_EXIT_BITMAP2, 0);
		vmcs_write64(EOI_EXIT_BITMAP3, 0);
		vmcs_write16(GUEST_INTR_STATUS, 0);

		vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
	}

	if (!kvm_pause_in_guest(vmx->vcpu.kvm)) {
		vmcs_write32(PLE_GAP, ple_gap);
		vmx->ple_window = ple_window;
		vmx->ple_window_dirty = true;
	}

	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, 0);
	vmcs_write32(CR3_TARGET_COUNT, 0);  /* 22.2.1 */

	vmcs_write16(HOST_FS_SELECTOR, 0);  /* 22.2.4 */
	vmcs_write16(HOST_GS_SELECTOR, 0);  /* 22.2.4 */

	vmx_set_constant_host_state(vmx);
	vmcs_writel(HOST_FS_BASE, 0); /* 22.2.4 */
	vmcs_writel(HOST_GS_BASE, 0); /* 22.2.4 */

	if (cpu_has_vmx_vmfunc())
		vmcs_write64(VM_FUNCTION_CONTROL, 0);

	vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0);
	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, 0);
	vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host.val));
	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 0);
	vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest.val));
	if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);

	vm_exit_controls_set(vmx, vmx_vmexit_ctrl());
	/* 22.2.1, 20.8.1 */
	vm_entry_controls_set(vmx, vmx_vmentry_ctrl());

	vmx->vcpu.arch.cr0_guest_owned_bits = X86_CR0_TS;
	vmcs_writel(CR0_GUEST_HOST_MASK, ~X86_CR0_TS);

	set_cr4_guest_host_mask(vmx);

	if (vmx->vpid != 0)
		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);

	if (vmx_xsaves_supported())
		vmcs_write64(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);

	if (enable_pml) {
		vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
		vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
	}
	/*
	 * KVM: vmx: Inject #UD for the SGX ENCLS instruction in the guest.
	 *
	 * Virtualization of Intel SGX depends on Enclave Page Cache (EPC)
	 * management that is not yet available in the kernel, i.e. KVM
	 * support for exposing SGX to a guest cannot be added until basic
	 * SGX support is upstreamed (a WIP, see
	 * https://www.spinics.net/lists/kvm/msg171333.html).
	 *
	 * Until SGX is properly supported in KVM, ensure a guest sees the
	 * expected behavior for ENCLS: all ENCLS leafs #UD.  Because SGX
	 * has no true software enable bit, e.g. there is no CR4.SGXE bit,
	 * the ENCLS instruction can be executed by the guest whenever SGX
	 * is supported by the system.  Intercept all ENCLS leafs (via the
	 * ENCLS-exiting control and bitmap) and unconditionally inject #UD.
	 *
	 * A guest can execute ENCLS only in the sense that ENCLS will not
	 * take an immediate #UD; no ENCLS leaf can ever succeed in a guest
	 * without explicit support from KVM (mapping EPC memory into the
	 * guest).  This interception is therefore needed only to keep the
	 * guest from seeing inconsistent behavior, e.g. #GP (SGX not
	 * enabled in the Feature Control MSR) or #PF (leaf operand does
	 * not point at EPC memory) instead of #UD.  It is not required to
	 * prevent the guest from truly utilizing SGX.
	 *
	 * Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
	 * Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	 */
	if (cpu_has_vmx_encls_vmexit())
		vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);

	if (vmx_pt_mode_is_host_guest()) {
		memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc));
		/* Bit[6~0] are forced to 1, writes are ignored. */
		vmx->pt_desc.guest.output_mask = 0x7F;
		vmcs_write64(GUEST_IA32_RTIT_CTL, 0);
	}
}
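The ENCLS policy described in the commit message above can be modeled in a few lines: with the exiting bitmap set to all ones, every ENCLS leaf (selected by EAX) traps to the hypervisor, which injects #UD. This is a minimal userspace sketch, not kernel code; `encls_exit_vector()` is a hypothetical helper invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* x86 #UD exception vector number. */
#define UD_VECTOR 6

/*
 * Hypothetical model of the intercept policy: a leaf traps iff its bit
 * is set in the ENCLS-exiting bitmap, and every trapped leaf results in
 * an injected #UD. Writing -1ull to the bitmap therefore makes *all*
 * leafs #UD, regardless of SGX enablement state.
 */
static int encls_exit_vector(uint64_t encls_exiting_bitmap, uint32_t leaf)
{
	if (encls_exiting_bitmap & (1ull << (leaf & 63)))
		return UD_VECTOR;	/* unconditionally inject #UD */
	return 0;			/* leaf would execute in the guest */
}
```

With the bitmap set to `-1ull` as in the code above, every leaf from 0 to 63 maps to `UD_VECTOR`, which is exactly the "all ENCLS leafs #UD" behavior the commit message asks for.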
KVM: x86: INIT and reset sequences are different
The x86 architecture defines differences between the reset and INIT
sequences. INIT does not initialize the FPU (including MMX, XMM, YMM,
etc.), TSC, PMU, MSRs (in general), MTRRs, machine-check registers,
APIC ID, APIC arbitration ID and BSP.
References (from the Intel SDM):
"If the MP protocol has completed and a BSP is chosen, subsequent INITs
(either to a specific processor or system wide) do not cause the MP
protocol to be repeated." [8.4.2: MP Initialization Protocol
Requirements and Restrictions]
[Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]
"If the processor is reset by asserting the INIT# pin, the x87 FPU
state is not changed." [9.2: X87 FPU INITIALIZATION]
"The state of the local APIC following an INIT reset is the same as it
is after a power-up or hardware reset, except that the APIC ID and
arbitration ID registers are not affected." [10.4.7.3: Local APIC State
After an INIT Reset ("Wait-for-SIPI" State)]
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-13 05:34:08 -06:00
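The RESET-vs-INIT distinction quoted above can be sketched as a toy model: some state is (re)initialized for both events, while other state is touched only on a full reset, mirroring the `!init_event` guards in vmx_vcpu_reset() below. All names in this sketch are illustrative only, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy vCPU with flags standing in for the state blocks in question. */
struct toy_vcpu {
	bool segments_reset;		/* done on both RESET and INIT */
	bool apic_base_initialized;	/* RESET only, per SDM Table 9-1 */
	bool sysenter_cleared;		/* RESET only (MSRs in general) */
};

static void toy_vcpu_reset(struct toy_vcpu *v, bool init_event)
{
	v->segments_reset = true;
	if (!init_event) {
		/* Full reset: also initialize state that INIT skips. */
		v->apic_base_initialized = true;
		v->sysenter_cleared = true;
	}
}
```

Calling `toy_vcpu_reset(&v, true)` (an INIT) resets the segments but leaves the APIC base and SYSENTER MSRs alone, while `toy_vcpu_reset(&v, false)` (a power-up/RESET) initializes everything.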
static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct msr_data apic_base_msr;
	u64 cr0;

	vmx->rmode.vm86_active = 0;
	vmx->spec_ctrl = 0;

	vmx->msr_ia32_umwait_control = 0;

	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
	vmx->hv_deadline_tsc = -1;
	kvm_set_cr8(vcpu, 0);

	if (!init_event) {
		apic_base_msr.data = APIC_DEFAULT_PHYS_BASE |
				     MSR_IA32_APICBASE_ENABLE;
		if (kvm_vcpu_is_reset_bsp(vcpu))
			apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
		apic_base_msr.host_initiated = true;
		kvm_set_apic_base(vcpu, &apic_base_msr);
	}
	vmx_segment_cache_clear(vmx);

	seg_setup(VCPU_SREG_CS);
	vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
	vmcs_writel(GUEST_CS_BASE, 0xffff0000ul);

	seg_setup(VCPU_SREG_DS);
	seg_setup(VCPU_SREG_ES);
	seg_setup(VCPU_SREG_FS);
	seg_setup(VCPU_SREG_GS);
	seg_setup(VCPU_SREG_SS);

	vmcs_write16(GUEST_TR_SELECTOR, 0);
	vmcs_writel(GUEST_TR_BASE, 0);
	vmcs_write32(GUEST_TR_LIMIT, 0xffff);
	vmcs_write32(GUEST_TR_AR_BYTES, 0x008b);

	vmcs_write16(GUEST_LDTR_SELECTOR, 0);
	vmcs_writel(GUEST_LDTR_BASE, 0);
	vmcs_write32(GUEST_LDTR_LIMIT, 0xffff);
	vmcs_write32(GUEST_LDTR_AR_BYTES, 0x00082);
	if (!init_event) {
		vmcs_write32(GUEST_SYSENTER_CS, 0);
		vmcs_writel(GUEST_SYSENTER_ESP, 0);
		vmcs_writel(GUEST_SYSENTER_EIP, 0);
		vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
	}
KVM: VMX: Fix rflags cache during vCPU reset
Reported by syzkaller:
*** Guest State ***
CR0: actual=0x0000000080010031, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
CR4: actual=0x0000000000002061, shadow=0x0000000000000000, gh_mask=ffffffffffffe8f1
CR3 = 0x000000002081e000
RSP = 0x000000000000fffa RIP = 0x0000000000000000
RFLAGS=0x00023000 DR7 = 0x00000000000000
       ^^^^^^^^^^
------------[ cut here ]------------
WARNING: CPU: 6 PID: 24431 at /home/kernel/linux/arch/x86/kvm//x86.c:7302 kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
CPU: 6 PID: 24431 Comm: reprotest Tainted: G W OE 4.14.0+ #26
RIP: 0010:kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
RSP: 0018:ffff880291d179e0 EFLAGS: 00010202
Call Trace:
 kvm_vcpu_ioctl+0x479/0x880 [kvm]
 do_vfs_ioctl+0x142/0x9a0
 SyS_ioctl+0x74/0x80
 entry_SYSCALL_64_fastpath+0x23/0x9a
The failed vmentry is triggered by the following beautified testcase:

  #include <unistd.h>
  #include <sys/syscall.h>
  #include <string.h>
  #include <stdint.h>
  #include <linux/kvm.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>

  long r[5];

  int main()
  {
  	struct kvm_debugregs dr = { 0 };

  	r[2] = open("/dev/kvm", O_RDONLY);
  	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
  	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
  	struct kvm_guest_debug debug = {
  		.control = 0xf0403,
  		.arch = {
  			.debugreg[6] = 0x2,
  			.debugreg[7] = 0x2
  		}
  	};
  	ioctl(r[4], KVM_SET_GUEST_DEBUG, &debug);
  	ioctl(r[4], KVM_RUN, 0);
  }

The testcase tries to set up the processor-specific debug registers and
configure the vCPU for handling guest debug events through
KVM_SET_GUEST_DEBUG. The KVM_SET_GUEST_DEBUG ioctl gets and sets rflags
in order to set the TF bit if single-stepping is needed. During vCPU
reset, all register caches are marked available and the GUEST_RFLAGS
vmcs field is reset to 0x2; however, the rflags cache itself is not
reset. vmx_get_rflags() therefore returns a stale cached value (0 after
boot) because the cache is marked available, and vmentry fails because
reserved bit 1 of rflags is 0.
This patch fixes it by resetting both the GUEST_RFLAGS vmcs field and
its cache to 0x2 during vCPU reset.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-20 15:52:21 -07:00
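The invariant the fix above restores is small enough to state directly: RFLAGS bit 1 is architecturally reserved and must read as 1, so a cached rflags of 0 (or the syzkaller value 0x00023000) can never pass vmentry checks. This is a hedged userspace sketch; `rflags_valid_for_vmentry()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* RFLAGS bit 1 is reserved and must be 1; Linux calls this X86_EFLAGS_FIXED. */
#define X86_EFLAGS_FIXED 0x0002ul

/*
 * Model of the vmentry guest-state check that fired in the report: a
 * cached rflags value is only acceptable if reserved bit 1 is set, so
 * the reset path must seed both the GUEST_RFLAGS field and the cache
 * with X86_EFLAGS_FIXED rather than leave the cache at 0.
 */
static int rflags_valid_for_vmentry(uint64_t rflags)
{
	return (rflags & X86_EFLAGS_FIXED) != 0;
}
```

Seeding the cache with `X86_EFLAGS_FIXED`, as `kvm_set_rflags(vcpu, X86_EFLAGS_FIXED)` does below, makes the cached value and the GUEST_RFLAGS field agree and satisfies this check.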
	kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
	kvm_rip_write(vcpu, 0xfff0);

	vmcs_writel(GUEST_GDTR_BASE, 0);
	vmcs_write32(GUEST_GDTR_LIMIT, 0xffff);

	vmcs_writel(GUEST_IDTR_BASE, 0);
	vmcs_write32(GUEST_IDTR_LIMIT, 0xffff);

	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
	vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0);

	if (kvm_mpx_supported())
		vmcs_write64(GUEST_BNDCFGS, 0);

	setup_msrs(vmx);
	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);	/* 22.2.1 */
	if (cpu_has_vmx_tpr_shadow() && !init_event) {
		vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
		if (cpu_need_tpr_shadow(vcpu))
			vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
				     __pa(vcpu->arch.apic->regs));
		vmcs_write32(TPR_THRESHOLD, 0);
	}

	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
	cr0 = X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
	vmx->vcpu.arch.cr0 = cr0;
	vmx_set_cr0(vcpu, cr0);	/* enter rmode */
	vmx_set_cr4(vcpu, 0);
	vmx_set_efer(vcpu, 0);
	update_exception_bitmap(vcpu);
	vpid_sync_context(vmx->vpid);

	if (init_event)
		vmx_clear_hlt(vcpu);
}

static void enable_irq_window(struct kvm_vcpu *vcpu)
{
	exec_controls_setbit(to_vmx(vcpu), CPU_BASED_INTR_WINDOW_EXITING);
}

static void enable_nmi_window(struct kvm_vcpu *vcpu)
{
	if (!enable_vnmi ||
	    vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_STI) {
		enable_irq_window(vcpu);
		return;
	}

	exec_controls_setbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING);
}
static void vmx_inject_irq(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	uint32_t intr;
	int irq = vcpu->arch.interrupt.nr;

	trace_kvm_inj_virq(irq);

	++vcpu->stat.irq_injections;
	if (vmx->rmode.vm86_active) {
		int inc_eip = 0;
		if (vcpu->arch.interrupt.soft)
			inc_eip = vcpu->arch.event_exit_inst_len;
		kvm_inject_realmode_interrupt(vcpu, irq, inc_eip);
		return;
	}
	intr = irq | INTR_INFO_VALID_MASK;
	if (vcpu->arch.interrupt.soft) {
		intr |= INTR_TYPE_SOFT_INTR;
		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
			     vmx->vcpu.arch.event_exit_inst_len);
	} else
		intr |= INTR_TYPE_EXT_INTR;
	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);

	vmx_clear_hlt(vcpu);
}
2008-05-15 04:23:25 -06:00
static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
{
2008-09-26 01:30:51 -06:00
	struct vcpu_vmx *vmx = to_vmx(vcpu);
2017-11-06 05:31:13 -07:00
	if (!enable_vnmi) {
2017-11-06 05:31:12 -07:00
		/*
		 * Tracking the NMI-blocked state in software is built upon
		 * finding the next open IRQ window. This, in turn, depends on
		 * well-behaved guests: they have to keep IRQs disabled at
		 * least as long as the NMI handler runs. Otherwise we may
		 * cause NMI nesting, maybe breaking the guest. But as this is
		 * highly unlikely, we can live with the residual risk.
		 */
		vmx->loaded_vmcs->soft_vnmi_blocked = 1;
		vmx->loaded_vmcs->vnmi_blocked_time = 0;
	}
2017-07-14 05:36:11 -06:00
	++vcpu->stat.nmi_injections;
	vmx->loaded_vmcs->nmi_known_unmasked = false;
2008-09-26 01:30:57 -06:00
2009-06-09 05:10:45 -06:00
	if (vmx->rmode.vm86_active) {
2019-08-27 15:40:36 -06:00
		kvm_inject_realmode_interrupt(vcpu, NMI_VECTOR, 0);
2008-09-26 01:30:51 -06:00
		return;
	}
KVM: nVMX: Fix the NMI IDT-vectoring handling
Run kvm-unit-tests/eventinj.flat in L1:
Sending NMI to self
After NMI to self
FAIL: NMI
This scenario tests whether the VMM handles NMI IDT-vectoring info correctly.
At the beginning, L2 writes the LAPIC to send itself an NMI; the EPT page tables on both L1
and L0 are empty, so:
- L2's memory access generates an EPT violation that is intercepted by L0.
The EPT violation vmexit occurred during delivery of this NMI, and the NMI info is
recorded in vmcs02's IDT-vectoring info.
- L0 walks L1's EPT12, sees that the mapping is invalid, and injects the EPT violation into L1.
The vmcs02's IDT-vectoring info is reflected to vmcs12's IDT-vectoring info since
it is a nested vmexit.
- L1 receives the EPT violation, then fixes its EPT12.
- L1 executes VMRESUME to resume L2, which generates a vmexit and causes L1 to exit to L0.
- L0 emulates the VMRESUME called from L1, then returns to L2.
L0 merges the requirements of vmcs12's IDT-vectoring info and injects it into L2 through
vmcs02.
- L2 re-executes the faulting instruction and causes an EPT violation again.
- Since L1's EPT12 is now valid, L0 can fix its EPT02.
- L0 resumes L2.
The EPT violation vmexit occurred during delivery of this NMI again, and the NMI info
is recorded in vmcs02's IDT-vectoring info. L0 should inject the NMI through vmentry
event injection since it is caused by EPT02's EPT violation.
However, vmx_inject_nmi() refuses to inject an NMI from IDT-vectoring info if the vCPU is in
guest mode; this patch fixes that by permitting NMI injection from IDT-vectoring info when it
is L0's responsibility to inject the NMI into L2.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Bandan Das <bsd@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-22 03:55:54 -06:00
2008-05-15 04:23:25 -06:00
	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
		     INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
2018-03-12 05:53:03 -06:00
	vmx_clear_hlt(vcpu);
2008-05-15 04:23:25 -06:00
}
2018-12-03 14:53:16 -07:00
bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
2009-11-11 17:04:25 -07:00
{
2017-07-14 05:36:11 -06:00
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	bool masked;
2017-11-06 05:31:13 -07:00
	if (!enable_vnmi)
2017-11-06 05:31:12 -07:00
		return vmx->loaded_vmcs->soft_vnmi_blocked;
2017-07-14 05:36:11 -06:00
	if (vmx->loaded_vmcs->nmi_known_unmasked)
2011-03-07 07:52:07 -07:00
		return false;
2017-07-14 05:36:11 -06:00
	masked = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_NMI;
	vmx->loaded_vmcs->nmi_known_unmasked = !masked;
	return masked;
2009-11-11 17:04:25 -07:00
}
2018-12-03 14:53:16 -07:00
void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
2009-11-11 17:04:25 -07:00
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
2017-11-06 05:31:13 -07:00
	if (!enable_vnmi) {
2017-11-06 05:31:12 -07:00
		if (vmx->loaded_vmcs->soft_vnmi_blocked != masked) {
			vmx->loaded_vmcs->soft_vnmi_blocked = masked;
			vmx->loaded_vmcs->vnmi_blocked_time = 0;
		}
	} else {
		vmx->loaded_vmcs->nmi_known_unmasked = !masked;
		if (masked)
			vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
				      GUEST_INTR_STATE_NMI);
		else
			vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
					GUEST_INTR_STATE_NMI);
	}
2009-11-11 17:04:25 -07:00
}
2020-04-22 20:25:44 -06:00
bool vmx_nmi_blocked(struct kvm_vcpu *vcpu)
{
	if (is_guest_mode(vcpu) && nested_exit_on_nmi(vcpu))
		return false;
	if (!enable_vnmi && to_vmx(vcpu)->loaded_vmcs->soft_vnmi_blocked)
		return true;
	return (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
		(GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_STI |
		 GUEST_INTR_STATE_NMI));
}
2020-05-22 09:21:49 -06:00
static int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
2013-04-14 04:12:47 -06:00
{
2014-03-07 12:03:12 -07:00
	if (to_vmx(vcpu)->nested.nested_run_pending)
2020-05-22 09:21:49 -06:00
		return -EBUSY;
2013-04-14 13:04:26 -06:00
2020-04-23 12:08:58 -06:00
	/* An NMI must not be injected into L2 if it's supposed to VM-Exit. */
	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_nmi(vcpu))
2020-05-22 09:21:49 -06:00
		return -EBUSY;
2020-04-23 12:08:58 -06:00
2020-04-22 20:25:44 -06:00
	return !vmx_nmi_blocked(vcpu);
}
2020-04-22 20:25:42 -06:00
2020-04-22 20:25:44 -06:00
bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
{
	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
2020-04-22 20:25:41 -06:00
		return false;
2017-11-06 05:31:12 -07:00
2020-04-22 20:25:50 -06:00
	return !(vmx_get_rflags(vcpu) & X86_EFLAGS_IF) ||
2020-04-22 20:25:44 -06:00
	       (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
		(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
2013-04-14 04:12:47 -06:00
}
2020-05-22 09:21:49 -06:00
static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
2009-03-23 04:12:11 -06:00
{
2020-03-02 23:27:35 -07:00
	if (to_vmx(vcpu)->nested.nested_run_pending)
2020-05-22 09:21:49 -06:00
		return -EBUSY;
2020-03-02 23:27:35 -07:00
2020-04-23 12:08:58 -06:00
	/*
	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
	 * e.g. if the IRQ arrived asynchronously after checking nested events.
	 */
	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
2020-05-22 09:21:49 -06:00
		return -EBUSY;
2020-04-23 12:08:58 -06:00
2020-04-22 20:25:44 -06:00
	return !vmx_interrupt_blocked(vcpu);
2009-03-23 04:12:11 -06:00
}
2007-10-24 16:29:55 -06:00
static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
{
	int ret;
2018-03-05 13:04:36 -07:00
	if (enable_unrestricted_guest)
		return 0;
2020-01-09 07:57:16 -07:00
	mutex_lock(&kvm->slots_lock);
	ret = __x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, addr,
				      PAGE_SIZE * 3);
	mutex_unlock(&kvm->slots_lock);
2007-10-24 16:29:55 -06:00
	if (ret)
		return ret;
2018-03-20 13:17:20 -06:00
	to_kvm_vmx(kvm)->tss_addr = addr;
2014-09-16 05:37:40 -06:00
	return init_rmode_tss(kvm);
2007-10-24 16:29:55 -06:00
}
2018-03-20 13:17:19 -06:00
static int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
{
2018-03-20 13:17:20 -06:00
	to_kvm_vmx(kvm)->ept_identity_map_addr = ident_addr;
2018-03-20 13:17:19 -06:00
	return 0;
}
2012-12-20 07:57:47 -07:00
static bool rmode_exception(struct kvm_vcpu *vcpu, int vec)
{
2008-07-14 04:28:51 -06:00
	switch (vec) {
	case BP_VECTOR:
2010-02-23 09:47:53 -07:00
		/*
		 * Update instruction length as we may reinject the exception
		 * from user space while in guest debugging mode.
		 */
		to_vmx(vcpu)->vcpu.arch.event_exit_inst_len =
			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
2008-12-15 05:52:10 -07:00
		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
2012-12-20 07:57:47 -07:00
			return false;
		/* fall through */
	case DB_VECTOR:
2020-02-18 19:45:48 -07:00
		return !(vcpu->guest_debug &
			 (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP));
2008-12-15 05:52:10 -07:00
	case DE_VECTOR:
2008-07-14 04:28:51 -06:00
	case OF_VECTOR:
	case BR_VECTOR:
	case UD_VECTOR:
	case DF_VECTOR:
	case SS_VECTOR:
	case GP_VECTOR:
	case MF_VECTOR:
2012-12-20 07:57:47 -07:00
		return true;
2008-07-14 04:28:51 -06:00
	}
2012-12-20 07:57:47 -07:00
	return false;
}
static int handle_rmode_exception(struct kvm_vcpu *vcpu,
				  int vec, u32 err_code)
{
	/*
	 * An instruction with the address-size override prefix (opcode 0x67)
	 * causes an #SS fault with error code 0 in VM86 mode.
	 */
	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0) {
KVM: x86: Remove emulation_result enums, EMULATE_{DONE,FAIL,USER_EXIT}
Deferring emulation failure handling (in some cases) to the caller of
x86_emulate_instruction() has proven fragile, e.g. multiple instances of
KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it
being difficult to discern what emulation types can return what result,
and which combination of types and results are handled where.
Now that x86_emulate_instruction() always handles emulation failure,
i.e. EMULATE_FAIL is only referenced in callers, remove the
emulation_result enums entirely. Per KVM's existing exit handling
conventions, return '0' and '1' for "exit to userspace" and "resume
guest" respectively. Doing so cleans up many callers, e.g. they can
return kvm_emulate_instruction() directly instead of having to interpret
its result.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
		if (kvm_emulate_instruction(vcpu, 0)) {
2012-12-20 07:57:47 -07:00
			if (vcpu->arch.halt_request) {
				vcpu->arch.halt_request = 0;
2015-03-02 12:43:31 -07:00
				return kvm_vcpu_halt(vcpu);
2012-12-20 07:57:47 -07:00
			}
			return 1;
		}
		return 0;
	}
	/*
	 * Forward all other exceptions that are valid in real mode.
	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
	 * the required debugging infrastructure rework.
	 */
	kvm_queue_exception(vcpu, vec);
	return 1;
}
2009-06-08 03:37:09 -06:00
/*
 * Trigger machine check on the host. We assume all the MSRs are already set up
 * by the CPU and that we still run on the same CPU as the MCE occurred on.
 * We pass a fake environment to the machine check handler because we want
 * the guest to be always treated like user space, no matter what context
 * it used internally.
 */
static void kvm_machine_check(void)
{
2020-04-14 01:14:14 -06:00
#if defined(CONFIG_X86_MCE)
2009-06-08 03:37:09 -06:00
	struct pt_regs regs = {
		.cs = 3, /* Fake ring 3 no matter what the guest ran on */
		.flags = X86_EFLAGS_IF,
	};
2020-02-25 15:33:23 -07:00
	do_machine_check(&regs);
2009-06-08 03:37:09 -06:00
#endif
}
2009-08-24 02:10:17 -06:00
static int handle_machine_check(struct kvm_vcpu *vcpu)
2009-06-08 03:37:09 -06:00
{
2019-04-19 23:50:59 -06:00
	/* handled by vmx_vcpu_run() */
2009-06-08 03:37:09 -06:00
	return 1;
}
2020-04-10 05:54:02 -06:00
/*
 * If the host has split lock detection disabled, then #AC is
 * unconditionally injected into the guest, which is the pre split lock
 * detection behaviour.
 *
 * If the host has split lock detection enabled then #AC is
 * only injected into the guest when:
 *  - Guest CPL == 3 (user mode)
 *  - Guest has #AC detection enabled in CR0
 *  - Guest EFLAGS has AC bit set
 */
static inline bool guest_inject_ac(struct kvm_vcpu *vcpu)
{
	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
		return true;
	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
}
2019-04-19 23:50:59 -06:00
static int handle_exception_nmi(struct kvm_vcpu *vcpu)
{
2007-11-22 02:30:47 -07:00
	struct vcpu_vmx *vmx = to_vmx(vcpu);
2009-08-24 02:10:17 -06:00
	struct kvm_run *kvm_run = vcpu->run;
2008-12-15 05:52:10 -07:00
	u32 intr_info, ex_no, error_code;
2008-12-15 05:52:10 -07:00
	unsigned long cr2, rip, dr6;
	u32 vect_info;
2007-11-22 02:30:47 -07:00
	vect_info = vmx->idt_vectoring_info;
2020-04-27 11:18:37 -06:00
	intr_info = vmx_get_intr_info(vcpu);
2019-06-06 06:57:25 -06:00
	if (is_machine_check(intr_info) || is_nmi(intr_info))
2019-04-19 23:50:59 -06:00
		return 1; /* handled by handle_exception_nmi_irqoff() */
2007-04-27 00:29:49 -06:00
2018-04-03 17:28:48 -06:00
	if (is_invalid_opcode(intr_info))
		return handle_ud(vcpu);
2007-09-17 13:57:50 -06:00
error_code = 0;
if (intr_info & INTR_INFO_DELIVER_CODE_MASK)
        error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE);

if (!vmx->rmode.vm86_active && is_gp_fault(intr_info)) {
        WARN_ON_ONCE(!enable_vmware_backdoor);

        /*
         * VMware backdoor emulation on #GP interception only handles
         * IN{S}, OUT{S}, and RDPMC, none of which generate a non-zero
         * error code on #GP.
         */
        if (error_code) {
                kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
                return 1;
        }
KVM: x86: Remove emulation_result enums, EMULATE_{DONE,FAIL,USER_EXIT}
Deferring emulation failure handling (in some cases) to the caller of
x86_emulate_instruction() has proven fragile, e.g. multiple instances of
KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it
being difficult to discern what emulation types can return what result,
and which combination of types and results are handled where.
Now that x86_emulate_instruction() always handles emulation failure,
i.e. EMULATION_FAIL is only referenced in callers, remove the
emulation_result enums entirely. Per KVM's existing exit handling
conventions, return '0' and '1' for "exit to userspace" and "resume
guest" respectively. Doing so cleans up many callers, e.g. they can
return kvm_emulate_instruction() directly instead of having to interpret
its result.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
        return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
}

/*
 * The #PF with PFEC.RSVD = 1 indicates the guest is accessing
 * MMIO, it is better to report an internal error.
 * See the comments in vmx_handle_exit.
 */
if ((vect_info & VECTORING_INFO_VALID_MASK) &&
    !(is_page_fault(intr_info) && !(error_code & PFERR_RSVD_MASK))) {
        vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
        vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_SIMUL_EX;
        vcpu->run->internal.ndata = 3;
        vcpu->run->internal.data[0] = vect_info;
        vcpu->run->internal.data[1] = intr_info;
        vcpu->run->internal.data[2] = error_code;
        return 0;
}
if (is_page_fault(intr_info)) {
        cr2 = vmx_get_exit_qual(vcpu);
        /* EPT won't cause page fault directly */
        WARN_ON_ONCE(!vcpu->arch.apf.host_apf_flags && enable_ept);
        return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
}

ex_no = intr_info & INTR_INFO_VECTOR_MASK;

if (vmx->rmode.vm86_active && rmode_exception(vcpu, ex_no))
        return handle_rmode_exception(vcpu, ex_no, error_code);

switch (ex_no) {
case DB_VECTOR:
        dr6 = vmx_get_exit_qual(vcpu);
        if (!(vcpu->guest_debug &
              (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
kvm/x86: fix icebp instruction handling
The undocumented 'icebp' instruction (aka 'int1') works pretty much like
'int3' in the absence of in-circuit probing equipment (except,
obviously, that it raises #DB instead of raising #BP), and is used by
some validation test-suites as such.
But Andy Lutomirski noticed that his test suite acted differently in kvm
than on bare hardware.
The reason is that kvm used an inexact test for the icebp instruction:
it just assumed that an all-zero VM exit qualification value meant that
the VM exit was due to icebp.
That is not unlike the guess that do_debug() does for the actual
exception handling case, but it's purely a heuristic, not an absolute
rule. do_debug() does it because it wants to ascribe _some_ reasons to
the #DB that happened, and an empty %dr6 value means that 'icebp' is the
most likely cause and we have no better information.
But kvm can just do it right, because unlike the do_debug() case, kvm
actually sees the real reason for the #DB in the VM-exit interruption
information field.
So instead of relying on an inexact heuristic, just use the actual VM
exit information that says "it was 'icebp'".
Right now the 'icebp' instruction isn't technically documented by Intel,
but that will hopefully change. The special "privileged software
exception" information _is_ actually mentioned in the Intel SDM, even
though the cause of it isn't enumerated.
Reported-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
                if (is_icebp(intr_info))
                        WARN_ON(!skip_emulated_instruction(vcpu));

                kvm_queue_exception_p(vcpu, DB_VECTOR, dr6);
                return 1;
        }
        kvm_run->debug.arch.dr6 = dr6 | DR6_FIXED_1 | DR6_RTM;
        kvm_run->debug.arch.dr7 = vmcs_readl(GUEST_DR7);
        /* fall through */
case BP_VECTOR:
        /*
         * Update instruction length as we may reinject #BP from
         * user space while in guest debugging mode. Reading it for
         * #DB as well causes no harm, it is not used in that case.
         */
        vmx->vcpu.arch.event_exit_inst_len =
                vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
        kvm_run->exit_reason = KVM_EXIT_DEBUG;
        rip = kvm_rip_read(vcpu);
        kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
        kvm_run->debug.arch.exception = ex_no;
        break;
case AC_VECTOR:
        if (guest_inject_ac(vcpu)) {
                kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
                return 1;
        }
        /*
         * Handle split lock. Depending on detection mode this will
         * either warn and disable split lock detection for this
         * task or force SIGBUS on it.
         */
        if (handle_guest_split_lock(kvm_rip_read(vcpu)))
                return 1;
        fallthrough;
default:
        kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
        kvm_run->ex.exception = ex_no;
        kvm_run->ex.error_code = error_code;
        break;
}
return 0;
}

static __always_inline int handle_external_interrupt(struct kvm_vcpu *vcpu)
{
        ++vcpu->stat.irq_exits;
	return 1;
}

static int handle_triple_fault(struct kvm_vcpu *vcpu)
{
	vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
KVM: X86: Fix residual mmio emulation request to userspace
Reported by syzkaller:
With kvm-intel.unrestricted_guest=0:
WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
CPU: 5 PID: 1014 Comm: warn_test Tainted: G W OE 4.13.0-rc3+ #8
RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
Call Trace:
? put_pid+0x3a/0x50
? rcu_read_lock_sched_held+0x79/0x80
? kmem_cache_free+0x2f2/0x350
kvm_vcpu_ioctl+0x340/0x700 [kvm]
? kvm_vcpu_ioctl+0x340/0x700 [kvm]
? __fget+0xfc/0x210
do_vfs_ioctl+0xa4/0x6a0
? __fget+0x11d/0x210
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc2
? __this_cpu_preempt_check+0x13/0x20
The syzkaller folks reported a residual mmio emulation request to userspace:
vm86 fails to emulate the injection of a real-mode interrupt (it cannot read
CS) and incurs a triple fault. The vCPU returns to userspace with
vcpu->mmio_needed == true and a KVM_EXIT_SHUTDOWN exit reason. However, the
syzkaller testcase launches the same vCPU from several threads; a thread that
launches the vCPU after another thread has already observed
vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN triggers the warning.
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>
#include <linux/kvm.h>

int kvmcpu;
struct kvm_run *run;

void *thr(void *arg)
{
	int res;

	res = ioctl(kvmcpu, KVM_RUN, 0);
	printf("ret1=%d exit_reason=%d suberror=%d\n",
	       res, run->exit_reason, run->internal.suberror);
	return 0;
}

void test(void)
{
	int i, kvm, kvmvm;
	pthread_t th[4];

	kvm = open("/dev/kvm", O_RDWR);
	kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
	kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
	run = (struct kvm_run *)mmap(0, 4096, PROT_READ|PROT_WRITE,
				     MAP_SHARED, kvmcpu, 0);
	srand(getpid());
	for (i = 0; i < 4; i++) {
		pthread_create(&th[i], 0, thr, 0);
		usleep(rand() % 10000);
	}
	for (i = 0; i < 4; i++)
		pthread_join(th[i], 0);
}

int main(void)
{
	for (;;) {
		int pid = fork();

		if (pid < 0)
			exit(1);
		if (pid == 0) {
			test();
			exit(0);
		}
		int status;

		while (waitpid(pid, &status, __WALL) != pid) {}
	}
	return 0;
}
This patch fixes it by resetting vcpu->mmio_needed once the triple fault is
received, so that no residual emulation request is carried into the next run.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	vcpu->mmio_needed = 0;
	return 0;
}

static int handle_io(struct kvm_vcpu *vcpu)
{
	unsigned long exit_qualification;
	int size, in, string;
	unsigned port;
	exit_qualification = vmx_get_exit_qual(vcpu);
	string = (exit_qualification & 16) != 0;

	++vcpu->stat.io_exits;

	if (string)
KVM: x86: Remove emulation_result enums, EMULATE_{DONE,FAIL,USER_EXIT}
Deferring emulation failure handling (in some cases) to the caller of
x86_emulate_instruction() has proven fragile, e.g. multiple instances of
KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it
being difficult to discern what emulation types can return what result,
and which combination of types and results are handled where.
Now that x86_emulate_instruction() always handles emulation failure,
i.e. EMULATE_FAIL is only referenced in callers, remove the
emulation_result enums entirely. Per KVM's existing exit handling
conventions, return '0' and '1' for "exit to userspace" and "resume
guest" respectively. Doing so cleans up many callers, e.g. they can
return kvm_emulate_instruction() directly instead of having to interpret
its result.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-27 15:40:38 -06:00
		return kvm_emulate_instruction(vcpu, 0);

	port = exit_qualification >> 16;
	size = (exit_qualification & 7) + 1;
	in = (exit_qualification & 8) != 0;

	return kvm_fast_pio(vcpu, size, port, in);
}

static void
vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
{
	/*
	 * Patch in the VMCALL instruction:
	 */
	hypercall[0] = 0x0f;
	hypercall[1] = 0x01;
	hypercall[2] = 0xc1;
}

/* called to set cr0 as appropriate for a mov-to-cr0 exit. */
static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
{
	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		unsigned long orig_val = val;

		/*
		 * We get here when L2 changed cr0 in a way that did not change
		 * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
		 * but did change L0 shadowed bits. So we first calculate the
		 * effective cr0 value that L1 would like to write into the
		 * hardware. It consists of the L2-owned bits from the new
		 * value combined with the L1-owned bits from L1's guest_cr0.
		 */
		val = (val & ~vmcs12->cr0_guest_host_mask) |
			(vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask);
KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
all CR0 and CR4 bits are allowed to be 1 during VMX operation.
This does not match real hardware, which disallows the high 32 bits of
CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits
which are defined in the SDM but missing according to CPUID). A guest
can induce a VM-entry failure by setting these bits in GUEST_CR0 and
GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are
valid.
Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
checks on these registers do not verify must-be-0 bits. Fix these checks
to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
This patch should introduce no change in behavior in KVM, since these
MSRs are still -1ULL.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-29 19:14:08 -07:00
		if (!nested_guest_cr0_valid(vcpu, val))
			return 1;

		if (kvm_set_cr0(vcpu, val))
			return 1;
		vmcs_writel(CR0_READ_SHADOW, orig_val);
		return 0;
	} else {
		if (to_vmx(vcpu)->nested.vmxon &&
		    !nested_host_cr0_valid(vcpu, val))
			return 1;

		return kvm_set_cr0(vcpu, val);
	}
}

static int handle_set_cr4(struct kvm_vcpu *vcpu, unsigned long val)
{
	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		unsigned long orig_val = val;

		/* analogously to handle_set_cr0 */
		val = (val & ~vmcs12->cr4_guest_host_mask) |
			(vmcs12->guest_cr4 & vmcs12->cr4_guest_host_mask);
		if (kvm_set_cr4(vcpu, val))
			return 1;
		vmcs_writel(CR4_READ_SHADOW, orig_val);
		return 0;
	} else
		return kvm_set_cr4(vcpu, val);
}

static int handle_desc(struct kvm_vcpu *vcpu)
{
	WARN_ON(!(vcpu->arch.cr4 & X86_CR4_UMIP));
	return kvm_emulate_instruction(vcpu, 0);
}

static int handle_cr(struct kvm_vcpu *vcpu)
{
	unsigned long exit_qualification, val;
	int cr;
	int reg;
	int err;
KVM: x86: Add kvm_skip_emulated_instruction and use it.
kvm_skip_emulated_instruction calls both
kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
skipping the emulated instruction and generating a trap if necessary.
Replacing skip_emulated_instruction calls with
kvm_skip_emulated_instruction is straightforward, except for:
- ICEBP, which is already inside a trap, so avoid triggering another trap.
- Instructions that can trigger exits to userspace, such as the IO insns,
MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
take precedence. The singlestep will be triggered again on the next
instruction, which is the current behavior.
- Task switch instructions which would require additional handling (e.g.
the task switch bit) and are instead left alone.
- Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
which do not trigger singlestep traps as mentioned previously.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-29 13:40:40 -07:00
int ret;
exit_qualification = vmx_get_exit_qual(vcpu);
cr = exit_qualification & 15;
reg = (exit_qualification >> 8) & 15;
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
val = kvm_register_readl(vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
err = handle_set_cr0(vcpu, val);
return kvm_complete_insn_gp(vcpu, err);
case 3:
WARN_ON_ONCE(enable_unrestricted_guest);
err = kvm_set_cr3(vcpu, val);
return kvm_complete_insn_gp(vcpu, err);
case 4:
err = handle_set_cr4(vcpu, val);
return kvm_complete_insn_gp(vcpu, err);
case 8: {
u8 cr8_prev = kvm_get_cr8(vcpu);
u8 cr8 = (u8)val;
err = kvm_set_cr8(vcpu, cr8);
ret = kvm_complete_insn_gp(vcpu, err);
if (lapic_in_kernel(vcpu))
return ret;
if (cr8_prev <= cr8)
return ret;
/*
 * TODO: we might be squashing a
 * KVM_GUESTDBG_SINGLESTEP-triggered
 * KVM_EXIT_DEBUG here.
 */
vcpu->run->exit_reason = KVM_EXIT_SET_TPR;
return 0;
}
}
break;
case 2: /* clts */
WARN_ONCE(1, "Guest should always own CR0.TS");
vmx_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS));
trace_kvm_cr_write(0, kvm_read_cr0(vcpu));
return kvm_skip_emulated_instruction(vcpu);
	case 1: /*mov from cr*/
		switch (cr) {
		case 3:
			WARN_ON_ONCE(enable_unrestricted_guest);
			val = kvm_read_cr3(vcpu);
			kvm_register_write(vcpu, reg, val);
			trace_kvm_cr_read(cr, val);
			return kvm_skip_emulated_instruction(vcpu);
		case 8:
			val = kvm_get_cr8(vcpu);
			kvm_register_write(vcpu, reg, val);
			trace_kvm_cr_read(cr, val);
			return kvm_skip_emulated_instruction(vcpu);
		}
		break;
	case 3: /* lmsw */
		val = (exit_qualification >> LMSW_SOURCE_DATA_SHIFT) & 0x0f;
		trace_kvm_cr_write(0, (kvm_read_cr0(vcpu) & ~0xful) | val);
		kvm_lmsw(vcpu, val);
		return kvm_skip_emulated_instruction(vcpu);
	default:
		break;
	}
	vcpu->run->exit_reason = 0;
KVM: Cleanup the kvm_print functions and introduce pr_XX wrappers
Introduces a couple of print functions, which are essentially wrappers
around standard printk functions, with a KVM: prefix.
Functions introduced or modified are:
- kvm_err(fmt, ...)
- kvm_info(fmt, ...)
- kvm_debug(fmt, ...)
- kvm_pr_unimpl(fmt, ...)
- pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...)
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-03 12:17:48 -06:00
	vcpu_unimpl(vcpu, "unhandled control register: op %d cr %d\n",
		    (int)(exit_qualification >> 4) & 3, cr);
	return 0;
}
static int handle_dr(struct kvm_vcpu *vcpu)
{
	unsigned long exit_qualification;
	int dr, dr7, reg;

	exit_qualification = vmx_get_exit_qual(vcpu);
	dr = exit_qualification & DEBUG_REG_ACCESS_NUM;

	/* First, if DR does not exist, trigger UD */
	if (!kvm_require_dr(vcpu, dr))
		return 1;
	/* Do not handle if the CPL > 0, will trigger GP on re-entry */
	if (!kvm_require_cpl(vcpu, 0))
		return 1;

	dr7 = vmcs_readl(GUEST_DR7);
	if (dr7 & DR7_GD) {
		/*
		 * As the vm-exit takes precedence over the debug trap, we
		 * need to emulate the latter, either for the host or the
		 * guest debugging itself.
		 */
		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) {
			vcpu->run->debug.arch.dr6 = DR6_BD | DR6_RTM | DR6_FIXED_1;
			vcpu->run->debug.arch.dr7 = dr7;
			vcpu->run->debug.arch.pc = kvm_get_linear_rip(vcpu);
			vcpu->run->debug.arch.exception = DB_VECTOR;
			vcpu->run->exit_reason = KVM_EXIT_DEBUG;
			return 0;
		} else {
			kvm_queue_exception_p(vcpu, DB_VECTOR, DR6_BD);
			return 1;
		}
	}

	if (vcpu->guest_debug == 0) {
		exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_MOV_DR_EXITING);

		/*
		 * No more DR vmexits; force a reload of the debug registers
		 * and reenter on this instruction.  The next vmexit will
		 * retrieve the full state of the debug registers.
		 */
		vcpu->arch.switch_db_regs |= KVM_DEBUGREG_WONT_EXIT;
		return 1;
	}

	reg = DEBUG_REG_ACCESS_REG(exit_qualification);
	if (exit_qualification & TYPE_MOV_FROM_DR) {
		unsigned long val;

		if (kvm_get_dr(vcpu, dr, &val))
			return 1;
		kvm_register_write(vcpu, reg, val);
	} else
		if (kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg)))
			return 1;
KVM: x86: Add kvm_skip_emulated_instruction and use it.
kvm_skip_emulated_instruction calls both
kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
skipping the emulated instruction and generating a trap if necessary.
Replacing skip_emulated_instruction calls with
kvm_skip_emulated_instruction is straightforward, except for:
- ICEBP, which is already inside a trap, so avoid triggering another trap.
- Instructions that can trigger exits to userspace, such as the IO insns,
MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
take precedence. The singlestep will be triggered again on the next
instruction, which is the current behavior.
- Task switch instructions which would require additional handling (e.g.
the task switch bit) and are instead left alone.
- Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
which do not trigger singlestep traps as mentioned previously.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-29 13:40:40 -07:00
	return kvm_skip_emulated_instruction(vcpu);
2006-12-10 03:21:36 -07:00
}
2014-02-21 02:32:27 -07:00
static void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
{
	get_debugreg(vcpu->arch.db[0], 0);
	get_debugreg(vcpu->arch.db[1], 1);
	get_debugreg(vcpu->arch.db[2], 2);
	get_debugreg(vcpu->arch.db[3], 3);
	get_debugreg(vcpu->arch.dr6, 6);
	vcpu->arch.dr7 = vmcs_readl(GUEST_DR7);
	vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_WONT_EXIT;
2019-05-07 13:17:56 -06:00
	exec_controls_setbit(to_vmx(vcpu), CPU_BASED_MOV_DR_EXITING);
2014-02-21 02:32:27 -07:00
}
2010-04-13 01:05:23 -06:00
static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
{
	vmcs_writel(GUEST_DR7, val);
}
2009-08-24 02:10:17 -06:00
static int handle_tpr_below_threshold(struct kvm_vcpu *vcpu)
2007-09-12 04:03:11 -06:00
{
2016-12-18 06:02:21 -07:00
	kvm_apic_update_ppr(vcpu);
2007-09-12 04:03:11 -06:00
	return 1;
}
2009-08-24 02:10:17 -06:00
static int handle_interrupt_window(struct kvm_vcpu *vcpu)
2006-12-10 03:21:36 -07:00
{
2019-12-06 01:45:24 -07:00
	exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_INTR_WINDOW_EXITING);
2008-04-10 13:31:10 -06:00
2010-07-27 03:30:24 -06:00
	kvm_make_request(KVM_REQ_EVENT, vcpu);
2008-09-26 01:30:45 -06:00
	++vcpu->stat.irq_window_exits;
2006-12-10 03:21:36 -07:00
	return 1;
}
2009-08-24 02:10:17 -06:00
static int handle_vmcall(struct kvm_vcpu *vcpu)
2007-02-19 05:37:47 -07:00
{
2016-02-11 06:44:59 -07:00
	return kvm_emulate_hypercall(vcpu);
2007-02-19 05:37:47 -07:00
}
2010-11-01 07:35:01 -06:00
static int handle_invd(struct kvm_vcpu *vcpu)
{
KVM: x86: Remove emulation_result enums, EMULATE_{DONE,FAIL,USER_EXIT}
Deferring emulation failure handling (in some cases) to the caller of
x86_emulate_instruction() has proven fragile, e.g. multiple instances of
KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it
being difficult to discern what emulation types can return what result,
and which combination of types and results are handled where.
Now that x86_emulate_instruction() always handles emulation failure,
i.e. EMULATION_FAIL is only referenced in callers, remove the
emulation_result enums entirely. Per KVM's existing exit handling
conventions, return '0' and '1' for "exit to userspace" and "resume
guest" respectively. Doing so cleans up many callers, e.g. they can
return kvm_emulate_instruction() directly instead of having to interpret
its result.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-27 15:40:38 -06:00
	return kvm_emulate_instruction(vcpu, 0);
2010-11-01 07:35:01 -06:00
}
2009-08-24 02:10:17 -06:00
static int handle_invlpg(struct kvm_vcpu *vcpu)
2008-09-23 10:18:35 -06:00
{
2020-04-15 14:34:53 -06:00
	unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
2008-09-23 10:18:35 -06:00
	kvm_mmu_invlpg(vcpu, exit_qualification);
2016-11-29 13:40:40 -07:00
	return kvm_skip_emulated_instruction(vcpu);
2008-09-23 10:18:35 -06:00
}
2011-11-10 05:57:25 -07:00
static int handle_rdpmc(struct kvm_vcpu *vcpu)
{
	int err;

	err = kvm_rdpmc(vcpu);
2016-11-29 13:40:40 -07:00
	return kvm_complete_insn_gp(vcpu, err);
2011-11-10 05:57:25 -07:00
}
2009-08-24 02:10:17 -06:00
static int handle_wbinvd(struct kvm_vcpu *vcpu)
2007-11-11 03:28:35 -07:00
{
2016-11-29 13:40:40 -07:00
	return kvm_emulate_wbinvd(vcpu);
2007-11-11 03:28:35 -07:00
}
2010-06-09 21:27:12 -06:00
static int handle_xsetbv(struct kvm_vcpu *vcpu)
{
	u64 new_bv = kvm_read_edx_eax(vcpu);
2019-04-30 11:36:17 -06:00
	u32 index = kvm_rcx_read(vcpu);
2010-06-09 21:27:12 -06:00
	if (kvm_set_xcr(vcpu, index, new_bv) == 0)
2016-11-29 13:40:40 -07:00
		return kvm_skip_emulated_instruction(vcpu);
2010-06-09 21:27:12 -06:00
	return 1;
}
2009-08-24 02:10:17 -06:00
static int handle_apic_access(struct kvm_vcpu *vcpu)
2007-10-28 19:40:42 -06:00
{
2011-08-30 04:56:17 -06:00
	if (likely(fasteoi)) {
2020-04-15 14:34:53 -06:00
		unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
2011-08-30 04:56:17 -06:00
		int access_type, offset;

		access_type = exit_qualification & APIC_ACCESS_TYPE;
		offset = exit_qualification & APIC_ACCESS_OFFSET;
		/*
		 * A sane guest uses MOV to write EOI, and the written
		 * value is ignored, so short-circuit that case here to
		 * avoid heavy instruction emulation.
		 */
		if ((access_type == TYPE_LINEAR_APIC_INST_WRITE) &&
		    (offset == APIC_EOI)) {
			kvm_lapic_set_eoi(vcpu);
2016-11-29 13:40:40 -07:00
			return kvm_skip_emulated_instruction(vcpu);
2011-08-30 04:56:17 -06:00
}
}
2019-08-27 15:40:38 -06:00
	return kvm_emulate_instruction(vcpu, 0);
2007-10-28 19:40:42 -06:00
}
2013-01-24 19:18:51 -07:00
static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
{
2020-04-15 14:34:53 -06:00
	unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
2013-01-24 19:18:51 -07:00
	int vector = exit_qualification & 0xff;

	/* EOI-induced VM exit is trap-like and thus no need to adjust IP */
	kvm_apic_set_eoi_accelerated(vcpu, vector);
	return 1;
}
2013-01-24 19:18:49 -07:00
static int handle_apic_write(struct kvm_vcpu *vcpu)
{
2020-04-15 14:34:53 -06:00
	unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
2013-01-24 19:18:49 -07:00
	u32 offset = exit_qualification & 0xfff;

	/* APIC-write VM exit is trap-like and thus no need to adjust IP */
	kvm_apic_write_nodecode(vcpu, offset);
	return 1;
}
2009-08-24 02:10:17 -06:00
static int handle_task_switch(struct kvm_vcpu *vcpu)
2008-03-24 15:14:53 -06:00
{
2008-09-26 01:30:47 -06:00
	struct vcpu_vmx *vmx = to_vmx(vcpu);
2008-03-24 15:14:53 -06:00
	unsigned long exit_qualification;
2010-04-14 07:51:09 -06:00
	bool has_error_code = false;
	u32 error_code = 0;
2008-03-24 15:14:53 -06:00
	u16 tss_selector;
2012-02-08 06:34:38 -07:00
	int reason, type, idt_v, idt_index;
2009-03-30 07:03:29 -06:00
	idt_v = (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK);
2012-02-08 06:34:38 -07:00
	idt_index = (vmx->idt_vectoring_info & VECTORING_INFO_VECTOR_MASK);
2009-03-30 07:03:29 -06:00
	type = (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK);
2008-03-24 15:14:53 -06:00
2020-04-15 14:34:53 -06:00
	exit_qualification = vmx_get_exit_qual(vcpu);
2008-03-24 15:14:53 -06:00
	reason = (u32)exit_qualification >> 30;
2009-03-30 07:03:29 -06:00
	if (reason == TASK_SWITCH_GATE && idt_v) {
		switch (type) {
		case INTR_TYPE_NMI_INTR:
			vcpu->arch.nmi_injected = false;
2011-03-23 07:02:47 -06:00
			vmx_set_nmi_mask(vcpu, true);
2009-03-30 07:03:29 -06:00
			break;
		case INTR_TYPE_EXT_INTR:
2009-05-11 04:35:50 -06:00
		case INTR_TYPE_SOFT_INTR:
2009-03-30 07:03:29 -06:00
			kvm_clear_interrupt_queue(vcpu);
			break;
		case INTR_TYPE_HARD_EXCEPTION:
2010-04-14 07:51:09 -06:00
			if (vmx->idt_vectoring_info &
			    VECTORING_INFO_DELIVER_CODE_MASK) {
				has_error_code = true;
				error_code =
					vmcs_read32(IDT_VECTORING_ERROR_CODE);
			}
			/* fall through */
2009-03-30 07:03:29 -06:00
		case INTR_TYPE_SOFT_EXCEPTION:
			kvm_clear_exception_queue(vcpu);
			break;
		default:
			break;
		}
2008-09-26 01:30:47 -06:00
	}
2008-03-24 15:14:53 -06:00
	tss_selector = exit_qualification;
2009-03-30 07:03:29 -06:00
	if (!idt_v || (type != INTR_TYPE_HARD_EXCEPTION &&
		       type != INTR_TYPE_EXT_INTR &&
		       type != INTR_TYPE_NMI_INTR))
2019-08-27 15:40:39 -06:00
		WARN_ON(!skip_emulated_instruction(vcpu));
2009-03-30 07:03:29 -06:00
2008-12-15 05:52:10 -07:00
	/*
	 * TODO: What about debug traps on tss switch?
	 * Are we supposed to inject them and update dr6?
	 */
2019-08-27 15:40:35 -06:00
	return kvm_task_switch(vcpu, tss_selector,
			       type == INTR_TYPE_SOFT_INTR ? idt_index : -1,
2019-08-27 15:40:38 -06:00
			       reason, has_error_code, error_code);
2008-03-24 15:14:53 -06:00
}
2009-08-24 02:10:17 -06:00
static int handle_ept_violation(struct kvm_vcpu *vcpu)
2008-04-27 22:24:45 -06:00
{
2009-03-24 20:08:52 -06:00
	unsigned long exit_qualification;
2008-04-27 22:24:45 -06:00
	gpa_t gpa;
2016-11-28 06:39:58 -07:00
	u64 error_code;
2008-04-27 22:24:45 -06:00
2020-04-15 14:34:53 -06:00
	exit_qualification = vmx_get_exit_qual(vcpu);
2008-04-27 22:24:45 -06:00
2013-09-15 02:07:23 -06:00
	/*
	 * If the EPT violation happened while executing iret from NMI,
	 * the "blocked by NMI" bit has to be set before the next VM entry.
	 * There are errata that may cause this bit to not be set:
	 * AAK134, BY25.
	 */
2013-09-25 01:58:22 -06:00
	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
2017-11-06 05:31:13 -07:00
	    enable_vnmi &&
2013-09-25 01:58:22 -06:00
	    (exit_qualification & INTR_INFO_UNBLOCK_NMI))
2013-09-15 02:07:23 -06:00
		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
2008-04-27 22:24:45 -06:00
	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
2009-06-17 06:22:14 -06:00
	trace_kvm_page_fault(gpa, exit_qualification);
2012-06-20 01:58:04 -06:00
2016-12-06 17:46:10 -07:00
	/* Is it a read fault? */
2016-12-21 21:29:28 -07:00
	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
2016-12-06 17:46:10 -07:00
		     ? PFERR_USER_MASK : 0;
	/* Is it a write fault? */
2016-12-21 21:29:28 -07:00
	error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
2016-12-06 17:46:10 -07:00
		      ? PFERR_WRITE_MASK : 0;
	/* Is it a fetch fault? */
2016-12-21 21:29:28 -07:00
	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
2016-12-06 17:46:10 -07:00
		      ? PFERR_FETCH_MASK : 0;
	/* Is the EPT page-table entry present? */
	error_code |= (exit_qualification &
		       (EPT_VIOLATION_READABLE | EPT_VIOLATION_WRITABLE |
			EPT_VIOLATION_EXECUTABLE))
		      ? PFERR_PRESENT_MASK : 0;
2012-06-20 01:58:04 -06:00
2016-11-28 06:39:58 -07:00
	error_code |= (exit_qualification & 0x100) != 0 ?
		      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
2013-08-06 03:00:32 -06:00
	vcpu->arch.exit_qualification = exit_qualification;
2012-06-20 01:58:04 -06:00
	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
2008-04-27 22:24:45 -06:00
}
2009-08-24 02:10:17 -06:00
static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
2009-06-11 09:07:43 -06:00
{
	gpa_t gpa;
2017-08-17 10:36:58 -06:00
	/*
	 * A nested guest cannot optimize MMIO vmexits, because we have an
	 * nGPA here instead of the required GPA.
	 */
2009-06-11 09:07:43 -06:00
	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
2017-08-17 10:36:58 -06:00
	if (!is_guest_mode(vcpu) &&
	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
2015-09-15 00:41:58 -06:00
		trace_kvm_fast_mmio(gpa);
2019-08-27 15:40:39 -06:00
		return kvm_skip_emulated_instruction(vcpu);
KVM: VMX: speed up wildcard MMIO EVENTFD
With KVM, MMIO is much slower than PIO, due to the need to
do page walk and emulation. But with EPT, it does not have to be: we
know the address from the VMCS so if the address is unique, we can look
up the eventfd directly, bypassing emulation.
Unfortunately, this only works if userspace does not need to match on
access length and data. The implementation adds a separate FAST_MMIO
bus internally. This serves two purposes:
- minimize overhead for old userspace that does not use eventfd with length = 0
- minimize disruption in other code (since we don't know the length,
devices on the MMIO bus only get a valid address in write, this
way we don't need to touch all devices to teach them to handle
an invalid length)
At the moment, this optimization only has effect for EPT on x86.
It will be possible to speed up MMIO for NPT and MMU using the same
idea in the future.
With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
I was unable to detect any measureable slowdown to non-eventfd MMIO.
Making MMIO faster is important for the upcoming virtio 1.0 which
includes an MMIO signalling capability.
The idea was suggested by Peter Anvin. Lots of thanks to Gleb for
pre-review and suggestions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
	}

	return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
}
static int handle_nmi_window(struct kvm_vcpu *vcpu)
{
	WARN_ON_ONCE(!enable_vnmi);
	exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING);
	++vcpu->stat.nmi_window_exits;
	kvm_make_request(KVM_REQ_EVENT, vcpu);

	return 1;
}
static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	bool intr_window_requested;
	unsigned count = 130;

	intr_window_requested = exec_controls_get(vmx) &
				CPU_BASED_INTR_WINDOW_EXITING;

	while (vmx->emulation_required && count-- != 0) {
		if (intr_window_requested && !vmx_interrupt_blocked(vcpu))
			return handle_interrupt_window(&vmx->vcpu);

		if (kvm_test_request(KVM_REQ_EVENT, vcpu))
			return 1;
KVM: x86: Remove emulation_result enums, EMULATE_{DONE,FAIL,USER_EXIT}
Deferring emulation failure handling (in some cases) to the caller of
x86_emulate_instruction() has proven fragile, e.g. multiple instances of
KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it
being difficult to discern what emulation types can return what result,
and which combination of types and results are handled where.
Now that x86_emulate_instruction() always handles emulation failure,
i.e. EMULATE_FAIL is no longer referenced by callers, remove the
emulation_result enums entirely. Per KVM's existing exit handling
conventions, return '0' and '1' for "exit to userspace" and "resume
guest" respectively. Doing so cleans up many callers, e.g. they can
return kvm_emulate_instruction() directly instead of having to interpret
its result.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
		if (!kvm_emulate_instruction(vcpu, 0))
			return 0;
KVM: VMX: raise internal error for exception during invalid protected mode state
Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
an exception in Protected Mode while emulating guest due to invalid
guest state. Unlike Big RM, KVM doesn't support emulating exceptions
in PM, i.e. PM exceptions are always injected via the VMCS. Because
we will never do VMRESUME due to emulation_required, the exception is
never realized and we'll keep emulating the faulting instruction over
and over until we receive a signal.
Exit to userspace iff there is a pending exception, i.e. don't exit
simply on a requested event. The purpose of this check and exit is to
aid in debugging a guest that is in all likelihood already doomed.
Invalid guest state in PM is extremely limited in normal operation,
e.g. it generally only occurs for a few instructions early in BIOS,
and any exception at this time is all but guaranteed to be fatal.
Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
handled/emulated, while checking for vectored interrupts, e.g. INTR
and NMI, without hitting false positives would add a fair amount of
complexity for almost no benefit (getting hit by lightning seems
more likely than encountering this specific scenario).
Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
exception via the VMCS and emulation_required is true.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
		if (vmx->emulation_required && !vmx->rmode.vm86_active &&
		    vcpu->arch.exception.pending) {
			vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
			vcpu->run->internal.suberror =
						KVM_INTERNAL_ERROR_EMULATION;
			vcpu->run->internal.ndata = 0;
			return 0;
		}
		if (vcpu->arch.halt_request) {
			vcpu->arch.halt_request = 0;
			return kvm_vcpu_halt(vcpu);
		}

		/*
		 * Note, return 1 and not 0, vcpu_run() is responsible for
		 * morphing the pending signal into the proper return code.
		 */
		if (signal_pending(current))
			return 1;

		if (need_resched())
			schedule();
	}

	return 1;
}
static void grow_ple_window(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned int old = vmx->ple_window;

	vmx->ple_window = __grow_ple_window(old, ple_window,
					    ple_window_grow,
					    ple_window_max);

	if (vmx->ple_window != old) {
		vmx->ple_window_dirty = true;
		trace_kvm_ple_window_update(vcpu->vcpu_id,
					    vmx->ple_window, old);
	}
}
static void shrink_ple_window(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned int old = vmx->ple_window;

	vmx->ple_window = __shrink_ple_window(old, ple_window,
					      ple_window_shrink,
					      ple_window);

	if (vmx->ple_window != old) {
		vmx->ple_window_dirty = true;
		trace_kvm_ple_window_update(vcpu->vcpu_id,
					    vmx->ple_window, old);
	}
}
/*
 * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
 */
static void wakeup_handler(void)
{
	struct kvm_vcpu *vcpu;
	int cpu = smp_processor_id();

	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
			blocked_vcpu_list) {
		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);

		if (pi_test_on(pi_desc) == 1)
			kvm_vcpu_kick(vcpu);
	}
	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
}
static void vmx_enable_tdp(void)
{
	kvm_mmu_set_mask_ptes(VMX_EPT_READABLE_MASK,
		enable_ept_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull,
		enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull,
		0ull, VMX_EPT_EXECUTABLE_MASK,
		cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK,
		VMX_EPT_RWX_MASK, 0ull);

	ept_set_mmio_spte_mask();
}
/*
 * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
 * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
 */
static int handle_pause(struct kvm_vcpu *vcpu)
{
	if (!kvm_pause_in_guest(vcpu->kvm))
		grow_ple_window(vcpu);

	/*
	 * Intel sdm vol3 ch-25.1.3 says: The "PAUSE-loop exiting"
	 * VM-execution control is ignored if CPL > 0. OTOH, KVM
	 * never set PAUSE_EXITING and just set PLE if supported,
	 * so the vcpu must be CPL=0 if it gets a PAUSE exit.
	 */
	kvm_vcpu_on_spin(vcpu, true);
KVM: x86: Add kvm_skip_emulated_instruction and use it.
kvm_skip_emulated_instruction calls both
kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
skipping the emulated instruction and generating a trap if necessary.
Replacing skip_emulated_instruction calls with
kvm_skip_emulated_instruction is straightforward, except for:
- ICEBP, which is already inside a trap, so avoid triggering another trap.
- Instructions that can trigger exits to userspace, such as the IO insns,
MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
take precedence. The singlestep will be triggered again on the next
instruction, which is the current behavior.
- Task switch instructions which would require additional handling (e.g.
the task switch bit) and are instead left alone.
- Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
which do not trigger singlestep traps as mentioned previously.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
	return kvm_skip_emulated_instruction(vcpu);
}
static int handle_nop(struct kvm_vcpu *vcpu)
{
	return kvm_skip_emulated_instruction(vcpu);
}
static int handle_mwait(struct kvm_vcpu *vcpu)
{
	printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
	return handle_nop(vcpu);
}

static int handle_invalid_op(struct kvm_vcpu *vcpu)
{
	kvm_queue_exception(vcpu, UD_VECTOR);
	return 1;
}

static int handle_monitor_trap(struct kvm_vcpu *vcpu)
{
	return 1;
}

static int handle_monitor(struct kvm_vcpu *vcpu)
{
	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
	return handle_nop(vcpu);
}
static int handle_invpcid(struct kvm_vcpu *vcpu)
{
	u32 vmx_instruction_info;
	unsigned long type;
	bool pcid_enabled;
	gva_t gva;
	struct x86_exception e;
	unsigned i;
	unsigned long roots_to_free = 0;
	struct {
		u64 pcid;
		u64 gla;
	} operand;
	int r;

	if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
		kvm_queue_exception(vcpu, UD_VECTOR);
		return 1;
	}

	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
	type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);

	if (type > 3) {
		kvm_inject_gp(vcpu, 0);
		return 1;
	}

	/* According to the Intel instruction reference, the memory operand
	 * is read even if it isn't needed (e.g., for type == all)
	 */
	if (get_vmx_mem_address(vcpu, vmx_get_exit_qual(vcpu),
				vmx_instruction_info, false,
				sizeof(operand), &gva))
		return 1;

	r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
	if (r != X86EMUL_CONTINUE)
		return vmx_handle_memory_failure(vcpu, r, &e);
	if (operand.pcid >> 12 != 0) {
		kvm_inject_gp(vcpu, 0);
		return 1;
	}

	pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);

	switch (type) {
	case INVPCID_TYPE_INDIV_ADDR:
		if ((!pcid_enabled && (operand.pcid != 0)) ||
		    is_noncanonical_address(operand.gla, vcpu)) {
			kvm_inject_gp(vcpu, 0);
			return 1;
		}
		kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
		return kvm_skip_emulated_instruction(vcpu);

	case INVPCID_TYPE_SINGLE_CTXT:
		if (!pcid_enabled && (operand.pcid != 0)) {
			kvm_inject_gp(vcpu, 0);
			return 1;
		}

		if (kvm_get_active_pcid(vcpu) == operand.pcid) {
			kvm_mmu_sync_roots(vcpu);
			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
		}

		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
			if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
			    == operand.pcid)
				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);

		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
		/*
		 * If neither the current cr3 nor any of the prev_roots use the
		 * given PCID, then nothing needs to be done here because a
		 * resync will happen anyway before switching to any other CR3.
		 */

		return kvm_skip_emulated_instruction(vcpu);

	case INVPCID_TYPE_ALL_NON_GLOBAL:
		/*
		 * Currently, KVM doesn't mark global entries in the shadow
		 * page tables, so a non-global flush just degenerates to a
		 * global flush. If needed, we could optimize this later by
		 * keeping track of global entries in shadow page tables.
		 */

		/* fall-through */
	case INVPCID_TYPE_ALL_INCL_GLOBAL:
		kvm_mmu_unload(vcpu);
		return kvm_skip_emulated_instruction(vcpu);

	default:
		BUG(); /* We have already checked above that type <= 3 */
	}
}
static int handle_pml_full(struct kvm_vcpu *vcpu)
{
	unsigned long exit_qualification;

	trace_kvm_pml_full(vcpu->vcpu_id);

	exit_qualification = vmx_get_exit_qual(vcpu);

	/*
	 * PML buffer FULL happened while executing iret from NMI,
	 * "blocked by NMI" bit has to be set before next VM entry.
	 */
	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
	    enable_vnmi &&
	    (exit_qualification & INTR_INFO_UNBLOCK_NMI))
		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
				GUEST_INTR_STATE_NMI);

	/*
	 * PML buffer already flushed at beginning of VMEXIT. Nothing to do
	 * here.., and there's no userspace involvement needed for PML.
	 */
	return 1;
}
static fastpath_t handle_fastpath_preemption_timer(struct kvm_vcpu *vcpu)
{
KVM: VMX: Leave preemption timer running when it's disabled
VMWRITEs to the major VMCS controls, pin controls included, are
deceptively expensive. CPUs with VMCS caching (Westmere and later) also
optimize away consistency checks on VM-Entry, i.e. skip consistency
checks if the relevant fields have not changed since the last successful
VM-Entry (of the cached VMCS). Because uops are a precious commodity,
uCode's dirty VMCS field tracking isn't as precise as software would
prefer. Notably, writing any of the major VMCS fields effectively marks
the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
consistency checks, which consumes several hundred cycles.
As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
doubles the latency of the next VM-Entry (and again when/if the flag is
toggled back). In a non-nested scenario, running a "standard" guest
with the preemption timer enabled, toggling the timer flag is uncommon
but not rare, e.g. roughly 1 in 10 entries. Disabling the preemption
timer can change these numbers due to its use for "immediate exits",
even when explicitly disabled by userspace.
Nested virtualization in particular is painful, as the timer flag is set
for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
pin controls to *clear* the flag since the timer's final state isn't
known until vmx_vcpu_run(). I.e. the majority of nested VM-Enters end
up unnecessarily writing pin controls *twice*.
Rather than toggle the timer flag in pin controls, set the timer value
itself to the largest allowed value to put it into a "soft disabled"
state, and ignore any spurious preemption timer exits.
Sadly, the timer is a 32-bit value and so theoretically it can fire
before the heat death of the universe, i.e. spurious exits are possible.
But because KVM does *not* save the timer value on VM-Exit and because
the timer runs at a slower rate than the TSC, the maximum timer value
is still sufficiently large for KVM's purposes. E.g. on a modern CPU
with a timer that runs at 1/32 the frequency of a 2.4GHz constant-rate
TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
execution. In other words, spurious VM-Exits are effectively only
possible if the host is completely tickless on the logical CPU, the
guest is not using the preemption timer, and the guest is not generating
VM-Exits for any other reason.
To be safe from bad/weird hardware, disable the preemption timer if its
maximum delay is less than ten seconds. Ten seconds is mostly arbitrary
and was selected in no small part because it's a nice round number.
For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
if the preemption timer is disabled by KVM or userspace. Previously
KVM continued to use the preemption timer to force immediate exits even
when the timer was disabled by userspace. Now that KVM leaves the timer
running instead of truly disabling it, allow userspace to kill it
entirely in the unlikely event the timer (or KVM) malfunctions.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-07 13:18:05 -06:00
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (!vmx->req_immediate_exit &&
	    !unlikely(vmx->loaded_vmcs->hv_timer_soft_disabled)) {
		kvm_lapic_expired_hv_timer(vcpu);
		return EXIT_FASTPATH_REENTER_GUEST;
	}

	return EXIT_FASTPATH_NONE;
}
static int handle_preemption_timer(struct kvm_vcpu *vcpu)
{
	handle_fastpath_preemption_timer(vcpu);
	return 1;
}
/*
 * When nested=0, all VMX instruction VM Exits filter here.  The handlers
 * are overwritten by nested_vmx_setup() when nested=1.
 */
static int handle_vmx_instruction(struct kvm_vcpu *vcpu)
{
	kvm_queue_exception(vcpu, UD_VECTOR);
	return 1;
}
static int handle_encls(struct kvm_vcpu *vcpu)
{
	/*
	 * SGX virtualization is not yet supported.  There is no software
	 * enable bit for SGX, so we have to trap ENCLS and inject a #UD
	 * to prevent the guest from executing ENCLS.
	 */
	kvm_queue_exception(vcpu, UD_VECTOR);
	return 1;
}
/*
 * The exit handlers return 1 if the exit was handled fully and guest execution
 * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
 * to be done to userspace and return 0.
 */
static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
	[EXIT_REASON_EXCEPTION_NMI]           = handle_exception_nmi,
	[EXIT_REASON_EXTERNAL_INTERRUPT]      = handle_external_interrupt,
	[EXIT_REASON_TRIPLE_FAULT]            = handle_triple_fault,
	[EXIT_REASON_NMI_WINDOW]              = handle_nmi_window,
	[EXIT_REASON_IO_INSTRUCTION]          = handle_io,
	[EXIT_REASON_CR_ACCESS]               = handle_cr,
	[EXIT_REASON_DR_ACCESS]               = handle_dr,
	[EXIT_REASON_CPUID]                   = kvm_emulate_cpuid,
	[EXIT_REASON_MSR_READ]                = kvm_emulate_rdmsr,
	[EXIT_REASON_MSR_WRITE]               = kvm_emulate_wrmsr,
	[EXIT_REASON_INTERRUPT_WINDOW]        = handle_interrupt_window,
	[EXIT_REASON_HLT]                     = kvm_emulate_halt,
	[EXIT_REASON_INVD]                    = handle_invd,
	[EXIT_REASON_INVLPG]                  = handle_invlpg,
	[EXIT_REASON_RDPMC]                   = handle_rdpmc,
	[EXIT_REASON_VMCALL]                  = handle_vmcall,
	[EXIT_REASON_VMCLEAR]                 = handle_vmx_instruction,
	[EXIT_REASON_VMLAUNCH]                = handle_vmx_instruction,
	[EXIT_REASON_VMPTRLD]                 = handle_vmx_instruction,
	[EXIT_REASON_VMPTRST]                 = handle_vmx_instruction,
	[EXIT_REASON_VMREAD]                  = handle_vmx_instruction,
	[EXIT_REASON_VMRESUME]                = handle_vmx_instruction,
	[EXIT_REASON_VMWRITE]                 = handle_vmx_instruction,
	[EXIT_REASON_VMOFF]                   = handle_vmx_instruction,
	[EXIT_REASON_VMON]                    = handle_vmx_instruction,
	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
	[EXIT_REASON_APIC_WRITE]              = handle_apic_write,
	[EXIT_REASON_EOI_INDUCED]             = handle_apic_eoi_induced,
	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
	[EXIT_REASON_XSETBV]                  = handle_xsetbv,
	[EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
	[EXIT_REASON_MCE_DURING_VMENTRY]      = handle_machine_check,
	[EXIT_REASON_GDTR_IDTR]               = handle_desc,
	[EXIT_REASON_LDTR_TR]                 = handle_desc,
	[EXIT_REASON_EPT_VIOLATION]           = handle_ept_violation,
	[EXIT_REASON_EPT_MISCONFIG]           = handle_ept_misconfig,
	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
	[EXIT_REASON_MWAIT_INSTRUCTION]       = handle_mwait,
	[EXIT_REASON_MONITOR_TRAP_FLAG]       = handle_monitor_trap,
	[EXIT_REASON_MONITOR_INSTRUCTION]     = handle_monitor,
	[EXIT_REASON_INVEPT]                  = handle_vmx_instruction,
	[EXIT_REASON_INVVPID]                 = handle_vmx_instruction,
	[EXIT_REASON_RDRAND]                  = handle_invalid_op,
	[EXIT_REASON_RDSEED]                  = handle_invalid_op,
	[EXIT_REASON_PML_FULL]                = handle_pml_full,
	[EXIT_REASON_INVPCID]                 = handle_invpcid,
	[EXIT_REASON_VMFUNC]                  = handle_vmx_instruction,
	[EXIT_REASON_PREEMPTION_TIMER]        = handle_preemption_timer,
	[EXIT_REASON_ENCLS]                   = handle_encls,
};

static const int kvm_vmx_max_exit_handlers =
	ARRAY_SIZE(kvm_vmx_exit_handlers);
static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
{
	*info1 = vmx_get_exit_qual(vcpu);
	*info2 = vmx_get_intr_info(vcpu);
}
static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
{
	if (vmx->pml_pg) {
		__free_page(vmx->pml_pg);
		vmx->pml_pg = NULL;
	}
}
static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	u64 *pml_buf;
	u16 pml_idx;

	pml_idx = vmcs_read16(GUEST_PML_INDEX);

	/* Do nothing if PML buffer is empty */
	if (pml_idx == (PML_ENTITY_NUM - 1))
		return;

	/* PML index always points to next available PML buffer entity */
	if (pml_idx >= PML_ENTITY_NUM)
		pml_idx = 0;
	else
		pml_idx++;

	pml_buf = page_address(vmx->pml_pg);
	for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
		u64 gpa;

		gpa = pml_buf[pml_idx];
		WARN_ON(gpa & (PAGE_SIZE - 1));
		kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
	}

	/* reset PML index */
	vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
}
/*
 * Flush all vcpus' PML buffer and update logged GPAs to dirty_bitmap.
 * Called before reporting dirty_bitmap to userspace.
 */
static void kvm_flush_pml_buffers(struct kvm *kvm)
{
	int i;
	struct kvm_vcpu *vcpu;

	/*
	 * We only need to kick vcpu out of guest mode here, as PML buffer
	 * is flushed at beginning of all VMEXITs, and it's obvious that only
	 * vcpus running in guest are possible to have unflushed GPAs in PML
	 * buffer.
	 */
	kvm_for_each_vcpu(i, vcpu, kvm)
		kvm_vcpu_kick(vcpu);
}
static void vmx_dump_sel(char *name, uint32_t sel)
{
	pr_err("%s sel=0x%04x, attr=0x%05x, limit=0x%08x, base=0x%016lx\n",
	       name, vmcs_read16(sel),
	       vmcs_read32(sel + GUEST_ES_AR_BYTES - GUEST_ES_SELECTOR),
	       vmcs_read32(sel + GUEST_ES_LIMIT - GUEST_ES_SELECTOR),
	       vmcs_readl(sel + GUEST_ES_BASE - GUEST_ES_SELECTOR));
}
static void vmx_dump_dtsel(char *name, uint32_t limit)
{
	pr_err("%s limit=0x%08x, base=0x%016lx\n",
	       name, vmcs_read32(limit),
	       vmcs_readl(limit + GUEST_GDTR_BASE - GUEST_GDTR_LIMIT));
}

void dump_vmcs(void)
{
	u32 vmentry_ctl, vmexit_ctl;
	u32 cpu_based_exec_ctrl, pin_based_exec_ctrl, secondary_exec_control;
	unsigned long cr4;
	u64 efer;

	if (!dump_invalid_vmcs) {
		pr_warn_ratelimited("set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.\n");
		return;
	}

	vmentry_ctl = vmcs_read32(VM_ENTRY_CONTROLS);
	vmexit_ctl = vmcs_read32(VM_EXIT_CONTROLS);
	cpu_based_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
	pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL);
	cr4 = vmcs_readl(GUEST_CR4);
	efer = vmcs_read64(GUEST_IA32_EFER);
	secondary_exec_control = 0;
	if (cpu_has_secondary_exec_ctrls())
		secondary_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);

	pr_err("*** Guest State ***\n");
	pr_err("CR0: actual=0x%016lx, shadow=0x%016lx, gh_mask=%016lx\n",
	       vmcs_readl(GUEST_CR0), vmcs_readl(CR0_READ_SHADOW),
	       vmcs_readl(CR0_GUEST_HOST_MASK));
	pr_err("CR4: actual=0x%016lx, shadow=0x%016lx, gh_mask=%016lx\n",
	       cr4, vmcs_readl(CR4_READ_SHADOW), vmcs_readl(CR4_GUEST_HOST_MASK));
	pr_err("CR3 = 0x%016lx\n", vmcs_readl(GUEST_CR3));
	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) &&
	    (cr4 & X86_CR4_PAE) && !(efer & EFER_LMA)) {
		pr_err("PDPTR0 = 0x%016llx  PDPTR1 = 0x%016llx\n",
		       vmcs_read64(GUEST_PDPTR0), vmcs_read64(GUEST_PDPTR1));
		pr_err("PDPTR2 = 0x%016llx  PDPTR3 = 0x%016llx\n",
		       vmcs_read64(GUEST_PDPTR2), vmcs_read64(GUEST_PDPTR3));
	}
	pr_err("RSP = 0x%016lx  RIP = 0x%016lx\n",
	       vmcs_readl(GUEST_RSP), vmcs_readl(GUEST_RIP));
	pr_err("RFLAGS=0x%08lx         DR7 = 0x%016lx\n",
	       vmcs_readl(GUEST_RFLAGS), vmcs_readl(GUEST_DR7));
	pr_err("Sysenter RSP=%016lx CS:RIP=%04x:%016lx\n",
	       vmcs_readl(GUEST_SYSENTER_ESP),
	       vmcs_read32(GUEST_SYSENTER_CS), vmcs_readl(GUEST_SYSENTER_EIP));
	vmx_dump_sel("CS:  ", GUEST_CS_SELECTOR);
	vmx_dump_sel("DS:  ", GUEST_DS_SELECTOR);
	vmx_dump_sel("SS:  ", GUEST_SS_SELECTOR);
	vmx_dump_sel("ES:  ", GUEST_ES_SELECTOR);
	vmx_dump_sel("FS:  ", GUEST_FS_SELECTOR);
	vmx_dump_sel("GS:  ", GUEST_GS_SELECTOR);
	vmx_dump_dtsel("GDTR:", GUEST_GDTR_LIMIT);
	vmx_dump_sel("LDTR:", GUEST_LDTR_SELECTOR);
	vmx_dump_dtsel("IDTR:", GUEST_IDTR_LIMIT);
	vmx_dump_sel("TR:  ", GUEST_TR_SELECTOR);
	if ((vmexit_ctl & (VM_EXIT_SAVE_IA32_PAT | VM_EXIT_SAVE_IA32_EFER)) ||
	    (vmentry_ctl & (VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_IA32_EFER)))
		pr_err("EFER = 0x%016llx  PAT = 0x%016llx\n",
		       efer, vmcs_read64(GUEST_IA32_PAT));
	pr_err("DebugCtl = 0x%016llx  DebugExceptions = 0x%016lx\n",
	       vmcs_read64(GUEST_IA32_DEBUGCTL),
	       vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS));
	if (cpu_has_load_perf_global_ctrl() &&
	    vmentry_ctl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
		pr_err("PerfGlobCtl = 0x%016llx\n",
		       vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL));
	if (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS)
		pr_err("BndCfgS = 0x%016llx\n", vmcs_read64(GUEST_BNDCFGS));
	pr_err("Interruptibility = %08x  ActivityState = %08x\n",
	       vmcs_read32(GUEST_INTERRUPTIBILITY_INFO),
	       vmcs_read32(GUEST_ACTIVITY_STATE));
	if (secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)
		pr_err("InterruptStatus = %04x\n",
		       vmcs_read16(GUEST_INTR_STATUS));

	pr_err("*** Host State ***\n");
	pr_err("RIP = 0x%016lx  RSP = 0x%016lx\n",
	       vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
	pr_err("CS=%04x SS=%04x DS=%04x ES=%04x FS=%04x GS=%04x TR=%04x\n",
	       vmcs_read16(HOST_CS_SELECTOR), vmcs_read16(HOST_SS_SELECTOR),
	       vmcs_read16(HOST_DS_SELECTOR), vmcs_read16(HOST_ES_SELECTOR),
	       vmcs_read16(HOST_FS_SELECTOR), vmcs_read16(HOST_GS_SELECTOR),
	       vmcs_read16(HOST_TR_SELECTOR));
	pr_err("FSBase=%016lx GSBase=%016lx TRBase=%016lx\n",
	       vmcs_readl(HOST_FS_BASE), vmcs_readl(HOST_GS_BASE),
	       vmcs_readl(HOST_TR_BASE));
	pr_err("GDTBase=%016lx IDTBase=%016lx\n",
	       vmcs_readl(HOST_GDTR_BASE), vmcs_readl(HOST_IDTR_BASE));
	pr_err("CR0=%016lx CR3=%016lx CR4=%016lx\n",
	       vmcs_readl(HOST_CR0), vmcs_readl(HOST_CR3),
	       vmcs_readl(HOST_CR4));
	pr_err("Sysenter RSP=%016lx CS:RIP=%04x:%016lx\n",
	       vmcs_readl(HOST_IA32_SYSENTER_ESP),
	       vmcs_read32(HOST_IA32_SYSENTER_CS),
	       vmcs_readl(HOST_IA32_SYSENTER_EIP));
	if (vmexit_ctl & (VM_EXIT_LOAD_IA32_PAT | VM_EXIT_LOAD_IA32_EFER))
		pr_err("EFER = 0x%016llx  PAT = 0x%016llx\n",
		       vmcs_read64(HOST_IA32_EFER),
		       vmcs_read64(HOST_IA32_PAT));
	if (cpu_has_load_perf_global_ctrl() &&
	    vmexit_ctl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
		pr_err("PerfGlobCtl = 0x%016llx\n",
		       vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));

	pr_err("*** Control State ***\n");
	pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n",
	       pin_based_exec_ctrl, cpu_based_exec_ctrl, secondary_exec_control);
	pr_err("EntryControls=%08x ExitControls=%08x\n", vmentry_ctl, vmexit_ctl);
	pr_err("ExceptionBitmap=%08x PFECmask=%08x PFECmatch=%08x\n",
	       vmcs_read32(EXCEPTION_BITMAP),
	       vmcs_read32(PAGE_FAULT_ERROR_CODE_MASK),
	       vmcs_read32(PAGE_FAULT_ERROR_CODE_MATCH));
	pr_err("VMEntry: intr_info=%08x errcode=%08x ilen=%08x\n",
	       vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
	       vmcs_read32(VM_ENTRY_EXCEPTION_ERROR_CODE),
	       vmcs_read32(VM_ENTRY_INSTRUCTION_LEN));
	pr_err("VMExit: intr_info=%08x errcode=%08x ilen=%08x\n",
	       vmcs_read32(VM_EXIT_INTR_INFO),
	       vmcs_read32(VM_EXIT_INTR_ERROR_CODE),
	       vmcs_read32(VM_EXIT_INSTRUCTION_LEN));
	pr_err("        reason=%08x qualification=%016lx\n",
	       vmcs_read32(VM_EXIT_REASON), vmcs_readl(EXIT_QUALIFICATION));
	pr_err("IDTVectoring: info=%08x errcode=%08x\n",
	       vmcs_read32(IDT_VECTORING_INFO_FIELD),
	       vmcs_read32(IDT_VECTORING_ERROR_CODE));
	pr_err("TSC Offset = 0x%016llx\n", vmcs_read64(TSC_OFFSET));
	if (secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
		pr_err("TSC Multiplier = 0x%016llx\n",
		       vmcs_read64(TSC_MULTIPLIER));
	if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW) {
		if (secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) {
			u16 status = vmcs_read16(GUEST_INTR_STATUS);

			pr_err("SVI|RVI = %02x|%02x ", status >> 8, status & 0xff);
		}
		pr_cont("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD));
		if (secondary_exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)
			pr_err("APIC-access addr = 0x%016llx ", vmcs_read64(APIC_ACCESS_ADDR));
		pr_cont("virt-APIC addr = 0x%016llx\n", vmcs_read64(VIRTUAL_APIC_PAGE_ADDR));
	}
	if (pin_based_exec_ctrl & PIN_BASED_POSTED_INTR)
		pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV));
	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT))
		pr_err("EPT pointer = 0x%016llx\n", vmcs_read64(EPT_POINTER));
	if (secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING)
		pr_err("PLE Gap=%08x Window=%08x\n",
		       vmcs_read32(PLE_GAP), vmcs_read32(PLE_WINDOW));
	if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID)
		pr_err("Virtual processor ID = 0x%04x\n",
		       vmcs_read16(VIRTUAL_PROCESSOR_ID));
}

/*
 * The guest has exited.  See if we can fix it or if we need userspace
 * assistance.
 */
static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	u32 exit_reason = vmx->exit_reason;
	u32 vectoring_info = vmx->idt_vectoring_info;

	/*
	 * Flush the PML buffer of logged GPAs; this keeps dirty_bitmap more
	 * up to date.  Another benefit is that, in kvm_vm_ioctl_get_dirty_log,
	 * before querying dirty_bitmap we only need to kick all vcpus out of
	 * guest mode: once a vcpu is in root mode, its PML buffer must have
	 * been flushed already.
	 */
	if (enable_pml)
		vmx_flush_pml_buffer(vcpu);

	/*
	 * We should never reach this point with a pending nested VM-Enter, and
	 * more specifically emulation of L2 due to invalid guest state (see
	 * below) should never happen as that means we incorrectly allowed a
	 * nested VM-Enter with an invalid vmcs12.
	 */
	WARN_ON_ONCE(vmx->nested.nested_run_pending);

	/* If guest state is invalid, start emulating. */
	if (vmx->emulation_required)
		return handle_invalid_guest_state(vcpu);

	if (is_guest_mode(vcpu)) {
		/*
		 * The host physical addresses of some pages of guest memory
		 * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC
		 * Page).  The CPU may write to these pages via their host
		 * physical address while L2 is running, bypassing any
		 * address-translation-based dirty tracking (e.g. EPT write
		 * protection).
		 *
		 * Mark them dirty on every exit from L2 to prevent them from
		 * getting out of sync with dirty tracking.
		 */
		nested_mark_vmcs12_pages_dirty(vcpu);

		if (nested_vmx_reflect_vmexit(vcpu))
			return 1;
	}

	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
		dump_vmcs();
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason
			= exit_reason;
		return 0;
	}

	if (unlikely(vmx->fail)) {
		dump_vmcs();
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason
			= vmcs_read32(VM_INSTRUCTION_ERROR);
		return 0;
	}

	/*
	 * Note:
	 * Do not try to fix EXIT_REASON_EPT_MISCONFIG if it was caused by a
	 * delivery event, since it indicates the guest is accessing MMIO.
	 * The vm-exit can be triggered again after returning to the guest,
	 * which would cause an infinite loop.
	 */
	if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
	    (exit_reason != EXIT_REASON_EXCEPTION_NMI &&
	     exit_reason != EXIT_REASON_EPT_VIOLATION &&
	     exit_reason != EXIT_REASON_PML_FULL &&
	     exit_reason != EXIT_REASON_TASK_SWITCH)) {
		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
		vcpu->run->internal.ndata = 3;
		vcpu->run->internal.data[0] = vectoring_info;
		vcpu->run->internal.data[1] = exit_reason;
		vcpu->run->internal.data[2] = vcpu->arch.exit_qualification;
		if (exit_reason == EXIT_REASON_EPT_MISCONFIG) {
			vcpu->run->internal.ndata++;
			vcpu->run->internal.data[3] =
				vmcs_read64(GUEST_PHYSICAL_ADDRESS);
		}
		return 0;
	}

	if (unlikely(!enable_vnmi &&
		     vmx->loaded_vmcs->soft_vnmi_blocked)) {
		if (!vmx_interrupt_blocked(vcpu)) {
			vmx->loaded_vmcs->soft_vnmi_blocked = 0;
		} else if (vmx->loaded_vmcs->vnmi_blocked_time > 1000000000LL &&
			   vcpu->arch.nmi_pending) {
			/*
			 * This CPU doesn't support us in finding the end of an
			 * NMI-blocked window if the guest runs with IRQs
			 * disabled.  So we pull the trigger after 1 s of
			 * futile waiting, but inform the user about this.
			 */
			printk(KERN_WARNING "%s: Breaking out of NMI-blocked "
			       "state on VCPU %d after 1 s timeout\n",
			       __func__, vcpu->vcpu_id);
			vmx->loaded_vmcs->soft_vnmi_blocked = 0;
		}
	}

	if (exit_fastpath != EXIT_FASTPATH_NONE)
		return 1;

	if (exit_reason >= kvm_vmx_max_exit_handlers)
		goto unexpected_vmexit;
#ifdef CONFIG_RETPOLINE
	if (exit_reason == EXIT_REASON_MSR_WRITE)
		return kvm_emulate_wrmsr(vcpu);
	else if (exit_reason == EXIT_REASON_PREEMPTION_TIMER)
		return handle_preemption_timer(vcpu);
	else if (exit_reason == EXIT_REASON_INTERRUPT_WINDOW)
		return handle_interrupt_window(vcpu);
	else if (exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
		return handle_external_interrupt(vcpu);
	else if (exit_reason == EXIT_REASON_HLT)
		return kvm_emulate_halt(vcpu);
	else if (exit_reason == EXIT_REASON_EPT_MISCONFIG)
		return handle_ept_misconfig(vcpu);
#endif

	exit_reason = array_index_nospec(exit_reason,
					 kvm_vmx_max_exit_handlers);
	if (!kvm_vmx_exit_handlers[exit_reason])
		goto unexpected_vmexit;

	return kvm_vmx_exit_handlers[exit_reason](vcpu);

unexpected_vmexit:
	vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n", exit_reason);
	dump_vmcs();
	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	vcpu->run->internal.suberror =
			KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
	vcpu->run->internal.ndata = 1;
	vcpu->run->internal.data[0] = exit_reason;
	return 0;
}

/*
 * Software based L1D cache flush which is used when microcode providing
 * the cache control MSR is not loaded.
 *
 * The L1D cache is 32 KiB on Nehalem and later microarchitectures, but to
 * flush it is required to read in 64 KiB because the replacement algorithm
 * is not exactly LRU. This could be sized at runtime via topology
 * information but as all relevant affected CPUs have 32 KiB L1D cache size
 * there is no point in doing so.
 */
static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
{
	int size = PAGE_SIZE << L1D_CACHE_ORDER;

	/*
	 * This code is only executed when the flush mode is 'cond' or
	 * 'always'.
	 */
	if (static_branch_likely(&vmx_l1d_flush_cond)) {
		bool flush_l1d;

		/*
		 * Clear the per-vcpu flush bit, it gets set again
		 * either from vcpu_run() or from one of the unsafe
		 * VMEXIT handlers.
		 */
		flush_l1d = vcpu->arch.l1tf_flush_l1d;
		vcpu->arch.l1tf_flush_l1d = false;

		/*
		 * Clear the per-cpu flush bit, it gets set again from
		 * the interrupt handlers.
		 */
		flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
		kvm_clear_cpu_l1tf_flush_l1d();

		if (!flush_l1d)
			return;
	}

	vcpu->stat.l1d_flush++;

	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
		return;
	}

	asm volatile(
		/* First ensure the pages are in the TLB */
		"xorl %%eax, %%eax\n"
		".Lpopulate_tlb:\n\t"
		"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
		"addl $4096, %%eax\n\t"
		"cmpl %%eax, %[size]\n\t"
		"jne .Lpopulate_tlb\n\t"
		"xorl %%eax, %%eax\n\t"
		"cpuid\n\t"
		/* Now fill the cache */
		"xorl %%eax, %%eax\n"
		".Lfill_cache:\n"
		"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
		"addl $64, %%eax\n\t"
		"cmpl %%eax, %[size]\n\t"
		"jne .Lfill_cache\n\t"
		"lfence\n"
		:: [flush_pages] "r" (vmx_l1d_flush_pages),
		    [size] "r" (size)
		: "eax", "ebx", "ecx", "edx");
}

static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
{
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
	int tpr_threshold;

	if (is_guest_mode(vcpu) &&
	    nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
		return;

	tpr_threshold = (irr == -1 || tpr < irr) ? 0 : irr;
	if (is_guest_mode(vcpu))
		to_vmx(vcpu)->nested.l1_tpr_threshold = tpr_threshold;
	else
		vmcs_write32(TPR_THRESHOLD, tpr_threshold);
}

void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	u32 sec_exec_control;

	if (!lapic_in_kernel(vcpu))
		return;

	if (!flexpriority_enabled &&
	    !cpu_has_vmx_virtualize_x2apic_mode())
		return;

	/* Postpone execution until vmcs01 is the current VMCS. */
	if (is_guest_mode(vcpu)) {
		vmx->nested.change_vmcs01_virtual_apic_mode = true;
		return;
	}

	sec_exec_control = secondary_exec_controls_get(vmx);
	sec_exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
			      SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE);

	switch (kvm_get_apic_mode(vcpu)) {
	case LAPIC_MODE_INVALID:
		WARN_ONCE(true, "Invalid local APIC state");
	case LAPIC_MODE_DISABLED:
		break;
	case LAPIC_MODE_XAPIC:
		if (flexpriority_enabled) {
			sec_exec_control |=
				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
			kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
			/*
			 * Flush the TLB, reloading the APIC access page will
			 * only do so if its physical address has changed, but
			 * the guest may have inserted a non-APIC mapping into
			 * the TLB while the APIC access page was disabled.
			 */
			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
		}
		break;
	case LAPIC_MODE_X2APIC:
		if (cpu_has_vmx_virtualize_x2apic_mode())
			sec_exec_control |=
				SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE;
		break;
	}
	secondary_exec_controls_set(vmx, sec_exec_control);

	vmx_update_msr_bitmap(vcpu);
}

static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
{
	struct page *page;

	/* Defer reload until vmcs01 is the current VMCS. */
	if (is_guest_mode(vcpu)) {
		to_vmx(vcpu)->nested.reload_vmcs01_apic_access_page = true;
		return;
	}

	if (!(secondary_exec_controls_get(to_vmx(vcpu)) &
	    SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
		return;

	page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
	if (is_error_page(page))
		return;

	vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
	vmx_flush_tlb_current(vcpu);

	/*
	 * Do not pin the APIC access page in memory; the MMU notifier
	 * will call us again if it is migrated or swapped out.
	 */
	put_page(page);
}

static void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
{
	u16 status;
	u8 old;

	if (max_isr == -1)
		max_isr = 0;

	status = vmcs_read16(GUEST_INTR_STATUS);
	old = status >> 8;
	if (max_isr != old) {
		status &= 0xff;
		status |= max_isr << 8;
		vmcs_write16(GUEST_INTR_STATUS, status);
	}
}

static void vmx_set_rvi(int vector)
{
	u16 status;
	u8 old;

	if (vector == -1)
		vector = 0;

	status = vmcs_read16(GUEST_INTR_STATUS);
	old = (u8)status & 0xff;
	if ((u8)vector != old) {
		status &= ~0xff;
		status |= (u8)vector;
		vmcs_write16(GUEST_INTR_STATUS, status);
	}
}

static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
{
	/*
	 * When running L2, updating RVI is only relevant when vmcs12
	 * virtual-interrupt-delivery is enabled.  However, it can be
	 * enabled only when L1 also intercepts external interrupts, and
	 * in that case we should not update vmcs02 RVI but instead
	 * intercept the interrupt.  Therefore, do nothing when running L2.
	 */
	if (!is_guest_mode(vcpu))
		vmx_set_rvi(max_irr);
}

static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	int max_irr;
	bool max_irr_updated;

	WARN_ON(!vcpu->arch.apicv_active);
	if (pi_test_on(&vmx->pi_desc)) {
		pi_clear_on(&vmx->pi_desc);
		/*
		 * IOMMU can write to PID.ON, so the barrier matters even on UP.
		 * But on x86 this is just a compiler barrier anyway.
		 */
		smp_mb__after_atomic();
		max_irr_updated =
			kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr);

		/*
		 * If we are running L2 and L1 has a new pending interrupt
		 * which can be injected, we should re-evaluate
		 * what should be done with this new L1 interrupt.
		 * If L1 intercepts external-interrupts, we should
		 * exit from L2 to L1.  Otherwise, interrupt should be
		 * delivered directly to L2.
		 */
		if (is_guest_mode(vcpu) && max_irr_updated) {
			if (nested_exit_on_intr(vcpu))
				kvm_vcpu_exiting_guest_mode(vcpu);
			else
				kvm_make_request(KVM_REQ_EVENT, vcpu);
		}
	} else {
		max_irr = kvm_lapic_find_highest_irr(vcpu);
	}
	vmx_hwapic_irr_update(vcpu, max_irr);
	return max_irr;
}

static bool vmx_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
{
	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);

	return pi_test_on(pi_desc) ||
		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
}

static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
{
	if (!kvm_vcpu_apicv_active(vcpu))
		return;

	vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
	vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
	vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
}

static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	pi_clear_on(&vmx->pi_desc);
	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
}
2017-12-20 05:56:53 -07:00
2019-04-19 23:50:59 -06:00
static void handle_exception_nmi_irqoff ( struct vcpu_vmx * vmx )
2018-12-03 14:53:18 -07:00
{
2020-04-15 14:34:54 -06:00
u32 intr_info = vmx_get_intr_info ( & vmx - > vcpu ) ;
2011-05-25 14:10:02 -06:00
2018-12-03 14:53:18 -07:00
/* if exit due to PF check for async PF */
2020-04-15 14:34:54 -06:00
if ( is_page_fault ( intr_info ) ) {
2020-05-25 08:41:17 -06:00
vmx - > vcpu . arch . apf . host_apf_flags = kvm_read_and_reset_apf_flags ( ) ;
2018-12-03 14:53:18 -07:00
/* Handle machine checks before interrupts are enabled */
2020-04-15 14:34:54 -06:00
} else if ( is_machine_check ( intr_info ) ) {
2018-12-03 14:53:18 -07:00
kvm_machine_check ( ) ;
/* We need to handle NMIs before interrupts are enabled */
2020-04-15 14:34:54 -06:00
} else if ( is_nmi ( intr_info ) ) {
2018-12-03 14:53:18 -07:00
kvm_before_interrupt ( & vmx - > vcpu ) ;
asm ( " int $2 " ) ;
kvm_after_interrupt ( & vmx - > vcpu ) ;
2011-05-25 14:10:02 -06:00
}
2018-12-03 14:53:18 -07:00
}

static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
{
	unsigned int vector;
	unsigned long entry;
#ifdef CONFIG_X86_64
	unsigned long tmp;
#endif
	gate_desc *desc;
	u32 intr_info = vmx_get_intr_info(vcpu);

	if (WARN_ONCE(!is_external_intr(intr_info),
	    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
		return;

	vector = intr_info & INTR_INFO_VECTOR_MASK;
	desc = (gate_desc *)host_idt_base + vector;
	entry = gate_offset(desc);

	kvm_before_interrupt(vcpu);

	asm volatile(
#ifdef CONFIG_X86_64
		"mov %%rsp, %[sp]\n\t"
		"and $-16, %%rsp\n\t"
		"push %[ss]\n\t"
		"push %[sp]\n\t"
#endif
		"pushf\n\t"
		"push %[cs]\n\t"
		CALL_NOSPEC
		:
#ifdef CONFIG_X86_64
		[sp] "=&r" (tmp),
#endif
		ASM_CALL_CONSTRAINT
		:
		[thunk_target] "r" (entry),
#ifdef CONFIG_X86_64
		[ss] "i" (__KERNEL_DS),
#endif
		[cs] "i" (__KERNEL_CS)
	);

	kvm_after_interrupt(vcpu);
}
STACK_FRAME_NON_STANDARD(handle_external_interrupt_irqoff);

static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (vmx->exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
		handle_external_interrupt_irqoff(vcpu);
	else if (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI)
		handle_exception_nmi_irqoff(vmx);
}

static bool vmx_has_emulated_msr(u32 index)
{
	switch (index) {
	case MSR_IA32_SMBASE:
		/*
		 * We cannot do SMM unless we can run the guest in big
		 * real mode.
		 */
		return enable_unrestricted_guest || emulate_invalid_guest_state;
	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
		return nested;
	case MSR_AMD64_VIRT_SPEC_CTRL:
		/* This is AMD only. */
		return false;
	default:
		return true;
	}
}

static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
{
	u32 exit_intr_info;
	bool unblock_nmi;
	u8 vector;
	bool idtv_info_valid;
idtv_info_valid = vmx - > idt_vectoring_info & VECTORING_INFO_VALID_MASK ;
2013-09-25 03:51:36 -06:00
2018-12-03 14:53:18 -07:00
if ( enable_vnmi ) {
if ( vmx - > loaded_vmcs - > nmi_known_unmasked )
return ;
2020-04-15 14:34:54 -06:00
exit_intr_info = vmx_get_intr_info ( & vmx - > vcpu ) ;
2018-12-03 14:53:18 -07:00
unblock_nmi = ( exit_intr_info & INTR_INFO_UNBLOCK_NMI ) ! = 0 ;
vector = exit_intr_info & INTR_INFO_VECTOR_MASK ;
/*
* SDM 3 : 27.7 .1 .2 ( September 2008 )
* Re - set bit " block by NMI " before VM entry if vmexit caused by
* a guest IRET fault .
* SDM 3 : 23.2 .2 ( September 2008 )
* Bit 12 is undefined in any of the following cases :
* If the VM exit sets the valid bit in the IDT - vectoring
* information field .
* If the VM exit is due to a double fault .
*/
if ( ( exit_intr_info & INTR_INFO_VALID_MASK ) & & unblock_nmi & &
vector ! = DF_VECTOR & & ! idtv_info_valid )
vmcs_set_bits ( GUEST_INTERRUPTIBILITY_INFO ,
GUEST_INTR_STATE_NMI ) ;
else
vmx - > loaded_vmcs - > nmi_known_unmasked =
! ( vmcs_read32 ( GUEST_INTERRUPTIBILITY_INFO )
& GUEST_INTR_STATE_NMI ) ;
} else if ( unlikely ( vmx - > loaded_vmcs - > soft_vnmi_blocked ) )
vmx - > loaded_vmcs - > vnmi_blocked_time + =
ktime_to_ns ( ktime_sub ( ktime_get ( ) ,
vmx - > loaded_vmcs - > entry_time ) ) ;
2011-05-25 14:10:02 -06:00
}
static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
				      u32 idt_vectoring_info,
				      int instr_len_field,
				      int error_code_field)
{
	u8 vector;
	int type;
	bool idtv_info_valid;

	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;

	vcpu->arch.nmi_injected = false;
	kvm_clear_exception_queue(vcpu);
	kvm_clear_interrupt_queue(vcpu);

	if (!idtv_info_valid)
		return;

	kvm_make_request(KVM_REQ_EVENT, vcpu);

	vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
	type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;

	switch (type) {
	case INTR_TYPE_NMI_INTR:
		vcpu->arch.nmi_injected = true;
		/*
		 * SDM 3: 27.7.1.2 (September 2008)
		 * Clear bit "block by NMI" before VM entry if a NMI
		 * delivery faulted.
		 */
		vmx_set_nmi_mask(vcpu, false);
		break;
	case INTR_TYPE_SOFT_EXCEPTION:
		vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
		/* fall through */
	case INTR_TYPE_HARD_EXCEPTION:
		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
			u32 err = vmcs_read32(error_code_field);
			kvm_requeue_exception_e(vcpu, vector, err);
		} else
			kvm_requeue_exception(vcpu, vector);
		break;
	case INTR_TYPE_SOFT_INTR:
		vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
		/* fall through */
	case INTR_TYPE_EXT_INTR:
		kvm_queue_interrupt(vcpu, vector, type == INTR_TYPE_SOFT_INTR);
		break;
	default:
		break;
	}
}
static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
{
	__vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
				  VM_EXIT_INSTRUCTION_LEN,
				  IDT_VECTORING_ERROR_CODE);
}
static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
{
	__vmx_complete_interrupts(vcpu,
				  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
				  VM_ENTRY_INSTRUCTION_LEN,
				  VM_ENTRY_EXCEPTION_ERROR_CODE);

	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
}
/*
 * KVM: nVMX: add option to perform early consistency checks via H/W
 *
 * KVM defers many VMX consistency checks to the CPU, ostensibly for
 * performance reasons [1], including checks that result in VMFail (as
 * opposed to VMExit). This behavior may be undesirable for some users
 * since it means KVM detects certain classes of VMFail only after it
 * has processed guest state, e.g. emulated MSR load-on-entry. Because
 * there is a strict ordering between checks that cause VMFail and those
 * that cause VMExit, i.e. all VMFail checks are performed before any
 * checks that cause VMExit, we can detect (almost) all VMFail conditions
 * via a dry run of sorts. The "almost" qualifier exists because some
 * state in vmcs02 comes from L0, e.g. VPID, which means that hardware
 * will never detect an invalid VPID in vmcs12 because it never sees
 * said value. Software must (continue to) explicitly check such fields.
 *
 * After preparing vmcs02 with all state needed to pass the VMFail
 * consistency checks, optionally do a "test" VMEnter with an invalid
 * GUEST_RFLAGS. If the VMEnter results in a VMExit (due to bad guest
 * state), then we can safely say that the nested VMEnter should not
 * VMFail, i.e. any VMFail encountered in nested_vmx_vmexit() must be
 * due to an L0 bug. GUEST_RFLAGS is used to induce VMExit as it is
 * unconditionally loaded on all implementations of VMX, has an invalid
 * value that is writable on a 32-bit system, and its consistency check
 * is performed relatively early in all implementations (the exact order
 * of consistency checks is micro-architectural).
 *
 * Unfortunately, since the "passing" case causes a VMExit, KVM must be
 * extra diligent to ensure that host state is restored, e.g. DR7 and
 * RFLAGS are reset on VMExit. Failure to restore RFLAGS.IF is
 * particularly fatal. And of course the extra VMEnter and VMExit
 * impacts performance. The raw overhead of the early consistency
 * checks is ~6% on modern hardware (though this could easily vary based
 * on configuration), while the added latency observed from the L1 VMM
 * is ~10%. The early consistency checks do not occur in a vacuum, e.g.
 * spending more time in L0 can lead to more interrupts being serviced
 * while emulating VMEnter, thereby increasing the latency observed by
 * L1.
 *
 * Add a module param, early_consistency_checks, to provide control over
 * whether or not VMX performs the early consistency checks. In addition
 * to standard on/off behavior, the param accepts a value of -1, which is
 * essentially an "auto" setting whereby KVM does the early checks only
 * when it thinks it's running on bare metal. When running nested, doing
 * early checks is of dubious value since the resulting behavior is
 * heavily dependent on L0. In the future, the "auto" setting could also
 * be used to default to skipping the early hardware checks for certain
 * configurations/platforms if KVM reaches a state where it has 100%
 * coverage of VMFail conditions.
 *
 * [1] To my knowledge no one has implemented and tested full software
 *     emulation of the VMFail consistency checks. Until that happens,
 *     one can only speculate about the actual performance overhead of
 *     doing all VMFail consistency checks in software. Obviously any
 *     code is slower than no code, but in the grand scheme of nested
 *     virtualization it's entirely possible the overhead is negligible.
 *
 * Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
 * Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
 */

static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
{
	int i, nr_msrs;
	struct perf_guest_switch_msr *msrs;

	msrs = perf_guest_get_msrs(&nr_msrs);

	if (!msrs)
		return;

	for (i = 0; i < nr_msrs; i++)
		if (msrs[i].host == msrs[i].guest)
			clear_atomic_switch_msr(vmx, msrs[i].msr);
		else
			add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest,
					      msrs[i].host, false);
}
static void atomic_switch_umwait_control_msr(struct vcpu_vmx *vmx)
{
	u32 host_umwait_control;

	if (!vmx_has_waitpkg(vmx))
		return;

	host_umwait_control = get_umwait_control_msr();

	if (vmx->msr_ia32_umwait_control != host_umwait_control)
		add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
				      vmx->msr_ia32_umwait_control,
				      host_umwait_control, false);
	else
		clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
}
/*
 * KVM: VMX: Leave preemption timer running when it's disabled
 *
 * VMWRITEs to the major VMCS controls, pin controls included, are
 * deceptively expensive. CPUs with VMCS caching (Westmere and later)
 * also optimize away consistency checks on VM-Entry, i.e. skip
 * consistency checks if the relevant fields have not changed since the
 * last successful VM-Entry (of the cached VMCS). Because uops are a
 * precious commodity, uCode's dirty VMCS field tracking isn't as
 * precise as software would prefer. Notably, writing any of the major
 * VMCS fields effectively marks the entire VMCS dirty, i.e. causes the
 * next VM-Entry to perform all consistency checks, which consumes
 * several hundred cycles.
 *
 * As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more
 * than doubles the latency of the next VM-Entry (and again when/if the
 * flag is toggled back). In a non-nested scenario, running a
 * "standard" guest with the preemption timer enabled, toggling the
 * timer flag is uncommon but not rare, e.g. roughly 1 in 10 entries.
 * Disabling the preemption timer can change these numbers due to its
 * use for "immediate exits", even when explicitly disabled by
 * userspace.
 *
 * Nested virtualization in particular is painful, as the timer flag is
 * set for the majority of VM-Enters, but prepare_vmcs02() initializes
 * vmcs02's pin controls to *clear* the flag since the timer's final
 * state isn't known until vmx_vcpu_run(). I.e. the majority of nested
 * VM-Enters end up unnecessarily writing pin controls *twice*.
 *
 * Rather than toggle the timer flag in pin controls, set the timer
 * value itself to the largest allowed value to put it into a "soft
 * disabled" state, and ignore any spurious preemption timer exits.
 *
 * Sadly, the timer is a 32-bit value and so it can theoretically fire
 * well before the heat death of the universe, i.e. spurious exits are
 * possible. But because KVM does *not* save the timer value on VM-Exit
 * and because the timer runs at a slower rate than the TSC, the maximum
 * timer value is still sufficiently large for KVM's purposes. E.g. on
 * a modern CPU with a timer that runs at 1/32 the frequency of a
 * 2.4 GHz constant-rate TSC, the timer will fire after ~55 seconds of
 * *uninterrupted* guest execution. In other words, spurious VM-Exits
 * are effectively only possible if the host is completely tickless on
 * the logical CPU, the guest is not using the preemption timer, and
 * the guest is not generating VM-Exits for any other reason.
 *
 * To be safe from bad/weird hardware, disable the preemption timer if
 * its maximum delay is less than ten seconds. Ten seconds is mostly
 * arbitrary and was selected in no small part because it's a nice
 * round number. For simplicity and paranoia, fall back to
 * __kvm_request_immediate_exit() if the preemption timer is disabled
 * by KVM or userspace. Previously KVM continued to use the preemption
 * timer to force immediate exits even when the timer was disabled by
 * userspace. Now that KVM leaves the timer running instead of truly
 * disabling it, allow userspace to kill it entirely in the unlikely
 * event the timer (or KVM) malfunctions.
 *
 * Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
 * Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
 */

static void vmx_update_hv_timer(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	u64 tscl;
	u32 delta_tsc;

	if (vmx->req_immediate_exit) {
		vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, 0);
		vmx->loaded_vmcs->hv_timer_soft_disabled = false;
	} else if (vmx->hv_deadline_tsc != -1) {
		tscl = rdtsc();
		if (vmx->hv_deadline_tsc > tscl)
			/* set_hv_timer ensures the delta fits in 32-bits */
			delta_tsc = (u32)((vmx->hv_deadline_tsc - tscl) >>
					  cpu_preemption_timer_multi);
		else
			delta_tsc = 0;

		vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, delta_tsc);
		vmx->loaded_vmcs->hv_timer_soft_disabled = false;
	} else if (!vmx->loaded_vmcs->hv_timer_soft_disabled) {
		vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, -1);
		vmx->loaded_vmcs->hv_timer_soft_disabled = true;
	}
}
void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
{
	if (unlikely(host_rsp != vmx->loaded_vmcs->host_state.rsp)) {
		vmx->loaded_vmcs->host_state.rsp = host_rsp;
		vmcs_writel(HOST_RSP, host_rsp);
	}
}

/*
 * KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function
 *
 * ...along with the function's STACK_FRAME_NON_STANDARD tag. Moving the
 * asm blob results in a significantly smaller amount of code that is
 * marked with STACK_FRAME_NON_STANDARD, which makes it far less likely
 * that gcc will split the function and trigger a spurious objtool
 * warning. As a bonus, removing STACK_FRAME_NON_STANDARD from
 * vmx_vcpu_run() allows the bulk of the code to be properly checked by
 * objtool.
 *
 * Because %rbp is not loaded via VMCS fields, vmx_vcpu_run() must
 * manually save/restore the host's RBP and load the guest's RBP prior
 * to calling vmx_vmenter(). Modifying %rbp triggers objtool's stack
 * validation code, and so vmx_vcpu_run() is tagged with
 * STACK_FRAME_NON_STANDARD since it's impossible to avoid modifying
 * %rbp.
 *
 * Unfortunately, vmx_vcpu_run() is also a gigantic function that gcc
 * will split into separate functions, e.g. so that pieces of the
 * function can be inlined. Splitting the function means that the
 * compiled ELF file will contain one or more vmx_vcpu_run.part.*
 * functions in addition to a vmx_vcpu_run function. Depending on where
 * the function is split, objtool may warn about a "call without frame
 * pointer save/setup" in vmx_vcpu_run.part.* since objtool's stack
 * validation looks for exact names when whitelisting functions tagged
 * with STACK_FRAME_NON_STANDARD.
 *
 * Up until recently, the undesirable function splitting was effectively
 * blocked because vmx_vcpu_run() was tagged with __noclone. At the
 * time, __noclone had an unintended side effect that put vmx_vcpu_run()
 * into a separate optimization unit, which in turn prevented gcc from
 * inlining the function (or any of its own function calls) and thus
 * eliminated gcc's motivation to split the function. Removing the
 * __noclone attribute allowed gcc to optimize vmx_vcpu_run(), exposing
 * the objtool warning.
 *
 * Kudos to Qian Cai for root causing that the fnsplit optimization is
 * what caused objtool to complain.
 *
 * Fixes: 453eafbe65f7 ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
 * Tested-by: Qian Cai <cai@lca.pw>
 * Cc: Josh Poimboeuf <jpoimboe@redhat.com>
 * Reported-by: kbuild test robot <lkp@intel.com>
 * Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
 * Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
 */
static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
{
	switch (to_vmx(vcpu)->exit_reason) {
	case EXIT_REASON_MSR_WRITE:
		return handle_fastpath_set_msr_irqoff(vcpu);
	case EXIT_REASON_PREEMPTION_TIMER:
		return handle_fastpath_preemption_timer(vcpu);
	default:
		return EXIT_FASTPATH_NONE;
	}
}
bool __vmx_vcpu_run(struct vcpu_vmx *vmx, unsigned long *regs, bool launched);
2020-04-28 00:23:25 -06:00
static fastpath_t vmx_vcpu_run ( struct kvm_vcpu * vcpu )
KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function
...along with the function's STACK_FRAME_NON_STANDARD tag. Moving the
asm blob results in a significantly smaller amount of code that is
marked with STACK_FRAME_NON_STANDARD, which makes it far less likely
that gcc will split the function and trigger a spurious objtool warning.
As a bonus, removing STACK_FRAME_NON_STANDARD from vmx_vcpu_run() allows
the bulk of code to be properly checked by objtool.
Because %rbp is not loaded via VMCS fields, vmx_vcpu_run() must manually
save/restore the host's RBP and load the guest's RBP prior to calling
vmx_vmenter(). Modifying %rbp triggers objtool's stack validation code,
and so vmx_vcpu_run() is tagged with STACK_FRAME_NON_STANDARD since it's
impossible to avoid modifying %rbp.
Unfortunately, vmx_vcpu_run() is also a gigantic function that gcc will
split into separate functions, e.g. so that pieces of the function can
be inlined. Splitting the function means that the compiled Elf file
will contain one or more vmx_vcpu_run.part.* functions in addition to
a vmx_vcpu_run function. Depending on where the function is split,
objtool may warn about a "call without frame pointer save/setup" in
vmx_vcpu_run.part.* since objtool's stack validation looks for exact
names when whitelisting functions tagged with STACK_FRAME_NON_STANDARD.
Up until recently, the undesirable function splitting was effectively
blocked because vmx_vcpu_run() was tagged with __noclone. At the time,
__noclone had an unintended side effect that put vmx_vcpu_run() into a
separate optimization unit, which in turn prevented gcc from inlining
the function (or any of its own function calls) and thus eliminated gcc's
motivation to split the function. Removing the __noclone attribute
allowed gcc to optimize vmx_vcpu_run(), exposing the objtool warning.
Kudos to Qian Cai for root causing that the fnsplit optimization is what
caused objtool to complain.
Fixes: 453eafbe65f7 ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
Tested-by: Qian Cai <cai@lca.pw>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-15 18:10:53 -07:00
{
2020-04-28 00:23:25 -06:00
fastpath_t exit_fastpath ;
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned long cr3, cr4;
2020-04-28 00:23:25 -06:00
reenter_guest:
	/* Record the guest's net vcpu time for enforced NMI injections. */
	if (unlikely(!enable_vnmi &&
		     vmx->loaded_vmcs->soft_vnmi_blocked))
		vmx->loaded_vmcs->entry_time = ktime_get();

	/* Don't enter VMX if guest state is invalid, let the exit handler
	 * start emulation until we arrive back to a valid state */
	if (vmx->emulation_required)
2020-04-10 11:47:03 -06:00
		return EXIT_FASTPATH_NONE;
	if (vmx->ple_window_dirty) {
		vmx->ple_window_dirty = false;
		vmcs_write32(PLE_WINDOW, vmx->ple_window);
	}
2020-02-17 03:37:43 -07:00
	/*
	 * We did this in prepare_switch_to_guest, because it needs to
	 * be within srcu_read_lock.
	 */
	WARN_ON_ONCE(vmx->nested.need_vmcs12_to_shadow_sync);
2019-09-27 15:45:22 -06:00
	if (kvm_register_is_dirty(vcpu, VCPU_REGS_RSP))
		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
2019-09-27 15:45:22 -06:00
	if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP))
		vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);

	cr3 = __get_current_cr3_fast();
	if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
		vmcs_writel(HOST_CR3, cr3);
		vmx->loaded_vmcs->host_state.cr3 = cr3;
	}

	cr4 = cr4_read_shadow();
	if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) {
		vmcs_writel(HOST_CR4, cr4);
		vmx->loaded_vmcs->host_state.cr4 = cr4;
	}
	/* When single-stepping over STI and MOV SS, we must clear the
	 * corresponding interruptibility bits in the guest state. Otherwise
	 * vmentry fails as it then expects bit 14 (BS) in pending debug
	 * exceptions being set, but that's not correct for the guest debugging
	 * case. */
	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
		vmx_set_interrupt_shadow(vcpu, 0);
2019-10-21 17:30:25 -06:00
	kvm_load_guest_xsave_state(vcpu);
2019-04-12 01:55:39 -06:00
	pt_guest_enter(vmx);
2020-06-19 03:40:46 -06:00
	atomic_switch_perf_msrs(vmx);
2019-07-16 00:55:50 -06:00
	atomic_switch_umwait_control_msr(vmx);
KVM: VMX: Leave preemption timer running when it's disabled
VMWRITEs to the major VMCS controls, pin controls included, are
deceptively expensive. CPUs with VMCS caching (Westmere and later) also
optimize away consistency checks on VM-Entry, i.e. skip consistency
checks if the relevant fields have not changed since the last successful
VM-Entry (of the cached VMCS). Because uops are a precious commodity,
uCode's dirty VMCS field tracking isn't as precise as software would
prefer. Notably, writing any of the major VMCS fields effectively marks
the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
consistency checks, which consumes several hundred cycles.
As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
doubles the latency of the next VM-Entry (and again when/if the flag is
toggled back). In a non-nested scenario, running a "standard" guest
with the preemption timer enabled, toggling the timer flag is uncommon
but not rare, e.g. roughly 1 in 10 entries. Disabling the preemption
timer can change these numbers due to its use for "immediate exits",
even when explicitly disabled by userspace.
Nested virtualization in particular is painful, as the timer flag is set
for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
pin controls to *clear* the flag since the timer's final state isn't
known until vmx_vcpu_run(). I.e. the majority of nested VM-Enters end
up unnecessarily writing pin controls *twice*.
Rather than toggle the timer flag in pin controls, set the timer value
itself to the largest allowed value to put it into a "soft disabled"
state, and ignore any spurious preemption timer exits.
Sadly, the timer is a 32-bit value and so theoretically it can fire
before the heat death of the universe, i.e. spurious exits are possible.
But because KVM does *not* save the timer value on VM-Exit and because
the timer runs at a slower rate than the TSC, the maximum timer value
is still sufficiently large for KVM's purposes. E.g. on a modern CPU
with a timer that runs at 1/32 the frequency of a 2.4GHz constant-rate
TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
execution. In other words, spurious VM-Exits are effectively only
possible if the host is completely tickless on the logical CPU, the
guest is not using the preemption timer, and the guest is not generating
VM-Exits for any other reason.
To be safe from bad/weird hardware, disable the preemption timer if its
maximum delay is less than ten seconds. Ten seconds is mostly arbitrary
and was selected in no small part because it's a nice round number.
For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
if the preemption timer is disabled by KVM or userspace. Previously
KVM continued to use the preemption timer to force immediate exits even
when the timer was disabled by userspace. Now that KVM leaves the timer
running instead of truly disabling it, allow userspace to kill it
entirely in the unlikely event the timer (or KVM) malfunctions.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-07 13:18:05 -06:00
	if (enable_preemption_timer)
		vmx_update_hv_timer(vcpu);
2019-05-20 02:18:09 -06:00
	if (lapic_in_kernel(vcpu) &&
	    vcpu->arch.apic->lapic_timer.timer_advance_ns)
		kvm_wait_lapic_expire(vcpu);
	/*
	 * If this vCPU has touched SPEC_CTRL, restore the guest's value if
	 * it's non-zero. Since vmentry is serialising on affected CPUs, there
	 * is no need to worry about the conditional branch over the wrmsr
	 * being speculatively taken.
	 */
	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
2019-05-14 08:57:29 -06:00
/* L1D Flush includes CPU buffer clear to mitigate MDS */
2019-01-25 08:41:13 -07:00
	if (static_branch_unlikely(&vmx_l1d_should_flush))
		vmx_l1d_flush(vcpu);
2019-05-14 08:57:29 -06:00
	else if (static_branch_unlikely(&mds_user_clear))
		mds_clear_cpu_buffers();
2019-01-25 08:41:13 -07:00
	if (vcpu->arch.cr2 != read_cr2())
		write_cr2(vcpu->arch.cr2);
2019-01-25 08:41:19 -07:00
	vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
				   vmx->loaded_vmcs->launched);
2019-01-25 08:41:13 -07:00
	vcpu->arch.cr2 = read_cr2();
2014-03-07 12:03:12 -07:00
2018-12-03 14:53:18 -07:00
	/*
	 * We do not use IBRS in the kernel. If this vCPU has used the
	 * SPEC_CTRL MSR it may have left it on; save the value and
	 * turn it off. This is much more efficient than blindly adding
	 * it to the atomic save/restore list. Especially as the former
	 * (saving guest MSRs on vmexit) doesn't even exist in KVM.
	 *
	 * For non-nested case:
	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
	 * save it.
	 *
	 * For nested case:
	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
	 * save it.
	 */
	if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
		vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
2014-03-07 12:03:12 -07:00
2018-12-03 14:53:18 -07:00
	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
KVM: VMX: use preemption timer to force immediate VMExit
A VMX preemption timer value of '0' is guaranteed to cause a VMExit
prior to the CPU executing any instructions in the guest. Use the
preemption timer (if it's supported) to trigger immediate VMExit
in place of the current method of sending a self-IPI. This ensures
that pending VMExit injection to L1 occurs prior to executing any
instructions in the guest (regardless of nesting level).
When deferring VMExit injection, KVM generates an immediate VMExit
from the (possibly nested) guest by sending itself an IPI. Because
hardware interrupts are blocked prior to VMEnter and are unblocked
(in hardware) after VMEnter, this results in taking a VMExit(INTR)
before any guest instruction is executed. But, as this approach
relies on the IPI being received before VMEnter executes, it only
works as intended when KVM is running as L0. Because there are no
architectural guarantees regarding when IPIs are delivered, when
running nested the INTR may "arrive" long after L2 is running e.g.
L0 KVM doesn't force an immediate switch to L1 to deliver an INTR.
For the most part, this unintended delay is not an issue since the
events being injected to L1 also do not have architectural guarantees
regarding their timing. The notable exception is the VMX preemption
timer[1], which is architecturally guaranteed to cause a VMExit prior
to executing any instructions in the guest if the timer value is '0'
at VMEnter. Specifically, the delay in injecting the VMExit causes
the preemption timer KVM unit test to fail when run in a nested guest.
Note: this approach is viable even on CPUs with a broken preemption
timer, as broken in this context only means the timer counts at the
wrong rate. There are no known errata affecting timer value of '0'.
[1] I/O SMIs also have guarantees on when they arrive, but I have
no idea if/how those are emulated in KVM.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Use a hook for SVM instead of leaving the default in x86.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-27 16:21:12 -06:00
2018-12-03 14:53:18 -07:00
/* All fields are clean at this point */
	if (static_branch_unlikely(&enable_evmcs))
		current_evmcs->hv_clean_fields |=
			HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
2014-03-07 12:03:13 -07:00
2019-08-22 08:30:21 -06:00
	if (static_branch_unlikely(&enable_evmcs))
		current_evmcs->hv_vp_id = vcpu->arch.hyperv.vp_index;
2018-12-03 14:53:18 -07:00
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
	if (vmx->host_debugctlmsr)
		update_debugctlmsr(vmx->host_debugctlmsr);
#ifndef CONFIG_X86_64
	/*
	 * The sysexit path does not restore ds/es, so we must set them to
	 * a reasonable value ourselves.
	 *
	 * We can't defer this to vmx_prepare_switch_to_host() since that
	 * function may be executed in interrupt context, which saves and
	 * restores segments around it, nullifying its effect.
	 */
	loadsegment(ds, __USER_DS);
	loadsegment(es, __USER_DS);
#endif
KVM: nVMX: Reset register cache (available and dirty masks) on VMCS switch
Reset the per-vCPU available and dirty register masks when switching
between vmcs01 and vmcs02, as the masks track state relative to the
current VMCS. The stale masks don't cause problems in the current code
base because the registers are either unconditionally written on nested
transitions or, in the case of segment registers, have an additional
tracker that is manually reset.
Note, by dropping (previously implicitly, now explicitly) the dirty mask
when switching the active VMCS, KVM is technically losing writes to the
associated fields. But, the only regs that can be dirtied (RIP, RSP and
PDPTRs) are unconditionally written on nested transitions, e.g. explicit
writeback is a waste of cycles, and a WARN_ON would be rather pointless.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200415203454.8296-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
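The available/dirty tracking described above can be sketched with two bitmasks. This is a toy model, not KVM's actual fields or names: "avail" marks registers read out of the VMCS into the software cache, "dirty" marks cached values not yet written back, and both are meaningful only relative to the current VMCS, so a VMCS switch must clear them.

```c
#include <assert.h>

/* Toy per-vCPU register cache (illustrative names, not KVM's). */
enum {
	TOY_REG_RIP   = 1u << 0,
	TOY_REG_RSP   = 1u << 1,
	TOY_REG_PDPTR = 1u << 2,
};

struct toy_regcache {
	unsigned int avail;	/* cached from the current VMCS */
	unsigned int dirty;	/* modified, not yet written back */
};

/* Analogue of vmx_register_cache_reset(): drop all cached state. */
static void toy_regcache_reset(struct toy_regcache *c)
{
	c->avail = 0;
	c->dirty = 0;
}

/* A cached value is trustworthy only if marked available. */
static int toy_reg_is_avail(const struct toy_regcache *c, unsigned int reg)
{
	return (c->avail & reg) != 0;
}
```

After a reset, every register must be re-read from the (new) VMCS before use, which is exactly the behavior the commit relies on for correctness.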
	vmx_register_cache_reset(vcpu);

	pt_guest_exit(vmx);

	kvm_load_host_xsave_state(vcpu);

	vmx->nested.nested_run_pending = 0;
	vmx->idt_vectoring_info = 0;
	if (unlikely(vmx->fail)) {
		vmx->exit_reason = 0xdead;
		return EXIT_FASTPATH_NONE;
	}

	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
	if (unlikely((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY))
		kvm_machine_check();

	trace_kvm_exit(vmx->exit_reason, vcpu, KVM_ISA_VMX);

	if (unlikely(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
		return EXIT_FASTPATH_NONE;

	vmx->loaded_vmcs->launched = 1;
	vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);

	vmx_recover_nmi_blocking(vmx);
	vmx_complete_interrupts(vmx);

	if (is_guest_mode(vcpu))
		return EXIT_FASTPATH_NONE;

	exit_fastpath = vmx_exit_handlers_fastpath(vcpu);
	if (exit_fastpath == EXIT_FASTPATH_REENTER_GUEST) {
		if (!kvm_vcpu_exit_request(vcpu)) {
			/*
			 * FIXME: this goto should be a loop in vcpu_enter_guest,
			 * but it would incur the cost of a retpoline for now.
			 * Revisit once static calls are available.
			 */
			if (vcpu->arch.apicv_active)
				vmx_sync_pir_to_irr(vcpu);

			goto reenter_guest;
		}
		exit_fastpath = EXIT_FASTPATH_EXIT_HANDLED;
	}

	return exit_fastpath;
}
static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (enable_pml)
		vmx_destroy_pml_buffer(vmx);
	free_vpid(vmx->vpid);
	nested_vmx_free_vcpu(vcpu);
	free_loaded_vmcs(vmx->loaded_vmcs);
}
static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx;
	unsigned long *msr_bitmap;
	int i, cpu, err;

	BUILD_BUG_ON(offsetof(struct vcpu_vmx, vcpu) != 0);
	vmx = to_vmx(vcpu);

	err = -ENOMEM;

	vmx->vpid = allocate_vpid();
	/*
	 * If PML is turned on, failure on enabling PML just results in failure
	 * of creating the vcpu, therefore we can simplify PML logic (by
	 * avoiding dealing with cases, such as enabling PML partially on vcpus
	 * for the guest), etc.
	 */
	if (enable_pml) {
		vmx->pml_pg = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
		if (!vmx->pml_pg)
			goto free_vpid;
	}

	BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) != NR_SHARED_MSRS);
	for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i) {
		u32 index = vmx_msr_index[i];
		u32 data_low, data_high;
		int j = vmx->nmsrs;

		if (rdmsr_safe(index, &data_low, &data_high) < 0)
			continue;
		if (wrmsr_safe(index, data_low, data_high) < 0)
			continue;

		vmx->guest_msrs[j].index = i;
		vmx->guest_msrs[j].data = 0;
		switch (index) {
		case MSR_IA32_TSX_CTRL:
			/*
			 * No need to pass TSX_CTRL_CPUID_CLEAR through, so
			 * let's avoid changing CPUID bits under the host
			 * kernel's feet.
			 */
			vmx->guest_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
			break;
		default:
			vmx->guest_msrs[j].mask = -1ull;
			break;
		}
		++vmx->nmsrs;
	}
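The per-MSR mask set up in the loop above acts as a write filter. The helper below is a sketch with invented names, but the bit positions follow Intel's definition of MSR_IA32_TSX_CTRL (RTM_DISABLE is bit 0, CPUID_CLEAR is bit 1): with mask = ~TSX_CTRL_CPUID_CLEAR, a guest write can toggle RTM_DISABLE while its CPUID_CLEAR bit is silently dropped, so the host kernel's CPUID bits are never changed underneath it.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout per Intel's MSR_IA32_TSX_CTRL definition. */
#define TOY_TSX_CTRL_RTM_DISABLE	(1ULL << 0)
#define TOY_TSX_CTRL_CPUID_CLEAR	(1ULL << 1)

/* Illustrative helper, not a KVM function: filter a guest MSR write
 * through the per-MSR mask before it would reach hardware. */
static uint64_t toy_apply_msr_mask(uint64_t guest_val, uint64_t mask)
{
	/* Bits cleared in the mask never reach the hardware MSR. */
	return guest_val & mask;
}
```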
	err = alloc_loaded_vmcs(&vmx->vmcs01);
	if (err < 0)
		goto free_pml;

	msr_bitmap = vmx->vmcs01.msr_bitmap;
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_TSC, MSR_TYPE_R);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
	if (kvm_cstate_in_guest(vcpu->kvm)) {
		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
	}
	vmx->msr_bitmap_mode = 0;
	vmx->loaded_vmcs = &vmx->vmcs01;
	cpu = get_cpu();
	vmx_vcpu_load(vcpu, cpu);
	vcpu->cpu = cpu;
	init_vmcs(vmx);
	vmx_vcpu_put(vcpu);
	put_cpu();
	if (cpu_need_virtualize_apic_accesses(vcpu)) {
		err = alloc_apic_access_page(vcpu->kvm);
		if (err)
			goto free_vmcs;
	}
	if (enable_ept && !enable_unrestricted_guest) {
		err = init_rmode_identity_map(vcpu->kvm);
		if (err)
			goto free_vmcs;
	}

	if (nested)
		nested_vmx_setup_ctls_msrs(&vmx->nested.msrs,
					   vmx_capability.ept);
	else
		memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail
A VMEnter that VMFails (as opposed to VMExits) does not touch host
state beyond registers that are explicitly noted in the VMFail path,
e.g. EFLAGS. Host state does not need to be loaded because VMFail
is only signaled for consistency checks that occur before the CPU
starts to load guest state, i.e. there is no need to restore any
state as nothing has been modified. But in the case where a VMFail
is detected by hardware and not by KVM (due to deferring consistency
checks to hardware), KVM has already loaded some amount of guest
state. Luckily, "loaded" only means loaded to KVM's software model,
i.e. vmcs01 has not been modified. So, unwind our software model to
the pre-VMEntry host state.
Not restoring host state in this VMFail path leads to a variety of
failures because we end up with stale data in vcpu->arch, e.g. CR0,
CR4, EFER, etc... will all be out of sync relative to vmcs01. Any
significant delta in the stale data is all but guaranteed to crash
L1, e.g. emulation of SMEP, SMAP, UMIP, WP, etc... will be wrong.
An alternative to this "soft" reload would be to load host state from
vmcs12 as if we triggered a VMExit (as opposed to VMFail), but that is
wildly inconsistent with respect to the VMX architecture, e.g. an L1
VMM with separate VMExit and VMFail paths would explode.
Note that this approach does not mean KVM is 100% accurate with
respect to VMX hardware behavior, even at an architectural level
(the exact order of consistency checks is microarchitecture specific).
But 100% emulation accuracy isn't the goal (with this patch), rather
the goal is to be consistent in the information delivered to L1, e.g.
a VMExit should not fall-through VMENTER, and a VMFail should not jump
to HOST_RIP.
This technically reverts commit "5af4157388ad (KVM: nVMX: Fix mmu
context after VMLAUNCH/VMRESUME failure)", but retains the core
aspects of that patch, just in an open coded form due to the need to
pull state from vmcs01 instead of vmcs12. Restoring host state
resolves a variety of issues introduced by commit "4f350c6dbcb9
(kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)",
which remedied the incorrect behavior of treating VMFail like VMExit
but in doing so neglected to restore arch state that had been modified
prior to attempting nested VMEnter.
A sample failure that occurs due to stale vcpu.arch state is a fault
of some form while emulating an LGDT (due to emulated UMIP) from L1
after a failed VMEntry to L3, in this case when running the KVM unit
test test_tpr_threshold_values in L1. L0 also hits a WARN in this
case due to a stale arch.cr4.UMIP.
L1:
BUG: unable to handle kernel paging request at ffffc90000663b9e
PGD 276512067 P4D 276512067 PUD 276513067 PMD 274efa067 PTE 8000000271de2163
Oops: 0009 [#1] SMP
CPU: 5 PID: 12495 Comm: qemu-system-x86 Tainted: G W 4.18.0-rc2+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:native_load_gdt+0x0/0x10
...
Call Trace:
load_fixmap_gdt+0x22/0x30
__vmx_load_host_state+0x10e/0x1c0 [kvm_intel]
vmx_switch_vmcs+0x2d/0x50 [kvm_intel]
nested_vmx_vmexit+0x222/0x9c0 [kvm_intel]
vmx_handle_exit+0x246/0x15a0 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0x850/0x1830 [kvm]
kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
do_vfs_ioctl+0x9f/0x600
ksys_ioctl+0x66/0x70
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x4f/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
L0:
WARNING: CPU: 2 PID: 3529 at arch/x86/kvm/vmx.c:6618 handle_desc+0x28/0x30 [kvm_intel]
...
CPU: 2 PID: 3529 Comm: qemu-system-x86 Not tainted 4.17.2-coffee+ #76
Hardware name: Intel Corporation Kabylake Client platform/KBL S
RIP: 0010:handle_desc+0x28/0x30 [kvm_intel]
...
Call Trace:
kvm_arch_vcpu_ioctl_run+0x863/0x1840 [kvm]
kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
do_vfs_ioctl+0x9f/0x5e0
ksys_ioctl+0x66/0x70
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x49/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 5af4157388ad (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure)
Fixes: 4f350c6dbcb9 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)
Cc: Jim Mattson <jmattson@google.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-22 15:57:07 -06:00
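The "soft" reload the commit message describes can be reduced to a snapshot-and-restore pattern. The sketch below uses invented names and a tiny subset of state: because a hardware-detected VMFail leaves vmcs01 untouched, only KVM's software model needs rewinding, so capturing the pre-VMEntry state and copying it back is sufficient.

```c
#include <assert.h>

/* Toy subset of vcpu->arch state (illustrative, not KVM's struct). */
struct toy_arch {
	unsigned long cr0, cr4, efer;
};

/* Unwind the software model to the pre-VMEntry host state after a
 * VMFail; the real VMCS was never modified, so nothing else to do. */
static void toy_restore_pre_vmenter(struct toy_arch *arch,
				    const struct toy_arch *snapshot)
{
	*arch = *snapshot;
}
```

Without this restore, fields such as CR0/CR4/EFER would stay at the half-loaded guest values and diverge from vmcs01, which is exactly the class of stale-state crash shown in the traces above.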
	vmx->nested.posted_intr_nv = -1;
	vmx->nested.current_vmptr = -1ull;
	vcpu->arch.microcode_version = 0x100000000ULL;
x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation. Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.
Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.
Opportunistically, fix a few other annoyances with the defines:
- Relocate the bit defines so that they immediately follow the MSR
define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
- Add whitespace around the block of feature control defines to make
it clear they're all related.
- Use BIT() instead of manually encoding the bit shift.
- Use "VMX" instead of "VMXON" to match the SDM.
- Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
be consistent with the kernel's verbiage used for all other feature
control bits. Note, the SDM refers to the LMCE bit as LMCE_ON,
likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore
the (literal) one-off usage of _ON, the SDM is simply "wrong".
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
2019-12-20 21:44:55 -07:00
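The renaming and cleanup described above can be sketched as follows. The names here use a TOY_ prefix to mark them as illustrative, but the bit positions follow Intel's layout for IA32_FEATURE_CONTROL: lock is bit 0, enable-VMX-inside-SMX is bit 1, enable-VMX-outside-SMX is bit 2.

```c
#include <assert.h>

/* BIT() instead of hand-encoded shifts, as the commit advocates. */
#define TOY_BIT(n)				(1UL << (n))

#define TOY_FEAT_CTL_LOCKED			TOY_BIT(0)
#define TOY_FEAT_CTL_VMX_ENABLED_INSIDE_SMX	TOY_BIT(1)
#define TOY_FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX	TOY_BIT(2)

/*
 * Illustrative check: if firmware locked the MSR, VMXON outside SMX
 * is usable only when the outside-SMX enable bit was also set; an
 * unlocked MSR can still be configured by the kernel.
 */
static int toy_vmx_usable(unsigned long feat_ctl)
{
	if (!(feat_ctl & TOY_FEAT_CTL_LOCKED))
		return 1;
	return (feat_ctl & TOY_FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX) != 0;
}
```

The FEAT_CTL_LOCKED value assigned just below is the same bit-0 lock bit modeled here.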
	vmx->msr_ia32_feature_control_valid_bits = FEAT_CTL_LOCKED;

	/*
	 * Enforce invariant: pi_desc.nv is always either POSTED_INTR_VECTOR
	 * or POSTED_INTR_WAKEUP_VECTOR.
	 */
	vmx->pi_desc.nv = POSTED_INTR_VECTOR;
	vmx->pi_desc.sn = 1;

	vmx->ept_pointer = INVALID_PAGE;

	return 0;

free_vmcs:
	free_loaded_vmcs(vmx->loaded_vmcs);
free_pml:
	vmx_destroy_pml_buffer(vmx);
free_vpid:
	free_vpid(vmx->vpid);
	return err;
}
#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
static int vmx_vm_init(struct kvm *kvm)
{
	spin_lock_init(&to_kvm_vmx(kvm)->ept_pointer_lock);

	if (!ple_gap)
		kvm->arch.pause_in_guest = true;

	if (boot_cpu_has(X86_BUG_L1TF) && enable_ept) {
		switch (l1tf_mitigation) {
		case L1TF_MITIGATION_OFF:
		case L1TF_MITIGATION_FLUSH_NOWARN:
			/* 'I explicitly don't care' is set */
			break;
		case L1TF_MITIGATION_FLUSH:
		case L1TF_MITIGATION_FLUSH_NOSMT:
		case L1TF_MITIGATION_FULL:
			/*
			 * Warn upon starting the first VM in a potentially
			 * insecure environment.
			 */
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
With the following commit:
73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
... the hotplug code attempted to detect when SMT was disabled by BIOS,
in which case it reported SMT as permanently disabled. However, that
code broke a virt hotplug scenario, where the guest is booted with only
primary CPU threads, and a sibling is brought online later.
The problem is that there doesn't seem to be a way to reliably
distinguish between the HW "SMT disabled by BIOS" case and the virt
"sibling not yet brought online" case. So the above-mentioned commit
was a bit misguided, as it permanently disabled SMT for both cases,
preventing future virt sibling hotplugs.
Going back and reviewing the original problems which were attempted to
be solved by that commit, when SMT was disabled in BIOS:
1) /sys/devices/system/cpu/smt/control showed "on" instead of
"notsupported"; and
2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
I'd propose that we instead consider #1 above to not actually be a
problem. Because, at least in the virt case, it's possible that SMT
wasn't disabled by BIOS and a sibling thread could be brought online
later. So it makes sense to just always default the smt control to "on"
to allow for that possibility (assuming cpuid indicates that the CPU
supports SMT).
The real problem is #2, which has a simple fix: change vmx_vm_init() to
query the actual current SMT state -- i.e., whether any siblings are
currently online -- instead of looking at the SMT "control" sysfs value.
So fix it by:
a) reverting the original "fix" and its followup fix:
73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation")
and
b) changing vmx_vm_init() to query the actual current SMT state --
instead of the sysfs control value -- to determine whether the L1TF
warning is needed. This also requires the 'sched_smt_present'
variable to be exported, instead of 'cpu_smt_control'.
Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Mario <jmario@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
2019-01-30 06:13:58 -07:00
			if (sched_smt_active())
				pr_warn_once(L1TF_MSG_SMT);
			if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_NEVER)
				pr_warn_once(L1TF_MSG_L1D);
			break;
		case L1TF_MITIGATION_FULL_FORCE:
			/* Flush is enforced */
			break;
		}
	}
	kvm_apicv_init(kvm, enable_apicv);
	return 0;
}
static int __init vmx_check_processor_compat ( void )
{
	struct vmcs_config vmcs_conf;
	struct vmx_capability vmx_cap;
KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail
A VMEnter that VMFails (as opposed to VMExits) does not touch host
state beyond registers that are explicitly noted in the VMFail path,
e.g. EFLAGS. Host state does not need to be loaded because VMFail
is only signaled for consistency checks that occur before the CPU
starts to load guest state, i.e. there is no need to restore any
state as nothing has been modified. But in the case where a VMFail
is detected by hardware and not by KVM (due to deferring consistency
checks to hardware), KVM has already loaded some amount of guest
state. Luckily, "loaded" only means loaded to KVM's software model,
i.e. vmcs01 has not been modified. So, unwind our software model to
the pre-VMEntry host state.
Not restoring host state in this VMFail path leads to a variety of
failures because we end up with stale data in vcpu->arch, e.g. CR0,
CR4, EFER, etc... will all be out of sync relative to vmcs01. Any
significant delta in the stale data is all but guaranteed to crash
L1, e.g. emulation of SMEP, SMAP, UMIP, WP, etc... will be wrong.
An alternative to this "soft" reload would be to load host state from
vmcs12 as if we triggered a VMExit (as opposed to VMFail), but that is
wildly inconsistent with respect to the VMX architecture, e.g. an L1
VMM with separate VMExit and VMFail paths would explode.
Note that this approach does not mean KVM is 100% accurate with
respect to VMX hardware behavior, even at an architectural level
(the exact order of consistency checks is microarchitecture specific).
But 100% emulation accuracy isn't the goal (with this patch), rather
the goal is to be consistent in the information delivered to L1, e.g.
a VMExit should not fall-through VMENTER, and a VMFail should not jump
to HOST_RIP.
This technically reverts commit "5af4157388ad (KVM: nVMX: Fix mmu
context after VMLAUNCH/VMRESUME failure)", but retains the core
aspects of that patch, just in an open coded form due to the need to
pull state from vmcs01 instead of vmcs12. Restoring host state
resolves a variety of issues introduced by commit "4f350c6dbcb9
(kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)",
which remedied the incorrect behavior of treating VMFail like VMExit
but in doing so neglected to restore arch state that had been modified
prior to attempting nested VMEnter.
A sample failure that occurs due to stale vcpu.arch state is a fault
of some form while emulating an LGDT (due to emulated UMIP) from L1
after a failed VMEntry to L3, in this case when running the KVM unit
test test_tpr_threshold_values in L1. L0 also hits a WARN in this
case due to a stale arch.cr4.UMIP.
L1:
BUG: unable to handle kernel paging request at ffffc90000663b9e
PGD 276512067 P4D 276512067 PUD 276513067 PMD 274efa067 PTE 8000000271de2163
Oops: 0009 [#1] SMP
CPU: 5 PID: 12495 Comm: qemu-system-x86 Tainted: G W 4.18.0-rc2+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:native_load_gdt+0x0/0x10
...
Call Trace:
load_fixmap_gdt+0x22/0x30
__vmx_load_host_state+0x10e/0x1c0 [kvm_intel]
vmx_switch_vmcs+0x2d/0x50 [kvm_intel]
nested_vmx_vmexit+0x222/0x9c0 [kvm_intel]
vmx_handle_exit+0x246/0x15a0 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0x850/0x1830 [kvm]
kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
do_vfs_ioctl+0x9f/0x600
ksys_ioctl+0x66/0x70
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x4f/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
L0:
WARNING: CPU: 2 PID: 3529 at arch/x86/kvm/vmx.c:6618 handle_desc+0x28/0x30 [kvm_intel]
...
CPU: 2 PID: 3529 Comm: qemu-system-x86 Not tainted 4.17.2-coffee+ #76
Hardware name: Intel Corporation Kabylake Client platform/KBL S
RIP: 0010:handle_desc+0x28/0x30 [kvm_intel]
...
Call Trace:
kvm_arch_vcpu_ioctl_run+0x863/0x1840 [kvm]
kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
do_vfs_ioctl+0x9f/0x5e0
ksys_ioctl+0x66/0x70
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x49/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 5af4157388ad (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure)
Fixes: 4f350c6dbcb9 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)
Cc: Jim Mattson <jmattson@google.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
    !this_cpu_has(X86_FEATURE_VMX)) {
	pr_err("kvm: VMX is disabled on CPU %d\n", smp_processor_id());
	return -EIO;
}

if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
	return -EIO;

if (nested)
	nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);

if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
	printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
	       smp_processor_id());
	return -EIO;
}

return 0;
}
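The memcmp() above enforces that every CPU derives an identical VMCS configuration from its VMX capability MSRs; otherwise KVM refuses to load with -EIO. A minimal sketch of that policy, using a hypothetical stand-in struct rather than KVM's real struct vmcs_config:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Stand-in for struct vmcs_config: a few of the control fields that
 * are derived from the per-CPU VMX capability MSRs. */
struct fake_vmcs_config {
	unsigned int pin_based_exec_ctrl;
	unsigned int cpu_based_exec_ctrl;
	unsigned int vmexit_ctrl;
	unsigned int vmentry_ctrl;
};

/* Reject any CPU whose derived configuration differs from the global
 * one: a VMCS built against one configuration is not guaranteed to be
 * valid on a CPU that reports different capabilities. */
static int check_cpu_compat(const struct fake_vmcs_config *global,
			    const struct fake_vmcs_config *this_cpu)
{
	if (memcmp(global, this_cpu, sizeof(*global)) != 0)
		return -EIO;	/* feature inconsistency across CPUs */
	return 0;
}
```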
static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
	u8 cache;
	u64 ipat = 0;
	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
	 * memory aliases with conflicting memory types and sometimes MCEs.
	 * We have to be careful as to what are honored and when.
	 *
	 * For MMIO, guest CD/MTRR are ignored.  The EPT memory type is set to
	 * UC.  The effective memory type is UC or WC depending on guest PAT.
	 * This was historically the source of MCEs and we want to be
	 * conservative.
	 *
	 * When there is no need to deal with noncoherent DMA (e.g., no VT-d
	 * or VT-d has snoop control), guest CD/MTRR/PAT are all ignored.  The
	 * EPT memory type is set to WB.  The effective memory type is forced
	 * WB.
	 *
	 * Otherwise, we trust guest.  Guest CD/MTRR/PAT are all honored.  The
	 * EPT memory type is used to emulate guest CD/MTRR.
	 */
	if (is_mmio) {
		cache = MTRR_TYPE_UNCACHABLE;
		goto exit;
	}
	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
		ipat = VMX_EPT_IPAT_BIT;
		cache = MTRR_TYPE_WRBACK;
		goto exit;
	}
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	if (kvm_read_cr0(vcpu) & X86_CR0_CD) {
		ipat = VMX_EPT_IPAT_BIT;
		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
			cache = MTRR_TYPE_WRBACK;
		else
			cache = MTRR_TYPE_UNCACHABLE;
		goto exit;
	}
	cache = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
exit:
	return (cache << VMX_EPT_MT_EPTE_SHIFT) | ipat;
}
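The CR0.CD branch above can be exercised in isolation. This standalone sketch mirrors the EPT memory-type computation for the cache-disabled case; the constant values are assumed to match the kernel headers, and `toy_ept_memtype_cd` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Constant values assumed from the kernel headers. */
#define VMX_EPT_IPAT_BIT	(1ull << 6)
#define VMX_EPT_MT_EPTE_SHIFT	3
#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRBACK	6

/* Mirrors the CR0.CD branch: with the CD/NW quirk the memory type is
 * forced to write-back, otherwise uncacheable; in both cases the IPAT
 * bit makes EPT ignore the guest PAT. */
static uint64_t toy_ept_memtype_cd(int quirk_enabled)
{
	uint64_t ipat = VMX_EPT_IPAT_BIT;
	uint8_t cache = quirk_enabled ? MTRR_TYPE_WRBACK
				      : MTRR_TYPE_UNCACHABLE;

	return ((uint64_t)cache << VMX_EPT_MT_EPTE_SHIFT) | ipat;
}
```

With the quirk enabled this yields `(6 << 3) | (1 << 6)`, i.e. write-back plus IPAT; without it, only the IPAT bit is set.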
static void vmcs_set_secondary_exec_control(struct vcpu_vmx *vmx)
{
	/*
	 * These bits in the secondary execution controls field
	 * are dynamic, the others are mostly based on the hypervisor
	 * architecture and the guest's CPUID.  Do not touch the
	 * dynamic bits.
	 */
	u32 mask =
		SECONDARY_EXEC_SHADOW_VMCS |
		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
		SECONDARY_EXEC_DESC;
	u32 new_ctl = vmx->secondary_exec_control;
	u32 cur_ctl = secondary_exec_controls_get(vmx);
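The dynamic-bits comment above describes a mask-merge: bits in the mask keep their currently-programmed value, everything else comes from the freshly computed controls. A minimal sketch of that merge (the helper name and test values are illustrative, not the kernel's):

```c
#include <stdint.h>

/* Merge new secondary exec controls with the current VMCS value:
 * dynamic bits (in dynamic_mask) are preserved from cur_ctl, all
 * other bits are taken from new_ctl. */
static uint32_t merge_secondary_exec(uint32_t cur_ctl, uint32_t new_ctl,
				     uint32_t dynamic_mask)
{
	return (new_ctl & ~dynamic_mask) | (cur_ctl & dynamic_mask);
}
```

For example, with `cur_ctl = 0b1010`, `new_ctl = 0b0101` and `dynamic_mask = 0b1100`, the dynamic high bits come from `cur_ctl` and the low bits from `new_ctl`, giving `0b1001`.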
Fixes: 4f350c6dbcb9 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)
Cc: Jim Mattson <jmattson@google.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-22 15:57:07 -06:00
	secondary_exec_controls_set(vmx, (new_ctl & ~mask) | (cur_ctl & mask));
}
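The expression `(new_ctl & ~mask) | (cur_ctl & mask)` used when setting the secondary execution controls above merges a new control word with the current one while preserving the bits selected by `mask`. A minimal standalone sketch of that bit-merge pattern (the helper name `merge_controls` is hypothetical, not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Merge @new_ctl into @cur_ctl, keeping the bits selected by @mask from
 * the current value and taking all other bits from the new value.
 * Mirrors the (new_ctl & ~mask) | (cur_ctl & mask) pattern above.
 */
static uint32_t merge_controls(uint32_t cur_ctl, uint32_t new_ctl, uint32_t mask)
{
	return (new_ctl & ~mask) | (cur_ctl & mask);
}
```

With `mask == 0` the new value wins outright; with all mask bits set the current value is preserved unchanged.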
/*
 * Generate MSR_IA32_VMX_CR{0,4}_FIXED1 according to CPUID. Only set bits
 * (indicating "allowed-1") if they are supported in the guest's CPUID.
 */
static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct kvm_cpuid_entry2 *entry;

	vmx->nested.msrs.cr0_fixed1 = 0xffffffff;
	vmx->nested.msrs.cr4_fixed1 = X86_CR4_PCE;

#define cr4_fixed1_update(_cr4_mask, _reg, _cpuid_mask) do {		\
	if (entry && (entry->_reg & (_cpuid_mask)))			\
		vmx->nested.msrs.cr4_fixed1 |= (_cr4_mask);		\
} while (0)

	entry = kvm_find_cpuid_entry(vcpu, 0x1, 0);
	cr4_fixed1_update(X86_CR4_VME,        edx, feature_bit(VME));
	cr4_fixed1_update(X86_CR4_PVI,        edx, feature_bit(VME));
	cr4_fixed1_update(X86_CR4_TSD,        edx, feature_bit(TSC));
	cr4_fixed1_update(X86_CR4_DE,         edx, feature_bit(DE));
	cr4_fixed1_update(X86_CR4_PSE,        edx, feature_bit(PSE));
	cr4_fixed1_update(X86_CR4_PAE,        edx, feature_bit(PAE));
	cr4_fixed1_update(X86_CR4_MCE,        edx, feature_bit(MCE));
	cr4_fixed1_update(X86_CR4_PGE,        edx, feature_bit(PGE));
	cr4_fixed1_update(X86_CR4_OSFXSR,     edx, feature_bit(FXSR));
	cr4_fixed1_update(X86_CR4_OSXMMEXCPT, edx, feature_bit(XMM));
	cr4_fixed1_update(X86_CR4_VMXE,       ecx, feature_bit(VMX));
	cr4_fixed1_update(X86_CR4_SMXE,       ecx, feature_bit(SMX));
	cr4_fixed1_update(X86_CR4_PCIDE,      ecx, feature_bit(PCID));
	cr4_fixed1_update(X86_CR4_OSXSAVE,    ecx, feature_bit(XSAVE));

	entry = kvm_find_cpuid_entry(vcpu, 0x7, 0);
	cr4_fixed1_update(X86_CR4_FSGSBASE,   ebx, feature_bit(FSGSBASE));
	cr4_fixed1_update(X86_CR4_SMEP,       ebx, feature_bit(SMEP));
	cr4_fixed1_update(X86_CR4_SMAP,       ebx, feature_bit(SMAP));
	cr4_fixed1_update(X86_CR4_PKE,        ecx, feature_bit(PKU));
	cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
	cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));

#undef cr4_fixed1_update
}
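The `cr4_fixed1_update()` macro above sets an "allowed-1" CR4 bit only when the matching CPUID feature bit is present in a guest CPUID leaf. A self-contained toy model of the same pattern — the `cpuid_leaf` struct, bit values, and function name here are illustrative stand-ins, not the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for a CPUID leaf (not the kernel's struct). */
struct cpuid_leaf { uint32_t ecx, edx; };

#define CR4_VME       (1u << 0)
#define CR4_TSD       (1u << 2)
#define CPUID_EDX_VME (1u << 1)
#define CPUID_EDX_TSC (1u << 4)

/* Accumulate "allowed-1" bits: set a CR4 bit only if the CPUID entry
 * exists and advertises the corresponding feature bit. */
static uint32_t cr4_fixed1_from_cpuid(const struct cpuid_leaf *entry)
{
	uint32_t fixed1 = 0;

#define cr4_fixed1_update(_cr4_mask, _reg, _cpuid_mask) do {	\
	if (entry && (entry->_reg & (_cpuid_mask)))		\
		fixed1 |= (_cr4_mask);				\
} while (0)

	cr4_fixed1_update(CR4_VME, edx, CPUID_EDX_VME);
	cr4_fixed1_update(CR4_TSD, edx, CPUID_EDX_TSC);

#undef cr4_fixed1_update
	return fixed1;
}
```

The `entry &&` guard is what lets the real code skip all updates when `kvm_find_cpuid_entry()` returns NULL.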
static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (kvm_mpx_supported()) {
		bool mpx_enabled = guest_cpuid_has(vcpu, X86_FEATURE_MPX);

		if (mpx_enabled) {
			vmx->nested.msrs.entry_ctls_high |= VM_ENTRY_LOAD_BNDCFGS;
			vmx->nested.msrs.exit_ctls_high |= VM_EXIT_CLEAR_BNDCFGS;
		} else {
			vmx->nested.msrs.entry_ctls_high &= ~VM_ENTRY_LOAD_BNDCFGS;
			vmx->nested.msrs.exit_ctls_high &= ~VM_EXIT_CLEAR_BNDCFGS;
		}
	}
}
static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct kvm_cpuid_entry2 *best = NULL;
	int i;

	for (i = 0; i < PT_CPUID_LEAVES; i++) {
		best = kvm_find_cpuid_entry(vcpu, 0x14, i);
		if (!best)
			return;
		vmx->pt_desc.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM] = best->eax;
		vmx->pt_desc.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM] = best->ebx;
		vmx->pt_desc.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM] = best->ecx;
		vmx->pt_desc.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM] = best->edx;
	}

	/* Get the number of configurable Address Ranges for filtering */
	vmx->pt_desc.addr_range = intel_pt_validate_cap(vmx->pt_desc.caps,
						PT_CAP_num_address_ranges);

	/* Initialize and clear the no dependency bits */
	vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
			RTIT_CTL_USR | RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC);

	/*
	 * If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set, otherwise
	 * setting it will inject a #GP.
	 */
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_cr3_filtering))
		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_CR3EN;

	/*
	 * If CPUID.(EAX=14H,ECX=0):EBX[1]=1 CYCEn, CycThresh and
	 * PSBFreq can be set
	 */
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_cyc))
		vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_CYCLEACC |
				RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ);

	/*
	 * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn, BranchEn and
	 * MTCFreq can be set
	 */
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc))
		vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_MTC_EN |
				RTIT_CTL_BRANCH_EN | RTIT_CTL_MTC_RANGE);

	/* If CPUID.(EAX=14H,ECX=0):EBX[4]=1 FUPonPTW and PTWEn can be set */
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_ptwrite))
		vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_FUP_ON_PTW |
							RTIT_CTL_PTW_EN);

	/* If CPUID.(EAX=14H,ECX=0):EBX[5]=1 PwrEvEn can be set */
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_power_event_trace))
		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_PWR_EVT_EN;

	/* If CPUID.(EAX=14H,ECX=0):ECX[0]=1 ToPA can be set */
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_topa_output))
		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_TOPA;

	/* If CPUID.(EAX=14H,ECX=0):ECX[3]=1 FabricEn can be set */
	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_output_subsys))
		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;

	/* unmask address range configure area */
	for (i = 0; i < vmx->pt_desc.addr_range; i++)
		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
}
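The final loop above clears one 4-bit `ADDRn_CFG` field per supported address range; the fields start at bit 32 of RTIT_CTL and occupy 4 bits each. A self-contained sketch of just that bit manipulation (the helper name `unmask_addr_ranges` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clear the 4-bit ADDRn_CFG field for each of @addr_range supported
 * address ranges.  Field n occupies bits [32 + 4n, 35 + 4n] of the
 * control bitmask, matching the ~(0xfULL << (32 + i * 4)) loop above.
 */
static uint64_t unmask_addr_ranges(uint64_t ctl_bitmask, int addr_range)
{
	int i;

	for (i = 0; i < addr_range; i++)
		ctl_bitmask &= ~(0xfULL << (32 + i * 4));
	return ctl_bitmask;
}
```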
static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	/* xsaves_enabled is recomputed in vmx_compute_secondary_exec_control(). */
	vcpu->arch.xsaves_enabled = false;

	if (cpu_has_secondary_exec_ctrls()) {
		vmx_compute_secondary_exec_control(vmx);
		vmcs_set_secondary_exec_control(vmx);
	}

	if (nested_vmx_allowed(vcpu))
		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation. Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.
Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.
Opportunistically, fix a few other annoyances with the defines:
- Relocate the bit defines so that they immediately follow the MSR
define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
- Add whitespace around the block of feature control defines to make
it clear they're all related.
- Use BIT() instead of manually encoding the bit shift.
- Use "VMX" instead of "VMXON" to match the SDM.
- Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
be consistent with the kernel's verbiage used for all other feature
control bits. Note, the SDM refers to the LMCE bit as LMCE_ON,
likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore
the (literal) one-off usage of _ON, the SDM is simply "wrong".
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
2019-12-20 21:44:55 -07:00
			FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
			FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
	else
		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
			~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
			  FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);

	if (nested_vmx_allowed(vcpu)) {
		nested_vmx_cr_fixed1_bits_update(vcpu);
		nested_vmx_entry_exit_ctls_update(vcpu);
	}

	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
	    guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
		update_intel_pt_cfg(vcpu);

	if (boot_cpu_has(X86_FEATURE_RTM)) {
		struct shared_msr_entry *msr;

		msr = find_msr_entry(vmx, MSR_IA32_TSX_CTRL);
		if (msr) {
			bool enabled = guest_cpuid_has(vcpu, X86_FEATURE_RTM);

			vmx_set_guest_msr(vmx, msr, enabled ? 0 : TSX_CTRL_RTM_DISABLE);
		}
	}
}
static __init void vmx_set_cpu_caps(void)
{
	kvm_set_cpu_caps();

	/* CPUID 0x1 */
	if (nested)
		kvm_cpu_cap_set(X86_FEATURE_VMX);

	/* CPUID 0x7 */
	if (kvm_mpx_supported())
		kvm_cpu_cap_check_and_set(X86_FEATURE_MPX);
	if (cpu_has_vmx_invpcid())
		kvm_cpu_cap_check_and_set(X86_FEATURE_INVPCID);
	if (vmx_pt_mode_is_host_guest())
		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);

	if (vmx_umip_emulated())
		kvm_cpu_cap_set(X86_FEATURE_UMIP);

	/* CPUID 0xD.1 */
	supported_xss = 0;
	if (!vmx_xsaves_supported())
		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);

	/* CPUID 0x80000001 */
	if (!cpu_has_vmx_rdtscp())
		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);

	if (vmx_waitpkg_supported())
		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
}
static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
{
	to_vmx(vcpu)->req_immediate_exit = true;
}
static int vmx_check_intercept_io(struct kvm_vcpu *vcpu,
				  struct x86_instruction_info *info)
{
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
	unsigned short port;
	bool intercept;
	int size;

	if (info->intercept == x86_intercept_in ||
	    info->intercept == x86_intercept_ins) {
		port = info->src_val;
		size = info->dst_bytes;
	} else {
		port = info->dst_val;
		size = info->src_bytes;
	}

	/*
	 * If the 'use IO bitmaps' VM-execution control is 0, IO instruction
	 * VM-exits depend on the 'unconditional IO exiting' VM-execution
	 * control.
	 *
	 * Otherwise, IO instruction VM-exits are controlled by the IO bitmaps.
	 */
	if (!nested_cpu_has(vmcs12, CPU_BASED_USE_IO_BITMAPS))
		intercept = nested_cpu_has(vmcs12,
					   CPU_BASED_UNCOND_IO_EXITING);
	else
		intercept = nested_vmx_check_io_bitmaps(vcpu, port, size);

	/* FIXME: produce nested vmexit and return X86EMUL_INTERCEPTED. */
	return intercept ? X86EMUL_UNHANDLEABLE : X86EMUL_CONTINUE;
}
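The decision logic above can be reduced to three booleans: whether IO bitmaps are in use, whether unconditional IO exiting is enabled, and whether the bitmap flags this port/size. A minimal sketch of that decision table, with plain bools standing in for the vmcs12 control checks (the function name `io_intercepted` is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * If the 'use IO bitmaps' control is clear, IO exits depend solely on
 * 'unconditional IO exiting'; otherwise the IO bitmaps decide.
 */
static bool io_intercepted(bool use_io_bitmaps, bool uncond_io_exiting,
			   bool bitmap_hit)
{
	if (!use_io_bitmaps)
		return uncond_io_exiting;
	return bitmap_hit;
}
```

Note that with bitmaps enabled, the unconditional-exiting control is ignored entirely, which is exactly why the real code only consults one of the two controls per call.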
static int vmx_check_intercept(struct kvm_vcpu *vcpu,
			       struct x86_instruction_info *info,
			       enum x86_intercept_stage stage,
			       struct x86_exception *exception)
{
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

	switch (info->intercept) {
	/*
	 * RDPID causes #UD if disabled through secondary execution controls.
	 * Because it is marked as EmulateOnUD, we need to intercept it here.
	 */
	case x86_intercept_rdtscp:
		if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_RDTSCP)) {
			exception->vector = UD_VECTOR;
			exception->error_code_valid = false;
			return X86EMUL_PROPAGATE_FAULT;
		}
		break;

	case x86_intercept_in:
	case x86_intercept_ins:
	case x86_intercept_out:
	case x86_intercept_outs:
		return vmx_check_intercept_io(vcpu, info);

	case x86_intercept_lgdt:
	case x86_intercept_lidt:
	case x86_intercept_lldt:
	case x86_intercept_ltr:
	case x86_intercept_sgdt:
	case x86_intercept_sidt:
	case x86_intercept_sldt:
	case x86_intercept_str:
		if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_DESC))
			return X86EMUL_CONTINUE;

		/* FIXME: produce nested vmexit and return X86EMUL_INTERCEPTED. */
		break;

	/* TODO: check more intercepts... */
	default:
		break;
	}

	return X86EMUL_UNHANDLEABLE;
}
#ifdef CONFIG_X86_64
/* (a << shift) / divisor, return 1 if overflow otherwise 0 */
static inline int u64_shl_div_u64(u64 a, unsigned int shift,
				  u64 divisor, u64 *result)
{
	u64 low = a << shift, high = a >> (64 - shift);

	/* To avoid the overflow on divq */
	if (high >= divisor)
		return 1;

	/* Low hold the result, high hold rem which is discarded */
	asm("divq %2\n\t" : "=a" (low), "=d" (high) :
	    "rm" (divisor), "0" (low), "1" (high));
	*result = low;
	return 0;
}
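The `high >= divisor` test above is the classic `divq` overflow guard: the 128-bit dividend `a << shift` divided by `divisor` fits in 64 bits exactly when the high half of the dividend is below the divisor. The same computation can be restated portably with GCC/Clang's `__int128` (an assumption about the toolchain; the in-kernel version uses inline `divq` instead), which makes the overflow condition easy to verify:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable restatement of u64_shl_div_u64(): compute (a << shift) / divisor
 * in 128-bit arithmetic and return 1 if the quotient would not fit in
 * 64 bits — equivalent to the "high >= divisor" check before divq.
 * @divisor must be nonzero.
 */
static int u64_shl_div_u64_ref(uint64_t a, unsigned int shift,
			       uint64_t divisor, uint64_t *result)
{
	unsigned __int128 dividend = (unsigned __int128)a << shift;
	unsigned __int128 q = dividend / divisor;

	if (q > UINT64_MAX)
		return 1; /* quotient overflows 64 bits */
	*result = (uint64_t)q;
	return 0;
}
```

`q >= 2^64` holds iff `dividend >= divisor << 64`, i.e. iff `dividend >> 64 >= divisor`, which is exactly the `high >= divisor` guard in the assembly version.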
static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
			    bool *expired)
{
	struct vcpu_vmx *vmx;
	u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles;
	struct kvm_timer *ktimer = &vcpu->arch.apic->lapic_timer;

	vmx = to_vmx(vcpu);
	tscl = rdtsc();
	guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
	delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
	lapic_timer_advance_cycles = nsec_to_cycles(vcpu,
						    ktimer->timer_advance_ns);

	if (delta_tsc > lapic_timer_advance_cycles)
		delta_tsc -= lapic_timer_advance_cycles;
	else
		delta_tsc = 0;

	/* Convert to host delta tsc if tsc scaling is enabled */
	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
	    delta_tsc && u64_shl_div_u64(delta_tsc,
				kvm_tsc_scaling_ratio_frac_bits,
				vcpu->arch.tsc_scaling_ratio, &delta_tsc))
		return -ERANGE;

	/*
	 * If the delta tsc can't fit in the 32 bit after the multi shift,
	 * we can't use the preemption timer.
	 * It's possible that it fits on later vmentries, but checking
	 * on every vmentry is costly so we just use an hrtimer.
	 */
	if (delta_tsc >> (cpu_preemption_timer_multi + 32))
		return -ERANGE;

	vmx->hv_deadline_tsc = tscl + delta_tsc;
	*expired = !delta_tsc;
	return 0;
}
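Two clamps in the delta computation above are worth isolating: `max(deadline, now) - now` makes a deadline already in the past yield a zero delta rather than a huge unsigned wrap, and the timer-advance fudge is subtracted only when it would not underflow. A standalone sketch of both (the helper name `hv_timer_delta` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the cycles until @guest_deadline_tsc, clamped at zero for past
 * deadlines, minus the LAPIC timer-advance fudge @advance_cycles
 * (also clamped at zero) — mirroring the two clamps above.
 */
static uint64_t hv_timer_delta(uint64_t guest_deadline_tsc,
			       uint64_t guest_tscl, uint64_t advance_cycles)
{
	uint64_t delta_tsc;

	delta_tsc = (guest_deadline_tsc > guest_tscl ?
		     guest_deadline_tsc : guest_tscl) - guest_tscl;

	if (delta_tsc > advance_cycles)
		delta_tsc -= advance_cycles;
	else
		delta_tsc = 0;
	return delta_tsc;
}
```

A zero result is meaningful: the caller reports it via `*expired` so the timer fires immediately instead of being armed.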
static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
{
	to_vmx(vcpu)->hv_deadline_tsc = -1;
}
#endif
static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
{
	if (!kvm_pause_in_guest(vcpu->kvm))
		shrink_ple_window(vcpu);
}
static void vmx_slot_enable_log_dirty(struct kvm *kvm,
				      struct kvm_memory_slot *slot)
{
	if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
}

static void vmx_slot_disable_log_dirty(struct kvm *kvm,
				       struct kvm_memory_slot *slot)
{
	kvm_mmu_slot_set_dirty(kvm, slot);
}

static void vmx_flush_log_dirty(struct kvm *kvm)
{
	kvm_flush_pml_buffers(kvm);
}
static int vmx_write_pml_buffer(struct kvm_vcpu *vcpu, gpa_t gpa)
{
	struct vmcs12 *vmcs12;
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	gpa_t dst;

	if (is_guest_mode(vcpu)) {
		WARN_ON_ONCE(vmx->nested.pml_full);

		/*
		 * Check if PML is enabled for the nested guest.
		 * Whether eptp bit 6 is set is already checked
		 * as part of A/D emulation.
		 */
		vmcs12 = get_vmcs12(vcpu);
		if (!nested_cpu_has_pml(vmcs12))
			return 0;

		if (vmcs12->guest_pml_index >= PML_ENTITY_NUM) {
			vmx->nested.pml_full = true;
			return 1;
		}

		gpa &= ~0xFFFull;
		dst = vmcs12->pml_address + sizeof(u64) * vmcs12->guest_pml_index;

		if (kvm_write_guest_page(vcpu->kvm, gpa_to_gfn(dst), &gpa,
					 offset_in_page(dst), sizeof(gpa)))
			return 0;

		vmcs12->guest_pml_index--;
	}

	return 0;
}
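Two small pieces of address arithmetic above are easy to check in isolation: the logged GPA is truncated to its 4KiB page boundary (`gpa &= ~0xFFFull`), and the destination slot is an 8-byte entry indexed downward from the vmcs12 PML base. A sketch with hypothetical helper names (`pml_slot`, `pml_logged_gpa` are not kernel functions):

```c
#include <assert.h>
#include <stdint.h>

/* Address of the 8-byte PML slot at @guest_pml_index from @pml_address. */
static uint64_t pml_slot(uint64_t pml_address, uint16_t guest_pml_index)
{
	return pml_address + sizeof(uint64_t) * guest_pml_index;
}

/* GPA as logged by PML: truncated to the 4KiB page boundary. */
static uint64_t pml_logged_gpa(uint64_t gpa)
{
	return gpa & ~0xFFFull;
}
```

Since `guest_pml_index` is decremented after each write, the buffer fills from the highest slot downward, matching the hardware PML behavior where index 511 is consumed first.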
static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
					   struct kvm_memory_slot *memslot,
					   gfn_t offset, unsigned long mask)
{
	kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
}
static void __pi_post_block(struct kvm_vcpu *vcpu)
{
	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
	struct pi_desc old, new;
	unsigned int dest;

	do {
		old.control = new.control = pi_desc->control;
		WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
		     "Wakeup handler not enabled while the VCPU is blocked\n");

		dest = cpu_physical_id(vcpu->cpu);

		if (x2apic_enabled())
			new.ndst = dest;
		else
			new.ndst = (dest << 8) & 0xFF00;

		/* set 'NV' to 'notification vector' */
		new.nv = POSTED_INTR_VECTOR;
	} while (cmpxchg64(&pi_desc->control, old.control,
			   new.control) != old.control);

	if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
		list_del(&vcpu->blocked_vcpu_list);
		spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
		vcpu->pre_pcpu = -1;
	}
}
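The `do { ... } while (cmpxchg64(...) != old.control)` loop above is a lock-free read-modify-write: snapshot the 64-bit control word, build the new value, and retry if another CPU changed the word in between. A minimal analogue using C11 atomics — the `toy_pi_desc` layout and `toy_set_nv` helper are stand-ins, not the real posted-interrupt descriptor:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy stand-in for the posted-interrupt descriptor's control word. */
struct toy_pi_desc {
	_Atomic uint64_t control;
};

/*
 * Retry loop in the style of the cmpxchg64() loop above: the low byte
 * models the 'NV' field.  On CAS failure, compare_exchange_weak reloads
 * @old with the current value, so the update is recomputed from fresh
 * state each iteration.
 */
static void toy_set_nv(struct toy_pi_desc *pi, uint8_t nv)
{
	uint64_t old, new;

	old = atomic_load(&pi->control);
	do {
		new = (old & ~0xffULL) | nv;
	} while (!atomic_compare_exchange_weak(&pi->control, &old, new));
}
```

The kernel's open-coded loop re-reads `pi_desc->control` at the top of each iteration instead, but the effect is the same: the whole 64-bit word (NV plus NDST) is updated atomically.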
/*
 * This routine does the following things for vCPU which is going
 * to be blocked if VT-d PI is enabled.
 * - Store the vCPU to the wakeup list, so when interrupts happen
 *   we can find the right vCPU to wake up.
 * - Change the Posted-interrupt descriptor as below:
 *      'NDST' <-- vcpu->pre_pcpu
 *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
 * - If 'ON' is set during this process, which means at least one
 *   interrupt is posted for this vCPU, we cannot block it, in
 *   this case, return 1, otherwise, return 0.
 */
static int pi_pre_block(struct kvm_vcpu *vcpu)
{
	unsigned int dest;
	struct pi_desc old, new;
	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);

	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
	    !irq_remapping_cap(IRQ_POSTING_CAP) ||
	    !kvm_vcpu_apicv_active(vcpu))
		return 0;

	WARN_ON(irqs_disabled());
	local_irq_disable();
	if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
		vcpu->pre_pcpu = vcpu->cpu;
		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
		list_add_tail(&vcpu->blocked_vcpu_list,
			      &per_cpu(blocked_vcpu_on_cpu,
				       vcpu->pre_pcpu));
		spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
	}

	do {
		old.control = new.control = pi_desc->control;

		WARN((pi_desc->sn == 1),
		     "Warning: SN field of posted-interrupts "
		     "is set before blocking\n");

		/*
		 * Since vCPU can be preempted during this process,
		 * vcpu->cpu could be different with pre_pcpu, we
		 * need to set pre_pcpu as the destination of wakeup
		 * notification event, then we can find the right vCPU
		 * to wakeup in wakeup handler if interrupts happen
		 * when the vCPU is in blocked state.
		 */
		dest = cpu_physical_id(vcpu->pre_pcpu);

		if (x2apic_enabled())
			new.ndst = dest;
		else
			new.ndst = (dest << 8) & 0xFF00;

		/* set 'NV' to 'wakeup vector' */
		new.nv = POSTED_INTR_WAKEUP_VECTOR;
	} while (cmpxchg64(&pi_desc->control, old.control,
			   new.control) != old.control);

	/* We should not block the vCPU if an interrupt is posted for it. */
	if (pi_test_on(pi_desc) == 1)
		__pi_post_block(vcpu);

	local_irq_enable();
	return (vcpu->pre_pcpu == -1);
}
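Both retry loops above encode the destination CPU into the descriptor's 'NDST' field the same way: in x2APIC mode the APIC ID is stored as-is, while in xAPIC mode it occupies bits 15:8, so the ID is shifted up and masked to one byte. Isolated as a sketch (the helper name `pi_ndst` is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Encode a destination APIC ID into the posted-interrupt 'NDST' field:
 * stored directly in x2APIC mode, or in bits 15:8 in xAPIC mode — the
 * (dest << 8) & 0xFF00 expression from the loops above.
 */
static uint32_t pi_ndst(uint32_t dest, bool x2apic)
{
	return x2apic ? dest : (dest << 8) & 0xFF00;
}
```

The `& 0xFF00` mask also shows why xAPIC mode can only address 8-bit APIC IDs: larger IDs are silently truncated, which is one reason x2APIC systems must use the direct encoding.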
static int vmx_pre_block(struct kvm_vcpu *vcpu)
{
	if (pi_pre_block(vcpu))
		return 1;

	if (kvm_lapic_hv_timer_in_use(vcpu))
		kvm_lapic_switch_to_sw_timer(vcpu);

	return 0;
}
static void pi_post_block(struct kvm_vcpu *vcpu)
{
	if (vcpu->pre_pcpu == -1)
		return;

	WARN_ON(irqs_disabled());
	local_irq_disable();
	__pi_post_block(vcpu);
	local_irq_enable();
}
static void vmx_post_block(struct kvm_vcpu *vcpu)
{
	if (kvm_x86_ops.set_hv_timer)
		kvm_lapic_switch_to_hv_timer(vcpu);

	pi_post_block(vcpu);
}
/*
 * vmx_update_pi_irte - set IRTE for Posted-Interrupts
 *
 * @kvm: kvm
 * @host_irq: host irq of the interrupt
 * @guest_irq: gsi of the interrupt
 * @set: set or unset PI
 * returns 0 on success, < 0 on failure
 */
static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
			      uint32_t guest_irq, bool set)
{
	struct kvm_kernel_irq_routing_entry *e;
	struct kvm_irq_routing_table *irq_rt;
	struct kvm_lapic_irq irq;
	struct kvm_vcpu *vcpu;
	struct vcpu_data vcpu_info;
	int idx, ret = 0;

	if (!kvm_arch_has_assigned_device(kvm) ||
	    !irq_remapping_cap(IRQ_POSTING_CAP) ||
	    !kvm_vcpu_apicv_active(kvm->vcpus[0]))
		return 0;

	idx = srcu_read_lock(&kvm->irq_srcu);
	irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
	if (guest_irq >= irq_rt->nr_rt_entries ||
	    hlist_empty(&irq_rt->map[guest_irq])) {
		pr_warn_once("no route for guest_irq %u/%u (broken user space?)\n",
			     guest_irq, irq_rt->nr_rt_entries);
		goto out;
	}
	hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
		if (e->type != KVM_IRQ_ROUTING_MSI)
			continue;
		/*
		 * VT-d PI cannot support posting multicast/broadcast
		 * interrupts to a vCPU, we still use interrupt remapping
		 * for these kind of interrupts.
		 *
		 * For lowest-priority interrupts, we only support
		 * those with single CPU as the destination, e.g. user
		 * configures the interrupts via /proc/irq or uses
		 * irqbalance to make the interrupts single-CPU.
		 *
		 * We will support full lowest-priority interrupt later.
		 *
		 * In addition, we can only inject generic interrupts using
		 * the PI mechanism, refuse to route others through it.
		 */

		kvm_set_msi_irq(kvm, e, &irq);
		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu) ||
		    !kvm_irq_is_postable(&irq)) {
			/*
			 * Make sure the IRTE is in remapped mode if
			 * we don't handle it in posted mode.
			 */
			ret = irq_set_vcpu_affinity(host_irq, NULL);
			if (ret < 0) {
				printk(KERN_INFO
					"failed to back to remapped mode, irq: %u\n",
					host_irq);
				goto out;
			}

			continue;
		}

		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
		vcpu_info.vector = irq.vector;

		trace_kvm_pi_irte_update(host_irq, vcpu->vcpu_id, e->gsi,
				vcpu_info.vector, vcpu_info.pi_desc_addr, set);

		if (set)
			ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
		else
			ret = irq_set_vcpu_affinity(host_irq, NULL);

		if (ret < 0) {
			printk(KERN_INFO "%s: failed to update PI IRTE\n",
					__func__);
			goto out;
		}
	}

	ret = 0;
out:
	srcu_read_unlock(&kvm->irq_srcu, idx);
	return ret;
}
static void vmx_setup_mce(struct kvm_vcpu *vcpu)
{
	if (vcpu->arch.mcg_cap & MCG_LMCE_P)
		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
			FEAT_CTL_LMCE_ENABLED;
	else
		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
			~FEAT_CTL_LMCE_ENABLED;
}
static int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
{
	/* we need a nested vmexit to enter SMM, postpone if run is pending */
	if (to_vmx(vcpu)->nested.nested_run_pending)
		return -EBUSY;

	return !is_smm(vcpu);
}
static int vmx_pre_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	vmx->nested.smm.guest_mode = is_guest_mode(vcpu);
	if (vmx->nested.smm.guest_mode)
		nested_vmx_vmexit(vcpu, -1, 0, 0);

	vmx->nested.smm.vmxon = vmx->nested.vmxon;
	vmx->nested.vmxon = false;
	vmx_clear_hlt(vcpu);
	return 0;
}
static int vmx_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	int ret;

	if (vmx->nested.smm.vmxon) {
		vmx->nested.vmxon = true;
		vmx->nested.smm.vmxon = false;
	}

	if (vmx->nested.smm.guest_mode) {
		ret = nested_vmx_enter_non_root_mode(vcpu, false);
		if (ret)
			return ret;

		vmx->nested.smm.guest_mode = false;
	}
	return 0;
}
static void enable_smi_window(struct kvm_vcpu *vcpu)
{
	/* RSM will cause a vmexit anyway.  */
}
static bool vmx_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
{
	return false;
}

static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
{
	return to_vmx(vcpu)->nested.vmxon;
}
static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
{
	if (is_guest_mode(vcpu)) {
		struct hrtimer *timer = &to_vmx(vcpu)->nested.preemption_timer;

		if (hrtimer_try_to_cancel(timer) == 1)
			hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
	}
}

static void hardware_unsetup(void)
{
	if (nested)
		nested_vmx_hardware_unsetup();

	free_kvm_area();
}

static bool vmx_check_apicv_inhibit_reasons(ulong bit)
{
	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
			  BIT(APICV_INHIBIT_REASON_HYPERV);

	return supported & BIT(bit);
}
static struct kvm_x86_ops vmx_x86_ops __initdata = {
	.hardware_unsetup = hardware_unsetup,

	.hardware_enable = hardware_enable,
	.hardware_disable = hardware_disable,
	.cpu_has_accelerated_tpr = report_flexpriority,
	.has_emulated_msr = vmx_has_emulated_msr,

	.vm_size = sizeof(struct kvm_vmx),
	.vm_init = vmx_vm_init,

	.vcpu_create = vmx_create_vcpu,
	.vcpu_free = vmx_free_vcpu,
	.vcpu_reset = vmx_vcpu_reset,

	.prepare_guest_switch = vmx_prepare_switch_to_guest,
	.vcpu_load = vmx_vcpu_load,
	.vcpu_put = vmx_vcpu_put,

	.update_bp_intercept = update_exception_bitmap,
	.get_msr_feature = vmx_get_msr_feature,
	.get_msr = vmx_get_msr,
	.set_msr = vmx_set_msr,
	.get_segment_base = vmx_get_segment_base,
	.get_segment = vmx_get_segment,
	.set_segment = vmx_set_segment,
	.get_cpl = vmx_get_cpl,
	.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
	.set_cr0 = vmx_set_cr0,
	.set_cr4 = vmx_set_cr4,
	.set_efer = vmx_set_efer,
	.get_idt = vmx_get_idt,
	.set_idt = vmx_set_idt,
	.get_gdt = vmx_get_gdt,
	.set_gdt = vmx_set_gdt,
	.set_dr7 = vmx_set_dr7,
	.sync_dirty_debug_regs = vmx_sync_dirty_debug_regs,
	.cache_reg = vmx_cache_reg,
	.get_rflags = vmx_get_rflags,
	.set_rflags = vmx_set_rflags,

	.tlb_flush_all = vmx_flush_tlb_all,
	.tlb_flush_current = vmx_flush_tlb_current,
	.tlb_flush_gva = vmx_flush_tlb_gva,
	.tlb_flush_guest = vmx_flush_tlb_guest,

	.run = vmx_vcpu_run,
	.handle_exit = vmx_handle_exit,
	.skip_emulated_instruction = vmx_skip_emulated_instruction,
	.update_emulated_instruction = vmx_update_emulated_instruction,
	.set_interrupt_shadow = vmx_set_interrupt_shadow,
	.get_interrupt_shadow = vmx_get_interrupt_shadow,
	.patch_hypercall = vmx_patch_hypercall,
	.set_irq = vmx_inject_irq,
	.set_nmi = vmx_inject_nmi,
	.queue_exception = vmx_queue_exception,
	.cancel_injection = vmx_cancel_injection,
	.interrupt_allowed = vmx_interrupt_allowed,
	.nmi_allowed = vmx_nmi_allowed,
	.get_nmi_mask = vmx_get_nmi_mask,
	.set_nmi_mask = vmx_set_nmi_mask,
	.enable_nmi_window = enable_nmi_window,
	.enable_irq_window = enable_irq_window,
	.update_cr8_intercept = update_cr8_intercept,
	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
	.load_eoi_exitmap = vmx_load_eoi_exitmap,
	.apicv_post_state_restore = vmx_apicv_post_state_restore,
	.check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons,
	.hwapic_irr_update = vmx_hwapic_irr_update,
	.hwapic_isr_update = vmx_hwapic_isr_update,
	.guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
	.sync_pir_to_irr = vmx_sync_pir_to_irr,
	.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
	.dy_apicv_has_pending_interrupt = vmx_dy_apicv_has_pending_interrupt,

	.set_tss_addr = vmx_set_tss_addr,
	.set_identity_map_addr = vmx_set_identity_map_addr,
	.get_tdp_level = vmx_get_tdp_level,
	.get_mt_mask = vmx_get_mt_mask,

	.get_exit_info = vmx_get_exit_info,

	.cpuid_update = vmx_cpuid_update,

	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,

	.write_l1_tsc_offset = vmx_write_l1_tsc_offset,

	.load_mmu_pgd = vmx_load_mmu_pgd,

	.check_intercept = vmx_check_intercept,
	.handle_exit_irqoff = vmx_handle_exit_irqoff,

	.request_immediate_exit = vmx_request_immediate_exit,

	.sched_in = vmx_sched_in,

	.slot_enable_log_dirty = vmx_slot_enable_log_dirty,
	.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
	.flush_log_dirty = vmx_flush_log_dirty,
	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
	.write_log_dirty = vmx_write_pml_buffer,

	.pre_block = vmx_pre_block,
	.post_block = vmx_post_block,

	.pmu_ops = &intel_pmu_ops,
	.nested_ops = &vmx_nested_ops,

	.update_pi_irte = vmx_update_pi_irte,

#ifdef CONFIG_X86_64
	.set_hv_timer = vmx_set_hv_timer,
	.cancel_hv_timer = vmx_cancel_hv_timer,
#endif

	.setup_mce = vmx_setup_mce,

	.smi_allowed = vmx_smi_allowed,
	.pre_enter_smm = vmx_pre_enter_smm,
	.pre_leave_smm = vmx_pre_leave_smm,
	.enable_smi_window = enable_smi_window,

	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
	.migrate_timers = vmx_migrate_timers,
};
static __init int hardware_setup(void)
{
	unsigned long host_bndcfgs;
	struct desc_ptr dt;
	int r, i, ept_lpage_level;

	store_idt(&dt);
	host_idt_base = dt.address;
	for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
		kvm_define_shared_msr(i, vmx_msr_index[i]);

	if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
		return -EIO;

	if (boot_cpu_has(X86_FEATURE_NX))
		kvm_enable_efer_bits(EFER_NX);

	if (boot_cpu_has(X86_FEATURE_MPX)) {
		rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
		WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
	}

	if (!cpu_has_vmx_mpx())
		supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
				    XFEATURE_MASK_BNDCSR);
	if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
	    !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
		enable_vpid = 0;

	if (!cpu_has_vmx_ept() ||
	    !cpu_has_vmx_ept_4levels() ||
	    !cpu_has_vmx_ept_mt_wb() ||
	    !cpu_has_vmx_invept_global())
		enable_ept = 0;

	if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
		enable_ept_ad_bits = 0;

	if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
		enable_unrestricted_guest = 0;

	if (!cpu_has_vmx_flexpriority())
		flexpriority_enabled = 0;

	if (!cpu_has_virtual_nmis())
		enable_vnmi = 0;

	/*
	 * set_apic_access_page_addr() is used to reload apic access
	 * page upon invalidation.  No need to do anything if not
	 * using the APIC_ACCESS_ADDR VMCS field.
	 */
	if (!flexpriority_enabled)
		vmx_x86_ops.set_apic_access_page_addr = NULL;

	if (!cpu_has_vmx_tpr_shadow())
		vmx_x86_ops.update_cr8_intercept = NULL;

#if IS_ENABLED(CONFIG_HYPERV)
	if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
	    && enable_ept) {
		vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
		vmx_x86_ops.tlb_remote_flush_with_range =
				hv_remote_flush_tlb_with_range;
	}
#endif

	if (!cpu_has_vmx_ple()) {
		ple_gap = 0;
		ple_window = 0;
		ple_window_grow = 0;
		ple_window_max = 0;
		ple_window_shrink = 0;
	}

	if (!cpu_has_vmx_apicv()) {
		enable_apicv = 0;
		vmx_x86_ops.sync_pir_to_irr = NULL;
	}

	if (cpu_has_vmx_tsc_scaling()) {
		kvm_has_tsc_control = true;
		kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
		kvm_tsc_scaling_ratio_frac_bits = 48;
	}

	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */

	if (enable_ept)
		vmx_enable_tdp();

	if (!enable_ept)
		ept_lpage_level = 0;
	else if (cpu_has_vmx_ept_1g_page())
		ept_lpage_level = PG_LEVEL_1G;
	else if (cpu_has_vmx_ept_2m_page())
		ept_lpage_level = PG_LEVEL_2M;
	else
		ept_lpage_level = PG_LEVEL_4K;

	kvm_configure_mmu(enable_ept, ept_lpage_level);
	/*
	 * Only enable PML when hardware supports PML feature, and both EPT
	 * and EPT A/D bit features are enabled -- PML depends on them to work.
	 */
	if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
		enable_pml = 0;

	if (!enable_pml) {
		vmx_x86_ops.slot_enable_log_dirty = NULL;
		vmx_x86_ops.slot_disable_log_dirty = NULL;
		vmx_x86_ops.flush_log_dirty = NULL;
		vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
	}

	if (!cpu_has_vmx_preemption_timer())
		enable_preemption_timer = false;
	if (enable_preemption_timer) {
		u64 use_timer_freq = 5000ULL * 1000 * 1000;
		u64 vmx_msr;

		rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
		cpu_preemption_timer_multi =
			vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
KVM: VMX: Leave preemption timer running when it's disabled
VMWRITEs to the major VMCS controls, pin controls included, are
deceptively expensive. CPUs with VMCS caching (Westmere and later) also
optimize away consistency checks on VM-Entry, i.e. skip consistency
checks if the relevant fields have not changed since the last successful
VM-Entry (of the cached VMCS). Because uops are a precious commodity,
uCode's dirty VMCS field tracking isn't as precise as software would
prefer. Notably, writing any of the major VMCS fields effectively marks
the entire VMCS dirty, i.e. causes the next VM-Entry to perform all
consistency checks, which consumes several hundred cycles.
As it pertains to KVM, toggling PIN_BASED_VMX_PREEMPTION_TIMER more than
doubles the latency of the next VM-Entry (and again when/if the flag is
toggled back). In a non-nested scenario, running a "standard" guest
with the preemption timer enabled, toggling the timer flag is uncommon
but not rare, e.g. roughly 1 in 10 entries. Disabling the preemption
timer can change these numbers due to its use for "immediate exits",
even when explicitly disabled by userspace.
Nested virtualization in particular is painful, as the timer flag is set
for the majority of VM-Enters, but prepare_vmcs02() initializes vmcs02's
pin controls to *clear* the flag since its the timer's final state isn't
known until vmx_vcpu_run(). I.e. the majority of nested VM-Enters end
up unnecessarily writing pin controls *twice*.
Rather than toggle the timer flag in pin controls, set the timer value
itself to the largest allowed value to put it into a "soft disabled"
state, and ignore any spurious preemption timer exits.
Sadly, the timer is a 32-bit value and so theoretically it can fire
before the head death of the universe, i.e. spurious exits are possible.
But because KVM does *not* save the timer value on VM-Exit and because
the timer runs at a slower rate than the TSC, the maximuma timer value
is still sufficiently large for KVM's purposes. E.g. on a modern CPU
with a timer that runs at 1/32 the frequency of a 2.4ghz constant-rate
TSC, the timer will fire after ~55 seconds of *uninterrupted* guest
execution. In other words, spurious VM-Exits are effectively only
possible if the host is completely tickless on the logical CPU, the
guest is not using the preemption timer, and the guest is not generating
VM-Exits for any other reason.
To be safe from bad/weird hardware, disable the preemption timer if its
maximum delay is less than ten seconds. Ten seconds is mostly arbitrary
and was selected in no small part because it's a nice round number.
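The arithmetic above can be sanity-checked outside the kernel.  The sketch below is a standalone model of the heuristic, not KVM code: `preemption_timer_usable()` and its parameters are invented for illustration.  It reproduces the check that the 32-bit counter, ticking at the TSC rate scaled down by the CPU-reported power-of-two divider, must take at least ten seconds (0.1 Hz) to wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone model: the preemption timer ticks at
 * tsc_khz * 1000 Hz, scaled down by 2^multiplier (the value reported
 * in IA32_VMX_MISC).  The timer is usable only if its 32-bit maximum
 * delay spans at least ten seconds, i.e. the counter wraps no faster
 * than 0.1 Hz of uninterrupted guest time.
 */
static bool preemption_timer_usable(uint32_t tsc_khz, unsigned int multiplier)
{
	uint64_t timer_freq = (uint64_t)tsc_khz * 1000;

	timer_freq >>= multiplier;

	/* 0xffffffff / freq >= 10s  <=>  freq <= 0xffffffff / 10 */
	return timer_freq <= 0xffffffffu / 10;
}
```

With the commit message's example of a 2.4GHz TSC and a divide-by-32 timer, the effective rate is 75 MHz, comfortably under the ~429 MHz threshold, so the timer stays enabled; an (unrealistic) timer running at the full TSC rate would be rejected.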
For simplicity and paranoia, fall back to __kvm_request_immediate_exit()
if the preemption timer is disabled by KVM or userspace. Previously
KVM continued to use the preemption timer to force immediate exits even
when the timer was disabled by userspace. Now that KVM leaves the timer
running instead of truly disabling it, allow userspace to kill it
entirely in the unlikely event the timer (or KVM) malfunctions.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-07 13:18:05 -06:00
		if (tsc_khz)
			use_timer_freq = (u64)tsc_khz * 1000;
		use_timer_freq >>= cpu_preemption_timer_multi;

		/*
		 * KVM "disables" the preemption timer by setting it to its max
		 * value.  Don't use the timer if it might cause spurious exits
		 * at a rate faster than 0.1 Hz (of uninterrupted guest time).
		 */
		if (use_timer_freq > 0xffffffffu / 10)
			enable_preemption_timer = false;
	}
	if (!enable_preemption_timer) {
2020-03-21 14:25:58 -06:00
		vmx_x86_ops.set_hv_timer = NULL;
		vmx_x86_ops.cancel_hv_timer = NULL;
		vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
2018-12-03 14:53:11 -07:00
}
	kvm_set_posted_intr_wakeup_handler(wakeup_handler);

	kvm_mce_cap_supported |= MCG_LMCE_P;
2018-10-24 02:05:10 -06:00
	if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
		return -EINVAL;

	if (!enable_ept || !cpu_has_vmx_intel_pt())
		pt_mode = PT_MODE_SYSTEM;
2018-12-03 14:53:11 -07:00
	if (nested) {
2018-12-03 14:53:13 -07:00
		nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
2020-02-20 10:22:04 -07:00
					   vmx_capability.ept);
2018-12-03 14:53:13 -07:00
2020-05-06 14:46:53 -06:00
		r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
2018-12-03 14:53:11 -07:00
		if (r)
			return r;
	}
2020-03-02 16:56:43 -07:00
	vmx_set_cpu_caps();
KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking
Calculate the CPUID masks for KVM_GET_SUPPORTED_CPUID at load time using
what is effectively a KVM-adjusted copy of boot_cpu_data, or more
precisely, the x86_capability array in boot_cpu_data.
In terms of KVM support, the vast majority of CPUID feature bits are
constant, and *all* feature support is known at KVM load time. Rather
than apply boot_cpu_data, which is effectively read-only after init,
at runtime, copy it into a KVM-specific array and use *that* to mask
CPUID registers.
In addition to consolidating the masking, kvm_cpu_caps can be adjusted
by SVM/VMX at load time and thus eliminate all feature bit manipulation
in ->set_supported_cpuid().
Opportunistically clean up a few warts:
- Replace bare "unsigned" with "unsigned int" when a feature flag is
captured in a local variable, e.g. f_nx.
- Sort the CPUID masks by function, index and register (alphabetically
for registers, i.e. EBX comes before ECX/EDX).
- Remove the superfluous /* cpuid 7.0.ecx */ comments.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Call kvm_set_cpu_caps from kvm_x86_ops->hardware_setup due to fixed
GBPAGES patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
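The load-time masking idea described above can be sketched in miniature.  This is an illustrative model only: the array size, the mask values, and the helper names (`kvm_set_cpu_caps_sketch`, `kvm_cpuid_mask`) are invented; real KVM snapshots boot_cpu_data's x86_capability array and lets the vendor module clear unsupported bits once at load.

```c
#include <stdint.h>

#define NCAPINTS 4	/* toy capability-array size, not the kernel's */

/* Stand-in for boot_cpu_data.x86_capability (values are arbitrary). */
static const uint32_t host_cpu_caps[NCAPINTS] = {
	0xffffffff, 0x00ff00ff, 0x00000000, 0x00000001,
};

/* KVM's load-time copy, adjusted once instead of masked on every query. */
static uint32_t kvm_cpu_caps[NCAPINTS];

static void kvm_set_cpu_caps_sketch(void)
{
	int i;

	for (i = 0; i < NCAPINTS; i++)
		kvm_cpu_caps[i] = host_cpu_caps[i];

	/* Vendor module (VMX/SVM) clears bits it can't virtualize, once. */
	kvm_cpu_caps[1] &= ~0x000000f0u;
}

/* Runtime query reduces to a single AND against the cached array. */
static uint32_t kvm_cpuid_mask(int leaf, uint32_t guest_reg)
{
	return guest_reg & kvm_cpu_caps[leaf];
}
```

The point of the design is that every KVM_GET_SUPPORTED_CPUID query pays one AND per register instead of recomputing host support and vendor adjustments on each call.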
2020-03-02 16:56:41 -07:00
2018-12-03 14:53:11 -07:00
	r = alloc_kvm_area();
	if (r)
		nested_vmx_hardware_unsetup();

	return r;
}
2020-03-21 14:25:56 -06:00
static struct kvm_x86_init_ops vmx_init_ops __initdata = {
2006-12-10 03:21:36 -07:00
	.cpu_has_kvm_support = cpu_has_kvm_support,
	.disabled_by_bios = vmx_disabled_by_bios,
2007-07-31 05:23:01 -06:00
	.check_processor_compatibility = vmx_check_processor_compat,
2020-03-21 14:25:56 -06:00
	.hardware_setup = hardware_setup,
2018-10-16 10:50:01 -06:00
2020-03-21 14:25:56 -06:00
	.runtime_ops = &vmx_x86_ops,
2006-12-10 03:21:36 -07:00
};
2018-07-13 08:23:16 -06:00
static void vmx_cleanup_l1d_flush(void)
2018-07-02 04:47:38 -06:00
{
	if (vmx_l1d_flush_pages) {
		free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER);
		vmx_l1d_flush_pages = NULL;
	}
2018-07-13 08:23:16 -06:00
	/* Restore state so sysfs ignores VMX */
	l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
2018-07-02 04:29:30 -06:00
}
2018-07-13 08:23:18 -06:00
static void vmx_exit(void)
{
#ifdef CONFIG_KEXEC_CORE
	RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
	synchronize_rcu();
#endif
	kvm_exit();

#if IS_ENABLED(CONFIG_HYPERV)
	if (static_branch_unlikely(&enable_evmcs)) {
		int cpu;
		struct hv_vp_assist_page *vp_ap;

		/*
		 * Reset everything to support using non-enlightened VMCS
		 * access later (e.g. when we reload the module with
		 * enlightened_vmcs=0)
		 */
		for_each_online_cpu(cpu) {
			vp_ap = hv_get_vp_assist_page(cpu);
			if (!vp_ap)
				continue;
2019-08-22 08:30:21 -06:00
			vp_ap->nested_control.features.directhypercall = 0;
2018-07-13 08:23:18 -06:00
			vp_ap->current_nested_vmcs = 0;
			vp_ap->enlighten_vmentry = 0;
		}

		static_branch_disable(&enable_evmcs);
	}
#endif

	vmx_cleanup_l1d_flush();
}
module_exit(vmx_exit);
2006-12-10 03:21:36 -07:00
static int __init vmx_init(void)
{
2020-04-01 02:13:48 -06:00
	int r, cpu;
2018-03-20 08:02:11 -06:00
#if IS_ENABLED(CONFIG_HYPERV)
	/*
	 * Enlightened VMCS usage should be recommended and the host needs
	 * to support eVMCS v1 or above.  We can also disable eVMCS support
	 * with module parameter.
	 */
	if (enlightened_vmcs &&
	    ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED &&
	    (ms_hyperv.nested_features & HV_X64_ENLIGHTENED_VMCS_VERSION) >=
	    KVM_EVMCS_VERSION) {
		int cpu;

		/* Check that we have assist pages on all online CPUs */
		for_each_online_cpu(cpu) {
			if (!hv_get_vp_assist_page(cpu)) {
				enlightened_vmcs = false;
				break;
			}
		}

		if (enlightened_vmcs) {
			pr_info("KVM: vmx: using Hyper-V Enlightened VMCS\n");
			static_branch_enable(&enable_evmcs);
		}
2019-08-22 08:30:21 -06:00
		if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
			vmx_x86_ops.enable_direct_tlbflush
				= hv_enable_direct_tlbflush;
2018-03-20 08:02:11 -06:00
	} else {
		enlightened_vmcs = false;
	}
#endif
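The gating logic above boils down to four conditions that must all hold before eVMCS is enabled.  As a hedged, standalone condensation (the function and its parameters are hypothetical, not kernel API): the module parameter must be on, Hyper-V must recommend enlightened VMCS, the advertised version must cover what KVM implements, and every online CPU must have a VP assist page.

```c
#include <stdbool.h>

/*
 * Illustrative model of the eVMCS enablement check; all names here are
 * invented stand-ins for the module parameter, ms_hyperv fields, and
 * the per-CPU assist-page walk in the real code.
 */
static bool evmcs_usable(bool module_param, bool hv_recommended,
			 unsigned int hv_version, unsigned int kvm_version,
			 unsigned int cpus_with_assist_page,
			 unsigned int online_cpus)
{
	return module_param && hv_recommended &&
	       hv_version >= kvm_version &&
	       cpus_with_assist_page == online_cpus;
}
```

Note the all-or-nothing CPU requirement: a single online CPU without an assist page falls the whole system back to regular VMCS access.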
2020-03-21 14:25:56 -06:00
	r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
2018-07-13 08:23:18 -06:00
		     __alignof__(struct vcpu_vmx), THIS_MODULE);
2007-04-30 00:45:24 -06:00
	if (r)
2014-10-27 20:14:48 -06:00
		return r;
2008-03-27 23:18:56 -06:00
2018-07-13 08:23:18 -06:00
	/*
2018-07-13 08:23:19 -06:00
	 * Must be called after kvm_init() so enable_ept is properly set
	 * up.  Hand the parameter mitigation value in which was stored in
	 * the pre module init parser.  If no parameter was given, it will
	 * contain 'auto' which will be turned into the default 'cond'
	 * mitigation mode.
	 */
2019-08-26 13:30:23 -06:00
	r = vmx_setup_l1d_flush(vmentry_l1d_flush_param);
	if (r) {
		vmx_exit();
		return r;
2018-07-02 04:47:38 -06:00
}
2008-03-27 23:18:56 -06:00
2020-04-01 02:13:48 -06:00
	for_each_possible_cpu(cpu) {
		INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
		INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
		spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
	}
2015-09-09 16:38:55 -06:00
#ifdef CONFIG_KEXEC_CORE
2012-12-06 08:43:34 -07:00
	rcu_assign_pointer(crash_vmclear_loaded_vmcss,
			   crash_vmclear_local_loaded_vmcss);
#endif
2018-05-01 16:40:28 -06:00
	vmx_check_vmcs12_offsets();
2012-12-06 08:43:34 -07:00
2007-04-30 00:45:24 -06:00
	return 0;
2006-12-10 03:21:36 -07:00
}
2018-07-13 08:23:18 -06:00
module_init(vmx_init);