1
0
Fork 0
Commit Graph

85 Commits (6b0a7f84ea1fe248df96ccc4dd86e817e32ef65b)

Author SHA1 Message Date
Sean Christopherson 9ec19493fb KVM: x86: clear SMM flags before loading state while leaving SMM
RSM emulation is currently broken on VMX when the interrupted guest has
CR4.VMXE=1.  Stop dancing around the issue of HF_SMM_MASK being set when
loading SMSTATE into architectural state, e.g. by toggling it for
problematic flows, and simply clear HF_SMM_MASK prior to loading
architectural state (from SMRAM save state area).

Reported-by: Jon Doron <arilou@gmail.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: 5bea5123cb ("KVM: VMX: check nested state and CR4.VMXE against SMM")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16 15:37:36 +02:00
Sean Christopherson ed19321fb6 KVM: x86: Load SMRAM in a single shot when leaving SMM
RSM emulation is currently broken on VMX when the interrupted guest has
CR4.VMXE=1.  Rather than dance around the issue of HF_SMM_MASK being set
when loading SMSTATE into architectural state, ideally RSM emulation
itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM
architectural state.

Ostensibly, the only motivation for having HF_SMM_MASK set throughout
the loading of state from the SMRAM save state area is so that the
memory accesses from GET_SMSTATE() are tagged with role.smm.  Load
all of the SMRAM save state area from guest memory at the beginning of
RSM emulation, and load state from the buffer instead of reading guest
memory one-by-one.

This paves the way for clearing HF_SMM_MASK prior to loading state,
and also aligns RSM with the enter_smm() behavior, which fills a
buffer and writes SMRAM save state in a single go.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16 15:37:35 +02:00
Liran Alon e51bfdb687 KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU
Issue was discovered when running kvm-unit-tests on KVM running as L1 on
top of Hyper-V.

When vmx_instruction_intercept unit-test attempts to run RDPMC to test
RDPMC-exiting, it is intercepted by L1 KVM which it's EXIT_REASON_RDPMC
handler raise #GP because vCPU exposed by Hyper-V doesn't support PMU.
Instead of unit-test expectation to be reflected with EXIT_REASON_RDPMC.

The reason vmx_instruction_intercept unit-test attempts to run RDPMC
even though Hyper-V doesn't support PMU is because L1 expose to L2
support for RDPMC-exiting. Which is reasonable to assume that is
supported only in case CPU supports PMU to being with.

Above issue can easily be simulated by modifying
vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with
"-cpu host,+vmx,-pmu" and run unit-test.

To handle issue, change KVM to expose RDPMC-exiting only when guest
supports PMU.

Reported-by: Saar Amar <saaramar@microsoft.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16 15:37:34 +02:00
WANG Chao 1811d979c7 x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
guest xcr0 could leak into host when MCE happens in guest mode. Because
do_machine_check() could schedule out at a few places.

For example:

kvm_load_guest_xcr0
...
kvm_x86_ops->run(vcpu) {
  vmx_vcpu_run
    vmx_complete_atomic_exit
      kvm_machine_check
        do_machine_check
          do_memory_failure
            memory_failure
              lock_page

In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule
out, host cpu has guest xcr0 loaded (0xff).

In __switch_to {
     switch_fpu_finish
       copy_kernel_to_fpregs
         XRSTORS

If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in
and tries to reinitialize fpu by restoring init fpu state. Same story as
last #GP, except we get DOUBLE FAULT this time.

Cc: stable@vger.kernel.org
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16 15:37:33 +02:00
Paolo Bonzini 690908104e KVM: nVMX: allow tests to use bad virtual-APIC page address
As mentioned in the comment, there are some special cases where we can simply
clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
Handle them so that we can remove some XFAILs from vmx.flat.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16 10:59:07 +02:00
Sean Christopherson 0cf9135b77 KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host
userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
regardless of hardware support under the pretense that KVM fully
emulates MSR_IA32_ARCH_CAPABILITIES.  Unfortunately, only VMX hosts
handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).

Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
that it's emulated on AMD hosts.

Fixes: 1eaafe91a0 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
Cc: stable@vger.kernel.org
Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28 17:29:00 +01:00
Singh, Brijesh 05d5a48635 KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
Errata#1096:

On a nested data page fault when CR.SMAP=1 and the guest data read
generates a SMAP violation, GuestInstrBytes field of the VMCB on a
VMEXIT will incorrectly return 0h instead the correct guest
instruction bytes .

Recommend Workaround:

To determine what instruction the guest was executing the hypervisor
will have to decode the instruction at the instruction pointer.

The recommended workaround can not be implemented for the SEV
guest because guest memory is encrypted with the guest specific key,
and instruction decoder will not be able to decode the instruction
bytes. If we hit this errata in the SEV guest then log the message
and request a guest shutdown.

Reported-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28 17:27:17 +01:00
Linus Torvalds 636deed6c0 ARM: some cleanups, direct physical timer assignment, cache sanitization
for 32-bit guests
 
 s390: interrupt cleanup, introduction of the Guest Information Block,
 preparation for processor subfunctions in cpu models
 
 PPC: bug fixes and improvements, especially related to machine checks
 and protection keys
 
 x86: many, many cleanups, including removing a bunch of MMU code for
 unnecessary optimizations; plus AVIC fixes.
 
 Generic: memcg accounting
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJci+7XAAoJEL/70l94x66DUMkIAKvEefhceySHYiTpfefjLjIC
 16RewgHa+9CO4Oo5iXiWd90fKxtXLXmxDQOS4VGzN0rxvLGRw/fyXIxL1MDOkaAO
 l8SLSNuewY4XBUgISL3PMz123r18DAGOuy9mEcYU/IMesYD2F+wy5lJ17HIGq6X2
 RpoF1p3qO1jfkPTKOob6Ixd4H5beJNPKpdth7LY3PJaVhDxgouj32fxnLnATVSnN
 gENQ10fnt8BCjshRYW6Z2/9bF15JCkUFR1xdBW2/xh1oj+kvPqqqk2bEN1eVQzUy
 2hT/XkwtpthqjSbX8NNavWRSFnOnbMLTRKQyIXmFVsM5VoSrwtiGsCFzBgcT++I=
 =XIzU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - some cleanups
   - direct physical timer assignment
   - cache sanitization for 32-bit guests

  s390:
   - interrupt cleanup
   - introduction of the Guest Information Block
   - preparation for processor subfunctions in cpu models

  PPC:
   - bug fixes and improvements, especially related to machine checks
     and protection keys

  x86:
   - many, many cleanups, including removing a bunch of MMU code for
     unnecessary optimizations
   - AVIC fixes

  Generic:
   - memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
  kvm: vmx: fix formatting of a comment
  KVM: doc: Document the life cycle of a VM and its resources
  MAINTAINERS: Add KVM selftests to existing KVM entry
  Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
  KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
  KVM: PPC: Fix compilation when KVM is not enabled
  KVM: Minor cleanups for kvm_main.c
  KVM: s390: add debug logging for cpu model subfunctions
  KVM: s390: implement subfunction processor calls
  arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
  KVM: arm/arm64: Remove unused timer variable
  KVM: PPC: Book3S: Improve KVM reference counting
  KVM: PPC: Book3S HV: Fix build failure without IOMMU support
  Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
  x86: kvmguest: use TSC clocksource if invariant TSC is exposed
  KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
  KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
  KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
  KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
  KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
  ...
2019-03-15 15:00:28 -07:00
Paolo Bonzini 4a605bc08e kvm: vmx: fix formatting of a comment
Eliminate a gratuitous conflict with 5.0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-15 19:24:34 +01:00
Ben Gardon 4183683918 kvm: vmx: Add memcg accounting to KVM allocations
There are many KVM kernel memory allocations which are tied to the life of
the VM process and should be charged to the VM process's cgroup. If the
allocations aren't tied to the process, the OOM killer will not know
that killing the process will free the associated kernel memory.
Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
charged to the VM process's cgroup.

Tested:
	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
	introduced no new failures.
	Ran a kernel memory accounting test which creates a VM to touch
	memory and then checks that the kernel memory allocated for the
	process is within certain bounds.
	With this patch we account for much more of the vmalloc and slab memory
	allocated for the VM.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:31 +01:00
Paolo Bonzini b4b65b5642 KVM: x86: cleanup freeing of nested state
Ensure that the VCPU free path goes through vmx_leave_nested and
thus nested_vmx_vmexit, so that the cancellation of the timer does
not have to be in free_nested.  In addition, because some paths through
nested_vmx_vmexit do not go through sync_vmcs12, the cancellation of
the timer is moved there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:27 +01:00
Luwei Kang 81b016676e KVM: x86: Sync the pending Posted-Interrupts
Some Posted-Interrupts from passthrough devices may be lost or
overwritten when the vCPU is in runnable state.

The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state
but not running on physical CPU). If a posted interrupt coming at this
time, the irq remmaping facility will set the bit of PIR (Posted
Interrupt Requests) without ON (Outstanding Notification).
So this interrupt can't be sync to APIC virtualization register and
will not be handled by Guest because ON is zero.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
[Eliminate the pi_clear_sn fast path. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:27 +01:00
Sean Christopherson fc2ba5a27a KVM: VMX: Call vCPU-run asm sub-routine from C and remove clobbering
...now that the sub-routine follows standard calling conventions.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:18 +01:00
Sean Christopherson 3b895ef486 KVM: VMX: Preserve callee-save registers in vCPU-run asm sub-routine
...to make it callable from C code.

Note that because KVM chooses to be ultra paranoid about guest register
values, all callee-save registers are still cleared after VM-Exit even
though the host's values are now reloaded from the stack.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:17 +01:00
Sean Christopherson e75c3c3a04 KVM: VMX: Return VM-Fail from vCPU-run assembly via standard ABI reg
...to prepare for making the assembly sub-routine callable from C code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:17 +01:00
Sean Christopherson 77df549559 KVM: VMX: Pass @launched to the vCPU-run asm via standard ABI regs
...to prepare for making the sub-routine callable from C code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:16 +01:00
Sean Christopherson ee2fc635ef KVM: VMX: Rename ____vmx_vcpu_run() to __vmx_vcpu_run()
...now that the name is no longer usurped by a defunct helper function.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:15 +01:00
Sean Christopherson c823dd5c0f KVM: VMX: Fold __vmx_vcpu_run() back into vmx_vcpu_run()
...now that the code is no longer tagged with STACK_FRAME_NON_STANDARD.
Arguably, providing __vmx_vcpu_run() to break up vmx_vcpu_run() is
valuable on its own, but the previous split was purposely made as small
as possible to limit the effects STACK_FRAME_NON_STANDARD.  In other
words, the current split is now completely arbitrary and likely not the
most logical.

This also allows renaming ____vmx_vcpu_run() to __vmx_vcpu_run() in a
future patch.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:14 +01:00
Sean Christopherson 5e0781df18 KVM: VMX: Move vCPU-run code to a proper assembly routine
As evidenced by the myriad patches leading up to this moment, using
an inline asm blob for vCPU-run is nothing short of horrific.  It's also
been called "unholy", "an abomination" and likely a whole host of other
names that would violate the Code of Conduct if recorded here and now.

The code is relocated nearly verbatim, e.g. quotes, newlines, tabs and
__stringify need to be dropped, but other than those cosmetic changes
the only functional changees are to add the "call" and replace the final
"jmp" with a "ret".

Note that STACK_FRAME_NON_STANDARD is also dropped from __vmx_vcpu_run().

Suggested-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:13 +01:00
Sean Christopherson 63c73aa07f KVM: VMX: Create a stack frame in vCPU-run
...in preparation for moving to a proper assembly sub-routnine.
vCPU-run isn't a leaf function since it calls vmx_update_host_rsp()
and vmx_vmenter().  And since we need to save/restore RBP anyways,
unconditionally creating the frame costs a single MOV, i.e. don't
bother keying off CONFIG_FRAME_POINTER or using FRAME_BEGIN, etc...

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:13 +01:00
Sean Christopherson c14f9dd50b KVM: VMX: Use #defines in place of immediates in VM-Enter inline asm
...to prepare for moving the inline asm to a proper asm sub-routine.
Eliminating the immediates allows a nearly verbatim move, e.g. quotes,
newlines, tabs and __stringify need to be dropped, but other than those
cosmetic changes the only function change will be to replace the final
"jmp" with a "ret".

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:12 +01:00
Xiaoyao Li 98ae70cc47 kvm: vmx: Fix entry number check for add_atomic_switch_msr()
Commit ca83b4a7f2 ("x86/KVM/VMX: Add find_msr() helper function")
introduces the helper function find_msr(), which returns -ENOENT when
not find the msr in vmx->msr_autoload.guest/host. Correct checking contion
of no more available entry in vmx->msr_autoload.

Fixes: ca83b4a7f2 ("x86/KVM/VMX: Add find_msr() helper function")
Cc: stable@vger.kernel.org
Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-14 16:22:20 +01:00
Luwei Kang c112b5f502 KVM: x86: Recompute PID.ON when clearing PID.SN
Some Posted-Interrupts from passthrough devices may be lost or
overwritten when the vCPU is in runnable state.

The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state but
not running on physical CPU). If a posted interrupt comes at this time,
the irq remapping facility will set the bit of PIR (Posted Interrupt
Requests) but not ON (Outstanding Notification).  Then, the interrupt
will not be seen by KVM, which always expects PID.ON=1 if PID.PIR=1
as documented in the Intel processor SDM but not in the VT-d specification.
To fix this, restore the invariant after PID.SN is cleared.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-14 16:20:31 +01:00
Sean Christopherson d558920491 KVM: VMX: Use vcpu->arch.regs directly when saving/loading guest state
...now that all other references to struct vcpu_vmx have been removed.

Note that 'vmx' still needs to be passed into the asm blob in _ASM_ARG1
as it is consumed by vmx_update_host_rsp().  And similar to that code,
use _ASM_ARG2 in the assembly code to prepare for moving to proper asm,
while explicitly referencing the exact registers in the clobber list for
clarity in the short term and to avoid additional precompiler games.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:26 +01:00
Sean Christopherson f78d0971b7 KVM: VMX: Don't save guest registers after VM-Fail
A failed VM-Enter (obviously) didn't succeed, meaning the CPU never
executed an instrunction in guest mode and so can't have changed the
general purpose registers.

In addition to saving some instructions in the VM-Fail case, this also
provides a separate path entirely and thus an opportunity to propagate
the fail condition to vmx->fail via register without introducing undue
pain.  Using a register, as opposed to directly referencing vmx->fail,
eliminates the need to pass the offset of 'fail', which will simplify
moving the code to proper assembly in future patches.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:26 +01:00
Sean Christopherson 217aaff53c KVM: VMX: Invert the ordering of saving guest/host scratch reg at VM-Enter
Switching the ordering allows for an out-of-line path for VM-Fail
that elides saving guest state but still shares the register clearing
with the VM-Exit path.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:25 +01:00
Sean Christopherson c9afc58cc3 KVM: VMX: Pass "launched" directly to the vCPU-run asm blob
...and remove struct vcpu_vmx's temporary __launched variable.

Eliminating __launched is a bonus, the real motivation is to get to the
point where the only reference to struct vcpu_vmx in the asm code is
to vcpu.arch.regs, which will simplify moving the blob to a proper asm
file.  Note that also means this approach is deliberately different than
what is used in nested_vmx_check_vmentry_hw().

Use BL as it is a callee-save register in both 32-bit and 64-bit ABIs,
i.e. it can't be modified by vmx_update_host_rsp(), to avoid having to
temporarily save/restore the launched flag.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:24 +01:00
Sean Christopherson c09b03eb7f KVM: VMX: Update VMCS.HOST_RSP via helper C function
Providing a helper function to update HOST_RSP is visibly easier to
read, and more importantly (for the future) eliminates two arguments to
the VM-Enter assembly blob.  Reducing the number of arguments to the asm
blob is for all intents and purposes a prerequisite to moving the code
to a proper assembly routine.  It's not truly mandatory, but it greatly
simplifies the future code, and the cost of the extra CALL+RET is
negligible in the grand scheme.

Note that although _ASM_ARG[1-3] can be used in the inline asm itself,
the intput/output constraints need to be manually defined.  gcc will
actually compile with _ASM_ARG[1-3] specified as constraints, but what
it actually ends up doing with the bogus constraint is unknown.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:24 +01:00
Sean Christopherson 47e97c099b KVM: VMX: Load/save guest CR2 via C code in __vmx_vcpu_run()
...to eliminate its parameter and struct vcpu_vmx offset definition
from the assembly blob.  Accessing CR2 from C versus assembly doesn't
change the likelihood of taking a page fault (and modifying CR2) while
it's loaded with the guest's value, so long as we don't do anything
silly between accessing CR2 and VM-Enter/VM-Exit.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:23 +01:00
Sean Christopherson 5a8781607e KVM: nVMX: Cache host_rsp on a per-VMCS basis
Currently, host_rsp is cached on a per-vCPU basis, i.e. it's stored in
struct vcpu_vmx.  In non-nested usage the caching is for all intents
and purposes 100% effective, e.g. only the first VMLAUNCH needs to
synchronize VMCS.HOST_RSP since the call stack to vmx_vcpu_run() is
identical each and every time.  But when running a nested guest, KVM
must invalidate the cache when switching the current VMCS as it can't
guarantee the new VMCS has the same HOST_RSP as the previous VMCS.  In
other words, the cache loses almost all of its efficacy when running a
nested VM.

Move host_rsp to struct vmcs_host_state, which is per-VMCS, so that it
is cached on a per-VMCS basis and restores its 100% hit rate when
nested VMs are in play.

Note that the host_rsp cache for vmcs02 essentially "breaks" when
nested early checks are enabled as nested_vmx_check_vmentry_hw() will
see a different RSP at the time of its VM-Enter.  While it's possible
to avoid even that VMCS.HOST_RSP synchronization, e.g. by employing a
dedicated VM-Exit stack, there is little motivation for doing so as
the overhead of two VMWRITEs (~55 cycles) is dwarfed by the overhead
of the extra VMX transition (600+ cycles) and is a proverbial drop in
the ocean relative to the total cost of a nested transtion (10s of
thousands of cycles).

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:22 +01:00
Sean Christopherson 6f7c6d23b7 KVM: VMX: Let the compiler save/load RDX during vCPU-run
Per commit c20363006a ("KVM: VMX: Let gcc to choose which registers
to save (x86_64)"), the only reason RDX is saved/loaded to/from the
stack is because it was specified as an input, i.e. couldn't be marked
as clobbered (ignoring the fact that "saving" it to a dummy output
would indirectly mark it as clobbered).

Now that RDX is no longer an input, clobber it.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:17 +01:00
Sean Christopherson ccf447434e KVM: VMX: Manually load RDX in vCPU-run asm blob
Load RDX with the VMCS.HOST_RSP field encoding on-demand instead of
delegating to the compiler via an input constraint.  In addition to
saving one whole MOV instruction, this allows RDX to be properly
clobbered (in a future patch) instead of being saved/loaded to/from
the stack.

Despite nested_vmx_check_vmentry_hw() having similar code, leave it
alone, for now.  In that case, RDX is unconditionally used and isn't
clobbered, i.e. sending in HOST_RSP as an input is simpler.

Note that because HOST_RSP is an enum and not a define, it must be
redefined as an immediate instead of using __stringify(HOST_RSP).  The
naming "conflict" between host_rsp and HOST_RSP is slightly confusing,
but the former will be removed in a future patch, at which point
HOST_RSP is absolutely what is desired.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:16 +01:00
Sean Christopherson f3689e3f17 KVM: VMX: Save RSI to an unused output in the vCPU-run asm blob
RSI is clobbered by the vCPU-run asm blob, but it's not marked as such,
probably because GCC doesn't let you mark inputs as clobbered.  "Save"
RSI to a dummy output so that GCC recognizes it as being clobbered.

Fixes: 773e8a0425 ("x86/kvm: use Enlightened VMCS when running on Hyper-V")
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:15 +01:00
Sean Christopherson 831a301129 KVM: VMX: Modify only RSP when creating a placeholder for guest's RCX
In the vCPU-run asm blob, the guest's RCX is temporarily saved onto the
stack after VM-Exit as the exit flow must first load a register with a
pointer to the vCPU's save area in order to save the guest's registers.
RCX is arbitrarily designated as the scratch register.

Since the stack usage is to (1)save host, (2)save guest, (3)load host
and (4)load guest, the code can't conform to the stack's natural FIFO
semantics, i.e. it can't simply do PUSH/POP.  Regardless of whether it
is done for the host's value or guest's value, at some point the code
needs to access the stack using a non-traditional method, e.g. MOV
instead of POP.  vCPU-run opts to create a placeholder on the stack for
guest's RCX (by adjusting RSP) and saves RCX to its place immediately
after VM-Exit (via MOV).

In other words, the purpose of the first 'PUSH RCX' at the start of
the vCPU-run asm blob  is to adjust RSP down, i.e. there's no need to
actually access memory.  Use 'SUB $wordsize, RSP' instead of 'PUSH RCX'
to make it more obvious that the intent is simply to create a gap on
the stack for the guest's RCX.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:14 +01:00
Sean Christopherson 0e0ab73c9a KVM: VMX: Zero out *all* general purpose registers after VM-Exit
...except RSP, which is restored by hardware as part of VM-Exit.

Paolo theorized that restoring registers from the stack after a VM-Exit
in lieu of zeroing them could lead to speculative execution with the
guest's values, e.g. if the stack accesses miss the L1 cache[1].
Zeroing XORs are dirt cheap, so just be ultra-paranoid.

Note that the scratch register (currently RCX) used to save/restore the
guest state is also zeroed as its host-defined value is loaded via the
stack, just with a MOV instead of a POP.

[1] https://patchwork.kernel.org/patch/10771539/#22441255

Fixes: 0cb5b30698 ("kvm: vmx: Scrub hardware GPRs at VM-exit")
Cc: <stable@vger.kernel.org>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:14 +01:00
Sean Christopherson 61c08aa960 KVM: VMX: Compare only a single byte for VMCS' "launched" in vCPU-run
The vCPU-run asm blob does a manual comparison of a VMCS' launched
status to execute the correct VM-Enter instruction, i.e. VMLAUNCH vs.
VMRESUME.  The launched flag is a bool, which is a typedef of _Bool.
C99 does not define an exact size for _Bool, stating only that is must
be large enough to hold '0' and '1'.  Most, if not all, compilers use
a single byte for _Bool, including gcc[1].

Originally, 'launched' was of type 'int' and so the asm blob used 'cmpl'
to check the launch status.  When 'launched' was moved to be stored on a
per-VMCS basis, struct vcpu_vmx's "temporary" __launched flag was added
in order to avoid having to pass the current VMCS into the asm blob.
The new  '__launched' was defined as a 'bool' and not an 'int', but the
'cmp' instruction was not updated.

This has not caused any known problems, likely due to compilers aligning
variables to 4-byte or 8-byte boundaries and KVM zeroing out struct
vcpu_vmx during allocation.  I.e. vCPU-run accesses "junk" data, it just
happens to always be zero and so doesn't affect the result.

[1] https://gcc.gnu.org/ml/gcc-patches/2000-10/msg01127.html

Fixes: d462b81923 ("KVM: VMX: Keep list of loaded VMCSs, instead of vcpus")
Cc: <stable@vger.kernel.org>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-12 13:12:12 +01:00
Josh Poimboeuf b284909aba cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
With the following commit:

  73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")

... the hotplug code attempted to detect when SMT was disabled by BIOS,
in which case it reported SMT as permanently disabled.  However, that
code broke a virt hotplug scenario, where the guest is booted with only
primary CPU threads, and a sibling is brought online later.

The problem is that there doesn't seem to be a way to reliably
distinguish between the HW "SMT disabled by BIOS" case and the virt
"sibling not yet brought online" case.  So the above-mentioned commit
was a bit misguided, as it permanently disabled SMT for both cases,
preventing future virt sibling hotplugs.

Going back and reviewing the original problems which were attempted to
be solved by that commit, when SMT was disabled in BIOS:

  1) /sys/devices/system/cpu/smt/control showed "on" instead of
     "notsupported"; and

  2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.

I'd propose that we instead consider #1 above to not actually be a
problem.  Because, at least in the virt case, it's possible that SMT
wasn't disabled by BIOS and a sibling thread could be brought online
later.  So it makes sense to just always default the smt control to "on"
to allow for that possibility (assuming cpuid indicates that the CPU
supports SMT).

The real problem is #2, which has a simple fix: change vmx_vm_init() to
query the actual current SMT state -- i.e., whether any siblings are
currently online -- instead of looking at the SMT "control" sysfs value.

So fix it by:

  a) reverting the original "fix" and its followup fix:

     73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
     bc2d8d262c ("cpu/hotplug: Fix SMT supported evaluation")

     and

  b) changing vmx_vm_init() to query the actual current SMT state --
     instead of the sysfs control value -- to determine whether the L1TF
     warning is needed.  This also requires the 'sched_smt_present'
     variable to exported, instead of 'cpu_smt_control'.

Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Mario <jmario@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
2019-01-30 19:27:00 +01:00
Gustavo A. R. Silva b2869f28e1 KVM: x86: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warnings:

arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:29:36 +01:00
Sean Christopherson 5ad6ece869 KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function
...along with the function's STACK_FRAME_NON_STANDARD tag.  Moving the
asm blob results in a significantly smaller amount of code that is
marked with STACK_FRAME_NON_STANDARD, which makes it far less likely
that gcc will split the function and trigger a spurious objtool warning.
As a bonus, removing STACK_FRAME_NON_STANDARD from vmx_vcpu_run() allows
the bulk of code to be properly checked by objtool.

Because %rbp is not loaded via VMCS fields, vmx_vcpu_run() must manually
save/restore the host's RBP and load the guest's RBP prior to calling
vmx_vmenter().  Modifying %rbp triggers objtool's stack validation code,
and so vmx_vcpu_run() is tagged with STACK_FRAME_NON_STANDARD since it's
impossible to avoid modifying %rbp.

Unfortunately, vmx_vcpu_run() is also a gigantic function that gcc will
split into separate functions, e.g. so that pieces of the function can
be inlined.  Splitting the function means that the compiled Elf file
will contain one or more vmx_vcpu_run.part.* functions in addition to
a vmx_vcpu_run function.  Depending on where the function is split,
objtool may warn about a "call without frame pointer save/setup" in
vmx_vcpu_run.part.* since objtool's stack validation looks for exact
names when whitelisting functions tagged with STACK_FRAME_NON_STANDARD.

Up until recently, the undesirable function splitting was effectively
blocked because vmx_vcpu_run() was tagged with __noclone.  At the time,
__noclone had an unintended side effect that put vmx_vcpu_run() into a
separate optimization unit, which in turn prevented gcc from inlining
the function (or any of its own function calls) and thus eliminated gcc's
motivation to split the function.  Removing the __noclone attribute
allowed gcc to optimize vmx_vcpu_run(), exposing the objtool warning.

Kudos to Qian Cai for root causing that the fnsplit optimization is what
caused objtool to complain.

Fixes: 453eafbe65 ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
Tested-by: Qian Cai <cai@lca.pw>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:37 +01:00
Yi Wang 8997f65700 kvm: vmx: fix some -Wmissing-prototypes warnings
We get some warnings when building kernel with W=1:
arch/x86/kvm/vmx/vmx.c:426:5: warning: no previous prototype for ‘kvm_fill_hv_flush_list_func’ [-Wmissing-prototypes]
arch/x86/kvm/vmx/nested.c:58:6: warning: no previous prototype for ‘init_vmcs_shadow_fields’ [-Wmissing-prototypes]

Make them static to fix this.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:35 +01:00
Sean Christopherson 85ba2b165d KVM: VMX: Use the correct field var when clearing VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
Fix a recently introduced bug that results in the wrong VMCS control
field being updated when applying a IA32_PERF_GLOBAL_CTRL errata.

Fixes: c73da3fcab ("KVM: VMX: Properly handle dynamic VM Entry/Exit controls")
Reported-by: Harald Arnesen <harald@skogtun.org>
Tested-by: Harald Arnesen <harald@skogtun.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 18:52:54 +01:00
Lan Tianyu b7c1c226f9 KVM/VMX: Avoid return error when flush tlb successfully in the hv_remote_flush_tlb_with_range()
The "ret" is initialized to be ENOTSUPP. The return value of
__hv_remote_flush_tlb_with_range() will be Or with "ret" when ept
table potiners are mismatched. This will cause return ENOTSUPP even if
flush tlb successfully. This patch is to fix the issue and set
"ret" to 0.

Fixes: a5c214dad1 ("KVM/VMX: Change hv flush logic when ept tables are mismatched.")
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11 18:38:07 +01:00
Gustavo A. R. Silva d14eff1bc5 KVM: x86: Fix bit shifting in update_intel_pt_cfg
ctl_bitmask in pt_desc is of type u64. When an integer like 0xf is
being left shifted more than 32 bits, the behavior is undefined.

Fix this by adding suffix ULL to integer 0xf.

Addresses-Coverity-ID: 1476095 ("Bad bit shift operation")
Fixes: 6c0f0bba85 ("KVM: x86: Introduce a function to initialize the PT configuration")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11 18:38:07 +01:00
Linus Torvalds 312a466155 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kprobes: Remove trampoline_handler() prototype
  x86/kernel: Fix more -Wmissing-prototypes warnings
  x86: Fix various typos in comments
  x86/headers: Fix -Wmissing-prototypes warning
  x86/process: Avoid unnecessary NULL check in get_wchan()
  x86/traps: Complete prototype declarations
  x86/mce: Fix -Wmissing-prototypes warnings
  x86/gart: Rewrite early_gart_iommu_check() comment
2018-12-26 17:03:51 -08:00
Sean Christopherson 453eafbe65 KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
Transitioning to/from a VMX guest requires KVM to manually save/load
the bulk of CPU state that the guest is allowed to direclty access,
e.g. XSAVE state, CR2, GPRs, etc...  For obvious reasons, loading the
guest's GPR snapshot prior to VM-Enter and saving the snapshot after
VM-Exit is done via handcoded assembly.  The assembly blob is written
as inline asm so that it can easily access KVM-defined structs that
are used to hold guest state, e.g. moving the blob to a standalone
assembly file would require generating defines for struct offsets.

The other relevant aspect of VMX transitions in KVM is the handling of
VM-Exits.  KVM doesn't employ a separate VM-Exit handler per se, but
rather treats the VMX transition as a mega instruction (with many side
effects), i.e. sets the VMCS.HOST_RIP to a label immediately following
VMLAUNCH/VMRESUME.  The label is then exposed to C code via a global
variable definition in the inline assembly.

Because of the global variable, KVM takes steps to (attempt to) ensure
only a single instance of the owning C function, e.g. vmx_vcpu_run, is
generated by the compiler.  The earliest approach placed the inline
assembly in a separate noinline function[1].  Later, the assembly was
folded back into vmx_vcpu_run() and tagged with __noclone[2][3], which
is still used today.

After moving to __noclone, an edge case was encountered where GCC's
-ftracer optimization resulted in the inline assembly blob being
duplicated.  This was "fixed" by explicitly disabling -ftracer in the
__noclone definition[4].

Recently, it was found that disabling -ftracer causes build warnings
for unsuspecting users of __noclone[5], and more importantly for KVM,
prevents the compiler for properly optimizing vmx_vcpu_run()[6].  And
perhaps most importantly of all, it was pointed out that there is no
way to prevent duplication of a function with 100% reliability[7],
i.e. more edge cases may be encountered in the future.

So to summarize, the only way to prevent the compiler from duplicating
the global variable definition is to move the variable out of inline
assembly, which has been suggested several times over[1][7][8].

Resolve the aforementioned issues by moving the VMLAUNCH+VRESUME and
VM-Exit "handler" to standalone assembly sub-routines.  Moving only
the core VMX transition codes allows the struct indexing to remain as
inline assembly and also allows the sub-routines to be used by
nested_vmx_check_vmentry_hw().  Reusing the sub-routines has a happy
side-effect of eliminating two VMWRITEs in the nested_early_check path
as there is no longer a need to dynamically change VMCS.HOST_RIP.

Note that callers to vmx_vmenter() must account for the CALL modifying
RSP, e.g. must subtract op-size from RSP when synchronizing RSP with
VMCS.HOST_RSP and "restore" RSP prior to the CALL.  There are no great
alternatives to fudging RSP.  Saving RSP in vmx_enter() is difficult
because doing so requires a second register (VMWRITE does not provide
an immediate encoding for the VMCS field and KVM supports Hyper-V's
memory-based eVMCS ABI).  The other more drastic alternative would be
to use eschew VMCS.HOST_RSP and manually save/load RSP using a per-cpu
variable (which can be encoded as e.g. gs:[imm]).  But because a valid
stack is needed at the time of VM-Exit (NMIs aren't blocked and a user
could theoretically insert INT3/INT1ICEBRK at the VM-Exit handler), a
dedicated per-cpu VM-Exit stack would be required.  A dedicated stack
isn't difficult to implement, but it would require at least one page
per CPU and knowledge of the stack in the dumpstack routines.  And in
most cases there is essentially zero overhead in dynamically updating
VMCS.HOST_RSP, e.g. the VMWRITE can be avoided for all but the first
VMLAUNCH unless nested_early_check=1, which is not a fast path.  In
other words, avoiding the VMCS.HOST_RSP by using a dedicated stack
would only make the code marginally less ugly while requiring at least
one page per CPU and forcing the kernel to be aware (and approve) of
the VM-Exit stack shenanigans.

[1] cea15c24ca39 ("KVM: Move KVM context switch into own function")
[2] a3b5ba49a8 ("KVM: VMX: add the __noclone attribute to vmx_vcpu_run")
[3] 104f226bfd ("KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()")
[4] 95272c2937 ("compiler-gcc: disable -ftracer for __noclone functions")
[5] https://lkml.kernel.org/r/20181218140105.ajuiglkpvstt3qxs@treble
[6] https://patchwork.kernel.org/patch/8707981/#21817015
[7] https://lkml.kernel.org/r/ri6y38lo23g.fsf@suse.cz
[8] https://lkml.kernel.org/r/20181218212042.GE25620@tassilo.jf.intel.com

Suggested-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Martin Jambor <mjambor@suse.cz>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 12:02:50 +01:00
Sean Christopherson 051a2d3e59 KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
Use '%% " _ASM_CX"' instead of '%0' to dereference RCX, i.e. the
'struct vcpu_vmx' pointer, in the VM-Enter asm blobs of vmx_vcpu_run()
and nested_vmx_check_vmentry_hw().  Using the symbolic name means that
adding/removing an output parameter(s) requires "rewriting" almost all
of the asm blob, which makes it nearly impossible to understand what's
being changed in even the most minor patches.

Opportunistically improve the code comments.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 12:02:43 +01:00
Lan Tianyu 1f3a3e46cc KVM/VMX: Add hv tlb range flush support
This patch is to register tlb_remote_flush_with_range callback with
hv tlb range flush interface.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:39 +01:00
Luwei Kang ee85dec2fe KVM: x86: Disable Intel PT when VMXON in L1 guest
Currently, Intel Processor Trace do not support tracing in L1 guest
VMX operation(IA32_VMX_MISC[bit 14] is 0). As mentioned in SDM,
on these type of processors, execution of the VMXON instruction will
clears IA32_RTIT_CTL.TraceEn and any attempt to write IA32_RTIT_CTL
causes a general-protection exception (#GP).

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:38 +01:00
Chao Peng b08c28960f KVM: x86: Set intercept for Intel PT MSRs read/write
To save performance overhead, disable intercept Intel PT MSRs
read/write when Intel PT is enabled in guest.
MSR_IA32_RTIT_CTL is an exception that will always be intercepted.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:37 +01:00
Chao Peng bf8c55d8dc KVM: x86: Implement Intel PT MSRs read/write emulation
This patch implement Intel Processor Trace MSRs read/write
emulation.
Intel PT MSRs read/write need to be emulated when Intel PT
MSRs is intercepted in guest and during live migration.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:36 +01:00