alistair23-linux

redonkable

Author	SHA1	Message	Date
Babu Moger	830bd71f2c	KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead call generic svm_set_intercept, svm_clr_intercept an dsvm_is_intercep tfor all cr intercepts. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985253016.11252.16945893859439811480.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:15 -04:00
Babu Moger	4c44e8d6c1	KVM: SVM: Add new intercept word in vmcb_control_area The new intercept bits have been added in vmcb control area to support few more interceptions. Here are the some of them. - INTERCEPT_INVLPGB, - INTERCEPT_INVLPGB_ILLEGAL, - INTERCEPT_INVPCID, - INTERCEPT_MCOMMIT, - INTERCEPT_TLBSYNC, Add a new intercept word in vmcb_control_area to support these instructions. Also update kvm_nested_vmrun trace function to support the new addition. AMD documentation for these instructions is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:15 -04:00
Babu Moger	c62e2e94b9	KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors Convert all the intercepts to one array of 32 bit vectors in vmcb_control_area. This makes it easy for future intercept vector additions. Also update trace functions. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:15 -04:00
Babu Moger	9780d51dc2	KVM: SVM: Modify intercept_exceptions to generic intercepts Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_exceptions bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250037.11252.1361972528657052410.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:14 -04:00
Babu Moger	30abaa8838	KVM: SVM: Change intercept_dr to generic intercepts Modify intercept_dr to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_dr bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985249255.11252.10000868032136333355.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:14 -04:00
Babu Moger	03bfeeb988	KVM: SVM: Change intercept_cr to generic intercepts Change intercept_cr to generic intercepts in vmcb_control_area. Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept where applicable. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985248506.11252.9081085950784508671.stgit@bmoger-ubuntu> [Change constant names. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:13 -04:00
Babu Moger	c45ad7229d	KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept) This is in preparation for the future intercept vector additions. Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept using kernel APIs __set_bit, __clear_bit and test_bit espectively. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985247876.11252.16039238014239824460.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:13 -04:00
Babu Moger	a90c1ed9f1	KVM: nSVM: Remove unused field host_intercept_exceptions is not used anywhere. Clean it up. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985252277.11252.8819848322175521354.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:12 -04:00
Maxim Levitsky	8d22b90e94	KVM: SVM: refactor exit labels in svm_create_vcpu Kernel coding style suggests not to use labels like error1,error2 Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:12 -04:00
Maxim Levitsky	0681de1b83	KVM: SVM: use __GFP_ZERO instead of clear_page Another small refactoring. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:11 -04:00
Maxim Levitsky	f4c847a956	KVM: SVM: refactor msr permission bitmap allocation Replace svm_vcpu_init_msrpm with svm_vcpu_alloc_msrpm, that also allocates the msr bitmap and add svm_vcpu_free_msrpm to free it. This will be used later to move the nested msr permission bitmap allocation to nested.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:11 -04:00
Maxim Levitsky	0dd16b5b0c	KVM: nSVM: rename nested vmcb to vmcb12 This is to be more consistient with VMX, and to support upcoming addition of vmcb02 Hopefully no functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:10 -04:00
Maxim Levitsky	1feaba144c	KVM: SVM: rename a variable in the svm_create_vcpu The 'page' is to hold the vcpu's vmcb so name it as such to avoid confusion. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200827171145.374620-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:10 -04:00
Wanpeng Li	010fd37fdd	KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns All the checks in lapic_timer_int_injected(), __kvm_wait_lapic_expire(), and these function calls waste cpu cycles when the timer mode is not tscdeadline. We can observe ~1.3% world switch time overhead by kvm-unit-tests/vmexit.flat vmcall testing on AMD server. This patch reduces the world switch latency caused by timer_advance_ns feature when the timer mode is not tscdeadline by simpling move the check against apic->lapic_timer.expired_tscdeadline much earlier. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-7-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:10 -04:00
Wanpeng Li	68ca7663c7	KVM: LAPIC: Narrow down the kick target vCPU The kick after setting KVM_REQ_PENDING_TIMER is used to handle the timer fires on a different pCPU which vCPU is running on. This kick costs about 1000 clock cycles and we don't need this when injecting already-expired timer or when using the VMX preemption timer because kvm_lapic_expired_hv_timer() is called from the target vCPU. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-6-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:09 -04:00
Wanpeng Li	275038332f	KVM: LAPIC: Guarantee the timer is in tsc-deadline mode when setting Check apic_lvtt_tscdeadline() mode directly instead of apic_lvtt_oneshot() and apic_lvtt_period() to guarantee the timer is in tsc-deadline mode when wrmsr MSR_IA32_TSCDEADLINE. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-3-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:09 -04:00
Wanpeng Li	a970e9b216	KVM: LAPIC: Return 0 when getting the tscdeadline timer if the lapic is hw disabled Return 0 when getting the tscdeadline timer if the lapic is hw disabled. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-2-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:08 -04:00
Wanpeng Li	ae6f249686	KVM: LAPIC: Fix updating DFR missing apic map recalculation There is missing apic map recalculation after updating DFR, if it is INIT RESET, in x2apic mode, local apic is software enabled before. This patch fix it by introducing the function kvm_apic_set_dfr() to be called in INIT RESET handling path. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1597827327-25055-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:08 -04:00
Yi Li	2fc4f15dac	kvm/eventfd: move wildcard calculation outside loop There is no need to calculate wildcard in each iteration since wildcard is not changed. Signed-off-by: Yi Li <yili@winhong.com> Message-Id: <20200911055652.3041762-1-yili@winhong.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:08 -04:00
Chenyi Qiang	b9757a4b6f	KVM: nVMX: Simplify the initialization of nested_vmx_msrs The nested VMX controls MSRs can be initialized by the global capability values stored in vmcs_config. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200828085622.8365-6-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:07 -04:00
Chenyi Qiang	efc831338b	KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they are not exposed by the system ioctl KVM_GET_MSR. Add them to the setup of nested VMX controls MSR. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20200828085622.8365-2-chenyi.qiang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:07 -04:00
Vitaly Kuznetsov	d5cd6f3401	KVM: nSVM: Avoid freeing uninitialized pointers in svm_set_nested_state() The save and ctl pointers are passed uninitialized to kfree() when svm_set_nested_state() follows the 'goto out_set_gif' path. While the issue could've been fixed by initializing these on-stack varialbles to NULL, it seems preferable to eliminate 'out_set_gif' label completely as it is not actually a failure path and duplicating a single svm_set_gif() call doesn't look too bad. [ bp: Drop obscure Addresses-Coverity: tag. ] Fixes: `6ccbd29ade` ("KVM: SVM: nested: Don't allocate VMCB structures on stack") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Joerg Roedel <jroedel@suse.de> Reported-by: Colin King <colin.king@canonical.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Joerg Roedel <jroedel@suse.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20200914133725.650221-1-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:56:43 -04:00
Paolo Bonzini	2e3df760cd	PPC KVM update for 5.10 - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJfaXWyAAoJEJ2a6ncsY3GfcBwH/A/8qEXwzRZifASJMCVNyDho iGU3ku2I6Ui9zC1PpcbAJ0Eu7ZAdUXpqrdyKOJLHquZGhKH4Jl7Cjq5ZpEXzGP4v 22QJA8ek9HWAbaeV+N9Q1zpWUCRBGR+Onm5g9KE7BG5/eUaQDukpHLpDNXl3nT95 zlmaiMYYYYhgSQKBh3HQp5nhYMVpwToq14EsJV6sZ99nJhrjtXjx3MsoCU03+h+k 9y1FwBVkS2XQ0deYQuFYSgVNCF1gmK8lBbKL1Zly2MYJhQDZbiX/VJrtGW/ls5kl KbzWYxQHZ46NH6SSfwXLbBZa5reS+s1va/Q9RJL9/lCncn5iYhcCJ1sVBHLIq4U= =nYru -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-next-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM update for 5.10 - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes	2020-09-22 08:18:16 -04:00
Paolo Bonzini	bf3c0e5e71	Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD	2020-09-22 06:43:17 -04:00
Wang Wensheng	cf59eb13e1	KVM: PPC: Book3S: Fix symbol undeclared warnings Build the kernel with `C=2`: arch/powerpc/kvm/book3s_hv_nested.c:572:25: warning: symbol 'kvmhv_alloc_nested' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_mmu_radix.c:350:6: warning: symbol 'kvmppc_radix_set_pte_at' was not declared. Should it be static? arch/powerpc/kvm/book3s_hv.c:3568:5: warning: symbol 'kvmhv_p9_guest_entry' was not declared. Should it be static? arch/powerpc/kvm/book3s_hv_rm_xics.c:767:15: warning: symbol 'eoi_rc' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_vio_hv.c:240:13: warning: symbol 'iommu_tce_kill_rm' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_vio.c:492:6: warning: symbol 'kvmppc_tce_iommu_do_map' was not declared. Should it be static? arch/powerpc/kvm/book3s_pr.c:572:6: warning: symbol 'kvmppc_set_pvr_pr' was not declared. Should it be static? Those symbols are used only in the files that define them so make them static to fix the warnings. Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-22 11:53:55 +10:00
Jing Xiangfeng	eb173559c9	KVM: PPC: Book3S: Remove redundant initialization of variable ret The variable ret is being initialized with '-ENOMEM' that is meaningless. So remove it. Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-22 11:53:55 +10:00
Qinglang Miao	4517076608	KVM: PPC: Book3S HV: XIVE: Convert to DEFINE_SHOW_ATTRIBUTE Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-22 11:53:55 +10:00
Paolo Bonzini	32251b07d5	KVM: s390: add documentation for KVM_CAP_S390_DIAG318 diag318 code was merged in 5.9-rc1, let us add some missing documentation -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJfXxjIAAoJEBF7vIC1phx8nvQP/jN1mdF8h18ylQjSyY+e4ZS0 yogItIlHUSFAAVesLe1NiKZWi7gYj6ZkN4E5qaze2NX4+YgM5raqb25p6K0in4b5 PARFOCQeRO9BmT+VhGJBoJplfSIbqwche0A5606/0jYtakRv3ZJh+gMZGR8Tk5u9 nZgvJ/6jD8sPQ/pyZGr6L1UgHD661UZ8U/E3qfMYGyLpRzQiKaiGQNVCvWdMdXL1 M0XvCati9BRIvZtuqSUzLJVGf14ca2UH6eTOZkp0BpR+dckHPaocgyqpougR21Ht u1x6VBIKDCAy/EYs7Rmk2PBwY64Ym155m9yYHuuNPmm+ruPaz/A0YJJ25bhK1J+S hirc4EVaCASwpeYuvcRDY9SyspSxLFkU0GXAj6GqLPlM/w2BgWU0oBENNGZe0Ayt whdQjFZGNqlHucgO7HkVYbAHkCxHlfnK2Osfs1apzppfpwrjtTs6LvPpGCXYInWx lOAd9OUk7K6wmC1WktvXV+HMsTB4/ko1pwRcsBiga7hTtawT/xI3W0TtpzrbPwIP 0bpxmC+D+dAZjj7FbXXpyuxMsotoWIEttdBajuTYHeiQQq+Q/4BMDaHMA5n1Vk0F gq5wAZZUplkeCWXsjNm3EPnBMyCl5CQee+vns7mD0Hx/IjqoZShXSzYvVyP25MYb w22dAOEiEl0fq54yxE3p =y7Nx -----END PGP SIGNATURE----- Merge tag 'kvm-s390-master-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: add documentation for KVM_CAP_S390_DIAG318 diag318 code was merged in 5.9-rc1, let us add some missing documentation	2020-09-20 17:31:15 -04:00
Paolo Bonzini	b73815a18d	KVM/arm64 fixes for 5.9, take #2 - Fix handling of S1 Page Table Walk permission fault at S2 on instruction fetch - Cleanup kvm_vcpu_dabt_iswrite() -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl9k6LYPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDe6UP+gMRK9yEgY7eBg23nfOVDiF7t8dvj+J5eSws 8YP3nRew9pL710SLClGJCrVyVaCEMPN7cL7O926IALqGPMsPgKFPX7RcshjVUOUo otjQPfT3hMOKfmAGCHyQHeudQ8KMEZX9ikdvGd19zT3F2t2CojUn0uFZPgjIXKOe x4Un5F/QuOVFoI85BctqpNRJsc9yzjpIu/J2YjO+QeKyMoK1gp+GLIACDczfbaLq IWSRApDPP3vyxImxtWaahI/IlAk4XcDV4EFmUMN2GDuK64bFTcky1YLr88njp0ZV VVrxw4Z/lGGKl1MqNV8/OVBS+z1+L8G8+uNFELFL1veDc+eDn+P/Sr/SoDxT93uO Dz9/gWJd2GEtneJ/PspjxLdzpAkUluV7vRQBikjVtBld7OivnDYeylJsA88olXzI PUY+Y9p1K3T/u5oeiKEnDvufzisDiYZSKpE81KtJP+++biOULVIm7rkCmg5IvUAF GhxnQ/n2RW9Q+FCXy7FTss8gxl7FBKdWSCnGQ1ZYa+hwnIa9WU1GLG94Zx/qc/GO zIYstUKsYtn2ViHn9XTbbewUz1XeCaedmb/R1Uu14j6WrzgirhJOfVHE2US9Xlji w/GruENw4Xqlb842q5NbZnv2XtTFQWLAduthAotclnIecYRvqIWRR8JVf64XVvjR K0JmH08N =D7jg -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.9, take #2 - Fix handling of S1 Page Table Walk permission fault at S2 on instruction fetch - Cleanup kvm_vcpu_dabt_iswrite()	2020-09-20 17:31:07 -04:00
Vitaly Kuznetsov	7d1f8691cc	Revert "KVM: Check the allocation of pv cpu mask" The commit `0f99022210` ("KVM: Check the allocation of pv cpu mask") we have in 5.9-rc5 has two issue: 1) Compilation fails for !CONFIG_SMP, see: https://bugzilla.kernel.org/show_bug.cgi?id=209285 2) This commit completely disables PV TLB flush, see https://lore.kernel.org/kvm/87y2lrnnyf.fsf@vitty.brq.redhat.com/ The allocation problem is likely a theoretical one, if we don't have memory that early in boot process we're likely doomed anyway. Let's solve it properly later. This reverts commit `0f99022210`. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-20 17:29:58 -04:00
Marc Zyngier	620cf45f7a	KVM: arm64: Remove S1PTW check from kvm_vcpu_dabt_iswrite() Now that kvm_vcpu_trap_is_write_fault() checks for S1PTW, there is no need for kvm_vcpu_dabt_iswrite() to do the same thing, as we already check for this condition on all existing paths. Drop the check and add a comment instead. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200915104218.1284701-3-maz@kernel.org	2020-09-18 18:01:48 +01:00
Marc Zyngier	c4ad98e4b7	KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch KVM currently assumes that an instruction abort can never be a write. This is in general true, except when the abort is triggered by a S1PTW on instruction fetch that tries to update the S1 page tables (to set AF, for example). This can happen if the page tables have been paged out and brought back in without seeing a direct write to them (they are thus marked read only), and the fault handling code will make the PT executable(!) instead of writable. The guest gets stuck forever. In these conditions, the permission fault must be considered as a write so that the Stage-1 update can take place. This is essentially the I-side equivalent of the problem fixed by `60e21a0ef5` ("arm64: KVM: Take S1 walks into account when determining S2 write faults"). Update kvm_is_write_fault() to return true on IABT+S1PTW, and introduce kvm_vcpu_trap_is_exec_fault() that only return true when no faulting on a S1 fault. Additionally, kvm_vcpu_dabt_iss1tw() is renamed to kvm_vcpu_abt_iss1tw(), as the above makes it plain that it isn't specific to data abort. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200915104218.1284701-2-maz@kernel.org	2020-09-18 18:01:48 +01:00
Paul Mackerras	35dfb43c24	KVM: PPC: Book3S HV: Set LPCR[HDICE] before writing HDEC POWER8 and POWER9 machines have a hardware deviation where generation of a hypervisor decrementer exception is suppressed if the HDICE bit in the LPCR register is 0 at the time when the HDEC register decrements from 0 to -1. When entering a guest, KVM first writes the HDEC register with the time until it wants the CPU to exit the guest, and then writes the LPCR with the guest value, which includes HDICE = 1. If HDEC decrements from 0 to -1 during the interval between those two events, it is possible that we can enter the guest with HDEC already negative but no HDEC exception pending, meaning that no HDEC interrupt will occur while the CPU is in the guest, or at least not until HDEC wraps around. Thus it is possible for the CPU to keep executing in the guest for a long time; up to about 4 seconds on POWER8, or about 4.46 years on POWER9 (except that the host kernel hard lockup detector will fire first). To fix this, we set the LPCR[HDICE] bit before writing HDEC on guest entry. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-17 11:38:17 +10:00
Fabiano Rosas	05e6295dc7	KVM: PPC: Book3S HV: Do not allocate HPT for a nested guest The current nested KVM code does not support HPT guests. This is informed/enforced in some ways: - Hosts < P9 will not be able to enable the nested HV feature; - The nested hypervisor MMU capabilities will not contain KVM_CAP_PPC_MMU_HASH_V3; - QEMU reflects the MMU capabilities in the 'ibm,arch-vec-5-platform-support' device-tree property; - The nested guest, at 'prom_parse_mmu_model' ignores the 'disable_radix' kernel command line option if HPT is not supported; - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT. There is, however, still a way to start a HPT guest by using max-compat-cpu=power8 at the QEMU machine options. This leads to the guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB ioctl. With the guest set to hash, the nested hypervisor goes through the entry path that has no knowledge of nesting (kvmppc_run_vcpu) and crashes when it tries to execute an hypervisor-privileged (mtspr HDEC) instruction at __kvmppc_vcore_entry: root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ... <snip> [ 538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1 [ 538.543355] NIP: c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0 [ 538.543417] REGS: c0000013e91e33b0 TRAP: 0700 Not tainted (5.9.0-rc4) [ 538.543470] MSR: 8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 22422882 XER: 20040000 [ 538.543546] CFAR: c00800000753f4b0 IRQMASK: 3 GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000 GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72 GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0 GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008 GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8 GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000 GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001 GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001 [ 538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv] [ 538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv] [ 538.544173] Call Trace: [ 538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable) [ 538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv] [ 538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv] [ 538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm] [ 538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm] [ 538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm] [ 538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90 [ 538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0 [ 538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c [ 538.544787] Instruction dump: [ 538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002 [ 538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd [ 538.544953] ---[ end trace 74423e2b948c2e0c ]--- This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in the nested hypervisor, causing QEMU to abort. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-17 11:38:17 +10:00
Greg Kurz	4e1b2ab7e6	KVM: PPC: Don't return -ENOTSUPP to userspace in ioctls ENOTSUPP is a linux only thingy, the value of which is unknown to userspace, not to be confused with ENOTSUP which linux maps to EOPNOTSUPP, as permitted by POSIX [1]: [EOPNOTSUPP] Operation not supported on socket. The type of socket (address family or protocol) does not support the requested operation. A conforming implementation may assign the same values for [EOPNOTSUPP] and [ENOTSUP]. Return -EOPNOTSUPP instead of -ENOTSUPP for the following ioctls: - KVM_GET_FPU for Book3s and BookE - KVM_SET_FPU for Book3s and BookE - KVM_GET_DIRTY_LOG for BookE This doesn't affect QEMU which doesn't call the KVM_GET_FPU and KVM_SET_FPU ioctls on POWER anyway since they are not supported, and _buggily_ ignores anything but -EPERM for KVM_GET_DIRTY_LOG. [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-17 11:38:17 +10:00
Collin Walling	f20d4e924b	docs: kvm: add documentation for KVM_CAP_S390_DIAG318 Documentation for the s390 DIAGNOSE 0x318 instruction handling. Signed-off-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/kvm/20200625150724.10021-2-walling@linux.ibm.com/ Message-Id: <20200625150724.10021-2-walling@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2020-09-14 09:08:10 +02:00
Maxim Levitsky	37f66bbef0	KVM: emulator: more strict rsm checks. Don't ignore return values in rsm_load_state_64/32 to avoid loading invalid state from SMM state area if it was tampered with by the guest. This is primarly intended to avoid letting guest set bits in EFER (like EFER.SVME when nesting is disabled) by manipulating SMM save area. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-12 12:22:55 -04:00
Maxim Levitsky	3ebb5d2617	KVM: nSVM: more strict SMM checks when returning to nested guest * check that guest is 64 bit guest, otherwise the SVM related fields in the smm state area are not defined * If the SMM area indicates that SMM interrupted a running guest, check that EFER.SVME which is also saved in this area is set, otherwise the guest might have tampered with SMM save area, and so indicate emulation failure which should triple fault the guest. * Check that that guest CPUID supports SVM (due to the same issue as above) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-12 12:21:43 -04:00
Maxim Levitsky	772b81bb2f	SVM: nSVM: setup nested msr permission bitmap on nested state load This code was missing and was forcing the L2 run with L1's msr permission bitmap Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-12 12:20:53 -04:00
Maxim Levitsky	9883764ad0	SVM: nSVM: correctly restore GIF on vmexit from nesting after migration Currently code in svm_set_nested_state copies the current vmcb control area to L1 control area (hsave->control), under assumption that it mostly reflects the defaults that kvm choose, and later qemu overrides these defaults with L2 state using standard KVM interfaces, like KVM_SET_REGS. However nested GIF (which is AMD specific thing) is by default is true, and it is copied to hsave area as such. This alone is not a big deal since on VMexit, GIF is always set to false, regardless of what it was on VM entry. However in nested_svm_vmexit we were first were setting GIF to false, but then we overwrite the control fields with value from the hsave area. (including the nested GIF field itself if GIF virtualization is enabled). Now on normal vm entry this is not a problem, since GIF is usually false prior to normal vm entry, and this is the value that copied to hsave, and then restored, but this is not always the case when the nested state is loaded as explained above. To fix this issue, move svm_set_gif after we restore the L1 control state in nested_svm_vmexit, so that even with wrong GIF in the saved L1 control area, we still clear GIF as the spec says. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-12 12:19:06 -04:00
Vitaly Kuznetsov	cc17b22559	x86/kvm: don't forget to ACK async PF IRQ Merge commit `26d05b368a` ("Merge branch 'kvm-async-pf-int' into HEAD") tried to adapt the new interrupt based async PF mechanism to the newly introduced IDTENTRY magic but unfortunately it missed the fact that DEFINE_IDTENTRY_SYSVEC() doesn't call ack_APIC_irq() on its own and all DEFINE_IDTENTRY_SYSVEC() users have to call it manually. As the result all multi-CPU KVM guest hang on boot when KVM_FEATURE_ASYNC_PF_INT is present. The breakage went unnoticed because no KVM userspace (e.g. QEMU) currently set it (and thus async PF mechanism is currently disabled) but we're about to change that. Fixes: `26d05b368a` ("Merge branch 'kvm-async-pf-int' into HEAD") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200908135350.355053-3-vkuznets@redhat.com> Tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-12 02:22:21 -04:00
Vitaly Kuznetsov	244081f907	x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro DEFINE_IDTENTRY_SYSVEC() already contains irqentry_enter()/ irqentry_exit(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200908135350.355053-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-12 02:22:07 -04:00
Wanpeng Li	99b82a1437	KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit According to SDM 27.2.4, Event delivery causes an APIC-access VM exit. Don't report internal error and freeze guest when event delivery causes an APIC-access exit, it is handleable and the event will be re-injected during the next vmentry. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1597827327-25055-2-git-send-email-wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-12 02:21:17 -04:00
Wanpeng Li	e42c68281b	KVM: SVM: avoid emulation with stale next_rip svm->next_rip is reset in svm_vcpu_run() only after calling svm_exit_handlers_fastpath(), which will cause SVM's skip_emulated_instruction() to write a stale RIP. We can move svm_exit_handlers_fastpath towards the end of svm_vcpu_run(). To align VMX with SVM, keep svm_complete_interrupts() close as well. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paul K. <kronenpj@kronenpj.dyndns.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> [Also move vmcb_mark_all_clean before any possible write to the VMCB. - Paolo]	2020-09-12 02:19:23 -04:00
Vitaly Kuznetsov	d831de1772	KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN Even without in-kernel LAPIC we should allow writing '0' to MSR_KVM_ASYNC_PF_EN as we're not enabling the mechanism. In particular, QEMU with 'kernel-irqchip=off' fails to start a guest with qemu-system-x86_64: error: failed to set MSR 0x4b564d02 to 0x0 Fixes: `9d3c447c72` ("KVM: X86: Fix async pf caused null-ptr-deref") Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200911093147.484565-1-vkuznets@redhat.com> [Actually commit the version proposed by Sean Christopherson. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-11 13:26:47 -04:00
David Rientjes	7be74942f1	KVM: SVM: Periodically schedule when unregistering regions on destroy There may be many encrypted regions that need to be unregistered when a SEV VM is destroyed. This can lead to soft lockups. For example, on a host running 4.15: watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348] CPU: 206 PID: 194348 Comm: t_virtual_machi RIP: 0010:free_unref_page_list+0x105/0x170 ... Call Trace: [<0>] release_pages+0x159/0x3d0 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd] [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd] [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd] [<0>] kvm_arch_destroy_vm+0x47/0x200 [<0>] kvm_put_kvm+0x1a8/0x2f0 [<0>] kvm_vm_release+0x25/0x30 [<0>] do_exit+0x335/0xc10 [<0>] do_group_exit+0x3f/0xa0 [<0>] get_signal+0x1bc/0x670 [<0>] do_signal+0x31/0x130 Although the CLFLUSH is no longer issued on every encrypted region to be unregistered, there are no other changes that can prevent soft lockups for very large SEV VMs in the latest kernel. Periodically schedule if necessary. This still holds kvm->lock across the resched, but since this only happens when the VM is destroyed this is assumed to be acceptable. Signed-off-by: David Rientjes <rientjes@google.com> Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-11 13:24:15 -04:00
Huacai Chen	15e9e35cd1	KVM: MIPS: Change the definition of kvm type MIPS defines two kvm types: #define KVM_VM_MIPS_TE 0 #define KVM_VM_MIPS_VZ 1 In Documentation/virt/kvm/api.rst it is said that "You probably want to use 0 as machine type", which implies that type 0 be the "automatic" or "default" type. And, in user-space libvirt use the null-machine (with type 0) to detect the kvm capability, which returns "KVM not supported" on a VZ platform. I try to fix it in QEMU but it is ugly: https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html And Thomas Huth suggests me to change the definition of kvm type: https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html So I define like this: #define KVM_VM_MIPS_AUTO 0 #define KVM_VM_MIPS_VZ 1 #define KVM_VM_MIPS_TE 2 Since VZ and TE cannot co-exists, using type 0 on a TE platform will still return success (so old user-space tools have no problems on new kernels); the advantage is that using type 0 on a VZ platform will not return failure. So, the only problem is "new user-space tools use type 2 on old kernels", but if we treat this as a kernel bug, we can backport this patch to old stable kernels. Signed-off-by: Huacai Chen <chenhc@lemote.com> Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-11 13:22:52 -04:00
Lai Jiangshan	f6f6195b88	kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed When kvm_mmu_get_page() gets a page with unsynced children, the spt pagetable is unsynchronized with the guest pagetable. But the guest might not issue a "flush" operation on it when the pagetable entry is changed from zero or other cases. The hypervisor has the responsibility to synchronize the pagetables. KVM behaved as above for many years, But commit `8c8560b833` ("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes") inadvertently included a line of code to change it without giving any reason in the changelog. It is clear that the commit's intention was to change KVM_REQ_TLB_FLUSH -> KVM_REQ_TLB_FLUSH_CURRENT, so we don't needlessly flush other contexts; however, one of the hunks changed a nearby KVM_REQ_MMU_SYNC instead. This patch changes it back. Link: https://lore.kernel.org/lkml/20200320212833.3507-26-sean.j.christopherson@intel.com/ Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20200902135421.31158-1-jiangshanlai@gmail.com> fixes: `8c8560b833` ("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-11 13:16:55 -04:00
Chenyi Qiang	c6b177a3be	KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control A minor fix for the update of VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL field in exit_ctls_high. Fixes: `03a8871add` ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control") Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200828085622.8365-5-chenyi.qiang@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-11 13:15:12 -04:00
Rustam Kovhaev	f65886606c	KVM: fix memory leak in kvm_io_bus_unregister_dev() when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing the bus, we should iterate over all other devices linked to it and call kvm_iodevice_destructor() for them Fixes: `90db10434b` ("KVM: kvm_io_bus_unregister_dev() should never fail") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707 Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200907185535.233114-1-rkovhaev@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-11 13:15:11 -04:00

1 2 3 4 5 ...

950192 Commits (fc387d8daf3960c5e1bc18fa353768056f4fd394) All Branches Search

950192 Commits (fc387d8daf3960c5e1bc18fa353768056f4fd394)

All Branches