1
0
Fork 0
Commit Graph

78 Commits (d691b7e7d1b5186eae62fd32adee65d3316bfdf6)

Author SHA1 Message Date
Balbir Singh 0428491cba powerpc/mm: Trace tlbie(l) instructions
Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate
Entry (Local)) instructions.

The tlbie instruction has changed over the years, so not all versions
accept the same operands. Use the ISA v3 field operands because they are
the most verbose, we may change them in future.

Example output:

  qemu-system-ppc-5371  [016]  1412.369519: tlbie:
  	tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Add some missing trace_tlbie()s, reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-23 21:14:49 +10:00
Aneesh Kumar K.V d19469e841 power/mm: update pte_write and pte_wrprotect to handle savedwrite
We use pte_write() to check whethwer the pte entry is writable.  This is
mostly used to later mark the pte read only if it is writable.  The other
use of pte_write() is to check whether the pte_entry is writable so that
hardware page table entry can be marked accordingly.  This is used in kvm
where we look at qemu page table entry and update hardware hash page table
for the guest with correct write enable bit.

With the above, for the first usage we should also check the savedwrite
bit so that we can correctly clear the savedwite bit.  For the later, we
add a new variant __pte_write().

With this we can revert write_protect_page part of 595cd8f256 ("mm/ksm:
handle protnone saved writes when making page write protect").  But I left
it as it is as an example code for savedwrite check.

Fixes: c137a2757b ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write")
Link: http://lkml.kernel.org/r/1488203787-17849-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09 17:01:09 -08:00
David Gibson 3d089f84c6 KVM: PPC: Book3S HV: Don't store values derivable from HPT order
Currently the kvm_hpt_info structure stores the hashed page table's order,
and also the number of HPTEs it contains and a mask for its size.  The
last two can be easily derived from the order, so remove them and just
calculate them as necessary with a couple of helper inlines.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:34 +11:00
David Gibson 3f9d4f5a5f KVM: PPC: Book3S HV: Gather HPT related variables into sub-structure
Currently, the powerpc kvm_arch structure contains a number of variables
tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
patch gathers them all together into a single kvm_hpt_info substructure.
This makes life more convenient for the upcoming HPT resizing
implementation.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:28 +11:00
Paul Mackerras a29ebeaf55 KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement
With radix, the guest can do TLB invalidations itself using the tlbie
(global) and tlbiel (local) TLB invalidation instructions.  Linux guests
use local TLB invalidations for translations that have only ever been
accessed on one vcpu.  However, that doesn't mean that the translations
have only been accessed on one physical cpu (pcpu) since vcpus can move
around from one pcpu to another.  Thus a tlbiel might leave behind stale
TLB entries on a pcpu where the vcpu previously ran, and if that task
then moves back to that previous pcpu, it could see those stale TLB
entries and thus access memory incorrectly.  The usual symptom of this
is random segfaults in userspace programs in the guest.

To cope with this, we detect when a vcpu is about to start executing on
a thread in a core that is a different core from the last time it
executed.  If that is the case, then we mark the core as needing a
TLB flush and then send an interrupt to any thread in the core that is
currently running a vcpu from the same guest.  This will get those vcpus
out of the guest, and the first one to re-enter the guest will do the
TLB flush.  The reason for interrupting the vcpus executing on the old
core is to cope with the following scenario:

	CPU 0			CPU 1			CPU 4
	(core 0)			(core 0)			(core 1)

	VCPU 0 runs task X      VCPU 1 runs
	core 0 TLB gets
	entries from task X
	VCPU 0 moves to CPU 4
							VCPU 0 runs task X
							Unmap pages of task X
							tlbiel

				(still VCPU 1)			task X moves to VCPU 1
				task X runs
				task X sees stale TLB
				entries

That is, as soon as the VCPU starts executing on the new core, it
could unmap and tlbiel some page table entries, and then the task
could migrate to one of the VCPUs running on the old core and
potentially see stale TLB entries.

Since the TLB is shared between all the threads in a core, we only
use the bit of kvm->arch.need_tlb_flush corresponding to the first
thread in the core.  To ensure that we don't have a window where we
can miss a flush, this moves the clearing of the bit from before the
actual flush to after it.  This way, two threads might both do the
flush, but we prevent the situation where one thread can enter the
guest before the flush is finished.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:51 +11:00
Paul Mackerras 65dae5403a KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode
If the guest is in radix mode, then it doesn't have a hashed page
table (HPT), so all of the hypercalls that manipulate the HPT can't
work and should return an error.  This adds checks to make them
return H_FUNCTION ("function not supported").

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:50 +11:00
Paul Mackerras e34af78490 KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h
This moves the prototypes for functions that are only called from
assembler code out of asm/asm-prototypes.h into asm/kvm_ppc.h.
The prototypes were added in commit ebe4535fbe ("KVM: PPC:
Book3S HV: sparse: prototypes for functions called from assembler",
2016-10-10), but given that the functions are KVM functions,
having them in a KVM header will be better for long-term
maintenance.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-12-01 14:03:46 +11:00
Paul Mackerras 7c5b06cadf KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9
POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
and tlbiel (local tlbie) instructions.  Both instructions get a
set of new parameters (RIC, PRS and R) which appear as bits in the
instruction word.  The tlbiel instruction now has a second register
operand, which contains a PID and/or LPID value if needed, and
should otherwise contain 0.

This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
as well as older processors.  Since we only handle HPT guests so
far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
word as on previous processors, so we don't need to conditionally
execute different instructions depending on the processor.

The local flush on first entry to a guest in book3s_hv_rmhandlers.S
is a loop which depends on the number of TLB sets.  Rather than
using feature sections to set the number of iterations based on
which CPU we're on, we now work out this number at VM creation time
and store it in the kvm_arch struct.  That will make it possible to
get the number from the device tree in future, which will help with
compatibility with future processors.

Since mmu_partition_table_set_entry() does a global flush of the
whole LPID, we don't need to do the TLB flush on first entry to the
guest on each processor.  Therefore we don't set all bits in the
tlb_need_flush bitmap on VM startup on POWER9.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24 09:24:23 +11:00
Paul Mackerras abb7c7ddba KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9
This adapts the KVM-HV hashed page table (HPT) code to read and write
HPT entries in the new format defined in Power ISA v3.00 on POWER9
machines.  The new format moves the B (segment size) field from the
first doubleword to the second, and trims some bits from the AVA
(abbreviated virtual address) and ARPN (abbreviated real page number)
fields.  As far as possible, the conversion is done when reading or
writing the HPT entries, and the rest of the code continues to use
the old format.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24 09:24:23 +11:00
Paul Mackerras f064a0de15 KVM: PPC: Book3S HV: Don't lose hardware R/C bit updates in H_PROTECT
The hashed page table MMU in POWER processors can update the R
(reference) and C (change) bits in a HPTE at any time until the
HPTE has been invalidated and the TLB invalidation sequence has
completed.  In kvmppc_h_protect, which implements the H_PROTECT
hypercall, we read the HPTE, modify the second doubleword,
invalidate the HPTE in memory, do the TLB invalidation sequence,
and then write the modified value of the second doubleword back
to memory.  In doing so we could overwrite an R/C bit update done
by hardware between when we read the HPTE and when the TLB
invalidation completed.  To fix this we re-read the second
doubleword after the TLB invalidation and OR in the (possibly)
new values of R and C.  We can use an OR since hardware only ever
sets R and C, never clears them.

This race was found by code inspection.  In principle this bug could
cause occasional guest memory corruption under host memory pressure.

Fixes: a8606e20e4 ("KVM: PPC: Handle some PAPR hcalls in the kernel", 2011-06-29)
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-21 15:29:20 +11:00
Yongji Xie a56ee9f8f0 KVM: PPC: Book3S HV: Add a per vcpu cache for recently page faulted MMIO entries
This keeps a per vcpu cache for recently page faulted MMIO entries.
On a page fault, if the entry exists in the cache, we can avoid some
time-consuming paths, for example, looking up HPT, locking HPTE twice
and searching mmio gfn from memslots, then directly call
kvmppc_hv_emulate_mmio().

In current implenment, we limit the size of cache to four. We think
it's enough to cover the high-frequency MMIO HPTEs in most case.
For example, considering the case of using virtio device, for virtio
legacy devices, one HPTE could handle notifications from up to
1024 (64K page / 64 byte Port IO register) devices, so one cache entry
is enough; for virtio modern devices, we always need one HPTE to handle
notification for each device because modern device would use a 8M MMIO
register to notify host instead of Port IO register, typically the
system's configuration should not exceed four virtio devices per
vcpu, four cache entry is also enough in this case. Of course, if needed,
we could also modify the macro to a module parameter in the future.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-21 15:17:55 +11:00
Yongji Xie f05859827d KVM: PPC: Book3S HV: Clear the key field of HPTE when the page is paged out
Currently we mark a HPTE for emulated MMIO with HPTE_V_ABSENT bit
set as well as key 0x1f. However, those HPTEs may be conflicted with
the HPTE for real guest RAM page HPTE with key 0x1f when the page
get paged out.

This patch clears the key field of HPTE when the page is paged out,
then recover it when HPTE is re-established.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-21 15:17:55 +11:00
Daniel Axtens ebe4535fbe KVM: PPC: Book3S HV: sparse: prototypes for functions called from assembler
A bunch of KVM functions are only called from assembler.
Give them prototypes in asm-prototypes.h
This reduces sparse warnings.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-21 15:17:54 +11:00
Aneesh Kumar K.V 30bda41aba powerpc/mm: Drop WIMG in favour of new constants
PowerISA 3.0 introduces two pte bits with the below meaning for radix:
  00 -> Normal Memory
  01 -> Strong Access Order (SAO)
  10 -> Non idempotent I/O (Cache inhibited and guarded)
  11 -> Tolerant I/O (Cache inhibited)

We drop the existing WIMG bits in the Linux page table in favour of the
above constants. We loose _PAGE_WRITETHRU with this conversion. We only
use writethru via pgprot_cached_wthru() which is used by
fbdev/controlfb.c which is Apple control display and also PPC32.

With respect to _PAGE_COHERENCE, we have been marking hpte always
coherent for some time now. htab_convert_pte_flags() always added
HPTE_R_M.

NOTE: KVM changes need closer review.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:33 +10:00
Aneesh Kumar K.V f64e8084c9 powerpc/mm: Move hash related mmu-*.h headers to book3s/
No code changes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-03 21:19:21 +11:00
Linus Torvalds 2f4bf528ec powerpc updates for 4.4
- Kconfig: remove BE-only platforms from LE kernel build from Boqun Feng
  - Refresh ps3_defconfig from Geoff Levand
  - Emit GNU & SysV hashes for the vdso from Michael Ellerman
  - Define an enum for the bolted SLB indexes from Anshuman Khandual
  - Use a local to avoid multiple calls to get_slb_shadow() from Michael Ellerman
  - Add gettimeofday() benchmark from Michael Neuling
  - Avoid link stack corruption in __get_datapage() from Michael Neuling
  - Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar K.V
  - Add ppc64le_defconfig from Michael Ellerman
  - pseries: extract of_helpers module from Andy Shevchenko
  - Correct string length in pseries_of_derive_parent() from Nathan Fontenot
  - Free the MSI bitmap if it was slab allocated from Denis Kirjanov
  - Shorten irq_chip name for the SIU from Christophe Leroy
  - Wait 1s for secondaries to enter OPAL during kexec from Samuel Mendoza-Jonas
  - Fix _ALIGN_* errors due to type difference. from Aneesh Kumar K.V
  - powerpc/pseries/hvcserver: don't memset pi_buff if it is null from Colin Ian King
  - Disable hugepd for 64K page size. from Aneesh Kumar K.V
  - Differentiate between hugetlb and THP during page walk from Aneesh Kumar K.V
  - Make PCI non-optional for pseries from Michael Ellerman
  - Individual System V IPC system calls from Sam bobroff
  - Add selftest of unmuxed IPC calls from Michael Ellerman
  - discard .exit.data at runtime from Stephen Rothwell
  - Delete old orphaned PrPMC 280/2800 DTS and boot file. from Paul Gortmaker
  - Use of_get_next_parent to simplify code from Christophe Jaillet
  - Paginate some xmon output from Sam bobroff
  - Add some more elements to the xmon PACA dump from Michael Ellerman
  - Allow the tm-syscall selftest to build with old headers from Michael Ellerman
  - Run EBB selftests only on POWER8 from Denis Kirjanov
  - Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael Ellerman
  - Avoid reference to potentially freed memory in prom.c from Christophe Jaillet
  - Quieten boot wrapper output with run_cmd from Geoff Levand
  - EEH fixes and cleanups from Gavin Shan
  - Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
  - Use of_get_next_parent() in of_get_ibm_chip_id() from Michael Ellerman
  - Fix section mismatch warning in msi_bitmap_alloc() from Denis Kirjanov
  - Fix ps3-lpm white space from Rudhresh Kumar J
  - Fix ps3-vuart null dereference from Colin King
  - nvram: Add missing kfree in error path from Christophe Jaillet
  - nvram: Fix function name in some errors messages. from Christophe Jaillet
  - drivers/macintosh: adb: fix misleading Kconfig help text from Aaro Koskinen
  - agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
  - cxl: Free virtual PHB when removing from Andrew Donnellan
  - scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from Michael Ellerman
  - scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building with O= from Michael Ellerman
 
  - Freescale updates from Scott: Highlights include 64-bit book3e kexec/kdump
    support, a rework of the qoriq clock driver, device tree changes including
    qoriq fman nodes, support for a new 85xx board, and some fixes.
 
  - MPC5xxx updates from Anatolij: Highlights include a driver for MPC512x
    LocalPlus Bus FIFO with its device tree binding documentation, mpc512x
    device tree updates and some minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWPEZgAAoJEFHr6jzI4aWANjYQAKX2Q/95hqKfCuF5FBcUmtMC
 Pu/Nff027MVzxZ2ApDcvvLGps5Nz2bn3nIhc9zjkXc5E8DuL6X3Yl8ce7qyNcc3g
 cJJ8RvtUo6J1OMWetXFehtPYniAAwKMhZYKnj0+WnLr2SyH/Vhl3ehDkFbGyPtuH
 r+2E7krFjfVgU+bzciIFnOaDekFuFN/pXWMb6e6zQyBJe9N8ZIp96uouGCebKVd0
 VDLItzdaKErT8JFfbymMPvZm3V0rMVx4WWu3kAbQX8LrD5a18NF1zrjAOHRXc61n
 kkk8/DPuNOon1PbXXyiS5BcFyZRe+KE3VBnoW5sOMqMIRg5WdO1oU3e2pEfXMO8+
 leXYwFLXiKzUZuOgQG2QiUhrzD2yC1o6/TJWATv0dSl9AwrecgPX+Vj6X357slAf
 A9E3eMy5tgnpndBWZmvZS3W7YDKH+NkeZ+Q40+NErAlqr++ErrTcKVndk5vWlYTT
 7mMZeTXagX66al/k5ATKqwB7iUSpnYHSAa9fcUYPSM2FnXsDxPyeJGkBbcoOmkGj
 QrpgNYOvJaUJd076goZCV39v0c1xpfV9/9kyVch8HUadf6JcjpVZwYnbGw2qlJjh
 ZanuBG2VOeSwaKQqXiRBSBetnpAg8CVpFjDmX9wOBfSek2wxEJqDX/vQExdbIDQQ
 pUs7vnUxLzhmW/x+ygOI
 =YwcM
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Kconfig: remove BE-only platforms from LE kernel build from Boqun
   Feng
 - Refresh ps3_defconfig from Geoff Levand
 - Emit GNU & SysV hashes for the vdso from Michael Ellerman
 - Define an enum for the bolted SLB indexes from Anshuman Khandual
 - Use a local to avoid multiple calls to get_slb_shadow() from Michael
   Ellerman
 - Add gettimeofday() benchmark from Michael Neuling
 - Avoid link stack corruption in __get_datapage() from Michael Neuling
 - Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar
   K.V
 - Add ppc64le_defconfig from Michael Ellerman
 - pseries: extract of_helpers module from Andy Shevchenko
 - Correct string length in pseries_of_derive_parent() from Nathan
   Fontenot
 - Free the MSI bitmap if it was slab allocated from Denis Kirjanov
 - Shorten irq_chip name for the SIU from Christophe Leroy
 - Wait 1s for secondaries to enter OPAL during kexec from Samuel
   Mendoza-Jonas
 - Fix _ALIGN_* errors due to type difference, from Aneesh Kumar K.V
 - powerpc/pseries/hvcserver: don't memset pi_buff if it is null from
   Colin Ian King
 - Disable hugepd for 64K page size, from Aneesh Kumar K.V
 - Differentiate between hugetlb and THP during page walk from Aneesh
   Kumar K.V
 - Make PCI non-optional for pseries from Michael Ellerman
 - Individual System V IPC system calls from Sam bobroff
 - Add selftest of unmuxed IPC calls from Michael Ellerman
 - discard .exit.data at runtime from Stephen Rothwell
 - Delete old orphaned PrPMC 280/2800 DTS and boot file, from Paul
   Gortmaker
 - Use of_get_next_parent to simplify code from Christophe Jaillet
 - Paginate some xmon output from Sam bobroff
 - Add some more elements to the xmon PACA dump from Michael Ellerman
 - Allow the tm-syscall selftest to build with old headers from Michael
   Ellerman
 - Run EBB selftests only on POWER8 from Denis Kirjanov
 - Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael
   Ellerman
 - Avoid reference to potentially freed memory in prom.c from Christophe
   Jaillet
 - Quieten boot wrapper output with run_cmd from Geoff Levand
 - EEH fixes and cleanups from Gavin Shan
 - Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
 - Use of_get_next_parent() in of_get_ibm_chip_id() from Michael
   Ellerman
 - Fix section mismatch warning in msi_bitmap_alloc() from Denis
   Kirjanov
 - Fix ps3-lpm white space from Rudhresh Kumar J
 - Fix ps3-vuart null dereference from Colin King
 - nvram: Add missing kfree in error path from Christophe Jaillet
 - nvram: Fix function name in some errors messages, from Christophe
   Jaillet
 - drivers/macintosh: adb: fix misleading Kconfig help text from Aaro
   Koskinen
 - agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
 - cxl: Free virtual PHB when removing from Andrew Donnellan
 - scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from
   Michael Ellerman
 - scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building
   with O= from Michael Ellerman
 - Freescale updates from Scott: Highlights include 64-bit book3e
   kexec/kdump support, a rework of the qoriq clock driver, device tree
   changes including qoriq fman nodes, support for a new 85xx board, and
   some fixes.
 - MPC5xxx updates from Anatolij: Highlights include a driver for
   MPC512x LocalPlus Bus FIFO with its device tree binding
   documentation, mpc512x device tree updates and some minor fixes.

* tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (106 commits)
  powerpc/msi: Fix section mismatch warning in msi_bitmap_alloc()
  powerpc/prom: Use of_get_next_parent() in of_get_ibm_chip_id()
  powerpc/pseries: Correct string length in pseries_of_derive_parent()
  powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry
  powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)
  powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan
  powerpc/fsl: Add #clock-cells and clockgen label to clockgen nodes
  powerpc: handle error case in cpm_muram_alloc()
  powerpc: mpic: use IRQCHIP_SKIP_SET_WAKE instead of redundant mpic_irq_set_wake
  powerpc/book3e-64: Enable kexec
  powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop
  powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32
  powerpc/book3e-64/kexec: Enable SMP release
  powerpc/book3e-64/kexec: create an identity TLB mapping
  powerpc/book3e-64: Don't limit paca to 256 MiB
  powerpc/book3e/kdump: Enable crash_kexec_wait_realmode
  powerpc/book3e: support CONFIG_RELOCATABLE
  powerpc/booke64: Fix args to copy_and_flush
  powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts
  powerpc/e6500: kexec: Handle hardware threads
  ...
2015-11-05 23:38:43 -08:00
Paul Mackerras c64dfe2af3 KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs
This fixes a bug where the old HPTE value returned by H_REMOVE has
the valid bit clear if the HPTE was an absent HPTE, as happens for
HPTEs for emulated MMIO pages and for RAM pages that have been paged
out by the host.  If the absent bit is set, we clear it and set the
valid bit, because from the guest's point of view, the HPTE is valid.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-21 16:25:06 +11:00
Aneesh Kumar K.V 891121e6c0 powerpc/mm: Differentiate between hugetlb and THP during page walk
We need to properly identify whether a hugepage is an explicit or
a transparent hugepage in follow_huge_addr(). We used to depend
on hugepage shift argument to do that. But in some case that can
result in wrong results. For ex:

On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
We do prevent reusing the pfn page via the usage of
kick_all_cpus_sync(). But that happens after we updated the pte to 0.
Hence in follow_huge_addr() we can find hugepage shift set, but transparent
huge page check fail for a thp pte.

NOTE: We fixed a variant of this race against thp split in commit
691e95fd73
("powerpc/mm/thp: Make page table walk safe against thp split/collapse")

Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
follow_page_mask occasionally.

In the long term, we may want to switch ppc64 64k page size config to
enable CONFIG_ARCH_WANT_GENERAL_HUGETLB

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-12 15:30:09 +11:00
Paul Mackerras cdeee51842 KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes.  When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.

These hypercalls are not used by Linux guests.  These implementations
have been tested using a FreeBSD guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:18 +02:00
Paul Mackerras 08fe1e7bd2 KVM: PPC: Book3S HV: Fix bug in dirty page tracking
This fixes a bug in the tracking of pages that get modified by the
guest.  If the guest creates a large-page HPTE, writes to memory
somewhere within the large page, and then removes the HPTE, we only
record the modified state for the first normal page within the large
page, when in fact the guest might have modified some other normal
page within the large page.

To fix this we use some unused bits in the rmap entry to record the
order (log base 2) of the size of the page that was modified, when
removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
order to return the correct number of modified pages.

The same thing could in principle happen when removing a HPTE at the
host's request, i.e. when paging out a page, except that we never
page out large pages, and the guest can only create large-page HPTEs
if the guest RAM is backed by large pages.  However, we also fix
this case for the sake of future-proofing.

The reference bit is also subject to the same loss of information.  We
don't make the same fix here for the reference bit because there isn't
an interface for userspace to find out which pages the guest has
referenced, whereas there is one for userspace to find out which pages
the guest has modified.  Because of this loss of information, the
kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
say that a page has not been referenced when it has, but that doesn't
matter greatly because we never page or swap out large pages.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:18 +02:00
Paul Mackerras 1e5bf454f5 KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
The reference (R) and change (C) bits in a HPT entry can be set by
hardware at any time up until the HPTE is invalidated and the TLB
invalidation sequence has completed.  This means that when removing
a HPTE, we need to read the HPTE after the invalidation sequence has
completed in order to obtain reliable values of R and C.  The code
in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd3265
("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
read after invalidation as a side effect of other changes.  This
restores the read of the HPTE after invalidation.

The user-visible effect of this bug would be that when migrating a
guest, there is a small probability that a page modified by the guest
and then unmapped by the guest might not get re-transmitted and thus
the destination might end up with a stale copy of the page.

Fixes: 6f22bd3265
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-08-22 11:16:18 +02:00
Linus Torvalds 63905bba5b powerpc fixes for 4.1
- Fix for mm_dec_nr_pmds() from Scott.
 - Fixes for oopses seen with KVM + THP from Aneesh.
 - Build fixes from Aneesh & Shreyas.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVOsk5AAoJEFHr6jzI4aWACWMP/3EaNoeA1g8VbZWZEdRaoLvX
 W7D08DI3Dt8HLxyn2JR08jYZF0gr68XrF6OiscVVki7wVXT8fbH4jSNmBkbzNH95
 d9taScJyR1CUavkhsXivnR1qEE1Fi2KA2OW9RaNfoSt1MVtdsvOK6xXklUGksuJQ
 XygzyrRr4Dj82kuMUAMO0YDMvknMlzi3a8dzyrWZBXBZOOTWavGB6bQKtCTaOQ99
 3OFGLQ10uY7lmdHDi0t0tQ99FuYfLiJpg5fTLoUni4J5tFp8JlZ+x0Gwc0apN0cy
 Ym8EO6++qWDv8FXvYEPfVUEjbF1fyPiawUgpkMnyvXgd8K5G85SIrtkGW0Ml+6sX
 GfJH8w9hpDbF5EnWlC9bn/jT7sHBHFdrxZuQUc0L4M2OtM73R2a0Xr3b7ZxFCD1q
 7RpYu8MKKcyvaIXNg7VBJjj8zL+WmUJKF6J5uX5bGU2xH0khmp0vTknyyjbwrlcF
 uHidv5ZhMt3aAI70v14jA5BTEmLyOYRu58Ei6cT/VT/DjdbpEApdK8BMAvKSEeib
 +hzh6oDFT92AM0tbg15bNmqGbGfgqtVKe4GDS2QyGaHGAFOGs1nPuSa9se1xYDcM
 CCtRyABwpzJsrCfwra2fsTU6FxlatK4ONViyWFBXa6mEjBNSZ4XmyZvdWUqlwpSC
 F5jNGppm5Ama6xxcLphA
 =6yQx
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc fixes from Michael Ellerman:

 - fix for mm_dec_nr_pmds() from Scott.

 - fixes for oopses seen with KVM + THP from Aneesh.

 - build fixes from Aneesh & Shreyas.

* tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled
  powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error
  powerpc/mm/thp: Return pte address if we find trans_splitting.
  powerpc/mm/thp: Make page table walk safe against thp split/collapse
  KVM: PPC: Remove page table walk helpers
  KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
  powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
2015-04-26 13:23:15 -07:00
Aneesh Kumar K.V a4bd6eb07c KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
This adds helper routines for locking and unlocking HPTEs, and uses
them in the rest of the code.  We don't change any locking rules in
this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:29 +02:00
Aneesh Kumar K.V 7d6e7f7ffa powerpc/mm/thp: Return pte address if we find trans_splitting.
For THP that is marked trans splitting, we return the pte.
This require the callers to handle the pmd_trans_splitting scenario,
if they care. All the current callers are either looking at pfn or
write_ok, hence we don't need to update them.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-17 11:23:40 +10:00
Aneesh Kumar K.V 691e95fd73 powerpc/mm/thp: Make page table walk safe against thp split/collapse
We can disable a THP split or a hugepage collapse by disabling irq.
We do send IPI to all the cpus in the early part of split/collapse,
and disabling local irq ensure we don't make progress with
split/collapse. If the THP is getting split we return NULL from
find_linux_pte_or_hugepte(). For all the current callers it should be ok.
We need to be careful if we want to use returned pte_t pointer outside
the irq disabled region. W.r.t to THP split, the pfn remains the same,
but then a hugepage collapse will result in a pfn change. There are
few steps we can take to avoid a hugepage collapse.One way is to take page
reference inside the irq disable region. Other option is to take
mmap_sem so that a parallel collapse will not happen. We can also
disable collapse by taking pmd_lock. Another method used by kvm
subsystem is to check whether we had a mmu_notifer update in between
using mmu_notifier_retry().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-17 11:23:39 +10:00
Aneesh Kumar K.V dac5657067 KVM: PPC: Remove page table walk helpers
This patch remove helpers which we had used only once in the code.
Limiting page table walk variants help in ensuring that we won't
end up with code walking page table with wrong assumptions.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-17 11:23:24 +10:00
Mel Gorman 8a0516ed8b mm: convert p[te|md]_numa users to p[te|md]_protnone_numa
Convert existing users of pte_numa and friends to the new helper.  Note
that the kernel is broken after this patch is applied until the other page
table modifiers are also altered.  This patch layout is to make review
easier.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:08 -08:00
Paul Mackerras c17b98cf60 KVM: PPC: Book3S HV: Remove code for PPC970 processors
This removes the code that was added to enable HV KVM to work
on PPC970 processors.  The PPC970 is an old CPU that doesn't
support virtualizing guest memory.  Removing PPC970 support also
lets us remove the code for allocating and managing contiguous
real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
case, the code for pinning pages of guest memory when first
accessed and keeping track of which pages have been pinned, and
the code for handling H_ENTER hypercalls in virtual mode.

Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
The KVM_CAP_PPC_RMA capability now always returns 0.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:44:03 +01:00
Paul Mackerras b4a839009a KVM: PPC: Book3S HV: Fix KSM memory corruption
Testing with KSM active in the host showed occasional corruption of
guest memory.  Typically a page that should have contained zeroes
would contain values that look like the contents of a user process
stack (values such as 0x0000_3fff_xxxx_xxx).

Code inspection in kvmppc_h_protect revealed that there was a race
condition with the possibility of granting write access to a page
which is read-only in the host page tables.  The code attempts to keep
the host mapping read-only if the host userspace PTE is read-only, but
if that PTE had been temporarily made invalid for any reason, the
read-only check would not trigger and the host HPTE could end up
read-write.  Examination of the guest HPT in the failure situation
revealed that there were indeed shared pages which should have been
read-only that were mapped read-write.

To close this race, we don't let a page go from being read-only to
being read-write, as far as the real HPTE mapping the page is
concerned (the guest view can go to read-write, but the actual mapping
stays read-only).  When the guest tries to write to the page, we take
an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a
writable HPTE for the page.

This eliminates the occasional corruption of shared pages
that was previously seen with KSM active.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15 13:27:24 +01:00
Paolo Bonzini cc568ead3c Patch queue for ppc - 2014-08-01
Highlights in this release include:
 
   - BookE: Rework instruction fetch, not racy anymore now
   - BookE HV: Fix ONE_REG accessors for some in-hardware registers
   - Book3S: Good number of LE host fixes, enable HV on LE
   - Book3S: Some misc bug fixes
   - Book3S HV: Add in-guest debug support
   - Book3S HV: Preload cache lines on context switch
   - Remove 440 support
 
 Alexander Graf (31):
       KVM: PPC: Book3s PR: Disable AIL mode with OPAL
       KVM: PPC: Book3s HV: Fix tlbie compile error
       KVM: PPC: Book3S PR: Handle hyp doorbell exits
       KVM: PPC: Book3S PR: Fix ABIv2 on LE
       KVM: PPC: Book3S PR: Fix sparse endian checks
       PPC: Add asm helpers for BE 32bit load/store
       KVM: PPC: Book3S HV: Make HTAB code LE host aware
       KVM: PPC: Book3S HV: Access guest VPA in BE
       KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
       KVM: PPC: Book3S HV: Access XICS in BE
       KVM: PPC: Book3S HV: Fix ABIv2 on LE
       KVM: PPC: Book3S HV: Enable for little endian hosts
       KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct
       KVM: PPC: Deflect page write faults properly in kvmppc_st
       KVM: PPC: Book3S: Stop PTE lookup on write errors
       KVM: PPC: Book3S: Add hack for split real mode
       KVM: PPC: Book3S: Make magic page properly 4k mappable
       KVM: PPC: Remove 440 support
       KVM: Rename and add argument to check_extension
       KVM: Allow KVM_CHECK_EXTENSION on the vm fd
       KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode
       KVM: PPC: Implement kvmppc_xlate for all targets
       KVM: PPC: Move kvmppc_ld/st to common code
       KVM: PPC: Remove kvmppc_bad_hva()
       KVM: PPC: Use kvm_read_guest in kvmppc_ld
       KVM: PPC: Handle magic page in kvmppc_ld/st
       KVM: PPC: Separate loadstore emulation from priv emulation
       KVM: PPC: Expose helper functions for data/inst faults
       KVM: PPC: Remove DCR handling
       KVM: PPC: HV: Remove generic instruction emulation
       KVM: PPC: PR: Handle FSCR feature deselects
 
 Alexey Kardashevskiy (1):
       KVM: PPC: Book3S: Fix LPCR one_reg interface
 
 Aneesh Kumar K.V (4):
       KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
       KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
       KVM: PPC: BOOK3S: PR: Emulate instruction counter
       KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
 
 Anton Blanchard (2):
       KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
       KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()
 
 Bharat Bhushan (10):
       kvm: ppc: bookehv: Added wrapper macros for shadow registers
       kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1
       kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR
       kvm: ppc: booke: Add shared struct helpers of SPRN_ESR
       kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7
       kvm: ppc: Add SPRN_EPR get helper function
       kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit
       KVM: PPC: Booke-hv: Add one reg interface for SPRG9
       KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
       KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
 
 Michael Neuling (1):
       KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling
 
 Mihai Caraman (8):
       KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
       KVM: PPC: e500: Fix default tlb for victim hint
       KVM: PPC: e500: Emulate power management control SPR
       KVM: PPC: e500mc: Revert "add load inst fixup"
       KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1
       KVM: PPC: Book3s: Remove kvmppc_read_inst() function
       KVM: PPC: Allow kvmppc_get_last_inst() to fail
       KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
 
 Paul Mackerras (4):
       KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling
       KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled
       KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call
       KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication
 
 Stewart Smith (2):
       Split out struct kvmppc_vcore creation to separate function
       Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJT21skAAoJECszeR4D/txgeFEP/AzJopN7s//W33CfyBqURHXp
 XALCyAw+S67gtcaTZbxomcG1xuT8Lj9WEw28iz3rCtAnJwIxsY63xrI1nXMzTaI2
 p1rC0ai5Qy+nlEbd6L78spZy/Nzh8DFYGWx78iUSO1mYD8xywJwtoiBA539pwp8j
 8N+mgn61Hwhv31bKtsZlmzXymVr/jbTp5LVuxsBLJwD2lgT49g+4uBnX2cG/iXkg
 Rzbh7LxoNNXrSPI8sYmTWu/81aeXteeX70ja6DHuV5dWLNTuAXJrh5EUfeAZqBrV
 aYcLWUYmIyB87txNmt6ZGVar2p3jr2Xhb9mKx+EN4dbehblanLc1PUqlHd0q3dKc
 Nt60ByqpZn+qDAK86dShSZLEe+GT3lovvE76CqVXD4Er+OUEkc9JoxhN1cof/Gb0
 o6uwZ2isXHRdGoZx5vb4s3UTOlwZGtoL/CyY/HD/ujYDSURkCGbxLj3kkecSY8ut
 QdDAWsC15BwsHtKLr5Zwjp2w+0eGq2QJgfvO0zqWFiz9k33SCBCUpwluFeqh27Hi
 aR5Wir3j+MIw9G8XlYlDJWYfi0h/SZ4G7hh7jSu26NBNBzQsDa8ow/cLzdMhdUwH
 OYSaeqVk5wiRb9to1uq1NQWPA0uRAx3BSjjvr9MCGRqmvn+FV5nj637YWUT+53Hi
 aSvg/U2npghLPPG2cihu
 =JuLr
 -----END PGP SIGNATURE-----

Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm

Patch queue for ppc - 2014-08-01

Highlights in this release include:

  - BookE: Rework instruction fetch, not racy anymore now
  - BookE HV: Fix ONE_REG accessors for some in-hardware registers
  - Book3S: Good number of LE host fixes, enable HV on LE
  - Book3S: Some misc bug fixes
  - Book3S HV: Add in-guest debug support
  - Book3S HV: Preload cache lines on context switch
  - Remove 440 support

Alexander Graf (31):
      KVM: PPC: Book3s PR: Disable AIL mode with OPAL
      KVM: PPC: Book3s HV: Fix tlbie compile error
      KVM: PPC: Book3S PR: Handle hyp doorbell exits
      KVM: PPC: Book3S PR: Fix ABIv2 on LE
      KVM: PPC: Book3S PR: Fix sparse endian checks
      PPC: Add asm helpers for BE 32bit load/store
      KVM: PPC: Book3S HV: Make HTAB code LE host aware
      KVM: PPC: Book3S HV: Access guest VPA in BE
      KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
      KVM: PPC: Book3S HV: Access XICS in BE
      KVM: PPC: Book3S HV: Fix ABIv2 on LE
      KVM: PPC: Book3S HV: Enable for little endian hosts
      KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct
      KVM: PPC: Deflect page write faults properly in kvmppc_st
      KVM: PPC: Book3S: Stop PTE lookup on write errors
      KVM: PPC: Book3S: Add hack for split real mode
      KVM: PPC: Book3S: Make magic page properly 4k mappable
      KVM: PPC: Remove 440 support
      KVM: Rename and add argument to check_extension
      KVM: Allow KVM_CHECK_EXTENSION on the vm fd
      KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode
      KVM: PPC: Implement kvmppc_xlate for all targets
      KVM: PPC: Move kvmppc_ld/st to common code
      KVM: PPC: Remove kvmppc_bad_hva()
      KVM: PPC: Use kvm_read_guest in kvmppc_ld
      KVM: PPC: Handle magic page in kvmppc_ld/st
      KVM: PPC: Separate loadstore emulation from priv emulation
      KVM: PPC: Expose helper functions for data/inst faults
      KVM: PPC: Remove DCR handling
      KVM: PPC: HV: Remove generic instruction emulation
      KVM: PPC: PR: Handle FSCR feature deselects

Alexey Kardashevskiy (1):
      KVM: PPC: Book3S: Fix LPCR one_reg interface

Aneesh Kumar K.V (4):
      KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
      KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
      KVM: PPC: BOOK3S: PR: Emulate instruction counter
      KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page

Anton Blanchard (2):
      KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
      KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()

Bharat Bhushan (10):
      kvm: ppc: bookehv: Added wrapper macros for shadow registers
      kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1
      kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR
      kvm: ppc: booke: Add shared struct helpers of SPRN_ESR
      kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7
      kvm: ppc: Add SPRN_EPR get helper function
      kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit
      KVM: PPC: Booke-hv: Add one reg interface for SPRG9
      KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
      KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr

Michael Neuling (1):
      KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling

Mihai Caraman (8):
      KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
      KVM: PPC: e500: Fix default tlb for victim hint
      KVM: PPC: e500: Emulate power management control SPR
      KVM: PPC: e500mc: Revert "add load inst fixup"
      KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1
      KVM: PPC: Book3s: Remove kvmppc_read_inst() function
      KVM: PPC: Allow kvmppc_get_last_inst() to fail
      KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

Paul Mackerras (4):
      KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling
      KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled
      KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call
      KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication

Stewart Smith (2):
      Split out struct kvmppc_vcore creation to separate function
      Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8

Conflicts:
	Documentation/virtual/kvm/api.txt
2014-08-05 09:58:11 +02:00
Alexander Graf 6f22bd3265 KVM: PPC: Book3S HV: Make HTAB code LE host aware
When running on an LE host all data structures are kept in little endian
byte order. However, the HTAB still needs to be maintained in big endian.

So every time we access any HTAB we need to make sure we do so in the right
byte order. Fix up all accesses to manually byte swap.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:22:22 +02:00
Aneesh Kumar K.V 341acbb3aa KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
With guests supporting Multiple page size per segment (MPSS),
hpte_page_size returns the actual page size used. Add a new function to
return base page size and use that to compare against the the page size
calculated from SLB. Without this patch a hpte lookup can fail since
we are comparing wrong page size in kvmppc_hv_find_lock_hpte.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-25 14:07:06 +02:00
Linus Torvalds b05d59dfce At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM.  Changes include:
 
 - a lot of s390 changes: optimizations, support for migration,
   GDB support and more
 
 - ARM changes are pretty small: support for the PSCI 0.2 hypercall
   interface on both the guest and the host (the latter acked by Catalin)
 
 - initial POWER8 and little-endian host support
 
 - support for running u-boot on embedded POWER targets
 
 - pretty large changes to MIPS too, completing the userspace interface
   and improving the handling of virtualized timer hardware
 
 - for x86, a larger set of changes is scheduled for 3.17.  Still,
   we have a few emulator bugfixes and support for running nested
   fully-virtualized Xen guests (para-virtualized Xen guests have
   always worked).  And some optimizations too.
 
 The only missing architecture here is ia64.  It's not a coincidence
 that support for KVM on ia64 is scheduled for removal in 3.17.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX
 BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1
 j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8
 W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ
 0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH
 CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA
 nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB
 TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m
 ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og
 VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl
 3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz
 bAknaZpPiOrW11Et1htY
 =j5Od
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next

Pull KVM updates from Paolo Bonzini:
 "At over 200 commits, covering almost all supported architectures, this
  was a pretty active cycle for KVM.  Changes include:

   - a lot of s390 changes: optimizations, support for migration, GDB
     support and more

   - ARM changes are pretty small: support for the PSCI 0.2 hypercall
     interface on both the guest and the host (the latter acked by
     Catalin)

   - initial POWER8 and little-endian host support

   - support for running u-boot on embedded POWER targets

   - pretty large changes to MIPS too, completing the userspace
     interface and improving the handling of virtualized timer hardware

   - for x86, a larger set of changes is scheduled for 3.17.  Still, we
     have a few emulator bugfixes and support for running nested
     fully-virtualized Xen guests (para-virtualized Xen guests have
     always worked).  And some optimizations too.

  The only missing architecture here is ia64.  It's not a coincidence
  that support for KVM on ia64 is scheduled for removal in 3.17"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
  KVM: add missing cleanup_srcu_struct
  KVM: PPC: Book3S PR: Rework SLB switching code
  KVM: PPC: Book3S PR: Use SLB entry 0
  KVM: PPC: Book3S HV: Fix machine check delivery to guest
  KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
  KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
  KVM: PPC: Book3S HV: Fix dirty map for hugepages
  KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
  KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
  KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
  KVM: PPC: Book3S: Add ONE_REG register names that were missed
  KVM: PPC: Add CAP to indicate hcall fixes
  KVM: PPC: MPIC: Reset IRQ source private members
  KVM: PPC: Graciously fail broken LE hypercalls
  PPC: ePAPR: Fix hypercall on LE guest
  KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
  KVM: PPC: BOOK3S: Always use the saved DAR value
  PPC: KVM: Make NX bit available with magic page
  KVM: PPC: Disable NX for old magic page using guests
  KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
  ...
2014-06-04 08:47:12 -07:00
Paul Mackerras 55765483e1 KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
The global_invalidates() function contains a check that is intended
to tell whether we are currently executing in the context of a hypercall
issued by the guest.  The reason is that the optimization of using a
local TLB invalidate instruction is only valid in that context.  The
check was testing local_paca->kvm_hstate.kvm_vcore, which gets set
when entering the guest but no longer gets cleared when exiting the
guest.  To fix this, we use the kvm_vcpu field instead, which does
get cleared when exiting the guest, by the kvmppc_release_hwthread()
calls inside kvmppc_run_core().

The effect of having the check wrong was that when kvmppc_do_h_remove()
got called from htab_write() on the destination machine during a
migration, it cleared the current cpu's bit in kvm->arch.need_tlb_flush.
This meant that when the guest started running in the destination VM,
it may miss out on doing a complete TLB flush, and therefore may end
up using stale TLB entries from a previous guest that used the same
LPID value.

This should make migration more reliable.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30 14:26:28 +02:00
pingfank@linux.vnet.ibm.com 1ad9f23873 KVM: PPC: Book3S: HV: make _PAGE_NUMA take effect
Numa fault is a method which help to achieve auto numa balancing.
When such a page fault takes place, the page fault handler will check
whether the page is placed correctly. If not, migration should be
involved to cut down the distance between the cpu and pages.

A pte with _PAGE_NUMA help to implement numa fault. It means not to
allow the MMU to access the page directly. So a page fault is triggered
and numa fault handler gets the opportunity to run checker.

As for the access of MMU, we need special handling for the powernv's guest.
When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to
invalidate it in guest's htab, but when we tried to re-insert them,
we firstly try to map it in real-mode. Only after this fails, we fallback
to virt mode, and most of important, we run numa fault handler in virt
mode.  This patch guards the way of real-mode to ensure that if a pte is
marked with _PAGE_NUMA, it will NOT be mapped in real mode, instead, it will
be mapped in virt mode and have the opportunity to be checked with placement.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-04-28 12:35:41 +02:00
Paul Mackerras 797f9c07eb KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
With HV KVM, some high-frequency hypercalls such as H_ENTER are handled
in real mode, and need to access the memslots array for the guest.
Accessing the memslots array is safe, because we hold the SRCU read
lock for the whole time that a guest vcpu is running.  However, the
checks that kvm_memslots() does when lockdep is enabled are potentially
unsafe in real mode, when only the linear mapping is available.
Furthermore, kvm_memslots() can be called from a secondary CPU thread,
which is an offline CPU from the point of view of the host kernel,
and is not running the task which holds the SRCU read lock.

To avoid false positives in the checks in kvm_memslots(), and to avoid
possible side effects from doing the checks in real mode, this replaces
kvm_memslots() with kvm_memslots_raw() in all the places that execute
in real mode.  kvm_memslots_raw() is a new function that is like
kvm_memslots() but uses rcu_dereference_raw_notrace() instead of
kvm_dereference_check().

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29 19:58:35 +11:00
Paolo Bonzini b73117c493 Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue
Conflicts:
	arch/powerpc/kvm/book3s_hv_rmhandlers.S
	arch/powerpc/kvm/booke.c
2014-01-29 18:29:01 +01:00
Bharat Bhushan 7c85e6b39c kvm: book3s: rename lookup_linux_pte() to lookup_linux_pte_and_update()
lookup_linux_pte() is doing more than lookup, updating the pte,
so for clarity it is renamed to lookup_linux_pte_and_update()

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:15:06 +01:00
Paul Mackerras df9059bb64 KVM: PPC: Book3S HV: Don't drop low-order page address bits
Commit caaa4c804f ("KVM: PPC: Book3S HV: Fix physical address
calculations") unfortunately resulted in some low-order address bits
getting dropped in the case where the guest is creating a 4k HPTE
and the host page size is 64k.  By getting the low-order bits from
hva rather than gpa we miss out on bits 12 - 15 in this case, since
hva is at page granularity.  This puts the missing bits back in.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-18 11:30:35 +01:00
pingfan liu 91648ec09c powerpc: kvm: fix rare but potential deadlock scene
Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
realmode, so it can trigger the deadlock.

Suppose the following scene:

Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of
vcpus.

If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched
out, and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be
caught in realmode for a long time.

What makes things even worse if the following happens,
  On cpuM, bitlockX is hold, on cpuN, Y is hold.
  vcpu_B_2 try to lock Y on cpuM in realmode
  vcpu_A_2 try to lock X on cpuN in realmode

Oops! deadlock happens

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
CC: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-11-18 22:41:57 +01:00
Paul Mackerras caaa4c804f KVM: PPC: Book3S HV: Fix physical address calculations
This fixes a bug in kvmppc_do_h_enter() where the physical address
for a page can be calculated incorrectly if transparent huge pages
(THP) are active.  Until THP came along, it was true that if we
encountered a large (16M) page in kvmppc_do_h_enter(), then the
associated memslot must be 16M aligned for both its guest physical
address and the userspace address, and the physical address
calculations in kvmppc_do_h_enter() assumed that.  With THP, that
is no longer true.

In the case where we are using MMU notifiers and the page size that
we get from the Linux page tables is larger than the page being mapped
by the guest, we need to fill in some low-order bits of the physical
address.  Without THP, these bits would be the same in the guest
physical address (gpa) and the host virtual address (hva).  With THP,
they can be different, and we need to use the bits from hva rather
than gpa.

In the case where we are not using MMU notifiers, the host physical
address we get from the memslot->arch.slot_phys[] array already
includes the low-order bits down to the PAGE_SIZE level, even if
we are using large pages.  Thus we can simplify the calculation in
this case to just add in the remaining bits in the case where
PAGE_SIZE is 64k and the guest is mapping a 4k page.

The same bug exists in kvmppc_book3s_hv_page_fault().  The basic fix
is to use psize (the page size from the HPTE) rather than pte_size
(the page size from the Linux PTE) when updating the HPTE low word
in r.  That means that pfn needs to be computed to PAGE_SIZE
granularity even if the Linux PTE is a huge page PTE.  That can be
arranged simply by doing the page_to_pfn() before setting page to
the head of the compound page.  If psize is less than PAGE_SIZE,
then we need to make sure we only update the bits from PAGE_SIZE
upwards, in order not to lose any sub-page offset bits in r.
On the other hand, if psize is greater than PAGE_SIZE, we need to
make sure we don't bring in non-zero low order bits in pfn, hence
we mask (pfn << PAGE_SHIFT) with ~(psize - 1).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-11-18 22:36:09 +01:00
Linus Torvalds 39eda2aba6 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "Here's the powerpc batch for this merge window.  Some of the
  highlights are:

   - A bunch of endian fixes ! We don't have full LE support yet in that
     release but this contains a lot of fixes all over arch/powerpc to
     use the proper accessors, call the firmware with the right endian
     mode, etc...

   - A few updates to our "powernv" platform (non-virtualized, the one
     to run KVM on), among other, support for bridging the P8 LPC bus
     for UARTs, support and some EEH fixes.

   - Some mpc51xx clock API cleanups in preparation for a clock API
     overhaul

   - A pile of cleanups of our old math emulation code, including better
     support for using it to emulate optional FP instructions on
     embedded chips that otherwise have a HW FPU.

   - Some infrastructure in selftest, for powerpc now, but could be
     generalized, initially used by some tests for our perf instruction
     counting code.

   - A pile of fixes for hotplug on pseries (that was seriously
     bitrotting)

   - The usual slew of freescale embedded updates, new boards, 64-bit
     hiberation support, e6500 core PMU support, etc..."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits)
  powerpc: Correct FSCR bit definitions
  powerpc/xmon: Fix printing of set of CPUs in xmon
  powerpc/pseries: Move lparcfg.c to platforms/pseries
  powerpc/powernv: Return secondary CPUs to firmware on kexec
  powerpc/btext: Fix CONFIG_PPC_EARLY_DEBUG_BOOTX on ppc32
  powerpc: Cleanup handling of the DSCR bit in the FSCR register
  powerpc/pseries: Child nodes are not detached by dlpar_detach_node
  powerpc/pseries: Add mising of_node_put in delete_dt_node
  powerpc/pseries: Make dlpar_configure_connector parent node aware
  powerpc/pseries: Do all node initialization in dlpar_parse_cc_node
  powerpc/pseries: Fix parsing of initial node path in update_dt_node
  powerpc/pseries: Pack update_props_workarea to map correctly to rtas buffer header
  powerpc/pseries: Fix over writing of rtas return code in update_dt_node
  powerpc/pseries: Fix creation of loop in device node property list
  powerpc: Skip emulating & leave interrupts off for kernel program checks
  powerpc: Add more exception trampolines for hypervisor exceptions
  powerpc: Fix location and rename exception trampolines
  powerpc: Add more trap names to xmon
  powerpc/pseries: Add a warning in the case of cross-cpu VPA registration
  powerpc: Update the 00-Index in Documentation/powerpc
  ...
2013-09-06 10:49:42 -07:00
Alexander Graf bf550fc93d Merge remote-tracking branch 'origin/next' into kvm-ppc-next
Conflicts:
	mm/Kconfig

CMA DMA split and ZSWAP introduction were conflicting, fix up manually.
2013-08-29 00:41:59 +02:00
Anton Blanchard 54bb7f4bda powerpc: Make rwlocks endian safe
Our ppc64 spinlocks and rwlocks use a trick where a lock token and
the paca index are placed in the lock with a single store. Since we
are using two u16s they need adjusting for little endian.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14 15:33:40 +10:00
Paul Mackerras 5448050124 KVM: PPC: Book3S HV: Correct tlbie usage
This corrects the usage of the tlbie (TLB invalidate entry) instruction
in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
On the PPC970, the bit to select large vs. small page is in the instruction,
not in the RB register value.  This changes the code to use the correct
form on PPC970.

On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
field of the RB value incorrectly for 64k pages.  This fixes it.

Since we now have several cases to handle for the tlbie instruction, this
factors out the code to do a sequence of tlbies into a new function,
do_tlbies(), and calls that from the various places where the code was
doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
use the same global_invalidates() function for determining whether to do
local or global TLB invalidations as is used in other places, for
consistency, and also to make sure that kvm->arch.need_tlb_flush gets
updated properly.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-10 13:14:09 +02:00
Aneesh Kumar K.V db7cb5b924 powerpc/kvm: Handle transparent hugepage in KVM
We can find pte that are splitting while walking page tables. Return
None pte in that case.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:55 +10:00
Aneesh Kumar K.V 12bc9f6fc1 powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly
document why we don't need to handle transparent hugepages at callsites.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:54 +10:00
Paul Mackerras a1b4a0f606 KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes
At present, the code that determines whether a HPT entry has changed,
and thus needs to be sent to userspace when it is copying the HPT,
doesn't consider a hardware update to the reference and change bits
(R and C) in the HPT entries to constitute a change that needs to
be sent to userspace.  This adds code to check for changes in R and C
when we are scanning the HPT to find changed entries, and adds code
to set the changed flag for the HPTE when we update the R and C bits
in the guest view of the HPTE.

Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
function into kvm_book3s_64.h.

Current Linux guest kernels don't use the hardware updates of R and C
in the HPT, so this change won't affect them.  Linux (or other) kernels
might in future want to use the R and C bits and have them correctly
transferred across when a guest is migrated, so it is better to correct
this deficiency.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:12 +02:00
Paul Mackerras 1b400ba0cd KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations
When we change or remove a HPT (hashed page table) entry, we can do
either a global TLB invalidation (tlbie) that works across the whole
machine, or a local invalidation (tlbiel) that only affects this core.
Currently we do local invalidations if the VM has only one vcpu or if
the guest requests it with the H_LOCAL flag, though the guest Linux
kernel currently doesn't ever use H_LOCAL.  Then, to cope with the
possibility that vcpus moving around to different physical cores might
expose stale TLB entries, there is some code in kvmppc_hv_entry to
flush the whole TLB of entries for this VM if either this vcpu is now
running on a different physical core from where it last ran, or if this
physical core last ran a different vcpu.

There are a number of problems on POWER7 with this as it stands:

- The TLB invalidation is done per thread, whereas it only needs to be
  done per core, since the TLB is shared between the threads.
- With the possibility of the host paging out guest pages, the use of
  H_LOCAL by an SMP guest is dangerous since the guest could possibly
  retain and use a stale TLB entry pointing to a page that had been
  removed from the guest.
- The TLB invalidations that we do when a vcpu moves from one physical
  core to another are unnecessary in the case of an SMP guest that isn't
  using H_LOCAL.
- The optimization of using local invalidations rather than global should
  apply to guests with one virtual core, not just one vcpu.

(None of this applies on PPC970, since there we always have to
invalidate the whole TLB when entering and leaving the guest, and we
can't support paging out guest memory.)

To fix these problems and simplify the code, we now maintain a simple
cpumask of which cpus need to flush the TLB on entry to the guest.
(This is indexed by cpu, though we only ever use the bits for thread
0 of each core.)  Whenever we do a local TLB invalidation, we set the
bits for every cpu except the bit for thread 0 of the core that we're
currently running on.  Whenever we enter a guest, we test and clear the
bit for our core, and flush the TLB if it was set.

On initial startup of the VM, and when resetting the HPT, we set all the
bits in the need_tlb_flush cpumask, since any core could potentially have
stale TLB entries from the previous VM to use the same LPID, or the
previous contents of the HPT.

Then, we maintain a count of the number of online virtual cores, and use
that when deciding whether to use a local invalidation rather than the
number of online vcpus.  The code to make that decision is extracted out
into a new function, global_invalidates().  For multi-core guests on
POWER7 (i.e. when we are using mmu notifiers), we now never do local
invalidations regardless of the H_LOCAL flag.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:05 +01:00
Paul Mackerras 1cc8ed0b13 KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages
Currently, if the guest does an H_PROTECT hcall requesting that the
permissions on a HPT entry be changed to allow writing, we make the
requested change even if the page is marked read-only in the host
Linux page tables.  This is a problem since it would for instance
allow a guest to modify a page that KSM has decided can be shared
between multiple guests.

To fix this, if the new permissions for the page allow writing, we need
to look up the memslot for the page, work out the host virtual address,
and look up the Linux page tables to get the PTE for the page.  If that
PTE is read-only, we reduce the HPTE permissions to read-only.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:00 +01:00