1
0
Fork 0
Commit Graph

16792 Commits (68fe55680d0f3342969f49412fceabb90bdfadba)

Author SHA1 Message Date
Russell Currey 8e3f1b1d82 powerpc/powernv/pci: Enable 64-bit devices to access >4GB DMA space
On PHB3/POWER8 systems, devices can select between two different sections
of address space, TVE#0 and TVE#1.  TVE#0 is intended for 32bit devices
that aren't capable of addressing more than 4GB.  Selecting TVE#1 instead,
with the capability of addressing over 4GB, is performed by setting bit 59
of a PCI address.

However, some devices aren't capable of addressing at least 59 bits, but
still want more than 4GB of DMA space.  In order to enable this, reconfigure
TVE#0 to be suitable for 64-bit devices by allocating memory past the
initial 4GB that is inaccessible by 64-bit DMAs.

This bypass mode is only enabled if a device requests 4GB or more of DMA
address space, if the system has PHB3 (POWER8 systems), and if the device
does not share a PE with any devices from different vendors.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:28 +10:00
Russell Currey a0f98629f1 powerpc/powernv/pci: Add helper to check if a PE has a single vendor
Add a helper that determines if all the devices contained in a given PE
are all from the same vendor or not.  This can be useful in determining
if it's okay to make PE-wide changes that may be suitable for some
devices but not for others.

This is used later in the series.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:28 +10:00
Russell Currey a4b48ba904 powerpc/powernv/pci: Add support for PHB4 diagnostics
As with P7IOC and PHB3, add kernel-side support for decoding and printing
diagnostic data for PHB4.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:27 +10:00
Russell Currey 5cb1f8fddd powerpc/powernv/pci: Dynamically allocate PHB diag data
Diagnostic data for PHBs currently works by allocated a fixed-sized buffer.
This is simple, but either wastes memory (though only a few kilobytes) or
in the case of PHB4 isn't enough to fit the whole data blob.

For machines that don't describe the diagnostic data size in the device
tree, use the hardcoded buffer size as before.  For those that do, only
allocate exactly what's needed.

In the special case of P7IOC (which has two types of diag data), the larger
should be specified in the device tree.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:27 +10:00
Russell Currey 31bbd45af3 powerpc/powernv/pci: Reduce spam when dumping PEST
Dumping the PE State Tables (PEST) can be highly verbose if a number of PEs
are affected, especially in the case where the whole PHB is frozen and 512
lines get printed.  Check for duplicates when dumping the PEST to reduce
useless output.

For example:

    PE[0f8] A/B: 9700002600000000 80000080d00000f8
    PE[0f9] A/B: 8000000000000000 0000000000000000
    PE[..0fe] A/B: as above
    PE[0ff] A/B: 8440002b00000000 0000000000000000

instead of:

    PE[0f8] A/B: 9700002600000000 80000080d00000f8
    PE[0f9] A/B: 8000000000000000 0000000000000000
    PE[0fa] A/B: 8000000000000000 0000000000000000
    PE[0fb] A/B: 8000000000000000 0000000000000000
    PE[0fc] A/B: 8000000000000000 0000000000000000
    PE[0fd] A/B: 8000000000000000 0000000000000000
    PE[0fe] A/B: 8000000000000000 0000000000000000
    PE[0ff] A/B: 8440002b00000000 0000000000000000

and you can imagine how much worse it can get for 512 PEs.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:26 +10:00
Michael Neuling 2bafb7ffa3 powerpc/tm: Fix comment
Update to real function name.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:09:09 +10:00
Michael Neuling aa9a951636 powerpc: Fix asm offsets to point to actual FP and VMX regs
The asm code assumes the FP regs are at the start of fp_state. While
this is true now, it may not always be the case and there is nothing
enforcing it.

This fixes the asm-offsets to point to the actual FP registers inside
the fp_state.  Similarly for VMX.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:09:08 +10:00
Michael Neuling 64ebb9a208 powerpc: Fix /proc/cpuinfo revision for POWER9 DD2
The P9 PVR bits 12-15 don't indicate a revision but instead different
chip configurations.  From BookIV we have:
   Bits      Configuration
    0 :    Scale out 12 cores
    1 :    Scale out 24 cores
    2 :    Scale up  12 cores
    3 :    Scale up  24 cores

DD1 doesn't use this but DD2 does. Linux will mostly use the "Scale
out 24 core" configuration (ie. SMT4 not SMT8) which results in a PVR
of 0x004e1200. The reported revision in /proc/cpuinfo is hence
reported incorrectly as "18.0".

This patch fixes this to mask off only the relevant bits for the major
revision (ie. bits 8-11) for POWER9.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:09:07 +10:00
Michael Ellerman d6bd8194e2 powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user()
Larry Finger reported that his Powerbook G4 was no longer booting with v4.12-rc,
userspace was up but giving weird errors such as:

  udevd[64]: starting version 175
  udevd[64]: Unable to receive ctrl message: Bad address.
  modprobe: chdir(4.12-rc1): No such file or directory

He bisected the problem to commit 3448890c32 ("powerpc: get rid of zeroing,
switch to RAW_COPY_USER").

Al identified that the problem is actually a miscompilation by GCC 4.6.3, which
is exposed by the above commit.

Al also pointed out that inlining copy_to/from_user() is probably of little or
no benefit, which is correct. Using Anton's copy_to_user benchmark, with a
pathological single byte copy, we see a small increase in performance
by *removing* inlining:

  Before (inlined):
  # time ./copy_to_user -w -l 1 -i 10000000	( x 3 )
  real	0m22.063s
  real	0m22.059s
  real	0m22.076s

  After:
  # time ./copy_to_user -w -l 1 -i 10000000	( x 3 )
  real	0m21.325s
  real	0m21.299s
  real	0m21.364s

So as a small performance improvement and to avoid the miscompilation, drop
inlining copy_to/from_user() on 32-bit.

Fixes: 3448890c32 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER")
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-26 23:25:08 +10:00
Ingo Molnar 1bc3cd4dfa Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-24 08:57:20 +02:00
Linus Torvalds 94a6df251d powerpc fixes for 4.12 #7
- three fixes for kprobes/ftrace/livepatch interactions.
 
  - properly handle data breakpoints when using the Radix MMU.
 
  - fix for perf sampling of registers during call_usermodehelper().
 
  - properly initialise the thread_info on our emergency stacks
 
  - add an explicit flush when doing TLB invalidations for a process
    using NPU2.
 
 Thanks to:
   Alistair Popple, Naveen N. Rao, Nicholas Piggin, Ravi Bangoria,
   Masami Hiramatsu.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZTZy4AAoJEFHr6jzI4aWA9CYQAK+BIZ2wM+QEKDWUc7bHUBfJ
 kVkFr59VS4x9w2zL2fKijy3CTNqaEXCUhmCks7PFYxGfF437YaJGVfCBVotuY9Ce
 SKTkJujUUf7b1zN+lKz8d9u6AKomE9rYBLpR0LPhDrnpiLbHtyWCeFWsmOB63k4E
 05EwIHGAlvIC/dc6bHoeJzSLT5agK2KcCVWjgVzZgkDi7sbYkE8qhPmo/cojSERo
 48+o8beAKgU3YEI8OwraxYBlUR71DKfdL7+6xvEo8kVNj5iNMq5GWY+YLvcQgR50
 3MLuGxWFZWVRfZY8rrLMajFxNXojwuWuLu/PTT0Kz2ZRgLseF+op0AH2Ezsw4pnZ
 CLp0sSKs9BqpwKuFCb1lHiEVnGfOb9CFy3u0nWmQjsE0Bj8HRC433x4fNQcJVUmJ
 ZMPXRtZaboPV9jt3UoUhtancMiXdAbTP48N7klFRuVwCOycnxW5yAFkCssFaSpsn
 EAidzBDODUXUV6/3paNVsZD7ehVJ/FMBgKSyAoJrcr+RZeFbn4b9m/NvdpdhQIwn
 iGrTMhz3YmEhxiZrStYB9aaeaaWKZxd120bnTcfFEcnMOCKUkBSICtqjGLVsBO5e
 rQV9P97h+kxf+Wh7DqhkC7br7URpYsYDZa9bCd+SAL1qrGeNZW/RP01ABRZWiSi4
 0QVvKZ7uVzyEHIVHXOoj
 =a2Ax
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 4.12. Most of these actually came in last
  week but got held up for some more testing.

   - three fixes for kprobes/ftrace/livepatch interactions.

   - properly handle data breakpoints when using the Radix MMU.

   - fix for perf sampling of registers during call_usermodehelper().

   - properly initialise the thread_info on our emergency stacks

   - add an explicit flush when doing TLB invalidations for a process
     using NPU2.

  Thanks to: Alistair Popple, Naveen N. Rao, Nicholas Piggin, Ravi
  Bangoria, Masami Hiramatsu"

* tag 'powerpc-4.12-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64: Initialise thread_info for emergency stacks
  powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD
  powerpc/perf: Fix oops when kthread execs user process
  powerpc/64s: Handle data breakpoints in Radix mode
  powerpc/kprobes: Skip livepatch_handler() for jprobes
  powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS
  powerpc/kprobes: Pause function_graph tracing during jprobes handling
2017-06-23 17:53:16 -07:00
Balbir Singh 0428491cba powerpc/mm: Trace tlbie(l) instructions
Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate
Entry (Local)) instructions.

The tlbie instruction has changed over the years, so not all versions
accept the same operands. Use the ISA v3 field operands because they are
the most verbose, we may change them in future.

Example output:

  qemu-system-ppc-5371  [016]  1412.369519: tlbie:
  	tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Add some missing trace_tlbie()s, reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-23 21:14:49 +10:00
Thiago Jung Bauermann 3e401f7a2e powerpc: Only obtain cpu_hotplug_lock if called by rtasd
Calling arch_update_cpu_topology from a CPU hotplug state machine callback
hits a deadlock because the function tries to get a read lock on
cpu_hotplug_lock while the state machine still holds a write lock on it.

Since all callers of arch_update_cpu_topology except rtasd already hold
cpu_hotplug_lock, this patch changes the function to use
stop_machine_cpuslocked and creates a separate function for rtasd which
still tries to obtain the lock.

Michael Bringmann investigated the bug and provided a detailed analysis
of the deadlock on this previous RFC for an alternate solution:

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michael Bringmann <mwb@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1497996510-4032-1-git-send-email-bauerman@linux.vnet.ibm.com
Link: https://patchwork.ozlabs.org/patch/771293/
2017-06-23 09:32:11 +02:00
Nicholas Piggin 34f19ff1b5 powerpc/64: Initialise thread_info for emergency stacks
Emergency stacks have their thread_info mostly uninitialised, which in
particular means garbage preempt_count values.

Emergency stack code runs with interrupts disabled entirely, and is
used very rarely, so this has been unnoticed so far. It was found by a
proposed new powerpc watchdog that takes a soft-NMI directly from the
masked_interrupt handler and using the emergency stack. That crashed
at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be
garbage.

To fix this, zero the entire THREAD_SIZE allocation, and initialize
the thread_info.

Cc: stable@vger.kernel.org
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move it all into setup_64.c, use a function not a macro. Fix
      crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-23 13:25:38 +10:00
Alistair Popple bbd5ff50af powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD
NPU2 requires an extra explicit flush to an active GPU PID when
sending address translation shoot downs (ATSDs) to reliably flush the
GPU TLB. This patch adds just such a flush at the end of each sequence
of ATSDs.

We can safely use PID 0 which is always reserved and active on the
GPU. PID 0 is only used for init_mm which will never be a user mm on
the GPU. To enforce this we add a check in pnv_npu2_init_context()
just in case someone tries to use PID 0 on the GPU.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Use true/false for bool literals]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-22 21:21:08 +10:00
Ingo Molnar a4eb8b9935 Merge branch 'linus' into x86/mm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:57:28 +02:00
Paul Mackerras d4cfb11387 powerpc: Convert VDSO update function to use new update_vsyscall interface
This converts the powerpc VDSO time update function to use the new
interface introduced in commit 576094b7f0 ("time: Introduce new
GENERIC_TIME_VSYSCALL", 2012-09-11).  Where the old interface gave
us the time as of the last update in seconds and whole nanoseconds,
with the new interface we get the nanoseconds part effectively in
a binary fixed-point format with tk->tkr_mono.shift bits to the
right of the binary point.

With the old interface, the fractional nanoseconds got truncated,
meaning that the value returned by the VDSO clock_gettime function
would have about 1ns of jitter in it compared to the value computed
by the generic timekeeping code in the kernel.

The powerpc VDSO time functions (clock_gettime and gettimeofday)
already work in units of 2^-32 seconds, or 0.23283 ns, because that
makes it simple to split the result into seconds and fractional
seconds, and represent the fractional seconds in either microseconds
or nanoseconds.  This is good enough accuracy for now, so this patch
avoids changing how the VDSO works or the interface in the VDSO data
page.

This patch converts the powerpc update_vsyscall_old to be called
update_vsyscall and use the new interface.  We convert the fractional
second to units of 2^-32 seconds without truncating to whole nanoseconds.
(There is still a conversion to whole nanoseconds for any legacy users
of the vdso_data/systemcfg stamp_xtime field.)

In addition, this improves the accuracy of the computation of tb_to_xs
for those systems with high-frequency timebase clocks (>= 268.5 MHz)
by doing the right shift in two parts, one before the multiplication and
one after, rather than doing the right shift before the multiplication.
(We can't do all of the right shift after the multiplication unless we
use 128-bit arithmetic.)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-22 16:26:23 +10:00
Paul Mackerras 2ed4f9dd19 KVM: PPC: Book3S HV: Add capability to report possible virtual SMT modes
Now that userspace can set the virtual SMT mode by enabling the
KVM_CAP_PPC_SMT capability, it is useful for userspace to be able
to query the set of possible virtual SMT modes.  This provides a
new capability, KVM_CAP_PPC_SMT_POSSIBLE, to provide this
information.  The return value is a bitmap of possible modes, with
bit N set if virtual SMT mode 2^N is available.  That is, 1 indicates
SMT1 is available, 2 indicates that SMT2 is available, 3 indicates
that both SMT1 and SMT2 are available, and so on.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-22 11:25:31 +10:00
Aravinda Prasad e20bbd3d8d KVM: PPC: Book3S HV: Exit guest upon MCE when FWNMI capability is enabled
Enhance KVM to cause a guest exit with KVM_EXIT_NMI
exit reason upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering a 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivery of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Note:

This patch now directly invokes machine_check_print_event_info()
from kvmppc_handle_exit_hv() to print the event to host console
at the time of guest exit before the exception is passed on to the
guest. Hence, the host-side handling which was performed earlier
via machine_check_fwnmi is removed.

The reasons for this approach is (i) it is not possible
to distinguish whether the exception occurred in the
guest or the host from the pt_regs passed on the
machine_check_exception(). Hence machine_check_exception()
calls panic, instead of passing on the exception to
the guest, if the machine check exception is not
recoverable. (ii) the approach introduced in this
patch gives opportunity to the host kernel to perform
actions in virtual mode before passing on the exception
to the guest. This approach does not require complex
tweaks to machine_check_fwnmi and friends.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-22 11:24:57 +10:00
David S. Miller 3d09198243 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21 17:35:22 -04:00
Santosh Sivaraj 6b847d795c powerpc/time: Fix tracing in time.c
Since trace_clock is in a different file and already marked with notrace,
enable tracing in time.c by removing it from the disabled list in Makefile.
Also annotate clocksource read functions and sched_clock with notrace.

Testing: Timer and ftrace selftests run with different trace clocks.

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-21 20:37:27 +10:00
Michael Ellerman fd88b945c1 powerpc/64s: Rename slb_allocate_realmode() to slb_allocate()
As for slb_miss_realmode(), rename slb_allocate_realmode() to avoid
confusion over whether it runs in real or virtual mode - it runs in
both.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2017-06-21 16:18:33 +10:00
Michael Ellerman 442b6e8e03 powerpc/64s: Rename slb_miss_realmode() to slb_miss_common()
slb_miss_realmode() doesn't always runs in real mode, which is what the
name implies. So rename it to avoid confusing people.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2017-06-21 16:18:29 +10:00
Michael Ellerman b102063b47 powerpc/64s: Use BRANCH_TO_COMMON() for slb_miss_realmode
All the callers of slb_miss_realmode currently open code the #ifndef
CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case.
We have a macro to do this, BRANCH_TO_COMMON(), so use it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2017-06-21 16:18:24 +10:00
Mahesh Salgaonkar 8aa586c688 powerpc/book3s: EXPORT_SYMBOL_GPL machine_check_print_event_info
It will be used in arch/powerpc/kvm/book3s_hv.c KVM module.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-21 13:52:05 +10:00
Aravinda Prasad 134764ed6e KVM: PPC: Book3S HV: Add new capability to control MCE behaviour
This introduces a new KVM capability to control how KVM behaves
on machine check exception (MCE) in HV KVM guests.

If this capability has not been enabled, KVM redirects machine check
exceptions to guest's 0x200 vector, if the address in error belongs to
the guest. With this capability enabled, KVM will cause a guest exit
with the exit reason indicating an NMI.

The new capability is required to avoid problems if a new kernel/KVM
is used with an old QEMU, running a guest that doesn't issue
"ibm,nmi-register".  As old QEMU does not understand the NMI exit
type, it treats it as a fatal error.  However, the guest could have
handled the machine check error if the exception was delivered to
guest's 0x200 interrupt vector instead of NMI exit in case of old
QEMU.

[paulus@ozlabs.org - Reworded the commit message to be clearer,
 enable only on HV KVM.]

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-21 13:37:08 +10:00
Radim Krčmář c72544d85f Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* fix problems that could cause hangs or crashes in the host on POWER9
* fix problems that could allow guests to potentially affect or disrupt
  the execution of the controlling userspace
2017-06-20 14:32:57 +02:00
Nicholas Piggin 8568f1e026 powerpc/64s/paca: EX_CTR is not used with RELOCATABLE=n, remove it
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:02 +10:00
Nicholas Piggin 635942ae53 powerpc/64s/paca: EX_R3 can be merged with EX_DAR
EX_R3 is used only for a small section of the bad stack handler.
Merge it with EX_DAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:01 +10:00
Nicholas Piggin dbeea1d6b4 powerpc/64s/paca: EX_LR can be merged with EX_DAR
EX_LR is used only for a small section of the SLB miss handler.
Merge it with EX_DAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:01 +10:00
Nicholas Piggin 36670fcf01 powerpc/64s/paca: EX_SRR0 is unused, remove it
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:00 +10:00
Nicholas Piggin 8c38851415 powerpc/64s: Add EX_SIZE definition for paca exception save areas
Rather than open-coding it 4 times.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move __ASSEMBLY__ guards into head-64.h where they're really needed]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:00 +10:00
Nicholas Piggin 4d7cd3b956 powerpc/64s: Avoid r3 save/restore in SLB miss handler
The SLB miss handler uses r3 for the faulting address but r12 is
mostly able to be freed up to save r3 in. It just requires SRR1
be reloaded again on error.

It would be more conventional to use r12 for SRR1 (and use r11 to
save r3), but slb_allocate_realmode clobbers r11 and not r12.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:21:59 +10:00
Nicholas Piggin fe5482c043 powerpc/64s: SLB miss already has CTR saved for relocatable kernel
The EXCEPTION_PROLOG_1 used by SLB miss already saves CTR when the
kernel is built with CONFIG_RELOCATABLE. So it does not have to be
saved and reloaded when branching to slb_miss_realmode. It can be
restored from the PACA as usual.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:21:58 +10:00
Nicholas Piggin 7c28f04828 powerpc/64s: Avoid saving faulting address into EX_DAR in SLB miss
The EX_DAR save area is only used in exceptional cases. With r3 no
longer clobbered by slb_allocate_realmode, saving faulting address to
EX_DAR can be deferred to those cases.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:21:50 +10:00
Nicholas Piggin d59afffdf0 powerpc/64s: Preserve r3 in slb_allocate_realmode()
One fewer registers clobbered by this function means the SLB miss
handler can save one fewer.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:18:25 +10:00
Ingo Molnar 902b319413 Merge branch 'WIP.sched/core' into sched/core
Conflicts:
	kernel/sched/Makefile

Pick up the waitqueue related renames - it didn't get much feedback,
so it appears to be uncontroversial. Famous last words? ;-)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:28:21 +02:00
Paul Mackerras ee3308a254 KVM: PPC: Book3S HV: Don't sleep if XIVE interrupt pending on POWER9
On a POWER9 system, it is possible for an interrupt to become pending
for a VCPU when that VCPU is about to cede (execute a H_CEDE hypercall)
and has already disabled interrupts, or in the H_CEDE processing up
to the point where the XIVE context is pulled from the hardware.  In
such a case, the H_CEDE should not sleep, but should return immediately
to the guest.  However, the conditions tested in kvmppc_vcpu_woken()
don't include the condition that a XIVE interrupt is pending, so the
VCPU could sleep until the next decrementer interrupt.

To fix this, we add a new xive_interrupt_pending() helper which looks
in the XIVE context that was pulled from the hardware to see if the
priority of any pending interrupt is higher (numerically lower than)
the CPU priority.  If so then kvmppc_vcpu_woken() will return true.
If the XIVE context has never been used, then both the pipr and the
cppr fields will be zero and the test will indicate that no interrupt
is pending.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-20 15:46:12 +10:00
Hugh Dickins 1be7107fbe mm: larger stack guard gap, between vmas
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-19 21:50:20 +08:00
Nicholas Piggin 40d24343a8 powerpc/64s/idle: Run latch switch is done with MSR[EE]=0
In the idle sleep/wake code we know that MSR[EE] is clear, so we can
avoid 2 x mfmsr and 2 x mtmsr by calling the double-underscore
versions of the run latch routines which assume interrupts are already
disabled.

Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:30 +10:00
Nicholas Piggin 95acdc0712 powerpc/64s/idle: Predict HMI wakeup as unlikely
In a busy system, idle wakeups can be expected from IPIs and device
interrupts.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:29 +10:00
Nicholas Piggin 9d29250136 powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths
Idle code now always runs at the 0xc... effective address whether
in real or virtual mode. This means rfid can be ditched, along
with a lot of SRR manipulations.

In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
MSR states as required.

This also balances the return prediction for the idle call, by
doing blr rather than rfid to return to the idle caller.

On POWER9, 2-process context switch on different cores, with snooze
disabled, increases performance by 2%.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Incorporate v2 fixes from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:29 +10:00
Nicholas Piggin b51351e264 powerpc/64s/idle: Branch to handler with virtual mode offset
Have the system reset idle wakeup handlers branched to in real mode
with the 0xc... kernel address applied. This allows simplifications of
avoiding rfid when switching to virtual mode in the wakeup handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:28 +10:00
Nicholas Piggin b48bbb82e2 powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()
The __replay_interrupt() code is branched to with bl, but the caller is
returned to directly with rfid from the interrupt.

Instead, rfid to a stub that returns to the caller with blr, which
should keep the return branch predictor balanced.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:28 +10:00
Nicholas Piggin a9af97aa0a powerpc/64s: msgclr when handling doorbell exceptions from system reset
msgsnd doorbell exceptions are cleared when the doorbell interrupt is
taken. However if a doorbell exception causes a system reset interrupt
wake from power saving state, the message is not cleared. Processing
the doorbell from the system reset interrupt requires msgclr to avoid
taking the exception again.

Testing this plus the previous wakup direct patch gives:

                                original         wakeup direct     msgclr
Different threads, same core:   315k/s           264k/s            345k/s
Different cores:                235k/s           242k/s            242k/s

Net speedup is +10% for same core, and +3% for different core.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:27 +10:00
Nicholas Piggin 771d4304d0 powerpc/64s/idle: Process interrupts from system reset wakeup
When the CPU wakes from low power state, it begins at the system reset
interrupt with the exception that caused the wakeup encoded in SRR1.

Today, powernv idle wakeup ignores the wakeup reason (except a special
case for HMI), and the regular interrupt corresponding to the
exception will fire after the idle wakeup exits.

Change this to replay the interrupt from the idle wakeup before
interrupts are hard-enabled.

Test on POWER8 of context_switch selftests benchmark with polling idle
disabled (e.g., always nap, giving cross-CPU IPIs) gives the following
results:

                                original         wakeup direct
Different threads, same core:   315k/s           264k/s
Different cores:                235k/s           242k/s

There is a slowdown for doorbell IPI (same core) case because system
reset wakeup does not clear the message and the doorbell interrupt
fires again needlessly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:27 +10:00
Nicholas Piggin 2525db04d1 powerpc/powernv: Simplify lazy IRQ handling in CPU offline
Rather than concern ourselves with any soft-mask logic in the CPU
hotplug handler, just hard disable interrupts. This ensures there
are no lazy-irqs pending, which means we can call directly to idle
instruction in order to sleep.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:26 +10:00
Nicholas Piggin 2201f994a5 powerpc/64s/idle: Move soft interrupt mask logic into C code
This simplifies the asm and fixes irq-off tracing over sleep
instructions.

Also move powersave_nap check for POWER8 into C code, and move
PSSCR register value calculation for POWER9 into C.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:26 +10:00
Paul Mackerras 579006944e KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9
On POWER9, we no longer have the restriction that we had on POWER8
where all threads in a core have to be in the same partition, so
the CPU threads are now independent.  However, we still want to be
able to run guests with a virtual SMT topology, if only to allow
migration of guests from POWER8 systems to POWER9.

A guest that has a virtual SMT mode greater than 1 will expect to
be able to use the doorbell facility; it will expect the msgsndp
and msgclrp instructions to work appropriately and to be able to read
sensible values from the TIR (thread identification register) and
DPDES (directed privileged doorbell exception status) special-purpose
registers.  However, since each CPU thread is a separate sub-processor
in POWER9, these instructions and registers can only be used within
a single CPU thread.

In order for these instructions to appear to act correctly according
to the guest's virtual SMT mode, we have to trap and emulate them.
We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR
register.  The emulation is triggered by the hypervisor facility
unavailable interrupt that occurs when the guest uses them.

To cause a doorbell interrupt to occur within the guest, we set the
DPDES register to 1.  If the guest has interrupts enabled, the CPU
will generate a doorbell interrupt and clear the DPDES register in
hardware.  The DPDES hardware register for the guest is saved in the
vcpu->arch.vcore->dpdes field.  Since this gets written by the guest
exit code, other VCPUs wishing to cause a doorbell interrupt don't
write that field directly, but instead set a vcpu->arch.doorbell_request
flag.  This is consumed and set to 0 by the guest entry code, which
then sets DPDES to 1.

Emulating reads of the DPDES register is somewhat involved, because
it requires reading the doorbell pending interrupt status of all of the
VCPU threads in the virtual core, and if any of those VCPUs are
running, their doorbell status is only up-to-date in the hardware
DPDES registers of the CPUs where they are running.  In order to get
a reasonable approximation of the current doorbell status, we send
those CPUs an IPI, causing an exit from the guest which will update
the vcpu->arch.vcore->dpdes field.  We then use that value in
constructing the emulated DPDES register value.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-19 14:34:37 +10:00
Paul Mackerras 3c31352460 KVM: PPC: Book3S HV: Allow userspace to set the desired SMT mode
This allows userspace to set the desired virtual SMT (simultaneous
multithreading) mode for a VM, that is, the number of VCPUs that
get assigned to each virtual core.  Previously, the virtual SMT mode
was fixed to the number of threads per subcore, and if userspace
wanted to have fewer vcpus per vcore, then it would achieve that by
using a sparse CPU numbering.  This had the disadvantage that the
vcpu numbers can get quite large, particularly for SMT1 guests on
a POWER8 with 8 threads per core.  With this patch, userspace can
set its desired virtual SMT mode and then use contiguous vcpu
numbering.

On POWER8, where the threading mode is "strict", the virtual SMT mode
must be less than or equal to the number of threads per subcore.  On
POWER9, which implements a "loose" threading mode, the virtual SMT
mode can be any power of 2 between 1 and 8, even though there is
effectively one thread per subcore, since the threads are independent
and can all be in different partitions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-19 14:34:20 +10:00
Paul Mackerras 769377f77c KVM: PPC: Book3S HV: Context-switch HFSCR between host and guest on POWER9
This adds code to allow us to use a different value for the HFSCR
(Hypervisor Facilities Status and Control Register) when running the
guest from that which applies in the host.  The reason for doing this
is to allow us to trap the msgsndp instruction and related operations
in future so that they can be virtualized.  We also save the value of
HFSCR when a hypervisor facility unavailable interrupt occurs, because
the high byte of HFSCR indicates which facility the guest attempted to
access.

We save and restore the host value on guest entry/exit because some
bits of it affect host userspace execution.

We only do all this on POWER9, not on POWER8, because we are not
intending to virtualize any of the facilities controlled by HFSCR on
POWER8.  In particular, the HFSCR bit that controls execution of
msgsndp and related operations does not exist on POWER8.  The HFSCR
doesn't exist at all on POWER7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-19 14:08:02 +10:00
Paul Mackerras 1da4e2f4fb KVM: PPC: Book3S HV: Don't let VCPU sleep if it has a doorbell pending
It is possible, through a narrow race condition, for a VCPU to exit
the guest with a H_CEDE hypercall while it has a doorbell interrupt
pending.  In this case, the H_CEDE should return immediately, but in
fact it puts the VCPU to sleep until some other interrupt becomes
pending or a prod is received (via another VCPU doing H_PROD).

This fixes it by checking the DPDES (Directed Privileged Doorbell
Exception Status) bit for the thread along with the other interrupt
pending bits.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-19 14:05:22 +10:00
Paul Mackerras 1bc3fe818c KVM: PPC: Book3S HV: Enable guests to use large decrementer mode on POWER9
This allows userspace (e.g. QEMU) to enable large decrementer mode for
the guest when running on a POWER9 host, by setting the LPCR_LD bit in
the guest LPCR value.  With this, the guest exit code saves 64 bits of
the guest DEC value on exit.  Other places that use the guest DEC
value check the LPCR_LD bit in the guest LPCR value, and if it is set,
omit the 32-bit sign extension that would otherwise be done.

This doesn't change the DEC emulation used by PR KVM because PR KVM
is not supported on POWER9 yet.

This is partly based on an earlier patch by Oliver O'Halloran.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-19 14:02:04 +10:00
Linus Torvalds 5ac447d268 powerpc fixes for 4.12 #6
Three small fixes for recently merged code:
 
  - remove a spurious WARN_ON when a PCI device has no of_node, it's allowed in
    some circumstances for there to be no of_node.
 
  - fix the offset for store EOI MMIOs in the XIVE interrupt controller.
 
  - fix non-const WARN_ONs which were becoming BUGs due to them losing
    BUGFLAG_WARNING in a recent cleanup patch.
 
 Thanks to:
   Alexey Kardashevskiy, Alistair Popple, Benjamin Herrenschmidt.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZQ7XeAAoJEFHr6jzI4aWA2bgP/R++c9YehNdDKbZyLqumY+6q
 7ns8NoxgEW/Gc8JuSTE4MW51q2HimBJu6ntyHfMUwgpGQpGzOGDn6g8OxVfjOySa
 kFc7cytgOOhEpTHENDZ3xxZtcSd9iafyX9ga/0dz6UycfEHcZayiXDRuXffRzJwa
 RNqbwDxOtkgn6w4bW02SRlfDSTra+zQZQd6NsPXSJJgF+tb3MflMj1A9WoJp/mj/
 tXc9fpKQsZkIG/AvAHziizHqeAKJxUrmoVb8qy1SYTKVUDZoxTYgiO1G1nebZX/s
 Zzsdd/fcHcd0DIEJkjf2V3cegmIGTLzw7mUOodU7IF3mZ1LPgCMVF5lTTZzjcXDQ
 d1gugVojHnGr7KB3lNNijyHxsmHG7LdTQmRHcyZ2L8KYpa3/+Ca3ZuFnTwjvgRNx
 dJEFX5JdAhCrkg1B73rvcjKCFg0ysVIrkdf27SaameaQdQQuZU4+5+s1LB2EqJQr
 II3+pnZr/RF3OWu4yJE5KAHX5ZBQQ+unzVPpW4pqvwYMoVKhO7dhCPPISeRCtzJE
 +po5Ys4ncheSRhwf5dQhf+H04kXmL6ekpl1GJOBB3BskJcsIr8hiLp3/mF238et1
 80o6yTAJLADKUIl75ISiePz+KFZNamgke1/XWZolfHYZ9dNRF0c//E0qvpopz8jE
 F90hxEAtJ9ws/VUlo40Q
 =Mnxp
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Three small fixes for recently merged code:

   - remove a spurious WARN_ON when a PCI device has no of_node, it's
     allowed in some circumstances for there to be no of_node.

   - fix the offset for store EOI MMIOs in the XIVE interrupt
     controller.

   - fix non-const WARN_ONs which were becoming BUGs due to them losing
     BUGFLAG_WARNING in a recent cleanup patch.

  Thanks to: Alexey Kardashevskiy, Alistair Popple, Benjamin
  Herrenschmidt"

* tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path
  powerpc/xive: Fix offset for store EOI MMIOs
  powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node
2017-06-17 05:57:54 +09:00
Ravi Bangoria bf05fc25f2 powerpc/perf: Fix oops when kthread execs user process
When a kthread calls call_usermodehelper() the steps are:
  1. allocate current->mm
  2. load_elf_binary()
  3. populate current->thread.regs

While doing this, interrupts are not disabled. If there is a perf
interrupt in the middle of this process (i.e. step 1 has completed
but not yet reached to step 3) and if perf tries to read userspace
regs, kernel oops with following log:

  Unable to handle kernel paging request for data at address 0x00000000
  Faulting instruction address: 0xc0000000000da0fc
  ...
  Call Trace:
  perf_output_sample_regs+0x6c/0xd0
  perf_output_sample+0x4e4/0x830
  perf_event_output_forward+0x64/0x90
  __perf_event_overflow+0x8c/0x1e0
  record_and_restart+0x220/0x5c0
  perf_event_interrupt+0x2d8/0x4d0
  performance_monitor_exception+0x54/0x70
  performance_monitor_common+0x158/0x160
  --- interrupt: f01 at avtab_search_node+0x150/0x1a0
      LR = avtab_search_node+0x100/0x1a0
  ...
  load_elf_binary+0x6e8/0x15a0
  search_binary_handler+0xe8/0x290
  do_execveat_common.isra.14+0x5f4/0x840
  call_usermodehelper_exec_async+0x170/0x210
  ret_from_kernel_thread+0x5c/0x7c

Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
pt_regs are not set.

Fixes: ed4a4ef85c ("powerpc/perf: Add support for sampling interrupt register state")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 21:02:46 +10:00
Naveen N. Rao d89ba5353f powerpc/64s: Handle data breakpoints in Radix mode
On Power9, trying to use data breakpoints throws the splat shown
below. This is because the check for a data breakpoint in DSISR is in
do_hash_page(), which is not called when in Radix mode.

  Unable to handle kernel paging request for data at address 0xc000000000e19218
  Faulting instruction address: 0xc0000000001155e8
  cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20]
  pc: c0000000001155e8: find_pid_ns+0x48/0xe0
  lr: c000000000116ac4: find_task_by_vpid+0x44/0x90
  sp: c0000000ef1e7da0
  msr: 9000000000009033
  dar: c000000000e19218
  dsisr: 400000

Move the check to handle_page_fault() so as to catch data breakpoints
in both Hash and Radix MMU modes.

We have to change the check in do_hash_page() against 0xa410 to use
0xa450, so as to include the value of (DSISR_DABRMATCH << 16).

There are two sites that call handle_page_fault() when in Radix, both
already pass DSISR in r4.

Fixes: caca285e5a ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Cc: stable@vger.kernel.org # v4.7+
Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Fix the fall-through case on hash, we need to reload DSISR]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 19:49:43 +10:00
Naveen N. Rao c05b8c4474 powerpc/kprobes: Skip livepatch_handler() for jprobes
ftrace_caller() depends on a modified regs->nip to detect if a certain
function has been livepatched. However, with KPROBES_ON_FTRACE, it is
possible for regs->nip to have been modified by the kprobes pre_handler
(jprobes, for instance). In this case, we do not want to invoke the
livepatch_handler so as not to consume the livepatch stack.

To distinguish between the two (kprobes and livepatch), we check if
there is an active kprobe on the current function. If there is, then we
know for sure that it must have modified the NIP as we don't support
livepatching a kprobe'd function. In this case, we simply skip the
livepatch_handler and branch to the new NIP. Otherwise, the
livepatch_handler is invoked.

Fixes: ead514d5fb ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 19:49:43 +10:00
Naveen N. Rao a4979a7e71 powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS
For DYNAMIC_FTRACE_WITH_REGS, we should be passing-in the original set
of registers in pt_regs, to capture the state _before_ ftrace_caller.
However, we are instead passing the stack pointer *after* allocating a
stack frame in ftrace_caller. Fix this by saving the proper value of r1
in pt_regs. Also, use SAVE_10GPRS() to simplify the code.

Fixes: 153086644f ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 19:49:43 +10:00
Naveen N. Rao a9f8553e93 powerpc/kprobes: Pause function_graph tracing during jprobes handling
This fixes a crash when function_graph and jprobes are used together.
This is essentially commit 237d28db03 ("ftrace/jprobes/x86: Fix
conflict between jprobes and function graph tracing"), but for powerpc.

Jprobes breaks function_graph tracing since the jprobe hook needs to use
jprobe_return(), which never returns back to the hook, but instead to
the original jprobe'd function. The solution is to momentarily pause
function_graph tracing before invoking the jprobe hook and re-enable it
when returning back to the original jprobe'd function.

Fixes: 6794c78243 ("powerpc64: port of the function graph tracer")
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 19:49:43 +10:00
Alexey Kardashevskiy a093c92dc7 powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path
When trapped on WARN_ON(), report_bug() is expected to return
BUG_TRAP_TYPE_WARN so the caller will increment NIP by 4 and continue.
The __builtin_constant_p() path of the PPC's WARN_ON()
calls (indirectly) __WARN_FLAGS() which has BUGFLAG_WARNING set,
however the other branch does not which makes report_bug() report a
bug rather than a warning.

Fixes: f26dee1510 ("debug: Avoid setting BUGFLAG_WARNING twice")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 16:10:37 +10:00
Paul Mackerras 3d3efb68c1 KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1
POWER9 DD1 has an erratum where writing to the TBU40 register, which
is used to apply an offset to the timebase, can cause the timebase to
lose counts.  This results in the timebase on some CPUs getting out of
sync with other CPUs, which then results in misbehaviour of the
timekeeping code.

To work around the problem, we make KVM ignore the timebase offset for
all guests on POWER9 DD1 machines.  This means that live migration
cannot be supported on POWER9 DD1 machines.

Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-16 16:04:57 +10:00
Paul Mackerras 7ceaa6dcd8 KVM: PPC: Book3S HV: Save/restore host values of debug registers
At present, HV KVM on POWER8 and POWER9 machines loses any instruction
or data breakpoint set in the host whenever a guest is run.
Instruction breakpoints are currently only used by xmon, but ptrace
and the perf_event subsystem can set data breakpoints as well as xmon.

To fix this, we save the host values of the debug registers (CIABR,
DAWR and DAWRX) before entering the guest and restore them on exit.
To provide space to save them in the stack frame, we expand the stack
frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-16 11:53:19 +10:00
David S. Miller 0ddead90b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 11:59:32 -04:00
Benjamin Herrenschmidt 25642705b2 powerpc/xive: Fix offset for store EOI MMIOs
Architecturally we should apply a 0x400 offset for these. Not doing
it will break future HW implementations.

The offset of 0 is supposed to remain for "triggers" though not all
sources support both trigger and store EOI, and in P9 specifically,
some sources will treat 0 as a store EOI. But future chips will not.
So this makes us use the properly architected offset which should work
always.

Fixes: 243e25112d ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 23:29:39 +10:00
Nicholas Piggin 07d2a628bc powerpc/64s: Avoid cpabort in context switch when possible
The ISA v3.0B copy-paste facility only requires cpabort when switching
to a process that has foreign real addresses mapped (direct access to
accelerators), to clear a potential copy buffer filled by a previous
thread. There is no accelerator driver implemented yet, so cpabort can
be removed. It can be be re-added when a driver is implemented.

POWER9 DD1 requires the copy buffer to always be cleared on context
switch, but if accelerators are not in use, then an unpaired copy from
a dummy region is sufficient to clear data out of the copy buffer.

This increases context switch performance by about 5% on POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin 9145effd62 powerpc/64: Drop explicit hwsync in context switch
The sync (aka. hwsync, aka. heavyweight sync) in the context switch
code to prevent MMIO access being reordered from the point of view of
a single process if it gets migrated to a different CPU is not
required because there is an hwsync performed earlier in the context
switch path.

Comment this so it's clear enough if anything changes on the scheduler
or the powerpc sides. Remove the hwsync from _switch.

This improves context switch performance by 2-3% on POWER8.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin 837e72f78a powerpc/64: Drop reservation-clearing ldarx in context switch
There is no need to explicitly break the reservation in _switch,
because we are guaranteed that the context switch path will include a
larx/stcx.

Comment the guarantee and remove the reservation clear from _switch.

This is worth 1-2% in context switch performance.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin e4c0fc5f72 powerpc/64s: Leave interrupts hard enabled in context switch for radix
Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug")
hard disabled interrupts over the low level context switch, because
the SLB management can't cope with a PMU interrupt accesing the stack
in that window.

Radix based kernel mapping does not use the SLB so it does not require
interrupts hard disabled here.

This is worth 1-2% in context switch performance on POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin bc4f65e4cf powerpc/64: Avoid restore_math call if possible in syscall exit
The syscall exit code that branches to restore_math is quite heavy on
Book3S, consisting of 2 mtmsr instructions. Threads that don't use both
FP and vector can get caught here if the kernel ever uses FP or vector.
Lazy-FP/vec context switching also trips this case.

So check for lazy FP and vector before switching RI for restore_math.
Move most of this case out of line.

For threads that do want to restore math registers, the MSR switches are
still suboptimal. Future direction may be to use a soft-RI bit to avoid
MSR switches in kernel (similar to soft-EE), but for now at least the
no-restore

POWER9 context switch rate increases by about 5% due to sched_yield(2)
return performance. I haven't constructed a test to measure the syscall
cost.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin acd7d8cef0 powerpc/64s: Optimize hypercall/syscall entry
After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
guest to host"), a getppid() system call goes from 307 cycles to 358
cycles (+17%) on POWER8. This is due significantly to the scratch SPR
used by the hypercall check.

It turns out there are a some volatile registers common to both system
call and hypercall (in particular, r12, cr0, ctr), which can be used to
avoid the SPR and some other overheads. This brings getppid to 320 cycles
(+4%).

Testing hcall entry performance by running "sc 1" in guest userspace
before this patch is 854 cycles, afterwards is 826. Also a small win
there.

POWER9 syscall is improved by about the same amount, hcall not tested.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Michael Ellerman 9abcc981de powerpc/mm/radix: Only add X for pages overlapping kernel text
Currently we map the whole linear mapping with PAGE_KERNEL_X. Instead we
should check if the page overlaps the kernel text and only then add
PAGE_KERNEL_X.

Note that we still use 1G pages if they're available, so this will
typically still result in a 1G executable page at KERNELBASE. So this fix is
primarily useful for catching stray branches to high linear mapping addresses.

Without this patch, we can execute at 1G in xmon using:

  0:mon> m c000000040000000
  c000000040000000  00 l
  c000000040000000  00000000 01006038
  c000000040000004  00000000 2000804e
  c000000040000008  00000000 x
  0:mon> di c000000040000000
  c000000040000000  38600001      li      r3,1
  c000000040000004  4e800020      blr
  0:mon> p c000000040000000
  return value is 0x1

After we get a 400 as expected:

  0:mon> p c000000040000000
  *** 400 exception occurred

Fixes: 2bfd65e45e ("powerpc/mm/radix: Add radix callbacks for early init routines")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
2017-06-15 16:34:39 +10:00
Michael Ellerman 0edc2ca9cc Revert "powerpc: Handle simultaneous interrupts at once"
This reverts commit 45cb08f479.

For some reason this is causing IRQ problems on Freescale Book3E
machines, eg on my p5020ds:

  irq 25: nobody cared (try booting with the "irqpoll" option)
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc3-gcc-6.3.1-00037-g45cb08f4791c #624
  Call Trace:
  [c0000000fffdbb10] [c00000000049962c] .dump_stack+0xa8/0xe8 (unreliable)
  [c0000000fffdbba0] [c0000000000babf4] .__report_bad_irq+0x54/0x140
  [c0000000fffdbc40] [c0000000000bb11c] .note_interrupt+0x324/0x380
  [c0000000fffdbd00] [c0000000000b7110] .handle_irq_event_percpu+0x68/0x88
  [c0000000fffdbd90] [c0000000000b718c] .handle_irq_event+0x5c/0xa8
  [c0000000fffdbe10] [c0000000000bc01c] .handle_fasteoi_irq+0xe4/0x298
  [c0000000fffdbe90] [c0000000000b59c4] .generic_handle_irq+0x50/0x74
  [c0000000fffdbf10] [c0000000000075d8] .__do_irq+0x74/0x1f0
  [c0000000fffdbf90] [c0000000000189f8] .call_do_irq+0x14/0x24
  [c0000000f7173060] [c0000000000077e4] .do_IRQ+0x90/0x120
  [c0000000f7173100] [c00000000001d93c] exc_0x500_common+0xfc/0x100
  --- interrupt: 501 at .prepare_to_wait_event+0xc/0x14c
      LR = .fsl_elbc_run_command+0xc8/0x23c
  [c0000000f71734d0] [c00000000065f418] .nand_reset+0xb8/0x168
  [c0000000f7173560] [c00000000065fec4] .nand_scan_ident+0x2b0/0x1638
  [c0000000f7173650] [c000000000666cd8] .fsl_elbc_nand_probe+0x34c/0x5f0
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  [c0000000f7173750] [c0000000005a3c60] .platform_drv_probe+0x64/0xb0
  [c0000000f71737d0] [c0000000005a12e0] .really_probe+0x290/0x334
  [c0000000f7173870] [c0000000005a14a0] .__driver_attach+0x11c/0x120
  [c0000000f7173900] [c00000000059e6a0] .bus_for_each_dev+0x98/0xfc
  [c0000000f71739a0] [c0000000005a0b3c] .driver_attach+0x34/0x4c
  [c0000000f7173a20] [c0000000005a04b0] .bus_add_driver+0x1ac/0x2e0
  [c0000000f7173ac0] [c0000000005a2170] .driver_register+0x94/0x160
  [c0000000f7173b40] [c0000000005a3be0] .__platform_driver_register+0x60/0x7c
  [c0000000f7173bc0] [c000000000d6aab4] .fsl_elbc_nand_driver_init+0x24/0x38
  [c0000000f7173c30] [c000000000001934] .do_one_initcall+0x68/0x1b8
  [c0000000f7173d00] [c000000000d210f8] .kernel_init_freeable+0x260/0x338
  [c0000000f7173db0] [c0000000000021b0] .kernel_init+0x20/0xe70
  [c0000000f7173e30] [c0000000000009bc] .ret_from_kernel_thread+0x58/0x9c
  handlers:
  [<c000000000ed85c8>] .fsl_lbc_ctrl_irq
  Disabling IRQ #25

Ben also had concerns with the implementation being potentially slow on
some PICs, so revert it for now.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:20:46 +10:00
Paul Mackerras 46a704f840 KVM: PPC: Book3S HV: Preserve userspace HTM state properly
If userspace attempts to call the KVM_RUN ioctl when it has hardware
transactional memory (HTM) enabled, the values that it has put in the
HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
guest values.  To fix this, we detect this condition and save those
SPR values in the thread struct, and disable HTM for the task.  If
userspace goes to access those SPRs or the HTM facility in future,
a TM-unavailable interrupt will occur and the handler will reload
those SPRs and re-enable HTM.

If userspace has started a transaction and suspended it, we would
currently lose the transactional state in the guest entry path and
would almost certainly get a "TM Bad Thing" interrupt, which would
cause the host to crash.  To avoid this, we detect this case and
return from the KVM_RUN ioctl with an EINVAL error, with the KVM
exit reason set to KVM_EXIT_FAIL_ENTRY.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-15 16:18:17 +10:00
Paul Mackerras 4c3bb4ccd0 KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
This restores several special-purpose registers (SPRs) to sane values
on guest exit that were missed before.

TAR and VRSAVE are readable and writable by userspace, and we need to
save and restore them to prevent the guest from potentially affecting
userspace execution (not that TAR or VRSAVE are used by any known
program that run uses the KVM_RUN ioctl).  We save/restore these
in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.

FSCR affects userspace execution in that it can prohibit access to
certain facilities by userspace.  We restore it to the normal value
for the task on exit from the KVM_RUN ioctl.

IAMR is normally 0, and is restored to 0 on guest exit.  However,
with a radix host on POWER9, it is set to a value that prevents the
kernel from executing user-accessible memory.  On POWER9, we save
IAMR on guest entry and restore it on guest exit to the saved value
rather than 0.  On POWER8 we continue to set it to 0 on guest exit.

PSPB is normally 0.  We restore it to 0 on guest exit to prevent
userspace taking advantage of the guest having set it non-zero
(which would allow userspace to set its SMT priority to high).

UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
the AMR from being used as a covert channel between userspace
processes, since the AMR is not context-switched at present.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-15 16:17:09 +10:00
Alistair Popple 377aa6b0ef powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node
Commit 4c3b89effc ("powerpc/powernv: Add sanity checks to
pnv_pci_get_{gpu|npu}_dev") introduced explicit warnings in
pnv_pci_get_npu_dev() when a PCIe device has no associated device-tree
node. However not all PCIe devices have an of_node and
pnv_pci_get_npu_dev() gets indirectly called at least once for every
PCIe device in the system. This results in spurious WARN_ON()'s so
remove it.

The same situation should not exist for pnv_pci_get_gpu_dev() as any
NPU based PCIe device requires a device-tree node.

Fixes: 4c3b89effc ("powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-14 15:23:19 +10:00
Thomas Falcon 40c9db8ad8 ibmvnic: Client-initiated failover
The IBM vNIC protocol provides support for the user to initiate
a failover from the client LPAR in case the current backing infrastructure
is deemed inadequate or in an error state.

Support for two H_VIOCTL sub-commands for vNIC devices are required
to implement this function. These commands are H_GET_SESSION_TOKEN
and H_SESSION_ERR_DETECTED.

"[H_GET_SESSION_TOKEN] is used to obtain a session token from a VNIC client
adapter.  This token is opaque to the caller and is intended to be used in
tandem with the SESSION_ERROR_DETECTED vioctl subfunction."

"[H_SESSION_ERR_DETECTED] is used to report that the currently active
backing device for a VNIC client adapter is behaving poorly, and that
the hypervisor should attempt to fail over to a different backing device,
if one is available."

To provide tools access to this functionality the vNIC driver creates a
sysfs file that, when written to, will send a request to pHyp to failover
to a different backing device.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13 12:53:35 -04:00
Kirill A. Shutemov e585513b76 x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
This patch provides all required callbacks required by the generic
get_user_pages_fast() code and switches x86 over - and removes
the platform specific implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13 08:56:50 +02:00
Paul Mackerras ca8efa1df1 KVM: PPC: Book3S HV: Context-switch EBB registers properly
This adds code to save the values of three SPRs (special-purpose
registers) used by userspace to control event-based branches (EBBs),
which are essentially interrupts that get delivered directly to
userspace.  These registers are loaded up with guest values when
entering the guest, and their values are saved when exiting the
guest, but we were not saving the host values and restoring them
before going back to userspace.

On POWER8 this would only affect userspace programs which explicitly
request the use of EBBs and also use the KVM_RUN ioctl, since the
only source of EBBs on POWER8 is the PMU, and there is an explicit
enable bit in the PMU registers (and those PMU registers do get
properly context-switched between host and guest).  On POWER9 there
is provision for externally-generated EBBs, and these are not subject
to the control in the PMU registers.

Since these registers only affect userspace, we can save them when
we first come in from userspace and restore them before returning to
userspace, rather than saving/restoring the host values on every
guest entry/exit.  Similarly, we don't need to worry about their
values on offline secondary threads since they execute in the context
of the idle task, which never executes in userspace.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-13 14:12:02 +10:00
Greg Kroah-Hartman 205a1ee15d powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-12 16:18:37 +02:00
Greg Kroah-Hartman 451e3f1a74 powerpc: vio: use dev_groups and not dev_attrs for bus_type
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-12 16:18:37 +02:00
Linus Torvalds 32627645e9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key subsystem fixes from James Morris:
 "Here are a bunch of fixes for Linux keyrings, including:

   - Fix up the refcount handling now that key structs use the
     refcount_t type and the refcount_t ops don't allow a 0->1
     transition.

   - Fix a potential NULL deref after error in x509_cert_parse().

   - Don't put data for the crypto algorithms to use on the stack.

   - Fix the handling of a null payload being passed to add_key().

   - Fix incorrect cleanup an uninitialised key_preparsed_payload in
     key_update().

   - Explicit sanitisation of potentially secure data before freeing.

   - Fixes for the Diffie-Helman code"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
  KEYS: fix refcount_inc() on zero
  KEYS: Convert KEYCTL_DH_COMPUTE to use the crypto KPP API
  crypto : asymmetric_keys : verify_pefile:zero memory content before freeing
  KEYS: DH: add __user annotations to keyctl_kdf_params
  KEYS: DH: ensure the KDF counter is properly aligned
  KEYS: DH: don't feed uninitialized "otherinfo" into KDF
  KEYS: DH: forbid using digest_null as the KDF hash
  KEYS: sanitize key structs before freeing
  KEYS: trusted: sanitize all key material
  KEYS: encrypted: sanitize all key material
  KEYS: user_defined: sanitize key payloads
  KEYS: sanitize add_key() and keyctl() key payloads
  KEYS: fix freeing uninitialized memory in key_update()
  KEYS: fix dereferencing NULL payload with nonzero length
  KEYS: encrypted: use constant-time HMAC comparison
  KEYS: encrypted: fix race causing incorrect HMAC calculations
  KEYS: encrypted: fix buffer overread in valid_master_desc()
  KEYS: encrypted: avoid encrypting/decrypting stack buffers
  KEYS: put keyring if install_session_keyring_to_cred() fails
  KEYS: Delete an error message for a failed memory allocation in get_derived_key()
  ...
2017-06-11 16:17:29 -07:00
Linus Torvalds a92f63cd13 powerpc fixes for 4.12 #5
Mostly fairly minor, of note are:
  - Fix percpu allocations to be NUMA aware
  - Limit 4k page size config to 64TB virtual address space
  - Avoid needlessly restoring FP and vector registers
 
 Thanks to:
   Aneesh Kumar K.V, Breno Leitao, Christophe Leroy, Frederic Barrat, Madhavan
   Srinivasan, Michael Bringmann, Nicholas Piggin, Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZOobDAAoJEFHr6jzI4aWA/2AQALYfeTGo5Pl+c4E1wd5XpDae
 cXNxDaFFqdyNBW3OzWcDDW7w+pGvuu6SLPdki1CNosp4MprqpZurjTe9bhfYoAy1
 SQAcUgk7vra6WlbfguinwlCDeeKt53Np9ZYs4jhq/wy9bp6+T0evpdoY5NvHRr+x
 wR4y3KtP1t3PSlnJFfXq2jHg6xYcqJ14cVjWD6KPW8c8YB81WU7FkeVF5YtzinxE
 OWfuJlwzXG3JtG5q9CwwcxUiwpWSPZ7WPHovD8sCQc1pkdXx98LY5syhYy+8n2pa
 B2DbHoRwwkxQwT2z6KAGBUPtf94Fhe4bHcH24RXVeTnr1Wu1sIrkJkQ7c0OR6wN/
 nT8H+uJMpuWPNmbVk6n1LWqLem3YliVB6HwcP0944TF9P/f1GCF6YotkZJ/aXTSc
 FLA/teN0tzQyaX+DOrgTGTgmuscCOkDMRUHPmB9HoBYP3XLgeA+N9mphbxtj03d0
 ch62OqZBw6MiweEx3Ji1s7mfTr8/VaCtSx+rzz7GsQllXeo8k2TVoBewr2X1wesv
 0hB779hRYt9JvDVgmJlLyClLsE/kEy6ZPD6r7kf9DgBYwQRaHdMZDZIR6jDZjklJ
 viVEmQ91lCRx+g3q7hqgZO8VPr2EY12h7PqGgcDvkzHq0vawkxif4aBJ5AIZHpWs
 J23ZZB08M8llurG99+uA
 =KDBG
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Mostly fairly minor, of note are:

   - Fix percpu allocations to be NUMA aware

   - Limit 4k page size config to 64TB virtual address space

   - Avoid needlessly restoring FP and vector registers

  Thanks to Aneesh Kumar K.V, Breno Leitao, Christophe Leroy, Frederic
  Barrat, Madhavan Srinivasan, Michael Bringmann, Nicholas Piggin,
  Vaibhav Jain"

* tag 'powerpc-4.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by default
  powerpc/mm/4k: Limit 4k page size config to 64TB virtual address space
  cxl: Fix error path on bad ioctl
  powerpc/perf: Fix Power9 test_adder fields
  powerpc/numa: Fix percpu allocations to be NUMA aware
  cxl: Avoid double free_irq() for psl,slice interrupts
  powerpc/kernel: Initialize load_tm on task creation
  powerpc/kernel: Fix FP and vector register restoration
  powerpc/64: Reclaim CPU_FTR_SUBCORE
  powerpc/hotplug-mem: Fix missing endian conversion of aa_index
  powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function
  powerpc/spufs: Fix coredump of SPU contexts
  powerpc/64s: Add dt_cpu_ftrs boot time setup option
2017-06-09 09:44:46 -07:00
Aleksa Sarai 54ebbfb160 tty: add TIOCGPTPEER ioctl
When opening the slave end of a PTY, it is not possible for userspace to
safely ensure that /dev/pts/$num is actually a slave (in cases where the
mount namespace in which devpts was mounted is controlled by an
untrusted process). In addition, there are several unresolvable
race conditions if userspace were to attempt to detect attacks through
stat(2) and other similar methods [in addition it is not clear how
userspace could detect attacks involving FUSE].

Resolve this by providing an interface for userpace to safely open the
"peer" end of a PTY file descriptor by using the dentry cached by
devpts. Since it is not possible to have an open master PTY without
having its slave exposed in /dev/pts this interface is safe. This
interface currently does not provide a way to get the master pty (since
it is not clear whether such an interface is safe or even useful).

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Valentin Rothberg <vrothberg@suse.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09 12:27:54 +02:00
Greg Kroah-Hartman 823a44ac23 powerpc: ibmebus: use dev_groups and not dev_attrs for bus_type
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Johan Hovold <johan@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09 11:00:46 +02:00
Greg Kroah-Hartman 892533e191 powerpc: ps3: use dev_groups and not dev_attrs for bus_type
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.

Seems-ok: Geoff Levand <geoff@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09 11:00:46 +02:00
Bilal Amarni 47b2c3fff4 security/keys: add CONFIG_KEYS_COMPAT to Kconfig
CONFIG_KEYS_COMPAT is defined in arch-specific Kconfigs and is missing for
several 64-bit architectures : mips, parisc, tile.

At the moment and for those architectures, calling in 32-bit userspace the
keyctl syscall would return an ENOSYS error.

This patch moves the CONFIG_KEYS_COMPAT option to security/keys/Kconfig, to
make sure the compatibility wrapper is registered by default for any 64-bit
architecture as long as it is configured with CONFIG_COMPAT.

[DH: Modified to remove arm64 compat enablement also as requested by Eric
 Biggers]

Signed-off-by: Bilal Amarni <bilal.amarni@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
cc: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09 13:29:45 +10:00
Michael Ellerman c6ee9619e2 powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by default
The PPC_DT_CPU_FTRs is a bit misplaced in menuconfig, it shows up with
other general kernel options. It's really more at home in the "Platform
Support" section, so move it there.

Also enable it by default, for Book3s 64. It does mostly nothing unless
the device tree properties are found, and we will want it enabled
eventually in distro kernels, so turn it on to start getting more
testing.

Fixes: 5a61ef74f2 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08 20:42:57 +10:00
Aneesh Kumar K.V 92d9dfda8b powerpc/mm/4k: Limit 4k page size config to 64TB virtual address space
Supporting 512TB requires us to do a order 3 allocation for level 1 page
table (pgd). This results in page allocation failures with certain workloads.
For now limit 4k linux page size config to 64TB.

Fixes: f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08 20:42:56 +10:00
Greg Kroah-Hartman 6f428096a4 driver core: remove CLASS_ATTR usage
There was only 2 remaining users of CLASS_ATTR() so let's finally get
rid of them and force everyone to use the correct RW/RO/WO versions
instead.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 11:15:22 +02:00
David S. Miller 216fe8f021 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just some simple overlapping changes in marvell PHY driver
and the DSA core code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 22:20:08 -04:00
Martin KaFai Lau 783d28dd11 bpf: Add jited_len to struct bpf_prog
Add jited_len to struct bpf_prog.  It will be
useful for the struct bpf_prog_info which will
be added in the later patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 15:41:24 -04:00
Madhavan Srinivasan 8c218578fc powerpc/perf: Fix Power9 test_adder fields
Commit 8d911904f3 ('powerpc/perf: Add restrictions to PMC5 in power9 DD1')
was added to restrict the use of PMC5 in Power9 DD1. Intention was to disable
the use of PMC5 using raw event code. But instead of updating the
power9_isa207_pmu structure (used on DD1), the commit incorrectly updated the
power9_pmu structure. Fix it.

Fixes: 8d911904f3 ("powerpc/perf: Add restrictions to PMC5 in power9 DD1")
Reported-by: Shriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Shriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:21:19 +10:00
Michael Ellerman ba4a648f12 powerpc/numa: Fix percpu allocations to be NUMA aware
In commit 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.

Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.

This is visible in the dmesg output, as all pcpu allocs being in group 0:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
  pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
  pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47

To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
  pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
  pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47

We can also check the data_offset in the paca of various CPUs, with the fix we
see:

  CPU 0:  data_offset = 0x0ffe8b0000
  CPU 24: data_offset = 0x1ffe5b0000

And we can see from dmesg that CPU 24 has an allocation on node 1:

  node   0: [mem 0x0000000000000000-0x0000000fffffffff]
  node   1: [mem 0x0000001000000000-0x0000001fffffffff]

Cc: stable@vger.kernel.org # v3.16+
Fixes: 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:19:46 +10:00
Nicholas Piggin 90df4bfb4d powerpc/64s: Machine check handle ifetch from foreign real address for POWER9
The i-side 0111b machine check, which is "Instruction Fetch to foreign
address space", was missed by 7b9f71f974 ("powerpc/64s: POWER9 machine
check handler").

    The POWER9 processor core considers host real addresses with a
    nonzero value in RA(8:12) as foreign address space, accessible only
    by the copy and paste instructions. The copy and paste instruction
    pair can be used to invoke the Nest accelerators via the Virtual
    Accelerator Switchboard (VAS).

It is an error for any regular load/store or ifetch to go to a foreign
addresses. When relocation is on, this causes an MMU exception. When
relocation is off, a machine check exception. It is possible to trigger
this machine check by branching to a foreign address with MSR[IR]=0.

Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler")
Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:17:15 +10:00
Breno Leitao 7f22ced437 powerpc/kernel: Initialize load_tm on task creation
Currently tsk->thread.load_tm is not initialized in the task creation
and can contain garbage on a new task.

This is an undesired behaviour, since it affects the timing to enable
and disable the transactional memory laziness (disabling and enabling
the MSR TM bit, which affects TM reclaim and recheckpoint in the
scheduling process).

Fixes: 5d176f751e ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 19:09:22 +10:00
Christophe Leroy 4386c096c2 powerpc/mm: Rename map_page() to map_kernel_page() on 32-bit
These two functions implement the same semantics, so unify their naming so we
can share code that calls them. The longer name is more descriptive so use it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:59:03 +10:00
Balbir Singh d2485644c7 powerpc/mm/hugetlb: Add support for page accounting
Add __GFP_ACCOUNT to __hugepte_alloc()

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:03:12 +10:00
Balbir Singh abd667be15 powerpc/mm/book(e)(3s)/32: Add page table accounting
Add support in pte_alloc_one() and pgd_alloc() by
passing __GFP_ACCOUNT in the flags

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:03:11 +10:00
Balbir Singh de3b87611d powerpc/mm/book(e)(3s)/64: Add page table accounting
Introduce a helper pgtable_gfp_flags() which
just returns the current gfp flags and adds
__GFP_ACCOUNT to account for page table allocation.
The generic helper is added to include/asm/pgalloc.h
and has two variants - WARNING ugly bits ahead

1. If the header is included from a module, no check
for mm == &init_mm is done, since init_mm is not
exported
2. For kernel includes, the check is done and required
see (3e79ec7 arch: x86: charge page tables to kmemcg)

The fundamental assumption is that no module should be
doing pgd/pud/pmd and pte alloc's on behalf of init_mm
directly.

NOTE: This adds an overhead to pmd/pud/pgd allocations
similar to x86.  The other alternative was to implement
pmd_alloc_kernel/pud_alloc_kernel and pgd_alloc_kernel
with their offset variants.

For 4k page size, pte_alloc_one no longer calls
pte_alloc_one_kernel.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:03:10 +10:00
Balbir Singh c5cee6421c powerpc/mm/hash: Do a local flush if possible when no batch is active
Currently in hpte_need_flush() if there is no batch pending we always do a
global TLB flush, which is inefficient if the mm has never run on another
thread.

Instead do the same check that __flush_tlb_pending() does and check if a local
flush is sufficient when batch->active is false. Instead of open-coding it we
use mm_is_thread_local().

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Don't use a local, just inline mm_is_thread_local()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:02:55 +10:00
Colin Ian King b802ab46ba powerpc: Fix some spelling mistakes
Collation of some spelling fixes from Colin.

 Attemping   -> Attempting
 intialized  -> initialized
 missmanaged -> mismanaged

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 16:50:15 +10:00
Breno Leitao 1195892c09 powerpc/kernel: Fix FP and vector register restoration
Currently tsk->thread->load_vec and load_fp are not initialized during
task creation, which can lead to garbage values in these variables (non-zero
values).

These variables will be checked later in restore_math() to validate if the
FP and vector registers are being utilized. Since these values might be
non-zero, the restore_math() will continue to save the FP and vectors even if
they were never utilized by the userspace application. load_fp and load_vec
counters will then overflow (they wrap at 255) and the FP and Altivec will be
finally disabled, but before that condition is reached (counter overflow)
several context switches will have restored FP and vector registers without
need, causing a performance degradation.

Fixes: 70fe3d980f ("powerpc: Restore FPU/VEC/VSX if previously used")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 15:55:30 +10:00
Radim Krčmář 2fa6e1e12a KVM: add kvm_request_pending
A first step in vcpu->requests encapsulation.  Additionally, we now
use READ_ONCE() when accessing vcpu->requests, which ensures we
always load vcpu->requests when it's accessed.  This is important as
other threads can change it any time.  Also, READ_ONCE() documents
that vcpu->requests is used with other threads, likely requiring
memory barriers, which it does.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[ Documented the new use of READ_ONCE() and converted another check
  in arch/mips/kvm/vz.c ]
Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-06-04 16:53:00 +02:00
Andrew Jones 2387149ead KVM: improve arch vcpu request defining
Marc Zyngier suggested that we define the arch specific VCPU request
base, rather than requiring each arch to remember to start from 8.
That suggestion, along with Radim Krcmar's recent VCPU request flag
addition, snowballed into defining something of an arch VCPU request
defining API.

No functional change.

(Looks like x86 is running out of arch VCPU request bits.  Maybe
 someday we'll need to extend to 64.)

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-06-04 16:53:00 +02:00
Matt Brown f718d426d7 powerpc/lib/xor_vmx: Ensure no altivec code executes before enable_kernel_altivec()
The xor_vmx.c file is used for the RAID5 xor operations. In these functions
altivec is enabled to run the operation and then disabled.

The code uses enable_kernel_altivec() around the core of the algorithm, however
the whole file is built with -maltivec, so the compiler is within its rights to
generate altivec code anywhere. This has been seen at least once in the wild:

  0:mon> di $xor_altivec_2
  c0000000000b97d0  3c4c01d9	addis   r2,r12,473
  c0000000000b97d4  3842db30	addi    r2,r2,-9424
  c0000000000b97d8  7c0802a6	mflr    r0
  c0000000000b97dc  f8010010	std     r0,16(r1)
  c0000000000b97e0  60000000	nop
  c0000000000b97e4  7c0802a6	mflr    r0
  c0000000000b97e8  faa1ffa8	std     r21,-88(r1)
  ...
  c0000000000b981c  f821ff41	stdu    r1,-192(r1)
  c0000000000b9820  7f8101ce	stvx    v28,r1,r0		<-- POP
  c0000000000b9824  38000030	li      r0,48
  c0000000000b9828  7fa101ce	stvx    v29,r1,r0
  ...
  c0000000000b984c  4bf6a06d	bl      c0000000000238b8 # enable_kernel_altivec

This patch splits the non-altivec code into xor_vmx_glue.c which calls the
altivec functions in xor_vmx.c. By compiling xor_vmx_glue.c without
-maltivec we can guarantee that altivec instruction will not be executed
outside of the enable/disable block.

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Rework change log and include disassembly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 20:17:52 +10:00
Hari Bathini 48a316e350 powerpc/fadump: Set an upper limit for boot memory size
By default, 5% of system RAM is reserved for preserving boot memory.
Alternatively, a user can specify the amount of memory to reserve.
See Documentation/powerpc/firmware-assisted-dump.txt for details. In
addition to the memory reserved for preserving boot memory, some more
memory is reserved, to save HPTE region, CPU state data and ELF core
headers.

Memory Reservation during first kernel looks like below:

  Low memory                                        Top of memory
  0      boot memory size                                       |
  |           |                       |<--Reserved dump area -->|
  V           V                       |   Permanent Reservation V
  +-----------+----------/ /----------+---+----+-----------+----+
  |           |                       |CPU|HPTE|  DUMP     |ELF |
  +-----------+----------/ /----------+---+----+-----------+----+
        |                                           ^
        |                                           |
        \                                           /
         -------------------------------------------
          Boot memory content gets transferred to
          reserved area by firmware at the time of
          crash

This implicitly means that the sum of the sizes of boot memory, CPU
state data, HPTE region, DUMP preserving area and ELF core headers
can't be greater than the total memory size. But currently, a user is
allowed to specify any value as boot memory size. So, the above rule
is violated when a boot memory size around 50% of the total available
memory is specified. As the kernel is not handling this currently, it
may lead to undefined behavior. Fix it by setting an upper limit for
boot memory size to 25% of the total available memory. Also, instead
of using memblock_end_of_DRAM(), which doesn't take the holes, if any,
in the memory layout into account, use memblock_phys_mem_size() to
calculate the percentage of total available memory.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 20:16:50 +10:00
Hari Bathini e7467dc694 powerpc/fadump: Update comment about offset where fadump is reserved
With commit f6e6bedb77 ("powerpc/fadump: Reserve memory at an offset
closer to bottom of RAM"), memory for fadump is no longer reserved at
the top of RAM. But there are still a few places which say so. Change
them appropriately.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 20:16:49 +10:00
Hari Bathini 81d9eca502 powerpc/fadump: Add a warning when 'fadump_reserve_mem=' is used
With commit 11550dc0a0 ("powerpc/fadump: reuse crashkernel parameter
for fadump memory reservation"), 'fadump_reserve_mem=' parameter is
deprecated in favor of 'crashkernel=' parameter. Add a warning if
'fadump_reserve_mem=' is still used.

Fixes: 11550dc0a0 ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation")
Suggested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Unsplit long printk strings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 20:16:35 +10:00
Michal Suchanek 98b8cd7f75 powerpc/fadump: Return error when fadump registration fails
- log an error message when registration fails and no error code listed
   in the switch is returned
 - translate the hv error code to posix error code and return it from
   fw_register
 - return the posix error code from fw_register to the process writing
   to sysfs
 - return EEXIST on re-registration
 - return success on deregistration when fadump is not registered
 - return ENODEV when no memory is reserved for fadump

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Tested-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Use pr_err() to shrink the error print]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:57 +10:00
Christophe Leroy f782ddf297 powerpc: Remove __ilog2()s and use generic ones
With the __ilog2() function as defined in
arch/powerpc/include/asm/bitops.h, GCC will not optimise the code
in case of constant parameter.

The generic ilog2() function in include/linux/log2.h is written
to handle the case of the constant parameter.

This patch discards the three __ilog2() functions and
defines __ilog2() as ilog2()

For non constant calls, the generated code is doing the same:
int test__ilog2(unsigned long x)
{
	return __ilog2(x);
}

int test__ilog2_u32(u32 n)
{
	return __ilog2_u32(n);
}

int test__ilog2_u64(u64 n)
{
	return __ilog2_u64(n);
}

On PPC32 before the patch:
00000000 <test__ilog2>:
   0:	7c 63 00 34 	cntlzw  r3,r3
   4:	20 63 00 1f 	subfic  r3,r3,31
   8:	4e 80 00 20 	blr

0000000c <test__ilog2_u32>:
   c:	7c 63 00 34 	cntlzw  r3,r3
  10:	20 63 00 1f 	subfic  r3,r3,31
  14:	4e 80 00 20 	blr

On PPC32 after the patch:
00000000 <test__ilog2>:
   0:	7c 63 00 34 	cntlzw  r3,r3
   4:	20 63 00 1f 	subfic  r3,r3,31
   8:	4e 80 00 20 	blr

0000000c <test__ilog2_u32>:
   c:	7c 63 00 34 	cntlzw  r3,r3
  10:	20 63 00 1f 	subfic  r3,r3,31
  14:	4e 80 00 20 	blr

On PPC64 before the patch:
0000000000000000 <.test__ilog2>:
   0:	7c 63 00 74 	cntlzd  r3,r3
   4:	20 63 00 3f 	subfic  r3,r3,63
   8:	7c 63 07 b4 	extsw   r3,r3
   c:	4e 80 00 20 	blr

0000000000000010 <.test__ilog2_u32>:
  10:	7c 63 00 34 	cntlzw  r3,r3
  14:	20 63 00 1f 	subfic  r3,r3,31
  18:	7c 63 07 b4 	extsw   r3,r3
  1c:	4e 80 00 20 	blr

0000000000000020 <.test__ilog2_u64>:
  20:	7c 63 00 74 	cntlzd  r3,r3
  24:	20 63 00 3f 	subfic  r3,r3,63
  28:	7c 63 07 b4 	extsw   r3,r3
  2c:	4e 80 00 20 	blr

On PPC64 after the patch:
0000000000000000 <.test__ilog2>:
   0:	7c 63 00 74 	cntlzd  r3,r3
   4:	20 63 00 3f 	subfic  r3,r3,63
   8:	7c 63 07 b4 	extsw   r3,r3
   c:	4e 80 00 20 	blr

0000000000000010 <.test__ilog2_u32>:
  10:	7c 63 00 34 	cntlzw  r3,r3
  14:	20 63 00 1f 	subfic  r3,r3,31
  18:	7c 63 07 b4 	extsw   r3,r3
  1c:	4e 80 00 20 	blr

0000000000000020 <.test__ilog2_u64>:
  20:	7c 63 00 74 	cntlzd  r3,r3
  24:	20 63 00 3f 	subfic  r3,r3,63
  28:	7c 63 07 b4 	extsw   r3,r3
  2c:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:56 +10:00
Christophe Leroy 22ef33b368 powerpc: Replace ffz() by equivalent generic function
With the ffz() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.

This patch replaces ffz() by the generic function.

The generic ffz(x) expects to never be called with ~x == 0
as written in the comment in include/asm-generic/bitops/ffz.h
The only user of ffz() within arch/powerpc/ is
platforms/512x/mpc5121_ads_cpld.c, which checks if x is not 0xff

For non constant calls, the generated code is doing the same:

unsigned long testffz(unsigned long x)
{
	return ffz(x);
}

On PPC32, before the patch:
00000018 <testffz>:
  18:	7c 63 18 f9 	not.    r3,r3
  1c:	40 82 00 0c 	bne     28 <testffz+0x10>
  20:	38 60 00 20 	li      r3,32
  24:	4e 80 00 20 	blr
  28:	7d 23 00 d0 	neg     r9,r3
  2c:	7d 23 18 38 	and     r3,r9,r3
  30:	7c 63 00 34 	cntlzw  r3,r3
  34:	20 63 00 1f 	subfic  r3,r3,31
  38:	4e 80 00 20 	blr

On PPC32, after the patch:
00000018 <testffz>:
  18:	39 23 00 01 	addi    r9,r3,1
  1c:	7d 23 18 78 	andc    r3,r9,r3
  20:	7c 63 00 34 	cntlzw  r3,r3
  24:	20 63 00 1f 	subfic  r3,r3,31
  28:	4e 80 00 20 	blr

On PPC64, before the patch:
0000000000000030 <.testffz>:
  30:	7c 60 18 f9 	not.    r0,r3
  34:	38 60 00 40 	li      r3,64
  38:	4d 82 00 20 	beqlr
  3c:	7c 60 00 d0 	neg     r3,r0
  40:	7c 63 00 38 	and     r3,r3,r0
  44:	7c 63 00 74 	cntlzd  r3,r3
  48:	20 63 00 3f 	subfic  r3,r3,63
  4c:	7c 63 07 b4 	extsw   r3,r3
  50:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000030 <.testffz>:
  30:	38 03 00 01 	addi    r0,r3,1
  34:	7c 03 18 78 	andc    r3,r0,r3
  38:	7c 63 00 74 	cntlzd  r3,r3
  3c:	20 63 00 3f 	subfic  r3,r3,63
  40:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:55 +10:00
Christophe Leroy 2fcff790dc powerpc: Use builtin functions for fls()/__fls()/fls64()
With the fls() functions as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.

This patch replaces __fls() by the builtin function, and modifies
fls() and fls64() to use builtins instead of inline assembly

For non constant calls, the generated code is doing the same:

int testfls(unsigned int x)
{
	return fls(x);
}

unsigned long test__fls(unsigned long x)
{
	return __fls(x);
}

int testfls64(__u64 x)
{
	return fls64(x);
}

On PPC32, before the patch:
00000064 <testfls>:
  64:	7c 63 00 34 	cntlzw  r3,r3
  68:	20 63 00 20 	subfic  r3,r3,32
  6c:	4e 80 00 20 	blr

00000070 <test__fls>:
  70:	7c 63 00 34 	cntlzw  r3,r3
  74:	20 63 00 1f 	subfic  r3,r3,31
  78:	4e 80 00 20 	blr

0000007c <testfls64>:
  7c:	2c 03 00 00 	cmpwi   r3,0
  80:	40 82 00 10 	bne     90 <testfls64+0x14>
  84:	7c 83 00 34 	cntlzw  r3,r4
  88:	20 63 00 20 	subfic  r3,r3,32
  8c:	4e 80 00 20 	blr
  90:	7c 63 00 34 	cntlzw  r3,r3
  94:	20 63 00 40 	subfic  r3,r3,64
  98:	4e 80 00 20 	blr

On PPC32, after the patch:
00000054 <testfls>:
  54:	7c 63 00 34 	cntlzw  r3,r3
  58:	20 63 00 20 	subfic  r3,r3,32
  5c:	4e 80 00 20 	blr

00000060 <test__fls>:
  60:	7c 63 00 34 	cntlzw  r3,r3
  64:	20 63 00 1f 	subfic  r3,r3,31
  68:	4e 80 00 20 	blr

0000006c <testfls64>:
  6c:	2c 03 00 00 	cmpwi   r3,0
  70:	41 82 00 10 	beq     80 <testfls64+0x14>
  74:	7c 63 00 34 	cntlzw  r3,r3
  78:	20 63 00 40 	subfic  r3,r3,64
  7c:	4e 80 00 20 	blr
  80:	7c 83 00 34 	cntlzw  r3,r4
  84:	20 63 00 40 	subfic  r3,r3,32
  88:	4e 80 00 20 	blr

On PPC64, before the patch:
00000000000000a0 <.testfls>:
  a0:	7c 63 00 34 	cntlzw  r3,r3
  a4:	20 63 00 20 	subfic  r3,r3,32
  a8:	7c 63 07 b4 	extsw   r3,r3
  ac:	4e 80 00 20 	blr

00000000000000b0 <.test__fls>:
  b0:	7c 63 00 74 	cntlzd  r3,r3
  b4:	20 63 00 3f 	subfic  r3,r3,63
  b8:	7c 63 07 b4 	extsw   r3,r3
  bc:	4e 80 00 20 	blr

00000000000000c0 <.testfls64>:
  c0:	7c 63 00 74 	cntlzd  r3,r3
  c4:	20 63 00 40 	subfic  r3,r3,64
  c8:	7c 63 07 b4 	extsw   r3,r3
  cc:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000090 <.testfls>:
  90:	7c 63 00 34 	cntlzw  r3,r3
  94:	20 63 00 20 	subfic  r3,r3,32
  98:	7c 63 07 b4 	extsw   r3,r3
  9c:	4e 80 00 20 	blr

00000000000000a0 <.test__fls>:
  a0:	7c 63 00 74 	cntlzd  r3,r3
  a4:	20 63 00 3f 	subfic  r3,r3,63
  a8:	4e 80 00 20 	blr
  ac:	60 00 00 00 	nop

00000000000000b0 <.testfls64>:
  b0:	7c 63 00 74 	cntlzd  r3,r3
  b4:	20 63 00 40 	subfic  r3,r3,64
  b8:	7c 63 07 b4 	extsw   r3,r3
  bc:	4e 80 00 20 	blr

Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:55 +10:00
Christophe Leroy f83647d642 powerpc: Discard ffs()/__ffs() function and use builtin functions instead
With the ffs() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter, as shown
by the small exemple below.

int ffs_test(void)
{
	return 4 << ffs(31);
}

c0012334 <ffs_test>:
c0012334:       39 20 00 01     li      r9,1
c0012338:       38 60 00 04     li      r3,4
c001233c:       7d 29 00 34     cntlzw  r9,r9
c0012340:       21 29 00 20     subfic  r9,r9,32
c0012344:       7c 63 48 30     slw     r3,r3,r9
c0012348:       4e 80 00 20     blr

With this patch, the same function will compile as follows:

c0012334 <ffs_test>:
c0012334:       38 60 00 08     li      r3,8
c0012338:       4e 80 00 20     blr

The same happens with __ffs()

For non constant calls, the generated code is doing the same,
allthought it is slightly different on 64 bits for ffs():

unsigned long test__ffs(unsigned long x)
{
	return __ffs(x);
}

int testffs(int x)
{
	return ffs(x);
}

On PPC32, before the patch:
0000003c <test__ffs>:
  3c:	7d 23 00 d0 	neg     r9,r3
  40:	7d 23 18 38 	and     r3,r9,r3
  44:	7c 63 00 34 	cntlzw  r3,r3
  48:	20 63 00 1f 	subfic  r3,r3,31
  4c:	4e 80 00 20 	blr

00000050 <testffs>:
  50:	7d 23 00 d0 	neg     r9,r3
  54:	7d 23 18 38 	and     r3,r9,r3
  58:	7c 63 00 34 	cntlzw  r3,r3
  5c:	20 63 00 20 	subfic  r3,r3,32
  60:	4e 80 00 20 	blr

On PPC32, after the patch:
0000002c <test__ffs>:
  2c:	7d 23 00 d0 	neg     r9,r3
  30:	7d 23 18 38 	and     r3,r9,r3
  34:	7c 63 00 34 	cntlzw  r3,r3
  38:	20 63 00 1f 	subfic  r3,r3,31
  3c:	4e 80 00 20 	blr

00000040 <testffs>:
  40:	7d 23 00 d0 	neg     r9,r3
  44:	7d 23 18 38 	and     r3,r9,r3
  48:	7c 63 00 34 	cntlzw  r3,r3
  4c:	20 63 00 20 	subfic  r3,r3,32
  50:	4e 80 00 20 	blr

On PPC64, before the patch:
0000000000000060 <.test__ffs>:
  60:	7c 03 00 d0 	neg     r0,r3
  64:	7c 03 18 38 	and     r3,r0,r3
  68:	7c 63 00 74 	cntlzd  r3,r3
  6c:	20 63 00 3f 	subfic  r3,r3,63
  70:	7c 63 07 b4 	extsw   r3,r3
  74:	4e 80 00 20 	blr

0000000000000080 <.testffs>:
  80:	7c 03 00 d0 	neg     r0,r3
  84:	7c 03 18 38 	and     r3,r0,r3
  88:	7c 63 00 74 	cntlzd  r3,r3
  8c:	20 63 00 40 	subfic  r3,r3,64
  90:	7c 63 07 b4 	extsw   r3,r3
  94:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000050 <.test__ffs>:
  50:	7c 03 00 d0 	neg     r0,r3
  54:	7c 03 18 38 	and     r3,r0,r3
  58:	7c 63 00 74 	cntlzd  r3,r3
  5c:	20 63 00 3f 	subfic  r3,r3,63
  60:	4e 80 00 20 	blr

0000000000000070 <.testffs>:
  70:	7c 03 00 d0 	neg     r0,r3
  74:	7c 03 18 38 	and     r3,r0,r3
  78:	7c 63 00 34 	cntlzw  r3,r3
  7c:	20 63 00 20 	subfic  r3,r3,32
  80:	7c 63 07 b4 	extsw   r3,r3
  84:	4e 80 00 20 	blr
(ffs() operates on an int so cntlzw is equivalent to cntlzd)

In addition, when reading the generated vmlinux, we can observe
that with the builtin functions, GCC sometimes efficiently spreads
the instructions within the generated functions while the inline
assembly force them to remain grouped together.

__builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h

Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:54 +10:00
Christophe Leroy 45cb08f479 powerpc: Handle simultaneous interrupts at once
It often happens to have simultaneous interrupts, for instance
when having double Ethernet attachment. With the current
implementation, we suffer the cost of kernel entry/exit for each
interrupt.

This patch introduces a loop in __do_irq() to handle all interrupts
at once before returning.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:20:44 +10:00
Christophe Leroy 3c29b60388 powerpc/8xx: fix mpc8xx_get_irq() return on no irq
IRQ 0 is a valid HW interrupt. So get_irq() shall return 0 when
there is no irq, instead of returning irq_linear_revmap(... ,0)

Fixes: f2a0bd3753 ("[POWERPC] 8xx: powerpc port of core CPM PIC")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:20:44 +10:00
Christophe Leroy 362957c27e powerpc/40x: Clear MSR_DR in one insn instead of two
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:20:43 +10:00
Christophe Leroy 92aa2fe039 powerpc/mm: The 8xx doesn't call do_page_fault() for breakpoints
The 8xx has a dedicated exception for breakpoints, that directly
calls do_break()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:20:12 +10:00
Christophe Leroy da929f6af4 powerpc/mm: Evaluate user_mode(regs) only once in do_page_fault()
Analysis of the assembly code shows that when using user_mode(regs),
at least the 'andi.' is redone all the time, and also
the 'lwz ,132(r31)' most of the time. With the new form, the 'is_user'
is mapped to cr4, then all further use of is_user results in just
things like 'beq cr4,218 <do_page_fault+0x218>'

Without the patch:

  50:	81 1e 00 84 	lwz     r8,132(r30)
  54:	71 09 40 00 	andi.   r9,r8,16384
  58:	40 82 00 0c 	bne     64 <do_page_fault+0x64>

  84:	81 3e 00 84 	lwz     r9,132(r30)
  8c:	71 2a 40 00 	andi.   r10,r9,16384
  90:	41 a2 01 64 	beq     1f4 <do_page_fault+0x1f4>

  d4:	81 3e 00 84 	lwz     r9,132(r30)
  dc:	71 28 40 00 	andi.   r8,r9,16384
  e0:	41 82 02 08 	beq     2e8 <do_page_fault+0x2e8>

 108:	81 3e 00 84 	lwz     r9,132(r30)
 110:	71 28 40 00 	andi.   r8,r9,16384
 118:	41 82 02 28 	beq     340 <do_page_fault+0x340>

 1e4:	81 3e 00 84 	lwz     r9,132(r30)
 1e8:	71 2a 40 00 	andi.   r10,r9,16384
 1ec:	40 82 01 68 	bne     354 <do_page_fault+0x354>

 228:	81 3e 00 84 	lwz     r9,132(r30)
 22c:	71 28 40 00 	andi.   r8,r9,16384
 230:	41 82 ff c4 	beq     1f4 <do_page_fault+0x1f4>

 288:	71 2a 40 00 	andi.   r10,r9,16384
 294:	41 a2 fe 60 	beq     f4 <do_page_fault+0xf4>

 50c:	81 3e 00 84 	lwz     r9,132(r30)
 514:	71 2a 40 00 	andi.   r10,r9,16384
 518:	40 a2 fc e0 	bne     1f8 <do_page_fault+0x1f8>

 534:	81 3e 00 84 	lwz     r9,132(r30)
 53c:	71 2a 40 00 	andi.   r10,r9,16384
 540:	41 82 fc b8 	beq     1f8 <do_page_fault+0x1f8>

This patch creates a local var called 'is_user' which contains the
result of user_mode(regs)

With the patch:

  20:	81 03 00 84 	lwz     r8,132(r3)
  48:	55 09 97 fe 	rlwinm  r9,r8,18,31,31
  58:	2e 09 00 00 	cmpwi   cr4,r9,0
  5c:	40 92 00 0c 	bne     cr4,68 <do_page_fault+0x68>

  88:	41 b2 01 90 	beq     cr4,218 <do_page_fault+0x218>

  d4:	40 92 01 d0 	bne     cr4,2a4 <do_page_fault+0x2a4>

 120:	41 b2 00 f8 	beq     cr4,218 <do_page_fault+0x218>

 138:	41 b2 ff a0 	beq     cr4,d8 <do_page_fault+0xd8>

 1d4:	40 92 00 e0 	bne     cr4,2b4 <do_page_fault+0x2b4>

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:19:45 +10:00
Christophe Leroy 97a011e69b powerpc/mm: Remove a redundant test in do_page_fault()
The result of (trap == 0x400) is already in is_exec.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:18:34 +10:00
Christophe Leroy e8de85ca32 powerpc/mm: Only call store_updates_sp() on stores in do_page_fault()
Function store_updates_sp() checks whether the faulting
instruction is a store updating r1. Therefore we can limit its calls
to store exceptions.

This patch is an improvement of commit a7a9dcd882 ("powerpc: Avoid
taking a data miss on every userspace instruction miss")

With the same microbenchmark app, run with 500 as argument, on an
MPC885 we get:

Before this patch: 152000 DTLB misses
After this patch:  147000 DTLB misses

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:10:24 +10:00
Christophe Leroy 9affa9e228 powerpc/mm: Remove __this_fixmap_does_not_exist()
This function has not been used since commit 9494a1e842
("powerpc: use generic fixmap.h)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:09:53 +10:00
Balbir Singh e63739b168 powerpc/mm/ptdump: Dump the first entry of the linear mapping as well
The check in hpte_find() should be < and not <= for PAGE_OFFSET

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:09:52 +10:00
Stephen Rothwell 042cc40934 powerpc: use asm-generic/socket.h as much as possible
asm-generic/socket.h already has an exception for the differences that
powerpc needs, so just include it after defining the differences.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01 14:48:05 -04:00
Michael Ellerman 0e5e7f5e97 powerpc/64: Reclaim CPU_FTR_SUBCORE
We are running low on CPU feature bits, so we only want to use them when
it's really necessary.

CPU_FTR_SUBCORE is only used in one place, and only in C, so we don't
need it in order to make asm patching work. It can only be set on
"Power8" CPUs, which in practice means POWER8, POWER8E and POWER8NVL.
There are no plans to implement it on future CPUs, but if there ever
were we could retrofit it then.

Although KVM uses subcores, it never looks at the CPU feature, it either
looks at the ISA level or the threads_per_subcore value.

So drop the CPU feature and do a PVR check instead. Drop the device tree
"subcore" feature as we no longer support doing anything with it, and we
will drop it from skiboot too.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:56:28 +10:00
Michael Bringmann dc421b200f powerpc/hotplug-mem: Fix missing endian conversion of aa_index
When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property.  Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
using the default node in the code inappropriately.

Fixes: 5f97b2a0d1 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:54:41 +10:00
Christophe Leroy 6f553912ee powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function
of_mm_gpiochip_add_data() generates an oops for NULL pointer dereference.

of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
setting the data, therefore ->save_regs() cannot use gpiochip_get_data()

Fixes: 937daafca7 ("powerpc: simple-gpio: use gpiochip data pointer")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:54:41 +10:00
Michael Ellerman 99acc9bede powerpc/spufs: Fix coredump of SPU contexts
If a process dumps core while it has SPU contexts active then we have
code to also dump information about the SPU contexts.

Unfortunately it's been broken for 3 1/2 years, and we didn't notice. In
commit 7b1f4020d0 ("spufs: get rid of dump_emit() wrappers") the nread
variable was removed and rc used instead. That means when the loop exits
successfully, rc has the number of bytes read, but it's then used as the
return value for the function, which should return 0 on success.

So fix it by setting rc = 0 before returning in the success case.

Fixes: 7b1f4020d0 ("spufs: get rid of dump_emit() wrappers")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:54:40 +10:00
Nicholas Piggin a2b05b7aa6 powerpc/64s: Add dt_cpu_ftrs boot time setup option
Provide a dt_cpu_ftrs= cmdline option to disable the dt_cpu_ftrs CPU
feature discovery, and fall back to the "cputable" based version.

Also allow control of advertising unknown features to userspace and
with this parameter, and remove the clunky CONFIG option.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add explicit early check of bootargs in dt_cpu_ftrs_init()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:54:33 +10:00
Alexei Starovoitov 71189fa9b0 bpf: free up BPF_JMP | BPF_CALL | BPF_X opcode
free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual
indirect call by register and use kernel internal opcode to
mark call instruction into bpf_tail_call() helper.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31 19:29:47 -04:00
Nicholas Piggin 83a092cf95 powerpc: Link warning for orphan sections
Add --orphan-handling=warn to final link flags. This ensures we can
handle all sections explicitly. This would have caught subtle breakage
such as 7de3b27bac at build-time.

Also bring existing orphan sections into the fold:
- .text.hot and .text.unlikely are compiler generated sections.
- .sdata2, .dynsbss, .plt are used by PPC32
- We previously did not specify DWARF_DEBUG or STABS_DEBUG
- DWARF_DEBUG did not include all DWARF sections that can be emitted
- A number of sections are unused and can be discarded.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin c494adefef powerpc/64: Tool to check head sections location sanity
Use a tool to check that the location of "fixed sections" are where
we expected them to be, which catches cases the linker script can't
(stubs being added to start of .text section), and which ends up
being neater.

Sample output:

  ERROR: start_text address is c000000000008100, should be c000000000008000
  ERROR: see comments in arch/powerpc/tools/head_check.sh

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in fix from Nick for 4.6 era toolchains]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin 951eedebcd powerpc/64: Handle linker stubs in low .text code
Very large kernels may require linker stubs for branches from HEAD
text code. The linker may place these stubs before the HEAD text
sections, which breaks the assumption that HEAD text is located at 0
(or the .text section being located at 0x7000/0x8000 on Book3S
kernels).

Provide an option to create a small section just before the .text
section with an empty 256 - 4 bytes, and adjust the start of the .text
section to match. The linker will tend to put stubs in that section
and not break our relative-to-absolute offset assumptions.

This causes a small waste of space on common kernels, but allows large
kernels to build and boot. For now, it is an EXPERT config option,
defaulting to =n, but a reference is provided for it in the build-time
check for such breakage. This is good enough for allyesconfig and
custom users / hackers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin 4ea80652dc powerpc/64s: Tool to flag direct branches from unrelocated interrupt vectors
Direct banches from code below __end_interrupts to code above
__end_interrupts when built with CONFIG_RELOCATABLE are disallowed
because they will break when the kernel is not located at 0.

Sample output:

    WARNING: Unrelocated relative branches
    c000000000000118 bl-> 0xc000000000038fb8 <pnv_restore_hyp_resource>
    c00000000000013c b-> 0xc0000000001068a4 <kvm_start_guest>
    c000000000000148 b-> 0xc00000000003919c <pnv_wakeup_loss>
    c00000000000014c b-> 0xc00000000003923c <pnv_wakeup_noloss>
    c0000000000005a4 b-> 0xc000000000106ffc <kvmppc_interrupt_hv>
    c000000000001af0 b-> 0xc000000000106ffc <kvmppc_interrupt_hv>
    c000000000001b24 b-> 0xc000000000106ffc <kvmppc_interrupt_hv>
    c000000000001b58 b-> 0xc000000000106ffc <kvmppc_interrupt_hv>

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin efe0160cfd powerpc/64: Linker on-demand sfpr functions for modules
For final link, the powerpc64 linker generates fpr save/restore
functions on-demand, placing them in the .sfpr section. Starting with
binutils 2.25, these can be provided for non-final links with
--save-restore-funcs. Use that where possible for module links.

This saves about 200 bytes per module (~60kB) on powernv defconfig
build.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin cde9f2f420 powerpc/64: Do not create new section for save/restore functions
There is no need to create a new section for these. Consolidate with
32-bit and just use .text.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin 7c868b66f8 powerpc/64: Do not link crtsaveres.o in boot
crtsaveres.S is empty with 64-bit builds already, so just don't
build and link it to match the vmlinux build.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use CONFIG_PPC64_BOOT_WRAPPER not CONFIG_PPC32 to fix BE build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin baa25b571a powerpc/64: Do not link crtsavres.o in vmlinux
The 64-bit linker creates save/restore functions on demand with final
links, so vmlinux does not require crtsavres.o.

Make crtsavres.o extra-y on 64-bit (it is still required by modules).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin e8c688251d powerpc/64: Place sfpr section explicitly with the linker script
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Stephen Rothwell 308d263d3f powerpc: Use uapi/asm-generic/sockios.h
The arch version is identical except for comments and white space.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Stephen Rothwell b87901e6ec powerpc: Use the asm-generic versions of some uapi includes
These are completely obvious as all they do is include the asm-generic
versions.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Ivan Mikhaylov 6e2f03e292 powerpc/[booke|4xx]: Don't clobber TCR[WP] when setting TCR[DIE]
Prevent a kernel panic caused by unintentionally clearing TCR watchdog
bits. At this point in the kernel boot, the watchdog may have already
been enabled by u-boot. The original code's attempt to write to the TCR
register results in an inadvertent clearing of the watchdog
configuration bits, causing the 476 to reset.

Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Ivan Mikhaylov 20975a0ae1 powerpc/44x/fsp2: Add defconfig for FSP2 board
This patch adds default FSP2 config for main usage.

Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Ivan Mikhaylov 9eec6cb142 powerpc/44x/fsp2: Add device tree for FSP2 board
Add a device tree for FSP2 board (476 based).

Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Ivan Mikhaylov c4b56b023d powerpc/44x/fsp2: Platform support for FSP2 (476fpe) board
Add platform code support for FSP2 (476fpe) board.

Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Gautham R. Shenoy 22c6663dc6 powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1
On Power9 DD1 due to a hardware bug the Power-Saving Level Status
field (PLS) of the PSSCR for a thread waking up from a deep state can
under-report if some other thread in the core is in a shallow stop
state. The scenario in which this can manifest is as follows:

   1) All the threads of the core are in deep stop.
   2) One of the threads is woken up. The PLS for this thread will
      correctly reflect that it is waking up from deep stop.
   3) The thread that has woken up now executes a shallow stop.
   4) When some other thread in the core is woken, its PLS will reflect
      the shallow stop state.

Thus, the subsequent thread for which the PLS is under-reporting the
wakeup state will not restore the hypervisor resources.

Hence, on DD1 systems, use the Requested Level (RL) field as a
workaround to restore the contents of the hypervisor resources on the
wakeup from the stop state.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Akshay Adiga 1e1601b38e powerpc/powernv/idle: Restore SPRs for deep idle states via stop API.
Some of the SPR values (HID0, MSR, SPRG0) don't change during the run
time of a booted kernel, once they have been initialized.

The contents of these SPRs are lost when the CPUs enter deep stop
states. So instead saving and restoring SPRs from the kernel, use the
stop-api provided by the firmware by which the firmware can restore
the contents of these SPRs to their initialized values after wakeup
from a deep stop state.

Apart from these, program the PSSCR value to that of the deepest stop
state via the stop-api. This will be used to indicate to the
underlying firmware as to what stop state to put the threads that have
been woken up by a special-wakeup.

And while we are at programming SPRs via stop-api, ensure that HID1,
HID4 and HID5 registers which are only available on POWER8 are not
requested to be restored by the firware on POWER9.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Gautham R. Shenoy cb0be7ec03 powerpc/powernv/idle: Restore LPCR on wakeup from deep-stop
On wakeup from a deep stop state which is supposed to lose the
hypervisor state, we don't restore the LPCR to the old value but set
it to a "sane" value via cur_cpu_spec->cpu_restore().

The problem is that the "sane" value doesn't include UPRT and the HR
bits which are required to run correctly in Radix mode.

Fix this on POWER9 onwards by restoring the LPCR value whatever it was
before executing the stop instruction.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Gautham R. Shenoy ec48673552 powerpc/powernv/idle: Decouple Timebase restore & Per-core SPRs restore
On POWER8, in case of
   -  nap: both timebase and hypervisor state is retained.
   -  fast-sleep: timebase is lost. But the hypervisor state is retained.
   -  winkle: timebase and hypervisor state is lost.

Hence, the current code for handling exit from a idle state assumes
that if the timebase value is retained, then so is the hypervisor
state. Thus, the current code doesn't restore per-core hypervisor
state in such cases.

But that is no longer the case on POWER9 where we do have stop states
in which timebase value is retained, but the hypervisor state is
lost. So we have to ensure that the per-core hypervisor state gets
restored in such cases.

Fix this by ensuring that even in the case when timebase is retained,
we explicitly check if we are waking up from a deep stop that loses
per-core hypervisor state (indicated by cr4 being eq or gt), and if
this is the case, we restore the per-core hypervisor state.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Gautham R. Shenoy 5f221c3ca1 powerpc/powernv/idle: Correctly initialize core_idle_state_ptr
The lower 8 bits of core_idle_state_ptr tracks the number of non-idle
threads in the core. This is supposed to be initialized to bit-map
corresponding to the threads_per_core. However, currently it is
initialized to PNV_CORE_IDLE_THREAD_BITS (0xFF). This is correct for
POWER8 which has 8 threads per core, but not for POWER9 which has 4
threads per core.

As a result, on POWER9, core_idle_state_ptr gets initialized to
0xFF. In case when all the threads of the core are idle, the bits
corresponding tracking the idle-threads are non-zero. As a result, the
idle entry/exit code fails to save/restore per-core hypervisor state
since it assumes that there are threads in the cores which are still
active.

Fix this by correctly initializing the lower bits of the
core_idle_state_ptr on the basis of threads_per_core.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Anton Blanchard 518470fe96 powerpc: Add HAVE_IRQ_TIME_ACCOUNTING
Allow us to enable IRQ_TIME_ACCOUNTING. Even though we currently
use VIRT_CPU_ACCOUNTING_NATIVE, that option is quite heavy
weight and IRQ_TIME_ACCOUNTING might be better in some cases.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Pavel Machek 70a92003de powerpc/sequoia: Fix NAND partitions not to overlap
Currently the DTS defines two partitions at the same addresses, if you
use one, you will corrupt information on the other one. Fix it by
shifting the second partition.

Signed-off-by: Pavel Machek <pavel@denx.de>
[mpe: Reconstruct change log from email thread]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Andrew Jeffery a3f952df3c powerpc: Tweak copy selection parameter in __copy_tofrom_user_power7()
Experiments with the netperf benchmark indicated that the size selecting
VMX-based copies in __copy_tofrom_user_power7() was suboptimal on POWER8.
Measurements showed that parity was in the neighbourhood of 3328 bytes,
rather than greater than 4096. The change gives a 1.5-2.0% improvement in
performance for 4096-byte buffers, reducing the relative time spent in
__copy_tofrom_user_power7() from approximately 7% to approximately 5% in
the TCP_RR benchmark.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin 09b6c1129f powerpc/xmon: Fix compile error with PPC_8xx=y
Rearrange the code so that mode and badaddr are only defined when
they're used.

Also unsplit the string for easier grepping, and switch from CONFIG_8xx
which is deprecated to CONFIG_PPC_8xx.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin 67d2041808 powerpc/powernv: Fix CPU_HOTPLUG=n idle.c compile error
Fixes: a7cd88da97 ("powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.c")
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin c07e1b8a27 powerpc/64s: Fix OPAL_CALL non-maskable interrupt reentrancy
OPAL_CALL uses SRR[01] with MSR_RI=1, which gets corrupted if there
is an interleaving system reset or machine check interrupt.

Use HSRR[01] instead, which does not require MSR_RI=0.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin f1fe525201 powerpc/64s: Fix FIXUP_ENDIAN non-maskable interrupt reentrancy
FIXUP_ENDIAN uses SRR[01] with MSR_RI=1, which gets corrupted if there
is an interleaving system reset or machine check interrupt.

Set MSR_RI=0 before setting SRRs. The rfid will restore MSR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Paul Mackerras 2f2724630f KVM: PPC: Book3S HV: Cope with host using large decrementer mode
POWER9 introduces a new mode for the decrementer register, called
large decrementer mode, in which the decrementer counter is 56 bits
wide rather than 32, and reads are sign-extended rather than
zero-extended.  For the decrementer, this new mode is optional and
controlled by a bit in the LPCR.  The hypervisor decrementer (HDEC)
is 56 bits wide on POWER9 and has no mode control.

Since KVM code reads and writes the decrementer and hypervisor
decrementer registers in a few places, it needs to be aware of the
need to treat the decrementer value as a 64-bit quantity, and only do
a 32-bit sign extension when large decrementer mode is not in effect.
Similarly, the HDEC should always be treated as a 64-bit quantity on
POWER9.  We define a new EXTEND_HDEC macro to encapsulate the feature
test for POWER9 and the sign extension.

To enable the sign extension to be removed in large decrementer mode,
we test the LPCR_LD bit in the host LPCR image stored in the struct
kvm for the guest.  If is set then large decrementer mode is enabled
and the sign extension should be skipped.

This is partly based on an earlier patch by Oliver O'Halloran.

Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-05-29 16:01:26 +10:00
Al Viro 613763a1f0 take compat_sys_old_getrlimit() to native syscall
... and sanitize the ifdefs in there

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-27 15:38:06 -04:00
Linus Torvalds 6f68a6ae1f powerpc fixes for 4.12 #4
Fix running SPU programs on Cell, and a few other minor fixes.
 
 Thanks to:
   Alistair Popple, Jeremy Kerr, Michael Neuling, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZKURPAAoJEFHr6jzI4aWAhckQAJxZZHt2OMbdNu0PxHdhZgxo
 +eSODIF0jvzBnYs/Pe9aqqrxuONxW+ioclyUIVYLUlHwLjCGf7x2Y5tJe0OmEff6
 ZOaUUcwECKw4cy2UJY6NCGv0nw/8INTDN5xPcQq9M8gExmX6plTmbnQg8Y10ONdQ
 LYu36GWyXF4ygblvLo7kXs8tuZYKozO6kPRqxiQ3zML2dOAyqWqPwpnoWSc6c7oR
 W+z/Vuxe3lTR+QHbfvnSpQhmdVi+WEnwFvgNmIise5R9Jd30Q1f1vES5E089ifmN
 b0Qi5/gkb6YWBkROvxTARFRdmU0/YPNDFWUsuyHJB/Nz1MnqqXx5X+5KpqqinPya
 azVoYW010x2zawm0aX+BF/WeH5ymsl++R84/aO8UR0fA2AIwEOQeLGWZvaZb8CMl
 9vd3NqCJ+diBhgCHiHp80pjD978bqt7Ls1nfbHhYTJ31HRT8Yz/ympWOhLE6rp+t
 kGR+UOHduaZWK3KHoE2WIoUFJuQMvRgFmjoy2G+YaK/PcUc8OA+90v1665fnbk+N
 wmZyAirP39gveHkHXDywqbEjN4CSMgsqrRW/KwPo0b4mf2m3rsIAshO9pBbZRv+P
 evhrAkCYRv5zGbGIYJ/TiEyball+8NQzxzoYmMzq62pjE27gyIe94Sqy80U4zyOC
 RqqUxflOBgMDC8Ufc30u
 =EM32
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix running SPU programs on Cell, and a few other minor fixes.

  Thanks to Alistair Popple, Jeremy Kerr, Michael Neuling, Nicholas
  Piggin"

* tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Add PPC_FEATURE userspace bits for SCV and DARN instructions
  powerpc/spufs: Fix hash faults for kernel regions
  powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N
  powerpc/powernv/npu-dma.c: Fix opal_npu_destroy_context() call
  selftests/powerpc: Fix TM resched DSCR test with some compilers
2017-05-27 09:28:34 -07:00
Sebastian Andrzej Siewior f9a69931c3 powerpc/powernv: Use stop_machine_cpuslocked()
set_subcores_per_core() holds get_online_cpus() while invoking stop_machine().

stop_machine() invokes get_online_cpus() as well. This is correct, but
prevents the conversion of the hotplug locking to a percpu rwsem.

Use stop_machine_cpuslocked() to avoid the nested call. Convert
*_online_cpus() to the new interfaces while at it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170524081548.331016542@linutronix.de
2017-05-26 10:10:41 +02:00
Sebastian Andrzej Siewior 419af25fa4 KVM/PPC/Book3S HV: Use cpuhp_setup_state_nocalls_cpuslocked()
kvmppc_alloc_host_rm_ops() holds get_online_cpus() while invoking
cpuhp_setup_state_nocalls().

cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is
correct, but prevents the conversion of the hotplug locking to a percpu
rwsem.

Use cpuhp_setup_state_nocalls_cpuslocked() to avoid the nested
call. Convert *_online_cpus() to the new interfaces while at it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: kvm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: kvm-ppc@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Alexander Graf <agraf@suse.com>
Link: http://lkml.kernel.org/r/20170524081547.809616236@linutronix.de
2017-05-26 10:10:39 +02:00
Nicholas Piggin a4700a2610 powerpc: Add PPC_FEATURE userspace bits for SCV and DARN instructions
Providing "scv" support to userspace requires kernel support, so it
must be advertised as independently to the base ISA 3 instruction set.

The darn instruction relies on firmware enablement, so it has been
decided to split this out from the core ISA 3 feature as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-25 23:07:45 +10:00
Jeremy Kerr d75e4919cc powerpc/spufs: Fix hash faults for kernel regions
Commit ac29c64089 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
introduced check_pte_access() which denied kernel access to
non-_PAGE_PRIVILEGED pages.

However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
for spufs' kernel accesses, so the DMAs required to establish SPE
memory no longer work.

This change adds _PAGE_PRIVILEGED to the hash fault handler for
kernel accesses.

Fixes: ac29c64089 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Reported-by: Sombat Tragolgosol <sombat3960@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-25 23:07:44 +10:00
Michael Neuling d957fb4d17 powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N
Currently if you disable CONFIG_PPC_RADIX_MMU you'll crash on boot on
a P9. This is because we still set MMU_FTR_TYPE_RADIX via
ibm,pa-features and MMU_FTR_TYPE_RADIX is what's used for code patching
in much of the asm code (ie. slb_miss_realmode)

This patch fixes the problem by stopping MMU_FTR_TYPE_RADIX from being
set from ibm.pa-features.

We may eventually end up removing the CONFIG_PPC_RADIX_MMU option
completely but until then this fixes the issue.

Fixes: 17a3dd2f5f ("powerpc/mm/radix: Use firmware feature to enable Radix MMU")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-25 23:07:44 +10:00
Alistair Popple 415ba3c157 powerpc/powernv/npu-dma.c: Fix opal_npu_destroy_context() call
opal_npu_destroy_context() should be called with the NPU PHB, not the
PCIe PHB.

Fixes: 1ab66d1fba ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-25 23:07:39 +10:00
Thomas Gleixner a8fcfc1917 powerpc: Adjust system_state check
To enable smp_processor_id() and might_sleep() debug checks earlier, it's
required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

Adjust the system_state check in smp_generic_cpu_bootable() to handle the
extra states.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170516184735.359536998@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-23 10:01:35 +02:00
David S. Miller 218b6a5b23 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-05-22 23:32:48 -04:00
David S. Miller 1c4f676a68 net: Define SCM_TIMESTAMPING_PKTINFO on all architectures.
A definition was only provided for asm-generic/socket.h
using platforms, define it for the others as well

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21 23:13:37 -04:00
Linus Torvalds 4217fdde34 KVM fixes for v4.12-rc2
ARM:
  - A fix for a build failure introduced in -rc1 when tracepoints are
    enabled on 32-bit ARM.
  - Disabling use of stack pointer protection in the hyp code which can
    cause panics.
  - A handful of VGIC fixes.
  - A fix to the init of the redistributors on GICv3 systems that
    prevented boot with kvmtool on GICv3 systems introduced in -rc1.
  - A number of race conditions fixed in our MMU handling code.
  - A fix for the guest being able to program the debug extensions for
    the host on the 32-bit side.
 
 PPC:
  - Fixes for build failures with PR KVM configurations.
  - A fix for a host crash that can occur on POWER9 with radix guests.
 
 x86:
  - Fixes for nested PML and nested EPT.
  - A fix for crashes caused by reserved bits in SSE MXCSR that could
    have been set by userspace.
  - An optimization of halt polling that fixes high CPU overhead.
  - Fixes for four reports from Dan Carpenter's static checker.
  - A protection around code that shouldn't have been preemptible.
  - A fix for port IO emulation.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJZHzY3AAoJEED/6hsPKofocI8H/AiOHXi6AC/3s9Ok3IbN/Wp6
 +xSm1yqgxitGhpmKIJQyKMUTV0t8SblRV2nxvW7/MEyfl7vztiyWENaVFc6pO6N7
 GbnLvdImZ9aypoBaxVOY8WG/CHw2XZ7oUYyBIGrWECH3k+fptBNdISFK3D76+4G2
 +tAuWSpKSQFwjGxtreUSlnvQBp6Tjh/PqTyxslPs4zYCL6UPKSSVAoxy4yOKj3AX
 G03tx/1U1n/hSJHub9RFqho4dhVGT/p3V6oppZmS1g/ZqGPQwK1wxlYquHOtORFR
 Iq8LdkNQwTdkLlTTOG+tamYSfzn0+KhczfWjIh6ZEb79ARrUSnBU4Awpvom1C2A=
 =B6Rl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "ARM:
   - a fix for a build failure introduced in -rc1 when tracepoints are
     enabled on 32-bit ARM.

   - disable use of stack pointer protection in the hyp code which can
     cause panics.

   - a handful of VGIC fixes.

   - a fix to the init of the redistributors on GICv3 systems that
     prevented boot with kvmtool on GICv3 systems introduced in -rc1.

   - a number of race conditions fixed in our MMU handling code.

   - a fix for the guest being able to program the debug extensions for
     the host on the 32-bit side.

  PPC:
   - fixes for build failures with PR KVM configurations.

   - a fix for a host crash that can occur on POWER9 with radix guests.

  x86:
   - fixes for nested PML and nested EPT.

   - a fix for crashes caused by reserved bits in SSE MXCSR that could
     have been set by userspace.

   - an optimization of halt polling that fixes high CPU overhead.

   - fixes for four reports from Dan Carpenter's static checker.

   - a protection around code that shouldn't have been preemptible.

   - a fix for port IO emulation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
  KVM: x86: prevent uninitialized variable warning in check_svme()
  KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
  KVM: x86: zero base3 of unusable segments
  KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
  KVM: x86: Fix potential preemption when get the current kvmclock timestamp
  KVM: Silence underflow warning in avic_get_physical_id_entry()
  KVM: arm/arm64: Hold slots_lock when unregistering kvm io bus devices
  KVM: arm/arm64: Fix bug when registering redist iodevs
  KVM: x86: lower default for halt_poll_ns
  kvm: arm/arm64: Fix use after free of stage2 page table
  kvm: arm/arm64: Force reading uncached stage2 PGD
  KVM: nVMX: fix EPT permissions as reported in exit qualification
  KVM: VMX: Don't enable EPT A/D feature if EPT feature is disabled
  KVM: x86: Fix load damaged SSEx MXCSR register
  kvm: nVMX: off by one in vmx_write_pml_buffer()
  KVM: arm: rename pm_fake handler to trap_raz_wi
  KVM: arm: plug potential guest hardware debug leakage
  kvm: arm/arm64: Fix race in resetting stage2 PGD
  KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2 registers
  KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interrupt
  ...
2017-05-19 15:13:13 -07:00
Linus Torvalds f538a82c07 ARM: SoC fixes (and a cross-arch dt-include fix)
We had a small batch of fixes before -rc1, but here is a larger one. It
 contains a backmerge of 4.12-rc1 since some of the downstream branches we
 merge had that as base; at the same time we already had merged contents
 before -rc1 and rebase wasn't the right solution.
 
 A mix of random smaller fixes and a few things worth pointing out:
 
  - We've started telling people to avoid cross-tree shared branches if all
    they're doing is picking up one or two DT-used constants from a
    shared include file, and instead to use the numeric values on first
    submission. Follow-up moving over to symbolic names are sent in right
    after -rc1, i.e. here. It's only a few minor patches of this type.
 
  - Linus Walleij and others are resurrecting the 'Gemini' platform, and
    wanted a cut-down platform-specific defconfig for it. So I picked that
    up for them.
 
  - Rob Herring ran 'savedefconfig' on arm64, it's a bit churny but it helps
    people to prepare patches since it's a pain when defconfig and current
    savedefconfig contents differs too much.
 
  - Devicetree additions for some pinctrl drivers for Armada that were
    merged this window. I'd have preferred to see those earlier but it's not
    a huge deail.
 
 The biggest change worth pointing out though since it's touching other
 parts of the tree: We added prefixes to be used when cross-including
 DT contents between arm64 and arm, allowing someone to #include
 <arm/foo.dtsi> from arm64, and likewise. As part of that, we needed
 arm/foo.dtsi to work on arm as well. The way I suggested this to Heiko
 resulted in a recursive symlink.
 
 Instead, I've now moved it out of arch/*/boot/dts/include, into a shared
 location under scripts/dtc. While I was at it, I consolidated so all
 architectures now behave the same way in this manner.
 
 Rob Herring (DT maintainer) has acked it. I cc:d most other arch
 maintainers but nobody seems to care much; it doesn't really affect them
 since functionality is unchanged for them by default.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZH0gEAAoJEIwa5zzehBx3eCcQAJX55nWjTV/ankFWyaQiXZx1
 JhcThxugqPYviYFFTpI3LZnZ0snWbZBNfkoju8ukmzIiqoO/eDlB+LVz6PVWfCIl
 4egZZZF1tgxEFoQQ71WKpF1hj0pKccCugHX+5uBDID3s9vjxgQS1Gf1G5ZeFrqbd
 m9brxbouGsZMscuWb59K7ayIXO6D4C2hqQqJtGrOZc2jfLs9rZBchDVSQ28sRNQy
 qXIcAgH+D1QWfbAi0+cI6opnWmEdcofO5Uge8KzK1wO0HYzO5GQJw1KbM/AAJ7+Y
 JtPEWhuUKl8aou6515rFPD7yjFaMtfbL0+0UeKS2TRGz+dSCoSs1kTyJ4cpNAUCT
 E3hOLYKzq8rbxcGwEqfp4JjktpWSPGGhEbp4lvNV1gk9A0MLHPnidLCKSoLyCkN0
 3qmmlrt4hSCpF07IvY7hWUALHIOsRPtIdbaOMzAyzcWkzu/DMmQ3lFdt7Bgi3AbB
 j0Phtz0TR7X6A/1gAxZDGjHaYaEG6KR9ufJMyCNtgGUaKeMZakthbYSz8MdXIq5X
 zKqL2ZyPKNq6zHZbvc3yIiYmVKubT9t+8Wc4AjXPNdWgR455V0GSlmf3XCA8rAp7
 hISzE4CD4N/YIKNPukt4kcJY7TBpcOZxfquMfBxLEqke+GxJL80CGaOf8iZb3ipM
 R697L88FstLhSNhEl/gu
 =2EGB
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "We had a small batch of fixes before -rc1, but here is a larger one.
  It contains a backmerge of 4.12-rc1 since some of the downstream
  branches we merge had that as base; at the same time we already had
  merged contents before -rc1 and rebase wasn't the right solution.

  A mix of random smaller fixes and a few things worth pointing out:

   - We've started telling people to avoid cross-tree shared branches if
     all they're doing is picking up one or two DT-used constants from a
     shared include file, and instead to use the numeric values on first
     submission. Follow-up moving over to symbolic names are sent in
     right after -rc1, i.e. here. It's only a few minor patches of this
     type.

   - Linus Walleij and others are resurrecting the 'Gemini' platform,
     and wanted a cut-down platform-specific defconfig for it. So I
     picked that up for them.

   - Rob Herring ran 'savedefconfig' on arm64, it's a bit churny but it
     helps people to prepare patches since it's a pain when defconfig
     and current savedefconfig contents differs too much.

   - Devicetree additions for some pinctrl drivers for Armada that were
     merged this window. I'd have preferred to see those earlier but
     it's not a huge deail.

  The biggest change worth pointing out though since it's touching other
  parts of the tree: We added prefixes to be used when cross-including
  DT contents between arm64 and arm, allowing someone to #include
  <arm/foo.dtsi> from arm64, and likewise. As part of that, we needed
  arm/foo.dtsi to work on arm as well. The way I suggested this to Heiko
  resulted in a recursive symlink.

  Instead, I've now moved it out of arch/*/boot/dts/include, into a
  shared location under scripts/dtc. While I was at it, I consolidated
  so all architectures now behave the same way in this manner.

  Rob Herring (DT maintainer) has acked it. I cc:d most other arch
  maintainers but nobody seems to care much; it doesn't really affect
  them since functionality is unchanged for them by default"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (29 commits)
  arm64: dts: rockchip: fix include reference
  firmware: ti_sci: fix strncat length check
  ARM: remove duplicate 'const' annotations'
  arm64: defconfig: enable options needed for QCom DB410c board
  arm64: defconfig: sync with savedefconfig
  ARM: configs: add a gemini defconfig
  devicetree: Move include prefixes from arch to separate directory
  ARM: dts: dra7: Reduce cpu thermal shutdown temperature
  memory: omap-gpmc: Fix debug output for access width
  ARM: dts: LogicPD Torpedo: Fix camera pin mux
  ARM: dts: omap4: enable CEC pin for Pandaboard A4 and ES
  ARM: dts: gta04: fix polarity of clocks for mcbsp4
  ARM: dts: dra7: Add power hold and power controller properties to palmas
  soc: imx: add PM dependency for IMX7_PM_DOMAINS
  ARM: dts: imx6sx-sdb: Remove OPP override
  ARM: dts: imx53-qsrb: Pulldown PMIC IRQ pin
  soc: bcm: brcmstb: Correctly match 7435 SoC
  tee: add ARM_SMCCC dependency
  ARM: omap2+: make omap4_get_cpu1_ns_pa_addr declaration usable
  ARM64: dts: mediatek: configure some fixed mmc parameters
  ...
2017-05-19 13:36:56 -07:00
Olof Johansson d5d332d3f7 devicetree: Move include prefixes from arch to separate directory
We use a directory under arch/$ARCH/boot/dts as an include path
that has links outside of the subtree to find dt-bindings from under
include/dt-bindings. That's been working well, but new DT architectures
haven't been adding them by default.

Recently there's been a desire to share some of the DT material between
arm and arm64, which originally caused developers to create symlinks or
relative includes between the subtrees. This isn't ideal -- it breaks
if the DT files aren't stored in the exact same hierarchy as the kernel
tree, and generally it's just icky.

As a somewhat cleaner solution we decided to add a $ARCH/ prefix link
once, and allow DTS files to reference dtsi (and dts) files in other
architectures that way.

Original approach was to create these links under each architecture,
but it lead to the problem of recursive symlinks.

As a remedy, move the include link directories out of the architecture
trees into a common location. At the same time, they can now share one
directory and one dt-bindings/ link as well.

Fixes: 4027494ae6 ('ARM: dts: add arm/arm64 include symlinks')
Reported-by: Russell King <linux@armlinux.org.uk>
Reported-by: Omar Sandoval <osandov@osandov.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Rob Herring <robh@kernel.org>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linux-arch <linux-arch@vger.kernel.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2017-05-18 23:55:48 -07:00
Michael Ellerman e41e53cd4f powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash
virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on
an address. What this means in practice is that it should only return true for
addresses in the linear mapping which are backed by a valid PFN.

We are failing to properly check that the address is in the linear mapping,
because virt_to_pfn() will return a valid looking PFN for more or less any
address. That bug is actually caused by __pa(), used in virt_to_pfn().

eg: __pa(0xc000000000010000) = 0x10000  # Good
    __pa(0xd000000000010000) = 0x10000  # Bad!
    __pa(0x0000000000010000) = 0x10000  # Bad!

This started happening after commit bdbc29c19b ("powerpc: Work around gcc
miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition
of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from
the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give
you something bogus back.

Until we can verify if that GCC bug is no longer an issue, or come up with
another solution, this commit does the minimal fix to make virt_addr_valid()
work, by explicitly checking that the address is in the linear mapping region.

Fixes: bdbc29c19b ("powerpc: Work around gcc miscompilation of __pa() on 64-bit")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Breno Leitao <breno.leitao@gmail.com>
2017-05-19 13:04:35 +10:00
Holger Brunck 1f6753d68e powerpc/85xx/kmcent2: use hdlc busmode for UCC1
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-18 10:28:39 -04:00
Michael Ellerman bfb9956ab4 powerpc/mm: Fix crash in page table dump with huge pages
The page table dump code doesn't know about huge pages, so currently
it crashes (or walks random memory, usually leading to a crash), if it
finds a huge page. On Book3S we only see huge pages in the Linux page
tables when we're using the P9 Radix MMU.

Teaching the code to properly handle huge pages is a bit more involved,
so for now just prevent the crash.

Cc: stable@vger.kernel.org # v4.10+
Fixes: 8eb07b1870 ("powerpc/mm: Dump linux pagetables")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-17 11:56:33 +10:00
Al Viro 8298525839 kill strlen_user()
no callers, no consistent semantics, no sane way to use it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-15 23:40:22 -04:00
Naveen N. Rao d04c02f8aa powerpc/kprobes: Fix handling of instruction emulation on probe re-entry
Commit 22d8b3dec2 ("powerpc/kprobes: Emulate instructions on kprobe
handler re-entry") enabled emulating instructions on kprobe re-entry,
rather than single-stepping always. However, we didn't update the single
stepping code to only be run if the emulation fails. Also, we missed
re-enabling preemption if the instruction emulation was successful. Fix
those issues.

Fixes: 22d8b3dec2 ("powerpc/kprobes: Emulate instructions on kprobe handler re-entry")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-16 13:11:07 +10:00
Gautham R. Shenoy bbb075ddf7 powerpc/powernv: Set NAPSTATELOST after recovering paca on P9 DD1
Commit 17ed4c8f81 ("powerpc/powernv: Recover correct PACA on wakeup
from a stop on P9 DD1") promises to set the NAPSTATELOST bit in paca
after recovering the correct paca for the thread waking up from stop1
on DD1, so that the GPRs can be correctly restored on the stop exit
path. However, it loads the value 1 into r3, but stores the value in
r0 into NAPSTATELOST(r13).

Fix this by correctly set the NAPSTATELOST bit in paca after
recovering the paca on POWER9 DD1.

Fixes: 17ed4c8f81 ("powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-16 13:03:21 +10:00
Radim Krčmář 65acb891aa Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
- fix build failures with PR KVM configurations
- fix a host crash that can occur on POWER9 with radix guests
2017-05-15 14:38:56 +02:00
Michael Neuling f48e91e87e powerpc/tm: Fix FP and VMX register corruption
In commit dc3106690b ("powerpc: tm: Always use fp_state and vr_state
to store live registers"), a section of code was removed that copied
the current state to checkpointed state. That code should not have been
removed.

When an FP (Floating Point) unavailable is taken inside a transaction,
we need to abort the transaction. This is because at the time of the
tbegin, the FP state is bogus so the state stored in the checkpointed
registers is incorrect. To fix this, we treclaim (to get the
checkpointed GPRs) and then copy the thread_struct FP live state into
the checkpointed state. We then trecheckpoint so that the FP state is
correctly restored into the CPU.

The copying of the FP registers from live to checkpointed is what was
missing.

This simplifies the logic slightly from the original patch.
tm_reclaim_thread() will now always write the checkpointed FP
state. Either the checkpointed FP state will be written as part of
the actual treclaim (in tm.S), or it'll be a copy of the live
state. Which one we use is based on MSR[FP] from userspace.

Similarly for VMX.

Fixes: dc3106690b ("powerpc: tm: Always use fp_state and vr_state to store live registers")
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: cyrilbur@gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-15 19:31:38 +10:00
Michael Ellerman 43e24e82f3 powerpc/modules: If mprofile-kernel is enabled add it to vermagic
On powerpc we can build the kernel with two different ABIs for mcount(), which
is used by ftrace. Kernels built with one ABI do not know how to load modules
built with the other ABI. The new style ABI is called "mprofile-kernel", for
want of a better name.

Currently if we build a module using the old style ABI, and the kernel with
mprofile-kernel, when we load the module we'll oops something like:

  # insmod autofs4-no-mprofile-kernel.ko
  ftrace-powerpc: Unexpected instruction f8810028 around bl _mcount
  ------------[ cut here ]------------
  WARNING: CPU: 6 PID: 3759 at ../kernel/trace/ftrace.c:2024 ftrace_bug+0x2b8/0x3c0
  CPU: 6 PID: 3759 Comm: insmod Not tainted 4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 #11
  ...
  NIP [c0000000001eaa48] ftrace_bug+0x2b8/0x3c0
  LR [c0000000001eaff8] ftrace_process_locs+0x4a8/0x590
  Call Trace:
    alloc_pages_current+0xc4/0x1d0 (unreliable)
    ftrace_process_locs+0x4a8/0x590
    load_module+0x1c8c/0x28f0
    SyS_finit_module+0x110/0x140
    system_call+0x38/0xfc
  ...
  ftrace failed to modify
  [<d000000002a31024>] 0xd000000002a31024
   actual:   35:65:00:48

We can avoid this by including in the vermagic whether the kernel/module was
built with mprofile-kernel. Which results in:

  # insmod autofs4-pg.ko
  autofs4: version magic
  '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 SMP mod_unload modversions '
  should be
  '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269-dirty SMP mod_unload modversions mprofile-kernel'
  insmod: ERROR: could not insert module autofs4-pg.ko: Invalid module format

Fixes: 8c50b72a3b ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-15 19:31:38 +10:00
Linus Torvalds dc2a248166 powerpc updates for 4.12 part 2
Highlights include:
 
  - rework the Linux page table geometry to lower memory usage on 64-bit Book3S
    (IBM chips) using the Hash MMU.
 
  - support for a new device tree binding for discovering CPU features on future
    firmwares.
 
  - Freescale updates from Scott: "Includes a fix for a powerpc/next mm regression
    on 64e, a fix for a kernel hang on 64e when using a debugger inside a
    relocated kernel, a qman fix, and misc qe improvements."
 
 Thanks to:
   Christophe Leroy, Gavin Shan, Horia Geantă, LiuHailong, Nicholas Piggin, Roy
   Pledge, Scott Wood, Valentin Longchamp.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZFXjPAAoJEFHr6jzI4aWAgG4QAJoF7G5Txj0Du2I2/wQDkVq1
 InJ+BNji0xnOrFpz2EcIIlbIwBeJbY9cSIbmKUEPQU4hxtQgI8Q5WNEl2btWq8xz
 I0Ej3uc5obc9ltUdQoGxgXih/XDd8UN3fscSE2/SSuPY/A7JwAVZMsCEJ1tWdxpM
 hx+R9wlaUT3I6jmQwj9gg6zuBdIOL5szvZXKh9ruPKNyZWbPmPSUwIqiyT0YHsiD
 01OZsFYpdSH6Ka/eNHSNx5HC+kK8aDVaqd5E2fkHeH9+sxerpEzMo2PmK4T8vChh
 mSD4nhfqRwC2WRpPF/MY+zGBeXrFkCkR+nYhaqVDXXACKzfHgU58NOfvrmtRj52X
 vTW+cn92wqFTmi0TNUfhEFt8elcOO7/fKh1OVhsFx+bD+bgj8G1ZkLoBU/0QUzRf
 R4hiKKuOMnDHriNPdlAOKjHpR+ewh8Q679INThEJzEQpn7VBY72hcQwapQ3MjMnd
 E7LfsGwqGPkTc6gy1bFbWum5HMGOcmE0qkrnZo5VyFhNNwBs1Kx/B1GHjUOiucVu
 km5GEVNTfCkZqeabdca7fwbGcMH7zchR1ootqH2m18PZJAzr85A+aTqfrdJ5fDBs
 v/nznfcPVNEgvEW0im2jhpPoAlQE6/YvYa+kG4zjjxWA5FKVKdTzINexD82jlcqP
 +fDtIDxNcFkzlt4gacjh
 =YOQs
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:
 "The change to the Linux page table geometry was delayed for more
  testing with 16G pages, and there's the new CPU features stuff which
  just needed one more polish before going in. Plus a few changes from
  Scott which came in a bit late. And then various fixes, mostly minor.

  Summary highlights:

   - rework the Linux page table geometry to lower memory usage on
     64-bit Book3S (IBM chips) using the Hash MMU.

   - support for a new device tree binding for discovering CPU features
     on future firmwares.

   - Freescale updates from Scott:
      "Includes a fix for a powerpc/next mm regression on 64e, a fix for
       a kernel hang on 64e when using a debugger inside a relocated
       kernel, a qman fix, and misc qe improvements."

  Thanks to: Christophe Leroy, Gavin Shan, Horia Geantă, LiuHailong,
  Nicholas Piggin, Roy Pledge, Scott Wood, Valentin Longchamp"

* tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Support new device tree binding for discovering CPU features
  powerpc: Don't print cpu_spec->cpu_name if it's NULL
  of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle
  powerpc/64s: Fix unnecessary machine check handler relocation branch
  powerpc/mm/book3s/64: Rework page table geometry for lower memory usage
  powerpc: Fix distclean with Makefile.postlink
  powerpc/64e: Don't place the stack beyond TASK_SIZE
  powerpc/powernv: Block PCI config access on BCM5718 during EEH recovery
  powerpc/8xx: Adding support of IRQ in MPC8xx GPIO
  soc/fsl/qbman: Disable IRQs for deferred QBMan work
  soc/fsl/qe: add EXPORT_SYMBOL for the 2 qe_tdm functions
  soc/fsl/qe: only apply QE_General4 workaround on affected SoCs
  soc/fsl/qe: round brg_freq to 1kHz granularity
  soc/fsl/qe: get rid of immrbar_virt_to_phys()
  net: ethernet: ucc_geth: fix MEM_PART_MURAM mode
  powerpc/64e: Fix hang when debugging programs with relocated kernel
2017-05-12 10:04:09 -07:00
Paul Mackerras 76d837a4c0 KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms
Commit e91aa8e6ec ("KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64
permanently", 2017-03-22) enabled the SPAPR TCE code for all 64-bit
Book 3S kernel configurations in order to simplify the code and
reduce #ifdefs.  However, 64-bit Book 3S PPC platforms other than
pseries and powernv don't implement the necessary IOMMU callbacks,
leading to build failures like the following (for a pasemi config):

scripts/kconfig/conf  --silentoldconfig Kconfig
warning: (KVM_BOOK3S_64) selects SPAPR_TCE_IOMMU which has unmet direct dependencies (IOMMU_SUPPORT && (PPC_POWERNV || PPC_PSERIES))

...

  CC [M]  arch/powerpc/kvm/book3s_64_vio.o
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c: In function ‘kvmppc_clear_tce’:
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c:363:2: error: implicit declaration of function ‘iommu_tce_xchg’ [-Werror=implicit-function-declaration]
  iommu_tce_xchg(tbl, entry, &hpa, &dir);
  ^

To fix this, we make the inclusion of the SPAPR TCE support, and the
code that uses it in book3s_vio.c and book3s_vio_hv.c, depend on
the inclusion of support for the pseries and/or powernv platforms.
This means that when running a 'pseries' guest on those platforms,
the guest won't have in-kernel acceleration of the PAPR TCE hypercalls,
but at least now they compile.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-05-12 15:32:30 +10:00
Paul Mackerras 67325e988f KVM: PPC: Book3S PR: Check copy_to/from_user return values
The PR KVM implementation of the PAPR HPT hypercalls (H_ENTER etc.)
access an image of the HPT in userspace memory using copy_from_user
and copy_to_user.  Recently, the declarations of those functions were
annotated to indicate that the return value must be checked.  Since
this code doesn't currently check the return value, this causes
compile warnings like the ones shown below, and since on PPC the
default is to compile arch/powerpc with -Werror, this causes the
build to fail.

To fix this, we check the return values, and if non-zero, fail the
hypercall being processed with a H_FUNCTION error return value.
There is really no good error return value to use since PAPR didn't
envisage the possibility that the hypervisor may not be able to access
the guest's HPT, and H_FUNCTION (function not supported) seems as
good as any.

The typical compile warnings look like this:

  CC      arch/powerpc/kvm/book3s_pr_papr.o
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c: In function ‘kvmppc_h_pr_enter’:
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c:53:2: error: ignoring return value of ‘copy_from_user’, declared with attribute warn_unused_result [-Werror=unused-result]
  copy_from_user(pteg, (void __user *)pteg_addr, sizeof(pteg));
  ^
/home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c:74:2: error: ignoring return value of ‘copy_to_user’, declared with attribute warn_unused_result [-Werror=unused-result]
  copy_to_user((void __user *)pteg_addr, hpte, HPTE_SIZE);
  ^

... etc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-05-12 15:09:03 +10:00
Paul Mackerras acde25726b KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers
POWER9 running a radix guest will take some hypervisor interrupts
without going to real mode (turning off the MMU).  This means that
early hypercall handlers may now be called in virtual mode.  Most of
the handlers work just fine in both modes, but there are some that
can crash the host if called in virtual mode, notably the TCE (IOMMU)
hypercalls H_PUT_TCE, H_STUFF_TCE and H_PUT_TCE_INDIRECT.  These
already have both a real-mode and a virtual-mode version, so we
arrange for the real-mode version to return H_TOO_HARD for radix
guests, which will result in the virtual-mode version being called.

The other hypercall which is sensitive to the MMU mode is H_RANDOM.
It doesn't have a virtual-mode version, so this adds code to enable
it to be called in either mode.

An alternative solution was considered which would refuse to call any
of the early hypercall handlers when doing a virtual-mode exit from a
radix guest.  However, the XICS-on-XIVE code depends on the XICS
hypercalls being handled early even for virtual-mode exits, because
the handlers need to be called before the XIVE vCPU state has been
pulled off the hardware.  Therefore that solution would have become
quite invasive and complicated, and was rejected in favour of the
simpler, though less elegant, solution presented here.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-05-12 15:08:09 +10:00
Linus Torvalds 791a9a666d Kbuild UAPI header export updates for v4.12
Improvement of headers_install by Nicolas Dichtel.
 
 It has been long since the introduction of uapi directories,
 but the de-coupling of exported headers has not been completed.
 Headers listed in header-y are exported whether they exist in
 uapi directories or not.  His work fixes this inconsistency.
 
 All (and only) headers under uapi directories are now exported.
 The asm-generic wrappers are still exceptions, but this is a big
 step forward.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZE7MBAAoJED2LAQed4NsGroAP/iARejrIFmxuH96D5h2aiP1j
 c8KHQ+5fuq4w2KBmfbfkNvWbazlVheT6RrYWBUh/GABGsSqQC07d8New6B8TaUkE
 K0E48RsuYxouP18Ys6BOO4/zyRhEFD7Ta72PGQ/gDQY+6hAu4jYQnMdG0wipTblS
 QWgnUxTqfCbTjnRpRKXpcwRff+OeTWtOv3s0V8UashJUxnFVQ7Br2uRsm/KKkU/k
 jQC65KyHL4HlsFeeAiMmQ9IQPVwLsd6+d5crs0nydHaJ2XrFlNNQ7EEMyG8FxPdx
 9b/VpS+XY6DO+jeqkcpFrdL9IgcmCn72Qc5/4vrHuQO2dpWW5mVaVPq9RAGP0Yq/
 FB0vZRTp/tOIkD+0esirZW2gJtU3DWMY1A9rc5jjLRabdnRXVTdLfhEnksYJEfES
 yPbDEuKyzo6a+zBSqNtMquJPmYVYEDS2mcmgxY5sB58qtXkUN2Yr+uUALxC8XhXW
 SHHwIAV3a+UX5ZU9Ys8dp2hI4EXYXtdvsz2zvl4qPIn/Q9d1YoEJRe7/Y0p8gBXM
 5pVJ1yohKoYrNZVGBe0LO/gHGVAVgMj0cKn0Xg51bbvjxY2U5djUbMY0uw1gFrrM
 O9ld3C6O8zH5BsExCfwp9iPz2SW5W9N80kgnKfjCHBRUKuMTkm02DJf8Hx+pyfVQ
 DCy9lYTi76IgZ1uflKq9
 =Rqdo
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild UAPI updates from Masahiro Yamada:
 "Improvement of headers_install by Nicolas Dichtel.

  It has been long since the introduction of uapi directories, but the
  de-coupling of exported headers has not been completed. Headers listed
  in header-y are exported whether they exist in uapi directories or
  not. His work fixes this inconsistency.

  All (and only) headers under uapi directories are now exported. The
  asm-generic wrappers are still exceptions, but this is a big step
  forward"

* tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  arch/include: remove empty Kbuild files
  uapi: export all arch specifics directories
  uapi: export all headers under uapi directories
  smc_diag.h: fix include from userland
  btrfs_tree.h: fix include from userland
  uapi: includes linux/types.h before exporting files
  Makefile.headersinst: remove destination-y option
  Makefile.headersinst: cleanup input files
  x86: stop exporting msr-index.h to userland
  nios2: put setup.h in uapi
  h8300: put bitsperlong.h in uapi
2017-05-10 20:45:36 -07:00
Linus Torvalds 5ccd414080 Second round of KVM Changes for v4.12:
* ARM: bugfixes; moved shared 32-bit/64-bit files to virt/kvm/arm;
 support for saving/restoring virtual ITS state to userspace
 
 * PPC: XIVE (eXternal Interrupt Virtualization Engine) support
 
 * x86: nVMX improvements, including emulated page modification logging
 (PML) which brings nice performance improvements on some workloads
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZEeusAAoJEL/70l94x66Dq+cH/RkL9znP717k7Z0jS8/FJN9q
 wKU8j0jRxuqjnvEu89redfFKxElWM9T1fwReBObjWct9+hyJ9Pbpf95Lr9ca39PR
 zhiBMKl79he0gHV/z48ItuzH1mOrU/KzFfxHYLlBd4oGw0ZdUttWAsUtaWQ8UNFo
 xtyu2R+CWYLeAUBrpYmkvrOjhnB+S9+f4y2OY9pXsMg4HN9/Tdn0B656yTOWdu9C
 onO3QQXNY/dzGFLH3tA/kAbz25x4Y+pP2UHlMm5vkW8XWPn+lluUwtBonTKdzy64
 RDWWUWcs0k37ps4H9b56oXmz8ZFZ0FQF3MimDQueGHSYOXCxU5EqmC9c7KZmZrg=
 =KcCv
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "ARM:
   - bugfixes
   - moved shared 32-bit/64-bit files to virt/kvm/arm
   - support for saving/restoring virtual ITS state to userspace

  PPC:
   - XIVE (eXternal Interrupt Virtualization Engine) support

  x86:
   - nVMX improvements, including emulated page modification logging
     (PML) which brings nice performance improvements on some workloads"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits)
  KVM: arm/arm64: vgic-its: Cleanup after failed ITT restore
  KVM: arm/arm64: Don't call map_resources when restoring ITS tables
  KVM: arm/arm64: Register ITS iodev when setting base address
  KVM: arm/arm64: Get rid of its->initialized field
  KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs
  KVM: arm/arm64: Slightly rework kvm_vgic_addr
  KVM: arm/arm64: Make vgic_v3_check_base more broadly usable
  KVM: arm/arm64: Refactor vgic_register_redist_iodevs
  KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus
  nVMX: Advertise PML to L1 hypervisor
  nVMX: Implement emulated Page Modification Logging
  kvm: x86: Add a hook for arch specific dirty logging emulation
  kvm: nVMX: Validate CR3 target count on nested VM-entry
  KVM: set no_llseek in stat_fops_per_vm
  KVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enable
  KVM: arm/arm64: Clarification and relaxation to ITS save/restore ABI
  KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES
  KVM: arm64: vgic-its: Fix pending table sync
  KVM: arm64: vgic-its: ITT save and restore
  KVM: arm64: vgic-its: Device table save/restore
  ...
2017-05-10 11:29:23 -07:00
Linus Torvalds de4d195308 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
2017-05-10 10:30:46 -07:00
Nicolas Dichtel fcc8487d47 uapi: export all headers under uapi directories
Regularly, when a new header is created in include/uapi/, the developer
forgets to add it in the corresponding Kbuild file. This error is usually
detected after the release is out.

In fact, all headers under uapi directories should be exported, thus it's
useless to have an exhaustive list.

After this patch, the following files, which were not exported, are now
exported (with make headers_install_all):
asm-arc/kvm_para.h
asm-arc/ucontext.h
asm-blackfin/shmparam.h
asm-blackfin/ucontext.h
asm-c6x/shmparam.h
asm-c6x/ucontext.h
asm-cris/kvm_para.h
asm-h8300/shmparam.h
asm-h8300/ucontext.h
asm-hexagon/shmparam.h
asm-m32r/kvm_para.h
asm-m68k/kvm_para.h
asm-m68k/shmparam.h
asm-metag/kvm_para.h
asm-metag/shmparam.h
asm-metag/ucontext.h
asm-mips/hwcap.h
asm-mips/reg.h
asm-mips/ucontext.h
asm-nios2/kvm_para.h
asm-nios2/ucontext.h
asm-openrisc/shmparam.h
asm-parisc/kvm_para.h
asm-powerpc/perf_regs.h
asm-sh/kvm_para.h
asm-sh/ucontext.h
asm-tile/shmparam.h
asm-unicore32/shmparam.h
asm-unicore32/ucontext.h
asm-x86/hwcap2.h
asm-xtensa/kvm_para.h
drm/armada_drm.h
drm/etnaviv_drm.h
drm/vgem_drm.h
linux/aspeed-lpc-ctrl.h
linux/auto_dev-ioctl.h
linux/bcache.h
linux/btrfs_tree.h
linux/can/vxcan.h
linux/cifs/cifs_mount.h
linux/coresight-stm.h
linux/cryptouser.h
linux/fsmap.h
linux/genwqe/genwqe_card.h
linux/hash_info.h
linux/kcm.h
linux/kcov.h
linux/kfd_ioctl.h
linux/lightnvm.h
linux/module.h
linux/nbd-netlink.h
linux/nilfs2_api.h
linux/nilfs2_ondisk.h
linux/nsfs.h
linux/pr.h
linux/qrtr.h
linux/rpmsg.h
linux/sched/types.h
linux/sed-opal.h
linux/smc.h
linux/smc_diag.h
linux/stm.h
linux/switchtec_ioctl.h
linux/vfio_ccw.h
linux/wil6210_uapi.h
rdma/bnxt_re-abi.h

Note that I have removed from this list the files which are generated in every
exported directories (like .install or .install.cmd).

Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
subdirs with a pure makefile command.

For the record, note that exported files for asm directories are a mix of
files listed by:
 - include/uapi/asm-generic/Kbuild.asm;
 - arch/<arch>/include/uapi/asm/Kbuild;
 - arch/<arch>/include/asm/Kbuild.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-05-11 00:21:54 +09:00
Nicholas Piggin 5a61ef74f2 powerpc/64s: Support new device tree binding for discovering CPU features
The ibm,powerpc-cpu-features device tree binding describes CPU features with
ASCII names and extensible compatibility, privilege, and enablement metadata
that allows improved flexibility and compatibility with new hardware.

The interface is described in detail in ibm,powerpc-cpu-features.txt in this
patch.

Currently this code is not enabled by default, and there are no released
firmwares that provide the binding.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-09 23:42:55 +10:00
Nicholas Piggin 75bda95048 powerpc: Don't print cpu_spec->cpu_name if it's NULL
Currently we assume that if the cpu_spec has a pvr_mask then it must also have a
cpu_name. But that will change in a subsequent commit when we do CPU feature
discovery via the device tree, so check explicitly if cpu_name is NULL.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-09 22:56:03 +10:00
Michael Ellerman 0b382fb3d9 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Includes a fix for a powerpc/next mm regression on 64e, a fix for a
kernel hang on 64e when using a debugger inside a relocated kernel, a
qman fix, and misc qe improvements."
2017-05-09 22:54:35 +10:00
Paolo Bonzini 4415b33528 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
The main thing here is a new implementation of the in-kernel
XICS interrupt controller emulation for POWER9 machines, from Ben
Herrenschmidt.

POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
Virtualization Engine) which is able to deliver interrupts directly
to guest virtual CPUs in hardware without hypervisor intervention.
With this new code, the guest still sees the old XICS interface but
performance is better because the XICS emulation in the host uses the
XIVE directly rather than going through a XICS emulation in firmware.

Conflicts:
	arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix]
	arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
2017-05-09 11:50:01 +02:00
Nicholas Piggin 6102c005a7 powerpc/64s: Fix unnecessary machine check handler relocation branch
Similarly to commit 2563a70c3b ("powerpc/64s: Remove unnecessary relocation
branch from idle handler"), the machine check handler has a BRANCH_TO from
relocated to relocated code, which is unnecessary.

It has also caused build errors with some toolchains:

  arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
  arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
  (0xffffffffffff8280 is not between 0x0000000000000000 and
  0x000000000000ffff)

Fixes: 1945bc4549 ("powerpc/64s: Fix POWER9 machine check handler from stop state")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reported-and-tested-by : Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-09 19:25:50 +10:00
Michael Ellerman ba95b5d035 powerpc/mm/book3s/64: Rework page table geometry for lower memory usage
Recently in commit f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB")
we increased the virtual address space for user processes to 128TB by default,
and up to 512TB if user space opts in.

This obviously required expanding the range of the Linux page tables. For Book3s
64-bit using hash and with PAGE_SIZE=64K, we increased the PGD to 2^15 entries.
This meant we could cover the full address range, while still being able to
insert a 16G hugepage at the PGD level and a 16M hugepage in the PMD.

The downside of that geometry is that it uses a lot of memory for the PGD, and
in particular makes the PGD a 4-page allocation, which means it's much more
likely to fail under memory pressure.

Instead we can make the PMD larger, so that a single PUD entry maps 16G,
allowing the 16G hugepages to sit at that level in the tree. We're then able to
split the remaining bits between the PUG and PGD. We make the PGD slightly
larger as that results in lower memory usage for typical programs.

When THP is enabled the PMD actually doubles in size, to 2^11 entries, or 2^14
bytes, which is large but still < PAGE_SIZE.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2017-05-09 19:24:23 +10:00
Horia Geantă 24e0bfbf63 powerpc: Fix distclean with Makefile.postlink
Makefile.postlink always includes include/config/auto.conf, however
this file is not present in a clean kernel tree, causing make to fail:

  $ git clone linuxppc.git
  $ cd linuxppc.git
  $ make distclean
  arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such file or directory
  make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
  make: *** [vmlinuxclean] Error 2

Equally running 'make distclean; make distclean' will trip the error case.

Change the inclusion such that file not being found does not trigger an error.

Fixes: f188d0524d ("powerpc: Use the new post-link pass to check relocations")
Reported-by: Mircea Pop <mircea.pop@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-09 19:24:23 +10:00
Linus Torvalds 857f864014 pci-v4.12-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z
 akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G
 tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St
 svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM
 KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5
 gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG
 0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f
 oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+
 IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE
 iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3
 IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq
 ags00JtMLitfNPBH3eSl
 =eE4D
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - add framework for supporting PCIe devices in Endpoint mode (Kishon
   Vijay Abraham I)

 - use non-postable PCI config space mappings when possible (Lorenzo
   Pieralisi)

 - clean up and unify mmap of PCI BARs (David Woodhouse)

 - export and unify Function Level Reset support (Christoph Hellwig)

 - avoid FLR for Intel 82579 NICs (Sasha Neftin)

 - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)

 - short-circuit config access failures for disconnected devices (Keith
   Busch)

 - remove D3 sleep delay when possible (Adrian Hunter)

 - freeze PME scan before suspending devices (Lukas Wunner)

 - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)

 - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)

 - add arch-specific alignment control to improve device passthrough by
   avoiding multiple BARs in a page (Yongji Xie)

 - add sysfs sriov_drivers_autoprobe to control VF driver binding
   (Bodong Wang)

 - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)

 - fix crashes when unbinding host controllers that don't support
   removal (Brian Norris)

 - add driver for MicroSemi Switchtec management interface (Logan
   Gunthorpe)

 - add driver for Faraday Technology FTPCI100 host bridge (Linus
   Walleij)

 - add i.MX7D support (Andrey Smirnov)

 - use generic MSI support for Aardvark (Thomas Petazzoni)

 - make Rockchip driver modular (Brian Norris)

 - advertise 128-byte Read Completion Boundary support for Rockchip
   (Shawn Lin)

 - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)

 - convert atomic_t to refcount_t in HV driver (Elena Reshetova)

 - add CPU IRQ affinity in HV driver (K. Y. Srinivasan)

 - fix PCI bus removal in HV driver (Long Li)

 - add support for ThunderX2 DMA alias topology (Jayachandran C)

 - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)

 - add ITE 8893 bridge DMA alias quirk (Jarod Wilson)

 - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
   (Manish Jaggi)

* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
  PCI: Don't allow unbinding host controllers that aren't prepared
  ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
  MAINTAINERS: Add PCI Endpoint maintainer
  Documentation: PCI: Add userguide for PCI endpoint test function
  tools: PCI: Add sample test script to invoke pcitest
  tools: PCI: Add a userspace tool to test PCI endpoint
  Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
  misc: Add host side PCI driver for PCI test function device
  PCI: Add device IDs for DRA74x and DRA72x
  dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
  PCI: dwc: dra7xx: Workaround for errata id i870
  dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
  PCI: dwc: dra7xx: Add EP mode support
  PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
  dt-bindings: PCI: Add DT bindings for PCI designware EP mode
  PCI: dwc: designware: Add EP mode support
  Documentation: PCI: Add binding documentation for pci-test endpoint function
  ixgbe: Use pcie_flr() instead of duplicating it
  IB/hfi1: Use pcie_flr() instead of duplicating it
  PCI: imx6: Fix spelling mistake: "contol" -> "control"
  ...
2017-05-08 19:03:25 -07:00
Linus Torvalds bf5f89463f Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - the rest of MM

 - various misc things

 - procfs updates

 - lib/ updates

 - checkpatch updates

 - kdump/kexec updates

 - add kvmalloc helpers, use them

 - time helper updates for Y2038 issues. We're almost ready to remove
   current_fs_time() but that awaits a btrfs merge.

 - add tracepoints to DAX

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
  drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
  selftests/vm: add a test for virtual address range mapping
  dax: add tracepoint to dax_insert_mapping()
  dax: add tracepoint to dax_writeback_one()
  dax: add tracepoints to dax_writeback_mapping_range()
  dax: add tracepoints to dax_load_hole()
  dax: add tracepoints to dax_pfn_mkwrite()
  dax: add tracepoints to dax_iomap_pte_fault()
  mtd: nand: nandsim: convert to memalloc_noreclaim_*()
  treewide: convert PF_MEMALLOC manipulations to new helpers
  mm: introduce memalloc_noreclaim_{save,restore}
  mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
  mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
  mm/huge_memory.c: use zap_deposited_table() more
  time: delete CURRENT_TIME_SEC and CURRENT_TIME
  gfs2: replace CURRENT_TIME with current_time
  apparmorfs: replace CURRENT_TIME with current_time()
  lustre: replace CURRENT_TIME macro
  fs: ubifs: replace CURRENT_TIME_SEC with current_time
  fs: ufs: use ktime_get_real_ts64() for birthtime
  ...
2017-05-08 18:17:56 -07:00
Stephen Boyd ad61dd303a scripts/spelling.txt: add regsiter -> register spelling mistake
This typo is quite common.  Fix it and add it to the spelling file so
that checkpatch catches it earlier.

Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Hari Bathini 11550dc0a0 powerpc/fadump: reuse crashkernel parameter for fadump memory reservation
fadump supports specifying memory to reserve for fadump's crash kernel
with fadump_reserve_mem kernel parameter.  This parameter currently
supports passing a fixed memory size, like fadump_reserve_mem=<size>
only.  This patch aims to add support for other syntaxes like
range-based memory size
<range1>:<size1>[,<range2>:<size2>,<range3>:<size3>,...] which allows
using the same parameter to boot the kernel with different system RAM
sizes.

As crashkernel parameter already supports the above mentioned syntaxes,
this patch deprecates fadump_reserve_mem parameter and reuses
crashkernel parameter instead, to specify memory for fadump's crash
kernel memory reservation as well.  If any offset is provided in
crashkernel parameter, it will be ignored in case of fadump, as fadump
reserves memory at end of RAM.

Advantages using crashkernel parameter instead of fadump_reserve_mem
parameter are one less kernel parameter overall, code reuse and support
for multiple syntaxes to specify memory.

Suggested-by: Dave Young <dyoung@redhat.com>
Link: http://lkml.kernel.org/r/149035346749.6881.911095631212975718.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:11 -07:00
Hari Bathini 22bd0177bd powerpc/fadump: remove dependency with CONFIG_KEXEC
Now that crashkernel parameter parsing and vmcoreinfo related code is
moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove
dependency with CONFIG_KEXEC for CONFIG_FA_DUMP.  While here, get rid of
definitions of fadump_append_elf_note() & fadump_final_note() functions
to reuse similar functions compiled under CONFIG_CRASH_CORE.

Link: http://lkml.kernel.org/r/149035343956.6881.1536459326017709354.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:11 -07:00