1
0
Fork 0
Commit Graph

150429 Commits (a7ddcea58ae22d85d94eabfdd3de75c3742e376b)

Author SHA1 Message Date
Joerg Roedel 30514effc9 x86/mm/pti: Don't clear permissions in pti_clone_pmd()
The function sets the global-bit on cloned PMD entries, which only makes
sense when the permissions are identical between the user and the kernel
page-table. Further, only write-permissions are cleared for entry-text and
kernel-text sections, which are not writeable at the end of the boot
process.

The reason why this RW clearing exists is that in the early PTI
implementations the cloned kernel areas were set up during early boot
before the kernel text is set to read only and not touched afterwards.

This is not longer true. The cloned areas are still set up early to get the
entry code working for interrupts and other things, but after the kernel
text has been set RO the clone is repeated which copies the RO PMD/PTEs
over to the user visible clone. That means the initial clearing of the
writable bit can be avoided.

[ tglx: Amended changelog ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1533637471-30953-3-git-send-email-joro@8bytes.org
2018-08-07 23:36:02 +02:00
Peter Zijlstra 5800dc5c19 x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
Nadav reported that on guests we're failing to rewrite the indirect
calls to CALLEE_SAVE paravirt functions. In particular the
pv_queued_spin_unlock() call is left unpatched and that is all over the
place. This obviously wrecks Spectre-v2 mitigation (for paravirt
guests) which relies on not actually having indirect calls around.

The reason is an incorrect clobber test in paravirt_patch_call(); this
function rewrites an indirect call with a direct call to the _SAME_
function, there is no possible way the clobbers can be different
because of this.

Therefore remove this clobber check. Also put WARNs on the other patch
failure case (not enough room for the instruction) which I've not seen
trigger in my (limited) testing.

Three live kernel image disassemblies for lock_sock_nested (as a small
function that illustrates the problem nicely). PRE is the current
situation for guests, POST is with this patch applied and NATIVE is with
or without the patch for !guests.

PRE:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
   0xffffffff817be970 <+0>:     push   %rbp
   0xffffffff817be971 <+1>:     mov    %rdi,%rbp
   0xffffffff817be974 <+4>:     push   %rbx
   0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
   0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
   0xffffffff817be981 <+17>:    mov    %rbx,%rdi
   0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
   0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
   0xffffffff817be98f <+31>:    test   %eax,%eax
   0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
   0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
   0xffffffff817be99d <+45>:    mov    %rbx,%rdi
   0xffffffff817be9a0 <+48>:    callq  *0xffffffff822299e8
   0xffffffff817be9a7 <+55>:    pop    %rbx
   0xffffffff817be9a8 <+56>:    pop    %rbp
   0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
   0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
   0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
   0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
   0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
   0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.

POST:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
   0xffffffff817be970 <+0>:     push   %rbp
   0xffffffff817be971 <+1>:     mov    %rdi,%rbp
   0xffffffff817be974 <+4>:     push   %rbx
   0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
   0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
   0xffffffff817be981 <+17>:    mov    %rbx,%rdi
   0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
   0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
   0xffffffff817be98f <+31>:    test   %eax,%eax
   0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
   0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
   0xffffffff817be99d <+45>:    mov    %rbx,%rdi
   0xffffffff817be9a0 <+48>:    callq  0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
   0xffffffff817be9a5 <+53>:    xchg   %ax,%ax
   0xffffffff817be9a7 <+55>:    pop    %rbx
   0xffffffff817be9a8 <+56>:    pop    %rbp
   0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
   0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
   0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063aa0 <__local_bh_enable_ip>
   0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
   0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
   0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.

NATIVE:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
   0xffffffff817be970 <+0>:     push   %rbp
   0xffffffff817be971 <+1>:     mov    %rdi,%rbp
   0xffffffff817be974 <+4>:     push   %rbx
   0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
   0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
   0xffffffff817be981 <+17>:    mov    %rbx,%rdi
   0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
   0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
   0xffffffff817be98f <+31>:    test   %eax,%eax
   0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
   0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
   0xffffffff817be99d <+45>:    mov    %rbx,%rdi
   0xffffffff817be9a0 <+48>:    movb   $0x0,(%rdi)
   0xffffffff817be9a3 <+51>:    nopl   0x0(%rax)
   0xffffffff817be9a7 <+55>:    pop    %rbx
   0xffffffff817be9a8 <+56>:    pop    %rbp
   0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
   0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
   0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
   0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
   0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
   0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.


Fixes: 63f70270cc ("[PATCH] i386: PARAVIRT: add common patching machinery")
Fixes: 3010a0663f ("x86/paravirt, objtool: Annotate indirect calls")
Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
2018-08-07 22:15:02 +02:00
Paul Burton 36dc5b20e3
MIPS: Use dins to simplify __write_64bit_c0_split()
The code in __write_64bit_c0_split() is used by MIPS32 kernels running
on MIPS64 CPUs to write a 64-bit value to a 64-bit coprocessor 0
register using a single 64-bit dmtc0 instruction. It does this by
combining the 2x 32-bit registers used to hold the 64-bit value into a
single register, which in the existing code involves three steps:

  1) Zero extend register A which holds bits 31:0 of our data, since it
     may have previously held a sign-extended value.

  2) Shift register B which holds bits 63:32 of our data in bits 31:0
     left by 32 bits, such that the bits forming our data are in the
     position they'll be in the final 64-bit value & bits 31:0 of the
     register are zero.

  3) Or the two registers together to form the 64-bit value in one
     64-bit register.

From MIPS r2 onwards we have a dins instruction which can effectively
perform all 3 of those steps using a single instruction.

Add a path for MIPS r2 & beyond which uses dins to take bits 31:0 from
register B & insert them into bits 63:32 of register A, giving us our
full 64-bit value in register A with one instruction.

Since we know that MIPS r2 & above support the sel field for the dmtc0
instruction, we don't bother special casing sel==0. Omiting the sel
field would assemble to exactly the same instruction as when we
explicitly specify that it equals zero.

Signed-off-by: Paul Burton <paul.burton@mips.com>
2018-08-07 10:33:45 -07:00
Paul Burton 08eeb44b24
MIPS: Use read-write output operand in __write_64bit_c0_split()
Commit c22c804310 ("MIPS: Fix input modify in
__write_64bit_c0_split()") modified __write_64bit_c0_split() constraints
such that we have both an input & an output which we hope to assign to
the same registers, and modify the output rather than incorrectly
clobbering an input.

The way in which we use both an output & an input parameter with the
input constrained to share the output registers is a little convoluted &
also problematic for clang, which complains if the input & output values
have different widths. For example:

  In file included from kernel/fork.c:98:
  ./arch/mips/include/asm/mmu_context.h:149:19: error: unsupported
    inline asm: input with type 'unsigned long' matching output with
    type 'unsigned long long'
          write_c0_entryhi(cpu_asid(cpu, next));
          ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
  ./arch/mips/include/asm/mmu_context.h:93:2: note: expanded from macro
    'cpu_asid'
          (cpu_context((cpu), (mm)) & cpu_asid_mask(&cpu_data[cpu]))
          ^
  ./arch/mips/include/asm/mipsregs.h:1617:65: note: expanded from macro
    'write_c0_entryhi'
  #define write_c0_entryhi(val)   __write_ulong_c0_register($10, 0, val)
                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
  ./arch/mips/include/asm/mipsregs.h:1430:39: note: expanded from macro
    '__write_ulong_c0_register'
                  __write_64bit_c0_register(reg, sel, val);               \
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
  ./arch/mips/include/asm/mipsregs.h:1400:41: note: expanded from macro
    '__write_64bit_c0_register'
                  __write_64bit_c0_split(register, sel, value);           \
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
  ./arch/mips/include/asm/mipsregs.h:1498:13: note: expanded from macro
    '__write_64bit_c0_split'
                          : "r,0" (val));                                 \
                                   ^~~

We can both fix this build failure & simplify the code somewhat by
assigning the __tmp variable with the input value in C prior to our
inline assembly, and then using a single read-write output operand (ie.
a constraint beginning with +) to provide this value to our assembly.

Signed-off-by: Paul Burton <paul.burton@mips.com>
2018-08-07 10:33:35 -07:00
Joerg Roedel 88c6f8a397 x86/mm/pti: Fix 32 bit PCID check
The check uses the wrong operator and causes false positive
warnings in the kernel log on some systems.

Fixes: 5e8105950a ('x86/mm/pti: Add Warning when booting on a PCID capable CPU')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1533637471-30953-2-git-send-email-joro@8bytes.org
2018-08-07 18:51:22 +02:00
Juergen Gross cd9139220b xen: don't use privcmd_call() from xen_mc_flush()
Using privcmd_call() for a singleton multicall seems to be wrong, as
privcmd_call() is using stac()/clac() to enable hypervisor access to
Linux user space.

Even if currently not a problem (pv domains can't use SMAP while HVM
and PVH domains can't use multicalls) things might change when
PVH dom0 support is added to the kernel.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-07 11:37:01 -04:00
Benjamin Herrenschmidt 77b5f703dc powerpc/powernv/opal: Use standard interrupts property when available
For (bad) historical reasons, OPAL used to create a non-standard pair
of properties "opal-interrupts" and "opal-interrupts-names" for
representing the list of interrupts it wants Linux to request on its
behalf.

Among other issues, the opal-interrupts doesn't have a way to carry
the type of interrupts, and they were assumed to be all level
sensitive.

This is wrong on some recent systems where some of them are edge
sensitive causing warnings in the XIVE code and possible misbehaviours
if they need to be retriggered (typically the NPU2 TCE error
interrupts).

This makes Linux switch to using the standard "interrupts" and
"interrupt-names" properties instead when they are available, using
standard of_irq helpers, which can carry all the desired type
information.

Newer versions of OPAL will generate those properties in addition to
the legacy ones.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fixup prefix logic to check strlen(r->name). Reinstate setting
 of start = 0 in opal_event_shutdown() to avoid double free warnings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:38 +10:00
Christophe Leroy d6690b1a9b powerpc: Allow CPU selection of e300core variants
GCC supports -mcpu=e300c2 and -mcpu=e300c3

This patch gives the opportunity to tune kernel to one of
those two types.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:37 +10:00
Christophe Leroy 0e00a8c9fd powerpc: Allow CPU selection also on PPC32
This patch extends to PPC32 the capability to select the exact
CPU type.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:37 +10:00
Christophe Leroy cc62d20ce4 powerpc: Make CPU selection logic generic in Makefile
At the time being, when adding a new CPU for selection, both
Kconfig.cputype and Makefile have to be modified.

This patch moves into Kconfig.cputype the name of the CPU to me
passed to the -mcpu= argument.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Rename the option to TARGET_CPU to echo the gcc documentation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:36 +10:00
Rodrigo R. Galvao badf436f6f powerpc/Makefiles: Convert ifeq to ifdef where possible
In Makefiles if we're testing a CONFIG_FOO symbol for equality with 'y'
we can instead just use ifdef. The latter reads easily, so convert to
it where possible.

Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com>
Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:36 +10:00
Paul Mackerras f8db2007ff powerpc/64: Copy as much as possible in __copy_tofrom_user
In __copy_tofrom_user, if we encounter an exception on a store, we
stop copying and return the number of bytes not copied.  However,
if the store is wider than one byte and is to an unaligned address,
it is possible that the store operand overlaps a page boundary
and the exception occurred on the latter part of the store operand,
meaning that it would be possible to copy a few more bytes.  Since
copy_to_user is generally expected to copy as much as possible,
it would be better to copy those extra few bytes.  This adds code
to do that.  Since this edge case is not performance-critical,
the code has been written to be compact rather than as fast as
possible.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:36 +10:00
Paul Mackerras 98c45f51f7 selftests/powerpc/64: Test all paths through copy routines
The hand-coded assembler 64-bit copy routines include feature sections
that select one code path or another depending on which CPU we are
executing on.  The self-tests for these copy routines end up testing
just one path.  This adds a mechanism for selecting any desired code
path at compile time, and makes 2 or 3 versions of each test, each
using a different code path, so as to cover all the possible paths.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Add -mcpu=power4 to CFLAGS for older compilers]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:35 +10:00
Paul Mackerras a7c81ce398 powerpc/64: Make exception table clearer in __copy_tofrom_user_base
This aims to make the generation of exception table entries for the
loads and stores in __copy_tofrom_user_base clearer and easier to
verify.  Instead of having a series of local labels on the loads and
stores, with a series of corresponding labels later for the exception
handlers, we now use macros to generate exception table entries at the
point of each load and store that could potentially trap.  We do this
with the macros lex (load exception) and stex (store exception).
These macros are used right before the load or store to which they
apply.

Some complexity is introduced by the fact that we have some more work
to do after hitting an exception, because we need to calculate and
return the number of bytes not copied.  The code uses r3 as the
current pointer into the destination buffer, that is, the address of
the first byte of the destination that has not been modified.
However, at various points in the copy loops, r3 can be 4, 8, 16 or 24
bytes behind that point.

To express this offset in an understandable way, we define a symbol
r3_offset which is updated at various points so that it equal to the
difference between the address of the first unmodified byte of the
destination and the value in r3.  (In fact it only needs to be
accurate at the point of each lex or stex macro invocation.)

The rules for updating r3_offset are as follows:

* It starts out at 0
* An addi r3,r3,N instruction decreases r3_offset by N
* A store instruction (stb, sth, stw, std) to N(r3)
  increases r3_offset by the width of the store (1, 2, 4, 8)
* A store with update instruction (stbu, sthu, stwu, stdu) to N(r3)
  sets r3_offset to the width of the store.

There is some trickiness to the way that the lex and stex macros and
the associated exception handlers work.  I would have liked to use
the current value of r3_offset in the name of the symbol used as
the exception handler, as in ".Lld_exc_$(r3_offset)" and then
have symbols .Lld_exc_0, .Lld_exc_8, .Lld_exc_16 etc. corresponding
to the offsets that needed to be added to r3.  However, I couldn't
see a way to do that with gas.

Instead, the exception handler address is .Lld_exc - r3_offset or
.Lst_exc - r3_offset, that is, the distance ahead of .Lld_exc/.Lst_exc
that we start executing is equal to the amount that we need to add to
r3.  This works because r3_offset is always a small multiple of 4,
and our instructions are 4 bytes long.  This means that before
.Lld_exc and .Lst_exc, we have a sequence of instructions that
increments r3 by 4, 8, 16 or 24 depending on where we start.  The
sequence increments r3 by 4 per instruction (on average).

We also replace the exception table for the 4k copy loop by a
macro per load or store.  These loads and stores all use exactly
the same exception handler, which simply resets the argument registers
r3, r4 and r5 to there original values and re-does the whole copy
using the slower loop.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:34 +10:00
zhong jiang 81d7b08b3c powerpc/powermac: of_node_put() is not needed after iterator
for_each_node_by_name() iterators only exit normally when the loop
cursor is NULL, So there is no need to call of_node_put().

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:34 +10:00
Haren Myneni 656ecc16e8 crypto/nx: Initialize 842 high and normal RxFIFO control registers
NX increments readOffset by FIFO size in receive FIFO control register
when CRB is read. But the index in RxFIFO has to match with the
corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
may be processing incorrect CRBs and can cause CRB timeout.

VAS FIFO offset is 0 when the receive window is opened during
initialization. When the module is reloaded or in kexec boot, readOffset
in FIFO control register may not match with VAS entry. This patch adds
nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
control register for both high and normal FIFOs.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
[mpe: Fixup uninitialized variable warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:34 +10:00
Haren Myneni 6e708000ec powerpc/powernv: Export opal_check_token symbol
Export opal_check_token symbol for modules to check the availability
of OPAL calls before using them.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:33 +10:00
Randy Dunlap f5daf77a55 powerpc/platforms/85xx: fix t1042rdb_diu.c build errors & warning
Fix build errors and warnings in t1042rdb_diu.c by adding header files
and MODULE_LICENSE().

../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: data definition has no type or storage class
 early_initcall(t1042rdb_diu_init);
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: error: type defaults to 'int' in declaration of 'early_initcall' [-Werror=implicit-int]
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: parameter names (without types) in function declaration

and
WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/85xx/t1042rdb_diu.o

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:33 +10:00
Anju T Sudhakar 7ccc4fe5ff powerpc/perf: Remove sched_task function defined for thread-imc
Call trace observed while running perf-fuzzer:

  CPU: 43 PID: 9088 Comm: perf_fuzzer Not tainted 4.13.0-32-generic #35~lp1746225
  task: c000003f776ac900 task.stack: c000003f77728000
  NIP: c000000000299b70 LR: c0000000002a4534 CTR: c00000000029bb80
  REGS: c000003f7772b760 TRAP: 0700   Not tainted  (4.13.0-32-generic)
  MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
    CR: 24008822  XER: 00000000
  CFAR: c000000000299a70 SOFTE: 0
  GPR00: c0000000002a4534 c000003f7772b9e0 c000000001606200 c000003fef858908
  GPR04: c000003f776ac900 0000000000000001 ffffffffffffffff 0000003fee730000
  GPR08: 0000000000000000 0000000000000000 c0000000011220d8 0000000000000002
  GPR12: c00000000029bb80 c000000007a3d900 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c000003f776ad090 c000000000c71354
  GPR24: c000003fef716780 0000003fee730000 c000003fe69d4200 c000003f776ad330
  GPR28: c0000000011220d8 0000000000000001 c0000000014c6108 c000003fef858900
  NIP [c000000000299b70] perf_pmu_sched_task+0x170/0x180
  LR [c0000000002a4534] __perf_event_task_sched_in+0xc4/0x230
  Call Trace:
    perf_iterate_sb+0x158/0x2a0 (unreliable)
    __perf_event_task_sched_in+0xc4/0x230
    finish_task_switch+0x21c/0x310
    __schedule+0x304/0xb80
    schedule+0x40/0xc0
    do_wait+0x254/0x2e0
    kernel_wait4+0xa0/0x1a0
    SyS_wait4+0x64/0xc0
    system_call+0x58/0x6c
  Instruction dump:
  3beafea0 7faa4800 409eff18 e8010060 eb610028 ebc10040 7c0803a6 38210050
  eb81ffe0 eba1ffe8 ebe1fff8 4e800020 <0fe00000> 4bffffbc 60000000 60420000
  ---[ end trace 8c46856d314c1811 ]---

The context switch call-backs for thread-imc are defined in sched_task function.
So when thread-imc events are grouped with software pmu events,
perf_pmu_sched_task hits the WARN_ON_ONCE condition, since software PMUs are
assumed not to have a sched_task defined.

Patch to move the thread_imc enable/disable opal call back from sched_task to
event_[add/del] function

Fixes: f74c89bd80 ("powerpc/perf: Add thread IMC PMU support")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:32 +10:00
Nicholas Piggin 4231aba000 powerpc/64s: Fix page table fragment refcount race vs speculative references
The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a
speculative reference, and then the speculative failure handler
decrements the new reference, and the underflow eventually pops when
the page tables are freed.

Fix this by using a dedicated field in the struct page for the page
table fragment allocator.

Fixes: 5c1f6ee9a3 ("powerpc: Reduce PTE table memory wastage")
Cc: stable@vger.kernel.org # v3.10+
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:32 +10:00
Darren Stevens e13606d732 powerpc/pasemi: Use pr_err/pr_warn... for kernel messages
Pasemi code still uses printk(KERN_ERR/KERN_WARN ... change these to
pr_err(, pr_warn(... to match other powerpc arch code.

No functional changes.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
[mpe: Unsplit some strings while we're at it]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:31 +10:00
Murilo Opsfelder Araujo a99b9c5ed4 powerpc/traps: Show instructions on exceptions
Call show_user_instructions() in arch/powerpc/kernel/traps.c to dump
instructions at faulty location, useful to debugging.

Before this patch, an unhandled signal message looked like:

  pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000]

After this patch, it looks like:

  pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000]
  pandafault[10524]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
  pandafault[10524]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:30 +10:00
Murilo Opsfelder Araujo 88b0fe1757 powerpc: Add show_user_instructions()
show_user_instructions() is a slightly modified version of
show_instructions() that allows userspace instruction dump.

This will be useful within show_signal_msg() to dump userspace
instructions of the faulty location.

Here is a sample of what show_user_instructions() outputs:

  pandafault[10850]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
  pandafault[10850]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040

The current->comm and current->pid printed can serve as a glue that
links the instructions dump to its originator, allowing messages to be
interleaved in the logs.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:30 +10:00
Murilo Opsfelder Araujo 0f642d616b powerpc/traps: Print VMA for unhandled signals
This adds VMA address in the message printed for unhandled signals,
similarly to what other architectures, like x86, print.

Before this patch, a page fault looked like:

  pandafault[61470]: unhandled signal 11 at 100007d0 nip 1000061c lr 7fff8d185100 code 2

After this patch, a page fault looks like:

  pandafault[6303]: segfault 11 at 100007d0 nip 1000061c lr 7fff93c55100 code 2 in pandafault[10000000+10000]

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:30 +10:00
Murilo Opsfelder Araujo 49d8f2011d powerpc/traps: Use %lx format in show_signal_msg()
Use %lx format to print registers.  This avoids having two different
formats and avoids checking for MSR_64BIT, improving readability of the
function.

Even though we could have used %px, which is functionally equivalent to %lx
as per Documentation/core-api/printk-formats.rst, it is not semantically
correct because the data printed are not pointers.  And using %px requires
casting data to (void *).

Besides that, %lx matches the format used in show_regs().

Before this patch:

  pandafault[4808]: unhandled signal 11 at 0000000010000718 nip 0000000010000574 lr 00007fff935e7a6c code 2

After this patch:

  pandafault[4732]: unhandled signal 11 at 10000718 nip 10000574 lr 7fff86697a6c code 2

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:29 +10:00
Murilo Opsfelder Araujo 35a52a10c3 powerpc/traps: Use an explicit ratelimit state for show_signal_msg()
Replace printk_ratelimited() by printk() and a default rate limit
burst to limit displaying unhandled signals messages.

This will allow us to call print_vma_addr() in a future patch, which
does not work with printk_ratelimited().

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:29 +10:00
Murilo Opsfelder Araujo 658b0f92bc powerpc/traps: Print unhandled signals in a separate function
Isolate the logic of printing unhandled signals out of _exception_pkey().
No functional change, only code rearrangement.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:29 +10:00
Michael Ellerman 78ee994637 powerpc/64s: Make rfi_flush_fallback a little more robust
Because rfi_flush_fallback runs immediately before the return to
userspace it currently runs with the user r1 (stack pointer). This
means if we oops in there we will report a bad kernel stack pointer in
the exception entry path, eg:

  Bad kernel stack pointer 7ffff7150e40 at c0000000000023b4
  Oops: Bad kernel stack pointer, sig: 6 [#1]
  LE SMP NR_CPUS=32 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 1246 Comm: klogd Not tainted 4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3 #7
  NIP:  c0000000000023b4 LR: 0000000010053e00 CTR: 0000000000000040
  REGS: c0000000fffe7d40 TRAP: 4100   Not tainted  (4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3)
  MSR:  9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE>  CR: 44000442  XER: 20000000
  CFAR: c00000000000bac8 IRQMASK: c0000000f1e66a80
  GPR00: 0000000002000000 00007ffff7150e40 00007fff93a99900 0000000000000020
  ...
  NIP [c0000000000023b4] rfi_flush_fallback+0x34/0x80
  LR [0000000010053e00] 0x10053e00

Although the NIP tells us where we were, and the TRAP number tells us
what happened, it would still be nicer if we could report the actual
exception rather than barfing about the stack pointer.

We an do that fairly simply by loading the kernel stack pointer on
entry and restoring the user value before returning. That way we see a
regular oops such as:

  Unrecoverable exception 4100 at c00000000000239c
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE SMP NR_CPUS=32 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 1251 Comm: klogd Not tainted 4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty #40
  NIP:  c00000000000239c LR: 0000000010053e00 CTR: 0000000000000040
  REGS: c0000000f1e17bb0 TRAP: 4100   Not tainted  (4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty)
  MSR:  9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE>  CR: 44000442  XER: 20000000
  CFAR: c00000000000bac8 IRQMASK: 0
  ...
  NIP [c00000000000239c] rfi_flush_fallback+0x3c/0x80
  LR [0000000010053e00] 0x10053e00
  Call Trace:
  [c0000000f1e17e30] [c00000000000b9e4] system_call+0x5c/0x70 (unreliable)

Note this shouldn't make the kernel stack pointer vulnerable to a
meltdown attack, because it should be flushed from the cache before we
return to userspace. The user r1 value will be in the cache, because
we load it in the return path, but that is harmless.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-08-08 00:32:27 +10:00
Michael Ellerman 99d54754d3 powerpc/powernv: Query firmware for count cache flush settings
Look for fw-features properties to determine the appropriate settings
for the count cache flush, and then call the generic powerpc code to
set it up based on the security feature flags.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:27 +10:00
Michael Ellerman ba72dc1719 powerpc/pseries: Query hypervisor for count cache flush settings
Use the existing hypercall to determine the appropriate settings for
the count cache flush, and then call the generic powerpc code to set
it up based on the security feature flags.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:26 +10:00
Michael Ellerman ee13cb249f powerpc/64s: Add support for software count cache flush
Some CPU revisions support a mode where the count cache needs to be
flushed by software on context switch. Additionally some revisions may
have a hardware accelerated flush, in which case the software flush
sequence can be shortened.

If we detect the appropriate flag from firmware we patch a branch
into _switch() which takes us to a count cache flush sequence.

That sequence in turn may be patched to return early if we detect that
the CPU supports accelerating the flush sequence in hardware.

Add debugfs support for reporting the state of the flush, as well as
runtime disabling it.

And modify the spectre_v2 sysfs file to report the state of the
software flush.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:26 +10:00
Michael Ellerman dc8c6cce9a powerpc/64s: Add new security feature flags for count cache flush
Add security feature flags to indicate the need for software to flush
the count cache on context switch, and for the presence of a hardware
assisted count cache flush.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:26 +10:00
Michael Ellerman 06d0bbc6d0 powerpc/asm: Add a patch_site macro & helpers for patching instructions
Add a macro and some helper C functions for patching single asm
instructions.

The gas macro means we can do something like:

  1:	nop
  	patch_site 1b, patch__foo

Which is less visually distracting than defining a GLOBAL symbol at 1,
and also doesn't pollute the symbol table which can confuse eg. perf.

These are obviously similar to our existing feature sections, but are
not automatically patched based on CPU/MMU features, rather they are
designed to be manually patched by C code at some arbitrary point.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:25 +10:00
Diana Craciun c28218d4ab powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platforms
Used barrier_nospec to sanitize the syscall table.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:24 +10:00
Diana Craciun ebcd1bfc33 powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
Implement the barrier_nospec as a isync;sync instruction sequence.
The implementation uses the infrastructure built for BOOK3S 64.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:24 +10:00
Diana Craciun 406d2b6ae3 powerpc/64: Make meltdown reporting Book3S 64 specific
In a subsequent patch we will enable building security.c for Book3E.
However the NXP platforms are not vulnerable to Meltdown, so make the
Meltdown vulnerability reporting PPC_BOOK3S_64 specific.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:24 +10:00
Michael Ellerman af375eefbf powerpc/64: Call setup_barrier_nospec() from setup_arch()
Currently we require platform code to call setup_barrier_nospec(). But
if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case
then we can call it in setup_arch().

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:23 +10:00
Michael Ellerman 179ab1cbf8 powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
Add a config symbol to encode which platforms support the
barrier_nospec speculation barrier. Currently this is just Book3S 64
but we will add Book3E in a future patch.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:23 +10:00
Diana Craciun 6453b532f2 powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.
NXP Book3E platforms are not vulnerable to speculative store
bypass, so make the mitigations PPC_BOOK3S_64 specific.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:18 +10:00
Masahiro Yamada 0004438a16 um: clean up archheaders recipe
Now that '%asm-generic' is added to no-dot-config-targets,
'make asm-generic' does not include the kernel configuration.

You can simply do 'make asm-generic' in the recursed top Makefile
without bothering syncconfig.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Richard Weinberger <richard@nod.at>
2018-08-07 21:30:36 +09:00
Masahiro Yamada 13d3d01e26 um: fix parallel building with O= option
Randy Dunlap reports UML occasionally fails to build with -j<N> and
O=<builddir> options.

  make[1]: Entering directory '/home/rdunlap/mmotm-2018-0802-1529/UM64'
    UPD     include/generated/uapi/linux/version.h
    WRAP    arch/x86/include/generated/asm/dma-contiguous.h
    WRAP    arch/x86/include/generated/asm/export.h
    WRAP    arch/x86/include/generated/asm/early_ioremap.h
    WRAP    arch/x86/include/generated/asm/mcs_spinlock.h
    WRAP    arch/x86/include/generated/asm/mm-arch-hooks.h
    WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
    WRAP    arch/x86/include/generated/uapi/asm/poll.h
    GEN     ./Makefile
  make[2]: *** No rule to make target 'archheaders'.  Stop.
  arch/um/Makefile:119: recipe for target 'archheaders' failed
  make[1]: *** [archheaders] Error 2
  make[1]: *** Waiting for unfinished jobs....
    UPD     include/config/kernel.release
  make[1]: *** wait: No child processes.  Stop.
  Makefile:146: recipe for target 'sub-make' failed
  make: *** [sub-make] Error 2

The cause of the problem is the use of '$(MAKE) KBUILD_SRC=',
which recurses to the top Makefile via the $(objtree)/Makefile
generated by scripts/mkmakefile.

When you run "make -j<N> O=<builddir> ARCH=um", Make can execute
'archheaders' and 'outputmakefile' targets simultaneously because
there is no dependency between them.

If it happens,

  $(Q)$(MAKE) KBUILD_SRC= ARCH=$(HEADER_ARCH) archheaders

... tries to run $(objtree)/Makefile that is being updated.

The correct way for the recursion is

  $(Q)$(MAKE) -f $(srctree)/Makefile ARCH=$(HEADER_ARCH) archheaders

..., which does not rely on the generated Makefile.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Richard Weinberger <richard@nod.at>
2018-08-07 21:30:15 +09:00
Diana Craciun cf175dc315 powerpc/64: Disable the speculation barrier from the command line
The speculation barrier can be disabled from the command line
with the parameter: "nospectre_v1".

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:39 +10:00
Michael Ellerman 0b924de4f6 powerpc/64s: Don't use __MASKABLE_EXCEPTION unnecessarily
We only need to use __MASKABLE_EXCEPTION in one of the four cases for
hardware interrupt, so use the helper macros in the other cases.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:39 +10:00
Michael Ellerman b536da7c2d powerpc/64s: Drop unused loc parameter to MASKABLE_EXCEPTION macros
We pass the "loc" (location) parameter to MASKABLE_EXCEPTION and
friends, but it's not used, so drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:38 +10:00
Michael Ellerman 0a55c24185 powerpc/64s: Remove PSERIES naming from the MASKABLE macros
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:38 +10:00
Michael Ellerman 6adc6e9c07 powerpc/64s: Drop _MASKABLE_RELON_EXCEPTION_PSERIES()
_MASKABLE_RELON_EXCEPTION_PSERIES() does nothing useful, update all
callers to use __MASKABLE_RELON_EXCEPTION_PSERIES() directly.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:37 +10:00
Michael Ellerman 9bf2877ac1 powerpc/64s: Drop _MASKABLE_EXCEPTION_PSERIES()
_MASKABLE_EXCEPTION_PSERIES() does nothing useful, update all callers
to use __MASKABLE_EXCEPTION_PSERIES() directly.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:37 +10:00
Michael Ellerman bdf08e1da0 powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES to EXCEPTION_PROLOG
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:36 +10:00
Michael Ellerman 270373f14f powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES
To just EXCEPTION_RELON_PROLOG().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:36 +10:00
Michael Ellerman 6ebb939740 powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES_1
The EXCEPTION_RELON_PROLOG_PSERIES_1() macro does the same job as
EXCEPTION_PROLOG_2 (which we just recently created), except for
"RELON" (relocation on) exceptions.

So rename it as such.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:35 +10:00
Michael Ellerman 94f3cc8e36 powerpc/64s: Remove PSERIES from the NORI macros
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:35 +10:00
Michael Ellerman cb58a4a4b3 powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES_1 to EXCEPTION_PROLOG_2
As with the other patches in this series, we are removing the
"PSERIES" from the name as it's no longer meaningful.

In this case it's not simply a case of removing the "PSERIES" as that
would result in a clash with the existing EXCEPTION_PROLOG_1.

Instead we name this one EXCEPTION_PROLOG_2, as it's usually used in
sequence after 0 and 1.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:35 +10:00
Michael Ellerman b706f42362 powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES_OOL to STD_RELON_EXCEPTION_OOL
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:34 +10:00
Michael Ellerman e42389c5f1 powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES to STD_RELON_EXCEPTION
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:34 +10:00
Michael Ellerman 75e8bef3d6 powerpc/64s: Rename STD_EXCEPTION_PSERIES_OOL to STD_EXCEPTION_OOL
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:33 +10:00
Michael Ellerman e899fce509 powerpc/64s: Rename STD_EXCEPTION_PSERIES to STD_EXCEPTION
The "PSERIES" in STD_EXCEPTION_PSERIES is to differentiate the macros
from the legacy iSeries versions, which are called
STD_EXCEPTION_ISERIES. It is not anything to do with pseries vs
powernv or powermac etc.

We removed the legacy iSeries code in 2012, in commit 8ee3e0d69623x
("powerpc: Remove the main legacy iSerie platform code").

So remove "PSERIES" from the macros.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:33 +10:00
Michael Ellerman 92b6d65c07 powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_RELON_PROLOG_PSERIES()
EXCEPTION_RELON_PROLOG_PSERIES() only has two users,
STD_RELON_EXCEPTION_PSERIES() and STD_RELON_EXCEPTION_HV() both of
which "call" SET_SCRATCH0(), so just move SET_SCRATCH0() into
EXCEPTION_RELON_PROLOG_PSERIES().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:32 +10:00
Michael Ellerman 4a7a0a8444 powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES()
EXCEPTION_PROLOG_PSERIES() only has two users, STD_EXCEPTION_PSERIES()
and STD_EXCEPTION_HV() both of which "call" SET_SCRATCH0(), so just
move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:31 +10:00
Darren Stevens 250a93501d powerpc/pasemi: Search for PCI root bus by compatible property
Pasemi arch code finds the root of the PCI-e bus by searching the
device-tree for a node called 'pxp'. But the root bus has a compatible
property of 'pasemi,rootbus' so search for that instead.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:31 +10:00
Christophe Leroy 9412b23450 powerpc/lib: Implement strlen() in assembly for PPC32
The generic implementation of strlen() reads strings byte per byte.

This patch implements strlen() in assembly based on a read of entire
words, in the same spirit as what some other arches and glibc do.

On a 8xx the time spent in strlen is reduced by 3/4 for long strings.

strlen() selftest on an 8xx provides the following values:

Before the patch (ie with the generic strlen() in lib/string.c):

  len 256 : time = 1.195055
  len 016 : time = 0.083745
  len 008 : time = 0.046828
  len 004 : time = 0.028390

After the patch:

  len 256 : time = 0.272185 ==> 78% improvment
  len 016 : time = 0.040632 ==> 51% improvment
  len 008 : time = 0.033060 ==> 29% improvment
  len 004 : time = 0.029149 ==> 2% degradation

On a 832x:

Before the patch:

  len 256 : time = 0.236125
  len 016 : time = 0.018136
  len 008 : time = 0.011000
  len 004 : time = 0.007229

After the patch:

  len 256 : time = 0.094950 ==> 60% improvment
  len 016 : time = 0.013357 ==> 26% improvment
  len 008 : time = 0.010586 ==> 4% improvment
  len 004 : time = 0.008784

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:30 +10:00
Mahesh Salgaonkar 94675cceac powerpc/pseries: Defer the logging of rtas error to irq work queue.
rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from
vmalloc (non-linear mapping) area.

On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. Apart from that, as Nick pointed out,
pSeries_log_error() also takes a spin_lock while logging error which
is not safe in NMI context. It may endup in deadlock if we get another
MCE before releasing the lock. Fix this by deferring the logging of
rtas error to irq work queue.

Current implementation uses two different buffers to hold rtas error
log depending on whether extended log is provided or not. This makes
bit difficult to identify which buffer has valid data that needs to
logged later in irq work. Simplify this using single buffer, one per
paca, and copy rtas log to it irrespective of whether extended log is
provided or not. Allocate this buffer below RMA region so that it can
be accessed in real mode mce handler.

Fixes: b96672dd84 ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org # v4.14+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:29 +10:00
Mahesh Salgaonkar 74e96bf44f powerpc/pseries: Avoid using the size greater than RTAS_ERROR_LOG_MAX.
The global mce data buffer that used to copy rtas error log is of 2048
(RTAS_ERROR_LOG_MAX) bytes in size. Before the copy we read
extended_log_length from rtas error log header, then use max of
extended_log_length and RTAS_ERROR_LOG_MAX as a size of data to be copied.
Ideally the platform (phyp) will never send extended error log with
size > 2048. But if that happens, then we have a risk of buffer overrun
and corruption. Fix this by using min_t instead.

Fixes: d368514c30 ("powerpc: Fix corruption when grabbing FWNMI data")
Reported-by: Michal Suchanek <msuchanek@suse.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:28 +10:00
Benjamin Herrenschmidt e27e0a9465 powerpc/xive: Remove xive_kexec_teardown_cpu()
It's identical to xive_teardown_cpu() so just use the latter

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:28 +10:00
Benjamin Herrenschmidt dbc5740247 powerpc/xive: Remove now useless pr_debug statements
Those overly verbose statement in the setup of the pool VP
aren't particularly useful (esp. considering we don't actually
use the pool, we configure it bcs HW requires it only). So
remove them which improves the code readability.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:27 +10:00
Nicholas Piggin 34c604d275 powerpc/64s: free page table caches at exit_mmap time
The kernel page table caches are tied to init_mm, so there is no
more need for them after userspace is finished.

destroy_context() gets called when we drop the last reference for an
mm, which can be much later than the task exit due to other lazy mm
references to it. We can free the page table cache pages on task exit
because they only cache the userspace page tables and kernel threads
should not access user space addresses.

The mapping for kernel threads itself is maintained in init_mm and
page table cache for that is attached to init_mm.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Merge change log additions from Aneesh]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:27 +10:00
Nicholas Piggin 5a6099346c powerpc/64s/radix: tlb do not flush on page size when fullmm
When the mm is being torn down there will be a full PID flush so
there is no need to flush the TLB on page size changes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:26 +10:00
Michael Ellerman 7cd129b4b5 powerpc: Add a checkpatch wrapper with our preferred settings
This makes it easy to run checkpatch with settings that I like.

Usage is eg:

  $ ./arch/powerpc/tools/checkpatch.sh -g origin/master..

To check all commits since origin/master.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Russell Currey <ruscur@russell.cc>
2018-08-07 21:49:25 +10:00
Michael Ellerman 4da1f79227 powerpc/64: Disable irq restore warning for now
We recently added a warning in arch_local_irq_restore() to check that
the soft masking state matches reality.

Unfortunately it trips in a few places, which are not entirely trivial
to fix. The key problem is if we're doing function_graph tracing of
restore_math(), the warning pops and then seems to recurse. It's not
entirely clear because the system continuously oopses on all CPUs,
with the output interleaved and unreadable.

It's also been observed on a G5 coming out of idle.

Until we can fix those cases disable the warning for now.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:24 +10:00
Martin Schwidefsky 26f843848b s390: fix br_r1_trampoline for machines without exrl
For machines without the exrl instruction the BFP jit generates
code that uses an "br %r1" instruction located in the lowcore page.
Unfortunately there is a cut & paste error that puts an additional
"larl %r1,.+14" instruction in the code that clobbers the branch
target address in %r1. Remove the larl instruction.

Cc: <stable@vger.kernel.org> # v4.17+
Fixes: de5cb6eb51 ("s390: use expoline thunks in the BPF JIT")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-07 13:38:16 +02:00
Martin Schwidefsky 5eda25b102 s390/lib: use expoline for all bcr instructions
The memove, memset, memcpy, __memset16, __memset32 and __memset64
function have an additional indirect return branch in form of a
"bzr" instruction. These need to use expolines as well.

Cc: <stable@vger.kernel.org> # v4.17+
Fixes: 97489e0663 ("s390/lib: use expoline for indirect branches")
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-07 13:38:13 +02:00
Thomas Gleixner bc2d8d262c cpu/hotplug: Fix SMT supported evaluation
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
on the kernel command line as it cannot differentiate between SMT disabled
by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
makes the sysfs interface unusable.

Rework this so that during bringup of the non boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitely right after the initial SMP bringup has finished.

SMT evaulation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.

Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-07 12:25:30 +02:00
Ard Biesheuvel 22240df7ac crypto: arm64/ghash-ce - implement 4-way aggregation
Enhance the GHASH implementation that uses 64-bit polynomial
multiplication by adding support for 4-way aggregation. This
more than doubles the performance, from 2.4 cycles per byte
to 1.1 cpb on Cortex-A53.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:51:40 +08:00
Ard Biesheuvel 8e492eff7d crypto: arm64/ghash-ce - replace NEON yield check with block limit
Checking the TIF_NEED_RESCHED flag is disproportionately costly on cores
with fast crypto instructions and comparatively slow memory accesses.

On algorithms such as GHASH, which executes at ~1 cycle per byte on
cores that implement support for 64 bit polynomial multiplication,
there is really no need to check the TIF_NEED_RESCHED particularly
often, and so we can remove the NEON yield check from the assembler
routines.

However, unlike the AEAD or skcipher APIs, the shash/ahash APIs take
arbitrary input lengths, and so there needs to be some sanity check
to ensure that we don't hog the CPU for excessive amounts of time.

So let's simply cap the maximum input size that is processed in one go
to 64 KB.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:51:39 +08:00
Ondrej Mosnacek 877ccce7cb crypto: x86/aegis,morus - Fix and simplify CPUID checks
It turns out I had misunderstood how the x86_match_cpu() function works.
It evaluates a logical OR of the matching conditions, not logical AND.
This caused the CPU feature checks for AEGIS to pass even if only SSE2
(but not AES-NI) was supported (or vice versa), leading to potential
crashes if something tried to use the registered algs.

This patch switches the checks to a simpler method that is used e.g. in
the Camellia x86 code.

The patch also removes the MODULE_DEVICE_TABLE declarations which
actually seem to cause the modules to be auto-loaded at boot, which is
not desired. The crypto API on-demand module loading is sufficient.

Fixes: 1d373d4e8e ("crypto: x86 - Add optimized AEGIS implementations")
Fixes: 6ecc9d9ff9 ("crypto: x86 - Add optimized MORUS implementations")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:51:15 +08:00
Ard Biesheuvel 30f1a9f53e crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable
Squeeze out another 5% of performance by minimizing the number
of invocations of kernel_neon_begin()/kernel_neon_end() on the
common path, which also allows some reloads of the key schedule
to be optimized away.

The resulting code runs at 2.3 cycles per byte on a Cortex-A53.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:38:04 +08:00
Ard Biesheuvel e0bd888dc4 crypto: arm64/aes-ce-gcm - implement 2-way aggregation
Implement a faster version of the GHASH transform which amortizes
the reduction modulo the characteristic polynomial across two
input blocks at a time.

On a Cortex-A53, the gcm(aes) performance increases 24%, from
3.0 cycles per byte to 2.4 cpb for large input sizes.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:38:04 +08:00
Ard Biesheuvel 71e52c278c crypto: arm64/aes-ce-gcm - operate on two input blocks at a time
Update the core AES/GCM transform and the associated plumbing to operate
on 2 AES/GHASH blocks at a time. By itself, this is not expected to
result in a noticeable speedup, but it paves the way for reimplementing
the GHASH component using 2-way aggregation.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:38:04 +08:00
Herbert Xu 3465893d27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merge crypto-2.6 to pick up NEON yield revert.
2018-08-07 17:37:10 +08:00
Ard Biesheuvel f10dc56c64 crypto: arm64 - revert NEON yield for fast AEAD implementations
As it turns out, checking the TIF_NEED_RESCHED flag after each
iteration results in a significant performance regression (~10%)
when running fast algorithms (i.e., ones that use special instructions
and operate in the < 4 cycles per byte range) on in-order cores with
comparatively slow memory accesses such as the Cortex-A53.

Given the speed of these ciphers, and the fact that the page based
nature of the AEAD scatterwalk API guarantees that the core NEON
transform is never invoked with more than a single page's worth of
input, we can estimate the worst case duration of any resulting
scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages,
processing a page's worth of input at 4 cycles per byte results in
a delay of ~250 us, which is a reasonable upper bound.

So let's remove the yield checks from the fused AES-CCM and AES-GCM
routines entirely.

This reverts commit 7b67ae4d5c and
partially reverts commit 7c50136a8a.

Fixes: 7c50136a8a ("crypto: arm64/aes-ghash - yield NEON after every ...")
Fixes: 7b67ae4d5c ("crypto: arm64/aes-ccm - yield NEON after every ...")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:26:23 +08:00
Paul Burton b023a93960
MIPS: Avoid using array as parameter to write_c0_kpgd()
Passing an array (swapper_pg_dir) as the argument to write_c0_kpgd() in
setup_pw() will become problematic if we modify __write_64bit_c0_split()
to cast its val argument to unsigned long long, because for 32-bit
kernel builds the size of a pointer will differ from the size of an
unsigned long long. This would fall foul of gcc's pointer-to-int-cast
diagnostic.

Cast the value to a long, which should be the same width as the pointer
that we ultimately want & will be sign extended if required to the
unsigned long long that __write_64bit_c0_split() ultimately needs.

Signed-off-by: Paul Burton <paul.burton@mips.com>
2018-08-06 18:44:09 -07:00
Paul Burton ee67855ecd
MIPS: vdso: Allow clang's --target flag in VDSO cflags
The MIPS VDSO code filters out a subset of known-good flags from
KBUILD_CFLAGS to use when building VDSO libraries. When we build using
clang we need to allow the --target flag through, otherwise we'll
generally attempt to build the VDSO for the architecture of the build
machine rather than for MIPS.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20154/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-08-06 15:53:33 -07:00
Paul Burton 4467f7ad7d
MIPS: genvdso: Remove GOT checks
Our genvdso tool performs some rather paranoid checking that the VDSO
library isn't attempting to make use of a GOT by constraining the number
of entries that the GOT is allowed to contain to the minimum 2 entries
that are always generated by binutils.

Unfortunately lld prior to revision 334390 generates a third entry,
which is unused & thus harmless but falls foul of genvdso's checks &
causes the build to fail.

Since we already check that the VDSO contains no relocations it seems
reasonable to presume that it also doesn't contain use of a GOT, which
would involve relocations. Thus rather than attempting to work around
this issue by allowing 3 GOT entries when using lld, simply remove the
GOT checks which seem overly paranoid.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20152/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-08-06 15:28:46 -07:00
M. Vefa Bicakci 405c018a25 xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bits
Commit d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
adjustment corruption") has moved the query and calculation of the
x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct
from the get_cpu_cap function to a new function named
get_cpu_address_sizes.

One of the call sites related to Xen PV VMs was unfortunately missed
in the aforementioned commit. This prevents successful boot-up of
kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL
is enabled, due to the following code path:

  enlighten_pv.c::xen_start_kernel
    mmu_pv.c::xen_reserve_special_pages
      page.h::__pa
        physaddr.c::__phys_addr
          physaddr.h::phys_addr_valid

phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical
addresses. boot_cpu_data.x86_phys_bits is no longer populated before
the call to xen_reserve_special_pages due to the aforementioned commit
though, so the validation performed by phys_addr_valid fails, which
causes __phys_addr to trigger a BUG, preventing boot-up.

Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: x86@kernel.org
Cc: stable@vger.kernel.org # for v4.17 and up
Fixes: d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-06 16:27:41 -04:00
Thomas Gleixner 315706049c Merge branch 'x86/pti-urgent' into x86/pti
Integrate the PTI Global bit fixes which conflict with the 32bit PTI
support.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-06 20:56:34 +02:00
Dave Hansen c40a56a781 x86/mm/init: Remove freed kernel image areas from alias mapping
The kernel image is mapped into two places in the virtual address space
(addresses without KASLR, of course):

	1. The kernel direct map (0xffff880000000000)
	2. The "high kernel map" (0xffffffff81000000)

We actually execute out of #2.  If we get the address of a kernel symbol,
it points to #2, but almost all physical-to-virtual translations point to

Parts of the "high kernel map" alias are mapped in the userspace page
tables with the Global bit for performance reasons.  The parts that we map
to userspace do not (er, should not) have secrets. When PTI is enabled then
the global bit is usually not set in the high mapping and just used to
compensate for poor performance on systems which lack PCID.

This is fine, except that some areas in the kernel image that are adjacent
to the non-secret-containing areas are unused holes.  We free these holes
back into the normal page allocator and reuse them as normal kernel memory.
The memory will, of course, get *used* via the normal map, but the alias
mapping is kept.

This otherwise unused alias mapping of the holes will, by default keep the
Global bit, be mapped out to userspace, and be vulnerable to Meltdown.

Remove the alias mapping of these pages entirely.  This is likely to
fracture the 2M page mapping the kernel image near these areas, but this
should affect a minority of the area.

The pageattr code changes *all* aliases mapping the physical pages that it
operates on (by default).  We only want to modify a single alias, so we
need to tweak its behavior.

This unmapping behavior is currently dependent on PTI being in place.
Going forward, we should at least consider doing this for all
configurations.  Having an extra read-write alias for memory is not exactly
ideal for debugging things like random memory corruption and this does
undercut features like DEBUG_PAGEALLOC or future work like eXclusive Page
Frame Ownership (XPFO).

Before this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
  current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

After this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
current_kernel-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
current_kernel-0xffffffff82488000-0xffffffff82600000        1504K                               pte
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82c0d000          52K     RW                     NX pte
current_kernel-0xffffffff82c0d000-0xffffffff82dc0000        1740K                               pte

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
  current_user-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
  current_user-0xffffffff82488000-0xffffffff82600000        1504K                               pte
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

[ tglx: Do not unmap on 32bit as there is only one mapping ]

Fixes: 0f561fce4d ("x86/pti: Enable global pages for shared areas")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20180802225831.5F6A2BFC@viggo.jf.intel.com
2018-08-06 20:54:16 +02:00
Robert P. J. Day 7dc084d625
MIPS: Remove obsolete MIPS checks for DST node "chosen@0"
As there is precious little left in any DTS files referring to the
node "/chosen@0" as opposed to "/chosen", remove the two checks for
the former node name.

[paul.burton@mips.com:
  The modified yamon-dt code only operates on
  arch/mips/boot/dts/mti/sead3.dts right now, and that uses chosen
  rather than chosen@0 anyway, so this should have no behavioural
  effect.]

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20131/
Cc: linux-mips@linux-mips.org
2018-08-06 09:50:33 -07:00
Uros Bizjak fd8ca6dac9 KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c
Remove open-coded uses of set instructions to use CC_SET()/CC_OUT() in
arch/x86/kvm/vmx.c.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
[Mark error paths as unlikely while touching this. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 18:18:41 +02:00
Wanpeng Li aaffcfd1e8 KVM: X86: Implement PV IPIs in linux guest
Implement paravirtual apic hooks to enable PV IPIs for KVM if the "send IPI"
hypercall is available.  The hypercall lets a guest send IPIs, with
at most 128 destinations per hypercall in 64-bit mode and 64 vCPUs per
hypercall in 32-bit mode.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:22 +02:00
Wanpeng Li d63bae079b KVM: X86: Add kvm hypervisor init time platform setup callback
Add kvm hypervisor init time platform setup callback which
will be used to replace native apic hooks by pararvirtual
hooks.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:21 +02:00
Wanpeng Li 4180bf1b65 KVM: X86: Implement "send IPI" hypercall
Using hypercall to send IPIs by one vmexit instead of one by one for
xAPIC/x2APIC physical mode and one vmexit per-cluster for x2APIC cluster
mode. Intel guest can enter x2apic cluster mode when interrupt remmaping
is enabled in qemu, however, latest AMD EPYC still just supports xapic
mode which can get great improvement by Exit-less IPIs. This patchset
lets a guest send multicast IPIs, with at most 128 destinations per
hypercall in 64-bit mode and 64 vCPUs per hypercall in 32-bit mode.

Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads, the VM
is 80 vCPUs, IPI microbenchmark(https://lkml.org/lkml/2017/12/19/141):

x2apic cluster mode, vanilla

 Dry-run:                         0,            2392199 ns
 Self-IPI:                  6907514,           15027589 ns
 Normal IPI:              223910476,          251301666 ns
 Broadcast IPI:                   0,         9282161150 ns
 Broadcast lock:                  0,         8812934104 ns

x2apic cluster mode, pv-ipi

 Dry-run:                         0,            2449341 ns
 Self-IPI:                  6720360,           15028732 ns
 Normal IPI:              228643307,          255708477 ns
 Broadcast IPI:                   0,         7572293590 ns  => 22% performance boost
 Broadcast lock:                  0,         8316124651 ns

x2apic physical mode, vanilla

 Dry-run:                         0,            3135933 ns
 Self-IPI:                  8572670,           17901757 ns
 Normal IPI:              226444334,          255421709 ns
 Broadcast IPI:                   0,        19845070887 ns
 Broadcast lock:                  0,        19827383656 ns

x2apic physical mode, pv-ipi

 Dry-run:                         0,            2446381 ns
 Self-IPI:                  6788217,           15021056 ns
 Normal IPI:              219454441,          249583458 ns
 Broadcast IPI:                   0,         7806540019 ns  => 154% performance boost
 Broadcast lock:                  0,         9143618799 ns

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:20 +02:00
Tianyu Lan 74fec5b9db KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs()
X86_CR4_OSXSAVE check belongs to sregs check and so move into
kvm_valid_sregs().

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:19 +02:00
Liang Chen ee6268ba3a KVM: x86: Skip pae_root shadow allocation if tdp enabled
Considering the fact that the pae_root shadow is not needed when
tdp is in use, skip the pae_root shadow page allocation to allow
mmu creation even not being able to obtain memory from DMA32
zone when particular cgroup cpuset.mems or mempolicy control is
applied.

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:19 +02:00
Tianyu Lan c2a4eadf77 KVM/MMU: Combine flushing remote tlb in mmu_set_spte()
mmu_set_spte() flushes remote tlbs for drop_parent_pte/drop_spte()
and set_spte() separately. This may introduce redundant flush. This
patch is to combine these flushes and check flush request after
calling set_spte().

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Reviewed-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:18 +02:00
Sean Christopherson 5e079c7ece KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible
The host's FS.base and GS.base rarely change, e.g. ~0.1% of host/guest
swaps on my system.  Cache the last value written to the VMCS and skip
the VMWRITE to the associated VMCS fields when loading host state if
the value hasn't changed since the last VMWRITE.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:17 +02:00
Sean Christopherson 8f21a0bbf3 KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible
On a 64-bit host, FS.sel and GS.sel are all but guaranteed to be 0,
which in turn means they'll rarely change.  Skip the VMWRITE for the
associated VMCS fields when loading host state if the selector hasn't
changed since the last VMWRITE.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:16 +02:00
Sean Christopherson f3bbc0dced KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup
The HOST_{FS,GS}_BASE fields are guaranteed to be written prior to
VMENTER, by way of vmx_prepare_switch_to_guest().  Initialize the
fields to zero for 64-bit kernels instead of pulling the base values
from their respective MSRs.  In addition to eliminating two RDMSRs,
vmx_prepare_switch_to_guest() can safely assume the initial value of
the fields is zero in all cases.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:16 +02:00
Sean Christopherson d7ee039e2b KVM: vmx: move struct host_state usage to struct loaded_vmcs
Make host_state a property of a loaded_vmcs so that it can be
used as a cache of the VMCS fields, e.g. to lazily VMWRITE the
corresponding VMCS field.  Treating host_state as a cache does
not work if it's not VMCS specific as the cache would become
incoherent when switching between vmcs01 and vmcs02.

Move vmcs_host_cr3 and vmcs_host_cr4 into host_state.

Explicitly zero out host_state when allocating a new VMCS for a
loaded_vmcs.  Unlike the pre-existing vmcs_host_cr{3,4} usage,
the segment information is not guaranteed to be (re)initialized
when running a new nested VMCS, e.g. HOST_FS_BASE is not written
in vmx_set_constant_host_state().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:15 +02:00
Sean Christopherson e920de8507 KVM: vmx: compute need to reload FS/GS/LDT on demand
Remove fs_reload_needed and gs_ldt_reload_needed from host_state
and instead compute whether we need to reload various state at
the time we actually do the reload.  The state that is tracked
by the *_reload_needed variables is not any more volatile than
the trackers themselves.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:14 +02:00
Sean Christopherson fd1ec7723f KVM: nVMX: remove a misleading comment regarding vmcs02 fields
prepare_vmcs02() has an odd comment that says certain fields are
"not in vmcs02".  AFAICT the intent of the comment is to document
that various VMCS fields are not handled by prepare_vmcs02(),
e.g. HOST_{FS,GS}_{BASE,SELECTOR}.  While technically true, the
comment is misleading, e.g. it can lead the reader to think that
KVM never writes those fields to vmcs02.

Remove the comment altogether as the handling of FS and GS is
not specific to nested VMX, and GUEST_PML_INDEX has been written
by prepare_vmcs02() since commit "4e59516a12a6 (kvm: vmx: ensure
VMCS is current while enabling PML)"

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:13 +02:00
Sean Christopherson 6d6095bd2c KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state()
Now that the vmx_load_host_state() wrapper is gone, i.e. the only
time we call the core functions is when we're actually about to
switch between guest/host, rename the functions that handle lazy
state switching to vmx_prepare_switch_to_{guest,host}_state() to
better document the full extent of their functionality.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:12 +02:00
Sean Christopherson 678e315e78 KVM: vmx: add dedicated utility to access guest's kernel_gs_base
When lazy save/restore of MSR_KERNEL_GS_BASE was introduced[1], the
MSR was intercepted in all modes and was only restored for the host
when the guest is in 64-bit mode.  So at the time, going through the
full host restore prior to accessing MSR_KERNEL_GS_BASE was necessary
to load host state and was not a significant waste of cycles.

Later, MSR_KERNEL_GS_BASE interception was disabled for a 64-bit
guest[2], and then unconditionally saved/restored for the host[3].
As a result, loading full host state is overkill for accesses to
MSR_KERNEL_GS_BASE, and completely unnecessary when the guest is
not in 64-bit mode.

Add a dedicated utility to read/write the guest's MSR_KERNEL_GS_BASE
(outside of the save/restore flow) to minimize the overhead incurred
when accessing the MSR.  When setting EFER, only decache the MSR if
the new EFER will disable long mode.

Removing out-of-band usage of vmx_load_host_state() also eliminates,
or at least reduces, potential corner cases in its usage, which in
turn will (hopefuly) make it easier to reason about future changes
to the save/restore flow, e.g. optimization of saving host state.

[1] commit 44ea2b1758 ("KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx
                                    autoload msr area")
[2] commit 5897297bc2 ("KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE")
[3] commit c8770e7ba6 ("KVM: VMX: Fix host userspace gsbase corruption")

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:12 +02:00
Sean Christopherson bd9966de4e KVM: vmx: track host_state.loaded using a loaded_vmcs pointer
Using 'struct loaded_vmcs*' to track whether the CPU registers
contain host or guest state kills two birds with one stone.

  1. The (effective) boolean host_state.loaded is poorly named.
     It does not track whether or not host state is loaded into
     the CPU registers (which most readers would expect), but
     rather tracks if host state has been saved AND guest state
     is loaded.

  2. Using a loaded_vmcs pointer provides a more robust framework
     for the optimized guest/host state switching, especially when
     consideration per-VMCS enhancements.  To that end, WARN_ONCE
     if we try to switch to host state with a different VMCS than
     was last used to save host state.

Resolve an occurrence of the new WARN by setting loaded_vmcs after
the call to vmx_vcpu_put() in vmx_switch_vmcs().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:11 +02:00
Sean Christopherson e368b875a8 KVM: vmx: refactor segmentation code in vmx_save_host_state()
Use local variables in vmx_save_host_state() to temporarily track
the selector and base values for FS and GS, and reorganize the
code so that the 64-bit vs 32-bit portions are contained within
a single #ifdef.  This refactoring paves the way for future patches
to modify the updating of VMCS state with minimal changes to the
code, and (hopefully) simplifies resolving a likely conflict with
another in-flight patch[1] by being the whipping boy for future
patches.

[1] https://www.spinics.net/lists/kvm/msg171647.html

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:10 +02:00
Jim Mattson e49fcb8b9e kvm: nVMX: Fix fault priority for VMX operations
When checking emulated VMX instructions for faults, the #UD for "IF
(not in VMX operation)" should take precedence over the #GP for "ELSIF
CPL > 0."

Suggested-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:09 +02:00
Jim Mattson 36090bf43a kvm: nVMX: Fix fault vector for VMX operation at CPL > 0
The fault that should be raised for a privilege level violation is #GP
rather than #UD.

Fixes: 727ba748e1 ("kvm: nVMX: Enforce cpl=0 for VMX instructions")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:08 +02:00
Tianyu Lan 877ad952be KVM: vmx: Add tlb_remote_flush callback support
Register tlb_remote_flush callback for vmx when hyperv capability of
nested guest mapping flush is detected. The interface can help to
reduce overhead when flush ept table among vcpus for nested VM. The
tradition way is to send IPIs to all affected vcpus and executes
INVEPT on each vcpus. It will trigger several vmexits for IPI
and INVEPT emulation. Hyper-V provides such hypercall to do
flush for all vcpus and call the hypercall when all ept table
pointers of single VM are same.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:07 +02:00
Tianyu Lan b08660e59d KVM: x86: Add tlb remote flush callback in kvm_x86_ops.
This patch is to provide a way for platforms to register hv tlb remote
flush callback and this helps to optimize operation of tlb flush
among vcpus for nested virtualization case.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:06 +02:00
Tianyu Lan 60cfce4c4f X86/Hyper-V: Add hyperv_nested_flush_guest_mapping ftrace support
This patch is to add hyperv_nested_flush_guest_mapping support to trace
hvFlushGuestPhysicalAddressSpace hypercall.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:05 +02:00
Tianyu Lan eb914cfe72 X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support
Hyper-V supports a pv hypercall HvFlushGuestPhysicalAddressSpace to
flush nested VM address space mapping in l1 hypervisor and it's to
reduce overhead of flushing ept tlb among vcpus. This patch is to
implement it.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:04 +02:00
Waiman Long 3553ae5690 x86/kvm: Don't use pvqspinlock code if only 1 vCPU
On a VM with only 1 vCPU, the locking fast path will always be
successful. In this case, there is no need to use the the PV qspinlock
code which has higher overhead on the unlock side than the native
qspinlock code.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:04 +02:00
Tianyu Lan 450917b654 KVM/MMU: Simplify __kvm_sync_page() function
Merge check of "sp->role.cr4_pae != !!is_pae(vcpu))" and "vcpu->
arch.mmu.sync_page(vcpu, sp) == 0". kvm_mmu_prepare_zap_page()
is called under both these conditions.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:03 +02:00
Junaid Shahid 208320ba10 kvm: x86: Remove CR3_PCID_INVD flag
It is a duplicate of X86_CR3_PCID_NOFLUSH. So just use that instead.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:02 +02:00
Junaid Shahid b94742c958 kvm: x86: Add multi-entry LRU cache for previous CR3s
Adds support for storing multiple previous CR3/root_hpa pairs maintained
as an LRU cache, so that the lockless CR3 switch path can be used when
switching back to any of them.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:02 +02:00
Junaid Shahid faff87588d kvm: x86: Flush only affected TLB entries in kvm_mmu_invlpg*
This needs a minor bug fix. The updated patch is as follows.

Thanks,
Junaid

------------------------------------------------------------------------------

kvm_mmu_invlpg() and kvm_mmu_invpcid_gva() only need to flush the TLB
entries for the specific guest virtual address, instead of flushing all
TLB entries associated with the VM.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:01 +02:00
Junaid Shahid 956bf3531f kvm: x86: Skip shadow page resync on CR3 switch when indicated by guest
When the guest indicates that the TLB doesn't need to be flushed in a
CR3 switch, we can also skip resyncing the shadow page tables since an
out-of-sync shadow page table is equivalent to an out-of-sync TLB.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:59:00 +02:00
Junaid Shahid 08fb59d8a4 kvm: x86: Support selectively freeing either current or previous MMU root
kvm_mmu_free_roots() now takes a mask specifying which roots to free, so
that either one of the roots (active/previous) can be individually freed
when needed.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:59 +02:00
Junaid Shahid 7eb77e9f5f kvm: x86: Add a root_hpa parameter to kvm_mmu->invlpg()
This allows invlpg() to be called using either the active root_hpa
or the prev_root_hpa.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:58 +02:00
Junaid Shahid ade61e2824 kvm: x86: Skip TLB flush on fast CR3 switch when indicated by guest
When PCIDs are enabled, the MSb of the source operand for a MOV-to-CR3
instruction indicates that the TLB doesn't need to be flushed.

This change enables this optimization for MOV-to-CR3s in the guest
that have been intercepted by KVM for shadow paging and are handled
within the fast CR3 switch path.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:58 +02:00
Junaid Shahid eb4b248e15 kvm: vmx: Support INVPCID in shadow paging mode
Implement support for INVPCID in shadow paging mode as well.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:57 +02:00
Junaid Shahid c9470a2e28 kvm: x86: Propagate guest PCIDs to host PCIDs
When using shadow paging mode, propagate the guest's PCID value to
the shadow CR3 in the host instead of always using PCID 0.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:56 +02:00
Junaid Shahid afe828d1de kvm: x86: Add ability to skip TLB flush when switching CR3
Remove the implicit flush from the set_cr3 handlers, so that the
callers are able to decide whether to flush the TLB or not.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:55 +02:00
Junaid Shahid 50c28f21d0 kvm: x86: Use fast CR3 switch for nested VMX
Use the fast CR3 switch mechanism to locklessly change the MMU root
page when switching between L1 and L2. The switch from L2 to L1 should
always go through the fast path, while the switch from L1 to L2 should
go through the fast path if L1's CR3/EPTP for L2 hasn't changed
since the last time.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:54 +02:00
Junaid Shahid 1c53da3fa3 kvm: x86: Support resetting the MMU context without resetting roots
This adds support for re-initializing the MMU context in a different
mode while preserving the active root_hpa and the prev_root.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:54 +02:00
Junaid Shahid 0aab33e4f9 kvm: x86: Add support for fast CR3 switch across different MMU modes
This generalizes the lockless CR3 switch path to be able to work
across different MMU modes (e.g. nested vs non-nested) by checking
that the expected page role of the new root page matches the page role
of the previously stored root page in addition to checking that the new
CR3 matches the previous CR3. Furthermore, instead of loading the
hardware CR3 in fast_cr3_switch(), it is now done in vcpu_enter_guest(),
as by that time the MMU context would be up-to-date with the VCPU mode.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:53 +02:00
Junaid Shahid 6e42782f51 kvm: x86: Introduce KVM_REQ_LOAD_CR3
The KVM_REQ_LOAD_CR3 request loads the hardware CR3 using the
current root_hpa.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:52 +02:00
Junaid Shahid 9fa72119b2 kvm: x86: Introduce kvm_mmu_calc_root_page_role()
These functions factor out the base role calculation from the
corresponding kvm_init_*_mmu() functions. The new functions return
what would be the role assigned to a root page in the current VCPU
state. This can be masked with mmu_base_role_mask to derive the base
role.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:51 +02:00
Junaid Shahid 7c390d350f kvm: x86: Add fast CR3 switch code path
When using shadow paging, a CR3 switch in the guest results in a VM Exit.
In the common case, that VM exit doesn't require much processing by KVM.
However, it does acquire the MMU lock, which can start showing signs of
contention under some workloads even on a 2 VCPU VM when the guest is
using KPTI. Therefore, we add a fast path that avoids acquiring the MMU
lock in the most common cases e.g. when switching back and forth between
the kernel and user mode CR3s used by KPTI with no guest page table
changes in between.

For now, this fast path is implemented only for 64-bit guests and hosts
to avoid the handling of PDPTEs, but it can be extended later to 32-bit
guests and/or hosts as well.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:51 +02:00
Junaid Shahid 578e1c4db2 kvm: x86: Avoid taking MMU lock in kvm_mmu_sync_roots if no sync is needed
kvm_mmu_sync_roots() can locklessly check whether a sync is needed and just
bail out if it isn't.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:50 +02:00
Junaid Shahid 5ce4786f75 kvm: x86: Make sync_page() flush remote TLBs once only
sync_page() calls set_spte() from a loop across a page table. It would
work better if set_spte() left the TLB flushing to its callers, so that
sync_page() can aggregate into a single call.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:49 +02:00
Peter Xu 42522d08cd KVM: MMU: drop vcpu param in gpte_access
It's never used.  Drop it.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:48 +02:00
Liran Alon abfc52c612 KVM: nVMX: Separate logic allocating shadow vmcs to a function
No functionality change.
This is done as a preparation for VMCS shadowing virtualization.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:48 +02:00
Liran Alon 491a603845 KVM: VMX: Mark vmcs header as shadow in case alloc_vmcs_cpu() allocate shadow vmcs
No functionality change.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:47 +02:00
Liran Alon 32c7acf044 KVM: nVMX: Expose VMCS shadowing to L1 guest
Expose VMCS shadowing to L1 as a VMX capability of the virtual CPU,
whether or not VMCS shadowing is supported by the physical CPU.
(VMCS shadowing emulation)

Shadowed VMREADs and VMWRITEs from L2 are handled by L0, without a
VM-exit to L1.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:46 +02:00
Liran Alon a7cde481b6 KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps
This is done as a preparation for VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:45 +02:00
Liran Alon 6d894f498f KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2
This is done as a preparation to VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:44 +02:00
Paolo Bonzini fa58a9fa74 KVM: nVMX: include shadow vmcs12 in nested state
The shadow vmcs12 cannot be flushed on KVM_GET_NESTED_STATE,
because at that point guest memory is assumed by userspace to
be immutable.  Capture the cache in vmx_get_nested_state, adding
another page at the end if there is an active shadow vmcs12.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:43 +02:00
Liran Alon 61ada7488f KVM: nVMX: Cache shadow vmcs12 on VMEntry and flush to memory on VMExit
This is done is done as a preparation to VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:42 +02:00
Liran Alon f145d90d97 KVM: nVMX: Verify VMCS shadowing VMCS link pointer
Intel SDM considers these checks to be part of
"Checks on Guest Non-Register State".

Note that it is legal for vmcs->vmcs_link_pointer to be -1ull
when VMCS shadowing is enabled. In this case, any VMREAD/VMWRITE to
shadowed-field sets the ALU flags for VMfailInvalid (i.e. CF=1).

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:41 +02:00
Liran Alon a8a7c02bf7 KVM: nVMX: Verify VMCS shadowing controls
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:40 +02:00
Liran Alon f792d2743e KVM: nVMX: Introduce nested_cpu_has_shadow_vmcs()
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:40 +02:00
Liran Alon a6192d40d5 KVM: nVMX: Fail VMLAUNCH and VMRESUME on shadow VMCS
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:39 +02:00
Liran Alon fa97d7dba7 KVM: nVMX: Allow VMPTRLD for shadow VMCS if vCPU supports VMCS shadowing
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:38 +02:00
Liran Alon e253674227 KVM: VMX: Change vmcs12_{read,write}_any() to receive vmcs12 as parameter
No functionality change.
This is done as a preparation for VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:37 +02:00
Liran Alon 392b2f25aa KVM: VMX: Create struct for VMCS header
No functionality change.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:37 +02:00
Jim Mattson 8fcc4b5923 kvm: nVMX: Introduce KVM_CAP_NESTED_STATE
For nested virtualization L0 KVM is managing a bit of state for L2 guests,
this state can not be captured through the currently available IOCTLs. In
fact the state captured through all of these IOCTLs is usually a mix of L1
and L2 state. It is also dependent on whether the L2 guest was running at
the moment when the process was interrupted to save its state.

With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
that is in VMX operation.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
[karahmed@ - rename structs and functions and make them ready for AMD and
             address previous comments.
           - handle nested.smm state.
           - rebase & a bit of refactoring.
           - Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:30 +02:00
Paolo Bonzini 7f7f1ba33c KVM: x86: do not load vmcs12 pages while still in SMM
If the vCPU enters system management mode while running a nested guest,
RSM starts processing the vmentry while still in SMM.  In that case,
however, the pages pointed to by the vmcs12 might be incorrectly
loaded from SMRAM.  To avoid this, delay the handling of the pages
until just before the next vmentry.  This is done with a new request
and a new entry in kvm_x86_ops, which we will be able to reuse for
nested VMX state migration.

Extracted from a patch by Jim Mattson and KarimAllah Ahmed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:57:58 +02:00
Paolo Bonzini 44883f01fe KVM: x86: ensure all MSRs can always be KVM_GET/SET_MSR'd
Some of the MSRs returned by GET_MSR_INDEX_LIST currently cannot be sent back
to KVM_GET_MSR and/or KVM_SET_MSR; either they can never be sent back, or you
they are only accepted under special conditions.  This makes the API a pain to
use.

To avoid this pain, this patch makes it so that the result of the get-list
ioctl can always be used for host-initiated get and set.  Since we don't have
a separate way to check for read-only MSRs, this means some Hyper-V MSRs are
ignored when written.  Arguably they should not even be in the result of
GET_MSR_INDEX_LIST, but I am leaving there in case userspace is using the
outcome of GET_MSR_INDEX_LIST to derive the support for the corresponding
Hyper-V feature.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:32:01 +02:00
Sean Christopherson cf81a7e580 KVM: vmx: remove save/restore of host BNDCGFS MSR
Linux does not support Memory Protection Extensions (MPX) in the
kernel itself, thus the BNDCFGS (Bound Config Supervisor) MSR will
always be zero in the KVM host, i.e. RDMSR in vmx_save_host_state()
is superfluous.  KVM unconditionally sets VM_EXIT_CLEAR_BNDCFGS,
i.e. BNDCFGS will always be zero after VMEXIT, thus manually loading
BNDCFGS is also superfluous.

And in the event the MPX kernel support is added (unlikely given
that MPX for userspace is in its death throes[1]), BNDCFGS will
likely be common across all CPUs[2], and at the least shouldn't
change on a regular basis, i.e. saving the MSR on every VMENTRY is
completely unnecessary.

WARN_ONCE in hardware_setup() if the host's BNDCFGS is non-zero to
document that KVM does not preserve BNDCFGS and to serve as a hint
as to how BNDCFGS likely should be handled if MPX is used in the
kernel, e.g. BNDCFGS should be saved once during KVM setup.

[1] https://lkml.org/lkml/2018/4/27/1046
[2] http://www.openwall.com/lists/kernel-hardening/2017/07/24/28

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:32:00 +02:00
Paolo Bonzini d2ce98ca0a Linux 4.18-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76
 sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA
 1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR
 9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU
 mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2
 L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt
 vJJlibI=
 =42a5
 -----END PGP SIGNATURE-----

Merge tag 'v4.18-rc6' into HEAD

Pull bug fixes into the KVM development tree to avoid nasty conflicts.
2018-08-06 17:31:36 +02:00
Thomas Gleixner 9e90c79852 irqchip updates for 4.19
- GICv3 ITS LPI allocation revamp
 - GICv3 support for hypervisor-enforced LPI range
 - GICv3 ITS conversion to raw spinlock
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltoBXMVHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDyUYP/1feAq3F7ZmhCIZka4c6y/m4EBpq
 BjWEEgOAGMEyyB4s98flsRtZcEUxxp6CqEXo2FgCsd1Nj+og7oA7vwOlqy3aGzsi
 9f/Z5Wi6SlG06lH5tmYNkyVbGk2tE3s2FzkH5Rg8qZGk+X3OCOdNs/+G20pYAkSp
 ESePWSapbQUJSExJ1MqzfdHFidtVA1V+ev8BKdIp2ykl1NRae8LJeKHIbqac49Ym
 JclfCLFpQM1M1ElB9j0E8hAvZhz10oOz7TtBR737O/1QEifVyFqGBckPzldvwIJM
 zZ+nR+Yzj1ruD109xwaF1iKy9AinZWhiqrtN7UXJ3jwHtNih+sy0R6FQ38GMNoOC
 0K02n/qStR5xglGr4BmAcWlOuFtBYWfz6HpSVMqaTWWmOxHEiqS6pXtEA+dV/YyI
 wHLbo0YzpWTQm6t1+b/PoByAJ0/hOcD1nOD57b+NGjX7tZV0sGjpGsecvFhTSywh
 BN3COBi9k/FOBrOTGDX1qUAI+mEf76vc2BAC+BkkoiiMg3WlY0E9qfQJguUxHdrb
 0LS3lDZoHCNoz8RZLrUyenTT0NYGcjPGUTinMDJWG79VGXOWFexTDdCuX0kF90CK
 1Zie3O6lrTYolmaiyLUxwukKp1SVUyoA5IpKVwfDJQYUhEfk27yvlzg2MBMcHDRA
 uy3QSkmjx9vw/sAu
 =gKw8
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip updates from Marc Zyngier:

- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
2018-08-06 12:45:42 +02:00
Yangbo Lu a16b5da54d powerpc/mpc85xx: add clocks property for fman ptp timer node
This patch is to add clocks property for fman ptp timer node.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05 17:11:49 -07:00
Yangbo Lu f63421a70f arm64: dts: fsl: add clocks property for fman ptp timer node
This patch is to add clocks property for fman ptp timer node.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05 17:11:49 -07:00
Alistair Strachan 379d98ddf4 x86: vdso: Use $LD instead of $CC to link
The vdso{32,64}.so can fail to link with CC=clang when clang tries to find
a suitable GCC toolchain to link these libraries with.

/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
  access beyond end of merged section (782)

This happens because the host environment leaked into the cross compiler
environment due to the way clang searches for suitable GCC toolchains.

Clang is a retargetable compiler, and each invocation of it must provide
--target=<something> --gcc-toolchain=<something> to allow it to find the
correct binutils for cross compilation. These flags had been added to
KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various
reasons) which breaks clang's ability to find the correct linker when cross
compiling.

Most of the time this goes unnoticed because the host linker is new enough
to work anyway, or is incompatible and skipped, but this cannot be reliably
assumed.

This change alters the vdso makefile to just use LD directly, which
bypasses clang and thus the searching problem. The makefile will just use
${CROSS_COMPILE}ld instead, which is always what we want. This matches the
method used to link vmlinux.

This drops references to DISABLE_LTO; this option doesn't seem to be set
anywhere, and not knowing what its possible values are, it's not clear how
to convert it from CC to LD flag.

Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: kernel-team@android.com
Cc: joel@joelfernandes.org
Cc: Andi Kleen <andi.kleen@intel.com>
Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrachan@google.com
2018-08-05 22:33:50 +02:00
Nick Desaulniers 208cbb3255 x86/irqflags: Provide a declaration for native_save_fl
It was reported that the commit d0a8d9378d is causing users of gcc < 4.9
to observe -Werror=missing-prototypes errors.

Indeed, it seems that:
extern inline unsigned long native_save_fl(void) { return 0; }

compiled with -Werror=missing-prototypes produces this warning in gcc <
4.9, but not gcc >= 4.9.

Fixes: d0a8d9378d ("x86/paravirt: Make native_save_fl() extern inline").
Reported-by: David Laight <david.laight@aculab.com>
Reported-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: jgross@suse.com
Cc: kstewart@linuxfoundation.org
Cc: gregkh@linuxfoundation.org
Cc: boris.ostrovsky@oracle.com
Cc: astrachan@google.com
Cc: mka@chromium.org
Cc: arnd@arndb.de
Cc: tstellar@redhat.com
Cc: sedat.dilek@gmail.com
Cc: David.Laight@aculab.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180803170550.164688-1-ndesaulniers@google.com
2018-08-05 22:30:37 +02:00
Dave Hansen 6ea2738e0c x86/mm/init: Add helper for freeing kernel image pages
When chunks of the kernel image are freed, free_init_pages() is used
directly.  Consolidate the three sites that do this.  Also update the
string to give an incrementally better description of that memory versus
what was there before.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: hughd@google.com
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225829.FE0E32EA@viggo.jf.intel.com
2018-08-05 22:21:03 +02:00
Dave Hansen 9f515cdb41 x86/mm/init: Pass unconverted symbol addresses to free_init_pages()
The x86 code has several places where it frees parts of kernel image:

 1. Unused SMP alternative
 2. __init code
 3. The hole between text and rodata
 4. The hole between rodata and data

We call free_init_pages() to do this.  Strangely, we convert the symbol
addresses to kernel direct map addresses in some cases (#3, #4) but not
others (#1, #2).

The virt_to_page() and the other code in free_reserved_area() now works
fine for for symbol addresses on x86, so don't bother converting the
addresses to direct map addresses before freeing them.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: hughd@google.com
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225828.89B2D0E2@viggo.jf.intel.com
2018-08-05 22:21:02 +02:00
Dave Hansen eac7073aa6 x86/mm/pti: Clear Global bit more aggressively
The kernel image starts out with the Global bit set across the entire
kernel image.  The bit is cleared with set_memory_nonglobal() in the
configurations with PCIDs where the performance benefits of the Global bit
are not needed.

However, this is fragile.  It means that we are stuck opting *out* of the
less-secure (Global bit set) configuration, which seems backwards.  Let's
start more secure (Global bit clear) and then let things opt back in if
they want performance, or are truly mapping common data between kernel and
userspace.

This fixes a bug.  Before this patch, there are areas that are unmapped
from the user page tables (like like everything above 0xffffffff82600000 in
the example below).  These have the hallmark of being a wrong Global area:
they are not identical in the 'current_kernel' and 'current_user' page
table dumps.  They are also read-write, which means they're much more
likely to contain secrets.

Before this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                 GLB NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE     GLB NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                 GLB NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE     GLB NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

 current_user:---[ High Kernel Mapping ]---
 current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
 current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
 current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
 current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                 GLB NX pte
 current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
 current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

After this patch:

current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd

  current_user:---[ High Kernel Mapping ]---
  current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
  current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
  current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
  current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
  current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
  current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd

Fixes: 0f561fce4d ("x86/pti: Enable global pages for shared areas")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225825.A100C071@viggo.jf.intel.com
2018-08-05 22:21:02 +02:00
David S. Miller c1c8626fce Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Lots of overlapping changes, mostly trivial in nature.

The mlxsw conflict was resolving using the example
resolution at:

https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05 13:04:31 -07:00
Linus Torvalds a8c199208c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
 "A single fix, which addresses boot failures on machines which do not
  report EBDA correctly, which can place the trampoline into reserved
  memory regions. Validating against E820 prevents that"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/compressed/64: Validate trampoline placement against E820
2018-08-05 09:39:30 -07:00
Linus Torvalds 0cdf6d4607 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A set of fixes for perf:

  Kernel side:

   - Fix the hardcoded index of extra PCI devices on Broadwell which
     caused a resource conflict and triggered warnings on CPU hotplug.

  Tooling:

   - Update the tools copy of several files, including perf_event.h,
     powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and x86's
     memcpy_64.s (used in 'perf bench mem'), silencing the respective
     warnings during the perf tools build.

   - Fix the build on the alpine:edge distro"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices
  perf tools: Fix the build on the alpine:edge distro
  tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
  tools headers uapi: Refresh linux/bpf.h copy
  tools headers powerpc: Update asm/unistd.h copy to pick new
  tools headers uapi: Update tools's copy of linux/perf_event.h
2018-08-05 09:13:07 -07:00
Paolo Bonzini 5b76a3cff0 KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
When nested virtualization is in use, VMENTER operations from the nested
hypervisor into the nested guest will always be processed by the bare metal
hypervisor, and KVM's "conditional cache flushes" mode in particular does a
flush on nested vmentry.  Therefore, include the "skip L1D flush on
vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting.

Add the relevant Documentation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 17:10:20 +02:00
Paolo Bonzini 8e0b2b9166 x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
not needed.  Add a new value to enum vmx_l1d_flush_state, which is used
either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 17:10:19 +02:00
Paolo Bonzini ea156d192f x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Three changes to the content of the sysfs file:

 - If EPT is disabled, L1TF cannot be exploited even across threads on the
   same core, and SMT is irrelevant.

 - If mitigation is completely disabled, and SMT is enabled, print "vulnerable"
   instead of "vulnerable, SMT vulnerable"

 - Reorder the two parts so that the main vulnerability state comes first
   and the detail on SMT is second.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 17:10:19 +02:00
Thomas Gleixner f2701b77bb Merge 4.18-rc7 into master to pick up the KVM dependcy
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 16:39:29 +02:00
Nicolai Stange 18b57ce2eb x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
For VMEXITs caused by external interrupts, vmx_handle_external_intr()
indirectly calls into the interrupt handlers through the host's IDT.

It follows that these interrupts get accounted for in the
kvm_cpu_l1tf_flush_l1d per-cpu flag.

The subsequently executed vmx_l1d_flush() will thus be aware that some
interrupts have happened and conduct a L1d flush anyway.

Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed
anymore. Drop it.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 09:53:14 +02:00
Nicolai Stange ffcba43ff6 x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
The last missing piece to having vmx_l1d_flush() take interrupts after
VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on
irq entry.

Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(),
ipi_entering_ack_irq(), smp_reschedule_interrupt() and
uv_bau_message_interrupt().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 09:53:13 +02:00
Nicolai Stange 447ae31667 x86: Don't include linux/irq.h from asm/hardirq.h
The next patch in this series will have to make the definition of
irq_cpustat_t available to entering_irq().

Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
dependencies like

  asm/smp.h
    asm/apic.h
      asm/hardirq.h
        linux/irq.h
          linux/topology.h
            linux/smp.h
              asm/smp.h

or

  linux/gfp.h
    linux/mmzone.h
      asm/mmzone.h
        asm/mmzone_64.h
          asm/smp.h
            asm/apic.h
              asm/hardirq.h
                linux/irq.h
                  linux/irqdesc.h
                    linux/kobject.h
                      linux/sysfs.h
                        linux/kernfs.h
                          linux/idr.h
                            linux/gfp.h

and others.

This causes compilation errors because of the header guards becoming
effective in the second inclusion: symbols/macros that had been defined
before wouldn't be available to intermediate headers in the #include chain
anymore.

A possible workaround would be to move the definition of irq_cpustat_t
into its own header and include that from both, asm/hardirq.h and
asm/apic.h.

However, this wouldn't solve the real problem, namely asm/harirq.h
unnecessarily pulling in all the linux/irq.h cruft: nothing in
asm/hardirq.h itself requires it. Also, note that there are some other
archs, like e.g. arm64, which don't have that #include in their
asm/hardirq.h.

Remove the linux/irq.h #include from x86' asm/hardirq.h.

Fix resulting compilation errors by adding appropriate #includes to *.c
files as needed.

Note that some of these *.c files could be cleaned up a bit wrt. to their
set of #includes, but that should better be done from separate patches, if
at all.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 09:53:13 +02:00
Nicolai Stange 45b575c00d x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
Part of the L1TF mitigation for vmx includes flushing the L1D cache upon
VMENTRY.

L1D flushes are costly and two modes of operations are provided to users:
"always" and the more selective "conditional" mode.

If operating in the latter, the cache would get flushed only if a host side
code path considered unconfined had been traversed. "Unconfined" in this
context means that it might have pulled in sensitive data like user data
or kernel crypto keys.

The need for L1D flushes is tracked by means of the per-vcpu flag
l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A
vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct a
L1d flush based on its value and reset that flag again.

Currently, interrupts delivered "normally" while in root operation between
VMEXIT and VMENTER are not taken into account. Part of the reason is that
these don't leave any traces and thus, the vmx code is unable to tell if
any such has happened.

As proposed by Paolo Bonzini, prepare for tracking all interrupts by
introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in
strong analogy to the per-vcpu ->l1tf_flush_l1d.

A later patch will make interrupt handlers set it.

For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86'
per-cpu irq_cpustat_t as suggested by Peter Zijlstra.

Provide the helpers kvm_set_cpu_l1tf_flush_l1d(),
kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them
trivial resp. non-existent for !CONFIG_KVM_INTEL as appropriate.

Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as
l1tf_flush_l1d.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-05 09:53:12 +02:00
Nicolai Stange 9aee5f8a7e x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
An upcoming patch will extend KVM's L1TF mitigation in conditional mode
to also cover interrupts after VMEXITs. For tracking those, stores to a
new per-cpu flag from interrupt handlers will become necessary.

In order to improve cache locality, this new flag will be added to x86's
irq_cpustat_t.

Make some space available there by shrinking the ->softirq_pending bitfield
from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS,
i.e. 10.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-05 09:53:12 +02:00
Nicolai Stange 5b6ccc6c3b x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes
vmx_l1d_flush() if so.

This test is unncessary for the "always flush L1D" mode.

Move the check to vmx_l1d_flush()'s conditional mode code path.

Notes:
- vmx_l1d_flush() is likely to get inlined anyway and thus, there's no
  extra function call.
  
- This inverts the (static) branch prediction, but there hadn't been any
  explicit likely()/unlikely() annotations before and so it stays as is.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 09:53:11 +02:00
Nicolai Stange 427362a142 x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
The vmx_l1d_flush_always static key is only ever evaluated if
vmx_l1d_should_flush is enabled. In that case however, there are only two
L1d flushing modes possible: "always" and "conditional".

The "conditional" mode's implementation tends to require more sophisticated
logic than the "always" mode.

Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key
with a 'vmx_l1d_flush_cond' one.

There is no change in functionality.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 09:53:11 +02:00
Nicolai Stange 379fd0c7e6 x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no
point in setting l1tf_flush_l1d to true from there again.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 09:53:10 +02:00
Olof Johansson afd3e3dad6 Qualcomm ARM64 Updates for v4.19 - Part 2
* Add thermal nodes for MSM8996 and SDM845
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbY3lJAAoJEFKiBbHx2RXVttEP/jHngIc9DSOi6r1hXvjNgaTM
 CdvxETAaKlDcFb4H4vwTVbHONor4GHK8UnusPodI3SPG62pbEr2Jad9Z/NpuW3XP
 qAvgyPAMPKDPfHAYdp3lGPKb7yc6BBtL5sSTgl/rbsIOFrSeD6BxKRWdcwRIvwW5
 d5VU/JyyBX0Kb0cIaasmUWP5crYXzFsgV0SggQiekgDzfO5cK8pEVwKc5GbIevDN
 mjXf8u5i58xJSabSrXCltKj13/RMzAzVlQaPl/3L3T5SrGq2AYHGWLOwdVMbeeZg
 8nJ95w0voq+f4euL+l5pPmJFw1AZIEWX4MykVJp3bI8qEQwBKxqscw2zg4ZjAaoT
 BAx363JiOHsCknWuhVd845T2U8yMMgzYExDjeMV21goJCNaPO1fXPZr6IZlcqq3x
 X5NT9J3qFKZdf04EHTO7ov/aUO/QiCcxZl8KO1/Sgjwd/tvDQDkFAlFWFC9nrehR
 IMYnWw/btI6HG6iJAyuGx4gmTvza8dOR2oZLnanHAiECJ+6d8nr863yk/b92uP4d
 qGvIwXiQ4FPDP4ta5biacqQCeyT3MIGRL+cceB4S2/m2IxO9+VMPrGJifkD5aDOp
 jBW+f0G9Fb3mgjqGTAAA/bW7tNkbLX4WtWAlTo1to1BOgKB5zwCCZgf43I8Y8wie
 +Rk13xVxvqrR8idXIFap
 =w/hf
 -----END PGP SIGNATURE-----

Merge tag 'qcom-arm64-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into next/dt

Qualcomm ARM64 Updates for v4.19 - Part 2

* Add thermal nodes for MSM8996 and SDM845

* tag 'qcom-arm64-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux: (21 commits)
  arm64: dts: sdm845: Add tsens nodes
  arm64: dts: msm8996: thermal: Initialise via DT and add second controller
  soc: qcom: rmtfs-mem: fix memleak in probe error paths
  soc: qcom: llc-slice: Add missing MODULE_LICENSE()
  drivers: qcom: rpmh: fix unwanted error check for get_tcs_of_type()
  drivers: qcom: rpmh-rsc: fix the loop index check in get_req_from_tcs
  firmware: qcom: scm: add a dummy qcom_scm_assign_mem()
  drivers: qcom: rpmh-rsc: Check cmd_db_ready() to help children
  drivers: qcom: rpmh-rsc: allow active requests from wake TCS
  drivers: qcom: rpmh: add support for batch RPMH request
  drivers: qcom: rpmh: allow requests to be sent asynchronously
  drivers: qcom: rpmh: cache sleep/wake state requests
  drivers: qcom: rpmh-rsc: allow invalidation of sleep/wake TCS
  drivers: qcom: rpmh-rsc: write sleep/wake requests to TCS
  drivers: qcom: rpmh: add RPMH helper functions
  drivers: qcom: rpmh-rsc: log RPMH requests in FTRACE
  dt-bindings: introduce RPMH RSC bindings for Qualcomm SoCs
  drivers: qcom: rpmh-rsc: add RPMH controller for QCOM SoCs
  drivers: soc: Add LLCC driver
  dt-bindings: Documentation for qcom, llcc
  ...
2018-08-04 11:02:54 -07:00
Linus Torvalds 0b5b1f9a78 Two bugfixes.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJbZLlkAAoJEL/70l94x66D0DkIAJidCqR7YYvsSspPpjbN30iK
 GE3AJhfXDgj+DZ+/HQpslGP7+rpcErtuSLA6pyX8oFewoOt0LNNXeEdGazfpEt76
 lz112RBIjfYVs9GpoiqRbMhIkJQG8lrpP+Ji3yQAdlUcdhoK7IbkFGQpWUk8LBKH
 +11UMt7QYRnw9/BOYrAoY5fplt1PBjkban+s5VDZOMPq433i7pH7haDq5WVB9El7
 n626YvbYXZ4V1mOeqVs4YCBfHZb8dIs58MKBbqJuYefjzX/f9zS72F50ZlJ1D2Sv
 a0gpmpWeDrR9gH+j/TYfHbdN4IWiD5zyk5tIHPLlAkf6FCpO1wOc7xERchx0VWM=
 =4vo0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Two vmx bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: vmx: fix vpid leak
  KVM: vmx: use local variable for current_vmptr when emulating VMPTRST
2018-08-03 13:43:59 -07:00
Linus Torvalds 2ed7533cb7 powerpc fixes for 4.18 #5
One fix for a regression in a recent TLB flush optimisation, which caused us to
 incorrectly not send TLB invalidations to coprocessors.
 
 Thanks to:
   Frederic Barrat, Nicholas Piggin, Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJbZDxOExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAm
 7w//QcYYpBcBu9XK0E53XglSoC4iTzMrPnIIivPsSngLXWtEC/5315nbCOdhT4mh
 CTYxmwCm0pZG4kxJntnT4MXwSLhM46eFs586uOS1BgEJw+m85m2tiEsrNQSogJ14
 GzZiIk1UbzKUZjpSTIXigx0Um6x+q83kIBhd/ewrejB1wz6d5T7Wktjv2iWewGdf
 kOJPXuvaWXeGmJWJ2Gek3T5j7PCII0humEQjsy40XVFR3NYC7jT1r4UONE3NeBTU
 xavo/EcB3JHx2MPPgAfJT/KAiHLQZzuR382azf7W5sQJ7/Yy2N1jpskSxFoMEMtc
 axYWniHfGwJsnwcO4HPisIA67tXR6EzugAyuta4IQufRVp1s+j2WIzyqjl4uOB7Q
 v5ZXBH/Y4wShqVHhUzzQasiJgGYZoUoeOvJIoFNEo5f6UFuT/l9mkuM4vPtNKi6i
 NpT2Hj/hTATNRGyrBnvjcktS8uHbVxSHE6Cm4ZHSfQ8POMulUKvhHTZ7gYsxpxUE
 ktKCNgNL57QHRQojY0AftpEMLrDYiJEE9+6OLjEk3Mf7ayiZAAjIpy7LRD9+p8Q8
 eomEXmwQr1YOXGgvS0mFFPd+P2PfJoQztd1fWZtZIY1xksdwGep65e24hBOEjCAV
 7vVx+3pm+dekEFVf976ZYgACHCjOn8Efd5t+AK7b8S0u0uw=
 =Z81h
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "One fix for a regression in a recent TLB flush optimisation, which
  caused us to incorrectly not send TLB invalidations to coprocessors.

  Thanks to Frederic Barrat, Nicholas Piggin, Vaibhav Jain"

* tag 'powerpc-4.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s/radix: Fix missing global invalidations when removing copro
2018-08-03 10:38:21 -07:00
Thomas Gleixner 4a7a54a55e x86/intel_rdt: Disable PMU access
Peter is objecting to the direct PMU access in RDT. Right now the PMU usage
is broken anyway as it is not coordinated with perf.

Until this discussion settled, disable the PMU mechanics by simply
rejecting the type '2' measurement in the resctrl file.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
CC: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: hpa@zytor.com
2018-08-03 13:06:35 +02:00
Sai Praneeth 706d51681d x86/speculation: Support Enhanced IBRS on future CPUs
Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.

From the specification [1]:

 "With enhanced IBRS, the predicted targets of indirect branches
  executed cannot be controlled by software that was executed in a less
  privileged predictor mode or on another logical processor. As a
  result, software operating on a processor with enhanced IBRS need not
  use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
  privileged predictor mode. Software can isolate predictor modes
  effectively simply by setting the bit once. Software need not disable
  enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."

If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:

 "Retpoline is known to be an effective branch target injection (Spectre
  variant 2) mitigation on Intel processors belonging to family 6
  (enumerated by the CPUID instruction) that do not have support for
  enhanced IBRS. On processors that support enhanced IBRS, it should be
  used for mitigation instead of retpoline."

The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.

If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.

Enhanced IBRS still requires IBPB for full mitigation.

[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511

Originally-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim C Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com
2018-08-03 12:50:34 +02:00
Peter Feiner 301d328a6f x86/cpufeatures: Add EPT_AD feature bit
Some Intel processors have an EPT feature whereby the accessed & dirty bits
in EPT entries can be updated by HW. MSR IA32_VMX_EPT_VPID_CAP exposes the
presence of this capability.

There is no point in trying to use that new feature bit in the VMX code as
VMX needs to read the MSR anyway to access other bits, but having the
feature bit for EPT_AD in place helps virtualization management as it
exposes "ept_ad" in /proc/cpuinfo/$proc/flags if the feature is present.

[ tglx: Amended changelog ]

Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Peter Shier <pshier@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180801180657.138051-1-pshier@google.com
2018-08-03 12:36:23 +02:00
Palmer Dabbelt c5ca4560de openrisc: Use the new GENERIC_IRQ_MULTI_HANDLER
It appears that openrisc copied arm64's GENERIC_IRQ_MULTI_HANDLER code
(which came from arm).  Cnvert it to use the generic version.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Stafford Horne <shorne@gmail.com>
Cc: linux@armlinux.org.uk
Cc: catalin.marinas@arm.com
Cc: Will Deacon <will.deacon@arm.com>
Cc: jonas@southpole.se
Cc: stefan.kristiansson@saunalahti.fi
Cc: jason@lakedaemon.net
Cc: marc.zyngier@arm.com
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: nicolas.pitre@linaro.org
Cc: vladimir.murzin@arm.com
Cc: keescook@chromium.org
Cc: jinb.park7@gmail.com
Cc: yamada.masahiro@socionext.com
Cc: alexandre.belloni@bootlin.com
Cc: pombredanne@nexb.com
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: kstewart@linuxfoundation.org
Cc: jhogan@kernel.org
Cc: mark.rutland@arm.com
Cc: ard.biesheuvel@linaro.org
Cc: james.morse@arm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: openrisc@lists.librecores.org
Link: https://lkml.kernel.org/r/20180622170126.6308-5-palmer@sifive.com
2018-08-03 12:14:09 +02:00
Palmer Dabbelt 78ae2e1cd8 arm64: Use the new GENERIC_IRQ_MULTI_HANDLER
It appears arm64 copied arm's GENERIC_IRQ_MULTI_HANDLER code, but made
it unconditional.

Converts the arm64 code to use the new generic code, which simply consists
of deleting the arm64 code and setting MULTI_IRQ_HANDLER instead.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: linux@armlinux.org.uk
Cc: catalin.marinas@arm.com
Cc: Will Deacon <will.deacon@arm.com>
Cc: jonas@southpole.se
Cc: stefan.kristiansson@saunalahti.fi
Cc: shorne@gmail.com
Cc: jason@lakedaemon.net
Cc: marc.zyngier@arm.com
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: nicolas.pitre@linaro.org
Cc: vladimir.murzin@arm.com
Cc: keescook@chromium.org
Cc: jinb.park7@gmail.com
Cc: yamada.masahiro@socionext.com
Cc: alexandre.belloni@bootlin.com
Cc: pombredanne@nexb.com
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: kstewart@linuxfoundation.org
Cc: jhogan@kernel.org
Cc: mark.rutland@arm.com
Cc: ard.biesheuvel@linaro.org
Cc: james.morse@arm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: openrisc@lists.librecores.org
Link: https://lkml.kernel.org/r/20180622170126.6308-4-palmer@sifive.com
2018-08-03 12:14:09 +02:00
Palmer Dabbelt 4c301f9b6a ARM: Convert to GENERIC_IRQ_MULTI_HANDLER
Converts the ARM interrupt code to use the recently added
GENERIC_IRQ_MULTI_HANDLER, which is essentially just a copy of ARM's
existhing MULTI_IRQ_HANDLER.  The only changes are:

* handle_arch_irq is now defined in a generic C file instead of an
  arm-specific assembly file.
 
* handle_arch_irq is now marked as __ro_after_init.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux@armlinux.org.uk
Cc: catalin.marinas@arm.com
Cc: Will Deacon <will.deacon@arm.com>
Cc: jonas@southpole.se
Cc: stefan.kristiansson@saunalahti.fi
Cc: shorne@gmail.com
Cc: jason@lakedaemon.net
Cc: marc.zyngier@arm.com
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: nicolas.pitre@linaro.org
Cc: vladimir.murzin@arm.com
Cc: keescook@chromium.org
Cc: jinb.park7@gmail.com
Cc: yamada.masahiro@socionext.com
Cc: alexandre.belloni@bootlin.com
Cc: pombredanne@nexb.com
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: kstewart@linuxfoundation.org
Cc: jhogan@kernel.org
Cc: mark.rutland@arm.com
Cc: ard.biesheuvel@linaro.org
Cc: james.morse@arm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: openrisc@lists.librecores.org
Link: https://lkml.kernel.org/r/20180622170126.6308-3-palmer@sifive.com
2018-08-03 12:14:08 +02:00
Reza Arbab 9eab9901b0 powerpc/powernv: Fix concurrency issue with npu->mmio_atsd_usage
We've encountered a performance issue when multiple processors stress
{get,put}_mmio_atsd_reg(). These functions contend for
mmio_atsd_usage, an unsigned long used as a bitmask.

The accesses to mmio_atsd_usage are done using test_and_set_bit_lock()
and clear_bit_unlock(). As implemented, both of these will require
a (successful) stwcx to that same cache line.

What we end up with is thread A, attempting to unlock, being slowed by
other threads repeatedly attempting to lock. A's stwcx instructions
fail and retry because the memory reservation is lost every time a
different thread beats it to the punch.

There may be a long-term way to fix this at a larger scale, but for
now resolve the immediate problem by gating our call to
test_and_set_bit_lock() with one to test_bit(), which is obviously
implemented without using a store.

Fixes: 1ab66d1fba ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03 20:09:01 +10:00
Christoph Hellwig 06832fc004 powerpc: Do not redefine NEED_DMA_MAP_STATE
kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it
from CONFIG_PPC using the same condition as an if guard.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[mpe: Move it under PPC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03 20:09:00 +10:00
Guenter Roeck 6e0495c2e8 powerpc/4xx: Fix error return path in ppc4xx_msi_probe()
An arbitrary error in ppc4xx_msi_probe() quite likely results in a
crash similar to the following, seen after dma_alloc_coherent()
returned an error.

  Unable to handle kernel paging request for data at address 0x00000000
  Faulting instruction address: 0xc001bff0
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE Canyonlands
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper Tainted: G        W
  4.18.0-rc6-00010-gff33d1030a6c #1
  NIP:  c001bff0 LR: c001c418 CTR: c01faa7c
  REGS: cf82db40 TRAP: 0300   Tainted: G        W
  (4.18.0-rc6-00010-gff33d1030a6c)
  MSR:  00029000 <CE,EE,ME>  CR: 28002024  XER: 00000000
  DEAR: 00000000 ESR: 00000000
  GPR00: c001c418 cf82dbf0 cf828000 cf8de400 00000000 00000000 000000c4 000000c4
  GPR08: c0481ea4 00000000 00000000 000000c4 22002024 00000000 c00025e8 00000000
  GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0492380 0000004a
  GPR24: 00029000 0000000c 00000000 cf8de410 c0494d60 c0494d60 cf8bebc0 00000001
  NIP [c001bff0] ppc4xx_of_msi_remove+0x48/0xa0
  LR [c001c418] ppc4xx_msi_probe+0x294/0x3b8
  Call Trace:
  [cf82dbf0] [00029000] 0x29000 (unreliable)
  [cf82dc10] [c001c418] ppc4xx_msi_probe+0x294/0x3b8
  [cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c
  [cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350
  [cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac
  [cf82dcf0] [c0207e88] __device_attach+0xe8/0x160
  [cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc
  [cf82dd40] [c02050c8] device_add+0x404/0x5c4
  [cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8
  [cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220
  [cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220
  [cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec
  [cf82de90] [c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
  [cf82dea0] [c0002404] do_one_initcall+0x40/0x188
  [cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0
  [cf82df30] [c0002600] kernel_init+0x18/0x104
  [cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c
  Instruction dump:
  90010024 813d0024 2f890000 83c30058 41bd0014 48000038 813d0024 7f89f800
  409d002c 813e000c 57ea103a 3bff0001 <7c69502e> 2f830000 419effe0 4803b26d
  ---[ end trace 8cf551077ecfc42a ]---

Fix it up. Specifically,

- Return valid error codes from ppc4xx_setup_pcieh_hw(), have it clean
  up after itself, and only access hardware after all possible error
  conditions have been handled.
- Use devm_kzalloc() instead of kzalloc() in ppc4xx_msi_probe()

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03 20:08:20 +10:00
Eric Biggers 4e34e51f48 crypto: arm/chacha20 - always use vrev for 16-bit rotates
The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
but the one-way code (used on remainder blocks) implements it with
vshl + vsri, which is slower.  Switch the one-way code to vrev32.16 too.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03 18:06:05 +08:00
Jonathan Cameron e4a1f7858a arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC
Enable all 4 SEC units available on d05 boards.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03 18:06:02 +08:00
Herbert Xu c5f5aeef9b Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Merge mainline to pick up c7513c2a27 ("crypto/arm64: aes-ce-gcm -
add missing kernel_neon_begin/end pair").
2018-08-03 17:55:12 +08:00
Nicholas Piggin 3127692deb powernv/cpuidle: Fix idle states all being marked invalid
Commit 9c7b185ab2 ("powernv/cpuidle: Parse dt idle properties into
global structure") parses dt idle states into structs, but never marks
them valid. This results in all idle states being lost.

Fixes: 9c7b185ab2 ("powernv/cpuidle: Parse dt idle properties into global structure")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03 18:30:33 +10:00
Linus Torvalds ed0093d976 ARC updates for 4.18
- Software managed DMA wreckage after rework in 4.17 [Euginey]
    + missing cache flush
    + SMP_CACHE_BYTES vs. cache_line_size
 
  - allmodconfig build errors [Randy]
 
  - Maintainer update for Mellanox (EZChip) NPS platform
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbY3tYAAoJEGnX8d3iisJeCbIP/1aRwU61Sn+1g4PBh2x8XBzU
 hvlUB5IIlFY+1aQEZG3h3P3SNi/DO3WtjXaAzzUlSHdX6jLFn8VWZupfnTE8Tr2p
 9tx5VGrHECQzg+ew6qc5KU3zcRMT4uy61QqE5r0MPYRXzpO9V25bWArQ1wBDMrzG
 T85X19dKvmNDFrJk5SkKFn6bHpeGaIrzwJzzgVJAcDMmMolQggMqJwHbLGt6reiI
 8CDZ3bmwwCbnb3r6JlltZq/MeJFdcLReL1eQsedh2GbqFoi4IRia0ICQakL+DLLk
 ru7sg+LOGKm9GpSHxzP1Jq1m3iPgXKW2UpggwCfe8Fima5mNSGRiwUZkVTfZ5h+W
 en4Mf97E6eFnouGD8w86b4KnQ3X6c1zxBEGjrnaDbifq/6iGfef+sxEFanJoGaHa
 kpLHYXe3CG9OgV05kxtnjJQRuBuRgIcK4G4LwASuq8JWRb3jIQL+VrHsYnh9oWUl
 66yMd9SSMHgW5ccE0r6oaJvG0dCBaichbJaVX0VoEZCqNbbSRR+ifoRJ/yXOpeVw
 TGkRlah7b5l64RL3klfa+mlONuCArn4Oflxjpje5RqKOQP8mFIhy/w4mox82DWW7
 n+7QygRj6H8Euei7ttP3G+9jL9+WErZf6+EwVKcfYmyEKj9+wRn5ySFm75hjmLsM
 IBUXUeVjttbCncx9yCrj
 =OGbe
 -----END PGP SIGNATURE-----

Merge tag 'arc-4.18-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
 "Another batch of fixes for ARC, this time mainly DMA API rework
  wreckage:

   - Fix software managed DMA wreckage after rework in 4.17 [Euginey]
      * missing cache flush
      * SMP_CACHE_BYTES vs cache_line_size

   - Fix allmodconfig build errors [Randy]

   - Maintainer update for Mellanox (EZChip) NPS platform"

* tag 'arc-4.18-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  arc: fix type warnings in arc/mm/cache.c
  arc: fix build errors in arc/include/asm/delay.h
  arc: [plat-eznps] fix printk warning in arc/plat-eznps/mtm.c
  arc: [plat-eznps] fix data type errors in platform headers
  ARC: [plat-eznps] Add missing struct nps_host_reg_aux_dpc
  ARC: add SMP_CACHE_BYTES value validate
  ARC: dma [non-IOC] setup SMP_CACHE_BYTES and cache_line_size
  ARC: dma [non IOC]: fix arc_dma_sync_single_for_(device|cpu)
  ARC: Add Ofer Levi as plat-eznps maintainer
2018-08-02 16:52:50 -07:00
Arun Parameswaran 18b872d854 arm64: dts: Fix the base address of the Broadcom iProc mdio mux
Modify the base address of the mdio mux driver to point to the
start of the mdio mux block's register address space.

Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02 14:36:49 -07:00
Amit Kucheria cda676b5c9 arm64: dts: sdm845: Add tsens nodes
SDM845 has two tsens blocks, one with 13 sensors and the other with 8
sensors. It uses version 2 of the TSENS IP, so use the fallback property to
allow more common code.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
2018-08-02 16:34:24 -05:00
Amit Kucheria f35c11b03b arm64: dts: msm8996: thermal: Initialise via DT and add second controller
We also split up the regmap address space into two, for the TM and SROT
registers. This was required to deal with different address offsets for the
TM and SROT registers across different SoC families.

8996 has two TSENS IP blocks, initialise the second one too.

Since tsens-common.c/init_common() currently only registers one address
space, the order is important (TM before SROT). This is OK since the code
doesn't really use the SROT functionality yet.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
2018-08-02 16:33:36 -05:00
Linus Torvalds ef46808b79 pci-v4.18-fixes-5
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAltjEoMUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vw3gQ/+IK9Poej5yIKG06gtbtLjflq9wRXW
 ZP72U9P8gpfI/L/0T+uAHVpp5ewjD8tGCgsEhS/OzU7eu/fyr5V5TnqjfFE++fi4
 JnQrl4J14y8miP/iYQGf+6CwGYsHM4TnVa/yo525FocZBqYWKcLAPtYuhcXNPWNC
 iOKa3wii1KErJnjwCU0mR1a45dP5vJC/TCEcDD1ZQYJPxDrfB9lwwAo+rXFyPjMe
 TBJso1bLOzd3XTuyQqiP/HBPdtGi024eb3JdK16K273EPKoFp9Iw8irpP4hZKPqK
 60wbTc4kSlo7Sj9OOdTXH1XW2xe/xDJimyyJjsVz8Rg6y8DN6Vg0aXEazm/xFmRC
 wxP4/U5ciivhvmCrbqB6DPqcBtu83Yxh7sZZPKw4LGU8Dk75QDhlVAmE2+vH51u8
 Owt11GAjj++6EHCjeTms+bvAL/HFWeGru/A2NZE93QEy/tq3UtCLXGkni5Av1vWj
 u2bQPVqia1s7xPdRR3lwCOeCa7yU/LwqpNm8w0uF/0oCRKs42Ao4pU/sLYQGihoo
 rwqAEBKYA6TI6L7C4TmCue4cegRvK0Mhn8wZdGrJhJTKu+1gtIs0ph1bwtMin28d
 U8zX6Zhi9/xQSJN7iARbEcdgCZtePKkLIMSiILKMmAh+bjAs3HzLMs2czQ2g7YKU
 +71+CqJfgQXmxjI=
 =pwJS
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Fix integer overflow in new mobiveil driver (Dan Carpenter)

 - Fix race during NVMe removal/rescan (Hari Vyas)

* tag 'pci-v4.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix is_added/is_busmaster race condition
  PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE
2018-08-02 10:59:19 -07:00
David S. Miller 89b1698c93 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
The BTF conflicts were simple overlapping changes.

The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02 10:55:32 -07:00
Linus Torvalds 8cda548ffb arm64 regression fix
- Fix potential clobbering of user vector register state by AES ghash code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJbYtldAAoJELescNyEwWM0aisH/ReBqfJb1kTr5ybUriSTsuTl
 Vjn8O17aH+177P3xlY47+cYWsvewJxiy7MldlLEiER0tsyjYvqTDuzy7+ffdJJ8R
 rs0MGhYh9WpD3OVjng7TmwXD71xefk/gLq4MPqmgn9e/DoAt4HO/7jC4c+mSwxRZ
 gHsAH5g6WUclRvU1zXaT8QKdZnmwudPFvy5O+bQYQrIJr++zBiyJ47qu1+TjJQuC
 kVmM7XV0c0L1fK9z7A18PcW+tMlIu15ITzJwEercXen/7XypdDOufgc4Y9odHCkC
 5ZWnV5wZtaJIuo/JWWPnoheWfTqIVF7ggXwsaXmZ0hKfQkBvbJ8fxhogCIWgt40=
 =rpVW
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 regression fix from Will Deacon:
 "Ard found a nasty arm64 regression in 4.18 where the AES ghash/gcm
  code doesn't notify the kernel about its use of the vector registers,
  therefore potentially corrupting live user state.

  The fix is straightforward and Herbert agreed for it to go via arm64"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair
2018-08-02 10:21:49 -07:00
Paul Burton ca75e2fc77
MIPS: generic: Remove input symbols from defconfig
generic_defconfig explicitly disables CONFIG_INPUT_MOUSEDEV,
CONFIG_INPUT_KEYBOARD & CONFIG_INPUT_MOUSE which results in warnings
when merging board config fragments if any of them require these
options. This is the case for the ranchu board, which means we've had
the following warning when configuring for generic platform targets
since commit f2d0b0d5c1 ("MIPS: ranchu: Add Ranchu as a new
generic-based board"):

  $ make ARCH=mips 32r2el_defconfig
  Using ./arch/mips/configs/generic_defconfig as base
  Merging arch/mips/configs/generic/32r2.config
  Merging arch/mips/configs/generic/el.config
  Merging ./arch/mips/configs/generic/board-sead-3.config
  Merging ./arch/mips/configs/generic/board-ranchu.config
  Value of CONFIG_INPUT_KEYBOARD is redefined by fragment ./arch/mips/configs/generic/board-ranchu.config:
  Previous value: # CONFIG_INPUT_KEYBOARD is not set
  New value: CONFIG_INPUT_KEYBOARD=y

  Merging ./arch/mips/configs/generic/board-ni169445.config
  Merging ./arch/mips/configs/generic/board-boston.config
  Merging ./arch/mips/configs/generic/board-ocelot.config
  Merging ./arch/mips/configs/generic/board-xilfpga.config
  scripts/kconfig/conf  --olddefconfig Kconfig
  #
  # configuration written to .config
  #

Resolve this by removing mention of the CONFIG_INPUT_* Kconfig symbols
from generic_defconfig, allowing them to take their default values &
allowing board config fragments to enable them without warnings.

This may be problematic if CONFIG_ARCH_MIGHT_HAVE_PC_SERIO is ever
enabled for CONFIG_MIPS_GENERIC=y configurations, but for now that isn't
the case so we can worry about that if & when it happens.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20109/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-08-02 10:18:37 -07:00
Russell King a3c0f84765 ARM: spectre-v1: mitigate user accesses
Spectre variant 1 attacks are about this sequence of pseudo-code:

	index = load(user-manipulated pointer);
	access(base + index * stride);

In order for the cache side-channel to work, the access() must me made
to memory which userspace can detect whether cache lines have been
loaded.  On 32-bit ARM, this must be either user accessible memory, or
a kernel mapping of that same user accessible memory.

The problem occurs when the load() speculatively loads privileged data,
and the subsequent access() is made to user accessible memory.

Any load() which makes use of a user-maniplated pointer is a potential
problem if the data it has loaded is used in a subsequent access.  This
also applies for the access() if the data loaded by that access is used
by a subsequent access.

Harden the get_user() accessors against Spectre attacks by forcing out
of bounds addresses to a NULL pointer.  This prevents get_user() being
used as the load() step above.  As a side effect, put_user() will also
be affected even though it isn't implicated.

Also harden copy_from_user() by redoing the bounds check within the
arm_copy_from_user() code, and NULLing the pointer if out of bounds.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-08-02 17:41:38 +01:00
Russell King b1cd0a1480 ARM: spectre-v1: use get_user() for __get_user()
Fixing __get_user() for spectre variant 1 is not sane: we would have to
add address space bounds checking in order to validate that the location
should be accessed, and then zero the address if found to be invalid.

Since __get_user() is supposed to avoid the bounds check, and this is
exactly what get_user() does, there's no point having two different
implementations that are doing the same thing.  So, when the Spectre
workarounds are required, make __get_user() an alias of get_user().

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-08-02 17:41:38 +01:00
Russell King d09fbb327d ARM: use __inttype() in get_user()
Borrow the x86 implementation of __inttype() to use in get_user() to
select an integer type suitable to temporarily hold the result value.
This is necessary to avoid propagating the volatile nature of the
result argument, which can cause the following warning:

lib/iov_iter.c:413:5: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var]

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-08-02 17:41:38 +01:00
Russell King 8c8484a1c1 ARM: oabi-compat: copy semops using __copy_from_user()
__get_user_error() is used as a fast accessor to make copying structure
members as efficient as possible.  However, with software PAN and the
recent Spectre variant 1, the efficiency is reduced as these are no
longer fast accessors.

In the case of software PAN, it has to switch the domain register around
each access, and with Spectre variant 1, it would have to repeat the
access_ok() check for each access.

Rather than using __get_user_error() to copy each semops element member,
copy each semops element in full using __copy_from_user().

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-08-02 17:41:38 +01:00
Russell King 42019fc50d ARM: vfp: use __copy_from_user() when restoring VFP state
__get_user_error() is used as a fast accessor to make copying structure
members in the signal handling path as efficient as possible.  However,
with software PAN and the recent Spectre variant 1, the efficiency is
reduced as these are no longer fast accessors.

In the case of software PAN, it has to switch the domain register around
each access, and with Spectre variant 1, it would have to repeat the
access_ok() check for each access.

Use __copy_from_user() rather than __get_user_err() for individual
members when restoring VFP state.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-08-02 17:41:37 +01:00
Zhong Jiang 0b2c1aec49 x86/iommu: Use NULL instead of 0
Fixes the following sparse warning:

arch/x86/kernel/pci-iommu_table.c:63:37: warning: Using plain integer as NULL pointer

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <hpa@zytor.com>
Cc: <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/1532162004-24670-1-git-send-email-zhongjiang@huawei.com
2018-08-02 14:33:19 +02:00
Uros Bizjak 216a37202f x86/boot: Use CC_SET()/CC_OUT() instead of open coding it
Remove open-coded uses of set instructions with CC_SET()/CC_OUT().

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180629142844.15200-1-ubizjak@gmail.com
2018-08-02 14:30:42 +02:00
Chengguang Xu 765d28f136 x86/mm: Remove redundant check for kmem_cache_create()
The flag 'SLAB_PANIC' implies panic on failure, So there is no need to
check the returned pointer for NULL.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1528804132-154948-1-git-send-email-cgxu519@gmx.com
2018-08-02 14:27:35 +02:00
Colin Ian King d79d024820 x86/platform/UV: Remove redundant check of p == q
The check for p == q is dead code because the proceeding switch
statements jump to the end of the outer for-loop with continue
statements.  Remove the dead code.

Detected by CoverityScan, CID#145071 ("Structurally dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20180731090938.11856-1-colin.king@canonical.com
2018-08-02 14:25:41 +02:00
zhong jiang c7ba9f7cb5 x86/platform/olpc: Use PTR_ERR_OR_ZERO()
Replace the open coded equivalent with PTR_ERR_OR_ZERO().

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1533053286-34990-1-git-send-email-zhongjiang@huawei.com
2018-08-02 14:25:40 +02:00
Kirill A. Shutemov 1b3a626436 x86/boot/compressed/64: Validate trampoline placement against E820
There were two report of boot failure cased by trampoline placed into
a reserved memory region. It can happen on machines that don't report
EBDA correctly.

Fix the problem by re-validating the found address against the E820 table.
If the address is in a reserved area, find the next usable region below the
initial address.

Fixes: 3548e131ec ("x86/boot/compressed/64: Find a place for 32-bit trampoline")
Reported-by: Dmitry Malkin <d.malkin@real-time-systems.com>
Reported-by: youling 257 <youling257@gmail.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180801133225.38121-1-kirill.shutemov@linux.intel.com
2018-08-02 14:22:22 +02:00
Paolo Bonzini 85eae57bbb KVM: s390: Features for 4.19
- initial version for host large page support. Must be enabled with
   module parameter hpage=1 and will conflict with the nested=1
   parameter.
 - enable etoken facility for guests
 - Fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbYAz8AAoJEBF7vIC1phx8r6cP+wa1xFphqa5ha7/qlr/upfWF
 aqIQ9IqyUcsYC83EQuEKTgEEgXkhOZ6eUM8B5dkmTP0iTroB8bFc5+AEKZfRoMzk
 rLBuxZti5cKI0l5nE4ChX20WNzjwVIRiHqkyh7Z+otpfVS5tEujkVPbOkzHTVZIL
 MJWAGb0m8PXbXShqAq3d5aPKcQz3L/y4uPHZsPJtv6vf/e06l5OlvP5gxiIFORFQ
 ss01RrzlqXGJ8KEUPD1kWT2I3mbSUZldb2Yd0oi/m4RK2csJ9WAEYNBjLstdXJ9/
 l0iqT7ZAbRfdAxRKHSUPxoj1WK/EPxEYfh+NbR+gbVgGrsMi9yeKPAIVC1ws/kG1
 qdjVHbJyMXCHRaKi/+ayyNwdcv6cWSIEFbfIwqFfKEm/5/AbSIg76uReuWh1oDM4
 tX7aQNJp7zmRWBbxGy9ezJlvOOvz+G2hMtf5MiD3rMBI602FE+eak8NBTXmOQ8Jv
 /EZw/xJFQBOnD7G/cNMQRiPGHOPRdhS4O/9N4DUh9Qw88rOtqLKQRRnkB5cTawWa
 LEVU+sOhqCROEr/pQdN4cID71bnWb2bsvtmTX5EOK1UF4LeaKLaEmgCJBIkOGjns
 VI9RecNa7meeeYYN67Mgp9VdGab38fc+ybT9TVejVheznrxcKrR8RlwCjcmLP2n4
 QpzmR/9E3SzwY6y/k11i
 =WSH5
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Features for 4.19

- initial version for host large page support. Must be enabled with
  module parameter hpage=1 and will conflict with the nested=1
  parameter.
- enable etoken facility for guests
- Fixes
2018-08-02 13:57:29 +02:00
Christoph Hellwig 6fa1d28e38 sh: use generic dma_noncoherent_ops
Switch to the generic noncoherent direct mapping implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
2018-08-02 13:54:20 +02:00
Christoph Hellwig 46bcde94cd sh: split arch/sh/mm/consistent.c
Half of the file just contains platform device memory setup code which
is required for all builds, and half contains helpers for dma coherent
allocation, which is only needed if CONFIG_DMA_NONCOHERENT is enabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
2018-08-02 13:54:15 +02:00
Christoph Hellwig a602915f5d sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case
This is a slight change in behavior as we avoid the detour through the
virtual mapping for the coherent allocator, but if this CPU really is
coherent that should be the right thing to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
2018-08-02 13:54:11 +02:00
Christoph Hellwig 47fcae0d2a sh: introduce a sh_cacheop_vaddr helper
And use it in the maple bus code to avoid a dma API dependency.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
2018-08-02 13:54:06 +02:00
Christoph Hellwig b2fcb677d4 sh: simplify get_arch_dma_ops
Remove the indirection through the dma_ops variable, and just return
nommu_dma_ops directly from get_arch_dma_ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
2018-08-02 13:54:01 +02:00
Ingo Molnar 16e0e6a83b Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-08-02 09:59:20 +02:00
Randy Dunlap 22471e1313 kconfig: use a menu in arch/Kconfig to reduce clutter
Put everything in arch/Kconfig into a General options menu
so that they don't clutter up the main/major/primary list of
menu options.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02 08:06:54 +09:00
Christoph Hellwig 87a4c37599 kconfig: include kernel/Kconfig.preempt from init/Kconfig
Almost all architectures include it.  Add a ARCH_NO_PREEMPT symbol to
disable preempt support for alpha, hexagon, non-coldfire m68k and
user mode Linux.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02 08:06:54 +09:00
Christoph Hellwig 06ec64b84c Kconfig: consolidate the "Kernel hacking" menu
Move the source of lib/Kconfig.debug and arch/$(ARCH)/Kconfig.debug to
the top-level Kconfig.  For two architectures that means moving their
arch-specific symbols in that menu into a new arch Kconfig.debug file,
and for a few more creating a dummy file so that we can include it
unconditionally.

Also move the actual 'Kernel hacking' menu to lib/Kconfig.debug, where
it belongs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02 08:06:48 +09:00
Christoph Hellwig 1572497cb0 kconfig: include common Kconfig files from top-level Kconfig
Instead of duplicating the source statements in every architecture just
do it once in the toplevel Kconfig file.

Note that with this the inclusion of arch/$(SRCARCH/Kconfig moves out of
the top-level Kconfig into arch/Kconfig so that don't violate ordering
constraits while keeping a sensible menu structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02 08:03:23 +09:00
Christoph Hellwig 17c46a6aff kconfig: remove duplicate SWAP symbol defintions
microblaze and nios2 define their own always n SWAP symbols.  Remove those
and let the generic defintion do the right thing by adding a new symbol
to disable swap entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02 08:03:23 +09:00
Christoph Hellwig 9bea18010f um: create a proper drivers Kconfig
Merge arch/um/Kconfig.char and arch/um/Kconfig.net into a new
arch/um/drivers/Kconfig.  This fits the way Kconfig files are placed
elsewhere in the kernel tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02 08:03:23 +09:00
Christoph Hellwig f163977d21 um: cleanup Kconfig files
We can handle all not architecture specific UM configuration directly in
the newly added arch/um/Kconfig.  Do so by merging the Kconfig.common,
Kconfig.rest and Kconfig.um files into arch/um/Kconfig, and move the main
UML menu as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02 08:03:23 +09:00
Christoph Hellwig 79b05c1f31 um: stop abusing KBUILD_KCONFIG
Instead create a arch/um/Kconfig file that just includes the actual
per-arch Kconfig file.  Note that we use HEADER_ARCH to find the
per-arch Kconfig file as that variable already includes the
normalization from i386 or x86_64 to x86.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02 08:03:23 +09:00
Linus Torvalds 6b47037682 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
 "Just a single fix this time around for recent binutils causing build
  problems when generating Thumb-2 code"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
2018-08-01 15:01:26 -07:00
Linus Torvalds 8b11ec1b5f mm: do not initialize TLB stack vma's with vma_init()
Commit 2c4541e24c ("mm: use vma_init() to initialize VMAs on stack and
data segments") tried to initialize various left-over ad-hoc vma's
"properly", but actually made things worse for the temporary vma's used
for TLB flushing.

vma_init() doesn't actually initialize all of the vma, just a few
fields, so doing something like

   -       struct vm_area_struct vma = { .vm_mm = tlb->mm, };
   +       struct vm_area_struct vma;
   +
   +       vma_init(&vma, tlb->mm);

was actually very bad: instead of having a nicely initialized vma with
every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma
with only a couple of fields initialized.  And they weren't even fields
that the code in question mostly cared about.

The flush_tlb_range() function takes a "struct vma" rather than a
"struct mm_struct", because a few architectures actually care about what
kind of range it is - being able to only do an ITLB flush if it's a
range that doesn't have data accesses enabled, for example.  And all the
normal users already have the vma for doing the range invalidation.

But a few people want to call flush_tlb_range() with a range they just
made up, so they also end up using a made-up vma.  x86 just has a
special "flush_tlb_mm_range()" function for this, but other
architectures (arm and ia64) do the "use fake vma" thing instead, and
thus got caught up in the vma_init() changes.

At the same time, the TLB flushing code really doesn't care about most
other fields in the vma, so vma_init() is just unnecessary and
pointless.

This fixes things by having an explicit "this is just an initializer for
the TLB flush" initializer macro, which is used by the arm/arm64/ia64
people who mis-use this interface with just a dummy vma.

Fixes: 2c4541e24c ("mm: use vma_init() to initialize VMAs on stack and data segments")
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01 13:43:38 -07:00
Paul Burton 48ae93fdd1
MIPS: Delete unused code in linux32.c
The A() & AA() macros have been unused since commit 05e4396651
("[MIPS] Use SYSVIPC_COMPAT to fix various problems on N32"), which
switched to the more standard compat_ptr().

RLIM_INFINITY32, RESOURCE32() & struct rlimit32 have been present but
unused since the beginning of the git era.

Remove the dead code.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20108/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-08-01 13:20:27 -07:00
Paul Burton 3a1c0fc592
MIPS: Remove unused sys_32_mmap2
The sys_32_mmap2 function has been unused since we started using syscall
wrappers in commit dbda6ac089 ("MIPS: CVE-2009-0029: Enable syscall
wrappers."), and is indeed identical to the sys_mips_mmap2 function that
replaced it in sys32_call_table.

Remove the dead code.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20107/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-08-01 13:20:21 -07:00
Paul Burton 96a68b14db
MIPS: Remove nabi_no_regargs
Our sigreturn functions make use of a macro named nabi_no_regargs to
declare 8 dummy arguments to a function, forcing the compiler to expect
a pt_regs structure on the stack rather than in argument registers. This
is an ugly hack which unnecessarily causes these sigreturn functions to
need to care about the calling convention of the ABI the kernel is built
for. Although this is abstracted via nabi_no_regargs, it's still ugly &
unnecessary.

Remove nabi_no_regargs & the struct pt_regs argument from sigreturn
functions, and instead use current_pt_regs() to find the struct pt_regs
on the stack, which works cleanly regardless of ABI.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20106/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-08-01 13:20:15 -07:00
Linus Torvalds ebad825cdd ia64: mark special ia64 memory areas anonymous
Commit bfd40eaff5 ("mm: fix vma_is_anonymous() false-positives") made
newly allocated vma's have a dummy vm_ops field so that they wouldn't be
mistaken for anonymous mappings, and if you wanted an anonymous vma you
had to explicitly say so by calling "vma_set_anonymous()" on it.

However, it missed the two special vmas that ia64 processes have: the
register backing store and the NaT page.  So they wouldn't actually act
like anonymous ranges, and page faults on them caused a SIGBUS rather
than the creation of a new anon page in them.

That obviously will make any ia64 binary very unhappy indeed, and the
boot fails early.

Fixes: bfd40eaff5 ("mm: fix vma_is_anonymous() false-positives")
Reported-by: Tony Luck <tony.luck@intel.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01 09:57:50 -07:00
Frederic Barrat cca19f0b68 powerpc/64s/radix: Fix missing global invalidations when removing copro
With the optimizations for TLB invalidation from commit 0cef77c779
("powerpc/64s/radix: flush remote CPUs out of single-threaded
mm_cpumask"), the scope of a TLBI (global vs. local) can now be
influenced by the value of the 'copros' counter of the memory context.

When calling mm_context_remove_copro(), the 'copros' counter is
decremented first before flushing. It may have the unintended side
effect of sending local TLBIs when we explicitly need global
invalidations in this case. Thus breaking any nMMU user in a bad and
unpredictable way.

Fix it by flushing first, before updating the 'copros' counter, so
that invalidations will be global.

Fixes: 0cef77c779 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-01 23:23:41 +10:00
Martin Schwidefsky fb7d7518b0 s390/numa: move initial setup of node_to_cpumask_map
The numa_init_early initcall sets the node_to_cpumask_map[0] to the
full cpu_possible_mask. Unfortunately this early_initcall is too late,
the NUMA setup for numa=emu is done even earlier. The order of calls
is numa_setup() -> emu_update_cpu_topology(), then the early_initcalls(),
followed by sched_init_domains().

Starting with git commit 051f3ca02e
"sched/topology: Introduce NUMA identity node sched domain"
the incorrect node_to_cpumask_map[0] really screws up the domain
setup and the kernel panics with the follow oops:

Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-08-01 07:48:33 +02:00
Olof Johansson 07d268f541 Enablement of some more features relevant to rk3399-kevin
(Chromebook Plus)
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCAAuFiEE7v+35S2Q1vLNA3Lx86Z5yZzRHYEFAltfHfEQHGhlaWtvQHNu
 dGVjaC5kZQAKCRDzpnnJnNEdgUzjCACqE5AOVnEk1y9RuyTT1AqfuH6SkAHzw+7n
 QnEtfpHTmSP2SMWXUBreebr0YgHDXmyOLWsKO0dIg9kjLcqV2HKIZmHWy6i5ADEu
 nvexKJ2YJ51MnzAaWXS08en49nV+pDJmLnNK0eeozGOw3qHqYo7f9AiweBQoHq7D
 IZZetI57wiSSrWVD122T8XZQfw9OCYwvBtEYJMOcsUDm89JECDZNgn3mvFr8pzRM
 FbGzkWoYJaySlfm/nGyVOZvPgcYymIOzVIxZej26vqa1//5WojAlw5tTvKZ3Jd9i
 aR/i4ljxXmHbY3qVu1zgac+aYKMFWLeM5SV0WB5cWo0DYkkMrTC4
 =FNsc
 -----END PGP SIGNATURE-----

Merge tag 'v4.19-rockchip-defconfig64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/defconfig

Enablement of some more features relevant to rk3399-kevin
(Chromebook Plus)

* tag 'v4.19-rockchip-defconfig64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: defconfig: Enable more peripherals for Samsung Chromebook Plus.

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-31 19:14:11 -07:00
Olof Johansson 4f53a4a76c A new board, the Vamrs Ficus using the rk3399 and followin the 96boards
standard. LEDs and power button for the rk3399 firefly and removal of
 some deprecated type-c properties from the rk3399 devicetree.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCAAuFiEE7v+35S2Q1vLNA3Lx86Z5yZzRHYEFAltfHXEQHGhlaWtvQHNu
 dGVjaC5kZQAKCRDzpnnJnNEdgepyB/9HJZvneyadfomxhywbflJ8wkZj4WWfHDNG
 79hwcg33EBcP2+vRkTeYypkurmhZdv+sSF6h1sLUBmtFNuapzd7xlI7FW+NAwwgl
 6UFU4DtiC2E0b9ljrCL1WQH34BEVpSyaENO7EF1gR8CvZAnb4BzRgzDxfAczbQlH
 6uBRzfuHooW/zHrbLxMxzkMbeHlU7m8i3/F2E3Uftxy5/qHUNmFGNcOl0bCpT/i6
 gS1QKIBAMmQ6yaNwrwG6GmPPampv2jXGUCNFAlCNkLPjKY0Ri7XdBq0k91sx78YT
 bP7k8Qd+TANMZgHB+mHPhgupMDur8KL+hn5I+MO3DItfn3rUW4EL
 =0BU+
 -----END PGP SIGNATURE-----

Merge tag 'v4.19-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/dt

A new board, the Vamrs Ficus using the rk3399 and followin the 96boards
standard. LEDs and power button for the rk3399 firefly and removal of
some deprecated type-c properties from the rk3399 devicetree.

* tag 'v4.19-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: add led support for Firefly-RK3399
  arm64: dts: rockchip: remove deprecated Type-C PHY properties on rk3399
  arm64: dts: rockchip: add power button support for Firefly-RK3399
  arm64: dts: rockchip: drop out-of-tree properties from rk3399-ficus regulator
  arm64: dts: rockchip: add voltage properties for vcc3v3_pcie on rk3399 ficus
  arm64: dts: rockchip: add USB 2.0 and 3.0 support on Ficus board
  arm64: dts: rockchip: add 96boards RK3399 Ficus board
  dt-bindings: Add vendor prefix for Vamrs Ltd.

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-31 19:13:27 -07:00
Olof Johansson 2dd207c91a Amlogic 64-bit DT updates for v4.19, round 3
- add DT support for AXG Audio
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEe4dGDhaSf6n1v/EMWTcYmtP7xmUFAltgxzIACgkQWTcYmtP7
 xmXihQ//WWaTQ37sOMn8R0saCGRT/sJAsoyasfcenDTl4RU/SKBVfdKRf869oQZF
 O7XtrNU+xk2ejj1n/S2HCYxftFllkFTnBp5C1F6P8bmxAVE7WquvRM00dI2bueXn
 QK4nJUCvvypnkpNanBcfNcZmSpmiKym8bBGntM13o50SZtQQphzldsd6YciVaOuw
 1VAKABEn/1PqttJ+vfbsxgSBov2N9tPmOj6TPu0ySwvmYwD99Yc9lGNbZNNeM0F+
 WmKW45qKdZDeDTvrNDgX4MyVxnrK5TnG/vs4qr8YIh6txDwhNlu2bekS/L0G3Moy
 M/TxZcohQa94R1Z+evI+qhy4asDDGwsRrBmRLmHYYGIlKVM/1VeCU0JoXNoGpQuF
 k8tmIsuqjMMQKFSMv9jD11HZqB+Q8zSuGWUz8KMNOX+o/rSYpJK8N/w/irbaKVxg
 6kG8US5VXzf/TZXvuczK3mJ9lYazqn8Xeb5jzLbtHonooOpgiFNexoV1KHNjqtPc
 WIJTpsbCtUu79l/Sdmmngf51/eW80x7bodXvKx1+4M09T7ZNTN3mVG22gfFdy0H0
 t/bWbZPuakGdhIDU7kj2IaId60+cPUwfyUKpTebYu3BmmbVSYJHCbO5UmQfiOmrk
 DTVHx2Gh5l+Ks6ccMe2cmNG00emkOwxmpQEQcS3e3JNsCeXy4Gg=
 =ykhe
 -----END PGP SIGNATURE-----

Merge tag 'amlogic-dt64-3' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into next/dt

Amlogic 64-bit DT updates for v4.19, round 3
- add DT support for AXG Audio

* tag 'amlogic-dt64-3' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic:
  arm64: dts: meson-axg: add spdif-dit codec
  arm64: dts: meson-axg: add lineout codec
  arm64: dts: meson-axg: add linein codec
  arm64: dts: meson-axg: add tdm interfaces
  arm64: dts: meson-axg: add tdmout formatters
  arm64: dts: meson-axg: add tdmin formatters
  arm64: dts: meson-axg: add spdifout
  arm64: dts: meson-axg: add audio arb reset controller
  arm64: dts: meson-axg: add usb power regulator
  arm64: dts: meson-axg: add vcc 5v regulator on the s400
  arm64: dts: meson-axg: improve power supplies description

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-31 19:10:31 -07:00
Olof Johansson 7f27a62267 mvebu dt64 for 4.19 (part 2)
Use more specific compatible for the Inside Secure SafeXcel on the
 Armada 37xx and the Armada 7K/8K SoCs.
 -----BEGIN PGP SIGNATURE-----
 
 iF0EABECAB0WIQQYqXDMF3cvSLY+g9cLBhiOFHI71QUCW1GhYAAKCRALBhiOFHI7
 1UYeAKCZvsX0evzDYPCmL32CSuArmJ+OKwCdFYn7QoXQIYwGP60bK2UZzhcWROs=
 =V7S+
 -----END PGP SIGNATURE-----

Merge tag 'mvebu-dt64-4.19-2' of git://git.infradead.org/linux-mvebu into next/dt

mvebu dt64 for 4.19 (part 2)

Use more specific compatible for the Inside Secure SafeXcel on the
Armada 37xx and the Armada 7K/8K SoCs.

* tag 'mvebu-dt64-4.19-2' of git://git.infradead.org/linux-mvebu:
  arm64: dts: marvell: armada-37xx: update the crypto engine compatible
  arm64: dts: marvell: armada-cp110: update the crypto engine compatible

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-31 19:07:19 -07:00
Kunihiko Hayashi c8f8a0b50b ARM: multi_v7_defconfig: add CONFIG_UNIPHIER_THERMAL and CONFIG_SNI_AVE
Enable the thermal monitor driver and the AVE ethernet driver
implemented on UniPhier SoCs.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-31 19:05:31 -07:00
Masahiro Yamada f0fc40aff6 ARM: uniphier: select RESET_CONTROLLER
The UniPhier platform highly relies on the reset controller.
Select RESET_CONTROLLER to enable it forcibly.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-31 19:04:41 -07:00
Masahiro Yamada ab6ab445b9 arm64: uniphier: select RESET_CONTROLLER
The UniPhier platform highly relies on the reset controller.
Select RESET_CONTROLLER to enable it forcibly.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-31 19:04:40 -07:00
Masahiro Yamada f61513f724 ARM: uniphier: remove empty Makefile
arch/arm/mach-uniphier/Makefile has been unused for a long time.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-31 19:04:39 -07:00
Olof Johansson 61c22946ad ASPEED device tree updates for 4.19
- New support for the ASPEED USB host controller and USB vhub (device)
  support
 
  - Descriptions for the ColdFire processor that is part of the ASPEED
  SoC
 
  - Small fixes:
   * pwm/tach clock
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+nHMAt9PCBDH63wBa3ZZB4FHcJ4FAltacAgACgkQa3ZZB4FH
 cJ7VIg//S1Df+le7rn4LYqF4OSXoDDTU4oDTkQtjvMIBoTjGnWbwhtaMYKzSsE82
 nNgo+BXFuKufNc27zNCiK2aGrWMbFqjcvC3OKE6V/qERMsyE5g1b41oAb8sbusJJ
 NnbvHL1g9UE3fojwOyHNkQG5pfUAv/+Dj3P0hdmKa3Jsk+pVHlKN1mB1z5tM80Ab
 MR+/MaGnNHqj1giTdubLvh8Bhp+sOS3XymVsvZfKjadloLO4qcIbQnxpTJB3GyRE
 xa4KhXv1oSSzAIsfwtR/iTyvuc3Klr/K1ePdlObeFVmc+IH75GrPYZEJznhuKUau
 JKhZlgIpfFoeDbEIO9Os/tejobHh30W+/FjsVvaZ353sqzx1r7i5limw5Hip0j1w
 PgxZAdYBTAGSRRK5n43N6aK6IsrNxGiHzFlzU/gBFUAIJfZUHtJ4Kn0eO61Ee3FJ
 fsVyxtosx5LBhV+rjkoZBB4wfoMmuI0YoKc+o3/wb4zYEUq9HitoBq7KasCBi9/L
 20e+Qf0YNZ4r2STeLIeY0/1wkmUrH4Z3nthhlIjySnuscAEX8utGFEobGUf3WYnR
 mN1Kk4ur+j9b7Yv77MseQheFXn9MtyCqI3sElJPaOjnkUUZBDLPJ65sW75OOPW9u
 SmdvV6am7ndo8LSmIFH34lmLEwEt6b3r+zzgCpu1ZTw9YznWmsc=
 =aigP
 -----END PGP SIGNATURE-----

Merge tag 'aspeed-4.19-devicetree-no-fsi' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/aspeed into next/dt

ASPEED device tree updates for 4.19

 - New support for the ASPEED USB host controller and USB vhub (device)
 support

 - Descriptions for the ColdFire processor that is part of the ASPEED
 SoC

 - Small fixes:
  * pwm/tach clock

* tag 'aspeed-4.19-devicetree-no-fsi' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/aspeed:
  ARM: dts: aspeed: Add coprocessor interrupt controller
  ARM: dts: aspeed: Use 24MHz fixed clock for pwm
  ARM: dts: aspeed: Fix Romulus VGA frame buffer
  ARM: dts: aspeed: Enable vhub on port A of AST2500 EVB
  ARM: dts: aspeed: Add G5 USB Virtual Hub
  ARM: dts: aspeed: Add G4 USB Virtual Hub
  ARM: dts: aspeed: Add G5 USB host pinmux
  ARM: dts: aspeed: Add G4 USB pinmux

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-31 19:02:51 -07:00
Alexandre Belloni 84a7f564fa
mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123
Ocelot PCB123 has a SPI NOR connected on its SPI bus.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20103/
Cc: Mark Brown <broonie@kernel.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-spi@vger.kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Allan Nielsen <allan.nielsen@microsemi.com>
2018-07-31 10:34:34 -07:00
Alexandre Belloni 9eaf3ba5e0
mips: dts: mscc: Add spi on Ocelot
Add support for the SPI controller

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20101/
Cc: Mark Brown <broonie@kernel.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-spi@vger.kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Allan Nielsen <allan.nielsen@microsemi.com>
2018-07-31 10:34:08 -07:00
Hari Vyas 44bda4b7d2 PCI: Fix is_added/is_busmaster race condition
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.

When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.

is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.

A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.

Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master().  As these fields are not
handled atomically, that clears the is_added bit.

Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state.  This avoids the
race because is_added no longer shares a memory location with is_busmaster.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 11:27:54 -05:00
Philipp Rudo 8cce437fbb s390/kdump: Fix elfcorehdr size calculation
Before the memory for the elfcorehdr is allocated the required size is
estimated with

       alloc_size = 0x1000 + get_cpu_cnt() * 0x4a0 +
               mem_chunk_cnt * sizeof(Elf64_Phdr);

Where 0x4a0 is used as size for the ELF notes to store the register
contend. This size is 8 bytes too small. Usually this does not immediately
cause a problem because the page reserved for overhead (Elf_Ehdr,
vmcoreinfo, etc.) is pretty generous. So usually there is enough spare
memory to counter the mis-calculated per cpu size. However, with growing
overhead and/or a huge cpu count the allocated size gets too small for the
elfcorehdr. Ultimately a BUG_ON is triggered causing the crash kernel to
panic.

Fix this by properly calculating the required size instead of relying on
magic numbers.

Fixes: a62bc07392 ("s390/kdump: add support for vector extension")
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-31 17:43:43 +02:00
Ard Biesheuvel c7513c2a27 crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair
Calling pmull_gcm_encrypt_block() requires kernel_neon_begin() and
kernel_neon_end() to be used since the routine touches the NEON
register file. Add the missing calls.

Also, since NEON register contents are not preserved outside of
a kernel mode NEON region, pass the key schedule array again.

Fixes: 7c50136a8a ("crypto: arm64/aes-ghash - yield NEON after every ...")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-31 13:20:30 +01:00
Will Deacon dcab90d909 arm64: kexec: Add comment to explain use of __flush_icache_range()
Now that we understand the deadlock arising from flush_icache_range()
on the kexec crash kernel path, add a comment to justify the use of
__flush_icache_range() here.

Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-31 12:10:38 +01:00
Will Deacon eab1cecc12 arm64: sdei: Mark sdei stack helper functions as static
The SDEI stack helper functions are only used by _on_sdei_stack() and
refer to symbols (e.g. sdei_stack_normal_ptr) that are only defined if
CONFIG_VMAP_STACK=y.

Mark these functions as static, so we don't run into errors at link-time
due to references to undefined symbols. Stick all the parameters onto
the same line whilst we're passing through.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-31 12:08:22 +01:00
Sam Bobroff b87b9cf493 powerpc/pseries: fix EEH recovery of some IOV devices
EEH recovery currently fails on pSeries for some IOV capable PCI
devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
certain device tree properties for the device. (Found on an IOV
capable device using the ipr driver.)

Recovery fails in pci_enable_resources() at the check on r->parent,
because r->flags is set and r->parent is not.  This state is due to
sriov_init() setting the start, end and flags members of the IOV BARs
but the parent not being set later in
pseries_pci_fixup_iov_resources(), because the
"ibm,open-sriov-vf-bar-info" property is missing.

Correct this by zeroing the resource flags for IOV BARs when they
can't be configured (this is the same method used by sriov_init() and
__pci_read_base()).

VFs cleared this way can't be enabled later, because that requires
another device tree property, "ibm,number-of-configurable-vfs" as well
as support for the RTAS function "ibm_map_pes". These are all part of
hypervisor support for IOV and it seems unlikely that a hypervisor
would ever partially, but not fully, support it. (None are currently
provided by QEMU/KVM.)

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Bryant G. Ly <bryantly@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:46 +10:00
Shilpasri G Bhat 04baaf28f4 powerpc/powernv: Add support to enable sensor groups
Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that needs to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:45 +10:00
Akshay Adiga 1961acad2f powernv/cpuidle: Use parsed device tree values for cpuidle_init
Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
cpuidle driver. Use properties from pnv_idle_states structure for powernv
cpuidle_init.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:44 +10:00
Akshay Adiga 9c7b185ab2 powernv/cpuidle: Parse dt idle properties into global structure
Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:44 +10:00
Finn Thain ebd722275f macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
Now that the PowerMac via-pmu driver supports m68k PowerBooks,
switch over to that driver and remove the via-pmu68k driver.

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:42 +10:00
Finn Thain 54c990775f macintosh/via-pmu68k: Don't load driver on unsupported hardware
Don't load the via-pmu68k driver on early PowerBooks. The M50753 PMU
device found in those models was never supported by this driver.
Attempting to load the driver usually causes a boot hang.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Michael Schmitz <schmitzmic@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:42 +10:00
Bhupesh Sharma e401b7c2c6 arm64, kaslr: export offset in VMCOREINFO ELF notes
Include KASLR offset in arm64 VMCOREINFO ELF notes to assist in
debugging. vmcore parsing in user-space already expects this value in
the notes and we are providing it for portability of those existing
tools with x86.

Ideally we would like core code to do this (so that way this
information won't be missed when an architecture adds KASLR support),
but mips has CONFIG_RANDOMIZE_BASE, and doesn't provide kaslr_offset(),
so I am not sure if this is needed for mips (and other such similar arch
cases in future). So, lets keep this architecture specific for now.

As an example of a user-space use-case, consider the
makedumpfile user-space utility which will need fixup to use this
KASLR offset to work with cases where we need to find a way to
translate symbol address from vmlinux to kernel run time address
in case of KASLR boot on arm64.

I have already submitted the makedumpfile user-space patch upstream
and the maintainer has suggested to wait for the kernel changes to be
included (see [0]).

I tested this on my qualcomm amberwing board both for KASLR and
non-KASLR boot cases:

Without this patch:
   # cat > scrub.conf << EOF
   [vmlinux]
   erase jiffies
   erase init_task.utime
   for tsk in init_task.tasks.next within task_struct:tasks
       erase tsk.utime
   endfor
   EOF

  # makedumpfile --split -d 31 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}
  readpage_elf: Attempt to read non-existent page at 0xffffa8a5bf180000.
  readmem: type_addr: 1, addr:ffffa8a5bf180000, size:8
  vaddr_to_paddr_arm64: Can't read pgd
  readmem: Can't convert a virtual address(ffff0000092a542c) to physical
  address.
  readmem: type_addr: 0, addr:ffff0000092a542c, size:390
  check_release: Can't get the address of system_utsname

After this patch check_release() is ok, and also we are able to erase
symbol from vmcore (I checked this with kernel 4.18.0-rc4+):

  # makedumpfile --split -d 31 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}
  The kernel version is not supported.
  The makedumpfile operation may be incomplete.
  Checking for memory holes                         : [100.0 %] \
  Checking for memory holes                         : [100.0 %] |
  Checking foExcluding unnecessary pages                       : [100.0 %]
  \
  Excluding unnecessary pages                       : [100.0 %] \

  The dumpfiles are saved to dumpfile_1, dumpfile_2, and dumpfile_3.

  makedumpfile Completed.

[0] https://www.spinics.net/lists/kexec/msg21195.html

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-31 10:27:01 +01:00
Michael O'Farrell 9d2dcc8fc6 arm64: perf: Add cap_user_time aarch64
It is useful to get the running time of a thread.  Doing so in an
efficient manner can be important for performance of user applications.
Avoiding system calls in `clock_gettime` when handling
CLOCK_THREAD_CPUTIME_ID is important.  Other clocks are handled in the
VDSO, but CLOCK_THREAD_CPUTIME_ID falls back on the system call.

CLOCK_THREAD_CPUTIME_ID is not handled in the VDSO since it would have
costs associated with maintaining updated user space accessible time
offsets.  These offsets have to be updated everytime the a thread is
scheduled/descheduled.  However, for programs regularly checking the
running time of a thread, this is a performance improvement.

This patch takes a middle ground, and adds support for cap_user_time an
optional feature of the perf_event API.  This way costs are only
incurred when the perf_event api is enabled.  This is done the same way
as it is in x86.

Ultimately this allows calculating the thread running time in userspace
on aarch64 as follows (adapted from perf_event_open manpage):

u32 seq, time_mult, time_shift;
u64 running, count, time_offset, quot, rem, delta;
struct perf_event_mmap_page *pc;
pc = buf;  // buf is the perf event mmaped page as documented in the API.

if (pc->cap_usr_time) {
    do {
        seq = pc->lock;
        barrier();
        running = pc->time_running;

        count = readCNTVCT_EL0();  // Read ARM hardware clock.
        time_offset = pc->time_offset;
        time_mult   = pc->time_mult;
        time_shift  = pc->time_shift;

        barrier();
    } while (pc->lock != seq);

    quot = (count >> time_shift);
    rem = count & (((u64)1 << time_shift) - 1);
    delta = time_offset + quot * time_mult +
            ((rem * time_mult) >> time_shift);

    running += delta;
    // running now has the current nanosecond level thread time.
}

Summary of changes in the patch:

For aarch64 systems, make arch_perf_update_userpage update the timing
information stored in the perf_event page.  Requiring the following
calculations:
  - Calculate the appropriate time_mult, and time_shift factors to convert
    ticks to nano seconds for the current clock frequency.
  - Adjust the mult and shift factors to avoid shift factors of 32 bits.
    (possibly unnecessary)
  - The time_offset userspace should apply when doing calculations:
    negative the current sched time (now), because time_running and
    time_enabled fields of the perf_event page have just been updated.
Toggle bits to appropriate values:
  - Enable cap_user_time

Signed-off-by: Michael O'Farrell <micpof@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-31 10:14:00 +01:00
Ard Biesheuvel d26de6c9f4 arm64: drop unused kernel_neon_begin_partial() macro
When kernel mode NEON was first introduced to the arm64 kernel,
every call to kernel_neon_begin()/_end() stacked resp. unstacked
the entire NEON register file, making it worthwile to reduce the
number of used NEON registers to a bare minimum, and only stack
those. kernel_neon_begin_partial() was introduced for this purpose,
but after the refactoring for SVE and other changes, it no longer
exists and was simply #define'd to kernel_neon_begin() directly.

In the mean time, all users have been updated, so let's remove
the fallback macro.

Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-31 10:13:50 +01:00
Hendrik Brueckner 5223c67167 s390/cpum_sf: save TOD clock base in SDBs for time conversion
Processing the samples in the AUX-area by perf requires the computation
of respective time stamps.  The time stamps used by perf are based on
the monotonic clock.  To convert the TOD clock value contained in an
SDB to a monotonic clock value, the TOD clock base is required.  Hence,
also save the TOD clock base in the SDB.

Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-31 11:02:27 +02:00
Arnd Bergmann 15280e8107 sparc64: add reads{b,w,l}/writes{b,w,l}
Some drivers need these for compile-testing. On most architectures
they come from asm-generic/io.h, but not on sparc64, which has its
own definitions.

Since we already have ioread*_rep()/iowrite*_rep() that have the
same behavior on sparc64 (i.e. all PCI I/O space is memory mapped),
we can rename the existing helpers and add macros to define them
to the same implementation.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Boris Brezillon <boris.brezillon@bootlin>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2018-07-31 09:46:06 +02:00
Arnd Bergmann 0bbf47eab4 ia64: use asm-generic/io.h
asm-generic/io.h provides a generic implementation of all I/O accessors,
which the architectures can override.

Since ia64 does not provide readsl/writesl etc, any driver using those
fails to build, and including asm-generic/io.h will provide the
missing interfaces, as well as any other future interfaces that get
added there. We need to #define a couple of symbols to themselves
in the ia64 to ensure that we use the ia64 specific version of those
rather than the generic one.

There should be no other effect than adding {read,write}s{b,w,l}()
as well as {in,out}s{b,w,l}_p(), which were also not provided
by ia64 but are provided by the generic header for historic reasons.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2018-07-31 09:46:05 +02:00
Kan Liang 156c8b58ef perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices
Masayoshi Mizuma reported that a warning message is shown while a CPU is
hot-removed on Broadwell servers:

  WARNING: CPU: 126 PID: 6 at arch/x86/events/intel/uncore.c:988
  uncore_pci_remove+0x10b/0x150
  Call Trace:
   pci_device_remove+0x42/0xd0
   device_release_driver_internal+0x148/0x220
   pci_stop_bus_device+0x76/0xa0
   pci_stop_root_bus+0x44/0x60
   acpi_pci_root_remove+0x1f/0x80
   acpi_bus_trim+0x57/0x90
   acpi_bus_trim+0x2e/0x90
   acpi_device_hotplug+0x2bc/0x4b0
   acpi_hotplug_work_fn+0x1a/0x30
   process_one_work+0x174/0x3a0
   worker_thread+0x4c/0x3d0
   kthread+0xf8/0x130

This bug was introduced by:

  commit 15a3e845b0 ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs")

The index of "QPI Port 2 filter" was hardcode to 2, but this conflicts with the
index of "PCU.3" which is "HSWEP_PCI_PCU_3", which equals to 2 as well.

To fix the conflict, the hardcoded index needs to be cleaned up:

 - introduce a new enumerator "BDX_PCI_QPI_PORT2_FILTER" for "QPI Port 2
   filter" on Broadwell,
 - increase UNCORE_EXTRA_PCI_DEV_MAX by one,
 - clean up the hardcoded index.

Debugged-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: msys.mizuma@gmail.com
Cc: stable@vger.kernel.org
Fixes: 15a3e845b0 ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs")
Link: http://lkml.kernel.org/r/1532953688-15008-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-31 07:43:37 +02:00
Martin Schwidefsky 03760d44b1 KVM: s390: initial host large page support
- must be enabled via module parameter hpage=1
 - cannot be used together with nested
 - does support migration
 - does support hugetlbfs
 - no THP yet
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbX4AwAAoJEBF7vIC1phx85eMP/ifsNHwqfAOrZBdlJuLVPla5
 47J8iY4i4DOKGhKI4YOTcJQhn1izKZhECXS8d8hghB/sQUCE2CLVr1X/r1Udy2Pq
 bpKG4apYtcJZBF6qn7yDMjBGkIRK4OCBD1pkuKEq2NyvUgPsHUVUgpuq2gngMTBk
 ZN9MIfRQMdIEJsT389D6T9as0lwABJ0MJap5AudkQwguN2dDhQGeZv8l0QYV8C2I
 WqRI2VsI1QEo3cJr1lJ5li/F9fC7q0l6QwlvPVocIHJAnq01zJvOekeAgQ4hzz16
 JIoQckJq8m4d4PqZ7aWmAaMEemoQ9llmCavovspJNtFT79jho6cWWtBEvq+t0GLQ
 qTsG9Yi20hONZMWAw+JIdSdOuFMD0HCpOWdUtSMjENFRbr8LLHUr91dGIxRLjF8Z
 gv3vDJrbGzCQ+b9qPA8SrAN7U3VNCZG384MEmobwTuv5hxOopWp6chcK7RCriV/m
 7cFDfO7+2pZymdW7D4DWlFiZl4mWpwOxip32C9tCt0CQveqeYSZsb5Qb9Pe+50vr
 JhpB74UL79Wffvd65InGlu5jx1SdGG0QAzmBOkdOsAhX+0WMmXRB1ddn4whu7HPU
 ssNtdKgLt9KkM/kIsB9RC/YLvUFK1lBVHrfnzUmLw3CBHP3QeO+V+arLwdVLVDjV
 PA/LPECBWtGtQtxGWb2H
 =Y0Wl
 -----END PGP SIGNATURE-----

Merge tag 'hlp_stage1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into features

Pull hlp_stage1 from Christian Borntraeger with the following changes:

KVM: s390: initial host large page support

- must be enabled via module parameter hpage=1
- cannot be used together with nested
- does support migration
- does support hugetlbfs
- no THP yet
2018-07-31 07:14:19 +02:00
Linus Torvalds f67077deb4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Several smallish fixes, I don't think any of this requires another -rc
  but I'll leave that up to you:

   1) Don't leak uninitialzed bytes to userspace in xfrm_user, from Eric
      Dumazet.

   2) Route leak in xfrm_lookup_route(), from Tommi Rantala.

   3) Premature poll() returns in AF_XDP, from Björn Töpel.

   4) devlink leak in netdevsim, from Jakub Kicinski.

   5) Don't BUG_ON in fib_compute_spec_dst, the condition can
      legitimately happen. From Lorenzo Bianconi.

   6) Fix some spectre v1 gadgets in generic socket code, from Jeremy
      Cline.

   7) Don't allow user to bind to out of range multicast groups, from
      Dmitry Safonov with a follow-up by Dmitry Safonov.

   8) Fix metrics leak in fib6_drop_pcpu_from(), from Sabrina Dubroca"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  netlink: Don't shift with UB on nlk->ngroups
  net/ipv6: fix metrics leak
  xen-netfront: wait xenbus state change when load module manually
  can: ems_usb: Fix memory leak on ems_usb_disconnect()
  openvswitch: meter: Fix setting meter id for new entries
  netlink: Do not subscribe to non-existent groups
  NET: stmmac: align DMA stuff to largest cache line length
  tcp_bbr: fix bw probing to raise in-flight data for very small BDPs
  net: socket: Fix potential spectre v1 gadget in sock_is_registered
  net: socket: fix potential spectre v1 gadget in socketcall
  net: mdio-mux: bcm-iproc: fix wrong getter and setter pair
  ipv4: remove BUG_ON() from fib_compute_spec_dst
  enic: handle mtu change for vf properly
  net: lan78xx: fix rx handling before first packet is send
  nfp: flower: fix port metadata conversion bug
  bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog()
  bpf: fix bpf_skb_load_bytes_relative pkt length check
  perf build: Build error in libbpf missing initialization
  net: ena: Fix use of uninitialized DMA address bits field
  bpf: btf: Use exact btf value_size match in map_check_btf()
  ...
2018-07-30 21:40:37 -07:00
谢致邦 (XIE Zhibang) 60bc84e227
MIPS: Loongson: Merge load addresses
Systems based upon the Loongson 1B & 1C CPUs share the same load
address, as do those based upon Loongson 1A. Unify the definition of
this load address to reduce duplication & avoid the need for an extra
Loongson 1A case in future.

[paul.burton@mips.com: Rewrite commit message.]

Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/14927/
Cc: linux-mips@linux-mips.org
2018-07-30 18:59:01 -07:00
谢致邦 (XIE Zhibang) 968dc5a0ea
MIPS: Loongson: Set Loongson32 to MIPS32R1
LS232 (Loonson 2-issue 32-bit, also called GS232 (Godson 2-issue 32-bit))
is the CPU core (microarchitecture) of Loongson 1A/1B/1C.

According to "LS232 用户手册 (LS232 User Manual)", LS232 implements the
MIPS32 Release 1 instruction set, and part of the MIPS32 Release 2
instruction set.

In the manual, LS232 implements all of the MIPS32R2 instruction set
except the FPU instructions, and LS232 also implements 5 FPU
instructions of the MIPS32R2 instruction set: CEIL.L.fmt, CVT.L.fmt,
FLOOR.L.fmt, TRUNC.L.fmt, and ROUND.L.fmt.

But a bug of the DI instruction has been found during tests, the DI
instruction can not disable interrupts in arch_local_irq_disable() with
CONFIG_PREEMPT_NONE=y and CFLAGS='-mno-branch-likely' in some cases.

[paul.burton@mips.com:
  - Remove the _MIPS_ISA redefinition to match the change made for the
    generic MIPSr1 CPUs by commit 344ebf0994 ("MIPS: Always use
    -march=<arch>, not -<arch> shortcuts").]

Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/16155/
Cc: linux-mips@linux-mips.org
Cc: ralf@linux-mips.org
2018-07-30 18:54:15 -07:00
Jiri Kosina fdf82a7856 x86/speculation: Protect against userspace-userspace spectreRSB
The article "Spectre Returns! Speculation Attacks using the Return Stack 
Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks, 
making use solely of the RSB contents even on CPUs that don't fallback to 
BTB on RSB underflow (Skylake+).

Mitigate userspace-userspace attacks by always unconditionally filling RSB on
context switch when the generic spectrev2 mitigation has been enabled.

[1] https://arxiv.org/pdf/1807.07940.pdf

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1807261308190.997@cbobk.fhfr.pm
2018-07-31 00:45:15 +02:00
Janosch Frank 2375846193 KVM: s390: initial host large page support
- must be enabled via module parameter hpage=1
 - cannot be used together with nested
 - does support migration
 - does support hugetlbfs
 - no THP yet
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbX4AwAAoJEBF7vIC1phx85eMP/ifsNHwqfAOrZBdlJuLVPla5
 47J8iY4i4DOKGhKI4YOTcJQhn1izKZhECXS8d8hghB/sQUCE2CLVr1X/r1Udy2Pq
 bpKG4apYtcJZBF6qn7yDMjBGkIRK4OCBD1pkuKEq2NyvUgPsHUVUgpuq2gngMTBk
 ZN9MIfRQMdIEJsT389D6T9as0lwABJ0MJap5AudkQwguN2dDhQGeZv8l0QYV8C2I
 WqRI2VsI1QEo3cJr1lJ5li/F9fC7q0l6QwlvPVocIHJAnq01zJvOekeAgQ4hzz16
 JIoQckJq8m4d4PqZ7aWmAaMEemoQ9llmCavovspJNtFT79jho6cWWtBEvq+t0GLQ
 qTsG9Yi20hONZMWAw+JIdSdOuFMD0HCpOWdUtSMjENFRbr8LLHUr91dGIxRLjF8Z
 gv3vDJrbGzCQ+b9qPA8SrAN7U3VNCZG384MEmobwTuv5hxOopWp6chcK7RCriV/m
 7cFDfO7+2pZymdW7D4DWlFiZl4mWpwOxip32C9tCt0CQveqeYSZsb5Qb9Pe+50vr
 JhpB74UL79Wffvd65InGlu5jx1SdGG0QAzmBOkdOsAhX+0WMmXRB1ddn4whu7HPU
 ssNtdKgLt9KkM/kIsB9RC/YLvUFK1lBVHrfnzUmLw3CBHP3QeO+V+arLwdVLVDjV
 PA/LPECBWtGtQtxGWb2H
 =Y0Wl
 -----END PGP SIGNATURE-----

Merge tag 'hlp_stage1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvms390/next

KVM: s390: initial host large page support

- must be enabled via module parameter hpage=1
- cannot be used together with nested
- does support migration
- does support hugetlbfs
- no THP yet
2018-07-30 23:20:48 +02:00
Janosch Frank a449938297 KVM: s390: Add huge page enablement control
General KVM huge page support on s390 has to be enabled via the
kvm.hpage module parameter. Either nested or hpage can be enabled, as
we currently do not support vSIE for huge backed guests. Once the vSIE
support is added we will either drop the parameter or enable it as
default.

For a guest the feature has to be enabled through the new
KVM_CAP_S390_HPAGE_1M capability and the hpage module
parameter. Enabling it means that cmm can't be enabled for the vm and
disables pfmf and storage key interpretation.

This is due to the fact that in some cases, in upcoming patches, we
have to split huge pages in the guest mapping to be able to set more
granular memory protection on 4k pages. These split pages have fake
page tables that are not visible to the Linux memory management which
subsequently will not manage its PGSTEs, while the SIE will. Disabling
these features lets us manage PGSTE data in a consistent matter and
solve that problem.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 23:13:38 +02:00
Janosch Frank a9e00d8349 s390/mm: Add huge page gmap linking support
Let's allow huge pmd linking when enabled through the
KVM_CAP_S390_HPAGE_1M capability. Also we can now restrict gmap
invalidation and notification to the cases where the capability has
been activated and save some cycles when that's not the case.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 23:13:38 +02:00
Dominik Dingel 7d735b9ae8 s390/mm: hugetlb pages within a gmap can not be freed
Guests backed by huge pages could theoretically free unused pages via
the diagnose 10 instruction. We currently don't allow that, so we
don't have to refault it once it's needed again.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-07-30 23:13:38 +02:00
Christoph Hellwig a8651194f9 PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core
There is nothing arch-specific about PCI or dma-debug, so call
dma_debug_add_bus() from the PCI core just after registering the bus type.

Most of dma-debug is already generic; this just adds reporting of pending
dma-allocations on driver unload for arches other than powerpc, sh, and
x86.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
2018-07-30 15:58:01 -05:00
Thomas Petazzoni 12be1036c5 sparc: use asm-generic version of msi.h
This is necessary to be able to include <linux/msi.h> when
CONFIG_GENERIC_MSI_IRQ_DOMAIN is enabled. Without this, a build with
CONFIG_GENERIC_MSI_IRQ_DOMAIN fails with:

   In file included from drivers//ata/ahci.c:45:0:
>> include/linux/msi.h:226:10: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
             msi_alloc_info_t *arg);
             ^~~~~~~~~~~~~~~~
             sg_alloc_fn
   include/linux/msi.h:230:9: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
            msi_alloc_info_t *arg);
            ^~~~~~~~~~~~~~~~
            sg_alloc_fn
   include/linux/msi.h:239:12: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
               msi_alloc_info_t *arg);
               ^~~~~~~~~~~~~~~~
               sg_alloc_fn
   include/linux/msi.h:240:22: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
     void  (*msi_finish)(msi_alloc_info_t *arg, int retval);
                         ^~~~~~~~~~~~~~~~
                         sg_alloc_fn
   include/linux/msi.h:241:20: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
     void  (*set_desc)(msi_alloc_info_t *arg,
                       ^~~~~~~~~~~~~~~~
                       sg_alloc_fn
   include/linux/msi.h:316:18: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
           int nvec, msi_alloc_info_t *args);
                     ^~~~~~~~~~~~~~~~
                     sg_alloc_fn
   include/linux/msi.h:318:29: error: unknown type name 'msi_alloc_info_t'; did you mean 'sg_alloc_fn'?
            int virq, int nvec, msi_alloc_info_t *args);
                                ^~~~~~~~~~~~~~~~
                                sg_alloc_fn

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 13:00:56 -07:00
Thomas Petazzoni f0afc6b18d sparc: move MSI related definitions to where they are used
The definitions in arch/sparc/include/asm/msi.h are only used in
arch/sparc/mm/srmmu.c, so it makes sense to have them in the C file
directly.

In addition, having a custom arch/sparc/include/asm/msi.h prevents
from using the asm-generic version of this header, which is necessary
to be able to include <linux/msi.h> when CONFIG_GENERIC_MSI_IRQ_DOMAIN
is enabled.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 13:00:56 -07:00
Steven Rostedt (VMware) 6f57ed681e sparc/time: Add missing __init to init_tick_ops()
Code that was added to force gcc not to inline any function that isn't
explicitly declared as inline uncovered that init_tick_ops() isn't
marked as "__init". It is only called by __init functions and more
importantly it too calls an __init function which would require it to be
__init as well.

Link: http://lkml.kernel.org/r/201806060444.hdHcKOBy%fengguang.wu@intel.com

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 12:48:29 -07:00
Linus Torvalds 527838d470 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - a build race fix

   - a Xen entry fix

   - a TSC_DEADLINE quirk future-proofing fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Fix if_changed build flip/flop bug
  x86/entry/64: Remove %ebx handling from error_entry/exit
  x86/apic: Future-proof the TSC_DEADLINE quirk for SKX
2018-07-30 12:16:03 -07:00
Randy Dunlap ec837d620c arc: fix type warnings in arc/mm/cache.c
Fix type warnings in arch/arc/mm/cache.c.

../arch/arc/mm/cache.c: In function 'flush_anon_page':
../arch/arc/mm/cache.c:1062:55: warning: passing argument 2 of '__flush_dcache_page' makes integer from pointer without a cast [-Wint-conversion]
  __flush_dcache_page((phys_addr_t)page_address(page), page_address(page));
                                                       ^~~~~~~~~~~~~~~~~~
../arch/arc/mm/cache.c:1013:59: note: expected 'long unsigned int' but argument is of type 'void *'
 void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
                                             ~~~~~~~~~~~~~~^~~~~

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Elad Kanfi <eladkan@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Ofer Levi <oferle@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-07-30 11:48:50 -07:00
Randy Dunlap 2423665ec5 arc: fix build errors in arc/include/asm/delay.h
Fix build errors in arch/arc/'s delay.h:
- add "extern unsigned long loops_per_jiffy;"
- add <asm-generic/types.h> for "u64"

In file included from ../drivers/infiniband/hw/cxgb3/cxio_hal.c:32:
../arch/arc/include/asm/delay.h: In function '__udelay':
../arch/arc/include/asm/delay.h:61:12: error: 'u64' undeclared (first use in this function)
  loops = ((u64) usecs * 4295 * HZ * loops_per_jiffy) >> 32;
            ^~~

In file included from ../drivers/infiniband/hw/cxgb3/cxio_hal.c:32:
../arch/arc/include/asm/delay.h: In function '__udelay':
../arch/arc/include/asm/delay.h:63:37: error: 'loops_per_jiffy' undeclared (first use in this function)
  loops = ((u64) usecs * 4295 * HZ * loops_per_jiffy) >> 32;
                                     ^~~~~~~~~~~~~~~

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Elad Kanfi <eladkan@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Ofer Levi <oferle@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-07-30 11:48:50 -07:00
Randy Dunlap 9e2ea40554 arc: [plat-eznps] fix printk warning in arc/plat-eznps/mtm.c
Fix printk format warning in arch/arc/plat-eznps/mtm.c:

In file included from ../include/linux/printk.h:7,
                 from ../include/linux/kernel.h:14,
                 from ../include/linux/list.h:9,
                 from ../include/linux/smp.h:12,
                 from ../arch/arc/plat-eznps/mtm.c:17:
../arch/arc/plat-eznps/mtm.c: In function 'set_mtm_hs_ctr':
../include/linux/kern_levels.h:5:18: warning: format '%d' expects argument of type 'int', but argument 2 has type 'long int' [-Wformat=]
 #define KERN_SOH "\001"  /* ASCII Start Of Header */
                  ^~~~~~
../include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
 #define KERN_ERR KERN_SOH "3" /* error conditions */
                  ^~~~~~~~
../include/linux/printk.h:308:9: note: in expansion of macro 'KERN_ERR'
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         ^~~~~~~~
../arch/arc/plat-eznps/mtm.c:166:3: note: in expansion of macro 'pr_err'
   pr_err("** Invalid @nps_mtm_hs_ctr [%d] needs to be [%d:%d] (incl)\n",
   ^~~~~~
../arch/arc/plat-eznps/mtm.c:166:40: note: format string is defined here
   pr_err("** Invalid @nps_mtm_hs_ctr [%d] needs to be [%d:%d] (incl)\n",
                                       ~^
                                       %ld
The hs_ctr variable can just be int instead of long, so also change
kstrtol() to kstrtoint() and leave the format string as %d.

Also add 2 header files since they are used in mtm.c and we prefer
not to depend on accidental/indirect #includes.

Cc: linux-snps-arc@lists.infradead.org
Cc: Ofer Levi <oferle@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-07-30 11:48:49 -07:00
Randy Dunlap b1f32ce1c3 arc: [plat-eznps] fix data type errors in platform headers
Add <linux/types.h> to fix build errors.
Both ctop.h and <soc/nps/common.h> use u32 types and cause many
errors.

Examples:
../include/soc/nps/common.h:71:4: error: unknown type name 'u32'
    u32 __reserved:20, cluster:4, core:4, thread:4;
../include/soc/nps/common.h:76:3: error: unknown type name 'u32'
   u32 value;
../include/soc/nps/common.h:124:4: error: unknown type name 'u32'
    u32 base:8, cl_x:4, cl_y:4,
../include/soc/nps/common.h:127:3: error: unknown type name 'u32'
   u32 value;

../arch/arc/plat-eznps/include/plat/ctop.h:83:4: error: unknown type name 'u32'
    u32 gen:1, gdis:1, clk_gate_dis:1, asb:1,
../arch/arc/plat-eznps/include/plat/ctop.h:86:3: error: unknown type name 'u32'
   u32 value;
../arch/arc/plat-eznps/include/plat/ctop.h:93:4: error: unknown type name 'u32'
    u32 csa:22, dmsid:6, __reserved:3, cs:1;
../arch/arc/plat-eznps/include/plat/ctop.h:95:3: error: unknown type name 'u32'
   u32 value;

Cc: linux-snps-arc@lists.infradead.org
Cc: Ofer Levi <oferle@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-07-30 11:48:49 -07:00
Linus Torvalds 0634922a78 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes:

   - AMD IBS data corruptor fix (uncovered by UBSAN)

   - an Intel PEBS entry unwind error fix

   - a HW-tracing crash fix

   - a MAINTAINERS update"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix crash when using HW tracing kernel filters
  perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)
  MAINTAINERS: Add Naveen N. Rao as kprobes co-maintainer
  perf/x86/amd/ibs: Don't access non-started event
2018-07-30 11:45:30 -07:00
Linus Torvalds fb20c03d37 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "A paravirt UP-patching fix, and an I2C MUX driver lockdep warning fix"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/pvqspinlock/x86: Use LOCK_PREFIX in __pv_queued_spin_unlock() assembly code
  i2c/mux, locking/core: Annotate the nested rt_mutex usage
  locking/rtmutex: Allow specifying a subclass for nested locking
2018-07-30 11:37:16 -07:00
Linus Torvalds d464b0314c Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fix from Ingo Molnar:
 "An UEFI variables fix for SEV guests"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Access EFI MMIO data as unencrypted when SEV is active
2018-07-30 11:07:34 -07:00
Yi Wang 843c408905 x86/apic: Trivial coding style fixes
There is inconsistent indenting in calibrate_APIC_clock() and
activate_managed(). Remove the surplus TAB.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: jgross@suse.com
Cc: ville.syrjala@linux.intel.com
Cc: len.brown@intel.com
Cc: gregkh@linuxfoundation.org
Cc: zhong.weidong@zte.com.cn
Link: https://lkml.kernel.org/r/1532672103-32250-1-git-send-email-wang.yi59@zte.com.cn
2018-07-30 19:56:30 +02:00
Dou Liyang 24cfd8ca1d x86/platform/UV: Mark memblock related init code and data correctly
parse_mem_block_size() and mem_block_size are only used during init. Mark
them accordingly.

Fixes: d7609f4210 ("x86/platform/UV: Add kernel parameter to set memory block size")
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Link: https://lkml.kernel.org/r/20180730075947.23023-1-douly.fnst@cn.fujitsu.com
2018-07-30 19:53:58 +02:00
zhong jiang 5db1b1e1ee x86/boot/KASLR: Make local variable mem_limit static
Fix the following sparse warning:

arch/x86/boot/compressed/kaslr.c:102:20: warning: symbol 'mem_limit' was not declared. Should it be static?

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/1532958273-47725-1-git-send-email-zhongjiang@huawei.com
2018-07-30 19:46:03 +02:00
Quentin Schulz 6386889ac2
MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller
The GPIO controller also serves as an interrupt controller for events
on the GPIO it handles.

An interrupt occurs whenever a GPIO line has changed.

Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20015/
Cc: robh+dt@kernel.org
Cc: mark.rutland@arm.com
Cc: ralf@linux-mips.org
Cc: jhogan@kernel.org
Cc: linux-gpio@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: thomas.petazzoni@bootlin.com
2018-07-30 10:34:28 -07:00
Dou Liyang 1088c6eef2 x86/kvmclock: Mark kvm_get_preset_lpj() as __init
kvm_get_preset_lpj() is only called from kvmclock_init(), so mark it __init
as well.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc:   Radim Krčmář<rkrcmar@redhat.com>
Cc:  <kvm@vger.kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20180730075421.22830-3-douly.fnst@cn.fujitsu.com
2018-07-30 19:33:35 +02:00
Dou Liyang 608008a457 x86/tsc: Consolidate init code
Split out suplicated code from tsc_early_init() and tsc_init() into a
common helper and fixup some comment typos.

[ tglx: Massaged changelog and renamed function ]

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180730075421.22830-2-douly.fnst@cn.fujitsu.com
2018-07-30 19:33:35 +02:00
Paul Burton 0211d49e52
MIPS: generic: Select MIPS_AUTO_PFN_OFFSET
Enable CONFIG_MIPS_AUTO_PFN_OFFSET for the generic platform, allowing
it to avoid wasted book-keeping for pages with addresses lower than the
physical base address of memory.

This has a minimal impact on kernel text size, with 64r6el_defconfig
gaining 0.1% in size as reported by bloat-o-meter:

  add/remove: 4/1 grow/shrink: 345/13 up/down: 9017/-392 (8625)
  Function                                     old     new   delta
  pcpu_setup_first_chunk                      1444    1780    +336
  pcpu_alloc_first_chunk                       864    1136    +272
  start_kernel                                1064    1288    +224
  initcall_blacklist                           224     372    +148
  try_fill_recv                               2088    2184     +96
  ...
  Total: Before=8457273, After=8465898, chg +0.10%

The gain for systems with large offsets to physical memory & the ability
to continue using generic kernels on such systems seems well worth this
small cost.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Suggested-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
Patchwork: https://patchwork.linux-mips.org/patch/20049/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-07-30 10:27:38 -07:00
Paul Burton 6c359eb1dc
MIPS: Allow auto-dection of ARCH_PFN_OFFSET & PHYS_OFFSET
On systems where physical memory begins at a non-zero address, defining
PHYS_OFFSET (which influences ARCH_PFN_OFFSET) can save us time & memory
by avoiding book-keeping for pages from address zero to the start of
memory.

Some MIPS platforms already make use of this, but with the definition of
PHYS_OFFSET being compile-time constant it hasn't been possible to
enable this optimization for a kernel which may run on systems with
varying physical memory base addresses.

Introduce a new Kconfig option CONFIG_MIPS_AUTO_PFN_OFFSET which, when
enabled, makes ARCH_PFN_OFFSET a variable & detects it from the boot
memory map (which for example may have been populated from DT). The
relationship with PHYS_OFFSET is reversed, with PHYS_OFFSET now being
based on ARCH_PFN_OFFSET. This is because ARCH_PFN_OFFSET is used far
more often, so avoiding the need for runtime calculation gives us a
smaller impact on kernel text size (0.1% rather than 0.15% for
64r6el_defconfig).

Signed-off-by: Paul Burton <paul.burton@mips.com>
Suggested-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
Patchwork: https://patchwork.linux-mips.org/patch/20048/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
2018-07-30 10:27:32 -07:00
Paul Burton 0494d7ffdc
MIPS: Fix ISA virt/bus conversion for non-zero PHYS_OFFSET
isa_virt_to_bus() & isa_bus_to_virt() claim to treat ISA bus addresses
as being identical to physical addresses, but they fail to do so in the
presence of a non-zero PHYS_OFFSET.

Correct this by having them use virt_to_phys() & phys_to_virt(), which
consolidates the calculations to one place & ensures that ISA bus
addresses do indeed match physical addresses.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20047/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
2018-07-30 10:27:28 -07:00
Paul Burton 0d0e14770d
MIPS: Make (UN)CAC_ADDR() PHYS_OFFSET-agnostic
Converting an address between cached & uncached (typically addresses in
(c)kseg0 & (c)kseg1 or 2 xkphys regions) should not depend upon
PHYS_OFFSET in any way - we're converting from a virtual address in one
unmapped region to a virtual address in another unmapped region.

For some reason our CAC_ADDR() & UNCAC_ADDR() macros make use of
PAGE_OFFSET, which typically includes PHYS_OFFSET. This means that
platforms with a non-zero PHYS_OFFSET typically have to workaround
miscalculation by these 2 macros by also defining UNCAC_BASE to a value
that isn't really correct.

It appears that an attempt has previously been made to address this with
commit 3f4579252aa1 ("MIPS: make CAC_ADDR and UNCAC_ADDR account for
PHYS_OFFSET") which was later undone by commit ed3ce16c3d ("Revert
"MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"") which
also introduced the ar7 workaround. That attempt at a fix was roughly
equivalent, but essentially caused the CAC_ADDR() & UNCAC_ADDR() macros
to cancel out PHYS_OFFSET by adding & then subtracting it again. In his
revert Leonid is correct that using PHYS_OFFSET makes no sense in the
context of these macros, but appears to have missed its inclusion via
PAGE_OFFSET which means PHYS_OFFSET actually had an effect after the
revert rather than before it.

Here we fix this by modifying CAC_ADDR() & UNCAC_ADDR() to stop using
PAGE_OFFSET (& thus PHYS_OFFSET), instead using __pa() & __va() along
with UNCAC_BASE.

For UNCAC_ADDR(), __pa() will convert a cached address to a physical
address which we can simply use as an offset from UNCAC_BASE to obtain
an address in the uncached region.

For CAC_ADDR() we can undo the effect of UNCAC_ADDR() by subtracting
UNCAC_BASE and using __va() on the result.

With this change made, remove definitions of UNCAC_BASE from the ar7 &
pic32 platforms which appear to have defined them only to workaround
this problem.

Signed-off-by: Paul Burton <paul.burton@mips.com>
References: 3f4579252aa1 ("MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET")
References: ed3ce16c3d ("Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"")
Patchwork: https://patchwork.linux-mips.org/patch/20046/
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
2018-07-30 10:27:20 -07:00
Dave Kleikamp 140aada48b arm64: kexec: machine_kexec should call __flush_icache_range
machine_kexec flushes the reboot_code_buffer from the icache
after stopping the other cpus.

Commit 3b8c9f1cdf ("arm64: IPI each CPU after invalidating the I-cache
for kernel mappings") added an IPI call to flush_icache_range, which
causes a hang here, so replace the call with __flush_icache_range

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-30 17:58:11 +01:00
Ofer Levi 05b466bf84 ARC: [plat-eznps] Add missing struct nps_host_reg_aux_dpc
Fixing compilation issue caused by missing struct nps_host_reg_aux_dpc
definition.

Fixes: 3f9cd874dc ("ARC: [plat-eznps] avoid toggling of DPC register")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ofer Levi <oferle@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-07-30 09:47:53 -07:00
Eugeniy Paltsev 386177da9e ARC: add SMP_CACHE_BYTES value validate
Check that SMP_CACHE_BYTES (and hence ARCH_DMA_MINALIGN) is larger
or equal to any cache line length by comparing it with values
previously read from ARC cache BCR registers.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-07-30 09:46:19 -07:00
Will Deacon efd112353b arm64: svc: Ensure hardirq tracing is updated before return
We always run userspace with interrupts enabled, but with the recent
conversion of the syscall entry/exit code to C, we don't inform the
hardirq tracing code that interrupts are about to become enabled by
virtue of restoring the EL0 SPSR.

This patch ensures that trace_hardirqs_on() is called on the syscall
return path when we return to the assembly code with interrupts still
disabled.

Fixes: f37099b699 ("arm64: convert syscall trace logic to C")
Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-30 17:43:39 +01:00
Janosch Frank 57cb198cfd KVM: s390: Beautify skey enable check
Let's introduce an explicit check if skeys have already been enabled
for the vcpu, so we don't have to check the mm context if we don't have
the storage key facility.

This lets us check for enablement without having to take the mm
semaphore and thus speedup skey emulation.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-07-30 17:05:52 +02:00
Alexey Spirkov f7e2a15223 powerpc/44x: Mark mmu_init_secondary() as __init
mmu_init_secondary() calls ppc44x_pin_tlb() which is marked __init,
leading to a warning:

  The function mmu_init_secondary() references
  the function __init ppc44x_pin_tlb().

There's no CPU hotplug support on 44x so mmu_init_secondary() will
only be called at boot. Therefore we should mark it as __init.

Signed-off-by: Alexey Spirkov <alexeis@astrosoft.ru>
[mpe: Flesh out change log details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:22 +10:00
Michael Ellerman a984506c54 powerpc/mm: Don't report PUDs as memory leaks when using kmemleak
Paul Menzel reported that kmemleak was producing reports such as:

  unreferenced object 0xc0000000f8b80000 (size 16384):
    comm "init", pid 1, jiffies 4294937416 (age 312.240s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<00000000d997deb7>] __pud_alloc+0x80/0x190
      [<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0
      [<00000000091e51c2>] shift_arg_pages+0xc0/0x210
      [<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0
      [<0000000060871529>] load_elf_binary+0x41c/0x1648
      [<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280
      [<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
      [<000000005f953a6e>] sys_execve+0x58/0x70
      [<000000009700a858>] system_call+0x5c/0x70

Indicating that a PUD was being leaked.

However what's really happening is that kmemleak is not able to
recognise the references from the PGD to the PUD, because they are not
fully qualified pointers.

We can confirm that in xmon, eg:

Find the task struct for pid 1 "init":
  0:mon> P
       task_struct     ->thread.ksp    PID   PPID S  P CMD
  c0000001fe7c0000 c0000001fe803960      1      0 S 13 systemd

Dump virtual address 0 to find the PGD:
  0:mon> dv 0 c0000001fe7c0000
  pgd  @ 0xc0000000f8b01000

Dump the memory of the PGD:
  0:mon> d c0000000f8b01000
  c0000000f8b01000 00000000f8b90000 0000000000000000  |................|
  c0000000f8b01010 0000000000000000 0000000000000000  |................|
  c0000000f8b01020 0000000000000000 0000000000000000  |................|
  c0000000f8b01030 0000000000000000 00000000f8b80000  |................|
                                    ^^^^^^^^^^^^^^^^

There we can see the reference to our supposedly leaked PUD. But
because it's missing the leading 0xc, kmemleak won't recognise it.

We can confirm it's still in use by translating an address that is
mapped via it:
  0:mon> dv 7fff94000000 c0000001fe7c0000
  pgd  @ 0xc0000000f8b01000
  pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <--
  pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000
  pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000
  ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386
  Maps physical address = 0x00000001d5ce0000
  Flags = Accessed Dirty Read Write

The fix is fairly simple. We need to tell kmemleak to ignore PUD
allocations and never report them as leaks. We can also tell it not to
scan the PGD, because it will never find pointers in there. However it
will still notice if we allocate a PGD and then leak it.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:21 +10:00
Christophe Leroy 405cb4024e powerpc: split asm/tlbflush.h
Split asm/tlbflush.h into:
asm/nohash/tlbflush.h
asm/book3s/32/tlbflush.h
asm/book3s/64/tlbflush.h (already existing)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:21 +10:00
Christophe Leroy 45ef5992e0 powerpc: remove unnecessary inclusion of asm/tlbflush.h
asm/tlbflush.h is only needed for:
- using functions xxx_flush_tlb_xxx()
- using MMU_NO_CONTEXT
- including asm-generic/pgtable.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:20 +10:00
Christophe Leroy 7bc396958c powerpc/44x: remove page.h from mmu-44x.h
mmu-44x.h doesn't need asm/page.h if PAGE_SHIFT are replaced by CONFIG_PPC_XX_PAGES

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:20 +10:00
Christophe Leroy 0c295d0e9c powerpc/nohash: fix hash related comments in pgtable.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Christophe Leroy 62b8426578 powerpc: fix includes in asm/processor.h
Remove superflous includes and add missing ones

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Christophe Leroy 6b62266911 powerpc/book3s: Remove PPC_PIN_SIZE
PPC_PIN_SIZE is specific to the 44x and is defined in mmu.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Christophe Leroy b5ac51d747 powerpc: declare set_breakpoint() static
set_breakpoint() is only used in process.c so make it static

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:18 +10:00
Christophe Leroy e8cb7a55eb powerpc: remove superflous inclusions of asm/fixmap.h
Files not using fixmap consts or functions don't need asm/fixmap.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:18 +10:00
Christophe Leroy 2c86cd188f powerpc: clean inclusions of asm/feature-fixups.h
files not using feature fixup don't need asm/feature-fixups.h
files using feature fixup need asm/feature-fixups.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy 5c35a02c54 powerpc: clean the inclusion of stringify.h
Only include linux/stringify.h is files using __stringify()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy ec0c464cdb powerpc: move ASM_CONST and stringify_in_c() into asm-const.h
This patch moves ASM_CONST() and stringify_in_c() into
dedicated asm-const.h, then cleans all related inclusions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:16 +10:00
Christophe Leroy 36a7eeaff7 powerpc/405: move PPC405_ERR77 in asm-405.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:13 +10:00
Christophe Leroy 8c58259bba powerpc: remove unneeded inclusions of cpu_has_feature.h
Files not using cpu_has_feature() don't need cpu_has_feature.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:47:54 +10:00
Christophe Leroy db0a2b633d powerpc: remove kdump.h from page.h
page.h doesn't need kdump.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:47:53 +10:00
Joerg Roedel ca38dc8f27 x86/kexec: Allocate 8k PGDs for PTI
Fuzzing the PTI-x86-32 code with trinity showed unhandled
kernel paging request oops-messages that looked a lot like
silent data corruption.

Lot's of debugging and testing lead to the kexec-32bit code,
which is still allocating 4k PGDs when PTI is enabled. But
since it uses native_set_pud() to build the page-table, it
will unevitably call into __pti_set_user_pgtbl(), which
writes beyond the allocated 4k page.

Use PGD_ALLOCATION_ORDER to allocate PGDs in the kexec code
to fix the issue.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David H. Gutteridge <dhgutteridge@sympatico.ca>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1532533683-5988-4-git-send-email-joro@8bytes.org
2018-07-30 13:53:48 +02:00
Joerg Roedel 6863ea0cda x86/mm: Remove in_nmi() warning from vmalloc_fault()
It is perfectly okay to take page-faults, especially on the
vmalloc area while executing an NMI handler. Remove the
warning.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David H. Gutteridge <dhgutteridge@sympatico.ca>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1532533683-5988-2-git-send-email-joro@8bytes.org
2018-07-30 13:53:48 +02:00
Nicolas Pitre 001a30c4d0 ARM: 8785/1: use compiler built-ins for ffs and fls
On ARMv5 and above, it is beneficial to use compiler built-ins such as
__builtin_ffs() and __builtin_ctzl() to implement ffs(), __ffs(), fls()
and __fls(). The compiler does inline the clz instruction and even the
rbit instruction when available, or provide a constant value when
possible. On ARMv4 the compiler calls out to helper functions for those
built-ins so it is best to keep the open coded versions in that case.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30 11:45:53 +01:00
Vladimir Murzin cbfc5619e0 ARM: 8784/1: NOMMU: Allow enter in Hyp mode
ARMv8R adds support for virtualisation extension (with some deviation
from v8A). With this patch hyp-unaware boot code can offload to kernel
setting up HYP stuff in a sane state.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30 11:45:52 +01:00
Vladimir Murzin c803ce3f18 ARM: 8783/1: NOMMU: Extend check for VBAR support
ARMv8R adds support for VBAR and updates ID_PFR1 with the new filed
Sec_frac (bits [23:20]):

Security fractional field. When the Security field is 0000, determines
the support for features from the ARMv7 Security Extensions. Permitted
values are:

0000 No features from the ARMv7 Security Extensions are implemented.
     This value is not supported in ARMv8 if ID_PFR1 bits [7:4] are zero.

0001 The implementation includes the VBAR, and the TCR.PD0 and TCR.PD1
     bits.

0010 As for 0001, plus the ability to access Secure or Non-secure
     physical memory is supported.

All other values are reserved.

This field is only valid when ID_PFR1[7:4] == 0, otherwise it holds
the value 0000.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30 11:45:51 +01:00
Masahiro Yamada 35f5a6acfb ARM: 8782/1: vfp: clean up arch/arm/vfp/Makefile
Since commit 799c434154 ("kbuild: thin archives make default for
all archs"), $(AR) is used instead of $(LD) to combine object files.

The following code in arch/arm/vfp/Makefile:

  LDFLAGS         +=--no-warn-mismatch

... is no longer used.

Also, arch/arm/Makefile already guards arch/arm/vfp/ by a boolean
symbol, CONFIG_VFP, like this:

  core-$(CONFIG_VFP)              += arch/arm/vfp/

So, $(CONFIG_VFP) is always evaluated to y in arch/arm/vfp/Makefile.
There is no point to use pseudo object, vfp.o, which never becomes
a module.  Add all objects to obj-y directly.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30 11:45:50 +01:00
Vincent Whitchurch afc9f65e01 ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
When building the kernel as Thumb-2 with binutils 2.29 or newer, if the
assembler has seen the .type directive (via ENDPROC()) for a symbol, it
automatically handles the setting of the lowest bit when the symbol is
used with ADR.  The badr macro on the other hand handles this lowest bit
manually.  This leads to a jump to a wrong address in the wrong state
in the syscall return path:

 Internal error: Oops - undefined instruction: 0 [#2] SMP THUMB2
 Modules linked in:
 CPU: 0 PID: 652 Comm: modprobe Tainted: G      D           4.18.0-rc3+ #8
 PC is at ret_fast_syscall+0x4/0x62
 LR is at sys_brk+0x109/0x128
 pc : [<80101004>]    lr : [<801c8a35>]    psr: 60000013
 Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
 Control: 50c5387d  Table: 9e82006a  DAC: 00000051
 Process modprobe (pid: 652, stack limit = 0x(ptrval))

 80101000 <ret_fast_syscall>:
 80101000:       b672            cpsid   i
 80101002:       f8d9 2008       ldr.w   r2, [r9, #8]
 80101006:       f1b2 4ffe       cmp.w   r2, #2130706432 ; 0x7f000000

 80101184 <local_restart>:
 80101184:       f8d9 a000       ldr.w   sl, [r9]
 80101188:       e92d 0030       stmdb   sp!, {r4, r5}
 8010118c:       f01a 0ff0       tst.w   sl, #240        ; 0xf0
 80101190:       d117            bne.n   801011c2 <__sys_trace>
 80101192:       46ba            mov     sl, r7
 80101194:       f5ba 7fc8       cmp.w   sl, #400        ; 0x190
 80101198:       bf28            it      cs
 8010119a:       f04f 0a00       movcs.w sl, #0
 8010119e:       f3af 8014       nop.w   {20}
 801011a2:       f2af 1ea2       subw    lr, pc, #418    ; 0x1a2

To fix this, add a new symbol name which doesn't have ENDPROC used on it
and use that with badr.  We can't remove the badr usage since that would
would cause breakage with older binutils.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30 11:45:19 +01:00
Janosch Frank bd096f6443 KVM: s390: Add skey emulation fault handling
When doing skey emulation for huge guests, we now need to fault in
pmds, as we don't have PGSTES anymore to store them when we do not
have valid table entries.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-07-30 11:20:18 +01:00
Janosch Frank 637ff9efe5 s390/mm: Add huge pmd storage key handling
Storage keys for guests with huge page mappings have to be managed in
hardware. There are no PGSTEs for PMDs that we could use to retain the
guests's logical view of the key.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 11:20:18 +01:00
Janosch Frank 3afdfca698 s390/mm: Clear skeys for newly mapped huge guest pmds
Similarly to the pte skey handling, where we set the storage key to
the default key for each newly mapped pte, we have to also do that for
huge pmds.

With the PG_arch_1 flag we keep track if the area has already been
cleared of its skeys.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-30 11:20:18 +01:00
Dominik Dingel 964c2c05c9 s390/mm: Clear huge page storage keys on enable_skey
When a guest starts using storage keys, we trap and set a default one
for its whole valid address space. With this patch we are now able to
do that for large pages.

To speed up the storage key insertion, we use
__storage_key_init_range, which in-turn will use sske_frame to set
multiple storage keys with one instruction. As it has been previously
used for debuging we have to get rid of the default key check and make
it quiescing.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
[replaced page_set_storage_key loop with __storage_key_init_range]
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 11:20:18 +01:00
Janosch Frank 0959e16867 s390/mm: Add huge page dirty sync support
To do dirty loging with huge pages, we protect huge pmds in the
gmap. When they are written to, we unprotect them and mark them dirty.

We introduce the function gmap_test_and_clear_dirty_pmd which handles
dirty sync for huge pages.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
2018-07-30 11:20:18 +01:00
Janosch Frank 6a3762778d s390/mm: Add gmap pmd invalidation and clearing
If the host invalidates a pmd, we also have to invalidate the
corresponding gmap pmds, as well as flush them from the TLB. This is
necessary, as we don't share the pmd tables between host and guest as
we do with ptes.

The clearing part of these three new functions sets a guest pmd entry
to _SEGMENT_ENTRY_EMPTY, so the guest will fault on it and we will
re-link it.

Flushing the gmap is not necessary in the host's lazy local and csp
cases. Both purge the TLB completely.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
2018-07-30 11:20:17 +01:00
Janosch Frank 7c4b13a7c0 s390/mm: Add gmap pmd notification bit setting
Like for ptes, we also need invalidation notification for pmds, to
make sure the guest lowcore pages are always accessible and later
addition of shadowed pmds.

With PMDs we do not have PGSTEs or some other bits we could use in the
host PMD. Instead we pick one of the free bits in the gmap PMD. Every
time a host pmd will be invalidated, we will check if the respective
gmap PMD has the bit set and in that case fire up the notifier.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-07-30 11:20:17 +01:00
Janosch Frank 58b7e200d2 s390/mm: Add gmap pmd linking
Let's allow pmds to be linked into gmap for the upcoming s390 KVM huge
page support.

Before this patch we copied the full userspace pmd entry. This is not
correct, as it contains SW defined bits that might be interpreted
differently in the GMAP context. Now we only copy over all hardware
relevant information leaving out the software bits.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 11:20:17 +01:00
Janosch Frank 2c46e974dd s390/mm: Abstract gmap notify bit setting
Currently we use the software PGSTE bits PGSTE_IN_BIT and
PGSTE_VSIE_BIT to notify before an invalidation occurs on a prefix
page or a VSIE page respectively. Both bits are pgste specific, but
are used when protecting a memory range.

Let's introduce abstract GMAP_NOTIFY_* bits that will be realized into
the respective bits when gmap DAT table entries are protected.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 11:20:17 +01:00
Janosch Frank 5a045bb9c4 s390/mm: Make gmap_protect_range more modular
This patch reworks the gmap_protect_range logic and extracts the pte
handling into an own function. Also we do now walk to the pmd and make
it accessible in the function for later use. This way we can add huge
page handling logic more easily.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-30 11:20:17 +01:00
Masahiro Yamada 9fe37714c1 microblaze: delete wrong comment about machine_early_init
machine_early_init is defined in arch/microblaze/kernel/setup.c
I do not see mach-* directory for MicroBlaze.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2018-07-30 08:33:04 +02:00
Dave Airlie 3fce461827 BackMerge v4.18-rc7 into drm-next
rmk requested this for armada and I think we've had a few
conflicts build up.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-07-30 10:39:22 +10:00
Geert Uytterhoeven 58064e1f46 m68knommu: Fix typos in Coldfire 5272 DMA debug code
If DEBUG_DMA is defined:

    include/asm/dma.h: In function ‘set_dma_mode’:
    include/asm/dma.h:393: error: ‘dmabp’ undeclared (first use in this function)
    include/asm/dma.h:393: error: (Each undeclared identifier is reported only once
    include/asm/dma.h:393: error: for each function it appears in.)
    include/asm/dma.h: In function ‘set_dma_addr’:
    include/asm/dma.h:424: error: ‘dmawp’ undeclared (first use in this function)

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2018-07-30 09:15:01 +10:00
Geert Uytterhoeven eec85fa9d9 m68k: coldfire: Normalize clk API
Coldfire still provides its own variant of the clk API rather than using
the generic COMMON_CLK API.  This generally works, but it causes some
link errors with drivers using the clk_round_rate(), clk_set_rate(),
clk_set_parent(), or clk_get_parent() functions when a platform lacks
those interfaces.

This adds empty stub implementations for each of them, and I don't even
try to do something useful here but instead just print a WARN() message
to make it obvious what is going on if they ever end up being called.

The drivers that call these won't be used on these platforms (otherwise
we'd get a link error today), so the added code is harmless bloat and
will warn about accidental use.

Based on commit bd7fefe1f0 ("ARM: w90x900: normalize clk API").

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2018-07-30 09:15:01 +10:00
Geert Uytterhoeven 71a896687b m68k/defconfig: Update defconfigs for v4.18-rc6
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-07-29 10:55:53 +02:00
Mike Rapoport 1008a11590 m68k: switch to MEMBLOCK + NO_BOOTMEM
In m68k the physical memory is described by [memory_start, memory_end] for
!MMU variant and by m68k_memory array of memory ranges for the MMU version.
This information is directly use to register the physical memory with
memblock.

The reserve_bootmem() calls are replaced with memblock_reserve() and the
bootmap bitmap allocation is simply dropped.

Since the MMU variant creates early mappings only for the small part of the
memory we force bottom-up allocations in memblock.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-07-29 10:48:18 +02:00
Mike Rapoport 9e09221957 m68k/page_no.h: force __va argument to be unsigned long
Add explicit casting to unsigned long to the __va() parameter

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-07-29 10:48:18 +02:00
Mike Rapoport 384052e4ed m68k/bitops: convert __ffs to match generic declaration
The generic bitops declare __ffs as

	static inline unsigned long __ffs(unsigned long word);

Convert the m68k version to match the generic declaration.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-07-29 10:48:18 +02:00
Geert Uytterhoeven 781c4d6f5f m68k/io: Switch mmu variant to <asm-generic/io.h>
The dummy functions defined in <asm/io_mm.h> can be provided by
<asm-generic/io.h>.

As nommu already uses <asm-generic/io.h>, move its inclusion to
<asm/io.h>, and add/adjust include guards where appropriate.

This gets rid of lots of "statement with no effect" and "unused
variable" warnings when compile-testing.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
2018-07-29 10:48:18 +02:00
Geert Uytterhoeven ab4d391d27 m68k/io: Move mem*io define guards to <asm/kmap.h>
The mem*io define guards are applicable to all users of <asm/kmap.h>.
Hence move them, and drop the #ifdef.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
2018-07-29 10:48:18 +02:00
Geert Uytterhoeven e295066f66 m68k/io: Add missing ioremap define guards, fix typo
- Add missing define guard for ioremap_wt(),
  - Move ARCH_HAS_IOREMAP_WT from <asm/io_mm.h> to <asm/kmap.h>, as it
    is applicable to Coldfire with MMU, too,
  - Fix typo s/ioremap_fillcache/ioremap_fullcache/,
  - Add define guard for iounmap() for consistency with other
    architectures.

Fixes: 9746882f54 ("m68k: group io mapping definitions and functions")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
2018-07-29 10:48:18 +02:00
Arnd Bergmann d7de1c3af1 m68k: Remove unused set_clock_mmss() helpers
Commit 397ac99c6c ("m68k: remove dead timer code") removed set_rtc_mmss()
because it was unused in 2012. However, this was itself the only user of the
mach_set_clock_mmss() callback and the many implementations of that callback,
which are equally unused.

This removes all of those as well.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-07-29 10:48:18 +02:00
Arnd Bergmann 5b9bfb8ec4 m68k: mac: Use time64_t in RTC handling
The real-time clock on m68k (and powerpc) mac systems uses an unsigned
32-bit value starting in 1904, which overflows in 2040, about two years
later than everyone else, but this gets wrapped around in the Linux
code in 2038 already because of the deprecated usage of time_t and/or
long in the conversion.

Getting rid of the deprecated interfaces makes it work until 2040 as
documented, and it could be easily extended by reinterpreting
the resulting time64_t as a positive number. For the moment, I'm
adding a WARN_ON() that triggers if we encounter a time before 1970
or after 2040 (the two are indistinguishable).

This brings it in line with the corresponding code that we have on
powerpc macintosh.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[fthain: Adopt __u32 for the union in via_read_time(), consistent with
	 changes to via_write_time()]
[fthain: Use lower_32_bits() in via_write_time(), consistent with changes
	 to pmu_write_time() and cuda_write_time()]
[fthain: Have via_read_time() return a time64_t, consistent with changes
	 to pmu_read_time() and cuda_read_time()]
[fthain: Drop the pointless wraparound conditional in via_read_time()]
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
[geert: Drop WARN_ON(), as it is reported to trigger on powermac]
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-07-29 10:44:58 +02:00
Sunil Muthuswamy 9d9c965687 Drivers: hv: vmbus: Get rid of MSR access from vmbus_drv.c
Get rid of ISA specific code from vmus_drv.c which is common code.

Fixes: 81b18bce48 ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic")

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-29 08:09:56 +02:00
David S. Miller 958b4cd8fa Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-07-28

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) API fixes for libbpf's BTF mapping of map key/value types in order
   to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR()
   markings, from Martin.

2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached
   consumer pointer of the RX queue, from Björn.

3) Fix __xdp_return() to check for NULL pointer after the rhashtable
   lookup that retrieves the allocator object, from Taehee.

4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue
   by 4 bytes which got removed from overall stack usage, from Wang.

5) Fix bpf_skb_load_bytes_relative() length check to use actual
   packet length, from Daniel.

6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple()
   handler, from Thomas.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28 21:02:21 -07:00
Linus Torvalds 7648c44680 One more fix for 4.18:
- Revert an errata workaround for the BCM5300X platform that was
     merged for v4.18-rc2 but has been found to cause hangs on at least
     systems using the BCM4718A1.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW1zA0hUccGF1bC5idXJ0
 b25AbWlwcy5jb20ACgkQPqefrLV1AN2QyAEAxicOismTbPgSI+2oiaOjJbF5KjXi
 cPNCtDkjwkUafU4A/1jvJlXpw6UHRwFAr6fcfnPMcK6QyYiQ9NnSWz46QYkI
 =X4uV
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_4.18_5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fix from Paul Burton:
 "Here's one more MIPS fix, reverting an errata workaround that was
  merged for v4.18-rc2 but has since been found to cause system hangs on
  some BCM4718A1-based systems by the OpenWRT project"

* tag 'mips_fixes_4.18_5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  Revert "MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum"
2018-07-28 12:32:28 -07:00
Nicholas Mc Guire 28ec2238f3
MIPS: generic: fix missing of_node_put()
of_find_compatible_node() returns a device_node pointer with refcount
incremented and must be decremented explicitly.
 As this code is using the result only to check presence of the interrupt
controller (!NULL) but not actually using the result otherwise the
refcount can be decremented here immediately again.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19820/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
2018-07-27 19:48:57 -07:00
Nicholas Mc Guire b1259519e6
MIPS: Octeon: add missing of_node_put()
The call to of_find_node_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented here after the last
usage.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19558/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
2018-07-27 19:45:15 -07:00
Paul Burton 351fdddd36
MIPS: VDSO: Prevent use of smp_processor_id()
VDSO code should not be using smp_processor_id(), since it is executed
in user mode.
Introduce a VDSO-specific path which will cause a compile-time
or link-time error (depending upon support for __compiletime_error) if
the VDSO ever incorrectly attempts to use smp_processor_id().

[Matt Redfearn <matt.redfearn@imgtec.com>: Move before change to
smp_processor_id in series]

Signed-off-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/17932/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
2018-07-27 19:36:16 -07:00
Alban Bedel 24babe69d7
MIPS: ath79: Use the IRQ based GPIO key driver for the buttons
Now that the GPIO driver support interrupts we don't need to poll the
buttons.

Signed-off-by: Alban Bedel <albeu@free.fr>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/15283/
Cc: linux-mips@linux-mips.org
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Antony Pavlov <antonynpavlov@gmail.com>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
2018-07-27 19:29:53 -07:00
Masahiro Yamada 43fee2b238 kbuild: do not redirect the first prerequisite for filechk
Currently, filechk unconditionally opens the first prerequisite and
redirects it as the stdin of a filechk_* rule.  Hence, every target
using $(call filechk,...) must list something as the first prerequisite
even if it is unneeded.

'< $<' is actually unneeded in most cases.  Each rule can explicitly
adds it if necessary.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-07-28 10:34:10 +09:00
Masahiro Yamada 6b0709f5a5 ARM: at91: remove unused duplicated filechk_offsets
The filechk_offsets in arch/arm/mach-at91/Makefile is never
used because it is always overridden by the equivalent one in
scripts/Makefile.lib

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2018-07-28 10:34:06 +09:00