1
0
Fork 0
Commit Graph

5670 Commits (7e1c4e27928e5f87b9b1eaf06dc31773b2f1e7f1)

Author SHA1 Message Date
Mike Rapoport 7e1c4e2792 memblock: stop using implicit alignment to SMP_CACHE_BYTES
When a memblock allocation APIs are called with align = 0, the alignment
is implicitly set to SMP_CACHE_BYTES.

Implicit alignment is done deep in the memblock allocator and it can
come as a surprise.  Not that such an alignment would be wrong even
when used incorrectly but it is better to be explicit for the sake of
clarity and the prinicple of the least surprise.

Replace all such uses of memblock APIs with the 'align' parameter
explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
in the memblock internal allocation functions.

For the case when memblock APIs are used via helper functions, e.g.  like
iommu_arena_new_node() in Alpha, the helper functions were detected with
Coccinelle's help and then manually examined and updated where
appropriate.

The direct memblock APIs users were updated using the semantic patch below:

@@
expression size, min_addr, max_addr, nid;
@@
(
|
- memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
|
- memblock_alloc(size, 0)
+ memblock_alloc(size, SMP_CACHE_BYTES)
|
- memblock_alloc_raw(size, 0)
+ memblock_alloc_raw(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from(size, 0, min_addr)
+ memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_nopanic(size, 0)
+ memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low(size, 0)
+ memblock_alloc_low(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low_nopanic(size, 0)
+ memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from_nopanic(size, 0, min_addr)
+ memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_node(size, 0, nid)
+ memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
)

[mhocko@suse.com: changelog update]
[akpm@linux-foundation.org: coding-style fixes]
[rppt@linux.ibm.com: fix missed uses of implicit alignment]
  Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paul Burton <paul.burton@mips.com>	[MIPS]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Mike Rapoport 57c8a661d9 mm: remove include/linux/bootmem.h
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>

@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
  Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Mike Rapoport eb31d559f1 memblock: remove _virt from APIs returning virtual address
The conversion is done using

sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
	$(git grep -l memblock_virt_alloc)

Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:15 -07:00
Mike Rapoport 9a8dd708d5 memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*
Make it explicit that the caller gets a physical address rather than a
virtual one.

This will also allow using meblock_alloc prefix for memblock allocations
returning virtual address, which is done in the following patches.

The conversion is done using the following semantic patch:

@@
expression e1, e2, e3;
@@
(
- memblock_alloc(e1, e2)
+ memblock_phys_alloc(e1, e2)
|
- memblock_alloc_nid(e1, e2, e3)
+ memblock_phys_alloc_nid(e1, e2, e3)
|
- memblock_alloc_try_nid(e1, e2, e3)
+ memblock_phys_alloc_try_nid(e1, e2, e3)
)

Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:15 -07:00
Linus Torvalds 345671ea0f Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc things

 - ocfs2 updates

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits)
  hugetlbfs: dirty pages as they are added to pagecache
  mm: export add_swap_extent()
  mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
  tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE
  mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page()
  mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
  mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
  mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t
  mm/gup: cache dev_pagemap while pinning pages
  Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"
  mm: return zero_resv_unavail optimization
  mm: zero remaining unavailable struct pages
  tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option
  tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option
  tools/testing/selftests/vm/gup_benchmark.c: allow user specified file
  tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage
  mm/gup_benchmark.c: add additional pinning methods
  mm/gup_benchmark.c: time put_page()
  mm: don't raise MEMCG_OOM event due to failed high-order allocation
  mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock
  ...
2018-10-26 19:33:41 -07:00
Johannes Weiner 8508cf3ffa sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
There are several definitions of those functions/macros in places that
mess with fixed-point load averages.  Provide an official version.

[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:26:32 -07:00
Linus Torvalds 685f7e4f16 powerpc updates for 4.20
Notable changes:
 
  - A large series to rewrite our SLB miss handling, replacing a lot of fairly
    complicated asm with much fewer lines of C.
 
  - Following on from that, we now maintain a cache of SLB entries for each
    process and preload them on context switch. Leading to a 27% speedup for our
    context switch benchmark on Power9.
 
  - Improvements to our handling of SLB multi-hit errors. We now print more debug
    information when they occur, and try to continue running by flushing the SLB
    and reloading, rather than treating them as fatal.
 
  - Enable THP migration on 64-bit Book3S machines (eg. Power7/8/9).
 
  - Add support for physical memory up to 2PB in the linear mapping on 64-bit
    Book3S. We only support up to 512TB as regular system memory, otherwise the
    percpu allocator runs out of vmalloc space.
 
  - Add stack protector support for 32 and 64-bit, with a per-task canary.
 
  - Add support for PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP.
 
  - Support recognising "big cores" on Power9, where two SMT4 cores are presented
    to us as a single SMT8 core.
 
  - A large series to cleanup some of our ioremap handling and PTE flags.
 
  - Add a driver for the PAPR SCM (storage class memory) interface, allowing
    guests to operate on SCM devices (acked by Dan).
 
  - Changes to our ftrace code to handle very large kernels, where we need to use
    a trampoline to get to ftrace_caller().
 
 Many other smaller enhancements and cleanups.
 
 Thanks to:
   Alan Modra, Alistair Popple, Aneesh Kumar K.V, Anton Blanchard, Aravinda
   Prasad, Bartlomiej Zolnierkiewicz, Benjamin Herrenschmidt, Breno Leitao,
   Cédric Le Goater, Christophe Leroy, Christophe Lombard, Dan Carpenter, Daniel
   Axtens, Finn Thain, Gautham R. Shenoy, Gustavo Romero, Haren Myneni, Hari
   Bathini, Jia Hongtao, Joel Stanley, John Allen, Laurent Dufour, Madhavan
   Srinivasan, Mahesh Salgaonkar, Mark Hairgrove, Masahiro Yamada, Michael
   Bringmann, Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Nathan
   Fontenot, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran,
   Paul Mackerras, Petr Vorel, Rashmica Gupta, Reza Arbab, Rob Herring, Sam
   Bobroff, Samuel Mendoza-Jonas, Scott Wood, Stan Johnson, Stephen Rothwell,
   Stewart Smith, Suraj Jitindar Singh, Tyrel Datwyler, Vaibhav Jain, Vasant
   Hegde, YueHaibing, zhong jiang,
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb01vTAAoJEFHr6jzI4aWADsEP/jqL3+2qxs098ra80tpXCpXJ
 tgXCosEs4b35sGtyHeUWZZZfWXeisaPAIlP8zTx1n50HACZduDYRAl0Ew9XB7Xdw
 enDHRVccD21FsmHBOx/Ii1rVJlovWlj6EQCWHKeZmNjeRoFuClVZ7CYmf+mBifKR
 sw2Db2fKA/59wMTq2zIMy5pqYgqlAs4jTWS6uN5hKPoBmO/82ARnNG+qgLuloD3Z
 O8zSDM9QQ7PpuyDgTjO9SAo2YjmEfXlEG6cOCCejsU3DMctaEAK5PUZ+blsHYHBH
 BYZYKs/x4pcw0SO41GtTh0M2YqDYBVuBIpRw8lLZap97Xo9ucSkAm5WD3rGxk4CY
 YeZKEPUql6MHN3+DKl8mx2F0V+Et/tio2HNqc9KReR1tfoolZAbe+SFZHfgmc/Rq
 RD9nnG8KRd4K2K1BTqpkTmI1EtE7jPtPJPSV8gMGhgL/N5vPmH3mql/qyOtYx48E
 6/hPzWESgs16VRZ/opLh8VvjlY1HBDODQhehhhl+o23/Vb8qEgRf8Uqhq50rQW1H
 EeOqyyYQ90txSU31Sgy1kQkvOgIFAsBObWT1ZCJ3RbfGbB4/tdEAvZqTZRlXo2OY
 7P0Sqcw/9Le5eJkHIlLtBv0TF7y1OYemCbLgRQzFlcRP+UKtYyg8eFnFjqbPEEmP
 ulwhn/BfFVSgaYKQ503u
 =I0pj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - A large series to rewrite our SLB miss handling, replacing a lot of
     fairly complicated asm with much fewer lines of C.

   - Following on from that, we now maintain a cache of SLB entries for
     each process and preload them on context switch. Leading to a 27%
     speedup for our context switch benchmark on Power9.

   - Improvements to our handling of SLB multi-hit errors. We now print
     more debug information when they occur, and try to continue running
     by flushing the SLB and reloading, rather than treating them as
     fatal.

   - Enable THP migration on 64-bit Book3S machines (eg. Power7/8/9).

   - Add support for physical memory up to 2PB in the linear mapping on
     64-bit Book3S. We only support up to 512TB as regular system
     memory, otherwise the percpu allocator runs out of vmalloc space.

   - Add stack protector support for 32 and 64-bit, with a per-task
     canary.

   - Add support for PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP.

   - Support recognising "big cores" on Power9, where two SMT4 cores are
     presented to us as a single SMT8 core.

   - A large series to cleanup some of our ioremap handling and PTE
     flags.

   - Add a driver for the PAPR SCM (storage class memory) interface,
     allowing guests to operate on SCM devices (acked by Dan).

   - Changes to our ftrace code to handle very large kernels, where we
     need to use a trampoline to get to ftrace_caller().

  And many other smaller enhancements and cleanups.

  Thanks to: Alan Modra, Alistair Popple, Aneesh Kumar K.V, Anton
  Blanchard, Aravinda Prasad, Bartlomiej Zolnierkiewicz, Benjamin
  Herrenschmidt, Breno Leitao, Cédric Le Goater, Christophe Leroy,
  Christophe Lombard, Dan Carpenter, Daniel Axtens, Finn Thain, Gautham
  R. Shenoy, Gustavo Romero, Haren Myneni, Hari Bathini, Jia Hongtao,
  Joel Stanley, John Allen, Laurent Dufour, Madhavan Srinivasan, Mahesh
  Salgaonkar, Mark Hairgrove, Masahiro Yamada, Michael Bringmann,
  Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Nathan
  Fontenot, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver
  O'Halloran, Paul Mackerras, Petr Vorel, Rashmica Gupta, Reza Arbab,
  Rob Herring, Sam Bobroff, Samuel Mendoza-Jonas, Scott Wood, Stan
  Johnson, Stephen Rothwell, Stewart Smith, Suraj Jitindar Singh, Tyrel
  Datwyler, Vaibhav Jain, Vasant Hegde, YueHaibing, zhong jiang"

* tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (221 commits)
  Revert "selftests/powerpc: Fix out-of-tree build errors"
  powerpc/msi: Fix compile error on mpc83xx
  powerpc: Fix stack protector crashes on CPU hotplug
  powerpc/traps: restore recoverability of machine_check interrupts
  powerpc/64/module: REL32 relocation range check
  powerpc/64s/radix: Fix radix__flush_tlb_collapsed_pmd double flushing pmd
  selftests/powerpc: Add a test of wild bctr
  powerpc/mm: Fix page table dump to work on Radix
  powerpc/mm/radix: Display if mappings are exec or not
  powerpc/mm/radix: Simplify split mapping logic
  powerpc/mm/radix: Remove the retry in the split mapping logic
  powerpc/mm/radix: Fix small page at boundary when splitting
  powerpc/mm/radix: Fix overuse of small pages in splitting logic
  powerpc/mm/radix: Fix off-by-one in split mapping logic
  powerpc/ftrace: Handle large kernel configs
  powerpc/mm: Fix WARN_ON with THP NUMA migration
  selftests/powerpc: Fix out-of-tree build errors
  powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected
  powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64
  powerpc/time: isolate scaled cputime accounting in dedicated functions.
  ...
2018-10-26 14:36:21 -07:00
Linus Torvalds b27186abb3 Devicetree updates for 4.20:
- Sync dtc with upstream version v1.4.7-14-gc86da84d30e4
 
 - Work to get rid of direct accesses to struct device_node name and
   type pointers in preparation for removing them. New helpers for
   parsing DT cpu nodes and conversions to use the helpers. printk
   conversions to %pOFn for printing DT node names. Most went thru
   subystem trees, so this is the remainder.
 
 - Fixes to DT child node lookups to actually be restricted to child
   nodes instead of treewide.
 
 - Refactoring of dtb targets out of arch code. This makes the support
   more uniform and enables building all dtbs on c6x, microblaze, and
   powerpc.
 
 - Various DT binding updates for Renesas r8a7744 SoC
 
 - Vendor prefixes for Facebook, OLPC
 
 - Restructuring of some ARM binding docs moving some peripheral bindings
   out of board/SoC binding files
 
 - New "secure-chosen" binding for secure world settings on ARM
 
 - Dual licensing of 2 DT IRQ binding headers
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAlvTKWYQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw8J5EACMAnrTxWQmXfQXOZEVxztcFavH6LP8mh2e
 7FZIZ38jzHXXvl81tAg1nBhzFUU/qtvqW8NDCZ9OBxKvp6PFDNhWu241ZodSB1Kw
 MZWy2A9QC+qbHYCC+SB5gOT0+Py3v7LNCBa5/TxhbFd35THJM8X0FP7gmcCGX593
 9Ml1rqawT4mK5XmCpczT0cXxyC4TgVtpfDWZH2KgJTR/kwXVQlOQOGZ8a1y/wrt7
 8TLIe7Qy4SFRzjhwbSta1PUehyYfe4uTSsXIJ84kMvNMxinLXQtvd7t9TfsK8p/R
 WjYUneJskVjtxVrMQfdV4MxyFL1YEt2mYcr0PMKIWxMCgGDAZsHPoUZmjyh/PrCI
 uiZtEHn3fXpUZAV/xEHHNirJxYyQfHGiksAT+lPrUXYYLCcZ3ZmqiTEYhGoQAfH5
 CQPMuxA6yXxp6bov6zJwZSTZtkXciju8aQRhUhlxIfHTqezmGYeql/bnWd+InNuR
 upANLZBh6D2jTWzDyobconkCCLlVkSqDoqOx725mMl6hIcdH9d2jVX7hwRf077VI
 5i3CyPSJOkSOLSdB8bAPYfBoaDtH2bthxieUrkkSbIjbwHO1H6a2lxPeG/zah0a3
 ePMGhi7J84UM4VpJEi000cP+bhPumJtJrG7zxP7ldXdfAF436sQ6KRptlcpLpj5i
 IwMhUQNH+g==
 =335v
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull Devicetree updates from Rob Herring:
 "A bit bigger than normal as I've been busy this cycle.

  There's a few things with dependencies and a few things subsystem
  maintainers didn't pick up, so I'm taking them thru my tree.

  The fixes from Johan didn't get into linux-next, but they've been
  waiting for some time now and they are what's left of what subsystem
  maintainers didn't pick up.

  Summary:

   - Sync dtc with upstream version v1.4.7-14-gc86da84d30e4

   - Work to get rid of direct accesses to struct device_node name and
     type pointers in preparation for removing them. New helpers for
     parsing DT cpu nodes and conversions to use the helpers. printk
     conversions to %pOFn for printing DT node names. Most went thru
     subystem trees, so this is the remainder.

   - Fixes to DT child node lookups to actually be restricted to child
     nodes instead of treewide.

   - Refactoring of dtb targets out of arch code. This makes the support
     more uniform and enables building all dtbs on c6x, microblaze, and
     powerpc.

   - Various DT binding updates for Renesas r8a7744 SoC

   - Vendor prefixes for Facebook, OLPC

   - Restructuring of some ARM binding docs moving some peripheral
     bindings out of board/SoC binding files

   - New "secure-chosen" binding for secure world settings on ARM

   - Dual licensing of 2 DT IRQ binding headers"

* tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (78 commits)
  ARM: dt: relicense two DT binding IRQ headers
  power: supply: twl4030-charger: fix OF sibling-node lookup
  NFC: nfcmrvl_uart: fix OF child-node lookup
  net: stmmac: dwmac-sun8i: fix OF child-node lookup
  net: bcmgenet: fix OF child-node lookup
  drm/msm: fix OF child-node lookup
  drm/mediatek: fix OF sibling-node lookup
  of: Add missing exports of node name compare functions
  dt-bindings: Add OLPC vendor prefix
  dt-bindings: misc: bk4: Add device tree binding for Liebherr's BK4 SPI bus
  dt-bindings: thermal: samsung: Add SPDX license identifier
  dt-bindings: clock: samsung: Add SPDX license identifiers
  dt-bindings: timer: ostm: Add R7S9210 support
  dt-bindings: phy: rcar-gen2: Add r8a7744 support
  dt-bindings: can: rcar_can: Add r8a7744 support
  dt-bindings: timer: renesas, cmt: Document r8a7744 CMT support
  dt-bindings: watchdog: renesas-wdt: Document r8a7744 support
  dt-bindings: thermal: rcar: Add device tree support for r8a7744
  Documentation: dt: Add binding for /secure-chosen/stdout-path
  dt-bindings: arm: zte: Move sysctrl bindings to their own doc
  ...
2018-10-26 12:09:58 -07:00
Linus Torvalds ba9f6f8954 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo updates from Eric Biederman:
 "I have been slowly sorting out siginfo and this is the culmination of
  that work.

  The primary result is in several ways the signal infrastructure has
  been made less error prone. The code has been updated so that manually
  specifying SEND_SIG_FORCED is never necessary. The conversion to the
  new siginfo sending functions is now complete, which makes it
  difficult to send a signal without filling in the proper siginfo
  fields.

  At the tail end of the patchset comes the optimization of decreasing
  the size of struct siginfo in the kernel from 128 bytes to about 48
  bytes on 64bit. The fundamental observation that enables this is by
  definition none of the known ways to use struct siginfo uses the extra
  bytes.

  This comes at the cost of a small user space observable difference.
  For the rare case of siginfo being injected into the kernel only what
  can be copied into kernel_siginfo is delivered to the destination, the
  rest of the bytes are set to 0. For cases where the signal and the
  si_code are known this is safe, because we know those bytes are not
  used. For cases where the signal and si_code combination is unknown
  the bits that won't fit into struct kernel_siginfo are tested to
  verify they are zero, and the send fails if they are not.

  I made an extensive search through userspace code and I could not find
  anything that would break because of the above change. If it turns out
  I did break something it will take just the revert of a single change
  to restore kernel_siginfo to the same size as userspace siginfo.

  Testing did reveal dependencies on preferring the signo passed to
  sigqueueinfo over si->signo, so bit the bullet and added the
  complexity necessary to handle that case.

  Testing also revealed bad things can happen if a negative signal
  number is passed into the system calls. Something no sane application
  will do but something a malicious program or a fuzzer might do. So I
  have fixed the code that performs the bounds checks to ensure negative
  signal numbers are handled"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
  signal: Guard against negative signal numbers in copy_siginfo_from_user32
  signal: Guard against negative signal numbers in copy_siginfo_from_user
  signal: In sigqueueinfo prefer sig not si_signo
  signal: Use a smaller struct siginfo in the kernel
  signal: Distinguish between kernel_siginfo and siginfo
  signal: Introduce copy_siginfo_from_user and use it's return value
  signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
  signal: Fail sigqueueinfo if si_signo != sig
  signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
  signal/unicore32: Use force_sig_fault where appropriate
  signal/unicore32: Generate siginfo in ucs32_notify_die
  signal/unicore32: Use send_sig_fault where appropriate
  signal/arc: Use force_sig_fault where appropriate
  signal/arc: Push siginfo generation into unhandled_exception
  signal/ia64: Use force_sig_fault where appropriate
  signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
  signal/ia64: Use the generic force_sigsegv in setup_frame
  signal/arm/kvm: Use send_sig_mceerr
  signal/arm: Use send_sig_fault where appropriate
  signal/arm: Use force_sig_fault where appropriate
  ...
2018-10-24 11:22:39 +01:00
Aravinda Prasad c6c26fb55e powerpc/pseries: Export raw per-CPU VPA data via debugfs
This patch exports the raw per-CPU VPA data via debugfs.
A per-CPU file is created which exports the VPA data of
that CPU to help debug some of the VPA related issues or
to analyze the per-CPU VPA related statistics.

v3: Removed offline CPU check.

v2: Included offline CPU check and other review comments.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Michael Ellerman 23ad1a2700 powerpc: Add -Werror at arch/powerpc level
Back when I added -Werror in commit ba55bd7436 ("powerpc: Add
configurable -Werror for arch/powerpc") I did it by adding it to most
of the arch Makefiles.

At the time we excluded math-emu, because apparently it didn't build
cleanly. But that seems to have been fixed somewhere in the interim.

So move the -Werror addition to the top-level of the arch, this saves
us from repeating it in every Makefile and means we won't forget to
add it to any new sub-dirs.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 00:56:17 +11:00
Oliver O'Halloran b5beae5e22 powerpc/pseries: Add driver for PAPR SCM regions
Adds a driver that implements support for enabling and accessing PAPR
SCM regions. Unfortunately due to how the PAPR interface works we can't
use the existing of_pmem driver (yet) because:

 a) The guest is required to use the H_SCM_BIND_MEM h-call to add
    add the SCM region to it's physical address space, and
 b) There is currently no mechanism for relating a bare of_pmem region
    to the backing DIMM (or not-a-DIMM for our case).

Both of these are easily handled by rolling the functionality into a
seperate driver so here we are...

Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 00:56:17 +11:00
Oliver O'Halloran 4c5d87db49 powerpc/pseries: PAPR persistent memory support
This patch implements support for discovering storage class memory
devices at boot and for handling hotplug of new regions via RTAS
hotplug events.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Fix CONFIG_MEMORY_HOTPLUG=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 00:56:17 +11:00
Christophe Leroy 422123ccb9 powerpc/traps: fix machine check handlers to use pr_cont()
When printing the machine check cause, the cause appears on the
following line due to bad use of printk without \n:

[   33.663993] Machine check in kernel mode.
[   33.664011] Caused by (from SRR1=9032):
[   33.664036] Data access error at address c90c8000

This patch fixes it by using pr_cont() for the second part:

[  133.258131] Machine check in kernel mode.
[  133.258146] Caused by (from SRR1=9032): Data access error at address c90c8000

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 00:56:17 +11:00
Christophe Leroy 56f3c1413f powerpc/mm: properly set PAGE_KERNEL flags in ioremap()
Set PAGE_KERNEL directly in the caller and do not rely on a
hack adding PAGE_KERNEL flags when _PAGE_PRESENT is not set.

As already done for PPC64, use pgprot_cache() helpers instead of
_PAGE_XXX flags in PPC32 ioremap() derived functions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy aa91796ec4 powerpc: don't use ioremap_prot() nor __ioremap() unless really needed.
In many places, ioremap_prot() and __ioremap() can be replaced with
higher level functions like ioremap(), ioremap_coherent(),
ioremap_cache(), ioremap_wc() ...

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Michael Bringmann 65b9fdadfc powerpc/pseries/mobility: Extend start/stop topology update scope
The powerpc mobility code may receive RTAS requests to perform PRRN
(Platform Resource Reassignment Notification) topology changes at any
time, including during LPAR migration operations.

In some configurations where the affinity of CPUs or memory is being
changed on that platform, the PRRN requests may apply or refer to
outdated information prior to the complete update of the device-tree.

This patch changes the duration for which topology updates are
suppressed during LPAR migrations from just the rtas_ibm_suspend_me()
/ 'ibm,suspend-me' call(s) to cover the entire migration_store()
operation to allow all changes to the device-tree to be applied prior
to accepting and applying any PRRN requests.

For tracking purposes, pr_info notices are added to the functions
start_topology_update() and stop_topology_update() of 'numa.c'.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
YueHaibing b45e9d761b powerpc/pseries/memory-hotplug: Fix return value type of find_aa_index
The variable 'aa_index' is defined as an unsigned value in
update_lmb_associativity_index(), but find_aa_index() may return -1
when dlpar_clone_property() fails. So change find_aa_index() to return
a bool, which indicates whether 'aa_index' was found or not.

Fixes: c05a5a4096 ("powerpc/pseries: Dynamic add entires to associativity lookup array")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Nathan Fontenot nfont@linux.vnet.ibm.com>
[mpe: Tweak changelog, rename is_found to just found]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff fef7f90552 powerpc/eeh: Cleanup eeh_ops.wait_state()
The wait_state member of eeh_ops does not need to be platform
dependent; it's just logic around eeh_ops.get_state(). Therefore,
merge the two (slightly different!) platform versions into a new
function, eeh_wait_state() and remove the eeh_ops member.

While doing this, also correct:
* The wait logic, so that it never waits longer than max_wait.
* The wait logic, so that it never waits less than
  EEH_STATE_MIN_WAIT_TIME.
* One call site where the result is treated like a bit field before
  it's checked for negative error values.
* In pseries_eeh_get_state(), rename the "state" parameter to "delay"
  because that's what it is.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff e762bb891a powerpc/eeh: Cleanup eeh_pe_state_mark()
Currently, eeh_pe_state_mark() marks a PE (and it's children) with a
state and then performs additional processing if that state included
EEH_PE_ISOLATED.

The state parameter is always a constant at the call site, so
rearrange eeh_pe_state_mark() into two functions and just call the
appropriate one at each site.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff 80e65b0094 powerpc/eeh: Cleanup list_head field names
Instances of struct eeh_pe are placed in a tree structure using the
fields "child_list" and "child", so place these next to each other
in the definition.

The field "child" is a list entry, so remove the unnecessary and
misleading use of the list initializer, LIST_HEAD(), on it.

The eeh_dev struct contains two list entry fields, called "list" and
"rmv_list". Rename them to "entry" and "rmv_entry" and, as above, stop
initializing them with LIST_HEAD().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff bffc0176e7 powerpc/eeh: Cleanup EEH_POSTPONED_PROBE
Currently a flag, EEH_POSTPONED_PROBE, is used to prevent an incorrect
message "EEH: No capable adapters found" from being displayed during
the boot of powernv systems.

It is necessary because, on powernv, the call to eeh_probe_devices()
made from eeh_init() is too early and EEH can't yet be enabled. A
second call is made later from eeh_pnv_post_init(), which succeeds.

(On pseries, the first call succeeds because PCI devices are set up
early enough and no second call is made.)

This can be simplified by moving the early call to eeh_probe_devices()
from eeh_init() (where it's seen by both platforms) to
pSeries_final_fixup(), so that each platform only calls
eeh_probe_devices() once, at a point where it can succeed.
This is slightly later in the boot sequence, but but still early
enough and it is now in the same place in the sequence for both
platforms (the pcibios_fixup hook).

The display of the message can be cleaned up as well, by moving it
into eeh_probe_devices().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Bartlomiej Zolnierkiewicz 719736e1cc powerpc: remove redundant 'default n' from Kconfig-s
'default n' is the default value for any bool or tristate Kconfig
setting so there is no need to write it explicitly.

Also since commit f467c5640c ("kconfig: only write '# CONFIG_FOO
is not set' for visible symbols") the Kconfig behavior is the same
regardless of 'default n' being present or not:

    ...
    One side effect of (and the main motivation for) this change is making
    the following two definitions behave exactly the same:

        config FOO
                bool

        config FOO
                bool
                default n

    With this change, neither of these will generate a
    '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
    That might make it clearer to people that a bare 'default n' is
    redundant.
    ...

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Dan Carpenter 014704e6f5 powerpc: Fix signedness bug in update_flash_db()
The "count < sizeof(struct os_area_db)" comparison is type promoted to
size_t so negative values of "count" are treated as very high values
and we accidentally return success instead of a negative error code.

This doesn't really change runtime much but it fixes a static checker
warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Michael Ellerman 9b7e4d601b Merge branch 'fixes' into next
Merge our fixes branch. It has a few important fixes that are needed for
futher testing and also some commits that will conflict with content in
next.
2018-10-09 16:51:05 +11:00
Finn Thain 0792a2c8e0 macintosh: Use common code to access RTC
Now that the 68k Mac port has adopted the via-pmu driver, the same RTC
code can be shared between m68k and powerpc. Replace duplicated code in
arch/powerpc and arch/m68k with common RTC accessors for Cuda and PMU.

Drop the problematic WARN_ON which was introduced in commit 22db552b50
("powerpc/powermac: Fix rtc read/write functions").

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-08 22:53:10 +11:00
Mark Hairgrove f86ad3e019 powerpc/powernv/npu: Remove atsd_threshold debugfs setting
This threshold is no longer used now that all invalidates issue a single
ATSD to each active NPU.

Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04 16:55:54 +10:00
Mark Hairgrove 3689c37d23 powerpc/powernv/npu: Use size-based ATSD invalidates
Prior to this change only two types of ATSDs were issued to the NPU:
invalidates targeting a single page and invalidates targeting the whole
address space. The crossover point happened at the configurable
atsd_threshold which defaulted to 2M. Invalidates that size or smaller
would issue per-page invalidates for the whole range.

The NPU supports more invalidation sizes however: 64K, 2M, 1G, and all.
These invalidates target addresses aligned to their size. 2M is a common
invalidation size for GPU-enabled applications because that is a GPU
page size, so reducing the number of invalidates by 32x in that case is a
clear improvement.

ATSD latency is high in general so now we always issue a single invalidate
rather than multiple. This will over-invalidate in some cases, but for any
invalidation size over 2M it matches or improves the prior behavior.
There's also an improvement for single-page invalidates since the prior
version issued two invalidates for that case instead of one.

With this change all issued ATSDs now perform a flush, so the flush
parameter has been removed from all the helpers.

To show the benefit here are some performance numbers from a
microbenchmark which creates a 1G allocation then uses mprotect with
PROT_NONE to trigger invalidates in strides across the allocation.

One NPU (1 GPU):

         mprotect rate (GB/s)
Stride   Before      After      Speedup
64K         5.3        5.6           5%
1M         39.3       57.4          46%
2M         49.7       82.6          66%
4M        286.6      285.7           0%

Two NPUs (6 GPUs):

         mprotect rate (GB/s)
Stride   Before      After      Speedup
64K         6.5        7.4          13%
1M         33.4       67.9         103%
2M         38.7       93.1         141%
4M        356.7      354.6          -1%

Anything over 2M is roughly the same as before since both cases issue a
single ATSD.

Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04 16:55:53 +10:00
Mark Hairgrove 7ead15a144 powerpc/powernv/npu: Reduce eieio usage when issuing ATSD invalidates
There are two types of ATSDs issued to the NPU: invalidates targeting a
specific virtual address and invalidates targeting the whole address
space. In both cases prior to this change, the sequence was:

    for each NPU
        - Write the target address to the XTS_ATSD_AVA register
        - EIEIO
        - Write the launch value to issue the ATSD

First, a target address is not required when invalidating the whole
address space, so that write and the EIEIO have been removed. The AP
(size) field in the launch is not needed either.

Second, for per-address invalidates the above sequence is inefficient in
the common case of multiple NPUs because an EIEIO is issued per NPU. This
unnecessarily forces the launches of later ATSDs to be ordered with the
launches of earlier ones. The new sequence only issues a single EIEIO:

    for each NPU
        - Write the target address to the XTS_ATSD_AVA register
    EIEIO
    for each NPU
        - Write the launch value to issue the ATSD

Performance results were gathered using a microbenchmark which creates a
1G allocation then uses mprotect with PROT_NONE to trigger invalidates in
strides across the allocation.

With only a single NPU active (one GPU) the difference is in the noise for
both types of invalidates (+/-1%).

With two NPUs active (on a 6-GPU system) the effect is more noticeable:

         mprotect rate (GB/s)
Stride   Before      After      Speedup
64K         5.9        6.5          10%
1M         31.2       33.4           7%
2M         36.3       38.7           7%
4M        322.6      356.7          11%

Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04 16:55:52 +10:00
Michal Suchanek 8a03e81cb1 powerpc/64s: consolidate MCE counter increment.
The code in machine_check_exception excludes 64s hvmode when
incrementing the MCE counter only to call opal_machine_check to
increment it specifically for this case.

Remove the exclusion and special case.

Fixes: a43c159042 ("powerpc/pseries: Flush SLB contents on SLB MCE
		errors.")

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:06 +10:00
Breno Leitao 62dea077f5 powerpc/powernv: Mark function as __noreturn
There is a mismatch between function pnv_platform_error_reboot() definition
and declaration regarding function modifiers. In the declaration part, it
contains the function attribute __noreturn, while function definition
itself lacks it.

This was reported by sparse tool as an error:

  arch/powerpc/platforms/powernv/opal.c:538:6: error: symbol 'pnv_platform_error_reboot' redeclared with different type (originally declared at arch/powerpc/platforms/powernv/powernv.h:11) - different modifiers

I checked and the function is already being considered as being 'noreturn'
by the compiler, thus, I understand this patch does not change any code
being generated.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:05 +10:00
Christophe Leroy c3ff2a5193 powerpc/32: add stack protector support
This functionality was tentatively added in the past
(commit 6533b7c16e ("powerpc: Initial stack protector
(-fstack-protector) support")) but had to be reverted
(commit f2574030b0 ("powerpc: Revert the initial stack
protector support") because of GCC implementing it differently
whether it had been built with libc support or not.

Now, GCC offers the possibility to manually set the
stack-protector mode (global or tls) regardless of libc support.

This time, the patch selects HAVE_STACKPROTECTOR only if
-mstack-protector-guard=tls is supported by GCC.

On PPC32, as register r2 points to current task_struct at
all time, the stack_canary located inside task_struct can be
used directly by using the following GCC options:
-mstack-protector-guard=tls
-mstack-protector-guard-reg=r2
-mstack-protector-guard-offset=offsetof(struct task_struct, stack_canary))

The protector is disabled for prom_init and bootx_init as
it is too early to handle it properly.

 $ echo CORRUPT_STACK > /sys/kernel/debug/provoke-crash/DIRECT
[  134.943666] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: lkdtm_CORRUPT_STACK+0x64/0x64
[  134.943666]
[  134.955414] CPU: 0 PID: 283 Comm: sh Not tainted 4.18.0-s3k-dev-12143-ga3272be41209 #835
[  134.963380] Call Trace:
[  134.965860] [c6615d60] [c001f76c] panic+0x118/0x260 (unreliable)
[  134.971775] [c6615dc0] [c001f654] panic+0x0/0x260
[  134.976435] [c6615dd0] [c032c368] lkdtm_CORRUPT_STACK_STRONG+0x0/0x64
[  134.982769] [c6615e00] [ffffffff] 0xffffffff

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:03 +10:00
Naveen N. Rao 9258227e9d powerpc/pseries: Fix how we iterate over the DTL entries
When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we look up dtl_idx in
the lppaca to determine the number of entries in the buffer. Since
lppaca is in big endian, we need to do an endian conversion before using
this in our calculation to determine the number of entries in the
buffer. Without this, we do not iterate over the existing entries in the
DTL buffer properly.

Fixes: 7c105b63bd ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:02 +10:00
Naveen N. Rao db787af1b8 powerpc/pseries: Fix DTL buffer registration
When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we register the DTL
buffer for a cpu when the associated file under powerpc/dtl in debugfs
is opened. When doing so, we need to set the size of the buffer being
registered in the second u32 word of the buffer. This needs to be in big
endian, but we are not doing the conversion resulting in the below error
showing up in dmesg:

	dtl_start: DTL registration for cpu 0 (hw 0) failed with -4

Fix this in the obvious manner.

Fixes: 7c105b63bd ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:02 +10:00
Rob Herring b9ef7b4b86 powerpc: Convert to using %pOFn instead of device_node.name
In preparation to remove the node name pointer from struct device_node,
convert printf users to use the %pOFn format specifier.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:01 +10:00
Rob Herring c417596d24 powerpc/pseries: Use of_irq_get helper() in request_event_sources_irqs()
Instead of calling both of_irq_parse_one() and
irq_create_of_mapping(), call of_irq_get() instead which does
essentially the same thing. of_irq_get() also calls irq_find_host()
for deferred probe support, but this should be fine as
irq_create_of_mapping() also calls that internally. This gets us
closer to making the former 2 functions static.

In the process of simplifying request_event_sources_irqs(), combine
the the pr_err() and WARN_ON() calls to just a WARN().

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:01 +10:00
Rob Herring 8c8933eba0 powerpc/cell: Use irq_of_parse_and_map() helper
Instead of calling both of_irq_parse_one() and
irq_create_of_mapping(), call of_irq_parse_and_map() instead which
does the same thing. This gets us closer to making the former 2
functions static.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:00 +10:00
Aneesh Kumar K.V a0820ff334 powerpc/mm:book3s: Enable THP migration support
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:00 +10:00
Vaibhav Jain 8139046a5a powerpc/powernv: Make possible for user to force a full ipl cec reboot
Ever since fast reboot is enabled by default in opal,
opal_cec_reboot() will use fast-reset instead of full IPL to perform
system reboot. This leaves the user with no direct way to force a full
IPL reboot except changing an nvram setting that persistently disables
fast-reset for all subsequent reboots.

This patch provides a more direct way for the user to force a one-shot
full IPL reboot by passing the command line argument 'full' to the
reboot command. So the user will be able to tweak the reboot behavior
via:

  $ sudo reboot full	# Force a full ipl reboot skipping fast-reset

  or
  $ sudo reboot  	# default reboot path (usually fast-reset)

The reboot command passes the un-parsed command argument to the kernel
via the 'Reboot' syscall which is then passed on to the arch function
pnv_restart(). The patch updates pnv_restart() to handle this cmd-arg
and issues opal_cec_reboot2 with OPAL_REBOOT_FULL_IPL to force a full
IPL reset.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:39:45 +10:00
Rob Herring 38959a091e powerpc: 8xx: get cpu node with of_get_cpu_node
"device_type" use is deprecated for FDT though it has continued to be used
for nodes like cpu nodes. Use of_get_cpu_node() instead which works using
node names by default. This will allow the eventually removal of cpu
device_type properties.

Also, fix a leaked reference and add a missing of_node_put.

Cc: Vitaly Bordug <vitb@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
2018-09-28 14:25:58 -05:00
Rob Herring 84dbc69a2f powerpc: 4xx: get cpu node with of_get_cpu_node
"device_type" use is deprecated for FDT though it has continued to be used
for nodes like cpu nodes. Use of_get_cpu_node() instead which works using
node names by default. This will allow the eventually removal of cpu
device_type properties.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
2018-09-28 14:25:58 -05:00
Rob Herring a94fe36634 powerpc: use for_each_of_cpu_node iterator
Use the for_each_of_cpu_node iterator to iterate over cpu nodes. This
has the side effect of defaulting to iterating using "cpu" node names in
preference to the deprecated (for FDT) device_type == "cpu".

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
2018-09-28 14:25:58 -05:00
Eric W. Biederman f383d8b4ae signal/powerpc: Use force_sig_fault where appropriate
Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21 15:53:56 +02:00
Alexey Kardashevskiy 7233b8cab3 powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again)
mpe: This was fixed originally in commit d3d4ffaae4
("powerpc/powernv/ioda2: Reduce upper limit for DMA window size"), but
contrary to what the merge commit says was inadvertently lost by me in
commit ce57c6610c ("Merge branch 'topic/ppc-kvm' into next") which
brought in changes that moved the code to a new file. So reapply it to
the new file.

Original commit message follows:

We use PHB in mode1 which uses bit 59 to select a correct DMA window.
However there is mode2 which uses bits 59:55 and allows up to 32 DMA
windows per a PE.

Even though documentation does not clearly specify that, it seems that
the actual hardware does not support bits 59:55 even in mode1, in
other words we can create a window as big as 1<<58 but DMA simply
won't work.

This reduces the upper limit from 59 to 55 bits to let the userspace
know about the hardware limits.

Fixes: ce57c6610c ("Merge branch 'topic/ppc-kvm' into next")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-20 14:31:03 +10:00
Suraj Jitindar Singh 74422e2b19 powerpc/pseries: Remove VLA from lparcfg_write()
In lparcfg_write we hard code kbuf_sz and then use this as the variable
length of kbuf creating a variable length array. Since we're hard coding
the length anyway just define the array using this as the length and
remove the need for kbuf_sz, thus removing the variable length array.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19 22:08:12 +10:00
Nathan Fontenot fd12527a1d powerpc/pseries: Remove unneeded uses of dlpar work queue
There are three instances in which dlpar hotplug events are invoked;
handling a hotplug interrupt (in a kvm guest), handling a dlpar
request through sysfs, and updating LMB affinity when handling a
PRRN event. Only in the case of handling a hotplug interrupt do we
have to put the work on a workqueue, the other cases can handle the
dlpar request directly.

This patch exports the handle_dlpar_errorlog() function so that
dlpar hotplug events can be handled directly and updates the two
instances mentioned above to use the direct invocation.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19 22:08:12 +10:00
Nathan Fontenot 063b8b1251 powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR request
The updates to powerpc numa and memory hotplug code now use the
in-kernel LMB array instead of the device tree. This change allows the
pseries memory DLPAR code to only update the device tree once after
successfully handling a DLPAR request.

Prior to the in-kernel LMB array, the numa code looked up the affinity
for memory being added in the device tree, the code now looks this up
in the LMB array. This change means the memory hotplug code can just
update the affinity for an LMB in the LMB array instead of updating
the device tree.

This also provides a savings in kernel memory. When updating the
device tree old properties are never free'ed since there is no
usecount on properties. This behavior leads to a new copy of the
property being allocated every time a LMB is added or removed (i.e. a
request to add 100 LMBs creates 100 new copies of the property). With
this update only a single new property is created when a DLPAR request
completes successfully.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19 22:08:12 +10:00
Nicholas Piggin 2a056f58fd powerpc: consolidate -mno-sched-epilog into FTRACE flags
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19 22:01:56 +10:00
Mahesh Salgaonkar c6d15258cd powerpc/pseries: Dump the SLB contents on SLB MCE errors.
If we get a machine check exceptions due to SLB errors then dump the
current SLB contents which will be very much helpful in debugging the
root cause of SLB errors. Introduce an exclusive buffer per cpu to hold
faulty SLB entries. In real mode mce handler saves the old SLB contents
into this buffer accessible through paca and print it out later in virtual
mode.

With this patch the console will log SLB contents like below on SLB MCE
errors:

[  507.297236] SLB contents of cpu 0x1
[  507.297237] Last SLB entry inserted at slot 16
[  507.297238] 00 c000000008000000 400ea1b217000500
[  507.297239]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
[  507.297240] 01 d000000008000000 400d43642f000510
[  507.297242]   1T  ESID=   d00000  VSID=      d43642f LLP:110
[  507.297243] 11 f000000008000000 400a86c85f000500
[  507.297244]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
[  507.297245] 12 00007f0008000000 4008119624000d90
[  507.297246]   1T  ESID=       7f  VSID=      8119624 LLP:110
[  507.297247] 13 0000000018000000 00092885f5150d90
[  507.297247]  256M ESID=        1  VSID=   92885f5150 LLP:110
[  507.297248] 14 0000010008000000 4009e7cb50000d90
[  507.297249]   1T  ESID=        1  VSID=      9e7cb50 LLP:110
[  507.297250] 15 d000000008000000 400d43642f000510
[  507.297251]   1T  ESID=   d00000  VSID=      d43642f LLP:110
[  507.297252] 16 d000000008000000 400d43642f000510
[  507.297253]   1T  ESID=   d00000  VSID=      d43642f LLP:110
[  507.297253] ----------------------------------
[  507.297254] SLB cache ptr value = 3
[  507.297254] Valid SLB cache entries:
[  507.297255] 00 EA[0-35]=    7f000
[  507.297256] 01 EA[0-35]=        1
[  507.297257] 02 EA[0-35]=     1000
[  507.297257] Rest of SLB cache entries:
[  507.297258] 03 EA[0-35]=    7f000
[  507.297258] 04 EA[0-35]=        1
[  507.297259] 05 EA[0-35]=     1000
[  507.297260] 06 EA[0-35]=       12
[  507.297260] 07 EA[0-35]=    7f000

Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19 21:59:41 +10:00
Mahesh Salgaonkar 8f0b80561f powerpc/pseries: Display machine check error details.
Extract the MCE error details from RTAS extended log and display it to
console.

With this patch you should now see mce logs like below:

[  142.371818] Severe Machine check interrupt [Recovered]
[  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[  142.371822]   Initiator: CPU
[  142.371823]   Error type: SLB [Multihit]
[  142.371824]     Effective address: d00000000ca70000

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19 21:59:41 +10:00