1
0
Fork 0
Commit Graph

288 Commits (7e1c4e27928e5f87b9b1eaf06dc31773b2f1e7f1)

Author SHA1 Message Date
Mike Rapoport 7e1c4e2792 memblock: stop using implicit alignment to SMP_CACHE_BYTES
When a memblock allocation APIs are called with align = 0, the alignment
is implicitly set to SMP_CACHE_BYTES.

Implicit alignment is done deep in the memblock allocator and it can
come as a surprise.  Not that such an alignment would be wrong even
when used incorrectly but it is better to be explicit for the sake of
clarity and the prinicple of the least surprise.

Replace all such uses of memblock APIs with the 'align' parameter
explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
in the memblock internal allocation functions.

For the case when memblock APIs are used via helper functions, e.g.  like
iommu_arena_new_node() in Alpha, the helper functions were detected with
Coccinelle's help and then manually examined and updated where
appropriate.

The direct memblock APIs users were updated using the semantic patch below:

@@
expression size, min_addr, max_addr, nid;
@@
(
|
- memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
|
- memblock_alloc(size, 0)
+ memblock_alloc(size, SMP_CACHE_BYTES)
|
- memblock_alloc_raw(size, 0)
+ memblock_alloc_raw(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from(size, 0, min_addr)
+ memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_nopanic(size, 0)
+ memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low(size, 0)
+ memblock_alloc_low(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low_nopanic(size, 0)
+ memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from_nopanic(size, 0, min_addr)
+ memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_node(size, 0, nid)
+ memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
)

[mhocko@suse.com: changelog update]
[akpm@linux-foundation.org: coding-style fixes]
[rppt@linux.ibm.com: fix missed uses of implicit alignment]
  Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paul Burton <paul.burton@mips.com>	[MIPS]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Mike Rapoport 57c8a661d9 mm: remove include/linux/bootmem.h
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>

@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
  Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Mike Rapoport eb31d559f1 memblock: remove _virt from APIs returning virtual address
The conversion is done using

sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
	$(git grep -l memblock_virt_alloc)

Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:15 -07:00
Linus Torvalds aa5b1054ba powerpc fixes for 4.19 #2
- An implementation for the newly added hv_ops->flush() for the OPAL hvc
    console driver backends, I forgot to apply this after merging the hvc driver
    changes before the merge window.
 
  - Enable all PCI bridges at boot on powernv, to avoid races when multiple
    children of a bridge try to enable it simultaneously. This is a workaround
    until the PCI core can be enhanced to fix the races.
 
  - A fix to query PowerVM for the correct system topology at boot before
    initialising sched domains, seen in some configurations to cause broken
    scheduling etc.
 
  - A fix for pte_access_permitted() on "nohash" platforms.
 
  - Two commits to fix SIGBUS when using remap_pfn_range() seen on Power9 due to
    a workaround when using the nest MMU (GPUs, accelerators).
 
  - Another fix to the VFIO code used by KVM, the previous fix had some bugs
    which caused guests to not start in some configurations.
 
  - A handful of other minor fixes.
 
 Thanks to:
   Aneesh Kumar K.V, Benjamin Herrenschmidt, Christophe Leroy, Hari Bathini, Luke
   Dashjr, Mahesh Salgaonkar, Nicholas Piggin, Paul Mackerras, Srikar Dronamraju.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAlt//10THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLUGD/9y/MPrs+V2lNFhxP+l1jO4Ro0v8DPI
 vARjqq06WfppgQgSS1dvWfzLaMFbGe8wPRfL0T1xMOXCZg8Ts/HrgxVBVFYcv6/Q
 xBaU5bKztg6HKQbwwO+8B/gdTA3hO7yFVux6JGwsGO5Ebl8Q3UDOdbvYX6XTj3H8
 rdiicle9LaI7qodC8bxlBo1Be0YKEW0O/ag179sxXzozzvoPIyFpeX6FL1sAft++
 XlQS1MHu4hErSM2rbmyoFCm+SmyRt3CD0NTVmNd2cgw5XexPOBFlnsdgpaK1jJFc
 CYu1chP83E91ol1/8NAPkcmPWvP6MGyoqOl75RghooY2D1IJ2GtLKoz2Dvc53ay2
 ZlIpMyc2CYIa55Mj18tOV/NbGbh0Lf0Ta++BxqxbcCDt5fq0VxHkoDPqBgWh/tdp
 Po7oQc7U2VTKKC3wiLr//nSHpgtSTAWtucDt7oT7GdP8+EMxUZ8teBFIfTkwfuD2
 plroEmcYuRD3beI4FAG/iCp5POOCsnHLkKVDl7tyQPl3Yvu8hvyLY9gBS9RN4Unt
 /z94YFJtz7UD+VP7jDaPiQS26y0WEJfW8ml0tNxMZVBdksZLPIPN0/EU/04V0jWf
 GxynzKMhElxC67tK/Eb43EGiLEZAlbEnJxoOtiCnL1MK+OIJPjg6e7o6qlsmaa+Y
 zkXVpxLtkq0OoQ==
 =1AUa
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - An implementation for the newly added hv_ops->flush() for the OPAL
   hvc console driver backends, I forgot to apply this after merging the
   hvc driver changes before the merge window.

 - Enable all PCI bridges at boot on powernv, to avoid races when
   multiple children of a bridge try to enable it simultaneously. This
   is a workaround until the PCI core can be enhanced to fix the races.

 - A fix to query PowerVM for the correct system topology at boot before
   initialising sched domains, seen in some configurations to cause
   broken scheduling etc.

 - A fix for pte_access_permitted() on "nohash" platforms.

 - Two commits to fix SIGBUS when using remap_pfn_range() seen on Power9
   due to a workaround when using the nest MMU (GPUs, accelerators).

 - Another fix to the VFIO code used by KVM, the previous fix had some
   bugs which caused guests to not start in some configurations.

 - A handful of other minor fixes.

Thanks to: Aneesh Kumar K.V, Benjamin Herrenschmidt, Christophe Leroy,
Hari Bathini, Luke Dashjr, Mahesh Salgaonkar, Nicholas Piggin, Paul
Mackerras, Srikar Dronamraju.

* tag 'powerpc-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mce: Fix SLB rebolting during MCE recovery path.
  KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages
  powerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transition
  powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.
  powerpc/nohash: fix pte_access_permitted()
  powerpc/topology: Get topology for shared processors at boot
  powerpc64/ftrace: Include ftrace.h needed for enable/disable calls
  powerpc/powernv/pci: Work around races in PCI bridge enabling
  powerpc/fadump: cleanup crash memory ranges support
  powerpc/powernv: provide a console flush operation for opal hvc driver
  powerpc/traps: Avoid rate limit messages from show unhandled signals
  powerpc/64s: Fix PACA_IRQ_HARD_DIS accounting in idle_power4()
2018-08-24 09:34:23 -07:00
Benjamin Herrenschmidt db2173198b powerpc/powernv/pci: Work around races in PCI bridge enabling
The generic code is racy when multiple children of a PCI bridge try to
enable it simultaneously.

This leads to drivers trying to access a device through a
not-yet-enabled bridge, and this EEH errors under various
circumstances when using parallel driver probing.

There is work going on to fix that properly in the PCI core but it
will take some time.

x86 gets away with it because (outside of hotplug), the BIOS enables
all the bridges at boot time.

This patch does the same thing on powernv by enabling all bridges that
have child devices at boot time, thus avoiding subsequent races. It's
suitable for backporting to stable and distros, while the proper PCI
fix will probably be significantly more invasive.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-20 20:29:17 +10:00
Linus Torvalds 5e2d059b52 powerpc updates for 4.19
Notable changes:
 
  - A fix for a bug in our page table fragment allocator, where a page table page
    could be freed and reallocated for something else while still in use, leading
    to memory corruption etc. The fix reuses pt_mm in struct page (x86 only) for
    a powerpc only refcount.
 
  - Fixes to our pkey support. Several are user-visible changes, but bring us in
    to line with x86 behaviour and/or fix outright bugs. Thanks to Florian Weimer
    for reporting many of these.
 
  - A series to improve the hvc driver & related OPAL console code, which have
    been seen to cause hardlockups at times. The hvc driver changes in particular
    have been in linux-next for ~month.
 
  - Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.
 
  - Remove Power8 DD1 and Power9 DD1 support, neither chip should be in use
    anywhere other than as a paper weight.
 
  - An optimised memcmp implementation using Power7-or-later VMX instructions
 
  - Support for barrier_nospec on some NXP CPUs.
 
  - Support for flushing the count cache on context switch on some IBM CPUs
    (controlled by firmware), as a Spectre v2 mitigation.
 
  - A series to enhance the information we print on unhandled signals to bring it
    into line with other arches, including showing the offending VMA and dumping
    the instructions around the fault.
 
 Thanks to:
   Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Alexey
   Spirkov, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar,
   Arnd Bergmann, Bartosz Golaszewski, Benjamin Herrenschmidt, Bharat Bhushan,
   Bjoern Noetel, Boqun Feng, Breno Leitao, Bryant G. Ly, Camelia Groza,
   Christophe Leroy, Christoph Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt,
   Darren Stevens, Dave Young, David Gibson, Diana Craciun, Finn Thain, Florian
   Weimer, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
   Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel Stanley,
   Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh Salgaonkar, Markus
   Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues, Michael Hanselmann, Michael
   Neuling, Michael Schmitz, Mukesh Ojha, Murilo Opsfelder Araujo, Nicholas
   Piggin, Parth Y Shah, Paul Mackerras, Paul Menzel, Ram Pai, Randy Dunlap,
   Rashmica Gupta, Reza Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff,
   Scott Wood, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson,
   Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat Rao
   B, zhong jiang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAlt2O6cTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgC7hD/4+cj796Df7GsVsIMxzQm7SS9dklIdO
 JuKj2Nr5HRzTH59jWlXukLG9mfTNCFgFJB4gEpK1ArDOTcHTCI9RRsLZTZ/kum66
 7Pd+7T40dLYXB5uecuUs0vMXa2fI3syKh1VLzACSXv3Dh9BBIKQBwW/aD2eww4YI
 1fS5LnXZ2PSxfr6KNAC6ogZnuaiD0sHXOYrtGHq+S/TFC7+Z6ySa6+AnPS+hPVoo
 /rHDE1Khr66aj7uk+PP2IgUrCFj6Sbj6hTVlS/iAuwbMjUl9ty6712PmvX9x6wMZ
 13hJQI+g6Ci+lqLKqmqVUpXGSr6y4NJGPS/Hko4IivBTJApI+qV/tF2H9nxU+6X0
 0RqzsMHPHy13n2torA1gC7ttzOuXPI4hTvm6JWMSsfmfjTxLANJng3Dq3ejh6Bqw
 76EMowpDLexwpy7/glPpqNdsP4ySf2Qm8yq3mR7qpL4m3zJVRGs11x+s5DW8NKBL
 Fl5SqZvd01abH+sHwv6NLaLkEtayUyohxvyqu2RU3zu5M5vi7DhqstybTPjKPGu0
 icSPh7b2y10WpOUpC6lxpdi8Me8qH47mVc/trZ+SpgBrsuEmtJhGKszEnzRCOqos
 o2IhYHQv3lQv86kpaAFQlg/RO+Lv+Lo5qbJ209V+hfU5nYzXpEulZs4dx1fbA+ze
 fK8GEh+u0L4uJg==
 =PzRz
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - A fix for a bug in our page table fragment allocator, where a page
     table page could be freed and reallocated for something else while
     still in use, leading to memory corruption etc. The fix reuses
     pt_mm in struct page (x86 only) for a powerpc only refcount.

   - Fixes to our pkey support. Several are user-visible changes, but
     bring us in to line with x86 behaviour and/or fix outright bugs.
     Thanks to Florian Weimer for reporting many of these.

   - A series to improve the hvc driver & related OPAL console code,
     which have been seen to cause hardlockups at times. The hvc driver
     changes in particular have been in linux-next for ~month.

   - Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.

   - Remove Power8 DD1 and Power9 DD1 support, neither chip should be in
     use anywhere other than as a paper weight.

   - An optimised memcmp implementation using Power7-or-later VMX
     instructions

   - Support for barrier_nospec on some NXP CPUs.

   - Support for flushing the count cache on context switch on some IBM
     CPUs (controlled by firmware), as a Spectre v2 mitigation.

   - A series to enhance the information we print on unhandled signals
     to bring it into line with other arches, including showing the
     offending VMA and dumping the instructions around the fault.

  Thanks to: Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey
  Kardashevskiy, Alexey Spirkov, Alistair Popple, Andrew Donnellan,
  Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Bartosz Golaszewski,
  Benjamin Herrenschmidt, Bharat Bhushan, Bjoern Noetel, Boqun Feng,
  Breno Leitao, Bryant G. Ly, Camelia Groza, Christophe Leroy, Christoph
  Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt, Darren Stevens, Dave
  Young, David Gibson, Diana Craciun, Finn Thain, Florian Weimer,
  Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
  Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel
  Stanley, Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh
  Salgaonkar, Markus Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues,
  Michael Hanselmann, Michael Neuling, Michael Schmitz, Mukesh Ojha,
  Murilo Opsfelder Araujo, Nicholas Piggin, Parth Y Shah, Paul
  Mackerras, Paul Menzel, Ram Pai, Randy Dunlap, Rashmica Gupta, Reza
  Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff, Scott Wood,
  Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson, Thiago
  Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat
  Rao, zhong jiang"

* tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (234 commits)
  powerpc/mm/book3s/radix: Add mapping statistics
  powerpc/uaccess: Enable get_user(u64, *p) on 32-bit
  powerpc/mm/hash: Remove unnecessary do { } while(0) loop
  powerpc/64s: move machine check SLB flushing to mm/slb.c
  powerpc/powernv/idle: Fix build error
  powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range
  powerpc/mm: remove warning about ‘type’ being set
  powerpc/32: Include setup.h header file to fix warnings
  powerpc: Move `path` variable inside DEBUG_PROM
  powerpc/powermac: Make some functions static
  powerpc/powermac: Remove variable x that's never read
  cxl: remove a dead branch
  powerpc/powermac: Add missing include of header pmac.h
  powerpc/kexec: Use common error handling code in setup_new_fdt()
  powerpc/xmon: Add address lookup for percpu symbols
  powerpc/mm: remove huge_pte_offset_and_shift() prototype
  powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled
  powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
  powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements
  powerpc/fadump: handle crash memory ranges array index overflow
  ...
2018-08-17 11:32:50 -07:00
Hari Vyas 44bda4b7d2 PCI: Fix is_added/is_busmaster race condition
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.

When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.

is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.

A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.

Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master().  As these fields are not
handled atomically, that clears the is_added bit.

Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state.  This avoids the
race because is_added no longer shares a memory location with is_busmaster.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 11:27:54 -05:00
Michael Ellerman ce57c6610c Merge branch 'topic/ppc-kvm' into next
Merge in some commits we're sharing with the KVM tree.

I manually propagated the change from commit d3d4ffaae4
("powerpc/powernv/ioda2: Reduce upper limit for DMA window size") into
pci-ioda-tce.c.

Conflicts:
        arch/powerpc/include/asm/cputable.h
        arch/powerpc/platforms/powernv/pci-ioda.c
        arch/powerpc/platforms/powernv/pci.h
2018-07-19 14:37:57 +10:00
Alexey Kardashevskiy a68bd1267b powerpc/powernv/ioda: Allocate indirect TCE levels on demand
At the moment we allocate the entire TCE table, twice (hardware part and
userspace translation cache). This normally works as we normally have
contigous memory and the guest will map entire RAM for 64bit DMA.

However if we have sparse RAM (one example is a memory device), then
we will allocate TCEs which will never be used as the guest only maps
actual memory for DMA. If it is a single level TCE table, there is nothing
we can really do but if it a multilevel table, we can skip allocating
TCEs we know we won't need.

This adds ability to allocate only first level, saving memory.

This changes iommu_table::free() to avoid allocating of an extra level;
iommu_table::set() will do this when needed.

This adds @alloc parameter to iommu_table::exchange() to tell the callback
if it can allocate an extra level; the flag is set to "false" for
the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
H_TOO_HARD.

This still requires the entire table to be counted in mm::locked_vm.

To be conservative, this only does on-demand allocation when
the usespace cache table is requested which is the case of VFIO.

The example math for a system replicating a powernv setup with NVLink2
in a guest:
16GB RAM mapped at 0x0
128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000

the table to cover that all with 64K pages takes:
(((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB

If we allocate only necessary TCE levels, we will only need:
(((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect
levels).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:11 +10:00
Alexey Kardashevskiy 090bad39b2 powerpc/powernv: Add indirect levels to it_userspace
We want to support sparse memory and therefore huge chunks of DMA windows
do not need to be mapped. If a DMA window big enough to require 2 or more
indirect levels, and a DMA window is used to map all RAM (which is
a default case for 64bit window), we can actually save some memory by
not allocation TCE for regions which we are not going to map anyway.

The hardware tables alreary support indirect levels but we also keep
host-physical-to-userspace translation array which is allocated by
vmalloc() and is a flat array which might use quite some memory.

This converts it_userspace from vmalloc'ed array to a multi level table.

As the format becomes platform dependend, this replaces the direct access
to it_usespace with a iommu_table_ops::useraddrptr hook which returns
a pointer to the userspace copy of a TCE; future extension will return
NULL if the level was not allocated.

This should not change non-KVM handling of TCE tables and it_userspace
will not be allocated for non-KVM tables.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:10 +10:00
Alexey Kardashevskiy 191c22879f powerpc/powernv: Move TCE manupulation code to its own file
Right now we have allocation code in pci-ioda.c and traversing code in
pci.c, let's keep them toghether. However both files are big enough
already so let's move this business to a new file.

While we at it, move the code which links IOMMU table groups to
IOMMU tables as it is not specific to any PNV PHB model.

These puts exported symbols from the new file together.

This fixes several warnings from checkpatch.pl like this:
"WARNING: Prefer 'unsigned int' to bare use of 'unsigned'".

As this is almost cut-n-paste, there should be no behavioral change.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:07 +10:00
Alexey Kardashevskiy da2bb0da73 powerpc/powernv: Remove useless wrapper
This gets rid of a useless wrapper around
pnv_pci_ioda2_table_free_pages().

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:47:02 +10:00
Alexey Kardashevskiy 00c376fdd7 powerpc/powernv/ioda2: Add 256M IOMMU page size to the default POWER8 case
The sketchy bypass uses 256M pages so add this page size as well.

This should cause no behavioral change but will be used later.

Fixes: 477afd6ea6 "powerpc/ioda: Use ibm,supported-tce-sizes for IOMMU page size mask"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-04 22:41:09 +10:00
Alastair D'Silva 8bf6b91a51 Revert "powerpc/powernv: Add support for the cxl kernel api on the real phb"
Remove abandonned capi support for the Mellanox CX4.

This reverts commit 4361b03430.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:32 +10:00
Alastair D'Silva 0cfd7335d1 Revert "cxl: Add support for interrupts on the Mellanox CX4"
Remove abandonned capi support for the Mellanox CX4.

This reverts commit a2f67d5ee8.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:30 +10:00
Alexey Kardashevskiy d3d4ffaae4 powerpc/powernv/ioda2: Reduce upper limit for DMA window size
We use PHB in mode1 which uses bit 59 to select a correct DMA window.
However there is mode2 which uses bits 59:55 and allows up to 32 DMA
windows per a PE.

Even though documentation does not clearly specify that, it seems that
the actual hardware does not support bits 59:55 even in mode1, in other
words we can create a window as big as 1<<58 but DMA simply won't work.

This reduces the upper limit from 59 to 55 bits to let the userspace know
about the hardware limits.

Fixes: 7aafac11e3 "powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:29 +10:00
Alexey Kardashevskiy 98fd72fe82 powerpc/powernv/ioda2: Remove redundant free of TCE pages
When IODA2 creates a PE, it creates an IOMMU table with it_ops::free
set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages().

Since iommu_tce_table_put() calls it_ops::free when the last reference
to the table is released, explicit call to pnv_pci_ioda2_table_free_pages()
is not needed so let's remove it.

This should fix double free in the case of PCI hotuplug as
pnv_pci_ioda2_table_free_pages() does not reset neither
iommu_table::it_base nor ::it_size.

This was not exposed by SRIOV as it uses different code path via
pnv_pcibios_sriov_disable().

IODA1 does not inialize it_ops::free so it does not have this issue.

Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:25 +10:00
Michael Ellerman 001ff2ee13 powerpc/powernv: Use __raw_[rm_]writeq_be() in pci-ioda.c
This allows us to squash some sparse warnings and also avoids having
to do explicity endian conversions in the code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
2018-05-18 22:00:00 +10:00
Alexey Kardashevskiy 7ef73cd39b powerpc/ioda: Use ibm, supported-tce-sizes for IOMMU page size mask
At the moment we assume that IODA2 and newer PHBs can always do 4K/64K/16M
IOMMU pages, however this is not the case for POWER9 and now skiboot
advertises the supported sizes via the device so we use that instead
of hard coding the mask.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-14 23:10:33 +10:00
Alexey Kardashevskiy d41ce7b1bc powerpc/powernv/npu: Do not try invalidating 32bit table when 64bit table is enabled
GPUs and the corresponding NVLink bridges get different PEs as they
have separate translation validation entries (TVEs). We put these PEs
to the same IOMMU group so they cannot be passed through separately.
So the iommu_table_group_ops::set_window/unset_window for GPUs do set
tables to the NPU PEs as well which means that iommu_table's list of
attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked.
This list is used for TCE cache invalidation.

The problem is that NPU PE has just a single TVE and can be programmed
to point to 32bit or 64bit windows while GPU PE has two (as any other
PCI device). So we end up having an 32bit iommu_table struct linked to
both PEs even though only the 64bit TCE table cache can be invalidated
on NPU. And a relatively recent skiboot detects this and prints
errors.

This changes GPU's iommu_table_group_ops::set_window/unset_window to
make sure that NPU PE is only linked to the table actually used by the
hardware. If there are two tables used by an IOMMU group, the NPU PE
will use the last programmed one which with the current use scenarios
is expected to be a 64bit one.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:44:57 +11:00
Markus Elfring a0828cf57a powerpc: Use sizeof(*foo) rather than sizeof(struct foo)
It's slightly less error prone to use sizeof(*foo) rather than
specifying the type.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[mpe: Consolidate into one patch, rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-20 16:47:53 +11:00
Ingo Molnar ed7158bae4 treewide/trivial: Remove ';;$' typo noise
On lkml suggestions were made to split up such trivial typo fixes into per subsystem
patches:

  --- a/arch/x86/boot/compressed/eboot.c
  +++ b/arch/x86/boot/compressed/eboot.c
  @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
          struct efi_uga_draw_protocol *uga = NULL, *first_uga;
          efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
          unsigned long nr_ugas;
  -       u32 *handles = (u32 *)uga_handle;;
  +       u32 *handles = (u32 *)uga_handle;
          efi_status_t status = EFI_INVALID_PARAMETER;
          int i;

This patch is the result of the following script:

  $ sed -i 's/;;$/;/g' $(git grep -E ';;$'  | grep "\.[ch]:"  | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)

... followed by manual review to make sure it's all good.

Splitting this up is just crazy talk, let's get over with this and just do it.

Reported-by: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-22 10:59:33 +01:00
Linus Torvalds 03f51d4efa powerpc updates for 4.16
Highlights:
 
  - Enable support for memory protection keys aka "pkeys" on Power7/8/9 when
    using the hash table MMU.
 
  - Extend our interrupt soft masking to support masking PMU interrupts as well
    as "normal" interrupts, and then use that to implement local_t for a ~4x
    speedup vs the current atomics-based implementation.
 
  - A new driver "ocxl" for "Open Coherent Accelerator Processor Interface
    (OpenCAPI)" devices.
 
  - Support for new device tree properties on PowerVM to describe hotpluggable
    memory and devices.
 
  - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit VDSO.
 
  - Freescale updates from Scott:
      "Contains fixes for CPM GPIO and an FSL PCI erratum workaround, plus a
       minor cleanup patch."
 
 As well as quite a lot of other changes all over the place, and small fixes and
 cleanups as always.
 
 Thanks to:
   Alan Modra, Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andreas
   Schwab, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman
   Khandual, Anton Blanchard, Arnd Bergmann, Balbir Singh, Benjamin
   Herrenschmidt, Bhaktipriya Shridhar, Bryant G. Ly, Cédric Le Goater,
   Christophe Leroy, Christophe Lombard, Cyril Bur, David Gibson, Desnes A. Nunes
   do Rosario, Dmitry Torokhov, Frederic Barrat, Geert Uytterhoeven, Guilherme G.
   Piccoli, Gustavo A. R. Silva, Gustavo Romero, Ivan Mikhaylov, Joakim
   Tjernlund, Joe Perches, Josh Poimboeuf, Juan J. Alvarez, Julia Cartwright,
   Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre,
   Michael Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot,
   Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai,
   Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee, Simon Guo, Stewart
   Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann, Vaibhav Jain, Vasyl
   Gomonovych.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJadF6wExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYA2
 nBAAnguCEyAIYpc+ffE3WU9xJEWxa6bKuVufHcUFVntGiGD+igmMS+SHp4ay3Aos
 HcA4WFrpzNb2KZ++kmFWtAKWnMfCiW9xuYJNicjr7X5ZiVBEhLWN/mQCwBKs3p6L
 5+HhvytcdkKVbEcyVjEGvRL40AyxXNOI02o6Co9X8vanHsmWB4q0eWe4PHstZqlg
 6K6kazMp+NTvEFYwKNXDOvuHouKSL57l14SLROH7CpJkNTOQ9s+W59/LmnuCjRlu
 o70b7iWOAEbF9tvMma1ksDZVNj7mSyaymLYCyOXu4CkuuleJacZYJ9oQGNddoIbC
 wk7l93vPT/yze7DYg8x3uXpKcaDEvEepPuQ/ubz+UXFQWuJtl5ej6Cv+0eOmyZIs
 +bjWhGHKdNttnsiPlTRCX/gWD13RE1dB6xXJlfOJ7Oz9OnXXK8ZKc1NTREbQXRWM
 8tClAwf9upWpm86GHPVnyrgYbgZo5b1os4SoS8e3kESzakrQVQP7J376u2DtccRq
 2AGqjJ+tl5tYPnhm8zG1cNrpqHHpgkNGqLS7DvWRg3EPmEKVQcltN1b/0aKaAjHA
 aTRofjrVo+jJ4MX1uyEo59yNCEQPfjkmHRQdLwm+xjWTzEPfIMzpWyXm14tawDQf
 OjcAe90W/qQ18brw4z+2BI14J76XziOSX/QcunOn1u/sqaM=
 =3rYn
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights:

   - Enable support for memory protection keys aka "pkeys" on Power7/8/9
     when using the hash table MMU.

   - Extend our interrupt soft masking to support masking PMU interrupts
     as well as "normal" interrupts, and then use that to implement
     local_t for a ~4x speedup vs the current atomics-based
     implementation.

   - A new driver "ocxl" for "Open Coherent Accelerator Processor
     Interface (OpenCAPI)" devices.

   - Support for new device tree properties on PowerVM to describe
     hotpluggable memory and devices.

   - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit
     VDSO.

   - Freescale updates from Scott: fixes for CPM GPIO and an FSL PCI
     erratum workaround, plus a minor cleanup patch.

  As well as quite a lot of other changes all over the place, and small
  fixes and cleanups as always.

  Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy,
  Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V,
  Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann,
  Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G.
  Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur,
  David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic
  Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva,
  Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh
  Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal,
  Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael
  Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot,
  Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud,
  Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee,
  Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann,
  Vaibhav Jain, Vasyl Gomonovych"

* tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (199 commits)
  powerpc/mm/radix: Fix build error when RADIX_MMU=n
  macintosh/ams-input: Use true and false for boolean values
  macintosh: change some data types from int to bool
  powerpc/watchdog: Print the NIP in soft_nmi_interrupt()
  powerpc/watchdog: regs can't be null in soft_nmi_interrupt()
  powerpc/watchdog: Tweak watchdog printks
  powerpc/cell: Remove axonram driver
  rtc-opal: Fix handling of firmware error codes, prevent busy loops
  powerpc/mpc52xx_gpt: make use of raw_spinlock variants
  macintosh/adb: Properly mark continued kernel messages
  powerpc/pseries: Fix cpu hotplug crash with memoryless nodes
  powerpc/numa: Ensure nodes initialized for hotplug
  powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
  powerpc/kernel: Block interrupts when updating TIDR
  powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
  powerpc/mm/nohash: do not flush the entire mm when range is a single page
  powerpc/pseries: Add Initialization of VF Bars
  powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV
  powerpc/eeh: Add EEH notify resume sysfs
  powerpc/eeh: Add EEH operations to notify resume
  ...
2018-02-02 10:01:04 -08:00
Alexey Kardashevskiy 902bdc5745 powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
The pcidev value stored in pci_dn is only used for NPU/NPU2
initialization. We can easily drop the cached pointer and
use an ancient helper - pci_get_domain_bus_and_slot() instead in order
to reduce complexity.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27 20:39:01 +11:00
Andrew Donnellan 228c2f4103 powerpc/powernv: Set correct configuration space size for opencapi devices
The configuration space for opencapi devices doesn't have a PCI
Express capability, therefore confusing linux in thinking it's of an
old PCI type with a 256-byte configuration space size, instead of the
desired 4k. So add a PCI fixup to declare the correct size.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:56 +11:00
Frederic Barrat 7f2c39e91f powerpc/powernv: Introduce new PHB type for opencapi links
The NPU was already abstracted by opal as a virtual PHB for nvlink,
but it helps to be able to differentiate between a nvlink or opencapi
PHB, as it's not completely transparent to linux. In particular, PE
assignment differs and we'll also need the information in later
patches.

So rename existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a
new type PNV_PHB_NPU_OCAPI.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24 11:42:56 +11:00
Guilherme G. Piccoli 45baee1416 powerpc/powernv: Add ppc_pci_reset_phbs parameter to issue a PHB reset
During a kdump kernel boot in PowerPC, we request a reset of the PHBs
to the FW. It makes sense, since if we are booting a kdump kernel it
means we had some trouble before and we cannot rely in the adapters'
health; they could be in a bad state, hence the reset is needed.

But this reset is useful not only in kdump - there are situations,
specially when debugging drivers, that we could break an adapter in
a way it requires such reset. One can tell to just go ahead and
reboot the machine, but happens that many times doing kexec is much
faster, and so preferable than a full power cycle.

This patch adds the ppc_pci_reset_phbs parameter to perform such reset.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 05:48:35 +11:00
Alexey Kardashevskiy ae677ff02f powerpc/powernv/ioda: Finish removing explicit max window size check
9003a2498 removed checn from the DMA window pages allocator, however
the VFIO driver tests limits before doing so by calling
the get_table_size hook which was left behind; this fixes it.

Fixes: 9003a2498 "powerpc/powernv/ioda: Remove explicit max window size check"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21 01:12:53 +11:00
Christoph Hellwig 2d9d6f6c9e powerpc: rename dma_direct_ to dma_nommu_
We want to use the dma_direct_ namespace for a generic implementation,
so rename powerpc to the second best choice: dma_nommu_.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-10 16:41:14 +01:00
Bryant G. Ly 988fc3ba56 powerpc/pci: Separate SR-IOV Calls
SR-IOV can now be enabled for the powernv platform and pseries
platform. Therefore move the appropriate calls to machine dependent
code instead of relying on definition at compile time.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@us.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:35 +11:00
Joe Perches f2c2cbcc35 powerpc: Use pr_warn instead of pr_warning
At some point, pr_warning will be removed so all logging messages use
a consistent <prefix>_warn style.

Update arch/powerpc/

Miscellanea:

o Coalesce formats
o Realign arguments
o Use %s, __func__ instead of embedded function names
o Remove unnecessary line continuations

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Geoff Levand <geoff@infradead.org>
[mpe: Rebase due to some %pOF changes.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04 11:54:34 +11:00
Alexey Kardashevskiy 9003a24981 powerpc/powernv/ioda: Remove explicit max window size check
DMA windows can only have a size of power of two on IODA2 hardware and
using memory_hotplug_max() to determine the upper limit won't work
correcly if it returns not power of two value.

This removes the check as the platform code does this check in
pnv_pci_ioda2_setup_default_config() anyway; the other client is VFIO
and that thing checks against locked_vm limit which prevents the userspace
from locking too much memory.

It is expected to impact DPDK on machines with non-power-of-two RAM size,
mostly. KVM guests are less likely to be affected as usually guests get
less than half of hosts RAM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-07 23:28:25 +11:00
Alexey Kardashevskiy d6f934fd48 powerpc/powernv: Reserve a hole which appears after enabling IOV
In order to make generic IOV code work, the physical function IOV BAR
should start from offset of the first VF. Since M64 segments share
PE number space across PHB, and some PEs may be in use at the time
when IOV is enabled, the existing code shifts the IOV BAR to the index
of the first PE/VF. This creates a hole in IOMEM space which can be
potentially taken by some other device.

This reserves a temporary hole on a parent and releases it when IOV is
disabled; the temporary resources are stored in pci_dn to avoid
kmalloc/free.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 16:48:12 +11:00
Benjamin Herrenschmidt b9fde58db7 powerpc/powernv: Rework EEH initialization on powernv
Remove the post_init callback which is only used
by powernv, we can just call it explicitly from
the powernv code.

This partially kills the ability to "disable" eeh at
runtime via debugfs as this was calling that same
callback again, but this is both unused and broken
in several ways. If we want to revive it, we need
to create a dedicated enable/disable callback on the
backend that does the right thing.

Let the bulk of eeh initialize normally at
core_initcall() like it does on pseries by removing
the hack in eeh_init() that delays it.

Instead we make sure our eeh->probe cleanly bails
out of the PEs haven't been created yet and we force
a re-probe where we used to call eeh_init() again.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-26 11:19:07 +10:00
Rob Herring b7c670d673 powerpc: Convert to using %pOF instead of full_name
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:27:04 +10:00
Michael Ellerman 15c659ff9d Merge branch 'fixes' into next
There's a non-trivial dependency between some commits we want to put in
next and the KVM prefetch work around that went into fixes. So merge
fixes into next.
2017-08-23 22:20:10 +10:00
Frederic Barrat 2552910084 powerpc/powernv: Enable PCI peer-to-peer
P9 has support for PCI peer-to-peer, enabling a device to write in the
MMIO space of another device directly, without interrupting the CPU.

This patch adds support for it on powernv, by adding a new API to be
called by drivers. The pnv_pci_set_p2p(...) call configures an
'initiator', i.e the device which will issue the MMIO operation, and a
'target', i.e. the device on the receiving side.

P9 really only supports MMIO stores for the time being but that's
expected to change in the future, so the API allows to define both
load and store operations.

  /* PCI p2p descriptor */
  #define OPAL_PCI_P2P_ENABLE           0x1
  #define OPAL_PCI_P2P_LOAD             0x2
  #define OPAL_PCI_P2P_STORE            0x4

  int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
                      u64 desc)

It uses a new OPAL call, as the configuration magic is done on the
PHBs by skiboot.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
[mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08 11:27:30 +10:00
Alistair Popple 253fd51e2f powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
Commit 8e3f1b1d82 ("powerpc/powernv/pci: Enable 64-bit devices to access
>4GB DMA space") introduced the ability for PCI device drivers to request a
DMA mask between 64 and 32 bits and actually get a mask greater than
32-bits. However currently if certain machine configuration dependent
conditions are not meet the code silently falls back to a 32-bit mask.

This makes it hard for device drivers to detect which mask they actually
got. Instead we should return an error when the request could not be
fulfilled which allows drivers to either fallback or implement other
workarounds as documented in DMA-API-HOWTO.txt.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-28 23:02:55 +10:00
Russell Currey 8e3f1b1d82 powerpc/powernv/pci: Enable 64-bit devices to access >4GB DMA space
On PHB3/POWER8 systems, devices can select between two different sections
of address space, TVE#0 and TVE#1.  TVE#0 is intended for 32bit devices
that aren't capable of addressing more than 4GB.  Selecting TVE#1 instead,
with the capability of addressing over 4GB, is performed by setting bit 59
of a PCI address.

However, some devices aren't capable of addressing at least 59 bits, but
still want more than 4GB of DMA space.  In order to enable this, reconfigure
TVE#0 to be suitable for 64-bit devices by allocating memory past the
initial 4GB that is inaccessible by 64-bit DMAs.

This bypass mode is only enabled if a device requests 4GB or more of DMA
address space, if the system has PHB3 (POWER8 systems), and if the device
does not share a PE with any devices from different vendors.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:28 +10:00
Russell Currey a0f98629f1 powerpc/powernv/pci: Add helper to check if a PE has a single vendor
Add a helper that determines if all the devices contained in a given PE
are all from the same vendor or not.  This can be useful in determining
if it's okay to make PE-wide changes that may be suitable for some
devices but not for others.

This is used later in the series.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:28 +10:00
Russell Currey 5cb1f8fddd powerpc/powernv/pci: Dynamically allocate PHB diag data
Diagnostic data for PHBs currently works by allocated a fixed-sized buffer.
This is simple, but either wastes memory (though only a few kilobytes) or
in the case of PHB4 isn't enough to fit the whole data blob.

For machines that don't describe the diagnostic data size in the device
tree, use the hardcoded buffer size as before.  For those that do, only
allocate exactly what's needed.

In the special case of P7IOC (which has two types of diag data), the larger
should be specified in the device tree.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:27 +10:00
Linus Torvalds 857f864014 pci-v4.12-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z
 akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G
 tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St
 svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM
 KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5
 gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG
 0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f
 oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+
 IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE
 iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3
 IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq
 ags00JtMLitfNPBH3eSl
 =eE4D
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - add framework for supporting PCIe devices in Endpoint mode (Kishon
   Vijay Abraham I)

 - use non-postable PCI config space mappings when possible (Lorenzo
   Pieralisi)

 - clean up and unify mmap of PCI BARs (David Woodhouse)

 - export and unify Function Level Reset support (Christoph Hellwig)

 - avoid FLR for Intel 82579 NICs (Sasha Neftin)

 - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)

 - short-circuit config access failures for disconnected devices (Keith
   Busch)

 - remove D3 sleep delay when possible (Adrian Hunter)

 - freeze PME scan before suspending devices (Lukas Wunner)

 - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)

 - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)

 - add arch-specific alignment control to improve device passthrough by
   avoiding multiple BARs in a page (Yongji Xie)

 - add sysfs sriov_drivers_autoprobe to control VF driver binding
   (Bodong Wang)

 - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)

 - fix crashes when unbinding host controllers that don't support
   removal (Brian Norris)

 - add driver for MicroSemi Switchtec management interface (Logan
   Gunthorpe)

 - add driver for Faraday Technology FTPCI100 host bridge (Linus
   Walleij)

 - add i.MX7D support (Andrey Smirnov)

 - use generic MSI support for Aardvark (Thomas Petazzoni)

 - make Rockchip driver modular (Brian Norris)

 - advertise 128-byte Read Completion Boundary support for Rockchip
   (Shawn Lin)

 - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)

 - convert atomic_t to refcount_t in HV driver (Elena Reshetova)

 - add CPU IRQ affinity in HV driver (K. Y. Srinivasan)

 - fix PCI bus removal in HV driver (Long Li)

 - add support for ThunderX2 DMA alias topology (Jayachandran C)

 - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)

 - add ITE 8893 bridge DMA alias quirk (Jarod Wilson)

 - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
   (Manish Jaggi)

* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
  PCI: Don't allow unbinding host controllers that aren't prepared
  ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
  MAINTAINERS: Add PCI Endpoint maintainer
  Documentation: PCI: Add userguide for PCI endpoint test function
  tools: PCI: Add sample test script to invoke pcitest
  tools: PCI: Add a userspace tool to test PCI endpoint
  Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
  misc: Add host side PCI driver for PCI test function device
  PCI: Add device IDs for DRA74x and DRA72x
  dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
  PCI: dwc: dra7xx: Workaround for errata id i870
  dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
  PCI: dwc: dra7xx: Add EP mode support
  PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
  dt-bindings: PCI: Add DT bindings for PCI designware EP mode
  PCI: dwc: designware: Add EP mode support
  Documentation: PCI: Add binding documentation for pci-test endpoint function
  ixgbe: Use pcie_flr() instead of duplicating it
  IB/hfi1: Use pcie_flr() instead of duplicating it
  PCI: imx6: Fix spelling mistake: "contol" -> "control"
  ...
2017-05-08 19:03:25 -07:00
Alistair Popple 6b3d12a948 powerpc/powernv: Fix TCE kill on NVLink2
Commit 616badd2fb ("powerpc/powernv: Use OPAL call for TCE kill on
NVLink2") forced all TCE kills to go via the OPAL call for
NVLink2. However the PHB3 implementation of TCE kill was still being
called directly from some functions which in some circumstances caused
a machine check.

This patch adds an equivalent IODA2 version of the function which uses
the correct invalidation method depending on PHB model and changes all
external callers to use it instead.

Fixes: 616badd2fb ("powerpc/powernv: Use OPAL call for TCE kill on NVLink2")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-03 20:45:55 +10:00
Alexey Kardashevskiy e49a6a2173 powerpc/powernv: Fix iommu table size calculation hook for small tables
When the userspace requests a small TCE table (which takes less than
the system page size) and more than 1 TCE level, the existing code
returns a single page size which is a bug as each additional TCE level
requires at least one page and this is what
pnv_pci_ioda2_table_alloc_pages() does. And we end up seeing
WARN_ON(!ret && ((*ptbl)->it_allocated_size != table_size))
in drivers/vfio/vfio_iommu_spapr_tce.c.

This replaces incorrect _ALIGN_UP() (which aligns zero up to zero) with
max_t() to fix the bug.

Besides removing WARN_ON(), there should be no other changes in
behaviour.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:26:54 +10:00
Alexey Kardashevskiy 82eae1afbb powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc
pnv_pci_table_alloc() ignores possible failure from kzalloc_node(),
this adds a check. There are 2 callers of pnv_pci_table_alloc(),
one already checks for tbl!=NULL, this adds WARN_ON() to the other path
which only happens during boot time in IODA1 and not expected to fail.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:26:53 +10:00
Michael Ellerman b13f6683ed Merge branch 'topic/ppc-kvm' into next
Merge the topic branch we were sharing with kvm-ppc, Paul has also
merged it.
2017-04-28 20:19:37 +10:00
Yongji Xie 3827463769 powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
Override pcibios_default_alignment() to set default alignment to PAGE_SIZE
for all PCI devices on PowerNV platform.  Thus sub-page BARs would not
share a page and could be mapped into guest when VFIO passthrough them.

Signed-off-by: Yongji Xie <elohimes@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-04-19 12:51:26 -05:00
Michael Ellerman 7644d5819c powerpc: Create asm/debugfs.h and move powerpc_debugfs_root there
powerpc_debugfs_root is the dentry representing the root of the
"powerpc" directory tree in debugfs.

Currently it sits in asm/debug.h, a long with some other things that
have "debug" in the name, but are otherwise unrelated.

Pull it out into a separate header, which also includes linux/debugfs.h,
and convert all the users to include debugfs.h instead of debug.h.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11 07:46:03 +10:00
Alistair Popple 1ab66d1fba powerpc/powernv: Introduce address translation services for Nvlink2
Nvlink2 supports address translation services (ATS) allowing devices
to request address translations from an mmu known as the nest MMU
which is setup to walk the CPU page tables.

To access this functionality certain firmware calls are required to
setup and manage hardware context tables in the nvlink processing unit
(NPU). The NPU also manages forwarding of TLB invalidates (known as
address translation shootdowns/ATSDs) to attached devices.

This patch exports several methods to allow device drivers to register
a process id (PASID/PID) in the hardware tables and to receive
notification of when a device should stop issuing address translation
requests (ATRs). It also adds a fault handler to allow device drivers
to demand fault pages in.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Fix up comment formatting, use flush_tlb_mm()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-04 13:27:26 +10:00
Alexey Kardashevskiy e5afdf9dd5 powerpc/vfio_spapr_tce: Add reference counting to iommu_table
So far iommu_table obejcts were only used in virtual mode and had
a single owner. We are going to change this by implementing in-kernel
acceleration of DMA mapping requests. The proposed acceleration
will handle requests in real mode and KVM will keep references to tables.

This adds a kref to iommu_table and defines new helpers to update it.
This replaces iommu_free_table() with iommu_tce_table_put() and makes
iommu_free_table() static. iommu_tce_table_get() is not used in this patch
but it will be in the following patch.

Since this touches prototypes, this also removes @node_name parameter as
it has never been really useful on powernv and carrying it for
the pseries platform code to iommu_free_table() seems to be quite
useless as well.

This should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-30 21:42:11 +11:00