1
0
Fork 0
Commit Graph

13309 Commits (92c14cab43267411bc9160f23d55a7548d814483)

Author SHA1 Message Date
Dennis Zhou 92c14cab43 percpu: convert chunk hints to be based on pcpu_block_md
As mentioned in the last patch, a chunk's hints are no different than a
block just responsible for more bits. This converts chunk level hints to
use a pcpu_block_md to maintain them. This lets us reuse the same hint
helper functions as a block. The left_free and right_free are unused by
the chunk's pcpu_block_md.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
2019-03-13 12:25:31 -07:00
Dennis Zhou 047924c968 percpu: make pcpu_block_md generic
In reality, a chunk is just a block covering a larger number of bits.
The hints themselves are one in the same. Rather than maintaining the
hints separately, first introduce nr_bits to genericize
pcpu_block_update() to correctly maintain block->right_free. The next
patch will convert chunk hints to be managed as a pcpu_block_md.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
2019-03-13 12:25:31 -07:00
Dennis Zhou da3afdd5bb percpu: use block scan_hint to only scan forward
Blocks now remember the latest scan_hint. This can be used on the
allocation path as when a contig_hint is broken, we can promote the
scan_hint to the contig_hint and scan forward from there. This works
because pcpu_block_refresh_hint() is only called on the allocation path
while block free regions are updated manually in
pcpu_block_update_hint_free().

Signed-off-by: Dennis Zhou <dennis@kernel.org>
2019-03-13 12:25:31 -07:00
Dennis Zhou b89462a9c5 percpu: remember largest area skipped during allocation
Percpu allocations attempt to do first fit by scanning forward from the
first_free of a block. However, fragmentation from allocation requests
can cause holes not seen by block hint update functions. To address
this, create a local version of bitmap_find_next_zero_area_off() that
remembers the largest area skipped over. The caveat is that it only sees
regions skipped over due to not fitting, not regions skipped due to
alignment.

Prior to updating the scan_hint, a scan backwards is done to try and
recover free bits skipped due to alignment. While this can cause
scanning to miss earlier possible free areas, smaller allocations will
eventually fill those holes due to first fit.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
2019-03-13 12:25:31 -07:00
Dennis Zhou 382b88e961 percpu: add block level scan_hint
Fragmentation can cause both blocks and chunks to have an early
first_firee bit available, but only able to satisfy allocations much
later on. This patch introduces a scan_hint to help mitigate some
unnecessary scanning.

The scan_hint remembers the largest area prior to the contig_hint. If
the contig_hint == scan_hint, then scan_hint_start > contig_hint_start.
This is necessary for scan_hint discovery when refreshing a block.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
2019-03-13 12:25:31 -07:00
Dennis Zhou b239f7daf5 percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE
Previously, block size was flexible based on the constraint that the
GCD(PCPU_BITMAP_BLOCK_SIZE, PAGE_SIZE) > 1. However, this carried the
overhead that keeping a floating number of populated free pages required
scanning over the free regions of a chunk.

Setting the block size to be fixed at PAGE_SIZE lets us know when an
empty page becomes used as we will break a full contig_hint of a block.
This means we no longer have to scan the whole chunk upon breaking a
contig_hint which empty page management piggybacked off. A later patch
takes advantage of this to optimize the allocation path by only scanning
forward using the scan_hint introduced later too.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
2019-03-13 12:25:31 -07:00
Dennis Zhou 8744d85942 percpu: relegate chunks unusable when failing small allocations
In certain cases, requestors of percpu memory may want specific
alignments. However, it is possible to end up in situations where the
contig_hint matches, but the alignment does not. This causes excess
scanning of chunks that will fail. To prevent this, if a small
allocation fails (< 32B), the chunk is moved to the empty list. Once an
allocation is freed from that chunk, it is placed back into rotation.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
2019-03-13 12:25:31 -07:00
Dennis Zhou 3e54097beb percpu: manage chunks based on contig_bits instead of free_bytes
When a chunk becomes fragmented, it can end up having a large number of
small allocation areas free. The free_bytes sorting of chunks leads to
unnecessary checking of chunks that cannot satisfy the allocation.
Switch to contig_bits sorting to prevent scanning chunks that may not be
able to service the allocation request.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
2019-03-13 12:25:31 -07:00
Dennis Zhou d9f3a01eeb percpu: introduce helper to determine if two regions overlap
While block hints were always accurate, it's possible when spanning
across blocks that we miss updating the chunk's contig_hint. Rather than
rely on correctness of the boundaries of hints, do a full overlap
comparison.

A future patch introduces the scan_hint which makes the contig_hint
slightly fuzzy as they can at times be smaller than the actual hint.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
2019-03-13 12:25:31 -07:00
Dennis Zhou 8c43004af0 percpu: do not search past bitmap when allocating an area
pcpu_find_block_fit() guarantees that a fit is found within
PCPU_BITMAP_BLOCK_BITS. Iteration is used to determine the first fit as
it compares against the block's contig_hint. This can lead to
incorrectly scanning past the end of the bitmap. The behavior was okay
given the check after for bit_off >= end and the correctness of the
hints from pcpu_find_block_fit().

This patch fixes this by bounding the end offset by the number of bits
in a chunk.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
2019-03-13 12:25:31 -07:00
Dennis Zhou 8e5a2b9893 percpu: update free path with correct new free region
When updating the chunk's contig_hint on the free path of a hint that
does not touch the page boundaries, it was incorrectly using the
starting offset of the free region and the block's contig_hint. This
could lead to incorrect assumptions about fit given a size and better
alignment of the start. Fix this by using (end - start) as this is only
called when updating a hint within a block.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
2019-03-13 12:25:31 -07:00
Linus Torvalds a667cb7a94 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few misc things

 - the rest of MM

-  remove flex_arrays, replace with new simple radix-tree implementation

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits)
  Drop flex_arrays
  sctp: convert to genradix
  proc: commit to genradix
  generic radix trees
  selinux: convert to kvmalloc
  md: convert to kvmalloc
  openvswitch: convert to kvmalloc
  of: fix kmemleak crash caused by imbalance in early memory reservation
  mm: memblock: update comments and kernel-doc
  memblock: split checks whether a region should be skipped to a helper function
  memblock: remove memblock_{set,clear}_region_flags
  memblock: drop memblock_alloc_*_nopanic() variants
  memblock: memblock_alloc_try_nid: don't panic
  treewide: add checks for the return value of memblock_alloc*()
  swiotlb: add checks for the return value of memblock_alloc*()
  init/main: add checks for the return value of memblock_alloc*()
  mm/percpu: add checks for the return value of memblock_alloc*()
  sparc: add checks for the return value of memblock_alloc*()
  ia64: add checks for the return value of memblock_alloc*()
  arch: don't memset(0) memory returned by memblock_alloc()
  ...
2019-03-12 10:39:53 -07:00
Mike Rapoport a2974133b7 mm: memblock: update comments and kernel-doc
* Remove comments mentioning bootmem
* Extend "DOC: memblock overview"
* Add kernel-doc comments for several more functions

[akpm@linux-foundation.org: fix copy-n-paste error]
Link: http://lkml.kernel.org/r/1549626347-25461-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:02 -07:00
Mike Rapoport c9a688a3e9 memblock: split checks whether a region should be skipped to a helper function
__next_mem_range() and __next_mem_range_rev() duplicate the code that
checks whether a region should be skipped because of node or flags
incompatibility.

Split this code into a helper function.

Link: http://lkml.kernel.org/r/1549455025-17706-3-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:02 -07:00
Mike Rapoport fe145124db memblock: remove memblock_{set,clear}_region_flags
The memblock API provides dedicated helpers to set or clear a flag on a
memory region, e.g.  memblock_{mark,clear}_hotplug().

The memblock_{set,clear}_region_flags() functions are used only by the
memblock internal function that adjusts the region flags.  Drop these
functions and use open-coded implementation instead.

Link: http://lkml.kernel.org/r/1549455025-17706-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:02 -07:00
Mike Rapoport 26fb3dae0a memblock: drop memblock_alloc_*_nopanic() variants
As all the memblock allocation functions return NULL in case of error
rather than panic(), the duplicates with _nopanic suffix can be removed.

Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>		[printk]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:02 -07:00
Mike Rapoport c0dbe825a9 memblock: memblock_alloc_try_nid: don't panic
As all the memblock_alloc*() users are now checking the return value and
panic() in case of error, the panic() call can be removed from the core
memblock allocator, namely memblock_alloc_try_nid().

Link: http://lkml.kernel.org/r/1548057848-15136-21-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:02 -07:00
Mike Rapoport 8a7f97b902 treewide: add checks for the return value of memblock_alloc*()
Add check for the return value of memblock_alloc*() functions and call
panic() in case of error.  The panic message repeats the one used by
panicing memblock allocators with adjustment of parameters to include
only relevant ones.

The replacement was mostly automated with semantic patches like the one
below with manual massaging of format strings.

  @@
  expression ptr, size, align;
  @@
  ptr = memblock_alloc(size, align);
  + if (!ptr)
  + 	panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);

[anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
  Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
[rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
  Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
[rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
  Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
[akpm@linux-foundation.org: fix xtensa printk warning]
Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Guo Ren <ren_guo@c-sky.com>		[c-sky]
Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>	[s390]
Reviewed-by: Juergen Gross <jgross@suse.com>		[Xen]
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Acked-by: Max Filippov <jcmvbkbc@gmail.com>		[xtensa]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:02 -07:00
Mike Rapoport f655f40537 mm/percpu: add checks for the return value of memblock_alloc*()
Add panic() calls if memblock_alloc() returns NULL.

The panic() format duplicates the one used by memblock itself and in
order to avoid explosion with long parameters list replace open coded
allocation size calculations with a local variable.

Link: http://lkml.kernel.org/r/1548057848-15136-17-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:02 -07:00
Mike Rapoport c366ea89fa memblock: make memblock_find_in_range_node() and choose_memblock_flags() static
These functions are not used outside memblock.  Make them static.

Link: http://lkml.kernel.org/r/1548057848-15136-12-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:01 -07:00
Mike Rapoport 92d12f9544 memblock: refactor internal allocation functions
Currently, memblock has several internal functions with overlapping
functionality.  They all call memblock_find_in_range_node() to find free
memory and then reserve the allocated range and mark it with kmemleak.
However, there is difference in the allocation constraints and in
fallback strategies.

The allocations returning physical address first attempt to find free
memory on the specified node within mirrored memory regions, then retry
on the same node without the requirement for memory mirroring and
finally fall back to all available memory.

The allocations returning virtual address start with clamping the
allowed range to memblock.current_limit, attempt to allocate from the
specified node from regions with mirroring and with user defined minimal
address.  If such allocation fails, next attempt is done with node
restriction lifted.  Next, the allocation is retried with minimal
address reset to zero and at last without the requirement for mirrored
regions.

Let's consolidate various fallbacks handling and make them more
consistent for physical and virtual variants.  Most of the fallback
handling is moved to memblock_alloc_range_nid() and it now handles node
and mirror fallbacks.

The memblock_alloc_internal() uses memblock_alloc_range_nid() to get a
physical address of the allocated range and converts it to virtual
address.

The fallback for allocation below the specified minimal address remains
in memblock_alloc_internal() because memblock_alloc_range_nid() is used
by CMA with exact requirement for lower bounds.

The memblock_phys_alloc_nid() function is completely dropped as it is not
used anywhere outside memblock and its only usage can be replaced by a
call to memblock_alloc_range_nid().

[rppt@linux.ibm.com: fix parameter order in memblock_phys_alloc_try_nid()]
  Link: http://lkml.kernel.org/r/20190203113915.GC8620@rapoport-lnx
Link: http://lkml.kernel.org/r/1548057848-15136-11-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:01 -07:00
Mike Rapoport 0ba9e6edd4 memblock: drop memblock_alloc_base()
The memblock_alloc_base() function tries to allocate a memory up to the
limit specified by its max_addr parameter and panics if the allocation
fails.  Replace its usage with memblock_phys_alloc_range() and make the
callers check the return value and panic in case of error.

Link: http://lkml.kernel.org/r/1548057848-15136-10-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:01 -07:00
Mike Rapoport 42b46aeff2 memblock: drop __memblock_alloc_base()
The __memblock_alloc_base() function tries to allocate a memory up to
the limit specified by its max_addr parameter.  Depending on the value
of this parameter, the __memblock_alloc_base() can is replaced with the
appropriate memblock_phys_alloc*() variant.

Link: http://lkml.kernel.org/r/1548057848-15136-9-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Rob Herring <robh@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:01 -07:00
Mike Rapoport ecc3e771f4 memblock: memblock_phys_alloc(): don't panic
Make the memblock_phys_alloc() function an inline wrapper for
memblock_phys_alloc_range() and update the memblock_phys_alloc() callers
to check the returned value and panic in case of error.

Link: http://lkml.kernel.org/r/1548057848-15136-8-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:01 -07:00
Mike Rapoport 337555744e memblock: memblock_phys_alloc_try_nid(): don't panic
The memblock_phys_alloc_try_nid() function tries to allocate memory from
the requested node and then falls back to allocation from any node in
the system.  The memblock_alloc_base() fallback used by this function
panics if the allocation fails.

Replace the memblock_alloc_base() fallback with the direct call to
memblock_alloc_range_nid() and update the memblock_phys_alloc_try_nid()
callers to check the returned value and panic in case of error.

Link: http://lkml.kernel.org/r/1548057848-15136-7-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:01 -07:00
Mike Rapoport 8a770c2a83 memblock: emphasize that memblock_alloc_range() returns a physical address
Rename memblock_alloc_range() to memblock_phys_alloc_range() to
emphasize that it returns a physical address.

While on it, remove the 'enum memblock_flags' parameter from this
function as its only user anyway sets it to MEMBLOCK_NONE, which is the
default for the most of memblock allocations.

Link: http://lkml.kernel.org/r/1548057848-15136-6-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:01 -07:00
Mike Rapoport 53d818d274 memblock: drop memblock_alloc_base_nid()
memblock_alloc_base_nid() is a oneliner wrapper for
memblock_alloc_range_nid() without any side effect.

Replace it's usage by the direct calls to memblock_alloc_range_nid().

Link: http://lkml.kernel.org/r/1548057848-15136-5-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com>			[Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:01 -07:00
Souptick Joarder b57e622e6d mm/hmm: convert to use vm_fault_t
Convert to use vm_fault_t type as return type for fault handler.

kbuild reported warning during testing of
*mm-create-the-new-vm_fault_t-type.patch* available in below link -
https://patchwork.kernel.org/patch/10752741/

  kernel/memremap.c:46:34: warning: incorrect type in return expression
                           (different base types)
  kernel/memremap.c:46:34: expected restricted vm_fault_t
  kernel/memremap.c:46:34: got int

This patch has fixed the warnings and also hmm_devmem_fault() is
converted to return vm_fault_t to avoid further warnings.

[sfr@canb.auug.org.au: drm/nouveau/dmem: update for struct hmm_devmem_ops member change]
  Link: http://lkml.kernel.org/r/20190220174407.753d94e5@canb.auug.org.au
Link: http://lkml.kernel.org/r/20190110145900.GA1317@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:00 -07:00
Linus Torvalds d14d7f14f1 xen: fixes and features for 5.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXIYrgwAKCRCAXGG7T9hj
 viyuAP4/bKpQ8QUp2V6ddkyEG4NTkA7H87pqQQsxJe9sdoyRRwD5AReS7oitoRS/
 cm6SBpwdaPRX/hfVvT2/h1GWxkvDFgA=
 =8Zfa
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.1a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "xen fixes and features:

   - remove fallback code for very old Xen hypervisors

   - three patches for fixing Xen dom0 boot regressions

   - an old patch for Xen PCI passthrough which was never applied for
     unknown reasons

   - some more minor fixes and cleanup patches"

* tag 'for-linus-5.1a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: fix dom0 boot on huge systems
  xen, cpu_hotplug: Prevent an out of bounds access
  xen: remove pre-xen3 fallback handlers
  xen/ACPI: Switch to bitmap_zalloc()
  x86/xen: dont add memory above max allowed allocation
  x86: respect memory size limiting via mem= parameter
  xen/gntdev: Check and release imported dma-bufs on close
  xen/gntdev: Do not destroy context while dma-bufs are in use
  xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
  xen-scsiback: mark expected switch fall-through
  xen: mark expected switch fall-through
2019-03-11 17:08:14 -07:00
Linus Torvalds a50243b1dd 5.1 Merge Window Pull Request
This has been a slightly more active cycle than normal with ongoing core
 changes and quite a lot of collected driver updates.
 
 - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe
 
 - A new data transfer mode for HFI1 giving higher performance
 
 - Significant functional and bug fix update to the mlx5 On-Demand-Paging MR
   feature
 
 - A chip hang reset recovery system for hns
 
 - Change mm->pinned_vm to an atomic64
 
 - Update bnxt_re to support a new 57500 chip
 
 - A sane netlink 'rdma link add' method for creating rxe devices and fixing
   the various unregistration race conditions in rxe's unregister flow
 
 - Allow lookup up objects by an ID over netlink
 
 - Various reworking of the core to driver interface:
   * Drivers should not assume umem SGLs are in PAGE_SIZE chunks
   * ucontext is accessed via udata not other means
   * Start to make the core code responsible for object memory
     allocation
   * Drivers should convert struct device to struct ib_device
     via a helper
   * Drivers have more tools to avoid use after unregister problems
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlyAJYYACgkQOG33FX4g
 mxrWwQ/+OyAx4Moru7Aix0C6GWxTJp/wKgw21CS3reZxgLai6x81xNYG/s2wCNjo
 IccObVd7mvzyqPdxOeyHBsJBbQDqWvoD6O2duH8cqGMgBRgh3CSdUep2zLvPpSAx
 2W1SvWYCLDnCuarboFrCA8c4AN3eCZiqD7z9lHyFQGjy3nTUWzk1uBaOP46uaiMv
 w89N8EMdXJ/iY6ONzihvE05NEYbMA8fuvosKLLNdghRiHIjbMQU8SneY23pvyPDd
 ZziPu9NcO3Hw9OVbkwtJp47U3KCBgvKHmnixyZKkikjiD+HVoABw2IMwcYwyBZwP
 Bic/ddONJUvAxMHpKRnQaW7znAiHARk21nDG28UAI7FWXH/wMXgicMp6LRcNKqKF
 vqXdxHTKJb0QUR4xrYI+eA8ihstss7UUpgSgByuANJ0X729xHiJtlEvPb1DPo1Dz
 9CB4OHOVRl5O8sA5Jc6PSusZiKEpvWoyWbdmw0IiwDF5pe922VLl5Nv88ta+sJ38
 v2Ll5AgYcluk7F3599Uh9D7gwp5hxW2Ph3bNYyg2j3HP4/dKsL9XvIJPXqEthgCr
 3KQS9rOZfI/7URieT+H+Mlf+OWZhXsZilJG7No0fYgIVjgJ00h3SF1/299YIq6Qp
 9W7ZXBfVSwLYA2AEVSvGFeZPUxgBwHrSZ62wya4uFeB1jyoodPk=
 =p12E
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a slightly more active cycle than normal with ongoing
  core changes and quite a lot of collected driver updates.

   - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe

   - A new data transfer mode for HFI1 giving higher performance

   - Significant functional and bug fix update to the mlx5
     On-Demand-Paging MR feature

   - A chip hang reset recovery system for hns

   - Change mm->pinned_vm to an atomic64

   - Update bnxt_re to support a new 57500 chip

   - A sane netlink 'rdma link add' method for creating rxe devices and
     fixing the various unregistration race conditions in rxe's
     unregister flow

   - Allow lookup up objects by an ID over netlink

   - Various reworking of the core to driver interface:
       - drivers should not assume umem SGLs are in PAGE_SIZE chunks
       - ucontext is accessed via udata not other means
       - start to make the core code responsible for object memory
         allocation
       - drivers should convert struct device to struct ib_device via a
         helper
       - drivers have more tools to avoid use after unregister problems"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
  net/mlx5: ODP support for XRC transport is not enabled by default in FW
  IB/hfi1: Close race condition on user context disable and close
  RDMA/umem: Revert broken 'off by one' fix
  RDMA/umem: minor bug fix in error handling path
  RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
  cxgb4: kfree mhp after the debug print
  IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
  IB/rdmavt: Fix loopback send with invalidate ordering
  IB/iser: Fix dma_nents type definition
  IB/mlx5: Set correct write permissions for implicit ODP MR
  bnxt_re: Clean cq for kernel consumers only
  RDMA/uverbs: Don't do double free of allocated PD
  RDMA: Handle ucontext allocations by IB/core
  RDMA/core: Fix a WARN() message
  bnxt_re: fix the regression due to changes in alloc_pbl
  IB/mlx4: Increase the timeout for CM cache
  IB/core: Abort page fault handler silently during owning process exit
  IB/mlx5: Validate correct PD before prefetch MR
  IB/mlx5: Protect against prefetch of invalid MR
  RDMA/uverbs: Store PR pointer before it is overwritten
  ...
2019-03-09 15:53:03 -08:00
Linus Torvalds f86727f8bd Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm cleanup from Ingo Molnar:
 "A single GUP cleanup"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mm/gup: Remove the 'write' parameter from gup_fast_permitted()
2019-03-07 17:43:58 -08:00
Linus Torvalds 8d521d94da Merge branch 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu
Pull percpu updates from Dennis Zhou:
 "There are 2 minor changes to the percpu allocator this merge window:

   - for loop condition that could be out of bounds on multi-socket UP

   - cosmetic removal of pcpu_group_offsets[0] in UP code as it is 0

  There has been an interest in having better alignment with percpu
  allocations. This has caused a performance regression in at least one
  reported workload. I have a series out which adds scan hints to the
  allocator as well as some other performance oriented changes. I hope
  to have this queued for v5.2 soon"

* 'for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  percpu: km: no need to consider pcpu_group_offsets[0]
  percpu: use nr_groups as check condition
2019-03-07 10:06:46 -08:00
Oscar Salvador ea2c3f6f55 mm,mremap: bail out earlier in mremap_to under map pressure
When using mremap() syscall in addition to MREMAP_FIXED flag, mremap()
calls mremap_to() which does the following:

1) unmaps the destination region where we are going to move the map
2) If the new region is going to be smaller, we unmap the last part
   of the old region

Then, we will eventually call move_vma() to do the actual move.

move_vma() checks whether we are at least 4 maps below max_map_count
before going further, otherwise it bails out with -ENOMEM.  The problem
is that we might have already unmapped the vma's in steps 1) and 2), so
it is not possible for userspace to figure out the state of the vmas
after it gets -ENOMEM, and it gets tricky for userspace to clean up
properly on error path.

While it is true that we can return -ENOMEM for more reasons (e.g: see
may_expand_vm() or move_page_tables()), I think that we can avoid this
scenario if we check early in mremap_to() if the operation has high
chances to succeed map-wise.

Should that not be the case, we can bail out before we even try to unmap
anything, so we make sure the vma's are left untouched in case we are
likely to be short of maps.

The thumb-rule now is to rely on the worst-scenario case we can have.
That is when both vma's (old region and new region) are going to be
split in 3, so we get two more maps to the ones we already hold (one per
each).  If current map count + 2 maps still leads us to 4 maps below the
threshold, we are going to pass the check in move_vma().

Of course, this is not free, as it might generate false positives when
it is true that we are tight map-wise, but the unmap operation can
release several vma's leading us to a good state.

Another approach was also investigated [1], but it may be too much
hassle for what it brings.

[1] https://lore.kernel.org/lkml/20190219155320.tkfkwvqk53tfdojt@d104.suse.de/

Link: http://lkml.kernel.org/r/20190226091314.18446-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Qian Cai d778015ac9 mm/sparse: fix a bad comparison
next_present_section_nr() could only return an unsigned number -1, so
just check it specifically where compilers will convert -1 to unsigned
if needed.

  mm/sparse.c: In function 'sparse_init_nid':
  mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
         ((section_nr >= 0) &&    \
                      ^~
  mm/sparse.c:478:2: note: in expansion of macro
  'for_each_present_section_nr'
    for_each_present_section_nr(pnum_begin, pnum) {
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
         ((section_nr >= 0) &&    \
                      ^~
  mm/sparse.c:497:2: note: in expansion of macro
  'for_each_present_section_nr'
    for_each_present_section_nr(pnum_begin, pnum) {
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  mm/sparse.c: In function 'sparse_init':
  mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
         ((section_nr >= 0) &&    \
                      ^~
  mm/sparse.c:520:2: note: in expansion of macro
  'for_each_present_section_nr'
    for_each_present_section_nr(pnum_begin + 1, pnum_end) {
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~

Link: http://lkml.kernel.org/r/20190228181839.86504-1-cai@lca.pw
Fixes: c4e1be9ec1 ("mm, sparsemem: break out of loops early")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Jan Stancek fc8efd2ddf mm/memory.c: do_fault: avoid usage of stale vm_area_struct
LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test, where one thread mmaps/writes/munmaps memory area
and other thread is trying to read from it:

  CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
  Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
  Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
  Call Trace:
  ([<0000000000000000>]           (null))
   [<00000000001adae4>] lock_acquire+0xec/0x258
   [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
   [<000000000012a780>] page_table_free+0x48/0x1a8
   [<00000000002f6e54>] do_fault+0xdc/0x670
   [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
   [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
   [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
   [<000000000080e5ee>] pgm_check_handler+0x19e/0x200

page_table_free() is called with NULL mm parameter, but because "0" is a
valid address on s390 (see S390_lowcore), it keeps going until it
eventually crashes in lockdep's lock_acquire.  This crash is
reproducible at least since 4.14.

Problem is that "vmf->vma" used in do_fault() can become stale.  Because
mmap_sem may be released, other threads can come in, call munmap() and
cause "vma" be returned to kmem cache, and get zeroed/re-initialized and
re-used:

handle_mm_fault                           |
  __handle_mm_fault                       |
    do_fault                              |
      vma = vmf->vma                      |
      do_read_fault                       |
        __do_fault                        |
          vma->vm_ops->fault(vmf);        |
            mmap_sem is released          |
                                          |
                                          | do_munmap()
                                          |   remove_vma_list()
                                          |     remove_vma()
                                          |       vm_area_free()
                                          |         # vma is released
                                          | ...
                                          | # same vma is allocated
                                          | # from kmem cache
                                          | do_mmap()
                                          |   vm_area_alloc()
                                          |     memset(vma, 0, ...)
                                          |
      pte_free(vma->vm_mm, ...);          |
        page_table_free                   |
          spin_lock_bh(&mm->context.lock);|
            <crash>                       |

Cache mm_struct to avoid using potentially stale "vma".

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c

Link: http://lkml.kernel.org/r/5b3fdf19e2a5be460a384b936f5b56e13733f1b8.1551595137.git.jstancek@redhat.com
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Qian Cai 70516b936b mm/huge_memory.c: fix "orig_pud" set but not used
Commit a00cc7d9dd ("mm, x86: add support for PUD-sized transparent
hugepages") introduced pudp_huge_get_and_clear_full() but no one uses
its return code.

In order to not diverge from pmdp_huge_get_and_clear_full(), just change
zap_huge_pud() to not assign the return value from
pudp_huge_get_and_clear_full().

  mm/huge_memory.c: In function 'zap_huge_pud':
  mm/huge_memory.c:1982:8: warning: variable 'orig_pud' set but not used [-Wunused-but-set-variable]
    pud_t orig_pud;
          ^~~~~~~~

Link: http://lkml.kernel.org/r/20190301221956.97493-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Qian Cai cd02cf1ace mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
When onlining a memory block with DEBUG_PAGEALLOC, it unmaps the pages
in the block from kernel, However, it does not map those pages while
offlining at the beginning.  As the result, it triggers a panic below
while onlining on ppc64le as it checks if the pages are mapped before
unmapping.  However, the imbalance exists for all arches where
double-unmappings could happen.  Therefore, let kernel map those pages
in generic_online_page() before they have being freed into the page
allocator for the first time where it will set the page count to one.

On the other hand, it works fine during the boot, because at least for
IBM POWER8, it does,

early_setup
  early_init_mmu
    harsh__early_init_mmu
      htab_initialize [1]
        htab_bolt_mapping [2]

where it effectively map all memblock regions just like
kernel_map_linear_page(), so later mem_init() -> memblock_free_all()
will unmap them just fine without any imbalance.  On other arches
without this imbalance checking, it still unmap them once at the most.

[1]
for_each_memblock(memory, reg) {
        base = (unsigned long)__va(reg->base);
        size = reg->size;

        DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
                base, size, prot);

        BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
                prot, mmu_linear_psize, mmu_kernel_ssize));
        }

[2] linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
    kernel BUG at arch/powerpc/mm/hash_utils_64.c:1815!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA pSeries
    CPU: 2 PID: 4298 Comm: bash Not tainted 5.0.0-rc7+ #15
    NIP:  c000000000062670 LR: c00000000006265c CTR: 0000000000000000
    REGS: c0000005bf8a75b0 TRAP: 0700   Not tainted  (5.0.0-rc7+)
    MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28422842
    XER: 00000000
    CFAR: c000000000804f44 IRQMASK: 1
    NIP [c000000000062670] __kernel_map_pages+0x2e0/0x4f0
    LR [c00000000006265c] __kernel_map_pages+0x2cc/0x4f0
    Call Trace:
       __kernel_map_pages+0x2cc/0x4f0
       free_unref_page_prepare+0x2f0/0x4d0
       free_unref_page+0x44/0x90
       __online_page_free+0x84/0x110
       online_pages_range+0xc0/0x150
       walk_system_ram_range+0xc8/0x120
       online_pages+0x280/0x5a0
       memory_subsys_online+0x1b4/0x270
       device_online+0xc0/0xf0
       state_store+0xc0/0x180
       dev_attr_store+0x3c/0x60
       sysfs_kf_write+0x70/0xb0
       kernfs_fop_write+0x10c/0x250
       __vfs_write+0x48/0x240
       vfs_write+0xd8/0x210
       ksys_write+0x70/0x120
       system_call+0x5c/0x70

Link: http://lkml.kernel.org/r/20190301220814.97339-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Qian Cai 82ede7ee38 mm/memcontrol.c: fix bad line in comment
Commit 230671533d ("mm: memory.low hierarchical behavior") missed an
asterisk in one of the comments.

  mm/memcontrol.c:5774: warning: bad line:                | 0, otherwise.

Link: http://lkml.kernel.org/r/20190301143734.94393-1-cai@lca.pw
Acked-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Peng Fan 0d3bd18a5e mm/cma.c: cma_declare_contiguous: correct err handling
In case cma_init_reserved_mem failed, need to free the memblock
allocated by memblock_reserve or memblock_alloc_range.

Quote Catalin's comments:
  https://lkml.org/lkml/2019/2/26/482

Kmemleak is supposed to work with the memblock_{alloc,free} pair and it
ignores the memblock_reserve() as a memblock_alloc() implementation
detail. It is, however, tolerant to memblock_free() being called on
a sub-range or just a different range from a previous memblock_alloc().
So the original patch looks fine to me. FWIW:

Link: http://lkml.kernel.org/r/20190227144631.16708-1-peng.fan@nxp.com
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Qian Cai 0c81585499 mm/page_ext.c: fix an imbalance with kmemleak
After offlining a memory block, kmemleak scan will trigger a crash, as
it encounters a page ext address that has already been freed during
memory offlining.  At the beginning in alloc_page_ext(), it calls
kmemleak_alloc(), but it does not call kmemleak_free() in
free_page_ext().

    BUG: unable to handle kernel paging request at ffff888453d00000
    PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
    Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
    RIP: 0010:scan_block+0xb5/0x290
    Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 <4c> 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
    RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
    RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
    RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
    RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
    R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
    R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
    FS:  00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
    Call Trace:
     scan_gray_list+0x269/0x430
     kmemleak_scan+0x5a8/0x10f0
     kmemleak_write+0x541/0x6ca
     full_proxy_write+0xf8/0x190
     __vfs_write+0xeb/0x980
     vfs_write+0x15a/0x4f0
     ksys_write+0xd2/0x1b0
     __x64_sys_write+0x73/0xb0
     do_syscall_64+0xeb/0xaaa
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f6c0dad73b8
    Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
    RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
    RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
    RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
    R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
    R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
    Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
    CR2: ffff888453d00000
    ---[ end trace ccf646c7456717c5 ]---
    Kernel panic - not syncing: Fatal exception
    Shutting down cpus with NMI
    Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range:
    0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Andrey Ryabinin 5f438eee8f mm/compaction: pass pgdat to too_many_isolated() instead of zone
too_many_isolated() in mm/compaction.c looks only at node state, so it
makes more sense to change argument to pgdat instead of zone.

Link: http://lkml.kernel.org/r/20190228083329.31892-3-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Andrey Ryabinin f4b7e272b5 mm: remove zone_lru_lock() function, access ->lru_lock directly
We have common pattern to access lru_lock from a page pointer:
	zone_lru_lock(page_zone(page))

Which is silly, because it unfolds to this:
	&NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)]->zone_pgdat->lru_lock
while we can simply do
	&NODE_DATA(page_to_nid(page))->lru_lock

Remove zone_lru_lock() function, since it's only complicate things.  Use
'page_pgdat(page)->lru_lock' pattern instead.

[aryabinin@virtuozzo.com: a slightly better version of __split_huge_page()]
  Link: http://lkml.kernel.org/r/20190301121651.7741-1-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/20190228083329.31892-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Andrey Ryabinin a7ca12f9d9 mm/workingset: remove unused @mapping argument in workingset_eviction()
workingset_eviction() doesn't use and never did use the @mapping
argument.  Remove it.

Link: http://lkml.kernel.org/r/20190228083329.31892-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Gustavo A. R. Silva 960087445c mm/swapfile.c: use struct_size() in kvzalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array.  For example:

  struct foo {
      int stuff;
      struct boo entry[];
  };

  size = sizeof(struct foo) + count * sizeof(struct boo);
  instance = kvzalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

  instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

Notice that, in this case, variable size is not necessary, hence it is
removed.

This code was detected with the help of Coccinelle.

Link: http://lkml.kernel.org/r/20190221154622.GA19599@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:21 -08:00
Yue Hu 5a7f1b2f2f mm/cma_debug.c: remove static scoped cma_debugfs_root
Currently cma_debugfs_root is static storage.  That is unnecessary since
it will be only used by next cma_debugfs_add_one().  We can just pass it
to following calling to save thisspace.  Also remove useless idx
parameter.

Link: http://lkml.kernel.org/r/20190221040130.8940-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:20 -08:00
Yu Zhao 5d3ee42f8f mm/shmem: make find_get_pages_range() work for huge page
find_get_pages_range() and find_get_pages_range_tag() already correctly
increment reference count on head when seeing compound page, but they
may still use page index from tail.  Page index from tail is always
zero, so these functions don't work on huge shmem.  This hasn't been a
problem because, AFAIK, nobody calls these functions on (huge) shmem.
Fix them anyway just in case.

Link: http://lkml.kernel.org/r/20190110030838.84446-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:20 -08:00
Christoph Hellwig afa0011289 mm: unexport free_reserved_area
This function is only used by built-in code, which makes perfect sense
given the purpose of it.

Link: http://lkml.kernel.org/r/20190213174621.29297-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:20 -08:00
Oscar Salvador daf3538ad5 mm,memory_hotplug: explicitly pass the head to isolate_huge_page
isolate_huge_page() expects we pass the head of hugetlb page to it:

  bool isolate_huge_page(...)
  {
	...
	VM_BUG_ON_PAGE(!PageHead(page), page);
	...
  }

While I really cannot think of any situation where we end up with a
non-head page between hands in do_migrate_range(), let us make sure the
code is as sane as possible by explicitly passing the Head.  Since we
already got the pointer, it does not take us extra effort.

Link: http://lkml.kernel.org/r/20190208090604.975-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:20 -08:00
Jan Kara f900482da5 mm/migrate.c: cleanup expected_page_refs()
Andrea has noted that page migration code propagates page_mapping(page)
through the whole migration stack down to migrate_page() function so it
seems stupid to then use page_mapping(page) in expected_page_refs()
instead of passed down 'mapping' argument.  I agree so let's make
expected_page_refs() more in line with the rest of the migration stack.

Link: http://lkml.kernel.org/r/20190207112314.24872-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:20 -08:00
Mike Rapoport a862f68a8b docs/core-api/mm: fix return value descriptions in mm/
Many kernel-doc comments in mm/ have the return value descriptions
either misformatted or omitted at all which makes kernel-doc script
unhappy:

$ make V=1 htmldocs
...
./mm/util.c:36: info: Scanning doc for kstrdup
./mm/util.c:41: warning: No description found for return value of 'kstrdup'
./mm/util.c:57: info: Scanning doc for kstrdup_const
./mm/util.c:66: warning: No description found for return value of 'kstrdup_const'
./mm/util.c:75: info: Scanning doc for kstrndup
./mm/util.c:83: warning: No description found for return value of 'kstrndup'
...

Fixing the formatting and adding the missing return value descriptions
eliminates ~100 such warnings.

Link: http://lkml.kernel.org/r/1549549644-4903-4-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:20 -08:00