1
0
Fork 0
alistair23-linux/mm
David Rientjes ac79f78dab Revert "Revert "mm, thp: restore node-local hugepage allocations""
This reverts commit a8282608c8.

The commit references the original intended semantic for MADV_HUGEPAGE
which has subsequently taken on three unique purposes:

 - enables or disables thp for a range of memory depending on the system's
   config (is thp "enabled" set to "always" or "madvise"),

 - determines the synchronous compaction behavior for thp allocations at
   fault (is thp "defrag" set to "always", "defer+madvise", or "madvise"),
   and

 - reverts a previous MADV_NOHUGEPAGE (there is no madvise mode to only
   clear previous hugepage advice).

These are the three purposes that currently exist in 5.2 and over the
past several years that userspace has been written around.  Adding a
NUMA locality preference adds a fourth dimension to an already conflated
advice mode.

Based on the semantic that MADV_HUGEPAGE has provided over the past
several years, there exist workloads that use the tunable based on these
principles: specifically that the allocation should attempt to
defragment a local node before falling back.  It is agreed that remote
hugepages typically (but not always) have a better access latency than
remote native pages, although on Naples this is at parity for
intersocket.

The revert commit that this patch reverts allows hugepage allocation to
immediately allocate remotely when local memory is fragmented.  This is
contrary to the semantic of MADV_HUGEPAGE over the past several years:
that is, memory compaction should be attempted locally before falling
back.

The performance degradation of remote hugepages over local hugepages on
Rome, for example, is 53.5% increased access latency.  For this reason,
the goal is to revert back to the 5.2 and previous behavior that would
attempt local defragmentation before falling back.  With the patch that
is reverted by this patch, we see performance degradations at the tail
because the allocator happily allocates the remote hugepage rather than
even attempting to make a local hugepage available.

zone_reclaim_mode is not a solution to this problem since it does not
only impact hugepage allocations but rather changes the memory
allocation strategy for *all* page allocations.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-28 14:05:38 -07:00
..
kasan mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y 2019-08-24 19:48:42 -07:00
Kconfig mm: introduce ARCH_HAS_PTE_DEVMAP 2019-07-16 19:23:25 -07:00
Kconfig.debug mm, debug_pagealloc: use a page type instead of page_ext flag 2019-07-12 11:05:43 -07:00
Makefile memremap: move from kernel/ to mm/ 2019-08-03 07:02:01 -07:00
backing-dev.c backing-dev: no need to check return value of debugfs_create functions 2019-06-03 15:49:07 +02:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-07-16 19:23:21 -07:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
compaction.c mm: compaction: avoid 100% CPU usage during compaction when a task is killed 2019-08-03 07:02:00 -07:00
debug.c mm: update references to page _refcount 2019-05-14 19:52:47 -07:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2017-12-11 14:54:44 +01:00
fadvise.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm/filemap.c: correct the comment about VM_FAULT_RETRY 2019-07-12 11:05:43 -07:00
frame_vector.c mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' 2017-12-14 16:00:48 -08:00
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup.c mm: introduce ARCH_HAS_PTE_DEVMAP 2019-07-16 19:23:25 -07:00
gup_benchmark.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot} 2019-07-25 16:14:39 -03:00
huge_memory.c Revert "Revert "mm, thp: restore node-local hugepage allocations"" 2019-09-28 14:05:38 -07:00
hugetlb.c hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS 2019-08-13 16:06:53 -07:00
hugetlb_cgroup.c mm: rename page_counter's count/limit into usage/max 2018-06-07 17:34:35 -07:00
hwpoison-inject.c hwpoison-inject: no need to check return value of debugfs_create functions 2019-06-03 15:39:40 +02:00
init-mm.c mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids 2018-07-17 09:35:30 +02:00
internal.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
khugepaged.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c mm: kmemleak: disable early logging in case of error 2019-08-13 16:06:52 -07:00
ksm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
list_lru.c mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages 2019-07-12 11:05:44 -07:00
maccess.c The main changes in this release include: 2019-07-18 11:51:00 -07:00
madvise.c mm: remove MEMORY_DEVICE_PUBLIC support 2019-07-02 14:32:43 -03:00
memblock.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
memcontrol.c mm: memcontrol: fix percpu vmstats and vmevents flush 2019-08-30 18:00:50 -07:00
memfd.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
memory-failure.c HMM patches for 5.3 2019-07-14 19:42:11 -07:00
memory.c mm: thp: make transhuge_vma_suitable available for anonymous THP 2019-07-18 17:08:06 -07:00
memory_hotplug.c mm/memory_hotplug.c: remove unneeded return for void function 2019-08-03 07:02:01 -07:00
mempolicy.c Revert "Revert "mm, thp: restore node-local hugepage allocations"" 2019-09-28 14:05:38 -07:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memremap.c mm/hmm: fix ZONE_DEVICE anon page mapping reuse 2019-08-13 16:06:52 -07:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm/migrate.c: initialize pud_entry in migrate_vma() 2019-08-03 07:02:01 -07:00
mincore.c mm/mincore.c: fix race between swapoff and mincore 2019-07-12 11:05:43 -07:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-06-13 17:34:56 -10:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmu_context.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
mmu_gather.c mm: mmu_gather: remove __tlb_reset_range() for force flush 2019-06-13 17:34:56 -10:00
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-12 11:05:46 -07:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c mm/mprotect.c: fix compilation warning because of unused 'mm' variable 2019-05-14 09:47:51 -07:00
mremap.c mm/mmu_notifier: contextual information for event triggering invalidation 2019-05-14 09:47:49 -07:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nommu.c mm: fix the MAP_UNINITIALIZED flag 2019-07-16 19:23:21 -07:00
oom_kill.c mm/oom_kill.c: remove redundant OOM score normalization in select_bad_process() 2019-07-12 11:05:47 -07:00
page-writeback.c mm: remove the account_page_dirtied export 2019-07-12 11:05:42 -07:00
page_alloc.c mm, page_alloc: move_freepages should not examine struct page of reserved memory 2019-08-24 19:48:42 -07:00
page_counter.c memcg: introduce memory.min 2018-06-07 17:34:36 -07:00
page_ext.c mm, debug_pagealloc: use a page type instead of page_ext flag 2019-07-12 11:05:43 -07:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-06-29 16:43:45 +08:00
page_io.c mm, swap: use rbtree for swap_extent 2019-07-12 11:05:43 -07:00
page_isolation.c mm/page_isolation.c: change the prototype of undo_isolate_page_range() 2019-07-12 11:05:43 -07:00
page_owner.c mm/page_owner: Simplify stack trace handling 2019-04-29 12:37:50 +02:00
page_poison.c page_poison: play nicely with KASAN 2019-03-05 21:07:13 -08:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-10-31 08:54:11 -07:00
pagewalk.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
pgtable-generic.c x86/mm: Page size aware flush_tlb_mm_range() 2018-10-09 16:51:11 +02:00
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm/hmm: fix bad subpage pointer in try_to_unmap_one 2019-08-13 16:06:52 -07:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" 2019-08-13 16:06:52 -07:00
shuffle.c mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
slab.h mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
slab_common.c mm/slab_common.c: work around clang bug #42570 2019-07-16 19:23:21 -07:00
slob.c mm/slab: refactor common ksize KASAN logic into slab_common.c 2019-07-12 11:05:42 -07:00
slub.c mm: slub: Fix slab walking for init_on_free 2019-07-31 13:16:06 -07:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm/sparsemem: cleanup 'section number' data types 2019-07-18 17:08:07 -07:00
swap.c docs: admin-guide: move sysctl directory to it 2019-07-15 11:03:01 -03:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c mm/swap_state.c: simplify total_swapcache_pages() with get_swap_device() 2019-07-12 11:05:43 -07:00
swapfile.c mm, swap: use rbtree for swap_extent 2019-07-12 11:05:43 -07:00
truncate.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
usercopy.c mm/usercopy: use memory range to be accessed for wraparound check 2019-08-13 16:06:52 -07:00
userfaultfd.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 2019-06-19 17:09:53 +02:00
util.c mm: add account_locked_vm utility function 2019-07-16 19:23:25 -07:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c mm/vmalloc.c: fix percpu free VM area search criteria 2019-08-13 16:06:52 -07:00
vmpressure.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
vmscan.c mm, memcg: do not set reclaim_state on soft limit reclaim 2019-08-30 18:00:50 -07:00
vmstat.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
workingset.c mm: workingset: fix vmstat counters for shadow nodes 2019-08-13 16:06:52 -07:00
z3fold.c mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate 2019-08-30 18:00:50 -07:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zsmalloc.c mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n 2019-08-30 18:00:50 -07:00
zswap.c zswap: ignore debugfs_create_dir() return value 2019-06-03 15:39:39 +02:00