alistair23-linux/mm
Oscar Salvador 10eeadf304 mm,memory_hotplug: unlock 1GB-hugetlb on x86_64
On x86_64, 1GB-hugetlb pages could never be offlined because
hugepage_migration_supported() returned false for PUD_SHIFT.
So whenever we wanted to offline a memblock containing a gigantic
hugetlb page, we never got beyond the has_unmovable_pages() check.
This changed with [1], where now we also return true for PUD_SHIFT.

After that patch, the checks in has_unmovable_pages() and scan_movable_pages()
returned true, but we still had a final barrier in do_migrate_range():

if (compound_order(head) > PFN_SECTION_SHIFT) {
	ret = -EBUSY;
	break;
}

This is not really nice, and we do not really need it.
It is perfectly possible to migrate a gigantic page as long as another node has
a spare gigantic page for us.
In alloc_huge_page_nodemask(), we calculate the __real__ number of free pages,
and if any, we try to dequeue one from another node.
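
To see why the removed check always rejected gigantic pages, here is a small
sketch of the arithmetic. It assumes the usual x86_64 values (4KB base pages,
i.e. a page shift of 12, and 128MB memory sections, i.e. 27 section size bits);
the enum and helper names below are made up for illustration and are not kernel
symbols.

```c
#include <stdbool.h>

/* Assumed x86_64 values: 4KB base pages and 128MB memory sections. */
enum {
	PAGE_SHIFT_X86_64        = 12,
	SECTION_SIZE_BITS_X86_64 = 27,
	/* Number of base pages per section, as a shift: 27 - 12 = 15. */
	PFN_SECTION_SHIFT_X86_64 = SECTION_SIZE_BITS_X86_64 - PAGE_SHIFT_X86_64,
};

/* Order of a huge page whose size is 2^size_shift bytes. */
static inline unsigned int hugepage_order(unsigned int size_shift)
{
	return size_shift - PAGE_SHIFT_X86_64;
}

/* Mirrors the removed condition: true means the old code bailed with -EBUSY. */
static inline bool old_check_rejects(unsigned int order)
{
	return order > PFN_SECTION_SHIFT_X86_64;
}
```

Under these assumptions a 1GB page has order 30 - 12 = 18, which is larger than
15, so the old check always hit the -EBUSY path. A 2MB page has order 9 and was
never affected.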

This all works fine when we do have another node with a spare gigantic page,
but if that is not the case, alloc_huge_page_nodemask() ends up calling
alloc_migrate_huge_page() which bails out if the wanted page is gigantic.
That is mainly because finding 1GB (or even 16GB on powerpc) of contiguous
memory is quite unlikely when the system has been running for a while.

In that situation, we will keep looping forever because scan_movable_pages()
will give us the same page and we will fail again because there is no node
where we can dequeue a gigantic page from.
This is not nice, and it has been raised that we might want to treat -ENOMEM
as a fatal error in do_migrate_range(), but this has to be checked further.
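
The difference between retrying and bailing out can be illustrated with a toy
userspace model. This is not kernel code: struct node_pool and both functions
are hypothetical, and the model only captures the idea that a gigantic page can
be migrated solely by dequeuing a spare one from another node. It shows what
treating -ENOMEM as fatal would look like, which, as said above, still has to
be checked further.

```c
#include <errno.h>

/* Hypothetical model: number of free gigantic pages on each node. */
struct node_pool {
	int free_gigantic;
};

/* Take a spare gigantic page from any node other than src, if one exists. */
static int dequeue_from_other_node(struct node_pool *nodes, int nr_nodes, int src)
{
	for (int n = 0; n < nr_nodes; n++) {
		if (n != src && nodes[n].free_gigantic > 0) {
			nodes[n].free_gigantic--;
			return 0;
		}
	}
	/* Gigantic pages have no fallback allocation path in this model. */
	return -ENOMEM;
}

/*
 * Migrate in_use gigantic pages away from src.
 * Returns 0 on success or -ENOMEM as soon as no spare page is left.
 */
static int migrate_all_or_fail(struct node_pool *nodes, int nr_nodes,
			       int src, int in_use)
{
	while (in_use > 0) {
		int ret = dequeue_from_other_node(nodes, nr_nodes, src);

		/* Without this early return, the same page would be retried forever. */
		if (ret == -ENOMEM)
			return ret;
		in_use--;
	}
	return 0;
}
```

With one spare page on another node the model succeeds once and consumes that
page; a second attempt returns -ENOMEM instead of looping.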

Anyway, I would tend to say that it is the administrator's job to make sure
that the system can keep up with the memory to be offlined. That means that
if we want to use gigantic pages, the other nodes must have at least enough
spare gigantic pages to keep up in case we need to offline memory.

Just for the sake of completeness, this is one of the tests done:

 # echo 1 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
 # echo 1 > /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages

 # cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
   1
 # cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages
   1

 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages
   1
 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages
   1

 (hugetlb1gb is a program that maps a 1GB region using MAP_HUGE_1GB)

 # numactl -m 1 ./hugetlb1gb
 # cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages
   0
 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages
   1

 # offline node1 memory
 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages
   0
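
The hugetlb1gb program itself is not part of this message. A minimal sketch of
what its core could look like is given below, assuming an anonymous private
mapping; the function name and the GIGANTIC_PAGE_SIZE macro are made up for
illustration, and the fallback defines only kick in when the libc headers lack
the flags.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; values match the Linux x86 UAPI. */
#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)	/* log2(1GB) = 30 */
#endif

#define GIGANTIC_PAGE_SIZE (1UL << 30)

/*
 * Map one 1GB hugetlb page and touch it so that it is actually faulted in.
 * Returns the mapping, or NULL if no gigantic page could be reserved.
 */
static void *map_one_gigantic_page(void)
{
	void *addr = mmap(NULL, GIGANTIC_PAGE_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
			  -1, 0);

	if (addr == MAP_FAILED)
		return NULL;

	/* free_hugepages only drops once the page has been faulted in. */
	*(volatile char *)addr = 1;
	return addr;
}
```

A real test program would call this from main() and then keep the mapping
alive (for example with pause()) while the memory is being offlined, since the
page goes back to the pool as soon as the process exits.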

[1] https://lore.kernel.org/patchwork/patch/998796/

Link: http://lkml.kernel.org/r/20190320152658.10855-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
kasan arm64 updates for 5.2 2019-05-06 17:54:22 -07:00
Kconfig ksm: replace jhash2 with xxhash 2018-12-28 12:11:46 -08:00
Kconfig.debug mm/page_owner: move config option to mm/Kconfig.debug 2019-03-05 21:07:18 -08:00
Makefile mm: remove nobootmem 2018-10-31 08:54:16 -07:00
backing-dev.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-01-22 14:39:38 -07:00
balloon_compaction.c virtio_balloon: fix deadlock on OOM 2017-11-14 23:57:38 +02:00
cleancache.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
cma.c memblock: emphasize that memblock_alloc_range() returns a physical address 2019-03-12 10:04:01 -07:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
compaction.c mm/compaction.c: abort search if isolation fails 2019-04-04 11:56:15 +01:00
debug.c mm/debug.c: fix __dump_page when mapping->host is not set 2019-03-29 10:01:37 -07:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2017-12-11 14:54:44 +01:00
fadvise.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
failslab.c mm: no need to check return value of debugfs_create functions 2019-03-05 21:07:17 -08:00
filemap.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
frame_vector.c mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' 2017-12-14 16:00:48 -08:00
frontswap.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
gup.c mm/gup: add FOLL_LONGTERM capability to GUP fast 2019-05-14 09:47:46 -07:00
gup_benchmark.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c mm/hmm: convert to use vm_fault_t 2019-03-12 10:04:00 -07:00
huge_memory.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
hugetlb.c mm/hugetlb.c: don't put_page in lock of hugetlb_lock 2019-05-14 09:47:44 -07:00
hugetlb_cgroup.c mm: rename page_counter's count/limit into usage/max 2018-06-07 17:34:35 -07:00
hwpoison-inject.c mm/memory_failure: Remove unused trapno from memory_failure 2018-01-23 12:17:42 -06:00
init-mm.c mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids 2018-07-17 09:35:30 +02:00
internal.h mm, compaction: capture a page under direct compaction 2019-03-05 21:07:17 -08:00
interval_tree.c mm/interval_tree.c: use vma_pages() helper 2018-01-31 17:18:37 -08:00
khugepaged.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c Merge branch 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-05-06 13:11:48 -07:00
ksm.c mm: ksm: do not block on page lock when searching stable tree 2019-03-05 21:07:19 -08:00
list_lru.c numa: make "nr_node_ids" unsigned int 2019-03-05 21:07:19 -08:00
maccess.c Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses" 2019-02-25 09:10:51 -08:00
madvise.c asm-generic/tlb, arch: Provide CONFIG_HAVE_MMU_GATHER_PAGE_SIZE 2019-04-03 10:32:40 +02:00
memblock.c Printk changes for 5.2 2019-05-07 09:18:12 -07:00
memcontrol.c mm: writeback: use exact memcg dirty counts 2019-04-05 16:02:31 -10:00
memfd.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
memory-failure.c mm: hwpoison: fix thp split handing in soft_offline_in_use_page() 2019-03-05 21:07:13 -08:00
memory.c Printk changes for 5.2 2019-05-07 09:18:12 -07:00
memory_hotplug.c mm,memory_hotplug: unlock 1GB-hugetlb on x86_64 2019-05-14 09:47:46 -07:00
mempolicy.c mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified 2019-03-29 10:01:37 -07:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
mincore.c Revert "Change mincore() to count "mapped" pages rather than "cached" pages" 2019-01-24 09:04:37 +13:00
mlock.c mm: remove zone_lru_lock() function, access ->lru_lock directly 2019-03-05 21:07:21 -08:00
mm_init.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
mmap.c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-04-19 09:46:05 -07:00
mmu_context.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
mmu_gather.c asm-generic/tlb: Remove tlb_table_flush() 2019-04-03 10:33:02 +02:00
mmu_notifier.c mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 2018-12-28 12:11:50 -08:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c mm: update ptep_modify_prot_commit to take old pte value as arg 2019-03-05 21:07:18 -08:00
mremap.c mm,mremap: bail out earlier in mremap_to under map pressure 2019-03-05 21:07:21 -08:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nommu.c mm/gup: cache dev_pagemap while pinning pages 2018-10-26 16:38:15 -07:00
oom_kill.c mm,oom: don't kill global init via memory.oom.group 2019-03-05 21:07:19 -08:00
page-writeback.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
page_alloc.c mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact() 2019-05-14 09:47:45 -07:00
page_counter.c memcg: introduce memory.min 2018-06-07 17:34:36 -07:00
page_ext.c memblock: drop memblock_alloc_*_nopanic() variants 2019-03-12 10:04:02 -07:00
page_idle.c mm: remove zone_lru_lock() function, access ->lru_lock directly 2019-03-05 21:07:21 -08:00
page_io.c mm/page_io.c: fix polled swap page in 2019-01-04 13:13:48 -08:00
page_isolation.c mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate() 2019-03-29 10:01:37 -07:00
page_owner.c mm/page_owner: Simplify stack trace handling 2019-04-29 12:37:50 +02:00
page_poison.c page_poison: play nicely with KASAN 2019-03-05 21:07:13 -08:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-10-31 08:54:11 -07:00
pagewalk.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
percpu-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-km.c percpu: km: no need to consider pcpu_group_offsets[0] 2019-02-26 13:47:58 -08:00
percpu-stats.c treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
percpu-vm.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu.c percpu: stop printing kernel addresses 2019-03-18 10:36:36 -07:00
pgtable-generic.c x86/mm: Page size aware flush_tlb_mm_range() 2018-10-09 16:51:11 +02:00
process_vm_access.c mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors 2018-02-06 18:32:48 -08:00
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
rmap.c mm: remove zone_lru_lock() function, access ->lru_lock directly 2019-03-05 21:07:21 -08:00
rodata_test.c mm: fix RODATA_TEST failure "rodata_test: test data was not read only" 2017-10-03 17:54:24 -07:00
shmem.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
slab.c mm/slab.c: fix an infinite loop in leaks_show() 2019-05-14 09:47:45 -07:00
slab.h mm: add support for kmem caches in DMA32 zone 2019-03-29 10:01:37 -07:00
slab_common.c mm: add support for kmem caches in DMA32 zone 2019-03-29 10:01:37 -07:00
slob.c slob: use slab_list instead of lru 2019-05-14 09:47:44 -07:00
slub.c mm/slub.c: update the comment about slab frozen 2019-05-14 09:47:45 -07:00
sparse-vmemmap.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sparse.c mm/hotplug: fix offline undo_isolate_page_range() 2019-03-29 10:01:37 -07:00
swap.c mm: remove zone_lru_lock() function, access ->lru_lock directly 2019-03-05 21:07:21 -08:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
swapfile.c mm: swapoff: shmem_unuse() stop eviction without igrab() 2019-04-19 09:46:04 -07:00
truncate.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
usercopy.c mm/usercopy.c: no check page span for stack objects 2019-01-08 17:15:11 -08:00
userfaultfd.c hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization" 2019-01-08 17:15:11 -08:00
util.c mm/gup: change GUP fast to use flags rather than a write 'bool' 2019-05-14 09:47:46 -07:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c mm/vmalloc: Add flag for freeing of special permsissions 2019-04-30 12:37:58 +02:00
vmpressure.c mm/vmpressure.c: convert to use match_string() helper 2018-06-07 17:34:36 -07:00
vmscan.c mm: generalize putback scan functions 2019-05-14 09:47:45 -07:00
vmstat.c mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n 2019-04-19 09:46:04 -07:00
workingset.c mm/workingset: remove unused @mapping argument in workingset_eviction() 2019-03-05 21:07:21 -08:00
z3fold.c z3fold: fix possible reclaim races 2018-11-18 10:15:09 -08:00
zbud.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zpool.c mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc 2018-02-21 15:35:43 -08:00
zsmalloc.c mm/zsmalloc.c: fix fall-through annotation 2018-10-26 16:26:35 -07:00
zswap.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00