remarkable-linux/mm
Gavin Shan e02dc017c3 mm/page_alloc: fix nodes for reclaim in fast path
When @node_reclaim_node isn't 0, the page allocator tries to reclaim
pages if the amount of free memory in the zones are below the low
watermark.  On Power platform, none of NUMA nodes are scanned for page
reclaim because no nodes match the condition in zone_allows_reclaim().
On Power platform, RECLAIM_DISTANCE is set to 10 which is the distance
of Node-A to Node-A.  So the preferred node even won't be scanned for
page reclaim.

   __alloc_pages_nodemask()
   get_page_from_freelist()
      zone_allows_reclaim()

Anton proposed the test code as below:

   # cat alloc.c
      :
   int main(int argc, char *argv[])
   {
	void *p;
	unsigned long size;
	unsigned long start, end;

	start = time(NULL);
	size = strtoul(argv[1], NULL, 0);
	printf("To allocate %ldGB memory\n", size);

	size <<= 30;
	p = malloc(size);
	assert(p);
	memset(p, 0, size);

	end = time(NULL);
	printf("Used time: %ld seconds\n", end - start);
	sleep(3600);
	return 0;
   }

The system I use for testing has two NUMA nodes.  Both have 128GB
memory.  In below scnario, the page caches on node#0 should be reclaimed
when it encounters pressure to accommodate request of allocation.

   # echo 2 > /proc/sys/vm/zone_reclaim_mode; \
     sync; \
     echo 3 > /proc/sys/vm/drop_caches; \
   # taskset -c 0 cat file.32G > /dev/null; \
     grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33619712 kB
   # taskset -c 0 ./alloc 128
   # grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33619840 kB
   # grep MemFree /sys/devices/system/node/node0/meminfo
     Node 0 MemFree:          186816 kB

With the patch applied, the pagecache on node-0 is reclaimed when its
free memory is running out.  It's the expected behaviour.

   # echo 2 > /proc/sys/vm/zone_reclaim_mode; \
     sync; \
     echo 3 > /proc/sys/vm/drop_caches
   # taskset -c 0 cat file.32G > /dev/null; \
     grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:       33605568 kB
   # taskset -c 0 ./alloc 128
   # grep FilePages /sys/devices/system/node/node0/meminfo
     Node 0 FilePages:        1379520 kB
   # grep MemFree /sys/devices/system/node/node0/meminfo
     Node 0 MemFree:           317120 kB

Fixes: 5f7a75acdb ("mm: page_alloc: do not cache reclaim distances")
Link: http://lkml.kernel.org/r/1486532455-29613-1-git-send-email-gwshan@linux.vnet.ibm.com
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>	[3.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:56 -08:00
..
kasan arm64 updates for 4.11: 2017-02-22 10:46:44 -08:00
backing-dev.c mm/backing-dev.c: use rb_entry() 2017-02-22 16:41:30 -08:00
balloon_compaction.c
bootmem.c mm/bootmem.c: cosmetic improvement of code readability 2017-02-22 16:41:29 -08:00
cleancache.c
cma.c mm: cma: print allocation failure reason and bitmap status 2017-02-24 17:46:55 -08:00
cma.h
cma_debug.c mm: cma_alloc: allow to specify GFP mask 2017-02-24 17:46:55 -08:00
compaction.c mm/migration: make isolate_movable_page() return int type 2017-02-24 17:46:55 -08:00
debug.c mm, debug: print raw struct page data in __dump_page() 2016-12-12 18:55:08 -08:00
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED 2016-12-20 09:48:46 -08:00
failslab.c
filemap.c mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
frame_vector.c mm: replace get_vaddr_frames() write/force parameters with gup_flags 2016-10-19 08:11:24 -07:00
frontswap.c
gup.c mm, x86: add support for PUD-sized transparent hugepages 2017-02-24 17:46:54 -08:00
highmem.c
huge_memory.c mm/autonuma: let architecture override how the write bit should be stashed in a protnone pte. 2017-02-24 17:46:56 -08:00
hugetlb.c mm: alloc_contig_range: allow to specify GFP mask 2017-02-24 17:46:55 -08:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2016-11-22 11:49:48 -06:00
internal.h mm, rmap: check all VMAs that PTE-mapped THP can be part of 2017-02-24 17:46:55 -08:00
interval_tree.c
Kconfig mm: THP page cache support for ppc64 2016-12-12 18:55:08 -08:00
Kconfig.debug
khugepaged.c mm: get rid of __GFP_OTHER_NODE 2017-01-10 18:31:55 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: fix reference to Documentation 2016-12-12 18:55:07 -08:00
ksm.c mm/ksm: handle protnone saved writes when making page write protect 2017-02-24 17:46:56 -08:00
list_lru.c mm/list_lru.c: avoid error-path NULL pointer deref 2016-10-27 18:43:42 -07:00
maccess.c
madvise.c mm, madvise: fail with ENOMEM when splitting vma will hit max_map_count 2017-02-24 17:46:55 -08:00
Makefile mm: introduce page_vma_mapped_walk() 2017-02-24 17:46:55 -08:00
memblock.c memblock: embed memblock type name within struct memblock_type 2017-02-24 17:46:54 -08:00
memcontrol.c slab: use memcg_kmem_cache_wq for slab destruction operations 2017-02-22 16:41:27 -08:00
memory-failure.c HWPOISON: soft offlining for non-lru movable page 2017-02-24 17:46:55 -08:00
memory.c mm/autonuma: let architecture override how the write bit should be stashed in a protnone pte. 2017-02-24 17:46:56 -08:00
memory_hotplug.c mm/memory_hotplug.c: fix overflow in test_pages_in_a_zone() 2017-02-24 17:46:56 -08:00
mempolicy.c mm/mempolicy.c: do not put mempolicy before using its nodemask 2017-01-24 16:26:14 -08:00
mempool.c
memtest.c
migrate.c mm: convert remove_migration_pte() to use page_vma_mapped_walk() 2017-02-24 17:46:55 -08:00
mincore.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mlock.c thp: fix corner case of munlock() of PTE-mapped THPs 2016-11-30 16:32:52 -08:00
mm_init.c
mmap.c mm, madvise: fail with ENOMEM when splitting vma will hit max_map_count 2017-02-24 17:46:55 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist() 2017-02-22 16:41:29 -08:00
mprotect.c mm/autonuma: let architecture override how the write bit should be stashed in a protnone pte. 2017-02-24 17:46:56 -08:00
mremap.c userfaultfd: non-cooperative: add event for memory unmaps 2017-02-24 17:46:55 -08:00
msync.c
nobootmem.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
nommu.c userfaultfd: non-cooperative: add event for memory unmaps 2017-02-24 17:46:55 -08:00
oom_kill.c mm, oom: header nodemask is NULL when cpusets are disabled 2017-02-24 17:46:53 -08:00
page-writeback.c mm/page-writeback.c: place "not" inside of unlikely() statement in wb_domain_writeout_inc() 2017-02-24 17:46:56 -08:00
page_alloc.c mm/page_alloc: fix nodes for reclaim in fast path 2017-02-24 17:46:56 -08:00
page_counter.c
page_ext.c
page_idle.c mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs() 2017-02-24 17:46:55 -08:00
page_io.c writeback: add wbc_to_write_flags() 2016-11-02 10:24:03 -06:00
page_isolation.c mm, page_alloc: avoid page_to_pfn() when merging buddies 2017-02-22 16:41:27 -08:00
page_owner.c
page_poison.c
page_vma_mapped.c mm: convert page_mapped_in_vma() to use page_vma_mapped_walk() 2017-02-24 17:46:55 -08:00
pagewalk.c mm, x86: add support for PUD-sized transparent hugepages 2017-02-24 17:46:54 -08:00
percpu-km.c
percpu-vm.c
percpu.c Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2016-12-13 12:34:47 -08:00
pgtable-generic.c mm, x86: add support for PUD-sized transparent hugepages 2017-02-24 17:46:54 -08:00
process_vm_access.c mm: unexport __get_user_pages_unlocked() 2016-12-14 16:04:09 -08:00
quicklist.c
readahead.c mm: don't cap request size based on read-ahead setting 2016-12-12 18:55:08 -08:00
rmap.c mm: drop page_check_address{,_transhuge} 2017-02-24 17:46:55 -08:00
shmem.c mm/shmem.c: fix unlikely() test of info->seals to test only for WRITE and GROW 2017-02-24 17:46:56 -08:00
slab.c slab: introduce __kmemcg_cache_deactivate() 2017-02-22 16:41:27 -08:00
slab.h slab: remove synchronous synchronize_sched() from memcg cache deactivation path 2017-02-22 16:41:27 -08:00
slab_common.c slab: use memcg_kmem_cache_wq for slab destruction operations 2017-02-22 16:41:27 -08:00
slob.c slab: introduce __kmemcg_cache_deactivate() 2017-02-22 16:41:27 -08:00
slub.c slub: make sysfs directories for memcg sub-caches optional 2017-02-22 16:41:27 -08:00
sparse-vmemmap.c
sparse.c mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next 2017-02-22 16:41:29 -08:00
swap.c mm: vmscan: move dirty pages out of the way until they're flushed 2017-02-24 17:46:54 -08:00
swap_cgroup.c
swap_slots.c mm/swap: skip readahead only when swap slot cache is enabled 2017-02-22 16:41:30 -08:00
swap_state.c mm/swap: skip readahead only when swap slot cache is enabled 2017-02-22 16:41:30 -08:00
swapfile.c mm/swap: enable swap slots cache usage 2017-02-22 16:41:30 -08:00
truncate.c mm: Invalidate DAX radix tree entries only if appropriate 2016-12-26 20:29:24 -08:00
usercopy.c mm/usercopy: Switch to using lm_alias 2017-01-11 13:56:50 +00:00
userfaultfd.c userfaultfd: mcopy_atomic: return -ENOENT when no compatible VMA found 2017-02-24 17:46:55 -08:00
util.c userfaultfd: non-cooperative: add event for memory unmaps 2017-02-24 17:46:55 -08:00
vmacache.c
vmalloc.c vmalloc: back off when the current task is killed 2017-02-24 17:46:55 -08:00
vmpressure.c
vmscan.c mm, vmscan: clear PGDAT_WRITEBACK when zone is balanced 2017-02-24 17:46:55 -08:00
vmstat.c mm, compaction: add vmstats for kcompactd work 2017-02-22 16:41:29 -08:00
workingset.c mm, vmscan: cleanup lru size claculations 2017-02-22 16:41:30 -08:00
z3fold.c z3fold: add kref refcounting 2017-02-24 17:46:54 -08:00
zbud.c
zpool.c
zsmalloc.c mm: fix some typos in mm/zsmalloc.c 2017-02-22 16:41:29 -08:00
zswap.c zswap: disable changing params if init fails 2017-02-03 14:13:19 -08:00