alistair23-linux/mm
Johannes Weiner 6b4f7799c6 mm: vmscan: invoke slab shrinkers from shrink_zone()
The slab shrinkers are currently invoked from the zonelist walkers in
kswapd, direct reclaim, and zone reclaim, all of which roughly gauge the
eligible LRU pages and assemble a nodemask to pass to NUMA-aware
shrinkers, which then again have to walk over the nodemask.  This is
redundant code, extra runtime work, and fairly inaccurate when it comes to
the estimation of actually scannable LRU pages.  The code duplication will
only get worse when making the shrinkers cgroup-aware and requiring them
to have out-of-band cgroup hierarchy walks as well.

Instead, invoke the shrinkers from shrink_zone(), which is where all
reclaimers end up, to avoid this duplication.

Take the count of eligible LRU pages from get_scan_count(), which
considers many more factors than zone_reclaimable_pages() currently
does; the latter only checks whether swap space is available.
Accumulate this count over all visited lruvecs to get the per-zone
value.

Some nodes have multiple zones due to memory addressing restrictions.  To
avoid putting too much pressure on the shrinkers, only invoke them once
for each such node, using the class zone of the allocation as the pivot
zone.

For now, this integrates the slab shrinking better into the reclaim logic
and gets rid of duplicative invocations from kswapd, direct reclaim, and
zone reclaim.  It also prepares for cgroup-awareness, allowing
memcg-capable shrinkers to be added at the lruvec level without much
duplication of both code and runtime work.

This changes kswapd behavior, which used to invoke the shrinkers for each
zone, but with scan ratios gathered from the entire node, resulting in
meaningless pressure quantities on multi-zone nodes.

Zone reclaim behavior also changes.  It used to shrink slabs until as
many slab pages had been freed as were reclaimed from the LRUs.  Now it
merely invokes the shrinkers once with the zone's scan ratio, which makes
the shrinkers go easier on caches that implement aging and would prefer
feeding back pressure from recently used slab objects to unused LRU pages.

[vdavydov@parallels.com: assure class zone is populated]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
backing-dev.c Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block 2014-10-18 11:53:51 -07:00
balloon_compaction.c mm/balloon_compaction: fix deflation when compaction is disabled 2014-10-29 16:33:15 -07:00
bootmem.c mem-hotplug: reset node managed pages when hot-adding a new pgdat 2014-11-13 16:17:06 -08:00
cleancache.c
cma.c mm: cma: align to physical address, not CMA region position 2014-12-13 12:42:46 -08:00
compaction.c mm, compaction: more focused lru and pcplists draining 2014-12-10 17:41:06 -08:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c mm: move page->mem_cgroup bad page handling into generic code 2014-12-10 17:41:09 -08:00
dmapool.c mm/dmapool.c: fixed a brace coding style issue 2014-10-09 22:26:00 -04:00
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm: convert i_mmap_mutex to rwsem 2014-12-13 12:42:45 -08:00
filemap_xip.c mm/xip: share the i_mmap_rwsem 2014-12-13 12:42:45 -08:00
fremap.c mm: use new helper functions around the i_mmap_mutex 2014-12-13 12:42:45 -08:00
frontswap.c mm/frontswap.c: fix the condition in BUG_ON 2014-12-10 17:41:08 -08:00
gup.c mm: Update generic gup implementation to handle hugepage directory 2014-11-14 17:24:21 +11:00
highmem.c
huge_memory.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2014-12-11 17:30:55 -08:00
hugetlb.c hugetlb: hugetlb_register_all_nodes(): add __init marker 2014-12-13 12:42:47 -08:00
hugetlb_cgroup.c mm: hugetlb_cgroup: convert to lockless page counters 2014-12-10 17:41:04 -08:00
hwpoison-inject.c
init-mm.c
internal.h mm, compaction: always update cached scanner positions 2014-12-10 17:41:06 -08:00
interval_tree.c mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA 2014-10-09 22:25:57 -04:00
iov_iter.c copy_from_iter_nocache() 2014-12-08 20:25:23 -05:00
Kconfig mm/balloon_compaction: add vmstat counters and kpageflags bit 2014-10-09 22:26:01 -04:00
Kconfig.debug mm/debug-pagealloc: prepare boottime configurable on/off 2014-12-13 12:42:48 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c
kmemleak.c
ksm.c mm: ksm use pr_err instead of printk 2014-10-09 22:26:00 -04:00
list_lru.c
maccess.c
madvise.c
Makefile mm/page_owner: keep track of page owners 2014-12-13 12:42:48 -08:00
memblock.c mm/memblock.c: refactor functions to set/clear MEMBLOCK_HOTPLUG 2014-12-13 12:42:46 -08:00
memcontrol.c mm/memcontrol.c: remove the unused arg in __memcg_kmem_get_cache() 2014-12-13 12:42:47 -08:00
memory-failure.c mm: vmscan: invoke slab shrinkers from shrink_zone() 2014-12-13 12:42:48 -08:00
memory.c mm: export find_extend_vma() and handle_mm_fault() for driver use 2014-12-13 12:42:47 -08:00
memory_hotplug.c mm, memory_hotplug/failure: drain single zone pcplists 2014-12-10 17:41:05 -08:00
mempolicy.c mm: mempolicy: skip inaccessible VMAs when setting MPOL_MF_LAZY 2014-10-09 22:26:02 -04:00
mempool.c
migrate.c mm/balloon_compaction: redesign ballooned pages management 2014-10-09 22:26:01 -04:00
mincore.c mm: mincore: add hwpoison page handle 2014-12-13 12:42:46 -08:00
mlock.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 15:44:12 +02:00
mm_init.c
mmap.c mm: export find_extend_vma() and handle_mm_fault() for driver use 2014-12-13 12:42:47 -08:00
mmu_context.c
mmu_notifier.c kvm: Fix page ageing bugs 2014-09-24 14:07:58 +02:00
mmzone.c
mprotect.c mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared 2014-10-14 02:18:28 +02:00
mremap.c mm: convert i_mmap_mutex to rwsem 2014-12-13 12:42:45 -08:00
msync.c
nobootmem.c mem-hotplug: reset node managed pages when hot-adding a new pgdat 2014-11-13 16:17:06 -08:00
nommu.c mm/nommu: use alloc_pages_exact() rather than its own implementation 2014-12-13 12:42:48 -08:00
oom_kill.c Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-12-11 18:57:19 -08:00
page-writeback.c mm, memcg: fix potential undefined behaviour in page stat accounting 2014-12-10 17:41:08 -08:00
page_alloc.c mm: vmscan: invoke slab shrinkers from shrink_zone() 2014-12-13 12:42:48 -08:00
page_counter.c mm: memcontrol: remove obsolete kmemcg pinning tricks 2014-12-10 17:41:05 -08:00
page_ext.c mm/page_owner: keep track of page owners 2014-12-13 12:42:48 -08:00
page_io.c
page_isolation.c mm, page_isolation: drain single zone pcplists 2014-12-10 17:41:05 -08:00
page_owner.c mm/page_owner: correct owner information for early allocated pages 2014-12-13 12:42:48 -08:00
pagewalk.c mm: use VM_BUG_ON_MM where possible 2014-10-09 22:25:58 -04:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: off by one in BUG_ON() 2014-10-29 10:34:34 -04:00
pgtable-generic.c mm: actually clear pmd_numa before invalidating 2014-08-29 16:28:15 -07:00
process_vm_access.c
quicklist.c
readahead.c
rmap.c mm/rmap: calculate page offset when needed 2014-12-13 12:42:46 -08:00
shmem.c shmem: support RENAME_WHITEOUT 2014-10-24 00:14:37 +02:00
slab.c Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-12-11 18:57:19 -08:00
slab.h memcg: use generic slab iterators for showing slabinfo 2014-12-10 17:41:07 -08:00
slab_common.c memcg: use generic slab iterators for showing slabinfo 2014-12-10 17:41:07 -08:00
slob.c mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller() 2014-10-09 22:25:50 -04:00
slub.c Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-12-11 18:57:19 -08:00
sparse-vmemmap.c
sparse.c
swap.c mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache 2014-10-09 22:25:59 -04:00
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swapfile.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
truncate.c mm: Fix comment before truncate_setsize() 2014-11-07 08:29:25 +11:00
util.c proc/maps: make vm_is_stack() logic namespace-friendly 2014-10-09 22:25:50 -04:00
vmacache.c mm,vmacache: count number of system-wide flushes 2014-12-13 12:42:48 -08:00
vmalloc.c mm/vmalloc.c: replace printk with pr_warn 2014-12-10 17:41:05 -08:00
vmpressure.c mm/vmpressure.c: fix race in vmpressure_work_fn() 2014-12-02 17:32:07 -08:00
vmscan.c mm: vmscan: invoke slab shrinkers from shrink_zone() 2014-12-13 12:42:48 -08:00
vmstat.c mm,vmacache: count number of system-wide flushes 2014-12-13 12:42:48 -08:00
workingset.c
zbud.c Merge Linus' tree to be be to apply submitted patches to newer code than 2014-11-20 14:42:02 +01:00
zpool.c mm/zpool: use prefixed module loading 2014-08-29 16:28:16 -07:00
zsmalloc.c zsmalloc: simplify init_zspage free obj linking 2014-10-09 22:26:03 -04:00
zswap.c Merge Linus' tree to be be to apply submitted patches to newer code than 2014-11-20 14:42:02 +01:00