alistair23-linux/mm
Michal Hocko 7dea19f9ee mm: introduce memalloc_nofs_{save,restore} API
GFP_NOFS context is used for the following 5 reasons currently:

 - to prevent from deadlocks when the lock held by the allocation
   context would be needed during the memory reclaim

 - to prevent from stack overflows during the reclaim because the
   allocation is performed from a deep context already

 - to prevent lockups when the allocation context depends on other
   reclaimers to make a forward progress indirectly

 - just in case because this would be safe from the fs POV

 - silence lockdep false positives

Unfortunately overuse of this allocation context brings some problems to
the MM.  Memory reclaim is much weaker (especially during heavy FS
metadata workloads), OOM killer cannot be invoked because the MM layer
doesn't have enough information about how much memory is freeable by the
FS layer.

In many cases it is far from clear why the weaker context is even used
and so it might be used unnecessarily.  We would like to get rid of
those as much as possible.  One way to do that is to use the flag in
scopes rather than isolated cases.  Such a scope is declared when really
necessary, tracked per task and all the allocation requests from within
the context will simply inherit the GFP_NOFS semantic.

Not only this is easier to understand and maintain because there are
much less problematic contexts than specific allocation requests, this
also helps code paths where FS layer interacts with other layers (e.g.
crypto, security modules, MM etc...) and there is no easy way to convey
the allocation context between the layers.

Introduce memalloc_nofs_{save,restore} API to control the scope of
GFP_NOFS allocation context.  This is basically copying
memalloc_noio_{save,restore} API we have for other restricted allocation
context GFP_NOIO.  The PF_MEMALLOC_NOFS flag already exists and it is
just an alias for PF_FSTRANS which has been xfs specific until recently.
There are no more PF_FSTRANS users anymore so let's just drop it.

PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO.  memalloc_noio_flags
is renamed to current_gfp_context because it now cares about both
PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts.  Xfs code paths preserve
their semantic.  kmem_flags_convert() doesn't need to evaluate the flag
anymore.

This patch shouldn't introduce any functional changes.

Let's hope that filesystems will drop direct GFP_NOFS (resp.  ~__GFP_FS)
usage as much as possible and only use a properly documented
memalloc_nofs_{save,restore} checkpoints where they are appropriate.

[akpm@linux-foundation.org: fix comment typo, reflow comment]
Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03 15:52:09 -07:00
..
kasan kasan: report only the first error by default 2017-03-31 17:13:30 -07:00
backing-dev.c bdi: Drop 'parent' argument from bdi_register[_va]() 2017-04-20 12:09:55 -06:00
balloon_compaction.c
bootmem.c mm/bootmem.c: cosmetic improvement of code readability 2017-02-22 16:41:29 -08:00
cleancache.c
cma.c mm: cma: print allocation failure reason and bitmap status 2017-02-24 17:46:55 -08:00
cma.h
cma_debug.c mm: cma_alloc: allow to specify GFP mask 2017-02-24 17:46:55 -08:00
compaction.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
debug.c
debug_page_ref.c
dmapool.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
early_ioremap.c
fadvise.c mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED 2016-12-20 09:48:46 -08:00
failslab.c
filemap.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-02 11:18:50 -07:00
frame_vector.c
frontswap.c
gup.c Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" 2017-04-23 11:45:20 +02:00
highmem.c
huge_memory.c mm: reclaim MADV_FREE pages 2017-05-03 15:52:08 -07:00
hugetlb.c mm/hugetlb.c: don't call region_abort if region_chg fails 2017-03-31 17:13:30 -07:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h mm: use is_migrate_highatomic() to simplify the code 2017-05-03 15:52:08 -07:00
interval_tree.c
Kconfig mm: remove AVR32 arch special handling in mm/Kconfig 2017-05-01 09:36:31 +02:00
Kconfig.debug mm: add arch-independent testcases for RODATA 2017-02-27 18:43:48 -08:00
khugepaged.c mm: don't assume anonymous pages have SwapBacked flag 2017-05-03 15:52:08 -07:00
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: fix section name for .data..ro_after_init 2017-03-31 17:13:30 -07:00
ksm.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/coredump.h> 2017-03-02 08:42:28 +01:00
list_lru.c
maccess.c
madvise.c mm: enable MADV_FREE for swapless system 2017-05-03 15:52:08 -07:00
Makefile mm: add arch-independent testcases for RODATA 2017-02-27 18:43:48 -08:00
memblock.c mm/memblock.c: fix memblock_next_valid_pfn() 2017-03-09 17:01:10 -08:00
memcontrol.c mm: memcontrol: provide shmem statistics 2017-05-03 15:52:08 -07:00
memory-failure.c mm: delete unnecessary TTU_* flags 2017-05-03 15:52:08 -07:00
memory.c Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc 2017-04-02 10:33:48 -04:00
memory_hotplug.c mm: add private lock to serialize memory hotplug operations 2017-03-16 16:56:18 -07:00
mempolicy.c mm/mempolicy.c: fix error handling in set_mempolicy and mbind. 2017-04-08 10:57:55 -07:00
mempool.c
memtest.c
migrate.c mm: don't assume anonymous pages have SwapBacked flag 2017-05-03 15:52:08 -07:00
mincore.c mm: remove shmem_mapping() shmem_zero_setup() duplicates 2017-02-24 17:46:56 -08:00
mlock.c Merge branch 'prep-for-5level' 2017-03-10 08:59:07 -08:00
mm_init.c
mmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-02 15:20:00 -08:00
mmu_context.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
mmu_notifier.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
mmzone.c mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist() 2017-02-22 16:41:29 -08:00
mprotect.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
mremap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
msync.c
nobootmem.c
nommu.c Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
oom_kill.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> 2017-03-02 08:42:35 +01:00
page-writeback.c mm/page-writeback.c: use setup_deferrable_timer 2017-05-03 15:52:08 -07:00
page_alloc.c mm: introduce memalloc_nofs_{save,restore} API 2017-05-03 15:52:09 -07:00
page_counter.c
page_ext.c
page_idle.c mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs() 2017-02-24 17:46:55 -08:00
page_io.c
page_isolation.c mm: use is_migrate_isolate_page() to simplify the code 2017-05-03 15:52:08 -07:00
page_owner.c
page_poison.c
page_vma_mapped.c mm: fix page_vma_mapped_walk() for ksm pages 2017-04-08 00:47:48 -07:00
pagewalk.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
percpu-km.c
percpu-vm.c percpu: remove unused chunk_alloc parameter from pcpu_get_pages() 2017-03-06 15:56:55 -05:00
percpu.c Merge branch 'sched/core' into locking/core 2017-04-04 11:31:12 +02:00
pgtable-generic.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
process_vm_access.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
quicklist.c
readahead.c
rmap.c mm: fix lazyfree BUG_ON check in try_to_unmap_one() 2017-05-03 15:52:08 -07:00
rodata_test.c mm: add arch-independent testcases for RODATA 2017-02-27 18:43:48 -08:00
shmem.c Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
slab.c slab: avoid IPIs when creating kmem caches 2017-05-03 15:52:07 -07:00
slab.h slab: remove synchronous synchronize_sched() from memcg cache deactivation path 2017-02-22 16:41:27 -08:00
slab_common.c kasan: drain quarantine of memcg slab objects 2017-02-24 17:46:56 -08:00
slob.c slab: introduce __kmemcg_cache_deactivate() 2017-02-22 16:41:27 -08:00
slub.c slub: make sysfs directories for memcg sub-caches optional 2017-02-22 16:41:27 -08:00
sparse-vmemmap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
sparse.c mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next 2017-02-22 16:41:29 -08:00
swap.c mm: move MADV_FREE pages into LRU_INACTIVE_FILE list 2017-05-03 15:52:08 -07:00
swap_cgroup.c mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() 2017-04-08 00:47:49 -07:00
swap_slots.c mm, swap: Remove WARN_ON_ONCE() in free_swap_slot() 2017-03-21 14:13:19 -07:00
swap_state.c mm/swap: skip readahead only when swap slot cache is enabled 2017-02-22 16:41:30 -08:00
swapfile.c mm, swap: Fix a race in free_swap_and_cache() 2017-05-03 15:52:08 -07:00
truncate.c fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
usercopy.c mm/usercopy: Drop extra is_vmalloc_or_module() check 2017-04-05 12:30:18 -07:00
userfaultfd.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
util.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
vmacache.c sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
vmalloc.c A reasonably busy cycle for documentation this time around. There is a new 2017-05-02 10:21:17 -07:00
vmpressure.c mm: vmpressure: fix sending wrong events on underflow 2017-02-24 17:46:56 -08:00
vmscan.c mm: introduce memalloc_nofs_{save,restore} API 2017-05-03 15:52:09 -07:00
vmstat.c mm, vmstat: suppress pcp stats for unpopulated zones in zoneinfo 2017-05-03 15:52:08 -07:00
workingset.c mm: workingset: fix premature shadow node shrinking with cgroups 2017-03-31 17:13:30 -07:00
z3fold.c z3fold: fix page locking in z3fold_alloc() 2017-04-13 18:24:20 -07:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: expand class bit 2017-04-13 18:24:21 -07:00
zswap.c zswap: don't param_set_charp while holding spinlock 2017-02-27 18:43:45 -08:00