alistair23-linux/mm
Michal Hocko 2f064f3485 mm: make page pfmemalloc check more robust
Commit c48a11c7ad ("netvm: propagate page->pfmemalloc to skb") added
checks for page->pfmemalloc to __skb_fill_page_desc():

        if (page->pfmemalloc && !page->mapping)
                skb->pfmemalloc = true;

It assumes page->mapping == NULL implies that page->pfmemalloc can be
trusted.  However, __delete_from_page_cache() can set set page->mapping
to NULL and leave page->index value alone.  Due to being in union, a
non-zero page->index will be interpreted as true page->pfmemalloc.

So the assumption is invalid if the networking code can see such a page.
And it seems it can.  We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf.  There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.

The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead.  We can reuse the index
again but define it to an impossible value (-1UL).  This is the page
index so it should never see the value that large.  Replace all direct
users of page->pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.

The information will get lost if somebody wants to use page->index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g.  what SLAB and SLUB do).

[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad ("netvm: propagate page->pfmemalloc to skb")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.com>
Debugged-by: Jiri Bohac <jbohac@suse.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>	[3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-21 14:30:10 -07:00
..
kasan .mailmap: Andrey Ryabinin has moved 2015-08-14 15:56:32 -07:00
backing-dev.c writeback: don't drain bdi_writeback_congested on bdi destruction 2015-07-02 08:46:00 -06:00
balloon_compaction.c
bootmem.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
cleancache.c
cma.c mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute 2015-06-24 17:49:44 -07:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
compaction.c
debug-pagealloc.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
failslab.c
filemap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
frontswap.c frontswap: allow multiple backends 2015-06-24 17:49:45 -07:00
gup.c
highmem.c
huge_memory.c mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_* 2015-08-07 04:39:42 +03:00
hugetlb.c mm/hugetlb: remove unused arch hook prepare/release_hugepage 2015-06-25 17:00:35 -07:00
hugetlb_cgroup.c
hwpoison-inject.c mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling 2015-06-24 17:49:42 -07:00
init-mm.c
internal.h mm: meminit: finish initialisation of struct pages before basic setup 2015-06-30 19:44:56 -07:00
interval_tree.c
Kconfig mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set 2015-06-30 19:44:56 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() 2015-06-24 17:49:46 -07:00
ksm.c
list_lru.c
maccess.c
madvise.c
Makefile
memblock.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
memcontrol.c Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block 2015-06-25 16:00:17 -07:00
memory-failure.c mm/hwpoison: fix panic due to split huge zero page 2015-08-14 15:56:32 -07:00
memory.c mm: avoid setting up anonymous pages into file mapping 2015-07-09 11:12:48 -07:00
memory_hotplug.c memory-hotplug: fix wrong edge when hot add a new node 2015-08-14 15:56:32 -07:00
mempolicy.c mm, thp: respect MPOL_PREFERRED policy with non-local node 2015-06-24 17:49:46 -07:00
mempool.c
memtest.c mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute 2015-06-24 17:49:44 -07:00
migrate.c mm/memory-failure: set PageHWPoison before migrate_pages() 2015-08-07 04:39:42 +03:00
mincore.c
mlock.c
mm_init.c mm: meminit: remove mminit_verify_page_links 2015-06-30 19:44:56 -07:00
mmap.c mm/mmap.c: optimization of do_mmap_pgoff function 2015-06-24 17:49:45 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c mm: fix mprotect() behaviour on VM_LOCKED VMAs 2015-06-24 17:49:41 -07:00
mremap.c mm: new arch_remap() hook 2015-06-24 17:49:41 -07:00
msync.c
nobootmem.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
nommu.c Replace module_init with appropriate alternate initcall in non modules. 2015-07-02 10:36:29 -07:00
oom_kill.c mm/oom_kill.c: print points as unsigned int 2015-06-24 17:49:44 -07:00
page-writeback.c writeback: fix initial dirty limit 2015-08-07 04:39:42 +03:00
page_alloc.c mm: make page pfmemalloc check more robust 2015-08-21 14:30:10 -07:00
page_counter.c
page_ext.c
page_io.c
page_isolation.c
page_owner.c mm/page_owner: set correct gfp_mask on page_owner 2015-07-17 16:39:54 -07:00
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() 2015-06-24 17:49:46 -07:00
pgtable-generic.c mm: clarify that the function operates on hugepage pte 2015-06-24 17:49:44 -07:00
process_vm_access.c
quicklist.c
readahead.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
rmap.c Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block 2015-06-25 16:00:17 -07:00
shmem.c ipc: use private shmem or hugetlbfs inodes for shm segments. 2015-08-07 04:39:41 +03:00
slab.c mm: make page pfmemalloc check more robust 2015-08-21 14:30:10 -07:00
slab.h slab: correct size_index table before replacing the bootstrap kmem_cache_node 2015-06-24 17:49:41 -07:00
slab_common.c mm/slub: allow merging when SLAB_DEBUG_FREE is set 2015-08-07 04:39:40 +03:00
slob.c
slub.c mm: make page pfmemalloc check more robust 2015-08-21 14:30:10 -07:00
sparse-vmemmap.c
sparse.c
swap.c mm: drop bogus VM_BUG_ON_PAGE assert in put_page() codepath 2015-06-24 17:49:42 -07:00
swap_cgroup.c
swap_state.c
swapfile.c vfs: add seq_file_path() helper 2015-06-23 18:01:07 -04:00
truncate.c
util.c
vmacache.c
vmalloc.c
vmpressure.c
vmscan.c mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations 2015-08-05 10:49:38 +02:00
vmstat.c
workingset.c
zbud.c zpool: remove zpool_evict() 2015-06-25 17:00:37 -07:00
zpool.c zpool: remove zpool_evict() 2015-06-25 17:00:37 -07:00
zsmalloc.c zpool: remove zpool_evict() 2015-06-25 17:00:37 -07:00
zswap.c zswap: runtime enable/disable 2015-06-25 17:00:37 -07:00