alistair23-linux/mm
Mel Gorman dc9086004b mm: compaction: check for overlapping nodes during isolation for migration
When isolating pages for migration, migration starts at the start of a
zone while the free scanner starts at the end of the zone.  Migration
avoids entering a new zone by never going beyond the free scanned.

Unfortunately, in very rare cases nodes can overlap.  When this happens,
migration isolates pages without the LRU lock held, corrupting lists
which will trigger errors in reclaim or during page free such as in the
following oops

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: [<ffffffff810f795c>] free_pcppages_bulk+0xcc/0x450
  PGD 1dda554067 PUD 1e1cb58067 PMD 0
  Oops: 0000 [#1] SMP
  CPU 37
  Pid: 17088, comm: memcg_process_s Tainted: G            X
  RIP: free_pcppages_bulk+0xcc/0x450
  Process memcg_process_s (pid: 17088, threadinfo ffff881c2926e000, task ffff881c2926c0c0)
  Call Trace:
    free_hot_cold_page+0x17e/0x1f0
    __pagevec_free+0x90/0xb0
    release_pages+0x22a/0x260
    pagevec_lru_move_fn+0xf3/0x110
    putback_lru_page+0x66/0xe0
    unmap_and_move+0x156/0x180
    migrate_pages+0x9e/0x1b0
    compact_zone+0x1f3/0x2f0
    compact_zone_order+0xa2/0xe0
    try_to_compact_pages+0xdf/0x110
    __alloc_pages_direct_compact+0xee/0x1c0
    __alloc_pages_slowpath+0x370/0x830
    __alloc_pages_nodemask+0x1b1/0x1c0
    alloc_pages_vma+0x9b/0x160
    do_huge_pmd_anonymous_page+0x160/0x270
    do_page_fault+0x207/0x4c0
    page_fault+0x25/0x30

The "X" in the taint flag means that external modules were loaded but but
is unrelated to the bug triggering.  The real problem was because the PFN
layout looks like this

  Zone PFN ranges:
    DMA      0x00000010 -> 0x00001000
    DMA32    0x00001000 -> 0x00100000
    Normal   0x00100000 -> 0x01e80000
  Movable zone start PFN for each node
  early_node_map[14] active PFN ranges
      0: 0x00000010 -> 0x0000009b
      0: 0x00000100 -> 0x0007a1ec
      0: 0x0007a354 -> 0x0007a379
      0: 0x0007f7ff -> 0x0007f800
      0: 0x00100000 -> 0x00680000
      1: 0x00680000 -> 0x00e80000
      0: 0x00e80000 -> 0x01080000
      1: 0x01080000 -> 0x01280000
      0: 0x01280000 -> 0x01480000
      1: 0x01480000 -> 0x01680000
      0: 0x01680000 -> 0x01880000
      1: 0x01880000 -> 0x01a80000
      0: 0x01a80000 -> 0x01c80000
      1: 0x01c80000 -> 0x01e80000

The fix is straight-forward.  isolate_migratepages() has to make a
similar check to isolate_freepage to ensure that it never isolates pages
from a zone it does not hold the LRU lock for.

This was discovered in a 3.0-based kernel but it affects 3.1.x, 3.2.x
and current mainline.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-08 19:03:51 -08:00
..
backing-dev.c freezer: implement and use kthread_freezable_should_stop() 2011-11-21 12:32:23 -08:00
bootmem.c mm: bootmem: try harder to free pages in bulk 2012-01-10 16:30:45 -08:00
bounce.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
cleancache.c
compaction.c mm: compaction: check for overlapping nodes during isolation for migration 2012-02-08 19:03:51 -08:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c
fadvise.c fadvise: only initiate writeback for specified range with FADV_DONTNEED 2012-01-10 16:30:43 -08:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c readahead: fix pipeline break caused by block plug 2012-02-03 16:16:41 -08:00
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-03 16:16:41 -08:00
fremap.c
highmem.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
huge_memory.c memcg: fix split_huge_page_refcounts() 2012-01-12 20:13:09 -08:00
hugetlb.c mm/hugetlb.c: undo change to page mapcount in fault handler 2012-01-23 08:38:48 -08:00
hwpoison-inject.c
init-mm.c
internal.h mm: thp: tail page refcounting fix 2011-11-02 16:06:57 -07:00
Kconfig Merge branch 'master' into x86/memblock 2011-11-28 09:46:22 -08:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c memcg: clear pc->mem_cgroup if necessary. 2012-01-12 20:13:07 -08:00
maccess.c
madvise.c
Makefile
memblock.c memblock: Fix alloc failure due to dumb underflow protection in memblock_find_in_range_node() 2012-01-16 08:38:06 +01:00
memcontrol.c mm/memcontrol.c: fix warning with CONFIG_NUMA=n 2012-02-03 16:16:40 -08:00
memory-failure.c mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
memory.c mm: fix rss count leakage during migration 2012-01-23 08:38:49 -08:00
memory_hotplug.c mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
mempolicy.c mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c mm: postpone migrated page mapping reset 2012-02-03 16:16:40 -08:00
mincore.c
mlock.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
mm_init.c
mmap.c mm: simplify find_vma_prev() 2012-01-10 16:30:44 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c mremap: enforce rmap src/dst vma ordering in case of vma_merge() succeeding in copy_vma() 2012-01-10 16:30:44 -08:00
msync.c
nobootmem.c Merge branch 'master' into x86/memblock 2011-11-28 09:46:22 -08:00
nommu.c xen: map foreign pages for shared rings by updating the PTEs directly 2011-11-16 12:13:08 -05:00
oom_kill.c mm: unify remaining mem_cont, mem, etc. variable names to memcg 2012-01-12 20:13:06 -08:00
page-writeback.c Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux 2012-01-10 16:59:59 -08:00
page_alloc.c mm: __count_immobile_pages(): make sure the node is online 2012-01-23 08:38:47 -08:00
page_cgroup.c page_cgroup: drop multi CONFIG_MEMORY_HOTPLUG 2012-01-12 20:13:08 -08:00
page_io.c
page_isolation.c
pagewalk.c
percpu-km.c
percpu-vm.c percpu: fix chunk range calculation 2011-11-22 08:09:46 -08:00
percpu.c Kmemleak patches 2012-01-14 18:11:11 -08:00
pgtable-generic.c
prio_tree.c
process_vm_access.c Fix race in process_vm_rw_core 2012-02-02 12:55:17 -08:00
quicklist.c
readahead.c
rmap.c mm: unify remaining mem_cont, mem, etc. variable names to memcg 2012-01-12 20:13:06 -08:00
shmem.c SHM_UNLOCK: fix Unevictable pages stranded after swap 2012-01-23 08:38:48 -08:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-01-11 18:52:23 -08:00
slob.c
slub.c mm,x86,um: move CMPXCHG_DOUBLE config option 2012-01-12 20:13:03 -08:00
sparse-vmemmap.c
sparse.c
swap.c mm: remove del_page_from_lru, add page_off_lru 2012-01-12 20:13:10 -08:00
swap_state.c memcg: clear pc->mem_cgroup if necessary. 2012-01-12 20:13:07 -08:00
swapfile.c mm: unify remaining mem_cont, mem, etc. variable names to memcg 2012-01-12 20:13:06 -08:00
thrash.c
truncate.c
util.c
vmalloc.c mm/vmalloc.c: eliminate extra loop in pcpu_get_vm_areas error path 2012-01-12 20:13:10 -08:00
vmscan.c SHM_UNLOCK: fix Unevictable pages stranded after swap 2012-01-23 08:38:48 -08:00
vmstat.c mm,x86,um: move CMPXCHG_LOCAL config option 2012-01-12 20:13:03 -08:00