alistair23-linux/mm
Michal Hocko ecc736fc3c memcg: fix endless loop caused by mem_cgroup_iter
Hugh has reported an endless loop when the hardlimit reclaim sees the
same group all the time.  This might happen when the reclaim races with
the memcg removal.

shrink_zone
                                                [rmdir root]
  mem_cgroup_iter(root, NULL, reclaim)
    // prev = NULL
    rcu_read_lock()
    mem_cgroup_iter_load
      last_visited = iter->last_visited   // gets root || NULL
      css_tryget(last_visited)            // failed
      last_visited = NULL                 [1]
    memcg = root = __mem_cgroup_iter_next(root, NULL)
    mem_cgroup_iter_update
      iter->last_visited = root;
    reclaim->generation = iter->generation

 mem_cgroup_iter(root, root, reclaim)
   // prev = root
   rcu_read_lock
    mem_cgroup_iter_load
      last_visited = iter->last_visited   // gets root
      css_tryget(last_visited)            // failed
    [1]

The issue seemed to be introduced by commit 5f57816197 ("memcg: relax
memcg iter caching") which has replaced unconditional css_get/css_put by
css_tryget/css_put for the cached iterator.

This patch fixes the issue by skipping css_tryget on the root of the
tree walk in mem_cgroup_iter_load and symmetrically doesn't release it
in mem_cgroup_iter_update.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Hugh Dickins <hughd@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: <stable@vger.kernel.org>	[3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:53 -08:00
..
backing-dev.c
balloon_compaction.c mm: print more details for bad_page() 2014-01-23 16:36:50 -08:00
bootmem.c mm/bootmem.c: remove unused local `map' 2013-11-13 12:09:09 +09:00
bounce.c mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored 2013-09-30 14:31:02 -07:00
cleancache.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
compaction.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
filemap_xip.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
fremap.c mm: fix use-after-free in sys_remap_file_pages 2014-01-02 14:40:30 -08:00
frontswap.c
highmem.c
huge_memory.c mm: audit/fix non-modular users of module_init in core code 2014-01-23 16:36:52 -08:00
hugetlb.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
hugetlb_cgroup.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
hwpoison-inject.c mm/hwpoison: add '#' to hwpoison_inject 2014-01-21 16:19:48 -08:00
init-mm.c
internal.h mm: show message when updating min_free_kbytes in thp 2014-01-23 16:36:52 -08:00
interval_tree.c
Kconfig mm: add missing dependency in Kconfig 2013-12-18 19:04:52 -08:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: kmemleak: avoid false negatives on vmalloc'ed objects 2013-11-13 12:09:07 +09:00
ksm.c mm: audit/fix non-modular users of module_init in core code 2014-01-23 16:36:52 -08:00
list_lru.c mm: list_lru: fix almost infinite loop causing effective livelock 2013-10-30 12:57:46 -07:00
maccess.c
madvise.c mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood 2013-09-30 14:31:02 -07:00
Makefile
memblock.c mm/nobootmem: free_all_bootmem again 2014-01-23 16:36:52 -08:00
memcontrol.c memcg: fix endless loop caused by mem_cgroup_iter 2014-01-23 16:36:53 -08:00
memory-failure.c mm/memory-failure.c: shift page lock from head page to tail page after thp split 2014-01-23 16:36:52 -08:00
memory.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
memory_hotplug.c mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug 2014-01-23 16:36:52 -08:00
mempolicy.c mm: new_vma_page() cannot see NULL vma for hugetlb pages 2014-01-23 16:36:52 -08:00
mempool.c
migrate.c sched/numa: fix setting of cpupid on page migration twice 2014-01-23 16:36:52 -08:00
mincore.c mm: do_mincore() cleanup 2014-01-23 16:36:52 -08:00
mlock.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
mm_init.c mm/mm_init.c: make creation of the mm_kobj happen earlier than device_initcall 2014-01-23 16:36:52 -08:00
mmap.c mm: audit/fix non-modular users of module_init in core code 2014-01-23 16:36:52 -08:00
mmu_context.c
mmu_notifier.c mm: audit/fix non-modular users of module_init in core code 2014-01-23 16:36:52 -08:00
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c mm: numa: do not automatically migrate KSM pages 2014-01-21 16:19:48 -08:00
mremap.c mm: revert mremap pud_free anti-fix 2013-10-16 21:35:53 -07:00
msync.c
nobootmem.c mm/nobootmem: free_all_bootmem again 2014-01-23 16:36:52 -08:00
nommu.c mm: add overcommit_kbytes sysctl variable 2014-01-21 16:19:44 -08:00
oom_kill.c mm, oom: prefer thread group leaders for display purposes 2014-01-23 16:36:53 -08:00
page-writeback.c writeback: fix negative bdi max pause 2013-10-16 21:35:53 -07:00
page_alloc.c mm: show message when updating min_free_kbytes in thp 2014-01-23 16:36:52 -08:00
page_cgroup.c Merge branch 'akpm' (incoming from Andrew) 2014-01-21 19:05:45 -08:00
page_io.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
page_isolation.c
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c
percpu.c Merge branch 'akpm' (incoming from Andrew) 2014-01-21 19:05:45 -08:00
pgtable-generic.c mm: fix TLB flush race between migration, and change_protection_range 2013-12-18 19:04:51 -08:00
process_vm_access.c
quicklist.c
readahead.c readahead: fix sequential read cache miss detection 2013-11-13 12:09:09 +09:00
rmap.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
shmem.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
slab.c Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2013-11-22 08:10:34 -08:00
slab.h memcg, slab: RCU protect memcg_params for root caches 2014-01-23 16:36:51 -08:00
slab_common.c slab: do not panic if we fail to create memcg cache 2014-01-23 16:36:51 -08:00
slob.c
slub.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
swap.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
swap_state.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
swapfile.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
truncate.c truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
util.c mm: add overcommit_kbytes sysctl variable 2014-01-21 16:19:44 -08:00
vmalloc.c mm/vmalloc: interchage the implementation of vmalloc_to_{pfn,page} 2014-01-21 16:19:44 -08:00
vmpressure.c memcg: make cgroup_event deal with mem_cgroup instead of cgroup_subsys_state 2013-11-22 18:20:43 -05:00
vmscan.c mm: vmscan: call NUMA-unaware shrinkers irrespective of nodemask 2014-01-23 16:36:52 -08:00
vmstat.c mm: numa: return the number of base pages altered by protection changes 2013-11-13 12:09:11 +09:00
zbud.c
zswap.c mm/zswap.c: change params from hidden to ro 2014-01-23 16:36:50 -08:00