alistair23-linux/mm
Vladimir Davydov 8135be5a80 memcg: fix possible use-after-free in memcg_kmem_get_cache()
Suppose a task @t that belongs to memory cgroup @memcg is about to
allocate an object from a kmem cache @c.  The copy of @c corresponding to
@memcg, @mc, is empty.  Then, if kmem_cache_alloc races with destruction
of the memory cgroup, we can access the memory cgroup's copy of the cache
after it has been destroyed:

CPU0				CPU1
----				----
[ current=@t
  @mc->memcg_params->nr_pages=0 ]

kmem_cache_alloc(@c):
  call memcg_kmem_get_cache(@c);
  proceed to allocation from @mc:
    alloc a page for @mc:
      ...

				move @t from @memcg
				destroy @memcg:
				  mem_cgroup_css_offline(@memcg):
				    memcg_unregister_all_caches(@memcg):
				      kmem_cache_destroy(@mc)

    add page to @mc
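
The interleaving above can be replayed sequentially in a small user-space
model.  This is only a sketch: cache_model, memcg_model and every model_*
function are hypothetical names, not kernel code, and the model assumes
that css offline destroys a per-memcg cache only when it is empty, which
matches the nr_pages=0 precondition in the diagram.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-memcg cache (@mc in the diagram). */
struct cache_model {
	bool destroyed;	/* set by the modeled kmem_cache_destroy() */
	int nr_pages;	/* pages currently attached to the cache */
};

/* Hypothetical stand-in for the memory cgroup (@memcg). */
struct memcg_model {
	struct cache_model mc;
};

/* CPU0: look up the per-memcg cache; note that no reference is taken. */
static struct cache_model *model_get_cache(struct memcg_model *memcg)
{
	return &memcg->mc;
}

/* CPU1: css offline destroys the cache because it is empty (assumption). */
static void model_css_offline(struct memcg_model *memcg)
{
	if (memcg->mc.nr_pages == 0)
		memcg->mc.destroyed = true;
}

/* CPU0: attach the freshly allocated page; fails on a destroyed cache. */
static bool model_add_page(struct cache_model *mc)
{
	if (mc->destroyed)
		return false;	/* the real kernel would touch freed memory */
	mc->nr_pages++;
	return true;
}

/* Replays the diagram; returns true if the use-after-destroy is hit. */
static bool replay_race(void)
{
	struct memcg_model memcg = { .mc = { .destroyed = false, .nr_pages = 0 } };
	struct cache_model *mc;

	mc = model_get_cache(&memcg);	/* CPU0: memcg_kmem_get_cache(@c) */
	model_css_offline(&memcg);	/* CPU1: destroy @memcg */
	return !model_add_page(mc);	/* CPU0: add page to @mc */
}
```

Calling replay_race() from a test harness shows that the page is added
to a cache that CPU1 has already destroyed.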

We could fix this issue by taking a reference to a per-memcg cache, but
that would require adding a per-cpu reference counter to per-memcg caches,
which would look cumbersome.

Instead, let's take a reference to the memory cgroup, which already has a
per-cpu reference counter, at the beginning of kmem_cache_alloc and drop
it at the end, and move per-memcg cache destruction from css offline to
css free.  As a side effect, per-memcg caches will be destroyed not one
by one, but all at once when the last page accounted to the memory cgroup
is freed.  This doesn't seem like a high price for code readability,
though.
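
A minimal user-space sketch of this scheme follows.  It is not the patch
itself: all names are hypothetical, a plain int stands in for the per-cpu
reference counter, and the model assumes that the cgroup starts with one
base reference dropped at css offline and that every accounted page pins
the cgroup with a reference of its own.

```c
#include <assert.h>
#include <stdbool.h>

struct cache_model {
	bool destroyed;
	int nr_pages;
};

struct memcg_model {
	int refcnt;		/* plain int standing in for the per-cpu counter */
	bool offline;
	struct cache_model mc;
};

/* css free: the only place where per-memcg caches are destroyed now. */
static void model_css_free(struct memcg_model *memcg)
{
	memcg->mc.destroyed = true;
}

static void model_get(struct memcg_model *memcg)
{
	memcg->refcnt++;
}

static void model_put(struct memcg_model *memcg)
{
	if (--memcg->refcnt == 0)
		model_css_free(memcg);
}

/* css offline no longer destroys caches; it only drops the base reference. */
static void model_css_offline(struct memcg_model *memcg)
{
	memcg->offline = true;
	model_put(memcg);
}

/* Attach a page; the page holds its own reference (modeling assumption). */
static bool model_add_page(struct memcg_model *memcg)
{
	if (memcg->mc.destroyed)
		return false;
	memcg->mc.nr_pages++;
	model_get(memcg);
	return true;
}

static void model_free_page(struct memcg_model *memcg)
{
	memcg->mc.nr_pages--;
	model_put(memcg);
}

/* Replays the racy interleaving with the reference held across the alloc. */
static bool replay_with_fix(void)
{
	struct memcg_model memcg = { .refcnt = 1, .offline = false,
				     .mc = { .destroyed = false, .nr_pages = 0 } };
	bool alive_during_alloc, added, alive_with_page;

	model_get(&memcg);			/* CPU0: start of allocation */
	model_css_offline(&memcg);		/* CPU1: cgroup goes offline */
	alive_during_alloc = !memcg.mc.destroyed;
	added = model_add_page(&memcg);		/* CPU0: add page to @mc */
	model_put(&memcg);			/* CPU0: end of allocation */
	alive_with_page = !memcg.mc.destroyed;
	model_free_page(&memcg);		/* last page freed: css free runs */

	return alive_during_alloc && added && alive_with_page &&
	       memcg.mc.destroyed;
}
```

In this model the cache outlives both the allocation and the offline
step, and is destroyed only once the last page is freed.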

Note that this patch does add some overhead to the kmem_cache_alloc hot
path, but it is negligible: it's just a function call plus a per-cpu
counter decrement, which is comparable to what we already have in
memcg_kmem_get_cache.  Besides, it's only relevant if there are memory
cgroups with kmem accounting enabled.  I don't think we can find a way to
handle this race without it, because alloc_page called from
kmem_cache_alloc may sleep, so we can't flush all pending kmallocs
without reference counting.
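
To illustrate why a per-cpu counter keeps the hot path cheap while still
tolerating a sleeping allocation, here is a simplified model.  It is not
the kernel's percpu reference API: percpu_ref_model, MODEL_NR_CPUS and
the model_ref_* helpers are hypothetical, and the switch to a shared
atomic counter that a real implementation needs at teardown is left out.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* One counter slot per CPU; get/put touch only the local slot. */
struct percpu_ref_model {
	long count[MODEL_NR_CPUS];
};

static void model_ref_get(struct percpu_ref_model *ref, int cpu)
{
	ref->count[cpu]++;
}

static void model_ref_put(struct percpu_ref_model *ref, int cpu)
{
	ref->count[cpu]--;
}

/* Only the sum over all CPUs is meaningful; it is needed at teardown only. */
static long model_ref_sum(const struct percpu_ref_model *ref)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += ref->count[cpu];
	return sum;
}

/* A task takes the reference on CPU 0, sleeps, and drops it on CPU 2. */
static long demo_sleeping_alloc(void)
{
	struct percpu_ref_model ref = { { 0 } };

	model_ref_get(&ref, 0);		/* start of allocation on CPU 0 */
	/* the page allocation may sleep here; the task resumes on CPU 2 */
	model_ref_put(&ref, 2);		/* end of allocation on CPU 2 */

	/* Individual slots are unbalanced, but the total is zero. */
	assert(ref.count[0] == 1 && ref.count[2] == -1);
	return model_ref_sum(&ref);
}
```

Each get or put is a single local increment or decrement; the cost of
summing the slots is paid only when the cgroup is being torn down.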

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:49 -08:00
backing-dev.c Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block 2014-10-18 11:53:51 -07:00
balloon_compaction.c mm/balloon_compaction: fix deflation when compaction is disabled 2014-10-29 16:33:15 -07:00
bootmem.c mem-hotplug: reset node managed pages when hot-adding a new pgdat 2014-11-13 16:17:06 -08:00
cleancache.c
cma.c mm: cma: align to physical address, not CMA region position 2014-12-13 12:42:46 -08:00
compaction.c mm, compaction: more focused lru and pcplists draining 2014-12-10 17:41:06 -08:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c mm: move page->mem_cgroup bad page handling into generic code 2014-12-10 17:41:09 -08:00
dmapool.c mm/dmapool.c: fixed a brace coding style issue 2014-10-09 22:26:00 -04:00
early_ioremap.c
fadvise.c mm: fadvise: document the fadvise(FADV_DONTNEED) behaviour for partial pages 2014-12-13 12:42:49 -08:00
failslab.c
filemap.c mm: convert i_mmap_mutex to rwsem 2014-12-13 12:42:45 -08:00
filemap_xip.c mm/xip: share the i_mmap_rwsem 2014-12-13 12:42:45 -08:00
fremap.c mm: use new helper functions around the i_mmap_mutex 2014-12-13 12:42:45 -08:00
frontswap.c mm/frontswap.c: fix the condition in BUG_ON 2014-12-10 17:41:08 -08:00
gup.c mm: Update generic gup implementation to handle hugepage directory 2014-11-14 17:24:21 +11:00
highmem.c
huge_memory.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2014-12-11 17:30:55 -08:00
hugetlb.c hugetlb: hugetlb_register_all_nodes(): add __init marker 2014-12-13 12:42:47 -08:00
hugetlb_cgroup.c mm: hugetlb_cgroup: convert to lockless page counters 2014-12-10 17:41:04 -08:00
hwpoison-inject.c
init-mm.c
internal.h mm, compaction: always update cached scanner positions 2014-12-10 17:41:06 -08:00
interval_tree.c mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA 2014-10-09 22:25:57 -04:00
iov_iter.c copy_from_iter_nocache() 2014-12-08 20:25:23 -05:00
Kconfig mm/balloon_compaction: add vmstat counters and kpageflags bit 2014-10-09 22:26:01 -04:00
Kconfig.debug mm/debug-pagealloc: prepare boottime configurable on/off 2014-12-13 12:42:48 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c
kmemleak.c
ksm.c mm: ksm use pr_err instead of printk 2014-10-09 22:26:00 -04:00
list_lru.c
maccess.c
madvise.c
Makefile mm/page_owner: keep track of page owners 2014-12-13 12:42:48 -08:00
memblock.c mm/memblock.c: refactor functions to set/clear MEMBLOCK_HOTPLUG 2014-12-13 12:42:46 -08:00
memcontrol.c memcg: fix possible use-after-free in memcg_kmem_get_cache() 2014-12-13 12:42:49 -08:00
memory-failure.c mm: vmscan: invoke slab shrinkers from shrink_zone() 2014-12-13 12:42:48 -08:00
memory.c mm: export find_extend_vma() and handle_mm_fault() for driver use 2014-12-13 12:42:47 -08:00
memory_hotplug.c mm, memory_hotplug/failure: drain single zone pcplists 2014-12-10 17:41:05 -08:00
mempolicy.c mm: mempolicy: skip inaccessible VMAs when setting MPOL_MF_LAZY 2014-10-09 22:26:02 -04:00
mempool.c
migrate.c mm: unmapped page migration avoid unmap+remap overhead 2014-12-13 12:42:49 -08:00
mincore.c mm: mincore: add hwpoison page handle 2014-12-13 12:42:46 -08:00
mlock.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 15:44:12 +02:00
mm_init.c
mmap.c mm: export find_extend_vma() and handle_mm_fault() for driver use 2014-12-13 12:42:47 -08:00
mmu_context.c
mmu_notifier.c kvm: Fix page ageing bugs 2014-09-24 14:07:58 +02:00
mmzone.c
mprotect.c mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared 2014-10-14 02:18:28 +02:00
mremap.c mm: convert i_mmap_mutex to rwsem 2014-12-13 12:42:45 -08:00
msync.c
nobootmem.c mem-hotplug: reset node managed pages when hot-adding a new pgdat 2014-11-13 16:17:06 -08:00
nommu.c mm/nommu: use alloc_pages_exact() rather than its own implementation 2014-12-13 12:42:48 -08:00
oom_kill.c oom: kill the insufficient and no longer needed PT_TRACE_EXIT check 2014-12-13 12:42:49 -08:00
page-writeback.c mm, memcg: fix potential undefined behaviour in page stat accounting 2014-12-10 17:41:08 -08:00
page_alloc.c mm: remove the highmem zones' memmap in the highmem zone 2014-12-13 12:42:49 -08:00
page_counter.c mm: memcontrol: remove obsolete kmemcg pinning tricks 2014-12-10 17:41:05 -08:00
page_ext.c mm/page_owner: keep track of page owners 2014-12-13 12:42:48 -08:00
page_io.c
page_isolation.c mm, page_isolation: drain single zone pcplists 2014-12-10 17:41:05 -08:00
page_owner.c mm/page_owner: correct owner information for early allocated pages 2014-12-13 12:42:48 -08:00
pagewalk.c mm: use VM_BUG_ON_MM where possible 2014-10-09 22:25:58 -04:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: off by one in BUG_ON() 2014-10-29 10:34:34 -04:00
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c mm/rmap: calculate page offset when needed 2014-12-13 12:42:46 -08:00
shmem.c shmem: support RENAME_WHITEOUT 2014-10-24 00:14:37 +02:00
slab.c memcg: fix possible use-after-free in memcg_kmem_get_cache() 2014-12-13 12:42:49 -08:00
slab.h memcg: use generic slab iterators for showing slabinfo 2014-12-10 17:41:07 -08:00
slab_common.c memcg: use generic slab iterators for showing slabinfo 2014-12-10 17:41:07 -08:00
slob.c mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller() 2014-10-09 22:25:50 -04:00
slub.c memcg: fix possible use-after-free in memcg_kmem_get_cache() 2014-12-13 12:42:49 -08:00
sparse-vmemmap.c
sparse.c
swap.c mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache 2014-10-09 22:25:59 -04:00
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swapfile.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
truncate.c mm: Fix comment before truncate_setsize() 2014-11-07 08:29:25 +11:00
util.c proc/maps: make vm_is_stack() logic namespace-friendly 2014-10-09 22:25:50 -04:00
vmacache.c mm,vmacache: count number of system-wide flushes 2014-12-13 12:42:48 -08:00
vmalloc.c mm/vmalloc.c: fix memory ordering bug 2014-12-13 12:42:49 -08:00
vmpressure.c mm/vmpressure.c: fix race in vmpressure_work_fn() 2014-12-02 17:32:07 -08:00
vmscan.c mm: vmscan: invoke slab shrinkers from shrink_zone() 2014-12-13 12:42:48 -08:00
vmstat.c mm,vmacache: count number of system-wide flushes 2014-12-13 12:42:48 -08:00
workingset.c
zbud.c Merge Linus' tree to be be to apply submitted patches to newer code than 2014-11-20 14:42:02 +01:00
zpool.c
zsmalloc.c zsmalloc: simplify init_zspage free obj linking 2014-10-09 22:26:03 -04:00
zswap.c Merge Linus' tree to be be to apply submitted patches to newer code than 2014-11-20 14:42:02 +01:00