1
0
Fork 0
alistair23-linux/mm
Glauber Costa 5413dfba88 slub: drop mutex before deleting sysfs entry
Sasha Levin recently reported a lockdep problem resulting from the new
attribute propagation introduced by kmemcg series.  In short, slab_mutex
will be called from within the sysfs attribute store function.  This will
create a dependency, that will later be held backwards when a cache is
destroyed - since destruction occurs with the slab_mutex held, and then
calls in to the sysfs directory removal function.

In this patch, I propose to adopt a strategy close to what
__kmem_cache_create does before calling sysfs_slab_add, and release the
lock before the call to sysfs_slab_remove.  This is pretty much the last
operation in the kmem_cache_shutdown() path, so we could do better by
splitting this and moving this call alone to later on.  This will fit
nicely when sysfs handling is consistent between all caches, but will look
weird now.

Lockdep info:

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117 Tainted: G        W
  -------------------------------------------------------
  trinity-child13/6961 is trying to acquire lock:
   (s_active#43){++++.+}, at:  sysfs_addrm_finish+0x31/0x60

  but task is already holding lock:
   (slab_mutex){+.+.+.}, at:  kmem_cache_destroy+0x22/0xe0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:
  -> #1 (slab_mutex){+.+.+.}:
          lock_acquire+0x1aa/0x240
          __mutex_lock_common+0x59/0x5a0
          mutex_lock_nested+0x3f/0x50
          slab_attr_store+0xde/0x110
          sysfs_write_file+0xfa/0x150
          vfs_write+0xb0/0x180
          sys_pwrite64+0x60/0xb0
          tracesys+0xe1/0xe6
  -> #0 (s_active#43){++++.+}:
          __lock_acquire+0x14df/0x1ca0
          lock_acquire+0x1aa/0x240
          sysfs_deactivate+0x122/0x1a0
          sysfs_addrm_finish+0x31/0x60
          sysfs_remove_dir+0x89/0xd0
          kobject_del+0x16/0x40
          __kmem_cache_shutdown+0x40/0x60
          kmem_cache_destroy+0x40/0xe0
          mon_text_release+0x78/0xe0
          __fput+0x122/0x2d0
          ____fput+0x9/0x10
          task_work_run+0xbe/0x100
          do_exit+0x432/0xbd0
          do_group_exit+0x84/0xd0
          get_signal_to_deliver+0x81d/0x930
          do_signal+0x3a/0x950
          do_notify_resume+0x3e/0x90
          int_signal+0x12/0x17

  other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(slab_mutex);
                                 lock(s_active#43);
                                 lock(slab_mutex);
    lock(s_active#43);

   *** DEADLOCK ***

  2 locks held by trinity-child13/6961:
   #0:  (mon_lock){+.+.+.}, at:  mon_text_release+0x25/0xe0
   #1:  (slab_mutex){+.+.+.}, at:  kmem_cache_destroy+0x22/0xe0

  stack backtrace:
  Pid: 6961, comm: trinity-child13 Tainted: G        W    3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117
  Call Trace:
    print_circular_bug+0x1fb/0x20c
    __lock_acquire+0x14df/0x1ca0
    lock_acquire+0x1aa/0x240
    sysfs_deactivate+0x122/0x1a0
    sysfs_addrm_finish+0x31/0x60
    sysfs_remove_dir+0x89/0xd0
    kobject_del+0x16/0x40
    __kmem_cache_shutdown+0x40/0x60
    kmem_cache_destroy+0x40/0xe0
    mon_text_release+0x78/0xe0
    __fput+0x122/0x2d0
    ____fput+0x9/0x10
    task_work_run+0xbe/0x100
    do_exit+0x432/0xbd0
    do_group_exit+0x84/0xd0
    get_signal_to_deliver+0x81d/0x930
    do_signal+0x3a/0x950
    do_notify_resume+0x3e/0x90
    int_signal+0x12/0x17

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 15:02:15 -08:00
..
Kconfig memory-hotplug: document and enable CONFIG_MOVABLE_NODE 2012-12-18 15:02:12 -08:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
Makefile mm: introduce a common interface for balloon pages mobility 2012-12-11 17:22:26 -08:00
backing-dev.c Revert "bdi: add a user-tunable cpu_list for the bdi flusher threads" 2012-12-17 11:29:09 -08:00
balloon_compaction.c mm: introduce a common interface for balloon pages mobility 2012-12-11 17:22:26 -08:00
bootmem.c mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic() 2012-12-12 17:38:35 -08:00
bounce.c bounce: allow use of bounce pool via config option 2012-07-18 16:40:35 -04:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c dmapool: make DMAPOOL_DEBUG detect corruption of free marker 2012-12-11 17:22:24 -08:00
fadvise.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c readahead: fault retry breaks mmap file read random detection 2012-10-09 16:22:47 +09:00
filemap_xip.c mm: move all mmu notifier invocations to be done outside the PT lock 2012-10-09 16:22:58 +09:00
fremap.c remap_file_pages: correctly handle the case of a NULL vm_ops pointer 2012-10-19 13:37:57 -07:00
frontswap.c frontswap: support exclusive gets if tmem backend is capable 2012-09-21 10:38:12 -04:00
highmem.c mm, highmem: get virtual address of the page using PKMAP_ADDR() 2012-12-11 17:22:24 -08:00
huge_memory.c mm: fix kernel BUG at huge_memory.c:1474! 2012-12-16 19:02:38 -08:00
hugetlb.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
hugetlb_cgroup.c cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() 2012-11-19 08:13:38 -08:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
interval_tree.c mm: add CONFIG_DEBUG_VM_RB build option 2012-10-09 16:22:42 +09:00
kmemcheck.c
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: use rbtree instead of prio tree 2012-10-09 16:22:39 +09:00
ksm.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: prepare VM_DONTDUMP for using in drivers 2012-10-09 16:22:18 +09:00
memblock.c x86, mm: Trim memory in memblock to be page aligned 2012-10-24 11:52:21 -07:00
memcontrol.c slab: propagate tunable values 2012-12-18 15:02:14 -08:00
memory-failure.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
memory.c mm: use kbasename() 2012-12-17 17:15:17 -08:00
memory_hotplug.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
mempolicy.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
mempool.c mempool: add @gfp_mask to mempool_create_node() 2012-06-25 11:53:47 +02:00
migrate.c mm,numa: fix update_mmu_cache_pmd call 2012-12-17 19:37:03 -08:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c mm, thp: fix mlock statistics 2012-10-09 16:23:03 +09:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm/mmu_notifier: allocate mmu_notifier in advance 2012-10-25 14:37:53 -07:00
mmzone.c memcg: fix hotplugged memory zone oops 2012-11-16 14:33:04 -08:00
mprotect.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
mremap.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
msync.c
nobootmem.c mm: introduce new field "managed_pages" to struct zone 2012-12-12 17:38:34 -08:00
nommu.c mm: export a function to get vm committed memory 2012-11-15 15:41:22 -08:00
oom_kill.c mm, oom: remove redundant sleep in pagefault oom handler 2012-12-12 17:38:34 -08:00
page-writeback.c writeback: remove nr_pages_dirtied arg from balance_dirty_pages_ratelimited_nr() 2012-12-11 17:22:21 -08:00
page_alloc.c mm: allocate kernel pages to the right memcg 2012-12-18 15:02:12 -08:00
page_cgroup.c memcontrol: use N_MEMORY instead N_HIGH_MEMORY 2012-12-12 17:38:32 -08:00
page_io.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
page_isolation.c memory-hotplug: skip HWPoisoned page when offlining pages 2012-12-11 17:22:22 -08:00
pagewalk.c thp: change split_huge_page_pmd() interface 2012-12-12 17:38:31 -08:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c mm, percpu: Make sure percpu_alloc early parameter has an argument 2012-12-02 06:23:04 -08:00
pgtable-generic.c mm: Only flush the TLB when clearing an accessible pte 2012-12-11 14:28:34 +00:00
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
rmap.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
shmem.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
slab.c memcg: add comments clarifying aspects of cache attribute propagation 2012-12-18 15:02:15 -08:00
slab.h slab: propagate tunable values 2012-12-18 15:02:14 -08:00
slab_common.c slab: propagate tunable values 2012-12-18 15:02:14 -08:00
slob.c sl[au]b: always get the cache from its page in kmem_cache_free() 2012-12-18 15:02:14 -08:00
slub.c slub: drop mutex before deleting sysfs entry 2012-12-18 15:02:15 -08:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c memory-hotplug, mm/sparse.c: clear the memory to store struct page 2012-12-11 17:22:23 -08:00
swap.c mm: remove vma arg from page_evictable 2012-10-09 16:22:55 +09:00
swap_state.c mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages 2012-07-31 18:42:47 -07:00
swapfile.c mm, oom: fix race when specifying a thread as the oom origin 2012-12-11 17:22:27 -08:00
truncate.c mm: use clear_page_mlock() in page_remove_rmap() 2012-10-09 16:22:56 +09:00
util.c Merge branch 'master' into for-next 2012-10-28 19:29:19 +01:00
vmalloc.c mm: use IS_ENABLED(CONFIG_NUMA) instead of NUMA_BUILD 2012-12-11 17:22:22 -08:00
vmscan.c vmscan: use N_MEMORY instead N_HIGH_MEMORY 2012-12-12 17:38:33 -08:00
vmstat.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00