remarkable-linux/mm
Ravikiran G Thirumalai 8f5be20bf8 [PATCH] mm: slab: eliminate lock_cpu_hotplug from slab
Here's an attempt towards doing away with lock_cpu_hotplug in the slab
subsystem.  This approach also fixes a bug which shows up when cpus are
being offlined/onlined and slab caches are being tuned simultaneously.

http://marc.theaimsgroup.com/?l=linux-kernel&m=116098888100481&w=2

The patch has been stress tested overnight on a 2 socket 4 core AMD box with
repeated cpu online and offline, while dbench and kernbench process are
running, and slab caches being tuned at the same time.
There were no lockdep warnings either.  (This test on 2,6.18 as 2.6.19-rc
crashes at __drain_pages
http://marc.theaimsgroup.com/?l=linux-kernel&m=116172164217678&w=2 )

The approach here is to hold cache_chain_mutex from CPU_UP_PREPARE until
CPU_ONLINE (similar in approach as worqueue_mutex) .  Slab code sensitive
to cpu_online_map (kmem_cache_create, kmem_cache_destroy, slabinfo_write,
__cache_shrink) is already serialized with cache_chain_mutex.  (This patch
lengthens cache_chain_mutex hold time at kmem_cache_destroy to cover this).
 This patch also takes the cache_chain_sem at kmem_cache_shrink to protect
sanity of cpu_online_map at __cache_shrink, as viewed by slab.
(kmem_cache_shrink->__cache_shrink->drain_cpu_caches).  But, really,
kmem_cache_shrink is used at just one place in the acpi subsystem!  Do we
really need to keep kmem_cache_shrink at all?

Another note.  Looks like a cpu hotplug event can send  CPU_UP_CANCELED to
a registered subsystem even if the subsystem did not receive CPU_UP_PREPARE.
This could be due to a subsystem registered for notification earlier than
the current subsystem crapping out with NOTIFY_BAD. Badness can occur with
in the CPU_UP_CANCELED code path at slab if this happens (The same would
apply for workqueue.c as well).  To overcome this, we might have to use either
a) a per subsystem flag and avoid handling of CPU_UP_CANCELED, or
b) Use a special notifier events like LOCK_ACQUIRE/RELEASE as Gautham was
   using in his experiments, or
c) Do not send CPU_UP_CANCELED to a subsystem which did not receive
   CPU_UP_PREPARE.

I would prefer c).

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:21 -08:00
..
allocpercpu.c [PATCH] Extract the allocpercpu functions from the slab allocator 2006-09-26 08:48:51 -07:00
backing-dev.c [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
bootmem.c [PATCH] bootmem: use MAX_DMA_ADDRESS instead of LOW32LIMIT 2006-09-26 08:48:49 -07:00
bounce.c [PATCH] BLOCK: Separate the bounce buffering code from the highmem code [try #6] 2006-09-30 20:32:11 +02:00
fadvise.c [PATCH] fadvise() make POSIX_FADV_NOREUSE a no-op 2006-08-06 08:57:47 -07:00
filemap.c [PATCH] grab swap token reordered 2006-12-07 08:39:21 -08:00
filemap.h Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
filemap_xip.c [PATCH] mark address_space_operations const 2006-06-28 14:59:04 -07:00
fremap.c [PATCH] paravirt: pte clear not present 2006-10-01 00:39:33 -07:00
highmem.c [PATCH] BLOCK: Separate the bounce buffering code from the highmem code [try #6] 2006-09-30 20:32:11 +02:00
hugetlb.c [PATCH] htlb forget rss with pt sharing 2006-12-07 08:39:21 -08:00
internal.h [PATCH] mm: VM_BUG_ON 2006-09-26 08:48:44 -07:00
Kconfig Fix "can not" in Documentation and Kconfig 2006-10-03 22:53:09 +02:00
madvise.c [PATCH] Fix MADV_REMOVE protection checking 2006-04-17 18:22:18 -07:00
Makefile [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
memory.c [PATCH] grab swap token reordered 2006-12-07 08:39:21 -08:00
memory_hotplug.c [PATCH] Get rid of zone_table[] 2006-12-07 08:39:20 -08:00
mempolicy.c [PATCH] memory page_alloc zonelist caching speedup 2006-12-07 08:39:20 -08:00
mempool.c [PATCH] dm: work around mempool_alloc, bio_alloc_bioset deadlocks 2006-09-01 11:39:09 -07:00
migrate.c [PATCH] Fix sys_move_pages when a NULL node list is passed 2006-11-03 12:27:59 -08:00
mincore.c [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR 2005-04-19 13:29:20 -07:00
mlock.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
mmap.c [PATCH] hugetlb: fix error return for brk() entering a hugepage region 2006-11-14 15:15:01 -08:00
mmzone.c [PATCH] mm/mmzone.c: EXPORT_UNUSED_SYMBOL 2006-07-10 13:24:17 -07:00
mprotect.c [PATCH] paravirt: lazy mmu mode hooks.patch 2006-10-01 00:39:33 -07:00
mremap.c [PATCH] paravirt: lazy mmu mode hooks.patch 2006-10-01 00:39:33 -07:00
msync.c [PATCH] mm: msync() cleanup 2006-09-26 08:48:45 -07:00
nommu.c [PATCH] uclinux: fix mmap() of directory for nommu case 2006-12-06 07:41:26 -08:00
oom_kill.c [PATCH] oom: less memdie 2006-12-07 08:39:20 -08:00
page-writeback.c [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
page_alloc.c [PATCH] mm: add arch_alloc_page 2006-12-07 08:39:21 -08:00
page_io.c [PATCH] swsusp: read speedup 2006-09-26 08:48:58 -07:00
pdflush.c [PATCH] pdflush: handle resume wakeups 2006-06-25 10:01:06 -07:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c [PATCH] Cleanup read_pages() 2006-11-03 12:27:56 -08:00
rmap.c [PATCH] mm: more commenting on lock ordering 2006-10-20 10:26:44 -07:00
shmem.c [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
shmem_acl.c [PATCH] Fix typos in mm/shmem_acl.c 2006-10-11 11:14:23 -07:00
slab.c [PATCH] mm: slab: eliminate lock_cpu_hotplug from slab 2006-12-07 08:39:21 -08:00
slob.c [PATCH] Make kmem_cache_destroy() return void 2006-09-27 08:26:11 -07:00
sparse.c [PATCH] Get rid of zone_table[] 2006-12-07 08:39:20 -08:00
swap.c WorkStruct: make allyesconfig 2006-11-22 14:57:56 +00:00
swap_state.c [PATCH] lockdep: locking init debugging improvement 2006-07-03 15:27:02 -07:00
swapfile.c [PATCH] valid_swaphandles() fix 2006-09-29 09:18:23 -07:00
thrash.c [PATCH] new scheme to preempt swap token 2006-12-07 08:39:21 -08:00
tiny-shmem.c [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree 2006-06-26 12:25:08 -07:00
truncate.c [PATCH] invalidate: remove_mapping() fix 2006-10-17 08:18:43 -07:00
util.c [PATCH] slab: clean up leak tracking ifdefs a little bit 2006-10-04 07:55:13 -07:00
vmalloc.c [PATCH] Fix strange size check in __get_vm_area_node() 2006-11-16 11:43:38 -08:00
vmscan.c [PATCH] balance_pdgat() cleanup 2006-12-07 08:39:21 -08:00
vmstat.c [PATCH] vmscan: Fix temp_priority race 2006-10-28 11:30:50 -07:00