1
0
Fork 0
alistair23-linux/mm
Andrea Arcangeli 7906d00cd1 mmu-notifiers: add mm_take_all_locks() operation
mm_take_all_locks holds off reclaim from an entire mm_struct.  This allows
mmu notifiers to register into the mm at any time with the guarantee that
no mmu operation is in progress on the mm.

This operation locks against the VM for all pte/vma/mm related operations
that could ever happen on a certain mm.  This includes vmtruncate,
try_to_unmap, and all page faults.

The caller must take the mmap_sem in write mode before calling
mm_take_all_locks().  The caller isn't allowed to release the mmap_sem
until mm_drop_all_locks() returns.

mmap_sem in write mode is required in order to block all operations that
could modify pagetables and free pages without need of altering the vma
layout (for example populate_range() with nonlinear vmas).  It's also
needed in write mode to avoid new anon_vmas to be associated with existing
vmas.

A single task can't take more than one mm_take_all_locks() in a row or it
would deadlock.

mm_take_all_locks() and mm_drop_all_locks are expensive operations that
may have to take thousand of locks.

mm_take_all_locks() can fail if it's interrupted by signals.

When mmu_notifier_register returns, we must be sure that the driver is
notified if some task is in the middle of a vmtruncate for the 'mm' where
the mmu notifier was registered (mmu_notifier_invalidate_range_start/end
is run around the vmtruncation but mmu_notifier_register can run after
mmu_notifier_invalidate_range_start and before
mmu_notifier_invalidate_range_end).  Same problem for rmap paths.  And
we've to remove page pinning to avoid replicating the tlb_gather logic
inside KVM (and GRU doesn't work well with page pinning regardless of
needing tlb_gather), so without mm_take_all_locks when vmtruncate frees
the page, kvm would have no way to notice that it mapped into sptes a page
that is going into the freelist without a chance of any further
mmu_notifier notification.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Chris Wright <chrisw@redhat.com>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Izik Eidus <izike@qumranet.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-28 16:30:21 -07:00
..
Kconfig x86: lockless get_user_pages_fast() 2008-07-26 12:00:06 -07:00
Makefile mm: remove mm_init compilation dependency on CONFIG_DEBUG_MEMORY_INIT 2008-07-24 10:47:17 -07:00
allocpercpu.c mm/allocpercpu.c: make 4 functions static 2008-07-26 12:00:12 -07:00
backing-dev.c mm: bdi: fix race in bdi_class device creation 2008-05-20 13:31:53 -07:00
bootmem.c bootmem: replace node_boot_start in struct bootmem_data 2008-07-24 10:47:20 -07:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap.c [patch 3/5] vfs: change remove_suid() to file_remove_suid() 2008-07-26 20:53:16 -04:00
filemap_xip.c [patch 3/5] vfs: change remove_suid() to file_remove_suid() 2008-07-26 20:53:16 -04:00
fremap.c mm: fix various kernel-doc comments 2008-03-19 18:53:35 -07:00
highmem.c highmem: Export totalhigh_pages. 2008-07-19 22:39:46 -07:00
hugetlb.c hugetlb: fix CONFIG_SYSCTL=n build 2008-07-26 12:00:01 -07:00
internal.h mm: export prep_compound_page to mm 2008-07-24 10:47:17 -07:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
memcontrol.c memcg: limit change shrink usage 2008-07-25 10:53:37 -07:00
memory.c make mm/memory.c:print_bad_pte() static 2008-07-26 12:00:12 -07:00
memory_hotplug.c memory-hotplug: add sysfs removable attribute for hotplug memory remove 2008-07-24 10:47:21 -07:00
mempolicy.c hugetlb: modular state for hugetlb page size 2008-07-24 10:47:17 -07:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY 2007-07-16 09:05:37 -07:00
mm_init.c mm: create /sys/kernel/mm 2008-07-24 10:47:17 -07:00
mmap.c mmu-notifiers: add mm_take_all_locks() operation 2008-07-28 16:30:21 -07:00
mmzone.c mm: filter based on a nodemask as well as a gfp_mask 2008-04-28 08:58:19 -07:00
mprotect.c mm: record MAP_NORESERVE status on vmas and fix small page mprotect reservations 2008-07-24 10:47:16 -07:00
mremap.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c tracehook: tracehook_expect_breakpoints 2008-07-26 12:00:09 -07:00
oom_kill.c oom_kill: remove unused parameter in badness() 2008-04-28 08:58:26 -07:00
page-writeback.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
page_alloc.c stop_machine: Wean existing callers off stop_machine_run() 2008-07-28 12:16:31 +10:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c pdflush: use time_after() instead of open-coding it 2008-07-25 10:53:28 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm: readahead scan lockless 2008-07-26 12:00:06 -07:00
rmap.c SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
shmem.c tmpfs: fix kernel BUG in shmem_delete_inode 2008-07-28 16:30:20 -07:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
slab.c SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
slob.c SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
slub.c SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
sparse-vmemmap.c Christoph has moved 2008-07-04 10:40:04 -07:00
sparse.c make mm/sparse.c: make a function static 2008-07-26 12:00:12 -07:00
swap.c mm: remove initialization of static per-cpu variables 2008-07-24 10:47:21 -07:00
swap_state.c mm: print swapcache page count in show_swap_cache_info() 2008-07-26 12:00:10 -07:00
swapfile.c mm/swapfile.c: make code static 2008-07-26 12:00:12 -07:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
truncate.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
util.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-07-26 20:17:56 -07:00
vmalloc.c Use WARN() in mm/vmalloc.c 2008-07-26 12:00:07 -07:00
vmscan.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
vmstat.c mm/vmstat.c: proper externs 2008-07-24 10:47:14 -07:00