1
0
Fork 0
alistair23-linux/mm
Larry Woodman c5c99429fa fix hugepages leak due to pagetable page sharing
The shared page table code for hugetlb memory on x86 and x86_64
is causing a leak.  When a user of hugepages exits using this code
the system leaks some of the hugepages.

-------------------------------------------------------
Part of /proc/meminfo just before database startup:
HugePages_Total:  5500
HugePages_Free:   5500
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Just before shutdown:
HugePages_Total:  5500
HugePages_Free:   4475
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

After shutdown:
HugePages_Total:  5500
HugePages_Free:   4988
HugePages_Rsvd:
0 Hugepagesize:     2048 kB
----------------------------------------------------------

The problem occurs durring a fork, in copy_hugetlb_page_range().  It
locates the dst_pte using huge_pte_alloc().  Since huge_pte_alloc() calls
huge_pmd_share() it will share the pmd page if can, yet the main loop in
copy_hugetlb_page_range() does a get_page() on every hugepage.  This is a
violation of the shared hugepmd pagetable protocol and creates additional
referenced to the hugepages causing a leak when the unmap of the VMA
occurs.  We can skip the entire replication of the ptes when the hugepage
pagetables are shared.  The attached patch skips copying the ptes and the
get_page() calls if the hugetlbpage pagetable is shared.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-24 08:07:27 -08:00
..
Kconfig sparsemem: make SPARSEMEM_VMEMMAP selectable 2007-12-17 19:28:16 -08:00
Makefile memory unplug: page isolation 2007-10-16 09:43:02 -07:00
allocpercpu.c Slab allocators: Replace explicit zeroing with __GFP_ZERO 2007-07-17 10:23:02 -07:00
backing-dev.c mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init 2007-12-05 09:21:18 -08:00
bootmem.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
fadvise.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
filemap.c Do dirty page accounting when removing a page from the page cache 2007-12-19 14:05:13 -08:00
filemap_xip.c xip: fix get_zeroed_page with __GFP_HIGHMEM 2008-01-08 16:10:36 -08:00
fremap.c remap_file_pages: kernel-doc corrections 2007-10-17 08:43:07 -07:00
highmem.c Create the ZONE_MOVABLE zone 2007-07-17 10:22:59 -07:00
hugetlb.c fix hugepages leak due to pagetable page sharing 2008-01-24 08:07:27 -08:00
internal.h Breakout page_order() to internal.h to avoid special knowledge of the buddy allocator 2007-10-16 09:43:01 -07:00
madvise.c speed up madvise_need_mmap_write() usage 2007-07-16 09:05:36 -07:00
memory.c Update ctime and mtime for memory-mapped files 2008-01-23 09:58:55 -08:00
memory_hotplug.c Add IORESOUCE_BUSY flag for System RAM 2007-11-14 18:45:39 -08:00
mempolicy.c Migration: find correct vma in new_vma_page() 2007-11-14 18:45:38 -08:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c Typo fixes retrun -> return 2007-10-20 02:13:26 +02:00
mincore.c [PATCH] mincore: vma crossing fix 2007-02-15 09:57:03 -08:00
mlock.c do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY 2007-07-16 09:05:37 -07:00
mmap.c VM/Security: add security hook to do_brk 2007-12-05 09:21:21 -08:00
mmzone.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
mprotect.c fix mprotect vma_wants_writenotify prot 2007-10-23 08:32:06 -07:00
mremap.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c Security: round mmap hint address above mmap_min_addr 2007-12-06 00:25:10 +11:00
oom_kill.c oom_kill bug 2007-10-20 15:04:06 -07:00
page-writeback.c Revert "writeback: introduce writeback_control.more_io to indicate more io" 2008-01-14 21:21:29 -08:00
page_alloc.c mm: fix section mismatch warning in page_alloc.c 2008-01-17 15:38:58 -08:00
page_io.c Drop 'size' argument from bio_endio and bi_end_io 2007-10-10 09:25:57 +02:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
pdflush.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm: bdi init hooks 2007-10-17 08:42:45 -07:00
rmap.c [S390] Optimize storage key handling for anonymous pages 2007-11-20 11:13:46 +01:00
shmem.c tmpfs: restore missing clear_highpage 2007-11-28 11:04:28 -08:00
shmem_acl.c [PATCH] Fix typos in mm/shmem_acl.c 2006-10-11 11:14:23 -07:00
slab.c Unify /proc/slabinfo configuration 2008-01-02 13:04:48 -08:00
slob.c Avoid double memclear() in SLOB/SLUB 2007-12-09 10:17:52 -08:00
slub.c Unify /proc/slabinfo configuration 2008-01-02 13:04:48 -08:00
sparse-vmemmap.c memory hotplug fix: fix section mismatch in vmammap_allock_block() 2007-11-29 09:24:54 -08:00
sparse.c mm/sparse.c: improve the error handling for sparse_add_one_section() 2007-12-17 19:28:16 -08:00
swap.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
swap_state.c mm: clarify __add_to_swap_cache locking 2007-10-16 09:42:53 -07:00
swapfile.c Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION 2007-07-29 16:45:38 -07:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c r/o bind mounts: filesystem helpers for custom 'struct file's 2007-10-17 08:43:04 -07:00
truncate.c Drop some headers from mm.h 2007-10-17 08:42:55 -07:00
util.c fix mm/util.c:krealloc() 2007-11-14 18:45:41 -08:00
vmalloc.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
vmscan.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
vmstat.c vmstat: fix section mismatch warning 2007-11-14 18:45:42 -08:00