alistair23-linux/mm
Andrea Arcangeli 37a1c49a91 thp: mremap support and TLB optimization
This adds THP support to mremap (decreases the number of split_huge_page()
calls).

Here are also some benchmarks with a proggy like this:

===
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define SIZE (5UL*1024*1024*1024)

int main()
{
        static struct timeval oldstamp, newstamp;
	long diffsec;
	char *p, *p2, *p3, *p4;
	if (posix_memalign((void **)&p, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p2, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p3, 2*1024*1024, 4096))
		perror("memalign"), exit(1);

	memset(p, 0xff, SIZE);
	memset(p2, 0xff, SIZE);
	memset(p3, 0x77, 4096);
	gettimeofday(&oldstamp, NULL);
	p4 = mremap(p, SIZE, SIZE, MREMAP_FIXED|MREMAP_MAYMOVE, p3);
	gettimeofday(&newstamp, NULL);
	diffsec = newstamp.tv_sec - oldstamp.tv_sec;
	diffsec = newstamp.tv_usec - oldstamp.tv_usec + 1000000 * diffsec;
	printf("usec %ld\n", diffsec);
	if (p == MAP_FAILED || p4 != p3)
	//if (p == MAP_FAILED)
		perror("mremap"), exit(1);
	if (memcmp(p4, p2, SIZE))
		printf("mremap bug\n"), exit(1);
	printf("ok\n");

	return 0;
}
===

THP on

 Performance counter stats for './largepage13' (3 runs):

          69195836 dTLB-loads                 ( +-   3.546% )  (scaled from 50.30%)
             60708 dTLB-load-misses           ( +-  11.776% )  (scaled from 52.62%)
         676266476 dTLB-stores                ( +-   5.654% )  (scaled from 69.54%)
             29856 dTLB-store-misses          ( +-   4.081% )  (scaled from 89.22%)
        1055848782 iTLB-loads                 ( +-   4.526% )  (scaled from 80.18%)
              8689 iTLB-load-misses           ( +-   2.987% )  (scaled from 58.20%)

        7.314454164  seconds time elapsed   ( +-   0.023% )

THP off

 Performance counter stats for './largepage13' (3 runs):

        1967379311 dTLB-loads                 ( +-   0.506% )  (scaled from 60.59%)
           9238687 dTLB-load-misses           ( +-  22.547% )  (scaled from 61.87%)
        2014239444 dTLB-stores                ( +-   0.692% )  (scaled from 60.40%)
           3312335 dTLB-store-misses          ( +-   7.304% )  (scaled from 67.60%)
        6764372065 iTLB-loads                 ( +-   0.925% )  (scaled from 79.00%)
              8202 iTLB-load-misses           ( +-   0.475% )  (scaled from 70.55%)

        9.693655243  seconds time elapsed   ( +-   0.069% )

grep thp /proc/vmstat
thp_fault_alloc 35849
thp_fault_fallback 0
thp_collapse_alloc 3
thp_collapse_alloc_failed 0
thp_split 0

thp_split 0 confirms no thp split despite plenty of hugepages allocated.

The measurement of only the mremap time (so excluding the 3 long
memset and final long 10GB memory accessing memcmp):

THP on

usec 14824
usec 14862
usec 14859

THP off

usec 256416
usec 255981
usec 255847

With an older kernel without the mremap optimizations (the below patch
optimizes the non THP version too).

THP on

usec 392107
usec 390237
usec 404124

THP off

usec 444294
usec 445237
usec 445820

I guess with a threaded program that sends more IPI on large SMP it'd
create an even larger difference.

All debug options are off except DEBUG_VM to avoid skewing the
results.

The only problem for native 2M mremap like it happens above both the
source and destination address must be 2M aligned or the hugepmd can't be
moved without a split but that is an hardware limitation.

[akpm@linux-foundation.org: coding-style nitpicking]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:48 -07:00
..
backing-dev.c mm: Add comment explaining task state setting in bdi_forker_thread() 2011-09-02 17:17:02 -06:00
bootmem.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
bounce.c bounce: call flush_dcache_page() after bounce_copy_vec() 2010-09-09 18:57:25 -07:00
cleancache.c mm: cleancache core ops functions and config 2011-05-26 10:01:36 -06:00
compaction.c mm: compaction: make isolate_lru_page() filter-aware 2011-10-31 17:30:44 -07:00
debug-pagealloc.c mm/debug-pagealloc.c: use memchr_inv 2011-10-31 17:30:47 -07:00
dmapool.c devres: fix possible use after free 2011-07-25 20:57:14 -07:00
fadvise.c readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM 2010-03-06 11:26:25 -08:00
failslab.c fault-injection: add ability to export fault_attr in arbitrary directory 2011-08-03 14:25:20 -10:00
filemap.c vfs: iov_iter: have iov_iter_advance decrement nr_segs appropriately 2011-10-28 13:55:08 +02:00
filemap_xip.c mm: Convert i_mmap_lock to a mutex 2011-05-25 08:39:18 -07:00
fremap.c mm: don't access vm_flags as 'int' 2011-05-26 09:20:31 -07:00
highmem.c mm: make HASHED_PAGE_VIRTUAL page_address' struct page argument const. 2011-08-17 13:00:20 -07:00
huge_memory.c thp: mremap support and TLB optimization 2011-10-31 17:30:48 -07:00
hugetlb.c mm: hugetlb: fix coding style issues 2011-07-25 20:57:09 -07:00
hwpoison-inject.c Fix common misspellings 2011-03-31 11:26:23 -03:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm: nommu: sort mm->mmap list properly 2011-05-25 08:39:05 -07:00
Kconfig memblock: add NO_BOOTMEM config symbol 2011-10-31 17:30:47 -07:00
Kconfig.debug mm: debug-pagealloc: fix kconfig dependency warning 2011-03-22 17:44:02 -07:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
ksm.c oom: fix race while temporarily setting current's oom_score_adj 2011-10-31 17:30:45 -07:00
maccess.c maccess,probe_kernel: Make write/read src const void * 2011-05-25 19:56:23 -04:00
madvise.c fs: kill i_alloc_sem 2011-07-20 20:47:46 -04:00
Makefile Cross Memory Attach 2011-10-31 17:30:44 -07:00
memblock.c memblock: add memblock_start_of_DRAM() 2011-10-31 17:30:47 -07:00
memcontrol.c mm: change isolate mode from #define to bitwise type 2011-10-31 17:30:44 -07:00
memory-failure.c HWPoison: add memory_failure_queue() 2011-08-03 11:15:58 -04:00
memory.c mm/futex: fix futex writes on archs with SW tracking of dirty & young 2011-07-25 20:57:11 -07:00
memory_hotplug.c mm: extend memory hotplug API to allow memory hotplug in virtual machines 2011-07-25 20:57:08 -07:00
mempolicy.c mm/mempolicy.c: make copy_from_user() provably correct 2011-09-14 18:09:36 -07:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c mm: migration: clean up unmap_and_move() 2011-10-31 17:30:45 -07:00
mincore.c mm: clarify the radix_tree exceptional cases 2011-08-03 14:25:24 -10:00
mlock.c mm: don't access vm_flags as 'int' 2011-05-26 09:20:31 -07:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mmap: fix and tidy up overcommit page arithmetic 2011-07-25 20:57:09 -07:00
mmu_context.c exit: fix oops in sync_mm_rss 2010-03-24 16:31:21 -07:00
mmu_notifier.c thp: mmu_notifier_test_young 2011-01-13 17:32:46 -08:00
mmzone.c mm: page allocator: adjust the per-cpu counter threshold when memory is low 2011-01-13 17:32:31 -08:00
mprotect.c thp: mprotect: transparent huge page support 2011-01-13 17:32:44 -08:00
mremap.c thp: mremap support and TLB optimization 2011-10-31 17:30:48 -07:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nobootmem.c memblock/nobootmem: remove unneeded code from alloc_bootmem_node_high() 2011-05-25 08:39:31 -07:00
nommu.c mmap: fix and tidy up overcommit page arithmetic 2011-07-25 20:57:09 -07:00
oom_kill.c oom: fix race while temporarily setting current's oom_score_adj 2011-10-31 17:30:45 -07:00
page-writeback.c mm/page-writeback.c: document bdi_min_ratio 2011-10-31 17:30:45 -07:00
page_alloc.c mm: output a list of loaded modules when we hit bad_page() 2011-10-31 17:30:45 -07:00
page_cgroup.c mm: Convert vmalloc/memset to vzalloc 2011-09-15 13:56:56 +02:00
page_io.c block: kill off REQ_UNPLUG 2011-03-10 08:52:27 +01:00
page_isolation.c mm: page_isolation: codeclean fix comment and rm unneeded val init 2010-10-26 16:52:11 -07:00
pagewalk.c pagewalk: fix code comment for THP 2011-07-25 20:57:09 -07:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: remove gfp mask from pcpu_get_vm_areas 2011-01-13 17:32:34 -08:00
percpu.c Merge branch 'for-2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-05-24 11:53:42 -07:00
pgtable-generic.c mm/pgtable-generic.c: fix CONFIG_SWAP=n build 2011-01-26 10:49:58 +10:00
prio_tree.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
process_vm_access.c Cross Memory Attach 2011-10-31 17:30:44 -07:00
quicklist.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
readahead.c readahead: readahead page allocations are OK to fail 2011-05-25 08:39:25 -07:00
rmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback 2011-07-26 10:39:54 -07:00
shmem.c Merge branch 'next-evm' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/ima-2.6 into next 2011-08-09 10:31:03 +10:00
slab.c mm: restrict access to slab files under procfs and sysfs 2011-09-27 22:59:27 +03:00
slob.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
slub.c lib/string.c: introduce memchr_inv() 2011-10-31 17:30:47 -07:00
sparse-vmemmap.c tree-wide: fix comment/printk typos 2010-11-01 15:38:34 -04:00
sparse.c mm: make some struct page's const 2011-07-25 20:57:07 -07:00
swap.c mm: batch activate_page() to reduce lock contention 2011-05-25 08:39:37 -07:00
swap_state.c block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
swapfile.c oom: fix race while temporarily setting current's oom_score_adj 2011-10-31 17:30:45 -07:00
thrash.c mm: swap-token: add a comment for priority aging 2011-07-25 20:57:08 -07:00
truncate.c mm: a few small updates for radix-swap 2011-08-03 14:25:24 -10:00
util.c mm: nommu: sort mm->mmap list properly 2011-05-25 08:39:05 -07:00
vmalloc.c mm: avoid null pointer access in vm_struct via /proc/vmallocinfo 2011-10-31 17:30:47 -07:00
vmscan.c vmscan: count pages into balanced for zone with good watermark 2011-10-31 17:30:47 -07:00
vmstat.c mm: vmscan: immediately reclaim end-of-LRU dirty pages when writeback completes 2011-10-31 17:30:47 -07:00