remarkable-linux/mm
Dave Hansen 1de4fa14ee x86, mpx: Cleanup unused bound tables
The previous patch allocates bounds tables on-demand.  As noted in
an earlier description, these can add up to *HUGE* amounts of
memory.  This has caused OOMs in practice when running tests.

This patch adds support for freeing bounds tables when they are no
longer in use.

There are two types of mappings in play when unmapping tables:
 1. The mapping with the actual data, which userspace is
    munmap()ing or brk()ing away, etc...
 2. The mapping for the bounds table *backing* the data
    (is tagged with VM_MPX, see the patch "add MPX specific
    mmap interface").

If userspace use the prctl() indroduced earlier in this patchset
to enable the management of bounds tables in kernel, when it
unmaps the first type of mapping with the actual data, the kernel
needs to free the mapping for the bounds table backing the data.
This patch hooks in at the very end of do_unmap() to do so.
We look at the addresses being unmapped and find the bounds
directory entries and tables which cover those addresses.  If
an entire table is unused, we clear associated directory entry
and free the table.

Once we unmap the bounds table, we would have a bounds directory
entry pointing at empty address space. That address space might
now be allocated for some other (random) use, and the MPX
hardware might now try to walk it as if it were a bounds table.
That would be bad.  So any unmapping of an enture bounds table
has to be accompanied by a corresponding write to the bounds
directory entry to invalidate it.  That write to the bounds
directory can fault, which causes the following problem:

Since we are doing the freeing from munmap() (and other paths
like it), we hold mmap_sem for write. If we fault, the page
fault handler will attempt to acquire mmap_sem for read and
we will deadlock.  To avoid the deadlock, we pagefault_disable()
when touching the bounds directory entry and use a
get_user_pages() to resolve the fault.

The unmapping of bounds tables happends under vm_munmap().  We
also (indirectly) call vm_munmap() to _do_ the unmapping of the
bounds tables.  We avoid unbounded recursion by disallowing
freeing of bounds tables *for* bounds tables.  This would not
occur normally, so should not have any practical impact.  Being
strict about it here helps ensure that we do not have an
exploitable stack overflow.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:54 +01:00
..
backing-dev.c Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block 2014-10-18 11:53:51 -07:00
balloon_compaction.c mm/balloon_compaction: fix deflation when compaction is disabled 2014-10-29 16:33:15 -07:00
bootmem.c mm/bootmem.c: use include/linux/ headers 2014-10-09 22:26:00 -04:00
cleancache.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
cma.c mm: cma: Use %pa to print physical addresses 2014-10-27 13:00:55 +01:00
compaction.c mm/compaction.c: avoid premature range skip in isolate_migratepages_range 2014-10-29 16:33:13 -07:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
debug.c mm/debug.c: use pr_emerg() 2014-10-09 22:25:59 -04:00
dmapool.c mm/dmapool.c: fixed a brace coding style issue 2014-10-09 22:26:00 -04:00
early_ioremap.c mm: create generic early_ioremap() support 2014-04-07 16:36:15 -07:00
fadvise.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c mm/filemap.c: remove trailing whitespace 2014-10-09 22:26:00 -04:00
filemap_xip.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
fremap.c mm: mark remap_file_pages() syscall as deprecated 2014-06-06 16:08:17 -07:00
frontswap.c swap: change swap_list_head to plist, add swap_avail_head 2014-06-04 16:54:07 -07:00
gup.c mm: introduce a general RCU get_user_pages_fast() 2014-10-09 22:26:00 -04:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c mm, thp: fix collapsing of hugepages on madvise 2014-10-29 16:33:14 -07:00
hugetlb.c mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA 2014-10-09 22:25:57 -04:00
hugetlb_cgroup.c hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked 2014-08-29 16:28:16 -07:00
hwpoison-inject.c mm/hwpoison-inject.c: remove unnecessary null test before debugfs_remove_recursive 2014-08-06 18:01:19 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm, compaction: pass gfp mask to compact_control 2014-10-09 22:25:55 -04:00
interval_tree.c mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA 2014-10-09 22:25:57 -04:00
iov_iter.c Add copy_to_iter(), copy_from_iter() and iov_iter_zero() 2014-10-09 02:39:03 -04:00
Kconfig mm/balloon_compaction: add vmstat counters and kpageflags bit 2014-10-09 22:26:01 -04:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c mm: introduce kmemleak_update_trace() 2014-06-06 16:08:17 -07:00
ksm.c mm: ksm use pr_err instead of printk 2014-10-09 22:26:00 -04:00
list_lru.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: update the description for madvise_remove 2014-08-06 18:01:18 -07:00
Makefile Fixup for 3.18: use PATCHv2 of "mm: Support compiling out madvise and fadvise" 2014-10-12 09:21:57 -04:00
memblock.c mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range() 2014-09-10 15:42:12 -07:00
memcontrol.c mm: memcontrol: fix missed end-writeback page accounting 2014-10-29 16:33:15 -07:00
memory-failure.c cgroup: remove redundant check in cgroup_ino() 2014-09-19 09:16:23 -04:00
memory.c zap_pte_range: update addr when forcing flush after TLB batching faiure 2014-10-28 13:16:28 -07:00
memory_hotplug.c memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() 2014-10-29 16:33:14 -07:00
mempolicy.c mm: mempolicy: skip inaccessible VMAs when setting MPOL_MF_LAZY 2014-10-09 22:26:02 -04:00
mempool.c mm/mempool.c: update the kmemleak stack trace for mempool allocations 2014-06-06 16:08:17 -07:00
migrate.c mm/balloon_compaction: redesign ballooned pages management 2014-10-09 22:26:01 -04:00
mincore.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
mlock.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 15:44:12 +02:00
mm_init.c mm: bring back /sys/kernel/mm 2014-01-27 21:02:39 -08:00
mmap.c x86, mpx: Cleanup unused bound tables 2014-11-18 00:58:54 +01:00
mmu_context.c sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm 2014-02-21 08:50:17 +01:00
mmu_notifier.c kvm: Fix page ageing bugs 2014-09-24 14:07:58 +02:00
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared 2014-10-14 02:18:28 +02:00
mremap.c mm/mremap.c: use linux headers 2014-10-09 22:26:00 -04:00
msync.c msync: fix incorrect fstart calculation 2014-07-03 09:21:53 -07:00
nobootmem.c mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range() 2014-09-10 15:42:12 -07:00
nommu.c percpu_counter: add @gfp to percpu_counter_init() 2014-09-08 09:51:29 +09:00
oom_kill.c OOM, PM: OOM killed task shouldn't escape PM suspend 2014-10-21 23:44:21 +02:00
page-writeback.c mm: memcontrol: fix missed end-writeback page accounting 2014-10-29 16:33:15 -07:00
page_alloc.c OOM, PM: OOM killed task shouldn't escape PM suspend 2014-10-21 23:44:21 +02:00
page_cgroup.c cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. 2014-10-29 16:33:13 -07:00
page_io.c fix __swap_writepage() compile failure on old gcc versions 2014-06-14 19:30:48 -05:00
page_isolation.c mm: memory-hotplug: enable memory hotplug to handle hugepage 2013-09-11 15:57:48 -07:00
pagewalk.c mm: use VM_BUG_ON_MM where possible 2014-10-09 22:25:58 -04:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: fix how @gfp is interpreted by the percpu allocator 2014-10-08 12:01:52 -04:00
pgtable-generic.c mm: actually clear pmd_numa before invalidating 2014-08-29 16:28:15 -07:00
process_vm_access.c start adding the tag to iov_iter 2014-05-06 17:32:49 -04:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c mm/readahead.c: remove unused file_ra_state from count_history_pages 2014-08-06 18:01:15 -07:00
rmap.c mm: rmap: split out page_remove_file_rmap() 2014-10-29 16:33:15 -07:00
shmem.c shmem: support RENAME_WHITEOUT 2014-10-24 00:14:37 +02:00
slab.c mm/slab: fix unaligned access on sparc64 2014-10-14 02:18:12 +02:00
slab.h mm/slab: use percpu allocator for cpu cache 2014-10-09 22:25:51 -04:00
slab_common.c mm/slab_common: don't check for duplicate cache names 2014-10-29 16:33:15 -07:00
slob.c mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller() 2014-10-09 22:25:50 -04:00
slub.c mm/slab_common: commonize slab merge logic 2014-10-09 22:25:51 -04:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c mm: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:35:54 -07:00
swap.c mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache 2014-10-09 22:25:59 -04:00
swap_state.c mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache 2014-10-09 22:25:59 -04:00
swapfile.c mm: memcontrol: rewrite uncharge API 2014-08-08 15:57:17 -07:00
truncate.c mm: Fix comment before truncate_setsize() 2014-11-07 08:29:25 +11:00
util.c proc/maps: make vm_is_stack() logic namespace-friendly 2014-10-09 22:25:50 -04:00
vmacache.c mm,vmacache: optimize overflow system-wide flushing 2014-06-04 16:53:57 -07:00
vmalloc.c mm/vmalloc.c: use seq_open_private() instead of seq_open() 2014-10-09 22:25:56 -04:00
vmpressure.c arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
vmscan.c mm: memcontrol: fix transparent huge page allocations under pressure 2014-10-09 22:25:59 -04:00
vmstat.c vmstat: on-demand vmstat workers V8 2014-10-09 22:26:02 -04:00
workingset.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
zbud.c zbud: avoid accessing last unused freelist 2014-10-09 22:26:03 -04:00
zpool.c mm/zpool: use prefixed module loading 2014-08-29 16:28:16 -07:00
zsmalloc.c zsmalloc: simplify init_zspage free obj linking 2014-10-09 22:26:03 -04:00
zswap.c mm/zswap.c: add __init to zswap_entry_cache_destroy() 2014-08-08 15:57:18 -07:00