alistair23-linux/mm
Hugh Dickins cda540ace6 mm: get_user_pages(write,force) refuse to COW in shared areas
get_user_pages(write=1, force=1) has always had odd behaviour on write-
protected shared mappings: although it demands FMODE_WRITE-access to the
underlying object (do_mmap_pgoff sets neither VM_SHARED nor VM_MAYWRITE
without that), it ends up with do_wp_page substituting private anonymous
Copied-On-Write pages for the shared file pages in the area.

That was long ago intentional, as a safety measure to prevent ptrace
setting a breakpoint (or POKETEXT or POKEDATA) from inadvertently
corrupting the underlying executable.  Yet exec and dynamic loaders open
the file read-only, and use MAP_PRIVATE rather than MAP_SHARED.

The traditional odd behaviour still causes surprises and bugs in mm, and
is probably not what any caller wants - even the comment on the flag
says "You do not want this" (although it's undoubtedly necessary for
overriding userspace protections in some contexts, and good when !write).

Let's stop doing that.  But it would be dangerous to remove the long-
standing safety at this stage, so just make get_user_pages(write,force)
fail with EFAULT when applied to a write-protected shared area.
Infiniband may in future want to force write through to underlying
object: we can add another FOLL_flag later to enable that if required.

Odd though the old behaviour was, there is no doubt that we may turn out
to break userspace with this change, and have to revert it quickly.
Issue a WARN_ON_ONCE to help debug the changed case (easily triggered by
userspace, so only once to prevent spamming the logs); and delay a few
associated cleanups until this change is proved.

get_user_pages callers who might see trouble from this change:
  ptrace poking, or writing to /proc/<pid>/mem
  drivers/infiniband/
  drivers/media/v4l2-core/
  drivers/gpu/drm/exynos/exynos_drm_gem.c
  drivers/staging/tidspbridge/core/tiomap3430.c
if they ever apply get_user_pages to write-protected shared mappings
of an object which was opened for writing.

I went to apply the same change to mm/nommu.c, but retreated.  NOMMU has
no place for COW, and its VM_flags conventions are not the same: I'd be
more likely to screw up NOMMU than make an improvement there.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-04 16:16:55 -07:00
..
backing-dev.c bdi: avoid oops on device removal 2014-04-03 16:20:49 -07:00
balloon_compaction.c mm: print more details for bad_page() 2014-01-23 16:36:50 -08:00
bootmem.c mm/bootmem.c: remove unused local `map' 2013-11-13 12:09:09 +09:00
bounce.c block: Convert bio_for_each_segment() to bvec_iter 2013-11-23 22:33:49 -08:00
cleancache.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
compaction.c mm/compaction.c: mark function as static 2014-04-03 16:21:02 -07:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap.c mm: remove read_cache_page_async() 2014-04-03 16:21:04 -07:00
filemap_xip.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
fremap.c mm: fix bad rss-counter if remap_file_pages raced migration 2014-03-19 16:21:49 -07:00
frontswap.c
highmem.c
huge_memory.c mm, thp: drop do_huge_pmd_wp_zero_page_fallback() 2014-04-03 16:21:04 -07:00
hugetlb.c mm, hugetlb: mark some bootstrap functions as __init 2014-04-03 16:21:01 -07:00
hugetlb_cgroup.c cgroup: drop const from @buffer of cftype->write_string() 2014-03-19 10:23:54 -04:00
hwpoison-inject.c mm/hwpoison: add '#' to hwpoison_inject 2014-01-21 16:19:48 -08:00
init-mm.c
internal.h mm/page-writeback.c: do not count anon pages as dirtyable memory 2014-01-29 16:22:39 -08:00
interval_tree.c
Kconfig mm/Kconfig: fix URL for zsmalloc benchmark 2014-03-10 17:26:20 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: change some global variables to int 2014-04-03 16:20:50 -07:00
ksm.c mm: close PageTail race 2014-03-04 07:55:47 -08:00
list_lru.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
maccess.c
madvise.c
Makefile mm: thrash detection-based file cache sizing 2014-04-03 16:21:01 -07:00
memblock.c memblock: add limit checking to memblock_virt_alloc 2014-01-29 16:22:40 -08:00
memcontrol.c Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-04-03 13:05:42 -07:00
memory-failure.c Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-04-03 13:05:42 -07:00
memory.c mm: get_user_pages(write,force) refuse to COW in shared areas 2014-04-04 16:16:55 -07:00
memory_hotplug.c mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug 2014-01-23 16:36:52 -08:00
mempolicy.c mm: optimize put_mems_allowed() usage 2014-04-03 16:20:58 -07:00
mempool.c
migrate.c mm: fix swapops.h:131 bug if remap_file_pages raced migration 2014-03-20 22:09:09 -07:00
mincore.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
mlock.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
mm_init.c mm: bring back /sys/kernel/mm 2014-01-27 21:02:39 -08:00
mmap.c Merge branch 'locks-3.15' of git://git.samba.org/jlayton/linux 2014-04-04 14:21:20 -07:00
mmu_context.c sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm 2014-02-21 08:50:17 +01:00
mmu_notifier.c mm: audit/fix non-modular users of module_init in core code 2014-01-23 16:36:52 -08:00
mmzone.c
mprotect.c mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit 2014-02-17 11:19:36 +11:00
mremap.c mm: revert mremap pud_free anti-fix 2013-10-16 21:35:53 -07:00
msync.c
nobootmem.c mm/nobootmem.c: mark function as static 2014-04-03 16:21:02 -07:00
nommu.c locks: fix locks_mandatory_locked to respect file-private locks 2014-03-31 08:24:43 -04:00
oom_kill.c mm, oom: base root bonus on current usage 2014-01-30 16:56:56 -08:00
page-writeback.c mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() 2014-02-06 13:48:51 -08:00
page_alloc.c mm: optimize put_mems_allowed() usage 2014-04-03 16:20:58 -07:00
page_cgroup.c mm/page_cgroup.c: mark functions as static 2014-04-03 16:21:02 -07:00
page_io.c Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
page_isolation.c
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c
percpu.c percpu: renew the max_contig if we merge the head and previous block 2014-03-29 09:29:42 -04:00
pgtable-generic.c mm: fix TLB flush race between migration, and change_protection_range 2013-12-18 19:04:51 -08:00
process_vm_access.c mm/process_vm_access.c: mark function as static 2014-04-03 16:21:02 -07:00
quicklist.c
readahead.c mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages 2014-04-03 16:21:05 -07:00
rmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2014-03-31 14:35:30 -07:00
shmem.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
slab.c mm: optimize put_mems_allowed() usage 2014-04-03 16:20:58 -07:00
slab.h memcg, slab: RCU protect memcg_params for root caches 2014-01-23 16:36:51 -08:00
slab_common.c slab: fix wrong retval on kmem_cache_create_memcg error path 2014-01-29 16:22:40 -08:00
slob.c
slub.c slub: do not drop slab_mutex for sysfs_slab_add 2014-04-03 16:21:05 -07:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c sparse: fix comment 2014-04-02 09:16:17 +02:00
swap.c mm: thrash detection-based file cache sizing 2014-04-03 16:21:01 -07:00
swap_state.c swap: add a simple detector for inappropriate swapin readahead 2014-02-06 13:48:51 -08:00
swapfile.c mm/swap: fix race on swap_info reuse between swapoff and swapon 2014-02-06 13:48:51 -08:00
truncate.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
util.c mm: add overcommit_kbytes sysctl variable 2014-01-21 16:19:44 -08:00
vmalloc.c Revert "mm/vmalloc: interchage the implementation of vmalloc_to_{pfn,page}" 2014-01-27 21:02:39 -08:00
vmpressure.c arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
vmscan.c mm: thrash detection-based file cache sizing 2014-04-03 16:21:01 -07:00
vmstat.c drop_caches: add some documentation and info message 2014-04-03 16:21:04 -07:00
workingset.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
zbud.c
zsmalloc.c zsmalloc: add copyright 2014-01-30 16:56:55 -08:00
zswap.c mm/zswap.c: change params from hidden to ro 2014-01-23 16:36:50 -08:00