alistair23-linux/mm
Peter Xu 292924b260 userfaultfd: wp: apply _PAGE_UFFD_WP bit
Firstly, introduce two new flags MM_CP_UFFD_WP[_RESOLVE] for
change_protection() when used with uffd-wp and make sure the two new flags
are exclusively used.  Then,

  - For MM_CP_UFFD_WP: apply the _PAGE_UFFD_WP bit and remove _PAGE_RW
    when a range of memory is write protected by uffd

  - For MM_CP_UFFD_WP_RESOLVE: remove the _PAGE_UFFD_WP bit and recover
    _PAGE_RW when write protection is resolved from userspace

And use this new interface in mwriteprotect_range() to replace the old
MM_CP_DIRTY_ACCT.

Do this change for both PTEs and huge PMDs.  Then we can start to identify
which PTE/PMD is write protected by general (e.g., COW or soft dirty
tracking), and which is for userfaultfd-wp.

Since we should keep the _PAGE_UFFD_WP when doing pte_modify(), add it
into _PAGE_CHG_MASK as well.  Meanwhile, since we have this new bit, we
can be even more strict when detecting uffd-wp page faults in either
do_wp_page() or wp_huge_pmd().

After we're with _PAGE_UFFD_WP, a special case is when a page is both
protected by the general COW logic and also userfault-wp.  Here the
userfault-wp will have higher priority and will be handled first.  Only
after the uffd-wp bit is cleared on the PTE/PMD will we continue to handle
the general COW.  These are the steps on what will happen with such a
page:

  1. CPU accesses write protected shared page (so both protected by
     general COW and uffd-wp), blocked by uffd-wp first because in
     do_wp_page we'll handle uffd-wp first, so it has higher priority
     than general COW.

  2. Uffd service thread receives the request, do UFFDIO_WRITEPROTECT
     to remove the uffd-wp bit upon the PTE/PMD.  However here we
     still keep the write bit cleared.  Notify the blocked CPU.

  3. The blocked CPU resumes the page fault process with a fault
     retry, during retry it'll notice it was not with the uffd-wp bit
     this time but it is still write protected by general COW, then
     it'll go though the COW path in the fault handler, copy the page,
     apply write bit where necessary, and retry again.

  4. The CPU will be able to access this page with write bit set.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-8-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
..
kasan kasan: detect negative size in memory operation function 2020-04-02 09:35:30 -07:00
backing-dev.c memcg: fix a crash in wb_workfn when a device disappears 2020-01-31 10:30:36 -08:00
balloon_compaction.c
cleancache.c
cma.c
cma.h
cma_debug.c
compaction.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
debug.c mm: dump_page(): additional diagnostics for huge pinned pages 2020-04-02 09:35:27 -07:00
debug_page_ref.c
dmapool.c
early_ioremap.c mm/early_ioremap.c: use %pa to print resource_size_t variables 2020-01-31 10:30:38 -08:00
fadvise.c
failslab.c
filemap.c mm: allow VM_FAULT_RETRY for multiple times 2020-04-02 09:35:30 -07:00
frame_vector.c
frontswap.c
gup.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
gup_benchmark.c mm/gup_benchmark: support pin_user_pages() and related calls 2020-04-02 09:35:27 -07:00
highmem.c mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions 2019-12-10 10:12:55 +01:00
hmm.c mm/hmm: return error for non-vma snapshots 2020-03-30 16:58:36 -03:00
huge_memory.c userfaultfd: wp: apply _PAGE_UFFD_WP bit 2020-04-07 10:43:39 -07:00
hugetlb.c mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge() 2020-04-02 09:35:32 -07:00
hugetlb_cgroup.c hugetlb_cgroup: add accounting for shared mappings 2020-04-02 09:35:32 -07:00
hwpoison-inject.c
init-mm.c
internal.h mm: add function __putback_isolated_page 2020-04-07 10:43:38 -07:00
interval_tree.c
Kconfig mm: introduce Reported pages 2020-04-07 10:43:38 -07:00
Kconfig.debug mm: add generic ptdump 2020-02-04 03:05:25 +00:00
khugepaged.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
kmemleak-test.c
kmemleak.c mm/kmemleak.c: use address-of operator on section symbols 2020-04-02 09:35:26 -07:00
ksm.c mm/ksm.c: update get_user_pages() argument in comment 2020-04-07 10:43:38 -07:00
list_lru.c mm: memcg/slab: use mem_cgroup_from_obj() 2020-04-02 09:35:28 -07:00
maccess.c
madvise.c mm: do not allow MADV_PAGEOUT for CoW pages 2020-03-21 18:56:06 -07:00
Makefile mm: introduce Reported pages 2020-04-07 10:43:38 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: update huge page-table entry callbacks 2020-04-02 09:35:29 -07:00
memblock.c mm/memblock.c: remove redundant assignment to variable max_addr 2020-04-02 09:35:32 -07:00
memcontrol.c mm, memcg: bypass high reclaim iteration for cgroup hierarchy root 2020-04-07 10:43:37 -07:00
memfd.c
memory-failure.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
memory.c userfaultfd: wp: apply _PAGE_UFFD_WP bit 2020-04-07 10:43:39 -07:00
memory_hotplug.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
mempolicy.c mm: merge parameters for change_protection() 2020-04-07 10:43:39 -07:00
mempool.c
memremap.c memremap: add an owner field to struct dev_pagemap 2020-03-26 14:33:37 -03:00
memtest.c
migrate.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
mincore.c mm: pagewalk: add 'depth' parameter to pte_hole 2020-02-04 03:05:25 +00:00
mlock.c
mm_init.c
mmap.c mm/vma: make vma_is_accessible() available for general use 2020-04-07 10:43:37 -07:00
mmu_context.c
mmu_gather.c asm-generic/tlb: provide MMU_GATHER_TABLE_FREE 2020-02-04 03:05:26 +00:00
mmu_notifier.c mm/mmu_notifier: silence PROVE_RCU_LIST warnings 2020-03-21 18:56:06 -07:00
mmzone.c
mprotect.c userfaultfd: wp: apply _PAGE_UFFD_WP bit 2020-04-07 10:43:39 -07:00
mremap.c mm/mremap: add MREMAP_DONTUNMAP to mremap() 2020-04-02 09:35:30 -07:00
msync.c
nommu.c x86/mm: split vmalloc_sync_all() 2020-03-21 18:56:06 -07:00
oom_kill.c mm, oom: dump stack of victim when reaping failed 2020-01-31 10:30:38 -08:00
page-writeback.c mm/gup/writeback: add callbacks for inaccessible pages 2020-04-02 09:35:27 -07:00
page_alloc.c mm: introduce Reported pages 2020-04-07 10:43:38 -07:00
page_counter.c mm, memcg: prevent memory.min load/store tearing 2020-04-02 09:35:29 -07:00
page_ext.c mm/sparse: rename pfn_present() to pfn_in_present_section() 2020-04-02 09:35:30 -07:00
page_idle.c
page_io.c fs: Enable bmap() function to properly return errors 2020-02-03 08:05:37 -05:00
page_isolation.c mm: add function __putback_isolated_page 2020-04-07 10:43:38 -07:00
page_owner.c
page_poison.c
page_reporting.c mm/page_reporting: add budget limit on how many pages can be reported per pass 2020-04-07 10:43:39 -07:00
page_reporting.h mm: introduce Reported pages 2020-04-07 10:43:38 -07:00
page_vma_mapped.c mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page 2020-01-31 10:30:38 -08:00
pagewalk.c x86: mm: avoid allocating struct mm_struct on the stack 2020-02-04 03:05:25 +00:00
percpu-internal.h
percpu-km.c
percpu-stats.c percpu: update copyright emails to dennis@kernel.org 2020-04-01 10:09:12 -07:00
percpu-vm.c
percpu.c percpu: update copyright emails to dennis@kernel.org 2020-04-01 10:09:12 -07:00
pgtable-generic.c
process_vm_access.c mm: docs: Fix a comment in process_vm_rw_core 2020-03-25 10:04:01 -05:00
ptdump.c x86: mm: avoid allocating struct mm_struct on the stack 2020-02-04 03:05:25 +00:00
readahead.c
rmap.c mm: remove CONFIG_TRANSPARENT_HUGE_PAGECACHE 2020-04-07 10:43:38 -07:00
rodata_test.c
shmem.c mm: remove CONFIG_TRANSPARENT_HUGE_PAGECACHE 2020-04-07 10:43:38 -07:00
shuffle.c mm: adjust shuffle code to allow for future coalescing 2020-04-07 10:43:38 -07:00
shuffle.h mm: adjust shuffle code to allow for future coalescing 2020-04-07 10:43:38 -07:00
slab.c mm, debug_pagealloc: don't rely on static keys too early 2020-01-13 18:19:02 -08:00
slab.h mm: kmem: rename (__)memcg_kmem_(un)charge_memcg() to __memcg_kmem_(un)charge() 2020-04-02 09:35:28 -07:00
slab_common.c mm, memcg: fix build error around the usage of kmem_caches 2020-04-02 09:35:28 -07:00
slob.c
slub.c slub: relocate freelist pointer to middle of object 2020-04-02 09:35:26 -07:00
sparse-vmemmap.c
sparse.c mm/sparse.c: allocate memmap preferring the given node 2020-04-02 09:35:30 -07:00
swap.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: assign|reset cache slot by value directly 2020-04-02 09:35:27 -07:00
swap_state.c mm/swap_state.c: use the same way to count page in [add_to|delete_from]_swap_cache 2020-04-02 09:35:28 -07:00
swapfile.c mm/swapfile: fix data races in try_to_unuse() 2020-04-02 09:35:27 -07:00
truncate.c
usercopy.c
userfaultfd.c userfaultfd: wp: apply _PAGE_UFFD_WP bit 2020-04-07 10:43:39 -07:00
util.c
vmacache.c
vmalloc.c mm/vmalloc: fix a typo in comment 2020-04-07 10:43:38 -07:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
vmstat.c mm, thp: track fallbacks due to failed memcg charges separately 2020-04-07 10:43:38 -07:00
workingset.c
z3fold.c mm/z3fold.c: do not include rwlock.h directly 2020-03-06 07:06:09 -06:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: fix the migrated zspage statistics. 2020-01-04 13:55:09 -08:00
zswap.c zswap: potential NULL dereference on error in init_zswap() 2020-01-31 10:30:39 -08:00