alistair23-linux/mm
Yang Shi 88aa7cc688 mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
mmap_sem is on the hot path of kernel, and it very contended, but it is
abused too.  It is used to protect arg_start|end and evn_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
sense since those proc files just expect to read 4 values atomically and
not related to VM, they could be set to arbitrary values by C/R.

And, the mmap_sem contention may cause unexpected issue like below:

INFO: task ps:14018 blocked for more than 120 seconds.
       Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
 ps              D    0 14018      1 0x00000004
 Call Trace:
   schedule+0x36/0x80
   rwsem_down_read_failed+0xf0/0x150
   call_rwsem_down_read_failed+0x18/0x30
   down_read+0x20/0x40
   proc_pid_cmdline_read+0xd9/0x4e0
   __vfs_read+0x37/0x150
   vfs_read+0x96/0x130
   SyS_read+0x55/0xc0
   entry_SYSCALL_64_fastpath+0x1a/0xc5

Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
for them to mitigate the abuse of mmap_sem.

So, introduce a new spinlock in mm_struct to protect the concurrent
access to arg_start|end, env_start|end and others, as well as replace
write map_sem to read to protect the race condition between prctl and
sys_brk which might break check_data_rlimit(), and makes prctl more
friendly to other VM operations.

This patch just eliminates the abuse of mmap_sem, but it can't resolve
the above hung task warning completely since the later
access_remote_vm() call needs acquire mmap_sem.  The mmap_sem
scalability issue will be solved in the future.

[yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
  Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:34 -07:00
..
kasan kasan: fix memory hotplug during boot 2018-05-25 18:12:11 -07:00
backing-dev.c bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue 2018-05-23 15:28:50 -06:00
balloon_compaction.c
bootmem.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
cleancache.c docs/vm: rename documentation files to .rst 2018-04-16 14:18:15 -06:00
cma.c Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" 2018-05-24 10:07:50 -07:00
cma.h
cma_debug.c
compaction.c Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" 2018-05-24 10:07:50 -07:00
debug.c mm/debug.c: provide useful debugging information for VM_BUG 2018-01-04 16:45:09 -08:00
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() 2018-04-02 20:16:10 +02:00
failslab.c mm: make should_failslab always available for fault injection 2018-04-05 21:36:26 -07:00
filemap.c mm/filemap.c: fix NULL pointer in page_cache_tree_insert() 2018-04-20 17:18:36 -07:00
frame_vector.c
frontswap.c docs/vm: rename documentation files to .rst 2018-04-16 14:18:15 -06:00
gup.c proc: do not access cmdline nor environ from file-backed areas 2018-05-17 09:27:47 -07:00
gup_benchmark.c mm/gup_benchmark: handle gup failures 2018-04-13 17:10:27 -07:00
highmem.c
hmm.c Merge branch 'mm-rst' into docs-next 2018-04-16 14:25:08 -06:00
huge_memory.c There's been a fair amount of work in the docs tree this time around, 2018-06-04 12:34:27 -07:00
hugetlb.c Merge branch 'mm-rst' into docs-next 2018-04-16 14:25:08 -06:00
hugetlb_cgroup.c
hwpoison-inject.c mm/memory_failure: Remove unused trapno from memory_failure 2018-01-23 12:17:42 -06:00
init-mm.c mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct 2018-06-07 17:34:34 -07:00
internal.h Changes for 4.18: 2018-06-05 13:24:20 -07:00
interval_tree.c mm/interval_tree.c: use vma_pages() helper 2018-01-31 17:18:37 -08:00
Kconfig There's been a fair amount of work in the docs tree this time around, 2018-06-04 12:34:27 -07:00
Kconfig.debug
khugepaged.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
kmemleak-test.c
kmemleak.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
ksm.c mm/ksm: docs: extend overview comment and make it "DOC:" 2018-04-27 17:19:24 -06:00
list_lru.c mm: make counting of list_lru_one::nr_items lockless 2018-04-05 21:36:27 -07:00
maccess.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
madvise.c mm/memory_failure: Remove unused trapno from memory_failure 2018-01-23 12:17:42 -06:00
Makefile mm/swap_slots.c: use conditional compilation 2018-04-05 21:36:24 -07:00
memblock.c powerpc updates for 4.17 2018-04-07 12:08:19 -07:00
memcontrol.c fs: add new vfs_poll and file_can_poll helpers 2018-05-26 09:16:44 +02:00
memory-failure.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
memory.c fs/dax.c: use new return type vm_fault_t 2018-06-07 17:34:33 -07:00
memory_hotplug.c mm/memory_hotplug: fix leftover use of struct page during hotplug 2018-05-25 18:12:11 -07:00
mempolicy.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
mempool.c mempool: Add mempool_init()/mempool_exit() 2018-05-14 13:14:23 -06:00
memtest.c
migrate.c mm: migrate: fix double call of radix_tree_replace_slot() 2018-05-11 17:28:45 -07:00
mincore.c
mlock.c mm, mlock, vmscan: no more skipping pagevecs 2018-02-21 15:35:42 -08:00
mm_init.c
mmap.c There's been a fair amount of work in the docs tree this time around, 2018-06-04 12:34:27 -07:00
mmu_context.c
mmu_notifier.c mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks 2018-01-31 17:18:38 -08:00
mmzone.c
mprotect.c sched/numa: avoid trapping faults and attempting migration of file-backed dirty pages 2018-04-11 10:28:31 -07:00
mremap.c
msync.c
nobootmem.c
nommu.c mm/nommu: remove description of alloc_vm_area 2018-04-05 21:36:26 -07:00
oom_kill.c mm, oom: fix concurrent munlock and oom reaper unmap, v3 2018-05-11 17:28:45 -07:00
page-writeback.c writeback: safer lock nesting 2018-04-20 17:18:35 -07:00
page_alloc.c mm, memory_hotplug: make has_unmovable_pages more robust 2018-05-25 18:12:11 -07:00
page_counter.c
page_ext.c mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it 2018-01-31 17:18:39 -08:00
page_idle.c mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one() 2018-04-05 21:36:25 -07:00
page_io.c block: convert to bio_first_bvec_all & bio_first_page_all 2018-01-06 09:18:00 -07:00
page_isolation.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
page_owner.c mm/page_owner.c: make early_page_owner_param() __init 2018-04-05 21:36:26 -07:00
page_poison.c mm/page_poison.c: make early_page_poison_param() __init 2018-04-05 21:36:26 -07:00
page_vma_mapped.c mm, page_vma_mapped: Introduce pfn_in_hpage() 2018-01-22 12:15:57 -08:00
pagewalk.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
percpu-internal.h
percpu-km.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu-stats.c mm: reuse DEFINE_SHOW_ATTRIBUTE() macro 2018-04-05 21:36:25 -07:00
percpu-vm.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu.c arch: remove obsolete architecture ports 2018-04-02 20:20:12 -07:00
pgtable-generic.c mm: do not lose dirty and accessed bits in pmdp_invalidate() 2018-01-31 17:18:38 -08:00
process_vm_access.c mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors 2018-02-06 18:32:48 -08:00
quicklist.c
readahead.c mm: split ->readpages calls to avoid non-contiguous pages lists 2018-06-01 18:37:32 -07:00
rmap.c Linux 4.17-rc2 2018-04-27 17:13:20 -06:00
rodata_test.c
shmem.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
slab.c slab: __GFP_ZERO is incompatible with a constructor 2018-06-07 17:34:34 -07:00
slab.h slab, slub: skip unnecessary kasan_cache_shutdown() 2018-04-05 21:36:24 -07:00
slab_common.c mm: make should_failslab always available for fault injection 2018-04-05 21:36:26 -07:00
slob.c slab: __GFP_ZERO is incompatible with a constructor 2018-06-07 17:34:34 -07:00
slub.c mm/slub: remove obsolete comment 2018-06-07 17:34:34 -07:00
sparse-vmemmap.c mm: merge vmem_altmap_alloc into altmap_alloc_block_buf 2018-01-08 11:46:23 -08:00
sparse.c mm: sections are not offlined during memory hotremove 2018-05-11 17:28:45 -07:00
swap.c mm/swap.c: remove @cold parameter description for release_pages() 2018-04-05 21:36:26 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: use conditional compilation 2018-04-05 21:36:24 -07:00
swap_state.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
swapfile.c mm: fix nr_rotate_swap leak in swapon() error case 2018-05-25 18:12:10 -07:00
truncate.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
usercopy.c usercopy: WARN() on slab cache usercopy region violations 2018-01-15 12:07:48 -08:00
userfaultfd.c mm/userfaultfd.c: remove duplicate include 2018-02-06 18:32:47 -08:00
util.c Merge branch 'mm-rst' into docs-next 2018-04-16 14:25:08 -06:00
vmacache.c
vmalloc.c proc: introduce proc_create_seq_private 2018-05-16 07:23:35 +02:00
vmpressure.c
vmscan.c mm: fix the NULL mapping case in __isolate_lru_page() 2018-06-02 09:33:47 -07:00
vmstat.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
workingset.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
z3fold.c z3fold: fix reclaim lock-ups 2018-05-11 17:28:45 -07:00
zbud.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zpool.c mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc 2018-02-21 15:35:43 -08:00
zsmalloc.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
zswap.c mm, swap, frontswap: fix THP swap if frontswap enabled 2018-02-21 15:35:43 -08:00