alistair23-linux/mm
Uladzislau Rezki (Sony) 96e2db4561 mm/vmalloc: rework the drain logic
A current "lazy drain" model suffers from at least two issues.

First one is related to the unsorted list of vmap areas, thus in order to
identify the [min:max] range of areas to be drained, it requires a full
list scan.  What is a time consuming if the list is too long.

Second one and as a next step is about merging all fragments with a free
space.  What is also a time consuming because it has to iterate over
entire list which holds outstanding lazy areas.

See below the "preemptirqsoff" tracer that illustrates a high latency.  It
is ~24676us.  Our workloads like audio and video are effected by such long
latency:

<snip>
  tracer: preemptirqsoff

  preemptirqsoff latency trace v1.1.5 on 4.9.186-perf+
  --------------------------------------------------------------------
  latency: 24676 us, #4/4, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 P:8)
     -----------------
     | task: crtc_commit:112-261 (uid:0 nice:0 policy:1 rt_prio:16)
     -----------------
   => started at: __purge_vmap_area_lazy
   => ended at:   __purge_vmap_area_lazy

                   _------=> CPU#
                  / _-----=> irqs-off
                 | / _----=> need-resched
                 || / _---=> hardirq/softirq
                 ||| / _--=> preempt-depth
                 |||| /     delay
   cmd     pid   ||||| time  |   caller
      \   /      |||||  \    |   /
crtc_com-261     1...1    1us*: _raw_spin_lock <-__purge_vmap_area_lazy
[...]
crtc_com-261     1...1 24675us : _raw_spin_unlock <-__purge_vmap_area_lazy
crtc_com-261     1...1 24677us : trace_preempt_on <-__purge_vmap_area_lazy
crtc_com-261     1...1 24683us : <stack trace>
 => free_vmap_area_noflush
 => remove_vm_area
 => __vunmap
 => vfree
 => drm_property_free_blob
 => drm_mode_object_unreference
 => drm_property_unreference_blob
 => __drm_atomic_helper_crtc_destroy_state
 => sde_crtc_destroy_state
 => drm_atomic_state_default_clear
 => drm_atomic_state_clear
 => drm_atomic_state_free
 => complete_commit
 => _msm_drm_commit_work_cb
 => kthread_worker_fn
 => kthread
 => ret_from_fork
<snip>

To address those two issues we can redesign a purging of the outstanding
lazy areas.  Instead of queuing vmap areas to the list, we replace it by
the separate rb-tree.  In hat case an area is located in the tree/list in
ascending order.  It will give us below advantages:

a) Outstanding vmap areas are merged creating bigger coalesced blocks,
   thus it becomes less fragmented.

b) It is possible to calculate a flush range [min:max] without scanning
   all elements.  It is O(1) access time or complexity;

c) The final merge of areas with the rb-tree that represents a free
   space is faster because of (a).  As a result the lock contention is
   also reduced.

Link: https://lkml.kernel.org/r/20201116220033.1837-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: huang ying <huang.ying.caritas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:41 -08:00
..
kasan kasan: fix object remaining in offline per-cpu quarantine 2020-12-11 14:02:14 -08:00
backing-dev.c bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag 2020-09-24 13:43:39 -06:00
balloon_compaction.c
cleancache.c
cma.c
cma.h
cma_debug.c
compaction.c mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate 2020-11-14 11:26:03 -08:00
debug.c mm, dump_page: rename head_mapcount() --> head_compound_mapcount() 2020-10-13 18:38:29 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped. 2020-10-16 11:11:14 -07:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm: memcontrol: add file_thp, shmem_thp to memory.stat 2020-12-15 12:13:39 -08:00
frame_vector.c
frontswap.c
gup.c mm/gup: combine put_compound_head() and unpin_user_page() 2020-12-15 12:13:39 -08:00
gup_test.c mm/gup_test.c: mark gup_test_init as __init function 2020-12-15 12:13:38 -08:00
gup_test.h selftests/vm: gup_test: introduce the dump_pages() sub-test 2020-12-15 12:13:38 -08:00
highmem.c mm/highmem.c: clean up endif comments 2020-10-16 11:11:18 -07:00
hmm.c
huge_memory.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
hugetlb.c vm_ops: rename .split() callback to .may_split() 2020-12-15 12:13:41 -08:00
hugetlb_cgroup.c hugetlb_cgroup: fix offline of hugetlb cgroup with reservations 2020-12-06 10:19:07 -08:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
internal.h mm: move free_unref_page to mm/internal.h 2020-12-15 12:13:41 -08:00
interval_tree.c
ioremap.c
Kconfig mm/gup_test: GUP_TEST depends on DEBUG_FS 2020-12-15 12:13:38 -08:00
Kconfig.debug
khugepaged.c mm: memcontrol: add file_thp, shmem_thp to memory.stat 2020-12-15 12:13:39 -08:00
kmemleak.c mm/kmemleak: rely on rcu for task stack scanning 2020-10-13 18:38:27 -07:00
ksm.c docs: get rid of :c:type explicit declarations for structs 2020-10-15 07:49:40 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c
madvise.c mm/madvise: remove racy mm ownership check 2020-12-08 20:57:18 -08:00
Makefile mm: mmap_lock: add tracepoints around lock acquisition 2020-12-15 12:13:41 -08:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: enhance the kernel-doc markups 2020-12-15 12:13:41 -08:00
memblock.c memblock: get rid of a :c:type leftover 2020-10-15 07:49:46 +02:00
memcontrol.c mm: memcontrol: account pagetables per node 2020-12-15 12:13:40 -08:00
memfd.c
memory-failure.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
memory.c mm: cleanup: remove unused tsk arg from __access_remote_vm 2020-12-15 12:13:40 -08:00
memory_hotplug.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
mempolicy.c mm: mempolicy: fix potential pte_unmap_unlock pte error 2020-11-02 12:14:19 -08:00
mempool.c mm/mempool: add 'else' to split mutually exclusive case 2020-10-13 18:38:34 -07:00
memremap.c mm/mremap_pages: fix static key devmap_managed_key updates 2020-11-02 12:14:18 -08:00
memtest.c
migrate.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
mincore.c mm: factor find_get_incore_page out of mincore_page 2020-10-13 18:38:29 -07:00
mlock.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
mm_init.c
mmap.c mm: forbid splitting special mappings 2020-12-15 12:13:41 -08:00
mmap_lock.c mm: mmap_lock: add tracepoints around lock acquisition 2020-12-15 12:13:41 -08:00
mmu_gather.c
mmu_notifier.c mm: track mmu notifiers in fs_reclaim_acquire/release 2020-12-15 12:13:41 -08:00
mmzone.c
mprotect.c
mremap.c mremap: check if it's possible to split original vma 2020-12-15 12:13:41 -08:00
msync.c
nommu.c mm: cleanup: remove unused tsk arg from __access_remote_vm 2020-12-15 12:13:40 -08:00
oom_kill.c mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-13 18:38:35 -07:00
page-writeback.c mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback) 2020-11-24 15:23:19 -08:00
page_alloc.c mm: track mmu notifiers in fs_reclaim_acquire/release 2020-12-15 12:13:41 -08:00
page_counter.c mm/page_counter: use page_counter_read in page_counter_set_max 2020-12-15 12:13:40 -08:00
page_ext.c mm: fix page_owner initializing issue for arm32 2020-12-15 12:13:38 -08:00
page_idle.c
page_io.c mm/page_io.c: remove useless out label in __swap_writepage() 2020-10-13 18:38:30 -07:00
page_isolation.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_owner.c mm/page_owner: record timestamp and pid 2020-12-15 12:13:38 -08:00
page_poison.c mm/page_poison.c: replace bool variable with static key 2020-10-16 11:11:17 -07:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h
page_vma_mapped.c mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte 2020-12-15 12:13:41 -08:00
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c percpu: convert flexible array initializers to use struct_size() 2020-10-30 23:02:28 +00:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c mm/process_vm_access: Add missing #include <linux/compat.h> 2020-10-27 12:41:29 -07:00
ptdump.c
readahead.c mm: use limited read-ahead to satisfy read 2020-10-17 13:49:08 -06:00
rmap.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
rodata_test.c
shmem.c mm: memcontrol: add file_thp, shmem_thp to memory.stat 2020-12-15 12:13:39 -08:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h
slab.c mm/slab: rerform init_on_free earlier 2020-12-15 12:13:37 -08:00
slab.h mm: extract might_alloc() debug check 2020-12-15 12:13:41 -08:00
slab_common.c mm: slab: clarify krealloc()'s behavior with __GFP_ZERO 2020-12-15 12:13:37 -08:00
slob.c mm: extract might_alloc() debug check 2020-12-15 12:13:41 -08:00
slub.c mm/slub: let number of online CPUs determine the slub page order 2020-12-15 12:13:38 -08:00
sparse-vmemmap.c
sparse.c mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG 2020-10-16 11:11:18 -07:00
swap.c mm: remove pagevec_lookup_range_nr_tag() 2020-12-15 12:13:39 -08:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c mm/swap_state: skip meaningless swap cache readahead when ra_info.win == 0 2020-12-15 12:13:39 -08:00
swapfile.c mm/swapfile.c: use memset to fill the swap_map with SWAP_HAS_CACHE 2020-12-15 12:13:39 -08:00
truncate.c mm/truncate: add parameter explanation for invalidate_mapping_pagevec 2020-12-15 12:13:38 -08:00
usercopy.c
userfaultfd.c
util.c mm/util.c: update the kerneldoc for kstrdup_const() 2020-10-16 11:11:17 -07:00
vmacache.c
vmalloc.c mm/vmalloc: rework the drain logic 2020-12-15 12:13:41 -08:00
vmpressure.c
vmscan.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-15 12:13:39 -08:00
vmstat.c mm: memcontrol: account pagetables per node 2020-12-15 12:13:40 -08:00
workingset.c mm: memcg/slab: rename *_lruvec_slab_state to *_lruvec_kmem_state 2020-12-15 12:13:40 -08:00
z3fold.c mm/z3fold.c: use xx_zalloc instead xx_alloc and memset 2020-10-13 18:38:34 -07:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c
zsmalloc.c mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
zswap.c