1
0
Fork 0
alistair23-linux/Documentation
Mel Gorman 1c30844d2d mm: reclaim small amounts of memory when an external fragmentation event occurs
An external fragmentation event was previously described as

    When the page allocator fragments memory, it records the event using
    the mm_page_alloc_extfrag event. If the fallback_order is smaller
    than a pageblock order (order-9 on 64-bit x86) then it's considered
    an event that will cause external fragmentation issues in the future.

The kernel reduces the probability of such events by increasing the
watermark sizes by calling set_recommended_min_free_kbytes early in the
lifetime of the system.  This works reasonably well in general but if
there are enough sparsely populated pageblocks then the problem can still
occur as enough memory is free overall and kswapd stays asleep.

This patch introduces a watermark_boost_factor sysctl that allows a zone
watermark to be temporarily boosted when an external fragmentation causing
events occurs.  The boosting will stall allocations that would decrease
free memory below the boosted low watermark and kswapd is woken if the
calling context allows to reclaim an amount of memory relative to the size
of the high watermark and the watermark_boost_factor until the boost is
cleared.  When kswapd finishes, it wakes kcompactd at the pageblock order
to clean some of the pageblocks that may have been affected by the
fragmentation event.  kswapd avoids any writeback, slab shrinkage and swap
from reclaim context during this operation to avoid excessive system
disruption in the name of fragmentation avoidance.  Care is taken so that
kswapd will do normal reclaim work if the system is really low on memory.

This was evaluated using the same workloads as "mm, page_alloc: Spread
allocations across zones before introducing fragmentation".

1-socket Skylake machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 1 THP allocating thread
--------------------------------------

4.20-rc3 extfrag events < order 9:   804694
4.20-rc3+patch:                      408912 (49% reduction)
4.20-rc3+patch1-4:                    18421 (98% reduction)

                                   4.20.0-rc3             4.20.0-rc3
                                 lowzone-v5r8             boost-v5r8
Amean     fault-base-1      653.58 (   0.00%)      652.71 (   0.13%)
Amean     fault-huge-1        0.00 (   0.00%)      178.93 * -99.00%*

                              4.20.0-rc3             4.20.0-rc3
                            lowzone-v5r8             boost-v5r8
Percentage huge-1        0.00 (   0.00%)        5.12 ( 100.00%)

Note that external fragmentation causing events are massively reduced by
this path whether in comparison to the previous kernel or the vanilla
kernel.  The fault latency for huge pages appears to be increased but that
is only because THP allocations were successful with the patch applied.

1-socket Skylake machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------

4.20-rc3 extfrag events < order 9:  291392
4.20-rc3+patch:                     191187 (34% reduction)
4.20-rc3+patch1-4:                   13464 (95% reduction)

thpfioscale Fault Latencies
                                   4.20.0-rc3             4.20.0-rc3
                                 lowzone-v5r8             boost-v5r8
Min       fault-base-1      912.00 (   0.00%)      905.00 (   0.77%)
Min       fault-huge-1      127.00 (   0.00%)      135.00 (  -6.30%)
Amean     fault-base-1     1467.55 (   0.00%)     1481.67 (  -0.96%)
Amean     fault-huge-1     1127.11 (   0.00%)     1063.88 *   5.61%*

                              4.20.0-rc3             4.20.0-rc3
                            lowzone-v5r8             boost-v5r8
Percentage huge-1       77.64 (   0.00%)       83.46 (   7.49%)

As before, massive reduction in external fragmentation events, some jitter
on latencies and an increase in THP allocation success rates.

2-socket Haswell machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 5 THP allocating threads
----------------------------------------------------------------

4.20-rc3 extfrag events < order 9:  215698
4.20-rc3+patch:                     200210 (7% reduction)
4.20-rc3+patch1-4:                   14263 (93% reduction)

                                   4.20.0-rc3             4.20.0-rc3
                                 lowzone-v5r8             boost-v5r8
Amean     fault-base-5     1346.45 (   0.00%)     1306.87 (   2.94%)
Amean     fault-huge-5     3418.60 (   0.00%)     1348.94 (  60.54%)

                              4.20.0-rc3             4.20.0-rc3
                            lowzone-v5r8             boost-v5r8
Percentage huge-5        0.78 (   0.00%)        7.91 ( 910.64%)

There is a 93% reduction in fragmentation causing events, there is a big
reduction in the huge page fault latency and allocation success rate is
higher.

2-socket Haswell machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------

4.20-rc3 extfrag events < order 9: 166352
4.20-rc3+patch:                    147463 (11% reduction)
4.20-rc3+patch1-4:                  11095 (93% reduction)

thpfioscale Fault Latencies
                                   4.20.0-rc3             4.20.0-rc3
                                 lowzone-v5r8             boost-v5r8
Amean     fault-base-5     6217.43 (   0.00%)     7419.67 * -19.34%*
Amean     fault-huge-5     3163.33 (   0.00%)     3263.80 (  -3.18%)

                              4.20.0-rc3             4.20.0-rc3
                            lowzone-v5r8             boost-v5r8
Percentage huge-5       95.14 (   0.00%)       87.98 (  -7.53%)

There is a large reduction in fragmentation events with some jitter around
the latencies and success rates.  As before, the high THP allocation
success rate does mean the system is under a lot of pressure.  However, as
the fragmentation events are reduced, it would be expected that the
long-term allocation success rate would be higher.

Link: http://lkml.kernel.org/r/20181123114528.28802-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:48 -08:00
..
ABI Device properties framework updates for 4.21-rc1 2018-12-25 15:01:46 -08:00
EDID
PCI pci-v4.20-changes 2018-10-25 06:50:48 -07:00
RCU doc: Fix "struction" typo in RCU memory-ordering documentation 2018-11-12 08:56:25 -08:00
accelerators ocxl: Document new OCXL IOCTLs 2018-06-03 20:40:33 +10:00
accounting psi: cgroup support 2018-10-26 16:26:32 -07:00
acpi ACPI: property: graph: Update graph documentation to use generic references 2018-07-23 12:44:52 +02:00
admin-guide selinux/stable-4.21 PR 20181224 2018-12-27 12:01:58 -08:00
aoe
arm ARM: SoC device tree updates for 4.20 2018-10-29 15:05:20 -07:00
arm64 arm64 festive updates for 4.21 2018-12-25 17:41:56 -08:00
auxdisplay Doc: misc-devices: move lcd-panel-cgram.txt to auxdisplay/ 2018-04-12 16:08:02 +02:00
backlight
block Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
blockdev This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-08-07 11:02:05 -07:00
bus-devices
cdrom Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
cgroup-v1 Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2018-10-25 17:15:46 -07:00
cma
connector
console Documentation: corrections to console/console.txt 2018-08-10 16:09:40 -06:00
core-api XArray: Add xa_cmpxchg_irq and xa_cmpxchg_bh 2018-12-06 08:26:17 -05:00
cpu-freq Documentation: cpu-freq: Frequencies aren't always sorted 2018-11-07 13:29:04 +01:00
cpuidle Documentation: admin-guide: PM: Add cpuidle document 2018-12-03 10:03:36 +01:00
crypto crypto: skcipher - remove remnants of internal IV generators 2018-12-23 11:52:45 +08:00
dev-tools kasan: update documentation 2018-12-28 12:11:44 -08:00
device-mapper This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
devicetree Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-12-27 13:53:32 -08:00
doc-guide Documentation/sphinx: allow "functions" with no parameters 2018-06-30 07:52:42 -06:00
driver-api docs: driver-api: Add I3C documentation 2018-11-12 10:33:49 +01:00
driver-model gpio: Add devm_gpiod_unhinge() 2018-12-11 01:04:23 +00:00
early-userspace initramfs: move gen_initramfs_list.sh from scripts/ to usr/ 2018-08-22 23:21:44 +09:00
extcon
fault-injection Documentation: nvme: Documentation for nvme fault injection 2018-03-26 08:53:43 -06:00
fb This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
features MIPS: Enable IOREMAP_PROT config option for MIPS cpus 2018-11-05 10:15:28 -08:00
filesystems This pull request contains updates for UBIFS: 2018-11-04 14:46:04 -08:00
firmware_class
fmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
fpga docs: fpga: add a document for FPGA Device Feature List (DFL) Framework Overview 2018-07-15 13:55:44 +02:00
gpio Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
gpu Final changes to drm-misc-next for v4.21: 2018-12-07 11:23:05 +10:00
hid Documentation: fix input related doc refs 2017-10-12 11:14:06 -06:00
hwmon hwmon: (ina3221) Read channel input source info from DT 2018-10-10 20:37:13 -07:00
i2c i2c: add i2c bus driver for NVIDIA GPU 2018-11-09 17:46:43 +01:00
ia64 ia64: doc: tweak whitespace for 'console=' parameter 2018-03-05 14:41:38 -08:00
ide Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
iio
infiniband Documentation/ABI: update infiniband sysfs interfaces 2018-02-23 08:18:33 -07:00
input Revert "Input: Add the `REL_WHEEL_HI_RES` event code" 2018-11-22 08:57:44 +01:00
ioctl drm pull for 4.20-rc1 2018-10-28 17:49:53 -07:00
isdn Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
kbuild kbuild: remove unused cc-fullversion variable 2018-11-02 00:15:26 +09:00
kdump
kernel-hacking doc:it_IT: translation for kernel-hacking 2018-07-26 16:21:09 -06:00
laptops platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
leds Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
lightnvm
livepatch livepatch: Remove not longer valid limitations from the documentation 2018-05-24 15:37:57 +02:00
locking This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
m68k Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
maintainer docs: Fix more broken references 2018-06-15 18:11:26 -03:00
md raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
media media fixes for v4.20-rc8 2018-12-25 13:11:30 -08:00
memory-devices
mic
mips Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
misc-devices pci_endpoint_test: Add 2 ioctl commands 2018-07-19 11:46:57 +01:00
mmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
mtd Documentation: mtd: remove stale pxa3xx NAND controller documentation 2018-09-04 23:37:38 +02:00
namespaces
netlabel Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
networking Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2018-12-20 18:20:26 -08:00
nfc
nios2
nvdimm
nvmem Documentation: nvmem: document cell tables and lookup entries 2018-09-28 15:14:54 +02:00
openrisc Documentation: openrisc: Updates to README 2017-10-30 21:37:53 +09:00
parisc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
pcmcia pcmcia: remove long deprecated pcmcia_request_exclusive_irq() function 2018-08-18 12:30:42 -07:00
perf Documentation: perf: Add documentation for ThunderX2 PMU uncore driver 2018-12-06 12:29:47 +00:00
phy
platform
power This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
powerpc powerpc/fadump: Reservationless firmware assisted dump 2018-12-21 11:32:49 +11:00
pps
process The Compiler Attributes series 2018-11-01 18:34:46 -07:00
pti
ptp ptp: Fix documentation to match code. 2018-03-26 12:13:21 -04:00
rapidio Documentation: rapidio: move sysfs interface to ABI 2018-02-23 08:25:45 -07:00
riscv perf: riscv: Add Document for Future Porting Guide 2018-06-04 14:02:11 -07:00
s390 KVM updates for v4.20 2018-10-25 17:57:35 -07:00
scheduler This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
scsi SCSI misc on 20181024 2018-10-25 07:40:30 -07:00
security Merge branch 'next-keys2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2018-11-01 15:23:59 -07:00
serial TTY/Serial patches for 4.20-rc1 2018-10-29 10:42:20 -07:00
sh
sound ALSA: doc: Brush up the old writing-an-alsa-driver 2018-10-18 10:30:01 +02:00
sparc sparc64: Add support for ADI (Application Data Integrity) 2018-03-18 07:38:48 -07:00
sphinx Documentation/sphinx: allow "functions" with no parameters 2018-06-30 07:52:42 -06:00
sphinx-static docs: improve readability for people with poorer eyesight 2018-10-07 09:16:50 -06:00
spi Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
sysctl mm: reclaim small amounts of memory when an external fragmentation event occurs 2018-12-28 12:11:48 -08:00
target
thermal thermal: Add cooling device's statistics in sysfs 2018-04-02 21:49:01 +08:00
timers Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
trace The biggest change here is the updates to kprobes 2018-10-30 09:49:56 -07:00
translations This was a moderately busy cycle for docs, with the usual collection of 2018-08-14 14:29:31 -07:00
usb USB-serial updates for v4.19-rc1 2018-07-20 21:47:15 +02:00
userspace-api x86/speculation: Add prctl() control for indirect branch speculation 2018-11-28 11:57:13 +01:00
virtual x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID 2018-12-14 17:59:54 +01:00
vm Merge drm/drm-next into drm-intel-next-queued 2018-11-20 13:14:08 +02:00
w1 Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
watchdog documentation: watchdog: add documentation for armada-37xx-wdt 2018-10-13 15:19:40 +02:00
wimax
x86 Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-12-26 12:17:43 -08:00
xilinx Documentation: xilinx: Add documentation for eemi APIs 2018-10-09 13:26:05 +02:00
xtensa xtensa: add support for KASAN 2017-12-16 22:37:12 -08:00
.gitignore
Changes
CodingStyle
DMA-API-HOWTO.txt
DMA-API.txt
DMA-ISA-LPC.txt
DMA-attributes.txt
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt irqdomain: Kill CONFIG_IRQ_DOMAIN_DEBUG 2018-01-24 12:32:58 +01:00
IRQ.txt
Intel-IOMMU.txt
Makefile Documentation: add script and build target to check for broken file references 2017-10-12 11:07:42 -06:00
SAK.txt
SM501.txt
SubmittingPatches
atomic_bitops.txt locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit() 2018-02-13 14:55:53 +01:00
atomic_t.txt
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
clearing-warn-once.txt kernel debug: support resetting WARN*_ONCE 2017-11-17 16:10:00 -08:00
conf.py This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt Documentation: remove stale firmware API reference 2018-05-14 16:44:41 +02:00
digsig.txt
docutils.conf
dontdiff
efi-stub.txt efi_stub: update documentation on dtb= parameter 2018-09-09 14:46:44 -06:00
eisa.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcc-plugins.txt
highuid.txt
hw_random.txt
hwspinlock.txt
index.rst docs: tidy up TOCs and refs to license-rules.rst 2018-08-31 16:50:50 -06:00
intel_txt.txt
io-mapping.txt
io_ordering.txt
iostats.txt block: Track DISCARD statistics and output them in stat and diskstat 2018-07-18 08:44:22 -06:00
irqflags-tracing.txt
isa.txt
isapnp.txt
kernel-per-CPU-kthreads.txt doc: Update removal of RCU-bh/sched update machinery 2018-08-30 10:59:48 -07:00
kobject.txt
kprobes.txt kprobes/Documentation: Fix various typos 2018-06-22 11:10:55 +02:00
kref.txt
ldm.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lsm.txt
lzo.txt
mailbox.txt
memory-barriers.txt locking/memory-barriers: Replace smp_cond_acquire() with smp_cond_load_acquire() 2018-10-02 10:28:05 +02:00
men-chameleon-bus.txt
nommu-mmap.txt Documentation: nommu-map: Fix duplicate word typo 2018-06-26 09:01:27 -06:00
ntb.txt
numastat.txt
padata.txt
parport-lowlevel.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt Documentation: fix locking rt-mutex doc refs 2017-10-19 12:56:44 -06:00
pnp.txt
preempt-locking.txt Documentation: preempt-locking: Use better example 2018-10-12 11:35:47 -06:00
pwm.txt
rbtree.txt
remoteproc.txt
rfkill.txt rfkill: Fix several typos in documentation 2018-06-15 13:36:08 +02:00
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt Documentation: rtc: move iotcl interface documentation to ABI 2018-01-12 00:20:41 +01:00
sgi-ioc4.txt
siphash.txt
smsc_ece1099.txt
speculation.txt Documentation: Document array_index_nospec 2018-01-30 21:54:28 +01:00
static-keys.txt
svga.txt documentation/svga.txt: update outdated file 2017-11-20 10:45:50 -07:00
switchtec.txt NTB: switchtec_ntb: Update switchtec documentation with prerequisites for NTB 2018-10-11 11:28:53 -05:00
sync_file.txt
tee.txt
this_cpu_ops.txt
unaligned-memory-access.txt
vfio-mediated-device.txt vfio/mdev: Check globally for duplicate devices 2018-06-08 10:24:27 -06:00
vfio.txt vfio: fix documentation 2018-05-08 09:16:41 -06:00
video-output.txt
xillybus.txt
xz.txt
zorro.txt