1
0
Fork 0
alistair23-linux/Documentation
Roman Gushchin 7a1adfddaf mm: don't raise MEMCG_OOM event due to failed high-order allocation
It was reported that on some of our machines containers were restarted
with OOM symptoms without an obvious reason.  Despite there were almost no
memory pressure and plenty of page cache, MEMCG_OOM event was raised
occasionally, causing the container management software to think, that OOM
has happened.  However, no tasks have been killed.

The following investigation showed that the problem is caused by a failing
attempt to charge a high-order page.  In such case, the OOM killer is
never invoked.  As shown below, it can happen under conditions, which are
very far from a real OOM: e.g.  there is plenty of clean page cache and no
memory pressure.

There is no sense in raising an OOM event in this case, as it might
confuse a user and lead to wrong and excessive actions (e.g.  restart the
workload, as in my case).

Let's look at the charging path in try_charge().  If the memory usage is
about memory.max, which is absolutely natural for most memory cgroups, we
try to reclaim some pages.  Even if we were able to reclaim enough memory
for the allocation, the following check can fail due to a race with
another concurrent allocation:

    if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
        goto retry;

For regular pages the following condition will save us from triggering
the OOM:

   if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
       goto retry;

But for high-order allocation this condition will intentionally fail.  The
reason behind is that we'll likely fall to regular pages anyway, so it's
ok and even preferred to return ENOMEM.

In this case the idea of raising MEMCG_OOM looks dubious.

Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after allocation
order check, so that the event won't be raised for high order allocations.
This change doesn't affect regular pages allocation and charging.

Link: http://lkml.kernel.org/r/20181004214050.7417-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:38:14 -07:00
..
ABI Char/Misc driver patches for 4.20-rc1 2018-10-26 09:11:43 -07:00
EDID
PCI pci-v4.20-changes 2018-10-25 06:50:48 -07:00
RCU This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
accelerators ocxl: Document new OCXL IOCTLs 2018-06-03 20:40:33 +10:00
accounting psi: cgroup support 2018-10-26 16:26:32 -07:00
acpi ACPI: property: graph: Update graph documentation to use generic references 2018-07-23 12:44:52 +02:00
admin-guide mm: don't raise MEMCG_OOM event due to failed high-order allocation 2018-10-26 16:38:14 -07:00
aoe
arm Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
arm64 Documentation/arm64: HugeTLB page implementation 2018-10-10 18:08:36 +01:00
auxdisplay Doc: misc-devices: move lcd-panel-cgram.txt to auxdisplay/ 2018-04-12 16:08:02 +02:00
backlight
block Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
blockdev This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-08-07 11:02:05 -07:00
bus-devices
cdrom Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
cgroup-v1 Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2018-10-25 17:15:46 -07:00
cma
connector
console Documentation: corrections to console/console.txt 2018-08-10 16:09:40 -06:00
core-api Printk changes for 4.20 2018-10-25 17:11:52 -07:00
cpu-freq cpufreq: Drop cpufreq_table_validate_and_show() 2018-04-10 08:40:45 +02:00
cpuidle cpuidle: Add definition of residency to sysfs documentation 2018-04-09 13:44:37 +02:00
crypto crypto: remove redundant type flags from tfm allocation 2018-07-09 00:30:29 +08:00
dev-tools docs: dev-tools: coccinelle: Update documentation 2018-08-31 16:51:59 -06:00
device-mapper This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
devicetree Char/Misc driver patches for 4.20-rc1 2018-10-26 09:11:43 -07:00
doc-guide Documentation/sphinx: allow "functions" with no parameters 2018-06-30 07:52:42 -06:00
driver-api Char/Misc driver patches for 4.20-rc1 2018-10-26 09:11:43 -07:00
driver-model dmaengine: add a new helper dmaenginem_async_device_register 2018-07-30 10:50:22 +05:30
early-userspace initramfs: move gen_initramfs_list.sh from scripts/ to usr/ 2018-08-22 23:21:44 +09:00
extcon
fault-injection Documentation: nvme: Documentation for nvme fault injection 2018-03-26 08:53:43 -06:00
fb This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
features ARM: 8777/1: Hook up SYNC_CORE functionality for sys_membarrier() 2018-07-11 11:02:08 +01:00
filesystems mm, proc: add KReclaimable to /proc/meminfo 2018-10-26 16:26:32 -07:00
firmware_class
fmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
fpga docs: fpga: add a document for FPGA Device Feature List (DFL) Framework Overview 2018-07-15 13:55:44 +02:00
gpio Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
gpu drm/msm/gpu: Add the buffer objects from the submit to the crash dump 2018-07-30 08:50:10 -04:00
hid Documentation: fix input related doc refs 2017-10-12 11:14:06 -06:00
hwmon hwmon: (ina3221) Read channel input source info from DT 2018-10-10 20:37:13 -07:00
i2c i2c: refactor function to release a DMA safe buffer 2018-08-30 23:13:15 +02:00
ia64 ia64: doc: tweak whitespace for 'console=' parameter 2018-03-05 14:41:38 -08:00
ide Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
iio iio: adc: New driver for Cirrus Logic EP93xx ADC 2017-07-25 19:56:23 +01:00
infiniband Documentation/ABI: update infiniband sysfs interfaces 2018-02-23 08:18:33 -07:00
input Input: Add the `REL_WHEEL_HI_RES` event code 2018-09-05 10:12:07 +02:00
ioctl USB/PHY patches for 4.20-rc1 2018-10-26 08:14:13 -07:00
isdn Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
kbuild Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
kdump
kernel-hacking doc:it_IT: translation for kernel-hacking 2018-07-26 16:21:09 -06:00
laptops Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
leds Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
lightnvm
livepatch livepatch: Remove not longer valid limitations from the documentation 2018-05-24 15:37:57 +02:00
locking This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
m68k Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
maintainer docs: Fix more broken references 2018-06-15 18:11:26 -03:00
md raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
media media: video_function_calls.rst: drop obsolete video-set-attributes reference 2018-08-29 13:24:30 -04:00
memory-devices
mic
mips Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
misc-devices pci_endpoint_test: Add 2 ioctl commands 2018-07-19 11:46:57 +01:00
mmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
mtd Documentation: mtd: remove stale pxa3xx NAND controller documentation 2018-09-04 23:37:38 +02:00
namespaces
netlabel Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
networking This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
nfc
nios2
nvdimm
nvmem Documentation: nvmem: document cell tables and lookup entries 2018-09-28 15:14:54 +02:00
openrisc Documentation: openrisc: Updates to README 2017-10-30 21:37:53 +09:00
parisc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
pcmcia pcmcia: remove long deprecated pcmcia_request_exclusive_irq() function 2018-08-18 12:30:42 -07:00
perf drivers/bus: Move Arm CCN PMU driver 2018-03-06 17:26:15 +01:00
phy
platform
power This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
powerpc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
pps drivers/pps: aesthetic tweaks to PPS-related content 2017-09-08 18:26:51 -07:00
process This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
pti
ptp ptp: Fix documentation to match code. 2018-03-26 12:13:21 -04:00
rapidio Documentation: rapidio: move sysfs interface to ABI 2018-02-23 08:25:45 -07:00
riscv perf: riscv: Add Document for Future Porting Guide 2018-06-04 14:02:11 -07:00
s390 KVM updates for v4.20 2018-10-25 17:57:35 -07:00
scheduler This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
scsi SCSI misc on 20181024 2018-10-25 07:40:30 -07:00
security This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
serial Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
sh
sound ALSA: doc: Brush up the old writing-an-alsa-driver 2018-10-18 10:30:01 +02:00
sparc sparc64: Add support for ADI (Application Data Integrity) 2018-03-18 07:38:48 -07:00
sphinx Documentation/sphinx: allow "functions" with no parameters 2018-06-30 07:52:42 -06:00
sphinx-static docs: improve readability for people with poorer eyesight 2018-10-07 09:16:50 -06:00
spi Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
sysctl Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
target
thermal thermal: Add cooling device's statistics in sysfs 2018-04-02 21:49:01 +08:00
timers Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
trace Char/Misc driver patches for 4.20-rc1 2018-10-26 09:11:43 -07:00
translations This was a moderately busy cycle for docs, with the usual collection of 2018-08-14 14:29:31 -07:00
usb USB-serial updates for v4.19-rc1 2018-07-20 21:47:15 +02:00
userspace-api audit/stable-4.18 PR 20180605 2018-06-06 16:34:00 -07:00
virtual KVM updates for v4.20 2018-10-25 17:57:35 -07:00
vm slub: extend slub debug to handle multiple slabs 2018-10-26 16:25:19 -07:00
w1 Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
watchdog watchdog: remove bfin_wdt driver 2018-03-26 15:57:04 +02:00
wimax
x86 mm: remove references to vm_insert_pfn() 2018-10-26 16:25:20 -07:00
xtensa xtensa: add support for KASAN 2017-12-16 22:37:12 -08:00
.gitignore
Changes
CodingStyle
DMA-API-HOWTO.txt
DMA-API.txt dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags 2017-09-01 11:59:17 +02:00
DMA-ISA-LPC.txt
DMA-attributes.txt
IPMI.txt ipmi: Make IPMI panic strings always available 2017-09-27 16:03:45 -05:00
IRQ-affinity.txt
IRQ-domain.txt irqdomain: Kill CONFIG_IRQ_DOMAIN_DEBUG 2018-01-24 12:32:58 +01:00
IRQ.txt
Intel-IOMMU.txt
Makefile Documentation: add script and build target to check for broken file references 2017-10-12 11:07:42 -06:00
SAK.txt
SM501.txt
SubmittingPatches
atomic_bitops.txt locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit() 2018-02-13 14:55:53 +01:00
atomic_t.txt Documentation/locking/atomic: Finish the document... 2017-08-25 11:06:33 +02:00
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
clearing-warn-once.txt kernel debug: support resetting WARN*_ONCE 2017-11-17 16:10:00 -08:00
conf.py This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt Documentation: remove stale firmware API reference 2018-05-14 16:44:41 +02:00
digsig.txt
docutils.conf
dontdiff Remove gperf usage from toolchain 2017-08-19 11:02:53 -07:00
efi-stub.txt efi_stub: update documentation on dtb= parameter 2018-09-09 14:46:44 -06:00
eisa.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcc-plugins.txt
highuid.txt
hw_random.txt
hwspinlock.txt
index.rst docs: tidy up TOCs and refs to license-rules.rst 2018-08-31 16:50:50 -06:00
intel_txt.txt
io-mapping.txt
io_ordering.txt
iostats.txt block: Track DISCARD statistics and output them in stat and diskstat 2018-07-18 08:44:22 -06:00
irqflags-tracing.txt
isa.txt
isapnp.txt
kernel-per-CPU-kthreads.txt doc: Update removal of RCU-bh/sched update machinery 2018-08-30 10:59:48 -07:00
kobject.txt
kprobes.txt kprobes/Documentation: Fix various typos 2018-06-22 11:10:55 +02:00
kref.txt
ldm.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lsm.txt
lzo.txt
mailbox.txt
memory-barriers.txt locking/memory-barriers: Replace smp_cond_acquire() with smp_cond_load_acquire() 2018-10-02 10:28:05 +02:00
men-chameleon-bus.txt
nommu-mmap.txt Documentation: nommu-map: Fix duplicate word typo 2018-06-26 09:01:27 -06:00
ntb.txt
numastat.txt
padata.txt
parport-lowlevel.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt Documentation: fix locking rt-mutex doc refs 2017-10-19 12:56:44 -06:00
pnp.txt
preempt-locking.txt Documentation: preempt-locking: Use better example 2018-10-12 11:35:47 -06:00
pwm.txt
rbtree.txt rbtree: cache leftmost node internally 2017-09-08 18:26:48 -07:00
remoteproc.txt
rfkill.txt rfkill: Fix several typos in documentation 2018-06-15 13:36:08 +02:00
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt Documentation: rtc: move iotcl interface documentation to ABI 2018-01-12 00:20:41 +01:00
sgi-ioc4.txt
siphash.txt
smsc_ece1099.txt
speculation.txt Documentation: Document array_index_nospec 2018-01-30 21:54:28 +01:00
static-keys.txt jump_label: Provide hotplug context variants 2017-08-10 12:28:59 +02:00
svga.txt documentation/svga.txt: update outdated file 2017-11-20 10:45:50 -07:00
switchtec.txt NTB: switchtec_ntb: Update switchtec documentation with prerequisites for NTB 2018-10-11 11:28:53 -05:00
sync_file.txt
tee.txt
this_cpu_ops.txt
unaligned-memory-access.txt
vfio-mediated-device.txt vfio/mdev: Check globally for duplicate devices 2018-06-08 10:24:27 -06:00
vfio.txt vfio: fix documentation 2018-05-08 09:16:41 -06:00
video-output.txt
xillybus.txt
xz.txt
zorro.txt