alistair23-linux/drivers
David Hildenbrand 238dd5ab00 drivers/base/memory.c: indicate all memory blocks as removable
commit 53cdc1cb29 upstream.

We see multiple issues with the implementation/interface to compute
whether a memory block can be offlined (exposed via
/sys/devices/system/memory/memoryX/removable) and would like to simplify
it (remove the implementation).

1. It runs basically lockless. While this might be good for performance,
   we see possible races with memory offlining that will require at
   least some sort of locking to fix.

2. Nowadays, more false positives are possible. No arch-specific checks
   are performed to validate whether memory offlining would be denied
   right away (and such checks would require locking). For example, arm64
   does not allow offlining any memory block that was added during boot,
   which implies a very high error rate. Other archs have other
   constraints.

3. The interface is inherently racy. E.g., if a memory block is detected
   to be removable (and was not a false positive at that time), there is
   still no guarantee that offlining will actually succeed. So any
   caller already has to deal with false positives.

4. It is unclear what performance benefit this interface actually
   provides. The introducing commit 5c755e9fd8 ("memory-hotplug: add
   sysfs removable attribute for hotplug memory remove") mentioned

	"A user-level agent must be able to identify which sections
	 of memory are likely to be removable before attempting the
	 potentially expensive operation."

   However, no actual performance comparison was included.
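Point 3 means that any caller has to treat "removable" as a hint only; the result of the offline request is what actually counts. A minimal userspace sketch of that pattern, assuming the sysfs layout /sys/devices/system/memory/memoryX/{removable,state} and that writing "offline" to "state" requests offlining of the block. parse_removable() and try_offline_block() are hypothetical helpers for illustration, not code taken from lsmem, chmem or powerpc-utils:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Parse the contents of a memoryX/removable attribute.
 * Returns 1 or 0 for a valid value, -1 for anything unexpected. */
static int parse_removable(const char *buf)
{
    if (strcmp(buf, "1\n") == 0 || strcmp(buf, "1") == 0)
        return 1;
    if (strcmp(buf, "0\n") == 0 || strcmp(buf, "0") == 0)
        return 0;
    return -1;
}

/* Hypothetical helper: try to offline one memory block, e.g.
 * block_dir = "/sys/devices/system/memory/memory32".
 * Returns 0 on success, or a positive errno value if the block was
 * skipped or offlining failed. "removable" is used only as a hint. */
static int try_offline_block(const char *block_dir)
{
    char path[512];
    char buf[16] = {0};
    FILE *f;

    snprintf(path, sizeof(path), "%s/removable", block_dir);
    f = fopen(path, "r");
    if (!f)
        return errno;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return EIO;
    }
    fclose(f);
    if (parse_removable(buf) != 1)
        return EPERM; /* reported as not removable: skip this block */

    /* Even "1" is no guarantee (the interface is racy and may report
     * false positives), so the outcome of the write decides. */
    snprintf(path, sizeof(path), "%s/state", block_dir);
    f = fopen(path, "w");
    if (!f)
        return errno;
    errno = 0;
    if (fputs("offline", f) == EOF) {
        int err = errno ? errno : EIO;
        fclose(f);
        return err;
    }
    /* stdio buffers the write; a sysfs error surfaces on flush/close. */
    errno = 0;
    if (fclose(f) != 0)
        return errno ? errno : EIO;
    return 0;
}
```

A caller would iterate over the memoryX directories and simply move on to the next block whenever try_offline_block() returns non-zero. With every block reported as removable, the first check never filters anything, and the loop still behaves correctly.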

Known users:

 - lsmem: Will group memory blocks based on the "removable" property. [1]

 - chmem: Indirect user. It has a RANGE mode where one can specify
          removable ranges identified via lsmem to be offlined. However,
          it also has a "SIZE" mode, which allows a sysadmin to skip the
          manual "identify removable blocks" step. [2]

 - powerpc-utils: Uses the "removable" attribute to skip some memory
          blocks right away when trying to find some to offline+remove.
          However, with ballooning enabled, it already skips this
          information completely (because it once resulted in many false
          negatives). Therefore, the implementation can deal with false
          positives properly already. [3]

According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
driven from userspace via the drmgr command (powerpc-utils).  Instead,
it is managed in the kernel, including onlining/offlining of memory
blocks, triggered by drmgr writing to /sys/kernel/dlpar.  So the
affected legacy userspace handling is only active on old kernels.  Only
a very old version of drmgr running on a new kernel (an unlikely
combination) might run slower, which is totally acceptable.

With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
break any user space tool.  The current implementation is only a very
bad heuristic anyway.  Without CONFIG_MEMORY_HOTREMOVE we cannot offline
anything, so report "not removable" as before.
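After this change the attribute no longer depends on the state of the memory block at all: it reduces to a build-time constant. The sketch below models that in userspace. MODEL_MEMORY_HOTREMOVE is a hypothetical stand-in for the kernel's CONFIG_MEMORY_HOTREMOVE (inside the kernel one would presumably test it with IS_ENABLED()), and model_show_removable() is an illustrative function, not the actual diff:

```c
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_MEMORY_HOTREMOVE; build with
 * -DMODEL_MEMORY_HOTREMOVE=0 to model a kernel without hot-remove. */
#ifndef MODEL_MEMORY_HOTREMOVE
#define MODEL_MEMORY_HOTREMOVE 1
#endif

/* Userspace model of the "removable" show routine after this change:
 * no per-section scan and no locking, just the build-time answer.
 * Returns the number of characters written, like snprintf(). */
static int model_show_removable(char *buf, size_t len)
{
    return snprintf(buf, len, "%d\n", MODEL_MEMORY_HOTREMOVE ? 1 : 0);
}
```

Because nothing is computed per block, the races described in points 1 and 3 disappear from the attribute itself; callers are left to discover offlining failures when they actually try.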

Original discussion can be found in [4] ("[PATCH RFC v1] mm:
is_mem_section_removable() overhaul").

Other users of is_mem_section_removable() will be removed next, so that
we can remove is_mem_section_removable() completely.

[1] http://man7.org/linux/man-pages/man1/lsmem.1.html
[2] http://man7.org/linux/man-pages/man8/chmem.8.html
[3] https://github.com/ibm-power-utilities/powerpc-utils
[4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com

Also, this patch probably fixes a crash reported by Steve.
http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com

Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Nathan Fontenot <ndfont@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-01 11:02:02 +02:00
accessibility
acpi ACPI: PM: s2idle: Rework ACPI events synchronization 2020-04-01 11:01:29 +02:00
amba ARM updates for 5.4-rc: 2019-10-23 06:26:33 -04:00
android binderfs: use refcount for binder control devices too 2020-03-25 08:25:50 +01:00
ata ata: ahci: Add shutdown to freeze hardware resources of ahci 2020-02-28 17:22:28 +01:00
atm fore200e: Fix incorrect checks of NULL pointer dereference 2020-02-24 08:36:36 +01:00
auxdisplay
base drivers/base/memory.c: indicate all memory blocks as removable 2020-04-01 11:02:02 +02:00
bcma
block virtio-blk: fix hw_queue stopped on arbitrary error 2020-03-18 07:17:48 +01:00
bluetooth Bluetooth: btusb: Disable runtime suspend on Realtek devices 2020-02-11 04:35:09 -08:00
bus bus: ti-sysc: Fix 1-wire reset quirk 2020-03-12 13:00:31 +01:00
cdrom cdrom: respect device capabilities during opening action 2020-01-04 19:18:25 +01:00
char ipmi_si: Avoid spurious errors for optional IRQs 2020-03-18 07:17:52 +01:00
clk clk: uniphier: Add SCSSI clock gate for each channel 2020-02-24 08:36:42 +01:00
clocksource clocksource: davinci: only enable clockevents once tim34 is initialized 2020-02-24 08:36:46 +01:00
connector
counter
cpufreq cpufreq: Fix policy initialization for internal governor drivers 2020-03-05 16:43:44 +01:00
cpuidle cpuidle: teo: Avoid using "early hits" incorrectly 2020-02-05 21:22:52 +00:00
crypto crypto: chtls - Fixed memory leak 2020-02-24 08:36:40 +01:00
dax
dca
devfreq Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs" 2020-03-05 16:43:43 +01:00
dio
dma dmaengine: coh901318: Fix a double lock bug in dma_tc_handle() 2020-03-12 13:00:30 +01:00
dma-buf dma-buf: free dmabuf->name in dma_buf_release() 2020-03-12 13:00:30 +01:00
edac EDAC/synopsys: Do not print an error with back-to-back snprintf() calls 2020-03-12 13:00:31 +01:00
eisa
extcon extcon-intel-cht-wc: Don't reset USB data connection at probe 2020-02-01 09:34:46 +00:00
firewire net: add annotations on hh->hh_len lockless accesses 2020-01-09 10:20:06 +01:00
firmware efi: Add a sanity check to efivar_store_raw() 2020-03-18 07:17:53 +01:00
fpga
fsi fsi: core: Fix small accesses and unaligned offsets via sysfs 2019-12-31 16:45:09 +01:00
gnss
gpio gpiolib: Fix irq_disable() semantics 2020-04-01 11:01:58 +02:00
gpu drm/exynos: Fix cleanup of IOMMU related objects 2020-04-01 11:01:53 +02:00
greybus
hid HID: add ALWAYS_POLL quirk to lenovo pixart mouse 2020-03-21 08:11:59 +01:00
hsi
hv hv_balloon: Balloon up according to request page number 2020-02-11 04:35:21 -08:00
hwmon hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT() 2020-03-12 13:00:30 +01:00
hwspinlock
hwtracing stm class: sys-t: Fix the use of time_after() 2020-03-25 08:25:56 +01:00
i2c i2c: hix5hd2: add missed clk_disable_unprepare in remove 2020-04-01 11:01:56 +02:00
i3c
ide ide: serverworks: potential overflow in svwks_set_pio_mode() 2020-02-24 08:36:53 +01:00
idle
iio iio: light: vcnl4000: update sampling periods for vcnl4040 2020-03-25 08:25:54 +01:00
infiniband RDMA/mad: Do not crash if the rdma device does not have a umad interface 2020-04-01 11:01:58 +02:00
input Input: synaptics - enable RMI on HP Envy 13-ad105ng 2020-04-01 11:01:57 +02:00
interconnect interconnect: qcom: qcs404: Walk the list safely on node removal 2019-12-17 19:55:39 +01:00
iommu iommu/vt-d: Populate debugfs if IOMMUs are detected 2020-04-01 11:01:56 +02:00
ipack
irqchip irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL 2020-02-24 08:37:01 +01:00
isdn net: use skb_queue_empty_lockless() in poll() handlers 2019-10-28 13:33:41 -07:00
leds leds: pca963x: Fix open-drain initialization 2020-02-24 08:36:24 +01:00
lightnvm
macintosh macintosh: windfarm: fix MODINFO regression 2020-03-18 07:17:53 +01:00
mailbox mailbox: imx: Fix Tx doorbell shutdown path 2020-01-04 19:18:30 +01:00
mcb
md dm integrity: use dm_bio_record and dm_bio_restore 2020-03-25 08:25:48 +01:00
media media: v4l2-mem2mem.c: fix broken links 2020-03-12 13:00:21 +01:00
memory memory: mtk-smi: Add PM suspend and resume ops 2020-01-17 19:48:59 +01:00
memstick
message scsi: mptfusion: Fix double fetch bug in ioctl 2020-01-23 08:22:35 +01:00
mfd mfd: max77650: Select REGMAP_IRQ in Kconfig 2020-02-14 16:34:19 -05:00
misc mmc: rtsx_pci: Fix support for speed-modes that relies on tuning 2020-03-25 08:25:54 +01:00
mmc mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY 2020-04-01 11:01:29 +02:00
mtd mtd: sharpslpart: Fix unsigned comparison to zero 2020-02-14 16:34:18 -05:00
mux
net rtlwifi: rtl8188ee: Fix regression due to commit d1d1a96bdb 2020-04-01 11:02:00 +02:00
nfc NFC: fdp: Fix a signedness bug in fdp_nci_send_patch() 2020-04-01 11:01:38 +02:00
ntb
nubus
nvdimm libnvdimm/btt: fix variable 'rc' set but not used 2020-01-04 19:18:12 +01:00
nvme nvmet-tcp: set MSG_MORE only if we actually have more to send 2020-03-25 08:25:59 +01:00
nvmem nvmem: core: fix memory abort in cleanup path 2020-02-11 04:35:21 -08:00
of drivers/of/of_mdio.c:fix of_mdiobus_register() 2020-04-01 11:01:51 +02:00
opp opp: Free static OPPs on errors while adding them 2020-02-24 08:36:34 +01:00
oprofile
parisc
parport parport: load lowlevel driver if ports not found 2019-12-31 16:45:25 +01:00
pci PCI: Add DMA alias quirk for PLX PEX NTB 2020-02-24 08:36:37 +01:00
pcmcia
perf drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer 2020-03-25 08:25:47 +01:00
phy phy: ti: gmii-sel: do not fail in case of gmii 2020-03-25 08:25:42 +01:00
pinctrl pinctrl: core: Remove extra kref_get which blocks hogs being freed 2020-03-18 07:17:55 +01:00
platform platform/x86: intel_mid_powerbtn: Take a copy of ddata 2020-02-14 16:34:12 -05:00
pnp
power power: supply: ltc2941-battery-gauge: fix use-after-free 2020-02-11 04:35:24 -08:00
powercap powercap: intel_rapl: add NULL pointer check to rapl_mmio_cpu_online() 2020-01-14 20:08:18 +01:00
pps
ps3
ptp ptp: free ptp device pin descriptors properly 2020-01-23 08:22:51 +01:00
pwm pwm: omap-dmtimer: put_device() after of_find_device_by_node() 2020-03-05 16:43:49 +01:00
rapidio
ras
regulator regulator: stm32-vrefbuf: fix a possible overshoot when re-enabling 2020-03-12 13:00:29 +01:00
remoteproc remoteproc: Initialize rproc_class before use 2020-02-24 08:36:54 +01:00
reset reset: uniphier: Add SCSSI reset control for each channel 2020-02-24 08:36:41 +01:00
rpmsg rpmsg: char: release allocated memory 2020-01-14 20:08:37 +01:00
rtc rtc: max8907: add missing select REGMAP_IRQ 2020-03-25 08:25:56 +01:00
s390 s390/qeth: handle error when backing RX buffer 2020-04-01 11:01:54 +02:00
sbus
scsi scsi: sd: Fix optimal I/O size for devices that change reported values 2020-04-01 11:02:01 +02:00
sfi
sh
siox
slimbus
soc soc: imx-scu: Align imx sc msg structs to 4 2020-03-12 13:00:28 +01:00
soundwire soundwire: intel: fix PDI/stream mapping for Bulk 2019-12-31 16:45:11 +01:00
spi spi: spi_register_controller(): free bus id on error paths 2020-03-25 08:25:48 +01:00
spmi spmi: pmic-arb: Set lockdep class for hierarchical irq domains 2020-02-19 19:53:07 +01:00
ssb
staging staging: greybus: loopback_test: fix potential path truncations 2020-03-25 08:25:59 +01:00
target scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session" 2020-02-28 17:22:25 +01:00
tc
tee tee: optee: Fix compilation issue with nommu 2020-02-05 21:22:49 +00:00
thermal thermal: brcmstb_thermal: Do not use DT coefficients 2020-03-05 16:43:50 +01:00
thunderbolt thunderbolt: Prevent crash if non-active NVMem file is read 2020-02-28 17:22:13 +01:00
tty tty: fix compat TIOCGSERIAL checking wrong function ptr 2020-03-25 08:25:52 +01:00
uio uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol() 2020-02-24 08:36:27 +01:00
usb xhci: Do not open code __print_symbolic() in xhci trace events 2020-03-25 08:25:56 +01:00
vfio vfio/spapr/nvlink2: Skip unpinning pages on error exit 2020-02-24 08:36:43 +01:00
vhost vhost: Check docket sk_family instead of call getname 2020-03-05 16:43:44 +01:00
video vgacon: Fix a UAF in vgacon_invert_region 2020-03-12 13:00:19 +01:00
virt
virtio virtio_ring: Fix mem leak with vring_new_virtqueue() 2020-03-18 07:17:55 +01:00
visorbus visorbus: fix uninitialized variable access 2020-02-24 08:36:47 +01:00
vlynq
vme vme: bridges: reduce stack usage 2020-02-24 08:36:48 +01:00
w1
watchdog ACPI: watchdog: Set default timeout in probe 2020-03-21 08:11:48 +01:00
xen xenbus: req->err should be updated before req->state 2020-03-25 08:25:49 +01:00
zorro
Kconfig
Makefile