1
0
Fork 0
alistair23-linux/drivers
David Hildenbrand 2c91f8fc6c mm/memory_hotplug: fix try_offline_node()
try_offline_node() is pretty much broken right now:

 - The node span is updated when onlining memory, not when adding it. We
   ignore memory that was mever onlined. Bad.

 - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
   trigger a kernel panic. Bad for memory that is offline but also bad
   for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
   first PFN of a section might contain garbage.

 - Sections belonging to mixed nodes are not properly considered.

As memory blocks might belong to multiple nodes, we would have to walk
all pageblocks (or at least subsections) within present sections.
However, we don't have a way to identify whether a memmap that is not
online was initialized (relevant for ZONE_DEVICE).  This makes things
more complicated.

Luckily, we can piggy pack on the node span and the nid stored in memory
blocks.  Currently, the node span is grown when calling
move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
removing memory, before calling try_offline_node().  Sysfs links are
created via link_mem_sections(), e.g., during boot or when adding
memory.

If the node still spans memory or if any memory block belongs to the
nid, we don't set the node offline.  As memory blocks that span multiple
nodes cannot get offlined, the nid stored in memory blocks is reliable
enough (for such online memory blocks, the node still spans the memory).

Introduce for_each_memory_block() to efficiently walk all memory blocks.

Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
when removing ZONE_DEVICE memory to fix similar issues (access of
garbage memmaps) - until we have a reliable way to identify whether
these memmaps were properly initialized.  This implies later, that once
a node had ZONE_DEVICE memory, we won't be able to set a node offline -
which should be acceptable.

Since commit f1dd2cd13c ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") memory that is added is not
assoziated with a zone/node (memmap not initialized).  The introducing
commit 60a5a19e74 ("memory-hotplug: remove sysfs file of node")
already missed that we could have multiple nodes for a section and that
the zone/node span is updated when onlining pages, not when adding them.

I tested this by hotplugging two DIMMs to a memory-less and cpu-less
NUMA node.  The node is properly onlined when adding the DIMMs.  When
removing the DIMMs, the node is properly offlined.

Masayoshi Mizuma reported:

: Without this patch, memory hotplug fails as panic:
:
:  BUG: kernel NULL pointer dereference, address: 0000000000000000
:  ...
:  Call Trace:
:   remove_memory_block_devices+0x81/0xc0
:   try_remove_memory+0xb4/0x130
:   __remove_memory+0xa/0x20
:   acpi_memory_device_remove+0x84/0x100
:   acpi_bus_trim+0x57/0x90
:   acpi_bus_trim+0x2e/0x90
:   acpi_device_hotplug+0x2b2/0x4d0
:   acpi_hotplug_work_fn+0x1a/0x30
:   process_one_work+0x171/0x380
:   worker_thread+0x49/0x3f0
:   kthread+0xf8/0x130
:   ret_from_fork+0x35/0x40

[david@redhat.com: v3]
  Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
Fixes: 60a5a19e74 ("memory-hotplug: remove sysfs file of node")
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e86b
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-15 18:34:00 -08:00
..
accessibility
acpi Power management fix for 5.4-rc6 2019-11-01 09:30:48 -07:00
amba ARM updates for 5.4-rc: 2019-10-23 06:26:33 -04:00
android binder: Don't modify VMA bounds in ->mmap handler 2019-10-17 05:58:44 -07:00
ata ata: libahci_platform: Fix regulator_get_optional() misuse 2019-10-25 14:22:20 -06:00
atm atm: he: clean up an indentation issue 2019-09-25 13:54:45 +02:00
auxdisplay It's a somewhat calmer cycle for docs this time, as the churn of the mass 2019-09-17 16:22:26 -07:00
base mm/memory_hotplug: fix try_offline_node() 2019-11-15 18:34:00 -08:00
bcma bcma: make arrays pwr_info_offset and sprom_sizes static const, shrinks object size 2019-09-13 16:44:49 +03:00
block rbd: silence bogus uninitialized warning in rbd_object_map_update_finish() 2019-11-14 19:00:53 +01:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
bus bus: ti-sysc: Fix watchdog quirk handling 2019-10-18 08:45:32 -07:00
cdrom
char char/random: Add a newline at the end of the file 2019-10-02 13:49:43 -07:00
clk Fixes for various clk driver issues that happened because of code we 2019-11-08 08:15:01 -08:00
clocksource - Fix scary messages in sh_mtu2 by using platform_irq_count() helper 2019-11-04 18:43:23 +01:00
connector
counter
cpufreq cpufreq: intel_pstate: Fix invalid EPB setting 2019-11-08 11:29:58 +01:00
cpuidle cpuidle: haltpoll: Take 'idle=' override into account 2019-10-22 11:43:17 +02:00
crypto inet: stop leaking jiffies on the wire 2019-11-01 14:57:52 -07:00
dax
dca
devfreq
dio
dma dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle 2019-10-23 21:15:21 +05:30
dma-buf dma-buf/resv: fix exclusive fence get 2019-10-10 17:05:20 +02:00
edac EDAC/ghes: Fix Use after free in ghes_edac remove path 2019-10-17 11:27:05 +02:00
eisa
extcon chrome platform changes for v5.4 2019-09-19 14:14:28 -07:00
firewire
firmware efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN 2019-10-31 09:40:21 +01:00
fpga Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
fsi
gnss
gpio Revert "gpio: merrifield: Pass irqchip when adding gpiochip" 2019-11-03 23:41:11 +01:00
gpu drm fixes for 5.4-rc8 2019-11-15 08:47:34 -08:00
greybus
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2019-11-07 11:54:54 -08:00
hsi HSI changes for the 5.4 series 2019-09-22 12:02:21 -07:00
hv Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM_SLEEP 2019-10-01 14:49:45 -04:00
hwmon hwmon: (ina3221) Fix read timeout issue 2019-10-28 18:46:55 -07:00
hwspinlock
hwtracing intel_th: pci: Add Jasper Lake PCH support 2019-11-04 15:01:25 +01:00
i2c i2c: stm32f7: remove warning when compiling with W=1 2019-10-24 20:52:21 +02:00
i3c
ide
idle
iio iio: adc: stm32-adc: fix stopping dma 2019-10-27 15:57:19 +00:00
infiniband RDMA/hns: Correct the value of srq_desc_size 2019-11-06 13:37:02 -04:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2019-11-13 12:16:47 -08:00
interconnect interconnect: Add locking in icc_set_tag() 2019-10-20 12:14:41 +03:00
iommu iommu/vt-d: Fix panic after kexec -p for kdump 2019-10-30 10:30:22 +01:00
ipack
irqchip irqchip updates for 5.4, take 2 2019-10-25 14:25:15 +02:00
isdn net: use skb_queue_empty_lockless() in poll() handlers 2019-10-28 13:33:41 -07:00
leds leds: lm3532: Fix optional led-max-microamp prop error handling 2019-09-12 20:45:52 +02:00
lightnvm lightnvm: print error when target is not found 2019-09-05 13:17:01 -06:00
macintosh cpufreq: Use per-policy frequency QoS 2019-10-21 02:05:21 +02:00
mailbox mailbox: qcom-apcs: fix max_register value 2019-09-17 00:54:29 -05:00
mcb
md for-linus-2019-10-18 2019-10-18 22:29:36 -04:00
media media: stkwebcam: fix runtime PM after driver unbind 2019-10-04 14:38:46 +02:00
memory iommu/mediatek: Clean up struct mtk_smi_iommu 2019-08-30 15:57:27 +02:00
memstick memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()' 2019-10-09 11:08:03 +02:00
message
mfd mfd: mt6397: Fix probe after changing mt6397-core 2019-10-24 08:49:25 +01:00
misc misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach 2019-10-04 18:22:14 +02:00
mmc mmc: sdhci-of-at91: fix quirk2 overwrite 2019-11-14 14:57:53 +01:00
mtd mtd: rawnand: au1550nd: Fix au_read_buf16() prototype 2019-10-07 09:56:36 +02:00
mux
net Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 2019-11-08 16:50:14 -08:00
nfc NFC: st21nfca: fix double free 2019-11-06 21:48:29 -08:00
ntb NTB: fix IDT Kconfig typos/spellos 2019-09-23 17:20:40 -04:00
nubus
nvdimm libnvdimm fixes v5.4-rc1 2019-09-29 10:33:41 -07:00
nvme for-linus-2019-11-08 2019-11-08 18:15:55 -08:00
nvmem Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
of of: reserved_mem: add missing of_node_put() for proper ref-counting 2019-10-23 15:15:05 -05:00
opp opp: Reinitialize the list_kref before adding the static OPPs again 2019-10-23 10:58:44 +05:30
oprofile
parisc parisc: Remove 32-bit DMA enforcement from sba_iommu 2019-10-14 21:44:26 +02:00
parport Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
pci PCI: PM: Fix pci_power_up() 2019-10-15 23:51:36 +02:00
pcmcia Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
perf Merge branches 'for-next/52-bit-kva', 'for-next/cpu-topology', 'for-next/error-injection', 'for-next/perf', 'for-next/psci-cpuidle', 'for-next/rng', 'for-next/smpboot', 'for-next/tbi' and 'for-next/tlbi' into for-next/core 2019-08-30 12:46:12 +01:00
phy pci-v5.4-changes 2019-09-23 19:16:01 -07:00
pinctrl pinctrl: stmfx: fix valid_mask init sequence 2019-11-07 10:06:46 +01:00
platform platform/x86: i2c-multi-instantiate: Fail the probe if no IRQ provided 2019-10-14 15:31:50 +03:00
pnp
power power supply and reset changes for the v5.4 series 2019-09-22 12:04:59 -07:00
powercap Power management updates for 5.4-rc1 2019-09-17 19:15:14 -07:00
pps
ps3
ptp ptp: fix typo of "mechanism" in Kconfig help text 2019-10-07 14:55:46 -04:00
pwm pwm: bcm-iproc: Prevent unloading the driver module while in use 2019-11-08 18:38:06 +01:00
rapidio
ras
regulator regulator: Fixes for v5.4 2019-10-23 15:31:17 -04:00
remoteproc remoteproc updates for v5.4 2019-09-22 10:55:08 -07:00
reset reset: fix of_reset_control_get_count kerneldoc comment 2019-10-24 10:26:33 +02:00
rpmsg rpmsg: glink-smem: Name the edge based on parent remoteproc 2019-09-17 15:33:31 -07:00
rtc RTC for 5.4 2019-09-22 11:05:43 -07:00
s390 s390/zcrypt: fix memleak at release 2019-10-22 17:55:51 +02:00
sbus
scsi SCSI fixes on 20191111 2019-11-11 09:14:36 -08:00
sfi
sh
siox
slimbus
soc soc: imx: gpc: fix initialiser format 2019-10-26 19:47:31 +08:00
soundwire soundwire: slave: fix scanf format 2019-10-24 16:55:45 +05:30
spi LED updates for 5.4-rc1 2019-09-17 18:40:42 -07:00
spmi
ssb ssb: make array pwr_info_offset static const, makes object smaller 2019-09-13 17:23:18 +03:00
staging Remove VirtualBox guest shared folders filesystem 2019-11-12 15:22:24 -08:00
target SCSI fixes on 20191101 2019-11-02 11:15:52 -07:00
tc
tee tee/shm: untag user pointers in tee_shm_register 2019-09-25 17:51:41 -07:00
thermal cpufreq: Use per-policy frequency QoS 2019-10-21 02:05:21 +02:00
thunderbolt thunderbolt: Drop unnecessary read when writing LC command in Ice Lake 2019-10-08 12:08:21 +03:00
tty 8250-men-mcb: fix error checking when get_num_ports returns -ENODEV 2019-10-15 21:38:41 +02:00
uio Char/Misc driver patches for 5.4-rc1 2019-09-18 11:14:31 -07:00
usb usb: dwc3: gadget: fix race when disabling ep with cancelled xfers 2019-10-31 18:57:54 +01:00
vfio vfio/type1: Initialize resv_msi_base 2019-10-15 14:07:01 -06:00
vhost vringh: fix copy direction of vringh_iov_push_kern() 2019-10-28 04:25:04 -04:00
video - Some new documentation for GEM shmem madvise helpers 2019-11-08 12:12:57 +10:00
virt virt: vbox: fix memory leak in hgcm_call_preprocess_linaddr 2019-10-10 14:50:32 +02:00
virtio virtio_ring: fix stalls for packed rings 2019-10-28 04:24:46 -04:00
visorbus
vlynq
vme
w1 w1: ds250x: Fix build error without CRC16 2019-10-10 15:35:41 +02:00
watchdog watchdog: bd70528: Add MODULE_ALIAS to allow module auto loading 2019-11-05 16:58:12 +01:00
xen Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-10-19 17:09:11 -04:00
zorro
Kconfig Staging/IIO driver patches for 5.4-rc1 2019-09-18 11:05:34 -07:00
Makefile Staging/IIO driver patches for 5.4-rc1 2019-09-18 11:05:34 -07:00