1
0
Fork 0
alistair23-linux/arch/powerpc/mm
Srikar Dronamraju 2ea6263068 powerpc/topology: Get topology for shared processors at boot
On a shared LPAR, Phyp will not update the CPU associativity at boot
time. Just after the boot system does recognize itself as a shared
LPAR and trigger a request for correct CPU associativity. But by then
the scheduler would have already created/destroyed its sched domains.

This causes
  - Broken load balance across Nodes causing islands of cores.
  - Performance degradation esp if the system is lightly loaded
  - dmesg to wrongly report all CPUs to be in Node 0.
  - Messages in dmesg saying borken topology.
  - With commit 051f3ca02e ("sched/topology: Introduce NUMA identity
    node sched domain"), can cause rcu stalls at boot up.

The sched_domains_numa_masks table which is used to generate cpumasks
is only created at boot time just before creating sched domains and
never updated. Hence, its better to get the topology correct before
the sched domains are created.

For example on 64 core Power 8 shared LPAR, dmesg reports

  Brought up 512 CPUs
  Node 0 CPUs: 0-511
  Node 1 CPUs:
  Node 2 CPUs:
  Node 3 CPUs:
  Node 4 CPUs:
  Node 5 CPUs:
  Node 6 CPUs:
  Node 7 CPUs:
  Node 8 CPUs:
  Node 9 CPUs:
  Node 10 CPUs:
  Node 11 CPUs:
  ...
  BUG: arch topology borken
       the DIE domain not a subset of the NUMA domain
  BUG: arch topology borken
       the DIE domain not a subset of the NUMA domain

numactl/lscpu output will still be correct with cores spreading across
all nodes:

  Socket(s):             64
  NUMA node(s):          12
  Model:                 2.0 (pvr 004d 0200)
  Model name:            POWER8 (architected), altivec supported
  Hypervisor vendor:     pHyp
  Virtualization type:   para
  L1d cache:             64K
  L1i cache:             32K
  NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
  NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
  NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
  NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
  NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
  NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
  NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
  NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
  NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
  NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
  NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
  NUMA node11 CPU(s):    160-167,256-263,352-359,448-455

Currently on this LPAR, the scheduler detects 2 levels of Numa and
created numa sched domains for all CPUs, but it finds a single DIE
domain consisting of all CPUs. Hence it deletes all numa sched
domains.

To address this, detect the shared processor and update topology soon
after CPUs are setup so that correct topology is updated just before
scheduler creates sched domain.

With the fix, dmesg reports:

  numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471
  numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479
  numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487
  numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495
  numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
  numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
  numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
  numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
  numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
  numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
  numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
  numa: Node 11 CPUs: 160-167 256-263 352-359 448-455

and lscpu also reports:

  Socket(s):             64
  NUMA node(s):          12
  Model:                 2.0 (pvr 004d 0200)
  Model name:            POWER8 (architected), altivec supported
  Hypervisor vendor:     pHyp
  Virtualization type:   para
  L1d cache:             64K
  L1i cache:             32K
  NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
  NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
  NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
  NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
  NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
  NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
  NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
  NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
  NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
  NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
  NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
  NUMA node11 CPU(s):    160-167,256-263,352-359,448-455

Reported-by: Manjunatha H R <manjuhr1@in.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[mpe: Trim / format change log]
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-21 16:01:59 +10:00
..
8xx_mmu.c powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx 2018-03-06 09:21:23 +11:00
40x_mmu.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
44x_mmu.c powerpc/44x: Mark mmu_init_secondary() as __init 2018-07-30 22:48:22 +10:00
Makefile powerpc/Makefiles: Convert ifeq to ifdef where possible 2018-08-08 00:32:36 +10:00
copro_fault.c powerpc/mm: Add support for handling > 512TB address in SLB miss 2018-03-31 00:10:38 +11:00
dma-noncoherent.c powerpc/mm: Rename map_page() to map_kernel_page() on 32-bit 2017-06-05 19:59:03 +10:00
drmem.c powerpc/mm/drmem: Fix unexpected flag value in ibm,dynamic-memory-v2 2018-02-23 16:45:51 +11:00
dump_hashpagetable.c powerpc: remove superflous inclusions of asm/fixmap.h 2018-07-30 22:48:18 +10:00
dump_linuxpagetables.c powerpc/mm: Introduce _PAGE_NA 2018-01-16 23:47:14 +11:00
fault.c powerpc: remove unnecessary inclusion of asm/tlbflush.h 2018-07-30 22:48:20 +10:00
fsl_booke_mmu.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
hash64_4k.c powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group 2018-07-24 22:03:17 +10:00
hash64_64k.c powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group 2018-07-24 22:03:17 +10:00
hash_low_32.S powerpc: clean inclusions of asm/feature-fixups.h 2018-07-30 22:48:17 +10:00
hash_native_64.c powerpc: remove unnecessary inclusion of asm/tlbflush.h 2018-07-30 22:48:20 +10:00
hash_utils_64.c powerpc: remove unnecessary inclusion of asm/tlbflush.h 2018-07-30 22:48:20 +10:00
highmem.c powerpc/mm: remove warning about ‘type’ being set 2018-08-10 22:12:38 +10:00
hugepage-hash64.c powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group 2018-07-24 22:03:17 +10:00
hugetlbpage-book3e.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hugetlbpage-hash64.c powerpc/mm/hash64: Store the slot information at the right offset for hugetlb 2018-02-13 22:37:48 +11:00
hugetlbpage-radix.c powerpc updates for 4.15 2017-11-16 12:47:46 -08:00
hugetlbpage.c powerpc/hugetlbpage: Rmove unhelpful HUGEPD_*_SHIFT macros 2018-07-19 14:38:46 +10:00
init-common.c powerpc/mm: Fix crashes with 16G huge pages 2018-02-13 22:37:47 +11:00
init_32.c powerpc/mm/32: Remove the reserved memory hack 2018-04-01 00:47:44 +11:00
init_64.c powerpc/mm/radix: Parse disable_radix commandline correctly. 2018-04-04 16:59:36 +10:00
mem.c treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX 2018-06-15 07:55:25 +09:00
mmap.c exec: pass stack rlimit into mm layout functions 2018-04-11 10:28:37 -07:00
mmu_context.c powerpc/64s/radix: optimise pte_update 2018-06-03 20:40:36 +10:00
mmu_context_book3s64.c powerpc/64s: Fix page table fragment refcount race vs speculative references 2018-08-08 00:32:32 +10:00
mmu_context_hash32.c powerpc: remove unnecessary inclusion of asm/tlbflush.h 2018-07-30 22:48:20 +10:00
mmu_context_iommu.c KVM: PPC: Check if IOMMU page is contained in the pinned physical page 2018-07-18 16:17:17 +10:00
mmu_context_nohash.c powerpc/mm: Remove stale_map[] handling on non SMP processors 2018-06-04 00:39:16 +10:00
mmu_decl.h powerpc: remove unnecessary inclusion of asm/tlbflush.h 2018-07-30 22:48:20 +10:00
numa.c powerpc/topology: Get topology for shared processors at boot 2018-08-21 16:01:59 +10:00
pgtable-book3e.c powerpc/mm: Make page table size a variable 2016-05-01 18:32:48 +10:00
pgtable-book3s64.c powerpc/mm/book3s/radix: Add mapping statistics 2018-08-13 16:35:05 +10:00
pgtable-hash64.c powerpc/mm: Use pmd_lockptr instead of opencoding it 2018-05-15 22:29:09 +10:00
pgtable-radix.c powerpc/mm/book3s/radix: Add mapping statistics 2018-08-13 16:35:05 +10:00
pgtable.c powerpc/mm/hugetlb: Update hugetlb related locks 2018-06-03 20:40:37 +10:00
pgtable_32.c powerpc/mm/32: Remove the reserved memory hack 2018-04-01 00:47:44 +11:00
pgtable_64.c powerpc/mm: Use page fragments for allocation page table at PMD level 2018-05-15 22:29:12 +10:00
pkeys.c powerpc/pkeys: make protection key 0 less special 2018-07-24 21:43:24 +10:00
ppc_mmu_32.c powerpc/sparse: Fix plain integer as NULL pointer warning 2018-05-25 12:04:38 +10:00
slb.c powerpc/64s: move machine check SLB flushing to mm/slb.c 2018-08-10 22:12:39 +10:00
slb_low.S powerpc: clean inclusions of asm/feature-fixups.h 2018-07-30 22:48:17 +10:00
slice.c powerpc/8xx: Fix build with hugetlbfs enabled 2018-04-11 12:00:23 +10:00
subpage-prot.c powerpc: remove unnecessary inclusion of asm/tlbflush.h 2018-07-30 22:48:20 +10:00
tlb-radix.c Merge branch 'topic/ppc-kvm' into next 2018-07-19 14:37:57 +10:00
tlb_hash32.c powerpc/sparse: Fix plain integer as NULL pointer warning 2018-05-25 12:04:38 +10:00
tlb_hash64.c powerpc/mm: Add support for handling > 512TB address in SLB miss 2018-03-31 00:10:38 +11:00
tlb_low_64e.S powerpc: clean inclusions of asm/feature-fixups.h 2018-07-30 22:48:17 +10:00
tlb_nohash.c powerpc/mm/nohash: do not flush the entire mm when range is a single page 2018-01-27 20:24:44 +11:00
tlb_nohash_low.S powerpc: clean inclusions of asm/feature-fixups.h 2018-07-30 22:48:17 +10:00
vphn.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
vphn.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00