Commit graph

41 commits

Author SHA1 Message Date
Yinghai Lu cb95a13a8a x86: merge zones_sizes_init for numa and non numa on 32-bit
Move e820_register_active_regions() out of the non-NUMA zones_sizes_init() and remove the NUMA version of zones_sizes_init().

Also let 32-bit call remove_all_active_ranges() directly in setup_arch(), like 64-bit does.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:22 +02:00
Yinghai Lu 996cf4438f x86: don't reallocate pgt for node0
The KVA RAM is already mapped right away, so there is no need to allocate it again for low RAM. This avoids wasting one copy of the pgdat.

Also add the node id to the early_res name, in case we get it from find_e820_area().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:16:15 +02:00
Yinghai Lu 3a58a2a6c8 x86: introduce init_memory_mapping for 32bit #3
Move the KVA-related early setup back into initmem_init() for numa32.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 13:10:33 +02:00
Yinghai Lu c09434571d x86: numa32 pfn print out using hex instead
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:23 +02:00
Yinghai Lu 2ec65f8b89 x86: clean up using max_low_pfn on 32-bit
Make sure max_low_pfn is not changed after it is set, so that it can be computed early and moved out of initmem_init().

find_low_pfn_range() can then be called right after max_pfn is set, and reserve_initrd() can be moved out of setup_bootmem_allocator().

This makes 32-bit more like 64-bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:20 +02:00
Yinghai Lu b2ac82a090 x86: introduce initmem_init for 32 bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 12:50:15 +02:00
Ingo Molnar 3de352bbd8 Merge branch 'x86/mpparse' into x86/devel
Conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/io_apic_32.c
	arch/x86/kernel/setup_64.c
	arch/x86/mm/init_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 11:14:58 +02:00
Yinghai Lu cc9f7a0ccf x86: kill bad_ppro
Don't punish all the other CPUs that don't have this problem when initializing highmem.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:38:19 +02:00
Yinghai Lu b5bc6c0e55 x86, mm: use add_highpages_with_active_regions() for high pages init v2
Use early_node_map to initialize the high pages, so we can drop the page_is_ram() and page_is_reserved_early() checks from the big add_one_highpage() loop.

Also remove page_is_reserved_early() itself, since it is no longer needed.

v2: fix the build on other platforms

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:37:25 +02:00
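
A minimal sketch of the approach above, walking the node's registered active ranges instead of probing every pfn. It assumes a work_with_active_regions()-style iterator over early_node_map; the helper names and exact signatures are illustrative:

  /* Illustrative sketch: hand every highmem pfn in the node's active
   * ranges to the per-page init helper, with no page_is_ram() or
   * page_is_reserved_early() checks in the loop. */
  static int __init add_highpages_work_fn(unsigned long start_pfn,
                                          unsigned long end_pfn, void *data)
  {
          unsigned long pfn;
          unsigned long first = max(start_pfn, highstart_pfn);
          unsigned long last  = min(end_pfn, highend_pfn);

          for (pfn = first; pfn < last; pfn++)
                  add_one_highpage_init(pfn_to_page(pfn), pfn);
          return 0;
  }

  void __init add_highpages_with_active_regions(int nid)
  {
          /* iterate the early_node_map[] entries that belong to this node */
          work_with_active_regions(nid, add_highpages_work_fn, NULL);
  }
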
Yinghai Lu cc1050bafe x86: replace shrink_active_range() with remove_active_range()
In case the KVA sits before the ramdisk on a node, we still need to use those ranges.

v2: reserve_early() the KVA RAM area, so that if there are holes in highmem those areas cannot be treated as free high pages.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 10:36:29 +02:00
Yinghai Lu 9043f00796 x86, numa, 32-bit: use find_e820_area() to find KVA RAM on node
Don't assume we can use the RAM near the end of every node. Especially on systems with little memory, both the KVA address and the KVA RAM could end up below max_low_pfn.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:52 +02:00
Yinghai Lu cc1a9d86ce mm, x86: shrink_active_range() should check all
Now that we use register_e820_active_regions() instead of calling add_active_range() directly, the end_pfn recorded in early_node_map can differ from node_end_pfn.

So we need to make shrink_active_range() smarter.

shrink_active_range() is a generic MM function in mm/page_alloc.c, but it is only used on 32-bit x86. Should we move it back to some file in arch/x86?

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 11:31:44 +02:00
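
A hedged sketch of what a "smarter" shrink_active_range() might look like, checking every early_node_map entry for the node rather than assuming a single matching range; the variable names follow mm/page_alloc.c but the exact form is an assumption:

  /* Illustrative sketch: clip every entry of this node that extends past
   * the new end pfn; entries that lie entirely beyond it become empty
   * (the real code would also drop such empty entries from the map). */
  void __init shrink_active_range(unsigned int nid, unsigned long new_end_pfn)
  {
          int i;

          for (i = 0; i < nr_nodemap_entries; i++) {
                  if (early_node_map[i].nid != nid)
                          continue;
                  if (early_node_map[i].end_pfn > new_end_pfn)
                          early_node_map[i].end_pfn =
                                  max(early_node_map[i].start_pfn, new_end_pfn);
          }
  }
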
Yinghai Lu 7b2a0a6c48 x86: make 32-bit use e820_register_active_regions()
This makes 32-bit more similar to 64-bit, with smarter e820 and NUMA handling.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:58 +02:00
Yinghai Lu 84b56fa46b x86, numa, 32-bit: make sure we get kva space
When a 1/3 user/kernel split is used and less memory is installed, or when there is a big hole below 4G, max_low_pfn still ends up at 3G-128M.

Try walking down from max_low_pfn until we find usable KVA space; otherwise the kernel will panic.

The 32-bit code still needs to be converted to register_e820_active_regions ... later.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 12:01:55 +02:00
Yinghai Lu 6af61a7614 x86: clean up max_pfn_mapped usage - 32-bit
On 32-bit, after the initial page table is set up in head_32.S we get an initial max_pfn_mapped, and kernel_physical_mapping_init() later gives us the final value.

We need to use it to make sure find_e820_area() returns valid addresses for boot_map and for NODE_DATA(0) on numa32.

Xen PV and lguest may need to assign max_pfn_mapped too.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
Yinghai Lu 287572cb38 x86, numa, 32-bit: avoid clash between ramdisk and kva
Use find_e820_area() to get the address space, so the KVA cannot clash with the ramdisk...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:28 +02:00
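
A hedged sketch of the allocation, assuming find_e820_area(start, end, size, align) returns a free physical address or -1ULL; the function name and the node range/size variables are illustrative:

  /* Illustrative sketch: ask the early e820 allocator for a free block
   * inside the node, so the KVA backing RAM cannot land on top of the
   * ramdisk or any other early reservation. */
  static void __init setup_node_kva(int nid, u64 node_start_addr,
                                    u64 node_end_addr, unsigned long kva_size)
  {
          u64 kva_phys = find_e820_area(node_start_addr, node_end_addr,
                                        kva_size, LARGE_PAGE_BYTES);

          if (kva_phys == -1ULL)
                  panic("Cannot find RAM for node %d KVA\n", nid);
          reserve_early(kva_phys, kva_phys + kva_size, "KVA RAM");
  }
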
Yinghai Lu e8c27ac919 x86, numa, 32-bit: print out debug info on all kvas
Also fix the printout of node_remap_end_vaddr.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:26 +02:00
Yinghai Lu 0596152388 x86, 32-bit: change propagate_e820_map() back to find_max_pfn()
We don't need to call memory_present() that early: NUMA and sparsemem will call memory_present() for the full range later anyway, and the early call might even fail.

Also, for sparsemem the early call would end up using alloc_bootmem ... before bootmem has been set up.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu b66cd72073 x86: set node_remap_size[0] in fallback path
... otherwise alloc_remap() will not hand out node_mem_map from the KVA area, and alloc_node_mem_map() has to fall back to alloc_bootmem_node() to get the mem_map, which ends up using two low-address copies ...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:25 +02:00
Yinghai Lu ba924c81dd x86, numa, 32-bit: increase max_elements to 1024
With this, every element represents 64M instead of 256M.

AMD Opteron can do hardware memory hole remapping, so node 0 could cover [0, 8g + 64M). Reducing the element size to 64M keeps that range on node 0.

Later we should use find_e820_area() to allocate memory_node_map as is done on 64-bit, but that requires moving memory_present() out of populate_mem_map...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-03 13:26:24 +02:00
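
A quick sizing check for the change above, assuming the physnode_map covers the usual 64 GB (16M page frames) of 32-bit physical address space; the constant names are written in the style of the existing ones but should be treated as illustrative:

  #define MAX_NR_PAGES          16777216        /* 64 GB of 4 KB pages */
  #define MAX_ELEMENTS          1024            /* was 256 */
  #define PAGES_PER_ELEMENT     (MAX_NR_PAGES / MAX_ELEMENTS)
  /* 16777216 / 1024 = 16384 pages = 64 MB per element,
   * versus 16777216 / 256 = 65536 pages = 256 MB before. */
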
Ingo Molnar 471b3c1b01 x86, numaq 32-bit: build fix
fix:

drivers/built-in.o: In function `acpi_numa_init':
: undefined reference to `acpi_numa_arch_fixup'

which can happen with ACPI && NUMAQ.
2008-06-03 12:49:06 +02:00
Ingo Molnar cf3d0cb8a1 x86: 32-bit numa, build fix
on Summit it's possible to have:

 CONFIG_ACPI_SRAT=y
 CONFIG_HAVE_ARCH_PARSE_SRAT=y

in which case acpi.h defines the acpi_numa_slit_init() and
acpi_numa_processor_affinity_init() methods as macros.
2008-06-03 11:45:09 +02:00
Ingo Molnar b28852d670 x86: add dummy acpi_numa_processor_affinity_init() implementation on 32-bit
allow CONFIG_ACPI_NUMA builds to succeed.
2008-06-03 10:23:53 +02:00
Ingo Molnar 2772f54bf3 x86: add acpi_numa_slit_init() dummy implementation on 32-bit
allow CONFIG_ACPI_NUMA builds to succeed on 32-bit.
2008-06-03 10:23:48 +02:00
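
A minimal sketch of what such dummy implementations could look like on 32-bit; the printk wording and placement are assumptions, only the prototypes follow the ACPI parsing callbacks named above:

  /* Illustrative no-op stubs so a 32-bit CONFIG_ACPI_NUMA build can link:
   * real SRAT/SLIT parsing only exists on 64-bit at this point. */
  void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
  {
          printk(KERN_INFO "ACPI: SLIT ignored on 32-bit\n");
  }

  void __init
  acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
  {
          printk(KERN_INFO "ACPI: SRAT CPU affinity ignored on 32-bit\n");
  }
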
Yinghai Lu a5481280b2 x86: extend e820 early_res support 32bit -fix #5
Reserve the NUMA KVA early, so it will not clash with the new RAMDISK location.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:53 +02:00
Yinghai Lu 163872950d x86: extend e820 early_res support 32bit -fix #4
reserve_early() the pgdat for 32-bit NUMA.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-31 09:55:50 +02:00
Andy Whitcroft 84d6bd0e27 x86: cope with no remap space being allocated for a numa node
When allocating the pgdat's for numa nodes on x86_32 we attempt to place
them in the numa remap space for that node.  However should the node not
have any remap space allocated (such as due to having non-ram pages in
the remap location in the node) then we will incorrectly place the pgdat
at zero.  Check we have remap available, falling back to node 0 memory
where we do not.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-20 14:10:50 +02:00
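
A hedged sketch of the fallback described above; the variable names come from discontig_32.c, but the exact hunk is an assumption:

  /* Illustrative sketch: use the node's remap window for its pgdat when
   * one was actually set up, otherwise carve the pgdat out of node 0 low
   * memory instead of dereferencing a bogus (zero) remap address. */
  static void __init allocate_pgdat(int nid)
  {
          if (nid && node_remap_size[nid])
                  NODE_DATA(nid) = (pg_data_t *)node_remap_start_vaddr[nid];
          else {
                  NODE_DATA(nid) = (pg_data_t *)(pfn_to_kaddr(min_low_pfn));
                  min_low_pfn += PFN_UP(sizeof(pg_data_t));
          }
  }
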
Andy Whitcroft b9ada4281c x86: reinstate numa remap for SPARSEMEM on x86 NUMA systems
Recent kernels have been panic'ing trying to allocate memory early in boot,
in __alloc_pages:

  BUG: unable to handle kernel paging request at 00001568
  IP: [<c10407b6>] __alloc_pages+0x33/0x2cc
  *pdpt = 00000000013a5001 *pde = 0000000000000000
  Oops: 0000 [#1] SMP
  Modules linked in:

  Pid: 1, comm: swapper Not tainted (2.6.25 #78)
  EIP: 0060:[<c10407b6>] EFLAGS: 00010246 CPU: 0
  EIP is at __alloc_pages+0x33/0x2cc
  EAX: 00001564 EBX: 000412d0 ECX: 00001564 EDX: 000005c3
  ESI: f78012a0 EDI: 00000001 EBP: 00001564 ESP: f7871e50
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 1, ti=f7870000 task=f786f670 task.ti=f7870000)
  Stack: 00000000 f786f670 00000010 00000000 0000b700 000412d0 f78012a0 00000001
         00000000 c105b64d 00000000 000412d0 f78012a0 f7803120 00000000 c105c1c5
         00000010 f7803144 000412d0 00000001 f7803130 f7803120 f78012a0 00000001
  Call Trace:
   [<c105b64d>] kmem_getpages+0x94/0x129
   [<c105c1c5>] cache_grow+0x8f/0x123
   [<c105c689>] ____cache_alloc_node+0xb9/0xe4
   [<c105c999>] kmem_cache_alloc_node+0x92/0xd2
   [<c1018929>] build_sched_domains+0x536/0x70d
   [<c100b63c>] do_flush_tlb_all+0x0/0x3f
   [<c100b63c>] do_flush_tlb_all+0x0/0x3f
   [<c10572d6>] interleave_nodes+0x23/0x5a
   [<c105c44f>] alternate_node_alloc+0x43/0x5b
   [<c1018b47>] arch_init_sched_domains+0x46/0x51
   [<c136e85e>] kernel_init+0x0/0x82
   [<c137ac19>] sched_init_smp+0x10/0xbb
   [<c136e8a1>] kernel_init+0x43/0x82
   [<c10035cf>] kernel_thread_helper+0x7/0x10

Debugging this showed that the NODE_DATA() for nodes other than node 0
were all NULL.  Tracing this back showed that the NODE_DATA() pointers
were being initialised to each nodes remap space.  However under
SPARSEMEM remap is disabled which leads to the pgdat's being placed
incorrectly at kernel virtual address 0.  Leading to the panic when
attempting to allocate memory from these nodes.

Numa remap was disabled in the commit below.  This occurred while fixing
problems triggered when attempting to boot x86_32 NUMA SPARSEMEM kernels
on non-numa hardware.

	x86: make NUMA work on 32-bit
	commit 1b000a5dbe

The real problem is believed to be related to other alignment issues in
the regions blocked out from the bootmem allocator for small memory
systems, and has been fixed separately.  Therefore re-enable remap for
SPARSEMEM, which fixes pgdat allocation issues.  Testing confirms that
SPARSEMEM NUMA kernels will boot correctly with this part of the change
reverted.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-20 14:10:48 +02:00
Thomas Gleixner 48b83d2425 x86: undo visws/numaq build changes
arch/x86/pci/Makefile_32 has a nasty detail. VISWS and NUMAQ build
override the generic pci-y rules. This needs a proper cleanup, but
that needs more thoughts. Undo

commit 895d30935e
    x86: numaq fix
    do not override the existing pci-y rule when adding visws or
    numaq rules.

There is also a stupid init function ordering problem vs. acpi.o

Add comments to the Makefile to avoid tripping over this again.

Remove the srat stub code in discontig_32.c to allow a proper NUMAQ
build.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-04 20:04:45 +02:00
Ingo Molnar fa5c463941 x86: rename find_max_pfn() to propagate_e820_map()
this function doesn't just 'find' the max_pfn - it also has
other side-effects, such as registering sparse memory maps.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:55 +02:00
Alexey Starikovskiy ce3fe6b2bf x86: use get_bios_ebda in mpparse_64.c
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:41:05 +02:00
Yinghai Lu 76c324182b x86: fix trim mtrr not to setup_memory two times
We can call find_max_pfn() directly instead of setup_memory() to get the max_pfn needed for MTRR trimming.

Otherwise setup_memory() is called two times ... which is duplicated work.

[ mingo@elte.hu: both Thomas and me simulated a double call to
  setup_bootmem_allocator() and can confirm that it is a real bug
  which can hang in certain configs. It's not been reported yet but
  that is probably due to the relatively scarce nature of
  MTRR-trimming systems. ]

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26 22:23:41 +01:00
Bernhard Walle 72a7fe3967 Introduce flags for reserve_bootmem()
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.

This patch:

Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past.  This is to avoid conflicts.

Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:25 -08:00
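
A minimal usage sketch following the semantics described above; the wrapper function and its variables are illustrative, only reserve_bootmem() and BOOTMEM_EXCLUSIVE come from the patch:

  /* Illustrative sketch: with BOOTMEM_EXCLUSIVE the reservation fails
   * with -EBUSY instead of silently double-reserving memory that is
   * already in use, e.g. by the kernel image or an initrd. */
  static void __init reserve_crashkernel_area(unsigned long base,
                                              unsigned long size)
  {
          int ret = reserve_bootmem(base, size, BOOTMEM_EXCLUSIVE);

          if (ret < 0)
                  printk(KERN_INFO "crashkernel reservation failed - "
                         "memory is in use\n");
  }
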
Mel Gorman 1b000a5dbe x86: make NUMA work on 32-bit
The DISCONTIG memory model on x86 32 bit uses a remap allocator early
in boot. The objective is that portions of every node are mapped in to
the kernel virtual area (KVA) in place of ZONE_NORMAL so that node-local
allocations can be made for pgdat and mem_map structures.

With SPARSEMEM, the amount that is set aside is insufficient for all the
mem_maps to be allocated. During the boot process, it falls back to using
the bootmem allocator. This breaks assumptions that SPARSEMEM makes about
the layout of the mem_map in memory and results in a VM_BUG_ON triggering
due to pfn_to_page() returning garbage values.

This patch only enables the remap allocator for use with DISCONTIG.

Without SRAT support, a compile error occurs because the ACPI table parsing
functions are only available on x86-64. This patch also adds no-op stubs
and prints a warning message. What likely needs to be done is to share
the table parsing functions between 32-bit and 64-bit if they are
compatible.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:25 +01:00
Mel Gorman bac4894dfa x86: make NUMA work on 32-bit again
On 32-bit NUMA, the memmap representing struct pages on each node is
allocated from node-local memory if possible. As only node-0 has memory from
ZONE_NORMAL, the memmap must be mapped into low memory. This is done by
reserving space in the Kernel Virtual Area (KVA) for the memmap belonging
to other nodes by taking pages from the end of ZONE_NORMAL and remapping
the other nodes' memmap into those virtual addresses. The node boundaries
are then adjusted so that the region of pages is not used and it is marked
as reserved in the bootmem allocator.

This reserved portion of the KVA is PMD aligned, although
strictly speaking that requirement could be lifted (see thread at
http://lkml.org/lkml/2007/8/24/220). The problem is that when aligned, there
may be a portion of ZONE_NORMAL at the end that is not used for memmap and
does not have an initialised memmap nor is it marked reserved in the bootmem
allocator. Later in the boot process, these pages are freed and a storm of
Bad page state messages result.

This patch marks the pages that are wasted due to alignment as reserved
in the bootmem allocator so they are not accidentally freed. It is worth
noting that memory from node-0 is wasted where it could have been put into
ZONE_HIGHMEM on NUMA machines. Worse, the KVA is always reserved from the
location of real memory even when there is plenty of spare virtual address
space.

This patch also makes sure that reserve_bootmem() is not called with a
0-length size in numa_kva_reserve().  When this happens, it usually means
that a kernel built for Summit is being booted on a normal machine. The
resulting BUG_ON() is misleading so it is caught here.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:32:54 +01:00
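
A hedged sketch of the zero-length guard mentioned in the last paragraph; kva_start_pfn and kva_pages are the existing KVA bookkeeping variables in discontig_32.c, but the exact form is an assumption:

  /* Illustrative sketch: when no KVA pages were set aside (e.g. a Summit
   * kernel booted on an ordinary machine), skip the reservation instead
   * of calling reserve_bootmem() with a zero length and hitting BUG_ON. */
  void __init numa_kva_reserve(void)
  {
          if (kva_pages)
                  reserve_bootmem(PFN_PHYS(kva_start_pfn),
                                  PFN_PHYS(kva_pages));
  }
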
H. Peter Anvin cf8fa920cb i386: handle an initrd in highmem (version 2)
The boot protocol has until now required that the initrd be located in
lowmem, which makes the lowmem/highmem boundary visible to the boot
loader.  This was exported to the bootloader via a compile-time
field.  Unfortunately, the vmalloc= command-line option breaks this
part of the protocol; instead of adding yet another hack that affects
the bootloader, have the kernel relocate the initrd down below the
lowmem boundary inside the kernel itself.

Note that this does not rely on HIGHMEM being enabled in the kernel.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:32:51 +01:00
Adrian Bunk fb8c177fe0 x86: mm/discontig_32.c: make code static
node0_bdata and paddr_to_nid() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-30 00:22:22 +01:00
Simon Arlott 27b46d7661 spelling fixes: arch/i386/
Spelling fixes in arch/i386/.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 01:13:56 +02:00
Adrian Bunk 59659f14b6 i386: make some variables static
This patch makes some needlessly global variables static.

[ tglx: arch/x86 adaptation ]

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-17 20:15:56 +02:00
H. Peter Anvin 30c826451d [x86] remove uses of magic macros for boot_params access
Instead of using magic macros for boot_params access, simply use the
boot_params structure.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2007-10-16 17:38:31 -07:00
Thomas Gleixner ad757b6aa5 i386: move mm
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-11 11:16:47 +02:00
Renamed from arch/i386/mm/discontig_32.c