1
0
Fork 0
Commit Graph

69 Commits (a2beab31b167bd8ba49bb84944e07ac096f2ab0a)

Author SHA1 Message Date
Yinghai Lu 475613b9e3 x86: fix memoryless node oops during boot
fix oops during boot reported in this thread:

  http://lkml.org/lkml/2008/2/6/65

enable booting on memoryless nodes.

Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26 22:23:40 +01:00
Thomas Gleixner 9e9630481e x86: revert: reserve dma32 early for gart
Revert

commit f62f1fc9ef
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date:   Fri Mar 7 15:02:50 2008 -0800

    x86: reserve dma32 early for gart

The patch has a dependency on bootmem modifications which are not .25
material that late in the -rc cycle. The problem which is addressed by
the patch is limited to machines with 256G and more memory booted with
NUMA disabled. This is not a .25 regression and the audience which is
affected by this problem is very limited, so it's safer to do the
revert than pulling in intrusive bootmem changes right now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-22 19:25:41 +01:00
Yinghai Lu f62f1fc9ef x86: reserve dma32 early for gart
a system with 256 GB of RAM, when NUMA is disabled crashes the
following way:

Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
Kernel panic - not syncing: Not enough memory for aperture
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33

Call Trace:
 [<ffffffff84037c62>] panic+0xb2/0x190
 [<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
 [<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
 [<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
 [<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
 [<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
 [<ffffffff84506a2f>] ? _etext+0x0/0x1
 [<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
 [<ffffffff847ac795>] mem_init+0x45/0x1a0
 [<ffffffff8479ff35>] start_kernel+0x295/0x380
 [<ffffffff8479f1c2>] _sinittext+0x1c2/0x230

the root cause is : memmap PMD is too big,
[ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G.

solution will be:
1. make memmap allocation get memory above 4G...
2. reserve some dma32 range early before we try to set up memmap for all.
and release that before pci_iommu_alloc, so gart or swiotlb could get some
range under 4g limit for sure.

the patch is using method 2.
because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP

will get
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21 17:06:15 +01:00
Mikael Pettersson 12c247a671 x86: fix boot failure on 486 due to TSC breakage
> Diffing dmesg between git7 and git8 doesn't sched any light since
 > git8 also removed the printouts of the x86 caps as they were being
 > initialised and updated. I'm currently adding those printouts back
 > in the hope of seeing where and when the caps get broken.

That turned out to be very illuminating:

 --- dmesg-2.6.24-git7	2008-02-24 18:01:25.295851000 +0100
 +++ dmesg-2.6.24-git8	2008-02-24 18:01:25.530358000 +0100
 ...
 CPU: After generic identify, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000

 CPU: After all inits, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+CPU: After applying cleared_cpu_caps, caps: 00000013 00000000 00000000 00000000 00000000 00000000 00000000 00000000

Notice how the TSC cap bit goes from Off to On.

(The first two lines are printout loops from -git7 forward-ported
to -git8, the third line is the same printout loop added just after
the xor-with-cleared_cpu_caps[] loop.)

Here's how the breakage occurs:
1. arch/x86/kernel/tsc_32.c:tsc_init() sees !cpu_has_tsc,
   so bails and calls setup_clear_cpu_cap(X86_FEATURE_TSC).
2. include/asm-x86/cpufeature.h:setup_clear_cpu_cap(bit) clears
   the bit in boot_cpu_data and sets it in cleared_cpu_caps
3. arch/x86/kernel/cpu/common.c:identify_cpu() XORs all caps
   in with cleared_cpu_caps
   HOWEVER, at this point c->x86_capability correctly has TSC
   Off, cleared_cpu_caps has TSC On, so the XOR incorrectly
   sets TSC to On in c->x86_capability, with disastrous results.

The real bug is that clearing bits with XOR only works if the
bits are known to be 1 prior to the XOR, and that's not true here.

A simple fix is to convert the XOR to AND-NOT instead. The following
patch does that, and allows my 486 to boot 2.6.25-rc kernels again.

[ mingo@elte.hu: fixed a similar bug in setup_64.c as well. ]

The breakage was introduced via commit 7d851c8d3d.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-26 12:56:04 +01:00
Marcin Slusarz d8ff0bbf56 x86: fix printout ugliness in cpu info printk
fix print_cpu_info, because it produced on boot:

  CPU: <6>AMD Athlon(tm) 64 Processor 3200+ stepping 00

instead of:

  CPU: AMD Athlon(tm) 64 Processor 3200+ stepping 00

(broken since 04e1ba8521 -
 x86: cleanup kernel/setup_64.c)

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:33 +01:00
Sam Ravnborg 04d733bd35 x86: fix section mismatch in setup_64.c:srat_detect_node
srat_detect_node() is only used by __cpuinit init_intel().
So the trivial fix is to annotate srat_detect_node() with __cpuinit.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:30 +01:00
Sam Ravnborg 08acb67262 x86: fix section mismatch warning in setup_64.c:nearby_node
nearby_node() were only used by __cpuinit amd_detect_cmp()
So annotating nearby_node() __cpuinit was the trivial fix.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:29 +01:00
David Howells 1eb1141123 aout: remove unnecessary inclusions of {asm, linux}/a.out.h
Remove now unnecessary inclusions of {asm,linux}/a.out.h.

[akpm@linux-foundation.org: fix alpha build]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:30 -08:00
Bernhard Walle 18a01a3beb Use BOOTMEM_EXCLUSIVE for kdump
Use the BOOTMEM_EXCLUSIVE, introduced in the previous patch, to avoid
conflicts while reserving the memory for the kdump capture kernel
(crashkernel=).

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:25 -08:00
Bernhard Walle 72a7fe3967 Introduce flags for reserve_bootmem()
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.

This patch:

Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past.  This is to avoid conflicts.

Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:25 -08:00
H. Peter Anvin fa1408e4df x86: unify CPU feature string names
Move the CPU feature string names to a separate file (common to 32
and 64 bits); additionally, make <asm/cpufeature.h> includable by host
code in preparation for including the CPU feature strings in the boot
code.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:00 +01:00
Yinghai Lu 24a5da73f4 x86_64: make bootmap_start page align v6
boot oopses when a system has 64 or 128 GB of RAM installed:

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>]  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60  EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS:  0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack:  ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
 [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
 [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12
 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
 [<ffffffff8020ccee>] ? child_rip+0x0/0x12

Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
 RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

it turns out some variables near end of bss are corrupted already.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008a485 - 0000000000091484]
  bootmap [0000000000d406f4 -  0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100

setup_node_bootmem() makes NODE_DATA cacheline aligned,
and bootmap is page-aligned.

the patch updates find_e820_area() to make sure we can meet
the alignment constraints.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:49:41 +01:00
Bernhard Kaindl f212ec4b7b x86: early boot debugging via FireWire (ohci1394_dma=early)
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.

If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.

With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.

In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.

An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.

A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt

It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff

Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Huang, Ying a3828064be x86: fixes some bugs about EFI memory map handling
This patch fixes some bugs of EFI memory handing code.

- On x86_64, it is possible that EFI memory map can not be mapped via
  identity map, so efi_map_memmap is removed, just use early_ioremap.

- On i386, the EFI memory map mapping take effect cross paging_init,
  so it is not necessary to use efi_map_memmap.

- EFI memory map is unmapped in efi_enter_virtual_mode to avoid
  early_ioremap leak.

Signed-off-by: Huang Ying <ying.huang@intel.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Yinghai Lu 07035f076f x86: not set boot cpu in cpu_present_map again
in init/main.c boot_cpu_init() already does that before setup_arch

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:38 +01:00
Sam Ravnborg adb8daed46 x86: fix section mismatch warning in setup_64.c
Fix the following warning:
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x7a3): Section mismatch: reference to .init.text:amd_detect_cmp in 'init_amd'

The function amd_detect_cmp were annotated __init and
was only used from init_amd() which are annotated __cpuinit.

Annotate amd_detect_cmp() with _cpuinit to fix it.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:37 +01:00
Hiroshi Shimamoto 45078cb5e2 x86: remove struct cpu_model_info
No one uses struct cpu_model_info on x86_64 now.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:33 +01:00
Yinghai Lu 3effef1f3b x86: should use array directly for early_ptr
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:33 +01:00
Yinghai Lu ac629a98bf x86: remove duplicated line about
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:33 +01:00
Jan Engelhardt 8a45eb31d8 x86: constify function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:32 +01:00
travis@sgi.com 602a54a8ca x86: change bios_cpu_apicid to percpu data variable fixup
Change static bios_cpu_apicid array to a per_cpu data variable.
This includes using a static array used during initialization
similar to the way x86_cpu_to_apicid[] is handled.

There is one early use of bios_cpu_apicid in apic_is_clustered_box().
The other reference in cpu_present_to_apicid() is called after
smp_set_apicids() has setup the percpu version of bios_cpu_apicid.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:21 +01:00
Andi Kleen ac72e7888a x86: add generic clearcpuid=... option
Add a generic option to clear any cpuid bit. I added it because it was
very easy to add with the new generic cpuid disable bitmap and perhaps
it will be useful in the future.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:21 +01:00
Andi Kleen 191679fdfa x86: add noclflush option
To disable CLFLUSH usage, especially in change_page_attr().

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:21 +01:00
Andi Kleen 7d851c8d3d x86: add framework to disable CPUID bits on the command line
There are already various options to disable specific cpuid bits
on the command line. They all use their own variable. Add a generic
mask to make this easier in the future.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:20 +01:00
Hiroshi Shimamoto 74ff305b05 x86: move select_idle_routine() call after detect_ht()
Move the select_idle_routine() call to after the detect_ht() call at
identify_cpu() on 64-bit.

This change is for printing the polling idle and HT enabled warning
message properly.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:18 +01:00
Yinghai Lu 71617bf140 x86: only call early_init_amd one time
Andi's patch
"
    x86: move X86_FEATURE_CONSTANT_TSC into early cpu feature detection

    Need this in the next patch in time_init and that happens early.

    This includes a minor fix on i386 where early_intel_workarounds()
    [which is now called early_init_intel] really executes early as
    the comments say.
"
calling early_init_amd in early_identify_cpu and identify_cpu two times.

this patch remove the one in identify_cpu

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:18 +01:00
Jesse Barnes 99fc8d424b x86, 32-bit: trim memory not covered by wb mtrrs
On some machines, buggy BIOSes don't properly setup WB MTRRs to cover all
available RAM, meaning the last few megs (or even gigs) of memory will be
marked uncached.  Since Linux tends to allocate from high memory addresses
first, this causes the machine to be unusably slow as soon as the kernel
starts really using memory (i.e.  right around init time).

This patch works around the problem by scanning the MTRRs at boot and
figuring out whether the current end_pfn value (setup by early e820 code)
goes beyond the highest WB MTRR range, and if so, trimming it to match.  A
fairly obnoxious KERN_WARNING is printed too, letting the user know that
not all of their memory is available due to a likely BIOS bug.

Something similar could be done on i386 if needed, but the boot ordering
would be slightly different, since the MTRR code on i386 depends on the
boot_cpu_data structure being setup.

This patch fixes a bug in the last patch that caused the code to run on
non-Intel machines (AMD machines apparently don't need it and it's untested
on other non-Intel machines, so best keep it off).

Further enhancements and fixes from:

  Yinghai Lu <Yinghai.Lu@Sun.COM>
  Andi Kleen <ak@suse.de>

Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
Tested-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:18 +01:00
Andi Kleen 7517527891 x86: replace hard coded reservations in 64-bit early boot code with dynamic table
On x86-64 there are several memory allocations before bootmem. To avoid
them stomping on each other they used to be all hard coded in bad_area().
Replace this with an array that is filled as needed.

This cleans up the code considerably and allows to expand its use.

Cc: peterz@infradead.org
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:17 +01:00
Andi Kleen 0c07ee38c9 x86: use the correct cpuid method to detect MWAIT support for C states
Previously there was a AMD specific quirk to handle the case of
AMD Fam10h MWAIT not supporting any C states. But it turns out
that CPUID already has ways to detectly detect that without
using special quirks.

The new code simply checks if MWAIT supports at least C1 and doesn't
use it if it doesn't. No more vendor specific code.

Note this is does not simply clear MWAIT because MWAIT can be still
useful even without C states.

Credit goes to Ben Serebrin for pointing out the (nearly) obvious.

Cc: "Andreas Herrmann" <andreas.herrmann3@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:16 +01:00
travis@sgi.com e8c10ef9dd x86: change bios_cpu_apicid to percpu data variable
Change static bios_cpu_apicid array to a per_cpu data variable.
This includes using a static array used during initialization
similar to the way x86_cpu_to_apicid[] is handled.

There is one early use of bios_cpu_apicid in apic_is_clustered_box().
The other reference in cpu_present_to_apicid() is called after
smp_set_apicids() has setup the percpu version of bios_cpu_apicid.

[ mingo@elte.hu: build fix ]

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:12 +01:00
travis@sgi.com df3825c56d x86: change NR_CPUS arrays in numa_64
Change the following static arrays sized by NR_CPUS to
per_cpu data variables:

	char cpu_to_node_map[NR_CPUS];

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:11 +01:00
travis@sgi.com 3b41908902 x86: cleanup x86_cpu_to_apicid references
Clean up references to x86_cpu_to_apicid.  Removes extraneous
comments and standardizes on "x86_*_early_ptr" for the early
kernel init references.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:11 +01:00
Yinghai Lu aaf2304242 x86: disable the GART early, 64-bit
For K8 system: 4G RAM with memory hole remapping enabled, or more than
4G RAM installed.

when try to use kexec second kernel, and the first doesn't include
gart_shutdown. the second kernel could have different aper position than
the first kernel. and second kernel could use that hole as RAM that is
still used by GART set by the first kernel. esp. when try to kexec
2.6.24 with sparse mem enable from previous kernel (from RHEL 5 or SLES
10). the new kernel will use aper by GART (set by first kernel) for
vmemmap. and after new kernel setting one new GART. the position will be
real RAM. the _mapcount set is lost.

Bad page state in process 'swapper'
page:ffffe2000e600020 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 0, comm: swapper Not tainted 2.6.24-rc7-smp-gcdf71a10-dirty #13

Call Trace:
 [<ffffffff8026401f>] bad_page+0x63/0x8d
 [<ffffffff80264169>] __free_pages_ok+0x7c/0x2a5
 [<ffffffff80ba75d1>] free_all_bootmem_core+0xd0/0x198
 [<ffffffff80ba3a42>] numa_free_all_bootmem+0x3b/0x76
 [<ffffffff80ba3461>] mem_init+0x3b/0x152
 [<ffffffff80b959d3>] start_kernel+0x236/0x2c2
 [<ffffffff80b9511a>] _sinittext+0x11a/0x121

and
 [ffffe2000e600000-ffffe2000e7fffff] PMD ->ffff81001c200000 on node 0
phys addr is : 0x1c200000

RHEL 5.1 kernel -53 said:
PCI-DMA: aperture base @ 1c000000 size 65536 KB

new kernel said:
Mapping aperture over 65536 KB of RAM @ 3c000000

So could try to disable that GART if possible.

According to Ingo

> hm, i'm wondering, instead of modifying the GART, why dont we simply
> _detect_ whatever GART settings we have inherited, and propagate that
> into our e820 maps? I.e. if there's inconsistency, then punch that out
> from the memory maps and just dont use that memory.
>
> that way it would not matter whether the GART settings came from a [old
> or crashing] Linux kernel that has not called gart_iommu_shutdown(), or
> whether it's a BIOS that has set up an aperture hole inconsistent with
> the memory map it passed. (or the memory map we _think_ i tried to pass
> us)
>
> it would also be more robust to only read and do a memory map quirk
> based on that, than actively trying to change the GART so early in the
> bootup. Later on we have to re-enable the GART _anyway_ and have to
> punch a hole for it.
>
> and as a bonus, we would have shored up our defenses against crappy
> BIOSes as well.

add e820 modification for gart inconsistent setting.

gart_fix_e820=off could be used to disable e820 fix.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:09 +01:00
Randy Dunlap d504e39efd x86: discover_ebda section mismatch
Fix section mismatches.  discover_ebda() can be __init.

WARNING: vmlinux.o(.text+0x738a): Section mismatch: reference to .init.data:ebda_addr (between 'discover_ebda' and 'get_model_name')
WARNING: vmlinux.o(.text+0x73c4): Section mismatch: reference to .init.data:ebda_size (between 'discover_ebda' and 'get_model_name')

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:05 +01:00
Andi Kleen e3cfac84cf x86: mark memory_setup __init
Otherwise

WARNING: vmlinux.o(.text+0x64a9): Section mismatch: reference to .init.text:machine_specific_memory_setup (between 'memory_setup' and 'show_cpuinfo')

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:32:49 +01:00
Andreas Herrmann 9566e91d49 x86: fix detection of CONSTANT_TSC bit for AMD CPUs
Commits
 - c52f61fcbdb2aa84f0e4d831ef07f375e6b99b2c
  (x86: allow TSC clock source on AMD Fam10h and some cleanup)
 - e30436f05d456efaff77611e4494f607b14c2782
  (x86: move X86_FEATURE_CONSTANT_TSC into early cpu feature detection)

are supposed to fix the detection of contant TSC for AMD CPUs.
Unfortunately on x86_64 it does still not work with current x86/mm.
For a Phenom I still get:

  ...
  TSC calibrated against PM_TIMER
  Marking TSC unstable due to TSCs unsynchronized
  time.c: Detected 2288.366 MHz processor.
  ...

We have to set c->x86_power in early_identify_cpu to properly detect
the CONSTANT_TSC bit in early_init_amd.

Attached patch fixes this issue. Following the relevant boot
messages when the fix is used:

  ...
  TSC calibrated against PM_TIMER
  time.c: Detected 2288.279 MHz processor.
  ...
  Initializing CPU#1
  ...
  checking TSC synchronization [CPU#0 -> CPU#1]: passed.
  ...
  Initializing CPU#2
  ...
  checking TSC synchronization [CPU#0 -> CPU#2]: passed.
  ...
  Booting processor 3/4 APIC 0x3
  ...
  checking TSC synchronization [CPU#0 -> CPU#3]: passed.
  Brought up 4 CPUs
  ...

Patch is against x86/mm (v2.6.24-rc8-672-ga9f7faa).
Please apply.

Set c->x86_power in early_identify_cpu. This ensures that
X86_FEATURE_CONSTANT_TSC can properly be set in early_init_amd.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:32:41 +01:00
Andi Kleen 2b16a23538 x86: move X86_FEATURE_CONSTANT_TSC into early cpu feature detection
Need this in the next patch in time_init and that happens early.

This includes a minor fix on i386 where early_intel_workarounds()
[which is now called early_init_intel] really executes early as
the comments say.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:32:40 +01:00
Ingo Molnar e402644013 x86: map vsyscalls early enough
map vsyscalls early enough. This is important if a __vsyscall_fn
function is used by other kernel code too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:32:39 +01:00
Andi Kleen 707fa8ed92 x86: Implement support to synchronize RDTSC with LFENCE on Intel CPUs
According to Intel RDTSC can be always synchronized with LFENCE
on all current CPUs. Implement the necessary CPUID bit for that.

It is unclear yet if that is true for all future CPUs too,
but if there's another way the kernel can be always updated.

Cc: asit.k.mallick@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:32:37 +01:00
Andi Kleen de4218634e x86: implement support to synchronize RDTSC through MFENCE on AMD CPUs
According to AMD RDTSC can be synchronized through MFENCE.
Implement the necessary CPUID bit for that.

Cc: andreas.herrmann3@amd.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:32:37 +01:00
Glauber de Oliveira Costa 1a53905add x86: move definitions to processor.h
This patch moves definitions that are present in only one of the files
(between processor_32.h and processor_64.h), to processor.h. They're mostly
structures and function definitions.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:39 +01:00
Huang, Ying 5b83683f32 x86: EFI runtime service support
This patch adds basic runtime services support for EFI x86_64 system.  The
main file of the patch is the addition of efi_64.c for x86_64.  This file is
modeled after the EFI IA32 avatar.  EFI runtime services initialization are
implemented in efi_64.c.  Some x86_64 specifics are worth noting here.  On
x86_64, parameters passed to EFI firmware services need to follow the EFI
calling convention.  For this purpose, a set of functions named efi_call<x>
(<x> is the number of parameters) are implemented.  EFI function calls are
wrapped before calling the firmware service.  The duplicated code between
efi_32.c and efi_64.c is placed in efi.c to remove them from efi_32.c.

Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:19 +01:00
Glauber de Oliveira Costa 746ef0cd0c x86: prepare 64-bit architecture initialization for paravirt
This patch prepares the x86_64 architecture initialization for
paravirt. It requires a memory initialization step, which is done
by implementing 64-bit version for machine_specific_memory_setup,
and putting an ARCH_SETUP hook, for guest-dependent initialization.
This last step is done akin to i386

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:11 +01:00
Markus Metzger eee3af4a2c x86, ptrace: support for branch trace store(BTS)
Resend using different mail client

Changes to the last version:
- split implementation into two layers: ds/bts and ptrace
- renamed TIF's
- save/restore ds save area msr in __switch_to_xtra()
- make block-stepping only look at BTF bit

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:09 +01:00
Jeremy Fitzhardinge 53756d3722 x86: add set/clear_cpu_cap operations
The patch to suppress bitops-related warnings added a pile of ugly
casts.  Many of these were related to the management of x86 CPU
capabilities.  Clean these up by adding specific set/clear_cpu_cap
macros, and use them consistently.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:55 +01:00
Jeremy Fitzhardinge 5548fecdff x86: clean up bitops-related warnings
Add casts to appropriate places to silence spurious bitops warnings.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:55 +01:00
Yinghai Lu a860b63c41 x86: store core id bits in cpuinfo_x8
We need to store core id bits to cpuinfo_x86 in early_identify_cpu. So we
use it to create acpiid_to_node array in k8topolgy.c

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:39 +01:00
Thomas Gleixner 04e1ba8521 x86: cleanup kernel/setup_64.c
Clean it up before applying more patches to it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:30:39 +01:00
Bernhard Walle c9cce83dd1 x86: remove extern declarations for code, data, bss resources
This patch removes the extern struct resource declarations for
data_resource, code_resource and bss_resource on x86 and declares that
three structures as static as done on other architectures like IA64.

On i386, these structures are moved to setup_32.c (from e820_32.c) because
that's code that is not specific to e820 and also required on EFI systems.
That makes the "extern" reference superfluous.

On x86_64, data_resource, code_resource and bss_resource are passed to
e820_reserve_resources() as arguments just as done on i386 and IA64.  That
also avoids the "extern" reference and it's possible to make it static.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:32 +01:00
Thomas Gleixner 3e35a0e525 x86: move ioapic code where it belongs
The commit 399287229c hacked the
ioapic resource mapping into apic.c for no good reason.
Move the code into io_apic_64.c where it belongs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:30:19 +01:00