Commit graph

2333 commits

Author SHA1 Message Date
Marcelo Tosatti 1da8a77bdc x86: KVM guest: hypercall based pte updates and TLB flushes
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - guest/host split
 - fix 32-bit truncation issues
 - adjust to mmu_op
 - adjust to ->release_*() renamed
 - add ->release_pud()]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:28 +03:00
Marcelo Tosatti 2f333bcb4e KVM: MMU: hypercall based pte updates and TLB flushes
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - one mmu_op hypercall instead of one per op
 - allow 64-bit gpa on hypercall
 - don't pass host errors (-ENOMEM) to guest]

[akpm: warning fix on i386]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:27 +03:00
Avi Kivity 9f81128591 KVM: Provide unlocked version of emulator_write_phys()
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:26 +03:00
Marcelo Tosatti 0cf1bfd273 x86: KVM guest: add basic paravirt support
Add basic KVM paravirt support. Avoid vm-exits on IO delays.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:25 +03:00
Marcelo Tosatti a28e4f5a62 KVM: add basic paravirt support
Add basic KVM paravirt support. Avoid vm-exits on IO delays.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:24 +03:00
Sheng Yang 308b0f239e KVM: Add reset support for in kernel PIT
Separate the reset part and prepare for reset support.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:23 +03:00
Sheng Yang e0f63cb927 KVM: Add save/restore supporting of in kernel PIT
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:22 +03:00
Sheng Yang 7837699fa6 KVM: In kernel PIT model
The patch moves the PIT model from userspace to kernel, and increases
the timer accuracy greatly.

[marcelo: make last_injected_time per-guest]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:21 +03:00
Avi Kivity 4fcaa98267 KVM: Remove pointless desc_ptr #ifdef
The desc_struct changes left an unnecessary #ifdef; remove it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:27 +03:00
Avi Kivity 019960ae99 KVM: VMX: Don't adjust tsc offset forward
Most Intel hosts have a stable tsc, and playing with the offset only
reduces accuracy.  By limiting tsc offset adjustment only to forward updates,
we effectively disable tsc offset adjustment on these hosts.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:27 +03:00
Harvey Harrison b8688d51bb KVM: replace remaining __FUNCTION__ occurances
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:27 +03:00
Joerg Roedel 71c4dfafc0 KVM: detect if VCPU triple faults
In the current inject_page_fault path KVM only checks if there is another PF
pending and injects a DF then. But it has to check for a pending DF too to
detect a shutdown condition in the VCPU.  If this is not detected the VCPU goes
to a PF -> DF -> PF loop when it should triple fault. This patch detects this
condition and handles it with an KVM_SHUTDOWN exit to userspace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:27 +03:00
Avi Kivity 2d3ad1f40c KVM: Prefix control register accessors with kvm_ to avoid namespace pollution
Names like 'set_cr3()' look dangerously close to affecting the host.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:26 +03:00
Marcelo Tosatti 05da45583d KVM: MMU: large page support
Create large pages mappings if the guest PTE's are marked as such and
the underlying memory is hugetlbfs backed.  If the largepage contains
write-protected pages, a large pte is not used.

Gives a consistent 2% improvement for data copies on ram mounted
filesystem, without NPT/EPT.

Anthony measures a 4% improvement on 4-way kernbench, with NPT.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:25 +03:00
Marcelo Tosatti 2e53d63acb KVM: MMU: ignore zapped root pagetables
Mark zapped root pagetables as invalid and ignore such pages during lookup.

This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:25 +03:00
Alexander Graf 847f0ad8cb KVM: Implement dummy values for MSR_PERF_STATUS
Darwin relies on this and ceases to work without.

Signed-off-by: Alexander Graf <alex@csgraf.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:25 +03:00
Harvey Harrison 14af3f3c56 KVM: sparse fixes for kvm/x86.c
In two case statements, use the ever popular 'i' instead of index:
arch/x86/kvm/x86.c:1063:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here
arch/x86/kvm/x86.c:1079:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here

Make it static.
arch/x86/kvm/x86.c:1945:24: warning: symbol 'emulate_ops' was not declared. Should it be static?

Drop the return statements.
arch/x86/kvm/x86.c:2878:2: warning: returning void-valued expression
arch/x86/kvm/x86.c:2944:2: warning: returning void-valued expression

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Harvey Harrison 4866d5e3d5 KVM: SVM: make iopm_base static
Fixes sparse warning as well.
arch/x86/kvm/svm.c:69:15: warning: symbol 'iopm_base' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Harvey Harrison 77cd337f22 KVM: x86 emulator: fix sparse warnings in x86_emulate.c
Nesting __emulate_2op_nobyte inside__emulate_2op produces many shadowed
variable warnings on the internal variable _tmp used by both macros.

Change the outer macro to use __tmp.

Avoids a sparse warning like the following at every call site of __emulate_2op
arch/x86/kvm/x86_emulate.c:1091:3: warning: symbol '_tmp' shadows an earlier one
arch/x86/kvm/x86_emulate.c:1091:3: originally declared here
[18 more warnings suppressed]

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Amit Shah f11c3a8d84 KVM: Add stat counter for hypercalls
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Avi Kivity a5f61300c4 KVM: Use x86's segment descriptor struct instead of private definition
The x86 desc_struct unification allows us to remove segment_descriptor.h.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Avi Kivity a988b910ef KVM: Add API for determining the number of supported memory slots
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:23 +03:00
Avi Kivity f725230af9 KVM: Add API to retrieve the number of supported vcpus per vm
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:23 +03:00
Harvey Harrison 7a95727567 KVM: x86 emulator: make register_address_increment and JMP_REL static inlines
Change jmp_rel() to a function as well.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:23 +03:00
Harvey Harrison e4706772ea KVM: x86 emulator: make register_address, address_mask static inlines
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:22 +03:00
Harvey Harrison ddcb2885e2 KVM: x86 emulator: add ad_mask static inline
Replaces open-coded mask calculation in macros.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:22 +03:00
Glauber de Oliveira Costa 790c73f628 x86: KVM guest: paravirtualized clocksource
This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.

[randy dunlap: add missing include]
[randy dunlap: disallow on Voyager or Visual WS]

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:22 +03:00
Glauber de Oliveira Costa 18068523d3 KVM: paravirtualized clocksource: host part
This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writing to it periodically.

The area is binary compatible with xen, as we use the same shadow_info
structure.

[marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME]
[avi: save full value of the msr, even if enable bit is clear]
[avi: clear previous value of time_page]

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:22 +03:00
Joerg Roedel 24e09cbf48 KVM: SVM: enable LBR virtualization
This patch implements the Last Branch Record Virtualization (LBRV) feature of
the AMD Barcelona and Phenom processors into the kvm-amd module. It will only
be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So
there is no increased world switch overhead when the guest doesn't use these
MSRs.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Joerg Roedel f65c229c3e KVM: SVM: allocate the MSR permission map per VCPU
This patch changes the kvm-amd module to allocate the SVM MSR permission map
per VCPU instead of a global map for all VCPUs. With this we have more
flexibility allowing specific guests to access virtualized MSRs. This is
required for LBR virtualization.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Joerg Roedel e6101a96c9 KVM: SVM: let init_vmcb() take struct vcpu_svm as parameter
Change the parameter of the init_vmcb() function in the kvm-amd module from
struct vmcb to struct vcpu_svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Ryan Harper 2e11384c2c KVM: VMX: fix typo in VMX header define
Looking at Intel Volume 3b, page 148, table 20-11 and noticed
that the field name is 'Deliver' not 'Deliever'.  Attached patch changes
the define name and its user in vmx.c

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Joerg Roedel 709ddebf81 KVM: SVM: add support for Nested Paging
This patch contains the SVM architecture dependent changes for KVM to enable
support for the Nested Paging feature of AMD Barcelona and Phenom processors.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Joerg Roedel fb72d1674d KVM: MMU: add TDP support to the KVM MMU
This patch contains the changes to the KVM MMU necessary for support of the
Nested Paging feature in AMD Barcelona and Phenom Processors.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:20 +03:00
Joerg Roedel cc4b6871e7 KVM: export the load_pdptrs() function to modules
The load_pdptrs() function is required in the SVM module for NPT support.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:20 +03:00
Joerg Roedel 4d9976bbdc KVM: MMU: make the __nonpaging_map function generic
The mapping function for the nonpaging case in the softmmu does basically the
same as required for Nested Paging. Make this function generic so it can be
used for both.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:20 +03:00
Joerg Roedel 1855267210 KVM: export information about NPT to generic x86 code
The generic x86 code has to know if the specific implementation uses Nested
Paging. In the generic code Nested Paging is called Two Dimensional Paging
(TDP) to avoid confusion with (future) TDP implementations of other vendors.
This patch exports the availability of TDP to the generic x86 code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel 6c7dac72d5 KVM: SVM: add module parameter to disable Nested Paging
To disable the use of the Nested Paging feature even if it is available in
hardware this patch adds a module parameter. Nested Paging can be disabled by
passing npt=0 to the kvm_amd module.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel e3da3acdb3 KVM: SVM: add detection of Nested Paging feature
Let SVM detect if the Nested Paging feature is available on the hardware.
Disable it to keep this patch series bisectable.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel 33bd6a0b3e KVM: SVM: move feature detection to hardware setup code
By moving the SVM feature detection from the each_cpu code to the hardware
setup code it runs only once. As an additional advance the feature check is now
available earlier in the module setup process.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel 9457a712a2 KVM: allow access to EFER in 32bit KVM
This patch makes the EFER register accessible on a 32bit KVM host. This is
necessary to boot 32 bit PAE guests under SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel 9f62e19a11 KVM: VMX: unifdef the EFER specific code
To allow access to the EFER register in 32bit KVM the EFER specific code has to
be exported to the x86 generic code. This patch does this in a backwards
compatible manner.

[avi: add check for EFER-less hosts]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:18 +03:00
Joerg Roedel 50a37eb4e0 KVM: align valid EFER bits with the features of the host system
This patch aligns the bits the guest can set in the EFER register with the
features in the host processor. Currently it lets EFER.NX disabled if the
processor does not support it and enables EFER.LME and EFER.LMA only for KVM on
64 bit hosts.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:18 +03:00
Joerg Roedel f2b4b7ddf6 KVM: make EFER_RESERVED_BITS configurable for architecture code
This patch give the SVM and VMX implementations the ability to add some bits
the guest can set in its EFER register.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:18 +03:00
Sheng Yang 2384d2b326 KVM: VMX: Enable Virtual Processor Identification (VPID)
To allow TLB entries to be retained across VM entry and VM exit, the VMM
can now identify distinct address spaces through a new virtual-processor ID
(VPID) field of the VMCS.

[avi: drop vpid_sync_all()]
[avi: add "cc" to asm constraints]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:17 +03:00
Avi Kivity d196e34336 KVM: MMU: Decouple mmio from shadow page tables
Currently an mmio guest pte is encoded in the shadow pagetable as a
not-present trapping pte, with the SHADOW_IO_MARK bit set.  However
nothing is ever done with this information, so maintaining it is a
useless complication.

This patch moves the check for mmio to before shadow ptes are instantiated,
so the shadow code is never invoked for ptes that reference mmio.  The code
is simpler, and with future work, can be made to handle mmio concurrently.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:17 +03:00
Avi Kivity 1d6ad2073e KVM: x86 emulator: group decoding for group 1 instructions
Opcodes 0x80-0x83

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:16 +03:00
Avi Kivity d95058a1a7 KVM: x86 emulator: add group 7 decoding
This adds group decoding for opcode 0x0f 0x01 (group 7).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:15 +03:00
Avi Kivity fd60754e4f KVM: x86 emulator: Group decoding for groups 4 and 5
Add group decoding support for opcode 0xfe (group 4) and 0xff (group 5).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:15 +03:00
Avi Kivity 7d858a19ef KVM: x86 emulator: Group decoding for group 3
This adds group decoding support for opcodes 0xf6, 0xf7 (group 3).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Avi Kivity 43bb19cd33 KVM: x86 emulator: group decoding for group 1A
This adds group decode support for opcode 0x8f.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Avi Kivity e09d082c03 KVM: x86 emulator: add support for group decoding
Certain x86 instructions use bits 3:5 of the byte following the opcode as an
opcode extension, with the decode sometimes depending on bits 6:7 as well.
Add support for this in the main decoding table rather than an ad-hock
adaptation per opcode.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Dong, Eddie 1ae0a13def KVM: MMU: Simplify hash table indexing
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Dong, Eddie 489f1d6526 KVM: MMU: Update shadow ptes on partial guest pte writes
A guest partial guest pte write will leave shadow_trap_nonpresent_pte
in spte, which generates a vmexit at the next guest access through that pte.

This patch improves this by reading the full guest pte in advance and thus
being able to update the spte and eliminate the vmexit.

This helps pae guests which use two 32-bit writes to set a single 64-bit pte.

[truncation fix by Eric]

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:13 +03:00
Peter Zijlstra 7f424a8b08 fix idle (arch, acpi and apm) and lockdep
OK, so 25-mm1 gave a lockdep error which made me look into this.

The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561

The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.

So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:

  idle routines are entered with IRQs disabled
  idle routines will exit with IRQs enabled

Nearly all already did this in one form or another.

Merge the 32 and 64 bit bits so they no longer have different bugs.

As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-27 00:01:45 +02:00
Yinghai Lu 5f0b2976cb x86: add pci=check_enable_amd_mmconf and dmi check
so will disable that feature by default, and only enable that via
pci=check_enable_amd_mmconf or for system match with dmi table.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu e8ee6f0ae5 x86: work around io allocation overlap of HT links
normally BIOSes assign io/mmio range to different HT links without
overlapping, even same node same link should get non overlapping
entries.

but Rafael L. Wysocki's buggy BIOS creates a link with overlapping
entries for mmio and io:

  node 0 link 0: io port [1000, ffffff]
  node 0 link 0: mmio [e0000000, efffffff]
  node 0 link 0: mmio [a0000, bffff]
  node 0 link 0: mmio [80000000, ffffffff]

try to merge them and we will get:

  bus: [00, ff] on node 0 link 0
  bus: 00 index 0 io port: [0, ffff]
  bus: 00 index 1 mmio: [80000000, fcffffffff]
  bus: 00 index 2 mmio: [a0000, bffff]

so later we will reduce the chance to assign used resource to
unassigned device.

Reported-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu cbf9bd603a acpi: get boot_cpu_id as early for k8_scan_nodes
[mingo@elte.hu: split from "x86_64: get boot_cpu_id as early for k8_scan_nodes]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu 4cf1946374 x86_64: don't need set default res if only have one root bus
if there's only one root bus there's no need to split resources.

This patch fixes the issue described at:

  http://lkml.org/lkml/2008/4/10/304

Reported-and-bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu 6e184f299d x86: double check the multi root bus with fam10h mmconf
some bioses give same range to mmconf for fam10h msr, and mmio for node/link.

fam10h msr will overide mmio for node/link.
so we can not assign range to devices under node/link for unassigned resources.

this patch will take range out from the mmio for node/link

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu 30a18d6c3f x86: multi pci root bus with different io resource range, on 64-bit
scan AMD opteron io/mmio routing to make sure every pci root bus get correct
resource range. Thus later pci scan could assign correct resource to device
with unassigned resource.

this can fix a system without _CRS for multi pci root bus.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu 35ddd068fb x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
... so we use the same code with Quad core cpu as old opteron.

This patch is useful when acpi=off or _PXM is not there in DSDT.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu 871d5f8dd0 x86: get mp_bus_to_node early
Currently, on an amd k8 system with multi ht chains, the numa_node of
pci devices under /sys/devices/pci0000:80/* is always 0, even if that
chain is on node 1 or 2 or 3.

Workaround: pcibus_to_node(bus) is used when we want to get the node that
pci_device is on.

In struct device, we already have numa_node member, and we could use
dev_to_node()/set_dev_node() to get and set numa_node in the device.
set_dev_node is called in pci_device_add() with pcibus_to_node(bus),
and pcibus_to_node uses bus->sysdata for nodeid.

The problem is when pci_add_device is called, bus->sysdata is not assigned
correct nodeid yet. The result is that numa_node will always be 0.

pcibios_scan_root and pci_scan_root could take sysdata. So we need to get
mp_bus_to_node mapping before these two are called, and thus
get_mp_bus_to_node could get correct node for sysdata in root bus.

In scanning of the root bus, all child busses will take parent bus sysdata.
So all pci_device->dev.numa_node will be assigned correctly and automatically.

Later we could use dev_to_node(&pci_dev->dev) to get numa_node, and we
could also could make other bus specific device get the correct numa_node
too.

This is an updated version of pci_sysdata and Jeff's pci_domain patch.

[ mingo@elte.hu: build fix ]

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:04 +02:00
Yinghai Lu bb63b42199 x86 pci: remove checking type for mmconfig probe
doesn't need to check if it is type1 or type2, we can use raw_pci_ops
directly.

also make pci_direct_conf1 static again.

anyway is there system with type 2 and mmconf support?

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu d2ebdf4bae x86: remove unneeded check in mmconf reject
mmconfig is only used to access extended configuration space.

so don't need to reject MFG that only have one entry and only handle bus0.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu d39398a333 x86: seperate mmconf for fam10h out from setup_64.c
Separate mmconf for fam10h out from setup_64.c

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu d4c4d09415 x86: if acpi=off, force setting the mmconf for fam10h
some BIOS only let AMD fam 10h handle bus0, and nvidia mcp55/ck804
to handle other buses. at that case MCFG will cover all over them.

but with acpi=off, we can not use MCFG. this patch will double check
the busnbits, and if it is less handling 256 bues, and acpi=off
will forcely reset the mmconf in msr, so we still use mmconf in above case.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu 7fd0da4085 x86_64: check MSR to get MMCONFIG for AMD Family 10h
so even booting kernel with acpi=off or even MCFG is not there, we still can
use MMCONFIG.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu eee206c3bf x86_64: check and enable MMCONFIG for AMD Family 10h
So we can use MMCONF when MMCONF is not set by BIOS

using TOP_MEM2 msr to get memory top, and try to scan fam10h mmio routing to
make sure the range is not conflicted with some prefetch MMIO that is above 4G.
(current only LinuxBIOS assign 64 bit mmio above 4G for some co-processor)

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 23:41:04 +02:00
Yinghai Lu 57741a7790 x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
reuse pci_cfg_space_size but skip check pci express and pci-x CAP ID.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Yinghai Lu 05c58b8ac7 x86: mmconf enable mcfg early
Patch
	"x86: validate against ACPI motherboard resources"

changed the mmconf init sequence, and init MMCONF late in acpi_init.

here change it back to old sequence:

 1. check hostbridge in early
 2. check MCFG with e820 in early
 3. if all fail, will check MCFg with acpi _CRS in acpi_init

So we can make MCONF working again when acpi=off is set if hostbridge
support that.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Yinghai Lu 0b64ad7123 x86: clear pci_mmcfg_virt when mmcfg get rejected
For x86_64, need to free pci_mmcfg_virt, and iounmap some pointers
when MMCONF is not reserved in E820 or acpi _CRS and get rejected.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Robert Hancock 7752d5cfe3 x86: validate against acpi motherboard resources
This path adds validation of the MMCONFIG table against the ACPI reserved
motherboard resources.  If the MMCONFIG table is found to be reserved in
ACPI, we don't bother checking the E820 table.  The PCI Express firmware
spec apparently tells BIOS developers that reservation in ACPI is required
and E820 reservation is optional, so checking against ACPI first makes
sense.  Many BIOSes don't reserve the MMCONFIG region in E820 even though
it is perfectly functional, the existing check needlessly disables MMCONFIG
in these cases.

In order to do this, MMCONFIG setup has been split into two phases.  If PCI
configuration type 1 is not available then MMCONFIG is enabled early as
before.  Otherwise, it is enabled later after the ACPI interpreter is
enabled, since we need to be able to execute control methods in order to
check the ACPI reserved resources.  Presently this is just triggered off
the end of ACPI interpreter initialization.

There are a few other behavioral changes here:

- Validate all MMCONFIG configurations provided, not just the first one.

- Validate the entire required length of each configuration according to
  the provided ending bus number is reserved, not just the minimum required
  allocation.

- Validate that the area is reserved even if we read it from the chipset
  directly and not from the MCFG table.  This catches the case where the
  BIOS didn't set the location properly in the chipset and has mapped it
  over other things it shouldn't have.

This also cleans up the MMCONFIG initialization functions so that they
simply do nothing if MMCONFIG is not compiled in.

Based on an original patch by Rajesh Shah from Intel.

[akpm@linux-foundation.org: many fixes and cleanups]
Signed-off-by: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andi Kleen <ak@suse.de>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 23:41:03 +02:00
Linus Torvalds c3bf9bc243 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
  x86_64/mm: check and print vmemmap allocation continuous
  x86_64: fix setup_node_bootmem to support big mem excluding with memmap
  x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
  mm: allow reserve_bootmem() cross nodes
  mm: offset align in alloc_bootmem()
  mm: fix alloc_bootmem_core to use fast searching for all nodes
  mm: make mem_map allocation continuous
2008-04-26 14:04:32 -07:00
Yinghai Lu c2b91e2eec x86_64/mm: check and print vmemmap allocation continuous
On big systems with lots of memory, don't print out too much during
bootup, and make it easy to find if it is continuous.

on 256G 8 sockets system will get
 [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0
[ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs
 [ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0
[ffffe20038700000-ffffe200387fffff] potential offnode page_structs
 [ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1
 [ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2
 [ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2
[ffffe20054700000-ffffe200547fffff] potential offnode page_structs
 [ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2
[ffffe20070700000-ffffe200707fffff] potential offnode page_structs
 [ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3
 [ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4
 [ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4
[ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs
 [ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4
[ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs
 [ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5
 [ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6
 [ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6
[ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs
 [ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6
 [ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7

instead of a very long print out...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 22:51:09 +02:00
Yinghai Lu 1a27fc0a42 x86_64: fix setup_node_bootmem to support big mem excluding with memmap
typical case: four sockets system, every node has 4g ram, and we are using:

	memmap=10g$4g

to mask out memory on node1 and node2

when numa is enabled, early_node_mem is used to get node_data and node_bootmap.

if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.

so check it and print out some info about it.

need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.

depends on "mm: make reserve_bootmem can crossed the nodes".

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Yinghai Lu 8b3cd09ed2 x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
"mm: make reserve_bootmem can crossed the nodes" provides new
reserve_bootmem(), let reserve_bootmem_generic() use that.

reserve_bootmem_generic() is used to reserve initramdisk, so this way
we can make sure even when bootloader or kexec load ranges cross the
node memory boundaries, reserve_bootmem still works.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Linus Torvalds 9b79ed952b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3:
  x86, bitops: select the generic bitmap search functions
  x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only
  x86: finalize bitops unification
  x86, UML: remove x86-specific implementations of find_first_bit
  x86: optimize find_first_bit for small bitmaps
  x86: switch 64-bit to generic find_first_bit
  x86: generic versions of find_first_(zero_)bit, convert i386
  bitops: use __fls for fls64 on 64-bit archs
  generic: implement __fls on all 64-bit archs
  generic: introduce a generic __fls implementation
  x86: merge the simple bitops and move them to bitops.h
  x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
  x86, uml: fix uml with generic find_next_bit for x86
  x86: change x86 to use generic find_next_bit
  uml: Kconfig cleanup
  uml: fix build error
2008-04-26 13:46:11 -07:00
Linus Torvalds 539a5fe226 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam:
  x86, boot: Document for linked list of struct setup_data
  x86, boot: export linked list of struct setup_data via debugfs
  x86, boot: add linked list of struct setup_data
  x86, boot: add free_early to early reservation machanism
2008-04-26 13:29:41 -07:00
Huang, Ying c14b2adf19 x86, boot: export linked list of struct setup_data via debugfs
Export linked list of struct setup_data via debugfs.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 21:34:42 +02:00
Huang, Ying 8b664aa66e x86, boot: add linked list of struct setup_data
This patch adds a field of 64-bit physical pointer to NULL terminated
single linked list of struct setup_data to real-mode kernel
header. This is used as a more extensible boot parameters passing
mechanism.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 21:34:42 +02:00
Huang, Ying 50eae2a7c9 x86, boot: add free_early to early reservation machanism
Add free_early to early reservation mechanism - this way early bootup
failure paths can stop wasting memory.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 21:34:42 +02:00
Venki Pallipadi 0124cecfc8 x86, PAT: disable /dev/mem mmap RAM with PAT
disable /dev/mem mmap of RAM with PAT. It makes things safer and
eliminates aliasing. A future improvement would be to avoid the
range_is_allowed duplication.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 21:28:29 +02:00
Alexander van Heukelum 19870def58 x86, bitops: select the generic bitmap search functions
Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
lib/Kconfig, defaulting to off. An arch that wants to use the
generic implementation now only has to use a select statement
to include them.

I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
and used that to select the generic search functions. This
way ARCH=um SUBARCH=i386 automatically picks up the change
too, and arch/um/Kconfig.i386 can therefore be simplified a
bit. ARCH=um SUBARCH=x86_64 does things differently, but
still compiles fine. It seems that a "def_bool y" always
wins over a "def_bool n"?

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:17 +02:00
Alexander van Heukelum 5245698f66 x86, UML: remove x86-specific implementations of find_first_bit
x86 has been switched to the generic versions of find_first_bit
and find_first_zero_bit, but the original versions were retained.
This patch just removes the now unused x86-specific versions.

also update UML.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:17 +02:00
Alexander van Heukelum 2aba6925fd x86: switch 64-bit to generic find_first_bit
Switch x86_64 to generic find_first_bit. The x86_64-specific
implementation is not removed.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Alexander van Heukelum 77b9bd9c49 x86: generic versions of find_first_(zero_)bit, convert i386
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.

The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.

This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Alexander van Heukelum 12d9c8420b x86: merge the simple bitops and move them to bitops.h
Some of those can be written in such a way that the same
inline assembly can be used to generate both 32 bit and
64 bit code.

For ffs and fls, x86_64 unconditionally used the cmov
instruction and i386 unconditionally used a conditional
branch over a mov instruction. In the current patch I
chose to select the version based on the availability
of the cmov instruction instead. A small detail here is
that x86_64 did not previously set CONFIG_X86_CMOV=y.

Improved comments for ffs, ffz, fls and variations.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Alexander van Heukelum 6fd92b63d0 x86: change x86 to use generic find_next_bit
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.

Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.

		Athlon		Xeon		Opteron 32/64bit
x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s

If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.

Using the generic version does give a slightly bigger kernel, though.

defconfig:	   text    data     bss     dec     hex filename
x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Linus Torvalds 4a27214d7b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  x86 PAT: decouple from nonpromisc devmem
  x86 PAT: tone down debugging messages
2008-04-26 09:50:58 -07:00
Linus Torvalds d485cb9aa2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-optimized-inlining
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-optimized-inlining:
  generic: make optimized inlining arch-opt-in
  x86: add optimized inlining
2008-04-26 09:48:52 -07:00
Linus Torvalds b82287587e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-misc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-misc: (28 commits)
  x86: section mismatch fixes, #3
  x86: section mismatch fixes, #2
  x86: pgtable_32.h - prototype and section mismatch fixes
  x86: unlock_ExtINT_logic() - fix section mismatch warnings
  x86: uniq_ioapic_id - fix section mismatch warning
  x86: trampoline_32.S - switch to .cpuinit.data
  x86: use get_bios_ebda()
  x86: remove duplicate get_bios_ebda() from rio.h
  x86: get_bios_ebda() requires asm/io.h
  x86: use cpumask function for present, possible, and online cpus
  x86: cleanup div_sc() usage
  x86: cleanup clocksource_hz2mult usage
  x86: remove unnecessary memset and NULL check after alloc_bootmem()
  x86: use bitmap library for pin_programmed
  x86: use MP_intsrc_info()
  x86: use BUILD_BUG_ON() for the size of struct intel_mp_floating
  x86_64 ia32 ptrace: convert to compat_arch_ptrace
  x86_64 ia32 ptrace: use compat_ptrace_request for siginfo
  x86 signals: lift set_fs
  x86 signals: lift flags diddling code
  ...
2008-04-26 09:44:32 -07:00
Ingo Molnar 2a8a2719be x86 PAT: decouple from nonpromisc devmem
Linus pointed it out that PAT should not depend on NONPROMISC_DEVMEM.

Also make PAT non-default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-26 09:31:17 -07:00
Ingo Molnar 765c68bd54 generic: make optimized inlining arch-opt-in
Stephen Rothwell reported that linux-next did not build on powerpc64.

make optimized inlining dependent on architecture opt-in.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:44:55 +02:00
Ingo Molnar 60a3cdd063 x86: add optimized inlining
add CONFIG_OPTIMIZE_INLINING=y.

allow gcc to optimize the kernel image's size by uninlining
functions that have been marked 'inline'. Previously gcc was
forced by Linux to always-inline these functions via a gcc
attribute:

 #define inline	inline __attribute__((always_inline))

Especially when the user has already selected
CONFIG_OPTIMIZE_FOR_SIZE=y this can make a huge difference in
kernel image size (using a standard Fedora .config):

   text    data     bss     dec           hex filename
   5613924  562708 3854336 10030968    990f78 vmlinux.before
   5486689  562708 3854336  9903733    971e75 vmlinux.after

that's a 2.3% text size reduction (!).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:44:55 +02:00
Jacek Luczak 5afca33a43 x86: section mismatch fixes, #3
This patch fixes section mismatch warnings in unlock_ExtINT_logic().

WARNING: arch/x86/kernel/built-in.o(.text+0x14a92): Section mismatch in reference from the function unlock_ExtINT_logic()
to the function .init.text:find_isa_irq_pin()
The function unlock_ExtINT_logic() references
the function __init find_isa_irq_pin().
This is often because unlock_ExtINT_logic lacks a __init
annotation or the annotation of find_isa_irq_pin is wrong.

Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 17:35:48 +02:00
Jacek Luczak 28acf285de x86: unlock_ExtINT_logic() - fix section mismatch warnings
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x12cc9): Section mismatch in reference from the function unlock_ExtINT_logic()

unlock_ExtINT_logic() is only used by __init check_timer(). Annotate unlock_ExtINT_logic() witch __init.

Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 17:35:47 +02:00
Jacek Luczak 991074fd35 x86: uniq_ioapic_id - fix section mismatch warning
Fix folowing warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x10799): Section mismatch in reference from the function uniq_ioapic_id()

uniq_ioapic_id() is only used by __init mp_register_ioapic(). Annotate uniq_ioapic_id() with __init.

Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 17:35:47 +02:00
Jacek Luczak 4c01f23bdb x86: trampoline_32.S - switch to .cpuinit.data
This patch fixes section mismatch warnings of __cpuinit
setup_trampoline() on 32-bit host.

Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 17:35:47 +02:00
Akinobu Mita 356fa0c6e1 x86: use get_bios_ebda()
Use get_bios_ebda().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita ae5830a6f8 x86: remove duplicate get_bios_ebda() from rio.h
get_bios_ebda() exists in asm/rio.h and asm/bios_ebda.h.
This patch removes the one in asm/rio.h.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita 7c04e64a1b x86: use cpumask function for present, possible, and online cpus
cpu_online(), cpu_present(), for_each_possible_cpu(), num_possible_cpus()

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita 877084fb1c x86: cleanup div_sc() usage
Remove the magic number in the third argment of div_sc().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita d454157b11 x86: cleanup clocksource_hz2mult usage
Remove the magic number in the second argument of clocksource_hz2mult()

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita b1fceac2b9 x86: remove unnecessary memset and NULL check after alloc_bootmem()
memset and NULL check after alloc_bootmem() are unnecessary.
Because it returns zeroed memory and it never return NULL.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita a1a33fa315 x86: use bitmap library for pin_programmed
Use bitmap library for pin_programmed rather than reinvent
bitmaps.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita 4abc1a0068 x86: use MP_intsrc_info()
Remove duplicate code by using MP_intsrc_info() in mpparse.c

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita 5d47a271f3 x86: use BUILD_BUG_ON() for the size of struct intel_mp_floating
Use BUILD_BUG_ON() instead of compile-time error technique with
extern non-exsistent function.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Roland McGrath 562b80baff x86_64 ia32 ptrace: convert to compat_arch_ptrace
Now that there are no more special cases in sys32_ptrace, we
can convert to using the generic compat_sys_ptrace entry point.
The sys32_ptrace function gets simpler and becomes compat_arch_ptrace.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Roland McGrath cdb6990479 x86_64 ia32 ptrace: use compat_ptrace_request for siginfo
This removes the special-case handling for PTRACE_GETSIGINFO
and PTRACE_SETSIGINFO from x86_64's sys32_ptrace.  The generic
compat_ptrace_request code handles these.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Roland McGrath 55928e37b2 x86 signals: lift set_fs
This lifts the set_fs(USER_DS) call for signal handler setup out of the
three places copying the same code into the one place that calls them
all.  There is no change in what it does.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Roland McGrath 8b9c5ff380 x86 signals: lift flags diddling code
This lifts the code diddling the TF and DF bits for signal handler setup
out of the several places copying the same code into the one place that
calls them all.  There is no change in what it does.

I also separated the recently-added DF bit clearing from the TF diddling.
The compiler turns them back into one instruction anyway.  The tossing
in of DF to the same line of code with no new comments was a bit more
arcane than seems wise.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Dmitri Vorobiev f7f17a67c5 x86: remove NexGen support
It is claimed that NexGen CPUs were never shipped:

   http://lkml.org/lkml/2008/4/20/179

Also, the kernel support for these chips has been broken for
a long time, the code intended to support NexGen thereby being
essentially dead.

As an outcome of the discussion that can be found using the URL
above, this patch removes the NexGen support altogether.

The changes in this patch survived a defconfig build for i386, a
couple of successful randconfig builds, as well as a runtime test,
which consisted in booting a 32-bit x86 box up to the shell prompt.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Dmitri Vorobiev a2b4bd9c95 x86: array can become static
In arch/x86/kernel/setup_64.c, the standard_io_resources array
is needlessly defined as global. This patch makes this variable
static.

This patch was successfully build-tested using the defconfig
for x86_64. Runtime test was performed by booting a 64-bit x86
box up to the shell prompt.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Dmitri Vorobiev f3b14a32db x86: remove unused function amd_init_cpu()
There are no users for the function amd_init_cpu() defined in
arch/x86/kernel/cpu/amd.c. This patch removes this routine.

This patch was build-tested using defconfigs for i386 and x86_64,
and a few randconfig instances. Runtime tests were performed by
booting 32- and 64-bit x86 boxen up to the shell prompt.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Jan Beulich 911f6a7ba2 x86-64: extend MCE CPU quirk handling
At least on my Barcelona, I see MCE log entries after cold boot caused
by BIOS not properly clearing the respective registers. Therefore, this
patch extends the workaround to families 0x10 and 0x11 (the latter just
for completeness, I have nothing to verify this against).
At the same time, provide a way to make these entries visible via the
'mce=bootlog' command line option even on these machines.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Jan Beulich 79bf0e0353 i386: fix signal type for iret exception
.. since it uses ILL_BADSTK (which is meaningless in the context of
SIGSEGV).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Jan Beulich 86d78f6402 x86: fix watchdog ops for CoreDuo
There apparently was an unnoticed conflict between an earlier patch to
this file and mine (d1e084746b), which
I noticed only now. I suppose a change like the one below (untested) is
needed; I didn't get any response on a confirmation request for this from
the submitter of the first patch.

The issue is the writing of the 'checkbit' member at the end of
setup_intel_arch_watchdog(), which my patch made go to intel_arch_wd_ops
rather than wd_ops.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Jan Beulich 5065dbafc2 i386: fix asm constraint in do_IRQ()
Two prior changes resulted in the "ecx" clobber being lost.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Ingo Molnar 8db979bcfe x86 PAT: decouple from nonpromisc devmem
Linus pointed it out that PAT should not depend on NONPROMISC_DEVMEM.

Also make PAT non-default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 16:01:26 +02:00
Ingo Molnar 1ebcc654f0 x86 PAT: tone down debugging messages
Linus reported these excessive debug printouts:

>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0380000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000

turn that into a pr_debug().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 16:01:26 +02:00
Linus Torvalds bf16ae2509 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat:
  generic: add ioremap_wc() interface wrapper
  /dev/mem: make promisc the default
  pat: cleanups
  x86: PAT use reserve free memtype in mmap of /dev/mem
  x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
  x86: PAT avoid aliasing in /dev/mem read/write
  devmem: add range_is_allowed() check to mmap of /dev/mem
  x86: introduce /dev/mem restrictions with a config option
2008-04-25 12:48:08 -07:00
Linus Torvalds 4b7227ca32 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next: (52 commits)
  xen: add balloon driver
  xen: allow compilation with non-flat memory
  xen: fold xen_sysexit into xen_iret
  xen: allow set_pte_at on init_mm to be lockless
  xen: disable preemption during tlb flush
  xen pvfb: Para-virtual framebuffer, keyboard and pointer driver
  xen: Add compatibility aliases for frontend drivers
  xen: Module autoprobing support for frontend drivers
  xen blkfront: Delay wait for block devices until after the disk is added
  xen/blkfront: use bdget_disk
  xen: Make xen-blkfront write its protocol ABI to xenstore
  xen: import arch generic part of xencomm
  xen: make grant table arch portable
  xen: replace callers of alloc_vm_area()/free_vm_area() with xen_ prefixed one
  xen: make include/xen/page.h portable moving those definitions under asm dir
  xen: add resend_irq_on_evtchn() definition into events.c
  Xen: make events.c portable for ia64/xen support
  xen: move events.c to drivers/xen for IA64/Xen support
  xen: move features.c from arch/x86/xen/features.c to drivers/xen
  xen: add missing definitions in include/xen/interface/vcpu.h which ia64/xen needs
  ...
2008-04-25 12:32:10 -07:00
Matthew Wilcox 2cfed60cc2 Update .gitignore files
Add some autogenerated files to various .gitignore files

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-25 12:27:32 -07:00
Ingo Molnar 00c6b2d5d7 x86: harden kernel code patching
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Mathieu Desnoyers b7b66baa8b x86: clean up text_poke()
Clean up the codepath, remove alignment restrictions and do sanity
checking of the end result, to make sure we patched the right site.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Jiri Slaby 8b132ecbcf x86: fix text_poke()
kernel_text_address returns true even for modules which is not wanted
in text_poke. Use core_kernel_text instead.

This is a regression introduced in e587cadd8f
which caused occasionaly crashes after suspend/resume.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Ingo Molnar 70c9f590ff x86: remove set_fixmap() warning
set_fixmap()+clear_fixmap() is safe.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Ingo Molnar 82a355f5a2 x86: make __set_fixmap() non-init
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Jeremy Fitzhardinge af7ae3b9c4 xen: allow compilation with non-flat memory
There's no real reason we can't support sparsemem/discontigmem, so do so.
This is mostly useful to support hotplug memory.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge b77797fb2b xen: fold xen_sysexit into xen_iret
xen_sysexit and xen_iret were doing essentially the same thing.  Rather
than having a separate implementation for xen_sysexit, we can just strip
the stack back to an iret frame and jump into xen_iret.  This removes
a lot of code and complexity - specifically, another critical region.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge 2bd50036b5 xen: allow set_pte_at on init_mm to be lockless
The usual pagetable locking protocol doesn't seem to apply to updates
to init_mm, so don't rely on preemption being disabled in xen_set_pte_at
on init_mm.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge 41e332b2a2 xen: disable preemption during tlb flush
Various places in the kernel flush the tlb even though preemption doens't
guarantee the tlb flush is happening on any particular CPU.  In many cases
this doesn't seem to matter, so don't make a fuss about it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Isaku Yamahata 8d3d2106c1 xen: make grant table arch portable
split out x86 specific part from grant-table.c and
allow ia64/xen specific initialization.
ia64/xen grant table is based on pseudo physical address
(guest physical address) unlike x86/xen. On ia64 init_mm
doesn't map identity straight mapped area.
ia64/xen specific grant table initialization is necessary.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Isaku Yamahata e04d0d0767 xen: move events.c to drivers/xen for IA64/Xen support
move arch/x86/xen/events.c undedr drivers/xen to share codes
with x86 and ia64. And minor adjustment to compile.
ia64/xen also uses events.c

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Isaku Yamahata af711cda4f xen: move features.c from arch/x86/xen/features.c to drivers/xen
ia64/xen also uses it too. Move it into common place so that
ia64/xen can share the code.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge 0f2c876952 xen: jump to iret fixup
Use jmp rather than call for the iret fixup, so its consistent with
the sysexit fixup, and it simplifies the stack (which is already
complex).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge dbe9e994c9 xen: no need for domU to worry about MCE/MCA
Mask MCE/MCA out of cpu caps.  Its harmless to leave them there, but
it does prevent the kernel from starting an unnecessary thread.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge 229664bee6 xen: short-cut for recursive event handling
If an event comes in while events are currently being processed, then
just increment the counter and have the outer event loop reprocess the
pending events.  This prevents unbounded recursion on heavy event
loads (of course massive event storms will cause infinite loops).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge ee8fa1c67f xen: make sure retriggered events are set pending
retrigger_dynirq() was incomplete, and didn't properly set the event
to be pending again.  It doesn't seem to actually get used.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge ee523ca1e4 xen: implement a debug-interrupt handler
Xen supports the notion of a debug interrupt which can be triggered
from the console.  For now this is implemented to show pending events,
masks and each CPU's pending event set.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge e2a81baf66 xen: support sysenter/sysexit if hypervisor does
64-bit Xen supports sysenter for 32-bit guests, so support its
use.  (sysenter is faster than int $0x80 in 32-on-64.)

sysexit is still not supported, so we fake it up using iret.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 85958b465c x86: unify pgd ctor/dtor
All pagetables need fundamentally the same setup and destruction, so
just use the same code for everything.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 68db065c84 x86: unify KERNEL_PGD_PTRS
Make KERNEL_PGD_PTRS common, as previously it was only being defined
for 32-bit.

There are a couple of follow-on changes from this:
 - KERNEL_PGD_PTRS was being defined in terms of USER_PGD_PTRS.  The
   definition of USER_PGD_PTRS doesn't really make much sense on x86-64,
   since it can have two different user address-space configurations.
   I renamed USER_PGD_PTRS to KERNEL_PGD_BOUNDARY, which is meaningful
   for all of 32/32, 32/64 and 64/64 process configurations.

 - USER_PTRS_PER_PGD was also defined and was being used for similar
   purposes.  Converting its users to KERNEL_PGD_BOUNDARY left it
   completely unused, and so I removed it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 90e9f53662 xen: make sure iret faults are trapped
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 947a69c90c xen: unify pte operations
We can fold the essentially common pte functions together now.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 430442e38e xen: make use of pte_t union
pte_t always contains a "pte" field for the whole pte value, so make
use of it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge abf33038ff xen: use appropriate pte types
Convert Xen pagetable handling to use appropriate *val_t types.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge c20311e165 x86/pgtable.h: demacro ptep_clear_flush_young
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge f9fbf1a36a x86/pgtable.h: demacro ptep_test_and_clear_young
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge ee5aa8d3ba x86/pgtable.h: demacro ptep_set_access_flags
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 2761fa0920 x86: add pud_alloc for 4-level pagetables
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 6944a9c894 x86: rename paravirt_alloc_pt etc after the pagetable structure
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.

[ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 394158559d x86: move all the pgd_list handling to one place
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 5a5f8f4224 x86: move pgalloc pud and pgd operations into common place
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge 170fdff705 x86: move pmd functions into common asm/pgalloc.h
Common definitions for 3-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge 397f687ab7 x86: move pte functions into common asm/pgalloc.h
Common definitions for 2-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge 1d262d3a49 x86: put paravirt stubs into common asm/pgalloc.h
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Ingo Molnar 1ec1fe73df x86: xen unify x86 add common mm pgtable c fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge 4f76cd3822 x86: add common mm/pgtable.c
Add a common arch/x86/mm/pgtable.c file for common pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge 79bf6d66ab x86: convert pgalloc_64.h from macros to inlines
Convert asm-x86/pgalloc_64.h from macros into functions (#include hell
prevents __*_free_tlb from being inline, but they're probably a bit
big to inline anyway).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Ingo Molnar 1f56cf1c58 /dev/mem: make promisc the default
default to the old semantics.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
Ingo Molnar 28eb559b5b pat: cleanups
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com e7f260a276 x86: PAT use reserve free memtype in mmap of /dev/mem
Use reserve_memtype and free_memtype wrappers for /dev/mem mmaps. The memtype
is slightly complicated here, given that we have to support existing X mappings.
We fallback on UC_MINUS for that.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com f0970c13b6 x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
Introduce phys_mem_access_prot_allowed(), which checks whether the mapping
is possible, without any conflicts and returns success or failure based on that.
phys_mem_access_prot() by itself does not allow failure case. This ability
to return error is needed for PAT where we may have aliasing conflicts.

x86 setup __HAVE_PHYS_MEM_ACCESS_PROT and move x86 specific code out of
/dev/mem into arch specific area.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com e045fb2a98 x86: PAT avoid aliasing in /dev/mem read/write
Add xlate and unxlate around /dev/mem read/write. This sets up the mapping
that can be used for /dev/mem read and write without aliasing worries.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
Arjan van de Ven ae531c26c5 x86: introduce /dev/mem restrictions with a config option
This patch introduces a restriction on /dev/mem: Only non-memory can be
read or written unless the newly introduced config option is set.

The X server needs access to /dev/mem for the PCI space, but it doesn't need
access to memory; both the file permissions and SELinux permissions of /dev/mem
just make X effectively super-super powerful. With the exception of the
BIOS area, there's just no valid app that uses /dev/mem on actual memory.
Other popular users of /dev/mem are rootkits and the like.
(note: mmap access of memory via /dev/mem was already not allowed since
a really long time)

People who want to use /dev/mem for kernel debugging can enable the config
option.

The restrictions of this patch have been in the Fedora and RHEL kernels for
at least 4 years without any problems.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:40:47 +02:00
Ingo Molnar a4928cffe6 "make namespacecheck" fixes
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:15:44 +02:00
Alexey Starikovskiy f8dc5a186c x86: fix compilation error in VisWS
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:15:44 +02:00
Ingo Molnar fcbc04c0ab x86: voyager fix
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:15:43 +02:00
Alexey Starikovskiy 4d33bdb768 x86: Drop duplicate from setup.c
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:15:43 +02:00
Linus Torvalds bda0c0afa7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6: (42 commits)
  PCI: Change PCI subsystem MAINTAINER
  PCI: pci-iommu-iotlb-flushing-speedup
  PCI: pci_setup_bridge() mustn't be __devinit
  PCI: pci_bus_size_cardbus() mustn't be __devinit
  PCI: pci_scan_device() mustn't be __devinit
  PCI: pci_alloc_child_bus() mustn't be __devinit
  PCI: replace remaining __FUNCTION__ occurrences
  PCI: Hotplug: fakephp: Return success, not ENODEV, when bus rescan is triggered
  PCI: Hotplug: Fix leaks in IBM Hot Plug Controller Driver - ibmphp_init_devno()
  PCI: clean up resource alignment management
  PCI: aerdrv_acpi.c: remove unneeded NULL check
  PCI: Update VIA CX700 quirk
  PCI: Expose PCI VPD through sysfs
  PCI: iommu: iotlb flushing
  PCI: simplify quirk debug output
  PCI: iova RB tree setup tweak
  PCI: parisc: use generic pci_enable_resources()
  PCI: ppc: use generic pci_enable_resources()
  PCI: powerpc: use generic pci_enable_resources()
  PCI: ia64: use generic pci_enable_resources()
  ...
2008-04-21 15:58:35 -07:00
Linus Torvalds 904e0ab54b Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  [HWRNG] omap: Minor updates
  [CRYPTO] kconfig: Ordering cleanup
  [CRYPTO] all: Clean up init()/fini()
  [CRYPTO] padlock-aes: Use generic setkey function
  [CRYPTO] aes: Export generic setkey
  [CRYPTO] api: Make the crypto subsystem fully modular
  [CRYPTO] cts: Add CTS mode required for Kerberos AES support
  [CRYPTO] lrw: Replace all adds to big endians variables with be*_add_cpu
  [CRYPTO] tcrypt: Change the XTEA test vectors
  [CRYPTO] tcrypt: Shrink the tcrypt module
  [CRYPTO] tcrypt: Change the usage of the test vectors
  [CRYPTO] api: Constify function pointer tables
  [CRYPTO] aes-x86-32: Remove unused return code
  [CRYPTO] tcrypt: Shrink speed templates
  [CRYPTO] tcrypt: Group common speed templates
  [CRYPTO] sha512: Rename sha512 to sha512_generic
  [CRYPTO] sha384: Hardware acceleration for s390
  [CRYPTO] sha512: Hardware acceleration for s390
  [CRYPTO] s390: Generic sha_update and sha_final
  [CRYPTO] api: Switch to proc_create()
2008-04-21 15:57:09 -07:00
Linus Torvalds e80ab411e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
  SCSI: convert struct class_device to struct device
  DRM: remove unused dev_class
  IB: rename "dev" to "srp_dev" in srp_host structure
  IB: convert struct class_device to struct device
  memstick: convert struct class_device to struct device
  driver core: replace remaining __FUNCTION__ occurrences
  sysfs: refill attribute buffer when reading from offset 0
  PM: Remove destroy_suspended_device()
  Firmware: add iSCSI iBFT Support
  PM: Remove legacy PM (fix)
  Kobject: Replace list_for_each() with list_for_each_entry().
  SYSFS: Explicitly include required header file slab.h.
  Driver core: make device_is_registered() work for class devices
  PM: Convert wakeup flag accessors to inline functions
  PM: Make wakeup flags available whenever CONFIG_PM is set
  PM: Fix misuse of wakeup flag accessors in serial core
  Driver core: Call device_pm_add() after bus_add_device() in device_add()
  PM: Handle device registrations during suspend/resume
  block: send disk "change" event for rescan_partitions()
  sysdev: detect multiple driver registrations
  ...

Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
2008-04-21 15:49:58 -07:00
Linus Torvalds 429f731dea Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
  Deprecate the asm/semaphore.h files in feature-removal-schedule.
  Convert asm/semaphore.h users to linux/semaphore.h
  security: Remove unnecessary inclusions of asm/semaphore.h
  lib: Remove unnecessary inclusions of asm/semaphore.h
  kernel: Remove unnecessary inclusions of asm/semaphore.h
  include: Remove unnecessary inclusions of asm/semaphore.h
  fs: Remove unnecessary inclusions of asm/semaphore.h
  drivers: Remove unnecessary inclusions of asm/semaphore.h
  net: Remove unnecessary inclusions of asm/semaphore.h
  arch: Remove unnecessary inclusions of asm/semaphore.h
2008-04-21 15:41:27 -07:00
Linus Torvalds ec965350bb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
  sched: build fix
  sched: better rt-group documentation
  sched: features fix
  sched: /debug/sched_features
  sched: add SCHED_FEAT_DEADLINE
  sched: debug: show a weight tree
  sched: fair: weight calculations
  sched: fair-group: de-couple load-balancing from the rb-trees
  sched: fair-group scheduling vs latency
  sched: rt-group: optimize dequeue_rt_stack
  sched: debug: add some debug code to handle the full hierarchy
  sched: fair-group: SMP-nice for group scheduling
  sched, cpuset: customize sched domains, core
  sched, cpuset: customize sched domains, docs
  sched: prepatory code movement
  sched: rt: multi level group constraints
  sched: task_group hierarchy
  sched: fix the task_group hierarchy for UID grouping
  sched: allow the group scheduler to have multiple levels
  sched: mix tasks and groups
  ...
2008-04-21 15:40:24 -07:00
Bjorn Helgaas b81d988c04 PCI: x86: use generic pci_enable_resources()
Use the generic pci_enable_resources() instead of the arch-specific code.

Unlike this arch-specific code, the generic version:
    - checks for resource collisions with "!r->parent"

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:47:04 -07:00
Bjorn Helgaas 657472e9cc PCI: remove "pci=routeirq" noise from dmesg
The "pci=routeirq" option was added in 2004, and I don't get any valid
reports anymore.  The option is still mentioned in kernel-parameters.txt.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:47:03 -07:00
Gary Hade cb3576fa34 PCI: Include PCI domain in PCI bus names on x86/x86_64
The PCI bus names included in /proc/iomem and /proc/ioports are
of the form 'PCI Bus #XX' where XX is the bus number.  This patch
changes the naming to 'PCI Bus XXXX:YY' where XXXX is the domain
number and YY is the bus number.  For example, PCI bus 14 in
domain 0 will show as 'PCI Bus 0000:14' instead of 'PCI Bus #14'.
This change makes the naming consistent with other architectures
such as ia64 where multiple PCI domain support has been around
longer.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:47:03 -07:00
Greg Kroah-Hartman 6355f3d1c6 PCI: remove pcibios_fixup_ghosts()
This function was obviously never being used since early 2.5 days as any
device that it would try to remove would never really be removed from
the system due to the PCI device list being held in the driver core, not
the general list of PCI devices.

As we have not had a single report of a problem here in 4 years, I think
it's safe to remove now.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:47:00 -07:00
Greg Kroah-Hartman 1ba6ab11d8 PCI: remove initial bios sort of PCI devices on x86
We currently keep 2 lists of PCI devices in the system, one in the
driver core, and one all on its own.  This second list is sorted at boot
time, in "BIOS" order, to try to remain compatible with older kernels
(2.2 and earlier days).  There was also a "nosort" option to turn this
sorting off, to remain compatible with even older kernel versions, but
that just ends up being what we have been doing from 2.5 days...

Unfortunately, the second list of devices is not really ever used to 
determine the probing order of PCI devices or drivers[1].  That is done
using the driver core list instead.  This change happened back in the
early 2.5 days.

Relying on BIOS ording for the binding of drivers to specific device
names is problematic for many reasons, and userspace tools like udev
exist to properly name devices in a persistant manner if that is needed,
no reliance on the BIOS is needed.

Matt Domsch and others at Dell noticed this back in 2006, and added a
boot option to sort the PCI device lists (both of them) in a
breadth-first manner to help remain compatible with the 2.4 order, if
needed for any reason.  This option is not going away, as some systems
rely on them.

This patch removes the sorting of the internal PCI device list in "BIOS"
mode, as it's not needed at all anymore, and hasn't for many years.
I've also removed the PCI flags for this from some other arches that for
some reason defined them, but never used them.

This should not change the ordering of any drivers or device probing.

[1] The old-style pci_get_device and pci_find_device() still used this
sorting order, but there are very few drivers that use these functions,
as they are deprecated for use in this manner.  If for some reason, a
driver rely on the order and uses these functions, the breadth-first
boot option will resolve any problem.

Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:46:58 -07:00
Greg Kroah-Hartman a2b5d87784 PCI: remove pci_get_device_reverse from calgary driver
This isn't needed, we can just walk the devices in bus order with no
problems at all, as we really want to remove pci_get_device_reverse from
the kernel tree.

Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:46:53 -07:00
Sebastian Siewior 744b5a2810 [CRYPTO] aes-x86-32: Remove unused return code
The return parameter isn't used remove it.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-04-21 10:19:21 +08:00
Rafael J. Wysocki b844eba292 PM: Remove destroy_suspended_device()
After 2.6.24 there was a plan to make the PM core acquire all device
semaphores during a suspend/hibernation to protect itself from
concurrent operations involving device objects.  That proved to be
too heavy-handed and we found a better way to achieve the goal, but
before it happened, we had introduced the functions
device_pm_schedule_removal() and destroy_suspended_device() to allow
drivers to "safely" destroy a suspended device and we had adapted some
drivers to use them.  Now that these functions are no longer necessary,
it seems reasonable to remove them and modify their users to use the
normal device unregistration instead.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:28 -07:00
Konrad Rzeszutek 138fe4e069 Firmware: add iSCSI iBFT Support
Add /sysfs/firmware/ibft/[initiator|targetX|ethernetX] directories along with
text properties which export the the iSCSI Boot Firmware Table (iBFT)
structure.

What is iSCSI Boot Firmware Table?  It is a mechanism for the iSCSI tools to
extract from the machine NICs the iSCSI connection information so that they
can automagically mount the iSCSI share/target.  Currently the iSCSI
information is hard-coded in the initrd.  The /sysfs entries are read-only
one-name-and-value fields.

The usual set of data exposed is:

# for a in `find /sys/firmware/ibft/ -type f -print`; do  echo -n "$a: ";  cat $a; done
/sys/firmware/ibft/target0/target-name: iqn.2007.com.intel-sbx44:storage-10gb
/sys/firmware/ibft/target0/nic-assoc: 0
/sys/firmware/ibft/target0/chap-type: 0
/sys/firmware/ibft/target0/lun: 00000000
/sys/firmware/ibft/target0/port: 3260
/sys/firmware/ibft/target0/ip-addr: 192.168.79.116
/sys/firmware/ibft/target0/flags: 3
/sys/firmware/ibft/target0/index: 0
/sys/firmware/ibft/ethernet0/mac: 00:11:25:9d:8b:01
/sys/firmware/ibft/ethernet0/vlan: 0
/sys/firmware/ibft/ethernet0/gateway: 192.168.79.254
/sys/firmware/ibft/ethernet0/origin: 0
/sys/firmware/ibft/ethernet0/subnet-mask: 255.255.252.0
/sys/firmware/ibft/ethernet0/ip-addr: 192.168.77.41
/sys/firmware/ibft/ethernet0/flags: 7
/sys/firmware/ibft/ethernet0/index: 0
/sys/firmware/ibft/initiator/initiator-name: iqn.2007-07.com:konrad.initiator
/sys/firmware/ibft/initiator/flags: 3
/sys/firmware/ibft/initiator/index: 0

For full details of the IBFT structure please take a look at:
ftp://ftp.software.ibm.com/systems/support/system_x_pdf/ibm_iscsi_boot_firmware_table_v1.02.pdf

[akpm@linux-foundation.org: fix build]
Signed-off-by: Konrad Rzeszutek <konradr@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:28 -07:00
Mike Travis fb0f330e62 x86: modify show_shared_cpu_map in intel_cacheinfo
* Removed kmalloc (or local array) in show_shared_cpu_map().

  * Added show_shared_cpu_list() function.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:59 +02:00
Mike Travis 9f0e8d0400 x86: convert cpumask_of_cpu macro to allocated array
* Here is a simple patch to use an allocated array of cpumasks to
    represent cpumask_of_cpu() instead of constructing one on the stack.
    It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is
    currently only set for x86_64 SMP.  Otherwise the the existing
    cpumask_of_cpu() is used but has been changed to produce an lvalue
    so a pointer to it can be used.

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:59 +02:00
Mike Travis b53e921ba1 generic: reduce stack pressure in sched_affinity
* Modify sched_affinity functions to pass cpumask_t variables by reference
    instead of by value.

  * Use new set_cpus_allowed_ptr function.

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Cc: Paul Jackson <pj@sgi.com>
Cc: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:59 +02:00
Mike Travis fc0e474840 x86: use new set_cpus_allowed_ptr function
* Use new set_cpus_allowed_ptr() function added by previous patch,
    which instead of passing the "newly allowed cpus" cpumask_t arg
    by value,  pass it by pointer:

    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

  * Cleanup uses of CPU_MASK_ALL.

  * Collapse other NR_CPUS changes to arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
    Use pointers to cpumask_t arguments whenever possible.

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Cc: Len Brown <len.brown@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:58 +02:00
Mike Travis d366f8cbc1 cpumask: Cleanup more uses of CPU_MASK and NODE_MASK
*  Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
    NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
    and MAXNODES counts.

 *  In some cases, the cpumask variable was initialized but then overwritten
    with another value.  This is the case for changes like this:

    -       cpumask_t oldmask = CPU_MASK_ALL;
    +       cpumask_t oldmask;

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:58 +02:00
Mike Travis f46bdf2db2 numa: move large array from stack to _initdata section
* Move large array "struct bootnode nodes" from stack to _initdata
    section to reduce amount of stack space required.

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:58 +02:00
Mike Travis d18d00f5db x86: oprofile: remove NR_CPUS arrays in arch/x86/oprofile/nmi_int.c
Change the following arrays sized by NR_CPUS to be PERCPU variables:

	static struct op_msrs cpu_msrs[NR_CPUS];
	static unsigned long saved_lvtpc[NR_CPUS];

Also some minor complaints from checkpatch.pl fixed.

Based on:
	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
	git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git

All changes were transparent except for:

 static void nmi_shutdown(void)
 {
+	struct op_msrs *msrs = &__get_cpu_var(cpu_msrs);
 	nmi_enabled = 0;
 	on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1);
 	unregister_die_notifier(&profile_exceptions_nb);
-	model->shutdown(cpu_msrs);
+	model->shutdown(msrs);
 	free_msrs();
 }

The existing code passed a reference to cpu 0's instance of struct op_msrs
to model->shutdown, whilst the other functions are passed a reference to
<this cpu's> instance of a struct op_msrs.  This seemed to be a bug to me
even though as long as cpu 0 and <this cpu> are of the same type it would
have the same effect...?

Cc: Philippe Elie <phil.el@wanadoo.fr>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:58 +02:00
Mike Travis 6b6309b4c7 x86: reduce memory and stack usage in intel_cacheinfo
* Change the following static arrays sized by NR_CPUS to
  per_cpu data variables:

	_cpuid4_info *cpuid4_info[NR_CPUS];
	_index_kobject *index_kobject[NR_CPUS];
	kobject * cache_kobject[NR_CPUS];

* Remove the local NR_CPUS array with a kmalloc'd region in
  show_shared_cpu_map().

Also some minor complaints from checkpatch.pl fixed.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:58 +02:00
Jack Steiner 34d0559178 x86: UV startup of slave cpus
This patch changes smpboot.c so that it can start slave cpus running
in UV non-unique apicid mode. The SIPI must be sent using a UV-specific
mechanism.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:19:58 +02:00
Glauber Costa 098cb7f27e x86: integrate pci-dma.c
The code in pci-dma_{32,64}.c are now sufficiently
close to each other. We merge them in pci-dma.c.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:58 +02:00
Glauber Costa bb8ada95a7 x86: don't do dma if mask is NULL.
if the device hasn't provided a mask, abort allocation.
Note that we're using a fallback device now, so it does not cover
the case of a NULL device: just drivers passing NULL masks around.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:58 +02:00
Glauber Costa da60cab4dd x86: return conditional to mmu
Just return our allocation if we don't have an mmu. For i386, where this patch
is being applied, we never have. So our goal is just to have the code to look like
x86_64's.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:58 +02:00
Glauber Costa aa99b16faa x86: remove kludge from x86_64
The claim is that i386 does it. Just it does not.
So remove it.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:58 +02:00
Glauber Costa 8f19ca1341 x86: unify gfp masks
Use the same gfp masks for x86_64 and i386.
It involves using HIGHMEM or DMA32 where necessary, for the sake
of code compatibility, (no real effect), and using the NORETRY
mask for i386.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:58 +02:00
Glauber Costa 5fa78ca75d x86: retry allocation if failed
This patch puts in the code to retry allocation in case it fails. By its
own, it does not make much sense but making the code look like x86_64.
But later patches in this series will make we try to allocate from
zones other than DMA first, which will possibly fail.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19 19:19:58 +02:00