Commit graph

260 commits

Author SHA1 Message Date
Anil S Keshavamurthy 2d14e39da8 [PATCH] kprobes: enable funcions only for required arch
Kernel/kprobes.c defines get_insn_slot() and free_insn_slot() which are
currently required _only_ for x86_64 and powerpc (which has no-exec support).

FYI, get{free}_insn_slot() functions manages the memory page which is mapped
as executable, required for instruction emulation.

This patch moves those two functions under __ARCH_WANT_KPROBES_INSN_SLOT and
defines __ARCH_WANT_KPROBES_INSN_SLOT in arch specific kprobes.h file.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:39 -08:00
Brian Gerst af4cd3fe4c [PATCH] Generic ioctl.h
Most arches copied the i386 ioctl.h.  Combine them into a generic header.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:34 -08:00
Vivek Goyal ec9ce0dbaa [PATCH] kdump: x86_64 save cpu registers upon crash
- Saving the cpu registers of all cpus before booting in to the crash
  kernel.

- crash_setup_regs will save the registers of the cpu on which panic has
  occured.  One of the concerns ppc64 folks raised is that after capturing the
  register states, one should not pop the current call frame and push new one.
   Hence it has been inlined.  More call frames later get pushed on to stack
  (machine_crash_shutdown() and machine_kexec()), but one will not want to
  backtrace those.

- Not very sure about the CFI annotations.  With this patch I am getting
  decent backtrace with gdb.  Assuming, compiler has generated enough
  debugging information for crash_kexec().  Coding crash_setup_regs() in pure
  assembly makes it tricky because then it can not be inlined and we don't
  want to return back after capturing register states we don't want to pop
  this call frame.

- Saving the non-panicing cpus registers will be done in the NMI handler
  while shooting down them in machine_crash_shutdown.

- Introducing CRASH_DUMP option in Kconfig for x86_64.

Signed-off-by: Murali M Chakravarthy <muralim@in.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:28 -08:00
akpm@osdl.org 69cda7b1f0 [PATCH] kdump: x86_64: add memmmap command line option
)

From: Vivek Goyal <vgoyal@in.ibm.com>

- This patch introduces the memmap option for x86_64 similar to i386.

- memmap=exactmap enables setting of an exact E820 memory map, as specified
  by the user.

Changes in this version:

- Used e820_end_of_ram() to find the max_pfn as suggested by Andi kleen.

- removed PFN_UP & PFN_DOWN macros

- Printing the user defined map also.

Signed-off-by: Murali M Chakravarthy <muralim@in.ibm.com>
Signed-off-by: Hariprasad Nellitheertha <nharipra@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:27 -08:00
Vivek Goyal cc57165874 [PATCH] kdump: dynamic per cpu allocation of memory for saving cpu registers
- In case of system crash, current state of cpu registers is saved in memory
  in elf note format.  So far memory for storing elf notes was being allocated
  statically for NR_CPUS.

- This patch introduces dynamic allocation of memory for storing elf notes.
  It uses alloc_percpu() interface.  This should lead to better memory usage.

- Introduced based on Andi Kleen's and Eric W. Biederman's suggestions.

- This patch also moves memory allocation for elf notes from architecture
  dependent portion to architecture independent portion.  Now crash_notes is
  architecture independent.  The whole idea is that size of memory to be
  allocated per cpu (MAX_NOTE_BYTES) can be architecture dependent and
  allocation of this memory can be architecture independent.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:26 -08:00
Ingo Molnar b8aa0361e4 [PATCH] mutex subsystem, add include/asm-x86_64/mutex.h
add the x86_64 version of mutex.h, optimized in assembly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
2006-01-09 15:59:18 -08:00
Ingo Molnar ffbf670f5c [PATCH] mutex subsystem, add atomic_xchg() to all arches
add atomic_xchg() to all the architectures. Needed by the new mutex code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
2006-01-09 15:59:17 -08:00
Ravikiran G Thirumalai 1fd73c6b67 [PATCH] Kill L1_CACHE_SHIFT_MAX
Kill L1_CACHE_SHIFT from all arches.  Since L1_CACHE_SHIFT_MAX is not used
anymore with the introduction of INTERNODE_CACHE, kill L1_CACHE_SHIFT_MAX.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:39 -08:00
Christoph Lameter 39743889aa [PATCH] Swap Migration V5: sys_migrate_pages interface
sys_migrate_pages implementation using swap based page migration

This is the original API proposed by Ray Bryant in his posts during the first
half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.

The intent of sys_migrate is to migrate memory of a process.  A process may
have migrated to another node.  Memory was allocated optimally for the prior
context.  sys_migrate_pages allows to shift the memory to the new node.

sys_migrate_pages is also useful if the processes available memory nodes have
changed through cpuset operations to manually move the processes memory.  Paul
Jackson is working on an automated mechanism that will allow an automatic
migration if the cpuset of a process is changed.  However, a user may decide
to manually control the migration.

This implementation is put into the policy layer since it uses concepts and
functions that are also needed for mbind and friends.  The patch also provides
a do_migrate_pages function that may be useful for cpusets to automatically
move memory.  sys_migrate_pages does not modify policies in contrast to Ray's
implementation.

The current code here is based on the swap based page migration capability and
thus is not able to preserve the physical layout relative to it containing
nodeset (which may be a cpuset).  When direct page migration becomes available
then the implementation needs to be changed to do a isomorphic move of pages
between different nodesets.  The current implementation simply evicts all
pages in source nodeset that are not in the target nodeset.

Patch supports ia64, i386 and x86_64.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:12:42 -08:00
Shaohua Li 1fa744e6e9 [PATCH] cpu hotplug/x86_64: disable interrupt in play_dead
With physical CPU hotplug, the CPU is hot removed and it should not receive
any interrupts.  Disabling interrupt is much safer.  This basically is what we
do in ia64 & x86.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:39 -08:00
Brian Gerst 19d534842c [PATCH] mpspec: remove unneeded packed attribute
GCC 4.1 gives the following warning: include/asm/mpspec.h:79: warning:
`packed' attribute ignored for field of type `unsigned char'

The packed attribute isn't really necessary anyways so just remove it.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Acked-by: Dave Jones <davej@codemonkey.org.uk>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:39 -08:00
Arjan van de Ven 67df197b1a [PATCH] x86/x86_64: mark rodata section read-only: x86-64 support
x86-64 specific parts to make the .rodata section read only

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:36 -08:00
Arjan van de Ven c728252c7a [PATCH] x86/x86_64: mark rodata section read only: generic x86-64 bugfix
Bug fix required for the .rodata work on x86-64:

when change_page_attr() and friends need to break up a 2Mb page into 4Kb
pages, it always set the NX bit on the PMD, which causes the cpu to consider
the entire 2Mb region to be NX regardless of the actual PTE perms.  This is
fine in general, with one big exception: the 2Mb page that covers the last
part of the kernel .text!  The fix is to not invent a new permission for the
new PMD entry, but to just inherit the existing one minus the PSE bit.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:36 -08:00
Christoph Lameter d3cb487149 [PATCH] atomic_long_t & include/asm-generic/atomic.h V2
Several counters already have the need to use 64 atomic variables on 64 bit
platforms (see mm_counter_t in sched.h).  We have to do ugly ifdefs to fall
back to 32 bit atomic on 32 bit platforms.

The VM statistics patch that I am working on will also make more extensive
use of atomic64.

This patch introduces a new type atomic_long_t by providing definitions in
asm-generic/atomic.h that works similar to the c "long" type.  Its 32 bits
on 32 bit platforms and 64 bits on 64 bit platforms.

Also cleans up the determination of the mm_counter_t in sched.h.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:29 -08:00
Badari Pulavarty f6b3ec238d [PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store
Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
given range of pages & its associated backing store.  Current
implementation supports only shmfs/tmpfs and other filesystems return
-ENOSYS.

"Some app allocates large tmpfs files, then when some task quits and some
client disconnect, some memory can be released.  However the only way to
release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli

Databases want to use this feature to drop a section of their bufferpool
(shared memory segments) - without writing back to disk/swap space.

This feature is also useful for supporting hot-plug memory on UML.

Concerns raised by Andrew Morton:

- "We have no plan for holepunching!  If we _do_ have such a plan (or
  might in the future) then what would the API look like?  I think
  sys_holepunch(fd, start, len), so we should start out with that."

- Using madvise is very weird, because people will ask "why do I need to
  mmap my file before I can stick a hole in it?"

- None of the other madvise operations call into the filesystem in this
  manner.  A broad question is: is this capability an MM operation or a
  filesytem operation?  truncate, for example, is a filesystem operation
  which sometimes has MM side-effects.  madvise is an mm operation and with
  this patch, it gains FS side-effects, only they're really, really
  significant ones."

Comments:

- Andrea suggested the fs operation too but then it's more efficient to
  have it as a mm operation with fs side effects, because they don't
  immediatly know fd and physical offset of the range.  It's possible to
  fixup in userland and to use the fs operation but it's more expensive,
  the vmas are already in the kernel and we can use them.

Short term plan &  Future Direction:

- We seem to need this interface only for shmfs/tmpfs files in the short
  term.  We have to add hooks into the filesystem for correctness and
  completeness.  This is what this patch does.

- In the future, plan is to support both fs and mmap apis also.  This
  also involves (other) filesystem specific functions to be implemented.

- Current patch doesn't support VM_NONLINEAR - which can be addressed in
  the future.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:22 -08:00
Stephen Hemminger 90933fc8ba [FLS64]: x86_64 version
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:07 -08:00
Stephen Hemminger 3821af2fe1 [FLS64]: generic version
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:06 -08:00
Dag-Erling Smørgrav abe842eb98 [PATCH] Avoid namespace pollution in <asm/param.h>
In commit 3D59121003721a8fad11ee72e646fd9d3076b5679c, the x86 and x86-64
<asm/param.h> was changed to include <linux/config.h> for the
configurable timer frequency.

However, asm/param.h is sometimes used in userland (it is included
indirectly from <sys/param.h>), so your commit pollutes the userland
namespace with tons of CONFIG_FOO macros.  This greatly confuses
software packages (such as BusyBox) which use CONFIG_FOO macros
themselves to control the inclusion of optional features.

After a short exchange, Christoph approved this patch

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-02 08:38:38 -08:00
Ben Collins e5c34a57c8 [PATCH] Fix typo in x86_64 __build_write_lock_const assembly
Based on __build_read_lock_const, this looked like a bug.

[ Indeed. Maybe nobody uses this version? Worth fixing up anyway ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-24 12:30:22 -08:00
Ravikiran G Thirumalai c660439ba9 [PATCH] x86_64/ia64 : Fix compilation error for node_to_first_cpu
Fixes a compiler error in node_to_first_cpu, __ffs expects unsigned long as
a parameter; instead cpumask_t was being passed.  The macro
node_to_first_cpu was not yet used in x86_64 and ia64 arches, and so we never
hit this.  This patch replaces __ffs with first_cpu macro, similar to other
arches.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-24 12:30:22 -08:00
Hugh Dickins 7c72aaf296 [PATCH] mm: fill arch atomic64 gaps
alpha, sparc64, x86_64 are each missing some primitives from their atomic64
support: fill in the gaps I've noticed by extrapolating asm, follow the
groupings in each file.  But powerpc and parisc still lack atomic64.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23 16:08:39 -08:00
Jacob.Shin@amd.com e6c667592e [PATCH] Fix x86_64/msr.h interface to agree with i386/msr.h
Ever since we remove msr.c from x86_64 branch and started grabbing it from
i386, msr device (read functionality) has been broken for us.

This is due to the differences between asm-i386/msr.h and asm-x86_64/msr.h interfaces.

Here is a patch to our side to fix this.

Thankfully, as of current (2.6.15-rc1-git6) tree, arch/i386/kernel/msr.c is the only file that uses rdmsr_safe macro.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-20 11:52:59 -08:00
Linus Torvalds 4060994c3e Merge x86-64 update from Andi 2005-11-14 19:56:02 -08:00
Andi Kleen 8893166ff8 [PATCH] x86_64: Increase the maximum number of local APICs to the maximum
This is needed for large multinode IBM systems which have a sparse
APIC space in clustered mode, fully covering the available 8 bits.

The previous kernels would limit the local APIC number to 127,
which caused it to reject some of the CPUs at boot.

I increased the maximum and shrunk the apic_version array a bit
to make up for that (the version is only 8 bit, so don't need
an full int to store)

Cc:  Chris McDermott <lcm@us.ibm.com>

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:17 -08:00
Paolo 'Blaisorblade' Giarrusso efbbdce94f [PATCH] x86_64: Use common sys_time64
Keeping this function does not makes sense because it's a copied (and
buggy) copy of sys_time.  The only difference is that now.tv_sec (which is
a time_t, i.e.  a 64-bit long) is copied (and truncated) into a int
(32-bit).

The prototype is the same (they both take a long __user *), so let's drop
this and redirect it to sys_time (and make sure it exists by defining
__ARCH_WANT_SYS_TIME).

Only disadvantage is that the sys_stime definition is also compiled (may be
fixed if needed by adding a separate __ARCH_WANT_SYS_STIME macro, and
defining it for all arch's defining __ARCH_WANT_SYS_TIME except x86_64).

Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:17 -08:00
Paolo 'Blaisorblade' Giarrusso bf0f2e2383 [PATCH] x86_64: Set ____cacheline_maxaligned_in_smp alignment to 128 bytes
The current value was correct before the introduction of Intel EM64T support -
but now L1_CACHE_SHIFT_MAX can be less than L1_CACHE_SHIFT, which _is_ funny!

Between the few users of ____cacheline_maxaligned_in_smp, we also have (for
example) rcu_ctrlblk, and struct zone, with zone->{lru_,}lock.  I.e.  we have
a lot of excess cacheline bouncing on them.

No correctness issues, obviously.  So this could even be merged for 2.6.14
(I'm not a fan of this idea, though).

CC: Andi Kleen <ak@suse.de>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:17 -08:00
Andi Kleen 8e0d4f4e91 [PATCH] x86_64: Remove asm-x86_64/rwsem.h
Not needed since x86-64 always uses the spinlock based rwsems.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:17 -08:00
Siddha, Suresh B 94605eff57 [PATCH] x86-64/i386: Intel HT, Multi core detection fixes
Fields obtained through cpuid vector 0x1(ebx[16:23]) and
vector 0x4(eax[14:25], eax[26:31]) indicate the maximum values and might not
always be the same as what is available and what OS sees.  So make sure
"siblings" and "cpu cores" values in /proc/cpuinfo reflect the values as seen
by OS instead of what cpuid instruction says. This will also fix the buggy BIOS
cases (for example where cpuid on a single core cpu says there are "2" siblings,
even when HT is disabled in the BIOS.
http://bugzilla.kernel.org/show_bug.cgi?id=4359)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:16 -08:00
Andi Kleen e90f22edf4 [PATCH] x86_64: Fix NUMA node lookup debug code which had bitrotted
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:16 -08:00
Andi Kleen a88cde13ba [PATCH] x86_64: Formatting fixes for arch/x86_64/kernel/process.c
No functional changes.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:16 -08:00
Andi Kleen ea0be473a1 [PATCH] x86_64: Allow modular build of ia32 aout loader
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:16 -08:00
Andi Kleen 420f8f68c9 [PATCH] x86_64: New heuristics to find out hotpluggable CPUs.
With a NR_CPUS==128 kernel with CPU hotplug enabled we would waste 4MB
on per CPU data of all possible CPUs.  The reason was that HOTPLUG
always set up possible map to NR_CPUS cpus and then we need to allocate
that much (each per CPU data is roughly ~32k now)

The underlying problem is that ACPI didn't tell us how many hotplug CPUs
the platform supports.  So the old code just assumed all, which would
lead to this memory wastage.

This implements some new heuristics:

 - If the BIOS specified disabled CPUs in the ACPI/mptables assume they
   can be enabled later (this is bending the ACPI specification a bit,
   but seems like a obvious extension)
 - The user can overwrite it with a new additionals_cpus=NUM option
 - Otherwise use half of the available CPUs or 2, whatever is more.

Cc: ashok.raj@intel.com
Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:15 -08:00
Andi Kleen 485832a5d9 [PATCH] x86_64: Use int operations in spinlocks to support more than 128 CPUs spinning.
Pointed out by Eric Dumazet

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:15 -08:00
Andi Kleen 6b75aeedde [PATCH] x86_64: Don't apply __PHYSICAL_MASK to page frame numbers
It is for physical addresses, not for PFNs.

Pointed out by Tejun Heo.

Cc: htejun@gmail.com

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:14 -08:00
Siddha, Suresh B f6c2e3330d [PATCH] x86_64: Unmap NULL during early bootup
We should zap the low mappings, as soon as possible, so that we can catch
kernel bugs more effectively. Previously early boot had NULL mapped
and didn't trap on NULL references.

This patch introduces boot_level4_pgt, which will always have low identity
addresses mapped.  Druing boot, all the processors will use this as their
level4 pgt.  On BP, we will switch to init_level4_pgt as soon as we enter C
code and zap the low mappings as soon as we are done with the usage of
identity low mapped addresses.  On AP's we will zap the low mappings as
soon as we jump to C code.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:14 -08:00
Andi Kleen 69d81fcde7 [PATCH] x86_64: Speed up numa_node_id by putting it directly into the PDA
Not go from the CPU number to an mapping array.
Mode number is often used now in fast paths.

This also adds a generic numa_node_id to all the topology includes

Suggested by Eric Dumazet

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:14 -08:00
Andi Kleen 1dff7f3db5 [PATCH] x86_64: Fix up outdated pfn_to_page comment
pfn_to_page really requires pfn_valid to be true now, no question.
Some people stumbled over it, but it was misleading and wrong.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:13 -08:00
James Cleverdon 6004e1b7ef [PATCH] i386/x86-64: Share interrupt vectors when there is a large number of interrupt sources
Here's a patch that builds on Natalie Protasevich's IRQ compression
patch and tries to work for MPS boots as well as ACPI.  It is meant for
a 4-node IBM x460 NUMA box, which was dying because it had interrupt
pins with GSI numbers > NR_IRQS and thus overflowed irq_desc.

The problem is that this system has 270 GSIs (which are 1:1 mapped with
I/O APIC RTEs) and an 8-node box would have 540.  This is much bigger
than NR_IRQS (224 for both i386 and x86_64).  Also, there aren't enough
vectors to go around.  There are about 190 usable vectors, not counting
the reserved ones and the unused vectors at 0x20 to 0x2F.  So, my patch
attempts to compress the GSI range and share vectors by sharing IRQs.

Cc: "Protasevich, Natalie" <Natalie.Protasevich@unisys.com>

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:13 -08:00
Jacob Shin 89b831ef8b [PATCH] x86_64: Support for AMD specific MCE Threshold.
MC4_MISC - DRAM Errors Threshold Register realized under AMD K8 Rev F.
This register is used to count correctable and uncorrectable ECC errors that occur during DRAM read operations.
The user may interface through sysfs files in order to change the threshold configuration.

bank%d/error_count - reads current error count, write to clear.
bank%d/interrupt_enable - set/clear interrupt enable.
bank%d/threshold_limit - read/write the threshold limit.

APIC vector 0xF9 in hw_irq.h.
5 software defined bank ids in mce.h.
new apic.c function to setup threshold apic lvt.
defaults to interrupt off, count enabled, and threshold limit max.
sysfs interface created on /sys/devices/system/threshold.

AK: added some ifdefs to make it compile on UP

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:13 -08:00
Jan Beulich 979edfadba [PATCH] x86_64: Adjust, correct, and complete the HPET definitions for x86-64.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:13 -08:00
Andi Kleen a2f1b42490 [PATCH] x86_64: Add 4GB DMA32 zone
Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.

As a bit of historical background: when the x86-64 port
was originally designed we had some discussion if we should
use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or
both. Both was ruled out at this point because it was in early
2.4 when VM is still quite shakey and had bad troubles even
dealing with one DMA zone.  We settled on the 16MB DMA zone mainly
because we worried about older soundcards and the floppy.

But this has always caused problems since then because
device drivers had trouble getting enough DMA able memory. These days
the VM works much better and the wide use of NUMA has proven
it can deal with many zones successfully.

So this patch adds both zones.

This helps drivers who need a lot of memory below 4GB because
their hardware is not accessing more (graphic drivers - proprietary
and free ones, video frame buffer drivers, sound drivers etc.).
Previously they could only use IOMMU+16MB GFP_DMA, which
was not enough memory.

Another common problem is that hardware who has full memory
addressing for >4GB misses it for some control structures in memory
(like transmit rings or other metadata).  They tended to allocate memory
in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent,
but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory
(even on AMD systems the IOMMU tends to be quite small) especially if you have
many devices.  With the new zone pci_alloc_consistent can just put
this stuff into memory below 4GB which works better.

One argument was still if the zone should be 4GB or 2GB. The main
motivation for 2GB would be an unnamed not so unpopular hardware
raid controller (mostly found in older machines from a particular four letter
company) who has a strange 2GB restriction in firmware. But
that one works ok with swiotlb/IOMMU anyways, so it doesn't really
need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
it seems to be the most common restriction.

The new zone is so far added only for x86-64.

For other architectures who don't set up this
new zone nothing changes. Architectures can set a compatibility
define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32
as GFP_DMA. Otherwise it's a nop because on 32bit architectures
it's normally not needed because GFP_NORMAL (=0) is DMA able
enough.

One problem is still that GFP_DMA means different things on different
architectures. e.g. some drivers used to have #ifdef ia64  use GFP_DMA
(trusting it to be 4GB) #elif __x86_64__ (use other hacks like
the swiotlb because 16MB is not enough) ... . This was quite
ugly and is now obsolete.

These should be now converted to use GFP_DMA32 unconditionally. I haven't done
this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent
which will use GFP_DMA32 transparently.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:13 -08:00
Nick Piggin 8426e1f6af [PATCH] atomic: inc_not_zero
Introduce an atomic_inc_not_zero operation.  Make this a special case of
atomic_add_unless because lockless pagecache actually wants
atomic_inc_not_negativeone due to its offset refcount.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 18:14:16 -08:00
Nick Piggin 4a6dae6d38 [PATCH] atomic: cmpxchg
Introduce an atomic_cmpxchg operation.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 18:14:16 -08:00
Siddha, Suresh B 47936357c0 [PATCH] x86_64: fix tss limit
Fix the x86_64 TSS limit in TSS descriptor.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 18:14:09 -08:00
Ashok Raj b4033c1715 [PATCH] PCI: Change MSI to use physical delivery mode always
MSI hardcoded delivery mode to use logical delivery mode. Recently
x86_64 moved to use physical mode addressing to support physflat mode.
With this mode enabled noticed that my eth with MSI werent working.

msi_address_init()  was hardcoded to use logical mode for i386 and x86_64.
So when we switch to use physical mode, things stopped working.

Since anyway we dont use lowest priority delivery with MSI, its always
directed to just a single CPU. Its safe  and simpler to use
physical mode always, even when we use logical delivery mode for IPI's
or other ioapic RTE's.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-11-10 16:09:18 -08:00
Ananth N Mavinakayanahalli e7a510f92c [PATCH] Kprobes: Track kprobe on a per_cpu basis - x86_64 changes
x86_64 changes to track kprobe execution on a per-cpu basis.  We now track the
kprobe state machine independently on each cpu using a arch specific kprobe
control block.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:46 -08:00
Tim Schmielau 8c65b4a604 [PATCH] fix remaining missing includes
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch.  This should now allow not to include sched.h
from module.h, which is done by a followup patch.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:41 -08:00
Tony Luck c7fb577e2a manual update from upstream:
Applied Al's change 06a544971f
to new location of swiotlb.c

Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-10-31 10:51:57 -08:00
Arthur Othieno 727a53bd53 [PATCH] semaphore: Remove __MUTEX_INITIALIZER()
__MUTEX_INITIALIZER() has no users, and equates to the more commonly used
DECLARE_MUTEX(), thus making it pretty much redundant.  Remove it for good.

Signed-off-by: Arthur Othieno <a.othieno@bluewin.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:27 -08:00
Tejun Heo 1426d7a81d [PATCH] vm: remove unused/broken page_pte[_prot] macros
This patch removes page_pte_prot and page_pte macros from all
architectures.  Some architectures define both, some only page_pte (broken)
and others none.  These macros are not used anywhere.

page_pte_prot(page, prot) is identical to mk_pte(page, prot) and
page_pte(page) is identical to page_pte_prot(page, __pgprot(0)).

* The following architectures define both page_pte_prot and page_pte

  arm, arm26, ia64, sh64, sparc, sparc64

* The following architectures define only page_pte (broken)

  frv, i386, m32r, mips, sh, x86-64

* All other architectures define neither

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:22 -08:00
Christoph Hellwig dfb7dac3af [PATCH] unify sys_ptrace prototype
Make sure we always return, as all syscalls should.  Also move the common
prototype to <linux/syscalls.h>

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:20 -08:00
Brian Gerst c531178157 [PATCH] Clean up mtrr compat ioctl code
Handle 32-bit mtrr ioctls in the mtrr driver instead of the ia32
compatability layer.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:13 -08:00
Rik Van Riel eb92f4ef32 [PATCH] add sem_is_read/write_locked()
Add sem_is_read/write_locked functions to the read/write semaphores, along the
same lines of the *_is_locked spinlock functions.  The swap token tuning patch
uses sem_is_read_locked; sem_is_write_locked is added for completeness.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:35 -07:00
Al Viro f80aabb03a [PATCH] gfp_t: dma-mapping (amd64)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:48 -07:00
Al Viro 06a544971f [PATCH] gfp_t: dma-mapping (ia64)
... and related annotations for amd64 - swiotlb code is shared, but
prototypes are not.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:47 -07:00
Linus Torvalds 79b95a454b Revert "x86-64: Avoid unnecessary double bouncing for swiotlb"
Commit id 6142891a0c

Andi Kleen reports that it seems to break things for some people,
and since it's purely a small optimization, revert it for now.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-27 16:28:39 -07:00
Tony Luck 9cec58dc13 Update from upstream with manual merge of Yasunori Goto's
changes to swiotlb.c made in commit 281dd25cdc
since this file has been moved from arch/ia64/lib/swiotlb.c to
lib/swiotlb.c

Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-10-20 10:41:44 -07:00
Andi Kleen 421c7ce6d0 [PATCH] x86_64: Allocate cpu local data for all possible CPUs
CPU hotplug fills up the possible map to NR_CPUs, but it did that after
setting up per CPU data. This lead to CPU data not getting allocated
for all possible CPUs, which lead to various side effects.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10 16:33:25 -07:00
Kirill Korotaev 1294b118cb [PATCH] x86_64: Add missing () around arguments of pte_index macro
x86-64: Add missing () around arguments of pte_index macro

Signed-Off-By: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 08:54:38 -07:00
Andi Kleen 7d318d7747 [PATCH] Fix up TLB flush filter disabling
I checked with AMD and they requested to only disable it for family 15.
Also disable it for i386 too. And some style fixes.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-29 15:41:42 -07:00
John W. Linville 8d15d19e44 [PATCH] x86_64: implement dma_sync_single_range_for_{cpu,device}
Re-implement dma_sync_single_range_for_{cpu,device} for x86_64 using
swiotlb_sync_single_range_for_{cpu,device}.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-29 14:45:52 -07:00
John W. Linville 878a97cfd7 [PATCH] swiotlb: support syncing sub-ranges of mappings
This patch implements swiotlb_sync_single_range_for_{cpu,device}. This
is intended to support an x86_64 implementation of
dma_sync_single_range_for_{cpu,device}.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-29 14:44:23 -07:00
Andrew Morton 73a0b538ee [PATCH] x86_64: desc.h-needs smp.h
include/asm/desc.h: In function `load_LDT':
include/asm/desc.h:209: warning: implicit declaration of function `get_cpu'
include/asm/desc.h:211: warning: implicit declaration of function `put_cpu'

Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:01 -07:00
Randy Dunlap 33bf56106d [PATCH] feature removal of io_remap_page_range()
As written in Documentation/feature-removal-schedule.txt, remove the
io_remap_page_range() kernel API.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:33 -07:00
Andi Kleen 2ade814736 [PATCH] x86-64: clean up local_add/sub arguments
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:59 -07:00
Chuck Ebbert 66759a01ad [PATCH] x86-64: i386/x86-64: Fix time going twice as fast problem on ATI Xpress chipsets
Original patch from Bertro Simul

This is probably still not quite correct, but seems to be
the best solution so far.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:58 -07:00
Jan Beulich 049cdefe19 [PATCH] x86-64: reduce x86-64 bug frame by 4 bytes
As mentioned before, the size of the bug frame can be further reduced while
continuing to use instructions to encode the information.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:58 -07:00
Jan Beulich a2d236b3ac [PATCH] x86-64: Lose constraints on cmpxchg
While only cosmetic for x86-64, this adjusts the cmpxchg code appearantly
inherited from i386 to use more generic constraints.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:57 -07:00
Jan Beulich 1a426cb764 [PATCH] x86-64: Declare NMI_VECTOR and handle it in the IPI sending code.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:57 -07:00
Andi Kleen a2a0c992e9 [PATCH] x86-64: Remove unused vxtime.hz field
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:57 -07:00
Andi Kleen a0d58c9741 [PATCH] x86-64: Set the stack pointer correctly in init_thread and init_tss
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:57 -07:00
Jan Beulich 1209140c3c [PATCH] x86-64: Safe interrupts in oops_begin/end
Rather than blindly re-enabling interrupts in oops_end(), save their state
in oope_begin() and then restore that state.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:57 -07:00
Andi Kleen 059bf0f6c3 [PATCH] x86-64: Merge msr.c with i386 version
The only difference was the inline assembly, so move that into
asm/msr.h and merge with the i386 version.

This adds some missing sysfs support code to x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:57 -07:00
Jan Beulich 7effaa882a [PATCH] x86-64: Fix CFI information
Being the foundation for reliable stack unwinding, this fixes CFI unwind
annotations in many low-level x86_64 routines, plus a config option
(available to all architectures, and also present in the previously sent
patch adding such annotations to i386 code) to enable them separatly
rather than only along with adding full debug information.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:56 -07:00
Andi Kleen 27183ebd33 [PATCH] x86-64: Add dma_sync_single_range_for_{cpu,device}
Currently just defined to their non range parts.

Pointed out by John Linville

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:56 -07:00
Adrian Bunk 9c0aa0f9a1 [PATCH] Replace extern inline with static inline in asm-x86_64/*
They should be identical in the kernel now, but this
makes it consistent with other code.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:56 -07:00
Nakul Saraiya f297e4e5e4 [PATCH] Increase nodemap hash.
Needed for some newer Opteron systems with E stepping and memory
relocation enabled. The node addresses are different in lower
bits now so the nodemap hash function needs to be enlarged.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:55 -07:00
Jim Paradis fb048927ad [PATCH] x86-64: Fix off by one in pfn_valid
When I gave proposed the fix to pfn_valid() for RHEL4, Stephen Tweedie's
sharp eyes caught this:

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:55 -07:00
Andi Kleen 2b4a08150e [PATCH] x86-64: Increase TLB flush array size
The generic TLB flush functions kept upto 506 pages per
CPU to avoid too frequent IPIs.

This value was done for the L1 cache of older x86 CPUs,
but with modern CPUs it does not make much sense anymore.
TLB flushing is slow enough that using the L2 cache is fine.

This patch increases the flush array on x86-64 to cache
5350 pages. That is roughly 20MB with 4K pages. It speeds
up large munmaps in multithreaded processes on SMP considerably.

The cost is roughly 42k of memory per CPU, which is reasonable.

I only increased it on x86-64 for now, but it would probably
make sense to increase it everywhere. Embedded architectures
with SMP may keep it smaller to save some memory per CPU.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:58 -07:00
Andi Kleen 165aeb8284 [PATCH] x86-64: Don't include config.h in asm/timex.h
asm-x86-64/timex.h does not reference CONFIG constants.
Do not need to include config.h.

Signed-off-by: Grant Grundler <iod00d@hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:58 -07:00
Andi Kleen 3f74478b5f [PATCH] x86-64: Some cleanup and optimization to the processor data area.
- Remove unused irqrsp field
- Remove pda->me
- Optimize set_softirq_pending slightly

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:58 -07:00
Andi Kleen e5bc8b6baf [PATCH] x86-64: Make remote TLB flush more scalable
Instead of using a global spinlock to protect the state
of the remote TLB flush use a lock and state for each sending CPU.

To tell the receiver where to look for the state use 8 different
call vectors.  Each CPU uses a specific vector to trigger flushes on other
CPUs. Depending on the received vector the target CPUs look into
the right per cpu variable for the flush data.

When the system has more than 8 CPUs they are hashed to the 8 available
vectors. The limited global vector space forces us to this right now.
In future when interrupts are split into per CPU domains this could be
fixed, at the cost of needing more IPIs in flat mode.

Also some minor cleanup in the smp flush code and remove some outdated
debug code.

Requires patch to move cpu_possible_map setup earlier.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:58 -07:00
Andi Kleen 69e1a33f62 [PATCH] x86-64: Use ACPI PXM to parse PCI<->node assignments
Since this is shared code I had to implement it for i386 too

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:57 -07:00
Andi Kleen b9aac10ddd [PATCH] x86-64: Remove redundant max_mapnr and replace with end_pfn
The FLATMEM people added it, but there doesn't seem a good reason
because end_pfn is identical.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:57 -07:00
Andi Kleen 6142891a0c [PATCH] x86-64: Avoid unnecessary double bouncing for swiotlb
PCI_DMA_BUS_IS_PHYS has to be zero even when the GART IOMMU is disabled
and the swiotlb is used. Otherwise the block layer does unnecessary
double bouncing.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:56 -07:00
Andi Kleen 3f098c2605 [PATCH] x86-64: Support dualcore and 8 socket systems in k8 fallback node parsing
In particular on systems where the local APIC space and node space
is very different from the Linux CPU number space.

Previously the older NUMA setup code directly parsing the K8
northbridge registers had some issues on 8 socket or dual core
systems. This patch fixes them.

This is mainly done by fixing some confusion between Linux
CPU numbers and local APIC ids. We now pass the local APIC IDs
to later code, which avoids mismatches.

Also add some heuristics to detect cases where the Hypertransport
nodeids and the local APIC IDs don't match, but are shifted
by a constant offset.

This is still all quite hackish, hopefully BIOS writers fill
in correct SRATs instead.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:56 -07:00
Andi Kleen b91691164b [PATCH] x86-64: Don't cache align PDA on UP builds
Suggested by someone I forgot who sorry.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:56 -07:00
Andi Kleen 0b07e984fc [PATCH] x86-64: Don't assign CPU numbers in SRAT parsing
Do that later when the CPU boots. SRAT just stores the APIC<->Node
mapping node. This fixes problems on systems where the order
of SRAT entries does not match the MADT.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:55 -07:00
Andi Kleen 61c11341ed [PATCH] x86-64: Remove esr disable hack in APIC code
This was just needed for the Numasaurus, which fortunately
doesn't support x86-64 CPUs.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:55 -07:00
Andi Kleen eddfb4ed29 [PATCH] x86-64: Remove obsolete APIC "write around" bug workaround
No x86-64 chipset has this bug

Generated code doesn't change because it was always disabled.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:55 -07:00
Adrian Bunk 672289e9fa [PATCH] i386/x86_64: make get_cpu_vendor() static
get_cpu_vendor() no longer has any users in other files.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:35 -07:00
Ingo Molnar fb1c8f93d8 [PATCH] spinlock consolidation
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code.  It does the following
things:

 - consolidates and enhances the spinlock/rwlock debugging code

 - simplifies the asm/spinlock.h files

 - encapsulates the raw spinlock type and moves generic spinlock
   features (such as ->break_lock) into the generic code.

 - cleans up the spinlock code hierarchy to get rid of the spaghetti.

Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c.  (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)

Also, i've enhanced the rwlock debugging facility, it will now track
write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.

The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:

 include/asm-i386/spinlock_types.h       |   16
 include/asm-x86_64/spinlock_types.h     |   16

I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:

   SMP                         |  UP
   ----------------------------|-----------------------------------
   asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
   linux/spinlock_types.h      |  linux/spinlock_types.h
   asm/spinlock_smp.h          |  linux/spinlock_up.h
   linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
   linux/spinlock.h            |  linux/spinlock.h

/*
 * here's the role of the various spinlock/rwlock related include files:
 *
 * on SMP builds:
 *
 *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
 *                        initializers
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
 *                        implementations, mostly inline assembly code
 *
 *   (also included on UP-debug builds:)
 *
 *  linux/spinlock_api_smp.h:
 *                        contains the prototypes for the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 *
 * on UP builds:
 *
 *  linux/spinlock_type_up.h:
 *                        contains the generic, simplified UP spinlock type.
 *                        (which is an empty structure on non-debug builds)
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  linux/spinlock_up.h:
 *                        contains the __raw_spin_*()/etc. version of UP
 *                        builds. (which are NOPs on non-debug, non-preempt
 *                        builds)
 *
 *   (included on UP-non-debug builds:)
 *
 *  linux/spinlock_api_up.h:
 *                        builds the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 */

All SMP and UP architectures are converted by this patch.

arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.

From: Grant Grundler <grundler@parisc-linux.org>

  Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
  Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
  non-SMP kernels.  That should be trivial to fix up later if necessary.

  I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
  some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
  are well tested and contained entirely inside arch specific code.  I do NOT
  expect any new issues to arise with them.

 If someone does ever need to use debug/metrics with them, then they will
  need to unravel this hairball between spinlocks, atomic ops, and bit ops
  that exist only because parisc has exactly one atomic instruction: LDCW
  (load and clear word).

From: "Luck, Tony" <tony.luck@intel.com>

   ia64 fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjanv@infradead.org>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:21 -07:00
Linus Torvalds 486a153f0e Merge master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild 2005-09-09 15:46:49 -07:00
Kenji Kaneshige 24b20ac6e1 [PATCH] remove unnecessary handle_IRQ_event() prototypes
The function prototype for handle_IRQ_event() in a few architctures is not
needed because they use GENERIC_HARDIRQ.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:33 -07:00
Sam Ravnborg e2d5df935d kbuild: alpha,x86_64 use generic asm-offsets.h support
Delete obsolete stuff from arch makefiles
Rename .h file to asm-offsets.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-09-09 21:28:48 +02:00
Len Brown 64e47488c9 Merge linux-2.6 with linux-acpi-2.6 2005-09-08 01:45:47 -04:00
Stephen Rothwell 5ac353f9ba [PATCH] Clean up struct flock definitions
This patch just gathers together all the struct flock definitions except
xtensa into asm-generic/fcntl.h.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell 1abf62afb6 [PATCH] Clean up the fcntl operations
This patch puts the most popular of each fcntl operation/flag into
asm-generic/fcntl.h and cleans up the arch files.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell e64ca97fd8 [PATCH] Clean up the open flags
This patch puts the most popular of each open flag into asm-generic/fcntl.h
and cleans up the arch files.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:38 -07:00
Stephen Rothwell 9317259ead [PATCH] Create asm-generic/fcntl.h
This set of patches creates asm-generic/fcntl.h and consolidates as much as
possible from the asm-*/fcntl.h files into it.

This patch just gathers all the identical bits of the asm-*/fcntl.h files into
asm-generic/fcntl.h.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:37 -07:00