Commit graph

275 commits

Author SHA1 Message Date
LEROY Christophe 0b05e2d671 powerpc/32: cacheable_memcpy becomes memcpy
cacheable_memcpy uses dcbz instruction and is more efficient than
memcpy when the destination is in RAM. If the destination is in an
io area, memcpy_toio() is normally used, not memcpy

This patch renames memcpy as generic_memcpy, and renames
cacheable_memcpy as memcpy

On MPC885, we get approximatly 7% increase of the transfer rate
on an FTP reception

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:27 -05:00
LEROY Christophe c152f149ce powerpc/32: Merge the new memset() with the old one
cacheable_memzero() which has become the new memset() and the old
memset() are quite similar, so just merge them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:24 -05:00
LEROY Christophe 5b2a32e806 powerpc/32: memset(0): use cacheable_memzero
cacheable_memzero uses dcbz instruction and is more efficient than
memset(0) when the destination is in RAM

This patch renames memset as generic_memset, and defines memset
as a prolog to cacheable_memzero. This prolog checks if the byte
to set is 0. If not, it falls back to generic_memcpy()

cacheable_memzero disappears as it is not referenced anywhere anymore

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:21 -05:00
LEROY Christophe df087e450d Partially revert "powerpc: Remove duplicate cacheable_memcpy/memzero functions"
This partially reverts
commit 'powerpc: Remove duplicate cacheable_memcpy/memzero functions
("b05ae4ee602b7dc90771408ccf0972e1b3801a35")'

Functions cacheable_memcpy/memzero are more efficient than
memcpy/memset as they use the dcbz instruction which avoids refill
of the cacheline with the data that we will overwrite.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:21 -05:00
LEROY Christophe 92c985f1d7 powerpc: put csum_tcpudp_magic inline
csum_tcpudp_magic() is only a few instructions, and does modify
really few registers. So it is not worth having it as a separate
function and suffer function branching and saving of volatile
registers.

This patch makes it inline by use of the already existing
csum_tcpudp_nofold() function.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:19 -05:00
Linus Torvalds 08d183e3c1 powerpc updates for 4.2
- Disable the 32-bit vdso when building LE, so we can build with a 64-bit only
    toolchain.
  - EEH fixes from Gavin & Richard.
  - Enable the sys_kcmp syscall from Laurent.
  - Sysfs control for fastsleep workaround from Shreyas.
  - Expose OPAL events as an irq chip by Alistair.
  - MSI ops moved to pci_controller_ops by Daniel.
  - Fix for kernel to userspace backtraces for perf from Anton.
  - Merge pseries and pseries_le defconfigs from Cyril.
  - CXL in-kernel API from Mikey.
  - OPAL prd driver from Jeremy.
  - Fix for DSCR handling & tests from Anshuman.
  - Powernv flash mtd driver from Cyril.
  - Dynamic DMA Window support on powernv from Alexey.
  - LLVM clang fixes & workarounds from Anton.
  - Reworked version of the patch to abort syscalls when transactional.
  - Fix the swap encoding to support 4TB, from Aneesh.
  - Various fixes as usual.
  - Freescale updates from Scott: Highlights include more 8xx optimizations, an
    e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and
    various fixes and cleanup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViSZqAAoJEFHr6jzI4aWAA7kQAKq3+pejfo2rY7alpKJyeVao
 vlaIEaDNOTh+ctcmu3MFF9Jy6fai8gNZziRXU5JRmE5RW4GVBN4KZiqXRbkVjdBK
 uG9sCX7Y58VRsS2vnGBYLsamfTMgjaXeDvgunQHVLiechJnrDr0RHEK90F3LSi73
 Axp6l8XIG63a3zFZmkhzANMCme2lm5+MWmGlSjUUNi5F+viQUgJc5iiO8xrVUgM5
 RpNlV2NJSqFiU+gMQWJ226V85UIniouq4j+qtyUcu8/m9BberyolXVU0GPlPFdsx
 r/Qh9uCJyZaUdSB5hzomQZj50IsSz6J6nEuJTeGRoVZOmeI8Dnc2xU9fxQF5fC8H
 lUJw10WPoNOggQZTeSUKn7wTXw3i4p3KsWNUczaW68VJdhqZUVaSp0+I6mnDSqzs
 9iGC+VffLYNa1OHq7mGRFrgDdLBCHes31aZ3CxlQsmyNpAPCwMzsD4TUfVnvOG6E
 oJOeaQ4mZM9PvqxEYJfoIL+vgRxmQ8sdIBtNY4in+C7J6eFnZNFO9xmPnJZuVU31
 PGtx60kjFCOVMXvqn34WkRNbgqGWI91IK0KcRwFO2LXVio1uY77TWL52kNK2IMsp
 Az+VDDvqnT3+BoV1yz0P6SrXAkwTpvFk2y+IdmEiUUN7zZFL5ZSA2epej9AzHTAK
 WID2bc5yVtIL6p6x5ICH
 =d9Wh
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - disable the 32-bit vdso when building LE, so we can build with a
   64-bit only toolchain.

 - EEH fixes from Gavin & Richard.

 - enable the sys_kcmp syscall from Laurent.

 - sysfs control for fastsleep workaround from Shreyas.

 - expose OPAL events as an irq chip by Alistair.

 - MSI ops moved to pci_controller_ops by Daniel.

 - fix for kernel to userspace backtraces for perf from Anton.

 - merge pseries and pseries_le defconfigs from Cyril.

 - CXL in-kernel API from Mikey.

 - OPAL prd driver from Jeremy.

 - fix for DSCR handling & tests from Anshuman.

 - Powernv flash mtd driver from Cyril.

 - dynamic DMA Window support on powernv from Alexey.

 - LLVM clang fixes & workarounds from Anton.

 - reworked version of the patch to abort syscalls when transactional.

 - fix the swap encoding to support 4TB, from Aneesh.

 - various fixes as usual.

 - Freescale updates from Scott: Highlights include more 8xx
   optimizations, an e6500 hugetlb optimization, QMan device tree nodes,
   t1024/t1023 support, and various fixes and cleanup.

* tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits)
  cxl: Fix typo in debug print
  cxl: Add CXL_KERNEL_API config option
  powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
  powerpc/mm: Change the swap encoding in pte.
  powerpc/mm: PTE_RPN_MAX is not used, remove the same
  powerpc/tm: Abort syscalls in active transactions
  powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
  powerpc/include: Add opal-prd to installed uapi headers
  powerpc/powernv: fix construction of opal PRD messages
  powerpc/powernv: Increase opal-irqchip initcall priority
  powerpc: Make doorbell check preemption safe
  powerpc/powernv: pnv_init_idle_states() should only run on powernv
  macintosh/nvram: Remove as unused
  powerpc: Don't use gcc specific options on clang
  powerpc: Don't use -mno-strict-align on clang
  powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it
  powerpc: Only use -mabi=altivec if toolchain supports it
  powerpc: Fix duplicate const clang warning in user access code
  vfio: powerpc/spapr: Support Dynamic DMA windows
  vfio: powerpc/spapr: Register memory and define IOMMU v2
  ...
2015-06-24 08:46:32 -07:00
Anton Blanchard 1fb3f5a7ca powerpc: Only use -mabi=altivec if toolchain supports it
The -mabi=altivec option is not recognised on LLVM, so use call cc-option
to check for support.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 17:33:05 +10:00
David Hildenbrand 5f76eea88d sched/preempt, powerpc: Disable preemption in enable_kernel_altivec() explicitly
enable_kernel_altivec() has to be called with disabled preemption.
Let's make this explicit, to prepare for pagefault_disable() not
touching preemption anymore.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-14-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:17 +02:00
Michael Ellerman f691fa1080 powerpc: Replace mem_init_done with slab_is_available()
We have a powerpc specific global called mem_init_done which is "set on
boot once kmalloc can be called".

But that's not *quite* true. We set it at the bottom of mem_init(), and
rely on the fact that mm_init() calls kmem_cache_init() immediately
after that, and nothing is running in parallel.

So replace it with the generic and 100% correct slab_is_available().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-10 20:02:48 +10:00
Michael Ellerman df60f57684 Merge branch 'next-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into test
Merge miscellaneous bits from benh. Fix a minor conflict with
OpalMessageType changing names to opal_msg_type.
2015-03-26 20:04:28 +11:00
Geert Uytterhoeven 1f8c82ab1b cpufreq/ppc: Add missing #include <asm/smp.h>
If CONFIG_SMP=n, <linux/smp.h> does not include <asm/smp.h>, causing:

drivers/cpufreq/ppc-corenet-cpufreq.c: In function 'corenet_cpufreq_cpu_init':
drivers/cpufreq/ppc-corenet-cpufreq.c:173:3: error: implicit declaration of function 'get_hard_smp_processor_id' [-Werror=implicit-funcuresh E. Warrier" <warrier@linux.vnet.ibm.com>
X-Patchwork-Id: 443703
Message-Id: <54EE5989.7010800@linux.vnet.ibm.com>
To: linuxppc-dev@ozlabs.org
Date: Wed, 25 Feb 2015 17:23:53 -0600

Export __spin_yield so that the arch_spin_unlock() function can
be invoked from a module. This will be required for modules where
we want to take a lock that is also is acquired in hypervisor
real mode. Because we want to avoid running any lockdep code
(which may not be safe in real mode), this lock needs to be
an arch_spinlock_t instead of a normal spinlock.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-25 16:53:28 +11:00
Kyle Moffett b05ae4ee60 powerpc: Remove duplicate cacheable_memcpy/memzero functions
These functions are only used from one place each.  If the cacheable_*
versions really are more efficient, then those changes should be
migrated into the common code instead.

NOTE: The old routines are just flat buggy on kernels that support
      hardware with different cacheline sizes.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 11:25:50 +11:00
Markus Elfring 7f4eec3953 powerpc: Delete unnecessary checks before kfree()
The kfree() function tests whether its argument is NULL and then returns
immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:14 +11:00
Anton Blanchard df99e6eb3f powerpc: Change vsrX register defines to vsX to match gcc and glibc
As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.

One gratuitous difference between userspace and the kernel is the
VSX register definitions - the kernel uses vsrX whereas gcc uses
vsX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:32:11 +11:00
Anton Blanchard c2ce6f9f3d powerpc: Change vrX register defines to vX to match gcc and glibc
As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.

One gratuitous difference between userspace and the kernel is the
VMX register definitions - the kernel uses vrX whereas both gcc and
glibc use vX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:32:11 +11:00
Michael Ellerman 1dcee55fea powerpc/lib: Makefile, use obj64-y to consolidate 64-bit rules
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:00:24 +11:00
Michael Ellerman 564ec2f2a0 powerpc/lib: Makefile, consolidate obj-y sections
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:00:24 +11:00
Anton Blanchard 15c2d45d17 powerpc: Add 64bit optimised memcmp
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rdlicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:55 +11:00
Andreas Ruprecht 803d57de2b powerpc/lib: Do not include string.o in obj-y twice
In the Makefile, string.o (which is generated from string.S) is
included into the list of objects being built unconditionally
(obj-y) in line 12.

Additionally, if CONFIG_PPC64 is set, it is included again in
line 17.

This patch removes the latter unnecessary inclusion.

Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:45:55 +11:00
Linus Torvalds a7cb7bb664 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree update from Jiri Kosina:
 "Usual stuff: documentation updates, printk() fixes, etc"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
  intel_ips: fix a type in error message
  cpufreq: cpufreq-dt: Move newline to end of error message
  ps3rom: fix error return code
  treewide: fix typo in printk and Kconfig
  ARM: dts: bcm63138: change "interupts" to "interrupts"
  Replace mentions of "list_struct" to "list_head"
  kernel: trace: fix printk message
  scsi: mpt2sas: fix ioctl in comment
  zbud, zswap: change module author email
  clocksource: Fix 'clcoksource' typo in comment
  arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help
  gpio: msm-v1: make boolean argument more obvious
  usb: Fix typo in usb-serial-simple.c
  PCI: Fix comment typo 'COMFIG_PM_OPS'
  powerpc: Fix comment typo 'CONIFG_8xx'
  powerpc: Fix comment typos 'CONFiG_ALTIVEC'
  clk: st: Spelling s/stucture/structure/
  isci: Spelling s/stucture/structure/
  usb: gadget: zero: Spelling s/infrastucture/infrastructure/
  treewide: Fix company name in module descriptions
  ...
2014-12-12 10:08:06 -08:00
Jiri Kosina a02001086b Merge Linus' tree to be be to apply submitted patches to newer code than
current trivial.git base
2014-11-20 14:42:02 +01:00
Michael Ellerman e39f223fc9 powerpc: Remove more traces of bootmem
Although we are now selecting NO_BOOTMEM, we still have some traces of
bootmem lying around. That is because even with NO_BOOTMEM there is
still a shim that converts bootmem calls into memblock calls, but
ultimately we want to remove all traces of bootmem.

Most of the patch is conversions from alloc_bootmem() to
memblock_virt_alloc(). In general a call such as:

  p = (struct foo *)alloc_bootmem(x);

Becomes:

  p = memblock_virt_alloc(x, 0);

We don't need the cast because memblock_virt_alloc() returns a void *.
The alignment value of zero tells memblock to use the default alignment,
which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.

We remove a number of NULL checks on the result of
memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
if it can't allocate, in exactly the same way as alloc_bootmem(), so the
NULL checks are and always have been redundant.

The memory returned by memblock_virt_alloc() is already zeroed, so we
remove several memsets of the result of memblock_virt_alloc().

Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
to that is pointless, 16XB ought to be enough for anyone.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-19 21:41:51 +11:00
Paul Mackerras 7048c84694 powerpc: Fix compilation of emulate_step()
Commit be96f63375 ("powerpc: Split out instruction analysis
part of emulate_step()") added some calls to do_fp_load()
and do_fp_store(), which fail to compile on configs with
CONFIG_PPC_FPU=n and CONFIG_PPC_EMULATE_SSTEP=y.  This fixes
the compile by adding #ifdef CONFIG_PPC_FPU around the code
that calls these functions.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-12 15:54:29 +11:00
Kyle McMartin dedd24a12f powerpc: Remove unused devm_ioremap_prot()
Added in 2008, but has never had any in-tree users, and no other
architectures provide it.

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-10 09:59:28 +11:00
Paul Bolle c2522dcdcb powerpc: Fix comment typos 'CONFiG_ALTIVEC'
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-10-29 14:41:49 +01:00
Paul Mackerras c9f6f4ed95 powerpc: Implement emulation of string loads and stores
The size field of the op.type word is now the total number of bytes
to be loaded or stored.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:52 +10:00
Paul Mackerras cf87c3f6b6 powerpc: Emulate icbi, mcrf and conditional-trap instructions
This extends the instruction emulation done by analyse_instr() and
emulate_step() to handle a few more instructions that are found in
the kernel.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:51 +10:00
Paul Mackerras be96f63375 powerpc: Split out instruction analysis part of emulate_step()
This splits out the instruction analysis part of emulate_step() into
a separate analyse_instr() function, which decodes the instruction,
but doesn't execute any load or store instructions.  It does execute
integer instructions and branches which can be executed purely by
updating register values in the pt_regs struct.  For other instructions,
it returns the instruction type and other details in a new
instruction_op struct.  emulate_step() then uses that information
to execute loads, stores, cache operations, mfmsr, mtmsr[d], and
(on 64-bit) sc instructions.

The reason for doing this is so that the KVM code can use it instead
of having its own separate instruction emulation code.  Possibly the
alignment interrupt handler could also use this.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:51 +10:00
Anton Blanchard e51df2c170 powerpc: Make a bunch of things static
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:41 +10:00
Anton Blanchard 7b20a955c3 powerpc: Move lib symbol exports into arch/powerpc/lib/ppc_ksyms.c
Move the lib symbol exports closer to their function definitions

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:39 +10:00
Michael Ellerman 78e05b1421 powerpc: Add smp_mb()s to arch_spin_unlock_wait()
Similar to the previous commit which described why we need to add a
barrier to arch_spin_is_locked(), we have a similar problem with
spin_unlock_wait().

We need a barrier on entry to ensure any spinlock we have previously
taken is visibly locked prior to the load of lock->slock.

It's also not clear if spin_unlock_wait() is intended to have ACQUIRE
semantics. For now be conservative and add a barrier on exit to give it
ACQUIRE semantics.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:27 +10:00
Michael Ellerman 0f369103ce powerpc: Remove power3 from comments
There are still a few occurences where it remains, because it helps to
explain something that persists.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28 14:10:26 +10:00
Li Zhong 6f5405bc2e powerpc: use _GLOBAL_TOC for memmove
memmove may be called from module code copy_pages(btrfs), and it may
call memcpy, which may call back to C code, so it needs to use
_GLOBAL_TOC to set up r2 correctly.

This fixes following error when I tried to boot an le guest:

Vector: 300 (Data Access) at [c000000073f97210]
    pc: c000000000015004: enable_kernel_altivec+0x24/0x80
    lr: c000000000058fbc: enter_vmx_copy+0x3c/0x60
    sp: c000000073f97490
   msr: 8000000002009033
   dar: d000000001d50170
 dsisr: 40000000
  current = 0xc0000000734c0000
  paca    = 0xc00000000fff0000	 softe: 0	 irq_happened: 0x01
    pid   = 815, comm = mktemp
enter ? for help
[c000000073f974f0] c000000000058fbc enter_vmx_copy+0x3c/0x60
[c000000073f97510] c000000000057d34 memcpy_power7+0x274/0x840
[c000000073f97610] d000000001c3179c copy_pages+0xfc/0x110 [btrfs]
[c000000073f97660] d000000001c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs]
[c000000073f97700] d000000001be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs]
[c000000073f97820] d000000001be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs]
[c000000073f97890] d000000001bfed30 insert_with_overflow+0x70/0x180 [btrfs]
[c000000073f97900] d000000001bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs]
[c000000073f979a0] d000000001c1f92c btrfs_add_link+0x10c/0x370 [btrfs]
[c000000073f97a40] d000000001c20e94 btrfs_create+0x204/0x270 [btrfs]
[c000000073f97b00] c00000000026d438 vfs_create+0x178/0x210
[c000000073f97b50] c000000000270a70 do_last+0x9f0/0xe90
[c000000073f97c20] c000000000271010 path_openat+0x100/0x810
[c000000073f97ce0] c000000000272ea8 do_filp_open+0x58/0xd0
[c000000073f97dc0] c00000000025ade8 do_sys_open+0x1b8/0x300
[c000000073f97e30] c00000000000a008 syscall_exit+0x0/0x7c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-22 15:56:04 +10:00
Paul Mackerras e698b96678 powerpc: Fix bugs in emulate_step()
This fixes some bugs in emulate_step().  First, the setting of the carry
bit for the arithmetic right-shift instructions was not correct on 64-bit
machines because we were masking with a mask of type int rather than
unsigned long.  Secondly, the sld (shift left doubleword) instruction was
using the wrong instruction field for the register containing the shift
count.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-22 15:55:51 +10:00
Paul Bolle b69a1da94f powerpc: fix typo 'CONFIG_PPC_CPU'
Commit cd64d1697c ("powerpc: mtmsrd not defined") added a check for
CONFIG_PPC_CPU were a check for CONFIG_PPC_FPU was clearly intended.

Fixes: cd64d1697c ("powerpc: mtmsrd not defined")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:25 +10:00
Anton Blanchard 2ac7b0166a powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
__clear_user and copy_page load from the TOC and are also exported
to modules. This means we have to use _GLOBAL_TOC() so that we
create the global entry point that sets up the TOC.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:41 +10:00
Benjamin Herrenschmidt f6869e7fe6 Merge remote-tracking branch 'anton/abiv2' into next
This series adds support for building the powerpc 64-bit
LE kernel using the new ABI v2. We already supported
running ABI v2 userspace programs but this adds support
for building the kernel itself using the new ABI.
2014-05-05 20:57:12 +10:00
Philippe Bergheaud 00f554fade powerpc: memcpy optimization for 64bit LE
Unaligned stores take alignment exceptions on POWER7 running in little-endian.
This is a dumb little-endian base memcpy that prevents unaligned stores.
Once booted the feature fixup code switches over to the VMX copy loops
(which are already endian safe).

The question is what we do before that switch over. The base 64bit
memcpy takes alignment exceptions on POWER7 so we can't use it as is.
Fixing the causes of alignment exception would slow it down, because
we'd need to ensure all loads and stores are aligned either through
rotate tricks or bytewise loads and stores. Either would be bad for
all other 64bit platforms.

[ I simplified the loop a bit - Anton ]

Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-30 15:26:18 +10:00
Anton Blanchard 169c7cee31 powerpc: Add _GLOBAL_TOC for ABIv2 assembly functions exported to modules
If an assembly function that calls back into c code is exported to
modules, we need to ensure r2 is setup correctly. There are only
two places crazy enough to do it (two of which are my fault).

Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-23 10:05:32 +10:00
Ulrich Weigand 752a6422fe powerpc: Fix unsafe accesses to parameter area in ELFv2
Some of the assembler files in lib/ make use of the fact that in the
ELFv1 ABI, the caller guarantees to provide stack space to save the
parameter registers r3 ... r10.  This guarantee is no longer present
in ELFv2 for functions that have no variable argument list and no
more than 8 arguments.

Change the affected routines to temporarily store registers in the
red zone and/or the top of their own stack frame (in the space
provided to save r31 .. r29, which is actually not used in these
routines).

In opal_query_takeover, simply always allocate a stack frame;
the routine is not performance critical.

Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-23 10:05:24 +10:00
Anton Blanchard b37c10d128 powerpc: Fix ABIv2 issues with stack offsets in assembly code
Fix STK_PARAM and use it instead of hardcoding ABIv1 offsets.

Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-23 10:05:23 +10:00
Anton Blanchard b1576fec7f powerpc: No need to use dot symbols when branching to a function
binutils is smart enough to know that a branch to a function
descriptor is actually a branch to the functions text address.

Alan tells me that binutils has been doing this for 9 years.

Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-23 10:05:16 +10:00
Michael Ellerman 22d651dcef selftests/powerpc: Import Anton's memcpy / copy_tofrom_user tests
Turn Anton's memcpy / copy_tofrom_user test into something that can
live in tools/testing/selftests.

It requires one turd in arch/powerpc/lib/memcpy_64.S, but it's pretty
harmless IMHO.

We are sailing very close to the wind with the feature macros. We define
them to nothing, which currently means we get a few extra nops and
include the unaligned calls.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-07 15:53:12 +11:00
Andreas Schwab 8fe9c93e74 powerpc: Add vr save/restore functions
GCC 4.8 now generates out-of-line vr save/restore functions when
optimizing for size.  They are needed for the raid6 altivec support.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:46:43 +11:00
Benjamin Herrenschmidt dece8ada99 Merge branch 'merge' into next
Merge a pile of fixes that went into the "merge" branch (3.13-rc's) such
as Anton Little Endian fixes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 15:19:31 +11:00
Paul E. McKenney 20151169f1 powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
The powerpc 64-bit __copy_tofrom_user() function uses shifts to handle
unaligned invocations.  However, these shifts were designed for
big-endian systems: On little-endian systems, they must shift in the
opposite direction.

This commit relies on the C preprocessor to insert the correct shifts
into the assembly code.

[ This is a rare but nasty LE issue. Most of the time we use the POWER7
optimised __copy_tofrom_user_power7 loop, but when it hits an exception
we fall back to the base __copy_tofrom_user loop. - Anton ]

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 14:02:30 +11:00
Kevin Hao 1e8341ae0c powerpc: Move the patch_exception to a common place
So that it can be used by other codes. No function change.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-02 14:06:54 +11:00
Anton Blanchard ef1313deaf powerpc: Add VMX optimised xor for RAID5
Add a VMX optimised xor, used primarily for RAID5. On a POWER7 blade
this is a decent win:

   32regs    : 17932.800 MB/sec
   altivec   : 19724.800 MB/sec

The bigger gain is when the same test is run in SMT4 mode, as it
would if there was a lot of work going on:

   8regs     :  8377.600 MB/sec
   altivec   : 15801.600 MB/sec

I tested this against an array created without the patch, and also
verified it worked as expected on a little endian kernel.

[ Fix !CONFIG_ALTIVEC build -- BenH ]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-30 16:02:28 +11:00
Tom Musta dbc2fbd7c2 powerpc: Fix Unaligned LE Floating Point Loads and Stores
This patch addresses unaligned single precision floating point loads
and stores in the single-step code.  The old implementation
improperly treated an 8 byte structure as an array of two 4 byte
words, which is a classic little endian bug.

Signed-off-by: Tom Musta <tmusta@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-30 16:01:36 +11:00
Tom Musta 6506b4718b powerpc: Fix Unaligned Loads and Stores
This patch modifies the unaligned access routines of the sstep.c
module so that it properly reverses the bytes of storage operands
in the little endian kernel kernel.   This is implemented by
breaking an unaligned little endian access into a combination of
single byte accesses plus an overal byte reversal operation.

Signed-off-by: Tom Musta <tmusta@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-30 16:01:30 +11:00
Anton Blanchard de577a3568 powerpc: Use generic memcpy code in little endian
We need to fix some endian issues in our memcpy code. For now
just enable the generic memcpy routine for little endian builds.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11 16:48:40 +11:00
Anton Blanchard 7a332b0c9a powerpc: Use generic checksum code in little endian
We need to fix some endian issues in our checksum code. For now
just enable the generic checksum routines for little endian builds.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11 16:48:39 +11:00
Anton Blanchard 32ee1e188e powerpc: Fix endian issues in VMX copy loops
Fix the permute loops for little endian.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11 16:48:25 +11:00
Paul E. McKenney 8f21bd0090 powerpc: Restore registers on error exit from csum_partial_copy_generic()
The csum_partial_copy_generic() function saves the PowerPC non-volatile
r14, r15, and r16 registers for the main checksum-and-copy loop.
Unfortunately, it fails to restore them upon error exit from this loop,
which results in silent corruption of these registers in the presumably
rare event of an access exception within that loop.

This commit therefore restores these register on error exit from the loop.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-03 17:22:42 +10:00
Paul E. McKenney d9813c3681 powerpc: Fix parameter clobber in csum_partial_copy_generic()
The csum_partial_copy_generic() uses register r7 to adjust the remaining
bytes to process.  Unfortunately, r7 also holds a parameter, namely the
address of the flag to set in case of access exceptions while reading
the source buffer.  Lacking a quantum implementation of PowerPC, this
commit instead uses register r9 to do the adjusting, leaving r7's
pointer uncorrupted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-03 17:22:36 +10:00
Benjamin Herrenschmidt cbc9565ee8 powerpc: Remove ksp_limit on ppc64
We've been keeping that field in thread_struct for a while, it contains
the "limit" of the current stack pointer and is meant to be used for
detecting stack overflows.

It has a few problems however:

 - First, it was never actually *used* on 64-bit. Set and updated but
not actually exploited

 - When switching stack to/from irq and softirq stacks, it's update
is racy unless we hard disable interrupts, which is costly. This
is fine on 32-bit as we don't soft-disable there but not on 64-bit.

Thus rather than fixing 2 in order to implement 1 in some hypothetical
future, let's remove the code completely from 64-bit. In order to avoid
a clutter of ifdef's, we remove the updates from C code completely
during interrupt stack switching, and instead maintain it from the
asm helper that is used to do the stack switching in the first place.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-25 14:15:51 +10:00
Tom Musta 17e8de7e18 powerpc: Unaligned stores and stmw are broken in emulation code
The stmw instruction was incorrectly decoded as an update form instruction
and thus the RA register was being clobbered.

Also, the utility routine to write memory to unaligned addresses breaks the
operation into smaller aligned accesses but was incorrectly incrementing
the address by only one; it needs to increment the address by the size of
the smaller aligned chunk.

Signed-off-by: Tom Musta <tmusta@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-27 14:36:08 +10:00
Anton Blanchard 7ffcf8ec26 powerpc: Fix little endian lppaca, slb_shadow and dtl_entry
The lppaca, slb_shadow and dtl_entry hypervisor structures are
big endian, so we have to byte swap them in little endian builds.

LE KVM hosts will also need to be fixed but for now add an #error
to remind us.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14 15:33:35 +10:00
Michael Neuling 70a54a4fae powerpc: Fix single step emulation of 32bit overflowed branches
Check truncate_if_32bit() on final write to nip.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:13 +10:00
Michael Neuling 280a5ba22c powerpc/pseries: Improve stream generation comments in copypage/user
No code changes, just documenting what's happening a little better.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:26 +10:00
Suzuki K. Poulose 5e249d4528 uprobes/powerpc: Add dependency on single step emulation
Uprobes uses emulate_step in sstep.c, but we haven't explicitly specified
the dependency. On pseries HAVE_HW_BREAKPOINT protects us, but 44x has no
such luxury.

Consolidate other users that depend on sstep and create a new config option.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: linuxppc-dev@ozlabs.org
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-29 11:35:06 +11:00
Anton Blanchard 1fbe9cf259 powerpc: Build kernel with -mcmodel=medium
Finally remove the two level TOC and build with -mcmodel=medium.

Unfortunately we can't build modules with -mcmodel=medium due to
the tricks the kernel module loader plays with percpu data:

# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
#
# The kernel module loader relocates the percpu data section from the
# original location (starting with 0xd...) to somewhere in the base
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.

On older kernels we fall back to the two level TOC (-mminimal-toc)

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:00:31 +11:00
Nishanth Aravamudan c8adfeccee powerpc: Fix VMX fix for memcpy case
In 2fae7cdb60 ("powerpc: Fix VMX in
interrupt check in POWER7 copy loops"), Anton inadvertently
introduced a regression for memcpy on POWER7 machines. copyuser and
memcpy diverge slightly in their use of cr1 (copyuser doesn't use it,
but memcpy does) and you end up clobbering that register with your fix.
That results in (taken from an FC18 kernel):

[   18.824604] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052f40
[   18.824618] Oops: Unrecoverable VMX/Altivec Unavailable Exception, sig: 6 [#1]
[   18.824623] SMP NR_CPUS=1024 NUMA pSeries
[   18.824633] Modules linked in: tg3(+) be2net(+) cxgb4(+) ipr(+) sunrpc xts lrw gf128mul dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua squashfs cramfs
[   18.824705] NIP: c000000000052f40 LR: c00000000020b874 CTR: 0000000000000512
[   18.824709] REGS: c000001f1fef7790 TRAP: 0f20   Not tainted  (3.6.0-0.rc6.git0.2.fc18.ppc64)
[   18.824713] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 4802802e  XER: 20000010
[   18.824726] SOFTE: 0
[   18.824728] CFAR: 0000000000000f20
[   18.824731] TASK = c000000fa7128400[0] 'swapper/24' THREAD: c000000fa7480000 CPU: 24
GPR00: 00000000ffffffc0 c000001f1fef7a10 c00000000164edc0 c000000f9b9a8120
GPR04: c000000f9b9a8124 0000000000001438 0000000000000060 03ffffff064657ee
GPR08: 0000000080000000 0000000000000010 0000000000000020 0000000000000030
GPR12: 0000000028028022 c00000000ff25400 0000000000000001 0000000000000000
GPR16: 0000000000000000 7fffffffffffffff c0000000016b2180 c00000000156a500
GPR20: c000000f968c7a90 c0000000131c31d8 c000001f1fef4000 c000000001561d00
GPR24: 000000000000000a 0000000000000000 0000000000000001 0000000000000012
GPR28: c000000fa5c04f80 00000000000008bc c0000000015c0a28 000000000000022e
[   18.824792] NIP [c000000000052f40] .memcpy_power7+0x5a0/0x7c4
[   18.824797] LR [c00000000020b874] .pcpu_free_area+0x174/0x2d0
[   18.824800] Call Trace:
[   18.824803] [c000001f1fef7a10] [c000000000052c14] .memcpy_power7+0x274/0x7c4 (unreliable)
[   18.824809] [c000001f1fef7b10] [c00000000020b874] .pcpu_free_area+0x174/0x2d0
[   18.824813] [c000001f1fef7bb0] [c00000000020ba88] .free_percpu+0xb8/0x1b0
[   18.824819] [c000001f1fef7c50] [c00000000043d144] .throtl_pd_exit+0x94/0xd0
[   18.824824] [c000001f1fef7cf0] [c00000000043acf8] .blkg_free+0x88/0xe0
[   18.824829] [c000001f1fef7d90] [c00000000018c048] .rcu_process_callbacks+0x2e8/0x8a0
[   18.824835] [c000001f1fef7e90] [c0000000000a8ce8] .__do_softirq+0x158/0x4d0
[   18.824840] [c000001f1fef7f90] [c000000000025ecc] .call_do_softirq+0x14/0x24
[   18.824845] [c000000fa7483650] [c000000000010e80] .do_softirq+0x160/0x1a0
[   18.824850] [c000000fa74836f0] [c0000000000a94a4] .irq_exit+0xf4/0x120
[   18.824854] [c000000fa7483780] [c000000000020c44] .timer_interrupt+0x154/0x4d0
[   18.824859] [c000000fa7483830] [c000000000003be0] decrementer_common+0x160/0x180
[   18.824866] --- Exception: 901 at .plpar_hcall_norets+0x84/0xd4
[   18.824866]     LR = .check_and_cede_processor+0x48/0x80
[   18.824871] [c000000fa7483b20] [c00000000007f018] .check_and_cede_processor+0x18/0x80 (unreliable)
[   18.824877] [c000000fa7483b90] [c00000000007f104] .dedicated_cede_loop+0x84/0x150
[   18.824883] [c000000fa7483c50] [c0000000006bc030] .cpuidle_enter+0x30/0x50
[   18.824887] [c000000fa7483cc0] [c0000000006bc9f4] .cpuidle_idle_call+0x104/0x720
[   18.824892] [c000000fa7483d80] [c000000000070af8] .pSeries_idle+0x18/0x40
[   18.824897] [c000000fa7483df0] [c000000000019084] .cpu_idle+0x1a4/0x380
[   18.824902] [c000000fa7483ec0] [c0000000008a4c18] .start_secondary+0x520/0x528
[   18.824907] [c000000fa7483f90] [c0000000000093f0] .start_secondary_prolog+0x10/0x14
[   18.824911] Instruction dump:
[   18.824914] 38840008 90030000 90e30004 38630008 7ca62850 7cc300d0 78c7e102 7cf01120
[   18.824923] 78c60660 39200010 39400020 39600030 <7e00200c> 7c0020ce 38840010 409f001c
[   18.824935] ---[ end trace 0bb95124affaaa45 ]---
[   18.825046] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052d08

I believe the right fix is to make memcpy match usercopy and not use
cr1.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@kernel.org> [v3.6]
2012-10-04 18:02:43 +10:00
Tiejun Chen 8e9f693715 powerpc/kprobe: Don't emulate store when kprobe stwu r1
We don't do the real store operation for kprobing 'stwu Rx,(y)R1'
since this may corrupt the exception frame, now we will do this
operation safely in exception return code after migrate current
exception frame below the kprobed function stack.

So we only update gpr[1] here and trigger a thread flag to mask
this.

Note we should make sure if we trigger kernel stack over flow.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-18 15:32:45 +10:00
Benjamin Herrenschmidt 636802ef96 powerpc: Don't use __put_user() in patch_instruction
patch_instruction() can be called very early on ppc32, when the kernel
isn't yet running at it's linked address. That can cause the !
is_kernel_addr() test in __put_user() to trip and call might_sleep()
which is very bad at that point during boot.

Use a lower level function instead for now, at least until we get to
rework ppc32 boot process to do the code patching later, like ppc64
does.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 16:05:23 +10:00
Anton Blanchard 2fae7cdb60 powerpc: Fix VMX in interrupt check in POWER7 copy loops
The enhanced prefetch hint patches corrupt the condition register
that was used to check if we are in interrupt. Fix this by using cr1.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-08-24 20:26:09 +10:00
Anton Blanchard dad477ccd6 powerpc: POWER7 copy_to_user/copy_from_user patch applied twice
"powerpc: Use enhanced touch instructions in POWER7
copy_to_user/copy_from_user" was applied twice. Remove one.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-08-24 20:26:09 +10:00
Stephen Rothwell 1d5a436d2c powerpc: Put the gpr save/restore functions in their own section
This allows the linker to know that calls to them do not need to switch
TOC and stop errors like the following when linking large configurations:

powerpc64-linux-ld: drivers/built-in.o: In function `.gpiochip_is_requested':
(.text+0x4): sibling call optimization to `_savegpr0_29' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `_savegpr0_29' extern

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-11 14:19:59 +10:00
Michael Neuling e55174e911 powerpc: Fixes for instructions not using correct register naming
These macros are using integers where they could be using logical
names since they take registers.

We are going to enforce this soon, so fix these up now.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:18:16 +10:00
Michael Neuling 86e32fdce7 powerpc: Change mtcrf to use real register names
mtocrf define is just a wrapper around the real instructions so we can
just use real register names here (ie. lower case).

Also remove braces in macro so this is possible.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:18:11 +10:00
Michael Neuling 44ce6a5ee7 powerpc: Merge STK_REG/PARAM/FRAMESIZE
Merge the defines of STACKFRAMESIZE, STK_REG, STK_PARAM from different
places.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:18:03 +10:00
Michael Neuling c75df6f96c powerpc: Fix usage of register macros getting ready for %r0 change
Anything that uses a constructed instruction (ie. from ppc-opcode.h),
need to use the new R0 macro, as %r0 is not going to work.

Also convert usages of macros where we are just determining an offset
(usually for a load/store), like:
	std	r14,STK_REG(r14)(r1)
Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since
it's just calculating an offset.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 19:17:55 +10:00
Anton Blanchard cf8fb5533f powerpc: Optimise the 64bit optimised __clear_user
I blame Mikey for this. He elevated my slightly dubious testcase:

to benchmark status. And naturally we need to be number 1 at creating
zeros. So lets improve __clear_user some more.

As Paul suggests we can use dcbz for large lengths. This patch gets
the destination cacheline aligned then uses dcbz on whole cachelines.

Before:
10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s

After:
10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s

39 GB/s, a new record.

Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Olof Johansson <olof@lixom.net>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:48 +10:00
Anton Blanchard b3f271e86e powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch
Implement a POWER7 optimised memcpy using VMX and enhanced prefetch
instructions.

This is a copy of the POWER7 optimised copy_to_user/copy_from_user
loop. Detailed implementation and performance details can be found in
commit a66086b819 (powerpc: POWER7 optimised
copy_to_user/copy_from_user using VMX).

I noticed memcpy issues when profiling a RAID6 workload:

	.memcpy
	.async_memcpy
	.async_copy_data
	.__raid_run_ops
	.handle_stripe
	.raid5d
	.md_thread

I created a simplified testcase by building a RAID6 array with 4 1GB
ramdisks (booting with brd.rd_size=1048576):

# mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3]

I then timed how long it took to write to the entire array:

# dd if=/dev/zero of=/dev/md0 bs=1M

Before: 892 MB/s
After:  999 MB/s

A 12% improvement.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:46 +10:00
Anton Blanchard bce4b4bd91 powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user
Version 2.06 of the POWER ISA introduced enhanced touch instructions,
allowing us to specify a number of attributes including the length of
a stream.

This patch adds a software stream for both loads and stores in the
POWER7 copy_tofrom_user loop. Since the setup is quite complicated
and we have to use an eieio to ensure correct ordering of the "GO"
command we only do this for copies above 4kB.

To quantify any performance improvements we need a working set
bigger than the caches so we operate on a 1GB file:

# dd if=/dev/zero of=/tmp/foo bs=1M count=1024

And we compare how fast we can read the file:

# dd if=/tmp/foo of=/dev/null bs=1M

before: 7.7 GB/s
after:  9.6 GB/s

A 25% improvement.

The worst case for this patch will be a completely L1 cache contained
copy of just over 4kB. We can test this with the copy_to_user
testcase we used to tune copy_tofrom_user originally:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

# time ./copy_to_user2 -l 4224 -i 10000000

before: 6.807 s
after:  6.946 s

A 2% slowdown, which seems reasonable considering our data is unlikely
to be completely L1 contained.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:45 +10:00
Anton Blanchard fde69282b7 powerpc: POWER7 optimised copy_page using VMX and enhanced prefetch
Implement a POWER7 optimised copy_page using VMX and enhanced
prefetch instructions. We use enhanced prefetch hints to prefetch
both the load and store side. We copy a cacheline at a time and
fall back to regular loads and stores if we are unable to use VMX
(eg we are in an interrupt).

The following microbenchmark was used to assess the impact of
the patch:

http://ozlabs.org/~anton/junkcode/page_fault_file.c

We test MAP_PRIVATE page faults across a 1GB file, 100 times:

# time ./page_fault_file -p -l 1G -i 100

Before: 22.25s
After:  18.89s

17% faster

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:44 +10:00
Anton Blanchard 6f7839e542 powerpc: Rename copyuser_power7_vmx.c to vmx-helper.c
Subsequent patches will add more VMX library functions and it makes
sense to keep all the c-code helper functions in the one file.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:43 +10:00
Anton Blanchard a9514dc69d powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user
Version 2.06 of the POWER ISA introduced enhanced touch instructions,
allowing us to specify a number of attributes including the length of
a stream.

This patch adds a software stream for both loads and stores in the
POWER7 copy_tofrom_user loop. Since the setup is quite complicated
and we have to use an eieio to ensure correct ordering of the "GO"
command we only do this for copies above 4kB.

To quantify any performance improvements we need a working set
bigger than the caches so we operate on a 1GB file:

# dd if=/dev/zero of=/tmp/foo bs=1M count=1024

And we compare how fast we can read the file:

# dd if=/tmp/foo of=/dev/null bs=1M

before: 7.7 GB/s
after:  9.6 GB/s

A 25% improvement.

The worst case for this patch will be a completely L1 cache contained
copy of just over 4kB. We can test this with the copy_to_user
testcase we used to tune copy_tofrom_user originally:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

# time ./copy_to_user2 -l 4224 -i 10000000

before: 6.807 s
after:  6.946 s

A 2% slowdown, which seems reasonable considering our data is unlikely
to be completely L1 contained.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:42 +10:00
Anton Blanchard 17968fbbd1 powerpc: 64bit optimised __clear_user
I noticed __clear_user high up in a profile of one of my RAID stress
tests. The testcase was doing a dd from /dev/zero which ends up
calling __clear_user.

__clear_user is basically a loop with a single 4 byte store which
is horribly slow. We can do much better by aligning the desination
and doing 32 bytes of 8 byte stores in a loop.

The following testcase was used to verify the patch:

http://ozlabs.org/~anton/junkcode/stress_clear_user.c

To show the improvement in performance I ran a dd from /dev/zero
to /dev/null on a POWER7 box:

Before:

# dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s

After:

# time dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s

Over 5x faster.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:41 +10:00
Steven Rostedt b6e3796834 powerpc: Have patch_instruction detect faults
For ftrace to use the patch_instruction code, it needs to check for
faults on write. Ftrace updates code all over the kernel, and we need to
know if code is updated or not due to protections that are placed on
some portions of the kernel. If ftrace does not detect a fault, it will
error later on, and it will be much more difficult to find the problem.

By changing patch_instruction() to detect faults, then ftrace will be
able to make use of it too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03 14:14:38 +10:00
Paul Mackerras 1629372caa powerpc: Use the new generic strncpy_from_user() and strnlen_user()
This is much the same as for SPARC except that we can do the find_zero()
function more efficiently using the count-leading-zeroes instructions.
Tested on 32-bit and 64-bit PowerPC.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-27 21:00:07 -07:00
Anton Blanchard 694caf0255 powerpc: Remove CONFIG_POWER4_ONLY
Remove CONFIG_POWER4_ONLY, the option is badly named and only does two
things:

- It wraps the MMU segment table code. With feature fixups there is
  little downside to compiling this in.

- It uses the newer mtocrf instruction in various assembly functions.
  Instead of making this a compile option just do it at runtime via
  a feature fixup.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-04-30 15:37:26 +10:00
David Howells ae3a197e3d Disintegrate asm/system.h for PowerPC
Disintegrate asm/system.h for PowerPC.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: linuxppc-dev@lists.ozlabs.org
2012-03-28 18:30:02 +01:00
Stephen Rothwell f5339277eb powerpc: Remove FW_FEATURE ISERIES from arch code
This is no longer selectable, so just remove all the dependent code.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-21 11:16:11 +11:00
Anton Blanchard a66086b819 powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX
Implement a POWER7 optimised copy_to_user/copy_from_user using VMX.
For large aligned copies this new loop is over 10% faster, and for
large unaligned copies it is over 200% faster.

If we take a fault we fall back to the old version, this keeps
things relatively simple and easy to verify.

On POWER7 unaligned stores rarely slow down - they only flush when
a store crosses a 4KB page boundary. Furthermore this flush is
handled completely in hardware and should be 20-30 cycles.

Unaligned loads on the other hand flush much more often - whenever
crossing a 128 byte cache line, or a 32 byte sector if either sector
is an L1 miss.

Considering this information we really want to get the loads aligned
and not worry about the alignment of the stores. Microbenchmarks
confirm that this approach is much faster than the current unaligned
copy loop that uses shifts and rotates to ensure both loads and
stores are aligned.

We also want to try and do the stores in cacheline aligned, cacheline
sized chunks. If the store queue is unable to merge an entire
cacheline of stores then the L2 cache will have to do a
read/modify/write. Even worse, we will serialise this with the stores
in the next iteration of the copy loop since both iterations hit
the same cacheline.

Based on this, the new loop does the following things:

1 - 127 bytes
Get the source 8 byte aligned and use 8 byte loads and stores. Pretty
boring and similar to how the current loop works.

128 - 4095 bytes
Get the source 8 byte aligned and use 8 byte loads and stores,
1 cacheline at a time. We aren't doing the stores in cacheline
aligned chunks so we will potentially serialise once per cacheline.
Even so it is much better than the loop we have today.

4096 - bytes
If both source and destination have the same alignment get them both
16 byte aligned, then get the destination cacheline aligned. Do
cacheline sized loads and stores using VMX.

If source and destination do not have the same alignment, we get the
destination cacheline aligned, and use permute to do aligned loads.

In both cases the VMX loop should be optimal - we always do aligned
loads and stores and are always doing stores in cacheline aligned,
cacheline sized chunks.

To be able to use VMX we must be careful about interrupts and
sleeping. We don't use the VMX loop when in an interrupt (which should
be rare anyway) and we wrap the VMX loop in disable/enable_pagefault
and fall back to the existing copy_tofrom_user loop if we do need to
sleep.

The VMX breakpoint of 4096 bytes was chosen using this microbenchmark:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

Since we are using VMX and there is a cost to saving and restoring
the user VMX state there are two broad cases we need to benchmark:

- Best case - userspace never uses VMX

- Worst case - userspace always uses VMX

In reality a userspace process will sit somewhere between these two
extremes. Since we need to test both aligned and unaligned copies we
end up with 4 combinations. The point at which the VMX loop begins to
win is:

0% VMX
aligned		2048 bytes
unaligned	2048 bytes

100% VMX
aligned		16384 bytes
unaligned	8192 bytes

Considering this is a microbenchmark, the data is hot in cache and
the VMX loop has better store queue merging properties we set the
breakpoint to 4096 bytes, a little below the unaligned breakpoints.

Some future optimisations we can look at:

- Looking at the perf data, a significant part of the cost when a
  task is always using VMX is the extra exception we take to restore
  the VMX state. As such we should do something similar to the x86
  optimisation that restores FPU state for heavy users. ie:

        /*
         * If the task has used fpu the last 5 timeslices, just do a full
         * restore of the math state immediately to avoid the trap; the
         * chances of needing FPU soon are obviously high now
         */
        preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

  and

        /*
         * fpu_counter contains the number of consecutive context switches
         * that the FPU is used. If this is over a threshold, the lazy fpu
         * saving becomes unlazy to save the trap. This is an unsigned char
         * so that after 256 times the counter wraps and the behavior turns
         * lazy again; this to deal with bursty apps that only use FPU for
         * a short time
         */

- We could create a paca bit to mirror the VMX enabled MSR bit and check
  that first, avoiding multiple calls to calling enable_kernel_altivec.
  That should help with iovec based system calls like readv.

- We could have two VMX breakpoints, one for when we know the user VMX
  state is loaded into the registers and one when it isn't. This could
  be a second bit in the paca so we can calculate the break points quickly.

- One suggestion from Ben was to save and restore the VSX registers
  we use inline instead of using enable_kernel_altivec.

[BenH: Fixed a problem with preempt and fixed build without CONFIG_ALTIVEC]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-19 14:40:40 +11:00
Anton Blanchard d715e433b7 powerpc: Copy down exception vectors after feature fixups
kdump fails because we try to execute an HV only instruction. Feature
fixups are being applied after we copy the exception vectors down to 0
so they miss out on any updates.

We have always had this issue but it only became critical in v3.0
when we added CFAR support (breaks POWER5) and v3.1 when we added
POWERNV (breaks everyone).

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> [v3.0+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-11-16 14:47:54 +11:00
Paul Gortmaker 4b16f8e2d6 powerpc: various straight conversions from module.h --> export.h
All these files were including module.h just for the basic
EXPORT_SYMBOL infrastructure.  We can shift them off to the
export.h header which is a way smaller footprint and thus
realize some compile time gains.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:44 -04:00
Linus Torvalds 82aff107f8 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (152 commits)
  powerpc: Fix hard CPU IDs detection
  powerpc/pmac: Update via-pmu to new syscore_ops
  powerpc/kvm: Fix the build for 32-bit Book 3S (classic) processors
  powerpc/kvm: Fix kvmppc_core_pending_dec
  powerpc: Remove last piece of GEMINI
  powerpc: Fix for Pegasos keyboard and mouse
  powerpc: Make early memory scan more resilient to out of order nodes
  powerpc/pseries/iommu: Cleanup ddw naming
  powerpc/pseries/iommu: Find windows after kexec during boot
  powerpc/pseries/iommu: Remove ddw property when destroying window
  powerpc/pseries/iommu: Add additional checks when changing iommu mask
  powerpc/pseries/iommu: Use correct return type in dupe_ddw_if_already_created
  powerpc: Remove unused/obsolete CONFIG_XICS
  misc: Add CARMA DATA-FPGA Programmer support
  misc: Add CARMA DATA-FPGA Access Driver
  powerpc: Make IRQ_NOREQUEST last to clear, first to set
  powerpc: Integrated Flash controller device tree bindings
  powerpc/85xx: Create dts of each core in CAMP mode for P1020RDB
  powerpc/85xx: Fix PCIe IDSEL for Px020RDB
  powerpc/85xx: P2020 DTS: re-organize dts files
  ...
2011-05-20 13:28:01 -07:00
Linus Torvalds 268bb0ce3e sanitize <linux/prefetch.h> usage
Commit e66eed651f ("list: remove prefetching from regular list
iterators") removed the include of prefetch.h from list.h, which
uncovered several cases that had apparently relied on that rather
obscure header file dependency.

So this fixes things up a bit, using

   grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
   grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

to guide us in finding files that either need <linux/prefetch.h>
inclusion, or have it despite not needing it.

There are more of them around (mostly network drivers), but this gets
many core ones.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-20 12:50:29 -07:00
Milton Miller a56555e573 powerpc: Remove alloc_maybe_bootmem for zalloc version
Replace all remaining callers of alloc_maybe_bootmem with
zalloc_maybe_bootmem.   The callsite in pci_dn is followed with a
memset to clear the memory, and not zeroing at the other callsites
in the celleb fake pci code could lead to following uninitialized
memory as pointers or even freeing said pointers on error paths.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-19 15:30:57 +10:00
Anton Blanchard 40f1ce7fb7 powerpc: Remove ioremap_flags
We have a confusing number of ioremap functions. Make things just a
bit simpler by merging ioremap_flags and ioremap_prot.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-19 14:30:43 +10:00
Anton Blanchard d988f0e3f8 powerpc: Simplify 4k/64k copy_page logic
To make it easier to add optimised versions of copy_page, remove
the 4kB loop for 64kB pages and just do all the work in copy_page.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-19 14:30:42 +10:00
Michael Ellerman b91e136cdf powerpc: Use MSR_64BIT in sstep.c, fix kprobes on BOOK3E
We check MSR_SF a lot in sstep.c, to decide if we need to emulate the
truncation of values when running in 32-bit mode. Factor out that code
into a helper, and convert it and the other uses to use MSR_64BIT.

This fixes a bug on BOOK3E where kprobes would end up returning to a
32-bit address, because regs->nip was truncated, because (msr & MSR_SF)
was false.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-27 14:18:46 +10:00
Michael Ellerman c0337288ab powerpc: Ensure the else case of feature sections will fit
When we create an alternative feature section, the else case must be the
same size or smaller than the body. This is because when we patch the
else case in we just overwrite the body, so there must be room.

Up to now we just did this by inspection, but it's quite easy to enforce
it in the assembler, so we should.

The only change is to add the ifgt block, but that effects the alignment
of the tabs and so the whole macro is modified.

Also add a test, but #if 0 it because we don't want to break the build.
Anyone who's modifying the feature macros should enable the test.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21 14:08:33 +11:00
Anton Blanchard b5f9b6665b powerpc: Hardcode popcnt instructions for old assemblers
The popcnt instructions went into binutils relatively recently. As with a
number of other instructions, create macros and hardcode them.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-12-09 15:35:30 +11:00
Anton Blanchard 64ff312876 powerpc: Add support for popcnt instructions
POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step
this patch does all the work out of line, but it would be nice to implement
them as inlines with an out of line fallback.

The performance issue with hweight was noticed when disabling SMT on a large
(192 thread) POWER7 box. The patch improves that testcase by about 8%.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-29 15:48:17 +11:00
matt mooney 4108d9ba90 powerpc/Makefiles: Change to new flag variables
Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-10-13 16:19:22 +11:00
Sean MacLennan cd64d1697c powerpc: mtmsrd not defined
Replace the BOOK3S_64 specific mtmsrd with the generic MTMSRD macro.
Only enable ldstfp when CONFIG_PPC_FPU is set.

Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02 14:07:34 +10:00
Sean MacLennan 025c0186a0 powerpc: Fix incorrect .stabs entry for copy_32.S
Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02 14:07:34 +10:00
Paul Mackerras 8154c5d22d powerpc: Abstract indexing of lppaca structs
Currently we have the lppaca structs as a simple array of NR_CPUS
entries, taking up space in the data section of the kernel image.
In future we would like to allocate them dynamically, so this
abstracts out the accesses to the array, making it easier to
change how we locate the lppaca for a given cpu in future.
Specifically, lppaca[cpu] changes to lppaca_of(cpu).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02 14:07:31 +10:00