Commit graph

93 commits

Author SHA1 Message Date
Kirill A. Shutemov 76b3aec332 tile: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:19 +09:00
Chris Metcalf 4b12909fd1 tile: remove HUGE_VMAP dead code
A config option to allow a variant vmap() using huge pages that was never
upstreamed had some bits of code related to it scattered around the tile
architecture; the config option was removed downstream and this commit
cleans up the scattered evidence of it from the upstream as well.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-13 11:15:24 -04:00
Chris Metcalf 8629470ef8 tile: use pmd_pfn() instead of casting via pte_t
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-13 11:14:25 -04:00
Johannes Weiner 759496ba64 arch: mm: pass userspace fault flag to generic fault handler
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
Johannes Weiner 94bce453c7 arch: mm: remove obsolete init OOM protection
The memcg code can trap tasks in the context of the failing allocation
until an OOM situation is resolved.  They can hold all kinds of locks
(fs, mm) at this point, which makes it prone to deadlocking.

This series converts memcg OOM handling into a two step process that is
started in the charge context, but any waiting is done after the fault
stack is fully unwound.

Patches 1-4 prepare architecture handlers to support the new memcg
requirements, but in doing so they also remove old cruft and unify
out-of-memory behavior across architectures.

Patch 5 disables the memcg OOM handling for syscalls, readahead, kernel
faults, because they can gracefully unwind the stack with -ENOMEM.  OOM
handling is restricted to user triggered faults that have no other
option.

Patch 6 reworks memcg's hierarchical OOM locking to make it a little
more obvious wth is going on in there: reduce locked regions, rename
locking functions, reorder and document.

Patch 7 implements the two-part OOM handling such that tasks are never
trapped with the full charge stack in an OOM situation.

This patch:

Back before smart OOM killing, when faulting tasks were killed directly on
allocation failures, the arch-specific fault handlers needed special
protection for the init process.

Now that all fault handlers call into the generic OOM killer (see commit
609838cfed: "mm: invoke oom-killer from remaining unconverted page
fault handlers"), which already provides init protection, the
arch-specific leftovers can be removed.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Acked-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
Naoya Horiguchi 83467efbdb mm: migrate: check movability of hugepage in unmap_and_move_huge_page()
Currently hugepage migration works well only for pmd-based hugepages
(mainly due to lack of testing,) so we had better not enable migration of
other levels of hugepages until we are ready for it.

Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
page table walk and check pud/pmd_huge() there, so they are safe.  But the
other users (softoffline and memory hotremove) don't do this, so without
this patch they can try to migrate unexpected types of hugepages.

To prevent this, we introduce hugepage_migration_support() as an
architecture dependent check of whether hugepage are implemented on a pmd
basis or not.  And on some architecture multiple sizes of hugepages are
available, so hugepage_migration_support() also checks hugepage size.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:49 -07:00
Chris Metcalf ce61cdc270 tile: make __write_once a synonym for __read_mostly
This was really only useful for TILE64 when we mapped the
kernel data with small pages. Now we use a huge page and we
really don't want to map different parts of the kernel
data in different ways.

We retain the __write_once name in case we want to bring
it back to life at some point in the future.

Note that this change uncovered a latent bug where the
"smp_topology" variable happened to always be aligned mod 8
so we could store two "int" values at once, but when we
eliminated __write_once it ended up only aligned mod 4.
Fix with an explicit annotation.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-03 14:53:32 -04:00
Chris Metcalf d7c9661115 tile: remove support for TILE64
This chip is no longer being actively developed for (it was superceded
by the TILEPro64 in 2008), and in any case the existing compiler and
toolchain in the community do not support it.  It's unlikely that the
kernel works with TILE64 at this point as the configuration has not been
tested in years.  The support is also awkward as it requires maintaining
a significant number of ifdefs.  So, just remove it altogether.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-03 14:53:29 -04:00
Chris Metcalf 640710a33b tile: add virt_to_kpte() API and clean up and document behavior
We use virt_to_pte(NULL, va) a lot, which isn't very obvious.
I added virt_to_kpte(va) as a more obvious wrapper function,
that also validates the va as being a kernel adddress.

And, I fixed the semantics of virt_to_pte() so that we handle
the pud and pmd the same way, and we now document the fact that
we handle the final pte level differently.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-03 14:52:13 -04:00
Chris Metcalf acbde1db29 tile: parameterize VA and PA space more cleanly
The existing code relied on the hardware definition (<arch/chip.h>)
to specify how much VA and PA space was available.  It's convenient
to allow customizing this for some configurations, so provide symbols
MAX_PA_WIDTH and MAX_VA_WIDTH in <asm/page.h> that can be modified
if desired.

Additionally, move away from the MEM_XX_INTRPT nomenclature to
define the start of various regions within the VA space.  In fact
the cleaner symbol is, for example, MEM_SV_START, to indicate the
start of the area used for supervisor code; the actual address of the
interrupt vectors is not as important, and can be changed if desired.
As part of this change, convert from "intrpt1" nomenclature (which
built in the old privilege-level 1 model) to a simple "intrpt".

Also strip out some tilepro-specific code supporting modifying the
PL the kernel could run at, since we don't actually support using
different PLs in tilepro, only tilegx.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-03 14:47:34 -04:00
Chris Metcalf 051168df52 tile: don't assume user privilege is zero
Technically, user privilege is anything less than kernel
privilege.  We modify the existing user_mode() macro to have
this semantic (and use it in a couple of places it wasn't being
used before), and add an IS_KERNEL_EX1() macro to the assembly
code as well.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-03 14:45:52 -04:00
Chris Metcalf a718e10cba tile: handle super huge pages in virt_to_pte
This tile-specific API had a minor bug, in that if a super huge (>4GB)
page mapped a particular address range, we wouldn't handle it correctly.
As part of fixing that bug, I also cleaned up some of the pud and pmd
accessors to make them more consistent.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-30 11:57:02 -04:00
Chris Metcalf 084fe6a0f5 tile: remove set/clear_fixmap APIs
Nothing in the codebase was using them, and as written they took
"unsigned long" as the physical address rather than "phys_addr_t",
which is wrong on tilepro anyway.  Rather than fixing stale APIs,
just remove them; if there's ever demand for them on this platform,
we can put them back.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-30 11:56:38 -04:00
Tony Lu b2eca4274c tile: support ASLR fully
With this change, tile Linux now supports address-space layout
randomization for shared objects, stack, heap and vdso.

Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Tony Lu <zlu@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-30 11:56:25 -04:00
Tony Lu 3fa17c395b tile: support kprobes on tilegx
This change includes support for Kprobes, Jprobes and Return Probes.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Tony Lu <zlu@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-30 11:55:53 -04:00
Chris Metcalf 9ae0983847 tile: provide traceability for hypervisor calls
This change adds infrastructure (CONFIG_TILE_HVGLUE_TRACE) that
provides C code wrappers for the calls the kernel makes to the Tilera
hypervisor.  This allows standard kernel infrastructure like FTRACE to
be able to instrument hypervisor calls.

To allow direct calls to the true API, we export their names with a
leading underscore as well.  This is important for the few contexts
where we need to make hypervisor calls without touching the stack.

As part of this change, we also switch from creating the symbols
with linker magic to creating them with assembler magic.  This lets
us provide a symbol type and generally make them appear more as symbols
and less as just random values in the Elf namespace.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13 16:26:31 -04:00
Chris Metcalf fad052dc4b tile: avoid struct vm_struct leak
If ioreamp_prot() fails in ioremap_page_range() due to kernel memory
exhaustion, we previously would leak a struct vm_struct.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13 16:26:25 -04:00
Chris Metcalf 4a556f4f56 tile: implement gettimeofday() via vDSO
This change creates the framework for vDSO calls, makes the existing
rt_sigreturn() mechanism use it, and adds a fast gettimeofday().
Now that we need to expose the vDSO address to userspace, we add
AT_SYSINFO_EHDR to the set of aux entries provided to userspace.
(You can disable any extra vDSO support by booting with vdso=0,
but the rt_sigreturn vDSO page will still be provided.)

Note that glibc has supported the tile vDSO since release 2.17.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13 16:26:21 -04:00
Chris Metcalf 0c1d1917c5 tile: support simulator notification for ET_DYN objects
The tile code notifies the simulator of new ET_EXEC objects starting
to execute so that tracing code can properly annotate the objects.
However, we didn't support ET_DYN executables like ld.so, so we
didn't properly load symbols, etc.  This change enables that support;
we use a variant of the SIM_CONTROL_DLOPEN simulator notification
that newer simulators will recognize and use to set the base address
for the next SIM_CONTROL_OS_EXEC notification.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13 16:26:17 -04:00
Chris Metcalf bc1a298f4e tile: support CONFIG_PREEMPT
This change adds support for CONFIG_PREEMPT (full kernel preemption).
In addition to the core support, this change includes a number
of places where we fix up uses of smp_processor_id() and per-cpu
variables.  I also eliminate the PAGE_HOME_HERE and PAGE_HOME_UNKNOWN
values for page homing, as it turns out they weren't being used.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13 16:26:01 -04:00
Chris Metcalf 1182b69cb2 tile: remove calls to arch_flush_lazy_mmu_mode()
Since it's a no-op on tile anyway, there's no reason to be calling
it in tile-specific code.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13 16:25:56 -04:00
Chris Metcalf a0bd12d718 tile: fix some issues in hugepage support
First, in huge_pte_offset(), we were erroneously checking
pgd_present(), which is always true, rather than pud_present(),
which is the thing that tells us if there is a top-level (L0) PTE.
Fixing this means we properly look up huge page entries only when
the Present bit is actually set in the PTE.

Second, use the standard pte_alloc_map() instead of the hand-rolled
pte_alloc_hugetlb() routine that basically was written to avoid
worrying about CONFIG_HIGHPTE.  However, we no longer plan to support
HIGHPTE, so a separate routine was just unnecessary code duplication.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13 16:25:52 -04:00
Chris Metcalf 2f9ac29eec tile: fast-path unaligned memory access for tilegx
This change enables unaligned userspace memory access via a kernel
fast path on tilegx.  The kernel tracks user PC/instruction pairs
per-thread using a direct-mapped cache in userspace.  The cache
maps those PC/instruction pairs to JIT'ed instruction sequences that
load or store using byte-wide load store intructions and then
synthesize 2-, 4- or 8-byte load or store results.  Once an
instruction has been seen to generate an unaligned access once,
subsequent hits on that instruction typically require overhead
of only around 50 cycles if cache and TLB is hot.

We support the prctl() PR_GET_UNALIGN / PR_SET_UNALIGN sys call to
enable or disable unaligned fixups on a per-process basis.

To do this we pull some of the tilepro unaligned support out of the
single_step.c file; tilepro uses instruction disassembly for both
single-step and unaligned access support.  Since tilegx actually has
hardware singlestep support, though, it's cleaner to keep the tilegx
unaligned access code in a separate file.  While we're at it,
properly rename the tilepro-specific types, etc., to have tilepro
suffixes instead of generic tile suffixes.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13 16:04:10 -04:00
Chris Metcalf e5f7bd4353 tile: fix tilegx vmalloc_sync_all BUG_ON
As specified, the test wasn't correct, and in any case it should
be a BUILD_BUG_ON.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-12 14:46:51 -04:00
Michel Lespinasse 98d1e64f95 mm: remove free_area_cache
Since all architectures have been converted to use vm_unmapped_area(),
there is no remaining use for the free_area_cache.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-10 18:11:34 -07:00
Johannes Weiner 609838cfed mm: invoke oom-killer from remaining unconverted page fault handlers
A few remaining architectures directly kill the page faulting task in an
out of memory situation.  This is usually not a good idea since that
task might not even use a significant amount of memory and so may not be
the optimal victim to resolve the situation.

Since 2.6.29's 1c0fe6e ("mm: invoke oom-killer from page fault") there
is a hook that architecture page fault handlers are supposed to call to
invoke the OOM killer and let it pick the right task to kill.  Convert
the remaining architectures over to this hook.

To have the previous behavior of simply taking out the faulting task the
vm.oom_kill_allocating_task sysctl can be set to 1.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>   [arch/arc bits]
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:20 -07:00
Jiang Liu 3f29c33194 mm/tile: prepare for removing num_physpages and simplify mem_init()
Prepare for removing num_physpages and simplify mem_init().

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:37 -07:00
Jiang Liu 40a3b8df7b tile: normalize global variables exported by vmlinux.lds
Normalize global variables exported by vmlinux.lds to conform usage
guidelines from include/asm-generic/sections.h.

1) Use _text to mark the start of the kernel image including the head
text, and _stext to mark the start of the .text section.
2) Export mandatory global variables __init_begin and __init_end.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:34 -07:00
Jiang Liu 0c98853473 mm: concentrate modification of totalram_pages into the mm core
Concentrate code to modify totalram_pages into the mm core, so the arch
memory initialized code doesn't need to take care of it.  With these
changes applied, only following functions from mm core modify global
variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
free_all_bootmem_node(), adjust_managed_page_count().

With this patch applied, it will be much more easier for us to keep
totalram_pages and zone->managed_pages in consistence.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:33 -07:00
Jiang Liu abd1b6d65f mm/tile: use common help functions to free reserved pages
Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:33 -07:00
Joonsoo Kim ef93247325 mm, vmalloc: change iterating a vmlist to find_vm_area()
This patchset removes vm_struct list management after initializing
vmalloc.  Adding and removing an entry to vmlist is linear time
complexity, so it is inefficient.  If we maintain this list, overall
time complexity of adding and removing area to vmalloc space is O(N),
although we use rbtree for finding vacant place and it's time complexity
is just O(logN).

And vmlist and vmlist_lock is used many places of outside of vmalloc.c.
It is preferable that we hide this raw data structure and provide
well-defined function for supporting them, because it makes that they
cannot mistake when manipulating theses structure and it makes us easily
maintain vmalloc layer.

For kexec and makedumpfile, I export vmap_area_list, instead of vmlist.
This comes from Atsushi's recommendation.  For more information, please
refer below link.  https://lkml.org/lkml/2012/12/6/184

This patch:

The purpose of iterating a vmlist is finding vm area with specific virtual
address.  find_vm_area() is provided for this purpose and more efficient,
because it uses a rbtree.  So change it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:33 -07:00
Shaohua Li ec8acf20af swap: add per-partition lock for swapfile
swap_lock is heavily contended when I test swap to 3 fast SSD (even
slightly slower than swap to 2 such SSD).  The main contention comes
from swap_info_get().  This patch tries to fix the gap with adding a new
per-partition lock.

Global data like nr_swapfiles, total_swap_pages, least_priority and
swap_list are still protected by swap_lock.

nr_swap_pages is an atomic now, it can be changed without swap_lock.  In
theory, it's possible get_swap_page() finds no swap pages but actually
there are free swap pages.  But sounds not a big problem.

Accessing partition specific data (like scan_swap_map and so on) is only
protected by swap_info_struct.lock.

Changing swap_info_struct.flags need hold swap_lock and
swap_info_struct.lock, because scan_scan_map() will check it.  read the
flags is ok with either the locks hold.

If both swap_lock and swap_info_struct.lock must be hold, we always hold
the former first to avoid deadlock.

swap_entry_free() can change swap_list.  To delete that code, we add a
new highest_priority_index.  Whenever get_swap_page() is called, we
check it.  If it's valid, we use it.

It's a pity get_swap_page() still holds swap_lock().  But in practice,
swap_lock() isn't heavily contended in my test with this patch (or I can
say there are other much more heavier bottlenecks like TLB flush).  And
BTW, looks get_swap_page() doesn't really need the lock.  We never free
swap_info[] and we check SWAP_WRITEOK flag.  The only risk without the
lock is we could swapout to some low priority swap, but we can quickly
recover after several rounds of swap, so sounds not a big deal to me.
But I'd prefer to fix this if it's a real problem.

"swap: make each swap partition have one address_space" improved the
swapout speed from 1.7G/s to 2G/s.  This patch further improves the
speed to 2.3G/s, so around 15% improvement.  It's a multi-process test,
so TLB flush isn't the biggest bottleneck before the patches.

[arnd@arndb.de: fix it for nommu]
[hughd@google.com: add missing unlock]
[minchan@kernel.org: get rid of lockdep whinge on sys_swapon]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:17 -08:00
Wen Congyang 24d335ca36 memory-hotplug: introduce new arch_remove_memory() for removing page table
For removing memory, we need to remove page tables.  But it depends on
architecture.  So the patch introduce arch_remove_memory() for removing
page table.  Now it only calls __remove_pages().

Note: __remove_pages() for some archtecuture is not implemented
      (I don't know how to implement it for s390).

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Michel Lespinasse c22c0d6344 mm: remove flags argument to mmap_region
After the MAP_POPULATE handling has been moved to mmap_region() call
sites, the only remaining use of the flags argument is to pass the
MAP_NORESERVE flag.  This can be just as easily handled by
do_mmap_pgoff(), so do that and remove the mmap_region() flags
parameter.

[akpm@linux-foundation.org: remove double parens]
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:11 -08:00
Chris Metcalf 7c63e1ee0a tile: export a handful of symbols appropriately
This was shown up by running with "allmodconfig".  I used
EXPORT_SYMBOL() to match existing conventions in files that
were already exporting symbols, or that were exported that way
by other architectures, and otherwise EXPORT_SYMBOL_GPL().

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-02-08 13:20:36 -05:00
Linus Torvalds 9977d9b379 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull big execve/kernel_thread/fork unification series from Al Viro:
 "All architectures are converted to new model.  Quite a bit of that
  stuff is actually shared with architecture trees; in such cases it's
  literally shared branch pulled by both, not a cherry-pick.

  A lot of ugliness and black magic is gone (-3KLoC total in this one):

   - kernel_thread()/kernel_execve()/sys_execve() redesign.

     We don't do syscalls from kernel anymore for either kernel_thread()
     or kernel_execve():

     kernel_thread() is essentially clone(2) with callback run before we
     return to userland, the callbacks either never return or do
     successful do_execve() before returning.

     kernel_execve() is a wrapper for do_execve() - it doesn't need to
     do transition to user mode anymore.

     As a result kernel_thread() and kernel_execve() are
     arch-independent now - they live in kernel/fork.c and fs/exec.c
     resp.  sys_execve() is also in fs/exec.c and it's completely
     architecture-independent.

   - daemonize() is gone, along with its parts in fs/*.c

   - struct pt_regs * is no longer passed to do_fork/copy_process/
     copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

   - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
     still need wrappers (ones with callee-saved registers not saved in
     pt_regs on syscall entry), but the main part of those suckers is in
     kernel/fork.c now."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
  do_coredump(): get rid of pt_regs argument
  print_fatal_signal(): get rid of pt_regs argument
  ptrace_signal(): get rid of unused arguments
  get rid of ptrace_signal_deliver() arguments
  new helper: signal_pt_regs()
  unify default ptrace_signal_deliver
  flagday: kill pt_regs argument of do_fork()
  death to idle_regs()
  don't pass regs to copy_process()
  flagday: don't pass regs to copy_thread()
  bfin: switch to generic vfork, get rid of pointless wrappers
  xtensa: switch to generic clone()
  openrisc: switch to use of generic fork and clone
  unicore32: switch to generic clone(2)
  score: switch to generic fork/vfork/clone
  c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
  take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
  mn10300: switch to generic fork/vfork/clone
  h8300: switch to generic fork/vfork/clone
  tile: switch to generic clone()
  ...

Conflicts:
	arch/microblaze/include/asm/Kbuild
2012-12-12 12:22:13 -08:00
Michel Lespinasse dd5295965b mm: use vm_unmapped_area() in hugetlbfs on tile architecture
Update the tile hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:26 -08:00
Chris Metcalf 6b14e4198c arch/tile: eliminate pt_regs trampolines for syscalls
Using the new current_pt_regs() model, we can remove some trampolines
from assembly code and call directly to the C syscall implementations.
rt_sigreturn() and clone() still need some assembly wrapping, but no
longer are passed a pt_regs pointer.  sigaltstack() and the
tilepro-specific cmpxchg_badaddr() syscalls are now just straight C.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-10-23 16:23:58 -04:00
Shaohua Li 45cac65b0f readahead: fault retry breaks mmap file read random detection
.fault now can retry.  The retry can break state machine of .fault.  In
filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
try, since the page is in page cache now, ra->mmap_miss is decreased.  And
these are done in one fault, so we can't detect random mmap file access.

Add a new flag to indicate .fault is tried once.  In the second try, skip
ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

I only tested x86, didn't test other archs, but looks the change for other
archs is obvious, but who knows :)

Signed-off-by: Shaohua Li <shaohua.li@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:47 +09:00
Konstantin Khlebnikov 2dd8ad81e3 mm: use mm->exe_file instead of first VM_EXECUTABLE vma->vm_file
Some security modules and oprofile still uses VM_EXECUTABLE for retrieving
a task's executable file.  After this patch they will use mm->exe_file
directly.  mm->exe_file is protected with mm->mmap_sem, so locking stays
the same.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>			[arch/tile]
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>	[tomoyo]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:18 +09:00
Linus Torvalds 84eda28060 Merge branch 'kmap_atomic' of git://github.com/congwang/linux
Pull final kmap_atomic cleanups from Cong Wang:
 "This should be the final round of cleanup, as the definitions of enum
  km_type finally get removed from the whole tree.  The patches have
  been in linux-next for a long time."

* 'kmap_atomic' of git://github.com/congwang/linux:
  pipe: remove KM_USER0 from comments
  vmalloc: remove KM_USER0 from comments
  feature-removal-schedule.txt: remove kmap_atomic(page, km_type)
  tile: remove km_type definitions
  um: remove km_type definitions
  asm-generic: remove km_type definitions
  avr32: remove km_type definitions
  frv: remove km_type definitions
  powerpc: remove km_type definitions
  arm: remove km_type definitions
  highmem: remove the deprecated form of kmap_atomic
  tile: remove usage of enum km_type
  frv: remove the second parameter of kmap_atomic_primary()
  jbd2: remove the second argument of kmap_atomic
2012-07-27 11:26:48 -07:00
Cong Wang 61d06c83fc tile: remove usage of enum km_type
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-07-23 14:11:23 +08:00
Chris Metcalf eef015c8aa arch/tile: enable ZONE_DMA for tilegx
This is required for PCI root complex legacy support and USB OHCI root
complex support.  With this change tilegx now supports allocating memory
whose PA fits in 32 bits.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-07-18 16:40:11 -04:00
Chris Metcalf bbaa22c3a0 tilegx pci: support I/O to arbitrarily-cached pages
The tilegx PCI root complex support (currently only in linux-next)
is limited to pages that are homed on cached in the default manner,
i.e. "hash-for-home".  This change supports delivery of I/O data to
pages that are cached in other ways (locally on a particular core,
uncached, user-managed incoherent, etc.).

A large part of the change is supporting flushing pages from cache
on particular homes so that we can transition the data that we are
delivering to or from the device appropriately.  The new homecache_finv*
routines handle this.

Some changes to page_table_range_init() were also required to make
the fixmap code work correctly on tilegx; it hadn't been used there
before.

We also remove some stub mark_caches_evicted_*() routines that
were just no-ops anyway.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-07-18 16:40:05 -04:00
Chris Metcalf 129622672d arch/tile: tilegx PCI root complex support
This change implements PCIe root complex support for tilegx using
the kernel support layer for accessing the TRIO hardware shim.

Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> [changes in 07487f3]
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-07-18 16:39:11 -04:00
Kautuk Consul 4ce6bea220 tile/mm/fault.c: Port OOM changes to handle_page_fault
Commit d065bd810b
(mm: retry page fault when blocking on disk transfer) and
commit 37b23e0525
(x86,mm: make pagefault killable)

The above commits introduced changes into the x86 pagefault handler
for making the page fault handler retryable as well as killable.

These changes reduce the mmap_sem hold time, which is crucial
during OOM killer invocation.

Port these changes to tile.

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
[cmetcalf@tilera.com: initialize "flags" after "write" updated.]
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25 12:48:29 -04:00
Chris Metcalf 621b195515 arch/tile: support multiple huge page sizes dynamically
This change adds support for a new "super" bit in the PTE, using the new
arch_make_huge_pte() method.  The Tilera hypervisor sees the bit set at a
given level of the page table and gangs together 4, 16, or 64 consecutive
pages from that level of the hierarchy to create a larger TLB entry.

One extra "super" page size can be specified at each of the three levels
of the page table hierarchy on tilegx, using the "hugepagesz" argument
on the boot command line.  A new hypervisor API is added to allow Linux
to tell the hypervisor how many PTEs to gang together at each level of
the page table.

To allow pre-allocating huge pages larger than the buddy allocator can
handle, this change modifies the Tilera bootmem support to put all of
memory on tilegx platforms into bootmem.

As part of this change I eliminate the vestigial CONFIG_HIGHPTE support,
which never worked anyway, and eliminate the hv_page_size() API in favor
of the standard vma_kernel_pagesize() API.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25 12:48:27 -04:00
Chris Metcalf d5d14ed6f2 arch/tile: Allow tilegx to build with either 16K or 64K page size
This change introduces new flags for the hv_install_context()
API that passes a page table pointer to the hypervisor.  Clients
can explicitly request 4K, 16K, or 64K small pages when they
install a new context.  In practice, the page size is fixed at
kernel compile time and the same size is always requested every
time a new page table is installed.

The <hv/hypervisor.h> header changes so that it provides more abstract
macros for managing "page" things like PFNs and page tables.  For
example there is now a HV_DEFAULT_PAGE_SIZE_SMALL instead of the old
HV_PAGE_SIZE_SMALL.  The various PFN routines have been eliminated and
only PA- or PTFN-based ones remain (since PTFNs are always expressed
in fixed 2KB "page" size).  The page-table management macros are
renamed with a leading underscore and take page-size arguments with
the presumption that clients will use those macros in some single
place to provide the "real" macros they will use themselves.

I happened to notice the old hv_set_caching() API was totally broken
(it assumed 4KB pages) so I changed it so it would nominally work
correctly with other page sizes.

Tag modules with the page size so you can't load a module built with
a conflicting page size.  (And add a test for SMP while we're at it.)

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25 12:48:24 -04:00
Chris Metcalf 51007004f4 arch/tile: use interrupt critical sections less
In general we want to avoid ever touching memory while within an
interrupt critical section, since the page fault path goes through
a different path from the hypervisor when in an interrupt critical
section, and we carefully decided with tilegx that we didn't need
to support this path in the kernel.  (On tilepro we did implement
that path as part of supporting atomic instructions in software.)

In practice we always need to touch the kernel stack, since that's
where we store the interrupt state before releasing the critical
section, but this change cleans up a few things.  The IRQ_ENABLE
macro is split up so that when we want to enable interrupts in a
deferred way (e.g. for cpu_idle or for interrupt return) we can
read the per-cpu enable mask before entering the critical section.
The cache-migration code is changed to use interrupt masking instead
of interrupt critical sections.  And, the interrupt-entry code is
changed so that we defer loading "tp" from per-cpu data until after
we have released the interrupt critical section.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25 12:48:20 -04:00
Chris Metcalf b1760c847f arch/tile: remove bogus performance optimization
We were re-homing the initial task's kernel stack on the boot cpu,
but in fact it's better to let it stay globally homed, since that
task isn't bound to the boot cpu anyway.  This is more of a general
cleanup than an actual performance optimization, but it removes
code, which is a good thing. :-)

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-04-02 12:13:59 -04:00