Commit graph

39912 commits

Author SHA1 Message Date
Christoph Lameter dc322a99d3 mm: use raw_cpu ops for determining current NUMA node
With the preempt checking logic for __this_cpu_ops we will get false
positives from locations in the code that use numa_node_id.

Before the __this_cpu ops where introduced there were no checks for
preemption present either.  smp_raw_processor_id() was used.  See

  http://www.spinics.net/lists/linux-numa/msg00641.html

Therefore we need to use raw_cpu_read here to avoid false postives.

Note that this issue has been discussed in prior years.  If the process
changes nodes after retrieving the current numa node then that is
acceptable since most uses of numa_node etc are for optimization and not
for correctness.

There were suggestions to implement a raw_numa_node_id in order to do
preempt checks for numa_node_id as well.  But I think we better defer
that to another patch since that would mean investigating how
numa_node_id() is used throughout the kernel which would increase the
scope of this patchset significantly.  After all preemption was never
checked before when numa_node_id() was used.

Some sample traces:

__this_cpu_read operation in preemptible [00000000] code: login/1456
caller is __this_cpu_preempt_check+0x2b/0x2d
CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3-dirty #185
Call Trace:
  dump_stack+0x4e/0x82
  check_preemption_disabled+0xc5/0xe0
  __this_cpu_preempt_check+0x2b/0x2d
  get_task_policy+0x1d/0x49
  get_vma_policy+0x14/0x76
  alloc_pages_vma+0x35/0xff
  handle_mm_fault+0x290/0x73b
  __do_page_fault+0x3fe/0x44d
  do_page_fault+0x9/0xc
  page_fault+0x22/0x30
  generic_file_aio_read+0x38e/0x624
  do_sync_read+0x54/0x73
  vfs_read+0x9d/0x12a
  SyS_read+0x47/0x7e
  cstar_dispatch+0x7/0x23

caller is __this_cpu_preempt_check+0x2b/0x2d
CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3-dirty #185
Call Trace:
  dump_stack+0x4e/0x82
  check_preemption_disabled+0xc5/0xe0
  __this_cpu_preempt_check+0x2b/0x2d
  alloc_pages_current+0x8f/0xbc
  __page_cache_alloc+0xb/0xd
  __do_page_cache_readahead+0xf4/0x219
  ra_submit+0x1c/0x20
  ondemand_readahead+0x28c/0x2b4
  page_cache_sync_readahead+0x38/0x3a
  generic_file_aio_read+0x261/0x624
  do_sync_read+0x54/0x73
  vfs_read+0x9d/0x12a
  SyS_read+0x47/0x7e
  cstar_dispatch+0x7/0x23

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:13 -07:00
Christoph Lameter b3ca1c10d7 percpu: add raw_cpu_ops
The kernel has never been audited to ensure that this_cpu operations are
consistently used throughout the kernel.  The code generated in many
places can be improved through the use of this_cpu operations (which
uses a segment register for relocation of per cpu offsets instead of
performing address calculations).

The patch set also addresses various consistency issues in general with
the per cpu macros.

A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only
   because checks are skipped. This is typically shown through a raw_
   prefix. So this patch set changes the places where __this_cpu_ptr()
   is used to raw_cpu_ptr().

B. There has been the long term wish by some that __this_cpu operations
   would check for preemption. However, there are cases where preemption
   checks need to be skipped. This patch set adds raw_cpu operations that
   do not check for preemption and then adds preemption checks to the
   __this_cpu operations.

C. The use of __get_cpu_var is always a reference to a percpu variable
   that can also be handled via a this_cpu operation. This patch set
   replaces all uses of __get_cpu_var with this_cpu operations.

D. We can then use this_cpu RMW operations in various places replacing
   sequences of instructions by a single one.

E. The use of this_cpu operations throughout will allow other arches than
   x86 to implement optimized references and RMV operations to work with
   per cpu local data.

F. The use of this_cpu operations opens up the possibility to
   further optimize code that relies on synchronization through
   per cpu data.

The patch set works in a couple of stages:

I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
    Also converts the existing __this_cpu_xx_# primitive in the x86
    code to raw_cpu_xx_#.

II. Patch 2-4 use the raw_cpu operations in places that would give
     us false positives once they are enabled.

III. Patch 5 adds preemption checks to __this_cpu operations to allow
    checking if preemption is properly disabled when these functions
    are used.

IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var
   with this_cpu_ptr. They do not depend on any changes to the percpu
   code. No preemption tests are skipped if they are applied.

V. Patches 21-46 are conversion patches that use this_cpu operations
   in various kernel subsystems/drivers or arch code.

VI.  Patches 47/48 (not included in this series) remove no longer used
    functions (__this_cpu_ptr and __get_cpu_var).  These should only be
    applied after all the conversion patches have made it and after we
    have done additional passes through the kernel to ensure that none of
    the uses of these functions remain.

This patch (of 46):

The patches following this one will add preemption checks to __this_cpu
ops so we need to have an alternative way to use this_cpu operations
without preemption checks.

raw_cpu_ops will be the basis for all other ops since these will be the
operations that do not implement any checks.

Primitive operations are renamed by this patch from __this_cpu_xxx to
raw_cpu_xxxx.

Also change the uses of the x86 percpu primitives in preempt.h.
These depend directly on asm/percpu.h (header #include nesting issue).

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bryan Wu <cooloney@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: David Daney <david.daney@cavium.com>
Cc: David Miller <davem@davemloft.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Robert Richter <rric@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:13 -07:00
Vladimir Davydov 9a41707bd3 slub: rework sysfs layout for memcg caches
Currently, we try to arrange sysfs entries for memcg caches in the same
manner as for global caches.  Apart from turning /sys/kernel/slab into a
mess when there are a lot of kmem-active memcgs created, it actually
does not work properly - we won't create more than one link to a memcg
cache in case its parent is merged with another cache.  For instance, if
A is a root cache merged with another root cache B, we will have the
following sysfs setup:

  X
  A -> X
  B -> X

where X is some unique id (see create_unique_id()).  Now if memcgs M and
N start to allocate from cache A (or B, which is the same), we will get:

  X
  X:M
  X:N
  A -> X
  B -> X
  A:M -> X:M
  A:N -> X:N

Since B is an alias for A, we won't get entries B:M and B:N, which is
confusing.

It is more logical to have entries for memcg caches under the
corresponding root cache's sysfs directory.  This would allow us to keep
sysfs layout clean, and avoid such inconsistencies like one described
above.

This patch does the trick.  It creates a "cgroup" kset in each root
cache kobject to keep its children caches there.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Glauber Costa <glommer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:13 -07:00
Vladimir Davydov b8529907ba memcg, slab: do not destroy children caches if parent has aliases
Currently we destroy children caches at the very beginning of
kmem_cache_destroy().  This is wrong, because the root cache will not
necessarily be destroyed in the end - if it has aliases (refcount > 0),
kmem_cache_destroy() will simply decrement its refcount and return.  In
this case, at best we will get a bunch of warnings in dmesg, like this
one:

  kmem_cache_destroy kmalloc-32:0: Slab cache still has objects
  CPU: 1 PID: 7139 Comm: modprobe Tainted: G    B   W    3.13.0+ #117
  Call Trace:
    dump_stack+0x49/0x5b
    kmem_cache_destroy+0xdf/0xf0
    kmem_cache_destroy_memcg_children+0x97/0xc0
    kmem_cache_destroy+0xf/0xf0
    xfs_mru_cache_uninit+0x21/0x30 [xfs]
    exit_xfs_fs+0x2e/0xc44 [xfs]
    SyS_delete_module+0x198/0x1f0
    system_call_fastpath+0x16/0x1b

At worst - if kmem_cache_destroy() will race with an allocation from a
memcg cache - the kernel will panic.

This patch fixes this by moving children caches destruction after the
check if the cache has aliases.  Plus, it forbids destroying a root
cache if it still has children caches, because each children cache keeps
a reference to its parent.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Glauber Costa <glommer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:13 -07:00
Vladimir Davydov 794b1248be memcg, slab: separate memcg vs root cache creation paths
Memcg-awareness turned kmem_cache_create() into a dirty interweaving of
memcg-only and except-for-memcg calls.  To clean this up, let's move the
code responsible for memcg cache creation to a separate function.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Glauber Costa <glommer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:12 -07:00
Vladimir Davydov 5722d094ad memcg, slab: cleanup memcg cache creation
This patch cleans up the memcg cache creation path as follows:

- Move memcg cache name creation to a separate function to be called
  from kmem_cache_create_memcg().  This allows us to get rid of the mutex
  protecting the temporary buffer used for the name formatting, because
  the whole cache creation path is protected by the slab_mutex.

- Get rid of memcg_create_kmem_cache().  This function serves as a proxy
  to kmem_cache_create_memcg().  After separating the cache name creation
  path, it would be reduced to a function call, so let's inline it.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Glauber Costa <glommer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:12 -07:00
Uwe Kleine-König ce816fa88c Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP
If the renamed symbol is defined lib/iomap.c implements ioport_map and
ioport_unmap and currently (nearly) all platforms define the port
accessor functions outb/inb and friend unconditionally.  So
HAS_IOPORT_MAP is the better name for this.

Consequently NO_IOPORT is renamed to NO_IOPORT_MAP.

The motivation for this change is to reintroduce a symbol HAS_IOPORT
that signals if outb/int et al are available.  I will address that at
least one merge window later though to keep surprises to a minimum and
catch new introductions of (HAS|NO)_IOPORT.

The changes in this commit were done using:

	$ git grep -l -E '(NO|HAS)_IOPORT' | xargs perl -p -i -e 's/\b((?:CONFIG_)?(?:NO|HAS)_IOPORT)\b/$1_MAP/'

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:11 -07:00
Alexandre Bounine 2aaf308b95 rapidio: rework device hierarchy and introduce mport class of devices
This patch removes an artificial RapidIO bus root device and establishes
actual device hierarchy by providing reference to real parent devices.
It also introduces device class for RapidIO controller devices (on-chip
or an eternal bridge, known as "mport").

Existing implementation was sufficient for SoC-based platforms that have
a single RapidIO controller.  With introduction of devices using
multiple RapidIO controllers and PCIe-to-RapidIO bridges the old scheme
is very limiting or does not work at all.  The implemented changes allow
to properly reference platform's local RapidIO mport devices and provide
device details needed for upper layers.

This change to RapidIO device hierarchy does not break any known
existing kernel or user space interfaces.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Cc: Jerry Jacobs <jerry.jacobs@prodrive-technologies.com>
Cc: Arno Tiemersma <arno.tiemersma@prodrive-technologies.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:07 -07:00
Stephen Hemminger 90ae3ae539 idr: remove dead code
Remove no longer used deprecated code, and make local functions
static.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: George Spelvin <linux@horizon.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:07 -07:00
Rashika Kheria 82e0703b6c include/linux/crash_dump.h: add vmcore_cleanup() prototype
Eliminate the following warning in proc/vmcore.c:

  fs/proc/vmcore.c:1088:6: warning: no previous prototype for `vmcore_cleanup' [-Wmissing-prototypes]

[akpm@linux-foundation.org: clean up powerpc, remove unneeded EXPORT_SYMBOL]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:06 -07:00
Oleg Nesterov ad86622b47 wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space
get_task_state() uses the most significant bit to report the state to
user-space, this means that EXIT_ZOMBIE->EXIT_TRACE->EXIT_DEAD transition
can be noticed via /proc as Z -> X -> Z change.  Note that this was
possible even before EXIT_TRACE was introduced.

This is not really bad but imho it make sense to hide EXIT_TRACE from
user-space completely.  So the patch simply swaps EXIT_ZOMBIE and
EXIT_DEAD, this way EXIT_TRACE will be seen as EXIT_ZOMBIE by user-space.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michal Schmidt <mschmidt@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Lennart Poettering <lpoetter@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:06 -07:00
Oleg Nesterov abd50b39e7 wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition
wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock.  If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.

The last transition is racy, this is even documented in 50b8d25748
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race".  wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else.  So
the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.  This was fixed by
the previous commit, but it was the temporary hack.

1. Add the new exit_state, EXIT_TRACE. It means that the task is the
   traced zombie, debugger is going to detach and notify its natural
   parent.

   This new state is actually EXIT_ZOMBIE | EXIT_DEAD. This way we
   can avoid the changes in proc/kgdb code, get_task_state() still
   reports "X (dead)" in this case.

   Note: with or without this change userspace can see Z -> X -> Z
   transition. Not really bad, but probably makes sense to fix.

2. Change wait_task_zombie() to use EXIT_TRACE instead of EXIT_DEAD
   if we need to notify the ->real_parent.

3. Revert the previous hack in reparent_leader(), now that EXIT_DEAD
   is always the final state we can safely ignore such a task.

4. Change wait_consider_task() to check EXIT_TRACE separately and kill
   the racy and no longer needed ptrace_reparented() case.

   If ptrace == T an EXIT_TRACE thread should be simply ignored, the
   owner of this state is going to ptrace_unlink() this task. We can
   pretend that it was already removed from ->ptraced list.

   Otherwise we should skip this thread too but clear ->notask_error,
   we must be the natural parent and debugger is going to untrace and
   notify us. IOW, this doesn't differ from "EXIT_ZOMBIE && p->ptrace"
   even if the task was already untraced.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Lennart Poettering <lpoetter@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:05 -07:00
Oleg Nesterov 23aebe1691 exec: kill bprm->tcomm[], simplify the "basename" logic
Starting from commit c4ad8f98be ("execve: use 'struct filename *' for
executable name passing") bprm->filename can not go away after
flush_old_exec(), so we do not need to save the binary name in
bprm->tcomm[] added by 96e02d1586 ("exec: fix use-after-free bug in
setup_new_exec()").

And there was never need for filename_to_taskname-like code, we can
simply do set_task_comm(kbasename(filename).

This patch has to change set_task_comm() and trace_task_rename() to
accept "const char *", but I think this change is also good.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:05 -07:00
Srikar Dronamraju 834a964a09 numa: use LAST_CPUPID_SHIFT to calculate LAST_CPUPID_MASK
LAST_CPUPID_MASK is calculated using LAST_CPUPID_WIDTH.  However
LAST_CPUPID_WIDTH itself can be 0.  (when LAST_CPUPID_NOT_IN_PAGE_FLAGS is
set).  In such a case LAST_CPUPID_MASK turns out to be 0.

But with recent commit 1ae71d0319: (mm: numa: bugfix for
LAST_CPUPID_NOT_IN_PAGE_FLAGS) if LAST_CPUPID_MASK is 0,
page_cpupid_xchg_last() and page_cpupid_reset_last() causes
page->_last_cpupid to be set to 0.

This causes performance regression. Its almost as if numa_balancing is
off.

Fix LAST_CPUPID_MASK by using LAST_CPUPID_SHIFT instead of
LAST_CPUPID_WIDTH.

Some performance numbers and perf stats with and without the fix.

(3.14-rc6)
----------
numa01

 Performance counter stats for '/usr/bin/time -f %e %S %U %c %w -o start_bench.out -a ./numa01':

         12,27,462 cs                                                           [100.00%]
          2,41,957 migrations                                                   [100.00%]
       1,68,01,713 faults                                                       [100.00%]
    7,99,35,29,041 cache-misses
            98,808 migrate:mm_migrate_pages                                     [100.00%]

    1407.690148814 seconds time elapsed

numa02

 Performance counter stats for '/usr/bin/time -f %e %S %U %c %w -o start_bench.out -a ./numa02':

            63,065 cs                                                           [100.00%]
            14,364 migrations                                                   [100.00%]
          2,08,118 faults                                                       [100.00%]
      25,32,59,404 cache-misses
                12 migrate:mm_migrate_pages                                     [100.00%]

      63.840827219 seconds time elapsed

(3.14-rc6 with fix)
-------------------
numa01

 Performance counter stats for '/usr/bin/time -f %e %S %U %c %w -o start_bench.out -a ./numa01':

          9,68,911 cs                                                           [100.00%]
          1,01,414 migrations                                                   [100.00%]
         88,38,697 faults                                                       [100.00%]
    4,42,92,51,042 cache-misses
          4,25,060 migrate:mm_migrate_pages                                     [100.00%]

     685.965331189 seconds time elapsed

numa02

 Performance counter stats for '/usr/bin/time -f %e %S %U %c %w -o start_bench.out -a ./numa02':

            17,543 cs                                                           [100.00%]
             2,962 migrations                                                   [100.00%]
          1,17,843 faults                                                       [100.00%]
      11,80,61,644 cache-misses
            12,358 migrate:mm_migrate_pages                                     [100.00%]

      20.380132343 seconds time elapsed

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:58 -07:00
Fabian Frederick 29f175d125 mm/readahead.c: inline ra_submit
Commit f9acc8c7b3 ("readahead: sanify file_ra_state names") left
ra_submit with a single function call.

Move ra_submit to internal.h and inline it to save some stack.  Thanks
to Andrew Morton for commenting different versions.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:58 -07:00
Miklos Szeredi ed6d7c8e57 mm: remove unused arg of set_page_dirty_balance()
There's only one caller of set_page_dirty_balance() and that will call it
with page_mkwrite == 0.

The page_mkwrite argument was unused since commit b827e496c8 "mm: close
page_mkwrite races".

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:57 -07:00
Michal Hocko d715ae08f2 memcg: rename high level charging functions
mem_cgroup_newpage_charge is used only for charging anonymous memory so
it is better to rename it to mem_cgroup_charge_anon.

mem_cgroup_cache_charge is used for file backed memory so rename it to
mem_cgroup_charge_file.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:57 -07:00
Johannes Weiner df38197546 memcg: get_mem_cgroup_from_mm()
Instead of returning NULL from try_get_mem_cgroup_from_mm() when the mm
owner is exiting, just return root_mem_cgroup.  This makes sense for all
callsites and gets rid of some of them having to fallback manually.

[fengguang.wu@intel.com: fix warnings]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:56 -07:00
Kirill A. Shutemov d230dec18d mm: use 'const char *' insted of 'char *' for reason in dump_page()
I tried to use 'dump_page(page, __func__)' for debugging, but it triggers
warning:

  warning: passing argument 2 of `dump_page' discards `const' qualifier from pointer target type [enabled by default]

Let's convert 'reason' to 'const char *' in dump_page() and friends: we
shouldn't modify it anyway.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:55 -07:00
David Rientjes 539a13b47e res_counter: remove interface for locked charging and uncharging
The res_counter_{charge,uncharge}_locked() variants are not used in the
kernel outside of the resource counter code itself, so remove the
interface.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:54 -07:00
David Rientjes f0432d1596 mm, mempolicy: remove per-process flag
PF_MEMPOLICY is an unnecessary optimization for CONFIG_SLAB users.
There's no significant performance degradation to checking
current->mempolicy rather than current->flags & PF_MEMPOLICY in the
allocation path, especially since this is considered unlikely().

Running TCP_RR with netperf-2.4.5 through localhost on 16 cpu machine with
64GB of memory and without a mempolicy:

	threads		before		after
	16		1249409		1244487
	32		1281786		1246783
	48		1239175		1239138
	64		1244642		1241841
	80		1244346		1248918
	96		1266436		1254316
	112		1307398		1312135
	128		1327607		1326502

Per-process flags are a scarce resource so we should free them up whenever
possible and make them available.  We'll be using it shortly for memcg oom
reserves.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:54 -07:00
David Rientjes 2a389610a7 mm, mempolicy: rename slab_node for clarity
slab_node() is actually a mempolicy function, so rename it to
mempolicy_slab_node() to make it clearer that it used for processes with
mempolicies.

At the same time, cleanup its code by saving numa_mem_id() in a local
variable (since we require a node with memory, not just any node) and
remove an obsolete comment that assumes the mempolicy is actually passed
into the function.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Tim Hockin <thockin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:54 -07:00
Davidlohr Bueso 615d6e8756 mm: per-thread vma caching
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme does improve ~50% hit rate by just adding a few more slots to
   the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:53 -07:00
Kirill A. Shutemov f1820361f8 mm: implement ->map_pages for page cache
filemap_map_pages() is generic implementation of ->map_pages() for
filesystems who uses page cache.

It should be safe to use filemap_map_pages() for ->map_pages() if
filesystem use filemap_fault() for ->fault().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ning Qu <quning@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:53 -07:00
Kirill A. Shutemov 8c6e50b029 mm: introduce vm_ops->map_pages()
Here's new version of faultaround patchset.  It took a while to tune it
and collect performance data.

First patch adds new callback ->map_pages to vm_operations_struct.

->map_pages() is called when VM asks to map easy accessible pages.
Filesystem should find and map pages associated with offsets from
"pgoff" till "max_pgoff".  ->map_pages() is called with page table
locked and must not block.  If it's not possible to reach a page without
blocking, filesystem should skip it.  Filesystem should use do_set_pte()
to setup page table entry.  Pointer to entry associated with offset
"pgoff" is passed in "pte" field in vm_fault structure.  Pointers to
entries for other offsets should be calculated relative to "pte".

Currently VM use ->map_pages only on read page fault path.  We try to
map FAULT_AROUND_PAGES a time.  FAULT_AROUND_PAGES is 16 for now.
Performance data for different FAULT_AROUND_ORDER is below.

TODO:
 - implement ->map_pages() for shmem/tmpfs;
 - modify get_user_pages() to be able to use ->map_pages() and implement
   mmap(MAP_POPULATE|MAP_NONBLOCK) on top.

=========================================================================
Tested on 4-socket machine (120 threads) with 128GiB of RAM.

Few real-world workloads. The sweet spot for FAULT_AROUND_ORDER here is
somewhere between 3 and 5. Let's say 4 :)

Linux build (make -j60)
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
	minor-faults		283,301,572	247,151,987	212,215,789	204,772,882	199,568,944	194,703,779	193,381,485
	time, seconds		151.227629483	153.920996480	151.356125472	150.863792049	150.879207877	151.150764954	151.450962358
Linux rebuild (make -j60)
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
	minor-faults		5,396,854	4,148,444	2,855,286	2,577,282	2,361,957	2,169,573	2,112,643
	time, seconds		27.404543757	27.559725591	27.030057426	26.855045126	26.678618635	26.974523490	26.761320095
Git test suite (make -j60 test)
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
	minor-faults		129,591,823	99,200,751	66,106,718	57,606,410	51,510,808	45,776,813	44,085,515
	time, seconds		66.087215026	64.784546905	64.401156567	65.282708668	66.034016829	66.793780811	67.237810413

Two synthetic tests: access every word in file in sequential/random order.
It doesn't improve much after FAULT_AROUND_ORDER == 4.

Sequential access 16GiB file
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
 1 thread
	minor-faults		4,195,437	2,098,275	525,068		262,251		131,170		32,856		8,282
	time, seconds		7.250461742	6.461711074	5.493859139	5.488488147	5.707213983	5.898510832	5.109232856
 8 threads
	minor-faults		33,557,540	16,892,728	4,515,848	2,366,999	1,423,382	442,732		142,339
	time, seconds		16.649304881	9.312555263	6.612490639	6.394316732	6.669827501	6.75078944	6.371900528
 32 threads
	minor-faults		134,228,222	67,526,810	17,725,386	9,716,537	4,763,731	1,668,921	537,200
	time, seconds		49.164430543	29.712060103	12.938649729	10.175151004	11.840094583	9.594081325	9.928461797
 60 threads
	minor-faults		251,687,988	126,146,952	32,919,406	18,208,804	10,458,947	2,733,907	928,217
	time, seconds		86.260656897	49.626551828	22.335007632	17.608243696	16.523119035	16.339489186	16.326390902
 120 threads
	minor-faults		503,352,863	252,939,677	67,039,168	35,191,827	19,170,091	4,688,357	1,471,862
	time, seconds		124.589206333	79.757867787	39.508707872	32.167281632	29.972989292	28.729834575	28.042251622
Random access 1GiB file
 1 thread
	minor-faults		262,636		132,743		34,369		17,299		8,527		3,451		1,222
	time, seconds		15.351890914	16.613802482	16.569227308	15.179220992	16.557356122	16.578247824	15.365266994
 8 threads
	minor-faults		2,098,948	1,061,871	273,690		154,501		87,110		25,663		7,384
	time, seconds		15.040026343	15.096933500	14.474757288	14.289129964	14.411537468	14.296316837	14.395635804
 32 threads
	minor-faults		8,390,734	4,231,023	1,054,432	528,847		269,242		97,746		26,881
	time, seconds		20.430433109	21.585235358	22.115062928	14.872878951	14.880856305	14.883370649	14.821261690
 60 threads
	minor-faults		15,733,258	7,892,809	1,973,393	988,266		594,789		164,994		51,691
	time, seconds		26.577302548	25.692397770	18.728863715	20.153026398	21.619101933	17.745086260	17.613215273
 120 threads
	minor-faults		31,471,111	15,816,616	3,959,209	1,978,685	1,008,299	264,635		96,010
	time, seconds		41.835322703	40.459786095	36.085306105	35.313894834	35.814445675	36.552633793	34.289210594

Touch only one page in page table in 16GiB file
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
 1 thread
	minor-faults		8,372		8,324		8,270		8,260		8,249		8,239		8,237
	time, seconds		0.039892712	0.045369149	0.051846126	0.063681685	0.079095975	0.17652406	0.541213386
 8 threads
	minor-faults		65,731		65,681		65,628		65,620		65,608		65,599		65,596
	time, seconds		0.124159196	0.488600638	0.156854426	0.191901957	0.242631486	0.543569456	1.677303984
 32 threads
	minor-faults		262,388		262,341		262,285		262,276		262,266		262,257		263,183
	time, seconds		0.452421421	0.488600638	0.565020946	0.648229739	0.789850823	1.651584361	5.000361559
 60 threads
	minor-faults		491,822		491,792		491,723		491,711		491,701		491,691		491,825
	time, seconds		0.763288616	0.869620515	0.980727360	1.161732354	1.466915814	3.04041448	9.308612938
 120 threads
	minor-faults		983,466		983,655		983,366		983,372		983,363		984,083		984,164
	time, seconds		1.595846553	1.667902182	2.008959376	2.425380942	2.941368804	5.977807890	18.401846125

This patch (of 2):

Introduce new vm_ops callback ->map_pages() and uses it for mapping easy
accessible pages around fault address.

On read page fault, if filesystem provides ->map_pages(), we try to map up
to FAULT_AROUND_PAGES pages around page fault address in hope to reduce
number of minor page faults.

We call ->map_pages first and use ->fault() as fallback if page by the
offset is not ready to be mapped (cold page cache or something).

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ning Qu <quning@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:52 -07:00
Alex Thorlton a0715cc226 mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE
Add VM_INIT_DEF_MASK, to allow us to set the default flags for VMs.  It
also adds a prctl control which allows us to set the THP disable bit in
mm->def_flags so that VMs will pick up the setting as they are created.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:52 -07:00
Linus Torvalds 467a9e1633 CPU hotplug notifiers registration fixes for 3.15-rc1
The purpose of this single series of commits from Srivatsa S Bhat (with
 a small piece from Gautham R Shenoy) touching multiple subsystems that use
 CPU hotplug notifiers is to provide a way to register them that will not
 lead to deadlocks with CPU online/offline operations as described in the
 changelog of commit 93ae4f978c (CPU hotplug: Provide lockless versions
 of callback registration functions).
 
 The first three commits in the series introduce the API and document it
 and the rest simply goes through the users of CPU hotplug notifiers and
 converts them to using the new method.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTQow2AAoJEILEb/54YlRxW4QQAJlYRDUzwFJzJzYhltQYuVR+
 4D74XMtvXgoJfg3cwdSWvMKKpJZnA9BVN0f7Hcx9wYmgdexYUuHeZJmMNyc3S2+g
 KjKBIsugvgmZhHbbLd6TJ6GBbhGT5JLt9VmSfL9zIkveInU1YHFUUqL/mxdHm4J0
 BSGKjk2rN3waRJgmY+xfliFLtQjDKFwJpMuvrgtoUyfas3f4sIV43UNbqdvA/weJ
 rzedxXOlKH/id4b56lj/4iIzcoL3mwvJJ7r6n0CEMsKv87z09kqR0O+69Tsq/cgs
 j17CsvoJOmZGk3QTeKVMQWBsvk6aPoDu3zK83gLbQMt+qjOpSTbJLz/3HZw4/TrW
 ss4nuZne1DLMGS+6hoxYbTP+6Ni//Kn+l/LrHc5jb7m1X3lMO4W2aV3IROtIE1rv
 lEP1IG01NU4u9YwkVj1dyhrkSp8tLPul4SrUK8W+oNweOC5crjJV7vJbIPJgmYiM
 IZN55wln0yVRtR4TX+rmvN0PixsInE8MeaVCmReApyF9pdzul/StxlBze5BKLSJD
 cqo1kNPpsmdxoDucqUpQ/gSvy+IOl2qnlisB5PpV93sk7De6TFDYrGHxjYIW7jMf
 StXwdCDDQhzd2Q8Kfpp895A1dbIl8rKtwA6bTU2eX+BfMVFzuMdT44cvosx1+UdQ
 sWl//rg76nb13dFjvF+q
 =SW7Q
 -----END PGP SIGNATURE-----

Merge tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
 "The purpose of this single series of commits from Srivatsa S Bhat
  (with a small piece from Gautham R Shenoy) touching multiple
  subsystems that use CPU hotplug notifiers is to provide a way to
  register them that will not lead to deadlocks with CPU online/offline
  operations as described in the changelog of commit 93ae4f978c ("CPU
  hotplug: Provide lockless versions of callback registration
  functions").

  The first three commits in the series introduce the API and document
  it and the rest simply goes through the users of CPU hotplug notifiers
  and converts them to using the new method"

* tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
  net/iucv/iucv.c: Fix CPU hotplug callback registration
  net/core/flow.c: Fix CPU hotplug callback registration
  mm, zswap: Fix CPU hotplug callback registration
  mm, vmstat: Fix CPU hotplug callback registration
  profile: Fix CPU hotplug callback registration
  trace, ring-buffer: Fix CPU hotplug callback registration
  xen, balloon: Fix CPU hotplug callback registration
  hwmon, via-cputemp: Fix CPU hotplug callback registration
  hwmon, coretemp: Fix CPU hotplug callback registration
  thermal, x86-pkg-temp: Fix CPU hotplug callback registration
  octeon, watchdog: Fix CPU hotplug callback registration
  oprofile, nmi-timer: Fix CPU hotplug callback registration
  intel-idle: Fix CPU hotplug callback registration
  clocksource, dummy-timer: Fix CPU hotplug callback registration
  drivers/base/topology.c: Fix CPU hotplug callback registration
  acpi-cpufreq: Fix CPU hotplug callback registration
  zsmalloc: Fix CPU hotplug callback registration
  scsi, fcoe: Fix CPU hotplug callback registration
  scsi, bnx2fc: Fix CPU hotplug callback registration
  scsi, bnx2i: Fix CPU hotplug callback registration
  ...
2014-04-07 14:55:46 -07:00
Arnd Bergmann b8780c363d sched: remove sleep_on() and friends
This is the final piece in the puzzle, as all patches to remove the
last users of \(interruptible_\|\)sleep_on\(_timeout\|\) have made it
into the 3.15 merge window. The work was long overdue, and this
interface in particular should not have survived the BKL removal
that was done a couple of years ago.

Citing Jon Corbet from http://lwn.net/2001/0201/kernel.php3":

 "[...] it was suggested that the janitors look for and fix all code
  that calls sleep_on() [...] since (1) almost all such code is
  incorrect, and (2) Linus has agreed that those functions should
  be removed in the 2.5 development series".

We haven't quite made it for 2.5, but maybe we can merge this for 3.15.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 11:24:06 -07:00
Linus Torvalds 240cd6a817 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "The biggest chunk is a series of patches from Ilya that add support
  for new Ceph osd and crush map features, including some new tunables,
  primary affinity, and the new encoding that is needed for erasure
  coding support.  This brings things into parity with the server side
  and the looming firefly release.  There is also support for allocation
  hints in RBD that help limit fragmentation on the server side.

  There is also a series of patches from Zheng fixing NFS reexport,
  directory fragmentation support, flock vs fnctl behavior, and some
  issues with clustered MDS.

  Finally, there are some miscellaneous fixes from Yunchuan Wen for
  fscache, Fabian Frederick for ACLs, and from me for fsync(dirfd)
  behavior"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (79 commits)
  ceph: skip invalid dentry during dcache readdir
  libceph: dump pool {read,write}_tier to debugfs
  libceph: output primary affinity values on osdmap updates
  ceph: flush cap release queue when trimming session caps
  ceph: don't grabs open file reference for aborted request
  ceph: drop extra open file reference in ceph_atomic_open()
  ceph: preallocate buffer for readdir reply
  libceph: enable PRIMARY_AFFINITY feature bit
  libceph: redo ceph_calc_pg_primary() in terms of ceph_calc_pg_acting()
  libceph: add support for osd primary affinity
  libceph: add support for primary_temp mappings
  libceph: return primary from ceph_calc_pg_acting()
  libceph: switch ceph_calc_pg_acting() to new helpers
  libceph: introduce apply_temps() helper
  libceph: introduce pg_to_raw_osds() and raw_to_up_osds() helpers
  libceph: ceph_can_shift_osds(pool) and pool type defines
  libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions
  libceph: enable OSDMAP_ENC feature bit
  libceph: primary_affinity decode bits
  libceph: primary_affinity infrastructure
  ...
2014-04-07 11:09:13 -07:00
Jon Mason 53ca4fea0b NTB: Code Style Clean-up
Some white space and 80 char overruns corrected.

Signed-off-by: Jon Mason <jon.mason@intel.com>
2014-04-07 10:59:19 -07:00
Jon Mason 403c63cb6d NTB: client event cleanup
Provide a better event interface between the client and transport

Signed-off-by: Jon Mason <jon.mason@intel.com>
2014-04-07 10:59:19 -07:00
Linus Torvalds 3021112598 f2fs updates for v3.15
This patch-set includes the following major enhancement patches.
  o introduce large directory support
  o introduce f2fs_issue_flush to merge redundant flush commands
  o merge write IOs as much as possible aligned to the segment
  o add sysfs entries to tune the f2fs configuration
  o use radix_tree for the free_nid_list to reduce in-memory operations
  o remove costly bit operations in f2fs_find_entry
  o enhance the readahead flow for CP/NAT/SIT/SSA blocks
 
 The other bug fixes are as follows.
  o recover xattr node blocks correctly after sudden-power-cut
  o fix to calculate the maximum number of node ids
  o enhance to handle many error cases
 
 And, there are a bunch of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTQiQrAAoJEEAUqH6CSFDSlbIP/iq06BrUeMDLoQFhA2GQFKFD
 wd0A5h9hCiFcKBcI/u/aAQqj/a5wdwzDl9XzH2PzJ45IM6sVGQZ0lv+kdLhab6rk
 ipNbV7G0yLAX+8ygS6GZF7pSKfMzGSGTrRvfdtoiunIip1jCY1IkUxv1XMgBSPza
 wnWYrE5HXEqRUDCqPXJyxrPmx0/0jw8/V82Ng9stnY34ySs+l/3Pvg65Kh0QuSSy
 BRjJUGlOCF68KUBKd+6YB2T5KlbQde3/5lhP+GMOi+xm5sFB+j+59r/WpJpF2Nxs
 ImxQs5GkiU01ErH/rn5FgHY/zzddQenBKwOvrjEeUA1eVpBurdsIr1JN0P6qDbgB
 ho5U8LzCQq+HZiW444eQGkXSOagpUKqDhTVJO7Fji/wG88Atc9gLX3ix8TH2skxT
 C5CvvrJM7DKBtkZyTzotKY/cWorOZhge6E/EkbGaM1sSHdK5b1Rg4YlFi9TDyz0n
 QjGD1uuvEeukeKGdIG9pjc7o5ledbMDYwLpT2RuRXenLOTsn8BqDOo9aRTg+5Kag
 tJNJLFumjPR2mEBNKjicJMUf381J/SKDwZszAz9mgvCZXldMza/Ax0LzJDJCVmkP
 UuBiVzGxVzpd33IsESUDr0J9hc+t8kS10jfAeKnE3cpb6n7/RYxstHh6CHOFKNXM
 gPUSYPN3CYiP47DnSfzA
 =eSW+
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - introduce large directory support
   - introduce f2fs_issue_flush to merge redundant flush commands
   - merge write IOs as much as possible aligned to the segment
   - add sysfs entries to tune the f2fs configuration
   - use radix_tree for the free_nid_list to reduce in-memory operations
   - remove costly bit operations in f2fs_find_entry
   - enhance the readahead flow for CP/NAT/SIT/SSA blocks

  The other bug fixes are as follows:
   - recover xattr node blocks correctly after sudden-power-cut
   - fix to calculate the maximum number of node ids
   - enhance to handle many error cases

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (62 commits)
  f2fs: fix wrong statistics of inline data
  f2fs: check the acl's validity before setting
  f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
  f2fs: fix to cover io->bio with io_rwsem
  f2fs: fix error path when fail to read inline data
  f2fs: use list_for_each_entry{_safe} for simplyfying code
  f2fs: avoid free slab cache under spinlock
  f2fs: avoid unneeded lookup when xattr name length is too long
  f2fs: avoid unnecessary bio submit when wait page writeback
  f2fs: return -EIO when node id is not matched
  f2fs: avoid RECLAIM_FS-ON-W warning
  f2fs: skip unnecessary node writes during fsync
  f2fs: introduce fi->i_sem to protect fi's info
  f2fs: change reclaim rate in percentage
  f2fs: add missing documentation for dir_level
  f2fs: remove unnecessary threshold
  f2fs: throttle the memory footprint with a sysfs entry
  f2fs: avoid to drop nat entries due to the negative nr_shrink
  f2fs: call f2fs_wait_on_page_writeback instead of native function
  f2fs: introduce nr_pages_to_write for segment alignment
  ...
2014-04-07 10:55:36 -07:00
Linus Torvalds e5744abb2f == Changes to existing drivers ==
- Use of managed resources - omap, twl4030, ti_am335x_tscadc
    - Advanced error handling - omap
    - Rework clk management - omap
    - Device Tree (re-)work - tc3589x, pm8921, da9055, sec
    - IRC management overhaul and !BROKEN - pm8921
    - Convert to regmap - ssbi, pm8921
    - Use simple power-management ops - ucb1x00
    - Include file clean-up - adp5520, cs5535, janz, lpc_ich,
       - lpc_sch, max14577, mcp-sa11x0, pcf50633-adc, rc5t583,
       	rdc321x-southbridge, retu, smsc-ece1099, ti-ssp, ti_am335x_tscadc,
 	tps65912, vexpress-config, wm8350, ywm8350
    - Various bug fixes across the subsystem
       - NULL/invalid pointer dereference prevention
       - Resource leak mitigation,
       - Variable used initialised
       - Staticise various containers
       - Enforce return value checks
 
  == New drivers/supported devices ==
    - Add support for s2mps14 and s2mpa01 to sec
    - Add support for da9063 (v5) to da9063
    - Add support for atom-c2000 to gpio-ich
    - Add support for come-{mbt10,cbt6,chl6} to kempld
    - Add support for da9053 to da9052
    - Add support for itco-wdt (v3) and baytrail to lpc_ich
    - Add new drivers for tps65218, rtsx_usb, bcm590xx
 
  == (Re-)moved drivers ==
    - twl4030 ==> drivers/iio
    - ti-ssp  ==> /dev/null
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTQmHbAAoJEFGvii+H/HdhXgAQAI6dNLb3AfNol49pNh5nEcrt
 sixQwu56xe64xozcMq41oXcbIe6NOvd/Sgf57fowCqrNyXd3k2sp6/KzA1yM8yfc
 2Xfm2fzzMoyH3lEopepT0zKMEyeOKxCNJWInXjRDmR6EN8szV/gAvwEptXnXKq8n
 sANQCBr2A1sDAlxu5onDI6SGEibCZgSsW+EElPyNKjXyIXdATv+ZLSuNCapt2Zg2
 H/KM+CY2hlcl6quWwjEUtPF4Ux0hIv3ePkwDKQicXMgndxL3+aL5L66UHsIovgxW
 o9H2aA6cfOQJuAXAZUvHlsNlefFW5qpFFR8kXiW87Say3+7nijoe5DhH/RBSZN+i
 O0rbxWVa1rW9eYmHuKAPNMR8Lp4FN9OvBo/Yv3UfmMV661vLVLOvTwJI9GZg7v8o
 UPMDhYNgEnRNrWqf7Wkj9ywgvGaO8qggm7gpE2cFD8DGDR7aZQ9goRKpaVSjTNmW
 4staek1u4g7YQ9s2UXxQ0JFc7esMbUbXxv5Bmk+4JPiI3P4gDMTg7jhh5iKDcEs5
 BVUIfdYKF9LInfYT3o9Uvo6TbYeAfwwzOdMFDWa5BjGOCLD9ttOEGtqMD/bkANbn
 YsaD6xKEL+su37CocSPnekgU+IS0uLb15jpa06CmoaALPGAZcRffKMygSHtlyGtR
 pNazlO93tu9JXQcL5B+A
 =p4SP
 -----END PGP SIGNATURE-----

Merge tag 'mfd-for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:
 "Changes to existing drivers:
   - Use of managed resources - omap, twl4030, ti_am335x_tscadc
   - Advanced error handling - omap
   - Rework clk management - omap
   - Device Tree (re-)work - tc3589x, pm8921, da9055, sec
   - IRC management overhaul and !BROKEN - pm8921
   - Convert to regmap - ssbi, pm8921
   - Use simple power-management ops - ucb1x00
   - Include file clean-up - adp5520, cs5535, janz, lpc_ich,
      - lpc_sch, max14577, mcp-sa11x0, pcf50633-adc, rc5t583,
      	rdc321x-southbridge, retu, smsc-ece1099, ti-ssp, ti_am335x_tscadc,
	tps65912, vexpress-config, wm8350, ywm8350
   - Various bug fixes across the subsystem
      - NULL/invalid pointer dereference prevention
      - Resource leak mitigation,
      - Variable used initialised
      - Staticise various containers
      - Enforce return value checks

  New drivers/supported devices:
   - Add support for s2mps14 and s2mpa01 to sec
   - Add support for da9063 (v5) to da9063
   - Add support for atom-c2000 to gpio-ich
   - Add support for come-{mbt10,cbt6,chl6} to kempld
   - Add support for da9053 to da9052
   - Add support for itco-wdt (v3) and baytrail to lpc_ich
   - Add new drivers for tps65218, rtsx_usb, bcm590xx

  (Re-)moved drivers:
   - twl4030 ==> drivers/iio
   - ti-ssp  ==> /dev/null"

* tag 'mfd-for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (103 commits)
  mfd: wm5110: Correct default for HEADPHONE_DETECT_1
  mfd: arizona: Correct small errors in the DT binding documentation
  mfd: arizona: Mark DSP clocking register as volatile
  mfd: devicetree: bindings: Add pm8xxx RTC description
  mfd: kempld-core: Fix potential hang-up during boot
  mfd: sec-core: Fix uninitialized 'regmap_rtc' on S2MPA01
  mfd: tps65910: Fix regmap_irq_chip_data leak on mfd_add_devices fail
  mfd: tps65910: Fix possible invalid pointer dereference on regmap_add_irq_chip fail
  mfd: sec-core: Fix I2C dummy device resource leak on probe failure
  mfd: sec-core: Add of_compatible strings for clock MFD cells
  mfd: Remove obsolete ti-ssp driver
  Documentation: mfd: s2mps11: Describe S5M8767 and S2MPS14 clocks
  mfd: bcm590xx: Fix type argument for module device table
  mfd: lpc_ich: Add support for Intel Bay Trail SoC
  mfd: lpc_ich: Add support for NM10 GPIO
  mfd: lpc_ich: Change Avoton to iTCO v3
  watchdog: iTCO_wdt: Add support for v3 silicon
  mfd: lpc_ich: Add support for iTCO v3
  mfd: lpc_ich: Remove lpc_ich_cfg struct use
  mfd: lpc_ich: Only configure watchdog or GPIO when present
  ...
2014-04-07 10:24:18 -07:00
Linus Torvalds c29aa153ef MTD updates for 3.15:
- A few SPI NOR ID definitions
  - Kill the NAND "max pagesize" restriction
  - Fix some x16 bus-width NAND support
  - Add NAND JEDEC parameter page support
  - DT bindings for NAND ECC
  - GPMI NAND updates (subpage reads)
  - More OMAP NAND refactoring
  - New STMicro SPI NOR driver (now in 40 patches!)
  - A few other random bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTP6x+AAoJEFySrpd9RFgtit0P/jLWsjMK8G2ldPC4bMZsXmDF
 n3c71GcCRlUq4Qzb4rtZx9DANLAh+JyRrMOKCPg6dAMegFdmUqDOpZpNp0vF57KG
 myFjqTk+n5y0tfSkWLMUFt0tQ8ArDp3IBkQCUWkD5LgG50EWmjveIQGH0kFnkE39
 Kytqvw17RV7f81tIs+WvKt8++YWD2X1VTpTi0S4fx2bJ99bJDBf/GgdoQpj2oirt
 igXmloUFEsob9JHZ3qumcUm9vaHwv2TiouZTvRyGdJCCoPdpJEZO4Ka6e4uAVarT
 6kMKXBk3lj2GsilOSFFCNetXfy5Bf0TkJkv4rDjh3R1Y4J/hSgraVCbWXdKhb6tj
 RmwesdFMjsyS4f/Rhk5PXwJgGL9uK2mi6bk/SmXU0AMgCDSa5zjshY8Wq6C6uXwk
 LqlnK8l3h8Txotbc/XJIL+QGMbMkYQI8gxWTHFaqzDtkMe36mnGs9Zec3oso/s2d
 CNRpq5+dMZ6qF0z3zpOQHmFbaOekivMy7kCKMXer6ONsrQBNwTdmkwy+SdqnWsLF
 YdJttwV/RRcE0SRvK6GrhvzkGlV83Z8RPny6hC1kbrgQ0ffoy2CmIqyWNObK1RXf
 sYqoF8TCtR6Y8rHHi5dzZ1lria+nm8pb4+UfQLRI0mK8og7YW+fIfHXxqRrZZIrt
 No8NCPBKWzyew2UE0AiQ
 =CKih
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20140405' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 - A few SPI NOR ID definitions
 - Kill the NAND "max pagesize" restriction
 - Fix some x16 bus-width NAND support
 - Add NAND JEDEC parameter page support
 - DT bindings for NAND ECC
 - GPMI NAND updates (subpage reads)
 - More OMAP NAND refactoring
 - New STMicro SPI NOR driver (now in 40 patches!)
 - A few other random bugfixes

* tag 'for-linus-20140405' of git://git.infradead.org/linux-mtd: (120 commits)
  Fix index regression in nand_read_subpage
  mtd: diskonchip: mem resource name is not optional
  mtd: nand: fix mention to CONFIG_MTD_NAND_ECC_BCH
  mtd: nand: fix GET/SET_FEATURES address on 16-bit devices
  mtd: omap2: Use devm_ioremap_resource()
  mtd: denali_dt: Use devm_ioremap_resource()
  mtd: devices: elm: update DRIVER_NAME as "omap-elm"
  mtd: devices: elm: configure parallel channels based on ecc_steps
  mtd: devices: elm: clean elm_load_syndrome
  mtd: devices: elm: check for hardware engine's design constraints
  mtd: st_spi_fsm: Succinctly reorganise .remove()
  mtd: st_spi_fsm: Allow loop to run at least once before giving up CPU
  mtd: st_spi_fsm: Correct vendor name spelling issue - missing "M"
  mtd: st_spi_fsm: Avoid duplicating MTD core code
  mtd: st_spi_fsm: Remove useless consts from function arguments
  mtd: st_spi_fsm: Convert ST SPI FSM (NOR) Flash driver to new DT partitions
  mtd: st_spi_fsm: Move runtime configurable msg sequences into device's struct
  mtd: st_spi_fsm: Supply the W25Qxxx chip specific configuration call-back
  mtd: st_spi_fsm: Supply the S25FLxxx chip specific configuration call-back
  mtd: st_spi_fsm: Supply the MX25xxx chip specific configuration call-back
  ...
2014-04-07 10:17:30 -07:00
Viresh Kumar 7f4b04614a cpufreq: create another field .flags in cpufreq_frequency_table
Currently cpufreq frequency table has two fields: frequency and driver_data.
driver_data is only for drivers' internal use and cpufreq core shouldn't use
it at all. But with the introduction of BOOST frequencies, this assumption
was broken and we started using it as a flag instead.

There are two problems due to this:
- It is against the description of this field, as driver's data is used by
  the core now.
- if drivers fill it with -3 for any frequency, then those frequencies are
  never considered by cpufreq core as it is exactly same as value of
  CPUFREQ_BOOST_FREQ, i.e. ~2.

The best way to get this fixed is by creating another field flags which
will be used for such flags. This patch does that. Along with that various
drivers need modifications due to the change of struct cpufreq_frequency_table.

Reviewed-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-04-07 14:43:50 +02:00
Linus Torvalds 2b3a8fd735 NFS client updates for Linux 3.15
Highlights include:
 
 - Stable fix for a use after free issue in the NFSv4.1 open code
 - Fix the SUNRPC bi-directional RPC code to account for TCP segmentation
 - Optimise usage of readdirplus when confronted with 'ls -l' situations
 - Soft mount bugfixes
 - NFS over RDMA bugfixes
 - NFSv4 close locking fixes
 - Various NFSv4.x client state management optimisations
 - Rename/unlink code cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTQBayAAoJEGcL54qWCgDyUzgQAKzSlbcksMQT55M/KZJXabNW
 KSctJeDrkTkRxOXTNxuF9NbIgeqenLijCokXty6BIUgup0zkOPMzFfRfgdQvplnp
 YEj4sOEXEZ8CX+PoUTYOEayzt0ssEAOyidumiM+Gx2LD/E1d2xyCL7YaAOjIhVQS
 OnXcX1cZw+dZSUxC9vu5fVDjrphJTnp4CXdbvR5PiJiXeKqzZd9e5M3hXgpAQ/AS
 mWjYeUvM9mwyz7UmbLKkWEmzB3tFlGdTzDPxLRrkfcOSKI2Ham0lL3/Uv50/nRTu
 99ts6KH8KLGcUuL9vD9KRebht2f71usBrWAdvpy1cUcf1Fh6lmEg4ktGfkqldaUu
 9kNu9d5DCxJoGc6R2UTw5FeyPwYuDWoBwEGy1DcguJ5CeQn2R2nH4ps/P3J3DX4d
 DZsJqCY9idKZCQhtyR0iF9j3x2bNFoENaL6WHI6b0J+xjMedIbHgeUQzIQP0RLBJ
 h0IcjK0D+e7WdyC7jk4Nm3krtms5SNUG5/N9OUO36a7v8735PJBcbcgm9hZJt8Fh
 t/4vqUmKIBXHioHsMhaFslqTWlYIR9a3MYmN7QtHFYbqUfNxH69v9y3d6jb4Igck
 kqoEiui5aJOCR76s7oVdHCcm+klBwEPiACT+H9CUMzSoKzHSWsBSNZbJR3BEia4M
 7dwScS1OfI2KuutshGQA
 =weNx
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Stable fix for a use after free issue in the NFSv4.1 open code
   - Fix the SUNRPC bi-directional RPC code to account for TCP segmentation
   - Optimise usage of readdirplus when confronted with 'ls -l' situations
   - Soft mount bugfixes
   - NFS over RDMA bugfixes
   - NFSv4 close locking fixes
   - Various NFSv4.x client state management optimisations
   - Rename/unlink code cleanups"

* tag 'nfs-for-3.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
  nfs: pass string length to pr_notice message about readdir loops
  NFSv4: Fix a use-after-free problem in open()
  SUNRPC: rpc_restart_call/rpc_restart_call_prepare should clear task->tk_status
  SUNRPC: Don't let rpc_delay() clobber non-timeout errors
  SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks
  SUNRPC: Ensure call_status() deals correctly with SOFTCONN tasks
  NFSv4: Ensure we respect soft mount timeouts during trunking discovery
  NFSv4: Schedule recovery if nfs40_walk_client_list() is interrupted
  NFS: advertise only supported callback netids
  SUNRPC: remove KERN_INFO from dprintk() call sites
  SUNRPC: Fix large reads on NFS/RDMA
  NFS: Clean up: revert increase in READDIR RPC buffer max size
  SUNRPC: Ensure that call_bind times out correctly
  SUNRPC: Ensure that call_connect times out correctly
  nfs: emit a fsnotify_nameremove call in sillyrename codepath
  nfs: remove synchronous rename code
  nfs: convert nfs_rename to use async_rename infrastructure
  nfs: make nfs_async_rename non-static
  nfs: abstract out code needed to complete a sillyrename
  NFSv4: Clear the open state flags if the new stateid does not match
  ...
2014-04-06 10:09:38 -07:00
Linus Torvalds 6f4c98e1c2 Nothing major: the stricter permissions checking for sysfs broke
a staging driver; fix included.  Greg KH said he'd take the patch
 but hadn't as the merge window opened, so it's included here
 to avoid breaking build.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTQMH9AAoJENkgDmzRrbjxo4UP/jwlenP44v+RFpo/dn8Z8E2n
 SREQscU5ZZKvuyFD6kUdvOz8YC/nTrJvXoVkMUF05GVbuvb8/8UPtT9ECVemd0rW
 xNy4aFfv9rbrqRLBLpLK9LAgTuhwlbTgGxgL78zRn3hWmf1hBZWCY+cEvKM8l/+9
 oEQdORL0sUpZh7iryAeGqbOrXT4gqJEvSLOFwiYTSo6ryzWIilmdXSUAh6s8MIEX
 PR1+oH9J8B6J29lcXKMf8/sDI1EBUeSLdBmMCuN5Y7xpYxsQLroVx94kPbdBY+XK
 ZRoYuUGSUJfGRZY46cFKApIGeF07z1DGoyXghbSWEQrI+23TMUmrKUg47LSukE4Y
 yCUf8HAtqIA3gVc9GKDdSp/2UpkAhTTv5ogKgnIzs1InWtOIBdDRSVUQXDosFEXw
 6ZZe1pQs2zfXyXxO4j0Wq36K4RgI0aqOVw+dcC+w5BidjVylgnYRV0PSDd72tid7
 bIfnjDbUBo+o4LanPNGYK474KyO7AslgTE50w6zwbJzgdwCQ36hCpKqScBZzm60a
 42LrgTVoIHHWAL1tDzWL/LzWflZGdJAezzNje0/f2Q3bGMiNHWoljAvUphkTZ7qt
 E8+jWqmM+riH3e8Y5wKpO1BKt7NGHISEy//bUlnqTwisjIzVILZ6VjfugQ1AI+0x
 llTXPBotFvfvXqxunBg7
 =yzUO
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Nothing major: the stricter permissions checking for sysfs broke a
  staging driver; fix included.  Greg KH said he'd take the patch but
  hadn't as the merge window opened, so it's included here to avoid
  breaking build"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  staging: fix up speakup kobject mode
  Use 'E' instead of 'X' for unsigned module taint flag.
  VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
  kallsyms: fix percpu vars on x86-64 with relocation.
  kallsyms: generalize address range checking
  module: LLVMLinux: Remove unused function warning from __param_check macro
  Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
  module: remove MODULE_GENERIC_TABLE
  module: allow multiple calls to MODULE_DEVICE_TABLE() per module
  module: use pr_cont
2014-04-06 09:38:07 -07:00
Linus Torvalds 04535d273e . Fix dm-cache corruption caused by discard_block_size >
cache_block_size
 
 . Fix a lock-inversion detected by LOCKDEP in dm-cache
 
 . Fix a dangling bio bug in the dm-thinp target's process_deferred_bios
   error path
 
 . Fix corruption due to non-atomic transaction commit which allowed a
   metadata superblock to be written before all other metadata was
   successfully written -- this is common to all targets that use the
   persistent-data library's transaction manager (dm-thinp, dm-cache and
   dm-era).
 
 . Various small cleanups in the DM core
 
 . Add the dm-era target which is useful for keeping track of which
   blocks were written within a user defined period of time called an
   'era'.  Use cases include tracking changed blocks for backup software,
   and partially invalidating the contents of a cache to restore cache
   coherency after rolling back a vendor snapshot.
 
 . Improve the on-disk layout of multithreaded writes to the dm-thin-pool
   by splitting the pool's deferred bio list to be a per-thin device list
   and then sorting that list using an rb_tree.  The subsequent read
   throughput of the data written via multiple threads improved by ~70%.
 
 . Simplify the multipath target's handling of queuing IO by pushing
   requests back to the request queue rather than queueing the IO
   internally.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTPv/6AAoJEMUj8QotnQNagQYH/3EkB2f66TRfjRQpVAZuchw/
 U0IbVWcMJKMdhj3uaSNzIkAbTgF+QsZUOLHP/7Q6zLq0M2J3WGrJn2ELqq53MenF
 E0+rJ8duKnJ5oLhhVT62ukBDh3XHWT0JyijXPWNa2gUoYwJqM9BAlXbC/OTfUNaZ
 mBCxvUWGME8k3ht310GhwvzBQjYuxIXhw8XlbGvakb9S83SZwNpCh231iumOEzPe
 Vzfx/xTto0fH3R5/knNV/H9xt0Dv4vt4Aqbqqys9UbQvPzj9qN/mxUZIFg+LZh/w
 WuvHHw6HcAiNNrQGFcm6i1AK2jJ+F61K3afMlYsiamTxMNM+0q/B9HemkX/0ieU=
 =lY8m
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper changes from Mike Snitzer:

 - Fix dm-cache corruption caused by discard_block_size > cache_block_size

 - Fix a lock-inversion detected by LOCKDEP in dm-cache

 - Fix a dangling bio bug in the dm-thinp target's process_deferred_bios
   error path

 - Fix corruption due to non-atomic transaction commit which allowed a
   metadata superblock to be written before all other metadata was
   successfully written -- this is common to all targets that use the
   persistent-data library's transaction manager (dm-thinp, dm-cache and
   dm-era).

 - Various small cleanups in the DM core

 - Add the dm-era target which is useful for keeping track of which
   blocks were written within a user defined period of time called an
   'era'.  Use cases include tracking changed blocks for backup
   software, and partially invalidating the contents of a cache to
   restore cache coherency after rolling back a vendor snapshot.

 - Improve the on-disk layout of multithreaded writes to the
   dm-thin-pool by splitting the pool's deferred bio list to be a
   per-thin device list and then sorting that list using an rb_tree.
   The subsequent read throughput of the data written via multiple
   threads improved by ~70%.

 - Simplify the multipath target's handling of queuing IO by pushing
   requests back to the request queue rather than queueing the IO
   internally.

* tag 'dm-3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (24 commits)
  dm cache: fix a lock-inversion
  dm thin: sort the per thin deferred bios using an rb_tree
  dm thin: use per thin device deferred bio lists
  dm thin: simplify pool_is_congested
  dm thin: fix dangling bio in process_deferred_bios error path
  dm mpath: print more useful warnings in multipath_message()
  dm-mpath: do not activate failed paths
  dm mpath: remove extra nesting in map function
  dm mpath: remove map_io()
  dm mpath: reduce memory pressure when requeuing
  dm mpath: remove process_queued_ios()
  dm mpath: push back requests instead of queueing
  dm table: add dm_table_run_md_queue_async
  dm mpath: do not call pg_init when it is already running
  dm: use RCU_INIT_POINTER instead of rcu_assign_pointer in __unbind
  dm: stop using bi_private
  dm: remove dm_get_mapinfo
  dm: make dm_table_alloc_md_mempools static
  dm: take care to copy the space map roots before locking the superblock
  dm transaction manager: fix corruption due to non-atomic transaction commit
  ...
2014-04-05 18:49:31 -07:00
Linus Torvalds 3f583bc219 IOMMU Upates for Linux v3.15
This time a few more updates queued up.
 
 	* Rework VT-d code to support ACPI devices
 
 	* Improvements for memory and PCI hotplug support
 	  in the VT-d driver
 
 	* Device-tree support for OMAP IOMMU
 
 	* Convert OMAP IOMMU to use devm_* interfaces
 
 	* Fixed PASID support for AMD IOMMU
 
 	* Other random cleanups and fixes for OMAP, ARM-SMMU
 	  and SHMOBILE IOMMU
 
 Most of the changes are in the VT-d driver because some rework was
 necessary for better hotplug and ACPI device support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTPn8aAAoJECvwRC2XARrj66UQAICYhJKrErry3V8tGxzJ6r/3
 jEWCmwd//FmV7rHcdRqckrv8vn9n9wf0lhRoGpbo9iiUrO8ikTdNltccKr1t0cAf
 yPabayk1PbbZyc1Dv1hGKbIH52lHfVjbqesQ9Z2gndgVJuXWz5tdGDInXIvuYYEP
 vWJFS2D5pfAj/lokvcJ1LcwLoBgDdsKM6dbGb2gOxxHm3gIUFvkZktMWGKgKeEGF
 HL8R78tfGj6LRfLILT3RcxW4LfhnM2r1ZSeOm1lCdy7CISptqg6COWdX+gw/UToJ
 lQeuPXv/AUaLTFRL+v07qus8iVsvr6E+Fx3ppmi+mkvgoSD0spWPKnAvrAkFDij3
 ygXFKumTHqGTIHJQTVNWXrHkYFeM/XaIX7lZv0UwhE2RuLTRjFaIdBo2Wckmu9T3
 Un2mZ6NshyI2IQ6NC9ytxHP8BCAP/rhjirKOpegYlKfbU5AHjkBQHHNjL7hcSIH5
 g3ZXIV/FOgfNWhv4cdBR8go1N/PO+VMMYUfc1MC0sYi9MeiOnmsq5Y3LhXQWDgPE
 eogQiz5j87ciNcA4IMO/tUJ2xBSvU6g2ESSCLZxZ33F6uD6X+VVgvinOeFBncbCS
 fD0QUBo27tXUclWJHe7Z2yT4X2Xn4RohGEohqSpD0iNa1RUaNyBf7EzoEkZihxSZ
 J7OuEnbE1kVQLh8y5ZdE
 =uy/Z
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU upates from Joerg Roedel:
 "This time a few more updates queued up.

   - Rework VT-d code to support ACPI devices

   - Improvements for memory and PCI hotplug support in the VT-d driver

   - Device-tree support for OMAP IOMMU

   - Convert OMAP IOMMU to use devm_* interfaces

   - Fixed PASID support for AMD IOMMU

   - Other random cleanups and fixes for OMAP, ARM-SMMU and SHMOBILE
     IOMMU

  Most of the changes are in the VT-d driver because some rework was
  necessary for better hotplug and ACPI device support"

* tag 'iommu-updates-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (75 commits)
  iommu/vt-d: Fix error handling in ANDD processing
  iommu/vt-d: returning free pointer in get_domain_for_dev()
  iommu/vt-d: Only call dmar_acpi_dev_scope_init() if DRHD units present
  iommu/vt-d: Check for NULL pointer in dmar_acpi_dev_scope_init()
  iommu/amd: Fix logic to determine and checking max PASID
  iommu/vt-d: Include ACPI devices in iommu=pt
  iommu/vt-d: Finally enable translation for non-PCI devices
  iommu/vt-d: Remove to_pci_dev() in intel_map_page()
  iommu/vt-d: Remove pdev from intel_iommu_attach_device()
  iommu/vt-d: Remove pdev from iommu_no_mapping()
  iommu/vt-d: Make domain_add_dev_info() take struct device
  iommu/vt-d: Make domain_remove_one_dev_info() take struct device
  iommu/vt-d: Rename 'hwdev' variables to 'dev' now that that's the norm
  iommu/vt-d: Remove some pointless to_pci_dev() calls
  iommu/vt-d: Make get_valid_domain_for_dev() take struct device
  iommu/vt-d: Make iommu_should_identity_map() take struct device
  iommu/vt-d: Handle RMRRs for non-PCI devices
  iommu/vt-d: Make get_domain_for_dev() take struct device
  iommu/vt-d: Make domain_context_mapp{ed,ing}() take struct device
  iommu/vt-d: Make device_to_iommu() cope with non-PCI devices
  ...
2014-04-05 18:46:26 -07:00
Linus Torvalds 19bc2eec3c The clock framework changes for 3.15 look similar to past pull requests.
Mostly clock driver updates, more Device Tree support in the form of
 common functions useful across platforms and a handful of features and
 fixes to the framework core.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTPKLWAAoJEDqPOy9afJhJTJUP/32NJ6+g2/Ren3LNW2QFUAzj
 XAJ1PiuciuMFBI1ttErBwgpgtETj1qLQKakipNxoVQk0hN4Ymi6Dz23+7Vif0241
 8uDgvMg70eeZlyUk2cc0huJzta2kCWQB7jOZT0oDTlzXA8lq3OiSJrc5ey/leVwW
 SM3NySvbN+t/bOaHW5z7oFtsqANCS/t3P0+cL9I+EgUtCJ4boqqI/a01dgZt4qp3
 C68ar1Iy5ko6cFNzsjhmHBw1rz3ChQQhCdKDQsIgTbsgMXlI7AHD8CKizB9dxLpI
 dmM4HFprHlwKdNSsCwMltXT4ROhV6to1Jlo64dekvYbJzGsqR4OoRTUzUC549kOW
 OijFk7QDWMkCBvKA6pmCMpa3GuxRCnU8P8EtmiTra7tz6wwSFESKKEywG6r17/eO
 9TU+apzknHYN//Mfx1ODfHGpXxqgZaJCAR8YGZ/sKFAQZSbJqxl7czqr26BmXDgJ
 FQxlxgYHGn2PnKr8aI8F35PZWZf2dOKDYImwdslmQXc122I8+qnHsruxLKdGxzQR
 VH33ezMP/IhTjcTLwDSmK9JleX5SxxmULRM5kFM+cDh3KJDpw0h/GZXo8XKFSyN4
 8qxh5V+QmROzZ8cFFFa/QVXfNHxkAgVSofP/YovkYYMpVt0o7SBMpEXDrfePrmBD
 OdoXQ0ETAaitehRph1Aj
 =zk74
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus-3.15' of git://git.linaro.org/people/mike.turquette/linux

Pull clock framework changes from Mike Turquette:
 "The clock framework changes for 3.15 look similar to past pull
  requests.  Mostly clock driver updates, more Device Tree support in
  the form of common functions useful across platforms and a handful of
  features and fixes to the framework core"

* tag 'clk-for-linus-3.15' of git://git.linaro.org/people/mike.turquette/linux: (86 commits)
  clk: shmobile: fix setting paretn clock rate
  clk: shmobile: rcar-gen2: fix lb/sd0/sd1/sdh clock parent to pll1
  clk: Fix minor errors in of_clk_init() function comments
  clk: reverse default clk provider initialization order in of_clk_init()
  clk: sirf: update copyright years to 2014
  clk: mmp: try to use closer one when do round rate
  clk: mmp: fix the wrong calculation formula
  clk: mmp: fix wrong mask when calculate denominator
  clk: st: Adds quadfs clock binding
  clk: st: Adds clockgen-vcc and clockgen-mux clock binding
  clk: st: Adds clockgen clock binding
  clk: st: Adds divmux and prediv clock binding
  clk: st: Support for A9 MUX clocks
  clk: st: Support for ClockGenA9/DDR/GPU
  clk: st: Support for QUADFS inside ClockGenB/C/D/E/F
  clk: st: Support for VCC-mux and MUX clocks
  clk: st: Support for PLLs inside ClockGenA(s)
  clk: st: Support for DIVMUX and PreDiv Clocks
  clk: support hardware-specific debugfs entries
  clk: s2mps11: Use of_get_child_by_name
  ...
2014-04-05 18:39:18 -07:00
Linus Torvalds 9712d3c377 pwm: Changes for v3.15-rc1
The legacy HAVE_PWM Kconfig symbol is finally being retired. Thanks a
 lot to Sascha Hauer for doing that.
 
 Three new drivers are added: Freescale FTM, Cirrus Logic CLPS711X and
 Intel Low Power Subsystem.
 
 An assortment of fixes and cleanups rounds things off for this release
 cycle.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTPmXoAAoJEN0jrNd/PrOhSacQAKNpqWHpFdFuhqpO6dvmqYj3
 dvf6EDMnNaOS+TjbCvwP5awAiBhTbJRaTclP1lXXXOnzHvzeeYWhS2ESp4Yl8mRx
 GRHj5OxmquaVPY5HN+6guVyCrgq4R2sxPU1P2VoPhhomhvP2VuEBbD/ddudC3e2k
 /e9BuBhUB9eaur6d+vKX7Bnz09wf+ASobgIisjyyqSYysDgE82BAanX/knnLIyQL
 RKCsz75w14rIxU/f8EML8EMnWiGINYpP+M/NGtPvcNBBOX9DkdzBvSvcbm+gS6ma
 g2P+zsJgxhUpvvmzhqUumADUU8BWo/P1Y/6FQGRku6EmmJQQspTvDvOs1jCauouC
 5vUA41Jwh+4+AKeNWN28tDlh9i5kKYdzYP5SeRcM9mW1SI7AIFmg62lxdus7ZnBB
 e8UFd26kp/hZxXPdDVHtQi9y5Z5kn4axutVpbISuW5P9z1HF9bFOVHKQVlk7D6uz
 EqqiYLdW/MxrmBq+v35biwx6afk3zJ8Qas/MmVIVTcLcLDTFLPEm4EawwcRZo8F3
 Jh4p4IHxjEgLYcwVBNOe4ZBJg10fM1gmh18dDTyri759HE1mpi4/DwTGcv3iK4AU
 njv4Q+qBq9QkY2ktw3qCkTDcwiM9jm+FHfdyKXeR5+CjfOf61/CF+N1jBQ8ZMrb7
 XIRHle+mvL/RYpPDML/P
 =pbF+
 -----END PGP SIGNATURE-----

Merge tag 'pwm/for-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm changes from Thierry Reding:
 "The legacy HAVE_PWM Kconfig symbol is finally being retired.  Thanks a
  lot to Sascha Hauer for doing that.

  Three new drivers are added: Freescale FTM, Cirrus Logic CLPS711X and
  Intel Low Power Subsystem.

  An assortment of fixes and cleanups rounds things off for this release
  cycle"

* tag 'pwm/for-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: pxa: Constify OF match table
  pwm: pxa: Fix typo "pwm" -> "PWM"
  Revert "pwm: pxa: Use of_match_ptr()"
  pwm: add support for Intel Low Power Subsystem PWM
  pwm: Add CLPS711X PWM support
  pwm: atmel: correct CDTY calculation
  pwm: atmel: Fix polarity handling
  Documentation: Add device tree bindings for Freescale FTM PWM.
  pwm: Add Freescale FTM PWM driver support
  pwm: pxa: Use of_match_ptr()
  pwm: samsung: Use SIMPLE_DEV_PM_OPS macro
  pwm: renesas-tpu: Add dependency on HAS_IOMEM
  pwm: Remove obsolete HAVE_PWM Kconfig symbol
2014-04-05 18:32:31 -07:00
Linus Torvalds 2bf73dd61a ARM: SoC: late cleanups
These could not be part of the first cleanup branch, because they either
 came too late in the cycle, or they have dependencies on other branches.
 Important changes are:
 
 * The integrator platform is almost multiplatform capable after
   some reorganization (Linus Walleij)
 * Minor cleanups on Zynq (Michal Simek)
 * Lots of changes for Exynos and other Samsung platforms, including
   further preparations for multiplatform support and the clocks bindings
   are rearranged.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUz/2IGCrR//JCVInAQI+sA//baZOXHTNRR7uBh5PJgaDFIyNjtBDDyyB
 m+yYgw24n3WP1YWtFhBKza7p5Eh2spWYgffKV/logWM4SC3HjkCUsLkQwruHa2qe
 H/pCknUXqUNiwH76WVbfrABb+0tARjEB+U0QfXh7af7Zk+ZXMqQ1/ItU0YdpJiGO
 mOAI5c6gzpr953cmzuHer8foATmF5DNuJPhPDPYlgeg2+yvXgcnfi9a+AXE8Eqb1
 sZeWUJrqJERBlmsVgihq1+gPJjh0Kw7D9r835JqQeKRnywFgvGbmf5kYriPiEEBt
 hJUUnRHW6GCFQM9MemP0nOaRQlQYJA+EPqzB+0YRps0Gq+3QCIXFzZwLije/eMvr
 2YjpITS2MaTqvag1o4yNmfeG+hGMN6MgbOh9q5kLagTXn/9nsQ6aYkD9tCXw4G08
 bH3PP90AT6jQoNDoac5Pt2xPBPvY1JnnUegw5YmQQAlKeSEaiSJnHaC4gD9jzy7q
 fvoXey/Fz/ZgtZKL0wjbjhUrurS45xqZUW0MlMFOt6U7wdG4wsuemaI2PID6tKp8
 ZmZ5gyHsX+CK4GfmhFFu3XhM8hyRj3/OBSy0/Wls3znFH/6j/X1gvrH87gnS9+ax
 +Ettut5uCutDaUJRymXDlqdF9ysLC3DVHpofQPSCqVZ+IHQkUadypyc6YY1Z5mtQ
 x/nxniFA7/A=
 =1i9x
 -----END PGP SIGNATURE-----

Merge tag 'tags/cleanup2-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC late cleanups from Arnd Bergmann:
 "These could not be part of the first cleanup branch, because they
  either came too late in the cycle, or they have dependencies on other
  branches.  Important changes are:

   - The integrator platform is almost multiplatform capable after some
     reorganization (Linus Walleij)
   - Minor cleanups on Zynq (Michal Simek)
   - Lots of changes for Exynos and other Samsung platforms, including
     further preparations for multiplatform support and the clocks
     bindings are rearranged"

* tag 'tags/cleanup2-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (54 commits)
  devicetree: fix newly added exynos sata bindings
  ARM: EXYNOS: Fix compilation error in cpuidle.c
  ARM: S5P64X0: Explicitly include linux/serial_s3c.h in mach/pm-core.h
  ARM: EXYNOS: Remove hardware.h file
  ARM: SAMSUNG: Remove hardware.h inclusion
  ARM: S3C24XX: Remove invalid code from hardware.h
  dt-bindings: clock: Move exynos-audss-clk.h to dt-bindings/clock
  ARM: dts: Keep some essential LDOs enabled for arndale-octa board
  ARM: dts: Disable MDMA1 node for arndale-octa board
  ARM: S3C64XX: Fix build for implicit serial_s3c.h inclusion
  serial: s3c: Fix build of header without serial_core.h preinclusion
  ARM: EXYNOS: Allow wake-up using GIC interrupts
  ARM: EXYNOS: Stop using legacy Samsung PM code
  ARM: EXYNOS: Remove PM initcalls and useless indirection
  ARM: EXYNOS: Fix abuse of CONFIG_PM
  ARM: SAMSUNG: Move s3c_pm_check_* prototypes to plat/pm-common.h
  ARM: SAMSUNG: Move common save/restore helpers to separate file
  ARM: SAMSUNG: Move Samsung PM debug code into separate file
  ARM: SAMSUNG: Consolidate PM debug functions
  ARM: SAMSUNG: Use debug_ll_addr() to get UART base address
  ...
2014-04-05 15:46:37 -07:00
Linus Torvalds cbda94e039 ARM: SoC: driver changes
These changes are mostly for ARM specific device drivers that either
 don't have an upstream maintainer, or that had the maintainer ask
 us to pick up the changes to avoid conflicts. A large chunk of this
 are clock drivers (bcm281xx, exynos, versatile, shmobile), aside from
 that, reset controllers for STi as well as a large rework of the
 Marvell Orion/EBU watchdog driver are notable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUz/1+GCrR//JCVInAQJmfg/9GyqHatDjjUPUBjUQRIEtKgGdmQwdbDqF
 x+OrS/q5B5zYbpIWkbkt1IUYJfU+89Z5ev9jxI4rV824Nu9Y92mHPDnv+N/ptkIh
 q2OVP3bQDpWs3aEVV2B1HBNcWrNUuwco9BJu05eegEePii/cto0/wKwWIgUmrmjy
 xOLthsnp2YmeplGs7ctC6Dz8XbmELebpawejTGylARXei/SwmzB/YYDgJbYjRL2I
 WSCVa8Vo+MZaGC/yxdKVTtvsKVQenxGoMO3ojikJeRdvuVRJds48Cw+UBdzWYNeJ
 3Ssvbdx6Xltf9jy/7H0btOUgxPetZuUV+2XpbWfGu0Zr9FcGDv3q9hrxA+UYKnkY
 GIGU0otSsmpHnX5Ms3E2xnHiV/fihxA3qohqts5kYRBDr5uc+IpW6SbDymQliCGG
 OO4XmIVM3pmsqAqP3Zuseemt9CeSW2yC0XlfXkzjO74yY39c+WLBbtGI40Z5W6i0
 mM1C8RD3QSNijYCEC8eqz06BQfRImsPs+jllsnJTZaHfbOsib718uvandjfG26lN
 616YMcqq0Sp51HIQ4qW7f2dQr7vOyNqbukdkrwF5JgkY/nVki5kdciRg/yeipRy6
 Ey80a+OTq0GQljM0F2dcH/A1eHH9KsuI1L6NdSMJsl0h6guIBORPTwTw3qJ13OkR
 wpJyM+Gm+Fk=
 =u/FI
 -----END PGP SIGNATURE-----

Merge tag 'drivers-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC driver changes from Arnd Bergmann:
 "These changes are mostly for ARM specific device drivers that either
  don't have an upstream maintainer, or that had the maintainer ask us
  to pick up the changes to avoid conflicts.

  A large chunk of this are clock drivers (bcm281xx, exynos, versatile,
  shmobile), aside from that, reset controllers for STi as well as a
  large rework of the Marvell Orion/EBU watchdog driver are notable"

* tag 'drivers-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (99 commits)
  Revert "dts: socfpga: Add DTS entry for adding the stmmac glue layer for stmmac."
  Revert "net: stmmac: Add SOCFPGA glue driver"
  ARM: shmobile: r8a7791: Fix SCIFA3-5 clocks
  ARM: STi: Add reset controller support to mach-sti Kconfig
  drivers: reset: stih416: add softreset controller
  drivers: reset: stih415: add softreset controller
  drivers: reset: Reset controller driver for STiH416
  drivers: reset: Reset controller driver for STiH415
  drivers: reset: STi SoC system configuration reset controller support
  dts: socfpga: Add sysmgr node so the gmac can use to reference
  dts: socfpga: Add support for SD/MMC on the SOCFPGA platform
  reset: Add optional resets and stubs
  ARM: shmobile: r7s72100: fix bus clock calculation
  Power: Reset: Generalize qnap-poweroff to work on Synology devices.
  dts: socfpga: Update clock entry to support multiple parents
  ARM: socfpga: Update socfpga_defconfig
  dts: socfpga: Add DTS entry for adding the stmmac glue layer for stmmac.
  net: stmmac: Add SOCFPGA glue driver
  watchdog: orion_wdt: Use %pa to print 'phys_addr_t'
  drivers: cci: Export CCI PMU revision
  ...
2014-04-05 15:37:40 -07:00
Linus Torvalds f83ccb9358 ARM: SoC: device tree changes
A large part of the arm-soc patches are nowadays DT changes, adding support
 for new SoCs, boards and devices without changing kernel source. The plan
 is still to move the devicetree files out of the kernel tree and reduce
 the amount of churn going on here, but we keep finding reasons to delay
 doing that.
 
 Changes are really all over the place, with little sticking out particularly.
 We have contributions from a total of 116 people in this branch.
 
 Unfortunately, the size of this branch also causes a significant number
 of conflicts at the moment, typically when subsystem maintainers merge
 patches that change the driver at the same time as the dts files. In
 most cases this could be avoided because the dts changes are supposed
 to be compatible in both ways, and we are asking everyone to send ARM
 dts changes through our tree only.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUz/11WCrR//JCVInAQIIyRAA0DjdNNQ/A4G2i1nZCiTFH6a4oZy4JarN
 ATVPkW/V8avhh+yVNe5FWA44Xe6CDC5TXwMaIsbK+w3Iclj3fplh/MsBkQ9ZT9Sl
 LAjJoOjuYucCeDy0WLVioRKZ4PJEDoCu/oZTauIMnmWCOCRxLYpOM3FkAT9oN/Ti
 lswpTSLiV1/U3ZSI4M3qn+Sx1VJL8c/hAIWbvf5if2diYkWPk3VOSKyxmD9zLWdD
 Iqtb79J+ETVeOIM4sHnx79cG4ZCdpOfRAl7qx6hkJu0YATXESxWhpXVE2McTJuzM
 qHKsRRNSfsfSWPeF4angll9o06X/qgdT6C4P2dfH49lGeG7llOttw3OaCx3hWCTe
 U5bt26qtbwG2ZbzocaqvideP+rbpQrCH2vdO1embPv5Lu6peMoBWjxy6twSVXJBG
 LIymJ0IbiGYxL7BReGqRXt6ehy0BDWBeTSTdsGqgEl2TnxHuS/kgGfJc4D5riiEk
 aRPVq10p/k+yo4BZtq2GqXIOG6cqkIQ5lhl5Tg9+MfUlquAONqJP70FgRJDBIw9L
 9uJp71bgSsA6eYg2tXoqJtpdjKplDWavgtACzIkFg2qFLyYmKvx+F0AXbeTIsrri
 /mIchTyG+dgiIjWvj/Xsf7jhrdzRcl3uKsJwFmk927pIsh24HV8T+LKgHrf+sVcO
 qEsEnKGYA6s=
 =zl/N
 -----END PGP SIGNATURE-----

Merge tag 'dt-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC device tree changes from Arnd Bergmann:
 "A large part of the arm-soc patches are nowadays DT changes, adding
  support for new SoCs, boards and devices without changing kernel
  source.  The plan is still to move the devicetree files out of the
  kernel tree and reduce the amount of churn going on here, but we keep
  finding reasons to delay doing that.

  Changes are really all over the place, with little sticking out
  particularly.  We have contributions from a total of 116 people in
  this branch.

  Unfortunately, the size of this branch also causes a significant
  number of conflicts at the moment, typically when subsystem
  maintainers merge patches that change the driver at the same time as
  the dts files.  In most cases this could be avoided because the dts
  changes are supposed to be compatible in both ways, and we are asking
  everyone to send ARM dts changes through our tree only"

* tag 'dt-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (541 commits)
  dts: stmmac: Document the clocks property in the stmmac base document
  dts: socfpga: Add DTS entry for adding the stmmac glue layer for stmmac.
  ARM: STi: stih41x: Add support for the FSM Serial Flash Controller
  ARM: STi: stih416: Add support for the FSM Serial Flash Controller
  ARM: tegra: fix Dalmore pinctrl configuration
  ARM: dts: keystone: use common "ti,keystone" compatible instead of -evm
  ARM: dts: k2hk-evm: set ubifs partition size for 512M NAND
  ARM: dts: Build all keystone dt blobs
  ARM: dts: keystone: Fix control register range for clktsip
  ARM: dts: keystone: Fix domain register range for clkfftc1
  ARM: dts: bcm28155-ap: leave camldo1 on to fix reboot
  ARM: dts: add bcm590xx pmu support and enable for bcm28155-ap
  ARM: dts: bcm21664: Add device tree files.
  ARM: DT: bcm21664: Device tree bindings
  ARM: efm32: properly namespace i2c location property
  ARM: efm32: fix unit address part in USART2 device nodes' names
  ARM: mvebu: Enable NAND controller in Armada 385-DB
  ARM: mvebu: Add support for NAND controller in Armada 38x SoC
  ARM: mvebu: Add the Core Divider clock to Armada 38x SoCs
  ARM: mvebu: Add a 2 GHz fixed-clock on Armada 38x SoCs
  ...
2014-04-05 15:29:04 -07:00
Linus Torvalds ff050ad12c ARM: SoC specific changes
Lots of changes specific to one of the SoC families. Some that
 stick out are:
 
 * mach-qcom gains new features, most importantly SMP support for
   the newer chips (Stephen Boyd, Rohit Vaswani)
 * mvebu gains support for three new SoCs: Armada 375, 380 and 385
   (Thomas Petazzoni and Free-electrons team)
 * SMP support for Rockchips (Heiko Stübner)
 * Lots of i.MX changes (Shawn Guo)
 * Added support for BCM5301x SoC (Hauke Mehrtens)
 * Multiplatform support for Marvell Kirkwood and Dove
   (Andrew Lunn and Sebastian Hesselbarth doing the final part
   of a long journey)
 * Unify davinci platforms and remove obsolete ones (Sekhar Nori,
   Arnd Bergmann)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUz/yT2CrR//JCVInAQJN8A/9Ft1rfp4LEe8Lpr9yAZydG4UaJKy8Hh7Z
 fmohMAuy88J+8jzdwQKKCeEiId+nIf+WmFIQDn9YRDev1/T2v32Ax49XuGtY47JX
 4loIC2wR0+j1aSwhEVOmlM03lX7Hbu6iNDkxaLkDKTRrt3DhDNA6cPZYwNOT273W
 Yx7hIDpvsoOVN3zbPwqhwLrXgywsaNB9E7ly1GixRd1thdg46kMRcM0LJSXPH3we
 pyx7sZbILTVMeUx79XUTvBDJYsbjJWFZknVDYXGkrS5YxAASVsVW2KW9fP9E+UXE
 wTmOxg6spsHGgCezwy8NL5UmfaAOXL3mm6ginFwWpyz7Iu+P5IvfR1W+8UA/O8tp
 K9y8wLA64chPQJkAGaPQBqUPq9QkNHodZWgaPKxKuuv3qF481DCnQKkFRz+sl7mu
 oQVGnoMCnTY6L6yYcIq/GpgiJ731vwefirAwPR8FEBN/gw/gC01b+DDchx/5inPJ
 6V6dCEtPZxXMOsIaYBWFauk3pMFU3E8coklmteyYDQg7eb+55Zq3vsNEpu/vb6ll
 M660AQzzbkZ7lgsSBdNODEvkNH15kC35G2UCfwy99uCE4k/0Vi7reJ1BzXkc+dtJ
 +maBtA6NMALXQ/EI+B+fZLccI4Hv7avwFy1rQJaf+TLiFvTd9yp0qUX8JjXWDPgu
 pPWQOC4a9mU=
 =AGpV
 -----END PGP SIGNATURE-----

Merge tag 'soc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC specific changes from Arnd Bergmann:
 "Lots of changes specific to one of the SoC families.  Some that stick
  out are:

   - mach-qcom gains new features, most importantly SMP support for the
     newer chips (Stephen Boyd, Rohit Vaswani)
   - mvebu gains support for three new SoCs: Armada 375, 380 and 385
     (Thomas Petazzoni and Free-electrons team)
   - SMP support for Rockchips (Heiko Stübner)
   - Lots of i.MX changes (Shawn Guo)
   - Added support for BCM5301x SoC (Hauke Mehrtens)
   - Multiplatform support for Marvell Kirkwood and Dove (Andrew Lunn
     and Sebastian Hesselbarth doing the final part of a long journey)
   - Unify davinci platforms and remove obsolete ones (Sekhar Nori, Arnd
     Bergmann)"

* tag 'soc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (126 commits)
  ARM: sunxi: Select HAVE_ARM_ARCH_TIMER
  ARM: cache-tauros2: remove ARMv6 code
  ARM: mvebu: don't select CONFIG_NEON
  ARM: davinci: fix DT booting with default defconfig
  ARM: configs: bcm_defconfig: enable bcm590xx regulator support
  ARM: davinci: remove tnetv107x support
  MAINTAINERS: Update ARM STi maintainers
  ARM: restrict BCM_KONA_UART to ARCH_BCM_MOBILE
  ARM: bcm21664: Add board support.
  ARM: sunxi: Add the new watchog compatibles to the reboot code
  ARM: enable ARM_HAS_SG_CHAIN for multiplatform
  ARM: davinci: remove da8xx_omapl_defconfig
  ARM: davinci: da8xx: fix multiple watchdog device registration
  ARM: davinci: add da8xx specific configs to davinci_all_defconfig
  ARM: davinci: enable da8xx build concurrently with older devices
  ARM: BCM5301X: workaround suppress fault
  ARM: BCM5301X: add early debugging support
  ARM: BCM5301X: initial support for the BCM5301X/BCM470X SoCs with ARM CPU
  ARM: mach-bcm: Remove GENERIC_TIME
  ARM: shmobile: APMU: Fix warnings due to improper printk formats
  ...
2014-04-05 14:19:54 -07:00
Linus Torvalds dfc25e4503 ARM: SoC: cleanups for 3.15
These cleanup patches are mainly move stuff around and should all
 be harmless. They are mainly split out so that other branches can
 be based on top to avoid conflicts.
 
 Notable changes are:
 
 * We finally remove all mach/timex.h, after CLOCK_TICK_RATE is no
   longer used. (Uwe Kleine-König)
 * The Qualcomm MSM platform is split out into legacy mach-msm and
   new-style mach-qcom, to allow easier maintainance of the new
   hardware support without regressions. (Kumar Gala)
 * A rework of some of the Kconfig logic to simplify multiplatform
   support (Rob Herring)
 * Samsung Exynos gets closer to supporting multiplatform (Sachin
   Kamat and others)
 * mach-bcm3528 gets merged into mach-bcm (Stephen Warren)
 * at91 gains some common clock framework support (Alexandre Belloni,
   Jean-Jacques Hiblot and other French people).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUz/yOWCrR//JCVInAQLOPBAAwTMkMrD8S8ggz6vfiQHZNdRPAC7NUJ46
 +eYKmBVi5d6EdnjNuRElWENsh0ZosSAUFHrXsIC2NdH9sAJ9HOqWNNLymuA59Jo9
 HZ/Ze6xQXDPNV7TROPoXuIli/2OCOXyyQHJsfI7h9V3PCx31qo0B5OdCxU0mtXK6
 r1giREhnJFwfQMF/FTdnzhalFJoSjWwv/nkpNmQDJKRLKj9GzwQqItqw68gV6RzU
 Gnt6YK+9xC1B0cfWTFhAm6kbr9i7mvHoMG5tE3no2uuJMn4K7TgeMqOyvPWhmUeB
 EZi656szT1m5VfRWOqG+7coZO2VM4GO4NI0Xfin3GHllugOYls1il/FAfCPMLiwh
 RvuOmQGCkLIpdkuHop5QaI/h1EzlHA59nzTjmGf1+wWPsm0CIg08XOD9izQbRnN9
 EmRqn1/8POIi17xcWyeMp8LB0APsTI+IflZFaYprEY9VlLLA/Pd+7udULhs8Bq8y
 1l6fB6aPZKnDKCBy/PEIR+y+EHFEbwfrx6zm/pxVDX6P5DlQMFWL78pdBoJUa2h8
 3pm/bSzNU5OSz1nJMLJv2jBTtnM5BvFgQBUi2qJ9Lr+nUhJXKCJ80kE/nOlXoCIU
 J952p3OhkYTQQcjuUVQeTXvRUOGB7mKok0pDFZNE6c7faqxTCudMABQq/KbMFstU
 eE3cH5FyYj4=
 =GcBb
 -----END PGP SIGNATURE-----

Merge tag 'cleanup-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC cleanups from Arnd Bergmann:
 "These cleanup patches are mainly move stuff around and should all be
  harmless.  They are mainly split out so that other branches can be
  based on top to avoid conflicts.

  Notable changes are:

   - We finally remove all mach/timex.h, after CLOCK_TICK_RATE is no
     longer used (Uwe Kleine-König)
   - The Qualcomm MSM platform is split out into legacy mach-msm and
     new-style mach-qcom, to allow easier maintainance of the new
     hardware support without regressions (Kumar Gala)
   - A rework of some of the Kconfig logic to simplify multiplatform
     support (Rob Herring)
   - Samsung Exynos gets closer to supporting multiplatform (Sachin
     Kamat and others)
   - mach-bcm3528 gets merged into mach-bcm (Stephen Warren)
   - at91 gains some common clock framework support (Alexandre Belloni,
     Jean-Jacques Hiblot and other French people)"

* tag 'cleanup-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (89 commits)
  ARM: hisi: select HAVE_ARM_SCU only for SMP
  ARM: efm32: allow uncompress debug output
  ARM: prima2: build reset code standalone
  ARM: at91: add PWM clock
  ARM: at91: move sam9261 SoC to common clk
  ARM: at91: prepare common clk transition for sam9261 SoC
  ARM: at91: updated the at91_dt_defconfig with support for the ADS7846
  ARM: at91: dt: sam9261: Device Tree support for the at91sam9261ek
  ARM: at91: dt: defconfig: Added the sam9261 to the list of DT-enabled SOCs
  ARM: at91: dt: Add at91sam9261 dt SoC support
  ARM: at91: switch sam9rl to common clock framework
  ARM: at91/dt: define main clk frequency of at91sam9rlek
  ARM: at91/dt: define at91sam9rl clocks
  ARM: at91: prepare common clk transition for sam9rl SoCs
  ARM: at91: prepare sam9 dt boards transition to common clk
  ARM: at91: dt: sam9rl: Device Tree for the at91sam9rlek
  ARM: at91/defconfig: Add the sam9rl to the list of DT-enabled SOCs
  ARM: at91: Add at91sam9rl DT SoC support
  ARM: at91: prepare at91sam9rl DT transition
  ARM: at91/defconfig: refresh at91sam9260_9g20_defconfig
  ...
2014-04-05 13:51:19 -07:00
Linus Torvalds 9f800363bb ARM: SoC non-critical bug fixes for 3.15
Lots of isolated bug fixes that were not found to be important
 enough to be submitted before the merge window or backported
 into stable kernels.
 The vast majority of these came out of Arnd's randconfig testing
 and just prevents running into build-time bugs in configurations
 that we do not care about in practice.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUz/yEmCrR//JCVInAQIDsBAAu9uUC/uuc77953rsRqXPOCqjG4Q4g7Y+
 HGxuztTGGJN6eglK7+aRKbmSlZck6KQykevm+OYnoINcGyazXmajkUnbaVvgNCU9
 iRyRLkLjilDWBQXY5Ou3wK2WgyI4pMokRYIkp+MpQHQ5IlvJ5707IYj+FswdK5kT
 npbcP+L5oJ13afVnI18uflapr2ecXGdvfuEZw3sWpKcfefutxmEVYzRUBkNgj5Pd
 bva9GcWuA/ymRJR1XQmXh7EE+kqzGX5P0hFfaQsgtUwvY2Bv3fNia+GMLrf6pUGb
 Pl3rxyfo9VKoW0gbeVB7sk1rHTgh6ay2T8PBSz5dpyoR4A1n8BZQXPjUd7fBKv97
 VRWMXRQz5sQ05FnvJFlV5CcYikf8GFOPooUhgY7Fo1sdoDawkAOQ1AJ4yhPsx86u
 V/S3o3pMWqDGnFMFmS95iAWW7Ru66XVYsPJnFktiLXt6SLlSAY52DzV6HlStF4hi
 O9dsIi5TsOxYhSWpMFZCxHK/I805zEjGOAyTYnCQB6Lwadg0mUiwdRJvp0YzcdDM
 X1mCsz8yHM3bbhvkxbqzwnBNgz24TkDPA8IvUGFtyxGF+5m8MgAzIKcGc4PKI6Gg
 I9M0oechC2dusvfflXFinvRhZMHMHi8+t58b/+29KrsacnE5vDmBFzeWGUkCXs5q
 oo4cWe14m6U=
 =KRJL
 -----END PGP SIGNATURE-----

Merge tag 'fixes-non-critical-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC non-critical bug fixes from Arnd Bergmann:
 "Lots of isolated bug fixes that were not found to be important enough
  to be submitted before the merge window or backported into stable
  kernels.

  The vast majority of these came out of Arnd's randconfig testing and
  just prevents running into build-time bugs in configurations that we
  do not care about in practice"

* tag 'fixes-non-critical-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (75 commits)
  ARM: at91: fix a typo
  ARM: moxart: fix CPU selection
  ARM: tegra: fix board DT pinmux setup
  ARM: nspire: Fix compiler warning
  IXP4xx: Fix DMA masks.
  Revert "ARM: ixp4xx: Make dma_set_coherent_mask common, correct implementation"
  IXP4xx: Fix Goramo Multilink GPIO conversion.
  Revert "ARM: ixp4xx: fix gpio rework"
  ARM: tegra: make debug_ll code build for ARMv6
  ARM: sunxi: fix build for THUMB2_KERNEL
  ARM: exynos: add missing include of linux/module.h
  ARM: exynos: fix l2x0 saved regs handling
  ARM: samsung: select CRC32 for SAMSUNG_PM_CHECK
  ARM: samsung: select ATAGS where necessary
  ARM: samsung: fix SAMSUNG_PM_DEBUG Kconfig logic
  ARM: samsung: allow serial driver to be disabled
  ARM: s5pv210: enable IDE support in MACH_TORBRECK
  ARM: s5p64x0: fix building with only one soc type
  ARM: s3c64xx: select power domains only when used
  ARM: s3c64xx: MACH_SMDK6400 needs HSMMC1
  ...
2014-04-05 13:44:27 -07:00
Linus Torvalds 2d1eb87ae1 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM changes from Russell King:

 - Perf updates from Will Deacon:
   - Support for Qualcomm Krait processors (run perf on your phone!)
   - Support for Cortex-A12 (run perf stat on your FPGA!)
   - Support for perf_sample_event_took, allowing us to automatically decrease
     the sample rate if we can't handle the PMU interrupts quickly enough
     (run perf record on your FPGA!).

 - Basic uprobes support from David Long:
     This patch series adds basic uprobes support to ARM. It is based on
     patches developed earlier by Rabin Vincent. That approach of adding
     hooks into the kprobes instruction parsing code was not well received.
     This approach separates the ARM instruction parsing code in kprobes out
     into a separate set of functions which can be used by both kprobes and
     uprobes. Both kprobes and uprobes then provide their own semantic action
     tables to process the results of the parsing.

 - ARMv7M (microcontroller) updates from Uwe Kleine-König

 - OMAP DMA updates (recently added Vinod's Ack even though they've been
   sitting in linux-next for a few months) to reduce the reliance of
   omap-dma on the code in arch/arm.

 - SA11x0 changes from Dmitry Eremin-Solenikov and Alexander Shiyan

 - Support for Cortex-A12 CPU

 - Align support for ARMv6 with ARMv7 so they can cooperate better in a
   single zImage.

 - Addition of first AT_HWCAP2 feature bits for ARMv8 crypto support.

 - Removal of IRQ_DISABLED from various ARM files

 - Improved efficiency of virt_to_page() for single zImage

 - Patch from Ulf Hansson to permit runtime PM callbacks to be available for
   AMBA devices for suspend/resume as well.

 - Finally kill asm/system.h on ARM.

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (89 commits)
  dmaengine: omap-dma: more consolidation of CCR register setup
  dmaengine: omap-dma: move IRQ handling to omap-dma
  dmaengine: omap-dma: move register read/writes into omap-dma.c
  ARM: omap: dma: get rid of 'p' allocation and clean up
  ARM: omap: move dma channel allocation into plat-omap code
  ARM: omap: dma: get rid of errata global
  ARM: omap: clean up DMA register accesses
  ARM: omap: remove almost-const variables
  ARM: omap: remove references to disable_irq_lch
  dmaengine: omap-dma: cleanup errata 3.3 handling
  dmaengine: omap-dma: provide register read/write functions
  dmaengine: omap-dma: use cached CCR value when enabling DMA
  dmaengine: omap-dma: move barrier to omap_dma_start_desc()
  dmaengine: omap-dma: move clnk_ctrl setting to preparation functions
  dmaengine: omap-dma: improve efficiency loading C.SA/C.EI/C.FI registers
  dmaengine: omap-dma: consolidate clearing channel status register
  dmaengine: omap-dma: move CCR buffering disable errata out of the fast path
  dmaengine: omap-dma: provide register definitions
  dmaengine: omap-dma: consolidate setup of CCR
  dmaengine: omap-dma: consolidate setup of CSDP
  ...
2014-04-05 13:20:43 -07:00
Dave Airlie 9f97ba806a Merge tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel into drm-next
Merge window -fixes pull request as usual. Well, I did sneak in Jani's
drm_i915_private_t typedef removal, need to have fun with a big sed job
too ;-)

Otherwise:
- hdmi interlaced fixes (Jesse&Ville)
- pipe error/underrun/crc tracking fixes, regression in late 3.14-rc (but
  not cc: stable since only really relevant for igt runs)
- large cursor wm fixes (Chris)
- fix gpu turbo boost/throttle again, was getting stuck due to vlv rps
  patches (Chris+Imre)
- fix runtime pm fallout (Paulo)
- bios framebuffer inherit fix (Chris)
- a few smaller things

* tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel: (196 commits)
  Skip intel_crt_init for Dell XPS 8700
  drm/i915: vlv: fix RPS interrupt mask setting
  Revert "drm/i915/vlv: fixup DDR freq detection per Punit spec"
  drm/i915: move power domain init earlier during system resume
  drm/i915: Fix the computation of required fb size for pipe
  drm/i915: don't get/put runtime PM at the debugfs forcewake file
  drm/i915: fix WARNs when reading DDI state while suspended
  drm/i915: don't read cursor registers on powered down pipes
  drm/i915: get runtime PM at i915_display_info
  drm/i915: don't read pp_ctrl_reg if we're suspended
  drm/i915: get runtime PM at i915_reg_read_ioctl
  drm/i915: don't schedule force_wake_timer at gen6_read
  drm/i915: vlv: reserve the GT power context only once during driver init
  drm/i915: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/overlay: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/ringbuffer: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/display: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/irq: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/dma: prefer struct drm_i915_private to drm_i915_private_t
  ...
2014-04-05 16:14:21 +10:00
Dave Airlie 82c68b6ccd drm/tegra: Changes for v3.15-rc1
Implement eDP support for Tegra124 and support the PRIME vmap()/vunmap()
 operations.
 
 A symbol that is required for upcoming V4L2 support is now exported by
 the host1x driver.
 
 Relicense drivers under the GPL v2 for consistency. One exception is the
 public header file, which is relicensed under MIT to abide by the common
 rule.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTPlzgAAoJEN0jrNd/PrOhR2AQALMfTgwlcUb53NkYKyuotf1g
 dcUeCXrYlOZQhEkTEBkp8rjU3kYHcLieQW5NFUpVKMy4VTvb1nXPB0VrEJjajtrx
 coAzffIVzqhWOUz4iGHphoIhzfQ6xQTNCd8B2bT/4pdnHuHNt4A10blFfxlBYPwD
 2hw4alTYpaNhsSso3dDB2ORSKZsCWlFC/bPJVA/yGtrXon/CR8Q9sGIqcEnKa6fp
 gPfdxJChr2c5FeFIgQRnkt+MHOl+SgpkzxNXX5c5ffY6kt1HvKKJZfTv4cbOsSrn
 7xPtgv0PKiiGtReRXZxZKB/xOGKJBCDM2oXfv02pMT5bCIRTzpmkWne3cuU2b2Mn
 FN67ZBHCSPRiBcdHIc7pGwP8jIg21zZ/7IqWW9/4yAXksYV3Ii7TdQY3eL3PCrBP
 3802ygJznKuVx2S1xLMI7z4DXV+44cLCCWzmglWEQPQfKFCVgTsmuLr8HiM1Tj1m
 YvEibgL72ggDsInGF4nrwidEirvtRqHSn/qcD19p1gRORKxR8P7e9LUmWN/PHlkV
 iKfcaMyWpHuCLcCyKC2b9iieAtLDz1Hsn9MiaQ7BcZUVVMAS6OVrrrm14Q5Wbi/Z
 RxfF0hRjPDEXyrxo2LKrVLQbxeMhkmBfkc532YZCwSxoWvgScUfE73lB/kk68Iv2
 c0WnbuHrH41dslXH4yPl
 =LGu0
 -----END PGP SIGNATURE-----

Merge tag 'drm/tegra/for-3.15-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next

drm/tegra: Changes for v3.15-rc1

Implement eDP support for Tegra124 and support the PRIME vmap()/vunmap()
operations.

A symbol that is required for upcoming V4L2 support is now exported by
the host1x driver.

Relicense drivers under the GPL v2 for consistency. One exception is the
public header file, which is relicensed under MIT to abide by the common
rule.

* tag 'drm/tegra/for-3.15-rc1' of git://anongit.freedesktop.org/tegra/linux:
  drm/tegra: Use standard GPL v2 license text
  drm/tegra: Relicense under GPL v2
  drm/tegra: Relicense public header under MIT
  drm/tegra: Add eDP support
  gpu: host1x: export host1x_syncpt_incr_max() function
  drm/tegra: prime: Add vmap support
2014-04-05 16:13:08 +10:00
Linus Torvalds 8e0c083234 fbdev changes for 3.15 (main part)
Various fbdev fixes and improvements, but nothing big.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTPoIDAAoJEPo9qoy8lh71ScsQAI4sxk8ci/Al6uh3aOw90M+b
 Iy3MnBfjcItmtpmNUm8JJQ7ce5hfX2H8+uONu5iQCVLXZZyqDsIUtoJlZn2XCnEV
 3WFtHE12UwdiAiTYdinK6PH42A5G2U5RC2QMtPxAdm/QoKJgxI3ngsWJydFqWsRc
 wT7cFZ90ZybFc1+n5gcoXqoSfIm9ve/jTAVIzYCWRpqVUpsHbFGmK3t+MRaw4d6V
 AhJVYAwF5OJkB/QiKOiAmP1QytC7mKDtGy/GlQAZhrr924OK+/EVM1gUTdJEiE7x
 j1cZhQh5w/zisQE7N05ch5BTAZZRWMa5eUc/BbPal8sNwu3oK+WE5VOxClGLhCr/
 x3zq5Dzxr9r3xwonXILzUBqpGgmPuKPa0mAeknE0/tAgtqRvbLBttc1bp/sJj6/8
 6eNMEVg1SOg4387kR/kF4dRgeiLF9enrwNxVgm3rlJOXQQcpmqC7F4e9yiGdfqyr
 ZVzGYIvwErTLL/T/UBc6OTyGoG7Ifake1PEV8H0GWaF5uzIEZXoFJrEx7o9j9SR/
 H7qOcz7lm3YSmKB0tdgfBGMOnkOaWn+GMTW8N4M071PHyVEyJmz9uxLHQSiRRuMc
 Fwr1r2SyuxP1TmyIGT/Kyk2rApx/KP1TH/Z8MbFNL7J6VwgBzNrkIj1DbkngCODf
 VcbZiq04uICNKw8QsHtg
 =bavN
 -----END PGP SIGNATURE-----

Merge tag 'fbdev-main-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux

Pull fbdev changes from Tomi Valkeinen:
 "Various fbdev fixes and improvements, but nothing big"

* tag 'fbdev-main-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (38 commits)
  fbdev: Make the switch from generic to native driver less alarming
  Video: atmel: avoid the id of fix screen info is overwritten
  video: imxfb: Add DT default contrast control register property.
  video: atmel_lcdfb: ensure the hardware is initialized with the correct mode
  fbdev: vesafb: add dev->remove() callback
  fbdev: efifb: add dev->remove() callback
  video: pxa3xx-gcu: switch to devres functions
  video: pxa3xx-gcu: provide an empty .open call
  video: pxa3xx-gcu: pass around struct device *
  video: pxa3xx-gcu: rename some symbols
  sisfb: fix 1280x720 resolution support
  video: fbdev: uvesafb: Remove impossible code path in uvesafb_init_info
  video: fbdev: uvesafb: Remove redundant NULL check in uvesafb_remove
  fbdev: FB_OPENCORES should depend on HAS_DMA
  OMAPDSS: convert pixel clock to common videomode style
  OMAPDSS: Remove unused get_context_loss_count support
  OMAPDSS: use DISPC register to detect context loss
  video: da8xx-fb: Use "SIMPLE_DEV_PM_OPS" macro
  video: imxfb: Convert to SIMPLE_DEV_PM_OPS
  video: imxfb: Resolve mismatch between backlight/contrast
  ...
2014-04-04 21:28:36 -07:00
Linus Torvalds d1d9cfc330 A number of cleanups plus support for the RDSEED instruction, which
will be showing up in Intel Broadwell CPU's.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTPaDyAAoJENNvdpvBGATwIVkP/Ao77ATdIBKr7kJJ+7q/jroo
 hfg2EOBzYwFSTZ4EUCwTwCNOqcAnr8xBYRWDDTxoiT0ORQN+4QkIXlukTBZbeCUg
 5ZCvRYhNcrieCBX69X/0aQ7yAkMR5/KSjccww3Tr6bLEItEBS7/gkeGv0H/E1DOg
 /1C3vbDDRPchM2qGM/TS/2Wd3WU5jWJk4cURJUlweBWHGXItOk6yHIY/+h3t4Bj1
 eJfxwG21BLJ1t159uvMj5Sw9Ketwnzgc4hsU6Ai/o37+Q12LYJ6xRCHsAatr6aA8
 dsObrfnD15T3gH/elVzB0tpBkor9pESkJAvUxWdMNMNGVU01ONuBz6rbSUqLdKu4
 GXA+9iHq4Ti43f4M886PweSweFr1+lFIiQrdAQaWZdrooLCMJCgjMNM+7wnlu8ho
 Eit65rq4k6y5Fb5GTB8Am/TCb3HNYhXnrMERA0meKx+/K8yaMygzHtPoT+DHdn8z
 GWDwNiMYrtTeiJ5nduljO3M2lYJDIWLkCD9SwupXe5HEj3qSHS/aiT5ng6YP+oOK
 mmUiWWgpVv3unO6pJw49pbzuAWXnbHVWMV2VDILVxn7mZZ0NAOL3OGAmK7US+Z/J
 qww9s89A2oba8vTbdrW7JuiP9ESlujpfrLQQgnW6WInMHY2dfo5hSo84ULLeLMXQ
 SqT1yywl4U888jqBYioB
 =RtMT
 -----END PGP SIGNATURE-----

Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull /dev/random changes from Ted Ts'o:
 "A number of cleanups plus support for the RDSEED instruction, which
  will be showing up in Intel Broadwell CPU's"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: Add arch_has_random[_seed]()
  random: If we have arch_get_random_seed*(), try it before blocking
  random: Use arch_get_random_seed*() at init time and once a second
  x86, random: Enable the RDSEED instruction
  random: use the architectural HWRNG for the SHA's IV in extract_buf()
  random: clarify bits/bytes in wakeup thresholds
  random: entropy_bytes is actually bits
  random: simplify accounting code
  random: tighten bound on random_read_wakeup_thresh
  random: forget lock in lockless accounting
  random: simplify accounting logic
  random: fix comment on "account"
  random: simplify loop in random_read
  random: fix description of get_random_bytes
  random: fix comment on proc_do_uuid
  random: fix typos / spelling errors in comments
2014-04-04 21:25:28 -07:00
Ilya Dryomov 18cb95af2d libceph: enable PRIMARY_AFFINITY feature bit
Announce our support for osdmaps with non-default primary affinity
values.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:20 -07:00
Ilya Dryomov 8008ab1080 libceph: return primary from ceph_calc_pg_acting()
In preparation for adding support for primary_temp, stop assuming
primaryness: add a primary out parameter to ceph_calc_pg_acting() and
change call sites accordingly.  Primary is now specified separately
from the order of osds in the set.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:14 -07:00
Ilya Dryomov ac972230e2 libceph: switch ceph_calc_pg_acting() to new helpers
Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(),
raw_to_up_osds() and apply_temps().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:13 -07:00
Ilya Dryomov 2abebdbca7 libceph: ceph_can_shift_osds(pool) and pool type defines
Bring in pg_pool_t::can_shift_osds() counterpart along with pool type
defines.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:08 -07:00
Ilya Dryomov 246138fa67 libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions
Sync up with ceph.git definitions.  Bring in ceph_osd_is_down().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:07 -07:00
Ilya Dryomov ddf3a21a03 libceph: enable OSDMAP_ENC feature bit
Announce our support for "new" (v7 - split and separately versioned
client and osd sections) osdmap enconding.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:05 -07:00
Ilya Dryomov 2cfa34f2d6 libceph: primary_affinity infrastructure
Add primary_affinity infrastructure.  primary_affinity values are
stored in an max_osd-sized array, hanging off ceph_osdmap, similar to
a osd_weight array.

Introduce {get,set}_primary_affinity() helpers, primarily to return
CEPH_OSD_DEFAULT_PRIMARY_AFFINITY when no affinity has been set and to
abstract out osd_primary_affinity array allocation and initialization.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:02 -07:00
Ilya Dryomov 9686f94c8c libceph: primary_temp infrastructure
Add primary_temp mappings infrastructure.  struct ceph_pg_mapping is
overloaded, primary_temp mappings are stored in an rb-tree, rooted at
ceph_osdmap, in a manner similar to pg_temp mappings.

Dump primary_temp mappings to /sys/kernel/debug/ceph/<client>/osdmap,
one 'primary_temp <pgid> <osd>' per line, e.g:

    primary_temp 2.6 4

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:07:58 -07:00
Ilya Dryomov 35a935d75d libceph: generalize ceph_pg_mapping
In preparation for adding support for primary_temp mappings, generalize
struct ceph_pg_mapping so it can hold mappings other than pg_temp.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:07:57 -07:00
Ilya Dryomov a2505d63ee libceph: split osdmap allocation and decode steps
Split osdmap allocation and initialization into a separate function,
ceph_osdmap_decode().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:07:37 -07:00
Ilya Dryomov 07bd7de47a crush: support chooseleaf_vary_r tunable (tunables3) by default
Add TUNABLES3 feature (chooseleaf_vary_r tunable) to a set of features
supported by default.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04 21:07:29 -07:00
Ilya Dryomov d83ed858f1 crush: add SET_CHOOSELEAF_VARY_R step
This lets you adjust the vary_r tunable on a per-rule basis.

Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04 21:07:28 -07:00
Ilya Dryomov e2b149cc4b crush: add chooseleaf_vary_r tunable
The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call.  That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04 21:07:26 -07:00
Yan, Zheng eb13e832f8 ceph: use fl->fl_file as owner identifier of flock and posix lock
flock and posix lock should use fl->fl_file instead of process ID
as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner
is usually equal to fl->fl_file, but it also can be a customized
value). The process ID of who holds the lock is just for F_GETLK
fcntl(2).

The fix is rename the 'pid' fields of struct ceph_mds_request_args
and struct ceph_filelock to 'owner', rename 'pid_namespace' fields
to 'pid'. Assign fl->fl_file to the 'owner' field of lock messages.
We also set the most significant bit of the 'owner' field. MDS can
use that bit to distinguish between old and new clients.

The MDS counterpart of this patch modifies the flock code to not
take the 'pid_namespace' into consideration when checking conflict
locks.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04 21:07:11 -07:00
Linus Torvalds d15e03104e xfs: update for 3.15-rc1
The main changes in the XFS tree for 3.15-rc1 are:
 
         - O_TMPFILE support
         - allowing AIO+DIO writes beyond EOF
         - FALLOC_FL_COLLAPSE_RANGE support for fallocate syscall and XFS
           implementation
         - FALLOC_FL_ZERO_RANGE support for fallocate syscall and XFS
           implementation
         - IO verifier cleanup and rework
         - stack usage reduction changes
         - vm_map_ram NOIO context fixes to remove lockdep warings
         - various bug fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJTPykMAAoJEK3oKUf0dfod/KoP/jKQwzQPdtT8EtAu5vENh9AO
 55zwCDXXFjCNIGIFPkrUBQbbARVAqhLZn3vuLUUhqtRRELdgJy/yFKZ37MPd8bhU
 dKetivEB192Jcd6Sn74vsOsNLm1u9mJqbQ1aothz0TiOrkkWFZlz4Otu36MZRHN3
 9WgZXWSxr6I/hYHGyCorJWZ5ISs0XD3vR5dYXYeZChbTpTxlCT4X/YgUtW4WH/Tq
 y4gG0fKfwr9KK07/LXuQgUuZGU8vwVuNNsXPhqh+FZ39SLD2Ea83h46Hzf/+vVNI
 kCIyYN1y40uBWczmwAptVEnUwhpGK8PzNrhKwTZICDtuct9sikf7c+o0aEE9lcqo
 8IBt0Dy4l7BQVFSZOjYo5Jw5a8jAbkh47zru31HxogEVqafdz80iWB12JagOOnXM
 v/McvDvZMyfgGckih32FM4G7ElvTYgGai5/3dLhfMuhc4/DdwcBOF1yHmFmnjhWO
 QRsQxLdefUtP3MfMYKaJHM6v2wE1S2l0owgp+HdPluNiOUmH/fqFq1WpHxqqeRPz
 nuHF8oYlxaZP5WAarz6Yf1/twIeZJ1rTD8np8uocvMqQJzMYJgrQyH+xJqjJaITR
 iveQcEoRB8D7/fXMGDdcjZYE2fG4l4JE2kuh97k5NZw76e3v2YXSGh0kd9WqR1uN
 t07joLRQKR2pJuSmuD5E
 =uSkJ
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Dave Chinner:
 "There are a couple of new fallocate features in this request - it was
  decided that it was easiest to push them through the XFS tree using
  topic branches and have the ext4 support be based on those branches.
  Hence you may see some overlap with the ext4 tree merge depending on
  how they including those topic branches into their tree.  Other than
  that, there is O_TMPFILE support, some cleanups and bug fixes.

  The main changes in the XFS tree for 3.15-rc1 are:

   - O_TMPFILE support
   - allowing AIO+DIO writes beyond EOF
   - FALLOC_FL_COLLAPSE_RANGE support for fallocate syscall and XFS
     implementation
   - FALLOC_FL_ZERO_RANGE support for fallocate syscall and XFS
     implementation
   - IO verifier cleanup and rework
   - stack usage reduction changes
   - vm_map_ram NOIO context fixes to remove lockdep warings
   - various bug fixes and cleanups"

* tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfs: (34 commits)
  xfs: fix directory hash ordering bug
  xfs: extra semi-colon breaks a condition
  xfs: Add support for FALLOC_FL_ZERO_RANGE
  fs: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
  xfs: inode log reservations are still too small
  xfs: xfs_check_page_type buffer checks need help
  xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation
  xfs: use NOIO contexts for vm_map_ram
  xfs: don't leak EFSBADCRC to userspace
  xfs: fix directory inode iolock lockdep false positive
  xfs: allocate xfs_da_args to reduce stack footprint
  xfs: always do log forces via the workqueue
  xfs: modify verifiers to differentiate CRC from other errors
  xfs: print useful caller information in xfs_error_report
  xfs: add xfs_verifier_error()
  xfs: add helper for updating checksums on xfs_bufs
  xfs: add helper for verifying checksums on xfs_bufs
  xfs: Use defines for CRC offsets in all cases
  xfs: skip pointless CRC updates after verifier failures
  xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
  ...
2014-04-04 15:50:08 -07:00
Linus Torvalds 24e7ea3bea Major changes for 3.14 include support for the newly added ZERO_RANGE
and COLLAPSE_RANGE fallocate operations, and scalability improvements
 in the jbd2 layer and in xattr handling when the extended attributes
 spill over into an external block.
 
 Other than that, the usual clean ups and minor bug fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTPbD2AAoJENNvdpvBGATwDmUQANSfGYIQazB8XKKgtNTMiG/Y
 Ky7n1JzN9lTX/6nMsqQnbfCweLRmxqpWUBuyKDRHUi8IG0/voXSTFsAOOgz0R15A
 ERRRWkVvHixLpohuL/iBdEMFHwNZYPGr3jkm0EIgzhtXNgk5DNmiuMwvHmCY27kI
 kdNZIw9fip/WRNoFLDBGnLGC37aanoHhCIbVlySy5o9LN1pkC8BgXAYV0Rk19SVd
 bWCudSJEirFEqWS5H8vsBAEm/ioxTjwnNL8tX8qms6orZ6h8yMLFkHoIGWPw3Q15
 a0TSUoMyav50Yr59QaDeWx9uaPQVeK41wiYFI2rZOnyG2ts0u0YXs/nLwJqTovgs
 rzvbdl6cd3Nj++rPi97MTA7iXK96WQPjsDJoeeEgnB0d/qPyTk6mLKgftzLTNgSa
 ZmWjrB19kr6CMbebMC4L6eqJ8Fr66pCT8c/iue8wc4MUHi7FwHKH64fqWvzp2YT/
 +165dqqo2JnUv7tIp6sUi1geun+bmDHLZFXgFa7fNYFtcU3I+uY1mRr3eMVAJndA
 2d6ASe/KhQbpVnjKJdQ8/b833ZS3p+zkgVPrd68bBr3t7gUmX91wk+p1ct6rUPLr
 700F+q/pQWL8ap0pU9Ht/h3gEJIfmRzTwxlOeYyOwDseqKuS87PSB3BzV3dDunSU
 DrPKlXwIgva7zq5/S0Vr
 =4s1Z
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Major changes for 3.14 include support for the newly added ZERO_RANGE
  and COLLAPSE_RANGE fallocate operations, and scalability improvements
  in the jbd2 layer and in xattr handling when the extended attributes
  spill over into an external block.

  Other than that, the usual clean ups and minor bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
  ext4: fix premature freeing of partial clusters split across leaf blocks
  ext4: remove unneeded test of ret variable
  ext4: fix comment typo
  ext4: make ext4_block_zero_page_range static
  ext4: atomically set inode->i_flags in ext4_set_inode_flags()
  ext4: optimize Hurd tests when reading/writing inodes
  ext4: kill i_version support for Hurd-castrated file systems
  ext4: each filesystem creates and uses its own mb_cache
  fs/mbcache.c: doucple the locking of local from global data
  fs/mbcache.c: change block and index hash chain to hlist_bl_node
  ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
  ext4: refactor ext4_fallocate code
  ext4: Update inode i_size after the preallocation
  ext4: fix partial cluster handling for bigalloc file systems
  ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
  ext4: only call sync_filesystm() when remounting read-only
  fs: push sync_filesystem() down to the file system's remount_fs()
  jbd2: improve error messages for inconsistent journal heads
  jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
  jbd2: minimize region locked by j_list_lock in journal_get_create_access()
  ...
2014-04-04 15:39:39 -07:00
Linus Torvalds f7789dc0d4 Merge branch 'locks-3.15' of git://git.samba.org/jlayton/linux
Pull file locking updates from Jeff Layton:
 "Highlights:

   - maintainership change for fs/locks.c.  Willy's not interested in
     maintaining it these days, and is OK with Bruce and I taking it.
   - fix for open vs setlease race that Al ID'ed
   - cleanup and consolidation of file locking code
   - eliminate unneeded BUG() call
   - merge of file-private lock implementation"

* 'locks-3.15' of git://git.samba.org/jlayton/linux:
  locks: make locks_mandatory_area check for file-private locks
  locks: fix locks_mandatory_locked to respect file-private locks
  locks: require that flock->l_pid be set to 0 for file-private locks
  locks: add new fcntl cmd values for handling file private locks
  locks: skip deadlock detection on FL_FILE_PVT locks
  locks: pass the cmd value to fcntl_getlk/getlk64
  locks: report l_pid as -1 for FL_FILE_PVT locks
  locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT"
  locks: rename locks_remove_flock to locks_remove_file
  locks: consolidate checks for compatible filp->f_mode values in setlk handlers
  locks: fix posix lock range overflow handling
  locks: eliminate BUG() call when there's an unexpected lock on file close
  locks: add __acquires and __releases annotations to locks_start and locks_stop
  locks: remove "inline" qualifier from fl_link manipulation functions
  locks: clean up comment typo
  locks: close potential race between setlease and open
  MAINTAINERS: update entry for fs/locks.c
2014-04-04 14:21:20 -07:00
Linus Torvalds 7df934526c Merge branch 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull renameat2 system call from Miklos Szeredi:
 "This adds a new syscall, renameat2(), which is the same as renameat()
  but with a flags argument.

  The purpose of extending rename is to add cross-rename, a symmetric
  variant of rename, which exchanges the two files.  This allows
  interesting things, which were not possible before, for example
  atomically replacing a directory tree with a symlink, etc...  This
  also allows overlayfs and friends to operate on whiteouts atomically.

  Andy Lutomirski also suggested a "noreplace" flag, which disables the
  overwriting behavior of rename.

  These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only
  implemented for ext4 as an example and for testing"

* 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ext4: add cross rename support
  ext4: rename: split out helper functions
  ext4: rename: move EMLINK check up
  ext4: rename: create ext4_renament structure for local vars
  vfs: add cross-rename
  vfs: lock_two_nondirectories: allow directory args
  security: add flags to rename hooks
  vfs: add RENAME_NOREPLACE flag
  vfs: add renameat2 syscall
  vfs: rename: use common code for dir and non-dir
  vfs: rename: move d_move() up
  vfs: add d_is_dir()
2014-04-04 14:03:05 -07:00
Bryan Wu 64400c3791 gpu: host1x: export host1x_syncpt_incr_max() function
Tegra V4L2 camera driver needs this function to do frame capture.

Signed-off-by: Bryan Wu <pengw@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2014-04-04 09:12:49 +02:00
Linus Torvalds 73f10274a6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
 "The first round of updates for the input subsystem.

  Just new drivers and existing driver fixes, no core changes except for
  the new uinput IOCTL to allow userspace to fetch sysfs name of the
  input device that was created"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (43 commits)
  Input: edt-ft5x06 - add a missing condition
  Input: appletouch - fix jumps when additional fingers are detected
  Input: appletouch - implement sensor data smoothing
  Input: add driver for SOC button array
  Input: pm8xxx-vibrator - add DT match table
  Input: pmic8xxx-pwrkey - migrate to DT
  Input: pmic8xxx-keypad - migrate to DT
  Input: pmic8xxx-keypad - migrate to regmap APIs
  Input: pmic8xxx-keypad - migrate to devm_* APIs
  Input: pmic8xxx-keypad - fix build by removing gpio configuration
  Input: add new driver for ARM CLPS711X keypad
  Input: edt-ft5x06 - add support for M09 firmware version
  Input: edt-ft5x06 - ignore touchdown events
  Input: edt-ft5x06 - adjust delays to conform datasheet
  Input: edt-ft5x06 - add DT support
  Input: edt-ft5x06 - several cleanups; no functional change
  Input: appletouch - dial back fuzz setting
  Input: remove obsolete tnetv107x drivers
  Input: sirfsoc-onkey - set the capability of reporting KEY_POWER
  Input: da9052_onkey - use correct register bit for key status
  ...
2014-04-03 17:02:31 -07:00
Linus Torvalds 877f075aac Main batch of InfiniBand/RDMA changes for 3.15:
- The biggest change is core API extensions and mlx5 low-level driver
    support for handling DIF/DIX-style protection information, and the
    addition of PI support to the iSER initiator.  Target support will be
    arriving shortly through the SCSI target tree.
 
  - A nice simplification to the "umem" memory pinning library now that
    we have chained sg lists.  Kudos to Yishai Hadas for realizing our
    code didn't have to be so crazy.
 
  - Another nice simplification to the sg wrappers used by qib, ipath and
    ehca to handle their mapping of memory to adapter.
 
  - The usual batch of fixes to bugs found by static checkers etc. from
    intrepid people like Dan Carpenter and Yann Droneaud.
 
  - A large batch of cxgb4, ocrdma, qib driver updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTPYBnAAoJEENa44ZhAt0hGI4P/29eotGwpkANUQE6FQvxCUL2
 CXJtSg52lmYvGJrPK4IhihpbtQmHJz3iXEzlOOWidTw1dJgObR6vFaRymh7+vDLs
 CdzybMcXdasarqTuYeJbFzhkimpwtWWrMy/8Ik/Jj/5glGQ6cUSpdYZzVtFhYNqf
 hCGE8iLi+tuekJJj1htut5D6apXM7udcdc2yLJNOdsSj/VUXt1oqG1x9xAi9R8Tq
 7o8eFSStdlja0EBQ6Hli2zauCSnQkaUtr8h6EAFbcCtvBK8HqsHSc2gfq2ViFUiN
 ztt167oWoQnVkR0qCPL5nVt+CRQHHROprVXvbpcTI3aW61gNIl6OrUUOXefzHXac
 TNi+fdMpiEB/JQ4Z04Jzd1dGCSjYeTqPj4rO4meFjBmxRDdTgZHu7FWwejT1nYJ5
 d2abVdCOT+QWlIlM7m/pjdWJII5OYM+4/jtTayGepEaR4fTUzKtPZPBLNUBDBKE+
 4f92PC8LiuPkwJgb6XT96onPz1bDCOnPSEdwoKUFKPeGUcwgVOM/Wx5NU4Yf7rfg
 RxQwZ7mJXbjCYFlmGGo/0QDy6UEGkIFYlJSzooP+wlK1JvZ5h2M+9QKX2FtwzR+R
 I2kBxcTXWsM/h88R7MkNqbNIllmhssrJwmAE46OneZbfoBOB+JZjb4nLRTu0jEcS
 zn6f16GmJ37BKn2/qYY/
 =Ww6H
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.15:

   - The biggest change is core API extensions and mlx5 low-level driver
     support for handling DIF/DIX-style protection information, and the
     addition of PI support to the iSER initiator.  Target support will
     be arriving shortly through the SCSI target tree.

   - A nice simplification to the "umem" memory pinning library now that
     we have chained sg lists.  Kudos to Yishai Hadas for realizing our
     code didn't have to be so crazy.

   - Another nice simplification to the sg wrappers used by qib, ipath
     and ehca to handle their mapping of memory to adapter.

   - The usual batch of fixes to bugs found by static checkers etc.
     from intrepid people like Dan Carpenter and Yann Droneaud.

   - A large batch of cxgb4, ocrdma, qib driver updates"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (102 commits)
  RDMA/ocrdma: Unregister inet notifier when unloading ocrdma
  RDMA/ocrdma: Fix warnings about pointer <-> integer casts
  RDMA/ocrdma: Code clean-up
  RDMA/ocrdma: Display FW version
  RDMA/ocrdma: Query controller information
  RDMA/ocrdma: Support non-embedded mailbox commands
  RDMA/ocrdma: Handle CQ overrun error
  RDMA/ocrdma: Display proper value for max_mw
  RDMA/ocrdma: Use non-zero tag in SRQ posting
  RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr()
  RDMA/ocrdma: Increment abi version count
  RDMA/ocrdma: Update version string
  be2net: Add abi version between be2net and ocrdma
  RDMA/ocrdma: ABI versioning between ocrdma and be2net
  RDMA/ocrdma: Allow DPP QP creation
  RDMA/ocrdma: Read ASIC_ID register to select asic_gen
  RDMA/ocrdma: SQ and RQ doorbell offset clean up
  RDMA/ocrdma: EQ full catastrophe avoidance
  RDMA/cxgb4: Disable DSGL use by default
  RDMA/cxgb4: rx_data() needs to hold the ep mutex
  ...
2014-04-03 16:57:19 -07:00
Linus Torvalds 154d6f18a4 This is the bulk of GPIO changes for v3.15:
- Merged in a branch of irqchip changes from Thomas
   Gleixner: we need to have new callbacks from the
   irqchip to determine if the GPIO line will be eligible
   for IRQs, and this callback must be able to say "no".
   After some thinking I got the branch from tglx and
   have switched all current users over to use this.
 
 - Based on tglx patches, we have added some generic
   irqchip helpers in the gpiolib core. These will
   help centralize code when GPIO drivers have simple
   chained/cascaded IRQs. Drivers will still define
   their irqchip vtables, but the gpiolib core will
   take care of irqdomain set-up, mapping from local
   offsets to Linux irqs, and reserve resources by
   marking the GPIO lines for IRQs.
 
 - Initially the PL061 and Nomadik GPIO/pin control
   drivers have been switched over to use the new
   gpiochip-to-irqchip infrastructure with more
   drivers expected for the next kernel cycle. The
   factoring of just two drivers still makes it worth
   it so it is already a win.
 
 - A new driver for the Synopsys DesignWare APB GPIO
   block.
 
 - Modify the DaVinci GPIO driver to be reusable also
   for the new TI Keystone architecture.
 
 - A new driver for the LSI ZEVIO SoCs.
 
 - Delete the obsolte tnetv107x driver.
 
 - Some incremental work on GPIO descriptors: have
   gpiod_direction_output() use a logical level,
   respecting assertion polarity through ACTIVE_LOW
   flags, adding gpiod_direction_output_raw() for the
   case where you want to set that very value. Add
   gpiochip_get_desc() to fetch a GPIO descriptor from
   a specific offset on a certain chip inside driver
   code.
 
 - Switch ACPI GPIO code over to using
   gpiochip_get_desc() and get rid of gpio_to_desc().
 
 - The ACPI GPIO event handling code has been reworked
   after encountering an actual real life implementation.
 
 - Support for ACPI GPIO operation regions.
 
 - Generic GPIO chips can now be assigned labels/names
   from platform data.
 
 - We now clamp values returned from GPIO drivers to
   the boolean [0,1] range.
 
 - Some improved documentation on how to use the polarity
   flag was added.
 
 - The a large slew of incremental driver updates and
   non-critical fixes. Some targeted for stable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTPQnkAAoJEEEQszewGV1zyf4P/AmXV0O/FoyeQnXDxDsp7V/t
 JpfD0Gy8FlFmRxjG+UYutRCWUHxFQJU+j0ToVC4/N8clNS1LwA+ZwhNgB8dqRokz
 JVeeqUPn95z2kGe3j9DgVXWMRAytq7y8fXFuNUN36losceuxyOj4mYKLP9Yjnp9l
 4pS1TtQHF95a7qmnyYjGZy8VNcUz1gJ7wJrGxKI+Kl/8pcdA6rPqom6ozCXpZjaD
 5GGQoSvXKIn44+8qZeJsebd1YEso/8K66e9JomcGEsuZl78ArDOzoSllpYF2h/RM
 bo4BFUmoOL3/jVp7FFVbybfolwuRmQesY4NFqx03e+y++hxHFHl90FT+mnednS2Q
 k4lB0o1YRjf2tfMmm4cJ3tVBnFRhssTVb9ynDbzUw61mNVEuxP90f/njrHlObnPT
 1uVVWUE+4ojral213S2IYGHkg1OlWSn0DP6tEaswjOsGJrMdXpdxS5RPwcRtcByT
 HufZRNbUbLzXBzf4WeV2foSS3XqbXYcuMfdRBSWrbuJqW56robbdKKyvrMRPvh7j
 FV7SEK0yFPRe3nuzKM+t9TDGdUt4qivv/YfVeGfCnTVgFOac6cKrHG9gzM58mVcb
 4czG3B1TbqgfGVeZuew4qUdlHSmnSsS+pf/h9Yh9QCHqaKGh3R17cSDxIKAIVTIW
 pH6nuShTXsbrmRMeMhux
 =8Qqf
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull bulk of gpio updates from Linus Walleij:
 "A pretty big chunk of changes this time, but it has all been on
  rotation in linux-next and had some testing.  Of course there will be
  some amount of fixes on top...

   - Merged in a branch of irqchip changes from Thomas Gleixner: we need
     to have new callbacks from the irqchip to determine if the GPIO
     line will be eligible for IRQs, and this callback must be able to
     say "no".  After some thinking I got the branch from tglx and have
     switched all current users over to use this.

   - Based on tglx patches, we have added some generic irqchip helpers
     in the gpiolib core.  These will help centralize code when GPIO
     drivers have simple chained/cascaded IRQs.  Drivers will still
     define their irqchip vtables, but the gpiolib core will take care
     of irqdomain set-up, mapping from local offsets to Linux irqs, and
     reserve resources by marking the GPIO lines for IRQs.

   - Initially the PL061 and Nomadik GPIO/pin control drivers have been
     switched over to use the new gpiochip-to-irqchip infrastructure
     with more drivers expected for the next kernel cycle.  The
     factoring of just two drivers still makes it worth it so it is
     already a win.

   - A new driver for the Synopsys DesignWare APB GPIO block.

   - Modify the DaVinci GPIO driver to be reusable also for the new TI
     Keystone architecture.

   - A new driver for the LSI ZEVIO SoCs.

   - Delete the obsolte tnetv107x driver.

   - Some incremental work on GPIO descriptors: have
     gpiod_direction_output() use a logical level, respecting assertion
     polarity through ACTIVE_LOW flags, adding gpiod_direction_output_raw()
     for the case where you want to set that very value.  Add
     gpiochip_get_desc() to fetch a GPIO descriptor from a specific
     offset on a certain chip inside driver code.

   - Switch ACPI GPIO code over to using gpiochip_get_desc() and get rid
     of gpio_to_desc().

   - The ACPI GPIO event handling code has been reworked after
     encountering an actual real life implementation.

   - Support for ACPI GPIO operation regions.

   - Generic GPIO chips can now be assigned labels/names from platform
     data.

   - We now clamp values returned from GPIO drivers to the boolean [0,1]
     range.

   - Some improved documentation on how to use the polarity flag was
     added.

   - a large slew of incremental driver updates and non-critical fixes.
     Some targeted for stable"

* tag 'gpio-v3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (80 commits)
  gpio: rcar: Add helper variable dev = &pdev->dev
  gpio-lynxpoint: force gpio_get() to return "1" and "0" only
  gpio: unmap gpio irqs properly
  pch_gpio: set value before enabling output direction
  gpio: moxart: Actually set output state in moxart_gpio_direction_output()
  gpio: moxart: Avoid forward declaration
  gpio: mxs: Allow for recursive enable_irq_wake() call
  gpio: samsung: Add missing "break" statement
  gpio: twl4030: Remove redundant assignment
  gpio: dwapb: correct gpio-cells in binding document
  gpio: iop: fix devm_ioremap_resource() return value checking
  pinctrl: coh901: convert driver to use gpiolib irqchip
  pinctrl: nomadik: convert driver to use gpiolib irqchip
  gpio: pl061: convert driver to use gpiolib irqchip
  gpio: add IRQ chip helpers in gpiolib
  pinctrl: nomadik: factor in platform data container
  pinctrl: nomadik: rename secondary to latent
  gpio: Driver for SYSCON-based GPIOs
  gpio: generic: Use platform_device_id->driver_data field for driver flags
  pinctrl: coh901: move irq line locking to resource callbacks
  ...
2014-04-03 16:44:15 -07:00
Russell King bce5669be3 Merge branch 'devel-stable' into for-next 2014-04-04 00:33:49 +01:00
Russell King 95959e6a06 Merge branches 'amba', 'fixes', 'misc', 'mmci', 'unstable/omap-dma' and 'unstable/sa11x0' into for-next 2014-04-04 00:33:32 +01:00
Russell King 596c471b69 dmaengine: omap-dma: move register read/writes into omap-dma.c
Export the DMA register information from the SoC specific data, such
that we can access the registers directly in omap-dma.c, mapping the
register region ourselves as well.

Rather than calculating the DMA channel register in its entirety for
each access, we pre-calculate an offset base address for the allocated
DMA channel and then just use the appropriate register offset.

Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-04-04 00:31:49 +01:00
Russell King 9834f81314 ARM: omap: move dma channel allocation into plat-omap code
This really needs to be there, because otherwise the plat-omap code can
kfree() this data structure, and then re-use the pointer later.

Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-04-04 00:31:46 +01:00
Russell King 64a2dc3d3d ARM: omap: clean up DMA register accesses
We can do much better with this by using a structure to describe each
register, rather than code.

Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-04-04 00:31:44 +01:00
Russell King e38b1485fd ARM: omap: remove references to disable_irq_lch
The disable_irq_lch method is never actually used, so there's not much
point it existing; remove it.

Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-04-04 00:31:41 +01:00
Russell King b9e97822da dmaengine: omap-dma: program hardware directly
Program the transfer parameters directly into the hardware, rather
than using the functions in arch/arm/plat-omap/dma.c.

Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-04-04 00:27:43 +01:00
Russell King 1b416c4b41 dmaengine: omap-dma: provide a hook to get the underlying DMA platform ops
Provide and use a hook to obtain the underlying DMA platform operations
so that omap-dma.c can access the hardware more directly without
involving the legacy DMA driver.

Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-04-04 00:27:38 +01:00
Linus Torvalds 76ca7d1cca Merge branch 'akpm' (incoming from Andrew)
Merge first patch-bomb from Andrew Morton:
 - Various misc bits
 - kmemleak fixes
 - small befs, codafs, cifs, efs, freexxfs, hfsplus, minixfs, reiserfs things
 - fanotify
 - I appear to have become SuperH maintainer
 - ocfs2 updates
 - direct-io tweaks
 - a bit of the MM queue
 - printk updates
 - MAINTAINERS maintenance
 - some backlight things
 - lib/ updates
 - checkpatch updates
 - the rtc queue
 - nilfs2 updates
 - Small Documentation/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (237 commits)
  Documentation/SubmittingPatches: remove references to patch-scripts
  Documentation/SubmittingPatches: update some dead URLs
  Documentation/filesystems/ntfs.txt: remove changelog reference
  Documentation/kmemleak.txt: updates
  fs/reiserfs/super.c: add __init to init_inodecache
  fs/reiserfs: move prototype declaration to header file
  fs/hfsplus/attributes.c: add __init to hfsplus_create_attr_tree_cache()
  fs/hfsplus/extents.c: fix concurrent acess of alloc_blocks
  fs/hfsplus/extents.c: remove unused variable in hfsplus_get_block
  nilfs2: update project's web site in nilfs2.txt
  nilfs2: update MAINTAINERS file entries fix
  nilfs2: verify metadata sizes read from disk
  nilfs2: add FITRIM ioctl support for nilfs2
  nilfs2: add nilfs_sufile_trim_fs to trim clean segs
  nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl
  nilfs2: add nilfs_sufile_set_suinfo to update segment usage
  nilfs2: add struct nilfs_suinfo_update and flags
  nilfs2: update MAINTAINERS file entries
  fs/coda/inode.c: add __init to init_inodecache()
  BEFS: logging cleanup
  ...
2014-04-03 16:22:16 -07:00
Ryusuke Konishi 0ec060d188 nilfs2: verify metadata sizes read from disk
Add code to check sizes of on-disk data of metadata files such as inode
size, segment usage size, DAT entry size, and checkpoint size.  Although
these sizes are read from disk, the current implementation doesn't check
them.

If these sizes are not sane on disk, it can cause out-of-range access to
metadata or memory access overrun on metadata block buffers due to
overflow in sundry calculations.

Both lower limit and upper limit of metadata sizes are verified to
prevent these issues.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:26 -07:00
Andreas Rohner 2cc88f3a5f nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl
With this ioctl the segment usage entries in the SUFILE can be updated
from userspace.

This is useful, because it allows the userspace GC to modify and update
segment usage entries for specific segments, which enables it to avoid
unnecessary write operations.

If a segment needs to be cleaned, but there is no or very little
reclaimable space in it, the cleaning operation basically degrades to a
useless moving operation.  In the end the only thing that changes is the
location of the data and a timestamp in the segment usage information.
With this ioctl the GC can skip the cleaning and update the segment
usage entries directly instead.

This is basically a shortcut to cleaning the segment.  It is still
necessary to read the segment summary information, but the writing of
the live blocks can be skipped if it's not worth it.

[konishi.ryusuke@lab.ntt.co.jp: add description of NILFS_IOCTL_SET_SUINFO ioctl]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:25 -07:00
Andreas Rohner 90ccf7dde9 nilfs2: add struct nilfs_suinfo_update and flags
Add the nilfs_suinfo_update structure, which contains the information
needed to update one segment usage entry.  The flags specify, which
fields need to be updated.

Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:25 -07:00
Josh Cartwright 5a418558cd rtc: pm8xxx: add support for devicetree
Add support for describing the PM8921/PM8058 RTC in device tree.

Additionally:
   - drop support for describing the RTC using platform data,
     as there are no current in tree users who do so.
   - make allow_set_time a device-specific flag, instead of mucking
     with the rtc_ops

Signed-off-by: Josh Cartwright <joshc@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:23 -07:00
Rashika Kheria af661e8360 lib/decompress_inflate.c: include appropriate header file
Include appropriate header file include/linux/decompress/inflate.h in
lib/decompress_inflate.c because it has prototype declaration of
function defined in lib/decompress_inflate.c.

Also, fix the guard around the header file
include/linux/decompress/inflate.h to use a more unique guard symbol.
This avoids conflict with the INFLATE_H defined by
zlib_inflate/inflate.h.

This eliminates the following warning in lib/decompress_inflate.c:

  lib/decompress_inflate.c:35:17: warning: no previous prototype for `gunzip' [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:12 -07:00
Liu Ying a55944ca82 backlight: update bd state & fb_blank properties when necessary
We don't have to update the state and fb_blank properties of a backlight
device every time a blanking or unblanking event comes because they may
have already been what we want.  Another thought is that one backlight
device may be shared by multiple framebuffers.  The backlight driver
should take the backlight device as a resource shared by all the
associated framebuffers.

This patch adds some logic to record each framebuffer's backlight usage
to determine the backlight device use count and whether the two
properties should be updated or not.  To be more specific, only one
unblank operation on a certain blanked framebuffer may increase the
backlight device's use count by one, while one blank operation on a
certain unblanked framebuffer may decrease the use count by one, because
the userspace is likely to unblank an unblanked framebuffer or blank a
blanked framebuffer.

Signed-off-by: Liu Ying <Ying.Liu@freescale.com>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:09 -07:00
Simon Kågström d487d57581 include/linux/printk.h: remove double asmlinkage in printk_emit
The double asmlinkage was introduced in commit 7ff9554bb5 ("printk:
convert byte-buffer to variable-length record buffer").

Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:08 -07:00
Petr Mladek 0185698c0f printk: remove duplicated check for log level
The check for the exact log level is already done in printk_get_level.
We do not need to duplicate it in printk_skip_level.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:07 -07:00
Joe Perches a5ed3cee94 err.h: use bool for IS_ERR and IS_ERR_OR_NULL
Use the more natural return of bool for these tests.

No difference observed in .o files produced by gcc for x86.

Remove the dentry description of kernel pointers left over from the 90's
and 2002's cleanup move of parts of fs.h to err.h.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:06 -07:00
Wang YanQing 8f6c5ffc89 kernel/groups.c: remove return value of set_groups
After commit 6307f8fee2 ("security: remove dead hook task_setgroups"),
set_groups will always return zero, so we could just remove return value
of set_groups.

This patch reduces code size, and simplfies code to use set_groups,
because we don't need to check its return value any more.

[akpm@linux-foundation.org: remove obsolete claims from set_groups() comment]
Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:05 -07:00
Rashika Kheria e3a0cfdc8c include/linux/syscalls.h: add sys32_quotactl() prototype
This eliminates the following warning in quota/compat.c:

  fs/quota/compat.c:43:17: warning: no previous prototype for `sys32_quotactl' [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:05 -07:00
Vladimir Davydov bcccff93af kobject: don't block for each kobject_uevent
Currently kobject_uevent has somewhat unpredictable semantics.  The
point is, since it may call a usermode helper and wait for it to execute
(UMH_WAIT_EXEC), it is impossible to say for sure what lock dependencies
it will introduce for the caller - strictly speaking it depends on what
fs the binary is located on and the set of locks fork may take.  There
are quite a few kobject_uevent's users that do not take this into
account and call it with various mutexes taken, e.g.  rtnl_mutex,
net_mutex, which might potentially lead to a deadlock.

Since there is actually no reason to wait for the usermode helper to
execute there, let's make kobject_uevent start the helper asynchronously
with the aid of the UMH_NO_WAIT flag.

Personally, I'm interested in this, because I really want kobject_uevent
to be called under the slab_mutex in the slub implementation as it used
to be some time ago, because it greatly simplifies synchronization and
automatically fixes a kmemcg-related race.  However, there was a
deadlock detected on an attempt to call kobject_uevent under the
slab_mutex (see https://lkml.org/lkml/2012/1/14/45), which was reported
to be fixed by releasing the slab_mutex for kobject_uevent.

Unfortunately, there was no information about who exactly blocked on the
slab_mutex causing the usermode helper to stall, neither have I managed
to find this out or reproduce the issue.

BTW, this is not the first attempt to make kobject_uevent use
UMH_NO_WAIT.  Previous one was made by commit f520360d93 ("kobject:
don't block for each kobject_uevent"), but it was wrong (it passed
arguments allocated on stack to async thread) so it was reverted in
05f54c13cd ("Revert "kobject: don't block for each kobject_uevent".").
It targeted on speeding up the boot process though.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Greg KH <greg@kroah.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:04 -07:00
Dave Hansen 5509a5d27b drop_caches: add some documentation and info message
There is plenty of anecdotal evidence and a load of blog posts
suggesting that using "drop_caches" periodically keeps your system
running in "tip top shape".  Perhaps adding some kernel documentation
will increase the amount of accurate data on its use.

If we are not shrinking caches effectively, then we have real bugs.
Using drop_caches will simply mask the bugs and make them harder to
find, but certainly does not fix them, nor is it an appropriate
"workaround" to limit the size of the caches.  On the contrary, there
have been bug reports on issues that turned out to be misguided use of
cache dropping.

Dropping caches is a very drastic and disruptive operation that is good
for debugging and running tests, but if it creates bug reports from
production use, kernel developers should be aware of its use.

Add a bit more documentation about it, a syslog message to track down
abusers, and vmstat drop counters to help analyze problem reports.

[akpm@linux-foundation.org: checkpatch fixes]
[hannes@cmpxchg.org: add runtime suppression control]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:04 -07:00
Sasha Levin 67f9fd91f9 mm: remove read_cache_page_async()
This patch removes read_cache_page_async() which wasn't really needed
anywhere and simplifies the code around it a bit.

read_cache_page_async() is useful when we want to read a page into the
cache without waiting for it to complete.  This happens when the
appropriate callback 'filler' doesn't complete its read operation and
releases the page lock immediately, and instead queues a different
completion routine to do that.  This never actually happened anywhere in
the code.

read_cache_page_async() had 3 different callers:

- read_cache_page() which is the sync version, it would just wait for
  the requested read to complete using wait_on_page_read().

- JFFS2 would call it from jffs2_gc_fetch_page(), but the filler
  function it supplied doesn't do any async reads, and would complete
  before the filler function returns - making it actually a sync read.

- CRAMFS would call it using the read_mapping_page_async() wrapper, with
  a similar story to JFFS2 - the filler function doesn't do anything that
  reminds async reads and would always complete before the filler function
  returns.

To sum it up, the code in mm/filemap.c never took advantage of having
read_cache_page_async().  While there are filler callbacks that do async
reads (such as the block one), we always called it with the
read_cache_page().

This patch adds a mandatory wait for read to complete when adding a new
page to the cache, and removes read_cache_page_async() and its wrappers.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:04 -07:00
Rashika Kheria c558784fa9 include/linux/mm.h: remove ifdef condition
The ifdef conditions in include/linux/mm.h presents three cases:

 - !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)

   There is no actual definition of function but include/linux/mm.h has a
   static inline stub defined.

 - defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)

   linux/mm.h does not define a prototype, but mm/page_alloc.c defines
   the function.

   Hence, compiler reports the following warning:

     mm/page_alloc.c:4300:15: warning: no previous prototype for `__early_pfn_to_nid' [-Wmissing-prototypes]

 - defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)

   The architecture defines the function, and linux/mm.h has a
   prototype.

Thus, join the conditions of Case 2 and 3 ie eliminate the ifdef
condition of CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID to eliminate the missing
prototype warning from file mm/page_alloc.c.

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:03 -07:00
Johannes Weiner 449dd6984d mm: keep page cache radix tree nodes in check
Previously, page cache radix tree nodes were freed after reclaim emptied
out their page pointers.  But now reclaim stores shadow entries in their
place, which are only reclaimed when the inodes themselves are
reclaimed.  This is problematic for bigger files that are still in use
after they have a significant amount of their cache reclaimed, without
any of those pages actually refaulting.  The shadow entries will just
sit there and waste memory.  In the worst case, the shadow entries will
accumulate until the machine runs out of memory.

To get this under control, the VM will track radix tree nodes
exclusively containing shadow entries on a per-NUMA node list.  Per-NUMA
rather than global because we expect the radix tree nodes themselves to
be allocated node-locally and we want to reduce cross-node references of
otherwise independent cache workloads.  A simple shrinker will then
reclaim these nodes on memory pressure.

A few things need to be stored in the radix tree node to implement the
shadow node LRU and allow tree deletions coming from the list:

1. There is no index available that would describe the reverse path
   from the node up to the tree root, which is needed to perform a
   deletion.  To solve this, encode in each node its offset inside the
   parent.  This can be stored in the unused upper bits of the same
   member that stores the node's height at no extra space cost.

2. The number of shadow entries needs to be counted in addition to the
   regular entries, to quickly detect when the node is ready to go to
   the shadow node LRU list.  The current entry count is an unsigned
   int but the maximum number of entries is 64, so a shadow counter
   can easily be stored in the unused upper bits.

3. Tree modification needs tree lock and tree root, which are located
   in the address space, so store an address_space backpointer in the
   node.  The parent pointer of the node is in a union with the 2-word
   rcu_head, so the backpointer comes at no extra cost as well.

4. The node needs to be linked to an LRU list, which requires a list
   head inside the node.  This does increase the size of the node, but
   it does not change the number of objects that fit into a slab page.

[akpm@linux-foundation.org: export the right function]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
Johannes Weiner 139e561660 lib: radix_tree: tree node interface
Make struct radix_tree_node part of the public interface and provide API
functions to create, look up, and delete whole nodes.  Refactor the
existing insert, look up, delete functions on top of these new node
primitives.

This will allow the VM to track and garbage collect page cache radix
tree nodes.

[sasha.levin@oracle.com: return correct error code on insertion failure]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
Johannes Weiner a528910e12 mm: thrash detection-based file cache sizing
The VM maintains cached filesystem pages on two types of lists.  One
list holds the pages recently faulted into the cache, the other list
holds pages that have been referenced repeatedly on that first list.
The idea is to prefer reclaiming young pages over those that have shown
to benefit from caching in the past.  We call the recently usedbut
ultimately was not significantly better than a FIFO policy and still
thrashed cache based on eviction speed, rather than actual demand for
cache.

This patch solves one half of the problem by decoupling the ability to
detect working set changes from the inactive list size.  By maintaining
a history of recently evicted file pages it can detect frequently used
pages with an arbitrarily small inactive list size, and subsequently
apply pressure on the active list based on actual demand for cache, not
just overall eviction speed.

Every zone maintains a counter that tracks inactive list aging speed.
When a page is evicted, a snapshot of this counter is stored in the
now-empty page cache radix tree slot.  On refault, the minimum access
distance of the page can be assessed, to evaluate whether the page
should be part of the active list or not.

This fixes the VM's blindness towards working set changes in excess of
the inactive list.  And it's the foundation to further improve the
protection ability and reduce the minimum inactive list size of 50%.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
Johannes Weiner 91b0abe36a mm + fs: store shadow entries in page cache
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page.  As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently.  At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.

Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate.  Reclaim will check
for this flag before installing shadow pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
Johannes Weiner 0cd6144aad mm + fs: prepare for non-page entries in page cache radix trees
shmem mappings already contain exceptional entries where swap slot
information is remembered.

To be able to store eviction information for regular page cache, prepare
every site dealing with the radix trees directly to handle entries other
than pages.

The common lookup functions will filter out non-page entries and return
NULL for page cache holes, just as before.  But provide a raw version of
the API which returns non-page entries as well, and switch shmem over to
use it.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:00 -07:00
Johannes Weiner e7b563bb2a mm: filemap: move radix tree hole searching here
The radix tree hole searching code is only used for page cache, for
example the readahead code trying to get a a picture of the area
surrounding a fault.

It sufficed to rely on the radix tree definition of holes, which is
"empty tree slot".  But this is about to change, though, as shadow page
descriptors will be stored in the page cache after the actual pages get
evicted from memory.

Move the functions over to mm/filemap.c and make them native page cache
operations, where they can later be adapted to handle the new definition
of "page cache hole".

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:00 -07:00
Johannes Weiner 53c59f262d lib: radix-tree: add radix_tree_delete_item()
Provide a function that does not just delete an entry at a given index,
but also allows passing in an expected item.  Delete only if that item
is still located at the specified index.

This is handy when lockless tree traversals want to delete entries as
well because they don't have to do an second, locked lookup to verify
the slot has not changed under them before deleting the entry.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:00 -07:00
Johannes Weiner 6a3ed2123a mm: vmstat: fix UP zone state accounting
Summary:

The VM maintains cached filesystem pages on two types of lists.  One
list holds the pages recently faulted into the cache, the other list
holds pages that have been referenced repeatedly on that first list.
The idea is to prefer reclaiming young pages over those that have shown
to benefit from caching in the past.  We call the recently used list
"inactive list" and the frequently used list "active list".

Currently, the VM aims for a 1:1 ratio between the lists, which is the
"perfect" trade-off between the ability to *protect* frequently used
pages and the ability to *detect* frequently used pages.  This means
that working set changes bigger than half of cache memory go undetected
and thrash indefinitely, whereas working sets bigger than half of cache
memory are unprotected against used-once streams that don't even need
caching.

This happens on file servers and media streaming servers, where the
popular files and file sections change over time.  Even though the
individual files might be smaller than half of memory, concurrent access
to many of them may still result in their inter-reference distance being
greater than half of memory.  It's also been reported as a problem on
database workloads that switch back and forth between tables that are
bigger than half of memory.  In these cases the VM never recognizes the
new working set and will for the remainder of the workload thrash disk
data which could easily live in memory.

Historically, every reclaim scan of the inactive list also took a
smaller number of pages from the tail of the active list and moved them
to the head of the inactive list.  This model gave established working
sets more gracetime in the face of temporary use-once streams, but
ultimately was not significantly better than a FIFO policy and still
thrashed cache based on eviction speed, rather than actual demand for
cache.

This series solves the problem by maintaining a history of pages evicted
from the inactive list, enabling the VM to detect frequently used pages
regardless of inactive list size and facilitate working set transitions.

Tests:

The reported database workload is easily demonstrated on a 8G machine
with two filesets a 6G.  This fio workload operates on one set first,
then switches to the other.  The VM should obviously always cache the
set that the workload is currently using.

This test is based on a problem encountered by Citus Data customers:
  http://citusdata.com/blog/72-linux-memory-manager-and-your-big-data

unpatched:
  db1: READ: io=98304MB, aggrb=885559KB/s, minb=885559KB/s, maxb=885559KB/s, mint= 113672msec, maxt= 113672msec
  db2: READ: io=98304MB, aggrb= 66169KB/s, minb= 66169KB/s, maxb= 66169KB/s, mint=1521302msec, maxt=1521302msec
  sdb: ios=835750/4, merge=2/1, ticks=4659739/60016, in_queue=4719203, util=98.92%

  real    27m15.541s
  user    0m19.059s
  sys     0m51.459s

patched:
  db1: READ: io=98304MB, aggrb=877783KB/s, minb=877783KB/s, maxb=877783KB/s, mint=114679msec, maxt=114679msec
  db2: READ: io=98304MB, aggrb=397449KB/s, minb=397449KB/s, maxb=397449KB/s, mint=253273msec, maxt=253273msec
  sdb: ios=170587/4, merge=2/1, ticks=954910/61123, in_queue=1015923, util=90.40%

  real    6m8.630s
  user    0m14.714s
  sys     0m31.233s

As can be seen, the unpatched kernel simply never adapts to the
workingset change and db2 is stuck indefinitely with secondary storage
speed.  The patched kernel needs 2-3 iterations over db2 before it
replaces db1 and reaches full memory speed.  Given the unbounded
negative affect of the existing VM behavior, these patches should be
considered correctness fixes rather than performance optimizations.

Another test resembles a fileserver or streaming server workload, where
data in excess of memory size is accessed at different frequencies.
There is very hot data accessed at a high frequency.  Machines should be
fitted so that the hot set of such a workload can be fully cached or all
bets are off.  Then there is a very big (compared to available memory)
set of data that is used-once or at a very low frequency; this is what
drives the inactive list and does not really benefit from caching.
Lastly, there is a big set of warm data in between that is accessed at
medium frequencies and benefits from caching the pages between the first
and last streamer of each burst.

unpatched:
   hot: READ: io=128000MB, aggrb=160693KB/s, minb=160693KB/s, maxb=160693KB/s, mint=815665msec, maxt=815665msec
  warm: READ: io= 81920MB, aggrb=109853KB/s, minb= 27463KB/s, maxb= 29244KB/s, mint=717110msec, maxt=763617msec
  cold: READ: io= 30720MB, aggrb= 35245KB/s, minb= 35245KB/s, maxb= 35245KB/s, mint=892530msec, maxt=892530msec
   sdb: ios=797960/4, merge=11763/1, ticks=4307910/796, in_queue=4308380, util=100.00%

patched:
   hot: READ: io=128000MB, aggrb=160678KB/s, minb=160678KB/s, maxb=160678KB/s, mint=815740msec, maxt=815740msec
  warm: READ: io= 81920MB, aggrb=147747KB/s, minb= 36936KB/s, maxb= 40960KB/s, mint=512000msec, maxt=567767msec
  cold: READ: io= 30720MB, aggrb= 40960KB/s, minb= 40960KB/s, maxb= 40960KB/s, mint=768000msec, maxt=768000msec
   sdb: ios=596514/4, merge=9341/1, ticks=2395362/997, in_queue=2396484, util=79.18%

In both kernels, the hot set is propagated to the active list and then
served from cache.

In both kernels, the beginning of the warm set is propagated to the
active list as well, but in the unpatched case the active list
eventually takes up half of memory and no new pages from the warm set
get activated, despite repeated access, and despite most of the active
list soon being stale.  The patched kernel on the other hand detects the
thrashing and manages to keep this cache window rolling through the data
set.  This frees up enough IO bandwidth that the cold set is served at
full speed as well and disk utilization even drops by 20%.

For reference, this same test was performed with the traditional
demotion mechanism, where deactivation is coupled to inactive list
reclaim.  However, this had the same outcome as the unpatched kernel:
while the warm set does indeed get activated continuously, it is forced
out of the active list by inactive list pressure, which is dictated
primarily by the unrelated cold set.  The warm set is evicted before
subsequent streamers can benefit from it, even though there would be
enough space available to cache the pages of interest.

Costs:

Page reclaim used to shrink the radix trees but now the tree nodes are
reused for shadow entries, where the cost depends heavily on the page
cache access patterns.  However, with workloads that maintain spatial or
temporal locality, the shadow entries are either refaulted quickly or
reclaimed along with the inode object itself.  Workloads that will
experience a memory cost increase are those that don't really benefit
from caching in the first place.

A more predictable alternative would be a fixed-cost separate pool of
shadow entries, but this would incur relatively higher memory cost for
well-behaved workloads at the benefit of cornercases.  It would also
make the shadow entry lookup more costly compared to storing them
directly in the cache structure.

Future:

To simplify the merging process, this patch set is implementing thrash
detection on a global per-zone level only for now, but the design is
such that it can be extended to memory cgroups as well.  All we need to
do is store the unique cgroup ID along the node and zone identifier
inside the eviction cookie to identify the lruvec.

Right now we have a fixed ratio (50:50) between inactive and active list
but we already have complaints about working sets exceeding half of
memory being pushed out of the cache by simple streaming in the
background.  Ultimately, we want to adjust this ratio and allow for a
much smaller inactive list.  These patches are an essential step in this
direction because they decouple the VMs ability to detect working set
changes from the inactive list size.  This would allow us to base the
inactive list size on the combined readahead window size for example and
potentially protect a much bigger working set.

It's also a big step towards activating pages with a reuse distance
larger than memory, as long as they are the most frequently used pages
in the workload.  This will require knowing more about the access
frequency of active pages than what we measure right now, so it's also
deferred in this series.

Another possibility of having thrashing information would be to revisit
the idea of local reclaim in the form of zero-config memory control
groups.  Instead of having allocating tasks go straight to global
reclaim, they could try to reclaim the pages in the memcg they are part
of first as long as the group is not thrashing.  This would allow a user
to drop e.g.  a back-up job in an otherwise unconfigured memcg and it
would only inflate (and possibly do global reclaim) until it has enough
memory to do proper readahead.  But once it reaches that point and stops
thrashing it would just recycle its own used-once pages without kicking
out the cache of any other tasks in the system more than necessary.

This patch (of 10):

Fengguang Wu's build testing spotted problems with inc_zone_state() and
dec_zone_state() on UP configurations in out-of-tree patches.

inc_zone_state() is declared but not defined, dec_zone_state() is
missing entirely.

Just like with *_zone_page_state(), they can be defined like their
preemption-unsafe counterparts on UP.

[akpm@linux-foundation.org: make it build]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:00 -07:00
Davidlohr Bueso 7b24d8616b mm, hugetlb: fix race in region tracking
There is a race condition if we map a same file on different processes.
Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex.
When we do mmap, we don't grab a hugetlb_instantiation_mutex, but only
mmap_sem (exclusively).  This doesn't prevent other tasks from modifying
the region structure, so it can be modified by two processes
concurrently.

To solve this, introduce a spinlock to resv_map and make region
manipulation function grab it before they do actual work.

[davidlohr@hp.com: updated changelog]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:59 -07:00
Joonsoo Kim 9119a41e90 mm, hugetlb: unify region structure handling
Currently, to track reserved and allocated regions, we use two different
ways, depending on the mapping.  For MAP_SHARED, we use
address_mapping's private_list and, while for MAP_PRIVATE, we use a
resv_map.

Now, we are preparing to change a coarse grained lock which protect a
region structure to fine grained lock, and this difference hinder it.
So, before changing it, unify region structure handling, consistently
using a resv_map regardless of the kind of mapping.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:59 -07:00
Mel Gorman d26914d117 mm: optimize put_mems_allowed() usage
Since put_mems_allowed() is strictly optional, its a seqcount retry, we
don't need to evaluate the function if the allocation was in fact
successful, saving a smp_rmb some loads and comparisons on some relative
fast-paths.

Since the naming, get/put_mems_allowed() does suggest a mandatory
pairing, rename the interface, as suggested by Mel, to resemble the
seqcount interface.

This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
where it is important to note that the return value of the latter call
is inverted from its previous incarnation.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:58 -07:00
Jan Kara 9f985cb6c4 quota: provide function to grab quota structure reference
Provide dqgrab() function to get quota structure reference when we are
sure it already has at least one active reference.  Make use of this
function inside quota code.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:54 -07:00
Jan Kara 9573f79355 fanotify: convert access_mutex to spinlock
access_mutex is used only to guard operations on access_list.  There's
no need for sleeping within this lock so just make a spinlock out of it.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:51 -07:00
Li Zefan 5f3bf19aeb kmemleak: remove redundant code
Remove kmemleak_padding() and kmemleak_release().

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:50 -07:00
Jan Kara 5acda9d12d bdi: avoid oops on device removal
After commit 839a8e8660 ("writeback: replace custom worker pool
implementation with unbound workqueue") when device is removed while we
are writing to it we crash in bdi_writeback_workfn() ->
set_worker_desc() because bdi->dev is NULL.

This can happen because even though bdi_unregister() cancels all pending
flushing work, nothing really prevents new ones from being queued from
balance_dirty_pages() or other places.

Fix the problem by clearing BDI_registered bit in bdi_unregister() and
checking it before scheduling of any flushing work.

Fixes: 839a8e8660

Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Derek Basehore <dbasehore@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:49 -07:00
Linus Torvalds d0cb5f71c5 VFIO updates for v3.15 include:
- Allow the vfio-type1 IOMMU to support multiple domains within a container
 - Plumb path to query whether all domains are cache-coherent
 - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation
 - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd)
 
 The first patch also makes the vfio-type1 IOMMU driver completely independent
 of the bus_type of the devices it's handling, which enables it to be used for
 both vfio-pci and a future vfio-platform (and hopefully combinations involving
 both simultaneously).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTPbikAAoJECObm247sIsiG9cP/jnurW84YuAHmybzy3R4nMaa
 8BWcQST+TyQ78GWvubFDcRt+vHmJUI4iFWPBIm1twBSVrMy6F1GcT/spmSne1Dfb
 OvcfGEy59tGO+BeklHg5Qq2Hj1UzfeV9uQoy+PrpTF/sYzsyL6g5+O3Zv39SNvr0
 zCwnO2JAKXIpQlkK3wVha6V13X07Z/+d2b4JSsvmON2cnlPzOUYNrgBvoSeV2IQe
 3kMAU9qfAfQlpNDhRRZL/HshgWazCg4XZUp9UdNjUkOxrUp4vXHrVTUJnUGBDytk
 8V9UGRKF3mik9PqpZJk4jLV5urgUVpnUR5747uqs0KF+9GWxClXvh3gp1XX8Zn7f
 NCfDSMn/wrDr6fQBglKUadITaDhF49KV+J0cX083q46BbYxhAfgxv1I9UOvLreG6
 SGAezlTB0mefa1BmSwJdfKwDBsuDOhbF0TsHC6zT7yFc8pqe3hl/vQzUyu3HtwuM
 yPycQiQQmvOaymaiiBRwyLocwJbDNKPigDomv8NZ8Mcd97Fu9YsF4ydaxMJmmeKf
 TffYS4z4tUElYiU2vv1ipB8T+ReCmdtj2cIvyfwTL9jycY9ouO4bJ56dOGWd3WNl
 m7AVokbrrJQ03itUpFMuYjHAzTNUlXVvXlGr9n6L4VTR0RTL1zwJVrMVDsXdhxm4
 XgDbVB5PrxinDAC1vJvI
 =mnzO
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:
 "VFIO updates for v3.15 include:

   - Allow the vfio-type1 IOMMU to support multiple domains within a
     container
   - Plumb path to query whether all domains are cache-coherent
   - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation
   - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd)

  The first patch also makes the vfio-type1 IOMMU driver completely
  independent of the bus_type of the devices it's handling, which
  enables it to be used for both vfio-pci and a future vfio-platform
  (and hopefully combinations involving both simultaneously)"

* tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio:
  vfio: always select ANON_INODES
  kvm/vfio: Support for DMA coherent IOMMUs
  vfio: Add external user check extension interface
  vfio/type1: Add extension to test DMA cache coherence of IOMMU
  vfio/iommu_type1: Multi-IOMMU domain support
2014-04-03 14:05:02 -07:00
Linus Torvalds 32d01dc7be Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "A lot updates for cgroup:

   - The biggest one is cgroup's conversion to kernfs.  cgroup took
     after the long abandoned vfs-entangled sysfs implementation and
     made it even more convoluted over time.  cgroup's internal objects
     were fused with vfs objects which also brought in vfs locking and
     object lifetime rules.  Naturally, there are places where vfs rules
     don't fit and nasty hacks, such as credential switching or lock
     dance interleaving inode mutex and cgroup_mutex with object serial
     number comparison thrown in to decide whether the operation is
     actually necessary, needed to be employed.

     After conversion to kernfs, internal object lifetime and locking
     rules are mostly isolated from vfs interactions allowing shedding
     of several nasty hacks and overall simplification.  This will also
     allow implmentation of operations which may affect multiple cgroups
     which weren't possible before as it would have required nesting
     i_mutexes.

   - Various simplifications including dropping of module support,
     easier cgroup name/path handling, simplified cgroup file type
     handling and task_cg_lists optimization.

   - Prepatory changes for the planned unified hierarchy, which is still
     a patchset away from being actually operational.  The dummy
     hierarchy is updated to serve as the default unified hierarchy.
     Controllers which aren't claimed by other hierarchies are
     associated with it, which BTW was what the dummy hierarchy was for
     anyway.

   - Various fixes from Li and others.  This pull request includes some
     patches to add missing slab.h to various subsystems.  This was
     triggered xattr.h include removal from cgroup.h.  cgroup.h
     indirectly got included a lot of files which brought in xattr.h
     which brought in slab.h.

  There are several merge commits - one to pull in kernfs updates
  necessary for converting cgroup (already in upstream through
  driver-core), others for interfering changes in the fixes branch"

* 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
  cgroup: remove useless argument from cgroup_exit()
  cgroup: fix spurious lockdep warning in cgroup_exit()
  cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
  cgroup: break kernfs active_ref protection in cgroup directory operations
  cgroup: fix cgroup_taskset walking order
  cgroup: implement CFTYPE_ONLY_ON_DFL
  cgroup: make cgrp_dfl_root mountable
  cgroup: drop const from @buffer of cftype->write_string()
  cgroup: rename cgroup_dummy_root and related names
  cgroup: move ->subsys_mask from cgroupfs_root to cgroup
  cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
  cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
  cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
  cgroup: reorganize cgroup bootstrapping
  cgroup: relocate setting of CGRP_DEAD
  cpuset: use rcu_read_lock() to protect task_cs()
  cgroup_freezer: document freezer_fork() subtleties
  cgroup: update cgroup_transfer_tasks() to either succeed or fail
  cgroup: drop task_lock() protection around task->cgroups
  cgroup: update how a newly forked task gets associated with css_set
  ...
2014-04-03 13:05:42 -07:00
Jiri Pirko d0290214de net: add busy_poll device feature
Currently there is no way how to find out if a device supports busy
polling. So add a feature and make it dependent on ndo_busy_poll
existence.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 14:31:34 -04:00
Daniel Borkmann 8e2f1a63f2 packet: fix packet_direct_xmit for BQL enabled drivers
Currently, in packet_direct_xmit() we test the assigned netdevice queue
for netif_xmit_frozen_or_stopped() before doing an ndo_start_xmit().

This can have the side-effect that BQL enabled drivers which make use
of netdev_tx_sent_queue() internally, set __QUEUE_STATE_STACK_XOFF from
within the stack and would not fully fill the device's TX ring from
packet sockets with PACKET_QDISC_BYPASS enabled.

Instead, use a test without BQL bit so that bursts can be absorbed
into the NICs TX ring. Fix and code suggested by Eric Dumazet, thanks!

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 14:29:12 -04:00
Linus Torvalds 68114e5eb8 Most of the changes were largely clean ups, and some documentation.
But there were a few features that were added.
 
 Uprobes now work with event triggers and multi buffers.
 Uprobes have support under ftrace and perf.
 
 The big feature is that the function tracer can now be used within the
 multi buffer instances. That is, you can now trace some functions
 in one buffer, others in another buffer, all functions in a third buffer
 and so on. They are basically agnostic from each other. This only
 works for the function tracer and not for the function graph trace,
 although you can have the function graph tracer running in the top level
 buffer (or any tracer for that matter) and have different function tracing
 going on in the sub buffers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTOthtAAoJEKQekfcNnQGu5c8H/Ana/U+0tmksp1dbHkRHsKSH
 +Fsv4Jeu8gf1NaFKHEhkUTcFtnzE6qAPV2VCrcJwXbhAhhwZm+LjrnWdoy3215S3
 cQW4LftLEonh2cM36Cos74TulMEYN6XmL6dQZV+CILKQkDrWU4qJjQ64okXEkqrd
 9iG3p/mSXyvJcmnyg61ALnMOhZDLsXY3djBhWBPhiTPGS6BRb9zh4Pmw6Zv0n2rJ
 U93Gt/3AQrv1ybu73dUxqP0abp60oXOiWoF/R2jcbKqIM+K9RPJX79unCV3jq3u9
 f+6jMlB9PgAMqQj6ihJdwxKDDuzwyrVdEPnsgvl4jarCBCtVVwhKedBaKN/KS8k=
 =HdXY
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Most of the changes were largely clean ups, and some documentation.
  But there were a few features that were added:

  Uprobes now work with event triggers and multi buffers and have
  support under ftrace and perf.

  The big feature is that the function tracer can now be used within the
  multi buffer instances.  That is, you can now trace some functions in
  one buffer, others in another buffer, all functions in a third buffer
  and so on.  They are basically agnostic from each other.  This only
  works for the function tracer and not for the function graph trace,
  although you can have the function graph tracer running in the top
  level buffer (or any tracer for that matter) and have different
  function tracing going on in the sub buffers"

* tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
  tracing: Add BUG_ON when stack end location is over written
  tracepoint: Remove unused API functions
  Revert "tracing: Move event storage for array from macro to standalone function"
  ftrace: Constify ftrace_text_reserved
  tracepoints: API doc update to tracepoint_probe_register() return value
  tracepoints: API doc update to data argument
  ftrace: Fix compilation warning about control_ops_free
  ftrace/x86: BUG when ftrace recovery fails
  ftrace: Warn on error when modifying ftrace function
  ftrace: Remove freelist from struct dyn_ftrace
  ftrace: Do not pass data to ftrace_dyn_arch_init
  ftrace: Pass retval through return in ftrace_dyn_arch_init()
  ftrace: Inline the code from ftrace_dyn_table_alloc()
  ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
  tracing: Evaluate len expression only once in __dynamic_array macro
  tracing: Correctly expand len expressions from __dynamic_array macro
  tracing/module: Replace include of tracepoint.h with jump_label.h in module.h
  tracing: Fix event header migrate.h to include tracepoint.h
  tracing: Fix event header writeback.h to include tracepoint.h
  tracing: Warn if a tracepoint is not set via debugfs
  ...
2014-04-03 10:26:31 -07:00
Linus Torvalds 59ecc26004 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 3.15:
   - Added 3DES driver for OMAP4/AM43xx
   - Added AVX2 acceleration for SHA
   - Added hash-only AEAD algorithms in caam
   - Removed tegra driver as it is not functioning and the hardware is
     too slow
   - Allow blkcipher walks over AEAD (needed for ARM)
   - Fixed unprotected FPU/SSE access in ghash-clmulni-intel
   - Fixed highmem crash in omap-sham
   - Add (zero entropy) randomness when initialising hardware RNGs
   - Fixed unaligned ahash comletion functions
   - Added soft module depedency for crc32c for initrds that use crc32c"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (60 commits)
  crypto: ghash-clmulni-intel - use C implementation for setkey()
  crypto: x86/sha1 - reduce size of the AVX2 asm implementation
  crypto: x86/sha1 - fix stack alignment of AVX2 variant
  crypto: x86/sha1 - re-enable the AVX variant
  crypto: sha - SHA1 transform x86_64 AVX2
  crypto: crypto_wq - Fix late crypto work queue initialization
  crypto: caam - add missing key_dma unmap
  crypto: caam - add support for aead null encryption
  crypto: testmgr - add aead null encryption test vectors
  crypto: export NULL algorithms defines
  crypto: caam - remove error propagation handling
  crypto: hash - Simplify the ahash_finup implementation
  crypto: hash - Pull out the functions to save/restore request
  crypto: hash - Fix the pointer voodoo in unaligned ahash
  crypto: caam - Fix first parameter to caam_init_rng
  crypto: omap-sham - Map SG pages if they are HIGHMEM before accessing
  crypto: caam - Dynamic memory allocation for caam_rng_ctx object
  crypto: allow blkcipher walks over AEAD data
  crypto: remove direct blkcipher_walk dependency on transform
  hwrng: add randomness to system from rng sources
  ...
2014-04-03 09:28:16 -07:00
Linus Torvalds bea803183e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Apart from reordering the SELinux mmap code to ensure DAC is called
  before MAC, these are minor maintenance updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
  selinux: correctly label /proc inodes in use before the policy is loaded
  selinux: put the mmap() DAC controls before the MAC controls
  selinux: fix the output of ./scripts/get_maintainer.pl for SELinux
  evm: enable key retention service automatically
  ima: skip memory allocation for empty files
  evm: EVM does not use MD5
  ima: return d_name.name if d_path fails
  integrity: fix checkpatch errors
  ima: fix erroneous removal of security.ima xattr
  security: integrity: Use a more current logging style
  MAINTAINERS: email updates and other misc. changes
  ima: reduce memory usage when a template containing the n field is used
  ima: restore the original behavior for sending data with ima template
  Integrity: Pass commname via get_task_comm()
  fs: move i_readcount
  ima: use static const char array definitions
  security: have cap_dentry_init_security return error
  ima: new helper: file_inode(file)
  kernel: Mark function as static in kernel/seccomp.c
  capability: Use current logging styles
  ...
2014-04-03 09:26:18 -07:00
Linus Torvalds cd6362befe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Here is my initial pull request for the networking subsystem during
  this merge window:

   1) Support for ESN in AH (RFC 4302) from Fan Du.

   2) Add full kernel doc for ethtool command structures, from Ben
      Hutchings.

   3) Add BCM7xxx PHY driver, from Florian Fainelli.

   4) Export computed TCP rate information in netlink socket dumps, from
      Eric Dumazet.

   5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
      Dichtel.

   6) Convert many drivers to pci_enable_msix_range(), from Alexander
      Gordeev.

   7) Record SKB timestamps more efficiently, from Eric Dumazet.

   8) Switch to microsecond resolution for TCP round trip times, also
      from Eric Dumazet.

   9) Clean up and fix 6lowpan fragmentation handling by making use of
      the existing inet_frag api for it's implementation.

  10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.

  11) Auto size SKB lengths when composing netlink messages based upon
      past message sizes used, from Eric Dumazet.

  12) qdisc dumps can take a long time, add a cond_resched(), From Eric
      Dumazet.

  13) Sanitize netpoll core and drivers wrt.  SKB handling semantics.
      Get rid of never-used-in-tree netpoll RX handling.  From Eric W
      Biederman.

  14) Support inter-address-family and namespace changing in VTI tunnel
      driver(s).  From Steffen Klassert.

  15) Add Altera TSE driver, from Vince Bridgers.

  16) Optimizing csum_replace2() so that it doesn't adjust the checksum
      by checksumming the entire header, from Eric Dumazet.

  17) Expand BPF internal implementation for faster interpreting, more
      direct translations into JIT'd code, and much cleaner uses of BPF
      filtering in non-socket ocntexts.  From Daniel Borkmann and Alexei
      Starovoitov"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
  netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
  net: Add a test to see if a skb is freeable in irq context
  qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
  net: ptp: move PTP classifier in its own file
  net: sxgbe: make "core_ops" static
  net: sxgbe: fix logical vs bitwise operation
  net: sxgbe: sxgbe_mdio_register() frees the bus
  Call efx_set_channels() before efx->type->dimension_resources()
  xen-netback: disable rogue vif in kthread context
  net/mlx4: Set proper build dependancy with vxlan
  be2net: fix build dependency on VxLAN
  mac802154: make csma/cca parameters per-wpan
  mac802154: allow only one WPAN to be up at any given time
  net: filter: minor: fix kdoc in __sk_run_filter
  netlink: don't compare the nul-termination in nla_strcmp
  can: c_can: Avoid led toggling for every packet.
  can: c_can: Simplify TX interrupt cleanup
  can: c_can: Store dlc private
  can: c_can: Reduce register access
  can: c_can: Make the code readable
  ...
2014-04-02 20:53:45 -07:00
Yan, Zheng 19913b4eac ceph: add get_name() NFS export callback
Use the newly introduced LOOKUPNAME MDS request to connect child
inode to its parent directory.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 10:33:53 +08:00
Ilya Dryomov 7cc69d42e6 libceph: bump CEPH_OSD_MAX_OP to 3
Our longest osd request now contains 3 ops: copyup+hint+write.

Also, CEPH_OSD_MAX_OP value in a BUG_ON in rbd_osd_req_callback() was
hard-coded to 2.  Fix it, and switch to rbd_assert while at it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-03 10:33:52 +08:00
Ilya Dryomov c647b8a8c6 libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op
This is primarily for rbd's benefit and is supposed to combat
fragmentation:

"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks.  We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."

SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set FAILOK flag for all SETALLOCHINT ops.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-03 10:33:51 +08:00
Ilya Dryomov 7b25bf5f02 libceph: encode CEPH_OSD_OP_FLAG_* op flags
Encode ceph_osd_op::flags field so that it gets sent over the wire.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-03 10:33:51 +08:00
Ilya Dryomov 9d521470a4 libceph: a per-osdc crush scratch buffer
With the addition of erasure coding support in the future, scratch
variable-length array in crush_do_rule_ary() is going to grow to at
least 200 bytes on average, on top of another 128 bytes consumed by
rawosd/osd arrays in the call chain.  Replace it with a buffer inside
struct osdmap and a mutex.  This shouldn't result in any contention,
because all osd requests were already serialized by request_mutex at
that point; the only unlocked caller was ceph_ioctl_get_dataloc().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 10:33:50 +08:00
Linus Torvalds 0f1b1e6d73 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
 - substantial cleanup of the generic and transport layers, in the
   direction of an ultimate goal of making struct hid_device completely
   transport independent, by Benjamin Tissoires
 - cp2112 driver from David Barksdale
 - a lot of fixes and new hardware support (Dualshock 4) to hid-sony
   driver, by Frank Praznik
 - support for Win 8.1 multitouch protocol by Andrew Duggan
 - other smaller fixes / device ID additions

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (75 commits)
  HID: sony: fix force feedback mismerge
  HID: sony: Set the quriks flag for Bluetooth controllers
  HID: sony: Fix Sixaxis cable state detection
  HID: uhid: Add UHID_CREATE2 + UHID_INPUT2
  HID: hyperv: fix _raw_request() prototype
  HID: hyperv: Implement a stub raw_request() entry point
  HID: hid-sensor-hub: fix sleeping function called from invalid context
  HID: multitouch: add support for Win 8.1 multitouch touchpads
  HID: remove hid_output_raw_report transport implementations
  HID: sony: do not rely on hid_output_raw_report
  HID: cp2112: remove the last hid_output_raw_report() call
  HID: cp2112: remove various hid_out_raw_report calls
  HID: multitouch: add support of other generic collections in hid-mt
  HID: multitouch: remove pen special handling
  HID: multitouch: remove registered devices with default behavior
  HID: hidp: Add a comment that some devices depend on the current behavior of uniq
  HID: sony: Prevent duplicate controller connections.
  HID: sony: Perform a boundry check on the sixaxis battery level index.
  HID: sony: Fix work queue issues
  HID: sony: Fix multi-line comment styling
  ...
2014-04-02 16:24:28 -07:00
Linus Torvalds 159d8133d0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual rocket science -- mostly documentation and comment updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  sparse: fix comment
  doc: fix double words
  isdn: capi: fix "CAPI_VERSION" comment
  doc: DocBook: Fix typos in xml and template file
  Bluetooth: add module name for btwilink
  driver core: unexport static function create_syslog_header
  mmc: core: typo fix in printk specifier
  ARM: spear: clean up editing mistake
  net-sysfs: fix comment typo 'CONFIG_SYFS'
  doc: Insert MODULE_ in module-signing macros
  Documentation: update URL to hfsplus Technote 1150
  gpio: update path to documentation
  ixgbe: Fix format string in ixgbe_fcoe.
  Kconfig: Remove useless "default N" lines
  user_namespace.c: Remove duplicated word in comment
  CREDITS: fix formatting
  treewide: Fix typo in Documentation/DocBook
  mm: Fix warning on make htmldocs caused by slab.c
  ata: ata-samsung_cf: cleanup in header file
  idr: remove unused prototype of idr_free()
2014-04-02 16:23:38 -07:00
Linus Torvalds 05bf58ca4b Merge branch 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched/idle changes from Ingo Molnar:
 "More idle code reorganization, to prepare for more integration.

  (Sent separately because it depended on pending timer work, which is
  now upstream)"

* 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/idle: Add more comments to the code
  sched/idle: Move idle conditions in cpuidle_idle main function
  sched/idle: Reorganize the idle loop
  cpuidle/idle: Move the cpuidle_idle_call function to idle.c
  idle/cpuidle: Split cpuidle_idle_call main function into smaller functions
2014-04-02 16:22:27 -07:00
Linus Torvalds 7cbb39d4d4 Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "PPC and ARM do not have much going on this time.  Most of the cool
  stuff, instead, is in s390 and (after a few releases) x86.

  ARM has some caching fixes and PPC has transactional memory support in
  guests.  MIPS has some fixes, with more probably coming in 3.16 as
  QEMU will soon get support for MIPS KVM.

  For x86 there are optimizations for debug registers, which trigger on
  some Windows games, and other important fixes for Windows guests.  We
  now expose to the guest Broadwell instruction set extensions and also
  Intel MPX.  There's also a fix/workaround for OS X guests, nested
  virtualization features (preemption timer), and a couple kvmclock
  refinements.

  For s390, the main news is asynchronous page faults, together with
  improvements to IRQs (floating irqs and adapter irqs) that speed up
  virtio devices"

* tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
  KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
  KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
  KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
  KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
  KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
  KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
  KVM: PPC: Book3S HV: Add transactional memory support
  KVM: Specify byte order for KVM_EXIT_MMIO
  KVM: vmx: fix MPX detection
  KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
  KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
  KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
  KVM: s390: clear local interrupts at cpu initial reset
  KVM: s390: Fix possible memory leak in SIGP functions
  KVM: s390: fix calculation of idle_mask array size
  KVM: s390: randomize sca address
  KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
  KVM: Bump KVM_MAX_IRQ_ROUTES for s390
  KVM: s390: irq routing for adapter interrupts.
  KVM: s390: adapter interrupt sources
  ...
2014-04-02 14:50:10 -07:00
Linus Torvalds b9f2b21a32 Devicetree changes for v3.15
Updates to devicetree core code. This branch contains the following notable changes:
 * Add reserved memory binding
 * Make struct device_node a kobject and remove legacy /proc/device-tree
 * ePAPR conformance fixes
 * Update in-kernel DTC copy to version v1.4.0
 * Preparation changes for dynamic device tree overlays
 * minor bug fixes and documentation changes
 
 The most significant change in this branch is the conversion of struct
 device_node to be a kobject that is exposed via sysfs and removal of the
 old /proc/device-tree code. This simplifies the device tree handling
 code and tightens up the lifecycle on device tree nodes.
 
 [updated: added fix for dangling select PROC_DEVICETREE]
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTOyNwAAoJEMWQL496c2LNZY0QAIreUrpo3/hKRau61EDPXkOA
 UFRyPUHD0k/dNXWWDbTfvKH/nAfzdVwejhePqEWiODiFOFkq7JyQlMKPA+CZuZj0
 ygN4215A1yj/hDf6JRD5Zn4WGpawDt9InlbZSps6P5dd8voV5t5dz6uzz+Y7uqaK
 CAjTDlBSmxEen5vRHiHQgKv74au/+b9yfSURjPQVWg46+wl3WJwjsdzerphm4unW
 tpEr8zkIsm51mqqAx4penIuiovh7+L2J5v4BFeg8o+kaZEuZpVxLHJPOuBd5hdom
 zeqEIj3AqHTh5suYIHe4aAbZ2wMP3kYGgkPGwfWLnwLyULxalcCtGZeaCi9nwTFj
 Fdj+7f17ocrt5mif0f5Deufi1LqJsDjhY6G9p7HuV7Y9hsMILpJIUoGENPji+TWj
 BA4L45eaPmNYdKJytEtFD7F2WnXeHZ6fDtYho/39DWW+Bt16IFX85T199irhxGG4
 byN6LRaahk2UeycSXkQHAlWOQHqzBcJJAkQLN2iahzyYRr9Dy+VI2E9clm53m49O
 YQYcONdUlMYrtfRwJpbB9XHM0HgZUvg0LT5z/iHQs9uJtoo33Oj+zxFixyZLQ9Dq
 qyLqQWEpV9gFLAo9tpf56gffkLiJRsHkX4UJ6oTtj4DY1WWU9H81jjCvv/7flzp/
 8ZyyZzANQf1DZ9kqO2v+
 =lyA5
 -----END PGP SIGNATURE-----

Merge tag 'dt-for-linus' of git://git.secretlab.ca/git/linux

Pull devicetree changes from Grant Likely:
 "Updates to devicetree core code.  This branch contains the following
  notable changes:

   - add reserved memory binding
   - make struct device_node a kobject and remove legacy
     /proc/device-tree
   - ePAPR conformance fixes
   - update in-kernel DTC copy to version v1.4.0
   - preparatory changes for dynamic device tree overlays
   - minor bug fixes and documentation changes

  The most significant change in this branch is the conversion of struct
  device_node to be a kobject that is exposed via sysfs and removal of
  the old /proc/device-tree code.  This simplifies the device tree
  handling code and tightens up the lifecycle on device tree nodes.

  [updated: added fix for dangling select PROC_DEVICETREE]"

* tag 'dt-for-linus' of git://git.secretlab.ca/git/linux: (29 commits)
  dt: Remove dangling "select PROC_DEVICETREE"
  of: Add support for ePAPR "stdout-path" property
  of: device_node kobject lifecycle fixes
  of: only scan for reserved mem when fdt present
  powerpc: add support for reserved memory defined by device tree
  arm64: add support for reserved memory defined by device tree
  of: add missing major vendors
  of: add vendor prefix for SMSC
  of: remove /proc/device-tree
  of/selftest: Add self tests for manipulation of properties
  of: Make device nodes kobjects so they show up in sysfs
  arm: add support for reserved memory defined by device tree
  drivers: of: add support for custom reserved memory drivers
  drivers: of: add initialization code for dynamic reserved memory
  drivers: of: add initialization code for static reserved memory
  of: document bindings for reserved-memory nodes
  Revert "of: fix of_update_property()"
  kbuild: dtbs_install: new make target
  ARM: mvebu: Allows to get the SoC ID even without PCI enabled
  of: Allows to use the PCI translator without the PCI core
  ...
2014-04-02 14:27:15 -07:00
Linus Torvalds 70f6c08757 More ACPI and power management updates for 3.15-rc1
- Remaining changes from upstream ACPICA release 20140214 that introduce
    code to automatically serialize the execution of methods creating any
    named objects which really cannot be executed in parallel with each
    other anyway (previously ACPICA attempted to address that by aborting
    methods upon conflict detection, but that wasn't reliable enough and
    led to other issues).  From Bob Moore and Lv Zheng.
 
  - intel_pstate fix to use del_timer_sync() instead of del_timer() in
    the exit path before freeing the timer structure from Dirk Brandewie
    (original patch from Thomas Gleixner).
 
  - cpufreq fix related to system resume from Viresh Kumar.
 
  - Serialization of frequency transitions in cpufreq that involve
    PRECHANGE and POSTCHANGE notifications to avoid ordering issues
    resulting from race conditions.  From Srivatsa S Bhat and Viresh Kumar.
 
  - Revert of an ACPI processor driver change that was based on a specific
    interpretation of the ACPI spec which may not be correct (the relevant
    part of the spec appears to be incomplete).  From Hanjun Guo.
 
  - Runtime PM core cleanups and documentation updates from Geert Uytterhoeven.
 
  - PNP core cleanup from Michael Opdenacker.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTO1+vAAoJEILEb/54YlRxHYgP/RB18RLcwSIPMTWoZPo5t+pd
 IGtHkG5xzCBZXiqL9OJLm+dH1V5w+wZVXh2865ZDiqq4CZYZWD6RUQnx5q0rSVR5
 54PYzx2I0i8ApPxRYTTmnb2NHUPedp3l0YSRC0gt73Q/6o9TcmOMtcn5pfTyCvbB
 m3am3mpKKxRD+vYCADjjUtuj4NQ62z9DjM5iJIql7Pj4kAJVgSxP8nsfHY6EwNaT
 m9mnNCA6Zemh89luM1W2vw69ZoZwLAbXIXJYCNy3khT13SYO2SCNhX/tlY7ncCvv
 P+9gawJb6Wio7pVHqRR0Lesc8J29uzivEeS8WqZ3R0P0HoTP6z5a5R+b06ecGQjF
 OWvj7wURjZE4t7qEtIOHmwIyCRE4Nxly90r5upj9kKVBaczz/LbDeCVfKU/Y2Vu6
 PPxmjRwjO4S8FqLihwiXCSYVf3pxBrDKgjjofM7f2CiO8D41C4KhgowbUqyUSCgw
 VKXU6UQbzVigfrGXsdqIJiTnEMmbOvrPy6PaVh27NlwXX3sg1SwYcoegsW+ee7m1
 jJdl1TRI27pl7NPgTkLpf5K7n6mkDsou8Y+PcQhFa6FNTn/k8gp/RfOHpLHaNTjL
 15Aswkm70Ojeae+Ahx8ZgrWXF7iu+uBX7KakeUVQJg/PFjXIspx+c/SrGzh7ZLa1
 aOqoKfFY7zDke4AV3eH/
 =EfZ8
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI and power management updates from Rafael Wysocki:
 "These are commits that were not quite ready when I sent the original
  pull request for 3.15-rc1 several days ago, but they have spent some
  time in linux-next since then and appear to be good to go.  All of
  them are fixes and cleanups.

  Specifics:

   - Remaining changes from upstream ACPICA release 20140214 that
     introduce code to automatically serialize the execution of methods
     creating any named objects which really cannot be executed in
     parallel with each other anyway (previously ACPICA attempted to
     address that by aborting methods upon conflict detection, but that
     wasn't reliable enough and led to other issues).  From Bob Moore
     and Lv Zheng.

   - intel_pstate fix to use del_timer_sync() instead of del_timer() in
     the exit path before freeing the timer structure from Dirk
     Brandewie (original patch from Thomas Gleixner).

   - cpufreq fix related to system resume from Viresh Kumar.

   - Serialization of frequency transitions in cpufreq that involve
     PRECHANGE and POSTCHANGE notifications to avoid ordering issues
     resulting from race conditions.  From Srivatsa S Bhat and Viresh
     Kumar.

   - Revert of an ACPI processor driver change that was based on a
     specific interpretation of the ACPI spec which may not be correct
     (the relevant part of the spec appears to be incomplete).  From
     Hanjun Guo.

   - Runtime PM core cleanups and documentation updates from Geert
     Uytterhoeven.

   - PNP core cleanup from Michael Opdenacker"

* tag 'pm+acpi-3.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Make cpufreq_notify_transition & cpufreq_notify_post_transition static
  cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin|end}
  cpufreq: Make sure frequency transitions are serialized
  intel_pstate: Use del_timer_sync in intel_pstate_cpu_stop
  cpufreq: resume drivers before enabling governors
  PM / Runtime: Spelling s/competing/completing/
  PM / Runtime: s/foo_process_requests/foo_process_next_request/
  PM / Runtime: GENERIC_SUBSYS_PM_OPS is gone
  PM / Runtime: Correct documented return values for generic PM callbacks
  PM / Runtime: Split line longer than 80 characters
  PM / Runtime: dev_pm_info.runtime_error is signed
  Revert "ACPI / processor: Make it possible to get APIC ID via GIC"
  ACPICA: Enable auto-serialization as a default kernel behavior.
  ACPICA: Ignore sync_level for methods that have been auto-serialized.
  ACPICA: Add additional named objects for the auto-serialize method scan.
  ACPICA: Add auto-serialization support for ill-behaved control methods.
  ACPICA: Remove global option to serialize all control methods.
  PNP: remove deprecated IRQF_DISABLED
2014-04-02 14:10:21 -07:00
Linus Torvalds 7125764c5d Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull compat time conversion changes from Peter Anvin:
 "Despite the branch name this is really neither an x86 nor an
  x32-specific patchset, although it the implementation of the
  discussions that followed the x32 security hole a few months ago.

  This removes get/put_compat_timespec/val() and replaces them with
  compat_get/put_timespec/val() which are savvy as to the current status
  of COMPAT_USE_64BIT_TIME.

  It removes several unused and/or incorrect/misleading functions (like
  compat_put_timeval_convert which doesn't in fact do any conversion)
  and also replaces several open-coded implementations what is now
  called compat_convert_timespec() with that function"

* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  compat: Fix sparse address space warnings
  compat: Get rid of (get|put)_compat_time(val|spec)
2014-04-02 12:51:41 -07:00
Linus Torvalds c6f21243ce Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso changes from Peter Anvin:
 "This is the revamp of the 32-bit vdso and the associated cleanups.

  This adds timekeeping support to the 32-bit vdso that we already have
  in the 64-bit vdso.  Although 32-bit x86 is legacy, it is likely to
  remain in the embedded space for a very long time to come.

  This removes the traditional COMPAT_VDSO support; the configuration
  variable is reused for simply removing the 32-bit vdso, which will
  produce correct results but obviously suffer a performance penalty.
  Only one beta version of glibc was affected, but that version was
  unfortunately included in one OpenSUSE release.

  This is not the end of the vdso cleanups.  Stefani and Andy have
  agreed to continue work for the next kernel cycle; in fact Andy has
  already produced another set of cleanups that came too late for this
  cycle.

  An incidental, but arguably important, change is that this ensures
  that unused space in the VVAR page is properly zeroed.  It wasn't
  before, and would contain whatever garbage was left in memory by BIOS
  or the bootloader.  Since the VVAR page is accessible to user space
  this had the potential of information leaks"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86, vdso: Fix the symbol versions on the 32-bit vDSO
  x86, vdso, build: Don't rebuild 32-bit vdsos on every make
  x86, vdso: Actually discard the .discard sections
  x86, vdso: Fix size of get_unmapped_area()
  x86, vdso: Finish removing VDSO32_PRELINK
  x86, vdso: Move more vdso definitions into vdso.h
  x86: Load the 32-bit vdso in place, just like the 64-bit vdsos
  x86, vdso32: handle 32 bit vDSO larger one page
  x86, vdso32: Disable stack protector, adjust optimizations
  x86, vdso: Zero-pad the VVAR page
  x86, vdso: Add 32 bit VDSO time support for 64 bit kernel
  x86, vdso: Add 32 bit VDSO time support for 32 bit kernel
  x86, vdso: Patch alternatives in the 32-bit VDSO
  x86, vdso: Introduce VVAR marco for vdso32
  x86, vdso: Cleanup __vdso_gettimeofday()
  x86, vdso: Replace VVAR(vsyscall_gtod_data) by gtod macro
  x86, vdso: __vdso_clock_gettime() cleanup
  x86, vdso: Revamp vclock_gettime.c
  mm: Add new func _install_special_mapping() to mmap.c
  x86, vdso: Make vsyscall_gtod_data handling x86 generic
  ...
2014-04-02 12:26:43 -07:00
Joerg Roedel e172b81222 Merge branches 'iommu/fixes', 'arm/smmu', 'x86/amd', 'arm/omap', 'arm/shmobile' and 'x86/vt-d' into next 2014-04-02 19:13:12 +02:00
Al Viro ccad236566 kill generic_file_buffered_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:38 -04:00
Al Viro 3b93f911d5 export generic_perform_write(), start getting rid of generic_file_buffer_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:36 -04:00
Al Viro 5cb6c6c7eb generic_file_direct_write(): get rid of ppos argument
always equal to &iocb->ki_pos.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:35 -04:00
Al Viro fcacafd269 kill the 5th argument of generic_file_buffered_write()
same story - it's &iocb->ki_pos in all cases

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:34 -04:00
Al Viro 41fc56d573 kill the 4th argument of __generic_file_aio_write()
It's always equal to &iocb->ki_pos, where iocb is the value of the 1st
argument.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:34 -04:00
Al Viro 86d564c84c constify blk_rq_map_user_iov() and friends
sg_iovec array passed to it can be const

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:31 -04:00
Al Viro 6e58e79db8 introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read()
generic_file_aio_read() was looping over the target iovec, with loop over
(source) pages nested inside that.  Just set an iov_iter up and pass *that*
to do_generic_file_aio_read().  With copy_page_to_iter() doing all work
of mapping and copying a page to iovec and advancing iov_iter.

Switch shmem_file_aio_read() to the same and kill file_read_actor(), while
we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:21 -04:00
Kent Overstreet 9223687863 iov_iter: Move iov_iter to uio.h
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2014-04-01 23:19:21 -04:00
Al Viro c186afb4db switch ->is_partially_uptodate() to saner arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:19 -04:00
Al Viro fbb32750a6 pipe: kill ->map() and ->unmap()
all pipe_buffer_operations have the same instances of those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:19 -04:00
Al Viro 5d826c847b new helper: readlink_copy()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:15 -04:00
Al Viro 7f4b36f9bb get rid of files_defer_init()
the only thing it's doing these days is calculation of
upper limit for fs.nr_open sysctl and that can be done
statically

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:14 -04:00
Al Viro 83f936c75e mark struct file that had write access grabbed by open()
new flag in ->f_mode - FMODE_WRITER.  Set by do_dentry_open() in case
when it has grabbed write access, checked by __fput() to decide whether
it wants to drop the sucker.  Allows to stop bothering with mnt_clone_write()
in alloc_file(), along with fewer special_file() checks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:12 -04:00
Al Viro 4597e695b8 get rid of DEBUG_WRITECOUNT
it only makes control flow in __fput() and friends more convoluted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:12 -04:00
Al Viro e25115786e switch nbd to sockfd_lookup/sockfd_put
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:10 -04:00