Commit Graph

484558 Commits (c253a8965cdf54806f74c4f46cb2f50b95a65b83)

Author SHA1 Message Date
Vlastimil Babka 510f550788 mm, cma: drain single zone pcplists
CMA allocation drains pcplists so that pages can merge back to buddy
allocator.  Since it operates on a single zone, we can reduce the
pcplists drain to the single zone, which is now possible.

The change should make CMA allocations faster and stop them from
disturbing unrelated pcplists.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Vlastimil Babka ec25af84b2 mm, page_isolation: drain single zone pcplists
When setting MIGRATETYPE_ISOLATE on a pageblock, pcplists are drained to
have a better chance that all pages will be successfully isolated and
not left in the per-cpu caches.  Since isolation is always concerned
with a single zone, we can reduce the pcplists drain to the single zone,
which is now possible.

The change should make memory isolation faster and stop it from
disturbing unrelated pcplists.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Vlastimil Babka 93481ff0e5 mm: introduce single zone pcplists drain
The functions for draining per-cpu pages back to buddy allocators
currently always operate on all zones.  There are however several cases
where the drain is only needed in the context of a single zone, and
spilling other pcplists is a waste of time both due to the extra
spilling and later refilling.

This patch introduces a new zone pointer parameter to drain_all_pages()
and changes the dummy parameter of drain_local_pages() to also be a zone
pointer.  When NULL is passed, the functions operate on all zones as
usual.  Passing a specific zone pointer reduces the work to the single
zone.

All callers are updated to pass the NULL pointer in this patch.
Conversion to single zone (where appropriate) is done in further
patches.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
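The calling convention described in the entry above (NULL means all zones, a specific pointer means one zone) can be illustrated with a small userspace sketch. Everything here is hypothetical: struct zone is reduced to two counters, and drain_pages_sketch() is not the kernel's drain_all_pages(), only a model of its NULL-versus-pointer behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ZONES 3  /* hypothetical zone count for this sketch */

/* Hypothetical stand-in for a zone with a single per-cpu page list. */
struct zone {
	unsigned long pcp_count;   /* pages cached on the per-cpu list */
	unsigned long free_pages;  /* pages held by the buddy allocator */
};

static struct zone zones[NR_ZONES];

/* Move all cached pages of one zone back to its free pool. */
static void drain_zone(struct zone *z)
{
	z->free_pages += z->pcp_count;
	z->pcp_count = 0;
}

/*
 * Mirrors the convention from the commit message: a NULL zone means
 * "all zones", a specific pointer limits the work to that zone.
 */
static void drain_pages_sketch(struct zone *zone)
{
	if (zone) {
		assert(zone >= zones && zone < zones + NR_ZONES);
		drain_zone(zone);
		return;
	}
	for (size_t i = 0; i < NR_ZONES; i++)
		drain_zone(&zones[i]);
}
```

Calling drain_pages_sketch(&zones[1]) leaves the cached pages of the other zones untouched, which is the saving the message refers to: those lists are neither spilled nor refilled later.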
Pintu Kumar 8612c6639b mm/vmscan.c: replace printk with pr_err
This patch replaces the printk(KERN_ERR..) in shrink_slab with pr_err,
which also makes the call one line shorter.

Signed-off-by: Pintu Kumar <pintu.k@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Pintu Kumar 0cbc8533b7 mm/vmalloc.c: replace printk with pr_warn
This patch replaces printk(KERN_WARNING..) with pr_warn, which also
makes the call one line shorter.

Signed-off-by: Pintu Kumar <pintu.k@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Anton Blanchard f88dfff5f1 mm/page_alloc.c: convert boot printks without log level to pr_info
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Johannes Weiner 6d3d6aa22a mm: memcontrol: remove synchronous stock draining code
With charge reparenting, the last synchronous stock drainer left.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Johannes Weiner b2052564e6 mm: memcontrol: continue cache reclaim from offlined groups
On cgroup deletion, outstanding page cache charges are moved to the parent
group so that they're not lost and can be reclaimed during pressure
on/inside said parent.  But this reparenting is fairly tricky and its
synchronous nature has led to several lock-ups in the past.

Since c2931b70a3 ("cgroup: iterate cgroup_subsys_states directly") css
iterators now also include offlined css, so memcg iterators can be changed
to include offlined children during reclaim of a group, and leftover cache
can just stay put.

There is a slight change of behavior in that charges of deleted groups no
longer show up as local charges in the parent.  But they are still
included in the parent's hierarchical statistics.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Johannes Weiner 64f2199389 mm: memcontrol: remove obsolete kmemcg pinning tricks
As charges now pin the css explicitly, there is no more need for kmemcg
to acquire a proxy reference for outstanding pages during offlining, or
maintain state to identify such "dead" groups.

This was the last user of the uncharge functions' return values, so remove
them as well.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Johannes Weiner e8ea14cc6e mm: memcontrol: take a css reference for each charged page
Charges currently pin the css indirectly by playing tricks during
css_offline(): user pages stall the offlining process until all of them
have been reparented, whereas kmemcg acquires a keep-alive reference if
outstanding kernel pages are detected at that point.

In preparation for removing all this complexity, make the pinning explicit
and acquire a css reference for every charged page.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Johannes Weiner 5ac8fb31ad mm: memcontrol: convert reclaim iterator to simple css refcounting
The memcg reclaim iterators use a complicated weak reference scheme to
prevent pinning cgroups indefinitely in the absence of memory pressure.

However, during the ongoing cgroup core rework, css lifetime has been
decoupled such that a pinned css no longer interferes with removal of
the user-visible cgroup, and all this complexity is now unnecessary.

[mhocko@suse.cz: ensure that the cached reference is always released]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:05 -08:00
Johannes Weiner 5b1efc027c kernel: res_counter: remove the unused API
All memory accounting and limiting has been switched over to the
lockless page counters.  Bye, res_counter!

[akpm@linux-foundation.org: update Documentation/cgroups/memory.txt]
[mhocko@suse.cz: ditch the last remainings of res_counter]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Johannes Weiner 71f87bee38 mm: hugetlb_cgroup: convert to lockless page counters
Abandon the spinlock-protected byte counters in favor of the unlocked
page counters in the hugetlb controller as well.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Johannes Weiner 3e32cb2e0a mm: memcontrol: lockless page counters
Memory is internally accounted in bytes, using spinlock-protected 64-bit
counters, even though the smallest accounting delta is a page.  The
counter interface is also convoluted and does too many things.

Introduce a new lockless word-sized page counter API, then change all
memory accounting over to it.  The translation from and to bytes then only
happens when interfacing with userspace.

The removed locking overhead is noticeable when scaling beyond the per-cpu
charge caches - on a 4-socket machine with 144 threads, the following test
shows the performance differences of 288 memcgs concurrently running a
page fault benchmark:

vanilla:

   18631648.500498      task-clock (msec)         #  140.643 CPUs utilized            ( +-  0.33% )
         1,380,638      context-switches          #    0.074 K/sec                    ( +-  0.75% )
            24,390      cpu-migrations            #    0.001 K/sec                    ( +-  8.44% )
     1,843,305,768      page-faults               #    0.099 M/sec                    ( +-  0.00% )
50,134,994,088,218      cycles                    #    2.691 GHz                      ( +-  0.33% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
 8,049,712,224,651      instructions              #    0.16  insns per cycle          ( +-  0.04% )
 1,586,970,584,979      branches                  #   85.176 M/sec                    ( +-  0.05% )
     1,724,989,949      branch-misses             #    0.11% of all branches          ( +-  0.48% )

     132.474343877 seconds time elapsed                                          ( +-  0.21% )

lockless:

   12195979.037525      task-clock (msec)         #  133.480 CPUs utilized            ( +-  0.18% )
           832,850      context-switches          #    0.068 K/sec                    ( +-  0.54% )
            15,624      cpu-migrations            #    0.001 K/sec                    ( +- 10.17% )
     1,843,304,774      page-faults               #    0.151 M/sec                    ( +-  0.00% )
32,811,216,801,141      cycles                    #    2.690 GHz                      ( +-  0.18% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
 9,999,265,091,727      instructions              #    0.30  insns per cycle          ( +-  0.10% )
 2,076,759,325,203      branches                  #  170.282 M/sec                    ( +-  0.12% )
     1,656,917,214      branch-misses             #    0.08% of all branches          ( +-  0.55% )

      91.369330729 seconds time elapsed                                          ( +-  0.45% )

On top of improved scalability, this also gets rid of the icky long long
types in the very heart of memcg, which is great for 32 bit and also makes
the code a lot more readable.

Notable differences between the old and new API:

- res_counter_charge() and res_counter_charge_nofail() become
  page_counter_try_charge() and page_counter_charge() resp. to match
  the more common kernel naming scheme of try_do()/do()

- res_counter_uncharge_until() is only ever used to cancel a local
  counter and never to uncharge bigger segments of a hierarchy, so
  it's replaced by the simpler page_counter_cancel()

- res_counter_set_limit() is replaced by page_counter_limit(), which
  expects its callers to serialize against themselves

- res_counter_memparse_write_strategy() is replaced by
  page_counter_limit(), which rounds down to the nearest page size -
  rather than up.  This is more reasonable for explicitly requested
  hard upper limits.

- to keep charging light-weight, page_counter_try_charge() charges
  speculatively, only to roll back if the result exceeds the limit.
  Because of this, a failing bigger charge can temporarily lock out
  smaller charges that would otherwise succeed.  The error is bounded
  to the difference between the smallest and the biggest possible
  charge size, so for memcg, this means that a failing THP charge can
  send base page charges into reclaim up to 2MB (4MB) before the limit
  would have been reached.  This should be acceptable.

[akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
[akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
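The speculative charging described in the last bullet of the entry above can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel's page_counter code: struct counter_sketch and try_charge_sketch() are hypothetical names, there is no parent hierarchy, and the limit is a plain field.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a word-sized page counter. */
struct counter_sketch {
	atomic_long count;  /* pages currently charged */
	long limit;         /* hard limit in pages */
};

/*
 * Charge speculatively: add first, then roll back if the new total
 * exceeds the limit.  No lock is taken on either path.
 */
static bool try_charge_sketch(struct counter_sketch *c, long nr_pages)
{
	assert(nr_pages > 0);
	long new_count = atomic_fetch_add(&c->count, nr_pages) + nr_pages;

	if (new_count > c->limit) {
		/* Undo the speculative charge. */
		atomic_fetch_sub(&c->count, nr_pages);
		return false;
	}
	return true;
}
```

Between the add and the rollback of a large failing charge, the counter is temporarily inflated, so a concurrent small charge can see a total above the limit and fail as well. That window is the bounded error the message accepts in exchange for a lock-free fast path.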
Pranith Kumar 8df0c2dcf6 slab: replace smp_read_barrier_depends() with lockless_dereference()
lockless_dereference() was recently added and can be used in place of
hard-coding smp_read_barrier_depends().  This patch makes the
change.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Andrew Morton c871ac4e96 slab: improve checking for invalid gfp_flags
The code goes BUG, but doesn't tell us which bits were unexpectedly set.
Print that out.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Andrey Ryabinin f6edde9cbe mm: slub: fix format mismatches in slab_err() callers
Adding __printf(3, 4) to slab_err exposed the following:

  mm/slub.c: In function `check_slab':
  mm/slub.c:852:4: warning: format `%u' expects argument of type `unsigned int', but argument 4 has type `const char *' [-Wformat=]
      s->name, page->objects, maxobj);
      ^
  mm/slub.c:852:4: warning: too many arguments for format [-Wformat-extra-args]
  mm/slub.c:857:4: warning: format `%u' expects argument of type `unsigned int', but argument 4 has type `const char *' [-Wformat=]
      s->name, page->inuse, page->objects);
      ^
  mm/slub.c:857:4: warning: too many arguments for format [-Wformat-extra-args]

  mm/slub.c: In function `on_freelist':
  mm/slub.c:905:4: warning: format `%d' expects argument of type `int', but argument 5 has type `long unsigned int' [-Wformat=]
      "should be %d", page->objects, max_objects);

Fix the first two warnings by removing the redundant s->name.
Fix the last by changing the type of max_objects from unsigned long to int.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Joonsoo Kim 5436205738 mm/slab: reverse iteration on find_mergeable()
Unlike in SLUB, objects in SLAB do not always start at the beginning of
the slab.  This causes an alignment problem now that slab merging is
supported by commit 12220dea07 ("mm/slab: support slab merge").
An alignment mismatch check was introduced ("mm/slab: fix unalignment problem
on Malta with EVA due to slab merge") to prevent merging in this case.

This has the undesirable result that merging happens between infrequently
used kmem_caches.  If there are kmem_caches with the same size, say 256
bytes, they are merged into pool_workqueue rather than kmalloc-256, because
the kmem_caches for kmalloc are at the tail of the list.

To prevent this situation, this patch reverses iteration order in
find_mergeable() to find frequently used kmem_caches.  This change helps
to merge kmem_cache to frequently used kmem_caches, such as kmalloc
kmem_caches.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Vladimir Davydov 1df3b26f20 slab: print slabinfo header in seq show
Currently we print the slabinfo header in the seq start method, which
makes it unusable for showing leaks, so we have leaks_show, which does
practically the same as s_show except it doesn't show the header.

However, we can print the header in the seq show method - we only need
to check if the current element is the first on the list.  This will
allow us to use the same set of seq iterators for both leaks and
slabinfo reporting, which is nice.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
LQYMGT b455def28d mm: slab/slub: coding style: whitespaces and tabs mixture
Some code in mm/slab.c and mm/slub.c uses spaces for indentation.
Clean them up.

Signed-off-by: LQYMGT <lqymgt@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Jan Kara e2ab879e96 fs/char_dev.c: remove pointless assignment from __register_chrdev_region()
In one place we assign the major number we found to ret.  That assignment is
then never used and actually doesn't make any sense given how the code is
currently structured (the assignment comes from pre-git times).  Just
remove it.

Coverity id: 1226852.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Dan Carpenter b3e3e5af60 ocfs2: remove unneeded NULL check
In commit 1faf289454 ("ocfs2_dlm: disallow a domain join if node maps
mismatch") we introduced a new earlier NULL check so this one is not
needed.  Also static checkers complain because we dereference it first
and then check for NULL.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
Dan Carpenter 88d69b92fc ocfs2: remove bogus NULL check in ocfs2_move_extents()
"inode" isn't NULL here, and also we dereference it on the previous line
so static checkers get annoyed.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:04 -08:00
jiangyiwen 61fb9ea4b3 ocfs2: do not set filesystem readonly if link down
Do not set the filesystem readonly if the storage link is down.  In this
case, metadata is not corrupted and only -EIO is returned.  And if it is
indeed corrupted metadata, it has already called ocfs2_error() in
ocfs2_validate_inode_block().

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
Xue jiufei d1e7823874 ocfs2: do not set OCFS2_LOCK_UPCONVERT_FINISHING if nonblocking lock can not be granted at once
ocfs2_readpages() uses the nonblocking flag to avoid page lock inversion.
This can trigger a cluster hang because the flag OCFS2_LOCK_UPCONVERT_FINISHING
is not cleared if the nonblocking lock cannot be granted at once.  The flag
prevents the dc thread from downconverting, so other nodes can never
acquire this lockres.

So we should not set OCFS2_LOCK_UPCONVERT_FINISHING when receiving ast if
nonblocking lock had already returned.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
Jan Kara dc17158060 ocfs2: fix error handling when creating debugfs root in ocfs2_init()
Error handling if creation of root of debugfs in ocfs2_init() fails is
broken.  Although error code is set we fail to exit ocfs2_init() with
error and thus initialization ends with success.  Later when mounting a
filesystem, ocfs2 debugfs entries end up being created in the root of
debugfs filesystem which is confusing.

Fix the error handling to bail out.

Coverity id: 1227009.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
Goldwyn Rodrigues 86b9c6f3f8 ocfs2: remove filesize checks for sync I/O journal commit
Filesize is not a good indication that the file needs to be synced.
An example where this breaks is:
 1. Open the file in O_SYNC|O_RDWR
 2. Read a small portion of the file (say 64 bytes)
 3. Lseek to the start of the file
 4. Write 64 bytes

If the node crashes, the data is not written out to disk because the write
was not committed in the journal, and the other node which reads the file
after recovery reads stale data (even if the write on the crashed node was
successful).

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
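The four steps listed in the entry above can be written as a short POSIX sketch. The function name sync_rewrite_head() and the fill byte are hypothetical. The sketch only reproduces the sequence of system calls, not the ocfs2 journal behaviour. When the file already holds at least 64 bytes, the write lands inside the existing data, so the file size is the same before and after.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 64  /* the "small portion" from the steps above */

/* Returns 0 if all four steps succeed, -1 otherwise. */
static int sync_rewrite_head(const char *path)
{
	char buf[CHUNK];
	int ret = -1;

	assert(path != NULL);
	int fd = open(path, O_SYNC | O_RDWR);        /* step 1 */
	if (fd < 0)
		return -1;
	if (read(fd, buf, sizeof(buf)) < 0)           /* step 2 */
		goto out;
	if (lseek(fd, 0, SEEK_SET) != 0)              /* step 3 */
		goto out;
	memset(buf, 'x', sizeof(buf));                /* hypothetical payload */
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))  /* step 4 */
		goto out;
	ret = 0;
out:
	close(fd);
	return ret;
}
```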
Junxiao Bi 196fe71d64 ocfs2: o2net: fix connect expired
Setting nn_persistent_error to -ENOTCONN stops reconnecting, since the
"stop" condition in o2net_start_connect() will be true.

    stop = (nn->nn_sc ||
                (nn->nn_persistent_error &&
                (nn->nn_persistent_error != -ENOTCONN || timeout == 0)));

As a result, the connection is never established if the first connection
request is lost.

Set nn_persistent_error to 0 when the connect expires to fix this.  With
this change, dlm is not woken up when the connect expires.  This is OK:
dlm depends on the network and could do nothing in this case even if it
were woken up.  Let it wait for the network to recover and the connection
to be rebuilt before it continues.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
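The "stop" expression quoted in the entry above can be evaluated in isolation to see which inputs block a reconnect. In this sketch the struct fields are turned into plain parameters. connect_should_stop() and has_sc are hypothetical names, and the parameter types are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * The "stop" expression from the commit message as a pure function.
 * has_sc stands in for nn->nn_sc being non-NULL.
 */
static bool connect_should_stop(bool has_sc, int persistent_error,
				unsigned int timeout)
{
	assert(persistent_error <= 0);  /* zero or a negative errno value */
	return has_sc ||
	       (persistent_error &&
		(persistent_error != -ENOTCONN || timeout == 0));
}
```

With persistent_error set to -ENOTCONN and a timeout of 0 the function returns true, while a persistent_error of 0 returns false whenever has_sc is false. That is consistent with the fix of resetting the error to 0 when the connect expires.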
Srinivas Eeda cb79662bc2 ocfs2: o2dlm: fix a race between purge and master query
Node A sends master query request to node B which is the master.  At this
time lockres happens to be on purgelist.  dlm_master_request_handler gets
the dlm spinlock, finds the resource and releases the dlm spin lock.
Right at this point, dlm_thread on this node could purge the lockres.
dlm_master_request_handler can then acquire lockres spinlock and reply to
Node A that node B is the master even though lockres on node B is purged.

The above scenario will now make node A falsely think node B is the master
which is inconsistent.  Further if another node C tries to master the same
resource, every node will respond they are not the master.  Node C then
masters the resource and sends assert master to all nodes.  This will now
make node A crash with the following message.

dlm_assert_master_handler:1831 ERROR: DIE! Mastery assert from 9, but current
owner is 10!

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Tested-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
Jan Kara f5425fcea7 ocfs2: report error from o2hb_do_disk_heartbeat() to user
Report return value of o2hb_do_disk_heartbeat() as a part of ML_HEARTBEAT
message so that we know whether a heartbeat actually happened or not.
This also makes assigned but otherwise unused 'ret' variable used.

Coverity id: 1227053.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
Jan Kara 4a635a113b ocfs2: remove bogus test from ocfs2_read_locked_inode()
'args' are always set for ocfs2_read_locked_inode() and brelse() checks
whether bh is NULL.  So the test (args && bh) is unnecessary (plus the
args part is really confusing anyway).  Remove it.

Coverity id: 1128856.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
Jan Kara 2b693005b8 ocfs2: Fix xattr check in ocfs2_get_xattr_nolock()
ocfs2_get_xattr_nolock() checks whether inode has any extended attributes
(OCFS2_HAS_XATTR_FL).  If not, it just sets 'ret' to -ENODATA but
continues with checking inline and external attributes anyway (which is
pointless although it does not harm).  Just return immediately when we
know there are no extended attributes in the inode.

Coverity id: 1226906.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
Dan Carpenter 519a286175 ocfs2: fix an off-by-one BUG_ON() statement
The ->si_slots[] array is allocated in ocfs2_init_slot_info(); it has
"->max_slots" number of elements, so this test should be >= instead of >.

Static checker work.  Compile tested only.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
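The difference between the two comparisons in the entry above is easiest to see at the boundary index. The helper names below are hypothetical and the sketch is not the ocfs2 code. It only models a range check for an array with max_slots elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* An array with max_slots elements has valid indices 0 .. max_slots - 1. */

/* Off-by-one variant: lets slot == max_slots through. */
static bool slot_out_of_range_buggy(size_t slot, size_t max_slots)
{
	assert(max_slots > 0);
	return slot > max_slots;
}

/* Corrected variant: rejects the first index past the end. */
static bool slot_out_of_range_fixed(size_t slot, size_t max_slots)
{
	assert(max_slots > 0);
	return slot >= max_slots;
}
```

For max_slots of 8, index 8 passes the buggy check even though it is one element past the end of the array. Only the >= form catches it.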
Joseph Qi f08736bd6c ocfs2/dlm: let sender retry if dlm_dispatch_assert_master failed with -ENOMEM
Do not BUG() if GFP_ATOMIC allocation fails in dlm_dispatch_assert_master.
Instead, return -ENOMEM to the sender and then retry.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
Dan Carpenter ded0014274 sh: off by one BUG_ON() in setup_bootmem_node()
This off by one bug is harmless but it upsets the static checkers and the
code is obvious so it doesn't hurt to fix it.  The Smatch warning is:

    arch/sh/mm/numa.c:47 setup_bootmem_node()
    error: buffer overflow 'node_data' 1024 <= 1024

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:02 -08:00
Johannes Berg 7b990789a4 scripts/kernel-doc: don't eat struct members with __aligned
The change from \d+ to .+ inside __aligned() means that the following
structure:

  struct test {
        u8 a __aligned(2);
        u8 b __aligned(2);
  };

essentially gets modified to

  struct test {
        u8 a;
  };

for purposes of kernel-doc, thus dropping a struct member, which in
turn causes warnings and invalid kernel-doc generation.

Fix this by replacing the catch-all (".") with anything that's not a
semicolon ("[^;]").

Fixes: 9dc30918b2 ("scripts/kernel-doc: handle struct member __aligned without numbers")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: Nishanth Menon <nm@ti.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:02 -08:00
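The effect of the catch-all in the entry above can be reproduced with POSIX extended regular expressions. This is a stand-in, not the kernel-doc script: kernel-doc is written in Perl, and first_match_len() is a hypothetical helper. It is used here only because both engines let ".+" run across the semicolon to the last closing parenthesis.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return the length of the first match of an extended regular
 * expression in text, or -1 if the pattern fails to compile or
 * does not match.
 */
static long first_match_len(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	long len = -1;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	if (regexec(&re, text, 1, &m, 0) == 0)
		len = (long)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

Applied to "u8 a __aligned(2); u8 b __aligned(2);", the pattern "__aligned\\(.+\\)" matches from the first __aligned through the second closing parenthesis, swallowing member b. The pattern "__aligned\\([^;]+\\)" stops at the first closing parenthesis because it cannot cross the semicolon.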
Florian Fainelli 2ce8e7ed00 dma-debug: prevent early callers from crashing
dma_debug_init() is called by architecture specific code at different
levels, but typically as a fs_initcall due to the debugfs initialization.
Some platforms may have early callers of the DMA-API, running prior to the
fs_initcall() level, which is not much of an issue unless
CONFIG_DMA_API_DEBUG is set.  When the DMA-API debugging facilities are
turned on a caller will go through:

debug_dma_map_{single,page}
  -> dma_mapping_error (inline function usually)
    -> debug_dma_mapping_error
      -> get_hash_bucket

Calling get_hash_bucket() returns a valid hash value since we hash on high
bits of the dma_addr cookie, but we will grab an uninitialized spinlock,
which typically won't crash but will produce a warning.  The real crash
will, however, happen during the bucket list traversal because the list
has not been initialized yet.

An obvious solution is of course to move some of the offenders to run
after the fs_initcall level, but since this might not always be an option,
we add a flag "dma_debug_initialized" which is set to false by default,
and set to true once dma_debug_init() has had a chance to run.

The dma_debug_disabled() helper function previously introduced just needs
to check for dma_debug_initialized to allow the caller to proceed or not.
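
A rough user-space sketch of the guard, assuming a single list stands in
for the hash buckets.  The struct entry type, the bucket variable and the
return values are hypothetical; only the flag and function names come
from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct entry {
    struct entry *next;
};

static bool global_disable;         /* debugging switched off entirely */
static bool dma_debug_initialized;  /* stays false until init has run */
static struct entry *bucket;        /* stand-in for one hash bucket list */
static struct entry first_entry;

static bool dma_debug_disabled(void)
{
    return global_disable || !dma_debug_initialized;
}

/* Hypothetical init: set up the list, then announce it is usable. */
static void dma_debug_init(void)
{
    bucket = &first_entry;
    dma_debug_initialized = true;
}

/* Returns the number of entries walked, or -1 when the walk is skipped. */
static int debug_dma_mapping_error(void)
{
    int n = 0;

    if (dma_debug_disabled())
        return -1;      /* early caller: nothing is set up yet */

    assert(bucket != NULL);
    for (struct entry *e = bucket; e; e = e->next)
        n++;
    return n;
}
```

A call made before dma_debug_init() returns early instead of touching
the list; the same call made afterwards walks it normally.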

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Horia Geanta <horia.geanta@freescale.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:02 -08:00
Florian Fainelli 01ce18b311 dma-debug: introduce dma_debug_disabled
Add a helper function which returns whether the DMA debugging API is
disabled.  Right now we only check for global_disable, but in order to
accommodate early callers of the DMA-API, we will check for more
initialization flags in the next patch.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Horia Geanta <horia.geanta@freescale.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:02 -08:00
Fabian Frederick 662e9b2b98 fs/cifs/smb2file.c: replace count*size kzalloc by kcalloc
kcalloc manages count*sizeof overflow.
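
For illustration, the kind of check that an array allocator performs and
an open-coded count*size multiplication does not can be sketched in user
space as follows.  checked_zalloc_array() is a hypothetical analogue, not
the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zeroed array of n elements, refusing a wrapped n * size. */
static void *checked_zalloc_array(size_t n, size_t size)
{
    void *p;

    if (size != 0 && n > SIZE_MAX / size)
        return NULL;            /* n * size would wrap around */

    p = malloc(n * size);
    if (p)
        memset(p, 0, n * size);
    return p;
}

/* Small self-check: a normal request works, a wrapping one is refused. */
static void checked_zalloc_selfcheck(void)
{
    int *v = checked_zalloc_array(4, sizeof(*v));

    assert(v != NULL && v[3] == 0);
    free(v);
    assert(checked_zalloc_array(SIZE_MAX / 2 + 1, 2) == NULL);
}
```

Without the check, the second request would multiply out to zero bytes
and the caller would later write past a far too small buffer.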

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:02 -08:00
Fabian Frederick 4b99d39b1b fs/cifs/file.c: replace count*size kzalloc by kcalloc
kcalloc manages count*sizeof overflow.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:02 -08:00
Fabian Frederick bc09d141eb fs/cifs: remove obsolete __constant
Replace all __constant_foo() with foo() except in smb2status.h (1700 lines
to update).

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steve French <sfrench@samba.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:02 -08:00
Joonsoo Kim 6b101e2a3c mm/CMA: fix boot regression due to physical address of high_memory
high_memory isn't direct mapped memory so retrieving its physical address
isn't appropriate.  But, it would be useful to check physical address of
highmem boundary so it's justifiable to get physical address from it.  In
x86, there is a validation check if CONFIG_DEBUG_VIRTUAL is enabled and it
triggers the following boot failure reported by Ingo.

  ...
  BUG: Int 6: CR2 00f06f53
  ...
  Call Trace:
    dump_stack+0x41/0x52
    early_idt_handler+0x6b/0x6b
    cma_declare_contiguous+0x33/0x212
    dma_contiguous_reserve_area+0x31/0x4e
    dma_contiguous_reserve+0x11d/0x125
    setup_arch+0x7b5/0xb63
    start_kernel+0xb8/0x3e6
    i386_start_kernel+0x79/0x7d

To fix the boot regression, this patch implements a workaround to avoid
the validation check in x86 when retrieving the physical address of
high_memory.  __pa_nodebug() used by this patch is implemented only in
x86 so there is no choice but to use a dirty #ifdef.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:02 -08:00
Linus Torvalds d82012695e Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more 2038 timer work from Thomas Gleixner:
 "Two more patches for the ongoing 2038 work:

   - New accessors to clock MONOTONIC and REALTIME seconds

  This is a separate branch as Arnd has follow up work depending on
  this"
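
As background on why second counts need to be 64 bits wide, here is a
small sketch of the limit of a signed 32-bit seconds field.  The helper
names are hypothetical and are not the accessors added by this branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can this second count be stored in a signed 32-bit seconds field? */
static bool fits_in_32bit_seconds(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* Narrow a 64-bit second count; only valid while it still fits. */
static int32_t to_32bit_seconds(int64_t secs)
{
    assert(fits_in_32bit_seconds(secs));    /* would wrap otherwise */
    return (int32_t)secs;
}

/* INT32_MAX seconds after the epoch is 2038-01-19 03:14:07 UTC. */
static const int64_t last_32bit_second = INT32_MAX;
```

One second past last_32bit_second no longer fits, which is the case a
64-bit seconds value avoids.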

* 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Provide y2038 safe accessor to the seconds portion of CLOCK_REALTIME
  timekeeping: Provide fast accessor to the seconds part of CLOCK_MONOTONIC
2014-12-10 10:13:28 -08:00
Linus Torvalds 3eb5b893eb Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space.  It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
2014-12-10 09:34:43 -08:00
Linus Torvalds 9e66645d72 Merge branch 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq domain updates from Thomas Gleixner:
 "The real interesting irq updates:

   - Support for hierarchical irq domains:

     For complex interrupt routing scenarios where more than one
     interrupt related chip is involved we had no proper representation
     in the generic interrupt infrastructure so far.  That made people
     implement rather ugly constructs in their nested irq chip
     implementations.  The main offenders are x86 and arm/gic.

     To disentangle that mess we have now hierarchical irqdomains which
     separate the various interrupt chips and connect them via the
     hierarchical domains.  That keeps the domain specific details
     internal to the particular hierarchy level and removes the
     criss/cross referencing of chip internals.  The resulting hierarchy
     for a complex x86 system will look like this:

        vector          mapped: 74
          msi-0         mapped: 2
          dmar-ir-1     mapped: 69
            ioapic-1    mapped: 4
            ioapic-0    mapped: 20
            pci-msi-2   mapped: 45
          dmar-ir-0     mapped: 3
            ioapic-2    mapped: 1
            pci-msi-1   mapped: 2
          htirq         mapped: 0

     Neither ioapic nor pci-msi know about the dmar interrupt remapping
     between themselves and the vector domain.  If interrupt remapping is
     disabled ioapic and pci-msi become direct children of the vector
     domain.

     In hindsight we should have done that years ago, but in hindsight
     we always know better :)

   - Support for generic MSI interrupt domain handling

     We have more and more non PCI related MSI interrupts, so providing
     a generic infrastructure for this is better than having all
     affected architectures implementing their own private hacks.

   - Support for PCI-MSI interrupt domain handling, based on the generic
     MSI support.

     This part carries the pci/msi branch from Bjorn Helgaas pci tree to
     avoid a massive conflict.  The PCI/MSI parts are acked by Bjorn.

  I have two more branches on top of this.  The full conversion of x86
  to hierarchical domains and a partial conversion of arm/gic"
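
The parent/child relationship in the hierarchy shown above can be
pictured with a toy data structure.  This is only a sketch: struct domain
and its helpers are hypothetical and much simpler than the kernel's irq
domain code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct domain {
    const char *name;
    const struct domain *parent;    /* NULL for the root domain */
};

/* With interrupt remapping: ioapic -> dmar-ir -> vector. */
static const struct domain vector    = { "vector", NULL };
static const struct domain dmar_ir_1 = { "dmar-ir-1", &vector };
static const struct domain ioapic_1  = { "ioapic-1", &dmar_ir_1 };

/* Without remapping the ioapic hangs directly off the vector domain. */
static const struct domain ioapic_direct = { "ioapic-1", &vector };

/* Number of parent links between d and the root of its hierarchy. */
static int domain_depth(const struct domain *d)
{
    int depth = 0;

    assert(d != NULL);
    while (d->parent) {
        d = d->parent;
        depth++;
    }
    return depth;
}

/* Follow the parent links up to the root domain. */
static const struct domain *domain_root(const struct domain *d)
{
    assert(d != NULL);
    while (d->parent)
        d = d->parent;
    return d;
}
```

Each level only holds a pointer to its parent, so the ioapic entry needs
no knowledge of whether a remapping level sits between it and the vector
domain.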

* 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  genirq: Move irq_chip_write_msi_msg() helper to core
  PCI/MSI: Allow an msi_controller to be associated to an irq domain
  PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain
  PCI/MSI: Enhance core to support hierarchy irqdomain
  PCI/MSI: Move cached entry functions to irq core
  genirq: Provide default callbacks for msi_domain_ops
  genirq: Introduce msi_domain_alloc/free_irqs()
  asm-generic: Add msi.h
  genirq: Add generic msi irq domain support
  genirq: Introduce callback irq_chip.irq_write_msi_msg
  genirq: Work around __irq_set_handler vs stacked domains ordering issues
  irqdomain: Introduce helper function irq_domain_add_hierarchy()
  irqdomain: Implement a method to automatically call parent domains alloc/free
  genirq: Introduce helper irq_domain_set_info() to reduce duplicated code
  genirq: Split out flow handler typedefs into seperate header file
  genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
  genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip
  genirq: Add more helper functions to support stacked irq_chip
  genirq: Introduce helper functions to support stacked irq_chip
  irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF
  ...
2014-12-10 09:01:01 -08:00
Linus Torvalds ecb50f0afd Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core updates from Thomas Gleixner:
 "This is the first (boring) part of irq updates:

   - support for big endian I/O accessors in the generic irq chip

   - cleanup of brcmstb/bcm7120 drivers so they can be reused for non
     ARM SoCs

   - the usual pile of fixes and updates for the various ARM irq chips"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  irqchip: dw-apb-ictl: Add PM support
  irqchip: dw-apb-ictl: Enable IRQ_GC_MASK_CACHE_PER_TYPE
  irqchip: dw-apb-ictl: Always use use {readl|writel}_relaxed
  ARM: orion: convert the irq_reg_{readl,writel} calls to the new API
  irqchip: atmel-aic: Add missing entry for rm9200 irq fixups
  irqchip: atmel-aic: Rename at91sam9_aic_irq_fixup for naming consistency
  irqchip: atmel-aic: Add specific irq fixup function for sam9g45 and sam9rl
  irqchip: atmel-aic: Add irq fixups for at91sam926x SoCs
  irqchip: atmel-aic: Add irq fixup for RTT block
  irqchip: brcmstb-l2: Convert driver to use irq_reg_{readl,writel}
  irqchip: bcm7120-l2: Convert driver to use irq_reg_{readl,writel}
  irqchip: bcm7120-l2: Decouple driver from brcmstb-l2
  irqchip: bcm7120-l2: Extend driver to support 64+ bit controllers
  irqchip: bcm7120-l2: Use gc->mask_cache to simplify suspend/resume functions
  irqchip: bcm7120-l2: Fix missing nibble in gc->unused mask
  irqchip: bcm7120-l2: Make sure all register accesses use base+offset
  irqchip: bcm7120-l2, brcmstb-l2: Remove ARM Kconfig dependency
  irqchip: bcm7120-l2: Eliminate bad IRQ check
  irqchip: brcmstb-l2: Eliminate dependency on ARM code
  genirq: Generic chip: Add big endian I/O accessors
  ...
2014-12-10 08:38:57 -08:00
Linus Torvalds a157508c97 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
 "The time(r) department provides:

   - more infrastructure work on the year 2038 issue

   - a few fixes in the Armada SoC timers

   - the usual pile of fixlets and improvements"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: armada-370-xp: Use the reference clock on A375 SoC
  watchdog: orion: Use the reference clock on Armada 375 SoC
  clocksource: armada-370-xp: Add missing clock enable
  time: Fix sign bug in NTP mult overflow warning
  time: Remove timekeeping_inject_sleeptime()
  rtc: Update suspend/resume timing to use 64bit time
  rtc/lib: Provide y2038 safe rtc_tm_to_time()/rtc_time_to_tm() replacement
  time: Fixup comments to reflect usage of timespec64
  time: Expose get_monotonic_coarse64() for in-kernel uses
  time: Expose getrawmonotonic64 for in-kernel uses
  time: Provide y2038 safe mktime() replacement
  time: Provide y2038 safe timekeeping_inject_sleeptime() replacement
  time: Provide y2038 safe do_settimeofday() replacement
  time: Complete NTP adjustment threshold judging conditions
  time: Avoid possible NTP adjustment mult overflow.
  time: Rename udelay_test.c to test_udelay.c
  clocksource: sirf: Remove hard-coded clock rate
2014-12-10 08:18:32 -08:00
Linus Torvalds 86c6a2fddf Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

     This instruments might_sleep() checks to catch places that nest
     blocking primitives - such as mutex usage in a wait loop.  Such
     bugs can result in hard to debug races/hangs.

     Another category of invalid nesting that this facility will detect
     is the calling of blocking functions from within schedule() ->
     sched_submit_work() -> blk_schedule_flush_plug().

     There's some potential for false positives (if secondary blocking
     primitives themselves are not ready yet for this facility), but the
     kernel will warn once about such bugs per bootup, so the warning
     isn't much of a nuisance.

     This feature comes with a number of fixes, for problems uncovered
     with it, so no messages are expected normally.

   - Another round of sched/numa optimizations and refinements, for
     CONFIG_NUMA_BALANCING=y.

   - Another round of sched/dl fixes and refinements.

  Plus various smaller fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched: Add missing rcu protection to wake_up_all_idle_cpus
  sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
  sched/numa: Init numa balancing fields of init_task
  sched/deadline: Remove unnecessary definitions in cpudeadline.h
  sched/cpupri: Remove unnecessary definitions in cpupri.h
  sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
  sched/fair: Fix stale overloaded status in the busiest group finding logic
  sched: Move p->nr_cpus_allowed check to select_task_rq()
  sched/completion: Document when to use wait_for_completion_io_*()
  sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
  sched/fair: Kill task_struct::numa_entry and numa_group::task_list
  sched: Refactor task_struct to use numa_faults instead of numa_* pointers
  sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
  sched/deadline: Reschedule from switched_from_dl() after a successful pull
  sched/deadline: Push task away if the deadline is equal to curr during wakeup
  sched/deadline: Add deadline rq status print
  sched/deadline: Fix artificial overrun introduced by yield_task_dl()
  sched/rt: Clean up check_preempt_equal_prio()
  sched/core: Use dl_bw_of() under rcu_read_lock_sched()
  sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
  ...
2014-12-09 21:21:34 -08:00
Linus Torvalds bee2782f30 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull leftover perf fixes from Ingo Molnar:
 "Two perf fixes left over from the previous cycle"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf session: Do not fail on processing out of order event
  x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
2014-12-09 21:18:06 -08:00
Linus Torvalds 5706ffd045 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events update from Ingo Molnar:
 "On the kernel side there are few changes; the one that stands out is
  PEBS machine state sampling support on x86, by Stephane Eranian.

  On the tooling side:

  User visible tooling changes:

   - Don't open the DWARF info multiple times, keeping instead a dwfl
     handle in struct dso, greatly speeding up 'perf report' on powerpc.
     (Sukadev Bhattiprolu)

   - Introduce PARSE_OPT_DISABLED option flag and use it to avoid
     showing undesired options in tools that provide frontends to
     'perf record', like sched, kvm, etc (Namhyung Kim)

   - Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo
     Carvalho de Melo)

   - Fix annotation with kcore (Adrian Hunter)

   - Support source line numbers in annotate using a hotkey (Andi Kleen)

   - Callchain improvements including:
     * Enable printing the srcline in the history
     * Make get_srcline fall back to sym+offset (Andi Kleen)

   - TUI hist_entry browser fixes, including showing missing overhead
     value for first level callchain.  Detected comparing the output of
     --stdio/--gui (that matched) with --tui, that had this problem.
     (Namhyung Kim)

   - Support handling complete branch stacks as histograms (Andi Kleen)

  Tooling infrastructure changes:

   - Prep work for supporting per-pkg and snapshot counters in 'perf
     stat' (Jiri Olsa)

   - 'perf stat' refactorings, moving stuff from it to evsel.c to use in
     per-pkg/snapshot format changes (Jiri Olsa)

   - Add per-pkg format file parsing (Matt Fleming)

   - Clean up libelf feature support code (Namhyung Kim)

   - Add gzip decompression support for kernel modules (Namhyung Kim)

   - More prep patches for Intel PT, including a thread stack and more
     stuff made available via the database export mechanism (Adrian
     Hunter)

   - More Intel PT work, including a facility to export sample data
     (comms, threads, symbol names, etc) in a database friendly way,
     with a script to use this to create a postgresql database.
     (Adrian Hunter)

   - Make sure that thread->mg->machine points to the machine where the
     thread exists (it was being set only for the kmaps kernel modules
     case, do it as well for the mmaps) and use it to shorten function
     signatures (Arnaldo Carvalho de Melo)

  ... and lots of other fixes and smaller improvements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
  perf report: In branch stack mode use address history sorting
  perf report: Add --branch-history option
  perf callchain: Support handling complete branch stacks as histograms
  perf stat: Add support for snapshot counters
  perf stat: Add support for per-pkg counters
  perf tools: Remove perf_evsel__read interface
  perf stat: Use read_counter in read_counter_aggr
  perf stat: Make read_counter work over the thread dimension
  perf stat: Use perf_evsel__read_cb in read_counter
  perf tools: Add snapshot format file parsing
  perf tools: Add per-pkg format file parsing
  perf evsel: Introduce perf_evsel__read_cb function
  perf evsel: Introduce perf_counts_values__scale function
  perf evsel: Introduce perf_evsel__compute_deltas function
  perf tools: Allow to force redirect pr_debug to stderr.
  perf tools: Fix segfault due to invalid kernel dso access
  perf callchain: Make get_srcline fall back to sym+offset
  perf symbols: Move bfd_demangle stubbing to its only user
  perf callchain: Enable printing the srcline in the history
  perf tools: Collapse first level callchain entry if it has sibling
  ...
2014-12-09 20:55:37 -08:00