1
0
Fork 0
Commit Graph

4469 Commits (64ec72a1ece37d9bc7ba8b11d6091ce7cb1d8eec)

Author SHA1 Message Date
Matthew Wilcox 0ac398ef39 radix-tree: Add radix_tree_iter_delete
Factor the deletion code out into __radix_tree_delete() and provide a
nice iterator-based wrapper around it.  If we free the node, advance
the iterator to avoid reading from freed memory.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13 16:09:55 -05:00
Matthew Wilcox 30b888ba95 radix-tree: Add radix_tree_iter_tag_clear()
The counterpart to radix_tree_iter_tag_set(), used by the IDR code

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Rehas Sachdeva <aquannie@gmail.com>
2017-02-13 16:09:44 -05:00
Song Liu 10257d7196 EXPORT_SYMBOL radix_tree_replace_slot
It will be used in drivers/md/raid5-cache.c

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2017-02-13 09:17:51 -08:00
Kees Cook dfb4357da6 time: Remove CONFIG_TIMER_STATS
Currently CONFIG_TIMER_STATS exposes process information across namespaces:

kernel/time/timer_list.c print_timer():

        SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);

/proc/timer_list:

 #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570

Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Xing Gao <xgao01@email.wm.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jessica Frazelle <me@jessfraz.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10 11:15:08 +01:00
Waiman Long 0cad93c345 debugobjects: Improve variable naming
As suggested by Ingo, the debug_objects_alloc counter is now renamed to
debug_objects_allocated with minor twist in comment and debug output.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1486503630-1501-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10 09:53:04 +01:00
Peter Zijlstra f405df5de3 refcount_t: Introduce a special purpose refcount type
Provide refcount_t, an atomic_t like primitive built just for
refcounting.

It provides saturation semantics such that overflow becomes impossible
and thereby 'spurious' use-after-free is avoided.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10 09:04:19 +01:00
Sergey Senozhatsky f92bac3b14 printk: rename nmi.c and exported api
A preparation patch for printk_safe work. No functional change.
- rename nmi.c to print_safe.c
- add `printk_safe' prefix to some (which used both by printk-safe
  and printk-nmi) of the exported functions.

Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Calvin Owens <calvinowens@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-02-08 11:02:33 +01:00
Greg Kroah-Hartman 17fa87fe5a Merge 4.10-rc7 into char-misc-next
We want the hv and other fixes in here as well to handle merge and
testing issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-06 09:39:13 +01:00
Waiman Long 858274b6a1 debugobjects: Reduce contention on the global pool_lock
On a large SMP system with many CPUs, the global pool_lock may become
a performance bottleneck as all the CPUs that need to allocate or
free debug objects have to take the lock. That can sometimes cause
soft lockups like:

 NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21]
 ...
 RIP: 0010:[<ffffffff817c216b>]  [<ffffffff817c216b>]
	_raw_spin_unlock_irqrestore+0x3b/0x60
 ...
 Call Trace:
  [<ffffffff813f40d1>] free_object+0x81/0xb0
  [<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220
  [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
  [<ffffffff81284996>] ? file_free_rcu+0x36/0x60
  [<ffffffff81251712>] kmem_cache_free+0xd2/0x380
  [<ffffffff81284960>] ? fput+0x90/0x90
  [<ffffffff81284996>] file_free_rcu+0x36/0x60
  [<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550
  [<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550
  [<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50
  [<ffffffff810c59d1>] kthread+0x101/0x120
  [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
  [<ffffffff817c2d32>] ret_from_fork+0x22/0x50

To reduce the amount of contention on the pool_lock, the actual
kmem_cache_free() of the debug objects will be delayed if the pool_lock
is busy. This will temporarily increase the amount of free objects
available at the free pool when the system is busy. As a result,
the number of kmem_cache allocation and freeing is reduced.

To further reduce the lock operations free debug objects in batches of
four.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-4-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-05 17:09:32 +01:00
Waiman Long 97dd552eb2 debugobjects: Scale thresholds with # of CPUs
On a large SMP systems with hundreds of CPUs, the current thresholds
for allocating and freeing debug objects (256 and 1024 respectively)
may not work well. This can cause a lot of needless calls to
kmem_aloc() and kmem_free() on those systems.

To alleviate this thrashing problem, the object freeing threshold
is now increased to "1024 + # of CPUs * 32". Whereas the object
allocation threshold is increased to "256 + # of CPUs * 4". That
should make the debug objects subsystem scale better with the number
of CPUs available in the system.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-3-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04 09:01:55 +01:00
Waiman Long c4b73aabd0 debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
New debugfs stat counters are added to track the numbers of
kmem_cache_alloc() and kmem_cache_free() function calls to get a
sense of how the internal debug objects cache management is performing.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-2-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04 09:01:54 +01:00
Jiri Pirko 44091d29f2 lib: Introduce priority array area manager
This introduces a infrastructure for management of linear priority
areas. Priority order in an array matters, however order of items inside
a priority group does not matter.

As an initial implementation, L-sort algorithm is used. It is quite
trivial. More advanced algorithm called P-sort will be introduced as a
follow-up. The infrastructure is prepared for other algos.

Alongside this, a testing module is introduced as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:42 -05:00
Jason A. Donenfeld 1c83a9aab8 ext4: move halfmd4 into hash.c directly
The "half md4" transform should not be used by any new code. And
fortunately, it's only used now by ext4. Since ext4 supports several
hashing methods, at some point it might be desirable to move to
something like SipHash. As an intermediate step, remove half md4 from
cryptohash.h and lib, and make it just a local function in ext4's
hash.c. There's precedent for doing this; the other function ext can use
for its hashes -- TEA -- is also implemented in the same place. Also, by
being a local function, this might allow gcc to perform some additional
optimizations.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-02 11:52:14 -05:00
Dave Airlie 012bbe28c0 Merge tag 'drm-misc-next-2017-01-30' of git://anongit.freedesktop.org/git/drm-misc into drm-next
Another round of -misc stuff:
- Noralf debugfs cleanup cleanup (not yet everything, some more driver
  patches awaiting acks).
- More doc work.
- edid/infoframe fixes from Ville.
- misc 1-patch fixes all over, as usual

Noralf needs this for his tinydrm pull request.

* tag 'drm-misc-next-2017-01-30' of git://anongit.freedesktop.org/git/drm-misc: (48 commits)
  drm/vc4: Remove vc4_debugfs_cleanup()
  dma/fence: Export enable-signaling tracepoint for emission by drivers
  drm/tilcdc: Remove tilcdc_debugfs_cleanup()
  drm/tegra: Remove tegra_debugfs_cleanup()
  drm/sti: Remove drm_debugfs_remove_files() calls
  drm/radeon: Remove drm_debugfs_remove_files() call
  drm/omap: Remove omap_debugfs_cleanup()
  drm/hdlcd: Remove hdlcd_debugfs_cleanup()
  drm/etnaviv: Remove etnaviv_debugfs_cleanup()
  drm/etnaviv: allow build with COMPILE_TEST
  drm/amd/amdgpu: Remove drm_debugfs_remove_files() call
  drm/prime: Clarify DMA-BUF/GEM Object lifetime
  drm/ttm: Make sure BOs being swapped out are cacheable
  drm/atomic: Remove drm_atomic_debugfs_cleanup()
  drm: drm_minor_register(): Clean up debugfs on failure
  drm: debugfs: Remove all files automatically on cleanup
  drm/fourcc: add vivante tiled layout format modifiers
  drm/edid: Set YQ bits in the AVI infoframe according to CEA-861-F
  drm/edid: Set AVI infoframe Q even when QS=0
  drm/edid: Introduce drm_hdmi_avi_infoframe_quant_range()
  ...
2017-02-01 08:31:09 +10:00
Ingo Molnar a8709fa4a0 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

 - Dynticks updates, consolidating open-coded counter accesses into a well-defined API

 - SRCU updates: Simplify algorithm, add formal verification

 - Documentation updates

 - Miscellaneous fixes

 - Torture-test updates

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-31 07:45:42 +01:00
David S. Miller 4e8f2fc1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two trivial overlapping changes conflicts in MPLS and mlx5.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28 10:33:06 -05:00
Matthew Wilcox 35534c869c radix tree: constify some pointers
If we're just getting the value of a tag, or looking up an entry,
we won't modify the radix tree, so we can declare these functions as
taking a const pointer.  Mostly for documentation purposes, though it
might help code generation.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-01-27 21:29:38 -05:00
Omar Sandoval 24af1ccfe1 sbitmap: add helpers for dumping to a seq_file
This is useful debugging information that will be used in the blk-mq
debugfs directory.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>

Changed 'weight' to 'busy'.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27 08:17:44 -07:00
Dave Airlie b0df0b251b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
Backmerge Linus master to get the connector locking revert.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux: (645 commits)
  sysctl: fix proc_doulongvec_ms_jiffies_minmax()
  Revert "drm/probe-helpers: Drop locking from poll_enable"
  MAINTAINERS: add Dan Streetman to zbud maintainers
  MAINTAINERS: add Dan Streetman to zswap maintainers
  mm: do not export ioremap_page_range symbol for external module
  mn10300: fix build error of missing fpu_save()
  romfs: use different way to generate fsid for BLOCK or MTD
  frv: add missing atomic64 operations
  mm, page_alloc: fix premature OOM when racing with cpuset mems update
  mm, page_alloc: move cpuset seqcount checking to slowpath
  mm, page_alloc: fix fast-path race with cpuset update or removal
  mm, page_alloc: fix check for NULL preferred_zone
  kernel/panic.c: add missing \n
  fbdev: color map copying bounds checking
  frv: add atomic64_add_unless()
  mm/mempolicy.c: do not put mempolicy before using its nodemask
  radix-tree: fix private list warnings
  Documentation/filesystems/proc.txt: add VmPin
  mm, memcg: do not retry precharge charges
  proc: add a schedule point in proc_pid_readdir()
  ...
2017-01-27 11:00:42 +10:00
Luis R. Rodriguez 061132d2b9 test_firmware: add test custom fallback trigger
We have no custom fallback mechanism test interface. Provide one.
This tests both the custom fallback mechanism and cancelling the
it.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-25 11:52:34 +01:00
Luis R. Rodriguez 083a93b0c1 test_firmware: use device attribute groups
This simplifies init and exit.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-25 11:52:34 +01:00
Luis R. Rodriguez 67fd553ce0 test_firmware: move misc_device down
This will make further changes easier to review.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-25 11:52:34 +01:00
zhong jiang 3277953de2 mm: do not export ioremap_page_range symbol for external module
Recently, I've found cases in which ioremap_page_range was used
incorrectly, in external modules, leading to crashes.  This can be
partly attributed to the fact that ioremap_page_range is lower-level,
with fewer protections, as compared to the other functions that an
external module would typically call.  Those include:

     ioremap_cache
     ioremap_nocache
     ioremap_prot
     ioremap_uc
     ioremap_wc
     ioremap_wt

...each of which wraps __ioremap_caller, which in turn provides a safer
way to achieve the mapping.

Therefore, stop EXPORT-ing ioremap_page_range.

Link: http://lkml.kernel.org/r/1485173220-29010-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
Matthew Wilcox dd040b6f6d radix-tree: fix private list warnings
The newly introduced warning in radix_tree_free_nodes() was testing the
wrong variable; it should have been 'old' instead of 'node'.

Fixes: ea07b862ac ("mm: workingset: fix use-after-free in shadow node shrinker")
Link: http://lkml.kernel.org/r/20170118163746.GA32495@cmpxchg.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
Bart Van Assche 551199aca1 lib/dma-virt: Add dma_virt_ops
Several RDMA drivers (hfi1, qib and rxe) expect that ib_sge.addr
is a virtual address. Provide DMA mapping operations that are
suitable for these drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche 7844572c63 lib/dma-noop: Only build dma_noop_ops for s390 and m32r
Reduce the kernel size by only building dma_noop_ops for those
architectures that actually use it. This was suggested by
Christoph Hellwig.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche 1eec9e2bef lib/dma-noop: Clarify a comment
The next patch in this series will introduce another set of DMA
operations that map 1:1 with memory. Clarify that dma-noop maps
to physical addresses.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche 5299709d0a treewide: Constify most dma_map_ops structures
Most dma_map_ops structures are never modified. Constify these
structures such that these can be write-protected. This patch
has been generated as follows:

git grep -l 'struct dma_map_ops' |
  xargs -d\\n sed -i \
    -e 's/struct dma_map_ops/const struct dma_map_ops/g' \
    -e 's/const struct dma_map_ops {/struct dma_map_ops {/g' \
    -e 's/^const struct dma_map_ops;$/struct dma_map_ops;/' \
    -e 's/const const struct dma_map_ops /const struct dma_map_ops /g';
sed -i -e 's/const \(struct dma_map_ops intel_dma_ops\)/\1/' \
  $(git grep -l 'struct dma_map_ops intel_dma_ops');
sed -i -e 's/const \(struct dma_map_ops dma_iommu_ops\)/\1/' \
  $(git grep -l 'struct dma_map_ops' | grep ^arch/powerpc);
sed -i -e '/^struct vmd_dev {$/,/^};$/ s/const \(struct dma_map_ops[[:blank:]]dma_ops;\)/\1/' \
       -e '/^static void vmd_setup_dma_ops/,/^}$/ s/const \(struct dma_map_ops \*dest\)/\1/' \
       -e 's/const \(struct dma_map_ops \*dest = \&vmd->dma_ops\)/\1/' \
    drivers/pci/host/*.c
sed -i -e '/^void __init pci_iommu_alloc(void)$/,/^}$/ s/dma_ops->/intel_dma_ops./' arch/ia64/kernel/pci-dma.c
sed -i -e 's/static const struct dma_map_ops sn_dma_ops/static struct dma_map_ops sn_dma_ops/' arch/ia64/sn/pci/pci_dma.c
sed -i -e 's/(const struct dma_map_ops \*)//' drivers/misc/mic/bus/vop_bus.c

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: x86@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Matt Fleming 961518259b rcu: Enable RCU tracepoints by default to aid in debugging
While debugging a performance issue I needed to understand why
RCU sofitrqs were firing so frequently.

Unfortunately, the RCU callback tracepoints are hidden behind
CONFIG_RCU_TRACE which defaults to off in the upstream kernel and is
likely to also be disabled in enterprise distribution configs.

Enable it by default for CONFIG_TREE_RCU. However, we must keep it
disabled for tiny RCU, because it would otherwise pull in a large
amount of code that would make tiny RCU less than tiny.

I ran some file system metadata intensive workloads (git checkout,
FS-Mark) on a variety of machines with this patch and saw no
detectable change in performance.

Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:13 -08:00
Chris Wilson 717c8ae7aa lib/prime_numbers: Suppress warn on kmalloc failure
The allocation for the bitmap may become very large, larger than
MAX_ORDER, for large requests. We fail gracefully by falling back to
trail-division, so disable the warning from kmalloc:

  521.961092] WARNING: CPU: 0 PID: 30637 at mm/page_alloc.c:3548 __alloc_pages_slowpath+0x237/0x9a0
[  521.961105] Modules linked in: i915(+) drm_kms_helper intel_gtt prime_numbers [last unloaded: drm_kms_helper]
[  521.961126] CPU: 0 PID: 30637 Comm: drv_selftest Tainted: G     U  W       4.10.0-rc3+ #321
[  521.961137] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[  521.961148] Call Trace:
[  521.961161]  dump_stack+0x4d/0x6f
[  521.961172]  __warn+0xc1/0xe0
[  521.961181]  warn_slowpath_null+0x18/0x20
[  521.961189]  __alloc_pages_slowpath+0x237/0x9a0
[  521.961200]  ? sg_init_table+0x1a/0x40
[  521.961208]  ? get_page_from_freelist+0x3fa/0x910
[  521.961275]  ? i915_gem_object_get_sg+0x272/0x2b0 [i915]
[  521.961285]  __alloc_pages_nodemask+0x1ea/0x220
[  521.961295]  kmalloc_order+0x1c/0x50
[  521.961304]  __kmalloc+0x115/0x170
[  521.961314]  expand_to_next_prime+0x43/0x180 [prime_numbers]
[  521.961324]  next_prime_number+0x47/0xc0 [prime_numbers]
[  521.961377]  igt_vma_rotate+0x386/0x590 [i915]
[  521.961429]  i915_subtests+0x37/0xc0 [i915]
[  521.961481]  i915_vma_mock_selftests+0x3d/0x70 [i915]
[  521.961532]  run_selftests+0x16e/0x1f0 [i915]
[  521.961541]  ? 0xffffffffa02a4000
[  521.961592]  i915_mock_selftests+0x29/0x40 [i915]
[  521.961638]  i915_init+0xa/0x5e [i915]
[  521.961646]  ? 0xffffffffa02a4000
[  521.961655]  do_one_initcall+0x3e/0x160
[  521.961664]  ? __vunmap+0x7c/0xc0
[  521.961672]  ? vfree+0x29/0x70
[  521.961680]  ? kmem_cache_alloc+0xcf/0x120
[  521.961690]  do_init_module+0x55/0x1c4
[  521.961699]  load_module+0x1f3f/0x25b0
[  521.961707]  ? __symbol_put+0x40/0x40
[  521.961716]  ? kernel_read_file+0x100/0x190
[  521.961725]  SYSC_finit_module+0xbc/0xf0
[  521.961734]  SyS_finit_module+0x9/0x10
[  521.961744]  entry_SYSCALL_64_fastpath+0x17/0x98
[  521.961752] RIP: 0033:0x7f111aca4119
[  521.961760] RSP: 002b:00007ffd8be6cbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  521.961773] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f111aca4119
[  521.961781] RDX: 0000000000000000 RSI: 000055dfc18bc8e0 RDI: 0000000000000006
[  521.961789] RBP: 00007ffd8be6bbe0 R08: 0000000000000000 R09: 0000000000000000
[  521.961796] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000005
[  521.961805] R13: 000055dfc18bd3a0 R14: 00007ffd8be6bbc0 R15: 0000000000000005

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170113235119.22528-1-chris@chris-wilson.co.uk
2017-01-23 09:17:12 +01:00
Eric Dumazet aaf0f2fa68 percpu_counter: percpu_counter_hotcpu_callback() cleanup
In commit ebd8fef304 ("percpu_counter: make percpu_counters_lock
irq-safe") we disabled irqs in percpu_counter_hotcpu_callback()

We can grab every counter spinlock without having to disable
irqs again.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-01-20 10:06:56 -05:00
Geliang Tang d852d39432 timerqueue: Use rb_entry_safe() instead of open-coding it
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/0d5cf199ac43792df0b6f7e2145545c30fa1dbbe.1482222135.git.geliangtang@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-20 08:03:42 +01:00
Omar Sandoval 6c0ca7ae29 sbitmap: fix wakeup hang after sbq resize
When we resize a struct sbitmap_queue, we update the wakeup batch size,
but we don't update the wait count in the struct sbq_wait_states. If we
resized down from a size which could use a bigger batch size, these
counts could be too large and cause us to miss necessary wakeups. To fix
this, update the wait counts when we resize (ensuring some careful
memory ordering so that it's safe w.r.t. concurrent clears).

This also fixes a theoretical issue where two threads could end up
bumping the wait count up by the batch size, which could also
potentially lead to hangs.

Reported-by: Martin Raiber <martin@urbackup.org>
Fixes: e3a2b3f931 ("blk-mq: allow changing of queue depth through sysfs")
Fixes: 2971c35f35 ("blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-18 13:41:55 -07:00
Omar Sandoval f66227de59 sbitmap: use smp_mb__after_atomic() in sbq_wake_up()
We always do an atomic clear_bit() right before we call sbq_wake_up(),
so we can use smp_mb__after_atomic(). While we're here, comment the
memory barriers in here a little more.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-18 13:41:49 -07:00
David S. Miller 580bdf5650 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-17 15:19:37 -05:00
Linus Torvalds 203f80f1c4 Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fix from Konrad Rzeszutek Wilk:
 "A tiny fix to make sure that page-sized mappings are page-aligned (and
  not say straddle two pages). This is important for some drivers (such
  as NVME)"

* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: ensure that page-sized mappings are page-aligned
2017-01-17 09:27:50 -08:00
Greg Kroah-Hartman 11cca3d12f Merge 4.10-rc4 into tty-next
We want the serial/tty fixes in here as well to build on top of.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-16 16:57:54 +01:00
Nikita Yushchenko 602d9858f0 swiotlb: ensure that page-sized mappings are page-aligned
Some drivers do depend on page mappings to be page aligned.

Swiotlb already enforces such alignment for mappings greater than page,
extend that to page-sized mappings as well.

Without this fix, nvme hits BUG() in nvme_setup_prps(), because that routine
assumes page-aligned mappings.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2017-01-15 12:37:24 -05:00
Linus Torvalds f4d3935e4f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

The most notable fix here is probably the fix for a splice regression
("fix a fencepost error in pipe_advance()") noticed by Alan Wylie.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix a fencepost error in pipe_advance()
  coredump: Ensure proper size of sparse core files
  aio: fix lock dep warning
  tmpfs: clear S_ISGID when setting posix ACLs
2017-01-14 17:13:28 -08:00
Al Viro b9dc6f65bc fix a fencepost error in pipe_advance()
The logics in pipe_advance() used to release all buffers past the new
position failed in cases when the number of buffers to release was equal
to pipe->buffers.  If that happened, none of them had been released,
leaving pipe full.  Worse, it was trivial to trigger and we end up with
pipe full of uninitialized pages.  IOW, it's an infoleak.

Cc: stable@vger.kernel.org # v4.9
Reported-by: "Alan J. Wylie" <alan@wylie.me.uk>
Tested-by: "Alan J. Wylie" <alan@wylie.me.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-14 19:50:41 -05:00
Chris Wilson f2a5fec173 locking/ww_mutex: Begin kselftests for ww_mutex
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Nicolai Hähnle <nhaehnle@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161201114711.28697-4-chris@chris-wilson.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:37:14 +01:00
Will Deacon 42d1a731ff Merge branch 'aarch64/for-next/debug-virtual' into aarch64/for-next/core
Merge core DEBUG_VIRTUAL changes from Laura Abbott. Later arm and arm64
support depends on these.

* aarch64/for-next/debug-virtual:
  drivers: firmware: psci: Use __pa_symbol for kernel symbol
  mm/usercopy: Switch to using lm_alias
  mm/kasan: Switch to using __pa_symbol and lm_alias
  kexec: Switch to __pa_symbol
  mm: Introduce lm_alias
  mm/cma: Cleanup highmem check
  lib/Kconfig.debug: Add ARCH_HAS_DEBUG_VIRTUAL
2017-01-12 15:04:29 +00:00
Felix Fietkau 732dbf3a61 serial: do not accept sysrq characters via serial port
many embedded boards have a disconnected TTL level serial which can
generate some garbage that can lead to spurious false sysrq detects.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-12 11:51:24 +01:00
David S. Miller 02ac5d1487 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two AF_* families adding entries to the lockdep tables
at the same time.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 14:43:39 -05:00
Laura Abbott fa5b6ec9e5 lib/Kconfig.debug: Add ARCH_HAS_DEBUG_VIRTUAL
DEBUG_VIRTUAL currently depends on DEBUG_KERNEL && X86. arm64 is getting
the same support. Rather than add a list of architectures, switch this
to ARCH_HAS_DEBUG_VIRTUAL and let architectures select it as
appropriate.

Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-11 13:56:49 +00:00
Sudip Mukherjee da0510c475 lib/Kconfig.debug: fix frv build failure
The build of frv allmodconfig was failing with the errors like:

  /tmp/cc0JSPc3.s: Assembler messages:
  /tmp/cc0JSPc3.s:1839: Error: symbol `.LSLT0' is already defined
  /tmp/cc0JSPc3.s:1842: Error: symbol `.LASLTP0' is already defined
  /tmp/cc0JSPc3.s:1969: Error: symbol `.LELTP0' is already defined
  /tmp/cc0JSPc3.s:1970: Error: symbol `.LELT0' is already defined

Commit 866ced950b ("kbuild: Support split debug info v4") introduced
splitting the debug info and keeping that in a separate file.  Somehow,
the frv-linux gcc did not like that and I am guessing that instead of
splitting it started copying.  The first report about this is at:

  https://lists.01.org/pipermail/kbuild-all/2015-July/010527.html.

I will try and see if this can work with frv and if still fails I will
open a bug report with gcc.  But meanwhile this is the easiest option to
solve build failure of frv.

Fixes: 866ced950b ("kbuild: Support split debug info v4")
Link: http://lkml.kernel.org/r/1482062348-5352-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
David S. Miller bb1d303444 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-09 15:39:11 -05:00
Jason A. Donenfeld 1ae2324f73 siphash: implement HalfSipHash1-3 for hash tables
HalfSipHash, or hsiphash, is a shortened version of SipHash, which
generates 32-bit outputs using a weaker 64-bit key. It has *much* lower
security margins, and shouldn't be used for anything too sensitive, but
it could be used as a hashtable key function replacement, if the output
is never exposed, and if the security requirement is not too high.

The goal is to make this something that performance-critical jhash users
would be willing to use.

On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so we alias
SipHash1-3 to HalfSipHash1-3 on those systems.

64-bit x86_64:
[    0.509409] test_siphash:     SipHash2-4 cycles: 4049181
[    0.510650] test_siphash:     SipHash1-3 cycles: 2512884
[    0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920
[    0.512904] test_siphash:    JenkinsHash cycles:  978267
So, we map hsiphash() -> SipHash1-3

32-bit x86:
[    0.509868] test_siphash:     SipHash2-4 cycles: 14812892
[    0.513601] test_siphash:     SipHash1-3 cycles:  9510710
[    0.515263] test_siphash: HalfSipHash1-3 cycles:  3856157
[    0.515952] test_siphash:    JenkinsHash cycles:  1148567
So, we map hsiphash() -> HalfSipHash1-3

hsiphash() is roughly 3 times slower than jhash(), but comes with a
considerable security improvement.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 13:58:57 -05:00
Jason A. Donenfeld 2c956a6077 siphash: add cryptographically secure PRF
SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function, or as a
general PRF for short input use cases, such as sequence numbers or RNG
chaining.

For the first usage:

There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then preceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector. Currently
hashtables use jhash, which is fast but not secure, and some kind of
rotating key scheme (or none at all, which isn't good). SipHash is meant
as a replacement for jhash in these cases.

There are a modicum of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

While SipHash is extremely fast for a cryptographically secure function,
it is likely a bit slower than the insecure jhash, and so replacements
will be evaluated on a case-by-case basis based on whether or not the
difference in speed is negligible and whether or not the current jhash usage
poses a real security risk.

For the second usage:

A few places in the kernel are using MD5 or SHA1 for creating secure
sequence numbers, syn cookies, port numbers, or fast random numbers.
SipHash is a faster and more fitting, and more secure replacement for MD5
in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
obvious and straight-forward, and so is submitted along with this patch
series. There shouldn't be much of a debate over its efficacy.

Dozens of languages are already using this internally for their hash
tables and PRFs. Some of the BSDs already use this in their kernels.
SipHash is a widely known high-speed solution to a widely known set of
problems, and it's time we catch-up.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 13:58:57 -05:00
Dave Airlie 3806a271bf Merge tag 'drm-misc-next-2016-12-30' of git://anongit.freedesktop.org/git/drm-misc into drm-next
First -misc pull for 4.11:
- drm_mm rework + lots of selftests (Chris Wilson)
- new connector_list locking+iterators
- plenty of kerneldoc updates
- format handling rework from Ville
- atomic helper changes from Maarten for better plane corner-case handling
  in drivers, plus the i915 legacy cursor patch that needs this
- bridge cleanup from Laurent
- plus plenty of small stuff all over
- also contains a merge of the 4.10 docs tree so that we could apply the
  dma-buf kerneldoc patches

It's a lot more than usual, but due to the merge window blackout it also
covers about 4 weeks, so all in line again on a per-week basis. The more
annoying part with no pull request for 4 weeks is managing cross-tree
work. The -intel pull request I'll follow up with does conflict quite a
bit with -misc here. Longer-term (if drm-misc keeps growing) a
drm-next-queued to accept pull request for the next merge window during
this time might be useful.

I'd also like to backmerge -rc2+this into drm-intel next week, we have
quite a pile of patches waiting for the stuff in here.

* tag 'drm-misc-next-2016-12-30' of git://anongit.freedesktop.org/git/drm-misc: (126 commits)
  drm: Add kerneldoc markup for new @scan parameters in drm_mm
  drm/mm: Document locking rules
  drm: Use drm_mm_insert_node_in_range_generic() for everyone
  drm: Apply range restriction after color adjustment when allocation
  drm: Wrap drm_mm_node.hole_follows
  drm: Apply tight eviction scanning to color_adjust
  drm: Simplify drm_mm scan-list manipulation
  drm: Optimise power-of-two alignments in drm_mm_scan_add_block()
  drm: Compute tight evictions for drm_mm_scan
  drm: Fix application of color vs range restriction when scanning drm_mm
  drm: Unconditionally do the range check in drm_mm_scan_add_block()
  drm: Rename prev_node to hole in drm_mm_scan_add_block()
  drm: Fix O= out-of-tree builds for selftests
  drm: Extract struct drm_mm_scan from struct drm_mm
  drm: Add asserts to catch overflow in drm_mm_init() and drm_mm_init_scan()
  drm: Simplify drm_mm_clean()
  drm: Detect overflow in drm_mm_reserve_node()
  drm: Fix kerneldoc for drm_mm_scan_remove_block()
  drm: Promote drm_mm alignment to u64
  drm: kselftest for drm_mm and restricted color eviction
  ...
2017-01-09 09:55:57 +10:00
Johannes Weiner ea07b862ac mm: workingset: fix use-after-free in shadow node shrinker
Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked like
use-after-free access from the shadow node shrinker.

Dave Jones managed to reproduce the issue with a debug patch applied,
which confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:

  WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
  CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
  Call Trace:
     delete_node+0x1e4/0x200
     __radix_tree_delete_node+0xd/0x10
     shadow_lru_isolate+0xe6/0x220
     __list_lru_walk_one.isra.4+0x9b/0x190
     list_lru_walk_one+0x23/0x30
     scan_shadow_nodes+0x2e/0x40
     shrink_slab.part.44+0x23d/0x5d0
     shrink_node+0x22c/0x330
     kswapd+0x392/0x8f0

This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
inlined radix_tree_shrink().

The problem is with 14b468791f ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node.

While the reclaimed shadow node itself is unlinked by the shrinker, its
deletion from the tree can cause the left-most leaf node in the tree to
be shrunk.  If that happens to be a shadow node as well, we don't unlink
it from the LRU as we should.

Consider this tree, where the s are shadow entries:

       root->rnode
            |
       [0       n]
        |       |
     [s    ] [sssss]

Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:

       root->rnode
            |
       [0        ]
        |
    [s     ]

Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:

       root->rnode
            |
       [s        ]

The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root->rnode directly and free
the node - which is a leaf node on the shadow node LRU.

  root->rnode
       |
       s

Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.

Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.

Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.

Fixes: 14b468791f ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner <david@fromorbit.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Leech <cleech@redhat.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-07 18:22:40 -08:00
Linus Torvalds 2fd8774c79 Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
 "This has one fix to make i915 work when using Xen SWIOTLB, and a
  feature from Geert to aid in debugging of devices that can't do DMA
  outside the 32-bit address space.

  The feature from Geert is on top of v4.10 merge window commit
  (specifically you pulling my previous branch), as his changes were
  dependent on the Documentation/ movement patches.

  I figured it would just easier than me trying than to cherry-pick the
  Documentation patches to satisfy git.

  The patches have been soaking since 12/20, albeit I updated the last
  patch due to linux-next catching an compiler error and adding an
  Tested-and-Reported-by tag"

* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: Export swiotlb_max_segment to users
  swiotlb: Add swiotlb=noforce debug option
  swiotlb: Convert swiotlb_force from int to enum
  x86, swiotlb: Simplify pci_swiotlb_detect_override()
2017-01-06 10:53:21 -08:00
Konrad Rzeszutek Wilk 7453c549f5 swiotlb: Export swiotlb_max_segment to users
So they can figure out what is the optimal number of pages
that can be contingously stitched together without fear of
bounce buffer.

We also expose an mechanism for sub-users of SWIOTLB API, such
as Xen-SWIOTLB to set the max segment value. And lastly
if swiotlb=force is set (which mandates we bounce buffer everything)
we set max_segment so at least we can bounce buffer one 4K page
instead of a giant 512KB one for which we may not have space.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-Tested-by: Juergen Gross <jgross@suse.com>
2017-01-06 13:00:01 -05:00
Chris Wilson cf4a7207b1 lib: Add a simple prime number generator
Prime numbers are interesting for testing components that use multiplies
and divides, such as testing DRM's struct drm_mm alignment computations.

v2: Move to lib/, add selftest
v3: Fix initial constants (exclude 0/1 from being primes)
v4: More RCU markup to keep 0day/sparse happy
v5: Fix RCU unwind on module exit, add to kselftests
v6: Tidy computation of bitmap size
v7: for_each_prime_number_from()
v8: Compose small-primes using BIT() for easier verification
v9: Move rcu dance entirely into callers.
v10: Improve quote for Betrand's Postulate (aka Chebyshev's theorem)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222144514.3911-1-chris@chris-wilson.co.uk
2016-12-27 12:30:56 +01:00
Linus Torvalds 3ddc76dfc7 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer type cleanups from Thomas Gleixner:
 "This series does a tree wide cleanup of types related to
  timers/timekeeping.

   - Get rid of cycles_t and use a plain u64. The type is not really
     helpful and caused more confusion than clarity

   - Get rid of the ktime union. The union has become useless as we use
     the scalar nanoseconds storage unconditionally now. The 32bit
     timespec alike storage got removed due to the Y2038 limitations
     some time ago.

     That leaves the odd union access around for no reason. Clean it up.

  Both changes have been done with coccinelle and a small amount of
  manual mopping up"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ktime: Get rid of ktime_equal()
  ktime: Cleanup ktime_set() usage
  ktime: Get rid of the union
  clocksource: Use a plain u64 instead of cycle_t
2016-12-25 14:30:04 -08:00
Linus Torvalds b272f732f8 Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP hotplug notifier removal from Thomas Gleixner:
 "This is the final cleanup of the hotplug notifier infrastructure. The
  series has been reintgrated in the last two days because there came a
  new driver using the old infrastructure via the SCSI tree.

  Summary:

   - convert the last leftover drivers utilizing notifiers

   - fixup for a completely broken hotplug user

   - prevent setup of already used states

   - removal of the notifiers

   - treewide cleanup of hotplug state names

   - consolidation of state space

  There is a sphinx based documentation pending, but that needs review
  from the documentation folks"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/armada-xp: Consolidate hotplug state space
  irqchip/gic: Consolidate hotplug state space
  coresight/etm3/4x: Consolidate hotplug state space
  cpu/hotplug: Cleanup state names
  cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
  staging/lustre/libcfs: Convert to hotplug state machine
  scsi/bnx2i: Convert to hotplug state machine
  scsi/bnx2fc: Convert to hotplug state machine
  cpu/hotplug: Prevent overwriting of callbacks
  x86/msr: Remove bogus cleanup from the error path
  bus: arm-ccn: Prevent hotplug callback leak
  perf/x86/intel/cstate: Prevent hotplug callback leak
  ARM/imx/mmcd: Fix broken cpu hotplug handling
  scsi: qedi: Convert to hotplug state machine
2016-12-25 14:05:56 -08:00
Thomas Gleixner 2456e85535 ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Thomas Gleixner 530e9b76ae cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(),
register_hotcpu_notifier(), register_cpu_notifier(),
__register_hotcpu_notifier(), __register_cpu_notifier(),
unregister_hotcpu_notifier(), unregister_cpu_notifier(),
__unregister_hotcpu_notifier(), __unregister_cpu_notifier()

are unused now. Remove them and all related code.

Remove also the now pointless cpu notifier error injection mechanism. The
states can be executed step by step and error rollback is the same as cpu
down, so any state transition can be tested w/o requiring the notifier
error injection.

Some CPU hotplug states are kept as they are (ab)used for hotplug state
tracking.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-25 10:47:43 +01:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds a307d0a007 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull final vfs updates from Al Viro:
 "Assorted cleanups and fixes all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sg_write()/bsg_write() is not fit to be called under KERNEL_DS
  ufs: fix function declaration for ufs_truncate_blocks
  fs: exec: apply CLOEXEC before changing dumpable task flags
  seq_file: reset iterator to first record for zero offset
  vfs: fix isize/pos/len checks for reflink & dedupe
  [iov_iter] fix iterate_all_kinds() on empty iterators
  move aio compat to fs/aio.c
  reorganize do_make_slave()
  clone_private_mount() doesn't need to touch namespace_sem
  remove a bogus claim about namespace_sem being held by callers of mnt_alloc_id()
2016-12-23 10:52:43 -08:00
Al Viro 33844e6651 [iov_iter] fix iterate_all_kinds() on empty iterators
Problem similar to ones dealt with in "fold checks into iterate_and_advance()"
and followups, except that in this case we really want to do nothing when
asked for zero-length operation - unlike zero-length iterate_and_advance(),
zero-length iterate_all_kinds() has no side effects, and callers are simpler
that way.

That got exposed when copy_from_iter_full() had been used by tipc, which
builds an msghdr with zero payload and (now) feeds it to a primitive
based on iterate_all_kinds() instead of iterate_and_advance().

Reported-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22 23:00:22 -05:00
Borislav Petkov 50f4d9bda9 printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text
s/prink/printk/

Link: http://lkml.kernel.org/r/20161215170111.19075-1-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Olof Johansson <olof@lixom.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:47 -08:00
Geert Uytterhoeven fff5d99225 swiotlb: Add swiotlb=noforce debug option
On architectures like arm64, swiotlb is tied intimately to the core
architecture DMA support. In addition, ZONE_DMA cannot be disabled.

To aid debugging and catch devices not supporting DMA to memory outside
the 32-bit address space, add a kernel command line option
"swiotlb=noforce", which disables the use of bounce buffers.
If specified, trying to map memory that cannot be used with DMA will
fail, and a rate-limited warning will be printed.

Note that io_tlb_nslabs is set to 1, which is the minimal supported
value.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-12-19 09:05:20 -05:00
Geert Uytterhoeven ae7871be18 swiotlb: Convert swiotlb_force from int to enum
Convert the flag swiotlb_force from an int to an enum, to prepare for
the advent of more possible values.

Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-12-19 09:05:20 -05:00
Linus Torvalds 9a19a6db37 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:

 - more ->d_init() stuff (work.dcache)

 - pathname resolution cleanups (work.namei)

 - a few missing iov_iter primitives - copy_from_iter_full() and
   friends. Either copy the full requested amount, advance the iterator
   and return true, or fail, return false and do _not_ advance the
   iterator. Quite a few open-coded callers converted (and became more
   readable and harder to fuck up that way) (work.iov_iter)

 - several assorted patches, the big one being logfs removal

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  logfs: remove from tree
  vfs: fix put_compat_statfs64() does not handle errors
  namei: fold should_follow_link() with the step into not-followed link
  namei: pass both WALK_GET and WALK_MORE to should_follow_link()
  namei: invert WALK_PUT logics
  namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
  namei: saner calling conventions for mountpoint_last()
  namei.c: get rid of user_path_parent()
  switch getfrag callbacks to ..._full() primitives
  make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
  [iov_iter] new primitives - copy_from_iter_full() and friends
  don't open-code file_inode()
  ceph: switch to use of ->d_init()
  ceph: unify dentry_operations instances
  lustre: switch to use of ->d_init()
2016-12-16 10:24:44 -08:00
Matthew Wilcox b9a0deb96b redo: radix tree test suite: fix compilation
[ This resurrects commit 53855d10f4, which was reverted in
  2b41226b39.  It depended on commit d544abd5ff ("lib/radix-tree:
  Convert to hotplug state machine") so now it is correct to apply ]

Patch "lib/radix-tree: Convert to hotplug state machine" breaks the test
suite as it adds a call to cpuhp_setup_state_nocalls() which is not
currently emulated in the test suite.  Add it, and delete the emulation
of the old CPU hotplug mechanism.

Link: http://lkml.kernel.org/r/1480369871-5271-36-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15 11:04:20 -08:00
Matthew Wilcox e8de434076 radix-tree: ensure counts are initialised
radix_tree_join() was freeing nodes with a non-zero ->exceptional count,
and radix_tree_split() wasn't zeroing ->exceptional when it allocated
the new node.  Fix this by making all callers of radix_tree_node_alloc()
pass in the new counts (and some other always-initialised fields), which
will prevent the problem recurring if in future we decide to do
something similar.

Link: http://lkml.kernel.org/r/1481667692-14500-3-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox a90eb3a2a4 radix-tree: fix replacement for multiorder entries
When replacing an entry with NULL, we need to delete any sibling
entries.  Also account deleting exceptional entries properly.  Also fix
a bug with radix_tree_iter_replace() where we would fail to remove
entirely freed nodes.  Also fix accounting bug when switching between
normal and exceptional entries with replace_slot.  Also add testcases
for all these bugs.

Link: http://lkml.kernel.org/r/1480369871-5271-61-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 2791653a68 radix-tree: add radix_tree_split_preload()
Calculate how many nodes we need to allocate to split an old_order entry
into multiple entries, each of size new_order.  The test suite checks
that we allocated exactly the right number of nodes; neither too many
(checked by rtp->nr == 0), nor too few (checked by comparing
nr_allocated before and after the call to radix_tree_split()).

Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox e157b55594 radix-tree: add radix_tree_split
This new function splits a larger multiorder entry into smaller entries
(potentially multi-order entries).  These entries are initialised to
RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't
confused.  The caller should then call radix_tree_for_each_slot() and
radix_tree_replace_slot() in order to turn these retry entries into the
intended new entries.  Tags are replicated from the original multiorder
entry into each new entry.

Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 175542f575 radix-tree: add radix_tree_join
This new function allows for the replacement of many smaller entries in
the radix tree with one larger multiorder entry.  From the point of view
of an RCU walker, they may see a mixture of the smaller entries and the
large entry during the same walk, but they will never see NULL for an
index which was populated before the join.

Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 268f42de71 radix-tree: delete radix_tree_range_tag_if_tagged()
This is an exceptionally complicated function with just one caller
(tag_pages_for_writeback).  We devote a large portion of the runtime of
the test suite to testing this one function which has one caller.  By
introducing the new function radix_tree_iter_tag_set(), we can eliminate
all of the complexity while keeping the performance.  The caller can now
use a fairly standard radix_tree_for_each() loop, and it doesn't need to
worry about tricksy things like 'start' wrapping.

The test suite continues to spend a large amount of time investigating
this function, but now it's testing the underlying primitives such as
radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator
which are also used by other parts of the kernel.

Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 478922e2b0 radix-tree: delete radix_tree_locate_item()
This rather complicated function can be better implemented as an
iterator.  It has only one caller, so move the functionality to the only
place that needs it.  Update the test suite to follow the same pattern.

Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 148deab223 radix-tree: improve multiorder iterators
This fixes several interlinked problems with the iterators in the
presence of multiorder entries.

1. radix_tree_iter_next() would only advance by one slot, which would
   result in the iterators returning the same entry more than once if
   there were sibling entries.

2. radix_tree_next_slot() could return an internal pointer instead of
   a user pointer if a tagged multiorder entry was immediately followed by
   an entry of lower order.

3. radix_tree_next_slot() expanded to a lot more code than it used to
   when multiorder support was compiled in.  And I wasn't comfortable with
   entry_to_node() being in a header file.

Fixing radix_tree_iter_next() for the presence of sibling entries
necessarily involves examining the contents of the radix tree, so we now
need to pass 'slot' to radix_tree_iter_next(), and we need to change the
calling convention so it is called *before* dropping the lock which
protects the tree.  Also rename it to radix_tree_iter_resume(), as some
people thought it was necessary to call radix_tree_iter_next() each time
around the loop.

radix_tree_next_slot() becomes closer to how it looked before multiorder
support was introduced.  It only checks to see if the next entry in the
chunk is a sibling entry or a pointer to a node; this should be rare
enough that handling this case out of line is not a performance impact
(and such impact is amortised by the fact that the entry we just
processed was a multiorder entry).  Also, radix_tree_next_slot() used to
force a new chunk lookup for untagged entries, which is more expensive
than the out of line sibling entry skipping.

Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 218ed7503a radix-tree: improve dump output
Print the indices of the entries as unsigned (instead of signed)
integers and print the parent node of each entry to help navigate around
larger trees where the layout is not quite so obvious.  Print the
indices covered by a node.  Rearrange the order of fields printed so the
indices and parents line up for each type of entry.

Link: http://lkml.kernel.org/r/1480369871-5271-53-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox bc412fca6e radix-tree: make radix_tree_find_next_bit more useful
Since this function is specialised to the radix tree, pass in the node
and tag to calculate the address of the bitmap in
radix_tree_find_next_bit() instead of the caller.  Likewise, there is no
need to pass in the size of the bitmap.

Link: http://lkml.kernel.org/r/1480369871-5271-52-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 9498d2bb34 radix-tree: create node_tag_set()
Similar to node_tag_clear(), factor node_tag_set() out of
radix_tree_range_tag_if_tagged().

Link: http://lkml.kernel.org/r/1480369871-5271-51-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 91d9c05ac6 radix-tree: move rcu_head into a union with private_list
I want to be able to reference node->parent after freeing node.

Currently node->parent is in a union with rcu_head, so it is overwritten
when the node is put on the RCU list.  We know that private_list is not
referenced after the node is freed, so it is safe for these two members
to share space.

Link: http://lkml.kernel.org/r/1480369871-5271-50-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 91b9677c4c radix-tree: fix typo
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Andreas Platschek 0462554707 Kconfig: lib/Kconfig.ubsan fix reference to ubsan documentation
Documenation/ubsan.txt was moved to Documentation/dev-tools/ubsan.rst,
this fixes the reference.

Link: http://lkml.kernel.org/r/1476698152-29340-3-git-send-email-andreas.platschek@opentech.at
Signed-off-by: Andreas Platschek <andreas.platschek@opentech.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Andreas Platschek 700199b0c1 Kconfig: lib/Kconfig.debug: fix references to Documenation
Documentation on development tools was moved to Documentation/devl-tools
and sphinxified (renamed from .txt to .rst).

References in lib/Kconfig.debug need to be updated to the new location.

Link: http://lkml.kernel.org/r/1476698152-29340-2-git-send-email-andreas.platschek@opentech.at
Signed-off-by: Andreas Platschek <andreas.platschek@opentech.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Linus Torvalds 2a4c32edd3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD updates from Shaohua Li:

 - a raid5 writeback cache feature.

   The goal is to aggregate writes to make full stripe write and reduce
   read-modify-write. It's helpful for workload which does sequential
   write and follows fsync for example. This feature is experimental and
   off by default right now.

 - FAILFAST support.

   This fails IOs to broken raid disks quickly, so can improve latency.
   It's mainly for DASD storage, but some patches help normal raid array
   too.

 - support bad block for raid array with external metadata

 - AVX2 instruction support for raid6 parity calculation

 - normalize MD info output

 - add missing blktrace

 - other bug fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (66 commits)
  md: separate flags for superblock changes
  md: MD_RECOVERY_NEEDED is set for mddev->recovery
  md: takeover should clear unrelated bits
  md/r5cache: after recovery, increase journal seq by 10000
  md/raid5-cache: fix crc in rewrite_data_only_stripes()
  md/raid5-cache: no recovery is required when create super-block
  md: fix refcount problem on mddev when stopping array.
  md/r5cache: do r5c_update_log_state after log recovery
  md/raid5-cache: adjust the write position of the empty block if no data blocks
  md/r5cache: run_no_space_stripes() when R5C_LOG_CRITICAL == 0
  md/raid5: limit request size according to implementation limits
  md/raid5-cache: do not need to set STRIPE_PREREAD_ACTIVE repeatedly
  md/raid5-cache: remove the unnecessary next_cp_seq field from the r5l_log
  md/raid5-cache: release the stripe_head at the appropriate location
  md/raid5-cache: use ring add to prevent overflow
  md/raid5-cache: remove unnecessary function parameters
  raid5-cache: don't set STRIPE_R5C_PARTIAL_STRIPE flag while load stripe into cache
  raid5-cache: add another check conditon before replaying one stripe
  md/r5cache: enable IRQs on error path
  md/r5cache: handle alloc_page failure
  ...
2016-12-14 10:58:17 -08:00
Linus Torvalds b5cab0da75 Merge branch 'stable/for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb updates from Konrad Rzeszutek Wilk:

 - minor fixes (rate limiting), remove certain functions

 - support for DMA_ATTR_SKIP_CPU_SYNC which is an optimization
   in the DMA API

* 'stable/for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: Minor fix-ups for DMA_ATTR_SKIP_CPU_SYNC support
  swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNC
  swiotlb-xen: Enforce return of DMA_ERROR_CODE in mapping function
  swiotlb: Drop unused functions swiotlb_map_sg and swiotlb_unmap_sg
  swiotlb: Rate-limit printing when running out of SW-IOMMU space
2016-12-13 15:52:23 -08:00
Linus Torvalds c11a6cfb01 Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Mostly patches to initialize workqueue subsystem earlier and get rid
  of keventd_up().

  The patches were headed for the last merge cycle but got delayed due
  to a bug found late minute, which is fixed now.

  Also, to help debugging, destroy_workqueue() is more chatty now on a
  sanity check failure."

* 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: move wq_numa_init() to workqueue_init()
  workqueue: remove keventd_up()
  debugobj, workqueue: remove keventd_up() usage
  slab, workqueue: remove keventd_up() usage
  power, workqueue: remove keventd_up() usage
  tty, workqueue: remove keventd_up() usage
  mce, workqueue: remove keventd_up() usage
  workqueue: make workqueue available early during boot
  workqueue: dump workqueue state on sanity check failures in destroy_workqueue()
2016-12-13 12:59:57 -08:00
Shaohua Li 20737738d3 Merge branch 'md-next' into md-linus 2016-12-13 12:40:15 -08:00
Linus Torvalds 098c30557a Driver core patches for 4.10-rc1
Here's the new driver core patches for 4.10-rc1.
 
 Big thing here is the nice addition of "functional dependencies" to the
 driver core.  The idea has been talked about for a very long time, great
 job to Rafael for stepping up and implementing it. It's been tested for
 longer than the 4.9-rc1 date, we held off on merging it earlier in order
 to feel more comfortable about it.
 
 Other than that, it's just a handful of small other patches, some good
 cleanups to the mess that is the firmware class code, and we have a test
 driver for the deferred probe logic.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWFAvPQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ym3NgCgmhFeWEkp9SDt17YGGavmnzQUlBQAoJlUipJp
 PHeQkq15ZWw3wWC9FEvM
 =91M1
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here's the new driver core patches for 4.10-rc1.

  Big thing here is the nice addition of "functional dependencies" to
  the driver core. The idea has been talked about for a very long time,
  great job to Rafael for stepping up and implementing it. It's been
  tested for longer than the 4.9-rc1 date, we held off on merging it
  earlier in order to feel more comfortable about it.

  Other than that, it's just a handful of small other patches, some good
  cleanups to the mess that is the firmware class code, and we have a
  test driver for the deferred probe logic.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (30 commits)
  firmware: Correct handling of fw_state_wait() return value
  driver core: Silence device links sphinx warning
  firmware: remove warning at documentation generation time
  drivers: base: dma-mapping: Fix typo in dmam_alloc_non_coherent comments
  driver core: test_async: fix up typo found by 0-day
  firmware: move fw_state_is_done() into UHM section
  firmware: do not use fw_lock for fw_state protection
  firmware: drop bit ops in favor of simple state machine
  firmware: refactor loading status
  firmware: fix usermode helper fallback loading
  driver core: firmware_class: convert to use class_groups
  driver core: devcoredump: convert to use class_groups
  driver core: class: add class_groups support
  kernfs: Declare two local data structures static
  driver-core: fix platform_no_drv_owner.cocci warnings
  drivers/base/memory.c: Remove unused 'first_page' variable
  driver core: add CLASS_ATTR_WO()
  drivers: base: cacheinfo: support DT overrides for cache properties
  drivers: base: cacheinfo: add pr_fmt logging
  drivers: base: cacheinfo: fix boot error message when acpi is enabled
  ...
2016-12-13 11:42:18 -08:00
Linus Torvalds 36869cb93d Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the main block pull request this series. Contrary to previous
  release, I've kept the core and driver changes in the same branch. We
  always ended up having dependencies between the two for obvious
  reasons, so makes more sense to keep them together. That said, I'll
  probably try and keep more topical branches going forward, especially
  for cycles that end up being as busy as this one.

  The major parts of this pull request is:

   - Improved support for O_DIRECT on block devices, with a small
     private implementation instead of using the pig that is
     fs/direct-io.c. From Christoph.

   - Request completion tracking in a scalable fashion. This is utilized
     by two components in this pull, the new hybrid polling and the
     writeback queue throttling code.

   - Improved support for polling with O_DIRECT, adding a hybrid mode
     that combines pure polling with an initial sleep. From me.

   - Support for automatic throttling of writeback queues on the block
     side. This uses feedback from the device completion latencies to
     scale the queue on the block side up or down. From me.

   - Support from SMR drives in the block layer and for SD. From Hannes
     and Shaun.

   - Multi-connection support for nbd. From Josef.

   - Cleanup of request and bio flags, so we have a clear split between
     which are bio (or rq) private, and which ones are shared. From
     Christoph.

   - A set of patches from Bart, that improve how we handle queue
     stopping and starting in blk-mq.

   - Support for WRITE_ZEROES from Chaitanya.

   - Lightnvm updates from Javier/Matias.

   - Supoort for FC for the nvme-over-fabrics code. From James Smart.

   - A bunch of fixes from a whole slew of people, too many to name
     here"

* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
  blk-stat: fix a few cases of missing batch flushing
  blk-flush: run the queue when inserting blk-mq flush
  elevator: make the rqhash helpers exported
  blk-mq: abstract out blk_mq_dispatch_rq_list() helper
  blk-mq: add blk_mq_start_stopped_hw_queue()
  block: improve handling of the magic discard payload
  blk-wbt: don't throttle discard or write zeroes
  nbd: use dev_err_ratelimited in io path
  nbd: reset the setup task for NBD_CLEAR_SOCK
  nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
  nvme-fabrics: Add target support for FC transport
  nvme-fabrics: Add host support for FC transport
  nvme-fabrics: Add FC transport LLDD api definitions
  nvme-fabrics: Add FC transport FC-NVME definitions
  nvme-fabrics: Add FC transport error codes to nvme.h
  Add type 0x28 NVME type code to scsi fc headers
  nvme-fabrics: patch target code in prep for FC transport support
  nvme-fabrics: set sqe.command_id in core not transports
  parser: add u64 number parser
  nvme-rdma: align to generic ib_event logging helper
  ...
2016-12-13 10:19:16 -08:00
Daniel Vetter c248891675 These are the documentation changes for 4.10.
It's another busy cycle for the docs tree, as the sphinx conversion
 continues.  Highlights include:
 
  - Further work on PDF output, which remains a bit of a pain but should be
    more solid now.
 
  - Five more DocBook template files converted to Sphinx.  Only 27 to go...
    Lots of plain-text files have also been converted and integrated.
 
  - Images in binary formats have been replaced with more source-friendly
    versions.
 
  - Various bits of organizational work, including the renaming of various
    files discussed at the kernel summit.
 
  - New documentation for the device_link mechanism.
 
 ...and, of course, lots of typo fixes and small updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYTbl7AAoJEI3ONVYwIuV63NIP/REwzThnGWFJMRSuq8Ieq2r9
 sFSQsaGTGlhyKiDoEooo+SO/Za3uTonjK+e7WZg8mhdiEdamta5aociU/71C1Yy/
 T9ur0FhcGblrvZ1NidSDvCLwuECZOMMei7mgLZ9a+KCpc4ANqqTVZSUm1blKcqhF
 XelhVXxBa0ar35l/pVzyCxkdNXRWXv+MJZE8hp5XAdTdr11DS7UY9zrZdH31axtf
 BZlbYJrvB8WPydU6myTjRpirA17Hu7uU64MsL3bNIEiRQ+nVghEzQC8uxeUCvfVx
 r0H5AgGGQeir+e8GEv2T20SPZ+dumXs+y/HehKNb3jS3gV0mo+pKPeUhwLIxr+Zh
 QY64gf+jYf5ISHwAJRnU0Ima72ehObzSbx9Dko10nhq2OvbR5f83gjz9t9jKYFU7
 RDowICA8lwqyRbHRoVfyoW8CpVhWFpMFu3yNeJMckeTish3m7ANqzaWslbsqIP5G
 zxgFMIrVVSbeae+sUeygtEJAnWI09aZ4tuaUXYtGWwu6ikC/3aV6DryP4bthG2LF
 A19uV4nMrLuuh8g2wiTHHjMfjYRwvSn+f9yaolwJhwyNDXQzRPy+ZJ3W/6olOkXC
 bAxTmVRCW5GA/fmSrfXmW1KbnxlWfP2C62hzZQ09UHxzTHdR97oFLDQdZhKo1uwf
 pmSJR0hVeRUmA4uw6+Su
 =A0EV
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.10' of git://git.lwn.net/linux into drm-misc-next

Backmerge the docs-next branch from Jon into drm-misc so that we can
apply the dma-buf documentation cleanup patches. Git found a conflict
where there was none because both drm-misc and docs had identical
patches to clean up file rename issues in the rst include directives.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-12-13 10:36:39 +01:00
Linus Torvalds e7aa8c2eb1 These are the documentation changes for 4.10.
It's another busy cycle for the docs tree, as the sphinx conversion
 continues.  Highlights include:
 
  - Further work on PDF output, which remains a bit of a pain but should be
    more solid now.
 
  - Five more DocBook template files converted to Sphinx.  Only 27 to go...
    Lots of plain-text files have also been converted and integrated.
 
  - Images in binary formats have been replaced with more source-friendly
    versions.
 
  - Various bits of organizational work, including the renaming of various
    files discussed at the kernel summit.
 
  - New documentation for the device_link mechanism.
 
 ...and, of course, lots of typo fixes and small updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYTbl7AAoJEI3ONVYwIuV63NIP/REwzThnGWFJMRSuq8Ieq2r9
 sFSQsaGTGlhyKiDoEooo+SO/Za3uTonjK+e7WZg8mhdiEdamta5aociU/71C1Yy/
 T9ur0FhcGblrvZ1NidSDvCLwuECZOMMei7mgLZ9a+KCpc4ANqqTVZSUm1blKcqhF
 XelhVXxBa0ar35l/pVzyCxkdNXRWXv+MJZE8hp5XAdTdr11DS7UY9zrZdH31axtf
 BZlbYJrvB8WPydU6myTjRpirA17Hu7uU64MsL3bNIEiRQ+nVghEzQC8uxeUCvfVx
 r0H5AgGGQeir+e8GEv2T20SPZ+dumXs+y/HehKNb3jS3gV0mo+pKPeUhwLIxr+Zh
 QY64gf+jYf5ISHwAJRnU0Ima72ehObzSbx9Dko10nhq2OvbR5f83gjz9t9jKYFU7
 RDowICA8lwqyRbHRoVfyoW8CpVhWFpMFu3yNeJMckeTish3m7ANqzaWslbsqIP5G
 zxgFMIrVVSbeae+sUeygtEJAnWI09aZ4tuaUXYtGWwu6ikC/3aV6DryP4bthG2LF
 A19uV4nMrLuuh8g2wiTHHjMfjYRwvSn+f9yaolwJhwyNDXQzRPy+ZJ3W/6olOkXC
 bAxTmVRCW5GA/fmSrfXmW1KbnxlWfP2C62hzZQ09UHxzTHdR97oFLDQdZhKo1uwf
 pmSJR0hVeRUmA4uw6+Su
 =A0EV
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.10' of git://git.lwn.net/linux

Pull documentation update from Jonathan Corbet:
 "These are the documentation changes for 4.10.

  It's another busy cycle for the docs tree, as the sphinx conversion
  continues. Highlights include:

   - Further work on PDF output, which remains a bit of a pain but
     should be more solid now.

   - Five more DocBook template files converted to Sphinx. Only 27 to
     go... Lots of plain-text files have also been converted and
     integrated.

   - Images in binary formats have been replaced with more
     source-friendly versions.

   - Various bits of organizational work, including the renaming of
     various files discussed at the kernel summit.

   - New documentation for the device_link mechanism.

  ... and, of course, lots of typo fixes and small updates"

* tag 'docs-4.10' of git://git.lwn.net/linux: (193 commits)
  dma-buf: Extract dma-buf.rst
  Update Documentation/00-INDEX
  docs: 00-INDEX: document directories/files with no docs
  docs: 00-INDEX: remove non-existing entries
  docs: 00-INDEX: add missing entries for documentation files/dirs
  docs: 00-INDEX: consolidate process/ and admin-guide/ description
  scripts: add a script to check if Documentation/00-INDEX is sane
  Docs: change sh -> awk in REPORTING-BUGS
  Documentation/core-api/device_link: Add initial documentation
  core-api: remove an unexpected unident
  ppc/idle: Add documentation for powersave=off
  Doc: Correct typo, "Introdution" => "Introduction"
  Documentation/atomic_ops.txt: convert to ReST markup
  Documentation/local_ops.txt: convert to ReST markup
  Documentation/assoc_array.txt: convert to ReST markup
  docs-rst: parse-headers.pl: cleanup the documentation
  docs-rst: fix media cleandocs target
  docs-rst: media/Makefile: reorganize the rules
  docs-rst: media: build SVG from graphviz files
  docs-rst: replace bayer.png by a SVG image
  ...
2016-12-12 21:58:13 -08:00
Linus Torvalds e34bac726d Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - various misc bits

 - most of MM (quite a lot of MM material is awaiting the merge of
   linux-next dependencies)

 - kasan

 - printk updates

 - procfs updates

 - MAINTAINERS

 - /lib updates

 - checkpatch updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (123 commits)
  init: reduce rootwait polling interval time to 5ms
  binfmt_elf: use vmalloc() for allocation of vma_filesz
  checkpatch: don't emit unified-diff error for rename-only patches
  checkpatch: don't check c99 types like uint8_t under tools
  checkpatch: avoid multiple line dereferences
  checkpatch: don't check .pl files, improve absolute path commit log test
  scripts/checkpatch.pl: fix spelling
  checkpatch: don't try to get maintained status when --no-tree is given
  lib/ida: document locking requirements a bit better
  lib/rbtree.c: fix typo in comment of ____rb_erase_color
  lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
  MAINTAINERS: add drm and drm/i915 irc channels
  MAINTAINERS: add "C:" for URI for chat where developers hang out
  MAINTAINERS: add drm and drm/i915 bug filing info
  MAINTAINERS: add "B:" for URI where to file bugs
  get_maintainer: look for arbitrary letter prefixes in sections
  printk: add Kconfig option to set default console loglevel
  printk/sound: handle more message headers
  printk/btrfs: handle more message headers
  printk/kdb: handle more message headers
  ...
2016-12-12 20:50:02 -08:00
Linus Torvalds e71c3978d6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the final round of converting the notifier mess to the state
  machine. The removal of the notifiers and the related infrastructure
  will happen around rc1, as there are conversions outstanding in other
  trees.

  The whole exercise removed about 2000 lines of code in total and in
  course of the conversion several dozen bugs got fixed. The new
  mechanism allows to test almost every hotplug step standalone, so
  usage sites can exercise all transitions extensively.

  There is more room for improvement, like integrating all the
  pointlessly different architecture mechanisms of synchronizing,
  setting cpus online etc into the core code"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  tracing/rb: Init the CPU mask on allocation
  soc/fsl/qbman: Convert to hotplug state machine
  soc/fsl/qbman: Convert to hotplug state machine
  zram: Convert to hotplug state machine
  KVM/PPC/Book3S HV: Convert to hotplug state machine
  arm64/cpuinfo: Convert to hotplug state machine
  arm64/cpuinfo: Make hotplug notifier symmetric
  mm/compaction: Convert to hotplug state machine
  iommu/vt-d: Convert to hotplug state machine
  mm/zswap: Convert pool to hotplug state machine
  mm/zswap: Convert dst-mem to hotplug state machine
  mm/zsmalloc: Convert to hotplug state machine
  mm/vmstat: Convert to hotplug state machine
  mm/vmstat: Avoid on each online CPU loops
  mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
  tracing/rb: Convert to hotplug state machine
  oprofile/nmi timer: Convert to hotplug state machine
  net/iucv: Use explicit clean up labels in iucv_init()
  x86/pci/amd-bus: Convert to hotplug state machine
  x86/oprofile/nmi: Convert to hotplug state machine
  ...
2016-12-12 19:25:04 -08:00
Daniel Vetter a2ef9471c7 lib/ida: document locking requirements a bit better
I wanted to wrap a bunch of ida_simple_get calls into their own locking,
until I dug around and read the original commit message.  Stuff like
this should imo be added to the kernel doc, let's do that.

Link: http://lkml.kernel.org/r/20161027072216.20411-1-daniel.vetter@ffwll.ch
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:09 -08:00
Jie Chen ce093a0454 lib/rbtree.c: fix typo in comment of ____rb_erase_color
In Case 3 of `sibling == parent->rb_right':

Right rotation will not change color of sl and S in the diagram
(i.e. should not change "sl" to "Sl", "S" to "s")

In Case 3 of `sibling == parent->rb_left':

     (p)           (p)
     / \           / \
    S   N    -->  sr  N
   / \           /
  Sl  sr        S
               /
              Sl

  This is actually left rotation at "S", not right rotation.

In Case 4 of `sibling == parent->rb_left':

     (p)             (s)
     / \             / \
    S   N     -->   Sl  P
   / \                 / \
  sl (sr)            (sr) N

  This is actually right rotation at "(p)" + color flips, not left
  rotation + color flips.

Link: http://lkml.kernel.org/r/1472391115-3702-1-git-send-email-fykcee1@gmail.com
Signed-off-by: Jie Chen <fykcee1@gmail.com>
Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:09 -08:00
Dave Young 6b2a65c7ff lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
With CONFIG_DEVMEM not set, CONFIG_STRICT_DEVMEM will be useless even if
it is set =y, thus let's update the dependency in Kconfig.

Link: http://lkml.kernel.org/r/20161006051217.GA31027@dhcp-128-65.nay.redhat.com
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:09 -08:00
Olof Johansson a8cfdc68f6 printk: add Kconfig option to set default console loglevel
Add a configuration option to set the default console loglevel.  This
is, as before, still possible to override at runtime through bootargs
(loglevel=<x>), sysrq and /proc/printk.

There are cases where adding additional arguments on the commandline is
impractical, and changing the default for the kernel when being built
makes more sense.  Provide such a method here, for those who choose to
do so.

Also, while touching this code, clarify the difference between
MESSAGE_LOGLEVEL_DEFAULT and CONSOLE_LOGLEVEL_DEFAULT.

Link: http://lkml.kernel.org/r/1479676829-30031-1-git-send-email-olof@lixom.net
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:09 -08:00
Johannes Weiner 14b468791f mm: workingset: move shadow entry tracking to radix tree exceptional tracking
Currently, we track the shadow entries in the page cache in the upper
bits of the radix_tree_node->count, behind the back of the radix tree
implementation.  Because the radix tree code has no awareness of them,
we rely on random subtleties throughout the implementation (such as the
node->count != 1 check in the shrinking code, which is meant to exclude
multi-entry nodes but also happens to skip nodes with only one shadow
entry, as that's accounted in the upper bits).  This is error prone and
has, in fact, caused the bug fixed in d3798ae8c6 ("mm: filemap: don't
plant shadow entries without radix tree node").

To remove these subtleties, this patch moves shadow entry tracking from
the upper bits of node->count to the existing counter for exceptional
entries.  node->count goes back to being a simple counter of valid
entries in the tree node and can be shrunk to a single byte.

This vastly simplifies the page cache code.  All accounting happens
natively inside the radix tree implementation, and maintaining the LRU
linkage of shadow nodes is consolidated into a single function in the
workingset code that is called for leaf nodes affected by a change in
the page cache tree.

This also removes the last user of the __radix_delete_node() return
value.  Eliminate it.

Link: http://lkml.kernel.org/r/20161117193211.GE23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:08 -08:00
Johannes Weiner 4d693d0860 lib: radix-tree: update callback for changing leaf nodes
Support handing __radix_tree_replace() a callback that gets invoked for
all leaf nodes that change or get freed as a result of the slot
replacement, to assist users tracking nodes with node->private_list.

This prepares for putting page cache shadow entries into the radix tree
root again and drastically simplifying the shadow tracking.

Link: http://lkml.kernel.org/r/20161117193134.GD23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:08 -08:00
Johannes Weiner f4b109c6da lib: radix-tree: add entry deletion support to __radix_tree_replace()
Page cache shadow entry handling will be a lot simpler when it can use a
single generic replacement function for pages, shadow entries, and
emptying slots.

Make __radix_tree_replace() properly account insertions and deletions in
node->count and garbage collect nodes as they become empty.  Then
re-implement radix_tree_delete() on top of it.

Link: http://lkml.kernel.org/r/20161117193058.GC23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:08 -08:00
Johannes Weiner 6d75f366b9 lib: radix-tree: check accounting of existing slot replacement users
The bug in khugepaged fixed earlier in this series shows that radix tree
slot replacement is fragile; and it will become more so when not only
NULL<->!NULL transitions need to be caught but transitions from and to
exceptional entries as well.  We need checks.

Re-implement radix_tree_replace_slot() on top of the sanity-checked
__radix_tree_replace().  This requires existing callers to also pass the
radix tree root, but it'll warn us when somebody replaces slots with
contents that need proper accounting (transitions between NULL entries,
real entries, exceptional entries) and where a replacement through the
slot pointer would corrupt the radix tree node counts.

Link: http://lkml.kernel.org/r/20161117193021.GB23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:08 -08:00
Johannes Weiner f7942430e4 lib: radix-tree: native accounting of exceptional entries
The way the page cache is sneaking shadow entries of evicted pages into
the radix tree past the node entry accounting and tracking them manually
in the upper bits of node->count is fraught with problems.

These shadow entries are marked in the tree as exceptional entries,
which are a native concept to the radix tree.  Maintain an explicit
counter of exceptional entries in the radix tree node.  Subsequent
patches will switch shadow entry tracking over to that counter.

DAX and shmem are the other users of exceptional entries.  Since slot
replacements that change the entry type from regular to exceptional must
now be accounted, introduce a __radix_tree_replace() function that does
replacement and accounting, and switch DAX and shmem over.

The increase in radix tree node size is temporary.  A followup patch
switches the shadow tracking to this new scheme and we'll no longer need
the upper bits in node->count and shrink that back to one byte.

Link: http://lkml.kernel.org/r/20161117192945.GA23430@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:08 -08:00
Linus Torvalds 6cdf89b1ca Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The tree got pretty big in this development cycle, but the net effect
  is pretty good:

    115 files changed, 673 insertions(+), 1522 deletions(-)

  The main changes were:

   - Rework and generalize the mutex code to remove per arch mutex
     primitives. (Peter Zijlstra)

   - Add vCPU preemption support: add an interface to query the
     preemption status of vCPUs and use it in locking primitives - this
     optimizes paravirt performance. (Pan Xinhui, Juergen Gross,
     Christian Borntraeger)

   - Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to
     clean up and improve the s390 lock yielding machinery and its core
     kernel impact. (Christian Borntraeger)

   - Micro-optimize mutexes some more. (Waiman Long)

   - Reluctantly add the to-be-deprecated mutex_trylock_recursive()
     interface on a temporary basis, to give the DRM code more time to
     get rid of its locking hacks. Any other users will be NAK-ed on
     sight. (We turned off the deprecation warning for the time being to
     not pollute the build log.) (Peter Zijlstra)

   - Improve the rtmutex code a bit, in light of recent long lived
     bugs/races. (Thomas Gleixner)

   - Misc fixes, cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/paravirt: Fix bool return type for PVOP_CALL()
  x86/paravirt: Fix native_patch()
  locking/ww_mutex: Use relaxed atomics
  locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked()
  locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL
  x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()
  locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted
  locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock()
  Documentation/virtual/kvm: Support the vCPU preemption check
  x86/xen: Support the vCPU preemption check
  x86/kvm: Support the vCPU preemption check
  x86/kvm: Support the vCPU preemption check
  kvm: Introduce kvm_write_guest_offset_cached()
  locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests
  locking/spinlocks, s390: Implement vcpu_is_preempted(cpu)
  locking/core, powerpc: Implement vcpu_is_preempted(cpu)
  sched/core: Introduce the vcpu_is_preempted(cpu) interface
  sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q
  locking/core: Provide common cpu_relax_yield() definition
  locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily
  ...
2016-12-12 10:48:02 -08:00
Linus Torvalds 718c0ddd6a Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this development cycle were:

   - Miscellaneous fixes, including a change to call_rcu()'s rcu_head
     alignment check.

   - Security-motivated list consistency checks, which are disabled by
     default behind DEBUG_LIST.

   - Torture-test updates.

   - Documentation updates, yet again just simple changes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  torture: Prevent jitter from delaying build-only runs
  torture: Remove obsolete files from rcutorture .gitignore
  rcu: Don't kick unless grace period or request
  rcu: Make expedited grace periods recheck dyntick idle state
  torture: Trace long read-side delays
  rcu: RCU_TRACE enables event tracing as well as debugfs
  rcu: Remove obsolete comment from __call_rcu()
  rcu: Remove obsolete rcu_check_callbacks() header comment
  rcu: Tighten up __call_rcu() rcu_head alignment check
  Documentation/RCU: Fix minor typo
  documentation: Present updated RCU guarantee
  bug: Avoid Kconfig warning for BUG_ON_DATA_CORRUPTION
  lib/Kconfig.debug: Fix typo in select statement
  lkdtm: Add tests for struct list corruption
  bug: Provide toggle for BUG on data corruption
  list: Split list_del() debug checking into separate function
  rculist: Consolidate DEBUG_LIST for list_add_rcu()
  list: Split list_add() debug checking into separate function
2016-12-12 09:09:54 -08:00
Ingo Molnar 6f38751510 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11 13:07:13 +01:00
David S. Miller 821781a9f4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-10 16:21:55 -05:00
Linus Torvalds 2b41226b39 Revert "radix tree test suite: fix compilation"
This reverts commit 53855d10f4.

It shouldn't have come in yet - it depends on the changes in linux-next
that will come in during the next merge window.  As Matthew Wilcox says,
the test suite is broken with the current state without the revert.

Requested-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-09 10:41:42 -08:00
Matthew Wilcox 53855d10f4 radix tree test suite: fix compilation
Patch "lib/radix-tree: Convert to hotplug state machine" breaks the test
suite as it adds a call to cpuhp_setup_state_nocalls() which is not
currently emulated in the test suite.  Add it, and delete the emulation
of the old CPU hotplug mechanism.

Link: http://lkml.kernel.org/r/1480369871-5271-36-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-07 17:10:00 -08:00
Linus Torvalds 5b43f97f3f Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "Two rtmutex race fixes (which miraculously never triggered, that we
  know of), plus two lockdep printk formatting regression fixes"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Fix report formatting
  locking/rtmutex: Use READ_ONCE() in rt_mutex_owner()
  locking/rtmutex: Prevent dequeue vs. unlock race
  locking/selftest: Fix output since KERN_CONT changes
2016-12-07 11:27:33 -08:00
James Smart a317178e36 parser: add u64 number parser
Will be used by the nvme-fabrics FC transport in parsing options

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-12-06 10:17:03 +02:00
Al Viro cbbd26b8b1 [iov_iter] new primitives - copy_from_iter_full() and friends
copy_from_iter_full(), copy_from_iter_full_nocache() and
csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
et.al., advancing iterator only in case of successful full copy
and returning whether it had been successful or not.

Convert some obvious users.  *NOTE* - do not blindly assume that
something is a good candidate for those unless you are sure that
not advancing iov_iter in failure case is the right thing in
this case.  Anything that does short read/short write kind of
stuff (or is in a loop, etc.) is unlikely to be a good one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 14:33:36 -05:00
David S. Miller 2745529ac7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Couple conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the
   RX tail pointer properly overlapped with some changes
   to support variable sized rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
   overlapping with a reorganization of the driver to support
   ACPI, OF, as well as PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the
   stmmac driver, meanwhile a lot of this code was cleaned up
   and reorganized in 'net-next'.

4) The cls_flower classifier obtained a helper function in
   'net-next' called __fl_delete() and this overlapped with
   Daniel Borkamann's bug fix to use RCU for object destruction
   in 'net'.  It also overlapped with Jiri's change to guard
   the rhashtable_remove_fast() call with a check against
   tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some
   unrelated changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head()
   bug fix in 'net' overlapped with a large reorganization of
   the same code in 'net-next'.  Since the 'net-next' code no
   longer had the bug in question, there was nothing to do
   other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 12:29:53 -05:00
Ingo Molnar 1b95b1a06c Merge branch 'locking/urgent' into locking/core, to pick up dependent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-02 11:13:44 +01:00
Dmitry Vyukov 828347f8f9 kasan: support use-after-scope detection
Gcc revision 241896 implements use-after-scope detection.  Will be
available in gcc 7.  Support it in KASAN.

Gcc emits 2 new callbacks to poison/unpoison large stack objects when
they go in/out of scope.  Implement the callbacks and add a test.

[dvyukov@google.com: v3]
  Link: http://lkml.kernel.org/r/1479998292-144502-1-git-send-email-dvyukov@google.com
Link: http://lkml.kernel.org/r/1479226045-145148-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-30 16:32:52 -08:00
Chris Wilson f8ff04e2be lib/debugobjects: export for use in modules
Drivers, or other modules, that use a mixture of objects (especially
objects embedded within other objects) would like to take advantage of
the debugobjects facilities to help catch misuse.  Currently, the
debugobjects interface is only available to builtin drivers and requires
a set of EXPORT_SYMBOL_GPL for use by modules.

I am using the debugobjects in i915.ko to try and catch some invalid
operations on embedded objects.  The problem currently only presents
itself across module unload so forcing i915 to be builtin is not an
option.

Link: http://lkml.kernel.org/r/20161122143039.6433-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Du, Changbin" <changbin.du@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-30 16:32:52 -08:00
David S. Miller 0b42f25d2f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.

Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26 23:42:21 -05:00
Michael Ellerman 2513940989 locking/selftest: Fix output since KERN_CONT changes
Since the KERN_CONT changes the locking-selftest output is messed up, eg:

  ----------------------------------------------------------------------------
                                   | spin |wlock |rlock |mutex | wsem | rsem |
    --------------------------------------------------------------------------
                       A-A deadlock:
    ok  |
    ok  |
    ok  |
    ok  |
    ok  |
    ok  |

Use pr_cont() to get it looking normal again:

  ----------------------------------------------------------------------------
                                   | spin |wlock |rlock |mutex | wsem | rsem |
    --------------------------------------------------------------------------
                       A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |

Reported-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/1480027528-934-1-git-send-email-mpe@ellerman.id.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-25 07:12:19 +01:00
Andrey Ryabinin f5527fffff mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
This fixes CVE-2016-8650.

If mpi_powm() is given a zero exponent, it wants to immediately return
either 1 or 0, depending on the modulus.  However, if the result was
initalised with zero limb space, no limbs space is allocated and a
NULL-pointer exception ensues.

Fix this by allocating a minimal amount of limb space for the result when
the 0-exponent case when the result is 1 and not touching the limb space
when the result is 0.

This affects the use of RSA keys and X.509 certificates that carry them.

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
PGD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff8804011944c0 task.stack: ffff880401294000
RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
Stack:
 ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
 0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
 ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
Call Trace:
 [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
 [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
 [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
 [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
 [<ffffffff8132a95c>] rsa_verify+0x9d/0xee
 [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
 [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
 [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
 [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
 [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
 [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
 [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
 [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
 [<ffffffff812fe227>] SyS_add_key+0x154/0x19e
 [<ffffffff81001c2b>] do_syscall_64+0x80/0x191
 [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
 RSP <ffff880401297ad8>
CR2: 0000000000000000
---[ end trace d82015255d4a5d8d ]---

Basically, this is a backport of a libgcrypt patch:

	http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526

Fixes: cdec9cb516 ("crypto: GnuPG based MPI lib - source files (part 1)")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
cc: linux-ima-devel@lists.sourceforge.net
cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-11-25 12:57:50 +11:00
Ingo Molnar af91a81131 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Documentation updates, yet again just simple changes.

 - Miscellaneous fixes, including a change to call_rcu()'s
   rcu_head alignment check.

 - Security-motivated list consistency checks, which are
   disabled by default behind DEBUG_LIST.

 - Torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-23 10:04:28 +01:00
David S. Miller f9aa9dc7d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All conflicts were simple overlapping changes except perhaps
for the Thunder driver.

That driver has a change_mtu method explicitly for sending
a message to the hardware.  If that fails it returns an
error.

Normally a driver doesn't need an ndo_change_mtu method becuase those
are usually just range changes, which are now handled generically.
But since this extra operation is needed in the Thunder driver, it has
to stay.

However, if the message send fails we have to restore the original
MTU before the change because the entire call chain expects that if
an error is thrown by ndo_change_mtu then the MTU did not change.
Therefore code is added to nicvf_change_mtu to remember the original
MTU, and to restore it upon nicvf_update_hw_max_frs() failue.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22 13:27:16 -05:00
Ingo Molnar 02cb689b2c Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22 12:37:38 +01:00
Linus Torvalds 8d1a2408ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:

 1) With modern networking cards we can run out of 32-bit DMA space, so
    support 64-bit DMA addressing when possible on sparc64. From Dave
    Tushar.

 2) Some signal frame validation checks are inverted on sparc32, fix
    from Andreas Larsson.

 3) Lockdep tables can get too large in some circumstances on sparc64,
    add a way to adjust the size a bit. From Babu Moger.

 4) Fix NUMA node probing on some sun4v systems, from Thomas Tai.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: drop duplicate header scatterlist.h
  lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined
  config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc
  sunbmac: Fix compiler warning
  sunqe: Fix compiler warnings
  sparc64: Enable 64-bit DMA
  sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
  sparc64: Bind PCIe devices to use IOMMU v2 service
  sparc64: Initialize iommu_map_table and iommu_pool
  sparc64: Add ATU (new IOMMU) support
  sparc64: Add FORCE_MAX_ZONEORDER and default to 13
  sparc64: fix compile warning section mismatch in find_node()
  sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
  sparc64: Fix find_node warning if numa node cannot be found
2016-11-21 13:56:17 -08:00
Alexey Dobriyan 32d84cdcfd netlink: smaller nla_attr_minlen table
Length of a netlink attribute may be u16 but lengths of basic attributes
are much smaller, so small we can save 16 bytes of .rodata and pocket
change inside .text.

16-bit is worse on x86-64 than 8-bit because of operand size override prefix.

	add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-19 (-19)
	function                                     old     new   delta
	validate_nla                                 418     417      -1
	nla_policy_len                                66      64      -2
	nla_attr_minlen                               32      16     -16
	Total: Before=154865051, After=154865032, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-19 22:11:25 -05:00
Jonathan Corbet 917fef6f7e Linux 4.9-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYHmoCAAoJEHm+PkMAQRiG7RMIAI2i7Y5hpL5yCxK5AFaL4u/G
 KxXfp1B1UanUTgjOmd7zGqtDYcFX9t7GTTUFixQ7/9Opr4PD9qbnatoDGSc3xjbT
 msDgA1B78F1/Q3kHWfeGq32MihQ4mj5NwUCo+igUcUvvWG7mHgzErj/Nh5RoobQX
 p/izdpTbrw3GX6xXB8olbG7XWHaVye/+TT3q6+gmgm8I/QEujcLeGoycE0zlhPN8
 FG/JX76At/+ZM2Py7Oxo3k+oKL9CHrtOQYDp/wN0uslV5eYvvkZz0/M1HMOGZt+c
 gZU5jzM17K7C4Nzo06WAuBU9wUBGc25m+cPicLlOmljnzfU+f50SKaDjZq3p7QI=
 =2KUF
 -----END PGP SIGNATURE-----

Merge tag 'v4.9-rc4' into sound

Bring in -rc4 patches so I can successfully merge the sound doc changes.
2016-11-18 16:13:41 -07:00
Babu Moger e6b5f1be7a config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc
This new config parameter limits the space used for "Lock debugging:
prove locking correctness" by about 4MB. The current sparc systems have
the limitation of 32MB size for kernel size including .text, .data and
.bss sections. With PROVE_LOCKING feature, the kernel size could grow
beyond this limit and causing system boot-up issues. With this option,
kernel limits the size of the entries of lock_chains, stack_trace etc.,
so that kernel fits in required size limit. This is not visible to user
and only used for sparc.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 11:33:19 -08:00
Abhi Das 680bb946a1 fix iov_iter_advance() for ITER_PIPE
iov_iter_advance() needs to decrement iter->count by the number of
bytes we'd moved beyond.  Normal flavours do that, but ITER_PIPE
doesn't and ITER_PIPE generic_file_read_iter() for O_DIRECT files
ends up with a bogus fallback to page cache read, resulting in incorrect
values for file offset and bytes read.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-17 00:00:48 -05:00
Christian Borntraeger f2f09a4cee locking/core: Remove cpu_relax_lowlatency() users
With the s390 special case of a yielding cpu_relax() implementation gone,
we can now remove all users of cpu_relax_lowlatency() and replace them
with cpu_relax().

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16 10:15:10 +01:00
Paul E. McKenney 6190aaafd0 Merge branches 'doc.2016.11.14a', 'fixes.2016.11.14a', 'list.2016.10.31a' and 'torture.2016.11.14a' into HEAD
doc.2016.11.14a:  Documentation changes
fixes.2016.11.14aneous fixes
list.2016.10.31a:  List updates
torture.2016.11.14a:  Torture-test updates
2016-11-14 10:50:50 -08:00
Nikolay Borisov f2151a0a34 rcu: RCU_TRACE enables event tracing as well as debugfs
The commit brings the RCU_TRACE Kconfig option's help text up to date
by noting that it enables additional event tracing as well as debugfs.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
[ paulmck:  Do some wordsmithing. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2016-11-14 10:46:25 -08:00
Chris Wilson ae65a21fb8 lib/stackdepot: export save/fetch stack for drivers
Some drivers would like to record stacktraces in order to aide leak
tracing.  As stackdepot already provides a facility for only storing the
unique traces, thereby reducing the memory required, export that
functionality for use by drivers.

The code was originally created for KASAN and moved under lib in commit
cd11016e5f ("mm, kasan: stackdepot implementation.  Enable stackdepot
for SLAB") so that it could be shared with mm/.  In turn, we want to
share it now with drivers.

Link: http://lkml.kernel.org/r/20161108133209.22704-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11 08:12:37 -08:00
Alexander Duyck d29fa0cb76 swiotlb: Minor fix-ups for DMA_ATTR_SKIP_CPU_SYNC support
I am updating the paths so that instead of trying to pass
"attr | DMA_ATTR_SKIP_CPU_SYNC" we instead just OR the value into attr and
then pass it since attr will not be used after we make the unmap call.

I realized there was one spot I had missed when I was applying the DMA
attribute to the DMA mapping exception handling.  This change corrects that.

Finally it looks like there is a stray blank line at the end of the
swiotlb_unmap_sg_attrs function that can be dropped.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2016-11-10 15:56:19 -05:00
Sebastian Andrzej Siewior d544abd5ff lib/radix-tree: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161103145021.28528-6-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 23:45:27 +01:00
Sebastian Andrzej Siewior 5588f5afb4 lib/percpu_counter: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161103145021.28528-5-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 23:45:26 +01:00
Gayatri Kammela b9bf33a8bd lib/raid6: Add AVX2 optimized xor_syndrome functions
Implement the AVX2 optimization of RAID6 xor_syndrome functions which is
simply based on sse2.c written by hpa.

Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Yuanhan Liu <yuanhan.liu@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-07 15:08:20 -08:00
Alexander Duyck 0443fa003f swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNC
As a first step to making DMA_ATTR_SKIP_CPU_SYNC apply to architectures
beyond just ARM I need to make it so that the swiotlb will respect the
flag.  In order to do that I also need to update the swiotlb-xen since it
heavily makes use of the functionality.

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2016-11-07 15:06:33 -05:00
Alexander Duyck ebcf6f979d swiotlb: Drop unused functions swiotlb_map_sg and swiotlb_unmap_sg
There are no users for swiotlb_map_sg or swiotlb_unmap_sg so we might as
well just drop them.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by:  Christoph Hellwig <hch@infradead.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2016-11-07 15:06:31 -05:00
Geert Uytterhoeven 0d2e18985a swiotlb: Rate-limit printing when running out of SW-IOMMU space
If the system runs out of SW-IOMMU space, changes are high successive
requests will fail, too, flooding the kernel log.  This is true
especially for streaming DMA, which is typically used repeatedly outside
the driver's initialization routine.  Add rate-limiting to fix this.

While at it, get rid of the open-coded dev_name() handling by using the
appropriate dev_err_*() variant.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2016-11-07 15:06:26 -05:00
Christoph Hellwig 2f8b544477 block,fs: untangle fs.h and blk_types.h
Nothing in fs.h should require blk_types.h to be included.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01 09:43:26 -06:00
Arnd Bergmann 4520bcb2ba bug: Avoid Kconfig warning for BUG_ON_DATA_CORRUPTION
The CONFIG_DEBUG_LIST option is normally meant for kernel developers
rather than production machines and is guarded by CONFIG_DEBUG_KERNEL.

In contrast, the newly added CONFIG_BUG_ON_DATA_CORRUPTION is meant
for security hardening and may be used on systems that intentionally
do not enable CONFIG_DEBUG_KERNEL. In this configuration, we get
a warning from Kconfig about the mismatched dependencies:

warning: (BUG_ON_DATA_CORRUPTION) selects DEBUG_LIST which has unmet direct dependencies (DEBUG_KERNEL)

This annotates the DEBUG_LIST option to be selectable by
BUG_ON_DATA_CORRUPTION when DEBUG_KERNEL is disabled.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 40cd725cfc7f ("bug: Provide toggle for BUG on data corruption")
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-10-31 13:01:59 -07:00
Valentin Rothberg 91a6cee6e3 lib/Kconfig.debug: Fix typo in select statement
Commit 484f29c7430b3 ("bug: Provide toggle for BUG on data corruption")
added a Kconfig select statement on CONFIG_DEBUG_LIST, but the CONFIG_
prefix is only used in Make and C(PP) syntax.  Remove the CONFIG_ prefix
to correctly select the Kconfig option DEBUG_LIST.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
2016-10-31 13:01:59 -07:00
Kees Cook de54ebbe26 bug: Provide toggle for BUG on data corruption
The kernel checks for cases of data structure corruption under some
CONFIGs (e.g. CONFIG_DEBUG_LIST). When corruption is detected, some
systems may want to BUG() immediately instead of letting the system run
with known corruption.  Usually these kinds of manipulation primitives can
be used by security flaws to gain arbitrary memory write control. This
provides a new config CONFIG_BUG_ON_DATA_CORRUPTION and a corresponding
macro CHECK_DATA_CORRUPTION for handling these situations. Notably, even
if not BUGing, the kernel should not continue processing the corrupted
structure.

This is inspired by similar hardening by Syed Rameez Mustafa in MSM
kernels, and in PaX and Grsecurity, which is likely in response to earlier
removal of the BUG calls in commit 924d9addb9 ("list debugging: use
WARN() instead of BUG()").

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
2016-10-31 13:01:58 -07:00
Kees Cook 0cd340dcb0 list: Split list_del() debug checking into separate function
Similar to the list_add() debug consolidation, this commit consolidates
the debug checking performed during CONFIG_DEBUG_LIST into a new
__list_del_entry_valid() function, and stops list updates when corruption
is found.

Refactored from same hardening in PaX and Grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
2016-10-31 13:01:57 -07:00
Kees Cook 54acd4397d rculist: Consolidate DEBUG_LIST for list_add_rcu()
This commit consolidates the debug checking for list_add_rcu() into the
new single __list_add_valid() debug function.  Notably, this commit fixes
the sanity check that was added in commit 17a801f4bf ("list_debug:
WARN for adding something already in the list"), which wasn't checking
RCU-protected lists.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
2016-10-31 13:01:57 -07:00
Kees Cook d7c816733d list: Split list_add() debug checking into separate function
Right now, __list_add() code is repeated either in list.h or in
list_debug.c, but the only differences between the two versions
are the debug checks. This commit therefore extracts these debug
checks into a separate __list_add_valid() function and consolidates
__list_add(). Additionally this new __list_add_valid() function will stop
list manipulations if a corruption is detected, instead of allowing for
further corruption that may lead to even worse conditions.

This is slight refactoring of the same hardening done in PaX and Grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
2016-10-31 13:01:56 -07:00
Greg Kroah-Hartman bb41d2a51f Merge 4.9-rc3 into driver-core-next
We want the fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-30 06:43:43 -04:00
Linus Torvalds 2a26d99b25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Lots of fixes, mostly drivers as is usually the case.

   1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
      Khoroshilov.

   2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
      Pedersen.

   3) Don't put aead_req crypto struct on the stack in mac80211, from
      Ard Biesheuvel.

   4) Several uninitialized variable warning fixes from Arnd Bergmann.

   5) Fix memory leak in cxgb4, from Colin Ian King.

   6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.

   7) Several VRF semantic fixes from David Ahern.

   8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.

   9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.

  10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.

  11) Fix stale link state during failover in NCSCI driver, from Gavin
      Shan.

  12) Fix netdev lower adjacency list traversal, from Ido Schimmel.

  13) Propvide proper handle when emitting notifications of filter
      deletes, from Jamal Hadi Salim.

  14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.

  15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.

  16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.

  17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.

  18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
      Leitner.

  19) Revert a netns locking change that causes regressions, from Paul
      Moore.

  20) Add recursion limit to GRO handling, from Sabrina Dubroca.

  21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.

  22) Avoid accessing stale vxlan/geneve socket in data path, from
      Pravin Shelar"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
  geneve: avoid using stale geneve socket.
  vxlan: avoid using stale vxlan socket.
  qede: Fix out-of-bound fastpath memory access
  net: phy: dp83848: add dp83822 PHY support
  enic: fix rq disable
  tipc: fix broadcast link synchronization problem
  ibmvnic: Fix missing brackets in init_sub_crq_irqs
  ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
  Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
  arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
  net/mlx4_en: Save slave ethtool stats command
  net/mlx4_en: Fix potential deadlock in port statistics flow
  net/mlx4: Fix firmware command timeout during interrupt test
  net/mlx4_core: Do not access comm channel if it has not yet been initialized
  net/mlx4_en: Fix panic during reboot
  net/mlx4_en: Process all completions in RX rings after port goes up
  net/mlx4_en: Resolve dividing by zero in 32-bit system
  net/mlx4_core: Change the default value of enable_qos
  net/mlx4_core: Avoid setting ports to auto when only one port type is supported
  net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
  ...
2016-10-29 20:33:20 -07:00
Julia Lawall edbbf994bd kobject: improve function-level documentation
In the first case, rename the second variable to correspond to the name
found in the function parameter list.

In the remaining cases, reorder the variables to correspond to their order
in the parameter list.

Issue detected using Coccinelle (http://coccinelle.lip6.fr/)

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 02:42:10 -04:00
Daniel Mentz 62e931fac4 lib/genalloc.c: start search from start of chunk
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.

The shortcut

	if (size > atomic_read(&chunk->avail))
		continue;

makes the loop skip over chunks that do not have enough bytes left to
fulfill the request.  There are two situations, though, where an
allocation might still fail:

(1) The available memory is not contiguous, i.e.  the request cannot
    be fulfilled due to external fragmentation.

(2) A race condition.  Another thread runs the same code concurrently
    and is quicker to grab the available memory.

In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed.  This return value is then assigned to
start_bit.  The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched.  Today, the code fails to reset start_bit to 0.  As a
result, prefixes of subsequent chunks are ignored.  Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.

Fixes: 7f184275aa ("lib, Make gen_pool memory allocator lockless")
Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Dmitry Vyukov 02754e0a48 lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
second level).  Size of each stack is (num_frames + 3) * sizeof(long).
Which gives us ~84K stacks.  This capacity was chosen empirically and it
is enough to run kernel normally.

However, when lots of configs are enabled and a fuzzer tries to maximize
code coverage, it easily hits the limit within tens of minutes.  I've
tested for long a time with number of top level entries bumped 4x
(4096).  And I think I've seen overflow only once.  But I don't have all
configs enabled and code coverage has not reached maximum yet.  So bump
it 8x to 8192.

Since we have two-level table, memory cost of this is very moderate --
currently the top-level table is 8KB, with this patch it is 64KB, which
is negligible under KASAN.

Here is some approx math.

128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches.  Most of
alloc/free call sites are reachable with only one stack.  But some
utility functions can have large fanout.  Assuming average fanout is 5x,
total number of alloc/free stacks is ~300K.

Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Kees Cook 0e07f663c9 latent_entropy: raise CONFIG_FRAME_WARN by default
When building with the latent_entropy plugin, set the default
CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
blocks that, when instrumented by the latent_entropy plugin, grow beyond
1024 byte stack size on 32-bit builds.

Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Mauro Carvalho Chehab 8c27ceff36 docs: fix locations of several documents that got moved
The previous patch renamed several files that are cross-referenced
along the Kernel documentation. Adjust the links to point to
the right places.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-10-24 08:12:35 -02:00
Daniel Borkmann 0d906b1e8d bpf, test: fix ld_abs + vlan push/pop stress test
After commit 636c262808 ("net: skbuff: Remove errornous length
validation in skb_vlan_pop()") mentioned test case stopped working,
throwing a -12 (ENOMEM) return code. The issue however is not due to
636c262808, but rather due to a buggy test case that got uncovered
from the change in behaviour in 636c262808.

The data_size of that test case for the skb was set to 1. In the
bpf_fill_ld_abs_vlan_push_pop() handler bpf insns are generated that
loop with: reading skb data, pushing 68 tags, reading skb data,
popping 68 tags, reading skb data, etc, in order to force a skb
expansion and thus trigger that JITs recache skb->data. Problem is
that initial data_size is too small.

While before 636c262808, the test silently bailed out due to the
skb->len < VLAN_ETH_HLEN check with returning 0, and now throwing an
error from failing skb_ensure_writable(). Set at least minimum of
ETH_HLEN as an initial length so that on first push of data, equivalent
pop will succeed.

Fixes: 4d9c5c53ac ("test_bpf: add bpf_skb_vlan_push/pop() tests")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:39:06 -04:00
Tejun Heo 8bc4a04455 Merge branch 'for-4.9' into for-4.10 2016-10-19 12:12:40 -04:00
Linus Torvalds 9ffc66941d This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot time as
 possible, hoping to capitalize on any possible variation in CPU operation
 (due to runtime data differences, hardware differences, SMP ordering,
 thermal timing variation, cache behavior, etc).
 
 At the very least, this plugin is a much more comprehensive example for
 how to manipulate kernel code using the gcc plugin internals.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJX/BAFAAoJEIly9N/cbcAmzW8QALFbCs7EFFkML+M/M/9d8zEk
 1QbUs/z8covJTTT1PjSdw7JUrAMulI3S00owpcQVd/PcWjRPU80QwfsXBgIB0tvC
 Kub2qxn6Oaf+kTB646zwjFgjdCecw/USJP+90nfcu2+LCnE8ReclKd1aUee+Bnhm
 iDEUyH2ONIoWq6ta2Z9sA7+E4y2ZgOlmW0iga3Mnf+OcPtLE70fWPoe5E4g9DpYk
 B+kiPDrD9ql5zsHaEnKG1ldjiAZ1L6Grk8rGgLEXmbOWtTOFmnUhR+raK5NA/RCw
 MXNuyPay5aYPpqDHFm+OuaWQAiPWfPNWM3Ett4k0d9ZWLixTcD1z68AciExwk7aW
 SEA8b1Jwbg05ZNYM7NJB6t6suKC4dGPxWzKFOhmBicsh2Ni5f+Az0BQL6q8/V8/4
 8UEqDLuFlPJBB50A3z5ngCVeYJKZe8Bg/Swb4zXl6mIzZ9darLzXDEV6ystfPXxJ
 e1AdBb41WC+O2SAI4l64yyeswkGo3Iw2oMbXG5jmFl6wY/xGp7dWxw7gfnhC6oOh
 afOT54p2OUDfSAbJaO0IHliWoIdmE5ZYdVYVU9Ek+uWyaIwcXhNmqRg+Uqmo32jf
 cP5J9x2kF3RdOcbSHXmFp++fU+wkhBtEcjkNpvkjpi4xyA47IWS7lrVBBebrCq9R
 pa/A7CNQwibIV6YD8+/p
 =1dUK
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc plugins update from Kees Cook:
 "This adds a new gcc plugin named "latent_entropy". It is designed to
  extract as much possible uncertainty from a running system at boot
  time as possible, hoping to capitalize on any possible variation in
  CPU operation (due to runtime data differences, hardware differences,
  SMP ordering, thermal timing variation, cache behavior, etc).

  At the very least, this plugin is a much more comprehensive example
  for how to manipulate kernel code using the gcc plugin internals"

* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  latent_entropy: Mark functions with __latent_entropy
  gcc-plugins: Add latent_entropy plugin
2016-10-15 10:03:15 -07:00
Linus Torvalds b26b5ef5ec Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more misc uaccess and vfs updates from Al Viro:
 "The rest of the stuff from -next (more uaccess work) + assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  score: traps: Add missing include file to fix build error
  fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths
  fs/super.c: fix race between freeze_super() and thaw_super()
  overlayfs: Fix setting IOP_XATTR flag
  iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector()
  blackfin: no access_ok() for __copy_{to,from}_user()
  arm64: don't zero in __copy_from_user{,_inatomic}
  arm: don't zero in __copy_from_user_inatomic()/__copy_from_user()
  arc: don't leak bits of kernel stack into coredump
  alpha: get rid of tail-zeroing in __copy_user()
2016-10-14 18:19:05 -07:00
Vegard Nossum ffecee4f24 iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector()
Both import_iovec() and rw_copy_check_uvector() take an array
(typically small and on-stack) which is used to hold an iovec array copy
from userspace. This is to avoid an expensive memory allocation in the
fast path (i.e. few iovec elements).

The caller may have to check whether these functions actually used
the provided buffer or allocated a new one -- but this differs between
the too. Let's just add a kernel doc to clarify what the semantics are
for each function.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-14 20:00:34 -04:00
Linus Torvalds 5d89d9f502 linux-kselftest-4.9-rc1-update
This update consists of:
 
 - Fixes and improvements to existing tests
 - Moving code from Documentation to selftests, samples, and tools.
 
   Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and networking
   tests from Documentation to selftests.
 
   Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay, and
   blackfin examples from Documentation to samples.
 
   Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from
   Documentation to tools.
 
   Deletes BUILD_DOCSRC and its dependencies.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX/6zUAAoJEAsCRMQNDUMczIEP/0kH+yjJ3El4GYIokspR1/UU
 ++sy4XMzrD1UPy90v+ftcg4ss5R80r0v7EZ59k1UjDJSZ6WATHHGoZKCS2Dy3xcq
 i+0vm7Bawh7YWrXD3TunwaL97lwb2DdVTSxRXuU4Hfv+oVynUfh/+ZlCH6RCM2nm
 ZJE5PDYiq4nTVSRqFB2FyRE6yay5dPvpQ2ArwnSEw+ku4C+ZdGTGCWzS+aZBwZM/
 ykePkGLVRXz9FsWTCmipJzYu0Z/M4xEGlfXQZiiLG2HicbJNP6AqJImbQrANm+TW
 RFigYpofdhr9XG5TKTLIudaRt9qB6BE0mYEApZXH8U7NrHElfO9BBMEwzajl0V/2
 q/r5iej/CJult3zsfkhdHo7GLXpOaDLyoXiUI6UTgL0XOdWLAWTqDYx4JJz9sXxp
 B9dwKJeP5HLipk6FMkAHgJM90JKQFd/nLDKxeWexbMu/b/yQ2C9AR7NpdQ+c1X7I
 8W8UNEi/fnK75+r4t3NfeD2/5boq/jwujSKEMDQm/3R8L8EFYYb/TRoujFn89Na3
 wbZLV3hBL+KQ5lRyIx7X8RKyVJv1nlo9Wh57ItJed6zvGp5EmsI8w+DER2RfbO2c
 HR2JPDKSxmU8O2WBfDW5QoiPQH8Lssd147Ir0UFE7mwBXgWWsmxJxDpufizAXwyJ
 qnELJ9X3UFIdydtoObLr
 =60kH
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:
 "This update consists of:

   - Fixes and improvements to existing tests

   - Moving code from Documentation to selftests, samples, and tools:

     * Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and
       networking tests from Documentation to selftests.

     * Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay,
       and blackfin examples from Documentation to samples.

     * Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from
       Documentation to tools.

     * Deletes BUILD_DOCSRC and its dependencies"

* tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (21 commits)
  selftests/futex: Check ANSI terminal color support
  Doc: update 00-INDEX files to reflect the runnable code move
  samples: move blackfin gptimers-example from Documentation
  tools: move pcmcia crc32hash tool from Documentation
  tools: move laptops dslm tool from Documentation
  tools: move accounting tool from Documentation
  samples: move auxdisplay example code from Documentation
  samples: move watchdog example code from Documentation
  samples: move timers example code from Documentation
  samples: move misc-devices/mei example code from Documentation
  samples: move mic/mpssd example code from Documentation
  selftests: Move networking/timestamping from Documentation
  selftests: move watchdog tests from Documentation/watchdog
  selftests: move ia64 tests from Documentation/ia64
  selftests: move vDSO tests from Documentation/vDSO
  selftests: move ptp tests from Documentation/ptp
  selftests: move prctl tests from Documentation/prctl
  selftests: move dnotify_test from Documentation/filesystems
  selftests/timers: Add missing error code assignment before test
  selftests/zram: replace ZRAM_LZ4_COMPRESS
  ...
2016-10-14 15:17:12 -07:00
Linus Torvalds b6daa51b9a Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:

 - Nick improved generic implementations of percpu operations which
   modify the variable and return so that they calculate the physical
   address only once.

 - percpu_ref percpu <-> atomic mode switching improvements. The
   patchset was originally posted about a year ago but fell through the
   crack.

 - misc non-critical fixes.

* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
  mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk()
  percpu: eliminate two sparse warnings
  percpu: improve generic percpu modify-return implementation
  percpu-refcount: init ->confirm_switch member properly
  percpu_ref: allow operation mode switching operations to be called concurrently
  percpu_ref: restructure operation mode switching
  percpu_ref: unify staggered atomic switching wait behavior
  percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic()
  percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation
2016-10-14 11:46:25 -07:00
Alexander Potapenko 65deb8af76 kcov: do not instrument lib/stackdepot.c
There's no point in collecting coverage from lib/stackdepot.c, as it is
not a function of syscall inputs.  Disabling kcov instrumentation for that
file will reduce the coverage noise level.

Link: http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:32 -07:00
Noam Camus 2d13e6ca42 lib/bitmap.c: enhance bitmap syntax
Today there are platforms with many CPUs (up to 4K).  Trying to boot only
part of the CPUs may result in too long string.

For example lets take NPS platform that is part of arch/arc.  This
platform have SMP system with 256 cores each with 16 HW threads (SMT
machine) where HW thread appears as CPU to the kernel.  In this example
there is total of 4K CPUs.  When one tries to boot only part of the HW
threads from each core the string representing the map may be long...  For
example if for sake of performance we decided to boot only first half of
HW threads of each core the map will look like:
0-7,16-23,32-39,...,4080-4087

This patch introduce new syntax to accommodate with such use case.  I
added an optional postfix to a range of CPUs which will choose according
to given modulo the desired range of reminders i.e.:

    <cpus range>:sed_size/group_size

For example, above map can be described in new syntax like this:
0-4095:8/16

Note that this patch is backward compatible with current syntax.

[akpm@linux-foundation.org: rework documentation]
Link: http://lkml.kernel.org/r/1473579629-4283-1-git-send-email-noamca@mellanox.com
Signed-off-by: Noam Camus <noamca@mellanox.com>
Cc: David Decotigny <decot@googlers.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:30 -07:00
Alexey Dobriyan 8cfd56d479 lib/kstrtox.c: smaller _parse_integer()
Set "overflow" bit upon encountering it instead of postponing to the end
of the conversion. Somehow gcc unwedges itself and generates better code:

	$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
	_parse_integer                      177     139     -38

Inspired by patch from Zhaoxiu Zeng.

Link: http://lkml.kernel.org/r/20160826221920.GA1909@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:30 -07:00
Mark Rutland bf90e56e46 lib: harden strncpy_from_user
The strncpy_from_user() accessor is effectively a copy_from_user()
specialised to copy strings, terminating early at a NUL byte if possible.
In other respects it is identical, and can be used to copy an arbitrarily
large buffer from userspace into the kernel.  Conceptually, it exposes a
similar attack surface.

As with copy_from_user(), we check the destination range when the kernel
is built with KASAN, but unlike copy_from_user() we do not check the
destination buffer when using HARDENED_USERCOPY.  As strncpy_from_user()
calls get_user() in a loop, we must call check_object_size() explicitly.

This patch adds this instrumentation to strncpy_from_user(), per the same
rationale as with the regular copy_from_user().  In the absence of
hardened usercopy this will have no impact as the instrumentation expands
to an empty static inline function.

Link: http://lkml.kernel.org/r/1472221903-31181-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:30 -07:00
Al Viro 1689c73a73 Fix off-by-one in __pipe_get_pages()
it actually worked only when requested area ended on the page boundary...

Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 10:40:01 -07:00
Emese Revfy 0766f788eb latent_entropy: Mark functions with __latent_entropy
The __latent_entropy gcc attribute can be used only on functions and
variables.  If it is on a function then the plugin will instrument it for
gathering control-flow entropy. If the attribute is on a variable then
the plugin will initialize it with random contents.  The variable must
be an integer, an integer array type or a structure with integer fields.

These specific functions have been selected because they are init
functions (to help gather boot-time entropy), are called at unpredictable
times, or they have variable loops, each of which provide some level of
latent entropy.

Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: expanded commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-10-10 14:51:45 -07:00
Linus Torvalds abb5a14fa2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted misc bits and pieces.

  There are several single-topic branches left after this (rename2
  series from Miklos, current_time series from Deepa Dinamani, xattr
  series from Andreas, uaccess stuff from from me) and I'd prefer to
  send those separately"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
  proc: switch auxv to use of __mem_open()
  hpfs: support FIEMAP
  cifs: get rid of unused arguments of CIFSSMBWrite()
  posix_acl: uapi header split
  posix_acl: xattr representation cleanups
  fs/aio.c: eliminate redundant loads in put_aio_ring_file
  fs/internal.h: add const to ns_dentry_operations declaration
  compat: remove compat_printk()
  fs/buffer.c: make __getblk_slow() static
  proc: unsigned file descriptors
  fs/file: more unsigned file descriptors
  fs: compat: remove redundant check of nr_segs
  cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
  cifs: don't use memcpy() to copy struct iov_iter
  get rid of separate multipage fault-in primitives
  fs: Avoid premature clearing of capabilities
  fs: Give dentry to inode_change_ok() instead of inode
  fuse: Propagate dentry down to inode_change_ok()
  ceph: Propagate dentry down to inode_change_ok()
  xfs: Propagate dentry down to inode_change_ok()
  ...
2016-10-10 13:04:49 -07:00
Shuah Khan 1848929251 samples: move blackfin gptimers-example from Documentation
Move blackfin gptimers-example to samples and remove it from Documentation
Makefile. Update samples Kconfig and Makefile to build gptimers-example.

blackfin is the last CONFIG_BUILD_DOCSRC target in Documentation/Makefile.
Hence this patch also includes changes to remove CONFIG_BUILD_DOCSRC from
Makefile and lib/Kconfig.debug and updates VIDEO_PCI_SKELETON dependency
on BUILD_DOCSRC.

Documentation/Makefile is not deleted to avoid braking make htmldocs and
make distclean.

Acked-by: Michal Marek <mmarek@suse.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2016-10-10 07:12:02 -06:00
Linus Torvalds b66484cd74 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - fsnotify updates

 - ocfs2 updates

 - all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits)
  console: don't prefer first registered if DT specifies stdout-path
  cred: simpler, 1D supplementary groups
  CREDITS: update Pavel's information, add GPG key, remove snail mail address
  mailmap: add Johan Hovold
  .gitattributes: set git diff driver for C source code files
  uprobes: remove function declarations from arch/{mips,s390}
  spelling.txt: "modeled" is spelt correctly
  nmi_backtrace: generate one-line reports for idle cpus
  arch/tile: adopt the new nmi_backtrace framework
  nmi_backtrace: do a local dump_stack() instead of a self-NMI
  nmi_backtrace: add more trigger_*_cpu_backtrace() methods
  min/max: remove sparse warnings when they're nested
  Documentation/filesystems/proc.txt: add more description for maps/smaps
  mm, proc: fix region lost in /proc/self/smaps
  proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
  proc: add LSM hook checks to /proc/<tid>/timerslack_ns
  proc: relax /proc/<tid>/timerslack_ns capability requirements
  meminfo: break apart a very long seq_printf with #ifdefs
  seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
  proc: faster /proc/*/status
  ...
2016-10-07 21:38:00 -07:00
Chris Metcalf 6727ad9e20 nmi_backtrace: generate one-line reports for idle cpus
When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative.  Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.

This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.

Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:30 -07:00
Chris Metcalf 6776648952 nmi_backtrace: do a local dump_stack() instead of a self-NMI
Currently on arm there is code that checks whether it should call
dump_stack() explicitly, to avoid trying to raise an NMI when the
current context is not preemptible by the backtrace IPI.  Similarly, the
forthcoming arch/tile support uses an IPI mechanism that does not
support generating an NMI to self.

Accordingly, move the code that guards this case into the generic
mechanism, and invoke it unconditionally whenever we want a backtrace of
the current cpu.  It seems plausible that in all cases, dump_stack()
will generate better information than generating a stack from the NMI
handler.  The register state will be missing, but that state is likely
not particularly helpful in any case.

Or, if we think it is helpful, we should be capturing and emitting the
current register state in all cases when regs == NULL is passed to
nmi_cpu_backtrace().

Link: http://lkml.kernel.org/r/1472487169-14923-3-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Aaron Tomlin <atomlin@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:30 -07:00
Chris Metcalf 9a01c3ed5c nmi_backtrace: add more trigger_*_cpu_backtrace() methods
Patch series "improvements to the nmi_backtrace code" v9.

This patch series modifies the trigger_xxx_backtrace() NMI-based remote
backtracing code to make it more flexible, and makes a few small
improvements along the way.

The motivation comes from the task isolation code, where there are
scenarios where we want to be able to diagnose a case where some cpu is
about to interrupt a task-isolated cpu.  It can be helpful to see both
where the interrupting cpu is, and also an approximation of where the
cpu that is being interrupted is.  The nmi_backtrace framework allows us
to discover the stack of the interrupted cpu.

I've tested that the change works as desired on tile, and build-tested
x86, arm, mips, and sparc64.  For x86 I confirmed that the generic
cpuidle stuff as well as the architecture-specific routines are in the
new cpuidle section.  For arm, mips, and sparc I just build-tested it
and made sure the generic cpuidle routines were in the new cpuidle
section, but I didn't attempt to figure out which the platform-specific
idle routines might be.  That might be more usefully done by someone
with platform experience in follow-up patches.

This patch (of 4):

Currently you can only request a backtrace of either all cpus, or all
cpus but yourself.  It can also be helpful to request a remote backtrace
of a single cpu, and since we want that, the logical extension is to
support a cpumask as the underlying primitive.

This change modifies the existing lib/nmi_backtrace.c code to take a
cpumask as its basic primitive, and modifies the linux/nmi.h code to use
the new "cpumask" method instead.

The existing clients of nmi_backtrace (arm and x86) are converted to
using the new cpumask approach in this change.

The other users of the backtracing API (sparc64 and mips) are converted
to use the cpumask approach rather than the all/allbutself approach.
The mips code ignored the "include_self" boolean but with this change it
will now also dump a local backtrace if requested.

Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:30 -07:00
Vineet Gupta 51a021244b atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
This came to light when implementing native 64-bit atomics for ARCv2.

The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
to check whether atomic64_dec_if_positive() is available.  It seems it
was needed when not every arch defined it.  However as of current code
the Kconfig option seems needless

 - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
   generic definition of API is present lib/atomic64.c
 - arches with native 64-bit atomics select it in arch/*/Kconfig and
   define the API in their headers

So I see no point in keeping the Kconfig option

Compile tested for:
 - blackfin (CONFIG_GENERIC_ATOMIC64)
 - x86 (!CONFIG_GENERIC_ATOMIC64)
 - ia64

Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:30 -07:00
Linus Torvalds d1f5323370 Merge branch 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS splice updates from Al Viro:
 "There's a bunch of branches this cycle, both mine and from other folks
  and I'd rather send pull requests separately.

  This one is the conversion of ->splice_read() to ITER_PIPE iov_iter
  (and introduction of such). Gets rid of a lot of code in fs/splice.c
  and elsewhere; there will be followups, but these are for the next
  cycle...  Some pipe/splice-related cleanups from Miklos in the same
  branch as well"

* 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  pipe: fix comment in pipe_buf_operations
  pipe: add pipe_buf_steal() helper
  pipe: add pipe_buf_confirm() helper
  pipe: add pipe_buf_release() helper
  pipe: add pipe_buf_get() helper
  relay: simplify relay_file_read()
  switch default_file_splice_read() to use of pipe-backed iov_iter
  switch generic_file_splice_read() to use of ->read_iter()
  new iov_iter flavour: pipe-backed
  fuse_dev_splice_read(): switch to add_to_pipe()
  skb_splice_bits(): get rid of callback
  new helper: add_to_pipe()
  splice: lift pipe_lock out of splice_to_pipe()
  splice: switch get_iovec_page_array() to iov_iter
  splice_to_pipe(): don't open-code wakeup_pipe_readers()
  consistent treatment of EFAULT on O_DIRECT read/write
2016-10-07 15:36:58 -07:00
Linus Torvalds 513a4befae Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the main pull request for block layer changes in 4.9.

  As mentioned at the last merge window, I've changed things up and now
  do just one branch for core block layer changes, and driver changes.
  This avoids dependencies between the two branches. Outside of this
  main pull request, there are two topical branches coming as well.

  This pull request contains:

   - A set of fixes, and a conversion to blk-mq, of nbd. From Josef.

   - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd.
     Followup dependency fix from Geert.

   - General fixes from Bart, Baoyou, Guoqing, and Linus W.

   - CFQ async write starvation fix from Glauber.

   - Add supprot for delayed kick of the requeue list, from Mike.

   - Pull out the scalable bitmap code from blk-mq-tag.c and make it
     generally available under the name of sbitmap. Only blk-mq-tag uses
     it for now, but the blk-mq scheduling bits will use it as well.
     From Omar.

   - bdev thaw error progagation from Pierre.

   - Improve the blk polling statistics, and allow the user to clear
     them. From Stephen.

   - Set of minor cleanups from Christoph in block/blk-mq.

   - Set of cleanups and optimizations from me for block/blk-mq.

   - Various nvme/nvmet/nvmeof fixes from the various folks"

* 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits)
  fs/block_dev.c: return the right error in thaw_bdev()
  nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
  nvme/scsi: Remove power management support
  nvmet: Make dsm number of ranges zero based
  nvmet: Use direct IO for writes
  admin-cmd: Added smart-log command support.
  nvme-fabrics: Add host_traddr options field to host infrastructure
  nvme-fabrics: revise host transport option descriptions
  nvme-fabrics: rework nvmf_get_address() for variable options
  nbd: use BLK_MQ_F_BLOCKING
  blkcg: Annotate blkg_hint correctly
  cfq: fix starvation of asynchronous writes
  blk-mq: add flag for drivers wanting blocking ->queue_rq()
  blk-mq: remove non-blocking pass in blk_mq_map_request
  blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
  block: export bio_free_pages to other modules
  lightnvm: propagate device_add() error code
  lightnvm: expose device geometry through sysfs
  lightnvm: control life of nvm_dev in driver
  blk-mq: register device instead of disk
  ...
2016-10-07 14:42:05 -07:00
Linus Torvalds 2ab704a47e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "The usual rocket science from the trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  tracing/syscalls: fix multiline in error message text
  lib/Kconfig.debug: fix DEBUG_SECTION_MISMATCH description
  doc: vfs: fix fadvise() sycall name
  x86/entry: spell EBX register correctly in documentation
  securityfs: fix securityfs_create_dir comment
  irq: Fix typo in tracepoint.xml
2016-10-07 12:24:08 -07:00
Linus Torvalds c23112e039 Merge tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD updates from Shaohua Li:
 "This update includes:

   - new AVX512 instruction based raid6 gen/recovery algorithm

   - a couple of md-cluster related bug fixes

   - fix a potential deadlock

   - set nonrotational bit for raid array with SSD

   - set correct max_hw_sectors for raid5/6, which hopefuly can improve
     performance a little bit

   - other minor fixes"

* tag 'md/4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md: set rotational bit
  raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
  raid5: handle register_shrinker failure
  raid5: fix to detect failure of register_shrinker
  md: fix a potential deadlock
  md/bitmap: fix wrong cleanup
  raid5: allow arbitrary max_hw_sectors
  lib/raid6: Add AVX512 optimized xor_syndrome functions
  lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
  lib/raid6: Add AVX512 optimized recovery functions
  lib/raid6: Add AVX512 optimized gen_syndrome functions
  md-cluster: make resync lock also could be interruptted
  md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  md-cluster: convert the completion to wait queue
  md-cluster: protect md_find_rdev_nr_rcu with rcu lock
  md-cluster: clean related infos of cluster
  md: changes for MD_STILL_CLOSED flag
  md-cluster: remove some unnecessary dlm_unlock_sync
  md-cluster: use FORCEUNLOCK in lockres_free
  md-cluster: call md_kick_rdev_from_array once ack failed
2016-10-07 09:45:43 -07:00
Linus Torvalds 553911c67e dmaengine updates for 4.8-rc1
This is bit large pile of code which bring in some nice additions:
  - Error reporting: we have added a new mechanism for users of dmaenegine to
    register a callback_result which tells them the result of the dma
    transaction. Right now only one user ntb is using it.
  - As we discussed on KS mailing list and pointed out NO_IRQ has no place in
    kernel, this also remove NO_IRQ from dmaengine subsystem (both arm and
    ppc users)
  - Support for IOMMU slave transfers and it implementation for arm.
  - To get better build coverage, enable COMPILE_TEST for bunch of driver,
    and fix the warning and sparse complaints on these.
  - Apart from above, usual updates spread across drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX9cGYAAoJEHwUBw8lI4NHXh0P/3OsctPYnwcangOz268hHDap
 7ZHwau96K7DRi8cFCc0XmG083Ivqih/fWMFJBUOEsuwS3zPHkfgfhsvm7MqrK3vv
 psJIwnubwTVVQ3lePYJlnna6mijcRNXVAooRLiqylA3QPIYRxECDFVDRNwf39D+I
 bYp5tmlFcobugOUUoMqq1D/gH8EHUWxrnrsS6UBBpYm+cusc6u9/JXlOb4pcJGSL
 V340zQ0S9FNuEM3b+1kMAeq3DG2wLXv9oJzz/6EN59sx5AdjlYUPHd/PvTYOeG0T
 crdtDfL+7xcqP0Ms4SGTOD4kXSe6nErr3bIBHQXI6ZmJn0j//+3yU21kTMl95kM+
 RM7nE4vItuQR0jPxVlhuLCcf3q7zMi+noOPZ1DVRTE1Yf9AizAgbPXyOE+jzGUUi
 6E+0Mj6CLpFH/Mffxphs7L6GKwfWqaLjAupbjR6EWZud37KAwvpcB1CkJEgT9C4s
 OiZ4INTPxXmw9dX/T9CPOyh8oZ8mB9LTUzHoJDvDGuwYm7HE0U9pzHG4bP0mjIIt
 y3RboP78t1HC9oZUrxCoGhvekJtok0k3RLGJTSx9ujklY9MJGG/F1KEC6APp5tXu
 0UToMXpgXSUkKEZesmsJFj/lbh1+h/yo5zTG5Hek8lh1K0sczaoWu3xTTSY9SSZQ
 ihlqyvdzSBweKo8ktU8A
 =9iA3
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-4.9-rc1' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine updates from Vinod Koul:
 "This is bit large pile of code which bring in some nice additions:

   - Error reporting: we have added a new mechanism for users of
     dmaenegine to register a callback_result which tells them the
     result of the dma transaction. Right now only one user (ntb) is
     using it.

   - As we discussed on KS mailing list and pointed out NO_IRQ has no
     place in kernel, this also remove NO_IRQ from dmaengine subsystem
     (both arm and ppc users)

   - Support for IOMMU slave transfers and its implementation for arm.

   - To get better build coverage, enable COMPILE_TEST for bunch of
     driver, and fix the warning and sparse complaints on these.

   - Apart from above, usual updates spread across drivers"

* tag 'dmaengine-4.9-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (169 commits)
  async_pq_val: fix DMA memory leak
  dmaengine: virt-dma: move function declarations
  dmaengine: omap-dma: Enable burst and data pack for SG
  DT: dmaengine: rcar-dmac: document R8A7743/5 support
  dmaengine: fsldma: Unmap region obtained by of_iomap
  dmaengine: jz4780: fix resource leaks on error exit return
  dma-debug: fix ia64 build, use PHYS_PFN
  dmaengine: coh901318: fix integer overflow when shifting more than 32 places
  dmaengine: edma: avoid uninitialized variable use
  dma-mapping: fix m32r build warning
  dma-mapping: fix ia64 build, use PHYS_PFN
  dmaengine: ti-dma-crossbar: enable COMPILE_TEST
  dmaengine: omap-dma: enable COMPILE_TEST
  dmaengine: edma: enable COMPILE_TEST
  dmaengine: ti-dma-crossbar: Fix of_device_id data parameter usage
  dmaengine: ti-dma-crossbar: Correct type for of_find_property() third parameter
  dmaengine/ARM: omap-dma: Fix the DMAengine compile test on non OMAP configs
  dmaengine: edma: Rename set_bits and remove unused clear_bits helper
  dmaengine: edma: Use correct type for of_find_property() third parameter
  dmaengine: edma: Fix of_device_id data parameter usage (legacy vs TPCC)
  ...
2016-10-06 17:13:54 -07:00
Miklos Szeredi a779638cf6 pipe: add pipe_buf_release() helper
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05 18:23:58 -04:00
Al Viro 241699cd72 new iov_iter flavour: pipe-backed
iov_iter variant for passing data into pipe.  copy_to_iter()
copies data into page(s) it has allocated and stuffs them into
the pipe; copy_page_to_iter() stuffs there a reference to the
page given to it.  Both will try to coalesce if possible.
iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages()
and friends will do as copy_to_iter() would have and return the
pages where the data would've been copied.  iov_iter_advance()
will truncate everything past the spot it has advanced to.

New primitive: iov_iter_pipe(), used for initializing those.
pipe should be locked all along.

Running out of space acts as fault would for iovec-backed ones;
in other words, giving it to ->read_iter() may result in short
read if the pipe overflows, or -EFAULT if it happens with nothing
copied there.

In other words, ->read_iter() on those acts pretty much like
->splice_read().  Moreover, all generic_file_splice_read() users,
as well as many other ->splice_read() instances can be switched
to that scheme - that'll happen in the next commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05 18:23:36 -04:00
Linus Torvalds 687ee0ad4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and
    co. at Google. https://lwn.net/Articles/701165/

 2) Do TCP Small Queues for retransmits, from Eric Dumazet.

 3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei
    Starovoitov.

 4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai.

 5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn.

 6) Support GMAC protocol in iwlwifi mwm, from Ayala Beker.

 7) Support ndo_poll_controller in mlx5, from Calvin Owens.

 8) Move VRF processing to an output hook and allow l3mdev to be
    loopback, from David Ahern.

 9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern.

10) Congestion control in RXRPC, from David Howells.

11) Support geneve RX offload in ixgbe, from Emil Tantilov.

12) When hitting pressure for new incoming TCP data SKBs, perform a
    partial rathern than a full purge of the OFO queue (which could be
    huge). From Eric Dumazet.

13) Convert XFRM state and policy lookups to RCU, from Florian Westphal.

14) Support RX network flow classification to igb, from Gangfeng Huang.

15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski.

16) New skbmod packet action, from Jamal Hadi Salim.

17) Remove some inefficiencies in snmp proc output, from Jia He.

18) Add FIB notifications to properly propagate route changes to
    hardware which is doing forwarding offloading. From Jiri Pirko.

19) New dsa driver for qca8xxx chips, from John Crispin.

20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej
    Żenczykowski.

21) Add L3 mode to ipvlan, from Mahesh Bandewar.

22) Support 802.1ad in mlx4, from Moshe Shemesh.

23) Support hardware LRO in mediatek driver, from Nelson Chang.

24) Add TC offloading to mlx5, from Or Gerlitz.

25) Convert various drivers to ethtool ksettings interfaces, from
    Philippe Reynes.

26) TX max rate limiting for cxgb4, from Rahul Lakkireddy.

27) NAPI support for ath10k, from Rajkumar Manoharan.

28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed.

29) UDP replicast support in TIPC, from Richard Alpe.

30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru.

31) Support BQL in thunderx driver, from Sunil Goutham.

32) TSO support in alx driver, from Tobias Regnery.

33) Add stream parser engine and use it in kcm.

34) Support async DHCP replies in ipconfig module, from Uwe
    Kleine-König.

35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits)
  mlxsw: switchx2: Fix misuse of hard_header_len
  mlxsw: spectrum: Fix misuse of hard_header_len
  net/faraday: Stop NCSI device on shutdown
  net/ncsi: Introduce ncsi_stop_dev()
  net/ncsi: Rework the channel monitoring
  net/ncsi: Allow to extend NCSI request properties
  net/ncsi: Rework request index allocation
  net/ncsi: Don't probe on the reserved channel ID (0x1f)
  net/ncsi: Introduce NCSI_RESERVED_CHANNEL
  net/ncsi: Avoid unused-value build warning from ia64-linux-gcc
  net: Add netdev all_adj_list refcnt propagation to fix panic
  net: phy: Add Edge-rate driver for Microsemi PHYs.
  vmxnet3: Wake queue from reset work
  i40e: avoid NULL pointer dereference and recursive errors on early PCI error
  qed: Add RoCE ll2 & GSI support
  qed: Add support for memory registeration verbs
  qed: Add support for QP verbs
  qed: PD,PKEY and CQ verb support
  qed: Add support for RoCE hw init
  qede: Add qedr framework
  ...
2016-10-05 10:11:24 -07:00
Johannes Weiner d3798ae8c6 mm: filemap: don't plant shadow entries without radix tree node
When the underflow checks were added to workingset_node_shadow_dec(),
they triggered immediately:

  kernel BUG at ./include/linux/swap.h:276!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
   soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt
  CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60ba944 #1
  Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
  task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
  RIP: page_cache_tree_insert+0xf1/0x100
  Call Trace:
    __add_to_page_cache_locked+0x12e/0x270
    add_to_page_cache_lru+0x4e/0xe0
    mpage_readpages+0x112/0x1d0
    blkdev_readpages+0x1d/0x20
    __do_page_cache_readahead+0x1ad/0x290
    force_page_cache_readahead+0xaa/0x100
    page_cache_sync_readahead+0x3f/0x50
    generic_file_read_iter+0x5af/0x740
    blkdev_read_iter+0x35/0x40
    __vfs_read+0xe1/0x130
    vfs_read+0x96/0x130
    SyS_read+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x13/0x8f
  Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00
  RIP  page_cache_tree_insert+0xf1/0x100

This is a long-standing bug in the way shadow entries are accounted in
the radix tree nodes. The shrinker needs to know when radix tree nodes
contain only shadow entries, no pages, so node->count is split in half
to count shadows in the upper bits and pages in the lower bits.

Unfortunately, the radix tree implementation doesn't know of this and
assumes all entries are in node->count. When there is a shadow entry
directly in root->rnode and the tree is later extended, the radix tree
implementation will copy that entry into the new node and and bump its
node->count, i.e. increases the page count bits. Once the shadow gets
removed and we subtract from the upper counter, node->count underflows
and triggers the warning. Afterwards, without node->count reaching 0
again, the radix tree node is leaked.

Limit shadow entries to when we have actual radix tree nodes and can
count them properly. That means we lose the ability to detect refaults
from files that had only the first page faulted in at eviction time.

Fixes: 449dd6984d ("mm: keep page cache radix tree nodes in check")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-05 09:17:56 -07:00
Linus Torvalds e46cae4418 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "The new features and main improvements in this merge for v4.9

   - Support for the UBSAN sanitizer

   - Set HAVE_EFFICIENT_UNALIGNED_ACCESS, it improves the code in some
     places

   - Improvements for the in-kernel fpu code, in particular the overhead
     for multiple consecutive in kernel fpu users is recuded

   - Add a SIMD implementation for the RAID6 gen and xor operations

   - Add RAID6 recovery based on the XC instruction

   - The PCI DMA flush logic has been improved to increase the speed of
     the map / unmap operations

   - The time synchronization code has seen some updates

  And bug fixes all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
  s390/con3270: fix insufficient space padding
  s390/con3270: fix use of uninitialised data
  MAINTAINERS: update DASD maintainer
  s390/cio: fix accidental interrupt enabling during resume
  s390/dasd: add missing \n to end of dev_err messages
  s390/config: Enable config options for Docker
  s390/dasd: make query host access interruptible
  s390/dasd: fix panic during offline processing
  s390/dasd: fix hanging offline processing
  s390/pci_dma: improve lazy flush for unmap
  s390/pci_dma: split dma_update_trans
  s390/pci_dma: improve map_sg
  s390/pci_dma: simplify dma address calculation
  s390/pci_dma: remove dma address range check
  iommu/s390: simplify registration of I/O address translation parameters
  s390: migrate exception table users off module.h and onto extable.h
  s390: export header for CLP ioctl
  s390/vmur: fix irq pointer dereference in int handler
  s390/dasd: add missing KOBJ_CHANGE event for unformatted devices
  s390: enable UBSAN
  ...
2016-10-04 14:05:52 -07:00
Linus Torvalds 597f03f9d1 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug updates from Thomas Gleixner:
 "Yet another batch of cpu hotplug core updates and conversions:

   - Provide core infrastructure for multi instance drivers so the
     drivers do not have to keep custom lists.

   - Convert custom lists to the new infrastructure. The block-mq custom
     list conversion comes through the block tree and makes the diffstat
     tip over to more lines removed than added.

   - Handle unbalanced hotplug enable/disable calls more gracefully.

   - Remove the obsolete CPU_STARTING/DYING notifier support.

   - Convert another batch of notifier users.

   The relayfs changes which conflicted with the conversion have been
   shipped to me by Andrew.

   The remaining lot is targeted for 4.10 so that we finally can remove
   the rest of the notifiers"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  cpufreq: Fix up conversion to hotplug state machine
  blk/mq: Reserve hotplug states for block multiqueue
  x86/apic/uv: Convert to hotplug state machine
  s390/mm/pfault: Convert to hotplug state machine
  mips/loongson/smp: Convert to hotplug state machine
  mips/octeon/smp: Convert to hotplug state machine
  fault-injection/cpu: Convert to hotplug state machine
  padata: Convert to hotplug state machine
  cpufreq: Convert to hotplug state machine
  ACPI/processor: Convert to hotplug state machine
  virtio scsi: Convert to hotplug state machine
  oprofile/timer: Convert to hotplug state machine
  block/softirq: Convert to hotplug state machine
  lib/irq_poll: Convert to hotplug state machine
  x86/microcode: Convert to hotplug state machine
  sh/SH-X3 SMP: Convert to hotplug state machine
  ia64/mca: Convert to hotplug state machine
  ARM/OMAP/wakeupgen: Convert to hotplug state machine
  ARM/shmobile: Convert to hotplug state machine
  arm64/FP/SIMD: Convert to hotplug state machine
  ...
2016-10-03 19:43:08 -07:00
Linus Torvalds 1a4a2bc460 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull low-level x86 updates from Ingo Molnar:
 "In this cycle this topic tree has become one of those 'super topics'
  that accumulated a lot of changes:

   - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on
     x86 - preceded by an array of changes. v4.8 saw preparatory changes
     in this area already - this is the rest of the work. Includes the
     thread stack caching performance optimization. (Andy Lutomirski)

   - switch_to() cleanups and all around enhancements. (Brian Gerst)

   - A large number of dumpstack infrastructure enhancements and an
     unwinder abstraction. The secret long term plan is safe(r) live
     patching plus maybe another attempt at debuginfo based unwinding -
     but all these current bits are standalone enhancements in a frame
     pointer based debug environment as well. (Josh Poimboeuf)

   - More __ro_after_init and const annotations. (Kees Cook)

   - Enable KASLR for the vmemmap memory region. (Thomas Garnier)"

[ The virtually mapped stack changes are pretty fundamental, and not
  x86-specific per se, even if they are only used on x86 right now. ]

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  x86/asm: Get rid of __read_cr4_safe()
  thread_info: Use unsigned long for flags
  x86/alternatives: Add stack frame dependency to alternative_call_2()
  x86/dumpstack: Fix show_stack() task pointer regression
  x86/dumpstack: Remove dump_trace() and related callbacks
  x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder
  oprofile/x86: Convert x86_backtrace() to use the new unwinder
  x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder
  perf/x86: Convert perf_callchain_kernel() to use the new unwinder
  x86/unwind: Add new unwind interface and implementations
  x86/dumpstack: Remove NULL task pointer convention
  fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y
  sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
  lib/syscall: Pin the task stack in collect_syscall()
  x86/process: Pin the target stack in get_wchan()
  x86/dumpstack: Pin the target stack when dumping it
  kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
  sched/core: Add try_get_task_stack() and put_task_stack()
  x86/entry/64: Fix a minor comment rebase error
  iommu/amd: Don't put completion-wait semaphore on stack
  ...
2016-10-03 16:13:28 -07:00
Linus Torvalds de956b8f45 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "Main changes in this cycle were:

   - Refactor the EFI memory map code into architecture neutral files
     and allow drivers to permanently reserve EFI boot services regions
     on x86, as well as ARM/arm64. (Matt Fleming)

   - Add ARM support for the EFI ESRT driver. (Ard Biesheuvel)

   - Make the EFI runtime services and efivar API interruptible by
     swapping spinlocks for semaphores. (Sylvain Chouleur)

   - Provide the EFI identity mapping for kexec which allows kexec to
     work on SGI/UV platforms with requiring the "noefi" kernel command
     line parameter. (Alex Thorlton)

   - Add debugfs node to dump EFI page tables on arm64. (Ard Biesheuvel)

   - Merge the EFI test driver being carried out of tree until now in
     the FWTS project. (Ivan Hu)

   - Expand the list of flags for classifying EFI regions as "RAM" on
     arm64 so we align with the UEFI spec. (Ard Biesheuvel)

   - Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
     or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
     services function table for direct calls, alleviating us from
     having to maintain the custom function table. (Lukas Wunner)

   - Miscellaneous cleanups and fixes"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE
  x86/efi: Allow invocation of arbitrary boot services
  x86/efi: Optimize away setup_gop32/64 if unused
  x86/efi: Use kmalloc_array() in efi_call_phys_prolog()
  efi/arm64: Treat regions with WT/WC set but WB cleared as memory
  efi: Add efi_test driver for exporting UEFI runtime service interfaces
  x86/efi: Defer efi_esrt_init until after memblock_x86_fill
  efi/arm64: Add debugfs node to dump UEFI runtime page tables
  x86/efi: Remove unused find_bits() function
  fs/efivarfs: Fix double kfree() in error path
  x86/efi: Map in physical addresses in efi_map_region_fixed
  lib/ucs2_string: Speed up ucs2_utf8size()
  firmware-gsmi: Delete an unnecessary check before the function call "dma_pool_destroy"
  x86/efi: Initialize status to ensure garbage is not returned on small size
  efi: Replace runtime services spinlock with semaphore
  efi: Don't use spinlocks for efi vars
  efi: Use a file local lock for efivars
  efi/arm*: esrt: Add missing call to efi_esrt_init()
  efi/esrt: Use memremap not ioremap to access ESRT table in memory
  x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data
  ...
2016-10-03 11:33:18 -07:00
David S. Miller b50afd203a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three sets of overlapping changes.  Nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-02 22:20:41 -04:00
Niklas Söderlund 2e0cc304e9 dma-debug: fix ia64 build, use PHYS_PFN
kbuild test robot reports:

   lib/dma-debug.c: In function 'debug_dma_map_resource':
>> lib/dma-debug.c:1541:16: error: implicit declaration of function '__phys_to_pfn' [-Werror=implicit-function-declaration]
     entry->pfn  = __phys_to_pfn(addr);
                   ^~~~~~~~~~~~~

ia64 does not provide __phys_to_pfn(), use the PHYS_PFN() alias.

Fixes: 0e74b34dfc ("dma-debug: add support for resource mappings")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-09-30 23:21:05 +05:30
Thomas Gleixner d7e25c66c9 Merge branch 'x86/urgent' into x86/asm
Get the cr4 fixes so we can apply the final cleanup
2016-09-30 12:38:28 +02:00
Uwe Kleine-König 67797b9237 lib/Kconfig.debug: fix DEBUG_SECTION_MISMATCH description
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-09-29 10:19:45 +02:00
Shaohua Li 3796c3cbfb lib: clean up put_cpu_var usage
put_cpu_var takes the percpu data, not the data returned from
get_cpu_var.

This doesn't change the behavior.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 22:09:17 -04:00
Al Viro 4bce9f6ee8 get rid of separate multipage fault-in primitives
* the only remaining callers of "short" fault-ins are just as happy with generic
variants (both in lib/iov_iter.c); switch them to multipage variants, kill the
"short" ones
* rename the multipage variants to now available plain ones.
* get rid of compat macro defining iov_iter_fault_in_multipage_readable by
expanding it in its only user.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 18:12:24 -04:00
Gayatri Kammela 099b548c42 raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
Specifying the aligned attributes to the char data[NDISKS][PAGE_SIZE],
char recovi[PAGE_SIZE] and char recovi[PAGE_SIZE] arrays, so that all
malloc memory is page boundary aligned.

Without these alignment attributes, the test causes a segfault in
userspace when the NDISKS are changed to 4 from 16.

The RAID stripes will be page aligned anyway, so we want to test what
the kernel actually will execute.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-26 16:18:21 -07:00
Niklas Söderlund 0e74b34dfc dma-debug: add support for resource mappings
A MMIO mapped resource can not be represented by a struct page so a new
debug type is needed to handle this. This patch add such type and
functionality to add/remove entries and how to translate them to a
physical address.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-09-26 22:16:41 +05:30
Linus Torvalds 8d2c0d36d6 radix tree: fix sibling entry handling in radix_tree_descend()
The fixes to the radix tree test suite show that the multi-order case is
broken.  The basic reason is that the radix tree code uses tagged
pointers with the "internal" bit in the low bits, and calculating the
pointer indices was supposed to mask off those bits.  But gcc will
notice that we then use the index to re-create the pointer, and will
avoid doing the arithmetic and use the tagged pointer directly.

This cleans the code up, using the existing is_sibling_entry() helper to
validate the sibling pointer range (instead of open-coding it), and
using entry_to_node() to mask off the low tag bit from the pointer.  And
once you do that, you might as well just use the now cleaned-up pointer
directly.

[ Side note: the multi-order code isn't actually ever used in the kernel
  right now, and the only reason I didn't just delete all that code is
  that Kirill Shutemov piped up and said:

    "Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
     It also converts shmem-with-huge-pages and hugetlb to them.

     I'm okay with converting it to other mechanism, but I need
     something.  (I looked into Konstantin's RFC patchset[2].  It looks
     okay, but I don't feel myself qualified to review it as I don't
     know much about radix-tree internals.)"

  [1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
  [2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg ]

Reported-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Cedric Blancher <cedric.blancher@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-25 13:32:46 -07:00
David S. Miller d6989d4bbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-23 06:46:57 -04:00
Vivien Didelot 96b03ab86d locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
Fix the indefinitiley -> indefinitely typo in Kconfig.debug.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160922205513.17821-1-vivien.didelot@savoirfairelinux.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-23 07:30:04 +02:00
Gayatri Kammela 694dda62d2 lib/raid6: Add AVX512 optimized xor_syndrome functions
Optimize RAID6 xor_syndrome functions to take advantage of the 512-bit
ZMM integer instructions introduced in AVX512.

AVX512 optimized xor_syndrome functions, which is simply based on sse2.c
written by hpa.

The patch was tested and benchmarked before submission on
a hardware that has AVX512 flags to support such instructions

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21 09:09:44 -07:00
Gayatri Kammela 161db5d165 lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
Adding avx512 gen_syndrome and recovery functions so as to allow code to
be compiled and tested successfully in userspace.

This patch is tested in userspace and improvement in performace is
observed.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21 09:09:44 -07:00
Gayatri Kammela 13c520b299 lib/raid6: Add AVX512 optimized recovery functions
Optimize RAID6 recovery functions to take advantage of
the 512-bit ZMM integer instructions introduced in AVX512.

AVX512 optimized recovery functions, which is simply based
on recov_avx2.c written by Jim Kukunas

This patch was tested and benchmarked before submission on
a hardware that has AVX512 flags to support such instructions

Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21 09:09:44 -07:00
Gayatri Kammela e0a491c129 lib/raid6: Add AVX512 optimized gen_syndrome functions
Optimize RAID6 gen_syndrom functions to take advantage of
the 512-bit ZMM integer instructions introduced in AVX512.

AVX512 optimized gen_syndrom functions, which is simply based
on avx2.c written by Yuanhan Liu and sse2.c written by hpa.

The patch was tested and benchmarked before submission on
a hardware that has AVX512 flags to support such instructions

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21 09:09:44 -07:00
Neal Cardwell a4f1f9ac81 lib/win_minmax: windowed min or max estimator
This commit introduces a generic library to estimate either the min or
max value of a time-varying variable over a recent time window. This
is code originally from Kathleen Nichols. The current form of the code
is from Van Jacobson.

A single struct minmax_sample will track the estimated windowed-max
value of the series if you call minmax_running_max() or the estimated
windowed-min value of the series if you call minmax_running_min().

Nearly equivalent code is already in place for minimum RTT estimation
in the TCP stack. This commit extracts that code and generalizes it to
handle both min and max. Moving the code here reduces the footprint
and complexity of the TCP code base and makes the filter generally
available for other parts of the codebase, including an upcoming TCP
congestion control module.

This library works well for time series where the measurements are
smoothly increasing or decreasing.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:22:59 -04:00
Ingo Molnar 41a66072c3 Merge branch 'efi/urgent' into efi/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20 16:58:59 +02:00
Christian Borntraeger 725c4d22bb ubsan: allow to disable the null sanitizer
Some architectures use a hardware defined structure at address zero.
Checking for a null pointer will result in many ubsan reports.
Allow users to disable the null sanitizer.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-09-20 14:26:08 +02:00
Herbert Xu ca26893f05 rhashtable: Add rhlist interface
The insecure_elasticity setting is an ugly wart brought out by
users who need to insert duplicate objects (that is, distinct
objects with identical keys) into the same table.

In fact, those users have a much bigger problem.  Once those
duplicate objects are inserted, they don't have an interface to
find them (unless you count the walker interface which walks
over the entire table).

Some users have resorted to doing a manual walk over the hash
table which is of course broken because they don't handle the
potential existence of multiple hash tables.  The result is that
they will break sporadically when they encounter a hash table
resize/rehash.

This patch provides a way out for those users, at the expense
of an extra pointer per object.  Essentially each object is now
a list of objects carrying the same key.  The hash table will
only see the lists so nothing changes as far as rhashtable is
concerned.

To use this new interface, you need to insert a struct rhlist_head
into your objects instead of struct rhash_head.  While the hash
table is unchanged, for type-safety you'll need to use struct
rhltable instead of struct rhashtable.  All the existing interfaces
have been duplicated for rhlist, including the hash table walker.

One missing feature is nulls marking because AFAIK the only potential
user of it does not need duplicate objects.  Should anyone need
this it shouldn't be too hard to add.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 04:43:36 -04:00
Ingo Molnar b2c16e1efd Merge branch 'linus' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20 08:29:21 +02:00
Sebastian Andrzej Siewior 8c58898b3e fault-injection/cpu: Convert to hotplug state machine
Install the callbacks via the state machine.

This is just a temporary vehicle to keep the interface working for now,
It'll be replaced by the sysfs interface which allows to step through the
hotplug state machine step by step.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160906170457.32393-15-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-19 21:44:31 +02:00
Sebastian Andrzej Siewior 75e12ed653 lib/irq_poll: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160906170457.32393-8-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-19 21:44:27 +02:00
Colin Ian King 60658e0dc1 sbitmap: initialize weight to zero
Variable weight is not being initialized to zero before it is
used to compute the weight sum. Ensure it is initialized to zero.

Found with static analysis with cppcheck:
[lib/sbitmap.c:177]: (error) Uninitialized variable: weight

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-19 08:19:40 -06:00
Al Viro d4690f1e1c fix iov_iter_fault_in_readable()
... by turning it into what used to be multipages counterpart

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-17 14:05:30 -07:00
Omar Sandoval 5c64a8df0c sbitmap: don't update the allocation hint on clear after resize
If we have a bunch of high-numbered bits allocated and then we resize
the struct sbitmap_queue, when those bits get cleared, we'll update the
hint and then have to re-randomize it repeatedly. Avoid that by checking
that the cleared bit is still a valid hint. No measurable performance
difference in the common case.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17 13:39:06 -06:00
Tejun Heo 7092dff2af debugobj, workqueue: remove keventd_up() usage
Now that workqueue can handle work item queueing from very early
during boot, there is no need to gate schedule_work() while
!keventd_up().  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
2016-09-17 13:18:21 -04:00
Omar Sandoval 05fd095d53 sbitmap: re-initialize allocation hints after resize
After a struct sbitmap_queue is resized smaller, the allocation hints
may still be set to bits beyond the new depth of the bitmap. This means
that, for example, if the number of blk-mq tags is reduced through
sysfs, more requests than the nominal queue depth may be in flight.

It's tempting to fix this at resize time by doing a one-time
reinitialization of the hints, but this can race with
__sbitmap_queue_get() updating the hint. Instead, check the hint before
we use it. This caused no measurable performance difference in my
synthetic benchmarks.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17 08:39:16 -06:00
Omar Sandoval 98d95416db sbitmap: randomize initial alloc_hint values
In order to get good cache behavior from a sbitmap, we want each CPU to
stick to its own cacheline(s) as much as possible. This might happen
naturally as the bitmap gets filled up and the alloc_hint values spread
out, but we really want this behavior from the start. blk-mq apparently
intended to do this, but the code to do this was never wired up. Get rid
of the dead code and make it part of the sbitmap library.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17 08:39:14 -06:00
Omar Sandoval f4a644db86 sbitmap: push alloc policy into sbitmap_queue
Again, there's no point in passing this in every time. Make it part of
struct sbitmap_queue and clean up the API.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17 08:39:12 -06:00
Omar Sandoval 40aabb6746 sbitmap: push per-cpu last_tag into sbitmap_queue
Allocating your own per-cpu allocation hint separately makes for an
awkward API. Instead, allocate the per-cpu hint as part of the struct
sbitmap_queue. There's no point for a struct sbitmap_queue without the
cache, but you can still use a bare struct sbitmap.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17 08:39:10 -06:00
Omar Sandoval 48e28166a7 sbitmap: allocate wait queues on a specific node
The original bt_alloc() we converted from was using kzalloc(), not
kzalloc_node(), to allocate the wait queues. This was probably an
oversight, so fix it for sbitmap_queue_init_node().

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17 08:39:08 -06:00
Omar Sandoval 88459642cb blk-mq: abstract tag allocation out into sbitmap library
This is a generally useful data structure, so make it available to
anyone else who might want to use it. It's also a nice cleanup
separating the allocation logic from the rest of the tag handling logic.

The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only
selected by CONFIG_BLOCK for now.

This should be a complete noop functionality-wise.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-17 08:38:44 -06:00
Andy Lutomirski aa1f1a6396 lib/syscall: Pin the task stack in collect_syscall()
This will avoid a potential read-after-free if collect_syscall()
(e.g. /proc/PID/syscall) is called on an exiting task.

Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0bfd8e6d4729c97745d3781a29610a33d0a8091d.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 09:18:53 +02:00
Jakub Kicinski 5c0ca3f566 test_bpf: fix the dummy skb after dissector changes
Commit d5709f7ab7 ("flow_dissector: For stripped vlan, get vlan
info from skb->vlan_tci") made flow dissector look at vlan_proto
when vlan is present.  Since test_bpf sets skb->vlan_tci to ~0
(including VLAN_TAG_PRESENT) we have to populate skb->vlan_proto.

Fixes false negative on test #24:
test_bpf: #24 LD_PAYLOAD_OFF jited:0 175 ret 0 != 42 FAIL (1 times)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 19:17:15 -04:00
Ingo Molnar d4b80afbba Merge branch 'linus' into x86/asm, to pick up recent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 08:24:53 +02:00
David S. Miller b20b378d49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mediatek/mtk_eth_soc.c
	drivers/net/ethernet/qlogic/qed/qed_dcbx.c
	drivers/net/phy/Kconfig

All conflicts were cases of overlapping commits.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12 15:52:44 -07:00
Lukas Wunner cf289cefbf lib/ucs2_string: Speed up ucs2_utf8size()
No need to calculate the string length on every loop iteration.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:46 +01:00
David S. Miller 60175ccdf4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree.  Most relevant updates are the removal of per-conntrack timers to
use a workqueue/garbage collection approach instead from Florian
Westphal, the hash and numgen expression for nf_tables from Laura
Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
removal of ip_conntrack sysctl and many other incremental updates on our
Netfilter codebase.

More specifically, they are:

1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
   transport area in dccp, sctp, tcp, udp and udplite protocol
   conntrackers, from Gao Feng.

2) Missing whitespace on error message in physdev match, from Hangbin Liu.

3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.

4) Add nf_ct_expires() helper function and use it, from Florian Westphal.

5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
   from Florian.

6) Rename nf_tables set implementation to nft_set_{name}.c

7) Introduce the hash expression to allow arbitrary hashing of selector
   concatenations, from Laura Garcia Liebana.

8) Remove ip_conntrack sysctl backward compatibility code, this code has
   been around for long time already, and we have two interfaces to do
   this already: nf_conntrack sysctl and ctnetlink.

9) Use nf_conntrack_get_ht() helper function whenever possible, instead
   of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.

10) Add quota expression for nf_tables.

11) Add number generator expression for nf_tables, this supports
    incremental and random generators that can be combined with maps,
    very useful for load balancing purpose, again from Laura Garcia Liebana.

12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.

13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
    configuration, this is used by a follow up patch to perform better chain
    update validation.

14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
    nft_set_hash implementation to honor the NLM_F_EXCL flag.

15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
    patch from Florian Westphal.

16) Don't use the DYING bit to know if the conntrack event has been already
    delivered, instead a state variable to track event re-delivery
    states, also from Florian.

17) Remove the per-conntrack timer, use the workqueue approach that was
    discussed during the NFWS, from Florian Westphal.

18) Use the netlink conntrack table dump path to kill stale entries,
    again from Florian.

19) Add a garbage collector to get rid of stale conntracks, from
    Florian.

20) Reschedule garbage collector if eviction rate is high.

21) Get rid of the __nf_ct_kill_acct() helper.

22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.

23) Make nf_log_set() interface assertive on unsupported families.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06 12:45:26 -07:00
Geert Uytterhoeven e6173ba42b lib/test_hash.c: fix warning in preprocessor symbol evaluation
Some versions of gcc don't like tests for the value of an undefined
preprocessor symbol, even in the #else branch of an #ifndef:

    lib/test_hash.c:224:7: warning: "HAVE_ARCH__HASH_32" is not defined [-Wundef]
     #elif HAVE_ARCH__HASH_32 != 1
	   ^
    lib/test_hash.c:229:7: warning: "HAVE_ARCH_HASH_32" is not defined [-Wundef]
     #elif HAVE_ARCH_HASH_32 != 1
	   ^
    lib/test_hash.c:234:7: warning: "HAVE_ARCH_HASH_64" is not defined [-Wundef]
     #elif HAVE_ARCH_HASH_64 != 1
	   ^

Seen with gcc 4.9, not seen with 4.1.2.

Change the logic to only check the value inside an #ifdef to fix this.

Fixes: 468a942852 ("<linux/hash.h>: Add support for architecture-specific functions")
Link: http://lkml.kernel.org/r/20160829214952.1334674-4-arnd@arndb.de
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: George Spelvin <linux@sciencehorizons.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-01 17:52:01 -07:00
Geert Uytterhoeven ed76b7a131 lib/test_hash.c: fix warning in two-dimensional array init
lib/test_hash.c: In function 'test_hash_init':
  lib/test_hash.c:146:2: warning: missing braces around initializer [-Wmissing-braces]

Fixes: 468a942852 ("<linux/hash.h>: Add support for architecture-specific functions")
Link: http://lkml.kernel.org/r/20160829214952.1334674-3-arnd@arndb.de
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: George Spelvin <linux@sciencehorizons.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-01 17:52:01 -07:00
Martin Schwidefsky f5b55fa1f8 RAID/s390: provide raid6 recovery optimization
The XC instruction can be used to improve the speed of the raid6
recovery. The loops now operate on blocks of 256 bytes.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-09-01 16:13:25 +02:00
Josh Poimboeuf 0d025d271e mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
There are three usercopy warnings which are currently being silenced for
gcc 4.6 and newer:

1) "copy_from_user() buffer size is too small" compile warning/error

   This is a static warning which happens when object size and copy size
   are both const, and copy size > object size.  I didn't see any false
   positives for this one.  So the function warning attribute seems to
   be working fine here.

   Note this scenario is always a bug and so I think it should be
   changed to *always* be an error, regardless of
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.

2) "copy_from_user() buffer size is not provably correct" compile warning

   This is another static warning which happens when I enable
   __compiletime_object_size() for new compilers (and
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
   is const, but copy size is *not*.  In this case there's no way to
   compare the two at build time, so it gives the warning.  (Note the
   warning is a byproduct of the fact that gcc has no way of knowing
   whether the overflow function will be called, so the call isn't dead
   code and the warning attribute is activated.)

   So this warning seems to only indicate "this is an unusual pattern,
   maybe you should check it out" rather than "this is a bug".

   I get 102(!) of these warnings with allyesconfig and the
   __compiletime_object_size() gcc check removed.  I don't know if there
   are any real bugs hiding in there, but from looking at a small
   sample, I didn't see any.  According to Kees, it does sometimes find
   real bugs.  But the false positive rate seems high.

3) "Buffer overflow detected" runtime warning

   This is a runtime warning where object size is const, and copy size >
   object size.

All three warnings (both static and runtime) were completely disabled
for gcc 4.6 with the following commit:

  2fb0815c9e ("gcc4: disable __compiletime_object_size for GCC 4.6+")

That commit mistakenly assumed that the false positives were caused by a
gcc bug in __compiletime_object_size().  But in fact,
__compiletime_object_size() seems to be working fine.  The false
positives were instead triggered by #2 above.  (Though I don't have an
explanation for why the warnings supposedly only started showing up in
gcc 4.6.)

So remove warning #2 to get rid of all the false positives, and re-enable
warnings #1 and #3 by reverting the above commit.

Furthermore, since #1 is a real bug which is detected at compile time,
upgrade it to always be an error.

Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
needed.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-30 10:10:21 -07:00
David S. Miller 6abdd5f593 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30 00:54:02 -04:00
Martin Schwidefsky 474fd6e80f RAID/s390: add SIMD implementation for raid6 gen/xor
Using vector registers is slightly faster:

raid6: vx128x8  gen() 19705 MB/s
raid6: vx128x8  xor() 11886 MB/s
raid6: using algorithm vx128x8 gen() 19705 MB/s
raid6: .... xor() 11886 MB/s, rmw enabled

vs the software algorithms:

raid6: int64x1  gen()  3018 MB/s
raid6: int64x1  xor()  1429 MB/s
raid6: int64x2  gen()  4661 MB/s
raid6: int64x2  xor()  3143 MB/s
raid6: int64x4  gen()  5392 MB/s
raid6: int64x4  xor()  3509 MB/s
raid6: int64x8  gen()  4441 MB/s
raid6: int64x8  xor()  3207 MB/s
raid6: using algorithm int64x4 gen() 5392 MB/s
raid6: .... xor() 3509 MB/s, rmw enabled

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:05:04 +02:00
Eric Dumazet 9dbeea7f08 rhashtable: fix a memory leak in alloc_bucket_locks()
If vmalloc() was successful, do not attempt a kmalloc_array()

Fixes: 4cf0b354d9 ("rhashtable: avoid large lock-array allocations")
Reported-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26 21:59:53 -07:00
Pablo Neira Ayuso 5ca8cc5bf1 rhashtable: add rhashtable_lookup_get_insert_key()
This patch modifies __rhashtable_insert_fast() so it returns the
existing object that clashes with the one that you want to insert.
In case the object is successfully inserted, NULL is returned.
Otherwise, you get an error via ERR_PTR().

This patch adapts the existing callers of __rhashtable_insert_fast()
so they handle this new logic, and it adds a new
rhashtable_lookup_get_insert_key() interface to fetch this existing
object.

nf_tables needs this change to improve handling of EEXIST cases via
honoring the NLM_F_EXCL flag and by checking if the data part of the
mapping matches what we have.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-08-26 17:29:41 +02:00
Andy Lutomirski b4a0f533e5 dma-api: Teach the "DMA-from-stack" check about vmapped stacks
If we're using CONFIG_VMAP_STACK=y and we manage to point an sg entry
at the stack, then either the sg page will be in highmem or sg_virt()
will return the direct-map alias.  In neither case will the existing
check_for_stack() implementation realize that it's a stack page.

Fix it by explicitly checking for stack pages.

This has no effect by itself.  It's broken out for ease of review.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/448460622731312298bf19dcbacb1606e75de7a9.1470907718.git.luto@kernel.org
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:11:41 +02:00
Herbert Xu 246779dd09 rhashtable: Remove GFP flag from rhashtable_walk_init
The commit 8f6fd83c6c ("rhashtable:
accept GFP flags in rhashtable_walk_init") added a GFP flag argument
to rhashtable_walk_init because some users wish to use the walker
in an unsleepable context.

In fact we don't need to allocate memory in rhashtable_walk_init
at all.  The walker is always paired with an iterator so we could
just stash ourselves there.

This patch does that by introducing a new enter function to replace
the existing init function.  This way we don't have to churn all
the existing users again.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 14:40:24 -07:00
Linus Torvalds 184ca82348 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
    Fietkau.

 2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

 3) Fix some tg3 ethtool logic bugs, and one that would cause no
    interrupts to be generated when rx-coalescing is set to 0.  From
    Satish Baddipadige and Siva Reddy Kallam.

 4) QLCNIC mailbox corruption and napi budget handling fix from Manish
    Chopra.

 5) Fix fib_trie logic when walking the trie during /proc/net/route
    output than can access a stale node pointer.  From David Forster.

 6) Several sctp_diag fixes from Phil Sutter.

 7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

 8) Checksum fixup fixes in bpf from Daniel Borkmann.

 9) Memork leaks in nfnetlink, from Liping Zhang.

10) Use after free in rxrpc, from David Howells.

11) Use after free in new skb_array code of macvtap driver, from Jason
    Wang.

12) Calipso resource leak, from Colin Ian King.

13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

15) Fix lockdep splats in macsec, from Sabrina Dubroca.

16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
    handling.

17) Various tc-action bug fixes, from CONG Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net_sched: allow flushing tc police actions
  net_sched: unify the init logic for act_police
  net_sched: convert tcf_exts from list to pointer array
  net_sched: move tc offload macros to pkt_cls.h
  net_sched: fix a typo in tc_for_each_action()
  net_sched: remove an unnecessary list_del()
  net_sched: remove the leftover cleanup_a()
  mlxsw: spectrum: Allow packets to be trapped from any PG
  mlxsw: spectrum: Unmap 802.1Q FID before destroying it
  mlxsw: spectrum: Add missing rollbacks in error path
  mlxsw: reg: Fix missing op field fill-up
  mlxsw: spectrum: Trap loop-backed packets
  mlxsw: spectrum: Add missing packet traps
  mlxsw: spectrum: Mark port as active before registering it
  mlxsw: spectrum: Create PVID vPort before registering netdevice
  mlxsw: spectrum: Remove redundant errors from the code
  mlxsw: spectrum: Don't return upon error in removal path
  i40e: check for and deal with non-contiguous TCs
  ixgbe: Re-enable ability to toggle VLAN filtering
  ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
  ...
2016-08-17 17:26:58 -07:00
Vegard Nossum 12311959ec rhashtable: fix shift by 64 when shrinking
I got this:

    ================================================================================
    UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13
    shift exponent 64 is too large for 64-bit type 'long unsigned int'
    CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    Workqueue: events rht_deferred_worker
     0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3
     ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0
     0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640
    Call Trace:
     [<ffffffff82344f50>] dump_stack+0xac/0xfc
     [<ffffffff82344ea4>] ? _atomic_dec_and_lock+0xc4/0xc4
     [<ffffffff8242f5b8>] ubsan_epilogue+0xd/0x8a
     [<ffffffff82430c41>] __ubsan_handle_shift_out_of_bounds+0x255/0x29a
     [<ffffffff824309ec>] ? __ubsan_handle_out_of_bounds+0x180/0x180
     [<ffffffff84003436>] ? nl80211_req_set_reg+0x256/0x2f0
     [<ffffffff812112ba>] ? print_context_stack+0x8a/0x160
     [<ffffffff81200031>] ? amd_pmu_reset+0x341/0x380
     [<ffffffff823af808>] rht_deferred_worker+0x1618/0x1790
     [<ffffffff823af808>] ? rht_deferred_worker+0x1618/0x1790
     [<ffffffff823ae1f0>] ? rhashtable_jhash2+0x370/0x370
     [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970
     [<ffffffff8134c1cf>] process_one_work+0x79f/0x1970
     [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970
     [<ffffffff8134ba30>] ? try_to_grab_pending+0x4c0/0x4c0
     [<ffffffff8134d564>] ? worker_thread+0x1c4/0x1340
     [<ffffffff8134d8ff>] worker_thread+0x55f/0x1340
     [<ffffffff845e904f>] ? __schedule+0x4df/0x1d40
     [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970
     [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970
     [<ffffffff813642f7>] kthread+0x237/0x390
     [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280
     [<ffffffff845f8c93>] ? _raw_spin_unlock_irq+0x33/0x50
     [<ffffffff845f95df>] ret_from_fork+0x1f/0x40
     [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280
    ================================================================================

roundup_pow_of_two() is undefined when called with an argument of 0, so
let's avoid the call and just fall back to ht->p.min_size (which should
never be smaller than HASH_MIN_SIZE).

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-15 11:10:09 -07:00
Florian Westphal 4cf0b354d9 rhashtable: avoid large lock-array allocations
Sander reports following splat after netfilter nat bysrc table got
converted to rhashtable:

swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..]
 [<ffffffff811633ed>] warn_alloc_failed+0xdd/0x140
 [<ffffffff811638b1>] __alloc_pages_nodemask+0x3e1/0xcf0
 [<ffffffff811a72ed>] alloc_pages_current+0x8d/0x110
 [<ffffffff8117cb7f>] kmalloc_order+0x1f/0x70
 [<ffffffff811aec19>] __kmalloc+0x129/0x140
 [<ffffffff8146d561>] bucket_table_alloc+0xc1/0x1d0
 [<ffffffff8146da1d>] rhashtable_insert_rehash+0x5d/0xe0
 [<ffffffff819fcfff>] nf_nat_setup_info+0x2ef/0x400

The failure happens when allocating the spinlock array.
Even with GFP_KERNEL its unlikely for such a large allocation
to succeed.

Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
to adding NOWARN for atomic allocations this also makes the bucket-array
sizing more conservative.

In commit 095dc8e0c3 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
IOW, consider size needed by a single spinlock when determining
number of locks per cpu.  So with 64 byte per cacheline and 4 byte per
spinlock this gives 32 locks per cpu.

Resulting size of the lock-array (sizeof(spinlock) == 4):

cpus:    1   2   4   8   16   32   64
old:    1k  1k  4k  8k  16k  16k  16k
new:   128 256 512  1k   2k   4k   8k

8k allocation should have decent chance of success even
with GFP_ATOMIC, and should not fail with GFP_KERNEL.

With 72-byte spinlock (LOCKDEP):
cpus :   1   2
old:    9k 18k
new:   ~2k ~4k

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-14 21:12:57 -07:00
Roman Pen a67823c1ed percpu-refcount: init ->confirm_switch member properly
This patch targets two things which are related to ->confirm_switch:

 1. Init ->confirm_switch pointer with NULL on percpu_ref_init() or
    kernel frightfully complains with WARN_ON_ONCE(ref->confirm_switch)
    at __percpu_ref_switch_to_atomic if memory chunk was not properly
    zeroed.

 2. Warn if RCU callback is still in progress on percpu_ref_exit().
    The race still exists, because percpu_ref_call_confirm_rcu()
    drops ->confirm_switch to NULL early, but that is only a warning
    and still the caller is responsible that ref is no longer in
    active use.  Hopefully that can help to catch incorrect usage
    of percpu-refcount.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-08-11 13:52:23 -04:00
Tejun Heo 33e465ce7c percpu_ref: allow operation mode switching operations to be called concurrently
percpu_ref initially didn't have explicit mode switching operations.
It started out in percpu mode and switched to atomic mode on kill and
then released.  Ensuring that kill operation is initiated only after
init completes was naturally the caller's responsibility.

percpu_ref_reinit() was introduced later but it didn't shift the
synchronization responsibility.  Reinit can't be performed until kill
is confirmed, so there was nothing to worry about
synchronization-wise.  Also, as both reinit and kill manipulate the
base reference, invocations of the same function couldn't be allowed
to race each other.

The latest additions of percpu_ref_switch_to_atomic/percpu() changed
the situation.  These two functions can be called any time as long as
the percpu_ref is between init and exit and thus there are valid valid
usage scenarios where these new functions race with each other or
against reinit/kill.  Mostly from inertia, f47ad45784 ("percpu_ref:
decouple switching to percpu mode and reinit") still left
synchronization among percpu mode switching operations to its users.

That the new switch functions can be freely mixed with kill/reinit but
the operations themselves should be synchronized is too subtle a
requirement and led to a very subtle race condition in blk-mq freezing
path.

This patch fixes the situation by introducing percpu_ref_switch_lock
to protect mode switching operations.  This ensures that percpu-ref
users don't have to worry about mode changing operations racing
against each other, e.g. switch_to_percpu against kill, as long as the
sequence of operations is valid.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Link: http://lkml.kernel.org/g/1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
Fixes: f47ad45784 ("percpu_ref: decouple switching to percpu mode and reinit")
2016-08-10 15:02:58 -04:00
Tejun Heo 3f49bdd958 percpu_ref: restructure operation mode switching
Restructure atomic/percpu mode switching.

* The users of __percpu_ref_switch_to_atomic/percpu() now call a new
  function __percpu_ref_switch_mode() which calls either of the
  original switching functions depending on the current state of
  ref->force_atomic and the __PERCPU_REF_DEAD flag.  The callers no
  longer check whether switching is necessary but always invoke
  __percpu_ref_switch_mode().

* !ref->confirm_switch waiting is collected into
  __percpu_ref_switch_mode().

This patch doesn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
2016-08-10 15:02:58 -04:00
Tejun Heo 18808354b7 percpu_ref: unify staggered atomic switching wait behavior
When an atomic or percpu switching starts before the previous atomic
switching finishes, the taken behaviors are

* If the new atomic switching has confirmation callback, it waits
  for the previous atomic switching to complete.

* If the new percpu switching is the first percpu switching following
  the previous atomic switching, it waits the previous atomic
  switching to complete.

No percpu_ref user depends on these subtleties.  The only meaningful
part is that, if the caller ensures that atomic switching isn't in
progress, mode switching operations can be issued from any context.

This patch pulls the wait logic to the top of both switching functions
so that they always wait for the previous atomic switching to
complete.  This makes the behavior simpler and consistent for both
directions and will help allowing concurrent invocations of mode
switching functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
2016-08-10 15:02:58 -04:00
Tejun Heo b2302c7fdc percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic()
Reorganize __percpu_ref_switch_to_atomic() so that it looks
structurally similar to __percpu_ref_switch_to_percpu() and relocate
percpu_ref_switch_to_atomic so that the two internal functions are
co-located.

This patch doesn't introduce any functional differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
2016-08-10 15:02:58 -04:00
Tejun Heo a2f5630cb7 percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation
At the beginning, percpu_ref guaranteed a RCU grace period between a
call to percpu_ref_kill_and_confirm() and the invocation of the
confirmation callback.  This guarantee exposed internal implementation
details and got rescinded while switching over to sched RCU; however,
__percpu_ref_switch_to_atomic() still inserts a full sched RCU grace
period even when it can simply wait for the previous attempt.

Remove the unnecessary grace period and perform the confirmation
synchronously for staggered atomic switching attempts.  Update
comments accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
2016-08-10 15:02:58 -04:00
Linus Torvalds 1bd4403d86 unsafe_[get|put]_user: change interface to use a error target label
When I initially added the unsafe_[get|put]_user() helpers in commit
5b24a7a2aa ("Add 'unsafe' user access functions for batched
accesses"), I made the mistake of modeling the interface on our
traditional __[get|put]_user() functions, which return zero on success,
or -EFAULT on failure.

That interface is fairly easy to use, but it's actually fairly nasty for
good code generation, since it essentially forces the caller to check
the error value for each access.

In particular, since the error handling is already internally
implemented with an exception handler, and we already use "asm goto" for
various other things, we could fairly easily make the error cases just
jump directly to an error label instead, and avoid the need for explicit
checking after each operation.

So switch the interface to pass in an error label, rather than checking
the error value in the caller.  Best do it now before we start growing
more users (the signal handling code in particular would be a good place
to use the new interface).

So rather than

	if (unsafe_get_user(x, ptr))
		... handle error ..

the interface is now

	unsafe_get_user(x, ptr, label);

where an error during the user mode fetch will now just cause a jump to
'label' in the caller.

Right now the actual _implementation_ of this all still ends up being a
"if (err) goto label", and does not take advantage of any exception
label tricks, but for "unsafe_put_user()" in particular it should be
fairly straightforward to convert to using the exception table model.

Note that "unsafe_get_user()" is much harder to convert to a clever
exception table model, because current versions of gcc do not allow the
use of "asm goto" (for the exception) with output values (for the actual
value to be fetched).  But that is hopefully not a limitation in the
long term.

[ Also note that it might be a good idea to switch unsafe_get_user() to
  actually _return_ the value it fetches from user space, but this
  commit only changes the error handling semantics ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-08 13:02:01 -07:00
Phil Sutter 3b3bf80b99 rhashtable-test: Fix max_size parameter description
Looks like a simple copy'n'paste error.

Fixes: 1aa661f5c3 ("rhashtable-test: Measure time to insert, remove & traverse entries")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 12:52:42 -07:00
Jason Baron 9049fc7453 dynamic_debug: add jump label support
Although dynamic debug is often only used for debug builds, sometimes
its enabled for production builds as well.  Minimize its impact by using
jump labels.  This reduces the text section by 7000+ bytes in the kernel
image below.  It does increase data, but this should only be referenced
when changing the direction of the branches, and hence usually not in
cache.

     text     data     bss       dec     hex  filename
  8194852  4879776  925696  14000324  d5a0c4  vmlinux.pre
  8187337  4960224  925696  14073257  d6bda9  vmlinux.post

Link: http://lkml.kernel.org/r/d165b465e8c89bc582d973758d40be44c33f018b.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Krzysztof Kozlowski 00085f1efa dma-mapping: use unsigned long for dma_attrs
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Linus Torvalds d52bd54db8 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - the rest of ocfs2

 - various hotfixes, mainly MM

 - quite a bit of misc stuff - drivers, fork, exec, signals, etc.

 - printk updates

 - firmware

 - checkpatch

 - nilfs2

 - more kexec stuff than usual

 - rapidio updates

 - w1 things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
  ipc: delete "nr_ipc_ns"
  kcov: allow more fine-grained coverage instrumentation
  init/Kconfig: add clarification for out-of-tree modules
  config: add android config fragments
  init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
  relay: add global mode support for buffer-only channels
  init: allow blacklisting of module_init functions
  w1:omap_hdq: fix regression
  w1: add helper macro module_w1_family
  w1: remove need for ida and use PLATFORM_DEVID_AUTO
  rapidio/switches: add driver for IDT gen3 switches
  powerpc/fsl_rio: apply changes for RIO spec rev 3
  rapidio: modify for rev.3 specification changes
  rapidio: change inbound window size type to u64
  rapidio/idt_gen2: fix locking warning
  rapidio: fix error handling in mbox request/release functions
  rapidio/tsi721_dma: advance queue processing from transfer submit call
  rapidio/tsi721: add messaging mbox selector parameter
  rapidio/tsi721: add PCIe MRRS override parameter
  rapidio/tsi721_dma: add channel mask and queue size parameters
  ...
2016-08-02 21:08:07 -04:00
Vegard Nossum a4691deabf kcov: allow more fine-grained coverage instrumentation
For more targeted fuzzing, it's better to disable kernel-wide
instrumentation and instead enable it on a per-subsystem basis.  This
follows the pattern of UBSAN and allows you to compile in the kcov
driver without instrumenting the whole kernel.

To instrument a part of the kernel, you can use either

    # for a single file in the current directory
    KCOV_INSTRUMENT_filename.o := y

or

    # for all the files in the current directory (excluding subdirectories)
    KCOV_INSTRUMENT := y

or

    # (same as above)
    ccflags-y += $(CFLAGS_KCOV)

or

    # for all the files in the current directory (including subdirectories)
    subdir-ccflags-y += $(CFLAGS_KCOV)

Link: http://lkml.kernel.org/r/1464008380-11405-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:43 -04:00
Arnd Bergmann a9bfd33217 crc32: use ktime_get_ns() for measurement
The crc32 test function measures the elapsed time in nanoseconds, but
uses 'struct timespec' for that.  We want to remove timespec from the
kernel for y2038 compatibility, and ktime_get_ns() also helps make the
code simpler here.

It is also slightly better to use monontonic time, as we are only
interested in the time difference.

Link: http://lkml.kernel.org/r/20160617143932.3289626-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: "David S . Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:08 -04:00
Sebastian Ott f003a1f182 lib/iommu-helper: skip to next segment
When a large enough area in the iommu bitmap is found but would span a
boundary we continue the search starting from the next bit position.
For large allocations this can lead to several useless invocations of
bitmap_find_next_zero_area() and iommu_is_span_boundary().

Continue the search from the start of the next segment (which is the
next bit position such that we'll not cross the same segment boundary
again).

Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1606081910070.3211@schleppi
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:07 -04:00
Borislav Petkov 6b1d174b0c ratelimit: extend to print suppressed messages on release
Extend the ratelimiting facility to print the amount of suppressed lines
when it is being released.

This use case is aimed at short-termed, burst-like users for which we
want to output the suppressed lines stats only once, after it has been
disposed of.  For an example, see /dev/kmsg usage in a follow-on patch.

Also, change the printk() line we issue on release to not use
"callbacks" as it is misleading: we're not suppressing callbacks but
printk() calls.

This has been separated from a previous patch by Linus.

Link: http://lkml.kernel.org/r/20160716061745.15795-2-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Franck Bui <fbui@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:06 -04:00
Nicolas Iooss 901d805c33 UBSAN: fix typo in format string
handle_object_size_mismatch() used %pk to format a kernel pointer with
pr_err().  This seemed to be a misspelling for %pK, but using this to
format a kernel pointer does not make much sence here.

Therefore use %p instead, like in handle_missaligned_access().

Link: http://lkml.kernel.org/r/20160730083010.11569-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 17:31:41 -04:00
Vladimir Davydov 05eb6e7263 radix-tree: account nodes to memcg only if explicitly requested
Radix trees may be used not only for storing page cache pages, so
unconditionally accounting radix tree nodes to the current memory cgroup
is bad: if a radix tree node is used for storing data shared among
different cgroups we risk pinning dead memory cgroups forever.

So let's only account radix tree nodes if it was explicitly requested by
passing __GFP_ACCOUNT to INIT_RADIX_TREE.  Currently, we only want to
account page cache entries, so mark mapping->page_tree so.

Fixes: 58e698af4c ("radix-tree: account radix_tree_node to memory cgroup")
Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>	[4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 17:31:41 -04:00
Linus Torvalds f716a85cd6 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - GCC plugin support by Emese Revfy from grsecurity, with a fixup from
   Kees Cook.  The plugins are meant to be used for static analysis of
   the kernel code.  Two plugins are provided already.

 - reduction of the gcc commandline by Arnd Bergmann.

 - IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada

 - bin2c fix by Michael Tautschnig

 - setlocalversion fix by Wolfram Sang

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  gcc-plugins: disable under COMPILE_TEST
  kbuild: Abort build on bad stack protector flag
  scripts: Fix size mismatch of kexec_purgatory_size
  kbuild: make samples depend on headers_install
  Kbuild: don't add obj tree in additional includes
  Kbuild: arch: look for generated headers in obtree
  Kbuild: always prefix objtree in LINUXINCLUDE
  Kbuild: avoid duplicate include path
  Kbuild: don't add ../../ to include path
  vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
  kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
  kconfig.h: use already defined macros for IS_REACHABLE() define
  export.h: use __is_defined() to check if __KSYM_* is defined
  kconfig.h: use __is_defined() to check if MODULE is defined
  kbuild: setlocalversion: print error to STDERR
  Add sancov plugin
  Add Cyclomatic complexity GCC plugin
  GCC plugin infrastructure
  Shared library support
2016-08-02 16:37:12 -04:00
Linus Torvalds f38d2e5313 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a number of regressions in the marvell cesa driver caused
  by the chaining work, and a regression in lib/mpi that leads to a
  GFP_KERNEL allocation with preemption disabled"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: marvell - Don't copy IV vectors from the _process op for ciphers
  lib/mpi: Fix SG miter leak
  crypto: marvell - Update cache with input sg only when it is unmapped
  crypto: marvell - Don't chain at DMA level when backlog is disabled
  crypto: marvell - Fix memory leaks in TDMA chain for cipher requests
2016-08-01 14:28:42 -04:00
Linus Torvalds d761f3ed6e Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Thomas Gleixner:

 - more work to make the microcode loader robust

 - a fix for the micro code load precedence

 - fixes for initrd loading with randomized memory

 - less printk noise on SMP machines

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm, x86/microcode: Add __PAGE_OFFSET_BASE define on 32-bit
  x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
  x86/microcode: Remove unused symbol exports
  x86/microcode/intel: Do not issue microcode updates messages on each CPU
  Documentation/microcode: Document some aspects for more clarity
  x86/microcode/AMD: Make amd_ucode_patch[] static
  x86/microcode/intel: Unexport save_mc_for_early()
  x86/microcode/intel: Rename load_microcode_early() to find_microcode_patch()
  x86/microcode: Propagate save_microcode_in_initrd() retval
  x86/microcode: Get rid of find_cpio_data()'s dummy offset arg
  lib/cpio: Make find_cpio_data()'s offset arg optional
  x86/microcode: Fix suspend to RAM with builtin microcode
  x86/microcode: Fix loading precedence
2016-07-30 13:18:33 -07:00
Herbert Xu 4816c94064 lib/mpi: Fix SG miter leak
In mpi_read_raw_from_sgl we may leak the SG miter resouces after
reading the leading zeroes.  This patch fixes this by stopping the
iteration once the leading zeroes have been read.

Fixes: 127827b9c2 ("lib/mpi: Do not do sg_virt")
Reported-by: Nicolai Stange <nicstange@gmail.com>
Tested-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-29 18:30:16 +08:00
Linus Torvalds 1c88e19b0f Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "The rest of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (101 commits)
  mm, compaction: simplify contended compaction handling
  mm, compaction: introduce direct compaction priority
  mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
  mm, page_alloc: make THP-specific decisions more generic
  mm, page_alloc: restructure direct compaction handling in slowpath
  mm, page_alloc: don't retry initial attempt in slowpath
  mm, page_alloc: set alloc_flags only once in slowpath
  lib/stackdepot.c: use __GFP_NOWARN for stack allocations
  mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
  mm, kasan: account for object redzone in SLUB's nearest_obj()
  mm: fix use-after-free if memory allocation failed in vma_adjust()
  zsmalloc: Delete an unnecessary check before the function call "iput"
  mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
  mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
  mm: optimize copy_page_to/from_iter_iovec
  mm: add cond_resched() to generic_swapfile_activate()
  Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
  mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
  mm: hwpoison: remove incorrect comments
  make __section_nr() more efficient
  ...
2016-07-28 16:36:48 -07:00
Kirill A. Shutemov 87cc271d5e lib/stackdepot.c: use __GFP_NOWARN for stack allocations
This (large, atomic) allocation attempt can fail.  We expect and handle
that, so avoid the scary warning.

Link: http://lkml.kernel.org/r/20160720151905.GB19146@node.shutemov.name
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Alexander Potapenko 80a9201a59 mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
For KASAN builds:
 - switch SLUB allocator to using stackdepot instead of storing the
   allocation/deallocation stacks in the objects;
 - change the freelist hook so that parts of the freelist can be put
   into the quarantine.

[aryabinin@virtuozzo.com: fixes]
  Link: http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Mikulas Patocka 3fa6c50731 mm: optimize copy_page_to/from_iter_iovec
copy_page_to_iter_iovec() and copy_page_from_iter_iovec() copy some data
to userspace or from userspace.  These functions have a fast path where
they map a page using kmap_atomic and a slow path where they use kmap.

kmap is slower than kmap_atomic, so the fast path is preferred.

However, on kernels without highmem support, kmap just calls
page_address, so there is no need to avoid kmap.  On kernels without
highmem support, the fast path just increases code size (and cache
footprint) and it doesn't improve copy performance in any way.

This patch enables the fast path only if CONFIG_HIGHMEM is defined.

Code size reduced by this patch:
  x86 (without highmem)	  928
  x86-64		  960
  sparc64		  848
  alpha			 1136
  pa-risc		 1200

[akpm@linux-foundation.org: use IS_ENABLED(), per Andi]
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221711410.4818@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Linus Torvalds 554828ee0d Merge branch 'salted-string-hash'
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.

That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.

It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.

Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.

* merge branch 'salted-string-hash':
  fs/dcache.c: Save one 32-bit multiply in dcache lookup
  vfs: make the string hashes salt the hash
2016-07-28 12:26:31 -07:00
Linus Torvalds 818e607b57 A number of improvements for the /dev/random driver; the most
important is the use of a ChaCha20-based CRNG for /dev/urandom, which
 is faster, more efficient, and easier to make scalable for
 silly/abusive userspace programs that want to read from /dev/urandom
 in a tight loop on NUMA systems.
 
 This set of patches also improves entropy gathering on VM's running on
 Microsoft Azure, and will take advantage of a hw random number
 generator (if present) to initialize the /dev/urandom pool.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXla/8AAoJEPL5WVaVDYGj4xgH/3qDPHpFvynUMbMgg/MRQNt2
 K1zhKKQjYuJ465aIiPME21tGC1C5JxO3a9mgDy0pfJmLVvCWRUx9dDtdI59Xkaev
 KuVTbqdl8D75lftX3/jF3lXKd5dVLrW2V9hOMcESQqBFuoc1B8sJztR0upQsdGvA
 I8+jjxRffSzxDZY4it6p/lqsTdEfwhA+mEc6ztZ+Ccsdlqd+GNCvB/YE4tk+U6x/
 tGKaKfuiHSXpR4P9ks/L5gwaIcFQ6Y1rdVqd8YP3iC9YXGFZdJkRXklRRj1mCgWs
 YoCDMATTos+6iVONft94fF5pW6HND1F3bxPNo/RpMZca3MlqbiCSij3ky+gqUc4=
 =mSBt
 -----END PGP SIGNATURE-----

Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random driver updates from Ted Ts'o:
 "A number of improvements for the /dev/random driver; the most
  important is the use of a ChaCha20-based CRNG for /dev/urandom, which
  is faster, more efficient, and easier to make scalable for
  silly/abusive userspace programs that want to read from /dev/urandom
  in a tight loop on NUMA systems.

  This set of patches also improves entropy gathering on VM's running on
  Microsoft Azure, and will take advantage of a hw random number
  generator (if present) to initialize the /dev/urandom pool"

(It turns out that the random tree hadn't been in linux-next this time
around, because it had been dropped earlier as being too quiet.  Oh
well).

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: strengthen input validation for RNDADDTOENTCNT
  random: add backtracking protection to the CRNG
  random: make /dev/urandom scalable for silly userspace programs
  random: replace non-blocking pool with a Chacha20-based CRNG
  random: properly align get_random_int_hash
  random: add interrupt callback to VMBus IRQ handler
  random: print a warning for the first ten uninitialized random users
  random: initialize the non-blocking pool via add_hwgenerator_randomness()
2016-07-27 15:11:55 -07:00
Linus Torvalds 468fc7ed55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

 2) Make DSA binding more sane, from Andrew Lunn.

 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface.  From Brenden Blanco and
    others.

 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

 8) Simplify netlink conntrack entry layout, from Florian Westphal.

 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

11) Support qdisc packet injection in pktgen, from John Fastabend.

12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

13) Add NV congestion control support to TCP, from Lawrence Brakmo.

14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

16) Support MPLS over IPV4, from Simon Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  xgene: Fix build warning with ACPI disabled.
  be2net: perform temperature query in adapter regardless of its interface state
  l2tp: Correctly return -EBADF from pppol2tp_getname.
  net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
  net: ipmr/ip6mr: update lastuse on entry change
  macsec: ensure rx_sa is set when validation is disabled
  tipc: dump monitor attributes
  tipc: add a function to get the bearer name
  tipc: get monitor threshold for the cluster
  tipc: make cluster size threshold for monitoring configurable
  tipc: introduce constants for tipc address validation
  net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
  MAINTAINERS: xgene: Add driver and documentation path
  Documentation: dtb: xgene: Add MDIO node
  dtb: xgene: Add MDIO node
  drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
  drivers: net: xgene: Use exported functions
  drivers: net: xgene: Enable MDIO driver
  drivers: net: xgene: Add backward compatibility
  drivers: net: phy: xgene: Add MDIO driver
  ...
2016-07-27 12:03:20 -07:00
Ingo Molnar df15929f8f Merge branch 'linus' into x86/microcode, to pick up merge window changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-27 12:35:35 +02:00
Linus Torvalds 0e06f5c0de Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2

 - most(?) of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
  thp: fix comments of __pmd_trans_huge_lock()
  cgroup: remove unnecessary 0 check from css_from_id()
  cgroup: fix idr leak for the first cgroup root
  mm: memcontrol: fix documentation for compound parameter
  mm: memcontrol: remove BUG_ON in uncharge_list
  mm: fix build warnings in <linux/compaction.h>
  mm, thp: convert from optimistic swapin collapsing to conservative
  mm, thp: fix comment inconsistency for swapin readahead functions
  thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
  shmem: split huge pages beyond i_size under memory pressure
  thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
  khugepaged: add support of collapse for tmpfs/shmem pages
  shmem: make shmem_inode_info::lock irq-safe
  khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
  thp: extract khugepaged from mm/huge_memory.c
  shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
  shmem: add huge pages support
  shmem: get_unmapped_area align huge page
  shmem: prepare huge= mount option and sysfs knob
  mm, rmap: account shmem thp pages
  ...
2016-07-26 19:55:54 -07:00
Kirill A. Shutemov c78c66d1dd radix-tree: implement radix_tree_maybe_preload_order()
The new helper is similar to radix_tree_maybe_preload(), but tries to
preload number of nodes required to insert (1 << order) continuous
naturally-aligned elements.

This is required to push huge pages into pagecache.

Link: http://lkml.kernel.org/r/1466021202-61880-24-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Joonsoo Kim f2ca0b5571 mm/page_owner: use stackdepot to store stacktrace
Currently, we store each page's allocation stacktrace on corresponding
page_ext structure and it requires a lot of memory.  This causes the
problem that memory tight system doesn't work well if page_owner is
enabled.  Moreover, even with this large memory consumption, we cannot
get full stacktrace because we allocate memory at boot time and just
maintain 8 stacktrace slots to balance memory consumption.  We could
increase it to more but it would make system unusable or change system
behaviour.

To solve the problem, this patch uses stackdepot to store stacktrace.
It obviously provides memory saving but there is a drawback that
stackdepot could fail.

stackdepot allocates memory at runtime so it could fail if system has
not enough memory.  But, most of allocation stack are generated at very
early time and there are much memory at this time.  So, failure would
not happen easily.  And, one failure means that we miss just one page's
allocation stacktrace so it would not be a big problem.  In this patch,
when memory allocation failure happens, we store special stracktrace
handle to the page that is failed to save stacktrace.  With it, user can
guess memory usage properly even if failure happens.

Memory saving looks as following.  (4GB memory system with page_owner)
(before the patch -> after the patch)

static allocation:
92274688 bytes -> 25165824 bytes

dynamic allocation after boot + kernel build:
0 bytes -> 327680 bytes

total:
92274688 bytes -> 25493504 bytes

72% reduction in total.

Note that implementation looks complex than someone would imagine
because there is recursion issue.  stackdepot uses page allocator and
page_owner is called at page allocation.  Using stackdepot in page_owner
could re-call page allcator and then page_owner.  That is a recursion.
To detect and avoid it, whenever we obtain stacktrace, recursion is
checked and page_owner is set to dummy information if found.  Dummy
information means that this page is allocated for page_owner feature
itself (such as stackdepot) and it's understandable behavior for user.

[iamjoonsoo.kim@lge.com: mm-page_owner-use-stackdepot-to-store-stacktrace-v3]
  Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
  Link: http://lkml.kernel.org/r/1466150259-27727-7-git-send-email-iamjoonsoo.kim@lge.com
Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Stephen Boyd d5dfc80f80 dma-debug: track bucket lock state for static checkers
get_hash_bucket() and put_hash_bucket() acquire and release the same
spinlock, but this confuses static checkers such as sparse

  lib/dma-debug.c:254:27: warning: context imbalance in 'get_hash_bucket' - wrong count at exit
  lib/dma-debug.c:268:13: warning: context imbalance in 'put_hash_bucket' - unexpected unlock

Add the appropriate acquire and release statements so that checkers can
properly track the lock state.

Link: http://lkml.kernel.org/r/20160701191552.24295-1-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Kees Cook a519167e75 gcc-plugins: disable under COMPILE_TEST
Since adding the gcc plugin development headers is required for the
gcc plugin support, we should ease into this new kernel build dependency
more slowly. For now, disable the gcc plugins under COMPILE_TEST so that
all*config builds will skip it.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-07-27 00:08:54 +02:00
Linus Torvalds d05d7f4079 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

   - the big change is the cleanup from Mike Christie, cleaning up our
     uses of command types and modified flags.  This is what will throw
     some merge conflicts

   - regression fix for the above for btrfs, from Vincent

   - following up to the above, better packing of struct request from
     Christoph

   - a 2038 fix for blktrace from Arnd

   - a few trivial/spelling fixes from Bart Van Assche

   - a front merge check fix from Damien, which could cause issues on
     SMR drives

   - Atari partition fix from Gabriel

   - convert cfq to highres timers, since jiffies isn't granular enough
     for some devices these days.  From Jan and Jeff

   - CFQ priority boost fix idle classes, from me

   - cleanup series from Ming, improving our bio/bvec iteration

   - a direct issue fix for blk-mq from Omar

   - fix for plug merging not involving the IO scheduler, like we do for
     other types of merges.  From Tahsin

   - expose DAX type internally and through sysfs.  From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
  block: Fix front merge check
  block: do not merge requests without consulting with io scheduler
  block: Fix spelling in a source code comment
  block: expose QUEUE_FLAG_DAX in sysfs
  block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
  Btrfs: fix comparison in __btrfs_map_block()
  block: atari: Return early for unsupported sector size
  Doc: block: Fix a typo in queue-sysfs.txt
  cfq-iosched: Charge at least 1 jiffie instead of 1 ns
  cfq-iosched: Fix regression in bonnie++ rewrite performance
  cfq-iosched: Convert slice_resid from u64 to s64
  block: Convert fifo_time from ulong to u64
  blktrace: avoid using timespec
  block/blk-cgroup.c: Declare local symbols static
  block/bio-integrity.c: Add #include "blk.h"
  block/partition-generic.c: Remove a set-but-not-used variable
  block: bio: kill BIO_MAX_SIZE
  cfq-iosched: temporarily boost queue priority for idle classes
  block: drbd: avoid to use BIO_MAX_SIZE
  block: bio: remove BIO_MAX_SECTORS
  ...
2016-07-26 15:03:07 -07:00
Linus Torvalds bbce2ad2d7 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.8:

  API:
   - first part of skcipher low-level conversions
   - add KPP (Key-agreement Protocol Primitives) interface.

  Algorithms:
   - fix IPsec/cryptd reordering issues that affects aesni
   - RSA no longer does explicit leading zero removal
   - add SHA3
   - add DH
   - add ECDH
   - improve DRBG performance by not doing CTR by hand

  Drivers:
   - add x86 AVX2 multibuffer SHA256/512
   - add POWER8 optimised crc32c
   - add xts support to vmx
   - add DH support to qat
   - add RSA support to caam
   - add Layerscape support to caam
   - add SEC1 AEAD support to talitos
   - improve performance by chaining requests in marvell/cesa
   - add support for Araneus Alea I USB RNG
   - add support for Broadcom BCM5301 RNG
   - add support for Amlogic Meson RNG
   - add support Broadcom NSP SoC RNG"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (180 commits)
  crypto: vmx - Fix aes_p8_xts_decrypt build failure
  crypto: vmx - Ignore generated files
  crypto: vmx - Adding support for XTS
  crypto: vmx - Adding asm subroutines for XTS
  crypto: skcipher - add comment for skcipher_alg->base
  crypto: testmgr - Print akcipher algorithm name
  crypto: marvell - Fix wrong flag used for GFP in mv_cesa_dma_add_iv_op
  crypto: nx - off by one bug in nx_of_update_msc()
  crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
  crypto: scatterwalk - Inline start/map/done
  crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start
  crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone
  crypto: scatterwalk - Fix test in scatterwalk_done
  crypto: api - Optimise away crypto_yield when hard preemption is on
  crypto: scatterwalk - add no-copy support to copychunks
  crypto: scatterwalk - Remove scatterwalk_bytes_sglen
  crypto: omap - Stop using crypto scatterwalk_bytes_sglen
  crypto: skcipher - Remove top-level givcipher interface
  crypto: user - Remove crypto_lookup_skcipher call
  crypto: cts - Convert to skcipher
  ...
2016-07-26 13:40:17 -07:00
Linus Torvalds 55392c4c06 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "This update provides the following changes:

   - The rework of the timer wheel which addresses the shortcomings of
     the current wheel (cascading, slow search for next expiring timer,
     etc).  That's the first major change of the wheel in almost 20
     years since Finn implemted it.

   - A large overhaul of the clocksource drivers init functions to
     consolidate the Device Tree initialization

   - Some more Y2038 updates

   - A capability fix for timerfd

   - Yet another clock chip driver

   - The usual pile of updates, comment improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
  tick/nohz: Optimize nohz idle enter
  clockevents: Make clockevents_subsys static
  clocksource/drivers/time-armada-370-xp: Fix return value check
  timers: Implement optimization for same expiry time in mod_timer()
  timers: Split out index calculation
  timers: Only wake softirq if necessary
  timers: Forward the wheel clock whenever possible
  timers/nohz: Remove pointless tick_nohz_kick_tick() function
  timers: Optimize collect_expired_timers() for NOHZ
  timers: Move __run_timers() function
  timers: Remove set_timer_slack() leftovers
  timers: Switch to a non-cascading wheel
  timers: Reduce the CPU index space to 256k
  timers: Give a few structs and members proper names
  hlist: Add hlist_is_singular_node() helper
  signals: Use hrtimer for sigtimedwait()
  timers: Remove the deprecated mod_timer_pinned() API
  timers, net/ipv4/inet: Initialize connection request timers as pinned
  timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
  timers, drivers/tty/metag_da: Initialize the poll timer as pinned
  ...
2016-07-25 20:43:12 -07:00
Linus Torvalds 0f657262d5 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "Various x86 low level modifications:

   - preparatory work to support virtually mapped kernel stacks (Andy
     Lutomirski)

   - support for 64-bit __get_user() on 32-bit kernels (Benjamin
     LaHaise)

   - (involved) workaround for Knights Landing CPU erratum (Dave Hansen)

   - MPX enhancements (Dave Hansen)

   - mremap() extension to allow remapping of the special VDSO vma, for
     purposes of user level context save/restore (Dmitry Safonov)

   - hweight and entry code cleanups (Borislav Petkov)

   - bitops code generation optimizations and cleanups with modern GCC
     (H. Peter Anvin)

   - syscall entry code optimizations (Paolo Bonzini)"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  x86/mm/cpa: Add missing comment in populate_pdg()
  x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
  x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
  x86/smp: Remove unnecessary initialization of thread_info::cpu
  x86/smp: Remove stack_smp_processor_id()
  x86/uaccess: Move thread_info::addr_limit to thread_struct
  x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
  x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
  x86/dumpstack: When OOPSing, rewind the stack before do_exit()
  x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
  x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
  x86/dumpstack: Try harder to get a call trace on stack overflow
  x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
  x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
  x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
  x86/mm: Use pte_none() to test for empty PTE
  x86/mm: Disallow running with 32-bit PTEs to work around erratum
  x86/mm: Ignore A/D bits in pte/pmd/pud_none()
  x86/mm: Move swap offset/type up in PTE to work around erratum
  x86/entry: Inline enter_from_user_mode()
  ...
2016-07-25 15:34:18 -07:00
Linus Torvalds c86ad14d30 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The locking tree was busier in this cycle than the usual pattern - a
  couple of major projects happened to coincide.

  The main changes are:

   - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
     across all SMP architectures (Peter Zijlstra)

   - add atomic_fetch_{inc/dec}() as well, using the generic primitives
     (Davidlohr Bueso)

   - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
     Waiman Long)

   - optimize smp_cond_load_acquire() on arm64 and implement LSE based
     atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
     on arm64 (Will Deacon)

   - introduce smp_acquire__after_ctrl_dep() and fix various barrier
     mis-uses and bugs (Peter Zijlstra)

   - after discovering ancient spin_unlock_wait() barrier bugs in its
     implementation and usage, strengthen its semantics and update/fix
     usage sites (Peter Zijlstra)

   - optimize mutex_trylock() fastpath (Peter Zijlstra)

   - ... misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
  locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
  locking/static_keys: Fix non static symbol Sparse warning
  locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
  locking/atomic, arch/tile: Fix tilepro build
  locking/atomic, arch/m68k: Remove comment
  locking/atomic, arch/arc: Fix build
  locking/Documentation: Clarify limited control-dependency scope
  locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
  locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
  locking/atomic, arch/mips: Convert to _relaxed atomics
  locking/atomic, arch/alpha: Convert to _relaxed atomics
  locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
  locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
  locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
  locking/atomic: Fix atomic64_relaxed() bits
  locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  ...
2016-07-25 12:41:29 -07:00
Andy Lutomirski 13d4ea097d x86/uaccess: Move thread_info::addr_limit to thread_struct
struct thread_info is a legacy mess.  To prepare for its partial removal,
move thread_info::addr_limit out.

As an added benefit, this way is simpler.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:30 +02:00
Thomas Gleixner 53bf837b78 timers: Remove set_timer_slack() leftovers
We now have implicit batching in the timer wheel. The slack API is no longer
used, so remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrew F. Davis <afd@ti.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jaehoon Chung <jh80.chung@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>
Cc: Pali Rohár <pali.rohar@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:09 +02:00
David Howells c1adf20052 Introduce rb_replace_node_rcu()
Implement an RCU-safe variant of rb_replace_node() and rearrange
rb_replace_node() to do things in the same order.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-07-06 10:51:14 +01:00
Theodore Ts'o e192be9d9a random: replace non-blocking pool with a Chacha20-based CRNG
The CRNG is faster, and we don't pretend to track entropy usage in the
CRNG any more.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-07-03 00:57:23 -04:00
Herbert Xu 127827b9c2 lib/mpi: Do not do sg_virt
Currently the mpi SG helpers use sg_virt which is completely
broken.  It happens to work with normal kernel memory but will
fail with anything that is not linearly mapped.

This patch fixes this by using the SG iterator helpers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-01 23:45:18 +08:00
Herbert Xu 9b45b7bba3 crypto: rsa - Generate fixed-length output
Every implementation of RSA that we have naturally generates
output with leading zeroes.  The one and only user of RSA,
pkcs1pad wants to have those leading zeroes in place, in fact
because they are currently absent it has to write those zeroes
itself.

So we shouldn't be stripping leading zeroes in the first place.
In fact this patch makes rsa-generic produce output with fixed
length so that pkcs1pad does not need to do any extra work.

This patch also changes DH to use the new interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-01 23:45:18 +08:00
Peter Zijlstra 28aa2bda22 locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
Now that all the architectures have implemented support for these new
atomic primitives add on the generic infrastructure to expose and use
it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-16 10:48:32 +02:00
Paul E. McKenney 4e9a073f60 torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code
This commit removes CONFIG_RCU_TORTURE_TEST_RUNNABLE in favor of the
already-existing rcutorture.torture_runnable kernel boot parameter.
It also converts an #ifdef into IS_ENABLED(), saving a few lines of code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:02:15 -07:00
Paul E. McKenney f8cbdee99b torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE
This commit applies the infamous IS_ENABLED() macro to eliminate a #ifdef.
It also eliminates the RCU_PERF_TEST_RUNNABLE Kconfig option in favor
of the already-existing rcuperf.perf_runnable kernel boot parameter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:02:15 -07:00
Linus Torvalds 8387ff2577 vfs: make the string hashes salt the hash
We always mixed in the parent pointer into the dentry name hash, but we
did it late at lookup time.  It turns out that we can simplify that
lookup-time action by salting the hash with the parent pointer early
instead of late.

A few other users of our string hashes also wanted to mix in their own
pointers into the hash, and those are updated to use the same mechanism.

Hash users that don't have any particular initial salt can just use the
NULL pointer as a no-salt.

Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-10 20:21:46 -07:00
Ming Lei 1bdc76aea1 iov_iter: use bvec iterator to implement iterate_bvec()
bvec has one native/mature iterator for long time, so not
necessary to use the reinvented wheel for iterating bvecs
in lib/iov_iter.c.

Two ITER_BVEC test cases are run:
	- xfstest(-g auto) on loop dio/aio, no regression found
	- swap file works well under extreme stress(stress-ng --all 64 -t
	  800 -v), and lots of OOMs are triggerd, and the whole
	system still survives

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-09 10:02:47 -06:00
Borislav Petkov f5967101e9 x86/hweight: Get rid of the special calling convention
People complained about ARCH_HWEIGHT_CFLAGS and how it throws a wrench
into kcov, lto, etc, experimentations.

Add asm versions for __sw_hweight{32,64}() and do explicit saving and
restoring of clobbered registers. This gets rid of the special calling
convention. We get to call those functions on !X86_FEATURE_POPCNT CPUs.

We still need to hardcode POPCNT and register operands as some old gas
versions which we support, do not know about POPCNT.

Btw, remove redundant REX prefix from 32-bit POPCNT because alternatives
can do padding now.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1464605787-20603-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-08 15:01:02 +02:00
Borislav Petkov 7557933e6b lib/cpio: Make find_cpio_data()'s offset arg optional
Some callers don't use it so make it optional.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1465225850-7352-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-08 11:04:19 +02:00
Emese Revfy 543c37cb16 Add sancov plugin
The sancov gcc plugin inserts a __sanitizer_cov_trace_pc() call
at the start of basic blocks.

This plugin is a helper plugin for the kcov feature. It supports
all gcc versions with plugin support (from gcc-4.5 on).
It is based on the gcc commit "Add fuzzing coverage support" by Dmitry Vyukov
(https://gcc.gnu.org/viewcvs/gcc?limit_changes=0&view=revision&revision=231296).

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-06-07 22:57:10 +02:00
Nicolai Stange 20b5b7f3c2 lib/mpi: refactor mpi_read_from_buffer() in terms of mpi_read_raw_data()
mpi_read_from_buffer() and mpi_read_raw_data() do basically the same thing
except that the former extracts the number of payload bits from the first
two bytes of the input buffer.

Besides that, the data copying logic is exactly the same.

Replace the open coded buffer to MPI instance conversion by a call to
mpi_read_raw_data().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-31 16:42:01 +08:00
Nicolai Stange cdf24b42c6 lib/mpi: mpi_read_from_buffer(): sanitize short buffer printk
The first two bytes of the input buffer encode its expected length and
mpi_read_from_buffer() prints a console message if the given buffer is too
short.

However, there are some oddities with how this message is printed:
- It is printed at the default loglevel. This is different from the
  one used in the case that the first two bytes' value is unsupportedly
  large, i.e. KERN_INFO.
- The format specifier '%d' is used for unsigned ints.
- It prints the values of nread and *ret_nread. This is redundant since
  the former is always the latter + 1.

Clean this up as follows:
- Use pr_info() rather than printk() with no loglevel.
- Use the format specifiers '%u' in place if '%d'.
- Do not print the redundant 'nread' but the more helpful 'nbytes' value.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-31 16:42:00 +08:00
Nicolai Stange 7af791e0f0 lib/mpi: mpi_read_from_buffer(): return -EINVAL upon too short buffer
Currently, if the input buffer is shorter than the expected length as
indicated by its first two bytes, an MPI instance of this expected length
will be allocated and filled with as much data as is available. The rest
will remain uninitialized.

Instead of leaving this condition undetected, an error code should be
reported to the caller.

Since this situation indicates that the input buffer's first two bytes,
encoding the number of expected bits, are garbled, -EINVAL is appropriate
here.

If the input buffer is shorter than indicated by its first two bytes,
make mpi_read_from_buffer() return -EINVAL.
Get rid of the 'nread' variable: with the new semantics, the total number
of bytes read from the input buffer is known in advance.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-31 16:42:00 +08:00
Nicolai Stange c5ce7c697c lib/digsig: digsig_verify_rsa(): return -EINVAL if modulo length is zero
Currently, if digsig_verify_rsa() detects that the modulo's length is zero,
i.e. mlen == 0, it returns -ENOMEM which doesn't really fit here.

Make digsig_verify_rsa() return -EINVAL upon mlen == 0.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-31 16:42:00 +08:00
Nicolai Stange 03cdfaad49 lib/mpi: mpi_read_from_buffer(): return error code
mpi_read_from_buffer() reads a MPI from a buffer into a newly allocated
MPI instance. It expects the buffer's leading two bytes to contain the
number of bits, followed by the actual payload.

On failure, it returns NULL and updates the in/out argument ret_nread
somewhat inconsistently:
- If the given buffer is too short to contain the leading two bytes
  encoding the number of bits or their value is unsupported, then
  ret_nread will be cleared.
- If the allocation of the resulting MPI instance fails, ret_nread is left
  as is.

The only user of mpi_read_from_buffer(), digsig_verify_rsa(), simply checks
for a return value of NULL and returns -ENOMEM if that happens.

While this is all of cosmetic nature only, there is another error condition
which currently isn't detectable by the caller of mpi_read_from_buffer():
if the given buffer is too small to hold the number of bits as encoded in
its first two bytes, the return value will be non-NULL and *ret_nread > 0.

In preparation of communicating this condition to the caller, let
mpi_read_from_buffer() return error values by means of the ERR_PTR()
mechanism.

Make the sole caller of mpi_read_from_buffer(), digsig_verify_rsa(),
check the return value for IS_ERR() rather than == NULL. If IS_ERR() is
true, return the associated error value rather than the fixed -ENOMEM.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-31 16:41:59 +08:00
Nicolai Stange eef0df6a59 lib/mpi: mpi_read_raw_data(): fix nbits calculation
The number of bits, nbits, is calculated in mpi_read_raw_data() as follows:

  nbits = nbytes * 8;

Afterwards, the number of leading zero bits of the first byte get
subtracted:

  nbits -= count_leading_zeros(buffer[0]);

However, count_leading_zeros() takes an unsigned long and thus,
the u8 gets promoted to an unsigned long.

Thus, the above doesn't subtract the number of leading zeros in the most
significant nonzero input byte from nbits, but the number of leading
zeros of the most significant nonzero input byte promoted to unsigned long,
i.e. BITS_PER_LONG - 8 too many.

Fix this by subtracting

  count_leading_zeros(...) - (BITS_PER_LONG - 8)

from nbits only.

Fixes: e104599294 ("MPILIB: Provide a function to read raw data into an
                     MPI")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-31 16:41:59 +08:00
Nicolai Stange dfd9051067 lib/mpi: mpi_read_raw_data(): purge redundant clearing of nbits
In mpi_read_raw_data(), unsigned nbits is calculated as follows:

 nbits = nbytes * 8;

and redundantly cleared later on if nbytes == 0:

  if (nbytes > 0)
    ...
  else
    nbits = 0;

Purge this redundant clearing for the sake of clarity.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-31 16:41:58 +08:00
Nicolai Stange 4bdf1cfca5 lib/mpi: purge mpi_set_buffer()
mpi_set_buffer() has no in-tree users and similar functionality is provided
by mpi_read_raw_data().

Remove mpi_set_buffer().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-31 16:41:58 +08:00
Bjørn Mork bc9dc9d5ee lib/uuid.c: use correct offset in uuid parser
Use '+ 0' and '+ 1' as offsets, like they were intended, instead of
adding to the result.

Fixes: 2b1b0d6670 ("lib/uuid.c: introduce a few more generic helpers")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-30 15:26:57 -07:00
Andy Shevchenko cfaff0e515 lib/uuid: add a test module
It appears that somehow I missed a test of the latest UUID rework which
landed in the kernel.  Present a small test module to avoid such cases
in the future.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-30 15:26:57 -07:00
Linus Torvalds 7e0fb73c52 Merge branch 'hash' of git://ftp.sciencehorizons.net/linux
Pull string hash improvements from George Spelvin:
 "This series does several related things:

   - Makes the dcache hash (fs/namei.c) useful for general kernel use.

     (Thanks to Bruce for noticing the zero-length corner case)

   - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
     above.

   - Avoids 64-bit multiplies in hash_64() on 32-bit platforms.  Two
     32-bit multiplies will do well enough.

   - Rids the world of the bad hash multipliers in hash_32.

     This finishes the job started in commit 689de1d6ca ("Minimal
     fix-up of bad hashing behavior of hash_64()")

     The vast majority of Linux architectures have hardware support for
     32x32-bit multiply and so derive no benefit from "simplified"
     multipliers.

     The few processors that do not (68000, h8/300 and some models of
     Microblaze) have arch-specific implementations added.  Those
     patches are last in the series.

   - Overhauls the dcache hash mixing.

     The patch in commit 0fed3ac866 ("namei: Improve hash mixing if
     CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
     Replaced with a much more careful design that's simultaneously
     faster and better.  (My own invention, as there was noting suitable
     in the literature I could find.  Comments welcome!)

   - Modify the hash_name() loop to skip the initial HASH_MIX().  This
     would let us salt the hash if we ever wanted to.

   - Sort out partial_name_hash().

     The hash function is declared as using a long state, even though
     it's truncated to 32 bits at the end and the extra internal state
     contributes nothing to the result.  And some callers do odd things:

      - fs/hfs/string.c only allocates 32 bits of state
      - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

   - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
     rather than 0..sizeof(long)-1.  This would simplify users other
     than full_name_hash"

  Special thanks to Bruce Fields for testing and finding bugs in v1.  (I
  learned some humbling lessons about "obviously correct" code.)

  On the arch-specific front, the m68k assembly has been tested in a
  standalone test harness, I've been in contact with the Microblaze
  maintainers who mostly don't care, as the hardware multiplier is never
  omitted in real-world applications, and I haven't heard anything from
  the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
  h8300: Add <asm/hash.h>
  microblaze: Add <asm/hash.h>
  m68k: Add <asm/hash.h>
  <linux/hash.h>: Add support for architecture-specific functions
  fs/namei.c: Improve dcache hash function
  Eliminate bad hash multipliers from hash_32() and  hash_64()
  Change hash_64() return value to 32 bits
  <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
  fs/namei.c: Add hashlen_string() function
  Pull out string hash to <linux/stringhash.h>
2016-05-28 16:15:25 -07:00
George Spelvin 468a942852 <linux/hash.h>: Add support for architecture-specific functions
This is just the infrastructure; there are no users yet.

This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/hash.h>.

That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis <alistai@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
2016-05-28 15:48:31 -04:00
Ville Syrjälä 3017cd63f2 dma-debug: avoid spinlock recursion when disabling dma-debug
With netconsole (at least) the pr_err("...  disablingn") call can
recurse back into the dma-debug code, where it'll try to grab
free_entries_lock again.  Avoid the problem by doing the printk after
dropping the lock.

Link: http://lkml.kernel.org/r/1463678421-18683-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-26 15:35:44 -07:00
Linus Torvalds 0985b65d3b Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs iov_iter regression fix from Al Viro:
 "Fix for braino in 'fold checks into iterate_and_advance()'"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  do "fold checks into iterate_and_advance()" right
2016-05-25 15:59:09 -07:00
Al Viro 19f1845933 do "fold checks into iterate_and_advance()" right
the only case when we should skip the iterate_and_advance() guts
is when nothing's left in the iterator, _not_ just when requested
amount is 0.  Said guts will do nothing in the latter case anyway;
the problem we tried to deal with in the aforementioned commit is
that when there's nothing left *and* the amount requested is 0,
we might end up deferencing one iovec too many; the value we fetch
from there is discarded in that case, but theoretically it might
oops if the iovec array ends exactly at the end of page with the
next page not mapped.

Bailing out on zero size requested had an unexpected side effect -
zero-length segment in the beginning of iovec array ended up
throwing do_loop_readv_writev() into infinite spin; we do not
advance past the empty segment at all.  Reproducer is trivial:
echo '#include <sys/uio.h>' >a.c
echo 'main() {char c; struct iovec v[] = {{&c,0},{&c,1}}; readv(0,v,2);}' >>a.c
cc a.c && ./a.out </proc/uptime

which should end up with the process not hanging.  Probably ought to
go into LTP or xfstests...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-25 18:16:02 -04:00
Jiri Slaby c5d2cac0f1 kgdb: depends on VT
With VT=n, the kernel build fails with:

  drivers/built-in.o: In function `kgdboc_pre_exp_handler':
  kgdboc.c:(.text+0x7b5aa): undefined reference to `fg_console'
  kgdboc.c:(.text+0x7b5ce): undefined reference to `vc_cons'
  kgdboc.c:(.text+0x7b5d5): undefined reference to `vc_cons'

kgdboc.o is built when KGDB_SERIAL_CONSOLE is set.  So make
KGDB_SERIAL_CONSOLE depend on HW_CONSOLE which includes those symbols.

Link: http://lkml.kernel.org/r/1459412955-4696-1-git-send-email-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: "Jim Davis" <jim.epost@gmail.com>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Linus Torvalds 5469dc270c Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - the rest of MM

 - KASAN updates

 - procfs updates

 - exit, fork updates

 - printk updates

 - lib/ updates

 - radix-tree testsuite updates

 - checkpatch updates

 - kprobes updates

 - a few other misc bits

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
  samples/kprobes: print out the symbol name for the hooks
  samples/kprobes: add a new module parameter
  kprobes: add the "tls" argument for j_do_fork
  init/main.c: simplify initcall_blacklisted()
  fs/efs/super.c: fix return value
  checkpatch: improve --git <commit-count> shortcut
  checkpatch: reduce number of `git log` calls with --git
  checkpatch: add support to check already applied git commits
  checkpatch: add --list-types to show message types to show or ignore
  checkpatch: advertise the --fix and --fix-inplace options more
  checkpatch: whine about ACCESS_ONCE
  checkpatch: add test for keywords not starting on tabstops
  checkpatch: improve CONSTANT_COMPARISON test for structure members
  checkpatch: add PREFER_IS_ENABLED test
  lib/GCD.c: use binary GCD algorithm instead of Euclidean
  radix-tree: free up the bottom bit of exceptional entries for reuse
  dax: move RADIX_DAX_ definitions to dax.c
  radix-tree: make radix_tree_descend() more useful
  radix-tree: introduce radix_tree_replace_clear_tags()
  radix-tree: tidy up __radix_tree_create()
  ...
2016-05-20 22:31:33 -07:00
Linus Torvalds 3aa2fc1667 driver core update for 4.7-rc1
Here's the "big" driver core update for 4.7-rc1.
 
 Mostly just debugfs changes, the long-known and messy races with removing
 debugfs files should be fixed thanks to the great work of Nicolai Stange.  We
 also have some isa updates in here (the x86 maintainers told me to take it
 through this tree), a new warning when we run out of dynamic char major
 numbers, and a few other assorted changes, details in the shortlog.
 
 All have been in linux-next for some time with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlc/0mwACgkQMUfUDdst+ynjXACgjNxR5nMUiM8ZuuD0i4Xj7VXd
 hnIAoM08+XDCv41noGdAcKv+2WZVZWMC
 =i+0H
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here's the "big" driver core update for 4.7-rc1.

  Mostly just debugfs changes, the long-known and messy races with
  removing debugfs files should be fixed thanks to the great work of
  Nicolai Stange.  We also have some isa updates in here (the x86
  maintainers told me to take it through this tree), a new warning when
  we run out of dynamic char major numbers, and a few other assorted
  changes, details in the shortlog.

  All have been in linux-next for some time with no reported issues"

* tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
  Revert "base: dd: don't remove driver_data in -EPROBE_DEFER case"
  gpio: ws16c48: Utilize the ISA bus driver
  gpio: 104-idio-16: Utilize the ISA bus driver
  gpio: 104-idi-48: Utilize the ISA bus driver
  gpio: 104-dio-48e: Utilize the ISA bus driver
  watchdog: ebc-c384_wdt: Utilize the ISA bus driver
  iio: stx104: Utilize the module_isa_driver and max_num_isa_dev macros
  iio: stx104: Add X86 dependency to STX104 Kconfig option
  Documentation: Add ISA bus driver documentation
  isa: Implement the max_num_isa_dev macro
  isa: Implement the module_isa_driver macro
  pnp: pnpbios: Add explicit X86_32 dependency to PNPBIOS
  isa: Decouple X86_32 dependency from the ISA Kconfig option
  driver-core: use 'dev' argument in dev_dbg_ratelimited stub
  base: dd: don't remove driver_data in -EPROBE_DEFER case
  kernfs: Move faulting copy_user operations outside of the mutex
  devcoredump: add scatterlist support
  debugfs: unproxify files created through debugfs_create_u32_array()
  debugfs: unproxify files created through debugfs_create_blob()
  debugfs: unproxify files created through debugfs_create_bool()
  ...
2016-05-20 21:26:15 -07:00
Zhaoxiu Zeng fff7fb0b2d lib/GCD.c: use binary GCD algorithm instead of Euclidean
The binary GCD algorithm is based on the following facts:
	1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
	3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

Even on x86 machines with reasonable division hardware, the binary
algorithm runs about 25% faster (80% the execution time) than the
division-based Euclidian algorithm.

On platforms like Alpha and ARMv6 where division is a function call to
emulation code, it's even more significant.

There are two variants of the code here, depending on whether a fast
__ffs (find least significant set bit) instruction is available.  This
allows the unpredictable branches in the bit-at-a-time shifting loop to
be eliminated.

If fast __ffs is not available, the "even/odd" GCD variant is used.

I use the following code to benchmark:

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <string.h>
	#include <time.h>
	#include <unistd.h>

	#define swap(a, b) \
		do { \
			a ^= b; \
			b ^= a; \
			a ^= b; \
		} while (0)

	unsigned long gcd0(unsigned long a, unsigned long b)
	{
		unsigned long r;

		if (a < b) {
			swap(a, b);
		}

		if (b == 0)
			return a;

		while ((r = a % b) != 0) {
			a = b;
			b = r;
		}

		return b;
	}

	unsigned long gcd1(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b >>= __builtin_ctzl(b);

		for (;;) {
			a >>= __builtin_ctzl(a);
			if (a == b)
				return a << __builtin_ctzl(r);

			if (a < b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd2(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &= -r;

		while (!(b & r))
			b >>= 1;

		for (;;) {
			while (!(a & r))
				a >>= 1;
			if (a == b)
				return a;

			if (a < b)
				swap(a, b);
			a -= b;
			a >>= 1;
			if (a & r)
				a += b;
			a >>= 1;
		}
	}

	unsigned long gcd3(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b >>= __builtin_ctzl(b);
		if (b == 1)
			return r & -r;

		for (;;) {
			a >>= __builtin_ctzl(a);
			if (a == 1)
				return r & -r;
			if (a == b)
				return a << __builtin_ctzl(r);

			if (a < b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd4(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &= -r;

		while (!(b & r))
			b >>= 1;
		if (b == r)
			return r;

		for (;;) {
			while (!(a & r))
				a >>= 1;
			if (a == r)
				return r;
			if (a == b)
				return a;

			if (a < b)
				swap(a, b);
			a -= b;
			a >>= 1;
			if (a & r)
				a += b;
			a >>= 1;
		}
	}

	static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
		gcd0, gcd1, gcd2, gcd3, gcd4,
	};

	#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

	#if defined(__x86_64__)

	#define rdtscll(val) do { \
		unsigned long __a,__d; \
		__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
		(val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \
	} while(0)

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		unsigned long long start, end;
		unsigned long long ret;
		unsigned long gcd_res;

		rdtscll(start);
		gcd_res = gcd(a, b);
		rdtscll(end);

		if (end >= start)
			ret = end - start;
		else
			ret = ~0ULL - start + 1 + end;

		*res = gcd_res;
		return ret;
	}

	#else

	static inline struct timespec read_time(void)
	{
		struct timespec time;
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
		return time;
	}

	static inline unsigned long long diff_time(struct timespec start, struct timespec end)
	{
		struct timespec temp;

		if ((end.tv_nsec - start.tv_nsec) < 0) {
			temp.tv_sec = end.tv_sec - start.tv_sec - 1;
			temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
		} else {
			temp.tv_sec = end.tv_sec - start.tv_sec;
			temp.tv_nsec = end.tv_nsec - start.tv_nsec;
		}

		return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
	}

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		struct timespec start, end;
		unsigned long gcd_res;

		start = read_time();
		gcd_res = gcd(a, b);
		end = read_time();

		*res = gcd_res;
		return diff_time(start, end);
	}

	#endif

	static inline unsigned long get_rand()
	{
		if (sizeof(long) == 8)
			return (unsigned long)rand() << 32 | rand();
		else
			return rand();
	}

	int main(int argc, char **argv)
	{
		unsigned int seed = time(0);
		int loops = 100;
		int repeats = 1000;
		unsigned long (*res)[TEST_ENTRIES];
		unsigned long long elapsed[TEST_ENTRIES];
		int i, j, k;

		for (;;) {
			int opt = getopt(argc, argv, "n:r:s:");
			/* End condition always first */
			if (opt == -1)
				break;

			switch (opt) {
			case 'n':
				loops = atoi(optarg);
				break;
			case 'r':
				repeats = atoi(optarg);
				break;
			case 's':
				seed = strtoul(optarg, NULL, 10);
				break;
			default:
				/* You won't actually get here. */
				break;
			}
		}

		res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
		memset(elapsed, 0, sizeof(elapsed));

		srand(seed);
		for (j = 0; j < loops; j++) {
			unsigned long a = get_rand();
			/* Do we have args? */
			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			unsigned long long min_elapsed[TEST_ENTRIES];
			for (k = 0; k < repeats; k++) {
				for (i = 0; i < TEST_ENTRIES; i++) {
					unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
					if (k == 0 || min_elapsed[i] > tmp)
						min_elapsed[i] = tmp;
				}
			}
			for (i = 0; i < TEST_ENTRIES; i++)
				elapsed[i] += min_elapsed[i];
		}

		for (i = 0; i < TEST_ENTRIES; i++)
			printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

		k = 0;
		srand(seed);
		for (j = 0; j < loops; j++) {
			unsigned long a = get_rand();
			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			for (i = 1; i < TEST_ENTRIES; i++) {
				if (res[j][i] != res[j][0])
					break;
			}
			if (i < TEST_ENTRIES) {
				if (k == 0) {
					k = 1;
					fprintf(stderr, "Error:\n");
				}
				fprintf(stderr, "gcd(%lu, %lu): ", a, b);
				for (i = 0; i < TEST_ENTRIES; i++)
					fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
			}
		}

		if (k == 0)
			fprintf(stderr, "PASS\n");

		free(res);

		return 0;
	}

Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 10174
  gcd1: elapsed 2120
  gcd2: elapsed 2902
  gcd3: elapsed 2039
  gcd4: elapsed 2812
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9309
  gcd1: elapsed 2280
  gcd2: elapsed 2822
  gcd3: elapsed 2217
  gcd4: elapsed 2710
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9589
  gcd1: elapsed 2098
  gcd2: elapsed 2815
  gcd3: elapsed 2030
  gcd4: elapsed 2718
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9914
  gcd1: elapsed 2309
  gcd2: elapsed 2779
  gcd3: elapsed 2228
  gcd4: elapsed 2709
  PASS

[akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 9e85d81119 radix-tree: make radix_tree_descend() more useful
Now that the shift amount is stored in the node, radix_tree_descend()
can calculate offset itself from index, which removes several lines of
code from each of the tree walkers.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox d604c32452 radix-tree: introduce radix_tree_replace_clear_tags()
In addition to replacing the entry, we also clear all associated tags.
This is really a one-off special for page_cache_tree_delete() which had
far too much detailed knowledge about how the radix tree works.

For efficiency, factor node_tag_clear() out of radix_tree_tag_clear() It
can be used by radix_tree_delete_item() as well as
radix_tree_replace_clear_tags().

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 89148aa402 radix-tree: tidy up __radix_tree_create()
1. Rename the existing variable 'slot' to 'child'.
2. Introduce a new variable called 'slot' which is the address of the
   slot we're dealing with.  This lets us simplify the tree insertion,
   and removes the recalculation of 'slot' at the end of the function.
3. Using 'slot' in the sibling pointer insertion part makes the code
   more readable.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox a8e4da25d3 radix-tree: tidy up range_tag_if_tagged
Convert radix_tree_range_tag_if_tagged to name the nodes parent, node
and child instead of node & slot.

Use parent->offset instead of playing games with 'upindex'.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 8c1244de00 radix-tree: tidy up next_chunk
Convert radix_tree_next_chunk to use 'child' instead of 'slot' as the
name of the child node.  Also use node_maxindex() where it makes sense.

The 'rnode' variable was unnecessary; it doesn't overlap in usage with
'node', so we can just use 'node' the whole way through the function.

Improve the testcase to start the walk from every index in the carefully
constructed tree, and to accept any index within the range covered by
the entry.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox af49a63e10 radix-tree: change naming conventions in radix_tree_shrink
Use the more standard 'node' and 'child' instead of 'to_free' and
'slot'.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox b194d16c27 radix-tree: rename radix_tree_is_indirect_ptr()
As with indirect_to_ptr(), ptr_to_indirect() and
RADIX_TREE_INDIRECT_PTR, change radix_tree_is_indirect_ptr() to
radix_tree_is_internal_node().

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 4dd6c0987c radix-tree: rename indirect_to_ptr() to entry_to_node()
Mirrors the earlier commit introducing node_to_entry().

Also change the type returned to be a struct radix_tree_node pointer.
That lets us simplify a couple of places in the radix tree shrink &
extend paths where we could convert an entry into a pointer, modify the
node, then convert the pointer back into an entry.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox a4db4dcea1 radix-tree: rename ptr_to_indirect() to node_to_entry()
ptr_to_indirect() was a bad name.  What it really means is "Convert this
pointer to a node into an entry suitable for storing in the radix tree".
So node_to_entry() seemed like a better name.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 30ff46ccb3 radix-tree: rename INDIRECT_PTR to INTERNAL_NODE
The name RADIX_TREE_INDIRECT_PTR doesn't really match the meaning.
RADIX_TREE_INTERNAL_NODE is a better name.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox d0891265bb radix-tree: remove root->height
The only remaining references to root->height were in extend and shrink,
where it was updated.  Now we can remove it entirely.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox fb209019c9 radix-tree: remove a use of root->height from delete_node
If radix_tree_shrink returns whether it managed to shrink, then
__radix_tree_delete_node doesn't ned to query the tree to find out
whether it did any work or not.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox c12e51b07b radix-tree: replace node->height with node->shift
node->shift represents the shift necessary for looking in the slots
array at this level.  It is equal to the old (node->height - 1) *
RADIX_TREE_MAP_SHIFT.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 0c7fa0a841 radix-tree: split node->path into offset and height
Neither piece of information we're storing in node->path can be larger
than 64, so store each in its own unsigned char instead of shifting and
masking to store them both in an unsigned int.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 2fcd9005cc radix-tree: miscellaneous fixes
Typos, whitespace, grammar, line length, using the correct types, etc.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 6b053b8e5f radix-tree: add copyright statements
The multiorder support is a sufficiently large feature to be worth
adding copyrigt lines for.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Ross Zwisler 0796c58325 radix-tree: fix radix_tree_dump() for multi-order entries
- Print which indices are covered by every leaf entry
 - Print sibling entries
 - Print the node pointer instead of the slot entry
 - Build by default in userspace, and make it accessible to the test-suite

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 070c5ac274 radix-tree: fix radix_tree_range_tag_if_tagged() for multiorder entries
I had previously decided that tagging a single multiorder entry would
count as tagging 2^order entries for the purposes of 'nr_to_tag'.  I now
believe that decision to be a mistake, and it should count as a single
entry.  That's more likely to be what callers expect.

When walking back up the tree from a newly-tagged entry, the current
code assumed we were starting from the lowest level of the tree; if we
have a multiorder entry with an order at least RADIX_TREE_MAP_SHIFT in
size then we need to shift the index by 'shift' before we start walking
back up the tree, or we will end up not setting tags on higher entries,
and then mistakenly thinking that entries below a certain point in the
tree are not tagged.

If the first index we examine is a sibling entry of a tagged multiorder
entry, we were not tagging it.  We need to examine the canonical entry,
and the easiest way to do that is to use radix_tree_descend().  We then
have to skip over sibling slots when looking for the next entry in the
tree or we will end up walking back to the canonical entry.

Add several tests for radix_tree_range_tag_if_tagged().

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 0a2efc6c80 radix-tree: rewrite radix_tree_locate_item
Use the new multi-order support functions to rewrite
radix_tree_locate_item().  Modify the locate tests to test multiorder
entries too.

[hughd@google.com: radix_tree_locate_item() is often returning the wrong index]
  Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1605012108490.1166@eggly.anvils
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 8a14f4d832 radix-tree: fix radix_tree_create for sibling entries
If the radix tree user attempted to insert a colliding entry with an
existing multiorder entry, then radix_tree_create() could encounter a
sibling entry when walking down the tree to look for a slot.  Use
radix_tree_descend() to fix the problem, and add a test-case to make
sure the problem doesn't come back in future.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Ross Zwisler 4589ba6d0f radix-tree: rewrite radix_tree_tag_get
Use the new multi-order support functions to rewrite
radix_tree_tag_get()

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Ross Zwisler 00f47b5811 radix-tree: rewrite radix_tree_tag_clear
Use the new multi-order support functions to rewrite
radix_tree_tag_clear()

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Ross Zwisler fb969909dd radix-tree: rewrite radix_tree_tag_set
Use the new multi-order support functions to rewrite
radix_tree_tag_set()

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Ross Zwisler 21ef533931 radix-tree: add support for multi-order iterating
This enables the macros radix_tree_for_each_slot() and friends to be
used with multi-order entries.

The way that this works is that we treat all entries in a given slots[]
array as a single chunk.  If the index given to radix_tree_next_chunk()
happens to point us to a sibling entry, we will back up iter->index so
that it points to the canonical entry, and that will be the place where
we start our iteration.

As we're processing a chunk in radix_tree_next_slot(), we process
canonical entries, skip over sibling entries, and restart the chunk
lookup if we find a non-sibling indirect pointer.  This drops back to
the radix_tree_next_chunk() code, which will re-walk the tree and look
for another chunk.

This allows us to properly handle multi-order entries mixed with other
entries that are at various heights in the radix tree.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 7b60e9ad59 radix-tree: fix multiorder BUG_ON in radix_tree_insert
These BUG_ON tests are to ensure that all the tags are clear when
inserting a new entry.  If we insert a multiorder entry, we'll end up
looking at the tags for a different node, and so the BUG_ON can end up
triggering spuriously.

Also, we now have three tags, not two, so check all three are clear, and
check all the root tags with a single call to BUG_ON since the bits are
stored contiguously.

Include a test-case to ensure this problem does not reoccur.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 858299544e radix-tree: rewrite __radix_tree_lookup
Use the new multi-order support functions to rewrite __radix_tree_lookup()

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox afe0e395b6 radix-tree: fix several shrinking bugs with multiorder entries
Setting the indirect bit on the user data entry used to be unambiguous
because the tree walking code knew not to expect internal nodes in the
last level of the tree.  Multiorder entries can appear at any level of
the tree, and a leaf with the indirect bit set is indistinguishable from
a pointer to a node.

Introduce a special entry (RADIX_TREE_RETRY) which is neither a valid
user entry, nor a valid pointer to a node.  The radix_tree_deref_retry()
function continues to work the same way, but tree walking code can
distinguish it from a pointer to a node.

Also fix the condition for setting slot->parent to NULL; it does not
matter what height the tree is, it only matters whether slot is an
indirect pointer.  Move this code above the comment which is referring
to the assignment to root->rnode.

Also fix the condition for preventing the tree from shrinking to a
single entry if it's a multiorder entry.

Add a test-case to the test suite that checks that the tree goes back
down to its original height after an item is inserted & deleted from a
higher index in the tree.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 49ea6ebcd3 radix-tree: fix extending the tree for multi-order entries at offset 0
The current code will insert entries at each level, as if we're going to
add a new entry at the bottom level, so we then get an -EEXIST when we
try to insert the entry into the tree.  The best way to fix this is to
not check 'order' when inserting into an empty tree.

We still need to 'extend' the tree to the height necessary for the maximum
index corresponding to this entry, so pass that value to
radix_tree_extend() rather than the index we're asked to create, or we
won't create a tree that's deep enough.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 1456a439fc radix-tree: introduce radix_tree_load_root()
All the tree walking functions start with some variant of this code;
centralise it in one place so we're not chasing subtly different bugs
everywhere.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox aa54757602 radix-tree: remove restriction on multi-order entries
Now that sibling pointers are handled explicitly, there is no purpose
served by restricting the order to be >= RADIX_TREE_MAP_SHIFT.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 29e0967c2f radix-tree: fix deleting a multi-order entry through an alias
If we deleted an entry through an index which looked up a sibling
pointer, we'd end up zeroing out the wrong slots in the node.  Use
get_slot_offset() to find the right slot.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 3b8c00f684 radix-tree: fix sibling entry insertion
The subtraction was the wrong way round, leading to undefined behaviour
(shift by an amount larger than the size of the type).

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox db050f2924 radix-tree: add missing sibling entry functionality
The code I previously added to enable multiorder radix tree entries was
untested and therefore buggy.  This commit adds the support functions
that Ross and I decided were necessary over a four-week period of
iterating various designs.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Matthew Wilcox 57578c2ea2 raxix-tree: introduce CONFIG_RADIX_TREE_MULTIORDER
I've been receiving increasingly concerned notes from 0day about how
much my recent changes have been bloating the radix tree.  Make it
happier by only including multiorder support if
CONFIG_TRANSPARENT_HUGEPAGES is set.

This is an independent Kconfig option, so other radix tree users can
also set it if they have a need.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jan Kara <jack@suse.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Andy Shevchenko e3a93bce69 lib/uuid.c: remove FSF address
There is no point in keeping an address in the file since it's subject
to change.

While here, update Intel Copyright years.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Andy Shevchenko 2b1b0d6670 lib/uuid.c: introduce a few more generic helpers
There are new helpers in this patch:

  uuid_is_valid		checks if a UUID is valid
  uuid_be_to_bin	converts from string to binary (big endian)
  uuid_le_to_bin	converts from string to binary (little endian)

They will be used in future, i.e. in the following patches in the series.

This also moves the indices arrays to lib/uuid.c to be shared accross
modules.

[andriy.shevchenko@linux.intel.com: fix typo]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Andy Shevchenko 8da4b8c48e lib/uuid.c: move generate_random_uuid() to uuid.c
Let's gather the UUID related functions under one hood.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Andy Shevchenko aa4ea1c3b3 lib/vsprintf: simplify UUID printing
There are few functions here and there along with type definitions that
provide UUID API.  This series consolidates everything under one hood
and converts current users.

This has been tested for a while internally, however it doesn't mean we
covered all possible cases (especially accuracy of UUID constants after
conversion).  So, please test this as much as you can and provide your
tag.  We appreciate the effort.

The ACPI conversion is postponed for now to sort more generic things out
first.

This patch (of 9):

Since we have hex_byte_pack_upper() we may use it directly and avoid
second loop.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Petr Mladek 42a0bb3f71 printk/nmi: generic solution for safe printk in NMI
printk() takes some locks and could not be used a safe way in NMI
context.

The chance of a deadlock is real especially when printing stacks from
all CPUs.  This particular problem has been addressed on x86 by the
commit a9edc88093 ("x86/nmi: Perform a safe NMI stack trace on all
CPUs").

The patchset brings two big advantages.  First, it makes the NMI
backtraces safe on all architectures for free.  Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited.  We still should keep the number of messages in NMI context at
minimum).

Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers.  These are not easy to avoid.

This patch reuses most of the code and makes it generic.  It is useful
for all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context.  It queues IRQ work to copy the messages into the
main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers.  There is also used a spinlock to get synchronized with other
flushers.

We do not longer use seq_buf because it depends on external lock.  It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.

The code is put into separate printk/nmi.c as suggested by Steven
Rostedt.  It needs a per-CPU buffer and is compiled only on
architectures that call nmi_enter().  This is achieved by the new
HAVE_NMI Kconfig flag.

The are MN10300 and Xtensa architectures.  We need to clean up NMI
handling there first.  Let's do it separately.

The patch is heavily based on the draft from Peter Zijlstra, see

  https://lkml.org/lkml/2015/6/10/327

[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>	[arm part]
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Andrey Ryabinin eae08dcab8 kasan/tests: add tests for user memory access functions
Add some tests for the newly-added user memory access API.

Link: http://lkml.kernel.org/r/1462538722-1574-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Andrey Ryabinin 1771c6e1a5 x86/kasan: instrument user memory access API
Exchange between user and kernel memory is coded in assembly language.
Which means that such accesses won't be spotted by KASAN as a compiler
instruments only C code.

Add explicit KASAN checks to user memory access API to ensure that
userspace writes to (or reads from) a valid kernel memory.

Note: Unlike others strncpy_from_user() is written mostly in C and KASAN
sees memory accesses in it.  However, it makes sense to add explicit
check for all @count bytes that *potentially* could be written to the
kernel.

[aryabinin@virtuozzo.com: move kasan check under the condition]
  Link: http://lkml.kernel.org/r/1462869209-21096-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1462538722-1574-4-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Alexander Potapenko 96fe805fb6 mm, kasan: add a ksize() test
Add a test that makes sure ksize() unpoisons the whole chunk.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Konstantin Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Linus Torvalds a05a70db34 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - fsnotify fix

 - poll() timeout fix

 - a few scripts/ tweaks

 - debugobjects updates

 - the (small) ocfs2 queue

 - Minor fixes to kernel/padata.c

 - Maybe half of the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  mm, page_alloc: restore the original nodemask if the fast path allocation failed
  mm, page_alloc: uninline the bad page part of check_new_page()
  mm, page_alloc: don't duplicate code in free_pcp_prepare
  mm, page_alloc: defer debugging checks of pages allocated from the PCP
  mm, page_alloc: defer debugging checks of freed pages until a PCP drain
  cpuset: use static key better and convert to new API
  mm, page_alloc: inline pageblock lookup in page free fast paths
  mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
  mm, page_alloc: pull out side effects from free_pages_check
  mm, page_alloc: un-inline the bad part of free_pages_check
  mm, page_alloc: check multiple page fields with a single branch
  mm, page_alloc: remove field from alloc_context
  mm, page_alloc: avoid looking up the first zone in a zonelist twice
  mm, page_alloc: shortcut watermark checks for order-0 pages
  mm, page_alloc: reduce cost of fair zone allocation policy retry
  mm, page_alloc: shorten the page allocator fast path
  mm, page_alloc: check once if a zone has isolated pageblocks
  mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
  mm, page_alloc: simplify last cpupid reset
  mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
  ...
2016-05-19 20:00:06 -07:00
Andrew Morton 0edaf86cf1 include/linux/nodemask.h: create next_node_in() helper
Lots of code does

	node = next_node(node, XXX);
	if (node == MAX_NUMNODES)
		node = first_node(XXX);

so create next_node_in() to do this and use it in various places.

[mhocko@suse.com: use next_node_in() helper]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Hui Zhu <zhuhui@xiaomi.com>
Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Du, Changbin b9fdac7f66 debugobjects: insulate non-fixup logic related to static obj from fixup callbacks
When activating a static object we need make sure that the object is
tracked in the object tracker.  If it is a non-static object then the
activation is illegal.

In previous implementation, each subsystem need take care of this in
their fixup callbacks.  Actually we can put it into debugobjects core.
Thus we can save duplicated code, and have *pure* fixup callbacks.

To achieve this, a new callback "is_static_object" is introduced to let
the type specific code decide whether a object is static or not.  If
yes, we take it into object tracker, otherwise give warning and invoke
fixup callback.

This change has paassed debugobjects selftest, and I also do some test
with all debugobjects supports enabled.

At last, I have a concern about the fixups that can it change the object
which is in incorrect state on fixup? Because the 'addr' may not point
to any valid object if a non-static object is not tracked.  Then Change
such object can overwrite someone's memory and cause unexpected
behaviour.  For example, the timer_fixup_activate bind timer to function
stub_timer.

Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
[changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
  Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Du, Changbin d99b1d8912 percpu_counter: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to
cheange (debugobjects: make fixup functions return bool instead of int).

Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Du, Changbin e7a8e78bd4 debugobjects: correct the usage of fixup call results
If debug_object_fixup() return non-zero when problem has been fixed.
But the code got it backwards, it taks 0 as fixup successfully.  So fix
it.

Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Du, Changbin b1e4d9d82d debugobjects: make fixup functions return bool instead of int
I am going to introduce debugobjects infrastructure to USB subsystem.
But before this, I found the code of debugobjects could be improved.
This patchset will make fixup functions return bool type instead of int.
Because fixup only need report success or no.  boolean is the 'real'
type.

This patch (of 7):

The object debugging infrastructure core provides some fixup callbacks
for the subsystem who use it.  These callbacks are called from the debug
code whenever a problem in debug_object_init is detected.  And
debugobjects core suppose them returns 1 when the fixup was successful,
otherwise 0.  So the return type is boolean.

A bad thing is that debug_object_fixup use the return value for
arithmetic operation.  It confused me that what is the reall return
type.

Reading over the whole code, I found some place do use the return value
incorrectly(see next patch).  So why use bool type instead?

Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Linus Torvalds f4f27d0028 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - A new LSM, "LoadPin", from Kees Cook is added, which allows forcing
     of modules and firmware to be loaded from a specific device (this
     is from ChromeOS, where the device as a whole is verified
     cryptographically via dm-verity).

     This is disabled by default but can be configured to be enabled by
     default (don't do this if you don't know what you're doing).

   - Keys: allow authentication data to be stored in an asymmetric key.
     Lots of general fixes and updates.

   - SELinux: add restrictions for loading of kernel modules via
     finit_module().  Distinguish non-init user namespace capability
     checks.  Apply execstack check on thread stacks"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (48 commits)
  LSM: LoadPin: provide enablement CONFIG
  Yama: use atomic allocations when reporting
  seccomp: Fix comment typo
  ima: add support for creating files using the mknodat syscall
  ima: fix ima_inode_post_setattr
  vfs: forbid write access when reading a file into memory
  fs: fix over-zealous use of "const"
  selinux: apply execstack check on thread stacks
  selinux: distinguish non-init user namespace capability checks
  LSM: LoadPin for kernel file loading restrictions
  fs: define a string representation of the kernel_read_file_id enumeration
  Yama: consolidate error reporting
  string_helpers: add kstrdup_quotable_file
  string_helpers: add kstrdup_quotable_cmdline
  string_helpers: add kstrdup_quotable
  selinux: check ss_initialized before revalidating an inode label
  selinux: delay inode label lookup as long as possible
  selinux: don't revalidate an inode's label when explicitly setting it
  selinux: Change bool variable name to index.
  KEYS: Add KEYCTL_DH_COMPUTE command
  ...
2016-05-19 09:21:36 -07:00
Linus Torvalds 675e0655c1 SCSI misc on 20160517
This patch includes the usual quota of driver updates (bnx2fc, mp3sas,
 hpsa, ncr5380, lpfc, hisi_sas, snic, aacraid, megaraid_sas) there's
 also a multiqueue update for scsi_debug, assorted bug fixes and a few
 other minor updates (refactor of scsi_sg_pools into generic code, alua
 and VPD updates, and struct timeval conversions).
 
 Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJXO8W0AAoJEDeqqVYsXL0MW24H/jGWwfjsDUiSsLwbLca6DWu8
 ZCWZ7rSZ27CApwGPgZGpLvUg+vpW8Ykm2zdeBnlZ6ScXS+dT3uo/PHsnemsTextj
 6glQNIOFY0Ja2GwkkN00M6IZQhTJ628cqJKIEJxC68lIw16wiOwjZaK68GMrusDO
 Sl062rkuLR6Jb2T+YoT/sD8jQfWlSj2V9e9rqJoS/rIbS6B+hUipuybz2yQ2yK2u
 XFc30yal9oVz1fHEoh2O8aqckW3/iskukVXVuZ0MQzT/lV/bm9I6AnWVHw7d0Yhp
 ZELjXpjx5M2Z/d8k0Wvx1e25oL/ERwa96yLnTvRcqyF5Yt1EgAhT+jKvo4pnGr8=
 =L6y/
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "First round of SCSI updates for the 4.6+ merge window.

  This batch includes the usual quota of driver updates (bnx2fc, mp3sas,
  hpsa, ncr5380, lpfc, hisi_sas, snic, aacraid, megaraid_sas).  There's
  also a multiqueue update for scsi_debug, assorted bug fixes and a few
  other minor updates (refactor of scsi_sg_pools into generic code, alua
  and VPD updates, and struct timeval conversions)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (138 commits)
  mpt3sas: Used "synchronize_irq()"API to synchronize timed-out IO & TMs
  mpt3sas: Set maximum transfer length per IO to 4MB for VDs
  mpt3sas: Updating mpt3sas driver version to 13.100.00.00
  mpt3sas: Fix initial Reference tag field for 4K PI drives.
  mpt3sas: Handle active cable exception event
  mpt3sas: Update MPI header to 2.00.42
  Revert "lpfc: Delete unnecessary checks before the function call mempool_destroy"
  eata_pio: missing break statement
  hpsa: Fix type ZBC conditional checks
  scsi_lib: Decode T10 vendor IDs
  scsi_dh_alua: do not fail for unknown VPD identification
  scsi_debug: use locally assigned naa
  scsi_debug: uuid for lu name
  scsi_debug: vpd and mode page work
  scsi_debug: add multiple queue support
  bfa: fix bfa_fcb_itnim_alloc() error handling
  megaraid_sas: Downgrade two success messages to info
  cxlflash: Fix to resolve dead-lock during EEH recovery
  scsi_debug: rework resp_report_luns
  scsi_debug: use pdt constants
  ...
2016-05-18 16:38:59 -07:00
Linus Torvalds 69370471d0 Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter cleanups from Al Viro.

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fold checks into iterate_and_advance()
  rw_verify_area(): saner calling conventions
  aio: remove a pointless assignment
2016-05-18 11:46:23 -07:00
James Bottomley e7ca7f9fa2 Merge branch 'fixes' into misc 2016-05-17 21:12:50 -04:00
Linus Torvalds a7fd20d1c4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support SPI based w5100 devices, from Akinobu Mita.

   2) Partial Segmentation Offload, from Alexander Duyck.

   3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.

   4) Allow cls_flower stats offload, from Amir Vadai.

   5) Implement bpf blinding, from Daniel Borkmann.

   6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
      actually using FASYNC these atomics are superfluous.  From Eric
      Dumazet.

   7) Run TCP more preemptibly, also from Eric Dumazet.

   8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
      driver, from Gal Pressman.

   9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.

  10) Improve BPF usage documentation, from Jesper Dangaard Brouer.

  11) Support tunneling offloads in qed, from Manish Chopra.

  12) aRFS offloading in mlx5e, from Maor Gottlieb.

  13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
      Leitner.

  14) Add MSG_EOR support to TCP, this allows controlling packet
      coalescing on application record boundaries for more accurate
      socket timestamp sampling.  From Martin KaFai Lau.

  15) Fix alignment of 64-bit netlink attributes across the board, from
      Nicolas Dichtel.

  16) Per-vlan stats in bridging, from Nikolay Aleksandrov.

  17) Several conversions of drivers to ethtool ksettings, from Philippe
      Reynes.

  18) Checksum neutral ILA in ipv6, from Tom Herbert.

  19) Factorize all of the various marvell dsa drivers into one, from
      Vivien Didelot

  20) Add VF support to qed driver, from Yuval Mintz"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
  Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
  Revert "phy dp83867: Make rgmii parameters optional"
  r8169: default to 64-bit DMA on recent PCIe chips
  phy dp83867: Make rgmii parameters optional
  phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
  bpf: arm64: remove callee-save registers use for tmp registers
  asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
  switchdev: pass pointer to fib_info instead of copy
  net_sched: close another race condition in tcf_mirred_release()
  tipc: fix nametable publication field in nl compat
  drivers: net: Don't print unpopulated net_device name
  qed: add support for dcbx.
  ravb: Add missing free_irq() calls to ravb_close()
  qed: Remove a stray tab
  net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fec-mpc52xx: use phydev from struct net_device
  bpf, doc: fix typo on bpf_asm descriptions
  stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
  net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fs-enet: use phydev from struct net_device
  ...
2016-05-17 16:26:30 -07:00
Linus Torvalds 9a07a79684 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:

   - Crypto self tests can now be disabled at boot/run time.
   - Add async support to algif_aead.

  Algorithms:

   - A large number of fixes to MPI from Nicolai Stange.
   - Performance improvement for HMAC DRBG.

  Drivers:

   - Use generic crypto engine in omap-des.
   - Merge ppc4xx-rng and crypto4xx drivers.
   - Fix lockups in sun4i-ss driver by disabling IRQs.
   - Add DMA engine support to ccp.
   - Reenable talitos hash algorithms.
   - Add support for Hisilicon SoC RNG.
   - Add basic crypto driver for the MXC SCC.

  Others:

   - Do not allocate crypto hash tfm in NORECLAIM context in ecryptfs"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
  crypto: qat - change the adf_ctl_stop_devices to void
  crypto: caam - fix caam_jr_alloc() ret code
  crypto: vmx - comply with ABIs that specify vrsave as reserved.
  crypto: testmgr - Add a flag allowing the self-tests to be disabled at runtime.
  crypto: ccp - constify ccp_actions structure
  crypto: marvell/cesa - Use dma_pool_zalloc
  crypto: qat - make adf_vf_isr.c dependant on IOV config
  crypto: qat - Fix typo in comments
  lib: asn1_decoder - add MODULE_LICENSE("GPL")
  crypto: omap-sham - Use dma_request_chan() for requesting DMA channel
  crypto: omap-des - Use dma_request_chan() for requesting DMA channel
  crypto: omap-aes - Use dma_request_chan() for requesting DMA channel
  crypto: omap-des - Integrate with the crypto engine framework
  crypto: s5p-sss - fix incorrect usage of scatterlists api
  crypto: s5p-sss - Fix missed interrupts when working with 8 kB blocks
  crypto: s5p-sss - Use common BIT macro
  crypto: mxc-scc - fix unwinding in mxc_scc_crypto_register()
  crypto: mxc-scc - signedness bugs in mxc_scc_ablkcipher_req_init()
  crypto: talitos - fix ahash algorithms registration
  crypto: ccp - Ensure all dependencies are specified
  ...
2016-05-17 09:33:39 -07:00
Linus Torvalds a3871bd434 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Documentation updates, including fixes to the design-level
     requirements documentation and a fixed version of the design-level
     data-structure documentation.  These fixes include removing
     cartoons and getting rid of the html/htmlx duplication.

   - Further improvements to the new-age expedited grace periods.

   - Miscellaneous fixes.

   - Torture-test changes, including a new rcuperf module for measuring
     RCU grace-period performance and scalability, which is useful for
     the expedited-grace-period changes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  rcutorture: Add boot-time adjustment of leaf fanout
  rcutorture: Add irqs-disabled test for call_rcu()
  rcutorture: Dump trace buffer upon shutdown
  rcutorture: Don't rebuild identical kernel
  rcutorture: Add OS-jitter capability
  documentation: Add documentation for RCU's major data structures
  rcutorture: Convert test duration to seconds early
  torture: Kill qemu, not parent process
  torture: Clarify refusal to run more than one torture test
  rcutorture: Consider FROZEN hotplug notifier transitions
  rcutorture: Remove redundant initialization to zero
  rcuperf: Do not wake up shutdown wait queue if "shutdown" is false.
  rcutorture: Add largish-system rcuperf scenario
  rcutorture: Avoid RCU CPU stall warning and RT throttling
  rcutorture: Add rcuperf holdoff boot parameter to reduce interference
  rcutorture: Make scripts analyze rcuperf trace data, if present
  rcutorture: Make rcuperf collect expedited event-trace data
  rcutorture: Print measure of batching efficiency
  rcutorture: Set rcuperf writer kthreads to real-time priority
  rcutorture: Bind rcuperf reader/writer kthreads to CPUs
  ...
2016-05-16 12:02:08 -07:00
Linus Torvalds 0052af4411 Merge branch 'core-lib-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core/lib update from Ingo Molnar:
 "This contains a single commit that removes an unused facility that the
  scheduler used to make use of"

* 'core-lib-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lib/proportions: Remove unused code
2016-05-16 11:36:02 -07:00
Daniel Borkmann d1c55ab5e4 bpf: prepare bpf_int_jit_compile/bpf_prog_select_runtime apis
Since the blinding is strictly only called from inside eBPF JITs,
we need to change signatures for bpf_int_jit_compile() and
bpf_prog_select_runtime() first in order to prepare that the
eBPF program we're dealing with can change underneath. Hence,
for call sites, we need to return the latest prog. No functional
change in this patch.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:32 -04:00
David S. Miller 909b27f706 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The nf_conntrack_core.c fix in 'net' is not relevant in 'net-next'
because we no longer have a per-netns conntrack hash.

The ip_gre.c conflict as well as the iwlwifi ones were cases of
overlapping changes.

Conflicts:
	drivers/net/wireless/intel/iwlwifi/mvm/tx.c
	net/ipv4/ip_gre.c
	net/netfilter/nf_conntrack_core.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-15 13:32:48 -04:00
Linus Torvalds 6ba5b85fd4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Overlayfs fixes from Miklos, assorted fixes from me.

  Stable fodder of varying severity, all sat in -next for a while"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ovl: ignore permissions on underlying lookup
  vfs: add lookup_hash() helper
  vfs: rename: check backing inode being equal
  vfs: add vfs_select_inode() helper
  get_rock_ridge_filename(): handle malformed NM entries
  ecryptfs: fix handling of directory opening
  atomic_open(): fix the handling of create_error
  fix the copy vs. map logics in blk_rq_map_user_iov()
  do_splice_to(): cap the size before passing to ->splice_read()
2016-05-14 11:59:43 -07:00
David Howells 23c8a812dc KEYS: Fix ASN.1 indefinite length object parsing
This fixes CVE-2016-0758.

In the ASN.1 decoder, when the length field of an ASN.1 value is extracted,
it isn't validated against the remaining amount of data before being added
to the cursor.  With a sufficiently large size indicated, the check:

	datalen - dp < 2

may then fail due to integer overflow.

Fix this by checking the length indicated against the amount of remaining
data in both places a definite length is determined.

Whilst we're at it, make the following changes:

 (1) Check the maximum size of extended length does not exceed the capacity
     of the variable it's being stored in (len) rather than the type that
     variable is assumed to be (size_t).

 (2) Compare the EOC tag to the symbolic constant ASN1_EOC rather than the
     integer 0.

 (3) To reduce confusion, move the initialisation of len outside of:

	for (len = 0; n > 0; n--) {

     since it doesn't have anything to do with the loop counter n.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Peter Jones <pjones@redhat.com>
2016-05-12 12:01:49 +01:00
Al Viro e4d35be584 Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
David S. Miller e800072c18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
In netdevice.h we removed the structure in net-next that is being
changes in 'net'.  In macsec.c and rtnetlink.c we have overlaps
between fixes in 'net' and the u64 attribute changes in 'net-next'.

The mlx5 conflicts have to do with vxlan support dependencies.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09 15:59:24 -04:00
Al Viro dd254f5a38 fold checks into iterate_and_advance()
they are open-coded in all users except iov_iter_advance(), and there
they wouldn't be a bad idea either - as it is, iov_iter_advance(i, 0)
ends up dereferencing potentially past the end of iovec array.  It
doesn't do anything with the value it reads, and very unlikely to
trigger an oops on dereference, but it is not impossible.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-09 14:04:29 -04:00
Joonsoo Kim 7c31190bcf lib/stackdepot: avoid to return 0 handle
Recently, we allow to save the stacktrace whose hashed value is 0.  It
causes the problem that stackdepot could return 0 even if in success.
User of stackdepot cannot distinguish whether it is success or not so we
need to solve this problem.  In this patch, 1 bit are added to handle
and make valid handle none 0 by setting this bit.  After that, valid
handle will not be 0 and 0 handle will represent failure correctly.

Fixes: 33334e2576 ("lib/stackdepot.c: allow the stack trace hash to be zero")
Link: http://lkml.kernel.org/r/1462252403-1106-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-05 17:38:53 -07:00
David S. Miller cba6532100 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv4/ip_gre.c

Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 00:52:29 -04:00
Tudor Ambarus ccab6058da lib: asn1_decoder - add MODULE_LICENSE("GPL")
A kernel taint results when loading the rsa_generic module:

root@(none):~# modprobe rsa_generic
asn1_decoder: module license 'unspecified' taints kernel.
Disabling lock debugging due to kernel taint

"Tainting" of the kernel is (usually) a way of indicating that
a proprietary module has been inserted, which is not the case here.

Signed-off-by: Tudor Ambarus <tudor-dan.ambarus@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-05-03 16:10:12 +08:00
Alexander Potapenko 33334e2576 lib/stackdepot.c: allow the stack trace hash to be zero
Do not bail out from depot_save_stack() if the stack trace has zero hash.
Initially depot_save_stack() silently dropped stack traces with zero
hashes, however there's actually no point in reserving this zero value.

Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Ingo Molnar 41ed943d85 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 * Documentation updates, including fixes to the design-level
   requirements documentation and a fixed version of the design-level
   data-structure documentation.  These fixes include removing
   cartoons and getting rid of the html/htmlx duplication.

 * Further improvements to the new-age expedited grace periods.

 * Miscellaneous fixes.

 * Torture-test changes, including a new rcuperf module for measuring
   RCU grace-period performance and scalability, which is useful for
   the expedited-grace-period changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-27 16:57:36 +02:00
Nicolas Dichtel 11a9957307 libnl: fix help of _64bit functions
Fix typo and describe 'padattr'.

Fixes: 089bf1a6a9 ("libnl: add more helpers to align attributes on 64-bit")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23 20:13:24 -04:00
David S. Miller 1602f49b58 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were two cases of simple overlapping changes,
nothing serious.

In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23 18:51:33 -04:00
Nicolas Dichtel 089bf1a6a9 libnl: add more helpers to align attributes on 64-bit
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:22:12 -04:00
Kees Cook 21985319ad string_helpers: add kstrdup_quotable_file
Allocate a NULL-terminated file path with special characters escaped,
safe for logging.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-04-21 10:47:26 +10:00
Kees Cook 0d0443288f string_helpers: add kstrdup_quotable_cmdline
Provide an escaped (but readable: no inter-argument NULLs) commandline
safe for logging.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-04-21 10:47:26 +10:00
Kees Cook b53f27e4fa string_helpers: add kstrdup_quotable
Handle allocating and escaping a string safe for logging.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-04-21 10:47:25 +10:00
Greg Kroah-Hartman 5614e77258 Merge 4.6-rc4 into driver-core-next
We want those fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-19 04:28:28 +09:00
Linus Torvalds e1e22b27ec "driver core" fixes for 4.6-rc4
Here are 3 small fixes 4.6-rc4.  Two fix up some lz4 issues with big
 endian systems, and the remaining one resolves a minor debugfs issue
 that was reported.
 
 All have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlcS3ucACgkQMUfUDdst+ynw1QCbBGgbY7Xt08whNFcAP81z1Q5X
 fmsAn1fyrhfsxe+JnybzswsTOjFw99Xd
 =YX6h
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull misc fixes from Greg KH:
 "Here are three small fixes for 4.6-rc4.

  Two fix up some lz4 issues with big endian systems, and the remaining
  one resolves a minor debugfs issue that was reported.

  All have been in linux-next with no reported issues"

* tag 'driver-core-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  lib: lz4: cleanup unaligned access efficiency detection
  lib: lz4: fixed zram with lz4 on big endian machines
  debugfs: Make automount point inodes permanently empty
2016-04-16 20:53:50 -07:00
Ming Lin 9b1d6c8950 lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c
Now it's ready to move the mempool based SG chained allocator code from
SCSI driver to lib/sg_pool.c, which will be compiled only based on a Kconfig
symbol CONFIG_SG_POOL.

SCSI selects CONFIG_SG_POOL.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-04-15 16:53:14 -04:00
Rui Salvaterra dea5c24a14 lib: lz4: cleanup unaligned access efficiency detection
These identifiers are bogus. The interested architectures should define
HAVE_EFFICIENT_UNALIGNED_ACCESS whenever relevant to do so. If this
isn't true for some arch, it should be fixed in the arch definition.

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-13 09:22:49 -07:00
Rui Salvaterra 3e26a691fe lib: lz4: fixed zram with lz4 on big endian machines
Based on Sergey's test patch [1], this fixes zram with lz4 compression
on big endian cpus.

Note that the 64-bit preprocessor test is not a cleanup, it's part of
the fix, since those identifiers are bogus (for example, __ppc64__
isn't defined anywhere else in the kernel, which means we'd fall into
the 32-bit definitions on ppc64).

Tested on ppc64 with no regression on x86_64.

[1] http://marc.info/?l=linux-kernel&m=145994470805853&w=4

Cc: stable@vger.kernel.org
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-13 09:22:49 -07:00
James Morris 58976eef9d Merge tag 'keys-fixes-20160412' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into for-linus 2016-04-13 11:06:52 +10:00
Nicolai Stange 9fd4dcece4 debugfs: prevent access to possibly dead file_operations at file open
Nothing prevents a dentry found by path lookup before a return of
__debugfs_remove() to actually get opened after that return. Now, after
the return of __debugfs_remove(), there are no guarantees whatsoever
regarding the memory the corresponding inode's file_operations object
had been kept in.

Since __debugfs_remove() is seldomly invoked, usually from module exit
handlers only, the race is hard to trigger and the impact is very low.

A discussion of the problem outlined above as well as a suggested
solution can be found in the (sub-)thread rooted at

  http://lkml.kernel.org/g/20130401203445.GA20862@ZenIV.linux.org.uk
  ("Yet another pipe related oops.")

Basically, Greg KH suggests to introduce an intermediate fops and
Al Viro points out that a pointer to the original ones may be stored in
->d_fsdata.

Follow this line of reasoning:
- Add SRCU as a reverse dependency of DEBUG_FS.
- Introduce a srcu_struct object for the debugfs subsystem.
- In debugfs_create_file(), store a pointer to the original
  file_operations object in ->d_fsdata.
- Make debugfs_remove() and debugfs_remove_recursive() wait for a
  SRCU grace period after the dentry has been delete()'d and before they
  return to their callers.
- Introduce an intermediate file_operations object named
  "debugfs_open_proxy_file_operations". It's ->open() functions checks,
  under the protection of a SRCU read lock, whether the dentry is still
  alive, i.e. has not been d_delete()'d and if so, tries to acquire a
  reference on the owning module.
  On success, it sets the file object's ->f_op to the original
  file_operations and forwards the ongoing open() call to the original
  ->open().
- For clarity, rename the former debugfs_file_operations to
  debugfs_noop_file_operations -- they are in no way canonical.

The choice of SRCU over "normal" RCU is justified by the fact, that the
former may also be used to protect ->i_private data from going away
during the execution of a file's readers and writers which may (and do)
sleep.

Finally, introduce the fs/debugfs/internal.h header containing some
declarations internal to the debugfs implementation.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 14:14:21 -07:00
David S. Miller ae95d71261 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
Al Viro 357f435d8a fix the copy vs. map logics in blk_rq_map_user_iov()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-08 19:46:28 -04:00
Naveen N. Rao 9c94f6c8e0 lib/test_bpf: Add additional BPF_ADD tests
Some of these tests proved useful with the powerpc eBPF JIT port due to
sign-extended 16-bit immediate loads. Though some of these aspects get
covered in other tests, it is better to have explicit tests so as to
quickly tag the precise problem.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 16:47:51 -04:00
Naveen N. Rao b64b50eac4 lib/test_bpf: Add test to check for result of 32-bit add that overflows
BPF_ALU32 and BPF_ALU64 tests for adding two 32-bit values that results in
32-bit overflow.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 16:47:51 -04:00
Naveen N. Rao c7395d6bd7 lib/test_bpf: Add tests for unsigned BPF_JGT
Unsigned Jump-if-Greater-Than.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 16:47:51 -04:00
Naveen N. Rao 9f134c34fb lib/test_bpf: Fix JMP_JSET tests
JMP_JSET tests incorrectly used BPF_JNE. Fix the same.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 16:47:50 -04:00
Jerome Marchand 8d4a2ec1e0 assoc_array: don't call compare_object() on a node
Changes since V1: fixed the description and added KASan warning.

In assoc_array_insert_into_terminal_node(), we call the
compare_object() method on all non-empty slots, even when they're
not leaves, passing a pointer to an unexpected structure to
compare_object(). Currently it causes an out-of-bound read access
in keyring_compare_object detected by KASan (see below). The issue
is easily reproduced with keyutils testsuite.
Only call compare_object() when the slot is a leave.

KASan warning:
==================================================================
BUG: KASAN: slab-out-of-bounds in keyring_compare_object+0x213/0x240 at addr ffff880060a6f838
Read of size 8 by task keyctl/1655
=============================================================================
BUG kmalloc-192 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------

Disabling lock debugging due to kernel taint
INFO: Allocated in assoc_array_insert+0xfd0/0x3a60 age=69 cpu=1 pid=1647
	___slab_alloc+0x563/0x5c0
	__slab_alloc+0x51/0x90
	kmem_cache_alloc_trace+0x263/0x300
	assoc_array_insert+0xfd0/0x3a60
	__key_link_begin+0xfc/0x270
	key_create_or_update+0x459/0xaf0
	SyS_add_key+0x1ba/0x350
	entry_SYSCALL_64_fastpath+0x12/0x76
INFO: Slab 0xffffea0001829b80 objects=16 used=8 fp=0xffff880060a6f550 flags=0x3fff8000004080
INFO: Object 0xffff880060a6f740 @offset=5952 fp=0xffff880060a6e5d1

Bytes b4 ffff880060a6f730: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f740: d1 e5 a6 60 00 88 ff ff 0e 00 00 00 00 00 00 00  ...`............
Object ffff880060a6f750: 02 cf 8e 60 00 88 ff ff 02 c0 8e 60 00 88 ff ff  ...`.......`....
Object ffff880060a6f760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f7c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f7d0: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f7e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Object ffff880060a6f7f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
CPU: 0 PID: 1655 Comm: keyctl Tainted: G    B           4.5.0-rc4-kasan+ #291
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 0000000000000000 000000001b2800b4 ffff880060a179e0 ffffffff81b60491
 ffff88006c802900 ffff880060a6f740 ffff880060a17a10 ffffffff815e2969
 ffff88006c802900 ffffea0001829b80 ffff880060a6f740 ffff880060a6e650
Call Trace:
 [<ffffffff81b60491>] dump_stack+0x85/0xc4
 [<ffffffff815e2969>] print_trailer+0xf9/0x150
 [<ffffffff815e9454>] object_err+0x34/0x40
 [<ffffffff815ebe50>] kasan_report_error+0x230/0x550
 [<ffffffff819949be>] ? keyring_get_key_chunk+0x13e/0x210
 [<ffffffff815ec62d>] __asan_report_load_n_noabort+0x5d/0x70
 [<ffffffff81994cc3>] ? keyring_compare_object+0x213/0x240
 [<ffffffff81994cc3>] keyring_compare_object+0x213/0x240
 [<ffffffff81bc238c>] assoc_array_insert+0x86c/0x3a60
 [<ffffffff81bc1b20>] ? assoc_array_cancel_edit+0x70/0x70
 [<ffffffff8199797d>] ? __key_link_begin+0x20d/0x270
 [<ffffffff8199786c>] __key_link_begin+0xfc/0x270
 [<ffffffff81993389>] key_create_or_update+0x459/0xaf0
 [<ffffffff8128ce0d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81992f30>] ? key_type_lookup+0xc0/0xc0
 [<ffffffff8199e19d>] ? lookup_user_key+0x13d/0xcd0
 [<ffffffff81534763>] ? memdup_user+0x53/0x80
 [<ffffffff819983ea>] SyS_add_key+0x1ba/0x350
 [<ffffffff81998230>] ? key_get_type_from_user.constprop.6+0xa0/0xa0
 [<ffffffff828bcf4e>] ? retint_user+0x18/0x23
 [<ffffffff8128cc7e>] ? trace_hardirqs_on_caller+0x3fe/0x580
 [<ffffffff81004017>] ? trace_hardirqs_on_thunk+0x17/0x19
 [<ffffffff828bc432>] entry_SYSCALL_64_fastpath+0x12/0x76
Memory state around the buggy address:
 ffff880060a6f700: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
 ffff880060a6f780: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
>ffff880060a6f800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                        ^
 ffff880060a6f880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880060a6f900: fc fc fc fc fc fc 00 00 00 00 00 00 00 00 00 00
==================================================================

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
2016-04-06 14:06:48 +01:00
Nicolai Stange 0bb5c9ead6 lib/mpi: mpi_read_raw_from_sgl(): fix out-of-bounds buffer access
Within the copying loop in mpi_read_raw_from_sgl(), the last input SGE's
byte count gets artificially extended as follows:

  if (sg_is_last(sg) && (len % BYTES_PER_MPI_LIMB))
    len += BYTES_PER_MPI_LIMB - (len % BYTES_PER_MPI_LIMB);

Within the following byte copying loop, this causes reads beyond that
SGE's allocated buffer:

  BUG: KASAN: slab-out-of-bounds in mpi_read_raw_from_sgl+0x331/0x650
                                     at addr ffff8801e168d4d8
  Read of size 1 by task systemd-udevd/721
  [...]
  Call Trace:
   [<ffffffff818c4d35>] dump_stack+0xbc/0x117
   [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169
   [<ffffffff814af5d1>] ? print_section+0x61/0xb0
   [<ffffffff814b1109>] print_trailer+0x179/0x2c0
   [<ffffffff814bc524>] object_err+0x34/0x40
   [<ffffffff814bfdc7>] kasan_report_error+0x307/0x8c0
   [<ffffffff814bf315>] ? kasan_unpoison_shadow+0x35/0x50
   [<ffffffff814bf38e>] ? kasan_kmalloc+0x5e/0x70
   [<ffffffff814c0ad1>] kasan_report+0x71/0xa0
   [<ffffffff81938171>] ? mpi_read_raw_from_sgl+0x331/0x650
   [<ffffffff814bf1a6>] __asan_load1+0x46/0x50
   [<ffffffff81938171>] mpi_read_raw_from_sgl+0x331/0x650
   [<ffffffff817f41b6>] rsa_verify+0x106/0x260
   [<ffffffff817f40b0>] ? rsa_set_pub_key+0xf0/0xf0
   [<ffffffff818edc79>] ? sg_init_table+0x29/0x50
   [<ffffffff817f4d22>] ? pkcs1pad_sg_set_buf+0xb2/0x2e0
   [<ffffffff817f5b74>] pkcs1pad_verify+0x1f4/0x2b0
   [<ffffffff81831057>] public_key_verify_signature+0x3a7/0x5e0
   [<ffffffff81830cb0>] ? public_key_describe+0x80/0x80
   [<ffffffff817830f0>] ? keyring_search_aux+0x150/0x150
   [<ffffffff818334a4>] ? x509_request_asymmetric_key+0x114/0x370
   [<ffffffff814b83f0>] ? kfree+0x220/0x370
   [<ffffffff818312c2>] public_key_verify_signature_2+0x32/0x50
   [<ffffffff81830b5c>] verify_signature+0x7c/0xb0
   [<ffffffff81835d0c>] pkcs7_validate_trust+0x42c/0x5f0
   [<ffffffff813c391a>] system_verify_data+0xca/0x170
   [<ffffffff813c3850>] ? top_trace_array+0x9b/0x9b
   [<ffffffff81510b29>] ? __vfs_read+0x279/0x3d0
   [<ffffffff8129372f>] mod_verify_sig+0x1ff/0x290
  [...]

The exact purpose of the len extension isn't clear to me, but due to
its form, I suspect that it's a leftover somehow accounting for leading
zero bytes within the most significant output limb.

Note however that without that len adjustement, the total number of bytes
ever processed by the inner loop equals nbytes and thus, the last output
limb gets written at this point. Thus the net effect of the len adjustement
cited above is just to keep the inner loop running for some more
iterations, namely < BYTES_PER_MPI_LIMB ones, reading some extra bytes from
beyond the last SGE's buffer and discarding them afterwards.

Fix this issue by purging the extension of len beyond the last input SGE's
buffer length.

Fixes: 2d4d1eea54 ("lib/mpi: Add mpi sgl helpers")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-05 20:35:51 +08:00
Nicolai Stange 85d541a3d1 lib/mpi: mpi_read_raw_from_sgl(): sanitize meaning of indices
Within the byte reading loop in mpi_read_raw_sgl(), there are two
housekeeping indices used, z and x.

At all times, the index z represents the number of output bytes covered
by the input SGEs for which processing has completed so far. This includes
any leading zero bytes within the most significant limb.

The index x changes its meaning after the first outer loop's iteration
though: while processing the first input SGE, it represents

  "number of leading zero bytes in most significant output limb" +
   "current position within current SGE"

For the remaining SGEs OTOH, x corresponds just to

  "current position within current SGE"

After all, it is only the sum of z and x that has any meaning for the
output buffer and thus, the

  "number of leading zero bytes in most significant output limb"

part can be moved away from x into z from the beginning, opening up the
opportunity for cleaner code.

Before the outer loop iterating over the SGEs, don't initialize z with
zero, but with the number of leading zero bytes in the most significant
output limb. For the inner loop iterating over a single SGE's bytes,
get rid of the buf_shift offset to x' bounds and let x run from zero to
sg->length - 1.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-05 20:35:51 +08:00
Nicolai Stange 64c09b0b59 lib/mpi: mpi_read_raw_from_sgl(): fix nbits calculation
The number of bits, nbits, is calculated in mpi_read_raw_from_sgl() as
follows:

  nbits = nbytes * 8;

Afterwards, the number of leading zero bits of the first byte get
subtracted:

  nbits -= count_leading_zeros(*(u8 *)(sg_virt(sgl) + lzeros));

However, count_leading_zeros() takes an unsigned long and thus,
the u8 gets promoted to an unsigned long.

Thus, the above doesn't subtract the number of leading zeros in the most
significant nonzero input byte from nbits, but the number of leading
zeros of the most significant nonzero input byte promoted to unsigned long,
i.e. BITS_PER_LONG - 8 too many.

Fix this by subtracting

  count_leading_zeros(...) - (BITS_PER_LONG - 8)

from nbits only.

Fixes: 2d4d1eea54 ("lib/mpi: Add mpi sgl helpers")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-05 20:35:51 +08:00
Nicolai Stange 60e1b74c22 lib/mpi: mpi_read_raw_from_sgl(): purge redundant clearing of nbits
In mpi_read_raw_from_sgl(), unsigned nbits is calculated as follows:

  nbits = nbytes * 8;

and redundantly cleared later on if nbytes == 0:

  if (nbytes > 0)
    ...
  else
    nbits = 0;

Purge this redundant clearing for the sake of clarity.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-05 20:35:51 +08:00
Nicolai Stange ab1e912ec1 lib/mpi: mpi_read_raw_from_sgl(): don't include leading zero SGEs in nbytes
At the very beginning of mpi_read_raw_from_sgl(), the leading zeros of
the input scatterlist are counted:

  lzeros = 0;
  for_each_sg(sgl, sg, ents, i) {
    ...
    if (/* sg contains nonzero bytes */)
      break;

    /* sg contains nothing but zeros here */
    ents--;
    lzeros = 0;
  }

Later on, the total number of trailing nonzero bytes is calculated by
subtracting the number of leading zero bytes from the total number of input
bytes:

  nbytes -= lzeros;

However, since lzeros gets reset to zero for each completely zero leading
sg in the loop above, it doesn't include those.

Besides wasting resources by allocating a too large output buffer,
this mistake propagates into the calculation of x, the number of
leading zeros within the most significant output limb:

  x = BYTES_PER_MPI_LIMB - nbytes % BYTES_PER_MPI_LIMB;

What's more, the low order bytes of the output, equal in number to the
extra bytes in nbytes, are left uninitialized.

Fix this by adjusting nbytes for each completely zero leading scatterlist
entry.

Fixes: 2d4d1eea54 ("lib/mpi: Add mpi sgl helpers")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-05 20:35:50 +08:00
Nicolai Stange b698538951 lib/mpi: mpi_read_raw_from_sgl(): replace len argument by nbytes
Currently, the nbytes local variable is calculated from the len argument
as follows:

  ... mpi_read_raw_from_sgl(..., unsigned int len)
  {
    unsigned nbytes;
    ...
    if (!ents)
      nbytes = 0;
    else
      nbytes = len - lzeros;
    ...
  }

Given that nbytes is derived from len in a trivial way and that the len
argument is shadowed by a local len variable in several loops, this is just
confusing.

Rename the len argument to nbytes and get rid of the nbytes local variable.
Do the nbytes calculation in place.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-05 20:35:49 +08:00
Nicolai Stange 462696fd0f lib/mpi: mpi_read_buffer(): fix buffer overflow
Currently, mpi_read_buffer() writes full limbs to the output buffer
and moves memory around to purge leading zero limbs afterwards.

However, with

  commit 9cbe21d8f8 ("lib/mpi: only require buffers as big as needed for
                        the integer")

the caller is only required to provide a buffer large enough to hold the
result without the leading zeros.

This might result in a buffer overflow for small MP numbers with leading
zeros.

Fix this by coping the result to its final destination within the output
buffer and not copying the leading zeros at all.

Fixes: 9cbe21d8f8 ("lib/mpi: only require buffers as big as needed for
                      the integer")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-05 20:35:49 +08:00