1
0
Fork 0
Commit Graph

97507 Commits (9a0e7120109632910e77295ce6fc512c16cd367b)

Author SHA1 Message Date
David Howells 9880150655 fscache: Fix the default for fscache_maybe_release_page()
Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached.  It mustn't return false as that indicates
the page cannot yet be freed.

The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.

This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress.  This might be an issue for
9P, however.

Note that the pages aren't entirely stuck.  Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.

Fixes: 201a15428b ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: stable@vger.kernel.org # 2.6.32+
2018-01-02 10:02:19 +00:00
Linus Torvalds cea92e843e Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "A pile of fixes for long standing issues with the timer wheel and the
  NOHZ code:

   - Prevent timer base confusion accross the nohz switch, which can
     cause unlocked access and data corruption

   - Reinitialize the stale base clock on cpu hotplug to prevent subtle
     side effects including rollovers on 32bit

   - Prevent an interrupt storm when the timer softirq is already
     pending caused by tick_nohz_stop_sched_tick()

   - Move the timer start tracepoint to a place where it actually makes
     sense

   - Add documentation to timerqueue functions as they caused confusion
     several times now"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timerqueue: Document return values of timerqueue_add/del()
  timers: Invoke timer_start_debug() where it makes sense
  nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
  timers: Reinitialize per cpu bases on hotplug
  timers: Use deferrable base independent of base::nohz_active
2017-12-31 12:30:34 -08:00
Linus Torvalds 88fa025d30 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A rather large update after the kaisered maintainer finally found time
  to handle regression reports.

   - The larger part addresses a regression caused by the x86 vector
     management rework.

     The reservation based model does not work reliably for MSI
     interrupts, if they cannot be masked (yes, yet another hw
     engineering trainwreck). The reason is that the reservation mode
     assigns a dummy vector when the interrupt is allocated and switches
     to a real vector when the interrupt is requested.

     If the MSI entry cannot be masked then the initialization might
     raise an interrupt before the interrupt is requested, which ends up
     as spurious interrupt and causes device malfunction and worse. The
     fix is to exclude MSI interrupts which do not support masking from
     reservation mode and assign a real vector right away.

   - Extend the extra lockdep class setup for nested interrupts with a
     class for the recently added irq_desc::request_mutex so lockdep can
     differeniate and does not emit false positive warnings.

   - A ratelimit guard for the bad irq printout so in case a bad irq
     comes back immediately the system does not drown in dmesg spam"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/msi, x86/vector: Prevent reservation mode for non maskable MSI
  genirq/irqdomain: Rename early argument of irq_domain_activate_irq()
  x86/vector: Use IRQD_CAN_RESERVE flag
  genirq: Introduce IRQD_CAN_RESERVE flag
  genirq/msi: Handle reactivation only on success
  gpio: brcmstb: Make really use of the new lockdep class
  genirq: Guard handle_bad_irq log messages
  kernel/irq: Extend lockdep class for request mutex
2017-12-31 11:23:11 -08:00
Linus Torvalds 5aa90a8458 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 page table isolation updates from Thomas Gleixner:
 "This is the final set of enabling page table isolation on x86:

   - Infrastructure patches for handling the extra page tables.

   - Patches which map the various bits and pieces which are required to
     get in and out of user space into the user space visible page
     tables.

   - The required changes to have CR3 switching in the entry/exit code.

   - Optimizations for the CR3 switching along with documentation how
     the ASID/PCID mechanism works.

   - Updates to dump pagetables to cover the user space page tables for
     W+X scans and extra debugfs files to analyze both the kernel and
     the user space visible page tables

  The whole functionality is compile time controlled via a config switch
  and can be turned on/off on the command line as well"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  x86/ldt: Make the LDT mapping RO
  x86/mm/dump_pagetables: Allow dumping current pagetables
  x86/mm/dump_pagetables: Check user space page table for WX pages
  x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
  x86/mm/pti: Add Kconfig
  x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
  x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
  x86/mm: Use INVPCID for __native_flush_tlb_single()
  x86/mm: Optimize RESTORE_CR3
  x86/mm: Use/Fix PCID to optimize user/kernel switches
  x86/mm: Abstract switching CR3
  x86/mm: Allow flushing for future ASID switches
  x86/pti: Map the vsyscall page if needed
  x86/pti: Put the LDT in its own PGD if PTI is on
  x86/mm/64: Make a full PGD-entry size hole in the memory map
  x86/events/intel/ds: Map debug buffers in cpu_entry_area
  x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
  x86/mm/pti: Map ESPFIX into user space
  x86/mm/pti: Share entry text PMD
  x86/entry: Align entry text section to PMD boundary
  ...
2017-12-29 17:02:49 -08:00
Thomas Gleixner 26456f87ac timers: Reinitialize per cpu bases on hotplug
The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
them with a potentially stale clk and next_expiry valuem, which can cause
trouble then the CPU is plugged.

Add a prepare callback which forwards the clock, sets next_expiry to far in
the future and reset the control flags to a known state.

Set base->must_forward_clk so the first timer which is queued will try to
forward the clock to current jiffies.

Fixes: 500462a9de ("timers: Switch to a non-cascading wheel")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
2017-12-29 23:13:09 +01:00
Thomas Gleixner 702cb0a028 genirq/irqdomain: Rename early argument of irq_domain_activate_irq()
The 'early' argument of irq_domain_activate_irq() is actually used to
denote reservation mode. To avoid confusion, rename it before abuse
happens.

No functional change.

Fixes: 7249164346 ("genirq/irqdomain: Update irq_domain_ops.activate() signature")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandru Chirvasitu <achirvasub@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Mikael Pettersson <mikpelinux@gmail.com>
Cc: Josh Poulson <jopoulso@microsoft.com>
Cc: Mihai Costache <v-micos@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-pci@vger.kernel.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: devel@linuxdriverproject.org
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Sakari Ailus <sakari.ailus@intel.com>,
Cc: linux-media@vger.kernel.org
2017-12-29 21:13:04 +01:00
Thomas Gleixner 69790ba92b genirq: Introduce IRQD_CAN_RESERVE flag
Add a new flag to mark interrupts which can use reservation mode. This is
going to be used in subsequent patches to disable reservation mode for a
certain class of MSI devices.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Mikael Pettersson <mikpelinux@gmail.com>
Cc: Josh Poulson <jopoulso@microsoft.com>
Cc: Mihai Costache <v-micos@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-pci@vger.kernel.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: devel@linuxdriverproject.org
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Sakari Ailus <sakari.ailus@intel.com>,
Cc: linux-media@vger.kernel.org
2017-12-29 21:13:04 +01:00
Linus Torvalds 61233580f1 Power management fix for v4.15-rc6
This fixes a schedutil cpufreq governor regression from the 4.14
 cycle that may cause a CPU idleness check to return incorrect results
 in some cases which leads to suboptimal decisions (Joel Fernandes).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJaROaWAAoJEILEb/54YlRxCNMP/3Dr1wAhR26HgVyAVXN3tR8j
 RjYvn/G7WFuczl+nqsMT8A2Gig61BAWVhADh5HJrqoZbYlwOLXCnMJOcQy4FDy2g
 THPQdzf8eT0OHGF/+SZvOWTPX+nL63cN+i+U1bCYyuQ6+N1Kk3vZdMx8fbyJScQ2
 KtszEg1zh2CIzHSDlxqQFOzE8O2GAU+evBQYMrwLpPBUbi9yN9nCZJ8XKarfJwnB
 YgALt0BdLU+n7kFj4ZCx76Mjf3qQL7rV+3YNs49OJU90p72O4aiadv7XxDMv5Fpf
 Ie9xh0gouZLPziJiHwFPb3PqxMVojJPKpqRk73mitCc2T93Kk4oWozGt60k0tahO
 pYBh2OGHycnMI/cn2wpOk/hIdHVz6IUD5Of8LtXAiDhAnmvlj5PFdnHGDgvMwbga
 2Kzbw8n2fKfiX2r9UfJRjrQHaomvdy4kgneMVjgrh0dIu9GL5zK2sntF0pZeyyKH
 XzIBZ9Ue1PLOWIDW1LSPLMq6YKh95FLRTebodiWRbq2CYITCXn37qGMd4BhXwOEx
 CLkSE3vmWbh6Yr6pITlFU0oTMenc0SCHAqh+hHTr4NAYA5rooLspH0swRk5Lf2bz
 OC7K4BkPc0Lyv2MUj5ZKYa3UCgJ9QF/ekL4JW4PsWRZ1Afk8XqiCC5O9Kn4/iYGT
 xzcKW02R1p/hIWTOY2kw
 =TPkr
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "This fixes a schedutil cpufreq governor regression from the 4.14 cycle
  that may cause a CPU idleness check to return incorrect results in
  some cases which leads to suboptimal decisions (Joel Fernandes)"

* tag 'pm-4.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: schedutil: Use idle_calls counter of the remote CPU
2017-12-29 11:54:15 -08:00
Linus Torvalds 2758b3e3e6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) IPv6 gre tunnels end up with different default features enabled
    depending upon whether netlink or ioctls are used to bring them up.
    Fix from Alexey Kodanev.

 2) Fix read past end of user control message in RDS< from Avinash
    Repaka.

 3) Missing RCU barrier in mini qdisc code, from Cong Wang.

 4) Missing policy put when reusing per-cpu route entries, from Florian
    Westphal.

 5) Handle nested PCI errors properly in bnx2x driver, from Guilherme G.
    Piccoli.

 6) Run nested transport mode IPSEC packets via tasklet, from Herbert
    Xu.

 7) Fix handling poll() for stream sockets in tipc, from Parthasarathy
    Bhuvaragan.

 8) Fix two stack-out-of-bounds issues in IPSEC, from Steffen Klassert.

 9) Another zerocopy ubuf handling fix, from Willem de Bruijn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
  strparser: Call sock_owned_by_user_nocheck
  sock: Add sock_owned_by_user_nocheck
  skbuff: in skb_copy_ubufs unclone before releasing zerocopy
  tipc: fix hanging poll() for stream sockets
  sctp: Replace use of sockets_allocated with specified macro.
  bnx2x: Improve reliability in case of nested PCI errors
  tg3: Enable PHY reset in MTU change path for 5720
  tg3: Add workaround to restrict 5762 MRRS to 2048
  tg3: Update copyright
  net: fec: unmap the xmit buffer that are not transferred by DMA
  tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path
  tipc: error path leak fixes in tipc_enable_bearer()
  RDS: Check cmsg_len before dereferencing CMSG_DATA
  tcp: Avoid preprocessor directives in tracepoint macro args
  tipc: fix memory leak of group member when peer node is lost
  net: sched: fix possible null pointer deref in tcf_block_put
  tipc: base group replicast ack counter on number of actual receivers
  net_sched: fix a missing rcu barrier in mini_qdisc_pair_swap()
  net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
  ip6_gre: fix device features for ioctl setup
  ...
2017-12-28 23:20:21 -08:00
Linus Torvalds 19286e4a7a Third pull request for 4.15-rc
- cxgb4 fix for an iser testing failure as debugged by Steve and Sagi.
   The problem was a driver bug in the handling of shutting down a QP.
 - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on error
   unwind and a use after free bug
 - Improper congestion counter values on mlx5 when link aggregation is enabled
 - ipoib lockdep regression introduced in this merge window
 - hfi1 regression supporting the device in a VM introduced in a recent patch
 - Typo that breaks future uAPI compatibility in the verbs core
 - More SELinux related oops fixing
 - Fix an oops during error unwind in mlx5
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJaRIC/AAoJEDht9xV+IJsaJfQP/1Z97/kDlJGIJQ4vBJ52xdHV
 LfRdmCBqU5nrAihEBpFLRc2S+kaSJbYAY48tRn28Jx6s9dmSvU6v2J2IqhmnM6p6
 ruWLR0Yqjg+xHcw+eaEoscJjRw+jDUEeVOgfbYc0HViWwvMNTrBB32HpAV48HuAl
 aCbM/qrQYXdYuJBImM4glERIpjlvYKoxv4D9xCJhJRRQvTnKOymHzZpKbqNujWxl
 dzCmZeOrw+HVxNW9MHHtUxClBoLNnykfRVKzMcdDjsqJ+Fdo2bY3ksgMvgiatRwY
 NxGfixhouhOz9vjN/ljpWXxTV5TTm6Nrib8XcHuOWjcYn/AFwJMMRsM+1w1AuCKs
 Zviq7QVApZzYuvHw1ewupRGvDX+P13sufD5sbc6cfVUT3w6ZX0Clpspl4++JN4ER
 WvBZikozaviL3w9ir0drlZ6k9BDnjQ6P7wZcBjDZC/j0zXKM65rISZrTsK7TeiTk
 lBNdLCkwZhO0dvafCNwA910tTaXEPhqqAh8Okob2A5U5lUAewd0AEHJusL/iCmSl
 uXnnxu8ik61QzOqwneEHSyVMkOSLEC+kk13fiFAq/LjPUSm9N/MihZd4JNxwSa6W
 4Rah7IKdh9F6qEnaKLPEfHxPhfghhb7O51zCA8mwA/JNCneqc4Gqi0U2JXkuloml
 395aK2aZSShIkZvIwbI8
 =IkGi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is the next batch of for-rc patches from RDMA. It includes the
  fix for the ipoib regression I mentioned last time, and the result of
  a fairly major debugging effort to get iser working reliably on cxgb4
  hardware - it turns out the cxgb4 driver was not handling QP error
  flushing properly causing iser to fail.

   - cxgb4 fix for an iser testing failure as debugged by Steve and
     Sagi. The problem was a driver bug in the handling of shutting down
     a QP.

   - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on
     error unwind and a use after free bug

   - Improper congestion counter values on mlx5 when link aggregation is
     enabled

   - ipoib lockdep regression introduced in this merge window

   - hfi1 regression supporting the device in a VM introduced in a
     recent patch

   - Typo that breaks future uAPI compatibility in the verbs core

   - More SELinux related oops fixing

   - Fix an oops during error unwind in mlx5"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/mlx5: Fix mlx5_ib_alloc_mr error flow
  IB/core: Verify that QP is security enabled in create and destroy
  IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
  IB/mlx5: Serialize access to the VMA list
  IB/hfi: Only read capability registers if the capability exists
  IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
  IB/mlx5: Fix congestion counters in LAG mode
  RDMA/vmw_pvrdma: Avoid use after free due to QP/CQ/SRQ destroy
  RDMA/vmw_pvrdma: Use refcount_dec_and_test to avoid warning
  RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path
  iw_cxgb4: when flushing, complete all wrs in a chain
  iw_cxgb4: reflect the original WR opcode in drain cqes
  iw_cxgb4: Only validate the MSN for successful completions
2017-12-28 23:06:01 -08:00
Tom Herbert 602f7a2714 sock: Add sock_owned_by_user_nocheck
This allows checking socket lock ownership with producing lockdep
warnings.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 14:28:22 -05:00
Joel Fernandes 466a2b42d6 cpufreq: schedutil: Use idle_calls counter of the remote CPU
Since the recent remote cpufreq callback work, its possible that a cpufreq
update is triggered from a remote CPU. For single policies however, the current
code uses the local CPU when trying to determine if the remote sg_cpu entered
idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
idle_calls counter of the remote CPU.

Fixes: 674e75411f (sched: cpufreq: Allow remote cpufreq callbacks)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-12-28 12:26:54 +01:00
Andrew Lunn 39c3fd5895 kernel/irq: Extend lockdep class for request mutex
The IRQ code already has support for lockdep class for the lock mutex
in an interrupt descriptor. Extend this to add a second class for the
request mutex in the descriptor. Not having a class is resulting in
false positive splats in some code paths.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: linus.walleij@linaro.org
Cc: grygorii.strashko@ti.com
Cc: f.fainelli@gmail.com
Link: https://lkml.kernel.org/r/1512234664-21555-1-git-send-email-andrew@lunn.ch
2017-12-28 12:26:35 +01:00
David S. Miller 65bbbf6c20 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2017-12-22

1) Check for valid id proto in validate_tmpl(), otherwise
   we may trigger a warning in xfrm_state_fini().
   From Cong Wang.

2) Fix a typo on XFRMA_OUTPUT_MARK policy attribute.
   From Michal Kubecek.

3) Verify the state is valid when encap_type < 0,
   otherwise we may crash on IPsec GRO .
   From Aviv Heller.

4) Fix stack-out-of-bounds read on socket policy lookup.
   We access the flowi of the wrong address family in the
   IPv4 mapped IPv6 case, fix this by catching address
   family missmatches before we do the lookup.

5) fix xfrm_do_migrate() with AEAD to copy the geniv
   field too. Otherwise the state is not fully initialized
   and migration fails. From Antony Antony.

6) Fix stack-out-of-bounds with misconfigured transport
   mode policies. Our policy template validation is not
   strict enough. It is possible to configure policies
   with transport mode template where the address family
   of the template does not match the selectors address
   family. Fix this by refusing such a configuration,
   address family can not change on transport mode.

7) Fix a policy reference leak when reusing pcpu xdst
   entry. From Florian Westphal.

8) Reinject transport-mode packets through tasklet,
   otherwise it is possible to reate a recursion
   loop. From Herbert Xu.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 10:58:23 -05:00
Mat Martineau 6a6b0b9914 tcp: Avoid preprocessor directives in tracepoint macro args
Using a preprocessor directive to check for CONFIG_IPV6 in the middle of
a DECLARE_EVENT_CLASS macro's arg list causes sparse to report a series
of errors:

./include/trace/events/tcp.h:68:1: error: directive in argument list
./include/trace/events/tcp.h:75:1: error: directive in argument list
./include/trace/events/tcp.h:144:1: error: directive in argument list
./include/trace/events/tcp.h:151:1: error: directive in argument list
./include/trace/events/tcp.h:216:1: error: directive in argument list
./include/trace/events/tcp.h:223:1: error: directive in argument list
./include/trace/events/tcp.h:274:1: error: directive in argument list
./include/trace/events/tcp.h:281:1: error: directive in argument list

Once sparse finds an error, it stops printing warnings for the file it
is checking. This masks any sparse warnings that would normally be
reported for the core TCP code.

Instead, handle the preprocessor conditionals in a couple of auxiliary
macros. This also has the benefit of reducing duplicate code.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26 17:25:22 -05:00
Thomas Gleixner aa8c6248f8 x86/mm/pti: Add infrastructure for page table isolation
Add the initial files for kernel page table isolation, with a minimal init
function and the boot time detection for this misfeature.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-23 21:12:59 +01:00
Linus Torvalds caf9a82657 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PTI preparatory patches from Thomas Gleixner:
 "Todays Advent calendar window contains twentyfour easy to digest
  patches. The original plan was to have twenty three matching the date,
  but a late fixup made that moot.

   - Move the cpu_entry_area mapping out of the fixmap into a separate
     address space. That's necessary because the fixmap becomes too big
     with NRCPUS=8192 and this caused already subtle and hard to
     diagnose failures.

     The top most patch is fresh from today and cures a brain slip of
     that tall grumpy german greybeard, who ignored the intricacies of
     32bit wraparounds.

   - Limit the number of CPUs on 32bit to 64. That's insane big already,
     but at least it's small enough to prevent address space issues with
     the cpu_entry_area map, which have been observed and debugged with
     the fixmap code

   - A few TLB flush fixes in various places plus documentation which of
     the TLB functions should be used for what.

   - Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
     more than sysenter now and keeping the name makes backtraces
     confusing.

   - Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
     which is only invoked on fork().

   - Make vysycall more robust.

   - A few fixes and cleanups of the debug_pagetables code. Check
     PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
     C89 initialization of the address hint array which already was out
     of sync with the index enums.

   - Move the ESPFIX init to a different place to prepare for PTI.

   - Several code moves with no functional change to make PTI
     integration simpler and header files less convoluted.

   - Documentation fixes and clarifications"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
  init: Invoke init_espfix_bsp() from mm_init()
  x86/cpu_entry_area: Move it out of the fixmap
  x86/cpu_entry_area: Move it to a separate unit
  x86/mm: Create asm/invpcid.h
  x86/mm: Put MMU to hardware ASID translation in one place
  x86/mm: Remove hard-coded ASID limit checks
  x86/mm: Move the CR3 construction functions to tlbflush.h
  x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
  x86/mm: Remove superfluous barriers
  x86/mm: Use __flush_tlb_one() for kernel memory
  x86/microcode: Dont abuse the TLB-flush interface
  x86/uv: Use the right TLB-flush API
  x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
  x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
  x86/mm/64: Improve the memory map documentation
  x86/ldt: Prevent LDT inheritance on exec
  x86/ldt: Rework locking
  arch, mm: Allow arch_dup_mmap() to fail
  x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
  ...
2017-12-23 11:53:04 -08:00
Linus Torvalds 9ad95bdaca xen: fixes for 4.15-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJaPKq7AAoJELDendYovxMvQOEH/2iuLSDI7b5vjPuBCvFjituP
 floACKQl3Zp1Xk//DQLwTis02/9cIAOUGM11PmrkEq1lehpXPxIPzyfpx3wbEezd
 A9hP71AMojdOIUCxucAGg94kxryv9OgXT6/qggzLlpmEpo7x12dVSPV+LxfcbkqL
 zeTi1WEzz9jacfFI5CRvJx68tacIxvxCdKfauq2Yz2AB3BKd2xtMR7j77lycAeSw
 KTFaIikKnZ3Aonn/yRUhD89oOp/Kt7XJib3glsAAKgA1GMuqmJsk1yB4Wm3qkpGD
 bFSzf51HLl2PRyV5PxlJOfHtyTUKRj1Jf80YQgI2x9jR2LT3pBSI+NZt7Paw4Wc=
 =QB74
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "This contains two fixes for running under Xen:

   - a fix avoiding resource conflicts between adding mmio areas and
     memory hotplug

   - a fix setting NX bits in page table entries copied from Xen when
     running a PV guest"

* tag 'for-linus-4.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: Mark unallocated host memory as UNUSABLE
  x86-64/Xen: eliminate W+X mappings
2017-12-22 12:30:10 -08:00
Linus Torvalds 0fc0f18bed Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

   - fix chacha20 crash on zero-length input due to unset IV

   - fix potential race conditions in mcryptd with spinlock

   - only wait once at top of algif recvmsg to avoid inconsistencies

   - fix potential use-after-free in algif_aead/algif_skcipher"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: af_alg - fix race accessing cipher request
  crypto: mcryptd - protect the per-CPU queue with a lock
  crypto: af_alg - wait for data at beginning of recvmsg
  crypto: skcipher - set walk.iv for zero-length inputs
2017-12-22 12:22:48 -08:00
Linus Torvalds 7edc3f20ef Here's a trio of fixes:
- The runtime PM clk patches that landed this merge window forgot to
    runtime resume devices that may be off while recalculating and setting
    rates of child clks of whatever clk is changing rates.
 
  - We had a NULL pointer deref in an old clk tracepoint when clk_set_parent()
    is called with a NULL parent pointer. This shouldn't really happen, but
    it's best to avoid this regardless.
 
  - The sun9i-mmc clk driver didn't provide 'reset' support, just 'assert'
    and 'deassert' so the MMC driver stopped probing when the probe was changed
    to do a reset instead of assert/deassert pair. This implements the reset so
    things work again.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJaPDpGAAoJEK0CiJfG5JUlMvIP/AtbqPVztT+YvVEXZhs2bUqD
 oTxg6G0mH5kLe90D+HksXJDlorn7a2VEkEqlqTXEkHReEs8OLf5YgvKvf+YCD9Fp
 Ouvn8Hi9YMok81EYP+X/84dfHrhNiEpgnQLw4epR/8g0Q5odIHZVOpTPOUtFC7Hv
 EcEog3brXS4hcTABzbKVS2YC3eOchg3CxglGjTsogrT9IUN/R/SbXDIU2mzgVoXQ
 hJWkWC2YYIL1mREbnV9FS1jBOpMUkr6wrtFKIMB0ayy5d7bs9uXd0uw5Lh3fu5d0
 yrXstCqF5zdfnr/4garDVsp2G40ScJwepP6ixcfbf6czbT+wHNhKtahHwJCNSLh3
 t7taVnojSuBLAjgllufFq73OY8+UUGShnQz1zh37P2ndoeOra5ULUjKg6UdicS2w
 tkhgeKsZLLVmy8lNb83nqGNR/oU9zXcmvCRWW/3rvxE2RidvEqur/6GKQ1SnZuOP
 Ovp3HcLbn/pU7GcHI6ppn0RKYCyg/F67ES6UvU9/eizJRpnF+SzEOMkFH5IFn7TD
 rthPPNnpGaWiChXwzav6W+R/nS42iT9RwP+9mnDOGWtLAq7f5WBCV8ih7HSsDJzW
 4UhLswQQrW/7uPP6nezC25EQTtjPlLfiyJSwK5Suqd/6e/V4elzT3o4s3dBciLM9
 JTkXmwQCiefzwwifml58
 =kvSh
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "Here's a trio of fixes:

   - The runtime PM clk patches that landed this merge window forgot to
     runtime resume devices that may be off while recalculating and
     setting rates of child clks of whatever clk is changing rates.

   - We had a NULL pointer deref in an old clk tracepoint when
     clk_set_parent() is called with a NULL parent pointer. This
     shouldn't really happen, but it's best to avoid this regardless.

   - The sun9i-mmc clk driver didn't provide 'reset' support, just
     'assert' and 'deassert' so the MMC driver stopped probing when the
     probe was changed to do a reset instead of assert/deassert pair.
     This implements the reset so things work again"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi: sun9i-mmc: Implement reset callback for reset controls
  clk: fix a panic error caused by accessing NULL pointer
  clk: Manage proper runtime PM state in clk_change_rate()
2017-12-22 11:48:36 -08:00
Thomas Gleixner 613e396bc0 init: Invoke init_espfix_bsp() from mm_init()
init_espfix_bsp() needs to be invoked before the page table isolation
initialization. Move it into mm_init() which is the place where pti_init()
will be added.

While at it get rid of the #ifdeffery and provide proper stub functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-22 20:13:05 +01:00
Thomas Gleixner c10e83f598 arch, mm: Allow arch_dup_mmap() to fail
In order to sanitize the LDT initialization on x86 arch_dup_mmap() must be
allowed to fail. Fix up all instances.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: dan.j.williams@intel.com
Cc: hughd@google.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-22 20:13:01 +01:00
Linus Torvalds ead68f2161 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller"
 "What's a holiday weekend without some networking bug fixes? [1]

   1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function
      calls, from Daniel Borkmann.

   2) Fix regression from errata limiting change to marvell PHY driver,
      from Zhao Qiang.

   3) Fix u16 overflow in SCTP, from Xin Long.

   4) Fix potential memory leak during bridge newlink, from Nikolay
      Aleksandrov.

   5) Fix BPF selftest build on s390, from Hendrik Brueckner.

   6) Don't append to cfg80211 automatically generated certs file,
      always write new ones from scratch. From Thierry Reding.

   7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai.

   8) Fix hang on tg3 MTU change with certain chips, from Brian King.

   9) Add stall detection to arc emac driver and reset chip when this
      happens, from Alexander Kochetkov.

  10) Fix MTU limitng in GRE tunnel drivers, from Xin Long.

  11) Fix stmmac timestamping bug due to mis-shifting of field. From
      Fredrik Hallenberg.

  12) Fix metrics match when deleting an ipv4 route. The kernel sets
      some internal metrics bits which the user isn't going to set when
      it makes the delete request. From Phil Sutter.

  13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix
      from Yelena Krivosheev.

  14) Fix double free and memory corruption in get_net_ns_by_id, from
      Eric W. Biederman.

  15) Flush ipv4 FIB tables in the reverse order. Some tables can share
      their actual backing data, in particular this happens for the MAIN
      and LOCAL tables. We have to kill the LOCAL table first, because
      it uses MAIN's backing memory. Fix from Ido Schimmel.

  16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann
      Horn, and Alexei Starovoitov.

  17) Make changes to ipv6 autoflowlabel sysctl really propagate to
      sockets, unless the socket has set the per-socket value
      explicitly. From Shaohua Li.

  18) Fix leaks and double callback invocations of zerocopy SKBs, from
      Willem de Bruijn"

[1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
  skbuff: skb_copy_ubufs must release uarg even without user frags
  skbuff: orphan frags before zerocopy clone
  net: reevalulate autoflowlabel setting after sysctl setting
  openvswitch: Fix pop_vlan action for double tagged frames
  ipv6: Honor specified parameters in fibmatch lookup
  bpf: do not allow root to mangle valid pointers
  selftests/bpf: add tests for recent bugfixes
  bpf: fix integer overflows
  bpf: don't prune branches when a scalar is replaced with a pointer
  bpf: force strict alignment checks for stack pointers
  bpf: fix missing error return in check_stack_boundary()
  bpf: fix 32-bit ALU op verification
  bpf: fix incorrect tracking of register size truncation
  bpf: fix incorrect sign extension in check_alu_op()
  bpf/verifier: fix bounds calculation on BPF_RSH
  ipv4: Fix use-after-free when flushing FIB tables
  s390/qeth: fix error handling in checksum cmd callback
  tipc: remove joining group member from congested list
  selftests: net: Adding config fragment CONFIG_NUMA=y
  nfp: bpf: keep track of the offloaded program
  ...
2017-12-21 15:57:30 -08:00
Majd Dibbiny 71a0ff65a2 IB/mlx5: Fix congestion counters in LAG mode
Congestion counters are counted and queried per physical function.
When working in LAG mode, CNP packets can be sent or received on both
of the functions, thus congestion counters should be aggregated from
the two physical functions.

Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-21 16:06:07 -07:00
Linus Torvalds 9035a8961b Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "It's been a few weeks, so here's a small collection of fixes that
  should go into the current series.

  This contains:

   - NVMe pull request from Christoph, with a few important fixes.

   - kyber hang fix from Omar.

   - A blk-throttl fix from Shaohua, fixing a case where we double
     charge a bio.

   - Two call_single_data alignment fixes from me, fixing up some
     unfortunate changes that went into 4.14 without being properly
     reviewed on the block side (since nobody was CC'ed on the
     patch...).

   - A bounce buffer fix in two parts, one from me and one from Ming.

   - Revert bdi debug error handling patch. It's causing boot issues for
     some folks, and a week down the line, we're still no closer to a
     fix. Revert this patch for now until it's figured out, then we can
     retry for 4.16"

* 'for-linus' of git://git.kernel.dk/linux-block:
  Revert "bdi: add error handle for bdi_debug_register"
  null_blk: unalign call_single_data
  block: unalign call_single_data in struct request
  block-throttle: avoid double charge
  block: fix blk_rq_append_bio
  block: don't let passthrough IO go into .make_request_fn()
  nvme: setup streams after initializing namespace head
  nvme: check hw sectors before setting chunk sectors
  nvme: call blk_integrity_unregister after queue is cleaned up
  nvme-fc: remove double put reference if admin connect fails
  nvme: set discard_alignment to zero
  kyber: fix another domain token wait queue hang
2017-12-21 11:13:37 -08:00
Linus Torvalds 409232a450 ARM fixes:
- A bug in handling of SPE state for non-vhe systems
 - A fix for a crash on system shutdown
 - Three timer fixes, introduced by the timer optimizations for v4.15
 
 x86 fixes:
 - fix for a WARN that was introduced in 4.15
 - fix for SMM when guest uses PCID
 - fixes for several bugs found by syzkaller
 
 ... and a dozen papercut fixes for the kvm_stat tool.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJaO6N9AAoJEL/70l94x66DC1wH/Rf+u0Cj6ZQil6LK6Nf8bfPd
 3TqrwrxUDeXwi8GzsvK14izBr1mDzidSHIO0Q4XINFRSRdaf43h3R2im/SJqvNhP
 xktCmJI2CxN96oaC7kIExgwf3YKhFdLIADfbT8oR9p3xZG/+c97dkr3b4XtmVCDb
 ZXdUEOcKnoW4zwpfJN30FLlq4OwYvuYVz02AEfPivZRDfhhus/TYSnuSdxH8CLNf
 75ymuKyXoo/RELbimwbMk8Cm9+ey7PjlUGOgbnbXIFtmgznXhLzAOeES2B+46J5b
 sMBPlmiJrn6N//lM18CC5yOBzBLGsYOoXggtw4aU/5nM4GVcFebWedpcoD4D8Jw=
 =Bt8w
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM fixes:
   - A bug in handling of SPE state for non-vhe systems
   - A fix for a crash on system shutdown
   - Three timer fixes, introduced by the timer optimizations for v4.15

  x86 fixes:
   - fix for a WARN that was introduced in 4.15
   - fix for SMM when guest uses PCID
   - fixes for several bugs found by syzkaller

  ... and a dozen papercut fixes for the kvm_stat tool"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  tools/kvm_stat: sort '-f help' output
  kvm: x86: fix RSM when PCID is non-zero
  KVM: Fix stack-out-of-bounds read in write_mmio
  KVM: arm/arm64: Fix timer enable flow
  KVM: arm/arm64: Properly handle arch-timer IRQs after vtimer_save_state
  KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC
  KVM: arm/arm64: Fix HYP unmapping going off limits
  arm64: kvm: Prevent restoring stale PMSCR_EL1 for vcpu
  KVM/x86: Check input paging mode when cs.l is set
  tools/kvm_stat: add line for totals
  tools/kvm_stat: stop ignoring unhandled arguments
  tools/kvm_stat: suppress usage information on command line errors
  tools/kvm_stat: handle invalid regular expressions
  tools/kvm_stat: add hint on '-f help' to man page
  tools/kvm_stat: fix child trace events accounting
  tools/kvm_stat: fix extra handling of 'help' with fields filter
  tools/kvm_stat: fix missing field update after filter change
  tools/kvm_stat: fix drilldown in events-by-guests mode
  tools/kvm_stat: fix command line option '-g'
  kvm: x86: fix WARN due to uninitialized guest FPU state
  ...
2017-12-21 10:44:13 -08:00
Shaohua Li 513674b5a2 net: reevalulate autoflowlabel setting after sysctl setting
sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
supposed to not include flowlabel. This is true for normal packet, but
not for reset packet.

The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
changed, so the sock will keep the old behavior in terms of auto
flowlabel. Reset packet is suffering from this problem, because reset
packet is sent from a special control socket, which is created at boot
time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
socket will always have its ipv6_pinfo.autoflowlabel set, even after
user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
have flowlabel. Normal sock created before sysctl setting suffers from
the same issue. We can't even turn off autoflowlabel unless we kill all
socks in the hosts.

To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.

Note, this changes behavior a little bit. Before commit 42240901f7
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
existing connection will change autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky in the whole life of the sock.
With this patch, the behavior isn't sticky again.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21 13:07:20 -05:00
David S. Miller 8b6ca2bf5a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-12-21

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix multiple security issues in the BPF verifier mostly related
   to the value and min/max bounds tracking rework in 4.14. Issues
   range from incorrect bounds calculation in some BPF_RSH cases,
   to improper sign extension and reg size handling on 32 bit
   ALU ops, missing strict alignment checks on stack pointers, and
   several others that got fixed, from Jann, Alexei and Edward.

2) Fix various build failures in BPF selftests on sparc64. More
   specifically, librt needed to be added to the libs to link
   against and few format string fixups for sizeof, from David.

3) Fix one last remaining issue from BPF selftest build that was
   still occuring on s390x from the asm/bpf_perf_event.h include
   which could not find the asm/ptrace.h copy, from Hendrik.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20 23:10:29 -05:00
Alexei Starovoitov bb7f0f989c bpf: fix integer overflows
There were various issues related to the limited size of integers used in
the verifier:
 - `off + size` overflow in __check_map_access()
 - `off + reg->off` overflow in check_mem_access()
 - `off + reg->var_off.value` overflow or 32-bit truncation of
   `reg->var_off.value` in check_mem_access()
 - 32-bit truncation in check_stack_boundary()

Make sure that any integer math cannot overflow by not allowing
pointer math with large values.

Also reduce the scope of "scalar op scalar" tracking.

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21 02:15:41 +01:00
Linus Torvalds 7887f47031 spi: Fixes for v4.15-rc4
A bunch of really small fixes here, all driver specific and mostly in
 error handling and remove paths.  The most important fixes are for the
 a3700 clock configuration and a fix for a nasty stall which could
 potentially cause data corruption with the xilinx driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAlo6kgoTHGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0KGDB/4sweNiUfktOAN1Y86bRmyrTvJctCCY
 MAOAzDxvKjUuYEoq0LZWKEt0uIXM5+cB+YrghEn6e5lwdB0rRsifzRQ5D2iR8odf
 xhxOW4uV2+RhCDiPWnKP7cOiOxahYPSBw1RNHsAhXlcT11rRQC1QMwpWLE0ET/WG
 UreahRO1pwGOCPVCxkfDmv+DxTm3IE0TojT8GvC7QsAD2UapqbLAcCNGN8SoSKD9
 8P24pjStItA/6JQVawUIaFLHgcAGHw3WeTf2TQbP6rEF5u5je4BGtELGjSZ72bdS
 0RT3CP0eNnLPzh1XqcBGjEPDKWnnglNc0o9XSU3lTNbfgB1cQ2kFQDs/
 =ILZY
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A bunch of really small fixes here, all driver specific and mostly in
  error handling and remove paths.

  The most important fixes are for the a3700 clock configuration and a
  fix for a nasty stall which could potentially cause data corruption
  with the xilinx driver"

* tag 'spi-fix-v4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: atmel: fixed spin_lock usage inside atmel_spi_remove
  spi: sun4i: disable clocks in the remove function
  spi: rspi: Do not set SPCR_SPE in qspi_set_config_register()
  spi: Fix double "when"
  spi: a3700: Fix clk prescaling for coefficient over 15
  spi: xilinx: Detect stall with Unknown commands
  spi: imx: Update device tree binding documentation
2017-12-20 13:38:00 -08:00
Linus Torvalds 444fec197e - Bug Fixes
- Fix message timing	issues;
       cros_ec_spi
   - Report correct state when an error	occurs;
       cros_ec_spi
   - Reorder enums used	for Power Management;
       rtsx_pci
   - Use correct OF helper for obtaining child nodes;
       twl4030-audio, twl6040
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJaOmxWAAoJEFGvii+H/Hdh7q0P/1/hilKPUkcYME7vE4ZNCVUC
 fSZpUjcw4S444TvjjVm7RtRiP0KJMjqM+xUbfoe/Ofu/jhe/+QMu+ygLcEEujssR
 qGfLHcPJDodeSwNSAl3Q53ZF8MSsZFPsWA+xKMkdjllM5n8qWPDVN/HkWZR0z9cV
 RnwPo5dC/sdc+hBjhxWKLNTAPcJET1zo6IdOoDar2uAuypJB+1vUjTObzIfB0j2a
 yUozPhJfTwVMpDvnGMW97XSjRa59KPzmMTnHDBZlqSYCRgD4bUKFp2SX579HpjIW
 J/R2AFape6I+mXc7KpTQERncl+b02Uu/LRysQSG/x/SwOGIlrPgBGNxt41YLep/u
 iFDMVfXyh4N7sZC2KESiXEQurm/r8w+YHRFf/Z6omv92kshZ/FFNhYp4CJ3WIAqD
 ni3BrTXsQqQ0FI+revvG6TE3W4wcNoIfKAI35yy3q7uzOuuL7nDp7HI55JLzYTV8
 OcLMTTuKFNUF02pQ1djwkcFxXgfLjF4QW4blLQwMVdzi9gwn9jsg7BcRXuRLScyw
 sYm92tFpPHQryX4xaHXG7JwBLeo9OVGi0rIHMxo45IJ4ILqWdcFvOqfBA/PTRbVB
 v0FVbtz53/2WLXY0MX7Ihg7Hq/HQG+FWtxEfHOGDBr99ZFR/vUK9le30Lsb7vd/o
 pFN+b/7YGMtUPO2zCa1+
 =vlCx
 -----END PGP SIGNATURE-----

Merge tag 'mfd-fixes-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MDF bugfixes from Lee Jones:

  - Fix message timing issues and report correct state when an error
    occurs in cros_ec_spi

  - Reorder enums used for Power Management in rtsx_pci

  - Use correct OF helper for obtaining child nodes in twl4030-audio and
    twl6040

* tag 'mfd-fixes-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: Fix RTS5227 (and others) powermanagement
  mfd: cros ec: spi: Fix "in progress" error signaling
  mfd: twl6040: Fix child-node lookup
  mfd: twl4030-audio: Fix sibling-node lookup
  mfd: cros ec: spi: Don't send first message too soon
2017-12-20 13:35:10 -08:00
Jens Axboe 4ccafe0320 block: unalign call_single_data in struct request
A previous change blindly added massive alignment to the
call_single_data structure in struct request. This ballooned it in size
from 296 to 320 bytes on my setup, for no valid reason at all.

Use the unaligned struct __call_single_data variant instead.

Fixes: 966a967116 ("smp: Avoid using two cache lines for struct call_single_data")
Cc: stable@vger.kernel.org # v4.14
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20 13:16:33 -07:00
David S. Miller 932f8c77a9 mlx5-fixes-2017-12-19
Misc fixes for mlx5 core and mlx5 netdev driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEbBAABAgAGBQJaOYOtAAoJEEg/ir3gV/o+1/0H+I9yY/HYQIU0Cl08yNvYKBcS
 KuhGpeJCH20rPQPrcPTG3zN0KF6DZKjwsQOwxdE5pUXqQNUuyuogUZCuLYB4OElL
 P4b1o5G377sTcDQ7jH2YAWD5hO8c5vKxsDbvN+kJaUkK+SHvjhLdvC5teUPf58rx
 tlDcWzdp3w1nBWeoLbASXTiqPYyA8vGkjOWWiQxBZoD4A4U7QpRKsEKaWd6g3mtH
 mnKVd8sczIFCHoHCQ3e+efJMz478YvX2rzdKZ8eycAMkQHBIny0fZzc4IiFy5ZXN
 L2CUGesr8x1CZ9dtK+OEw1STpalD0kpCfjRhYd2B7X0KgY/FN+7vgnpmBtxdzA==
 =G//q
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2017-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

===================
Mellanox, mlx5 fixes 2017-12-19

The follwoing series includes some fixes for mlx5 core and etherent
driver.

Please pull and let me know if there is any problem.

This series doesn't introduce any conflict with the ongoing mlx5 for-next
submission.

For -stable:

kernels >= v4.7.y
    ("net/mlx5e: Fix possible deadlock of VXLAN lock")
    ("net/mlx5e: Add refcount to VXLAN structure")
    ("net/mlx5e: Prevent possible races in VXLAN control flow")
    ("net/mlx5e: Fix features check of IPv6 traffic")

kernels >= v4.9.y
    ("net/mlx5: Fix error flow in CREATE_QP command")
    ("net/mlx5: Fix rate limit packet pacing naming and struct")

kernels >= v4.13.y
    ("net/mlx5: FPGA, return -EINVAL if size is zero")

kernels >= v4.14.y
    ("Revert "mlx5: move affinity hints assignments to generic code")

All above patches apply and compile with no issues on corresponding -stable.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20 13:41:05 -05:00
Boris Ostrovsky b3cf8528bb xen/balloon: Mark unallocated host memory as UNUSABLE
Commit f5775e0b61 ("x86/xen: discard RAM regions above the maximum
reservation") left host memory not assigned to dom0 as available for
memory hotplug.

Unfortunately this also meant that those regions could be used by
others. Specifically, commit fa564ad963 ("x86/PCI: Enable a 64bit BAR
on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") may try to map those
addresses as MMIO.

To prevent this mark unallocated host memory as E820_TYPE_UNUSABLE (thus
effectively reverting f5775e0b61) and keep track of that region as
a hostmem resource that can be used for the hotplug.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
2017-12-20 13:16:20 -05:00
Shaohua Li 111be88398 block-throttle: avoid double charge
If a bio is throttled and split after throttling, the bio could be
resubmited and enters the throttling again. This will cause part of the
bio to be charged multiple times. If the cgroup has an IO limit, the
double charge will significantly harm the performance. The bio split
becomes quite common after arbitrary bio size change.

To fix this, we always set the BIO_THROTTLED flag if a bio is throttled.
If the bio is cloned/split, we copy the flag to new bio too to avoid a
double charge. However, cloned bio could be directed to a new disk,
keeping the flag be a problem. The observation is we always set new disk
for the bio in this case, so we can clear the flag in bio_set_dev().

This issue exists for a long time, arbitrary bio size change just makes
it worse, so this should go into stable at least since v4.2.

V1-> V2: Not add extra field in bio based on discussion with Tejun

Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20 11:10:17 -07:00
Jakub Kicinski 102740bd94 cls_bpf: fix offload assumptions after callback conversion
cls_bpf used to take care of tracking what offload state a filter
is in, i.e. it would track if offload request succeeded or not.
This information would then be used to issue correct requests to
the driver, e.g. requests for statistics only on offloaded filters,
removing only filters which were offloaded, using add instead of
replace if previous filter was not added etc.

This tracking of offload state no longer functions with the new
callback infrastructure.  There could be multiple entities trying
to offload the same filter.

Throw out all the tracking and corresponding commands and simply
pass to the drivers both old and new bpf program.  Drivers will
have to deal with offload state tracking by themselves.

Fixes: 3f7889c4c7 ("net: sched: cls_bpf: call block callbacks for offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20 13:08:18 -05:00
Moshe Shemesh d6b2785cd5 net/mlx5: Cleanup IRQs in case of unload failure
When mlx5_stop_eqs fails to destroy any of the eqs it returns with an error.
In such failure flow the function will return without
releasing all EQs irqs and then pci_free_irq_vectors will fail.
Fix by only warn on destroy EQ failure and continue to release other
EQs and their irqs.

It fixes the following kernel trace:
kernel: kernel BUG at drivers/pci/msi.c:352!
...
...
kernel: Call Trace:
kernel: pci_disable_msix+0xd3/0x100
kernel: pci_free_irq_vectors+0xe/0x20
kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:05 +02:00
Eran Ben Elisha 37e92a9d4f net/mlx5: Fix rate limit packet pacing naming and struct
In mlx5_ifc, struct size was not complete, and thus driver was sending
garbage after the last defined field. Fixed it by adding reserved field
to complete the struct size.

In addition, rename all set_rate_limit to set_pp_rate_limit to be
compliant with the Firmware <-> Driver definition.

Fixes: 7486216b3a ("{net,IB}/mlx5: mlx5_ifc updates")
Fixes: 1466cc5b23 ("net/mlx5: Rate limit tables support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:58 +02:00
Saeed Mahameed 231243c827 Revert "mlx5: move affinity hints assignments to generic code"
Before the offending commit, mlx5 core did the IRQ affinity itself,
and it seems that the new generic code have some drawbacks and one
of them is the lack for user ability to modify irq affinity after
the initial affinity values got assigned.

The issue is still being discussed and a solution in the new generic code
is required, until then we need to revert this patch.

This fixes the following issue:
echo <new affinity> > /proc/irq/<x>/smp_affinity
fails with  -EIO

This reverts commit a435393aca.
Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since
it is used in mlx5_ib driver.

Fixes: a435393aca ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jes Sorensen <jsorensen@fb.com>
Reported-by: Jes Sorensen <jsorensen@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:58 +02:00
David S. Miller c6479d6257 A few more fixes:
* hwsim:
    - set To-DS bit in some frames missing it
    - fix sleeping in atomic
  * nl80211:
    - doc cleanup
    - fix locking in an error path
  * build:
    - don't append to created certs C files
    - ship certificate pre-hexdumped
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlo44hIACgkQB8qZga/f
 l8RSchAAlk1jd9WTkvzQEUe3KpQ9cPoLe6SZ5ozPJCccJQ4GPlQiB9NK1hi+qfv/
 WWdJl7zSWMujVtneeEhxHFnlwhRGh3s1t9IP1RACPEhlvFbyKob55cXJnbbiRtCJ
 MEwW15c/SbCND79QRVXLQxV0JiNM0I8j1LIdEWoi2xFA+/g+zA7InZ1g/WvPUQq7
 N28bGJ4lVEmBFJ1nnaF4kFpGRn73Cq2E9CqkWtS3Gfo9Gso6XCMvmPRjO2+BzJ67
 Ovmy9jEJTZO9aGOUXtcaZbeNhlxJMqVdRv2mjpR/l9g6r2/u+BD4uiAlLy+RNcRU
 gABqk3RJl+CA7BDl+YJahm33rLhokkZZIwq7UV5jsYOR1UTUdK/j7s5MpoqaSIVI
 ZA5/emcEoY8XTt+MMCEJbhRIJA9rQXKOq6vdeqLK5YBL0HKlg5eIWtGqh3EcY6iO
 +qe0cc4TFp4sWzOOrKShmDh5agDta1+gHUoBQsTBZnnU6pEybcY2FTUF+xN4u3mk
 cEOOMFX8CiMDIyCf2GUALRbnwfZgWesxjfY7NXCAOL6T7JK+WFVX9Njn2JqT/eIO
 FKUZl29nathswPTb1YdHmoAUJPU3TicuVnVPJzkJVG0lN5x7pwFX7/nkrZxphCBT
 06t0r3RohfbMGwYTVXqi7z7FkDVr2YbP9fQ63r5yvWOgPCT+xLM=
 =OWs9
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A few more fixes:
 * hwsim:
   - set To-DS bit in some frames missing it
   - fix sleeping in atomic
 * nl80211:
   - doc cleanup
   - fix locking in an error path
 * build:
   - don't append to created certs C files
   - ship certificate pre-hexdumped
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19 09:39:11 -05:00
Mark Brown 4d02976372
Merge remote-tracking branches 'spi/fix/armada', 'spi/fix/atmel', 'spi/fix/doc', 'spi/fix/imx', 'spi/fix/rspi', 'spi/fix/sun4i' and 'spi/fix/xilinx' into spi-linus 2017-12-19 11:07:00 +00:00
Jonathan Corbet 958a1b5a5e nl80211: Remove obsolete kerneldoc line
Commit ca986ad9bc (nl80211: allow multiple active scheduled scan
requests) removed WIPHY_FLAG_SUPPORTS_SCHED_SCAN but left the kerneldoc
description in place, leading to this docs-build warning:

   ./include/net/cfg80211.h:3278: warning: Excess enum value
           'WIPHY_FLAG_SUPPORTS_SCHED_SCAN' description in 'wiphy_flags'

Remove the line and gain a bit of peace.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-12-19 09:15:36 +01:00
Herbert Xu acf568ee85 xfrm: Reinject transport-mode packets through tasklet
This is an old bugbear of mine:

https://www.mail-archive.com/netdev@vger.kernel.org/msg03894.html

By crafting special packets, it is possible to cause recursion
in our kernel when processing transport-mode packets at levels
that are only limited by packet size.

The easiest one is with DNAT, but an even worse one is where
UDP encapsulation is used in which case you just have to insert
an UDP encapsulation header in between each level of recursion.

This patch avoids this problem by reinjecting tranport-mode packets
through a tasklet.

Fixes: b05e106698 ("[IPV4/6]: Netfilter IPsec input hooks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-19 08:23:21 +01:00
Jens Axboe 0abc2a1038 block: fix blk_rq_append_bio
Commit caa4b02476e3(blk-map: call blk_queue_bounce from blk_rq_append_bio)
moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider
the fact that the bounced bio becomes invisible to caller since the
parameter type is 'struct bio *'. Make it a pointer to a pointer to
a bio, so the caller sees the right bio also after a bounce.

Fixes: caa4b02476 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Michele Ballabio <barra_cuda@katamail.com>
(handling failure of blk_rq_append_bio(), only call bio_get() after
blk_rq_append_bio() returns OK)
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-18 13:55:43 -07:00
Ming Lei 14cb0dc647 block: don't let passthrough IO go into .make_request_fn()
Commit a8821f3f3("block: Improvements to bounce-buffer handling") tries
to make sure that the bio to .make_request_fn won't exceed BIO_MAX_PAGES,
but ignores that passthrough I/O can use blk_queue_bounce() too.
Especially, passthrough IO may not be sector-aligned, and the check
of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may
become true even though the max bvec number doesn't exceed BIO_MAX_PAGES,
then cause the bio splitted, and the original passthrough bio is submited
to generic_make_request().

This patch fixes this issue by checking if the bio is passthrough IO,
and use bio_kmalloc() to allocate the cloned passthrough bio.

Cc: NeilBrown <neilb@suse.com>
Fixes: a8821f3f3("block: Improvements to bounce-buffer handling")
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-18 13:55:43 -07:00
Paolo Bonzini 43aabca38a KVM/ARM Fixes for v4.15, Round 2
Fixes:
  - A bug in our handling of SPE state for non-vhe systems
  - A bug that causes hyp unmapping to go off limits and crash the system on
    shutdown
  - Three timer fixes that were introduced as part of the timer optimizations
    for v4.15
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaN5C1AAoJEEtpOizt6ddyodkH/jN1lquFVdYJBlEO6NXiumEk
 GBH6x6CmuGyiUL3J0ffx5U51x0NN2jE89TpH5d1dsnQg77CCjTCxHtQ9suHne3n1
 5/r0BzHZhaCbnbY0f7+E4EL0UOTpiAwUIqin1ufLPjs4XywcFyiLa7xiWkQkDmyr
 WXKOdppTc4j/FUyqb1fQBmYY8pENR5jjfgdaeZ6C6o7e6aksXgrPWqXhV/6OSRLd
 MOcxA06QfwTWy+MT1x4yo1hzCTjOEvvQXT2Va09moiNxT7hVWWvO/kwJVQL+YpWW
 di7t4CLCvGYUsxM5t8fHHV7X+dfd2nvpJA46TWggPye7yMYkTYXFQu1LHwPIdDU=
 =c5Kt
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-fixes-for-v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Fixes for v4.15, Round 2

Fixes:
 - A bug in our handling of SPE state for non-vhe systems
 - A bug that causes hyp unmapping to go off limits and crash the system on
   shutdown
 - Three timer fixes that were introduced as part of the timer optimizations
   for v4.15
2017-12-18 12:57:43 +01:00
Wanpeng Li e39d200fa5 KVM: Fix stack-out-of-bounds read in write_mmio
Reported by syzkaller:

  BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
  Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298

  CPU: 6 PID: 32298 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #18
  Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
  Call Trace:
   dump_stack+0xab/0xe1
   print_address_description+0x6b/0x290
   kasan_report+0x28a/0x370
   write_mmio+0x11e/0x270 [kvm]
   emulator_read_write_onepage+0x311/0x600 [kvm]
   emulator_read_write+0xef/0x240 [kvm]
   emulator_fix_hypercall+0x105/0x150 [kvm]
   em_hypercall+0x2b/0x80 [kvm]
   x86_emulate_insn+0x2b1/0x1640 [kvm]
   x86_emulate_instruction+0x39a/0xb90 [kvm]
   handle_exception+0x1b4/0x4d0 [kvm_intel]
   vcpu_enter_guest+0x15a0/0x2640 [kvm]
   kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
   kvm_vcpu_ioctl+0x479/0x880 [kvm]
   do_vfs_ioctl+0x142/0x9a0
   SyS_ioctl+0x74/0x80
   entry_SYSCALL_64_fastpath+0x23/0x9a

The path of patched vmmcall will patch 3 bytes opcode 0F 01 C1(vmcall)
to the guest memory, however, write_mmio tracepoint always prints 8 bytes
through *(u64 *)val since kvm splits the mmio access into 8 bytes. This
leaks 5 bytes from the kernel stack (CVE-2017-17741).  This patch fixes
it by just accessing the bytes which we operate on.

Before patch:

syz-executor-5567  [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f

After patch:

syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-18 12:57:01 +01:00
Marc Zyngier f384dcfe4d KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC
If we don't have a usable GIC, do not try to set the vcpu affinity
as this is guaranteed to fail.

Reported-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:23 +01:00
Linus Torvalds 6ba64feff6 Merge branch 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Page Table Isolation (PTI) preparatory tree from Ingo Molnar:
 "This does a rename to free up linux/pti.h to be used by the upcoming
  page table isolation feature"

* 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  drivers/misc/intel/pti: Rename the header file to free up the namespace
2017-12-17 13:54:31 -08:00
Will Deacon 3382290ed2 locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
[ Note, this is a Git cherry-pick of the following commit:

    506458efaf ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")

  ... for easier x86 PTI code testing and back-porting. ]

READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:57:15 +01:00