Commit graph

648628 commits

Author SHA1 Message Date
Heiko Carstens 89175cf766 s390: provide sclp based boot console
Use the early sclp code to provide a boot console. This boot console
is available if the kernel parameter "earlyprintk" has been specified,
just like it works for other architectures that also provide an early
boot console.

This makes debugging of early problems much easier, since now we
finally have working console output even before memory detection is
running.

The boot console will be automatically disabled as soon as another
console will be registered.

Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:55 +01:00
Heiko Carstens f031974859 s390/sclp: always stay within bounds of the early sccb
Make sure the _sclp_print_lm function stays within bounds of the early
sccb, even if the passed string is very long.  If the string is too
long, the remaining characters will be dropped.

Suggested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:55 +01:00
Heiko Carstens 742dc5773c s390/sclp: make early sclp irq handler more robust
Make the early sclp interrupt handler more robust:

- disable all interrupt sub classes except for the service signal subclass
- extend ctlreg0 union so it is easily possible to set the service signal
  subclass mask bit without using a magic number
- disable lowcore protection before writing to it
- make sure that all write accesses are done before the original content
  of control register 0 is restored, which could enable lowcore protection

Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:55 +01:00
Heiko Carstens 68cc795d19 s390/topology: make "topology=off" parameter work
The "topology=off" kernel parameter is supposed to prevent the kernel
to use hardware topology information to generate scheduling domains
etc.
For an unknown reason I implemented this in a very odd way back then:
instead of simply clearing the MACHINE_HAS_TOPOLOGY flag within the
lowcore I added a second variable which indicated that topology
information should not be used. This is more than suboptimal since it
partially doesn't work.  For the fake NUMA case topology information
is still considered and scheduling domains will be created based on
this.
To fix this and to simplify the code get rid of the extra variable and
implement the "topology=off" case like it is done for other features.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:54 +01:00
Heiko Carstens 970ba6ac6a s390: use false/true when using bool
Yet another trivial patch to reduce the noise that coccinelle
generates.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:54 +01:00
Heiko Carstens 0b92515916 s390: remove couple of unneeded semicolons
Remove a couple of unneeded semicolons. This is just to reduce the
noise that the coccinelle static code checker generates.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:54 +01:00
Heiko Carstens 496e59cc48 s390/topology: reduce number of printks
Merge the seven printks within topology_init_early to a single one.
With an early boot console this avoids printing six lines each
containing only a single character.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:54 +01:00
Heiko Carstens 00de54c803 s390/mem_detect: fix memory type of first block
Fix a long-standing but currently irrelevant bug: the memory detection
code performs a tprot instruction on address zero to figure out if the
first memory chunk is readable or writable. Due to low address
protection the result is "read-only". If the memory detection code
would actually care, it would have to ignore the first memory
increment, but it adds the memory increment to writable memory anyway.

If memblock debugging is enabled this leads to an extra rather
surprising call which registers memory. To avoid this get rid of the
first misleading tprot call and simply assume that the first memory
increment is writable. Otherwise we wouldn't have reached the memory
detection code anyway.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:54 +01:00
Heiko Carstens a2ce2a9568 s390/mem_detect: add debugging output
The s390 specific memory detection code does not call memblock_add,
which would generate debug output if memblock=debug is specified on
the kernel command line. Instead it directly calls memblock_add_range,
which doesn't generate any debug output.
To have a chance to debug early memblock related bugs add an s390
specific memblock_dbg call and a (missing) memblock_dump_all call.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:53 +01:00
Heiko Carstens 7be5e359a7 s390/setup: call memblock_reserve only for size > 0
reserve_initrd currently calls memblock_reserve even if the to be
reserved size is zero. Even though the memblock core code can handle
this correctly, it still yields confusing debug messages if
memblock debugging is enabled.
Therefore make sure to not call memblock_reserve with a size of zero.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:53 +01:00
Sebastian Ott 5064cd3506 s390/pci: use proper endianness annotations
Add proper annotation to the bar definition and use casts within the
bus accessors. Also change the sequence in the accessors to do the
shifts in the native byte order. No functional change.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:53 +01:00
Heiko Carstens 90b3baa232 s390: proper type casts for csum_partial invocations
Keep sparse and other static code checkers from emitting warnings like:

arch/s390/kernel/ipl.c:1549:14: warning: incorrect type in assignment (different base types)
arch/s390/kernel/ipl.c:1549:14:    expected unsigned int [unsigned] csum
arch/s390/kernel/ipl.c:1549:14:    got restricted __wsum

All usages in s390 code are ok. Therefore add proper casts.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:53 +01:00
Corentin Labbe dd69554660 s390/zcore: remove unneeded linux/miscdevice.h include
drivers/s390/char/zcore.c does not contain any miscdevice so the
inclusion of linux/miscdevice.h is uncessary.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:53 +01:00
Sebastian Ott 00851e6925 s390/cio: remove unused struct member
Remove an unused member of struct channel subsystem.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:53 +01:00
Sebastian Ott 64dfdd4b53 s390/cio: export real cssid
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:52 +01:00
Sebastian Ott 15a2044d7f s390/cio: css initialization cleanup
Simplify error handling during css initialization by moving the error
handling code to setup_css (which now cleans up after itself).

Also remove the odd special cleanup handling of the pseudo_subchannel.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:52 +01:00
Sebastian Ott 6c7012688b s390/cio: css attribute cleanup
Cleanup the code to handle the css device attribute. Move everything
to an attribute group to let the driver core handle attribute
creation and removal.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:52 +01:00
Sebastian Ott e2e0de9b57 s390/cio: use cssid for pgid generation
Obtain the real channel subsystem id and use that for the generation
of a unique path group id. Note that this change does not affect the
channel subsystem id as used in the user-visible naming of subchannels
and friends.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshik@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:52 +01:00
Sebastian Ott 98cc43ab6b s390/cio: clarify cssid usage
Currently the cssid in various structures is used as the id of
the respective channel subsystem. Sometimes however we call the
index in the channel_subsystems array cssid. In some places the
id is even used as the index.

Provide a new define MAX_CSS_IDX and use it where appropriate.
In addition to that provide a dummy function to find a channel
subsystem by its id and a macro to iterate over the channel
subsystems.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:52 +01:00
Heiko Carstens 57c52ae757 s390/zcrypt: get rid of variable length arrays
The variable length arrays used to specify clobbered memory within
ap_nqap and ap_dqap would only work if the length would be known at
compile time.
This is not the case for both usages. Therefore simply use a full
memory clobber and get rid of the old construct.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:51 +01:00
Heiko Carstens 227374b1dd s390/zcrypt: make structures static
Get rid of these:
drivers/s390/crypto/ap_card.c:140:20:
  warning: symbol 'ap_card_type' was not declared. Should it be static?
drivers/s390/crypto/ap_queue.c:567:20:
 warning: symbol 'ap_queue_type' was not declared. Should it be static?

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:51 +01:00
Heiko Carstens 9fbd5a0931 s390/cio: get rid of variable length array
Use a flexible array instead. The size of the structure is not used
within chsc_sstpi, therefore no change in semantics but one less
sparse warning:

drivers/s390/cio/chsc.c:1219:27: warning: Variable length array is used.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:51 +01:00
Heiko Carstens 551f413434 s390/lib: improve memmove, memset and memcpy
Improve the memmove implementation to save one instruction and use
better label names. Also use better label names for the memset and
memcpy implementations so everything looks consistent.

Suggested-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:51 +01:00
Heiko Carstens e3850ecfc1 s390/cpumf: get rid of variable length array
The stcctm5 inline assembly uses a variable length array to specify
the memory that is written to.  According to the gcc manual this trick
only works if the length is known at compile time. This is not the the
case for the stccm5 inline assembly.

Therefore simply use a full memory clobber. As requested by Martin
also move the output Q constraint operand to the input operands list,
since all we want is that the compiler generates an instruction that
may use the displacement field: in other words we only need the
address of *val. That the inline assembly actually writes to an array
starting at val is taken care of with the memory clobber.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:51 +01:00
Heiko Carstens 1d9995771f s390: update defconfigs
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-01-16 07:27:48 +01:00
Heiko Carstens e991c24d68 s390/ctl_reg: make __ctl_load a full memory barrier
We have quite a lot of code that depends on the order of the
__ctl_load inline assemby and subsequent memory accesses, like
e.g. disabling lowcore protection and the writing to lowcore.

Since the __ctl_load macro does not have memory barrier semantics, nor
any other dependencies the compiler is, theoretically, free to shuffle
code around. Or in other words: storing to lowcore could happen before
lowcore protection is disabled.

In order to avoid this class of potential bugs simply add a full
memory barrier to the __ctl_load macro.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:48 +01:00
Linus Torvalds 49def18533 Linux 4.10-rc4 2017-01-15 16:21:59 -08:00
Linus Torvalds 99421c1cb2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
 "This tree contains 4 fixes.

  The first is a fix for a race that can causes oopses under the right
  circumstances, and that someone just recently encountered.

  Past that are several small trivial correct fixes. A real issue that
  was blocking development of an out of tree driver, but does not appear
  to have caused any actual problems for in-tree code. A potential
  deadlock that was reported by lockdep. And a deadlock people have
  experienced and took the time to track down caused by a cleanup that
  removed the code to drop a reference count"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  sysctl: Drop reference added by grab_header in proc_sys_readdir
  pid: fix lockdep deadlock warning due to ucount_lock
  libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount
  mnt: Protect the mountpoint hashtable with mount_lock
2017-01-15 16:09:50 -08:00
Linus Torvalds c928162756 char/misc driver fixes for 4.10-rc4
Here are some small char/misc driver fixes for 4.10-rc4 that resolve
 some reported issues.  The MEI driver issue resolves a lot of problems
 that people have been having, as does the mem driver fix.  The other
 minor fixes resolve other reported issues.
 
 All of these have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWHtrhQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk7xACeJZyxp00SO0WqDZlOWmLShYtwfdEAoLgT3A7t
 JkoIfE+/JNbpfLAPwl4N
 =gSvf
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc driver fixes for 4.10-rc4 that resolve
  some reported issues.

  The MEI driver issue resolves a lot of problems that people have been
  having, as does the mem driver fix. The other minor fixes resolve
  other reported issues.

  All of these have been in linux-next for a while"

* tag 'char-misc-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  vme: Fix wrong pointer utilization in ca91cx42_slave_get
  auxdisplay: fix new ht16k33 build errors
  ppdev: don't print a free'd string
  extcon: return error code on failure
  drivers: char: mem: Fix thinkos in kmem address checks
  mei: bus: enable OS version only for SPT and newer
2017-01-15 12:40:53 -08:00
Linus Torvalds 2d5a7101a1 Driver core fix for 4.10-rc4
Here is a single patch being reverted to remove a feature that was added
 in 4.10-rc1 that isn't quite ready for release.  It will be redone as a
 debugfs file instead of a sysfs file in the future.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWHtr+A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymkJQCfZgr9tNZfzUZxojP6lEIiSAEsDO0Anj8l1ZZf
 PRICi50BvVtENhOyE25m
 =arvB
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
 "Here is a single patch being reverted to remove a feature that was
  added in 4.10-rc1 that isn't quite ready for release.

  It will be redone as a debugfs file instead of a sysfs file in the
  future"

* tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Revert "driver core: Add deferred_probe attribute to devices in sysfs"
2017-01-15 12:38:53 -08:00
Linus Torvalds 7f138b9706 TTY/serial fixes for 4.10-rc4
Here are some small tty/serial driver fixes for 4.10-rc4 to resolve a
 number of reported issues.
 
 Nothing major here at all, one revert of a problematic patch, and some
 other tiny bugfixes.  Full details are in the shortlog below.
 
 All have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWHtshA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ym6PQCeJtdwrkIQmisQZYcMpAZrnQnpF9YAnjrcqWZ4
 2glC2IsH8mDm5DR81axE
 =GIdT
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small tty/serial driver fixes for 4.10-rc4 to resolve a
  number of reported issues.

  Nothing major here at all, one revert of a problematic patch, and some
  other tiny bugfixes. Full details are in the shortlog below.

  All have been in linux-next with no reported issues"

* tag 'tty-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  sysrq: attach sysrq handler correctly for 32-bit kernel
  Revert "tty: serial: 8250: add CON_CONSDEV to flags"
  Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break
  8250_pci: Fix potential use-after-free in error path
  tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
  tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx
2017-01-15 12:36:32 -08:00
Linus Torvalds 793e039ea0 USB fixes for 4.10-rc4
Here are a few small USB driver fixes for 4.10-rc4 to resolve some
 reported issues.  The "largest" here is a number of bugs being fixed in
 the ch341 usb-serial driver, to hopefully resolve the mess of different
 devices floating around that use this driver that have been having
 problems with the 4.10-rc1 release.  There's also a tiny musb fix that I
 missed in the last pull request, as well as the traditional xhci fix
 rounding out the batch.
 
 All have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWHttLg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylfnQCguPTDSPkdU5vSsu8eCEplsql6izUAnjz5WAuI
 YDQBXrYkmQ5HRM4U2/8T
 =M2Eg
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a few small USB driver fixes for 4.10-rc4 to resolve some
  reported issues.

  The "largest" here is a number of bugs being fixed in the ch341
  usb-serial driver, to hopefully resolve the mess of different devices
  floating around that use this driver that have been having problems
  with the 4.10-rc1 release.

  There's also a tiny musb fix that I missed in the last pull request,
  as well as the traditional xhci fix rounding out the batch.

  All have been in linux-next with no reported issues"

* tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  xhci: fix deadlock at host remove by running watchdog correctly
  USB: serial: ch341: fix control-message error handling
  usb: musb: fix runtime PM in debugfs
  wusbcore: Fix one more crypto-on-the-stack bug
  USB: serial: kl5kusb105: fix line-state error handling
  USB: serial: ch341: fix baud rate and line-control handling
  USB: serial: ch341: fix line settings after reset-resume
  USB: serial: ch341: fix resume after reset
  USB: serial: ch341: fix open error handling
  USB: serial: ch341: fix modem-control and B0 handling
  USB: serial: ch341: fix open and resume after B0
  USB: serial: ch341: fix initial modem-control state
2017-01-15 12:34:35 -08:00
Linus Torvalds aa2797b30c Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Bugfixes for I2C. Mostly core this time which is a bit unusual but
  nothing really scary in there"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: piix4: Avoid race conditions with IMC
  i2c: fix spelling mistake: "insufficent" -> "insufficient"
  i2c: print correct device invalid address
  i2c: do not enable fall back to Host Notify by default
  i2c: fix kernel memory disclosure in dev interface
2017-01-15 12:28:14 -08:00
Linus Torvalds 83346fbc07 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - unwinder fixes
   - AMD CPU topology enumeration fixes
   - microcode loader fixes
   - x86 embedded platform fixes
   - fix for a bootup crash that may trigger when clearcpuid= is used
     with invalid values"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mpx: Use compatible types in comparison to fix sparse error
  x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
  x86/entry: Fix the end of the stack for newly forked tasks
  x86/unwind: Include __schedule() in stack traces
  x86/unwind: Disable KASAN checks for non-current tasks
  x86/unwind: Silence warnings for non-current tasks
  x86/microcode/intel: Use correct buffer size for saving microcode data
  x86/microcode/intel: Fix allocation size of struct ucode_patch
  x86/microcode/intel: Add a helper which gives the microcode revision
  x86/microcode: Use native CPUID to tickle out microcode revision
  x86/CPU: Add native CPUID variants returning a single datum
  x86/boot: Add missing declaration of string functions
  x86/CPU/AMD: Fix Bulldozer topology
  x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev'
  x86/cpu: Fix typo in the comment for Anniedale
  x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
2017-01-15 12:03:11 -08:00
Linus Torvalds a11ce3a4ac Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NOHZ fix from Ingo Molnar:
 "This fixes an old NOHZ race where we incorrectly calculate the next
  timer interrupt in certain circumstances where hrtimers are pending,
  that can cause hard to reproduce stalled-values artifacts in
  /proc/stat"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: Fix collision between tick and other hrtimers
2017-01-15 12:00:37 -08:00
Linus Torvalds 79078c53ba Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
  driver fixes, plus miscellanous tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Reject non sampling events with precise_ip
  perf/x86/intel: Account interrupts for PEBS errors
  perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
  perf/core: Fix sys_perf_event_open() vs. hotplug
  perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
  perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
  perf/x86: Set pmu->module in Intel PMU modules
  perf probe: Fix to probe on gcc generated symbols for offline kernel
  perf probe: Fix --funcs to show correct symbols for offline module
  perf symbols: Robustify reading of build-id from sysfs
  perf tools: Install tools/lib/traceevent plugins with install-bin
  tools lib traceevent: Fix prev/next_prio for deadline tasks
  perf record: Fix --switch-output documentation and comment
  perf record: Make __record_options static
  tools lib subcmd: Add OPT_STRING_OPTARG_SET option
  perf probe: Fix to get correct modname from elf header
  samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
  samples/bpf sock_example: Avoid getting ethhdr from two includes
  perf sched timehist: Show total scheduling time
2017-01-15 11:37:43 -08:00
Linus Torvalds 255e6140fa Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "A number of regression fixes:

   - Fix a boot hang on machines that have somewhat unusual memory map
     entries of phys_addr=0x0 num_pages=0, which broke due to a recent
     commit. This commit got cherry-picked from the v4.11 queue because
     the bug is affecting real machines.

   - Fix a boot hang also reported by KASAN, caused by incorrect init
     ordering introduced by a recent optimization.

   - Fix a recent robustification fix to allocate_new_fdt_and_exit_boot()
     that introduced an invalid assumption. Neither bugs were seen in
     the wild AFAIK"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/x86: Prune invalid memory map entries and fix boot regression
  x86/efi: Don't allocate memmap through memblock after mm_init()
  efi/libstub/arm*: Pass latest memory map to the kernel
2017-01-15 10:54:39 -08:00
Linus Torvalds f4d3935e4f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

The most notable fix here is probably the fix for a splice regression
("fix a fencepost error in pipe_advance()") noticed by Alan Wylie.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix a fencepost error in pipe_advance()
  coredump: Ensure proper size of sparse core files
  aio: fix lock dep warning
  tmpfs: clear S_ISGID when setting posix ACLs
2017-01-14 17:13:28 -08:00
Linus Torvalds 34241af77b Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - the virtio_blk stack DMA corruption fix from Christoph, fixing and
   issue with VMAP stacks.

 - O_DIRECT blkbits calculation fix from Chandan.

 - discard regression fix from Christoph.

 - queue init error handling fixes for nbd and virtio_blk, from Omar and
   Jeff.

 - two small nvme fixes, from Christoph and Guilherme.

 - rename of blk_queue_zone_size and bdev_zone_size to _sectors instead,
   to more closely follow what we do in other places in the block layer.
   This interface is new for this series, so let's get the naming right
   before releasing a kernel with this feature. From Damien.

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: don't try to discard from __blkdev_issue_zeroout
  sd: remove __data_len hack for WRITE SAME
  nvme: use blk_rq_payload_bytes
  scsi: use blk_rq_payload_bytes
  block: add blk_rq_payload_bytes
  block: Rename blk_queue_zone_size and bdev_zone_size
  nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
  nvme-rdma: fix nvme_rdma_queue_is_ready
  virtio_blk: fix panic in initialization error path
  nbd: blk_mq_init_queue returns an error code on failure, not NULL
  virtio_blk: avoid DMA to stack for the sense buffer
  do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned
2017-01-14 17:07:04 -08:00
Al Viro b9dc6f65bc fix a fencepost error in pipe_advance()
The logics in pipe_advance() used to release all buffers past the new
position failed in cases when the number of buffers to release was equal
to pipe->buffers.  If that happened, none of them had been released,
leaving pipe full.  Worse, it was trivial to trigger and we end up with
pipe full of uninitialized pages.  IOW, it's an infoleak.

Cc: stable@vger.kernel.org # v4.9
Reported-by: "Alan J. Wylie" <alan@wylie.me.uk>
Tested-by: "Alan J. Wylie" <alan@wylie.me.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-14 19:50:41 -05:00
Dave Kleikamp 4d22c75d4c coredump: Ensure proper size of sparse core files
If the last section of a core file ends with an unmapped or zero page,
the size of the file does not correspond with the last dump_skip() call.
gdb complains that the file is truncated and can be confusing to users.

After all of the vma sections are written, make sure that the file size
is no smaller than the current file position.

This problem can be demonstrated with gdb's bigcore testcase on the
sparc architecture.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-14 19:32:40 -05:00
Shaohua Li a12f1ae61c aio: fix lock dep warning
lockdep reports a warnning. file_start_write/file_end_write only
acquire/release the lock for regular files. So checking the files in aio
side too.

[  453.532141] ------------[ cut here ]------------
[  453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670
[  453.533011] DEBUG_LOCKS_WARN_ON(depth <= 0)
[  453.533011] Modules linked in:
[  453.533011] CPU: 1 PID: 1298 Comm: fio Not tainted 4.9.0+ #964
[  453.533011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
[  453.533011]  ffff8803a24b7a70 ffffffff8196cffb ffff8803a24b7ae8 0000000000000000
[  453.533011]  ffff8803a24b7ab8 ffffffff81091ee1 ffff8803a5dba700 00000dba00000008
[  453.533011]  ffffed0074496f59 ffff8803a5dbaf54 ffff8803ae0f8488 fffffffffffffdef
[  453.533011] Call Trace:
[  453.533011]  [<ffffffff8196cffb>] dump_stack+0x67/0x9c
[  453.533011]  [<ffffffff81091ee1>] __warn+0x111/0x130
[  453.533011]  [<ffffffff81091f97>] warn_slowpath_fmt+0x97/0xb0
[  453.533011]  [<ffffffff81091f00>] ? __warn+0x130/0x130
[  453.533011]  [<ffffffff8191b789>] ? blk_finish_plug+0x29/0x60
[  453.533011]  [<ffffffff811205d4>] lock_release+0x434/0x670
[  453.533011]  [<ffffffff8198af94>] ? import_single_range+0xd4/0x110
[  453.533011]  [<ffffffff81322195>] ? rw_verify_area+0x65/0x140
[  453.533011]  [<ffffffff813aa696>] ? aio_write+0x1f6/0x280
[  453.533011]  [<ffffffff813aa6c9>] aio_write+0x229/0x280
[  453.533011]  [<ffffffff813aa4a0>] ? aio_complete+0x640/0x640
[  453.533011]  [<ffffffff8111df20>] ? debug_check_no_locks_freed+0x1a0/0x1a0
[  453.533011]  [<ffffffff8114793a>] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30
[  453.533011]  [<ffffffff81147985>] ? debug_lockdep_rcu_enabled+0x35/0x40
[  453.533011]  [<ffffffff812a92be>] ? __might_fault+0x7e/0xf0
[  453.533011]  [<ffffffff813ac9bc>] do_io_submit+0x94c/0xb10
[  453.533011]  [<ffffffff813ac2ae>] ? do_io_submit+0x23e/0xb10
[  453.533011]  [<ffffffff813ac070>] ? SyS_io_destroy+0x270/0x270
[  453.533011]  [<ffffffff8111d7b3>] ? mark_held_locks+0x23/0xc0
[  453.533011]  [<ffffffff8100201a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
[  453.533011]  [<ffffffff813acb90>] SyS_io_submit+0x10/0x20
[  453.533011]  [<ffffffff824f96aa>] entry_SYSCALL_64_fastpath+0x18/0xad
[  453.533011]  [<ffffffff81119190>] ? trace_hardirqs_off_caller+0xc0/0x110
[  453.533011] ---[ end trace b2fbe664d1cc0082 ]---

Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-14 19:31:40 -05:00
Linus Torvalds f0ad17712b dmaengine-4.10-rc4
dmaengine fixes for 4.10-rc4
 
 The fixes this time around are spread over drivers, pretty normal update.
 
  o PCI ID for SKL ioatdma, workaround for SKX and ioat_alloc_chan_resources
    sleepy allocation fix.
  o dw kconfig typo fix
  o null pointer deref for stm32
  o MAINTAINERS Update for at_hdmac
  o pl330 runtime pm fixes
  o omap-dma port window fix
  o rcar-dmac unmap slave resource fix.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYekAeAAoJEHwUBw8lI4NHyWAQAMQxH4dOms2p4PuM65HCwJFW
 BvEsljmvn+XmlqwhaMxrwGcyMPUpVSkQUkMQZYa9hMAH4fwWLIAemxdHYR9WpXL4
 YsVz9pAgDs6G3XbunhMA6mM5vyR18uKkzUaLWtU/z4nw45RWe7NkG9Gw7k2gWKfK
 rNYIOYts4Ue3sPXsYjYn4mdHkWslFcaKuxxRZGpKAmF/V7i7m5xg6WVEykVRK7EL
 0e944zdAwmRkc3dcW+SWmZqbZwp+Sjj0Vn5kCbTEoItUQN40VnxOK25ZN+YAxpxg
 28pljjTfo0LRW0rgNUGulFhg0ssMns4J3gArfqbzRvDWOFCH6MZ1e1CwbLmZLF9U
 rK/escqsKuXFG76EstdZzfjq0H15vfF1backq23ww6WXr9w4TAUwUJR7mwShVP+O
 9WTkfjwlXEnUoixutfsfTr7YSdztCBXB34BLJxNKItl5Mm7N923BTPW06e9J7S61
 Fiv8E85U4WnMe1BLBPNK550Zz6YUAtqfBeYBFWZExury9jSX5tIjeoEbaxDO+FEJ
 fvWcjFoHZ2nQfigyafaoEIkpvjnTgveSs39tPWhTat/pucCb7za/aiA7syR9oGF2
 kRa7++5TKt7q+hr1KOzvK3TNgI8fVItBd0Asi0LejIdJa4OoxAPNa7EuRLNh+vZG
 YQoKllPDYToI/RowjGKW
 =pJ7d
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "The fixes this time around are spread over drivers, pretty normal
  update:

   - PCI ID for SKL ioatdma, workaround for SKX and
     ioat_alloc_chan_resources sleepy allocation fix

   - dw kconfig typo fix

   - null pointer deref for stm32

   - MAINTAINERS Update for at_hdmac

   - pl330 runtime pm fixes

   - omap-dma port window fix

   - rcar-dmac unmap slave resource fix"

* tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: rcar-dmac: unmap slave resource when channel is freed
  dmaengine: omap-dma: Fix the port_window support
  dmaengine: iota: ioat_alloc_chan_resources should not perform sleeping allocations.
  dmaengine: pl330: Fix runtime PM support for terminated transfers
  MAINTAINERS: dmaengine: Update + Hand over the at_hdmac driver to Ludovic
  dmaengine: omap-dma: Fix dynamic lch_map allocation
  dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
  dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status
  dmaengine: stm32-dma: Set correct args number for DMA request from DT
  dmaengine: dw: fix typo in Kconfig
  dmaengine: ioatdma: workaround SKX ioatdma version
  dmaengine: ioatdma: Add Skylake PCI Dev ID
2017-01-14 11:09:24 -08:00
Peter Jones 0100a3e67a efi/x86: Prune invalid memory map entries and fix boot regression
Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
(2.28), include memory map entries with phys_addr=0x0 and num_pages=0.

These machines fail to boot after the following commit,

  commit 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")

Fix this by removing such bogus entries from the memory map.

Furthermore, currently the log output for this case (with efi=debug)
looks like:

 [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |  |  |  |  |  ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)

This is clearly wrong, and also not as informative as it could be.  This
patch changes it so that if we find obviously invalid memory map
entries, we print an error and skip those entries.  It also detects the
display of the address range calculation overflow, so the new output is:

 [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
 [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[0x0000000000000000-0x0000000000000000] (invalid)

It also detects memory map sizes that would overflow the physical
address, for example phys_addr=0xfffffffffffff000 and
num_pages=0x0200000000000001, and prints:

 [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
 [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)

It then removes these entries from the memory map.

Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
[Matt: Include bugzilla info in commit log]
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 16:48:53 +01:00
Greg Kroah-Hartman c7334ce814 Revert "driver core: Add deferred_probe attribute to devices in sysfs"
This reverts commit 6751667a29.

Rob Herring objected to it, and a replacement for it will be added using
debugfs in the future.

Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-14 14:09:03 +01:00
Jiri Olsa 18e7a45af9 perf/x86: Reject non sampling events with precise_ip
As Peter suggested [1] rejecting non sampling PEBS events,
because they dont make any sense and could cause bugs
in the NMI handler [2].

  [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
  [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:06:50 +01:00
Jiri Olsa 475113d937 perf/x86/intel: Account interrupts for PEBS errors
It's possible to set up PEBS events to get only errors and not
any data, like on SNB-X (model 45) and IVB-EP (model 62)
via 2 perf commands running simultaneously:

    taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10

This leads to a soft lock up, because the error path of the
intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
for error PEBS interrupts, so in case you're getting ONLY
errors you don't have a way to stop the event when it's over
the max_samples_per_tick limit:

  NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
  ...
  RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
  ...
  Call Trace:
   ? trace_hardirqs_on_caller+0xf5/0x1b0
   ? perf_cgroup_attach+0x70/0x70
   perf_install_in_context+0x199/0x1b0
   ? ctx_resched+0x90/0x90
   SYSC_perf_event_open+0x641/0xf90
   SyS_perf_event_open+0x9/0x10
   do_syscall_64+0x6c/0x1f0
   entry_SYSCALL64_slow_path+0x25/0x25

Add perf_event_account_interrupt() which does the interrupt
and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
error path.

We keep the pending_kill and pending_wakeup logic only in the
__perf_event_overflow() path, because they make sense only if
there's any data to deliver.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:06:49 +01:00
Peter Zijlstra 321027c1fe perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try and move the same pre-existing software group
into a hardware context.

The problem is exactly that described in commit:

  f63a8daa58 ("perf: Fix event->ctx locking")

... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.

That very same commit failed to recognise sys_perf_event_context() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.

So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).

Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously means
we need to validate state after we acquire the locks.

Reported-by: Di Shen (Keen Lab)
Tested-by: John Dias <joaodias@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Min Chong <mchong@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: f63a8daa58 ("perf: Fix event->ctx locking")
Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 10:56:11 +01:00
Peter Zijlstra 63cae12bce perf/core: Fix sys_perf_event_open() vs. hotplug
There is problem with installing an event in a task that is 'stuck' on
an offline CPU.

Blocked tasks are not dis-assosciated from offlined CPUs, after all, a
blocked task doesn't run and doesn't require a CPU etc.. Only on
wakeup do we ammend the situation and place the task on a available
CPU.

If we hit such a task with perf_install_in_context() we'll loop until
either that task wakes up or the CPU comes back online, if the task
waking depends on the event being installed, we're stuck.

While looking into this issue, I also spotted another problem, if we
hit a task with perf_install_in_context() that is in the middle of
being migrated, that is we observe the old CPU before sending the IPI,
but run the IPI (on the old CPU) while the task is already running on
the new CPU, things also go sideways.

Rework things to rely on task_curr() -- outside of rq->lock -- which
is rather tricky. Imagine the following scenario where we're trying to
install the first event into our task 't':

CPU0            CPU1            CPU2

                (current == t)

t->perf_event_ctxp[] = ctx;
smp_mb();
cpu = task_cpu(t);

                switch(t, n);
                                migrate(t, 2);
                                switch(p, t);

                                ctx = t->perf_event_ctxp[]; // must not be NULL

smp_function_call(cpu, ..);

                generic_exec_single()
                  func();
                    spin_lock(ctx->lock);
                    if (task_curr(t)) // false

                    add_event_to_ctx();
                    spin_unlock(ctx->lock);

                                perf_event_context_sched_in();
                                  spin_lock(ctx->lock);
                                  // sees event

So its CPU0's store of t->perf_event_ctxp[] that must not go 'missing'.
Because if CPU2's load of that variable were to observe NULL, it would
not try to schedule the ctx and we'd have a task running without its
counter, which would be 'bad'.

As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
first and not see the event yet, then CPU0 must observe task_curr()
and retry. If the install happens first, then we must see the event on
sched-in and all is well.

I think we can translate the first part (until the 'must not be NULL')
of the scenario to a litmus test like:

  C C-peterz

  {
  }

  P0(int *x, int *y)
  {
          int r1;

          WRITE_ONCE(*x, 1);
          smp_mb();
          r1 = READ_ONCE(*y);
  }

  P1(int *y, int *z)
  {
          WRITE_ONCE(*y, 1);
          smp_store_release(z, 1);
  }

  P2(int *x, int *z)
  {
          int r1;
          int r2;

          r1 = smp_load_acquire(z);
	  smp_mb();
          r2 = READ_ONCE(*x);
  }

  exists
  (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)

Where:
  x is perf_event_ctxp[],
  y is our tasks's CPU, and
  z is our task being placed on the rq of CPU2.

The P0 smp_mb() is the one added by this patch, ordering the store to
perf_event_ctxp[] from find_get_context() and the load of task_cpu()
in task_function_call().

The smp_store_release/smp_load_acquire model the RCpc locking of the
rq->lock and the smp_mb() of P2 is the context switch switching from
whatever CPU2 was running to our task 't'.

This litmus test evaluates into:

  Test C-peterz Allowed
  States 7
  0:r1=0; 2:r1=0; 2:r2=0;
  0:r1=0; 2:r1=0; 2:r2=1;
  0:r1=0; 2:r1=1; 2:r2=1;
  0:r1=1; 2:r1=0; 2:r2=0;
  0:r1=1; 2:r1=0; 2:r2=1;
  0:r1=1; 2:r1=1; 2:r2=0;
  0:r1=1; 2:r1=1; 2:r2=1;
  No
  Witnesses
  Positive: 0 Negative: 7
  Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
  Observation C-peterz Never 0 7
  Hash=e427f41d9146b2a5445101d3e2fcaa34

And the strong and weak model agree.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: jeremy.linton@arm.com
Link: http://lkml.kernel.org/r/20161209135900.GU3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 10:56:10 +01:00
Tobias Klauser 4538286257 x86/mpx: Use compatible types in comparison to fix sparse error
info->si_addr is of type void __user *, so it should be compared against
something from the same address space.

This fixes the following sparse error:

  arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces)

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 09:32:06 +01:00