1
0
Fork 0
Commit Graph

826093 Commits (917257daa0fea7a007102691c0e27d9216a96768)

Author SHA1 Message Date
Olof Johansson 3e372088ab arm64: dts: stratix10: fix emac loading warning
- Add missing "altr,sysmgr-syscon" property to all gmac nodes
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEoHhMeiyk5VmwVMwNGZQEC4GjKPQFAlybmyUUHGRpbmd1eWVu
 QGtlcm5lbC5vcmcACgkQGZQEC4GjKPQ4QxAAsdVsviPs9gP+6kZgs4XfZTxWTOBG
 MINa6/6LMDZG6soOV0hoi7S8MMgJGfrW0Pd0gwhBeBMw7Fe51i85RK8TwVX1cV6g
 VxeOKn63OUn0SemjbqsfiHTMT7A7Zmr0i5uJyrQOlcT1xa3HUrRB2VUB7StEtz/9
 7bLxlyHh+bjxWc6b6muj4tB0pGDd3lk1wsFIoo6glDpSnaYs1EdU9l1R7Scn2Ev5
 5LixDzIrQyqTMjnuCVSxSNOARUGFq4v+2EiLwO2F9hzE/NjFTmVS/apNCcw3KaAb
 VhnAMJPJ5OMhefoOcww901ZWTy+rYobsGrF7xf8/AR/WH/bPh2mVGqA/1FbcsQC6
 TkR9evk9kCj87Ri7pbfIgdGywpFxpMla1z7I8yNhdwMcw2SuHds+8C9souZ5rEx1
 yIifAJF1ZhJd52e54Xar85Kj72mKTI2Y+iJPyROUxjvyDSJ/2pzsgf96NH2PXWYw
 n2aRnRXudHnl0cos1p/s8VcpMM1NuEBr/aSMpwhImyWgVWzoPbpGY/wEToxenANO
 iyoRRh513etjQUa5Biw10B4B8r5LOhJfJ2/LTl8F5ANcYXKqkfYpllz7OYb4Ydfa
 QcduDWtdWxjwXeXF06fRRFHCYHg2GKGoWVAXz1qah/JclrUjzS65NWoLbb3+RJsU
 nPU2ZkOje/1Eh2A=
 =5h+u
 -----END PGP SIGNATURE-----

Merge tag 'stratix10_fix_for_v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into arm/fixes

arm64: dts: stratix10: fix emac loading warning
- Add missing "altr,sysmgr-syscon" property to all gmac nodes

* tag 'stratix10_fix_for_v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
  arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's

Signed-off-by: Olof Johansson <olof@lixom.net>
2019-04-07 15:14:30 -07:00
Olof Johansson 57683e452b Reset controller fixes for v5.1
This tag adds missing USB PHY reset lines to the Meson G12A reset
 controller header and fixes the Meson Audio ARB driver to prevent
 module unloading while it is in use.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQRRO6F6WdpH1R0vGibVhaclGDdiwAUCXJoKoRcccC56YWJlbEBw
 ZW5ndXRyb25peC5kZQAKCRDVhaclGDdiwIrGAP9cKFE8ans4N54ymyvB2p5TrPSK
 2oSVofr49W9bN1sHiwD+MvRJpJDEHtzAxfnNOvAd02wlU9GR/iDuUfGFI2gc7AA=
 =RDL9
 -----END PGP SIGNATURE-----

Merge tag 'reset-fixes-for-v5.1' of git://git.pengutronix.de/pza/linux into arm/fixes

Reset controller fixes for v5.1

This tag adds missing USB PHY reset lines to the Meson G12A reset
controller header and fixes the Meson Audio ARB driver to prevent
module unloading while it is in use.

* tag 'reset-fixes-for-v5.1' of git://git.pengutronix.de/pza/linux:
  reset: meson-audio-arb: Fix missing .owner setting of reset_controller_dev
  dt-bindings: reset: meson-g12a: Add missing USB2 PHY resets

Signed-off-by: Olof Johansson <olof@lixom.net>
2019-04-07 15:14:00 -07:00
Maxime Ripard ac0722f23f dt-bindings: cpu: Fix JSON schema
Commit fd73403a48 ("dt-bindings: arm: Add SMP enable-method for
Milbeaut") added support for a new cpu enable-method, but did so using
tabulations to ident. This is however invalid in the syntax, and resulted
in a failure when trying to use that schemas for validation.

Use spaces instead of tabs to indent to fix this.

Fixes: fd73403a48 ("dt-bindings: arm: Add SMP enable-method for Milbeaut")
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Sugaya Taichi <sugaya.taichi@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2019-04-07 15:12:21 -07:00
Christophe Leroy dd9a994fc6 powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
Commit b5b4453e79 ("powerpc/vdso64: Fix CLOCK_MONOTONIC
inconsistencies across Y2038") changed the type of wtom_clock_sec
to s64 on PPC64. Therefore, VDSO32 needs to read it with a 4 bytes
shift in order to retrieve the lower part of it.

Fixes: b5b4453e79 ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-04-08 06:57:19 +10:00
Linus Torvalds 3b04689147 xen: fixes for 5.1-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXKoNnwAKCRCAXGG7T9hj
 vqpEAQCMeiLXXp+BMGI1+x1eeE4ri2woGkK1lsZJLOJhGIqTfgD/dDvmhCSQBDAs
 IbDDbNJP1IT4jQ98c5obw+qEt9OWcww=
 =J7ME
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "One minor fix and a small cleanup for the xen privcmd driver"

* tag 'for-linus-5.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Prevent buffer overflow in privcmd ioctl
  xen: use struct_size() helper in kzalloc()
2019-04-07 06:12:10 -10:00
Linus Torvalds 82331a70cc This pull request contains a single fix for MTD:
- Fix for a possible infinite loop in the cfi_cmdset_0002 driver
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlyqCf4WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wQbDD/94suc16vZ5YBONweRaxf+B19hq
 10VtuQe7gMIbQXPJh9MP4utQuw97WOHXRnDWyHibWffdozpUDGvayGMb++Vsa8Ck
 bKwLHP7oqWNjw+yEIXiB9XkQ4tRNZXMfOzMA9MGd4HjAAgtma00IH9xDDM7q3Fcf
 VUyOG4dXyhN78h1/y2dxukc6BldHpM8eJ8iKRPuk3xcTuFIXO/YwTKSGhaK4LJXh
 RkYs550IkmZalhKLU1ZA0CpPVJBedd8WlJ0K2dXuIPM/Ciyl7MMF5GyveVf7nMkO
 hUE7AJs7Qan+qQ6jfkS9B2G4I2VDY76SOKK3k6dGMYBxEKBnJhWHCCm5X/1hJnQ/
 QDxS1w3yQAAfAkphyoXI1xDdyrRHbEk5fwRNGyEwPsPaBvqT6uRVvYue/tS9QPJu
 bPPZLa4QY+7sekL7R0CiBiMttd5XQ3NyezQVCx9GST4A6Q1MDtNSo3CpRpwR38LK
 34xuXmUinK09ehf6VreBhQp3Gj4bhNs0HI8OUpeF4k4svKMlrjC0EtBQYF+PpCI8
 myZOqDI2KvPy30wS1gN5C7vt3tYbxuwbADRVhEgVgyr1PclU7NLzRi6LJaYM+02f
 3YJXLroPd6Xu1ZU+DmDQMdOZ1uf9M6IsUOiDczVfOs0klDWpROSphkPKedgZtxsj
 4HQPJYbP6rBJN5+KMA==
 =kUd0
 -----END PGP SIGNATURE-----

Merge tag 'mtd/fixes-for-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD fix from Richard Weinberger:
 "A single fix for a possible infinite loop in the cfi_cmdset_0002
  driver"

* tag 'mtd/fixes-for-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer
2019-04-07 06:07:20 -10:00
Linus Torvalds eccc58cb10 SCSI fixes on 20190406
five small fixes.  Four in three drivers: qedi, lpfc and storvsc.  The
 final one is labelled core, but merely adds a dh rdac entry for Lenovo
 systems.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXKmCMiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishVskAQCwHoh8
 M95Dp83LOxHcrbdKlKshKJdeIqnNnmMGyqu9mAD+NI0TJfnRXIjTAKdesCuxw6M1
 V8NVuWUuLBG2SOXYNXs=
 =207W
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Five small fixes. Four in three drivers: qedi, lpfc and storvsc. The
  final one is labelled core, but merely adds a dh rdac entry for Lenovo
  systems"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Fix missing wakeups on abort threads
  scsi: storvsc: Reduce default ring buffer size to 128 Kbytes
  scsi: storvsc: Fix calculation of sub-channel count
  scsi: core: add new RDAC LENOVO/DE_Series device
  scsi: qedi: remove declaration of nvm_image from stack
2019-04-07 06:00:35 -10:00
Dan Carpenter 6491d69839 nfc: nci: Potential off by one in ->pipes[] array
This is similar to commit e285d5bfb7 ("NFC: Fix the number of pipes")
where we changed NFC_HCI_MAX_PIPES from 127 to 128.

As the comment next to the define explains, the pipe identifier is 7
bits long.  The highest possible pipe is 127, but the number of possible
pipes is 128.  As the code is now, then there is potential for an
out of bounds array access:

    net/nfc/nci/hci.c:297 nci_hci_cmd_received() warn: array off by one?
    'ndev->hci_dev->pipes[pipe]' '0-127 == 127'

Fixes: 11f54f2286 ("NFC: nci: Add HCI over NCI protocol support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-06 15:05:07 -07:00
Dan Carpenter d7ee81ad09 NFC: nci: Add some bounds checking in nci_hci_cmd_received()
This is similar to commit 674d9de02a ("NFC: Fix possible memory
corruption when handling SHDLC I-Frame commands").

I'm not totally sure, but I think that commit description may have
overstated the danger.  I was under the impression that this data came
from the firmware?  If you can't trust your networking firmware, then
you're already in trouble.

Anyway, these days we add bounds checking where ever we can and we call
it kernel hardening.  Better safe than sorry.

Fixes: 11f54f2286 ("NFC: nci: Add HCI over NCI protocol support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-06 15:05:07 -07:00
Linus Torvalds faac51ddac Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "A simple but wanted driver bugfix"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: imx: don't leak the i2c adapter on error
2019-04-06 11:52:59 -10:00
Linus Torvalds 373c392508 Merge branch 'parisc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
 "A 32-bit boot regression fix introduced in the merge window, a QEMU
  detection fix and two fixes by Sven regarding ptrace & kprobes"

* 'parisc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Detect QEMU earlier in boot process
  parisc: also set iaoq_b in instruction_pointer_set()
  parisc: regs_return_value() should return gpr28
  Revert: parisc: Use F_EXTEND() macro in iosapic code
2019-04-06 10:59:30 -10:00
Helge Deller d006e95b55 parisc: Detect QEMU earlier in boot process
While adding LASI support to QEMU, I noticed that the QEMU detection in
the kernel happens much too late. For example, when a LASI chip is found
by the kernel, it registers the LASI LED driver as well.  But when we
run on QEMU it makes sense to avoid spending unnecessary CPU cycles, so
we need to access the running_on_QEMU flag earlier than before.

This patch now makes the QEMU detection the fist task of the Linux
kernel by moving it to where the kernel enters the C-coding.

Fixes: 310d82784f ("parisc: qemu idle sleep support")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.14+
2019-04-06 19:07:55 +02:00
Sven Schnelle f324fa5832 parisc: also set iaoq_b in instruction_pointer_set()
When setting the instruction pointer on PA-RISC we also need
to set the back of the instruction queue to the new offset, otherwise
we will execute on instruction from the new location, and jumping
back to the old location stored in iaoq_b.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 75ebedf1d2 ("parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature")
Cc: stable@vger.kernel.org # 4.19+
2019-04-06 19:07:55 +02:00
Sven Schnelle 45efd871bf parisc: regs_return_value() should return gpr28
While working on kretprobes for PA-RISC I was wondering while the
kprobes sanity test always fails on kretprobes. This is caused by
returning gpr20 instead of gpr28.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.14+
2019-04-06 19:07:55 +02:00
Helge Deller c2f8d7cb32 Revert: parisc: Use F_EXTEND() macro in iosapic code
Revert parts of commit 97d7e2e3fd ("parisc: Use F_EXTEND() macro in
iosapic code"). It breaks booting the 32-bit kernel on some machines.

Reported-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Sven Schnelle <svens@stackframe.org>
Fixes: 97d7e2e3fd ("parisc: Use F_EXTEND() macro in iosapic code")
Signed-off-by: Helge Deller <deller@gmx.de>
2019-04-06 19:07:55 +02:00
Kirill Smelkov 10dce8af34 fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
Commit 9c225f2655 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.

This caused regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d0 ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.

The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.

However even though 2006'th version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655 -
is doing so irregardless of whether a file is seekable or not.

See

    https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
    https://lwn.net/Articles/180387
    https://lwn.net/Articles/180396

for historic context.

The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:

	kernel/power/user.c		snapshot_read
	fs/debugfs/file.c		u32_array_read
	fs/fuse/control.c		fuse_conn_waiting_read + ...
	drivers/hwmon/asus_atk0110.c	atk_debugfs_ggrp_read
	arch/s390/hypfs/inode.c		hypfs_read_iter
	...

Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -> deadlock.

Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):

	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
	drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
	drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()

In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.

FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f7 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:

See

    https://github.com/libfuse/osspd
    https://lwn.net/Articles/308445

    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510

Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:

    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216

I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:

    https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169

Let's fix this regression. The plan is:

1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
   doing so would break many in-kernel nonseekable_open users which
   actually use ppos in read/write handlers.

2. Add stream_open() to kernel to open stream-like non-seekable file
   descriptors. Read and write on such file descriptors would never use
   nor change ppos. And with that property on stream-like files read and
   write will be running without taking f_pos lock - i.e. read and write
   could be running simultaneously.

3. With semantic patch search and convert to stream_open all in-kernel
   nonseekable_open users for which read and write actually do not
   depend on ppos and where there is no other methods in file_operations
   which assume @offset access.

4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
   steam_open if that bit is present in filesystem open reply.

   It was tempting to change fs/fuse/ open handler to use stream_open
   instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
   grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
   and in particular GVFS which actually uses offset in its read and
   write handlers

	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

   so if we would do such a change it will break a real user.

5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
   from v3.14+ (the kernel where 9c225f2655 first appeared).

   This will allow to patch OSSPD and other FUSE filesystems that
   provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
   in their open handler and this way avoid the deadlock on all kernel
   versions. This should work because fs/fuse/ ignores unknown open
   flags returned from a filesystem and so passing FOPEN_STREAM to a
   kernel that is not aware of this flag cannot hurt. In turn the kernel
   that is not aware of FOPEN_STREAM will be < v3.14 where just
   FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
   write deadlock.

This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.

Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.

The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-06 07:01:55 -10:00
Guenter Roeck 47b16820c4 xsysace: Fix error handling in ace_setup
If xace hardware reports a bad version number, the error handling code
in ace_setup() calls put_disk(), followed by queue cleanup. However, since
the disk data structure has the queue pointer set, put_disk() also
cleans and releases the queue. This results in blk_cleanup_queue()
accessing an already released data structure, which in turn may result
in a crash such as the following.

[   10.681671] BUG: Kernel NULL pointer dereference at 0x00000040
[   10.681826] Faulting instruction address: 0xc0431480
[   10.682072] Oops: Kernel access of bad area, sig: 11 [#1]
[   10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440
[   10.682387] Modules linked in:
[   10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.0.0-rc6-next-20190218+ #2
[   10.682733] NIP:  c0431480 LR: c043147c CTR: c0422ad8
[   10.682863] REGS: cf82fbe0 TRAP: 0300   Tainted: G        W          (5.0.0-rc6-next-20190218+)
[   10.683065] MSR:  00029000 <CE,EE,ME>  CR: 22000222  XER: 00000000
[   10.683236] DEAR: 00000040 ESR: 00000000
[   10.683236] GPR00: c043147c cf82fc90 cf82ccc0 00000000 00000000 00000000 00000002 00000000
[   10.683236] GPR08: 00000000 00000000 c04310bc 00000000 22000222 00000000 c0002c54 00000000
[   10.683236] GPR16: 00000000 00000001 c09aa39c c09021b0 c09021dc 00000007 c0a68c08 00000000
[   10.683236] GPR24: 00000001 ced6d400 ced6dcf0 c0815d9c 00000000 00000000 00000000 cedf0800
[   10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114
[   10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114
[   10.684602] Call Trace:
[   10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable)
[   10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c
[   10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68
[   10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c
[   10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508
[   10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8
[   10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c
[   10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464
[   10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4
[   10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc
[   10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0
[   10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234
[   10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c
[   10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac
[   10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330
[   10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478
[   10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114
[   10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c
[   10.687349] Instruction dump:
[   10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 38810008
[   10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 80810008
[   10.688056] ---[ end trace 13c9ff51d41b9d40 ]---

Fix the problem by setting the disk queue pointer to NULL before calling
put_disk(). A more comprehensive fix might be to rearrange the code
to check the hardware version before initializing data structures,
but I don't know if this would have undesirable side effects, and
it would increase the complexity of backporting the fix to older kernels.

Fixes: 74489a91dd ("Add support for Xilinx SystemACE CompactFlash interface")
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-06 10:51:12 -06:00
John Pittman 7ff684a683 null_blk: prevent crash from bad home_node value
At module load, if the selected home_node value is greater than
the available numa nodes, the system will crash in
__alloc_pages_nodemask() due to a bad paging request.  Prevent this
user error crash by detecting the bad value, logging an error, and
setting g_home_node back to the default of NUMA_NO_NODE.

Signed-off-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-06 10:51:08 -06:00
Linus Torvalds be76865df5 RTC fixes for 5.1
- Various alarm fixes for da9063, cros-ec and sh
  - sd3078 manufacturer name fix as this was introduced this cycle
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEXx9Viay1+e7J/aM4AyWl4gNJNJIFAlyoiekACgkQAyWl4gNJ
 NJLH4A/+Pzh9Af8TcOmE7jbvTS0ulbVi1E5eW1tyWZ3JZibOh5XFh/Sjln50l0sM
 B8zi4xeib5iFMEaLwO0MIaYp8LA7bWK2mZcKwpttSUuWSXK7v4Y2SjpkNOJW+ZzM
 ZNkR/Lp3ywGOTFTkQBCTrd5DFlEZe8pRf7MPYO38hjPgxJFzAp5HOblMQoA8PZGj
 ghro0R47Epz6uR1ltOSH1uE/kM5Ukx1vQ2YtHWPUXNsyevxvYsGCRlXfOXpaXZNj
 dLKgjpO6au2pnJ9xHU1tjxwRmlkVKjcmMyfvo6lwi5eXNw9gtXvZN5MRu7OwGU+M
 Oai+ayynYc3ZuQ+DzPa5YQcVLFsJdznjZxnqB1PduoOGFXIB16MqXRkD+LQ00Bxi
 NAQoWhdPMuXcng2Vhe8R2Rb4RGGJVKlnEHf9rvFbWzpNyO/whw8MPQQD9V62GPxZ
 6ww0SPEQzyXRNQHdd+hu7IUeUi0k8PtBr8cWCs0ufvQmxGhwYWr431VW/VvIVTrW
 CR3hqYgwVmHBF5A4GXK0dC4C3auzWBLN14HsN3eSmzPXI+ipEPjUPU1d6pOSuckv
 Pv2gyUgls5xhPhR6GQa6wcf0SWqmPQ16cp7hjOH5RlMg2BzL+Bbeg6L+x5UoIzx/
 3Rh3ahTTHMZ7YzoqvVARwCl/qx0jTRl0u9nbRd22tvsxXmjppbo=
 =s2X4
 -----END PGP SIGNATURE-----

Merge tag 'rtc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC fixes from Alexandre Belloni:

 - Various alarm fixes for da9063, cros-ec and sh

 - sd3078 manufacturer name fix as this was introduced this cycle

* tag 'rtc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: da9063: set uie_unsupported when relevant
  rtc: sd3078: fix manufacturer name
  rtc: sh: Fix invalid alarm warning for non-enabled alarm
  rtc: cros-ec: Fail suspend/resume if wake IRQ can't be configured
2019-04-06 06:26:36 -10:00
Laurentiu Tudor 3ace6891ce i2c: imx: don't leak the i2c adapter on error
Make sure to free the i2c adapter on the error exit path.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: e1ab9a468e ("i2c: imx: improve the error handling in i2c_imx_dma_request()")
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-04-06 17:54:28 +02:00
Alexander Potapenko 5b77e95dd7 x86/asm: Use stricter assembly constraints in bitops
There's a number of problems with how arch/x86/include/asm/bitops.h
is currently using assembly constraints for the memory region
bitops are modifying:

1) Use memory clobber in bitops that touch arbitrary memory

Certain bit operations that read/write bits take a base pointer and an
arbitrarily large offset to address the bit relative to that base.
Inline assembly constraints aren't expressive enough to tell the
compiler that the assembly directive is going to touch a specific memory
location of unknown size, therefore we have to use the "memory" clobber
to indicate that the assembly is going to access memory locations other
than those listed in the inputs/outputs.

To indicate that BTR/BTS instructions don't necessarily touch the first
sizeof(long) bytes of the argument, we also move the address to assembly
inputs.

This particular change leads to size increase of 124 kernel functions in
a defconfig build. For some of them the diff is in NOP operations, other
end up re-reading values from memory and may potentially slow down the
execution. But without these clobbers the compiler is free to cache
the contents of the bitmaps and use them as if they weren't changed by
the inline assembly.

2) Use byte-sized arguments for operations touching single bytes.

Passing a long value to ANDB/ORB/XORB instructions makes the compiler
treat sizeof(long) bytes as being clobbered, which isn't the case. This
may theoretically lead to worse code in the case of heavy optimization.

Practical impact:

I've built a defconfig kernel and looked through some of the functions
generated by GCC 7.3.0 with and without this clobber, and didn't spot
any miscompilations.

However there is a (trivial) theoretical case where this code leads to
miscompilation:

  https://lkml.org/lkml/2019/3/28/393

using just GCC 8.3.0 with -O2.  It isn't hard to imagine someone writes
such a function in the kernel someday.

So the primary motivation is to fix an existing misuse of the asm
directive, which happens to work in certain configurations now, but
isn't guaranteed to work under different circumstances.

[ --mingo: Added -stable tag because defconfig only builds a fraction
  of the kernel and the trivial testcase looks normal enough to
  be used in existing or in-development code. ]

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: James Y Knight <jyknight@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
[ Edited the changelog, tidied up one of the defines. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-06 09:52:02 +02:00
Linus Torvalds f654f0fc0b Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kernel/sysctl.c: fix out-of-bounds access when setting file-max
  mm/util.c: fix strndup_user() comment
  sh: fix multiple function definition build errors
  MAINTAINERS: add maintainer and replacing reviewer ARM/NUVOTON NPCM
  MAINTAINERS: fix bad pattern in ARM/NUVOTON NPCM
  mm: writeback: use exact memcg dirty counts
  psi: clarify the units used in pressure files
  mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
  hugetlbfs: fix memory leak for resv_map
  mm: fix vm_fault_t cast in VM_FAULT_GET_HINDEX()
  lib/lzo: fix bugs for very short or empty input
  include/linux/bitrev.h: fix constant bitrev
  kmemleak: powerpc: skip scanning holes in the .bss section
  lib/string.c: implement a basic bcmp
2019-04-05 17:08:55 -10:00
Will Deacon 9002b21465 kernel/sysctl.c: fix out-of-bounds access when setting file-max
Commit 32a5ad9c22 ("sysctl: handle overflow for file-max") hooked up
min/max values for the file-max sysctl parameter via the .extra1 and
.extra2 fields in the corresponding struct ctl_table entry.

Unfortunately, the minimum value points at the global 'zero' variable,
which is an int.  This results in a KASAN splat when accessed as a long
by proc_doulongvec_minmax on 64-bit architectures:

  | BUG: KASAN: global-out-of-bounds in __do_proc_doulongvec_minmax+0x5d8/0x6a0
  | Read of size 8 at addr ffff2000133d1c20 by task systemd/1
  |
  | CPU: 0 PID: 1 Comm: systemd Not tainted 5.1.0-rc3-00012-g40b114779944 #2
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x228
  |  show_stack+0x14/0x20
  |  dump_stack+0xe8/0x124
  |  print_address_description+0x60/0x258
  |  kasan_report+0x140/0x1a0
  |  __asan_report_load8_noabort+0x18/0x20
  |  __do_proc_doulongvec_minmax+0x5d8/0x6a0
  |  proc_doulongvec_minmax+0x4c/0x78
  |  proc_sys_call_handler.isra.19+0x144/0x1d8
  |  proc_sys_write+0x34/0x58
  |  __vfs_write+0x54/0xe8
  |  vfs_write+0x124/0x3c0
  |  ksys_write+0xbc/0x168
  |  __arm64_sys_write+0x68/0x98
  |  el0_svc_common+0x100/0x258
  |  el0_svc_handler+0x48/0xc0
  |  el0_svc+0x8/0xc
  |
  | The buggy address belongs to the variable:
  |  zero+0x0/0x40
  |
  | Memory state around the buggy address:
  |  ffff2000133d1b00: 00 00 00 00 00 00 00 00 fa fa fa fa 04 fa fa fa
  |  ffff2000133d1b80: fa fa fa fa 04 fa fa fa fa fa fa fa 04 fa fa fa
  | >ffff2000133d1c00: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 00 00
  |                                ^
  |  ffff2000133d1c80: fa fa fa fa 00 fa fa fa fa fa fa fa 00 00 00 00
  |  ffff2000133d1d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Fix the splat by introducing a unsigned long 'zero_ul' and using that
instead.

Link: http://lkml.kernel.org/r/20190403153409.17307-1-will.deacon@arm.com
Fixes: 32a5ad9c22 ("sysctl: handle overflow for file-max")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Christian Brauner <christian@brauner.io>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Andrew Morton e91455217d mm/util.c: fix strndup_user() comment
The kerneldoc misdescribes strndup_user()'s return value.

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Timur Tabi <timur@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Randy Dunlap acaf892ecb sh: fix multiple function definition build errors
Many of the sh CPU-types have their own plat_irq_setup() and
arch_init_clk_ops() functions, so these same (empty) functions in
arch/sh/boards/of-generic.c are not needed and cause build errors.

If there is some case where these empty functions are needed, they can
be retained by marking them as "__weak" while at the same time making
builds that do not need them succeed.

Fixes these build errors:

arch/sh/boards/of-generic.o: In function `plat_irq_setup':
(.init.text+0x134): multiple definition of `plat_irq_setup'
arch/sh/kernel/cpu/sh2/setup-sh7619.o:(.init.text+0x30): first defined here
arch/sh/boards/of-generic.o: In function `arch_init_clk_ops':
(.init.text+0x118): multiple definition of `arch_init_clk_ops'
arch/sh/kernel/cpu/sh2/clock-sh7619.o:(.init.text+0x0): first defined here

Link: http://lkml.kernel.org/r/9ee4e0c5-f100-86a2-bd4d-1d3287ceab31@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Tomer Maimon 803cfadcb6 MAINTAINERS: add maintainer and replacing reviewer ARM/NUVOTON NPCM
Add Tali Perry as Nuvoton NPCM maintainer, replace Brendan Higgins
Nuvoton NPCM reviewer with Benjamin Fair.

Link: http://lkml.kernel.org/r/20190328235752.334462-2-tmaimon77@gmail.com
Signed-off-by: Tomer Maimon <tmaimon77@gmail.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: Benjamin Fair <benjaminfair@google.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: Joe Perches <joe@perches.com>
Cc: Avi Fishman <avifishman70@gmail.com>
Cc: Patrick Venture <venture@google.com>
Cc: Nancy Yuen <yuenn@google.com>
Cc: Tali Perry <tali.perry1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Tomer Maimon 166dbd930c MAINTAINERS: fix bad pattern in ARM/NUVOTON NPCM
In the process of upstreaming architecture support for ARM/NUVOTON NPCM
include/dt-bindings/clock/nuvoton,npcm7xx-clks.h was renamed
include/dt-bindings/clock/nuvoton,npcm7xx-clock.h without updating
MAINTAINERS.  This updates the MAINTAINERS pattern to match the new name
of this file.

Link: http://lkml.kernel.org/r/20190328235752.334462-1-tmaimon77@gmail.com
Fixes: 6a498e06ba ("MAINTAINERS: Add entry for the Nuvoton NPCM architecture")
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Tomer Maimon <tmaimon77@gmail.com>
Reported-by: Joe Perches <joe@perches.com>
Reviewed-by: Benjamin Fair <benjaminfair@google.com>
Cc: Avi Fishman <avifishman70@gmail.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Nancy Yuen <yuenn@google.com>
Cc: Patrick Venture <venture@google.com>
Cc: Tali Perry <tali.perry1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Greg Thelen 0b3d6e6f2d mm: writeback: use exact memcg dirty counts
Since commit a983b5ebee ("mm: memcontrol: fix excessive complexity in
memory.stat reporting") memcg dirty and writeback counters are managed
as:

 1) per-memcg per-cpu values in range of [-32..32]

 2) per-memcg atomic counter

When a per-cpu counter cannot fit in [-32..32] it's flushed to the
atomic.  Stat readers only check the atomic.  Thus readers such as
balance_dirty_pages() may see a nontrivial error margin: 32 pages per
cpu.

Assuming 100 cpus:
   4k x86 page_size:  13 MiB error per memcg
  64k ppc page_size: 200 MiB error per memcg

Considering that dirty+writeback are used together for some decisions the
errors double.

This inaccuracy can lead to undeserved oom kills.  One nasty case is
when all per-cpu counters hold positive values offsetting an atomic
negative value (i.e.  per_cpu[*]=32, atomic=n_cpu*-32).
balance_dirty_pages() only consults the atomic and does not consider
throttling the next n_cpu*32 dirty pages.  If the file_lru is in the
13..200 MiB range then there's absolutely no dirty throttling, which
burdens vmscan with only dirty+writeback pages thus resorting to oom
kill.

It could be argued that tiny containers are not supported, but it's more
subtle.  It's the amount the space available for file lru that matters.
If a container has memory.max-200MiB of non reclaimable memory, then it
will also suffer such oom kills on a 100 cpu machine.

The following test reliably ooms without this patch.  This patch avoids
oom kills.

  $ cat test
  mount -t cgroup2 none /dev/cgroup
  cd /dev/cgroup
  echo +io +memory > cgroup.subtree_control
  mkdir test
  cd test
  echo 10M > memory.max
  (echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
  (echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)

  $ cat memcg-writeback-stress.c
  /*
   * Dirty pages from all but one cpu.
   * Clean pages from the non dirtying cpu.
   * This is to stress per cpu counter imbalance.
   * On a 100 cpu machine:
   * - per memcg per cpu dirty count is 32 pages for each of 99 cpus
   * - per memcg atomic is -99*32 pages
   * - thus the complete dirty limit: sum of all counters 0
   * - balance_dirty_pages() only sees atomic count -99*32 pages, which
   *   it max()s to 0.
   * - So a workload can dirty -99*32 pages before balance_dirty_pages()
   *   cares.
   */
  #define _GNU_SOURCE
  #include <err.h>
  #include <fcntl.h>
  #include <sched.h>
  #include <stdlib.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/sysinfo.h>
  #include <sys/types.h>
  #include <unistd.h>

  static char *buf;
  static int bufSize;

  static void set_affinity(int cpu)
  {
  	cpu_set_t affinity;

  	CPU_ZERO(&affinity);
  	CPU_SET(cpu, &affinity);
  	if (sched_setaffinity(0, sizeof(affinity), &affinity))
  		err(1, "sched_setaffinity");
  }

  static void dirty_on(int output_fd, int cpu)
  {
  	int i, wrote;

  	set_affinity(cpu);
  	for (i = 0; i < 32; i++) {
  		for (wrote = 0; wrote < bufSize; ) {
  			int ret = write(output_fd, buf+wrote, bufSize-wrote);
  			if (ret == -1)
  				err(1, "write");
  			wrote += ret;
  		}
  	}
  }

  int main(int argc, char **argv)
  {
  	int cpu, flush_cpu = 1, output_fd;
  	const char *output;

  	if (argc != 2)
  		errx(1, "usage: output_file");

  	output = argv[1];
  	bufSize = getpagesize();
  	buf = malloc(getpagesize());
  	if (buf == NULL)
  		errx(1, "malloc failed");

  	output_fd = open(output, O_CREAT|O_RDWR);
  	if (output_fd == -1)
  		err(1, "open(%s)", output);

  	for (cpu = 0; cpu < get_nprocs(); cpu++) {
  		if (cpu != flush_cpu)
  			dirty_on(output_fd, cpu);
  	}

  	set_affinity(flush_cpu);
  	if (fsync(output_fd))
  		err(1, "fsync(%s)", output);
  	if (close(output_fd))
  		err(1, "close(%s)", output);
  	free(buf);
  }

Make balance_dirty_pages() and wb_over_bg_thresh() work harder to
collect exact per memcg counters.  This avoids the aforementioned oom
kills.

This does not affect the overhead of memory.stat, which still reads the
single atomic counter.

Why not use percpu_counter? memcg already handles cpus going offline, so
no need for that overhead from percpu_counter.  And the percpu_counter
spinlocks are more heavyweight than is required.

It probably also makes sense to use exact dirty and writeback counters
in memcg oom reports.  But that is saved for later.

Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>	[4.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Waiman Long be87ab0afd psi: clarify the units used in pressure files
The output of the PSI files show a bunch of numbers with no unit.  The
psi.txt documentation file also does not indicate what units are used.
One can only find out by looking at the source code.  The units are
percentage for the averages and useconds for the total.  Make the
information easier to find by documenting the units in psi.txt.

Link: http://lkml.kernel.org/r/20190402193810.3450-1-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Aneesh Kumar K.V c6f3c5ee40 mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
With some architectures like ppc64, set_pmd_at() cannot cope with a
situation where there is already some (different) valid entry present.

Use pmdp_set_access_flags() instead to modify the pfn which is built to
deal with modifying existing PMD entries.

This is similar to commit cae85cb8ad ("mm/memory.c: fix modifying of
page protection by insert_pfn()")

We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't
support pud pfn entries now.

Without this patch we also see the below message in kernel log "BUG:
non-zero pgtables_bytes on freeing mm:"

Link: http://lkml.kernel.org/r/20190402115125.18803-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Mike Kravetz 58b6e5e8f1 hugetlbfs: fix memory leak for resv_map
When mknod is used to create a block special file in hugetlbfs, it will
allocate an inode and kmalloc a 'struct resv_map' via resv_map_alloc().
inode->i_mapping->private_data will point the newly allocated resv_map.
However, when the device special file is opened bd_acquire() will set
inode->i_mapping to bd_inode->i_mapping.  Thus the pointer to the
allocated resv_map is lost and the structure is leaked.

Programs to reproduce:
        mount -t hugetlbfs nodev hugetlbfs
        mknod hugetlbfs/dev b 0 0
        exec 30<> hugetlbfs/dev
        umount hugetlbfs/

resv_map structures are only needed for inodes which can have associated
page allocations.  To fix the leak, only allocate resv_map for those
inodes which could possibly be associated with page allocations.

Link: http://lkml.kernel.org/r/20190401213101.16476-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Yufen Yu <yuyufen@huawei.com>
Suggested-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Jann Horn fcae96ff96 mm: fix vm_fault_t cast in VM_FAULT_GET_HINDEX()
Symmetrically to VM_FAULT_SET_HINDEX(), we need a force-cast in
VM_FAULT_GET_HINDEX() to tell sparse that this is intentional.

Sparse complains about the current code when building a kernel with
CONFIG_MEMORY_FAILURE:

  arch/x86/mm/fault.c:1058:53: warning: restricted vm_fault_t degrades to integer

Link: http://lkml.kernel.org/r/20190327204117.35215-1-jannh@google.com
Fixes: 3d3539018d ("mm: create the new vm_fault_t type")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:31 -10:00
Dave Rodgman b11ed18efa lib/lzo: fix bugs for very short or empty input
For very short input data (0 - 1 bytes), lzo-rle was not behaving
correctly.  Fix this behaviour and update documentation accordingly.

For zero-length input, lzo v0 outputs an end-of-stream marker only,
which was misinterpreted by lzo-rle as a bitstream version number.
Ensure bitstream versions > 0 require a minimum stream length of 5.

Also fixes a bug in handling the tail for very short inputs when a
bitstream version is present.

Link: http://lkml.kernel.org/r/20190326165857.34613-1-dave.rodgman@arm.com
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:30 -10:00
Arnd Bergmann 6147e136ff include/linux/bitrev.h: fix constant bitrev
clang points out with hundreds of warnings that the bitrev macros have a
problem with constant input:

  drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization
        [-Werror,-Wuninitialized]
          u8 crc = bitrev8(data->val_status & 0x0F);
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8'
          __constant_bitrev8(__x) :                       \
          ~~~~~~~~~~~~~~~~~~~^~~~
  include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8'
          u8 __x = x;                     \
             ~~~   ^

Both the bitrev and the __constant_bitrev macros use an internal
variable named __x, which goes horribly wrong when passing one to the
other.

The obvious fix is to rename one of the variables, so this adds an extra
'_'.

It seems we got away with this because

 - there are only a few drivers using bitrev macros

 - usually there are no constant arguments to those

 - when they are constant, they tend to be either 0 or (unsigned)-1
   (drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and
   give the correct result by pure chance.

In fact, the only driver that I could find that gets different results
with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver
for fairly rare hardware (adding the maintainer to Cc for testing).

Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de
Fixes: 556d2f055b ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Cc: Yalin Wang <yalin.wang@sonymobile.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:30 -10:00
Catalin Marinas 298a32b132 kmemleak: powerpc: skip scanning holes in the .bss section
Commit 2d4f567103 ("KVM: PPC: Introduce kvm_tmp framework") adds
kvm_tmp[] into the .bss section and then free the rest of unused spaces
back to the page allocator.

kernel_init
  kvm_guest_init
    kvm_free_tmp
      free_reserved_area
        free_unref_page
          free_unref_page_prepare

With DEBUG_PAGEALLOC=y, it will unmap those pages from kernel.  As the
result, kmemleak scan will trigger a panic when it scans the .bss
section with unmapped pages.

This patch creates dedicated kmemleak objects for the .data, .bss and
potentially .data..ro_after_init sections to allow partial freeing via
the kmemleak_free_part() in the powerpc kvm_free_tmp() function.

Link: http://lkml.kernel.org/r/20190321171917.62049-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Qian Cai <cai@lca.pw>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Qian Cai <cai@lca.pw>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:30 -10:00
Nick Desaulniers 5f074f3e19 lib/string.c: implement a basic bcmp
A recent optimization in Clang (r355672) lowers comparisons of the
return value of memcmp against zero to comparisons of the return value
of bcmp against zero.  This helps some platforms that implement bcmp
more efficiently than memcmp.  glibc simply aliases bcmp to memcmp, but
an optimized implementation is in the works.

This results in linkage failures for all targets with Clang due to the
undefined symbol.  For now, just implement bcmp as a tailcail to memcmp
to unbreak the build.  This routine can be further optimized in the
future.

Other ideas discussed:

 * A weak alias was discussed, but breaks for architectures that define
   their own implementations of memcmp since aliases to declarations are
   not permitted (only definitions). Arch-specific memcmp
   implementations typically declare memcmp in C headers, but implement
   them in assembly.

 * -ffreestanding also is used sporadically throughout the kernel.

 * -fno-builtin-bcmp doesn't work when doing LTO.

Link: https://bugs.llvm.org/show_bug.cgi?id=41035
Link: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#bcmp
Link: 8e16d73346
Link: https://github.com/ClangBuiltLinux/linux/issues/416
Link: http://lkml.kernel.org/r/20190313211335.165605-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: James Y Knight <jyknight@google.com>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 16:02:30 -10:00
Linus Torvalds 4f1cbe0785 - Two queue_limits stacking fixes: disable discards if underlying driver
does.  And propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum
   errors.
 
 - Fix that reverts a DM core limit that wasn't needed given that
   dm-crypt was already updated to impose an equivalent limit.
 
 - Fix dm-init to properly establish 'const' for __initconst array.
 
 - Fix deadlock in DM integrity target that occurs when overlapping IO is
   being issued to it.  And two smaller fixes to the DM integrity target.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAlyn2/UTHHNuaXR6ZXJA
 cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWtNqB/9CbHYDCdeZ3cgHrp9GkRH9q3TKQPgD
 42evJzrDlJgpPNVPAahI92lFzPXPnQ4zgqDcZQSQl1wl3c0yXWQxIe+pUHqYQV1L
 tBxvN2bmFS9Zn9t/XRq3O6+hy7hY66rZVKIhTgg2nuWFxLUvCrZw2Ppjiij7iTvu
 icO81ONFd7kDUMUfYnuIowOKwyNye4dw2fzsFz6cxDiM2sGCaLLr6x5TBCey+zD0
 xYli1cUjakomgu3KHF/WCvRIoc5KlPGvZB2Gvxm+UpuzDkV5esTY+CKRcFpF9pXd
 Wn0G+rAthSIOFXID2TI6gfX1T7UQmeEIvzy2EYxsucK3YRoQARFO2xDE
 =At3j
 -----END PGP SIGNATURE-----

Merge tag 'for-5.1/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Two queue_limits stacking fixes: disable discards if underlying
   driver does. And propagate BDI_CAP_STABLE_WRITES to fix sporadic
   checksum errors.

 - Fix that reverts a DM core limit that wasn't needed given that
   dm-crypt was already updated to impose an equivalent limit.

 - Fix dm-init to properly establish 'const' for __initconst array.

 - Fix deadlock in DM integrity target that occurs when overlapping IO
   is being issued to it. And two smaller fixes to the DM integrity
   target.

* tag 'for-5.1/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm integrity: fix deadlock with overlapping I/O
  dm: disable DISCARD if the underlying storage no longer supports it
  dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors
  dm: revert 8f50e35815 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
  dm init: fix const confusion for dm_allowed_targets array
  dm integrity: make dm_integrity_init and dm_integrity_exit static
  dm integrity: change memcmp to strncmp in dm_integrity_ctr
2019-04-05 15:34:33 -10:00
Linus Torvalds 3e28fb0fcb VFIO fixes for v5.1-rc4
- Fix clang printk format errors (Louis Taylor)
 
  - Declare structure static to fix sparse warning (Wang Hai)
 
  - Limit user DMA mappings per container (CVE-2019-3882) (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJcp2hGAAoJECObm247sIsiccMP/RHCnR2s30uvhiRnWXCtjYuB
 KsaCfbS6JMI+vmeIV995deVlxVCWKblKIfwlW3fQ+vzljwRBe17MFhAj93aG9qmo
 BG3TVsSntz4FvlvzQbJjkF3LLENrTfw6bMskp+0gHVA3pX6g92NvEZGpBlXZo0+I
 41LkEBMYLZV3PaXRUa/QOwQqp6ootTwQUX3a75I1zzkQqpeypu5/MkRV+0/HRxYc
 q+jhGIlUVc7sFwH5Ggn8uTM0tuAOIpZfMEJcI64DX2BE6eBppR6lChZGIgFNF3DB
 YVRg5inyZmdi87QgkRXWqQ/6PDDqfNPMlKWZntAWCGbEzYsIsGLYjxPoOgSDsO7l
 Ud1Cgn8KF65n48vBiW68SPUMijVOYxaKE0dUhxhhcMdlz8lOpMeZ5B748Izswi08
 roKAZkMGyAbvbuAuUIQkAbSMRNlcAi7JMH3mE3XnyeEVzZtyMjtBX1oLl//0JFi9
 qdR3E7hJTbOZ/3tjNx718INkZ7xj44s+n7trs8PepEHxP42o9Mr3zpm0ETEgzaNI
 r8k4ViiztqMQeAYYfIugcr+F7a41E4xhyLShi/sxJZ0xo5i7v4hAZ2LoDcpRz4kd
 EuWEnkzIH2+3DH2UJuW4TzLhPoW3Y9Q04OmoJ+/8sdxBVjYZGwppaf/zGW2RSWkh
 VmMxOKc4TZsxLkSH2L+I
 =j6NB
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.1-rc4' of git://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - Fix clang printk format errors (Louis Taylor)

 - Declare structure static to fix sparse warning (Wang Hai)

 - Limit user DMA mappings per container (CVE-2019-3882) (Alex
   Williamson)

* tag 'vfio-v5.1-rc4' of git://github.com/awilliam/linux-vfio:
  vfio/type1: Limit DMA mappings per container
  vfio/spapr_tce: Make symbol 'tce_iommu_driver_ops' static
  vfio/pci: use correct format characters
2019-04-05 15:07:28 -10:00
Linus Torvalds bc5725f974 x86 fixes for overflows and other nastiness.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEbBAABAgAGBQJcp6fbAAoJEL/70l94x66DpFIH+M/M5gWfGGJ2zbZGhSJpQAjd
 5MvoZbJGCKqN/NVGvcjWES09z7iA79t34jMEYSBpAM1VQ5XJumpC6ITpSLg26lrD
 6MCftkAQTkN+xkJMHtKENsosqrqXb+CszTC/46CkNB65xbQQuaG5HRzIbQp7VccE
 3xt4FHibPC4QzjXrlOyDnzUF3TUYheVOxN3u7L5xBrmXq3tpW5U+RuCF5O9Xs0OR
 slf1RM4L9GBIOciaE6Q0633wDml3L/+kDg00xZ8COxIOZ7ilrgFNbtc2O6GKk1cu
 PFZ3D0+eoSmRoJWPEOUb4O0uPyzujo08iR7GU6Rhl4oBNA+8UqK4SYIIJtUWig==
 =rPQV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "x86 fixes for overflows and other nastiness"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: nVMX: fix x2APIC VTPR read intercept
  KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)
  KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow
  kvm: svm: fix potential get_num_contig_pages overflow
2019-04-05 13:43:07 -10:00
Linus Torvalds 2f9e10acfa Fix unwind_frame() in the context of pseudo NMI.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlynaHsACgkQa9axLQDI
 XvEW7w/+JWLWbXBimDhPhdzANJWzwop3rpqekv5D105+pI0ZxzEYt/I5B22TJmbj
 tYni/guF2U/I7e0bIMFWs5Tuu+NoQG/fIHdladg0ISb2TEhHDwkggsC8XAIyXT0h
 WZGcZnw0AG/ndo9TwWW1vzUqqmYzBOixbib3SWvlvBIhxeW1Vgl3Wn7HGSw0ijwE
 7qh+MdP9N3ylGJIcV1G/T5Fw9Oz4qHWNXcsenVxFxCRikbuCTkBtqUVlUuHbxiez
 qxaiEqKicGyF0FFvT5ZIIddWHQzNjqjToGRM77kVBy8AYPwRLTKy0/vMltUXifED
 w56XHnGmzD66fVnYTxXmJQvT5HaAYhML0hZ1KjrBzpShMjCxbLSbUuI5En6tlfT9
 cJXSXBPQcSBm9Fy/YtUHEM9bZ3kKHZu8gopUATDwot+Mh0S6h5B2rLjD29JVA+xd
 d3nUqXJTl/hq7MXIBwC7FS0ALsA85dvlYtiNM6fhoqofXzRmai0PNuhpGo7K22MH
 zU+jJG2J/K60vDN/C67DxBBuklWfr+Cdythqr1KVgjs6KJ5WB4i3I/zYpJlTVgp0
 uC+5so+46ydcn7wRU8afpJZs0ffelCLDUBaQzNtmVNCvL3d6FE+6856KYpHCS/cc
 JdV08CCVIqjBZIV0xd9zVFVmtWroCOCcxek8dJSJezKgq3JhFX8=
 =A++s
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix unwind_frame() in the context of pseudo NMI"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: fix wrong check of on_sdei_stack in nmi context
2019-04-05 13:36:45 -10:00
Linus Torvalds 970b766cfd Andy Lutomirski approached me to tell me that the syscall_get_arguments()
implementation in x86 was horrible and gcc  certainly gets it wrong. He
 said that since the tracepoints only pass in 0 and 6 for i and n repectively,
 it should be optimized for that case. Inspecting the kernel, I discovered
 that all users pass in 0 for i and only one file passing in something other
 than 6 for the number of arguments. That code happens to be my own code used
 for the special syscall tracing. That can easily be converted to just
 using 0 and 6 as well, and only copying what is needed. Which is probably
 the faster path anyway for that case.
 
 Along the way, a couple of real fixes came from this as the
 syscall_get_arguments() function was incorrect for csky and riscv.
 
 x86 has been optimized to for the new interface that removes the variable
 number of arguments, but the other architectures could still use some
 loving and take more advantage of the simpler interface.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXKdi7RQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qjtiAQDaZbFaSgEbs99jjuAPDSZ0li8dyUOC
 3KS5TyuLw+fEaAD/QZnKjplVFAfA5FxrABZ0ioIKDON4nLyESEb+xCv0gA4=
 =dTuo
 -----END PGP SIGNATURE-----

Merge tag 'trace-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull syscall-get-arguments cleanup and fixes from Steven Rostedt:
 "Andy Lutomirski approached me to tell me that the
  syscall_get_arguments() implementation in x86 was horrible and gcc
  certainly gets it wrong.

  He said that since the tracepoints only pass in 0 and 6 for i and n
  repectively, it should be optimized for that case. Inspecting the
  kernel, I discovered that all users pass in 0 for i and only one file
  passing in something other than 6 for the number of arguments. That
  code happens to be my own code used for the special syscall tracing.

  That can easily be converted to just using 0 and 6 as well, and only
  copying what is needed. Which is probably the faster path anyway for
  that case.

  Along the way, a couple of real fixes came from this as the
  syscall_get_arguments() function was incorrect for csky and riscv.

  x86 has been optimized to for the new interface that removes the
  variable number of arguments, but the other architectures could still
  use some loving and take more advantage of the simpler interface"

* tag 'trace-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  syscalls: Remove start and number from syscall_set_arguments() args
  syscalls: Remove start and number from syscall_get_arguments() args
  csky: Fix syscall_get_arguments() and syscall_set_arguments()
  riscv: Fix syscall_get_arguments() and syscall_set_arguments()
  tracing/syscalls: Pass in hardcoded 6 into syscall_get_arguments()
  ptrace: Remove maxargs from task_current_syscall()
2019-04-05 13:15:57 -10:00
Mikulas Patocka 4ed319c6ac dm integrity: fix deadlock with overlapping I/O
dm-integrity will deadlock if overlapping I/O is issued to it, the bug
was introduced by commit 724376a04d ("dm integrity: implement fair
range locks").  Users rarely use overlapping I/O so this bug went
undetected until now.

Fix this bug by correcting, likely cut-n-paste, typos in
ranges_overlap() and also remove a flawed ranges_overlap() check in
remove_range_unlocked().  This condition could leave unprocessed bios
hanging on wait_list forever.

Cc: stable@vger.kernel.org # v4.19+
Fixes: 724376a04d ("dm integrity: implement fair range locks")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-05 18:49:08 -04:00
Andre Przywara 9cde402a59 PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
There is a Marvell 88SE9170 PCIe SATA controller I found on a board here.
Some quick testing with the ARM SMMU enabled reveals that it suffers from
the same requester ID mixup problems as the other Marvell chips listed
already.

Add the PCI vendor/device ID to the list of chips which need the
workaround.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org
2019-04-05 16:11:16 -05:00
Marc Orr c73f4c998e KVM: x86: nVMX: fix x2APIC VTPR read intercept
Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the
SDM, when "virtualize x2APIC mode" is 1 and "APIC-register
virtualization" is 0, a RDMSR of 808H should return the VTPR from the
virtual APIC page.

However, for nested, KVM currently fails to disable the read intercept
for this MSR. This means that a RDMSR exit takes precedence over
"virtualize x2APIC mode", and KVM passes through L1's TPR to L2,
instead of sourcing the value from L2's virtual APIC page.

This patch fixes the issue by disabling the read intercept, in VMCS02,
for the VTPR when "APIC-register virtualization" is 0.

The issue described above and fix prescribed here, were verified with
a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC
mode w/ nested".

Signed-off-by: Marc Orr <marcorr@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Fixes: c992384bde ("KVM: vmx: speed up MSR bitmap merge")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05 21:08:30 +02:00
Marc Orr acff78477b KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)
The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the
x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a
result, we discovered the potential for a buggy or malicious L1 to get
access to L0's x2APIC MSRs, via an L2, as follows.

1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl
variable, in nested_vmx_prepare_msr_bitmap() to become true.
2. L1 disables "virtualize x2APIC mode" in VMCS12.
3. L1 enables "APIC-register virtualization" in VMCS12.

Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then
set "virtualize x2APIC mode" to 0 in VMCS02. Oops.

This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR
intercepts with VMCS12's "virtualize x2APIC mode" control.

The scenario outlined above and fix prescribed here, were verified with
a related patch in kvm-unit-tests titled "Add leak scenario to
virt_x2apic_mode_test".

Note, it looks like this issue may have been introduced inadvertently
during a merge---see 15303ba5d1.

Signed-off-by: Marc Orr <marcorr@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05 21:08:22 +02:00
David Rientjes b86bc2858b KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow
This ensures that the address and length provided to DBG_DECRYPT and
DBG_ENCRYPT do not cause an overflow.

At the same time, pass the actual number of pages pinned in memory to
sev_unpin_memory() as a cleanup.

Reported-by: Cfir Cohen <cfir@google.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05 20:49:42 +02:00
David Rientjes ede885ecb2 kvm: svm: fix potential get_num_contig_pages overflow
get_num_contig_pages() could potentially overflow int so make its type
consistent with its usage.

Reported-by: Cfir Cohen <cfir@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05 20:48:59 +02:00
Linus Torvalds 7f46774c64 MM/compaction patches for 5.1-rc4
The merge window for 5.1 introduced a number of compaction-related patches.
 with intermittent reports of corruption and functional issues. The bugs
 are due to sloopy checking of zone boundaries and a corner case where
 invalid indexes are used to access the free lists.
 
 Reports are not common but at least two users and 0-day have tripped
 over them.  There is a chance that one of the syzbot reports are related
 but it has not been confirmed properly.
 
 The normal submission path is with Andrew but there have been some delays
 and I consider them urgent enough that they should be picked up before
 RC4 to avoid duplicate reports.
 
 All of these have been successfully tested on older RC windows. This
 will make this branch look like a rebase but in fact, they've simply
 been lifted again from Andrew's tree and placed on a fresh branch. I've
 no reason to believe that this has invalidated the testing given the
 lack of change in compaction and the nature of the fixes.
 
 Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
 -----BEGIN PGP SIGNATURE-----
 
 iQJQBAABCAA6FiEECg+MacO/Fijve9AMfMb8M0SyR+IFAlynB+gcHG1nb3JtYW5A
 dGVjaHNpbmd1bGFyaXR5Lm5ldAAKCRB8xvwzRLJH4iW4EACw2vxxFTBUr1R3Yr8a
 YsRLT9nJvpebX98hyLG3Mhcv2c2haW+LC/H4T2UP1vW6zNH7Co738QcraH1xVPfr
 J3xJWSihXSAwLPHJQpJWd3pdWmmboIVUnYR/Cn/TiE2vukyFS61aRo/FEctJx2Hm
 cdkRh/j4HPtO7U4g0Lg6t1GYa40wy+1ahP+PTHpdIjadxnu+AGKrXgi+oysJJ29F
 QBBH2/5Lv8feaOyBRlWycBYHq7to/EsN+Xr0shsMT+AVcOvPZ2hi9GCdTRiDvHiJ
 c0cXftyxm8BYqKcPDeLOdQGtnDkDRCFrZI7hzHEvbUKdodVD5BZIet8HeblnUlMt
 TtgWenwo02MY3t9zSDSxBtzj8EimNQFIGJz7kZseS56leskvhXUXuDyUhT3O5I8R
 DBSX8DRJMhri2yGImC57RRtLGO/zbzHMrpB8ogpsqlo4UbMwNNOuL4yLAJ9XCcmV
 jjuakK08dr6v02NeOc/B2z5mkm4gQe0jvJyxMaWJHWORJEwrRu9ie8Af3yKBlP3Y
 dpX0R3XLtlOJdsFksDGm+24ftogN3l+0r/qVi7E8uwCvUeS9nvwY9ffNFiLHwMAg
 BewwduX2MWT3hAVzNprFmUTMaKGUjdpgwdxUt68qcmWEzlF2KcHbTpRv4+lxzx/A
 msO3x+upkskoFOBCRdvxq+AaKg==
 =WJCV
 -----END PGP SIGNATURE-----

Merge tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux

Pull mm/compaction fixes from Mel Gorman:
 "The merge window for 5.1 introduced a number of compaction-related
  patches. with intermittent reports of corruption and functional
  issues. The bugs are due to sloopy checking of zone boundaries and a
  corner case where invalid indexes are used to access the free lists.

  Reports are not common but at least two users and 0-day have tripped
  over them. There is a chance that one of the syzbot reports are
  related but it has not been confirmed properly.

  The normal submission path is with Andrew but there have been some
  delays and I consider them urgent enough that they should be picked up
  before RC4 to avoid duplicate reports.

  All of these have been successfully tested on older RC windows. This
  will make this branch look like a rebase but in fact, they've simply
  been lifted again from Andrew's tree and placed on a fresh branch.
  I've no reason to believe that this has invalidated the testing given
  the lack of change in compaction and the nature of the fixes"

* tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux:
  mm/compaction.c: abort search if isolation fails
  mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints
2019-04-05 06:09:53 -10:00
Greg Kroah-Hartman c7084edc3f tty: mark Siemens R3964 line discipline as BROKEN
The n_r3964 line discipline driver was written in a different time, when
SMP machines were rare, and users were trusted to do the right thing.
Since then, the world has moved on but not this code, it has stayed
rooted in the past with its lovely hand-crafted list structures and
loads of "interesting" race conditions all over the place.

After attempting to clean up most of the issues, I just gave up and am
now marking the driver as BROKEN so that hopefully someone who has this
hardware will show up out of the woodwork (I know you are out there!)
and will help with debugging a raft of changes that I had laying around
for the code, but was too afraid to commit as odds are they would break
things.

Many thanks to Jann and Linus for pointing out the initial problems in
this codebase, as well as many reviews of my attempts to fix the issues.
It was a case of whack-a-mole, and as you can see, the mole won.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-05 05:56:44 -10:00
Stephen Boyd 325aa19598 genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip
has the IRQCHIP_SKIP_SET_WAKE flag set an error is returned.

This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when
the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to
walk the chain of parents and set irq wake on any chips that don't have the
flag set either. If the intent is to call the .irq_set_wake() callback of
the parent irqchip, then we expect irqchip implementations to omit the
IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that
calls irq_chip_set_wake_parent().

The problem has been observed on a Qualcomm sdm845 device where set wake
fails on any GPIO interrupts after applying work in progress wakeup irq
patches to the GPIO driver. The chain of chips looks like this:

     QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)

The GPIO controllers parent is the QCOM PDC irqchip which in turn has ARM
GIC as parent.  The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag
set, and so does the grandparent ARM GIC.

The GPIO driver doesn't know if the parent needs to set wake or not, so it
unconditionally calls irq_chip_set_wake_parent() causing this function to
return a failure because the parent irqchip (PDC) doesn't have the
.irq_set_wake() callback set. Returning 0 instead makes everything work and
irqs from the GPIO controller can be configured for wakeup.

Make it consistent by returning 0 (success) from irq_chip_set_wake_parent()
when a parent chip has IRQCHIP_SKIP_SET_WAKE set.

[ tglx: Massaged changelog ]

Fixes: 08b55e2a92 ("genirq: Add irqchip_set_wake_parent")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-gpio@vger.kernel.org
Cc: Lina Iyer <ilina@codeaurora.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org
2019-04-05 17:41:41 +02:00