1
0
Fork 0
Commit Graph

797284 Commits (e2b3fa5af70c1e646270f6c7c799414f5e904d7a)

Author SHA1 Message Date
Damien Le Moal e2b3fa5af7 block: Remove bio->bi_ioc
bio->bi_ioc is never set so always NULL. Remove references to it in
bio_disassociate_task() and in rq_ioc() and delete this field from
struct bio. With this change, rq_ioc() always returns
current->io_context without the need for a bio argument. Further
simplify the code and make it more readable by also removing this
helper, which also allows to simplify blk_mq_sched_assign_ioc() by
removing its bio argument.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-19 19:03:44 -07:00
Damien Le Moal 23464f8c34 aio: Comment use of IOCB_FLAG_IOPRIO aio flag
Comment the use of the IOCB_FLAG_IOPRIO aio flag similarly to the
IOCB_FLAG_RESFD flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-19 19:03:43 -07:00
Jens Axboe 92f806d678 nvme-fc: remove ->poll implementation
It's specifically looking for a given request, which we will not be
supporting going forward. Also kill the qla2xxx poll implementation
as that's the only user of the nvme-fc poll, and the now unused
->poll_queue() hook.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by:  James Smart <jsmart2021@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-19 12:06:32 -07:00
Jens Axboe 85f4d4b65f block: have ->poll_fn() return number of entries polled
We currently only really support sync poll, ie poll with 1 IO in flight.
This prepares us for supporting async poll.

Note that the returned value isn't necessarily 100% accurate. If poll
races with IRQ completion, we assume that the fact that the task is now
runnable means we found at least one entry. In reality it could be more
than 1, or not even 1. This is fine, the caller will just need to take
this into account.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-19 08:34:50 -07:00
Jens Axboe 849a370016 block: avoid ordered task state change for polled IO
For the core poll helper, the task state setting don't need to imply any
atomics, as it's the current task itself that is being modified and
we're not going to sleep.

For IRQ driven, the wakeup path have the necessary barriers to not need
us using the heavy handed version of the task state setting.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-19 08:34:49 -07:00
Jens Axboe a4668d9ba4 nvme: default to 0 poll queues
We need a better way of configuring this, and given that polling is
(still) a bit niche, let's default to using 0 poll queues. That way
we'll have the same read/write/poll behavior as 4.20, and users that
want to test/use polling are required to do manual configuration of the
number of poll queues.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-19 08:18:24 -07:00
Jens Axboe a78b03bc73 Linux 4.20-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlvx2sAeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGycgIAIuxobwt0RRKa0zO
 ROS+34JGoC2yU2P9VdEGWdtxS6ANMVQgKPBhWL6s+xR89Kd+V4xSdJLD1pNTxxqP
 0DCva0np1/Q4juH+JbU50v/lykoLgteZ0P0LBRGf1y8p3WiLPv45IbnNsMDNYhB2
 7a8rOmZYakRY9CPznRDw3X8cJt3sddKgFJHIOGz1OQJVWtCD0KPGcJmQNsbDSagY
 Zx6Z5BKSIdjRqaAdN5gDa1Pft3WQo7TpaQGl80lSsgr5LcjmscXA3sClOCy+25Mo
 FZLx0PcwP+Efq8RTGzNK51WSOMa6d37hvjDqUAdQBOR0KbyjRyXQwyQVw/MGbPJs
 7J3Pzm0=
 =56Mt
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc3' into for-4.21/block

Merge in -rc3 to resolve a few conflicts, but also to get a few
important fixes that have gone into mainline since the block
4.21 branch was forked off (most notably the SCSI queue issue,
which is both a conflict AND needed fix).

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-18 15:46:03 -07:00
Jens Axboe fce15a609f floppy: remove now unused 'flags' variable
With the locking removed, it's unused. Kill it.

Fixes: 503f620f0c ("floppy: remove queue_lock around floppy_end_request")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-18 15:42:48 -07:00
Linus Torvalds 9ff01193a2 Linux 4.20-rc3 2018-11-18 13:33:44 -08:00
Linus Torvalds 25e19c1fe4 libnvdimm 4.20-rc3
- Address Range Scrub overflow continuation handling has been broken
   since it was initially merged. It was only recently that error injection
   and platform-BIOS support enabled this corner case to be exercised.
 
 - The recent attempt to provide more isolation for the kernel Address
   Range Scrub state machine from userapace initiated sessions triggers a
   lockdep report. Revert and try again at the next merge window.
 
 - Fix a kasan reported buffer overflow in libnvdimm unit test
   infrastrucutre (nfit_test)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb8MdUAAoJEB7SkWpmfYgCifYP/A+OQ19HybqcY2nfvqXUdQum
 Q5x3qcKmGmEbKbnCUMOHZJpEjW4c/Cpm6OKhuFDQJ4tijn1XG3/ATSi7PXrZxs/o
 CRK8MIg5Wz/mMvYRvypIkCHHHr9+Y1NjqmQynM4LLzNG24GMXaeHHuZUTnrZCDmu
 0+jBTylNgVYdykoIxgHDYDB+cd6w4NtAP5OD9D46pdsmzX9ac+OQyZMyNB3glUhd
 /ZFAoywVNfvfJVWEci9RoHiKttWxgVoCuNbSlCs2Y6ymepA44ApR9AgLHtaC9pFO
 DrPkfCzPSmf4PVSxLJd79+/sw9YOcBD7LZ5IxzozxRMuRn5pIofdZIsBg9PlwT5B
 NL9jQK87XPiG0vNxhJu3wzP+FlyCXxGxkWfApp7w4rlWBV7RgugOZHyH051rdKzQ
 44JAPzLLCfA5Mj4o2tIbSx42f2JNX93XDEX8fkUB+qs3GzyOcMtlcmz9UjmnrT0R
 o9KHKhDn81Vivxh33Ts2G0iHktO83XSUBDWApSd6erjEUXMsCLY0D8y+nDGTOMUh
 kVcY8q93sgZGLVbcxt0eGc8Q7osZYawQGRGucflTETFcxNwMyLL4F9lWgPirGeYF
 i1JDWeTrhcImYufNj8o78LsbT5xh6YjbZZ8Q1obIgPXpDtxHNIXO6COId49Zp2cK
 obftWyVp+7kYe79NWzmD
 =sfNx
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
 "A small batch of fixes for v4.20-rc3.

  The overflow continuation fix addresses something that has been broken
  for several releases. Arguably it could wait even longer, but it's a
  one line fix and this finishes the last of the known address range
  scrub bug reports. The revert addresses a lockdep regression. The unit
  tests are not critical to fix, but no reason to hold this fix back.

  Summary:

   - Address Range Scrub overflow continuation handling has been broken
     since it was initially merged. It was only recently that error
     injection and platform-BIOS support enabled this corner case to be
     exercised.

   - The recent attempt to provide more isolation for the kernel Address
     Range Scrub state machine from userapace initiated sessions
     triggers a lockdep report. Revert and try again at the next merge
     window.

   - Fix a kasan reported buffer overflow in libnvdimm unit test
     infrastrucutre (nfit_test)"

* tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  Revert "acpi, nfit: Further restrict userspace ARS start requests"
  acpi, nfit: Fix ARS overflow continuation
  tools/testing/nvdimm: Fix the array size for dimm devices.
2018-11-18 12:21:09 -08:00
Linus Torvalds c67a98c00e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "16 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
  mm, page_alloc: check for max order in hot path
  scripts/spdxcheck.py: make python3 compliant
  tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
  lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
  mm/vmstat.c: fix NUMA statistics updates
  mm/gup.c: fix follow_page_mask() kerneldoc comment
  ocfs2: free up write context when direct IO failed
  scripts/faddr2line: fix location of start_kernel in comment
  mm: don't reclaim inodes with many attached pages
  mm, memory_hotplug: check zone_movable in has_unmovable_pages
  mm/swapfile.c: use kvzalloc for swap_info_struct allocation
  MAINTAINERS: update OMAP MMC entry
  hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
  kernel/sched/psi.c: simplify cgroup_move_task()
  z3fold: fix possible reclaim races
2018-11-18 11:31:26 -08:00
Linus Torvalds 03582f338e Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Fix an exec() related scalability/performance regression, which was
  caused by incorrectly calculating load and migrating tasks on exec()
  when they shouldn't be"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix cpu_util_wake() for 'execl' type workloads
2018-11-18 10:58:20 -08:00
Linus Torvalds b53e27f618 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Fix uncore PMU enumeration for CofeeLake CPUs"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Support CoffeeLake 8th CBOX
  perf/x86/intel/uncore: Add more IMC PCI IDs for KabyLake and CoffeeLake CPUs
2018-11-18 10:54:59 -08:00
Linus Torvalds 743a4863fd Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "Misc fixes: two warning splat fixes, a leak fix and persistent memory
  allocation fixes for ARM"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Permit calling efi_mem_reserve_persistent() from atomic context
  efi/arm: Defer persistent reservations until after paging_init()
  efi/arm/libstub: Pack FDT after populating it
  efi/arm: Revert deferred unmap of early memmap mapping
  efi: Fix debugobjects warning on 'efi_rts_work'
2018-11-18 10:52:26 -08:00
Linus Torvalds cfaa9f029f Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM spectre updates from Russell King:
 "These are the currently known final bits that resolve the Spectre
  issues. big.Little systems used to be sufficiently identical in that
  there were no differences between individual CPUs in the system that
  mattered to the kernel. With the advent of the Spectre problem, the
  CPUs now have differences in how the workaround is applied.

  As a result of previous Spectre patches, these systems ended up
  reporting quite a lot of:

     "CPUx: Spectre v2: incorrect context switching function, system vulnerable"

  messages due to the action of the big.Little switcher causing the CPUs
  to be re-initialised regularly. This series resolves that issue by
  making the CPU vtable unique to each CPU.

  However, since this is used very early, before per-cpu is setup,
  per-cpu can't be used. We also have a problem that two of the methods
  are not called from preempt-safe paths, but thankfully these remain
  identical between all CPUs in the system. To make sure, we validate
  that these are identical during boot"

* 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: spectre-v2: per-CPU vtables to work around big.Little systems
  ARM: add PROC_VTABLE and PROC_TABLE macros
  ARM: clean up per-processor check_bugs method call
  ARM: split out processor lookup
  ARM: make lookup_processor_type() non-__init
2018-11-18 10:45:09 -08:00
Chen Chang 45e79815b8 mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
Link: http://lkml.kernel.org/r/20181107100247.13359-1-rainccrun@gmail.com
Signed-off-by: Chen Chang <rainccrun@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:10 -08:00
Michal Hocko c63ae43ba5 mm, page_alloc: check for max order in hot path
Konstantin has noticed that kvmalloc might trigger the following
warning:

  WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
  [...]
  Call Trace:
   fragmentation_index+0x76/0x90
   compaction_suitable+0x4f/0xf0
   shrink_node+0x295/0x310
   node_reclaim+0x205/0x250
   get_page_from_freelist+0x649/0xad0
   __alloc_pages_nodemask+0x12a/0x2a0
   kmalloc_large_node+0x47/0x90
   __kmalloc_node+0x22b/0x2e0
   kvmalloc_node+0x3e/0x70
   xt_alloc_table_info+0x3a/0x80 [x_tables]
   do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
   nf_setsockopt+0x44/0x60
   SyS_setsockopt+0x6f/0xc0
   do_syscall_64+0x67/0x120
   entry_SYSCALL_64_after_hwframe+0x3d/0xa2

the problem is that we only check for an out of bound order in the slow
path and the node reclaim might happen from the fast path already.  This
is fixable by making sure that kvmalloc doesn't ever use kmalloc for
requests that are larger than KMALLOC_MAX_SIZE but this also shows that
the code is rather fragile.  A recent UBSAN report just underlines that
by the following report

  UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
  shift exponent 51 is too large for 32-bit type 'int'
  CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0xd2/0x148 lib/dump_stack.c:113
   ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
   __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
   __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
   zone_watermark_fast mm/page_alloc.c:3216 [inline]
   get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
   __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
   alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
   alloc_pages include/linux/gfp.h:509 [inline]
   __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
   dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
   raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
   raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
   fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
   fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
   __blkdev_driver_ioctl block/ioctl.c:303 [inline]
   blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
   block_ioctl+0x105/0x150 fs/block_dev.c:1883
   vfs_ioctl fs/ioctl.c:46 [inline]
   do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
   ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
   __do_sys_ioctl fs/ioctl.c:709 [inline]
   __se_sys_ioctl fs/ioctl.c:707 [inline]
   __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
   do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Note that this is not a kvmalloc path.  It is just that the fast path
really depends on having sanitzed order as well.  Therefore move the
order check to the fast path.

Link: http://lkml.kernel.org/r/20181113094305.GM15120@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Byoungyoung Lee <lifeasageek@gmail.com>
Cc: "Dae R. Jeong" <threeearcat@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:10 -08:00
Uwe Kleine-König 6f4d29df66 scripts/spdxcheck.py: make python3 compliant
Without this change the following happens when using Python3 (3.6.6):

	$ echo "GPL-2.0" | python3 scripts/spdxcheck.py -
	FAIL: 'str' object has no attribute 'decode'
	Traceback (most recent call last):
	  File "scripts/spdxcheck.py", line 253, in <module>
	    parser.parse_lines(sys.stdin, args.maxlines, '-')
	  File "scripts/spdxcheck.py", line 171, in parse_lines
	    line = line.decode(locale.getpreferredencoding(False), errors='ignore')
	AttributeError: 'str' object has no attribute 'decode'

So as the line is already a string, there is no need to decode it and
the line can be dropped.

/usr/bin/python on Arch is Python 3.  So this would indeed be worth
going into 4.19.

Link: http://lkml.kernel.org/r/20181023070802.22558-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Perches <joe@perches.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:10 -08:00
Yufen Yu 1a41364693 tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
Other filesystems such as ext4, f2fs and ubifs all return ENXIO when
lseek (SEEK_DATA or SEEK_HOLE) requests a negative offset.

man 2 lseek says

:      EINVAL whence  is  not  valid.   Or: the resulting file offset would be
:             negative, or beyond the end of a seekable device.
:
:      ENXIO  whence is SEEK_DATA or SEEK_HOLE, and the file offset is  beyond
:             the end of the file.

Make tmpfs return ENXIO under these circumstances as well.  After this,
tmpfs also passes xfstests's generic/448.

[akpm@linux-foundation.org: rewrite changelog]
Link: http://lkml.kernel.org/r/1540434176-14349-1-git-send-email-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:10 -08:00
Arnd Bergmann 1c23b4108d lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
gcc-8 complains about the prototype for this function:

  lib/ubsan.c:432:1: error: ignoring attribute 'noreturn' in declaration of a built-in function '__ubsan_handle_builtin_unreachable' because it conflicts with attribute 'const' [-Werror=attributes]

This is actually a GCC's bug. In GCC internals
__ubsan_handle_builtin_unreachable() declared with both 'noreturn' and
'const' attributes instead of only 'noreturn':

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84210

Workaround this by removing the noreturn attribute.

[aryabinin: add information about GCC bug in changelog]
Link: http://lkml.kernel.org/r/20181107144516.4587-1-aryabinin@virtuozzo.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:10 -08:00
Janne Huttunen 13c9aaf7fa mm/vmstat.c: fix NUMA statistics updates
Scan through the whole array to see if an update is needed.  While we're
at it, use sizeof() to be safe against any possible type changes in the
future.

The bug here is that we wouldn't sync per-cpu counters into global ones
if there was an update of numa_stats for higher cpus.  Highly
theoretical one though because it is much more probable that zone_stats
are updated so we would refresh anyway.  So I wouldn't bother to mark
this for stable, yet something nice to fix.

[mhocko@suse.com: changelog enhancement]
Link: http://lkml.kernel.org/r/1541601517-17282-1-git-send-email-janne.huttunen@nokia.com
Fixes: 1d90ca897c ("mm: update NUMA counter threshold size")
Signed-off-by: Janne Huttunen <janne.huttunen@nokia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:10 -08:00
Mike Rapoport 78179556e7 mm/gup.c: fix follow_page_mask() kerneldoc comment
Commit df06b37ffe ("mm/gup: cache dev_pagemap while pinning pages")
modified the signature of follow_page_mask() but left the parameter
description behind.

Update the description to make the code and comments agree again.

While at it, update formatting of the return value description to match
Documentation/doc-guide/kernel-doc.rst guidelines.

Link: http://lkml.kernel.org/r/1541603316-27832-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:10 -08:00
Wengang Wang 5040f8df56 ocfs2: free up write context when direct IO failed
The write context should also be freed even when direct IO failed.
Otherwise a memory leak is introduced and entries remain in
oi->ip_unwritten_list causing the following BUG later in unlink path:

  ERROR: bug expression: !list_empty(&oi->ip_unwritten_list)
  ERROR: Clear inode of 215043, inode has unwritten extents
  ...
  Call Trace:
  ? __set_current_blocked+0x42/0x68
  ocfs2_evict_inode+0x91/0x6a0 [ocfs2]
  ? bit_waitqueue+0x40/0x33
  evict+0xdb/0x1af
  iput+0x1a2/0x1f7
  do_unlinkat+0x194/0x28f
  SyS_unlinkat+0x1b/0x2f
  do_syscall_64+0x79/0x1ae
  entry_SYSCALL_64_after_hwframe+0x151/0x0

This patch also logs, with frequency limit, direct IO failures.

Link: http://lkml.kernel.org/r/20181102170632.25921-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Randy Dunlap f5f67cc0e0 scripts/faddr2line: fix location of start_kernel in comment
Fix a source file reference location to the correct path name.

Link: http://lkml.kernel.org/r/1d50bd3d-178e-dcd8-779f-9711887440eb@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Roman Gushchin a76cf1a474 mm: don't reclaim inodes with many attached pages
Spock reported that commit 172b06c32b ("mm: slowly shrink slabs with a
relatively small number of objects") leads to a regression on his setup:
periodically the majority of the pagecache is evicted without an obvious
reason, while before the change the amount of free memory was balancing
around the watermark.

The reason behind is that the mentioned above change created some
minimal background pressure on the inode cache.  The problem is that if
an inode is considered to be reclaimed, all belonging pagecache page are
stripped, no matter how many of them are there.  So, if a huge
multi-gigabyte file is cached in the memory, and the goal is to reclaim
only few slab objects (unused inodes), we still can eventually evict all
gigabytes of the pagecache at once.

The workload described by Spock has few large non-mapped files in the
pagecache, so it's especially noticeable.

To solve the problem let's postpone the reclaim of inodes, which have
more than 1 attached page.  Let's wait until the pagecache pages will be
evicted naturally by scanning the corresponding LRU lists, and only then
reclaim the inode structure.

Link: http://lkml.kernel.org/r/20181023164302.20436-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Spock <dairinin@gmail.com>
Tested-by: Spock <dairinin@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@vger.kernel.org>	[4.19.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Michal Hocko 9d7899999c mm, memory_hotplug: check zone_movable in has_unmovable_pages
Page state checks are racy.  Under a heavy memory workload (e.g.  stress
-m 200 -t 2h) it is quite easy to hit a race window when the page is
allocated but its state is not fully populated yet.  A debugging patch to
dump the struct page state shows

  has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0
  page:ffffea0437fb0000 count:1 mapcount:1 mapping:ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1
  flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)

Note that the state has been checked for both PageLRU and PageSwapBacked
already.  Closing this race completely would require some sort of retry
logic.  This can be tricky and error prone (think of potential endless
or long taking loops).

Workaround this problem for movable zones at least.  Such a zone should
only contain movable pages.  Commit 15c30bc090 ("mm, memory_hotplug:
make has_unmovable_pages more robust") has told us that this is not
strictly true though.  Bootmem pages should be marked reserved though so
we can move the original check after the PageReserved check.  Pages from
other zones are still prone to races but we even do not pretend that
memory hotremove works for those so pre-mature failure doesn't hurt that
much.

Link: http://lkml.kernel.org/r/20181106095524.14629-1-mhocko@kernel.org
Fixes: 15c30bc090 ("mm, memory_hotplug: make has_unmovable_pages more robust")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Baoquan He <bhe@redhat.com>
Tested-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Vasily Averin 873d7bcfd0 mm/swapfile.c: use kvzalloc for swap_info_struct allocation
Commit a2468cc9bf ("swap: choose swap device according to numa node")
changed 'avail_lists' field of 'struct swap_info_struct' to an array.
In popular linux distros it increased size of swap_info_struct up to 40
Kbytes and now swap_info_struct allocation requires order-4 page.
Switch to kvzmalloc allows to avoid unexpected allocation failures.

Link: http://lkml.kernel.org/r/fc23172d-3c75-21e2-d551-8b1808cbe593@virtuozzo.com
Fixes: a2468cc9bf ("swap: choose swap device according to numa node")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Aaro Koskinen f341e16fb6 MAINTAINERS: update OMAP MMC entry
Jarkko's e-mail address hasn't worked for a long time.  We still want to
keep this driver working as it is critical for some of the OMAP boards.
I use and test this driver frequently, so change myself as a maintainer
with "Odd Fixes" status.

Link: http://lkml.kernel.org/r/20181106222750.12939-1-aaro.koskinen@iki.fi
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Mike Kravetz 5e41540c8a hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
This bug has been experienced several times by the Oracle DB team.  The
BUG is in remove_inode_hugepages() as follows:

	/*
	 * If page is mapped, it was faulted in after being
	 * unmapped in caller.  Unmap (again) now after taking
	 * the fault mutex.  The mutex will prevent faults
	 * until we finish removing the page.
	 *
	 * This race can only happen in the hole punch case.
	 * Getting here in a truncate operation is a bug.
	 */
	if (unlikely(page_mapped(page))) {
		BUG_ON(truncate_op);

In this case, the elevated map count is not the result of a race.
Rather it was incorrectly incremented as the result of a bug in the huge
pmd sharing code.  Consider the following:

 - Process A maps a hugetlbfs file of sufficient size and alignment
   (PUD_SIZE) that a pmd page could be shared.

 - Process B maps the same hugetlbfs file with the same size and
   alignment such that a pmd page is shared.

 - Process B then calls mprotect() to change protections for the mapping
   with the shared pmd. As a result, the pmd is 'unshared'.

 - Process B then calls mprotect() again to chage protections for the
   mapping back to their original value. pmd remains unshared.

 - Process B then forks and process C is created. During the fork
   process, we do dup_mm -> dup_mmap -> copy_page_range to copy page
   tables. Copying page tables for hugetlb mappings is done in the
   routine copy_hugetlb_page_range.

In copy_hugetlb_page_range(), the destination pte is obtained by:

	dst_pte = huge_pte_alloc(dst, addr, sz);

If pmd sharing is possible, the returned pointer will be to a pte in an
existing page table.  In the situation above, process C could share with
either process A or process B.  Since process A is first in the list,
the returned pte is a pointer to a pte in process A's page table.

However, the check for pmd sharing in copy_hugetlb_page_range is:

	/* If the pagetables are shared don't copy or take references */
	if (dst_pte == src_pte)
		continue;

Since process C is sharing with process A instead of process B, the
above test fails.  The code in copy_hugetlb_page_range which follows
assumes dst_pte points to a huge_pte_none pte.  It copies the pte entry
from src_pte to dst_pte and increments this map count of the associated
page.  This is how we end up with an elevated map count.

To solve, check the dst_pte entry for huge_pte_none.  If !none, this
implies PMD sharing so do not copy.

Link: http://lkml.kernel.org/r/20181105212315.14125-1-mike.kravetz@oracle.com
Fixes: c5c99429fa ("fix hugepages leak due to pagetable page sharing")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Olof Johansson 8fcb2312d1 kernel/sched/psi.c: simplify cgroup_move_task()
The existing code triggered an invalid warning about 'rq' possibly being
used uninitialized.  Instead of doing the silly warning suppression by
initializa it to NULL, refactor the code to bail out early instead.

Warning was:

  kernel/sched/psi.c: In function `cgroup_move_task':
  kernel/sched/psi.c:639:13: warning: `rq' may be used uninitialized in this function [-Wmaybe-uninitialized]

Link: http://lkml.kernel.org/r/20181103183339.8669-1-olof@lixom.net
Fixes: 2ce7135adc ("psi: cgroup support")
Signed-off-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Vitaly Wool ca0246bb97 z3fold: fix possible reclaim races
Reclaim and free can race on an object which is basically fine but in
order for reclaim to be able to map "freed" object we need to encode
object length in the handle.  handle_to_chunks() is then introduced to
extract object length from a handle and use it during mapping.

Moreover, to avoid racing on a z3fold "headless" page release, we should
not try to free that page in z3fold_free() if the reclaim bit is set.
Also, in the unlikely case of trying to reclaim a page being freed, we
should not proceed with that page.

While at it, fix the page accounting in reclaim function.

This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".

Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Signed-off-by: Jongseok Kim <ks77sj@gmail.com>
Reported-by-by: Jongseok Kim <ks77sj@gmail.com>
Reviewed-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Christoph Hellwig f5d72c5c55 mmc: stop abusing the request queue_lock pointer
Replace the lock in mmc_blk_data that is only used through a pointer
in struct mmc_queue and to protect fields in that structure with
an actual lock in struct mmc_queue.

Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-17 07:20:45 -07:00
Linus Torvalds 1ce80e0fe9 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlvukbUACgkQnJ2qBz9k
 QNkJnggAriAyO72fH7nfh2AzvwnkqLi9uE4JQk+ZxQFW9W6a32Y+qIRBsaiau++w
 /lOVSGLCCVewzAa4b/lYPdyOS/yG4w6UuEsg5HQPux4JsOccalYudB5ONdeZGd/P
 5FPxcsixDQQHP9mhz1gIn2wV7bbytH2DduIHxuKbJfX//xnO3jS/gnJq82rnxb4R
 IzqewVjbFiqYQKtyuzIqKqI3yOVqESJhyQ96j6chj5YKNUkMuwU9akhQGR4Zcqen
 /cxh+mK+ZzCybgu2Sfgx3noZb5RFYePpcSSpDG5OKJc04IY4m76fyHIi2flBE9AX
 HCN60y+VyKFOiNxzsSLXgsE9aj40OA==
 =UcJO
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify fix from Jan Kara:
 "One small fsnotify fix for duplicate events"

* tag 'fsnotify_for_v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: fix handling of events on child sub-directory
2018-11-16 13:18:36 -06:00
Linus Torvalds e6a2562fe2 Merge tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull bfs2 fixes from Andreas Gruenbacher:
 "Fix two bugs leading to leaked buffer head references:

   - gfs2: Put bitmap buffers in put_super
   - gfs2: Fix iomap buffer head reference counting bug

  And one bug leading to significant slow-downs when deleting large
  files:

   - gfs2: Fix metadata read-ahead during truncate (2)"

* tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix iomap buffer head reference counting bug
  gfs2: Fix metadata read-ahead during truncate (2)
  gfs2: Put bitmap buffers in put_super
2018-11-16 11:38:14 -06:00
Andreas Gruenbacher c26b5aa8ef gfs2: Fix iomap buffer head reference counting bug
GFS2 passes the inode buffer head (dibh) from gfs2_iomap_begin to
gfs2_iomap_end in iomap->private.  It sets that private pointer in
gfs2_iomap_get.  Users of gfs2_iomap_get other than gfs2_iomap_begin
would have to release iomap->private, but this isn't done correctly,
leading to a leak of buffer head references.

To fix this, move the code for setting iomap->private from
gfs2_iomap_get to gfs2_iomap_begin.

Fixes: 64bc06bb32 ("gfs2: iomap buffered write support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-16 11:35:09 -06:00
Linus Torvalds 32e2524a52 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

   - Potential memory overwrite in simd

   - Kernel info leaks in crypto_user

   - NULL dereference and use-after-free in hisilicon"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: user - Zeroize whole structure given to user space
  crypto: user - fix leaking uninitialized memory to userspace
  crypto: simd - correctly take reqsize of wrapped skcipher into account
  crypto: hisilicon - Fix reference after free of memories on error path
  crypto: hisilicon - Fix NULL dereference for same dst and src
2018-11-16 10:37:27 -06:00
Linus Torvalds 4efd34602f i915, amdgpu, omapdrm, docs and mst fix
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb7fpHAAoJEAx081l5xIa+750P/1b/w3g7nZFbcXpDhomBVQJH
 qbxAsGdxwerZsYBnp4aSa4CEy4BWVJqGEqvBlvoSTqrdXZoP4ViQotFHgQ8efpnj
 DluZrKHzNrSXPAZYJqGJ6nY5QiFDdYdlj303+h4pt+3Ndc7fDXwM2kTECQASfGxZ
 lqrCN9wwmYlXdAiaP0GDrk4iPFuF3s4S34R0TuAEuigr8usYbUky6cjd/GbANTVB
 ovvrgkNCz13EBBCqoWdA4S6h1/yJPzXxE6lG9w4nhbVtXupxkk6ZRwAxT+M6A8P6
 uGrqKQweAgfKPKWcLVEKJpGQwJ+zsbn1jqchjWLNKbcPdub9kLW7c+0SFQw4+Evm
 YMU9pS8DatM8jZ6fVv1Lwc7P3+Fue4zdNQ3Izw8+IiDbbdbb5bT3rUSKXU2qPl8o
 tDygle1R4k7jazOwK+htNX02MpQjHGDAKrkM188m6Wq8QPraqfjxbp26dP1Geh+h
 DPVQ833gIMKxXZIfo5BUaS8JK1gCYvgtDDSJd3twn5MBEuV94upXB6Zix6AkISmd
 pfjWv4OFh1wk0EraHfp50BlJ51BS8Tgfp565dC1NaBFqxulGaLLx9pxbiW9+lvQw
 fiQoC4KjkxCqpR3gHAF5WRrzUKwEIUyv/+qiVJJO7kkl0Jwf8I5rFVM1jeux1PAX
 UGjXLcYx+qMkF0/ItjWH
 =JYkZ
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2018-11-16' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Live from Vancouver, SoC maintainer talk, this weeks drm fixes pull
  for rc3:

  omapdrm:
   - regression fixes for the reordering bridge stuff that went into rc1

  i915:
   - incorrect EU count fix
   - HPD storm fix
   - MST fix
   - relocation fix for gen4/5

  amdgpu:
   - huge page handling fix
   - IH ring setup
   - XGMI aperture setup
   - watermark setup fix

  misc:
   - docs and MST fix"

* tag 'drm-fixes-2018-11-16' of git://anongit.freedesktop.org/drm/drm: (23 commits)
  drm/i915: Account for scale factor when calculating initial phase
  drm/i915: Clean up skl_program_scaler()
  drm/i915: Move programming plane scaler to its own function.
  drm/i915/icl: Drop spurious register read from icl_dbuf_slices_update
  drm/i915: fix broadwell EU computation
  drm/amdgpu: fix huge page handling on Vega10
  drm/amd/pp: Fix truncated clock value when set watermark
  drm/amdgpu: fix bug with IH ring setup
  drm/meson: venc: dmt mode must use encp
  drm/amdgpu: set system aperture to cover whole FB region
  drm/i915: Fix hpd handling for pins with two encoders
  drm/i915/execlists: Force write serialisation into context image vs execution
  drm/i915/icl: Fix power well 2 wrt. DC-off toggling order
  drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST
  drm/i915: Fix possible race in intel_dp_add_mst_connector()
  drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5
  drm/omap: dsi: Fix missing of_platform_depopulate()
  drm/omap: Move DISPC runtime PM handling to omapdrm
  drm/omap: dsi: Ensure the device is active during probe
  drm/omap: hdmi4: Ensure the device is active during bind
  ...
2018-11-16 10:17:29 -06:00
Christoph Hellwig f04842734c ide: don't acquire queue_lock in ide_complete_pm_rq
blk_mq_stop_hw_queues doesn't need any locking, and the ide
dev_flags field isn't protected by it either.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 09:17:02 -07:00
Christoph Hellwig b2101f655f ide: don't acquire queue lock in ide_pm_execute_rq
There is nothing we can synchronize against over a call to
blk_queue_dying.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 09:17:00 -07:00
Christoph Hellwig a50f9aec1a pktcdvd: remove queue_lock around blk_queue_max_hw_sectors
blk_queue_max_hw_sectors can't do anything with queue_lock protection
so don't hold it.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 09:16:59 -07:00
Christoph Hellwig 503f620f0c floppy: remove queue_lock around floppy_end_request
There is nothing the queue_lock could protect inside floppy_end_request,
so remove it.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 09:16:57 -07:00
Christoph Hellwig 2b78eae147 block: remove the rq_alloc_data request_queue field
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 09:16:55 -07:00
Linus Torvalds ef268de197 powerpc fixes for 4.20 #3
Two weeks worth of fixes since rc1.
 
  - I broke 16-byte alignment of the stack when we moved PPR into pt_regs.
    Despite being required by the ABI this broke almost nothing, we eventually
    hit it in code where GCC does arithmetic on the stack pointer assuming the
    bottom 4 bits are clear. Fix it by padding the in-kernel pt_regs by 8 bytes.
 
  - A couple of commits fixing minor bugs in the recent SLB rewrite.
 
  - A build fix related to tracepoints in KVM in some configurations.
 
  - Our old "IO workarounds" code written for Cell couldn't coexist in a kernel
    that runs on Power9 with the Radix MMU, fix that.
 
  - Remove the NPU DMA ops, these just printed a warning and should never have
    been called.
 
  - Suppress an overly chatty message triggered by CPU hotplug in some configs.
 
  - Two small selftest fixes.
 
 Thanks to:
   Alistair Popple, Gustavo Romero, Nicholas Piggin, Satheesh Rajendran, Scott Wood.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb7qyjAAoJEFHr6jzI4aWA73kP/3x+vghoHthxrazbN+9Z0bWQ
 +c5fHobHLuREjzLhy73lbCE9NOhZTlmdgfAB/MRWW9aVIzHViuhUjRLpLqw0+LBA
 mWIrVloJeMcbupE+zFnc6qh1WY/YZ8lsPZxmb5YSqDSxtcdh8JzDK+RgWn9XkiFa
 sjppaZoLLf/Wxz4VT4v75o8WXEFavpbEaS2PLdWhwT1//H4QpKYWY80tPCijdRhp
 0susCzObBfdwxS4qlwmLBmCxbGhqLzBg1vnPPGq6GypRELIeqR+jHWOjzYmmvQRh
 hLffVTaHIVFgO9c4ruCFmMsJCA1hf186w/62IHXLgOfp7eQJYPQYCn7uVVTWoVDC
 5hpPR71xOkovLJbipk07lshj/kVJQVapCbyGtOz8DKgQnAWcSq23HMuJBHwSEnKH
 xtIk6iupik+7gWjDDY0Yz0xiCmZLRS5heWNAJgXpPFNMSR47EkmU4bZpMrWwbetf
 CashFhbYcFzpPaP7bHgxLk79fhUitkwCFFlSQK3Hj4bog+U2KmKnKgLphn8If8jy
 iHuggxvxCwDvfn+d9X29VJu7Bxz4M5l0ouEp0+hWAockGCsaDxXNuzJNMk1V5GyO
 Ytg6UtGBxLwKBKLqNeXmuY0/CD1lALU5mD/KSIKDjm7skM7T0U3s8T+7/GpFqQ7J
 uQ9aA49aWy4kUy+xCay5
 =fEqA
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Two weeks worth of fixes since rc1.

   - I broke 16-byte alignment of the stack when we moved PPR into
     pt_regs. Despite being required by the ABI this broke almost
     nothing, we eventually hit it in code where GCC does arithmetic on
     the stack pointer assuming the bottom 4 bits are clear. Fix it by
     padding the in-kernel pt_regs by 8 bytes.

   - A couple of commits fixing minor bugs in the recent SLB rewrite.

   - A build fix related to tracepoints in KVM in some configurations.

   - Our old "IO workarounds" code written for Cell couldn't coexist in
     a kernel that runs on Power9 with the Radix MMU, fix that.

   - Remove the NPU DMA ops, these just printed a warning and should
     never have been called.

   - Suppress an overly chatty message triggered by CPU hotplug in some
     configs.

   - Two small selftest fixes.

  Thanks to: Alistair Popple, Gustavo Romero, Nicholas Piggin, Satheesh
  Rajendran, Scott Wood"

* tag 'powerpc-4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Adjust wild_bctr to build with old binutils
  powerpc/64: Fix kernel stack 16-byte alignment
  powerpc/numa: Suppress "VPHN is not supported" messages
  selftests/powerpc: Fix wild_bctr test to work on ppc64
  powerpc/io: Fix the IO workarounds code to work with Radix
  powerpc/mm/64s: Fix preempt warning in slb_allocate_kernel()
  KVM: PPC: Move and undef TRACE_INCLUDE_PATH/FILE
  powerpc/mm/64s: Only use slbfee on CPUs that support it
  powerpc/mm/64s: Use PPC_SLBFEE macro
  powerpc/mm/64s: Consolidate SLB assertions
  powerpc/powernv/npu: Remove NPU DMA ops
2018-11-16 10:14:54 -06:00
Linus Torvalds 50d25bdc64 Xtensa fixes for v4.20-rc3
- Fix stack alignment for bFLT binaries.
 - Fix physical-to-virtual address translation for boot parameters in MMUv3
   256+256 and 512+512 virtual memory layouts.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAlvt5vsTHGpjbXZia2Jj
 QGdtYWlsLmNvbQAKCRBR+cyR+D+gRF9rD/4uzfL1fbeEp3qRavlUzMMGW2P0Z7Gc
 GTUC6PHRUdZhs7j8/rkuXkZuNv5vMn/8ZXARQBgyymxDYzFmWsbRCF0eKt3RbxkG
 Pw2h2E3r8eERSj6Hwf0AZQx2c1krRIXFOaxYzsz8fV9yEX1Atze0P7YJJ3m90WCV
 kgFhkkF++fed8s2ioEBngXkn0Pwx9DHl7DnNhGTVoZxWZ/9xyuksqkjSS/B9Rpge
 o1SjUIYSCwIl8Ag4FW39CHTGvhDXG8WvFInQ7DGPEkzZ8nlGchnOD9L/SpZRXYo+
 HwfrPvpDMt8iSJTE1pwv6ovyJSmvCOIIjL0tuA6KmjXiE5EBdWdolVy9byQQ3pA+
 t5zvp6l9bmqUibb25lxNfckFjQhtH6CplRPmOu93irnsGB1wV9gVwNwRiKTl4HRM
 4rdSDjEvl6mClPLjkILUWp9Pcp6KiIQzPgwhK2qdKsCrvECzDjzf/swyV5+WMHSS
 Yz1OsKhSmzvKYxKyKPUdEdFeSvdAvek5CgyZD8Lb+haRqUQahT4JItV86O59fXTU
 xYteo32oL4odSMAkqgWxI4i2/Oq1PBCA0byhUDLt56HOCnzEYJV/qVLyJaH93w1p
 dIc+BZeak228xChf2zgYPHIyXXS061CNUV4MNYES5XlY+bprQbhlgO9UNCTCVkNy
 vLX5WnsMS8Owog==
 =Ih+A
 -----END PGP SIGNATURE-----

Merge tag 'xtensa-20181115' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa fixes from Max Filippov:

 - fix stack alignment for bFLT binaries.

 - fix physical-to-virtual address translation for boot parameters in
   MMUv3 256+256 and 512+512 virtual memory layouts.

* tag 'xtensa-20181115' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: fix boot parameters address translation
  xtensa: make sure bFLT stack is 16 byte aligned
2018-11-16 10:10:27 -06:00
Jens Axboe cb700eb3fa block: don't plug for aio/O_DIRECT HIPRI IO
Those will go straight to issue inside blk-mq, so don't bother
setting up a block plug for them.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 08:35:10 -07:00
Jens Axboe d34513d384 block: for async O_DIRECT, mark us as polling if asked to
Inherit the iocb IOCB_HIPRI flag, and pass on REQ_HIPRI for
those kinds of requests.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 08:34:59 -07:00
Jens Axboe 0619317ff8 block: add polled wakeup task helper
If we're polling for IO on a device that doesn't use interrupts, then
IO completion loop (and wake of task) is done by submitting task itself.
If that is the case, then we don't need to enter the wake_up_process()
function, we can simply mark ourselves as TASK_RUNNING.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 08:34:30 -07:00
Jens Axboe e504545446 blk-rq-qos: inline check for q->rq_qos functions
Put the short code in the fast path, where we don't have any
functions attached to the queue. This minimizes the impact on
the hot path in the core code.

Cc: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 08:34:19 -07:00
Jens Axboe 344e9ffcbd block: add queue_is_mq() helper
Various spots check for q->mq_ops being non-NULL, but provide
a helper to do this instead.

Where the ->mq_ops != NULL check is redundant, remove it.

Since mq == rq-based now that legacy is gone, get rid of the
queue_is_rq_based() and just use queue_is_mq() everywhere.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 08:34:06 -07:00
Jens Axboe dabcefab45 nvme: provide optimized poll function for separate poll queues
If we have separate poll queues, we know that they aren't using
interrupts. Hence we don't need to disable interrupts around
finding completions.

Provide a separate set of blk_mq_ops for such devices.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16 08:33:55 -07:00