1
0
Fork 0
Commit Graph

680109 Commits (376a78abf5cc721a86ed42a1a24044d35fb8d2a8)

Author SHA1 Message Date
Max Gurtovoy fe631457ff blk-mq: map all HWQ also in hyperthreaded system
This patch performs sequential mapping between CPUs and queues.
In case the system has more CPUs than HWQs then there are still
CPUs to map to HWQs. In hyperthreaded system, map the unmapped CPUs
and their siblings to the same HWQ.
This actually fixes a bug that found unmapped HWQs in a system with
2 sockets, 18 cores per socket, 2 threads per core (total 72 CPUs)
running NVMEoF (opens upto maximum of 64 HWQs).

Performance results running fio (72 jobs, 128 iodepth)
using null_blk (w/w.o patch):

bs      IOPS(read submit_queues=72)   IOPS(write submit_queues=72)   IOPS(read submit_queues=24)  IOPS(write submit_queues=24)
-----  ----------------------------  ------------------------------ ---------------------------- -----------------------------
512    4890.4K/4723.5K                 4524.7K/4324.2K                   4280.2K/4264.3K               3902.4K/3909.5K
1k     4910.1K/4715.2K                 4535.8K/4309.6K                   4296.7K/4269.1K               3906.8K/3914.9K
2k     4906.3K/4739.7K                 4526.7K/4330.6K                   4301.1K/4262.4K               3890.8K/3900.1K
4k     4918.6K/4730.7K                 4556.1K/4343.6K                   4297.6K/4264.5K               3886.9K/3893.9K
8k     4906.4K/4748.9K                 4550.9K/4346.7K                   4283.2K/4268.8K               3863.4K/3858.2K
16k    4903.8K/4782.6K                 4501.5K/4233.9K                   4292.3K/4282.3K               3773.1K/3773.5K
32k    4885.8K/4782.4K                 4365.9K/4184.2K                   4307.5K/4289.4K               3780.3K/3687.3K
64k    4822.5K/4762.7K                 2752.8K/2675.1K                   4308.8K/4312.3K               2651.5K/2655.7K
128k   2388.5K/2313.8K                 1391.9K/1375.7K                   2142.8K/2152.2K               1395.5K/1374.2K

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-29 08:40:11 -06:00
Steven Rostedt (VMware) 0f17976568 ftrace: Fix regression with module command in stack_trace_filter
When doing the following command:

 # echo ":mod:kvm_intel" > /sys/kernel/tracing/stack_trace_filter

it triggered a crash.

This happened with the clean up of probes. It required all callers to the
regex function (doing ftrace filtering) to have ops->private be a pointer to
a trace_array. But for the stack tracer, that is not the case.

Allow for the ops->private to be NULL, and change the function command
callbacks to handle the trace_array pointer being NULL as well.

Fixes: d2afd57a4b ("tracing/ftrace: Allow instances to have their own function probes")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-29 10:05:45 -04:00
Brian Norris 1d80df93d9 Revert "pinctrl: rockchip: avoid hardirq-unsafe functions in irq_chip"
This reverts commit 88bb94216f.

It introduced a new CONFIG_DEBUG_ATOMIC_SLEEP warning in v4.12-rc1:

[ 7226.716713] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
[ 7226.716716] in_atomic(): 0, irqs_disabled(): 0, pid: 1708, name: bash
[ 7226.716722] CPU: 1 PID: 1708 Comm: bash Not tainted 4.12.0-rc6+ #1213
[ 7226.716724] Hardware name: Google Kevin (DT)
[ 7226.716726] Call trace:
[ 7226.716738] [<ffffff8008089928>] dump_backtrace+0x0/0x24c
[ 7226.716743] [<ffffff8008089b94>] show_stack+0x20/0x28
[ 7226.716749] [<ffffff8008371370>] dump_stack+0x90/0xb0
[ 7226.716755] [<ffffff80080cd2a0>] ___might_sleep+0x10c/0x124
[ 7226.716760] [<ffffff80080cd330>] __might_sleep+0x78/0x88
[ 7226.716765] [<ffffff800879e210>] mutex_lock+0x2c/0x64
[ 7226.716771] [<ffffff80083ad678>] rockchip_irq_bus_lock+0x30/0x3c
[ 7226.716777] [<ffffff80080f6d40>] __irq_get_desc_lock+0x78/0x98
[ 7226.716782] [<ffffff80080f7e6c>] irq_set_irq_wake+0x44/0x12c
[ 7226.716787] [<ffffff8008486e18>] dev_pm_arm_wake_irq+0x4c/0x58
[ 7226.716792] [<ffffff800848b80c>] device_wakeup_arm_wake_irqs+0x3c/0x58
[ 7226.716796] [<ffffff80084896fc>] dpm_suspend_noirq+0xf8/0x3a0
[ 7226.716800] [<ffffff80080f1384>] suspend_devices_and_enter+0x1a4/0x9a8
[ 7226.716803] [<ffffff80080f21ec>] pm_suspend+0x664/0x6a4
[ 7226.716807] [<ffffff80080f04d8>] state_store+0xd4/0xf8
...

It was reported on -rc1, and it's still not fixed in -rc6, so it should
just be reverted.

Cc: John Keeping <john@metanate.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-29 15:03:24 +02:00
Hans de Goede c06632ea05 gpio: acpi: Skip _AEI entries without a handler rather then aborting the scan
acpi_walk_resources will stop as soon as the callback passed in returns
an error status. On a x86 tablet I have the first GpioInt in the _AEI
resource list has no handler defined in the DSDT, causing
acpi_walk_resources to abort scanning the rest of the resource list,
which does define valid ACPI GPIO events.

This commit changes the return for not finding a handler from
AE_BAD_PARAMETER to AE_OK so that the rest of the resource list will
get scanned normally in case of missing event handlers.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-29 14:55:08 +02:00
Bartosz Golaszewski ad537b8225 gpiolib: fix filtering out unwanted events
GPIOEVENT_REQUEST_BOTH_EDGES is not a single flag, but a binary OR of
GPIOEVENT_REQUEST_RISING_EDGE and GPIOEVENT_REQUEST_FALLING_EDGE.

The expression 'le->eflags & GPIOEVENT_REQUEST_BOTH_EDGES' we'll get
evaluated to true even if only one event type was requested.

Fix it by checking both RISING & FALLING flags explicitly.

Cc: stable@vger.kernel.org
Fixes: 61f922db72 ("gpio: userspace ABI for reading GPIO line events")
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-29 11:33:46 +02:00
Tony Luck 164c29244d EDAC, pnd2: Fix Apollo Lake DIMM detection
Non-existent or empty DIMM slots result in error return from
RD_REGP(). But we shouldn't give up on failure.

So long as we find at least one DIMM we can continue.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170628234407.21521-1-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29 10:37:50 +02:00
Jérémy Lefaure a8c8261425 EDAC, i5000, i5400: Fix definition of NRECMEMB register
In the i5000 and i5400 drivers, the NRECMEMB register is defined as a
16-bit value, which results in wrong shifts in the code, as reported by
sparse.

In the datasheets ([1], section 3.9.22.20 and [2], section 3.9.22.21),
this register is a 32-bit register. A u32 value for the register fixes
the wrong shifts warnings and matches the datasheet.

Also fix the mask to access to the CAS bits [27:16] in the i5000 driver.

[1]: https://www.intel.com/content/dam/doc/datasheet/5000p-5000v-5000z-chipset-memory-controller-hub-datasheet.pdf
[2]: https://www.intel.se/content/dam/doc/datasheet/5400-chipset-memory-controller-hub-datasheet.pdf

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170629005729.8478-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29 10:33:13 +02:00
Thomas Gleixner ff801b716e sched/numa: Hide numa_wake_affine() from UP build
Stephen reported the following build warning in UP:

kernel/sched/fair.c:2657:9: warning: 'struct sched_domain' declared inside
parameter list
         ^
/home/sfr/next/next/kernel/sched/fair.c:2657:9: warning: its scope is only this
definition or declaration, which is probably not what you want

Hide the numa_wake_affine() inline stub on UP builds to get rid of it.

Fixes: 3fed382b46 ("sched/numa: Implement NUMA node level wake_affine()")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2017-06-29 08:25:52 +02:00
Tobias Klauser 6474924e2b arch: remove unused macro/function thread_saved_pc()
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d55977 ("sched/core: Remove pointless printout in
sched_show_task()").  Remove the implementations as well.

Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-28 16:13:57 -07:00
Jens Axboe 9ae3b3f52c block: provide bio_uninit() free freeing integrity/task associations
Wen reports significant memory leaks with DIF and O_DIRECT:

"With nvme devive + T10 enabled, On a system it has 256GB and started
logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
leaking.

/proc/meminfo | grep SUnreclaim...

SUnreclaim:      6752128 kB
SUnreclaim:      6874880 kB
SUnreclaim:      7238080 kB
....
SUnreclaim:     22307264 kB
SUnreclaim:     22485888 kB
SUnreclaim:     22720256 kB

When testcases with T10 enabled call into __blkdev_direct_IO_simple,
code doesn't free memory allocated by bio_integrity_alloc. The patch
fixes the issue. HTX has been run with +60 hours without failure."

Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
doesn't go through the regular bio free. This means that any ancillary
data allocated with the bio through the stack is not freed. Hence, we
can leak the integrity data associated with the bio, if the device is
using DIF/DIX.

Fix this by providing a bio_uninit() and export it, so that we can use
it to free this data. Note that this is a minimal fix for this issue.
Any current user of bio's that are allocated outside of
bio_alloc_bioset() suffers from this issue, most notably some drivers.
We will fix those in a more comprehensive patch for 4.13. This also
means that the commit marked as being fixed by this isn't the real
culprit, it's just the most obvious one out there.

Fixes: 542ff7bf18 ("block: new direct I/O implementation")
Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 15:30:13 -06:00
Linus Torvalds e547204f1f NFS client bugfixes for Linux 4.12
Bugfixes include:
 
 - Stable fix for exclusive create if the server supports the umask attribute
 - Trunking detection should handle ERESTARTSYS/EINTR
 - Stable fix for a race in the LAYOUTGET function
 - Stable fix to revert "nfs_rename() handle -ERESTARTSYS dentry left behind"
 - nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZU7Z/AAoJEGcL54qWCgDyqkwQAKwHulvAMeKH9/Wy3vo7mqOR
 Yqbz2mujMzTFUaebV1bvOJzmpvP5uBmC/9ggqBCYbV0pW7mR9YYiX6RH7TfQ/IfQ
 XZjOq+isBFIhDUVQQ2VOCbdgIHBu1V5++1Oterwtz9yWfXB+GXdlVD0UwH/j4MHG
 LP/z7Xa7jdsG/XrQC8Z22Qk7byBR6+D4IBYRZlqwVtnEtte880LZJh7OjOUpwRVW
 SwH5p7EAF8plyr/9/OT8uC+dl+LPE5cs3ZXkkTiEB91VLRdCuU/wxo8r5xMU3wPY
 sT7q7O50fTvQka0to5Ag64laXwq352SjqfYwHd+90THc8Zh2XbuRftF3zhvi46d5
 kaXvdNqggV7qTJc8dcEt2dtdTOaQ9zr1SOfar9pusnfAvrmw46R6tmS7YJbMhomr
 iQzDKSwo9IZ7mTCiEopUVb/nXQwgy+/oojPNR6f0eXM2DU3z0CR/6awyAFmpgRUK
 OWcMSTSXYq/LMSaZEZq5htzTzkRl1m6V/z5vRD1M/ZqXp2hpwJWmDupciJEya/Y8
 /L+tkkPjGeBkEzgSYqJYpTMnzfxidMw+QDYkQIGYei421/nBi1BzH3IU82aKvsVI
 V6/i2DEL6ncJhw71tHHYBfn01PXIdwMdBHJVLunMZPU9hgc0UF5o1jaK3JourAWg
 Uqd1JTm/mOLSZNTgoXbO
 =w19C
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Bugfixes include:

   - stable fix for exclusive create if the server supports the umask
     attribute

   - trunking detection should handle ERESTARTSYS/EINTR

   - stable fix for a race in the LAYOUTGET function

   - stable fix to revert "nfs_rename() handle -ERESTARTSYS dentry left
     behind"

   - nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()"

* tag 'nfs-for-4.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()
  Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind"
  NFSv4.1: Fix a race in nfs4_proc_layoutget
  NFS: Trunking detection should handle ERESTARTSYS/EINTR
  NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umask
2017-06-28 13:27:15 -07:00
Linus Torvalds 5a37be4b51 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "This is the final set of fixes for -rc8, just a few i915 and one
  vmwgfx ones.

  I'm off on holidays for a week, so if anything shows up for fixes I've
  asked Daniel or Sean Paul to herd it in the right direction"

[ The additional etnaviv fixes were already herded towards me as seen in
  my previous pull - Linus ]

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
  drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations
  drm/i915: Hold struct_mutex for per-file stats in debugfs/i915_gem_object
  drm/i915: Retire the VMA's fence tracker before unbinding
2017-06-28 13:22:26 -07:00
Linus Torvalds cf723497f2 Merge branch 'etnaviv/fixes' of git://git.pengutronix.de/git/lst/linux
Pull drm/etnaviv fixes from Lucas Stach:
 "I realized I just missed the cut-off point for the final drm fixes
  pull, but I have 2 more etnaviv fixes that need to go into 4.12, as
  they fix fallout from the explicit sync work introduced in the last
  merge window"

[ Pulling directly because Dave is on vacation. Noted by Daniel Vetter,
  and acked by Dave Airlie  - Linus ]

* 'etnaviv/fixes' of git://git.pengutronix.de/git/lst/linux:
  drm/etnaviv: Fix implicit/explicit sync sense inversion
  drm/etnaviv: fix submit flags getting overwritten by BO content
2017-06-28 13:13:48 -07:00
Kees Cook fd25d19f6b locking/refcount: Create unchecked atomic_t implementation
Many subsystems will not use refcount_t unless there is a way to build the
kernel so that there is no regression in speed compared to atomic_t. This
adds CONFIG_REFCOUNT_FULL to enable the full refcount_t implementation
which has the validation but is slightly slower. When not enabled,
refcount_t uses the basic unchecked atomic_t routines, which results in
no code changes compared to just using atomic_t directly.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170621200026.GA115679@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-28 18:54:46 +02:00
Sagi Grimberg f1d4ef7d88 nvmet-rdma: register ib_client to not deadlock in device removal
We can deadlock in case we got to a device removal
event on a queue which is already in the process of
destroying the cm_id is this is blocking until all
events on this cm_id will drain. On the other hand
we cannot guarantee that rdma_destroy_id was invoked
as we only have indication that the queue disconnect
flow has been queued (the queue state is updated before
the realease work has been queued).

So, we leave all the queue removal to a separate ib_client
to avoid this deadlock as ib_client device removal is in
a different context than the cm_id itself.

Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
James Smart 69fa964632 nvme_fc: fix error recovery on link down.
Currently, the fc transport invokes nvme_fc_error_recovery() on every
io in which the transport detects an error.  Which means:
a) it's really noisy on large io loads that all get hit by a link down.
b) we repeatively call nvme_stop_queues() even though queues are
 stopped upon the first error or as first steps of reset_work.

Correct by:
Errors are only meaningful if the controller is in the LIVE state.
Thus, enact the reset_work only if LIVE. If called repeatively, state
will have already transitioned.
There's no need to stop the queues here. Let the first steps of
reset_work do the queue stopping.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
James Smart 188f7e8a37 nvmet_fc: fix crashes on bad opcodes
if a nvme command is issued with an opcode that is not supported by
the target (example: opcode 21 - detach namespace), the target
crashes due to a null pointer.

nvmet_req_init() detects the bad opcode and immediately calls the nvme
command done routine with an error status, allowing the transport to
send the response. However, the FC transport was aborting the command
on error, so the abort freed the lldd point, but the rsp transmit path
referenced it psot the free.

Fix by removing the abort call on nvmet_req_init() failure.
The completion response will be sent with an error status code.

As the completion path will terminate the io, ensure the data_sg
lists show an unused state so that teardown paths are successful.

Signed-off-by: Paul Ely <Paul.Ely@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
James Smart 0b5a7669a4 nvme_fc: Fix crash when nvme controller connection fails.
If a controller connection is attempted (say to a subsystem that
does not exist), the first attempt errors out.  If another connect
is attempted, it crashes.

Issue is the prior controller has yet execute it's final put, thus
its still on lists. However, opts points on it have been cleared, thus
causing the crash if they are referenced.

Fix is to add the missing put after the nvme_uninit_ctrl() call on
the attachment failure.

Signed-off-by: Paul Ely <Paul.Ely@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
James Smart 36715cf4b3 nvme_fc: replace ioabort msleep loop with completion
Per the recommendation by Sagi on:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html

Wait for io aborts to complete wait converted from msleep look to
using a struct completion.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
James Smart b4dfd6ee99 nvme_fc: fix double calls to nvme_cleanup_cmd()
Current fc transport code, on io termination, is calling
nvme_cleanup_cmd() followed by the transport dma unmap routine
which also calls nvme_cleanup_cmd(). Which means two kfrees occur
on the same address, raising havoc. This resulted in odd data errors,
effectively corruption..

Fix by removing the extraneous double calls. Call now occurs only in
teardown paths and as part of dma unmap routine.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Christoph Hellwig b1465c6344 nvme-fabrics: verify that a controller returns the correct NQN
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Christoph Hellwig 49d3d50b0d nvme: simplify nvme_dev_attrs_are_visible
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Christoph Hellwig 180de00700 nvme: read the subsystem NQN from Identify Controller
NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
Identify controller data structures.  Use this NQN for the subsysnqn
sysfs attribute by storing it in the nvme_ctrl structure after verifying
it.  For older controllers we generate a "fake" NQN per non-normative
text in the NVMe 1.3 spec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Christoph Hellwig 942fbab4cd nvme: remove a misleading comment on struct nvme_ns
While a NVMe Namespace is somewhat similar to a SCSI Logical Unit (and not
a Logical Unit Number anyway) there are subtile differences.  Remove the
misleading comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grmberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Kai-Heng Feng 76a5af8417 nvme: explicitly disable APST on quirked devices
A user reports APST is enabled, even when the NVMe is quirked or with
option "default_ps_max_latency_us=0".

The current logic will not set APST if the device is quirked. But the
NVMe in question will enable APST automatically.

Separate the logic "apst is supported" and "to enable apst", so we can
use the latter one to explicitly disable APST at initialiaztion.

BugLink: https://bugs.launchpad.net/bugs/1699004
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Sagi Grimberg 7aa1f42752 nvme: use a single NVME_AQ_DEPTH and relax it to 32
No need to differentiate fabrics from pci/loop, also lower
it to 32 as we don't really need 256 inflight admin commands.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Johannes Thumshirn 6bfe04255d nvme: add hostid token to fabric options
Currently we have no way to define a stable host-id but always use the one
which is randomly generated when we add the host or use the default host.

Provide a "hostid=%s" for user-space to pass in a persistent host-id which
overrides the randomly generated one.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Keith Busch 3f7f25a910 nvme: Remove SCSI translations
The SCSI-to-NVMe translations were added to assist storage applications
utilizing SG_IO transitioning to NVMe. It was always recommended,
however, to use native NVMe for device management as too much is lost
in translation and the maintenance burden in keeping this kludgey
layer around has been neglected such that much of the translations are
completely broken.

This patch removes SG_IO handling from NVMe to avoid any confusion
regarding maintenance support for this interface. The config option for
NVMe SCSI emulation has been disabled by default since 4.5. The driver
has supported native nvme user commands since the beginning, and native
tooling is publicly available for use or as reference for anyone writing
their own tools, so there's no excuse for hanging onto a broken crutch.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Sagi Grimberg 442e19b7cc nvme-pci: open-code polling logic in nvme_poll
Given that the code is simple enough it seems better
then passing a tag by reference for each call site, also
we can now get rid of __nvme_process_cq.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Sagi Grimberg 920d13a884 nvme-pci: factor out the cqe reading mechanics from __nvme_process_cq
Also, maintain a consumed counter to rely on for doorbell and
cqe_seen update instead of directly relying on the cq head and phase.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Sagi Grimberg 83a12fb77b nvme-pci: factor out cqe handling into a dedicated routine
Makes the code slightly more readable.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Sagi Grimberg eb281c8283 nvme-pci: Introduce nvme_ring_cq_doorbell
Nice abstraction of the actual mechanics of how to do it.
Note the change that we call it after we assign nvmeq->cq_head
to avoid passing it.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:14:13 -06:00
Jens Axboe 5657cb0797 fs/fcntl: use copy_to/from_user() for u64 types
Some architectures (at least PPC) doesn't like get/put_user with
64-bit types on a 32-bit system. Use the variably sized copy
to/from user variants instead.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: c75b1d9421 ("fs: add fcntl() interface for setting/getting write life time hints")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28 08:09:45 -06:00
Suravee Suthikulpanit 84a21dbdef iommu/amd: Fix interrupt remapping when disable guest_mode
Pass-through devices to VM guest can get updated IRQ affinity
information via irq_set_affinity() when not running in guest mode.
Currently, AMD IOMMU driver in GA mode ignores the updated information
if the pass-through device is setup to use vAPIC regardless of guest_mode.
This could cause invalid interrupt remapping.

Also, the guest_mode bit should be set and cleared only when
SVM updates posted-interrupt interrupt remapping information.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Joerg Roedel <jroedel@suse.de>
Fixes: d98de49a53 ('iommu/amd: Enable vAPIC interrupt remapping mode by default')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 14:44:56 +02:00
Miklos Szeredi fbaf94ee3c ovl: don't set origin on broken lower hardlink
When copying up a file that has multiple hard links we need to break any
association with the origin file.  This makes copy-up be essentially an
atomic replace.

The new file has nothing to do with the old one (except having the same
data and metadata initially), so don't set the overlay.origin attribute.

We can relax this in the future when we are able to index upper object by
origin.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 3a1e819b4e ("ovl: store file handle of lower inode on copy up")
2017-06-28 13:41:22 +02:00
Miklos Szeredi e85f82ff9b ovl: copy-up: don't unlock between lookup and link
Nothing prevents mischief on upper layer while we are busy copying up the
data.

Move the lookup right before the looked up dentry is actually used.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 01ad3eb8a0 ("ovl: concurrent copy up of regular files")
Cc: <stable@vger.kernel.org> # v4.11
2017-06-28 13:41:22 +02:00
Takashi Iwai d94815f917 ALSA: hda - Fix endless loop of codec configure
azx_codec_configure() loops over the codecs found on the given
controller via a linked list.  The code used to work in the past, but
in the current version, this may lead to an endless loop when a codec
binding returns an error.

The culprit is that the snd_hda_codec_configure() unregisters the
device upon error, and this eventually deletes the given codec object
from the bus.  Since the list is initialized via list_del_init(), the
next object points to the same device itself.  This behavior change
was introduced at splitting the HD-audio code code, and forgotten to
adapt it here.

For fixing this bug, just use a *_safe() version of list iteration.

Fixes: d068ebc25e ("ALSA: hda - Move some codes up to hdac_bus struct")
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-06-28 12:10:05 +02:00
Daniel Stone 426ef1bb40 drm/etnaviv: Fix implicit/explicit sync sense inversion
We were reading the no-implicit sync flag the wrong way around,
synchronizing too much for the explicit case, and not at all for the
implicit case. Oops.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-28 10:35:53 +02:00
Lucas Stach f4a4381ba4 drm/etnaviv: fix submit flags getting overwritten by BO content
The addition of the flags member to etnaviv_gem_submit structure didn't
take into account that the last member of this structure is a variable
length array.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-06-28 10:35:46 +02:00
Dave Airlie 9ff1beb1d1 Merge tag 'drm-intel-fixes-2017-06-27' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
Just a few minor fixes. Important one is the execbuf async fix (aka
ANDROID_native_sync). There was another patch for a display coherency
corner case on APL, but we've random-walked in that space too much,
and the cherry-pick looked really invasive.

* tag 'drm-intel-fixes-2017-06-27' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations
  drm/i915: Hold struct_mutex for per-file stats in debugfs/i915_gem_object
  drm/i915: Retire the VMA's fence tracker before unbinding
2017-06-28 17:07:15 +10:00
Dave Airlie 5193c08c7e Merge branch 'vmwgfx-fixes-4.12' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Single vmwgfx fix
* 'vmwgfx-fixes-4.12' of git://people.freedesktop.org/~thomash/linux:
  drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
2017-06-28 17:06:58 +10:00
Hui Wang a8f20fd25b ALSA: hda - set input_path bitmap to zero after moving it to new place
Recently we met a problem, the codec has valid adcs and input pins,
and they can form valid input paths, but the driver does not build
valid controls for them like "Mic boost", "Capture Volume" and
"Capture Switch".

Through debugging, I found the driver needs to shrink the invalid
adcs and input paths for this machine, so it will move the whole
column bitmap value to the previous column, after moving it, the
driver forgets to set the original column bitmap value to zero, as a
result, the driver will invalidate the path whose index value is the
original colume bitmap value. After executing this function, all
valid input paths are invalidated by a mistake, there are no any
valid input paths, so the driver won't build controls for them.

Fixes: 3a65bcdc57 ("ALSA: hda - Fix inconsistent input_paths after ADC reduction")
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-06-28 07:09:19 +02:00
Trond Myklebust 2e31b4cb89 NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()
The current code works only for the case where we have exactly one slot,
which is no longer true.
nfs4_free_slot() will automatically declare the callback channel to be
drained when all slots have been returned.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-06-27 22:26:23 -04:00
Benjamin Coddington d9f2950006 Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind"
This reverts commit 920b4530fb which could
call d_move() without holding the directory's i_mutex, and reverts commit
d4ea7e3c5c "NFS: Fix old dentry rehash after
move", which was a follow-up fix.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 920b4530fb ("NFS: nfs_rename() handle -ERESTARTSYS dentry left behind")
Cc: stable@vger.kernel.org # v4.10+
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-06-27 21:58:14 -04:00
Trond Myklebust bd171930e6 NFSv4.1: Fix a race in nfs4_proc_layoutget
If the task calling layoutget is signalled, then it is possible for the
calls to nfs4_sequence_free_slot() and nfs4_layoutget_prepare() to race,
in which case we leak a slot.
The fix is to move the call to nfs4_sequence_free_slot() into the
nfs4_layoutget_release() so that it gets called at task teardown time.

Fixes: 2e80dbe7ac ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-06-27 21:44:58 -04:00
Trond Myklebust 898fc11bb2 NFS: Trunking detection should handle ERESTARTSYS/EINTR
Currently, it will return EIO in those cases.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-06-27 21:44:58 -04:00
Aleksandar Markovic ddbfff7429 MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately
If accumulator value is zero, just return the value of previously
calculated product. This brings logic in MADDF/MSUBF implementation
closer to the logic in ADD/SUB case.

Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Cc: James.Hogan@imgtec.com
Cc: Paul.Burton@imgtec.com
Cc: Raghu.Gandham@imgtec.com
Cc: Leonid.Yegoshin@imgtec.com
Cc: Douglas.Leung@imgtec.com
Cc: Petar.Jovanovic@imgtec.com
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16512/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-28 02:54:30 +02:00
Julia Lawall e9d5d4a0c1 drbd: Drop unnecessary static
Drop static on a local variable, when the variable is initialized before
any use, on every possible execution path through the function.  The
static has no benefit, and dropping it reduces the code size.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@bad exists@
position p;
identifier x;
type T;
@@

static T x@p;
...
x = <+...x...+>

@@
identifier x;
expression e;
type T;
position p != bad.p;
@@

-static
 T x@p;
 ... when != x
     when strict
?x = e;
// </smpl>

The change in code size is indicates by the following output from the size
command.

before:
   text    data     bss     dec     hex filename
  67299    2291    1056   70646   113f6 drivers/block/drbd/drbd_nl.o

after:
   text    data     bss     dec     hex filename
  67283    2291    1056   70630   113e6 drivers/block/drbd/drbd_nl.o

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-27 17:56:50 -06:00
Keith Busch ebef736857 nvme/pci: Fix stuck nvme reset
The controller state is set to resetting prior to disabling the
controller, so this patch accounts for that state when deciding if it
needs to freeze the queues. Without this, an 'nvme reset /dev/nvme0'
blocks forever because the queues were never frozen.

Fixes: 82b057caef ("nvme-pci: fix multiple ctrl removal scheduling")
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-27 17:44:05 -06:00
Karl Beldan 25d8b92e0a MIPS: head: Reorder instructions missing a delay slot
In this sequence the 'move' is assumed in the delay slot of the 'beq',
but head.S is in reorder mode and the former gets pushed one 'nop'
farther by the assembler.

The corrected behavior made booting with an UHI supplied dtb erratic.

Fixes: 15f37e1588 ("MIPS: store the appended dtb address in a variable")
Signed-off-by: Karl Beldan <karl.beldan+oss@gmail.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Jonas Gorski <jogo@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16614/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-27 23:35:21 +02:00