Commit graph

131 commits

Author SHA1 Message Date
Christoph Hellwig 45686b6198 nvme: fix max_segments integer truncation
The block layer uses an unsigned short for max_segments.  The way we
calculate the value for NVMe tends to generate very large 32-bit values,
which after integer truncation may lead to a zero value instead of
the desired outcome.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:43:10 -07:00
Christoph Hellwig da35825d9a nvme: set queue limits for the admin queue
Factor out a helper to set all the device specific queue limits and apply
them to the admin queue in addition to the I/O queues.  Without this the
command size on the admin queue is arbitrarily low, and the missing
other limitations are just minefields waiting for victims.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:50 -07:00
Keith Busch e9fc63d682 NVMe: Fix 0-length integrity payload
A user could send a passthrough IO command with a metadata pointer to a
namespace without metadata. With metadata length of 0, kmalloc returns
ZERO_SIZE_PTR. Since that is not NULL, the driver would have set this as
the bio's integrity payload, which causes an access fault on completion.

This patch ignores the users metadata buffer if the namespace format
does not support separate metadata.

Reported-by: Stephen Bates <stephen.bates@microsemi.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:50 -07:00
Keith Busch 63088ec7c8 NVMe: Don't allow unsupported flags
The command flags can change the meaning of other fields in the command
that the driver is not prepared to handle. Specifically, the user could
passthrough an SGL flag, causing the controller to misinterpret the PRP
list the driver created, potentially corrupting memory or data.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:50 -07:00
Keith Busch 69d9a99c25 NVMe: Move error handling to failed reset handler
This moves failed queue handling out of the namespace removal path and
into the reset failure path, fixing a hanging condition if the controller
fails or link down during del_gendisk. Previously the driver had to see
the controller as degraded prior to calling del_gendisk to setup the
queues to fail. But, if the controller happened to fail after this,
there was no task to end outstanding requests.

On failure, all namespace states are set to dead. This has capacity
revalidate to 0, and ends all new requests with error status.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:50 -07:00
Keith Busch f58944e265 NVMe: Simplify device reset failure
A reset failure schedules the device to unbind from the driver through
the pci driver's remove. This cleans up all intialization, so there is
no need to duplicate the potentially racy cleanup.

To help understand why a reset failed, the status is logged with the
existing warning message.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:49 -07:00
Keith Busch 646017a612 NVMe: Fix namespace removal deadlock
This patch makes nvme namespace removal lockless. It is up to the caller
to ensure no active namespace scanning is occuring. To ensure no scan
work occurs, the nvme pci driver adds a removing state to the controller
device to avoid queueing scan work during removal. The work is flushed
after setting the state, so no new scan work can be queued.

The lockless removal allows the driver to cleanup a namespace
request_queue if the controller fails during removal. Previously this
could deadlock trying to acquire the namespace mutex in order to handle
such events.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:49 -07:00
Keith Busch 075790ebba NVMe: Use IDA for namespace disk naming
A namespace may be detached from a controller, but a user may be holding
a reference to it. Attaching a new namespace with the same NSID will create
duplicate names when using the NSID to name the disk.

This patch uses an IDA that is released only when the last reference is
released instead of using the namespace ID.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:49 -07:00
Keith Busch b00a726a9f NVMe: Don't unmap controller registers on reset
Unmapping the registers on reset or shutdown is not necessary. Keeping
the mapping simplifies reset handling.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 14:42:49 -07:00
Keith Busch f8e68a7c9a NVMe: Rate limit nvme IO warnings
We don't need to spam the kernel logs with thousands of IO cancelling
messages. We can infer all IO's are being cancelled with fewer, or
even none at all. This patch rate limits the message and uses the debug
log level as it is mainly used for testing purposes.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-12 08:10:31 -07:00
Keith Busch ff23a2a15a NVMe: Poll device while still active during remove
A device failure or link down wouldn't have been detected during namespace
removal. This patch keeps the device in the list for polling so that the
thread may see such failure and initiate a reset. The device is removed
from the list after disable, so we can safely flush the reset work as
it can't be requeued when disable completes.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-12 08:10:16 -07:00
Keith Busch ae1fba2001 NVMe: Requeue requests on suspended queues
It's possible a request may get to the driver after the nvme queue was
disabled. This has the request requeue if that happens.

Note the request is still "started" by the driver, but requeuing will
clear the start state for timeout handling.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-12 08:10:08 -07:00
Keith Busch ef2d4615c5 NVMe: Allow request merges
It is generally more efficient to submit larger IO.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-11 13:14:03 -07:00
Keith Busch 4f76d0e498 NVMe: Fix io incapable return values
The function returns true when the controller can't handle IO.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-11 13:14:02 -07:00
Christoph Hellwig 21d147880e nvme: fix Kconfig description for BLK_DEV_NVME_SCSI
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-09 10:21:22 -07:00
Matias Bjørling 6dde1d6c90 lightnvm: check overflow and correct mlc pairs
The specification currently limits the number of MLC pairs to 886. Make
sure that a device is unable to be instantiate if more is configured.

Also, previously the patch had the wrong math for copying MLC pairs, as
it only copied half of the actual entries.

Fixes: ca5927e7ab "lightnvm: introduce mlc lower page table mappings"
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-04 09:19:45 -07:00
Linus Torvalds 3e1e21c7bf Merge branch 'for-4.5/nvme' of git://git.kernel.dk/linux-block
Pull NVMe updates from Jens Axboe:
 "Last branch for this series is the nvme changes.  It's in a separate
  branch to avoid splitting too much between core and NVMe changes,
  since NVMe is still helping drive some blk-mq changes.  That said, not
  a huge amount of core changes in here.  The grunt of the work is the
  continued split of the code"

* 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits)
  uapi: update install list after nvme.h rename
  NVMe: Export NVMe attributes to sysfs group
  NVMe: Shutdown controller only for power-off
  NVMe: IO queue deletion re-write
  NVMe: Remove queue freezing on resets
  NVMe: Use a retryable error code on reset
  NVMe: Fix admin queue ring wrap
  nvme: make SG_IO support optional
  nvme: fixes for NVME_IOCTL_IO_CMD on the char device
  nvme: synchronize access to ctrl->namespaces
  nvme: Move nvme_freeze/unfreeze_queues to nvme core
  PCI/AER: include header file
  NVMe: Export namespace attributes to sysfs
  NVMe: Add pci error handlers
  block: remove REQ_NO_TIMEOUT flag
  nvme: merge iod and cmd_info
  nvme: meta_sg doesn't have to be an array
  nvme: properly free resources for cancelled command
  nvme: simplify completion handling
  nvme: special case AEN requests
  ...
2016-01-21 19:58:02 -08:00
Linus Torvalds 0a13daedf7 Merge branch 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block
Pull lightnvm fixes and updates from Jens Axboe:
 "This should have been part of the drivers branch, but it arrived a bit
  late and wasn't based on the official core block driver branch.  So
  they got a small scolding, but got a pass since it's still new.  Hence
  it's in a separate branch.

  This is mostly pure fixes, contained to lightnvm/, and minor feature
  additions"

* 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block: (26 commits)
  lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM
  lightnvm: introduce factory reset
  lightnvm: use system block for mm initialization
  lightnvm: introduce ioctl to initialize device
  lightnvm: core on-disk initialization
  lightnvm: introduce mlc lower page table mappings
  lightnvm: add mccap support
  lightnvm: manage open and closed blocks separately
  lightnvm: fix missing grown bad block type
  lightnvm: reference rrpc lun in rrpc block
  lightnvm: introduce nvm_submit_ppa
  lightnvm: move rq->error to nvm_rq->error
  lightnvm: support multiple ppas in nvm_erase_ppa
  lightnvm: move the pages per block check out of the loop
  lightnvm: sectors first in ppa list
  lightnvm: fix locking and mempool in rrpc_lun_gc
  lightnvm: put block back to gc list on its reclaim fail
  lightnvm: check bi_error in gc
  lightnvm: return the get_bb_tbl return value
  lightnvm: refactor end_io functions for sync
  ...
2016-01-21 19:01:55 -08:00
Linus Torvalds 7c24d9f3b2 Merge branch 'for-4.5/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
 "We don't have a lot of core changes this time around, it's mostly in
  drivers, which will come in a subsequent pull.

  The cores changes include:

   - blk-mq
        - Prep patch from Christoph, changing blk_mq_alloc_request() to
          take flags instead of just using gfp_t for sleep/nosleep.
        - Doc patch from me, clarifying the difference between legacy
          and blk-mq for timer usage.
        - Fixes from Raghavendra for memory-less numa nodes, and a reuse
          of CPU masks.

   - Cleanup from Geliang Tang, using offset_in_page() instead of open
     coding it.

   - From Ilya, rename request_queue slab to it reflects what it holds,
     and a fix for proper use of bdgrab/put.

   - A real fix for the split across stripe boundaries from Keith.  We
     yanked a broken version of this from 4.4-rc final, this one works.

   - From Mike Krinkin, emit a trace message when we split.

   - From Wei Tang, two small cleanups, not explicitly clearing memory
     that is already cleared"

* 'for-4.5/core' of git://git.kernel.dk/linux-block:
  block: use bd{grab,put}() instead of open-coding
  block: split bios to max possible length
  block: add call to split trace point
  blk-mq: Avoid memoryless numa node encoded in hctx numa_node
  blk-mq: Reuse hardware context cpumask for tags
  blk-mq: add a flags parameter to blk_mq_alloc_request
  Revert "blk-flush: Queue through IO scheduler when flush not required"
  block: clarify blk_add_timer() use case for blk-mq
  bio: use offset_in_page macro
  block: do not initialise statics to 0 or NULL
  block: do not initialise globals to 0 or NULL
  block: rename request_queue slab cache
2016-01-19 15:03:34 -08:00
Keith Busch 779ff75617 NVMe: Export NVMe attributes to sysfs group
Adds all controller information to attribute list exposed to sysfs, and
appends the reset_controller attribute to it. The nvme device is created
with this attribute list, so driver no long manages its attributes.

Reported-by: Sujith Pandel <sujithpshankar@gmail.com>
Cc: Sujith Pandel <sujithpshankar@ gmail.com>
Cc: David Milburn <dmilburn@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 15:11:54 -07:00
Keith Busch a5cdb68c2c NVMe: Shutdown controller only for power-off
We don't need to shutdown a controller for a reset. A controller in a
shutdown state may take longer to become ready than one that was simply
disabled. This patch has the driver shut down a controller only if the
device is about to be powered off or being removed. When taking the
controller down for a reset reason, the controller will be disabled
instead.

Function names have been updated in this patch to reflect their changed
semantics.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 14:47:59 -07:00
Keith Busch db3cbfff5b NVMe: IO queue deletion re-write
The nvme driver deletes IO queues asynchronously since this operation
may potentially take an undesirable amount of time with a large number
of queues if done serially.

The driver used to manage coordinating asynchronous deletions. This
patch simplifies that by leveraging the block layer rather than using
kthread workers and chaining more complicated callbacks.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 14:47:56 -07:00
Keith Busch 25646264e1 NVMe: Remove queue freezing on resets
NVMe submits all commands through the block layer now. This means we
can let requests queue at the blk-mq hardware context since there is no
path that bypasses this anymore so we don't need to freeze the queues
anymore. The driver can simply stop the h/w queues from running during
a reset instead.

This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen
with requeued requests.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 13:33:36 -07:00
Keith Busch 1d49c38c48 NVMe: Use a retryable error code on reset
A negative status has the "do not retry" bit set, which makes it not
retryable.  Use a fake status that can potentially be retried on reset.

An aborted command's status is overridden by the timeout handler so
that it won't be retried, which is necessary to keep initialization from
getting into a reset loop.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 13:33:35 -07:00
Keith Busch e3e9d50cd6 NVMe: Fix admin queue ring wrap
The tag set queue depth needs to be one less than the h/w queue depth
so we don't wrap the circular buffer. This conforms to the specification
defined "Full Queue" condition.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 13:33:33 -07:00
Christoph Hellwig 4490733250 nvme: make SG_IO support optional
Translation SCSI commands to NVMe commands is rather pointless in general
as applications must not expext to be able to use SCSI commands on a
generic block device.

Make the huge translation layer optional and hope no one will ever enable
it in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 13:30:16 -07:00
Christoph Hellwig bfd8947194 nvme: fixes for NVME_IOCTL_IO_CMD on the char device
Make sure we synchronize access to the namespaces list and grab a reference
to the namespace before doing I/O.  Make sure to reject the ioctl if multiple
namespaces are present as it's entirely unsafe, and warn when using it even
with a single namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 13:30:15 -07:00
Christoph Hellwig 69d3b8ac15 nvme: synchronize access to ctrl->namespaces
Currently traversal and modification of ctrl->namespaces happens completely
unsynchronized, which can be fixed by the addition of a simple mutex.

Note: nvme_dev_ioctl will be handled in the next patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 13:30:13 -07:00
Sagi Grimberg 363c9aacb6 nvme: Move nvme_freeze/unfreeze_queues to nvme core
Nothing pci specific about them and We'll need them exported
in other transports too.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 13:30:11 -07:00
Matias Bjørling ca5927e7ab lightnvm: introduce mlc lower page table mappings
NAND MLC memories have both lower and upper pages. When programming,
both of these must be written, before data can be read. However,
these lower and upper pages might not placed at even and odd flash
pages, but can be skipped. Therefore each flash memory has its lower
pages defined, which can then be used when programming and to know when
padding are necessary.

This patch implements the lower page definition in the specification,
and exposes it through a simple lookup table at dev->lptbl.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 08:21:17 -07:00
Matias Bjørling 22513215b8 lightnvm: return the get_bb_tbl return value
During get_bb_tbl, a callback is used to allow an user-specific scan
function to be called. The callback may return an error, and in that
case, the return value is overridden. However, the callback error is
needed when the fault is a user error and not a kernel error. For
example, when a user tries to initialize the same device twice. The
get_bb_tbl callback should be able to communicate this.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 08:21:16 -07:00
Matias Bjørling 91276162de lightnvm: refactor end_io functions for sync
To implement sync I/O support within the LightNVM core, the end_io
functions are refactored to take an end_io function pointer instead of
testing for initialized media manager, followed by calling its end_io
function.

Sync I/O can then be implemented using a callback that signal I/O
completion. This is similar to the logic found in blk_to_execute_io().
By implementing it this way, the underlying device I/Os submission logic
is abstracted away from core, targets, and media managers.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 08:21:16 -07:00
Keith Busch b5875222de NVMe: IO ending fixes on surprise removal
This patch fixes a lost request discovered during IO + hot removal.

The driver's pci removal deletes gendisks prior to shutting down the
controller to allow dirty data to sync. Dirty data can not be synced on
a surprise removal, though, and would potentially block indefinitely.

The driver previously had marked the queue as dying in this scenario
to prevent new requests from attempting, however it will still block
for requests that already entered the queue. This patch fixes this by
quiescing IO first, then aborting the requeued requests before deleting
disks.

Reported-by: Sujith Pandel <sujith_pandel@dell.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Sujith Pandel <sujith_pandel@dell.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 10:12:04 -07:00
Keith Busch 2b9b6e86bc NVMe: Export namespace attributes to sysfs
Exposes the NGUID, EUI-64, and NSID to sysfs entries under the disk's
kobject.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 10:10:45 -07:00
Keith Busch a0a3408ee6 NVMe: Add pci error handlers
Requests enabling pcie aer support. Shuts down the controller on error
detected with io frozen state prior to requesting slot reset; resumes
controller after reset completes.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 10:09:06 -07:00
Christoph Hellwig f4800d6d15 nvme: merge iod and cmd_info
Merge the two per-request structures in the nvme driver into a single
one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Christoph Hellwig bf68405705 nvme: meta_sg doesn't have to be an array
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Christoph Hellwig eee417b069 nvme: properly free resources for cancelled command
We need to move freeing of resources to the ->complete handler to ensure
they are also freed when we cancel the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Christoph Hellwig aae239e191 nvme: simplify completion handling
Now that all commands are executed as block layer requests we can remove the
internal completion in the NVMe driver.  Note that we can simply call
blk_mq_complete_request to abort commands as the block layer will protect
against double copletions internally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Christoph Hellwig adf68f21c1 nvme: special case AEN requests
AEN requests are different from other requests in that they don't time out
or can easily be cancelled.  Because of that we should not use the blk-mq
infrastructure but just special case them in the completion path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Christoph Hellwig e7a2a87d59 nvme: switch abort to blk_execute_rq_nowait
And remove the now unused nvme_submit_cmd helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Christoph Hellwig d8f32166a9 nvme: switch delete SQ/CQ to blk_execute_rq_nowait
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Christoph Hellwig 7688faa6dd nvme: factor out a few helpers from req_completion
We'll need them in other places later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Christoph Hellwig 4680072003 nvme: fix admin queue depth
The number in tag_set->queue depth includes the reserved tags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Keith Busch 4b9d5b1510 NVMe: Simplify metadata setup
We no longer require the two-pass setup for block integrity.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:34 -07:00
Keith Busch 53029b0441 NVMe: Remove device management handles on remove
We don't want to allow new references to open on a device that is
removed. This ties the lifetime of these handles to the physical device's
presence rather than to the open reference count.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:33 -07:00
Keith Busch 92f7a1624b NVMe: Use unbounded work queue for all work
Removes all usage of the global work queue so work can't be
scheduled on two different work queues, and removes nvme's work queue
singlethreadedness so controllers can be driven in parallel.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: keep the dead controller removal on the system workqueue to avoid
 deadlocks]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:33 -07:00
Keith Busch 540c801c65 NVMe: Implement namespace list scanning
The NVMe 1.1 specification provides an identify mode to return a
list of active namespaces. This is more efficient to discover which
namespace identifiers are active on a controller, providing potentially
significant improvement in scan time for controllers with sparesly
populated namespaces.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: add quirk for the broken Qemu Identify implementation.  To be relaxed
 later]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:33 -07:00
Christoph Hellwig 6bf25d1641 nvme: switch abort_limit to an atomic_t
There is no lock to sychronize access to the abort_limit field of
struct nvme_ctrl, so switch it to an atomic_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:33 -07:00
Christoph Hellwig 5c8809e650 nvme: remove dead controllers from a work item
Compared to the kthread this gives us multiple call prevention for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 09:38:33 -07:00