1
0
Fork 0
Commit Graph

411 Commits (47ca23c1176a9e499d4652b33e398a7ddb0bef47)

Author SHA1 Message Date
Gustavo A. R. Silva 3b7dffb971 nvme-pci: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warning:

drivers/nvme/host/pci.c: In function ‘nvme_timeout’:
drivers/nvme/host/pci.c:1298:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   shutdown = true;
   ~~~~~~~~~^~~~~~
drivers/nvme/host/pci.c:1299:2: note: here
  case NVME_CTRL_CONNECTING:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-05-13 16:00:03 +02:00
Maxim Levitsky e8fd41bb3c nvme-pci: init shadow doorbell after each reset
The spec states:

  "The settings are not retained across a Controller Level Reset"

Therefore the driver must enable the shadow doorbell, after each reset.

This was caught while testing the nvme driver over upcoming nvme-mdev
device.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-05-13 16:00:03 +02:00
Christoph Hellwig 811015409f nvme: move command size checks to the core
Most command aren't PCIe specific, so move the size checking for them
to core.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-05-01 09:18:36 -04:00
Minwoo Im a97234e1ff nvme-pci: check more command sizes
All the NVMe command has 64bytes fixed size so that it has been assured
with BUILD_BUG_ON().  The remaining command structures in linux/nvme.h
also need to be checked here.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-05-01 09:17:27 -04:00
Minwoo Im 665648673e nvme-pci: remove an unneeded variable initialization
Variable "n" will be assigned once kstrtoint() succeeds, otherwise it
will not be referred because kstrtoint() will return an error which
means go out from this function.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-05-01 09:17:27 -04:00
Keith Busch c8e9e9b764 nvme-pci: unquiesce admin queue on shutdown
Just like IO queues, the admin queue also will not be restarted after a
controller shutdown. Unquiesce this queue so that we do not block
request dispatch on a permanently disabled controller.

Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-05-01 09:17:22 -04:00
Keith Busch 9dc1a38ef1 nvme-pci: shutdown on timeout during deletion
We do not restart a controller in a deleting state for timeout errors.
When in this state, unblock potential request dispatchers with failed
completions by shutting down the controller on timeout detection.

Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-05-01 09:17:18 -04:00
Klaus Birkelund Jensen 049bf37262 nvme-pci: fix psdt field for single segment sgls
The shortcut for single segment SGL requests did not set the PSDT field
to mark the request as using SGLs.

Fixes: 297910571f ("nvme-pci: optimize mapping single segment requests using SGLs")
Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-05-01 09:17:16 -04:00
Christoph Hellwig 70479b71bc nvme-pci: tidy up nvme_map_data
Remove two pointless local variables, remove ret assignment that is
never used, move the use_sgl initialization closer to where it is used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:58 +02:00
Christoph Hellwig 297910571f nvme-pci: optimize mapping single segment requests using SGLs
If the controller supports SGLs we can take another short cut for single
segment request, given that we can always map those without another
indirection structure, and thus don't need to create a scatterlist
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:58 +02:00
Christoph Hellwig dff824b2aa nvme-pci: optimize mapping of small single segment requests
If a request is single segment and fits into one or two PRP entries we
do not have to create a scatterlist for it, but can just map the bio_vec
directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:58 +02:00
Christoph Hellwig d43f1ccfad nvme-pci: remove the inline scatterlist optimization
We'll have a better way to optimize for small I/O that doesn't
require it soon, so remove the existing inline_sg case to make that
optimization easier to implement.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:58 +02:00
Christoph Hellwig 4aedb70543 nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data
This prepares for some bigger changes to the data mapping helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:58 +02:00
Christoph Hellwig 783b94bd92 nvme-pci: do not build a scatterlist to map metadata
We always have exactly one segment, so we can simply call dma_map_bvec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:57 +02:00
Christoph Hellwig b15c592de3 nvme-pci: only call nvme_unmap_data for requests transferring data
This mirrors how nvme_map_pci is called and will allow simplifying some
checks in nvme_unmap_pci later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:57 +02:00
Christoph Hellwig 7fe07d14f7 nvme-pci: merge nvme_free_iod into nvme_unmap_data
This means we now have a function that undoes everything nvme_map_data
does and we can simplify the error handling a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:57 +02:00
Christoph Hellwig 915f04c93d nvme-pci: move the call to nvme_cleanup_cmd out of nvme_unmap_data
Cleaning up the command setup isn't related to unmapping data, and
disentangling them will simplify error handling a bit down the road.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:57 +02:00
Christoph Hellwig 9b048119a1 nvme-pci: remove nvme_init_iod
nvme_init_iod should really be split into two parts: initialize a few
general iod fields, which can easily be done at the beginning of
nvme_queue_rq, and allocating the scatterlist if needed, which logically
belongs into nvme_map_data with the code making use of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2019-04-05 08:07:57 +02:00
Keith Busch 39f8e36401 nvme-pci: remove unused nvme_iod member
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-04-05 08:07:57 +02:00
Keith Busch 88a041f4c1 nvme-pci: remove q_dmadev from nvme_queue
We don't need to save the dma device as it's not used in the hot path
and hasn't in a long time. Shrink the struct nvme_queue removing this
unnecessary member.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-04-05 08:07:57 +02:00
Keith Busch 7c349dde26 nvme-pci: use a flag for polled queues
A negative value for the cq_vector used to mean the queue is either
disabled or a polled queue. However, we have a queue enabled flag,
so the cq_vector had been serving double duty.

Don't overload the meaning of cq_vector. Use a flag specific to the
polled queues instead.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-04-05 08:07:57 +02:00
Linus Torvalds 11efae3506 for-5.1/block-post-20190315
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlyL124QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptsxD/42slmoE5TC3vwXcgMBEilrjIHCns6O4Leo
 0r8Awdwil8QkVDphfAWsgkTBjRPUNKv4cCg2kG4VEzAy62YSutUWPeqJZwLOpGDI
 kji9XI6WLqwQ/VhDFwEln9G+xWDUQxds5PZDomlzLpjiNqkFArwwsPFnJbshH4fB
 U6kZrhVSLfvJHIJmC9H4RIWuTEwUH1yFSvzzMqDOOyvRon2g/A2YlHb2KhSCaJPq
 1b0jbhyR0GVP0EH1FdeKvNYFZfvXXSPAbxDN1CEtW/Lq8WxXeoaCj390tC+gL7yQ
 WWHntvUoVU/weWudbT3tVsYgpI91KfPM5OuWTDGod6lFwHrI5X91Pao3KYUGPb9d
 cwvNBOlkNqR1ENZOGTgxLeKwiwV7G1DIjvsaijRQJhGy4Uw4RkM/YEct9JHxWBIF
 x4ZuSVUVZ5Y3zNPC945iJ6Z5feOz/UO9bQL00oimu0c0JhAp++3pHWAFJEMQ8q1a
 0IRifkeUyhf0p9CIVPDnUzmNgSBglFkAVTPVAWySBVDU+v0/GoNcYwTzPq4cgPrF
 UJEIlx+RdDpKKmCqBvKjtx4w7BC1lCebL/1ZJrbARNO42djt8xeuyvKw0t+MYVTZ
 UsvLX72tXwUIbj0IZZGuz+8uSGD4ddDs8+x486FN4oaCPf36FUnnkOZZkhjV/KQA
 vsZNrNNZpw==
 =qBae
 -----END PGP SIGNATURE-----

Merge tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block

Pull more block layer changes from Jens Axboe:
 "This is a collection of both stragglers, and fixes that came in after
  I finalized the initial pull. This contains:

   - An MD pull request from Song, with a few minor fixes

   - Set of NVMe patches via Christoph

   - Pull request from Konrad, with a few fixes for xen/blkback

   - pblk fix IO calculation fix (Javier)

   - Segment calculation fix for pass-through (Ming)

   - Fallthrough annotation for blkcg (Mathieu)"

* tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block: (25 commits)
  blkcg: annotate implicit fall through
  nvme-tcp: support C2HData with SUCCESS flag
  nvmet: ignore EOPNOTSUPP for discard
  nvme: add proper write zeroes setup for the multipath device
  nvme: add proper discard setup for the multipath device
  nvme: remove nvme_ns_config_oncs
  nvme: disable Write Zeroes for qemu controllers
  nvmet-fc: bring Disconnect into compliance with FC-NVME spec
  nvmet-fc: fix issues with targetport assoc_list list walking
  nvme-fc: reject reconnect if io queue count is reduced to zero
  nvme-fc: fix numa_node when dev is null
  nvme-fc: use nr_phys_segments to determine existence of sgl
  nvme-loop: init nvmet_ctrl fatal_err_work when allocate
  nvme: update comment to make the code easier to read
  nvme: put ns_head ref if namespace fails allocation
  nvme-trace: fix cdw10 buffer overrun
  nvme: don't warn on block content change effects
  nvme: add get-feature to admin cmds tracer
  md: Fix failed allocation of md_register_thread
  It's wrong to add len to sector_nr in raid10 reshape twice
  ...
2019-03-16 12:36:39 -07:00
Christoph Hellwig 7b210e4ed5 nvme: disable Write Zeroes for qemu controllers
Qemu started out with a broken implementation of Write Zeroes written
by yours truly.  Disable Write Zeroes on qemu for now, eventually
we need to go back and make all the qemu quirks version specific,
but that is left for another time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-13 12:57:34 -06:00
Linus Torvalds 80201fe175 for-5.1/block-20190302
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlx63XIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpp2vEACfrrQsap7R+Av28mmXpmXi2FPa3g5Tev1t
 yYjK2qHvhlMZjPTYw3hCmbYdDDczlF7PEgSE2x2DjdcsYapb8Fy1lZ2X16c7ztBR
 HD/t9b5AVSQsczZzKgv3RqsNtTnjzS5V0A8XH8FAP2QRgiwDMwSN6G0FP0JBLbE/
 ZgxQrH1Iy1F33Wz4hI3Z7dEghKPZrH1IlegkZCEu47q9SlWS76qUetSy2GEtchOl
 3Lgu54mQZyVdI5/QZf9DyMDLF6dIz3tYU2qhuo01AHjGRCC72v86p8sIiXcUr94Q
 8pbegJhJ/g8KBol9Qhv3+pWG/QUAZwi/ZwasTkK+MJ4klRXfOrznxPubW1z6t9Vn
 QRo39Po5SqqP0QWAscDxCFjESIQlWlKa+LZurJL7DJDCUGrSgzTpnVwFqKwc5zTP
 HJa5MT2tEeL2TfUYRYCfh0ZV0elINdHA1y1klDBh38drh4EWr2gW8xdseGYXqRjh
 fLgEpoF7VQ8kTvxKN+E4jZXkcZmoLmefp0ZyAbblS6IawpPVC7kXM9Fdn2OU8f2c
 fjVjvSiqxfeN6dnpfeLDRbbN9894HwgP/LPropJOQ7KmjCorQq5zMDkAvoh3tElq
 qwluRqdBJpWT/F05KweY+XVW8OawIycmUWqt6JrVNoIDAK31auHQv47kR0VA4OvE
 DRVVhYpocw==
 =VBaU
 -----END PGP SIGNATURE-----

Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
 "Not a huge amount of changes in this round, the biggest one is that we
  finally have Mings multi-page bvec support merged. Apart from that,
  this pull request contains:

   - Small series that avoids quiescing the queue for sysfs changes that
     match what we currently have (Aleksei)

   - Series of bcache fixes (via Coly)

   - Series of lightnvm fixes (via Mathias)

   - NVMe pull request from Christoph. Nothing major, just SPDX/license
     cleanups, RR mp policy (Hannes), and little fixes (Bart,
     Chaitanya).

   - BFQ series (Paolo)

   - Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
     for the fast path (Jianchao)

   - fops->iopoll() added for async IO polling, this is a feature that
     the upcoming io_uring interface will use (Christoph, me)

   - Partition scan loop fixes (Dongli)

   - mtip32xx conversion from managed resource API (Christoph)

   - cdrom registration race fix (Guenter)

   - MD pull from Song, two minor fixes.

   - Various documentation fixes (Marcos)

   - Multi-page bvec feature. This brings a lot of nice improvements
     with it, like more efficient splitting, larger IOs can be supported
     without growing the bvec table size, and so on. (Ming)

   - Various little fixes to core and drivers"

* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
  block: fix updating bio's front segment size
  block: Replace function name in string with __func__
  nbd: propagate genlmsg_reply return code
  floppy: remove set but not used variable 'q'
  null_blk: fix checking for REQ_FUA
  block: fix NULL pointer dereference in register_disk
  fs: fix guard_bio_eod to check for real EOD errors
  blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
  block: optimize bvec iteration in bvec_iter_advance
  block: introduce mp_bvec_for_each_page() for iterating over page
  block: optimize blk_bio_segment_split for single-page bvec
  block: optimize __blk_segment_map_sg() for single-page bvec
  block: introduce bvec_nth_page()
  iomap: wire up the iopoll method
  block: add bio_set_polled() helper
  block: wire up block device iopoll method
  fs: add an iopoll method to struct file_operations
  loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
  loop: do not print warn message if partition scan is successful
  block: bounce: make sure that bvec table is updated
  ...
2019-03-08 14:12:17 -08:00
Linus Torvalds 78f8601354 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The interrupt departement delivers this time:

   - New infrastructure to manage NMIs on platforms which have a sane
     NMI delivery, i.e. identifiable NMI vectors instead of a single
     lump.

   - Simplification of the interrupt affinity management so drivers
     don't have to implement ugly loops around the PCI/MSI enablement.

   - Speedup for interrupt statistics in /proc/stat

   - Provide a function to retrieve the default irq domain

   - A new interrupt controller for the Loongson LS1X platform

   - Affinity support for the SiFive PLIC

   - Better support for the iMX irqsteer driver

   - NUMA aware memory allocations for GICv3

   - The usual small fixes, improvements and cleanups all over the
     place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  irqchip/imx-irqsteer: Add multi output interrupts support
  irqchip/imx-irqsteer: Change to use reg_num instead of irq_group
  dt-bindings: irq: imx-irqsteer: Add multi output interrupts support
  dt-binding: irq: imx-irqsteer: Use irq number instead of group number
  irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code
  irqchip/gicv3-its: Use NUMA aware memory allocation for ITS tables
  irqdomain: Allow the default irq domain to be retrieved
  irqchip/sifive-plic: Implement irq_set_affinity() for SMP host
  irqchip/sifive-plic: Differentiate between PLIC handler and context
  irqchip/sifive-plic: Add warning in plic_init() if handler already present
  irqchip/sifive-plic: Pre-compute context hart base and enable base
  PCI/MSI: Remove obsolete sanity checks for multiple interrupt sets
  genirq/affinity: Remove the leftovers of the original set support
  nvme-pci: Simplify interrupt allocation
  genirq/affinity: Add new callback for (re)calculating interrupt sets
  genirq/affinity: Store interrupt sets size in struct irq_affinity
  genirq/affinity: Code consolidation
  irqchip/irq-sifive-plic: Check and continue in case of an invalid cpuid.
  irqchip/i8259: Fix shutdown order by moving syscore_ops registration
  dt-bindings: interrupt-controller: loongson ls1x intc
  ...
2019-03-05 12:21:47 -08:00
Christoph Hellwig 5f37396dff nvme-pci: convert to SPDX identifiers
Update license to use SPDX-License-Identifier instead of verbose license
text.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2019-02-20 07:22:25 -07:00
Bart Van Assche e895fedf12 nvme-pci: check kstrtoint() return value in queue_count_set()
This patch avoids that the compiler complains about 'ret' being set
but not being used when building with W=1.

Fixes: 3b6592f70a ("nvme: utilize two queue maps, one for reads and one for writes") # v5.0-rc1
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-02-20 07:18:07 -07:00
Ming Lei 612b72862b nvme-pci: Simplify interrupt allocation
The NVME PCI driver contains a tedious mechanism for interrupt
allocation, which is necessary to adjust the number and size of interrupt
sets to the maximum available number of interrupts which depends on the
underlying PCI capabilities and the available CPU resources.

It works around the former short comings of the PCI and core interrupt
allocation mechanims in combination with interrupt sets.

The PCI interrupt allocation function allows to provide a maximum and a
minimum number of interrupts to be allocated and tries to allocate as
many as possible. This worked without driver interaction as long as there
was only a single set of interrupts to handle.

With the addition of support for multiple interrupt sets in the generic
affinity spreading logic, which is invoked from the PCI interrupt
allocation, the adaptive loop in the PCI interrupt allocation did not
work for multiple interrupt sets. The reason is that depending on the
total number of interrupts which the PCI allocation adaptive loop tries
to allocate in each step, the number and the size of the interrupt sets
need to be adapted as well. Due to the way the interrupt sets support was
implemented there was no way for the PCI interrupt allocation code or the
core affinity spreading mechanism to invoke a driver specific function
for adapting the interrupt sets configuration.

As a consequence the driver had to implement another adaptive loop around
the PCI interrupt allocation function and calling that with maximum and
minimum interrupts set to the same value. This ensured that the
allocation either succeeded or immediately failed without any attempt to
adjust the number of interrupts in the PCI code.

The core code now allows drivers to provide a callback to recalculate the
number and the size of interrupt sets during PCI interrupt allocation,
which in turn allows the PCI interrupt allocation function to be called
in the same way as with a single set of interrupts. The PCI code handles
the adaptive loop and the interrupt affinity spreading mechanism invokes
the driver callback to adapt the interrupt set configuration to the
current loop value. This replaces the adaptive loop in the driver
completely.

Implement the NVME specific callback which adjusts the interrupt sets
configuration and remove the adaptive allocation loop.

[ tglx: Simplify the callback further and restore the dropped adjustment of
  	number of sets ]

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
Link: https://lkml.kernel.org/r/20190216172228.602546658@linutronix.de
2019-02-18 11:21:28 +01:00
Ming Lei 9cfef55bb5 genirq/affinity: Store interrupt sets size in struct irq_affinity
The interrupt affinity spreading mechanism supports to spread out
affinities for one or more interrupt sets. A interrupt set contains one
or more interrupts. Each set is mapped to a specific functionality of a
device, e.g. general I/O queues and read I/O queus of multiqueue block
devices.

The number of interrupts per set is defined by the driver. It depends on
the total number of available interrupts for the device, which is
determined by the PCI capabilites and the availability of underlying CPU
resources, and the number of queues which the device provides and the
driver wants to instantiate.

The driver passes initial configuration for the interrupt allocation via
a pointer to struct irq_affinity.

Right now the allocation mechanism is complex as it requires to have a
loop in the driver to determine the maximum number of interrupts which
are provided by the PCI capabilities and the underlying CPU resources.
This loop would have to be replicated in every driver which wants to
utilize this mechanism. That's unwanted code duplication and error
prone.

In order to move this into generic facilities it is required to have a
mechanism, which allows the recalculation of the interrupt sets and
their size, in the core code. As the core code does not have any
knowledge about the underlying device, a driver specific callback will
be added to struct affinity_desc, which will be invoked by the core
code. The callback will get the number of available interupts as an
argument, so the driver can calculate the corresponding number and size
of interrupt sets.

To support this, two modifications for the handling of struct irq_affinity
are required:

1) The (optional) interrupt sets size information is contained in a
   separate array of integers and struct irq_affinity contains a
   pointer to it.

   This is cumbersome and as the maximum number of interrupt sets is small,
   there is no reason to have separate storage. Moving the size array into
   struct affinity_desc avoids indirections and makes the code simpler.

2) At the moment the struct irq_affinity pointer which is handed in from
   the driver and passed through to several core functions is marked
   'const'.

   With the upcoming callback to recalculate the number and size of
   interrupt sets, it's necessary to remove the 'const'
   qualifier. Otherwise the callback would not be able to update the data.

Implement #1 and store the interrupt sets size in 'struct irq_affinity'.

No functional change.

[ tglx: Fixed the memcpy() size so it won't copy beyond the size of the
  	source. Fixed the kernel doc comments for struct irq_affinity and
  	de-'This patch'-ed the changelog ]

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
Link: https://lkml.kernel.org/r/20190216172228.423723127@linutronix.de
2019-02-18 11:21:27 +01:00
Keith Busch 4726bcf30f nvme-pci: add missing unlock for reset error
The reset work holds a mutex to prevent races with removal modifying the
same resources, but was unlocking only on success. Unlock on failure
too.

Fixes: 5c959d73db ("nvme-pci: fix rapid add remove sequence")
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-02-12 09:29:07 +01:00
Keith Busch 5c959d73db nvme-pci: fix rapid add remove sequence
A surprise removal may fail to tear down request queues if it is racing
with the initial asynchronous probe. If that happens, the remove path
won't see the queue resources to tear down, and the controller reset
path may create a new request queue on a removed device, but will not
be able to make forward progress, deadlocking the pci removal.

Protect setting up non-blocking resources from a shutdown by holding the
same mutex, and transition to the CONNECTING state after these resources
are initialized so the probe path may see the dead controller state
before dispatching new IO.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202081
Reported-by: Alex Gagniuc <Alex_Gagniuc@Dellteam.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Alex Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-02-06 16:35:33 +01:00
Linus Torvalds 0facb89245 for-linus-20190118
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxCLxcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmBID/0SpsLsnoeki37Wei0HzBZeRdFGIXFfg1XB
 +s66nJUL+igHIIwH+EO7Ct+vw8jfCWZfasI+ThGtkgRg88bqJO3HEbE1MQY8KlTh
 Dm9LRHH9Jp4OYbbag2RXammaD4XKi980fIyVTp9R0keszQroSUTXXS40fksszTtB
 K4LEBiAxGJgin60W9RgbbK93n/pw17U5Wu2WID6ZfVxZYVzpXl0iIQXUz1zmnpy5
 3aP7ORqJGnH7WQiV57LMpq5/QFBoKrfUsjtxpnJ+i2694sLYX8NCqnH82SUliSU+
 NN3Vxic1ozmZEADFKg4dfmgVqqyJaoVCRvsA5i5YjqG9m65G/S7/BmsaWbq5Ixf2
 N1KQr/36vkYGYWqtb2rbYRVf9XfbxWKp1wuSaX3N1Vh1/Ee180qD68+wpa/0mtkh
 2dXV6CQP9De4BXH6mxj3Yr7LI2a2r7KLYhULZCAvLMvDtsBrPyHAmRRgXxaLlKL8
 QMFYYO/1wjlzpGnn4IxP8sDMH6N2shtZHAmC2n9TM2gmk5tA0OmajJJ4uxNbtyhw
 +ZlDco3aw/x0v2onjlbTiPij2cReRgwN+tVaEvn0dj3uxaisCru60iYdnr9FX4nL
 g3NxNUJbUhl6YMwvNMI0GKfz1IILzjPgJIhDqSHvYhtCA1w0DMVXs3Qv4HXoVh3m
 ffdby6/Yqw==
 =5yJp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190118' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - block size setting fixes for loop/nbd (Jan Kara)

 - md bio_alloc_mddev() cleanup (Marcos)

 - Ensure we don't lose the REQ_INTEGRITY flag (Ming)

 - Two NVMe fixes by way of Christoph:
    - Fix NVMe IRQ calculation (Ming)
    - Uninitialized variable in nvmet-tcp (Sagi)

 - BFQ comment fix (Paolo)

 - License cleanup for recently added blk-mq-debugfs-zoned (Thomas)

* tag 'for-linus-20190118' of git://git.kernel.dk/linux-block:
  block: Cleanup license notice
  nvme-pci: fix nvme_setup_irqs()
  nvmet-tcp: fix uninitialized variable access
  block: don't lose track of REQ_INTEGRITY flag
  blockdev: Fix livelocks on loop device
  nbd: Use set_blocksize() to set device blocksize
  md: Make bio_alloc_mddev use bio_alloc_bioset
  block, bfq: fix comments on __bfq_deactivate_entity
2019-01-20 09:12:50 +12:00
Ming Lei c45b1fa243 nvme-pci: fix nvme_setup_irqs()
When -ENOSPC is returned from pci_alloc_irq_vectors_affinity(),
we still try to allocate multiple irq vectors again, so irq queues
covers the admin queue actually. But we don't consider that, then
number of the allocated irq vector may be same with sum of
io_queues[HCTX_TYPE_DEFAULT] and io_queues[HCTX_TYPE_READ], this way
is obviously wrong, and finally breaks nvme_pci_map_queues(), and
warning from pci_irq_get_affinity() is triggered.

IRQ queues should cover admin queues, this patch makes this
point explicitely in nvme_calc_io_queues().

We got severl boot failure internal report on aarch64, so please
consider to fix it in v4.20.

Fixes: 6451fe73fa ("nvme: fix irq vs io_queue calculations")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Tested-by: fin4478 <fin4478@hotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-16 09:44:28 -07:00
Linus Torvalds b8c3b8992f for-linus-20190112
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlw6VoIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptaHD/wLDzVDkptMK7Essk73vjBaz6w2MYl/Bvb4
 2YtktHlrCB8S/0Xze7t5tvsGxym2fv1ymsV48Sdnl+lW+2tf6e3LJEfAp7yLWoEs
 K6UdFkhYK7fdKx84dHIcslvyY0id9qjRteRiFHrd/023N4FZDkbKOtfeRBZi7e4Q
 yTP2FzQEnSV5a0ngh9J+xxHUXF7a4LfeVj9ddhwcBIAeJ1MUfLTaHxRR8Qc3nw3F
 7nGvLHu2vrWK3EDBBA/PsBqMJhsoW2AV0G7Xbv6Ai7e14GWVYr+MGjIYC41yIlRm
 46oBoveIRI5dkhR5/mO685vB774yAN6RExOP9Xlg/cA3XxbpdWP9uDtfRbuvo64O
 WOm2btWKjxOldCao4wClaJARcjSyocPTKwlWQE7OfVcN+3WbsD26sd+GdCvrSG7a
 +E3xcyPUURN1e7t4YyDGDL1wKBPuZP6jvcGUfoqLMr3O12aIjlAiJFPhm1Hqv+dA
 KaJjsCwShSDJfeWLvP/Lt1KtVoBinrKA/5qkkB7zJA7I+qNP7bAR25BtuXKo1U7t
 5vRGk7rfvdtsE5oH7y1we9AxoA7jD6t5Z60mPRdkVaXzi9SSw3th/YAuUhcU2g0V
 0I2icAGQI8bbZzFgJe+d4f1L57EQ/7fWfcwENNLkLu57E/gZlYYqt//tSrF8W0ur
 wcthJWqOZg==
 =OFV3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190112' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request from Christoph, with little fixes all over the map

 - Loop caching fix for offset/bs change (Jaegeuk Kim)

 - Block documentation tweaks (Jeff, Jon, Weiping, John)

 - null_blk zoned tweak (John)

 - ahch mvebu suspend/resume support. Should have gone into the merge
   window, but there was some confusion on which tree had it. (Miquel)

* tag 'for-linus-20190112' of git://git.kernel.dk/linux-block: (22 commits)
  ata: ahci: mvebu: request PHY suspend/resume for Armada 3700
  ata: ahci: mvebu: add Armada 3700 initialization needed for S2RAM
  ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs
  ata: ahci: mvebu: remove stale comment
  ata: libahci_platform: comply to PHY framework
  loop: drop caches if offset or block_size are changed
  block: fix kerneldoc comment for blk_attempt_plug_merge()
  nvme: don't initlialize ctrl->cntlid twice
  nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN
  nvme: pad fake subsys NQN vid and ssvid with zeros
  nvme-multipath: zero out ANA log buffer
  nvme-fabrics: unset write/poll queues for discovery controllers
  nvme-tcp: don't ask if controller is fabrics
  nvme-tcp: remove dead code
  nvme-pci: fix out of bounds access in nvme_cqe_pending
  nvme-pci: rerun irq setup on IO queue init errors
  nvme-pci: use the same attributes when freeing host_mem_desc_bufs.
  nvme-pci: fix the wrong setting of nr_maps
  block: doc: add slice_idle_us to bfq documentation
  block: clarify documentation for blk_{start|finish}_plug
  ...
2019-01-12 13:40:51 -08:00
James Dingwall 6299358d19 nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN
If a device provides an NQN it is expected to be globally unique.
Unfortunately some firmware revisions for Intel 760p/Pro 7600p devices did
not satisfy this requirement.  In these circumstances if a system has >1
affected device then only one device is enabled.  If this quirk is enabled
then the device supplied subnqn is ignored and we fallback to generating
one as if the field was empty.  In this case we also suppress the version
check so we don't print a warning when the quirk is enabled.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-09 13:47:08 -05:00
Hongbo Yao dcca166272 nvme-pci: fix out of bounds access in nvme_cqe_pending
There is an out of bounds array access in nvme_cqe_peding().

When enable irq_thread for nvme interrupt, there is racing between the
nvmeq->cq_head updating and reading.

nvmeq->cq_head is updated in nvme_update_cq_head(), if nvmeq->cq_head
equals nvmeq->q_depth and before its value set to zero, nvme_cqe_pending()
uses its value as an array index, the index will be out of bounds.

Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
[hch: slight coding style update]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-09 13:47:05 -05:00
Keith Busch 8fae268b40 nvme-pci: rerun irq setup on IO queue init errors
If the driver is unable to create a subset of IO queues for any reason,
the read/write and polled queue sets will not match the actual allocated
hardware contexts. This leaves gaps in the CPU affinity mappings and
causes the following kernel panic after blk_mq_map_queue_type() returns
a NULL hctx.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000198
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP
  CPU: 64 PID: 1171 Comm: kworker/u259:1 Not tainted 4.20.0+ #241
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
  Workqueue: nvme-wq nvme_scan_work [nvme_core]
  RIP: 0010:blk_mq_init_allocated_queue+0x2d9/0x440
  RSP: 0018:ffffb1bf0abc3cd0 EFLAGS: 00010286
  RAX: 000000000000001f RBX: ffff8ea744cf0718 RCX: 0000000000000000
  RDX: 0000000000000002 RSI: 000000000000007c RDI: ffffffff9109a820
  RBP: ffff8ea7565f7008 R08: 000000000000001f R09: 000000000000003f
  R10: ffffb1bf0abc3c00 R11: 0000000000000000 R12: 000000000001d008
  R13: ffff8ea7565f7008 R14: 000000000000003f R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff8ea757200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000198 CR3: 0000000013058000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   blk_mq_init_queue+0x35/0x60
   nvme_validate_ns+0xc6/0x7c0 [nvme_core]
   ? nvme_identify_ctrl.isra.56+0x7e/0xc0 [nvme_core]
   nvme_scan_work+0xc8/0x340 [nvme_core]
   ? __wake_up_common+0x6d/0x120
   ? try_to_wake_up+0x55/0x410
   process_one_work+0x1e9/0x3d0
   worker_thread+0x2d/0x3d0
   ? process_one_work+0x3d0/0x3d0
   kthread+0x111/0x130
   ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30
  Modules linked in: nvme nvme_core serio_raw
  CR2: 0000000000000198

Fix by re-running the interrupt vector setup from scratch using a reduced
count that may be successful until the created queues matches the irq
affinity plus polling queue sets.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-09 13:47:02 -05:00
Liviu Dudau cc667f6d5d nvme-pci: use the same attributes when freeing host_mem_desc_bufs.
When using HMB the PCIe host driver allocates host_mem_desc_bufs using
dma_alloc_attrs() but frees them using dma_free_coherent(). Use the
correct dma_free_attrs() function to free the buffers.

Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-09 13:47:02 -05:00
Jianchao Wang c61e678f30 nvme-pci: fix the wrong setting of nr_maps
We only set the nr_maps to 3 if poll queues are supported.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-09 13:47:02 -05:00
Luis Chamberlain 750afb08ca cross-tree: phase out dma_zalloc_coherent()
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.

This change was generated with the following Coccinelle SmPL patch:

@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@

-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-08 07:58:37 -05:00
yupeng 604c01d567 nvme-pci: trace SQ status on completions
Export the disk name, queue id, sq_head, sq_tail to a trace event in
completion handling.

Usage example:

cd /sys/kernel/debug/tracing/events/nvme/nvme_sq

echo 'disk=="nvme1n1"' > filter

echo 1 > enable

cat /sys/kernel/debug/tracing/trace_pipe

Signed-off-by: yupeng <yupeng0921@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <keith.busch@intel.com>
[hch: slight formatting tweaks, use standard nvme tracepoint
 conventions]
Signed-off-by: Christoph Hellwig <hch@lst.de>

wip
2018-12-19 08:35:36 +01:00
Christoph Hellwig 91a509f8b7 nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
By duplicating the nvme_process_cq in both branches we keep the
sparse lock context checking happy, so do it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2018-12-18 17:50:45 +01:00
Christoph Hellwig ed92ad37e8 nvme-pci: only set nr_maps to 2 if poll queues are supported
The block layer now enables polling support on a queue if nr_maps
includes the poll map, so we should only set that if we actually
support poll queues.

Fixes:  6544d229bf ("block: enable polling by default if a poll map is initalized")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2018-12-18 17:50:44 +01:00
Christoph Hellwig 7e849dd9cf nvme-pci: don't share queue maps
Now that the block layer checks if a queue map has any queues inside
it there is no more reason to duplicate the maps for the non-default
types.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-17 05:44:45 -07:00
Jens Axboe 6451fe73fa nvme: fix irq vs io_queue calculations
Guenter reported an boot hang issue on HPPA after we default to 0 poll
queues. We have two issues in the queue count calculations:

1) We don't separate the poll queues from the read/write queues. This is
   important, since the former doesn't need interrupts.
2) The adjust logic is broken.

Adjust the poll queue count before doing nvme_calc_io_queues(). The poll
queue count is only limited by the IO queue count we were able to get
from the controller, not failures in the IRQ allocation loop. This
leaves nvme_calc_io_queues() just adjusting the read/write queue map.

Reported-by: Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-11 06:27:46 -07:00
Christoph Hellwig 376f7ef8bf block: only allow polling if a poll queue_map exists
This avoids having to have differnet mq_ops for different setups
with or without poll queues.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04 11:38:19 -07:00
Christoph Hellwig 3a7afd8ee4 nvme-pci: remove the CQ lock for interrupt driven queues
Now that we can't poll regular, interrupt driven I/O queues there
is almost nothing that can race with an interrupt.  The only
possible other contexts polling a CQ are the error handler and
queue shutdown, and both are so far off in the slow path that
we can simply use the big hammer of disabling interrupts.

With that we can stop taking the cq_lock for normal queues.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04 11:38:18 -07:00
Christoph Hellwig d1ed6aa14b nvme-pci: don't poll from irq context when deleting queues
This is the last place outside of nvme_irq that handles CQEs from
interrupt context, and thus is in the way of removing the cq_lock for
normal queues, and avoiding lockdep warnings on the poll queues, for
which we already take it without IRQ disabling.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04 11:38:18 -07:00
Christoph Hellwig 5271edd41d nvme-pci: refactor nvme_disable_io_queues
Pass the opcode for the delete SQ/CQ command as an argument instead of
the somewhat confusing pass loop.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04 11:38:18 -07:00
Christoph Hellwig 0b2a8a9f4b nvme-pci: consolidate code for polling non-dedicated queues
We have three places that can poll for I/O completions on a normal
interrupt-enabled queue.  All of them are in slow path code, so
consolidate them to a single helper that uses spin_lock_irqsave and
removes the fast path cqe_pending check.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04 11:38:18 -07:00