Commit graph

752063 commits

Author SHA1 Message Date
Martin Wilck 1409880357 scsi: devinfo: change blist_flag_t to 64bit
Space for SCSI blist flags is gradually running out. Change the type to
__u64 and fix a checkpatch complaint about symbolic mode flags in
scsi_devinfo.c.

Make checkpatch happy by replacing simple_strtoul() with kstrtoull().

Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-20 19:14:35 -04:00
Martin Wilck 659c1c1b29 scsi: devinfo: use const_ilog2 for array indices
Use the just introduced const_ilog2() macro to avoid sparse errors.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-20 19:14:28 -04:00
Martin Wilck dbef91ec54 scsi: ilog2: create truly constant version for sparse
Sparse emits errors about ilog2() in array indices because of the use of
__ilog2_32() and __ilog2_64(), rightly so
(https://www.spinics.net/lists/linux-sparse/msg03471.html).

Create a const_ilog2() variant that works with sparse for this scenario.

(Note: checkpatch.pl complains about missing parentheses, but that
appears to be a false positive. I can get rid of the warning simply by
inserting whitespace, making checkpatch "see" the whole macro).

Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-20 15:57:32 -04:00
Long Li 2217a47de4 scsi: storvsc: Select channel based on available percentage of ring buffer to write
This is a best effort for estimating on how busy the ring buffer is for
that channel, based on available buffer to write in percentage. It is
still possible that at the time of actual ring buffer write, the space
may not be available due to other processes may be writing at the time.

Selecting a channel based on how full it is can reduce the possibility
that a ring buffer write will fail, and avoid the situation a channel is
over busy.

Now it's possible that storvsc can use a smaller ring buffer size
(e.g. 40k bytes) to take advantage of cache locality.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-20 15:38:38 -04:00
Souptick Joarder 69589c9bb9 scsi: target: Change return type to vm_fault_t
Use new return type vm_fault_t for fault handler in struct
vm_operations_struct.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:49:37 -04:00
Lee Duncan 78a6295c71 scsi: target: prefer dbroot of /etc/target over /var/target
The target database root directory, dbroot, has defaulted to /var/target
for a while, but its main client, targetcli-fb, has been moving it to
/etc/target for quite some time. With the plethora of target drivers now
appearing, it has become more difficult to initialize this attribute
before use by any child drivers.

If the directory /etc/target exists, use that as the DB root. Otherwise,
fall back to using /var/target.

The ability to override this dbroot attribute still exists via sysfs.

Signed-off-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:47:02 -04:00
Colin Ian King bcbba7bb54 scsi: mptfc: fix spelling mistake in macro names
Rename macros MPI_FCPORTPAGE0_SUPPORT_SPEED_UKNOWN and
MPI_FCPORTPAGE0_CURRENT_SPEED_UKNOWN to add in missing N in UNKNOWN

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:08:47 -04:00
Bart Van Assche c976562162 scsi: sd_zbc: Let the SCSI core handle ILLEGAL REQUEST / ASC 0x21
scsi_io_completion() translates the sense key ILLEGAL REQUEST / ASC 0x21 into
ACTION_FAIL. That means that setting cmd->allowed to zero in sd_zbc_complete()
for this sense code / ASC combination is not necessary. Hence remove the code
that resets cmd->allowed from sd_zbc_complete().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:00:44 -04:00
Bart Van Assche 354f113205 scsi: sd_zbc: Change the type of the ZBC fields into u32
This patch does not change any functionality but makes it clear that it is on
purpose that these fields are 32 bits wide.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:00:44 -04:00
Christoph Hellwig 9027b15d51 scsi: storsvc: don't set a bounce limit
The default already is to never bounce, so the call is a no-op.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:00:44 -04:00
Christoph Hellwig 3e58c5cf16 scsi: iscsi_tcp: don't set a bounce limit
The default already is to never bounce, so the call is a no-op.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:00:44 -04:00
Souptick Joarder 0cac8e1bba scsi: sg: Change return type to vm_fault_t
Use new return type vm_fault_t for fault handler in struct
vm_operations_struct.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:00:44 -04:00
Michael Schmitz 3109e5ae03 scsi: zorro_esp: New driver for Amiga Zorro NCR53C9x boards
New combined SCSI driver for all ESP based Zorro SCSI boards for m68k Amiga.

Code largely based on board specific parts of the old drivers (blz1230.c,
blz2060.c, cyberstorm.c, cyberstormII.c, fastlane.c which were removed after
the 2.6 kernel series for lack of maintenance) with contributions by Tuomas
Vainikka (TCQ bug tests and workaround) and Finn Thain (TCQ bugfix by use of
PIO in extended message in transfer).

New Kconfig option and Makefile entries for new Amiga Zorro ESP SCSI driver
included in this patch.

Use DMA transfers wherever possible, with board-specific DMA set-up functions
copied from the old driver code. Three byte reselection messages do appear to
cause DMA timeouts. So wire up a PIO transfer routine for these
instead. esp_reselect_with_tag explicitly sets
esp->cmd_block_dma as target address for the message bytes but PIO
requires a virtual address.  Substiute kernel virtual address
esp->cmd_block in PIO transfer call if DMA address is esp->cmd_block_dma
and phase is message in.

PIO code taken from mac_esp.c where the reselection timeout issue was debugged
and fixed first, with minor macro and function rename.

Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian T. Steigies <cts@debian.org>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:00:44 -04:00
Xose Vazquez Perez 37b37d2609 scsi: scsi_dh: replace too broad "TP9" string with the exact models
SGI/TP9100 is not an RDAC array:
    ^^^
https://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libmultipath/hwtable.c;h=88b4700beb1d8940008020fbe4c3cd97d62f4a56;hb=HEAD#l235

This partially reverts commit 35204772ea ("[SCSI] scsi_dh_rdac :
Consolidate rdac strings together")

[mkp: fixed up the new entries to align with rest of struct]

Cc: NetApp RDAC team <ng-eseries-upstream-maintainers@netapp.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: SCSI ML <linux-scsi@vger.kernel.org>
Cc: DM ML <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:08 -04:00
Xose Vazquez Perez b15578ddab scsi: devinfo: delete duplicate "Generic"/"USB Storage-SMC" device
The revision field is currently unused by the devinfo pattern matching
code. Combine two blacklist entries into one.

$ egrep "Generic.*Storage-SMC" /proc/scsi/device_info
'Generic' 'USB Storage-SMC' 0x402
'Generic' 'USB Storage-SMC' 0x402

[mkp: tweaked commit desc]

Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: SCSI ML <linux-scsi@vger.kernel.org>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:08 -04:00
James Smart 40e4a2e15c scsi: lpfc: update driver version to 12.0.0.2
Update the driver version to 12.0.0.2

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:07 -04:00
James Smart b0a00d8d2b scsi: lpfc: Correct missing remoteport registration during link bounces
Remote port disappearance/reappearances would cause a series of RSCN
events to be delivered to the driver. During the resulting GID_FT
handling, the driver clears the fc4 settings on the remote port, which
makes it skip registration. As such, the nvme associations eventually
fail and return io errors to the applications.

Correct by not clearng the nlp_fc4_types for all nodes in
lpfc_issue_gidft.  Instead, when the GID_FT response is handled, clear
the nlp_fc4_types of FCP and NVME prior to evaluating the fc4_type
returned by the GID_FT response.  This approach leaves "skipped" nodes
with their nlp_fc4_types intacted.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:06 -04:00
James Smart 66a85155d4 scsi: lpfc: Fix NULL pointer reference when resetting adapter
Points referencing local port structures didn't accommodate cases where
the localport may not be registered yet.

Add NULL pointer checks to logic.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:06 -04:00
James Smart b15bd3e621 scsi: lpfc: Fix nvme remoteport registration race conditions
On tests adding and removing a remote port, calls to nvme_info would
eventually show fewer target ports discovered than were present in the
san. Additionally, the following error messages were seen:

  6031 RemotePort Registration failed err: -116, DID x471301

There is a race condition that exists between the driver and the nvme
transport on remote port unregister vs the confirmed deletion. It's
possible that the driver may rediscover the remote port and reregister
the remote port before a prior unregister delete callback was made (as
it rebinded to the prior remoteport structure). However, the driver was
coded to expect the callback before seeing the remote port again thus a
new registration. The logic results in the driver having an invalid
remoteport pointer set.

Correct by tracking when waiting for the delete callback. In cases where
the ndlp remoteport pointer is updated, it is only cleared when the wait
has not been superceded by a prior registration.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:05 -04:00
James Smart b04744ce52 scsi: lpfc: Fix driver not recovering NVME rports during target link faults
During target-side port faults, the driver would not recover all target
port logins. This resulted in a loss of nvme device discovery.

The driver is coded to wait for all GID_FT requests to complete before
restarting discovery. A fault is seen where the outstanding GIT_FT
counts are not properly decremented, thus discovery would never
start. Another fault was found in the clearing of the gidft_inp counter
that would be skipped in this condition. And a third fault found with
lpfc_nvme_register_port that would remove a reverence on the ndlp which
then allows a node swap on a port address change to prematurely remove
the reference and release the ndlp.

The following changes are made:

 - Correct the decrementing of the outstanding GID_FT counters.

 - In RSCN handling, no longer zero the counter before calling to issue
   another GID_FT.

 - No longer remove the reference on the dlp when the ndlp->nrport value
   is not yet null.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:05 -04:00
James Smart bf316c7851 scsi: lpfc: Fix WQ/CQ creation for older asic's.
The patch to enlarge WQ/CQ creation keys off of an adapter response that
indicates support for the larger values. Older adapters return an
incorrect response and are limited in size.  Thus the adapters fail the
WQ creation steps.

Augment the WQ sizing checks with a check on the older adapter types and
limit them to the restricted sizes.

Fixes: c176ffa084 ("scsi: lpfc: Increase CQ and WQ sizes for SCSI")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:04 -04:00
James Smart 01466024d2 scsi: lpfc: Fix NULL pointer access in lpfc_nvme_info_show
After making remoteport unregister requests, the ndlp nrport pointer was
stale.

Track when waiting for waiting for unregister completion callback and
adjust nldp pointer assignment.  Add a few safety checks for NULL
pointer values.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:04 -04:00
James Smart 0cdb84ec26 scsi: lpfc: Fix lingering lpfc_wq resource after driver unload
After driver unloads, lpfc_wq remains active. The destroy_workqueue
calls were not being made in driver unload.  Additionally, SLI3 is
allocating lpfc_wq resources, but never uses it.

Make the destroy_workqueue calls on driver unload.  Modify the SLI3 code
path no longer allocate lpfc_wq resources.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:03 -04:00
James Smart 59c68eaad7 scsi: lpfc: Fix Abort request WQ selection
When running loads that generated aborts, io errors where seen.  Turns
out the abort requests where not placed on the proper WQ resulting in
the errors. Closer inspection inspection of this error also showed
improper spinlock api use.

Correct the WQ selection policy for the abort requests.  Correct
spin_lock/spin_lock_irq/spin_lock_irqsave usage.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:02 -04:00
James Smart 2448e48425 scsi: lpfc: Enlarge nvmet asynchronous receive buffer counts
Under large io load, the current sizing of asynchronous buffer counts
could be exceeded, indicated by a 2885 log message:

  2885 Port Status Event: port status reg 0x81800000, port smphr
      reg 0xc000, error 1=0x52004a01, error 2=0x0

Enlarge the async receive queue size.  Allow for a configurable number
of buffers to be posted to each RQ, using the new attribute
lpfc_nvmet_mrq_post.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:02 -04:00
James Smart 66a210ffb8 scsi: lpfc: Add per io channel NVME IO statistics
When debugging various issues, per IO channel IO statistics were useful
to understand what was happening. However, many of the stats were on a
port basis rather than an io channel basis.

Move statistics to an io channel basis.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:01 -04:00
James Smart f91bc594ba scsi: lpfc: Correct target queue depth application changes
The max_scsicmpl_time parameter can be used to perform scsi cmd queue
depth mgmt based on io completion time: the queue depth is reduced to
make completion time shorter. However, as soon as an io completes and
the completion time is within limits, the code immediately bumps the
queue depth limit back up to the target queue depth. Thus the procedure
restarts, effectively limiting the usefulness of adjusting queue depth
to help completion time.

This patch makes the following changes:

 - Removes the code at io completion that resets the queue depth as soon
   as within limits.

 - As the code removed was where the target queue depth was first
   applied, change target queue depth application so that it occurs when
   the parameter is changed.

 - Makes target queue depth a standard parameter: both a module
   parameter and a sysfs parameter.

 - Optimizes the command pending count by using atomics rather than
   locks.

 - Updates the debugfs nodelist stats to allow better debugging of
   pending command counts.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:01 -04:00
James Smart 118c0415ee scsi: lpfc: Fix multiple PRLI completion error path
Nodelist entry for SCSI array ends up in UNMAPPED state. This is due to
illegal discovery State machine transition because of two PRLIs and the
first one failing with LS_RJT. Also, the error path was designed
assuming the PRLIs complete in the order they were sent, FCP first, then
NVME. In a failing case, the array thinks about the first PRLI (FCP),
but issues LS_RJT for the 2nd PRLI immediately.

Fix PRLI completion error path for the ordering expectation.  Ensure the
discovery state machine update is not set until all outstanding PRLIs
are complete.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:34:00 -04:00
Shivasharan S 67c5490ace scsi: megaraid_sas: driver version upgrade
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:33:59 -04:00
Shivasharan S 3239b8cd28 scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs
Hardware could time out Fastpath IOs one second earlier than the timeout
provided by the host.

For non-RAID devices, driver provides timeout value based on OS provided
timeout value. Under certain scenarios, if the OS provides a timeout
value of 1 second, due to above behavior hardware will timeout
immediately.

Increase timeout value for non-RAID fastpath IOs by 1 second.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:33:59 -04:00
Himanshu Jha 3c6c122cfc scsi: megaraid_sas: Use zeroing memory allocator than allocator/memset
Use pci_zalloc_consistent for allocating zeroed memory and remove
unnecessary memset function.

Done using Coccinelle.
Generated by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci

Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:33:58 -04:00
Long Li 6b1f8376dc scsi: netvsc: Use the vmbus function to calculate ring buffer percentage
In Vmbus, we have defined a function to calculate available ring buffer
percentage to write.

Use that function and remove netvsc's private version.

[mkp: typo]

Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:52 -04:00
Long Li 63273cb401 scsi: vmbus: Add function to report available ring buffer to write in total ring size percentage
Netvsc has a function to calculate how much ring buffer in percentage is
available to write. This function is also useful for storvsc and other
vmbus devices.

Define a similar function in vmbus to be used by other vmbus devices.

Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Jason Yan b6240a4df0 scsi: libsas: add transport class for ATA devices
Now ata devices attached with sas controller do not have transport
class, so that we can not see any information of these ata devices in
/sys/class/ata_port(or ata_link or ata_device).

Add transport class for the ata devices attached with sas controller.
The /sys/class directory will show the infomation of the ata devices
as follows:

localhost:/sys/class # ls ata*
ata_device:
dev1.0  dev2.0

ata_link:
link1  link2

ata_port:
ata1  ata2

No functional change of the device scanning and io path. The ata
transport class was deleted when destroying the sas devices.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
John Garry c90a0bea4f scsi: hisi_sas: remove some unneeded structure members
This patch removes unneeded structure elements:

- hisi_sas_phy.dev_sas_addr: only ever written
	- Also remove associated function which writes it,
	  hisi_sas_init_add().

- hisi_sas_device.attached_phy: only ever written
	- Also remove code to set it in hisi_sas_dev_found()

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
John Garry 381ed6c081 scsi: hisi_sas: print device id for errors
When we find an erroneous slot completion, to help aid debugging add the
device index to the current debug log.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Xiaofei Tan 327f242fa8 scsi: hisi_sas: check IPTT is valid before using it for v3 hw
There is a bug of v3 hw development version. When AXI error happen, hw
may return an abnormal CQ that IPTT value is 0xffff.  This will cause
IPTT out-of-bounds reference.

This patch adds a check of IPTT in cq_tasklet_v3_hw() and discards
invalid slot. This workaround scheme is just to enhance fault-tolerance
of the driver. So, we will apply this scheme for all version of v3 hw,
although release version has fixed this SoC bug.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Xiaofei Tan 3ff0f0b657 scsi: hisi_sas: consolidate command check in hisi_sas_get_ata_protocol()
Currently we check the fis->command value in 2 locations in
hisi_sas_get_ata_protocol() switch statement. Fix this by consolidating
the check for fis->command value to 1 location only.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Xiang Chen 4f4e21b8ff scsi: hisi_sas: use dma_zalloc_coherent()
This is a warning coming from Coccinelle, and need to use new interface
dma_zalloc_coherent() instead of dma_alloc_coherent()/memset().

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Xiang Chen 5df41af4b1 scsi: hisi_sas: delete timer when removing hisi_sas driver
Delete timer for v1 and v3 hw when removing hisi_sas driver.

Signed-off-by: Xiang chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Xiaofei Tan 6157363091 scsi: hisi_sas: update RAS feature for later revision of v3 HW
There is an modification for later revision of v3 hw. More HW errors are
reported through RAS interrupt. These errors were originally reported
only through MSI.

When report to RAS, some combinations are done to port AXI errors and
FIFO OMIT errors. For example, each port has 4 AXI errors, and they are
combined to one when report to RAS.

This patch does two things:

1. Enable RAS interrupt of these errors and handle them in PCI
   error handlers.

2. Disable MSI interrupts of these errors for this later revision hw.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Xiang Chen 8b8d665315 scsi: hisi_sas: make SAS address of SATA disks unique
When directly connected with SATA disks in different SAS cores, fill SAS
address with scsi_host's id to make it's fake SAS address unique.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Uma Krishnan d2d354a606 scsi: cxlflash: Handle spurious interrupts
The following Oops can occur when there is heavy I/O traffic and the host is
reset by a tool such as sg_reset.

[c000200fff3fbc90] c00800001690117c process_cmd_doneq+0x104/0x500
                                       [cxlflash] (unreliable)
[c000200fff3fbd80] c008000016901648 cxlflash_rrq_irq+0xd0/0x150 [cxlflash]
[c000200fff3fbde0] c000000000193130 __handle_irq_event_percpu+0xa0/0x310
[c000200fff3fbea0] c0000000001933d8 handle_irq_event_percpu+0x38/0x90
[c000200fff3fbee0] c000000000193494 handle_irq_event+0x64/0xb0
[c000200fff3fbf10] c000000000198ea0 handle_fasteoi_irq+0xc0/0x230
[c000200fff3fbf40] c00000000019182c generic_handle_irq+0x4c/0x70
[c000200fff3fbf60] c00000000001794c __do_irq+0x7c/0x1c0
[c000200fff3fbf90] c00000000002a390 call_do_irq+0x14/0x24
[c000200e5828fab0] c000000000017b2c do_IRQ+0x9c/0x130
[c000200e5828fb00] c000000000009b04 h_virt_irq_common+0x114/0x120

When a context is reset, the pending commands are flushed and the AFU is
notified. Before the AFU handles this request there could be command
completion interrupts queued to PHB which are yet to be delivered to the
context. In this scenario, a context could receive an interrupt for a command
that has been flushed, leading to a possible crash when the memory for the
flushed command is accessed.

To resolve this problem, a boolean will indicate if the hardware queue is
ready to process interrupts or not. This can be evaluated in the interrupt
handler before proessing an interrupt.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Uma Krishnan 9a597cd4c0 scsi: cxlflash: Remove commmands from pending list on timeout
The following Oops can occur if an internal command sent to the AFU does not
complete within the timeout:

[c000000ff101b810] c008000016020d94 term_mc+0xfc/0x1b0 [cxlflash]
[c000000ff101b8a0] c008000016020fb0 term_afu+0x168/0x280 [cxlflash]
[c000000ff101b930] c0080000160232ec cxlflash_pci_error_detected+0x184/0x230
                                       [cxlflash]
[c000000ff101b9e0] c00800000d95d468 cxl_vphb_error_detected+0x90/0x150[cxl]
[c000000ff101ba20] c00800000d95f27c cxl_pci_error_detected+0xa4/0x240 [cxl]
[c000000ff101bac0] c00000000003eaf8 eeh_report_error+0xd8/0x1b0
[c000000ff101bb20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170
[c000000ff101bbb0] c00000000003f438 eeh_handle_normal_event+0x198/0x580
[c000000ff101bc60] c00000000003fba4 eeh_handle_event+0x2a4/0x338
[c000000ff101bd10] c0000000000400b8 eeh_event_handler+0x1f8/0x200
[c000000ff101bdc0] c00000000013da48 kthread+0x1a8/0x1b0
[c000000ff101be30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4

When an internal command times out, the command buffer is freed while it is
still in the pending commands list of the context. This corrupts the list and
when the context is cleaned up, a crash is encountered.

To resolve this issue, when an AFU command or TMF command times out, the
command should be deleted from the hardware queue pending command list before
freeing the buffer.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:50 -04:00
Uma Krishnan a3feb6ef50 scsi: cxlflash: Synchronize reset and remove ops
The following Oops can be encountered if a device removal or system shutdown
is initiated while an EEH recovery is in process:

[c000000ff2f479c0] c008000015256f18 cxlflash_pci_slot_reset+0xa0/0x100
                                      [cxlflash]
[c000000ff2f47a30] c00800000dae22e0 cxl_pci_slot_reset+0x168/0x290 [cxl]
[c000000ff2f47ae0] c00000000003ef1c eeh_report_reset+0xec/0x170
[c000000ff2f47b20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170
[c000000ff2f47bb0] c00000000003f80c eeh_handle_normal_event+0x56c/0x580
[c000000ff2f47c60] c00000000003fba4 eeh_handle_event+0x2a4/0x338
[c000000ff2f47d10] c0000000000400b8 eeh_event_handler+0x1f8/0x200
[c000000ff2f47dc0] c00000000013da48 kthread+0x1a8/0x1b0
[c000000ff2f47e30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4

The remove handler frees AFU memory while the EEH recovery is in progress,
leading to a race condition. This can result in a crash if the recovery thread
tries to access this memory.

To resolve this issue, the cxlflash remove handler will evaluate the device
state and yield to any active reset or probing threads.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:50 -04:00
Uma Krishnan 07d0c52f87 scsi: cxlflash: Enable OCXL operations
This commit enables the OCXL operations for the OCXL devices.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:50 -04:00
Uma Krishnan 9433fb32b7 scsi: cxlflash: Support AFU reset
The cxlflash core driver resets the AFU when the master contexts are created
in the initialization or recovery paths. Today, the OCXL provider service to
perform this operation is pending implementation.  To avoid a crash due to a
missing fop, log an error once and return success to continue with execution.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:50 -04:00
Uma Krishnan 66ae644b92 scsi: cxlflash: Register for translation errors
While enabling a context on the link, a predefined callback can be registered
with the OCXL provider services to be notified on translation errors. These
errors can in turn be passed back to the user on a read operation.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:50 -04:00
Uma Krishnan f81face725 scsi: cxlflash: Introduce OCXL context state machine
In order to protect the OCXL hardware contexts from getting clobbered, a
simple state machine is added to indicate when a context is in open, close or
start state. The expected states are validated throughout the code to prevent
illegal operations on a context. A mutex is added to protect writes to the
context state field.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:50 -04:00
Uma Krishnan d91dd3a7d1 scsi: cxlflash: Update synchronous interrupt status bits
The SISLite specification has been updated to define new synchronous interrupt
status bits. These bits are set by the AFU when a given PASID or EA is bad and
a synchronous interrupt is triggered.

The SISLite header file is updated to support these new bits. Note that there
are also some formatting updates to some of the existing bits to allow all of
the definitions to line up uniformly.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:50 -04:00