redonkable/alistair23-linux

Author	SHA1	Message	Date
Martin Wilck	1409880357	scsi: devinfo: change blist_flag_t to 64bit Space for SCSI blist flags is gradually running out. Change the type to __u64 and fix a checkpatch complaint about symbolic mode flags in scsi_devinfo.c. Make checkpatch happy by replacing simple_strtoul() with kstrtoull(). Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-20 19:14:35 -04:00
Martin Wilck	659c1c1b29	scsi: devinfo: use const_ilog2 for array indices Use the just introduced const_ilog2() macro to avoid sparse errors. Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-20 19:14:28 -04:00
Martin Wilck	dbef91ec54	scsi: ilog2: create truly constant version for sparse Sparse emits errors about ilog2() in array indices because of the use of __ilog2_32() and __ilog2_64(), rightly so (https://www.spinics.net/lists/linux-sparse/msg03471.html). Create a const_ilog2() variant that works with sparse for this scenario. (Note: checkpatch.pl complains about missing parentheses, but that appears to be a false positive. I can get rid of the warning simply by inserting whitespace, making checkpatch "see" the whole macro). Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-20 15:57:32 -04:00
Long Li	2217a47de4	scsi: storvsc: Select channel based on available percentage of ring buffer to write This is a best effort for estimating on how busy the ring buffer is for that channel, based on available buffer to write in percentage. It is still possible that at the time of actual ring buffer write, the space may not be available due to other processes may be writing at the time. Selecting a channel based on how full it is can reduce the possibility that a ring buffer write will fail, and avoid the situation a channel is over busy. Now it's possible that storvsc can use a smaller ring buffer size (e.g. 40k bytes) to take advantage of cache locality. Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-20 15:38:38 -04:00
Souptick Joarder	69589c9bb9	scsi: target: Change return type to vm_fault_t Use new return type vm_fault_t for fault handler in struct vm_operations_struct. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:49:37 -04:00
Lee Duncan	78a6295c71	scsi: target: prefer dbroot of /etc/target over /var/target The target database root directory, dbroot, has defaulted to /var/target for a while, but its main client, targetcli-fb, has been moving it to /etc/target for quite some time. With the plethora of target drivers now appearing, it has become more difficult to initialize this attribute before use by any child drivers. If the directory /etc/target exists, use that as the DB root. Otherwise, fall back to using /var/target. The ability to override this dbroot attribute still exists via sysfs. Signed-off-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:47:02 -04:00
Colin Ian King	bcbba7bb54	scsi: mptfc: fix spelling mistake in macro names Rename macros MPI_FCPORTPAGE0_SUPPORT_SPEED_UKNOWN and MPI_FCPORTPAGE0_CURRENT_SPEED_UKNOWN to add in missing N in UNKNOWN Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:08:47 -04:00
Bart Van Assche	c976562162	scsi: sd_zbc: Let the SCSI core handle ILLEGAL REQUEST / ASC 0x21 scsi_io_completion() translates the sense key ILLEGAL REQUEST / ASC 0x21 into ACTION_FAIL. That means that setting cmd->allowed to zero in sd_zbc_complete() for this sense code / ASC combination is not necessary. Hence remove the code that resets cmd->allowed from sd_zbc_complete(). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:00:44 -04:00
Bart Van Assche	354f113205	scsi: sd_zbc: Change the type of the ZBC fields into u32 This patch does not change any functionality but makes it clear that it is on purpose that these fields are 32 bits wide. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:00:44 -04:00
Christoph Hellwig	9027b15d51	scsi: storsvc: don't set a bounce limit The default already is to never bounce, so the call is a no-op. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:00:44 -04:00
Christoph Hellwig	3e58c5cf16	scsi: iscsi_tcp: don't set a bounce limit The default already is to never bounce, so the call is a no-op. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:00:44 -04:00
Souptick Joarder	0cac8e1bba	scsi: sg: Change return type to vm_fault_t Use new return type vm_fault_t for fault handler in struct vm_operations_struct. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:00:44 -04:00
Michael Schmitz	3109e5ae03	scsi: zorro_esp: New driver for Amiga Zorro NCR53C9x boards New combined SCSI driver for all ESP based Zorro SCSI boards for m68k Amiga. Code largely based on board specific parts of the old drivers (blz1230.c, blz2060.c, cyberstorm.c, cyberstormII.c, fastlane.c which were removed after the 2.6 kernel series for lack of maintenance) with contributions by Tuomas Vainikka (TCQ bug tests and workaround) and Finn Thain (TCQ bugfix by use of PIO in extended message in transfer). New Kconfig option and Makefile entries for new Amiga Zorro ESP SCSI driver included in this patch. Use DMA transfers wherever possible, with board-specific DMA set-up functions copied from the old driver code. Three byte reselection messages do appear to cause DMA timeouts. So wire up a PIO transfer routine for these instead. esp_reselect_with_tag explicitly sets esp->cmd_block_dma as target address for the message bytes but PIO requires a virtual address. Substiute kernel virtual address esp->cmd_block in PIO transfer call if DMA address is esp->cmd_block_dma and phase is message in. PIO code taken from mac_esp.c where the reselection timeout issue was debugged and fixed first, with minor macro and function rename. Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Finn Thain <fthain@telegraphics.com.au> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian T. Steigies <cts@debian.org> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-19 00:00:44 -04:00
Xose Vazquez Perez	37b37d2609	scsi: scsi_dh: replace too broad "TP9" string with the exact models SGI/TP9100 is not an RDAC array: ^^^ https://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libmultipath/hwtable.c;h=88b4700beb1d8940008020fbe4c3cd97d62f4a56;hb=HEAD#l235 This partially reverts commit `35204772ea` ("[SCSI] scsi_dh_rdac : Consolidate rdac strings together") [mkp: fixed up the new entries to align with rest of struct] Cc: NetApp RDAC team <ng-eseries-upstream-maintainers@netapp.com> Cc: Hannes Reinecke <hare@suse.de> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: SCSI ML <linux-scsi@vger.kernel.org> Cc: DM ML <dm-devel@redhat.com> Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:08 -04:00
Xose Vazquez Perez	b15578ddab	scsi: devinfo: delete duplicate "Generic"/"USB Storage-SMC" device The revision field is currently unused by the devinfo pattern matching code. Combine two blacklist entries into one. $ egrep "Generic.*Storage-SMC" /proc/scsi/device_info 'Generic' 'USB Storage-SMC' 0x402 'Generic' 'USB Storage-SMC' 0x402 [mkp: tweaked commit desc] Cc: Hannes Reinecke <hare@suse.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: SCSI ML <linux-scsi@vger.kernel.org> Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:08 -04:00
James Smart	40e4a2e15c	scsi: lpfc: update driver version to 12.0.0.2 Update the driver version to 12.0.0.2 Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:07 -04:00
James Smart	b0a00d8d2b	scsi: lpfc: Correct missing remoteport registration during link bounces Remote port disappearance/reappearances would cause a series of RSCN events to be delivered to the driver. During the resulting GID_FT handling, the driver clears the fc4 settings on the remote port, which makes it skip registration. As such, the nvme associations eventually fail and return io errors to the applications. Correct by not clearng the nlp_fc4_types for all nodes in lpfc_issue_gidft. Instead, when the GID_FT response is handled, clear the nlp_fc4_types of FCP and NVME prior to evaluating the fc4_type returned by the GID_FT response. This approach leaves "skipped" nodes with their nlp_fc4_types intacted. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:06 -04:00
James Smart	66a85155d4	scsi: lpfc: Fix NULL pointer reference when resetting adapter Points referencing local port structures didn't accommodate cases where the localport may not be registered yet. Add NULL pointer checks to logic. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:06 -04:00
James Smart	b15bd3e621	scsi: lpfc: Fix nvme remoteport registration race conditions On tests adding and removing a remote port, calls to nvme_info would eventually show fewer target ports discovered than were present in the san. Additionally, the following error messages were seen: 6031 RemotePort Registration failed err: -116, DID x471301 There is a race condition that exists between the driver and the nvme transport on remote port unregister vs the confirmed deletion. It's possible that the driver may rediscover the remote port and reregister the remote port before a prior unregister delete callback was made (as it rebinded to the prior remoteport structure). However, the driver was coded to expect the callback before seeing the remote port again thus a new registration. The logic results in the driver having an invalid remoteport pointer set. Correct by tracking when waiting for the delete callback. In cases where the ndlp remoteport pointer is updated, it is only cleared when the wait has not been superceded by a prior registration. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:05 -04:00
James Smart	b04744ce52	scsi: lpfc: Fix driver not recovering NVME rports during target link faults During target-side port faults, the driver would not recover all target port logins. This resulted in a loss of nvme device discovery. The driver is coded to wait for all GID_FT requests to complete before restarting discovery. A fault is seen where the outstanding GIT_FT counts are not properly decremented, thus discovery would never start. Another fault was found in the clearing of the gidft_inp counter that would be skipped in this condition. And a third fault found with lpfc_nvme_register_port that would remove a reverence on the ndlp which then allows a node swap on a port address change to prematurely remove the reference and release the ndlp. The following changes are made: - Correct the decrementing of the outstanding GID_FT counters. - In RSCN handling, no longer zero the counter before calling to issue another GID_FT. - No longer remove the reference on the dlp when the ndlp->nrport value is not yet null. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:05 -04:00
James Smart	bf316c7851	scsi: lpfc: Fix WQ/CQ creation for older asic's. The patch to enlarge WQ/CQ creation keys off of an adapter response that indicates support for the larger values. Older adapters return an incorrect response and are limited in size. Thus the adapters fail the WQ creation steps. Augment the WQ sizing checks with a check on the older adapter types and limit them to the restricted sizes. Fixes: `c176ffa084` ("scsi: lpfc: Increase CQ and WQ sizes for SCSI") Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:04 -04:00
James Smart	01466024d2	scsi: lpfc: Fix NULL pointer access in lpfc_nvme_info_show After making remoteport unregister requests, the ndlp nrport pointer was stale. Track when waiting for waiting for unregister completion callback and adjust nldp pointer assignment. Add a few safety checks for NULL pointer values. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:04 -04:00
James Smart	0cdb84ec26	scsi: lpfc: Fix lingering lpfc_wq resource after driver unload After driver unloads, lpfc_wq remains active. The destroy_workqueue calls were not being made in driver unload. Additionally, SLI3 is allocating lpfc_wq resources, but never uses it. Make the destroy_workqueue calls on driver unload. Modify the SLI3 code path no longer allocate lpfc_wq resources. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:03 -04:00
James Smart	59c68eaad7	scsi: lpfc: Fix Abort request WQ selection When running loads that generated aborts, io errors where seen. Turns out the abort requests where not placed on the proper WQ resulting in the errors. Closer inspection inspection of this error also showed improper spinlock api use. Correct the WQ selection policy for the abort requests. Correct spin_lock/spin_lock_irq/spin_lock_irqsave usage. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:02 -04:00
James Smart	2448e48425	scsi: lpfc: Enlarge nvmet asynchronous receive buffer counts Under large io load, the current sizing of asynchronous buffer counts could be exceeded, indicated by a 2885 log message: 2885 Port Status Event: port status reg 0x81800000, port smphr reg 0xc000, error 1=0x52004a01, error 2=0x0 Enlarge the async receive queue size. Allow for a configurable number of buffers to be posted to each RQ, using the new attribute lpfc_nvmet_mrq_post. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:02 -04:00
James Smart	66a210ffb8	scsi: lpfc: Add per io channel NVME IO statistics When debugging various issues, per IO channel IO statistics were useful to understand what was happening. However, many of the stats were on a port basis rather than an io channel basis. Move statistics to an io channel basis. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:01 -04:00
James Smart	f91bc594ba	scsi: lpfc: Correct target queue depth application changes The max_scsicmpl_time parameter can be used to perform scsi cmd queue depth mgmt based on io completion time: the queue depth is reduced to make completion time shorter. However, as soon as an io completes and the completion time is within limits, the code immediately bumps the queue depth limit back up to the target queue depth. Thus the procedure restarts, effectively limiting the usefulness of adjusting queue depth to help completion time. This patch makes the following changes: - Removes the code at io completion that resets the queue depth as soon as within limits. - As the code removed was where the target queue depth was first applied, change target queue depth application so that it occurs when the parameter is changed. - Makes target queue depth a standard parameter: both a module parameter and a sysfs parameter. - Optimizes the command pending count by using atomics rather than locks. - Updates the debugfs nodelist stats to allow better debugging of pending command counts. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:01 -04:00
James Smart	118c0415ee	scsi: lpfc: Fix multiple PRLI completion error path Nodelist entry for SCSI array ends up in UNMAPPED state. This is due to illegal discovery State machine transition because of two PRLIs and the first one failing with LS_RJT. Also, the error path was designed assuming the PRLIs complete in the order they were sent, FCP first, then NVME. In a failing case, the array thinks about the first PRLI (FCP), but issues LS_RJT for the 2nd PRLI immediately. Fix PRLI completion error path for the ordering expectation. Ensure the discovery state machine update is not set until all outstanding PRLIs are complete. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:34:00 -04:00
Shivasharan S	67c5490ace	scsi: megaraid_sas: driver version upgrade Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:33:59 -04:00
Shivasharan S	3239b8cd28	scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs Hardware could time out Fastpath IOs one second earlier than the timeout provided by the host. For non-RAID devices, driver provides timeout value based on OS provided timeout value. Under certain scenarios, if the OS provides a timeout value of 1 second, due to above behavior hardware will timeout immediately. Increase timeout value for non-RAID fastpath IOs by 1 second. Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:33:59 -04:00
Himanshu Jha	3c6c122cfc	scsi: megaraid_sas: Use zeroing memory allocator than allocator/memset Use pci_zalloc_consistent for allocating zeroed memory and remove unnecessary memset function. Done using Coccinelle. Generated by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:33:58 -04:00
Long Li	6b1f8376dc	scsi: netvsc: Use the vmbus function to calculate ring buffer percentage In Vmbus, we have defined a function to calculate available ring buffer percentage to write. Use that function and remove netvsc's private version. [mkp: typo] Signed-off-by: Long Li <longli@microsoft.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:52 -04:00
Long Li	63273cb401	scsi: vmbus: Add function to report available ring buffer to write in total ring size percentage Netvsc has a function to calculate how much ring buffer in percentage is available to write. This function is also useful for storvsc and other vmbus devices. Define a similar function in vmbus to be used by other vmbus devices. Signed-off-by: Long Li <longli@microsoft.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
Jason Yan	b6240a4df0	scsi: libsas: add transport class for ATA devices Now ata devices attached with sas controller do not have transport class, so that we can not see any information of these ata devices in /sys/class/ata_port(or ata_link or ata_device). Add transport class for the ata devices attached with sas controller. The /sys/class directory will show the infomation of the ata devices as follows: localhost:/sys/class # ls ata* ata_device: dev1.0 dev2.0 ata_link: link1 link2 ata_port: ata1 ata2 No functional change of the device scanning and io path. The ata transport class was deleted when destroying the sas devices. Signed-off-by: Jason Yan <yanaijie@huawei.com> CC: Dan Williams <dan.j.williams@intel.com> CC: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
John Garry	c90a0bea4f	scsi: hisi_sas: remove some unneeded structure members This patch removes unneeded structure elements: - hisi_sas_phy.dev_sas_addr: only ever written - Also remove associated function which writes it, hisi_sas_init_add(). - hisi_sas_device.attached_phy: only ever written - Also remove code to set it in hisi_sas_dev_found() Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
John Garry	381ed6c081	scsi: hisi_sas: print device id for errors When we find an erroneous slot completion, to help aid debugging add the device index to the current debug log. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
Xiaofei Tan	327f242fa8	scsi: hisi_sas: check IPTT is valid before using it for v3 hw There is a bug of v3 hw development version. When AXI error happen, hw may return an abnormal CQ that IPTT value is 0xffff. This will cause IPTT out-of-bounds reference. This patch adds a check of IPTT in cq_tasklet_v3_hw() and discards invalid slot. This workaround scheme is just to enhance fault-tolerance of the driver. So, we will apply this scheme for all version of v3 hw, although release version has fixed this SoC bug. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
Xiaofei Tan	3ff0f0b657	scsi: hisi_sas: consolidate command check in hisi_sas_get_ata_protocol() Currently we check the fis->command value in 2 locations in hisi_sas_get_ata_protocol() switch statement. Fix this by consolidating the check for fis->command value to 1 location only. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
Xiang Chen	4f4e21b8ff	scsi: hisi_sas: use dma_zalloc_coherent() This is a warning coming from Coccinelle, and need to use new interface dma_zalloc_coherent() instead of dma_alloc_coherent()/memset(). Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
Xiang Chen	5df41af4b1	scsi: hisi_sas: delete timer when removing hisi_sas driver Delete timer for v1 and v3 hw when removing hisi_sas driver. Signed-off-by: Xiang chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
Xiaofei Tan	6157363091	scsi: hisi_sas: update RAS feature for later revision of v3 HW There is an modification for later revision of v3 hw. More HW errors are reported through RAS interrupt. These errors were originally reported only through MSI. When report to RAS, some combinations are done to port AXI errors and FIFO OMIT errors. For example, each port has 4 AXI errors, and they are combined to one when report to RAS. This patch does two things: 1. Enable RAS interrupt of these errors and handle them in PCI error handlers. 2. Disable MSI interrupts of these errors for this later revision hw. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
Xiang Chen	8b8d665315	scsi: hisi_sas: make SAS address of SATA disks unique When directly connected with SATA disks in different SAS cores, fill SAS address with scsi_host's id to make it's fake SAS address unique. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
Uma Krishnan	d2d354a606	scsi: cxlflash: Handle spurious interrupts The following Oops can occur when there is heavy I/O traffic and the host is reset by a tool such as sg_reset. [c000200fff3fbc90] c00800001690117c process_cmd_doneq+0x104/0x500 [cxlflash] (unreliable) [c000200fff3fbd80] c008000016901648 cxlflash_rrq_irq+0xd0/0x150 [cxlflash] [c000200fff3fbde0] c000000000193130 __handle_irq_event_percpu+0xa0/0x310 [c000200fff3fbea0] c0000000001933d8 handle_irq_event_percpu+0x38/0x90 [c000200fff3fbee0] c000000000193494 handle_irq_event+0x64/0xb0 [c000200fff3fbf10] c000000000198ea0 handle_fasteoi_irq+0xc0/0x230 [c000200fff3fbf40] c00000000019182c generic_handle_irq+0x4c/0x70 [c000200fff3fbf60] c00000000001794c __do_irq+0x7c/0x1c0 [c000200fff3fbf90] c00000000002a390 call_do_irq+0x14/0x24 [c000200e5828fab0] c000000000017b2c do_IRQ+0x9c/0x130 [c000200e5828fb00] c000000000009b04 h_virt_irq_common+0x114/0x120 When a context is reset, the pending commands are flushed and the AFU is notified. Before the AFU handles this request there could be command completion interrupts queued to PHB which are yet to be delivered to the context. In this scenario, a context could receive an interrupt for a command that has been flushed, leading to a possible crash when the memory for the flushed command is accessed. To resolve this problem, a boolean will indicate if the hardware queue is ready to process interrupts or not. This can be evaluated in the interrupt handler before proessing an interrupt. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:51 -04:00
Uma Krishnan	9a597cd4c0	scsi: cxlflash: Remove commmands from pending list on timeout The following Oops can occur if an internal command sent to the AFU does not complete within the timeout: [c000000ff101b810] c008000016020d94 term_mc+0xfc/0x1b0 [cxlflash] [c000000ff101b8a0] c008000016020fb0 term_afu+0x168/0x280 [cxlflash] [c000000ff101b930] c0080000160232ec cxlflash_pci_error_detected+0x184/0x230 [cxlflash] [c000000ff101b9e0] c00800000d95d468 cxl_vphb_error_detected+0x90/0x150[cxl] [c000000ff101ba20] c00800000d95f27c cxl_pci_error_detected+0xa4/0x240 [cxl] [c000000ff101bac0] c00000000003eaf8 eeh_report_error+0xd8/0x1b0 [c000000ff101bb20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170 [c000000ff101bbb0] c00000000003f438 eeh_handle_normal_event+0x198/0x580 [c000000ff101bc60] c00000000003fba4 eeh_handle_event+0x2a4/0x338 [c000000ff101bd10] c0000000000400b8 eeh_event_handler+0x1f8/0x200 [c000000ff101bdc0] c00000000013da48 kthread+0x1a8/0x1b0 [c000000ff101be30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4 When an internal command times out, the command buffer is freed while it is still in the pending commands list of the context. This corrupts the list and when the context is cleaned up, a crash is encountered. To resolve this issue, when an AFU command or TMF command times out, the command should be deleted from the hardware queue pending command list before freeing the buffer. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:50 -04:00
Uma Krishnan	a3feb6ef50	scsi: cxlflash: Synchronize reset and remove ops The following Oops can be encountered if a device removal or system shutdown is initiated while an EEH recovery is in process: [c000000ff2f479c0] c008000015256f18 cxlflash_pci_slot_reset+0xa0/0x100 [cxlflash] [c000000ff2f47a30] c00800000dae22e0 cxl_pci_slot_reset+0x168/0x290 [cxl] [c000000ff2f47ae0] c00000000003ef1c eeh_report_reset+0xec/0x170 [c000000ff2f47b20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170 [c000000ff2f47bb0] c00000000003f80c eeh_handle_normal_event+0x56c/0x580 [c000000ff2f47c60] c00000000003fba4 eeh_handle_event+0x2a4/0x338 [c000000ff2f47d10] c0000000000400b8 eeh_event_handler+0x1f8/0x200 [c000000ff2f47dc0] c00000000013da48 kthread+0x1a8/0x1b0 [c000000ff2f47e30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4 The remove handler frees AFU memory while the EEH recovery is in progress, leading to a race condition. This can result in a crash if the recovery thread tries to access this memory. To resolve this issue, the cxlflash remove handler will evaluate the device state and yield to any active reset or probing threads. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:50 -04:00
Uma Krishnan	07d0c52f87	scsi: cxlflash: Enable OCXL operations This commit enables the OCXL operations for the OCXL devices. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:50 -04:00
Uma Krishnan	9433fb32b7	scsi: cxlflash: Support AFU reset The cxlflash core driver resets the AFU when the master contexts are created in the initialization or recovery paths. Today, the OCXL provider service to perform this operation is pending implementation. To avoid a crash due to a missing fop, log an error once and return success to continue with execution. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:50 -04:00
Uma Krishnan	66ae644b92	scsi: cxlflash: Register for translation errors While enabling a context on the link, a predefined callback can be registered with the OCXL provider services to be notified on translation errors. These errors can in turn be passed back to the user on a read operation. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:50 -04:00
Uma Krishnan	f81face725	scsi: cxlflash: Introduce OCXL context state machine In order to protect the OCXL hardware contexts from getting clobbered, a simple state machine is added to indicate when a context is in open, close or start state. The expected states are validated throughout the code to prevent illegal operations on a context. A mutex is added to protect writes to the context state field. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:50 -04:00
Uma Krishnan	d91dd3a7d1	scsi: cxlflash: Update synchronous interrupt status bits The SISLite specification has been updated to define new synchronous interrupt status bits. These bits are set by the AFU when a given PASID or EA is bad and a synchronous interrupt is triggered. The SISLite header file is updated to support these new bits. Note that there are also some formatting updates to some of the existing bits to allow all of the definitions to line up uniformly. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-04-18 19:32:50 -04:00

1 2 3 4 5 ...

752063 commits