redonkable/alistair23-linux

Author	SHA1	Message	Date
Raghava Aditya Renukunta	97a4e8ac3f	scsi: aacraid: Refactor reset_host store function Refactored the reset_host store function to make consistent across code bases Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta	d1471eb0fa	scsi: aacraid: Allow reset_host sysfs var to recover Panicked Fw It is possible to restart the controller via the use of the reset_host sysfs variable. This does work for controllers that can no longer respond, since driver will attempt to send down a shutdown in this path. Check if the controller is able to receive commands before sending down a shutdown Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta	f3a2327725	scsi: aacraid: Fix ioctl reset hang Driver would hang when attempting to send reset from the ioctl interface, since it would wait to retrieve the ioctl mutex at send shutdown. Set adapter shutdown and unlock mutex before sending down reset request. Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta	95900629fa	scsi: aacraid: Do not remove offlined devices As part of the recovery process, the drivers removes offline devices ( done by the kernel) and then tries to add them back in the rescan code. Removing the device is like taking a sledgehammer to a nail. Set the device as running if it is marked offline. Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta	c5313ae8e4	scsi: aacraid: Fix hang in kdump Driver attempts to perform a device scan and device add after coming out of reset. At times when the kdump kernel loads and it tries to perform eh recovery, the device scan hangs since its commands are blocked because of the eh recovery. This should have shown up in normal eh recovery path (Should have been obvious) Remove the code that performs scanning.I can live without the rescanning support in the stable kernels but a hanging kdump/eh recovery needs to be fixed. Fixes: `a2d0321dd5` (scsi: aacraid: Reload offlined drives after controller reset) Cc: <stable@vger.kernel.org> Reported-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Fixes: `a2d0321dd5` (scsi: aacraid: Reload offlined drives after controller reset) Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta	dfb92a1f93	scsi: aacraid: Do not attempt abort when Fw panicked Check if the adapter can receive abort requests, before sending aborts Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta	f4e8708d31	scsi: aacraid: Fix udev inquiry race condition When udev requests for a devices inquiry string, it might create multiple threads causing a race condition on the shared inquiry resource string. Created a buffer with the string for each thread. Cc: <stable@vger.kernel.org> Fixes: `3bc8070fb7` ([SCSI] aacraid: SMC vendor identification) Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 23:26:41 -05:00
Randy Dunlap	ccd4a43035	scsi: doc: fix iscsi-related kernel-doc warnings Fix kernel-doc warnings in drivers/scsi/ that are related to iscsi support interfaces. Fixes these kernel-doc warnings: (tested by adding these files to a new target.rst documentation file: WIP) ../drivers/scsi/libiscsi.c:2740: warning: No description found for parameter 'dd_size' ../drivers/scsi/libiscsi.c:2740: warning: No description found for parameter 'id' ../drivers/scsi/libiscsi.c:2961: warning: No description found for parameter 'cls_conn' ../drivers/scsi/iscsi_tcp.c:313: warning: No description found for parameter 'conn' ../drivers/scsi/iscsi_tcp.c:363: warning: No description found for parameter 'conn' ../drivers/scsi/libiscsi_tcp.c:810: warning: No description found for parameter 'tcp_conn' ../drivers/scsi/libiscsi_tcp.c:810: warning: No description found for parameter 'segment' ../drivers/scsi/libiscsi_tcp.c:887: warning: No description found for parameter 'offloaded' ../drivers/scsi/libiscsi_tcp.c:887: warning: No description found for parameter 'status' ../drivers/scsi/libiscsi_tcp.c:887: warning: Excess function parameter 'offload' description in 'iscsi_tcp_recv_skb' ../drivers/scsi/libiscsi_tcp.c:964: warning: Excess function parameter 'conn' description in 'iscsi_tcp_task_init' ../drivers/scsi/libiscsi_tcp.c:964: warning: Excess function parameter 'sc' description in 'iscsi_tcp_task_init' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: linux-scsi@vger.kernel.org Cc: target-devel@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-rdma@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 23:10:06 -05:00
Chaitra P B	f49d4aed13	scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag. 1. In IO path, setting of "ATA command pending" flag early before device removal, invalid device handle etc., checks causes any new commands to be always returned with SAM_STAT_BUSY and when the driver removes the drive the SML issues SYNC Cache command and that command is always returned with SAM_STAT_BUSY and thus making SYNC Cache command to requeued. 2. If the driver gets an ATA PT command for a SATA drive then the driver set "ATA command pending" flag in device specific data structure not to allow any further commands until the ATA PT command is completed. However, after setting the flag if the driver decides to return the command back to upper layers without actually issuing to the firmware (i.e., returns from qcmd failure return paths) then the corresponding flag is not cleared and this prevents the driver from sending any new commands to the drive. This patch fixes above two issues by setting of "ATA command pending" flag after checking for whether device deleted, invalid device handle, device busy with task management. And by setting "ATA command pending" flag to false in all of the qcmd failure return paths after setting the flag. Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com> Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 23:08:01 -05:00
Colin Ian King	8fd03fd17f	scsi: lpfc: fix a couple of minor indentation issues Several statements are indented too far, fix these Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 22:53:46 -05:00
Colin Ian King	5c665aeb65	scsi: lpfc: don't dereference localport before it has been null checked localport is being dereferenced to assign lport and then immediately afterwards localport is being sanity checked to see if it is null. Fix this by only dereferencing localport until after it has been null checked. Detected by CoverityScan, CID#1463038 ("Dereference before null check") Fixes: 3a8cefbfc5ee ("scsi: lpfc: Beef up stat counters for debug") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 22:52:43 -05:00
James Smart	cc019a5a3b	scsi: scsi_transport_fc: fix typos on 64/128 GBit define names The define names specified 64Bit/128Bit, not 64GBIT/128GBIT. Correct the names. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 22:51:02 -05:00
Andy Shevchenko	9ea4e076bd	scsi: libsas: remove private hex2bin() implementation The function sas_parse_addr() could be easily substituted by hex2bin() which is in kernel library code. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 22:40:52 -05:00
Rafael David Tinoco	d754941225	scsi: libiscsi: Allow sd_shutdown on bad transport If, for any reason, userland shuts down iscsi transport interfaces before proper logouts - like when logging in to LUNs manually, without logging out on server shutdown, or when automated scripts can't umount/logout from logged LUNs - kernel will hang forever on its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all still existent paths. PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow" #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c This happens because iscsi_eh_cmd_timed_out(), the transport layer timeout helper, would tell the queue timeout function (scsi_times_out) to reset the request timer over and over, until the session state is back to logged in state. Unfortunately, during server shutdown, this might never happen again. Other option would be "not to handle" the issue in the transport layer. That would trigger the error handler logic, which would also need the session state to be logged in again. Best option, for such case, is to tell upper layers that the command was handled during the transport layer error handler helper, marking it as DID_NO_CONNECT, which will allow completion and inform about the problem. After the session was marked as ISCSI_STATE_FAILED, due to the first timeout during the server shutdown phase, all subsequent cmds will fail to be queued, allowing upper logic to fail faster. Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-01-03 22:37:41 -05:00
James Smart	b996ce3996	scsi: lpfc: correct sg_seg_cnt attribute min vs default Prior patch mixed up what argument in the macro was what, so min value was placed as the "default" argument, and the default value was placed as the "min" argument. Thus, when the default was applied, it looked like the default was smaller than the allowed min. Swap argument postions to correct. [mkp: fixed checkpatch warning] Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:27:30 -05:00
Himanshu Madhani	62aa281470	scsi: qla2xxx: Fix smatch warning in qla25xx_delete_{rsp\|req}_que This patch fixes following warnings reported by smatch: drivers/scsi/qla2xxx/qla_mid.c:586 qla25xx_delete_req_que() error: we previously assumed 'req' could be null (see line 580) drivers/scsi/qla2xxx/qla_mid.c:602 qla25xx_delete_rsp_que() error: we previously assumed 'rsp' could be null (see line 596) Fixes: `7867b98dce` ("scsi: qla2xxx: Fix memory leak in dual/target mode") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:52 -05:00
Jia-Ju Bai	b128458876	scsi: qedi: Fix a possible sleep-in-atomic bug in qedi_process_tmf_resp The driver may sleep under a spinlock. The function call path is: qedi_cpu_offline (acquire the spinlock) qedi_fp_process_cqes qedi_mtask_completion qedi_process_tmf_resp kzalloc(GFP_KERNEL) --> may sleep To fix it, GFP_KERNEL is replaced with GFP_ATOMIC. This bug is found by my static analysis tool(DSAC) and checked by my code review. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:51 -05:00
Ching Huang	6ae9abe0bd	scsi: arcmsr: simplify arcmsr_request_device_map routine Simplify arcmsr_request_device_map routine. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:51 -05:00
Ching Huang	1e9c81080d	scsi: arcmsr: simplify all arcmsr_hbaX_get_config routine by call a new get_adapter_config function Simplify all arcmsr_hbaX_get_config routine by call a new get_adapter_config function. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:50 -05:00
Ching Huang	22c4ae5b99	scsi: arcmsr: simplify arcmsr_hbaE_get_config function Simplify arcmsr_hbaE_get_config function. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:50 -05:00
Ching Huang	b6b3084acb	scsi: arcmsr: waiting for iop firmware ready before issue get_config command to iop Waiting for iop firmware ready before issue get_config command to iop for adapter type A and D. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:49 -05:00
Ching Huang	df9f0ee9d5	scsi: arcmsr: simplify arcmsr_hbaC_get_config function Simplify arcmsr_hbaC_get_config function. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:49 -05:00
James Smart	2f7005debe	scsi: lpfc: update driver version to 11.4.0.6 Update the driver version to 11.4.0.6 Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:48 -05:00
James Smart	4b056682d8	scsi: lpfc: Beef up stat counters for debug If log verbose in not turned on, its hard to tell when certain error paths get hit. Add stats counters and corresponding logic to debugfs/sysfs to aid understanding what paths were traversed. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:48 -05:00
James Smart	3fd78355cd	scsi: lpfc: Fix infinite wait when driver unregisters a remote NVME port. When unregistering a remote port the lpfc driver would eventually wait for the remoteport_unreg done callback. But the driver never completed the io aborts that would allow the connections to terminate thus the unreg done callback was never issued. Turns out the coding style of the driver allowed for the wait to occur on the same cpu that the deferred isr is called on. The blocking for the wait, blocked the isr, and as the isr didn't run, the io aborts wouldn't finish. Turns out there was never a good reason to block waiting for the unreg done in the first place. The driver can continue execution and the ref counting within the driver will do the right thing. Resolve by removing the wait and patching up a few cases where the ref counting didn't look right - mainly cases where the remote port comes back before the aborts had completed and the unreg done had been called. Additionally, a few places which used pointer values to guide driver actions weren't protected by lock, so correct those. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:47 -05:00
James Smart	e06351a002	scsi: lpfc: Fix issues connecting with nvme initiator In the lpfc discovery engine, when as a nvme target, where the driver was performing mailbox io with the adapter for port login when a NVME PRLI is received from the host. Rather than queue and eventually get back to sending a response after the mailbox traffic, the driver rejected the io with an error response. Turns out this particular initiator didn't like the rejection values (unable to process command/command in progress) so it never attempted a retry of the PRLI. Thus the host never established nvme connectivity with the lpfc target. By changing the rejection values (to Logical Busy/nothing more), the initiator accepted the response and would retry the PRLI, resulting in nvme connectivity. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:47 -05:00
James Smart	9de416ac67	scsi: lpfc: Fix SCSI LUN discovery when SCSI and NVME enabled When enabled for both SCSI and NVME support, and connected pt2pt to a SCSI only target, the driver nodelist entry for the remote port is left in PRLI_ISSUE state and no SCSI LUNs are discovered. Works fine if only configured for SCSI support. Error was due to some of the prli points still reflecting the need to send only 1 PRLI. On a lot of fabric configs, targets were NVME only, which meant the fabric-reported protocol attributes were only telling the driver one protocol or the other. Thus things worked fine. With pt2pt, the driver must send a PRLI for both protocols as there are no hints on what the target supports. Thus pt2pt targets were hitting the multiple PRLI issues. Complete the dual PRLI support. Track explicitly whether scsi (fcp) or nvme prli's have been sent. Accurately track protocol support detected on each node as reported by the fabric or probed by PRLI traffic. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:46 -05:00
James Smart	a51e41b671	scsi: lpfc: Increase SCSI CQ and WQ sizes. Increased the sizes of the SCSI WQ's and CQ's so that SCSI operation is similar to that used by NVME. However, size increase restricted only to those newer adapters that can support the larger WQE size, thus bigger queue sizes. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:46 -05:00
James Smart	b95e29b75d	scsi: lpfc: Fix receive PRLI handling Handling a rcv'ed PRLI incorrectly can cause the ndlp to end up in the wrong state or the driver to ACC and PRLI when it should send LS_RJT. The cause was due to the driver not properly looking at the PRLI type and taking the multiple protocol support into consideration. Resolved by adding checks in the various PRLI receive points to validate PRLI type and reject if not valid for the enabled protocols and mode (host vs target). Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:45 -05:00
James Smart	cbc5de1b8a	scsi: lpfc: Fix -EOVERFLOW behavior for NVMET and defer_rcv The driver is all set to handle the defer_rcv api for the nvmet_fc transport, yet didn't properly recognize the return status when the defer_rcv occurred. The driver treated it simply as an error and aborted the io. Several residual issues occurred at that point. Finish the defer_rcv support: recognize the return status when the io request is being handled in a deferred style. This stops the rogue aborts; Replenish the async cmd rcv buffer in the deferred receive if needed. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:45 -05:00
James Smart	cf1a1d3e2d	scsi: lpfc: Fix random heartbeat timeouts during heavy IO NVME targets appear to randomly disconnect from the initiator when running heavy IO. The error is due to the host aggregate (across all controllers) io load was beyond the maximum exchange count for nvme on the adapter. The driver was properly returning a resource busy status, but the io load was so great heartbeat commands would be bounced and not have a successful retry within the fuzz amount for the nvme heartbeat (yes, a very high io load!). Thus the target was terminating the controller due to a keep alive failure. Resolve by reserving a few exchanges (by counters) which can be used when the adapter is out of normal exchanges and the command is a NVME heartbeat command. As counters are used, while the reserved command is outstanding, as soon as any other exchange completes, the counters are adjusted and the reserved count is replenished. The heartbeat completes execution in a normal fashion. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:44 -05:00
Xiang Chen	4d0951ee70	scsi: hisi_sas: add v3 hw suspend and resume For v3 hw SAS, it supports configuring power state from D0 to D3 for entering Low Power status and power state from D3 to D0 for quit Low Power status. When power state from D0 to D3, HW will send FLR to clear the registers of ECAM and BAR space, and when power state from D3 to D0, it will clear the registers of ECAM space only. So when suspend, need to do like controller reset (including disable interrupts/DQ/PHY/BUS), and also release slots after FLR. When resume, re-config the registers of BAR space. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:44 -05:00
Xiang Chen	336bd78bda	scsi: hisi_sas: re-add the lldd_port_deformed() In function sas_suspend_devices(), it requires callback lldd_port_deformed callback to be implemented if lldd_port_deformed is implemented. So add a stub for lldd_port_deformed. Callback lldd_port_deformed was not required as the port deformation is done elsewhere in the LLDD. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:43 -05:00
Xiang Chen	9960a24a1c	scsi: hisi_sas: fix SAS_QUEUE_FULL problem while running IO This patch fix SAS_QUEUE_FULL problem. The test situation is close port while running IO. In sas_eh_handle_sas_errors(), SCSI EH will free sas_task of the device if lldd_I_T_nexus_reset() return TMF_RESP_FUNC_COMPLETE or -ENODEV. But in our SAS driver, we only free slots of the device when the return value is TMF_RESP_FUNC_COMPLETE. So if the return value is -ENODEV, the slot resource will not free any more. As an solution, we should also free slots of the device in lldd_I_T_nexus_reset() if the return value is -ENODEV. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:13 -05:00
Xiaofei Tan	2a03813123	scsi: hisi_sas: add internal abort dev in some places We should do internal abort dev before TMF_ABORT_TASK_SET and TMF_LU_RESET. Because we may only have done internal abort for single IO in the earlier part of SCSI EH process. Even the internal abort to the single IO, we also don't know whether it is successful. Besides, we should release slots of the device in hisi_sas_abort_task_set() if the abort is successful. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:03 -05:00
Xiaofei Tan	813709f2e1	scsi: hisi_sas: judge result of internal abort Normally, hardware should ensure that internal abort timeout will never happen. If happen, it would be an SoC failure. What's more, HW will not process any other commands if an internal abort hasn't return CQ, and they will time out also. So, we should judge the result of internal abort in SCSI EH, if it is failed, we should give up to do TMF/softreset and return failure to the upper layer directly. This patch do following things to achieve this: 1. When internal abort timeout happened, we set return value to -EIO in hisi_sas_internal_task_abort(). 2. If prep_abort() is not support, let hisi_sas_internal_task_abort() return TMF_RESP_FUNC_FAILED. 3. If hisi_sas_internal_task_abort() return an negative number, it can be thought that it not executed properly or internal abort timeout. Then we won't do behind TMF or softreset, and return failure directly. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:03 -05:00
Xiaofei Tan	057c3d1f07	scsi: hisi_sas: do link reset for some CHL_INT2 ints We should do link reset of PHY when identify timeout or STP link timeout. They are internal events of SOC and are notified to driver through interrupts of CHL_INT2. Besides, we should add an delay work to do link reset as it needs sleep. So, this patch add an new PHY event HISI_PHYE_LINK_RESET for this. Notes: v2 HW doesn't report the event of STP link timeout. So, we only need to handle event of identify timeout for v2 HW. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:03 -05:00
Xiaofei Tan	e537b62b07	scsi: hisi_sas: use an general way to delay PHY work Use an general way to do delay work for a PHY. Then it will be easier to add new delayed work for a PHY in future. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:03 -05:00
Xiaofei Tan	72f7fc3050	scsi: hisi_sas: add v2 hw port AXI error handling support Add port AXI errors handling for v2 hw. We do host controller reset for such errors. Besides, change port muli-bits ECC error handling, and we should also do host reset for such error. So, this patch put them in the same struct with port AXI error. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:03 -05:00
Xiaofei Tan	f64715d283	scsi: hisi_sas: improve int_chnl_int_v2_hw() consistency with v3 hw Change code format of int_chnl_int_v2_hw() to be consistent with v3 hw to reduce an tag indent. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiang Chen	f1c8821145	scsi: hisi_sas: add some print to enhance debugging Add some print at some places such as error info and cq of exception IO, device found etc, and also adjust some log levels. All this to assist debugging ability. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiaofei Tan	1aaf81e0e3	scsi: hisi_sas: add RAS feature for v3 hw We use PCIe AER to support RAS feature for v3 hw. This driver should do following two things to support this: 1. Enable RAS interrupts, so that errors can be reported to RAS module. 2. Realize err_handler for sas_v3_pci_driver. Then if non-fatal error is detected, print error source and try to recover SAS controller. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiang Chen	9f347b2fac	scsi: hisi_sas: change ncq process for v3 hw For v3 hw, each NCQ will return a CQ, so it is no need to acquire IPTT from ITCT, just acquire it from IPTT field of CQ. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiaofei Tan	e402acdb66	scsi: hisi_sas: add an mechanism to do reset work synchronously Sometimes it is required to know when the controller reset has completed and also if it has completed successfully. For such places, we call hisi_sas_controller_reset() directly before. That may lead to multiple calls to this function. This patch create a per-reset structure which contains a completion structure and status flag to know when the reset completes and also the status. It is also in hisi_hba.wq to do reset work. As all host reset works are done in hisi_hba.wq, we don't worry multiple calls to hisi_sas_controller_reset(). Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiang Chen	f8e45ec226	scsi: hisi_sas: modify hisi_sas_dev_gone() for reset Do a couple of changes for when HISI_SAS_RESET_BIT is set for HBA: - Clearing ITCT is not necessary - Remove internal abort as it will fail during reset Flag sas_dev->dev_type is kept as SAS_PHY_UNUSED. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiaofei Tan	fb51e7a8d3	scsi: hisi_sas: some optimizations of host controller reset This patch do following optimizations to host controller reset: 1. Unblock scsi requests before rescanning topology, as SCSI command need be used if new device is found during rescanning topology. 2. Remove drain_workqueue(hisi_hba->wq) and drain_workqueue(shost->work_q), as there is no need to ensure that all PHYs event are done before exiting host reset. 3. Improve message print level of host reset. Host reset is an important and very few occurrence event. We should know its progress even when not debugging. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiaofei Tan	a669bdbf49	scsi: hisi_sas: optimise port id refresh function Currently refreshing the PHY port id after reset is done in the rescan topology function, which is quite late in the reset process. It could be moved earlier in the process, as the port id can be refreshed once the PHYs become ready. In addition to this, we should set the hisi_sas_dev port id to 0xff (invalid port id) if all PHYs of this port remain down for the same device. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiaofei Tan	0258141aaa	scsi: hisi_sas: relocate clearing ITCT and freeing device In certain scenarios we may just want to clear the ITCT for a device, and not free other resources like the SATA bitmap using in v2 hw. To facilitate this, this patch relocates the code of clearing ITCT from free_device() to a new hw interface clear_itct(). Then for some hw, we should not realise free_device() if there's nothing left to do for it. [mkp: typo] Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiang Chen	dc1e4730e2	scsi: hisi_sas: fix dma_unmap_sg() parameter For function dma_unmap_sg(), the <nents> parameter should be number of elements in the scatterlist prior to the mapping, not after the mapping. Fix this usage. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00
Xiang Chen	39bade0c9f	scsi: hisi_sas: initialize dq spinlock before use It is required to initialize the dq spinlock before use, which was not being done, so fix it. This issue can be detected when CONFIG_DEBUG_SPINLOCK is enabled. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-14 21:25:02 -05:00

1 2 3 4 5 ...

15894 commits