redonkable/alistair23-linux

Author	SHA1	Message	Date
James Smart	f44ac12f1d	scsi: lpfc: Memory allocation error during driver start-up on power8 The driver fails to allocate command buffers in the routine lpfc_new_scsi_buf_s4 There is an inconsistency between lpfc_mem_alloc(), where the phba->lpfc_sg_dma_buf_pool is created, and lpfc_new_scsi_buf_s4(), when we allocate a buffer from the pool and check the alignment. The alignment should be on a page boundary, based on LPFC_SLI3_BG_ENABLED in sli3_options, for both cases. Fix by explicitly tracking sli4 vs sli3 and BG options. The result is that phba->cfg_sg_dma_buf_size is now set correctly for SLI-4. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-03-12 21:55:24 -04:00
James Smart	e29d74f8eb	scsi: lpfc: Fix mailbox wait for POST_SGL mbox command POST_SGL_PAGES mailbox command failed with status (timeout). wait_event_interruptible_timeout when called from mailbox wait interface, gets interrupted, and will randomly fail. Behavior seems very specific to 1 particular server type. Fix by changing from wait_event_interruptible_timeout to wait_for_completion_timeout. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-03-12 21:55:24 -04:00
James Smart	205e8240a1	scsi: lpfc: Code cleanup for 128byte wqe data type The driver is very sloppy about the WQE structure passed between routines. The base struct type is a 64byte wqe. But in many routines they typecast and access 128byte wqes. There were a couple of cases in the past (corrected already) where the typecasts were incorrectly done and the 64byte buffer was accessed as a 128 byte buffer. Clean this up by properly declaring wqe's as 128byte wqe's and removing the typecasts. 64byte wqes are considered a subset of the 128byte wqes. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-03-12 21:55:23 -04:00
James Smart	4c06619fc4	scsi: lpfc: use __raw_writeX on DPP copies Commit `1351e69fc6` ("scsi: lpfc: Add push-to-adapter support to sli4") fails compilation on some 32-bit systems as writeq() is not supported on all architectures. Additionally, it was pointed out that as writeX() does byteswapping if necessary for pci vs the cpu endianness, the code was broken on BE PPC. After discussions with Arnd Bergmann, we've resolved the issue to the following: Instead of writeX(), use __raw_writeX() - which writes to io space while preserving byte order. To use this, the code was changed to use a different buffer that lpfc prepped via sli_pcimem_bcopy() that was set to the bytestream to be written. On platforms with __raw_writeq support, use the routine, otherwise use __raw_writel() [mkp: checkpatch] Fixes: `1351e69fc6` ("scsi: lpfc: Add push-to-adapter support to sli4") Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-03-06 21:25:39 -05:00
James Smart	4e565cf041	scsi: lpfc: Work around NVME cmd iu SGL type The hardware offload for NVME commands was created when the FC-NVME standard was setting SGL Descriptor Type to SGL Data Block Descriptor (0h) and SGL Descriptor Sub Type to Address (0h). A late change in NVMe-over-Fabrics obsoleted these values, creating a transport SGL descriptor type with new values to go into these fields. For initial hardware support, in order to be compliant to the spec, use host-supplied cmd IU buffers instead of the adapter generated values. Later hardware will correct this. Add a module parameter to override this offload disablement if looking for lowest latency. This is reasonable as nothing in FC-NVME uses the SQE SGL values. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-22 20:39:29 -05:00
James Smart	0bc2b7c531	scsi: lpfc: Add embedded data pointers for enhanced performance The current driver isn't taking advantage of a performance hint whereby the initial data buffer descriptor can be placed in the WQE as well as the SGL. Add the logic to detect support for the feature and to use it when supported. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-22 20:39:29 -05:00
James Smart	1feb8204a1	scsi: lpfc: Enable fw download on if_type=6 devices Current code is very explicit in what it allows to be downloaded. The driver checking prevented G7 firmware download. The driver checking is unnecessary as the device will validate what it receives. Revise the firmware download interface checking. Added a little debug support in case there is still a failure. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-22 20:39:29 -05:00
James Smart	7365f6fdbb	scsi: lpfc: Add if_type=6 support for cycling valid bits Traditional SLI4 required the driver to clear Valid bits on EQEs and CQEs after consuming them. The new if_type=6 hardware will cycle the value for what is valid on each queue itteration. The driver no longer has to touch the valid bits. This also means all the cpu cache dirtying and perhaps flush/refill's done by the hardware in accessing the EQ/CQ elements is eliminated. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-22 20:39:29 -05:00
James Smart	1351e69fc6	scsi: lpfc: Add push-to-adapter support to sli4 New if_type=6 adapters support an additional BAR that provides apertures to allow direct WQE to adapter push support - termed Direct Packet Push (DPP). WQ creation differs slightly to ask for a WQ to be DPP-ized. When submitting a WQE to a DPP WQ, it is submitted to the host memory for the WQ normally, but is also written by the host cpu directly to a BAR aperture. Write buffer coalescing in hardware is (hopefully) turned on, enabling single pci write operation support. The doorbell is thing rung to indicate the WQE is available and was pushed to the aperture. This patch: - Updates the WQ Create commands for the DPP options - Adds the bar mapping for if_type=6 DPP bar - Adds the WQE pushing to the DDP aperture received from WQ create - Adds a new module parameter to disable DPP operation if desired. Default is enabled. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-22 20:39:29 -05:00
James Smart	27d6ac0a6e	scsi: lpfc: Add SLI-4 if_type=6 support to the code base New hardware supports a SLI-4 interface, but with a new if_type variant of 6. If_type=6 has a different PCI BAR map, separate EQ/CQ doorbells, and some changes in doorbell formats. Add the changes for the if_type into headers, adapter initialization and control flows. Add new eq and cq handlers. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-22 20:39:28 -05:00
James Smart	9dd35425a5	scsi: lpfc: Rework sli4 doorbell infrastructure Up until now, all SLI-4 devices had the same doorbells at the same bar locations. With newer hardware, there are now independent EQ and CQ doorbells and the bar locations differ. Prepare the code for new hardware by separating the eq/cq doorbell into separate components. The components can be set based on if_type. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-22 20:39:28 -05:00
James Smart	b71413dd01	scsi: lpfc: Rework lpfc to allow different sli4 cq and eq handlers Up until now, an SLI-4 device had no variance in the way it handled its EQs and CQs. With newer hardware, there are now differences in doorbells and some differences in how entries are valid. Prepare the code for new hardware by creating a sli4-based callout table that can be set based on if_type. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-22 20:39:28 -05:00
James Smart	128bddacc4	scsi: lpfc: Update 11.4.0.7 modified files for 2018 Copyright Updated Copyright in files updated 11.4.0.7 Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-12 11:43:24 -05:00
James Smart	c1dd9111b7	scsi: lpfc: Fix SCSI io host reset causing kernel crash During SCSI error handling escalation to host reset, the SCSI io routines were moved off the txcmplq, but the individual io's ON_CMPLQ flag wasn't cleared. Thus, a background thread saw the io and attempted to access it as if on the txcmplq. Clear the flag upon removal. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-12 11:43:23 -05:00
James Smart	411de511c6	scsi: lpfc: Fix RQ empty firmware trap When nvme target deferred receive logic waits for exchange resources, the corresponding receive buffer is not replenished with the hardware. This can result in a lack of asynchronous receive buffer resources in the hardware, resulting in a "2885 Port Status Event: ... error 1=0x52004a01 ..." message. Correct by replenishing the buffer whenenver the deferred logic kicks in. Update corresponding debug messages and statistics as well. [mkp: applied by hand] Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-12 11:43:23 -05:00
James Smart	6e8e1c14c6	scsi: lpfc: Add WQ Full Logic for NVME Target I/O conditions on the nvme target may have the driver submitting to a full hardware wq. The hardware wq is a shared resource among all nvme controllers. When the driver hit a full wq, it failed the io posting back to the nvme-fc transport, which then escalated it into errors. Correct by maintaining a sideband queue within the driver that is added to when the WQ full condition is hit, and drained from as soon as new WQ space opens up. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-12 11:43:23 -05:00
James Smart	c176ffa084	scsi: lpfc: Increase CQ and WQ sizes for SCSI Increased CQ and WQ sizes for SCSI FCP, matching those used for NVMe development. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-12 11:43:22 -05:00
James Smart	04673e38f5	scsi: lpfc: Fix frequency of Release WQE CQEs The driver controls when the hardware sends completions that communicate consumption of elements from the WQ. This is done by setting a WQEC bit on a WQE. The current driver sets it on every Nth WQE posting. However, the driver isn't clearing the bit if the WQE is reused. Thus, if the queue depth isn't evenly divisible by N, with enough time, it can be set on every element, creating a lot of overhead and risking CQ full conditions. Correct by clearing the bit when not setting it on an Nth element. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-02-12 11:43:22 -05:00
James Smart	cbc5de1b8a	scsi: lpfc: Fix -EOVERFLOW behavior for NVMET and defer_rcv The driver is all set to handle the defer_rcv api for the nvmet_fc transport, yet didn't properly recognize the return status when the defer_rcv occurred. The driver treated it simply as an error and aborted the io. Several residual issues occurred at that point. Finish the defer_rcv support: recognize the return status when the io request is being handled in a deferred style. This stops the rogue aborts; Replenish the async cmd rcv buffer in the deferred receive if needed. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-20 21:11:45 -05:00
James Smart	422c4cb7e9	scsi: lpfc: Fix NVME LS abort_xri Performing an LS abort results in the following message being seen: 0603 Invalid CQ subtype 6: 00000300 22000002 ffff0016 d0050000 and the associated exchange is not properly freed. The code did not recognize the exchange type that was aborted, thus it was not properly handled. Correct by adding the NVME LS ELS type to the exchange types that are recognized. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-04 20:32:54 -05:00
James Smart	8a5ca109a3	scsi: lpfc: Handle XRI_ABORTED_CQE in soft IRQ XRI_ABORTED_CQE completions were not being handled in the fast path. They were being queued and deferred to the lpfc worker thread for processing. This is an artifact of the driver design prior to moving queue processing out of the isr and into a workq element. Now that queue processing is already in a deferred context, remove this artifact and process them directly. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-04 20:32:53 -05:00
James Smart	81b96eda5f	scsi: lpfc: Expand WQE capability of every NVME hardware queue Hardware queues are a fast staging area to push commands into the adapter. The adapter should drain them extremely quickly. However, under heavy io load, the host cpu is pushing commands faster than the drain rate of the adapter causing the driver to resource busy commands. Enlarge the hardware queue (wq & cq) to support a larger number of queue entries (4x the prior size) before backpressure. Enlarging the queue requires larger contiguous buffers (16k) per logical page for the hardware. This changed calling sequences that were expecting 4K page sizes that now must pass a parameter with the page sizes. It also required use of a new version of an adapter command that can vary the page size values. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-04 20:32:53 -05:00
Linus Torvalds	670ffccb2f	SCSI misc on 20171114 This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor updates. There's no major behaviour change or additions to the core in all of this, so the potential for regressions should be small (biggest potential being in the scsi error handler changes). Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJaCxtCAAoJEAVr7HOZEZN4d9EQAI+OHP6ss6zjKKC21c9jNPcH NhLrNv37gHg/LA2VXeUEL9RGUjCGLIUrI4HsrxzkFAMLKP4TkshMs8/2RvczY+Sa VpayPqVybEKLIS6ipQyM1SLIQff2nvtDVcN/T+8z1lkk45TrbA6ZGuwUwd2aJyEA 2V2wtg51ObnL0Nr9QPPll0JrtL1AnCZyRlu9XrwTZuuSBZwk93opIuuvbZm/3dVg Ir4GSS4Y+PuHIfu4cxqdsPMdzRdY9I2me1YiE4jeFSn1/VTAjL4HBz7fO9eITT42 VhXSpDz1XvFsa9dJ0ubkqoALpJzCfOcBw+EuGvSydLEvOBoEVwMccdfaD9lT1zc5 L9e1Z5qqJoq7hTA6xTXCYfWG73I9HYvljtmc8yudKHhADOdnSTUXhaO6uBF0RNqD OxPSA1RZwRx3c6lDOcK6BTtvLAkTEuYKdrWSKJi0w+QXJAyQ6etqbmsKpmPdRim7 Z4ZSpJFro2gyo9gcdJO0ykTG+z3U7Z/ay1sNgnuprsv+eU/QjUdlAPl18o79EkRf H54zZggZ4wC6q/cFVVt4Vx+V+oqIeu38s7NDXS9UltLoTZPm2EzDW6pXd/38Z4Tf a1oBAUET8kYLC90P8sVZxUIHZjITlpgDbyE2Lq00PMYXhk8S4IxF0aMN5RvVqzUv +7N2HrHkSSgG1nhw1t+E =3O85 -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor updates. There's no major behaviour change or additions to the core in all of this, so the potential for regressions should be small (biggest potential being in the scsi error handler changes)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits) scsi: lpfc: Fix hard lock up NMI in els timeout handling. scsi: mpt3sas: remove a stray KERN_INFO scsi: mpt3sas: cleanup _scsih_pcie_enumeration_event() scsi: aacraid: use timespec64 instead of timeval scsi: scsi_transport_fc: add 64GBIT and 128GBIT port speed definitions scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair() scsi: mpt3sas: fix dma_addr_t casts scsi: be2iscsi: Use kasprintf scsi: storvsc: Avoid excessive host scan on controller change scsi: lpfc: fix kzalloc-simple.cocci warnings scsi: mpt3sas: Update mpt3sas driver version. scsi: mpt3sas: Fix sparse warnings scsi: mpt3sas: Fix nvme drives checking for tlr. scsi: mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info scsi: mpt3sas: Add-Task-management-debug-info-for-NVMe-drives. scsi: mpt3sas: scan and add nvme device after controller reset scsi: mpt3sas: Set NVMe device queue depth as 128 scsi: mpt3sas: Handle NVMe PCIe device related events generated from firmware. scsi: mpt3sas: API's to remove nvme drive from sml scsi: mpt3sas: API 's to support NVMe drive addition to SML ...	2017-11-14 16:23:44 -08:00
Dick Kennedy	341b2aa833	scsi: lpfc: Fix hard lock up NMI in els timeout handling. System crashed due to a hard lockup at lpfc_els_timeout_handler+0x128. The els ring's txcmplq list is corrupted: the last element in the list does not point back the the head causing a loop. Issue is the els processing path for sli4 hbas are using the hbalock instead of the ring_lock for removing elements from the txcmplq list. Use the adapter SLI_REV to determine which lock should be used for removing iocbqs from the els rings txcmplq. note: the future refactoring will address this so that we don't have this ugly type-based lock code. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-11-08 18:25:12 -05:00
Kees Cook	f22eb4d31c	scsi: lpfc: Convert timers to use timer_setup() In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: James Smart <james.smart@broadcom.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-11-01 11:27:07 -07:00
Dick Kennedy	8e036a9497	scsi: lpfc: Fix FCP hba_wqidx assignment The driver is encountering oops in lpfc_sli_calc_ring. The driver is setting hba_wqidx for FCP based on the policy in use for NVME. The two may not be the same. Change to set the wqidx based on the FCP policy. Cc: <stable@vger.kernel.org> # 4.12+ Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-10-02 22:46:36 -04:00
Dick Kennedy	f485c18db2	scsi: lpfc: Move CQ processing to a soft IRQ Under heavy target nvme load duration, the lpfc irq handler is encountering cpu lockup warnings. Convert the driver to a shortened ISR handler which identifies the interrupting condition then schedules a workq thread to process the completion queue the interrupt was for. This moves all the real work into the workq element. As nvmet_fc upcalls are no longer in ISR context, don't set the feature flags Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-10-02 22:46:36 -04:00
Dick Kennedy	c8a4ce0bf3	scsi: lpfc: Make ktime sampling more accurate Need to make ktime samples more accurate If ktime is turned on in the middle of an IO, the max calculation could be misleading. Base sampling on the start time of the IO as opposed to ktime_on. Make ISR ktime timestamps be from when CQE is read instead of EQE. Added additional sanity checks when deciding whether to accept an IO sample or not. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-10-02 22:46:35 -04:00
Dick Kennedy	1234a6d54f	scsi: lpfc: Fix crash receiving ELS while detaching driver The driver crashes when attempting to use a freed ndpl pointer. The pci_remove_one handler runs on a separate kernel thread. The order of the removal is starting by freeing all of the ndlps and then disabling interrupts. In between these two events the driver can still receive an ELS and process it. When it tries to use the ndlp pointer will be NULL Change the order of the pci_remove_one vs disable interrupts so that interrupts are disabled before the ndlp's are freed. Cc: <stable@vger.kernel.org> # 4.12+ Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-10-02 22:46:33 -04:00
Dick Kennedy	401bb4169d	scsi: lpfc: fix pci hot plug crash in list_add call During pci hot plug, the kernel crashes in a list_add_call The lookup by tag function will return null if the IOCB is out of range or does not have the on txcmplq flag set. Fix: Check for null return from lookup by tag. Cc: <stable@vger.kernel.org> # 4.12+ Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-10-02 22:46:32 -04:00
Colin Ian King	858e51e8cb	scsi: lpfc: remove redundant null check on eqe The pointer eqe is always non-null inside the while loop, so the check to see if eqe is NULL is redudant and hence can be removed. Detected by CoverityScan CID#1248693 ("Logically Dead Code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-09-15 15:39:11 -04:00
Dick Kennedy	66d7ce93a0	scsi: lpfc: Fix MRQ > 1 context list handling Various oops including cpu LOCKUPs were seen. For asynchronously received ius where the driver must assign exchange resources, the resources were on a single get (free) list and put list (finished, waiting to be put on get list). As all cpus are sharing the lists, an interrupt for a receive frame may have to wait for all the other cpus to place their done work onto the put list before it can acquire the lock to pull from the list. Fix by breaking the resource lists into per-cpu lists or at least more than 1 list with cpu's sharing the lists). A cpu would allocate from the free list for its own cpu, and put its done work on the its own put list - avoiding the contention. As cpu load may vary, when empty, a cpu may grab from another cpu, thereby changing resource distribution. But searching for a resource only occurs on 1 or a few cpus until a single resource can be allocated. if the condition reoccurs, it starts looking at a different cpu. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:29:41 -04:00
Dick Kennedy	e3e2863def	scsi: lpfc: Limit amount of work processed in IRQ Various oops being seen on being in the ISR too long and cpu lockups, when under heavy load. The amount of work being posted off of completion queues kept the ISR running almost all the time Correct the issue by limiting the amount of work per iteration. [mkp: typo] Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:29:40 -04:00
Dick Kennedy	cd22d6057c	scsi: lpfc: Correct return error codes to align with nvme_fc transport Modify driver return error codes to align with host nvme transport. Driver isn't returning Exxx error codes to properly reflect out of resource or connectivity conditions (-EBUSY), yet there were hard error conditions returning -EBUSY. Ensure the following situations return the proper return code: - Temporary failures or temporary resource availability: -EBUSY - Connectivity issues: -ENODEV All others are treated as hard errors and return an -Exxx value that indicates the type of error. Also, lpfc_sli4_issue_wqe() was modified to not translate error from -Exxx to WQE state. This allows lpfc_nvme_fcp_io_submit() routine to just return whatever -E value was returned from other routines. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:29:37 -04:00
Romain Perier	771db5c0e3	scsi: lpfc: Replace PCI pool old API The PCI pool API is deprecated. This commit replaces the PCI pool old API by the appropriate function with the DMA pool API. It also updates some comments, accordingly. Signed-off-by: Romain Perier <romain.perier@collabora.com> Reviewed-by: Peter Senna Tschudin <peter.senna@collabora.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-07 14:04:01 -04:00
James Smart	11e644e2a2	scsi: lpfc: Fix crash doing IO with resets During every reset, IOCBs are allocated. So, at one point, number of allocated IOCBs reaches maximum limit and lpfc_sli_next_iotag fails. Allocate IOCBs only during initialization. Reuse them after every reset instead of allocating new set of IOCBs. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-06-19 21:40:44 -04:00
James Smart	569dbe84a3	scsi: lpfc: Fix crash after firmware flash when IO is running. OS crashes after the completion of firmware download. Failure in posting SCSI SGL buffers because number of SGL buffers is less than total count. Some of the pending IOs are not completed by driver. SGL buffers for these IOs are not added back to the list. Pending IOs are not completed because lpfc_wq_list list is initialized before completion of pending IOs. Postpone lpfc_wq_list reinitialization by moving lpfc_sli4_queue_destroy() after lpfc_hba_down_post(). Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-06-19 21:40:33 -04:00
James Smart	d41b65bcdc	scsi: lpfc: Fix system panic when express lane enabled. There is a null pointer dereference that can happen in the FOF interrupt handler. The driver was not setting up cq->assoc_qp_for sli4_hba->oas_cq. Initialize cq->assoc_qp before accessing it. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-06-19 21:39:01 -04:00
James Smart	0cf07f84dd	scsi: lpfc: Add auto EQ delay logic Administrator intervention is currently required to get good numbers when switching from running latency tests to IOPS tests. The configured interrupt coalescing values will greatly effect the results of these tests. Currently, the driver has a single coalescing value set by values of the module attribute. This patch changes the driver to support auto-configuration of the coalescing value based on the total number of outstanding IOs and average number of CQEs processed per interrupt for an EQ. Values are checked every 5 seconds. The driver defaults to the automatic selection. Automatic selection can be disabled by the new lpfc_auto_imax module_parameter. Older hardware can only change interrupt coalescing by mailbox command. Newer hardware supports change via a register. The patch support both. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-06-12 21:37:31 -04:00
James Smart	78e1d2009f	scsi: lpfc: Fix defects reported by Coverity Scan Addressed the following reported defects: CID 1411552: Control flow issues (MISSING_BREAK) /drivers/scsi/lpfc/lpfc_sli.c: 13259 in lpfc_sli4_nvmet_handle_rcqe() CID 1411553: Memory - illegal accesses (OVERRUN) /drivers/scsi/lpfc/lpfc_sli.c: 16218 in lpfc_fc_frame_check() CID 1411553: Memory - illegal accesses (OVERRUN) Overrunning array "lpfc_rctl_names" of 202 8-byte elements at element index 244 (byte offset 1952) using index "fc_hdr->fh_r_ctl" (which evaluates to 244). CID 1411554: Null pointer dereferences (REVERSE_INULL) /drivers/scsi/lpfc/lpfc_nvmet.c: 2131 in lpfc_nvmet_unsol_fcp_abort_cmp() ** CID 1411555: Memory - illegal accesses (UNINIT) /drivers/scsi/lpfc/lpfc_nvmet.c: 180 in lpfc_nvmet_ctxbuf_post() Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-06-12 21:37:31 -04:00
James Smart	e92974f6cd	scsi: lpfc: Null pointer dereference when log_verbose is set to 0xffffffff Kernel panic when log_verbose is set to 0xffffffff phba->pport is dereferenced before it is initialized Fix: Do not dereference phba->pport if it is NULL Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-06-12 21:37:31 -04:00
James Smart	ecbb227e63	scsi: lpfc: Fix crash on powering off BFS VM with passthrough device Null pointer dereference when BFS VM is powered off The driver incorrectly uses sli3_ring on SLI-4 adapters Use the correct ring structure based on sli_rev Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Tested-by: Raphael Silva <raphasil@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-06-12 21:37:31 -04:00
James Smart	14041bd170	scsi: lpfc: Fix Port going offline after multiple resets. Observing lpfc port down after issuing hbacmd reset command Failure in posting SGL buffers. If there is only one SGL buffer and rrq is valid for its XRI, we are rightly returning NULL but not adding the buffer back to the SGL list. So, number of buffers become less than total count and repost fails during reset. Add SGL buffer back to list before returning NULL. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-06-12 21:37:31 -04:00
James Smart	ae9e28f36a	scsi: lpfc: Add MDS Diagnostic support. Added code to support Cisco MDS loopback diagnostic. The diagnostics run various loopbacks including one which loops-back frame through the driver. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-05-16 21:24:47 -04:00
James Smart	64eb4dcb14	scsi: lpfc: Cleanup entry_repost settings on SLI4 queues Too many work items being processed in IRQ context take a lot of CPU time and cause problems. With a recent change, we get out of the ISR after hitting entry_repost work items on a queue. However, the actual values for entry repost are still high. EQ is 128 and CQ is 128, this could translate into processing 128 * 128 (16384) work items under IRQ context. Set entry_repost in the actual queue creation routine now. Limit EQ repost to 8 and CQ repost to 64 to further limit the amount of time spent in the IRQ. Fix fof IRQ routines as well. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-05-16 21:23:42 -04:00
James Smart	a8cf5dfeb4	scsi: lpfc: Added recovery logic for running out of NVMET IO context resources Previous logic would just drop the IO. Added logic to queue the IO to wait for an IO context resource from an IO thats already in progress. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-05-16 21:22:22 -04:00
James Smart	6c621a2229	scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context Currently IO resources are mapped 1 to 1 with RQ buffers posted Added logic to separate RQE buffers from IO op resources (sgl/iocbq/context). During initialization, the driver will determine how many SGLs it will allocate for NVMET (based on what the firmware reports) and associate a NVMET IOCBq and NVMET context structure with each one. Now that hdr/data buffers are immediately reposted back to the RQ, 512 RQEs for each MRQ is sufficient. Also, since NVMET data buffers are now 128 bytes, lpfc_nvmet_mrq_post is not necessary anymore as we will always post the max (512) buffers per NVMET MRQ. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-05-16 21:21:47 -04:00
James Smart	3c603be979	scsi: lpfc: Separate NVMET data buffer pool fir ELS/CT. Using 2048 byte buffer and onle 128 bytes is needed. Create nee LFPC_NVMET_DATA_BUF_SIZE define to use for NVMET RQ/MRQs. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-05-16 21:21:17 -04:00
James Smart	7869da183a	scsi: lpfc: Fix NMI watchdog assertions when running nvmet IOPS tests After running IOPS test for 30 second we get kernel:NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 The driver is speend too much time in its ISR. In ISR EQ and CQ processing routines, if we hit the entry_repost numbers of EQE/CQEs just break out of the routine as opposed to hitting the doorbell with NOARM and continue processing. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-05-16 21:20:41 -04:00
James Smart	61f3d4bf4f	scsi: lpfc: Fix nvmet RQ resource needs for large block writes. Large block writes to the nvme target were failing because the default number of RQs posted was insufficient. Expand the NVMET RQs to 2048 RQEs and ensure a minimum of 512 RQEs are posted, no matter how many MRQs are configured. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-05-16 21:18:41 -04:00

1 2 3 4 5 ...

413 commits