From 11e42532ada3174840196e8f23df29cb91c44b50 Mon Sep 17 00:00:00 2001 From: Gavin Shan Date: Fri, 5 Sep 2014 15:35:30 -0600 Subject: [PATCH 01/10] PCI: Assume all Mellanox devices have broken INTx masking The VFIO driver routes LSI interrupts by capturing, masking, and then delivering. When passing though Mellanox adapters from host to guest, interrupt storm are reported from host and guest. That's because the PCI command register INTx Disable bit doesn't work on Mellanox devices. # lspci | grep Mellanox 0001:05:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3] 0005:01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0) Amir Vadai confirmed that all Mellanox devices have same problem. The patch marks broken INTx masking for all Mellanox adapters. Suggested-by: Benjamin Herrenschmidt Signed-off-by: Gavin Shan Signed-off-by: Bjorn Helgaas Acked-By: Amir Vadai --- drivers/pci/quirks.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 80c2d014283d..da062d8ae36f 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -2985,6 +2985,8 @@ DECLARE_PCI_FIXUP_HEADER(0x1814, 0x0601, /* Ralink RT2800 802.11n PCI */ */ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_REALTEK, 0x8169, quirk_broken_intx_masking); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID, + quirk_broken_intx_masking); #ifdef CONFIG_ACPI /* From 89665a6a71408796565bfd29cfa6a7877b17a667 Mon Sep 17 00:00:00 2001 From: Rajat Jain Date: Mon, 8 Sep 2014 14:19:49 -0700 Subject: [PATCH 02/10] PCI: Check only the Vendor ID to identify Configuration Request Retry Per PCIe r3.0, sec 2.3.2, if a Root Complex - has Configuration Request Retry Status Software Visibility enabled, - issues a Configuration Read of both bytes of the Vendor ID, and - receives a Completion with Configuration Request Retry Status (CRS), it must complete the request to the host by fabricating data of 0x0001 for the Vendor ID and 0xff for any additional bytes in the request. Linux issues a single config read for the four bytes containing the Vendor ID and the Device ID. Previously we checked all four bytes for 0xffff0001 to identify CRS. However, it is only the Vendor ID that really indicates CRS, because it's sufficient to read only those two bytes. Checking the Device ID verifies spec compliance but doesn't add any information. Some Root Complexes appear to indicate CRS by returning 0x0001 for the Vendor ID along with the actual the Device ID. Previously we interpreted that as a valid Vendor/Device ID pair, although 0x0001 is reserved and cannot be a valid Vendor ID. [bhelgaas: changelog] Link: http://lkml.kernel.org/r/4729FC36.3040000@gmail.com Signed-off-by: Rajat Jain Signed-off-by: Rajat Jain Signed-off-by: Guenter Roeck Signed-off-by: Bjorn Helgaas --- drivers/pci/probe.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index e3cf8a2e6292..9c26663c91d2 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1292,8 +1292,13 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l, *l == 0x0000ffff || *l == 0xffff0000) return false; - /* Configuration request Retry Status */ - while (*l == 0xffff0001) { + /* + * Configuration Request Retry Status. Some root ports return the + * actual device ID instead of the synthetic ID (0xFFFF) required + * by the PCIe spec. Ignore the device ID and only check for + * (vendor id == 1). + */ + while ((*l & 0xffff) == 0x0001) { if (!crs_timeout) return false; From f3dbd802b3caf8da92173870bc270dda6b3f84ba Mon Sep 17 00:00:00 2001 From: Rajat Jain Date: Tue, 2 Sep 2014 16:26:00 -0700 Subject: [PATCH 03/10] PCI: Enable CRS Software Visibility for root port if it is supported Per PCIe r3.0, sec 2.3.2, an endpoint may respond to a Configuration Request with a Completion with Configuration Request Retry Status (CRS). This terminates the Configuration Request. When the CRS Software Visibility feature is disabled (as it is by default), a Root Complex must handle a CRS Completion by re-issuing the Configuration Request. This is invisible to software. From the CPU's point of view, an endpoint that always responds with CRS causes a hang because the Root Complex never supplies data to complete the CPU read. When CRS Software Visibility is enabled, a Root Complex that receives a CRS Completion for a read of the Vendor ID must return data of 0x0001. The Vendor ID of 0x0001 indicates to software that the endpoint is not ready. We now have more devices that require CRS Software Visibility. For example, a PLX 8713 NT bridge may respond with CRS until it has been configured via I2C, and the I2C configuration is completely independent of PCI enumeration. Enable CRS Software Visibility if it is supported. This allows a system with such a device to work (though the PCI core times out waiting for it to become ready, and we have to rescan the bus after it is ready). This essentially reverts ad7edfe04908 ("[PCI] Do not enable CRS Software Visibility by default"). The failures that led to ad7edfe04908 should be addressed by 89665a6a7140 ("PCI: Check only the Vendor ID to identify Configuration Request Retry"). [bhelgaas: changelog] Link: http://lkml.kernel.org/r/20071029061532.5d10dfc6@snowcone Link: http://lkml.kernel.org/r/alpine.LFD.0.9999.0712271023090.21557@woody.linux-foundation.org Signed-off-by: Rajat Jain Signed-off-by: Rajat Jain Signed-off-by: Guenter Roeck Signed-off-by: Bjorn Helgaas --- drivers/pci/probe.c | 13 +++++++++++++ include/uapi/linux/pci_regs.h | 1 + 2 files changed, 14 insertions(+) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 9c26663c91d2..e02cdaa5bf0c 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -740,6 +740,17 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev, } EXPORT_SYMBOL(pci_add_new_bus); +static void pci_enable_crs(struct pci_dev *pdev) +{ + u16 root_cap = 0; + + /* Enable CRS Software Visibility if supported */ + pcie_capability_read_word(pdev, PCI_EXP_RTCAP, &root_cap); + if (root_cap & PCI_EXP_RTCAP_CRSVIS) + pcie_capability_set_word(pdev, PCI_EXP_RTCTL, + PCI_EXP_RTCTL_CRSSVE); +} + /* * If it's a bridge, configure it and scan the bus behind it. * For CardBus bridges, we don't scan behind as the devices will @@ -787,6 +798,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT); + pci_enable_crs(dev); + if ((secondary || subordinate) && !pcibios_assign_all_busses() && !is_cardbus && !broken) { unsigned int cmax; diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index 30db069bce62..57d6e56a086a 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -552,6 +552,7 @@ #define PCI_EXP_RTCTL_PMEIE 0x0008 /* PME Interrupt Enable */ #define PCI_EXP_RTCTL_CRSSVE 0x0010 /* CRS Software Visibility Enable */ #define PCI_EXP_RTCAP 30 /* Root Capabilities */ +#define PCI_EXP_RTCAP_CRSVIS 0x0001 /* CRS Software Visibility capability */ #define PCI_EXP_RTSTA 32 /* Root Status */ #define PCI_EXP_RTSTA_PME 0x00010000 /* PME status */ #define PCI_EXP_RTSTA_PENDING 0x00020000 /* PME pending */ From ce0529843a505d09f5809a7db6288d2f038f64c4 Mon Sep 17 00:00:00 2001 From: Ethan Zhao Date: Tue, 9 Sep 2014 10:21:25 +0800 Subject: [PATCH 04/10] PCI: Add device flag helper functions Add helper functions to hide direct device flag operations: void pci_set_dev_assigned(struct pci_dev *dev); void pci_clear_dev_assigned(struct pci_dev *dev); bool pci_is_dev_assigned(struct pci_dev *dev); Signed-off-by: Ethan Zhao Signed-off-by: Bjorn Helgaas --- include/linux/pci.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/include/linux/pci.h b/include/linux/pci.h index 61978a460841..92c131efec1c 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1839,4 +1839,17 @@ int pci_for_each_dma_alias(struct pci_dev *pdev, */ struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev); +/* helper functions for operation of device flag */ +static inline void pci_set_dev_assigned(struct pci_dev *pdev) +{ + pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; +} +static inline void pci_clear_dev_assigned(struct pci_dev *pdev) +{ + pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; +} +static inline bool pci_is_dev_assigned(struct pci_dev *pdev) +{ + return (pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED) == PCI_DEV_FLAGS_ASSIGNED; +} #endif /* LINUX_PCI_H */ From ad0d217ca645477ba30c2f3cf1a5bbb7ef18b1fd Mon Sep 17 00:00:00 2001 From: Ethan Zhao Date: Tue, 9 Sep 2014 10:21:26 +0800 Subject: [PATCH 05/10] KVM: Use PCI device flag helper functions Use PCI device flag helper functions when assigning or releasing device. No functional change. Signed-off-by: Ethan Zhao Signed-off-by: Bjorn Helgaas Acked-by: Paolo Bonzini --- virt/kvm/assigned-dev.c | 2 +- virt/kvm/iommu.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index 5819a2708d7e..e05000e200d2 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -302,7 +302,7 @@ static void kvm_free_assigned_device(struct kvm *kvm, else pci_restore_state(assigned_dev->dev); - assigned_dev->dev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(assigned_dev->dev); pci_release_regions(assigned_dev->dev); pci_disable_device(assigned_dev->dev); diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index 714b94932312..e723bb91aa34 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c @@ -203,7 +203,7 @@ int kvm_assign_device(struct kvm *kvm, goto out_unmap; } - pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; + pci_set_dev_assigned(pdev); dev_info(&pdev->dev, "kvm assign device\n"); @@ -229,7 +229,7 @@ int kvm_deassign_device(struct kvm *kvm, iommu_detach_device(domain, &pdev->dev); - pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(pdev); dev_info(&pdev->dev, "kvm deassign device\n"); From be507fd09011d2af3b34940fe616a2dd569fd3f7 Mon Sep 17 00:00:00 2001 From: Ethan Zhao Date: Tue, 9 Sep 2014 10:21:27 +0800 Subject: [PATCH 06/10] xen/pciback: Use PCI device flag helper functions Use PCI device flag helper functions when assigning or releasing device. No functional change. Signed-off-by: Ethan Zhao Signed-off-by: Bjorn Helgaas Reviewed-by: Konrad Rzeszutek Wilk Acked-by: David Vrabel --- drivers/xen/xen-pciback/pci_stub.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 259ba2661543..017069a455d4 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref) xen_pcibk_config_free_dyn_fields(dev); xen_pcibk_config_free_dev(dev); - dev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(dev); pci_dev_put(dev); kfree(psdev); @@ -413,7 +413,7 @@ static int pcistub_init_device(struct pci_dev *dev) dev_dbg(&dev->dev, "reset device\n"); xen_pcibk_reset_device(dev); - dev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; + pci_set_dev_assigned(dev); return 0; config_release: From be63497c413e22d5abdf32313f4b469af6aa7f4c Mon Sep 17 00:00:00 2001 From: Ethan Zhao Date: Tue, 9 Sep 2014 10:21:28 +0800 Subject: [PATCH 07/10] PCI: Use device flag helper functions Use PCI device flag helper functions when checking whether a device is assigned. No functional change. Signed-off-by: Ethan Zhao Signed-off-by: Bjorn Helgaas --- drivers/pci/iov.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index cb6f24740ee3..4d109c07294a 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -633,7 +633,7 @@ int pci_vfs_assigned(struct pci_dev *dev) * our dev as the physical function and the assigned bit is set */ if (vfdev->is_virtfn && (vfdev->physfn == dev) && - (vfdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED)) + pci_is_dev_assigned(vfdev)) vfs_assigned++; vfdev = pci_get_device(dev->vendor, dev_id, vfdev); From e0d1b6b77ced59d852d38fcf9a8a0a1c40c84cee Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Tue, 5 Aug 2014 14:08:55 +0200 Subject: [PATCH 08/10] PCI/AER: Make standalone includable The header file references u16 and u32 types, but they are not defined in the header nor does the header pull in the necessary includes for them. This causes build breakage when the file is included without any of the dependencies being satisfied from somewhere else. Fix this by including linux/types.h (for u16 and u32). [bhelgaas: removed pci_dev declaration (already added by 5ccb8225abf2)] Signed-off-by: Thierry Reding Signed-off-by: Bjorn Helgaas --- include/linux/aer.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/aer.h b/include/linux/aer.h index c826d1c28f9c..4fef65e57023 100644 --- a/include/linux/aer.h +++ b/include/linux/aer.h @@ -7,6 +7,8 @@ #ifndef _AER_H_ #define _AER_H_ +#include + #define AER_NONFATAL 0 #define AER_FATAL 1 #define AER_CORRECTABLE 2 From 9fe373f9997b48fcd6222b95baf4a20c134b587a Mon Sep 17 00:00:00 2001 From: Douglas Lehr Date: Thu, 21 Aug 2014 09:26:52 +1000 Subject: [PATCH 09/10] PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size The Crocodile chip occasionally comes up with 4k and 8k BAR sizes. Due to an erratum, setting the SR-IOV page size causes the physical function BARs to expand to the system page size. Since ppc64 uses 64k pages, when Linux tries to assign the smaller resource sizes to the now 64k BARs the address will be truncated and the BARs will overlap. Force Linux to allocate the resource as a full page, which avoids the overlap. [bhelgaas: print expanded resource, too] Signed-off-by: Douglas Lehr Signed-off-by: Anton Blanchard Signed-off-by: Bjorn Helgaas Acked-by: Milton Miller CC: stable@vger.kernel.org --- drivers/pci/quirks.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 80c2d014283d..feaa5c23e991 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -24,6 +24,7 @@ #include #include #include +#include #include /* isa_dma_bridge_buggy */ #include "pci.h" @@ -287,6 +288,25 @@ static void quirk_citrine(struct pci_dev *dev) } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_IBM, PCI_DEVICE_ID_IBM_CITRINE, quirk_citrine); +/* On IBM Crocodile ipr SAS adapters, expand BAR to system page size */ +static void quirk_extend_bar_to_page(struct pci_dev *dev) +{ + int i; + + for (i = 0; i < PCI_STD_RESOURCE_END; i++) { + struct resource *r = &dev->resource[i]; + + if (r->flags & IORESOURCE_MEM && resource_size(r) < PAGE_SIZE) { + r->end = PAGE_SIZE - 1; + r->start = 0; + r->flags |= IORESOURCE_UNSET; + dev_info(&dev->dev, "expanded BAR %d to page size: %pR\n", + i, r); + } + } +} +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_IBM, 0x034a, quirk_extend_bar_to_page); + /* * S3 868 and 968 chips report region size equal to 32M, but they decode 64M. * If it's needed, re-allocate the region. From 63ddc0b8fe5ebbac88e2ac84b489470bf3a22965 Mon Sep 17 00:00:00 2001 From: Megan Kamiya Date: Fri, 5 Sep 2014 20:19:10 -0700 Subject: [PATCH 10/10] PCI: Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters Add parentheses around parameters in PCI_DEVID and PCI_VPD_LRDT_ID macros to prevent possible expansion errors as described by the CERT Secure Coding Standard: PRE01-C: Use parentheses within macros around parameter names Tested-by: Aaron Brown Signed-off-by: Megan Kamiya Signed-off-by: Jeff Kirsher Signed-off-by: Bjorn Helgaas --- include/linux/pci.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/pci.h b/include/linux/pci.h index 61978a460841..cb744f3925a4 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -45,7 +45,7 @@ * In the interest of not exposing interfaces to user-space unnecessarily, * the following kernel-only defines are being added here. */ -#define PCI_DEVID(bus, devfn) ((((u16)bus) << 8) | devfn) +#define PCI_DEVID(bus, devfn) ((((u16)(bus)) << 8) | (devfn)) /* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */ #define PCI_BUS_NUM(x) (((x) >> 8) & 0xff) @@ -1701,7 +1701,7 @@ bool pci_acs_path_enabled(struct pci_dev *start, struct pci_dev *end, u16 acs_flags); #define PCI_VPD_LRDT 0x80 /* Large Resource Data Type */ -#define PCI_VPD_LRDT_ID(x) (x | PCI_VPD_LRDT) +#define PCI_VPD_LRDT_ID(x) ((x) | PCI_VPD_LRDT) /* Large Resource Data Type Tag Item Names */ #define PCI_VPD_LTIN_ID_STRING 0x02 /* Identifier String */