1
0
Fork 0
Commit Graph

98 Commits (5b88ec228498b7d41de6599eca12cc0032056e3f)

Author SHA1 Message Date
Wei Yang 5b88ec2284 powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported
M64 aperture size is limited on PHB3.  When the IOV BAR is too big, this
will exceed the limitation and failed to be assigned.

Introduce a different mechanism based on the IOV BAR size:

  - if IOV BAR size is smaller than 64MB, expand to total_pe
  - if IOV BAR size is bigger than 64MB, roundup power2

[bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:38 +11:00
Wei Yang 781a868f31 powerpc/powernv: Shift VF resource with an offset
On PowerNV platform, resource position in M64 BAR implies the PE# the
resource belongs to. In some cases, adjustment of a resource is necessary
to locate it to a correct position in M64 BAR .

This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
address according to an offset.

Note:

    After doing so, there would be a "hole" in the /proc/iomem when offset
    is a positive value. It looks like the device return some mmio back to
    the system, which actually no one could use it.

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:38 +11:00
Wei Yang 5350ab3fd7 powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv
Implement pcibios_iov_resource_alignment() on powernv platform.

On PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state, the IOV BAR size is multiple times of VF BAR size
2. after expanded, the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage, the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handle these three cases respectively.

[bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Wei Yang 6e628c7d33 powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe
On PHB3, PF IOV BAR will be covered by M64 BAR to have better PE isolation.
M64 BAR is a type of hardware resource in PHB3, which could map a range of
MMIO to PE numbers on powernv platform. And this range is divided equally
by the number of total_pe with each divided range mapping to a PE number.
Also, the M64 BAR must map a MMIO range with power-of-two size.

The total_pe number is usually different from total_VFs, which can lead to
a conflict between MMIO space and the PE number.

For example, if total_VFs is 128 and total_pe is 256, the second half of
M64 BAR will be part of other PCI device, which may already belong to other
PEs.

This patch prevents the conflict by reserving additional space for the PF
IOV BAR, which is total_pe number of VF's BAR size.

[bhelgaas: make dev_printk() output more consistent, index resource[]
conventionally]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Wei Yang 9e8d4a19ab powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
Previously the iommu_table had the same lifetime as a struct pnv_ioda_pe
and was embedded in it. The pnv_ioda_pe was assigned to a PE on the bootup
stage. Since PEs are based on the hardware layout which is static in the
system, they will never get released. This means the iommu_table in the
pnv_ioda_pe will never get released either.

This no longer works for VF PE. VF PEs are created and released dynamically
when VFs are created and released. So we need to assign pnv_ioda_pe to VF
PEs respectively when VFs are enabled and clean up those resources for VF
PE when VFs are disabled. And iommu_table is one of the resources we need
to handle dynamically.

Current iommu_table is a static field in pnv_ioda_pe, which will face a
problem when freeing it. During the disabling of a VF,
pnv_pci_ioda2_release_dma_pe will call iommu_free_table to release the
iommu_table for this PE. A static iommu_table will fail in
iommu_free_table.

According to these requirement, this patch allocates iommu_table
dynamically.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Gavin Shan a8b2f8288a powerpc/pci: Create pci_dn for VFs
pci_dn is the extension of PCI device node and is created from device node.
Unfortunately, VFs are enabled dynamically by PF's driver and they don't
have corresponding device nodes and pci_dn, which is required to access
VFs' config spaces.

The patch creates pci_dn for VFs in pcibios_sriov_enable() on their PF,
and removes pci_dn for VFs in pcibios_sriov_disable() on their PF. When
VF's pci_dn is created, it's put to the child list of the pci_dn of PF's
upstream bridge. The pci_dn is linked to pci_dev during early fixup time
to setup the fast path.

[bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Gavin Shan 2f6cf79448 powerpc/powernv: Remove unused file
The patch removes unused file eeh-ioda.c and updates makefile
accordingly. Besides, the definition of "struct pnv_eeh_ops" and
the instances are all removed. Until now, the chip layer of EEH
implementation for PowerNV platform is removed completely.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:20 +11:00
Gavin Shan cadf364d14 powerpc/powernv: Drop PHB operation reset()
The patch drops PHB EEH operation reset() and merges its logic to
eeh_ops::reset().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Ryan Grimm 6f963ec2d6 cxl: Fix device_node reference counting
When unbinding and rebinding the driver on a system with a card in PHB0, this
error condition is reached after a few attempts:

ERROR: Bad of_node_put() on /pciex@3fffe40000000
CPU: 0 PID: 3040 Comm: bash Not tainted 3.18.0-rc3-12545-g3627ffe #152
Call Trace:
[c000000721acb5c0] [c00000000086ef94] .dump_stack+0x84/0xb0 (unreliable)
[c000000721acb640] [c00000000073a0a8] .of_node_release+0xd8/0xe0
[c000000721acb6d0] [c00000000044bc44] .kobject_release+0x74/0xe0
[c000000721acb760] [c0000000007394fc] .of_node_put+0x1c/0x30
[c000000721acb7d0] [c000000000545cd8] .cxl_probe+0x1a98/0x1d50
[c000000721acb900] [c0000000004845a0] .local_pci_probe+0x40/0xc0
[c000000721acb980] [c000000000484998] .pci_device_probe+0x128/0x170
[c000000721acba30] [c00000000052400c] .driver_probe_device+0xac/0x2a0
[c000000721acbad0] [c000000000522468] .bind_store+0x108/0x160
[c000000721acbb70] [c000000000521448] .drv_attr_store+0x38/0x60
[c000000721acbbe0] [c000000000293840] .sysfs_kf_write+0x60/0xa0
[c000000721acbc50] [c000000000292500] .kernfs_fop_write+0x140/0x1d0
[c000000721acbcf0] [c000000000208648] .vfs_write+0xd8/0x260
[c000000721acbd90] [c000000000208b18] .SyS_write+0x58/0x100
[c000000721acbe30] [c000000000009258] syscall_exit+0x0/0x98

We are missing a call to of_node_get(). pnv_pci_to_phb_node() should
call of_node_get() otherwise np's reference count isn't incremented and
it might go away. Rename pnv_pci_to_phb_node() to pnv_pci_get_phb_node()
so it's clear it calls of_node_get().

Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 14:51:31 +11:00
Thadeu Lima de Souza Cascardo 4e28784024 powernv/iommu: disable IOMMU bypass with param iommu=nobypass
When IOMMU bypass is enabled, a PCI device can read and write memory
that was not mapped by the driver without causing an EEH. That might
cause memory corruption, for example.

When we disable bypass, DMA reads and writes to addresses not mapped by
the IOMMU will cause an EEH, allowing us to debug such issues.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:46 +11:00
Wei Yang e9863e687d powerpc/powernv: Print the M64 range information in bootup log
The M64 range information is missed in dmesg, which would be helpful in debug.

This patch prints the M64 range information in the same format as M32.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 11:17:30 +11:00
Ryan Grimm 1212aa1c8c cxl: Enable CAPP recovery
Turning snoops on is the last step in CAPP recovery. Sapphire is expected to
have reinitialized the PHB and done the previous recovery steps.

Add mode argument to opal call to do this. Driver can turn snoops off although
it does not currently.

Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-22 17:31:52 +11:00
Linus Torvalds 140cd7fb04 powerpc updates for 3.19
Some nice cleanups like removing bootmem, and removal of __get_cpu_var().
 
 There is one patch to mm/gup.c. This is the generic GUP implementation, but is
 only used by us and arm(64). We have an ack from Steve Capper, and although we
 didn't get an ack from Andrew he told us to take the patch through the powerpc
 tree.
 
 There's one cxl patch. This is in drivers/misc, but Greg said he was happy for
 us to manage fixes for it.
 
 There is an infrastructure patch to support an IPMI driver for OPAL. That patch
 also appears in Corey Minyard's IPMI tree, you may see a conflict there.
 
 There is also an RTC driver for OPAL. We weren't able to get any response from
 the RTC maintainer, Alessandro Zummo, so in the end we just merged the driver.
 
 The usual batch of Freescale updates from Scott.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUiSTSAAoJEFHr6jzI4aWAirQP/3rIEng0LzLu5kW2zkGylIaM
 SNDum1vze3mHiTFl+CFcSIGpC1UEULoB49HA+2oE/ExKpIceG6lpL2LP+wNh2FW5
 mozjMjS6mZt4w1Fu1D2ZtgQc3O1T1pxkqsnZmPa8gVf5k5d5IQNPY6yB0pgVWwbV
 gwBKxe4VwPAzJjppE9i9MDhNTJwmHZq0lI8XuoTXOOU/f+4G1WxmjrbyveQ7cRP5
 i/sq2cKjxpWA+KDeIXo0GR0DpXR7qMeAvFX5xXY7oKuUJIFDM4kSHfmMYP6qLf5c
 2vlsJqHVqfOgQdve41z1ooaPzNtg7ezVo+VqqguSgtSgwy2JUo/uHpnzz3gD1Olo
 AP5+6xj8LZac0rTPxF4n4Hoyrp7AaaFjEFt1zqT9PWniZW4B41wtia0QORBNUf1S
 UEmKAC9T3WZJ47mH7WMSadtOPF9E3Yd/zuiPD4udtptCNKPbr6/k1MpJPIW2D4Rn
 BJ0QZTRd7V0yRofXxZtHxaMxq8pWd/Tip7J/zr/ghz+ulnH8BuFamuhCCLuJlESU
 +A2PMfuseyTMpH9sMAmmTwSGPDKjaUFWvmFvY/n88NZL7r2LlomNrDWFSSQOIHUP
 FxjYmjUMpZeexsfyRdgFV/INhYC3o3cso2fRGO45YK6nkxNnjNFEBS6WhQLvNLBu
 sknd1WjXkuJtoMC15SrQ
 =jvyT
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:
 "Some nice cleanups like removing bootmem, and removal of
  __get_cpu_var().

  There is one patch to mm/gup.c.  This is the generic GUP
  implementation, but is only used by us and arm(64).  We have an ack
  from Steve Capper, and although we didn't get an ack from Andrew he
  told us to take the patch through the powerpc tree.

  There's one cxl patch.  This is in drivers/misc, but Greg said he was
  happy for us to manage fixes for it.

  There is an infrastructure patch to support an IPMI driver for OPAL.

  There is also an RTC driver for OPAL.  We weren't able to get any
  response from the RTC maintainer, Alessandro Zummo, so in the end we
  just merged the driver.

  The usual batch of Freescale updates from Scott"

* tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (101 commits)
  powerpc/powernv: Return to cpu offline loop when finished in KVM guest
  powerpc/book3s: Fix partial invalidation of TLBs in MCE code.
  powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault
  powerpc/xmon: Cleanup the breakpoint flags
  powerpc/xmon: Enable HW instruction breakpoint on POWER8
  powerpc/mm/thp: Use tlbiel if possible
  powerpc/mm/thp: Remove code duplication
  powerpc/mm/hugetlb: Sanity check gigantic hugepage count
  powerpc/oprofile: Disable pagefaults during user stack read
  powerpc/mm: Check for matching hpte without taking hpte lock
  powerpc: Drop useless warning in eeh_init()
  powerpc/powernv: Cleanup unused MCE definitions/declarations.
  powerpc/eeh: Dump PHB diag-data early
  powerpc/eeh: Recover EEH error on ownership change for BCM5719
  powerpc/eeh: Set EEH_PE_RESET on PE reset
  powerpc/eeh: Refactor eeh_reset_pe()
  powerpc: Remove more traces of bootmem
  powerpc/pseries: Initialise nvram_pstore_info's buf_lock
  cxl: Name interrupts in /proc/interrupt
  cxl: Return error to PSL if IRQ demultiplexing fails & print clearer warning
  ...
2014-12-11 17:48:14 -08:00
Linus Torvalds 21f122f472 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
 "Here are five fixes for you to pull please.

  They're all CC'ed to stable except the "Fix PE state format" one which
  went in this release"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
  powerpc/powernv: Replace OPAL_DEASSERT_RESET with EEH_RESET_DEACTIVATE
  powerpc/eeh: Fix PE state format
  powerpc/pseries: Fix endiannes issue in RTAS call from xmon
  powerpc/powernv: Fix the hmi event version check.
2014-11-27 18:23:41 -08:00
Gavin Shan 360d88a9e3 powerpc/powernv: Replace OPAL_DEASSERT_RESET with EEH_RESET_DEACTIVATE
The flag passed to ioda_eeh_phb_reset() should be EEH_RESET_DEACTIVATE,
which is translated to OPAL_DEASSERT_RESET or something else by the
EEH backend accordingly.

The patch replaces OPAL_DEASSERT_RESET with EEH_RESET_DEACTIVATE for
ioda_eeh_phb_reset().

Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-27 09:40:32 +11:00
Benjamin Herrenschmidt 360743814c powerpc/powernv: Honor the generic "no_64bit_msi" flag
Instead of the arch specific quirk which we are deprecating
and that drivers don't understand.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
2014-11-24 14:36:01 +11:00
Michael Ellerman e39f223fc9 powerpc: Remove more traces of bootmem
Although we are now selecting NO_BOOTMEM, we still have some traces of
bootmem lying around. That is because even with NO_BOOTMEM there is
still a shim that converts bootmem calls into memblock calls, but
ultimately we want to remove all traces of bootmem.

Most of the patch is conversions from alloc_bootmem() to
memblock_virt_alloc(). In general a call such as:

  p = (struct foo *)alloc_bootmem(x);

Becomes:

  p = memblock_virt_alloc(x, 0);

We don't need the cast because memblock_virt_alloc() returns a void *.
The alignment value of zero tells memblock to use the default alignment,
which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.

We remove a number of NULL checks on the result of
memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
if it can't allocate, in exactly the same way as alloc_bootmem(), so the
NULL checks are and always have been redundant.

The memory returned by memblock_virt_alloc() is already zeroed, so we
remove several memsets of the result of memblock_virt_alloc().

Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
to that is pointless, 16XB ought to be enough for anyone.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-19 21:41:51 +11:00
Gavin Shan ec8e4e9d3d powerpc/powernv: Bail upon invalid master PE
When freezing compound PEs in pnv_ioda_freeze_pe(), we should bail
upon illegal master PE. We needn't freeze slave PE because it should
have been put into frozen state by hardware.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:25 +11:00
Gavin Shan 4773f76b61 powerpc/powernv: Simplify pnv_ioda_configure_pe()
Nested if statements are always bad and the patch avoids one by
checking PHB type and bail in advance if necessary.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:24 +11:00
Gavin Shan b131a8425c powerpc/powernv: Set PELTV for compound PEs
Commit 262af55 ("powerpc/powernv: Enable M64 aperatus for PHB3")
introduced compound PEs in order to support M64 aperatus on PHB3.
However, we never configured PELTV for compound PEs. The patch
fixes that by: parent PE can freeze all child compound PEs. Any
compound PE affects the group.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:24 +11:00
Gavin Shan 4b82ab1803 powerpc/powernv: Initialize M64 PE in time
The patch initializes PE instance when reserving PE number to
keep consistent things as we did before. Also, it replaces the
iteration on bridge's windows with the prefered way.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:23 +11:00
Gavin Shan 5ef7356781 powerpc/powernv: Rename alloc_m64_pe() to reserve_m64_pe()
The patch renames alloc_m64_pe() to reserve_m64_pe() to reflect
its real usage: We reserve PE numbers for M64 segments in advance
and then pick up the reserved PE numbers when building the mapping
between PE numbers and M64 segments.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:23 +11:00
Gavin Shan 9e9e893521 powerpc/powernv: Fix condition to remove M64
The M64 resource should be removed if we don't have hook to
initialize it, or (not and) fail to do that.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:22 +11:00
Gavin Shan 1665c4a892 powerpc/powernv: Check PHB type in advance
The patch checks PHB type a bit early to save a bit cycles
for P7 because we don't support M64 for P7IOC no matter what
OPAL firmware we have.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:22 +11:00
Ian Munsie 80c49c7e4a powerpc/powerpc: Add new PCIe functions for allocating cxl interrupts
This adds a number of functions for allocating IRQs under powernv PCIe for cxl.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08 20:15:44 +11:00
Ian Munsie fd9a1c26ae powerpc/powernv: Split out set MSI IRQ chip code
Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so
split it out.

This will be used by some of the cxl PCIe code later.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-08 20:15:43 +11:00
Gavin Shan fe7e85c6f5 powerpc/powernv: Override dma_get_required_mask()
The dma_get_required_mask() function is used by some drivers to
query the platform about what DMA mask is needed to cover all of
memory. This is a bit of a strange semantic when we have to choose
between IOMMU translation or bypass, but essentially what it means
is "what DMA mask will give best performances".

Currently, our IOMMU backend always returns a 32-bit mask here, we
don't do anything special to it when we have bypass available. This
causes some drivers to choose a 32-bit mask, thus losing the ability
to use the bypass window, thinking this is more efficient. The problem
was reported from the driver of following device:

0004:03:00.0 0107: 1000:0087 (rev 05)
0004:03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios \
             Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)

This patch adds an override of that function in order to, instead,
return a 64-bit mask whenever a bypass window is available in order
for drivers to prefer this configuration.

Reported-by: Murali N. Iyer <mniyer@us.ibm.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30 17:15:20 +10:00
Gavin Shan d1a85eee35 powerpc/powernv: Sync OpalPciResetScope with firmware
The names of PCI reset scopes aren't sychronized with firmware.
The patch fixes it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30 17:15:17 +10:00
Joe Perches 6d31c2fa0e powerpc: pci-ioda: Use a single function to emit logging messages
No need for 3 functions when a single one will do.

Modify the function declaring macros to call the single function.

Reduces object code size a little:

$ size arch/powerpc/platforms/powernv/pci-ioda.o*
   text	   data	    bss	    dec	    hex	filename
  22303	   1073	   6680	  30056	   7568	arch/powerpc/platforms/powernv/pci-ioda.o.new
  22840	   1121	   6776	  30737	   7811	arch/powerpc/platforms/powernv/pci-ioda.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:57 +10:00
Joe Perches 45eb47242d powerpc: pci-ioda: Remove unnecessary return value from printk
The return value is unnecessary and unused, so make the functions
void instead of int.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:56 +10:00
Anton Blanchard e51df2c170 powerpc: Make a bunch of things static
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:41 +10:00
Gavin Shan 763fe0addb powerpc/powernv: Fix IOMMU group lost
When we take full hotplug to recover from EEH errors, PCI buses
could be involved. For the case, the child devices of involved
PCI buses can't be attached to IOMMU group properly, which is
caused by commit 3f28c5a ("powerpc/powernv: Reduce multi-hit of
iommu_add_device()").

When adding the PCI devices of the newly created PCI buses to
the system, the IOMMU group is expected to be added in (C).
(A) fails to bind the IOMMU group because bus->is_added is
false. (B) fails because the device doesn't have binding IOMMU
table yet. bus->is_added is set to true at end of (C) and
pdev->is_added is set to true at (D).

   pcibios_add_pci_devices()
      pci_scan_bridge()
         pci_scan_child_bus()
            pci_scan_slot()
               pci_scan_single_device()
                  pci_scan_device()
                  pci_device_add()
                     pcibios_add_device()           A: Ignore
                     device_add()                   B: Ignore
                  pcibios_fixup_bus()
                     pcibios_setup_bus_devices()
                        pcibios_setup_device()      C: Hit
      pcibios_finish_adding_to_bus()
         pci_bus_add_devices()
            pci_bus_add_device()                    D: Add device

If the parent PCI bus isn't involved in hotplug, the IOMMU
group is expected to be bound in (B). (A) should fail as the
sysfs entries aren't populated.

The patch fixes the issue by reverting commit 3f28c5a and remove
WARN_ON() in iommu_add_device() to allow calling the function
even the specified device already has associated IOMMU group.

Cc: <stable@vger.kernel.org>  # 3.16+
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:42 +10:00
Gavin Shan 49dec9222f powerpc/powernv: Handle compound PE
The patch introduces 3 PHB callbacks: compound PE state retrieval,
force freezing and unfreezing compound PE. The PCI config accessors
and PowerNV EEH backend can use them in subsequent patches.

We don't export the capability of compound PE to EEH core, which
helps avoiding more complexity to EEH core.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 16:33:30 +10:00
Guo Chao 262af557dd powerpc/powernv: Enable M64 aperatus for PHB3
This patch enables M64 aperatus for PHB3.

We already had platform hook (ppc_md.pcibios_window_alignment) to affect
the PCI resource assignment done in PCI core so that each PE's M32 resource
was built on basis of M32 segment size. Similarly, we're using that for
M64 assignment on basis of M64 segment size.

   * We're using last M64 BAR to cover M64 aperatus, and it's shared by all
     256 PEs.
   * We don't support P7IOC yet. However, some function callbacks are added
     to (struct pnv_phb) so that we can reuse them on P7IOC in future.
   * PE, corresponding to PCI bus with large M64 BAR device attached, might
     span multiple M64 segments. We introduce "compound" PE to cover the case.
     The compound PE is a list of PEs and the master PE is used as before.
     The slave PEs are just for MMIO isolation.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:47 +10:00
Gavin Shan 05b1721d9f powerpc/eeh: Refactor EEH flag accessors
There are multiple global EEH flags. Almost each flag has its own
accessor, which doesn't make sense. The patch refactors EEH flag
accessors so that they look unified:

  eeh_add_flag():   Add EEH flag
  eeh_clear_flag(): Clear EEH flag
  eeh_has_flag():   Check if one specific flag has been set
  eeh_enabled():    Check if EEH functionality has been enabled

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:21 +10:00
Gavin Shan dff4a39e88 powerpc/powernv: Fix IOMMU table for VFIO dev
On PHB3, PCI devices can bypass IOMMU for DMA access. If we pass
through one PCI device, whose hose driver ever enable the bypass
mode, pdev->dev.archdata.dma_data.iommu_table_base isn't IOMMU
table. However, EEH needs access the IOMMU table when the device
is owned by guest.

The patch fixes pdev->dev.archdata.dma_data.iommu_table when
passing through the device to guest in pnv_pci_ioda2_set_bypass().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:12 +10:00
Brian W Hart a32305bf90 powerpc/powernv: Update dev->dma_mask in pci_set_dma_mask() path
powerpc defines various machine-specific routines for handling
pci_set_dma_mask().  The routines for machine "PowerNV" may neglect
to set dev->dma_mask.  This could confuse anyone (e.g. drivers) that
consult dev->dma_mask to find the current mask.  Set the dma_mask in
the PowerNV leaf routine.

Signed-off-by: Brian W. Hart <hartb@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:40:54 +10:00
Mike Qiu dadcd6d6e7 powerpc/eeh: sysfs entries lost
The sysfs entries are lost because of commit 2213fb1 ("powerpc/eeh:
Skip eeh sysfs when eeh is disabled"). That commit added condition
to create sysfs entries with EEH_ENABLED, which isn't populated
when trying to create sysfs entries on PowerNV platform during system
boot time. The patch fixes the issue by:

   * Reoder EEH initialization functions so that they're same on
     PowerNV/pSeries.
   * Cache PE's primary bus by PowerNV platform instead of EEH core
     to avoid kernel crash caused by the function reorder. Another
     benefit with this is to avoid one eeh_probe_mode_dev() in EEH
     core.

Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:28:49 +10:00
Alexey Kardashevskiy 8fa5d4547e powerpc/powernv: Add a page size parameter to pnv_pci_setup_iommu_table()
Since a TCE page size can be other than 4K, make it configurable for
P5IOC2 and IODA PHBs.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 16:05:53 +10:00
Alexey Kardashevskiy b0376c9b08 powerpc/powernv: Use it_page_shift for TCE invalidation
This fixes IODA1/2 to use it_page_shift as it may be bigger than 4K.

This changes involved constant values to use "ull" modifier.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 16:05:43 +10:00
Gavin Shan e9bc03fe22 powerpc/powernv: Don't use pe->pbus to get the domain number
If the PE contains single PCI function, "pe->pbus" would be NULL.
It's not reliable to be used by pci_domain_nr().  We just grab the
PCI domain number from the PCI host controller (struct pci_controller)
instance.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 17:35:14 +10:00
Gavin Shan 65fd766b99 powerpc/powernv: Missed IOMMU table type
In function pnv_pci_ioda2_setup_dma_pe(), the IOMMU table type is
set to (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE) unconditionally.
It was just set to TCE_PCI by pnv_pci_setup_iommu_table(). So the
primary IOMMU table type (TCE_PCI) is lost. The patch fixes it.

Also, pnv_pci_setup_iommu_table() already set "tbl->it_busno" to
zero and we needn't do it again. The patch removes the redundant
assignment.

The patch also fixes similar issues in pnv_pci_ioda_setup_dma_pe().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 17:35:10 +10:00
Gavin Shan 361f2a2a15 powrpc/powernv: Reset PHB in kdump kernel
In the kdump scenario, the first kerenl doesn't shutdown PCI devices
and the kdump kerenl clean PHB IODA table at the early probe time.
That means the kdump kerenl can't support PCI transactions piled
by the first kerenl. Otherwise, lots of EEH errors and frozen PEs
will be detected.

In order to avoid the EEH errors, the PHB is resetted to drop all
PCI transaction from the first kerenl.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 17:34:57 +10:00
Gavin Shan d92a208d08 powerpc/pci: Mask linkDown on resetting PCI bus
The problem was initially reported by Wendy who tried pass through
IPR adapter, which was connected to PHB root port directly, to KVM
based guest. When doing that, pci_reset_bridge_secondary_bus() was
called by VFIO driver and linkDown was detected by the root port.
That caused all PEs to be frozen.

The patch fixes the issue by routing the reset for the secondary bus
of root port to underly firmware. For that, one more weak function
pci_reset_secondary_bus() is introduced so that the individual platforms
can override that and do specific reset for bridge's secondary bus.

Reported-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 17:34:53 +10:00
Wei Yang 4966bfa1b3 powerpc/powernv: Release the refcount for pci_dev
On PowerNV platform, we are holding an unnecessary refcount on a pci_dev, which
leads to the pci_dev is not destroyed when hotplugging a pci device.

This patch release the unnecessary refcount.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 13:11:20 +10:00
Wei Yang 3f28c5af39 powerpc/powernv: Reduce multi-hit of iommu_add_device()
During the EEH hotplug event, iommu_add_device() will be invoked three times
and two of them will trigger warning or error.

The three times to invoke the iommu_add_device() are:

    pci_device_add
       ...
       set_iommu_table_base_and_group   <- 1st time, fail
    device_add
       ...
       tce_iommu_bus_notifier           <- 2nd time, succees
    pcibios_add_pci_devices
       ...
       pcibios_setup_bus_devices        <- 3rd time, re-attach

The first time fails, since the dev->kobj->sd is not initialized. The
dev->kobj->sd is initialized in device_add().
The third time's warning is triggered by the re-attach of the iommu_group.

After applying this patch, the error

    iommu_tce: 0003:05:00.0 has not been added, ret=-14

and the warning

    [  204.123609] ------------[ cut here ]------------
    [  204.123645] WARNING: at arch/powerpc/kernel/iommu.c:1125
    [  204.123680] Modules linked in: xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT bnep bluetooth 6lowpan_iphc rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc mlx4_ib ib_sa ib_mad ib_core ib_addr ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnx2x tg3 mlx4_core nfsd ptp mdio ses libcrc32c nfs_acl enclosure be2net pps_core shpchp lockd kvm uinput sunrpc binfmt_misc lpfc scsi_transport_fc ipr scsi_tgt
    [  204.124356] CPU: 18 PID: 650 Comm: eehd Not tainted 3.14.0-rc5yw+ #102
    [  204.124400] task: c0000027ed485670 ti: c0000027ed50c000 task.ti: c0000027ed50c000
    [  204.124453] NIP: c00000000003cf80 LR: c00000000006c648 CTR: c00000000006c5c0
    [  204.124506] REGS: c0000027ed50f440 TRAP: 0700   Not tainted  (3.14.0-rc5yw+)
    [  204.124558] MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 88008084  XER: 20000000
    [  204.124682] CFAR: c00000000006c644 SOFTE: 1
    GPR00: c00000000006c648 c0000027ed50f6c0 c000000001398380 c0000027ec260300
    GPR04: c0000027ea92c000 c00000000006ad00 c0000000016e41b0 0000000000000110
    GPR08: c0000000012cd4c0 0000000000000001 c0000027ec2602ff 0000000000000062
    GPR12: 0000000028008084 c00000000fdca200 c0000000000d1d90 c0000027ec281a80
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
    GPR24: 000000005342697b 0000000000002906 c000001fe6ac9800 c000001fe6ac9800
    GPR28: 0000000000000000 c0000000016e3a80 c0000027ea92c090 c0000027ea92c000
    [  204.125353] NIP [c00000000003cf80] .iommu_add_device+0x30/0x1f0
    [  204.125399] LR [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
    [  204.125443] Call Trace:
    [  204.125464] [c0000027ed50f6c0] [c0000027ed50f750] 0xc0000027ed50f750 (unreliable)
    [  204.125526] [c0000027ed50f750] [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
    [  204.125588] [c0000027ed50f7d0] [c000000000069cc8] .pnv_pci_dma_dev_setup+0x78/0x340
    [  204.125650] [c0000027ed50f870] [c000000000044408] .pcibios_setup_device+0x88/0x2f0
    [  204.125712] [c0000027ed50f940] [c000000000046040] .pcibios_setup_bus_devices+0x60/0xd0
    [  204.125774] [c0000027ed50f9c0] [c000000000043acc] .pcibios_add_pci_devices+0xdc/0x1c0
    [  204.125837] [c0000027ed50fa50] [c00000000086f970] .eeh_reset_device+0x36c/0x4f0
    [  204.125939] [c0000027ed50fb20] [c00000000003a2d8] .eeh_handle_normal_event+0x448/0x480
    [  204.126068] [c0000027ed50fbc0] [c00000000003a35c] .eeh_handle_event+0x4c/0x340
    [  204.126192] [c0000027ed50fc80] [c00000000003a74c] .eeh_event_handler+0xfc/0x1b0
    [  204.126319] [c0000027ed50fd30] [c0000000000d1ea0] .kthread+0x110/0x130
    [  204.126430] [c0000027ed50fe30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
    [  204.126556] Instruction dump:
    [  204.126610] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff71 7c7e1b78 60000000
    [  204.126787] 60000000 e87e0298 3143ffff 7d2a1910 <0b090000> 2fa90000 40de00c8 ebfe0218
    [  204.126966] ---[ end trace 6e7aefd80add2973 ]---

are cleared.

This patch removes iommu_add_device() in pnv_pci_ioda_dma_dev_setup(), which
revert part of the change in commit d905c5df(PPC: POWERNV: move
iommu_add_device earlier).

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 13:11:20 +10:00
Benjamin Herrenschmidt cd15b04844 powerpc/powernv: Add iommu DMA bypass support for IODA2
This patch adds the support for to create a direct iommu "bypass"
window on IODA2 bridges (such as Power8) allowing to bypass iommu
page translation completely for 64-bit DMA capable devices, thus
significantly improving DMA performances.

Additionally, this adds a hook to the struct iommu_table so that
the IOMMU API / VFIO can disable the bypass when external ownership
is requested, since in that case, the device will be used by an
environment such as userspace or a KVM guest which must not be
allowed to bypass translations.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11 16:07:37 +11:00
Gavin Shan 8184616f6f powerpc/powernv: Remove unnecessary assignment
We don't have IO ports on PHB3 and the assignment of variable
"iomap_off" on PHB3 is meaningless. The patch just removes the
unnecessary assignment to the variable. The code change should
have been part of commit c35d2a8c ("powerpc/powernv: Needn't IO
segment map for PHB3").

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:46:45 +11:00
Benjamin Herrenschmidt dece8ada99 Merge branch 'merge' into next
Merge a pile of fixes that went into the "merge" branch (3.13-rc's) such
as Anton Little Endian fixes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30 15:19:31 +11:00
Thadeu Lima de Souza Cascardo 08607afba6 powernv: Fix VFIO support with PHB3
I have recently found out that no iommu_groups could be found under
/sys/ on a P8. That prevents PCI passthrough from working.

During my investigation, I found out there seems to be a missing
iommu_register_group for PHB3. The following patch seems to fix the
problem. After applying it, I see iommu_groups under
/sys/kernel/iommu_groups/, and can also bind vfio-pci to an adapter,
which gives me a device at /dev/vfio/.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:28:38 +11:00