1
0
Fork 0
Commit Graph

15 Commits (5abefb861fd4306467813380cf21ce21d4b274ce)

Author SHA1 Message Date
Linus Torvalds 23971bdfff IOMMU Updates for Linux v3.18
This pull-request includes:
 
 	* Change in the IOMMU-API to convert the former iommu_domain_capable
 	  function to just iommu_capable
 
 	* Various fixes in handling RMRR ranges for the VT-d driver (one fix
 	  requires a device driver core change which was acked
 	  by Greg KH)
 
 	* The AMD IOMMU driver now assigns and deassigns complete alias groups
 	  to fix issues with devices using the wrong PCI request-id
 
 	* MMU-401 support for the ARM SMMU driver
 
 	* Multi-master IOMMU group support for the ARM SMMU driver
 
 	* Various other small fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUPNxYAAoJECvwRC2XARrjMwMP/RLSr+oA31rGVjLXcmcCHl7Q
 Uj7xpcnG19qB0aqNR1JeJuZNkK/tw44pE353MQPbz4N9UVUiogklGIVD1iJvFV53
 0qm84bvpDJIof4aP35B3H3Umft2USTn/lmsQg/RklQcNTW8DzNj63b8BTNR7k/GL
 G7bLg7F1BUCl0shZCCsFspOIulQPAJYN2OvHlfYBav/bfDvfouQ3lrV+loGrK44r
 F2Hmp+imXlIhUCjfbiWz6wKFxvPrxZx482vm2pXBCSnXEdW4/fz6nf9VHUK/Cfsq
 JAimY1CfiDo1aqH9/yVHUOw5SD/NYOXq6E5bFPg/WENbipbbae5cK2u6PX5MMBAn
 CG4BM8l9xicfGPqgn5YFSRY/6qC6K7NlxMnt9U8l18QIkDVDqEtUgJQISJuce7wx
 FWx6eSWaxpIe5yhq19/h2ELalUUyR/fPq+UXXjYDL1kLV/vcvC/lC3mbNAQU93zU
 WK0bG2tDg88JHavc25Ewa2aOn4BVM2BpwuLbYlgQReaEmsQRnEPgtmRNyLJHqbFE
 wwpCj8pBWdufsJWRyvpnXQ+CfA7oSz4e7hz1G+0/5uiDmagfvg16Ql5JtPmmuLUm
 Kc3dVIiG0s1ewohZIIJETGCqprQbCSqs8CCQqB6p2zDBWFKpNT7F38lm/KlehkCz
 JpAiI7Y2K9Jejp0VIPrt
 =OMOt
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "This pull-request includes:

   - change in the IOMMU-API to convert the former iommu_domain_capable
     function to just iommu_capable

   - various fixes in handling RMRR ranges for the VT-d driver (one fix
     requires a device driver core change which was acked by Greg KH)

   - the AMD IOMMU driver now assigns and deassigns complete alias
     groups to fix issues with devices using the wrong PCI request-id

   - MMU-401 support for the ARM SMMU driver

   - multi-master IOMMU group support for the ARM SMMU driver

   - various other small fixes all over the place"

* tag 'iommu-updates-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
  iommu/vt-d: Work around broken RMRR firmware entries
  iommu/vt-d: Store bus information in RMRR PCI device path
  iommu/vt-d: Only remove domain when device is removed
  driver core: Add BUS_NOTIFY_REMOVED_DEVICE event
  iommu/amd: Fix devid mapping for ivrs_ioapic override
  iommu/irq_remapping: Fix the regression of hpet irq remapping
  iommu: Fix bus notifier breakage
  iommu/amd: Split init_iommu_group() from iommu_init_device()
  iommu: Rework iommu_group_get_for_pci_dev()
  iommu: Make of_device_id array const
  amd_iommu: do not dereference a NULL pointer address.
  iommu/omap: Remove omap_iommu unused owner field
  iommu: Remove iommu_domain_has_cap() API function
  IB/usnic: Convert to use new iommu_capable() API function
  vfio: Convert to use new iommu_capable() API function
  kvm: iommu: Convert to use new iommu_capable() API function
  iommu/tegra: Convert to iommu_capable() API function
  iommu/msm: Convert to iommu_capable() API function
  iommu/vt-d: Convert to iommu_capable() API function
  iommu/fsl: Convert to iommu_capable() API function
  ...
2014-10-15 07:23:49 +02:00
Will Deacon f5c9ecebaf vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type
VFIO allows devices to be safely handed off to userspace by putting
them behind an IOMMU configured to ensure DMA and interrupt isolation.
This enables userspace KVM clients, such as kvmtool and qemu, to further
map the device into a virtual machine.

With IOMMUs such as the ARM SMMU, it is then possible to provide SMMU
translation services to the guest operating system, which are nested
with the existing translation installed by VFIO. However, enabling this
feature means that the IOMMU driver must be informed that the VFIO domain
is being created for the purposes of nested translation.

This patch adds a new IOMMU type (VFIO_TYPE1_NESTING_IOMMU) to the VFIO
type-1 driver. The new IOMMU type acts identically to the
VFIO_TYPE1v2_IOMMU type, but additionally sets the DOMAIN_ATTR_NESTING
attribute on its IOMMU domains.

Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-09-29 10:06:19 -06:00
Joerg Roedel eb165f0584 vfio: Convert to use new iommu_capable() API function
Cc: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-09-25 15:47:45 +02:00
Alex Williamson c8dbca165b vfio/iommu_type1: Avoid overflow
Coverity reports use of a tained scalar used as a loop boundary.
For the most part, any values passed from userspace for a DMA mapping
size, IOVA, or virtual address are valid, with some alignment
constraints.  The size is ultimately bound by how many pages the user
is able to lock, IOVA is tested by the IOMMU driver when doing a map,
and the virtual address needs to pass get_user_pages.  The only
problem I can find is that we do expect the __u64 user values to fit
within our variables, which might not happen on 32bit platforms.  Add
a test for this and return error on overflow.  Also propagate use of
the type-correct local variables throughout the function.

The above also points to the 'end' variable, which can be zero if
we're operating at the very top of the address space.  We try to
account for this, but our loop botches it.  Rework the loop to use
the remaining size as our loop condition rather than the IOVA vs end.

Detected by Coverity: CID 714659

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-05-30 11:35:54 -06:00
Linus Torvalds d0cb5f71c5 VFIO updates for v3.15 include:
- Allow the vfio-type1 IOMMU to support multiple domains within a container
 - Plumb path to query whether all domains are cache-coherent
 - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation
 - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd)
 
 The first patch also makes the vfio-type1 IOMMU driver completely independent
 of the bus_type of the devices it's handling, which enables it to be used for
 both vfio-pci and a future vfio-platform (and hopefully combinations involving
 both simultaneously).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTPbikAAoJECObm247sIsiG9cP/jnurW84YuAHmybzy3R4nMaa
 8BWcQST+TyQ78GWvubFDcRt+vHmJUI4iFWPBIm1twBSVrMy6F1GcT/spmSne1Dfb
 OvcfGEy59tGO+BeklHg5Qq2Hj1UzfeV9uQoy+PrpTF/sYzsyL6g5+O3Zv39SNvr0
 zCwnO2JAKXIpQlkK3wVha6V13X07Z/+d2b4JSsvmON2cnlPzOUYNrgBvoSeV2IQe
 3kMAU9qfAfQlpNDhRRZL/HshgWazCg4XZUp9UdNjUkOxrUp4vXHrVTUJnUGBDytk
 8V9UGRKF3mik9PqpZJk4jLV5urgUVpnUR5747uqs0KF+9GWxClXvh3gp1XX8Zn7f
 NCfDSMn/wrDr6fQBglKUadITaDhF49KV+J0cX083q46BbYxhAfgxv1I9UOvLreG6
 SGAezlTB0mefa1BmSwJdfKwDBsuDOhbF0TsHC6zT7yFc8pqe3hl/vQzUyu3HtwuM
 yPycQiQQmvOaymaiiBRwyLocwJbDNKPigDomv8NZ8Mcd97Fu9YsF4ydaxMJmmeKf
 TffYS4z4tUElYiU2vv1ipB8T+ReCmdtj2cIvyfwTL9jycY9ouO4bJ56dOGWd3WNl
 m7AVokbrrJQ03itUpFMuYjHAzTNUlXVvXlGr9n6L4VTR0RTL1zwJVrMVDsXdhxm4
 XgDbVB5PrxinDAC1vJvI
 =mnzO
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:
 "VFIO updates for v3.15 include:

   - Allow the vfio-type1 IOMMU to support multiple domains within a
     container
   - Plumb path to query whether all domains are cache-coherent
   - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation
   - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd)

  The first patch also makes the vfio-type1 IOMMU driver completely
  independent of the bus_type of the devices it's handling, which
  enables it to be used for both vfio-pci and a future vfio-platform
  (and hopefully combinations involving both simultaneously)"

* tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio:
  vfio: always select ANON_INODES
  kvm/vfio: Support for DMA coherent IOMMUs
  vfio: Add external user check extension interface
  vfio/type1: Add extension to test DMA cache coherence of IOMMU
  vfio/iommu_type1: Multi-IOMMU domain support
2014-04-03 14:05:02 -07:00
David Rientjes 668f9abbd4 mm: close PageTail race
Commit bf6bddf192 ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation.  This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling.  The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.

This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04 07:55:47 -08:00
Alex Williamson aa42931827 vfio/type1: Add extension to test DMA cache coherence of IOMMU
Now that the type1 IOMMU backend can support IOMMU_CACHE, we need to
be able to test whether coherency is currently enforced.  Add an
extension for this.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-02-26 11:38:37 -07:00
Alex Williamson 1ef3e2bc04 vfio/iommu_type1: Multi-IOMMU domain support
We currently have a problem that we cannot support advanced features
of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee
that those features will be supported by all of the hardware units
involved with the domain over its lifetime.  For instance, the Intel
VT-d architecture does not require that all DRHDs support snoop
control.  If we create a domain based on a device behind a DRHD that
does support snoop control and enable SNP support via the IOMMU_CACHE
mapping option, we cannot then add a device behind a DRHD which does
not support snoop control or we'll get reserved bit faults from the
SNP bit in the pagetables.  To add to the complexity, we can't know
the properties of a domain until a device is attached.

We could pass this problem off to userspace and require that a
separate vfio container be used, but we don't know how to handle page
accounting in that case.  How do we know that a page pinned in one
container is the same page as a different container and avoid double
billing the user for the page.

The solution is therefore to support multiple IOMMU domains per
container.  In the majority of cases, only one domain will be required
since hardware is typically consistent within a system.  However, this
provides us the ability to validate compatibility of domains and
support mixed environments where page table flags can be different
between domains.

To do this, our DMA tracking needs to change.  We currently try to
coalesce user mappings into as few tracking entries as possible.  The
problem then becomes that we lose granularity of user mappings.  We've
never guaranteed that a user is able to unmap at a finer granularity
than the original mapping, but we must honor the granularity of the
original mapping.  This coalescing code is therefore removed, allowing
only unmaps covering complete maps.  The change in accounting is
fairly small here, a typical QEMU VM will start out with roughly a
dozen entries, so it's arguable if this coalescing was ever needed.

We also move IOMMU domain creation to the point where a group is
attached to the container.  An interesting side-effect of this is that
we now have access to the device at the time of domain creation and
can probe the devices within the group to determine the bus_type.
This finally makes vfio_iommu_type1 completely device/bus agnostic.
In fact, each IOMMU domain can host devices on different buses managed
by different physical IOMMUs, and present a single DMA mapping
interface to the user.  When a new domain is created, mappings are
replayed to bring the IOMMU pagetables up to the state of the current
container.  And of course, DMA mapping and unmapping automatically
traverse all of the configured IOMMU domains.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Varun Sethi <Varun.Sethi@freescale.com>
2014-02-26 11:38:36 -07:00
Antonios Motakis d93b3ac0ed VFIO: vfio_iommu_type1: fix bug caused by break in nested loop
In vfio_iommu_type1.c there is a bug in vfio_dma_do_map, when checking
that pages are not already mapped. Since the check is being done in a
for loop nested within the main loop, breaking out of it does not create
the intended behavior. If the underlying IOMMU driver returns a non-NULL
value, this will be ignored and mapping the DMA range will be attempted
anyway, leading to unpredictable behavior.

This interracts badly with the ARM SMMU driver issue fixed in the patch
that was submitted with the title:
"[PATCH 2/2] ARM: SMMU: return NULL on error in arm_smmu_iova_to_phys"
Both fixes are required in order to use the vfio_iommu_type1 driver
with an ARM SMMU.

This patch refactors the function slightly, in order to also make this
kind of bug less likely.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-10-11 10:40:46 -06:00
Alex Williamson 8d38ef1948 vfio/type1: Fix leak on error path
We also don't handle unpinning zero pages as an error on other exits
so we can fix that inconsistency by rolling in the next conditional
return.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-07-01 08:28:58 -06:00
Alex Williamson f5bfdbf252 vfio/type1: Fix missed frees and zero sized removes
With hugepage support we can only properly aligned and sized ranges.
We only guarantee that we can unmap the same ranges mapped and not
arbitrary sub-ranges.  This means we might not free anything or might
free more than requested.  The vfio unmap interface started storing
the unmapped size to return to userspace to handle this.  This patch
fixes a few places where we don't properly handle those cases, moves
a memory allocation to a place where failure is an option and checks
our loops to make sure we don't get into an infinite loop trying to
remove an overlap.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-06-25 16:01:44 -06:00
Alex Williamson 5c6c2b21ec vfio: Provide module option to disable vfio_iommu_type1 hugepage support
Add a module option to vfio_iommu_type1 to disable IOMMU hugepage
support.  This causes iommu_map to only be called with single page
mappings, disabling the IOMMU driver's ability to use hugepages.
This option can be enabled by loading vfio_iommu_type1 with
disable_hugepages=1 or dynamically through sysfs.  If enabled
dynamically, only new mappings are restricted.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-06-21 09:38:11 -06:00
Alex Williamson 166fd7d94a vfio: hugepage support for vfio_iommu_type1
We currently send all mappings to the iommu in PAGE_SIZE chunks,
which prevents the iommu from enabling support for larger page sizes.
We still need to pin pages, which means we step through them in
PAGE_SIZE chunks, but we can batch up contiguous physical memory
chunks to allow the iommu the opportunity to use larger pages.  The
approach here is a bit different that the one currently used for
legacy KVM device assignment.  Rather than looking at the vma page
size and using that as the maximum size to pass to the iommu, we
instead simply look at whether the next page is physically
contiguous.  This means we might ask the iommu to map a 4MB region,
while legacy KVM might limit itself to a maximum of 2MB.

Splitting our mapping path also allows us to be smarter about locked
memory because we can more easily unwind if the user attempts to
exceed the limit.  Therefore, rather than assuming that a mapping
will result in locked memory, we test each page as it is pinned to
determine whether it locks RAM vs an mmap'd MMIO region.  This should
result in better locking granularity and less locked page fudge
factors in userspace.

The unmap path uses the same algorithm as legacy KVM.  We don't want
to track the pfn for each mapping ourselves, but we need the pfn in
order to unpin pages.  We therefore ask the iommu for the iova to
physical address translation, ask it to unpin a page, and see how many
pages were actually unpinned.  iommus supporting large pages will
often return something bigger than a page here, which we know will be
physically contiguous and we can unpin a batch of pfns.  iommus not
supporting large mappings won't see an improvement in batching here as
they only unmap a page at a time.

With this change, we also make a clarification to the API for mapping
and unmapping DMA.  We can only guarantee unmaps at the same
granularity as used for the original mapping.  In other words,
unmapping a subregion of a previous mapping is not guaranteed and may
result in a larger or smaller unmapping than requested.  The size
field in the unmapping structure is updated to reflect this.
Previously this was unmodified on mapping, always returning the the
requested unmap size.  This is now updated to return the actual unmap
size on success, allowing userspace to appropriately track mappings.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-06-21 09:38:02 -06:00
Alex Williamson cd9b22685e vfio: Convert type1 iommu to use rbtree
We need to keep track of all the DMA mappings of an iommu container so
that it can be automatically unmapped when the user releases the file
descriptor.  We currently do this using a simple list, where we merge
entries with contiguous iovas and virtual addresses.  Using a tree for
this is a bit more efficient and allows us to use common code instead
of inventing our own.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-06-21 09:37:50 -06:00
Alex Williamson 73fa0d10d0 vfio: Type1 IOMMU implementation
This VFIO IOMMU backend is designed primarily for AMD-Vi and Intel
VT-d hardware, but is potentially usable by anything supporting
similar mapping functionality.  We arbitrarily call this a Type1
backend for lack of a better name.  This backend has no IOVA
or host memory mapping restrictions for the user and is optimized
for relatively static mappings.  Mapped areas are pinned into system
memory.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-07-31 08:16:23 -06:00