Commit graph

118 commits

Author SHA1 Message Date
David Woodhouse 7c7faa11ec iommu/vt-d: Remove device_to_iommu() call from domain_remove_dev_info()
This was problematic because it works by domain/bus/devfn and we want
to make device_to_iommu() use only a struct device * (for handling non-PCI
devices). Now that the iommu pointer is reliably stored in the
device_domain_info, we don't need to look it up.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:53 +00:00
David Woodhouse 8bbc441012 iommu/vt-d: Simplify iommu check in domain_remove_one_dev_info()
Now we store the iommu in the device_domain_info, we don't need to do a
lookup.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:51 +00:00
David Woodhouse 5a8f40e8c8 iommu/vt-d: Always store iommu in device_domain_info
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:44 +00:00
David Woodhouse e2f8c5f6d4 iommu/vt-d: Use domain_remove_one_dev_info() in domain_add_dev_info() error path
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:42 +00:00
David Woodhouse 0ac7266485 iommu/vt-d: use dmar_insert_dev_info() from dma_add_dev_info()
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:41 +00:00
David Woodhouse b718cd3d84 iommu/vt-d: Stop dmar_insert_dev_info() freeing domains on losing race
By moving this into get_domain_for_dev() we can make dmar_insert_dev_info()
suitable for use with "special" domains such as the si_domain, which
currently use domain_add_dev_info().

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:39 +00:00
David Woodhouse 64ae892bfe iommu/vt-d: Pass iommu to domain_context_mapping_one() and iommu_support_dev_iotlb()
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:37 +00:00
David Woodhouse 0bcb3e28c3 iommu/vt-d: Use struct device in device_domain_info, not struct pci_dev
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:36 +00:00
David Woodhouse 1525a29a7d iommu/vt-d: Make dmar_insert_dev_info() take struct device instead of struct pci_dev
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:34 +00:00
David Woodhouse 3d89194a94 iommu/vt-d: Make iommu_dummy() take struct device instead of struct pci_dev
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:06:33 +00:00
David Woodhouse 832bd85867 iommu/vt-d: Change scope lists to struct device, bus, devfn
It's not only for PCI devices any more, and the scope information for an
ACPI device provides the bus and devfn so that has to be stored here too.

It is the device pointer itself which needs to be protected with RCU,
so the __rcu annotation follows it into the definition of struct
dmar_dev_scope, since we're no longer just passing arrays of device
pointers around.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-24 14:05:08 +00:00
David Woodhouse d050196087 iommu/vt-d: Be less pessimistic about domain coherency where possible
In commit 2e12bc29 ("intel-iommu: Default to non-coherent for domains
unattached to iommus") we decided to err on the side of caution and
always assume that it's possible that a device will be attached which is
behind a non-coherent IOMMU.

In some cases, however, that just *cannot* happen. If there *are* no
IOMMUs in the system which are non-coherent, then we don't need to do
it. And flushing the dcache is a *significant* performance hit.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-19 17:25:48 +00:00
David Woodhouse 214e39aa36 iommu/vt-d: Honour intel_iommu=sp_off for non-VMM domains
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-19 17:22:13 +00:00
David Woodhouse ea8ea460c9 iommu/vt-d: Clean up and fix page table clear/free behaviour
There is a race condition between the existing clear/free code and the
hardware. The IOMMU is actually permitted to cache the intermediate
levels of the page tables, and doesn't need to walk the table from the
very top of the PGD each time. So the existing back-to-back calls to
dma_pte_clear_range() and dma_pte_free_pagetable() can lead to a
use-after-free where the IOMMU reads from a freed page table.

When freeing page tables we actually need to do the IOTLB flush, with
the 'invalidation hint' bit clear to indicate that it's not just a
leaf-node flush, after unlinking each page table page from the next level
up but before actually freeing it.

So in the rewritten domain_unmap() we just return a list of pages (using
pg->freelist to make a list of them), and then the caller is expected to
do the appropriate IOTLB flush (or tear down the domain completely,
whatever), before finally calling dma_free_pagelist() to free the pages.

As an added bonus, we no longer need to flush the CPU's data cache for
pages which are about to be *removed* from the page table hierarchy anyway,
in the non-cache-coherent case. This drastically improves the performance
of large unmaps.

As a side-effect of all these changes, this also fixes the fact that
intel_iommu_unmap() was neglecting to free the page tables for the range
in question after clearing them.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-19 17:21:41 +00:00
David Woodhouse 5cf0a76fa2 iommu/vt-d: Clean up size handling for intel_iommu_unmap()
We have this horrid API where iommu_unmap() can unmap more than it's asked
to, if the IOVA in question happens to be mapped with a large page.

Instead of propagating this nonsense to the point where we end up returning
the page order from dma_pte_clear_range(), let's just do it once and adjust
the 'size' parameter accordingly.

Augment pfn_to_dma_pte() to return the level at which the PTE was found,
which will also be useful later if we end up changing the API for
iommu_iova_to_phys() to behave the same way as is being discussed upstream.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2014-03-19 17:21:32 +00:00
Jiang Liu 75f05569d0 iommu/vt-d: Update IOMMU state when memory hotplug happens
If static identity domain is created, IOMMU driver needs to update
si_domain page table when memory hotplug event happens. Otherwise
PCI device DMA operations can't access the hot-added memory regions.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:06 +01:00
Jiang Liu 2e45528930 iommu/vt-d: Unify the way to process DMAR device scope array
Now we have a PCI bus notification based mechanism to update DMAR
device scope array, we could extend the mechanism to support boot
time initialization too, which will help to unify and simplify
the implementation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:06 +01:00
Jiang Liu 59ce0515cd iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens
Current Intel DMAR/IOMMU driver assumes that all PCI devices associated
with DMAR/RMRR/ATSR device scope arrays are created at boot time and
won't change at runtime, so it caches pointers of associated PCI device
object. That assumption may be wrong now due to:
1) introduction of PCI host bridge hotplug
2) PCI device hotplug through sysfs interfaces.

Wang Yijing has tried to solve this issue by caching <bus, dev, func>
tupple instead of the PCI device object pointer, but that's still
unreliable because PCI bus number may change in case of hotplug.
Please refer to http://lkml.org/lkml/2013/11/5/64
Message from Yingjing's mail:
after remove and rescan a pci device
[  611.857095] dmar: DRHD: handling fault status reg 2
[  611.857109] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff7000
[  611.857109] DMAR:[fault reason 02] Present bit in context entry is clear
[  611.857524] dmar: DRHD: handling fault status reg 102
[  611.857534] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff6000
[  611.857534] DMAR:[fault reason 02] Present bit in context entry is clear
[  611.857936] dmar: DRHD: handling fault status reg 202
[  611.857947] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff5000
[  611.857947] DMAR:[fault reason 02] Present bit in context entry is clear
[  611.858351] dmar: DRHD: handling fault status reg 302
[  611.858362] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff4000
[  611.858362] DMAR:[fault reason 02] Present bit in context entry is clear
[  611.860819] IPv6: ADDRCONF(NETDEV_UP): eth3: link is not ready
[  611.860983] dmar: DRHD: handling fault status reg 402
[  611.860995] dmar: INTR-REMAP: Request device [[86:00.3] fault index a4
[  611.860995] INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

This patch introduces a new mechanism to update the DRHD/RMRR/ATSR device scope
caches by hooking PCI bus notification.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:06 +01:00
Jiang Liu 0e242612d9 iommu/vt-d: Use RCU to protect global resources in interrupt context
Global DMA and interrupt remapping resources may be accessed in
interrupt context, so use RCU instead of rwsem to protect them
in such cases.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:05 +01:00
Jiang Liu 3a5670e8ac iommu/vt-d: Introduce a rwsem to protect global data structures
Introduce a global rwsem dmar_global_lock, which will be used to
protect DMAR related global data structures from DMAR/PCI/memory
device hotplug operations in process context.

DMA and interrupt remapping related data structures are read most,
and only change when memory/PCI/DMAR hotplug event happens.
So a global rwsem solution is adopted for balance between simplicity
and performance.

For interrupt remapping driver, function intel_irq_remapping_supported(),
dmar_table_init(), intel_enable_irq_remapping(), disable_irq_remapping(),
reenable_irq_remapping() and enable_drhd_fault_handling() etc
are called during booting, suspending and resuming with interrupt
disabled, so no need to take the global lock.

For interrupt remapping entry allocation, the locking model is:
	down_read(&dmar_global_lock);
	/* Find corresponding iommu */
	iommu = map_hpet_to_ir(id);
	if (iommu)
		/*
		 * Allocate remapping entry and mark entry busy,
		 * the IOMMU won't be hot-removed until the
		 * allocated entry has been released.
		 */
		index = alloc_irte(iommu, irq, 1);
	up_read(&dmar_global_lock);

For DMA remmaping driver, we only uses the dmar_global_lock rwsem to
protect functions which are only called in process context. For any
function which may be called in interrupt context, we will use RCU
to protect them in following patches.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:05 +01:00
Jiang Liu b683b230a2 iommu/vt-d: Introduce macro for_each_dev_scope() to walk device scope entries
Introduce for_each_dev_scope()/for_each_active_dev_scope() to walk
{active} device scope entries. This will help following RCU lock
related patches.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:04 +01:00
Jiang Liu b5f82ddf22 iommu/vt-d: Fix error in detect ATS capability
Current Intel IOMMU driver only matches a PCIe root port with the first
DRHD unit with the samge segment number. It will report false result
if there are multiple DRHD units with the same segment number, thus fail
to detect ATS capability for some PCIe devices.

This patch refines function dmar_find_matched_atsr_unit() to search all
DRHD units with the same segment number.

An example DMAR table entries as below:
[1D0h 0464  2]                Subtable Type : 0002 <Root Port ATS Capability>
[1D2h 0466  2]                       Length : 0028
[1D4h 0468  1]                        Flags : 00
[1D5h 0469  1]                     Reserved : 00
[1D6h 0470  2]           PCI Segment Number : 0000

[1D8h 0472  1]      Device Scope Entry Type : 02
[1D9h 0473  1]                 Entry Length : 08
[1DAh 0474  2]                     Reserved : 0000
[1DCh 0476  1]               Enumeration ID : 00
[1DDh 0477  1]               PCI Bus Number : 00
[1DEh 0478  2]                     PCI Path : [02, 00]

[1E0h 0480  1]      Device Scope Entry Type : 02
[1E1h 0481  1]                 Entry Length : 08
[1E2h 0482  2]                     Reserved : 0000
[1E4h 0484  1]               Enumeration ID : 00
[1E5h 0485  1]               PCI Bus Number : 00
[1E6h 0486  2]                     PCI Path : [03, 00]

[1E8h 0488  1]      Device Scope Entry Type : 02
[1E9h 0489  1]                 Entry Length : 08
[1EAh 0490  2]                     Reserved : 0000
[1ECh 0492  1]               Enumeration ID : 00
[1EDh 0493  1]               PCI Bus Number : 00
[1EEh 0494  2]                     PCI Path : [03, 02]

[1F0h 0496  1]      Device Scope Entry Type : 02
[1F1h 0497  1]                 Entry Length : 08
[1F2h 0498  2]                     Reserved : 0000
[1F4h 0500  1]               Enumeration ID : 00
[1F5h 0501  1]               PCI Bus Number : 00
[1F6h 0502  2]                     PCI Path : [03, 03]

[1F8h 0504  2]                Subtable Type : 0002 <Root Port ATS Capability>
[1FAh 0506  2]                       Length : 0020
[1FCh 0508  1]                        Flags : 00
[1FDh 0509  1]                     Reserved : 00
[1FEh 0510  2]           PCI Segment Number : 0000

[200h 0512  1]      Device Scope Entry Type : 02
[201h 0513  1]                 Entry Length : 08
[202h 0514  2]                     Reserved : 0000
[204h 0516  1]               Enumeration ID : 00
[205h 0517  1]               PCI Bus Number : 40
[206h 0518  2]                     PCI Path : [02, 00]

[208h 0520  1]      Device Scope Entry Type : 02
[209h 0521  1]                 Entry Length : 08
[20Ah 0522  2]                     Reserved : 0000
[20Ch 0524  1]               Enumeration ID : 00
[20Dh 0525  1]               PCI Bus Number : 40
[20Eh 0526  2]                     PCI Path : [02, 02]

[210h 0528  1]      Device Scope Entry Type : 02
[211h 0529  1]                 Entry Length : 08
[212h 0530  2]                     Reserved : 0000
[214h 0532  1]               Enumeration ID : 00
[215h 0533  1]               PCI Bus Number : 40
[216h 0534  2]                     PCI Path : [03, 00]

[218h 0536  2]                Subtable Type : 0002 <Root Port ATS Capability>
[21Ah 0538  2]                       Length : 0020
[21Ch 0540  1]                        Flags : 00
[21Dh 0541  1]                     Reserved : 00
[21Eh 0542  2]           PCI Segment Number : 0000

[220h 0544  1]      Device Scope Entry Type : 02
[221h 0545  1]                 Entry Length : 08
[222h 0546  2]                     Reserved : 0000
[224h 0548  1]               Enumeration ID : 00
[225h 0549  1]               PCI Bus Number : 80
[226h 0550  2]                     PCI Path : [02, 00]

[228h 0552  1]      Device Scope Entry Type : 02
[229h 0553  1]                 Entry Length : 08
[22Ah 0554  2]                     Reserved : 0000
[22Ch 0556  1]               Enumeration ID : 00
[22Dh 0557  1]               PCI Bus Number : 80
[22Eh 0558  2]                     PCI Path : [02, 02]

[230h 0560  1]      Device Scope Entry Type : 02
[231h 0561  1]                 Entry Length : 08
[232h 0562  2]                     Reserved : 0000
[234h 0564  1]               Enumeration ID : 00
[235h 0565  1]               PCI Bus Number : 80
[236h 0566  2]                     PCI Path : [03, 00]

[238h 0568  2]                Subtable Type : 0002 <Root Port ATS Capability>
[23Ah 0570  2]                       Length : 0020
[23Ch 0572  1]                        Flags : 00
[23Dh 0573  1]                     Reserved : 00
[23Eh 0574  2]           PCI Segment Number : 0000

[240h 0576  1]      Device Scope Entry Type : 02
[241h 0577  1]                 Entry Length : 08
[242h 0578  2]                     Reserved : 0000
[244h 0580  1]               Enumeration ID : 00
[245h 0581  1]               PCI Bus Number : C0
[246h 0582  2]                     PCI Path : [02, 00]

[248h 0584  1]      Device Scope Entry Type : 02
[249h 0585  1]                 Entry Length : 08
[24Ah 0586  2]                     Reserved : 0000
[24Ch 0588  1]               Enumeration ID : 00
[24Dh 0589  1]               PCI Bus Number : C0
[24Eh 0590  2]                     PCI Path : [02, 02]

[250h 0592  1]      Device Scope Entry Type : 02
[251h 0593  1]                 Entry Length : 08
[252h 0594  2]                     Reserved : 0000
[254h 0596  1]               Enumeration ID : 00
[255h 0597  1]               PCI Bus Number : C0
[256h 0598  2]                     PCI Path : [03, 00]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:04 +01:00
Jiang Liu a4eaa86c0c iommu/vt-d: Check for NULL pointer when freeing IOMMU data structure
Domain id 0 will be assigned to invalid translation without allocating
domain data structure if DMAR unit supports caching mode. So in function
free_dmar_iommu(), we should check whether the domain pointer is NULL,
otherwise it will cause system crash as below:
[    6.790519] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
[    6.799520] IP: [<ffffffff810e2dc8>] __lock_acquire+0x11f8/0x1430
[    6.806493] PGD 0
[    6.817972] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[    6.823303] Modules linked in:
[    6.826862] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc1+ #126
[    6.834252] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.R00.1402050741 02/05/2014
[    6.845951] task: ffff880455a80000 ti: ffff880455a88000 task.ti: ffff880455a88000
[    6.854437] RIP: 0010:[<ffffffff810e2dc8>]  [<ffffffff810e2dc8>] __lock_acquire+0x11f8/0x1430
[    6.864154] RSP: 0000:ffff880455a89ce0  EFLAGS: 00010046
[    6.870179] RAX: 0000000000000046 RBX: 0000000000000002 RCX: 0000000000000000
[    6.878249] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c8
[    6.886318] RBP: ffff880455a89d40 R08: 0000000000000002 R09: 0000000000000001
[    6.894387] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880455a80000
[    6.902458] R13: 0000000000000000 R14: 00000000000000c8 R15: 0000000000000000
[    6.910520] FS:  0000000000000000(0000) GS:ffff88045b800000(0000) knlGS:0000000000000000
[    6.919687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.926198] CR2: 00000000000000c8 CR3: 0000000001e0e000 CR4: 00000000001407f0
[    6.934269] Stack:
[    6.936588]  ffffffffffffff10 ffffffff810f59db 0000000000000010 0000000000000246
[    6.945219]  ffff880455a89d10 0000000000000000 ffffffff82bcb980 0000000000000046
[    6.953850]  0000000000000000 0000000000000000 0000000000000002 0000000000000000
[    6.962482] Call Trace:
[    6.965300]  [<ffffffff810f59db>] ? vprintk_emit+0x4fb/0x5a0
[    6.971716]  [<ffffffff810e3185>] lock_acquire+0x185/0x200
[    6.977941]  [<ffffffff821fbbee>] ? init_dmars+0x839/0xa1d
[    6.984167]  [<ffffffff81870b06>] _raw_spin_lock_irqsave+0x56/0x90
[    6.991158]  [<ffffffff821fbbee>] ? init_dmars+0x839/0xa1d
[    6.997380]  [<ffffffff821fbbee>] init_dmars+0x839/0xa1d
[    7.003410]  [<ffffffff8147d575>] ? pci_get_dev_by_id+0x75/0xd0
[    7.010119]  [<ffffffff821fc146>] intel_iommu_init+0x2f0/0x502
[    7.016735]  [<ffffffff821a7947>] ? iommu_setup+0x27d/0x27d
[    7.023056]  [<ffffffff821a796f>] pci_iommu_init+0x28/0x52
[    7.029282]  [<ffffffff81002162>] do_one_initcall+0xf2/0x220
[    7.035702]  [<ffffffff810a4a29>] ? parse_args+0x2c9/0x450
[    7.041919]  [<ffffffff8219d1b1>] kernel_init_freeable+0x1c9/0x25b
[    7.048919]  [<ffffffff8219c8d2>] ? do_early_param+0x8a/0x8a
[    7.055336]  [<ffffffff8184d3f0>] ? rest_init+0x150/0x150
[    7.061461]  [<ffffffff8184d3fe>] kernel_init+0xe/0x100
[    7.067393]  [<ffffffff8187b5fc>] ret_from_fork+0x7c/0xb0
[    7.073518]  [<ffffffff8184d3f0>] ? rest_init+0x150/0x150
[    7.079642] Code: 01 76 18 89 05 46 04 36 01 41 be 01 00 00 00 e9 2f 02 00 00 0f 1f 80 00 00 00 00 41 be 01 00 00 00 e9 1d 02 00 00 0f 1f 44 00 00 <49> 81 3e c0 31 34 82 b8 01 00 00 00 0f 44 d8 41 83 ff 01 0f 87
[    7.104944] RIP  [<ffffffff810e2dc8>] __lock_acquire+0x11f8/0x1430
[    7.112008]  RSP <ffff880455a89ce0>
[    7.115988] CR2: 00000000000000c8
[    7.119784] ---[ end trace 13d756f0f462c538 ]---
[    7.125034] note: swapper/0[1] exited with preempt_count 1
[    7.131285] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    7.131285]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:03 +01:00
Jiang Liu 9ebd682e5a iommu/vt-d: Fix incorrect iommu_count for si_domain
The iommu_count field in si_domain(static identity domain) is
initialized to zero and never increases. It will underflow
when tearing down iommu unit in function free_dmar_iommu()
and leak memory. So refine code to correctly manage
si_domain->iommu_count.

Warning message caused by si_domain memory leak:
[   14.609681] IOMMU: Setting RMRR:
[   14.613496] Ignoring identity map for HW passthrough device 0000:00:1a.0 [0xbdcfd000 - 0xbdd1dfff]
[   14.623809] Ignoring identity map for HW passthrough device 0000:00:1d.0 [0xbdcfd000 - 0xbdd1dfff]
[   14.634162] IOMMU: Prepare 0-16MiB unity mapping for LPC
[   14.640329] Ignoring identity map for HW passthrough device 0000:00:1f.0 [0x0 - 0xffffff]
[   14.673360] IOMMU: dmar init failed
[   14.678157] kmem_cache_destroy iommu_devinfo: Slab cache still has objects
[   14.686076] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59
[   14.694176] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[   14.707412]  0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff880c2cc37c00
[   14.716407]  ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00
[   14.725468]  ffffffff81dc7a6a ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711
[   14.734464] Call Trace:
[   14.737453]  [<ffffffff8156223d>] dump_stack+0x4d/0x66
[   14.743430]  [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100
[   14.750279]  [<ffffffff81dc7a6a>] intel_iommu_init+0x122/0x56a
[   14.757035]  [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[   14.763491]  [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[   14.769846]  [<ffffffff81000342>] do_one_initcall+0x122/0x180
[   14.776506]  [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[   14.782866]  [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[   14.789994]  [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[   14.796556]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   14.802626]  [<ffffffff8154ffce>] kernel_init+0xe/0x130
[   14.808698]  [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0
[   14.814963]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   14.821640] kmem_cache_destroy iommu_domain: Slab cache still has objects
[   14.829456] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59
[   14.837562] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[   14.850803]  0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff88102c1ee3c0
[   14.861222]  ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00
[   14.870284]  ffffffff81dc7a76 ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711
[   14.879271] Call Trace:
[   14.882227]  [<ffffffff8156223d>] dump_stack+0x4d/0x66
[   14.888197]  [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100
[   14.895034]  [<ffffffff81dc7a76>] intel_iommu_init+0x12e/0x56a
[   14.901781]  [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[   14.908238]  [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[   14.914594]  [<ffffffff81000342>] do_one_initcall+0x122/0x180
[   14.921244]  [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[   14.927598]  [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[   14.934738]  [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[   14.941309]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   14.947380]  [<ffffffff8154ffce>] kernel_init+0xe/0x130
[   14.953430]  [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0
[   14.959689]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   14.966299] kmem_cache_destroy iommu_iova: Slab cache still has objects
[   14.973923] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59
[   14.982020] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[   14.995263]  0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff88042cb5c980
[   15.004265]  ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00
[   15.013322]  ffffffff81dc7a82 ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711
[   15.022318] Call Trace:
[   15.025238]  [<ffffffff8156223d>] dump_stack+0x4d/0x66
[   15.031202]  [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100
[   15.038038]  [<ffffffff81dc7a82>] intel_iommu_init+0x13a/0x56a
[   15.044786]  [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[   15.051242]  [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[   15.057601]  [<ffffffff81000342>] do_one_initcall+0x122/0x180
[   15.064254]  [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[   15.070608]  [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[   15.077747]  [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[   15.084300]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   15.090362]  [<ffffffff8154ffce>] kernel_init+0xe/0x130
[   15.096431]  [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0
[   15.102693]  [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[   15.189273] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:02 +01:00
Jiang Liu 92d03cc8d0 iommu/vt-d: Reduce duplicated code to handle virtual machine domains
Reduce duplicated code to handle virtual machine domains, there's no
functionality changes. It also improves code readability.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:01 +01:00
Jiang Liu e85bb5d4d1 iommu/vt-d: Free resources if failed to create domain for PCIe endpoint
Enhance function get_domain_for_dev() to release allocated resources
if failed to create domain for PCIe endpoint, otherwise the allocated
resources will get lost.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:01 +01:00
Jiang Liu 745f2586e7 iommu/vt-d: Simplify function get_domain_for_dev()
Function get_domain_for_dev() is a little complex, simplify it
by factoring out dmar_search_domain_by_dev_info() and
dmar_insert_dev_info().

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:01 +01:00
Jiang Liu b94e4117f8 iommu/vt-d: Move private structures and variables into intel-iommu.c
Move private structures and variables into intel-iommu.c, which will
help to simplify locking policy for hotplug. Also delete redundant
declarations.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:00 +01:00
Jiang Liu 7e7dfab71a iommu/vt-d: Avoid caching stale domain_device_info when hot-removing PCI device
Function device_notifier() in intel-iommu.c only remove domain_device_info
data structure associated with a PCI device when handling PCI device
driver unbinding events. If a PCI device has never been bound to a PCI
device driver, there won't be BUS_NOTIFY_UNBOUND_DRIVER event when
hot-removing the PCI device. So associated domain_device_info data
structure may get lost.

On the other hand, if iommu_pass_through is enabled, function
iommu_prepare_static_indentify_mapping() will create domain_device_info
data structure for each PCIe to PCIe bridge and PCIe endpoint,
no matter whether there are drivers associated with those PCIe devices
or not. So those domain_device_info data structures will get lost when
hot-removing the assocated PCIe devices if they have never bound to
any PCI device driver.

To be even worse, it's not only an memory leak issue, but also an
caching of stale information bug because the memory are kept in
device_domain_list and domain->devices lists.

Fix the bug by trying to remove domain_device_info data structure when
handling BUS_NOTIFY_DEL_DEVICE event.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:51:00 +01:00
Jiang Liu 816997d03b iommu/vt-d: Avoid caching stale domain_device_info and fix memory leak
Function device_notifier() in intel-iommu.c fails to remove
device_domain_info data structures for PCI devices if they are
associated with si_domain because iommu_no_mapping() returns true
for those PCI devices. This will cause memory leak and caching of
stale information in domain->devices list.

So fix the issue by not calling iommu_no_mapping() and skipping check
of iommu_pass_through.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:50:59 +01:00
Jiang Liu 989d51fc99 iommu/vt-d: Avoid double free of g_iommus on error recovery path
Array 'g_iommus' may be freed twice on error recovery path in function
init_dmars() and free_dmar_iommu(), thus cause random system crash as
below.

[    6.774301] IOMMU: dmar init failed
[    6.778310] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    6.785615] software IO TLB [mem 0x76bcf000-0x7abcf000] (64MB) mapped at [ffff880076bcf000-ffff88007abcefff]
[    6.796887] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[    6.804173] Modules linked in:
[    6.807731] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc1+ #108
[    6.815122] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.R00.1402050741 02/05/2014
[    6.836000] task: ffff880455a80000 ti: ffff880455a88000 task.ti: ffff880455a88000
[    6.844487] RIP: 0010:[<ffffffff8143eea6>]  [<ffffffff8143eea6>] memcpy+0x6/0x110
[    6.853039] RSP: 0000:ffff880455a89cc8  EFLAGS: 00010293
[    6.859064] RAX: ffff006568636163 RBX: ffff00656863616a RCX: 0000000000000005
[    6.867134] RDX: 0000000000000005 RSI: ffffffff81cdc439 RDI: ffff006568636163
[    6.875205] RBP: ffff880455a89d30 R08: 000000000001bc3b R09: 0000000000000000
[    6.883275] R10: 0000000000000000 R11: ffffffff81cdc43e R12: ffff880455a89da8
[    6.891338] R13: ffff006568636163 R14: 0000000000000005 R15: ffffffff81cdc439
[    6.899408] FS:  0000000000000000(0000) GS:ffff88045b800000(0000) knlGS:0000000000000000
[    6.908575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.915088] CR2: ffff88047e1ff000 CR3: 0000000001e0e000 CR4: 00000000001407f0
[    6.923160] Stack:
[    6.925487]  ffffffff8143c904 ffff88045b407e00 ffff006568636163 ffff006568636163
[    6.934113]  ffffffff8120a1a9 ffffffff81cdc43e 0000000000000007 0000000000000000
[    6.942747]  ffff880455a89da8 ffff006568636163 0000000000000007 ffffffff81cdc439
[    6.951382] Call Trace:
[    6.954197]  [<ffffffff8143c904>] ? vsnprintf+0x124/0x6f0
[    6.960323]  [<ffffffff8120a1a9>] ? __kmalloc_track_caller+0x169/0x360
[    6.967716]  [<ffffffff81440e1b>] kvasprintf+0x6b/0x80
[    6.973552]  [<ffffffff81432bf1>] kobject_set_name_vargs+0x21/0x70
[    6.980552]  [<ffffffff8143393d>] kobject_init_and_add+0x4d/0x90
[    6.987364]  [<ffffffff812067c9>] ? __kmalloc+0x169/0x370
[    6.993492]  [<ffffffff8102dbbc>] ? cache_add_dev+0x17c/0x4f0
[    7.000005]  [<ffffffff8102ddfa>] cache_add_dev+0x3ba/0x4f0
[    7.006327]  [<ffffffff821a87ca>] ? i8237A_init_ops+0x14/0x14
[    7.012842]  [<ffffffff821a87f8>] cache_sysfs_init+0x2e/0x61
[    7.019260]  [<ffffffff81002162>] do_one_initcall+0xf2/0x220
[    7.025679]  [<ffffffff810a4a29>] ? parse_args+0x2c9/0x450
[    7.031903]  [<ffffffff8219d1b1>] kernel_init_freeable+0x1c9/0x25b
[    7.038904]  [<ffffffff8219c8d2>] ? do_early_param+0x8a/0x8a
[    7.045322]  [<ffffffff8184d5e0>] ? rest_init+0x150/0x150
[    7.051447]  [<ffffffff8184d5ee>] kernel_init+0xe/0x100
[    7.057380]  [<ffffffff8187b87c>] ret_from_fork+0x7c/0xb0
[    7.063503]  [<ffffffff8184d5e0>] ? rest_init+0x150/0x150
[    7.069628] Code: 89 e5 53 48 89 fb 75 16 80 7f 3c 00 75 05 e8 d2 f9 ff ff 48 8b 43 58 48 2b 43 50 88 43 4e 5b 5d c3 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 c3 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b
[    7.094960] RIP  [<ffffffff8143eea6>] memcpy+0x6/0x110
[    7.100856]  RSP <ffff880455a89cc8>
[    7.104864] ---[ end trace b5d3fdc6c6c28083 ]---
[    7.110142] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    7.110142]
[    7.120540] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04 17:50:59 +01:00
Linus Torvalds b3a4bcaa5a IOMMU Updates for Linux v3.14
A few patches have been queued up for this merge window:
 
 	* Improvements for the ARM-SMMU driver
 	  (IOMMU_EXEC support, IOMMU group support)
 	* Updates and fixes for the shmobile IOMMU driver
 	* Various fixes to generic IOMMU code and the
 	  Intel IOMMU driver
 	* Some cleanups in IOMMU drivers (dev_is_pci() usage)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJS6XrCAAoJECvwRC2XARrjgy4P/itemtg2U+603Ldje8WcPo0E
 OCO/0VVSmCTYKUJDZY0hiVwqmhe5gFL3Hm/gGwkS0+UJenFXMmi+aVaPp4pCpgH+
 dL2HD3dIEvi14bisrdxG/8MdR6mIx0qzKtnZLkKSR4LXwucyLHvC/DaCoOytb7Yk
 7s+eEuo0hj0jAkiqSG/zLEtKElTEnoAAkLOjMy46orecJ5q4HusPZekLtWZs2ETe
 x3NS63Unb9g1iSQJWIA7HnQlxWIr2+iynoamHHJRiVFzqRF0W0sGvQY3auG0DSCn
 70WRNE1rKfEkfXMJxosRQ4394YUQdAkt8MBENNcJcC6E1n5PBi0cEZXH6mCnEIlG
 jXzIKUY9fz68ZboaqIxXv4Hb+JLlPXCvPBvQzIQiKRgVxd8nncEjn5I9MHdf+je5
 BmJlzJLJvP4cFvW8Hc8k2Oq101b1kEcSCLARWWvE9/bk9xIUyrqBkR4XjC0vb6qq
 1HbKVdZ7KFKCkBHy9xMpr7CUjKiDiiLeUmqlhyjcK9spicuNIZQnC11HemL6/USP
 oR6Ext9RGhvz+ch656+5+L6f6FURVP8/ywKiJ3RjmvXV5/fCYo3WMitOB2qzlWCy
 SYXAczAOMOdOo+1Dxbghrr+7HzUWPqgfPmntZEPGMZhfuZ6xXr+7pGLjAhHb4vcR
 SZxqkDo1cprqrR9KFAWC
 =YKLk
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU Updates from Joerg Roedel:
 "A few patches have been queued up for this merge window:

   - improvements for the ARM-SMMU driver (IOMMU_EXEC support, IOMMU
     group support)
   - updates and fixes for the shmobile IOMMU driver
   - various fixes to generic IOMMU code and the Intel IOMMU driver
   - some cleanups in IOMMU drivers (dev_is_pci() usage)"

* tag 'iommu-updates-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits)
  iommu/vt-d: Fix signedness bug in alloc_irte()
  iommu/vt-d: free all resources if failed to initialize DMARs
  iommu/vt-d, trivial: clean sparse warnings
  iommu/vt-d: fix wrong return value of dmar_table_init()
  iommu/vt-d: release invalidation queue when destroying IOMMU unit
  iommu/vt-d: fix access after free issue in function free_dmar_iommu()
  iommu/vt-d: keep shared resources when failed to initialize iommu devices
  iommu/vt-d: fix invalid memory access when freeing DMAR irq
  iommu/vt-d, trivial: simplify code with existing macros
  iommu/vt-d, trivial: use defined macro instead of hardcoding
  iommu/vt-d: mark internal functions as static
  iommu/vt-d, trivial: clean up unused code
  iommu/vt-d, trivial: check suitable flag in function detect_intel_iommu()
  iommu/vt-d, trivial: print correct domain id of static identity domain
  iommu/vt-d, trivial: refine support of 64bit guest address
  iommu/vt-d: fix resource leakage on error recovery path in iommu_init_domains()
  iommu/vt-d: fix a race window in allocating domain ID for virtual machines
  iommu/vt-d: fix PCI device reference leakage on error recovery path
  drm/msm: Fix link error with !MSM_IOMMU
  iommu/vt-d: use dedicated bitmap to track remapping entry allocation status
  ...
2014-01-29 20:00:13 -08:00
Alex Williamson 08336fd218 intel-iommu: fix off-by-one in pagetable freeing
dma_pte_free_level() has an off-by-one error when checking whether a pte
is completely covered by a range.  Take for example the case of
attempting to free pfn 0x0 - 0x1ff, ie.  512 entries covering the first
2M superpage.

The level_size() is 0x200 and we test:

  static void dma_pte_free_level(...
	...

	if (!(0 > 0 || 0x1ff < 0 + 0x200)) {
		...
	}

Clearly the 2nd test is true, which means we fail to take the branch to
clear and free the pagetable entry.  As a result, we're leaking
pagetables and failing to install new pages over the range.

This was found with a PCI device assigned to a QEMU guest using vfio-pci
without a VGA device present.  The first 1M of guest address space is
mapped with various combinations of 4K pages, but eventually the range
is entirely freed and replaced with a 2M contiguous mapping.
intel-iommu errors out with something like:

  ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083)

In this case 5c2b8003 is the pointer to the previous leaf page that was
neither freed nor cleared and 849c00083 is the superpage entry that
we're trying to replace it with.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Jiang Liu 9bdc531ec6 iommu/vt-d: free all resources if failed to initialize DMARs
Enhance intel_iommu_init() to free all resources if failed to
initialize DMAR hardware.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:44:30 +01:00
Jiang Liu b707cb027e iommu/vt-d, trivial: clean sparse warnings
Clean up most sparse warnings in Intel DMA and interrupt remapping
drivers.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:44:16 +01:00
Jiang Liu 5ced12af69 iommu/vt-d: fix access after free issue in function free_dmar_iommu()
Function free_dmar_iommu() may access domain->iommu_lock by
	spin_unlock_irqrestore(&domain->iommu_lock, flags);
after freeing corresponding domain structure.

Sample stack dump:
[    8.912818] =========================
[    8.917072] [ BUG: held lock freed! ]
[    8.921335] 3.13.0-rc1-gerry+ #12 Not tainted
[    8.926375] -------------------------
[    8.930629] swapper/0/1 is freeing memory ffff880c23b56040-ffff880c23b5613f, with a lock still held there!
[    8.941675]  (&(&domain->iommu_lock)->rlock){......}, at: [<ffffffff81dc775c>] init_dmars+0x72c/0x95b
[    8.952582] 1 lock held by swapper/0/1:
[    8.957031]  #0:  (&(&domain->iommu_lock)->rlock){......}, at: [<ffffffff81dc775c>] init_dmars+0x72c/0x95b
[    8.968487]
[    8.968487] stack backtrace:
[    8.973602] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #12
[    8.981556] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[    8.994742]  ffff880c23b56040 ffff88042dd33c98 ffffffff815617fd ffff88042dd38b28
[    9.003566]  ffff88042dd33cd0 ffffffff810a977a ffff880c23b56040 0000000000000086
[    9.012403]  ffff88102c4923c0 ffff88042ddb4800 ffffffff81b1e8c0 ffff88042dd33d28
[    9.021240] Call Trace:
[    9.024138]  [<ffffffff815617fd>] dump_stack+0x4d/0x66
[    9.030057]  [<ffffffff810a977a>] debug_check_no_locks_freed+0x15a/0x160
[    9.037723]  [<ffffffff811aa1c2>] kmem_cache_free+0x62/0x5b0
[    9.044225]  [<ffffffff81465e27>] domain_exit+0x197/0x1c0
[    9.050418]  [<ffffffff81dc7788>] init_dmars+0x758/0x95b
[    9.056527]  [<ffffffff81dc7dfa>] intel_iommu_init+0x351/0x438
[    9.063207]  [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[    9.069601]  [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[    9.075910]  [<ffffffff81000342>] do_one_initcall+0x122/0x180
[    9.082509]  [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[    9.088815]  [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[    9.095895]  [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[    9.102396]  [<ffffffff8154f580>] ? rest_init+0xd0/0xd0
[    9.108410]  [<ffffffff8154f58e>] kernel_init+0xe/0x130
[    9.114423]  [<ffffffff81574a2c>] ret_from_fork+0x7c/0xb0
[    9.120612]  [<ffffffff8154f580>] ? rest_init+0xd0/0xd0

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:43:42 +01:00
Jiang Liu a868e6b7b6 iommu/vt-d: keep shared resources when failed to initialize iommu devices
Data structure drhd->iommu is shared between DMA remapping driver and
interrupt remapping driver, so DMA remapping driver shouldn't release
drhd->iommu when it failed to initialize IOMMU devices. Otherwise it
may cause invalid memory access to the interrupt remapping driver.

Sample stack dump:
[   13.315090] BUG: unable to handle kernel paging request at ffffc9000605a088
[   13.323221] IP: [<ffffffff81461bac>] qi_submit_sync+0x15c/0x400
[   13.330107] PGD 82f81e067 PUD c2f81e067 PMD 82e846067 PTE 0
[   13.336818] Oops: 0002 [#1] SMP
[   13.340757] Modules linked in:
[   13.344422] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.13.0-rc1-gerry+ #7
[   13.352474] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T,                                               BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[   13.365659] Workqueue: events work_for_cpu_fn
[   13.370774] task: ffff88042ddf00d0 ti: ffff88042ddee000 task.ti: ffff88042dde                                              e000
[   13.379389] RIP: 0010:[<ffffffff81461bac>]  [<ffffffff81461bac>] qi_submit_sy                                              nc+0x15c/0x400
[   13.389055] RSP: 0000:ffff88042ddef940  EFLAGS: 00010002
[   13.395151] RAX: 00000000000005e0 RBX: 0000000000000082 RCX: 0000000200000025
[   13.403308] RDX: ffffc9000605a000 RSI: 0000000000000010 RDI: ffff88042ddb8610
[   13.411446] RBP: ffff88042ddef9a0 R08: 00000000000005d0 R09: 0000000000000001
[   13.419599] R10: 0000000000000000 R11: 000000000000005d R12: 000000000000005c
[   13.427742] R13: ffff88102d84d300 R14: 0000000000000174 R15: ffff88042ddb4800
[   13.435877] FS:  0000000000000000(0000) GS:ffff88043de00000(0000) knlGS:00000                                              00000000000
[   13.445168] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.451749] CR2: ffffc9000605a088 CR3: 0000000001a0b000 CR4: 00000000000407f0
[   13.459895] Stack:
[   13.462297]  ffff88042ddb85d0 000000000000005d ffff88042ddef9b0 0000000000000                                              5d0
[   13.471147]  00000000000005c0 ffff88042ddb8000 000000000000005c 0000000000000                                              015
[   13.480001]  ffff88042ddb4800 0000000000000282 ffff88042ddefa40 ffff88042ddef                                              ac0
[   13.488855] Call Trace:
[   13.491771]  [<ffffffff8146848d>] modify_irte+0x9d/0xd0
[   13.497778]  [<ffffffff8146886d>] intel_setup_ioapic_entry+0x10d/0x290
[   13.505250]  [<ffffffff810a92a6>] ? trace_hardirqs_on_caller+0x16/0x1e0
[   13.512824]  [<ffffffff810346b0>] ? default_init_apic_ldr+0x60/0x60
[   13.519998]  [<ffffffff81468be0>] setup_ioapic_remapped_entry+0x20/0x30
[   13.527566]  [<ffffffff8103683a>] io_apic_setup_irq_pin+0x12a/0x2c0
[   13.534742]  [<ffffffff8136673b>] ? acpi_pci_irq_find_prt_entry+0x2b9/0x2d8
[   13.544102]  [<ffffffff81037fd5>] io_apic_setup_irq_pin_once+0x85/0xa0
[   13.551568]  [<ffffffff8103816f>] ? mp_find_ioapic_pin+0x8f/0xf0
[   13.558434]  [<ffffffff81038044>] io_apic_set_pci_routing+0x34/0x70
[   13.565621]  [<ffffffff8102f4cf>] mp_register_gsi+0xaf/0x1c0
[   13.572111]  [<ffffffff8102f5ee>] acpi_register_gsi_ioapic+0xe/0x10
[   13.579286]  [<ffffffff8102f33f>] acpi_register_gsi+0xf/0x20
[   13.585779]  [<ffffffff81366b86>] acpi_pci_irq_enable+0x171/0x1e3
[   13.592764]  [<ffffffff8146d771>] pcibios_enable_device+0x31/0x40
[   13.599744]  [<ffffffff81320e9b>] do_pci_enable_device+0x3b/0x60
[   13.606633]  [<ffffffff81322248>] pci_enable_device_flags+0xc8/0x120
[   13.613887]  [<ffffffff813222f3>] pci_enable_device+0x13/0x20
[   13.620484]  [<ffffffff8132fa7e>] pcie_port_device_register+0x1e/0x510
[   13.627947]  [<ffffffff810a92a6>] ? trace_hardirqs_on_caller+0x16/0x1e0
[   13.635510]  [<ffffffff810a947d>] ? trace_hardirqs_on+0xd/0x10
[   13.642189]  [<ffffffff813302b8>] pcie_portdrv_probe+0x58/0xc0
[   13.648877]  [<ffffffff81323ba5>] local_pci_probe+0x45/0xa0
[   13.655266]  [<ffffffff8106bc44>] work_for_cpu_fn+0x14/0x20
[   13.661656]  [<ffffffff8106fa79>] process_one_work+0x369/0x710
[   13.668334]  [<ffffffff8106fa02>] ? process_one_work+0x2f2/0x710
[   13.675215]  [<ffffffff81071d56>] ? worker_thread+0x46/0x690
[   13.681714]  [<ffffffff81072194>] worker_thread+0x484/0x690
[   13.688109]  [<ffffffff81071d10>] ? cancel_delayed_work_sync+0x20/0x20
[   13.695576]  [<ffffffff81079c60>] kthread+0xf0/0x110
[   13.701300]  [<ffffffff8108e7bf>] ? local_clock+0x3f/0x50
[   13.707492]  [<ffffffff81079b70>] ? kthread_create_on_node+0x250/0x250
[   13.714959]  [<ffffffff81574d2c>] ret_from_fork+0x7c/0xb0
[   13.721152]  [<ffffffff81079b70>] ? kthread_create_on_node+0x250/0x250

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:43:40 +01:00
Jiang Liu b5f36d9e61 iommu/vt-d: fix invalid memory access when freeing DMAR irq
In function free_dmar_iommu(), it sets IRQ handler data to NULL
before calling free_irq(), which will cause invalid memory access
because free_irq() will access IRQ handler data when calling
function dmar_msi_mask(). So only set IRQ handler data to NULL
after calling free_irq().

Sample stack dump:
[   13.094010] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[   13.103215] IP: [<ffffffff810a97cd>] __lock_acquire+0x4d/0x12a0
[   13.110104] PGD 0
[   13.112614] Oops: 0000 [#1] SMP
[   13.116585] Modules linked in:
[   13.120260] CPU: 60 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0-rc1-gerry+ #9
[   13.129367] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[   13.142555] task: ffff88042dd38010 ti: ffff88042dd32000 task.ti: ffff88042dd32000
[   13.151179] RIP: 0010:[<ffffffff810a97cd>]  [<ffffffff810a97cd>] __lock_acquire+0x4d/0x12a0
[   13.160867] RSP: 0000:ffff88042dd33b78  EFLAGS: 00010046
[   13.166969] RAX: 0000000000000046 RBX: 0000000000000002 RCX: 0000000000000000
[   13.175122] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000048
[   13.183274] RBP: ffff88042dd33bd8 R08: 0000000000000002 R09: 0000000000000001
[   13.191417] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88042dd38010
[   13.199571] R13: 0000000000000000 R14: 0000000000000048 R15: 0000000000000000
[   13.207725] FS:  0000000000000000(0000) GS:ffff88103f200000(0000) knlGS:0000000000000000
[   13.217014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.223596] CR2: 0000000000000048 CR3: 0000000001a0b000 CR4: 00000000000407e0
[   13.231747] Stack:
[   13.234160]  0000000000000004 0000000000000046 ffff88042dd33b98 ffffffff810a567d
[   13.243059]  ffff88042dd33c08 ffffffff810bb14c ffffffff828995a0 0000000000000046
[   13.251969]  0000000000000000 0000000000000000 0000000000000002 0000000000000000
[   13.260862] Call Trace:
[   13.263775]  [<ffffffff810a567d>] ? trace_hardirqs_off+0xd/0x10
[   13.270571]  [<ffffffff810bb14c>] ? vprintk_emit+0x23c/0x570
[   13.277058]  [<ffffffff810ab1e3>] lock_acquire+0x93/0x120
[   13.283269]  [<ffffffff814623f7>] ? dmar_msi_mask+0x47/0x70
[   13.289677]  [<ffffffff8156b449>] _raw_spin_lock_irqsave+0x49/0x90
[   13.296748]  [<ffffffff814623f7>] ? dmar_msi_mask+0x47/0x70
[   13.303153]  [<ffffffff814623f7>] dmar_msi_mask+0x47/0x70
[   13.309354]  [<ffffffff810c0d93>] irq_shutdown+0x53/0x60
[   13.315467]  [<ffffffff810bdd9d>] __free_irq+0x26d/0x280
[   13.321580]  [<ffffffff810be920>] free_irq+0xf0/0x180
[   13.327395]  [<ffffffff81466591>] free_dmar_iommu+0x271/0x2b0
[   13.333996]  [<ffffffff810a947d>] ? trace_hardirqs_on+0xd/0x10
[   13.340696]  [<ffffffff81461a17>] free_iommu+0x17/0x50
[   13.346597]  [<ffffffff81dc75a5>] init_dmars+0x691/0x77a
[   13.352711]  [<ffffffff81dc7afd>] intel_iommu_init+0x351/0x438
[   13.359400]  [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[   13.365806]  [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[   13.372114]  [<ffffffff81000342>] do_one_initcall+0x122/0x180
[   13.378707]  [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[   13.385016]  [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[   13.392100]  [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[   13.398596]  [<ffffffff8154f8b0>] ? rest_init+0xd0/0xd0
[   13.404614]  [<ffffffff8154f8be>] kernel_init+0xe/0x130
[   13.410626]  [<ffffffff81574d6c>] ret_from_fork+0x7c/0xb0
[   13.416829]  [<ffffffff8154f8b0>] ? rest_init+0xd0/0xd0
[   13.422842] Code: ec 99 00 85 c0 8b 05 53 05 a5 00 41 0f 45 d8 85 c0 0f 84 ff 00 00 00 8b 05 99 f9 7e 01 49 89 fe 41 89 f7 85 c0 0f 84 03 01 00 00 <49> 8b 06 be 01 00 00 00 48 3d c0 0e 01 82 0f 44 de 41 83 ff 01
[   13.450191] RIP  [<ffffffff810a97cd>] __lock_acquire+0x4d/0x12a0
[   13.458598]  RSP <ffff88042dd33b78>
[   13.462671] CR2: 0000000000000048
[   13.466551] ---[ end trace c5bd26a37c81d760 ]---

Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:43:38 +01:00
Jiang Liu 7c9197791a iommu/vt-d, trivial: simplify code with existing macros
Simplify vt-d related code with existing macros and introduce a new
macro for_each_active_drhd_unit() to enumerate all active DRHD unit.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:43:37 +01:00
Jiang Liu b8a2d2881e iommu/vt-d, trivial: clean up unused code
Remove dead code from VT-d related files.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>

Conflicts:

	drivers/iommu/dmar.c
2014-01-09 12:43:31 +01:00
Jiang Liu 9544c003e8 iommu/vt-d, trivial: print correct domain id of static identity domain
Field si_domain->id is set by iommu_attach_domain(), so we should only
print domain id for static identity domain after calling
iommu_attach_domain(si_domain, iommu), otherwise it's always zero.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:43:28 +01:00
Jiang Liu 5c645b35b7 iommu/vt-d, trivial: refine support of 64bit guest address
In Intel IOMMU driver, it calculate page table level from adjusted guest
address width as 'level = (agaw - 30) / 9', which assumes (agaw -30)
could be divided by 9. On the other hand, 64bit is a valid agaw and
(64 - 30) can't be divided by 9, so it needs special handling.

This patch enhances Intel IOMMU driver to correctly handle 64bit agaw.
It's mainly for code readability because there's no hardware supporting
64bit agaw yet.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:43:27 +01:00
Jiang Liu 852bdb04f8 iommu/vt-d: fix resource leakage on error recovery path in iommu_init_domains()
Release allocated resources on error recovery path in function
iommu_init_domains().

Also improve printk messages in iommu_init_domains().

Acked-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:43:25 +01:00
Jiang Liu 18d99165d3 iommu/vt-d: fix a race window in allocating domain ID for virtual machines
Function intel_iommu_domain_init() may be concurrently called by upper
layer without serialization, so use atomic_t to protect domain id
allocation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:43:24 +01:00
Yijing Wang dbad086433 iommu/vt-d: Use dev_is_pci() to check whether it is pci device
Use PCI standard marco dev_is_pci() instead of directly compare
pci_bus_type to check whether it is pci device.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-07 15:21:45 +01:00
Yijing Wang bca2b916f3 iommu/vt-d: Use list_for_each_entry_safe() for dmar_domain->devices traversal
Replace list_for_each_safe() + list_entry() with the simpler
list_for_each_entry_safe().

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-11-01 14:18:48 +01:00
Julian Stecklina f9423606ad iommu/vt-d: Fixed interaction of VFIO_IOMMU_MAP_DMA with IOMMU address limits
The BUG_ON in drivers/iommu/intel-iommu.c:785 can be triggered from userspace via
VFIO by calling the VFIO_IOMMU_MAP_DMA ioctl on a vfio device with any address
beyond the addressing capabilities of the IOMMU. The problem is that the ioctl code
calls iommu_iova_to_phys before it calls iommu_map. iommu_map handles the case that
it gets addresses beyond the addressing capabilities of its IOMMU.
intel_iommu_iova_to_phys does not.

This patch fixes iommu_iova_to_phys to return NULL for addresses beyond what the
IOMMU can handle. This in turn causes the ioctl call to fail in iommu_map and
(correctly) return EFAULT to the user with a helpful warning message in the kernel
log.

Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-11-01 12:46:25 +01:00
Alex Williamson 3269ee0bd6 intel-iommu: Fix leaks in pagetable freeing
At best the current code only seems to free the leaf pagetables and
the root.  If you're unlucky enough to have a large gap (like any
QEMU guest with more than 3G of memory), only the first chunk of leaf
pagetables are freed (plus the root).  This is a massive memory leak.
This patch re-writes the pagetable freeing function to use a
recursive algorithm and manages to not only free all the pagetables,
but does it without any apparent performance loss versus the current
broken version.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-08-14 22:21:04 +02:00
Alex Williamson c14d26905d iommu/{vt-d,amd}: Remove multifunction assumption around grouping
If a device is multifunction and does not have ACS enabled then we
assume that the entire package lacks ACS and use function 0 as the
base of the group.  The PCIe spec however states that components are
permitted to implement ACS on some, none, or all of their applicable
functions.  It's therefore conceivable that function 0 may be fully
independent and support ACS while other functions do not.  Instead
use the lowest function of the slot that does not have ACS enabled
as the base of the group.  This may be the current device, which is
intentional.  So long as we use a consistent algorithm, all the
non-ACS functions will be grouped together and ACS functions will
get separate groups.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-06-20 17:21:09 +02:00
Joerg Roedel 0c4513be3d Merge branches 'iommu/fixes', 'x86/vt-d', 'x86/amd', 'ppc/pamu', 'core' and 'arm/tegra' into next 2013-05-02 12:10:19 +02:00