alistair23-linux

redonkable

Author	SHA1	Message	Date
Daniel Vetter	ee960be7bb	drm/i915: Some cleanups for the ppgtt lifetime handling So when reviewing Michel's patch I've noticed a few things and cleaned them up: - The early checks in ppgtt_release are now redundant: The inactive list should always be empty now, so we can ditch these checks. Even for the aliasing ppgtt (though that's a different confusion) since we tear that down after all the objects are gone. - The ppgtt handling functions are splattered all over. Consolidate them in i915_gem_gtt.c, give them OCD prefixes and add wrappers for get/put. - There was a bit a confusion in ppgtt_release about whether it cares about the active or inactive list. It should care about them both, so augment the WARNINGs to check for both. There's still create_vm_for_ctx left to do, put that is blocked on the removal of ppgtt->ctx. Once that's done we can rename it to i915_ppgtt_create and move it to its siblings for handling ppgtts. v2: Move the ppgtt checks into the inline get/put functions as suggested by Chris. v3: Inline the now redundant ppgtt local variable. Cc: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-12 15:24:04 +02:00
Michel Thierry	b9d06dd9d1	drm/i915: vma/ppgtt lifetime rules VMAs should take a reference of the address space they use. Now, when the fd is closed, it will release the ref that the context was holding, but it will still be referenced by any vmas that are still active. ppgtt_release() should then only be called when the last thing referencing it releases the ref, and it can just call the base cleanup and free the ppgtt. Note that with this we will extend the lifetime of ppgtts which contain shared objects. But all the non-shared objects will get removed as soon as they drop of the active list and for the shared ones the shrinker can eventually reap them. Since we currently can't evict ppgtt pagetables either I don't think that temporary leak is important. Signed-off-by: Michel Thierry <michel.thierry@intel.com> [danvet: Add note about potential ppgtt leak with this approach.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-12 15:22:26 +02:00
Oscar Mateo	454afebde8	drm/i915/bdw: Skeleton for the new logical rings submission path Execlists are indeed a brave new world with respect to workload submission to the GPU. In previous version of these series, I have tried to impact the legacy ringbuffer submission path as little as possible (mostly, passing the context around and using the correct ringbuffer when I needed one) but Daniel is afraid (probably with a reason) that these changes and, especially, future ones, will end up breaking older gens. This commit and some others coming next will try to limit the damage by creating an alternative path for workload submission. The first step is here: laying out a new ring init/fini. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:40:57 +02:00
Oscar Mateo	a83014d3f8	drm/i915: Abstract the legacy workload submission mechanism away As suggested by Daniel Vetter. The idea, in subsequent patches, is to provide an alternative to these vfuncs for the Execlists submission mechanism. v2: Splitted into two and reordered to illustrate our intentions, instead of showing it off. Also, remove the add_request vfunc and added the stop_ring one. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: - Make checkpatch happy. - Be grumpy about the excessive vtable. - Ditch gt->is_ring_initialized.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:40:32 +02:00
Oscar Mateo	127f100369	drm/i915/bdw: Macro for LRCs and module option for Execlists GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts". These expanded contexts enable a number of new abilities, especially "Execlists". The macro is defined to off until we have things in place to hope to work. v2: Rename "advanced contexts" to the more correct "logical ring contexts". v3: Add a module parameter to enable execlists. Execlist are relatively new, and so it'd be wise to be able to switch back to ring submission to debug subtle problems that will inevitably arise. v4: Add an intel_enable_execlists function. v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:00:27 +02:00
Chris Wilson	82b6b6d786	drm/i915: Remove fenced_gpu_access and pending_fenced_gpu_access This migrates the fence tracking onto the existing seqno infrastructure so that the later conversion to tracking via requests is simplified. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 12:20:25 +02:00
Chris Wilson	e6a844687c	drm/i915: Force CPU relocations if not GTT mapped Move the decision on whether we need to have a mappable object during execbuffer to the fore and then reuse that decision by propagating the flag through to reservation. As a corollary, before doing the actual relocation through the GTT, we can make sure that we do have a GTT mapping through which to operate. Note that the key to make this work is to ditch the obj->map_and_fenceable unbind optimization - with full ppgtt it doesn't make a lot of sense any more anyway. v2: Revamp and resend to ease future patches. v3: Refresh patch rationale References: https://bugs.freedesktop.org/show_bug.cgi?id=81094 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: Explain why obj->map_and_fenceable is key and split out the secure batch fix.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 12:01:29 +02:00
Chris Wilson	dc8cd1e790	drm/i915: Only perform set-to-gtt domain for objects bound to the global gtt If an object is not bound into the global GTT, then it cannot be accessed via the GTT. This restores the original code that was muddled by ppGTT. In the process, we remove a WARN that had long outlived its usefulness and was simply being coded around instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 11:36:12 +02:00
Deepak S	274fa1c1ac	drm/i915: Bring GPU Freq to min while suspending. We might be leaving the PGU Frequency (and thus vnn) high during the suspend. Flusing the delayed work queue should take care of this. Signed-off-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-07 14:04:09 +02:00
Armin Reese	9490edb588	drm/i915: Do not unmap object unless no other VMAs reference it When using an IOMMU, GEM objects are mapped by their DMA address as the physical address is unknown. This depends on the underlying IOMMU driver to map and unmap the physical pages properly as defined in intel_iommu.c. The current code will tell the IOMMU to unmap the GEM BO's pages on the destruction of the first VMA that "maps" that BO. This is clearly wrong as there may be other VMAs "mapping" that BO (using flink). The scanout is one such example. The patch fixes this issue by only unmapping the DMA maps when there are no more VMAs mapping that object. This is equivalent to when an object is considered unbound as can be seen by the code. On the first VMA that again because bound, we will remap. An alternate solution would be to move the dma mapping to object creation and destrubtion. I am not sure if this is considered an unfriendly thing to do. Some notes to backporters trying to backport full PPGTT: The bug can never be hit without enabling the IOMMU. The existing code will also do the right thing when the object is shared via dmabuf. The failure should be demonstrable with flink. In cases when not using intel_iommu_strict it is likely (likely, as defined by: off the top of my head) on current workloads to not hit this bug since we often teardown all VMAs for an object shared across multiple VMs. We also finish access to that object before the first dma_unmapping. intel_iommu_strict with flinked buffers is likely to hit this issue. Signed-off-by: Armin Reese <armin.c.reese@intel.com> [danvet: Add the excellent commit message provided by Ben.] Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:40 +02:00
Jesse Barnes	9df7575f1c	drm/i915: add helper for checking whether IRQs are enabled Now that we use the runtime IRQ enable/disable functions in our suspend path, we can simply check the pm._irqs_disabled flag everywhere. So rename it to catch the users, and add an inline for it to make the checks clear everywhere. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:34 +02:00
Chris Wilson	a1db2fa7c8	drm/i915: Abandon oom quickly if killed by a signal Whilst waiting to obtain our locks for the last resort shrinking before an oom, we check whether or not a fatal signal was pending. If there was, we do not need to keep waiting as the oom will be aborted. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:28 +02:00
Oscar Mateo	1b5d063faf	drm/i915: Generalize intel_ring_get_tail to take a ringbuf Again, it's low-level enough to simply take a ringbuf and nothing else. Trivial change. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-08 12:31:02 +02:00
Chris Wilson	ec5cc0f9b0	drm/i915: Restrict GPU boost to the RCS engine Make the assumption that media workloads are not as latency sensitive for __wait_seqno, and that upclocking the GPU does not affect the BLT engine. Under that assumption, we only wait to forcibly upclock the GPU when we are stalling for results from the render pipeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Deepak S<deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-08 10:25:17 +02:00
Rodrigo Vivi	ddd4dbc6c1	drm/i915: Updating comments. ring index calculation table was out of date after other rings were added, although the formula is flexible and scale when adding new rings. So this patch just update the comments and add a brief explanation why to use sync_seqno[ring index]. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 22:02:49 +02:00
Daniel Vetter	f99d70690e	drm/i915: Track frontbuffer invalidation/flushing So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scaned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in both cases in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush__domain and set_to__domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the evenutal flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superflous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implemenation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: Only when retiring gpu buffers only flush frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redunancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 18:14:47 +02:00
Daniel Vetter	a071fa0064	drm/i915: Introduce accurate frontbuffer tracking So from just a quick look we seem to have enough information to accurately figure out whether a given gem bo is used as a frontbuffer and where exactly: We have obj->pin_count as a first check with no false negatives and only negligible false positives. And then we can just walk the modeset objects and figure out where exactly a buffer is used as scanout. Except that we can't due to locking order: If we already hold dev->struct_mutex we can't acquire any modeset locks, so could potential chase freed pointers and other evil stuff. So we need something else. For that introduce a new set of bits obj->frontbuffer_bits to track where a buffer object is used. That we can then chase without grabbing any modeset locks. Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be able to do their magic both when called from modeset and from gem code. But that can be easily achieved by adding locks for these specific subsystems which always nest within either kms or gem locking. This patch just adds the relevant update code to all places. Note that if we ever support multi-planar scanout targets then we need one frontbuffer tracking bit per attachment point that we expose to userspace. v2: - Fix more oopsen. Oops. - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix the bugs this brought to light. - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the fb tracking functions (fb for gem object, frontbuffer for raw bits). And the function name was way too long. v3: Size obj->frontbuffer_bits correctly so that all pipes fit in. v4: Don't update fb bits in set_base on failure. Noticed by Chris. v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few local enum pipe variables which are now no longer needed to make the function arguments no drop over the 80 char limit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 10:04:41 +02:00
Daniel Vetter	3108e99ea9	drm/i915: Drop schedule_back from psr_exit It doesn't make sense to never again schedule the work, since by the time we might want to re-enable psr the world might have changed and we can do it again. The only exception is when we shut down the pipe, but that's an entirely different thing and needs to be handled in psr_disable. Note that later patch will again split psr_exit into psr_invalidate and psr_flush. But the split is different and this simplification helps with the transition. v2: Improve the commit message a bit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 09:59:19 +02:00
Daniel Vetter	f25748ea73	drm/i915: Don't BUG_ON in i915_gem_obj_offset A WARN_ON is perfectly fine. The BUG in here seems to be the cause behind hard-hangs when I cat the i915_gem_pageflip debugfs file (which calls this from an irq spinlock). But only while running a full igt run after a while. I still need to root cause the underlying issue. I'll also start reject patches which add new BUG_ON but don't come with a really good justification for it. The general rule really should be to just WARN and hope the driver survives for long enough. v2: Make the WARN a bit more useful per Chris' suggestion. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-18 00:48:37 +02:00
Ville Syrjälä	beff0d0f61	drm/i915: Don't prefault the entire obj if the vma is smaller Take the minimum of the object size and the vma size and prefault only that much. Avoids a SIGBUS when mmapping only a portion of the object. Prefaulting was introduced here: commit `b90b91d870` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Jun 10 12:14:40 2014 +0100 drm/i915: Prefault the entire object on first page fault Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Testcase: igt/gem_mmap/short-mmap Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-18 00:48:35 +02:00
Sourab Gupta	84c33a64b4	drm/i915: Replaced Blitter ring based flips with MMIO flips This patch enables the framework for using MMIO based flip calls, in contrast with the CS based flip calls which are being used currently. MMIO based flip calls can be enabled on architectures where Render and Blitter engines reside in different power wells. The decision to use MMIO flips can be made based on workloads to give 100% residency for Media power well. v2: The MMIO flips now use the interrupt driven mechanism for issuing the flips when target seqno is reached. (Incorporating Ville's idea) v3: Rebasing on latest code. Code restructuring after incorporating Damien's comments v4: Addressing Ville's review comments -general cleanup -updating only base addr instead of calling update_primary_plane -extending patch for gen5+ platforms v5: Addressed Ville's review comments -Making mmio flip vs cs flip selection based on module parameter -Adding check for DRIVER_MODESET feature in notify_ring before calling notify mmio flip. -Other changes mostly in function arguments v6: -Having a seperate function to check condition for using mmio flips (Ville) -propogating error code from i915_gem_check_olr (Ville) v7: -Adding __must_check with i915_gem_check_olr (Chris) -Renaming mmio_flip_data to mmio_flip (Chris) -Rebasing on latest nightly v8: -Rebasing on latest code -squash 3rd patch in series(mmio setbase vs page flip race) with this patch -Added new tiling mode update in intel_do_mmio_flip (Chris) v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in intel_postpone_flip, as this is a more restrictive condition (Chris) v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch. These patches make the selection of CS vs MMIO flip at the page flip time, and make the module parameter for using mmio flips as tristate, the states being 'force CS flips', 'force mmio flips', 'driver discretion'. Changed the logic for driver discretion (Chris) v11: Minor code cleanup(better readability, fixing whitespace errors, using lockdep to check mutex locked status in postpone_flip, removal of __must_check in function definition) (Chris) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Akash Goel <akash.goel@intel.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb [danvet: Fix up parameter alignement checkpatch spotted.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-17 16:16:20 +02:00
Chris Wilson	6254b2042c	drm/i915: Simplify i915_gem_release_all_mmaps() An object can only have an active gtt mapping if it is currently bound into the global gtt. Therefore we can simply walk the list of all bound objects and check the flag upon those for an active gtt mapping. From commit `48018a57a8` Author: Paulo Zanoni <paulo.r.zanoni@intel.com> Date: Fri Dec 13 15:22:31 2013 -0200 drm/i915: release the GTT mmaps when going into D3 Also note that the WARN is inappropriate for this function as GPU activity is orthogonal to GTT mmap status. Rather it is the caller that relies upon this condition and so it should assert that the GPU is idle itself. References: https://bugs.freedesktop.org/show_bug.cgi?id=80081 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-16 19:52:20 +02:00
Rodrigo Vivi	7c8f8a7007	drm/i915: Force PSR exit by inactivating it. The perfect solution for psr_exit is the hardware tracking the changes and doing the psr exit by itself. This scenario works for HSW and BDW with some environments like Gnome and Wayland. However there are many other scenarios that this isn't true. Mainly one right now is KDE users on HSW and BDW with PSR on. User would miss many screen updates. For instances any key typed could be seen only when mouse cursor is moved. So this patch introduces the ability of trigger PSR exit on kernel side on some common cases that. Most of the cases are coverred by psr_exit at set_domain. The remaining cases are coverred by triggering it at set_domain, busy_ioctl, sw_finish and mark_busy. The downside here might be reducing the residency time on the cases this already work very wall like Gnome environment. But so far let's get focused on fixinge issues sio PSR couild be used for everybody and we could even get it enabled by default. Later we can add some alternatives to choose the level of PSR efficiency over boot flag of even over crtc property. v2: remove exit from connector_dpms. Daniel pointed this is the wrong way and also this isn't needed for BDW and HSW anyway. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 21:21:36 +02:00
Chris Wilson	b90b91d870	drm/i915: Prefault the entire object on first page fault Inserting additional PTEs has no side-effect for us as the pfn are fixed for the entire time the object is resident in the global GTT. The downside is that we pay the entire cost of faulting the object upon the first hit, for which we in return receive the benefit of removing the per-page faulting overhead. On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences, Upload rate for 2 linear surfaces: 8127MiB/s -> 8134MiB/s Upload rate for 2 tiled surfaces: 8607MiB/s -> 8625MiB/s Upload rate for 4 linear surfaces: 8127MiB/s -> 8127MiB/s Upload rate for 4 tiled surfaces: 8611MiB/s -> 8602MiB/s Upload rate for 8 linear surfaces: 8114MiB/s -> 8124MiB/s Upload rate for 8 tiled surfaces: 8601MiB/s -> 8603MiB/s Upload rate for 16 linear surfaces: 8110MiB/s -> 8123MiB/s Upload rate for 16 tiled surfaces: 8595MiB/s -> 8606MiB/s Upload rate for 32 linear surfaces: 8104MiB/s -> 8121MiB/s Upload rate for 32 tiled surfaces: 8589MiB/s -> 8605MiB/s Upload rate for 64 linear surfaces: 8107MiB/s -> 8121MiB/s Upload rate for 64 tiled surfaces: 2013MiB/s -> 3017MiB/s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Goel, Akash" <akash.goel@intel.com> Testcasee: igt/gem_fence_upload/performance Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 15:17:41 +02:00
Chris Wilson	ef0cf27c4d	drm/i915: Use the .release hook to drop the stolen drm_mm tracking Now that we have a release hook into i915_gem_object_free, we can move the explicit call to the internal stolen function and hook it up throught the callback instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 15:17:36 +02:00
David Herrmann	f461d1be22	drm/i915: use shmem helpers if possible Instead of shuffling gfp-masks all the time, use the shmem_read_mapping_page() helper. Note that __GFP_IO and __GFP_WAIT are set in mapping_gfp_mask() for i915, so the behavior is still the same. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-11 16:57:38 +02:00
Dave Airlie	ecb889e620	Merge tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel into drm-next > Bunch of stuff for 3.16 still: > - Mipi dsi panel support for byt. Finally! From Shobhit&others. I've > squeezed this in since it's a regression compared to vbios and we've > been ridiculed about it a bit too often ... > - connection_mutex deadlock fix in get_connector (only affects i915). > - Core patches from Matt's primary plane from Matt Roper, I've pushed the > i915 stuff to 3.17. > - vlv power well sequencing fixes from Jesse. > - Fix for cursor size changes from Chris. > - agpbusy fixes from Ville. > - A few smaller things. > * tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel: (32 commits) drm/i915: BDW: Adding missing cursor offsets. drm: Fix getconnector connection_mutex locking drm/i915/bdw: Only use 2g GGTT for 32b platforms drm/i915: Nuke pipe A quirk on i830M drm/i915: fix display power sw state reporting drm/i915: Always apply cursor width changes drm/i915: tell the user if both KMS and UMS are disabled drm/plane-helper: Add drm_plane_helper_check_update() (v3) drm: Check CRTC compatibility in setplane drm/i915: use VBT to determine whether to enumerate the VGA port drm/i915: Don't WARN about ring idle bit on gen2 drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS drm/i915: Enable interrupt-based AGPBUSY# enable on 85x drm/i915: Flip the sense of AGPBUSY_DIS bit drm/i915: Set AGPBUSY# bit in init_clock_gating drm/i915/vlv: add pll assertion when disabling DPIO common well drm/i915/vlv: move DPIO common reset de-assert into __vlv_set_power_well drm/i915/vlv: re-order power wells so DPIO common comes after TX drm/i915/vlv: move CRI refclk enable into __vlv_set_power_well ...	2014-06-06 19:07:09 +10:00
Dave Airlie	8d4ad9d4bb	Merge commit '9e9a928eed8796a0a1aaed7e0b676db86ba84594' into drm-next Merge drm-fixes into drm-next. Both i915 and radeon need this done for later patches. Conflicts: drivers/gpu/drm/drm_crtc_helper.c drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem.c drivers/gpu/drm/i915/i915_gem_execbuffer.c drivers/gpu/drm/i915/i915_gem_gtt.c	2014-06-05 20:28:59 +10:00
Chris Wilson	ddeff6ee42	drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object If the user tries to mmap through the GTT an object that is marked as snooped, we report an error rather than allow the GPU to hang the machine. The choice of EINVAL, however, was unfortunate as we turn that into a WARN rather than a quiet SIGBUS. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-05 08:52:41 +02:00
Ville Syrjälä	dbb42748ac	drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS Move the MI_ARB_STATE MI_ARB_C3_LP_WRITE_ENABLE setup to gen3_init_clock_gating() from i915_gem_load() when KMS is enabled. Leave it in i915_gem_load() for the UMS case, but add an explcit check, just to make it easier to spot it when we eventually rip out UMS support. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-05 08:52:40 +02:00
Chris Wilson	d23db88c3a	drm/i915: Prevent negative relocation deltas from wrapping This is pure evil. Userspace, I'm looking at you SNA, repacks batch buffers on the fly after generation as they are being passed to the kernel for execution. These batches also contain self-referenced relocations as a single buffer encompasses the state commands, kernels, vertices and sampler. During generation the buffers are placed at known offsets within the full batch, and then the relocation deltas (as passed to the kernel) are tweaked as the batch is repacked into a smaller buffer. This means that userspace is passing negative relocations deltas, which subsequently wrap to large values if the batch is at a low address. The GPU hangs when it then tries to use the large value as a base for its address offsets, rather than wrapping back to the real value (as one would hope). As the GPU uses positive offsets from the base, we can treat the relocation address as the minimum address read by the GPU. For the upper bound, we trust that userspace will not read beyond the end of the buffer. So, how do we fix negative relocations from wrapping? We can either check that every relocation looks valid when we write it, and then position each object such that we prevent the offset wraparound, or we just special-case the self-referential behaviour of SNA and force all batches to be above 256k. Daniel prefers the latter approach. This fixes a GPU hang when it tries to use an address (relocation + offset) greater than the GTT size. The issue would occur quite easily with full-ppgtt as each fd gets its own VM space, so low offsets would often be handed out. However, with the rearrangement of the low GTT due to capturing the BIOS framebuffer, it is already affecting kernels 3.15 onwards. I think only IVB+ is susceptible to this bug, but the workaround should only kick in rarely, so it seems sensible to always apply it. v3: Use a bias for batch buffers to prevent small negative delta relocations from wrapping. v4 from Daniel: - s/BIAS/BATCH_OFFSET_BIAS/ - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions were growing rather cumbersome. - Add a comment to eb_get_batch explaining why we do this. - Apply the batch offset bias everywhere but mention that we've only observed it on gen7 gpus. - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch. v5: Add static to eb_get_batch, spotted by 0-day tester. Testcase: igt/gem_bad_reloc Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-27 11:18:40 +03:00
Chris Wilson	00731155a7	drm/i915: Fix dynamic allocation of physical handles A single object may be referenced by multiple registers fundamentally breaking the static allotment of ids in the current design. When the object is used the second time, the physical address of the first assignment is relinquished and a second one granted. However, the hardware is still reading (and possibly writing) to the old physical address now returned to the system. Eventually hilarity will ensue, but in the short term, it just means that cursors are broken when using more than one pipe. v2: Fix up leak of pci handle when handling an error during attachment, and avoid a double kmap/kunmap. (Ville) Rebase against -fixes. v3: And fix the error handling added in v2 (Ville) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-27 11:18:39 +03:00
Oscar Mateo	273497e5cd	drm/i915: s/i915_hw_context/intel_context Up until now, contexts had one (and only one) backing object that was used by the hardware to save/restore render ring contexts (via the MI_SET_CONTEXT command). Other rings did not have or need this, so our i915_hw_context struct had a 1:1 relationship with a a real HW context. With Logical Ring Contexts and Execlists, this is not possible anymore: all rings need a backing object, and it cannot be reused. To prepare for that, rename our contexts to the more generic term intel_context. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:41:17 +02:00
Oscar Mateo	ee1b1e5ef3	drm/i915: Split the ringbuffers from the rings (2/3) This refactoring has been performed using the following Coccinelle semantic script: @@ struct intel_engine_cs r; @@ ( - (r).obj + r.buffer->obj \| - (r).virtual_start + r.buffer->virtual_start \| - (r).head + r.buffer->head \| - (r).tail + r.buffer->tail \| - (r).space + r.buffer->space \| - (r).size + r.buffer->size \| - (r).effective_size + r.buffer->effective_size \| - (r).last_retired_head + r.buffer->last_retired_head ) @@ struct intel_engine_cs *r; @@ ( - (r)->obj + r->buffer->obj \| - (r)->virtual_start + r->buffer->virtual_start \| - (r)->head + r->buffer->head \| - (r)->tail + r->buffer->tail \| - (r)->space + r->buffer->space \| - (r)->size + r->buffer->size \| - (r)->effective_size + r->buffer->effective_size \| - (r)->last_retired_head + r->buffer->last_retired_head ) @@ expression E; @@ ( - LP_RING(E)->obj + LP_RING(E)->buffer->obj \| - LP_RING(E)->virtual_start + LP_RING(E)->buffer->virtual_start \| - LP_RING(E)->head + LP_RING(E)->buffer->head \| - LP_RING(E)->tail + LP_RING(E)->buffer->tail \| - LP_RING(E)->space + LP_RING(E)->buffer->space \| - LP_RING(E)->size + LP_RING(E)->buffer->size \| - LP_RING(E)->effective_size + LP_RING(E)->buffer->effective_size \| - LP_RING(E)->last_retired_head + LP_RING(E)->buffer->last_retired_head ) Note: On top of this this patch also removes the now unused ringbuffer fields in intel_engine_cs. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> [danvet: Add note about fixup patch included here.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:27:25 +02:00
Oscar Mateo	a4872ba6d0	drm/i915: s/intel_ring_buffer/intel_engine_cs In the upcoming patches we plan to break the correlation between engine command streamers (a.k.a. rings) and ringbuffers, so it makes sense to refactor the code and make the change obvious. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:01:05 +02:00
Chris Wilson	340fbd8ca1	drm/i915: Only discard backing storage on releasing the last ref Before purging our pages (as opposed to copying back the contents from the GPU), make sure that there is not an exposed CPU mmapping through which the user can inspect the results. Regression from commit `5537252b6b` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Mar 25 13:23:06 2014 +0000 drm/i915: Invalidate our pages under memory pressure Testcase: igt/gem_mmap/new-object Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79005 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Guo Jinxian <jinxianx.guo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 15:06:34 +02:00
Chris Wilson	2cfcd32a92	drm/i915: Implement an oom-notifier for last resort shrinking Before the process killer is invoked, oom-notifiers are executed for one last try at recovering pages. We can hook into this callback to be sure that everything that can be is purged from our page lists, and to give a summary of how much memory is still pinned by the GPU in the case of an oom. This should be really valuable for debugging OOM issues. Note that the last-ditch effort call to shrink_all we've previously called from our normal shrinker when we could free as much as the vm demaned is moved into the oom notifier. Since the shrinker accounting races against bind/unbind operations we might have called shrink_all prematurely, which this approach with an oom notifier avoids. References: https://bugs.freedesktop.org/show_bug.cgi?id=72742 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: lu hua <huax.lu@intel.com> [danvet: Bikeshed logical \| into \|\| and pimp commit message.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 10:57:13 +02:00
Chris Wilson	5537252b6b	drm/i915: Invalidate our pages under memory pressure Try to flush out dirty pages into the swapcache (and from there into the swapfile) when under memory pressure and forced to drop GEM objects from memory. In effect, this should just allow us to discard unused pages for memory reclaim and to start writeback earlier. v2: Hugh Dickins warned that explicitly starting writeback from shrink_slab was prone to deadlocks within shmemfs. Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:51:18 +02:00
Chris Wilson	b453c4dbc3	drm/i915: Refactor common lock handling between shrinker count/scan We can share a few lines of tricky lock handling we need to use for both shrinker routines and in the process fix the return value for count() when reporting a deadlock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:46:52 +02:00
Chris Wilson	ceabbba524	drm/i915: Include bound and active pages in the count of shrinkable objects When the machine is under a lot of memory pressure and being stressed by multiple GPU threads, we quite often report fewer than shrinker->batch (i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to skip calling into i915.ko to release pages, despite the GPU holding onto most of the physical pages in its active lists. References: https://bugs.freedesktop.org/show_bug.cgi?id=72742 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:46:06 +02:00
Chris Wilson	0820baf39b	drm/i915: Translate ENOSPC from shmem_get_page() to ENOMEM shmemfs first checks if there is enough memory to allocate the page and reports ENOSPC should there be insufficient, along with the usual ENOMEM for a genuine allocation failure. We use ENOSPC in our driver to mean that we have run out of aperture space and so want to translate the error from shmemfs back to our usual understanding of ENOMEM. None of the the other GEM users appear to distinguish between ENOMEM and ENOSPC in their error handling, hence it is easiest to do the fixup in i915.ko Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:45:22 +02:00
Chris Wilson	5cc9ed4b9a	drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 19:31:29 +02:00
Oscar Mateo	19656430a8	drm/i915: Gracefully handle obj not bound to GGTT in is_pin_display Otherwise, we do a NULL pointer dereference. I've seen this happen while handling an error in i915_gem_object_pin_to_display_plane(): If i915_gem_object_set_cache_level() fails, we call is_pin_display() to handle the error. At this point, the object is still not pinned to GGTT and maybe not even bound, so we have to check before we dereference its GGTT vma. The IGT kms_flip/bo-too-big tests for this bug. v2: Chris Wilson says restoring the old value is easier, but that is_pin_display is useful as a theory of operation. Take the solomonic decision: at least this way is_pin_display is a little more robust (until Chris can kill it off). v3: Chris suggests the WARN in i915_gem_obj_to_ggtt has outlived its usefulness: add a reminder to remove it. Issue: VIZ-3772 Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Testcase: igt/kms_flip/bo-too-big Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 16:24:39 +02:00
Daniel Vetter	8b1bc9b4f1	drm/i915: Only do gtt cleanup in vma_unbind for the global vma Otherwise we end up tearing down fences when e.g. the client quits way too early. Might or might not fix a fence pin_count BUG Ville has reported. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 18:39:54 +02:00
Daniel Vetter	aff10b30a1	drm/i915: Don't drop pinned fences Userspace can currently provoke this when e.g. trying to use a pinned scanout as a cursor or overlay target. Later on that might lead to some fun fence pin count mayhem. Spurred by Ville's report that something goes wrong here and originally I've thought that this might slip through the pwrite gtt fastpath. But that one checks of obj tiling, so should be ok. But one thing that _does_ blow up is the vma unbinding with more than one address space. The next patch will fix this. v2: Use a WARN_ON - Chris pointed out that we already catch all cases so userspace can't provoke this like I've originally feared. While reviewing relevant code I've noticed a pile of DRM_ERROR in the overlay&cursor code which are all triggerable by userspace. Tune them down while at it. v3: Split out the DRM_ERROR->DRM_DEBUG_KMS change into a separate patch, as requested by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 18:39:31 +02:00
Daniel Vetter	d8ffa60b52	drm/i915: WARN_ON fence pin leaks The fence pin count should always be <= the bo pin count. If that's not the case then we have a funny problem and are leaking references somewhere. Which means we can catch fence pin leaks by checking for the same upper limit as we do for the bo pin count. Inspired by a discussion with Ville about a fence leak igt testcase. v2: Also check for fence->pin_count <= ggtt_vma->pin_count, since that might catch a leak even quicker. Also de-inline them, they're getting too big. v3: Don't separately check for MAX_PIN_COUNT since the > vma->pin_count check will catch that already (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-13 17:16:12 +02:00
Chris Wilson	1cf0ba1474	drm/i915: Flush request queue when waiting for ring space During the review of commit `1f70999f90` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 27 22:43:07 2014 +0000 drm/i915: Prevent recursion by retiring requests when the ring is full Ville raised the point that our interaction with request->tail was likely to foul up other uses elsewhere (such as hang check comparing ACTHD against requests). However, we also need to restore the implicit retire requests that certain test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the spectre that the ppgtt will randomly call i915_gpu_idle() and recurse back into intel_ring_begin(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023 Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Remove now unused 'tail' variable as spotted by Brad.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-08 01:23:34 +02:00
Ben Widawsky	6e7186af3b	drm/i915: Make aliasing a 2nd class VM There is a good debate to be had about how best to fit the aliasing PPGTT into the code. However, as it stands right now, getting aliasing PPGTT bindings is a hack, and done through implicit arguments. To make this absolutely clear, WARN and return an error if a driver writer tries to do something they shouldn't. I have no issue with an eventual revert of this patch. It makes sense for what we have today. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-07 10:01:41 +02:00
Ben Widawsky	ebc348b2ad	drm/i915: Move semaphore specific ring members to struct This will be helpful in abstracting some of the code in preparation for gen8 semaphores. v2: Move mbox stuff to a separate struct v3: Rebased over VCS2 work Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 10:56:52 +02:00
Chris Wilson	c8725f3dc0	drm/i915: Do not call retire_requests from wait_for_rendering A common issue we have is that retiring requests causes recursion through GTT manipulation or page table manipulation which we can only handle at very specific points. However, to maintain internal consistency (enforced through our sanity checks on write_domain at various points in the GEM object lifecycle) we do need to retire the object prior to marking it with a new write_domain, and also clear the write_domain for the implicit flush following a batch. Note that this then allows the unbound objects to still be on the active lists, and so care must be taken when removing objects from unbound lists (similar to the caveats we face processing the bound lists). v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules, by refactoring it to call into __i915_gem_shrink(). v3: Missed an object-retire prior to changing cache domains in i915_gem_object_set_cache_leve() v4: Rebase Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:09:15 +02:00

1 2 3 4 5 ...

989 Commits (4d884705dababd7d0f3f12796bc7b45e84962596)