redonkable/alistair23-linux

Author	SHA1	Message	Date
Ville Syrjälä	83843d84fc	drm/i915: Parametrize LRC registers Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-09-23 17:13:01 +02:00
Nick Hoath	e84fe80337	drm/i915: Split alloc from init for lrc Extend init/init_hw split to context init. - Move context initialisation in to i915_gem_init_hw - Move one off initialisation for render ring to i915_gem_validate_context - Move default context initialisation to logical_ring_init Rename intel_lr_context_deferred_create to intel_lr_context_deferred_alloc, to reflect reduced functionality & alloc/init split. This patch is intended to split out the allocation of resources & initialisation to allow easier reuse of code for resume/gpu reset. v2: Removed function ptr wrapping of do_switch_context (Daniel Vetter) Left ->init_context int intel_lr_context_deferred_alloc (Daniel Vetter) Remove unnecessary init flag & ring type test. (Daniel Vetter) Improve commit message (Daniel Vetter) v3: On init/reinit, set the hw next sequence number to the sw next sequence number. This is set to 1 at driver load time. This prevents the seqno being reset on reinit (Chris Wilson) v4: Set seqno back to ~0 - 0x1000 at start-of-day, and increment by 0x100 on reset. This makes it obvious which bbs are which after a reset. (David Gordon & John Harrison) Rebase. v5: Rebase. Fixed rebase breakage. Put context pinning in separate function. Removed code churn. (Thomas Daniel) v6: Cleanup up issues introduced in v2 & v5 (Thomas Daniel) Issue: VIZ-4798 Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <john.c.harrison@intel.com> Cc: David Gordon <david.s.gordon@intel.com> Cc: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-09-14 11:42:34 +02:00
Alex Dai	d1675198ed	drm/i915: Integrate GuC-based command submission GuC-based submission is mostly the same as execlist mode, up to intel_logical_ring_advance_and_submit(), where the context being dispatched would be added to the execlist queue; at this point we submit the context to the GuC backend instead. There are, however, a few other changes also required, notably: 1. Contexts must be pinned at GGTT addresses accessible by the GuC i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls. 2. The GuC's TLB must be invalidated after a context is pinned at a new GGTT address. 3. GuC firmware uses the one page before Ring Context as shared data. Therefore, whenever driver wants to get base address of LRC, we will offset one page for it. LRC_PPHWSP_PN is defined as the page number of LRCA. 4. In the work queue used to pass requests to the GuC, the GuC firmware requires the ring-tail-offset to be represented as an 11-bit value, expressed in QWords. Therefore, the ringbuffer size must be reduced to the representable range (4 pages). v2: Defer adding #defines until needed [Chris Wilson] Rationalise type declarations [Chris Wilson] v4: Squashed kerneldoc patch into here [Daniel Vetter] v5: Update request->tail in code common to both GuC and execlist modes. Add a private version of lr_context_update(), as sharing the execlist version leads to race conditions when the CPU and the GuC both update TAIL in the context image. Conversion of error-captured HWS page to string must account for offset from start of object to actual HWS (LRC_PPHWSP_PN). Issue: VIZ-4884 Signed-off-by: Alex Dai <yu.dai@intel.com> Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-08-14 18:16:44 +02:00
Dave Gordon	919f1f55d9	drm/i915: Expose one LRC function for GuC submission mode GuC submission is basically execlist submission, but with the GuC handling the actual writes to the ELSP and the resulting context switch interrupts. So to describe a context for submission via the GuC, we need one of the same functions used in execlist mode. This commit exposes one such function, changing its name to better describe what it does (it's related to logical ring contexts rather than to execlists per se). v2: Replaces previous "drm/i915: Move execlists defines from .c to .h" v3: Incorporates a change to one of the functions exposed here that was previously part of an internal patch, but which was omitted from the version recently committed to drm-intel-nightly: `7a01a0a` drm/i915/lrc: Update PDPx registers with lri commands So we reinstate this change here. v4: Drop v3 change, update function parameters due to collision with `8ee3615` drm/i915: Convert execlists_ctx_descriptor() for requests v5: Don't expose execlists_update_context() after all. The current version is no longer compatible with GuC submission; trying to share the execlist version of this function results in both GuC and CPU updating TAIL in the context image, with bad results when they get out of step. The GuC submission path now has its own private version that just updates the ringbuffer start address, and not TAIL or PDPx. v6: Rebased Issue: VIZ-4884 Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-08-14 18:16:40 +02:00
Peter Antoine	3bbaba0cea	drm/i915: Added Programming of the MOCS This change adds the programming of the MOCS registers to the gen 9+ platforms. The set of MOCS configuration entries introduced by this patch is intended to be minimal but sufficient to cover the needs of current userspace - i.e. a good set of defaults. It is expected to be extended in the future to provide further default values or to allow userspace to redefine its private MOCS tables based on its demand for additional caching configurations. In this setup, userspace should only utilize the first N entries, higher entries are reserved for future use. It creates a fixed register set that is programmed across the different engines so that all engines have the same table. This is done as the main RCS context only holds the registers for itself and the shared L3 values. By trying to keep the registers consistent across the different engines it should make the programming for the registers consistent. v2: -'static const' for private data structures and style changes.(Matt Turner) v3: - Make the tables "slightly" more readable. (Damien Lespiau) - Updated tables fix performance regression. v4: - Code formatting. (Chris Wilson) - re-privatised mocs code. (Daniel Vetter) v5: - Changed the name of a function. (Chris Wilson) v6: - re-based - Added Mesa table entry (skylake & broxton) (Francisco Jerez) - Tidied up the readability defines (Francisco Jerez) - NUMBER of entries defines wrong. (Jim Bish) - Added comments to clear up the meaning of the tables (Jim Bish) Signed-off-by: Peter Antoine <peter.antoine@intel.com> v7 (Francisco Jerez): - Don't write L3-specific MOCS_ESC/SCC values into the e/LLC control tables. Prefix L3-specific defines consistently with L3_ and e/LLC-specific defines with LE_ to avoid this kind of confusion in the future. - Change L3CC WT define back to RESERVED (matches my hardware documentation and the original patch, probably a misunderstanding of my own previous comment). - Drop Android tables, define new minimal tables more suitable for the open source stack. - Add comment that the MOCS tables are part of the kernel ABI. - Move intel_logical_ring_begin() and _advance() calls one level down (Chris Wilson). - Minor formatting and style fixes. v8 (Francisco Jerez): - Add table size sanity check to emit_mocs_control/l3cc_table() (Chris Wilson). - Add comment about undefined entries being implicitly set to uncached for forwards compatibility. v9 (Francisco Jerez): - Minor style fixes. Signed-off-by: Francisco Jerez <currojerez@riseup.net> Acked-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-14 17:13:14 +02:00
Mika Kuoppala	8ba319da89	drm/i915: Convert intel_lr_context_pin() for requests Pass around requests to carry context deeper in callchain. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-06 16:47:41 +02:00
Abdiel Janulgue	6922528a04	drm/i915: Enable resource streamer on Execlists GEN8 and above uses Execlists by default instead of the legacy ringbuffer for batch execution. This patch enables the resource streamer bits when required. Patch is based on the initial work by Minu Mathai <minu.mathai@intel.com> This version also adds the required bits to enable GEN8 Resource Streamer context save and restore for Execlists. Cc: ville.syrjala@linux.intel.com Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-06 10:26:13 +02:00
John Harrison	ccd98fe499	drm/i915: Add _ring_begin() to request allocation Now that the _ring_begin() functions no longer call the request allocation code, it is finally safe for the request allocation code to call *_ring_begin(). This is important to guarantee that the space reserved for the subsequent i915_add_request() call does actually get reserved. v2: Renamed functions according to review feedback (Tomas Elf). For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:30 +02:00
John Harrison	4866d729ab	drm/i915: Update flush_all_caches() to take request structures Updated the *_ring_flush_all_caches() functions to take requests instead of rings or ringbuf/context pairs. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:02:20 +02:00
John Harrison	5f19e2bffa	drm/i915: Merged the many do_execbuf() parameters into a structure The do_execbuf() function takes quite a few parameters. The actual set of parameters is going to change with the conversion to passing requests around. Further, it is due to grow massively with the arrival of the GPU scheduler. This patch simplifies the prototype by passing a parameter structure instead. Changing the parameter set in the future is then simply a matter of adding/removing items to the structure. Note that the structure does not contain absolutely everything that is passed in. This is because the intention is to use this structure more extensively later in this patch series and more especially in the GPU scheduler that is coming soon. The latter requires hanging on to the structure as the final hardware submission can be delayed until long after the execbuf IOCTL has returned to user land. Thus it is unsafe to put anything in the structure that is local to the IOCTL call itself - such as the 'args' parameter. All entries must be copies of data or pointers to structures that are reference counted in some way and guaranteed to exist for the duration of the batch buffer's life. v2: Rebased to newer tree and updated for changes to the command parser. Specifically, a code shuffle has required saving the batch start address in the params structure. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:01:59 +02:00
John Harrison	40e895ceca	drm/i915: Set context in request from creation even in legacy mode In execlist mode, the context object pointer is written in to the request structure (and reference counted) at the point of request creation. In legacy mode, this only happens inside i915_add_request(). This patch updates the legacy code path to match the execlist version. This allows all the intermediate code between request creation and request submission to get at the context object given only a request structure. Thus negating the need to pass context pointers here, there and everywhere. v2: Moved the context reference so it does not need to be undone if the get_seqno() fails. v3: Fixed execlist mode always hitting a warning about invalid last_contexts (which don't exist in execlist mode). v4: Updated for new i915_gem_request_alloc() scheme. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-06-23 14:01:58 +02:00
John Harrison	6689cb2b62	drm/i915: Move common request allocation code into a common function The request allocation code is largely duplicated between legacy mode and execlist mode. The actual difference between the two versions of the code is pretty minimal. This patch moves the common code out into a separate function. This is then called by the execution specific version prior to setting up the one different value. For: VIZ-5190 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-01 07:54:30 +02:00
John Harrison	bc0dce3fd0	drm/i915: Make intel_logical_ring_begin() static The only usage of intel_logical_ring_begin() is within intel_lrc.c so it can be made static. To avoid a forward declaration at the top of the file, it and bunch of other functions have been shuffled upwards. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-01 07:54:17 +02:00
John Harrison	8e004efc16	drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading There is a flags word that is passed through the execbuffer code path all the way from initial decoding of the user parameters down to the very final dispatch buffer call. It is simply called 'flags'. Unfortuantely, there are many other flags words floating around in the same blocks of code. Even more once the GPU scheduler arrives. This patch makes it more obvious exactly which flags word is which by renaming 'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word already have an 'I915_DISPATCH_' prefix on them and so are not quite so ambiguous. OTC-Jira: VIZ-1587 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> [danvet: Resolve conflict with Chris' rework of the bb parsing.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-02-25 22:43:29 +01:00
Thomas Daniel	3e5b6f05a2	drm/i915: Reset logical ring contexts' head and tail during GPU reset Work was getting left behind in LRC contexts during reset. This causes a hang if the GPU is reset when HEAD==TAIL because the context's ringbuffer head and tail don't get reset and retiring a request doesn't alter them, so the ring still appears full. Added a function intel_lr_context_reset() to reset head and tail on a LRC and its ringbuffer. Call intel_lr_context_reset() for each context in i915_gem_context_reset() when in execlists mode. Testcase: igt/pm_rps --run-subtest reset #bdw Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096 Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [danvet: Flatten control flow in the lrc reset code a notch.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-02-24 00:19:37 +01:00
Damien Lespiau	183c990673	drm/i915: Make intel_logical_ring_advance_and_submit() static This function is only used in intel_lrc.c, so restrict it to that file. The function was moved around to avoid a forward declaration and group it with its user. Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-02-13 23:28:28 +01:00
Damien Lespiau	cef437ad22	drm/i915: Make intel_lr_context_render_state_init() static This function is only used in intel_lrc.c, so restrict it to that file. The function was moved around to avoid a forward declaration and group it with its user. Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-02-13 23:28:28 +01:00
Zhi Wang	5baa22c59f	drm/i915: Introduce bit definitions of CTXT_SR_CTRL register. This patch introduces 2 bit definitions of context save/restore control register. Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Suggested-by: Dave Gordon <david.s.gordon@intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-02-13 23:28:22 +01:00
Nick Hoath	6d3d8274bc	drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request Move all remaining elements that were unique to execlists queue items in to the associated request. Issue: VIZ-4274 v2: Rebase. Fixed issue of overzealous freeing of request. v3: Removed re-addition of cleanup work queue (found by Daniel Vetter) v4: Rebase. v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix pointer in __i915_add_request (found by Thomas Daniel) v6: Removed unrelated changes Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> [danvet: Reformat comment with strange linebreaks.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:53 +01:00
Nick Hoath	21076372af	drm/i915: Remove FIXME_lrc_ctx backpointer The first pass implementation of execlists required a backpointer to the context to be held in the intel_ringbuffer. However the context pointer is available higher in the call stack. Remove the backpointer from the ring buffer structure and instead pass it down through the call stack. v2: Integrate this changeset with the removal of duplicate request/execlist queue item members. v3: Rebase v4: Rebase. Remove passing of context when the request is passed. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:53 +01:00
Nick Hoath	72f95afa5f	drm/i915: Removed duplicate members from submit_request Where there were duplicate variables for the tail, context and ring (engine) in the gem request and the execlist queue item, use the one from the request and remove the duplicate from the execlist queue item. Issue: VIZ-4274 v1: Rebase v2: Fixed build issues. Keep separate postfix & tail pointers as these are used in different ways. Reinserted missing full tail pointer update. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:52 +01:00
Nick Hoath	2d12955a3e	drm/i915: execlist request keeps ptr/ref to gem_request Add a reference and pointer from the execlist queue item to the associated gem request. For execlist requests that don't have a request, create one as a placeholder. Issue: VIZ-4274 v1: Rebase after upstream of "Replace seqno values with request structures" patchset. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:52 +01:00
Daniel Vetter	3f7531c3b3	drm/i915: Name the lrc irq handler correctly We consistently use the _irq_handler postfix for functions called in hardirq context. Especially when it's a non-static function hardirq is a crazy enough calling context to warrant this level of ocd. So rename it. Cc: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-12-15 09:54:05 +01:00
Oscar Mateo	dcb4c12a68	drm/i915/bdw: Pin the context backing objects to GGTT on-demand Up until now, we have pinned every logical ring context backing object during creation, and left it pinned until destruction. This made my life easier, but it's a harmful thing to do, because we cause fragmentation of the GGTT (and, eventually, we would run out of space). This patch makes the pinning on-demand: the backing objects of the two contexts that are written to the ELSP are pinned right before submission and unpinned once the hardware is done with them. The only context that is still pinned regardless is the global default one, so that the HWS can still be accessed in the same way (ring->status_page). v2: In the early version of this patch, we were pinning the context as we put it into the ELSP: on the one hand, this is very efficient because only a maximum two contexts are pinned at any given time, but on the other hand, we cannot really pin in interrupt time :( v3: Use a mutex rather than atomic_t to protect pin count to avoid races. Do not unpin default context in free_request. v4: Break out pin and unpin into functions. Fix style problems reported by checkpatch v5: Remove unpin_lock as all pinning and unpinning is done with the struct mutex already locked. Add WARN_ONs to make sure this is the case in future. Issue: VIZ-4277 Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Akash Goel <akash.goels@gmail.com> Reviewed-by: Deepak S<deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-19 19:32:58 +01:00
Thomas Daniel	c86ee3a9f8	drm/i915/bdw: Clean up execlist queue items in retire_work No longer create a work item to clean each execlist queue item. Instead, move retired execlist requests to a queue and clean up the items during retire_requests. v2: Fix legacy ring path broken during overzealous cleanup v3: Update idle detection to take execlists queue into account v4: Grab execlist lock when checking queue state v5: Fix leaking requests by freeing in execlists_retire_requests. Issue: VIZ-4274 Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Reviewed-by: Akash Goel <akash.goels@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-19 19:16:45 +01:00
Oscar Mateo	564ddb2fae	drm/i915/bdw: Render state init for Execlists The batchbuffer that sets the render context state is submitted in a different way, and from different places. We needed to make both the render state preparation and free functions outside accesible, and namespace accordingly. This mess is so that all LR, LRC and Execlists functionality can go together in intel_lrc.c: we can fix all of this later on, once the interfaces are clear. v2: Create a separate ctx->rcs_initialized for the Execlists case, as suggested by Chris Wilson. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> v3: Setup ring status page in lr_context_deferred_create when the default context is being created. This means that the render state init for the default context is no longer a special case. Execute deferred creation of the default context at the end of logical_ring_init to allow the render state commands to be submitted. Fix style errors reported by checkpatch. Rebased. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-03 11:04:52 +02:00
Oscar Mateo	73e4d07f8a	drm/i915/bdw: Document Logical Rings, LR contexts and Execlists Add theory of operation notes to intel_lrc.c and comments to externally visible functions. v2: Add notes on logical ring context creation. v3: Use kerneldoc. v4: Integrate it in the DocBook template. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v3) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Drop hunk about render ring init function since that's not yet merged.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-20 17:17:51 +02:00
Oscar Mateo	4ba70e448b	drm/i915/bdw: Display execlists info in debugfs v2: Warn and return if LRCs are not enabled. v3: Grab the Execlists spinlock (noticed by Daniel Vetter). Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> v4: Lock the struct mutex for atomic state capture Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Checkpatch.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-20 17:17:49 +02:00
Oscar Mateo	e1fee72c2e	drm/i915/bdw: Avoid non-lite-restore preemptions In the current Execlists feeding mechanism, full preemption is not supported yet: only lite-restores are allowed (this is: the GPU simply samples a new tail pointer for the context currently in execution). But we have identified an scenario in which a full preemption occurs: 1) We submit two contexts for execution (A & B). 2) The GPU finishes with the first one (A), switches to the second one (B) and informs us. 3) We submit B again (hoping to cause a lite restore) together with C, but in the time we spend writing to the ELSP, the GPU finishes B. 4) The GPU start executing B again (since we told it so). 5) We receive a B finished interrupt and, mistakenly, we submit C (again) and D, causing a full preemption of B. The race is avoided by keeping track of how many times a context has been submitted to the hardware and by better discriminating the received context switch interrupts: in the example, when we have submitted B twice, we won´t submit C and D as soon as we receive the notification that B is completed because we were expecting to get a LITE_RESTORE and we didn´t, so we know a second completion will be received shortly. Without this explicit checking, somehow, the batch buffer execution order gets messed with. This can be verified with the IGT test I sent together with the series. I don´t know the exact mechanism by which the pre-emption messes with the execution order but, since other people is working on the Scheduler + Preemption on Execlists, I didn´t try to fix it. In these series, only Lite Restores are supported (other kind of preemptions WARN). v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several rebase changes. v3: Clarify how the race is avoided, as requested by Daniel. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Align function parameters ...] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:43:58 +02:00
Thomas Daniel	e981e7b17f	drm/i915/bdw: Handle context switch events Handle all context status events in the context status buffer on every context switch interrupt. We only remove work from the execlist queue after a context status buffer reports that it has completed and we only attempt to schedule new contexts on interrupt when a previously submitted context completes (unless no contexts are queued, which means the GPU is free). We canot call intel_runtime_pm_get() in an interrupt (or with a spinlock grabbed, FWIW), because it might sleep, which is not a nice thing to do. Instead, do the runtime_pm get/put together with the create/destroy request, and handle the forcewake get/put directly. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> v2: Unreferencing the context when we are freeing the request might free the backing bo, which requires the struct_mutex to be grabbed, so defer unreferencing and freeing to a bottom half. v3: - Ack the interrupt inmediately, before trying to handle it (fix for missing interrupts by Bob Beckett <robert.beckett@intel.com>). - Update the Context Status Buffer Read Pointer, just in case (spotted by Damien Lespiau). v4: New namespace and multiple rebase changes. v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt", as suggested by Daniel. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Checkpatch ...] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:43:47 +02:00
Michel Thierry	acdd884a2e	drm/i915/bdw: Two-stage execlist submit process Context switch (and execlist submission) should happen only when other contexts are not active, otherwise pre-emption occurs. To assure this, we place context switch requests in a queue and those request are later consumed when the right context switch interrupt is received (still TODO). v2: Use a spinlock, do not remove the requests on unqueue (wait for context switch completion). Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> v3: Several rebases and code changes. Use unique ID. v4: - Move the queue/lock init to the late ring initialization. - Damien's kmalloc review comments: check return, use sizeof(*req), do not cast. v5: - Do not reuse drm_i915_gem_request. Instead, create our own. - New namespace. Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [davnet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:10:59 +02:00
Ben Widawsky	84b790f80e	drm/i915/bdw: Implement context switching (somewhat) A context switch occurs by submitting a context descriptor to the ExecList Submission Port. Given that we can now initialize a context, it's possible to begin implementing the context switch by creating the descriptor and submitting it to ELSP (actually two, since the ELSP has two ports). The context object must be mapped in the GGTT, which means it must exist in the 0-4GB graphics VA range. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> v2: This code has changed quite a lot in various rebases. Of particular importance is that now we use the globally unique Submission ID to send to the hardware. Also, context pages are now pinned unconditionally to GGTT, so there is no need to bind them. v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context ID we submit to the ELSP is globally unique and != 0 (Bspec requirements of the software use-only bits of the Context ID in the Context Descriptor Format) without the hassle of the previous submission Id construction. Also, re-add the ELSP porting read (it was dropped somewhere during the rebases). v4: - Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC says: "SW must set Force Wakeup bit to prevent GT from entering C6 while ELSP writes are in progress") as noted by Thomas Daniel (thomas.daniel@intel.com). - Rename functions and use an execlists/intel_execlists_ namespace. - The BUG_ON only checked that the LRCA was <32 bits, but it didn't make sure that it was properly aligned. Spotted by Alistair Mcaulay <alistair.mcaulay@intel.com>. v5: - Improved source code comments as suggested by Chris Wilson. - No need to abstract submit_ctx away, as pointed by Brad Volkin. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Checkpatch. Sigh.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:03:03 +02:00
Oscar Mateo	48e29f5535	drm/i915/bdw: Emission of requests with logical rings On a previous iteration of this patch, I created an Execlists version of __i915_add_request and asbtracted it away as a vfunc. Daniel Vetter wondered then why that was needed: "with the clean split in command submission I expect every function to know wether it'll submit to an lrc (everything in intel_lrc.c) or wether it'll submit to a legacy ring (existing code), so I don't see a need for an add_request vfunc." The honest, hairy truth is that this patch is the glue keeping the whole logical ring puzzle together: - i915_add_request is used by intel_ring_idle, which in turn is used by i915_gpu_idle, which in turn is used in several places inside the eviction and gtt codes. - Also, it is used by i915_gem_check_olr, which is littered all over i915_gem.c - ... If I were to duplicate all the code that directly or indirectly uses __i915_add_request, I'll end up creating a separate driver. To show the differences between the existing legacy version and the new Execlists one, this time I have special-cased __i915_add_request instead of adding an add_request vfunc. I hope this helps to untangle this Gordian knot. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with Thomas Daniel.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:02:55 +02:00
Oscar Mateo	82e104cc26	drm/i915/bdw: New logical ring submission mechanism Well, new-ish: if all this code looks familiar, that's because it's a clone of the existing submission mechanism (with some modifications here and there to adapt it to LRCs and Execlists). And why did we do this instead of reusing code, one might wonder? Well, there are some fears that the differences are big enough that they will end up breaking all platforms. Also, Execlists offer several advantages, like control over when the GPU is done with a given workload, that can help simplify the submission mechanism, no doubt. I am interested in getting Execlists to work first and foremost, but in the future this parallel submission mechanism will help us to fine tune the mechanism without affecting old gens. v2: Pass the ringbuffer only (whenever possible). Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Appease checkpatch. Again. And drop the legacy sarea gunk that somehow crept in.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 22:42:36 +02:00
Oscar Mateo	454afebde8	drm/i915/bdw: Skeleton for the new logical rings submission path Execlists are indeed a brave new world with respect to workload submission to the GPU. In previous version of these series, I have tried to impact the legacy ringbuffer submission path as little as possible (mostly, passing the context around and using the correct ringbuffer when I needed one) but Daniel is afraid (probably with a reason) that these changes and, especially, future ones, will end up breaking older gens. This commit and some others coming next will try to limit the damage by creating an alternative path for workload submission. The first step is here: laying out a new ring init/fini. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:40:57 +02:00
Oscar Mateo	ede7d42bae	drm/i915/bdw: Initialization for Logical Ring Contexts For the moment this is just a placeholder, but it shows one of the main differences between the good ol' HW contexts and the shiny new Logical Ring Contexts: LR contexts allocate and free their own backing objects. Another difference is that the allocation is deferred (as the create function name suggests), but that does not happen in this patch yet, because for the moment we are only dealing with the default context. Early in the series we had our own gen8_gem_context_init/fini functions, but the truth is they now look almost the same as the legacy hw context init/fini functions. We can always split them later if this ceases to be the case. Also, we do not fall back to legacy ringbuffers when logical ring context initialization fails (not very likely to happen and, even if it does, hw contexts would probably fail as well). v2: Daniel says "explain, do not showcase". Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: s/BUG_ON/WARN_ON/.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:04:11 +02:00
Oscar Mateo	127f100369	drm/i915/bdw: Macro for LRCs and module option for Execlists GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts". These expanded contexts enable a number of new abilities, especially "Execlists". The macro is defined to off until we have things in place to hope to work. v2: Rename "advanced contexts" to the more correct "logical ring contexts". v3: Add a module parameter to enable execlists. Execlist are relatively new, and so it'd be wise to be able to switch back to ring submission to debug subtle problems that will inevitably arise. v4: Add an intel_enable_execlists function. v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:00:27 +02:00
Oscar Mateo	b20385f1f8	drm/i915/bdw: New source and header file for LRs, LRCs and Execlists Some legacy HW context code assumptions don't make sense for this new submission method, so we will place this stuff in a separate file. Note for reviewers: I've carefully considered the best name for this file and this was my best option (other possibilities were intel_lr_context.c or intel_execlist.c). I am open to a certain bikeshedding on this matter, anyway. And some point in time, it would be a good idea to split intel_lrc.c/.h even further, but for the moment just shove everything together. v2: Change to intel_lrc.c v3: Squash together with the header file addition Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:00:07 +02:00

38 commits