Commit graph

542 commits

Author SHA1 Message Date
Chris Wilson 221ab9719b drm/i915/execlists: Unwind incomplete requests on resets
Given the mechanism to unwind and replay requests (designed to support
preemption), we have an alternative to the current method of
resubmitting the ELSP upon reset. Resubmitting ELSP turns out to be more
complicated than expected, due to having to handle lost context-switch
interrupts and so guessing what ELSP we need to resubmit later. Instead,
by unwinding the requests and clearing the ELSP tracking entirely, we
can then just dequeue the first pair of ready requests after resetting,
using the normal submission procedure.

Currently, the unwound requests have maximum priority and so are
guaranteed to be resubmitted upon resume. If we are lucky, we may be
able to coalesce a new request on top!

Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-4-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-09-18 11:00:12 +01:00
Chris Wilson 27606fd878 drm/i915/execlists: Split insert_request()
In the next patch we will want to reinsert a request not at the end of
the priority queue, but at the front. Here we split insert_request()
into two, the first function retrieves the priority list (for reuse for
unsubmit later) and a wrapper function to insert at the end of that list
and to schedule the tasklet if we were first.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-3-chris@chris-wilson.co.uk
2017-09-18 11:00:10 +01:00
Chris Wilson 08dd3e1acc drm/i915/execlists: Move insert_request()
Move insert_request() earlier to avoid a forward declaration in a later
patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-2-chris@chris-wilson.co.uk
2017-09-18 11:00:05 +01:00
Chris Wilson 523e7c9278 drm/i915/execlists: Kick start request processing after a reset
During a reset, we may skip over completed requests and lost
context-switch interrupts. Following the reset, we may then may end up
with no active requests in the ELSP (and so do not resubmit to restart
the engine), but have a queue of requests ready for execution. This is
unlikely, it requires the last request to complete after the hang is
detected, but not impossible. The outcome of this is that the engine
stalls, possibly leading to full ring and indefinite wait under
struct_mutex, eventually leading to a full driver hang.

Alternatively, we can solve this by unsubmitting the incomplete requests
and just kickstarting the tasklet. Michał has patches for that, which I
initially disliked due to the extra complexity, but the complexity of
this "simple" restart is growing...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-09-18 10:59:59 +01:00
Chris Wilson 27a5f61b37 drm/i915: Cancel all ready but queued requests when wedging
When wedging the hw, we want to mark all in-flight requests as -EIO.
This is made slightly more complex by execlists who store the ready but
not yet submitted-to-hw requests on a private queue (an rbtree
priolist). Call into execlists to cancel not only the ELSP tracking for
the submitted requests, but also the queue of unsubmitted requests.

v2: Move the majority of engine_set_wedged to the backends (both legacy
ringbuffer and execlists handling their own lists).

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_eio/in-flight-contexts
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-09-18 10:59:55 +01:00
Chris Wilson 767a983ab2 drm/i915/execlists: Read the context-status HEAD from the HWSP
The engine also provides a mirror of the CSB write pointer in the HWSP,
but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue off from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!

v2: Cannonlake moved the CSB write index
v3: Include the sw/hwsp state in debugfs/i915_engine_info
v4: Also revert to using CSB mmio for GVT-g
v5: Prevent the compiler reloading tail (Mika)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-09-13 17:28:46 +01:00
Chris Wilson 6d2cb5aa38 drm/i915/execlists: Read the context-status buffer from the HWSP
The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio is not enough to actually reduce any latency, but a small
reduction in overall cpu usage.

Much appreciation for Ben dropping the bombshell that the CSB was in the
HWSP and for Michel in digging out the details.

v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path
v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
memory latency. (Mika)
v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio
read and we want to stop the compiler from issuing a later (v.slow) reload.

Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913133534.26927-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-09-13 17:24:32 +01:00
Daniele Ceraolo Spurio 486e93f72a drm/i915/lrc: allocate separate page for HWSP
On gen8+ we're currently using the PPHWSP of the kernel ctx as the
global HWSP. However, when the kernel ctx gets submitted (e.g. from
__intel_autoenable_gt_powersave) the HW will use that page as both
HWSP and PPHWSP. This causes a conflict in the register arena of the
HWSP, i.e. dword indices below 0x30. We don't current utilize this arena,
but in the following patches we will take advantage of the cached
register state for handling execlist's context status interrupt.

To avoid the conflict, instead of re-using the PPHWSP of the kernel
ctx we can allocate a separate page for the HWSP like what happens for
pre-execlists platform.

v2: Add a use-case for the register arena of the HWSP.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-3-chris@chris-wilson.co.uk
2017-09-13 15:02:39 +01:00
Michel Thierry 0b29c75a01 drm/i915/lrc: Clarify the format of the context image
Not only the context image consist of two parts (the PPHWSP, and the
logical context state), but we also allocate a header at the start of
for sharing data with GuC. Thus every lrc looks like this:

  | [guc]          | [hwsp] [logical state] |
  |<- our header ->|<- context image      ->|

So far, we have oversimplified whenever we use each of these parts of the
context, just because the GuC header happens to be in page 0, and the
(PP)HWSP is in page 1. But this had led to using the same define for more
than one meaning (as a page index in the lrc and as 1 page).

This patch adds defines for the GuC shared page, the PPHWSP page and the
start of the logical state. It also updated the places where the old
define was being used. Since we are not changing the size (or format) of
the context, there are no functional changes.

v2: Use PPHWSP index for hws again.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-1-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-1-chris@chris-wilson.co.uk
2017-09-13 15:02:15 +01:00
Chris Wilson 2013ddebd2 drm/i915: Move the context descriptor to an inline helper
The context descriptor is stored inside the per-engine context state, as
we only need to compute it once and access it frequently. However,
currently only intel_lrc.c has easy access, but i915_guc_submission.c
would like to frequently read it as well, and more so only ever needs
the lower 32bits. Make it an inline as the compiler should be able to
retrieve the value in less instructions than it takes to do the function
call:

add/remove: 0/1 grow/shrink: 1/0 up/down: 8/-45 (-37)
function                                     old     new   delta
i915_guc_submit                              621     629      +8
intel_lr_context_descriptor                   45       -     -45

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170912214905.21987-1-chris@chris-wilson.co.uk
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
2017-09-13 10:31:48 +01:00
Rodrigo Vivi 90007bca61 drm/i915/cnl: Introduce initial Cannonlake Workarounds.
Let's inherit workarounds from previous platforms that
according to wa_database and BSpec are still valid for
Cannonlake.

v2: Add missed workarounds.
v3: Rebase
v4: Remove bad chunk that was added to rc6 disable. (Ander)
    Also remove A0 W/a that are not needed anymore.
v5: Rebase on top of CFL.
v6: Remove empty gen9_init_perctx_bb and gen9_init_indirectctx_bb
    since they don't carry any gen10 related W/a. (by Oscar).
    Also Remove A0 exclusive workaround.
v7: Remove more A0 exclusive workarounds. As pointed out by Oscar
    many workarounds were changed to be A0 only so let's remove
    them.

Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170815231651.975-1-rodrigo.vivi@intel.com
2017-08-18 15:32:17 -07:00
Chris Wilson 64f09f00ca drm/i915: Clear lost context-switch interrupts across reset
During a global reset, we disable the irq. As we disable the irq, the
hardware may be raising a GT interrupt that we then ignore, leaving it
pending in the GTIIR. After the reset, we then re-enable the irq,
triggering the pending interrupt. However, that interrupt was for the
stale state from before the reset, and the contents of the CSB buffer
are now invalid.

v2: Add a comment to make it clear that the double clear is purely my
paranoia.

Reported-by: "Dong, Chuanxiao" <chuanxiao.dong@intel.com>
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Dong, Chuanxiao" <chuanxiao.dong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170807121919.30165-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20170818090509.5363-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
2017-08-18 22:42:13 +01:00
Chris Wilson cdb6ded42f drm/i915: Flush the execlist ports if idle
When doing a GPU reset, the CSB register will be trashed and we will
lose any context-switch notifications that happened since the tasklet
was disabled. If we find that all requests on this engine were
completed, we want to make sure that the ELSP tracker is similarly empty
so that we do not feed back in the completed requests upon recovering
from the reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-4-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:45 +02:00
Chris Wilson 829a0af29f drm/i915: Group all the global context information together
Create a substruct to hold all the global context state under
drm_i915_private.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk
2017-06-20 17:13:40 +01:00
Robert Bragg 19f81df285 drm/i915/perf: Add OA unit support for Gen 8+
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

v5: Drain submitted requests when enabling metric set to ensure no
    lite-restore erases the context image we just updated (Lionel)

v6: In addition to drain, switch to kernel context & update all
    context in place (Chris)

v7: Add missing mutex_unlock() if switching to kernel context fails
    (Matthew)

v8: Simplify OA period/flex-eu-counters programming by using the
    batchbuffer instead of modifying ctx-image (Lionel)

v9: Back to updating the context image (due to erroneous testing,
    batchbuffer programming the OA unit doesn't actually work)
    (Lionel)
    Pin context before updating context image (Chris)
    Drop MMIO programming now that we switch to a kernel context with
    right values in initial context image (Chris)

v10: Just pin_map the contexts we want to modify or let the
     configuration happen on first use (Chris)

v11: Update kernel context OA config through the batchbuffer rather
     than on the fly ctx-image update (Lionel)

v12: Rework OA context registers update again by swithing away from
     user contexts and reconfiguring the kernel context through the
     batchbuffer and updating all the other contexts' context image.
     Also take care to lock slice/subslice configuration when OA is
     on. (Lionel)

v13: Request rpcs updates on all engine when updating the OA config
     (Lionel)

v14: Drop any kind of rpcs management now that we monitor sseu
     configuration changes in a later patch (Lionel)
     Remove usleep after programming the NOA configs on Gen8+, this
     doesn't seem to be needed (Lionel)

v15: Respect coding style for block comments (Chris)

v16: Add missing i915_add_request() in case we fail to emit OA
     configuration (Matthew)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2017-06-14 12:31:57 -07:00
Michel Thierry 7bd0a2c6e1 drm/i915/gen10: Set value of Indirect Context Offset for gen10
Indirect Context Offset Pointer has changed for Cannonlake.

INDIRECT_CTX_OFFSET[15:6] valid value for CNL is 19h per Spec.

v2: rebased to intel_lr_indirect_ctx_offset

v3: Commit message added per Tvrtko request.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496781040-20888-9-git-send-email-rodrigo.vivi@intel.com
2017-06-07 07:29:58 -07:00
Chris Wilson 349bdb68bd drm/i915/execlists: Reduce lock contention between schedule/submit_request
If we do not require to perform priority bumping, and we haven't yet
submitted the request, we can update its priority in situ and skip
acquiring the engine locks -- thus avoiding any contention between us
and submit/execute.

v2: Remove the stack element from the list if we can do the early
assignment.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-10-chris@chris-wilson.co.uk
2017-05-17 13:38:13 +01:00
Chris Wilson c5cf9a9147 drm/i915: Create a kmem_cache to allocate struct i915_priolist from
The i915_priolist are allocated within an atomic context on a path where
we wish to minimise latency. If we use a dedicated kmem_cache, we have
the advantage of a local freelist from which to service new requests
that should keep the latency impact of an allocation small. Though
currently we expect the majority of requests to be at default priority
(and so hit the preallocate priolist), once userspace starts using
priorities they are likely to use many fine grained policies improving
the utilisation of a private slab.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
2017-05-17 13:38:12 +01:00
Chris Wilson 6c067579e6 drm/i915: Split execlist priority queue into rbtree + linked list
All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities can not
exceed the number of active requests and should be typically only a few.

Currently, we have ~2k possible different priority levels, that may
increase to allow even more fine grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate it the appropriate plist. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.

There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain is
from improving PI calls to schedule, and the explicit list within a
level should make request unwinding simpler (we just need to insert at
the head of the list rather than the tail and not have to make the
rbtree search more complicated).

v2: Avoid use-after-free when deleting a depleted priolist

v3: Michał found the solution to handling the allocation failure
gracefully. If we disable all priority scheduling following the
allocation failure, those requests will be executed in fifo and we will
ensure that this request and its dependencies are in strict fifo (even
when it doesn't realise it is only a single list). Normal scheduling is
restored once we know the device is idle, until the next failure!
Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
2017-05-17 13:38:09 +01:00
Chris Wilson 77f0d0e925 drm/i915/execlists: Pack the count into the low bits of the port.request
add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function                                     old     new   delta
execlists_submit_ports                       262     471    +209
port_assign.isra                               -     136    +136
capture                                     6344    6359     +15
reset_common_ring                            438     452     +14
execlists_submit_request                     228     238     +10
gen8_init_common_ring                        334     341      +7
intel_engine_is_idle                         106     105      -1
i915_engine_info                            2314    2290     -24
__i915_gem_set_wedged_BKL                    485     411     -74
intel_lrc_irq_handler                       1789    1604    -185
execlists_update_context                     294       -    -294

The most important change there is the improve to the
intel_lrc_irq_handler and excclist_submit_ports (net improvement since
execlists_update_context is now inlined).

v2: Use the port_api() for guc as well (even though currently we do not
pack any counters in there, yet) and hide all port->request_count inside
the helpers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
2017-05-17 13:38:06 +01:00
Chuanxiao Dong 0d402a24df drm/i915: set initialised only when init_context callback is NULL
During execlist_context_deferred_alloc() we presumed that the context is
uninitialised (we only just allocated the state object for it!) and
chose to optimise away the later call to engine->init_context() if
engine->init_context were NULL. This breaks with GVT's contexts that are
marked as pre-initialised to avoid us annoyingly calling
engine->init_context(). The fix is to not override ce->initialised if it
is already true.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1494497262-24855-1-git-send-email-chuanxiao.dong@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-12 11:11:11 +01:00
Chris Wilson 266a240bf0 drm/i915: Use engine->context_pin() to report the intel_ring
Since unifying ringbuffer/execlist submission to use
engine->pin_context, we ensure that the intel_ring is available before
we start constructing the request. We can therefore move the assignment
of the request->ring to the central i915_gem_request_alloc() and not
require it in every engine->request_alloc() callback. Another small step
towards simplification (of the core, but at a cost of handling error
pointers in less important callers of engine->pin_context).

v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code
generation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk
2017-05-04 11:54:43 +01:00
Joonas Lahtinen 63ffbcdadc drm/i915: Sanitize engine context sizes
Pre-calculate engine context size based on engine class and device
generation and store it in the engine instance.

v2:
- Squash and get rid of hw_context_size (Chris)

v3:
- Move after MMIO init for probing on Gen7 and 8 (Chris)
- Retained rounding (Tvrtko)
v4:
- Rebase for deferred legacy context allocation

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-28 12:11:59 +03:00
Chris Wilson e6ba9992de drm/i915: Differentiate between sw write location into ring and last hw read
We need to keep track of the last location we ask the hw to read up to
(RING_TAIL) separately from our last write location into the ring, so
that in the event of a GPU reset we do not tell the HW to proceed into
a partially written request (which can happen if that request is waiting
for an external signal before being executed).

v2: Refactor intel_ring_reset() (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144
Testcase: igt/gem_exec_fence/await-hang
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-04-25 15:33:22 +01:00
Chris Wilson 6b764a594f drm/i915: Report request restarts for both execlists/guc
As we now share the execlist_port[] tracking for both execlists/guc, we
can reset the inflight count on both and report which requests are being
restarted.

Suggested-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170425103835.31871-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-04-25 14:02:36 +01:00
Chris Wilson 48921260b7 drm/i915/execlists: Document runtime pm for intel_lrc_irq_handler()
We indirectly hold the runtime-pm for the intel_lrc_irq_handler() by
virtue of dev_priv->gt.awake keeping a wakeref whilst the requests are
busy. As this is not obvious from the code, add a comment.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170411175850.2470-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-04-12 09:44:08 +01:00
Daniele Ceraolo Spurio ddfb570c20 drm/i915: Use the engine class to get the context size
Technically speaking, the context size is per engine class, not per
instance.

v2: Add MISSING_CASE (Tvrtko)

v3: Rebased

v4: Restore the interface back to hiding the class lookup (Chris)

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491905472-16189-1-git-send-email-oscar.mateo@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 19:44:04 +01:00
Chris Wilson d822bb18ce drm/i915: intel_ring.engine is unused
Or rather it is used only by intel_ring_pin() to extract the
drm_i915_private which we can easily pass in. As this is a relatively
rare operation, save the space in the struct, and as such it is even
break even in the extra code for passing around the parameter:

add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0)
function                                     old     new   delta
intel_init_ring_buffer                       906     918     +12
execlists_context_pin                       1308    1311      +3
mock_engine                                  407     403      -4
intel_engine_create_ring                     367     363      -4
intel_ring_pin                               326     319      -7
Total: Before=1261794, After=1261794, chg +0.00%

v2: Reorder intel_init_ring_buffer to keep the ring setup together:

add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6)
function                                     old     new   delta
intel_init_ring_buffer                       906     912      +6
execlists_context_pin                       1308    1311      +3
mock_engine                                  407     403      -4
intel_engine_create_ring                     367     363      -4
intel_ring_pin                               326     319      -7
Total: Before=1261794, After=1261788, chg -0.00%

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk
2017-04-03 13:52:58 +01:00
Chris Wilson 18afa28892 Revert "drm/i915: Skip execlists_dequeue() early if the list is empty"
This reverts commit 6c943de668 ("drm/i915: Skip execlists_dequeue()
early if the list is empty").

The validity of using READ_ONCE there depends upon having a mb to
coordinate the assignment of engine->execlist_first inside
submit_request() and checking prior to taking the spinlock in
execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a
memory barrier making this cross-cpu checking safe", but failed to
notice that this mb was *conditional* on the execlists being ready, i.e.
there wasn't the required mb when it was most necessary!

We could install an unconditional memory barrier to fixup the
READ_ONCE():

diff --git a/drivers/gpu/drm/i915/intel_lrc.c
b/drivers/gpu/drm/i915/intel_lrc.c
index 7dd732cb9f57..1ed164b16d44 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -616,6 +616,7 @@ static void execlists_submit_request(struct
drm_i915_gem_request *request)

        if (insert_request(&request->priotree, &engine->execlist_queue))
{
                engine->execlist_first = &request->priotree.node;
+               smp_wmb();
                if (execlists_elsp_ready(engine))

But we have opted to remove the race as it should be rarely effective,
and saves us having to explain the necessary memory barriers which we
quite clearly failed at.

Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Fixes: 6c943de668 ("drm/i915: Skip execlists_dequeue() early if the list is empty")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170329100052.29505-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
2017-03-29 13:02:24 +01:00
Chris Wilson a79a524e92 drm/i915: Avoid lock dropping between rescheduling
Unlocking is dangerous. In this case we combine an early update to the
out-of-queue request, because we know that it will be inserted into the
correct FIFO priority-ordered slot when it becomes ready in the future.
However, given sufficient enthusiasm, it may become ready as we are
continuing to reschedule, and so may gazump the FIFO if we have since
dropped its spinlock. The result is that it may be executed too early,
before its dependencies.

v2: Move all work into the second phase over the topological sort. This
removes the shortcut on the out-of-rbtree request to ensure that we only
adjust its priority after adjusting all of its dependencies.

Fixes: 20311bd350 ("drm/i915/scheduler: Execute requests in order of priorities")
Testcase: igt/gem_exec_whisper
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170327202143.7972-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-29 11:47:45 +01:00
Chris Wilson ed1501d451 drm/i915: Refactor tests for validity of RING_TAIL
Whilst I like having the assertions clearly visible in the code, they
are quite repetitious! As we find new limits we want to incorporate into
the set of assertions, it make sense to refactor them to a common
routine.

v2: Add a guc holdout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327131412.20293-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-27 15:03:53 +01:00
Chris Wilson a91fdf1293 drm/i915: Assert that the request->tail fits within the ring
In addition to being qword-aligned, the RING_TAIL offset must be within
the ring!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-27 15:03:21 +01:00
Chris Wilson 450362d3fe drm/i915/execlists: Wrap tail pointer after reset tweaking
If the request->wa_tail is 0 (because it landed exactly on the end of
the ringbuffer), when we reconstruct request->tail following a reset we
fill in an illegal value (-8 or 0x001ffff8). As a result, RING_HEAD is
never able to catch up with RING_TAIL and the GPU spins endlessly. If
the ring contains a couple of breadcrumbs, even our hangcheck is unable
to catch the busy-looping as the ACTHD and seqno continually advance.

v2: Move the wrap into a common intel_ring_wrap().

Fixes: a3aabe86a3 ("drm/i915/execlists: Reinitialise context image after GPU hang")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-27 15:02:56 +01:00
Chris Wilson 4af0d727b1 drm/i915/execlists: Trim irq handler
I noticed that gcc was spilling the CSB to the stack, so rearrange the
code to be more compact. Spilling in this function is slightly more
interesting due to the mmio reads acting as memory barriers and so
end up flushing the stack spills. Still miniscule to having to do at
least the pair of uncached reads :(

function                                     old     new   delta
intel_lrc_irq_handler                       1039     878    -161

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170325201053.21306-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-27 12:48:44 +01:00
Chris Wilson 2e70b8c6bf drm/i915/execlists: Relax the locked clear_bit(IRQ_EXECLIST)
We only need to care about the ordering of the clearing of the bit with
the uncached CSB read in order to correctly detect a new interrupt
before the read completes. The uncached read itself acts as a full
memory barrier, so we do not need to enforce another in the form of a
locked clear_bit.

v2: Clarify why the split and unlocked test/clear is harmless.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170323134803.10418-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-23 22:13:51 +00:00
Tvrtko Ursulin 9f7886d07f drm/i915: Spinlocks in tasklets can use spin_(un)lock_irq
The tasklets callbacks are only called from tasklet context so
it is safe do to this.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170321105511.18269-1-tvrtko.ursulin@linux.intel.com
2017-03-21 15:04:55 +00:00
Chris Wilson fe085f13c7 drm/i915: Remove intel_ring.last_retired_head
Storing the position of the breadcrumb of the last retired request as
a separate last_retired_head is superfluous as we always copy that into
head prior to recalculation of the intel_ring.space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170321102552.24357-1-chris@chris-wilson.co.uk
2017-03-21 14:21:50 +00:00
Chris Wilson 899f6204c0 drm/i915/execlists: Split the atomic test_and_clear_bit for irq handler
Rather than impose the cost of a locked test before queuing a new
request, reduce it to a simple test_bit() with a following clear_bit()
prior to doing the CSB check. This ensure that if an interrupt does
occur whilst reading from the CSB, we still detect it (the interrupt
would trigger a rescheduling of the tasklet anyway).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170321113320.2603-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-21 14:14:55 +00:00
Chris Wilson c9203e8277 drm/i915: Reset tasklet back to execlists after disabling guc
When switching back to execlists, we also now need to restore the
tasklet handler.

Reported-by: Oscar Mateo <oscar.mateo@intel.com>
Fixes: 31de73501a ("drm/i915/scheduler: emulate a scheduler for guc")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170318102859.24101-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-03-20 10:13:14 +00:00
Chris Wilson 6c943de668 drm/i915: Skip execlists_dequeue() early if the list is empty
Do an early read of the execlists' queue before we take the spinlock and
start checking. This is safe as the first writer to the execlists queue
will cause the tasklet to be run again after a memory barrier.

v2: Keep guc in sync with execlists queue changes
v3: Explain the mb between the tasklet running on one cpu and the
execlist_first update and schedule from a second cpu.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170317120716.17191-1-chris@chris-wilson.co.uk
2017-03-17 15:53:26 +00:00
Chris Wilson a533b4ba77 drm/i915: Assert that the context pin_counts do not overflow
This should be impossible, but let's assert that we do not pin a context
4 billion times before retiring!

v2: Fix the assertion -- the patch had just one job to do!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171628.3228-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-16 20:48:58 +00:00
Chris Wilson ff44ad51eb drm/i915: Move engine->submit_request selection to a vfunc
It turns out that we may want to restore the original
engine->submit_request (and engine->schedule) callbacks from more than
just the guc <-> execlists transition. Move this to a vfunc so we can
have a common interface.

v2: Move initial selection to intel_engines_init_common(), repaint vfunc
with engine->set_default_submission (and a similar colour for the
helper).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-2-chris@chris-wilson.co.uk
2017-03-16 17:17:12 +00:00
Changbin Du 3fc03069bc drm/i915: make context status notifier head be per engine
GVTg has introduced the context status notifier to schedule the GVTg
workload. At that time, the notifier is bound to GVTg context only,
so GVTg is not aware of host workloads.

Now we are going to improve GVTg's guest workload scheduler policy,
and add Guc emulation support for new Gen graphics. Both these two
features require acknowledgment for all contexts running on hardware.
(But will not alter host workload.) So here try to make some change.

The change is simple:
  1. Move the context status notifier head from i915_gem_context to
     intel_engine_cs. Which means there is a notifier head per engine
     instead of per context. Execlist driver still call notifier for
     each context sched-in/out events of current engine.
  2. At GVTg side, it binds a notifier_block for each physical engine
     at GVTg initialization period. Then GVTg can hear all context
     status events.

In this patch, GVTg do nothing for host context event, but later
will add a function there. But in any case, the notifier callback is
a noop if this is no active vGPU.

Since intel_gvt_init() is called at early initialization stage and
require the status notifier head has been initiated, I initiate it in
intel_engine_setup().

v2: remove a redundant newline. (chris)

Fixes: 3c7ba6359d ("drm/i915: Introduce execlist context status change notification")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100232
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170313024711.28591-1-changbin.du@intel.com
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-16 16:24:35 +00:00
Chris Wilson 31de73501a drm/i915/scheduler: emulate a scheduler for guc
This emulates execlists on top of the GuC in order to defer submission of
requests to the hardware. This deferral allows time for high priority
requests to gazump their way to the head of the queue, however it nerfs
the GuC by converting it back into a simple execlist (where the CPU has
to wake up after every request to feed new commands into the GuC).

v2: Drop hack status - though iirc there is still a lockdep inversion
between fence and engine->timeline->lock (which is impossible as the
nesting only occurs on different fences - hopefully just requires some
judicious lockdep annotation)
v3: Apply lockdep nesting to enabling signaling on the request, using
the pattern we already have in __i915_gem_request_submit();
v4: Replaying requests after a hang also now needs the timeline
spinlock, to disable the interrupts at least
v5: Hold wq lock for completeness, and emit a tracepoint for enabling signal
v6: Reorder interrupt checking for a happier gcc.
v7: Only signal the tasklet after a user-interrupt if using guc scheduling
v8: Restore lost update of rq through the i915_guc_irq_handler (Tvrtko)
v9: Avoid re-initialising the engine->irq_tasklet from inside a reset
v10: Hook up the execlists-style tracepoints
v11: Clear the execlists irq_posted bit after taking over the interrupt/tasklet

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316125619.6856-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-03-16 14:45:07 +00:00
Mika Kuoppala e71677698b drm/i915: Avoid using word legacy with ppgtt
The term legacy is subjective. Use 3lvl and 4lvl
where appropriate.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-4-git-send-email-mika.kuoppala@intel.com
2017-03-03 16:46:23 +02:00
Mika Kuoppala 54af56dbf8 drm/i915: Don't mark pdps clear if pdps are not submitted
Don't mark pdps clear if never do the necessary actions
with the hardware to make them clear.

v2: totally get rid of confusing ppgtt bool (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-2-git-send-email-mika.kuoppala@intel.com
2017-03-03 16:45:11 +02:00
Chris Wilson 0542524944 drm/i915: Generalise wait for execlists to be idle
The code to check for execlists completion is generic, so move it to
intel_engine_cs.c, where we can reuse the new intel_engine_is_idle().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170303121947.20482-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-03 13:08:15 +00:00
Kelvin Gardiner 69060d9644 drm/i915/bdw: Do not write the replay bit of the ring mode register
The replay bit of the ring mode register is not a valid bit for Gen8+.
Do not write to this bit.

Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
[Joonas: Fixed commit message line to be under 72 chars]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1487963724-4824-1-git-send-email-kelvin.gardiner@intel.com
2017-02-27 14:02:50 +02:00
Chris Wilson fe9ae7a3bf drm/i915/execlists: Detect an out-of-order context switch
We require that the request is completed before the context is switched
away.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223145031.26210-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-24 12:57:32 +00:00
Tvrtko Ursulin d7d96833f2 drm/i915/tracepoints: Add backend level request in and out tracepoints
Two new tracepoints placed at the call sites where requests are
actually passed to the GPU enable userspace to track engine
utilisation.

These tracepoints are only enabled when the
DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option is enabled.

v2: Fix compilation with !CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS.

v3: Name global seqno consistently across tracepoints.

v4: Remove port info from request out tracepoint. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-21 13:18:25 +00:00
Tvrtko Ursulin 56e51bf036 drm/i915: Tidy execlists_init_reg_state
Compact the name of the macro and reg_state variable, and cache
some data in local variables to make the function more compact
and more readable.

v2: Fixup some checkpatch warnings.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170221095839.30525-1-tvrtko.ursulin@linux.intel.com
2017-02-21 13:15:41 +00:00
Chris Wilson 944a36d472 drm/i915: Assert that the request->tail is always qword aligned
The hardware requires that the tail pointer only advance in qword units,
so assert that the value we write is aligned to qwords, and similarly
enforce this restriction onto the request->tail.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217163833.731-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-02-20 14:32:25 +00:00
Tvrtko Ursulin 9f235dfa49 drm/i915: Consolidate gen8_emit_pipe_control
We have a few open coded instances in the execlists code and an
almost suitable helper in intel_ringbuf.c

We can consolidate to a single helper if we change the existing
helper to emit directly to ring buffer memory and move the space
reservation outside it.

v2: Drop memcpy for memset. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170216122325.31391-2-tvrtko.ursulin@linux.intel.com
2017-02-17 11:39:59 +00:00
Tvrtko Ursulin 097d4f1c12 drm/i915: Tidy workaround batch buffer emission
Use the "*batch++ = " style as in the ring emission for better
readability and also simplify the logic a bit by consolidating
the offset and size calculations and overflow checking. The
latter is a programming error so it is not required to check
for it after each write to the object, but rather do it once the
whole state has been written and fail the driver if something
went wrong.

v2: Rebase.

v3: Keep track of offsets and sizes in bytes for simplicity
    and rename function pointer variable to _fn suffix.
    (Chris Wilson)

v4: Fix size calc broken in v3 and add alignment warning. (Chris Wilson)

v5: Fix return code.

v6: I added an exit from loop in v5 but forgot to put back
    the object teardown.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-17 11:39:59 +00:00
Tvrtko Ursulin 4ac9659ef9 drm/i915: Remove duplicate intel_logical_ring_workarounds_emit
intel_ring_workarounds_emit is exactly the same code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170214150017.16058-1-tvrtko.ursulin@linux.intel.com
2017-02-15 08:16:41 +00:00
Tvrtko Ursulin 73dec95e6b drm/i915: Emit to ringbuffer directly
This removes the usage of intel_ring_emit in favour of
directly writing to the ring buffer.

intel_ring_emit was preventing the compiler for optimising
fetch and increment of the current ring buffer pointer and
therefore generating very verbose code for every write.

It had no useful purpose since all ringbuffer operations
are started and ended with intel_ring_begin and
intel_ring_advance respectively, with no bail out in the
middle possible, so it is fine to increment the tail in
intel_ring_begin and let the code manage the pointer
itself.

Useless instruction removal amounts to approximately
two and half kilobytes of saved text on my build.

Not sure if this has any measurable performance
implications but executing a ton of useless instructions
on fast paths cannot be good.

v2:
 * Change return from intel_ring_begin to error pointer by
   popular demand.
 * Move tail increment to intel_ring_advance to enable some
   error checking.

v3:
 * Move tail advance back into intel_ring_begin.
 * Rebase and tidy.

v4:
 * Complete rebase after a few months since v3.

v5:
 * Remove unecessary cast and fix !debug compile. (Chris Wilson)

v6:
 * Make intel_ring_offset take request as well.
 * Fix recording of request postfix plus a sprinkle of asserts.
   (Chris Wilson)

v7:
 * Use intel_ring_offset to get the postfix. (Chris Wilson)
 * Convert GVT code as well.

v8:
 * Rename *out++ to *cs++.

v9:
 * Fix GVT out to cs conversion in GVT.

v10:
 * Rebase for new intel_ring_begin in selftests.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com
2017-02-14 14:30:46 +00:00
Chris Wilson ae9a043b0c drm/i915: Rename conditional GEM execution macros
After a brief discussion, we settled on a naming convention for the
conditional GEM debugging data that should be clearer to the casual
user: GEM_DEBUG

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207102319.10910-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-10 21:43:43 +00:00
Chris Wilson 72b72ae473 drm/i915: Always pin contexts into the high GGTT
Now that we have fast top-down insertion into the drm_mm, we can use it
for frequent runtime operations like insertion of the context object,
whereas before we limited it to the one-off insertion of the pinned
kernel context. Keeping the active context objects out of the mappable
region of the global GTT (except under memory pressure) improves our
ability to allocate mappable aperture region without triggering a GPU
stall.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210101422.1598-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-10 13:58:40 +00:00
Chris Wilson 949e8ab3a9 drm/i915: Use the size/type of address space to make decisions
Once the address space has been created (using 3 or 4 levels of page
tables), we should use that to program the appropriate type into the
contexts. This gives us the flexibility to handle different types of
address spaces at runtime.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170209144036.23664-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-09 17:09:27 +00:00
Chris Wilson c0dcb203fb drm/i915: Restore context and pd for ringbuffer submission after reset
Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).

v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-07 21:34:58 +00:00
Chris Wilson 2ffe80aa44 drm/i915: Avoid unguarded reads from the request pointer
In commit 86aa7e760a ("drm/i915: Assert that the context-switch
completion matches our context") I added a read to the irq tasklet
handler that compared the on-chip status with that of our sw tracking,
using an unguarded read of the request pointer to get the context and
beyond. Whilst we hold a reference to the request, we do not hold
anything on the context and if we are unlucky it may be reaped from a
second thread retiring the request (since it may retire the request as
soon as the breadcrumb is complete, even before we finish processing the
context switch) as we try to read from the context pointer.

Avoid the racy read from underneath the request by storing the expected
result in the execlist_port[].

v2: Include commentary about port[].request being unprotected.

Fixes: 86aa7e760a ("drm/i915: Assert that the context-switch completion matches our context")
Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Testcase: igt/gem_ctx_create
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170206170502.30944-2-chris@chris-wilson.co.uk
2017-02-06 20:41:01 +00:00
Zhi Wang 04da811b3d drm/i915: Let execlist_update_context() cover !FULL_PPGTT mode.
execlist_update_context() will try to update PDPs in a context before a
ELSP submission only for full PPGTT mode, while PDPs was populated during
context initialization. Now the latter code path is removed. Let
execlist_update_context() also cover !FULL_PPGTT mode.

Fixes: 34869776c7 ("drm/i915: check ppgtt validity when init reg state")
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhiyuan Lv <zhiyuan.lv@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1486377436-15380-1-git-send-email-zhi.a.wang@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-06 15:14:14 +00:00
Chris Wilson 22cc440eae drm/i915: Print execlists restart after reset
After resetting, show the requests that each engine restarts from in the
debug log.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170204110519.7645-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-06 11:48:16 +00:00
Chris Wilson 453cfe2171 drm/i915/execlists: Add interrupt-pending check to intel_execlists_idle()
Primarily this serves as a sanity check that the bit has been cleared
before we suspend (and hasn't reappeared after resume).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170201131222.11882-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-01 15:23:06 +00:00
Chris Wilson e2de845e0b drm/i915/execlists: Skip resetting RING_CONTEXT_STATUS_PTR
As we now flag when the GPU signals a context-switch and do not read the
status register before we see that signal, we do not have to ensure that
it is cleared upon reset (and can leave it to the GPU to reset it from
the power context).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170201125338.12932-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-01 15:22:38 +00:00
Mika Kuoppala 2355cf088d drm/i915: Create context desc template when context is created
Move the invariant parts of context desc setup from execlist init
to context creation. This is advantageous when we need to
create different templates based on the context parametrization,
ie. for svm capable contexts.

v2: s/create/default, remove engine->ctx_desc_template

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485522189-31984-1-git-send-email-mika.kuoppala@intel.com
2017-01-30 16:36:20 +02:00
Ander Conselvan de Oliveira 9fb5026f85 drm/i915/glk: Turn on workarounds that apply to Geminilake too
Apply workarounds to Geminilake, and annotate those that are applied
unconditionally when they apply to GLK based on the workaround database.

v2: Fix commit message typos. (David)
v3: Rebase.
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485422218-9102-1-git-send-email-ander.conselvan.de.oliveira@intel.com
2017-01-30 09:49:07 +02:00
Chris Wilson 48ea2554f4 drm/i915: Dequeue execlists on a new request if any port is available
If the second ELSP port is available, schedule the execlists tasklet to
see if the new request can occupy it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-7-chris@chris-wilson.co.uk
2017-01-24 16:00:25 +00:00
Chris Wilson 3833281adb drm/i915: Only attempt to pass the first request to execlists
Only the first request added to the execlist queue can be submitted. If
this request is not the first request on the queue, it means that there
are already higher priority requests waiting upon the tasklet and
kicking it will make no difference.

This is more relevant for a later patch, where we more eagerly try and
kick the tasklet to handle the submission of new requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-6-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-01-24 16:00:13 +00:00
Chris Wilson a37951ac9f drm/i915: Skip the execlists CSB scan and rewrite if the ring is untouched
If the CSB head/tail pointers are unchanged, we can skip the update of
the CSB register afterwards.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-5-chris@chris-wilson.co.uk
2017-01-24 15:56:18 +00:00
Chris Wilson f747026c2b drm/i915: Only run execlist context-switch handler after an interrupt
Mark when we run the execlist tasklet following the interrupt, so we
don't probe a potentially uninitialised register when submitting the
contexts multiple times before the hardware responds.

v2: Use a shared engine->irq_posted
v3: Always use locked bitops to be sure of atomicity wrt to other bits
in the mask.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124152021.26587-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-01-24 15:56:01 +00:00
Chris Wilson 816ee798ec drm/i915: Only disable execlist preemption for the duration of the request
We need to prevent resubmission of the context immediately following an
initial resubmit (which does a lite-restore preemption). Currently we do
this by disabling all submission whilst the context is still active, but
we can improve this by limiting the restriction to only until we
receive notification from the context-switch interrupt that the
lite-restore preemption is complete.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-2-chris@chris-wilson.co.uk
2017-01-24 15:55:16 +00:00
Chris Wilson c816e605ff drm/i915: Assert that we don't submit to execlists whilst a preempt is pending
To complement the check in execlists_elsp_ready(), also assert that we
don't submit the same context while it has a lite restore still pending.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-1-chris@chris-wilson.co.uk
2017-01-24 15:55:05 +00:00
Chris Wilson b403c8feaf drm/i915: Remove BXT TDL state w/a
This w/a (WaClearTdlStateAckDirtyBits) was only used for preproduction hw,
which is no longer in use. Remove the workaround to simplify the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-23 15:53:02 +00:00
Chris Wilson 68bee61573 drm/i915: Remove BXT disable pixel mask clamping w/a
This w/a (WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken) was only
used for preproduction hw, which is no longer in use. Remove the
workaround to simplify the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-23 15:53:01 +00:00
Chris Wilson 32ebc29211 drm/i915: Remove BXT restore arbitration around ctx switch
This w/a (WaDisableCtxRestoreArbitration) was only used for preproduction
hw, which is no longer in use. Remove the workaround to simplify the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-23 15:53:01 +00:00
Chris Wilson f8dd2934c4 drm/i915: Remove BXT incoherent seqno write workaround
This w/a was only used for preproduction hw, which is no longer in use.
Remove the workaround to simplify the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-23 15:53:00 +00:00
Chris Wilson 70962fbe5c drm/i915: Remove disable_lite_restore_wa
This w/a (WaEnableForceRestoreInCtxtDescForVCS) was only used for
preproduction hw, which is no longer in use.  Remove the workaround to
simplify the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170123130601.2281-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-23 15:52:42 +00:00
Chris Wilson 86aa7e760a drm/i915: Assert that the context-switch completion matches our context
When execlists signals the context completion, it also provides the
context id for the status event. Assert that id matches the one we expect.

v2: The upper dword of the context status is a duplicate of the upper
dword from elsp submission (i.e. includes the group id as well as the
context id). Include this check as well.
v3: Only check against lrc_desc (as this contains the hw_id check)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170123113132.18665-2-chris@chris-wilson.co.uk
2017-01-23 12:04:46 +00:00
Chris Wilson a01cb37aff drm/i915: Remove i915_vma_create from VMA API
With the introduce of i915_vma_instance() for obtaining the VMA
singleton for a (obj, vm, view) tuple, we can remove the
i915_vma_create() in favour of a single entry point. We do incur a
lookup onto an empty tree, but the i915_vma_create() were being called
infrequently and during initialisation, so the small overhead is
negligible.

v2: Drop the i915_ prefix from the now static vma_create() function

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-4-chris@chris-wilson.co.uk
2017-01-19 10:17:39 +00:00
Chris Wilson 7c3f86b6dc drm/i915: Invalidate the guc ggtt TLB upon insertion
Move the GuC invalidation of its ggtt TLB to where we perform the ggtt
modification rather than proliferate it into all the callers of the
insert (which may or may not in fact have to do the insertion).

v2: Just do the guc invalidate unconditionally, (afaict) it has no impact
without the guc loaded on gen8+
v3: Conditionally invalidate the guc - just in case that register has
not been validated for other modes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170112110050.25333-1-chris@chris-wilson.co.uk
2017-01-12 16:47:04 +00:00
Francisco Jerez 8726f2faa3 drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
The WaDisableLSQCROPERFforOCL workaround has the side effect of
disabling an L3SQ optimization that has huge performance implications
and is unlikely to be necessary for the correct functioning of usual
graphic workloads.  Userspace is free to re-enable the workaround on
demand, and is generally in a better position to determine whether the
workaround is necessary than the DRM is (e.g. only during the
execution of compute kernels that rely on both L3 fences and HDC R/W
requests).

The same workaround seems to apply to BDW (at least to production
stepping G1) and SKL as well (the internal workaround database claims
that it does for all steppings, while the BSpec workaround table only
mentions pre-production steppings), but the DRM doesn't do anything
beyond whitelisting the L3SQCREG4 register so userspace can enable it
when it sees fit.  Do the same on KBL platforms.

Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
This is followed by a regression of 35% and 10% respectively for the
same benchmarks and platform caused by my recent patch series
switching userspace to use the dataport constant cache instead of the
sampler to implement uniform pull constant loads, which caused us to
hit more heavily the L3 cache (and on platforms other than KBL had the
opposite effect of improving performance of the same two benchmarks).
The overall effect on KBL of this change combined with the recent
userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
was affected by the constant cache changes (though it improved as it
did on other platforms rather than regressing), but is not
significantly affected by this patch (with statistical significance of
5% and sample size 20).

v2: Drop some more code to avoid unused variable warning.

Fixes: 738fa1b312 ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: beignet@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[Removed double Fixes tag]
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
2017-01-12 15:48:08 +02:00
Zhenyu Wang 34869776c7 drm/i915: check ppgtt validity when init reg state
Check if ppgtt is valid for context when init reg state. For gvt
context which has no i915 allocated ppgtt, failed to check that
would cause kernel null ptr reference error.

v2: remove !48bit ppgtt case as we'll always update before submit (Chris)

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109131453.3943-1-zhenyuw@linux.intel.com
2017-01-12 08:39:41 +00:00
Chris Wilson f51455d442 drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.

v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10 20:54:32 +00:00
Chris Wilson 984ff29f74 drm/i915: Simplify testing for am-I-the-kernel-context?
The kernel context (dev_priv->kernel_context) is unique in that it is
not associated with any user filp - it is the only one with
ctx->file_priv == NULL. This is a simpler test than comparing it against
dev_priv->kernel_context which involves some pointer dancing.

In checking that this is true, we notice that the gvt context is
allocating itself a i915_hw_ppgtt it doesn't use and not flagging that
its file_priv should be invalid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk
2017-01-06 16:02:12 +00:00
Chris Wilson f3b8f9126a drm/i915/execlists: Reorder execlists register enabling
Empirically we restart following a GPU reset more successfully if we call
lrc_init_hws() (which contains a posting read) last. (The failure mode
that was observed was that breadcrumb writes into the HWS from the
recovered requests went astray leading to the context-switch maintaining
forward progress, but the requests not being retired/completed.)

For clarity, lrc_init_hws() is inlined (and the unused function then
removed).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-3-chris@chris-wilson.co.uk
2017-01-05 15:34:42 +00:00
Chris Wilson 56f6e0a7e7 drm/i915: Assert that we do create the deferred context
In order to convince static analyzers that the allocation function
returns an error or sets ce->state, assert that it is set afterwards.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-2-chris@chris-wilson.co.uk
2017-01-05 15:34:41 +00:00
Chris Wilson 6095868a27 drm/i915: Complete kerneldoc for struct i915_gem_context
The existing kerneldoc was outdated, so time for a refresh.

v2: Use single line kdoc, mention functions for manipulation

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161231112012.29263-3-chris@chris-wilson.co.uk
2016-12-31 11:41:47 +00:00
Daniele Ceraolo Spurio feef2a7cb9 drm/i915: re-use computed offset bias for context pin
The context has to obey the same offset requirements as the ring,
so we can re-use the same bias value we computed for the ring instead of
unconditionally using GUC_WOPCM_TOP.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1482537382-28584-2-git-send-email-daniele.ceraolospurio@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-24 10:08:45 +00:00
Daniele Ceraolo Spurio d3ef1af6fd drm/i915: request ring to be pinned above GUC_WOPCM_TOP
GuC will validate the ring offset and fail if it is in the
[0, GUC_WOPCM_TOP) range. The bias is conditionally applied only
if GuC loading is enabled (we can't check for guc submission enabled as
in other cases because HuC loading requires this fix).

Note that the default context is processed before enable_guc_loading is
sanitized, so we might still apply the bias to its ring even if it is
not needed.

v2: compute the value during ctx init and pass it to
    intel_ring_pin (Chris), updated commit message

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1482537382-28584-1-git-send-email-daniele.ceraolospurio@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-24 10:06:59 +00:00
Chris Wilson f73e73999d drm/i915: Swap if(enable_execlists) in i915_gem_request_alloc for a vfunc
A fairly trivial move of a matching pair of routines (for preparing a
request for construction) onto an engine vfunc. The ulterior motive is
to be able to create a mock request implementation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-7-chris@chris-wilson.co.uk
2016-12-18 16:18:56 +00:00
Chris Wilson 2947e4080f drm/i915/execlists: Request the kernel context be pinned high
PIN_HIGH is an expensive operation (in comparison to allocating from the
hole stack) unsuitable for frequent use (such as switching between
contexts). However, the kernel context should be pinned just once for
the lifetime of the driver, and here it is appropriate to keep it out of
the mappable range (in order to maximise mappable space for users).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-6-chris@chris-wilson.co.uk
2016-12-18 16:18:55 +00:00
Chris Wilson e8a9c58fcd drm/i915: Unify active context tracking between legacy/execlists/guc
The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.

And finally an ulterior motive for unifying context handling was to
prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
2016-12-18 16:18:50 +00:00
Chris Wilson ef11c01db4 drm/i915: Move intel_lrc_context_pin() to avoid the forward declaration
Just a simple move to avoid a forward declaration, though the diff likes
to present itself as a move of intel_logical_ring_alloc_request_extras()
in the opposite direction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-2-chris@chris-wilson.co.uk
2016-12-18 16:18:49 +00:00
Tvrtko Ursulin d038fc7e4f drm/i915: Fix use after free in logical_render_ring_init
Commit 3b3f1650b1 ("drm/i915: Allocate intel_engine_cs
structure only for the enabled engines") introduced the
dynanically allocated engine instances and created an
potential use after free scenario in logical_render_ring_init
where lrc_destroy_wa_ctx_obj could be called after the engine
instance has been freed.

This can only happen during engine setup/init error handling
which luckily does not happen ever in practice.

Fix is to not call lrc_destroy_wa_ctx_obj since it would have
already been executed from the preceding engine cleanup.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 3b3f1650b1 ("drm/i915: Allocate intel_engine_cs structure only for the enabled engines")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1481894322-2145-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-12-16 14:30:00 +00:00
Chris Wilson 0798cff46b drm/i915/execlists: Use list_safe_reset_next() instead of opencoding
list.h provides a macro for updating the next element in a safe
list-iter, so let's use it so that it is hopefully clearer to the reader
about the unusual behaviour, and also easier to grep.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-6-chris@chris-wilson.co.uk
2016-12-05 20:49:17 +00:00
Tvrtko Ursulin 12d79d7828 drm/i915: Make GEM object create and create from data take dev_priv
Makes all GEM object constructors consistent.

v2: Fix compilation in GVT code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
2016-12-01 18:01:08 +00:00
Chris Wilson 70cd14761d Revert "drm/i915/execlists: Use a local lock for dfs_link access"
This reverts commit 27745e829a ("drm/i915/execlists: Use a local lock
for dfs_link access") as the struct_mutex was required to prevent
concurrent retiring and freeing, now restored in the previous patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161128143649.4289-2-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-11-29 09:17:23 +00:00
Min He d7ab992c68 drm/i915: fix the dequeue logic for single_port_submission context
For a single_port_submission context, GVT expects that it can only be
submitted to port 0, and there shouldn't be any other context in port 1
at the same time. This is required by GVT-g context to have an opportunity
to save/restore some non-hw context render registers.

This patch is to workaround GVT-g.

v2: optimized code by following Chris's advice, and added more comments to
explain the patch.
v3: followed the coding style.

Signed-off-by: Min He <min.he@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1479305104-17049-1-git-send-email-min.he@intel.com
2016-11-17 07:56:58 +00:00
Chris Wilson 27745e829a drm/i915/execlists: Use a local lock for dfs_link access
Avoid requiring struct_mutex for exclusive access to the temporary
dfs_link inside the i915_dependency as not all callers may want to touch
struct_mutex. So rather than force them to take a highly contended
lock, introduce a local lock for the execlists schedule operation.

Reported-by: David Weinehall <david.weinehall@linux.intel.com>
Fixes: 9a151987d7 ("drm/i915: Add execution priority boosting for mmioflips")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161116152721.11053-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-11-16 21:12:39 +00:00