1
0
Fork 0
Commit Graph

288 Commits (a0442461f2caa406ce77261214ec3217fe3cac22)

Author SHA1 Message Date
Tvrtko Ursulin e39d42fa7e drm/i915: Stop tracking execlists retired requests
With the previous patch having extended the pinned lifetime of
contexts by referencing the previous context from the current
request until the latter is retired (completed by the GPU),
we can now remove usage of execlist retired queue entirely.

This is because the above now guarantees that all execlist
object access requirements are satisfied by this new tracking,
and we can stop taking additional references and stop keeping
request on the execlists retired queue.

The latter was a source of significant scalability issues in
the driver causing performance hits on some tests. Most
dramatical of which was igt/gem_close_race which had run time
in tens of minutes which is now reduced to tens of seconds.

Signed-off-by: Tvrtko Ursulin <tvrtko@ursulin.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-24-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Tvrtko Ursulin a3d127616e drm/i915: Store LRC hardware id in the request
This way in the following patch we can disconnect requests
from contexts.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-23-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson a16a405259 drm/i915: Track the previous pinned context inside the request
As the contexts are accessed by the hardware until the switch is completed
to a new context, the hardware may still be writing to the context object
after the breadcrumb is visible. We must not unpin/unbind/prune that
object whilst still active and so we keep the previous context pinned until
the following request. We can generalise the tracking we already do via
the engine->last_context and move it to the request so that it works
equally for execlists and GuC.

v2: Drop the execlists double pin as that exposes a race inside the lrc
irq handler as it tries to access the context after it may be retired.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-22-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 978f1e09ab drm/i915: Move the magical deferred context allocation into the request
We can hide more details of execlists from higher level code by removing
the explicit call to create an execlist context from execbuffer and
into its first use by execlists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-20-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 24f1d3cc09 drm/i915: Refactor execlists default context pinning
Refactor pinning and unpinning of contexts, such that the default
context for an engine is pinned during initialisation and unpinned
during teardown (pinning of the context handles the reference counting).
Thus we can eliminate the special case handling of the default context
that was required to mask that it was not being pinned normally.

v2: Rebalance context_queue after rebasing.
v3: Rebase to -nightly (not 40 patches in)
v4: Rebase onto request_alloc unwinding

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-19-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 7069b14489 drm/i915: Replace the pinned context address with its unique ID
Rather than reuse the current location of the context in the global GTT
for its hardware identifier, use the context's unique ID assigned to it
for its whole lifetime.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-18-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson ef87bba828 drm/i915: Update execlists context descriptor format commentary
The comments describing the Context Descriptor Format are off by a bit
for the size of the context ID.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-16-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 6310346e75 drm/i915: Preallocate enough space for the average request
Rather than being interrupted when we run out of space halfway through
the request, and having to restart from the beginning (and returning to
userspace), flush a little more free space when we prepare the request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-15-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson bfa0120073 drm/i915: Manually unwind after a failed request allocation
In the next patches, we want to move the work out of freeing the request
and into its retirement (so that we can free the request without
requiring the struct_mutex). This means that we cannot rely on
unreferencing the request to completely teardown the request any more
and so we need to manually unwind the failed allocation. In doing so, we
reorder the allocation in order to make the unwind simple (and ensure
that we don't try to unwind a partial request that may have modified
global state) and so we end up pushing the initial preallocation down
into the engine request initialisation functions where we have the
requisite control over the state of the request.

Moving the initial preallocation into the engine is less than ideal: it
moves logic to handle a specific problem with request handling out of
the common code. On the other hand, it does allow those backends
significantly more flexibility in performing its allocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-14-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 0251a96322 drm/i915: Remove the identical implementations of request space reservation
Now that we share intel_ring_begin(), reserving space for the tail of
the request is identical between legacy/execlists and so the tautology
can be removed. In the process, we move the reserved space tracking
from the ringbuffer on to the request. This is to enable us to reorder
the reserved space allocation in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-13-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 987046ad65 drm/i915: Unify intel_ring_begin()
Combine the near identical implementations of intel_logical_ring_begin()
and intel_ring_begin() - the only difference is that the logical wait
has to check for a matching ring (which is assumed by legacy).

In the process some debug messages are culled as there were following a
WARN if we hit an actual error.

v2: Updated commentary

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-12-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson fe3db79b0b drm/i915: Propagate error from drm_gem_object_init()
Propagate the real error from drm_gem_object_init(). Note this also
fixes some confusion in the error return from i915_gem_alloc_object...

v2:
(Matthew Auld)
  - updated new users of gem_alloc_object from latest drm-nightly
  - replaced occurrences of IS_ERR_OR_NULL() with IS_ERR()
v3:
(Joonas Lahtinen)
  - fix double "From:" in commit message
  - add goto teardown path
v4:
(Matthew Auld)
  - rebase with i915_gem_alloc_object name change

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461587533-8841-1-git-send-email-matthew.auld@intel.com
[Joonas: Removed spurious " = NULL" from _init() function]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-04-28 12:28:58 +03:00
Dave Gordon d37cd8a887 drm/i915: rename i915_gem_alloc_object() to i915_gem_object_create()
Because having both i915_gem_object_alloc() and i915_gem_alloc_object()
(with different return conventions) is just too confusing!

(i915_gem_object_alloc() is the low-level memory allocator, and remains
unchanged, whereas i915_gem_alloc_object() is a constructor that ALSO
initialises the newly-allocated object.)

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461348872-4702-1-git-send-email-david.s.gordon@intel.com
2016-04-25 12:31:34 +01:00
Tvrtko Ursulin 791bee125b drm/i915: Remove a couple pointless WARN_ONs
Just two WARN_ONs followed by pointer dereference I spotted by accident.

v2: Remove some more of the same.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461080770-14693-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-04-20 09:59:17 +01:00
Peter Antoine 0ccdacf694 drm/i915/mocs: Program MOCS for all engines on init
Allow for the MOCS to be programmed for all engines.
Currently we program the MOCS when the first render batch
goes through. This works on most platforms but fails on
platforms that do not run a render batch early,
i.e. headless servers. The patch now programs all initialised engines
on init and the RCS is programmed again within the initial batch. This
is done for predictable consistency with regards to the hardware
context.

Hardware context loading sets the values of the MOCS for RCS
and L3CC. Programming them from within the batch makes sure that
the render context is valid, no matter what the previous state of
the saved-context was.

v2: posted correct version to the mailing list.
v3: moved programming to within engine->init_hw() (Chris Wilson)
v4: code formatting and white-space changes. (Chris Wilson)

Testcase: igt/gem_mocs_settings
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460556205-6644-1-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson aa9b78104f drm/i915: Late request cancellations are harmful
Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.

The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. Then we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.

All that remains of i915_gem_request_cancel() users are just a couple of
extremely unlikely allocation failures, so remove the API entirely.

A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-16-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson f4457ae71f drm/i915: Prevent leaking of -EIO from i915_wait_request()
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.

If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.

Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.

v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson 299259a3a9 drm/i915: Store the reset counter when constructing a request
As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.

PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that just because the epoch changes the
request is completed may be broken - or we just avoid breaking that
assumption with the fine-grained resets).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson c19ae989b0 drm/i915: Hide the atomic_read(reset_counter) behind a helper
This is principally a little bit of syntatic sugar to hide the
atomic_read()s throughout the code to retrieve the current reset_counter.
It also provides the other utility functions to check the reset state on the
already read reset_counter, so that (in later patches) we can read it once
and do multiple tests rather than risk the value changing between tests.

v2: Be more strict on converting existing i915_reset_in_progress() over to
the more verbose i915_reset_in_progress_or_wedged().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-4-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Michał Winiarski ce81a65c79 drm/i915: Adjust size of PIPE_CONTROL used for gen8 render seqno write
We started to use PIPE_CONTROL to write render ring seqno in order to
combat seqno write vs interrupt generation problems. This was introduced
by commit 7c17d37737 ("drm/i915: Use ordered seqno write interrupt
generation on gen8+ execlists").

On gen8+ size of PIPE_CONTROL with Post Sync Operation should be
6 dwords. When we're using older 5-dword variant it's possible to
observe inconsistent values written by PIPE_CONTROL with Post
Sync Operation from user batches, resulting in rendering corruptions.

v2: Fix BAT failures
v3: Comments on alignment and thrashing high dword of seqno (Chris)
v4: Updated commit msg (Mika)

Testcase: igt/gem_pipe_control_store_loop/*-qword-write
Issue: VIZ-7393
Cc: stable@vger.kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460469115-26002-1-git-send-email-michal.winiarski@intel.com
2016-04-13 15:34:51 +03:00
Tvrtko Ursulin 7d774cacf0 drm/i915: Use new i915_gem_object_pin_map for LRC
We can use the new pin/lazy unpin API for simplicity
and more performance in the execlist submission paths.

v2:
  * Fix error handling and convert more users.
  * Compact some names for readability.

v3:
  * intel_lr_context_free was not unpinning.
  * Special case for GPU reset which otherwise unbalances
    the HWS object pages pin count by running the engine
    initialization only (not destructors).

v4:
  * Rebased on top of hws setup/init split.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460472042-1998-1-git-send-email-tvrtko.ursulin@linux.intel.com
[tursulin: renames: s/hwd/hws/, s/obj_addr/vaddr/]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-13 10:47:54 +01:00
Tvrtko Ursulin 04794adbdd drm/i915: Split execlists hardware status page initialisation from setup
Split the hardware status page into setup and initialisation,
where setup means setting up the driver state to support the
engine, and initialization means programming the hardware
with the before set up state.

This way the design matches the design of the engine setup/init
code which is split in the same fashion and it enables the
stages to be used in a balanced fashion (engine setup - hws
setup, engine init - hws init).

This will enable the upcoming improvements to slot in without
any kludges on the GPU reset path.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-13 10:44:32 +01:00
Tvrtko Ursulin 3756685a18 drm/i915: Only grab correct forcewake for the engine with execlists
Rather than blindly waking up all forcewake domains on command
submission, we can teach each engine what is (or are) the correct
one to take.

On platforms with multiple forcewake domains like VLV, CHV, SKL
and BXT, this has the potential of lowering the GPU and CPU
power use and submission latency.

To implement it we add a function named
intel_uncore_forcewake_for_reg whose purpose is to query which
forcewake domains need to be taken to read or write a specific
register with raw mmio accessors.

These enables the execlists engine setup  to query which
forcewake domains are relevant per engine on the currently
running platform.

v2:
  * Kerneldoc.
  * Split from intel_uncore.c macro extraction, WARN_ON,
    no warns on old platforms. (Chris Wilson)

v3:
  * Single domain per engine, mention all registers,
    bi-directional function and a new name, fix handling
    of gen6 and gen7 writes. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460468251-14069-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-04-12 15:35:22 +01:00
Tim Gore b1e429fe3b drm/i915: implement WaClearTdlStateAckDirtyBits
This is to fix a GPU hang seen with mid thread pre-emption
and pooled EUs.

v2. Use IS_BXT_REVID instead of IS_BROXTON and INTEL_REVID

v3. And use correct type for register addresses

Signed-off-by: Tim Gore <tim.gore@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458571049-854-1-git-send-email-tim.gore@intel.com
2016-04-11 11:59:43 +01:00
Chris Wilson c04e0f3b4e drm/i915: Separate out the seqno-barrier from engine->get_seqno
In order to simplify future patches, extract the
lazy_coherency optimisation our of the engine->get_seqno() vfunc into
its own callback.

v2: Rename the barrier to engine->irq_seqno_barrier to try and better
reflect that the barrier is only required after the user interrupt before
reading the seqno (to ensure that the seqno update lands in time as we
do not have strict seqno-irq ordering on all platforms).

Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [#v2]

v3: Comments for hangcheck paranoia. Mika wanted to keep the extra
barrier inside the hangcheck, just in case. I can argue that it doesn't
provide a barrier against anything, but the side-effects of applying the
barrier may prevent a false declaration of a hung GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-2-git-send-email-chris@chris-wilson.co.uk
2016-04-09 12:09:05 +01:00
Akash Goel 782f6bc0ab drm/i915: Fixup the free space logic in ring_prepare
Currently for the case where there is enough space at the end of Ring
buffer for accommodating only the base request, the wrapround is done
immediately and as a result the base request gets added at the start
of Ring buffer. But there may not be enough free space at the beginning
to accommodate the base request, as before the wraparound, the wait was
effectively done for the reserved_size free space from the start of
Ring buffer. In such a case there is a potential of Ring buffer overflow,
the instructions at the head of Ring (ACTHD) can get overwritten.

Since the base request can fit in the remaining space, there is no need
to wraparound immediately. The wraparound will anyway happen later when
the reserved part starts getting used.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1457688402-10411-1-git-send-email-akash.goel@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
2016-04-09 11:37:48 +01:00
Tvrtko Ursulin 27af5eea54 drm/i915: Move execlists irq handler to a bottom half
Doing a lot of work in the interrupt handler introduces huge
latencies to the system as a whole.

Most dramatic effect can be seen by running an all engine
stress test like igt/gem_exec_nop/all where, when the kernel
config is lean enough, the whole system can be brought into
multi-second periods of complete non-interactivty. That can
look for example like this:

 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
 Modules linked in: [redacted for brevity]
 CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
 Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
 Workqueue: i915 gen6_pm_rps_work [i915]
 task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
 RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
 RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
 RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
 RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
 RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
 R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
 R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
 FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Stack:
  042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
  0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
  0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
 Call Trace:
  <IRQ>
  [<ffffffff8104a716>] irq_exit+0x86/0x90
  [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
  [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
  <EOI>
  [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
  [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
  [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
  [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
  [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
  [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
  [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
  [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
  [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
  [<ffffffff8105ab29>] process_one_work+0x139/0x350
  [<ffffffff8105b186>] worker_thread+0x126/0x490
  [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
  [<ffffffff8105fa64>] kthread+0xc4/0xe0
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
  [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170

I could not explain, or find a code path, which would explain
a +20 second lockup, but from some instrumentation it was
apparent the interrupts off proportion of time was between
10-25% under heavy load which is quite bad.

When a interrupt "cliff" is reached, which was >~320k irq/s on
my machine, the whole system goes into a terrible state of the
above described multi-second lockups.

By moving the GT interrupt handling to a tasklet in a most
simple way, the problem above disappears completely.

Testing the effect on sytem-wide latencies using
igt/gem_syslatency shows the following before this patch:

gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us

This shows that the unrelated process is experiencing huge
delays in its wake-up latency. After the patch the results
look like this:

gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
gem_syslatency: cycles=841683, latency mean=56.914us max=1667us

Showing a huge improvement in the unrelated process wake-up
latency. It also shows an approximate halving in the number
of total empty batches submitted during the test. This may
not be worrying since the test puts the driver under
a very unrealistic load with ncpu threads doing empty batch
submission to all GPU engines each.

Another benefit compared to the hard-irq handling is that now
work on all engines can be dispatched in parallel since we can
have up to number of CPUs active tasklets. (While previously
a single hard-irq would serially dispatch on one engine after
another.)

More interesting scenario with regards to throughput is
"gem_latency -n 100" which  shows 25% better throughput and
CPU usage, and 14% better dispatch latencies.

I did not find any gains or regressions with Synmark2 or
GLbench under light testing. More benchmarking is certainly
required.

v2:
   * execlists_lock should be taken as spin_lock_bh when
     queuing work from userspace now. (Chris Wilson)
   * uncore.lock must be taken with spin_lock_irq when
     submitting requests since that now runs from either
     softirq or process context.

v3:
   * Expanded commit message with more testing data;
   * converted missed locking sites to _bh;
   * added execlist_lock comment. (Chris Wilson)

v4:
   * Mention dispatch parallelism in commit. (Chris Wilson)
   * Do not hold uncore.lock over MMIO reads since the block
     is already serialised per-engine via the tasklet itself.
     (Chris Wilson)
   * intel_lrc_irq_handler should be static. (Chris Wilson)
   * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
   * Document and WARN that tasklet cannot be active/pending
     on engine cleanup. (Chris Wilson/Imre Deak)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Testcase: igt/gem_exec_nop/all
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-04-04 14:08:52 +01:00
Dave Gordon b4ac5afc6b drm/i915: replace for_each_engine()
Having provided for_each_engine_id() for cases where the third (id)
argument is useful, we can now replace all the remaining instances with
a simpler version that takes only two parameters. In many cases, this
also allows the elimination of the local variable used in the iterator
(usually 'i').

v2:
    s/dev_priv/(dev_priv__)/ in body of for_each_engine_masked() [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458757194-17783-2-git-send-email-david.s.gordon@intel.com
2016-03-24 14:34:11 +00:00
Tomas Elf fc0768ceac drm/i915/tdr: Initialize hangcheck struct for each engine
Initialize hangcheck struct during driver load. Since we do the same after
recovering from a reset, this is extracted into a helper function.

v2: remove redundant hangcheck init during load as this is done when
engines are initialized (Chris)

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458577619-12006-1-git-send-email-arun.siluvery@linux.intel.com
2016-03-22 13:52:42 +02:00
Tvrtko Ursulin 26720ab97f drm/i915: Move CSB MMIO reads out of the execlists lock
By reading the CSB (slow MMIO accesses) into a temporary local
buffer we can decrease the duration of holding the execlist
lock.

Main advantage is that during heavy batch buffer submission we
reduce the execlist lock contention, which should decrease the
latency and CPU usage between the submitting userspace process
and interrupt handling.

Downside is that we need to grab and relase the forcewake twice,
but as the below numbers will show this is completely hidden
by the primary gains.

Testing with "gem_latency -n 100" (submit batch buffers with a
hundred nops each) shows more than doubling of the throughput
and more than halving of the dispatch latency, overall latency
and CPU time spend in the submitting process.

Submitting empty batches ("gem_latency -n 0") does not seem
significantly affected by this change with throughput and CPU
time improving by half a percent, and overall latency worsening
by the same amount.

Above tests were done in a hundred runs on a big core Broadwell.

v2:
  * Overflow protection to local CSB buffer.
  * Use closer dev_priv in execlists_submit_requests. (Chris Wilson)

v3: Rebase.

v4: Added commend about irq needed to be disabled in
    execlists_submit_request. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilsno <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458219586-20452-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-18 10:25:56 +00:00
Tvrtko Ursulin 39dabecd99 drm/i915: Use shorter route to dev_private where possible
Where we have a request we can use req->i915 directly instead
of going through the engine and device. Coccinelle script:

@@
function f;
identifier r;
@@
f(..., struct drm_i915_gem_request *r, ...)
{
...
- engine->dev->dev_private
+ r->i915
...
}
@@
struct drm_i915_gem_request *req;
@@
(
  req->
- engine->dev->dev_private
+ i915
)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458219850-21007-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-18 09:50:37 +00:00
Tvrtko Ursulin 117897f42c drm/i915: More renaming of rings to engines
This time using only sed and a few by hand.

v2: Rename also intel_ring_id and intel_ring_initialized.
v3: Fixed typo in intel_ring_initialized.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458126040-33105-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-16 15:33:30 +00:00
Tvrtko Ursulin 666796da7a drm/i915: More intel_engine_cs renaming
Some trivial ones, first pass done with Coccinelle:

@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)

But that didn't fully work so I cleaned it up with:

for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:24 +00:00
Tvrtko Ursulin 4a570db57c drm/i915: Rename intel_engine_cs struct members
below and a couple manual fixups.

@@
identifier I, J;
@@
struct I {
...
- struct intel_engine_cs *J;
+ struct intel_engine_cs *engine;
...
}
@@
identifier I, J;
@@
struct I {
...
- struct intel_engine_cs J;
+ struct intel_engine_cs engine;
...
}
@@
struct drm_i915_private *d;
@@
(
- d->ring
+ d->engine
)
@@
struct i915_execbuffer_params *p;
@@
(
- p->ring
+ p->engine
)
@@
struct intel_ringbuffer *r;
@@
(
- r->ring
+ r->engine
)
@@
struct drm_i915_gem_request *req;
@@
(
- req->ring
+ req->engine
)

v2: Script missed the tracepoint code - fixed up by hand.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:17 +00:00
Tvrtko Ursulin 0bc40be85f drm/i915: Rename intel_engine_cs function parameters
@@
identifier func;
@@
func(..., struct intel_engine_cs *
- ring
+ engine
, ...)
{
<...
- ring
+ engine
...>
}
@@
identifier func;
type T;
@@
T func(..., struct intel_engine_cs *
- ring
+ engine
, ...);

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:10 +00:00
Tvrtko Ursulin e2f8039147 drm/i915: Rename local struct intel_engine_cs variables
Done by the Coccinelle script below plus a manual
intervention to GEN8_RING_SEMAPHORE_INIT.

@@
expression E;
@@
- struct intel_engine_cs *ring = E;
+ struct intel_engine_cs *engine = E;
<+...
- ring
+ engine
...+>
@@
@@
- struct intel_engine_cs *ring;
+ struct intel_engine_cs *engine;
<+...
- ring
+ engine
...+>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:00 +00:00
Tvrtko Ursulin 8de1b23efa drm/i915/lrc: Do not wait atomically when stopping engines
I do not see that this needs to be done atomically and up to
one second is quite a long time to busy loop.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-03 17:26:51 +00:00
Tvrtko Ursulin c6a2ac712d drm/i915: Execlists small cleanups and micro-optimisations
Assorted changes in the areas of code cleanup, reduction of
invariant conditional in the interrupt handler and lock
contention and MMIO access optimisation.

 * Remove needless initialization.
 * Improve cache locality by reorganizing code and/or using
   branch hints to keep unexpected or error conditions out
   of line.
 * Favor busy submit path vs. empty queue.
 * Less branching in hot-paths.

v2:

 * Avoid mmio reads when possible. (Chris Wilson)
 * Use natural integer size for csb indices.
 * Remove useless return value from execlists_update_context.
 * Extract 32-bit ppgtt PDPs update so it is out of line and
   shared with two callers.
 * Grab forcewake across all mmio operations to ease the
   load on uncore lock and use chepear mmio ops.

v3:

 * Removed some more pointless u8 data types.
 * Removed unused return from execlists_context_queue.
 * Commit message updates.

v4:
 * Unclumsify the unqueue if statement. (Chris Wilson)
 * Hide forcewake from the queuing function. (Chris Wilson)

Version 3 now makes the irq handling code path ~20% smaller on
48-bit PPGTT hardware, and a little bit less elsewhere. Hot
paths are mostly in-line now and hammering on the uncore
spinlock is greatly reduced together with mmio traffic to an
extent.

Benchmarking with "gem_latency -n 100" (keep submitting
batches with 100 nop instruction) shows approximately 4% higher
throughput, 2% less CPU time and 22% smaller latencies. This was
on a big-core while small-cores could benefit even more.

Most likely reason for the improvements are the MMIO
optimization and uncore lock traffic reduction.

One odd result is with "gem_latency -n 0" (dispatching empty
batches) which shows 5% more throughput, 8% less CPU time,
25% better producer and consumer latencies, but 15% higher
dispatch latency which is yet unexplained.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-01 10:36:02 +00:00
Chris Wilson 2743179d95 drm/i915: Execlists cannot pin a context without the object
Given that the intel_lr_context_pin cannot succeed without the object,
we cannot reach intel_lr_context_unpin() without first allocating that
object - so we can remove the redundant test.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456485751-15213-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-02-26 13:16:36 +00:00
Michel Thierry 99cf8ea165 drm/i915/lrc: Only set RS ctx enable in ctx control reg if there is a RS
The driver should only set the "RS context enable" bit in the context
image if we plan to use the resource streamer.

Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456393738-35608-1-git-send-email-michel.thierry@intel.com
2016-02-26 11:30:28 +00:00
Michel Thierry 715629190e drm/i915/gen9: Set value of Indirect Context Offset based on gen version
The cache line offset for the Indirect CS context (0x21C8) varies from gen
to gen.

v2: Move it into a function (Arun), use MISSING_CASE (Chris)
v3: Rebased (catched by ci bat)

Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456223509-6454-1-git-send-email-michel.thierry@intel.com
2016-02-26 11:30:17 +00:00
Tvrtko Ursulin f4e2deceb6 drm/i915: Fix premature LRC unpin in GuC mode
In GuC mode LRC pinning lifetime depends exclusively on the
request liftime. Since that is terminated by the seqno update
that opens up a race condition between GPU finishing writing
out the context image and the driver unpinning the LRC.

To extend the LRC lifetime we will employ a similar approach
to what legacy ringbuffer submission does.

We will start tracking the last submitted context per engine
and keep it pinned until it is replaced by another one.

Note that the driver unload path is a bit fragile and could
benefit greatly from efforts to unify the legacy and exec
list submission code paths.

At the moment i915_gem_context_fini has special casing for the
two which are potentialy not needed, and also depends on
i915_gem_cleanup_ringbuffer running before itself.

v2:
 * Move pinning into engine->emit_request and actually fix
   the reference/unreference logic. (Chris Wilson)

 * ring->dev can be NULL on driver unload so use a different
   route towards it.

v3:
 * Rebase.
 * Handle the reset path. (Chris Wilson)
 * Exclude default context from the pinning - it is impossible
   to get it right before default context special casing in
   general is eliminated.

v4:
 * Rebased & moved context tracking to
   intel_logical_ring_advance_and_submit.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Issue: VIZ-4277
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453976997-25424-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-01-28 17:23:15 +00:00
Tvrtko Ursulin 321fe304f1 drm/i915: Make LRC pinning own a reference to the context
Will simplify the following fix and sounds logical.

v2: Add some whitespace to separate logic better. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
2016-01-28 17:23:15 +00:00
Tvrtko Ursulin e5292823c1 drm/i915: Make LRC (un)pinning work on context and engine
Previously intel_lr_context_(un)pin were operating on requests
which is in conflict with their names.

If we make them take a context and an engine, it makes the names
make more sense and it also makes future fixes possible.

v2: Rebase for default_context/kernel_context change.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
2016-01-28 17:23:15 +00:00
Alex Dai 397097b026 drm/i915/guc: Decouple GuC engine id from ring id
Previously GuC uses ring id as engine id because of same definition.
But this is not true since this commit:

commit de1add3605
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Fri Jan 15 15:12:50 2016 +0000

    drm/i915: Decouple execbuf uAPI from internal implementation

Added GuC engine id into GuC interface to decouple it from ring id used
by driver.

v2: Keep ring name print out in debugfs; using for_each_ring() where
    possible to keep driver consistent. (Chris W.)

Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453579094-29860-1-git-send-email-yu.dai@intel.com
2016-01-25 10:56:30 +00:00
Tvrtko Ursulin 77b04a0428 drm/i915: More use of the cached LRC state
Since:

commit 82352e908a
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Fri Jan 15 17:12:45 2016 +0000

    drm/i915: Cache LRC state page in the context

and:

commit 0eb973d31d
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Fri Jan 15 15:10:28 2016 +0000

    drm/i915: Cache ringbuffer GTT VMA

We can also remove the ring buffer start updates on every
context update since the address will not change for the
duration of the LRC pin.

For GuC we can remove the update altogether because it
only cares about the ring buffer start.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453466567-33369-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-01-25 10:55:19 +00:00
Chris Wilson 426960bed3 drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids
Tvrtko was looking through the execbuffer-ioctl and noticed that the
uABI was tightly coupled to our internal engine identifiers. Close
inspection also revealed that we leak those internal engine identifiers
through the busy-ioctl, and those internal identifiers already do not
match the user identifiers. Fortuitiously, there is only one user of the
set of busy rings from the busy-ioctl, and they only wish to choose
between the RENDER and the BLT engines.

Let's fix the userspace ABI while we still can.

v2: Update the uAPI documentation to explain the identifiers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_busy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452876706-21620-1-git-send-email-chris@chris-wilson.co.uk
2016-01-21 11:00:35 +00:00
Chris Wilson 7c17d37737 drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists
Broadwell and later currently use the same unordered command sequence to
update the seqno in the HWS status page and then assert the user
interrupt. We should apply the w/a from legacy (where we do an mmio
read to delay the seqno read after the interrupt), but this is not
enough to enforce coherent seqno visibilty on Skylake. Rather than
search for the proper post-interrupt seqno barrier, use a strongly
ordered command sequence to write the seqno, then assert the user
interrupt from the ring.

v2: Move around the wa tail dwords to avoid adding duplicate code.

v3: Add references, comments on workarounds and bit5 check.

References: https://bugs.freedesktop.org/show_bug.cgi?id=93693
Testcase: igt/gem_ring_sync_loop #skl
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453297415-17793-1-git-send-email-mika.kuoppala@intel.com
2016-01-21 11:53:09 +02:00
Dave Gordon e28e404c3e drm/i915: tidy up a few leftovers
There are a few bits of code which the transformations implemented by
the previous patch reveal to be suboptimal, once the notion of a per-
ring default context has gone away. So this tidies up the leftovers.

It could have been squashed into the previous patch, but that would have
made that patch less clearly a simple transformation. In particular, any
change which alters the code block structure or indentation has been
deferred into this separate patch, because such things tend to make
diffs more difficult to read.

v4:	Rebased

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-4-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-21 09:21:29 +01:00
Dave Gordon ed54c1a1d1 drm/i915: abolish separate per-ring default_context pointers
Now that we've eliminated a lot of uses of ring->default_context,
we can eliminate the pointer itself.

All the engines share the same default intel_context, so we can just
keep a single reference to it in the dev_priv structure rather than one
in each of the engine[] elements. This make refcounting more sensible
too, as we now have a refcount of one for the one pointer, rather than
a refcount of one but multiple pointers.

From an idea by Chris Wilson.

v2:	transform an extra instance of ring->default_context introduced by
    42f1cae8c drm/i915: Restore inhibiting the load of the default context
    That patch's commentary includes:
	v2: Mark the global default context as uninitialized on GPU reset so
	    that the context-local workarounds are reloaded upon re-enabling
    The code implementing that now also benefits from the replacement of
    the multiple (per-ring) pointers to the default context with a single
    pointer to the unique kernel context.

v4:	Rebased, remove underused local (Nick Hoath)

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-21 09:21:29 +01:00