1
0
Fork 0
Commit Graph

32 Commits (761cab513d5e9079ec8ace48ab05d84b0083653b)

Author SHA1 Message Date
Matthew Auld 105d663d84 drm/i915: be more solid in checking the alignment
commit 1d61c5d711 upstream.

The alignment is u64, and yet is_power_of_2() assumes unsigned long,
which might give different results between 32b and 64b kernel.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305203534.210466-1-matthew.auld@intel.com
Cc: stable@vger.kernel.org
(cherry picked from commit 2920516b2f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:17:50 +01:00
Chris Wilson df40306902 drm/i915/execlists: Lift process_csb() out of the irq-off spinlock
If we only call process_csb() from the tasklet, though we lose the
ability to bypass ksoftirqd interrupt processing on direct submission
paths, we can push it out of the irq-off spinlock.

The penalty is that we then allow schedule_out to be called concurrently
with schedule_in requiring us to handle the usage count (baked into the
pointer itself) atomically.

As we do kick the tasklets (via local_bh_enable()) after our submission,
there is a possibility there to see if we can pull the local softirq
processing back from the ksoftirqd.

v2: Store the 'switch_priority_hint' on submission, so that we can
safely check during process_csb().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816171608.11760-1-chris@chris-wilson.co.uk
2019-08-16 20:59:02 +01:00
Jani Nikula 358c855cb6 drm/i915: move printing and load error inject to i915_utils.[ch]
Seems like a better fit. Reduce clutter in i915_drv.[ch].

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/bc458fa5e62fc8dae46216666f64ed6976fafaee.1565271681.git.jani.nikula@intel.com
2019-08-09 11:51:58 +03:00
Jani Nikula e3adffe838 drm/i915: move add_taint_for_CI() to i915_utils.h
Seems like a better fit. Reduce clutter in i915_drv.h.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a82d8c4e95496b3b4a9a251b655ea56a35c9d752.1565271681.git.jani.nikula@intel.com
2019-08-09 11:42:03 +03:00
Chris Wilson 22b7a426bb drm/i915/execlists: Preempt-to-busy
When using a global seqno, we required a precise stop-the-workd event to
handle preemption and unwind the global seqno counter. To accomplish
this, we would preempt to a special out-of-band context and wait for the
machine to report that it was idle. Given an idle machine, we could very
precisely see which requests had completed and which we needed to feed
back into the run queue.

However, now that we have scrapped the global seqno, we no longer need
to precisely unwind the global counter and only track requests by their
per-context seqno. This allows us to loosely unwind inflight requests
while scheduling a preemption, with the enormous caveat that the
requests we put back on the run queue are still _inflight_ (until the
preemption request is complete). This makes request tracking much more
messy, as at any point then we can see a completed request that we
believe is not currently scheduled for execution. We also have to be
careful not to rewind RING_TAIL past RING_HEAD on preempting to the
running context, and for this we use a semaphore to prevent completion
of the request before continuing.

To accomplish this feat, we change how we track requests scheduled to
the HW. Instead of appending our requests onto a single list as we
submit, we track each submission to ELSP as its own block. Then upon
receiving the CS preemption event, we promote the pending block to the
inflight block (discarding what was previously being tracked). As normal
CS completion events arrive, we then remove stale entries from the
inflight tracker.

v2: Be a tinge paranoid and ensure we flush the write into the HWS page
for the GPU semaphore to pick in a timely fashion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
2019-06-20 16:52:36 +01:00
Chris Wilson d45a1a5334 drm/i915: Move GEM object waiting to its own file
Continuing the decluttering of i915_gem.c by moving the object wait
decomposition into its own file.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-11-chris@chris-wilson.co.uk
2019-05-28 12:45:29 +01:00
Chris Wilson 976b55f0e1 drm/i915: Allow a context to define its set of engines
Over the last few years, we have debated how to extend the user API to
support an increase in the number of engines, that may be sparse and
even be heterogeneous within a class (not all video decoders created
equal). We settled on using (class, instance) tuples to identify a
specific engine, with an API for the user to construct a map of engines
to capabilities. Into this picture, we then add a challenge of virtual
engines; one user engine that maps behind the scenes to any number of
physical engines. To keep it general, we want the user to have full
control over that mapping. To that end, we allow the user to constrain a
context to define the set of engines that it can access, order fully
controlled by the user via (class, instance). With such precise control
in context setup, we can continue to use the existing execbuf uABI of
specifying a single index; only now it doesn't automagically map onto
the engines, it uses the user defined engine map from the context.

v2: Fixup freeing of local on success of get_engines()
v3: Allow empty engines[]
v4: s/nengine/num_engines/
v5: Replace 64 limit on num_engines with a note that execbuf is
currently limited to only using the first 64 engines.
v6: Actually use the engines_mutex to guard the ctx->engines.

Testcase: igt/gem_ctx_engines
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-2-chris@chris-wilson.co.uk
2019-05-22 08:40:31 +01:00
Imre Deak 4547c255f4 drm/i915: Add support for tracking wakerefs w/o power-on guarantee
It's useful to track runtime PM refs that don't guarantee a device
power-on state to the rest of the driver. One such case is holding a
reference that will be put asynchronously, during which normal users
without their own reference shouldn't access the HW. A follow-up patch
will add support for disabling display power domains asynchronously
which needs this.

For this we can split wakeref_count into a low half-word tracking
all references (raw-wakerefs) and a high half-word tracking
references guaranteeing a power-on state (wakelocks).

Follow-up patches will make use of the API added here.

While at it add the missing docbook header for the unchecked
display-power and runtime_pm put functions.

No functional changes, except for printing leaked raw-wakerefs
and wakelocks separately in intel_runtime_pm_cleanup().

v2:
- Track raw wakerefs/wakelocks in the low/high half-word of
  wakeref_count, instead of adding a new counter. (Chris)
v3:
- Add a struct_member(T, m) helper instead of open-coding it. (Chris)
- Checkpatch indentation formatting fix.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-2-imre.deak@intel.com
2019-05-14 13:56:03 +03:00
Jani Nikula b30ed4cc2e drm/i915: move more generic utils to i915_utils.h
Reduce clutter from i915_drv.h and intel_drv.h.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/8c197872384fc35442b738c21ba0da9336e02a85.1556809195.git.jani.nikula@intel.com
2019-05-03 10:06:51 +03:00
Jani Nikula cb36330467 drm/i915: make i915_utils.h self-contained
And ensure it stays that way.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/69bcebefa6d8689d4a962394b0c6db04904354ed.1556809195.git.jani.nikula@intel.com
2019-05-03 10:06:48 +03:00
Joonas Lahtinen 0e2f54f88b Merge drm/drm-next into drm-intel-next-queued
This is needed to get the fourcc code merged without conflicts.

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-03-27 18:23:53 +02:00
Chris Wilson 9d1305ef80 drm/i915: Introduce the i915_user_extension_method
An idea for extending uABI inspired by Vulkan's extension chains.
Instead of expanding the data struct for each ioctl every time we need
to add a new feature, define an extension chain instead. As we add
optional interfaces to control the ioctl, we define a new extension
struct that can be linked into the ioctl data only when required by the
user. The key advantage being able to ignore large control structs for
optional interfaces/extensions, while being able to process them in a
consistent manner.

In comparison to other extensible ioctls, the key difference is the
use of a linked chain of extension structs vs an array of tagged
pointers. For example,

struct drm_amdgpu_cs_chunk {
        __u32           chunk_id;
        __u32           length_dw;
        __u64           chunk_data;
};

struct drm_amdgpu_cs_in {
        __u32           ctx_id;
        __u32           bo_list_handle;
        __u32           num_chunks;
        __u32           _pad;
        __u64           chunks;
};

allows userspace to pass in array of pointers to extension structs, but
must therefore keep constructing that array along side the command stream.
In dynamic situations like that, a linked list is preferred and does not
similar from extra cache line misses as the extension structs themselves
must still be loaded separate to the chunks array.

v2: Apply the tail call optimisation directly to nip the worry of stack
overflow in the bud.
v3: Defend against recursion.
v4: Fixup local types to match new uabi

Opens:
- do we include the result as an out-field in each chain?
struct i915_user_extension {
	__u64 next_extension;
	__u64 name;
	__s32 result;
	__u32 mbz; /* reserved for future use */
};
* Undecided, so provision some room for future expansion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-1-chris@chris-wilson.co.uk
2019-03-22 13:12:30 +00:00
Mel Gorman 70b44595ea mm, compaction: use free lists to quickly locate a migration source
The migration scanner is a linear scan of a zone with a potentiall large
search space.  Furthermore, many pageblocks are unusable such as those
filled with reserved pages or partially filled with pages that cannot
migrate.  These still get scanned in the common case of allocating a THP
and the cost accumulates.

The patch uses a partial search of the free lists to locate a migration
source candidate that is marked as MOVABLE when allocating a THP.  It
prefers picking a block with a larger number of free pages already on
the basis that there are fewer pages to migrate to free the entire
block.  The lowest PFN found during searches is tracked as the basis of
the start for the linear search after the first search of the free list
fails.  After the search, the free list is shuffled so that the next
search will not encounter the same page.  If the search fails then the
subsequent searches will be shorter and the linear scanner is used.

If this search fails, or if the request is for a small or
unmovable/reclaimable allocation then the linear scanner is still used.
It is somewhat pointless to use the list search in those cases.  Small
free pages must be used for the search and there is no guarantee that
movable pages are located within that block that are contiguous.

                                     5.0.0-rc1              5.0.0-rc1
                                 noboost-v3r10          findmig-v3r15
Amean     fault-both-3      3771.41 (   0.00%)     3390.40 (  10.10%)
Amean     fault-both-5      5409.05 (   0.00%)     5082.28 (   6.04%)
Amean     fault-both-7      7040.74 (   0.00%)     7012.51 (   0.40%)
Amean     fault-both-12    11887.35 (   0.00%)    11346.63 (   4.55%)
Amean     fault-both-18    16718.19 (   0.00%)    15324.19 (   8.34%)
Amean     fault-both-24    21157.19 (   0.00%)    16088.50 *  23.96%*
Amean     fault-both-30    21175.92 (   0.00%)    18723.42 *  11.58%*
Amean     fault-both-32    21339.03 (   0.00%)    18612.01 *  12.78%*

                                5.0.0-rc1              5.0.0-rc1
                            noboost-v3r10          findmig-v3r15
Percentage huge-3        86.50 (   0.00%)       89.83 (   3.85%)
Percentage huge-5        92.52 (   0.00%)       91.96 (  -0.61%)
Percentage huge-7        92.44 (   0.00%)       92.85 (   0.44%)
Percentage huge-12       92.98 (   0.00%)       92.74 (  -0.25%)
Percentage huge-18       91.70 (   0.00%)       91.71 (   0.02%)
Percentage huge-24       91.59 (   0.00%)       92.13 (   0.60%)
Percentage huge-30       90.14 (   0.00%)       93.79 (   4.04%)
Percentage huge-32       90.03 (   0.00%)       91.27 (   1.37%)

This shows an improvement in allocation latencies with similar
allocation success rates.  While not presented, there was a 31%
reduction in migration scanning and a 8% reduction on system CPU usage.
A 2-socket machine showed similar benefits.

[mgorman@techsingularity.net: several fixes]
  Link: http://lkml.kernel.org/r/20190204120111.GL9565@techsingularity.net
[vbabka@suse.cz: migrate block that was found-fast, some optimisations]
Link: http://lkml.kernel.org/r/20190118175136.31341-10-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <Vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:16 -08:00
Ville Syrjälä 70bbe53c6e drm/i915: Make sure fb gtt offsets stay within 32bits
Let's try to make sure the fb offset computations never hit
an integer overflow by making sure the entire fb stays
below 32bits. framebuffer_check() in the core already does
the same check, but as it doesn't know about tiling some things
can slip through. Repeat the check in the driver with tiling
taken into account.

v2: Use add_overflows() after massaging it to work for me (Chris)
v3: Call it add_overflow_t() to match min_t() & co. (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181023160201.9840-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2018-11-29 22:00:49 +02:00
Chris Wilson 74f6e18391 drm/i915: Convert to BITS_PER_TYPE
In commit 9144d75e22 ("include/linux/bitops.h: introduce BITS_PER_TYPE"),
we made BITS_PER_TYPE available to all and now we can use the macro to
replace some open-coded computation of sizeof(T) * BITS_PER_BYTE.

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926104707.17410-1-chris@chris-wilson.co.uk
2018-09-26 13:32:03 +01:00
Nick Desaulniers 815f0ddb34 include/linux/compiler*.h: make compiler-*.h mutually exclusive
Commit cafa0010cd ("Raise the minimum required gcc version to 4.6")
recently exposed a brittle part of the build for supporting non-gcc
compilers.

Both Clang and ICC define __GNUC__, __GNUC_MINOR__, and
__GNUC_PATCHLEVEL__ for quick compatibility with code bases that haven't
added compiler specific checks for __clang__ or __INTEL_COMPILER.

This is brittle, as they happened to get compatibility by posing as a
certain version of GCC.  This broke when upgrading the minimal version
of GCC required to build the kernel, to a version above what ICC and
Clang claim to be.

Rather than always including compiler-gcc.h then undefining or
redefining macros in compiler-intel.h or compiler-clang.h, let's
separate out the compiler specific macro definitions into mutually
exclusive headers, do more proper compiler detection, and keep shared
definitions in compiler_types.h.

Fixes: cafa0010cd ("Raise the minimum required gcc version to 4.6")
Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Suggested-by: Eli Friedman <efriedma@codeaurora.org>
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 17:31:34 -07:00
Chris Wilson b887d61546 drm/i915: Retire requests along rings
In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.

As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.

In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline, a fun challenge for the
future will be unifying the two lists to avoid duplication!

v2: We need both engine-order and ring-order processing to maintain our
knowledge of where individual rings have completed upto as well as
knowing what was last executing on any engine. And finally by decoupling
retiring the contexts on the engine and the timelines along the rings,
we do have to keep a reference to the context on each request
(previously it was guaranteed by the context being pinned).

v3: Not just a reference to the context, but we need to keep it pinned
as we manipulate the rings; i.e. we need a pin for both the manipulation
of the engine state during its retirements, and a separate pin for the
manipulation of the ring state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
2018-04-30 16:01:18 +01:00
Lucas De Marchi 57bdff48a0 drm/i915: Reword warning for missing cases
In some places we end up converting switch statements to a series of
if/else, particularly when introducing helper functions to handle a
group of cases. It's tempting to either leave a wrong warning (since now
we don't have a switch case anymore) or to convert to WARN(1, ...),
but we can just provide a better message and avoid the doubt when such
conversions arrise.

Introducing a warning inside i915_driver_load() just for tests we get:

[ 4535.233717] Missing case (ret == 0)
[ 4535.233868] WARNING: CPU: 1 PID: 795 at drivers/gpu/drm/i915/i915_drv.c:1341 i915_driver_load+0x42/0x10e0 [i915]

which is clear enough.

v2: remove __func__ since this is already on the warning.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180319173720.6974-1-lucas.demarchi@intel.com
2018-03-27 10:30:04 -07:00
Michal Wajdeczko b74eeeb6b1 drm/i915: Move some utility functions to i915_util.h
We have dedicated header file for utility functions and macros.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171221185334.17396-2-michal.wajdeczko@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171221215735.30314-1-chris@chris-wilson.co.uk
2017-12-22 09:13:47 +00:00
Tvrtko Ursulin 2d7514152b drm/i915: Warn in debug builds of incorrect usages of ptr_pack_bits
GEM_BUG_ON if the packed bits do not fit into the specified width.

v2: Avoid using the macro argument twice.
v3: Drop unnecessary braces. (Joonas)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171103090538.14474-1-tvrtko.ursulin@linux.intel.com
2017-11-03 09:28:06 +00:00
Chris Wilson 7c26240e8a drm/i915: Try harder to finish the idle-worker
If a worker requeues itself, it may switch to a different kworker pool,
which flush_work() considers as complete. To be strict, we then need to
keep flushing the work until it is no longer pending.

References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006104038.22337-1-chris@chris-wilson.co.uk
2017-10-06 17:49:46 +01:00
Michal Wajdeczko bb8920f5be drm/i915: Fix pointer-to-int conversion
Commit faf654864b ("drm/i915: Unify uC variable types to avoid
flooding checkpatch.pl") breaks 32-bit kernel builds. Lets use
cast helper to make compiler happy.

v2: introduce ptr_to_u64 (Chris)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006130844.49012-1-michal.wajdeczko@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-10-06 16:29:47 +01:00
Chris Wilson 4ff4b44cbb drm/i915: Store a direct lookup from object handle to vma
The advent of full-ppgtt lead to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizable hashtable to jump from the object to the per-ctx vma.
rhashtable was considered but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hastable on demand in a background task and
serialize it before iterating.

In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.

v2: Prettier names, more magic.
v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16 16:54:04 +01:00
Chris Wilson 6c067579e6 drm/i915: Split execlist priority queue into rbtree + linked list
All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities can not
exceed the number of active requests and should be typically only a few.

Currently, we have ~2k possible different priority levels, that may
increase to allow even more fine grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate it the appropriate plist. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.

There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain is
from improving PI calls to schedule, and the explicit list within a
level should make request unwinding simpler (we just need to insert at
the head of the list rather than the tail and not have to make the
rbtree search more complicated).

v2: Avoid use-after-free when deleting a depleted priolist

v3: Michał found the solution to handling the allocation failure
gracefully. If we disable all priority scheduling following the
allocation failure, those requests will be executed in fifo and we will
ensure that this request and its dependencies are in strict fifo (even
when it doesn't realise it is only a single list). Normal scheduling is
restored once we know the device is idle, until the next failure!
Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
2017-05-17 13:38:09 +01:00
Chris Wilson 0ce8178808 drm/i915: Redefine ptr_pack_bits() and friends
Rebrand the current (pointer | bits) pack/unpack utility macros as
explicit bit twiddling for PAGE_SIZE so that we can use the more
flexible underlying macros for different bits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-4-chris@chris-wilson.co.uk
2017-05-17 13:38:04 +01:00
Chris Wilson 991bfc64db drm/i915: Make ptr_unpack_bits() more function-like
ptr_unpack_bits() is a function-like macro, as such it is meant to be
replaceable by a function. In this case, we should be passing in the
out-param as a pointer.

Bizarrely this does affect code generation:

function                                     old     new   delta
i915_gem_object_pin_map                      409     389     -20

An improvement(?) in this case, but one can't help wonder what
strict-aliasing optimisations we are preventing.

The generated code looks identical in using ptr_unpack_bits (no extra
motions to stack, the pointer and bits appear to be kept in registers),
the difference appears to be code ordering and with a reorder it is able
to use smaller forward jumps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-3-chris@chris-wilson.co.uk
2017-05-17 13:38:03 +01:00
Michal Wajdeczko 16586fcd86 drm/i915: Move uncore definitions into a separate header
In order to allow use of e.g. forcewake_domains in a other feature headers
included from the top of i915_drv.h, move all uncore related definitions
into their own header.

v2: move __mask_next_bit macro to utils header (Mika)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-05-09 16:02:47 +03:00
Michal Wajdeczko f0d661534f drm/i915: Move WARN_ON/MISSING_CASE macros to i915_utils.h
We can't sometimes use these macros in other headers due to
include and definition order. As i915_utils.h already contains
other helper macros move these macros there.

v2: checkpatch cleanup for WARN() macro.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170328084513.174200-1-michal.wajdeczko@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-29 11:10:28 +01:00
Michal Wajdeczko 16f11f4696 drm/i915/guc: Use formalized struct definition for ads object
Manual pointer manipulation is error prone. Let compiler calculate
right offsets for us in case we need to change ads layout.

v2: don't call it object (Chris)
v3: restyle offset assignments (Chris)
v4: stylistic reductions

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170314133309.126432-1-michal.wajdeczko@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-15 15:43:43 +00:00
Chris Wilson 1692cd60d9 drm/i915: Sanity check the computed size and base of stolen memory
Just do a quick check that the stolen memory address range doesn't
overflow our chosen integer type.

v2: Add add_overflows() to utils with the promise that gcc7 can do this
better than C and then maybe it will have a proper definition in core.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170130134721.5159-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-31 11:19:23 +00:00
Chris Wilson b7163936bd drm/i915: Move a few more utility macros to i915_utils.h
Now that we have split out a header file for simple macros (that maybe
we can promote into a core header), move a few macros across from
i915_drv.h

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105164148.26875-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-05 21:02:04 +00:00
Chris Wilson 40b326eefe drm/i915: Move a few utility macros into a separate header
In order to defeat some circular dependencies between headers to allow use
of e.g. range_overflows() in a header, move the simple independent macros
into their own header.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-4-chris@chris-wilson.co.uk
2017-01-05 15:34:45 +00:00