Commit graph

181 commits

Author SHA1 Message Date
Michel Thierry 2dba3239f5 drm/i915/gen8: Add 4 level switching infrastructure and lrc support
In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address to PML4, while the other PDP registers are ignored.

In LRC, the addressing mode must be specified in every context
descriptor, and the base address to PML4 is stored in the reg state.

v2: PML4 update in legacy context switch is left for historic reasons,
the preferred mode of operation is with lrc context based submission.
v3: s/gen8_map_page_directory/gen8_setup_page_directory and
s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
Also, clflush will be needed for bxt. (Akash)
v4: Squashed lrc-specific code and use a macro to set PML4 register.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
PDP update in bb_start is only for legacy 32b mode.
v6: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v7: There is no need to update the pml4 register value in
execlists_update_context. (Akash)
v8: Move pd and pdp setup functions to a previous patch, they do not
belong here. (Akash)
v9: Check USES_FULL_48BIT_PPGTT instead of GEN8_CTX_ADDRESSING_MODE in
gen8_emit_bb_start to check if emit pdps is needed. (Akash)

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-08-14 18:16:21 +02:00
Mika Kuoppala 031a8936dc drm/i915: Check idle to active before processing CSQ
If idle to active bit is set, the rest of the fields
in CSQ are not valid.

Bail out early if this is the case in order to prevent
rest of the loop inspecting stale values.

This was found by Bspec/code inspection. Doesn't seem to fix any of
the known issues.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
[danvet: Add note about how this was found.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-08-14 17:50:40 +02:00
Mika Kuoppala cc53699b25 drm/i915: Use masked write for Context Status Buffer Pointer
This register needs to be updated with masked writes.

This was found by code inspection and comparison with Bspec and
doesn't seem to fix any known issue.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
[danvet: Add note about impact.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-08-14 17:50:38 +02:00
Arun Siluvery 84e81020ee drm/i915: Add provision to extend Golden context batch
The Golden batch carries 3D state at the beginning so that HW starts with
a known state. It is carried as a binary blob which is auto-generated from
source. The idea was it would be easier to maintain and keep the complexity
out of the kernel which makes sense as we don't really touch it. However if
you really need to update it then you need to update generator source and
keep the binary blob in sync with it.

There is a need to patch this in bxt to send one additional command to enable
a feature. A solution was to patch the binary data with some additional
data structures (included as part of auto-generator source) but it was
unnecessarily complicated.

Chris suggested the idea of having a secondary batch and execute two batch
buffers. It has clear advantages as we needn't touch the base golden batch,
can customize secondary/auxiliary batch depending on Gen and can be carried
in the driver with no dependencies.

This patch adds support for this auxiliary batch which is inserted at the
end of golden batch and is completely independent from it. Thanks to Mika
for the preliminary review.

v2: Strictly conform to the batch size requirements to cover Gen2 and
add comments to clarify overflow check in macro (Chris, Mika).

v3: aux_batch_offset was declared as u64, change it to u32 (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Armin Reese <armin.c.reese@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-21 09:30:57 +02:00
Daniel Vetter ca6e440577 Merge tag 'drm-intel-fixes-2015-07-15' into drm-intel-next-queued
Backmerge fixes since it's getting out of hand again with the massive
split due to atomic between -next and 4.2-rc. All the bugfixes in
4.2-rc are addressed already (by converting more towards atomic
instead of minimal duct-tape) so just always pick the version in next
for the conflicts in modeset code.

All the other conflicts are just adjacent lines changed.

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem_gtt.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_drv.h
	drivers/gpu/drm/i915/intel_ringbuffer.h

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-07-15 16:36:50 +02:00
Arun Siluvery 9b01435d28 drm/i915/gen9: Add WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken
In Indirect context w/a batch buffer,
+WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken

v2: SKL revision id was used for BXT, copy paste error found during
internal review (Bob Beckett).

v3: explain why part of the WA is in Per ctx batch (Mika)

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-15 14:30:18 +02:00
Arun Siluvery a4106a782d drm/i915/gen9: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
In Indirect context w/a batch buffer,
+WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt

v2: address static checker warning where unsigned value was checked for
less than zero which is never true (Dan Carpenter).

v3: The WA uses default value of GEN8_L3SQCREG4 during flush but that disables
some other WA; update default value to retain it and document dependency (Mika).

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-15 14:30:12 +02:00
Arun Siluvery 0907c8f7e0 drm/i915/gen9: Add WaDisableCtxRestoreArbitration workaround
In Indirect and Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration

v2: SKL revision id was used for BXT, copy paste error found during
internal review (Bob Beckett).

v3: use updated macro.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Robert Beckett <robert.beckett@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-15 14:30:07 +02:00
Arun Siluvery 0504cffc7b drm/i915: Enable WA batch buffers for Gen9
This patch only enables support for Gen9, the actual WA will be
initialized in subsequent patches.

The WARN that we use to warn user if WA batch support is not available
for a particular Gen is replaced with DRM_ERROR as warning here doesn't
really add much value.

v2: include all infrastructure bits in this patch so that subsequent
changes only correspond the WA added (Chris)

v3: use updated macro.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-15 14:30:02 +02:00
Peter Antoine 3bbaba0cea drm/i915: Added Programming of the MOCS
This change adds the programming of the MOCS registers to the gen 9+
platforms. The set of MOCS configuration entries introduced by this
patch is intended to be minimal but sufficient to cover the needs of
current userspace - i.e. a good set of defaults. It is expected to be
extended in the future to provide further default values or to allow
userspace to redefine its private MOCS tables based on its demand for
additional caching configurations. In this setup, userspace should
only utilize the first N entries, higher entries are reserved for
future use.

It creates a fixed register set that is programmed across the different
engines so that all engines have the same table. This is done as the
main RCS context only holds the registers for itself and the shared
L3 values. By trying to keep the registers consistent across the
different engines it should make the programming for the registers
consistent.

v2:
-'static const' for private data structures and style changes.(Matt Turner)
v3:
- Make the tables "slightly" more readable. (Damien Lespiau)
- Updated tables fix performance regression.
v4:
- Code formatting. (Chris Wilson)
- re-privatised mocs code. (Daniel Vetter)
v5:
- Changed the name of a function. (Chris Wilson)
v6:
- re-based
- Added Mesa table entry (skylake & broxton) (Francisco Jerez)
- Tidied up the readability defines (Francisco Jerez)
- NUMBER of entries defines wrong. (Jim Bish)
- Added comments to clear up the meaning of the tables (Jim Bish)

Signed-off-by: Peter Antoine <peter.antoine@intel.com>

v7 (Francisco Jerez):
- Don't write L3-specific MOCS_ESC/SCC values into the e/LLC control
  tables.  Prefix L3-specific defines consistently with L3_ and
  e/LLC-specific defines with LE_ to avoid this kind of confusion in
  the future.
- Change L3CC WT define back to RESERVED (matches my hardware
  documentation and the original patch, probably a misunderstanding
  of my own previous comment).
- Drop Android tables, define new minimal tables more suitable for the
  open source stack.
- Add comment that the MOCS tables are part of the kernel ABI.
- Move intel_logical_ring_begin() and _advance() calls one level down
  (Chris Wilson).
- Minor formatting and style fixes.
v8 (Francisco Jerez):
- Add table size sanity check to emit_mocs_control/l3cc_table() (Chris
  Wilson).
- Add comment about undefined entries being implicitly set to uncached
  for forwards compatibility.
v9 (Francisco Jerez):
- Minor style fixes.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-14 17:13:14 +02:00
Arun Siluvery 83b8a982b1 drm/i915: Update wa_ctx_emit() macro as per kernel coding guidelines
wa_ctx_emit() depends on the name of a local variable; if the name of that
variable is changed then we get compile errors. In this case it is unlikely
to be changed as this macro is only used in this set of functions but
Kernel coding guidelines doesn't recommend doing this. It was my mistake
as I should have corrected it at the beginning but missed so correct
this before there are more usages of this macro (Bob Beckett).

https://www.kernel.org/doc/Documentation/CodingStyle,
Chapter 12, "Things to avoid when using macros", point 2):

"
2) macros that depend on having a local variable with a magic name:

   #define FOO(val) bar(index, val)

might look like a good thing, but it's confusing as hell when one reads the
code and it's prone to breakage from seemingly innocent changes.
"

v2: Optimization to avoid multiple evaluation of 'index' in the macro.
Since we invoke it multiple times, compiler, if it can, should be able to coalesce
them into a single condition and remove multiple WARN_ON checks (Chris).

Suggested-by: Robert Beckett <robert.beckett@intel.com>
Cc: Robert Beckett <robert.beckett@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-08 17:24:14 +02:00
Mika Kuoppala 1cff8cc35b drm/i915: Mark elsps submitted when they are pushed to hw
Now when we have requests this deep on call chain, we can mark
the elsp being submitted when it actually is. Remove temp variable
and readjust commenting to more closely fit to the code.

v2: Avoid tmp variable and reduce number of writes (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 16:51:43 +02:00
Mika Kuoppala 8ee36152cf drm/i915: Convert execlists_ctx_descriptor() for requests
Pass around requests to carry context deeper in callchain.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 16:51:33 +02:00
Mika Kuoppala cc3c42532c drm/i915: Convert execlists_elsp_writ() for requests
Pass around requests to carry context deeper in callchain.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 16:51:26 +02:00
Mika Kuoppala 8ba319da89 drm/i915: Convert intel_lr_context_pin() for requests
Pass around requests to carry context deeper in callchain.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 16:47:41 +02:00
Mika Kuoppala f3cc01f094 drm/i915: Assign request ringbuf before pin
In preparation to make intel_lr_context_pin|unpin to accept
requests, assign ringbuf into request before we call the pinning.

v2: No need to unset ringbuf on error path (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 16:46:07 +02:00
Mika Kuoppala 05d9824bfb drm/i915: Convert execlists_update_context() for requests
Pass around requests to carry context deeper in callchain.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 16:45:12 +02:00
Mika Kuoppala d8cb8875ac drm/i915: Convert execlist_submit_contexts() for requests
Pass around requests to carry context deeper in callchain.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 16:44:34 +02:00
Arun Siluvery 9e00084750 drm/i915: Update WaFlushCoherentL3CacheLinesAtContextSwitch
In this WA we need to set GEN8_L3SQCREG4[21:21] and reset it after PIPE_CONTROL
instruction but there is a slight complication as this is applied in WA batch
where the values are only initialized once.
Dave identified an issue with the current implementation where the register value
is read once at the beginning and it is reused; this patch corrects this by saving
the register value to memory, update register with the bit of our interest and
restore it back with original value.

This implementation uses MI_LOAD_REGISTER_MEM which is currently only used
by command parser and was using a default length of 0. This is now updated
with correct length and moved to appropriate place.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 14:37:39 +02:00
Abdiel Janulgue 6922528a04 drm/i915: Enable resource streamer on Execlists
GEN8 and above uses Execlists by default instead of the legacy
ringbuffer for batch execution. This patch enables the resource
streamer bits when required.

Patch is based on the initial work by Minu Mathai <minu.mathai@intel.com>
This version also adds the required bits to enable GEN8 Resource
Streamer context save and restore for Execlists.

Cc: ville.syrjala@linux.intel.com
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 10:26:13 +02:00
John Harrison 79bbcc299f drm/i915: Reserve space improvements
An earlier patch was added to reserve space in the ring buffer for the
commands issued during 'add_request()'. The initial version was
pessimistic in the way it handled buffer wrapping and would cause
premature wraps and thus waste ring space.

This patch updates the code to better handle the wrap case. It no
longer enforces that the space being asked for and the reserved space
are a single contiguous block. Instead, it allows the reserve to be on
the far end of a wrap operation. It still guarantees that the space is
available so when the wrap occurs, no wait will happen. Thus the wrap
cannot fail which is the whole point of the exercise.

Also fixed a merge failure with some comments from the original patch.

v2: Incorporated suggestion by David Gordon to move the wrap code
inside the prepare function and thus allow a single combined
wait_for_space() call rather than doing one before the wrap and
another after. This also makes the prepare code much simpler and
easier to follow.

v3: Fix for 'effective_size' vs 'size' during ring buffer remainder
calculations (spotted by Tomas Elf).

For: VIZ-5115
CC: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-03 07:38:59 +02:00
Linus Torvalds 099bfbfc7f Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is the main drm pull request for v4.2.

  I've one other new driver from freescale on my radar, it's been posted
  and reviewed, I'd just like to get someone to give it a last look, so
  maybe I'll send it or maybe I'll leave it.

  There is no major nouveau changes in here, Ben was working on
  something big, and we agreed it was a bit late, there wasn't anything
  else he considered urgent to merge.

  There might be another msm pull for some bits that are waiting on
  arm-soc, I'll see how we time it.

  This touches some "of" stuff, acks are in place except for the fixes
  to the build in various configs,t hat I just applied.

  Summary:

  New drivers:
      - virtio-gpu:
                KMS only pieces of driver for virtio-gpu in qemu.
                This is just the first part of this driver, enough to run
                unaccelerated userspace on. As qemu merges more we'll start
                adding the 3D features for the virgl 3d work.
      - amdgpu:
                a new driver from AMD to driver their newer GPUs. (VI+)
                It contains a new cleaner userspace API, and is a clean
                break from radeon moving forward, that AMD are going to
                concentrate on. It also contains a set of register headers
                auto generated from AMD internal database.

  core:
      - atomic modesetting API completed, enabled by default now.
      - Add support for mode_id blob to atomic ioctl to complete interface.
      - bunch of Displayport MST fixes
      - lots of misc fixes.

  panel:
      - new simple panels
      - fix some long-standing build issues with bridge drivers

  radeon:
      - VCE1 support
      - add a GPU reset counter for userspace
      - lots of fixes.

  amdkfd:
      - H/W debugger support module
      - static user-mode queues
      - support killing all the waves when a process terminates
      - use standard DECLARE_BITMAP

  i915:
      - Add Broxton support
      - S3, rotation support for Skylake
      - RPS booting tuning
      - CPT modeset sequence fixes
      - ns2501 dither support
      - enable cmd parser on haswell
      - cdclk handling fixes
      - gen8 dynamic pte allocation
      - lots of atomic conversion work

  exynos:
      - Add atomic modesetting support
      - Add iommu support
      - Consolidate drm driver initialization
      - and MIC, DECON and MIPI-DSI support for exynos5433

  omapdrm:
      - atomic modesetting support (fixes lots of things in rewrite)

  tegra:
      - DP aux transaction fixes
      - iommu support fix

  msm:
      - adreno a306 support
      - various dsi bits
      - various 64-bit fixes
      - NV12MT support

  rcar-du:
      - atomic and misc fixes

  sti:
      - fix HDMI timing complaince

  tilcdc:
      - use drm component API to access tda998x driver
      - fix module unloading

  qxl:
      - stability fixes"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (872 commits)
  drm/nouveau: Pause between setting gpu to D3hot and cutting the power
  drm/dp/mst: close deadlock in connector destruction.
  drm: Always enable atomic API
  drm/vgem: Set unique to "vgem"
  of: fix a build error to of_graph_get_endpoint_by_regs function
  drm/dp/mst: take lock around looking up the branch device on hpd irq
  drm/dp/mst: make sure mst_primary mstb is valid in work function
  of: add EXPORT_SYMBOL for of_graph_get_endpoint_by_regs
  ARM: dts: rename the clock of MIPI DSI 'pll_clk' to 'sclk_mipi'
  drm/atomic: Don't set crtc_state->enable manually
  drm/exynos: dsi: do not set TE GPIO direction by input
  drm/exynos: dsi: add support for MIC driver as a bridge
  drm/exynos: dsi: add support for Exynos5433
  drm/exynos: dsi: make use of array for clock access
  drm/exynos: dsi: make use of driver data for static values
  drm/exynos: dsi: add macros for register access
  drm/exynos: dsi: rename pll_clk to sclk_clk
  drm/exynos: mic: add MIC driver
  of: add helper for getting endpoint node of specific identifiers
  drm/exynos: add Exynos5433 decon driver
  ...
2015-06-26 13:18:51 -07:00
Michel Thierry 7a01a0a292 drm/i915/lrc: Update PDPx registers with lri commands
A safer way to update the PDPx registers is sending lri commands, added
in the ring before the batchbuffer start. Otherwise, the ctx must be idle
before trying to change anything (but the ring-tail) in the ctx image. An
example where the ctx won't be idle is lite-restore.

This patch depends on 5b7e4c9ce ("drm/i915/gtt: Mark TLBS dirty for gen8+").

v2: Combine lri writes (and save 8 commands). (Mika)
v3: Rebase after ring/req changes, and removed references to deprecated patches.

Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-26 18:33:51 +02:00
Mika Kuoppala d852c7bf90 drm/i915/gtt: Introduce i915_page_dir_dma_addr
The legacy mode mm switch and the execlist context assignment
needs dma address for the page directories.

Introduce a function that encapsulates the scratch_pd dma
fallback if no pd is found.

v2: Rebase, s/ring/req

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-26 10:50:52 +02:00
Arun Siluvery 0160f05539 drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
In Indirect context w/a batch buffer,
WaClearSlmSpaceAtContextSwitch

This WA performs writes to scratch page so it must be valid, this check
is performed before initializing the batch with this WA.

v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

v3: GTT bit in scratch address should be mbz (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-24 00:22:38 +02:00
Arun Siluvery 5e60d79071 drm/i915: Bail out early if WA batch is not available for given Gen
To initialize WA batch, at the moment we first allocate batch and then check
whether we have any WA to be initialized for the given Gen; if we don't have
any WA then we WARN the user, destroy the batch and return but this is causing
another WARN in cleanup code complaining about sleeping in atomic context.
Till we understand this better and to keep things simpler, bail out early
if we don't have WA.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 17:26:16 +02:00
Arun Siluvery 4d78c8dcf9 drm/i915: Fix warnings reported by 0-day
Kernel 0-day framework reported warnings with WA batch patches, this patch
fixes those warnings and an additional warning reported in intel_lrc.c file.

Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 17:25:54 +02:00
John Harrison ae70797d8d drm/i915: Update a bunch of LRC functions to take requests
A bunch of the low level LRC functions were passing around ringbuf and ctx
pairs. In a few cases, they took the r/c pair and a request as well. This is all
quite messy and unnecesary. The context_queue() call is especially bad since the
fake request code got removed - it takes a request and three extra things that
must be extracted from the request and then it checks them against what it finds
in the request. Removing all the derivable data makes the code much simpler all
round.

This patch updates those functions to just take the request structure.

Note that logical_ring_wait_for_space now takes a request structure but already
had a local request pointer that it uses to scan for something to wait on. To
avoid confusion the local variable has been renamed 'target' (it is searching
for a target request to do something with) and the parameter has been called req
(to guarantee anything accidentally missed gets a compiler error).

v2: Updated commit message re wait_for_space (Tomas Elf review comment).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:34 +02:00
John Harrison 9bb1af4406 drm/i915: Remove 'faked' request from LRC submission
The LRC submission code requires a request for tracking purposes. It does not
actually require that request to 'complete' it simply uses it for keeping hold
of reference counts on contexts and such like.

Previously, the fall back path of polling for space in the ring would start by
submitting any outstanding work that was sat in the buffer. This submission was
not done as part of the request that that work was owned by because that would
lead to complications with the request being submitted twice. Instead, a null
request structure was passed in to the submit call and a fake one was created.

That fall back path has long since been obsoleted and has now been removed. Thus
there is never any need to fake up a request structure. This patch removes that
code. A couple of sanity check warnings are added as well, just in case.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:33 +02:00
John Harrison bccca494f7 drm/i915: Remove the now obsolete 'outstanding_lazy_request'
The outstanding_lazy_request is no longer used anywhere in the driver.
Everything that was looking at it now has a request explicitly passed in from on
high. Everything that was relying upon it behind the scenes is now explicitly
creating/passing/submitting its own private request. Thus the OLR can be
removed.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:32 +02:00
John Harrison ccd98fe499 drm/i915: Add *_ring_begin() to request allocation
Now that the *_ring_begin() functions no longer call the request allocation
code, it is finally safe for the request allocation code to call *_ring_begin().
This is important to guarantee that the space reserved for the subsequent
i915_add_request() call does actually get reserved.

v2: Renamed functions according to review feedback (Tomas Elf).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:30 +02:00
John Harrison 4d616a293a drm/i915: Update intel_logical_ring_begin() to take a request structure
Now that everything above has been converted to use requests,
intel_logical_ring_begin() can be updated to take a request instead of a
ringbuf/context pair. This also means that it no longer needs to lazily allocate
a request if no-one happens to have done it earlier.

Note that this change makes the execlist signature the same as the legacy
version. Thus the two functions could be merged into a ring->begin() wrapper if
required.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:30 +02:00
John Harrison be795fc17b drm/i915: Update ring->emit_bb_start() to take a request structure
Updated the ring->emit_bb_start() implementation to take a request instead of a
ringbuf/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:26 +02:00
John Harrison c4e766389e drm/i915: Update ring->emit_request() to take a request structure
Updated the ring->emit_request() implementation to take a request instead of a
ringbuf/request pair. Also removed its use of the OLR for obtaining the
request's seqno.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:24 +02:00
John Harrison 7deb4d3980 drm/i915: Update ring->emit_flush() to take a request structure
Updated the various ring->emit_flush() implementations to take a request instead
of a ringbuf/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:23 +02:00
John Harrison 4866d729ab drm/i915: Update flush_all_caches() to take request structures
Updated the *_ring_flush_all_caches() functions to take requests instead of
rings or ringbuf/context pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:20 +02:00
John Harrison e2be4faf30 drm/i915: Update workarounds_emit() to take request structures
Updated the *_ring_workarounds_emit() functions to take requests instead of
ring/context pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:19 +02:00
John Harrison 2f20055d36 drm/i915: Update a bunch of execbuffer helpers to take request structures
Updated *_ring_invalidate_all_caches(), i915_reset_gen7_sol_offsets() and
i915_emit_box() to take request structures instead of ring or ringbuf/context
pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:18 +02:00
John Harrison b2af037693 drm/i915: Update [vma|object]_move_to_active() to take request structures
Now that everything above has been converted to use request structures, it is
possible to update the lower level move_to_active() functions to be request
based as well.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:16 +02:00
John Harrison 75289874e4 drm/i915: Update add_request() to take a request structure
Now that all callers of i915_add_request() have a request pointer to hand, it is
possible to update the add request function to take a request pointer rather
than pulling it out of the OLR.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:15 +02:00
John Harrison 91af127fd7 drm/i915: Update i915_gem_object_sync() to take a request structure
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the i915_gem_object_sync()
code path.

v2: Much more complex patch to share a single request between the sync and the
page flip. The _sync() function now supports lazy allocation of the request
structure. That is, if one is passed in then that will be used. If one is not,
then a request will be allocated and passed back out. Note that the _sync() code
does not necessarily require a request. Thus one will only be created until
certain situations. The reason the lazy allocation must be done within the
_sync() code itself is because the decision to need one or not is not really
something that code above can second guess (except in the case where one is
definitely not required because no ring is passed in).

The call chains above _sync() now support passing a request through which most
callers passing in NULL and assuming that no request will be required (because
they also pass in NULL for the ring and therefore can't be generating any ring
code).

The exeception is intel_crtc_page_flip() which now supports having a request
returned from _sync(). If one is, then that request is shared by the page flip
(if the page flip is of a type to need a request). If _sync() does not generate
a request but the page flip does need one, then the page flip path will create
its own request.

v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
Elf review request). Rebased onto newer tree that significantly changed the
synchronisation code.

v4: Updated comments from review feedback (Tomas Elf)

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:13 +02:00
John Harrison be01363f0a drm/i915: Update render_state_init() to take a request structure
Updated the two render_state_init() functions to take a request pointer instead
of a ring. This removes their reliance on the OLR.

v2: Rebased to newer tree.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:12 +02:00
John Harrison 8753181e10 drm/i915: Update init_context() to take a request structure
Now that everything above has been converted to use requests, it is possible to
update init_context() to take a request pointer instead of a ring/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:12 +02:00
John Harrison 76c3916887 drm/i915: Update deferred context creation to do explicit request management
In execlist mode, context initialisation is deferred until first use of the
given context. This is because execlist mode has per ring context state and thus
many more context storage objects than legacy mode and many are never actually
used. Previously, the initialisation commands were written to the ring and
tagged with some random request structure via the OLR. This seemed to be causing
a null pointer deference bug under certain circumstances (BZ:88865).

This patch adds explicit request creation and submission to the deferred
initialisation code path. Thus removing any reliance on or randomness caused by
the OLR.

Note that it should be possible to move the deferred context creation until even
later - when the context is actually switched to rather than when it is merely
validated. This would allow the initialisation to be done within the request of
the work that is wanting to use the context. Hence, the extra request that is
created, used and retired just for the context init could be removed completely.
However, this is left for a follow up patch.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:11 +02:00
John Harrison dc4be6071a drm/i915: Add explicit request management to i915_gem_init_hw()
Now that a single per ring loop is being done for all the different
intialisation steps in i915_gem_init_hw(), it is possible to add proper request
management as well. The last remaining issue is that the context enable call
eventually ends up within *_render_state_init() and this does its own private
_i915_add_request() call.

This patch adds explicit request creation and submission to the top level loop
and removes the add_request() from deep within the sub-functions.

v2: Updated for removal of batch_obj from add_request call in previous patch.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:08 +02:00
John Harrison a3fbe05a61 drm/i915: Don't tag kernel batches as user batches
The render state initialisation code does an explicit i915_add_request() call to
commit the init commands. It was passing in the initialisation batch buffer to
add_request() as the batch object parameter. However, the batch object entry in
the request structure (which is all that parameter is used for) is meant for
keeping track of user generated batch buffers for blame tagging during GPU
hangs.

This patch clears the batch object parameter so that kernel generated batch
buffers are not tagged as being user generated.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:07 +02:00
John Harrison 5b4a60c276 drm/i915: Add flag to i915_add_request() to skip the cache flush
In order to explcitly track all GPU work (and completely remove the outstanding
lazy request), it is necessary to add extra i915_add_request() calls to various
places. Some of these do not need the implicit cache flush done as part of the
standard batch buffer submission process.

This patch adds a flag to _add_request() to specify whether the flush is
required or not.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:04 +02:00
John Harrison 8a8edb5917 drm/i915: Update execbuffer_move_to_active() to take a request structure
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the
execbuffer_move_to_active() code path.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:03 +02:00
John Harrison 535fbe8233 drm/i915: Update move_to_gpu() to take a request structure
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the move_to_gpu() code paths.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:03 +02:00
John Harrison 95c24161cd drm/i915: Update the dispatch tracepoint to use params->request
Updated a couple of trace points to use the now cached request pointer rather
than extracting it from the ring.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:02 +02:00