alistair23-linux

redonkable

Author	SHA1	Message	Date
Christian König	68c12d24ce	drm/sched: revert "fix timeout handling v2" v2 This reverts commit `0efd2d2f68`. It's still causing problems for V3D. v2: keep rearming the timeout. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-28 15:49:58 -05:00
Trigger Huang	85744e9c10	drm/scheduler: Fix bad job be re-processed in TDR A bad job is the one triggered TDR(In the current amdgpu's implementation, actually all the jobs in the current joq-queue will be treated as bad jobs). In the recovery process, its fence will be fake signaled and as a result, the work behind will be scheduled to delete it from the mirror list, but if the TDR process is invoked before the work's execution, then this bad job might be processed again and the call dma_fence_set_error to its fence in TDR process will lead to kernel warning trace: [ 143.033605] WARNING: CPU: 2 PID: 53 at ./include/linux/dma-fence.h:437 amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched] kernel: [ 143.033606] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 snd_hda_codec_generic crypto_simd glue_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev snd_seq_device snd_timer snd soundcore binfmt_misc input_leds mac_hid serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 8139too floppy psmouse 8139cp mii i2c_piix4 pata_acpi [ 143.033649] CPU: 2 PID: 53 Comm: kworker/2:1 Tainted: G OE 4.15.0-20-generic #21-Ubuntu [ 143.033650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 143.033653] Workqueue: events drm_sched_job_timedout [amd_sched] [ 143.033656] RIP: 0010:amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched] [ 143.033657] RSP: 0018:ffffa9f880fe7d48 EFLAGS: 00010202 [ 143.033659] RAX: 0000000000000007 RBX: ffff9b98f2b24c00 RCX: ffff9b98efef4f08 [ 143.033660] RDX: ffff9b98f2b27400 RSI: ffff9b98f2b24c50 RDI: ffff9b98efef4f18 [ 143.033660] RBP: ffffa9f880fe7d98 R08: 0000000000000001 R09: 00000000000002b6 [ 143.033661] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b98efef3430 [ 143.033662] R13: ffff9b98efef4d80 R14: ffff9b98efef4e98 R15: ffff9b98eaf91c00 [ 143.033663] FS: 0000000000000000(0000) GS:ffff9b98ffd00000(0000) knlGS:0000000000000000 [ 143.033664] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 143.033665] CR2: 00007fc49c96d470 CR3: 000000001400a005 CR4: 00000000003606e0 [ 143.033669] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 143.033669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 143.033670] Call Trace: [ 143.033744] amdgpu_device_gpu_recover+0x144/0x820 [amdgpu] [ 143.033788] amdgpu_job_timedout+0x9b/0xa0 [amdgpu] [ 143.033791] drm_sched_job_timedout+0xcc/0x150 [amd_sched] [ 143.033795] process_one_work+0x1de/0x410 [ 143.033797] worker_thread+0x32/0x410 [ 143.033799] kthread+0x121/0x140 [ 143.033801] ? process_one_work+0x410/0x410 [ 143.033803] ? kthread_create_worker_on_cpu+0x70/0x70 [ 143.033806] ret_from_fork+0x35/0x40 So just delete the bad job from mirror list directly Changes in v3: - Add a helper function to delete the bad jobs from mirror list and call it directly before the job's fence is signaled Changes in v2: - delete the useless list node check - also delete bad jobs in drm_sched_main because: kthread_unpark(ring->sched.thread) will be invoked very early before amdgpu_device_gpu_recover's return, then drm_sched_main will have chance to pick up a new job from the job queue. This new job will be added into the mirror list and processed by amdgpu_job_run, but may not be deleted from the mirror list on time due to the same reason. And finally re-processed by drm_sched_job_recovery Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Reviewed-by: Christian König <chrstian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-19 16:38:15 -05:00
Sharat Masetty	26efecf955	drm/scheduler: Add drm_sched_job_cleanup This patch adds a new API to clean up the scheduler job resources. This is primarliy needed in cases the job was created but was not queued to the scheduler queue. Additionally with this change, the layer which creates the scheduler job also gets to free up the job's resources and this entails moving the dma_fence_put(finished_fence) to the drivers ops free handler routines. Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:27 -05:00
Andrey Grodzovsky	faf6e1a87e	drm/sched: Add boolean to mark if sched is ready to work v5 Problem: A particular scheduler may become unsuable (underlying HW) after some event (e.g. GPU reset). If it's later chosen by the get free sched. policy a command will fail to be submitted. Fix: Add a driver specific callback to report the sched status so rq with bad sched can be avoided in favor of working one or none in which case job init will fail. v2: Switch from driver callback to flag in scheduler. v3: rebase v4: Remove ready paramter from drm_sched_init, set uncoditionally to true once init done. v5: fix missed change in v3d in v4 (Alex) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:22 -05:00
Christian König	8fe159b014	drm/sched: add drm_sched_fault Add a helper to immediately start timeout handling in case of a hardware fault. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:02 -05:00
Christian König	19067e522d	drm/sched: make sure timer is restarted Make sure we always restart the timer after a timeout and remove the device specific workarounds. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:01 -05:00
Christian König	0efd2d2f68	drm/sched: fix timeout handling v2 We need to make sure that we don't race between job completion and timeout. v2: put revert label after calling the handling manually Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-10-12 12:52:32 -05:00
Christian König	b981c86f03	drm/sched: add drm_sched_start_timeout helper Cleanup starting the timeout a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-10-12 12:52:21 -05:00
Nathan Chancellor	c1f0320e03	drm/scheduler: Simplify spsc_queue_count check in drm_sched_entity_select_rq Clang generates a warning when it sees a logical not followed by a conditional operator like ==, >, or <. drivers/gpu/drm/scheduler/sched_entity.c:470:6: warning: logical not is only applied to the left hand side of this comparison [-Wlogical-not-parentheses] if (!spsc_queue_count(&entity->job_queue) == 0 \|\| ^ ~~ drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses after the '!' to evaluate the comparison first if (!spsc_queue_count(&entity->job_queue) == 0 \|\| ^ ( ) drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses around left hand side expression to silence this warning if (!spsc_queue_count(&entity->job_queue) == 0 \|\| ^ ( ) 1 warning generated. It assumes the author might have made a mistake in their logic: if (!a == b) -> if (!(a == b)) Sometimes that is the case; other times, it's just a super convoluted way of saying 'if (a)' when b = 0: if (!1 == 0) -> if (0 == 0) -> if (true) Alternatively: if (!1 == 0) -> if (!!1) -> if (1) Simplify this comparison so that Clang doesn't complain. Fixes: `35e160e781` ("drm/scheduler: change entities rq even earlier") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-10-09 17:06:00 -05:00
Nayan Deshmukh	6a96243056	drm/scheduler: remove timeout work_struct from drm_sched_job (v3) having a delayed work item per job is redundant as we only need one per scheduler to track the time out the currently executing job. v2: the first element of the ring mirror list is the currently executing job so we don't need a additional variable for it v3: squash in fixes for v3d and etnaviv Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-09-27 09:55:45 -05:00
Nayan Deshmukh	c89677afb3	drm/scheduler: avoid redundant shifting of the entity v2 do not remove entity from the rq if the current rq is from the least loaded scheduler. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:11:14 -05:00
Andrey Grodzovsky	62347a3300	drm/scheduler: Add stopped flag to drm_sched_entity The flag will prevent another thread from same process to reinsert the entity queue into scheduler's rq after it was already removewd from there by another thread during drm_sched_entity_flush. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:11:10 -05:00
Christian König	23f67981fd	drm/scheduler: rename gpu_scheduler.c to sched_main.c Better match the naming of the other components. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:44 -05:00
Christian König	7b10574eac	drm/scheduler: cleanup entity coding style Cleanup coding style in sched_entity.c Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:43 -05:00
Christian König	620e762f9a	drm/scheduler: move entity handling into separate file This is complex enough on it's own. Move it into a separate C file. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:43 -05:00
Christian König	8acc725457	drm/scheduler: trivial error handling fix Return -ENOMEM when allocating the rq_list fails. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:42 -05:00
Christian König	35e160e781	drm/scheduler: change entities rq even earlier Looks like for correct debugging we need to know the scheduler even earlier. So move picking a rq for an entity into job creation. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:07 -05:00
Christian König	573edb241b	drm/scheduler: fix last_scheduled handling Make sure we access last_scheduled only after checking that there are no more jobs on the entity. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:06 -05:00
Christian König	93f15e1c07	drm/scheduler: Remove entity->rq NULL check That is superflous now. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:06 -05:00
Christian König	e854b61acf	drm/scheduler: bind job earlier to scheduler Update job earlier with the scheduler it is supposed to be scheduled on. Otherwise we could incorrectly optimize dependencies when moving an entity from one scheduler to another. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:01 -05:00
Christian König	7febe4bfd5	drm/scheduler: fix setting the priorty for entities (v2) Since we now deal with multiple rq we need to update all of them, not just the current one. v2: Trivial: Removed unused variable (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:00 -05:00
Andrey Grodzovsky	07507c01aa	drm/scheduler: Add job dependency trace. During debug sessions I encountered a need to trace back a job dependecy a few steps back to the first failing job. This trace helpped me a lot. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:09:46 -05:00
Nayan Deshmukh	df0ca30838	drm/scheduler: move idle entities to scheduler with less load v2 This is the first attempt to move entities between schedulers to have dynamic load balancing. We just move entities with no jobs for now as moving the ones with jobs will lead to other compilcations like ensuring that the other scheduler does not remove a job from the current entity while we are moving. v2: remove unused variable and an unecessary check Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:09:46 -05:00
Nayan Deshmukh	97ffa35b5d	drm/scheduler: add new function to get least loaded sched v2 The function selects the run queue from the rq_list with the least load. The load is decided by the number of jobs in a scheduler. v2: avoid using atomic read twice consecutively, instead store it locally Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:09:45 -05:00
Nayan Deshmukh	249a07c05a	drm/scheduler: add counter for total jobs in scheduler To keep track of the scheduler load. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:09:45 -05:00
Nayan Deshmukh	ac0a6cf1c6	drm/scheduler: add a list of run queues to the entity These are the potential run queues on which the jobs from this entity can be scheduled. We will use this to do load balancing. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:09:44 -05:00
Nayan Deshmukh	275e6fa8ec	drm/scheduler: fix param documentation We no longer have sched parameter so remove its description as well Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-09 11:57:39 -05:00
Lucas Stach	4823e5da2e	drm/scheduler: fix timeout worker setup for out of order job completions drm_sched_job_finish() is a work item scheduled for each finished job on a unbound system workqueue. This means the workers can execute out of order with regard to the real hardware job completions. If this happens queueing a timeout worker for the first job on the ring mirror list is wrong, as this may be a job which has already finished executing. Fix this by reorganizing the code to always queue the worker for the next job on the list, if this job hasn't finished yet. This is robust against a potential reordering of the finish workers. Also move out the timeout worker cancelling, so that we don't need to take the job list lock twice. As a small optimization list_del is used to remove the job from the ring mirror list, as there is no need to reinit the list head in the job we are about to free. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-06 15:58:00 -05:00
Christian König	a875f58e23	drm/scheduler: stop setting rq to NULL We removed the redundancy of having an extra scheduler field, so we can't set the rq to NULL any more or otherwise won't know which scheduler to use for the cleanup. Just remove the entity from the scheduling list instead. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Bug: https://bugs.freedesktop.org/show_bug.cgi?id=107367 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-31 16:58:20 -05:00
Christian König	43bce41cf4	drm/scheduler: only kill entity if last user is killed v2 Note which task is using the entity and only kill it if the last user of the entity is killed. This should prevent problems when entities are leaked to child processes. v2: add missing kernel doc Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-31 16:58:20 -05:00
Masahiro Yamada	ba7f47831e	drm/sched: remove unneeded -Iinclude/drm compiler flag I refactored the include directives under include/drm/ some time ago. This flag is unneeded. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-31 16:58:16 -05:00
Nayan Deshmukh	068c330419	drm/scheduler: remove sched field from the entity The scheduler of the entity is decided by the run queue on which it is queued. This patch avoids us the effort required to maintain a sync between rq and sched field when we start shifting entites among different rqs. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-25 15:06:26 -05:00
Nayan Deshmukh	cdc5017659	drm/scheduler: modify API to avoid redundancy entity has a scheduler field and we don't need the sched argument in any of the functions where entity is provided. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-25 15:06:19 -05:00
Junwei Zhang	a6da48caf9	drm/scheduler: add NULL pointer check for run queue (v2) To check rq pointer before adding entity into it. That avoids NULL pointer access in some case. v2: move the check to caller Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-16 16:11:48 -05:00
Nayan Deshmukh	aa16b6c6b4	drm/scheduler: modify args of drm_sched_entity_init replace run queue by a list of run queues and remove the sched arg as that is part of run queue itself Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-13 14:46:05 -05:00
Nayan Deshmukh	8dc9fbbf27	drm/scheduler: add a pointer to scheduler in the rq This patch is in preparation for a better load balancing in scheduler. It allows us to associate entities with the run queues instead of binding them to a scheduler. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-13 14:45:58 -05:00
Dave Airlie	ba7ca97d73	Merge branch 'drm-next-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-next More features for 4.19: - Use core pcie functionality rather than duplicating our own for pcie gens and lanes - Scheduler function naming cleanups - More documentation - Reworked DC/Powerplay interfaces to improve power savings - Initial stutter mode support for RV (power feature) - Vega12 powerplay updates - GFXOFF fixes - Misc fixes Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180705221447.2807-1-alexander.deucher@amd.com	2018-07-10 10:57:08 +10:00
Andrey Grodzovsky	180fc134d7	drm/scheduler: Rename cleanup functions v2. Everything in the flush code path (i.e. waiting for SW queue to become empty) names with _flush() and everything in the release code path names _fini() This patch also effect the amdgpu and etnaviv drivers which use those functions. v2: Also pplay the change to vd3. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-05 16:38:45 -05:00
Daniel Vetter	99e227cb03	drm: Remove unecessary dma_fence_ops dma_fence_default_wait is the default now, same for the trivial enable_signaling implementation. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Sean Paul <seanpaul@chromium.org> Cc: David Airlie <airlied@linux.ie> Link: https://patchwork.freedesktop.org/patch/msgid/20180503142603.28513-7-daniel.vetter@ffwll.ch	2018-07-03 13:13:22 +02:00
Andrey Grodzovsky	741f01e636	drm/scheduler: Avoid using wait_event_killable for dying process (V4) Dying process might be blocked from receiving any more signals so avoid using it. Also retire enity->fini_status and just check the SW queue, if it's not empty do the fallback cleanup. Also handle entity->last_scheduled == NULL use case which happens when HW ring is already hangged whem a new entity tried to enqeue jobs. v2: Return the remaining timeout and use that as parameter for the next call. This way when we need to cleanup multiple queues we don't wait for the entire TO period for each queue but rather in total. Styling comments. Rebase. v3: Update types from unsigned to long. Work with jiffies instead of ms. Return 0 when TO expires. Rebase. v4: Remove unnecessary timeout calculation. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-06-15 12:20:33 -05:00
Nayan Deshmukh	2d33948e4e	drm/scheduler: add documentation convert existing raw comments into kernel-doc format as well as add new documentation v2: reword the overview Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel@ffwll.ch>	2018-06-15 12:20:21 -05:00
Nayan Deshmukh	6201e033d7	drm/scheduler: fix a corner case in dependency optimization When checking for a dependency fence for belonging to the same entity compare it with scheduled as well finished fence. Earlier we were only comparing it with the scheduled fence. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-05-25 11:02:14 -05:00
Emily Deng	52bf20f414	drm/sched: add rcu_barrier after entity fini To free the fence from the amdgpu_fence_slab, need twice call_rcu, to avoid the amdgpu_fence_slab_fini call kmem_cache_destroy(amdgpu_fence_slab) before kmem_cache_free(amdgpu_fence_slab, fence), add rcu_barrier after drm_sched_entity_fini. The kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as below: 1.drm_sched_entity_fini -> drm_sched_entity_cleanup -> dma_fence_put(entity->last_scheduled) -> drm_sched_fence_release_finished -> drm_sched_fence_release_scheduled -> call_rcu(&fence->finished.rcu, drm_sched_fence_free) 2.drm_sched_fence_free -> dma_fence_put(fence->parent) -> amdgpu_fence_release -> call_rcu(&f->rcu, amdgpu_fence_free) -> kmem_cache_free(amdgpu_fence_slab, fence); v2:put the barrier before the kmem_cache_destroy v3:put the dma_fence_put(fence->parent) before call_rcu in drm_sched_fence_release_scheduled Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-05-24 10:07:55 -05:00
Nayan Deshmukh	652470ac55	drm/scheduler: fix function name prefix in comments That got missed while moving the files outside of amdgpu. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-05-18 16:08:20 -05:00
Andrey Grodzovsky	563e1e664d	drm/scheduler: Remove obsolete spinlock. This spinlock is superfluous, any call to drm_sched_entity_push_job should already be under a lock together with matching drm_sched_job_init to match the order of insertion into queue with job's fence seqence number. v2: Improve patch description. Add functions documentation describing the locking considerations Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-05-18 16:08:17 -05:00
Nayan Deshmukh	8344c53f57	drm/scheduler: remove unused parameter this patch also effect the amdgpu and etnaviv drivers which use the function drm_sched_entity_init Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-05-15 13:44:27 -05:00
Pixel Ding	5dd3f9efd4	drm/scheduler: don't update last scheduled fence in TDR The current sequence in scheduler thread is: 1. update last sched fence 2. job begin (adding to mirror list) 3. job finish (remove from mirror list) 4. back to 1 Since we update last sched prior to joining mirror list, the jobs in mirror list already pass the last sched fence. TDR just run the jobs in mirror list, so we should not update the last sched fences in TDR. Signed-off-by: Pixel Ding <Pixel.Ding@amd.com> Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-05-15 13:44:05 -05:00
Pixel Ding	b5b4ea4d98	drm/scheduler: move last_sched fence updating prior to job popping (v2) Make sure main thread won't update last_sched fence when entity is cleanup. Fix a racing issue which is caused by putting last_sched fence twice. Running vulkaninfo in tight loop can produce this issue as seeing wild fence pointer. v2: squash in build fix (Christian) Signed-off-by: Pixel Ding <Pixel.Ding@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-05-15 13:43:30 -05:00
Pixel Ding	a4b3996aee	drm/scheduler: always put last_sched fence in entity_fini Fix the potential memleak since scheduler main thread always hold one last_sched fence. Signed-off-by: Pixel Ding <Pixel.Ding@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-05-15 13:43:30 -05:00
Emily Deng	8ee3a52e3f	drm/gpu-sched: fix force APP kill hang(v4) issue: there are VMC page fault occurred if force APP kill during 3dmark test, the cause is in entity_fini we manually signal all those jobs in entity's queue which confuse the sync/dep mechanism: 1)page fault occurred in sdma's clear job which operate on shadow buffer, and shadow buffer's Gart table is cleaned by ttm_bo_release since the fence in its reservation was fake signaled by entity_fini() under the case of SIGKILL received. 2)page fault occurred in gfx' job because during the lifetime of gfx job we manually fake signal all jobs from its entity in entity_fini(), thus the unmapping/clear PTE job depend on those result fence is satisfied and sdma start clearing the PTE and lead to GFX page fault. fix: 1)should at least wait all jobs already scheduled complete in entity_fini() if SIGKILL is the case. 2)if a fence signaled and try to clear some entity's dependency, should set this entity guilty to prevent its job really run since the dependency is fake signaled. v2: splitting drm_sched_entity_fini() into two functions: 1)The first one is does the waiting, removes the entity from the runqueue and returns an error when the process was killed. 2)The second one then goes over the entity, install it as completion signal for the remaining jobs and signals all jobs with an error code. v3: 1)Replace the fini1 and fini2 with better name 2)Call the first part before the VM teardown in amdgpu_driver_postclose_kms() and the second part after the VM teardown 3)Keep the original function drm_sched_entity_fini to refine the code. v4: 1)Rename entity->finished to entity->last_scheduled; 2)Rename drm_sched_entity_fini_job_cb() to drm_sched_entity_kill_jobs_cb(); 3)Pass NULL to drm_sched_entity_fini_job_cb() if -ENOENT; 4)Replace the type of entity->fini_status with "int"; 5)Remove the check about entity->finished. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-05-15 13:43:17 -05:00

1 2

56 Commits (84c6127580c1cee58d57d5f97ce22f1131ecdfc9)