alistair23-linux

redonkable


Rafael J. Wysocki 85572c2c4a cpufreq: Avoid leaving stale IRQ work items during CPU offline

The scheduler code calling cpufreq_update_util() may run during CPU offline on the target CPU after the IRQ work lists have been flushed for it, so the target CPU should be prevented from running code that may queue up an IRQ work item on it at that point.

Unfortunately, that may not be the case if dvfs_possible_from_any_cpu is set for at least one cpufreq policy in the system, because that allows the CPU going offline to run the utilization update callback of the cpufreq governor on behalf of another (online) CPU in some cases.

If that happens, the cpufreq governor callback may queue up an IRQ work on the CPU running it, which is going offline, and the IRQ work may not be flushed after that point. Moreover, that IRQ work cannot be flushed until the "offlining" CPU goes back online, so if any other CPU calls irq_work_sync() to wait for the completion of that IRQ work, it will have to wait until the "offlining" CPU is back online and that may not happen forever. In particular, a system-wide deadlock may occur during CPU online as a result of that.

The failing scenario is as follows. CPU0 is the boot CPU, so it creates a cpufreq policy and becomes the "leader" of it (policy->cpu). It cannot go offline, because it is the boot CPU. Next, other CPUs join the cpufreq policy as they go online and they leave it when they go offline. The last CPU to go offline, say CPU3, may queue up an IRQ work while running the governor callback on behalf of CPU0 after leaving the cpufreq policy because of the dvfs_possible_from_any_cpu effect described above. Then, CPU0 is the only online CPU in the system and the stale IRQ work is still queued on CPU3. When, say, CPU1 goes back online, it will run irq_work_sync() to wait for that IRQ work to complete and so it will wait for CPU3 to go back online (which may never happen even in principle), but (worse yet) CPU0 is waiting for CPU1 at that point too and a system-wide deadlock occurs.

To address this problem notice that CPUs which cannot run cpufreq utilization update code for themselves (for example, because they have left the cpufreq policies that they belonged to), should also be prevented from running that code on behalf of the other CPUs that belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so in that case the cpufreq_update_util_data pointer of the CPU running the code must not be NULL as well as for the CPU which is the target of the cpufreq utilization update in progress.

Accordingly, change cpufreq_this_cpu_can_update() into a regular function in kernel/sched/cpufreq.c (instead of a static inline in a header file) and make it check the cpufreq_update_util_data pointer of the local CPU if dvfs_possible_from_any_cpu is set for the target cpufreq policy.

Also update the schedutil governor to do the cpufreq_this_cpu_can_update() check in the non-fast-switch case too to avoid the stale IRQ work issues.

Fixes: `99d14d0e16` ("cpufreq: Process remote callbacks from any CPU if the platform permits")
Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
Reported-by: Anson Huang <anson.huang@nxp.com>
Tested-by: Anson Huang <anson.huang@nxp.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

2019-12-12 17:59:43 +01:00
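The check described in the commit message can be illustrated with a small user-space model. This is only a sketch, not the kernel code: `struct policy_model`, `update_util_data[]`, `this_cpu_can_update()` and `offline_scenario()` are hypothetical stand-ins for `struct cpufreq_policy`, the per-CPU `cpufreq_update_util_data` pointers and `cpufreq_this_cpu_can_update()`, and the CPU mask is reduced to a plain bitmask. It assumes a CPU that leaves its policy also has its update pointer cleared, as the message implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the kernel's struct cpufreq_policy. */
struct policy_model {
	unsigned int cpus;               /* bitmask of CPUs currently in the policy */
	bool dvfs_possible_from_any_cpu; /* platform lets any CPU change the frequency */
};

/*
 * Hypothetical stand-in for the per-CPU cpufreq_update_util_data pointers.
 * A NULL entry models a CPU that can no longer run utilization updates
 * for itself, for example because it has left its cpufreq policy.
 */
static const void *update_util_data[NR_CPUS];

/*
 * Model of the check: a CPU may update a policy if it belongs to it, or if
 * the policy accepts updates from any CPU AND the calling CPU still has a
 * non-NULL update pointer of its own.
 */
bool this_cpu_can_update(const struct policy_model *policy, int this_cpu)
{
	if (policy->cpus & (1u << this_cpu))
		return true;

	return policy->dvfs_possible_from_any_cpu &&
	       update_util_data[this_cpu] != NULL;
}

/* The failing scenario from the commit message, replayed on the model. */
void offline_scenario(void)
{
	static const int hook = 0; /* any non-NULL address will do */
	struct policy_model policy = {
		.cpus = 1u << 0, /* only CPU0 is left in the policy */
		.dvfs_possible_from_any_cpu = true,
	};

	update_util_data[0] = &hook; /* CPU0 is online and registered */
	update_util_data[3] = NULL;  /* CPU3 is going offline */

	assert(this_cpu_can_update(&policy, 0));
	/* CPU3 is refused, so it cannot queue IRQ work on behalf of CPU0. */
	assert(!this_cpu_can_update(&policy, 3));
}
```

Calling `offline_scenario()` from a `main()` exercises both branches of the check. In this model, dropping the `update_util_data[this_cpu] != NULL` test would let the offlining CPU3 through again, which is the situation the commit sets out to prevent.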
autogroup.h	…
clock.h	…
coredump.h	oom, oom_reaper: do not enqueue same task twice	2019-02-01 15:46:23 -08:00
cpufreq.h	cpufreq: Avoid leaving stale IRQ work items during CPU offline	2019-12-12 17:59:43 +01:00
cputime.h	posix-cpu-timers: Move state tracking to struct posix_cputimers	2019-08-28 11:50:42 +02:00
deadline.h	cpusets: Rebuild root domain deadline accounting information	2019-07-25 15:55:01 +02:00
debug.h	…
hotplug.h	…
idle.h	…
init.h	…
isolation.h	KVM: LAPIC: Inject timer interrupt via posted interrupt	2019-07-20 09:00:40 +02:00
jobctl.h	cgroup: cgroup v2 freezer	2019-04-19 11:26:48 -07:00
loadavg.h	sched: loadavg: make calc_load_n() public	2018-10-26 16:26:32 -07:00
mm.h	exit/exec: Seperate mm_release()	2019-11-20 09:40:08 +01:00
nohz.h	sched/fair: Remove the rq->cpu_load[] update code	2019-06-03 11:49:38 +02:00
numa_balancing.h	sched/fair: Don't free p->numa_faults with concurrent readers	2019-07-25 15:37:04 +02:00
prio.h	…
rt.h	…
signal.h	posix-cpu-timers: Move state tracking to struct posix_cputimers	2019-08-28 11:50:42 +02:00
smt.h	x86/speculation: Rework SMT state change	2018-11-28 11:57:07 +01:00
stat.h	sched: Fix various typos in comments	2018-12-03 11:55:42 +01:00
sysctl.h	sched/uclamp: Add system default clamps	2019-06-24 19:23:45 +02:00
task.h	fork: extend clone3() to support setting a PID	2019-11-15 23:49:22 +01:00
task_stack.h	sched/core: Convert task_struct.stack_refcount to refcount_t	2019-02-04 08:53:56 +01:00
topology.h	sched/topology: Add partition_sched_domains_locked()	2019-07-25 15:51:57 +02:00
types.h	posix-cpu-timers: Provide array based access to expiry cache	2019-08-28 11:50:35 +02:00
user.h	keys: Move the user and user-session keyrings to the user_namespace	2019-06-26 21:02:32 +01:00
wake_q.h	locking/rwsem: Always release wait_lock before waking up tasks	2019-06-17 12:28:00 +02:00
xacct.h	…