cpufreq: Add android's 'interactive' governor

Interactive governor has lived in Android sources for a very long time
and this commit is based on the code present in the following branch:

https://android.googlesource.com/kernel/common android-4.4

The interactive governor is designed for latency-sensitive workloads,
such as interactive user interfaces on mobile phones and tablets. It
aims to be significantly more responsive by ramping the CPU up quickly
when CPU-intensive activity begins.

Existing governors sample CPU load at a particular rate, typically
every X ms, and then update the frequency from a work handler. This can
lead to under-powering UI threads from the moment the user begins
interacting with a previously idle system until the next sample period
happens.

The 'interactive' governor uses a different approach. A real-time
thread is used for scaling up, giving the remaining tasks the CPU
performance benefit, unlike existing governors, which are more likely to
schedule ramp-up work to occur after the performance-starved tasks have
completed.

The Android version of the interactive governor also checks whether to
scale the CPU frequency up soon after coming out of idle. When the CPU
comes out of idle, the governor checks whether the CPU sampling is
overdue. If it is, the governor starts sampling immediately. Otherwise,
the utilization hooks from the scheduler handle the sampling later. If
the CPU is very busy from exiting idle to when the evaluation happens,
the governor assumes that the CPU is under-powered and ramps it to MAX
speed.

If the CPU was not sufficiently busy to immediately ramp to MAX speed,
the governor evaluates the CPU load since the last speed adjustment and
uses the higher of that longer-term load and the short-term load since
idle exit to determine the CPU speed to ramp to.

Idle notifiers will be handled later and are not included for now.

The core of this code was written and maintained (in Android
repositories) by Mike Chan and Todd Poynor over a long period of time.
Vireshk has made changes to the governor to align it with the current
practices followed by mainline governors, like using utilization hooks
from the scheduler and handling the kobject (for the governor's sysfs
directory) in a race-free manner. This also included general cleanup of
the governor.

Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: Todd Poynor <toddpoynor@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
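
The evaluation after idle exit described in the commit message can be
illustrated with a small user-space sketch. It is not taken from
cpufreq_interactive.c: the names evaluate_after_idle, struct eval_result
and busy_threshold are hypothetical, and loads are assumed to be
percentages in the range 0..100.

```c
#include <assert.h>

/* Outcome of one evaluation (hypothetical type, for illustration only). */
struct eval_result {
	int ramp_to_max;	/* non-zero: jump straight to the maximum speed */
	unsigned int load;	/* load (percent) used to choose the next speed */
};

/*
 * short_term_load: load measured since the CPU left idle.
 * long_term_load:  load measured since the last speed adjustment.
 * busy_threshold:  assumed cut-off above which the CPU counts as
 *                  "very busy"; the commit message does not name it.
 */
static struct eval_result evaluate_after_idle(unsigned int short_term_load,
					      unsigned int long_term_load,
					      unsigned int busy_threshold)
{
	struct eval_result res = { 0, 0 };

	assert(short_term_load <= 100 && long_term_load <= 100);

	/* Very busy since idle exit: assume the CPU is under-powered. */
	if (short_term_load >= busy_threshold) {
		res.ramp_to_max = 1;
		res.load = short_term_load;
		return res;
	}

	/* Otherwise use the higher of the long-term and short-term loads. */
	res.load = short_term_load > long_term_load ?
		   short_term_load : long_term_load;
	return res;
}
```

With an assumed threshold of 95, evaluate_after_idle(99, 40, 95) asks for
the maximum speed, while evaluate_after_idle(30, 70, 95) reports a load
of 70 to base the next speed on.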
2016-05-17 14:41:22 +05:30 · 2016-05-17 14:41:22 +05:30 · 27719a5032
parent 860c1b9824
commit 27719a5032
5 changed files with 1607 additions and 0 deletions
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -587,6 +587,102 @@ This governor exposes the following tunables:
 	It effectively causes the frequency to go down ``sampling_down_factor``
 	times slower than it ramps up.

+``interactive``
+---------------
+
+The CPUfreq governor "interactive" is designed for latency-sensitive,
+interactive workloads. This governor sets the CPU speed depending on
+usage, similar to "ondemand" and "conservative" governors, but with a
+different set of configurable behaviors.
+
+The tunable values for this governor are:
+
+``above_hispeed_delay``
+        When speed is at or above hispeed_freq, wait for
+        this long before raising speed in response to continued high load.
+        The format is a single delay value, optionally followed by pairs of
+        CPU speeds and the delay to use at or above those speeds.  Colons can
+        be used between the speeds and associated delays for readability.  For
+        example:
+
+           80000 1300000:200000 1500000:40000
+
+        uses delay 80000 uS until CPU speed 1.3 GHz, at which speed delay
+        200000 uS is used until speed 1.5 GHz, at which speed (and above)
+        delay 40000 uS is used.  If speeds are specified these must appear in
+        ascending order.  Default is 20000 uS.
+
+``boost``
+        If non-zero, immediately boost speed of all CPUs to at least
+        hispeed_freq until zero is written to this attribute.  If zero, allow
+        CPU speeds to drop below hispeed_freq according to load as usual.
+        Default is zero.
+
+``boostpulse``
+        On each write, immediately boost speed of all CPUs to
+        hispeed_freq for at least the period of time specified by
+        boostpulse_duration, after which speeds are allowed to drop below
+        hispeed_freq according to load as usual. It's a write-only file.
+
+``boostpulse_duration``
+        Length of time to hold CPU speed at hispeed_freq
+        on a write to boostpulse, before allowing speed to drop according to
+        load as usual.  Default is 80000 uS.
+
+``go_hispeed_load``
+        The CPU load at which to ramp to hispeed_freq.
+        Default is 99%.
+
+``hispeed_freq``
+        An intermediate "high speed" at which to initially ramp
+        when CPU load hits the value specified in go_hispeed_load.  If load
+        stays high for the amount of time specified in above_hispeed_delay,
+        then speed may be bumped higher.  Default is the maximum speed allowed
+        by the policy at governor initialization time.
+
+``io_is_busy``
+        If set, the governor accounts IO time as CPU busy time.
+
+``min_sample_time``
+        The minimum amount of time to spend at the current
+        frequency before ramping down. Default is 80000 uS.
+
+``target_loads``
+        CPU load values used to adjust speed to influence the
+        current CPU load toward that value.  In general, the lower the target
+        load, the more often the governor will raise CPU speeds to bring load
+        below the target.  The format is a single target load, optionally
+        followed by pairs of CPU speeds and CPU loads to target at or above
+        those speeds.  Colons can be used between the speeds and associated
+        target loads for readability.  For example:
+
+           85 1000000:90 1700000:99
+
+        targets CPU load 85% below speed 1GHz, 90% at or above 1GHz, until
+        1.7GHz and above, at which load 99% is targeted.  If speeds are
+        specified these must appear in ascending order.  Higher target load
+        values are typically specified for higher speeds, that is, target load
+        values also usually appear in an ascending order. The default is
+        target load 90% for all speeds.
+
+``timer_rate``
+        Sample rate for reevaluating CPU load when the CPU is not
+        idle.  A deferrable timer is used, such that the CPU will not be woken
+        from idle to service this timer until something else needs to run.
+        (The maximum time to allow deferring this timer when not running at
+        minimum speed is configurable via timer_slack.)  Default is 20000 uS.
+
+``timer_slack``
+        Maximum additional time to defer handling the governor
+        sampling timer beyond timer_rate when running at speeds above the
+        minimum.  For platforms that consume additional power at idle when
+        CPUs are running at speeds greater than minimum, this places an upper
+        bound on how long the timer will be deferred prior to re-evaluating
+        load and dropping speed.  For example, if timer_rate is 20000uS and
+        timer_slack is 10000uS then timers will be deferred for up to 30msec
+        when not at lowest speed.  A value of -1 means defer timers
+        indefinitely at all speeds.  Default is 80000 uS.
+

 Frequency Boost Support
 =======================
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -104,6 +104,16 @@ config CPU_FREQ_DEFAULT_GOV_SCHEDUTIL
 	  have a look at the help section of that governor. The fallback
 	  governor will be 'performance'.

+config CPU_FREQ_DEFAULT_GOV_INTERACTIVE
+	bool "interactive"
+	select CPU_FREQ_GOV_INTERACTIVE
+	select CPU_FREQ_GOV_PERFORMANCE
+	help
+	  Use the CPUFreq governor 'interactive' as default. This allows
+	  you to get a full dynamic cpu frequency capable system by simply
+	  loading your cpufreq low-level hardware driver, using the
+	  'interactive' governor for latency-sensitive workloads.
+
 endchoice

 config CPU_FREQ_GOV_PERFORMANCE
@@ -202,6 +212,26 @@ config CPU_FREQ_GOV_SCHEDUTIL

 	  If in doubt, say N.

+config CPU_FREQ_GOV_INTERACTIVE
+	tristate "'interactive' cpufreq policy governor"
+	depends on CPU_FREQ
+	select CPU_FREQ_GOV_ATTR_SET
+	select IRQ_WORK
+	help
+	  'interactive' - This driver adds a dynamic cpufreq policy governor
+	  designed for latency-sensitive workloads.
+
+	  This governor attempts to reduce the latency of clock
+	  increases so that the system is more responsive to
+	  interactive workloads.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called cpufreq_interactive.
+
+	  For details, take a look at Documentation/admin-guide/pm/cpufreq.rst.
+
+	  If in doubt, say N.
+
 comment "CPU frequency scaling drivers"

 config CPUFREQ_DT
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_CPU_FREQ_GOV_POWERSAVE)	+= cpufreq_powersave.o
 obj-$(CONFIG_CPU_FREQ_GOV_USERSPACE)	+= cpufreq_userspace.o
 obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND)	+= cpufreq_ondemand.o
 obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE)	+= cpufreq_conservative.o
+obj-$(CONFIG_CPU_FREQ_GOV_INTERACTIVE)	+= cpufreq_interactive.o
 obj-$(CONFIG_CPU_FREQ_GOV_COMMON)		+= cpufreq_governor.o
 obj-$(CONFIG_CPU_FREQ_GOV_ATTR_SET)	+= cpufreq_governor_attr_set.o

--- a/drivers/cpufreq/cpufreq_interactive.c
+++ b/drivers/cpufreq/cpufreq_interactive.c
--- a/include/trace/events/cpufreq_interactive.h
+++ b/include/trace/events/cpufreq_interactive.h
@@ -0,0 +1,112 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cpufreq_interactive
+
+#if !defined(_TRACE_CPUFREQ_INTERACTIVE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_CPUFREQ_INTERACTIVE_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(set,
+	TP_PROTO(u32 cpu_id, unsigned long targfreq,
+		 unsigned long actualfreq),
+	TP_ARGS(cpu_id, targfreq, actualfreq),
+
+	TP_STRUCT__entry(
+		__field(u32, cpu_id)
+		__field(unsigned long, targfreq)
+		__field(unsigned long, actualfreq)
+	),
+
+	TP_fast_assign(
+		__entry->cpu_id = (u32)cpu_id;
+		__entry->targfreq = targfreq;
+		__entry->actualfreq = actualfreq;
+	),
+
+	TP_printk("cpu=%u targ=%lu actual=%lu",
+		__entry->cpu_id, __entry->targfreq,
+		__entry->actualfreq)
+);
+
+DEFINE_EVENT(set, cpufreq_interactive_setspeed,
+	TP_PROTO(u32 cpu_id, unsigned long targfreq,
+		 unsigned long actualfreq),
+	TP_ARGS(cpu_id, targfreq, actualfreq)
+);
+
+DECLARE_EVENT_CLASS(loadeval,
+	TP_PROTO(unsigned long cpu_id, unsigned long load,
+		 unsigned long curtarg, unsigned long curactual,
+		 unsigned long newtarg),
+	TP_ARGS(cpu_id, load, curtarg, curactual, newtarg),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, cpu_id)
+		__field(unsigned long, load)
+		__field(unsigned long, curtarg)
+		__field(unsigned long, curactual)
+		__field(unsigned long, newtarg)
+	),
+
+	TP_fast_assign(
+		__entry->cpu_id = cpu_id;
+		__entry->load = load;
+		__entry->curtarg = curtarg;
+		__entry->curactual = curactual;
+		__entry->newtarg = newtarg;
+	),
+
+	TP_printk("cpu=%lu load=%lu cur=%lu actual=%lu targ=%lu",
+		  __entry->cpu_id, __entry->load, __entry->curtarg,
+		  __entry->curactual, __entry->newtarg)
+);
+
+DEFINE_EVENT(loadeval, cpufreq_interactive_target,
+	TP_PROTO(unsigned long cpu_id, unsigned long load,
+		 unsigned long curtarg, unsigned long curactual,
+		 unsigned long newtarg),
+	TP_ARGS(cpu_id, load, curtarg, curactual, newtarg)
+);
+
+DEFINE_EVENT(loadeval, cpufreq_interactive_already,
+	TP_PROTO(unsigned long cpu_id, unsigned long load,
+		 unsigned long curtarg, unsigned long curactual,
+		 unsigned long newtarg),
+	TP_ARGS(cpu_id, load, curtarg, curactual, newtarg)
+);
+
+DEFINE_EVENT(loadeval, cpufreq_interactive_notyet,
+	TP_PROTO(unsigned long cpu_id, unsigned long load,
+		 unsigned long curtarg, unsigned long curactual,
+		 unsigned long newtarg),
+	TP_ARGS(cpu_id, load, curtarg, curactual, newtarg)
+);
+
+TRACE_EVENT(cpufreq_interactive_boost,
+	TP_PROTO(const char *s),
+	TP_ARGS(s),
+	TP_STRUCT__entry(
+		__string(s, s)
+	),
+	TP_fast_assign(
+		__assign_str(s, s);
+	),
+	TP_printk("%s", __get_str(s))
+);
+
+TRACE_EVENT(cpufreq_interactive_unboost,
+	TP_PROTO(const char *s),
+	TP_ARGS(s),
+	TP_STRUCT__entry(
+		__string(s, s)
+	),
+	TP_fast_assign(
+		__assign_str(s, s);
+	),
+	TP_printk("%s", __get_str(s))
+);
+
+#endif /* _TRACE_CPUFREQ_INTERACTIVE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
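
The list format shared by the above_hispeed_delay and target_loads
tunables in the documentation hunk above (a single value, optionally
followed by speed:value pairs in ascending speed order) can be
illustrated with a small user-space sketch. This is not the governor's
code: parse_tokens and value_for_freq are hypothetical helpers, and the
flat array layout (value, speed, value, speed, value, ...) is an
assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse "v0 f1:v1 f2:v2 ..." into out[] as v0, f1, v1, f2, v2, ...
 * Spaces, colons and newlines are all treated as separators.
 * Returns the number of tokens, or -1 if the list is malformed.
 */
static int parse_tokens(const char *buf, unsigned int *out, int max)
{
	int n = 0;
	char *end;

	while (*buf != '\0') {
		unsigned long v;

		if (n == max)
			return -1;	/* too many tokens for out[] */
		v = strtoul(buf, &end, 10);
		if (end == buf)
			return -1;	/* not a number */
		out[n++] = (unsigned int)v;
		buf = end;
		while (*buf == ' ' || *buf == ':' || *buf == '\n')
			buf++;
	}

	/* One leading value plus complete pairs gives an odd count. */
	return (n % 2 == 1) ? n : -1;
}

/* Return the value that applies at or above the given speed. */
static unsigned int value_for_freq(const unsigned int *tok, int n,
				   unsigned int freq)
{
	int i = 0;

	assert(n > 0 && n % 2 == 1);

	/* Step over each pair whose speed is at or below freq. */
	while (i < n - 1 && freq >= tok[i + 1])
		i += 2;
	return tok[i];
}
```

For the documented example "85 1000000:90 1700000:99", this lookup
yields 85 for 800000, 90 for 1000000 and 99 for 1700000, which matches
the behavior described for target_loads.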