1
0
Fork 0
Commit Graph

104 Commits (8508cf3ffad4defa202b303e5b6379efc4cd9054)

Author SHA1 Message Date
Johannes Weiner 8508cf3ffa sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
There are several definitions of those functions/macros in places that
mess with fixed-point load averages.  Provide an official version.

[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:26:32 -07:00
Rafael J. Wysocki f1c8e410cd cpuidle: menu: Avoid computations when result will be discarded
If the minimum interval taken into account in the average computation
loop in get_typical_interval() is less than the expected idle
duration determined so far, the resultant average cannot be greater
than that value as well and the entire return result of the function
is going to be discarded anyway going forward.

In that case, it is a waste of time to carry out the remaining
computations in get_typical_interval(), so avoid that by returning
early if the minimum interval is not below the expected idle duration.

No intentional changes of behavior.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-18 09:34:13 +02:00
Rafael J. Wysocki 12b65eadf0 cpuidle: menu: Drop redundant comparison
Since the correction factor cannot be greater than RESOLUTION * DECAY,
the result of the predicted_us computation in menu_select() cannot be
greater than data->next_timer_us, so it is not necessary to compare
the "typical interval" value coming from get_typical_interval() with
data->next_timer_us separately.

It is sufficient to copmare predicted_us with the return value of
get_typical_interval() directly, so do that and drop the now
redundant expected_interval variable.

No intentional changes of behavior.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-18 09:34:13 +02:00
Rafael J. Wysocki bde091ece2 cpuidle: menu: Simplify checks related to the polling state
After some recent menu governor changes, the promotion of the
"polling" state to a physical one is mostly controlled by the
latency limit (resulting from the "interactivity" factor) and
not by the time to the closest timer event, so it should be
sufficient to check the exit latency of that state for this
purpose (of course, its target residency still needs to be
within the next timer event range for energy-efficiency).

Also, the physical state the "polling" one is promoted to need not
be the next one in principle (in case the next state is disabled,
for example).

For these reasons, simplify the checks made to decide whether or
not to promote the "polling" state to a physical one and update
the target idle duration when it is promoted in case the residency
of the new state turns out to be above the tick boundary (in which
case there is no reason to stop the tick).

Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-12 10:46:37 +02:00
Rafael J. Wysocki 53812cdc91 cpuidle: menu: Move the latency_req == 0 special case check
It is better to always update data->bucket before returning from
menu_select() to avoid updating the correction factor for a stale
bucket, so combine the latency_req == 0 special check with the more
general check below.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2018-10-04 19:27:27 +02:00
Rafael J. Wysocki 8b007ebec9 cpuidle: menu: Avoid computations for very close timers
If the next timer event (not including the tick) is closer than the
target residency of the second state or the PM QoS latency constraint
is below its exit latency, state[0] will be used regardless of any
other factors, so skip the computations in menu_select() then and
return 0 straight away from it.

Still, do that after the bucket has been determined to avoid
updating the correction factor for a stale bucket.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2018-10-04 19:27:27 +02:00
Rafael J. Wysocki eb40a380bf cpuidle: menu: Do not update last_state_idx in menu_select()
It is not necessary to update data->last_state_idx in menu_select()
as it only is used in menu_update() which only runs when
data->needs_update is set and that is set only when updating
data->last_state_idx in menu_reflect().

Accordingly, drop the update of data->last_state_idx from
menu_select() and get rid of the (now redundant) "out" label
from it.

No intentional behavior changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-10-04 19:26:38 +02:00
Rafael J. Wysocki 96c3d11df1 cpuidle: menu: Get rid of first_idx from menu_select()
Rearrange the code in menu_select() so that the loop over idle states
always starts from 0 and get rid of the first_idx variable.

While at it, add two empty lines to separate conditional statements
from one another.

No intentional behavior changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-10-04 19:25:53 +02:00
Rafael J. Wysocki 23e8ceb9ce cpuidle: menu: Compute first_idx when latency_req is known
Since menu_select() can only set first_idx to 1 if the exit latency
of the second state is not greater than the latency limit, it should
first determine that limit.  Thus first_idx should be computed after
the "interactivity" factor has been taken into account.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewedy-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-10-04 19:24:14 +02:00
Rafael J. Wysocki 5f26bdceb9 cpuidle: menu: Fix wakeup statistics updates for polling state
If the CPU exits the "polling" state due to the time limit in the
loop in poll_idle(), this is not a real wakeup and it just means
that the "polling" state selection was not adequate.  The governor
mispredicted short idle duration, but had a more suitable state been
selected, the CPU might have spent more time in it.  In fact, there
is no reason to expect that there would have been a wakeup event
earlier than the next timer in that case.

Handling such cases as regular wakeups in menu_update() may cause the
menu governor to make suboptimal decisions going forward, but ignoring
them altogether would not be correct either, because every time
menu_select() is invoked, it makes a separate new attempt to predict
the idle duration taking distinct time to the closest timer event as
input and the outcomes of all those attempts should be recorded.

For this reason, make menu_update() always assume that if the
"polling" state was exited due to the time limit, the next proper
wakeup event for the CPU would be the next timer event (not
including the tick).

Fixes: a37b969a61 "cpuidle: poll_state: Add time limit to poll_idle()"
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-10-04 10:23:37 +02:00
Rafael J. Wysocki 03dba27804 cpuidle: menu: Replace data->predicted_us with local variable
The predicted_us field in struct menu_device is only accessed in
menu_select(), so replace it with a local variable in that function.

With that, stop using expected_interval instead of predicted_us to
store the new predicted idle duration value if it is set to the
selected state's target residency which is quite confusing.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-10-03 12:02:44 +02:00
Fieah Lim 6a5f95b5a4 cpuidle: Remove unnecessary wrapper cpuidle_get_last_residency()
cpuidle_get_last_residency() is just a wrapper for retrieving
the last_residency member of struct cpuidle_device.  It's also
weirdly the only wrapper function for accessing cpuidle_* struct
member (by my best guess is it could be a leftover from v2.x).

Anyhow, since the only two users (the ladder and menu governors)
can access dev->last_residency directly, and it's more intuitive to
do it that way, let's just get rid of the wrapper.

This patch tidies up CPU idle code a bit without functional changes.

Signed-off-by: Fieah Lim <kw@fieahl.im>
[ rjw: Changelog cleanup ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-09-18 09:24:44 +02:00
Rafael J. Wysocki 757ab15c3f cpuidle: menu: Retain tick when shallow state is selected
The case addressed by commit 5ef499cd57 (cpuidle: menu: Handle
stopped tick more aggressively) in the stopped tick case is present
when the tick has not been stopped yet too.  Namely, if only two CPU
idle states, shallow state A with target residency significantly
below the tick boundary and deep state B with target residency
significantly above it, are available and the predicted idle
duration is above the tick boundary, but below the target residency
of state B, state A will be selected and the CPU may spend indefinite
amount of time in it, which is not quite energy-efficient.

However, if the tick has not been stopped yet and the governor is
about to select a shallow idle state for the CPU even though the idle
duration predicted by it is above the tick boundary, it should be
fine to wake up the CPU early, so the tick can be retained then and
the governor will have a chance to select a deeper state when it runs
next time.

[Note that when this really happens, it will make the idle duration
 predictor believe that the CPU might be idle longer than predicted,
 which will make it more likely to predict longer idle durations going
 forward, but that will also cause deeper idle states to be selected
 going forward, on average, which is what's needed here.]

Fixes: 87c9fe6ee4 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <leo.yan@linaro.org>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+: 5ef499cd57 (cpuidle: menu: Handle ...)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-25 13:16:08 +02:00
Rafael J. Wysocki 5ef499cd57 cpuidle: menu: Handle stopped tick more aggressively
Commit 87c9fe6ee4 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.

Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary.  In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly.  However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.

To address this issue, modify the governor to always use the time
till the closest timer event instead of the predicted idle duration
if the latter is less than the tick period length and the tick has
been stopped already.  Also make it extend the search for a matching
idle state if the tick is stopped to avoid settling on a shallow
state if deep states with target residencies above the tick period
length are available.

In addition, make it always indicate that the tick should be stopped
if it has been stopped already for consistency.

Fixes: 87c9fe6ee4 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-20 13:37:03 +02:00
Rafael J. Wysocki 50f7ccc647 cpuidle: menu: Update stale polling override comment
The comment to explain why the menu governor uses idle state 1
instead of idle state 0 as the first one sometimes is stale (among
other things it mentions a user setting not present any more),
so update it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-16 23:05:43 +02:00
Rafael J. Wysocki f390c5eb28 cpuidle: menu: Fix white space
Fix some damaged white space in menu_select().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-15 00:08:51 +02:00
Rafael J. Wysocki 0fc784fb09 cpuidle: governors: Consolidate PM QoS handling
There is some code duplication related to the PM QoS handling between
the existing cpuidle governors, so move that code to a common helper
function and call that from the governors.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-30 23:13:00 +02:00
Rafael J. Wysocki cf7eeea947 cpuidle: governors: Drop redundant checks related to PM QoS
PM_QOS_RESUME_LATENCY_NO_CONSTRAINT is defined as the 32-bit integer
maximum, so it is not necessary to test the return value of
dev_pm_qos_raw_read_value() against it directly in the menu and
ladder cpuidle governors.

Drop these redundant checks.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-30 23:13:00 +02:00
Rafael J. Wysocki 87c9fe6ee4 cpuidle: menu: Avoid selecting shallow states with stopped tick
If the scheduler tick has been stopped already and the governor
selects a shallow idle state, the CPU can spend a long time in that
state if the selection is based on an inaccurate prediction of idle
time.  That effect turns out to be relevant, so it needs to be
mitigated.

To that end, modify the menu governor to discard the result of the
idle time prediction if the tick is stopped and the predicted idle
time is less than the tick period length, unless the tick timer is
going to expire soon.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2018-04-09 11:54:57 +02:00
Rafael J. Wysocki 296bb1e51a cpuidle: menu: Refine idle state selection for running tick
If the tick isn't stopped, the target residency of the state selected
by the menu governor may be greater than the actual time to the next
tick and that means lost energy.

To avoid that, make tick_nohz_get_sleep_length() return the current
time to the next event (before stopping the tick) in addition to the
estimated one via an extra pointer argument and make menu_select()
use that value to refine the state selection when necessary.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2018-04-09 11:54:56 +02:00
Rafael J. Wysocki 45f1ff59e2 cpuidle: Return nohz hint from cpuidle_select()
Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.

Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return 'false" through
it if:
 (1) the idle exit latency is constrained at 0, or
 (2) the selected state is a polling one, or
 (3) the expected idle period duration is within the tick period
     range.

In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values.  To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_update()
behavior when tick wakeup occurs.  Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_update() try to compensate for that by updating
the governor statistics as though the CPU was idle for a long time.

Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2018-04-06 09:29:34 +02:00
Rafael J. Wysocki 0759e80b84 PM / QoS: Fix device resume latency framework
The special value of 0 for device resume latency PM QoS means
"no restriction", but there are two problems with that.

First, device resume latency PM QoS requests with 0 as the
value are always put in front of requests with positive
values in the priority lists used internally by the PM QoS
framework, causing 0 to be chosen as an effective constraint
value.  However, that 0 is then interpreted as "no restriction"
effectively overriding the other requests with specific
restrictions which is incorrect.

Second, the users of device resume latency PM QoS have no
way to specify that *any* resume latency at all should be
avoided, which is an artificial limitation in general.

To address these issues, modify device resume latency PM QoS to
use S32_MAX as the "no constraint" value and 0 as the "no
latency at all" one and rework its users (the cpuidle menu
governor, the genpd QoS governor and the runtime PM framework)
to follow these changes.

Also add a special "n/a" value to the corresponding user space I/F
to allow user space to indicate that it cannot accept any resume
latencies at all for the given device.

Fixes: 85dc0b8a40 (PM / QoS: Make it possible to expose PM QoS latency constraints)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Tero Kristo <t-kristo@ti.com>
Reviewed-by: Ramesh Thomas <ramesh.thomas@intel.com>
2017-11-08 12:14:51 +01:00
Rafael J. Wysocki dc2251bf98 cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
On some architectures the first (index 0) idle state is a polling
one and it doesn't really save energy, so there is the
CPUIDLE_DRIVER_STATE_START symbol allowing some pieces of
cpuidle code to avoid using that state.

However, this makes the code rather hard to follow.  It is better
to explicitly avoid the polling state, so add a new cpuidle state
flag CPUIDLE_FLAG_POLLING to mark it and make the relevant code
check that flag for the first state instead of using the
CPUIDLE_DRIVER_STATE_START symbol.

In the ACPI processor driver that cannot always rely on the state
flags (like before the states table has been set up) define
a new internal symbol ACPI_IDLE_STATE_START equivalent to the
CPUIDLE_DRIVER_STATE_START one and drop the latter.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-30 03:05:29 +02:00
Nicholas Piggin 3ed09c9458 cpuidle: menu: allow state 0 to be disabled
The menu driver does not allow state0 to be disabled completely.
If it is disabled but other enabled states don't meet latency
requirements, it is still used.

Fix this by starting with the first enabled idle state. Fall back
to state 0 if no idle states are enabled (arguably this should be
-EINVAL if it is attempted, but this is the minimal fix).

Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-29 22:59:17 +02:00
Linus Torvalds 1827adb11a Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched.h split-up from Ingo Molnar:
 "The point of these changes is to significantly reduce the
  <linux/sched.h> header footprint, to speed up the kernel build and to
  have a cleaner header structure.

  After these changes the new <linux/sched.h>'s typical preprocessed
  size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
  lines), which is around 40% faster to build on typical configs.

  Not much changed from the last version (-v2) posted three weeks ago: I
  eliminated quirks, backmerged fixes plus I rebased it to an upstream
  SHA1 from yesterday that includes most changes queued up in -next plus
  all sched.h changes that were pending from Andrew.

  I've re-tested the series both on x86 and on cross-arch defconfigs,
  and did a bisectability test at a number of random points.

  I tried to test as many build configurations as possible, but some
  build breakage is probably still left - but it should be mostly
  limited to architectures that have no cross-compiler binaries
  available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
  sched/headers: Clean up <linux/sched.h>
  sched/headers: Remove #ifdefs from <linux/sched.h>
  sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
  sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
  sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
  sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
  sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
  sched/core: Remove unused prefetch_stack()
  sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
  sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
  sched/headers: Remove <linux/signal.h> from <linux/sched.h>
  sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
  sched/headers: Remove the runqueue_is_locked() prototype
  sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
  sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
  sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
  ...
2017-03-03 10:16:38 -08:00
Ingo Molnar 03441a3482 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/stat.h>
We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/stat.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:34 +01:00
Ingo Molnar 4f17722c72 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h>
We are going to split <linux/sched/loadavg.h> out of <linux/sched.h>, which
will have to be picked up from a couple of .c files.

Create a trivial placeholder <linux/sched/topology.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:27 +01:00
Rafael J. Wysocki 6dbf5cea05 cpuidle: menu: Avoid taking spinlock for accessing QoS values
After commit 9908859aca (cpuidle/menu: add per CPU PM QoS resume
latency consideration) the cpuidle menu governor calls
dev_pm_qos_read_value() on CPU devices to read the current resume
latency QoS constraint values for them.  That function takes a spinlock
to prevent the device's power.qos pointer from becoming NULL during
the access which is a problem for the RT patchset where spinlocks are
converted into mutexes and the idle loop stops working.

However, it is not even necessary for the menu governor to take
that spinlock, because the power.qos pointer accessed under it
cannot be modified during the access anyway.

For this reason, introduce a "raw" routine for accessing device
QoS resume latency constraints without locking and use it in the
menu governor.

Fixes: 9908859aca (cpuidle/menu: add per CPU PM QoS resume latency consideration)
Acked-by: Alex Shi <alex.shi@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-27 15:07:38 +01:00
Alex Shi 9908859aca cpuidle/menu: add per CPU PM QoS resume latency consideration
There may be special requirements on CPU response time, like if a
interrupt is pinned to a CPU, that CPU should not go into excessively
deep idle states.  For this reason, add a mechanism for adding
PM QoS resume latency constraints for individual CPUs and modify the
menu governor to take them into account.

To that end, extend the device PM QoS pm_qos_resume_latency attribute
to CPUs, which is possible, because the exit latency for CPUs is
effectively equivalent to the resume latency for devices.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Acked-by: Rik van Riel <riel@redhat.com>
[ rjw : Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 11:03:32 +01:00
Alex Shi 8e37e1a2a3 cpuidle/menu: stop seeking deeper idle if current state is deep enough
Obsolete commit 71abbbf856 (cpuidle: extend cpuidle and menu governor
to handle dynamic states) wanted to introduce dynamic C-states, but that
idea was dropped long ago.  The nonsense deeper C-state checking
remained, though.

Since both target_residency and exit_latency are longer for deeper
idle state, there's no need to waste CPU time on useless checks.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Acked-by: Rik van Riel <riel@redhat.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 10:56:00 +01:00
Daniel Lezcano e5f1b24587 cpuidle: governors: Remove remaining old module code
The governor's code use try_module_get() and put_module() to refcount
the governor's module. But the governors are not compiled as module.

The refcount does not prevent to switch the governor or unload
a module as they aren't compiled as modules. The code is pointless,
so remove it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-21 14:49:51 +02:00
Rafael J. Wysocki 0c313cb207 cpuidle: menu: Fall back to polling if next timer event is near
Commit a9ceb78bc7 (cpuidle,menu: use interactivity_req to disable
polling) changed the behavior of the fallback state selection part
of menu_select() so it looks at interactivity_req instead of
data->next_timer_us when it makes its decision.  That effectively
caused polling to be used more often as fallback idle which led to
significant increases of energy consumption in some cases.

Commit e132b9b3bc (cpuidle: menu: use high confidence factors
only when considering polling) changed that logic again to be more
predictable, but that didn't help with the increased energy
consumption problem.

For this reason, go back to making decisions on which state to fall
back to based on data->next_timer_us which is the time we know for
sure something will happen rather than a prediction (which may be
inaccurate and turns out to be so often enough to be problematic).
However, take the target residency of the first proper idle state
(C1) into account, so that state is not used as the fallback one
if its target residency is greater than data->next_timer_us.

Fixes: a9ceb78bc7 (cpuidle,menu: use interactivity_req to disable polling)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-and-tested-by: Doug Smythies <dsmythies@telus.net>
2016-03-21 15:50:28 +01:00
Rik van Riel e132b9b3bc cpuidle: menu: use high confidence factors only when considering polling
The menu governor uses five different factors to pick the
idle state:
 - the user configured latency_req
 - the time until the next timer (next_timer_us)
 - the typical sleep interval, as measured recently
 - an estimate of sleep time by dividing next_timer_us by an observed factor
 - a load corrected version of the above, divided again by load

Only the first three items are known with enough confidence that
we can use them to consider polling, instead of an actual CPU
idle state, because the cost of being wrong about polling can be
excessive power use.

The latter two are used in the menu governor's main selection
loop, and can result in choosing a shallower idle state when
the system is expected to be busy again soon.

This pushes a busy system in the "performance" direction of
the performance<>power tradeoff, when choosing between idle
states, but stays more strictly on the "power" state when
deciding between polling and C1.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-17 02:40:32 +01:00
Rasmus Villemoes 3b99669b75 cpuidle: menu: help gcc generate slightly better code
We know that the avg variable actually ends up holding a 32 bit
quantity, since it's an average of such numbers. It is only a u64
because it is temporarily used to hold the sum. Making it an actual
u32 allows gcc to generate slightly better code, e.g. when computing
the square, it can do a 32x32->64 multiply.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-17 00:28:15 +01:00
Rasmus Villemoes 7024b18ca4 cpuidle: menu: avoid expensive square root computation
Computing the integer square root is a rather expensive operation, at
least compared to doing a 64x64 -> 64 multiply (avg*avg) and, on 64
bit platforms, doing an extra comparison to a constant (variance <=
U64_MAX/36).

On 64 bit platforms, this does mean that we add a restriction on the
range of the variance where we end up using the estimate (since
previously the stddev <= ULONG_MAX was a tautology), but on the other
hand, we extend the range quite substantially on 32 bit platforms - in
both cases, we now allow standard deviations up to 715 seconds, which
is for example guaranteed if all observations are less than 1430
seconds.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-17 00:27:16 +01:00
Rafael J. Wysocki 5bb1729cbd cpuidle: menu: Avoid pointless checks in menu_select()
If menu_select() cannot find a suitable state to return, it will
return the state index stored in data->last_state_idx.  This
means that it is pointless to look at the states whose indices
are less than or equal to data->last_state_idx in the main loop,
so don't do that.

Given that those checks are done on every idle state selection, this
change can save quite a bit of completely unnecessary overhead.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
2016-01-19 15:28:23 +01:00
Rafael J. Wysocki 9c4b2867ed cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0
Commit a9ceb78bc7 (cpuidle,menu: use interactivity_req to disable
polling) exposed a bug in menu_select() causing it to return -1
on systems with CPUIDLE_DRIVER_STATE_START equal to zero, although
it should have returned 0.  As a result, idle states are not entered
by CPUs on those systems.

Namely, on the systems in question data->last_state_idx is initially
equal to -1 and the above commit modified the condition that would
have caused it to be changed to 0 to be less likely to trigger which
exposed the problem.  However, setting data->last_state_idx initially
to -1 doesn't make sense at all and on the affected systems it should
always be set to CPUIDLE_DRIVER_STATE_START (ie. 0) unconditionally,
so make that happen.

Fixes: a9ceb78bc7 (cpuidle,menu: use interactivity_req to disable polling)
Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-14 23:24:22 +01:00
Rik van Riel efddfd90fb cpuidle,menu: smooth out measured_us calculation
The cpuidle state tables contain the maximum exit latency for each
cpuidle state. On x86, that is the exit latency for when the entire
package goes into that same idle state.

However, a lot of the time we only go into the core idle state,
not the package idle state. This means we see a much smaller exit
latency.

We have no way to detect whether we went into the core or package
idle state while idle, and that is ok.

However, the current menu_update logic does have the potential to
trip up the repeating pattern detection in get_typical_interval.
If the system is experiencing an exit latency near the idle state's
exit latency, some of the samples will have exit_us subtracted,
while others will not. This turns a repeating pattern into mush,
potentially breaking get_typical_interval.

Furthermore, for smaller sleep intervals, we know the chance that
all the cores in the package went to the same idle state are fairly
small. Dividing the measured_us by two, instead of subtracting the
full exit latency when hitting a small measured_us, will reduce the
error.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-17 02:24:25 +01:00
Rik van Riel a9ceb78bc7 cpuidle,menu: use interactivity_req to disable polling
The menu governor carefully figures out how much time we typically
sleep for an estimated sleep interval, or whether there is a repeating
pattern going on, and corrects that estimate for the CPU load.

Then it proceeds to ignore that information when determining whether
or not to consider polling. This is not a big deal on most x86 CPUs,
which have very low C1 latencies, and the patch should not have any
effect on those CPUs.

However, certain CPUs (eg. Atom) have much higher C1 latencies, and
it would be good to not waste performance and power on those CPUs if
we are expecting a very low wakeup latency.

Disable polling based on the estimated interactivity requirement, not
on the time to the next timer interrupt.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-17 02:24:25 +01:00
Rik van Riel 7884084f3b cpuidle,x86: increase forced cut-off for polling to 20us
The cpuidle menu governor has a forced cut-off for polling at 5us,
in order to deal with firmware that gives the OS bad information
on cpuidle states, leading to the system spending way too much time
in polling.

However, at least one x86 CPU family (Atom) has chips that have
a 20us break-even point for C1. Forcing the polling cut-off to
less than that wastes performance and power.

Increase the polling cut-off to 20us.

Systems with a lower C1 latency will be found in the states table by
the menu governor, which will pick those states as appropriate.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-17 02:24:24 +01:00
Rafael J. Wysocki a802ea9645 cpuidle: Check the sign of index in cpuidle_reflect()
Avoid calling the governor's ->reflect method if the state index
passed to cpuidle_reflect() is negative.

This allows the analogous check to be dropped from menu_reflect(),
so do that too, and ensures that arbitrary error codes can be
passed to cpuidle_reflect() as the index with no adverse
consequences.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-05-04 22:53:28 +02:00
Javi Merino ee3c86f356 cpuidle: menu: use DIV_ROUND_CLOSEST_ULL()
Now that the kernel provides DIV_ROUND_CLOSEST_ULL(), drop the internal
implementation and use the kernel one.

Signed-off-by: Javi Merino <javi.merino@arm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:03:55 -04:00
Len Brown 4108b3d962 cpuidle: menu: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID
When menu sees CPUIDLE_FLAG_TIME_INVALID, it ignores its timestamps,
and assumes that idle lasted as long as the time till next predicted
timer expiration.

But if an interrupt was seen and serviced before that duration,
it would actually be more accurate to use the measured time
rather than rounding up to the next predicted timer expiration.

And if an interrupt is seen and serviced such that the mesured time
exceeds the time till next predicted timer expiration, then
truncating to that expiration is the right thing to do --
since we can never stay idle past that timer expiration.

So the code can do a better job without
checking for CPUIDLE_FLAG_TIME_INVALID.

Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-17 02:26:28 +01:00
Daniel Lezcano b82b6cca48 cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic
The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
method is not set. Otherwise for all the drivers, the time can be correctly
measured.

Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
for all the states, just invert the logic by replacing it by the flag
CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle
driver, remove the former flag from all the drivers and invert the logic with
this flag in the different governor.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 21:17:27 +01:00
Christoph Lameter 229b6863b2 drivers/cpuidle: Replace __get_cpu_var uses for address calculation
All of these are for address calculation. Replace with
this_cpu_ptr().

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-pm@vger.kernel.org
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
[cpufreq changes]
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:45 -04:00
Linus Torvalds c9d26423e5 More ACPI and power management updates for 3.17-rc1
- Fix for an ACPI-based device hotplug regression introduced in 3.14
    that causes a kernel panic to trigger when memory hot-remove is
    attempted with CONFIG_ACPI_HOTPLUG_MEMORY unset from Tang Chen.
 
  - Fix for a cpufreq regression introduced in 3.16 that triggers a
    "sleeping function called from invalid context" bug in
    dev_pm_opp_init_cpufreq_table() from Stephen Boyd.
 
  - ACPI battery driver fix for a warning message added in 3.16 that
    prints silly stuff sometimes from Mariusz Ceier.
 
  - Hibernation fix for safer handling of mismatches in the 820 memory
    map between the configurations during image creation and during
    the subsequent restore from Chun-Yi Lee.
 
  - ACPI processor driver fix to handle CPU hotplug notifications
    correctly during system suspend/resume from Lan Tianyu.
 
  - Series of four cpuidle menu governor cleanups that also should
    speed it up a bit from Mel Gorman.
 
  - Fixes for the speedstep-smi, integrator, cpu0 and arm_big_little
    cpufreq drivers from Hans Wennborg, Himangi Saraogi, Markus Pargmann
    and Uwe Kleine-König.
 
  - Version 3.0 of the analyze_suspend.py suspend profiling tool
    from Todd E Brandt.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJT7UnNAAoJEILEb/54YlRxcxIP/ROFeak3+5tt3hkvZCevxpUh
 AMPccgUoqsF2dognO3pcR4AgGP+meM6Qw0zBjPDNx6oo87hw7P1HlngfaRPHnWPh
 iAkY2p1QhGAZW29vqxqBIdLVP9M+Nje0tvOX8/6QEsQgo2y6YCbJU0zITmvb8Tsk
 183cXiz6xXDezt4sPeIVg2QVfngVFtOeNVgHDIhldQSF6zUQJP/3+BVutvaj3olt
 2O3qpNfwJjFh9p6LWQ+CAalq3hJyNZ6ettLNCvudeq4kqRo49WAdjHaRW+qju/NR
 dWybO29MfviczABVQ1ReqSnz0MJOqhZNxkEi5KqnYBb3fx8e2XffsBFzFzTp6BJi
 bp4ALcFIu9r5ctWVxQhmgEC6uhYMIXZ681sH99HyIdzk2cNRgMxRj6u2aVe/Cczu
 Bb489CRHmOrZyXrkmENg+LkOYBNoXcT+RepH9Ex8R+TNBlKLEBKMMgPrfbFeVKWB
 Vm621tHNATJG8nJcs3zJulM2FQ0q8c2irw6WwhUxzbSOxmqSvO5zN3OgYt+c+gWk
 MmA8IhUpQBLkqBx1FMi0lOOdIW3qKZJFrU39VQEjoP4P1nXgf373NPlfgzMvEvqM
 qQ8srMKFUjYxH3g0ftWk5a2MwEjyHQpvZe0djsMCN7ZkFLwUe1ri/R9Ja2LLQcIZ
 SyVkFbbO+moXTRMA1yA9
 =kpiw
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI and power management updates from Rafael Wysocki:
 "These are a couple of regression fixes, cpuidle menu governor
  optimizations, fixes for ACPI proccessor and battery drivers,
  hibernation fix to avoid problems related to the e820 memory map,
  fixes for a few cpufreq drivers and a new version of the suspend
  profiling tool analyze_suspend.py.

  Specifics:

   - Fix for an ACPI-based device hotplug regression introduced in 3.14
     that causes a kernel panic to trigger when memory hot-remove is
     attempted with CONFIG_ACPI_HOTPLUG_MEMORY unset from Tang Chen

   - Fix for a cpufreq regression introduced in 3.16 that triggers a
     "sleeping function called from invalid context" bug in
     dev_pm_opp_init_cpufreq_table() from Stephen Boyd

   - ACPI battery driver fix for a warning message added in 3.16 that
     prints silly stuff sometimes from Mariusz Ceier

   - Hibernation fix for safer handling of mismatches in the 820 memory
     map between the configurations during image creation and during the
     subsequent restore from Chun-Yi Lee

   - ACPI processor driver fix to handle CPU hotplug notifications
     correctly during system suspend/resume from Lan Tianyu

   - Series of four cpuidle menu governor cleanups that also should
     speed it up a bit from Mel Gorman

   - Fixes for the speedstep-smi, integrator, cpu0 and arm_big_little
     cpufreq drivers from Hans Wennborg, Himangi Saraogi, Markus
     Pargmann and Uwe Kleine-König

   - Version 3.0 of the analyze_suspend.py suspend profiling tool from
     Todd E Brandt"

* tag 'pm+acpi-3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / battery: Fix warning message in acpi_battery_get_state()
  PM / tools: analyze_suspend.py: update to v3.0
  cpufreq: arm_big_little: fix module license spec
  cpufreq: speedstep-smi: fix decimal printf specifiers
  ACPI / hotplug: Check scan handlers in acpi_scan_hot_remove()
  cpufreq: OPP: Avoid sleeping while atomic
  cpufreq: cpu0: Do not print error message when deferring
  cpufreq: integrator: Use set_cpus_allowed_ptr
  PM / hibernate: avoid unsafe pages in e820 reserved regions
  ACPI / processor: Make acpi_cpu_soft_notify() process CPU FROZEN events
  cpuidle: menu: Lookup CPU runqueues less
  cpuidle: menu: Call nr_iowait_cpu less times
  cpuidle: menu: Use ktime_to_us instead of reinventing the wheel
  cpuidle: menu: Use shifts when calculating averages where possible
2014-08-14 18:13:46 -06:00
Linus Torvalds 158c12948f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree changes from Jiri Kosina:
 "Summer edition of trivial tree updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: fix two typos in watchdog-api.txt
  irq-gic: remove file name from heading comment
  MAINTAINERS: Add miscdevice.h to file list for char/misc drivers.
  scsi: mvsas: mv_sas.c: Fix for possible null pointer dereference
  doc: replace "practise" with "practice" in Documentation
  befs: remove check for CONFIG_BEFS_RW
  scsi: doc: fix 'SCSI_NCR_SETUP_MASTER_PARITY'
  drivers/usb/phy/phy.c: remove a leading space
  mfd: fix comment
  cpuidle: fix comment
  doc: hpfall.c: fix missing null-terminate after strncpy call
  usb: doc: hotplug.txt code typos
  kbuild: fix comment in Makefile.modinst
  SH: add proper prompt to SH_MAGIC_PANEL_R2_VERSION
  ARM: msm: Remove MSM_SCM
  crypto: Remove MPILIB_EXTRA
  doc: CN: remove dead link, kerneltrap.org no longer works
  media: update reference, kerneltrap.org no longer works
  hexagon: update reference, kerneltrap.org no longer works
  doc: LSM: update reference, kerneltrap.org no longer works
  ...
2014-08-06 21:03:53 -07:00
Mel Gorman 372ba8cb46 cpuidle: menu: Lookup CPU runqueues less
The menu governer makes separate lookups of the CPU runqueue to get
load and number of IO waiters but it can be done with a single lookup.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-08-06 21:17:45 +02:00
Mel Gorman 64b4ca5cb6 cpuidle: menu: Call nr_iowait_cpu less times
menu_select() via inline functions calls nr_iowait_cpu() twice as much
as necessary.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-08-06 21:17:44 +02:00
Mel Gorman 107d4f4601 cpuidle: menu: Use ktime_to_us instead of reinventing the wheel
The ktime_to_us implementation is slightly better than the one implemented
in menu.c. Use it

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-08-06 21:17:44 +02:00