Commit graph

841 commits

Author SHA1 Message Date
Linus Torvalds 87093826aa Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes from Ingo Molnar:
 "Main changes in this cycle were:

   - Updated full dynticks support.

   - Event stream support for architected (ARM) timers.

   - ARM clocksource driver updates.

   - Move arm64 to using the generic sched_clock framework & resulting
     cleanup in the generic sched_clock code.

   - Misc fixes and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
  clocksource: sun4i: remove IRQF_DISABLED
  clocksource: sun4i: Report the minimum tick that we can program
  clocksource: sun4i: Select CLKSRC_MMIO
  clocksource: Provide timekeeping for efm32 SoCs
  clocksource: em_sti: convert to clk_prepare/unprepare
  time: Fix signedness bug in sysfs_get_uname() and its callers
  timekeeping: Fix some trivial typos in comments
  alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
  clocksource: arch_timer: Do not register arch_sys_counter twice
  timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
  sched_clock: Remove sched_clock_func() hook
  arch_timer: Move to generic sched_clock framework
  clocksource: tcb_clksrc: Remove IRQF_DISABLED
  clocksource: tcb_clksrc: Improve driver robustness
  clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare
  clocksource: arm_arch_timer: Use clocksource for suspend timekeeping
  clocksource: dw_apb_timer_of: Mark a few more functions as __init
  clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally
  arm: zynq: Enable arm_global_timer
  ...
2013-11-12 10:36:00 +09:00
Thomas Gleixner 97b9410643 clockevents: Sanitize ticks to nsec conversion
Marc Kleine-Budde pointed out that commit 77cc982 "clocksource: use
clockevents_config_and_register() where possible" caused a regression
for some of the converted subarchs.

The reason is that the clockevents core code converts the minimal
hardware tick delta to a nanosecond value for core internal
usage. This conversion is affected by integer math rounding loss, so
the backwards conversion to hardware ticks will likely result in a
value which is less than the configured hardware limitation. The
affected subarchs used their own workaround (SIGH!) which got lost in
the conversion.

The solution for the issue at hand is simple: adding evt->mult - 1 to
the shifted value before the integer division in the core conversion
function takes care of it. But this only works for the case where for
the scaled math mult/shift pair "mult <= 1 << shift" is true. For the
case where "mult > 1 << shift" we can apply the rounding add only for
the minimum delta value to make sure that the backward conversion is
not less than the given hardware limit. For the upper bound we need to
omit the rounding add, because the backwards conversion is always
larger than the original latch value. That would violate the upper
bound of the hardware device.

Though looking closer at the details of that function reveals another
bogosity: The upper bounds check is broken as well. Checking for a
resulting "clc" value greater than KTIME_MAX after the conversion is
pointless. The conversion does:

      u64 clc = (latch << evt->shift) / evt->mult;

So there is no sanity check for (latch << evt->shift) exceeding the
64bit boundary. The latch argument is "unsigned long", so on a 64bit
arch the handed in argument could easily lead to an unnoticed shift
overflow. With the above rounding fix applied the calculation before
the division is:

       u64 clc = (latch << evt->shift) + evt->mult - 1;

So we need to make sure that neither the shift nor the rounding add
is overflowing the u64 boundary.
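
As a rough illustration of the rounding and overflow handling described
above, the following user-space sketch models the conversion. The names
ticks_to_ns() and ns_to_ticks() and the plain integer parameters are
assumptions made for this example; the real function works on the
clockevent device structure and may differ in detail (for instance, any
clamping of very small results is left out here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: convert hardware ticks to nanoseconds, rounding up
 * where that is safe, and saturating instead of overflowing. */
static uint64_t ticks_to_ns(uint64_t latch, uint32_t mult, uint32_t shift,
			    bool is_max)
{
	uint64_t clc, rnd;

	assert(mult != 0 && shift < 64);
	clc = latch << shift;
	rnd = (uint64_t)mult - 1;

	/* If shifting back does not reproduce latch, the shift overflowed. */
	if ((clc >> shift) != latch)
		clc = UINT64_MAX;

	/*
	 * Apply the rounding add only if it cannot overflow. For the upper
	 * bound it is applied only when mult <= 1 << shift; otherwise the
	 * backward conversion could exceed the hardware limit.
	 */
	if ((UINT64_MAX - clc > rnd) &&
	    (!is_max || (uint64_t)mult <= ((uint64_t)1 << shift)))
		clc += rnd;

	return clc / mult;
}

/* Hypothetical backward conversion: nanoseconds to hardware ticks. */
static uint64_t ns_to_ticks(uint64_t ns, uint32_t mult, uint32_t shift)
{
	return (ns * mult) >> shift;
}
```

With mult = 3 and shift = 2, a minimum delta of 5 ticks converts to 7 ns
and back to 5 ticks, whereas the unrounded 6 ns would convert back to
only 4 ticks, below the hardware limit.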

[ukl: move assignment to rnd after eventually changing mult, fix build
 issue and correct comment with the right math]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: nicolas.ferre@atmel.com
Cc: Marc Pignat <marc.pignat@hevs.ch>
Cc: john.stultz@linaro.org
Cc: kernel@pengutronix.de
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
2013-10-23 12:51:21 +02:00
Patrick Palka 891292a767 time: Fix signedness bug in sysfs_get_uname() and its callers
sysfs_get_uname() is erroneously declared as returning size_t even
though it may return a negative value, specifically -EINVAL.  Its
callers then check whether its return value is less than zero and indeed
that is never the case for size_t.

This patch changes sysfs_get_uname() to return ssize_t and makes sure
its callers use ssize_t accordingly.
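
The effect of the return type can be sketched in user space as follows.
This is a simplified model, not the kernel code: get_uname(),
store_name(), NAME_LEN and current_name are names invented for this
example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

#define NAME_LEN 32	/* hypothetical buffer size for this sketch */

static char current_name[NAME_LEN] = "default";

/* Copy a name of cnt bytes, stripping one trailing newline.
 * Returns cnt on success or -EINVAL, hence the signed return type. */
static ssize_t get_uname(const char *buf, char *dst, size_t cnt)
{
	size_t len = cnt;

	assert(buf != NULL && dst != NULL);
	if (cnt == 0 || cnt >= NAME_LEN)
		return -EINVAL;
	if (buf[len - 1] == '\n')
		len--;
	memcpy(dst, buf, len);
	dst[len] = '\0';
	return (ssize_t)cnt;
}

static ssize_t store_name(const char *buf, size_t cnt)
{
	char tmp[NAME_LEN];
	ssize_t ret = get_uname(buf, tmp, cnt);	/* signed: the error survives */

	if (ret < 0)	/* with a size_t 'ret' this test could never be true */
		return ret;
	strcpy(current_name, tmp);
	return ret;
}
```

If ret were declared as size_t, -EINVAL would be converted to a very
large positive value and the failure path would be skipped.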

Signed-off-by: Patrick Palka <patrick@parcs.ath.cx>
[jstultz: Didn't apply cleanly, as a similar partial fix was also applied
so had to resolve the collisions]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-18 16:45:58 -07:00
Xie XiuQi b7bc50e451 timekeeping: Fix some trivial typos in comments
Fix some typos in timekeeping comments.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[jstultz: Commit message tweaks]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-18 16:30:17 -07:00
KOSAKI Motohiro 98d6f4dd84 alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
The Fedora Ruby maintainer reported that the latest Ruby doesn't work on
Fedora Rawhide on ARM. (http://bugs.ruby-lang.org/issues/9008)

This is because commit 1c6b39ad3f (alarmtimers: Return -ENOTSUPP if no
RTC device is present) made clock_get{time,res} return ENOTSUPP when
no RTC device can be found. However, this is incorrect.

First, ENOTSUPP isn't exported to userland (ENOTSUP or EOPNOTSUPP are the
closest userland equivalents).

Second, POSIX and Linux man pages agree that clock_gettime and
clock_getres should return EINVAL if the clk_id argument is invalid.
While the argument could be made that the clockid is valid but just not
supported on this hardware, this is just a technicality that
doesn't help userspace applications, and only complicates error
handling.

Thus, this patch changes the code to use EINVAL.
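
From the userspace side, the simpler error handling might look like the
sketch below. The helpers probe_clock() and read_preferred_clock() are
hypothetical names for this example, and the fallback to CLOCK_REALTIME
is only one possible policy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Returns 0 if the clock could be read, otherwise the errno value. */
static int probe_clock(clockid_t id, struct timespec *now)
{
	assert(now != NULL);
	if (clock_gettime(id, now) == 0)
		return 0;
	return errno;
}

/* Try the preferred clock; on EINVAL (invalid or unavailable clock id)
 * fall back to CLOCK_REALTIME. */
static int read_preferred_clock(clockid_t preferred, struct timespec *now)
{
	int err = probe_clock(preferred, now);

	if (err == EINVAL)
		err = probe_clock(CLOCK_REALTIME, now);
	return err;
}
```

An application only has to test for the documented EINVAL, rather than
for a kernel-internal error number it cannot name.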

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: stable <stable@vger.kernel.org>  #3.0 and up
Reported-by: Vit Ondruch <v.ondruch@tiscali.cz>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
[jstultz: Tweaks to commit message to include full rationale]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-18 16:23:58 -07:00
Dong Zhu 2cb763614c timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
We can enable/disable timer statistics collection via:

  echo [1|0] > /proc/timer_stats

and it would be nice if apps had the ability to check
what the current collection status is.

This patch adds a 'Collection: active/inactive' line to display the
current timer collection status.

Also bump up the timer stats version to v0.3.
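
An application could check the status line roughly as sketched below.
The function name timer_stats_collection_state() is made up for this
example, and the exact layout of the surrounding output is an
assumption; only the 'Collection: active/inactive' wording comes from
this patch.

```c
#include <assert.h>
#include <string.h>

/* Scan the text line by line for "Collection: active|inactive".
 * Returns 1 for active, 0 for inactive, -1 if no such line is found. */
static int timer_stats_collection_state(const char *text)
{
	static const char key[] = "Collection: ";
	const size_t klen = sizeof(key) - 1;
	const char *p = text;

	assert(text != NULL);
	while (p && *p) {
		if (strncmp(p, key, klen) == 0) {
			const char *v = p + klen;

			if (strncmp(v, "active", 6) == 0 &&
			    (v[6] == '\n' || v[6] == '\0'))
				return 1;
			if (strncmp(v, "inactive", 8) == 0 &&
			    (v[8] == '\n' || v[8] == '\0'))
				return 0;
			return -1;
		}
		p = strchr(p, '\n');	/* advance to the next line */
		if (p)
			p++;
	}
	return -1;
}
```

Output without the line yields -1, so the caller can tell "status
unknown" apart from "inactive".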

Signed-off-by: Dong Zhu <bluezhudong@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20131010075618.GH2139@zhudong.nay.redhat.com
[ Improved the changelog and the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-10 09:59:25 +02:00
Ingo Molnar 8a749de5e3 Merge branch 'fortglx/3.13/time' of git://git.linaro.org/people/jstultz/linux into timers/core
Pull more timekeeping items for v3.13 from John Stultz:

  * Small cleanup in the clocksource code.

  * Fix for rtc-pl031 to let it work with alarmtimers.

  * Move arm64 to using the generic sched_clock framework & resulting
    cleanup in the generic sched_clock code.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-10 06:25:23 +02:00
Stephen Boyd b4042ceaab sched_clock: Remove sched_clock_func() hook
Nobody is using sched_clock_func() anymore now that sched_clock
supports up to 64 bits. Remove the hook so that new code only
uses sched_clock_register().

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-09 16:54:39 -07:00
Ingo Molnar 68e9074028 Merge branch 'clockevents/3.13' of git://git.linaro.org/people/dlezcano/linux into timers/core
Pull (mostly) ARM clocksource driver updates from Daniel Lezcano:

" - Soren Brinkmann added FEAT_PERCPU to a clock device when it is local
    per cpu. This feature prevents the clock framework from choosing a per cpu
    timer as a broadcast timer. This problem arose when the ARM global
    timer is used when switching to the broadcast timer, which is the case
    now on Xilinx with its cpuidle driver.

  - Stephen Boyd extended the generic sched_clock code to support 64bit
    counters and removed the setup_sched_clock deprecation, as that causes
    lots of warnings since there are still users in the arch/arm tree. He
    also added the CLOCK_SOURCE_SUSPEND_NONSTOP flag on the architected
    timer, as it continues counting during suspend.

  - Uwe Kleine-König added some missing __init sections and consolidated the
    code by moving the of_node_put call from the drivers to the function
    clocksource_of_init. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-03 07:57:02 +02:00
Ingo Molnar 19f29887a7 Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core
Merge updated full dynticks support from Frederic Weisbecker:

   - support 32-bit systems (full dynticks was 64-bit only before)
   - support ARM

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-03 07:53:25 +02:00
Ingo Molnar 6c09f6d830 Linux 3.12-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSSKOHAAoJEHm+PkMAQRiGeREH/3EqHmJPBzmVoJwR9/ykDoLg
 u+TJTkuxZG220WhgXS7W/0ECyBX0U7yA0bY9PZbqgcdiLjY0veR18/pOhEq5RzHq
 ub8Q+AJdiORF/sq268q7gnNmy3rSCgnrAyHA/bzBtkbisYODwZPYvWQVUjgNZ2dW
 qtW/TE9rjANcUrk8WdOu9oWcwsq4cyG3cscbfHE/JLFy/8tB5GoD158gxKLZsLXk
 uTCeUHMmvFRT56fZwfyvNstA8ozxXcHBmuu6+Ttceky2zeGzp6dOrd+d2SU1Ps3O
 P91x4e/Af4RFEwDczGP6TpSBEf/J/JaqrM1drjhnQHho0hrNRZVUXhADFVADCXY=
 =dOjB
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc3' into timers/core

Merge Linux 3.12-rc3 - refresh the tree with the latest fixes before merging new bits.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-03 07:52:21 +02:00
Soren Brinkmann 245a349626 tick: broadcast: Deny per-cpu clockevents from being broadcast sources
On most ARM systems the per-cpu clockevents are truly per-cpu in
the sense that they can't be controlled on any other CPU besides
the CPU that they interrupt. If one of these clockevents were to
become a broadcast source we will run into a lot of trouble
because the broadcast source is enabled on the first CPU to go
into deep idle (if that CPU suffers from FEAT_C3_STOP) and that
could be a different CPU than what the clockevent is interrupting
(or even worse the CPU that the clockevent interrupts could be
offline).

Theoretically it's possible to support per-cpu clockevents as the
broadcast source but so far we haven't needed this and supporting
it is rather complicated. Let's just deny the possibility for now
until this becomes a reality (let's hope it never does!).

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Michal Simek <michal.simek@xilinx.com>
2013-10-02 11:34:06 +02:00
Kevin Hilman ff3fb25412 nohz: Drop generic vtime obsolete dependency on CONFIG_64BIT
The CONFIG_64BIT requirement on vtime can finally be removed
since we now depend on HAVE_VIRT_CPU_ACCOUNTING_GEN which
already takes care of the arch ability to handle nsecs based
cputime_t safely.

Signed-off-by: Kevin Hilman <khilman@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Arm Linux <linux-arm-kernel@lists.infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-09-30 15:37:01 +02:00
Kevin Hilman 554b0004d0 vtime: Add HAVE_VIRT_CPU_ACCOUNTING_GEN Kconfig
With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. In order
to use that feature, arch code should be audited to ensure there are no
races in concurrent read/write of cputime_t. For example,
reading/writing 64-bit cputime_t on some 32-bit arches may require
multiple accesses for low and high value parts, so proper locking
is needed to protect against concurrent accesses.

Therefore, add CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN which arches can
enable after they've been audited for potential races.

This option is automatically enabled on 64-bit platforms.

Feature requested by Frederic Weisbecker.

Signed-off-by: Kevin Hilman <khilman@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Arm Linux <linux-arm-kernel@lists.infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-09-30 15:35:53 +02:00
Linus Torvalds 9d2cd7048b Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "An NTP related lockup fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix HRTICK related deadlock from ntp lock changes
2013-09-18 11:24:49 -05:00
Elad Wexler 233bcb411c clocksource: Fix 'ret' data type of sysfs_override_clocksource() and sysfs_unbind_clocksource()
sysfs_override_clocksource(): The expression 'if (ret >= 0)' is always true.
This will cause clocksource_select() to always run.
Thus modified ret to be of type ssize_t.

sysfs_unbind_clocksource(): The expression 'if (ret < 0)' is always false.
So if sysfs_get_uname() fails, the check never takes effect.
Thus modified ret to be of type ssize_t.

Signed-off-by: Elad Wexler <elad.wexler@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-09-17 11:19:27 -07:00
John Stultz 389e067032 Merge branch 'fortglx/3.12/time' into fortglx/3.13/time
Merge in the timekeeping changes that missed 3.12

Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-09-16 18:54:07 -07:00
John Stultz 19c3205cea Merge branch 'fortglx/3.12/sched-clock64-base' into fortglx/3.13/time
Merge in 64bit sched_clock support that missed 3.12.

Conflicts:
	kernel/time/sched_clock.c

Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-09-16 18:52:52 -07:00
John Stultz 7bd3601446 timekeeping: Fix HRTICK related deadlock from ntp lock changes
Gerlando Falauto reported that when HRTICK is enabled, it is
possible to trigger system deadlocks. These were hard to
reproduce, as HRTICK has been broken in the past, but seemed
to be connected to the timekeeping_seq lock.

Since seqlock/seqcount's aren't supported w/ lockdep, I added
some extra spinlock based locking and triggered the following
lockdep output:

[   15.849182] ntpd/4062 is trying to acquire lock:
[   15.849765]  (&(&pool->lock)->rlock){..-...}, at: [<ffffffff810aa9b5>] __queue_work+0x145/0x480
[   15.850051]
[   15.850051] but task is already holding lock:
[   15.850051]  (timekeeper_lock){-.-.-.}, at: [<ffffffff810df6df>] do_adjtimex+0x7f/0x100

<snip>

[   15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
[   15.850051]  Possible unsafe locking scenario:
[   15.850051]
[   15.850051]        CPU0                    CPU1
[   15.850051]        ----                    ----
[   15.850051]   lock(timekeeper_lock);
[   15.850051]                                lock(&p->pi_lock);
[   15.850051]                                lock(timekeeper_lock);
[   15.850051]   lock(&(&pool->lock)->rlock);
[   15.850051]
[   15.850051]  *** DEADLOCK ***

The deadlock was introduced by 06c017fdd4 ("timekeeping:
Hold timekeepering locks in do_adjtimex and hardpps") in 3.10

This patch avoids this deadlock, by moving the call to
schedule_delayed_work() outside of the timekeeper lock
critical section.
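
The general pattern (record the request under the lock, act on it after
unlocking) can be modelled as below. This is a single-threaded sketch
with invented names (struct model_lock, queue_sync_work(),
adjust_time()); it stands in for, and is not, the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

struct model_lock { bool held; };	/* hypothetical stand-in for the timekeeper lock */

static struct model_lock tk_lock;
static int work_queued;

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Stand-in for queueing delayed work: it takes other locks internally,
 * so it must never run while tk_lock is held. */
static void queue_sync_work(void)
{
	assert(!tk_lock.held);
	work_queued++;
}

static void adjust_time(bool need_sync)
{
	bool queue = false;

	model_lock_acquire(&tk_lock);
	/* ... update timekeeping state here ... */
	if (need_sync)
		queue = true;	/* only remember the request */
	model_lock_release(&tk_lock);

	if (queue)
		queue_sync_work();	/* now outside the critical section */
}
```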

Reported-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Tested-by: Lin Ming <minggr@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable <stable@vger.kernel.org> #3.11, 3.10
Link: http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 07:49:51 +02:00
Linus Torvalds 6832d9652f Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers/nohz changes from Ingo Molnar:
 "It mostly contains fixes and full dynticks off-case optimizations, by
  Frederic Weisbecker"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  nohz: Include local CPU in full dynticks global kick
  nohz: Optimize full dynticks's sched hooks with static keys
  nohz: Optimize full dynticks state checks with static keys
  nohz: Rename a few state variables
  vtime: Always debug check snapshot source _before_ updating it
  vtime: Always scale generic vtime accounting results
  vtime: Optimize full dynticks accounting off case with static keys
  vtime: Describe overriden functions in dedicated arch headers
  m68k: hardirq_count() only need preempt_mask.h
  hardirq: Split preempt count mask definitions
  context_tracking: Split low level state headers
  vtime: Fix racy cputime delta update
  vtime: Remove a few unneeded generic vtime state checks
  context_tracking: User/kernel broundary cross trace events
  context_tracking: Optimize context switch off case with static keys
  context_tracking: Optimize guest APIs off case with static key
  context_tracking: Optimize main APIs off case with static key
  context_tracking: Ground setup for static key use
  context_tracking: Remove full dynticks' hacky dependency on wide context tracking
  nohz: Only enable context tracking on full dynticks CPUs
  ...
2013-09-04 09:36:54 -07:00
Ingo Molnar 7d992feb76 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

"
 * Update RCU documentation.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/611.

 * Miscellaneous fixes.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/619.

 * Full-system idle detection.  This is for use by Frederic
   Weisbecker's adaptive-ticks mechanism.  Its purpose is
   to allow the timekeeping CPU to shut off its tick when
   all other CPUs are idle.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/648.

 * Improve rcutorture test coverage.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/675.
"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-03 07:41:11 +02:00
Paul E. McKenney 0edd1b1784 nohz_full: Add full-system-idle state machine
This commit adds the state machine that takes the per-CPU idle data
as input and produces a full-system-idle indication as output.  This
state machine is driven out of RCU's quiescent-state-forcing
mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU
idle state and then rcu_sysidle_report() to drive the state machine.

The full-system-idle state is sampled using rcu_sys_is_idle(), which
also drives the state machine if RCU is idle (and does so by forcing
RCU to become non-idle).  This function returns true if all but the
timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long
enough to avoid memory contention on the full_sysidle_state state
variable.  The rcu_sysidle_force_exit() may be called externally
to reset the state machine back into non-idle state.

For large systems the state machine is driven out of RCU's
force-quiescent-state logic, which provides good scalability at the price
of millisecond-scale latencies on the transition to full-system-idle
state.  This is not so good for battery-powered systems, which are usually
small enough that they don't need to care about scalability, but which
do care deeply about energy efficiency.  Small systems therefore drive
the state machine directly out of the idle-entry code.  The number of
CPUs in a "small" system is defined by a new NO_HZ_FULL_SYSIDLE_SMALL
Kconfig parameter, which defaults to 8.  Note that this is a build-time
definition.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
[ paulmck: Use true and false for boolean constants per Lai Jiangshan. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
[ paulmck: Simplify logic and provide better comments for memory barriers,
  based on review comments and questions by Lai Jiangshan. ]
2013-08-31 14:43:50 -07:00
Nathan Zimmer 84a78a6504 timer_list: correct the iterator for timer_list
Correct an issue with /proc/timer_list reported by Holger.

When reading from the proc file with a sufficiently small buffer (2k, so
not really that small), one could get hung trying to read the file a
chunk at a time.

The timer_list_start function failed to account for the possibility that
the offset was adjusted outside the timer_list_next.

Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Reported-by: Holger Hans Peter Freyther <holger@freyther.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Berke Durak <berke.durak@xiphos.com>
Cc: Jeff Layton <jlayton@redhat.com>
Tested-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> # 3.10.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-28 19:26:38 -07:00
Miroslav Lichvar a97ad0c4b4 ntp: Make periodic RTC update more reliable
The current code requires that the scheduled update of the RTC happens
in the closest tick to the half of the second. This seems to be
difficult to achieve reliably. The scheduled work may be missing the
target time by a tick or two and be constantly rescheduled every second.

Relax the limit to 10 ticks. As a typical RTC drifts in the 11-minute
update interval by several milliseconds, this shouldn't affect the
overall accuracy of the RTC much.
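
The check can be pictured as a tolerance around the half-second mark.
The sketch below is an assumption-laden model: near_half_second() is a
made-up name, and the tolerance is passed in as a parameter because the
exact way the 10-tick limit is applied is not spelled out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* True if tv_nsec lies within tolerance_ns of the middle of the second. */
static bool near_half_second(int64_t tv_nsec, int64_t tolerance_ns)
{
	int64_t off;

	assert(tv_nsec >= 0 && tv_nsec < NSEC_PER_SEC);
	off = tv_nsec - NSEC_PER_SEC / 2;
	if (off < 0)
		off = -off;
	return off <= tolerance_ns;
}
```

With a 1 ms tick, work that runs 2 ms late fails a half-tick tolerance
but passes a tolerance of several ticks, so it is no longer rescheduled
every second.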

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-08-22 12:33:38 -07:00
Linus Torvalds e91dade52b Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Three small fixlets"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: fix compile warning in tick_nohz_init()
  nohz: Do not warn about unstable tsc unless user uses nohz_full
  sched_clock: Fix integer overflow
2013-08-19 09:17:35 -07:00
Paul E. McKenney b44379af1c nohz_full: Add Kconfig parameter for scalable detection of all-idle state
At least one CPU must keep the scheduling-clock tick running for
timekeeping purposes whenever there is a non-idle CPU.  However, with
the new nohz_full adaptive-idle machinery, it is difficult to distinguish
between all CPUs really being idle as opposed to all non-idle CPUs being
in adaptive-ticks mode.  This commit therefore adds a Kconfig parameter
as a first step towards enabling a scalable detection of full-system
idle state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
[ paulmck: Update help text per Frederic Weisbecker. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-08-18 18:07:02 -07:00
Frederic Weisbecker c2e7fcf53c nohz: Include local CPU in full dynticks global kick
tick_nohz_full_kick_all() is useful to notify all full dynticks
CPUs that there is a system state change to check out before
re-evaluating the need for the tick.

Unfortunately this is implemented using smp_call_function_many()
that ignores the local CPU. This CPU also needs to re-evaluate
the tick.

on_each_cpu_mask() is not useful either because we don't want to
re-evaluate the tick state in place but asynchronously from an IPI
to avoid messing up with any random locking scenario.

So let's call tick_nohz_full_kick() from tick_nohz_full_kick_all()
so that the usual irq work takes care of it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1375460996-16329-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16 17:55:33 +02:00
Ingo Molnar 6f1d657668 Merge branch 'timers/nohz-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/nohz
Pull nohz improvements from Frederic Weisbecker:

 " It mostly contains fixes and full dynticks off-case optimizations. I believe that
   distros want to enable this feature so it seems important to optimize the case
   where the "nohz_full=" parameter is empty. ie: I'm trying to remove any performance
   regression that comes with NO_HZ_FULL=y when the feature is not used.

   This patchset improves the current situation a lot (off-case appears to be around 11% faster
   with hackbench, although I guess it may vary depending on the configuration but it should be
   significantly faster in any case). There is still some work to do: I can still observe a
   remaining loss of 1.6% throughput seen with hackbench compared to CONFIG_NO_HZ_FULL=n. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-14 17:58:56 +02:00
Frederic Weisbecker d13508f944 nohz: Optimize full dynticks's sched hooks with static keys
Scheduler IPIs and task context switches are serious fast paths.
Let's use static keys to hide, as much as we can, the off-case
cost of the full dynticks APIs that are called on these
sites.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
2013-08-14 17:14:58 +02:00
Frederic Weisbecker 460775df46 nohz: Optimize full dynticks state checks with static keys
These APIs are frequently accessed and priority is given
to optimize the full dynticks off-case in order to let
distros enable this feature without suffering from
significant performance regressions.

Let's inline these APIs and optimize them with static keys.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
2013-08-14 17:14:57 +02:00
Frederic Weisbecker 73867dcd07 nohz: Rename a few state variables
Rename the full dynticks's cpumask and cpumask state variables
to some more exportable names.

These will be used later from global headers to optimize
the main full dynticks APIs in conjunction with static keys.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
2013-08-14 17:14:57 +02:00
Frederic Weisbecker d84d27a491 context_tracking: Remove full dynticks' hacky dependency on wide context tracking
Now that the full dynticks subsystem only enables the context tracking
on full dynticks CPUs, let's remove the dependency on CONTEXT_TRACKING_FORCE.

This dependency was a hack to enable the context tracking widely for the
full dynticks subsystem until the latter became able to enable it in a
more CPU-finegrained fashion.

Now CONTEXT_TRACKING_FORCE only stands for testing on archs that
work on support for the context tracking while full dynticks can't be
used yet due to unmet dependencies. It simulates a system where all CPUs
are full dynticks so that RCU user extended quiescent states and dynticks
cputime accounting can be tested on the given arch.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13 00:54:34 +02:00
Frederic Weisbecker 2e70933866 nohz: Only enable context tracking on full dynticks CPUs
The context tracking subsystem has the ability to selectively
enable the tracking on any defined subset of CPUs. This means that
we can define a CPU range that doesn't run the context tracking
and another range that does.

Now what we want in practice is to enable the tracking on full
dynticks CPUs only. In order to perform this, we just need to pass
our full dynticks CPU range selection from the full dynticks
subsystem to the context tracking.

This way we can spare the overhead of RCU user extended quiescent
state and vtime maintenance on the CPUs that are outside the
full dynticks range. Just keep in mind the raw context tracking
itself is still necessary everywhere.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13 00:54:07 +02:00
Ingo Molnar ae920eb242 Merge branch 'fortglx/3.11/time' of git://git.linaro.org/people/jstultz/linux into timers/urgent
Pull small fix for v3.11 from John Stultz.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 18:08:23 +02:00
Stephen Boyd e7e3ff1bfe sched_clock: Add support for >32 bit sched_clock
The ARM architected system counter has at least 56 usable bits.
Add support for counters with more than 32 bits to the generic
sched_clock implementation so we can increase the time between
wakeups due to dealing with wrap-around on these devices while
benefiting from the irqtime accounting and suspend/resume
handling that the generic sched_clock code already has. On my
system using 56 bits over 32 bits changes the wraparound time
from a few minutes to an hour. For faster running counters (GHz
range) this is even more important because we may not be able to
execute the timer in time to deal with the wraparound if only 32
bits are used.

We choose a maxsec value of 3600 seconds because we assume no
system will go idle for more than an hour. In the future we may
need to increase this value.
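
The wraparound arithmetic can be sketched as follows. This is a
simplified model, not the kernel's calculation: wrap_seconds() is a
hypothetical helper, and the 19.2 MHz rate used in the note after the
code is an example value, not a figure from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a counter of the given width wraps at rate_hz,
 * capped at maxsec. */
static uint64_t wrap_seconds(unsigned int bits, uint64_t rate_hz,
			     uint64_t maxsec)
{
	uint64_t span, secs;

	assert(rate_hz != 0 && bits >= 1);
	span = (bits >= 64) ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
	secs = span / rate_hz;
	return secs < maxsec ? secs : maxsec;
}
```

At an assumed 19.2 MHz, a 32-bit counter wraps after about 223 seconds,
while a 56-bit counter would run for far longer than an hour and is
therefore limited by the 3600 second cap.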

Note: All users should switch over to the 64-bit read function so
we can remove setup_sched_clock() in favor of sched_clock_register().

Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-30 11:24:21 -07:00
Stephen Boyd a08ca5d108 sched_clock: Use an hrtimer instead of timer
In the next patch we're going to increase the number of bits that
the generic sched_clock can handle to be greater than 32. With
more than 32 bits the wraparound time can be larger than what can
fit into the units that msecs_to_jiffies takes (unsigned int).
Luckily, the wraparound is initially calculated in nanoseconds
which we can easily use with hrtimers, so switch to using an
hrtimer.

Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[jstultz: Fixup hrtimer initialization order issue]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-30 11:24:20 -07:00
Stephen Boyd 85c3d2dd15 sched_clock: Use seqcount instead of rolling our own
We're going to increase the cyc value to 64 bits in the near
future. Doing that is going to break the custom seqcount
implementation in the sched_clock code because 64 bit numbers
aren't guaranteed to be atomic. Replace the cyc_copy with a
seqcount to avoid this problem.

Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-30 11:24:20 -07:00
Stephen Boyd 87d8b9eb7e clocksource: Extract max nsec calculation into separate function
We need to calculate the same number in the clocksource code and
the sched_clock code, so extract this code into its own function.
We also drop the min_t and just use min() because the two types
are the same.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-30 11:24:20 -07:00
Rafael J. Wysocki 148519120c Revert "cpuidle: Quickly notice prediction failure for repeat mode"
Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
repeat mode), because it has been identified as the source of a
significant performance regression in v3.8 and later as explained by
Jeremy Eder:

  We believe we've identified a particular commit to the cpuidle code
  that seems to be impacting performance of a variety of workloads.
  The simplest way to reproduce is using netperf TCP_RR test, so
  we're using that, on a pair of Sandy Bridge based servers.  We also
  have data from a large database setup where performance is also
  measurably/positively impacted, though that test data isn't easily
  share-able.

  Included below are test results from 3 test kernels:

  kernel       reverts
  -----------------------------------------------------------
  1) vanilla   upstream (no reverts)

  2) perfteam2 reverts e11538d1f0

  3) test      reverts 69a37beabf
                       e11538d1f0

  In summary, netperf TCP_RR numbers improve by approximately 4%
  after reverting 69a37beabf.  When 69a37beabf is included, C0
  residency never seems to get above 40%.  Taking that patch out gets
  C0 near 100% quite often, and performance increases.

  The below data are histograms representing the %c0 residency @
  1-second sample rates (using turbostat), while under netperf test.

  - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
  - The last pair, which reverts 69a37beabf, shows %c0 in the
    80,90,100% bins.

  Below each kernel name are netperf TCP_RR trans/s numbers for the
  particular kernel that can be disclosed publicly, comparing the 3
  test kernels.  We ran a 4th test with the vanilla kernel where
  we've also set /dev/cpu_dma_latency=0 to show overall impact
  boosting single-threaded TCP_RR performance over 11% above
  baseline.

  3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
  TCP_RR trans/s 54323.78

  -----------------------------------------------------------
  3.10-rc2 vanilla RX (no reverts)
  TCP_RR trans/s 48192.47

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]: ***********************************************************
     40.0000 -    50.0000 [     1]: *
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [    49]: *************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 perfteam2 RX (reverts commit e11538d1f0)
  TCP_RR trans/s 49698.69

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]: ***********************************************************
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [     2]: **
     40.0000 -    50.0000 [    58]: **********************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 test RX (reverts 69a37beabf and e11538d1f0)
  TCP_RR trans/s 47766.95

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    27]: ***************************
     40.0000 -    50.0000 [     2]: **
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     2]: **
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [    28]: ****************************

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     1]: *
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     3]: ***
     80.0000 -    90.0000 [     7]: *******
     90.0000 -   100.0000 [    38]: **************************************

  These results demonstrate gaining back the tendency of the CPU to
  stay in more responsive, performant C-states (and thus yield
  measurably better performance), by reverting commit
  69a37beabf.

Requested-by: Jeremy Eder <jeder@redhat.com>
Tested-by: Len Brown <len.brown@intel.com>
Cc: 3.8+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-29 13:32:29 +02:00
Li Zhong ca06416b2b nohz: fix compile warning in tick_nohz_init()
cpu is not used after commit 5b8621a68f

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-07-24 20:30:33 +02:00
Steven Rostedt 543487c7a2 nohz: Do not warn about unstable tsc unless user uses nohz_full
If the user enables CONFIG_NO_HZ_FULL and runs the kernel on a machine
with an unstable TSC, it will produce a WARN_ON dump as well as taint
the kernel. This is a bit extreme for a kernel that just enables a
feature but doesn't use it.

The warning should only happen if the user tries to use the feature by
either adding nohz_full to the kernel command line, or by enabling
CONFIG_NO_HZ_FULL_ALL that makes nohz used on all CPUs at boot up. Note,
this second feature should not (yet) be used by distros or anyone that
doesn't care if NO_HZ is used or not.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-07-24 20:30:33 +02:00
Baruch Siach 53c0352042 sched_clock: Fix integer overflow
The expression '(1 << 32)' happens to evaluate as 0 on ARM, but
it evaluates as 1 on xtensa and x86_64. On those architectures this
zeros sched_clock_mask and breaks sched_clock().

Set the type of 1 to 'unsigned long long' to get the value we need.
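
The difference can be modelled outside C, where shifting a 32-bit int
by 32 is undefined. The sketch below is a Python model, not kernel
code. It assumes the mask is computed as (1 << bits) - 1, that x86
shift instructions use only the low 5 bits of the count for 32-bit
operands, and that an ARM register-controlled shift by 32 or more
yields 0. All function names are hypothetical.

```python
U32 = 0xFFFFFFFF
U64 = 0xFFFFFFFFFFFFFFFF


def shl32_x86(value, count):
    """Model of a 32-bit x86 shift: the count is masked to its low 5 bits."""
    return (value << (count & 31)) & U32


def shl32_arm(value, count):
    """Model of a 32-bit ARM register shift: counts of 32 or more give 0."""
    count &= 0xFF
    return (value << count) & U32 if count < 32 else 0


def mask32(shl, bits):
    """(1 << bits) - 1 evaluated in 32-bit arithmetic with the given shift model."""
    return (shl(1, bits) - 1) & U32


def mask64(bits):
    """(1ULL << bits) - 1 evaluated in 64-bit arithmetic."""
    return ((1 << bits) - 1) & U64


print(hex(mask32(shl32_arm, 32)))  # all 32 bits set: works by accident
print(hex(mask32(shl32_x86, 32)))  # zero: every counter value is masked away
print(hex(mask64(32)))             # all 32 low bits set, independent of the model
```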

Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-22 16:24:22 -07:00
Prarit Bhargava 397bbf6dee clocksource: Fix !CONFIG_CLOCKSOURCE_WATCHDOG compile
If I explicitly disable the clocksource watchdog in the x86 Kconfig,
the x86 kernel will not compile unless this is properly defined.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-07-22 16:00:17 -07:00
Paul Gortmaker 0db0628d90 kernel: delete __cpuinit usage from all core kernel files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  The fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the uses of the __cpuinit macros from C files in
the core kernel directories (kernel, init, lib, mm, and include)
that don't really have a specific maintainer.

[1] https://lkml.org/lkml/2013/5/20/589

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:59 -04:00
Stephen Boyd a272dcca18 tick: broadcast: Check broadcast mode on CPU hotplug
On ARM systems the dummy clockevent is registered with the cpu
hotplug notifier chain before any other per-cpu clockevent. This
has the side-effect of causing the dummy clockevent to be
registered first in every hotplug sequence. Because the dummy is
first, we'll try to turn the broadcast source on but the code in
tick_device_uses_broadcast() assumes the broadcast source is in
periodic mode and calls tick_broadcast_start_periodic()
unconditionally.

On boot this isn't a problem because we typically haven't
switched into oneshot mode yet (if at all). During hotplug, if
the broadcast source isn't in periodic mode we'll replace the
broadcast oneshot handler with the broadcast periodic handler and
start emulating oneshot mode when we shouldn't. Due to the way
the broadcast oneshot handler programs the next_event it's
possible for it to contain KTIME_MAX and cause us to hang the
system when the periodic handler tries to program the next tick.
Fix this by using the appropriate function to start the broadcast
source.

Reported-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: ARM kernel mailing list <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joseph Lo <josephl@nvidia.com>
Link: http://lkml.kernel.org/r/20130711140059.GA27430@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-12 12:35:40 +02:00
Thomas Gleixner f2006e2739 Merge branch 'linus' into timers/urgent
Get upstream changes so we can apply fixes against them

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-12 12:34:42 +02:00
Ingo Molnar e399eb56a6 Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent
Pull nohz updates/fixes from Frederic Weisbecker:

' Note that "watchdog: Boot-disable by default on full dynticks" is a temporary
  solution to solve the issue with the watchdog that prevents the tick from
  stopping. This is to make sure that 3.11 doesn't have that problem as several
  people complained about it.

  A proper and longer term solution has been proposed by Peterz:

          http://lkml.kernel.org/r/20130618103632.GO3204@twins.programming.kicks-ass.net
'

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-10 10:43:25 +02:00
Thomas Gleixner 332962f2c8 clocksource: Reselect clocksource when watchdog validated high-res capability
Up to commit 5d33b883a (clocksource: Always verify highres capability)
we had no sanity check when selecting a clocksource that would prevent
a non-highres-capable clocksource from being used after the system had
already switched to highres/nohz mode.

The new sanity check works as Alex and Tim found out. It prevents the
TSC from being used. This happens because on x86 the boot process
looks like this:

 tsc_start_freqency_validation(TSC);
 clocksource_register(HPET);
 clocksource_done_booting();
	clocksource_select()
		Selects HPET which is valid for high-res

 switch_to_highres();

 clocksource_register(TSC);
 	TSC is not selected, because it is not yet
	flagged as VALID_HIGH_RES

 clocksource_watchdog()
	Validates TSC for highres, but that does not make TSC
	the current clocksource.

Before the sanity check was added, we installed TSC unvalidated which
worked most of the time. If the TSC was really detected as unstable,
then the unstable logic removed it and installed HPET again.

The sanity check is correct and needed. So the watchdog needs to kick
off a reselection of the clocksource when it qualifies TSC as a valid
high res clocksource.

To solve this, we mark the clocksource which got the flag
CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with a new flag
CLOCK_SOURCE_RESELECT and trigger the watchdog thread. The watchdog
thread evaluates the flag and invokes clocksource_select() when set.
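
The flag handshake described above can be modelled roughly as follows.
This is a simplified Python sketch, not the kernel implementation: the
Clocksource class, the rating values, the numeric flag values and the
function names are assumptions made for illustration.

```python
VALID_FOR_HRES = 0x1  # stands in for CLOCK_SOURCE_VALID_FOR_HRES
RESELECT = 0x2        # stands in for CLOCK_SOURCE_RESELECT


class Clocksource:
    def __init__(self, name, rating, flags=0):
        self.name = name
        self.rating = rating
        self.flags = flags


def select(sources, highres_active):
    """Pick the best-rated source; in highres mode only validated ones qualify."""
    candidates = [cs for cs in sources
                  if not highres_active or cs.flags & VALID_FOR_HRES]
    return max(candidates, key=lambda cs: cs.rating)


def watchdog_validate(cs):
    """Watchdog qualifies a source for highres and requests a reselection."""
    cs.flags |= VALID_FOR_HRES | RESELECT


def watchdog_kthread(sources, current, highres_active):
    """Watchdog thread: reselect only if some source carries the RESELECT flag."""
    reselect = False
    for cs in sources:
        if cs.flags & RESELECT:
            cs.flags &= ~RESELECT
            reselect = True
    return select(sources, highres_active) if reselect else current


hpet = Clocksource("hpet", 250, VALID_FOR_HRES)  # illustrative ratings
tsc = Clocksource("tsc", 300)

current = select([hpet, tsc], highres_active=True)   # TSC not yet validated
print(current.name)
watchdog_validate(tsc)
current = watchdog_kthread([hpet, tsc], current, highres_active=True)
print(current.name)
```

In this model the first selection stays on HPET because TSC lacks the
validation flag, and only the watchdog thread run after validation
switches to TSC.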

clocksource_done_booting() is about to install the first real
clocksource anyway, so it should not go through clocksource_select()
and tick_oneshot_notify() pointlessly. To avoid that, split out the
clocksource_watchdog_kthread() list walk code and invoke the
select/notify only when called from clocksource_watchdog_kthread().

So clocksource_done_booting() can utilize the same splitout code
without the select/notify invocation and the clocksource_mutex
unlock/relock dance.

Reported-and-tested-by: Alex Shi <alex.shi@intel.com>
Cc: Hans Peter Anvin <hpa@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307042239150.11637@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-05 11:09:28 +02:00
Thomas Gleixner 2b0f89317e Merge branch 'timers/posix-cpu-timers-for-tglx' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core

Frederic said: "Most of these patches have been hanging around for
several months now, in -mmotm for a significant chunk. They already
missed a few releases."

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-04 23:11:22 +02:00
Thomas Gleixner 07bd117290 tick: Sanitize broadcast control logic
The recent implementation of a generic dummy timer resulted in a
different registration order of per cpu local timers which made the
broadcast control logic go belly up.

If the dummy timer is the first clock event device which is registered
for a CPU, then it is installed, the broadcast timer is initialized
and the CPU is marked as broadcast target.

If a real clock event device is installed after that, we can fail to
take the CPU out of the broadcast mask. In the worst case we end up
with two periodic timer events firing for the same CPU. One from the
per cpu hardware device and one from the broadcast.

Now the problem is that we have no way to distinguish whether the
system is in a state which makes broadcasting necessary or the
broadcast bit was set due to the nonfunctional dummy timer
installation.

To solve this we need to keep track of the system state separately and
provide a more detailed decision logic whether we keep the CPU in
broadcast mode or not.

The old decision logic only clears the broadcast mode, if the newly
installed clock event device is not affected by power states.

The new logic clears the broadcast mode if one of the following is
true:

  - The new device is not affected by power states.

  - The system is not in a power state affected mode.

  - The system has switched to oneshot mode. The oneshot broadcast is
    controlled from the deep idle state. The CPU is not in idle at
    this point, so it's safe to remove it from the mask.
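
The old and new decisions listed above can be written down as two
predicates. This is an illustrative Python sketch of the conditions
only, not the kernel code; the function and parameter names are
hypothetical.

```python
def old_should_clear_broadcast(dev_affected_by_power_states):
    """Old logic: clear only if the new device is unaffected by power states."""
    return not dev_affected_by_power_states


def new_should_clear_broadcast(dev_affected_by_power_states,
                               system_in_power_affected_mode,
                               oneshot_mode):
    """New logic: clear if any one of the three listed conditions holds."""
    return (not dev_affected_by_power_states
            or not system_in_power_affected_mode
            or oneshot_mode)


# A power-state affected device installed after the dummy timer, while the
# system is not in a power state affected mode and still periodic:
print(old_should_clear_broadcast(True))
print(new_should_clear_broadcast(True, False, False))
```

The case printed at the end is the one where the two predicates
differ: the old logic keeps the CPU in the broadcast mask, the new
logic removes it.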

If we clear the broadcast bit for the CPU when a new device is
installed, we also shut down the broadcast device when this was the
last CPU in the broadcast mask.

If the broadcast bit is kept, then we leave the new device in shutdown
state and rely on the broadcast to deliver the timer interrupts via
the broadcast IPIs.

Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-02 14:26:45 +02:00