alistair23-linux/kernel/sched
Vincent Guittot 25f55d9d01 sched: Fix init NOHZ_IDLE flag
On my SMP platform, which is made of 5 cores in 2 clusters, the
nr_busy_cpus field of the sched_group_power struct is non-zero
when the platform is fully idle, which makes the scheduler
unhappy.

The root cause is:

During the boot sequence, some CPUs reach the idle loop and set
their NOHZ_IDLE flag while waiting for other CPUs to boot. But
the nr_busy_cpus field is initialized later with the assumption
that all CPUs are in the busy state whereas some CPUs have
already set their NOHZ_IDLE flag.
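
The ordering problem described above can be illustrated with a small
single-threaded userspace model. This is only a sketch: the names
model_enter_idle(), model_init_group() and model_boot_old() are
hypothetical and are not kernel functions, and the per-CPU flag array
only stands in for the NOHZ_IDLE bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 5

/* Hypothetical stand-in for the per-group busy counter. */
struct group_power {
	int nr_busy_cpus;
};

/* Per-CPU idle flag kept outside any domain structure (the old layout). */
static bool nohz_idle[NR_CPUS];

/* A CPU enters idle: set its flag once and, if a group exists, drop the counter. */
void model_enter_idle(struct group_power *gp, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (nohz_idle[cpu])
		return;		/* already flagged idle: counter is left alone */
	nohz_idle[cpu] = true;
	if (gp)
		gp->nr_busy_cpus--;
}

/* Group setup assumes every CPU is busy, whatever the flags already say. */
void model_init_group(struct group_power *gp)
{
	gp->nr_busy_cpus = NR_CPUS;
}

/* Returns the busy count left over once every CPU has gone idle. */
int model_boot_old(int early_idle_cpus)
{
	struct group_power gp;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		nohz_idle[cpu] = false;

	/* Some CPUs reach the idle loop before the group is initialized. */
	for (cpu = 0; cpu < early_idle_cpus; cpu++)
		model_enter_idle(NULL, cpu);

	model_init_group(&gp);

	/* Later on, every CPU is idle. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		model_enter_idle(&gp, cpu);

	return gp.nr_busy_cpus;
}
```

In this model, model_boot_old(3) returns 3 although all five CPUs are
idle, because the three early CPUs never decrement the counter.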

More generally, the NOHZ_IDLE flag must be initialized when new
sched_domains are created in order to ensure that NOHZ_IDLE and
nr_busy_cpus are aligned.

This condition can be ensured by adding a synchronize_rcu()
between the destruction of old sched_domains and the creation of
new ones so the NOHZ_IDLE flag will not be updated with old
sched_domain once it has been initialized. But this solution
introduces an additional latency in the rebuild sequence that is
called during cpu hotplug.

As suggested by Frederic Weisbecker, another solution is to have
the same rcu lifecycle for both NOHZ_IDLE and sched_domain
struct. A new nohz_idle field is added to sched_domain so both
status and sched_domain will share the same RCU lifecycle and
will always be synchronized. In addition, there is no longer any
need to protect nohz_idle against concurrent access as it is only
modified by 2 mutually exclusive functions called by the local CPU.

This solution has been preferred to the creation of a new struct
with an extra pointer indirection for sched_domain.

The synchronization is done at the cost of:

 - An additional indirection and an rcu_dereference for accessing nohz_idle.
 - Only the nohz_idle field of the top sched_domain is used.
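
The shared-lifecycle idea can be sketched with a single-threaded
userspace model. It is not the kernel patch: struct domain,
model_rebuild_domains(), model_set_idle() and model_boot_fixed() are
hypothetical names, one domain per CPU is assumed, and RCU is left out
entirely. The point shown is that a flag stored in the domain is reset
whenever the domain is recreated, so it cannot disagree with a freshly
initialized counter.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 5

/* Hypothetical stand-in for the per-group busy counter. */
struct group_power {
	int nr_busy_cpus;
};

/* Hypothetical domain: the idle flag lives and dies with it. */
struct domain {
	int nohz_idle;
	struct group_power *gp;
};

static struct domain *cpu_domain[NR_CPUS];

/* Idle entry is a no-op until the CPU has a domain to record it in. */
void model_set_idle(int cpu)
{
	struct domain *sd;

	assert(cpu >= 0 && cpu < NR_CPUS);
	sd = cpu_domain[cpu];
	if (!sd || sd->nohz_idle)
		return;
	sd->nohz_idle = 1;
	sd->gp->nr_busy_cpus--;
}

/* New domains start with nohz_idle == 0, matching the "all busy" counter. */
void model_rebuild_domains(struct group_power *gp)
{
	int cpu;

	gp->nr_busy_cpus = NR_CPUS;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		free(cpu_domain[cpu]);
		cpu_domain[cpu] = calloc(1, sizeof(struct domain));
		assert(cpu_domain[cpu]);
		cpu_domain[cpu]->gp = gp;
	}
}

/* Returns the busy count left over once every CPU has gone idle. */
int model_boot_fixed(int early_idle_cpus)
{
	struct group_power gp = { 0 };
	int cpu, busy;

	/* Early idle entries find no domain, so no stale flag is left behind. */
	for (cpu = 0; cpu < early_idle_cpus; cpu++)
		model_set_idle(cpu);

	model_rebuild_domains(&gp);

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		model_set_idle(cpu);

	busy = gp.nr_busy_cpus;

	/* Drop the domains so they do not outlive the local counter. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		free(cpu_domain[cpu]);
		cpu_domain[cpu] = NULL;
	}
	return busy;
}
```

In this model, model_boot_fixed(3) returns 0 once all CPUs are idle,
regardless of how many CPUs reached idle before the domains existed.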

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linaro-kernel@lists.linaro.org
Cc: peterz@infradead.org
Cc: fweisbec@gmail.com
Cc: pjt@google.com
Cc: rostedt@goodmis.org
Cc: efault@gmx.de
Link: http://lkml.kernel.org/r/1366729142-14662-1-git-send-email-vincent.guittot@linaro.org
[ Fixed !NO_HZ build bug. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-26 12:13:44 +02:00
auto_group.c sched: split out css_online/css_offline from tg creation/destruction 2013-01-24 12:05:18 -08:00
auto_group.h Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled" 2012-12-11 10:23:45 +01:00
clock.c
core.c sched: Rename load_balance_tmpmask to load_balance_mask 2013-04-24 08:52:45 +02:00
cpuacct.c sched/cpuacct/UML: Fix header file dependency bug on the UML build 2013-04-10 15:12:41 +02:00
cpuacct.h sched/cpuacct: Initialize root cpuacct earlier 2013-04-10 13:54:20 +02:00
cpupri.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
cpupri.h
cputime.c sched/cpuacct: Add cpuacct_acount_field() 2013-04-10 13:54:17 +02:00
debug.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-26 19:42:08 -08:00
fair.c sched: Fix init NOHZ_IDLE flag 2013-04-26 12:13:44 +02:00
features.h Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
idle_task.c sched: Fix wrong rq's runnable_avg update with rt tasks 2013-04-21 11:22:52 +02:00
Makefile sched: Split cpuacct code out of core.c 2013-04-10 13:54:15 +02:00
rt.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-19 18:19:48 -08:00
sched.h sched: Fix init NOHZ_IDLE flag 2013-04-26 12:13:44 +02:00
stats.c sched: Fix /proc/sched_stat failure on very very large systems 2013-02-22 10:27:24 +01:00
stats.h
stop_task.c sched: Fix migration thread runtime bogosity 2012-08-13 18:41:55 +02:00