alistair23-linux/kernel
Srivatsa S. Bhat 011e4b02f1 powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode
If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
(ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
get the following messages during boot:

[    0.089866] POWER8 performance monitor hardware support registered
[    0.089985] power8-pmu: PMAO restore workaround active.
[    5.095419] Processor 1 is stuck.
[   10.097933] Processor 2 is stuck.
[   15.100480] Processor 3 is stuck.
[   20.102982] Processor 4 is stuck.
[   25.105489] Processor 5 is stuck.
[   30.108005] Processor 6 is stuck.
[   35.110518] Processor 7 is stuck.
[   40.113369] Processor 9 is stuck.
[   45.115879] Processor 10 is stuck.
[   50.118389] Processor 11 is stuck.
[   55.120904] Processor 12 is stuck.
[   60.123425] Processor 13 is stuck.
[   65.125970] Processor 14 is stuck.
[   70.128495] Processor 15 is stuck.
[   75.131316] Processor 17 is stuck.

Note that only the sibling threads are stuck, while the primary threads (0, 8,
16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
that kexec tries to wakeup (bring online) the sibling threads of all the cores,
before performing kexec:

[ 9464.131231] Starting new kernel
[ 9464.148507] kexec: Waking offline cpu 1.
[ 9464.148552] kexec: Waking offline cpu 2.
[ 9464.148600] kexec: Waking offline cpu 3.
[ 9464.148636] kexec: Waking offline cpu 4.
[ 9464.148671] kexec: Waking offline cpu 5.
[ 9464.148708] kexec: Waking offline cpu 6.
[ 9464.148743] kexec: Waking offline cpu 7.
[ 9464.148779] kexec: Waking offline cpu 9.
[ 9464.148815] kexec: Waking offline cpu 10.
[ 9464.148851] kexec: Waking offline cpu 11.
[ 9464.148887] kexec: Waking offline cpu 12.
[ 9464.148922] kexec: Waking offline cpu 13.
[ 9464.148958] kexec: Waking offline cpu 14.
[ 9464.148994] kexec: Waking offline cpu 15.
[ 9464.149030] kexec: Waking offline cpu 17.

Instrumenting this piece of code revealed that the cpu_up() operation actually
fails with -EBUSY. Thus, only the primary threads of all the cores are online
during kexec, and hence this is a sure-shot receipe for disaster, as explained
in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across kexec),
as well as in the comment above wake_offline_cpus().

It turns out that cpu_up() was returning -EBUSY because the variable
'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
by migrate_to_reboot_cpu() inside kernel_kexec().

Now, migrate_to_reboot_cpu() was originally written with the assumption that
any further code will not need to perform CPU hotplug, since we are anyway in
the reboot path. However, kexec is clearly not such a case, since we depend on
onlining CPUs, atleast on powerpc.

So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
kexec path, to fix this regression in kexec on powerpc.

Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
can catch such issues more easily in the future.

Fixes: c97102ba96 (kexec: migrate to reboot cpu)
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-05-28 13:24:26 +10:00
..
debug mm: per-thread vma caching 2014-04-07 16:35:53 -07:00
events Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2014-04-05 13:20:43 -07:00
gcov
irq genirq: Export symbol no_action() 2014-03-22 11:33:09 +01:00
locking Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-16 16:35:18 -07:00
power kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
printk printk: fix one circular lockdep warning about console_lock 2014-04-03 16:21:08 -07:00
rcu Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 11:21:19 -07:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-19 10:40:51 -07:00
time tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz() 2014-04-15 20:26:58 +02:00
trace This contains two fixes. 2014-04-18 10:16:43 -07:00
.gitignore
acct.c
async.c
audit.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
audit.h audit: Use struct net not pid_t to remember the network namespce to reply in 2014-03-20 10:10:53 -04:00
audit_tree.c
audit_watch.c
auditfilter.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
auditsc.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
backtracetest.c
bounds.c
capability.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2014-04-03 09:26:18 -07:00
cgroup.c cgroup: newly created dirs and files should be owned by the creator 2014-04-07 16:44:47 -04:00
cgroup_freezer.c cgroup: drop const from @buffer of cftype->write_string() 2014-03-19 10:23:54 -04:00
compat.c Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 12:51:41 -07:00
configs.c
context_tracking.c
cpu.c CPU hotplug: Provide lockless versions of callback registration functions 2014-03-20 13:43:40 +01:00
cpu_pm.c
cpuset.c Merge branch 'akpm' (incoming from Andrew) 2014-04-03 16:22:16 -07:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process 2014-04-07 16:36:06 -07:00
extable.c
fork.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
freezer.c
futex.c futex: update documentation for ordering guarantees 2014-04-12 17:57:51 -07:00
futex_compat.c
groups.c kernel/groups.c: remove return value of set_groups 2014-04-03 16:21:05 -07:00
hrtimer.c timer: Remove code redundancy while calling get_nohz_timer_target() 2014-03-20 12:35:46 +01:00
hung_task.c kernel: audit/fix non-modular users of module_init in core code 2014-04-03 16:21:07 -07:00
irq_work.c perf/x86: Warn to early_printk() in case irq_work is too slow 2014-02-21 21:49:07 +01:00
itimer.c
jump_label.c
kallsyms.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode 2014-05-28 13:24:26 +10:00
kmod.c
kprobes.c
ksysfs.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
kthread.c kthread: ensure locality of task_struct allocations 2014-04-03 16:20:49 -07:00
latencytop.c
Makefile Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 14:13:25 -07:00
module-internal.h
module.c modules: use raw_cpu_write for initialization of per cpu refcount. 2014-04-07 16:36:14 -07:00
module_signing.c
notifier.c notifier: Substitute rcu_access_pointer() for rcu_dereference_raw() 2014-02-26 06:35:13 -08:00
nsproxy.c
padata.c
panic.c kernel/panic.c: display reason at end + pr_emerg 2014-04-07 16:36:08 -07:00
params.c
pid.c
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-02 16:20:21 -07:00
posix-cpu-timers.c
posix-timers.c
profile.c CPU hotplug notifiers registration fixes for 3.15-rc1 2014-04-07 14:55:46 -07:00
ptrace.c kernel/compat: convert to COMPAT_SYSCALL_DEFINE 2014-03-06 15:35:10 +01:00
range.c
reboot.c
relay.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
res_counter.c res_counter: remove interface for locked charging and uncharging 2014-04-07 16:35:54 -07:00
resource.c kernel/resource.c: make reallocate_resource() static 2014-04-03 16:21:07 -07:00
seccomp.c seccomp: fix memory leak on filter attach 2014-04-16 15:25:53 -04:00
signal.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
smp.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
smpboot.c
smpboot.h
softirq.c softirq: Add linux/irq.h to make it compile again 2014-03-19 11:28:14 +01:00
stacktrace.c
stop_machine.c stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() 2014-03-11 11:33:47 +01:00
sys.c mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE 2014-04-07 16:35:52 -07:00
sys_ni.c fs, kernel: permit disabling the uselib syscall 2014-04-03 16:21:05 -07:00
sysctl.c hung_task: check the value of "sysctl_hung_task_timeout_sec" 2014-04-07 16:36:07 -07:00
sysctl_binary.c
system_certificates.S
system_keyring.c
task_work.c
taskstats.c
test_kprobes.c
time.c
timeconst.bc
timer.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00
torture.c rcutorture: Gracefully handle NULL cleanup hooks 2014-02-23 09:04:39 -08:00
tracepoint.c This includes the final patch to clean up and fix the issue with the 2014-04-12 13:06:10 -07:00
tsacct.c
uid16.c
up.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
user-return-notifier.c
user.c kernel: audit/fix non-modular users of module_init in core code 2014-04-03 16:21:07 -07:00
user_namespace.c user namespace: fix incorrect memory barriers 2014-04-14 16:03:02 -07:00
utsname.c
utsname_sysctl.c
watchdog.c kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write() 2014-04-18 16:40:08 -07:00
workqueue.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00
workqueue_internal.h