alistair23-linux/kernel
Steven Rostedt ce6bd420f4 futex: fix for futex_wait signal stack corruption
David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too.  The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.

Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.

Here's the relevant code from kernel/futex.c: (not in order in the file)

[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
                          struct timespec __user *utime, u32 __user *uaddr2,
                          u32 val3)
{
        struct timespec ts;
        ktime_t t, *tp = NULL;
        u32 val2 = 0;
        int cmd = op & FUTEX_CMD_MASK;

        if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
                if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
                        return -EFAULT;
                if (!timespec_valid(&ts))
                        return -EINVAL;

                t = timespec_to_ktime(ts);
                if (cmd == FUTEX_WAIT)
                        t = ktime_add(ktime_get(), t);
                tp = &t;
        }
[...]
        return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}

[...]

long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
                u32 __user *uaddr2, u32 val2, u32 val3)
{
        int ret;
        int cmd = op & FUTEX_CMD_MASK;
        struct rw_semaphore *fshared = NULL;

        if (!(op & FUTEX_PRIVATE_FLAG))
                fshared = &current->mm->mmap_sem;

        switch (cmd) {
        case FUTEX_WAIT:
                ret = futex_wait(uaddr, fshared, val, timeout);

[...]

static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
                      u32 val, ktime_t *abs_time)
{
[...]
               struct restart_block *restart;
                restart = &current_thread_info()->restart_block;
                restart->fn = futex_wait_restart;
                restart->arg0 = (unsigned long)uaddr;
                restart->arg1 = (unsigned long)val;
                restart->arg2 = (unsigned long)abs_time;
                restart->arg3 = 0;
                if (fshared)
                        restart->arg3 |= ARG3_SHARED;
                return -ERESTART_RESTARTBLOCK;
[...]

static long futex_wait_restart(struct restart_block *restart)
{
        u32 __user *uaddr = (u32 __user *)restart->arg0;
        u32 val = (u32)restart->arg1;
        ktime_t *abs_time = (ktime_t *)restart->arg2;
        struct rw_semaphore *fshared = NULL;

        restart->fn = do_no_restart_syscall;
        if (restart->arg3 & ARG3_SHARED)
                fshared = &current->mm->mmap_sem;
        return (long)futex_wait(uaddr, fshared, val, abs_time);
}

So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK.  The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.

This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.

I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.

The solution I did to solve this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters.  This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.

Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for.  If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 15:46:09 +01:00
..
irq __do_IRQ does not check IRQ_DISABLED when IRQ_PER_CPU is set 2007-11-14 18:45:43 -08:00
power hibernate: fix lockdep report 2007-11-14 18:45:43 -08:00
time softlockup: fix false positives on CONFIG_NOHZ 2007-11-28 15:52:56 +01:00
.gitignore
acct.c sched: fix kernel/acct.c comment 2007-11-26 21:21:49 +01:00
audit.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit.h [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit_tree.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditfilter.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditsc.c auditsc: fix kernel-doc param warnings 2007-10-22 19:40:02 -07:00
capability.c Uninline find_pid etc set of functions 2007-10-19 11:53:41 -07:00
cgroup.c Improve cgroup printks 2007-11-14 18:45:37 -08:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
compat.c Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt 2007-10-18 15:12:41 -07:00
configs.c
cpu.c CPU HOTPLUG: avoid hotadd when proper possible_map isn't specified 2007-10-19 11:53:44 -07:00
cpuset.c hotplug cpu: migrate a task within its cpuset 2007-10-19 11:53:44 -07:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c wait_task_stopped(): pass correct exit_code to wait_noreap_copyout() 2007-11-29 09:24:55 -08:00
extable.c
fork.c sched: fix copy_namespace() <-> sched_fork() dependency in do_fork 2007-11-09 22:39:39 +01:00
futex.c futex: fix for futex_wait signal stack corruption 2007-12-05 15:46:09 +01:00
futex_compat.c [FUTEX] Fix address computation in compat code. 2007-11-09 16:13:08 -08:00
hrtimer.c Quieten hrtimer printk: "Switched to high resolution mode .." 2007-10-29 09:39:38 +01:00
itimer.c whitespace fixes: interval timers 2007-10-18 14:37:26 -07:00
kallsyms.c FRV: fix the extern declaration of kallsyms_num_syms 2007-11-29 09:24:54 -08:00
Kconfig.hz
Kconfig.instrumentation uml: add !UML dependencies 2007-12-03 08:13:17 -08:00
Kconfig.preempt Move PREEMPT_NOTIFIERS into an always-included Kconfig 2007-10-17 08:42:55 -07:00
kexec.c Extended crashkernel command line 2007-10-19 11:53:49 -07:00
kfifo.c
kmod.c
kprobes.c kprobes: support kretprobe blacklist 2007-10-16 09:43:10 -07:00
ksysfs.c add-vmcore: cleanup the coding style according to Andrew's comments 2007-10-17 08:42:54 -07:00
kthread.c
latency.c
lockdep.c lockdep: fix a typo in the __lock_acquire comment 2007-10-28 20:47:01 +01:00
lockdep_internals.h
lockdep_proc.c
Makefile revert "Task Control Groups: example CPU accounting subsystem" 2007-11-14 18:45:40 -08:00
marker.c Linux Kernel Markers: fix marker mutex not taken upon module load 2007-11-14 18:45:40 -08:00
module.c module: fix and elaborate comments 2007-11-19 11:20:43 +11:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c Add kernel/notifier.c 2007-10-19 11:53:34 -07:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c pid namespaces: allow cloning of new namespace 2007-10-19 11:53:39 -07:00
panic.c trivial comment wording/typo fix regarding taint flags 2007-10-20 00:30:06 +02:00
params.c fix param_sysfs_builtin name length check 2007-11-14 18:45:42 -08:00
pid.c pidns: Place under CONFIG_EXPERIMENTAL 2007-11-14 18:45:43 -08:00
posix-cpu-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
posix-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
printk.c serial: turn serial console suspend a boot rather than compile time option 2007-10-18 14:37:19 -07:00
profile.c sched: document profile=sleep requiring CONFIG_SCHEDSTATS 2007-10-24 18:23:50 +02:00
ptrace.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
rcupdate.c Clean up duplicate includes in kernel/ 2007-10-17 08:42:48 -07:00
rcutorture.c Make rcutorture RNG use temporal entropy 2007-10-17 08:42:53 -07:00
relay.c whitespace fixes: relayfs 2007-10-18 14:37:24 -07:00
resource.c Add IORESOUCE_BUSY flag for System RAM 2007-11-14 18:45:39 -08:00
rtmutex-debug.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c sched: fix crash in sys_sched_rr_get_interval() 2007-12-04 17:04:39 +01:00
sched_debug.c sched: clean up overlong line in kernel/sched_debug.c 2007-11-28 15:52:56 +01:00
sched_fair.c sched: default to more agressive yield for SCHED_BATCH tasks 2007-12-04 17:04:39 +01:00
sched_idletask.c sched: isolate SMP balancing code a bit more 2007-10-24 18:23:51 +02:00
sched_rt.c sched: cpu accounting controller (V2) 2007-12-02 20:04:49 +01:00
sched_stats.h sched: clean up kernel/sched_stat.h 2007-11-28 15:52:56 +01:00
seccomp.c
signal.c sigwait eats blocked default-ignore signals 2007-11-12 16:05:23 -08:00
softirq.c
softlockup.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys.c x86: ignore the sys_getcpu() tcache parameter 2007-11-17 16:27:00 +01:00
sys_ni.c [COMPAT]: Fix build on COMPAT platforms when CONFIG_NET is disabled. 2007-10-30 21:29:56 -07:00
sysctl.c sysctl: check length at deprecated_sysctl_warning 2007-11-14 18:45:37 -08:00
sysctl_check.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6 2007-11-26 20:09:07 -08:00
taskstats.c kernel/taskstats.c: fix bogus nlmsg_free() 2007-11-14 18:45:44 -08:00
time.c whitespace fixes: time syscalls 2007-10-18 14:37:24 -07:00
timer.c sched: restore deterministic CPU accounting on powerpc 2007-11-09 22:39:38 +01:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c
user.c sched: don't forget to unlock uids_mutex on error paths 2007-11-26 21:21:49 +01:00
user_namespace.c
utsname.c
utsname_sysctl.c Isolate the UTS namespace's domainname and hostname back 2007-11-29 09:24:53 -08:00
wait.c
workqueue.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00