1
0
Fork 0
remarkable-linux/fs
Kees Cook 800179c9b8 fs: add link restrictions
This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

 1996 Aug, Zygo Blaxell
  http://marc.info/?l=bugtraq&m=87602167419830&w=2
 1996 Oct, Andrew Tridgell
  http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
 1997 Dec, Albert D Cahalan
  http://lkml.org/lkml/1997/12/16/4
 2005 Feb, Lorenzo Hernández García-Hierro
  http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
 2010 May, Kees Cook
  https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

 - Violates POSIX.
   - POSIX didn't consider this situation and it's not useful to follow
     a broken specification at the cost of security.
 - Might break unknown applications that use this feature.
   - Applications that break because of the change are easy to spot and
     fix. Applications that are vulnerable to symlink ToCToU by not having
     the change aren't. Additionally, no applications have yet been found
     that rely on this behavior.
 - Applications should just use mkstemp() or O_CREATE|O_EXCL.
   - True, but applications are not perfect, and new software is written
     all the time that makes these mistakes; blocking this flaw at the
     kernel is a single solution to the entire class of vulnerability.
 - This should live in the core VFS.
   - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
 - This should live in an LSM.
   - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:37:58 +04:00
..
9p VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
adfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
affs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
afs VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
autofs4 switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
befs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
bfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
btrfs btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() 2012-07-23 00:01:43 +04:00
cachefiles switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
ceph VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
cifs VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
coda don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
configfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
cramfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
debugfs debugfs: get rid of useless arguments to debugfs_{mkdir,symlink} 2012-07-14 16:35:30 +04:00
devpts VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
dlm dlm: NULL dereference on failure in kmem_cache_create() 2012-05-15 10:39:28 -05:00
ecryptfs ecryptfs_lookup_interpose(): allocate dentry_info first 2012-07-29 21:24:17 +04:00
efs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
exofs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
exportfs switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
ext2 don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
ext3 don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
ext4 ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file() 2012-07-23 00:01:55 +04:00
fat don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
freevxfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
fscache
fuse don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
gfs2 quota: Move quota syncing to ->sync_fs method 2012-07-22 23:58:34 +04:00
hfs hfs: get rid of hfs_sync_super 2012-07-22 23:58:09 +04:00
hfsplus hfsplus: get rid of write_super 2012-07-22 23:58:04 +04:00
hostfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
hpfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
hppfs switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
hugetlbfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
isofs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
jbd jbd: Write journal superblock with WRITE_FUA after checkpointing 2012-05-15 23:34:37 +02:00
jbd2 jbd2: use kmem_cache_zalloc wrapper instead of flag 2012-06-01 00:10:32 -04:00
jffs2 don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
jfs don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
lockd Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linux 2012-06-01 08:32:58 -07:00
logfs VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
minix don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
ncpfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
nfs VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
nfs_common
nfsd switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
nilfs2 VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
nls nls: fix (and rename) mac NLS table files and config options 2012-06-01 19:51:22 -07:00
notify switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
ntfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
ocfs2 pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. 2012-07-29 21:24:15 +04:00
omfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
openpromfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
proc VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
pstore staging tree fixes for 3.5-rc4 2012-06-20 15:15:03 -07:00
qnx4 stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
qnx6 stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
quota quota: Split dquot_quota_sync() to writeback and cache flushing part 2012-07-22 23:58:19 +04:00
ramfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
reiserfs don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
romfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
squashfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
sysfs VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
sysv fs/sysv: stop using write_super and s_dirt 2012-07-22 23:58:12 +04:00
ubifs VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
udf don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
ufs fs/ufs: get rid of write_super 2012-07-22 23:58:16 +04:00
xfs switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
Kconfig.binfmt C6X: add support to build with BINFMT_ELF_FDPIC 2012-05-15 09:17:34 -04:00
Makefile fs: initial qnx6fs addition 2012-03-20 21:29:38 -04:00
aio.c aio: now fput() is OK from interrupt context; get rid of manual delayed __fput() 2012-07-22 23:57:59 +04:00
anon_inodes.c anon_inodes: move allocation of anon_inode into ->mount() 2012-03-20 21:29:45 -04:00
attr.c notify_change(): check that i_mutex is held 2012-07-14 16:35:42 +04:00
bad_inode.c don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
binfmt_aout.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
binfmt_elf.c binfmt_elf: switch elf_map() to vm_mmap/vm_munmap 2012-05-30 21:04:55 -04:00
binfmt_elf_fdpic.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-05-23 17:42:39 -07:00
binfmt_em86.c __register_binfmt() made void 2012-03-20 21:29:46 -04:00
binfmt_flat.c binfmt_flat: use vm_munmap, we are missing ->mmap_sem there 2012-05-30 21:04:56 -04:00
binfmt_misc.c vfs: Rename end_writeback() to clear_inode() 2012-05-06 13:43:41 +08:00
binfmt_script.c __register_binfmt() made void 2012-03-20 21:29:46 -04:00
binfmt_som.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
bio-integrity.c fs: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:21 +08:00
bio.c Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block 2012-05-30 08:52:42 -07:00
block_dev.c vfs: Create function for iterating over block devices 2012-07-22 23:58:45 +04:00
buffer.c block: fix infinite loop in __getblk_slow 2012-07-13 08:36:35 -07:00
char_dev.c
compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-06-01 11:53:44 -07:00
compat_binfmt_elf.c
compat_ioctl.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
dcache.c __d_unalias() should refuse to move mountpoints 2012-07-14 16:35:15 +04:00
dcookies.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
direct-io.c fs/direct-io.c: adjust suspicious bit operation 2012-07-14 16:32:46 +04:00
drop_caches.c
eventfd.c eventfd: change int to __u64 in eventfd_signal() 2012-05-31 17:49:32 -07:00
eventpoll.c HAVE_RESTORE_SIGMASK is defined on all architectures now 2012-06-01 12:58:46 -04:00
exec.c consolidate pipe file creation 2012-07-29 21:24:19 +04:00
fcntl.c switch fcntl to fget_raw_light/fput_light 2012-05-29 23:28:30 -04:00
fhandle.c
fifo.c
file.c Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-03-29 18:12:23 -07:00
file_table.c uninline file_free_rcu() 2012-07-29 21:24:17 +04:00
filesystems.c
fs-writeback.c vfs: Move noop_backing_dev_info check from sync into writeback 2012-07-22 23:58:18 +04:00
fs_struct.c get rid of ->mnt_longterm 2012-07-14 16:32:47 +04:00
generic_acl.c
inode.c vfs: switch i_dentry/d_alias to hlist 2012-07-14 16:32:55 +04:00
internal.h VFS: Split inode_permission() 2012-07-14 16:38:36 +04:00
ioctl.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
ioprio.c Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block 2012-05-30 08:52:42 -07:00
libfs.c VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
locks.c Remove easily user-triggerable BUG from generic_setlease 2012-07-13 10:50:23 -07:00
mbcache.c
mount.h get rid of magic in proc_namespace.c 2012-07-14 16:32:48 +04:00
mpage.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
namei.c fs: add link restrictions 2012-07-29 21:37:58 +04:00
namespace.c VFS: Comment mount following code 2012-07-14 16:38:32 +04:00
no-block.c
open.c take grabbing f->f_path to do_dentry_open() 2012-07-29 21:24:18 +04:00
pipe.c consolidate pipe file creation 2012-07-29 21:24:19 +04:00
pnode.c VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors 2012-07-14 16:37:27 +04:00
pnode.h
posix_acl.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
proc_namespace.c get rid of magic in proc_namespace.c 2012-07-14 16:32:48 +04:00
read_write.c vfs: allow custom EOF in generic_file_llseek code 2012-07-23 00:00:15 +04:00
read_write.h
readdir.c switch readdir/getdents to fget_light/fput_light 2012-05-29 23:28:29 -04:00
select.c HAVE_RESTORE_SIGMASK is defined on all architectures now 2012-06-01 12:58:46 -04:00
seq_file.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
signalfd.c switch signalfd4() to fget_light/fput_light 2012-05-29 23:28:30 -04:00
splice.c splice: fix racy pipe->buffers uses 2012-06-13 21:16:42 +02:00
stack.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
stat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-05-23 17:42:39 -07:00
statfs.c switch statfs to fget_light/fput_light 2012-05-29 23:28:31 -04:00
super.c VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
sync.c vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes 2012-07-22 23:59:01 +04:00
timerfd.c
utimes.c switch utimes() to fget_light/fput_light 2012-05-29 23:28:32 -04:00
xattr.c switch xattr syscalls to fget_light/fput_light 2012-05-29 23:28:30 -04:00
xattr_acl.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00