remarkable-linux/fs
Eric W. Biederman 54fcb2303e mnt: Make propagate_umount less slow for overlapping mount propagation trees
commit 296990deb3 upstream.

Andrei Vagin pointed out that the time to execute propagate_umount can go
non-linear (and take a ludicrous amount of time) when the mount
propagation trees of the mounts to be unmounted by a lazy unmount
overlap.

Make the walk of the mount propagation trees nearly linear by
remembering which mounts have already been visited, allowing
subsequent walks to detect when walking a mount propagation tree or a
subtree of a mount propagation tree would be duplicate work and to skip
it entirely.
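
The effect of remembering visited mounts can be illustrated with a toy model. This is only a sketch in Python, not the kernel code: the `tree` dictionary, the node names, and the `walk` helper are all hypothetical, and a simple chain stands in for overlapping propagation trees.

```python
def walk(tree, root, visited=None):
    """Walk the subtree under root and return how many nodes were processed.

    If a visited set is supplied, nodes already in it are skipped together
    with everything below them, and newly processed nodes are added to it.
    """
    work = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if visited is not None:
            if node in visited:
                continue  # an earlier walk already covered this subtree
            visited.add(node)
        work += 1
        stack.extend(tree.get(node, ()))
    return work


# Hypothetical chain a -> b -> c -> d. Starting a walk at every node makes
# the walked trees overlap heavily.
tree = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
starts = ["a", "b", "c", "d"]

naive = sum(walk(tree, s) for s in starts)          # every walk repeats work
seen = set()
marked = sum(walk(tree, s, seen) for s in starts)   # each node processed once

print(naive, marked)
```

In this model the unmarked walks process a number of nodes that grows quadratically with the chain length, while the marked walks process each node once.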

Walk the list of mounts whose propagation trees need to be traversed
from the mount highest in the mount tree to mounts lower in the mount
tree so that odds are higher that the code will walk the largest trees
first, allowing later tree walks to be skipped entirely.
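
Why the order of the walks matters can be seen in the same kind of toy model. The sketch below is an assumption-laden illustration in Python, not the kernel logic: `walks_skipped` and the chain `tree` are hypothetical, and a single chain plays the role of both the mount tree and the propagation tree.

```python
def walks_skipped(tree, starts):
    """Return how many starting points needed no walk at all because an
    earlier walk had already visited them."""
    visited = set()
    skipped = 0
    for start in starts:
        if start in visited:
            skipped += 1  # the whole walk is avoided
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            stack.extend(tree.get(node, ()))
    return skipped


# Hypothetical chain a -> b -> c -> d, with "a" the highest node.
tree = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}

top_first = walks_skipped(tree, ["a", "b", "c", "d"])
bottom_first = walks_skipped(tree, ["d", "c", "b", "a"])

print(top_first, bottom_first)
```

Starting from the top, the first walk covers the whole chain and the remaining three are skipped; starting from the bottom, every walk still has to be started.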

Add cleanup_umount_visitation to remove the code's memory of which
mounts have been visited.

Add the functions last_slave and skip_propagation_subtree to allow
skipping appropriate parts of the mount propagation tree without
needing to change the logic of the rest of the code.
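
One common way to skip a subtree without touching the iteration logic is to move the cursor to the last node of that subtree, so that the ordinary "next" step then leaves it. The Python sketch below shows that general idea only; it is not the kernel's last_slave or skip_propagation_subtree, and `preorder`, `last_descendant`, `walk_skipping`, and the example tree are hypothetical names chosen for illustration.

```python
def preorder(tree, root):
    """Return the nodes under root in pre-order."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree.get(node, ())))
    return order


def last_descendant(tree, node):
    """Follow last children downward; the result is the final node of
    node's subtree in pre-order (node itself if it has no children)."""
    while tree.get(node):
        node = tree[node][-1]
    return node


def walk_skipping(tree, root, skip):
    """Pre-order walk that reports each node but does not descend below
    any node listed in skip."""
    order = preorder(tree, root)
    index = {n: i for i, n in enumerate(order)}
    seen = []
    i = 0
    while i < len(order):
        node = order[i]
        seen.append(node)
        if node in skip:
            # Jump to the end of the subtree; the unchanged "next" step
            # below then moves past it.
            i = index[last_descendant(tree, node)]
        i += 1  # the ordinary "next" step
    return seen


tree = {"r": ["x", "y"], "x": ["x1", "x2"], "x2": ["x3"], "y": ["y1"]}
result = walk_skipping(tree, "r", {"x"})
print(result)
```

The loop body and its advance step are the same whether or not a subtree is skipped, which is the property the commit message describes.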

A script to generate overlapping mount propagation trees:

$ cat run.sh
set -e
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/1 /mnt/2
mount -t tmpfs zdtm /mnt/1
mount --make-shared /mnt/1
mkdir /mnt/1/1

iteration=10
if [ -n "$1" ] ; then
	iteration=$1
fi

for i in $(seq $iteration); do
	mount --bind /mnt/1/1 /mnt/1/1
done

mount --rbind /mnt/1 /mnt/2

TIMEFORMAT='%Rs'
nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 ))
echo -n "umount -l /mnt/1 -> $nr        "
time umount -l /mnt/1

nr=$(cat /proc/self/mountinfo | grep zdtm | wc -l )
time umount -l /mnt/2

$ for i in $(seq 9 19); do echo $i; unshare -Urm bash ./run.sh $i; done
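
The mount counts the script prints follow from its `nr` formula, 2 ** (iteration + 1) + 1. As a quick arithmetic check (a Python sketch; `expected_mounts` is a hypothetical helper, not part of the script), the iteration range 9 to 19 used in the loop above gives:

```python
def expected_mounts(iteration):
    """Mirror the script's nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 ))."""
    return 2 ** (iteration + 1) + 1


# Same range as `seq 9 19` in the shell loop.
counts = [expected_mounts(i) for i in range(9, 20)]
for iteration, count in zip(range(9, 20), counts):
    print(iteration, count)
```

Each extra iteration roughly doubles the number of mounts, from 1025 at iteration 9 to 1048577 at iteration 19.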

Here are the performance numbers with and without the patch:

     mhash |  8192   |  8192  | 1048576 | 1048576
    mounts | before  | after  |  before | after
    ------------------------------------------------
      1025 |  0.040s | 0.016s |  0.038s | 0.019s
      2049 |  0.094s | 0.017s |  0.080s | 0.018s
      4097 |  0.243s | 0.019s |  0.206s | 0.023s
      8193 |  1.202s | 0.028s |  1.562s | 0.032s
     16385 |  9.635s | 0.036s |  9.952s | 0.041s
     32769 | 60.928s | 0.063s | 44.321s | 0.064s
     65537 |         | 0.097s |         | 0.097s
    131073 |         | 0.233s |         | 0.176s
    262145 |         | 0.653s |         | 0.344s
    524289 |         | 2.305s |         | 0.735s
   1048577 |         | 7.107s |         | 2.603s
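
One way to read the table is to look at how the time changes each time the mount count doubles. The Python sketch below does this for the mhash 8192 columns, using only the rows where both a "before" and an "after" figure are reported; the variable names and the `doubling_ratios` helper are hypothetical.

```python
# Timings in seconds copied from the mhash 8192 columns, rows 1025..32769.
before = [0.040, 0.094, 0.243, 1.202, 9.635, 60.928]
after = [0.016, 0.017, 0.019, 0.028, 0.036, 0.063]


def doubling_ratios(times):
    """Ratio of each timing to the previous one (mount count doubles per row)."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


before_ratios = doubling_ratios(before)
after_ratios = doubling_ratios(after)

print([round(r, 2) for r in before_ratios])
print([round(r, 2) for r in after_ratios])
```

A ratio near 2 per doubling corresponds to roughly linear growth. Over these rows the "before" ratios climb well above 2, while the "after" ratios stay below 2.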

Andrei Vagin reports fixing the performance problem is part of the
work to fix CVE-2016-6213.

Fixes: a05964f391 ("[PATCH] shared mounts handling: umount")
Reported-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:42:22 +02:00
9p 9p: fix a potential acl leak 2017-05-14 14:00:13 +02:00
adfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
afs fs: Better permission checking for submounts 2017-03-15 10:02:44 +08:00
autofs4 autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL 2017-06-29 13:00:28 +02:00
befs befs fixes for 4.9-rc1 2016-10-15 12:09:13 -07:00
bfs Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
btrfs Btrfs: fix truncate down when no_holes feature is enabled 2017-07-05 14:40:22 +02:00
cachefiles Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
ceph ceph: choose readdir frag based on previous readdir reply 2017-07-12 15:01:02 +02:00
cifs CIFS: Improve readdir verbosity 2017-06-29 13:00:29 +02:00
coda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
configfs configfs: Fix race between create_link and configfs_rmdir 2017-06-24 07:11:12 +02:00
cramfs more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
crypto fscrypt: avoid collisions when presenting long encrypted filenames 2017-05-25 15:44:38 +02:00
debugfs fs: Better permission checking for submounts 2017-03-15 10:02:44 +08:00
devpts Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
dlm dlm: free workqueues after the connections 2016-10-10 09:54:00 -05:00
ecryptfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
efivarfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
efs fs/efs/super.c: fix return value 2016-05-20 17:58:30 -07:00
exofs fs: exofs: print a hex number after a 0x prefix 2016-10-27 18:43:43 -07:00
exportfs exportfs: be careful to only return expected errors. 2016-10-06 09:07:44 -04:00
ext2 ext2: avoid bogus -Wmaybe-uninitialized warning 2016-10-18 11:29:35 +02:00
ext4 ext4: check return value of kstrtoull correctly in reserved_clusters_store 2017-07-15 12:16:16 +02:00
f2fs crypto: Work around deallocated stack frame reference gcc bug on sparc. 2017-06-24 07:11:17 +02:00
fat fat: fix using uninitialized fields of fat_inode/fsinfo_inode 2017-03-15 10:02:52 +08:00
freevxfs freevxfs: update Kconfig information 2016-06-13 10:20:39 +02:00
fscache FS-Cache: Initialise stores_lock in netfs cookie 2017-06-17 06:41:52 +02:00
fuse fuse: add missing FR_FORCE 2017-03-12 06:41:47 +01:00
gfs2 gfs2: Fix glock rhashtable rcu bug 2017-07-12 15:01:06 +02:00
hfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hfsplus Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hostfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hugetlbfs mm: larger stack guard gap, between vmas 2017-06-24 07:11:18 +02:00
isofs isofs: add KERN_CONT to printing of ER records 2016-11-30 10:41:26 -08:00
jbd2 jbd2: don't leak memory if setting up journal fails 2017-03-30 09:41:27 +02:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
jfs fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
kernfs kernfs: Add noop_fsync to supported kernfs_file_fops 2016-10-27 17:47:11 +02:00
lockd treewide: remove redundant #include <linux/kconfig.h> 2016-10-11 15:06:33 -07:00
logfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
minix Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
nfs NFSv4.1: Fix a race in nfs4_proc_layoutget 2017-07-05 14:40:18 +02:00
nfs_common
nfsd fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
nilfs2 fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
nls
notify fanotify: don't expose EOPENSTALE to userspace 2017-05-25 15:44:31 +02:00
ntfs fs: remove the never implemented aio_fsync file operation 2016-10-30 13:09:42 -04:00
ocfs2 ocfs2: o2hb: revert hb threshold to keep compatible 2017-07-05 14:40:29 +02:00
omfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
openpromfs fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
orangefs fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
overlayfs ovl: fix d_real() for stacked fs 2016-11-29 10:20:24 +01:00
proc mm: larger stack guard gap, between vmas 2017-06-24 07:11:18 +02:00
pstore pstore: Shut down worker when unregistering 2017-05-20 14:28:42 +02:00
qnx4 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
qnx6 more trivial ->iterate_shared conversions 2016-05-09 11:41:14 -04:00
quota quota: fill in Q_XGETQSTAT inode information for inactive quotas 2016-08-15 17:43:31 +02:00
ramfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
reiserfs fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
romfs romfs: use different way to generate fsid for BLOCK or MTD 2017-06-17 06:41:56 +02:00
squashfs vfs: Remove {get,set,remove}xattr inode operations 2016-10-07 21:48:36 -04:00
sysfs sysfs: be careful of error returns from ops->show() 2017-04-12 12:41:11 +02:00
sysv Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
tracefs fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
ubifs ubifs: Fix O_TMPFILE corner case in ubifs_link() 2017-04-27 09:10:38 +02:00
udf fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
ufs ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path 2017-06-14 15:06:01 +02:00
xfs fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
aio.c aio: fix lock dep warning 2017-07-05 14:40:26 +02:00
anon_inodes.c
attr.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
bad_inode.c bad_inode: add missing i_op initializers 2017-01-09 08:32:24 +01:00
binfmt_aout.c fs: fix binfmt_aout.c build error 2016-05-28 16:34:59 -07:00
binfmt_elf.c binfmt_elf: use ELF_ET_DYN_BASE only for PIE 2017-07-21 07:42:21 +02:00
binfmt_elf_fdpic.c elf_fdpic_transfer_args_to_stack(): make it generic 2016-07-25 16:51:49 +10:00
binfmt_em86.c fs/binfmt_em86.c: fix incompatible pointer type 2016-08-02 19:35:15 -04:00
binfmt_flat.c binfmt_flat: allow compressed flat binary format to work on MMU systems 2016-07-28 13:29:12 +10:00
binfmt_misc.c fs: Replace current_fs_time() with current_time() 2016-09-27 21:06:22 -04:00
binfmt_script.c
block_dev.c fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
buffer.c fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
char_dev.c dax: define a unified inode/address_space for device-dax mappings 2016-08-23 22:58:51 -07:00
compat.c compat: remove compat_printk() 2016-09-27 21:20:53 -04:00
compat_binfmt_elf.c
compat_ioctl.c fs: compat_ioctl: add pretimeout functions for watchdogs 2016-09-24 09:27:18 +02:00
coredump.c coredump: Ensure proper size of sparse core files 2017-07-05 14:40:26 +02:00
dax.c fs: break out of iomap_file_buffered_write on fatal signals 2017-02-09 08:08:31 +01:00
dcache.c fs/dcache.c: fix spin lockup issue on nlru->lock 2017-07-21 07:42:21 +02:00
dcookies.c
direct-io.c fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
drop_caches.c
eventfd.c
eventpoll.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
exec.c exec: Limit arg stack to at most 75% of _STK_LIM 2017-07-21 07:42:22 +02:00
fcntl.c fs: add a VALID_OPEN_FLAGS 2017-07-12 15:01:02 +02:00
fhandle.c
file.c fs/file: more unsigned file descriptors 2016-09-27 18:47:38 -04:00
file_table.c
filesystems.c
fs-writeback.c mm, writeback: flush plugged IO in wakeup_flusher_threads() 2016-08-09 19:58:06 -06:00
fs_pin.c
fs_struct.c
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
internal.h Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 13:04:49 -07:00
ioctl.c vfs: cap dedupe request structure size at PAGE_SIZE 2016-09-15 13:29:52 -07:00
iomap.c fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
Kconfig mm/hugetlb: introduce ARCH_HAS_GIGANTIC_PAGE 2016-10-07 18:46:29 -07:00
Kconfig.binfmt ARM: 8594/1: enable binfmt_flat on systems with an MMU 2016-08-12 16:47:05 +01:00
libfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
locks.c locking, fs/locks: Add missing file_sem locks 2016-10-18 12:21:28 +02:00
Makefile fs: introduce iomap infrastructure 2016-06-21 09:23:11 +10:00
mbcache.c mbcache: fix to detect failure of register_shrinker 2016-08-31 11:44:36 -04:00
mount.h mnt: In propgate_umount handle visiting mounts in any order 2017-07-21 07:42:22 +02:00
mpage.c fs: add i_blocksize() 2017-06-14 15:06:00 +02:00
namei.c fs: Better permission checking for submounts 2017-03-15 10:02:44 +08:00
namespace.c mnt: In propgate_umount handle visiting mounts in any order 2017-07-21 07:42:22 +02:00
no-block.c
nsfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
open.c fs: completely ignore unknown open flags 2017-07-12 15:01:02 +02:00
pipe.c pipe: cap initial pipe capacity according to pipe-max-size limit 2016-10-11 15:06:32 -07:00
pnode.c mnt: Make propagate_umount less slow for overlapping mount propagation trees 2017-07-21 07:42:22 +02:00
pnode.h mnt: Tuck mounts under others instead of creating shadow/side mounts. 2017-03-15 10:02:43 +08:00
posix_acl.c tmpfs: clear S_ISGID when setting posix ACLs 2017-01-26 08:24:37 +01:00
proc_namespace.c
read_write.c fs: pass on flags in compat_writev 2017-06-24 07:11:12 +02:00
readdir.c restore killability of old mutex_lock_killable(&inode->i_mutex) users 2016-05-26 00:13:25 -04:00
select.c fs/select: add vmalloc fallback for select(2) 2016-10-11 15:06:30 -07:00
seq_file.c seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char 2016-10-07 18:46:30 -07:00
signalfd.c
splice.c vfs: fix uninitialized flags in splice_to_pipe() 2017-02-23 17:44:35 +01:00
stack.c
stat.c ufs: restore maintaining ->i_blocks 2017-06-14 15:06:01 +02:00
statfs.c
super.c fs: Better permission checking for submounts 2017-03-15 10:02:44 +08:00
sync.c
timerfd.c timerfd: Protect the might cancel mechanism proper 2017-05-08 07:47:54 +02:00
userfaultfd.c userfaultfd: fix SIGBUS resulting from false rwsem wakeups 2017-06-17 06:41:56 +02:00
utimes.c Merge remote-tracking branch 'jk/vfs' into work.misc 2016-10-08 11:06:08 -04:00
xattr.c fs/xattr.c: zero out memory copied to userspace in getxattr 2017-05-20 14:28:39 +02:00