alistair23-linux/fs
Filipe Manana dac5705cad Btrfs: fix crash while doing a ranged fsync
While doing a ranged fsync, that is, one whose range doesn't cover the
whole possible file range (0 to LLONG_MAX), we can crash under certain
circumstances with a trace like the following:

[41074.641913] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
(...)
[41074.642692] CPU: 0 PID: 24580 Comm: fsx Not tainted 3.16.0-fdm-btrfs-next-45+ #1
(...)
[41074.643886] RIP: 0010:[<ffffffffa01ecc99>]  [<ffffffffa01ecc99>] btrfs_ordered_update_i_size+0x279/0x2b0 [btrfs]
(...)
[41074.644919] Stack:
(...)
[41074.644919] Call Trace:
[41074.644919]  [<ffffffffa01db531>] btrfs_truncate_inode_items+0x3f1/0xa10 [btrfs]
[41074.644919]  [<ffffffffa01eb54f>] ? btrfs_get_logged_extents+0x4f/0x80 [btrfs]
[41074.644919]  [<ffffffffa02137a9>] btrfs_log_inode+0x2f9/0x970 [btrfs]
[41074.644919]  [<ffffffff81090875>] ? sched_clock_local+0x25/0xa0
[41074.644919]  [<ffffffff8164a55e>] ? mutex_unlock+0xe/0x10
[41074.644919]  [<ffffffff810af51d>] ? trace_hardirqs_on+0xd/0x10
[41074.644919]  [<ffffffffa0214b4f>] btrfs_log_inode_parent+0x1ef/0x560 [btrfs]
[41074.644919]  [<ffffffff811d0c55>] ? dget_parent+0x5/0x180
[41074.644919]  [<ffffffffa0215d11>] btrfs_log_dentry_safe+0x51/0x80 [btrfs]
[41074.644919]  [<ffffffffa01e2d1a>] btrfs_sync_file+0x1ba/0x3e0 [btrfs]
[41074.644919]  [<ffffffff811eda6b>] vfs_fsync_range+0x1b/0x30
(...)

The necessary conditions that lead to such crash are:

* an incremental fsync (when the inode doesn't have the
  BTRFS_INODE_NEEDS_FULL_SYNC flag set) happened for our file and it logged
  a file extent item ending at offset X;

* the file got the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode, due
  to a file truncate operation that reduces the file to a size smaller
  than X;

* a ranged fsync call happens (via an msync for example), with a range that
  doesn't cover the whole file and the end of this range, lets call it Y, is
  smaller than X;

* btrfs_log_inode, sees the flag BTRFS_INODE_NEEDS_FULL_SYNC set and
  calls btrfs_truncate_inode_items() to remove all items from the log
  tree that are associated with our file;

* btrfs_truncate_inode_items() removes all of the inode's items, and the lowest
  file extent item it removed is the one ending at offset X, where X > 0 and
  X > Y - before returning, it calls btrfs_ordered_update_i_size() with an offset
  parameter set to X;

* btrfs_ordered_update_i_size() sees that X is greater then the current ordered
  size (btrfs_inode's disk_i_size) and then it assumes there can't be any ongoing
  ordered operation with a range covering the offset X, calling a BUG_ON() if
  such ordered operation exists. This assumption is made because the disk_i_size
  is only increased after the corresponding file extent item is added to the
  btree (btrfs_finish_ordered_io);

* But because our fsync covers only a limited range, such an ordered extent might
  exist, and our fsync callback (btrfs_sync_file) doesn't wait for such ordered
  extent to finish when calling btrfs_wait_ordered_range();

And then by the time btrfs_ordered_update_i_size() is called, via:

   btrfs_sync_file() ->
       btrfs_log_dentry_safe() ->
           btrfs_log_inode_parent() ->
               btrfs_log_inode() ->
                   btrfs_truncate_inode_items() ->
                       btrfs_ordered_update_i_size()

We hit the BUG_ON(), which could never happen if the fsync range covered the whole
possible file range (0 to LLONG_MAX), as we would wait for all ordered extents to
finish before calling btrfs_truncate_inode_items().

So just don't call btrfs_ordered_update_i_size() if we're removing the inode's items
from a log tree, which isn't supposed to change the in memory inode's disk_i_size.

Issue found while running xfstests/generic/127 (happens very rarely for me), more
specifically via the fsx calls that use memory mapped IO (and issue msync calls).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-02 16:46:05 -07:00
..
9p Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
adfs write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
afs AFS: Correctly assemble the client UUID 2014-07-29 10:14:36 -07:00
autofs4 autofs4: fix false positive compile error 2014-07-03 09:21:53 -07:00
befs fs/befs: kernel-doc fixes 2014-06-06 16:08:09 -07:00
bfs write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
btrfs Btrfs: fix crash while doing a ranged fsync 2014-09-02 16:46:05 -07:00
cachefiles fs/cachefiles: replace kerror by pr_err 2014-06-06 16:08:14 -07:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-06-12 23:06:23 -07:00
cifs [CIFS] fix mount failure with broken pathnames when smb3 mount with mapchars option 2014-06-24 08:10:24 -05:00
coda coda: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
configfs fs/configfs: use pr_fmt 2014-06-04 16:53:53 -07:00
cramfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
debugfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
devpts fs/devpts/inode.c: convert printk to pr_foo() 2014-06-06 16:08:14 -07:00
dlm dlm: keep listening connection alive with sctp mode 2014-06-12 10:26:14 -05:00
ecryptfs write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
efivarfs fs/efivarfs/super.c: use static const for dentry_operations 2014-06-04 16:54:14 -07:00
efs fs/efs: convert printk(KERN_DEBUG to pr_debug 2014-06-04 16:54:21 -07:00
exofs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
exportfs fs/exportfs/expfs.c: kernel-doc warning fixes 2014-06-04 16:54:14 -07:00
ext2 ->splice_write() via ->write_iter() 2014-06-12 00:18:51 -04:00
ext3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
ext4 ext4: fix potential null pointer dereference in ext4_free_inode 2014-07-12 16:11:42 -04:00
f2fs f2fs: avoid to access NULL pointer in issue_flush_thread 2014-07-09 05:59:55 -07:00
fat Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
freevxfs Major changes for 3.14 include support for the newly added ZERO_RANGE 2014-04-04 15:39:39 -07:00
fscache fscache: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
fuse fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT 2014-07-22 16:37:43 +02:00
gfs2 GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes 2014-07-18 11:15:14 +01:00
hfs write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
hfsplus Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
hostfs write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
hpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
hppfs
hugetlbfs fs/hugetlbfs/inode.c: remove null test before kfree 2014-06-04 16:54:11 -07:00
isofs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-04-07 17:59:17 -07:00
jbd fs/jbd/revoke.c: replace shift loop by ilog2 2014-05-21 10:26:13 +02:00
jbd2 ext4: disable synchronous transaction batching if max_batch_time==0 2014-07-05 19:18:22 -04:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
jfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
kernfs Merge branch 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-07-10 11:38:23 -07:00
lockd Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux 2014-06-10 11:50:57 -07:00
logfs write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
minix write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
ncpfs fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul 2014-06-04 16:54:21 -07:00
nfs NFS: Don't reset pg_moreio in __nfs_pageio_add_request 2014-07-13 15:18:44 -04:00
nfs_common
nfsd NFSD: Fix crash encoding lock reply on 32-bit 2014-07-23 10:31:56 -04:00
nilfs2 write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
nls
notify inotify: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
ocfs2 ocfs2/dlm: do not purge lockres that is queued for assert master 2014-06-23 16:47:45 -07:00
omfs write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
openpromfs fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
proc /proc/stat: convert to single_open_size() 2014-07-03 09:21:54 -07:00
pstore fs/pstore: logging clean-up 2014-06-06 16:08:13 -07:00
qnx4 fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
qnx6 fs: push sync_filesystem() down to the file system's remount_fs() 2014-03-13 10:14:33 -04:00
quota quota: missing lock in dqcache_shrink_scan() 2014-07-15 22:36:18 +02:00
ramfs ->splice_write() via ->write_iter() 2014-06-12 00:18:51 -04:00
reiserfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
romfs switch simple generic_file_aio_read() users to ->read_iter() 2014-05-06 17:37:55 -04:00
squashfs fs/squashfs/squashfs.h: replace pr_warning by pr_warn 2014-06-04 16:53:52 -07:00
sysfs kernfs: move the last knowledge of sysfs out from kernfs 2014-06-03 08:11:18 -07:00
sysv write_iter variants of {__,}generic_file_aio_write() 2014-05-06 17:38:00 -04:00
ubifs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
udf udf: switch to ->write_iter() 2014-05-06 17:39:36 -04:00
ufs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
xfs xfs: null unused quota inodes when quota is on 2014-07-15 07:28:41 +10:00
aio.c aio: protect reqs_available updates from changes in interrupt handlers 2014-07-14 13:05:26 -04:00
anon_inodes.c vfs: Allocate anon_inode_inode in anon_inode_init() 2014-03-27 09:52:54 -07:00
attr.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-10 13:57:22 -07:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-05 08:05:29 -07:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c fs/binfmt_flat.c: make old_reloc() static 2014-06-04 16:54:21 -07:00
binfmt_misc.c binfmt_misc: add missing 'break' statement 2014-04-03 16:21:16 -07:00
binfmt_script.c
binfmt_som.c
block_dev.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
buffer.c mm: non-atomically mark page accessed during page cache allocation where possible 2014-06-04 16:54:10 -07:00
char_dev.c
compat.c locks: rename file-private locks to "open file description locks" 2014-04-22 08:23:58 -04:00
compat_binfmt_elf.c
compat_ioctl.c
coredump.c coredump: fix the setting of PF_DUMPCORE 2014-07-23 15:10:54 -07:00
dcache.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
dcookies.c
direct-io.c direct-io: fix AIO regression 2014-08-01 02:35:51 -04:00
drop_caches.c fs: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
eventfd.c
eventpoll.c epoll: fix use-after-free in eventpoll_release_file 2014-06-16 17:21:59 -10:00
exec.c perf: Differentiate exec() and non-exec() comm events 2014-06-06 07:56:22 +02:00
fcntl.c locks: rename file-private locks to "open file description locks" 2014-04-22 08:23:58 -04:00
fhandle.c
file.c fs/file.c: don't open-code kvfree() 2014-05-06 17:31:10 -04:00
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
filesystems.c sys_sysfs: Add CONFIG_SYSFS_SYSCALL 2014-04-03 16:21:05 -07:00
fs-writeback.c One of the main highlights this time, is not the patches themselves 2014-04-04 14:49:16 -07:00
fs_struct.c
inode.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-10 13:57:22 -07:00
internal.h
ioctl.c
Kconfig
Kconfig.binfmt
libfs.c fs/libfs.c: add generic data flush to fsync 2014-06-04 16:53:55 -07:00
locks.c locks: set fl_owner for leases back to current->files 2014-06-10 12:29:05 -04:00
Makefile block: move ioprio.c from fs/ to block/ 2014-05-19 11:02:18 -06:00
mbcache.c fs/mbcache: replace __builtin_log2() with ilog2() 2014-06-25 22:08:29 -04:00
mount.h reduce m_start() cost... 2014-04-01 23:19:09 -04:00
mpage.c fs/block_dev.c: add bdev_read_page() and bdev_write_page() 2014-06-04 16:54:02 -07:00
namei.c fs: umount on symlink leaks mnt count 2014-07-24 06:18:12 -04:00
namespace.c VFS: Make delayed_free() call free_vfsmnt() 2014-04-01 23:19:18 -04:00
no-block.c
open.c vfs: fix check for fallocate on active swapfile 2014-08-01 02:36:04 -04:00
pipe.c new helper: copy_page_from_iter() 2014-05-06 17:39:42 -04:00
pnode.c smarter propagate_mnt() 2014-04-01 23:19:08 -04:00
pnode.h smarter propagate_mnt() 2014-04-01 23:19:08 -04:00
posix_acl.c posix_acl: handle NULL ACL in posix_acl_equiv_mode 2014-05-06 13:58:42 -04:00
proc_namespace.c reduce m_start() cost... 2014-04-01 23:19:09 -04:00
read_write.c switch simple generic_file_aio_read() users to ->read_iter() 2014-05-06 17:37:55 -04:00
readdir.c fanotify: create FAN_ACCESS event for readdir 2014-06-04 16:53:52 -07:00
select.c
seq_file.c fs/seq_file: fallback to vmalloc allocation 2014-07-03 09:21:54 -07:00
signalfd.c
splice.c Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linus 2014-06-12 00:28:09 -04:00
stack.c
stat.c
statfs.c
super.c fs/superblock: avoid locking counting inodes and dentries before reclaiming them 2014-06-04 16:54:11 -07:00
sync.c
timerfd.c
utimes.c
xattr.c simple_xattr: permit 0-size extended attributes 2014-07-23 15:10:55 -07:00