alistair23-linux/fs/btrfs
Filipe Manana dac5705cad Btrfs: fix crash while doing a ranged fsync
While doing a ranged fsync, that is, one whose range doesn't cover the
whole possible file range (0 to LLONG_MAX), we can crash under certain
circumstances with a trace like the following:

[41074.641913] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
(...)
[41074.642692] CPU: 0 PID: 24580 Comm: fsx Not tainted 3.16.0-fdm-btrfs-next-45+ #1
(...)
[41074.643886] RIP: 0010:[<ffffffffa01ecc99>]  [<ffffffffa01ecc99>] btrfs_ordered_update_i_size+0x279/0x2b0 [btrfs]
(...)
[41074.644919] Stack:
(...)
[41074.644919] Call Trace:
[41074.644919]  [<ffffffffa01db531>] btrfs_truncate_inode_items+0x3f1/0xa10 [btrfs]
[41074.644919]  [<ffffffffa01eb54f>] ? btrfs_get_logged_extents+0x4f/0x80 [btrfs]
[41074.644919]  [<ffffffffa02137a9>] btrfs_log_inode+0x2f9/0x970 [btrfs]
[41074.644919]  [<ffffffff81090875>] ? sched_clock_local+0x25/0xa0
[41074.644919]  [<ffffffff8164a55e>] ? mutex_unlock+0xe/0x10
[41074.644919]  [<ffffffff810af51d>] ? trace_hardirqs_on+0xd/0x10
[41074.644919]  [<ffffffffa0214b4f>] btrfs_log_inode_parent+0x1ef/0x560 [btrfs]
[41074.644919]  [<ffffffff811d0c55>] ? dget_parent+0x5/0x180
[41074.644919]  [<ffffffffa0215d11>] btrfs_log_dentry_safe+0x51/0x80 [btrfs]
[41074.644919]  [<ffffffffa01e2d1a>] btrfs_sync_file+0x1ba/0x3e0 [btrfs]
[41074.644919]  [<ffffffff811eda6b>] vfs_fsync_range+0x1b/0x30
(...)

The necessary conditions that lead to such a crash are:

* an incremental fsync (when the inode doesn't have the
  BTRFS_INODE_NEEDS_FULL_SYNC flag set) happened for our file and it logged
  a file extent item ending at offset X;

* the file got the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode, due
  to a file truncate operation that reduces the file to a size smaller
  than X;

* a ranged fsync call happens (via an msync for example), with a range that
  doesn't cover the whole file and whose end, let's call it Y, is
  smaller than X;

* btrfs_log_inode() sees the flag BTRFS_INODE_NEEDS_FULL_SYNC set and
  calls btrfs_truncate_inode_items() to remove all items from the log
  tree that are associated with our file;

* btrfs_truncate_inode_items() removes all of the inode's items, and the lowest
  file extent item it removed is the one ending at offset X, where X > 0 and
  X > Y; before returning, it calls btrfs_ordered_update_i_size() with an offset
  parameter set to X;

* btrfs_ordered_update_i_size() sees that X is greater than the current ordered
  size (btrfs_inode's disk_i_size) and then it assumes there can't be any ongoing
  ordered operation with a range covering the offset X, hitting a BUG_ON() if
  such an ordered operation exists. This assumption is made because the disk_i_size
  is only increased after the corresponding file extent item is added to the
  btree (btrfs_finish_ordered_io);

* but because our fsync covers only a limited range, such an ordered extent might
  exist, and our fsync callback (btrfs_sync_file) doesn't wait for that ordered
  extent to finish when calling btrfs_wait_ordered_range();

And then by the time btrfs_ordered_update_i_size() is called, via:

   btrfs_sync_file() ->
       btrfs_log_dentry_safe() ->
           btrfs_log_inode_parent() ->
               btrfs_log_inode() ->
                   btrfs_truncate_inode_items() ->
                       btrfs_ordered_update_i_size()

we hit the BUG_ON(). This could never happen if the fsync range covered the whole
possible file range (0 to LLONG_MAX), as we would wait for all ordered extents to
finish before calling btrfs_truncate_inode_items().
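
To make the range argument concrete, here is a minimal user-space sketch (not
kernel code) of why a ranged wait can miss the ordered extent. The struct, the
function and the example offsets are hypothetical; only the interval overlap
logic mirrors the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an in-flight ordered extent: [start, start + len). */
struct toy_ordered_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Returns true if a wait over the inclusive byte range [start, end] would
 * include this ordered extent, i.e. if the two intervals overlap.
 */
static bool toy_wait_covers(const struct toy_ordered_extent *oe,
			    uint64_t start, uint64_t end)
{
	return oe->start <= end && oe->start + oe->len > start;
}

/* Example values: an ordered extent covering offset X = 8192. */
static const struct toy_ordered_extent toy_oe = { .start = 8192, .len = 4096 };
```

With these values, toy_wait_covers(&toy_oe, 0, 4095) is false (a ranged fsync
ending at Y = 4095 does not wait for the extent), while
toy_wait_covers(&toy_oe, 0, INT64_MAX) is true (a full range fsync does).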

So just don't call btrfs_ordered_update_i_size() if we're removing the inode's items
from a log tree, since that isn't supposed to change the in-memory inode's disk_i_size.
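
The shape of that guard can be sketched with a simplified user-space model.
This is not the actual patch: the toy_* types and functions are hypothetical,
and the is_log_tree flag is an assumption standing in for however the kernel
tells a log tree apart from a regular subvolume tree, which this message does
not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the in-memory inode. */
struct toy_inode {
	uint64_t last_offset;	/* offset passed to the last update call */
	int update_calls;	/* how many times the update path ran */
};

/* Stands in for btrfs_ordered_update_i_size(); it only records the call. */
static void toy_update_i_size(struct toy_inode *inode, uint64_t offset)
{
	inode->last_offset = offset;
	inode->update_calls++;
}

/*
 * Stands in for the tail of btrfs_truncate_inode_items(). Item removal is
 * elided; only the decision whether to touch the ordered size is modelled.
 */
static void toy_truncate_inode_items(struct toy_inode *inode,
				     bool is_log_tree, uint64_t last_size)
{
	/* Removing items from a log tree must not change disk_i_size. */
	if (!is_log_tree)
		toy_update_i_size(inode, last_size);
}
```

In this model, truncating items from a log tree leaves update_calls at 0, so
the code path containing the BUG_ON() is never entered from the fsync path.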

Issue found while running xfstests/generic/127 (happens very rarely for me), more
specifically via the fsx calls that use memory-mapped IO (and issue msync calls).
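
For illustration, the kind of call sequence that produces a ranged sync from
user space looks roughly like the sketch below. It is not a reproducer for the
crash (which also needs the earlier incremental fsync, the truncate and an
in-flight ordered extent), and it is not taken from fsx; the function name and
the choice of syncing only the first page are assumptions made for the example.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes of the file at `path`, dirty the whole mapping, then
 * msync() only the first page. Returns 0 on success, -1 on error.
 */
static int toy_partial_msync(const char *path, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	int fd, ret = -1;
	char *map;

	if (page <= 0 || len < (size_t)page)
		return -1;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	memset(map, 0xab, len);		/* dirty every mapped page */
	if (msync(map, (size_t)page, MS_SYNC) == 0)	/* sync the first page only */
		ret = 0;

	munmap(map, len);
out_close:
	close(fd);
	return ret;
}
```

The point of the sketch is only that the synced range (one page) is smaller
than the dirtied range, which is the "range that doesn't cover the whole file"
condition listed above.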

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-02 16:46:05 -07:00
tests Btrfs: fix qgroups sanity test crash or hang 2014-06-13 09:52:24 -07:00
acl.c btrfs: remove useless ACL check 2014-06-09 17:20:42 -07:00
async-thread.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
async-thread.h Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
backref.c Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch 2014-08-15 07:43:19 -07:00
backref.h Btrfs: fix scrub_print_warning to handle skinny metadata extents 2014-06-09 17:21:17 -07:00
btrfs_inode.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
check-integrity.c btrfs: check_int: propagate out-of-memory error upwards 2014-06-09 17:20:21 -07:00
check-integrity.h
compression.c btrfs compression: reuse recently used workspace 2014-06-28 13:48:46 -07:00
compression.h
ctree.c Btrfs: __btrfs_mod_ref should always use no_quota 2014-08-15 07:43:11 -07:00
ctree.h Btrfs: __btrfs_mod_ref should always use no_quota 2014-08-15 07:43:11 -07:00
delayed-inode.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
delayed-inode.h
delayed-ref.c Btrfs: rework qgroup accounting 2014-06-09 17:20:48 -07:00
delayed-ref.h Btrfs: rework qgroup accounting 2014-06-09 17:20:48 -07:00
dev-replace.c btrfs: dev replace should replace the sysfs entry 2014-06-28 13:48:44 -07:00
dev-replace.h
dir-item.c
disk-io.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
disk-io.h Btrfs: add sanity tests for new qgroup accounting code 2014-06-09 17:20:49 -07:00
export.c
export.h
extent-tree.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
extent_io.c Btrfs: fix crash on endio of reading corrupted block 2014-08-21 07:55:30 -07:00
extent_io.h Btrfs: remove unused wait queue in struct extent_buffer 2014-06-19 14:20:28 -07:00
extent_map.c Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
extent_map.h Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
file-item.c Btrfs: fix csum tree corruption, duplicate and outdated checksums 2014-08-15 07:43:40 -07:00
file.c Btrfs: fix filemap_flush call in btrfs_file_release 2014-08-21 07:55:31 -07:00
free-space-cache.c Btrfs: fix broken free space cache after the system crashed 2014-06-19 14:20:54 -07:00
free-space-cache.h
hash.c
hash.h
inode-item.c
inode-map.c btrfs: remove newline from inode cache kthread name 2014-06-09 17:20:53 -07:00
inode-map.h
inode.c Btrfs: fix crash while doing a ranged fsync 2014-09-02 16:46:05 -07:00
ioctl.c Btrfs: fix autodefrag with compression 2014-08-27 08:45:37 -07:00
Kconfig
locking.c Btrfs: fix deadlocks with trylock on tree nodes 2014-06-19 14:19:55 -07:00
locking.h
lzo.c btrfs: return errno instead of -1 from compression 2014-06-09 17:20:21 -07:00
Makefile Btrfs: add sanity tests for new qgroup accounting code 2014-06-09 17:20:49 -07:00
math.h
ordered-data.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
ordered-data.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
orphan.c
print-tree.c Btrfs: fix btrfs_print_leaf for skinny metadata 2014-07-03 07:04:16 -07:00
print-tree.h
props.c
props.h
qgroup.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
qgroup.h btrfs: qgroup: account shared subtrees during snapshot delete 2014-08-15 07:43:14 -07:00
raid56.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
raid56.h
rcu-string.h
reada.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
relocation.c btrfs: remove stale newlines from log messages 2014-06-09 17:20:53 -07:00
root-tree.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
scrub.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
send.c Btrfs: send, use the right limits for xattr names and values 2014-06-09 17:21:00 -07:00
send.h
struct-funcs.c
super.c btrfs: adjust statfs calculations according to raid profiles 2014-08-15 07:43:10 -07:00
sysfs.c Btrfs: fix regression of btrfs device replace 2014-08-21 07:55:20 -07:00
sysfs.h btrfs: dev add should add its sysfs entry 2014-06-28 13:48:43 -07:00
transaction.c btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
transaction.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
tree-defrag.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
tree-log.c Btrfs: fix hole detection during file fsync 2014-08-21 07:55:24 -07:00
tree-log.h Btrfs: use helpers for last_trans_log_full_commit instead of opencode 2014-06-09 17:20:45 -07:00
ulist.c
ulist.h Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch 2014-08-15 07:43:19 -07:00
uuid-tree.c
volumes.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
volumes.h Btrfs: fix deadlock when mounting a degraded fs 2014-06-19 14:20:56 -07:00
xattr.c
xattr.h
zlib.c btrfs: use E2BIG instead of EIO if compression does not help 2014-07-03 07:04:13 -07:00