alistair23-linux/fs/btrfs
Filipe Manana 74121f7cbb Btrfs: fix hole detection during file fsync
The hole detection logic used during a file fsync was incorrect: it
didn't look back, in a previous leaf, for the last file extent item.
That item can sit in a leaf to the left of our leaf and have a
generation lower than the current transaction id. As a result, the
logic could assume a hole exists in the file when none does.

Such false positive hole detection happens in the following scenario:

* We have a file with many file extent items, covering 3 or more
  btree leaves (the first leaf must also contain non file extent items).

* Two ranges of the file are modified, with their extent items located
  in 2 different leaves that aren't consecutive.

* When processing the second modified leaf, we weren't checking whether
  a file extent item exists in some leaf located between our 2 modified
  leaves. We therefore assumed that the range between the last file
  extent item in the first leaf and the first file extent item in the
  second leaf was a hole.
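To make the scenario concrete, here is a simplified C model. It is a sketch under stated assumptions, not the tree-log.c code. The structs `extent` and `leaf` and the function `count_holes()` are hypothetical, and a leaf's `modified` flag stands in for "generation newer than the last committed transaction". The model walks only the modified leaves and remembers the end of the last extent it saw. That reports a false hole when an unvisited leaf sits in between. Reading the last item of the leaf immediately to the left before processing a modified leaf avoids the false hole, which approximates the look-back the fix adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified file extent item (not the kernel's
 * struct btrfs_file_extent_item): just a file range. */
struct extent {
	uint64_t start; /* file offset where the extent begins */
	uint64_t len;   /* length in bytes */
};

/* Hypothetical leaf: a run of extent items plus a flag standing in for
 * "leaf generation is newer than the last committed transaction". */
struct leaf {
	const struct extent *items;
	size_t nr;
	bool modified;
};

/*
 * Count the holes reported while walking only the modified leaves.
 * look_back == false: the previous extent end comes only from leaves we
 * visited (the buggy behaviour). look_back == true: before a modified
 * leaf, take the end of the last extent in the leaf to its left, whether
 * or not that leaf was modified (a sketch of the fix's idea).
 */
static size_t count_holes(const struct leaf *leaves, size_t nr_leaves,
			  bool look_back)
{
	uint64_t prev_end = 0;
	size_t holes = 0;

	for (size_t l = 0; l < nr_leaves; l++) {
		const struct leaf *leaf = &leaves[l];

		/* Unmodified leaves are never visited, as with
		 * btrfs_search_forward() in the scenario above. */
		if (!leaf->modified)
			continue;

		if (look_back && l > 0) {
			const struct leaf *left = &leaves[l - 1];

			assert(left->nr > 0);
			prev_end = left->items[left->nr - 1].start +
				   left->items[left->nr - 1].len;
		}

		for (size_t i = 0; i < leaf->nr; i++) {
			/* A gap after the previous extent's end is a hole. */
			if (leaf->items[i].start > prev_end)
				holes++;
			prev_end = leaf->items[i].start + leaf->items[i].len;
		}
	}
	return holes;
}
```

Consider three contiguous leaves where only the first and the last are modified. Without the look-back, the model reports one hole between them. With it, the model reports none. A real gap inside a single modified leaf is still reported in both modes.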

Fortunately this didn't result in overwriting the log with wrong data.
Instead, it made the last loop in copy_items() attempt to insert a
duplicated key (for a hole file extent item). The file fsync code then
returns -EEXIST to file.c:btrfs_sync_file(), which in turn does a full
transaction commit. A full commit is much more expensive than writing
only to the log tree and waiting for it to be durably persisted (along
with the file's modified extents/pages). Therefore fix the hole
detection logic, so that we don't pay the cost of full transaction
commits.

I could trigger this issue with the following xfstests test. The test
passes both with and without this patch, because the logged data is
never wrong. However, the last fsync call results in a full transaction
commit, due to the -EEXIST error mentioned above. I also observed this
behaviour frequently when running xfstests/generic/075 in a loop.

Test:

    _cleanup()
    {
        _cleanup_flakey
        rm -fr $tmp
    }

    # get standard environment, filters and checks
    . ./common/rc
    . ./common/filter
    . ./common/dmflakey

    # real QA test starts here
    _supported_fs btrfs
    _supported_os Linux
    _require_scratch
    _require_dm_flakey
    _need_to_be_root

    rm -f $seqres.full

    # Create a file with many file extent items, each representing a 4Kb extent.
    # These items span 3 btree leaves, of 16Kb each (default mkfs.btrfs leaf size
    # as of btrfs-progs 3.12).
    _scratch_mkfs -l 16384 >/dev/null 2>&1
    _init_flakey
    SAVE_MOUNT_OPTIONS="$MOUNT_OPTIONS"
    MOUNT_OPTIONS="$MOUNT_OPTIONS -o commit=999"
    _mount_flakey

    # First fsync, inode has BTRFS_INODE_NEEDS_FULL_SYNC flag set.
    $XFS_IO_PROG -f -c "pwrite -S 0x01 -b 4096 0 4096" -c "fsync" \
            $SCRATCH_MNT/foo | _filter_xfs_io

    # For any of the following fsync calls, inode doesn't have the flag
    # BTRFS_INODE_NEEDS_FULL_SYNC set.
    for ((i = 1; i <= 500; i++)); do
        OFFSET=$((4096 * i))
        LEN=4096
        $XFS_IO_PROG -c "pwrite -S 0x01 $OFFSET $LEN" -c "fsync" \
                $SCRATCH_MNT/foo | _filter_xfs_io
    done

    # Commit transaction and bump next transaction's id (to 7).
    sync

    # Truncate will set the BTRFS_INODE_NEEDS_FULL_SYNC flag in the btrfs's
    # inode runtime flags.
    $XFS_IO_PROG -c "truncate 2048000" $SCRATCH_MNT/foo

    # Commit transaction and bump next transaction's id (to 8).
    sync

    # Touch 1 extent item from the first leaf and 1 from the last leaf. The leaf
    # in the middle, containing only file extent items, isn't touched. So the
    # next fsync, when calling btrfs_search_forward(), won't visit that middle
    # leaf. The first and third leaves now have generation 8, while the middle
    # leaf keeps generation 6.
    $XFS_IO_PROG \
        -c "pwrite -S 0xee -b 4096 0 4096" \
        -c "pwrite -S 0xff -b 4096 2043904 4096" \
        -c "fsync" \
        $SCRATCH_MNT/foo | _filter_xfs_io

    _load_flakey_table $FLAKEY_DROP_WRITES
    md5sum $SCRATCH_MNT/foo | _filter_scratch
    _unmount_flakey

    _load_flakey_table $FLAKEY_ALLOW_WRITES
    # During mount, we'll replay the log created by the fsync above, and the file's
    # md5 digest should be the same we got before the unmount.
    _mount_flakey
    md5sum $SCRATCH_MNT/foo | _filter_scratch
    _unmount_flakey
    MOUNT_OPTIONS="$SAVE_MOUNT_OPTIONS"

    status=0
    exit
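The offsets in the test are multiples of the 4096-byte write size. The test's own comment says each pwrite creates one 4 KiB extent item. The arithmetic below is a small sketch with hypothetical helper names that checks which extent indices the final writes touch:

* The loop leaves 501 extents (indices 0 to 500).
* The truncate to 2048000 bytes keeps 500 of them (indices 0 to 499).
* The pwrite at 2043904 rewrites index 499, the last remaining extent.
* The pwrite at offset 0 rewrites index 0, the first extent.

```c
#include <assert.h>
#include <stdint.h>

/* Write size used by every pwrite in the test above. */
#define TEST_BLOCK 4096u

/* Hypothetical helper: index of the 4 KiB extent holding a file offset,
 * assuming one extent item per 4 KiB pwrite as the test comment states. */
static uint64_t extent_index(uint64_t offset)
{
	return offset / TEST_BLOCK;
}

/* Hypothetical helper: whole 4 KiB extents left after a truncate. */
static uint64_t extents_after_truncate(uint64_t new_size)
{
	assert(new_size % TEST_BLOCK == 0);
	return new_size / TEST_BLOCK;
}
```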

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-08-21 07:55:24 -07:00
tests Btrfs: fix qgroups sanity test crash or hang 2014-06-13 09:52:24 -07:00
acl.c btrfs: remove useless ACL check 2014-06-09 17:20:42 -07:00
async-thread.c
async-thread.h
backref.c Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch 2014-08-15 07:43:19 -07:00
backref.h Btrfs: fix scrub_print_warning to handle skinny metadata extents 2014-06-09 17:21:17 -07:00
btrfs_inode.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
check-integrity.c btrfs: check_int: propagate out-of-memory error upwards 2014-06-09 17:20:21 -07:00
check-integrity.h
compression.c btrfs compression: reuse recently used workspace 2014-06-28 13:48:46 -07:00
compression.h
ctree.c Btrfs: __btrfs_mod_ref should always use no_quota 2014-08-15 07:43:11 -07:00
ctree.h Btrfs: __btrfs_mod_ref should always use no_quota 2014-08-15 07:43:11 -07:00
delayed-inode.c btrfs: free delayed node outside of root->inode_lock 2014-06-09 17:21:08 -07:00
delayed-inode.h
delayed-ref.c Btrfs: rework qgroup accounting 2014-06-09 17:20:48 -07:00
delayed-ref.h Btrfs: rework qgroup accounting 2014-06-09 17:20:48 -07:00
dev-replace.c btrfs: dev replace should replace the sysfs entry 2014-06-28 13:48:44 -07:00
dev-replace.h
dir-item.c
disk-io.c Btrfs: Fix wrong device size when we are resizing the device 2014-08-19 08:52:18 -07:00
disk-io.h Btrfs: add sanity tests for new qgroup accounting code 2014-06-09 17:20:49 -07:00
export.c
export.h
extent-tree.c Btrfs: don't consider the missing device when allocating new chunks 2014-08-19 08:52:19 -07:00
extent_io.c btrfs: Return right extent when fiemap gives unaligned offset and len. 2014-08-19 08:52:14 -07:00
extent_io.h Btrfs: remove unused wait queue in struct extent_buffer 2014-06-19 14:20:28 -07:00
extent_map.c Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
extent_map.h Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
file-item.c Btrfs: fix csum tree corruption, duplicate and outdated checksums 2014-08-15 07:43:40 -07:00
file.c Btrfs: fill_holes: Fix slot number passed to hole_mergeable() call. 2014-08-19 08:36:26 -07:00
free-space-cache.c Btrfs: fix broken free space cache after the system crashed 2014-06-19 14:20:54 -07:00
free-space-cache.h
hash.c
hash.h
inode-item.c
inode-map.c btrfs: remove newline from inode cache kthread name 2014-06-09 17:20:53 -07:00
inode-map.h
inode.c Btrfs: ensure tmpfile inode is always persisted with link count of 0 2014-08-21 07:55:23 -07:00
ioctl.c Btrfs: race free update of commit root for ro snapshots 2014-08-21 07:55:21 -07:00
Kconfig
locking.c Btrfs: fix deadlocks with trylock on tree nodes 2014-06-19 14:19:55 -07:00
locking.h
lzo.c btrfs: return errno instead of -1 from compression 2014-06-09 17:20:21 -07:00
Makefile Btrfs: add sanity tests for new qgroup accounting code 2014-06-09 17:20:49 -07:00
math.h
ordered-data.c btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
ordered-data.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
orphan.c
print-tree.c Btrfs: fix btrfs_print_leaf for skinny metadata 2014-07-03 07:04:16 -07:00
print-tree.h
props.c
props.h
qgroup.c btrfs: correctly handle return from ulist_add 2014-08-15 07:43:16 -07:00
qgroup.h btrfs: qgroup: account shared subtrees during snapshot delete 2014-08-15 07:43:14 -07:00
raid56.c Btrfs: fix crash when mounting raid5 btrfs with missing disks 2014-06-28 13:48:45 -07:00
raid56.h
rcu-string.h
reada.c Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting 2014-06-13 09:52:21 -07:00
relocation.c btrfs: remove stale newlines from log messages 2014-06-09 17:20:53 -07:00
root-tree.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
scrub.c Btrfs: don't write any data into a readonly device when scrub 2014-08-19 08:52:17 -07:00
send.c Btrfs: send, use the right limits for xattr names and values 2014-06-09 17:21:00 -07:00
send.h
struct-funcs.c
super.c btrfs: adjust statfs calculations according to raid profiles 2014-08-15 07:43:10 -07:00
sysfs.c Btrfs: fix regression of btrfs device replace 2014-08-21 07:55:20 -07:00
sysfs.h btrfs: dev add should add its sysfs entry 2014-06-28 13:48:43 -07:00
transaction.c btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
transaction.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
tree-defrag.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
tree-log.c Btrfs: fix hole detection during file fsync 2014-08-21 07:55:24 -07:00
tree-log.h Btrfs: use helpers for last_trans_log_full_commit instead of opencode 2014-06-09 17:20:45 -07:00
ulist.c
ulist.h Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch 2014-08-15 07:43:19 -07:00
uuid-tree.c
volumes.c Btrfs: Fix wrong device size when we are resizing the device 2014-08-19 08:52:18 -07:00
volumes.h Btrfs: fix deadlock when mounting a degraded fs 2014-06-19 14:20:56 -07:00
xattr.c
xattr.h
zlib.c btrfs: use E2BIG instead of EIO if compression does not help 2014-07-03 07:04:13 -07:00