1
0
Fork 0
remarkable-linux/fs/btrfs
Filipe Manana 0ea7fcfc7f Btrfs: fix file data corruption after cloning a range and fsync
commit bd3599a0e1 upstream.

When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
later ends up calling try_release_extent_mapping() which will release the
extent map if some conditions are met, like the file size being greater
than 16Mb, gfp flags allow blocking and the range not being locked (which
is the case during the clone operation) nor being the extent map flagged
as pinned (also the case for cloning).

The following example, turned into a test for fstests, reproduces the
issue:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
  $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar

  $ xfs_io -c "fsync" /mnt/bar
  # reflink destination offset corresponds to the size of file bar,
  # 2728Kb minus 4Kb.
  $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
  $ xfs_io -c "fsync" /mnt/bar

  $ md5sum /mnt/bar
  95a95813a8c2abc9aa75a6c2914a077e  /mnt/bar

  <power fail>

  $ mount /dev/sdb /mnt
  $ md5sum /mnt/bar
  207fd8d0b161be8a84b945f0df8d5f8d  /mnt/bar
  # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
  # power failure

In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.

Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger then 16Mb or gfp flags do not allow blocking).

CC: stable@vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-09 12:16:39 +02:00
..
tests btrfs: tests/qgroup: Fix wrong tree backref level 2018-05-30 07:52:26 +02:00
Kconfig btrfs: Add zstd support 2017-08-15 09:02:09 -07:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
acl.c btrfs: preserve i_mode if __btrfs_set_acl() fails 2017-08-21 17:47:42 +02:00
async-thread.c btrfs: constify tracepoint arguments 2017-08-16 14:19:53 +02:00
async-thread.h btrfs: constify tracepoint arguments 2017-08-16 14:19:53 +02:00
backref.c btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes 2018-03-21 12:06:44 +01:00
backref.h btrfs: backref, add tracepoints for prelim_ref insertion and merging 2017-08-16 16:12:01 +02:00
btrfs_inode.h btrfs: separate defrag and property compression 2017-08-16 16:12:05 +02:00
check-integrity.c Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-09-09 13:27:51 -07:00
check-integrity.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
compression.c Merge branch 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-09-29 12:57:35 -07:00
compression.h Merge branch 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2017-09-14 17:30:49 -07:00
ctree.c btrfs: fix reading stale metadata blocks after degraded raid1 mounts 2018-05-22 18:54:01 +02:00
ctree.h btrfs: use kvzalloc to allocate btrfs_fs_info 2018-05-30 07:52:08 +02:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c Btrfs: fix stale entries in readdir 2018-01-31 14:03:42 +01:00
delayed-inode.h btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t 2017-04-18 14:07:23 +02:00
delayed-ref.c Btrfs: return old and new total ref mods when adding delayed refs 2017-06-29 20:17:01 +02:00
delayed-ref.h Btrfs: return old and new total ref mods when adding delayed refs 2017-06-29 20:17:01 +02:00
dev-replace.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:54:01 -07:00
dev-replace.h btrfs: constify device path passed to relevant helpers 2017-02-28 14:26:07 +01:00
dir-item.c btrfs: fix validation of XATTR_ITEM dir items 2017-06-29 20:06:11 +02:00
disk-io.c btrfs: define SUPER_FLAG_METADUMP_V2 2018-06-11 22:49:18 +02:00
disk-io.h btrfs: use named constant for bdev blocksize 2017-08-16 16:12:04 +02:00
export.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
export.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
extent-tree.c btrfs: Fix possible softlock on single core machines 2018-05-30 07:52:24 +02:00
extent_io.c Btrfs: fix file data corruption after cloning a range and fsync 2018-08-09 12:16:39 +02:00
extent_io.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
extent_map.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
extent_map.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
file-item.c Merge branch 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-07-05 16:41:23 -07:00
file.c Btrfs: set plug for fsync 2018-04-26 11:02:09 +02:00
free-space-cache.c btrfs: fix deadlock when writing out space cache 2018-02-03 17:39:04 +01:00
free-space-cache.h btrfs: free-space-cache, clean up unnecessary root arguments 2017-02-17 12:03:56 +01:00
free-space-tree.c btrfs: pass fs_info to btrfs_del_root instead of tree_root 2017-08-21 17:49:54 +02:00
free-space-tree.h btrfs: expose internal free space tree routine only if sanity tests are enabled 2017-08-18 16:36:29 +02:00
hash.c crypto: Work around deallocated stack frame reference gcc bug on sparc. 2017-06-08 17:36:03 +08:00
hash.h btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
inode-item.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
inode-map.c btrfs: qgroup: Introduce extent changeset for qgroup reserve functions 2017-06-29 20:17:02 +02:00
inode-map.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
inode.c Btrfs: don't BUG_ON() in btrfs_truncate_inode_items() 2018-08-03 07:50:28 +02:00
ioctl.c Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2() 2018-06-26 08:06:30 +08:00
locking.c btrfs: cleanup, remove stray return statements 2016-01-07 14:30:52 +01:00
locking.h
lzo.c btrfs: switch to kvmalloc and GFP_KERNEL in lzo/zlib alloc_workspace 2017-06-19 18:26:02 +02:00
math.h
ordered-data.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
ordered-data.h btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
orphan.c
print-tree.c Btrfs: add one more sanity check for shared ref type 2017-08-21 17:47:43 +02:00
print-tree.h btrfs: get fs_info from eb in btrfs_print_tree, remove argument 2017-08-16 16:12:03 +02:00
props.c btrfs: property: Set incompat flag if lzo/zstd compression is set 2018-05-22 18:54:01 +02:00
props.h
qgroup.c btrfs: qgroup: Finish rescan when hit the last leaf of extent tree 2018-08-03 07:50:28 +02:00
qgroup.h btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges 2017-06-29 20:17:02 +02:00
raid56.c Btrfs: make raid6 rebuild retry more 2018-06-21 04:03:02 +09:00
raid56.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
rcu-string.h
reada.c btrfs: remove unused member err from reada_extent 2017-06-19 18:25:59 +02:00
relocation.c btrfs: fix NULL pointer dereference from free_reloc_roots() 2017-09-26 14:51:49 +02:00
root-tree.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:54:01 -07:00
scrub.c btrfs: scrub: Don't use inode pages for device replace 2018-06-26 08:06:30 +08:00
send.c Btrfs: send, fix issuing write op when processing hole in no data mode 2018-05-30 07:52:09 +02:00
send.h Btrfs: use linux/sizes.h to represent constants 2016-01-07 14:38:02 +01:00
struct-funcs.c btrfs: struct-funcs, constify readers 2017-08-16 14:19:53 +02:00
super.c btrfs: use kvzalloc to allocate btrfs_fs_info 2018-05-30 07:52:08 +02:00
sysfs.c Revert "btrfs: use proper endianness accessors for super_copy" 2018-03-19 08:42:47 +01:00
sysfs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
transaction.c btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled 2018-05-30 07:52:26 +02:00
transaction.h btrfs: remove unused qgroup members from btrfs_trans_handle 2017-04-18 14:07:25 +02:00
tree-defrag.c Btrfs: fix locking bugs when defragging leaves 2015-12-18 02:51:32 +00:00
tree-log.c btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups 2018-08-03 07:50:28 +02:00
tree-log.h btrfs: Make btrfs_del_inode_ref take btrfs_inode 2017-02-14 15:50:54 +01:00
ulist.c btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
ulist.h btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
uuid-tree.c btrfs: return the actual error value from from btrfs_uuid_tree_iterate 2016-12-19 18:08:15 +01:00
volumes.c Btrfs: make raid6 rebuild retry more 2018-06-21 04:03:02 +09:00
volumes.h btrfs: Fix memory barriers usage with device stats counters 2018-03-21 12:06:44 +01:00
xattr.c btrfs: Check name_len with boundary in verify dir_item 2017-06-21 19:16:04 +02:00
xattr.h btrfs: Switch to generic xattr handlers 2016-05-17 19:17:09 -04:00
zlib.c btrfs: switch to kvmalloc and GFP_KERNEL in lzo/zlib alloc_workspace 2017-06-19 18:26:02 +02:00
zstd.c btrfs: Add zstd support 2017-08-15 09:02:09 -07:00