redonkable/alistair23-linux

Author	SHA1	Message	Date
Trond Myklebust	18290650b1	NFS: Move buffered I/O locking into nfs_file_write() Preparation for the patch that de-serialises O_DIRECT reads and writes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:11:03 -04:00
Trond Myklebust	89698b24d2	NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:11:03 -04:00
Trond Myklebust	2f3c7d87a3	NFS: Remove racy size manipulations in O_DIRECT On success, the RPC callbacks will ensure that we make the appropriate calls to nfs_writeback_update_inode() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:11:02 -04:00
Trond Myklebust	a5314a7492	NFS: Ensure we reset the write verifier 'committed' value on resend. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:11:02 -04:00
Trond Myklebust	8fc3c38627	NFS: Fix O_DIRECT verifier problems We should not be interested in looking at the value of the stable field, since that could take any value. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:11:01 -04:00
Trond Myklebust	6712007734	pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1 Cleanup... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:11:01 -04:00
Trond Myklebust	ac46bd374c	pNFS: Ensure we layoutcommit before revalidating attributes If we need to update the cached attributes, then we'd better make sure that we also layoutcommit first. Otherwise, the server may have stale attributes. Prior to this patch, the revalidation code tried to "fix" this problem by simply disabling attributes that would be affected by the layoutcommit. That approach breaks nfs_writeback_check_extend(), leading to a file size corruption. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:08:27 -04:00
Trond Myklebust	2e18d4d822	pNFS: Files and flexfiles always need to commit before layoutcommit So ensure that we mark the layout for commit once the write is done, and then ensure that the commit to ds is finished before sending layoutcommit. Note that by doing this, we're able to optimise away the commit for the case of servers that don't need layoutcommit in order to return updated attributes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:08:01 -04:00
Trond Myklebust	bc28e1c2e3	pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit() Let's just have one place where we check ff_layout_need_layoutcommit(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 18:52:26 -04:00
Trond Myklebust	c001c87a63	pNFS/flexfiles: Fix layoutcommit after a commit to DS We should always do a layoutcommit after commit to DS, except if the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT. Fixes: `d67ae825a5` ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 18:52:26 -04:00
Trond Myklebust	73e6c5d854	pNFS/files: Fix layoutcommit after a commit to DS According to the errata https://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751 we should always send layout commit after a commit to DS. Fixes: `bc7d4b8fd0` ("nfs/filelayout: set layoutcommit...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 18:52:25 -04:00
Trond Myklebust	4f52b6bb8c	NFS: Don't call COMMIT in ->releasepage() While COMMIT has the potential to free up a lot of memory that is being taken by unstable writes, it isn't guaranteed to free up this particular page. Also, calling fsync() on the server is expensive and so we want to do it in a more controlled fashion, rather than have it triggered at random by the VM. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-22 09:59:44 -04:00
Trond Myklebust	93761d9863	NFS: Don't hold the inode lock across fsync() Commits are no longer required to be serialised. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-22 09:59:43 -04:00
Trond Myklebust	811ed92ecc	NFS: writepage of a single page should not be synchronous It is almost always better to wait for more so that we can issue a bulk commit. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-22 09:59:43 -04:00
Trond Myklebust	6b56a89833	NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer filemap_datawrite() and friends already deal just fine with livelock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-22 09:59:42 -04:00
Trond Myklebust	ca0daa277a	NFS: Cache aggressively when file is open for writing Unless the user is using file locking, we must assume close-to-open cache consistency when the file is open for writing. Adjust the caching algorithm so that it does not clear the cache on out-of-order writes and/or attribute revalidations. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-22 09:59:42 -04:00
Trond Myklebust	57b691819e	NFS: Cache access checks more aggressively If an attribute revalidation fails, then we already know that we'll zap the access cache. If, OTOH, the inode isn't changing, there should be no need to eject access calls just because they are old. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-15 16:36:01 -04:00
Trond Myklebust	38512aa98a	NFS: Don't flush caches for a getattr that races with writeback If there were outstanding writes then chalk up the unexpected change attribute on the server to them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-06-13 12:36:02 -04:00
Linus Torvalds	3d0f0b6a55	Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Has some fixes and some new self tests for btrfs. The self tests are usually disabled in the .config file (unless you're doing btrfs dev work), and this bunch is meant to find problems with the 64K page size patches. Jeff has a patch to help people see if they are using the hardware assist crc32c module, which really helps us nail down problems when people ask why crcs are using so much CPU. Otherwise, it's small fixes" * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: self-tests: Fix extent buffer bitmap test fail on BE system Btrfs: self-tests: Fix test_bitmaps fail on 64k sectorsize Btrfs: self-tests: Use macros instead of constants and add missing newline Btrfs: self-tests: Support testing all possible sectorsizes and nodesizes Btrfs: self-tests: Execute page straddling test only when nodesize < PAGE_SIZE btrfs: advertise which crc32c implementation is being used at module load Btrfs: add validadtion checks for chunk loading Btrfs: add more validation checks for superblock Btrfs: clear uptodate flags of pages in sys_array eb Btrfs: self-tests: Support non-4k page size Btrfs: Fix integer overflow when calculating bytes_per_bitmap Btrfs: test_check_exists: Fix infinite loop when searching for free space entries Btrfs: end transaction if we abort when creating uuid root btrfs: Use __u64 in exported linux/btrfs.h.	2016-06-10 14:13:27 -07:00
Linus Torvalds	f5364c150a	Merge branch 'stacking-fixes' (vfs stacking fixes from Jann) Merge filesystem stacking fixes from Jann Horn. * emailed patches from Jann Horn <jannh@google.com>: sched: panic on corrupted stack end ecryptfs: forbid opening files without mmap handler proc: prevent stacking filesystems on top	2016-06-10 12:10:02 -07:00
Jann Horn	2f36db7100	ecryptfs: forbid opening files without mmap handler This prevents users from triggering a stack overflow through a recursive invocation of pagefault handling that involves mapping procfs files into virtual memory. Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-06-10 12:09:43 -07:00
Jann Horn	e54ad7f1ee	proc: prevent stacking filesystems on top This prevents stacking filesystems (ecryptfs and overlayfs) from using procfs as lower filesystem. There is too much magic going on inside procfs, and there is no good reason to stack stuff on top of procfs. (For example, procfs does access checks in VFS open handlers, and ecryptfs by design calls open handlers from a kernel thread that doesn't drop privileges or so.) Signed-off-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-06-10 12:09:43 -07:00
Chris Mason	719da39a61	Merge branch 'misc-fixes-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.7	2016-06-08 14:36:12 -07:00
Chris Mason	4c52990080	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.7	2016-06-08 14:35:11 -07:00
Linus Torvalds	c8ae067f26	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Fixes for crap of assorted ages: EOPENSTALE one is 4.2+, autofs one is 4.6, d_walk - 3.2+. The atomic_open() and coredump ones are regressions from this window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coredump: fix dumping through pipes fix a regression in atomic_open() fix d_walk()/non-delayed __d_free() race autofs braino fix for do_last() fix EOPENSTALE bug in do_last()	2016-06-07 20:41:36 -07:00
Mateusz Guzik	1607f09c22	coredump: fix dumping through pipes The offset in the core file used to be tracked with ->written field of the coredump_params structure. The field was retired in favour of file->f_pos. However, ->f_pos is not maintained for pipes which leads to breakage. Restore explicit tracking of the offset in coredump_params. Introduce ->pos field for this purpose since ->written was already reused. Fixes: `a008393951` ("get rid of coredump_params->written"). Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-06-07 22:07:09 -04:00
Al Viro	a01e718f72	fix a regression in atomic_open() open("/foo/no_such_file", O_RDONLY \| O_CREAT) on should fail with EACCES when /foo is not writable; failing with ENOENT is obviously wrong. That got broken by a braino introduced when moving the creat_error logics from atomic_open() to lookup_open(). Easy to fix, fortunately. Spotted-by: "Yan, Zheng" <ukernel@gmail.com> Tested-by: "Yan, Zheng" <ukernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-06-07 21:53:51 -04:00
Al Viro	3d56c25e3b	fix d_walk()/non-delayed __d_free() race Ascend-to-parent logics in d_walk() depends on all encountered child dentries not getting freed without an RCU delay. Unfortunately, in quite a few cases it is not true, with hard-to-hit oopsable race as the result. Fortunately, the fix is simiple; right now the rule is "if it ever been hashed, freeing must be delayed" and changing it to "if it ever had a parent, freeing must be delayed" closes that hole and covers all cases the old rule used to cover. Moreover, pipes and sockets remain _not_ covered, so we do not introduce RCU delay in the cases which are the reason for having that delay conditional in the first place. Cc: stable@vger.kernel.org # v3.2+ (and watch out for __d_materialise_dentry()) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-06-07 21:26:55 -04:00
Eric W. Biederman	d71ed6c930	mnt: fs_fully_visible test the proper mount for MNT_LOCKED MNT_LOCKED implies on a child mount implies the child is locked to the parent. So while looping through the children the children should be tested (not their parent). Typically an unshare of a mount namespace locks all mounts together making both the parent and the slave as locked but there are a few corner cases where other things work. Cc: stable@vger.kernel.org Fixes: `ceeb0e5d39` ("vfs: Ignore unlocked mounts in fs_fully_visible") Reported-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2016-06-06 20:52:03 -05:00
Eric W. Biederman	97c1df3e54	mnt: If fs_fully_visible fails call put_filesystem. Add this trivial missing error handling. Cc: stable@vger.kernel.org Fixes: `1b852bceb0` ("mnt: Refactor the logic for mounting sysfs and proc in a user namespace") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2016-06-06 20:48:31 -05:00
Feifei Xu	34b3e6c92a	Btrfs: self-tests: Fix extent buffer bitmap test fail on BE system In __test_eb_bitmaps(), we write random data to a bitmap. Then copy the bitmap to another bitmap that resides inside an extent buffer. Later we verify the values of corresponding bits in the bitmap and the bitmap inside the extent buffer. However, extent_buffer_test_bit() reads in byte granularity while test_bit() reads in unsigned long granularity. Hence we end up comparing wrong bits on big-endian systems such as ppc64. This commit fixes the issue by reading the bitmap in byte granularity. Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-06 17:17:12 +02:00
Feifei Xu	36b3dc05b4	Btrfs: self-tests: Fix test_bitmaps fail on 64k sectorsize With 64K sectorsize, 1G sized block group cannot span across bitmaps. To execute test_bitmaps() function, this commit allocates "BITS_PER_BITMAP * sectorsize + PAGE_SIZE" sized block group. Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-06 17:17:12 +02:00
Feifei Xu	ef9f2db365	Btrfs: self-tests: Use macros instead of constants and add missing newline This commit replaces numerical constants with appropriate preprocessor macros. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-06 17:17:12 +02:00
Feifei Xu	d94f43b4c6	Btrfs: self-tests: Support testing all possible sectorsizes and nodesizes To test all possible sectorsizes, this commit adds a sectorsize array. This commit executes the tests for all possible sectorsizes and nodesizes. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-06 17:17:12 +02:00
Feifei Xu	ed9e4afdb0	Btrfs: self-tests: Execute page straddling test only when nodesize < PAGE_SIZE On ppc64, PAGE_SIZE is 64k which is same as BTRFS_MAX_METADATA_BLOCKSIZE. In such a scenario, we will never be able to have an extent buffer containing more than one page. Hence in such cases this commit does not execute the page straddling tests. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-06 17:17:11 +02:00
Jeff Mahoney	5f9e1059d9	btrfs: advertise which crc32c implementation is being used at module load Since several architectures support hardware-accelerated crc32c calculation, it would be nice to confirm that btrfs is actually using it. We can see an elevated use count for the module, but it doesn't actually show who the users are. This patch simply prints the name of the driver after successfully initializing the shash. Signed-off-by: Jeff Mahoney <jeffm@suse.com> [ added a helper and used in module load-time message ] Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-06 14:08:28 +02:00
Liu Bo	e06cd3dd7c	Btrfs: add validadtion checks for chunk loading To prevent fuzzed filesystem images from panic the whole system, we need various validation checks to refuse to mount such an image if btrfs finds any invalid value during loading chunks, including both sys_array and regular chunks. Note that these checks may not be sufficient to cover all corner cases, feel free to add more checks. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-06 10:57:09 +02:00
Liu Bo	99e3ecfcb9	Btrfs: add more validation checks for superblock This adds validation checks for super_total_bytes, super_bytes_used and super_stripesize, super_num_devices. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-06 10:41:53 +02:00
Liu Bo	d865177a5e	Btrfs: clear uptodate flags of pages in sys_array eb We set uptodate flag to pages in the temporary sys_array eb, but do not clear the flag after free eb. As the special btree inode may still hold a reference on those pages, the uptodate flag can remain alive in them. If btrfs_super_chunk_root has been intentionally changed to the offset of this sys_array eb, reading chunk_root will read content of sys_array and it will skip our beautiful checks in btree_readpage_end_io_hook() because of "pages of eb are uptodate => eb is uptodate" This adds the 'clear uptodate' part to force it to read from disk. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-06 10:14:40 +02:00
Eric W. Biederman	eedf265aa0	devpts: Make each mount of devpts an independent filesystem. The /dev/ptmx device node is changed to lookup the directory entry "pts" in the same directory as the /dev/ptmx device node was opened in. If there is a "pts" entry and that entry is a devpts filesystem /dev/ptmx uses that filesystem. Otherwise the open of /dev/ptmx fails. The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that userspace can now safely depend on each mount of devpts creating a new instance of the filesystem. Each mount of devpts is now a separate and equal filesystem. Reserved ttys are now available to all instances of devpts where the mounter is in the initial mount namespace. A new vfs helper path_pts is introduced that finds a directory entry named "pts" in the directory of the passed in path, and changes the passed in path to point to it. The helper path_pts uses a function path_parent_directory that was factored out of follow_dotdot. In the implementation of devpts: - devpts_mnt is killed as it is no longer meaningful if all mounts of devpts are equal. - pts_sb_from_inode is replaced by just inode->i_sb as all cached inodes in the tty layer are now from the devpts filesystem. - devpts_add_ref is rolled into the new function devpts_ptmx. And the unnecessary inode hold is removed. - devpts_del_ref is renamed devpts_release and reduced to just a deacrivate_super. - The newinstance mount option continues to be accepted but is now ignored. In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as they are never used. Documentation/filesystems/devices.txt is updated to describe the current situation. This has been verified to work properly on openwrt-15.05, centos5, centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3, ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1, slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01. With the caveat that on centos6 and on slackware-14.1 that there wind up being two instances of the devpts filesystem mounted on /dev/pts, the lower copy does not end up getting used. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg KH <greg@kroah.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-06-05 10:36:01 -07:00
Al Viro	e6ec03a25f	autofs braino fix for do_last() It's an analogue of commit `7500c38a` (fix the braino in "namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()"). The same problem (->lookup()-returned unhashed negative dentry just might be an autofs one with ->d_manage() that would wait until the daemon makes it positive) applies in do_last() - we need to do follow_managed() first. Fortunately, remaining callers of follow_managed() are OK - only autofs has that weirdness (negative dentry that does not mean an instant -ENOENT)) and autofs never has its negative dentries hashed, so we can't pick one from a dcache lookup. ->d_manage() is a bloody mess ;-/ Cc: stable@vger.kernel.org # v4.6 Spotted-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-06-05 00:23:09 -04:00
Linus Torvalds	b2d5ad8223	Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "The important part of this pull is Filipe's set of fixes for btrfs device replacement. Filipe fixed a few issues seen on the list and a number he found on his own" * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent Btrfs: fix race between device replace and read repair Btrfs: fix race between device replace and discard Btrfs: fix race between device replace and chunk allocation Btrfs: fix race setting block group back to RW mode during device replace Btrfs: fix unprotected assignment of the left cursor for device replace Btrfs: fix race setting block group readonly during device replace Btrfs: fix race between device replace and block group removal Btrfs: fix race between readahead and device replace/removal	2016-06-04 11:56:28 -07:00
Al Viro	fac7d1917d	fix EOPENSTALE bug in do_last() EOPENSTALE occuring at the last component of a trailing symlink ends up with do_last() retrying its lookup. After the symlink body has been discarded. The thing is, all this retry_lookup logics in there is not needed at all - the upper layers will do the right thing if we simply return that -EOPENSTALE as we would with any other error. Trying to microoptimize in do_last() is a lot of headache for no good reason. Cc: stable@vger.kernel.org # v4.2+ Tested-by: Oleg Drokin <green@linuxhacker.ru> Reviewed-and-Tested-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-06-04 11:59:04 -04:00
Chris Mason	8dff9c8534	Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent When dealing with inline extents, btrfs_get_extent will incorrectly try to insert a duplicate extent_map. The dup hits -EEXIST from add_extent_map, but then we try to merge with the existing one and end up trying to insert a zero length extent_map. This actually works most of the time, except when there are extent maps past the end of the inline extent. rocksdb will trigger this sometimes because it preallocates an extent and then truncates down. Josef made a script to trigger with xfs_io: #!/bin/bash xfs_io -f -c "pwrite 0 1000" inline xfs_io -c "falloc -k 4k 1M" inline xfs_io -c "pread 0 1000" -c "fadvise -d 0 1000" -c "pread 0 1000" inline xfs_io -c "fadvise -d 0 1000" inline cat inline You'll get EIOs trying to read inline after this because add_extent_map is returning EEXIST Signed-off-by: Chris Mason <clm@fb.com>	2016-06-03 12:32:34 -07:00
Feifei Xu	b9ef22dedd	Btrfs: self-tests: Support non-4k page size self-tests code assumes 4k as the sectorsize and nodesize. This commit fix hardcoded 4K. Enables the self-tests code to be executed on non-4k page sized systems (e.g. ppc64). Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-02 19:23:14 +02:00
Feifei Xu	0ef6447a3d	Btrfs: Fix integer overflow when calculating bytes_per_bitmap On ppc64, bytes_per_bitmap will be (65536865536). Hence append UL to fix integer overflow. Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-02 19:22:49 +02:00
Feifei Xu	5473e0c426	Btrfs: test_check_exists: Fix infinite loop when searching for free space entries On a ppc64 machine using 64K as the block size, assume that the RB tree at btrfs_free_space_ctl->free_space_offset contains following two entries: 1. A bitmap entry having an offset value of 0 and having the bits corresponding to the address range [128M+512K, 128M+768K] set. 2. An extent entry corresponding to the address range [128M-256K, 128M-128K] In such a scenario, test_check_exists() invoked for checking the existence of address range [128M+768K, 256M] can lead to an infinite loop as explained below: - Checking for the extent entry fails. - Checking for a bitmap entry results in the free space info in range [128M+512K, 128M+768K] beng returned. - rb_prev(info) returns NULL because the bitmap entry starting from offset 0 comes first in the RB tree. - current_node = bitmap node. - while (current_node) tmp = rb_next(bitmap_node);/tmp is extent based free space entry/ Since extent based free space entry's last address is smaller than the address being searched for (i.e. 128M+768K) we incorrectly again obtain the extent node as the "next right node" of the RB tree and thus end up looping infinitely. This patch fixes the issue by checking the "tmp" variable which point to the most recently searched free space node. Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-02 19:22:34 +02:00
Yan, Zheng	f6973c0949	ceph: use i_version to check validity of fscache Signed-off-by: Yan, Zheng <zyan@redhat.com>	2016-06-01 10:32:14 +02:00
Yan, Zheng	f7f7e7a063	ceph: improve fscache revalidation There are several issues in fscache revalidation code. - In ceph_revalidate_work(), fscache_invalidate() is called when fscache_check_consistency() return 0. This is complete wrong because 0 means cache is valid. - Handle_cap_grant() calls ceph_queue_revalidate() if client already has CAP_FILE_CACHE. This code is confusing. Client should revalidate the cache each time it got CAP_FILE_CACHE anew. - In Handle_cap_grant(), fscache_invalidate() is called if MDS revokes CAP_FILE_CACHE. This is inconsistency with the case that inode get evicted. In the later case, the cache is not discarded. Client may use the cache when inode is reloaded. This patch moves the fscache revalidation into ceph_get_caps(). Client revalidates the cache after it gets CAP_FILE_CACHE. i_rdcache_gen should keep constance while CAP_FILE_CACHE is used. If i_fscache_gen is not equal to i_rdcache_gen, client needs to check cache's consistency. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2016-06-01 10:31:50 +02:00
Yan, Zheng	46b59b2be0	ceph: disable fscache when inode is opened for write All other filesystems do not add dirty pages to fscache. They all disable fscache when inode is opened for write. Only ceph adds dirty pages to fscache, but the code is buggy. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2016-06-01 10:31:07 +02:00

1 2 3 4 5 ...

44874 commits