alistair23-linux

redonkable

Author	SHA1	Message	Date
Josef Bacik	c37b2b6269	Btrfs: do not bug when we fail to commit the transaction We BUG if we fail to commit the transaction when creating a snapshot, which is just obnoxious. Remove the BUG_ON(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-10-25 15:59:57 -04:00
Lukas Czerner	e515c18bfe	btrfs: Return EINVAL when length to trim is less than FSB Currently, if the len argument in btrfs_ioctl_fitrim() is smaller than one FSB we will continue and finally return 0 bytes discarded. However, if the length to discard is smaller than a file system block we should really return EINVAL. Signed-off-by: Lukas Czerner <lczerner@redhat.com>	2012-10-25 15:46:22 -04:00
Jeff Layton	4fa6b5ecbf	audit: overhaul __audit_inode_child to accommodate retrying In order to accommodate retrying path-based syscalls, we need to add a new "type" argument to audit_inode_child. This will tell us whether we're looking for a child entry that represents a create or a delete. If we find a parent, don't automatically assume that we need to create a new entry. Instead, use the information we have to try to find an existing entry first. Update it if one is found and create a new one if not. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-12 00:32:03 -04:00
Jeff Layton	c43a25abba	audit: reverse arguments to audit_inode_child Most of the callers get called with an inode and dentry in the reverse order. The compiler then has to reshuffle the arg registers and/or stack in order to pass them on to audit_inode_child. Reverse those arguments for a micro-optimization. Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-12 00:32:00 -04:00
Linus Torvalds	72055425e5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "This is a large pull, with the bulk of the updates coming from: - Hole punching - send/receive fixes - fsync performance - Disk format extension allowing more hardlinks inside a single directory (btrfs-progs patch required to enable the compat bit for this one) I'm cooking more unrelated RAID code, but I wanted to make sure this original batch makes it in. The largest updates here are relatively old and have been in testing for some time." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (121 commits) btrfs: init ref_index to zero in add_inode_ref Btrfs: remove repeated eb->pages check in, disk-io.c/csum_dirty_buffer Btrfs: fix page leakage Btrfs: do not warn_on when we cannot alloc a page for an extent buffer Btrfs: don't bug on enomem in readpage Btrfs: cleanup pages properly when ENOMEM in compression Btrfs: make filesystem read-only when submitting barrier fails Btrfs: detect corrupted filesystem after write I/O errors Btrfs: make compress and nodatacow mount options mutually exclusive btrfs: fix message printing Btrfs: don't bother committing delayed inode updates when fsyncing btrfs: move inline function code to header file Btrfs: remove unnecessary IS_ERR in bio_readpage_error() btrfs: remove unused function btrfs_insert_some_items() Btrfs: don't commit instead of overcommitting Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called Btrfs: be smarter about dropping things from the tree log Btrfs: don't lookup csums for prealloc extents Btrfs: cache extent state when writing out dirty metadata pages Btrfs: do not hold the file extent leaf locked when adding extent item ...	2012-10-10 10:49:20 +09:00
Stefan Behrens	5af3e8cce8	Btrfs: make filesystem read-only when submitting barrier fails So far the return code of barrier_all_devices() is ignored, which means that errors are ignored. The result can be a corrupt filesystem which is not consistent. This commit adds code to evaluate the return code of barrier_all_devices(). The normal btrfs_error() mechanism is used to switch the filesystem into read-only mode when errors are detected. In order to decide whether barrier_all_devices() should return error or success, the number of disks that are allowed to fail the barrier submission is calculated. This calculation accounts for the worst RAID level of metadata, system and data. If single, dup or RAID0 is in use, a single disk error is already considered to be fatal. Otherwise a single disk error is tolerated. The calculation of the number of disks that are tolerated to fail the barrier operation is performed when the filesystem gets mounted, when a balance operation is started and finished, and when devices are added or removed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-10-09 09:20:19 -04:00
Liu Bo	aa42ffd918	Btrfs: fix off-by-one in file clone Btrfs uses inclusive range end for lock_extent(), unlock_extent() and related functions, so we made off-by-one errors in file clone. This fixes it and also fixes some style problems. Signed-off-by: Liu Bo <bo.li.liu@oracle.com>	2012-10-08 20:07:32 -04:00
David Sterba	7e97b8daf6	btrfs: allow setting NOCOW for a zero sized file via ioctl Hi, the patch is simple, but it has user visible impact and I'm not quite sure how to resolve it. In short, $subj says it, chattr -C supports it and we want to use it. The conditions that actually allow to change the NOCOW flag are clear. What if I try to set the flag on a file that is not empty? Options: 1) whole ioctl will fail, EINVAL 2.1) ioctl will succeed, the NOCOW flag will be silently removed, but the file will stay COW-ed and checksummed 2.2) ioctl will succeed, flag will not be removed and a syslog message will warn that the COW flag has not been changed 2.2.1) ditto, no syslog message Man page of chattr states that "If it is set on a file which already has data blocks, it is undefined when the blocks assigned to the file will be fully stable." Yes, it's undefined and with current implementation it'll never happen. So from this end, the user cannot expect anything. I'm trying to find a reasonable behaviour, so that a command like 'chattr -R -aijS +C' to tweak a broad set of flags in a deep directory does not fail unnecessarily and does not pollute the log. My personal preference is 2.2.1, but my dev's opinion is skewed, not counting the fact that I know the code and otherwise would look there before consulting the documentation. The patch implements 2.2.1. david -------------8<------------------- From: David Sterba <dsterba@suse.cz> It's safe to turn off checksums for a zero sized file. http://thread.gmane.org/gmane.comp.file-systems.btrfs/18030 "We cannot switch on NODATASUM for a file that already has extents that are checksummed. The invariant here is that either all the extents or none are checksummed. 
Theoretically it's possible to add/remove all checksums from a given file, but it's a potentially longtime operation, the file has to be in some intermediate state where the checksums partially exist but have to be ignored (for the csum->nocsum) until the file is fully converted, this brings more special cases to extent handling, it has to survive power failure and remain consistent, and probably needs to be restarted after next mount." Signed-off-by: David Sterba <dsterba@suse.cz>	2012-10-04 09:40:00 -04:00
Linus Torvalds	aab174f0df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." 
Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...	2012-10-02 20:25:04 -07:00
Linus Torvalds	437589a74b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values as far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting type-incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privilege checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user namespace to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. 
I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...	2012-10-02 11:11:09 -07:00
Liu Bo	425d17a290	Btrfs: use larger limit for translation of logical to inode This is the kernel side of the change. Translation of logical to inode used to have an upper limit of 4k on the inode container's size, but that limit is not large enough for data with a great many refs, so when resolving a logical address, we can end up with "ioctl ret=0, bytes_left=0, bytes_missing=19944, cnt=510, missed=2493" This change raises the upper limit to 64k and uses vmalloc instead of kmalloc so that the memory is easier to get. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>	2012-10-01 15:19:19 -04:00
Liu Bo	df031f0752	Btrfs: use helper for logical resolve We already have a helper, iterate_inodes_from_logical(), for logical resolve, so just use it. Signed-off-by: Liu Bo <bo.li.liu@oracle.com>	2012-10-01 15:19:18 -04:00
Liu Bo	69917e4312	Btrfs: fix a bug in parsing return value in logical resolve In logical resolve, we parse extent_from_logical()'s 'ret' as a kind of flag. It is possible to lose our errors because (-EXXXX & BTRFS_EXTENT_FLAG_TREE_BLOCK) is true. I'm not sure if it is on purpose, it just looks too hacky if it is. I'd rather use a real flag and a 'ret' to catch errors. Acked-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Liu Bo <liub.liubo@gmail.com>	2012-10-01 15:19:18 -04:00
Liu Bo	9e8a4a8b0b	Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag We're going to use the EXTENT_DEFRAG flag to indicate which ranges belong to a defragment operation so that we can implement snapshot-aware defrag: we set the EXTENT_DEFRAG flag when dirtying the extents that need to be defragmented, so that later on the writeback thread can differentiate between normal writeback and writeback started by defragmentation. Original-Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>	2012-10-01 15:19:15 -04:00
Miao Xie	48c03c4bcf	Btrfs: fix wrong size for the reservation of the snapshot creation We should insert/update 6 items (root ref, root backref, dir item, dir index, root item and parent inode) when creating a snapshot, not 5 items; fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>	2012-10-01 15:19:12 -04:00
Miao Xie	66d8f3dd1c	Btrfs: add a new "type" field into the block reservation structure Sometimes we need to choose the reservation method according to the type of the block reservation, such as the reservation for the delayed inode update. Currently we identify the type just by comparing the address of the reservation variants, which is very ugly for a temporary one because we need to compare it with all the common reservation variants. So we add a new "type" field to keep the type of the reservation variants. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>	2012-10-01 15:19:11 -04:00
Josef Bacik	2671485d39	Btrfs: remove unused hint byte argument for btrfs_drop_extents I audited all users of btrfs_drop_extents and found that nobody actually uses the hint_byte argument. I'm sure it was used for something at some point but it's not used now, and the way the pinning works the disk bytenr would never be immediately useful anyway so lets just remove it. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-10-01 15:19:06 -04:00
Josef Bacik	5dc562c541	Btrfs: turbo charge fsync At least for the vm workload. Currently on fsync we will 1) Truncate all items in the log tree for the given inode if they exist and 2) Copy all items for a given inode into the log. The problem with this is that for things like VMs you can have lots of extents from the fragmented writing behavior, and worse yet you may have only modified a few extents, not the entire thing. This patch fixes this problem by tracking which transid modified our extent, and then when we do the tree logging we find all of the extents we've modified in our current transaction, sort them and commit them. We also only truncate up to the xattrs of the inode and copy that stuff in normally, and then just drop any extents in the range we have that exist in the log already. Here are some numbers from a 50 meg fio job that does random writes and fsync()s after every write: SATA drive 82KB/s original, 140KB/s patched; Fusion drive 431KB/s original, 2532KB/s patched. So around 2-6 times faster depending on your hardware. There are a few corner cases, for example if you truncate at all we have to do it the old way since there is no way to be sure what is in the log is ok. This probably could be done smarter, but if you write-fsync-truncate-write-fsync you deserve what you get. All this work is in RAM of course so if your inode gets evicted from cache and you read it in and fsync it we'll do it the slow way if we are still in the same transaction that we last modified the inode in. The biggest cool part of this is that it requires no changes to the recovery code, so if you fsync with this patch and crash and load an old kernel, it will run the recovery and be a-ok. I have tested this pretty thoroughly with an fsync tester and everything comes back fine, as well as xfstests. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-10-01 15:19:03 -04:00
Al Viro	2903ff019b	switch simple cases of fget_light to fdget Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 22:20:08 -04:00
Al Viro	8319aa9127	switch btrfs_ioctl_clone() to fget_light() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:10:09 -04:00
Al Viro	ecd188159e	switch btrfs_ioctl_snap_create_transid() to fget_light() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:10:07 -04:00
Eric W. Biederman	2f2f43d3c7	userns: Convert btrfs to use kuid/kgid where appropriate Cc: Chris Mason <chris.mason@fusionio.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-09-21 03:13:31 -07:00
Linus Torvalds	318e151019	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I've split out the big send/receive update from my last pull request and now have just the fixes in my for-linus branch. The send/recv branch will wander over to linux-next shortly though. The largest patches in this pull are Josef's patches to fix DIO locking problems and his patch to fix a crash during balance. They are both well tested. The rest are smaller fixes that we've had queued. The last rc came out while I was hacking new and exciting ways to recover from a misplaced rm -rf on my dev box, so these missed rc3." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits) Btrfs: fix that repair code is spuriously executed for transid failures Btrfs: fix ordered extent leak when failing to start a transaction Btrfs: fix a dio write regression Btrfs: fix deadlock with freeze and sync V2 Btrfs: revert checksum error statistic which can cause a BUG() Btrfs: remove superblock writing after fatal error Btrfs: allow delayed refs to be merged Btrfs: fix enospc problems when deleting a subvol Btrfs: fix wrong mtime and ctime when creating snapshots Btrfs: fix race in run_clustered_refs Btrfs: don't run __tree_mod_log_free_eb on leaves Btrfs: increase the size of the free space cache Btrfs: barrier before waitqueue_active Btrfs: fix deadlock in wait_for_more_refs btrfs: fix second lock in btrfs_delete_delayed_items() Btrfs: don't allocate a seperate csums array for direct reads Btrfs: do not strdup non existent strings Btrfs: do not use missing devices when showing devname Btrfs: fix that error value is changed by mistake Btrfs: lock extents as we map them in DIO ...	2012-08-29 11:36:22 -07:00
Dan Carpenter	dadd1105ca	Btrfs: fix some endian bugs handling the root times "trans->transid" is cpu endian but we want to store the data as little endian. "item->ctime.nsec" is only 32 bits, not 64. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>	2012-08-28 16:53:26 -04:00
Alexander Block	e00da2067b	Btrfs: remove mnt_want_write call in btrfs_mksubvol We got a recursive lock in mksubvol because the caller already held a lock. I think we got into this due to a merge error. Commit `a874a63` removed the mnt_want_write call from btrfs_mksubvol and added a replacement call to mnt_want_write_file in btrfs_ioctl_snap_create_transid. Commit `e7848683` however tried to move all calls to mnt_want_write above i_mutex. So somewhere while merging this, it got mixed up. The solution is to remove the mnt_want_write call completely from mksubvol. Reported-by: David Sterba <dave@jikos.cz> Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-08-09 11:01:54 -04:00
Linus Torvalds	a0e881b7c1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is not dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...	2012-08-01 10:26:23 -07:00
Jan Kara	e7848683ae	btrfs: Push mnt_want_write() outside of i_mutex When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call consistently outside of i_mutex. CC: Chris Mason <chris.mason@oracle.com> CC: linux-btrfs@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:51 +04:00
Chris Mason	113c1cb530	Merge branch 'send-v2' of git://github.com/ablock84/linux-btrfs into for-linus This is the kernel portion of btrfs send/receive Conflicts: fs/btrfs/Makefile fs/btrfs/backref.h fs/btrfs/ctree.c fs/btrfs/ioctl.c fs/btrfs/ioctl.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 19:19:10 -04:00
Alexander Block	31db9f7c23	Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive This patch introduces the BTRFS_IOC_SEND ioctl that is required for send. It allows btrfs-progs to implement full and incremental sends. Patches for btrfs-progs will follow. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>	2012-07-25 23:30:19 +02:00
Alexander Block	8ea05e3a42	Btrfs: introduce subvol uuids and times This patch introduces uuids for subvolumes. Each subvolume has its own uuid. In case it was snapshotted, it also contains parent_uuid. In case it was received, it also contains received_uuid. It also introduces subvolume ctime/otime/stime/rtime. The first two are comparable to the times found in inodes. otime is the origin/creation time and ctime is the change time. stime/rtime are only valid on received subvolumes. stime is the time of the subvolume when it was sent. rtime is the time of the subvolume when it was received. In addition to the times, we have a transid for each time. They are updated at the same place as the times. btrfs receive uses stransid and rtransid to find out if a received subvolume changed in the meantime. If an older kernel mounts a filesystem with the extended fields, all fields become invalid. The next mount with a new kernel will detect this and reset the fields. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>	2012-07-25 23:28:38 +02:00
Mitch Harder	2b0ce2c290	Btrfs: Check INCOMPAT flags on remount and add helper function In support of the recently added capability to remount with lzo compression, provide a helper function to check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary. Also, implement the new helper function when defragmenting with explicit lzo compression and when setting the default subvolume. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 16:14:31 -04:00
Chris Mason	b478b2baa3	Merge branch 'qgroup' of git://git.jan-o-sch.net/btrfs-unstable into for-linus Conflicts: fs/btrfs/ioctl.c fs/btrfs/ioctl.h fs/btrfs/transaction.c fs/btrfs/transaction.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 16:11:38 -04:00
David Sterba	362a20c5e2	btrfs: allow cross-subvolume file clone Lift the EXDEV condition and allow different root trees for files being cloned, then pass source inode's root when searching for extents. Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from one filesystem are mounted separately. Signed-off-by: David Sterba <dsterba@suse.cz>	2012-07-25 17:33:09 +02:00
Liu Bo	b9ca0664dc	Btrfs: do not set subvolume flags in readonly mode $ mkfs.btrfs /dev/sdb7 $ btrfstune -S1 /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs mount: block device /dev/sdb7 is write-protected, mounting read-only $ btrfs dev add /dev/sdb8 /mnt/btrfs/ Now we get a btrfs in which mnt flags has readonly but sb flags does not. So for those ioctls that only check sb flags with MS_RDONLY, it is going to be a problem. Setting subvolume flags is such an ioctl, we should use mnt_want_write_file() to check RO flags. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:58 -04:00
Liu Bo	e54bfa3104	Btrfs: use mnt_want_write_file instead of mnt_want_write mnt_want_write_file is faster when file has been opened for write. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:57 -04:00
Liu Bo	768e9dfe82	Btrfs: remove redundant r/o check for superblock mnt_want_write() and mnt_want_write_file() will check sb->s_flags with MS_RDONLY, and we don't need to do it ourselves. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:56 -04:00
Liu Bo	a874a63e13	Btrfs: check write access to mount earlier while creating snapshots Move check of write access to mount into upper functions so that we can use mnt_want_write_file instead, which is faster than mnt_want_write. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:56 -04:00
David Sterba	b27f7c0c15	btrfs: join DEV_STATS ioctls to one Commit `c11d2c236c` (Btrfs: add ioctl to get and reset the device stats) introduced two ioctls doing almost the same thing distinguished by just the ioctl number which encodes "do reset after read". I have suggested http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16604.html to implement it via the ioctl args. This hasn't happened, and I think we should use a cleaner way to pass flags and should not waste ioctl numbers. CC: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: David Sterba <dsterba@suse.cz>	2012-07-23 15:41:40 -04:00
Andrew Mahone	a43a211133	btrfs: ignore unfragmented file checks in defrag when compression enabled - rebased Rebased on btrfs-next and retested. Inform should_defrag_range if BTRFS_DEFRAG_RANGE_COMPRESS is set. If so, skip checks for adjacent extents and extent size when deciding whether to defrag, as these can prevent an uncompressed and unfragmented file from being compressed as requested. Signed-off-by: Andrew Mahone <andrew.mahone@gmail.com>	2012-07-23 15:41:39 -04:00
Al Viro	11e62a8fab	btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:01:43 +04:00
Arne Jansen	6f72c7e20d	Btrfs: add qgroup inheritance When creating a subvolume or snapshot, it is necessary to initialize the qgroup account with a copy of some other (tracking) qgroup. This patch adds parameters to the ioctls to pass the information from which qgroup to inherit. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-12 10:54:40 +02:00
Arne Jansen	5d13a37bd5	Btrfs: add qgroup ioctls Ioctls to control the qgroup feature like adding and removing qgroups and assigning qgroups. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-12 10:54:39 +02:00
Chris Mason	a8c4a33b98	Btrfs: cast devid to unsigned long long for printk %llu Avoid warning in 32 bit machines Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-06-15 20:07:17 -04:00
Liu Bo	4e42ae1bdc	Btrfs: do not resize a seeding device Seeding devices are not supposed to change any more. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-06-15 11:42:26 -04:00
Li Zefan	6c282eb40e	Btrfs: fix defrag regression If a file has 3 small extents: | ext1 | ext2 | ext3 | Running "btrfs fi defrag" will only defrag the last two extents, if those extent mappings haven't been read into memory from disk. This bug was introduced by commit `17ce6ef8d7` ("Btrfs: add a check to decide if we should defrag the range") The cause is that the commit looked into previous and next extents using lookup_extent_mapping() only. While at it, remove the code that checks the previous extent, since it's sufficient to check the next extent. Signed-off-by: Li Zefan <lizefan@huawei.com>	2012-06-14 21:30:55 -04:00
Josef Bacik	606686eeac	Btrfs: use rcu to protect device->name Al pointed out that we can just toss out the old name on a device and add a new one arbitrarily, so anybody who uses device->name in printk could possibly use free'd memory. Instead of adding locking around all of this he suggested doing it with RCU, so I've introduced a struct rcu_string that does just that and have gone through and protected all accesses to device->name that aren't under the uuid_mutex with rcu_read_lock(). This protects us and I will use it for dealing with removing the device that we used to mount the file system in a later patch. Thanks, Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com>	2012-06-14 21:29:16 -04:00
Chris Mason	1e20932a23	Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linus Conflicts: fs/btrfs/ulist.h Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-05-31 16:49:53 -04:00
Stefan Behrens	c11d2c236c	Btrfs: add ioctl to get and reset the device stats An ioctl interface is added to get the device statistic counters. A second ioctl is added to atomically get and reset these counters. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-05-30 10:23:40 -04:00
Liu Bo	9ba1f6e44e	Btrfs: do not do balance in readonly mode In normal cases, we would not be allowed to do balance in RO mode. However, when we're using a seeding device and adding another device to sprout, things will change: $ mkfs.btrfs /dev/sdb7 $ btrfstune -S 1 /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs -o ro $ btrfs fi bal /mnt/btrfs -----------------------> fail. $ btrfs dev add /dev/sdb8 /mnt/btrfs $ btrfs fi bal /mnt/btrfs -----------------------> works! It should not be designed as an exception, and we'd better add another check for mnt flags. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:35 -04:00
Jim Meyering	a27202fbe9	Btrfs: NUL-terminate path buffer in DEV_INFO ioctl result A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer would not be NUL-terminated in the DEV_INFO ioctl result buffer. Signed-off-by: Jim Meyering <meyering@redhat.com>	2012-05-30 10:23:31 -04:00
Daniel J Blueman	2eec6c8102	Fix minor type issues Address some minor type issues identified by sparse checker. Signed-off-by: Daniel J Blueman <daniel@quora.org>	2012-05-30 10:23:30 -04:00
Josef Bacik	0c4d2d95d0	Btrfs: use i_version instead of our own sequence We've been keeping around the inode sequence number in hopes that somebody would use it, but nobody uses it and people actually use i_version which serves the same purpose, so use i_version where we used the incore inode's sequence number and that way the sequence is updated properly across the board, and not just in file write. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:27 -04:00
Jan Schmidt	5581a51a59	Btrfs: don't set for_cow parameter for tree block functions Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed parameter for_cow = 1. In fact, these two functions should never mark their tree modification operations as for_cow, because they can change the number of blocks referenced by a tree. Hence, we remove the extra for_cow parameter from these functions and make them pass a zero down. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:53 +02:00
Stefan Behrens	99ba55ad69	Btrfs: fix btrfs_ioctl_dev_info() crash on missing device When a filesystem is mounted with the degraded option, it is possible that some of the devices are not there. btrfs_ioctl_dev_info() crashes in this case because the device name is a NULL pointer. This ioctl was only used for scrub. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-04-18 19:22:35 +02:00
Liu Bo	e1f041e14c	Btrfs: update to the right index of defragment When we use autodefrag, we forget to update the index which indicates the last page we've dirtied, so we set dirty flags on the same set of pages again and again. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-29 09:57:45 -04:00
Liu Bo	66c2689226	Btrfs: do not bother to defrag an extent if it is a big real extent $ mkfs.btrfs /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag $ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null $ filefrag -v /mnt/btrfs/foobar Filesystem type is: 9123683e File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3072 10 eof /mnt/btrfs/foobar: 1 extent found Now we have a big real extent [0, 40960), but autodefrag will still defrag it. $ sync $ filefrag -v /mnt/btrfs/foobar Filesystem type is: 9123683e File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3082 10 eof /mnt/btrfs/foobar: 1 extent found So if we already find a big real extent, we're ok about that, just skip it. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-29 09:57:45 -04:00
Liu Bo	17ce6ef8d7	Btrfs: add a check to decide if we should defrag the range If our file's layout is as follows: \| hole \| data1 \| hole \| data2 \| we do not need to defrag this file, because this file has holes and cannot be merged into one extent. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-29 09:57:45 -04:00
Liu Bo	1f12bd0632	Btrfs: fix the mismatch of page->mapping commit `600a45e1d5` (Btrfs: fix deadlock on page lock when doing auto-defragment) fixes the deadlock on page, but it also introduces another bug. A page may have been truncated after unlock & lock. So we need to find it again to get the right one. And since we've held i_mutex lock, inode size remains unchanged and we can drop isize overflow checks. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-29 09:57:44 -04:00
Liu Bo	ecb8bea87d	Btrfs: fix race between direct io and autodefrag The bug is from running xfstests 209 with autodefrag. The race is as follows: t1 t2(autodefrag) direct IO invalidate pagecache dio(old data) add_inode_defrag invalidate pagecache endio direct IO invalidate pagecache run_defrag readpage(old data) set page dirty (old data) dio(new data, rewrite) invalidate pagecache () endio t2(autodefrag) will get old data into pagecache via readpage and set pagecache dirty. Meanwhile, invalidate pagecache() will fail due to dirty flags in pages. So the old data may be flushed into disk by flush thread, which will lead to data loss. And so does the case of user defragment progs. The patch fixes this race by holding i_mutex when we readpage and set page dirty. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-29 09:57:44 -04:00
Chris Mason	98961a7e43	Merge git://git.jan-o-sch.net/btrfs-unstable into for-linus Conflicts: fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-03-28 20:33:40 -04:00
Jan Schmidt	7a3ae2f8c8	Btrfs: fix regression in scrub path resolving In commit `4692cf58` we introduced new backref walking code for btrfs. This assumes we're searching live roots, which requires a transaction context. While scrubbing, however, we must not join a transaction because this could deadlock with the commit path. Additionally, what scrub really wants to do is resolving a logical address in the commit root it's currently checking. This patch adds support for logical to path resolving on commit roots and makes scrub use that. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-03-27 14:51:21 +02:00
Jeff Mahoney	79787eaab4	btrfs: replace many BUG_ONs with proper error handling btrfs currently handles most errors with BUG_ON. This patch is a work-in-progress but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 11:52:54 +01:00
Mark Fasheh	ce598979be	btrfs: Don't BUG_ON errors from btrfs_create_subvol_root() This is called from only one place - create_subvol() which passes errors safely back out to its caller, btrfs_mksubvol where they are handled. Additionally, btrfs_create_subvol_root() itself BUGs needlessly on an error return from btrfs_update_inode(). Since create_subvol() was fixed to catch errors we can bubble this one up too. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2012-03-22 01:45:36 +01:00
Jeff Mahoney	d0082371cf	btrfs: drop gfp_t from lock_extent lock_extent and unlock_extent are always called with GFP_NOFS, drop the argument and use GFP_NOFS consistently. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:35 +01:00
Linus Torvalds	855a85f704	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Quoth Chris: "This is later than I wanted because I got backed up running through btrfs bugs from the Oracle QA teams. But they are all bug fixes that we've queued and tested since rc1. Nothing in particular stands out, this just reflects bug fixing and QA done in parallel by all the btrfs developers. The most user visible of these is: Btrfs: clear the extent uptodate bits during parent transid failures Because that helps deal with out of date drives (say an iscsi disk that has gone away and come back). The old code wasn't always properly retrying the other mirror for this type of failure." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits) Btrfs: fix compiler warnings on 32 bit systems Btrfs: increase the global block reserve estimates Btrfs: clear the extent uptodate bits during parent transid failures Btrfs: add extra sanity checks on the path names in btrfs_mksubvol Btrfs: make sure we update latest_bdev Btrfs: improve error handling for btrfs_insert_dir_item callers Btrfs: be less strict on finding next node in clear_extent_bit Btrfs: fix a bug on overcommit stuff Btrfs: kick out redundant stuff in convert_extent_bit Btrfs: skip states when they does not contain bits to clear Btrfs: check return value of lookup_extent_mapping() correctly Btrfs: fix deadlock on page lock when doing auto-defragment Btrfs: fix return value check of extent_io_ops btrfs: honor umask when creating subvol root btrfs: silence warning in raid array setup btrfs: fix structs where bitfields and spinlock/atomic share 8B word btrfs: delalloc for page dirtied out-of-band in fixup worker Btrfs: fix memory leak in load_free_space_cache() btrfs: don't check DUP chunks twice Btrfs: fix trim 0 bytes after a device delete ...	2012-02-24 09:02:53 -08:00
Chris Mason	16780cabb8	Btrfs: add extra sanity checks on the path names in btrfs_mksubvol Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-23 10:43:45 -05:00
Miao Xie	600a45e1d5	Btrfs: fix deadlock on page lock when doing auto-defragment When I ran xfstests circularly on a auto-defragment btrfs, the deadlock happened. Steps to reproduce: [tty0] # export MOUNT_OPTIONS="-o autodefrag" # export TEST_DEV=<partition1> # export TEST_DIR=<mountpoint1> # export SCRATCH_DEV=<partition2> # export SCRATCH_MNT=<mountpoint2> # while [ 1 ] > do > ./check 091 127 263 > sleep 1 > done [tty1] # while [ 1 ] > do > echo 3 > /proc/sys/vm/drop_caches > done Several hours later, the test processes will hang on, and the deadlock will happen on page lock. The reason is that: Auto defrag task Flush thread Test task btrfs_writepages() add ordered extent (including page 1, 2) set page 1 writeback set page 2 writeback endio_fn() end page 2 writeback release page 2 lock page 1 alloc and lock page 2 page 2 is not uptodate btrfs_readpage() start ordered extent() btrfs_writepages() try to lock page 1 so deadlock happens. Fix this bug by unlocking the page which is in writeback, and re-locking it after the writeback end. Signed-off-by: Miao Xie <miax@cn.fujitsu.com>	2012-02-16 17:23:16 +01:00
Linus Torvalds	67d2433ee7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix reservations in btrfs_page_mkwrite Btrfs: advance window_start if we're using a bitmap btrfs: mask out gfp flags in releasepage Btrfs: fix enospc error caused by wrong checks of the chunk Btrfs: do not defrag a file partially Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c Btrfs: use cluster->window_start when allocating from a cluster bitmap Btrfs: Check for NULL page in extent_range_uptodate btrfs: Fix busyloops in transaction waiting code Btrfs: make sure a bitmap has enough bytes Btrfs: fix uninit warning in backref.c	2012-01-28 17:00:19 -08:00
Liu Bo	7ec31b548a	Btrfs: do not defrag a file partially xfstests 218 complains that btrfs defrags a file partially: After: 1 Write backwards sync, but contiguous - should defrag to 1 extent Before: 10 -After: 1 +After: 2 To fix this, we need to set max_to_defrag count properly. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
Linus Torvalds	d65773b22b	Merge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: take allocation of ->tree_root into open_ctree() btrfs: let ->s_fs_info point to fs_info, not root... btrfs: consolidate failure exits in btrfs_mount() a bit btrfs: make free_fs_info() call ->kill_sb() unconditional btrfs: merge free_fs_info() calls on fill_super failures btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super() btrfs: make open_ctree() return int btrfs: sanitizing ->fs_info, part 5 btrfs: sanitizing ->fs_info, part 4 btrfs: sanitizing ->fs_info, part 3 btrfs: sanitizing ->fs_info, part 2 btrfs: sanitizing ->fs_info, part 1 btrfs: fix a deadlock in btrfs_scan_one_device() btrfs: fix mount/umount race btrfs: get ->kill_sb() of its own btrfs: preparation to fixing mount/umount race	2012-01-17 15:52:51 -08:00
Linus Torvalds	f9156c7288	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits) Btrfs: use larger system chunks Btrfs: add a delalloc mutex to inodes for delalloc reservations Btrfs: space leak tracepoints Btrfs: protect orphan block rsv with spin_lock Btrfs: add allocator tracepoints Btrfs: don't call btrfs_throttle in file write Btrfs: release space on error in page_mkwrite Btrfs: fix btrfsck error 400 when truncating a compressed Btrfs: do not use btrfs_end_transaction_throttle everywhere Btrfs: add balance progress reporting Btrfs: allow for resuming restriper after it was paused Btrfs: allow for canceling restriper Btrfs: allow for pausing restriper Btrfs: add skip_balance mount option Btrfs: recover balance on mount Btrfs: save balance parameters to disk Btrfs: soft profile changing mode (aka soft convert) Btrfs: implement online profile changing Btrfs: do not reduce profile in do_chunk_alloc() Btrfs: virtual address space subset filter ... Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new mnt_drop_write_file() helper.	2012-01-17 15:49:54 -08:00
Josef Bacik	f248679e86	Btrfs: add a delalloc mutex to inodes for delalloc reservations I was using i_mutex for this, but we're getting bogus lockdep warnings by doing that and there's no real way to get rid of those, so just stop using i_mutex to protect delalloc metadata reservations and use a delalloc mutex instead. This shouldn't be contended often at all, only if you are writing and mmap writing to the file at the same time. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-01-16 15:29:43 -05:00
Chris Mason	9785dbdf26	Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into integration	2012-01-16 15:26:31 -05:00
Chris Mason	d756bd2d93	Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integration Conflicts: fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-16 15:26:17 -05:00
Ilya Dryomov	19a39dce3b	Btrfs: add balance progress reporting Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:49 +02:00
Ilya Dryomov	de322263d3	Btrfs: allow for resuming restriper after it was paused Recognize BTRFS_BALANCE_RESUME flag passed from userspace. We use the same heuristics used when recovering balance after a crash to try to start where we left off last time. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:49 +02:00
Ilya Dryomov	a7e99c691a	Btrfs: allow for canceling restriper Implement an ioctl for canceling restriper. Currently we wait until relocation of the current block group is finished, in future this can be done by triggering a commit. Balance item is deleted and no memory about the interrupted balance is kept. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:49 +02:00
Ilya Dryomov	837d5b6e46	Btrfs: allow for pausing restriper Implement an ioctl for pausing restriper. This pauses the relocation, but balance is still considered to be "in progress": balance item is not deleted, other volume operations cannot be started, etc. If paused in the middle of profile changing operation we will continue making allocations with the target profile. Add a hook to close_ctree() to pause restriper and free its data structures on unmount. (It's safe to unmount when restriper is in "paused" state, we will resume with the same parameters on the next mount) Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:49 +02:00
Ilya Dryomov	f43ffb60fd	Btrfs: add basic infrastructure for selective balancing This allows to have a separate set of filters for each chunk type (data,meta,sys). The code however is generic and switch on chunk type is only done once. This commit also adds a type filter: it allows to balance for example meta and system chunks w/o touching data ones. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Ilya Dryomov	c9e9f97bdf	Btrfs: add basic restriper infrastructure Add basic restriper infrastructure: extended balancing ioctl and all related ioctl data structures, add data structure for tracking restriper's state to fs_info, etc. The semantics of the old balancing ioctl are fully preserved. Explicitly disallow any volume operations when balance is in progress. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-01-16 22:04:47 +02:00
Li Zefan	4da6f1a332	Btrfs: reserve metadata space in btrfs_ioctl_setflags() Check and reserve space for btrfs_update_inode(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:39 +08:00
Li Zefan	f062abf089	Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags() We can recover from errors and return -errno to user space. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:38 +08:00
Al Viro	815745cf3e	btrfs: let ->s_fs_info point to fs_info, not root... the latter can be obtained from the former (by looking at ->tree_root) just as cheaply as we currently are doing the other way round. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-08 19:35:37 -05:00
Jan Schmidt	4692cf58aa	Btrfs: new backref walking code The old backref iteration code could only safely be used on commit roots. Besides this limitation, it had bugs in finding the roots for these references. This commit replaces large parts of it by btrfs_find_all_roots() which a) really finds all roots and the correct roots, b) works correctly under heavy file system load, c) considers delayed refs. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-01-05 10:49:43 +01:00
Al Viro	2a79f17e4a	vfs: mnt_drop_write_file() new helper (wrapper around mnt_drop_write()) to be used in pair with mnt_want_write_file(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:40 -05:00
Al Viro	a561be7100	switch a bunch of places to mnt_want_write_file() it's both faster (in case when file has been opened for write) and cleaner. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:35 -05:00
Arne Jansen	66d7e7f09f	Btrfs: mark delayed refs as for cow Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value from every call site. The for_cow parameter will later on be used to determine if a ref will change anything with respect to qgroups. Delayed refs coming from relocation are always counted as for_cow, as they don't change subvol quota. Also pass in the fs_info for later use. btrfs_find_all_roots() will use this as an optimization, as changes that are for_cow will not change anything with respect to which root points to a certain leaf. Thus, we don't need to add the current sequence number to those delayed refs. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2011-12-22 16:22:27 +01:00
Chris Mason	567a45e917	Merge branch 'for-chris' of http://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into integration Conflicts: fs/btrfs/inode.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-12-15 13:43:49 -05:00
Josef Bacik	660d3f6cde	Btrfs: fix how we do delalloc reservations and how we free reservations on error Running xfstests 269 with some tracing my scripts kept spitting out errors about releasing bytes that we didn't actually have reserved. This took me down a huge rabbit hole and it turns out the way we deal with reserved_extents is wrong, we need to only be setting it if the reservation succeeds, otherwise the free() method will come in and unreserve space that isn't actually reserved yet, which can lead to other warnings and such. The math was all working out right in the end, but it caused all sorts of other issues in addition to making my scripts yell and scream and generally make it impossible for me to track down the original issue I was looking for. The other problem is with our error handling in the reservation code. There are two cases that we need to deal with: 1) We raced with free. In this case free won't free anything because csum_bytes is modified before we drop the lock in our reservation path, so free rightly doesn't release any space because the reservation code may be depending on that reservation. However if we fail, we need the reservation side to do the free at that point since that space is no longer in use. So as it stands the code was doing this fine and it worked out, except in case #2. 2) We don't race with free. Nobody comes in and changes anything, and our reservation fails. In this case we didn't reserve anything anyway and we just need to clean up csum_bytes but not free anything. So we keep track of csum_bytes before we drop the lock and if it hasn't changed we know we can just decrement csum_bytes and carry on. Because of the case where we can race with free()'s since we have to drop our spin_lock to do the reservation, I'm going to serialize all reservations with the i_mutex. We already get this for free in the heavy use paths, truncate and file write all hold the i_mutex, just needed to add it to page_mkwrite and various ioctl/balance things. With this patch my space leak scripts no longer scream bloody murder. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-12-15 11:04:22 -05:00
Li Zefan	306424cc88	Btrfs: fix ctime update of on-disk inode To reproduce the bug: # touch /mnt/tmp # stat /mnt/tmp \| grep Change Change: 2011-12-09 09:32:23.412105981 +0800 # chattr +i /mnt/tmp # stat /mnt/tmp \| grep Change Change: 2011-12-09 09:32:43.198105295 +0800 # umount /mnt # mount /dev/loop1 /mnt # stat /mnt/tmp \| grep Change Change: 2011-12-09 09:32:23.412105981 +0800 We should update ctime of in-memory inode before calling btrfs_update_inode(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-12-15 10:50:37 -05:00
Mike Fleetwood	ece7d20e8b	Btrfs: Don't error on resizing FS to same size It seems overly harsh to fail a resize of a btrfs file system to the same size when a shrink or grow would succeed. User app GParted trips over this error. Allow it by bypassing the shrink or grow operation. Signed-off-by: Mike Fleetwood <mike.fleetwood@googlemail.com>	2011-11-30 18:46:04 +01:00
Arnd Hannemann	5bb1468238	Btrfs: prefix resize related printks with btrfs: For the user it is confusing to find something like: [10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472 in kernel log, because it doesn't point directly to btrfs. This patch prefixes those messages with "btrfs:" like other btrfs related printks. Signed-off-by: Arnd Hannemann <arnd@arndnet.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-20 07:42:16 -05:00
Jeff Mahoney	745c4d8e16	btrfs: Fix up 32/64-bit compatibility for new ioctls This patch casts to unsigned long before casting to a pointer and fixes the following warnings: fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-20 07:42:13 -05:00
Chris Mason	740c3d226c	Btrfs: fix the new inspection ioctls for 32 bit compat The new ioctls to follow backrefs are not clean for 32/64 bit compat. This reworks them for u64s everywhere. They are brand new, so there are no problems with changing the interface now. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-06 03:08:49 -05:00
Chris Mason	806468f8bf	Merge git://git.jan-o-sch.net/btrfs-unstable into integration Conflicts: fs/btrfs/Makefile fs/btrfs/extent_io.c fs/btrfs/extent_io.h fs/btrfs/scrub.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-06 03:07:10 -05:00
David Sterba	6c41761fc6	btrfs: separate superblock items out of fs_info fs_info has now ~9kb, more than fits into one page. This will cause mount failure when memory is too fragmented. Top space consumers are super block structures super_copy and super_for_commit, ~2.8kb each. Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64) Add a wrapper for freeing fs_info and all of its dynamically allocated members. Signed-off-by: David Sterba <dsterba@suse.cz>	2011-11-06 03:04:01 -05:00
David Sterba	a81d3b1ba2	Merge branch 'hotfixes-20111024/josef/for-chris' into btrfs-next-stable	2011-10-24 14:47:58 +02:00
David Sterba	afd582ac8f	Merge remote-tracking branch 'remotes/josef/for-chris' into btrfs-next-stable	2011-10-24 14:47:57 +02:00
Lukas Czerner	f4c697e640	btrfs: return EINVAL if start > total_bytes in fitrim ioctl We should return EINVAL if the start is beyond the end of the file system in btrfs_ioctl_fitrim(). Fix that by adding the appropriate check for it. Also in btrfs_trim_fs() it is possible that len+start might overflow if big values are passed. Fix it by decrementing the len so that start+len is equal to the file system size in the worst case. Signed-off-by: Lukas Czerner <lczerner@redhat.com>	2011-10-20 18:10:40 +02:00
Li Zefan	008873eafb	Btrfs: honor extent thresh during defragmentation We won't defrag an extent, if it's bigger than the threshold we specified and there's no small extent before it, but actually the code doesn't work this way. There are three bugs: - When should_defrag_range() decides we should keep on defragmenting an extent, last_len is not incremented. (old bug) - The length that passes to should_defrag_range() is not the length we're going to defrag. (new bug) - We always defrag 256K bytes data, and a big extent can be part of this range. (new bug) For a file with 4 extents: \| 4K \| 4K \| 256K \| 256K \| The result of defrag with (the default) 256K extent thresh should be: \| 264K \| 256K \| but with those bugs, we'll get: \| 520K \| Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-10-20 18:10:39 +02:00
Li Zefan	5ca496604b	Btrfs: fix wrong max_to_defrag in btrfs_defrag_file() It's off-by-one, and thus we may skip the last page while defragmenting. An example case: # create /mnt/file with 2 4K file extents # btrfs fi defrag /mnt/file # sync # filefrag /mnt/file /mnt/file: 2 extents found So it's not defragmented. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-10-20 18:10:37 +02:00
Li Zefan	151a31b25e	Btrfs: use i_size_read() in btrfs_defrag_file() Don't use inode->i_size directly, since we're not holding i_mutex. This also fixes another bug, that i_size can change after it's checked against 0 and then (i_size - 1) can be negative. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-10-20 18:10:35 +02:00
Li Zefan	cbcc83265d	Btrfs: fix defragmentation regression There's an off-by-one bug: # create a file with lots of 4K file extents # btrfs fi defrag /mnt/file # sync # filefrag -v /mnt/file Filesystem type is: 9123683e File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3372 64 1 64 3136 3435 1 2 65 3436 3136 64 3 129 3201 3499 1 4 130 3500 3201 64 5 194 3266 3563 1 6 195 3564 3266 64 7 259 3331 3627 1 8 260 3628 3331 40 eof After this patch: ... # filefrag -v /mnt/file Filesystem type is: 9123683e File size of /mnt/file is 1228800 (300 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3372 300 eof /mnt/file: 1 extent found Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-10-20 18:10:34 +02:00
Diego Calleja	60ccf82f5b	btrfs: fix memory leak in btrfs_defrag_file kmemleak found this: unreferenced object 0xffff8801b64af968 (size 512): comm "btrfs-cleaner", pid 3317, jiffies 4306810886 (age 903.272s) hex dump (first 32 bytes): 00 82 01 07 00 ea ff ff c0 83 01 07 00 ea ff ff ................ 80 82 01 07 00 ea ff ff c0 87 01 07 00 ea ff ff ................ backtrace: [<ffffffff816875cc>] kmemleak_alloc+0x5c/0xc0 [<ffffffff8114aec3>] kmem_cache_alloc_trace+0x163/0x240 [<ffffffff8127a290>] btrfs_defrag_file+0xf0/0xb20 [<ffffffff8125d9a5>] btrfs_run_defrag_inodes+0x165/0x210 [<ffffffff812479d7>] cleaner_kthread+0x177/0x190 [<ffffffff81075c7d>] kthread+0x8d/0xa0 [<ffffffff816af5f4>] kernel_thread_helper+0x4/0x10 [<ffffffffffffffff>] 0xffffffffffffffff "pages" is not always freed. Fix it by removing the unnecessary additional return. Signed-off-by: Diego Calleja <diegocg@gmail.com>	2011-10-20 18:10:33 +02:00
Josef Bacik	e27425d614	Btrfs: only inherit btrfs specific flags when creating files Xfstests 79 was failing because we were inheriting the S_APPEND flag when we weren't supposed to. There isn't any specific documentation on this so I'm taking the test as the standard of how things work, and having S_APPEND set on a directory doesn't mean that S_APPEND gets inherited by its children according to this test. So only inherit btrfs specific things. This will let us set compress/nocompress on specific directories and everything in the directories will inherit this flag, same with nodatacow. With this patch test 79 passes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:50 -04:00
Josef Bacik	3b16a4e3c3	Btrfs: use the inode's mapping mask for allocating pages Johannes pointed out we were allocating only kernel pages for doing writes, which is kind of a big deal if you are on 32bit and have more than a gig of ram. So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we don't re-enter. Thanks, Reported-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:45 -04:00
Linus Torvalds	b2f9452bd5	Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux * 'btrfs-3.0' of git://github.com/chrismason/linux: Btrfs: make sure not to defrag extents past i_size Btrfs: fix recursive auto-defrag	2011-10-13 18:20:40 +12:00
Chris Mason	f7f43cc841	Btrfs: make sure not to defrag extents past i_size The btrfs file defrag code will loop through the extents and force COW on them. But there is a concurrent truncate in the middle of the defrag, it might end up defragging the same range over and over again. The problem is that writepage won't go through and do anything on pages past i_size, so the cow won't happen, so the file will appear to still be fragmented. defrag will end up hitting the same extents again and again. In the worst case, the truncate can actually live lock with the defrag because the defrag keeps creating new ordered extents which the truncate code keeps waiting on. The fix here is to make defrag check for i_size inside the main loop, instead of just once before the looping starts. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-10-11 11:45:55 -04:00
Li Zefan	2a0f7f5769	Btrfs: fix recursive auto-defrag Follow those steps: # mount -o autodefrag /dev/sda7 /mnt # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1 # sync # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc and then it'll go into a loop: writeback -> defrag -> writeback ... It's because writeback writes [8K, 200K] and then writes [0, 8K]. I tried to make writeback know if the pages are dirtied by defrag, but the patch was a bit intrusive. Here I simply set writeback_index when we defrag a file. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-10-10 15:43:34 -04:00
Jan Schmidt	d7728c960d	btrfs: new ioctls to do logical->inode and inode->path resolving these ioctls make use of the new functions initially added for scrub. they return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and all paths belonging to an inode (BTRFS_IOC_INO_PATHS). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2011-09-29 12:54:28 +02:00
Chris Mason	0a7a0519d1	Merge branch 'btrfs-3.0' into for-linus	2011-09-20 14:49:29 -04:00
Sage Weil	b6f3409b21	Btrfs: reserve sufficient space for ioctl clone Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We need to reserve space for: - adjusting the old extent (possibly splitting it) - adding the new extent - updating the inode Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-20 14:48:51 -04:00
Chris Mason	2cf4ce7c2a	Merge branch 'btrfs-3.0' into for-linus	2011-09-18 10:31:44 -04:00
Li Zefan	dde820fbf7	Btrfs: don't change inode flag of the dest clone file The dst file will have the same inode flags with dst file after file clone, and I think it's unexpected. For example, the dst file will suddenly become immutable after getting some share of data with src file, if the src is immutable. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:20:46 -04:00
Li Zefan	0e7b824c4e	Btrfs: don't make a file partly checksummed through file clone To reproduce the bug: # mount /dev/sda7 /mnt # dd if=/dev/zero of=/mnt/src bs=4K count=1 # umount /mnt # mount -o nodatasum /dev/sda7 /mnt # dd if=/dev/zero of=/mnt/dst bs=4K count=1 # clone_range -s 4K -l 4K /mnt/src /mnt/dst # echo 3 > /proc/sys/vm/drop_caches # cat /mnt/dst # dmesg ... btrfs no csum found for inode 258 start 0 btrfs csum failed ino 258 off 0 csum 2566472073 private 0 It's because part of the file is checksummed and the other part is not, and then btrfs will complain checksum is not found when we read the file. Disallow file clone if src and dst file have different checksum flag, so we ensure a file is completely checksummed or unchecksummed. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:20:46 -04:00
Li Zefan	71ef078610	Btrfs: fix pages truncation in btrfs_ioctl_clone() It's a bug in commit `f81c9cdc56` (Btrfs: truncate pages from clone ioctl target range) We should pass the dest range to the truncate function, but not the src range. Also move the function before locking extent state. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-18 10:20:46 -04:00
Linus Torvalds	0b001b2eda	Merge branch 'for-linus' of git://github.com/chrismason/linux * 'for-linus' of git://github.com/chrismason/linux: Btrfs: add dummy extent if dst offset excceeds file end in Btrfs: calc file extent num_bytes correctly in file clone btrfs: xattr: fix attribute removal Btrfs: fix wrong nbytes information of the inode Btrfs: fix the file extent gap when doing direct IO Btrfs: fix unclosed transaction handle in btrfs_cont_expand Btrfs: fix misuse of trans block rsv Btrfs: reset to appropriate block rsv after orphan operations Btrfs: skip locking if searching the commit root in csum lookup btrfs: fix warning in iput for bad-inode Btrfs: fix an oops when deleting snapshots	2011-09-12 11:47:49 -07:00
Li Zefan	d525e8ab02	Btrfs: add dummy extent if dst offset excceeds file end in You can see there's no file extent with range [0, 4096]. Check this by btrfsck: # btrfsck /dev/sda7 root 5 inode 258 errors 100 ... Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:25 -04:00
Li Zefan	d72c0842ff	Btrfs: calc file extent num_bytes correctly in file clone num_bytes should be 4096 not 12288. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:25 -04:00
Chris Mason	81d86e1b70	Merge branch 'btrfs-3.0' into for-linus	2011-08-18 10:38:03 -04:00
Sage Weil	f81c9cdc56	Btrfs: truncate pages from clone ioctl target range We need to truncate page cache pages for the clone ioctl target range or else we'll confuse ourselves to no end. If the old data was cached, we used to still see it (until remount). If the page was partially updated we used to get a mix of old and new data. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:31 -04:00
Linus Torvalds	ed8f37370d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (31 commits) Btrfs: don't call writepages from within write_full_page Btrfs: Remove unused variable 'last_index' in file.c Btrfs: clean up for find_first_extent_bit() Btrfs: clean up for wait_extent_bit() Btrfs: clean up for insert_state() Btrfs: remove unused members from struct extent_state Btrfs: clean up code for merging extent maps Btrfs: clean up code for extent_map lookup Btrfs: clean up search_extent_mapping() Btrfs: remove redundant code for dir item lookup Btrfs: make acl functions really no-op if acl is not enabled Btrfs: remove remaining ref-cache code Btrfs: remove a BUG_ON() in btrfs_commit_transaction() Btrfs: use wait_event() Btrfs: check the nodatasum flag when writing compressed files Btrfs: copy string correctly in INO_LOOKUP ioctl Btrfs: don't print the leaf if we had an error btrfs: make btrfs_set_root_node void Btrfs: fix oops while writing data to SSD partitions Btrfs: Protect the readonly flag of block group ... Fix up trivial conflicts (due to acl and writeback cleanups) in - fs/btrfs/acl.c - fs/btrfs/ctree.h - fs/btrfs/extent_io.c	2011-08-02 21:14:05 -10:00
Li Zefan	77906a5075	Btrfs: copy string correctly in INO_LOOKUP ioctl Memory areas [ptr, ptr+total_len] and [name, name+total_len] may overlap, so it's wrong to use memcpy(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-01 14:30:45 -04:00
Linus Torvalds	22712200e1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors Btrfs: use the commit_root for reading free_space_inode crcs Btrfs: reduce extent_state lock contention for metadata Btrfs: remove lockdep magic from btrfs_next_leaf Btrfs: make a lockdep class for each root Btrfs: switch the btrfs tree locks to reader/writer Btrfs: fix deadlock when throttling transactions Btrfs: stop using highmem for extent_buffers Btrfs: fix BUG_ON() caused by ENOSPC when relocating space Btrfs: tag pages for writeback in sync Btrfs: fix enospc problems with delalloc Btrfs: don't flush delalloc arbitrarily Btrfs: use find_or_create_page instead of grab_cache_page Btrfs: use a worker thread to do caching Btrfs: fix how we merge extent states and deal with cached states Btrfs: use the normal checksumming infrastructure for free space cache Btrfs: serialize flushers in reserve_metadata_bytes Btrfs: do transaction space reservation before joining the transaction Btrfs: try to only do one btrfs_search_slot in do_setxattr	2011-07-27 16:43:52 -07:00
Josef Bacik	9e0baf60de	Btrfs: fix enospc problems with delalloc So I had this brilliant idea to use atomic counters for outstanding and reserved extents, but this turned out to be a bad idea. Consider this where we have 1 outstanding extent and 1 reserved extent Reserver Releaser atomic_dec(outstanding) now 0 atomic_read(outstanding)+1 get 1 atomic_read(reserved) get 1 don't actually reserve anything because they are the same atomic_cmpxchg(reserved, 1, 0) atomic_inc(outstanding) atomic_add(0, reserved) free reserved space for 1 extent Then the reserver now has no actual space reserved for it, and when it goes to finish the ordered IO it won't have enough space to do it's allocation and you get those lovely warnings. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:44 -04:00
Josef Bacik	a94733d0bc	Btrfs: use find_or_create_page instead of grab_cache_page grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we need GFP_NOFS so we don't deadlock. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-07-27 12:46:43 -04:00
Al Viro	2fbe8c8ad1	get rid of useless dget_parent() in fs/btrfs/ioctl.c both callers there have dentry->d_parent stabilized by the fact that their caller had obtained dentry from lookup_one_len() and had not dropped ->i_mutex on parent since then. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:00 -04:00
Josef Bacik	8351583e3f	Btrfs: protect the pending_snapshots list with trans_lock Currently there is nothing protecting the pending_snapshots list on the transaction. We only hold the directory mutex that we are snapshotting and a read lock on the subvol_sem, so we could race with somebody else creating a snapshot in a different directory and end up with list corruption. So protect this list with the trans_lock. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-15 13:24:46 -04:00
Li Zefan	027ed2f004	Btrfs: avoid stack bloat in btrfs_ioctl_fs_info() The size of struct btrfs_ioctl_fs_info_args is as big as 1KB, so don't declare the variable on stack. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 18:57:10 -04:00
David Sterba	a4689d2bd3	btrfs: use btrfs_ino to access inode number commit `4cb5300bc` ("Btrfs: add mount -o auto_defrag") accesses inode number directly while it should use the helper with the new inode number allocator. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:46 -04:00
Chris Mason	ff5714cca9	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus Conflicts: fs/btrfs/disk-io.c fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-28 07:00:39 -04:00
Chris Mason	4cb5300bc8	Btrfs: add mount -o auto_defrag This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-26 17:52:15 -04:00
Chris Mason	d6c0cb379c	Merge branch 'cleanups_and_fixes' into inode_numbers Conflicts: fs/btrfs/tree-log.c fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 14:37:47 -04:00
Xiao Guangrong	1f78160ce1	Btrfs: using rcu lock in the reader side of devices list fs_devices->devices is only updated on remove and add device paths, so we can use rcu to protect it in the reader side Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:43 -04:00
Hugo Mills	e215686715	btrfs: Ensure the tree search ioctl returns the right number of records Btrfs's tree search ioctl has a field to indicate that no more than a given number of records should be returned. The ioctl doesn't honour this, as the tested value is not incremented until the end of the copy_to_sk function. This patch removes an unnecessary local variable, and updates the num_found counter as each key is found in the tree. Signed-off-by: Hugo Mills <hugo@carfax.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:05:39 -04:00
Josef Bacik	d82a6f1d7e	Btrfs: kill BTRFS_I(inode)->block_group Originally this was going to be used as a way to give hints to the allocator, but frankly we can get much better hints elsewhere and it's not even used at all for anything usefull. In addition to be completely useless, when we initialize an inode we try and find a freeish block group to set as the inodes block group, and with a completely full 40gb fs this takes _forever_, so I imagine with say 1tb fs this is just unbearable. So just axe the thing altoghether, we don't need it and it saves us 8 bytes in the inode and saves us 500 microseconds per inode lookup in my testcase. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:12 -04:00
Josef Bacik	a4abeea41a	Btrfs: kill trans_mutex We use trans_mutex for lots of things, here's a basic list 1) To serialize trans_handles joining the currently running transaction 2) To make sure that no new trans handles are started while we are committing 3) To protect the dead_roots list and the transaction lists Really the serializing trans_handles joining is not too hard, and can really get bogged down in acquiring a reference to the transaction. So replace the trans_mutex with a trans_lock spinlock and use it to do the following 1) Protect fs_info->running_transaction. All trans handles have to do is check this, and then take a reference of the transaction and keep on going. 2) Protect the fs_info->trans_list. This doesn't get used too much, basically it just holds the current transactions, which will usually just be the currently committing transaction and the currently running transaction at most. 3) Protect the dead roots list. This is only ever processed by splicing the list so this is relatively simple. 4) Protect the fs_info->reloc_ctl stuff. This is very lightweight and was using the trans_mutex before, so this is a pretty straightforward change. 5) Protect fs_info->no_trans_join. Because we don't hold the trans_lock over the entirety of the commit we need to have a way to block new people from creating a new transaction while we're doing our work. So we set no_trans_join and in join_transaction we test to see if that is set, and if it is we do a wait_on_commit. 6) Make the transaction use count atomic so we don't need to take locks to modify it when we're dropping references. 7) Add a commit_lock to the transaction to make sure multiple people trying to commit the same transaction don't race and commit at the same time. 8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl trans. I have tested this with xfstests, but obviously it is a pretty hairy change so lots of testing is greatly appreciated. 
Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:00:57 -04:00
Josef Bacik	7a7eaa40a3	Btrfs: take away the num_items argument from btrfs_join_transaction I keep forgetting that btrfs_join_transaction() just ignores the num_items argument, which leads me to sending pointless patches and looking stupid :). So just kill the num_items argument from btrfs_join_transaction and btrfs_start_ioctl_transaction, since neither of them use it. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:00:56 -04:00
Chris Mason	712673339a	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers Conflicts: fs/btrfs/Makefile fs/btrfs/ctree.h fs/btrfs/volumes.h Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 06:30:52 -04:00
Chris Mason	945d8962ce	Merge branch 'cleanups' of git://repo.or.cz/linux-2.6/btrfs-unstable into inode_numbers Conflicts: fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/tree-log.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-22 12:33:42 -04:00
Chris Mason	dcc6d07322	Merge branch 'delayed_inode' into inode_numbers Conflicts: fs/btrfs/inode.c fs/btrfs/ioctl.c fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-22 07:07:01 -04:00
Miao Xie	16cdcec736	btrfs: implement delayed inode items operation Changelog V5 -> V6: - Fix oom when the memory load is high, by storing the delayed nodes into the root's radix tree, and letting btrfs inodes go. Changelog V4 -> V5: - Fix the race on adding the delayed node to the inode, which is spotted by Chris Mason. - Merge Chris Mason's incremental patch into this patch. - Fix deadlock between readdir() and memory fault, which is reported by Itaru Kitayama. Changelog V3 -> V4: - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache inode in time. Changelog V2 -> V3: - Fix the race between the delayed worker and the task which does delayed items balance, which is reported by Tsutomu Itoh. - Modify the patch address David Sterba's comment. - Fix the bug of the cpu recursion spinlock, reported by Chris Mason Changelog V1 -> V2: - break up the global rb-tree, use a list to manage the delayed nodes, which is created for every directory and file, and used to manage the delayed directory name index items and the delayed inode item. - introduce a worker to deal with the delayed nodes. Compare with Ext3/4, the performance of file creation and deletion on btrfs is very poor. the reason is that btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on. If we can do some delayed b+ tree insertion or deletion, we can improve the performance, so we made this patch which implemented delayed directory name index insertion/deletion and delayed inode update. Implementation: - introduce a delayed root object into the filesystem, that use two lists to manage the delayed nodes which are created for every file/directory. One is used to manage all the delayed nodes that have delayed items. And the other is used to manage the delayed nodes which is waiting to be dealt with by the work thread. 
- Every delayed node has two rb-tree, one is used to manage the directory name index which is going to be inserted into b+ tree, and the other is used to manage the directory name index which is going to be deleted from b+ tree. - introduce a worker to deal with the delayed operation. This worker is used to deal with the works of the delayed directory name index items insertion and deletion and the delayed inode update. When the delayed items is beyond the lower limit, we create works for some delayed nodes and insert them into the work queue of the worker, and then go back. When the delayed items is beyond the upper bound, we create works for all the delayed nodes that haven't been dealt with, and insert them into the work queue of the worker, and then wait for that the untreated items is below some threshold value. - When we want to insert a directory name index into b+ tree, we just add the information into the delayed inserting rb-tree. And then we check the number of the delayed items and do delayed items balance. (The balance policy is above.) - When we want to delete a directory name index from the b+ tree, we search it in the inserting rb-tree at first. If we look it up, just drop it. If not, add the key of it into the delayed deleting rb-tree. Similar to the delayed inserting rb-tree, we also check the number of the delayed items and do delayed items balance. (The same to inserting manipulation) - When we want to update the metadata of some inode, we cached the data of the inode into the delayed node. the worker will flush it into the b+ tree after dealing with the delayed insertion and deletion. - We will move the delayed node to the tail of the list after we access the delayed node, By this way, we can cache more delayed items and merge more inode updates. - If we want to commit transaction, we will deal with all the delayed node. - the delayed node will be freed when we free the btrfs inode. 
- Before we log the inode items, we commit all the directory name index items and the delayed inode update. I did a quick test by the benchmark tool[1] and found we can improve the performance of file creation by ~15%, and file deletion by ~20%. Before applying this patch: Create files: Total files: 50000 Total time: 1.096108 Average time: 0.000022 Delete files: Total files: 50000 Total time: 1.510403 Average time: 0.000030 After applying this patch: Create files: Total files: 50000 Total time: 0.932899 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.215732 Average time: 0.000024 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 Many thanks for Kitayama-san's help! Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: David Sterba <dave@jikos.cz> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Tested-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-21 09:30:56 -04:00
Chris Mason	0965537308	Merge branch 'ino-alloc' of git://repo.or.cz/linux-btrfs-devel into inode_numbers Conflicts: fs/btrfs/free-space-cache.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-21 09:27:38 -04:00
Linus Torvalds	eed631e0d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix FS_IOC_SETFLAGS ioctl Btrfs: fix FS_IOC_GETFLAGS ioctl fs: remove FS_COW_FL Btrfs: fix easily get into ENOSPC in mixed case Prevent oopsing in posix_acl_valid()	2011-05-15 10:22:10 -07:00
Li Zefan	ebcb904dfe	Btrfs: fix FS_IOC_SETFLAGS ioctl Steps to reproduce the bug: - Call FS_IOC_SETLFAGS ioctl with flags=FS_COMPR_FL - Call FS_IOC_SETFLAGS ioctl with flags=0 - Call FS_IOC_GETFLAGS ioctl, and you'll see FS_COMPR_FL is still set! Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-14 16:10:28 -04:00
Li Zefan	d0092bdda8	Btrfs: fix FS_IOC_GETFLAGS ioctl As we've added per file compression/cow support. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-14 16:10:27 -04:00
Li Zefan	e1e8fb6a1f	fs: remove FS_COW_FL FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file COW in btrfs, but FS_NOCOW_FL is sufficient. The fact is we don't have corresponding BTRFS_INODE_COW flag. COW is default, and FS_NOCOW_FL can be used to switch off COW for a single file. If we mount btrfs with nodatacow, a newly created file will be set with the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the FS_NOCOW_FL flag. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-14 16:10:26 -04:00
Arne Jansen	8628764e1a	btrfs: add readonly flag setting the readonly flag prevents writes in case an error is detected Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-05-12 14:48:31 +02:00
Jan Schmidt	475f63874d	btrfs: new ioctls for scrub adds ioctls necessary to start and cancel scrubs, to get current progress and to get info about devices to be scrubbed. Note that the scrub is done per-device and that the ioctl only returns after the scrub for this devices is finished or has been canceled. Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-05-12 14:45:38 +02:00
David Sterba	b3b4aa74b5	btrfs: drop unused parameter from btrfs_release_path parameter tree root it's not used since commit `5f39d397df` ("Btrfs: Create extent_buffer interface for large blocksizes") Signed-off-by: David Sterba <dsterba@suse.cz>	2011-05-02 13:57:22 +02:00


393 Commits (7cd875d2b7bc61abdad75209b1ab63301a78f99c)