Commit graph

109 commits

Author SHA1 Message Date
Mark Fasheh 82bd101b52 btrfs: qgroup: account shared subtree during snapshot delete
Commit 0ed4792 ('btrfs: qgroup: Switch to new extent-oriented qgroup
mechanism.') removed our qgroup accounting during
btrfs_drop_snapshot(). Predictably, this results in qgroup numbers
going bad shortly after a snapshot is removed.

Fix this by adding a dirty extent record when we encounter extents during
our shared subtree walk. This effectively restores the functionality we had
with the original shared subtree walking code in 1152651 (btrfs: qgroup:
account shared subtrees during snapshot delete).

The idea with the original patch (and this one) is that shared subtrees can
get skipped during drop_snapshot. The shared subtree walk then allows us a
chance to visit those extents and add them to the qgroup work for later
processing. This ultimately makes the accounting for drop snapshot work.

The new qgroup code nicely handles all the other extents during the tree
walk via the ref dec/inc functions so we don't have to add actions beyond
what we had originally.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:27:33 -08:00
Justin Maggard 967ef5131e btrfs: qgroup: fix quota disable during rescan
There's a race condition that leads to a NULL pointer dereference if you
disable quotas while a quota rescan is running.  To fix this, we just need
to wait for the quota rescan worker to actually exit before tearing down
the quota structures.

Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:22:08 -08:00
Filipe Manana 3b2ba7b31d Btrfs: fix sleeping inside atomic context in qgroup rescan worker
We are holding a btree path with spinning locks and then we attempt to
clone an extent buffer, which calls kmem_cache_alloc() and this function
can sleep, causing the following trace to be reported on a debug kernel:

[107118.218536] BUG: sleeping function called from invalid context at mm/slab.c:2871
[107118.224110] in_atomic(): 1, irqs_disabled(): 0, pid: 19148, name: kworker/u32:3
[107118.226120] INFO: lockdep is turned off.
[107118.226843] Preemption disabled at:[<ffffffffa05ffa22>] btrfs_clear_lock_blocking_rw+0x96/0xea [btrfs]

[107118.229175] CPU: 3 PID: 19148 Comm: kworker/u32:3 Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[107118.231326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[107118.233687] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
[107118.236835]  0000000000000000 ffff880424bf3b78 ffffffff812566f4 0000000000000000
[107118.238369]  ffff880424bf3ba0 ffffffff81070664 ffffffff817f1cd5 0000000000000b37
[107118.239769]  0000000000000000 ffff880424bf3bc8 ffffffff8107070a 0000000000008850
[107118.241244] Call Trace:
[107118.241729]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
[107118.242602]  [<ffffffff81070664>] ___might_sleep+0x23a/0x241
[107118.243586]  [<ffffffff8107070a>] __might_sleep+0x9f/0xa6
[107118.244532]  [<ffffffff8115af70>] cache_alloc_debugcheck_before+0x25/0x36
[107118.245939]  [<ffffffff8115d52b>] kmem_cache_alloc+0x50/0x215
[107118.246930]  [<ffffffffa05e627e>] __alloc_extent_buffer+0x2a/0x11f [btrfs]
[107118.248121]  [<ffffffffa05ecb1a>] btrfs_clone_extent_buffer+0x3d/0xdd [btrfs]
[107118.249451]  [<ffffffffa06239ea>] btrfs_qgroup_rescan_worker+0x16d/0x434 [btrfs]
[107118.250755]  [<ffffffff81087481>] ? arch_local_irq_save+0x9/0xc
[107118.251754]  [<ffffffffa05f7952>] normal_work_helper+0x14c/0x32a [btrfs]
[107118.252899]  [<ffffffffa05f7952>] ? normal_work_helper+0x14c/0x32a [btrfs]
[107118.254195]  [<ffffffffa05f7c82>] btrfs_qgroup_rescan_helper+0x12/0x14 [btrfs]
[107118.255436]  [<ffffffff81063b23>] process_one_work+0x24a/0x4ac
[107118.263690]  [<ffffffff81064285>] worker_thread+0x206/0x2c2
[107118.264888]  [<ffffffff8106407f>] ? rescuer_thread+0x2cb/0x2cb
[107118.267413]  [<ffffffff8106904d>] kthread+0xef/0xf7
[107118.268417]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24
[107118.269505]  [<ffffffff8147d10f>] ret_from_fork+0x3f/0x70
[107118.270491]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24

So just use blocking locks for our path to solve this.
This fixes the patch titled:
  "btrfs: qgroup: Don't copy extent buffer to do qgroup rescan"

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-11-05 11:02:22 +00:00
Filipe Manana 190631f1c8 Btrfs: fix race waiting for qgroup rescan worker
We were initializing the completion (fs_info->qgroup_rescan_completion)
object after releasing the qgroup rescan lock, which gives a small time
window for a rescan waiter to not actually wait for the rescan worker
to finish. Example:

         CPU 1                                                     CPU 2

 fs_info->qgroup_rescan_completion->done is 0

 btrfs_qgroup_rescan_worker()
   complete_all(&fs_info->qgroup_rescan_completion)
     sets fs_info->qgroup_rescan_completion->done
     to UINT_MAX / 2

 ... do some other stuff ....

 qgroup_rescan_init()
   mutex_lock(&fs_info->qgroup_rescan_lock)
   set flag BTRFS_QGROUP_STATUS_FLAG_RESCAN
     in fs_info->qgroup_flags
   mutex_unlock(&fs_info->qgroup_rescan_lock)

                                                       btrfs_qgroup_wait_for_completion()
                                                         mutex_lock(&fs_info->qgroup_rescan_lock)
                                                         sees flag BTRFS_QGROUP_STATUS_FLAG_RESCAN
                                                           in fs_info->qgroup_flags
                                                         mutex_unlock(&fs_info->qgroup_rescan_lock)

                                                         wait_for_completion_interruptible(
                                                           &fs_info->qgroup_rescan_completion)

                                                           fs_info->qgroup_rescan_completion->done
                                                           is > 0 so it returns immediately

  init_completion(&fs_info->qgroup_rescan_completion)
    sets fs_info->qgroup_rescan_completion->done to 0

So fix this by initializing the completion object while holding the mutex
fs_info->qgroup_rescan_lock.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2015-11-05 10:32:21 +00:00
Justin Maggard 7343dd61fd btrfs: qgroup: exit the rescan worker during umount
I was hitting a consistent NULL pointer dereference during shutdown that
showed the trace running through end_workqueue_bio().  I traced it back to
the endio_meta_workers workqueue being poked after it had already been
destroyed.

Eventually I found that the root cause was a qgroup rescan that was still
in progress while we were stopping all the btrfs workers.

Currently we explicitly pause balance and scrub operations in
close_ctree(), but we do nothing to stop the qgroup rescan.  We should
probably be doing the same for qgroup rescan, but that's a much larger
change.  This small change is good enough to allow me to unmount without
crashing.

Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
2015-11-05 10:32:20 +00:00
Qu Wenruo 90ce321da8 btrfs: qgroup: Fix a rebase bug which will cause qgroup double free
When rebasing my patchset, I forgot to pick up a cleanup patch to remove
old hotfix in 4.2 release.

Witouth the cleanup, it will screw up new qgroup reserve framework and
always cause minus reserved number.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-26 19:44:39 -07:00
Qu Wenruo 0a0e8b8938 btrfs: qgroup: Don't copy extent buffer to do qgroup rescan
Ancient qgroup code call memcpy() on a extent buffer and use it for leaf
iteration.

As extent buffer contains lock, pointers to pages, it's never sane to do
such copy.

The following bug may be caused by this insane operation:
[92098.841309] general protection fault: 0000 [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
4.3.0-rc1 #1
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
[btrfs]
[92098.842261] Call Trace:
[92098.842277]  [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
[btrfs]
[92098.842304]  [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
[btrfs]
[92098.842329]  [<ffffffffc039af3d>]
btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]

Where btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(),
called in reading key from the copied extent_buffer.

This patch will use btrfs_clone_extent_buffer() to a better copy of
extent buffer to deal such case.

Reported-by: Stephane Lesimple <stephane_btrfs@lesimple.fr>
Suggested-by: Filipe Manana <fdmanana@kernel.org>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-26 19:42:30 -07:00
Qu Wenruo 56fa9d0762 btrfs: qgroup: Check if qgroup reserved space leaked
Add check at btrfs_destroy_inode() time to detect qgroup reserved space
leak.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-21 18:41:10 -07:00
Qu Wenruo 81fb6f77a0 btrfs: qgroup: Add new trace point for qgroup data reserve
Now each qgroup reserve for data will has its ftrace event for better
debugging.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-21 18:41:08 -07:00
Qu Wenruo 7cf5b97650 btrfs: qgroup: Cleanup old inaccurate facilities
Cleanup the old facilities which use old btrfs_qgroup_reserve() function
call, replace them with the newer version, and remove the "__" prefix in
them.

Also, make btrfs_qgroup_reserve/free() functions private, as they are
now only used inside qgroup codes.

Now, the whole btrfs qgroup is swithed to use the new reserve facilities.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-21 18:41:06 -07:00
Qu Wenruo 55eeaf0578 btrfs: qgroup: Introduce new functions to reserve/free metadata
Introduce new functions btrfs_qgroup_reserve/free_meta() to reserve/free
metadata reserved space.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-21 18:37:47 -07:00
Qu Wenruo 297d750b9f btrfs: delayed_ref: release and free qgroup reserved at proper timing
Qgroup reserved space needs to be released from inode dirty map and get
freed at different timing:

1) Release when the metadata is written into tree
After corresponding metadata is written into tree, any newer write will
be COWed(don't include NOCOW case yet).
So we must release its range from inode dirty range map, or we will
forget to reserve needed range, causing accounting exceeding the limit.

2) Free reserved bytes when delayed ref is run
When delayed refs are run, qgroup accounting will follow soon and turn
the reserved bytes into rfer/excl numbers.
As run_delayed_refs and qgroup accounting are all done at
commit_transaction() time, we are safe to free reserved space in
run_delayed_ref time().

With these timing to release/free reserved space, we should be able to
resolve the long existing qgroup reserve space leak problem.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-21 18:37:47 -07:00
Qu Wenruo f695fdcef8 btrfs: qgroup: Introduce functions to release/free qgroup reserve data
space

Introduce functions btrfs_qgroup_release/free_data() to release/free
reserved data range.

Release means, just remove the data range from io_tree, but doesn't
free the reserved space.
This is for normal buffered write case, when data is written into disc
and its metadata is added into tree, its reserved space should still be
kept until commit_trans().
So in that case, we only release dirty range, but keep the reserved
space recorded some other place until commit_tran().

Free means not only remove data range, but also free reserved space.
This is used for case for cleanup and invalidate page.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-21 18:37:46 -07:00
Qu Wenruo 5247255370 btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function
Introduce a new function, btrfs_qgroup_reserve_data(), which will use
io_tree to accurate qgroup reserve, to avoid reserved space leaking.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-21 18:37:45 -07:00
Linus Torvalds 089b669506 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree for 4.3 (kerneldoc updates, printk()
  fixes, Documentation and MAINTAINERS updates)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
  MAINTAINERS: update my e-mail address
  mod_devicetable: add space before */
  scsi: a100u2w: trivial typo in printk
  i2c: Fix typo in i2c-bfin-twi.c
  treewide: fix typos in comment blocks
  Doc: fix trivial typo in SubmittingPatches
  proportions: Spelling s/consitent/consistent/
  dm: Spelling s/consitent/consistent/
  aic7xxx: Fix typo in error message
  pcmcia: Fix typo in locking documentation
  scsi/arcmsr: Fix typos in error log
  drm/nouveau/gr: Fix typo in nv10.c
  [SCSI] Fix printk typos in drivers/scsi
  staging: comedi: Grammar s/Enable support a/Enable support for a/
  Btrfs: Spelling s/consitent/consistent/
  README: GTK+ is a acronym
  ASoC: omap: Fix typo in config option description
  mm: tlb.c: Fix error message
  ntfs: super.c: Fix error log
  fix typo in Documentation/SubmittingPatches
  ...
2015-09-01 18:46:42 -07:00
Geert Uytterhoeven d41e36a0ab Btrfs: Spelling s/consitent/consistent/
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2015-08-07 14:13:21 +02:00
Qu Wenruo c05f9429e1 btrfs: qgroup: Fix a regression in qgroup reserved space.
During the change to new btrfs extent-oriented qgroup implement, due to
it doesn't use the old __qgroup_excl_accounting() for exclusive extent,
it didn't free the reserved bytes.

The bug will cause limit function go crazy as the reserved space is
never freed, increasing limit will have no effect and still cause
EQOUT.

The fix is easy, just free reserved bytes for newly created exclusive
extent as what it does before.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Yang Dongsheng <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-08-06 14:51:15 -07:00
Yang Dongsheng fe7599079b btrfs: qgroup: allow user to clear the limitation on qgroup
Currently, we can only set a limitation on a qgroup, but we
can not clear it.

This patch provide a choice to user to clear a limitation on
qgroup by passing a value of CLEAR_VALUE(-1) to kernel.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-30 13:20:00 -07:00
Qu Wenruo 9086db86e0 btrfs: qgroup: Add the ability to skip given qgroup for old/new_roots.
This is used by later qgroup fix patches for snapshot.

As current snapshot accounting is done by btrfs_qgroup_inherit(), but
new extent oriented quota mechanism will account extent from
btrfs_copy_root() and other snapshot things, causing wrong result.

So add this ability to handle snapshot accounting.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:26:23 -07:00
Qu Wenruo e69bcee376 btrfs: qgroup: Cleanup the old ref_node-oriented mechanism.
Goodbye, the old mechanisim.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:26:11 -07:00
Qu Wenruo 442244c963 btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.
Since the self test transaction don't have delayed_ref_roots, so use
find_all_roots() and export btrfs_qgroup_account_extent() to simulate it

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:26:05 -07:00
Qu Wenruo 9d220c95f5 btrfs: qgroup: Switch rescan to new mechanism.
Switch rescan to use the new new extent oriented mechanism.

As rescan is also based on extent, new mechanism is just a perfect match
for rescan.

With re-designed internal functions, rescan is quite easy, just call
btrfs_find_all_roots() and then btrfs_qgroup_account_one_extent().

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:25:54 -07:00
Qu Wenruo 550d7a2ed5 btrfs: qgroup: Add new qgroup calculation function
btrfs_qgroup_account_extents().

The new btrfs_qgroup_account_extents() function should be called in
btrfs_commit_transaction() and it will update all the qgroup according
to delayed_ref_root->dirty_extent_root.

The new function can handle both normal operation during
commit_transaction() or in rescan in a unified method with clearer
logic.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:25:49 -07:00
Qu Wenruo 3b7d00f99c btrfs: qgroup: Add new function to record old_roots.
Add function btrfs_qgroup_prepare_account_extents() to get old_roots
which are needed for qgroup.

We do it in commit_transaction() and before switch_roots(), and only
search commit_root, so it gives a quite accurate view for previous
transaction.

With old_roots from previous transaction, we can use it to do accurate
account with current transaction.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:25:39 -07:00
Qu Wenruo 3368d001ba btrfs: qgroup: Record possible quota-related extent for qgroup.
Add hook in add_delayed_ref_head() to record quota-related extent record
into delayed_ref_root->dirty_extent_record rb-tree for later qgroup
accounting.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:25:32 -07:00
Qu Wenruo 823ae5b8e3 btrfs: qgroup: Add function qgroup_update_counters().
Add function qgroup_update_counters(), which will update related
qgroups' rfer/excl according to old/new_roots.

This is one of the two core functions for the new qgroup implement.

This is based on btrfs_adjust_coutners() but with clearer logic and
comment.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:25:28 -07:00
Qu Wenruo d810ef2be5 btrfs: qgroup: Add function qgroup_update_refcnt().
This function is used to update refcnt for qgroups.
And is one of the two core functions used in the new qgroup implement.

This is based on the old update_old/new_refcnt, but provides a unified
logic and behavior.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:25:24 -07:00
Qu Wenruo 9c542136fd btrfs: qgroup: Cleanup open-coded old/new_refcnt update and read.
Use inline functions to do such things, to improve readability.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Acked-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-10 09:25:13 -07:00
Christian Engelmayer ab3680dd18 btrfs: qgroup: Fix possible leak in btrfs_add_qgroup_relation()
Commit 9c8b35b1ba ("btrfs: quota: Automatically update related qgroups or
mark INCONSISTENT flags when assigning/deleting a qgroup relations.")
introduced the allocation of a temporary ulist in function
btrfs_add_qgroup_relation() and added the corresponding cleanup to the out
path. However, the allocation was introduced before the src/dst level check
that directly returns. Fix the possible leakage of the ulist by moving the
allocation after the input validation. Detected by Coverity CID 1295988.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-06-02 19:34:35 -07:00
Qu Wenruo 9c8b35b1ba btrfs: quota: Automatically update related qgroups or mark INCONSISTENT flags when assigning/deleting a qgroup relations.
Operation like qgroups assigning/deleting qgroup relations will mostly
cause qgroup data inconsistent, since it needs to do the full rescan to
determine whether shared extents are exclusive or still shared in
parent qgroups.

But there are some exceptions, like qgroup with only exclusive extents
(qgroup->excl == qgroup->rfer), in that case, we only needs to
modify all its parents' excl and rfer.

So this patch adds a quick path for such qgroup in qgroup
assign/remove routine, and if quick path failed, the qgroup status will
be marked INCONSISTENT, and return 1 to info user-land.

BTW since the quick path is much the same of qgroup_excl_accounting(),
so move the core of it to __qgroup_excl_accounting() and reuse it.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:59 -07:00
Dongsheng Yang 8ea0ec9e01 btrfs: qgroup: clear STATUS_FLAG_ON in disabling quota.
we forgot to clear STATUS_FLAG_ON in quota_disable(), it
will cause a problem shown as below:

	# mount /dev/sdc /mnt
	# btrfs quota enable /mnt
	# btrfs quota disable /mnt
	# btrfs quota rescan /mnt
	quota rescan started <--- expecting it fail here.
	# echo $?
	0

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:58 -07:00
Qu Wenruo 53b7cde9d5 btrfs: Update btrfs qgroup status item when rescan is done.
Update qgroup status when rescan is done.

Before this patch, status item is not updated on rescan finish, which
causing the RESCAN and INCONSISTENT flags never cleared.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:57 -07:00
Qu Wenruo 3393168d22 btrfs: qgroup: Fix dead judgement on qgroup_rescan_leaf() return value.
Old qgroup_rescan_leaf() comment indicates ret == 2 as complete and
cleared INCONSISTENT flag.

This is not true since it will never return 2, and inside it no codes
will clear INCONSISTENT flag.
The flag clearance is done in btrfs_qgroup_rescan_work().
This caused the bug that INCONSISTENT flag is never cleared.

So change the comment and fix the dead judgment.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:55 -07:00
Qu Wenruo 8465ecec96 btrfs: Check qgroup level in kernel qgroup assign.
Although we have qgroup level check in btrfs-progs, it's not enough
since other programe may still call ioctl directly not using
btrfs-progs. For example, systemd.

But it's btrfs-progs to be blame since we don't provide a
full-function(like subvolume create things) btrfs library with enough
check, and only rely on kernel ioctl.

So Add level checks in kernel too.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:53 -07:00
Dongsheng Yang f5a6b1c53b btrfs: qgroup: allow to remove qgroup which has parent but no child.
When a qgroup has parents but no child, it should be removable in
Theory I think. But currently, we can not remove it when it has
either parent or child.

Example:
	# btrfs quota enable /mnt
	# btrfs qgroup create 1/0 /mnt
	# btrfs qgroup create 2/0 /mnt
	# btrfs qgroup assign 1/0 2/0 /mnt
	# btrfs qgroup show -pcre /mnt
qgroupid rfer  excl  max_rfer max_excl parent  child
-------- ----  ----  -------- -------- ------  -----
0/5      16384 16384 0        0        ---     ---
1/0      0     0     0        0        2/0     ---
2/0      0     0     0        0        ---     1/0

At this time, there is no subvol or qgroup depending on it.
Just a qgroup 2/0 is its parent, but 2/0 can work well without
1/0. So I think 1/0 should be removalbe. But:
	# btrfs qgroup destroy 1/0 /mnt
ERROR: unable to destroy quota group: Device or resource busy

This patch remove the check of qgroup->parent in removing it,
then we can remove a qgroup when it has a parent.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:52 -07:00
Dongsheng Yang 09870d2772 btrfs: qgroup: return EINVAL if level of parent is not higher than child's.
When we create a subvol inheriting a qgroup, we need to check the level
of them. Otherwise, there is a chance a qgroup can inherit another qgroup
at the same level.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:51 -07:00
Dongsheng Yang e2d1f92399 btrfs: qgroup: do a reservation in a higher level.
There are two problems in qgroup:

a). The PAGE_CACHE is 4K, even when we are writing a data of 1K,
qgroup will reserve a 4K size. It will cause the last 3K in a qgroup
is not available to user.

b). When user is writing a inline data, qgroup will not reserve it,
it means this is a window we can exceed the limit of a qgroup.

The main idea of this patch is reserving the data size of write_bytes
rather than the reserve_bytes. It means qgroup will not care about
the data size btrfs will reserve for user, but only care about the
data size user is going to write. Then reserve it when user want to
write and release it in transaction committed.

In this way, qgroup can be released from the complex procedure in
btrfs and only do the reserve when user want to write and account
when the data is written in commit_transaction().

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:50 -07:00
Dongsheng Yang 31193213f1 Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
Currently, for pre_alloc or delay_alloc, the bytes will be accounted
in space_info by the three guys.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
But on the other hand, in qgroup, there are only two counters to account the
bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
bytes in space_info->bytes_may_use and qg->excl accounts bytes in
space_info->used. So the bytes in space_info->reserved is not accounted
in qgroup. If so, there is a window we can exceed the quota limit when
bytes is in space_info->reserved.

Example:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer     excl     max_rfer max_excl parent  child
-------- ----     ----     -------- -------- ------  -----
0/5      20987904 20987904 0        10485760 ---     ---

qg->excl is 20987904 larger than max_excl 10485760.

This patch introduce a new counter named may_use to qgroup, then
there are three counters in qgroup to account bytes in space_info
as below.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
qgroup->may_use           --- qgroup->reserved     --- qgroup->excl

With this patch applied:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer    excl    max_rfer max_excl parent  child
-------- ----    ----    -------- -------- ------  -----
0/5      9453568 9453568 0        10485760 ---     ---

Reported-by: Cyril SCETBON <cyril.scetbon@free.fr>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:47 -07:00
Dongsheng Yang 4087cf24ae Btrfs: qgroup: cleanup, remove an unsued parameter in btrfs_create_qgroup().
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:44 -07:00
Dongsheng Yang 03477d945f btrfs: qgroup: fix limit args override whole limit struct
btrfs_limit_group use arg limit to override the old qgroup_limit of
corresponding qgroup. However, we should override part of old qgroup_limit
according to the bit which has been set in arg limit.

Signed-off-by: Fan Chengniang <fancn.fnst@cn.fujitsu.com>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:43 -07:00
Dongsheng Yang d3001ed3a8 btrfs: qgroup: update limit info in function btrfs_run_qgroups().
When we commit_transaction(), qgroups in btree should be updated.
But, limit info is not considered currently. It will cause a problem
when a qgroup of a snapshot inherit the limit info from srcqgroup,
then there is an inconsistency.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:42 -07:00
Dongsheng Yang 1510e71c62 btrfs: qgroup: consolidate the parameter of fucntion update_qgroup_limit_item().
Cleanup: Change the parameter of update_qgroup_limit_item() to the family of
update_qgroup_xxx_item().

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:41 -07:00
Dongsheng Yang e8c8541ac3 btrfs: qgroup: update qgroup in memory at the same time when we update it in btree.
When we call btrfs_qgroup_inherit() with BTRFS_QGROUP_INHERIT_SET_LIMITS,
btrfs will update the limit info of qgroup in btree but forget to update
the qgroup in rbtree at the same time. It obviousely will cause an inconsistency.

This patch fix it by updating the rbtree at the same time.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:40 -07:00
Dongsheng Yang 3eeb4d597e btrfs: qgroup: inherit limit info from srcgroup in creating snapshot.
Currently, when we snapshot a subvol, snapshot will not copy the limits
from srcqgroup.

This patch make the qgroup in snapshot inherit the limit info when create
a snapshot.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:38 -07:00
Filipe Manana bf69196045 Btrfs: change the insertion criteria for the qgroup operations rbtree
After looking at Liu Bo's recent patch (titled
"Btrfs: fix comp_oper to get right order") I realized the search made by
qgroup_oper_exists() was buggy because its rbtree navigation comparison
function, comp_oper_exist(), only looks at the fields bytenr and ref_root
of a tree node, ignoring the seq field completely. This was wrong because
when we insert a node into the rbtree we use comp_oper(), which takes a
decision based first on bytenr, then on seq and then on the ref_root field.
That means qgroup_oper_exists() could miss the fact that at least one
operation with given bytenr and ref_root exists.

Consider the following simple example of a 3 nodes qgroup operations
rbtree (created using comp_oper before this patch), where each node's key
is a tuple with the shape (bytenr, seq, ref_root, op):

                          [ (4096, 2, 20, op X) ]
                         /                       \
                        /                         \
   [ (4096, 1, 5, op Y) ]                         [ (4096, 3, 10, op Z) ]

qgroup_oper_exists() when called to search for an existing operation for
bytenr 4096 and ref root 10 wouldn't find anything because it would go to
the left subtree instead of the right subtree, since comp_oper_exits()
ignores the seq field completely.

Fix this by changing the insertion navigation function to use the ref_root
field right after using the bytenr field and before using the seq field,
so that qgroup_oper_exists() / comp_oper_exist() work as expected.

This patch applies on top of the patch mentioned above from Liu.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:55:52 -07:00
Chris Mason fc4c3c872f Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1
Signed-off-by: Chris Mason <clm@fb.com>

Conflicts:
	fs/btrfs/disk-io.c
2015-03-25 10:52:48 -07:00
Chris Mason 9deed229fa Merge branch 'cleanups-for-4.1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:43:16 -07:00
Liu Bo 48da5f0a4c Btrfs: fix comp_oper to get right order
Case (oper1->seq > oper2->seq) should differ with case (oper1->seq < oper2->seq).

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-13 13:46:59 -07:00
David Sterba 3284da7b7b btrfs: use explicit initializer for seq_elem
Using {} as initializer for struct seq_elem does not properly initialize
the list_head member, but it currently works because it gets set through
btrfs_get_tree_mod_seq if 'seq' is 0.

Signed-off-by: David Sterba <dsterba@suse.cz>
2015-03-03 17:23:59 +01:00
Daniel Dressler 01d58472a8 Btrfs: disk-io: replace root args iff only fs_info used
This is the 3rd independent patch of a larger project to cleanup btrfs's
internal usage of btrfs_root. Many functions take btrfs_root only to
grab the fs_info struct.

By requiring a root these functions cause programmer overhead. That
these functions can accept any valid root is not obvious until
inspection.

This patch reduces the specificity of such functions to accept the
fs_info directly.

These patches can be applied independently and thus are not being
submitted as a patch series. There should be about 26 patches by the
project's completion. Each patch will cleanup between 1 and 34 functions
apiece.  Each patch covers a single file's functions.

This patch affects the following function(s):
  1) csum_tree_block
  2) csum_dirty_buffer
  3) check_tree_block_fsid
  4) btrfs_find_tree_block
  5) clean_tree_block

Signed-off-by: Daniel Dressler <danieru.dressler@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16 18:48:43 +01:00