Commit graph

201 commits

Author SHA1 Message Date
Linus Torvalds e213e26ab3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
  quota: stop using QUOTA_OK / NO_QUOTA
  dquot: cleanup dquot initialize routine
  dquot: move dquot initialization responsibility into the filesystem
  dquot: cleanup dquot drop routine
  dquot: move dquot drop responsibility into the filesystem
  dquot: cleanup dquot transfer routine
  dquot: move dquot transfer responsibility into the filesystem
  dquot: cleanup inode allocation / freeing routines
  dquot: cleanup space allocation / freeing routines
  ext3: add writepage sanity checks
  ext3: Truncate allocated blocks if direct IO write fails to update i_size
  quota: Properly invalidate caches even for filesystems with blocksize < pagesize
  quota: generalize quota transfer interface
  quota: sb_quota state flags cleanup
  jbd: Delay discarding buffers in journal_unmap_buffer
  ext3: quota_write cross block boundary behaviour
  quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
  quota: split out compat_sys_quotactl support from quota.c
  quota: split out netlink notification support from quota.c
  quota: remove invalid optimization from quota_sync_all
  ...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
2010-03-05 13:20:53 -08:00
Christoph Hellwig 5dd4056db8 dquot: cleanup space allocation / freeing routines
Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs their own (which none currently does)
it can just call into it's own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not.  Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:28 +01:00
Tiger Yang b89c54282d ocfs2: add extent block stealing for ocfs2 v5
This patch add extent block (metadata) stealing mechanism for
extent allocation. This mechanism is same as the inode stealing.
if no room in slot specific extent_alloc, we will try to
allocate extent block from the next slot.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:07 -08:00
Linus Torvalds 849254e9bb Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Set i_nlink properly during reflink.
  ocfs2: Add reflinked file's inode to inode hash eariler.
  ocfs2: refcounttree.c cleanup.
  ocfs2: Find proper end cpos for a leaf refcount block.
2009-12-23 13:33:07 -08:00
Christoph Hellwig 2cfd30adf6 ocfs: stop using do_sync_mapping_range
do_sync_mapping_range(..., SYNC_FILE_RANGE_WRITE) is a very awkward way
to perform a filemap_fdatawrite_range.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:49 -05:00
André Goddard Rosa af901ca181 tree-wide: fix assorted typos all over the place
That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 15:39:55 +01:00
Tao Ma 38a04e4327 ocfs2: Find proper end cpos for a leaf refcount block.
ocfs2 refcount tree is stored as an extent tree while
the leaf ocfs2_refcount_rec points to a refcount block.

The following step can trip a kernel panic.
mkfs.ocfs2 -b 512 -C 1M --fs-features=refcount $DEVICE
mount -t ocfs2 $DEVICE $MNT_DIR
FILE_NAME=$RANDOM
FILE_NAME_1=$RANDOM
FILE_REF="${FILE_NAME}_ref"
FILE_REF_1="${FILE_NAME}_ref_1"
for((i=0;i<305;i++))
do
# /mnt/1048576 is a file with 1048576 sizes.
cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
done
for((i=0;i<3;i++))
do
cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
done

for((i=0;i<2;i++))
do
cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
done

cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME

for((i=0;i<11;i++))
do
cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
done
reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF
# write_f is a program which will write some bytes to a file at offset.
# write_f -f file_name -l offset -w write_bytes.
./write_f -f $MNT_DIR/$FILE_REF -l $[310*1048576] -w 4096
./write_f -f $MNT_DIR/$FILE_REF -l $[306*1048576] -w 4096
./write_f -f $MNT_DIR/$FILE_REF -l $[311*1048576] -w 4096
./write_f -f $MNT_DIR/$FILE_NAME -l $[310*1048576] -w 4096
./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF_1
./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
#kernel panic here.

The reason is that if the ocfs2_extent_rec is the last record
in a leaf extent block, the old solution fails to find the
suitable end cpos. So this patch try to walk through the b-tree,
find the next sub root and get the c_pos the next sub-tree starts
from.

btw, I have runned tristan's test case against the patched kernel
for several days and this type of kernel panic never happens again.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-02 16:14:57 -08:00
Tao Ma c18b812d12 ocfs2: Make transaction extend more efficient.
In ocfs2_extend_rotate_transaction, op_credits is the orignal
credits in the handle and we only want to extend the credits
for the rotation, but the old solution always double it. It
is harmless for some minor operations, but for actions like
reflink we may rotate tree many times and cause the credits
increase dramatically. So this patch try to only increase
the desired credits.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:46 -07:00
Tao Ma 6ae23c5555 ocfs2: CoW refcount tree improvement.
During CoW, if the old extent record is refcounted, we allocate
som new clusters and do CoW. Actually we can have some improvement
here. If the old extent has refcount=1, that means now it is only
used by this file. So we don't need to allocate new clusters, just
remove the refcounted flag and it is OK. We also have to remove
it from the refcount tree while not deleting it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:36 -07:00
Tao Ma 6f70fa5199 ocfs2: Add CoW support.
This patch try CoW support for a refcounted record.

the whole process will be:
1. Calculate how many clusters we need to CoW and where we start.
   Extents that are not completely encompassed by the write will
   be broken on 1MB boundaries.
2. Do CoW for the clusters with the help of page cache.
3. Change the b-tree structure with the new allocated clusters.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:36 -07:00
Tao Ma bcbbb24a6a ocfs2: Decrement refcount when truncating refcounted extents.
Add 'Decrement refcount for delete' in to the normal truncate
process. So for a refcounted extent record, call refcount rec
decrementation instead of cluster free.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:35 -07:00
Tao Ma 1aa75fea64 ocfs2: Add functions for extents refcounted.
Add function ocfs2_mark_extent_refcounted which can mark
an extent refcounted.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:34 -07:00
Tao Ma 1823cb0b9f ocfs2: Add support of decrementing refcount for delete.
Given a physical cpos and length, decrement the refcount
in the tree. If the refcount for any portion of the extent goes
to zero, that portion is queued for freeing.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:33 -07:00
Tao Ma e2e9f6082b ocfs2: move tree path functions to alloc.h.
Now fs/ocfs2/alloc.c has more than 7000 lines. It contains our
basic b-tree operation. Although we have already make our b-tree
operation generic, the basic structrue ocfs2_path which is used
to iterate one b-tree branch is still static and limited to only
used in alloc.c. As refcount tree need them and I don't want to
add any more b-tree unrelated code to alloc.c, export them out.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:32 -07:00
Tao Ma fe92441595 ocfs2: Add refcount b-tree as a new extent tree.
Add refcount b-tree as a new extent tree so that it can
use the b-tree to store and maniuplate ocfs2_refcount_rec.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:31 -07:00
Tao Ma 555936bfcb ocfs2: Abstract extent split process.
ocfs2_mark_extent_written actually does the following things:
1. check the parameters.
2. initialize the left_path and split_rec.
3. call __ocfs2_mark_extent_written. it will do:
   1) check the flags of unwritten
   2) do the real split work.
The whole process is packed tightly somehow. So this patch
will abstract 2 different functions so that future b-tree
operation can work with it.

1. __ocfs2_split_extent will accept path and split_rec and do
  the real split work.
2. ocfs2_change_extent_flag will accept a new flag and initialize
   path and split_rec.

So now ocfs2_mark_extent_written will do:
1. check the parameters.
2. call ocfs2_change_extent_flag.
   1) initalize the left_path and split_rec.
   2) check whether the new flags conflict with the old one.
   3) call __ocfs2_split_extent to do the split.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:31 -07:00
Tao Ma 853a3a1439 ocfs2: Wrap ocfs2_extent_contig in ocfs2_extent_tree.
Add a new operation eo_ocfs2_extent_contig int the extent tree's
operations vector. So that with the new refcount tree, We want
this so that refcount trees can always return CONTIG_NONE and
prevent extent merging.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 20:09:30 -07:00
Joel Becker 5e404e9ed1 ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree().
With this commit, extent tree operations are divorced from inodes and
rely on ocfs2_caching_info.  Phew!

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:13 -07:00
Joel Becker a1cf076ba9 ocfs2: __ocfs2_mark_extent_written() doesn't need struct inode.
We only allow unwritten extents on data, so the toplevel
ocfs2_mark_extent_written() can use an inode all it wants.  But the
subfunction isn't even using the inode argument.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:12 -07:00
Joel Becker f3868d0fa2 ocfs2: Teach ocfs2_replace_extent_rec() to use an extent_tree.
Don't use a struct inode anymore.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:11 -07:00
Joel Becker d231129f44 ocfs2: ocfs2_split_and_insert() no longer needs struct inode.
It already has an extent_tree.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:11 -07:00
Joel Becker dbdcf6a48a ocfs2: ocfs2_remove_extent() no longer needs struct inode.
One more generic btree function that is isolated from struct inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:10 -07:00
Joel Becker cbee7e1a6a ocfs2: ocfs2_add_clusters_in_btree() no longer needs struct inode.
One more function that doesn't need a struct inode to pass to its
children.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:09 -07:00
Joel Becker cc79d8c19e ocfs2: ocfs2_insert_extent() no longer needs struct inode.
One more function down, no inode in the entire insert-extent chain.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:09 -07:00
Joel Becker 92ba470c44 ocfs2: Make extent map insertion an extent_tree_operation.
ocfs2_insert_extent() wants to insert a record into the extent map if
it's an inode data extent.  But since many btrees can call that
function, let's make it an op on ocfs2_extent_tree.  Other tree types
can leave it empty.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:08 -07:00
Joel Becker 627961b77e ocfs2: ocfs2_figure_insert_type() no longer needs struct inode.
It's not using it, so remove it from the parameter list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:08 -07:00
Joel Becker 1ef61b3314 ocfs2: Remove inode from ocfs2_figure_extent_contig().
It already has an ocfs2_extent_tree and doesn't need the inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:07 -07:00
Joel Becker a29702914a ocfs2: Swap inode for extent_tree in ocfs2_figure_merge_contig_type().
We don't want struct inode in generic btree operations.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:07 -07:00
Joel Becker b4a176515c ocfs2: ocfs2_extent_contig() only requires the superblock.
Don't pass the inode in.  We don't want it around for generic btree
operations.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:05 -07:00
Joel Becker 3505bec018 ocfs2: ocfs2_do_insert_extent() and ocfs2_insert_path() no longer need an inode.
They aren't using it, so remove it from their parameter lists.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:05 -07:00
Joel Becker c38e52bb1c ocfs2: Give ocfs2_split_record() an extent_tree instead of an inode.
Another on the way to generic btree functions.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:05 -07:00
Joel Becker d562862314 ocfs2: ocfs2_insert_at_leaf() doesn't need struct inode.
Give it an ocfs2_extent_tree and it is happy.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:04 -07:00
Joel Becker 4c911eefca ocfs2: Make truncating the extent map an extent_tree_operation.
ocfs2_remove_extent() wants to truncate the extent map if it's
truncating an inode data extent.  But since many btrees can call that
function, let's make it an op on ocfs2_extent_tree.  Other tree types
can leave it empty.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:03 -07:00
Joel Becker 043beebb6c ocfs2: ocfs2_truncate_rec() doesn't need struct inode.
It's not using it anymore.  Remove it from the parameter list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:03 -07:00
Joel Becker d401dc12fc ocfs2: ocfs2_grow_branch() and ocfs2_append_rec_to_path() lose struct inode.
ocfs2_grow_branch() not really using it other than to pass it to the
subfunctions ocfs2_shift_tree_depth(), ocfs2_find_branch_target(), and
ocfs2_add_branch().  The first two weren't it either, so they drop the
argument.  ocfs2_add_branch() only passed it to
ocfs2_adjust_rightmost_branch(), which drops the inode argument and uses
the ocfs2_extent_tree as well.

ocfs2_append_rec_to_path() can be take an ocfs2_extent_tree instead of
the inode.  The function ocfs2_adjust_rightmost_records() goes along for
the ride.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:02 -07:00
Joel Becker c495dd24ac ocfs2: ocfs2_try_to_merge_extent() doesn't need struct inode.
It's not using it, so remove it from the parameter list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:02 -07:00
Joel Becker 4fe82c312a ocfs2: ocfs2_merge_rec_left/right() no longer need struct inode.
Drop it from the parameters - they already have ocfs2_extent_list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:01 -07:00
Joel Becker 70f18c08b4 ocfs2: ocfs2_rotate_tree_left() no longer needs struct inode.
It already gets ocfs2_extent_tree, so we can just use that.  This chains
to the same modification for ocfs2_remove_rightmost_path() and
ocfs2_rotate_rightmost_leaf_left().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:08:00 -07:00
Joel Becker e46f74dc35 ocfs2: __ocfs2_rotate_tree_left() doesn't need struct inode.
It already has struct ocfs2_extent_tree, which has the caching info.  So
we don't need to pass it struct inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:59 -07:00
Joel Becker 1e2dd63fe0 ocfs2: ocfs2_rotate_subtree_left() doesn't need struct inode.
It already has struct ocfs2_extent_tree, which has the caching info.  So
we don't need to pass it struct inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:59 -07:00
Joel Becker 09106bae05 ocfs2: ocfs2_update_edge_lengths() doesn't need struct inode.
Pass in the extent tree, which is all we need.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:58 -07:00
Joel Becker 1bbf0b8d60 ocfs2: ocfs2_rotate_tree_right() doesn't need struct inode.
We don't need struct inode in ocfs2_rotate_tree_right() anymore.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:58 -07:00
Joel Becker 6136ca5f5f ocfs2: Drop struct inode from ocfs2_extent_tree_operations.
We can get to the inode from the caching information.  Other parent
types don't need it.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:57 -07:00
Joel Becker 7dc0280567 ocfs2: Pass ocfs2_extent_tree to ocfs2_get_subtree_root()
Get rid of the inode argument.  Use extent_tree instead.  This means a
few more functions have to pass an extent_tree around.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:55 -07:00
Joel Becker 5c601aba8c ocfs2: Get inode out of ocfs2_rotate_subtree_root_right().
Pass the ocfs2_extent_list down through ocfs2_rotate_tree_right() and
get rid of struct inode in ocfs2_rotate_subtree_root_right().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:55 -07:00
Joel Becker 4619c73e7c ocfs2: ocfs2_complete_edge_insert() doesn't need struct inode at all.
Completely unused argument.  Get rid of it.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:54 -07:00
Joel Becker 6641b0ce32 ocfs2: Pass ocfs2_extent_tree to ocfs2_unlink_path()
ocfs2_unlink_path() doesn't need struct inode, so let's pass it struct
ocfs2_extent_tree.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:54 -07:00
Joel Becker 42a5a7a9a5 ocfs2: ocfs2_create_new_meta_bhs() doesn't need struct inode.
Pass struct ocfs2_extent_tree into ocfs2_create_new_meta_bhs().  It no
longer needs struct inode or ocfs2_super.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:53 -07:00
Joel Becker facdb77f54 ocfs2: ocfs2_find_path() only needs the caching info
ocfs2_find_path and ocfs2_find_leaf() walk our btrees, reading extent
blocks.  They need struct ocfs2_caching_info for that, but not struct
inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:53 -07:00
Joel Becker 3d03a305de ocfs2: Pass ocfs2_caching_info to ocfs2_read_extent_block().
extent blocks belong to btrees on more than just inodes, so we want to
pass the ocfs2_caching_info structure directly to
ocfs2_read_extent_block().  A number of places in alloc.c can now drop
struct inode from their argument list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:52 -07:00
Joel Becker d9a0a1f83b ocfs2: Store the ocfs2_caching_info on ocfs2_extent_tree.
What do we cache?  Metadata blocks.  What are most of our non-inode metadata
blocks?  Extent blocks for our btrees.  struct ocfs2_extent_tree is the
main structure for managing those.  So let's store the associated
ocfs2_caching_info there.

This means that ocfs2_et_root_journal_access() doesn't need struct inode
anymore, and any place that has an et can refer to et->et_ci instead of
INODE_CACHE(inode).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:51 -07:00
Joel Becker 0cf2f7632b ocfs2: Pass struct ocfs2_caching_info to the journal functions.
The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions.  Thus the
journal locks a metadata cache with the cache io_lock function.  It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:50 -07:00
Joel Becker 8cb471e8f8 ocfs2: Take the inode out of the metadata read/write paths.
We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache.  This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:48 -07:00
Tao Ma 60e2ec4866 ocfs2: release the buffer head in ocfs2_do_truncate.
In ocfs2_do_truncate, we forget to release last_eb_bh which
will cause memleak. So call brelse in the end.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-17 12:50:35 -07:00
Tao Ma 82e12644cf ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records.
In ocfs2_adjust_adjacent_records, we will adjust adjacent records
according to the extent_list in the lower level. But actually
the lower level tree will either be a leaf or a branch. If we only
use ocfs2_is_empty_extent we will meet with some problem if the lower
tree is a branch (tree_depth > 1). So use !ocfs2_rec_clusters instead.
And actually only the leaf record can have holes. So add a BUG_ON
for non-leaf branch.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 10:58:46 -07:00
Tao Ma 3c5e10683e ocfs2: Add extra credits and access the modified bh in update_edge_lengths.
In normal tree rotation left process, we will never touch the tree
branch above subtree_index and ocfs2_extend_rotate_transaction doesn't
reserve the credits for them either.

But when we want to delete the rightmost extent block, we have to update
the rightmost records for all the rightmost branch(See
ocfs2_update_edge_lengths), so we have to allocate extra credits for them.
What's more, we have to access them also.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-21 14:41:54 -07:00
Tao Ma 6b791bcc8b ocfs2: Adjust rightmost path in ocfs2_add_branch.
In ocfs2_add_branch, we use the rightmost rec of the leaf extent block
to generate the e_cpos for the newly added branch. In the most case, it
is OK but if the parent extent block's rightmost rec covers more clusters
than the leaf does, it will cause kernel panic if we insert some clusters
in it. The message is something like:
(7445,1):ocfs2_insert_at_leaf:3775 ERROR: bug expression:
le16_to_cpu(el->l_next_free_rec) >= le16_to_cpu(el->l_count)
(7445,1):ocfs2_insert_at_leaf:3775 ERROR: inode 66053, depth 0, count 28,
next free 28, rec.cpos 270, rec.clusters 1, insert.cpos 275, insert.clusters 1
 [<fa7ad565>] ? ocfs2_do_insert_extent+0xb58/0xda0 [ocfs2]
 [<fa7b08f2>] ? ocfs2_insert_extent+0x5bd/0x6ba [ocfs2]
 [<fa7b1b8b>] ? ocfs2_add_clusters_in_btree+0x37f/0x564 [ocfs2]
...

The panic can be easily reproduced by the following small test case
(with bs=512, cs=4K, and I remove all the error handling so that it looks
clear enough for reading).

int main(int argc, char **argv)
{
	int fd, i;
	char buf[5] = "test";

	fd = open(argv[1], O_RDWR|O_CREAT);

	for (i = 0; i < 30; i++) {
		lseek(fd, 40960 * i, SEEK_SET);
		write(fd, buf, 5);
	}

	ftruncate(fd, 1146880);

	lseek(fd, 1126400, SEEK_SET);
	write(fd, buf, 5);

	close(fd);

	return 0;
}

The reason of the panic is that:
the 30 writes and the ftruncate makes the file's extent list looks like:

	Tree Depth: 1   Count: 19   Next Free Rec: 1
	## Offset        Clusters       Block#
	0  0             280            86183
	SubAlloc Bit: 7   SubAlloc Slot: 0
	Blknum: 86183   Next Leaf: 0
	CRC32: 00000000   ECC: 0000
	Tree Depth: 0   Count: 28   Next Free Rec: 28
	## Offset        Clusters       Block#          Flags
	0  0             1              143368          0x0
	1  10            1              143376          0x0
	...
	26 260           1              143576          0x0
	27 270           1              143584          0x0

Now another write at 1126400(275 cluster) whiich will write at the gap
between 271 and 280 will trigger ocfs2_add_branch, but the result after
the function looks like:
	Tree Depth: 1   Count: 19   Next Free Rec: 2
	## Offset        Clusters       Block#
	0  0             280            86183
	1  271           0             143592
So the extent record is intersected and make the following operation bug out.

This patch just try to remove the gap before we add the new branch, so that
the root(branch) rightmost rec will cover the same right position. So in the
above case, before adding branch the tree will be changed to
	Tree Depth: 1   Count: 19   Next Free Rec: 1
	## Offset        Clusters       Block#
	0  0             271            86183
	SubAlloc Bit: 7   SubAlloc Slot: 0
	Blknum: 86183   Next Leaf: 0
	CRC32: 00000000   ECC: 0000
	Tree Depth: 0   Count: 28   Next Free Rec: 28
	## Offset        Clusters       Block#          Flags
	0  0             1              143368          0x0
	1  10            1              143376          0x0
	...
	26 260           1              143576          0x0
	27 270           1              143584          0x0
And after branch add, the tree looks like
	Tree Depth: 1   Count: 19   Next Free Rec: 2
	## Offset        Clusters       Block#
	0  0             271            86183
	1  271           0             143592

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-15 14:49:43 -07:00
Mark Fasheh 9b7895efac ocfs2: Add a name indexed b-tree to directory inodes
This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value,
and pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
2009-04-03 11:39:15 -07:00
Tao Ma 74e77eb30d ocfs2: Fix a bug found by sparse check.
We need to use le32_to_cpu to test rec->e_cpos in
ocfs2_dinode_insert_check.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-03-12 16:46:01 -07:00
Tao Ma 47be12e4ee ocfs2: Access and dirty the buffer_head in mark_written.
In __ocfs2_mark_extent_written, when we meet with the situation
of c_split_covers_rec, the old solution just replace the extent
record and forget to access and dirty the buffer_head. This will
cause a problem when the unwritten extent is in an extent block.
So access and dirty it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:09 -08:00
Mark Fasheh fd4ef23196 ocfs2: add quota call to ocfs2_remove_btree_range()
We weren't reclaiming the clusters which get free'd from this function,
so any user punching holes in a file would still have those bytes accounted
against him/her. Add the call to vfs_dq_free_space_nodirty() to fix this.
Interestingly enough, the journal credits calculation already took this into
account.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jan Kara <jack@suse.cz>
2009-02-02 14:20:20 -08:00
Fernando Carrijo c19a28e119 remove lots of double-semicolons
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:14 -08:00
Tao Ma 9047beabb8 ocfs2: Access the right buffer_head in ocfs2_merge_rec_left.
In commit "ocfs2: Use metadata-specific ocfs2_journal_access_*()
functions", the wrong buffer_head is accessed. So change it
to the right buffer_head.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:37 -08:00
Joel Becker 2a50a743bd ocfs2: Create ocfs2_xattr_value_buf.
When an ocfs2 extended attribute is large enough to require its own
allocation tree, we root it with an ocfs2_xattr_value_root.  However,
these roots can be a part of inodes, xattr blocks, or xattr buckets.
Thus, they need a different journal access function for each container.

We wrap the bh, its journal access function, and the value root (xv) in
a structure called ocfs2_xattr_valu_buf.  This is a package that can
be passed around.  In this first pass, we simply pass it to the
extent tree code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:32 -08:00
Joel Becker 13723d00e3 ocfs2: Use metadata-specific ocfs2_journal_access_*() functions.
The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
commit triggers and allow us to compute metadata ecc right before the
buffers are written out.  This commit provides ecc for inodes, extent
blocks, group descriptors, and quota blocks.  It is not safe to use
extened attributes and metaecc at the same time yet.

The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
the type of block at their root.  Before, it didn't matter, but now the
root block must use the appropriate ocfs2_journal_access_*() function.
To keep this abstract, the structures now have a pointer to the matching
journal_access function and a wrapper call to call it.

A few places use naked ocfs2_write_block() calls instead of adding the
blocks to the journal.  We make sure to calculate their checksum and ecc
before the write.

Since we pass around the journal_access functions.  Let's typedef them
in ocfs2.h.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:32 -08:00
Joel Becker ffdd7a5463 ocfs2: Wrap up the common use cases of ocfs2_new_path().
The majority of ocfs2_new_path() calls are:

	ocfs2_new_path(path_root_bh(otherpath),
		       path_root_el(otherpath));

Let's call that ocfs2_new_path_from_path().  The rest do similar things
from struct ocfs2_extent_tree.  Let's call those
ocfs2_new_path_from_et().  This will make the next change easier.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:31 -08:00
Joel Becker d6b32bbb3e ocfs2: block read meta ecc.
Add block check calls to the read_block validate functions.  This is the
almost all of the read-side checking of metaecc.  xattr buckets are not checked
yet.   Writes are also unchecked, and so a read-write mount will quickly fail.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:31 -08:00
Jan Kara a90714c150 ocfs2: Add quota calls for allocation and freeing of inodes and space
Add quota calls for allocation and freeing of inodes and space, also update
estimates on number of needed credits for a transaction. Move out inode
allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
outside of a transaction.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:23 -08:00
Mark Fasheh 53ef99cad9 ocfs2: Remove JBD compatibility layer
JBD2 is fully backwards compatible with JBD and it's been tested enough with
Ocfs2 that we can clean this code up now.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:55 -08:00
Joel Becker 970e4936d7 ocfs2: Validate metadata only when it's read from disk.
Add an optional validation hook to ocfs2_read_blocks().  Now the
validation function is only called when a block was actually read off of
disk.  It is not called when the buffer was in cache.

We add a buffer state bit BH_NeedsValidate to flag these buffers.  It
must always be one higher than the last JBD2 buffer state bit.

The dinode, dirblock, extent_block, and xattr_block validators are
lifted to this scheme directly.  The group_descriptor validator needs to
be split into two pieces.  The first part only needs the gd buffer and
is passed to ocfs2_read_block().  The second part requires the dinode as
well, and is called every time.  It's only 3 compares, so it's tiny.
This also allows us to clean up the non-fatal gd check used by resize.c.
It now has no magic argument.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:53 -08:00
Joel Becker 5e96581a37 ocfs2: Wrap extent block reads in a dedicated function.
We weren't consistently checking extent blocks after we read them.
Most places checked the signature, but none checked h_blkno or
h_fs_signature.  Create a toplevel ocfs2_read_extent_block() that does
the read and the validation.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:53 -08:00
Joel Becker 10995aa245 ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks.
Random places in the code would check a dinode bh to see if it was
valid.  Not only did they do different levels of validation, they
handled errors in different ways.

The previous commit unified inode block reads, validating all block
reads in the same place.  Thus, these haphazard checks are no longer
necessary.  Rather than eliminate them, however, we change them to
BUG_ON() checks.  This ensures the assumptions remain true.  All of the
code paths to these checks have been audited to ensure they come from a
validated inode read.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:52 -08:00
Joel Becker b657c95c11 ocfs2: Wrap inode block reads in a dedicated function.
The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call.  Each place that does this has a different set
of sanity checks it performs.  Some check only the signature.  A couple
validate the block number (the block read vs di->i_blkno).  A couple
others check for VALID_FL.  Only one place validates i_fs_generation.  A
couple check nothing.  Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block().  This will validate
all the above fields, going readonly if they are invalid (they never
should be).  ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags.  Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:52 -08:00
Mark Fasheh fecc01126d ocfs2: turn __ocfs2_remove_inode_range() into ocfs2_remove_btree_range()
This patch genericizes the high level handling of extent removal.
ocfs2_remove_btree_range() is nearly identical to
__ocfs2_remove_inode_range(), except that extent tree operations have been
used where necessary. We update ocfs2_remove_inode_range() to use the
generic helper. Now extent tree based structures have an easy way to
truncate ranges.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
2009-01-05 08:34:19 -08:00
Tao Ma 2891d290aa ocfs2: Add clusters free in dealloc_ctxt.
Now in ocfs2 xattr set, the whole process are divided into many small
parts and they are wrapped into diffrent transactions and it make the
set doesn't look like a real transaction. So we want to integrate it
into a real one.

In some cases we will allocate some clusters and free some in just one
transaction. e.g, one xattr is larger than inline size, so it and its
value root is stored within the inode while the value is outside in a
cluster. Then we try to update it with a smaller value(larger than the
size of root but smaller than inline size), we may need to free the
outside cluster while allocate a new bucket(one cluster) since now the
inode may be full. The old solution will lock the global_bitmap(if the
local alloc failed in stress test) and then the truncate log. This will
cause a ABBA lock with truncate log flush.

This patch add the clusters free in dealloc_ctxt, so that we can record
the free clusters during the transaction and then free it after we
release the global_bitmap in xattr set.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:18 -08:00
Joel Becker 0fcaa56a2a ocfs2: Simplify ocfs2_read_block()
More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
Only six pass a different flag set.  Rather than have every caller care,
let's make ocfs2_read_block() take no flags and always do a cached read.
The remaining six places can call ocfs2_read_blocks() directly.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 11:51:57 -07:00
Joel Becker 31d33073ca ocfs2: Require an inode for ocfs2_read_block(s)().
Now that synchronous readers are using ocfs2_read_blocks_sync(), all
callers of ocfs2_read_blocks() are passing an inode.  Use it
unconditionally.  Since it's there, we don't need to pass the
ocfs2_super either.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 11:43:29 -07:00
Mark Fasheh a81cb88b64 ocfs2: Don't check for NULL before brelse()
This is pointless as brelse() already does the check.

Signed-off-by: Mark Fasheh
2008-10-13 17:02:44 -07:00
Joel Becker 2b4e30fbde ocfs2: Switch over to JBD2.
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.

It's a pretty trivial change.  Most functions are just renamed.  The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.

Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem.  It can even interact with JBD-based ocfs2 as long
as the journal is formated for JBD.

We provide a compatibility option so that paranoid people can still use
JBD for the time being.  This will go away shortly.

[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
  ocfs2_truncate_for_delete(). --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:43 -07:00
Joel Becker 8d6220d6a7 ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree()
The original get/put_extent_tree() functions held a reference on
et_root_bh.  However, every single caller already has a safe reference,
making the get/put cycle irrelevant.

We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree().  It
no longer gets a reference on et_root_bh.  ocfs2_put_extent_tree() is
removed.  Callers now have a simpler init+use pattern.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker 1625f8ac15 ocfs2: Comment struct ocfs2_extent_tree_operations.
struct ocfs2_extent_tree_operations provides methods for the different
on-disk btrees in ocfs2.  Describing what those methods do is probably a
good idea.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker f99b9b7ccf ocfs2: Make ocfs2_extent_tree the first-class representation of a tree.
We now have three different kinds of extent trees in ocfs2: inode data
(dinode), extended attributes (xattr_tree), and extended attribute
values (xattr_value).  There is a nice abstraction for them,
ocfs2_extent_tree, but it is hidden in alloc.c.  All the calling
functions have to pick amongst a varied API and pass in type bits and
often extraneous pointers.

A better way is to make ocfs2_extent_tree a first-class object.
Everyone converts their object to an ocfs2_extent_tree() via the
ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all
tree calls to alloc.c.

This simplifies a lot of callers, making for readability.  It also
provides an easy way to add additional extent tree types, as they only
need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree()
function.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker 1e61ee79e2 ocfs2: Add an insertion check to ocfs2_extent_tree_operations.
A couple places check an extent_tree for a valid inode.  We move that
out to add an eo_insert_check() operation.  It can be called from
ocfs2_insert_extent() and elsewhere.

We also have the wrapper calls ocfs2_et_insert_check() and
ocfs2_et_sanity_check() ignore NULL ops.  That way we don't have to
provide useless operations for xattr types.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker 1a09f556e5 ocfs2: Create specific get_extent_tree functions.
A caller knows what kind of extent tree they have.  There's no reason
they have to call ocfs2_get_extent_tree() with a NULL when they could
just as easily call a specific function to their type of extent tree.

Introduce ocfs2_dinode_get_extent_tree(),
ocfs2_xattr_tree_get_extent_tree(), and
ocfs2_xattr_value_get_extent_tree().  They only take the necessary
arguments, calling into the underlying __ocfs2_get_extent_tree() to do
the real work.

__ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but
without needing any switch-by-type logic.

ocfs2_get_extent_tree() is now a wrapper around the specific calls.  It
exists because a couple alloc.c functions can take et_type.  This will
go later.

Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a
struct ocfs2_xattr_value_root* instead of void*.  This gives us
typechecking where we didn't have it before.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker 943cced39e ocfs2: Determine an extent tree's max_leaf_clusters in an et_op.
Provide an optional extent_tree_operation to specify the
max_leaf_clusters of an ocfs2_extent_tree.  If not provided, the value
is 0 (unlimited).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker 1c25d93a4a ocfs2: Use struct ocfs2_extent_tree in ocfs2_num_free_extents().
ocfs2_num_free_extents() re-implements the logic of
ocfs2_get_extent_tree().  Now that ocfs2_get_extent_tree() does not
allocate, let's use it in ocfs2_num_free_extents() to simplify the code.

The inode validation code in ocfs2_num_free_extents() is not needed.
All callers are passing in pre-validated inodes.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker 0ce1010f1a ocfs2: Provide the get_root_el() method to ocfs2_extent_tree_operations.
The root_el of an ocfs2_extent_tree needs to be calculated from
et->et_object.  Make it an operation on et->et_ops.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker ea5efa1512 ocfs2: Make 'private' into 'object' on ocfs2_extent_tree.
The 'private' pointer was a way to store off xattr values, which don't
live at a set place in the bh.  But the concept of "the object
containing the extent tree" is much more generic.  For an inode it's the
struct ocfs2_dinode, for an xattr value its the value.  Let's save off
the 'object' at all times.  If NULL is passed to
ocfs2_get_extent_tree(), 'object' is set to bh->b_data;

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker dc0ce61af4 ocfs2: Make ocfs2_extent_tree get/put instead of alloc.
Rather than allocating a struct ocfs2_extent_tree, just put it on the
stack.  Fill it with ocfs2_get_extent_tree() and drop it with
ocfs2_put_extent_tree().  Now the callers don't have to ENOMEM, yet
still safely ref the root_bh.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker ce1d9ea621 ocfs2: Prefix the ocfs2_extent_tree structure.
The members of the ocfs2_extent_tree structure gain a prefix of 'et_'.
All users are updated.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker 35dc0aa3c5 ocfs2: Prefix the extent tree operations structure.
The ocfs2_extent_tree_operations structure gains a field prefix on its
members.  The ->eo_sanity_check() operation gains a wrapper function for
completeness.  All of the extent tree operation wrappers gain a
consistent name (ocfs2_et_*()).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Tao Ma ca12b7c489 ocfs2: Optionally limit extent size in ocfs2_insert_extent()
In xattr bucket, we want to limit the maximum size of a btree leaf,
otherwise we'll lose the benefits of hashing because we'll have to search
large leaves.

So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster
size we want so that we can prevent ocfs2_insert_extent() from merging the leaf
record even if it is contiguous with an adjacent record.

Other btree types are not affected by this change.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:03 -07:00
Tao Ma ba492615f0 ocfs2: Add xattr index tree operations
When necessary, an ocfs2_xattr_block will embed an ocfs2_extent_list to
store large numbers of EAs. This patch adds a new type in
ocfs2_extent_tree_type and adds the implementation so that we can re-use the
b-tree code to handle the storage of many EAs.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:02 -07:00
Tiger Yang fdd77704a8 ocfs2: reserve inline space for extended attribute
Add the structures and helper functions we want for handling inline extended
attributes. We also update the inline-data handlers so that they properly
function in the event that we have both inline data and inline attributes
sharing an inode block.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:02 -07:00
Tao Ma f56654c435 ocfs2: Add extent tree operation for xattr value btrees
Add some thin wrappers around ocfs2_insert_extent() for each of the 3
different btree types, ocfs2_inode_insert_extent(),
ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The
last is for the xattr index btree, which will be used in a followup patch.

All the old callers in file.c etc will call ocfs2_dinode_insert_extent(),
while the other two handle the xattr issue. And the init of extent tree are
handled by these functions.

When storing xattr value which is too large, we will allocate some clusters
for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In
order to re-use the b-tree operation code, a new parameter named "private"
is added into ocfs2_extent_tree and it is used to indicate the root of
ocfs2_exent_list. The reason is that we can't deduce the root from the
buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse,
in any place in an ocfs2_xattr_bucket.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:01 -07:00
Tao Ma 0eb8d47e69 ocfs2: Make high level btree extend code generic
Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic
function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls
ocfs2_do_cluster_allocation() now, but the latter can be used for other
btree types as well.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:59 -07:00
Tao Ma e7d4cb6bc1 ocfs2: Abstract ocfs2_extent_tree in b-tree operations.
In the old extent tree operation, we take the hypothesis that we
are using the ocfs2_extent_list in ocfs2_dinode as the tree root.
As xattr will also use ocfs2_extent_list to store large value
for a xattr entry, we refactor the tree operation so that xattr
can use it directly.

The refactoring includes 4 steps:
1. Abstract set/get of last_eb_blk and update_clusters since they may
   be stored in different location for dinode and xattr.
2. Add a new structure named ocfs2_extent_tree to indicate the
   extent tree the operation will work on.
3. Remove all the use of fe_bh and di, use root_bh and root_el in
   extent tree instead. So now all the fe_bh is replaced with
   et->root_bh, el with root_el accordingly.
4. Make ocfs2_lock_allocators generic. Now it is limited to be only used
   in file extend allocation. But the whole function is useful when we want
   to store large EAs.

Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used
for anything other than truncate inode data btrees.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:58 -07:00
Tao Ma 811f933df1 ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode.
ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
they are all limited to an inode btree because they use a struct
ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
(the part of an ocfs2_dinode they actually use) so that the xattr btree code
can use these functions.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:58 -07:00
Tao Ma 231b87d109 ocfs2: Modify ocfs2_num_free_extents for future xattr usage.
ocfs2_num_free_extents() is used to find the number of free extent records
in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to
use this for extended attribute trees in the future, so genericize the
interface the take a buffer head. A future patch will allow that buffer_head
to contain any structure rooting an ocfs2 btree.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:58 -07:00
Mark Fasheh 00dc417fa3 ocfs2: fiemap support
Plug ocfs2 into ->fiemap. Some portions of ocfs2_get_clusters() had to be
refactored so that the extent cache can be skipped in favor of going
directly to the on-disk records. This makes it easier for us to determine
which extent is the last one in the btree. Also, I'm not sure we want to be
caching fiemap lookups anyway as they're not directly related to data
read/write.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ocfs2-devel@oss.oracle.com
Cc: linux-fsdevel@vger.kernel.org
2008-10-03 17:32:11 -04:00