redonkable/alistair23-linux

Author	SHA1	Message	Date
Changman Lee	9c01503f4d	f2fs: cleanup redundant macro We've already made fi and sbi for inode. Let's avoid duplicated work. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-01 14:16:50 -08:00
Jaegeuk Kim	92dffd0179	f2fs: convert inline_data when i_size becomes large If i_size becomes large outside of MAX_INLINE_DATA, we shoud convert the inode. Otherwise, we can make some dirty pages during the truncation, and those pages will be written through f2fs_write_data_page. At that moment, the inode has still inline_data, so that it tries to write non- zero pages into inline_data area. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-11 14:16:12 -08:00
Jaegeuk Kim	764d2c8040	f2fs: fix deadlock to grab 0'th data page The scenario is like this. One trhead triggers: f2fs_write_data_pages lock_page f2fs_write_data_page f2fs_lock_op <- wait The other thread triggers: f2fs_truncate truncate_blocks f2fs_lock_op truncate_partial_data_page lock_page <- wait for locking the page This patch resolves this bug by relocating truncate_partial_data_page. This function is just to truncate user data page and not related to FS consistency as well. And, we don't need to call truncate_inline_data. Rather than that, f2fs_write_data_page will finally update inline_data later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-11 14:15:48 -08:00
Jaegeuk Kim	a344b9fda0	f2fs: disable roll-forward when active_logs = 2 The roll-forward mechanism should be activated when the number of active logs is not 2. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-05 20:05:53 -08:00
Jaegeuk Kim	d5053a34a9	f2fs: introduce -o fastboot for reducing booting time only If a system wants to reduce the booting time as a top priority, now we can use a mount option, -o fastboot. With this option, f2fs conducts a little bit slow write_checkpoint, but it can avoid the node page reads during the next mount time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-04 17:34:15 -08:00
Jaegeuk Kim	b3d208f96d	f2fs: revisit inline_data to avoid data races and potential bugs This patch simplifies the inline_data usage with the following rule. 1. inline_data is set during the file creation. 2. If new data is requested to be written ranges out of inline_data, f2fs converts that inode permanently. 3. There is no cases which converts non-inline_data inode to inline_data. 4. The inline_data flag should be changed under inode page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-04 17:34:11 -08:00
Jaegeuk Kim	5ab18570b8	f2fs: should not truncate any inline_dentry If the inode has inline_dentry, we should not truncate any block indices. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:34 -08:00
Chao Yu	622f28ae9b	f2fs: enable inline dir handling Add inline dir functions into normal dir ops' function to handle inline ops. Besides, we enable inline dir mode when a new dir inode is created if inline_data option is on. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:32 -08:00
Jaegeuk Kim	13fd8f89f6	f2fs: fix to call f2fs_unlock_op This patch fixes to call f2fs_unlock_op, which was missing before. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:30 -08:00
Jaegeuk Kim	9ba69cf987	f2fs: avoid to allocate when inline_data was written The sceanrio is like this. inline_data i_size page write_begin/vm_page_mkwrite X 30 dirty_page X 30 write to #4096 position X 30 get_dnode_of_data wait for get_dnode_of_data O 30 write inline_data O 30 get_dnode_of_data O 30 reserve data block .. In this case, we have #0 = NEW_ADDR and inline_data as well. We should not allow this condition for further access. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:30 -08:00
Jaegeuk Kim	1ce86bf6f8	f2fs: fix race conditon on truncation with inline_data Let's consider the following scenario. blkaddr[0] inline_data i_size i_blocks writepage truncate NEW X 4096 2 dirty page #0 NEW X 0 change i_size NEW X 0 2 f2fs_write_inline_data NEW X 0 2 get_dnode_of_data NEW X 0 2 truncate_data_blocks_range NULL O 0 1 memcpy(inline_data) NULL O 0 1 f2fs_put_dnode NULL O 0 1 f2fs_truncate NULL O 0 1 get_dnode_of_data NULL O 0 1 invalid block addr This patch adds checking inline_data flag during f2fs_truncate not to refer corrupted block indices. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:29 -08:00
Jaegeuk Kim	02a1335f25	f2fs: support volatile operations for transient data This patch adds support for volatile writes which keep data pages in memory until f2fs_evict_inode is called by iput. For instance, we can use this feature for the sqlite database as follows. While supporting atomic writes for main database file, we can keep its journal data temporarily in the page cache by the following sequence. 1. open -> ioctl(F2FS_IOC_START_VOLATILE_WRITE); 2. writes : keep all the data in the page cache. 3. flush to the database file with atomic writes a. ioctl(F2FS_IOC_START_ATOMIC_WRITE); b. writes c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE); 4. close -> drop the cached data Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-10-07 11:54:41 -07:00
Jaegeuk Kim	88b88a6679	f2fs: support atomic writes This patch introduces a very limited functionality for atomic write support. In order to support atomic write, this patch adds two ioctls: o F2FS_IOC_START_ATOMIC_WRITE o F2FS_IOC_COMMIT_ATOMIC_WRITE The database engine should be aware of the following sequence. 1. open -> ioctl(F2FS_IOC_START_ATOMIC_WRITE); 2. writes : all the written data will be treated as atomic pages. 3. commit -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE); : this flushes all the data blocks to the disk, which will be shown all or nothing by f2fs recovery procedure. 4. repeat to #2. The IO pattens should be: ,- START_ATOMIC_WRITE ,- COMMIT_ATOMIC_WRITE CP \| D D D D D D \| FSYNC \| D D D D \| FSYNC ... `- COMMIT_ATOMIC_WRITE Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-10-06 17:39:50 -07:00
Jaegeuk Kim	52656e6cf7	f2fs: clean up f2fs_ioctl functions This patch cleans up f2fs_ioctl functions for better readability. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:34:56 -07:00
Jaegeuk Kim	4b2fecc846	f2fs: introduce FITRIM in f2fs_ioctl This patch introduces FITRIM in f2fs_ioctl. In this case, f2fs will issue small discards and prefree discards as many as possible for the given area. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:06:09 -07:00
Chao Yu	14cecc5cd6	f2fs: skip punching hole in special condition Now punching hole in directory is not supported in f2fs, so let's limit file type in punch_hole(). In addition, in punch_hole if offset is exceed file size, we should skip punching hole. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:21 -07:00
Chao Yu	09db6a2ef8	f2fs: fix to truncate blocks past EOF in ->setattr By using FALLOC_FL_KEEP_SIZE in ->fallocate of f2fs, we can fallocate block past EOF without changing i_size of inode. These blocks past EOF will not be truncated in ->setattr as we truncate them only when change the file size. We should give a chance to truncate blocks out of filesize in setattr(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:20 -07:00
Jaegeuk Kim	19c9c466e5	f2fs: do not skip latest inode information In f2fs_sync_file, if there is no written appended writes, it skips to write its node blocks. But, if there is up-to-date inode page, we should write it to update its metadata during the roll-forward recovery. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:16 -07:00
Jaegeuk Kim	88bd02c947	f2fs: fix conditions to remain recovery information in f2fs_sync_file This patch revisited whole the recovery information during the f2fs_sync_file. In this patch, there are three information to make a decision. a) IS_CHECKPOINTED, /* is it checkpointed before? / b) HAS_FSYNCED_INODE, / is the inode fsynced before? / c) HAS_LAST_FSYNC, / has the latest node fsync mark? */ And, the scenarios for our rule are based on: [Term] F: fsync_mark, D: dentry_mark 1. inode(x) \| CP \| inode(x) \| dnode(F) 2. inode(x) \| CP \| inode(F) \| dnode(F) 3. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) 4. inode(x) \| CP \| dnode(F) \| inode(F) 5. CP \| inode(x) \| dnode(F) \| inode(DF) 6. CP \| inode(DF) \| dnode(F) 7. CP \| dnode(F) \| inode(DF) 8. CP \| dnode(F) \| inode(x) \| inode(DF) For example, #3, the three conditions should be changed as follows. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) a) x o o o o b) x x x x o c) x o o x o If f2fs_sync_file stops ------^, it should write inode(F) --------------^ So, the need_inode_block_update should return true, since c) get_nat_flag(e, HAS_LAST_FSYNC), is false. For example, #8, CP \| alloc \| dnode(F) \| inode(x) \| inode(DF) a) o x x x x b) x x x o c) o o x o If f2fs_sync_file stops -------^, it should write inode(DF) --------------^ Note that, the roll-forward policy should follow this rule, which means, if there are any missing blocks, we doesn't need to recover that inode. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:15 -07:00
Jaegeuk Kim	c1ce1b02bb	f2fs: give an option to enable in-place-updates during fsync to users If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file only starts to try in-place-updates. And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it keeps out-of-order manner. Otherwise, it triggers in-place-updates. This may be used by storage showing very high random write performance. For example, it can be used when, Seq. writes (Data) + wait + Seq. writes (Node) is pretty much slower than, Rand. writes (Data) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:44 -07:00
Jaegeuk Kim	2403c155b8	f2fs: remove lengthy inode->i_ino This patch is to remove lengthy name by adding a new variable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-10 17:00:25 -07:00
Jaegeuk Kim	0b4c5afde9	f2fs: fix negative value for lseek offset If application throws negative value of lseek with SEEK_DATA\|SEEK_HOLE, previous f2fs went into BUG_ON in get_dnode_of_data, which was reported by Tommi Rantala. He could make a simple code to detect this having: lseek(fd, -17595150933902LL, SEEK_DATA); This patch should resolve that bug. Reported-by: Tommi Rentala <tt.rantala@gmail.com> [Jaegeuk Kim: relocate the condition as suggested by Chao] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 14:46:36 -07:00
Jaegeuk Kim	9850cf4a89	f2fs: need fsck.f2fs when f2fs_bug_on is triggered If any f2fs_bug_on is triggered, fsck.f2fs is needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 13:15:02 -07:00
Jaegeuk Kim	4081363fbe	f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB This patch adds three inline functions to clean up dirty casting codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-03 17:37:13 -07:00
Chao Yu	9d1589ef2e	f2fs: introduce need_do_checkpoint for readability This patch introduce need_do_checkpoint() to include numerous judgment condition for readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 13:57:07 -07:00
Jaegeuk Kim	764aa3e978	f2fs: avoid double lock in truncate_blocks The init_inode_metadata calls truncate_blocks when error is occurred. The callers holds f2fs_lock_op, so we should not call it again in truncate_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 13:57:01 -07:00
Jaegeuk Kim	b067ba1f1b	f2fs: should convert inline_data during the mkwrite If mkwrite is called to an inode having inline_data, it can overwrite the data index space as NEW_ADDR. (e.g., the first 4 bytes are coincidently zero) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:33 -07:00
arter97	e1c4204520	f2fs: fix typo Fix typo and some grammatical errors. The words "filesystem" and "readahead" are being used without the space treewide. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:33 -07:00
Chao Yu	497a0930bb	f2fs: add f2fs_balance_fs for expand_inode_data This patch adds f2fs_balance_fs in expand_inode_data to avoid allocation failure with segment. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-04 13:02:03 -07:00
Jaegeuk Kim	65b85ccce0	f2fs: fix coding style This patch fixes wrong coding style. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 17:25:54 -07:00
Jaegeuk Kim	ea1aa12ca2	f2fs: enable in-place-update for fdatasync This patch enforces in-place-updates only when fdatasync is requested. If we adopt this in-place-updates for the fdatasync, we can skip to write the recovery information. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:23 -07:00
Jaegeuk Kim	6d99ba41a7	f2fs: skip unnecessary data writes during fsync This patch intends to improve the fsync performance by skipping remaining the recovery information, only when there is no data that we should recover. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:03 -07:00
Chao Yu	6b2920a513	f2fs: use inner macro and function to clean up codes In this patch we use below inner macro and function to clean up codes. 1. ADDRS_PER_PAGE 2. SM_I 3. f2fs_readonly Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 14:04:26 -07:00
Chao Yu	ca0a81b397	f2fs: avoid to truncate non-updated page partially After we call find_data_page in truncate_partial_data_page, we could not guarantee this page is updated or not as error may occurred in lower layer. We'd better check status of the page to avoid this no updated page be writebacked to device. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 14:04:24 -07:00
Jaegeuk Kim	98397ff3cd	f2fs: fix not to allocate unnecessary blocks during fallocate This patch fixes the fallocate bug like below. (See xfstests/255) In fallocate(fd, 0, 20480), expand_inode_data processes for (index = pg_start; index <= pg_end; index++) { f2fs_reserve_block(); ... } So, even though fallocate requests 20480, 5 blocks, f2fs allocates 6 blocks including pg_end. So, this patch adds one condition to avoid block allocation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-23 10:05:08 +09:00
Jaegeuk Kim	ead432756a	f2fs: recover fallocated data and its i_size together This patch arranges the f2fs_locks to cover the fallocated data and its i_size. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-23 10:05:08 +09:00
Linus Torvalds	16b9057804	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...	2014-06-12 10:30:18 -07:00
Al Viro	8d0207652c	->splice_write() via ->write_iter() iter_file_splice_write() - a ->splice_write() instance that gathers the pipe buffers, builds a bio_vec-based iov_iter covering those and feeds it to ->write_iter(). A bunch of simple cases coverted to that... [AV: fixed the braino spotted by Cyrill] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:18:51 -04:00
Jaegeuk Kim	9ab7013492	f2fs: support f2fs_fiemap This patch links f2fs_fiemap with generic function with get_block. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-08 08:56:49 +09:00
Jaegeuk Kim	6fa1df533a	f2fs: recover fallocated space If a fallocated file is fsynced, we should recover the i_size after sudden power cut. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-07 03:18:35 +09:00
Chao Yu	8aa6f1c5bd	f2fs: fix to truncate inline data in inode page when setattr Previous we do not truncate inline data in inode page when setattr, so following case could still read the inline data which has already truncated: 1.write inline data 2.ftruncate size to 0 3.ftruncate size to max inline data size 4.read from offset 0 This patch introduces truncate_inline_data() to fix this problem. change log from v1: o fix a bug and do not truncate first page data after truncate inline data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:58 +09:00
Jaegeuk Kim	7f7670fe9f	f2fs: consider fallocated space for SEEK_DATA If an amount of data are allocated though fallocate and user writes a couple of data among the space, f2fs should return the data offset made by user when SEEK_DATA is requested. For example, (N: NEW_ADDR by fallocate, X: NEW_ADDR by user) 1) fallocate 0 ~ 10MB f -> N N N N N N N N N N N N ... N 2) write 4KB at 5MB offset f -> N N N N N X N N N N N N ... N 3) SEEK_DATA from 0 should return 5MB offset So, this patch adds a routine to search the first dirty page to handle that. Then, the SEEK_DATA flow skips NEW_ADDR offsets until any dirty page is found. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:57 +09:00
Jaegeuk Kim	fe369bc8ba	f2fs: return i_size if the hole is outside of i_size When SEEK_HOLE is requeted, it should return i_size if the hole position is found outside of i_size. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:57 +09:00
Chao Yu	267378d4de	f2fs: introduce f2fs_seek_block to support SEEK_{DATA, HOLE} in llseek In This patch we introduce f2fs_seek_block to support SEEK_{DATA,HOLE} of lseek(2). change log from v1: o fix bug when lseek from middle of page and fix wrong calculation of PGOFS_OF_NEXT_DNODE macro. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:57 +09:00
Chao Yu	6403eb1f64	f2fs: introduce help macro ADDRS_PER_PAGE() Introduce help macro ADDRS_PER_PAGE() to get the number of address pointers in direct node or inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:56 +09:00
Al Viro	8174202b34	write_iter variants of {__,}generic_file_aio_write() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:38:00 -04:00
Al Viro	aad4f8bb42	switch simple generic_file_aio_read() users to ->read_iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:37:55 -04:00
Linus Torvalds	26c12d9334	Merge branch 'akpm' (incoming from Andrew) Merge second patch-bomb from Andrew Morton: - the rest of MM - zram updates - zswap updates - exit - procfs - exec - wait - crash dump - lib/idr - rapidio - adfs, affs, bfs, ufs - cris - Kconfig things - initramfs - small amount of IPC material - percpu enhancements - early ioremap support - various other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits) MAINTAINERS: update Intel C600 SAS driver maintainers fs/ufs: remove unused ufs_super_block_third pointer fs/ufs: remove unused ufs_super_block_second pointer fs/ufs: remove unused ufs_super_block_first pointer fs/ufs/super.c: add __init to init_inodecache() doc/kernel-parameters.txt: add early_ioremap_debug arm64: add early_ioremap support arm64: initialize pgprot info earlier in boot x86: use generic early_ioremap mm: create generic early_ioremap() support x86/mm: sparse warning fix for early_memremap lglock: map to spinlock when !CONFIG_SMP percpu: add preemption checks to __this_cpu ops vmstat: use raw_cpu_ops to avoid false positives on preemption checks slub: use raw_cpu_inc for incrementing statistics net: replace __this_cpu_inc in route.c with raw_cpu_inc modules: use raw_cpu_write for initialization of per cpu refcount. mm: use raw_cpu ops for determining current NUMA node percpu: add raw_cpu_ops slub: fix leak of 'name' in sysfs_slab_add ...	2014-04-07 16:38:06 -07:00
Kirill A. Shutemov	f1820361f8	mm: implement ->map_pages for page cache filemap_map_pages() is generic implementation of ->map_pages() for filesystems who uses page cache. It should be safe to use filemap_map_pages() for ->map_pages() if filesystem use filemap_fault() for ->fault(). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Ning Qu <quning@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-07 16:35:53 -07:00
Jaegeuk Kim	6b4afdd794	f2fs: introduce f2fs_issue_flush to avoid redundant flush issue Some storage devices show relatively high latencies to complete cache_flush commands, even though their normal IO speed is prettry much high. In such the case, it needs to merge cache_flush commands as much as possible to avoid issuing them redundantly. So, this patch introduces a mount option, "-o flush_merge", to mitigate such the overhead. If this option is enabled by user, F2FS merges the cache_flush commands and then issues just one cache_flush on behalf of them. Once the single command is finished, F2FS sends a completion signal to all the pending threads. Note that, this option can be used under a workload consisting of very intensive concurrent fsync calls, while the storage handles cache_flush commands slowly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-07 09:50:58 +09:00
Jaegeuk Kim	479f40c44a	f2fs: skip unnecessary node writes during fsync If multiple redundant fsync calls are triggered, we don't need to write its node pages with fsync mark continuously. So, this patch adds FI_NEED_FSYNC to track whether the latest node block is written with the fsync mark or not. If the mark was set, a new fsync doesn't need to write a node block. Otherwise, we should do a new node block with the mark for roll-forward recovery. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:11 +09:00
Jaegeuk Kim	d928bfbfe7	f2fs: introduce fi->i_sem to protect fi's info This patch introduces fi->i_sem to protect fi's info that includes xattr_ver, pino, i_nlink. This enables to remove i_mutex during f2fs_sync_file, resulting in performance improvement when a number of fsync calls are triggered from many concurrent threads. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:11 +09:00
Jaegeuk Kim	3cb5ad152b	f2fs: call f2fs_wait_on_page_writeback instead of native function If a page is on writeback, f2fs can face with deadlock due to under writepages. This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw merged IOs, which is implemented by f2fs_wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:04 +09:00
Jaegeuk Kim	c81bf1c84f	f2fs: fix to write node pages with WRITE_SYNC This patch fixes performance regression of dbench reported by Alex <hbx7d@yandex.com>. This issue was revealed by Phoronix tests results: http://www.phoronix.com/scan.php?page=article&item=linux_314_ssdfs&num=2 It turns out that we need to assign WRITE_SYNC to the node writes, if fsync is triggered. The performance numbers are like below, which is measured by Alex. 1. 355MB/s ext4 2. 225MB/s f2fs : WRITE for node writes 3. 525MB/s f2fs : WRITE_SYNC for node writes Reported-And-Tested-by: Alex <hbx7d@yandex.com>. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-03 11:28:40 +09:00
Chao Yu	3375f696bd	f2fs: use inode mutex to keep atomicity of f2fs_falloc Previously without protection of inode mutex, f2fs_falloc and other data correlated operations will interfere with each other. So let's use inode mutex to keep atomicity of f2fs_falloc. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Linus Torvalds	bf3d846b78	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus assorted cleanups and fixes all over the place... There will be another pile later this week" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits) __dentry_path() fixes vfs: Remove second variable named error in __dentry_path vfs: Is mounted should be testing mnt_ns for NULL or error. Fix race when checking i_size on direct i/o read hfsplus: remove can_set_xattr nfsd: use get_acl and ->set_acl fs: remove generic_acl nfs: use generic posix ACL infrastructure for v3 Posix ACLs gfs2: use generic posix ACL infrastructure jfs: use generic posix ACL infrastructure xfs: use generic posix ACL infrastructure reiserfs: use generic posix ACL infrastructure ocfs2: use generic posix ACL infrastructure jffs2: use generic posix ACL infrastructure hfsplus: use generic posix ACL infrastructure f2fs: use generic posix ACL infrastructure ext2/3/4: use generic posix ACL infrastructure btrfs: use generic posix ACL infrastructure fs: make posix_acl_create more useful fs: make posix_acl_chmod more useful ...	2014-01-28 08:38:04 -08:00
Christoph Hellwig	a6dda0e63e	f2fs: use generic posix ACL infrastructure f2fs has some weird mode bit handling, so still using the old chmod code for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-01-25 23:58:19 -05:00
Chris Fries	6c311ec6c2	f2fs: clean checkpatch warnings Fixed a variety of trivial checkpatch warnings. The only delta should be some minor formatting on log strings that were split / too long. Signed-off-by: Chris Fries <cfries@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-20 10:27:12 +09:00
Jaegeuk Kim	fb5566da91	f2fs: improve write performance under frequent fsync calls When considering a bunch of data writes with very frequent fsync calls, we are able to think the following performance regression. N: Node IO, D: Data IO, IO scheduler: cfq Issue pending IOs D1 D2 D3 D4 D1 D2 D3 D4 N1 D2 D3 D4 N1 N2 N1 D3 D4 N2 D1 --> N1 can be selected by cfq becase of the same priority of N and D. Then D3 and D4 would be delayed, resuling in performance degradation. So, when processing the fsync call, it'd better give higher priority to data IOs than node IOs by assigning WRITE and WRITE_SYNC respectively. This patch improves the random wirte performance with frequent fsync calls by up to 10%. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-08 11:16:20 +09:00
Jaegeuk Kim	1e1bb4baf1	f2fs: add inline_data recovery routine This patch adds a inline_data recovery routine with the following policy. [prev.] [next] of inline_data flag o o -> recover inline_data o x -> remove inline_data, and then recover data blocks x o -> remove inline_data, and then recover inline_data x x -> recover data blocks Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:20 +09:00
Jaegeuk Kim	9e09fc855d	f2fs: refactor f2fs_convert_inline_data Change log from v1: o handle NULL pointer of grab_cache_page_write_begin() pointed by Chao Yu. This patch refactors f2fs_convert_inline_data to check a couple of conditions internally for deciding whether it needs to convert inline_data or not. So, the new f2fs_convert_inline_data initially checks: 1) f2fs_has_inline_data(), and 2) the data size to be changed. If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA, then we don't need to convert the inline_data with data allocation. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:19 +09:00
Jaegeuk Kim	8230a0a49f	f2fs: convert inline_data for punch_hole In the punch_hole(), let's convert inline_data all the time for simplicity and to avoid potential deadlock conditions. It is pretty much not a big deal to do this. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:12 +09:00
Jaegeuk Kim	f185ff979f	f2fs: don't need to get f2fs_lock_op for the inline_data test This patch locates checking the inline_data prior to calling f2fs_lock_op() in truncate_blocks(), since getting the lock is unnecessary. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-27 12:40:41 +09:00
Huajun Li	9ffe0fb5f3	f2fs: handle inline data operations Hook inline data read/write, truncate, fallocate, setattr, etc. Files need meet following 2 requirement to inline: 1) file size is not greater than MAX_INLINE_DATA; 2) file doesn't pre-allocate data blocks by fallocate(). FI_INLINE_DATA will not be set while creating a new regular inode because most of the files are bigger than ~3.4K. Set FI_INLINE_DATA only when data is submitted to block layer, ranther than set it while creating a new inode, this also avoids converting data from inline to normal data block and vice versa. While writting inline data to inode block, the first data block should be released if the file has a block indexed by i_addr[0]. On the other hand, when a file operation is appied to a file with inline data, we need to test if this file can remain inline by doing this operation, otherwise it should be convert into normal file by reserving a new data block, copying inline data to this new block and clear FI_INLINE_DATA flag. Because reserve a new data block here will make use of i_addr[0], if we save inline data in i_addr[0..872], then the first 4 bytes would be overwriten. This problem can be avoided simply by not using i_addr[0] for inline data. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 20:40:41 +09:00
Jaegeuk Kim	6bacf52fb5	f2fs: add unlikely() macro for compiler more aggressively This patch adds unlikely() macro into the most of codes. The basic rule is to add that when: - checking unusual errors, - checking page mappings, - and the other unlikely conditions. Change log from v1: - Don't add unlikely for the NULL test and error test: advised by Andi Kleen. Cc: Chao Yu <chao2.yu@samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:06 +09:00
Chao Yu	a66c7b2fcf	f2fs: remove unneeded code in punch_hole Because FALLOC_FL_PUNCH_HOLE flag must be ORed with FALLOC_FL_KEEP_SIZE in fallocate, so we could remove the useless 'keep size' branch code which will never be excuted in punch_hole. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Fan Li <fanofcode.li@samsung.com> [Jaegeuk Kim: remove an unnecessary parameter togather] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Huajun Li	b600965c43	f2fs: add a new function: f2fs_reserve_block() Add the function f2fs_reserve_block() to easily reserve new blocks, and use it to clean up more codes. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:03 +09:00
Jaegeuk Kim	cfe58f9dcd	f2fs: avoid to wait all the node blocks during fsync Previously, f2fs_sync_file() waits for all the node blocks to be written. But, we don't need to do that, but wait only the inode-related node blocks. This patch adds wait_on_node_pages_writeback() in which waits inode-related node blocks that are on writeback. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-31 16:01:03 +09:00
Jaegeuk Kim	5d56b6718a	f2fs: add an option to avoid unnecessary BUG_ONs If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS in your kernel config. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-29 15:44:38 +09:00
Jaegeuk Kim	e943a10d94	f2fs: add tracepoint for vm_page_mkwrite This patch adds a tracepoint for f2fs_vm_page_mkwrite. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:40 +09:00
Gu Zheng	e479556bfd	f2fs: use rw_sem instead of fs_lock(locks mutex) The fs_locks is used to block other ops(ex, recovery) when doing checkpoint. And each other operate routine(besides checkpoint) needs to acquire a fs_lock, there is a terrible problem here, if these are too many concurrency threads acquiring fs_lock, so that they will block each other and may lead to some performance problem, but this is not the phenomenon we want to see. Though there are some optimization patches introduced to enhance the usage of fs_lock, but the thorough solution is using a rw_sem to replace the fs_lock. Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other, this can avoid the problem described above completely. Because of the weakness of rw_sem, the above change may introduce a potential problem that the checkpoint thread might get starved if other threads are intensively locking the read semaphore for I/O.(Pointed out by Xu Jin) In order to avoid this, a wait_list is introduced, the appending read semaphore ops will be dropped into the wait_list if checkpoint thread is waiting for write semaphore, and will be waked up when checkpoint thread gives up write semaphore. Thanks to Kim's previous review and test, and will be very glad to see other guys' performance tests about this patch. V2: -fix the potential starvation problem. -use more suitable func name suggested by Xu Jin. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: adjust minor coding standard] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-07 11:33:05 +09:00
Jaegeuk Kim	de93653fe3	f2fs: reserve the xattr space dynamically This patch enables the number of direct pointers inside on-disk inode block to be changed dynamically according to the size of inline xattr space. The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file has inline xattr flag. The number of direct pointers that will be used by inline xattrs is defined as F2FS_INLINE_XATTR_ADDRS. Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-26 20:15:01 +09:00
Jaegeuk Kim	d71b5564c0	f2fs: introduce cur_cp_version function to reduce code size This patch introduces a new inline function, cur_cp_version, to reduce redundant codes. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 15:25:37 +09:00
Jaegeuk Kim	e518ff81c3	f2fs: fix inconsistency between xattr node blocks and its inode Previously xattr node blocks are stored to the COLD_NODE log, which means that our roll-forward mechanism doesn't recover the xattr node blocks at all. Only the direct node blocks in the WARM_NODE log can be recovered. So, let's resolve the issue simply by conducting checkpoint during fsync when a file has a modified xattr node block. This approach is able to degrade the performance, but normally the checkpoint overhead is shown at the initial fsync call after the xattr entry changes. Once the checkpoint is done, no additional overhead would be occurred. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 15:25:24 +09:00
Jaegeuk Kim	f0947e5cce	f2fs: fix i_name during f2fs_sync_file As similar as the i_pino fix, i_name also should be fixed when i_nlink is 1. The errorneous scenario is like this. 1. touch test1 2. link test1 test2 3. unlink test2 4. fsync test1 After this, i_name should be test1. CC: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:03 +09:00
Gu Zheng	4559071063	f2fs: introduce help function F2FS_NODE() Introduce help function F2FS_NODE() to simplify the conversion of node_page to f2fs_node. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:02 +09:00
Jaegeuk Kim	e5d2385ed6	f2fs: recover date requested by fdatasync In order to support SQLite that uses fdatasync instead of fsync, we should guarantee the data requested by fdatasync can be recovered after sudden-power- off. So, let's remove the fdatasync condition in f2fs_sync_file. Otherwise, we can restore the data after sudden-power-off due to nonexistence of any fsync mark'ed node blocks. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:02 +09:00
Jaegeuk Kim	354a3399dc	f2fs: recover wrong pino after checkpoint during fsync If a file is linked, f2fs loose its parent inode number so that fsync calls for the linked file should do checkpoint all the time. But, if we can recover its parent inode number after the checkpoint, we can adjust roll-forward mechanism for the further fsync calls, which is able to improve the fsync performance significatly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:45 +09:00
Jaegeuk Kim	b3783873cc	f2fs: avoid freqeunt write_inode calls If update_inode is called, we don't need to do write_inode. So, let's use a dirty flag for each inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:43 +09:00
Namjae Jeon	d7cc950b4c	f2fs: optimise the truncate_data_blocks_range() range The function truncate_data_blocks_range() decrements the valid block count of inode via dec_valid_block_count(). Since this function updates the i_blocks field of inode, we can update this field once we have calculated total the number of blocks to be freed. Therefore we can decrement valid blocks outside of the for loop. if (nr_free) { + dec_valid_block_count(sbi, dn->inode, nr_free); set_page_dirty(dn->node_page); sync_inode_page(dn); } 'nr_free' tells the total number of blocks freed. So, we can just directly pass this value to dec_valid_block_count() and update the i_blocks. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:42 +09:00
Namjae Jeon	6a3e8ef0de	f2fs: use the F2FS specific flags in f2fs_ioctl() In f2fs_ioctl() function, it is using generic flags. Since F2FS specific flags are defined. So lets use those flags. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:26 +09:00
Jaegeuk Kim	2d4d9fb591	f2fs: fix i_blocks translation on various types of files Basically an inode manages the number of allocated blocks with inode->i_blocks which is represented in a unit of sectors, not file system blocks. But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr translates it to the number of sectors when fstat is called. However, previously f2fs_file_inode_operations only has this, so this patch adds it to all the types of inode_operations. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-11 16:01:09 +09:00
Jaegeuk Kim	b292dcab06	f2fs: reuse the locked dnode page and its inode This patch fixes the following deadlock bug during the recovery. INFO: task mount:1322 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount D ffffffff81125870 0 1322 1266 0x00000000 ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046 ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8 ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520 Call Trace: [<ffffffff81125870>] ? __lock_page+0x70/0x70 [<ffffffff816a92d9>] schedule+0x29/0x70 [<ffffffff816a93af>] io_schedule+0x8f/0xd0 [<ffffffff8112587e>] sleep_on_page+0xe/0x20 [<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0 [<ffffffff81125867>] __lock_page+0x67/0x70 [<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40 [<ffffffff81126857>] find_lock_page+0x67/0x80 [<ffffffff8112698f>] find_or_create_page+0x3f/0xb0 [<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs] [<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs] [<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs] [<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40 [<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs] [<ffffffff81184cf9>] mount_bdev+0x1c9/0x210 [<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs] [<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff81185a13>] mount_fs+0x43/0x1b0 [<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20 [<ffffffff811a0796>] vfs_kern_mount+0x76/0x120 [<ffffffff811a2cb7>] do_mount+0x237/0xa10 [<ffffffff81140b9b>] ? strndup_user+0x5b/0x80 [<ffffffff811a3520>] SyS_mount+0x90/0xe0 [<ffffffff816b3502>] system_call_fastpath+0x16/0x1b The bug is triggered when check_index_in_prev_nodes tries to get the direct node page by calling get_node_page. At this point, if the direct node page is already locked by get_dnode_of_data, its caller, we got a deadlock condition. This patch adds additional condition check for the reuse of locked direct node pages prior to the get_node_page call. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:04 +09:00
Jaegeuk Kim	77888c1e42	f2fs: add f2fs_readonly() Introduce a simple macro function for readability. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:03 +09:00
Namjae Jeon	9851e6e189	f2fs: reorganize f2fs_vm_page_mkwrite Few things can be changed in the default mkwrite function 1) Make file_update_time at the start before acquiring any lock 2) the condition page_offset(page) >= i_size_read(inode) should be changed to page_offset(page) > i_size_read 3) Move wait_on_page_writeback. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:02 +09:00
Jaegeuk Kim	64aa7ed98d	f2fs: change get_new_data_page to pass a locked node page This patch is for passing a locked node page to get_dnode_of_data. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:01 +09:00
Linus Torvalds	942d33da99	f2fs updates for v3.10 This patch-set includes the following major enhancement patches. o introduce a new gloabl lock scheme o add tracepoints on several major functions o fix the overall cleaning process focused on victim selection o apply the block plugging to merge IOs as much as possible o enhance management of free nids and its list o enhance the readahead mode for node pages o address several cretical deadlock conditions o reduce lock_page calls The other minor bug fixes and enhancements are as follows. o calculation mistakes: overflow o bio types: READ, READA, and READ_SYNC o fix the recovery flow, data races, and null pointer errors -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRijCLAAoJEEAUqH6CSFDSg9kQAIqxmQzCUvCN3HcyVe8bGhKz 8xhKrAY6ySRCKMuBbFRQsNrXUhckE3A44DgzYm5/gQikr/c8zhbqPVrtZ968eCKb wm3J+Re/uwZr5eOXlJEaHIiSkMDtERN7Cu2oYJWZi2B9wCSZcgvoWQ3c3LUVk6yF GFdi1Y00ll5tFKbEGbXSsfdul9P8jp0MmuMnWBBQZF3TrjETXMdThA5FXN0yTf9s XkcGE9vTCCPk8p7P3YmGGw6CwlaL8oallm0//iL4nMNpJzveq2C09IlY2BNrxU3L iTNXeIBdbhwXpnh2zq26Cy+cIEDIp0oXYui5BYdr/LWyWU3T/INa+hjUUszsESxF 51LIUA1rA9nX/BSmj2QomswZ3lt4u5jl6rSBFKv3NG1KsFrAdb8S4tHukRSTSxAJ gzpY6kLT1+bgciA16F5W4yhzMYPN5hPa8s6hx4LHlpoqQICQsurjtS9KW7vncLFt ttmCMn8ehHcTzKRNNqYaBerCtSB3Z3G/uAy1y+DB7Zx2h2mqhCBXRalyRvs7RKvK d5OyYCpHntxuzDwVuivnr9Ddp30LUP1WqexxK+ykn99Ji3leMmffHP8Oari8w96b RxSbjoo8hOgoS5xZ4v3AaqtLDlBpxC6oWJzDaq/fJeKxOx22Z5BDFUM9mBGxrouJ AATl8b+cW/aTZ4l7WOPU =Hqii -----END PGP SIGNATURE----- Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches. - introduce a new gloabl lock scheme - add tracepoints on several major functions - fix the overall cleaning process focused on victim selection - apply the block plugging to merge IOs as much as possible - enhance management of free nids and its list - enhance the readahead mode for node pages - address several cretical deadlock conditions - reduce lock_page calls The other minor bug fixes and enhancements are as follows. - calculation mistakes: overflow - bio types: READ, READA, and READ_SYNC - fix the recovery flow, data races, and null pointer errors" * tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits) f2fs: cover free_nid management with spin_lock f2fs: optimize scan_nat_page() f2fs: code cleanup for scan_nat_page() and build_free_nids() f2fs: bugfix for alloc_nid_failed() f2fs: recover when journal contains deleted files f2fs: continue to mount after failing recovery f2fs: avoid deadlock during evict after f2fs_gc f2fs: modify the number of issued pages to merge IOs f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry. f2fs: fix inconsistent using of NM_WOUT_THRESHOLD f2fs: check truncation of mapping after lock_page f2fs: enhance alloc_nid and build_free_nids flows f2fs: add a tracepoint on f2fs_new_inode f2fs: check nid == 0 in add_free_nid f2fs: add REQ_META about metadata requests for submit f2fs: give a chance to merge IOs by IO scheduler f2fs: avoid frequent background GC f2fs: add tracepoints to debug checkpoint request f2fs: add tracepoints for write page operations f2fs: add tracepoints to debug the block allocation ...	2013-05-08 15:11:48 -07:00
Jaegeuk Kim	afcb7ca01f	f2fs: check truncation of mapping after lock_page We call lock_page when we need to update a page after readpage. Between grab and lock page, the page can be truncated by other thread. So, we should check the page after lock_page whether it was truncated or not. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-29 11:19:32 +09:00
Jaegeuk Kim	c718379b6b	f2fs: give a chance to merge IOs by IO scheduler Previously, background GC submits many 4KB read requests to load victim blocks and/or its (i)node blocks. ... f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0] f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0] f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0] ... However, by the fact that many IOs are sequential, we can give a chance to merge the IOs by IO scheduler. In order to do that, let's use blk_plug. ... f2fs_gc : f2fs_iget: ino = 143 f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee f2fs_gc : f2fs_iget: ino = 143 f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef <idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0] <idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0] <idle> : block_rq_complete: 8,16 R () `1520432` + 96 [0] <idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0] <idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0] <idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0] <idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0] <idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0] <idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0] ... Note that this issue should be addressed in checkpoint, and some readahead flows too. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-26 10:35:10 +09:00
Namjae Jeon	c01e285324	f2fs: add tracepoints to debug the block allocation Add tracepoints to debug the block allocation & fallocate. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> [Jaegeuk: enhance information] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-23 18:15:16 +09:00
Namjae Jeon	51dd624934	f2fs: add tracepoints for truncate operation add tracepoints for tracing the truncate operations like truncate node/data blocks, f2fs_truncate etc. Tracepoints are added at entry and exit of operation to trace the success & failure of operation. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> [Jaegeuk: combine and modify the tracepoint structures] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-23 16:40:38 +09:00
Namjae Jeon	a2a4a7e4ab	f2fs: add tracepoints for sync & inode operations Add tracepoints in f2fs for tracing the syncing operations like filesystem sync, file sync enter/exit. It will helf to trace the code under debugging scenarios. Also add tracepoints for tracing the various inode operations like building inode, eviction of inode, link/unlike of inodes. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> [Jaegeuk: combine and modify the tracepoint structures] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-23 15:30:27 +09:00
Al Viro	bdaec334bb	f2fs: use mnt_want_write_file() in ioctl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-09 14:12:56 -04:00
Jaegeuk Kim	399368372e	f2fs: introduce a new global lock scheme In the previous version, f2fs uses global locks according to the usage types, such as directory operations, block allocation, block write, and so on. Reference the following lock types in f2fs.h. enum lock_type { RENAME, /* for renaming operations / DENTRY_OPS, / for directory operations / DATA_WRITE, / for data write / DATA_NEW, / for data allocation / DATA_TRUNC, / for data truncate / NODE_NEW, / for node allocation / NODE_TRUNC, / for node truncate / NODE_WRITE, / for node write */ NR_LOCK_TYPE, }; In that case, we lose the performance under the multi-threading environment, since every types of operations must be conducted one at a time. In order to address the problem, let's share the locks globally with a mutex array regardless of any types. So, let users grab a mutex and perform their jobs in parallel as much as possbile. For this, I propose a new global lock scheme as follows. 0. Data structure - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS] - f2fs_sb_info -> node_write 1. mutex_lock_op(sbi) - try to get an avaiable lock from the array. - returns the index of the gottern lock variable. 2. mutex_unlock_op(sbi, index of the lock) - unlock the given index of the lock. 3. mutex_lock_all(sbi) - grab all the locks in the array before the checkpoint. 4. mutex_unlock_all(sbi) - release all the locks in the array after checkpoint. 5. block_operations() - call mutex_lock_all() - sync_dirty_dir_inodes() - grab node_write - sync_node_pages() Note that, the pairs of mutex_lock_op()/mutex_unlock_op() and mutex_lock_all()/mutex_unlock_all() should be used together. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-09 18:21:18 +09:00
Jason Hrycay	1127a3d448	f2fs: move f2fs_balance_fs from truncate to punch_hole Move the f2fs_balance_fs out of the truncate_hole function and only perform that in punch_hole use case. The commit: ed60b1644e7f7e5dd67d21caf7e4425dff05dad0 intended to do this but moved it into truncate_hole to cover more cases. However, a deadlock scenario is possible when deleting an inode entry under specific conditions: f2fs_delete_entry() mutex_lock_op(sbi, DENTRY_OPS); truncate_hole() f2fs_balance_fs() mutex_lock(&sbi->gc_mutex); f2fs_gc() write_checkpoint() block_operations() mutex_lock_op(sbi, DENTRY_OPS); Lets move it into the punch_hole case to cover the original intent of avoiding it during fallocate's expand_inode_data case. Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-09 17:22:45 +09:00
Jaegeuk Kim	953a3e27e1	f2fs: fix to give correct parent inode number for roll forward When we recover fsync'ed data after power-off-recovery, we should guarantee that any parent inode number should be correct for each direct inode blocks. So, let's make the following rules. - The fsync should do checkpoint to all the inodes that were experienced hard links. - So, the only normal files can be recovered by roll-forward. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-27 09:16:25 +09:00
Jaegeuk Kim	0ff153a2f1	f2fs: do not skip writing file meta during fsync This patch removes data_version check flow during the fsync call. The original purpose for the use of data_version was to avoid writng inode pages redundantly by the fsync calls repeatedly. However, when user can modify file meta and then call fsync, we should not skip fsync procedure. So, let's remove this condition check and hope that user triggers in right manner. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-27 09:16:16 +09:00
Jaegeuk Kim	ae51fb31b8	f2fs: fix to call WRITE_FLUSH at the end of fsync The fsync call should be ended after flushing the in-device caches. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:14 +09:00
Jaegeuk Kim	266e97a81c	f2fs: introduce readahead mode of node pages Previously, f2fs reads several node pages ahead when get_dnode_of_data is called with RDONLY_NODE flag. And, this flag is set by the following functions. - get_data_block_ro - get_lock_data_page - do_write_data_page - truncate_blocks - truncate_hole However, this readahead mechanism is initially introduced for the use of get_data_block_ro to enhance the sequential read performance. So, let's clarify all the cases with the additional modes as follows. enum { ALLOC_NODE, /* allocate a new node page if needed / LOOKUP_NODE, / look up a node without readahead / LOOKUP_NODE_RA, / * look up a node with readahead called * by get_datablock_ro. */ } Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>	2013-03-18 21:00:33 +09:00
Al Viro	6131ffaa1f	more file_inode() open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-27 16:59:05 -05:00
Namjae Jeon	e975082411	f2fs: add compat_ioctl to provide backward compatability adding compat_ioctl to provide support for backward comptability - 32bit binary execution on 64bit kernel. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-02-12 07:15:02 +09:00
Changman Lee	facb020540	f2fs: stop repeated checking if cp is needed If it is decided that f2fs should do checkpoint, skip next comparison. Signed-off-by: Changman Lee <cm224.lee@samsung.com>	2013-02-12 07:15:01 +09:00
Jaegeuk Kim	d4686d56ec	f2fs: avoid balanc_fs during evict_inode 1. Background Previously, if f2fs tries to move data blocks of an evicting inode during the cleaning process, it stops the process incompletely and then restarts the whole process, since it needs a locked inode to grab victim data pages in its address space. In order to get a locked inode, iget_locked() by f2fs_iget() is normally used, but, it waits if the inode is on freeing. So, here is a deadlock scenario. 1. f2fs_evict_inode() <- inode "A" 2. f2fs_balance_fs() 3. f2fs_gc() 4. gc_data_segment() 5. f2fs_iget() <- inode "A" too! If step #1 and #5 treat a same inode "A", step #5 would fall into deadlock since the inode "A" is on freeing. In order to resolve this, f2fs_iget_nowait() which skips __wait_on_freeing_inode() was introduced in step #5, and stops f2fs_gc() to complete f2fs_evict_inode(). 1. f2fs_evict_inode() <- inode "A" 2. f2fs_balance_fs() 3. f2fs_gc() 4. gc_data_segment() 5. f2fs_iget_nowait() <- inode "A", then stop f2fs_gc() w/ -ENOENT 2. Problem and Solution In the above scenario, however, f2fs cannot finish f2fs_evict_inode() only if: o there are not enough free sections, and o f2fs_gc() tries to move data blocks of the evicting inode repeatedly. So, the final solution is to use f2fs_iget() and remove f2fs_balance_fs() in f2fs_evict_inode(). The f2fs_evict_inode() actually truncates all the data and node blocks, which means that it doesn't produce any dirty node pages accordingly. So, we don't need to do f2fs_balance_fs() in practical. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-02-12 07:15:01 +09:00
Jaegeuk Kim	bd43df021a	f2fs: cover global locks for reserve_new_block The fill_zero() from fallocate() calls get_new_data_page() in which calls reserve_new_block(). The reserve_new_block() should be covered by DATA_NEW, one of global locks. And also, before getting the lock, we should check free sections by calling f2fs_balance_fs(). If we break this rule, f2fs is able to face with out-of-control free space management and fall into infinite loop like the following scenario as well. [f2fs_sync_fs()] [fallocate()] - write_checkpoint() - fill_zero() - block_operations() - get_new_data_page() : grab NODE_NEW - get_dnode_of_data() : get locked dirty node page - sync_node_pages() : try to grab NODE_NEW for data allocation : trylock and skip the dirty node page : call sync_node_pages() repeatedly in order to flush all the dirty node pages! In order to avoid this, we should grab another global lock such as DATA_NEW before calling get_new_data_page() in fill_zero(). Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-02-12 07:15:00 +09:00
Jaegeuk Kim	692bb55d1a	f2fs: add remap_pages as generic_file_remap_pages This was added for all the file systems before. See the following commit. commit id: `0b173bc4da` [PATCH] mm: kill vma flag VM_CAN_NONLINEAR This patch moves actual ptes filling for non-linear file mappings into special vma operation: ->remap_pages(). File system must implement this method to get non-linear mappings support, if it uses filemap_fault() then generic_file_remap_pages() can be used. Now device drivers can implement this method and obtain nonlinear vma support." Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-01-22 10:48:58 +09:00
Jaegeuk Kim	9eaeba7013	f2fs: move f2fs_balance_fs to punch_hole The f2fs_fallocate() has two operations: punch_hole and expand_size. Only in the case of punch_hole, dirty node pages can be produced, so let's trigger f2fs_balance_fs() in this case only. Furthermore, let's trigger it at every data truncation routine. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-01-11 15:09:23 +09:00
Jaegeuk Kim	7d82db8316	f2fs: add f2fs_balance_fs in several interfaces The f2fs_balance_fs() is to check the number of free sections and decide whether it needs to conduct cleaning or not. If there are not enough free sections, the cleaning job should be started. In order to control an amount of free sections even under high utilization, f2fs should call f2fs_balance_fs at all the VFS interfaces that are able to produce dirty pages. This patch adds the function calls in the missing interfaces as follows. 1. f2fs_setxattr() The f2fs_setxattr() produces dirty node pages so that we should call f2fs_balance_fs() either likewise doing in other VFS interfaces such as f2fs_lookup(), f2fs_mkdir(), and so on. 2. f2fs_sync_file() We should guarantee serving free sections for syncing metadata during fsync. Previously, there is no space check before triggering checkpoint and sync_node_pages. Therefore, if a bunch of fsync calls are triggered under 100% of FS utilization, f2fs is able to be faced with no free sections, resulting in BUG_ON(). 3. f2fs_sync_fs() Before calling write_checkpoint(), we should guarantee that there are minimum free sections. 4. f2fs_write_inode() f2fs_write_inode() is also able to produce dirty node pages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-01-11 15:09:17 +09:00
Namjae Jeon	3af60a49fd	f2fs: fix time update in case of f2fs fallocate After doing a punch hole or expanding inode doing fallocation. The change and modification time are not update for the file. So, update time after no issue is observed in fallocate. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-01-04 09:42:59 +09:00
Jaegeuk Kim	398b1ac5a5	f2fs: fix handling errors got by f2fs_write_inode Ruslan reported that f2fs hangs with an infinite loop in f2fs_sync_file(): while (sync_node_pages(sbi, inode->i_ino, &wbc) == 0) f2fs_write_inode(inode, NULL); The reason was revealed that the cold flag is not set even thought this inode is a normal file. Therefore, sync_node_pages() skips to write node blocks since it only writes cold node blocks. The cold flag is stored to the node_footer in node block, and whenever a new node page is allocated, it is set according to its file type, file or directory. But, after sudden-power-off, when recovering the inode page, f2fs doesn't recover its cold flag. So, let's assign the cold flag in more right places. One more thing: If f2fs_write_inode() returns an error due to whatever situations, there would be no dirty node pages so that sync_node_pages() returns zero. (i.e., zero means nothing was written.) Reported-by: Ruslan N. Marchenko <me@ruff.mobi> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-26 10:39:52 +09:00
Wei Yongjun	705f814e34	f2fs: remove unused variable The variables node_page and page_offset are initialized but never used otherwise, so remove those unused variables. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>	2012-12-11 13:43:44 +09:00
Namjae Jeon	1fa95b0b67	f2fs: check read only condition before beginning write out If the filesystem is mounted as read-only then return from that point itself instead of first doing a writeout/wait and then checking for read-only condition. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>	2012-12-11 13:43:43 +09:00
Jaegeuk Kim	0a8165d7c2	f2fs: adjust kernel coding style As pointed out by Randy Dunlap, this patch removes all usage of "/*" for comment blocks. Instead, just use "/". Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:42 +09:00
Jaegeuk Kim	fbfa2cc58d	f2fs: add file operations This adds memory operations and file/file_inode operations. - F2FS supports fallocate(), mmap(), fsync(), and basic ioctl(). Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2012-12-11 13:43:41 +09:00

1 2 3 4 5

213 commits