remarkable-linux

redonkable

Author	SHA1	Message	Date
Jaegeuk Kim	1e87a78d95	f2fs: avoid to conduct roll-forward due to the remained garbage blocks The f2fs always scans the next chain of direct node blocks. But some garbage blocks are able to be remained due to no discard support or SSR triggers. This occasionally wreaks recovering wrong inodes that were used or BUG_ONs due to reallocating node ids as follows. When mount this f2fs image: http://linuxtesting.org/downloads/f2fs_fault_image.zip BUG_ON is triggered in f2fs driver (messages below are generated on kernel 3.13.2; for other kernels output is similar): kernel BUG at fs/f2fs/node.c:215! Call Trace: [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs] [<ffffffff811446e7>] ? __lock_page+0x67/0x70 [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50 [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs] [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20 [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80 [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs] [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210 [<ffffffff811b8769>] mount_bdev+0x1c9/0x210 [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs] [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs] [<ffffffff811b9497>] mount_fs+0x47/0x1c0 [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20 [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110 [<ffffffff811d6763>] do_mount+0x493/0x910 [<ffffffff811615cb>] ? strndup_user+0x5b/0x80 [<ffffffff811d6c70>] SyS_mount+0x90/0xe0 [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b Found by Linux File System Verification project (linuxtesting.org). Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:54 +09:00
Gu Zheng	b270ad6f0a	f2fs: enable flush_merge only in f2fs is not read-only Enable flush_merge only in f2fs is not read-only, so does the mount option show. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:54 +09:00
Gu Zheng	197d46476c	f2fs: use __GFP_ZERO to avoid appending set-NULL Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:53 +09:00
Gu Zheng	a4ed23f2f1	f2fs: put the bio when issue_flush completed Put the bio when the flush cmd issued, it also can fix the following kmemleak: unreferenced object 0xffff8800270c73c0 (size 200): comm "f2fs_flush-7:0", pid 27161, jiffies 4312127988 (age 988.503s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 40 07 81 19 01 88 ff ff ........@....... 01 00 00 00 00 00 00 f0 11 14 00 00 00 00 00 00 ................ backtrace: [<ffffffff81559866>] kmemleak_alloc+0x72/0x96 [<ffffffff81156f7e>] slab_post_alloc_hook+0x28/0x2a [<ffffffff811595b1>] kmem_cache_alloc+0xec/0x157 [<ffffffff8111924d>] mempool_alloc_slab+0x15/0x17 [<ffffffff81119513>] mempool_alloc+0x71/0x138 [<ffffffff81193548>] bio_alloc_bioset+0x93/0x18c [<ffffffffa040f857>] issue_flush_thread+0x8d/0x145 [f2fs] [<ffffffff8107ac16>] kthread+0xba/0xc2 [<ffffffff81571b2c>] ret_from_fork+0x7c/0xb0 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:53 +09:00
Al Viro	8174202b34	write_iter variants of {__,}generic_file_aio_write() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:38:00 -04:00
Al Viro	aad4f8bb42	switch simple generic_file_aio_read() users to ->read_iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:37:55 -04:00
Al Viro	5b46f25ddc	f2fs: switch to iov_iter_alignment() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:32:52 -04:00
Al Viro	31b140398c	switch {__,}blockdev_direct_IO() to iov_iter Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:32:46 -04:00
Al Viro	d8d3d94b80	pass iov_iter to ->direct_IO() unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:32:44 -04:00
Linus Torvalds	26c12d9334	Merge branch 'akpm' (incoming from Andrew) Merge second patch-bomb from Andrew Morton: - the rest of MM - zram updates - zswap updates - exit - procfs - exec - wait - crash dump - lib/idr - rapidio - adfs, affs, bfs, ufs - cris - Kconfig things - initramfs - small amount of IPC material - percpu enhancements - early ioremap support - various other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits) MAINTAINERS: update Intel C600 SAS driver maintainers fs/ufs: remove unused ufs_super_block_third pointer fs/ufs: remove unused ufs_super_block_second pointer fs/ufs: remove unused ufs_super_block_first pointer fs/ufs/super.c: add __init to init_inodecache() doc/kernel-parameters.txt: add early_ioremap_debug arm64: add early_ioremap support arm64: initialize pgprot info earlier in boot x86: use generic early_ioremap mm: create generic early_ioremap() support x86/mm: sparse warning fix for early_memremap lglock: map to spinlock when !CONFIG_SMP percpu: add preemption checks to __this_cpu ops vmstat: use raw_cpu_ops to avoid false positives on preemption checks slub: use raw_cpu_inc for incrementing statistics net: replace __this_cpu_inc in route.c with raw_cpu_inc modules: use raw_cpu_write for initialization of per cpu refcount. mm: use raw_cpu ops for determining current NUMA node percpu: add raw_cpu_ops slub: fix leak of 'name' in sysfs_slab_add ...	2014-04-07 16:38:06 -07:00
Kirill A. Shutemov	f1820361f8	mm: implement ->map_pages for page cache filemap_map_pages() is generic implementation of ->map_pages() for filesystems who uses page cache. It should be safe to use filemap_map_pages() for ->map_pages() if filesystem use filemap_fault() for ->fault(). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Ning Qu <quning@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-07 16:35:53 -07:00
Linus Torvalds	3021112598	f2fs updates for v3.15 This patch-set includes the following major enhancement patches. o introduce large directory support o introduce f2fs_issue_flush to merge redundant flush commands o merge write IOs as much as possible aligned to the segment o add sysfs entries to tune the f2fs configuration o use radix_tree for the free_nid_list to reduce in-memory operations o remove costly bit operations in f2fs_find_entry o enhance the readahead flow for CP/NAT/SIT/SSA blocks The other bug fixes are as follows. o recover xattr node blocks correctly after sudden-power-cut o fix to calculate the maximum number of node ids o enhance to handle many error cases And, there are a bunch of cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJTQiQrAAoJEEAUqH6CSFDSlbIP/iq06BrUeMDLoQFhA2GQFKFD wd0A5h9hCiFcKBcI/u/aAQqj/a5wdwzDl9XzH2PzJ45IM6sVGQZ0lv+kdLhab6rk ipNbV7G0yLAX+8ygS6GZF7pSKfMzGSGTrRvfdtoiunIip1jCY1IkUxv1XMgBSPza wnWYrE5HXEqRUDCqPXJyxrPmx0/0jw8/V82Ng9stnY34ySs+l/3Pvg65Kh0QuSSy BRjJUGlOCF68KUBKd+6YB2T5KlbQde3/5lhP+GMOi+xm5sFB+j+59r/WpJpF2Nxs ImxQs5GkiU01ErH/rn5FgHY/zzddQenBKwOvrjEeUA1eVpBurdsIr1JN0P6qDbgB ho5U8LzCQq+HZiW444eQGkXSOagpUKqDhTVJO7Fji/wG88Atc9gLX3ix8TH2skxT C5CvvrJM7DKBtkZyTzotKY/cWorOZhge6E/EkbGaM1sSHdK5b1Rg4YlFi9TDyz0n QjGD1uuvEeukeKGdIG9pjc7o5ledbMDYwLpT2RuRXenLOTsn8BqDOo9aRTg+5Kag tJNJLFumjPR2mEBNKjicJMUf381J/SKDwZszAz9mgvCZXldMza/Ax0LzJDJCVmkP UuBiVzGxVzpd33IsESUDr0J9hc+t8kS10jfAeKnE3cpb6n7/RYxstHh6CHOFKNXM gPUSYPN3CYiP47DnSfzA =eSW+ -----END PGP SIGNATURE----- Merge tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches. - introduce large directory support - introduce f2fs_issue_flush to merge redundant flush commands - merge write IOs as much as possible aligned to the segment - add sysfs entries to tune the f2fs configuration - use radix_tree for the free_nid_list to reduce in-memory operations - remove costly bit operations in f2fs_find_entry - enhance the readahead flow for CP/NAT/SIT/SSA blocks The other bug fixes are as follows: - recover xattr node blocks correctly after sudden-power-cut - fix to calculate the maximum number of node ids - enhance to handle many error cases And, there are a bunch of cleanups" * tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (62 commits) f2fs: fix wrong statistics of inline data f2fs: check the acl's validity before setting f2fs: introduce f2fs_issue_flush to avoid redundant flush issue f2fs: fix to cover io->bio with io_rwsem f2fs: fix error path when fail to read inline data f2fs: use list_for_each_entry{_safe} for simplyfying code f2fs: avoid free slab cache under spinlock f2fs: avoid unneeded lookup when xattr name length is too long f2fs: avoid unnecessary bio submit when wait page writeback f2fs: return -EIO when node id is not matched f2fs: avoid RECLAIM_FS-ON-W warning f2fs: skip unnecessary node writes during fsync f2fs: introduce fi->i_sem to protect fi's info f2fs: change reclaim rate in percentage f2fs: add missing documentation for dir_level f2fs: remove unnecessary threshold f2fs: throttle the memory footprint with a sysfs entry f2fs: avoid to drop nat entries due to the negative nr_shrink f2fs: call f2fs_wait_on_page_writeback instead of native function f2fs: introduce nr_pages_to_write for segment alignment ...	2014-04-07 10:55:36 -07:00
Chao Yu	48b230a583	f2fs: fix wrong statistics of inline data If we remove a file that has inline data after mount, our statistics turns to inaccurate. cat /sys/kernel/debug/f2fs/status - Inline_data Inode: 4294967295 Let's add stat_inc_inline_inode() to stat inline info of the file when lookup. Change log from v1: o stat in f2fs_lookup() instead of in do_read_inode() for excluding wrong stat. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-07 12:40:58 +09:00
ZhangZhen	3a8861e271	f2fs: check the acl's validity before setting Before setting the acl, call posix_acl_valid() to check if it is valid or not. Signed-off-by: zhangzhen <zhenzhang.zhang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-07 12:18:30 +09:00
Jaegeuk Kim	6b4afdd794	f2fs: introduce f2fs_issue_flush to avoid redundant flush issue Some storage devices show relatively high latencies to complete cache_flush commands, even though their normal IO speed is prettry much high. In such the case, it needs to merge cache_flush commands as much as possible to avoid issuing them redundantly. So, this patch introduces a mount option, "-o flush_merge", to mitigate such the overhead. If this option is enabled by user, F2FS merges the cache_flush commands and then issues just one cache_flush on behalf of them. Once the single command is finished, F2FS sends a completion signal to all the pending threads. Note that, this option can be used under a workload consisting of very intensive concurrent fsync calls, while the storage handles cache_flush commands slowly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-07 09:50:58 +09:00
Linus Torvalds	24e7ea3bea	Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block. Other than that, the usual clean ups and minor bug fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTPbD2AAoJENNvdpvBGATwDmUQANSfGYIQazB8XKKgtNTMiG/Y Ky7n1JzN9lTX/6nMsqQnbfCweLRmxqpWUBuyKDRHUi8IG0/voXSTFsAOOgz0R15A ERRRWkVvHixLpohuL/iBdEMFHwNZYPGr3jkm0EIgzhtXNgk5DNmiuMwvHmCY27kI kdNZIw9fip/WRNoFLDBGnLGC37aanoHhCIbVlySy5o9LN1pkC8BgXAYV0Rk19SVd bWCudSJEirFEqWS5H8vsBAEm/ioxTjwnNL8tX8qms6orZ6h8yMLFkHoIGWPw3Q15 a0TSUoMyav50Yr59QaDeWx9uaPQVeK41wiYFI2rZOnyG2ts0u0YXs/nLwJqTovgs rzvbdl6cd3Nj++rPi97MTA7iXK96WQPjsDJoeeEgnB0d/qPyTk6mLKgftzLTNgSa ZmWjrB19kr6CMbebMC4L6eqJ8Fr66pCT8c/iue8wc4MUHi7FwHKH64fqWvzp2YT/ +165dqqo2JnUv7tIp6sUi1geun+bmDHLZFXgFa7fNYFtcU3I+uY1mRr3eMVAJndA 2d6ASe/KhQbpVnjKJdQ8/b833ZS3p+zkgVPrd68bBr3t7gUmX91wk+p1ct6rUPLr 700F+q/pQWL8ap0pU9Ht/h3gEJIfmRzTwxlOeYyOwDseqKuS87PSB3BzV3dDunSU DrPKlXwIgva7zq5/S0Vr =4s1Z -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block. Other than that, the usual clean ups and minor bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits) ext4: fix premature freeing of partial clusters split across leaf blocks ext4: remove unneeded test of ret variable ext4: fix comment typo ext4: make ext4_block_zero_page_range static ext4: atomically set inode->i_flags in ext4_set_inode_flags() ext4: optimize Hurd tests when reading/writing inodes ext4: kill i_version support for Hurd-castrated file systems ext4: each filesystem creates and uses its own mb_cache fs/mbcache.c: doucple the locking of local from global data fs/mbcache.c: change block and index hash chain to hlist_bl_node ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate ext4: refactor ext4_fallocate code ext4: Update inode i_size after the preallocation ext4: fix partial cluster handling for bigalloc file systems ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents ext4: only call sync_filesystm() when remounting read-only fs: push sync_filesystem() down to the file system's remount_fs() jbd2: improve error messages for inconsistent journal heads jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() jbd2: minimize region locked by j_list_lock in journal_get_create_access() ...	2014-04-04 15:39:39 -07:00
Johannes Weiner	91b0abe36a	mm + fs: store shadow entries in page cache Reclaim will be leaving shadow entries in the page cache radix tree upon evicting the real page. As those pages are found from the LRU, an iput() can lead to the inode being freed concurrently. At this point, reclaim must no longer install shadow pages because the inode freeing code needs to ensure the page tree is really empty. Add an address_space flag, AS_EXITING, that the inode freeing code sets under the tree lock before doing the final truncate. Reclaim will check for this flag before installing shadow pages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-03 16:21:01 -07:00
Jaegeuk Kim	ce23447fe5	f2fs: fix to cover io->bio with io_rwsem In the f2fs_wait_on_page_writeback, io->bio should be covered by io_rwsem. Otherwise, the bio pointer can become a dangling pointer due to data races. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-02 09:56:27 +09:00
Chao Yu	d54c795b49	f2fs: fix error path when fail to read inline data We should unlock page in ->readpage() path and also should unlock & release page in error path of ->write_begin() to avoid deadlock or memory leak. So let's add release code to fix the problem when we fail to read inline data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-02 09:56:27 +09:00
Chao Yu	2d7b822ad9	f2fs: use list_for_each_entry{_safe} for simplyfying code This patch use list_for_each_entry{_safe} instead of list_for_each{_safe} for simplfying code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-02 09:56:27 +09:00
Chao Yu	cf0ee0f09b	f2fs: avoid free slab cache under spinlock Move kmem_cache_free out of spinlock protection region for better performance. Change log from v1: o remove spinlock protection for kmem_cache_free in destroy_node_manager suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-02 09:56:12 +09:00
Chao Yu	6e452d69d4	f2fs: avoid unneeded lookup when xattr name length is too long In f2fs_setxattr we have limit this attribute name length, so we should also check it in f2fs_getxattr to avoid useless lookup caused by invalid name length. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-01 18:54:24 +09:00
Chao Yu	df0f8dc0e1	f2fs: avoid unnecessary bio submit when wait page writeback This patch introduce is_merged_page() to check whether current page is merged in f2fs bio cache. When page is not in cache, we can avoid submitting bio cache, resulting in having more chance to merge pages. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-01 18:53:41 +09:00
Jaegeuk Kim	3bb5e2c8fe	f2fs: return -EIO when node id is not matched During the cleaing of node segments, F2FS can get errored node blocks due to data race between node page lock and its valid bitmap operations. In that case, it needs to return an error to skip such the obsolete block copy. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-01 17:38:26 +09:00
Jaegeuk Kim	808a1d7490	f2fs: avoid RECLAIM_FS-ON-W warning This patch should resolve the following possible bug. RECLAIM_FS-ON-W at: mark_held_locks+0xb9/0x140 lockdep_trace_alloc+0x85/0xf0 __kmalloc+0x53/0x1d0 read_all_xattrs+0x3d1/0x3f0 [f2fs] f2fs_getxattr+0x4f/0x100 [f2fs] f2fs_get_acl+0x4c/0x290 [f2fs] get_acl+0x4f/0x80 posix_acl_create+0x72/0x180 f2fs_init_acl+0x29/0xcc [f2fs] __f2fs_add_link+0x259/0x710 [f2fs] f2fs_create+0xad/0x1c0 [f2fs] vfs_create+0xed/0x150 do_last+0xd36/0xed0 path_openat+0xc5/0x680 do_filp_open+0x43/0xa0 do_sys_open+0x13c/0x230 SyS_creat+0x1e/0x20 system_call_fastpath+0x16/0x1b Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:21:08 +09:00
Jaegeuk Kim	479f40c44a	f2fs: skip unnecessary node writes during fsync If multiple redundant fsync calls are triggered, we don't need to write its node pages with fsync mark continuously. So, this patch adds FI_NEED_FSYNC to track whether the latest node block is written with the fsync mark or not. If the mark was set, a new fsync doesn't need to write a node block. Otherwise, we should do a new node block with the mark for roll-forward recovery. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:11 +09:00
Jaegeuk Kim	d928bfbfe7	f2fs: introduce fi->i_sem to protect fi's info This patch introduces fi->i_sem to protect fi's info that includes xattr_ver, pino, i_nlink. This enables to remove i_mutex during f2fs_sync_file, resulting in performance improvement when a number of fsync calls are triggered from many concurrent threads. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:11 +09:00
Jaegeuk Kim	58c410351e	f2fs: change reclaim rate in percentage It is more reasonable to determine the reclaiming rate of prefree segments according to the volume size, which is set to 5% by default. For example, if the volume is 128GB, the prefree segments are reclaimed when the number reaches to 6.4GB. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:10 +09:00
Jaegeuk Kim	a5f420101d	f2fs: remove unnecessary threshold The NM_WOUT_THRESHOLD is now obsolete since f2fs starts to control on a basis of the memory footprint. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:09 +09:00
Jaegeuk Kim	cdfc41c134	f2fs: throttle the memory footprint with a sysfs entry This patch introduces ram_thresh, a sysfs entry, which controls the memory footprint used by the free nid list and the nat cache. Previously, the free nid list was controlled by MAX_FREE_NIDS, while the nat cache was managed by NM_WOUT_THRESHOLD. However, this approach cannot be applied dynamically according to the system. So, this patch adds ram_thresh that users can specify the threshold, which is in order of 1 / 1024. For example, if the total ram size is 4GB and the value is set to 10 by default, f2fs tries to control the number of free nids and nat caches not to consume over 10 * (4GB / 1024) = 10MB. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:09 +09:00
Jaegeuk Kim	40bb0058c8	f2fs: avoid to drop nat entries due to the negative nr_shrink The try_to_free_nats should not receive the negative nr_shrink. Otherwise, it can drop all the nat entries by the while loop. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:08 +09:00
Jaegeuk Kim	3cb5ad152b	f2fs: call f2fs_wait_on_page_writeback instead of native function If a page is on writeback, f2fs can face with deadlock due to under writepages. This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw merged IOs, which is implemented by f2fs_wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:04 +09:00
Jaegeuk Kim	50c8cdb35a	f2fs: introduce nr_pages_to_write for segment alignment This patch introduces nr_pages_to_write to align page writes to the segment or other operational unit size, which can be tuned according to the system environment. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 16:37:53 +09:00
Jaegeuk Kim	d3baf95da5	f2fs: increase pages_skipped when skipping writepages This patch increases pages_skipped when skipping writepages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 16:37:16 +09:00
Jaegeuk Kim	87d6f89094	f2fs: avoid small data writes by skipping writepages This patch introduces nr_pages_to_skip(sbi, type) to determine writepages can be skipped. The dentry, node, and meta pages can be conrolled by F2FS without breaking the FS consistency. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 13:58:59 +09:00
Jaegeuk Kim	f8b2c1f940	f2fs: introduce get_dirty_dents for readability The get_dirty_dents gives us the number of dirty dentry pages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 12:34:30 +09:00
Chao Yu	04c0938844	f2fs: fix incorrect parsing with option string Previously 'background_gc={on*,off*}' is being parsed as correct option, with this patch we cloud fix the trivial bug in mount process. Change log from v1: o need to check length of parameter suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 10:13:02 +09:00
Chao Yu	e4fc5fbfc9	f2fs: avoid to return incorrect errno of read_normal_summaries We should return error number of read_normal_summaries instead of -EINVAL when read_normal_summaries failed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 09:29:53 +09:00
Chao Yu	4bc8e9bcf5	f2fs: introduce f2fs_has_xattr_block for better readability This patch introduces a help function f2fs_has_xattr_block for better readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 09:29:46 +09:00
Chao Yu	90aa6dc9b9	f2fs: print type for each segment in segment_info's show The original segment_info's show looks out-of-format: cat /proc/fs/f2fs/loop0/segment_info 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 512 512 512 512 512 512 512 512 0 0 512 348 0 263 0 0 512 0 0 512 512 512 512 0 512 512 512 512 512 512 512 512 512 511 328 512 512 512 512 512 512 512 512 512 512 512 512 512 0 0 175 Let's fix this and show type for each segment. cat /proc/fs/f2fs/loop0/segment_info format: segment_type\|valid_blocks segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN) 0 2\|0 1\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 10 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 20 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 30 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 40 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 0\|0 50 3\|0 3\|0 3\|0 3\|0 3\|0 3\|0 3\|0 0\|0 3\|0 3\|0 60 3\|0 3\|0 3\|0 3\|0 3\|0 3\|0 3\|0 3\|0 3\|0 3\|512 70 3\|512 3\|512 3\|512 3\|512 3\|512 3\|512 3\|512 3\|0 3\|0 3\|512 80 3\|0 3\|0 3\|0 3\|0 3\|0 3\|512 3\|0 3\|0 3\|512 3\|512 90 3\|512 0\|512 3\|274 0\|512 0\|512 0\|512 0\|512 0\|512 0\|512 3\|512 100 3\|512 0\|512 3\|511 0\|328 3\|512 0\|512 0\|512 3\|512 0\|512 0\|512 110 0\|512 0\|512 0\|512 0\|512 0\|512 0\|512 0\|512 5\|0 4\|0 3\|512 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 09:27:18 +09:00
Theodore Ts'o	02b9984d64	fs: push sync_filesystem() down to the file system's remount_fs() Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org	2014-03-13 10:14:33 -04:00
Chao Yu	910bb12d29	f2fs: check upper bound of ino value in f2fs_nfs_get_inode Upper bound checking of ino should be added to f2fs_nfs_get_inode, so unneeded process before do_read_inode in f2fs_iget could be avoided when ino is invalid. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-12 18:15:38 +09:00
Chao Yu	987c7c3112	f2fs: introduce f2fs_has_inline_xattr for better readability This patch introduces a help function f2fs_has_inline_xattr for better readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-12 17:23:35 +09:00
Chao Yu	28cdce0459	f2fs: recover inline xattr data in roll-forward process Previously we do not recover inline xattr data of inode after power-cut, so inline xattr data may be lost. We should recover the data during the roll-forward process. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-11 16:31:06 +09:00
Gu Zheng	d653788a43	f2fs: optimize restore_node_summary slightly Previously, we ra_sum_pages to pre-read contiguous pages as more as possible, and if we fail to alloc more pages, an ENOMEM error will be reported upstream, even though we have alloced some pages yet. In fact, we can use the available pages to do the job partly, and continue the rest in the following circle. Only reporting ENOMEM upstream if we really can not alloc any available page. And another fix is ignoring dealing with the following pages if an EIO occurs when reading page from page_list. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: modify the flow for better neat code] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-10 18:45:15 +09:00
Gu Zheng	46c04366bb	f2fs: format segment_info's show for better legibility The original segment_info's show is a bit out-of-format: [root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...... 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 [root@guz Demoes]# so we fix it here for better legibility. [root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...... 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 [root@guz Demoes]# Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-10 18:45:15 +09:00
Gu Zheng	e8512d2e0c	f2fs: remove the unused ctor argument of f2fs_kmem_cache_create() Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-10 18:45:14 +09:00
Gu Zheng	b6ce391e61	f2fs: update start nid only once each circle Integrated a couple of minor changes for better readability suggested by Chao Yu. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-10 18:45:09 +09:00
Jaegeuk Kim	20f70751c6	f2fs: fix wrong kernel coding style This patch includes a simple fix to adjust coding style. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-05 10:48:53 +09:00
Jaegeuk Kim	c81bf1c84f	f2fs: fix to write node pages with WRITE_SYNC This patch fixes performance regression of dbench reported by Alex <hbx7d@yandex.com>. This issue was revealed by Phoronix tests results: http://www.phoronix.com/scan.php?page=article&item=linux_314_ssdfs&num=2 It turns out that we need to assign WRITE_SYNC to the node writes, if fsync is triggered. The performance numbers are like below, which is measured by Alex. 1. 355MB/s ext4 2. 225MB/s f2fs : WRITE for node writes 3. 525MB/s f2fs : WRITE_SYNC for node writes Reported-And-Tested-by: Alex <hbx7d@yandex.com>. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-03 11:28:40 +09:00
Chao Yu	9cf3c3898a	f2fs: fix dirty page accounting when redirty We should de-account dirty counters for page when redirty in ->writepage(). Wu Fengguang described in 'commit 971767caf632190f77a40b4011c19948232eed75': "writeback: fix dirtied pages accounting on redirty De-account the accumulative dirty counters on page redirty. Page redirties (very common in ext4) will introduce mismatch between counters (a) and (b) a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied b) NR_WRITTEN, BDI_WRITTEN This will introduce systematic errors in balanced_rate and result in dirty page position errors (ie. the dirty pages are no longer balanced around the global/bdi setpoints)." Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-28 13:09:08 +09:00
Chao Yu	695fd1ed3b	f2fs: use existing macro to clean up some codes This patch use existing macro F2FS_INODE/NEXT_FREE_BLKADDR to clean up some codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-27 21:09:28 +09:00
Chao Yu	81c1a0f13e	f2fs: readahead contiguous SSA blocks for f2fs_gc If there are multi segments in one section, we will read those SSA blocks which have contiguous address one by one in f2fs_gc. It may lost performance, let's read ahead SSA blocks by merge multi read request. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-27 20:40:36 +09:00
Jaegeuk Kim	ab9fa662e4	f2fs: add an sysfs entry to control the directory level This patch adds an sysfs entry to control dir_level used by the large directory. The description of this entry is: dir_level This parameter controls the directory level to support large directory. If a directory has a number of files, it can reduce the file lookup latency by increasing this dir_level value. Otherwise, it needs to decrease this value to reduce the space overhead. The default value is 0. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-27 20:31:15 +09:00
Jaegeuk Kim	3843154598	f2fs: introduce large directory support This patch introduces an i_dir_level field to support large directory. Previously, f2fs maintains multi-level hash tables to find a dentry quickly from a bunch of chiild dentries in a directory, and the hash tables consist of the following tree structure as below. In Documentation/filesystems/f2fs.txt, ---------------------- A : bucket B : block N : MAX_DIR_HASH_DEPTH ---------------------- level #0 \| A(2B) \| level #1 \| A(2B) - A(2B) \| level #2 \| A(2B) - A(2B) - A(2B) - A(2B) . \| . . . . level #N/2 \| A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B) . \| . . . . level #N \| A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B) But, if we can guess that a directory will handle a number of child files, we don't need to traverse the tree from level #0 to #N all the time. Since the lower level tables contain relatively small number of dentries, the miss ratio of the target dentry is likely to be high. In order to avoid that, we can configure the hash tables sparsely from level #0 like this. level #0 \| A(2B) - A(2B) - A(2B) - A(2B) level #1 \| A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B) . \| . . . . level #N/2 \| A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B) . \| . . . . level #N \| A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B) With this structure, we can skip the ineffective tree searches in lower level hash tables. This patch adds just a facility for this by introducing i_dir_level in f2fs_inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-27 19:56:09 +09:00
Jaegeuk Kim	5d0c667121	f2fs: remove costly bit operations for f2fs_find_entry It turns out that a bit operation like find_next_bit is not always fast enough for f2fs_find_entry. Instead, it is pretty much simple and fast to traverse each dentries. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-27 16:25:20 +09:00
Jaegeuk Kim	8b8343fa9d	f2fs: implement a lock-free stat_show The stat_show is just to show the current status of f2fs. So, we can remove all the there-in locks. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-24 16:00:41 +09:00
Jaegeuk Kim	8a7ed66aaf	f2fs: introduce a radix_tree for the free_nid list This patch introduces a radix tree for the list of free_nids, which enhances the performance on free nid management. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-24 16:00:41 +09:00
Gu Zheng	f978f5a061	f2fs: introduce help macro on_build_free_nids() Introduce help macro on_build_free_nids() which just uses build_lock to judge whether the building free nid is going, so that we can remove the on_build_free_nids field from f2fs_sb_info. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: remove an unnecessary white line removal] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-24 16:00:40 +09:00
Jaegeuk Kim	fffc2a00fc	f2fs: fix to mark the checkpointed nat entry correctly The nat cache entry maintains a status whether it is checkpointed or not. So, if a new cache entry is loaded from the last checkpoint, nat_entry->checkpointed should be true. If the cache entry is modified as being dirty, nat_entry->checkpoint should be false. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-24 16:00:40 +09:00
Jaegeuk Kim	6437d1b0ad	f2fs: fix to do build_stat prior to the recovery procedure At the end of the recovery procedure, write_checkpoint is called and updates the cp count which is managed by f2fs stat. But, previously build_stat() is called after the recovery procedure, which results in: BUG: unable to handle kernel NULL pointer dereference at 000000000000012c IP: [<ffffffffa03b1030>] write_checkpoint+0x720/0xbc0 [f2fs] Call Trace: [<ffffffff810a6b44>] ? mark_held_locks+0x74/0x140 [<ffffffff8109a3e0>] ? __init_waitqueue_head+0x60/0x60 [<ffffffffa03bf036>] recover_fsync_data+0x656/0xf20 [f2fs] [<ffffffff812ee3eb>] ? security_d_instantiate+0x1b/0x30 [<ffffffffa03aeb4d>] f2fs_fill_super+0x94d/0xa00 [f2fs] [<ffffffff811a9825>] mount_bdev+0x1a5/0x1f0 [<ffffffff8114915e>] ? __get_free_pages+0xe/0x40 [<ffffffffa03ae200>] ? f2fs_remount+0x130/0x130 [f2fs] [<ffffffffa03aa575>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff811aa713>] mount_fs+0x43/0x1b0 [<ffffffff811c7124>] vfs_kern_mount+0x74/0x160 [<ffffffff811c5cb1>] ? __get_fs_type+0x51/0x60 [<ffffffff811c9727>] do_mount+0x237/0xb50 [<ffffffff811c936a>] ? copy_mount_options+0x3a/0x170 So, this patche changes the order of recovery_fsync_data() and f2fs_build_stats(). Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-24 16:00:39 +09:00
Jaegeuk Kim	8618b881e9	f2fs: fix not to write data pages on the page reclaiming path Even if f2fs_write_data_page is called by the page reclaiming path, we should not write the page to provide enough free segments for the worst case scenario. Otherwise, f2fs can face with no free segment while gc is conducted, resulting in: ------------[ cut here ]------------ kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/segment.c:565! RIP: 0010:[<ffffffffa02c3b11>] [<ffffffffa02c3b11>] new_curseg+0x331/0x340 [f2fs] Call Trace: allocate_segment_by_default+0x204/0x280 [f2fs] allocate_data_block+0x108/0x210 [f2fs] write_data_page+0x8a/0xc0 [f2fs] do_write_data_page+0xe1/0x2a0 [f2fs] move_data_page+0x8a/0xf0 [f2fs] f2fs_gc+0x446/0x970 [f2fs] f2fs_balance_fs+0xb6/0xd0 [f2fs] f2fs_write_begin+0x50/0x350 [f2fs] ? unlock_page+0x27/0x30 ? unlock_page+0x27/0x30 generic_file_buffered_write+0x10a/0x280 ? file_update_time+0xa3/0xf0 __generic_file_aio_write+0x1c8/0x3d0 ? generic_file_aio_write+0x52/0xb0 ? generic_file_aio_write+0x52/0xb0 generic_file_aio_write+0x65/0xb0 do_sync_write+0x5a/0x90 vfs_write+0xc5/0x1f0 SyS_write+0x55/0xa0 system_call_fastpath+0x16/0x1b Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-24 16:00:33 +09:00
Jaegeuk Kim	b63da15e8b	f2fs: fix the calculation of max_nids Total nids that f2fs can use should not include 0, nid for node inode, and nid for meta inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Changman Lee	942e0be621	f2fs: show counts of checkpoint in status This patch shows the counts of checkpoint in f2fs' status. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Chao Yu	662befda25	f2fs: introduce ra_meta_pages to readahead CP/NAT/SIT pages This patch help us to cleanup the readahead code by merging ra_{sit,nat}_pages function into ra_meta_pages. Additionally the new function is used to readahead cp block in recover_orphan_inodes. Change log from v1: o fix a deadloop bug pointed by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Chao Yu	3375f696bd	f2fs: use inode mutex to keep atomicity of f2fs_falloc Previously without protection of inode mutex, f2fs_falloc and other data correlated operations will interfere with each other. So let's use inode mutex to keep atomicity of f2fs_falloc. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Jaegeuk Kim	1fe54f9dd3	f2fs: clean up redundant function call This patch integrates inode_[inc\|dec]_dirty_dents with inc_page_count to remove redundant calls. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Jaegeuk Kim	203681f65b	f2fs: fix f2fs_write_meta_page at no checkpoint status If f2fs entered errorneous checkpoint status, it should skip writing meta pages instead of redirtying the pages out. Otherwise, it cannot unmount the partition even though f2fs is under read-only status. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Jaegeuk Kim	bd859c6598	f2fs: fix to truncate dentry pages in the error case When a new directory is allocated, if an error is occurred, we should truncate preallocated dentry pages too. This bug was reported by Andrey Tsyvarev after a while as follows. mkdir()-> f2fs_add_link()-> init_inode_metadata()-> f2fs_init_acl()-> f2fs_get_acl()-> f2fs_getxattr()-> read_all_xattrs() fails. Also there was a BUG_ON triggered after the fault in mkdir()-> f2fs_add_link()-> init_inode_metadata()-> remove_inode_page() -> f2fs_bug_on(inode->i_blocks != 0 && inode->i_blocks != 1); But, previous patch wasn't perfect to resolve that bug, so the following bug report was also submitted. kernel BUG at fs/f2fs/inode.c:274! Call Trace: [<ffffffff811fde03>] evict+0xa3/0x1a0 [<ffffffff811fe615>] iput+0xf5/0x180 [<ffffffffa01c7f63>] f2fs_mkdir+0xf3/0x150 [f2fs] [<ffffffff811f2a77>] vfs_mkdir+0xb7/0x160 [<ffffffff811f36bf>] SyS_mkdir+0x5f/0xc0 [<ffffffff81680769>] system_call_fastpath+0x16/0x1b Finally, this patch resolves all the issues like below. If an error is occurred after make_empty_dir(), 1. truncate_inode_pages() The make_bad_inode() prior to iput() will change i_mode to S_IFREG, which means that f2fs will not decrement fi->dirty_dents during f2fs_evict_inode. But, by calling it here, we can do that. 2. truncate_blocks() Preallocated dentry pages are trucated here to sync i_blocks. 3. remove_dirty_dir_inode() Remove this directory inode from the list. Reported-and-Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:52 +09:00
Jaegeuk Kim	f6517cfc84	f2fs: fix a build warning This patch modifies flow a little bit to avoid the following build warnings. src/fs/f2fs/recovery.c: In function ‘check_index_in_prev_nodes’: src/fs/f2fs/recovery.c:288:51: warning: ‘sum.<U5390>.<U52f8>.ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized] src/fs/f2fs/recovery.c:260:23: warning: ‘sum.nid’ may be used uninitialized in this function [-Wmaybe-uninitialized] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:52 +09:00
Jaegeuk Kim	491c0854b4	f2fs: clean up with a macro This patch adds GET_BLKOFF_FROM_SEG0 to clean up some codes. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:52 +09:00
Jaegeuk Kim	924a2ddbd0	f2fs: fix the potential mismatch between dir's i_size and i_blocks This is the erroneous scenario. i_size on-disk i_size i_blocks __f2fs_add_link() 4096 4096 2 get_new_data_page 8192 4096 3 -ENOSPC = init_inode_metadata checkpoint - 4096 3 POR and reboot __f2fs_add_link() 4096 4096 3 page = get_new_data_page (page->index = 1 by NEW_ADDR) add a dentry to the page successfully f2fs_rmdir() f2fs_empty_dir() 4096 4096 3 f2fs_unlink() goes, since there is no valid dentry due to i_size = 4096. But, still there is one dentry in page->index = 1. So this patch moves the code to write dir->i_size into on-disk i_size in order to sync dir's i_size, on-disk i_size, and its i_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:52 +09:00
Jaegeuk Kim	1b1f559fc3	f2fs: remove the ugly pointer conversion This patch modifies the use of bi_private to remove pointer chasing for sbi. Previously, we had a bi_private structure, but it needs memory allocation. So this patch uses bi_private by the sbi pointer and adds a completion pointer into the sbi. This can achieve no memory allocation and nice use of the bi_private. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:52 +09:00
Jaegeuk Kim	abb2366c82	f2fs: fix to recover xattr node block If a new xattr node page was allocated and its inode is fsynced, we should recover the xattr node page during the roll-forward process after power-cut. But, previously, f2fs didn't handle that case, resulting in kernel panic as follows reported by Tom Li. BUG: unable to handle kernel paging request at ffffc9001c861a98 IP: [<ffffffffa0295236>] check_index_in_prev_nodes+0x86/0x2d0 [f2fs] Call Trace: [<ffffffff815ece9b>] ? printk+0x48/0x4a [<ffffffffa029626a>] recover_fsync_data+0xdca/0xf50 [f2fs] [<ffffffffa02873ae>] f2fs_fill_super+0x92e/0x970 [f2fs] [<ffffffff8112c9f8>] mount_bdev+0x1b8/0x200 [<ffffffffa0286a80>] ? f2fs_remount+0x130/0x130 [f2fs] [<ffffffffa0285e40>] f2fs_mount+0x10/0x20 [f2fs] [<ffffffff8112d4de>] mount_fs+0x3e/0x1b0 [<ffffffff810ef4eb>] ? __alloc_percpu+0xb/0x10 [<ffffffff8114761f>] vfs_kern_mount+0x6f/0x120 [<ffffffff811497b9>] do_mount+0x259/0xa90 [<ffffffff810ead1d>] ? memdup_user+0x3d/0x80 [<ffffffff810eadb3>] ? strndup_user+0x53/0x70 [<ffffffff8114a2c9>] SyS_mount+0x89/0xd0 [<ffffffff815feae2>] system_call_fastpath+0x16/0x1b This patch adds a recovery function of xattr node pages. Reported-by: Tom Li <biergaizi@members.fsf.org> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:52 +09:00
Jaegeuk Kim	5e443818fa	f2fs: handle dirty segments inside refresh_sit_entry This patch cleans up the refresh_sit_entry to handle locate_dirty_segments. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:52 +09:00
Jaegeuk Kim	744602cf45	f2fs: update_inode_page should be done all the time In order to make fs consistency, update_inode_page should not be failed all the time. Otherwise, it is possible to lose some metadata in the inode like a link count. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:51 +09:00
Linus Torvalds	f568849eda	Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_ve series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...	2014-01-30 11:19:05 -08:00
Linus Torvalds	bf3d846b78	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus assorted cleanups and fixes all over the place... There will be another pile later this week" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits) __dentry_path() fixes vfs: Remove second variable named error in __dentry_path vfs: Is mounted should be testing mnt_ns for NULL or error. Fix race when checking i_size on direct i/o read hfsplus: remove can_set_xattr nfsd: use get_acl and ->set_acl fs: remove generic_acl nfs: use generic posix ACL infrastructure for v3 Posix ACLs gfs2: use generic posix ACL infrastructure jfs: use generic posix ACL infrastructure xfs: use generic posix ACL infrastructure reiserfs: use generic posix ACL infrastructure ocfs2: use generic posix ACL infrastructure jffs2: use generic posix ACL infrastructure hfsplus: use generic posix ACL infrastructure f2fs: use generic posix ACL infrastructure ext2/3/4: use generic posix ACL infrastructure btrfs: use generic posix ACL infrastructure fs: make posix_acl_create more useful fs: make posix_acl_chmod more useful ...	2014-01-28 08:38:04 -08:00
Christoph Hellwig	a6dda0e63e	f2fs: use generic posix ACL infrastructure f2fs has some weird mode bit handling, so still using the old chmod code for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-01-25 23:58:19 -05:00
Christoph Hellwig	37bc15392a	fs: make posix_acl_create more useful Rename the current posix_acl_created to __posix_acl_create and add a fully featured helper to set up the ACLs on file creation that uses get_acl(). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-01-25 23:58:18 -05:00
Christoph Hellwig	5bf3258fd2	fs: make posix_acl_chmod more useful Rename the current posix_acl_chmod to __posix_acl_chmod and add a fully featured ACL chmod helper that uses the ->set_acl inode operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-01-25 23:58:18 -05:00
Jaegeuk Kim	bf39c00a9a	f2fs: drop obsolete node page when it is truncated If a node page is trucated, we'd better drop the page in the node_inode's page cache for better memory footprint. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-23 08:04:21 +09:00
Jaegeuk Kim	4ef51a8fcc	f2fs: introduce NODE_MAPPING for code consistency This patch adds NODE_MAPPING which is similar as META_MAPPING introduced by Gu Zheng. Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-22 18:41:08 +09:00
Gu Zheng	63f5384c9a	f2fs: remove the orphan block page array As the orphan_blocks may be max to 504, so it is not security and rigorous to store such a large array in the kernel stack as Dan Carpenter said. In fact, grab_meta_page has locked the page in the page cache, and we can use find_get_page() to fetch the page safely in the downstream, so we can remove the page array directly. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-22 18:41:08 +09:00
Gu Zheng	9df27d982d	f2fs: add help function META_MAPPING Introduce help function META_MAPPING() to get the cache meta blocks' address space. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-22 18:41:07 +09:00
Jaegeuk Kim	e8dae60458	f2fs: move a branch for code redability This patch moves a function in f2fs_delete_entry for code readability. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-22 18:41:07 +09:00
Jaegeuk Kim	a18ff06340	f2fs: call mark_inode_dirty to flush dirty pages If a dentry page is updated, we should call mark_inode_dirty to add the inode into the dirty list, so that its dentry pages are flushed to the disk. Otherwise, the inode can be evicted without flush. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-22 18:40:34 +09:00
Chris Fries	6c311ec6c2	f2fs: clean checkpatch warnings Fixed a variety of trivial checkpatch warnings. The only delta should be some minor formatting on log strings that were split / too long. Signed-off-by: Chris Fries <cfries@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-20 10:27:12 +09:00
Changman Lee	c434cbc0ed	f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH) Doing sync_meta_pages with META_FLUSH when checkpoint, we overide rw using WRITE_FLUSH_FUA. At this time, we also should set REQ_META\|REQ_PRIO. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-16 17:28:35 +09:00
Jaegeuk Kim	c33ec32692	f2fs: avoid f2fs_balance_fs call during pageout This patch should resolve the following bug. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 3.13.0-rc5.f2fs+ #6 Not tainted --------------------------------------------------------- kswapd0/41 just changed the state of lock: (&sbi->gc_mutex){+.+.-.}, at: [<ffffffffa030503e>] f2fs_balance_fs+0xae/0xd0 [f2fs] but this lock took another, RECLAIM_FS-READ-unsafe lock in the past: (&sbi->cp_rwsem){++++.?} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Chain exists of: &sbi->gc_mutex --> &sbi->cp_mutex --> &sbi->cp_rwsem Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sbi->cp_rwsem); local_irq_disable(); lock(&sbi->gc_mutex); lock(&sbi->cp_mutex); <Interrupt> lock(&sbi->gc_mutex); * DEADLOCK * This bug is due to the f2fs_balance_fs call in f2fs_write_data_page. If f2fs_write_data_page is triggered by wbc->for_reclaim via kswapd, it should not call f2fs_balance_fs which tries to get a mutex grabbed by original syscall flow. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-16 16:20:40 +09:00
Changman Lee	499046ab2c	f2fs: add delimiter to seperate name and value in debug phrase Support for f2fs-tools/tools/f2stat to monitor /sys/kernel/debug/f2fs/status Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-14 18:22:17 +09:00
Gu Zheng	17b692f60e	f2fs: use spinlock rather than mutex for better speed With the 2 previous changes, all the long time operations are moved out of the protection region, so here we can use spinlock rather than mutex (orphan_inode_mutex) for lower overhead. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-14 18:12:05 +09:00
Gu Zheng	c1ef372572	f2fs: move alloc new orphan node out of lock protection region Move alloc new orphan node out of lock protection region. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-14 18:12:04 +09:00
Gu Zheng	4531929e39	f2fs: move grabing orphan pages out of protection region Move grabing orphan block page out of protection region, and grab all the orphan block pages ahead. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: remove unnecessary code pointed by Chao Yu] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-14 18:11:20 +09:00
Yuan Zhong	5514f0aadd	f2fs: remove the needless parameter of f2fs_wait_on_page_writeback "boo sync" parameter is never referenced in f2fs_wait_on_page_writeback. We should remove this parameter. Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-14 17:45:54 +09:00
Jaegeuk Kim	b1c57c1caa	f2fs: add a sysfs entry to control max_victim_search Previously during SSR and GC, the maximum number of retrials to find a victim segment was hard-coded by MAX_VICTIM_SEARCH, 4096 by default. This number makes an effect on IO locality, when SSR mode is activated, which results in performance fluctuation on some low-end devices. If max_victim_search = 4, the victim will be searched like below. ("D" represents a dirty segment, and "" indicates a selected victim segment.) D1 D2 D3 D4 D5 D6 D7 D8 D9 [ ] [ * ] [ * ] [ ....] This patch adds a sysfs entry to control the number dynamically through: /sys/fs/f2fs/$dev/max_victim_search Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-08 13:45:08 +09:00
Jaegeuk Kim	fb5566da91	f2fs: improve write performance under frequent fsync calls When considering a bunch of data writes with very frequent fsync calls, we are able to think the following performance regression. N: Node IO, D: Data IO, IO scheduler: cfq Issue pending IOs D1 D2 D3 D4 D1 D2 D3 D4 N1 D2 D3 D4 N1 N2 N1 D3 D4 N2 D1 --> N1 can be selected by cfq becase of the same priority of N and D. Then D3 and D4 would be delayed, resuling in performance degradation. So, when processing the fsync call, it'd better give higher priority to data IOs than node IOs by assigning WRITE and WRITE_SYNC respectively. This patch improves the random wirte performance with frequent fsync calls by up to 10%. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-08 11:16:20 +09:00
Chao Yu	04a17fb17f	f2fs: avoid to read inline data except first page Here is a case which could read inline page data not from first page. 1. write inline data 2. lseek to offset 4096 3. read 4096 bytes from offset 4096 (read_inline_data read inline data page to non-first page, And previously VFS has add this page to page cache) 4. ftruncate offset 8192 5. read 4096 bytes from offset 4096 (we meet this updated page with inline data in cache) So we should leave this page with inited data and uptodate flag for this case. Change log from v1: o fix a deadlock bug Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:22 +09:00
Chao Yu	18309aaa41	f2fs: avoid to left uninitialized data in page when read inline data Change log from v1: o reduce unneeded memset in __f2fs_convert_inline_data >From 58796be2bd2becbe8d52305210fb2a64e7dd80b6 Mon Sep 17 00:00:00 2001 From: Chao Yu <chao2.yu@samsung.com> Date: Mon, 30 Dec 2013 09:21:33 +0800 Subject: [PATCH] f2fs: avoid to left uninitialized data in page when read inline data We left uninitialized data in the tail of page when we read an inline data page. So let's initialize left part of the page excluding inline data region. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:22 +09:00
shifei10.ge	a225dca394	f2fs: fix truncate_partial_nodes bug The truncate_partial_nodes puts pages incorrectly in the following two cases. Note that the value for argc 'depth' can only be 2 or 3. Please see truncate_inode_blocks() and truncate_partial_nodes(). 1) An err is occurred in the first 'for' loop When err is occurred with depth = 2, pages[0] is invalid, so this page doesn't need to be put. There is no problem, however, when depth is 3, it doesn't put the pages correctly where pages[0] is valid and pages[1] is invalid. In this case, depth is set to 2 (ref to statemnt depth = i + 1), and then 'goto fail'. In label 'fail', for (i = depth - 3; i >= 0; i--) cannot meet the condition because i = -1, so pages[0] cann't be put. 2) An err happened in the second 'for' loop Now we've got pages[0] with depth = 2, or we've got pages[0] and pages[1] with depth = 3. When an err is detected, we need 'goto fail' to put such the pages. When depth is 2, in label 'fail', for (i = depth - 3; i >= 0; i--) cann't meet the condition because i = -1, so pages[0] cann't be put. When depth is 3, in label 'fail', for (i = depth - 3; i >= 0; i--) can only put pages[0], pages[1] also cann't be put. Note that 'depth' has been changed before first 'goto fail' (ref to statemnt depth = i + 1), so passing this modified 'depth' to the tracepoint, trace_f2fs_truncate_partial_nodes, is also incorrect. Signed-off-by: Shifei Ge <shifei10.ge@samsung.com> [Jaegeuk Kim: modify the description and fix one bug] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:21 +09:00
Jaegeuk Kim	a8865372a8	f2fs: handle errors correctly during f2fs_reserve_block The get_dnode_of_data nullifies inode and node page when error is occurred. There are two cases that passes inode page into get_dnode_of_data(). 1. make_empty_dir() -> get_new_data_page() -> f2fs_reserve_block(ipage) -> get_dnode_of_data() 2. f2fs_convert_inline_data() -> __f2fs_convert_inline_data() -> f2fs_reserve_block(ipage) -> get_dnode_of_data() This patch adds correct error handling codes when get_dnode_of_data() returns an error. At first, f2fs_reserve_block() calls f2fs_put_dnode() whenever reserve_new_block returns an error. So, the rule of f2fs_reserve_block() is to nullify inode page when there is any error internally. Finally, two callers of f2fs_reserve_block() should call f2fs_put_dnode() appropriately if they got an error since successful f2fs_reserve_block(). Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:21 +09:00
Jaegeuk Kim	1e1bb4baf1	f2fs: add inline_data recovery routine This patch adds a inline_data recovery routine with the following policy. [prev.] [next] of inline_data flag o o -> recover inline_data o x -> remove inline_data, and then recover data blocks x o -> remove inline_data, and then recover inline_data x x -> recover data blocks Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:20 +09:00
Jaegeuk Kim	0dbdc2ae9b	f2fs: add the number of inline_data files to status info This patch adds the number of inline_data files into the status information. Note that the number is reset whenever the filesystem is newly mounted. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:20 +09:00
Jaegeuk Kim	9e09fc855d	f2fs: refactor f2fs_convert_inline_data Change log from v1: o handle NULL pointer of grab_cache_page_write_begin() pointed by Chao Yu. This patch refactors f2fs_convert_inline_data to check a couple of conditions internally for deciding whether it needs to convert inline_data or not. So, the new f2fs_convert_inline_data initially checks: 1) f2fs_has_inline_data(), and 2) the data size to be changed. If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA, then we don't need to convert the inline_data with data allocation. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:19 +09:00
Jaegeuk Kim	26f466f4a9	f2fs: call f2fs_put_page at the error case In f2fs_write_begin(), if f2fs_conver_inline_data() returns an error like -ENOSPC, f2fs should call f2fs_put_page(). Otherwise, it is remained as a locked page, resulting in the following bug. [<ffffffff8114657e>] sleep_on_page+0xe/0x20 [<ffffffff81146567>] __lock_page+0x67/0x70 [<ffffffff81157d08>] truncate_inode_pages_range+0x368/0x5d0 [<ffffffff81157ff5>] truncate_inode_pages+0x15/0x20 [<ffffffff8115804b>] truncate_pagecache+0x4b/0x70 [<ffffffff81158082>] truncate_setsize+0x12/0x20 [<ffffffffa02a1842>] f2fs_setattr+0x72/0x270 [f2fs] [<ffffffff811cdae3>] notify_change+0x213/0x400 [<ffffffff811ab376>] do_truncate+0x66/0xa0 [<ffffffff811ab541>] vfs_truncate+0x191/0x1b0 [<ffffffff811ab5bc>] do_sys_truncate+0x5c/0xa0 [<ffffffff811ab78e>] SyS_truncate+0xe/0x10 [<ffffffff81756052>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:19 +09:00
Jaegeuk Kim	8230a0a49f	f2fs: convert inline_data for punch_hole In the punch_hole(), let's convert inline_data all the time for simplicity and to avoid potential deadlock conditions. It is pretty much not a big deal to do this. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:12 +09:00
Jaegeuk Kim	f185ff979f	f2fs: don't need to get f2fs_lock_op for the inline_data test This patch locates checking the inline_data prior to calling f2fs_lock_op() in truncate_blocks(), since getting the lock is unnecessary. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-27 12:40:41 +09:00
Huajun Li	9ffe0fb5f3	f2fs: handle inline data operations Hook inline data read/write, truncate, fallocate, setattr, etc. Files need meet following 2 requirement to inline: 1) file size is not greater than MAX_INLINE_DATA; 2) file doesn't pre-allocate data blocks by fallocate(). FI_INLINE_DATA will not be set while creating a new regular inode because most of the files are bigger than ~3.4K. Set FI_INLINE_DATA only when data is submitted to block layer, ranther than set it while creating a new inode, this also avoids converting data from inline to normal data block and vice versa. While writting inline data to inode block, the first data block should be released if the file has a block indexed by i_addr[0]. On the other hand, when a file operation is appied to a file with inline data, we need to test if this file can remain inline by doing this operation, otherwise it should be convert into normal file by reserving a new data block, copying inline data to this new block and clear FI_INLINE_DATA flag. Because reserve a new data block here will make use of i_addr[0], if we save inline data in i_addr[0..872], then the first 4 bytes would be overwriten. This problem can be avoided simply by not using i_addr[0] for inline data. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 20:40:41 +09:00
Huajun Li	e18c65b2ac	f2fs: key functions to handle inline data Functions to implement inline data read/write, and move inline data to normal data block when file size exceeds inline data limitation. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 20:40:09 +09:00
Gu Zheng	0d47c1adc2	f2fs: convert max_orphans to a field of f2fs_sb_info Previously, we need to calculate the max orphan num when we try to acquire an orphan inode, but it's a stable value since the super block was inited. So converting it to a field of f2fs_sb_info and use it directly when needed seems a better choose. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 20:37:52 +09:00
Jaegeuk Kim	944fcfc184	f2fs: check the blocksize before calling generic_direct_IO path The f2fs supports 4KB block size. If user requests dwrite with under 4KB data, it allocates a new 4KB data block. However, f2fs doesn't add zero data into the untouched data area inside the newly allocated data block. This incurs an error during the xfstest #263 test as follow. 263 12s ... [failed, exit status 1] - output mismatch (see 263.out.bad) --- 263.out 2013-03-09 03:37:15.043967603 +0900 +++ 263.out.bad 2013-12-27 04:20:39.230203114 +0900 @@ -1,3 +1,976 @@ QA output created by 263 fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z +fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z +truncating to largest ever: 0x12a00 +truncating to largest ever: 0x75400 +fallocating to largest ever: 0x79cbf ... (Run 'diff -u 263.out 263.out.bad' to see the entire diff) Ran: 263 Failures: 263 Failed 1 of 1 tests It turns out that, when the test tries to write 2KB data with dio, the new dio path allocates 4KB data block without filling zero data inside the remained 2KB area. Finally, the output file contains a garbage data for that region. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 20:33:07 +09:00
Jaegeuk Kim	1ec79083b2	f2fs: should put the dnode when NEW_ADDR is detected When get_dnode_of_data() in get_data_block() returns a successful dnode, we should put the dnode. But, previously, if its data block address is equal to NEW_ADDR, we didn't do that, resulting in a deadlock condition. So, this patch splits original error conditions with this case, and then calls f2fs_put_dnode before finishing the function. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 20:33:06 +09:00
Jaegeuk Kim	58bfaf44df	f2fs: introduce F2FS_INODE macro to get f2fs_inode This patch introduces F2FS_INODE that returns struct f2fs_inode * from the inode page. By using this macro, we can remove unnecessary casting codes like below. struct f2fs_inode ri = &F2FS_NODE(inode_page)->i; -> struct f2fs_inode ri = F2FS_INODE(inode_page); Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 20:32:48 +09:00
Chao Yu	d96b143151	f2fs: check filename length in recover_dentry In current flow, we will get Null return value of f2fs_find_entry in recover_dentry when name.len is bigger than F2FS_NAME_LEN, and then we still add this inode into its dir entry. To avoid this situation, we must check filename length before we use it. Another point is that we could remove the code of checking filename length In f2fs_find_entry, because f2fs_lookup will be called previously to ensure of validity of filename length. V2: o add WARN_ON() as Jaegeuk Kim suggested. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 12:50:09 +09:00
Chao Yu	deead09009	f2fs: avoid to set wrong pino of inode when rename dir When we rename a dir to new name which is not exist previous, we will set pino of parent inode with ino of child inode in f2fs_set_link. It destroy consistency of pino, it should be fixed. Thanks for previous work of Shu Tan. Signed-off-by: Shu Tan <shu.tan@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:42:51 +09:00
Chao Yu	4f4124d0b9	f2fs: update several comments Update several comments: 1. use f2fs_{un}lock_op install of mutex_{un}lock_op. 2. update comment of get_data_block(). 3. update description of node offset. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:26:03 +09:00
Gu Zheng	7e8f23081a	f2fs: remove the rw_flag domain from f2fs_io_info When using the f2fs_io_info in the low level, we still need to merge the rw and rw_flag, so use the rw to hold all the io flags directly, and remove the rw_flag field. ps.It is based on the previous patch: f2fs: move all the bio initialization into __bio_alloc Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:07 +09:00
Gu Zheng	940a6d34b3	f2fs: move all the bio initialization into __bio_alloc Move all the bio initialization into __bio_alloc, and some minor cleanups are also added. v3: Use 'bool' rather than 'int' as Kim suggested. v2: Use 'is_read' rather than 'rw' as Yu Chao suggested. Remove the needless initialization of bio->bi_private. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:07 +09:00
Jaegeuk Kim	5459aa9770	f2fs: write dirty meta pages collectively This patch enhances writing dirty meta pages collectively in background. During the file data writes, it'd better avoid to write small dirty meta pages frequently. So let's give a chance to collect a number of dirty meta pages for a while. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:07 +09:00
Jaegeuk Kim	bfad7c2d40	f2fs: introduce a new direct_IO write path Previously, f2fs doesn't support direct IOs with high performance, which throws every write requests via the buffered write path, resulting in highly performance degradation due to memory opeations like copy_from_user. This patch introduces a new direct IO path in which every write requests are processed by generic blockdev_direct_IO() with enhanced get_block function. The get_data_block() in f2fs handles: 1. if original data blocks are allocates, then give them to blockdev. 2. otherwise, a. preallocate requested block addresses b. do not use extent cache for better performance c. give the block addresses to blockdev This policy induces that: - new allocated data are sequentially written to the disk - updated data are randomly written to the disk. - f2fs gives consistency on its file meta, not file data. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:07 +09:00
Jaegeuk Kim	216fbd6443	f2fs: introduce sysfs entry to control in-place-update policy This patch introduces new sysfs entries for users to control the policy of in-place-updates, namely IPU, in f2fs. Sometimes f2fs suffers from performance degradation due to its out-of-place update policy that produces many additional node block writes. If the storage performance is very dependant on the amount of data writes instead of IO patterns, we'd better drop this out-of-place update policy. This patch suggests 5 polcies and their triggering conditions as follows. [sysfs entry name = ipu_policy] 0: F2FS_IPU_FORCE all the time, 1: F2FS_IPU_SSR if SSR mode is activated, 2: F2FS_IPU_UTIL if FS utilization is over threashold, 3: F2FS_IPU_SSR_UTIL if SSR mode is activated and FS utilization is over threashold, 4: F2FS_IPU_DISABLE disable IPU. (=default option) [sysfs entry name = min_ipu_util] This parameter controls the threshold to trigger in-place-updates. The number indicates percentage of the filesystem utilization, and used by F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies. For more details, see need_inplace_update() in segment.h. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:07 +09:00
Changman Lee	5dcd8a7150	f2fs: missing kmem_cache_destroy for discard_entry insmod f2fs.ko is failed after insmod and rmmod firstly. $ sudo insmod fs/f2fs/f2fs.ko insmod: error inserting 'fs/f2fs/f2fs.ko': -1 Cannot allocate memory -- dmesg -- kmem_cache_sanity_check (free_nid): Cache name already exists. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:07 +09:00
Jaegeuk Kim	76130ccabc	f2fs: fix the location of tracepoint We need to get a trace before submit_bio, since its bi_sector is remapped during the submit_bio. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:06 +09:00
Jaegeuk Kim	458e6197c3	f2fs: refactor bio->rw handling This patch introduces f2fs_io_info to mitigate the complex parameter list. struct f2fs_io_info { enum page_type type; /* contains DATA/NODE/META/META_FLUSH / int rw; / contains R/RS/W/WS / int rw_flag; / contains REQ_META/REQ_PRIO / } 1. f2fs_write_data_pages - DATA - WRITE_SYNC is set when wbc->WB_SYNC_ALL. 2. sync_node_pages - NODE - WRITE_SYNC all the time 3. sync_meta_pages - META - WRITE_SYNC all the time - REQ_META \| REQ_PRIO all the time * f2fs_submit_merged_bio() handles META_FLUSH. 4. ra_nat_pages, ra_sit_pages, ra_sum_pages - META - READ_SYNC Cc: Fan Li <fanofcode.li@samsung.com> Cc: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:06 +09:00
Fan Li	63a0b7cb33	f2fs: merge pages with the same sync_mode flag Previously f2fs submits most of write requests using WRITE_SYNC, but f2fs_write_data_pages submits last write requests by sync_mode flags callers pass. This causes a performance problem since continuous pages with different sync flags can't be merged in cfq IO scheduler(thanks yu chao for pointing it out), and synchronous requests often take more time. This patch makes the following modifies to DATA writebacks: 1. every page will be written back using the sync mode caller pass. 2. only pages with the same sync mode can be merged in one bio request. These changes are restricted to DATA pages.Other types of writebacks are modified To remain synchronous. In my test with tiotest, f2fs sequence write performance is improved by about 7%-10% , and this patch has no obvious impact on other performance tests. Signed-off-by: Fan Li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:06 +09:00
Jaegeuk Kim	6bacf52fb5	f2fs: add unlikely() macro for compiler more aggressively This patch adds unlikely() macro into the most of codes. The basic rule is to add that when: - checking unusual errors, - checking page mappings, - and the other unlikely conditions. Change log from v1: - Don't add unlikely for the NULL test and error test: advised by Andi Kleen. Cc: Chao Yu <chao2.yu@samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:06 +09:00
Chao Yu	cfb271d485	f2fs: add unlikely() macro for compiler optimization As we know, some of our branch condition will rarely be true. So we could add 'unlikely' to let compiler optimize these code, by this way we could drop unneeded 'jump' assemble code to improve performance. change log: o add unlikely as many as possible across the whole source files at once suggested by Jaegeuk Kim. Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:06 +09:00
Chao Yu	b9987a277f	f2fs: avoid unneeded page release for correct _count of page In find_fsync_dnodes() and recover_data(), our flow is like this: ->f2fs_submit_page_bio() -> f2fs_put_page() -> page_cache_release() ---- page->_count declined to zero. ->__free_pages() -> put_page_testzero() ---- page->_count will be declined again. We will get a segment fault in put_page_testzero when CONFIG_DEBUG_VM is on, or return MM with a bad page with wrong _count num. So let's just release this page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:06 +09:00
Chao Yu	a0acdfe05a	f2fs: use inner macro GFP_F2FS_ZERO for simplification Use inner macro GFP_F2FS_ZERO to instead of GFP_NOFS \| __GFP_ZERO for simplification of code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:05 +09:00
Younger Liu	40e1ebe97d	f2fs: replace the debugfs_root with f2fs_debugfs_root This minor change for the naming conventions of debugfs_root to avoid any possible conflicts to the other filesystem. Signed-off-by: Younger Liu <younger.liucn@gmail.com> Cc: Younger Liu <younger.liucn@gmail.com> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> [Jaegeuk Kim: change the patch name] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:05 +09:00
Younger Liu	c524723ebf	f2fs: remove debufs dir if debugfs_create_file() failed When debugfs_create_file() failed in f2fs_create_root_stats(), debugfs_root should be remove. Signed-off-by: Younger Liu <liuyiyang@hisense.com> Cc: Younger Liu <younger.liucn@gmail.com> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:05 +09:00
Chao Yu	9af0ff1c52	f2fs: readahead contiguous pages for restore_node_summary If cp has no CP_UMOUNT_FLAG, we will read all pages in whole node segment one by one, it makes low performance. So let's merge contiguous pages and readahead for better performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: adjust the new bio operations] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:05 +09:00
Jaegeuk Kim	93dfe2ac51	f2fs: refactor bio-related operations This patch integrates redundant bio operations on read and write IOs. 1. Move bio-related codes to the top of data.c. 2. Replace f2fs_submit_bio with f2fs_submit_merged_bio, which handles read bios additionally. 3. Introduce __submit_merged_bio to submit the merged bio. 4. Change f2fs_readpage to f2fs_submit_page_bio. 5. Introduce f2fs_submit_page_mbio to integrate previous submit_read_page and submit_write_page. Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com > Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:05 +09:00
Jaegeuk Kim	187b5b8b3d	f2fs: remove the own bi_private allocation Previously f2fs allocates its own bi_private data structure all the time even though we don't use it. But, can we remove this bi_private allocation? This patch removes such the additional bi_private allocation. 1. Retrieve f2fs_sb_info from its page->mapping->host->i_sb. - This removes the usecases of bi_private in end_io. 2. Use bi_private only when we really need it. - The bi_private is used only when the checkpoint procedure is conducted. - When conducting the checkpoint, f2fs submits a META_FLUSH bio to wait its bio completion. - Since we have no dependancies to remove bi_private now, let's just use bi_private pointer as the completion pointer. Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:05 +09:00
Chao Yu	8f99a946f3	f2fs: convert recover_orphan_inodes to void The recover_orphan_inodes() returns no error all the time, so we don't need to check its errors. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: add description] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:05 +09:00
Chao Yu	1069bbf7b9	f2fs: check return value of f2fs_readpage in find_data_page We should return error if we do not get an updated page in find_date_page when f2fs_readpage failed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Chao Yu	01d2d1aa06	f2fs: use true and false for boolean variable The inode_page_locked should be a boolean variable. struct dnode_of_data { struct inode inode; / vfs inode pointer / struct page inode_page; /* its inode page, NULL is possible / struct page node_page; /* cached direct node page / nid_t nid; / node id of the direct node block / unsigned int ofs_in_node; / data offset in the node page / ==> bool inode_page_locked; / inode page is locked or not / block_t data_blkaddr; / block address of the node block */ }; Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: add description] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Chao Yu	aac44046a2	f2fs: correct type of wait in struct bio_private The void *wait in bio_private is used for waiting completion of checkpoint bio. So we don't need to use its type as void, but declare it as completion type. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: add description] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Chao Yu	6947eea957	f2fs: avoid to calculate incorrect max orphan number Because we will write node summaries when do_checkpoint with umount flag, our number of max orphan blocks should minus NR_CURSEG_NODE_TYPE additional. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Shu Tan <shu.tan@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Chao Yu	a66c7b2fcf	f2fs: remove unneeded code in punch_hole Because FALLOC_FL_PUNCH_HOLE flag must be ORed with FALLOC_FL_KEEP_SIZE in fallocate, so we could remove the useless 'keep size' branch code which will never be excuted in punch_hole. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Fan Li <fanofcode.li@samsung.com> [Jaegeuk Kim: remove an unnecessary parameter togather] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Jaegeuk Kim	031fa8cc9b	f2fs: remove unnecessary condition checks This patch removes the unnecessary condition checks on: fs/f2fs/gc.c:667 do_garbage_collect() warn: 'sum_page' isn't an ERR_PTR fs/f2fs/f2fs.h:795 f2fs_put_page() warn: 'page' isn't an ERR_PTR Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Jaegeuk Kim	f9a4e6df52	f2fs: bug fix on bit overflow from 32bits to 64bits This patch fixes some bit overflows by the shift operations. Dan Carpenter reported potential bugs on bit overflows as follows. fs/f2fs/segment.c:910 submit_write_page() warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type? fs/f2fs/checkpoint.c:429 get_valid_checkpoint() warn: should '1 << ()' be a 64 bit type? fs/f2fs/data.c:408 f2fs_readpage() warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type? fs/f2fs/data.c:457 submit_read_page() warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type? fs/f2fs/data.c:525 get_data_block_ro() warn: should 'i << blkbits' be a 64 bit type? Bug-Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Gu Zheng	3679556794	f2fs: fix a potential out of range issue Fix a potential out of range issue introduced by commit: 22fb72225a f2fs: simplify write_orphan_inodes for better readable Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Jaegeuk Kim	0e80220ac5	f2fs: remove unnecessary return value Let's remove the unnecessary return value. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:03 +09:00
Huajun Li	8274de77b7	f2fs: add a new mount option: inline_data Add a mount option: inline_data. If the mount option is set, data of New created small files can be stored in their inode. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:03 +09:00
Huajun Li	1001b3479c	f2fs: add flags and helpers to support inline data Add new inode flags F2FS_INLINE_DATA and FI_INLINE_DATA to indicate whether the inode has inline data. Inline data makes use of inode block's data indices region to save small file. Currently there are 923 data indices in an inode block. Since inline xattr has made use of the last 50 indices to save its data, there are 873 indices left which can be used for inline data. When FI_INLINE_DATA is set, the layout of inode block's indices region is like below: +-----------------+ \| \| Reserved. reserve_new_block() will make use of \| i_addr[0] \| i_addr[0] when we need to reserve a new data block \| \| to convert inline data into regular one's. \|-----------------\| \| \| Used by inline data. A file whose size is less than \| i_addr[1~872] \| 3488 bytes(~3.4k) and doesn't reserve extra \| \| blocks by fallocate() can be saved here. \|-----------------\| \| \| \| i_addr[873~922] \| Reserved for inline xattr \| \| +-----------------+ Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:03 +09:00
Changman Lee	03232305ff	f2fs: send REQ_META or REQ_PRIO when reading meta area Let's send REQ_META or REQ_PRIO when reading meta area such as NAT/SIT etc. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:03 +09:00
Jaegeuk Kim	a709f4a2f2	f2fs: add detailed information of bio types in the tracepoints This patch inserts information of bio types in more detail. So, we can now see REQ_META and REQ_PRIO too. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:03 +09:00
Huajun Li	b600965c43	f2fs: add a new function: f2fs_reserve_block() Add the function f2fs_reserve_block() to easily reserve new blocks, and use it to clean up more codes. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:03 +09:00
Jaegeuk Kim	0daaad97dc	f2fs: avoid lock debugging overhead If CONFIG_F2FS_CHECK_FS is unset, we don't need to add any debugging overhead. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:03 +09:00

1 2 3 4 5 ...

596 Commits (f6e63f90809946d410c42045577cb159fedabf8c)