Commit graph

2146 commits

Author SHA1 Message Date
Jaegeuk Kim 7c2e59632b f2fs: add resgid and resuid to reserve root blocks
This patch adds mount options to reserve some blocks via resgid=%u,resuid=%u.
It only activates with reserve_root=%u.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:40:02 -08:00
Yufen Yu 578c647879 f2fs: implement cgroup writeback support
Cgroup writeback requires explicit support from the filesystem.
f2fs's data and node writeback IOs go through __write_data_page,
which sets fio for submiting IOs. So, we add io_wbc for fio,
associate bios with blkcg by invoking wbc_init_bio() and
account IOs issuing by wbc_account_io().
In addtion, f2fs_fill_super() is updated to set SB_I_CGROUPWB.

Meta writeback IOs is left alone by this patch and will always be
attributed to the root cgroup.

The results show that f2fs can throttle writeback nicely for
data writing and file creating.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:40:01 -08:00
Chao Yu bffa8d3b00 f2fs: remove unused pend_list_tag
In commit 78997b569f ("f2fs: split discard policy"), we have get rid
of using pend_list_tag field in struct discard_cmd_control, but forgot
to remove it, now do it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:40:00 -08:00
Chao Yu 49c60c67d2 f2fs: avoid high cpu usage in discard thread
We take very long time to finish generic/476, this is because we will
check consistence of all discard entries in global rb tree while
traversing all different granularity pending lists, even when the list
is empty, in order to avoid that unneeded overhead, we have to skip
the check when coming up an empty list.

generic/476 time consumption:
					cost
Before patch & w/o consistence check	57s
Before patch & w/ consistence check	1426s
After patch				78s

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:40:00 -08:00
Wei Yongjun 94b1e10e74 f2fs: make local functions static
Fixes the following sparse warnings:

fs/f2fs/segment.c:887:6: warning:
 symbol '__check_sit_bitmap' was not declared. Should it be static?
fs/f2fs/segment.c:1327:6: warning:
 symbol 'f2fs_wait_discard_bio' was not declared. Should it be static?
fs/f2fs/super.c:1661:5: warning:
 symbol 'f2fs_get_projid' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:59 -08:00
Jaegeuk Kim 7e65be49ed f2fs: add reserved blocks for root user
This patch allows root to reserve some blocks via mount option.

"-o reserve_root=N" means N x 4KB-sized blocks for root only.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:58 -08:00
Yunlong Song 2c1905042c f2fs: check segment type in __f2fs_replace_block
In some case, the node blocks has wrong blkaddr whose segment type is
NODE, e.g., recover inode has missing xattr flag and the blkaddr is in
the xattr range. Since fsck.f2fs does not check the recovery nodes, this
will cause __f2fs_replace_block change the curseg of node and do the
update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh, as a
result, when recovery process write checkpoint and sync nodes, the
next_blkoff of curseg is used in the segment bit map, then it will
cause f2fs_bug_on. So let's check segment type in __f2fs_replace_block.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:57 -08:00
Yunlei He 1eca05aa9d f2fs: update inode info to inode page for new file
After checkpoint,
 1. creat a new file A ,(with dirty inode && dirty inode page && xattr info)
 2. backgroud wb write back file A inode page (without update from inode cache)
 3. fsync file A, write back inode page of file A with inode cache info
 4. sudden power off before new checkpoint

In this case, recovery process will try to recover a zero inode
page. Inline xattr flag of file A will be miss and xattr info
will be taken as blkaddr index.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:56 -08:00
Jaegeuk Kim f66c027ead f2fs: show precise # of blocks that user/root can use
Let's show precise # of blocks that user/root can use through bavail and bfree
respectively.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-16 15:39:55 -08:00
Chao Yu 7f1a45a5b6 f2fs: clean up unneeded declaration
Commit 6afc662e68 ("f2fs: support flexible inline xattr size")
declared f2fs_sb_has_flexible_inline_xattr in f2fs.h for latter being
used in get_inline_xattr_addrs, but in latter version, related code
has been changed, leave f2fs_sb_has_flexible_inline_xattr w/o any
users. Let's remove it for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-03 22:48:34 -08:00
Chao Yu d6d478a14b f2fs: continue to do direct IO if we only preallocate partial blocks
While doing direct IO, if we run out-of-space when we preallocate blocks,
we should not return ENOSPC error directly, instead, we should continue
to do following direct IO, which will keep directIO of f2fs acting like
other filesystems.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-03 22:48:33 -08:00
Jaegeuk Kim 6279398db7 f2fs: enable quota at remount from r to w
We have to enable quota only when remounting from read to write. Otherwise,
we'll get remount failure. (e.g., write to write case)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-03 22:48:25 -08:00
Jaegeuk Kim b1ca321d1c f2fs: skip stop_checkpoint for user data writes
We can give another chance to write user data, which can resolve
generic/441.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:31 -08:00
Jaegeuk Kim d620439f25 f2fs: fix missing error number for xattr operation
This fixes generic/449 hang problem caused by no ENOSPC forever which should be
returned by setxattr under disk full scenario.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:31 -08:00
Jaegeuk Kim 0a007b97aa f2fs: recover directory operations by fsync
This fixes generic/342 which doesn't recover renamed file which was fsynced
before. It will be done via another fsync on newly created file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:31 -08:00
Jaegeuk Kim c39a1b348c f2fs: return error during fill_super
Let's avoid BUG_ON during fill_super, when on-disk was totall corrupted.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:31 -08:00
Yunlei He 211a6fa04c f2fs: fix an error case of missing update inode page
-Thread A                             Thread B

-write_checkpoint
 -block_operations
  -f2fs_unlock_all                    -f2fs_sync_file
                                       -f2fs_write_inode
                                        -f2fs_inode_synced
    -f2fs_sync_inode_meta
     -sync_node_pages
                                        -set_page_drity

In this case, if sudden power off without next new checkpoint,
the last inode page update will lost. wb_writeback is same with
fsync.

Yunlei also reproduced the bug by:

@@ -366,7 +366,7 @@ int update_inode(struct inode *inode, struct page *node_page)
        struct extent_tree *et = F2FS_I(inode)->extent_tree;

        f2fs_inode_synced(inode);
-
+       msleep(10000);
        f2fs_wait_on_page_writeback(node_page, NODE, true);

shell 1:                                       shell2:

dd if=/dev/zero of=./test bs=1M count=10
sync
echo "hello" >> ./test
fsync test  // sleep 10s
                                               sync //return quickly
echo c > /proc/sysrq-trigger

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:31 -08:00
Chao Yu 4635b46af2 f2fs: fix potential hangtask in f2fs_trace_pid
As Jia-Ju Bai reported:

"According to fs/f2fs/trace.c, the kernel module may sleep under a spinlock.
The function call path is:
f2fs_trace_pid (acquire the spinlock)
   f2fs_radix_tree_insert
     cond_resched --> may sleep

I do not find a good way to fix it, so I only report.
This possible bug is found by my static analysis tool (DSAC) and my code
review."

Obviously, it's problemetic to schedule in critical region of spinlock,
which will cause uninterruptable sleep if there is no waker.

This patch changes to use mutex lock intead of spinlock to avoid this
condition.

Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:30 -08:00
Yunlei He c376fc0f35 f2fs: no need return value in restore summary process
No need return value in restore summary process

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:30 -08:00
LiFan fab2adee36 f2fs: use unlikely for release case
Since the variable release is only nonzero when another unlikely
case occurs, use unlikely() on it seems logical.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:30 -08:00
Chao Yu f652e9d988 f2fs: don't return value in truncate_data_blocks_range
There is no caller cares about return value of truncate_data_blocks_range,
remove it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:30 -08:00
Chao Yu 4c2ac6a860 f2fs: clean up f2fs_map_blocks
f2fs_map_blocks():

if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) {
	if (create) {
		...
	} else {
		...
		if (flag == F2FS_GET_BLOCK_FIEMAP &&
					blkaddr == NULL_ADDR) {
			...
		}
		if (flag != F2FS_GET_BLOCK_FIEMAP ||
					blkaddr != NEW_ADDR)
			goto sync_out;
	}

It means we can break the loop in cases of:
a) flag != F2FS_GET_BLOCK_FIEMAP or
b) flag == F2FS_GET_BLOCK_FIEMAP && blkaddr == NULL_ADDR

Condition b) is the same as previous one, so merge operations of them
for readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:30 -08:00
Chao Yu 416d2dbb4e f2fs: clean up hash codes
f2fs_chksum and f2fs_crc32 use the same 'crc32' crypto engine, also
their implementation are almost the same, except with different
shash description context.

Introduce __f2fs_crc32 to wrap the common codes, and reuse it in
f2fs_chksum and f2fs_crc32.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:30 -08:00
Chao Yu bae01eda8e f2fs: fix error handling in fill_super
In fill_super, if we fail to call f2fs_build_stats(), it needs to detach
from global f2fs shrink list, otherwise once system starts to shrink slab
cache, we will encounter below panic:

BUG: unable to handle kernel paging request at 00007d35
Oops: 0002 [#1] PREEMPT SMP
EIP: __lock_acquire+0x70/0x12c0
Call Trace:
 lock_acquire+0xae/0x220
 mutex_trylock+0xc5/0xf0
 f2fs_shrink_count+0x32/0xb0 [f2fs]
 shrink_slab+0xf1/0x5b0
 drop_slab_node+0x35/0x60
 drop_slab+0xf/0x20
 drop_caches_sysctl_handler+0x79/0xc0
 proc_sys_call_handler+0xa4/0xc0
 proc_sys_write+0x1f/0x30
 __vfs_write+0x24/0x150
 SyS_write+0x44/0x90
 do_fast_syscall_32+0xa1/0x1ca
 entry_SYSENTER_32+0x4c/0x7b

In addition, this patch relocates f2fs_join_shrinker in fill_super to
avoid unneeded error handling of it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:30 -08:00
Chao Yu 4e6aad29bc f2fs: spread f2fs_k{m,z}alloc
Use f2fs_k{m,z}alloc as much as possible to increase fault injection
points.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:29 -08:00
Chao Yu 628b3d1438 f2fs: inject fault to kvmalloc
This patch supports to inject fault into kvmalloc/kvzalloc.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:29 -08:00
Chao Yu acbf054d53 f2fs: inject fault to kzalloc
This patch introduces f2fs_kzalloc based on f2fs_kmalloc in order to
support error injection for kzalloc().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:29 -08:00
LiFan 979f492fe3 f2fs: remove a redundant conditional expression
Avoid checking is_inode repeatedly, and make the logic
a little bit clearer.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:29 -08:00
Hyunchul Lee d5097be55c f2fs: apply write hints to select the type of segment for direct write
When blocks are allocated for direct write, select the type of
segment using the kiocb hint. But if an inode has FI_NO_ALLOC,
use the inode hint.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:29 -08:00
Eric Biggers 20bb2479be f2fs: switch to fscrypt_prepare_setattr()
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:29 -08:00
Eric Biggers 55899d7b49 f2fs: switch to fscrypt_prepare_lookup()
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:29 -08:00
Eric Biggers 2e45b07fda f2fs: switch to fscrypt_prepare_rename()
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:28 -08:00
Eric Biggers b05157e772 f2fs: switch to fscrypt_prepare_link()
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:28 -08:00
Eric Biggers 2e168c82dc f2fs: switch to fscrypt_file_open()
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:28 -08:00
Elena Reshetova 6671726054 posix_acl: convert posix_acl.a_refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable posix_acl.a_refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

**Important note for maintainers:

Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.

For the posix_acl.a_refcount it might make a difference
in following places:
 - get_cached_acl(): increment in refcount_inc_not_zero() only
   guarantees control dependency on success vs. fully ordered
   atomic counterpart. However this operation is performed under
   rcu_read_lock(), so this should be fine.
 - posix_acl_release(): decrement in refcount_dec_and_test() only
   provides RELEASE ordering and control dependency on success
   vs. fully ordered atomic counterpart

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:28 -08:00
Zhikang Zhang de8b10ac13 f2fs: remove repeated f2fs_bug_on
f2fs: remove repeated f2fs_bug_on which has already existed
      in function invalidate_blocks.

Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:28 -08:00
LiFan 736c0a7485 f2fs: remove an excess variable
Remove the variable page_idx which no one would miss.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:28 -08:00
Chao Yu 21020812c9 f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem
test/generic/208 reports a potential deadlock as below:

Chain exists of:
  &mm->mmap_sem --> &fi->i_mmap_sem --> &fi->dio_rwsem[WRITE]

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fi->dio_rwsem[WRITE]);
                               lock(&fi->i_mmap_sem);
                               lock(&fi->dio_rwsem[WRITE]);
  lock(&mm->mmap_sem);

This patch changes the lock dependency as below in fallocate() to
fix this issue:
- dio_rwsem
 - i_mmap_sem

Fixes: bb06664a53 ("f2fs: avoid race in between GC and block exchange")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:28 -08:00
Sheng Yong e17d488bce f2fs: remove unused parameter
Commit d260081ccf ("f2fs: change recovery policy of xattr node block")
removes the use of blkaddr, which is no longer used. So remove the
parameter.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:27 -08:00
Sheng Yong 25006645d2 f2fs: still write data if preallocate only partial blocks
If there is not enough space left, f2fs_preallocate_blocks may only
preallocte partial blocks. As a result, the write operation fails
but i_blocks is not 0.  To avoid this, f2fs should write data in
non-preallocation way and write as many data as the size of i_blocks.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:27 -08:00
Sheng Yong f6df8f234e f2fs: introduce sysfs readdir_ra to readahead inode block in readdir
This patch introduces a sysfs interface readdir_ra to enable/disable
readaheading inode block in f2fs_readdir. When readdir_ra is enabled,
it improves the performance of "readdir + stat".

For 300,000 files:
	time find /data/test > /dev/null
disable readdir_ra: 1m25.69s real  0m01.94s user  0m50.80s system
enable  readdir_ra: 0m18.55s real  0m00.44s user  0m15.39s system

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:27 -08:00
LiFan 5921aaa185 f2fs: fix concurrent problem for updating free bitmap
alloc_nid_failed and scan_nat_page can be called at the same time,
and we haven't protected add_free_nid and update_free_nid_bitmap
with the same nid_list_lock. That could lead to

Thread A				Thread B
- __build_free_nids
 - scan_nat_page
  - add_free_nid
					- alloc_nid_failed
					 - update_free_nid_bitmap
  - update_free_nid_bitmap

scan_nat_page will clear the free bitmap since the nid is PREALLOC_NID,
but alloc_nid_failed needs to set the free bitmap. This results in
free nid with free bitmap cleared.
This patch update the bitmap under the same nid_list_lock in add_free_nid.
And use __GFP_NOFAIL to make sure to update status of free nid correctly.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:27 -08:00
Chao Yu 2ab56a59ca f2fs: remove unneeded memory footprint accounting
We forgot to remov memory footprint accounting of per-cpu type
variables, fix it.

Fixes: 35782b233f ("f2fs: remove percpu_count due to performance regression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:27 -08:00
Yunlei He 66e8336137 f2fs: no need to read nat block if nat_block_bitmap is set
No need to read nat block if nat_block_bitmap is set.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:27 -08:00
Chao Yu 292c196a36 f2fs: reserve nid resource for quota sysfile
During mkfs, quota sysfiles have already occupied nid resource,
it needs to adjust remaining available nid count in kernel side.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-02 19:27:26 -08:00
Linus Torvalds 1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
Linus Torvalds a02cd4229e f2fs-for-4.15-rc1
In this round, we introduce sysfile-based quota support which is required
 for Android by default. In addition, we allow that users are able to reserve
 some blocks in runtime to mitigate performance drops in low free space.
 
 Enhancement
 - assign proper data segments according to write_hints given by user
 - issue cache_flush on dirty devices only among multiple devices
 - exploit cp_error flag and add more faults to enhance fault injection test
 - conduct more readaheads during f2fs_readdir
 - add a range for discard commands
 
 Bug fix
 - fix zero stat->st_blocks when inline_data is set
 - drop crypto key and free stale memory pointer while evict_inode is failing
 - fix some corner cases in free space and segment management
 - fix wrong last_disk_size
 
 This series includes lots of clean-ups and code enhancement in terms of xattr
 operations, discard/flush command control. In addition, it adds versatile
 debugfs entries to monitor f2fs status.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAloNCPAACgkQQBSofoJI
 UNLYmg/8DbDp/mTXqJ0AURo84Z4OQUOTRxYkWazx4ct2WPZp2+5HCWDDoM8AAtUn
 1J6/t7cU3osjos+zWvpUREZq1SPbp5m0h818HBFFJ/YMBPXucdQcd6wpepniOR5J
 5uKauVd7jd2pbAAL7hKyr+iBSLrJl816wsq34Ml8y8zkDSJe4wO5YsGDqzqyKf4N
 8nxMavUgerb14I/qXPb3ljlYlfaNNRlCT649QGCG78gx5hPeiUtUJ2l5DKV2xPe7
 v+5lZO93FFwW1siGy+Atq+nqQJyUkeiOYGPR1NPx9tfmaPO58iOIXLirfblKASZY
 HXJigVf50fQQBtwdBFL8ICSop6zV6gCKkNGZCHLzcYFWWL2TQwCIP3/iJdj9Wy+j
 +YUYyN0dyl2mmNEDZjRNX1V+QBW1k+msmvBCb0fT1GJTQAyRfA4XfBDyg94cpWQ1
 9YivNywuzG8YtghY7gYU3lCfT2OG19nXCSdz4qYUb5SSwoeGtLahLxMV4mlil4Tg
 dOa8CPLFhJnCqB9ivI4L6SennBr+gNgL26SeZ3PF+B5KimYOTZxbenrll1kTi1xp
 uCU6UR1xJS0W7Cjk8sCIu5hXkJMJwPJ0hcVeTgsxMkujLGvSSRCGb2hmOeILfwRZ
 N4aGn+kVmwwgKaKjD/F4CY4b3yJLdTKMjjl74u5YaMQWe4Bq4qU=
 =c49T
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we introduce sysfile-based quota support which is
  required for Android by default. In addition, we allow that users are
  able to reserve some blocks in runtime to mitigate performance drops
  in low free space.

  Enhancements:
   - assign proper data segments according to write_hints given by user
   - issue cache_flush on dirty devices only among multiple devices
   - exploit cp_error flag and add more faults to enhance fault
     injection test
   - conduct more readaheads during f2fs_readdir
   - add a range for discard commands

  Bug fixes:
   - fix zero stat->st_blocks when inline_data is set
   - drop crypto key and free stale memory pointer while evict_inode is
     failing
   - fix some corner cases in free space and segment management
   - fix wrong last_disk_size

  This series includes lots of clean-ups and code enhancement in terms
  of xattr operations, discard/flush command control. In addition, it
  adds versatile debugfs entries to monitor f2fs status"

* tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (75 commits)
  f2fs: deny accessing encryption policy if encryption is off
  f2fs: inject fault in inc_valid_node_count
  f2fs: fix to clear FI_NO_PREALLOC
  f2fs: expose quota information in debugfs
  f2fs: separate nat entry mem alloc from nat_tree_lock
  f2fs: validate before set/clear free nat bitmap
  f2fs: avoid opened loop codes in __add_ino_entry
  f2fs: apply write hints to select the type of segments for buffered write
  f2fs: introduce scan_curseg_cache for cleanup
  f2fs: optimize the way of traversing free_nid_bitmap
  f2fs: keep scanning until enough free nids are acquired
  f2fs: trace checkpoint reason in fsync()
  f2fs: keep isize once block is reserved cross EOF
  f2fs: avoid race in between GC and block exchange
  f2fs: save a multiplication for last_nid calculation
  f2fs: fix summary info corruption
  f2fs: remove dead code in update_meta_page
  f2fs: remove unneeded semicolon
  f2fs: don't bother with inode->i_version
  f2fs: check curseg space before foreground GC
  ...
2017-11-16 12:10:21 -08:00
Mel Gorman 8667982014 mm, pagevec: remove cold parameter for pagevecs
Every pagevec_init user claims the pages being released are hot even in
cases where it is unlikely the pages are hot.  As no one cares about the
hotness of pages being released to the allocator, just ditch the
parameter.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Jan Kara 67fd707f46 mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages.  Just drop the argument.

Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Jan Kara 8faab64229 f2fs: use find_get_pages_tag() for looking up single page
__get_first_dirty_index() wants to lookup only the first dirty page
after given index.  There's no point in using pagevec_lookup_tag() for
that.  Just use find_get_pages_tag() directly.

Link: http://lkml.kernel.org/r/20171009151359.31984-8-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00