1
0
Fork 0
Commit Graph

1464 Commits (74c8164e1cdb1eb22f1d49d54e515e81821a8ad0)

Author SHA1 Message Date
Jens Axboe 74c8164e1c mpage: mpage_readpages() should submit IO as read-ahead
a_ops->readpages() is only ever used for read-ahead, yet we don't flag
the IO being submitted as such.  Fix that up.  Any file system that uses
mpage_readpages() as its ->readpages() implementation will now get this
right.

Since we're passing in whether the IO is read-ahead or not, we don't
need to pass in the 'gfp' separately, as it is dependent on the IO being
read-ahead.  Kill off that member.

Add some documentation notes on ->readpages() being purely for
read-ahead.

Link: http://lkml.kernel.org/r/20180621010725.17813-3-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <clm@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:29 -07:00
NeilBrown 4cdfffc872 vfs: discard ATTR_ATTR_FLAG
This flag was introduce in 2.1.37pre1 and the only place it was tested
was removed in 2.1.43pre1.  The flag was never set.

Let's discard it properly.

Link: http://lkml.kernel.org/r/877en0hewz.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:28 -07:00
Linus Torvalds 4591343e35 Merge branches 'work.misc' and 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Misc cleanups from various folks all over the place

  I expected more fs/dcache.c cleanups this cycle, so that went into a
  separate branch. Said cleanups have missed the window, so in the
  hindsight it could've gone into work.misc instead. Decided not to
  cherry-pick, thus the 'work.dcache' branch"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: dcache: Use true and false for boolean values
  fold generic_readlink() into its only caller
  fs: shave 8 bytes off of struct inode
  fs: Add more kernel-doc to the produced documentation
  fs: Fix attr.c kernel-doc
  removed extra extern file_fdatawait_range

* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  kill dentry_update_name_case()
2018-08-13 21:28:25 -07:00
Linus Torvalds 0ea97a2d61 Merge branch 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs icache updates from Al Viro:

 - NFS mkdir/open_by_handle race fix

 - analogous solution for FUSE, replacing the one currently in mainline

 - new primitive to be used when discarding halfway set up inodes on
   failed object creation; gives sane warranties re icache lookups not
   returning such doomed by still not freed inodes. A bunch of
   filesystems switched to that animal.

 - Miklos' fix for last cycle regression in iget5_locked(); -stable will
   need a slightly different variant, unfortunately.

 - misc bits and pieces around things icache-related (in adfs and jfs).

* 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  jfs: don't bother with make_bad_inode() in ialloc()
  adfs: don't put inodes into icache
  new helper: inode_fake_hash()
  vfs: don't evict uninitialized inode
  jfs: switch to discard_new_inode()
  ext2: make sure that partially set up inodes won't be returned by ext2_iget()
  udf: switch to discard_new_inode()
  ufs: switch to discard_new_inode()
  btrfs: switch to discard_new_inode()
  new primitive: discard_new_inode()
  kill d_instantiate_no_diralias()
  nfs_instantiate(): prevent multiple aliases for directory inode
2018-08-13 20:25:58 -07:00
Linus Torvalds a66b4cd1e7 Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs open-related updates from Al Viro:

 - "do we need fput() or put_filp()" rules are gone - it's always fput()
   now. We keep track of that state where it belongs - in ->f_mode.

 - int *opened mess killed - in finish_open(), in ->atomic_open()
   instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open().

 - alloc_file() wrappers with saner calling conventions are introduced
   (alloc_file_clone() and alloc_file_pseudo()); callers converted, with
   much simplification.

 - while we are at it, saner calling conventions for path_init() and
   link_path_walk(), simplifying things inside fs/namei.c (both on
   open-related paths and elsewhere).

* 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
  few more cleanups of link_path_walk() callers
  allow link_path_walk() to take ERR_PTR()
  make path_init() unconditionally paired with terminate_walk()
  document alloc_file() changes
  make alloc_file() static
  do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone()
  new helper: alloc_file_clone()
  create_pipe_files(): switch the first allocation to alloc_file_pseudo()
  anon_inode_getfile(): switch to alloc_file_pseudo()
  hugetlb_file_setup(): switch to alloc_file_pseudo()
  ocxlflash_getfile(): switch to alloc_file_pseudo()
  cxl_getfile(): switch to alloc_file_pseudo()
  ... and switch shmem_file_setup() to alloc_file_pseudo()
  __shmem_file_setup(): reorder allocations
  new wrapper: alloc_file_pseudo()
  kill FILE_{CREATED,OPENED}
  switch atomic_open() and lookup_open() to returning 0 in all success cases
  document ->atomic_open() changes
  ->atomic_open(): return 0 in all success cases
  get rid of 'opened' in path_openat() and the helpers downstream
  ...
2018-08-13 19:58:36 -07:00
Al Viro 5bef915104 new helper: inode_fake_hash()
open-coded in a quite a few places...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03 16:03:32 -04:00
Al Viro c2b6d621c4 new primitive: discard_new_inode()
We don't want open-by-handle picking half-set-up in-core
struct inode from e.g. mkdir() having failed halfway through.
In other words, we don't want such inodes returned by iget_locked()
on their way to extinction.  However, we can't just have them
unhashed - otherwise open-by-handle immediately *after* that would've
ended up creating a new in-core inode over the on-disk one that
is in process of being freed right under us.

	Solution: new flag (I_CREATING) set by insert_inode_locked() and
removed by unlock_new_inode() and a new primitive (discard_new_inode())
to be used by such halfway-through-setup failure exits instead of
unlock_new_inode() / iput() combinations.  That primitive unlocks new
inode, but leaves I_CREATING in place.

	iget_locked() treats finding an I_CREATING inode as failure
(-ESTALE, once we sort out the error propagation).
	insert_inode_locked() treats the same as instant -EBUSY.
	ilookup() treats those as icache miss.

[Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03 15:55:30 -04:00
Linus Torvalds 165ea0d1c2 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Fix several places that screw up cleanups after failures halfway
  through opening a file (one open-coding filp_clone_open() and getting
  it wrong, two misusing alloc_file()). That part is -stable fodder from
  the 'work.open' branch.

  And Christoph's regression fix for uapi breakage in aio series;
  include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel
  definition of sigset_t, the reason for doing so in the first place had
  been bogus - there's no need to expose struct __aio_sigset in
  aio_abi.h at all"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aio: don't expose __aio_sigset in uapi
  ocxlflash_getfile(): fix double-iput() on alloc_file() failures
  cxl_getfile(): fix double-iput() on alloc_file() failures
  drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()
2018-07-22 12:04:51 -07:00
Al Viro dbae8f2ca2 kill FILE_{CREATED,OPENED}
no users left

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:23 -04:00
Al Viro 44907d7900 get rid of 'opened' argument of ->atomic_open() - part 3
now it can be done...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:20 -04:00
Al Viro be12af3ef5 getting rid of 'opened' argument of ->atomic_open() - part 1
'opened' argument of finish_open() is unused.  Kill it.

Signed-off-by Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:19 -04:00
Al Viro 73a09dd943 introduce FMODE_CREATED and switch to it
Parallel to FILE_CREATED, goes into ->f_mode instead of *opened.
NFS is a bit of a wart here - it doesn't have file at the point
where FILE_CREATED used to be set, so we need to propagate it
there (for now).  IMA is another one (here and everywhere)...

Note that this needs do_dentry_open() to leave old bits in ->f_mode
alone - we want it to preserve FMODE_CREATED if it had been already
set (no other bit can be there).

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:18 -04:00
Al Viro f5d11409e6 introduce FMODE_OPENED
basically, "is that instance set up enough for regular fput(), or
do we want put_filp() for that one".

NOTE: the only alloc_file() caller that could be followed by put_filp()
is in arch/ia64/kernel/perfmon.c, which is (Kconfig-level) broken.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:16 -04:00
Al Viro b4e7a7a88b drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()
Failure of ->open() should *not* be followed by fput().  Fixed by
using filp_clone_open(), which gets the cleanups right.

Cc: stable@vger.kernel.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-10 23:29:03 -04:00
Al Viro 19f391eb05 turn filp_clone_open() into inline wrapper for dentry_open()
it's exactly the same thing as
	dentry_open(&file->f_path, file->f_flags, file->f_cred)

... and rename it to file_clone_open(), while we are at it.
'filp' naming convention is bogus; sure, it's "file pointer",
but we generally don't do that kind of Hungarian notation.
Some of the instances have too many callers to touch, but this
one has only two, so let's sanitize it while we can...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-10 23:29:03 -04:00
Amir Goldstein 69448867ab fs: shave 8 bytes off of struct inode
Here is a link to Linus' reply to Jan's concern about making
i_blkbibts byte addressable:
https://marc.info/?l=linux-fsdevel&m=152882624707975&w=2

Here is a link to an lkp.org report about potential performance
improvement in some workload, which could(?) be related to packing
i_blkbits closer to i_bytes/i_lock:
https://marc.info/?l=linux-fsdevel&m=153077048108198&w=2

Changes since v1:
- Add links to relevant discussions

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-08 19:58:24 -04:00
Linus Torvalds a11e1d432b Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL
The poll() changes were not well thought out, and completely
unexplained.  They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.

Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead.  That gets rid of one of the new indirections.

But that doesn't fix the new complexity that is completely unwarranted
for the regular case.  The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.

[ This revert is a revert of about 30 different commits, not reverted
  individually because that would just be unnecessarily messy  - Linus ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 10:40:47 -07:00
Vasily Averin cf85ec8870 removed extra extern file_fdatawait_range
Jeff added this extern twice in commit a823e4589e

Fixes: a823e4589e ("mm: add file_fdatawait_range and file_write_and_wait")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-18 10:43:18 -04:00
Linus Torvalds 7a932516f5 vfs/y2038: inode timestamps conversion to timespec64
This is a late set of changes from Deepa Dinamani doing an automated
 treewide conversion of the inode and iattr structures from 'timespec'
 to 'timespec64', to push the conversion from the VFS layer into the
 individual file systems.
 
 There were no conflicts between this and the contents of linux-next
 until just before the merge window, when we saw multiple problems:
 
 - A minor conflict with my own y2038 fixes, which I could address
   by adding another patch on top here.
 - One semantic conflict with late changes to the NFS tree. I addressed
   this by merging Deepa's original branch on top of the changes that
   now got merged into mainline and making sure the merge commit includes
   the necessary changes as produced by coccinelle.
 - A trivial conflict against the removal of staging/lustre.
 - Multiple conflicts against the VFS changes in the overlayfs tree.
   These are still part of linux-next, but apparently this is no longer
   intended for 4.18 [1], so I am ignoring that part.
 
 As Deepa writes:
 
   The series aims to switch vfs timestamps to use struct timespec64.
   Currently vfs uses struct timespec, which is not y2038 safe.
 
   The series involves the following:
   1. Add vfs helper functions for supporting struct timepec64 timestamps.
   2. Cast prints of vfs timestamps to avoid warnings after the switch.
   3. Simplify code using vfs timestamps so that the actual
      replacement becomes easy.
   4. Convert vfs timestamps to use struct timespec64 using a script.
      This is a flag day patch.
 
   Next steps:
   1. Convert APIs that can handle timespec64, instead of converting
      timestamps at the boundaries.
   2. Update internal data structures to avoid timestamp conversions.
 
 Thomas Gleixner adds:
 
   I think there is no point to drag that out for the next merge window.
   The whole thing needs to be done in one go for the core changes which
   means that you're going to play that catchup game forever. Let's get
   over with it towards the end of the merge window.
 
 [1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf
 MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1
 g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h
 L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+
 Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E
 LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs
 nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G
 wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v
 c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK
 tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/
 WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy
 A3HkjIBrKW5AgQDxfgvm
 =CZX2
 -----END PGP SIGNATURE-----

Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground

Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
 "This is a late set of changes from Deepa Dinamani doing an automated
  treewide conversion of the inode and iattr structures from 'timespec'
  to 'timespec64', to push the conversion from the VFS layer into the
  individual file systems.

  As Deepa writes:

   'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timepec64
       timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
       becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
       This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
       timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

  Thomas Gleixner adds:

   'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"

* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
  pstore: Remove bogus format string definition
  vfs: change inode times to use struct timespec64
  pstore: Convert internal records to timespec64
  udf: Simplify calls to udf_disk_stamp_to_time
  fs: nfs: get rid of memcpys for inode times
  ceph: make inode time prints to be long long
  lustre: Use long long type to print inode time
  fs: add timespec64_truncate()
2018-06-15 07:31:07 +09:00
Linus Torvalds 4a189982e2 Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull aio iopriority support from Al Viro:
 "The rest of aio stuff for this cycle - Adam's aio ioprio series"

* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: aio ioprio use ioprio_check_cap ret val
  fs: aio ioprio add explicit block layer dependence
  fs: iomap dio set bio prio from kiocb prio
  fs: blkdev set bio prio from kiocb prio
  fs: Add aio iopriority support
  fs: Convert kiocb rw_hint from enum to u16
  block: add ioprio_check_cap function
2018-06-08 10:00:20 -07:00
Linus Torvalds 70f2ae1f00 overlayfs fixes for 4.18
This contains a fix for the vfs_mkdir() issue discovered by Al, as well as
 other fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCWxatHQAKCRDh3BK/laaZ
 POg5AP95a/uUOrTJeTsENJwTmyAwHed9a6y4abKtvNErxUm4awD9FmhyYXodzJNq
 9/mheT4kV2XkR/KkxI5sizfT1uPuvgA=
 =+ljQ
 -----END PGP SIGNATURE-----

Merge tag 'ovl-fixes-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "This contains a fix for the vfs_mkdir() issue discovered by Al, as
  well as other fixes and cleanups"

* tag 'ovl-fixes-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: use inode_insert5() to hash a newly created inode
  ovl: Pass argument to ovl_get_inode() in a structure
  vfs: factor out inode_insert5()
  ovl: clean up copy-up error paths
  ovl: return EIO on internal error
  ovl: make ovl_create_real() cope with vfs_mkdir() safely
  ovl: create helper ovl_create_temp()
  ovl: return dentry from ovl_create_real()
  ovl: struct cattr cleanups
  ovl: strip debug argument from ovl_do_ helpers
  ovl: remove WARN_ON() real inode attributes mismatch
  ovl: Kconfig documentation fixes
  ovl: update documentation for unionmount-testsuite
2018-06-07 08:53:50 -07:00
Deepa Dinamani 95582b0083 vfs: change inode times to use struct timespec64
struct timespec is not y2038 safe. Transition vfs to use
y2038 safe struct timespec64 instead.

The change was made with the help of the following cocinelle
script. This catches about 80% of the changes.
All the header file and logic changes are included in the
first 5 rules. The rest are trivial substitutions.
I avoid changing any of the function signatures or any other
filesystem specific data structures to keep the patch simple
for review.

The script can be a little shorter by combining different cases.
But, this version was sufficient for my usecase.

virtual patch

@ depends on patch @
identifier now;
@@
- struct timespec
+ struct timespec64
  current_time ( ... )
  {
- struct timespec now = current_kernel_time();
+ struct timespec64 now = current_kernel_time64();
  ...
- return timespec_trunc(
+ return timespec64_trunc(
  ... );
  }

@ depends on patch @
identifier xtime;
@@
 struct \( iattr \| inode \| kstat \) {
 ...
-       struct timespec xtime;
+       struct timespec64 xtime;
 ...
 }

@ depends on patch @
identifier t;
@@
 struct inode_operations {
 ...
int (*update_time) (...,
-       struct timespec t,
+       struct timespec64 t,
...);
 ...
 }

@ depends on patch @
identifier t;
identifier fn_update_time =~ "update_time$";
@@
 fn_update_time (...,
- struct timespec *t,
+ struct timespec64 *t,
 ...) { ... }

@ depends on patch @
identifier t;
@@
lease_get_mtime( ... ,
- struct timespec *t
+ struct timespec64 *t
  ) { ... }

@te depends on patch forall@
identifier ts;
local idexpression struct inode *inode_node;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn_update_time =~ "update_time$";
identifier fn;
expression e, E3;
local idexpression struct inode *node1;
local idexpression struct inode *node2;
local idexpression struct iattr *attr1;
local idexpression struct iattr *attr2;
local idexpression struct iattr attr;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
@@
(
(
- struct timespec ts;
+ struct timespec64 ts;
|
- struct timespec ts = current_time(inode_node);
+ struct timespec64 ts = current_time(inode_node);
)

<+... when != ts
(
- timespec_equal(&inode_node->i_xtime, &ts)
+ timespec64_equal(&inode_node->i_xtime, &ts)
|
- timespec_equal(&ts, &inode_node->i_xtime)
+ timespec64_equal(&ts, &inode_node->i_xtime)
|
- timespec_compare(&inode_node->i_xtime, &ts)
+ timespec64_compare(&inode_node->i_xtime, &ts)
|
- timespec_compare(&ts, &inode_node->i_xtime)
+ timespec64_compare(&ts, &inode_node->i_xtime)
|
ts = current_time(e)
|
fn_update_time(..., &ts,...)
|
inode_node->i_xtime = ts
|
node1->i_xtime = ts
|
ts = inode_node->i_xtime
|
<+... attr1->ia_xtime ...+> = ts
|
ts = attr1->ia_xtime
|
ts.tv_sec
|
ts.tv_nsec
|
btrfs_set_stack_timespec_sec(..., ts.tv_sec)
|
btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
|
- ts = timespec64_to_timespec(
+ ts =
...
-)
|
- ts = ktime_to_timespec(
+ ts = ktime_to_timespec64(
...)
|
- ts = E3
+ ts = timespec_to_timespec64(E3)
|
- ktime_get_real_ts(&ts)
+ ktime_get_real_ts64(&ts)
|
fn(...,
- ts
+ timespec64_to_timespec(ts)
,...)
)
...+>
(
<... when != ts
- return ts;
+ return timespec64_to_timespec(ts);
...>
)
|
- timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
|
- timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
+ timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
|
- timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
|
node1->i_xtime1 =
- timespec_trunc(attr1->ia_xtime1,
+ timespec64_trunc(attr1->ia_xtime1,
...)
|
- attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
+ attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
...)
|
- ktime_get_real_ts(&attr1->ia_xtime1)
+ ktime_get_real_ts64(&attr1->ia_xtime1)
|
- ktime_get_real_ts(&attr.ia_xtime1)
+ ktime_get_real_ts64(&attr.ia_xtime1)
)

@ depends on patch @
struct inode *node;
struct iattr *attr;
identifier fn;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
expression e;
@@
(
- fn(node->i_xtime);
+ fn(timespec64_to_timespec(node->i_xtime));
|
 fn(...,
- node->i_xtime);
+ timespec64_to_timespec(node->i_xtime));
|
- e = fn(attr->ia_xtime);
+ e = fn(timespec64_to_timespec(attr->ia_xtime));
)

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
fn (...,
- &attr->ia_xtime,
+ &ts,
...);
)
...+>
}

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
struct kstat *stat;
identifier ia_xtime =~ "^ia_[acm]time$";
identifier i_xtime =~ "^i_[acm]time$";
identifier xtime =~ "^[acm]time$";
identifier fn, ret;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(stat->xtime);
ret = fn (...,
- &stat->xtime);
+ &ts);
)
...+>
}

@ depends on patch @
struct inode *node;
struct inode *node2;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier i_xtime3 =~ "^i_[acm]time$";
struct iattr *attrp;
struct iattr *attrp2;
struct iattr attr ;
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
struct kstat *stat;
struct kstat stat1;
struct timespec64 ts;
identifier xtime =~ "^[acmb]time$";
expression e;
@@
(
( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
|
 node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 stat->xtime = node2->i_xtime1;
|
 stat1.xtime = node2->i_xtime1;
|
( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
|
( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
|
- e = node->i_xtime1;
+ e = timespec64_to_timespec( node->i_xtime1 );
|
- e = attrp->ia_xtime1;
+ e = timespec64_to_timespec( attrp->ia_xtime1 );
|
node->i_xtime1 = current_time(...);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
 node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
- node->i_xtime1 = e;
+ node->i_xtime1 = timespec_to_timespec64(e);
)

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: <anton@tuxera.com>
Cc: <balbi@kernel.org>
Cc: <bfields@fieldses.org>
Cc: <darrick.wong@oracle.com>
Cc: <dhowells@redhat.com>
Cc: <dsterba@suse.com>
Cc: <dwmw2@infradead.org>
Cc: <hch@lst.de>
Cc: <hirofumi@mail.parknet.co.jp>
Cc: <hubcap@omnibond.com>
Cc: <jack@suse.com>
Cc: <jaegeuk@kernel.org>
Cc: <jaharkes@cs.cmu.edu>
Cc: <jslaby@suse.com>
Cc: <keescook@chromium.org>
Cc: <mark@fasheh.com>
Cc: <miklos@szeredi.hu>
Cc: <nico@linaro.org>
Cc: <reiserfs-devel@vger.kernel.org>
Cc: <richard@nod.at>
Cc: <sage@redhat.com>
Cc: <sfrench@samba.org>
Cc: <swhiteho@redhat.com>
Cc: <tj@kernel.org>
Cc: <trond.myklebust@primarydata.com>
Cc: <tytso@mit.edu>
Cc: <viro@zeniv.linux.org.uk>
2018-06-05 16:57:31 -07:00
Linus Torvalds fd59ccc530 Add bunch of cleanups, and add support for the Speck128/256
algorithms.  Yes, Speck is contrversial, but the intention is to use
 them only for the lowest end Android devices, where the alternative
 *really* is no encryption at all for data stored at rest.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlsW/RAACgkQ8vlZVpUN
 gaO1pwf/WOusoXBK5sUuiC8d9I5s+OlPhTKhrh+BcL7/xhOkyh2xDv2FEwsjhwUf
 qo26AMf7DsWKWgJ6wDQ1z+PIuPSNeQy5dCKbz2hbfNjET3vdk2NuvPWnIbFrmIek
 LB6Ii9jKlPJRO4T3nMrE9JzJZLsoX5OKRSgYTT3EviuW/wSXaFyi7onFnyKXBnF/
 e689tE50P42PgTEDKs4RDw43PwEGbcl5Vtj+Lnoh6VGX/pYvL/9ZbEYlKrgqSOU4
 DmckR8D8UU/Gy6G5bvMsVuJpLEU7vBxupOOHI/nJFwR6tuYi0Q1j7C/zH8BvWp5e
 o8P5GpOWk7Gm346FaUlkAZ+25bCU+A==
 =EBeE
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypt updates from Ted Ts'o:
 "Add bunch of cleanups, and add support for the Speck128/256
  algorithms.

  Yes, Speck is contrversial, but the intention is to use them only for
  the lowest end Android devices, where the alternative *really* is no
  encryption at all for data stored at rest"

* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: log the crypto algorithm implementations
  fscrypt: add Speck128/256 support
  fscrypt: only derive the needed portion of the key
  fscrypt: separate key lookup from key derivation
  fscrypt: use a common logging function
  fscrypt: remove internal key size constants
  fscrypt: remove unnecessary check for non-logon key type
  fscrypt: make fscrypt_operations.max_namelen an integer
  fscrypt: drop empty name check from fname_decrypt()
  fscrypt: drop max_namelen check from fname_decrypt()
  fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()
  fscrypt: don't clear flags on crypto transform
  fscrypt: remove stale comment from fscrypt_d_revalidate()
  fscrypt: remove error messages for skcipher_request_alloc() failure
  fscrypt: remove unnecessary NULL check when allocating skcipher
  fscrypt: clean up after fscrypt_prepare_lookup() conversions
  fs, fscrypt: only define ->s_cop when FS_ENCRYPTION is enabled
  fscrypt: use unbound workqueue for decryption
2018-06-05 15:15:32 -07:00
Linus Torvalds 1434763ca5 A lot of cleanups and bug fixes, especially dealing with corrupted
file systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlsWmXcACgkQ8vlZVpUN
 gaMGmAf+JGK4XysAvlJuj9tJfFPHHwgXSBBe/GAgyjhW9XhtNRHprUM2SpvwpIdI
 Isl5O8Ec+FywBJB0I4AGy6yds6DE6jn38FFRFEhVmkp4EoROJiIr8+a7spfVuC3m
 cWrHBgc7FwK4qYlyuGtH2+6NYva+KNFr+wwbvvUusvldyZAWMzflfrcdHM6D+/JE
 +Sd5I7aniqnP5fICq0b4xrP2zWO4XJEKMbZO2dJ9yRtMmUnbaSj6G+bTGDRyfrNk
 L3wJhqIu93o7zjDaEC0UfXSLAXzoDGWHeq7fBssaJiXj/hNtAvAGPaRMbgFR9a3h
 uHmhvf84iyJuM+8GgG25UqeGwCuWiA==
 =b0VQ
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "A lot of cleanups and bug fixes, especially dealing with corrupted
  file systems"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: fix fencepost error in check for inode count overflow during resize
  ext4: correctly handle a zero-length xattr with a non-zero e_value_offs
  ext4: bubble errors from ext4_find_inline_data_nolock() up to ext4_iget()
  ext4: do not allow external inodes for inline data
  ext4: report delalloc reserve as non-free in statfs for project quota
  ext4: remove NULL check before calling kmem_cache_destroy()
  jbd2: remove NULL check before calling kmem_cache_destroy()
  jbd2: remove bunch of empty lines with jbd2 debug
  ext4: handle errors on ext4_commit_super
  ext4: do not update s_last_mounted of a frozen fs
  ext4: factor out helper ext4_sample_last_mounted()
  vfs: add the sb_start_intwrite_trylock() helper
  ext4: update mtime in ext4_punch_hole even if no blocks are released
  ext4: add verifier check for symlink with append/immutable flags
  fs: ext4: add new return type vm_fault_t
  ext4: fix hole length detection in ext4_ind_map_blocks()
  ext4: mark block bitmap corrupted when found
  ext4: mark inode bitmap corrupted when found
  ext4: add new ext4_mark_group_bitmap_corrupted() helper
  ext4: fix wrong return value in ext4_read_inode_bitmap()
  ...
2018-06-05 12:49:17 -07:00
Linus Torvalds 408afb8d78 Merge branch 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull aio updates from Al Viro:
 "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.

  The only thing I'm holding back for a day or so is Adam's aio ioprio -
  his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
  but let it sit in -next for decency sake..."

* 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  aio: sanitize the limit checking in io_submit(2)
  aio: fold do_io_submit() into callers
  aio: shift copyin of iocb into io_submit_one()
  aio_read_events_ring(): make a bit more readable
  aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
  aio: take list removal to (some) callers of aio_complete()
  aio: add missing break for the IOCB_CMD_FDSYNC case
  random: convert to ->poll_mask
  timerfd: convert to ->poll_mask
  eventfd: switch to ->poll_mask
  pipe: convert to ->poll_mask
  crypto: af_alg: convert to ->poll_mask
  net/rxrpc: convert to ->poll_mask
  net/iucv: convert to ->poll_mask
  net/phonet: convert to ->poll_mask
  net/nfc: convert to ->poll_mask
  net/caif: convert to ->poll_mask
  net/bluetooth: convert to ->poll_mask
  net/sctp: convert to ->poll_mask
  net/tipc: convert to ->poll_mask
  ...
2018-06-04 13:57:43 -07:00
Linus Torvalds 9214407d12 fasync fix for v4.18
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbFTxwAAoJEAAOaEEZVoIVfOQQAIMmcJjx8c+86TS5bVpi9yWN
 z6C2QEW1vMN2Z65dvjquSbF81yY54oCgiV3/6iGOtcOOI3rH/7f+CKb+4xRskpFv
 PSVXKXkXKOGRnnhcz2X8F+FBbd4xLxzyB+Ff9HCWVv2L76d7Uu8wUcbSpOdC7GOd
 pXHQkA+WNPEnHpr4To09ZwS14IzVCAhFii7MvOU6IXxzbP473/aPpFjrwa+mbkkz
 J7U4bijaLL5eGJGE0ElHOaeF/iAhOBEFGrkT3lR78ZA+spnqVXlv2U9E29zg9H98
 JyFM7vMe9MtR///Ve6BeLNKrDnwCmBo9ScMR8eWUWhrOInGX9yU0UR9XkD5TzA3U
 hQHW2ckjloykp+HuvbUOF/Aut8GczbyaDsAHZ6/fAxgNGnFeJ7yjR5IPKFkKekPt
 H+ls9j4C9yZpTLXioNqJz6d2IOS5MKfjPFC0c2ItO11jzNdbmjO7ciIq4EecSR51
 FKeoUHyjhfak94Dh7x3IQEtnm1o5CbbSNvQ8g6PJYXFBi88jcmQkLmZlULDV3vyg
 9febfSYpJhamJ+GUn8av+HziJG/1o1sOLrIVi2qGRIBJiUO5tAYY6UVvo1xKuzQ3
 BMSRVnqVbs749lUV0QrYoFVdMvVxL6l0+8p4Xgx7q5BR4gG73e4yaBnXLeStMIYo
 oSFy/EH3MYR87m5Yk45E
 =rqcZ
 -----END PGP SIGNATURE-----

Merge tag 'locks-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull fasync fix from Jeff Layton:
 "Just a single fix for a deadlock in the fasync handling code that
  Kirill observed while testing.

  The fix is to change the fa_lock to be rwlock_t, and use a read lock
  in kill_fasync_rcu"

* tag 'locks-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  fasync: Fix deadlock between task-context and interrupt-context kill_fasync()
2018-06-04 13:05:02 -07:00
Linus Torvalds f956d08a56 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Misc bits and pieces not fitting into anything more specific"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: delete unnecessary assignment in vfs_listxattr
  Documentation: filesystems: update filesystem locking documentation
  vfs: namei: use path_equal() in follow_dotdot()
  fs.h: fix outdated comment about file flags
  __inode_security_revalidate() never gets NULL opt_dentry
  make xattr_getsecurity() static
  vfat: simplify checks in vfat_lookup()
  get rid of dead code in d_find_alias()
  it's SB_BORN, not MS_BORN...
  msdos_rmdir(): kill BS comment
  remove rpc_rmdir()
  fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()
2018-06-04 10:14:28 -07:00
Adam Manzanares d9a08a9e61 fs: Add aio iopriority support
This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

This patch depends on block: add ioprio_check_cap function.

Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-31 10:50:55 -04:00
Adam Manzanares fc28724d67 fs: Convert kiocb rw_hint from enum to u16
In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assignment.

Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-31 10:50:54 -04:00
Miklos Szeredi 80ea09a002 vfs: factor out inode_insert5()
Split out common helper for race free insertion of an already allocated
inode into the cache.  Use this from iget5_locked() and
insert_inode_locked4().  Make iget5_locked() use new_inode()/iput() instead
of alloc_inode()/destroy_inode() directly.

Also export to modules for use by filesystems which want to preallocate an
inode before file/directory creation.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:11 +02:00
Christoph Hellwig 5afb78356c block: don't print a message when the device went away
The information about a size change in this case just creates confusion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29 08:59:21 -06:00
Christoph Hellwig 3deb642f0d fs: introduce new ->get_poll_head and ->poll_mask methods
->get_poll_head returns the waitqueue that the poll operation is going
to sleep on.  Note that this means we can only use a single waitqueue
for the poll, unlike some current drivers that use two waitqueues for
different events.  But now that we have keyed wakeups and heavily use
those for poll there aren't that many good reason left to keep the
multiple waitqueues, and if there are any ->poll is still around, the
driver just won't support aio poll.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-26 09:16:44 +02:00
Deepa Dinamani 8efd6894ff fs: add timespec64_truncate()
As vfs moves to using struct timespec64 to represent times,
update the argument to timespec_truncate() to use
struct timespec64. Also change the name of the function.
The rest of the implementation logic is the same.

Move this to fs/inode.c instead of kernel/time/time.c as all the
users of this api are filesystems.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: <viro@zeniv.linux.org.uk>
2018-05-25 15:31:09 -07:00
Eric Biggers bbbc3fb663 fs, fscrypt: only define ->s_cop when FS_ENCRYPTION is enabled
Now that filesystems only set and use their fscrypt_operations when they
are built with encryption support, we can remove ->s_cop from
'struct super_block' when FS_ENCRYPTION is disabled.  This saves a few
bytes on some kernels and also makes it consistent with ->i_crypt_info.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-05-20 16:20:59 -04:00
Li Qiang 75abe32946 fs.h: fix outdated comment about file flags
The __dentry_open function was removed in
commit <2a027e7a18738>("fold __dentry_open() into its sole caller").

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-17 22:19:39 -04:00
Amir Goldstein 0c8e3fe35d vfs: add the sb_start_intwrite_trylock() helper
Needed by ext4 to test frozen fs before updating s_last_mounted.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2018-05-13 22:40:30 -04:00
Kirill Tkhai 7a107c0f55 fasync: Fix deadlock between task-context and interrupt-context kill_fasync()
I observed the following deadlock between them:

[task 1]                          [task 2]                         [task 3]
kill_fasync()                     mm_update_next_owner()           copy_process()
 spin_lock_irqsave(&fa->fa_lock)   read_lock(&tasklist_lock)        write_lock_irq(&tasklist_lock)
  send_sigio()                    <IRQ>                             ...
   read_lock(&fown->lock)         kill_fasync()                     ...
    read_lock(&tasklist_lock)      spin_lock_irqsave(&fa->fa_lock)  ...

Task 1 can't acquire read locked tasklist_lock, since there is
already task 3 expressed its wish to take the lock exclusive.
Task 2 holds the read locked lock, but it can't take the spin lock.

Also, there is possible another deadlock (which I haven't observed):

[task 1]                            [task 2]
f_getown()                          kill_fasync()
 read_lock(&f_own->lock)             spin_lock_irqsave(&fa->fa_lock,)
 <IRQ>                               send_sigio()                     write_lock_irq(&f_own->lock)
  kill_fasync()                       read_lock(&fown->lock)
   spin_lock_irqsave(&fa->fa_lock,)

Actually, we do not need exclusive fa->fa_lock in kill_fasync_rcu(),
as it guarantees fa->fa_file->f_owner integrity only. It may seem,
that it used to give a task a small possibility to receive two sequential
signals, if there are two parallel kill_fasync() callers, and task
handles the first signal fastly, but the behaviour won't become
different, since there is exclusive sighand lock in do_send_sig_info().

The patch converts fa_lock into rwlock_t, and this fixes two above
deadlocks, as rwlock is allowed to be taken from interrupt handler
by qrwlock design.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-05-01 07:39:50 -04:00
Linus Torvalds 7214dd4ea9 Merge branch 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs thaw updates from Al Viro:
 "An ancient series that has fallen through the cracks in the previous
  cycle"

* 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  buffer.c: call thaw_super during emergency thaw
  vfs: factor sb iteration out of do_emergency_remount
2018-04-12 12:28:32 -07:00
Linus Torvalds 19e8a2f875 Merge branch 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull AFS updates from Al Viro:
 "The AFS series posted by dhowells depended upon lookup_one_len()
  rework; now that prereq is in the mainline, that series had been
  rebased on top of it and got some exposure and testing..."

* 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs: Do better accretion of small writes on newly created content
  afs: Add stats for data transfer operations
  afs: Trace protocol errors
  afs: Locally edit directory data for mkdir/create/unlink/...
  afs: Adjust the directory XDR structures
  afs: Split the directory content defs into a header
  afs: Fix directory handling
  afs: Split the dynroot stuff out and give it its own ops tables
  afs: Keep track of invalid-before version for dentry coherency
  afs: Rearrange status mapping
  afs: Make it possible to get the data version in readpage
  afs: Init inode before accessing cache
  afs: Introduce a statistics proc file
  afs: Dump bad status record
  afs: Implement @cell substitution handling
  afs: Implement @sys substitution handling
  afs: Prospectively look up extra files when doing a single lookup
  afs: Don't over-increment the cell usage count when pinning it
  afs: Fix checker warnings
  vfs: Remove the const from dir_context::actor
2018-04-12 11:59:06 -07:00
Matthew Wilcox b93b016313 page cache: use xa_lock
Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
  Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Linus Torvalds 9f3a0941fb libnvdimm for 4.17
* A rework of the filesytem-dax implementation provides for detection of
   unmap operations (truncate / hole punch) colliding with in-progress
   device-DMA. A fix for these collisions remains a work-in-progress
   pending resolution of truncate latency and starvation regressions.
 
 * The of_pmem driver expands the users of libnvdimm outside of x86 and
   ACPI to describe an implementation of persistent memory on PowerPC with
   Open Firmware / Device tree.
 
 * Address Range Scrub (ARS) handling is completely rewritten to account for
   the fact that ARS may run for 100s of seconds and there is no platform
   defined way to cancel it. ARS will now no longer block namespace
   initialization.
 
 * The NVDIMM Namespace Label implementation is updated to handle label
   areas as small as 1K, down from 128K.
 
 * Miscellaneous cleanups and updates to unit test infrastructure.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJazDt5AAoJEB7SkWpmfYgCqGMQALLwdPeY87cUK7AvQ2IXj46B
 lJgeVuHPzyQDbC03AS5uUYnnU3I5lFd7i4y7ZrywNpFs4lsb/bNmbUpQE5xp+Yvc
 1MJ/JYDIP5X4misWYm3VJo85N49+VqSRgAQk52PBigwnZ7M6/u4cSptXM9//c9JL
 /NYbat6IjjY6Tx49Tec6+F3GMZjsFLcuTVkQcREoOyOqVJE4YpP0vhNjEe0vq6vr
 EsSWiqEI5VFH4PfJwKdKj/64IKB4FGKj2A5cEgjQBxW2vw7tTJnkRkdE3jDUjqtg
 xYAqGp/Dqs4+bgdYlT817YhiOVrcr5mOHj7TKWQrBPgzKCbcG5eKDmfT8t+3NEga
 9kBlgisqIcG72lwZNA7QkEHxq1Omy9yc1hUv9qz2YA0G+J1WE8l1T15k1DOFwV57
 qIrLLUypklNZLxvrzNjclempboKc4JCUlj+TdN5E5Y6pRs55UWTXaP7Xf5O7z0vf
 l/uiiHkc3MPH73YD2PSEGFJ8m8EU0N8xhrcz3M9E2sHgYCnbty1Lw3FH0/GhThVA
 ya1mMeDdb8A2P7gWCBk1Lqeig+rJKXSey4hKM6D0njOEtMQO1H4tFqGjyfDX1xlJ
 3plUR9WBVEYzN5+9xWbwGag/ezGZ+NfcVO2gmy6yXiEph796BxRAZx/18zKRJr0m
 9eGJG1H+JspcbtLF9iHn
 =acZQ
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This cycle was was not something I ever want to repeat as there were
  several late changes that have only now just settled.

  Half of the branch up to commit d2c997c0f1 ("fs, dax: use
  page->mapping to warn...") have been in -next for several releases.
  The of_pmem driver and the address range scrub rework were late
  arrivals, and the dax work was scaled back at the last moment.

  The of_pmem driver missed a previous merge window due to an oversight.
  A sense of obligation to rectify that miss is why it is included for
  4.17. It has acks from PowerPC folks. Stephen reported a build failure
  that only occurs when merging it with your latest tree, for now I have
  fixed that up by disabling modular builds of of_pmem. A test merge
  with your tree has received a build success report from the 0day robot
  over 156 configs.

  An initial version of the ARS rework was submitted before the merge
  window. It is self contained to libnvdimm, a net code reduction, and
  passing all unit tests.

  The filesystem-dax changes are based on the wait_var_event()
  functionality from tip/sched/core. However, late review feedback
  showed that those changes regressed truncate performance to a large
  degree. The branch was rewound to drop the truncate behavior change
  and now only includes preparation patches and cleanups (with full acks
  and reviews). The finalization of this dax-dma-vs-trnucate work will
  need to wait for 4.18.

  Summary:

   - A rework of the filesytem-dax implementation provides for detection
     of unmap operations (truncate / hole punch) colliding with
     in-progress device-DMA. A fix for these collisions remains a
     work-in-progress pending resolution of truncate latency and
     starvation regressions.

   - The of_pmem driver expands the users of libnvdimm outside of x86
     and ACPI to describe an implementation of persistent memory on
     PowerPC with Open Firmware / Device tree.

   - Address Range Scrub (ARS) handling is completely rewritten to
     account for the fact that ARS may run for 100s of seconds and there
     is no platform defined way to cancel it. ARS will now no longer
     block namespace initialization.

   - The NVDIMM Namespace Label implementation is updated to handle
     label areas as small as 1K, down from 128K.

   - Miscellaneous cleanups and updates to unit test infrastructure"

* tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits)
  libnvdimm, of_pmem: workaround OF_NUMA=n build error
  nfit, address-range-scrub: add module option to skip initial ars
  nfit, address-range-scrub: rework and simplify ARS state machine
  nfit, address-range-scrub: determine one platform max_ars value
  powerpc/powernv: Create platform devs for nvdimm buses
  doc/devicetree: Persistent memory region bindings
  libnvdimm: Add device-tree based driver
  libnvdimm: Add of_node to region and bus descriptors
  libnvdimm, region: quiet region probe
  libnvdimm, namespace: use a safe lookup for dimm device name
  libnvdimm, dimm: fix dpa reservation vs uninitialized label area
  libnvdimm, testing: update the default smart ctrl_temperature
  libnvdimm, testing: Add emulation for smart injection commands
  nfit, address-range-scrub: introduce nfit_spa->ars_state
  libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device'
  nfit, address-range-scrub: fix scrub in-progress reporting
  dax, dm: allow device-mapper to operate without dax support
  dax: introduce CONFIG_DAX_DRIVER
  fs, dax: use page->mapping to warn if truncate collides with a busy page
  ext2, dax: introduce ext2_dax_aops
  ...
2018-04-10 10:25:57 -07:00
David Howells a09acf4b43 vfs: Remove the const from dir_context::actor
Remove the const marking from the actor function pointer in the dir_context
struct.  The const prevents the structure from being used as part of a
kmalloc'd object as it makes the compiler require that the actor member be
set at object initialisation time (or not at all), incuring something like
the following error if you try and set it later:

	fs/afs/dir.c:556:20: error: assignment of read-only member 'actor'

Marking the member const like this adds very little in the way of sanity
checking as the type checking system is likely to provide sufficient - and
if not, the kernel is very likely to oops repeatably in this case.

Fixes: ac6614b764 ("[readdir] constify ->actor")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-09 21:12:30 +01:00
Dan Williams e13e75b86e Merge branch 'for-4.17/dax' into libnvdimm-for-next 2018-04-09 10:50:17 -07:00
Linus Torvalds f8cf2f16a7 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull integrity updates from James Morris:
 "A mixture of bug fixes, code cleanup, and continues to close
  IMA-measurement, IMA-appraisal, and IMA-audit gaps.

  Also note the addition of a new cred_getsecid LSM hook by Matthew
  Garrett:

     For IMA purposes, we want to be able to obtain the prepared secid
     in the bprm structure before the credentials are committed. Add a
     cred_getsecid hook that makes this possible.

  which is used by a new CREDS_CHECK target in IMA:

     In ima_bprm_check(), check with both the existing process
     credentials and the credentials that will be committed when the new
     process is started. This will not change behaviour unless the
     system policy is extended to include CREDS_CHECK targets -
     BPRM_CHECK will continue to check the same credentials that it did
     previously"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  ima: Fallback to the builtin hash algorithm
  ima: Add smackfs to the default appraise/measure list
  evm: check for remount ro in progress before writing
  ima: Improvements in ima_appraise_measurement()
  ima: Simplify ima_eventsig_init()
  integrity: Remove unused macro IMA_ACTION_RULE_FLAGS
  ima: drop vla in ima_audit_measurement()
  ima: Fix Kconfig to select TPM 2.0 CRB interface
  evm: Constify *integrity_status_msg[]
  evm: Move evm_hmac and evm_hash from evm_main.c to evm_crypto.c
  fuse: define the filesystem as untrusted
  ima: fail signature verification based on policy
  ima: clear IMA_HASH
  ima: re-evaluate files on privileged mounted filesystems
  ima: fail file signature verification on non-init mounted filesystems
  IMA: Support using new creds in appraisal policy
  security: Add a cred_getsecid hook
2018-04-07 16:53:59 -07:00
Linus Torvalds 9022ca6b11 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted stuff, including Christoph's I_DIRTY patches"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: move I_DIRTY_INODE to fs.h
  ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
  ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
  gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
  fs: fold open_check_o_direct into do_dentry_open
  vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents
  vfs: make sure struct filename->iname is word-aligned
  get rid of pointless includes of fs_struct.h
  [poll] annotate SAA6588_CMD_POLL users
2018-04-06 11:07:08 -07:00
Linus Torvalds 3526dd0c78 for-4.17/block-20180402
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJawr05AAoJEPfTWPspceCmT2UP/1uuaqwzyl4VjFNb/k7KS7UM
 +Cs/1HBlGomgMA8orDTGqtWqLRdR3z4RSh0+MvXTzQ78HpFVYz7CbDc9itHm+G9M
 X0ypD4kF/JGCFb5cxk+x6qv28uO2nv4DP3+0hHqJWLH4UVJBWDY6bs4BPShsf9QB
 I6XjioNMhoqylXgdOITLODJZz+TcChlJMDAqwhpJwh9TH1wjobleAZ6AdmCPfgi5
 h0UCKMUKzcVJlNZwQUrzrs2cxcx9Uhunnbz7HK0ZV4n/FKFtDpGynFpQQ71pZxKe
 Be0ZOBPCQvC3ykOM/egCIvC/e5y7FgrjORD6jxyu1PTwAugI5E1VYSMxHkXvgPAx
 zOo9A7RT4GPO2tDQv+DbzNFpqeSAclTgSmr+/y1wmheBs8DiSt7MPVBiNM4zdCNv
 NLk9z7IEjFhdmluSB/LbTb1aokypMb/q7QTLouPHdwGn80k7yrhFyLHgdjpNTQ2K
 UHfHZvGxkOX6SmFhBNOtIFUkuSceenh64a0RkRle7filx+ImpbCVm2/GYi9zZNCu
 EtctgzLbLmz40zMiyDaZS2bxBgGzfn6yf4xd9LsaAJPMhvZnmXogT0D9ctWXB0WU
 mMaS7sOkLnNjnGkzF1fHkeiZ/oigrstJbe+CA7BtOdwxpWn6MZBgKEoFQ6iA2b3X
 5J1axMgVH5LAsIEcEQVq
 =RVhK
 -----END PGP SIGNATURE-----

Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
 "It's a pretty quiet round this time, which is nice. This contains:

   - series from Bart, cleaning up the way we set/test/clear atomic
     queue flags.

   - series from Bart, fixing races between gendisk and queue
     registration and removal.

   - set of bcache fixes and improvements from various folks, by way of
     Michael Lyle.

   - set of lightnvm updates from Matias, most of it being the 1.2 to
     2.0 transition.

   - removal of unused DIO flags from Nikolay.

   - blk-mq/sbitmap memory ordering fixes from Omar.

   - divide-by-zero fix for BFQ from Paolo.

   - minor documentation patches from Randy.

   - timeout fix from Tejun.

   - Alpha "can't write a char atomically" fix from Mikulas.

   - set of NVMe fixes by way of Keith.

   - bsg and bsg-lib improvements from Christoph.

   - a few sed-opal fixes from Jonas.

   - cdrom check-disk-change deadlock fix from Maurizio.

   - various little fixes, comment fixes, etc from various folks"

* tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits)
  blk-mq: Directly schedule q->timeout_work when aborting a request
  blktrace: fix comment in blktrace_api.h
  lightnvm: remove function name in strings
  lightnvm: pblk: remove some unnecessary NULL checks
  lightnvm: pblk: don't recover unwritten lines
  lightnvm: pblk: implement 2.0 support
  lightnvm: pblk: implement get log report chunk
  lightnvm: pblk: rename ppaf* to addrf*
  lightnvm: pblk: check for supported version
  lightnvm: implement get log report chunk helpers
  lightnvm: make address conversions depend on generic device
  lightnvm: add support for 2.0 address format
  lightnvm: normalize geometry nomenclature
  lightnvm: complete geo structure with maxoc*
  lightnvm: add shorten OCSSD version in geo
  lightnvm: add minor version to generic geometry
  lightnvm: simplify geometry structure
  lightnvm: pblk: refactor init/exit sequences
  lightnvm: Avoid validation of default op value
  lightnvm: centralize permission check for lightnvm ioctl
  ...
2018-04-05 14:27:02 -07:00
Dan Williams f44c77630d fs, dax: prepare for dax-specific address_space_operations
In preparation for the dax implementation to start associating dax pages
to inodes via page->mapping, we need to provide a 'struct
address_space_operations' instance for dax. Define some generic VFS aops
helpers for dax. These noop implementations are there in the dax case to
prevent the VFS from falling back to operations with page-cache
assumptions, dax_writeback_mapping_range() may not be referenced in the
FS_DAX=n case.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Matthew Wilcox <mawilcox@microsoft.com>
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-30 11:34:55 -07:00
Christoph Hellwig 0e11f6443f fs: move I_DIRTY_INODE to fs.h
And use it in a few more places rather than opencoding the values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28 01:39:02 -04:00
Mimi Zohar 57b56ac6fe ima: fail file signature verification on non-init mounted filesystems
FUSE can be mounted by unprivileged users either today with fusermount
installed with setuid, or soon with the upcoming patches to allow FUSE
mounts in a non-init user namespace.

This patch addresses the new unprivileged non-init mounted filesystems,
which are untrusted, by failing the signature verification.

This patch defines two new flags SB_I_IMA_UNVERIFIABLE_SIGNATURE and
SB_I_UNTRUSTED_MOUNTER.

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Dongsu Park <dongsu@kinvolk.io>
Cc: Alban Crequy <alban@kinvolk.io>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-03-23 06:31:37 -04:00
Mateusz Guzik 08fdc8a013 buffer.c: call thaw_super during emergency thaw
There are 2 distinct freezing mechanisms - one operates on block
devices and another one directly on super blocks. Both end up with the
same result, but thaw of only one of these does not thaw the other.

In particular fsfreeze --freeze uses the ioctl variant going to the
super block. Since prior to this patch emergency thaw was not doing
a relevant thaw, filesystems frozen with this method remained
unaffected.

The patch is a hack which adds blind unfreezing.

In order to keep the super block write-locked the whole time the code
is shuffled around and the newly introduced __iterate_supers is
employed.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-19 02:21:40 -04:00