Commit graph

326 commits

Author SHA1 Message Date
Amir Goldstein e506ac1dab ovl: hash directory inodes for fsnotify
commit 31747eda41 upstream.

fsnotify pins a watched directory inode in cache, but if directory dentry
is released, new lookup will allocate a new dentry and a new inode.
Directory events will be notified on the new inode, while fsnotify listener
is watching the old pinned inode.

Hash all directory inodes to reuse the pinned inode on lookup. Pure upper
dirs are hashes by real upper inode, merge and lower dirs are hashed by
real lower inode.

The reference to lower inode was being held by the lower dentry object
in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry
may drop lower inode refcount to zero. Add a refcount on behalf of the
overlay inode to prevent that.

As a by-product, hashing directory inodes also detects multiple
redirected dirs to the same lower dir and uncovered redirected dir
target on and returns -ESTALE on lookup.

The reported issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.

Cc: <stable@vger.kernel.org> #v4.13
Reported-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22 15:42:33 +01:00
Amir Goldstein c7aee3941e ovl: take mnt_want_write() for removing impure xattr
commit a5a927a7c8 upstream.

The optimization in ovl_cache_get_impure() that tries to remove an
unneeded "impure" xattr needs to take mnt_want_write() on upper fs.

Fixes: 4edb83bb10 ("ovl: constant d_ino for non-merge dirs")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16 20:23:10 +01:00
Amir Goldstein e822be7502 ovl: fix failure to fsync lower dir
commit d796e77f1d upstream.

As a writable mount, it is not expected for overlayfs to return
EINVAL/EROFS for fsync, even if dir/file is not changed.

This commit fixes the case of fsync of directory, which is easier to
address, because overlayfs already implements fsync file operation for
directories.

The problem reported by Raphael is that new PostgreSQL 10.0 with a
database in overlayfs where lower layer in squashfs fails to start.
The failure is due to fsync error, when PostgreSQL does fsync on all
existing db directories on startup and a specific directory exists
lower layer with no changes.

Reported-by: Raphael Hertzog <raphael@ouaza.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Raphaƫl Hertzog <hertzog@debian.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16 20:23:10 +01:00
Will Deacon 5383f45db3 locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
commit 3382290ed2 upstream.

[ Note, this is a Git cherry-pick of the following commit:

    506458efaf ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")

  ... for easier x86 PTI code testing and back-porting. ]

READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:26:21 +01:00
Amir Goldstein cb6ee0f3c1 ovl: update ctx->pos on impure dir iteration
commit b02a16e641 upstream.

This fixes a regression with readdir of impure dir in overlayfs
that is shared to VM via 9p fs.

Reported-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Fixes: 4edb83bb10 ("ovl: constant d_ino for non-merge dirs")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:10:18 +01:00
Vivek Goyal 066f40dc49 ovl: Pass ovl_get_nlink() parameters in right order
commit 08d8f8a5b0 upstream.

Right now we seem to be passing index as "lowerdentry" and origin.dentry
as "upperdentry". IIUC, we should pass these parameters in reversed order
and this looks like a bug.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Fixes: caf70cb2ba ("ovl: cleanup orphan index entries")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:10:18 +01:00
Vivek Goyal 13e6560075 ovl: Put upperdentry if ovl_check_origin() fails
commit 5455f92b54 upstream.

If ovl_check_origin() fails, we should put upperdentry. We have a reference
on it by now. So goto out_put_upper instead of out.

Fixes: a9d019573e ("ovl: lookup non-dir copy-up-origin by file handle")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 08:40:43 +00:00
Amir Goldstein fa0096e3ba ovl: do not cleanup unsupported index entries
With index=on, ovl_indexdir_cleanup() tries to cleanup invalid index
entries (e.g. bad index name). This behavior could result in cleaning of
entries created by newer kernels and is therefore undesirable.
Instead, abort mount if such entries are encountered. We still cleanup
'stale' entries and 'orphan' entries, both those cases can be a result
of offline changes to lower and upper dirs.

When encoutering an index entry of type directory or whiteout, kernel
was supposed to fallback to read-only mount, but the fill_super()
operation returns EROFS in this case instead of returning success with
read-only mount flag, so mount fails when encoutering directory or
whiteout index entries. Bless this behavior by returning -EINVAL on
directory and whiteout index entries as we do for all unsupported index
entries.

Fixes: 61b674710c ("ovl: do not cleanup directory and whiteout index..")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2017-10-24 16:06:17 +02:00
Amir Goldstein 7937a56fdf ovl: handle ENOENT on index lookup
Treat ENOENT from index entry lookup the same way as treating a returned
negative dentry. Apparently, either could be returned if file is not
found, depending on the underlying file system.

Fixes: 359f392ca5 ("ovl: lookup index entry for copy up origin")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2017-10-24 16:06:17 +02:00
Amir Goldstein 6eaf011144 ovl: fix EIO from lookup of non-indexed upper
Commit fbaf94ee3c ("ovl: don't set origin on broken lower hardlink")
attempt to avoid the condition of non-indexed upper inode with lower
hardlink as origin. If this condition is found, lookup returns EIO.

The protection of commit mentioned above does not cover the case of lower
that is not a hardlink when it is copied up (with either index=off/on)
and then lower is hardlinked while overlay is offline.

Changes to lower layer while overlayfs is offline should not result in
unexpected behavior, so a permanent EIO error after creating a link in
lower layer should not be considered as correct behavior.

This fix replaces EIO error with success in cases where upper has origin
but no index is found, or index is found that does not match upper
inode. In those cases, lookup will not fail and the returned overlay inode
will be hashed by upper inode instead of by lower origin inode.

Fixes: 359f392ca5 ("ovl: lookup index entry for copy up origin")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-24 16:06:16 +02:00
Dan Carpenter 0ce5cdc9d7 ovl: Return -ENOMEM if an allocation fails ovl_lookup()
The error code is missing here so it means we return ERR_PTR(0) or NULL.
The other error paths all return an error code so this probably should
as well.

Fixes: 02b69b284c ("ovl: lookup redirects")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-19 16:19:52 +02:00
Hirofumi Nakagawa b3885bd6ed ovl: add NULL check in ovl_alloc_inode
This was detected by fault injection test

Signed-off-by: Hirofumi Nakagawa <nklabs@gmail.com>
Fixes: 13cf199d00 ("ovl: allocate an ovl_inode struct")
Cc: <stable@vger.kernel.org> # v4.13
2017-10-19 16:19:51 +02:00
Amir Goldstein 85fdee1eef ovl: fix regression caused by exclusive upper/work dir protection
Enforcing exclusive ownership on upper/work dirs caused a docker
regression: https://github.com/moby/moby/issues/34672.

Euan spotted the regression and pointed to the offending commit.
Vivek has brought the regression to my attention and provided this
reproducer:

Terminal 1:

  mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
        merged/

Terminal 2:

  unshare -m

Terminal 1:

  umount merged
  mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
        merged/
  mount: /root/overlay-testing/merged: none already mounted or mount point
         busy

To fix the regression, I replaced the error with an alarming warning.
With index feature enabled, mount does fail, but logs a suggestion to
override exclusive dir protection by disabling index.
Note that index=off mount does take the inuse locks, so a concurrent
index=off will issue the warning and a concurrent index=on mount will fail.

Documentation was updated to reflect this change.

Fixes: 2cac0c00a6 ("ovl: get exclusive ownership on upper/work dirs")
Cc: <stable@vger.kernel.org> # v4.13
Reported-by: Euan Kemp <euank@euank.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-05 15:53:18 +02:00
Amir Goldstein 5820dc0888 ovl: fix missing unlock_rename() in ovl_do_copy_up()
Use the ovl_lock_rename_workdir() helper which requires
unlock_rename() only on lock success.

Fixes: ("fd210b7d67ee ovl: move copy up lock out")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-05 15:53:18 +02:00
Amir Goldstein dc7ab6773e ovl: fix dentry leak in ovl_indexdir_cleanup()
index dentry was not released when breaking out of the loop
due to index verification error.

Fixes: 415543d5c6 ("ovl: cleanup bad and stale index entries on mount")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-05 15:53:18 +02:00
Amir Goldstein 9f4ec904db ovl: fix dput() of ERR_PTR in ovl_cleanup_index()
Fixes: caf70cb2ba ("ovl: cleanup orphan index entries")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-05 15:53:18 +02:00
Amir Goldstein e0082a0f04 ovl: fix error value printed in ovl_lookup_index()
Fixes: 359f392ca5 ("ovl: lookup index entry for copy up origin")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-05 15:53:18 +02:00
Linus Torvalds 0f0d12728e Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
 "Another chunk of fmount preparations from dhowells; only trivial
  conflicts for that part. It separates MS_... bits (very grotty
  mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
  only a small subset of MS_... stuff).

  This does *not* convert the filesystems to new constants; only the
  infrastructure is done here. The next step in that series is where the
  conflicts would be; that's the conversion of filesystems. It's purely
  mechanical and it's better done after the merge, so if you could run
  something like

	list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

	sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
	        -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
	        -e 's/\<MS_NODEV\>/SB_NODEV/g' \
	        -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
	        -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
	        -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
	        -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
	        -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
	        -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
	        -e 's/\<MS_SILENT\>/SB_SILENT/g' \
	        -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
	        -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
	        -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
	        -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
	        $list

  and commit it with something along the lines of 'convert filesystems
  away from use of MS_... constants' as commit message, it would save a
  quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Differentiate mount flags (MS_*) from internal superblock flags
  VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
  vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-14 18:54:01 -07:00
Michal Hocko 0ee931c4e3 mm: treewide: remove GFP_TEMPORARY allocation flag
GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 18:53:16 -07:00
Linus Torvalds c353f88f3d Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This fixes d_ino correctness in readdir, which brings overlayfs on par
  with normal filesystems regarding inode number semantics, as long as
  all layers are on the same filesystem.

  There are also some bug fixes, one in particular (random ioctl's
  shouldn't be able to modify lower layers) that touches some vfs code,
  but of course no-op for non-overlay fs"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix false positive ESTALE on lookup
  ovl: don't allow writing ioctl on lower layer
  ovl: fix relatime for directories
  vfs: add flags to d_real()
  ovl: cleanup d_real for negative
  ovl: constant d_ino for non-merge dirs
  ovl: constant d_ino across copy up
  ovl: fix readdir error value
  ovl: check snprintf return
2017-09-13 09:11:44 -07:00
Amir Goldstein 939ae4efd5 ovl: fix false positive ESTALE on lookup
Commit b9ac5c274b ("ovl: hash overlay non-dir inodes by copy up origin")
verifies that the origin lower inode stored in the overlayfs inode matched
the inode of a copy up origin dentry found by lookup.

There is a false positive result in that check when lower fs does not
support file handles and copy up origin cannot be followed by file handle
at lookup time.

The false negative happens when finding an overlay inode in cache on a
copied up overlay dentry lookup. The overlay inode still 'remembers' the
copy up origin inode, but the copy up origin dentry is not available for
verification.

Relax the check in case copy up origin dentry is not available.

Fixes: b9ac5c274b ("ovl: hash overlay non-dir inodes by copy up...")
Cc: <stable@vger.kernel.org> # v4.13
Reported-by: Jordi Pujol <jordipujolp@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-12 17:22:20 +02:00
Miklos Szeredi cd91304e71 ovl: fix relatime for directories
Need to treat non-regular overlayfs files the same as regular files when
checking for an atime update.

Add a d_real() flag to make it return the upper dentry for all file types.

Reported-by: "zhangyi (F)" <yi.zhang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-05 12:53:11 +02:00
Miklos Szeredi 495e642939 vfs: add flags to d_real()
Add a separate flags argument (in addition to the open flags) to control
the behavior of d_real().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-04 21:42:22 +02:00
Miklos Szeredi 191a3980c6 ovl: cleanup d_real for negative
d_real() is never called with a negative dentry.  So remove the
d_is_negative() check (which would never trigger anyway, since d_is_reg()
returns false for a negative dentry).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-04 16:44:42 +02:00
Peter Zijlstra ff7a5fb0f1 overlayfs, locking: Remove smp_mb__before_spinlock() usage
While we could replace the smp_mb__before_spinlock() with the new
smp_mb__after_spinlock(), the normal pattern is to use
smp_store_release() to publish an object that is used for
lockless_dereference() -- and mirrors the regular rcu_assign_pointer()
/ rcu_dereference() patterns.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:02 +02:00
Miklos Szeredi 4edb83bb10 ovl: constant d_ino for non-merge dirs
Impure directories are ones which contain objects with origins (i.e. those
that have been copied up).  These are relevant to readdir operation only
because of the d_ino field, no other transformation is necessary.  Also a
directory can become impure between two getdents(2) calls.

This patch creates a cache for impure directories.  Unlike the cache for
merged directories, this one only contains entries with origin and is not
refcounted but has a its lifetime tied to that of the dentry.

Similarly to the merged cache, the impure cache is invalidated based on a
version number.  This version number is incremented when an entry with
origin is added or removed from the directory.

If the cache is empty, then the impure xattr is removed from the directory.

This patch also fixes up handling of d_ino for the ".." entry if the parent
directory is merged.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-27 21:54:06 +02:00
Amir Goldstein b5efccbe0a ovl: constant d_ino across copy up
When all layers are on the same fs, and iterating a directory which may
contain copy up entries, call vfs_getattr() on the overlay entries to make
sure that d_ino will be consistent with st_ino from stat(2).

There is an overhead of lookup per upper entry in readdir.

The overhead is minimal if the iterated entries are already in dcache.  It
is also quite useful for the common case of 'ls -l' that readdir() pre
populates the dcache with the listed entries, making the following stat()
calls faster.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-27 21:54:06 +02:00
Miklos Szeredi 31e8ccea3c ovl: fix readdir error value
actor's return value is taken as a bool (filled/not filled) so we need to
return the error in the context.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-27 21:54:06 +02:00
Miklos Szeredi 6787341a0f ovl: check snprintf return
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-27 21:54:05 +02:00
Amir Goldstein 0e082555ce ovl: check for bad and whiteout index on lookup
Index should always be of the same file type as origin, except for
the case of a whiteout index.  A whiteout index should only exist
if all lower aliases have been unlinked, which means that finding
a lower origin on lookup whose index is a whiteout should be treated
as a lookup error.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-20 11:08:21 +02:00
Amir Goldstein 61b674710c ovl: do not cleanup directory and whiteout index entries
Directory index entries are going to be used for looking up
redirected upper dirs by lower dir fh when decoding an overlay
file handle of a merge dir.

Whiteout index entries are going to be used as an indication that
an exported overlay file handle should be treated as stale (i.e.
after unlink of the overlay inode).

We don't know the verification rules for directory and whiteout
index entries, because they have not been implemented yet, so fail
to mount overlay rw if those entries are found to avoid corrupting
an index that was created by a newer kernel.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-20 11:08:21 +02:00
Miklos Szeredi 1d88f18373 ovl: fix xattr get and set with selinux
inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr
to get security label, and ovl_xattr_get() calls ovl_dentry_real(), which
depends on dentry->d_inode, but d_inode is null and not initialized yet at
this point resulting in an Oops.

Fix by getting the upperdentry info from the inode directly in this case.

Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: 09d8b58673 ("ovl: move __upperdentry to ovl_inode")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-20 11:08:21 +02:00
David Howells bc98a42c1f VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
Firstly by applying the following with coccinelle's spatch:

	@@ expression SB; @@
	-SB->s_flags & MS_RDONLY
	+sb_rdonly(SB)

to effect the conversion to sb_rdonly(sb), then by applying:

	@@ expression A, SB; @@
	(
	-(!sb_rdonly(SB)) && A
	+!sb_rdonly(SB) && A
	|
	-A != (sb_rdonly(SB))
	+A != sb_rdonly(SB)
	|
	-A == (sb_rdonly(SB))
	+A == sb_rdonly(SB)
	|
	-!(sb_rdonly(SB))
	+!sb_rdonly(SB)
	|
	-A && (sb_rdonly(SB))
	+A && sb_rdonly(SB)
	|
	-A || (sb_rdonly(SB))
	+A || sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) != A
	+sb_rdonly(SB) != A
	|
	-(sb_rdonly(SB)) == A
	+sb_rdonly(SB) == A
	|
	-(sb_rdonly(SB)) && A
	+sb_rdonly(SB) && A
	|
	-(sb_rdonly(SB)) || A
	+sb_rdonly(SB) || A
	)

	@@ expression A, B, SB; @@
	(
	-(sb_rdonly(SB)) ? 1 : 0
	+sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) ? A : B
	+sb_rdonly(SB) ? A : B
	)

to remove left over excess bracketage and finally by applying:

	@@ expression A, SB; @@
	(
	-(A & MS_RDONLY) != sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
	|
	-(A & MS_RDONLY) == sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
	)

to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-17 08:45:34 +01:00
Amir Goldstein a59f97ff66 ovl: remove unneeded check for IS_ERR()
ovl_workdir_create() returns a valid index dentry or NULL.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-13 22:06:46 +02:00
Amir Goldstein 961af647fc ovl: fix origin verification of index dir
Commit 54fb347e83 ("ovl: verify index dir matches upper dir")
introduced a new ovl_fh flag OVL_FH_FLAG_PATH_UPPER to indicate
an upper file handle, but forgot to add the flag to the mask of
valid flags, so index dir origin verification always discards
existing origin and stores a new one.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-13 22:06:46 +02:00
Amir Goldstein ea3dad18dc ovl: mark parent impure on ovl_link()
When linking a file with copy up origin into a new parent, mark the
new parent dir "impure".

Fixes: ee1d6d37b6 ("ovl: mark upper dir with type origin entries "impure"")
Cc: <stable@vger.kernel.org> # v4.12
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-13 22:06:45 +02:00
Amir Goldstein 8fc646b443 ovl: fix random return value on mount
On failure to prepare_creds(), mount fails with a random
return value, as err was last set to an integer cast of
a valid lower mnt pointer or set to 0 if inodes index feature
is enabled.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 3fe6e52f06 ("ovl: override creds with the ones from ...")
Cc: <stable@vger.kernel.org> # v4.7
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-13 22:06:45 +02:00
Amir Goldstein f4439de118 ovl: mark parent impure and restore timestamp on ovl_link_up()
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2017-07-04 22:08:15 +02:00
Amir Goldstein caf70cb2ba ovl: cleanup orphan index entries
index entry should live only as long as there are upper or lower
hardlinks.

Cleanup orphan index entries on mount and when dropping the last
overlay inode nlink.

When about to cleanup or link up to orphan index and the index inode
nlink > 1, admit that something went wrong and adjust overlay nlink
to index inode nlink - 1 to prevent it from dropping below zero.
This could happen when adding lower hardlinks underneath a mounted
overlay and then trying to unlink them.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:19 +02:00
Amir Goldstein 5f8415d6b8 ovl: persistent overlay inode nlink for indexed inodes
With inodes index enabled, an overlay inode nlink counts the union of upper
and non-covered lower hardlinks. During the lifetime of a non-pure upper
inode, the following nlink modifying operations can happen:

1. Lower hardlink copy up
2. Upper hardlink created, unlinked or renamed over
3. Lower hardlink whiteout or renamed over

For the first, copy up case, the union nlink does not change, whether the
operation succeeds or fails, but the upper inode nlink may change.
Therefore, before copy up, we store the union nlink value relative to the
lower inode nlink in the index inode xattr trusted.overlay.nlink.

For the second, upper hardlink case, the union nlink should be incremented
or decremented IFF the operation succeeds, aligned with nlink change of the
upper inode. Therefore, before link/unlink/rename, we store the union nlink
value relative to the upper inode nlink in the index inode.

For the last, lower cover up case, we simplify things by preceding the
whiteout or cover up with copy up. This makes sure that there is an index
upper inode where the nlink xattr can be stored before the copied up upper
entry is unlink.

Return the overlay inode nlinks for indexed upper inodes on stat(2).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:19 +02:00
Amir Goldstein 59be09712a ovl: implement index dir copy up
Implement a copy up method for non-dir objects using index dir to
prevent breaking lower hardlinks on copy up.

This method requires that the inodes index dir feature was enabled and
that all underlying fs support file handle encoding/decoding.

On the first lower hardlink copy up, upper file is created in index dir,
named after the hex representation of the lower origin inode file handle.
On the second lower hardlink copy up, upper file is found in index dir,
by the same lower handle key.
On either case, the upper indexed inode is then linked to the copy up
upper path.

The index entry remains linked for future lower hardlink copy up and for
lower to upper inode map, that is needed for exporting overlayfs to NFS.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:19 +02:00
Miklos Szeredi fd210b7d67 ovl: move copy up lock out
Move ovl_copy_up_start()/ovl_copy_up_end() out so that it's used for both
tempfile and workdir copy ups.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:18 +02:00
Miklos Szeredi a6fb235a44 ovl: rearrange copy up
Split up and rearrange copy up functions to make them better readable.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:18 +02:00
Miklos Szeredi 55acc66182 ovl: add flag for upper in ovl_entry
For rename, we need to ensure that an upper alias exists for hard links
before attempting the operation.  Introduce a flag in ovl_entry to track
the state of the upper alias.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:18 +02:00
Miklos Szeredi 23f0ab13ea ovl: use struct copy_up_ctx as function argument
This cleans up functions with too many arguments.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:18 +02:00
Miklos Szeredi 7ab8b1763f ovl: base tmpfile in workdir too
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:18 +02:00
Amir Goldstein 02209d1070 ovl: factor out ovl_copy_up_inode() helper
Factor out helper for copying lower inode data and metadata to temp
upper inode, that is common to copy up using O_TMPFILE and workdir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:18 +02:00
Miklos Szeredi 7d90b853f9 ovl: extract helper to get temp file in copy up
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:18 +02:00
Amir Goldstein 15932c415b ovl: defer upper dir lock to tempfile link
On copy up of regular file using an O_TMPFILE, lock upper dir only
before linking the tempfile in place.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:18 +02:00
Miklos Szeredi b9ac5c274b ovl: hash overlay non-dir inodes by copy up origin
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-04 22:03:17 +02:00