1
0
Fork 0
Commit Graph

500 Commits (9f2959b6b52d43326b2f6a0e0d7ffe6f4fc3b5ca)

Author SHA1 Message Date
Miklos Szeredi 46e5d0a390 ovl: copy up file size as well
Copy i_size of the underlying inode to the overlay inode in ovl_copyattr().

This is in preparation for stacking I/O operations on overlay files.

This patch shouldn't have any observable effect.

Remove stale comment from ovl_setattr() [spotted by Vivek Goyal].

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi 5812160eb5 Revert "Revert "ovl: get_write_access() in truncate""
This reverts commit 31c3a70695.

Re-add functionality dealing with i_writecount on truncate to overlayfs.
This patch shouldn't have any observable effects, since we just re-assert
the writecout that vfs_truncate() already got for us.

This is in preparation for moving overlay functionality out of the VFS.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi 4f3572954a ovl: copy up inode flags
On inode creation copy certain inode flags from the underlying real inode
to the overlay inode.

This is in preparation for moving overlay functionality out of the VFS.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:41 +02:00
Miklos Szeredi d9854c87f0 ovl: copy up times
Copy up mtime and ctime to overlay inode after times in real object are
modified.  Be careful not to dirty cachelines when not necessary.

This is in preparation for moving overlay functionality out of the VFS.

This patch shouldn't have any observable effect.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:40 +02:00
Amir Goldstein 6781069307 ovl: fix wrong use of impure dir cache in ovl_iterate()
Only upper dir can be impure, but if we are in the middle of
iterating a lower real dir, dir could be copied up and marked
impure. We only want the impure cache if we started iterating
a real upper dir to begin with.

Aditya Kali reported that the following reproducer hits the
WARN_ON(!cache->refcount) in ovl_get_cache():

 docker run --rm drupal:8.5.4-fpm-alpine \
    sh -c 'cd /var/www/html/vendor/symfony && \
           chown -R www-data:www-data . && ls -l .'

Reported-by: Aditya Kali <adityakali@google.com>
Tested-by: Aditya Kali <adityakali@google.com>
Fixes: 4edb83bb10 ('ovl: constant d_ino for non-merge dirs')
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-17 16:04:34 +02:00
Linus Torvalds 7a932516f5 vfs/y2038: inode timestamps conversion to timespec64
This is a late set of changes from Deepa Dinamani doing an automated
 treewide conversion of the inode and iattr structures from 'timespec'
 to 'timespec64', to push the conversion from the VFS layer into the
 individual file systems.
 
 There were no conflicts between this and the contents of linux-next
 until just before the merge window, when we saw multiple problems:
 
 - A minor conflict with my own y2038 fixes, which I could address
   by adding another patch on top here.
 - One semantic conflict with late changes to the NFS tree. I addressed
   this by merging Deepa's original branch on top of the changes that
   now got merged into mainline and making sure the merge commit includes
   the necessary changes as produced by coccinelle.
 - A trivial conflict against the removal of staging/lustre.
 - Multiple conflicts against the VFS changes in the overlayfs tree.
   These are still part of linux-next, but apparently this is no longer
   intended for 4.18 [1], so I am ignoring that part.
 
 As Deepa writes:
 
   The series aims to switch vfs timestamps to use struct timespec64.
   Currently vfs uses struct timespec, which is not y2038 safe.
 
   The series involves the following:
   1. Add vfs helper functions for supporting struct timepec64 timestamps.
   2. Cast prints of vfs timestamps to avoid warnings after the switch.
   3. Simplify code using vfs timestamps so that the actual
      replacement becomes easy.
   4. Convert vfs timestamps to use struct timespec64 using a script.
      This is a flag day patch.
 
   Next steps:
   1. Convert APIs that can handle timespec64, instead of converting
      timestamps at the boundaries.
   2. Update internal data structures to avoid timestamp conversions.
 
 Thomas Gleixner adds:
 
   I think there is no point to drag that out for the next merge window.
   The whole thing needs to be done in one go for the core changes which
   means that you're going to play that catchup game forever. Let's get
   over with it towards the end of the merge window.
 
 [1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf
 MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1
 g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h
 L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+
 Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E
 LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs
 nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G
 wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v
 c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK
 tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/
 WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy
 A3HkjIBrKW5AgQDxfgvm
 =CZX2
 -----END PGP SIGNATURE-----

Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground

Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
 "This is a late set of changes from Deepa Dinamani doing an automated
  treewide conversion of the inode and iattr structures from 'timespec'
  to 'timespec64', to push the conversion from the VFS layer into the
  individual file systems.

  As Deepa writes:

   'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timepec64
       timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
       becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
       This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
       timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

  Thomas Gleixner adds:

   'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"

* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
  pstore: Remove bogus format string definition
  vfs: change inode times to use struct timespec64
  pstore: Convert internal records to timespec64
  udf: Simplify calls to udf_disk_stamp_to_time
  fs: nfs: get rid of memcpys for inode times
  ceph: make inode time prints to be long long
  lustre: Use long long type to print inode time
  fs: add timespec64_truncate()
2018-06-15 07:31:07 +09:00
Kees Cook 6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:
        kcalloc(a * b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Deepa Dinamani 95582b0083 vfs: change inode times to use struct timespec64
struct timespec is not y2038 safe. Transition vfs to use
y2038 safe struct timespec64 instead.

The change was made with the help of the following cocinelle
script. This catches about 80% of the changes.
All the header file and logic changes are included in the
first 5 rules. The rest are trivial substitutions.
I avoid changing any of the function signatures or any other
filesystem specific data structures to keep the patch simple
for review.

The script can be a little shorter by combining different cases.
But, this version was sufficient for my usecase.

virtual patch

@ depends on patch @
identifier now;
@@
- struct timespec
+ struct timespec64
  current_time ( ... )
  {
- struct timespec now = current_kernel_time();
+ struct timespec64 now = current_kernel_time64();
  ...
- return timespec_trunc(
+ return timespec64_trunc(
  ... );
  }

@ depends on patch @
identifier xtime;
@@
 struct \( iattr \| inode \| kstat \) {
 ...
-       struct timespec xtime;
+       struct timespec64 xtime;
 ...
 }

@ depends on patch @
identifier t;
@@
 struct inode_operations {
 ...
int (*update_time) (...,
-       struct timespec t,
+       struct timespec64 t,
...);
 ...
 }

@ depends on patch @
identifier t;
identifier fn_update_time =~ "update_time$";
@@
 fn_update_time (...,
- struct timespec *t,
+ struct timespec64 *t,
 ...) { ... }

@ depends on patch @
identifier t;
@@
lease_get_mtime( ... ,
- struct timespec *t
+ struct timespec64 *t
  ) { ... }

@te depends on patch forall@
identifier ts;
local idexpression struct inode *inode_node;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn_update_time =~ "update_time$";
identifier fn;
expression e, E3;
local idexpression struct inode *node1;
local idexpression struct inode *node2;
local idexpression struct iattr *attr1;
local idexpression struct iattr *attr2;
local idexpression struct iattr attr;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
@@
(
(
- struct timespec ts;
+ struct timespec64 ts;
|
- struct timespec ts = current_time(inode_node);
+ struct timespec64 ts = current_time(inode_node);
)

<+... when != ts
(
- timespec_equal(&inode_node->i_xtime, &ts)
+ timespec64_equal(&inode_node->i_xtime, &ts)
|
- timespec_equal(&ts, &inode_node->i_xtime)
+ timespec64_equal(&ts, &inode_node->i_xtime)
|
- timespec_compare(&inode_node->i_xtime, &ts)
+ timespec64_compare(&inode_node->i_xtime, &ts)
|
- timespec_compare(&ts, &inode_node->i_xtime)
+ timespec64_compare(&ts, &inode_node->i_xtime)
|
ts = current_time(e)
|
fn_update_time(..., &ts,...)
|
inode_node->i_xtime = ts
|
node1->i_xtime = ts
|
ts = inode_node->i_xtime
|
<+... attr1->ia_xtime ...+> = ts
|
ts = attr1->ia_xtime
|
ts.tv_sec
|
ts.tv_nsec
|
btrfs_set_stack_timespec_sec(..., ts.tv_sec)
|
btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
|
- ts = timespec64_to_timespec(
+ ts =
...
-)
|
- ts = ktime_to_timespec(
+ ts = ktime_to_timespec64(
...)
|
- ts = E3
+ ts = timespec_to_timespec64(E3)
|
- ktime_get_real_ts(&ts)
+ ktime_get_real_ts64(&ts)
|
fn(...,
- ts
+ timespec64_to_timespec(ts)
,...)
)
...+>
(
<... when != ts
- return ts;
+ return timespec64_to_timespec(ts);
...>
)
|
- timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
|
- timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
+ timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
|
- timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
|
node1->i_xtime1 =
- timespec_trunc(attr1->ia_xtime1,
+ timespec64_trunc(attr1->ia_xtime1,
...)
|
- attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
+ attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
...)
|
- ktime_get_real_ts(&attr1->ia_xtime1)
+ ktime_get_real_ts64(&attr1->ia_xtime1)
|
- ktime_get_real_ts(&attr.ia_xtime1)
+ ktime_get_real_ts64(&attr.ia_xtime1)
)

@ depends on patch @
struct inode *node;
struct iattr *attr;
identifier fn;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
expression e;
@@
(
- fn(node->i_xtime);
+ fn(timespec64_to_timespec(node->i_xtime));
|
 fn(...,
- node->i_xtime);
+ timespec64_to_timespec(node->i_xtime));
|
- e = fn(attr->ia_xtime);
+ e = fn(timespec64_to_timespec(attr->ia_xtime));
)

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
fn (...,
- &attr->ia_xtime,
+ &ts,
...);
)
...+>
}

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
struct kstat *stat;
identifier ia_xtime =~ "^ia_[acm]time$";
identifier i_xtime =~ "^i_[acm]time$";
identifier xtime =~ "^[acm]time$";
identifier fn, ret;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(stat->xtime);
ret = fn (...,
- &stat->xtime);
+ &ts);
)
...+>
}

@ depends on patch @
struct inode *node;
struct inode *node2;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier i_xtime3 =~ "^i_[acm]time$";
struct iattr *attrp;
struct iattr *attrp2;
struct iattr attr ;
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
struct kstat *stat;
struct kstat stat1;
struct timespec64 ts;
identifier xtime =~ "^[acmb]time$";
expression e;
@@
(
( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
|
 node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 stat->xtime = node2->i_xtime1;
|
 stat1.xtime = node2->i_xtime1;
|
( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
|
( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
|
- e = node->i_xtime1;
+ e = timespec64_to_timespec( node->i_xtime1 );
|
- e = attrp->ia_xtime1;
+ e = timespec64_to_timespec( attrp->ia_xtime1 );
|
node->i_xtime1 = current_time(...);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
 node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
- node->i_xtime1 = e;
+ node->i_xtime1 = timespec_to_timespec64(e);
)

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: <anton@tuxera.com>
Cc: <balbi@kernel.org>
Cc: <bfields@fieldses.org>
Cc: <darrick.wong@oracle.com>
Cc: <dhowells@redhat.com>
Cc: <dsterba@suse.com>
Cc: <dwmw2@infradead.org>
Cc: <hch@lst.de>
Cc: <hirofumi@mail.parknet.co.jp>
Cc: <hubcap@omnibond.com>
Cc: <jack@suse.com>
Cc: <jaegeuk@kernel.org>
Cc: <jaharkes@cs.cmu.edu>
Cc: <jslaby@suse.com>
Cc: <keescook@chromium.org>
Cc: <mark@fasheh.com>
Cc: <miklos@szeredi.hu>
Cc: <nico@linaro.org>
Cc: <reiserfs-devel@vger.kernel.org>
Cc: <richard@nod.at>
Cc: <sage@redhat.com>
Cc: <sfrench@samba.org>
Cc: <swhiteho@redhat.com>
Cc: <tj@kernel.org>
Cc: <trond.myklebust@primarydata.com>
Cc: <tytso@mit.edu>
Cc: <viro@zeniv.linux.org.uk>
2018-06-05 16:57:31 -07:00
Amir Goldstein 01b39dcc95 ovl: use inode_insert5() to hash a newly created inode
Currently, there is a small window where ovl_obtain_alias() can
race with ovl_instantiate() and create two different overlay inodes
with the same underlying real non-dir non-hardlink inode.

The race requires an adversary to guess the file handle of the
yet to be created upper inode and decode the guessed file handle
after ovl_creat_real(), but before ovl_instantiate().
This race does not affect overlay directory inodes, because those
are decoded via ovl_lookup_real() and not with ovl_obtain_alias().

This patch fixes the race, by using inode_insert5() to add a newly
created inode to cache.

If the newly created inode apears to already exist in cache (hashed
by the same real upper inode), we instantiate the dentry with the old
inode and drop the new inode, instead of silently not hashing the new
inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:12 +02:00
Vivek Goyal ac6a52eb65 ovl: Pass argument to ovl_get_inode() in a structure
ovl_get_inode() right now has 5 parameters. Soon this patch series will
add 2 more and suddenly argument list starts looking too long.

Hence pass arguments to ovl_get_inode() in a structure and it looks
little cleaner.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:12 +02:00
Miklos Szeredi b148cba403 ovl: clean up copy-up error paths
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:11 +02:00
Miklos Szeredi dd8ac699ed ovl: return EIO on internal error
EIO better represents an internal error than ENOENT.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:11 +02:00
Al Viro f73cc77c3a ovl: make ovl_create_real() cope with vfs_mkdir() safely
vfs_mkdir() may succeed and leave the dentry passed to it unhashed and
negative.  ovl_create_real() is the last caller breaking when that
happens.

[amir: split re-factoring of ovl_create_temp() to prep patch
       add comment about unhashed dir after mkdir
       add pr_warn() if mkdir succeeds and lookup fails]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:11 +02:00
Amir Goldstein 137ec526a2 ovl: create helper ovl_create_temp()
Also used ovl_create_temp() in ovl_create_index() instead of calling
ovl_do_mkdir() directly, so now all callers of ovl_do_mkdir() are routed
through ovl_create_real(), which paves the way for Al's fix for non-hashed
result from vfs_mkdir().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:11 +02:00
Miklos Szeredi 95a1c8153a ovl: return dentry from ovl_create_real()
Al Viro suggested to simplify callers of ovl_create_real() by
returning the created dentry (or ERR_PTR) from ovl_create_real().

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:11 +02:00
Amir Goldstein 471ec5dcf4 ovl: struct cattr cleanups
* Rename to ovl_cattr

* Fold ovl_create_real() hardlink argument into struct ovl_cattr

* Create macro OVL_CATTR() to initialize struct ovl_cattr from mode

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:10 +02:00
Amir Goldstein 6cf00764b0 ovl: strip debug argument from ovl_do_ helpers
It did not prove to be useful.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:10 +02:00
Amir Goldstein a8b9e0ceed ovl: remove WARN_ON() real inode attributes mismatch
Overlayfs should cope with online changes to underlying layer
without crashing the kernel, which is what xfstest overlay/019
checks.

This test may sometimes trigger WARN_ON() in ovl_create_or_link()
when linking an overlay inode that has been changed on underlying
layer.

Remove those WARN_ON() to prevent the stress test from failing.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:10 +02:00
Miklos Szeredi 4280f74a57 ovl: Kconfig documentation fixes
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-31 11:06:10 +02:00
Amir Goldstein 795939a93e ovl: add support for "xino" mount and config options
With mount option "xino=on", mounter declares that there are enough
free high bits in underlying fs to hold the layer fsid.
If overlayfs does encounter underlying inodes using the high xino
bits reserved for layer fsid, a warning will be emitted and the original
inode number will be used.

The mount option name "xino" goes after a similar meaning mount option
of aufs, but in overlayfs case, the mapping is stateless.

An example for a use case of "xino=on" is when upper/lower is on an xfs
filesystem. xfs uses 64bit inode numbers, but it currently never uses the
upper 8bit for inode numbers exposed via stat(2) and that is not likely to
change in the future without user opting-in for a new xfs feature. The
actual number of unused upper bit is much larger and determined by the xfs
filesystem geometry (64 - agno_log - agblklog - inopblog). That means
that for all practical purpose, there are enough unused bits in xfs
inode numbers for more than OVL_MAX_STACK unique fsid's.

Another use case of "xino=on" is when upper/lower is on tmpfs. tmpfs inode
numbers are allocated sequentially since boot, so they will practially
never use the high inode number bits.

For compatibility with applications that expect 32bit inodes, the feature
can be disabled with "xino=off". The option "xino=auto" automatically
detects underlying filesystem that use 32bit inodes and enables the
feature. The Kconfig option OVERLAY_FS_XINO_AUTO and module parameter of
the same name, determine if the default mode for overlayfs mount is
"xino=auto" or "xino=off".

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein adbf4f7ea8 ovl: consistent d_ino for non-samefs with xino
When overlay layers are not all on the same fs, but all inode numbers
of underlying fs do not use the high 'xino' bits, overlay st_ino values
are constant and persistent.

In that case, relax non-samefs constraint for consistent d_ino and always
iterate non-merge dir using ovl_fill_real() actor so we can remap lower
inode numbers to unique lower fs range.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein 12574a9f4c ovl: consistent i_ino for non-samefs with xino
When overlay layers are not all on the same fs, but all inode numbers
of underlying fs do not use the high 'xino' bits, overlay st_ino values
are constant and persistent.

In that case, set i_ino value to the same value as st_ino for nfsd
readdirplus validator.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein e487d889b7 ovl: constant st_ino for non-samefs with xino
On 64bit systems, when overlay layers are not all on the same fs, but
all inode numbers of underlying fs are not using the high bits, use the
high bits to partition the overlay st_ino address space.  The high bits
hold the fsid (upper fsid is 0).  This way overlay inode numbers are unique
and all inodes use overlay st_dev.  Inode numbers are also persistent
for a given layer configuration.

Currently, our only indication for available high ino bits is from a
filesystem that supports file handles and uses the default encode_fh()
operation, which encodes a 32bit inode number.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein 5148626b80 ovl: allocate anon bdev per unique lower fs
Instead of allocating an anonymous bdev per lower layer, allocate
one anonymous bdev per every unique lower fs that is different than
upper fs.

Every unique lower fs is assigned an fsid > 0 and the number of
unique lower fs are stored in ofs->numlowerfs.

The assigned fsid is stored in the lower layer struct and will be
used also for inode number multiplexing.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein da309e8c05 ovl: factor out ovl_map_dev_ino() helper
A helper for ovl_getattr() to map the values of st_dev and st_ino
according to constant st_ino rules.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Miklos Szeredi 8f35cf51cd ovl: cleanup ovl_update_time()
No need to mess with an alias, the upperdentry can be retrieved directly
from the overlay inode.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Miklos Szeredi 3a291774d1 ovl: add WARN_ON() for non-dir redirect cases
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:49 +02:00
Vivek Goyal 0471a9cdb0 ovl: cleanup setting OVL_INDEX
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:49 +02:00
Vivek Goyal 102b0d11cb ovl: set d->is_dir and d->opaque for last path element
Certain properties in ovl_lookup_data should be set only for the last
element of the path. IOW, if we are calling ovl_lookup_single() for an
absolute redirect, then d->is_dir and d->opaque do not make much sense
for intermediate path elements. Instead set them only if dentry being
lookup is last path element.

As of now we do not seem to be making use of d->opaque if it is set for
a path/dentry in lower. But just define the semantics so that future code
can make use of this assumption.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:49 +02:00
Vivek Goyal e9b77f90cc ovl: Do not check for redirect if this is last layer
If we are looking in last layer, then there should not be any need to
process redirect. redirect information is used only for lookup in next
lower layer and there is no more lower layer to look into. So no need
to process redirects.

IOW, ignore redirects on lowest layer.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:49 +02:00
Amir Goldstein 8b58924ad5 ovl: lookup in inode cache first when decoding lower file handle
When decoding a lower file handle, we need to check if lower file was
copied up and indexed and if it has a whiteout index, we need to check
if this is an unlinked but open non-dir before returning -ESTALE.

To find out if this is an unlinked but open non-dir we need to lookup
an overlay inode in inode cache by lower inode and that requires decoding
the lower file handle before looking in inode cache.

Before this change, if the lower inode turned out to be a directory, we
may have paid an expensive cost to reconnect that lower directory for
nothing.

After this change, we start by decoding a disconnected lower dentry and
using the lower inode for looking up an overlay inode in inode cache.
If we find overlay inode and dentry in cache, we avoid the index lookup
overhead. If we don't find an overlay inode and dentry in cache, then we
only need to decode a connected lower dentry in case the lower dentry is
a non-indexed directory.

The xfstests group overlay/exportfs tests decoding overlayfs file
handles after drop_caches with different states of the file at encode
and decode time. Overall the tests in the group call ovl_lower_fh_to_d()
89 times to decode a lower file handle.

Before this change, the tests called ovl_get_index_fh() 75 times and
reconnect_one() 61 times.
After this change, the tests call ovl_get_index_fh() 70 times and
reconnect_one() 59 times. The 2 cases where reconnect_one() was avoided
are cases where a non-upper directory file handle was encoded, then the
directory removed and then file handle was decoded.

To demonstrate the affect on decoding file handles with hot inode/dentry
cache, the drop_caches call in the tests was disabled. Without
drop_caches, there are no reconnect_one() calls at all before or after
the change. Before the change, there are 75 calls to ovl_get_index_fh(),
exactly as the case with drop_caches. After the change, there are only
10 calls to ovl_get_index_fh().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:49 +02:00
Amir Goldstein 8a22efa15b ovl: do not try to reconnect a disconnected origin dentry
On lookup of non directory, we try to decode the origin file handle
stored in upper inode. The origin file handle is supposed to be decoded
to a disconnected non-dir dentry, which is fine, because we only need
the lower inode of a copy up origin.

However, if the origin file handle somehow turns out to be a directory
we pay the expensive cost of reconnecting the directory dentry, only to
get a mismatch file type and drop the dentry.

Optimize this case by explicitly opting out of reconnecting the dentry.
Opting-out of reconnect is done by passing a NULL acceptable callback
to exportfs_decode_fh().

While the case described above is a strange corner case that does not
really need to be optimized, the API added for this optimization will
be used by a following patch to optimize a more common case of decoding
an overlayfs file handle.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:49 +02:00
Amir Goldstein 5b2cccd32c ovl: disambiguate ovl_encode_fh()
Rename ovl_encode_fh() to ovl_encode_real_fh() to differentiate from the
exportfs function ovl_encode_inode_fh() and change the latter to
ovl_encode_fh() to match the exportfs method name.

Rename ovl_decode_fh() to ovl_decode_real_fh() for consistency.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:49 +02:00
Amir Goldstein 9f99e50d46 ovl: set lower layer st_dev only if setting lower st_ino
For broken hardlinks, we do not return lower st_ino, so we should
also not return lower pseudo st_dev.

Fixes: a0c5ad307a ("ovl: relax same fs constraint for constant st_ino")
Cc: <stable@vger.kernel.org> #v4.15
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:49 +02:00
Amir Goldstein 3ec9b3fafc ovl: fix lookup with middle layer opaque dir and absolute path redirects
As of now if we encounter an opaque dir while looking for a dentry, we set
d->last=true. This means that there is no need to look further in any of
the lower layers. This works fine as long as there are no redirets or
relative redircts. But what if there is an absolute redirect on the
children dentry of opaque directory. We still need to continue to look into
next lower layer. This patch fixes it.

Here is an example to demonstrate the issue. Say you have following setup.

upper:  /redirect (redirect=/a/b/c)
lower1: /a/[b]/c       ([b] is opaque) (c has absolute redirect=/a/b/d/)
lower0: /a/b/d/foo

Now "redirect" dir should merge with lower1:/a/b/c/ and lower0:/a/b/d.
Note, despite the fact lower1:/a/[b] is opaque, we need to continue to look
into lower0 because children c has an absolute redirect.

Following is a reproducer.

Watch me make foo disappear:

 $ mkdir lower middle upper work work2 merged
 $ mkdir lower/origin
 $ touch lower/origin/foo
 $ mount -t overlay none merged/ \
         -olowerdir=lower,upperdir=middle,workdir=work2
 $ mkdir merged/pure
 $ mv merged/origin merged/pure/redirect
 $ umount merged
 $ mount -t overlay none merged/ \
         -olowerdir=middle:lower,upperdir=upper,workdir=work
 $ mv merged/pure/redirect merged/redirect

Now you see foo inside a twice redirected merged dir:

 $ ls merged/redirect
 foo
 $ umount merged
 $ mount -t overlay none merged/ \
         -olowerdir=middle:lower,upperdir=upper,workdir=work

After mount cycle you don't see foo inside the same dir:

 $ ls merged/redirect

During middle layer lookup, the opaqueness of middle/pure is left in
the lookup state and then middle/pure/redirect is wrongly treated as
opaque.

Fixes: 02b69b284c ("ovl: lookup redirects")
Cc: <stable@vger.kernel.org> #v4.10
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:48 +02:00
Vivek Goyal 452061fd45 ovl: Set d->last properly during lookup
d->last signifies that this is the last layer we are looking into and there
is no more. And that means this allows for some optimzation opportunities
during lookup. For example, in ovl_lookup_single() we don't have to check
for opaque xattr of a directory is this is the last layer we are looking
into (d->last = true).

But knowing for sure whether we are looking into last layer can be very
tricky. If redirects are not enabled, then we can look at poe->numlower and
figure out if the lookup we are about to is last layer or not. But if
redircts are enabled then it is possible poe->numlower suggests that we are
looking in last layer, but there is an absolute redirect present in found
element and that redirects us to a layer in root and that means lookup will
continue in lower layers further.

For example, consider following.

/upperdir/pure (opaque=y)
/upperdir/pure/foo (opaque=y,redirect=/bar)
/lowerdir/bar

In this case pure is "pure upper". When we look for "foo", that time
poe->numlower=0. But that alone does not mean that we will not search for a
merge candidate in /lowerdir. Absolute redirect changes that.

IOW, d->last should not be set just based on poe->numlower if redirects are
enabled. That can lead to setting d->last while it should not have and that
means we will not check for opaque xattr while we should have.

So do this.

 - If redirects are not enabled, then continue to rely on poe->numlower
   information to determine if it is last layer or not.

 - If redirects are enabled, then set d->last = true only if this is the
   last layer in root ovl_entry (roe).

Suggested-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 02b69b284c ("ovl: lookup redirects")
Cc: <stable@vger.kernel.org> #v4.10
2018-04-12 12:04:48 +02:00
Amir Goldstein 695b46e76b ovl: set i_ino to the value of st_ino for NFS export
Eddie Horng reported that readdir of an overlayfs directory that
was exported via NFSv3 returns entries with d_type set to DT_UNKNOWN.
The reason is that while preparing the response for readdirplus, nfsd
checks inside encode_entryplus_baggage() that a child dentry's inode
number matches the value of d_ino returns by overlayfs readdir iterator.

Because the overlayfs inodes use arbitrary inode numbers that are not
correlated with the values of st_ino/d_ino, NFSv3 falls back to not
encoding d_type. Although this is an allowed behavior, we can fix it for
the case of all overlayfs layers on the same underlying filesystem.

When NFS export is enabled and d_ino is consistent with st_ino
(samefs), set the same value also to i_ino in ovl_fill_inode() for all
overlayfs inodes, nfsd readdirplus sanity checks will pass.
ovl_fill_inode() may be called from ovl_new_inode(), before real inode
was created with ino arg 0. In that case, i_ino will be updated to real
upper inode i_ino on ovl_inode_init() or ovl_inode_update().

Reported-by: Eddie Horng <eddiehorng.tw@gmail.com>
Tested-by: Eddie Horng <eddiehorng.tw@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Fixes: 8383f17488 ("ovl: wire up NFS export operations")
Cc: <stable@vger.kernel.org> #v4.16
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:48 +02:00
Miklos Szeredi 36cd95dfa1 ovl: update Kconfig texts
Add some hints about overlayfs kernel config options.

Enabling NFS export by default is especially recommended against, as it
incurs a performance penalty even if the filesystem is not actually
exported.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-03-07 11:47:15 +01:00
Vivek Goyal d1fe96c0e4 ovl: redirect_dir=nofollow should not follow redirect for opaque lower
redirect_dir=nofollow should not follow a redirect. But in a specific
configuration it can still follow it.  For example try this.

$ mkdir -p lower0 lower1/foo upper work merged
$ touch lower1/foo/lower-file.txt
$ setfattr -n "trusted.overlay.opaque" -v "y" lower1/foo
$ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=on none merged
$ cd merged
$ mv foo foo-renamed
$ umount merged

# mount again. This time with redirect_dir=nofollow
$ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=nofollow none merged
$ ls merged/foo-renamed/
# This lists lower-file.txt, while it should not have.

Basically, we are doing redirect check after we check for d.stop. And
if this is not last lower, and we find an opaque lower, d.stop will be
set.

ovl_lookup_single()
        if (!d->last && ovl_is_opaquedir(this)) {
                d->stop = d->opaque = true;
                goto out;
        }

To fix this, first check redirect is allowed. And after that check if
d.stop has been set or not.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Fixes: 438c84c2f0 ("ovl: don't follow redirects if redirect_dir=off")
Cc: <stable@vger.kernel.org> #v4.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-26 16:55:51 +01:00
Fengguang Wu b5095f24e7 ovl: fix ptr_ret.cocci warnings
fs/overlayfs/export.c:459:10-16: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Fixes: 4b91c30a5a ("ovl: lookup connected ancestor of dir in inode cache")
CC: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-26 12:45:20 +01:00
Amir Goldstein 7168179fcf ovl: check ERR_PTR() return value from ovl_lookup_real()
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 0617015403 ("ovl: lookup indexed ancestor of lower dir")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-16 15:53:20 +01:00
Amir Goldstein 2ca3c148a0 ovl: check lower ancestry on encode of lower dir file handle
This change relaxes copy up on encode of merge dir with lower layer > 1
and handles the case of encoding a merge dir with lower layer 1, where an
ancestor is a non-indexed merge dir. In that case, decode of the lower
file handle will not have been possible if the non-indexed ancestor is
redirected before or after encode.

Before encoding a non-upper directory file handle from real layer N, we
need to check if it will be possible to reconnect an overlay dentry from
the real lower decoded dentry. This is done by following the overlay
ancestry up to a "layer N connected" ancestor and verifying that all
parents along the way are "layer N connectable". If an ancestor that is
NOT "layer N connectable" is found, we need to copy up an ancestor, which
is "layer N connectable", thus making that ancestor "layer N connected".
For example:

 layer 1: /a
 layer 2: /a/b/c

The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is
copied up and renamed, upper dir /a will be indexed by lower dir /a from
layer 1. The dir /a from layer 2 will never be indexed, so the algorithm
in ovl_lookup_real_ancestor() (*) will not be able to lookup a connected
overlay dentry from the connected lower dentry /a/b/c.

To avoid this problem on decode time, we need to copy up an ancestor of
/a/b/c, which is "layer 2 connectable", on encode time. That ancestor is
/a/b. After copy up (and index) of /a/b, it will become "layer 2 connected"
and when the time comes to decode the file handle from lower dentry /a/b/c,
ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding
a connected overlay dentry will be accomplished.

(*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup
an entry /a in the lower layers above layer N and find the indexed dir /a
from layer 1. If that improvement is made, then the check for "layer N
connected" will need to verify there are no redirects in lower layers above
layer N. In the example above, /a will be "layer 2 connectable". However,
if layer 2 dir /a is a target of a layer 1 redirect, then /a will NOT be
"layer 2 connectable":

 layer 1: /A (redirect = /a)
 layer 2: /a/b/c

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-16 15:53:20 +01:00
Amir Goldstein 764baba801 ovl: hash non-dir by lower inode for fsnotify
Commit 31747eda41 ("ovl: hash directory inodes for fsnotify")
fixed an issue of inotify watch on directory that stops getting
events after dropping dentry caches.

A similar issue exists for non-dir non-upper files, for example:

$ mkdir -p lower upper work merged
$ touch lower/foo
$ mount -t overlay -o
lowerdir=lower,workdir=work,upperdir=upper none merged
$ inotifywait merged/foo &
$ echo 2 > /proc/sys/vm/drop_caches
$ cat merged/foo

inotifywait doesn't get the OPEN event, because ovl_lookup() called
from 'cat' allocates a new overlay inode and does not reuse the
watched inode.

Fix this by hashing non-dir overlay inodes by lower real inode in
the following cases that were not hashed before this change:
 - A non-upper overlay mount
 - A lower non-hardlink when index=off

A helper ovl_hash_bylower() was added to put all the logic and
documentation about which real inode an overlay inode is hashed by
into one place.

The issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.

Cc: <stable@vger.kernel.org> #v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-16 15:53:20 +01:00
Amir Goldstein 9b6faee074 ovl: check ERR_PTR() return value from ovl_encode_fh()
Another fix for an issue reported by 0-day robot.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 8ed5eec9d6 ("ovl: encode pure upper file handles")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-05 09:50:29 +01:00
Amir Goldstein 2aed489d16 ovl: fix regression in fsnotify of overlay merge dir
A re-factoring patch in NFS export series has passed the wrong argument
to ovl_get_inode() causing a regression in the very recent fix to
fsnotify of overlay merge dir.

The regression has caused merge directory inodes to be hashed by upper
instead of lower real inode, when NFS export and directory indexing is
disabled. That caused an inotify watch to become obsolete after directory
copy up and drop caches.

LTP test inotify07 was improved to catch this regression.
The regression also caused multiple redirect dirs to same origin not to
be detected on lookup with NFS export disabled. An xfstest was added to
cover this case.

Fixes: 0aceb53e73 ("ovl: do not pass overlay dentry to ovl_get_inode()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-05 09:50:29 +01:00
Amir Goldstein 8383f17488 ovl: wire up NFS export operations
Now that NFS export operations are implemented, enable overlayfs NFS
export support if the "nfs_export" feature is enabled.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:06 +01:00
Amir Goldstein 0617015403 ovl: lookup indexed ancestor of lower dir
ovl_lookup_real() in lower layer walks back lower parents to find the
topmost indexed parent. If an indexed ancestor is found before reaching
lower layer root, ovl_lookup_real() is called recursively with upper
layer to walk back from indexed upper to the topmost connected/hashed
upper parent (or up to root).

ovl_lookup_real() in upper layer then walks forward to connect the topmost
upper overlay dir dentry and ovl_lookup_real() in lower layer continues to
walk forward to connect the decoded lower overlay dir dentry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:05 +01:00
Amir Goldstein 4b91c30a5a ovl: lookup connected ancestor of dir in inode cache
Decoding a dir file handle requires walking backward up to layer root and
for lower dir also checking the index to see if any of the parents have
been copied up.

Lookup overlay ancestor dentry in inode/dentry cache by decoded real
parents to shortcut looking up all the way back to layer root.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:05 +01:00
Amir Goldstein 7a9dadef96 ovl: hash non-indexed dir by upper inode for NFS export
Non-indexed upper dirs are encoded as upper file handles. When NFS export
is enabled, hash non-indexed directory inodes by upper inode, so we can
find them in inode cache using the decoded upper inode.

When NFS export is disabled, directories are not indexed on copy up, so
hash non-indexed directory inodes by origin inode, the same hash key
that is used before copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:04 +01:00
Amir Goldstein 988925164f ovl: decode pure lower dir file handles
Similar to decoding a pure upper dir file handle, decoding a pure lower
dir file handle is implemented by looking an overlay dentry of the same
path as the pure lower path and verifying that the overlay dentry's
real lower matches the decoded real lower file handle.

Unlike the case of upper dir file handle, the lookup of overlay path by
lower real path can fail or find a mismatched overlay dentry if any of
the lower parents have been copied up and renamed. To address this case
we will need to check if any of the lower parents are indexed.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:04 +01:00
Amir Goldstein 3b0bfc6ed3 ovl: decode indexed dir file handles
Decoding an indexed dir file handle is done by looking up the file handle
in index dir by name and then decoding the upper dir from the index origin
file handle. The decoded upper path is used to lookup an overlay dentry of
the same path.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:03 +01:00
Amir Goldstein 9436a1a339 ovl: decode lower file handles of unlinked but open files
Lookup overlay inode in cache by origin inode, so we can decode a file
handle of an open file even if the index has a whiteout index entry to
mark this overlay inode was unlinked.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:03 +01:00
Amir Goldstein f71bd9cfb6 ovl: decode indexed non-dir file handles
Decoding an indexed non-dir file handle is similar to decoding a lower
non-dir file handle, but additionally, we lookup the file handle in index
dir by name to find the real upper inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:03 +01:00
Amir Goldstein f941866fc4 ovl: decode lower non-dir file handles
Decoding a lower non-dir file handle is done by decoding the lower dentry
from underlying lower fs, finding or allocating an overlay inode that is
hashed by the real lower inode and instantiating an overlay dentry with
that inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:02 +01:00
Amir Goldstein 03e1c584ff ovl: encode lower file handles
For indexed or lower non-dir, encode a non-connectable lower file handle
from origin inode. For indexed or lower dir, when ofs->numlower == 1,
encode a lower file handle from lower dir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:02 +01:00
Amir Goldstein 05e1f11816 ovl: copy up before encoding non-connectable dir file handle
Decoding a merge dir, whose origin's parent is under a redirected
lower dir is not always possible. As a simple aproximation, we do
not encode lower dir file handles when overlay has multiple lower
layers and origin is below the topmost lower layer.

We should later relax this condition and copy up only the parent
that is under a redirected lower.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:01 +01:00
Amir Goldstein b305e8443f ovl: encode non-indexed upper file handles
We only need to encode origin if there is a chance that the same object was
encoded pre copy up and then we need to stay consistent with the same
encoding also after copy up.

In case a non-pure upper is not indexed, then it was copied up before NFS
export support was enabled. In that case, we don't need to worry about
staying consistent with pre copy up encoding and we encode an upper file
handle.

This mitigates the problem that with no index, we cannot find an upper
inode from origin inode, so we cannot decode a non-indexed upper from
origin file handle.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:01 +01:00
Amir Goldstein 3985b70a3e ovl: decode connected upper dir file handles
Until this change, we decoded upper file handles by instantiating an
overlay dentry from the real upper dentry. This is sufficient to handle
pure upper files, but insufficient to handle merge/impure dirs.

To that end, if decoded real upper dir is connected and hashed, we
lookup an overlay dentry with the same path as the real upper dir.
If decoded real upper is non-dir, we instantiate a disconnected overlay
dentry as before this change.

Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
exportfs never needs to call get_parent() and get_name() to reconnect an
upper overlay dir. Because connectable non-dir file handles are not
supported, exportfs will not be able to use fh_to_parent() and get_name()
methods to reconnect a disconnected non-dir to its parent. Therefore, the
methods get_parent() and get_name() are implemented just to print out a
sanity warning and the method fh_to_parent() is implemented to warn the
user that using the 'subtree_check' exportfs option is not supported.

An alternative approach could have been to implement instantiating of
an overlay directory inode from origin/index and implement get_parent()
and get_name() by calling into underlying fs operations and them
instantiating the overlay parent dir.

The reasons for not choosing the get_parent() approach were:
- Obtaining a disconnected overlay dir dentry would requires a
  delicate re-factoring of ovl_lookup() to get a dentry with overlay
  parent info. It was preferred to avoid doing that re-factoring unless
  it was proven worthy.
- Going down the path of disconnected dir would mean that the (non
  trivial) code path of d_splice_alias() could be traveled and that
  meant writing more tests and introduces race cases that are very hard
  to hit on purpose. Taking the path of connecting overlay dentry by
  forward lookup is therefore the safe and boring way to avoid surprises.

The culprits of the chosen "connected overlay dentry" approach:
- We need to take special care to rename of ancestors while connecting
  the overlay dentry by real dentry path. These subtleties are usually
  handled by generic exportfs and VFS code.
- In a hypothetical workload, we could end up in a loop trying to connect,
  interrupted by rename and restarting connect forever.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:00 +01:00
Amir Goldstein 8556a4205b ovl: decode pure upper file handles
Decoding an upper file handle is done by decoding the upper dentry from
underlying upper fs, finding or allocating an overlay inode that is
hashed by the real upper inode and instantiating an overlay dentry with
that inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:00 +01:00
Amir Goldstein 8ed5eec9d6 ovl: encode pure upper file handles
Encode overlay file handles as struct ovl_fh containing the file handle
encoding of the real upper inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:59 +01:00
Amir Goldstein c62520a83b ovl: store 'has_upper' and 'opaque' as bit flags
We need to make some room in struct ovl_entry to store information
about redirected ancestors for NFS export, so cram two booleans as
bit flags.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:58 +01:00
Amir Goldstein aa3ff3c152 ovl: copy up of disconnected dentries
With NFS export, some operations on decoded file handles (e.g. open,
link, setattr, xattr_set) may call copy up with a disconnected non-dir.
In this case, we will copy up lower inode to index dir without
linking it to upper dir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:58 +01:00
Amir Goldstein 829c28be9b ovl: use d_splice_alias() in place of d_add() in lookup
This is required for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:57 +01:00
Amir Goldstein 0aceb53e73 ovl: do not pass overlay dentry to ovl_get_inode()
This is needed for using ovl_get_inode() for decoding file handles
for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:57 +01:00
Amir Goldstein 91ffe7beb3 ovl: factor out ovl_get_index_fh() helper
The helper is needed to lookup an index by file handle for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:56 +01:00
Amir Goldstein 24f0b17203 ovl: whiteout orphan index entries on mount
Orphan index entries are non-dir index entries whose union nlink count
dropped to zero. With index=on, orphan index entries are removed on
mount. With NFS export feature enabled, orphan index entries are replaced
with white out index entries to block future open by handle from opening
the lower file.

When dir index has a stale 'upper' xattr, we assume that the upper dir
was removed and we treat the dir index as orphan entry that needs to be
whited out or removed.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:56 +01:00
Amir Goldstein e7dd0e7134 ovl: whiteout index when union nlink drops to zero
With NFS export feature enabled, when overlay inode nlink drops to
zero, instead of removing the index entry, replace it with a whiteout
index entry.

This is needed for NFS export in order to prevent future open by handle
from opening the lower file directly.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:56 +01:00
Amir Goldstein 89a17556ce ovl: cleanup dir index when dir nlink drops to zero
When non-dir index union nlink drops to zero the non-dir index
is cleaned. Do the same for directory type index entries when
union directory is removed.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:55 +01:00
Amir Goldstein 016b720f55 ovl: index directories on copy up for NFS export
With the NFS export feature enabled, all dirs are indexed on copy up.
Non-dir files are copied up directly to indexdir and then hardlinked
to upper dir.

Directories are copied up to indexdir, then an index entry is created
in indexdir with 'upper' xattr pointing to the copied up dir and then
the copied up dir is moved to upper dir.

Directory index is also used for consistency verification, like
detecting multiple redirected dirs to the same lower dir on lookup.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:55 +01:00
Amir Goldstein fbd2d2074b ovl: index all non-dir on copy up for NFS export
With the NFS export feature enabled, all non-dir are indexed on copy up.
The copy up origin inode of an indexed non-dir can be used as a unique
identifier of the overlay object.

The full index is also used for consistency verfication, like detecting
multiple non-hardlink uppers with the same 'origin' on lookup.

Directory index on copy up will be implemented by following patch.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:54 +01:00
Amir Goldstein 24b33ee104 ovl: create ovl_need_index() helper
The helper determines which lower file needs to be indexed
on copy up and before nlink changes.

For index=on, the helper evaluates to true for lower hardlinks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:54 +01:00
Amir Goldstein 9ee60ce249 ovl: cleanup temp index entries
A previous failed attempt to create or whiteout a directory index may
leave index entries named '#%x' in the index dir. Cleanup those temp
entries on mount instead of failing the mount.

In the future, we may drop 'work' dir and use 'index' dir instead.
This change is enough for cleaning up copy up leftovers 'from the future',
but it is not enough for cleaning up rmdir leftovers 'from the future'
(i.e. temp dir containing whiteouts).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:53 +01:00
Amir Goldstein e8f9e5b780 ovl: verify directory index entries on mount
Directory index entries should have 'upper' xattr pointing to the real
upper dir. Verifying that the upper dir file handle is not stale is
expensive, so only verify stale directory index entries on mount if
NFS export feature is enabled.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:53 +01:00
Amir Goldstein 7db25d36d9 ovl: verify whiteout index entries on mount
Whiteout index entries are used as an indication that an exported
overlay file handle should be treated as stale (i.e. after unlink
of the overlay inode).

Check on mount that whiteout index entries have a name that looks like
a valid file handle and cleanup invalid index entries.

For whiteout index entries, do not check that they also have valid
origin fh and nlink xattr, because those xattr do not exist for a
whiteout index entry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:53 +01:00
Amir Goldstein ad1d615cec ovl: use directory index entries for consistency verification
A directory index is a directory type entry in index dir with a
"trusted.overlay.upper" xattr containing an encoded ovl_fh of the merge
directory upper dir inode.

On lookup of non-dir files, lower file is followed by origin file handle.
On lookup of dir entries, lower dir is found by name and then compared
to origin file handle. We only trust dir index if we verified that lower
dir matches origin file handle, otherwise index may be inconsistent and
we ignore it.

If we find an indexed non-upper dir or an indexed merged dir, whose
index 'upper' xattr points to a different upper dir, that means that the
lower directory may be also referenced by another upper dir via redirect,
so we fail the lookup on inconsistency error.

To be consistent with directory index entries format, the association of
index dir to upper root dir, that was stored by older kernels in
"trusted.overlay.origin" xattr is now stored in "trusted.overlay.upper"
xattr. This also serves as an indication that overlay was mounted with a
kernel that support index directory entries. For backward compatibility,
if an 'origin' xattr exists on the index dir we also verify it on mount.

Directory index entries are going to be used for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:52 +01:00
Amir Goldstein 86eaa13046 ovl: unbless lower st_ino of unverified origin
On a malformed overlay, several redirected dirs can point to the same
dir on a lower layer. This presents a similar challenge as broken
hardlinks, because different objects in the overlay can return the same
st_ino/st_dev pair from stat(2).

For broken hardlinks, we do not provide constant st_ino on copy up to
avoid this inconsistency. When NFS export feature is enabled, apply
the same logic to files and directories with unverified lower origin.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:52 +01:00
Amir Goldstein 37b12916c0 ovl: verify stored origin fh matches lower dir
When the NFS export feature is enabled, overlayfs implicitly enables the
feature "verify_lower". When the "verify_lower" feature is enabled, a
directory inode found in lower layer by name or by redirect_dir is
verified against the file handle of the copy up origin that is stored in
the upper layer.

This introduces a change of behavior for the case of lower layer
modification while overlay is offline. A lower directory created or
moved offline under an exisitng upper directory, will not be merged with
that upper directory.

The NFS export feature should not be used after copying layers, because
the new lower directory inodes would fail verification and won't be
merged with upper directories.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:51 +01:00
Amir Goldstein f168f1098d ovl: add support for "nfs_export" configuration
Introduce the "nfs_export" config, module and mount options.

The NFS export feature depends on the "index" feature and enables two
implicit overlayfs features: "index_all" and "verify_lower".
The "index_all" feature creates an index on copy up of every file and
directory. The "verify_lower" feature uses the full index to detect
overlay filesystems inconsistencies on lookup, like redirect from
multiple upper dirs to the same lower dir.

NFS export can be enabled for non-upper mount with no index. However,
because lower layer redirects cannot be verified with the index, enabling
NFS export support on an overlay with no upper layer requires turning off
redirect follow (e.g. "redirect_dir=nofollow").

The full index may incur some overhead on mount time, especially when
verifying that lower directory file handles are not stale.

NFS export support, full index and consistency verification will be
implemented by following patches.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:37 +01:00
Amir Goldstein 60b866420b ovl: update documentation of inodes index feature
Document that inode index feature solves breaking hard links on
copy up.

Simplify Kconfig backward compatibility disclaimer.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 10:20:02 +01:00
Amir Goldstein 051224438a ovl: generalize ovl_verify_origin() and helpers
Remove the "origin" language from the functions that handle set, get
and verify of "origin" xattr and pass the xattr name as an argument.

The same helpers are going to be used for NFS export to get, get and
verify the "upper" xattr for directory index entries.

ovl_verify_origin() is now a helper used only to verify non upper
file handle stored in "origin" xattr of upper inode.

The upper root dir file handle is still stored in "origin" xattr on
the index dir for backward compatibility. This is going to be changed
by the patch that adds directory index entries support.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 10:19:54 +01:00
Amir Goldstein 1eff1a1dee ovl: simplify arguments to ovl_check_origin_fh()
Pass the fs instance with lower_layers array instead of the dentry
lowerstack array to ovl_check_origin_fh(), because the dentry members
of lowerstack play no role in this helper.

This change simplifies the argument list of ovl_check_origin(),
ovl_cleanup_index() and ovl_verify_index().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 10:19:46 +01:00
Amir Goldstein 2e1a532883 ovl: factor out ovl_check_origin_fh()
Re-factor ovl_check_origin() and ovl_get_origin(), so origin fh xattr is
read from upper inode only once during lookup with multiple lower layers
and only once when verifying index entry origin.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 10:19:35 +01:00
Amir Goldstein d583ed7d13 ovl: store layer index in ovl_layer
Store the fs root layer index inside ovl_layer struct, so we can
get the root fs layer index from merge dir lower layer instead of
find it with ovl_find_layer() helper.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 10:19:25 +01:00
Amir Goldstein 972d0093c2 ovl: force r/o mount when index dir creation fails
When work dir creation fails, a warning is emitted and overlay is
mounted r/o. Trying to remount r/w will fail with no work dir.

When index dir creation fails, the same warning is emitted and overlay
is mounted r/o, but trying to remount r/w will succeed. This may cause
unintentional corruption of filesystem consistency.

Adjust the behavior of index dir creation failure to that of work dir
creation failure and do not allow to remount r/w. User needs to state
an explicitly intention to work without an index by mounting with
option 'index=off' to allow r/w mount with no index dir.

When mounting with option 'index=on' and no 'upperdir', index is
implicitly disabled, so do not warn about no file handle support.

The issue was introduced with inodes index feature in v4.13, but this
patch will not apply cleanly before ovl_fill_super() re-factoring in
v4.15.

Fixes: 02bcd15774 ("ovl: introduce the inodes index dir feature")
Cc: <stable@vger.kernel.org> #v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 10:19:14 +01:00
Amir Goldstein a683737ba9 ovl: disable index when no xattr support
Overlayfs falls back to index=off if lower/upper fs does not support
file handles. Do the same if upper fs does not support xattr.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 10:19:07 +01:00
Amir Goldstein 9678e63030 ovl: fix inconsistent d_ino for legacy merge dir
For a merge dir that was copied up before v4.12 or that was hand crafted
offline (e.g. mkdir {upper/lower}/dir), upper dir does not contain the
'trusted.overlay.origin' xattr.  In that case, stat(2) on the merge dir
returns the lower dir st_ino, but getdents(2) returns the upper dir d_ino.

After this change, on merge dir lookup, missing origin xattr on upper
dir will be fixed and 'impure' xattr will be fixed on parent of the legacy
merge dir.

Suggested-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 10:18:19 +01:00
Amir Goldstein a5a927a7c8 ovl: take mnt_want_write() for removing impure xattr
The optimization in ovl_cache_get_impure() that tries to remove an
unneeded "impure" xattr needs to take mnt_want_write() on upper fs.

Fixes: 4edb83bb10 ("ovl: constant d_ino for non-merge dirs")
Cc: <stable@vger.kernel.org> #v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-19 17:43:24 +01:00
Amir Goldstein 2ba9d57e65 ovl: take mnt_want_write() for work/index dir setup
There are several write operations on upper fs not covered by
mnt_want_write():

- test set/remove OPAQUE xattr
- test create O_TMPFILE
- set ORIGIN xattr in ovl_verify_origin()
- cleanup of index entries in ovl_indexdir_cleanup()

Some of these go way back, but this patch only applies over the
v4.14 re-factoring of ovl_fill_super().

Cc: <stable@vger.kernel.org> #v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-19 17:43:24 +01:00
Amir Goldstein f81678173c ovl: fix another overlay: warning prefix
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-19 17:43:24 +01:00
Amir Goldstein 6d0a8a90a5 ovl: take lower dir inode mutex outside upper sb_writers lock
The functions ovl_lower_positive() and ovl_check_empty_dir() both take
inode mutex on the real lower dir under ovl_want_write() which takes
the upper_mnt sb_writers lock.

While this is not a clear locking order or layering violation, it creates
an undesired lock dependency between two unrelated layers for no good
reason.

This lock dependency materializes to a false(?) positive lockdep warning
when calling rmdir() on a nested overlayfs, where both nested and
underlying overlayfs both use the same fs type as upper layer.

rmdir() on the nested overlayfs creates the lock chain:
  sb_writers of upper_mnt (e.g. tmpfs) in ovl_do_remove()
  ovl_i_mutex_dir_key[] of lower overlay dir in ovl_lower_positive()

rmdir() on the underlying overlayfs creates the lock chain in
reverse order:
  ovl_i_mutex_dir_key[] of lower overlay dir in vfs_rmdir()
  sb_writers of nested upper_mnt (e.g. tmpfs) in ovl_do_remove()

To rid of the unneeded locking dependency, move both ovl_lower_positive()
and ovl_check_empty_dir() to before ovl_want_write() in rmdir() and
rename() implementation.

This change spreads the pieces of ovl_check_empty_and_clear() directly
inside the rmdir()/rename() implementations so the helper is no longer
needed and removed.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-19 17:43:23 +01:00
Amir Goldstein d796e77f1d ovl: fix failure to fsync lower dir
As a writable mount, it is not expected for overlayfs to return
EINVAL/EROFS for fsync, even if dir/file is not changed.

This commit fixes the case of fsync of directory, which is easier to
address, because overlayfs already implements fsync file operation for
directories.

The problem reported by Raphael is that new PostgreSQL 10.0 with a
database in overlayfs where lower layer in squashfs fails to start.
The failure is due to fsync error, when PostgreSQL does fsync on all
existing db directories on startup and a specific directory exists
lower layer with no changes.

Reported-by: Raphael Hertzog <raphael@ouaza.com>
Cc: <stable@vger.kernel.org> # v3.18
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Raphaƫl Hertzog <hertzog@debian.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-19 13:54:33 +01:00
Amir Goldstein 31747eda41 ovl: hash directory inodes for fsnotify
fsnotify pins a watched directory inode in cache, but if directory dentry
is released, new lookup will allocate a new dentry and a new inode.
Directory events will be notified on the new inode, while fsnotify listener
is watching the old pinned inode.

Hash all directory inodes to reuse the pinned inode on lookup. Pure upper
dirs are hashes by real upper inode, merge and lower dirs are hashed by
real lower inode.

The reference to lower inode was being held by the lower dentry object
in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry
may drop lower inode refcount to zero. Add a refcount on behalf of the
overlay inode to prevent that.

As a by-product, hashing directory inodes also detects multiple
redirected dirs to the same lower dir and uncovered redirected dir
target on and returns -ESTALE on lookup.

The reported issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.

Cc: <stable@vger.kernel.org> #v4.13
Reported-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Niklas Cassel <niklas.cassel@axis.com>
2018-01-19 13:54:33 +01:00
Amir Goldstein da2e6b7eed ovl: fix overlay: warning prefix
Conform two stray warning messages to the standard overlayfs: prefix.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-14 11:14:52 +01:00
Vasyl Gomonovych 7879cb43f9 ovl: Use PTR_ERR_OR_ZERO()
Fix ptr_ret.cocci warnings:
fs/overlayfs/overlayfs.h:179:11-17: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11 11:28:11 +01:00
Chengguang Xu e8d4bfe3a7 ovl: Sync upper dirty data when syncing overlayfs
When executing filesystem sync or umount on overlayfs,
dirty data does not get synced as expected on upper filesystem.
This patch fixes sync filesystem method to keep data consistency
for overlayfs.

Signed-off-by: Chengguang Xu <cgxu@mykernel.net>
Fixes: e593b2bf51 ("ovl: properly implement sync_filesystem()")
Cc: <stable@vger.kernel.org> #4.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11 11:28:11 +01:00
Amir Goldstein b02a16e641 ovl: update ctx->pos on impure dir iteration
This fixes a regression with readdir of impure dir in overlayfs
that is shared to VM via 9p fs.

Reported-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Fixes: 4edb83bb10 ("ovl: constant d_ino for non-merge dirs")
Cc: <stable@vger.kernel.org> #4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11 11:28:11 +01:00
Vivek Goyal 08d8f8a5b0 ovl: Pass ovl_get_nlink() parameters in right order
Right now we seem to be passing index as "lowerdentry" and origin.dentry
as "upperdentry". IIUC, we should pass these parameters in reversed order
and this looks like a bug.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Fixes: caf70cb2ba ("ovl: cleanup orphan index entries")
Cc: <stable@vger.kernel.org> #v4.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11 11:28:10 +01:00
Miklos Szeredi 438c84c2f0 ovl: don't follow redirects if redirect_dir=off
Overlayfs is following redirects even when redirects are disabled. If this
is unintentional (probably the majority of cases) then this can be a
problem.  E.g. upper layer comes from untrusted USB drive, and attacker
crafts a redirect to enable read access to otherwise unreadable
directories.

If "redirect_dir=off", then turn off following as well as creation of
redirects.  If "redirect_dir=follow", then turn on following, but turn off
creation of redirects (which is what "redirect_dir=off" does now).

This is a backward incompatible change, so make it dependent on a config
option.

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-12-11 11:28:10 +01:00
Linus Torvalds 1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
Linus Torvalds b04a23421b Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:

 - Report constant st_ino values across copy-up even if underlying
   layers are on different filesystems, but using different st_dev
   values for each layer.

   Ideally we'd report the same st_dev across the overlay, and it's
   possible to do for filesystems that use only 32bits for st_ino by
   unifying the inum space. It would be nice if it wasn't a choice of 32
   or 64, rather filesystems could report their current maximum (that
   could change on resize, so it wouldn't be set in stone).

 - miscellaneus fixes and a cleanup of ovl_fill_super(), that was long
   overdue.

 - created a path_put_init() helper that clears out the pointers after
   putting the ref.

   I think this could be useful elsewhere, so added it to <linux/path.h>

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (30 commits)
  ovl: remove unneeded arg from ovl_verify_origin()
  ovl: Put upperdentry if ovl_check_origin() fails
  ovl: rename ufs to ofs
  ovl: clean up getting lower layers
  ovl: clean up workdir creation
  ovl: clean up getting upper layer
  ovl: move ovl_get_workdir() and ovl_get_lower_layers()
  ovl: reduce the number of arguments for ovl_workdir_create()
  ovl: change order of setup in ovl_fill_super()
  ovl: factor out ovl_free_fs() helper
  ovl: grab reference to workbasedir early
  ovl: split out ovl_get_indexdir() from ovl_fill_super()
  ovl: split out ovl_get_lower_layers() from ovl_fill_super()
  ovl: split out ovl_get_workdir() from ovl_fill_super()
  ovl: split out ovl_get_upper() from ovl_fill_super()
  ovl: split out ovl_get_lowerstack() from ovl_fill_super()
  ovl: split out ovl_get_workpath() from ovl_fill_super()
  ovl: split out ovl_get_upperpath() from ovl_fill_super()
  ovl: use path_put_init() in error paths for ovl_fill_super()
  vfs: add path_put_init()
  ...
2017-11-17 13:36:59 -08:00