Commit graph

1950 commits

Author SHA1 Message Date
Tetsuo Handa 8b0d7f56b9 gfs2: Fix wrong error handling in init_gfs2_fs()
init_gfs2_fs() is calling e.g. calling unregister_shrinker() without
register_shrinker() when an error occurred during initialization.
Rename goto labels and call appropriate undo function.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-27 10:47:22 -06:00
Bob Peterson a18c78c5f5 GFS2: Combine gfs2_free_di with gfs2_free_uninit_di
Before this patch, function gfs2_free_di was 4 lines of code, and
one of those lines was to call gfs2_free_uninit_di. Although
unlikely, if function gfs2_free_uninit_di encountered an error
finding the block to be freed, the error was silently ignored by the
caller, which went ahead and improperly did a quota-change operation
and meta_wipe despite the error. This patch combines the two
functions into one to make the code more readable and fixes the bug
by returning from the combined function before it takes those next
incorrect steps.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-27 10:47:14 -06:00
Mel Gorman 8667982014 mm, pagevec: remove cold parameter for pagevecs
Every pagevec_init user claims the pages being released are hot even in
cases where it is unlikely the pages are hot.  As no one cares about the
hotness of pages being released to the allocator, just ditch the
parameter.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Jan Kara 67fd707f46 mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages.  Just drop the argument.

Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Jan Kara d2bc5b3c67 gfs2: use pagevec_lookup_range_tag()
We want only pages from given range in gfs2_write_cache_jdata().  Use
pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove
unnecessary code.

Link: http://lkml.kernel.org/r/20171009151359.31984-9-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Linus Torvalds 29309a4eb8 We've got a total of 17 GFS2 patches for this merge window. The
patches are basically in three categories: (1) patches related to
 broken xfstest cases, (2) patches related to improving iomap and
 start using it in GFS2, and (3) general typos and clarifications.
 
 Please note that one of the iomap patches extends beyond GFS2 and
 affects other file systems, but it was publically reviewed by a
 variety of file system people in the community.
 
 1. Andreas has a patch that simply renames variable 'bsize' to 'factor'
    to clarify the logic related to gfs2_block_map.
 2. He also has a patch to correctly set ctime in the setflags ioctl,
    which fixes broken xfstests test 277.
 3. He also fixed broken xfstest 258, due to an atime initialization
    problem.
 4. He also fixed broken xfstest 307, in which GFS2 was not setting
    ctime when setting acls.
 5. He has a patch to switch general iomap code from blkno to disk
    offset for a variety of file systems.
 6. He has a patch to add a new IOMAP_F_DATA_INLINE flag for iomap
    to indicate blocks that have data mixed with metadata.
 7. I contributed a patch to make inode height info part of the
    'metapath' data structure to facilitate using iomap in GFS2.
 8. I have a patch to start using iomap inside GFS2 and switch GFS2's
    block_map functions to use iomap under the covers.
 9. I have a patch to switch GFS2's fiemap implementation from using
    block_map to using iomap under the covers.
 10. Andreas has a patch to implement SEEK_HOLE and SEEK_DATA via
     iomap in GFS2.
 11. I have a patch related to journaled data pages not being properly
     synced to media when writing inodes. This was caught with xfstests.
 12. I have a patch to fix another failing xfstest case in which
     switching a file from ordered_write to journaled data via set_flags
     caused a deadlock.
 13. Andreas has a patch to fix failing xfstest case 066, which was
     due to not properly syncing dirty inodes when changing extended
     attributes.
 14. Andreas fixed a minor typo in a comment.
 15. Andreas contributed a patch to partially fix xfstest 424, which
     involved GET_FLAGS and SET_FLAGS ioctl. This is also a cleanup
     and simplification of the translation of flags from fs flags to
     gfs2 flags.
 16. He also added support for STATX_ATTR_ in statx, which fixed broken
     xfstest 424.
 17. He also contributed a fix for failing xfstest 093 which fixes a
     recursive glock problem with gfs2_xattr_get and _set.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaCx5uAAoJENeLYdPf93o7Nb8H/RLJ2CsEbTSJQ82RH4eptoxe
 XbQ4HVig9Hm8k5teSTH9DdVypkxjPtbJZY9k1Y4mEtddtCZ/yS407aTdr/pP0C5r
 3W8Ouu2JXmqPKWg0sp3wC/Pji2ThCYssQXNyBSDPADsF2C8XEuT7aL/YPzMitIdm
 Lxa9JHo1tKgdFnkloNyaTt4MdBGNF5M5UBr6KgRfwhgooHWbxM0rNyZIXJtySb0I
 vsaNNOA7a4VQp1Fo1DkHQomNbOG5hpVKfswUOOZvk2RdAewTPN+jXiOAmIhNjQ3Y
 /PkJLjRCf8Ob/VIYmt2BTs16+07mODGv1d6DuhgXzH/dfiVihVGvVo71DxXx5uw=
 =i8b2
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.15.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Bob Peterson:
 "We've got a total of 17 GFS2 patches for this merge window. The
  patches are basically in three categories: (1) patches related to
  broken xfstest cases, (2) patches related to improving iomap and start
  using it in GFS2, and (3) general typos and clarifications.

  Please note that one of the iomap patches extends beyond GFS2 and
  affects other file systems, but it was publically reviewed by a
  variety of file system people in the community.

  From Andreas Gruenbacher:

   - rename variable 'bsize' to 'factor' to clarify the logic related to
     gfs2_block_map.

   - correctly set ctime in the setflags ioctl, which fixes broken
     xfstests test 277.

   - fix broken xfstest 258, due to an atime initialization problem.

   - fix broken xfstest 307, in which GFS2 was not setting ctime when
     setting acls.

   - switch general iomap code from blkno to disk offset for a variety
     of file systems.

   - add a new IOMAP_F_DATA_INLINE flag for iomap to indicate blocks
     that have data mixed with metadata.

   - implement SEEK_HOLE and SEEK_DATA via iomap in GFS2.

   - fix failing xfstest case 066, which was due to not properly syncing
     dirty inodes when changing extended attributes.

   - fix a minor typo in a comment.

   - partially fix xfstest 424, which involved GET_FLAGS and SET_FLAGS
     ioctl. This is also a cleanup and simplification of the translation
     of flags from fs flags to gfs2 flags.

   - add support for STATX_ATTR_ in statx, which fixed broken xfstest
     424.

   - fix for failing xfstest 093 which fixes a recursive glock problem
     with gfs2_xattr_get and _set

  From me:

   - make inode height info part of the 'metapath' data structure to
     facilitate using iomap in GFS2.

   - start using iomap inside GFS2 and switch GFS2's block_map functions
     to use iomap under the covers.

   - switch GFS2's fiemap implementation from using block_map to using
     iomap under the covers.

   - fix journaled data pages not being properly synced to media when
     writing inodes. This was caught with xfstests.

   - fix another failing xfstest case in which switching a file from
     ordered_write to journaled data via set_flags caused a deadlock"

* tag 'gfs2-4.15.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Allow gfs2_xattr_set to be called with the glock held
  gfs2: Add support for statx inode flags
  gfs2: Fix and clean up {GET,SET}FLAGS ioctl
  gfs2: Fix a harmless typo
  gfs2: Fix xattr fsync
  GFS2: Take inode off order_write list when setting jdata flag
  GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALL
  gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap
  GFS2: Switch fiemap implementation to use iomap
  GFS2: Implement iomap for block_map
  GFS2: Make height info part of metapath
  gfs2: Always update inode ctime in set_acl
  gfs2: Support negative atimes
  gfs2: Update ctime in setflags ioctl
  gfs2: Clarify gfs2_block_map
2017-11-14 13:55:51 -08:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Andreas Gruenbacher d0920a9cd7 gfs2: Allow gfs2_xattr_set to be called with the glock held
On the following call path:

  gfs2_setattr -> setattr_prepare -> ... ->
    cap_inode_killpriv -> ... ->
      gfs2_xattr_set

the glock is locked in gfs2_setattr, so check for recursive locking in
gfs2_xattr_set as gfs2_xattr_get already does.  While at it, get rid of
need_unlock in gfs2_xattr_get.

Fixes xfstest generic/093.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:59 +01:00
Andreas Gruenbacher b2623c2fe6 gfs2: Add support for statx inode flags
Add support for the STATX_ATTR_ flags in statx.  (Compression,
encryption, and the nodump flag are not supported by gfs2.)

Partially fixes xfstest generic/424.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:58 +01:00
Andreas Gruenbacher b16f7e57b7 gfs2: Fix and clean up {GET,SET}FLAGS ioctl
Switch to a simple array for mapping between the FS_*_FL and GFS_DIF_*
flags.  Clarify how the mapping between FS_JOURNAL_DATA_FL and the
filesystem flags works.  The GFS2_DIF_SYSTEM flag cannot be set from
user space, so remove it from GFS2_FLAGS_USER_SET.  Fail with -EINVAL
when trying to set flags that are not supported instead of silently
ignoring those flags.

Partially fixes xfstest generic/424.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:57 +01:00
Andreas Gruenbacher 61d6899ad4 gfs2: Fix a harmless typo
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:56 +01:00
Andreas Gruenbacher 6862c44ec5 gfs2: Fix xattr fsync
Make sure that changing xattrs marks the corresponding inode dirty so
that a subsequent fsync will sync those changes to disk.  We set
I_DIRTY_SYNC as well as I_DIRTY_DATASYNC so that both fsync and
fdatasync will sync xattr changes: xattrs can contain information
critical to how the data can be accessed, so we don't want fdatasync
to skip them.

Fixes xfstest generic/066.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:56 +01:00
Bob Peterson cc555b09d8 GFS2: Take inode off order_write list when setting jdata flag
This patch fixes a deadlock caused when the jdata flag is set for
inodes that are already on the ordered write list. Since it is
on the ordered write list, log_flush calls gfs2_ordered_write which
calls filemap_fdatawrite. But since the inode had the jdata flag
set, that calls gfs2_jdata_writepages, which tries to start a new
transaction. A new transaction cannot be started because it tries
to acquire the log_flush rwsem which is already locked by the log
flush operation.

The bottom line is: We cannot switch an inode from ordered to jdata
until we eliminate any ordered data pages (via log flush) or any
log_flush operation afterward will create the circular dependency
above. So we need to flush the log before setting the diskflags to
switch the file mode, then we need to remove the inode from the
ordered writes list.

Before this patch, the log flush was done for jdata->ordered, but
that's wrong. If we're going from jdata to ordered, we don't need
to call gfs2_log_flush because the call to filemap_fdatawrite will
do it for us:

   filemap_fdatawrite() -> __filemap_fdatawrite_range()
      __filemap_fdatawrite_range() -> do_writepages()
         do_writepages() -> gfs2_jdata_writepages()
            gfs2_jdata_writepages() -> gfs2_log_flush()

This patch modifies function do_gfs2_set_flags so that if a file
has its jdata flag set, and it's already on the ordered write list,
the log will be flushed and it will be removed from the list
before setting the flag.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31 14:26:47 +01:00
Bob Peterson adbc3ddf28 GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALL
In function gfs2_write_inode, starting with patch a9185b41a4, we
only flush the log and call filemap_fdatawait if we're passed in a
wbc sync_mode of WB_SYNC_ALL. We also need to do these things if
we're evicting a jdata inode, because we might have jdata pages
still attached to bufdata descriptors that need to be revoked, but
by the time it gets to evict() it's too late to start a new
transaction. This patch changes it to treat jdata inodes as if
WB_SYNC_ALL had been specified.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31 14:26:35 +01:00
Andreas Gruenbacher 3a27411cb4 gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap
So far, lseek on gfs2 did not report holes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:35 +01:00
Bob Peterson aac1a55b45 GFS2: Switch fiemap implementation to use iomap
This patch switches GFS2's implementation of fiemap from the old
block_map code to the new iomap interface.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31 14:26:34 +01:00
Bob Peterson 3974320ca6 GFS2: Implement iomap for block_map
This patch implements iomap for block mapping, and switches the
block_map function to use it under the covers.

The additional IOMAP_F_BOUNDARY iomap flag indicates when iomap has
reached a "metadata boundary" and fetching the next mapping is likely to
incur an additional I/O.  This flag is used for setting the bh buffer
boundary flag.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31 14:26:33 +01:00
Bob Peterson 5f8bd4440d GFS2: Make height info part of metapath
This patch eliminates height parameters from function gfs2_bmap_alloc.
Function find_metapath determines the metapath's "find height", also
known as the desired height. Function lookup_metapath determines the
metapath's "actual height", previously known as starting height or
sheight. Function gfs2_bmap_alloc now gets both height values from
the metapath. This simplification was done as a step toward switching
the block_map functions to using iomap. The bh_map responsibilities
are also removed from function gfs2_bmap_alloc for the same reason.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31 14:26:23 +01:00
Andreas Gruenbacher 0c9a66ec0e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2017-10-16 15:06:23 +02:00
Linus Torvalds 17763641ff GFS2: Fix an old regression in GFS2's debugfs interface
This tag is meant for pulling a patch called "gfs2: Fix
 debugfs glocks dump" which fixes a regression introduced
 by commit 88ffbf3e03. The regression caused the glock
 dump in debugfs to not report all the glocks, which makes
 debugging extremely difficult.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZyUI8AAoJENeLYdPf93o7iq4IAKhb9wJ8kmpu7LZ5k6Fl8BCy
 GFztPe2bKsFG8cul1o1gZx8c/GWORaCHe3ZDI6pxl16/E+AvWoA1pKbBLYB1GSvD
 90a7/m6+hx02ZXR/MHxBUQLWYXBtBrVMVcZDCmFMHWYCRUIiX2etPZL8wOXeJLTl
 lNCSGdd1+3y6IJbthaIKTt1ctzsR8ZqV4QN786d2C3L9dxZ63FnAV43p3rUBzBLX
 B5uT5LTmdWSLRqe0A9rnrPga/BfEnA8GDtIYUMic9Yz0Hq2a3vEnCC3P3Myp0DJZ
 PGposwqL/emRhXkC4+ICrGsTOIy1BzwMXLF47GQaB/k+2Rd3/l9r/hU5ESjQOgA=
 =taQL
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Bob Peterson:
 "GFS2: Fix an old regression in GFS2's debugfs interface

 This fixes a regression introduced by commit 88ffbf3e03 ("GFS2: Use
 resizable hash table for glocks"). The regression caused the glock dump
 in debugfs to not report all the glocks, which makes debugging
 extremely difficult"

* tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix debugfs glocks dump
2017-09-25 15:41:56 -07:00
Andreas Gruenbacher c2c4be28c2 gfs2: Always update inode ctime in set_acl
Three-entry POSIX ACLs can be stored in the file mode permission bits,
with no need to store them in extended attributes.  When a process sets
such a minimal ACL, the kernel updates the file mode like chmod does,
and removes any existing extended attributes for that ACL.  Make sure
the ctime is always updated in that case.

Fixes xfstest generic/307.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:19 -05:00
Andreas Gruenbacher 38eedf2841 gfs2: Support negative atimes
When inodes are read from disk, GFS2 will only update in-memory atimes
older than the on-disk atimes; this prevents atimes from going
backwards.  The atimes of newly allocated inodes are initialized to 0.
This means that when an atime is explicitly set to a negative value,
this value will not persist.

Fix by setting the atime of newly allocated inodes to the lowest
possible value instead of 0.

Fixes xfstest generic/258.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:19 -05:00
Andreas Gruenbacher 9b7c2ddb45 gfs2: Update ctime in setflags ioctl
The FS_IOC_SETFLAGS ioctl is supposed to update the inode ctime.
Fixes xfstests generic/277.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:18 -05:00
Andreas Gruenbacher 20cdc1931e gfs2: Clarify gfs2_block_map
Add a comment about the logical block size for directories.  Rename
"bsize" in gfs2_block_map to "factor".  Fix a typo in the description of
metaptr1.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:18 -05:00
Andreas Gruenbacher 10201655b0 gfs2: Fix debugfs glocks dump
The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock
dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a
single buffer: the right function for restarting an rhashtable iteration
from the beginning of the hash table is rhashtable_walk_enter;
rhashtable_walk_stop + rhashtable_walk_start will just resume from the
current position.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.3+
2017-09-25 12:32:33 -05:00
Linus Torvalds 0f0d12728e Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
 "Another chunk of fmount preparations from dhowells; only trivial
  conflicts for that part. It separates MS_... bits (very grotty
  mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
  only a small subset of MS_... stuff).

  This does *not* convert the filesystems to new constants; only the
  infrastructure is done here. The next step in that series is where the
  conflicts would be; that's the conversion of filesystems. It's purely
  mechanical and it's better done after the merge, so if you could run
  something like

	list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

	sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
	        -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
	        -e 's/\<MS_NODEV\>/SB_NODEV/g' \
	        -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
	        -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
	        -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
	        -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
	        -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
	        -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
	        -e 's/\<MS_SILENT\>/SB_SILENT/g' \
	        -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
	        -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
	        -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
	        -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
	        $list

  and commit it with something along the lines of 'convert filesystems
  away from use of MS_... constants' as commit message, it would save a
  quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Differentiate mount flags (MS_*) from internal superblock flags
  VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
  vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-14 18:54:01 -07:00
Linus Torvalds a0725ab0c7 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the first pull request for 4.14, containing most of the code
  changes. It's a quiet series this round, which I think we needed after
  the churn of the last few series. This contains:

   - Fix for a registration race in loop, from Anton Volkov.

   - Overflow complaint fix from Arnd for DAC960.

   - Series of drbd changes from the usual suspects.

   - Conversion of the stec/skd driver to blk-mq. From Bart.

   - A few BFQ improvements/fixes from Paolo.

   - CFQ improvement from Ritesh, allowing idling for group idle.

   - A few fixes found by Dan's smatch, courtesy of Dan.

   - A warning fixup for a race between changing the IO scheduler and
     device remova. From David Jeffery.

   - A few nbd fixes from Josef.

   - Support for cgroup info in blktrace, from Shaohua.

   - Also from Shaohua, new features in the null_blk driver to allow it
     to actually hold data, among other things.

   - Various corner cases and error handling fixes from Weiping Zhang.

   - Improvements to the IO stats tracking for blk-mq from me. Can
     drastically improve performance for fast devices and/or big
     machines.

   - Series from Christoph removing bi_bdev as being needed for IO
     submission, in preparation for nvme multipathing code.

   - Series from Bart, including various cleanups and fixes for switch
     fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-07 11:59:42 -07:00
Linus Torvalds ec3604c7a5 Writeback error handling fixes for v4.14
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZrTy3AAoJEAAOaEEZVoIVaucP/ApBAj2S5wzvlV1u6l8E6ae7
 ZeEEZfcWwzRYlKjZAkTWqj9XvGpDGO5gLq4wsZK2edFAq++/MJF8ZVtN4tdZ1kUZ
 DUvRodtVOrT08Kp9wZXGT7JOFrf6U/6gMcR6p0MuWnHndeKYvlpcFi9NPT4EC9/z
 Zm9V7gtlPdSOha7eaSjUS0+vLERkxqXLBW3Av9QUOBP/lbI3lqIroGKeHDYnVdya
 2P/k5EcRRJMyJP6TqyYxmmJl+UWjJFMLvnlUDBslHnD/u3mIUhw3JLHYBjn5dZRE
 Xjq56IDPoXDUvzlBhtn/Uqyx+/wtwsNsylpmKv6K5G1JfdeuSsPVsCey+A1cqV64
 LpE5896wf9TmnmI9LNyh6vDn925xPSGBiF45UEp5f9aO7jXeY0MaEZ8g+ENqFIDK
 v4gtZdS9FhYHV+/l4qEwYMKrqSbwKEs1r1FT+f4wnABby1ojfdA57ZPlp5PV2Vjp
 szTp88Zkb7cMvZwEnWwxWofcJNmgS7uNahvnQF3IJ4ITsioEkuyYR3K4ZQMaaaV9
 wCp6G0FhXZaK3OI7o9WiDwaO2elp9Hxc8bnqKpiBbHZkY0NLh7/++5VxpeNbTHFy
 AGijQiiKGNNyYqNj93wq9jpVdMNjB0pXrHRxfav8v7MtQ+WfbEoAENF4T7hN7iXn
 UuF6eSWEC5O1UCRUk1A+
 =LLY3
 -----END PGP SIGNATURE-----

Merge tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull writeback error handling updates from Jeff Layton:
 "This pile continues the work from last cycle on better tracking
  writeback errors. In v4.13 we added some basic errseq_t infrastructure
  and converted a few filesystems to use it.

  This set continues refining that infrastructure, adds documentation,
  and converts most of the other filesystems to use it. The main
  exception at this point is the NFS client"

* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  ecryptfs: convert to file_write_and_wait in ->fsync
  mm: remove optimizations based on i_size in mapping writeback waits
  fs: convert a pile of fsync routines to errseq_t based reporting
  gfs2: convert to errseq_t based writeback error reporting for fsync
  fs: convert sync_file_range to use errseq_t based error-tracking
  mm: add file_fdatawait_range and file_write_and_wait
  fuse: convert to errseq_t based error tracking for fsync
  mm: consolidate dax / non-dax checks for writeback
  Documentation: add some docs for errseq_t
  errseq: rename __errseq_set to errseq_set
2017-09-06 14:11:03 -07:00
Linus Torvalds 77d0ab600a We've got a whopping 29 GFS2 patches for this merge window, mainly
because we held some back from the previous merge window until we
 could get them perfected and well tested. We have a couple patch
 sets, including my patch set for protecting glock gl_object and
 Andreas Gruenbacher's patch set to fix the long-standing shrink-
 slab hang, plus a bunch of assorted bugs and cleanups:
 
 1. I fixed a bug whereby an IO error would lead to a double-brelse.
 2. Andreas Gruenbacher made a minor cleanup to call his relatively
    new function, gfs2_holder_initialized, rather than doing it
    manually. This was just missed by a previous patch set.
 3. Jan Kara fixed a bug whereby the SGID was being cleared when
    inheriting ACLs.
 4. Andreas found a bug and fixed it in his previous patch,
    "Get rid of flush_delayed_work in gfs2_evict_inode". A call to
    flush_delayed_work was deleted from *gfs2_inode_lookup and added
    to gfs2_create_inode.
 5. Wang Xibo found and fixed a list_add call in inode_go_lock
    that specified the parameters in the wrong order.
 6. Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's
    metadata reads that were accidentally missing them.
 7 - 10. I submitted a 4-patch set to protect the glock gl_object
    field. GFS2 was setting and checking gl_object with no locking
    mechanism, so the value was occasionally stomped on, which caused
    file system corruption.
 11. I submitted a small cleanup to function gfs2_clear_rgrpd.
    It was needlessly adding rgrp glocks to the lru list, then pulling
    them back off immediately. The rgrp glocks don't use the lru list
    anyway, so doing so was just a waste of time.
 12. I submitted a patch that checks the GLOF_LRU flag on a glock
    before trying to remove it from the lru_list. This avoids a lot
    of unnecessary spin_lock contention.
 13. I submitted a patch to delete GFS2's debugfs files only after
    we evict all the glocks. Before this patch, GFS2 would delete the
    debugfs files, and if unmount hung waiting for a glock, there was
    no way to debug the problem. Now, if a hang occurs during umount,
    we can examine the debugfs files to figure out why it's hung.
 14. Andreas Gruenbacher submitted a patch to fix some trivial typos.
 15 - 19. Andreas also submitted a five-part patch set to fix the
    longstanding hang involving the slab shrinker: dlm requires
    memory, calls the inode shrinker, which calls gfs2's evict, which
    calls back into DLM before it can evict an inode.
 20. Abhi Das submitted a patch to forcibly flush the active items
    list to relieve memory pressure. This fixes a long-standing bug
    whereby GFS2 was getting hung permanently in balance_dirty_pages.
 21. Thomas Tai submitted a patch to fix a slab corruption problem
    due to a residual pointer left in the lock_dlm lockstruct.
 22. I submitted a patch to withdraw the file system if IO errors
    are encountered while writing to the journals or statfs system
    file which were previously not being sent back up. Before, some
    IO errors were sometimes not be detected for several hours, and
    at recovery time, the journal errors made journal replay
    impossible.
 23. Andreas has a patch to fix an annoying format-truncation compiler
    warning so GFS2 compiles cleanly.
 24. I have a patch that fixes a handful of sparse compiler warnings.
 25. Andreas fixed up an useless gl_object warning caused by an
    earlier patch.
 26. Arvind Yadav added a patch to properly constify our rhashtable
    params declare.
 27. I added a patch to fix a regression caused by the non-recursive
    delete and truncate patch that caused file system blocks to not
    be properly freed.
 28. Ernesto A. Fernández added a patch to fix a place where GFS2
    would send back the wrong return code setting extended attributes.
 29. Ernesto also added a patch to fix a case in which GFS2 was
    improperly setting an inode's i_mode, potentially granting access
    to the wrong users.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZrMC2AAoJENeLYdPf93o7PIQIAKY4hdC2pMM5tiiIHx5fPAAr
 tjpVuFkDQzyEaTb9sArVLxEdva3ShKERQKoYq/VVxqbAEwPgXbzJFNNil1WTJi1t
 J2gE4wE4G5x1+A7XDzCdPI8KAcF+yX63AaFYlVKyuZSq5w7njIRc1Vk+TFiIexxC
 xb0nP0g9L6Zt114rE8kfi0/GLjTO9vOKM3XsJgG612I3/cs3RUx4gJ+nSUG0bYLA
 qoBIXEJ3SFHw2Zr/LgHZ9QDHnlPVl3bjg03sRQaWZms7XbLegDBYsDSvS1HLZ300
 gjTc0Dgz/6KwzDVJ7cZ/fPNYtIFY58tKs6aqqDTrCncsX9nPjcTAxYkBNWsFyZM=
 =tXJ8
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We've got a whopping 29 GFS2 patches for this merge window, mainly
  because we held some back from the previous merge window until we
  could get them perfected and well tested. We have a couple patch sets,
  including my patch set for protecting glock gl_object and Andreas
  Gruenbacher's patch set to fix the long-standing shrink- slab hang,
  plus a bunch of assorted bugs and cleanups.

  Summary:

   - I fixed a bug whereby an IO error would lead to a double-brelse.

   - Andreas Gruenbacher made a minor cleanup to call his relatively new
     function, gfs2_holder_initialized, rather than doing it manually.
     This was just missed by a previous patch set.

   - Jan Kara fixed a bug whereby the SGID was being cleared when
     inheriting ACLs.

   - Andreas found a bug and fixed it in his previous patch, "Get rid of
     flush_delayed_work in gfs2_evict_inode". A call to
     flush_delayed_work was deleted from *gfs2_inode_lookup and added to
     gfs2_create_inode.

   - Wang Xibo found and fixed a list_add call in inode_go_lock that
     specified the parameters in the wrong order.

   - Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's
     metadata reads that were accidentally missing them.

   - I submitted a 4-patch set to protect the glock gl_object field.
     GFS2 was setting and checking gl_object with no locking mechanism,
     so the value was occasionally stomped on, which caused file system
     corruption.

   - I submitted a small cleanup to function gfs2_clear_rgrpd. It was
     needlessly adding rgrp glocks to the lru list, then pulling them
     back off immediately. The rgrp glocks don't use the lru list
     anyway, so doing so was just a waste of time.

   - I submitted a patch that checks the GLOF_LRU flag on a glock before
     trying to remove it from the lru_list. This avoids a lot of
     unnecessary spin_lock contention.

   - I submitted a patch to delete GFS2's debugfs files only after we
     evict all the glocks. Before this patch, GFS2 would delete the
     debugfs files, and if unmount hung waiting for a glock, there was
     no way to debug the problem. Now, if a hang occurs during umount,
     we can examine the debugfs files to figure out why it's hung.

   - Andreas Gruenbacher submitted a patch to fix some trivial typos.

   - Andreas also submitted a five-part patch set to fix the
     longstanding hang involving the slab shrinker: dlm requires memory,
     calls the inode shrinker, which calls gfs2's evict, which calls
     back into DLM before it can evict an inode.

   - Abhi Das submitted a patch to forcibly flush the active items list
     to relieve memory pressure. This fixes a long-standing bug whereby
     GFS2 was getting hung permanently in balance_dirty_pages.

   - Thomas Tai submitted a patch to fix a slab corruption problem due
     to a residual pointer left in the lock_dlm lockstruct.

   - I submitted a patch to withdraw the file system if IO errors are
     encountered while writing to the journals or statfs system file
     which were previously not being sent back up. Before, some IO
     errors were sometimes not be detected for several hours, and at
     recovery time, the journal errors made journal replay impossible.

   - Andreas has a patch to fix an annoying format-truncation compiler
     warning so GFS2 compiles cleanly.

   - I have a patch that fixes a handful of sparse compiler warnings.

   - Andreas fixed up an useless gl_object warning caused by an earlier
     patch.

   - Arvind Yadav added a patch to properly constify our rhashtable
     params declare.

   - I added a patch to fix a regression caused by the non-recursive
     delete and truncate patch that caused file system blocks to not be
     properly freed.

   - Ernesto A. Fernández added a patch to fix a place where GFS2 would
     send back the wrong return code setting extended attributes.

   - Ernesto also added a patch to fix a case in which GFS2 was
     improperly setting an inode's i_mode, potentially granting access
     to the wrong users"

* tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits)
  gfs2: preserve i_mode if __gfs2_set_acl() fails
  gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing
  GFS2: Fix non-recursive truncate bug
  gfs2: constify rhashtable_params
  GFS2: Fix gl_object warnings
  GFS2: Fix up some sparse warnings
  gfs2: Silence gcc format-truncation warning
  GFS2: Withdraw for IO errors writing to the journal or statfs
  gfs2: fix slab corruption during mounting and umounting gfs file system
  gfs2: forcibly flush ail to relieve memory pressure
  gfs2: Clean up waiting on glocks
  gfs2: Defer deleting inodes under memory pressure
  gfs2: gfs2_evict_inode: Put glocks asynchronously
  gfs2: Get rid of gfs2_set_nlink
  gfs2: gfs2_glock_get: Wait on freeing glocks
  gfs2: Fix trivial typos
  GFS2: Delete debugfs files only after we evict the glocks
  GFS2: Don't waste time locking lru_lock for non-lru glocks
  GFS2: Don't bother trying to add rgrps to the lru list
  GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode
  ...
2017-09-06 11:42:31 -07:00
Ernesto A. Fernández 309e8cda59 gfs2: preserve i_mode if __gfs2_set_acl() fails
When changing a file's acl mask, __gfs2_set_acl() will first set the
group bits of i_mode to the value of the mask, and only then set the
actual extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the
file had no acl attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-31 07:53:15 -05:00
Ernesto A. Fernández 54aae14bee gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing
The function __gfs2_xattr_set() will return -ENODATA when called to
remove a xattr that does not exist. The result is that setfacl will
show an exit status of 1 when called to set only a file's mode bits
(on a file with no ACLs), despite succeeding. A "No data available"
error will be printed as well.

To fix this return 0 instead, except when the XATTR_REPLACE flag is
set, in which case -ENODATA is appropriate. This is consistent with
how most other xattr setting functions work, in other filesystems.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-31 07:43:03 -05:00
Bob Peterson c4a9d1892f GFS2: Fix non-recursive truncate bug
Before this patch if you truncated a file to a smaller size it
wasn't freeing all the blocks properly. There are two reasons.

First, the metapath comparison was not comparing previous heights.
I added a function, mp_eq_to_hgt, which checks the metapath at
all heights prior to the target height.

Second, in function find_nonnull_ptr, it needed to zero out all
pointers for heights following the target height. Translated into
decimal integer terms, this way a number like 299, when incremented,
becomes 300, not 399. The 2 gets incremented to 3, and the following
digits need to be reset.

These two things allow the truncate state machine to properly find
the blocks it needs to delete.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30 13:29:22 -05:00
Arvind Yadav d296b15ed5 gfs2: constify rhashtable_params
rhashtable_params are not supposed to change at runtime. All
Functions rhashtable_* working with const rhashtable_params
provided by <linux/rhashtable.h>. So mark the non-const structs
as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30 08:14:39 -05:00
Andreas Gruenbacher 7023a0b16f GFS2: Fix gl_object warnings
The following cleanup is needed to avoid spilling the syslog with
false warnings.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30 08:14:27 -05:00
Bob Peterson 27c3b415f6 GFS2: Fix up some sparse warnings
This patch cleans up various pieces of GFS2 to avoid sparse errors.
This doesn't fix them all, but it fixes several. The first error,
in function glock_hash_walk was a genuine bug where the rhashtable
could be started and not stopped.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25 18:47:18 -05:00
Andreas Gruenbacher 561b796987 gfs2: Silence gcc format-truncation warning
Enlarge sd_fsname to be big enough for the longest long lock table name
and an arbitrary journal number.  This silences two -Wformat-truncation
warnings with gcc 7.1.1.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25 10:59:21 -05:00
Bob Peterson 942b0cddfb GFS2: Withdraw for IO errors writing to the journal or statfs
Before this patch, if GFS2 encountered IO errors while writing to
the journal, it would not report the problem, so they would go
unnoticed, sometimes for many hours. Sometimes this would only be
noticed later, when recovery tried to do journal replay and failed
due to invalid metadata at the blocks that resulted in IO errors.

This patch makes GFS2's log daemon check for IO errors. If it
encounters one, it withdraws from the file system and reports
why in dmesg. A similar action is taken when IO errors occur when
writing to the system statfs file.

These errors are also reported back to any callers of fsync, since
that requires the journal to be flushed. Therefore, any IO errors
that would previously go unnoticed are now noticed and the file
system is withdrawn as early as possible, thus preventing further
file system damage.

Also note that this reintroduces superblock variable sd_log_error,
which Christoph removed with commit f729b66fca.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25 10:59:09 -05:00
Christoph Hellwig 74d46992e0 block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-23 12:49:55 -06:00
Thomas Tai cc1dfa8b75 gfs2: fix slab corruption during mounting and umounting gfs file system
When using cman-3.0.12.1 and gfs2-utils-3.0.12.1, mounting and
unmounting GFS2 file system would cause kernel to hang. The slab
allocator suggests that it is likely a double free memory corruption.
The issue is traced back to v3.9-rc6 where a patch is submitted to
use kzalloc() for storing a bitmap instead of using a local variable.
The intention is to allocate memory during mount and to free memory
during unmount. The original patch misses a code path which has
already freed the memory and caused memory corruption. This patch sets
the memory pointer to NULL after the memory is freed, so that double
free memory corruption will not happen.

gdlm_mount()
  '-- set_recover_size() which use kzalloc()
  '-- if dlm does not support ops callbacks then
          '--- free_recover_size() which use kfree()

gldm_unmount()
  '-- free_recover_size() which use kfree()

Previous patch which introduced the double free issue is
commit 57c7310b8e ("GFS2: use kmalloc for lvb bitmap")

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
2017-08-15 11:54:09 -05:00
Abhi Das b066a4eebd gfs2: forcibly flush ail to relieve memory pressure
On systems with low memory, it is possible for gfs2 to infinitely
loop in balance_dirty_pages() under heavy IO (creating sparse files).

balance_dirty_pages() attempts to write out the dirty pages via
gfs2_writepages() but none are found because these dirty pages are
being used by the journaling code in the ail. Normally, the journal
has an upper threshold which when hit triggers an automatic flush
of the ail. But this threshold can be higher than the number of
allowable dirty pages and result in the ail never being flushed.

This patch forces an ail flush when gfs2_writepages() fails to write
anything. This is a good indication that the ail might be holding
some dirty pages.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:51:03 -05:00
Andreas Gruenbacher a91323e255 gfs2: Clean up waiting on glocks
The prepare_to_wait_on_glock and finish_wait_on_glock functions introduced in
commit 56a365be "gfs2: gfs2_glock_get: Wait on freeing glocks" are
better removed, resulting in cleaner code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:51:02 -05:00
Andreas Gruenbacher 6a1c8f6dcf gfs2: Defer deleting inodes under memory pressure
When under memory pressure and an inode's link count has dropped to
zero, defer deleting the inode to the delete workqueue.  This avoids
calling into DLM under memory pressure, which can deadlock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:49:13 -05:00
Andreas Gruenbacher 71c1b21368 gfs2: gfs2_evict_inode: Put glocks asynchronously
gfs2_evict_inode is called to free inodes under memory pressure.  The
function calls into DLM when an inode's last cluster-wide reference goes
away (remote unlink) and to release the glock and associated DLM lock
before finally destroying the inode.  However, if DLM is blocked on
memory to become available, calling into DLM again will deadlock.

Avoid that by decoupling releasing glocks from destroying inodes in that
case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
in work queue context, when the associated inodes have likely already
been destroyed.

With this change, inodes can end up being unlinked, remote-unlink can be
triggered, and then the inode can be reallocated before all
remote-unlink callbacks are processed.  To detect that, revalidate the
link count in gfs2_evict_inode to make sure we're not deleting an
allocated, referenced inode.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:45:21 -05:00
Andreas Gruenbacher eebd2e813f gfs2: Get rid of gfs2_set_nlink
Remove gfs2_set_nlink which prevents the link count of an inode from
becoming non-zero once it has reached zero.  The next commit reduces the
amount of waiting on glocks when an inode is evicted from memory.  With
that, an inode can become reallocated before all the remote-unlink
callbacks from a previous delete are processed, which causes the link
count to change from zero to non-zero.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:42:11 -05:00
Andreas Gruenbacher 0515480ad4 gfs2: gfs2_glock_get: Wait on freeing glocks
Keep glocks in their hash table until they are freed instead of removing
them when their last reference is dropped.  This allows to wait for any
previous instances of a glock to go away in gfs2_glock_get before
creating a new glocks.

Special thanks to Andy Price for finding and fixing a problem which also
required us to delete the rcu_read_unlock from the error case in function
gfs2_glock_get.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:39:31 -05:00
Andreas Gruenbacher 61b91cfdc6 gfs2: Fix trivial typos
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Bob Peterson b2fb7dab7f GFS2: Delete debugfs files only after we evict the glocks
This patch moves the call to gfs2_delete_debugfs_file so that it
comes after the glock hash table has been cleared. This way we
can query the debugfs files if umount hangs.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Bob Peterson 645ebd49f0 GFS2: Don't waste time locking lru_lock for non-lru glocks
Before this patch, glock_dq would call gfs2_glock_remove_from_lru.
For glocks that are never put on the LRU, such as the transaction
glock, this just takes the spin_lock, determines there's nothing to
be done because the list is empty, then unlocks again. This was
causing unnecessary lock contention on the lru_lock spin_lock.
This patch adds a check for GLOF_LRU in the glops before taking
the spin_lock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Bob Peterson 2d821a8b71 GFS2: Don't bother trying to add rgrps to the lru list
This patch removes a call to gfs2_glock_add_to_lru from function
gfs2_clear_rgrpd. The call is just a waste of time because as soon
as it adds it to the lru_list, the call to gfs2_glock_put takes it
back off again.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:38 -05:00
Bob Peterson 240c6235df GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode
This patch adds some calls to clear gl_object in function
gfs2_delete_inode. Since we are deleting the inode, and the glock
typically outlives the inode in core, we must clear gl_object
so subsequent use of the glock (e.g. for a new inode in its place)
will not have the old pointer sitting there. In error cases we
need to tidy up after ourselves. In non-error cases, we need to
clear gl_object before we set the block free in the bitmap so
residules aren't left for potential inode creators.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-08-09 09:36:38 -05:00
Bob Peterson 9c1b28081f GFS2: Clear gl_object if gfs2_create_inode fails
If function gfs2_create_inode fails after the inode has been
created (for example, if the inode_refresh fails for some reason)
the function was setting gl_object but never clearing it again.
The glocks are left pointing to a freed inode. This patch adds
the calls to clear gl_object in the appropriate error paths.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-08-09 09:36:26 -05:00
Jeff Layton d07a6ac7b6 gfs2: convert to errseq_t based writeback error reporting for fsync
Also, fix a place where a writeback error might get dropped in the
gfs2_is_jdata case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-08-01 08:39:29 -04:00
Bob Peterson 4d7c18c7df GFS2: Set gl_object in inode lookup only after block type check
Before this patch, the inode glock's gl_object was set after a
reference was acquired, but before the block type was verified.
In cases where the block was unlinked, then freed and reused on
another node, a residule delete callback (delete_work) would try
to look up the inode, eventually failing the block check, but
only after it overwrites gl_object with a pointer to the wrong
inode. This patch moves the assignment of gl_object after the
block check so it won't be improperly overwritten.

Likewise, at the end of the function, gfs2_inode_lookup was
clearing gl_object after it unlocked the glock, which meant
another process might free the glock in the meantime. This
patch guards against that case.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-07-21 08:20:42 -05:00
Bob Peterson df3d87bde1 GFS2: Introduce helper for clearing gl_object
This patch introduces a new helper function in glock.h that
clears gl_object, with an added integrity check. An additional
integrity check has been added to glock_set_object, plus comments.
This is step 1 in a series to ensure gl_object integrity.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-07-21 08:20:05 -05:00
Coly Li e477b24b50 gfs2: add flag REQ_PRIO for metadata I/O
When gfs2 does metadata I/O, only REQ_META is used as a metadata hint of
the bio. But flag REQ_META is just a hint for block trace, not for block
layer code to handle a bio as metadata request.

For some of metadata I/Os of gfs2, A REQ_PRIO flag on the metadata bio
would be very informative to block layer code. For example, if bcache is
used as a I/O cache for gfs2, it will be possible for bcache code to get
the hint and cache the pre-fetched metadata blocks on cache device. This
behavior may be helpful to improve metadata I/O performance if the
following requests hit the cache.

Here are the locations in gfs2 code where a REQ_PRIO flag should be added,
- All places where REQ_READAHEAD is used, gfs2 code uses this flag for
  metadata read ahead.
- In gfs2_meta_rq() where the first metadata block is read in.
- In gfs2_write_buf_to_page(), read in quota metadata blocks to have them
  up to date.
These metadata blocks are probably to be accessed again in future, adding
a REQ_PRIO flag may have bcache to keep such metadata in fast cache
device. For system without a cache layer, REQ_PRIO can still provide hint
to block layer to handle metadata requests more properly.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-21 07:48:22 -05:00
Wang Xibo e7cb550d79 GFS2: fix code parameter error in inode_go_lock
In inode_go_lock() function, the parameter order of list_add() is error.
According to the define of list_add(), the first parameter is new entry
and the second is the list head, so ip->i_trunc_list should be the
first parameter and the sdp->sd_trunc_list should be second.

Signed-off-by: Wang Xibo<wang.xibo@zte.com.cn>
Signed-off-by: Xiao Likun<xiao.likun@zte.com.cn>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-21 07:40:59 -05:00
Andreas Gruenbacher 98e5a91a61 gfs2: Fixup to "Get rid of flush_delayed_work in gfs2_evict_inode"
When commit 4fd1a57952 moved the call to flush_delayed_work from
gfs2_evict_inode to gfs2_inode_lookup to avoid calling into DLM during
evict, a similar call should have been added to gfs2_create_inode:
that's another code path in which glocks of previous inodes may be
reused.

The flush of the iopen glock work queue added by 4fd1a57952, on the
other hand, is unnecessary and can be removed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-19 11:10:19 -05:00
Jan Kara 914cea93dd gfs2: Don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__gfs2_set_acl() into gfs2_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 073931017b
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-19 10:58:54 -05:00
Andreas Gruenbacher 283c9a97be gfs2: Lock holder cleanup (fixup)
Function gfs2_holder_initialized should be used in do_flock as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-17 13:39:15 -05:00
Bob Peterson 61eaadcd52 GFS2: Prevent double brelse in gfs2_meta_indirect_buffer
Before this patch, problems reading in indirect buffers would send
an IO error back to the caller, and release the buffer_head with
brelse() in function gfs2_meta_indirect_buffer, however, it would
still return the address of the buffer_head it released. After the
error was discovered, function gfs2_block_map would call function
release_metapath to free all buffers. That checked:
if (mp->mp_bh[i] == NULL) but since the value was set after the
error, it was non-zero, so brelse was called a second time. This
resulted in the following error:

kernel: WARNING: at fs/buffer.c:1224 __brelse+0x3a/0x40() (Tainted: G        W  -- ------------   )
kernel: Hardware name: RHEV Hypervisor
kernel: VFS: brelse: Trying to free free buffer

This patch changes gfs2_meta_indirect_buffer so it only sets
the buffer_head pointer in cases where it isn't released.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2017-07-17 08:39:48 -05:00
David Howells bc98a42c1f VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
Firstly by applying the following with coccinelle's spatch:

	@@ expression SB; @@
	-SB->s_flags & MS_RDONLY
	+sb_rdonly(SB)

to effect the conversion to sb_rdonly(sb), then by applying:

	@@ expression A, SB; @@
	(
	-(!sb_rdonly(SB)) && A
	+!sb_rdonly(SB) && A
	|
	-A != (sb_rdonly(SB))
	+A != sb_rdonly(SB)
	|
	-A == (sb_rdonly(SB))
	+A == sb_rdonly(SB)
	|
	-!(sb_rdonly(SB))
	+!sb_rdonly(SB)
	|
	-A && (sb_rdonly(SB))
	+A && sb_rdonly(SB)
	|
	-A || (sb_rdonly(SB))
	+A || sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) != A
	+sb_rdonly(SB) != A
	|
	-(sb_rdonly(SB)) == A
	+sb_rdonly(SB) == A
	|
	-(sb_rdonly(SB)) && A
	+sb_rdonly(SB) && A
	|
	-(sb_rdonly(SB)) || A
	+sb_rdonly(SB) || A
	)

	@@ expression A, B, SB; @@
	(
	-(sb_rdonly(SB)) ? 1 : 0
	+sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) ? A : B
	+sb_rdonly(SB) ? A : B
	)

to remove left over excess bracketage and finally by applying:

	@@ expression A, SB; @@
	(
	-(A & MS_RDONLY) != sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
	|
	-(A & MS_RDONLY) == sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
	)

to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-17 08:45:34 +01:00
Linus Torvalds 78dcf73421 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ->s_options removal from Al Viro:
 "Preparations for fsmount/fsopen stuff (coming next cycle). Everything
  gets moved to explicit ->show_options(), killing ->s_options off +
  some cosmetic bits around fs/namespace.c and friends. Basically, the
  stuff needed to work with fsmount series with minimum of conflicts
  with other work.

  It's not strictly required for this merge window, but it would reduce
  the PITA during the coming cycle, so it would be nice to have those
  bits and pieces out of the way"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  isofs: Fix isofs_show_options()
  VFS: Kill off s_options and helpers
  orangefs: Implement show_options
  9p: Implement show_options
  isofs: Implement show_options
  afs: Implement show_options
  affs: Implement show_options
  befs: Implement show_options
  spufs: Implement show_options
  bpf: Implement show_options
  ramfs: Implement show_options
  pstore: Implement show_options
  omfs: Implement show_options
  hugetlbfs: Implement show_options
  VFS: Don't use save/replace_mount_options if not using generic_show_options
  VFS: Provide empty name qstr
  VFS: Make get_filesystem() return the affected filesystem
  VFS: Clean up whitespace in fs/namespace.c and fs/super.c
  Provide a function to create a NUL-terminated string from unterminated data
2017-07-15 12:00:42 -07:00
Linus Torvalds 088737f44b Writeback error handling fixes (pile #2)
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZXhmCAAoJEAAOaEEZVoIVpRkP/1qlYn3pq6d5Kuz84pejOmlL
 5jbkS/cOmeTxeUU4+B1xG8Lx7bAk8PfSXQOADbSJGiZd0ug95tJxplFYIGJzR/tG
 aNMHeu/BVKKhUKORGuKR9rJKtwC839L/qao+yPBo5U3mU4L73rFWX8fxFuhSJ8HR
 hvkgBu3Hx6GY59CzxJ8iJzj+B+uPSFrNweAk0+0UeWkBgTzEdiGqaXBX4cHIkq/5
 hMoCG+xnmwHKbCBsQ5js+YJT+HedZ4lvfjOqGxgElUyjJ7Bkt/IFYOp8TUiu193T
 tA4UinDjN8A7FImmIBIftrECmrAC9HIGhGZroYkMKbb8ReDR2ikE5FhKEpuAGU3a
 BXBgX2mPQuArvZWM7qeJCkxV9QJ0u/8Ykbyzo30iPrICyrzbEvIubeB/mDA034+Z
 Z0/z8C3v7826F3zP/NyaQEojUgRq30McMOIS8GMnx15HJwRsRKlzjfy9Wm4tWhl0
 t3nH1jMqAZ7068s6rfh/oCwdgGOwr5o4hW/bnlITzxbjWQUOnZIe7KBxIezZJ2rv
 OcIwd5qE8PNtpagGj5oUbnjGOTkERAgsMfvPk5tjUNt28/qUlVs2V0aeo47dlcsh
 oYr8WMOIzw98Rl7Bo70mplLrqLD6nGl0LfXOyUlT4STgLWW4ksmLVuJjWIUxcO/0
 yKWjj9wfYRQ0vSUqhsI5
 =3Z93
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull Writeback error handling updates from Jeff Layton:
 "This pile represents the bulk of the writeback error handling fixes
  that I have for this cycle. Some of the earlier patches in this pile
  may look trivial but they are prerequisites for later patches in the
  series.

  The aim of this set is to improve how we track and report writeback
  errors to userland. Most applications that care about data integrity
  will periodically call fsync/fdatasync/msync to ensure that their
  writes have made it to the backing store.

  For a very long time, we have tracked writeback errors using two flags
  in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
  writeback error occurs (via mapping_set_error) and are cleared as a
  side-effect of filemap_check_errors (as you noted yesterday). This
  model really sucks for userland.

  Only the first task to call fsync (or msync or fdatasync) will see the
  error. Any subsequent task calling fsync on a file will get back 0
  (unless another writeback error occurs in the interim). If I have
  several tasks writing to a file and calling fsync to ensure that their
  writes got stored, then I need to have them coordinate with one
  another. That's difficult enough, but in a world of containerized
  setups that coordination may even not be possible.

  But wait...it gets worse!

  The calls to filemap_check_errors can be buried pretty far down in the
  call stack, and there are internal callers of filemap_write_and_wait
  and the like that also end up clearing those errors. Many of those
  callers ignore the error return from that function or return it to
  userland at nonsensical times (e.g. truncate() or stat()). If I get
  back -EIO on a truncate, there is no reason to think that it was
  because some previous writeback failed, and a subsequent fsync() will
  (incorrectly) return 0.

  This pile aims to do three things:

   1) ensure that when a writeback error occurs that that error will be
      reported to userland on a subsequent fsync/fdatasync/msync call,
      regardless of what internal callers are doing

   2) report writeback errors on all file descriptions that were open at
      the time that the error occurred. This is a user-visible change,
      but I think most applications are written to assume this behavior
      anyway. Those that aren't are unlikely to be hurt by it.

   3) document what filesystems should do when there is a writeback
      error. Today, there is very little consistency between them, and a
      lot of cargo-cult copying. We need to make it very clear what
      filesystems should do in this situation.

  To achieve this, the set adds a new data type (errseq_t) and then
  builds new writeback error tracking infrastructure around that. Once
  all of that is in place, we change the filesystems to use the new
  infrastructure for reporting wb errors to userland.

  Note that this is just the initial foray into cleaning up this mess.
  There is a lot of work remaining here:

   1) convert the rest of the filesystems in a similar fashion. Once the
      initial set is in, then I think most other fs' will be fairly
      simple to convert. Hopefully most of those can in via individual
      filesystem trees.

   2) convert internal waiters on writeback to use errseq_t for
      detecting errors instead of relying on the AS_* flags. I have some
      draft patches for this for ext4, but they are not quite ready for
      prime time yet.

  This was a discussion topic this year at LSF/MM too. If you're
  interested in the gory details, LWN has some good articles about this:

      https://lwn.net/Articles/718734/
      https://lwn.net/Articles/724307/"

* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  btrfs: minimal conversion to errseq_t writeback error reporting on fsync
  xfs: minimal conversion to errseq_t writeback error reporting
  ext4: use errseq_t based error handling for reporting data writeback errors
  fs: convert __generic_file_fsync to use errseq_t based reporting
  block: convert to errseq_t based writeback error tracking
  dax: set errors in mapping when writeback fails
  Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
  mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
  fs: new infrastructure for writeback error handling and reporting
  lib: add errseq_t type and infrastructure for handling it
  mm: don't TestClearPageError in __filemap_fdatawait_range
  mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
  jbd2: don't clear and reset errors after waiting on writeback
  buffer: set errors in mapping at the time that the error occurs
  fs: check for writeback errors after syncing out buffers in generic_file_fsync
  buffer: use mapping_set_error instead of setting the flag
  mm: fix mapping_set_error call in me_pagecache_dirty
2017-07-07 19:38:17 -07:00
Andreas Gruenbacher 961ae1d83d gfs2: Fix glock rhashtable rcu bug
Before commit 88ffbf3e03 "GFS2: Use resizable hash table for glocks",
glocks were freed via call_rcu to allow reading the glock hashtable
locklessly using rcu.  This was then changed to free glocks immediately,
which made reading the glock hashtable unsafe.  Bring back the original
code for freeing glocks via call_rcu.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # 4.3+
2017-07-07 13:22:05 -05:00
Jeff Layton 87354e5de0 buffer: set errors in mapping at the time that the error occurs
I noticed on xfs that I could still sometimes get back an error on fsync
on a fd that was opened after the error condition had been cleared.

The problem is that the buffer code sets the write_io_error flag and
then later checks that flag to set the error in the mapping. That flag
perisists for quite a while however. If the file is later opened with
O_TRUNC, the buffers will then be invalidated and the mapping's error
set such that a subsequent fsync will return error. I think this is
incorrect, as there was no writeback between the open and fsync.

Add a new mark_buffer_write_io_error operation that sets the flag and
the error in the mapping at the same time. Replace all calls to
set_buffer_write_io_error with mark_buffer_write_io_error, and remove
the places that check this flag in order to set the error in the
mapping.

This sets the error in the mapping earlier, at the time that it's first
detected.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-07-06 07:02:21 -04:00
David Howells cdf01226b2 VFS: Provide empty name qstr
Provide an empty name (ie. "") qstr for general use.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-06 03:27:09 -04:00
Linus Torvalds c96e6dabfb We've got eight GFS2 patches for this merge window:
1. Andreas Gruenbacher has four patches related to cleaning up the GFS2
    inode evict process. This is about half of his patches designed to
    fix a long-standing GFS2 hang related to the inode shrinker.
    (Shrinker calls gfs2 evict, evict calls DLM, DLM requires memory
    and blocks on the shrinker.) These 4 patches have been well tested.
    His second set of patches are still being tested, so I plan to hold
    them until the next merge window, after we have more weeks of testing.
    The first patch eliminates the flush_delayed_work, which can block.
 2. Andreas's second patch protects setting of gl_object for rgrps with
    a spin_lock to prevent proven races.
 3. His third patch introduces a centralized mechanism for queueing glock
    work with better reference counting, to prevent more races.
 4. His fourth patch retains a reference to inode glocks when an error
    occurs while creating an inode. This keeps the subsequent evict from
    needing to reacquire the glock, which might call into DLM and block
    in low memory conditions.
 5. Arvind Yadav has a patch to add const to attribute_group structures.
 6. I have a patch to detect directory entry inconsistencies and withdraw
    the file system if any are found. Better that than silent corruption.
 7. I have a patch to remove a vestigial variable from glock structures,
    saving some slab space.
 8. I have another patch to remove a vestigial variable from the GFS2
    in-core superblock structure.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZXOIfAAoJENeLYdPf93o7RVcH/jLEK3hmZOd94pDTYg3Damuo
 KI3xjyutDgQT83uwg8p5UBPwRYCDnyiOLwOWGBJJvjPEI1S4syrXq/FzOmxmX6cV
 nE28ARL/OXCoFEXBMUVHvHL3nK+zEUr8rO6Xz51B1ifVq7GV8iVK+ZgxzRhx0PWP
 f+0SVHiQtU0HKyxR5y9p43oygtHZaGbjy4WL0YbmFZM59y5q9A8rBHFACn2JyPBm
 /zXN6gF/Orao+BDXLT6OM3vNXZcOQ7FUPWwctguHsAO/bLzWiISyfJxLWJsHvSdW
 tzFTN1DByjXvqAhs4HTSuh9JfBDAyxcXkmczXJyATBkCTEJv42Iev+ILmre+wwQ=
 =YTwn
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.13.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We've got eight GFS2 patches for this merge window:

   - Andreas Gruenbacher has four patches related to cleaning up the
     GFS2 inode evict process. This is about half of his patches
     designed to fix a long-standing GFS2 hang related to the inode
     shrinker: Shrinker calls gfs2 evict, evict calls DLM, DLM requires
     memory and blocks on the shrinker.

     These four patches have been well tested. His second set of patches
     are still being tested, so I plan to hold them until the next merge
     window, after we have more weeks of testing. The first patch
     eliminates the flush_delayed_work, which can block.

   - Andreas's second patch protects setting of gl_object for rgrps with
     a spin_lock to prevent proven races.

   - His third patch introduces a centralized mechanism for queueing
     glock work with better reference counting, to prevent more races.

    -His fourth patch retains a reference to inode glocks when an error
     occurs while creating an inode. This keeps the subsequent evict
     from needing to reacquire the glock, which might call into DLM and
     block in low memory conditions.

   - Arvind Yadav has a patch to add const to attribute_group
     structures.

   - I have a patch to detect directory entry inconsistencies and
     withdraw the file system if any are found. Better that than silent
     corruption.

   - I have a patch to remove a vestigial variable from glock
     structures, saving some slab space.

   - I have another patch to remove a vestigial variable from the GFS2
     in-core superblock structure"

* tag 'gfs2-4.13.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: constify attribute_group structures.
  gfs2: gfs2_create_inode: Keep glock across iput
  gfs2: Clean up glock work enqueuing
  gfs2: Protect gl->gl_object by spin lock
  gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
  GFS2: Eliminate vestigial sd_log_flush_wrapped
  GFS2: Remove gl_list from glock structure
  GFS2: Withdraw when directory entry inconsistencies are detected
2017-07-05 16:57:08 -07:00
Arvind Yadav 29695254ec GFS2: constify attribute_group structures.
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
   5259	   1344	      8	   6611	   19d3	fs/gfs2/sys.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   5371	   1216	      8	   6595	   19c3	fs/gfs2/sys.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:21:14 -05:00
Andreas Gruenbacher e0b62e21b7 gfs2: gfs2_create_inode: Keep glock across iput
On failure, keep the inode glock across the final iput of the new inode
so that gfs2_evict_inode doesn't have to re-acquire the glock.  That
way, gfs2_evict_inode won't need to revalidate the block type.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:21:07 -05:00
Andreas Gruenbacher 6b0c7440bc gfs2: Clean up glock work enqueuing
This patch adds a standardized queueing mechanism for glock work
with spin_lock protection to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:21:00 -05:00
Andreas Gruenbacher 6f6597baae gfs2: Protect gl->gl_object by spin lock
Put all remaining accesses to gl->gl_object under the
gl->gl_lockref.lock spinlock to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:20:52 -05:00
Andreas Gruenbacher 4fd1a57952 gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock
work queue to make sure that inode glops which dereference gl->gl_object
have finished running before the inode is destroyed.  However, flushing
the work queue may do more work than needed, and in particular, it may
call into DLM, which we want to avoid here.  Use a bit lock
(GIF_GLOP_PENDING) to synchronize between the inode glops and
gfs2_evict_inode instead to get rid of the flushing.

In addition, flush the work queues of existing glocks before reusing
them for new inodes to get those glocks into a known state: the glock
state engine currently doesn't handle glock re-appropriation correctly.
(We may be able to fix the glock state engine instead later.)

Based on a patch by Steven Whitehouse <swhiteho@redhat.com>.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:20:24 -05:00
Bob Peterson 722f6f62a5 GFS2: Eliminate vestigial sd_log_flush_wrapped
Superblock variable sd_log_flush_wrapped is set, but never referenced,
so this patch eliminates it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-06-20 09:52:57 -05:00
Christoph Hellwig fdd050b5b3 Merge branch 'uuid-types' of bombadil.infradead.org:public_git/uuid into nvme-base 2017-06-13 11:45:14 +02:00
Bob Peterson df68f20f56 GFS2: Remove gl_list from glock structure
The gl_list is no longer used nor needed in the glock structure,
so this patch eliminates it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-06-12 14:39:12 -05:00
Bob Peterson d87d62b75d GFS2: Withdraw when directory entry inconsistencies are detected
This patch prints an inode consistency error and withdraws the file
system when directory entry counts are mismatched.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-06-12 14:38:53 -05:00
Jens Axboe 8f66439eec Linux 4.12-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh
 biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G
 kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU
 4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+
 2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E
 W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4=
 =m2Gx
 -----END PGP SIGNATURE-----

Merge tag 'v4.12-rc5' into for-4.13/block

We've already got a few conflicts and upcoming work depends on some of the
changes that have gone into mainline as regression fixes for this series.

Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream
trees to continue working on 4.13 changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-12 08:30:13 -06:00
Christoph Hellwig 4e4cbee93d block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Christoph Hellwig f729b66fca gfs2: remove the unused sd_log_error field
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Christoph Hellwig 85787090a2 fs: switch ->s_uuid to uuid_t
For some file systems we still memcpy into it, but in various places this
already allows us to use the proper uuid helpers.  More to come..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-06-05 16:59:12 +02:00
Jan Kara 0f0b9b63e1 gfs2: Make flush bios explicitely sync
Commit b685d3d65a "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

Fixes: b685d3d65a
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: cluster-devel@redhat.com
CC: stable@vger.kernel.org
Acked-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-05-24 13:35:20 +02:00
Stephen Rothwell b32c8c7648 gfs2: replace CURRENT_TIME with current_time
Link: http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.au
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Linus Torvalds 1a5fb64fee We've got ten GFS2 patches for this merge window.
1. Andreas Gruenbacher wrote a patch to replace the deprecated
    call to rhashtable_walk_init with rhashtable_walk_enter.
 2. Andreas also wrote a patch to eliminate redundant code in
    two of our debugfs sequence files.
 3. Andreas also cleaned up the rhashtable key ugliness Linus
    pointed out during this cycle, following Linus's suggestions.
 4. Andreas also wrote a patch to take advantage of his new
    function rhashtable_lookup_get_insert_fast. This makes glock
    lookup faster and more bullet-proof.
 5. Andreas also wrote a patch to revert a patch in the evict
    path that caused occasional deadlocks, and is no longer
    needed.
 6. Andrew Price wrote a patch to re-enable fallocate for the
    rindex system file to enable gfs2_grow to grow properly on
    secondary file system grow operations.
 7. I wrote a patch to initialize an inode number field to make
    certain kernel trace points more understandable.
 8. I also wrote a patch that makes GFS2 file system "withdraw"
    work more like it should by ignoring operations after a
    withdraw that would formerly cause a BUG() and kernel panic.
 9. I also reworked the entire truncate/delete algorithm,
    scrapping the old recursive algorithm in favor of a new
    non-recursive algorithm. This was done for performance:
    This way, GFS2 no longer needs to lock multiple resource
    groups while doing truncates and deletes of files that cross
    multiple resource group boundaries, allowing for better
    parallelism. It also solves a problem whereby deleting large
    files would request a large chunk of kernel memory, which
    resulted in a get_page_from_freelist warning.
 10. Due to a regression found during testing, I added a new
     patch to correct "GFS2: Prevent BUG from occurring when
     normal Withdraws occur".
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZDNnaAAoJENeLYdPf93o7B7kIAJzwz7vVDVg2TpWVhMmXIWhf
 rZx3Gth5F0h+ZHddW7HzTLg+64XQ5//GyDD3UDtCpkhl5SJH+nt3juHyPJlRwioT
 0ua4SjyKLQSoJJVAEgAwu42QjORTXab7NjYn5LEhvRc0Gg/El9WGU+ZgmP2/aAvf
 KE2u/IEYNDkoJNS3Oqc7shajAyLYda6wCAASs/1ZGt9u48m/o/I23Zd7wr7EOkzw
 rd3gB0x80cJqDAB5IcymGOm111Tg4g34LwsRuyMnWE3H1jOgV+J515FVHEIvZuPq
 Wl9X7V8CzktI7nyLKVnZhpuv5JzyMq/vOPiD01tTFx8Oy1JCRezjmATXFjW/zIo=
 =MX3c
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.12.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We've got ten GFS2 patches for this merge window.

   - Andreas Gruenbacher wrote a patch to replace the deprecated call to
     rhashtable_walk_init with rhashtable_walk_enter.

   - Andreas also wrote a patch to eliminate redundant code in two of
     our debugfs sequence files.

   - Andreas also cleaned up the rhashtable key ugliness Linus pointed
     out during this cycle, following Linus's suggestions.

   - Andreas also wrote a patch to take advantage of his new function
     rhashtable_lookup_get_insert_fast. This makes glock lookup faster
     and more bullet-proof.

   - Andreas also wrote a patch to revert a patch in the evict path that
     caused occasional deadlocks, and is no longer needed.

   - Andrew Price wrote a patch to re-enable fallocate for the rindex
     system file to enable gfs2_grow to grow properly on secondary file
     system grow operations.

   - I wrote a patch to initialize an inode number field to make certain
     kernel trace points more understandable.

   - I also wrote a patch that makes GFS2 file system "withdraw" work
     more like it should by ignoring operations after a withdraw that
     would formerly cause a BUG() and kernel panic.

   - I also reworked the entire truncate/delete algorithm, scrapping the
     old recursive algorithm in favor of a new non-recursive algorithm.
     This was done for performance: This way, GFS2 no longer needs to
     lock multiple resource groups while doing truncates and deletes of
     files that cross multiple resource group boundaries, allowing for
     better parallelism. It also solves a problem whereby deleting large
     files would request a large chunk of kernel memory, which resulted
     in a get_page_from_freelist warning.

   - Due to a regression found during testing, I added a new patch to
     correct 'GFS2: Prevent BUG from occurring when normal Withdraws
     occur'."

* tag 'gfs2-4.12.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Allow glocks to be unlocked after withdraw
  GFS2: Non-recursive delete
  gfs2: Re-enable fallocate for the rindex
  Revert "GFS2: Wait for iopen glock dequeues"
  gfs2: Switch to rhashtable_lookup_get_insert_fast
  GFS2: Temporarily zero i_no_addr when creating a dinode
  gfs2: Don't pack struct lm_lockname
  gfs2: Deduplicate gfs2_{glocks,glstats}_open
  gfs2: Replace rhashtable_walk_init with rhashtable_walk_enter
  GFS2: Prevent BUG from occurring when normal Withdraws occur
2017-05-05 13:40:20 -07:00
Bob Peterson ed17545d01 GFS2: Allow glocks to be unlocked after withdraw
This bug fixes a regression introduced by patch 0d1c7ae9d8.

The intent of the patch was to stop promoting glocks after a
file system is withdrawn due to a variety of errors, because doing
so results in a BUG(). (You should be able to unmount after a
withdraw rather than having the kernel panic.)

Unfortunately, it also stopped demotions, so glocks could not be
unlocked after withdraw, which means the unmount would hang.

This patch allows function do_xmote to demote locks to an
unlocked state after a withdraw, but not promote them.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-05-05 14:19:28 -05:00
Jan Kara c1844d536d fs: Remove SB_I_DYNBDI flag
Now that all bdi structures filesystems use are properly refcounted, we
can remove the SB_I_DYNBDI flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:09:55 -06:00
Jan Kara 95fe66de9f gfs2: Convert to properly refcounting bdi
Similarly to set_bdev_super() GFS2 just used block device reference to
bdi. Convert it to properly getting bdi reference. The reference will
get automatically dropped on superblock destruction.

CC: Steven Whitehouse <swhiteho@redhat.com>
CC: Bob Peterson <rpeterso@redhat.com>
CC: cluster-devel@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:09:55 -06:00
Bob Peterson d552a2b9b3 GFS2: Non-recursive delete
Implement truncate/delete as a non-recursive algorithm. The older
algorithm was implemented with recursion to strip off each layer
at a time (going by height, starting with the maximum height.
This version tries to do the same thing but without recursion,
and without needing to allocate new structures or lists in memory.

For example, say you want to truncate a very large file to 1 byte,
and its end-of-file metapath is: 0.505.463.428. The starting
metapath would be 0.0.0.0. Since it's a truncate to non-zero, it
needs to preserve that byte, and all metadata pointing to it.
So it would start at 0.0.0.0, look up all its metadata buffers,
then free all data blocks pointed to at the highest level.
After that buffer is "swept", it moves on to 0.0.0.1, then
0.0.0.2, etc., reading in buffers and sweeping them clean.
When it gets to the end of the 0.0.0 metadata buffer (for 4K
blocks the last valid one is 0.0.0.508), it backs up to the
previous height and starts working on 0.0.1.0, then 0.0.1.1,
and so forth. After it reaches the end and sweeps 0.0.1.508,
it continues with 0.0.2.0, and so on. When that height is
exhausted, and it reaches 0.0.508.508 it backs up another level,
to 0.1.0.0, then 0.1.0.1, through 0.1.0.508. So it has to keep
marching backwards and forwards through the metadata until it's
all swept clean. Once it has all the data blocks freed, it
lowers the strip height, and begins the process all over again,
but with one less height. This time it sweeps 0.0.0 through
0.505.463. When that's clean, it lowers the strip height again
and works to free 0.505. Eventually it strips the lowest height, 0.
For a delete or truncate to 0, all metadata for all heights of
0.0.0.0 would be freed. For a truncate to 1 byte, 0.0.0.0 would
be preserved.

This isn't much different from normal integer incrementing,
where an integer gets incremented from 0000 (0.0.0.0) to 3021
(3.0.2.1). So 0000 gets increments to 0001, 0002, up to 0009,
then on to 0010, 0011 up to 0099, then 0100 and so forth. It's
just that each "digit" goes from 0 to 508 (for a total of 509
pointers) rather than from 0 to 9.

Note that the dinode will only have 483 pointers due to the
dinode structure itself.

Also note: this is just an example. These numbers (509 and 483)
are based on a standard 4K block size. Smaller block sizes will
yield smaller numbers of indirect pointers accordingly.

The truncation process is accomplished with the help of two
major functions and a few helper functions.

Functions do_strip and recursive_scan are obsolete, so removed.

New function sweep_bh_for_rgrps cleans a buffer_head pointed to
by the given metapath and height. By cleaning, I mean it frees
all blocks starting at the offset passed in metapath. It starts
at the first block in the buffer pointed to by the metapath and
identifies its resource group (rgrp). From there it frees all
subsequent block pointers that lie within that rgrp. If it's
already inside a transaction, it stays within it as long as it
can. In other words, it doesn't close a transaction until it knows
it's freed what it can from the resource group. In this way,
multiple buffers may be cleaned in a single transaction, as long
as those blocks in the buffer all lie within the same rgrp.

If it's not in a transaction, it starts one. If the buffer_head
has references to blocks within multiple rgrps, it frees all the
blocks inside the first rgrp it finds, then closes the
transaction. Then it repeats the cycle: identifies the next
unfreed block, uses it to find its rgrp, then starts a new
transaction for that set. It repeats this process repeatedly
until the buffer_head contains no more references to any blocks
past the given metapath.

Function trunc_dealloc has been reworked into a finite state
automaton. It has basically 3 active states:
DEALLOC_MP_FULL, DEALLOC_MP_LOWER, and DEALLOC_FILL_MP:

The DEALLOC_MP_FULL state implies the metapath has a full set
of buffers out to the "shrink height", and therefore, it can
call function sweep_bh_for_rgrps to free the blocks within the
highest height of the metapath. If it's just swept the lowest
level (or an error has occurred) the state machine is ended.
Otherwise it proceeds to the DEALLOC_MP_LOWER state.

The DEALLOC_MP_LOWER state implies we are finished with a given
buffer_head, which may now be released, and therefore we are
then missing some buffer information from the metapath. So we
need to find more buffers to read in. In most cases, this is
just a matter of releasing the buffer_head and moving to the
next pointer from the previous height, so it may be read in and
swept as well. If it can't find another non-null pointer to
process, it checks whether it's reached the end of a height
and needs to lower the strip height, or whether it still needs
move forward through the previous height's metadata. In this
state, all zero-pointers are skipped. From this state, it can
only loop around (once more backing up another height) or,
once a valid metapath is found (one that has non-zero
pointers), proceed to state DEALLOC_FILL_MP.

The DEALLOC_FILL_MP state implies that we have a metapath
but not all its buffers are read in. So we must proceed to read
in buffer_heads until the metapath has a valid buffer for every
height. If the previous state backed us up 3 heights, we may
need to read in a buffer, increment the height, then repeat the
process until buffers have been read in for all required heights.
If it's successful reading a buffer, and it's at the highest
height we need, it proceeds back to the DEALLOC_MP_FULL state.
If it's unable to fill in a buffer, (encounters a hole, etc.)
it tries to find another non-zero block pointer. If they're all
zero, it lowers the height and returns to the DEALLOC_MP_LOWER
state. If it finds a good non-null pointer, it loops around and
reads it in, while keeping the metapath in lock-step with the
pointers it examines.

The state machine runs until the truncation request is
satisfied. Then any transactions are ended, the quota and
statfs data are updated, and the function is complete.

Helper function metaptr1 was introduced to be an easy way to
determine the start of a buffer_head's indirect pointers.

Helper function lookup_mp_height was introduced to find a
metapath index and read in the buffer that corresponds to it.
In this way, function lookup_metapath becomes a simple loop to
call it for every height.

Helper function fillup_metapath is similar to lookup_metapath
except it can do partial lookups. If the state machine
backed up multiple levels (like 2999 wrapping to 3000) it
needs to find out the next starting point and start issuing
metadata reads at that point.

Helper function hptrs is a shortcut to determine how many
pointers should be expected in a buffer. Height 0 is the dinode
which has fewer pointers than the others.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-19 08:25:43 -04:00
Andrew Price d4d7fc12b6 gfs2: Re-enable fallocate for the rindex
Commit 86066914ed "gfs2: Don't support
fallocate on jdata files" removed the ability of gfs2_grow to reserve
space at the end of the rindex, which could prevent a second gfs2_grow
from succeeding if the fs is full. Allow fallocate to work on the rindex
once again.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-05 11:45:26 -04:00
Andreas Gruenbacher d4da31986c Revert "GFS2: Wait for iopen glock dequeues"
Revert commit 86d067a797: it turns out
that waiting for iopen glock dequeues here isn't needed anymore because
the bugs that commit was meant to fix have been fixed otherwise.

In addition, we want to avoid waiting on glocks in gfs2_evict_inode in
shrinker context because the shrinker may be invoked on behalf of DLM,
in which case calling into DLM again would deadlock.  This commit makes
the described scenario less likely without completely avoiding it; it's
still a step in the right direction, though.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-03 09:14:47 -04:00
Andreas Gruenbacher 0a52aba7c2 gfs2: Switch to rhashtable_lookup_get_insert_fast
Switch from rhashtable_lookup_insert_fast + rhashtable_lookup_fast to
rhashtable_lookup_get_insert_fast, which is cleaner and avoids an extra
rhashtable lookup.

At the same time, turn the retry loop in gfs2_glock_get into an infinite
loop.  The lookup or insert will eventually succeed, usually very fast,
but there is no reason to give up trying at a fixed number of
iterations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-03 09:14:41 -04:00
Bob Peterson cc963a11b6 GFS2: Temporarily zero i_no_addr when creating a dinode
Before this patch i_no_addr was not initialized until after the
return from allocating its block. That meant the i_no_addr was
temporarily uninitialized storage. Ordinarily that's not a concern,
but if inplace_reserve can't find space, it can call try_rgrp_unlink
which references i_no_addr as a block to avoid. That can result in
unpredictable behavior. More importantly, the trace point in
gfs2_alloc_blocks references ip->i_no_addr before it is set, which
is misleading when reading the kernel traces. This patch makes it
look like the new dinode block was assigned in the name of inode 0
rather than a random inode that's completely unrelated.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 15:29:13 -04:00
Andreas Gruenbacher 972b044eec gfs2: Don't pack struct lm_lockname
As per a suggestion by Linus, don't pack struct lm_lockname: we did that
because the struct is used as a rhashtable key, but packing tells the
compiler that the 64-bit fields in the struct may be unaligned, causing
it to generate worse code on some architectures.  Instead, rearrange the
fields in the struct so that there is no padding between fields, and
exclude any tail padding from the hash key size.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 09:58:49 -04:00
Andreas Gruenbacher 92ecd73a88 gfs2: Deduplicate gfs2_{glocks,glstats}_open
Both functions are identical except for the seq_operations used.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 08:18:35 -04:00
Andreas Gruenbacher cc37a62785 gfs2: Replace rhashtable_walk_init with rhashtable_walk_enter
Function rhashtable_walk_init is deprecated.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 08:18:35 -04:00
Bob Peterson 0d1c7ae9d8 GFS2: Prevent BUG from occurring when normal Withdraws occur
When the GFS2 file system withdraws due to metadata corruption, it
often has outstanding transactions in the journal and delayed work
queued for its glocks. This patch adds some new checks for a
withdrawn file system before proceeding with operations that would
obviously cause a BUG() to be triggered. That allows GFS2 to be
safely unmounted rather than cause the system to go down.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 08:18:35 -04:00
Andreas Gruenbacher 28ea06c46f gfs2: Avoid alignment hole in struct lm_lockname
Commit 88ffbf3e03 switches to using rhashtables for glocks, hashing over
the entire struct lm_lockname instead of its individual fields.  On some
architectures, struct lm_lockname contains a hole of uninitialized
memory due to alignment rules, which now leads to incorrect hash values.
Get rid of that hole.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
CC: <stable@vger.kernel.org> #v4.3+
2017-03-15 10:06:07 -04:00
Linus Torvalds 590dce2d49 Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs 'statx()' update from Al Viro.

This adds the new extended stat() interface that internally subsumes our
previous stat interfaces, and allows user mode to specify in more detail
what kind of information it wants.

It also allows for some explicit synchronization information to be
passed to the filesystem, which can be relevant for network filesystems:
is the cached value ok, or do you need open/close consistency, or what?

From David Howells.

Andreas Dilger points out that the first version of the extended statx
interface was posted June 29, 2010:

    https://www.spinics.net/lists/linux-fsdevel/msg33831.html

* 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  statx: Add a system call to make enhanced file info available
2017-03-03 11:38:56 -08:00
David Howells a528d35e8b statx: Add a system call to make enhanced file info available
Add a system call to make extended file information available, including
file creation and some attribute flags where available through the
underlying filesystem.

The getattr inode operation is altered to take two additional arguments: a
u32 request_mask and an unsigned int flags that indicate the
synchronisation mode.  This change is propagated to the vfs_getattr*()
function.

Functions like vfs_stat() are now inline wrappers around new functions
vfs_statx() and vfs_statx_fd() to reduce stack usage.

========
OVERVIEW
========

The idea was initially proposed as a set of xattrs that could be retrieved
with getxattr(), but the general preference proved to be for a new syscall
with an extended stat structure.

A number of requests were gathered for features to be included.  The
following have been included:

 (1) Make the fields a consistent size on all arches and make them large.

 (2) Spare space, request flags and information flags are provided for
     future expansion.

 (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
     __s64).

 (4) Creation time: The SMB protocol carries the creation time, which could
     be exported by Samba, which will in turn help CIFS make use of
     FS-Cache as that can be used for coherency data (stx_btime).

     This is also specified in NFSv4 as a recommended attribute and could
     be exported by NFSD [Steve French].

 (5) Lightweight stat: Ask for just those details of interest, and allow a
     netfs (such as NFS) to approximate anything not of interest, possibly
     without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
     Dilger] (AT_STATX_DONT_SYNC).

 (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
     its cached attributes are up to date [Trond Myklebust]
     (AT_STATX_FORCE_SYNC).

And the following have been left out for future extension:

 (7) Data version number: Could be used by userspace NFS servers [Aneesh
     Kumar].

     Can also be used to modify fill_post_wcc() in NFSD which retrieves
     i_version directly, but has just called vfs_getattr().  It could get
     it from the kstat struct if it used vfs_xgetattr() instead.

     (There's disagreement on the exact semantics of a single field, since
     not all filesystems do this the same way).

 (8) BSD stat compatibility: Including more fields from the BSD stat such
     as creation time (st_btime) and inode generation number (st_gen)
     [Jeremy Allison, Bernd Schubert].

 (9) Inode generation number: Useful for FUSE and userspace NFS servers
     [Bernd Schubert].

     (This was asked for but later deemed unnecessary with the
     open-by-handle capability available and caused disagreement as to
     whether it's a security hole or not).

(10) Extra coherency data may be useful in making backups [Andreas Dilger].

     (No particular data were offered, but things like last backup
     timestamp, the data version number and the DOS archive bit would come
     into this category).

(11) Allow the filesystem to indicate what it can/cannot provide: A
     filesystem can now say it doesn't support a standard stat feature if
     that isn't available, so if, for instance, inode numbers or UIDs don't
     exist or are fabricated locally...

     (This requires a separate system call - I have an fsinfo() call idea
     for this).

(12) Store a 16-byte volume ID in the superblock that can be returned in
     struct xstat [Steve French].

     (Deferred to fsinfo).

(13) Include granularity fields in the time data to indicate the
     granularity of each of the times (NFSv4 time_delta) [Steve French].

     (Deferred to fsinfo).

(14) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
     Note that the Linux IOC flags are a mess and filesystems such as Ext4
     define flags that aren't in linux/fs.h, so translation in the kernel
     may be a necessity (or, possibly, we provide the filesystem type too).

     (Some attributes are made available in stx_attributes, but the general
     feeling was that the IOC flags were to ext[234]-specific and shouldn't
     be exposed through statx this way).

(15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
     Michael Kerrisk].

     (Deferred, probably to fsinfo.  Finding out if there's an ACL or
     seclabal might require extra filesystem operations).

(16) Femtosecond-resolution timestamps [Dave Chinner].

     (A __reserved field has been left in the statx_timestamp struct for
     this - if there proves to be a need).

(17) A set multiple attributes syscall to go with this.

===============
NEW SYSTEM CALL
===============

The new system call is:

	int ret = statx(int dfd,
			const char *filename,
			unsigned int flags,
			unsigned int mask,
			struct statx *buffer);

The dfd, filename and flags parameters indicate the file to query, in a
similar way to fstatat().  There is no equivalent of lstat() as that can be
emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags.  There is
also no equivalent of fstat() as that can be emulated by passing a NULL
filename to statx() with the fd of interest in dfd.

Whether or not statx() synchronises the attributes with the backing store
can be controlled by OR'ing a value into the flags argument (this typically
only affects network filesystems):

 (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
     respect.

 (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
     its attributes with the server - which might require data writeback to
     occur to get the timestamps correct.

 (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
     network filesystem.  The resulting values should be considered
     approximate.

mask is a bitmask indicating the fields in struct statx that are of
interest to the caller.  The user should set this to STATX_BASIC_STATS to
get the basic set returned by stat().  It should be noted that asking for
more information may entail extra I/O operations.

buffer points to the destination for the data.  This must be 256 bytes in
size.

======================
MAIN ATTRIBUTES RECORD
======================

The following structures are defined in which to return the main attribute
set:

	struct statx_timestamp {
		__s64	tv_sec;
		__s32	tv_nsec;
		__s32	__reserved;
	};

	struct statx {
		__u32	stx_mask;
		__u32	stx_blksize;
		__u64	stx_attributes;
		__u32	stx_nlink;
		__u32	stx_uid;
		__u32	stx_gid;
		__u16	stx_mode;
		__u16	__spare0[1];
		__u64	stx_ino;
		__u64	stx_size;
		__u64	stx_blocks;
		__u64	__spare1[1];
		struct statx_timestamp	stx_atime;
		struct statx_timestamp	stx_btime;
		struct statx_timestamp	stx_ctime;
		struct statx_timestamp	stx_mtime;
		__u32	stx_rdev_major;
		__u32	stx_rdev_minor;
		__u32	stx_dev_major;
		__u32	stx_dev_minor;
		__u64	__spare2[14];
	};

The defined bits in request_mask and stx_mask are:

	STATX_TYPE		Want/got stx_mode & S_IFMT
	STATX_MODE		Want/got stx_mode & ~S_IFMT
	STATX_NLINK		Want/got stx_nlink
	STATX_UID		Want/got stx_uid
	STATX_GID		Want/got stx_gid
	STATX_ATIME		Want/got stx_atime{,_ns}
	STATX_MTIME		Want/got stx_mtime{,_ns}
	STATX_CTIME		Want/got stx_ctime{,_ns}
	STATX_INO		Want/got stx_ino
	STATX_SIZE		Want/got stx_size
	STATX_BLOCKS		Want/got stx_blocks
	STATX_BASIC_STATS	[The stuff in the normal stat struct]
	STATX_BTIME		Want/got stx_btime{,_ns}
	STATX_ALL		[All currently available stuff]

stx_btime is the file creation time, stx_mask is a bitmask indicating the
data provided and __spares*[] are where as-yet undefined fields can be
placed.

Time fields are structures with separate seconds and nanoseconds fields
plus a reserved field in case we want to add even finer resolution.  Note
that times will be negative if before 1970; in such a case, the nanosecond
fields will also be negative if not zero.

The bits defined in the stx_attributes field convey information about a
file, how it is accessed, where it is and what it does.  The following
attributes map to FS_*_FL flags and are the same numerical value:

	STATX_ATTR_COMPRESSED		File is compressed by the fs
	STATX_ATTR_IMMUTABLE		File is marked immutable
	STATX_ATTR_APPEND		File is append-only
	STATX_ATTR_NODUMP		File is not to be dumped
	STATX_ATTR_ENCRYPTED		File requires key to decrypt in fs

Within the kernel, the supported flags are listed by:

	KSTAT_ATTR_FS_IOC_FLAGS

[Are any other IOC flags of sufficient general interest to be exposed
through this interface?]

New flags include:

	STATX_ATTR_AUTOMOUNT		Object is an automount trigger

These are for the use of GUI tools that might want to mark files specially,
depending on what they are.

Fields in struct statx come in a number of classes:

 (0) stx_dev_*, stx_blksize.

     These are local system information and are always available.

 (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino,
     stx_size, stx_blocks.

     These will be returned whether the caller asks for them or not.  The
     corresponding bits in stx_mask will be set to indicate whether they
     actually have valid values.

     If the caller didn't ask for them, then they may be approximated.  For
     example, NFS won't waste any time updating them from the server,
     unless as a byproduct of updating something requested.

     If the values don't actually exist for the underlying object (such as
     UID or GID on a DOS file), then the bit won't be set in the stx_mask,
     even if the caller asked for the value.  In such a case, the returned
     value will be a fabrication.

     Note that there are instances where the type might not be valid, for
     instance Windows reparse points.

 (2) stx_rdev_*.

     This will be set only if stx_mode indicates we're looking at a
     blockdev or a chardev, otherwise will be 0.

 (3) stx_btime.

     Similar to (1), except this will be set to 0 if it doesn't exist.

=======
TESTING
=======

The following test program can be used to test the statx system call:

	samples/statx/test-statx.c

Just compile and run, passing it paths to the files you want to examine.
The file is built automatically if CONFIG_SAMPLES is enabled.

Here's some example output.  Firstly, an NFS directory that crosses to
another FSID.  Note that the AUTOMOUNT attribute is set because transiting
this directory will cause d_automount to be invoked by the VFS.

	[root@andromeda ~]# /tmp/test-statx -A /warthog/data
	statx(/warthog/data) = 0
	results=7ff
	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
	Device: 00:26           Inode: 1703937     Links: 125
	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
	Access: 2016-11-24 09:02:12.219699527+0000
	Modify: 2016-11-17 10:44:36.225653653+0000
	Change: 2016-11-17 10:44:36.225653653+0000
	Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

Secondly, the result of automounting on that directory.

	[root@andromeda ~]# /tmp/test-statx /warthog/data
	statx(/warthog/data) = 0
	results=7ff
	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
	Device: 00:27           Inode: 2           Links: 125
	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
	Access: 2016-11-24 09:02:12.219699527+0000
	Modify: 2016-11-17 10:44:36.225653653+0000
	Change: 2016-11-17 10:44:36.225653653+0000

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-02 20:51:15 -05:00
Ingo Molnar 174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Ingo Molnar 5b825c3af1 sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:31 +01:00