1
0
Fork 0
Commit Graph

355 Commits (549c7297717c32ee53f156cd949e055e601f67bb)

Author SHA1 Message Date
Christian Brauner 549c729771
fs: make helpers idmap mount aware
Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.

As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesnt't leave us with additional inode methods.

Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:20 +01:00
Christian Brauner 0d56a4518d
stat: handle idmapped mounts
The generic_fillattr() helper fills in the basic attributes associated
with an inode. Enable it to handle idmapped mounts. If the inode is
accessed through an idmapped mount map it into the mount's user
namespace before we store the uid and gid. If the initial user namespace
is passed nothing changes so non-idmapped mounts will see identical
behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-12-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:17 +01:00
Christian Brauner e65ce2a50c
acl: handle idmapped mounts
The posix acl permission checking helpers determine whether a caller is
privileged over an inode according to the acls associated with the
inode. Add helpers that make it possible to handle acls on idmapped
mounts.

The vfs and the filesystems targeted by this first iteration make use of
posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
translate basic posix access and default permissions such as the
ACL_USER and ACL_GROUP type according to the initial user namespace (or
the superblock's user namespace) to and from the caller's current user
namespace. Adapt these two helpers to handle idmapped mounts whereby we
either map from or into the mount's user namespace depending on in which
direction we're translating.
Similarly, cap_convert_nscap() is used by the vfs to translate user
namespace and non-user namespace aware filesystem capabilities from the
superblock's user namespace to the caller's user namespace. Enable it to
handle idmapped mounts by accounting for the mount's user namespace.

In addition the fileystems targeted in the first iteration of this patch
series make use of the posix_acl_chmod() and, posix_acl_update_mode()
helpers. Both helpers perform permission checks on the target inode. Let
them handle idmapped mounts. These two helpers are called when posix
acls are set by the respective filesystems to handle this case we extend
the ->set() method to take an additional user namespace argument to pass
the mount's user namespace down.

Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:17 +01:00
Christian Brauner 2f221d6f7b
attr: handle idmapped mounts
When file attributes are changed most filesystems rely on the
setattr_prepare(), setattr_copy(), and notify_change() helpers for
initialization and permission checking. Let them handle idmapped mounts.
If the inode is accessed through an idmapped mount map it into the
mount's user namespace. Afterwards the checks are identical to
non-idmapped mounts. If the initial user namespace is passed nothing
changes so non-idmapped mounts will see identical behavior as before.

Helpers that perform checks on the ia_uid and ia_gid fields in struct
iattr assume that ia_uid and ia_gid are intended values and have already
been mapped correctly at the userspace-kernelspace boundary as we
already do today. If the initial user namespace is passed nothing
changes so non-idmapped mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:16 +01:00
Christian Brauner 47291baa8d
namei: make permission helpers idmapped mount aware
The two helpers inode_permission() and generic_permission() are used by
the vfs to perform basic permission checking by verifying that the
caller is privileged over an inode. In order to handle idmapped mounts
we extend the two helpers with an additional user namespace argument.
On idmapped mounts the two helpers will make sure to map the inode
according to the mount's user namespace and then peform identical
permission checks to inode_permission() and generic_permission(). If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:16 +01:00
Andreas Gruenbacher a55a47a3bc Revert "GFS2: Prevent delete work from occurring on glocks used for create"
Since commit a0e3cc65fa ("gfs2: Turn gl_delete into a delayed work"), we're
cancelling any pending delete work of an iopen glock before attaching a new
inode to that glock in gfs2_create_inode.  This means that delete_work_func can
no longer be queued or running when attaching the iopen glock to the new inode,
and we can revert commit a4923865ea ("GFS2: Prevent delete work from
occurring on glocks used for create"), which tried to achieve the same but in a
racy way.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-01 00:25:21 +01:00
Andreas Gruenbacher e3a77eebfa gfs2: Make inode operations static
The inode operations are not used outside inode.c.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-01 00:25:21 +01:00
Andreas Gruenbacher dd0ecf5441 gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_func
In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending
delete work before taking the inode glock.  Otherwise, gfs2_cancel_delete_work
may block waiting for delete_work_func to complete, and delete_work_func may
block trying to acquire the inode glock in gfs2_inode_lookup.

Reported-by: Alexander Aring <aahringo@redhat.com>
Fixes: a0e3cc65fa ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-01 00:21:10 +01:00
Andreas Gruenbacher 82e938bd53 gfs2: Upgrade shared glocks for atime updates
Commit 20f829999c ("gfs2: Rework read and page fault locking") lifted
the glock lock taking from the low-level ->readpage and ->readahead
address space operations to the higher-level ->read_iter file and
->fault vm operations.  The glocks are still taken in LM_ST_SHARED mode
only.  On filesystems mounted without the noatime option, ->read_iter
sometimes needs to update the atime as well, though.  Right now, this
leads to a failed locking mode assertion in gfs2_dirty_inode.

Fix that by introducing a new update_time inode operation.  There, if
the glock is held non-exclusively, upgrade it to an exclusive lock.

Reported-by: Alexander Aring <aahringo@redhat.com>
Fixes: 20f829999c ("gfs2: Rework read and page fault locking")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-26 19:58:25 +01:00
Andreas Gruenbacher 6bd1c7bd4e gfs2: Don't call cancel_delayed_work_sync from within delete work function
Right now, we can end up calling cancel_delayed_work_sync from within
delete_work_func via gfs2_lookup_by_inum -> gfs2_inode_lookup ->
gfs2_cancel_delete_work.  When that happens, it will result in a
deadlock.  Instead, gfs2_inode_lookup should skip the call to
gfs2_cancel_delete_work when called from delete_work_func (blktype ==
GFS2_BLKST_UNLINKED).

Reported-by: Alexander Ahring Oder Aring <aahringo@redhat.com>
Fixes: a0e3cc65fa ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-02 21:34:47 +01:00
Andreas Gruenbacher 5902f4dd6e gfs2: Don't return NULL from gfs2_inode_lookup
Callers expect gfs2_inode_lookup to return an inode pointer or ERR_PTR(error).
Commit b66648ad6d caused it to return NULL instead of ERR_PTR(-ESTALE) in
some cases.  Fix that.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: b66648ad6d ("gfs2: Move inode generation number check into gfs2_inode_lookup")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-30 13:04:45 +02:00
Linus Torvalds ca687877e0 Changes in gfs2:
- An iopen glock locking scheme rework that speeds up deletes of
   inodes accessed from multiple nodes.
 - Various bug fixes and debugging improvements.
 - Convert gfs2-glocks.txt to ReST.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAl7eYjMUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTr/9g//cJ6jgiD/+qzh0VzougVksVZIduAl
 RMB+kldOjBS2ORbyYM87Jm1tdyakgZuFO91XlSwChWRdC3Y2mqdaJIEE/kATqfY9
 7Frlw++SyFKLvIf04kDYGk2hXX+umXXYFfrIiKb0tzDSGkPRaARUb3RM4TRvlSDP
 /U0JlYA/4aXMUge+VpYsbpSGeqHNfEzmcmCyPXGmZYyh1MZ/RocMZFYEsP9NH82J
 l07fxowUd10LJPEmBajzjD2NmEvjdvF4gBCOfJVNIfOzCj0CwXL3vmtu1SUNOKr+
 em266EWZF89eOcvfdtE6xF0w81oCAK43wYRjIODSgI9JCLXGiOYmlWZVwZoqu5iy
 2GQDhj/taq3ObuVqjR5n6GYuqMoJ+D0LSD13qccDALq/Bdy4lq9TMLSdDbkrVIM/
 8BVn0nI5MzUlTV3mq6uxhU0HqtYDwUEiHWURWw6bYRug5OvQy3nbg/XZptYlnD87
 XQccE4ErjlgSHLiYx1YckFz/GG6ytrRAKl9bGMkZ0u2+XmDsH+iJJgLcaXCPUP9h
 /hhYagKI55UBDer7we4tppbu+gnJrg3PXlgImf53iMc7ia0KQHd+TfSIFkGPuydI
 aEKKhIQzje23JayMbPRnwPbNlw9zU1loPi7hPS3rCpDY+w8oawpFyIieEOcxJyEt
 pYkOt4BQi9LvpGc=
 =PsCY
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - An iopen glock locking scheme rework that speeds up deletes of inodes
   accessed from multiple nodes

 - Various bug fixes and debugging improvements

 - Convert gfs2-glocks.txt to ReST

* tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: fix use-after-free on transaction ail lists
  gfs2: new slab for transactions
  gfs2: initialize transaction tr_ailX_lists earlier
  gfs2: Smarter iopen glock waiting
  gfs2: Wake up when setting GLF_DEMOTE
  gfs2: Check inode generation number in delete_work_func
  gfs2: Move inode generation number check into gfs2_inode_lookup
  gfs2: Minor gfs2_lookup_by_inum cleanup
  gfs2: Try harder to delete inodes locally
  gfs2: Give up the iopen glock on contention
  gfs2: Turn gl_delete into a delayed work
  gfs2: Keep track of deleted inode generations in LVBs
  gfs2: Allow ASPACE glocks to also have an lvb
  gfs2: instrumentation wrt log_flush stuck
  gfs2: introduce new gfs2_glock_assert_withdraw
  gfs2: print mapping->nrpages in glock dump for address space glocks
  gfs2: Only do glock put in gfs2_create_inode for free inodes
  gfs2: Allow lock_nolock mount to specify jid=X
  gfs2: Don't ignore inode write errors during inode_go_sync
  docs: filesystems: convert gfs2-glocks.txt to ReST
2020-06-08 12:47:09 -07:00
Linus Torvalds 0b166a57e6 A lot of bug fixes and cleanups for ext4, including:
* Fix performance problems found in dioread_nolock now that it is the
   default, caused by transaction leaks.
 * Clean up fiemap handling in ext4
 * Clean up and refactor multiple block allocator (mballoc) code
 * Fix a problem with mballoc with a smaller file systems running out
   of blocks because they couldn't properly use blocks that had been
   reserved by inode preallocation.
 * Fixed a race in ext4_sync_parent() versus rename()
 * Simplify the error handling in the extent manipulation code
 * Make sure all metadata I/O errors are felected to ext4_ext_dirty()'s and
   ext4_make_inode_dirty()'s callers.
 * Avoid passing an error pointer to brelse in ext4_xattr_set()
 * Fix race which could result to freeing an inode on the dirty last
   in data=journal mode.
 * Fix refcount handling if ext4_iget() fails
 * Fix a crash in generic/019 caused by a corrupted extent node
 -----BEGIN PGP SIGNATURE-----
 
 iQEyBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl7Ze8kACgkQ8vlZVpUN
 gaNChAf4xn0ytFSrweI/S2Sp05G/2L/ocZ2TZZk2ZdGeN1E+ABdSIv/zIF9zuFgZ
 /pY/C+fyEZWt4E3FlNO8gJzoEedkzMCMnUhSIfI+wZbcclyTOSNMJtnrnJKAEtVH
 HOvGZJmg357jy407RCGhZpJ773nwU2xhBTr5OFxvSf9mt/vzebxIOnw5D7HPlC1V
 Fgm6Du8q+tRrPsyjv1Yu4pUEVXMJ7qUcvt326AXVM3kCZO1Aa5GrURX0w3J4mzW1
 tc1tKmtbLcVVYTo9CwHXhk/edbxrhAydSP2iACand3tK6IJuI6j9x+bBJnxXitnr
 vsxsfTYMG18+2SxrJ9LwmagqmrRq
 =HMTs
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "A lot of bug fixes and cleanups for ext4, including:

   - Fix performance problems found in dioread_nolock now that it is the
     default, caused by transaction leaks.

   - Clean up fiemap handling in ext4

   - Clean up and refactor multiple block allocator (mballoc) code

   - Fix a problem with mballoc with a smaller file systems running out
     of blocks because they couldn't properly use blocks that had been
     reserved by inode preallocation.

   - Fixed a race in ext4_sync_parent() versus rename()

   - Simplify the error handling in the extent manipulation code

   - Make sure all metadata I/O errors are felected to
     ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.

   - Avoid passing an error pointer to brelse in ext4_xattr_set()

   - Fix race which could result to freeing an inode on the dirty last
     in data=journal mode.

   - Fix refcount handling if ext4_iget() fails

   - Fix a crash in generic/019 caused by a corrupted extent node"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
  ext4: avoid unnecessary transaction starts during writeback
  ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
  ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
  fs: remove the access_ok() check in ioctl_fiemap
  fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
  fs: move fiemap range validation into the file systems instances
  iomap: fix the iomap_fiemap prototype
  fs: move the fiemap definitions out of fs.h
  fs: mark __generic_block_fiemap static
  ext4: remove the call to fiemap_check_flags in ext4_fiemap
  ext4: split _ext4_fiemap
  ext4: fix fiemap size checks for bitmap files
  ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
  add comment for ext4_dir_entry_2 file_type member
  jbd2: avoid leaking transaction credits when unreserving handle
  ext4: drop ext4_journal_free_reserved()
  ext4: mballoc: use lock for checking free blocks while retrying
  ext4: mballoc: refactor ext4_mb_good_group()
  ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
  ext4: mballoc: refactor ext4_mb_discard_preallocations()
  ...
2020-06-05 16:19:28 -07:00
Andreas Gruenbacher 300e549b6e Merge branch 'gfs2-iopen' into for-next 2020-06-05 21:25:36 +02:00
Andreas Gruenbacher b66648ad6d gfs2: Move inode generation number check into gfs2_inode_lookup
Move the inode generation number check from gfs2_lookup_by_inum into
gfs2_inode_lookup: gfs2_inode_lookup may be able to decide that an inode with
the given inode generation number cannot exist without having to verify the
block type or reading the inode from disk.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher 6bdcadea75 gfs2: Minor gfs2_lookup_by_inum cleanup
Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode
generation number will qualify: a valid inode never has a zero no_formal_ino.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher a0e3cc65fa gfs2: Turn gl_delete into a delayed work
This requires flushing delayed work items in gfs2_make_fs_ro (which is called
before unmounting a filesystem).

When inodes are deleted and then recreated, pending gl_delete work items would
have no effect because the inode generations will have changed, so we can
cancel any pending gl_delete works before reusing iopen glocks.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Christoph Hellwig 10c5db2864 fs: move the fiemap definitions out of fs.h
No need to pull the fiemap definitions into almost every file in the
kernel build.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Bob Peterson 1a0b00d15d gfs2: Only do glock put in gfs2_create_inode for free inodes
Before this patch, the error path of function gfs2_create_inode would
always calls gfs2_glock_put for the inode glock. That's good for inodes
that are free. But after they've been added to the vfs inodes, errors
will cause the inode to be evicted, and the evict will do the glock
put for us. If we do a glock put again, we can try to free the glock
while there are still references to it, e.g. revokes pending for
the transaction that created it.

This patch adds a check: if (free_vfs_inode) before the put, thus
solving the problem.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-02 21:23:55 +02:00
Bob Peterson 2297ab6144 gfs2: Fix problems regarding gfs2_qa_get and _put
This patch fixes a couple of places in which gfs2_qa_get and gfs2_qa_put are
not balanced: we now keep references around whenever a file is open for writing
(see gfs2_open_common and gfs2_release), so we need to put all references we
grab in function gfs2_create_inode.  This was broken in the successful case and
on one error path.

This also means that we don't have a reference to put in gfs2_evict_inode.

In addition, gfs2_qa_put was called for the wrong inode in gfs2_link.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-08 18:45:11 +02:00
Linus Torvalds 018d21f5c5 We've got a lot of patches (39) for this merge window. Most of these patches
are related to corruption that occurs when journals are replayed.
 For example:
 
    1. A node fails while writing to the file system.
    2. Other nodes use the metadata that was once used by the failed node.
    3. When the node returns to the cluster, its journal is replayed,
       but the older metadata blocks overwrite the changes from step 2.
 
 - Fixed the recovery sequence to prevent corruption during journal replay.
 - Many bug fixes found during recovery testing.
 - New improved file system withdraw sequence.
 - Fixed how resource group buffers are managed.
 - Fixed how metadata revokes are tracked and written.
 - Improve processing of IO errors hit by daemons like logd and quotad.
 - Improved error checking in metadata writes.
 - Fixed how qadata quota data structures are managed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE89F0ZrnZapxy/9qS14th09/3ejsFAl6Db/QACgkQ14th09/3
 ejvVTgf+IdHXfmpv3ftah8lDDpbsnSKZYRC1NW7skQB+NVG9KtJhtzy1nldaMqMv
 s8wQ5aGKrfBfmzg8IZ9Pt3dCItFqC5d8IqcO0M0FtNuyN+27ETUUMnqBf1NwL6wI
 iAm/+ncZ/BiZN2P8MgXV3OgRGvaC9ebmz860+nthwyJT+6y8d8Qab7pUfyix5e0d
 oTgDhEJqF0DOrGsrlS5rxjTU+RMixtepsAW958D4Eks28OlyduRAj6fAMDoLN2/E
 WoDpX6iKeczH0lOZxnIVQOkCztDaa0jDlK2JK7sJRBMpNxj77aUn4cffY+b/A4kk
 sR5gjsiHoesdAMEpHIXSdEcYMIstIg==
 =VEKB
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Bob Peterson:
 "We've got a lot of patches (39) for this merge window. Most of these
  patches are related to corruption that occurs when journals are
  replayed. For example:

   1. A node fails while writing to the file system.
   2. Other nodes use the metadata that was once used by the failed
      node.
   3. When the node returns to the cluster, its journal is replayed, but
      the older metadata blocks overwrite the changes from step 2.

  Summary:

   - Fixed the recovery sequence to prevent corruption during journal
     replay.

   - Many bug fixes found during recovery testing.

   - New improved file system withdraw sequence.

   - Fixed how resource group buffers are managed.

   - Fixed how metadata revokes are tracked and written.

   - Improve processing of IO errors hit by daemons like logd and
     quotad.

   - Improved error checking in metadata writes.

   - Fixed how qadata quota data structures are managed"

* tag 'gfs2-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (39 commits)
  gfs2: Fix oversight in gfs2_ail1_flush
  gfs2: change from write to read lock for sd_log_flush_lock in journal replay
  gfs2: instrumentation wrt ail1 stuck
  gfs2: don't lock sd_log_flush_lock in try_rgrp_unlink
  gfs2: Remove unnecessary gfs2_qa_{get,put} pairs
  gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_put
  gfs2: Change inode qa_data to allow multiple users
  gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_alloc
  gfs2: Switch to list_{first,last}_entry
  gfs2: Clean up inode initialization and teardown
  gfs2: Additional information when gfs2_ail1_flush withdraws
  gfs2: leaf_dealloc needs to allocate one more revoke
  gfs2: allow journal replay to hold sd_log_flush_lock
  gfs2: don't allow releasepage to free bd still used for revokes
  gfs2: flesh out delayed withdraw for gfs2_log_flush
  gfs2: Do proper error checking for go_sync family of glops functions
  gfs2: Don't demote a glock until its revokes are written
  gfs2: drain the ail2 list after io errors
  gfs2: Withdraw in gfs2_ail1_flush if write_cache_pages fails
  gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty
  ...
2020-03-31 14:16:03 -07:00
Andreas Gruenbacher 1595548fe7 gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_put
Keeping reservations and quotas separate helps reviewing the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27 14:08:04 -05:00
Bob Peterson 2fba46a04c gfs2: Change inode qa_data to allow multiple users
Before this patch, multiple users called gfs2_qa_alloc which allocated
a qadata structure to the inode, if quotas are turned on. Later, in
file close or evict, the structure was deleted with gfs2_qa_delete.
But there can be several competing processes who need access to the
structure. There were races between file close (release) and the others.
Thus, a release could delete the structure out from under a process
that relied upon its existence. For example, chown.

This patch changes the management of the qadata structures to be
a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get
and if the structure is allocated, the count essentially starts out at
1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the
last guy to decrement the count to 0 frees the memory.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27 14:08:04 -05:00
Bob Peterson d580712a37 gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_alloc
Before this patch, multiple callers called gfs2_rsqa_alloc to force
the existence of a reservations structure and a quota data structure
if needed. However, now the reservations are handled separately, so
the quota data is only the quota data. So we eliminate the one in
favor of just calling gfs2_qa_alloc directly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27 14:08:04 -05:00
Andreas Gruenbacher 40e7e86ef1 gfs2: Clean up inode initialization and teardown
When allocating a new inode, mark the iopen glock holder as uninitialized to
make sure gfs2_evict_inode won't fail after an incomplete create or lookup.  In
gfs2_evict_inode, allow the inode glock to be NULL and remove the duplicate
iopen glock teardown code.  In gfs2_inode_lookup, don't tear down things that
gfs2_evict_inode will already tear down.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27 14:08:04 -05:00
Al Viro 2103913265 gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
with the way fs/namei.c:do_last() had been done, ->atomic_open()
instances needed to recognize the case when existing file got
found with O_EXCL|O_CREAT, either by falling back to finish_no_open()
or failing themselves.  gfs2 one didn't.

Fixes: 6d4ade986f (GFS2: Add atomic_open support)
Cc: stable@kernel.org # v3.11
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-12 18:21:24 -04:00
Andreas Gruenbacher 2b0fb353c0 gfs2: Avoid access time thrashing in gfs2_inode_lookup
In gfs2_inode_lookup, we initialize inode->i_atime to the lowest
possibly value after gfs2_inode_refresh may already have been called.
This should be the other way around, but we didn't notice because
usually the inode type is known from the directory entry and so
gfs2_inode_lookup won't call gfs2_inode_refresh.

In addition, only initialize ip->i_no_formal_ino from no_formal_ino when
actually needed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-01-15 15:20:07 +01:00
Andreas Gruenbacher 8f81180ac1 gfs2: Remove duplicate call from gfs2_create_inode
In gfs2_create_inode, gfs2_set_inode_blocks is called twice for no good reason.
Remove the unnecessary call.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-11-21 11:37:12 +01:00
Bob Peterson 2c47c1be51 gfs2: clean up iopen glock mess in gfs2_create_inode
Before this patch, gfs2_create_inode had a use-after-free for the
iopen glock in some error paths because it did this:

	gfs2_glock_put(io_gl);
fail_gunlock2:
	if (io_gl)
		clear_bit(GLF_INODE_CREATING, &io_gl->gl_flags);

In some cases, the io_gl was used for create and only had one
reference, so the glock might be freed before the clear_bit().
This patch tries to straighten it out by only jumping to the
error paths where iopen is properly set, and moving the
gfs2_glock_put after the clear_bit.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-11-19 21:02:01 +01:00
Aliasgar Surti 098b9c1453 gfs2: removed unnecessary semicolon
There is use of unnecessary semicolon after switch case.
Removed the semicolon.

Signed-off-by: Aliasgar Surti <aliasgar.surti500@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-10-30 12:17:04 +01:00
Bob Peterson ad26967b9a gfs2: Use async glocks for rename
Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can
reverse the roles of which directories are "old" and which are "new" for
the purposes of rename. This can cause deadlocks where two nodes end up
waiting for each other.

There can be several layers of directory dependencies across many nodes.

This patch fixes the problem by acquiring all gfs2_rename's inode glocks
asychronously and waiting for all glocks to be acquired.  That way all
inodes are locked regardless of the order.

The timeout value for multiple asynchronous glocks is calculated to be
the total of the individual wait times for each glock times two.

Since gfs2_exchange is very similar to gfs2_rename, both functions are
patched in the same way.

A new async glock wait queue, sd_async_glock_wait, keeps a list of
waiters for these events. If gfs2's holder_wake function detects an
async holder, it wakes up any waiters for the event. The waiter only
tests whether any of its requests are still pending.

Since the glocks are sent to dlm asychronously, the wait function needs
to check to see which glocks, if any, were granted.

If a glock is granted by dlm (and therefore held), its minimum hold time
is checked and adjusted as necessary, as other glock grants do.

If the event times out, all glocks held thus far must be dequeued to
resolve any existing deadlocks.  Then, if there are any outstanding
locking requests, we need to loop around and wait for dlm to respond to
those requests too.  After we release all requests, we return -ESTALE to
the caller (vfs rename) which loops around and retries the request.

    Node1           Node2
    ---------       ---------
1.  Enqueue A       Enqueue B
2.  Enqueue B       Enqueue A
3.  A granted
6.                  B granted
7.  Wait for B
8.                  Wait for A
9.                  A times out (since Node 1 holds A)
10.                 Dequeue B (since it was granted)
11.                 Wait for all requests from DLM
12. B Granted (since Node2 released it in step 10)
13. Rename
14. Dequeue A
15.                 DLM Grants A
16.                 Dequeue A (due to the timeout and since we
                    no longer have B held for our task).
17. Dequeue B
18.                 Return -ESTALE to vfs
19.                 VFS retries the operation, goto step 1.

This release-all-locks / acquire-all-locks may slow rename / exchange
down as both nodes struggle in the same way and do the same thing.
However, this will only happen when there is contention for the same
inodes, which ought to be rare.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-09-04 20:22:17 +02:00
Bob Peterson bc74aaefdd gfs2: separate holder for rgrps in gfs2_rename
Before this patch, gfs2_rename added a holder for the rgrp glock to
its array of holders, ghs. There's nothing wrong with that, but this
patch separates it into a separate holder. This is done to ensure
it's always locked last as per the proper glock lock ordering,
and also to pave the way for a future patch in which we will
lock the non-rgrp glocks asynchronously.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-09-04 20:22:17 +02:00
Kefeng Wang 15a798f7de gfs2: Use IS_ERR_OR_NULL
Use IS_ERR_OR_NULL where appropriate.

(Several more places converted by Andreas.)

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 20:53:46 +02:00
Thomas Gleixner 7336d0e654 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398
Based on 1 normalized pattern(s):

  this copyrighted material is made available to anyone wishing to use
  modify copy or redistribute it subject to the terms and conditions
  of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:12 +02:00
Andreas Gruenbacher 6ff9b09e00 gfs2: Get rid of potential double-freeing in gfs2_create_inode
In gfs2_create_inode, after setting and releasing the acl / default_acl, the
acl / default_acl pointers are not set to NULL as they should be.  In that
state, when the function reaches label fail_free_acls, gfs2_create_inode will
try to release the same acls again.

Fix that by setting the pointers to NULL after releasing the acls.  Slightly
simplify the logic.  Also, posix_acl_release checks for NULL already, so
there is no need to duplicate those checks here.

Fixes: e01580bf9e ("gfs2: use generic posix ACL infrastructure")
Reported-by: Pan Bian <bianpan2016@163.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-12-11 21:44:29 +01:00
Al Viro 44907d7900 get rid of 'opened' argument of ->atomic_open() - part 3
now it can be done...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:20 -04:00
Al Viro b452a458ca getting rid of 'opened' argument of ->atomic_open() - part 2
__gfs2_lookup(), gfs2_create_inode(), nfs_finish_open() and fuse_create_open()
don't need 'opened' anymore.  Get rid of that argument in those.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:20 -04:00
Al Viro be12af3ef5 getting rid of 'opened' argument of ->atomic_open() - part 1
'opened' argument of finish_open() is unused.  Kill it.

Signed-off-by Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:19 -04:00
Al Viro 73a09dd943 introduce FMODE_CREATED and switch to it
Parallel to FILE_CREATED, goes into ->f_mode instead of *opened.
NFS is a bit of a wart here - it doesn't have file at the point
where FILE_CREATED used to be set, so we need to propagate it
there (for now).  IMA is another one (here and everywhere)...

Note that this needs do_dentry_open() to leave old bits in ->f_mode
alone - we want it to preserve FMODE_CREATED if it had been already
set (no other bit can be there).

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:18 -04:00
Al Viro aad888f828 switch all remaining checks for FILE_OPENED to FMODE_OPENED
... and don't bother with setting FILE_OPENED at all.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:18 -04:00
Andreas Gruenbacher 628e366df1 gfs2: Iomap cleanups and improvements
Clean up gfs2_iomap_alloc and gfs2_iomap_get.  Document how
gfs2_iomap_alloc works: it now needs to be called separately after
gfs2_iomap_get where necessary; this will be used later by iomap write.
Move gfs2_iomap_ops into bmap.c.

Introduce a new gfs2_iomap_get_alloc helper and use it in
fallocate_chunk: gfs2_iomap_begin will become unsuitable for fallocate
with proper iomap write support.

In gfs2_block_map and fallocate_chunk, zero-initialize struct iomap.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-04 07:56:51 -05:00
Andreas Gruenbacher 83998ccd9b gfs2: Dirty source inode during rename
Mark the source inode dirty during a rename instead of just updating the
underlying buffer head.  Otherwise, fsync may find the inode clean and
will then skip flushing the journal.  A subsequent power failure will
cause the rename to be lost.  This happens in command sequences like:

  xfs_io -f -c 'pwrite 0 4096' -c 'fsync' foo
  mv foo bar
  xfs_io -c 'fsync' bar
  # power failure

Fixes xfstests generic/322, generic/376.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-08 09:26:20 -07:00
Bob Peterson 2eb5909dee GFS2: Don't try to end a non-existent transaction in unlink
Before this patch, if function gfs2_unlink failed to get a valid
transaction (for example, not enough journal blocks) it would go
to label out_end_trans which did gfs2_trans_end. But if the
trans_begin failed, there's no transaction to end, and trying to
do so results in: kernel BUG at fs/gfs2/trans.c:117!

This patch changes the goto so that it does not try to end a
non-existent transaction.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-29 10:00:23 -07:00
Andreas Gruenbacher 235628c5c7 gfs2: Add gfs2_max_stuffed_size
Add a small inline function for computing the maximum size of a stuffed
inode instead of open coding that in several places throughout the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 14:18:53 -07:00
Andreas Gruenbacher b2623c2fe6 gfs2: Add support for statx inode flags
Add support for the STATX_ATTR_ flags in statx.  (Compression,
encryption, and the nodump flag are not supported by gfs2.)

Partially fixes xfstest generic/424.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:58 +01:00
Andreas Gruenbacher 3a27411cb4 gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap
So far, lseek on gfs2 did not report holes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:35 +01:00
Bob Peterson aac1a55b45 GFS2: Switch fiemap implementation to use iomap
This patch switches GFS2's implementation of fiemap from the old
block_map code to the new iomap interface.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31 14:26:34 +01:00
Andreas Gruenbacher 38eedf2841 gfs2: Support negative atimes
When inodes are read from disk, GFS2 will only update in-memory atimes
older than the on-disk atimes; this prevents atimes from going
backwards.  The atimes of newly allocated inodes are initialized to 0.
This means that when an atime is explicitly set to a negative value,
this value will not persist.

Fix by setting the atime of newly allocated inodes to the lowest
possible value instead of 0.

Fixes xfstest generic/258.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:19 -05:00
Andreas Gruenbacher 61b91cfdc6 gfs2: Fix trivial typos
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Bob Peterson 9c1b28081f GFS2: Clear gl_object if gfs2_create_inode fails
If function gfs2_create_inode fails after the inode has been
created (for example, if the inode_refresh fails for some reason)
the function was setting gl_object but never clearing it again.
The glocks are left pointing to a freed inode. This patch adds
the calls to clear gl_object in the appropriate error paths.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-08-09 09:36:26 -05:00