68445 Commits (1e249cb5b7fc09ff216aa5a12f6c302e434e88f9)

Author SHA1 Message Date
Eric Biggers 1e249cb5b7 fs: fix lazytime expiration handling in __writeback_single_inode()
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.

This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).

However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state.  This causes two bugs:

- mark_inode_dirty_sync() redirties the inode, causing it to remain
  dirty.  This wastefully causes the inode to be written twice.  But
  more importantly, it breaks cases where sync_filesystem() is expected
  to clean dirty inodes.  This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
  ioctl (as reported at
  https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
  as possibly filesystem freezing (freeze_super()).

- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
  called from __writeback_single_inode() for lazytime expiration,
  xfs_fs_dirty_inode() ignores the notification.  (XFS only cares about
  lazytime expirations, and it assumes that i_state will contain
  I_DIRTY_TIME during those.)  Therefore, lazy timestamps aren't
  persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.

Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state.  This ensures that filesystems are properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
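
The ordering problem can be illustrated with a small user-space model.
This is only a sketch, not kernel code: struct toy_inode, the TOY_* flag
values, and the toy_* functions are hypothetical stand-ins for ->i_state,
->dirty_inode(), and __writeback_single_inode().  It assumes an XFS-like
filesystem callback that acts only when it can still see the lazytime flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag values; the real kernel constants differ. */
enum {
	TOY_DIRTY_SYNC = 1u << 0,
	TOY_DIRTY_TIME = 1u << 1,
};

struct toy_inode {
	unsigned int state;		/* stands in for ->i_state */
	bool timestamps_persisted;	/* set when the "filesystem" copies timestamps out */
	int writes;			/* number of writeback passes */
};

/* XFS-like callback: only reacts if the lazytime flag is still visible. */
static void toy_dirty_inode(struct toy_inode *inode)
{
	if (inode->state & TOY_DIRTY_TIME)
		inode->timestamps_persisted = true;
}

/* Old ordering: clear the dirty flags first, then notify and redirty. */
static void toy_writeback_old(struct toy_inode *inode)
{
	unsigned int dirty = inode->state;

	inode->state = 0;
	if (dirty & TOY_DIRTY_TIME) {
		toy_dirty_inode(inode);		/* flag already gone */
		inode->state |= TOY_DIRTY_SYNC;	/* inode ends up dirty again */
	}
	inode->writes++;
}

/* New ordering: notify while the flag is still set, then clear. */
static void toy_writeback_new(struct toy_inode *inode)
{
	if (inode->state & TOY_DIRTY_TIME) {
		toy_dirty_inode(inode);
		inode->state |= TOY_DIRTY_SYNC;
	}
	inode->state = 0;
	inode->writes++;
}
```

In this model, toy_writeback_old() leaves the inode dirty with its
timestamps not persisted, while toy_writeback_new() leaves it clean with
the timestamps persisted.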

This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled.  It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).

Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly.  But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.

Fixes: 0ae45f63d4 ("vfs: add support for a lazytime mount option")
Cc: stable@vger.kernel.org
Depends-on: 5afced3bf2 ("writeback: Avoid skipping inode writeback")
Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-01-13 17:26:21 +01:00
Linus Torvalds e609571b5f NFS client bugfixes for Linux 5.11
Highlights include:
 
 Bugfixes:
 - Fix parsing of link-local IPv6 addresses
 - Fix confusing logging of mount errors that was introduced by the
   fsopen() patchset.
 - Fix a tracing use after free in _nfs4_do_setlk()
 - Layout return-on-close fixes when called from nfs4_evict_inode()
 - Layout segments were being leaked in pnfs_generic_clear_request_commit()
 - Don't leak DS commits in pnfs_generic_retry_commit()
 - Fix an Oopsable use-after-free when nfs_delegation_find_inode_server()
   calls iput() on an inode after the super block has gone away.

Merge tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Highlights include:

   - Fix parsing of link-local IPv6 addresses

   - Fix confusing logging of mount errors that was introduced by the
     fsopen() patchset.

   - Fix a tracing use after free in _nfs4_do_setlk()

   - Layout return-on-close fixes when called from nfs4_evict_inode()

   - Layout segments were being leaked in
     pnfs_generic_clear_request_commit()

   - Don't leak DS commits in pnfs_generic_retry_commit()

   - Fix an Oopsable use-after-free when nfs_delegation_find_inode_server()
     calls iput() on an inode after the super block has gone away"

* tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: nfs_igrab_and_active must first reference the superblock
  NFS: nfs_delegation_find_inode_server must first reference the superblock
  NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter
  NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit()
  NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the request
  pNFS: Stricter ordering of layoutget and layoutreturn
  pNFS: Clean up pnfs_layoutreturn_free_lsegs()
  pNFS: We want return-on-close to complete when evicting the inode
  pNFS: Mark layout for return if return-on-close was not sent
  net: sunrpc: interpret the return value of kstrtou32 correctly
  NFS: Adjust fs_context error logging
  NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
2021-01-12 09:38:53 -08:00
Linus Torvalds 6e68b9961f for-5.11-rc3-tag

Merge tag 'for-5.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "More material for stable trees.

   - tree-checker: check item end overflow

   - fix false warning during relocation regarding extent type

   - fix inode flushing logic, caused notable performance regression
     (since 5.10)

   - debugging fixups:
      - print correct offset for reloc tree key
      - pass reliable fs_info pointer to error reporting helper"

* tag 'for-5.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: shrink delalloc pages instead of full inodes
  btrfs: reloc: fix wrong file extent type check to avoid false ENOENT
  btrfs: tree-checker: check if chunk item end overflows
  btrfs: prevent NULL pointer dereference in extent_io_tree_panic
  btrfs: print the actual offset in btrfs_root_name
2021-01-11 14:18:56 -08:00
Linus Torvalds c912fd05fa Fixes:
- Fix major TCP performance regression
 - Get NFSv4.2 READ_PLUS regression tests to pass
 - Improve NFSv4 COMPOUND memory allocation
 - Fix sparse warning

Merge tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6

Pull nfsd fixes from Chuck Lever:

 - Fix major TCP performance regression

 - Get NFSv4.2 READ_PLUS regression tests to pass

 - Improve NFSv4 COMPOUND memory allocation

 - Fix sparse warning

* tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6:
  NFSD: Restore NFSv4 decoding's SAVEMEM functionality
  SUNRPC: Handle TCP socket sends with kernel_sendpage() again
  NFSD: Fix sparse warning in nfssvc.c
  nfsd: Don't set eof on a truncated READ_PLUS
  nfsd: Fixes for nfsd4_encode_read_plus_data()
2021-01-11 11:35:46 -08:00
Trond Myklebust 896567ee7f NFS: nfs_igrab_and_active must first reference the superblock
Before referencing the inode, we must ensure that the superblock can be
referenced. Otherwise, we can end up with iput() calling superblock
operations that are no longer valid or accessible.

Fixes: ea7c38fef0 ("NFSv4: Ensure we reference the inode for return-on-close in delegreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 16:29:28 -05:00
Trond Myklebust 113aac6d56 NFS: nfs_delegation_find_inode_server must first reference the superblock
Before referencing the inode, we must ensure that the superblock can be
referenced. Otherwise, we can end up with iput() calling superblock
operations that are no longer valid or accessible.

Fixes: e39d8a186e ("NFSv4: Fix an Oops during delegation callbacks")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 16:29:28 -05:00
Linus Torvalds ed41fd071c block-5.11-2021-01-10

Merge tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Missing CRC32 selections (Arnd)

 - Fix for a merge window regression with bdev inode init (Christoph)

 - bcache fixes

 - rnbd fixes

 - NVMe pull request from Christoph:
    - fix a race in the nvme-tcp send code (Sagi Grimberg)
    - fix a list corruption in an nvme-rdma error path (Israel Rukshin)
    - avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
    - add the subsystem NQN quirk for a Samsung driver (Gopal Tiwari)
    - fix two compiler warnings in nvme-fcloop (James Smart)
    - don't call sleeping functions from irq context in nvme-fc (James Smart)
    - remove an unused argument (Max Gurtovoy)
    - remove unused exports (Minwoo Im)

 - Use-after-free fix for partition iteration (Ming)

 - Missing blk-mq debugfs flag annotation (John)

 - Bdev freeze regression fix (Satya)

 - blk-iocost NULL pointer deref fix (Tejun)

* tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block: (26 commits)
  bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET
  bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket
  bcache: check unsupported feature sets for bcache register
  bcache: fix typo from SUUP to SUPP in features.h
  bcache: set pdev_set_uuid before scond loop iteration
  blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED
  block/rnbd-clt: avoid module unload race with close confirmation
  block/rnbd: Adding name to the Contributors List
  block/rnbd-clt: Fix sg table use after free
  block/rnbd-srv: Fix use after free in rnbd_srv_sess_dev_force_close
  block/rnbd: Select SG_POOL for RNBD_CLIENT
  block: pre-initialize struct block_device in bdev_alloc_inode
  fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb
  nvme: remove the unused status argument from nvme_trace_bio_complete
  nvmet-rdma: Fix list_del corruption on queue establishment failure
  nvme: unexport functions with no external caller
  nvme: avoid possible double fetch in handling CQE
  nvme-tcp: Fix possible race of io_work and direct send
  nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
  nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
  ...
2021-01-10 12:53:08 -08:00
Linus Torvalds d430adfea8 io_uring-5.11-2021-01-10

Merge tag 'io_uring-5.11-2021-01-10' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A bit larger than I had hoped at this point, but it's all changes that
  will be directed towards stable anyway. In detail:

   - Fix a merge window regression on error return (Matthew)

   - Remove useless variable declaration/assignment (Ye Bin)

   - IOPOLL fixes (Pavel)

   - Exit and cancelation fixes (Pavel)

   - fasync lockdep complaint fix (Pavel)

   - Ensure SQPOLL is synchronized with creator life time (Pavel)"

* tag 'io_uring-5.11-2021-01-10' of git://git.kernel.dk/linux-block:
  io_uring: stop SQPOLL submit on creator's death
  io_uring: add warn_once for io_uring_flush()
  io_uring: inline io_uring_attempt_task_drop()
  io_uring: io_rw_reissue lockdep annotations
  io_uring: synchronise ev_posted() with waitqueues
  io_uring: dont kill fasync under completion_lock
  io_uring: trigger eventfd for IOPOLL
  io_uring: Fix return value from alloc_fixed_file_ref_node
  io_uring: Delete useless variable ‘id’ in io_prep_async_work
  io_uring: cancel more aggressively in exit_work
  io_uring: drop file refs after task cancel
  io_uring: patch up IOPOLL overflow_flush sync
  io_uring: synchronise IOPOLL on task_submit fail
2021-01-10 12:39:38 -08:00
Linus Torvalds a440e4d761 - A fix for fanotify_mark() missing the conversion of x86_32 native
syscalls which take 64-bit arguments to the compat handlers due to
 the former having a general compat handler. (Brian Gerst)
 
 - Add a forgotten pmd page destructor call to pud_free_pmd_page() where
 a pmd page is freed. (Dan Williams)
 
 - Make IN/OUT insns with an u8 immediate port operand handling for
 SEV-ES guests more precise by using only the single port byte and not
 the whole s32 value of the insn decoder. (Peter Gonda)
 
 - Correct a straddling end range check before returning the proper MTRR
 type, when the end address is the same as top of memory. (Ying-Tsun
 Huang)
 
 - Change PQR_ASSOC MSR update scheme when moving a task to a resctrl
 resource group to avoid significant performance overhead with some
 resctrl workloads. (Fenghua Yu)
 
 - Avoid the actual task move overhead when the task is already in the
 resource group. (Fenghua Yu)

Merge tag 'x86_urgent_for_v5.11_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "As expected, fixes started trickling in after the holidays so here is
  the accumulated pile of x86 fixes for 5.11:

   - A fix for fanotify_mark() missing the conversion of x86_32 native
     syscalls which take 64-bit arguments to the compat handlers due to
     the former having a general compat handler. (Brian Gerst)

   - Add a forgotten pmd page destructor call to pud_free_pmd_page()
     where a pmd page is freed. (Dan Williams)

   - Make IN/OUT insns with an u8 immediate port operand handling for
     SEV-ES guests more precise by using only the single port byte and
     not the whole s32 value of the insn decoder. (Peter Gonda)

   - Correct a straddling end range check before returning the proper
     MTRR type, when the end address is the same as top of memory.
     (Ying-Tsun Huang)

   - Change PQR_ASSOC MSR update scheme when moving a task to a resctrl
     resource group to avoid significant performance overhead with some
     resctrl workloads. (Fenghua Yu)

   - Avoid the actual task move overhead when the task is already in the
     resource group. (Fenghua Yu)"
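
The SEV-ES IN/OUT item above concerns how a one-byte immediate port
operand is read back from a wider signed field.  A rough user-space
sketch of why that matters follows.  The helper names are hypothetical,
and it assumes the decoder stores the 8-bit immediate sign-extended into
a signed 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed decoder behaviour: an imm8 is kept sign-extended in an s32. */
static int32_t decode_imm8_as_s32(uint8_t raw)
{
	return (int32_t)(int8_t)raw;
}

/* Using the whole 32-bit value as the port number. */
static uint32_t port_from_whole_value(int32_t imm)
{
	return (uint32_t)imm;
}

/* Using only the single port byte. */
static uint32_t port_from_single_byte(int32_t imm)
{
	return (uint8_t)imm;
}
```

For a port byte of 0x80, the whole-value variant yields 0xffffff80 while
the single-byte variant yields 0x80, which is the intended port.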

* tag 'x86_urgent_for_v5.11_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Don't move a task to the same resource group
  x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
  x86/mtrr: Correct the range check before performing MTRR type lookups
  x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling
  x86/mm: Fix leak of pmd ptlock
  fanotify: Fix sys_fanotify_mark() on native x86-32
2021-01-10 11:31:17 -08:00
Trond Myklebust cb2856c597 NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter
If we exit _lgopen_prepare_attached() without setting a layout, we will
currently leak the plh_outstanding counter.

Fixes: 411ae722d1 ("pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:52 -05:00
Trond Myklebust 46c9ea1d4f NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit()
We must ensure that we pass a layout segment to nfs_retry_commit() when
we're cleaning up after pnfs_bucket_alloc_ds_commits(). Otherwise,
requests that should be committed to the DS will get committed to the
MDS.
Do so by ensuring that pnfs_bucket_get_committing() always tries to
return a layout segment when it returns a non-empty page list.

Fixes: c84bea5944 ("NFS/pNFS: Simplify bucket layout segment reference counting")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:52 -05:00
Trond Myklebust 1757655d78 NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the request
In pnfs_generic_clear_request_commit(), we try calling
pnfs_free_bucket_lseg() before we remove the request from the DS bucket.
That will always fail, since the point is to test for whether or not
that bucket is empty.

Fixes: c84bea5944 ("NFS/pNFS: Simplify bucket layout segment reference counting")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:52 -05:00
Trond Myklebust 2c8d5fc37f pNFS: Stricter ordering of layoutget and layoutreturn
If a layout return is in progress, we should wait for it to complete,
in case the layout segment we are picking up gets returned too.

Fixes: 30cb3ee299 ("pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:52 -05:00
Trond Myklebust c18d1e17ba pNFS: Clean up pnfs_layoutreturn_free_lsegs()
Remove the check for whether or not the stateid is NULL, and fix up the
callers.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:52 -05:00
Trond Myklebust 078000d02d pNFS: We want return-on-close to complete when evicting the inode
If the inode is being evicted, it should be safe to run return-on-close,
so we should do it to ensure we don't inadvertently leak layout segments.

Fixes: 1c5bd76d17 ("pNFS: Enable layoutreturn operation for return-on-close")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:51 -05:00
Trond Myklebust 67bbceedc9 pNFS: Mark layout for return if return-on-close was not sent
If the layout return-on-close failed because the layoutreturn was never
sent, then we should mark the layout for return again.

Fixes: 9c47b18cf7 ("pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:51 -05:00
Scott Mayhew c98e9daa59 NFS: Adjust fs_context error logging
Several existing dprintk()/dfprintk() calls were converted to use the new
mount API logging macros by commit ce8866f091 ("NFS: Attach
supplementary error information to fs_context").  If the fs_context was
not created using fsopen() then it will not have had a log buffer
allocated for it, and the new mount API logging macros will wind up
calling printk().

This can result in syslog messages being logged where previously there
were none... most notably "NFS4: Couldn't follow remote path", which can
happen if the client is auto-negotiating a protocol version with an NFS
server that doesn't support the higher v4.x versions.

Convert the nfs_errorf(), nfs_invalf(), and nfs_warnf() macros to check
for the existence of the fs_context's log buffer and call dprintk() if
it doesn't exist.  Add nfs_ferrorf(), nfs_finvalf(), and nfs_fwarnf(),
which do the same thing but take an NFS debug flag as an argument and
call dfprintk().  Finally, modify the "NFS4: Couldn't follow remote
path" message to use nfs_ferrorf().
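
The fallback logic described above can be sketched in user space as
follows.  This is an illustration only: struct toy_fc, toy_errorf(), and
the toy_debug_* variables are hypothetical and do not mirror the real
fs_context or NFS macro definitions.  The point is that a context without
a log buffer routes the message to a debug path that stays silent unless
debugging is enabled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* log == NULL models a context that was not created through fsopen(). */
struct toy_fc {
	char *log;
	size_t log_size;
};

static int toy_debug_enabled;	/* stands in for an NFS debug flag */
static int toy_debug_lines;	/* counts messages sent to the debug path */

static void toy_errorf(struct toy_fc *fc, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	if (fc->log) {
		/* A log buffer exists: record the message there. */
		vsnprintf(fc->log, fc->log_size, fmt, ap);
	} else if (toy_debug_enabled) {
		/* No log buffer: behave like a debug print, not printk(). */
		vfprintf(stderr, fmt, ap);
		toy_debug_lines++;
	}
	va_end(ap);
}
```

With debugging disabled and no log buffer, toy_errorf() emits nothing,
which matches the goal of not adding syslog noise for legacy mounts.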

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207385
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: ce8866f091 ("NFS: Attach supplementary error information to fs_context.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:39 -05:00
Pavel Begunkov d9d05217cb io_uring: stop SQPOLL submit on creator's death
When the creator of an SQPOLL io_uring dies (i.e. sqo_task), we don't want
its internals like ->files and ->mm to be poked by the SQPOLL task. That
has never been nice and recently became racy. It can happen when the
owner undergoes destruction and the SQPOLL task tries to submit new
requests in parallel, and so calls io_sq_thread_acquire*().

This patch halts SQPOLL submissions when sqo_task dies by introducing
an sqo_dead flag. Once set, the SQPOLL task must not do any submission,
which is synchronised by uring_lock as well as the new flag.

The tricky part is to make sure that disabling always happens: either
the ring is discovered by the creator's do_exit() -> cancel, or, if the
final close() happens before that, it is done by the creator. The latter
is guaranteed by the fact that for SQPOLL the creator task, and only it,
holds exactly one file note, so it is either pinned up to do_exit() or
removed by the creator on the final put in flush (see comments in
uring_flush() around file->f_count == 2).

One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Shoot off requests on sqo_dead there, even
though actually we don't need to. That's because cancellation of
sqo_task should wait for the request before going any further.

note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
caller would enter the ring to get an error, but it still doesn't
guarantee that the flag won't be cleared.

note 2: if final __userspace__ close happens not from the creator
task, the file note will pin the ring until the task dies.

Fixes: b1b6b5a30d ("kernel/io_uring: cancel io_uring before task works")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-09 09:21:43 -07:00
Pavel Begunkov 6b5733eb63 io_uring: add warn_once for io_uring_flush()
files_cancel() should cancel all relevant requests and drop file notes,
so we should never have file notes after that, including on-exit fput
and flush. Add a WARN_ONCE to be sure.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-09 09:21:43 -07:00
Pavel Begunkov 4f793dc40b io_uring: inline io_uring_attempt_task_drop()
A simple preparation change inlining io_uring_attempt_task_drop() into
io_uring_flush().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-09 09:21:43 -07:00
Pavel Begunkov 55e6ac1e1f io_uring: io_rw_reissue lockdep annotations
We expect io_rw_reissue() to take place only during submission with
uring_lock held. Add a lockdep annotation to check that invariant.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-09 09:21:43 -07:00
Linus Torvalds 996e435fd4 zonefs fixes for 5.11-rc3
A single patch from Arnd in this pull request to fix a missing
 dependency in zonefs Kconfig.
 
 Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Merge tag 'zonefs-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fix from Damien Le Moal:
 "A single patch from Arnd to fix a missing dependency in zonefs
  Kconfig"

* tag 'zonefs-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: select CONFIG_CRC32
2021-01-08 18:04:43 -08:00
Linus Torvalds ef0ba05538 poll: fix performance regression due to out-of-line __put_user()
The kernel test robot reported a -5.8% performance regression on the
"poll2" test of will-it-scale, and bisected it to commit d55564cfc2
("x86: Make __put_user() generate an out-of-line call").

I didn't expect an out-of-line __put_user() to matter, because no normal
core code should use that non-checking legacy version of user access any
more.  But I had overlooked the very odd poll() usage, which does a
__put_user() to update the 'revents' values of the poll array.

Now, Al Viro correctly points out that instead of updating just the
'revents' field, it would be much simpler to just copy the _whole_
pollfd entry, and then we could just use "copy_to_user()" on the whole
array of entries, the same way we use "copy_from_user()" a few lines
earlier to get the original values.

But that is not what we've traditionally done, and I worry that threaded
applications might be concurrently modifying the other fields of the
pollfd array.  So while Al's suggestion is simpler - and perhaps worth
trying in the future - this instead keeps the "just update revents"
model.

To fix the performance regression, use the modern "unsafe_put_user()"
instead of __put_user(), with the proper "user_write_access_begin()"
guarding in place. This improves code generation enormously.
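
The difference between the two write-back models discussed above can be
shown with a small user-space sketch.  It is not the kernel
implementation and does not use unsafe_put_user(); the two helper
functions are hypothetical, and the "concurrent modification" is
simulated by changing the user array between the copy-in and the
write-back.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>

/* "Just update revents": other fields of the user array are left alone. */
static void writeback_revents_only(struct pollfd *user,
				   const struct pollfd *kernel, size_t n)
{
	for (size_t i = 0; i < n; i++)
		user[i].revents = kernel[i].revents;
}

/* Whole-entry copy: simpler, but overwrites fd and events as well. */
static void writeback_whole_entries(struct pollfd *user,
				    const struct pollfd *kernel, size_t n)
{
	memcpy(user, kernel, n * sizeof(*user));
}
```

If another thread changes events after the kernel took its copy, the
revents-only variant preserves that change, whereas the whole-entry copy
replaces it with the stale value.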

Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Laight <David.Laight@aculab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-08 11:06:29 -08:00
Josef Bacik e076ab2a2c btrfs: shrink delalloc pages instead of full inodes
Commit 38d715f494 ("btrfs: use btrfs_start_delalloc_roots in
shrink_delalloc") cleaned up how we do delalloc shrinking by utilizing
some infrastructure we have in place to flush inodes that we use for
device replace and snapshot.  However this introduced a pretty serious
performance regression.  To reproduce, the user untarred the source
tarball of Firefox (360MiB xz compressed/1.5GiB uncompressed) and saw
it take anywhere from 5 to 20 times as long to untar in 5.10
compared to 5.9. This was observed on fast devices (SSD and better) and
not on HDD.

The root cause is that previously we would generally use the normal
writeback path to reclaim delalloc space, and for this we would provide
it with the number of pages we wanted to flush.  The referenced commit
changed this to flush that many inodes, which drastically increased the
amount of space we were flushing in certain cases, which severely
affected performance.
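
A toy model of the two budgeting schemes makes the difference concrete.
This is a user-space sketch under simplifying assumptions, not btrfs
code: each array element is the number of dirty pages held by one
hypothetical inode, and both functions return how many pages they wrote.

```c
#include <assert.h>
#include <stddef.h>

/* Budget counted in inodes: every selected inode is flushed completely. */
static long flush_nr_inodes(long *dirty_pages, size_t n_inodes, long nr)
{
	long written = 0;

	for (size_t i = 0; i < n_inodes && nr > 0; i++) {
		if (dirty_pages[i] == 0)
			continue;
		written += dirty_pages[i];
		dirty_pages[i] = 0;
		nr--;
	}
	return written;
}

/* Budget counted in pages: stop as soon as nr pages have been written. */
static long flush_nr_pages(long *dirty_pages, size_t n_inodes, long nr)
{
	long written = 0;

	for (size_t i = 0; i < n_inodes && nr > 0; i++) {
		long chunk = dirty_pages[i] < nr ? dirty_pages[i] : nr;

		dirty_pages[i] -= chunk;
		written += chunk;
		nr -= chunk;
	}
	return written;
}
```

With four inodes of 1000 dirty pages each and a request of 2, the
inode-based variant writes 2000 pages while the page-based variant
writes 2.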

We cannot revert this patch unfortunately because of 3d45f221ce
("btrfs: fix deadlock when cloning inline extent and low on free
metadata space") which requires the ability to skip flushing inodes that
are being cloned in certain scenarios, which means we need to keep using
our flushing infrastructure or risk re-introducing the deadlock.

Instead, to fix this problem we can go back to providing
btrfs_start_delalloc_roots with a number of pages to flush, and then set
up a writeback_control and utilize sync_inode() to handle the flushing
for us.  This gives us the same behavior we had prior to the fix, while
still allowing us to avoid the deadlock that was fixed by Filipe.  I
redid the user's original test and got the following results on one of
our test machines (256GiB of RAM, 56 cores, 2TiB Intel NVMe drive):

  5.9           0m54.258s
  5.10          1m26.212s
  5.10+patch    0m38.800s

5.10+patch is significantly faster than plain 5.9 because of my patch
series "Change data reservations to use the ticketing infra" which
contained the patch that introduced the regression, but generally
improved the overall ENOSPC flushing mechanisms.

Additional testing on a consumer-grade SSD (8GiB RAM, 8 CPUs) confirms
the results:

  5.10.5            4m00s
  5.10.5+patch      1m08s
  5.11-rc2          5m14s
  5.11-rc2+patch    1m30s

Reported-by: René Rebe <rene@exactcode.de>
Fixes: 38d715f494 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
CC: stable@vger.kernel.org # 5.10
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: David Sterba <dsterba@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add my test results ]
Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-08 16:36:44 +01:00
Christoph Hellwig 2d2f6f1b47 block: pre-initialize struct block_device in bdev_alloc_inode
bdev_evict_inode and bdev_free_inode are also called for the root inode
of bdevfs, for which bdev_alloc is never called.  Move the zeroing of
struct block_device and the initialization of the bd_bdi field into
bdev_alloc_inode to make sure they are initialized for the root inode
as well.

Fixes: e6cb53827e ("block: initialize struct block_device in bdev_alloc")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-07 20:57:53 -07:00
Satya Tangirala 04a6a536bc fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb
freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
bd_fsfreeze_sb.

But this means a freeze_bdev() call followed by a thaw_bdev() call can
leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
zero. If freeze_bdev() is called again, and this time
get_active_super() returns NULL (e.g. because the FS is unmounted),
we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
*untouched* - it stays the same (now garbage) value. A subsequent
thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
(since bd_fsfreeze_count > 0), and attempt to use it.

Fix this by always setting bd_fsfreeze_sb to NULL when
bd_fsfreeze_count is successfully decremented to 0 in thaw_sb().
Alternatively, we could set bd_fsfreeze_sb to whatever
get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
is successfully incremented to 1 from 0 (which can be achieved cleanly
by moving the line currently setting bd_fsfreeze_sb to immediately
after the "sync:" label, but it might be a little too subtle/easily
overlooked in future).
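
As a rough illustration of the accounting described above, the following
userspace sketch models the two fields with a toy struct.  The names
bdev_model, sb_model, model_freeze() and model_thaw() are hypothetical and
this is not the kernel code; the clear_sb flag exists only to contrast the
behaviour without and with the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sb_model { int id; };              /* stand-in for struct super_block */

struct bdev_model {
	int fsfreeze_count;               /* models bd_fsfreeze_count */
	struct sb_model *fsfreeze_sb;     /* models bd_fsfreeze_sb */
};

/* active_sb models what get_active_super() returned (NULL if unmounted). */
static void model_freeze(struct bdev_model *b, struct sb_model *active_sb)
{
	if (++b->fsfreeze_count > 1)
		return;                   /* nested freeze: nothing else to do */
	if (active_sb)
		b->fsfreeze_sb = active_sb;
	/* with no active superblock, fsfreeze_sb is left untouched */
}

/* clear_sb == true models the fix: reset the pointer when the count hits 0. */
static void model_thaw(struct bdev_model *b, bool clear_sb)
{
	if (b->fsfreeze_count == 0)
		return;                   /* not frozen */
	if (--b->fsfreeze_count > 0)
		return;                   /* still frozen by another caller */
	if (clear_sb)
		b->fsfreeze_sb = NULL;
}
```

In this model, a freeze/thaw pair followed by a freeze with no active
superblock leaves a stale pointer behind unless the thaw clears it.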

This fixes the currently panicking xfstests generic/085.

Fixes: 040f04bd2e ("fs: simplify freeze_bdev/thaw_bdev")
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-07 09:25:54 -07:00
Qu Wenruo 50e31ef486 btrfs: reloc: fix wrong file extent type check to avoid false ENOENT
[BUG]
There are several bug reports about recent kernels being unable to
relocate certain data block groups.

Sometimes the error just goes away, but there is one reporter who can
reproduce it reliably.

The dmesg would look like:

  [438.260483] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
  [438.269018] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
  [450.439609] BTRFS info (device dm-10): found 167 extents, stage: move data extents
  [463.501781] BTRFS info (device dm-10): balance: ended with status: -2

[CAUSE]
The ENOENT error is returned from the following call chain:

  add_data_references()
  |- delete_v1_space_cache();
     |- if (!found)
	   return -ENOENT;

The variable @found is set to true if we find a data extent whose
disk bytenr matches parameter @data_bytenr.

With extra debugging, the offending tree block looks like this:

  leaf bytenr = 42676709441536, data_bytenr = 34626327621632

                ctime 1567904822.739884119 (2019-09-08 03:07:02)
                mtime 0.0 (1970-01-01 01:00:00)
                otime 0.0 (1970-01-01 01:00:00)
        item 27 key (51933 EXTENT_DATA 0) itemoff 9854 itemsize 53
                generation 1517381 type 2 (prealloc)
                prealloc data disk byte 34626327621632 nr 262144 <<<
                prealloc data offset 0 nr 262144
        item 28 key (52262 ROOT_ITEM 0) itemoff 9415 itemsize 439
                generation 2618893 root_dirid 256 bytenr 42677048360960 level 3 refs 1
                lastsnap 2618893 byte_limit 0 bytes_used 5557338112 flags 0x0(none)
                uuid d0d4361f-d231-6d40-8901-fe506e4b2b53

Although item 27 has disk bytenr 34626327621632, which matches the
data_bytenr, its type is prealloc, not reg.
This makes the existing code skip that item, and return ENOENT.

[FIX]
The code was modified in commit 19b546d7a1 ("btrfs: relocation: Use
btrfs_find_all_leafs to locate data extent parent tree leaves").  Before
that commit, we used something like

  "if (type == BTRFS_FILE_EXTENT_INLINE) continue;"

But in that offending commit, we use (type == BTRFS_FILE_EXTENT_REG),
ignoring BTRFS_FILE_EXTENT_PREALLOC.

Fix it by also checking BTRFS_FILE_EXTENT_PREALLOC.
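
The difference between the two checks can be sketched in plain C as
follows.  This is only an illustration under assumptions: the enum and the
helper names extent_matches_old() and extent_matches_fixed() are
hypothetical, and the real code works on file extent items in a leaf
rather than on bare integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type numbers as in the dump above ("type 2 (prealloc)"). */
enum file_extent_type {
	FILE_EXTENT_INLINE   = 0,
	FILE_EXTENT_REG      = 1,
	FILE_EXTENT_PREALLOC = 2,
};

/* Check as described for the offending commit: only REG extents can match. */
static bool extent_matches_old(int type, uint64_t disk_bytenr,
			       uint64_t data_bytenr)
{
	return type == FILE_EXTENT_REG && disk_bytenr == data_bytenr;
}

/* Check with PREALLOC also accepted; inline extents are still skipped. */
static bool extent_matches_fixed(int type, uint64_t disk_bytenr,
				 uint64_t data_bytenr)
{
	if (type != FILE_EXTENT_REG && type != FILE_EXTENT_PREALLOC)
		return false;
	return disk_bytenr == data_bytenr;
}
```

With the values from item 27 above, the old check misses the prealloc
extent even though its disk bytenr equals data_bytenr, which is what leads
to the ENOENT.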

Reported-by: Stéphane Lesimple <stephane_btrfs2@lesimple.fr>
Link: https://lore.kernel.org/linux-btrfs/505cabfa88575ed6dbe7cb922d8914fb@lesimple.fr
Fixes: 19b546d7a1 ("btrfs: relocation: Use btrfs_find_all_leafs to locate data extent parent tree leaves")
CC: stable@vger.kernel.org # 5.6+
Tested-By: Stéphane Lesimple <stephane_btrfs2@lesimple.fr>
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-07 17:25:05 +01:00
Su Yue 347fb0cfc9 btrfs: tree-checker: check if chunk item end overflows
While mounting a crafted image provided by a user, the kernel panics due
to an invalid chunk item whose end is less than its start.

  [66.387422] loop: module loaded
  [66.389773] loop0: detected capacity change from 262144 to 0
  [66.427708] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 12 /dev/loop0 scanned by mount (613)
  [66.431061] BTRFS info (device loop0): disk space caching is enabled
  [66.431078] BTRFS info (device loop0): has skinny extents
  [66.437101] BTRFS error: insert state: end < start 29360127 37748736
  [66.437136] ------------[ cut here ]------------
  [66.437140] WARNING: CPU: 16 PID: 613 at fs/btrfs/extent_io.c:557 insert_state.cold+0x1a/0x46 [btrfs]
  [66.437369] CPU: 16 PID: 613 Comm: mount Tainted: G           O      5.11.0-rc1-custom #45
  [66.437374] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
  [66.437378] RIP: 0010:insert_state.cold+0x1a/0x46 [btrfs]
  [66.437420] RSP: 0018:ffff93e5414c3908 EFLAGS: 00010286
  [66.437427] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
  [66.437431] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
  [66.437434] RBP: ffff93e5414c3938 R08: 0000000000000001 R09: 0000000000000001
  [66.437438] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d72aa0
  [66.437441] R13: ffff8ec78bc71628 R14: 0000000000000000 R15: 0000000002400000
  [66.437447] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
  [66.437451] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [66.437455] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
  [66.437460] PKRU: 55555554
  [66.437464] Call Trace:
  [66.437475]  set_extent_bit+0x652/0x740 [btrfs]
  [66.437539]  set_extent_bits_nowait+0x1d/0x20 [btrfs]
  [66.437576]  add_extent_mapping+0x1e0/0x2f0 [btrfs]
  [66.437621]  read_one_chunk+0x33c/0x420 [btrfs]
  [66.437674]  btrfs_read_chunk_tree+0x6a4/0x870 [btrfs]
  [66.437708]  ? kvm_sched_clock_read+0x18/0x40
  [66.437739]  open_ctree+0xb32/0x1734 [btrfs]
  [66.437781]  ? bdi_register_va+0x1b/0x20
  [66.437788]  ? super_setup_bdi_name+0x79/0xd0
  [66.437810]  btrfs_mount_root.cold+0x12/0xeb [btrfs]
  [66.437854]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.437873]  legacy_get_tree+0x34/0x60
  [66.437880]  vfs_get_tree+0x2d/0xc0
  [66.437888]  vfs_kern_mount.part.0+0x78/0xc0
  [66.437897]  vfs_kern_mount+0x13/0x20
  [66.437902]  btrfs_mount+0x11f/0x3c0 [btrfs]
  [66.437940]  ? kfree+0x5ff/0x670
  [66.437944]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.437962]  legacy_get_tree+0x34/0x60
  [66.437974]  vfs_get_tree+0x2d/0xc0
  [66.437983]  path_mount+0x48c/0xd30
  [66.437998]  __x64_sys_mount+0x108/0x140
  [66.438011]  do_syscall_64+0x38/0x50
  [66.438018]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [66.438023] RIP: 0033:0x7f0138827f6e
  [66.438033] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
  [66.438040] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e
  [66.438044] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0
  [66.438047] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001
  [66.438050] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  [66.438054] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440
  [66.438078] irq event stamp: 18169
  [66.438082] hardirqs last  enabled at (18175): [<ffffffffb81154bf>] console_unlock+0x4ff/0x5f0
  [66.438088] hardirqs last disabled at (18180): [<ffffffffb8115427>] console_unlock+0x467/0x5f0
  [66.438092] softirqs last  enabled at (16910): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20
  [66.438097] softirqs last disabled at (16905): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20
  [66.438103] ---[ end trace e114b111db64298b ]---
  [66.438107] BTRFS error: found node 12582912 29360127 on insert of 37748736 29360127
  [66.438127] BTRFS critical: panic in extent_io_tree_panic:679: locking error: extent tree was modified by another thread while locked (errno=-17 Object already exists)
  [66.441069] ------------[ cut here ]------------
  [66.441072] kernel BUG at fs/btrfs/extent_io.c:679!
  [66.442064] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  [66.443018] CPU: 16 PID: 613 Comm: mount Tainted: G        W  O      5.11.0-rc1-custom #45
  [66.444538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
  [66.446223] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs]
  [66.450878] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246
  [66.451840] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
  [66.453141] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
  [66.454445] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001
  [66.455743] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0
  [66.457055] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000
  [66.458356] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
  [66.459841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [66.460895] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
  [66.462196] PKRU: 55555554
  [66.462692] Call Trace:
  [66.463139]  set_extent_bit.cold+0x30/0x98 [btrfs]
  [66.464049]  set_extent_bits_nowait+0x1d/0x20 [btrfs]
  [66.490466]  add_extent_mapping+0x1e0/0x2f0 [btrfs]
  [66.514097]  read_one_chunk+0x33c/0x420 [btrfs]
  [66.534976]  btrfs_read_chunk_tree+0x6a4/0x870 [btrfs]
  [66.555718]  ? kvm_sched_clock_read+0x18/0x40
  [66.575758]  open_ctree+0xb32/0x1734 [btrfs]
  [66.595272]  ? bdi_register_va+0x1b/0x20
  [66.614638]  ? super_setup_bdi_name+0x79/0xd0
  [66.633809]  btrfs_mount_root.cold+0x12/0xeb [btrfs]
  [66.652938]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.671925]  legacy_get_tree+0x34/0x60
  [66.690300]  vfs_get_tree+0x2d/0xc0
  [66.708221]  vfs_kern_mount.part.0+0x78/0xc0
  [66.725808]  vfs_kern_mount+0x13/0x20
  [66.742730]  btrfs_mount+0x11f/0x3c0 [btrfs]
  [66.759350]  ? kfree+0x5ff/0x670
  [66.775441]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.791750]  legacy_get_tree+0x34/0x60
  [66.807494]  vfs_get_tree+0x2d/0xc0
  [66.823349]  path_mount+0x48c/0xd30
  [66.838753]  __x64_sys_mount+0x108/0x140
  [66.854412]  do_syscall_64+0x38/0x50
  [66.869673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [66.885093] RIP: 0033:0x7f0138827f6e
  [66.945613] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
  [66.977214] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e
  [66.994266] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0
  [67.011544] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001
  [67.028836] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  [67.045812] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440
  [67.216138] ---[ end trace e114b111db64298c ]---
  [67.237089] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs]
  [67.325317] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246
  [67.347946] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
  [67.371343] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
  [67.394757] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001
  [67.418409] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0
  [67.441906] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000
  [67.465436] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
  [67.511660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [67.535047] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
  [67.558449] PKRU: 55555554
  [67.581146] note: mount[613] exited with preempt_count 2

The image has a chunk item which has a logical start 37748736 and length
18446744073701163008 (-8M). The calculated end 29360127 overflows.
EEXIST was caught by insert_state() because of the duplicate end and
extent_io_tree_panic() was called.

Add overflow check of chunk item end to tree checker so it can be
detected early at mount time.
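
The arithmetic can be reproduced in userspace with the values quoted
above.  The helper chunk_end_overflows() below is a hypothetical sketch of
such a check, not the tree-checker code itself; it relies only on the fact
that unsigned 64-bit addition wraps around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Returns true if start + length wraps around 64 bits.  If end is not
 * NULL, it receives the (possibly wrapped) inclusive end, start + length - 1.
 */
static bool chunk_end_overflows(uint64_t start, uint64_t length, uint64_t *end)
{
	uint64_t sum = start + length;    /* unsigned wraparound is well defined */

	if (end)
		*end = sum - 1;
	return sum < start;               /* a wrapped sum is smaller than start */
}
```

For start 37748736 and length 18446744073701163008 the sum wraps and the
inclusive end comes out as 29360127, matching the "end < start" message in
the log.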

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208929
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-07 17:25:05 +01:00
Su Yue 29b665cc51 btrfs: prevent NULL pointer dereference in extent_io_tree_panic
Some extent io trees are initialized with a NULL private member (e.g.
btrfs_device::alloc_state and btrfs_fs_info::excluded_extents).
Dereferencing a NULL tree->private as an inode pointer will cause a panic.

Pass tree->fs_info as it's known to be valid in all cases.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208929
Fixes: 05912a3c04 ("btrfs: drop extent_io_ops::tree_fs_info callback")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-07 17:25:05 +01:00
Josef Bacik 71008734d2 btrfs: print the actual offset in btrfs_root_name
We're supposed to print the root_key.offset in btrfs_root_name in the
case of a reloc root, not the objectid.  Fix this helper to take the key
so we have access to the offset when we need it.

Fixes: 457f1864b5 ("btrfs: pretty print leaked root name")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-07 17:25:05 +01:00
Pavel Begunkov b1445e59cc io_uring: synchronise ev_posted() with waitqueues
waitqueue_active() needs smp_mb() to be in sync with waitqueue
modification, but we miss it in io_cqring_ev_posted*() apart from the
cq_wait() case.

Take an smp_mb() out of wq_has_sleeper() making it waitqueue_active(),
and place it a few lines before, so it can synchronise other
waitqueue_active() as well.

The patch doesn't add any additional overhead, so even if there are
no problems currently, it's just safer to have it this way.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-07 07:48:09 -07:00
Pavel Begunkov 4aa84f2ffa io_uring: dont kill fasync under completion_lock
       CPU0                    CPU1
       ----                    ----
  lock(&new->fa_lock);
                               local_irq_disable();
                               lock(&ctx->completion_lock);
                               lock(&new->fa_lock);
  <Interrupt>
    lock(&ctx->completion_lock);

 *** DEADLOCK ***

Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(),
so it doesn't hold completion_lock while doing it. That saves from the
reported deadlock, and it's just nice to shorten the locking time and
untangle nested locks (compl_lock -> wq_head::lock).

Reported-by: syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-07 07:48:09 -07:00
Pavel Begunkov 80c18e4ac2 io_uring: trigger eventfd for IOPOLL
Make sure io_iopoll_complete() tries to wake up eventfd, which currently
is skipped together with io_cqring_ev_posted() for non-SQPOLL IOPOLL.

Add an IOPOLL version of io_cqring_ev_posted().  It duplicates a bit of
code, but the two versions use different sets of wait queues, so it may
be for the better.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-07 07:48:09 -07:00
Linus Torvalds 71c061d244 for-5.11-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl/0cI8ACgkQxWXV+ddt
 WDspQw/8DcC8zhGgunk0m2kcXd6dFOGbsr3hNGCsgUSKESRw6AgTZ0rJf/QLjayF
 /vaJWzQW9ijfZ92fWZS+mrmskk0N8RFOsEvkCRLesgRaasbrkchLBo5HGQasOBEV
 LXyU878GrBkNaHzClJz+JdU26i0d17BFdddgtZVQ1St9Wd9ecc7Q6iqG80RWFeE7
 uVbhv+QjocM3EieOnwIy5Mz6jZgJLYwqw7/y2njKduBeJtbt1K1j/y7IJk0WFMUM
 8eUpDL6vlAHB8FjV2wWOzO46bbEaUpaBADM6yabrq0lnM0kr7Rb+WV/WSLM/AZ3g
 Hzs4qROOEP+zjfZ5nYjJQDJRMpSipZomsUY5uMZnhRxlZuHPaoBotRRzs5AIZYj2
 BnkfucOcjxS/JTBD//ltJXE8RxbMIyMBBBipbBwqmxOkR9gM9BPuJ6iJPfUX//gG
 1GHJ+FPns8ua3JW21ih6H31xNEPS36tsywvE8yCEtEWMxCFCBwgGu+4D8KpGBjtY
 ySFxkxxAbTuFi9fqSE/mBC+6lpbVTO0OvizuoEQh8C2izkXRbDsDVgPN8d7rCW7h
 Cdox4DUp61sNf+G3ll9Dv9ceAXroZTVRTHGjlav6NAFpydz3yPo5x54Ex7S+k3oN
 BAcZEl1Tl3hz4WxF8Ywc+yJ8n8l9AVa3KcYRXVbyVjTGg+JjU94=
 =jlQf
 -----END PGP SIGNATURE-----

Merge tag 'for-5.11-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes that arrived before the end of the year:

   - a bunch of fixes related to transaction handle lifetime wrt various
     operations (umount, remount, qgroup scan, orphan cleanup)

   - async discard scheduling fixes

   - fix item size calculation when item keys collide for extended refs
     (hardlinks)

   - fix qgroup flushing from running transaction

   - fix send, wrong file path when there is an inode with a pending
     rmdir

   - fix deadlock when cloning inline extent and low on free metadata
     space"

* tag 'for-5.11-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: run delayed iputs when remounting RO to avoid leaking them
  btrfs: add assertion for empty list of transactions at late stage of umount
  btrfs: fix race between RO remount and the cleaner task
  btrfs: fix transaction leak and crash after cleaning up orphans on RO mount
  btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan
  btrfs: merge critical sections of discard lock in workfn
  btrfs: fix racy access to discard_ctl data
  btrfs: fix async discard stall
  btrfs: tests: initialize test inodes location
  btrfs: send: fix wrong file path when there is an inode with a pending rmdir
  btrfs: qgroup: don't try to wait flushing if we're already holding a transaction
  btrfs: correctly calculate item size used when item key collision happens
  btrfs: fix deadlock when cloning inline extent and low on free metadata space
2021-01-06 11:19:08 -08:00
Dave Wysochanski 3d1a90ab0e NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
It is only safe to call the tracepoint before rpc_put_task() because
'data' is freed inside nfs4_lock_release (rpc_release).

Fixes: 48c9579a1a ("Adding stateid information to tracepoints")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-06 12:34:57 -05:00
Matthew Wilcox (Oracle) 3e2224c586 io_uring: Fix return value from alloc_fixed_file_ref_node
alloc_fixed_file_ref_node() currently returns an ERR_PTR on failure.
io_sqe_files_unregister() expects it to return NULL and since it can only
return -ENOMEM, it makes more sense to change alloc_fixed_file_ref_node()
to behave that way.
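
The mismatch can be illustrated in userspace with a minimal
re-implementation of the error-pointer convention.  Everything below is a
sketch under assumptions: err_ptr() and is_err() merely imitate the
kernel's ERR_PTR()/IS_ERR(), and ref_node, alloc_node_errptr(),
alloc_node_null() and caller_sees_failure() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top MAX_ERRNO values of the address space. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct ref_node { int refs; };

/* Old convention: failure is reported as an error pointer. */
static struct ref_node *alloc_node_errptr(bool fail)
{
	if (fail)
		return err_ptr(-ENOMEM);
	return calloc(1, sizeof(struct ref_node));
}

/* New convention: failure is reported as NULL. */
static struct ref_node *alloc_node_null(bool fail)
{
	if (fail)
		return NULL;
	return calloc(1, sizeof(struct ref_node));
}

/* A caller that only tests for NULL, as the unregister path is said to. */
static bool caller_sees_failure(const struct ref_node *node)
{
	return node == NULL;
}
```

A NULL-only check does not notice the error pointer, so the caller would
go on to use it as if it were a valid node.  Returning NULL removes that
trap when -ENOMEM is the only possible error.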

Fixes: 1ffc54220c ("io_uring: fix io_sqe_files_unregister() hangs")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-06 09:19:49 -07:00
Linus Torvalds 6207214a70 AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl/znRUACgkQ+7dXa6fL
 C2tCgQ/9Erue3NZlcqHxYtpEUWniUUS39zRmGaPnDvnPqykM4nqN196udMPzeVx1
 U/Uc1y4bQqLy43aPKVqpOMtjsiQfdqRTc1S8iZwd+4tv39R0RuBn2fP6d52cbsj1
 plCS2qHr9YHic+PPlOOZ21uU2hfDwyZwEcbmhaA0TrFq3nLkNqcP1nRKKEtTxhUD
 EG0j2LRDKrBDDWKqo/lwNb81lnyTJsbnuFVWVI11UK1+xjyIkCBc7KZg4kMH+6+r
 Sc7De1qe0w3qYczLKXfKvi4axRkXIKaswvwlEA1+HDXK1HwYuoVgbfnF/Z3ILA5s
 sr4dNx/M+M9CB5OaUG4rnn1VUQwhFK7c2MftR8iaXiYReK1rjV3p+AQ0B8OUDxb7
 un8w0gIdDaaw7qN52iaywXWf+pYKDIqE5IHEm/daNNPfmC9S4vMLLc89tTzvi1aC
 YGH6+dERUtf5lnR2L9z3GodMDuEW3YpLelgVspL7tGYsQm3nvsiIi1vew/QR4qir
 TksjJOwu/0WhHAJD32rnmppiJQ+s5a1N+vo46Ax69fV9YAglNWPPYYUZyY95+mw/
 g5Y8jswCy0h89yq/3nYBMJsgFhFY2LNOOn+1C9PtQxKcXkSubVLh7ssw0aORD52q
 UbdPnG5RoU7UpqmvGnosWIoO+iYAKp23DCIiJLIoR9Ve0RfAx3Q=
 =OQRI
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-04012021' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
 "Two fixes.

  The first is the fix for the strnlen() array limit check and the
  second fixes the calculation of the number of dirent records used to
  represent any particular filename length"

* tag 'afs-fixes-04012021' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix directory entry size calculation
  afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y
2021-01-05 11:55:46 -08:00
Ye Bin 170b3bbda0 io_uring: Delete useless variable ‘id’ in io_prep_async_work
Fix the following warning:
fs/io_uring.c:1523:22: warning: variable ‘id’ set but not used
[-Wunused-but-set-variable]
  struct io_identity *id;
                        ^~
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-05 11:34:23 -07:00
Pavel Begunkov 90df08538c io_uring: cancel more aggressively in exit_work
While io_ring_exit_work() is running, new requests of all sorts may be
issued, so it should do a bit more to cancel them, otherwise they may
just get stuck, e.g. in io-wq, in poll lists, etc.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-04 15:22:51 -07:00
Pavel Begunkov de7f1d9e99 io_uring: drop file refs after task cancel
io_uring fds are marked O_CLOEXEC and we explicitly cancel all requests
before going through exec, so we don't want to leave the task holding
file references to io_uring instances that are no longer ours.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-04 15:22:50 -07:00
Pavel Begunkov 6c503150ae io_uring: patch up IOPOLL overflow_flush sync
IOPOLL skips completion locking but keeps it under uring_lock, thus
io_cqring_overflow_flush() and so io_cqring_events() need additional
locking with uring_lock in some cases for IOPOLL.

Remove __io_cqring_overflow_flush() from io_cqring_events(), introduce a
wrapper around flush doing needed synchronisation and call it by hand.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-04 15:22:29 -07:00
Pavel Begunkov 81b6d05cca io_uring: synchronise IOPOLL on task_submit fail
io_req_task_submit() might be called for IOPOLL, do the fail path under
uring_lock to comply with IOPOLL synchronisation based solely on it.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-04 15:22:27 -07:00
David Howells 366911cd76 afs: Fix directory entry size calculation
The number of dirent records used by an AFS directory entry should be
calculated using the assumption that there is a 16-byte name field in the
first block, rather than a 20-byte name field (which is actually the case).
This miscalculation is historic and effectively standard, so we have to use
it.

The calculation we need to use is:

	1 + (((strlen(name) + 1) + 15) >> 5)

where we are adding one to the strlen() result to account for the NUL
termination.
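
As a small worked example of the formula, the following C sketch computes
the record count for a few name lengths.  The function names are
hypothetical (the inline function this patch adds may be named and typed
differently), and the 32-byte record size is inferred from the shift by 5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Number of dirent records for a name of name_len bytes (NUL excluded).
 * Adding 1 accounts for the NUL; adding 15 and shifting by 5 counts the
 * extra 32-byte records needed beyond a first record that is treated as
 * holding 16 bytes of name.
 */
static inline unsigned int dirent_records_for_len(size_t name_len)
{
	return 1 + (unsigned int)(((name_len + 1) + 15) >> 5);
}

/* Convenience wrapper taking a NUL-terminated name. */
static inline unsigned int dirent_records_for_name(const char *name)
{
	return dirent_records_for_len(strlen(name));
}
```

Under this formula a name of up to 15 characters fits in one record, 16 to
47 characters take two, and 48 characters take three.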

Fix this by the following means:

 (1) Create an inline function to do the calculation for a given name
     length.

 (2) Use the function to calculate the number of records used for a dirent
     in afs_dir_iterate_block().

     Use this to move the over-end check out of the loop since it only
     needs to be done once.

     Further use this to only go through the loop for the 2nd+ records
     composing an entry.  The only test there now is for if the record is
     allocated - and we already checked the first block at the top of the
     outer loop.

 (3) Add a max name length check in afs_dir_iterate_block().

 (4) Make afs_edit_dir_add() and afs_edit_dir_remove() use the function
     from (1) to calculate the number of blocks rather than doing it
     incorrectly themselves.

Fixes: 63a4681ff3 ("afs: Locally edit directory data for mkdir/create/unlink/...")
Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
2021-01-04 12:25:19 +00:00
David Howells 26982a89ca afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y
AFS has a structured layout in its directory contents (AFS dirs are
downloaded as files and parsed locally by the client for lookup/readdir).
The slots in the directory are defined by union afs_xdr_dirent.  This,
however, only directly allows a name of a length that will fit into that
union.  To support a longer name, the next 1-8 contiguous entries are
annexed to the first one and the name flows across these.

afs_dir_iterate_block() uses strnlen(), limited to the space to the end of
the page, to find out how long the name is.  This worked fine until
6a39e62abb.  With that commit, the compiler determines the size of the
array and asserts that the string fits inside that array.  This is a
problem for AFS because we *expect* it to overflow one or more arrays.

A similar problem also occurs in afs_dir_scan_block() when a directory file
is being locally edited to avoid the need to redownload it.  There strlen()
was being used safely because each page has the last byte set to 0 when the
file is downloaded and validated (in afs_dir_check_page()).

Fix this by changing the afs_xdr_dirent union name field to an
indeterminate-length array and dropping the overflow field.

(Note that whilst looking at this, I realised that the calculation of the
number of slots a dirent used is non-standard and not quite right, but I'll
address that in a separate patch.)

The issue can be triggered by something like:

        touch /afs/example.com/thisisaveryveryverylongname

and it generates a report that looks like:

        detected buffer overflow in strnlen
        ------------[ cut here ]------------
        kernel BUG at lib/string.c:1149!
        ...
        RIP: 0010:fortify_panic+0xf/0x11
        ...
        Call Trace:
         afs_dir_iterate_block+0x12b/0x35b
         afs_dir_iterate+0x14e/0x1ce
         afs_do_lookup+0x131/0x417
         afs_lookup+0x24f/0x344
         lookup_open.isra.0+0x1bb/0x27d
         open_last_lookups+0x166/0x237
         path_openat+0xe0/0x159
         do_filp_open+0x48/0xa4
         ? kmem_cache_alloc+0xf5/0x16e
         ? __clear_close_on_exec+0x13/0x22
         ? _raw_spin_unlock+0xa/0xb
         do_sys_openat2+0x72/0xde
         do_sys_open+0x3b/0x58
         do_syscall_64+0x2d/0x3a
         entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 6a39e62abb ("lib: string.h: detect intra-object overflow in fortified string functions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: Daniel Axtens <dja@axtens.net>
2021-01-04 12:25:19 +00:00
Arnd Bergmann 4f8b848788 zonefs: select CONFIG_CRC32
When CRC32 is disabled, zonefs cannot be linked:

ld: fs/zonefs/super.o: in function `zonefs_fill_super':

Add a Kconfig 'select' statement for it.

Fixes: 8dcc1a9d90 ("fs: New zonefs file system")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
2021-01-04 09:06:42 +09:00
Linus Torvalds 8b4805c68a block-5.11-2021-01-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/vOAwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpryUD/9C4cYltnJmzd4OQLDav3vchGI2dhy8Fh7T
 Lp04YPpsVFswZq/tz1fyrP1gA4r7lD2QGn+rGtel/hgaXkaxLqwoQ9No/lOJ7Y22
 dtDfPGlNrvhjBQL5l+N7xP1DF8BlBOaXHPfMSW2t14InnV/TYUvxuI4YwhZIiuqP
 kWYmAGcTdyFRS/x+tQjiyvqMd8VVYiTlEWyL4TpZoxeHigZF2Q3An3uZ+NdsnO0z
 S19yZ7eMwUks4Kx2X2WQ2uaMea90bX+sU6v4XABqBcgWqVH/1mbL4MZ8kaiCUaBr
 66Im7dN1xS/VMyueB3crDhz7RvjDlZZmCz3i9CNnWUcbHrvUXRms8b9LNsrpXkJ/
 ZZq8YAmqM20EBeZVSXL2WCFK1sDBxxsXv5zX4MYwUk7pZ3B+Uea8Z/DCUHBtTpnN
 FEbeGDFZs4IlhHoQ/UnnykdAYHvxUVEbWSICcQrzgeh0e4aPgS7nZOE6FiLU5q4n
 rl+dOjz5SrdURvFBVPybCFnoV9YCdU7mRDZkx/AWyYpG/zGzQhbS1JQzd9YATIFA
 TF6aAl6TuA5yoq9QIiVfd+7SdGqxhM03rCxelw7I9conVzpfBUFXSphVsEh5XnkW
 X2M4R1aTtQ49cscFALX6okuadqJoRFEH1f4hT4m4C8BszRH2UjD/Up52pP15Wq0Q
 mmtr1MenIw==
 =OvP3
 -----END PGP SIGNATURE-----

Merge tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Two minor block fixes from this last week that should go into 5.11:

   - Add missing NOWAIT debugfs definition (Andres)

   - Fix kerneldoc warning introduced this merge window (Randy)"

* tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
  block: add debugfs stanza for QUEUE_FLAG_NOWAIT
  fs: block_dev.c: fix kernel-doc warnings from struct block_device changes
2021-01-01 12:49:09 -08:00
Linus Torvalds dc3e24b214 io_uring-5.11-2021-01-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/vOCwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpqA7D/9AyFvg16KEgfYCN2OYXU5jyphu7sCCb8Cx
 PJ4H+Lf7fWki+/yFdXLxQnuBMGEOYWqtEIPN9CnO/I1ixzOoNugxiFAyerhd/Noh
 COg2EUsUrWq/zobYP60wN9pBPnW6EHTnFVA02kMVKunm4d5O5DZWPXy5BwA9yU3u
 dE9LoYDjFiaahogi3x+EmYStexxT0FB0d5WTONA7qSFrskeNbyVaYy8mY09jPynG
 IbG41fv2n0Zwlcx4XDCebsZ1+08rAGZFhwiq8VBhPNiz7sOud9jW7rRFHXR2FVoo
 DsW2npiYHVvOYqkl1HjXw5Mo6p8UKrDEDAIS7OOAHXM9Lz2/YGS9h9ogROccBta2
 5er12VaahIEiH05KtxpGv/q+vyJK7Gdqg0jSuSzKHSdSpTS10Ejh82Xo2V6lRedb
 gP03ZiDZjLtvh8F5hrWTJqPTtnFDRkY/I7R3WP1Ga7mqajFhpFDMvjvyEMMBCz+K
 KGjMfahNo2nzc9nu5M1VjX42tz5VxKjA3N2netxBfDMVB/GpGcQ7xygS85wx7VPn
 UUChgqw0aJrrq5slOZEAVqSsBN/wN97+m6uLLdk025CzQngwiw5fkTooakPxnGee
 bW9WKMpWBj/ipPXvU5C1tvHk4gxMg+cmxcr6EZ3uaWfE+MC7Xk9c00lNF62CT0Xm
 e+0RWRV1ig==
 =XYT5
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few fixes that should go into 5.11, all marked for stable as well:

   - Fix issue around identity COW'ing and users that share a ring
     across processes

   - Fix a hang associated with unregistering fixed files (Pavel)

   - Move the 'process is exiting' cancelation a bit earlier, so
     task_works aren't affected by it (Pavel)"

* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
  kernel/io_uring: cancel io_uring before task works
  io_uring: fix io_sqe_files_unregister() hangs
  io_uring: add a helper for setting a ref node
  io_uring: don't assume mm is constant across submits
2021-01-01 12:29:49 -08:00
Pavel Begunkov b1b6b5a30d kernel/io_uring: cancel io_uring before task works
Cancelling io_uring requests requires either being able to run currently
enqueued task_works or having task_work already shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.

Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-30 19:36:54 -07:00
Pavel Begunkov 1ffc54220c io_uring: fix io_sqe_files_unregister() hangs
io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes,
however requests keeping them may never complete, e.g. because of some
userspace dependency. Make sure it's interruptible otherwise it would
hang forever.

Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-30 19:35:53 -07:00
Pavel Begunkov 1642b4450d io_uring: add a helper for setting a ref node
Setting a new reference node on file data is not trivial.  Don't repeat
it; add and use a helper.

Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-30 19:35:53 -07:00