1
0
Fork 0
Commit Graph

60767 Commits (359efcc2c910117d2faf704ce154e91fc976d37f)

Author SHA1 Message Date
Linus Torvalds b66b449872 Fix remounting (broken in -rc1).
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJduXHFAAoJENW/n+sDE2U6FKkP/j9xxzkjHAQwCaim5DIBNJfo
 1GDpDxrN6Voa4hC82p/8fUwFGedo72Ex/TyyVCOnJ9/iNRSugpIdEAWM+NCNs/Sm
 UHbNkKPxCB0rRiO20tvHZsbhytbM2zLEKuDuNwaI5BN+6RvW6AaAeMMSAv245tMh
 0iu5U8l5B/oPRx+EcirSia8Ep+MiMp8+6MrLoQLVYnrs9wsfTrD8LltGPLqLhbPV
 JPkJaeIdjdM5Ii34z5nE8yyAEYr7pjfL7CNim1NY/cM7gRsX4jTdd2xGMecIaUdV
 TIBWn+NzqIguqLdPHf7xDinwNxF6/F2ppQpwW3zSS0VpQpJI1oApi0Xw8ESiylZ5
 uMkR/+x9XpeSjvyvKonk0D0WlnfLfDy/frSxhw8emsJ1o8TSqL2TgHRqVIfU8+ZD
 fR9zGw8tXjmWHZkC1gkyPx6xWiMAVBrTsgHlRIRWukXtKCf/qlCgoq0HwSfFINkt
 z3AYfpHcAZ/zVpYTfy7x3X4jeM71aToTEQuoJPNk3AQ+b5uKH9fD48XYyx1EQlZl
 jwvldyLcPfU2YVe2KTfsglEGY3XqN4cn3B6Ysm7kX81uh7dJ4oeKX1w0yC47/umS
 1cNbx/4h7R+6fTZGhduOH1znDfw92iel8vz6npOe508mq+isj49kGRPJ9wurAMce
 lcSV5NZgDRpikbIq6fZw
 =hVf6
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.4-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:
 "Fix remounting (broken in -rc1)."

* tag 'gfs2-v5.4-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix initialisation of args for remount
2019-10-30 14:05:40 +01:00
Andrew Price d5798141fd gfs2: Fix initialisation of args for remount
When gfs2 was converted to use fs_context, the initialisation of the
mount args structure to the currently active args was lost with the
removal of gfs2_remount_fs(), so the checks of the new args on remount
became checks against the default values instead of the current ones.
This caused unexpected remount behaviour and test failures (xfstests
generic/294, generic/306 and generic/452).

Reinstate the args initialisation, this time in gfs2_init_fs_context()
and conditional upon fc->purpose, as that's the only time we get control
before the mount args are parsed in the remount process.

Fixes: 1f52aa08d1 ("gfs2: Convert gfs2 to fs_context")
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-10-30 12:16:53 +01:00
Linus Torvalds 23fdb198ae fuse fixes for 5.4-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXbgx9wAKCRDh3BK/laaZ
 PIjMAQCTrWPf6pLHSoq+Pll7b8swu1m2mW97fj5k8coED1/DSQEA4222PkIdMmhu
 qyHnoX0fTtYXg6NnHbDVWsNL4uG8YAM=
 =RE9K
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "Mostly virtiofs fixes, but also fixes a regression and couple of
  longstanding data/metadata writeback ordering issues"

* tag 'fuse-fixes-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: redundant get_fuse_inode() calls in fuse_writepages_fill()
  fuse: Add changelog entries for protocols 7.1 - 7.8
  fuse: truncate pending writes on O_TRUNC
  fuse: flush dirty data/metadata before non-truncate setattr
  virtiofs: Remove set but not used variable 'fc'
  virtiofs: Retry request submission from worker context
  virtiofs: Count pending forgets as in_flight forgets
  virtiofs: Set FR_SENT flag only after request has been sent
  virtiofs: No need to check fpq->connected state
  virtiofs: Do not end request in submission context
  fuse: don't advise readdirplus for negative lookup
  fuse: don't dereference req->args on finished request
  virtio-fs: don't show mount options
  virtio-fs: Change module name to virtiofs.ko
2019-10-29 17:43:33 +01:00
Linus Torvalds c9a2e4a829 Seven cifs/smb3 fixes, including three for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGyBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl208asACgkQiiy9cAdy
 T1EVJAv41lwLL01FQtWsoMB8yn46q7iboc0eqVJ4CEnZj25Fe3VgokKm1+2N6/15
 g8tQeLnE8kzjQyAxEgI260PHRF6Dg0RSL7zwxnJOFpspWLb9+U4UEryCDao/v4Ao
 LYr88F6XeYDGojyuGKokT/jlsxWfdPwaYhpOPabjO76f6Jv7cNfIa9EKzbEAtq3e
 a9to2AVnJ02FLGTP3QM1ThBtCu3JFFMI07u2Rp8ANiJYtM5Mg2piOkO1PQ3hGLIM
 /3+1E7B12KgG7D3GA7ZuP1HLjyo+A91BhZVTzlBDq1VHTd8ccJDJklg28oXoMrMh
 dgW5XRcS/s60LsRzagpNMC0cPlAdBAd9N9uZaoXmQhNAg0I7CJ24H3tglwKejoUR
 m/duI1C89Fbo1kgjwfAztxhfvair2SYLu8DGijvYH/C5syNUR7V99uy1th+nmOgl
 vpBM9CbkM6mtSxFHtgcFuv4rwIf61EQaGoXiL/e3RLH4zc/HaUpBo2rDh0LPmwFB
 koXIw6E=
 =fTOk
 -----END PGP SIGNATURE-----

Merge tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Seven cifs/smb3 fixes, including three for stable"

* tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs
  CIFS: Fix use after free of file info structures
  CIFS: Fix retry mid list corruption on reconnects
  cifs: Fix missed free operations
  CIFS: avoid using MID 0xFFFF
  cifs: clarify comment about timestamp granularity for old servers
  cifs: Handle -EINPROGRESS only when noblockcnt is set
2019-10-27 06:41:52 -04:00
Linus Torvalds acf913b7fb for-linus-2019-10-26
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl20ST0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprl3D/wJo392Nd3fDa1+t9XpV3M5T/HKcC7yau/G
 98r41DF8TzjlwWFpsbolbBibeUkwpkZr32rAyEN7UScw6d5I+qlGF/ATzZijWGvd
 a714VhMq3fs8siGLDtNxdclYlSXzF5wnhbniPdF1QJsEHK7wAbqXV6/xf5FubHke
 SjYX9HOt0PGfkdDWmdMLRlaaedUZG5RPWAZV4dtg8LWMcL9yu9FZtxbyYyhFXf94
 Wd3MxvDp4/aeudO+CLkI73H0LOl5f0iuIOAOTEDE39X0zQA3ezwYiBh8wRtrBh1Y
 DJus7LI0G56DdUOengyptFkR2qvjmQpJr6mFKzVlp9bRJhwuvzcJ1MWj23oDDFOx
 xW1m74rQcP2COZ2BIvzufKBrBJn8Gq5BXDjQAHgQMNn0FQbEnBnCQ+vA3V4NkzMs
 EuUgk2P/LAwBpldkhFUg78UZUKCxiV72cTihLZJHN2FW5cf652Hf9jaA5Vnrzye5
 gJvXmjWojDmEu77lP51UdKSk4Jhhf+Mr/FM3tNoCyteXpceSIrDE8gOMwhkj3oM4
 eXVgHKSyxrD1npcE2/C7wwOMP9V0FBCFb5ulDbuvRyhZRQeSXtugbzU5Pn9VqgiK
 e7uhXA7AqrKS0bdHdXXlxs0of1gvq42zYcD8A9KFJhUxU07yUKXmIwXO4hApAMsk
 VLxsoPdIMA==
 =PhEL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2019-10-26' of git://git.kernel.dk/linux-block

Pull block and io_uring fixes from Jens Axboe:
 "A bit bigger than usual at this point in time, mostly due to some good
  bug hunting work by Pavel that resulted in three io_uring fixes from
  him and two from me. Anyway, this pull request contains:

   - Revert of the submit-and-wait optimization for io_uring, it can't
     always be done safely. It depends on commands always making
     progress on their own, which isn't necessarily the case outside of
     strict file IO. (me)

   - Series of two patches from me and three from Pavel, fixing issues
     with shared data and sequencing for io_uring.

   - Lastly, two timeout sequence fixes for io_uring (zhangyi)

   - Two nbd patches fixing races (Josef)

   - libahci regulator_get_optional() fix (Mark)"

* tag 'for-linus-2019-10-26' of git://git.kernel.dk/linux-block:
  nbd: verify socket is supported during setup
  ata: libahci_platform: Fix regulator_get_optional() misuse
  nbd: handle racing with error'ed out commands
  nbd: protect cmd->status with cmd->lock
  io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREAD
  io_uring: used cached copies of sq->dropped and cq->overflow
  io_uring: Fix race for sqes with userspace
  io_uring: Fix broken links with offloading
  io_uring: Fix corrupted user_data
  io_uring: correct timeout req sequence when inserting a new entry
  io_uring : correct timeout req sequence when waiting timeout
  io_uring: revert "io_uring: optimize submit_and_wait API"
2019-10-26 14:59:51 -04:00
Linus Torvalds 485fc4b69c dax fix 5.4-rc5
- Fix a performance regression that followed from a fix to the
   conversion of the fsdax implementation to the xarray. v5.3 users
   report that they stop seeing huge page mappings on an application +
   filesystem layout that was seeing huge pages previously on v5.2.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJds5qfAAoJEB7SkWpmfYgCD6gP/1PuAbdRnTt/pEUB5GLtOR21
 hszHP5vFiWaiDN4yUw7IknYmOWlPVPoHg8SdSwKLM75EGj9JCfQGUcUTZIREB15h
 6MnAavVr3u697ZGfnqg8y2Nhg6hNKbqzRnzCEACAXe+O2GnH2TuJRauyS6Zr33St
 dXiJ+XQbTGoUbR0+CEDIdv0DnkUFaaE/m1+O5DbApcP56Gt/4E1PvlIPJjMog72/
 WKVk4/Cv9+gc7JZMjR+GjjXJafiNaXT1btPGzF6zeHe9XrAnkxvfjm9BSh+qeCxj
 19jOKKFs+bY5O21jczEIvgE4cpPQEytYhZijPwMXFt2o3betOH4ACMDxVMuNJKag
 Q3yd8QJRqtP5xALAxeby7HoD6gdVLLcH0qjTdMt+LHvSde5OH3l8qOfWjK5CoIdx
 uiUC+ugg8FDZLTsliI5qPaDCSYhswoyLlvIF3AIqllB7Y1lUbihw5h+gXr9aLGmu
 1SUA6ipPw1gUW5ikYlKTu5Lzxa2VN8vHBv03W1mjuj2nOggRWp8cK8CmwKpnUuOp
 9xD1WB0dl23MRNRZOKQvkrg8kOL2vuowwJYQ6Kl3Tc2vWvqYfNRQW6dSwgx3gF38
 rQ0ZQddet4EMATrxbwMalU5hVDV2w5LAUz0+bNyovb7S0oZ20wLHm5o5iImppV1U
 +K2aRqwaPi/ytcwRF+iF
 =LioO
 -----END PGP SIGNATURE-----

Merge tag 'dax-fix-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull dax fix from Dan Williams:
 "Fix a performance regression that followed from a fix to the
  conversion of the fsdax implementation to the xarray. v5.3 users
  report that they stop seeing huge page mappings on an application +
  filesystem layout that was seeing huge pages previously on v5.2"

* tag 'dax-fix-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  fs/dax: Fix pmd vs pte conflict detection
2019-10-26 06:26:04 -04:00
Jens Axboe 2b2ed9750f io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREAD
We currently assume that submissions from the sqthread are successful,
and if IO polling is enabled, we use that value for knowing how many
completions to look for. But if we overflowed the CQ ring or some
requests simply got errored and already completed, they won't be
available for polling.

For the case of IO polling and SQTHREAD usage, look at the pending
poll list. If it ever hits empty then we know that we don't have
anymore pollable requests inflight. For that case, simply reset
the inflight count to zero.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25 10:58:53 -06:00
Jens Axboe 498ccd9eda io_uring: used cached copies of sq->dropped and cq->overflow
We currently use the ring values directly, but that can lead to issues
if the application is malicious and changes these values on our behalf.
Created in-kernel cached versions of them, and just overwrite the user
side when we update them. This is similar to how we treat the sq/cq
ring tail/head updates.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25 10:58:45 -06:00
Pavel Begunkov 935d1e4590 io_uring: Fix race for sqes with userspace
io_ring_submit() finalises with
1. io_commit_sqring(), which releases sqes to the userspace
2. Then calls to io_queue_link_head(), accessing released head's sqe

Reorder them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25 09:02:01 -06:00
Pavel Begunkov fb5ccc9878 io_uring: Fix broken links with offloading
io_sq_thread() processes sqes by 8 without considering links. As a
result, links will be randomely subdivided.

The easiest way to fix it is to call io_get_sqring() inside
io_submit_sqes() as do io_ring_submit().

Downsides:
1. This removes optimisation of not grabbing mm_struct for fixed files
2. It submitting all sqes in one go, without finer-grained sheduling
with cq processing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25 09:01:59 -06:00
Pavel Begunkov 84d55dc5b9 io_uring: Fix corrupted user_data
There is a bug, where failed linked requests are returned not with
specified @user_data, but with garbage from a kernel stack.

The reason is that io_fail_links() uses req->user_data, which is
uninitialised when called from io_queue_sqe() on fail path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25 09:01:58 -06:00
Dave Wysochanski d46b0da7a3 cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs
There's a deadlock that is possible and can easily be seen with
a test where multiple readers open/read/close of the same file
and a disruption occurs causing reconnect.  The deadlock is due
a reader thread inside cifs_strict_readv calling down_read and
obtaining lock_sem, and then after reconnect inside
cifs_reopen_file calling down_read a second time.  If in
between the two down_read calls, a down_write comes from
another process, deadlock occurs.

        CPU0                    CPU1
        ----                    ----
cifs_strict_readv()
 down_read(&cifsi->lock_sem);
                               _cifsFileInfo_put
                                  OR
                               cifs_new_fileinfo
                                down_write(&cifsi->lock_sem);
cifs_reopen_file()
 down_read(&cifsi->lock_sem);

Fix the above by changing all down_write(lock_sem) calls to
down_write_trylock(lock_sem)/msleep() loop, which in turn
makes the second down_read call benign since it will never
block behind the writer while holding lock_sem.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Suggested-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed--by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-10-24 21:35:04 -05:00
Pavel Shilovsky 1a67c41596 CIFS: Fix use after free of file info structures
Currently the code assumes that if a file info entry belongs
to lists of open file handles of an inode and a tcon then
it has non-zero reference. The recent changes broke that
assumption when putting the last reference of the file info.
There may be a situation when a file is being deleted but
nothing prevents another thread to reference it again
and start using it. This happens because we do not hold
the inode list lock while checking the number of references
of the file info structure. Fix this by doing the proper
locking when doing the check.

Fixes: 487317c994 ("cifs: add spinlock for the openFileList to cifsInodeInfo")
Fixes: cb248819d2 ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic")
Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-10-24 21:32:35 -05:00
Pavel Shilovsky abe57073d0 CIFS: Fix retry mid list corruption on reconnects
When the client hits reconnect it iterates over the mid
pending queue marking entries for retry and moving them
to a temporary list to issue callbacks later without holding
GlobalMid_Lock. In the same time there is no guarantee that
mids can't be removed from the temporary list or even
freed completely by another thread. It may cause a temporary
list corruption:

[  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[  430.464668] ------------[ cut here ]------------
[  430.466569] kernel BUG at lib/list_debug.c:51!
[  430.468476] invalid opcode: 0000 [#1] SMP PTI
[  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
...
[  430.510426] Call Trace:
[  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
[  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
[  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
[  430.517452]  ? try_to_wake_up+0x212/0x650
[  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
[  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
[  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[  430.525116]  kthread+0xfb/0x130
[  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
[  430.528514]  ? kthread_park+0x90/0x90
[  430.530019]  ret_from_fork+0x35/0x40

Fix this by obtaining extra references for mids being retried
and marking them as MID_DELETED which indicates that such a mid
has been dequeued from the pending list.

Also move mid cleanup logic from DeleteMidQEntry to
_cifs_mid_q_entry_release which is called when the last reference
to a particular mid is put. This allows to avoid any use-after-free
of response buffers.

The patch needs to be backported to stable kernels. A stable tag
is not mentioned below because the patch doesn't apply cleanly
to any actively maintained stable kernel.

Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-and-tested-by: David Wysochanski <dwysocha@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-10-24 21:32:32 -05:00
Linus Torvalds 65b15b7f4b Fix a memory leak introduced in -rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdsbi7AAoJENW/n+sDE2U6CNMQAL9iTPDLkNEDvNNrnj3v5cq3
 WS3z6seqt+IQT9PYO5/vWsjCXOM0KLU4MC9gUUVYm+8xy6t10ViOUdltx7zRXpJP
 yh8nIHiAAYNAE2sMbZ1JsmXZPE0fZd8/kP8fhAiyYE7KYpAoCJyZGNPQsZwmI6JU
 oa4c6sJ9diYkUSiYJ5A+DN9kv2ANjDQ96ShcfoTFKSijurasIslqybLqdg0r0bgc
 NpF+xIh1z66dEisAE+Hj+l/FXHeWU+c/wCpRV8atoRNpYX0f7X8u7PBQOcrvVtcd
 Lyq3UwSCqqO234Dly5o9RotdnvI66iLXj7Hf8Zx5l9DhR6qTbqpG30fEGiv/vIrC
 Zvcavg6zQXYcPrXllmqUIZv3P/reVdUc4MQMToO5lRuWpLtbdMnUtulVGBMusLSF
 jthxasj6u+EcM0Xe/mhgDBpI/GDQrjH2+pGGjQk6VI1lkLBWZdEQphUAiuthc6GR
 9FA6oPX/4Gp1ry+6CUJePegHYW6BW9X2Qcmqqz/QHOCabkAnX37ay5z/9tbb47JT
 Jl0EATxUQag4kIco7M04JFviXMONfsXpA1QyDj/VxfOiKx0rDB054DJT1Ge+5qvR
 9x7ymUWIbPb7NR9jMfIfzgb5iwlVQdQJv46urryKEKNZaKubVWY8j1iaYpr4cOny
 1g40ngooghJF3LuwEJA4
 =sfLj
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.4-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:
 "Fix a memory leak introduced in -rc1"

* tag 'gfs2-v5.4-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix memory leak when gfs2meta's fs_context is freed
2019-10-24 15:31:55 -04:00
Andrew Price 30aecae86e gfs2: Fix memory leak when gfs2meta's fs_context is freed
gfs2 and gfs2meta share an ->init_fs_context function which allocates an
args structure stored in fc->fs_private. gfs2 registers a ->free
function to free this memory when the fs_context is cleaned up, but
there was not one registered for gfs2meta, causing a leak.

Register a ->free function for gfs2meta. The existing gfs2_fc_free
function does what we need.

Reported-by: syzbot+c2fdfd2b783754878fb6@syzkaller.appspotmail.com
Fixes: 1f52aa08d1 ("gfs2: Convert gfs2 to fs_context")
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-10-24 16:20:43 +02:00
zhangyi (F) a1f58ba46f io_uring: correct timeout req sequence when inserting a new entry
The sequence number of the timeout req (req->sequence) indicate the
expected completion request. Because of each timeout req consume a
sequence number, so the sequence of each timeout req on the timeout
list shouldn't be the same. But now, we may get the same number (also
incorrect) if we insert a new entry before the last one, such as submit
such two timeout reqs on a new ring instance below.

                    req->sequence
 req_1 (count = 2):       2
 req_2 (count = 1):       2

Then, if we submit a nop req, req_2 will still timeout even the nop req
finished. This patch fix this problem by adjust the sequence number of
each reordered reqs when inserting a new entry.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-23 22:09:56 -06:00
zhangyi (F) ef03681ae8 io_uring : correct timeout req sequence when waiting timeout
The sequence number of reqs on the timeout_list before the timeout req
should be adjusted in io_timeout_fn(), because the current timeout req
will consumes a slot in the cq_ring and cq_tail pointer will be
increased, otherwise other timeout reqs may return in advance without
waiting for enough wait_nr.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-23 22:09:56 -06:00
Jens Axboe bc808bced3 io_uring: revert "io_uring: optimize submit_and_wait API"
There are cases where it isn't always safe to block for submission,
even if the caller asked to wait for events as well. Revert the
previous optimization of doing that.

This reverts two commits:

bf7ec93c64
c576666863

Fixes: c576666863 ("io_uring: optimize submit_and_wait API")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-23 22:09:56 -06:00
Vasily Averin 091d1a7267 fuse: redundant get_fuse_inode() calls in fuse_writepages_fill()
Currently fuse_writepages_fill() calls get_fuse_inode() few times with
the same argument.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 14:26:37 +02:00
Miklos Szeredi e4648309b8 fuse: truncate pending writes on O_TRUNC
Make sure cached writes are not reordered around open(..., O_TRUNC), with
the obvious wrong results.

Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 14:26:37 +02:00
Miklos Szeredi b24e7598db fuse: flush dirty data/metadata before non-truncate setattr
If writeback cache is enabled, then writes might get reordered with
chmod/chown/utimes.  The problem with this is that performing the write in
the fuse daemon might itself change some of these attributes.  In such case
the following sequence of operations will result in file ending up with the
wrong mode, for example:

  int fd = open ("suid", O_WRONLY|O_CREAT|O_EXCL);
  write (fd, "1", 1);
  fchown (fd, 0, 0);
  fchmod (fd, 04755);
  close (fd);

This patch fixes this by flushing pending writes before performing
chown/chmod/utimes.

Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 14:26:37 +02:00
Linus Torvalds 54955e3bfd for-5.4-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl2vBPgACgkQxWXV+ddt
 WDvaGQ/+JbV05ML7QjufNFKjzojNPg8dOwvLqF58askMEAqzyqOrzu1YsIOjchqv
 eZPP+wxh6j6i8LFnbaMYRhSVMlgpK9XOR3tNStp73QlSox6/6VKu7XR4wFXAoip3
 l4XI8UqO5XqDG55UTtjKyo2VNq8vgq1gRUWD6hPtfzDP6WYj4JZuXOd+dT6PbP32
 VHd91RoB8Z8qmypLF9Sju6IBWNhKl91TjlRqVdfLywaK8azqaxE1UttPf5DVbE6+
 MlTuXhuxkNc4ddzj//oJ1s3ZP/nXtFmIZ75+Sd6P5DfqRNeIfjkKqXvffsjqoYVI
 1Wv9sDiezxRB1RZjfInhtqvdvsqcsrXrM7x6BRVqI7IZDUaH5em8IoozQT62ze/4
 MJBrRIbVodx2I7EVDMNGkx2TaIDAfnW4Z3UC+YSHLoy6jht1+SA5KLQDR1G/4NeR
 dWht5wOXwhp8P3DoaczTUpk0DLtAtygj04fH8CG277EILLCpuWwW8iRkPttkrIM2
 HrtRKrKFJNyEpq9vHSFVvpQJkzgzqBFs1UnqnEuYh6qNgWCrS4PJDu9geQ8aPGr2
 pA5aEqg5b+jjWzIYDP93PF0u7kF4mFVAozn6xMf95FKmM1OupYYQg5BRe7n/DfDu
 yh3Ms0Mmd9+snpNJ9lDr40cobHC5CDvfvO2SRERyBeXxARainN0=
 =Ft+G
 -----END PGP SIGNATURE-----

Merge tag 'for-5.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fixes of error handling cleanup of metadata accounting with qgroups
   enabled

 - fix swapped values for qgroup tracepoints

 - fix race when handling full sync flag

 - don't start unused worker thread, functionality removed already

* tag 'for-5.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: check for the full sync flag while holding the inode lock during fsync
  Btrfs: fix qgroup double free after failure to reserve metadata for delalloc
  btrfs: tracepoints: Fix bad entry members of qgroup events
  btrfs: tracepoints: Fix wrong parameter order for qgroup events
  btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()
  btrfs: don't needlessly create extent-refs kernel thread
  btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
  Btrfs: add missing extents release on file extent cluster relocation error
2019-10-23 06:14:29 -04:00
zhengbin 80da5a809d virtiofs: Remove set but not used variable 'fc'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/fuse/virtio_fs.c: In function virtio_fs_wake_pending_and_unlock:
fs/fuse/virtio_fs.c:983:20: warning: variable fc set but not used [-Wunused-but-set-variable]

It is not used since commit 7ee1e2e631 ("virtiofs: No need to check
fpq->connected state")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 10:25:17 +02:00
Dan Williams 6370740e5f fs/dax: Fix pmd vs pte conflict detection
Users reported a v5.3 performance regression and inability to establish
huge page mappings. A revised version of the ndctl "dax.sh" huge page
unit test identifies commit 23c84eb783 "dax: Fix missed wakeup with
PMD faults" as the source.

Update get_unlocked_entry() to check for NULL entries before checking
the entry order, otherwise NULL is misinterpreted as a present pte
conflict. The 'order' check needs to happen before the locked check as
an unlocked entry at the wrong order must fallback to lookup the correct
order.

Reported-by: Jeff Smits <jeff.smits@intel.com>
Reported-by: Doug Nelson <doug.nelson@intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 23c84eb783 ("dax: Fix missed wakeup with PMD faults")
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: https://lore.kernel.org/r/157167532455.3945484.11971474077040503994.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-10-22 22:53:02 -07:00
Vivek Goyal a9bfd9dd34 virtiofs: Retry request submission from worker context
If regular request queue gets full, currently we sleep for a bit and
retrying submission in submitter's context. This assumes submitter is not
holding any spin lock. But this assumption is not true for background
requests. For background requests, we are called with fc->bg_lock held.

This can lead to deadlock where one thread is trying submission with
fc->bg_lock held while request completion thread has called
fuse_request_end() which tries to acquire fc->bg_lock and gets blocked. As
request completion thread gets blocked, it does not make further progress
and that means queue does not get empty and submitter can't submit more
requests.

To solve this issue, retry submission with the help of a worker, instead of
retrying in submitter's context. We already do this for hiprio/forget
requests.

Reported-by: Chirantan Ekbote <chirantan@chromium.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:08 +02:00
Vivek Goyal c17ea00961 virtiofs: Count pending forgets as in_flight forgets
If virtqueue is full, we put forget requests on a list and these forgets
are dispatched later using a worker. As of now we don't count these forgets
in fsvq->in_flight variable. This means when queue is being drained, we
have to have special logic to first drain these pending requests and then
wait for fsvq->in_flight to go to zero.

By counting pending forgets in fsvq->in_flight, we can get rid of special
logic and just wait for in_flight to go to zero. Worker thread will kick
and drain all the forgets anyway, leading in_flight to zero.

I also need similar logic for normal request queue in next patch where I am
about to defer request submission in the worker context if queue is full.

This simplifies the code a bit.

Also add two helper functions to inc/dec in_flight. Decrement in_flight
helper will later used to call completion when in_flight reaches zero.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Vivek Goyal 5dbe190f34 virtiofs: Set FR_SENT flag only after request has been sent
FR_SENT flag should be set when request has been sent successfully sent
over virtqueue. This is used by interrupt logic to figure out if interrupt
request should be sent or not.

Also add it to fqp->processing list after sending it successfully.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Vivek Goyal 7ee1e2e631 virtiofs: No need to check fpq->connected state
In virtiofs we keep per queue connected state in virtio_fs_vq->connected
and use that to end request if queue is not connected. And virtiofs does
not even touch fpq->connected state.

We probably need to merge these two at some point of time. For now,
simplify the code a bit and do not worry about checking state of
fpq->connected.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Vivek Goyal 51fecdd255 virtiofs: Do not end request in submission context
Submission context can hold some locks which end request code tries to hold
again and deadlock can occur. For example, fc->bg_lock. If a background
request is being submitted, it might hold fc->bg_lock and if we could not
submit request (because device went away) and tried to end request, then
deadlock happens. During testing, I also got a warning from deadlock
detection code.

So put requests on a list and end requests from a worker thread.

I got following warning from deadlock detector.

[  603.137138] WARNING: possible recursive locking detected
[  603.137142] --------------------------------------------
[  603.137144] blogbench/2036 is trying to acquire lock:
[  603.137149] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse]
[  603.140701]
[  603.140701] but task is already holding lock:
[  603.140703] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse]
[  603.140713]
[  603.140713] other info that might help us debug this:
[  603.140714]  Possible unsafe locking scenario:
[  603.140714]
[  603.140715]        CPU0
[  603.140716]        ----
[  603.140716]   lock(&(&fc->bg_lock)->rlock);
[  603.140718]   lock(&(&fc->bg_lock)->rlock);
[  603.140719]
[  603.140719]  *** DEADLOCK ***

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Miklos Szeredi 6c26f71759 fuse: don't advise readdirplus for negative lookup
If the FUSE_READDIRPLUS_AUTO feature is enabled, then lookups on a
directory before/during readdir are used as an indication that READDIRPLUS
should be used instead of READDIR.  However if the lookup turns out to be
negative, then selecting READDIRPLUS makes no sense.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Miklos Szeredi 2b319d1f6f fuse: don't dereference req->args on finished request
Move the check for async request after check for the request being already
finished and done with.

Reported-by: syzbot+ae0bb7aae3de6b4594e2@syzkaller.appspotmail.com
Fixes: d49937749f ("fuse: stop copying args to fuse_req")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 09:11:40 +02:00
Chuhong Yuan 783bf7b8b6 cifs: Fix missed free operations
cifs_setattr_nounix has two paths which miss free operations
for xid and fullpath.
Use goto cifs_setattr_exit like other paths to fix them.

CC: Stable <stable@vger.kernel.org>
Fixes: aa081859b1 ("cifs: flush before set-info if we have writeable handles")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-10-20 19:19:49 -05:00
Roberto Bergantinos Corpas 03d9a9fe3f CIFS: avoid using MID 0xFFFF
According to MS-CIFS specification MID 0xFFFF should not be used by the
CIFS client, but we actually do. Besides, this has proven to cause races
leading to oops between SendReceive2/cifs_demultiplex_thread. On SMB1,
MID is a 2 byte value easy to reach in CurrentMid which may conflict with
an oplock break notification request coming from server

Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-10-20 19:19:49 -05:00
Steve French 553292a634 cifs: clarify comment about timestamp granularity for old servers
It could be confusing why we set granularity to 1 seconds rather
than 2 seconds (1 second is the max the VFS allows) for these
mounts to very old servers ...

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-10-20 19:19:49 -05:00
Paulo Alcantara (SUSE) d532cc7efd cifs: Handle -EINPROGRESS only when noblockcnt is set
We only want to avoid blocking in connect when mounting SMB root
filesystems, otherwise bail out from generic_ip_connect() so cifs.ko
can perform any reconnect failover appropriately.

This fixes DFS failover/reconnection tests in upstream buildbot.

Fixes: 8eecd1c2e5 ("cifs: Add support for root file systems")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-10-20 19:19:49 -05:00
Linus Torvalds 998d75510e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "Rather a lot of fixes, almost all affecting mm/"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits)
  scripts/gdb: fix debugging modules on s390
  kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
  mm/thp: allow dropping THP from page cache
  mm/vmscan.c: support removing arbitrary sized pages from mapping
  mm/thp: fix node page state in split_huge_page_to_list()
  proc/meminfo: fix output alignment
  mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch
  mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition
  mm: include <linux/huge_mm.h> for is_vma_temporary_stack
  zram: fix race between backing_dev_show and backing_dev_store
  mm/memcontrol: update lruvec counters in mem_cgroup_move_account
  ocfs2: fix panic due to ocfs2_wq is null
  hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
  mm: memblock: do not enforce current limit for memblock_phys* family
  mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
  mm/gup: fix a misnamed "write" argument, and a related bug
  mm/gup_benchmark: add a missing "w" to getopt string
  ocfs2: fix error handling in ocfs2_setattr()
  mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
  mm/memunmap: don't access uninitialized memmap in memunmap_pages()
  ...
2019-10-19 06:53:59 -04:00
Kirill A. Shutemov 2be5fbf9a9 proc/meminfo: fix output alignment
Patch series "Fixes for THP in page cache", v2.

This patch (of 5):

Add extra space for FileHugePages and FilePmdMapped, so the output is
aligned with other rows.

Link: http://lkml.kernel.org/r/20191017164223.2762148-2-songliubraving@fb.com
Fixes: 60fbf0ab5d ("mm,thp: stats for file backed THP")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19 06:32:32 -04:00
Yi Li b918c43021 ocfs2: fix panic due to ocfs2_wq is null
mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
an error.  ocfs2_initialize_super() returns before allocating ocfs2_wq.
ocfs2_dismount_volume() triggers the following panic.

  Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
  ------------[ cut here ]------------
  Oops: 0002 [#1] SMP NOPTI
  CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G  E
        4.14.148-200.ckv.x86_64 #1
  Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
  task: ffff967af0520000 task.stack: ffffa5f05484000
  RIP: 0010:mutex_lock+0x19/0x20
  Call Trace:
    flush_workqueue+0x81/0x460
    ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
    ocfs2_dismount_volume+0x84/0x400 [ocfs2]
    ocfs2_fill_super+0xa4/0x1270 [ocfs2]
    ? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
    mount_bdev+0x17f/0x1c0
    mount_fs+0x3a/0x160

Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.com
Signed-off-by: Yi Li <yilikernel@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19 06:32:32 -04:00
Chengguang Xu ce750f43f5 ocfs2: fix error handling in ocfs2_setattr()
Should set transfer_to[USRQUOTA/GRPQUOTA] to NULL on error case before
jumping to do dqput().

Link: http://lkml.kernel.org/r/20191010082349.1134-1-cgxu519@mykernel.net
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19 06:32:32 -04:00
David Hildenbrand aad5f69bc1 fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c
There are three places where we access uninitialized memmaps, namely:
- /proc/kpagecount
- /proc/kpageflags
- /proc/kpagecgroup

We have initialized memmaps either when the section is online or when the
page was initialized to the ZONE_DEVICE.  Uninitialized memmaps contain
garbage and in the worst case trigger kernel BUGs, especially with
CONFIG_PAGE_POISONING.

For example, not onlining a DIMM during boot and calling /proc/kpagecount
with CONFIG_PAGE_POISONING:

  :/# cat /proc/kpagecount > tmp.test
  BUG: unable to handle page fault for address: fffffffffffffffe
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
  Oops: 0000 [#1] SMP NOPTI
  CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
  RIP: 0010:kpagecount_read+0xce/0x1e0
  Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
  RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
  RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
  RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
  R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
  FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
  Call Trace:
   proc_reg_read+0x3c/0x60
   vfs_read+0xc5/0x180
   ksys_read+0x68/0xe0
   do_syscall_64+0x5c/0xa0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

For now, let's drop support for ZONE_DEVICE from the three pseudo files
in order to fix this.  To distinguish offline memory (with garbage
memmap) from ZONE_DEVICE memory with properly initialized memmaps, we
would have to check get_dev_pagemap() and pfn_zone_device_reserved()
right now.  The usage of both (especially, special casing devmem) is
frowned upon and needs to be reworked.

The fundamental issue we have is:

	if (pfn_to_online_page(pfn)) {
		/* memmap initialized */
	} else if (pfn_valid(pfn)) {
		/*
		 * ???
		 * a) offline memory. memmap garbage.
		 * b) devmem: memmap initialized to ZONE_DEVICE.
		 * c) devmem: reserved for driver. memmap garbage.
		 * (d) devmem: memmap currently initializing - garbage)
		 */
	}

We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
in place as that function is also used from memory failure.  We now no
longer dump information about pages that are not in use anymore -
offline.

Link: http://lkml.kernel.org/r/20191009142435.3975-2-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e86b]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org>	[4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19 06:32:31 -04:00
Linus Torvalds d418d07005 for-linus-2019-10-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl2qbF0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptsuEADEKL8pta74uy50pl0t8l9fZ++U+wdIeEIW
 9uumpOEPnI2GpkG1sOyKWK6tl8InQLw6pAquP9MoT2BHXqFHk7NIgtvk67lwQeoc
 dRwklVfvOLAdnKzyfODqE9Fh9BgczZIuOLzgdtNqrPKqgJfFRCwN94Kj/r2tYuy7
 v+riK3A49u12dOLtjU6ciNgZ0m1iUX9s0+PFYVUXtJHU/1OYToQaKP+sgWiue0Ca
 VJP/L4MLYD0a7tfd92WAK7xWLsYWTDw1Gg20hXH/tV+IIDQ5+OXhu2s6PuqI7c0y
 cZqWHQHBDkZMQvT8+V+YqZtEa+xwVCom51prJEPasmdq3fGx+2sDC1HQiySao1ML
 wfFxZvFvY9fm6M7p2xsSNEcOmamrx1aLLyNSbjIvAqLUDYJWWS56BHsKyTU5Z+Jp
 RA9dpq8iR6ISaIAcFf0IB0pJSv1HEeHyo/ixlALqezBFJaMdhWy/M+dEbWKtix9M
 s19ozcpe+omN9+O0anlLtzKNgj2Xnjiwuu8mhVcqn6uG/p6GUOup+lNvTW/fig3I
 JBH8kObjYXL181V9rYVqFutnuqcf2HYqMvV2vzAmg4LYnPVUmU7HMj8zEpxc4N+f
 Evd77j0wXmY9S+4JERxaqQZuvKBEIkvM1rkk3N4NbNghfa7QL4aW+I9cWtuelPC2
 E+DK7if0Gg==
 =rvkw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request from Keith that address deadlocks, double resets,
   memory leaks, and other regression.

 - Fixup elv_support_iosched() for bio based devices (Damien)

 - Fixup for the ahci PCS quirk (Dan)

 - Socket O_NONBLOCK handling fix for io_uring (me)

 - Timeout sequence io_uring fixes (yangerkun)

 - MD warning fix for parameter default_layout (Song)

 - blkcg activation fixes (Tejun)

 - blk-rq-qos node deletion fix (Tejun)

* tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-block:
  nvme-pci: Set the prp2 correctly when using more than 4k page
  io_uring: fix logic error in io_timeout
  io_uring: fix up O_NONBLOCK handling for sockets
  md/raid0: fix warning message for parameter default_layout
  libata/ahci: Fix PCS quirk application
  blk-rq-qos: fix first node deletion of rq_qos_del()
  blkcg: Fix multiple bugs in blkcg_activate_policy()
  io_uring: consider the overflow of sequence for timeout req
  nvme-tcp: fix possible leakage during error flow
  nvmet-loop: fix possible leakage during error flow
  block: Fix elv_support_iosched()
  nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
  nvme: Wait for reset state when required
  nvme: Prevent resets during paused controller state
  nvme: Restart request timers in resetting state
  nvme: Remove ADMIN_ONLY state
  nvme-pci: Free tagset if no IO queues
  nvme: retain split access workaround for capability reads
  nvme: fix possible deadlock when nvme_update_formats fails
2019-10-18 22:29:36 -04:00
Linus Torvalds b9959c7a34 filldir[64]: remove WARN_ON_ONCE() for bad directory entries
This was always meant to be a temporary thing, just for testing and to
see if it actually ever triggered.

The only thing that reported it was syzbot doing disk image fuzzing, and
then that warning is expected.  So let's just remove it before -rc4,
because the extra sanity testing should probably go to -stable, but we
don't want the warning to do so.

Reported-by: syzbot+3031f712c7ad5dd4d926@syzkaller.appspotmail.com
Fixes: 8a23eb804c ("Make filldir[64]() verify the directory entry filename is valid")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-18 18:41:16 -04:00
Linus Torvalds 6b95cf9b8b A future-proofing decoding fix from Jeff intended for stable and
a patch for a mostly benign race from Dongsheng.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl2qAEkTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi4mZB/9HMfEfZ8JC9keyVaJpAyvV8ufTR4qs
 4b8NNc0MDM01z1Z23G0o89b5M0WEDcGslh25plCifxyNIMa+L/lYKl8CTr7CLVQS
 qCEtNgJ7ibfM26v7rfHOlk6Nnd07/OmjcioaHu/R3bqEQmXpcWQg+aX9C6mPh/2f
 yzZTKZdKhTZfUyQQctuhNo9G+wD8K86DYT1XRbubPNQ3VtXKPuNH9rLhvLCZzbVA
 6FHW05A4mwSv80MsLgN6qLSKxv/+LjV/voHepH4HygqUKw2+1lwi9BC/4k7sprQs
 1jFONZ0p1sv/LdWwJYUyCpwj6d3NliXM0uvYxfyzKveWWCxb3l3gaWUS
 =7scd
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A future-proofing decoding fix from Jeff intended for stable and a
  patch for a mostly benign race from Dongsheng"

* tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client:
  rbd: cancel lock_dwork if the wait is interrupted
  ceph: just skip unrecognized info in ceph_reply_info_extra
2019-10-18 18:30:09 -04:00
yangerkun 8b07a65ad3 io_uring: fix logic error in io_timeout
If ctx->cached_sq_head < nxt_sq_head, we should add UINT_MAX to tmp, not
tmp_nxt.

Fixes: 5da0fb1ab3 ("io_uring: consider the overflow of sequence for timeout req")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-17 15:49:15 -06:00
Jens Axboe 491381ce07 io_uring: fix up O_NONBLOCK handling for sockets
We've got two issues with the non-regular file handling for non-blocking
IO:

1) We don't want to re-do a short read in full for a non-regular file,
   as we can't just read the data again.
2) For non-regular files that don't support non-blocking IO attempts,
   we need to punt to async context even if the file is opened as
   non-blocking. Otherwise the caller always gets -EAGAIN.

Add two new request flags to handle these cases. One is just a cache
of the inode S_ISREG() status, the other tells io_uring that we always
need to punt this request to async context, even if REQ_F_NOWAIT is set.

Cc: stable@vger.kernel.org
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-17 15:49:11 -06:00
Linus Torvalds 6e8ba0098e Changes since last update:
- Fix a timestamp signedness problem in the new bulkstat ioctl.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl2l7R8ACgkQ+H93GTRK
 tOtFnw/9GyBxDqODfne2puAb3nISogzVDVRxzo2Ip4fH+djQ8XF0nZgaVmdK45xm
 lD0VNCMT7tb7YA9rIKb+G8WdZDfq5XUs2QOR8RCqg2rCy5Kxww8B0e83ziJ/hz8C
 ebmfuBpNoV3Ul8I4vPN2Qb2nN2wHo8LqMi3GgK3nmv92xuL7Fignh3AxForHSFRz
 VUQF+canS5IQo2KkbKsP92X1r+nwTG/8GL3WNicm4BgeFHcZbMKJn32OSpZHUSGb
 RPEpWD1TAROhjOejFpEqEXmU8x8o0BtErgo3dBgGQTntrq9gOPgOuoot3cyweHZK
 JEtxglnIIbJs0B0sru9ZuN7RkoXZYJjDaTOU8uLst8dwVaoX2kw8siWjQimadKI4
 aI323yHpTTjuzZa2PQRU682fNyVNwcDfQnOss28xQG1wG9Pjxo39AZqhpgR1GtX4
 kgYBVr6oJ/5HrZ5KFghBnjE1+jx4F4axAhgKqmbsg712mZcr1EM4TlFfd/yX6+Sd
 ihHlgSeIiZk+2fthnUwRvPuGSYIimtUAXY5wgRzXnqmCFxy3rPafLo41pPkCcYV4
 wCHdh6YR1H0/wJ9K2iUMxKKrSdu4PcdaprlD7aX7j3u727gybqOqIhaK6GdlxQg8
 zyUZIGQEyR6uTpbtC4xNttZ3OcR5w5u7JjIn6TR3xIws4TkVMwU=
 =WVUx
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.4-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Darrick Wong:
 "The single fix converts the seconds field in the recently added XFS
  bulkstat structure to a signed 64-bit quantity.

  The structure layout doesn't change and so far there are no users of
  the ioctl to break because we only publish xfs ioctl interfaces
  through the XFS userspace development libraries, and we're still
  working on a 5.3 release"

* tag 'xfs-5.4-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: change the seconds fields in xfs_bulkstat to signed
2019-10-17 14:19:52 -07:00
Filipe Manana ba0b084ac3 Btrfs: check for the full sync flag while holding the inode lock during fsync
We were checking for the full fsync flag in the inode before locking the
inode, which is racy, since at that that time it might not be set but
after we acquire the inode lock some other task set it. One case where
this can happen is on a system low on memory and some concurrent task
failed to allocate an extent map and therefore set the full sync flag on
the inode, to force the next fsync to work in full mode.

A consequence of missing the full fsync flag set is hitting the problems
fixed by commit 0c713cbab6 ("Btrfs: fix race between ranged fsync and
writeback of adjacent ranges"), BUG_ON() when dropping extents from a log
tree, hitting assertion failures at tree-log.c:copy_items() or all sorts
of weird inconsistencies after replaying a log due to file extents items
representing ranges that overlap.

So just move the check such that it's done after locking the inode and
before starting writeback again.

Fixes: 0c713cbab6 ("Btrfs: fix race between ranged fsync and writeback of adjacent ranges")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-17 20:36:02 +02:00
Filipe Manana c7967fc149 Btrfs: fix qgroup double free after failure to reserve metadata for delalloc
If we fail to reserve metadata for delalloc operations we end up releasing
the previously reserved qgroup amount twice, once explicitly under the
'out_qgroup' label by calling btrfs_qgroup_free_meta_prealloc() and once
again, under label 'out_fail', by calling btrfs_inode_rsv_release() with a
value of 'true' for its 'qgroup_free' argument, which results in
btrfs_qgroup_free_meta_prealloc() being called again, so we end up having
a double free.

Also if we fail to reserve the necessary qgroup amount, we jump to the
label 'out_fail', which calls btrfs_inode_rsv_release() and that in turns
calls btrfs_qgroup_free_meta_prealloc(), even though we weren't able to
reserve any qgroup amount. So we freed some amount we never reserved.

So fix this by removing the call to btrfs_inode_rsv_release() in the
failure path, since it's not necessary at all as we haven't changed the
inode's block reserve in any way at this point.

Fixes: c8eaeac7b7 ("btrfs: reserve delalloc metadata differently")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-17 20:13:44 +02:00
Qu Wenruo fd2b007eae btrfs: tracepoints: Fix wrong parameter order for qgroup events
[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:

  qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2

The diff should always be alinged to sector size (4k), so there is
definitely something wrong.

[CAUSE]
For the wrong @diff, it's caused by wrong parameter order.
The correct parameters are:

  struct btrfs_root, s64 diff, int type.

However the parameters used are:

  struct btrfs_root, int type, s64 diff.

Fixes: 4ee0d8832c ("btrfs: qgroup: Update trace events for metadata reservation")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-17 14:09:31 +02:00