1
0
Fork 0
Commit Graph

1042 Commits (e5194ee4bf62f4187c64e8c5755d53066c8851f5)

Author SHA1 Message Date
Miklos Szeredi cbc5b45148 fuse: verify attributes
commit eb59bd17d2 upstream.

If a filesystem returns negative inode sizes, future reads on the file were
causing the cpu to spin on truncate_pagecache.

Create a helper to validate the attributes.  This now does two things:

 - check the file mode
 - check if the file size fits in i_size without overflowing

Reported-by: Arijit Banerjee <arijit@rubrik.com>
Fixes: d8a5ba4545 ("[PATCH] FUSE - core")
Cc: <stable@vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:31 +01:00
Miklos Szeredi 8aa5c23ef8 fuse: verify write return
commit 8aab336b14 upstream.

Make sure filesystem is not returning a bogus number of bytes written.

Fixes: ea9b9907b8 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org> # v2.6.26
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:31 +01:00
Miklos Szeredi ba916a1310 fuse: verify nlink
commit c634da718d upstream.

When adding a new hard link, make sure that i_nlink doesn't overflow.

Fixes: ac45d61357 ("fuse: fix nlink after unlink")
Cc: <stable@vger.kernel.org> # v3.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:30 +01:00
Miklos Szeredi a266e9072a fuse: fix leak of fuse_io_priv
commit f1ebdeffc6 upstream.

exit_aio() is sometimes stuck in wait_for_completion() after aio is issued
with direct IO and the task receives a signal.

The reason is failure to call ->ki_complete() due to a leaked reference to
fuse_io_priv.  This happens in fuse_async_req_send() if
fuse_simple_background() returns an error (e.g. -EINTR).

In this case the error value is propagated via io->err, so return success
to not confuse callers.

This issue is tracked as a virtio-fs issue:
https://gitlab.com/virtio-fs/qemu/issues/14

Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Fixes: 45ac96ed7c ("fuse: convert direct_io to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13 08:42:30 +01:00
Vasily Averin 091d1a7267 fuse: redundant get_fuse_inode() calls in fuse_writepages_fill()
Currently fuse_writepages_fill() calls get_fuse_inode() few times with
the same argument.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 14:26:37 +02:00
Miklos Szeredi e4648309b8 fuse: truncate pending writes on O_TRUNC
Make sure cached writes are not reordered around open(..., O_TRUNC), with
the obvious wrong results.

Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 14:26:37 +02:00
Miklos Szeredi b24e7598db fuse: flush dirty data/metadata before non-truncate setattr
If writeback cache is enabled, then writes might get reordered with
chmod/chown/utimes.  The problem with this is that performing the write in
the fuse daemon might itself change some of these attributes.  In such case
the following sequence of operations will result in file ending up with the
wrong mode, for example:

  int fd = open ("suid", O_WRONLY|O_CREAT|O_EXCL);
  write (fd, "1", 1);
  fchown (fd, 0, 0);
  fchmod (fd, 04755);
  close (fd);

This patch fixes this by flushing pending writes before performing
chown/chmod/utimes.

Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 14:26:37 +02:00
zhengbin 80da5a809d virtiofs: Remove set but not used variable 'fc'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/fuse/virtio_fs.c: In function virtio_fs_wake_pending_and_unlock:
fs/fuse/virtio_fs.c:983:20: warning: variable fc set but not used [-Wunused-but-set-variable]

It is not used since commit 7ee1e2e631 ("virtiofs: No need to check
fpq->connected state")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-23 10:25:17 +02:00
Vivek Goyal a9bfd9dd34 virtiofs: Retry request submission from worker context
If regular request queue gets full, currently we sleep for a bit and
retrying submission in submitter's context. This assumes submitter is not
holding any spin lock. But this assumption is not true for background
requests. For background requests, we are called with fc->bg_lock held.

This can lead to deadlock where one thread is trying submission with
fc->bg_lock held while request completion thread has called
fuse_request_end() which tries to acquire fc->bg_lock and gets blocked. As
request completion thread gets blocked, it does not make further progress
and that means queue does not get empty and submitter can't submit more
requests.

To solve this issue, retry submission with the help of a worker, instead of
retrying in submitter's context. We already do this for hiprio/forget
requests.

Reported-by: Chirantan Ekbote <chirantan@chromium.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:08 +02:00
Vivek Goyal c17ea00961 virtiofs: Count pending forgets as in_flight forgets
If virtqueue is full, we put forget requests on a list and these forgets
are dispatched later using a worker. As of now we don't count these forgets
in fsvq->in_flight variable. This means when queue is being drained, we
have to have special logic to first drain these pending requests and then
wait for fsvq->in_flight to go to zero.

By counting pending forgets in fsvq->in_flight, we can get rid of special
logic and just wait for in_flight to go to zero. Worker thread will kick
and drain all the forgets anyway, leading in_flight to zero.

I also need similar logic for normal request queue in next patch where I am
about to defer request submission in the worker context if queue is full.

This simplifies the code a bit.

Also add two helper functions to inc/dec in_flight. Decrement in_flight
helper will later used to call completion when in_flight reaches zero.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Vivek Goyal 5dbe190f34 virtiofs: Set FR_SENT flag only after request has been sent
FR_SENT flag should be set when request has been sent successfully sent
over virtqueue. This is used by interrupt logic to figure out if interrupt
request should be sent or not.

Also add it to fqp->processing list after sending it successfully.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Vivek Goyal 7ee1e2e631 virtiofs: No need to check fpq->connected state
In virtiofs we keep per queue connected state in virtio_fs_vq->connected
and use that to end request if queue is not connected. And virtiofs does
not even touch fpq->connected state.

We probably need to merge these two at some point of time. For now,
simplify the code a bit and do not worry about checking state of
fpq->connected.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Vivek Goyal 51fecdd255 virtiofs: Do not end request in submission context
Submission context can hold some locks which end request code tries to hold
again and deadlock can occur. For example, fc->bg_lock. If a background
request is being submitted, it might hold fc->bg_lock and if we could not
submit request (because device went away) and tried to end request, then
deadlock happens. During testing, I also got a warning from deadlock
detection code.

So put requests on a list and end requests from a worker thread.

I got following warning from deadlock detector.

[  603.137138] WARNING: possible recursive locking detected
[  603.137142] --------------------------------------------
[  603.137144] blogbench/2036 is trying to acquire lock:
[  603.137149] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse]
[  603.140701]
[  603.140701] but task is already holding lock:
[  603.140703] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse]
[  603.140713]
[  603.140713] other info that might help us debug this:
[  603.140714]  Possible unsafe locking scenario:
[  603.140714]
[  603.140715]        CPU0
[  603.140716]        ----
[  603.140716]   lock(&(&fc->bg_lock)->rlock);
[  603.140718]   lock(&(&fc->bg_lock)->rlock);
[  603.140719]
[  603.140719]  *** DEADLOCK ***

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Miklos Szeredi 6c26f71759 fuse: don't advise readdirplus for negative lookup
If the FUSE_READDIRPLUS_AUTO feature is enabled, then lookups on a
directory before/during readdir are used as an indication that READDIRPLUS
should be used instead of READDIR.  However if the lookup turns out to be
negative, then selecting READDIRPLUS makes no sense.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 15:57:07 +02:00
Miklos Szeredi 2b319d1f6f fuse: don't dereference req->args on finished request
Move the check for async request after check for the request being already
finished and done with.

Reported-by: syzbot+ae0bb7aae3de6b4594e2@syzkaller.appspotmail.com
Fixes: d49937749f ("fuse: stop copying args to fuse_req")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-21 09:11:40 +02:00
Miklos Szeredi 3f22c74671 virtio-fs: don't show mount options
Virtio-fs does not accept any mount options, so it's confusing and wrong to
show any in /proc/mounts.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com> 
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-15 16:11:41 +02:00
Vivek Goyal 112e72373d virtio-fs: Change module name to virtiofs.ko
We have been calling it virtio_fs and even file name is virtio_fs.c. Module
name is virtio_fs.ko but when registering file system user is supposed to
specify filesystem type as "virtiofs".

Masayoshi Mizuma reported that he specified filesytem type as "virtio_fs"
and got this warning on console.

  ------------[ cut here ]------------
  request_module fs-virtio_fs succeeded, but still no fs?
  WARNING: CPU: 1 PID: 1234 at fs/filesystems.c:274 get_fs_type+0x12c/0x138
  Modules linked in: ... virtio_fs fuse virtio_net net_failover ...
  CPU: 1 PID: 1234 Comm: mount Not tainted 5.4.0-rc1 #1

So looks like kernel could find the module virtio_fs.ko but could not find
filesystem type after that.

It probably is better to rename module name to virtiofs.ko so that above
warning goes away in case user ends up specifying wrong fs name.

Reported-by: Masayoshi Mizuma <msys.mizuma@gmail.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-14 10:20:33 +02:00
Linus Torvalds 8f744bdee4 add virtio-fs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXYx2zAAKCRDh3BK/laaZ
 PFpHAQD2G+F8a9e41jFTJg5YpNKMD8/Pl4T6v9chIO9qPXF2IAEAji0P1JterRfv
 ixiBhv54hSwYbk527nxNWE9tP5gAHAQ=
 =WCHy
 -----END PGP SIGNATURE-----

Merge tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse virtio-fs support from Miklos Szeredi:
 "Virtio-fs allows exporting directory trees on the host and mounting
  them in guest(s).

  This isn't actually a new filesystem, but a glue layer between the
  fuse filesystem and a virtio based back-end.

  It's similar in functionality to the existing virtio-9p solution, but
  significantly faster in benchmarks and has better POSIX compliance.
  Further permformance improvements can be achieved by sharing the page
  cache between host and guest, allowing for faster I/O and reduced
  memory use.

  Kata Containers have been including the out-of-tree virtio-fs (with
  the shared page cache patches as well) since version 1.7 as an
  experimental feature. They have been active in development and plan to
  switch from virtio-9p to virtio-fs as their default solution. There
  has been interest from other sources as well.

  The userspace infrastructure is slated to be merged into qemu once the
  kernel part hits mainline.

  This was developed by Vivek Goyal, Dave Gilbert and Stefan Hajnoczi"

* tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  virtio-fs: add virtiofs filesystem
  virtio-fs: add Documentation/filesystems/virtiofs.rst
  fuse: reserve values for mapping protocol
2019-09-27 15:54:24 -07:00
YueHaibing 5addcd5dbd fuse: Make fuse_args_to_req static
Fix sparse warning:

fs/fuse/dev.c:468:6: warning: symbol 'fuse_args_to_req' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 68583165f9 ("fuse: add pages to fuse_args")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:02 +02:00
zhengbin 9ad09b1976 fuse: fix memleak in cuse_channel_open
If cuse_send_init fails, need to fuse_conn_put cc->fc.

cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1)
                 ->fuse_dev_alloc->fuse_conn_get
                 ->fuse_dev_free->fuse_conn_put

Fixes: cc080e9e9b ("fuse: introduce per-instance fuse_dev structure")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00
Tejun Heo e5854b1cdf fuse: fix beyond-end-of-page access in fuse_parse_cache()
With DEBUG_PAGEALLOC on, the following triggers.

  BUG: unable to handle page fault for address: ffff88859367c000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 3001067 P4D 3001067 PUD 406d3a8067 PMD 406d30c067 PTE 800ffffa6c983060
  Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
  CPU: 38 PID: 3110657 Comm: python2.7
  RIP: 0010:fuse_readdir+0x88f/0xe7a [fuse]
  Code: 49 8b 4d 08 49 39 4e 60 0f 84 44 04 00 00 48 8b 43 08 43 8d 1c 3c 4d 01 7e 68 49 89 dc 48 03 5c 24 38 49 89 46 60 8b 44 24 30 <8b> 4b 10 44 29 e0 48 89 ca 48 83 c1 1f 48 83 e1 f8 83 f8 17 49 89
  RSP: 0018:ffffc90035edbde0 EFLAGS: 00010286
  RAX: 0000000000001000 RBX: ffff88859367bff0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffff88859367bfed RDI: 0000000000920907
  RBP: ffffc90035edbe90 R08: 000000000000014b R09: 0000000000000004
  R10: ffff88859367b000 R11: 0000000000000000 R12: 0000000000000ff0
  R13: ffffc90035edbee0 R14: ffff889fb8546180 R15: 0000000000000020
  FS:  00007f80b5f4a740(0000) GS:ffff889fffa00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff88859367c000 CR3: 0000001c170c2001 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   iterate_dir+0x122/0x180
   __x64_sys_getdents+0xa6/0x140
   do_syscall_64+0x42/0x100
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

It's in fuse_parse_cache().  %rbx (ffff88859367bff0) is fuse_dirent
pointer - addr + offset.  FUSE_DIRENT_SIZE() is trying to dereference
namelen off of it but that derefs into the next page which is disabled
by pagealloc debug causing a PF.

This is caused by dirent->namelen being accessed before ensuring that
there's enough bytes in the page for the dirent.  Fix it by pushing
down reclen calculation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5d7bc7e868 ("fuse: allow using readdir cache")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00
Arnd Bergmann 0ed4059302 fuse: unexport fuse_put_request
This function has been made static, which now causes a compile-time
warning:

WARNING: "fuse_put_request" [vmlinux] is a static EXPORT_SYMBOL_GPL

Remove the unneeded export.

Fixes: 66abc3599c ("fuse: unexport request ops")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00
Khazhismel Kumykov dc69e98c24 fuse: kmemcg account fs data
account per-file, dentry, and inode data

blockdev/superblock and temporary per-request data was left alone, as
this usually isn't accounted

Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00
Khazhismel Kumykov 30c6a23d34 fuse: on 64-bit store time in d_fsdata directly
Implements the optimization noted in commit f75fdf22b0 ("fuse: don't
use ->d_time"), as the additional memory can be significant.  (In
particular, on SLAB configurations this 8-byte alloc becomes 32 bytes).
Per-dentry, this can consume significant memory.

Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00
Vasily Averin d5880c7a86 fuse: fix missing unlock_page in fuse_writepage()
unlock_page() was missing in case of an already in-flight write against the
same page.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Fixes: ff17be0864 ("fuse: writepage: skip already in flight")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-24 15:28:01 +02:00
Stefan Hajnoczi a62a8ef9d9 virtio-fs: add virtiofs filesystem
Add a basic file system module for virtio-fs.  This does not yet contain
shared data support between host and guest or metadata coherency speedups.
However it is already significantly faster than virtio-9p.

Design Overview
===============

With the goal of designing something with better performance and local file
system semantics, a bunch of ideas were proposed.

 - Use fuse protocol (instead of 9p) for communication between guest and
   host.  Guest kernel will be fuse client and a fuse server will run on
   host to serve the requests.

 - For data access inside guest, mmap portion of file in QEMU address space
   and guest accesses this memory using dax.  That way guest page cache is
   bypassed and there is only one copy of data (on host).  This will also
   enable mmap(MAP_SHARED) between guests.

 - For metadata coherency, there is a shared memory region which contains
   version number associated with metadata and any guest changing metadata
   updates version number and other guests refresh metadata on next access.
   This is yet to be implemented.

How virtio-fs differs from existing approaches
==============================================

The unique idea behind virtio-fs is to take advantage of the co-location of
the virtual machine and hypervisor to avoid communication (vmexits).

DAX allows file contents to be accessed without communication with the
hypervisor.  The shared memory region for metadata avoids communication in
the common case where metadata is unchanged.

By replacing expensive communication with cheaper shared memory accesses,
we expect to achieve better performance than approaches based on network
file system protocols.  In addition, this also makes it easier to achieve
local file system semantics (coherency).

These techniques are not applicable to network file system protocols since
the communications channel is bypassed by taking advantage of shared memory
on a local machine.  This is why we decided to build virtio-fs rather than
focus on 9P or NFS.

Caching Modes
=============

Like virtio-9p, different caching modes are supported which determine the
coherency level as well.  The “cache=FOO” and “writeback” options control
the level of coherence between the guest and host filesystems.

 - cache=none
   metadata, data and pathname lookup are not cached in guest.  They are
   always fetched from host and any changes are immediately pushed to host.

 - cache=always
   metadata, data and pathname lookup are cached in guest and never expire.

 - cache=auto
   metadata and pathname lookup cache expires after a configured amount of
   time (default is 1 second).  Data is cached while the file is open
   (close to open consistency).

 - writeback/no_writeback
   These options control the writeback strategy.  If writeback is disabled,
   then normal writes will immediately be synchronized with the host fs.
   If writeback is enabled, then writes may be cached in the guest until
   the file is closed or an fsync(2) performed.  This option has no effect
   on mmap-ed writes or writes going through the DAX mechanism.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-18 20:17:50 +02:00
Vivek Goyal 15c8e72e88 fuse: allow skipping control interface and forced unmount
virtio-fs does not support aborting requests which are being
processed. That is requests which have been sent to fuse daemon on host.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Miklos Szeredi 783863d647 fuse: dissociate DESTROY from fuseblk
Allow virtio-fs to also send DESTROY request.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Miklos Szeredi 8fab010644 fuse: delete dentry if timeout is zero
Don't hold onto dentry in lru list if need to re-lookup it anyway at next
access.  Only do this if explicitly enabled, otherwise it could result in
performance regression.

More advanced version of this patch would periodically flush out dentries
from the lru which have gone stale.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Vivek Goyal 0cd1eb9a41 fuse: separate fuse device allocation and installation in fuse_conn
As of now fuse_dev_alloc() both allocates a fuse device and installs it in
fuse_conn list.  fuse_dev_alloc() can fail if fuse_device allocation fails.

virtio-fs needs to initialize multiple fuse devices (one per virtio queue).
It initializes one fuse device as part of call to fuse_fill_super_common()
and rest of the devices are allocated and installed after that.

But, we can't afford to fail after calling fuse_fill_super_common() as we
don't have a way to undo all the actions done by fuse_fill_super_common().
So to avoid failures after the call to fuse_fill_super_common(),
pre-allocate all fuse devices early and install them into fuse connection
later.

This patch provides two separate helpers for fuse device allocation and
fuse device installation in fuse_conn.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Stefan Hajnoczi ae3aad77f4 fuse: add fuse_iqueue_ops callbacks
The /dev/fuse device uses fiq->waitq and fasync to signal that requests are
available.  These mechanisms do not apply to virtio-fs.  This patch
introduces callbacks so alternative behavior can be used.

Note that queue_interrupt() changes along these lines:

  spin_lock(&fiq->waitq.lock);
  wake_up_locked(&fiq->waitq);
+ kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
  spin_unlock(&fiq->waitq.lock);
- kill_fasync(&fiq->fasync, SIGIO, POLL_IN);

Since queue_request() and queue_forget() also call kill_fasync() inside
the spinlock this should be safe.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:41 +02:00
Stefan Hajnoczi 0cc2656cdb fuse: extract fuse_fill_super_common()
fuse_fill_super() includes code to process the fd= option and link the
struct fuse_dev to the fd's struct file.  In virtio-fs there is no file
descriptor because /dev/fuse is not used.

This patch extracts fuse_fill_super_common() so that both classic fuse and
virtio-fs can share the code to initialize a mount.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00
Vivek Goyal 4388c5aac4 fuse: export fuse_dequeue_forget() function
File systems like virtio-fs need to do not have to play directly with
forget list data structures. There is a helper function use that instead.

Rename dequeue_forget() to fuse_dequeue_forget() and export it so that
stacked filesystems can use it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00
Stefan Hajnoczi 79d96efffd fuse: export fuse_get_unique()
virtio-fs will need unique IDs for FORGET requests from outside
fs/fuse/dev.c.  Make the symbol visible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00
Vivek Goyal 95a84cdb11 fuse: export fuse_send_init_request()
This will be used by virtio-fs to send init request to fuse server after
initialization of virt queues.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00
Stefan Hajnoczi 14d46d7abc fuse: export fuse_len_args()
virtio-fs will need to query the length of fuse_arg lists.  Make the symbol
visible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00
Stefan Hajnoczi 04ec5af077 fuse: export fuse_end_request()
virtio-fs will need to complete requests from outside fs/fuse/dev.c.  Make
the symbol visible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00
Miklos Szeredi f22f812d5c fuse: fix request limit
The size of struct fuse_req was reduced from 392B to 144B on a non-debug
config, thus the sanitize_global_limit() helper was setting a larger
default limit.  This doesn't really reflect reduction in the memory used by
requests, since the fields removed from fuse_req were added to fuse_args
derived structs; e.g. sizeof(struct fuse_writepages_args) is 248B, thus
resulting in slightly more memory being used for writepage requests
overalll (due to using 256B slabs).

Make the calculatation ignore the size of fuse_req and use the old 392B
value.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12 14:59:40 +02:00
Miklos Szeredi 05ea48cc2b fuse: stop copying pages to fuse_req
The page array pointers are also duplicated across fuse_args_pages and
fuse_req.  Get rid of the fuse_req ones.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi d49937749f fuse: stop copying args to fuse_req
No need to duplicate the argument arrays in fuse_req, so just dereference
req->args instead of copying to the fuse_req internal ones.

This allows further cleanup of the fuse_req structure.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi 145b673bd2 fuse: clean up fuse_req
Get rid of request specific fields in fuse_req that are not used anymore.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi 7213394c4e fuse: simplify request allocation
Page arrays are not allocated together with the request anymore.  Get rid
of the dead code

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi 66abc3599c fuse: unexport request ops
All requests are now sent with one of the fuse_simple_... helpers.  Get rid
of the old api from the fuse internal header.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi 75b399dda5 fuse: convert retrieve to simple api
Rename fuse_request_send_notify_reply() to fuse_simple_notify_reply() and
convert to passing fuse_args instead of fuse_req.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi 4cb548666e fuse: convert release to simple api
Since we cannot reserve the request structure up-front, make sure that the
request allocation doesn't fail using __GFP_NOFAIL.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:50 +02:00
Miklos Szeredi b50ef7c52a cuse: convert init to simple api
This is a straightforward conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:49 +02:00
Miklos Szeredi 615047eff1 fuse: convert init to simple api
Bypass the fc->initialized check by setting the force flag.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:49 +02:00
Miklos Szeredi 33826ebbbe fuse: convert writepages to simple api
Derive fuse_writepage_args from fuse_io_args.

Sending the request is tricky since it was done with fi->lock held, hence
we must either use atomic allocation or release the lock.  Both are
possible so try atomic first and if it fails, release the lock and do the
regular allocation with GFP_NOFS and __GFP_NOFAIL.  Both flags are
necessary for correct operation.

Move the page realloc function from dev.c to file.c and convert to using
fuse_writepage_args.

The last caller of fuse_write_fill() is gone, so get rid of it.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:49 +02:00
Miklos Szeredi 43f5098eb8 fuse: convert readdir to simple api
The old fuse_read_fill() helper can be deleted, now that the last user is
gone.

The fuse_io_args struct is moved to fuse_i.h so it can be shared between
readdir/read code.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:49 +02:00
Miklos Szeredi 134831e36b fuse: convert readpages to simple api
Need to extend fuse_io_args with 'attr_ver' and 'ff' members, that take the
functionality of the same named members in fuse_req.

fuse_short_read() can now take struct fuse_args_pages.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10 16:29:49 +02:00