1
0
Fork 0
Commit Graph

901474 Commits (60cf46ae605446feb0c43c472c0fd1af4cd96231)

Author SHA1 Message Date
Pavel Begunkov 60cf46ae60 io-wq: hash dependent work
Enable io-wq hashing stuff for dependent works simply by re-enqueueing
such requests.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-14 17:02:30 -06:00
Pavel Begunkov 8766dd516c io-wq: split hashing and enqueueing
It's a preparation patch removing io_wq_enqueue_hashed(), which
now should be done by io_wq_hash_work() + io_wq_enqueue().

Also, set hash value for dependant works, and do it as late as possible,
because req->file can be unavailable before. This hash will be ignored
by io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-14 17:02:28 -06:00
Pavel Begunkov d78298e73a io-wq: don't resched if there is no work
This little tweak restores the behaviour that was before the recent
io_worker_handle_work() optimisation patches. It makes the function do
cond_resched() and flush_signals() only if there is an actual work to
execute.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-14 17:02:26 -06:00
Pavel Begunkov 2293b41958 io-wq: remove duplicated cancel code
Deduplicate cancellation parts, as many of them looks the same, as do
e.g.
- io_wqe_cancel_cb_work() and io_wqe_cancel_work()
- io_wq_worker_cancel() and io_work_cancel()

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-12 07:50:22 -06:00
Jens Axboe 3f9d64415f io_uring: fix truncated async read/readv and write/writev retry
Ensure we keep the truncated value, if we did truncate it. If not, we
might read/write more than the registered buffer size.

Also for retry, ensure that we return the truncated mapped value for
the vectorized versions of the read/write commands.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-11 12:29:15 -06:00
Jens Axboe bbbdeb4720 io_uring: dual license io_uring.h uapi header
This just syncs the header it with the liburing version, so there's no
confusion on the license of the header parts.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-11 07:45:46 -06:00
Xiaoguang Wang 32b2244a84 io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled
When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, applications don't need
to do io completion events polling again, they can rely on io_sq_thread to do
polling work, which can reduce cpu usage and uring_lock contention.

I modify fio io_uring engine codes a bit to evaluate the performance:
static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
                        continue;
                }

-               if (!o->sqpoll_thread) {
+               if (o->sqpoll_thread && o->hipri) {
                        r = io_uring_enter(ld, 0, actual_min,
                                                IORING_ENTER_GETEVENTS);
                        if (r < 0) {

and use "fio  -name=fiotest -filename=/dev/nvme0n1 -iodepth=$depth -thread
-rw=read -ioengine=io_uring  -hipri=1 -sqthread_poll=1  -direct=1 -bs=4k
-size=10G -numjobs=1  -time_based -runtime=120"

original codes
--------------------------------------------------------------------
iodepth       |        4 |        8 |       16 |       32 |       64
bw            | 1133MB/s | 1519MB/s | 2090MB/s | 2710MB/s | 3012MB/s
fio cpu usage |     100% |     100% |     100% |     100% |     100%
--------------------------------------------------------------------

with patch
--------------------------------------------------------------------
iodepth       |        4 |        8 |       16 |       32 |       64
bw            | 1196MB/s | 1721MB/s | 2351MB/s | 2977MB/s | 3357MB/s
fio cpu usage |    63.8% |   74.4%% |    81.1% |    83.7% |    82.4%
--------------------------------------------------------------------
bw improve    |     5.5% |    13.2% |    12.3% |     9.8% |    11.5%
--------------------------------------------------------------------

From above test results, we can see that bw has above 5.5%~13%
improvement, and fio process's cpu usage also drops much. Note this
won't improve io_sq_thread's cpu usage when SETUP_IOPOLL|SETUP_SQPOLL
are both enabled, in this case, io_sq_thread always has 100% cpu usage.
I think this patch will be friendly to applications which will often use
io_uring_wait_cqe() or similar from liburing.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-11 07:14:12 -06:00
YueHaibing 469956e853 io_uring: Fix unused function warnings
If CONFIG_NET is not set, gcc warns:

fs/io_uring.c:3110:12: warning: io_setup_async_msg defined but not used [-Wunused-function]
 static int io_setup_async_msg(struct io_kiocb *req,
            ^~~~~~~~~~~~~~~~~~

There are many funcions wraped by CONFIG_NET, move them
together to simplify code, also fix this warning.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Minor tweaks.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:56 -06:00
Jens Axboe 84557871f2 io_uring: add end-of-bits marker and build time verify it
Not easy to tell if we're going over the size of bits we can shove
in req->flags, so add an end-of-bits marker and a BUILD_BUG_ON()
check for it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:56 -06:00
Jens Axboe 067524e914 io_uring: provide means of removing buffers
We have IORING_OP_PROVIDE_BUFFERS, but the only way to remove buffers
is to trigger IO on them. The usual case of shrinking a buffer pool
would be to just not replenish the buffers when IO completes, and
instead just free it. But it may be nice to have a way to manually
remove a number of buffers from a given group, and
IORING_OP_REMOVE_BUFFERS provides that functionality.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:56 -06:00
Jens Axboe 52de1fe122 io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG
Like IORING_OP_READV, this is limited to supporting just a single
segment in the iovec passed in.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:51 -06:00
Jens Axboe 0a384abfae net: abstract out normal and compat msghdr import
This splits it into two parts, one that imports the message, and one
that imports the iovec. This allows a caller to only do the first part,
and import the iovec manually afterwards.

No functional changes in this patch.

Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:49 -06:00
Jens Axboe 4d954c258a io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_READV
This adds support for the vectored read. This is limited to supporting
just 1 segment in the iov, and is provided just for convenience for
applications that use IORING_OP_READV already.

The iov helpers will be used for IORING_OP_RECVMSG as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:48 -06:00
Jens Axboe bcda7baaa3 io_uring: support buffer selection for OP_READ and OP_RECV
If a server process has tons of pending socket connections, generally
it uses epoll to wait for activity. When the socket is ready for reading
(or writing), the task can select a buffer and issue a recv/send on the
given fd.

Now that we have fast (non-async thread) support, a task can have tons
of pending reads or writes pending. But that means they need buffers to
back that data, and if the number of connections is high enough, having
them preallocated for all possible connections is unfeasible.

With IORING_OP_PROVIDE_BUFFERS, an application can register buffers to
use for any request. The request then sets IOSQE_BUFFER_SELECT in the
sqe, and a given group ID in sqe->buf_group. When the fd becomes ready,
a free buffer from the specified group is selected. If none are
available, the request is terminated with -ENOBUFS. If successful, the
CQE on completion will contain the buffer ID chosen in the cqe->flags
member, encoded as:

	(buffer_id << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER;

Once a buffer has been consumed by a request, it is no longer available
and must be registered again with IORING_OP_PROVIDE_BUFFERS.

Requests need to support this feature. For now, IORING_OP_READ and
IORING_OP_RECV support it. This is checked on SQE submission, a CQE with
res == -EOPNOTSUPP will be posted if attempted on unsupported requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:45 -06:00
Jens Axboe ddf0322db7 io_uring: add IORING_OP_PROVIDE_BUFFERS
IORING_OP_PROVIDE_BUFFERS uses the buffer registration infrastructure to
support passing in an addr/len that is associated with a buffer ID and
buffer group ID. The group ID is used to index and lookup the buffers,
while the buffer ID can be used to notify the application which buffer
in the group was used. The addr passed in is the starting buffer address,
and length is each buffer length. A number of buffers to add with can be
specified, in which case addr is incremented by length for each addition,
and each buffer increments the buffer ID specified.

No validation is done of the buffer ID. If the application provides
buffers within the same group with identical buffer IDs, then it'll have
a hard time telling which buffer ID was used. The only restriction is
that the buffer ID can be a max of 16-bits in size, so USHRT_MAX is the
maximum ID that can be used.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10 09:12:14 -06:00
Jens Axboe 5a2e745d4d io_uring: buffer registration infrastructure
This just prepares the ring for having lists of buffers associated with
it, that the application can provide for SQEs to consume instead of
providing their own.

The buffers are organized by group ID.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-04 11:49:14 -07:00
Pavel Begunkov e9fd939654 io_uring/io-wq: forward submission ref to async
First it changes io-wq interfaces. It replaces {get,put}_work() with
free_work(), which guaranteed to be called exactly once. It also enforces
free_work() callback to be non-NULL.

io_uring follows the changes and instead of putting a submission reference
in io_put_req_async_completion(), it will be done in io_free_work(). As
removes io_get_work() with corresponding refcount_inc(), the ref balance
is maintained.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-04 11:39:07 -07:00
Pavel Begunkov f462fd36fc io-wq: optimise out *next_work() double lock
When executing non-linked hashed work, io_worker_handle_work()
will lock-unlock wqe->lock to update hash, and then immediately
lock-unlock to get next work. Optimise this case and do
lock/unlock only once.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-04 11:39:06 -07:00
Pavel Begunkov 58e3931987 io-wq: optimise locking in io_worker_handle_work()
There are 2 optimisations:
- Now, io_worker_handler_work() do io_assign_current_work() twice per
request, and each one adds lock/unlock(worker->lock) pair. The first is
to reset worker->cur_work to NULL, and the second to set a real work
shortly after. If there is a dependant work, set it immediately, that
effectively removes the extra NULL'ing.

- And there is no use in taking wqe->lock for linked works, as they are
not hashed now. Optimise it out.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-04 11:39:04 -07:00
Pavel Begunkov dc026a73c7 io-wq: shuffle io_worker_handle_work() code
This is a preparation patch, it adds some helpers and makes
the next patches cleaner.

- extract io_impersonate_work() and io_assign_current_work()
- replace @next label with nested do-while
- move put_work() right after NULL'ing cur_work.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-04 11:39:03 -07:00
Pavel Begunkov 7a743e225b io_uring: get next work with submission ref drop
If after dropping the submission reference req->refs == 1, the request
is done, because this one is for io_put_work() and will be dropped
synchronously shortly after. In this case it's safe to steal a next
work from the request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-03 20:02:49 -07:00
Pavel Begunkov 014db0073c io_uring: remove @nxt from handlers
There will be no use for @nxt in the handlers, and it's doesn't work
anyway, so purge it

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-03 20:02:49 -07:00
Pavel Begunkov 594506fec5 io_uring: make submission ref putting consistent
The rule is simple, any async handler gets a submission ref and should
put it at the end. Make them all follow it, and so more consistent.

This is a preparation patch, and as io_wq_assign_next() currently won't
ever work, this doesn't care to use io_put_req_find_next() instead of
io_put_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

refcount_inc_not_zero() -> refcount_inc() fix.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-03 20:02:35 -07:00
Pavel Begunkov a2100672f3 io_uring: clean up io_close
Don't abuse labels for plain and straightworward code.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 21:26:56 -07:00
Nathan Chancellor 8755d97a09 io_uring: Ensure mask is initialized in io_arm_poll_handler
Clang warns:

fs/io_uring.c:4178:6: warning: variable 'mask' is used uninitialized
whenever 'if' condition is false [-Wsometimes-uninitialized]
        if (def->pollin)
            ^~~~~~~~~~~
fs/io_uring.c:4182:2: note: uninitialized use occurs here
        mask |= POLLERR | POLLPRI;
        ^~~~
fs/io_uring.c:4178:2: note: remove the 'if' if its condition is always
true
        if (def->pollin)
        ^~~~~~~~~~~~~~~~
fs/io_uring.c:4154:15: note: initialize the variable 'mask' to silence
this warning
        __poll_t mask, ret;
                     ^
                      = 0
1 warning generated.

io_op_defs has many definitions where pollin is not set so mask indeed
might be uninitialized. Initialize it to zero and change the next
assignment to |=, in case further masks are added in the future to avoid
missing changing the assignment then.

Fixes: d7718a9d25 ("io_uring: use poll driven retry for files that support it")
Link: https://github.com/ClangBuiltLinux/linux/issues/916
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 16:13:24 -07:00
Pavel Begunkov 3b17cf5a58 io_uring: remove io_prep_next_work()
io-wq cares about IO_WQ_WORK_UNBOUND flag only while enqueueing, so
it's useless setting it for a next req of a link. Thus, removed it
from io_prep_linked_timeout(), and inline the function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:41 -07:00
Pavel Begunkov 4bc4494ec7 io_uring: remove extra nxt check after punt
After __io_queue_sqe() ended up in io_queue_async_work(), it's already
known that there is no @nxt req, so skip the check and return from the
function.

Also, @nxt initialisation now can be done just before
io_put_req_find_next(), as there is no jumping until it's checked.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:40 -07:00
Jens Axboe d7718a9d25 io_uring: use poll driven retry for files that support it
Currently io_uring tries any request in a non-blocking manner, if it can,
and then retries from a worker thread if we get -EAGAIN. Now that we have
a new and fancy poll based retry backend, use that to retry requests if
the file supports it.

This means that, for example, an IORING_OP_RECVMSG on a socket no longer
requires an async thread to complete the IO. If we get -EAGAIN reading
from the socket in a non-blocking manner, we arm a poll handler for
notification on when the socket becomes readable. When it does, the
pending read is executed directly by the task again, through the io_uring
task work handlers. Not only is this faster and more efficient, it also
means we're not generating potentially tons of async threads that just
sit and block, waiting for the IO to complete.

The feature is marked with IORING_FEAT_FAST_POLL, meaning that async
pollable IO is fast, and that poll<link>other_op is fast as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:38 -07:00
Jens Axboe 8a72758c51 io_uring: mark requests that we can do poll async in io_op_defs
Add a pollin/pollout field to the request table, and have commands that
we can safely poll for properly marked.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:37 -07:00
Jens Axboe b41e98524e io_uring: add per-task callback handler
For poll requests, it's not uncommon to link a read (or write) after
the poll to execute immediately after the file is marked as ready.
Since the poll completion is called inside the waitqueue wake up handler,
we have to punt that linked request to async context. This slows down
the processing, and actually means it's faster to not use a link for this
use case.

We also run into problems if the completion_lock is contended, as we're
doing a different lock ordering than the issue side is. Hence we have
to do trylock for completion, and if that fails, go async. Poll removal
needs to go async as well, for the same reason.

eventfd notification needs special case as well, to avoid stack blowing
recursion or deadlocks.

These are all deficiencies that were inherited from the aio poll
implementation, but I think we can do better. When a poll completes,
simply queue it up in the task poll list. When the task completes the
list, we can run dependent links inline as well. This means we never
have to go async, and we can remove a bunch of code associated with
that, and optimizations to try and make that run faster. The diffstat
speaks for itself.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:36 -07:00
Jens Axboe c2f2eb7d2c io_uring: store io_kiocb in wait->private
Store the io_kiocb in the private field instead of the poll entry, this
is in preparation for allowing multiple waitqueues.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:34 -07:00
Oleg Nesterov 6fb614920b task_work_run: don't take ->pi_lock unconditionally
As Peter pointed out, task_work() can avoid ->pi_lock and cmpxchg()
if task->task_works == NULL && !PF_EXITING.

And in fact the only reason why task_work_run() needs ->pi_lock is
the possible race with task_work_cancel(), we can optimize this code
and make the locking more clear.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:33 -07:00
Pavel Begunkov 3684f24653 io-wq: use BIT for ulong hash
@hash_map is unsigned long, but BIT_ULL() is used for manipulations.
BIT() is a better match as it returns exactly unsigned long value.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:31 -07:00
Pavel Begunkov 5eae861990 io_uring: remove IO_WQ_WORK_CB
IO_WQ_WORK_CB is used only for linked timeouts, which will be armed
before the work setup (i.e. mm, override creds, etc). The setup
shouldn't take long, so it's ok to arm it a bit later and get rid
of IO_WQ_WORK_CB.

Make io-wq call work->func() only once, callbacks will handle the rest.
i.e. the linked timeout handler will do the actual issue. And as a
bonus, it removes an extra indirect call.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:06:29 -07:00
Pavel Begunkov e85530ddda io-wq: remove unused IO_WQ_WORK_HAS_MM
IO_WQ_WORK_HAS_MM is set but never used, remove it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:05:24 -07:00
Pavel Begunkov 02d27d8953 io_uring: extract kmsg copy helper
io_recvmsg() and io_sendmsg() duplicate nonblock -EAGAIN finilising
part, so add helper for that.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:41 -07:00
Pavel Begunkov b0a20349f2 io_uring: clean io_poll_complete
Deduplicate call to io_cqring_fill_event(), plain and easy

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:37 -07:00
Pavel Begunkov 7d67af2c01 io_uring: add splice(2) support
Add support for splice(2).

- output file is specified as sqe->fd, so it's handled by generic code
- hash_reg_file handled by generic code as well
- len is 32bit, but should be fine
- the fd_in is registered file, when SPLICE_F_FD_IN_FIXED is set, which
is a splice flag (i.e. sqe->splice_flags).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:37 -07:00
Pavel Begunkov 8da11c1994 io_uring: add interface for getting files
Preparation without functional changes. Adds io_get_file(), that allows
to grab files not only into req->file.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:37 -07:00
Pavel Begunkov 444ebb5768 splice: make do_splice public
Make do_splice(), so other kernel parts can reuse it

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:31 -07:00
Pavel Begunkov bcaec089c5 io_uring: remove req->in_async
req->in_async is not really needed, it only prevents propagation of
@nxt for fast not-blocked submissions. Remove it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:29 -07:00
Pavel Begunkov deb6dc0544 io_uring: don't do full *prep_worker() from io-wq
io_prep_async_worker() called io_wq_assign_next() do many useless checks:
io_req_work_grab_env() was already called during prep, and @do_hashed
is not ever used. Add io_prep_next_work() -- simplified version, that
can be called io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:28 -07:00
Pavel Begunkov 5ea6216116 io_uring: don't call work.func from sync ctx
Many operations define custom work.func before getting into an io-wq.
There are several points against:
- it calls io_wq_assign_next() from outside io-wq, that may be confusing
- sync context would go unnecessary through io_req_cancelled()
- prototypes are quite different, so work!=old_work looks strange
- makes async/sync responsibilities fuzzy
- adds extra overhead

Don't call generic path and io-wq handlers from each other, but use
helpers instead

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:24 -07:00
Jens Axboe e441d1cf20 io_uring: io_accept() should hold on to submit reference on retry
Don't drop an early reference, hang on to it and let the caller drop
it. This makes it behave more like "regular" requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:24 -07:00
Jens Axboe 29de5f6a35 io_uring: consider any io_read/write -EAGAIN as final
If the -EAGAIN happens because of a static condition, then a poll
or later retry won't fix it. We must call it again from blocking
condition. Play it safe and ensure that any -EAGAIN condition from read
or write must retry from async context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:04:24 -07:00
Pavel Begunkov 80ad894382 io-wq: remove io_wq_flush and IO_WQ_WORK_INTERNAL
io_wq_flush() is buggy, during cancelation of a flush, the associated
work may be passed to the caller's (i.e. io_uring) @match callback. That
callback is expecting it to be embedded in struct io_kiocb. Cancelation
of internal work probably doesn't make a lot of sense to begin with.

As the flush helper is no longer used, just delete it and the associated
work flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 14:03:24 -07:00
Pavel Begunkov fc04c39bae io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
To cancel a work, io-wq sets IO_WQ_WORK_CANCEL and executes the
callback. However, IO_WQ_WORK_NO_CANCEL works will just execute and may
return next work, which will be ignored and lost.

Cancel the whole link.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-02 07:20:08 -07:00
Linus Torvalds 98d54f81e3 Linux 5.6-rc4 2020-03-01 16:38:46 -06:00
Linus Torvalds e70869821a Two more bug fixes (including a regression) for 5.6
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl5cMPoACgkQ8vlZVpUN
 gaNYmgf/WX4/jMSYQu2fICudCqLr5fkLqsybvYGZGei3F8BaJ90zohQAQybNznWS
 iyF0JzrOp37b/o0haz7KfDr7xVB3lAVsKu9Bglq+zL8mc9IkPmjhCXuLbknUtOUw
 j3aVdntt4d6S3szbtP4PIZxNqh+/4KJDS2soWvuNWRpYMOv2yoMClptWWQtsimAt
 3fYpxasSz0Jrhtbuf+I1oID++wOycDT3RKiko5tpLlQiFVoKBzfou+0ZdkC4+UIl
 KvcpMBm1ijdGAaN9jfb2L2KCY5UdSvmeVui3sMXtHBEpKMJl2QsClylR1wGfgBKi
 +YMEsjBONxKo3kH2DaPJaU6LEm8JuQ==
 =rszH
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Two more bug fixes (including a regression) for 5.6"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
  jbd2: fix data races at struct journal_head
2020-03-01 16:35:08 -06:00
Linus Torvalds f853ed90e2 More bugfixes, including a few remaining "make W=1" issues such
as too large frame sizes on some configurations.  On the
 ARM side, the compiler was messing up shadow stacks between
 EL1 and EL2 code, which is easily fixed with __always_inline.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeXAT4AAoJEL/70l94x66DWywH/1kv4MmeGo6PI0Nxk/yvA7X8
 78iqIBchtxZX0v/9kqpTB7bYmHyTgmZHM+IkwtIUANDSaOvWqJwU+TLUfduOiuXF
 NxBHcZDyuMoftX5CSQ+bJ5PwxKijAdJsIkCZ13CnsTCkwcfamSGypFUCK8LacPeq
 WHvV5Ws5pFc51xrP3CH1DrRhLoulaBmt5xxqK9fxWtslrlsnm1uNza5vs8As8CzM
 apnmdRIf5p4v91Zic3PFH7/GXES0m1tjIBKdtZ4YHb8yrXV/kBsEVhhTjqE9mrUq
 qtRRl5waOFoP4yc9ey52PAbMm1x1Ho/pyunpM0xh40Yq8OPFwqXBPTnWfobSoiM=
 =LNQc
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "More bugfixes, including a few remaining "make W=1" issues such as too
  large frame sizes on some configurations.

  On the ARM side, the compiler was messing up shadow stacks between EL1
  and EL2 code, which is easily fixed with __always_inline"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: check descriptor table exits on instruction emulation
  kvm: x86: Limit the number of "kvm: disabled by bios" messages
  KVM: x86: avoid useless copy of cpufreq policy
  KVM: allow disabling -Werror
  KVM: x86: allow compiling as non-module with W=1
  KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
  KVM: Introduce pv check helpers
  KVM: let declaration of kvm_get_running_vcpus match implementation
  KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
  arm64: Ask the compiler to __always_inline functions used by KVM at HYP
  KVM: arm64: Define our own swab32() to avoid a uapi static inline
  KVM: arm64: Ask the compiler to __always_inline functions used at HYP
  kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
  KVM: arm/arm64: Fix up includes for trace.h
2020-03-01 15:16:35 -06:00