1
0
Fork 0
Commit Graph

798609 Commits (74030653f0b9736f179c1c4e713ba1f2070aa0dd)

Author SHA1 Message Date
Christoph Hellwig 74030653f0 block: remove the bioset_integrity_free export
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-16 08:33:38 -07:00
Christoph Hellwig a45eb575cd block: remove the unused bio_set_pages_dirty and bio_check_pages_dirty exports
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-14 06:17:58 -07:00
Christoph Hellwig 0374e11322 block: remove the unused bio_iov_iter_get_pages export
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-14 06:17:57 -07:00
Christoph Hellwig 637b60ade3 block: remove the blk_recount_segments export
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-14 06:17:55 -07:00
Christoph Hellwig 6c210aa596 block: remove the bio_phys_segments export
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-14 06:17:53 -07:00
Sagi Grimberg 092ff05200 nvme: fix kernel paging oops
free the controller discard_page correctly.

Fixes: cb5b7262b0 ("nvme: provide fallback for discard alloc failure")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 13:58:23 -07:00
Jens Axboe 2d9a058e3f Merge branch 'nvme-4.21' of git://git.infradead.org/nvme into for-4.21/block
Pull NVMe updates from Christoph:

"Here is the second large chunk of nvme updates for 4.21:

 - host and target support for NVMe over TCP (Sagi Grimberg,
	Roy Shterman, Solganik Alexander)
 - error log page support in target (Chaitanya Kulkarni)

plus small fixes and improvements from Jens Axboe and Chengguang Xu."

* 'nvme-4.21' of git://git.infradead.org/nvme: (33 commits)
  nvme-rdma: support separate queue maps for read and write
  nvme-tcp: support separate queue maps for read and write
  nvme-fabrics: allow user to set nr_write_queues for separate queue maps
  nvme-fabrics: add missing nvmf_ctrl_options documentation
  blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues
  nvmet: update smart log with num err log entries
  nvmet: add error log page cmd handler
  nvmet: add error log support for file backend
  nvmet: add error log support for bdev backend
  nvmet: add error log support for admin-cmd
  nvmet: add error log support for rdma backend
  nvmet: add error log support for fabrics-cmd
  nvmet: add error log support in the core
  nvmet: add interface to update error-log page
  nvmet: add error-log definitions
  nvme: add error log page slot definition
  nvme: remove nvme_common command cdw10 array
  nvmet: remove unused variable
  nvme: provide fallback for discard alloc failure
  nvme: add __exit annotation
  ...
2018-12-13 12:59:37 -07:00
Guoju Fang e78bd0d26f bcache: print number of keys in trace_bcache_journal_write
Sometimes flush journal may be very frequent, so it's useful to dump
number of keys every time write journal.

Signed-off-by: Guoju Fang <fangguoju@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Coly Li cc38ca7ed5 bcache: set writeback_percent in a flexible range
Because CUTOFF_WRITEBACK is defined as 40, so before the changes of
dynamic cutoff writeback values, writeback_percent is limited to [0,
CUTOFF_WRITEBACK]. Any value larger than CUTOFF_WRITEBACK will be fixed
up to 40.

Now cutof writeback limit is a dynamic value bch_cutoff_writeback, so
the range of writeback_percent can be a more flexible range as [0,
bch_cutoff_writeback]. The flexibility is, it can be expended to a
larger or smaller range than [0, 40], depends on how value
bch_cutoff_writeback is specified.

The default value is still strongly recommended to most of users for
most of workloads. But for people who want to do research on bcache
writeback perforamnce tuning, they may have chance to specify more
flexible writeback_percent in range [0, 70].

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Coly Li 9aaf516546 bcache: make cutoff_writeback and cutoff_writeback_sync tunable
Currently the cutoff writeback and cutoff writeback sync thresholds are
defined by CUTOFF_WRITEBACK (40) and CUTOFF_WRITEBACK_SYNC (70) as
static values. Most of time these they work fine, but when people want
to do research on bcache writeback mode performance tuning, there is no
chance to modify the soft and hard cutoff writeback values.

This patch introduces two module parameters bch_cutoff_writeback_sync
and bch_cutoff_writeback which permit people to tune the values when
loading bcache.ko. If they are not specified by module loading, current
values CUTOFF_WRITEBACK_SYNC and CUTOFF_WRITEBACK will be used as
default and nothing changes.

When people want to tune this two values,
- cutoff_writeback can be set in range [1, 70]
- cutoff_writeback_sync can be set in range [1, 90]
- cutoff_writeback always <= cutoff_writeback_sync

The default values are strongly recommended to most of users for most of
workloads. Anyway, if people wants to take their own risk to do research
on new writeback cutoff tuning for their own workload, now they can make
it.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Coly Li 009673d02f bcache: add MODULE_DESCRIPTION information
This patch moves MODULE_AUTHOR and MODULE_LICENSE to end of super.c, and
add MODULE_DESCRIPTION("Bcache: a Linux block layer cache").

This is preparation for adding module parameters.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Coly Li 7a671d8ef8 bcache: option to automatically run gc thread after writeback
The option gc_after_writeback is disabled by default, because garbage
collection will discard SSD data which drops cached data.

Echo 1 into /sys/fs/bcache/<UUID>/internal/gc_after_writeback will
enable this option, which wakes up gc thread when writeback accomplished
and all cached data is clean.

This option is helpful for people who cares writing performance more. In
heavy writing workload, all cached data can be clean only happens when
writeback thread cleans all cached data in I/O idle time. In such
situation a following gc running may help to shrink bcache B+ tree and
discard more clean data, which may be helpful for future writing
requests.

If you are not sure whether this is helpful for your own workload,
please leave it as disabled by default.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Coly Li cb07ad6368 bcache: introduce force_wake_up_gc()
Garbage collection thread starts to work when c->sectors_to_gc is
negative value, otherwise nothing will happen even the gc thread is
woken up by wake_up_gc().

force_wake_up_gc() sets c->sectors_to_gc to -1 before calling
wake_up_gc(), then gc thread may have chance to run if no one else sets
c->sectors_to_gc to a positive value before gc_should_run().

This routine can be called where the gc thread is woken up and required
to run in force.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Shenghui Wang f383ae300c bcache: cannot set writeback_running via sysfs if no writeback kthread created
"echo 1 > writeback_running" marks writeback_running even if no
writeback kthread created as "d_strtoul(writeback_running)" will simply
set dc-> writeback_running without checking the existence of
dc->writeback_thread.

Add check for setting writeback_running via sysfs: if no writeback
kthread available, reject setting to 1.

v2 -> v3:
  * Make message on wrong assignment more clear.
  * Print name of bcache device instead of name of backing device.

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Shenghui Wang 79b791466e bcache: do not mark writeback_running too early
A fresh backing device is not attached to any cache_set, and
has no writeback kthread created until first attached to some
cache_set.

But bch_cached_dev_writeback_init run
"
	dc->writeback_running		= true;
	WARN_ON(test_and_clear_bit(BCACHE_DEV_WB_RUNNING,
			&dc->disk.flags));
"
for any newly formatted backing devices.

For a fresh standalone backing device, we can get something like
following even if no writeback kthread created:
------------------------
/sys/block/bcache0/bcache# cat writeback_running
1
/sys/block/bcache0/bcache# cat writeback_rate_debug
rate:		512.0k/sec
dirty:		0.0k
target:		0.0k
proportional:	0.0k
integral:	0.0k
change:		0.0k/sec
next io:	-15427384ms

The none ZERO fields are misleading as no alive writeback kthread yet.

Set dc->writeback_running false as no writeback thread created in
bch_cached_dev_writeback_init().

We have writeback thread created and woken up in bch_cached_dev_writeback
_start(). Set dc->writeback_running true before bch_writeback_queue()
called, as a writeback thread will check if dc->writeback_running is true
before writing back dirty data, and hung if false detected.

After the change, we can get the following output for a fresh standalone
backing device:
-----------------------
/sys/block/bcache0/bcache$ cat writeback_running
0
/sys/block/bcache0/bcache# cat writeback_rate_debug
rate:		0.0k/sec
dirty:		0.0k
target:		0.0k
proportional:	0.0k
integral:	0.0k
change:		0.0k/sec
next io:	0ms

v1 -> v2:
  Set dc->writeback_running before bch_writeback_queue() called,

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Shenghui Wang 4e361e020e bcache: update comment in sysfs.c
We have struct cached_dev allocated by kzalloc in register_bcache(),
which initializes all the fields of cached_dev with 0s. And commit
ce4c3e19e5 ("bcache: Replace bch_read_string_list() by
__sysfs_match_string()") has remove the string "default".

Update the comment.

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Shenghui Wang 3db4d0783e bcache: update comment for bch_data_insert
commit 220bb38c21 ("bcache: Break up struct search") introduced
changes to struct search and s->iop. bypass/bio are fields of struct
data_insert_op now. Update the comment.

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Shenghui Wang ae17102316 bcache: do not check if debug dentry is ERR or NULL explicitly on remove
debugfs_remove and debugfs_remove_recursive will check if the dentry
pointer is NULL or ERR, and will do nothing in that case.

Remove the check in cache_set_free and bch_debug_init.

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Shenghui Wang d2f96f487f bcache: add comment for cache_set->fill_iter
We have the following define for btree iterator:
	struct btree_iter {
		size_t size, used;
	#ifdef CONFIG_BCACHE_DEBUG
		struct btree_keys *b;
	#endif
		struct btree_iter_set {
			struct bkey *k, *end;
		} data[MAX_BSETS];
	};

We can see that the length of data[] field is static MAX_BSETS, which is
defined as 4 currently.

But a btree node on disk could have too many bsets for an iterator to fit
on the stack - maybe far more that MAX_BSETS. Have to dynamically allocate
space to host more btree_iter_sets.

bch_cache_set_alloc() will make sure the pool cache_set->fill_iter can
allocate an iterator equipped with enough room that can host
	(sb.bucket_size / sb.block_size)
btree_iter_sets, which is more than static MAX_BSETS.

bch_btree_node_read_done() will use that pool to allocate one iterator, to
host many bsets in one btree node.

Add more comment around cache_set->fill_iter to make code less confusing.

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-13 08:15:54 -07:00
Sagi Grimberg b65bb777ef nvme-rdma: support separate queue maps for read and write
llow NVMF_OPT_NR_WRITE_QUEUES to describe additional write queues.  In
addition, implement .map_queues that will apply 2 queue maps for read
and write queue sets.

Note that with the separate queue map, HCTX_TYPE_READ will always use
nr_io_queues and HCTX_TYPE_DEFAULT will use nr_write_queues.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:10 +01:00
Sagi Grimberg 873946f4b9 nvme-tcp: support separate queue maps for read and write
Allow NVMF_OPT_NR_WRITE_QUEUES to describe additional write queues.  In
addition, implement .map_queues that will apply 2 queue maps for read
and write queue sets.

Note that with the separate queue map, HCTX_TYPE_READ will always use
nr_io_queues and HCTX_TYPE_DEFAULT will use nr_write_queues.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:09 +01:00
Sagi Grimberg 330f6b8a70 nvme-fabrics: allow user to set nr_write_queues for separate queue maps
This argument will specify how many I/O queues will be connected in
create_ctrl in addition to nr_io_queues. With this configuration, I/O
that carries payload from the host to the target, will use the default
hctx queue map, and I/O that involves target to host transfers will use
the read hctx queue map.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:09 +01:00
Sagi Grimberg fa9a1811e0 nvme-fabrics: add missing nvmf_ctrl_options documentation
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:08 +01:00
Sagi Grimberg e42b3867de blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues
Will be used by nvme-rdma for queue map separation support.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:08 +01:00
Chaitanya Kulkarni 23454d59cc nvmet: update smart log with num err log entries
Now that we have error log page implementation update smart log command
handler to provide number of error log entries in the lifetime of the
controller field.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:07 +01:00
Chaitanya Kulkarni 11ad507784 nvmet: add error log page cmd handler
Now that we have support for all the major parts of the target we add
a NVMe error log page handler so that host can read the log page.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:06 +01:00
Chaitanya Kulkarni c6aa3542e0 nvmet: add error log support for file backend
This patch adds support for the file backend to populate the
error log entries. Here we map the errno to the NVMe status codes.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:06 +01:00
Chaitanya Kulkarni 3b031d1599 nvmet: add error log support for bdev backend
This patch adds the support for the block device backend to populate the
error log entries. Here we map the blk_status_t to the NVMe status.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:05 +01:00
Chaitanya Kulkarni 2da6e00580 nvmet: add error log support for admin-cmd
This patch adds the support to maintain the error log page for admin
commands.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:05 +01:00
Chaitanya Kulkarni 762a11dfee nvmet: add error log support for rdma backend
This patch adds the support to maintain the error log page for rdma
transport, we mainly focus here on the NVME_INVALID_FIELD errors.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:04 +01:00
Chaitanya Kulkarni 84faf42b8a nvmet: add error log support for fabrics-cmd
This patch adds the support to maintain error log page for the fabrics
prop get, prop set, and admin connect commands. Here we also update the
discovery.c and add update set/get features and parse functions to
support error log page.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:04 +01:00
Chaitanya Kulkarni e81446afc7 nvmet: add error log support in the core
This patch adds the support to maintain error log page for the
nvmet-core.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:03 +01:00
Chaitanya Kulkarni 76574f37bf nvmet: add interface to update error-log page
This patch adds nvmet_req based interface to the nvmet-core so that
we can update the error log page. We update error log page in
the request completion path when status is not set to NVME_SC_SUCCESS.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:03 +01:00
Chaitanya Kulkarni e4a976254e nvmet: add error-log definitions
This patch adds necessary fields in the target data structures to
support error log page. For a target controller, we add a new error log
field to maintain the error log, at any given point we maintain error
entries equal to NVMET_ERROR_LOG_SLOTS for each controller. In the
following patch, we also update the error log page entry in the I/O
completion path so we introduce a spinlock for synchronization of the
log.

For nvmet_req, we add a new field error_loc to hold the location of
the error in the command when the actual error occurs for each request
and a starting LBA if applicable.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:02 +01:00
Chaitanya Kulkarni b34de7cee0 nvme: add error log page slot definition
This patch adds the NVMe error slot definition from the spec.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:01 +01:00
Chaitanya Kulkarni b7c8f3663d nvme: remove nvme_common command cdw10 array
This is a preparation patch which removes the nvme common command cdw10
array and replace with individual fields. This is needed for the nvmet
error log page implementation make is error log page entry offset
assignment easier.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:01 +01:00
Sagi Grimberg 16d3a280d4 nvmet: remove unused variable
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:00 +01:00
Jens Axboe cb5b7262b0 nvme: provide fallback for discard alloc failure
When boxes are run near (or to) OOM, we have a problem with the discard
page allocation in nvme. If we fail allocating the special page, we
return busy, and it'll get retried. But since ordering is honored for
dispatch requests, we can keep retrying this same IO and failing. Behind
that IO could be requests that want to free memory, but they never get
the chance.

Allocate a fixed discard page per controller for a safe fallback, and use
that if the initial allocation fails.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:00 +01:00
Chengguang Xu 8eb5d89f48 nvme: add __exit annotation
Add __exit annotation to cleanup helper which is only
called once in the module.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:59 +01:00
Sagi Grimberg 3f2304f8c6 nvme-tcp: add NVMe over TCP host driver
This patch implements the NVMe over TCP host driver. It can be used to
connect to remote NVMe over Fabrics subsystems over good old TCP/IP.

The driver implements the TP 8000 of how nvme over fabrics capsules and
data are encapsulated in nvme-tcp pdus and exchaged on top of a TCP byte
stream. nvme-tcp header and data digest are supported as well.

To connect to all NVMe over Fabrics controllers reachable on a given taget
port over TCP use the following command:

	nvme connect-all -t tcp -a $IPADDR

This requires the latest version of nvme-cli with TCP support.

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:58 +01:00
Sagi Grimberg ad4f530e95 nvmet: allow configfs tcp trtype configuration
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:58 +01:00
Sagi Grimberg 872d26a391 nvmet-tcp: add NVMe over TCP target driver
This patch implements the TCP transport driver for the NVMe over Fabrics
target stack. This allows exporting NVMe over Fabrics functionality over
good old TCP/IP.

The driver implements the TP 8000 of how nvme over fabrics capsules and
data are encapsulated in nvme-tcp pdus and exchaged on top of a TCP byte
stream. nvme-tcp header and data digest are supported as well.

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:57 +01:00
Sagi Grimberg fc221d0544 nvme-tcp: Add protocol header
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:57 +01:00
Sagi Grimberg 20d44e8632 nvme-fabrics: allow user passing data digest
Data digest is a nvme-tcp specific feature, but nothing prevents other
transports reusing the concept so do not associate with tcp transport
solely.

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:56 +01:00
Sagi Grimberg 3b49fa8072 nvme-fabrics: allow user passing header digest
Header digest is a nvme-tcp specific feature, but nothing prevents other
transports reusing the concept so do not associate with tcp transport
solely.

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:56 +01:00
Sagi Grimberg 1672ddb8d6 nvmet: Add install_queue callout
nvmet-tcp will implement it to allocate queue commands which
are only known at nvmf connect time (sq size).

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:55 +01:00
Sagi Grimberg 65d69e2505 datagram: introduce skb_copy_and_hash_datagram_iter helper
Introduce a helper to copy datagram into an iovec iterator
but also update a predefined hash. This is useful for
consumers of skb_copy_datagram_iter to also support inflight
data digest without having to finish to copy and only then
traverse the iovec and calculate the digest hash.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:55 +01:00
Sagi Grimberg d05f443554 iov_iter: introduce hash_and_copy_to_iter helper
Allow consumers that want to use iov iterator helpers and also update
a predefined hash calculation online when copying data. This is useful
when copying incoming network buffers to a local iterator and calculate
a digest on the incoming stream. nvme-tcp host driver that will be
introduced in following patches is the first consumer via
skb_copy_and_hash_datagram_iter.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:54 +01:00
Sagi Grimberg 950fcaecd5 datagram: consolidate datagram copy to iter helpers
skb_copy_datagram_iter and skb_copy_and_csum_datagram are essentialy
the same but with a couple of differences: The first is the copy
operation used which either a simple copy or a csum_and_copy, and the
second are the behavior on the "short copy" path where simply copy
needs to return the number of bytes successfully copied while csum_and_copy
needs to fault immediately as the checksum is partial.

Introduce __skb_datagram_iter that additionally accepts:
1. copy operation function pointer
2. private data that goes with the copy operation
3. fault_short flag to indicate the action on short copy

Suggested-by: David S. Miller <davem@davemloft.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:53 +01:00
Sagi Grimberg cb002d074d iov_iter: pass void csum pointer to csum_and_copy_to_iter
The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram
and we are trying to unite its logic with skb_copy_datagram_iter by passing
a callback to the copy function that we want to apply. Thus, we need
to make the checksum pointer private to the function.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:53 +01:00