for-5.3/block-20190708
Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "This is the main set of block updates for 5.3. Nothing earth-shattering
  or major in here, just fixes, additions, and improvements all over the
  map.

  This contains:

   - Series of documentation fixes (Bart)
   - Optimization of the blk-mq ctx get/put (Bart)
   - null_blk removal race condition fix (Bob)
   - req/bio_op() cleanups (Chaitanya)
   - Series cleaning up the segment accounting, and request/bio mapping
     (Christoph)
   - Series cleaning up the page getting/putting for bios (Christoph)
   - block cgroup cleanups and moving it to where it is used (Christoph)
   - block cgroup fixes (Tejun)
   - Series of fixes and improvements to bcache, most notably a write
     deadlock fix (Coly)
   - blk-iolatency STS_AGAIN and accounting fixes (Dennis)
   - Series of improvements and fixes to BFQ (Douglas, Paolo)
   - debugfs_create() return value check removal for drbd (Greg)
   - Use struct_size(), where appropriate (Gustavo)
   - Two lightnvm fixes (Heiner, Geert)
   - MD fixes, including a read balance and corruption fix (Guoqing,
     Marcos, Xiao, Yufen)
   - block opal shadow mbr additions (Jonas, Revanth)
   - sbitmap compare-and-exchange improvements (Pavel)
   - Fix for potential bio->bi_size overflow (Ming)
   - NVMe pull requests:
       - improved PCIe suspend support (Keith Busch)
       - error injection support for the admin queue (Akinobu Mita)
       - Fibre Channel discovery improvements (James Smart)
       - tracing improvements including nvmet tracing support (Minwoo Im)
       - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
         Kulkarni)
   - Various little fixes and improvements to drivers and core"

* tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
  blk-iolatency: fix STS_AGAIN handling
  block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
  blk-mq: simplify blk_mq_make_request()
  blk-mq: remove blk_mq_put_ctx()
  sbitmap: Replace cmpxchg with xchg
  block: fix .bi_size overflow
  block: sed-opal: check size of shadow mbr
  block: sed-opal: ioctl for writing to shadow mbr
  block: sed-opal: add ioctl for done-mark of shadow mbr
  block: never take page references for ITER_BVEC
  direct-io: use bio_release_pages in dio_bio_complete
  block_dev: use bio_release_pages in bio_unmap_user
  block_dev: use bio_release_pages in blkdev_bio_end_io
  iomap: use bio_release_pages in iomap_dio_bio_end_io
  block: use bio_release_pages in bio_map_user_iov
  block: use bio_release_pages in bio_unmap_user
  block: optionally mark pages dirty in bio_release_pages
  block: move the BIO_NO_PAGE_REF check into bio_release_pages
  block: skd_main.c: Remove call to memset after dma_alloc_coherent
  block: mtip32xx: Remove call to memset after dma_alloc_coherent
  ...
commit 3b99107f0e
@@ -38,13 +38,13 @@ stack). To give an idea of the limits with BFQ, on slow or average
 CPUs, here are, first, the limits of BFQ for three different CPUs, on,
 respectively, an average laptop, an old desktop, and a cheap embedded
 system, in case full hierarchical support is enabled (i.e.,
-CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_DEBUG_BLK_CGROUP is not
+CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not
 set (Section 4-2):
 - Intel i7-4850HQ: 400 KIOPS
 - AMD A8-3850: 250 KIOPS
 - ARM CortexTM-A53 Octa-core: 80 KIOPS

-If CONFIG_DEBUG_BLK_CGROUP is set (and of course full hierarchical
+If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical
 support is enabled), then the sustainable throughput with BFQ
 decreases, because all blkio.bfq* statistics are created and updated
 (Section 4-2). For BFQ, this leads to the following maximum
@@ -537,19 +537,19 @@ or io.bfq.weight.

 As for cgroups-v1 (blkio controller), the exact set of stat files
 created, and kept up-to-date by bfq, depends on whether
-CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all
+CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all
 the stat files documented in
 Documentation/cgroup-v1/blkio-controller.rst. If, instead,
-CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files
+CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files
 blkio.bfq.io_service_bytes
 blkio.bfq.io_service_bytes_recursive
 blkio.bfq.io_serviced
 blkio.bfq.io_serviced_recursive

-The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum
+The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum
 throughput sustainable with bfq, because updating the blkio.bfq.*
 stats is rather costly, especially for some of the stats enabled by
-CONFIG_DEBUG_BLK_CGROUP.
+CONFIG_BFQ_CGROUP_DEBUG.

 Parameters to set
 -----------------
@@ -436,7 +436,6 @@ struct bio {
 struct bvec_iter bi_iter; /* current index into bio_vec array */

 unsigned int bi_size; /* total size in bytes */
-unsigned short bi_phys_segments; /* segments after physaddr coalesce*/
 unsigned short bi_hw_segments; /* segments after DMA remapping */
 unsigned int bi_max; /* max bio_vecs we can hold
                         used as index into pool */
@@ -14,6 +14,15 @@ add_random (RW)
 This file allows to turn off the disk entropy contribution. Default
 value of this file is '1'(on).

+chunk_sectors (RO)
+------------------
+This has different meaning depending on the type of the block device.
+For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
+of the RAID volume stripe segment. For a zoned block device, either host-aware
+or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
+of the device, with the eventual exception of the last zone of the device which
+may be smaller.
+
 dax (RO)
 --------
 This file indicates whether the device supports Direct Access (DAX),
@@ -43,6 +52,16 @@ large discards are issued, setting this value lower will make Linux issue
 smaller discards and potentially help reduce latencies induced by large
 discard operations.

+discard_zeroes_data (RO)
+------------------------
+Obsolete. Always zero.
+
+fua (RO)
+--------
+Whether or not the block driver supports the FUA flag for write requests.
+FUA stands for Force Unit Access. If the FUA flag is set that means that
+write requests must bypass the volatile cache of the storage device.
+
 hw_sector_size (RO)
 -------------------
 This is the hardware sector size of the device, in bytes.
@@ -83,14 +102,19 @@ logical_block_size (RO)
 -----------------------
 This is the logical block size of the device, in bytes.

+max_discard_segments (RO)
+-------------------------
+The maximum number of DMA scatter/gather entries in a discard request.
+
 max_hw_sectors_kb (RO)
 ----------------------
 This is the maximum number of kilobytes supported in a single data transfer.

 max_integrity_segments (RO)
 ---------------------------
-When read, this file shows the max limit of integrity segments as
-set by block layer which a hardware controller can handle.
+Maximum number of elements in a DMA scatter/gather list with integrity
+data that will be submitted by the block layer core to the associated
+block driver.

 max_sectors_kb (RW)
 -------------------
@@ -100,11 +124,12 @@ size allowed by the hardware.

 max_segments (RO)
 -----------------
-Maximum number of segments of the device.
+Maximum number of elements in a DMA scatter/gather list that is submitted
+to the associated block driver.

 max_segment_size (RO)
 ---------------------
-Maximum segment size of the device.
+Maximum size in bytes of a single element in a DMA scatter/gather list.

 minimum_io_size (RO)
 --------------------
@@ -132,6 +157,12 @@ per-block-cgroup request pool. IOW, if there are N block cgroups,
 each request queue may have up to N request pools, each independently
 regulated by nr_requests.

+nr_zones (RO)
+-------------
+For zoned block devices (zoned attribute indicating "host-managed" or
+"host-aware"), this indicates the total number of zones of the device.
+This is always 0 for regular block devices.
+
 optimal_io_size (RO)
 --------------------
 This is the optimal IO size reported by the device.
@@ -185,8 +216,8 @@ This is the number of bytes the device can write in a single write-same
 command. A value of '0' means write-same is not supported by this
 device.

-wb_lat_usec (RW)
-----------------
+wbt_lat_usec (RW)
+-----------------
 If the device is registered for writeback throttling, then this file shows
 the target minimum read latency. If this latency is exceeded in a given
 window of time (see wb_window_usec), then the writeback throttling will start
@@ -201,6 +232,12 @@ blk-throttle makes decision based on the samplings. Lower time means cgroups
 have more smooth throughput, but higher CPU overhead. This exists only when
 CONFIG_BLK_DEV_THROTTLING_LOW is enabled.

+write_zeroes_max_bytes (RO)
+---------------------------
+For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
+bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
+is not supported.
+
 zoned (RO)
 ----------
 This indicates if the device is a zoned block device and the zone model of the
@@ -213,19 +250,4 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC
 do not support zone commands, they will be treated as regular block devices
 and zoned will report "none".

-nr_zones (RO)
--------------
-For zoned block devices (zoned attribute indicating "host-managed" or
-"host-aware"), this indicates the total number of zones of the device.
-This is always 0 for regular block devices.
-
-chunk_sectors (RO)
-------------------
-This has different meaning depending on the type of the block device.
-For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
-of the RAID volume stripe segment. For a zoned block device, either host-aware
-or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
-of the device, with the eventual exception of the last zone of the device which
-may be smaller.
-
 Jens Axboe <jens.axboe@oracle.com>, February 2009
@@ -82,7 +82,7 @@ Various user visible config options
 CONFIG_BLK_CGROUP
 	- Block IO controller.

-CONFIG_DEBUG_BLK_CGROUP
+CONFIG_BFQ_CGROUP_DEBUG
 	- Debug help. Right now some additional stats file show up in cgroup
 	  if this option is enabled.
@@ -202,13 +202,13 @@ Proportional weight policy files
 write, sync or async.

 - blkio.avg_queue_size
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 The average queue size for this cgroup over the entire time of this
 cgroup's existence. Queue size samples are taken each time one of the
 queues of this cgroup gets a timeslice.

 - blkio.group_wait_time
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 This is the amount of time the cgroup had to wait since it became busy
 (i.e., went from 0 to 1 request queued) to get a timeslice for one of
 its queues. This is different from the io_wait_time which is the
@@ -219,7 +219,7 @@ Proportional weight policy files
 got a timeslice and will not include the current delta.

 - blkio.empty_time
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 This is the amount of time a cgroup spends without any pending
 requests when not being served, i.e., it does not include any time
 spent idling for one of the queues of the cgroup. This is in
@@ -228,7 +228,7 @@ Proportional weight policy files
 time it had a pending request and will not include the current delta.

 - blkio.idle_time
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 This is the amount of time spent by the IO scheduler idling for a
 given cgroup in anticipation of a better request than the existing ones
 from other queues/cgroups. This is in nanoseconds. If this is read
@@ -237,7 +237,7 @@ Proportional weight policy files
 the current delta.

 - blkio.dequeue
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. This
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
 gives the statistics about how many a times a group was dequeued
 from service tree of the device. First two fields specify the major
 and minor number of the device and third field specifies the number
@@ -114,3 +114,59 @@ R13: ffff88011a3c9680 R14: 0000000000000000 R15: 0000000000000000
 cpu_startup_entry+0x6f/0x80
 start_secondary+0x187/0x1e0
 secondary_startup_64+0xa5/0xb0
+
+Example 3: Inject an error into the 10th admin command
+------------------------------------------------------
+
+echo 100 > /sys/kernel/debug/nvme0/fault_inject/probability
+echo 10 > /sys/kernel/debug/nvme0/fault_inject/space
+echo 1 > /sys/kernel/debug/nvme0/fault_inject/times
+nvme reset /dev/nvme0
+
+Expected Result:
+
+After NVMe controller reset, the reinitialization may or may not succeed.
+It depends on which admin command is actually forced to fail.
+
+Message from dmesg:
+
+nvme nvme0: resetting controller
+FAULT_INJECTION: forcing a failure.
+name fault_inject, interval 1, probability 100, space 1, times 1
+CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2
+Hardware name: MSI MS-7A45/B150M MORTAR ARCTIC (MS-7A45), BIOS 1.50 04/25/2017
+Call Trace:
+<IRQ>
+dump_stack+0x63/0x85
+should_fail+0x14a/0x170
+nvme_should_fail+0x38/0x80 [nvme_core]
+nvme_irq+0x129/0x280 [nvme]
+? blk_mq_end_request+0xb3/0x120
+__handle_irq_event_percpu+0x84/0x1a0
+handle_irq_event_percpu+0x32/0x80
+handle_irq_event+0x3b/0x60
+handle_edge_irq+0x7f/0x1a0
+handle_irq+0x20/0x30
+do_IRQ+0x4e/0xe0
+common_interrupt+0xf/0xf
+</IRQ>
+RIP: 0010:cpuidle_enter_state+0xc5/0x460
+Code: ff e8 8f 5f 86 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 69 03 00 00 31 ff e8 62 aa 8c ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 37 03 00 00 4c 8b 45 d0 4c 2b 45 b8 48 ba cf f7 53
+RSP: 0018:ffffffff88c03dd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc
+RAX: ffff9dac25a2ac80 RBX: ffffffff88d53760 RCX: 000000000000001f
+RDX: 0000000000000000 RSI: 000000002d958403 RDI: 0000000000000000
+RBP: ffffffff88c03e18 R08: fffffff75e35ffb7 R09: 00000a49a56c0b48
+R10: ffffffff88c03da0 R11: 0000000000001b0c R12: ffff9dac25a34d00
+R13: 0000000000000006 R14: 0000000000000006 R15: ffffffff88d53760
+cpuidle_enter+0x2e/0x40
+call_cpuidle+0x23/0x40
+do_idle+0x201/0x280
+cpu_startup_entry+0x1d/0x20
+rest_init+0xaa/0xb0
+arch_call_rest_init+0xe/0x1b
+start_kernel+0x51c/0x53b
+x86_64_start_reservations+0x24/0x26
+x86_64_start_kernel+0x74/0x77
+secondary_startup_64+0xa4/0xb0
+nvme nvme0: Could not set queue count (16385)
+nvme nvme0: IO queues not created
@@ -36,6 +36,13 @@ config BFQ_GROUP_IOSCHED
	  Enable hierarchical scheduling in BFQ, using the blkio
	  (cgroups-v1) or io (cgroups-v2) controller.

+config BFQ_CGROUP_DEBUG
+	bool "BFQ IO controller debugging"
+	depends on BFQ_GROUP_IOSCHED
+	---help---
+	  Enable some debugging help. Currently it exports additional stat
+	  files in a cgroup which can be useful for debugging.
+
 endmenu

 endif
@@ -15,7 +15,83 @@

 #include "bfq-iosched.h"

-#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
+#ifdef CONFIG_BFQ_CGROUP_DEBUG
+static int bfq_stat_init(struct bfq_stat *stat, gfp_t gfp)
+{
+	int ret;
+
+	ret = percpu_counter_init(&stat->cpu_cnt, 0, gfp);
+	if (ret)
+		return ret;
+
+	atomic64_set(&stat->aux_cnt, 0);
+	return 0;
+}
+
+static void bfq_stat_exit(struct bfq_stat *stat)
+{
+	percpu_counter_destroy(&stat->cpu_cnt);
+}
+
+/**
+ * bfq_stat_add - add a value to a bfq_stat
+ * @stat: target bfq_stat
+ * @val: value to add
+ *
+ * Add @val to @stat. The caller must ensure that IRQ on the same CPU
+ * don't re-enter this function for the same counter.
+ */
+static inline void bfq_stat_add(struct bfq_stat *stat, uint64_t val)
+{
+	percpu_counter_add_batch(&stat->cpu_cnt, val, BLKG_STAT_CPU_BATCH);
+}
+
+/**
+ * bfq_stat_read - read the current value of a bfq_stat
+ * @stat: bfq_stat to read
+ */
+static inline uint64_t bfq_stat_read(struct bfq_stat *stat)
+{
+	return percpu_counter_sum_positive(&stat->cpu_cnt);
+}
+
+/**
+ * bfq_stat_reset - reset a bfq_stat
+ * @stat: bfq_stat to reset
+ */
+static inline void bfq_stat_reset(struct bfq_stat *stat)
+{
+	percpu_counter_set(&stat->cpu_cnt, 0);
+	atomic64_set(&stat->aux_cnt, 0);
+}
+
+/**
+ * bfq_stat_add_aux - add a bfq_stat into another's aux count
+ * @to: the destination bfq_stat
+ * @from: the source
+ *
+ * Add @from's count including the aux one to @to's aux count.
+ */
+static inline void bfq_stat_add_aux(struct bfq_stat *to,
+				    struct bfq_stat *from)
+{
+	atomic64_add(bfq_stat_read(from) + atomic64_read(&from->aux_cnt),
+		     &to->aux_cnt);
+}
+
+/**
+ * blkg_prfill_stat - prfill callback for bfq_stat
+ * @sf: seq_file to print to
+ * @pd: policy private data of interest
+ * @off: offset to the bfq_stat in @pd
+ *
+ * prfill callback for printing a bfq_stat.
+ */
+static u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd,
+			    int off)
+{
+	return __blkg_prfill_u64(sf, pd, bfq_stat_read((void *)pd + off));
+}
+
 /* bfqg stats flags */
 enum bfqg_stats_flags {
@@ -53,7 +129,7 @@ static void bfqg_stats_update_group_wait_time(struct bfqg_stats *stats)

	now = ktime_get_ns();
	if (now > stats->start_group_wait_time)
-		blkg_stat_add(&stats->group_wait_time,
+		bfq_stat_add(&stats->group_wait_time,
			      now - stats->start_group_wait_time);
	bfqg_stats_clear_waiting(stats);
 }
@@ -82,14 +158,14 @@ static void bfqg_stats_end_empty_time(struct bfqg_stats *stats)

	now = ktime_get_ns();
	if (now > stats->start_empty_time)
-		blkg_stat_add(&stats->empty_time,
+		bfq_stat_add(&stats->empty_time,
			      now - stats->start_empty_time);
	bfqg_stats_clear_empty(stats);
 }

 void bfqg_stats_update_dequeue(struct bfq_group *bfqg)
 {
-	blkg_stat_add(&bfqg->stats.dequeue, 1);
+	bfq_stat_add(&bfqg->stats.dequeue, 1);
 }

 void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg)
@@ -119,7 +195,7 @@ void bfqg_stats_update_idle_time(struct bfq_group *bfqg)
	u64 now = ktime_get_ns();

	if (now > stats->start_idle_time)
-		blkg_stat_add(&stats->idle_time,
+		bfq_stat_add(&stats->idle_time,
			      now - stats->start_idle_time);
	bfqg_stats_clear_idling(stats);
 }
@@ -137,9 +213,9 @@ void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg)
 {
	struct bfqg_stats *stats = &bfqg->stats;

-	blkg_stat_add(&stats->avg_queue_size_sum,
+	bfq_stat_add(&stats->avg_queue_size_sum,
		      blkg_rwstat_total(&stats->queued));
-	blkg_stat_add(&stats->avg_queue_size_samples, 1);
+	bfq_stat_add(&stats->avg_queue_size_samples, 1);
	bfqg_stats_update_group_wait_time(stats);
 }
@@ -176,7 +252,7 @@ void bfqg_stats_update_completion(struct bfq_group *bfqg, u64 start_time_ns,
			    io_start_time_ns - start_time_ns);
 }

-#else /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
+#else /* CONFIG_BFQ_CGROUP_DEBUG */

 void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
			      unsigned int op) { }
@@ -190,7 +266,7 @@ void bfqg_stats_update_idle_time(struct bfq_group *bfqg) { }
 void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg) { }
 void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg) { }

-#endif /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
+#endif /* CONFIG_BFQ_CGROUP_DEBUG */

 #ifdef CONFIG_BFQ_GROUP_IOSCHED
@@ -274,18 +350,18 @@ void bfqg_and_blkg_put(struct bfq_group *bfqg)
 /* @stats = 0 */
 static void bfqg_stats_reset(struct bfqg_stats *stats)
 {
-#ifdef CONFIG_DEBUG_BLK_CGROUP
+#ifdef CONFIG_BFQ_CGROUP_DEBUG
	/* queued stats shouldn't be cleared */
	blkg_rwstat_reset(&stats->merged);
	blkg_rwstat_reset(&stats->service_time);
	blkg_rwstat_reset(&stats->wait_time);
-	blkg_stat_reset(&stats->time);
-	blkg_stat_reset(&stats->avg_queue_size_sum);
-	blkg_stat_reset(&stats->avg_queue_size_samples);
-	blkg_stat_reset(&stats->dequeue);
-	blkg_stat_reset(&stats->group_wait_time);
-	blkg_stat_reset(&stats->idle_time);
-	blkg_stat_reset(&stats->empty_time);
+	bfq_stat_reset(&stats->time);
+	bfq_stat_reset(&stats->avg_queue_size_sum);
+	bfq_stat_reset(&stats->avg_queue_size_samples);
+	bfq_stat_reset(&stats->dequeue);
+	bfq_stat_reset(&stats->group_wait_time);
+	bfq_stat_reset(&stats->idle_time);
+	bfq_stat_reset(&stats->empty_time);
 #endif
 }
|
@ -295,19 +371,19 @@ static void bfqg_stats_add_aux(struct bfqg_stats *to, struct bfqg_stats *from)
|
||||||
if (!to || !from)
|
if (!to || !from)
|
||||||
return;
|
return;
|
||||||
|
|
||||||
#ifdef CONFIG_DEBUG_BLK_CGROUP
|
#ifdef CONFIG_BFQ_CGROUP_DEBUG
|
||||||
/* queued stats shouldn't be cleared */
|
/* queued stats shouldn't be cleared */
|
||||||
blkg_rwstat_add_aux(&to->merged, &from->merged);
|
blkg_rwstat_add_aux(&to->merged, &from->merged);
|
||||||
blkg_rwstat_add_aux(&to->service_time, &from->service_time);
|
blkg_rwstat_add_aux(&to->service_time, &from->service_time);
|
||||||
blkg_rwstat_add_aux(&to->wait_time, &from->wait_time);
|
blkg_rwstat_add_aux(&to->wait_time, &from->wait_time);
|
||||||
blkg_stat_add_aux(&from->time, &from->time);
|
bfq_stat_add_aux(&from->time, &from->time);
|
||||||
blkg_stat_add_aux(&to->avg_queue_size_sum, &from->avg_queue_size_sum);
|
bfq_stat_add_aux(&to->avg_queue_size_sum, &from->avg_queue_size_sum);
|
||||||
blkg_stat_add_aux(&to->avg_queue_size_samples,
|
bfq_stat_add_aux(&to->avg_queue_size_samples,
|
||||||
&from->avg_queue_size_samples);
|
&from->avg_queue_size_samples);
|
||||||
blkg_stat_add_aux(&to->dequeue, &from->dequeue);
|
bfq_stat_add_aux(&to->dequeue, &from->dequeue);
|
||||||
blkg_stat_add_aux(&to->group_wait_time, &from->group_wait_time);
|
bfq_stat_add_aux(&to->group_wait_time, &from->group_wait_time);
|
||||||
blkg_stat_add_aux(&to->idle_time, &from->idle_time);
|
bfq_stat_add_aux(&to->idle_time, &from->idle_time);
|
||||||
blkg_stat_add_aux(&to->empty_time, &from->empty_time);
|
bfq_stat_add_aux(&to->empty_time, &from->empty_time);
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -355,35 +431,35 @@ void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg)
|
||||||
|
|
||||||
static void bfqg_stats_exit(struct bfqg_stats *stats)
|
static void bfqg_stats_exit(struct bfqg_stats *stats)
|
||||||
{
|
{
|
||||||
#ifdef CONFIG_DEBUG_BLK_CGROUP
|
#ifdef CONFIG_BFQ_CGROUP_DEBUG
|
||||||
blkg_rwstat_exit(&stats->merged);
|
blkg_rwstat_exit(&stats->merged);
|
||||||
blkg_rwstat_exit(&stats->service_time);
|
blkg_rwstat_exit(&stats->service_time);
|
||||||
blkg_rwstat_exit(&stats->wait_time);
|
blkg_rwstat_exit(&stats->wait_time);
|
||||||
blkg_rwstat_exit(&stats->queued);
|
blkg_rwstat_exit(&stats->queued);
|
||||||
blkg_stat_exit(&stats->time);
|
bfq_stat_exit(&stats->time);
|
||||||
blkg_stat_exit(&stats->avg_queue_size_sum);
|
bfq_stat_exit(&stats->avg_queue_size_sum);
|
||||||
blkg_stat_exit(&stats->avg_queue_size_samples);
|
bfq_stat_exit(&stats->avg_queue_size_samples);
|
||||||
blkg_stat_exit(&stats->dequeue);
|
bfq_stat_exit(&stats->dequeue);
|
||||||
blkg_stat_exit(&stats->group_wait_time);
|
bfq_stat_exit(&stats->group_wait_time);
|
||||||
blkg_stat_exit(&stats->idle_time);
|
bfq_stat_exit(&stats->idle_time);
|
||||||
blkg_stat_exit(&stats->empty_time);
|
bfq_stat_exit(&stats->empty_time);
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
|
static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
|
||||||
{
|
{
|
||||||
#ifdef CONFIG_DEBUG_BLK_CGROUP
|
#ifdef CONFIG_BFQ_CGROUP_DEBUG
|
||||||
if (blkg_rwstat_init(&stats->merged, gfp) ||
|
if (blkg_rwstat_init(&stats->merged, gfp) ||
|
||||||
blkg_rwstat_init(&stats->service_time, gfp) ||
|
blkg_rwstat_init(&stats->service_time, gfp) ||
|
||||||
blkg_rwstat_init(&stats->wait_time, gfp) ||
|
blkg_rwstat_init(&stats->wait_time, gfp) ||
|
||||||
blkg_rwstat_init(&stats->queued, gfp) ||
|
blkg_rwstat_init(&stats->queued, gfp) ||
|
||||||
blkg_stat_init(&stats->time, gfp) ||
|
bfq_stat_init(&stats->time, gfp) ||
|
||||||
blkg_stat_init(&stats->avg_queue_size_sum, gfp) ||
|
bfq_stat_init(&stats->avg_queue_size_sum, gfp) ||
|
||||||
blkg_stat_init(&stats->avg_queue_size_samples, gfp) ||
|
bfq_stat_init(&stats->avg_queue_size_samples, gfp) ||
|
||||||
blkg_stat_init(&stats->dequeue, gfp) ||
|
bfq_stat_init(&stats->dequeue, gfp) ||
|
||||||
blkg_stat_init(&stats->group_wait_time, gfp) ||
|
bfq_stat_init(&stats->group_wait_time, gfp) ||
|
||||||
blkg_stat_init(&stats->idle_time, gfp) ||
|
bfq_stat_init(&stats->idle_time, gfp) ||
|
||||||
blkg_stat_init(&stats->empty_time, gfp)) {
|
bfq_stat_init(&stats->empty_time, gfp)) {
|
||||||
bfqg_stats_exit(stats);
|
bfqg_stats_exit(stats);
|
||||||
return -ENOMEM;
|
return -ENOMEM;
|
||||||
}
|
}
|
||||||
|
@ -909,7 +985,7 @@ static ssize_t bfq_io_set_weight(struct kernfs_open_file *of,
|
||||||
return ret ?: nbytes;
|
return ret ?: nbytes;
|
||||||
}
|
}
|
||||||
|
|
||||||
#ifdef CONFIG_DEBUG_BLK_CGROUP
|
#ifdef CONFIG_BFQ_CGROUP_DEBUG
|
||||||
static int bfqg_print_stat(struct seq_file *sf, void *v)
|
static int bfqg_print_stat(struct seq_file *sf, void *v)
|
||||||
{
|
{
|
||||||
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
|
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
|
||||||
|
@ -927,17 +1003,34 @@ static int bfqg_print_rwstat(struct seq_file *sf, void *v)
|
||||||
static u64 bfqg_prfill_stat_recursive(struct seq_file *sf,
|
static u64 bfqg_prfill_stat_recursive(struct seq_file *sf,
|
||||||
struct blkg_policy_data *pd, int off)
|
struct blkg_policy_data *pd, int off)
|
||||||
{
|
{
|
||||||
u64 sum = blkg_stat_recursive_sum(pd_to_blkg(pd),
|
struct blkcg_gq *blkg = pd_to_blkg(pd);
|
||||||
&blkcg_policy_bfq, off);
|
struct blkcg_gq *pos_blkg;
|
||||||
|
struct cgroup_subsys_state *pos_css;
|
||||||
|
u64 sum = 0;
|
||||||
|
|
||||||
|
lockdep_assert_held(&blkg->q->queue_lock);
|
||||||
|
|
||||||
|
rcu_read_lock();
|
||||||
|
blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
|
||||||
|
struct bfq_stat *stat;
|
||||||
|
|
||||||
|
if (!pos_blkg->online)
|
||||||
|
continue;
|
||||||
|
|
||||||
|
stat = (void *)blkg_to_pd(pos_blkg, &blkcg_policy_bfq) + off;
|
||||||
|
sum += bfq_stat_read(stat) + atomic64_read(&stat->aux_cnt);
|
||||||
|
}
|
||||||
|
rcu_read_unlock();
|
||||||
|
|
||||||
return __blkg_prfill_u64(sf, pd, sum);
|
return __blkg_prfill_u64(sf, pd, sum);
|
||||||
}
|
}
|
||||||
|
|
||||||
static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf,
|
static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf,
|
||||||
struct blkg_policy_data *pd, int off)
|
struct blkg_policy_data *pd, int off)
|
||||||
{
|
{
|
||||||
struct blkg_rwstat sum = blkg_rwstat_recursive_sum(pd_to_blkg(pd),
|
struct blkg_rwstat_sample sum;
|
||||||
&blkcg_policy_bfq,
|
|
||||||
off);
|
blkg_rwstat_recursive_sum(pd_to_blkg(pd), &blkcg_policy_bfq, off, &sum);
|
||||||
return __blkg_prfill_rwstat(sf, pd, &sum);
|
return __blkg_prfill_rwstat(sf, pd, &sum);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -975,12 +1068,13 @@ static int bfqg_print_stat_sectors(struct seq_file *sf, void *v)
|
||||||
static u64 bfqg_prfill_sectors_recursive(struct seq_file *sf,
|
static u64 bfqg_prfill_sectors_recursive(struct seq_file *sf,
|
||||||
struct blkg_policy_data *pd, int off)
|
struct blkg_policy_data *pd, int off)
|
||||||
{
|
{
|
||||||
struct blkg_rwstat tmp = blkg_rwstat_recursive_sum(pd->blkg, NULL,
|
struct blkg_rwstat_sample tmp;
|
||||||
offsetof(struct blkcg_gq, stat_bytes));
|
|
||||||
u64 sum = atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_READ]) +
|
|
||||||
atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_WRITE]);
|
|
||||||
|
|
||||||
return __blkg_prfill_u64(sf, pd, sum >> 9);
|
blkg_rwstat_recursive_sum(pd->blkg, NULL,
|
||||||
|
offsetof(struct blkcg_gq, stat_bytes), &tmp);
|
||||||
|
|
||||||
|
return __blkg_prfill_u64(sf, pd,
|
||||||
|
(tmp.cnt[BLKG_RWSTAT_READ] + tmp.cnt[BLKG_RWSTAT_WRITE]) >> 9);
|
||||||
}
|
}
|
||||||
|
|
||||||
static int bfqg_print_stat_sectors_recursive(struct seq_file *sf, void *v)
|
static int bfqg_print_stat_sectors_recursive(struct seq_file *sf, void *v)
|
||||||
|
@ -995,11 +1089,11 @@ static u64 bfqg_prfill_avg_queue_size(struct seq_file *sf,
|
||||||
struct blkg_policy_data *pd, int off)
|
struct blkg_policy_data *pd, int off)
|
||||||
{
|
{
|
||||||
struct bfq_group *bfqg = pd_to_bfqg(pd);
|
struct bfq_group *bfqg = pd_to_bfqg(pd);
|
||||||
u64 samples = blkg_stat_read(&bfqg->stats.avg_queue_size_samples);
|
u64 samples = bfq_stat_read(&bfqg->stats.avg_queue_size_samples);
|
||||||
u64 v = 0;
|
u64 v = 0;
|
||||||
|
|
||||||
if (samples) {
|
if (samples) {
|
||||||
v = blkg_stat_read(&bfqg->stats.avg_queue_size_sum);
|
v = bfq_stat_read(&bfqg->stats.avg_queue_size_sum);
|
||||||
v = div64_u64(v, samples);
|
v = div64_u64(v, samples);
|
||||||
}
|
}
|
||||||
__blkg_prfill_u64(sf, pd, v);
|
__blkg_prfill_u64(sf, pd, v);
|
||||||
|
@ -1014,7 +1108,7 @@ static int bfqg_print_avg_queue_size(struct seq_file *sf, void *v)
|
||||||
0, false);
|
0, false);
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
#endif /* CONFIG_DEBUG_BLK_CGROUP */
|
#endif /* CONFIG_BFQ_CGROUP_DEBUG */
|
||||||
|
|
||||||
struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
|
struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
|
||||||
{
|
{
|
||||||
|
@ -1062,7 +1156,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
|
||||||
.private = (unsigned long)&blkcg_policy_bfq,
|
.private = (unsigned long)&blkcg_policy_bfq,
|
||||||
.seq_show = blkg_print_stat_ios,
|
.seq_show = blkg_print_stat_ios,
|
||||||
},
|
},
|
||||||
#ifdef CONFIG_DEBUG_BLK_CGROUP
|
#ifdef CONFIG_BFQ_CGROUP_DEBUG
|
||||||
{
|
{
|
||||||
.name = "bfq.time",
|
.name = "bfq.time",
|
||||||
.private = offsetof(struct bfq_group, stats.time),
|
.private = offsetof(struct bfq_group, stats.time),
|
||||||
|
@ -1092,7 +1186,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
|
||||||
.private = offsetof(struct bfq_group, stats.queued),
|
.private = offsetof(struct bfq_group, stats.queued),
|
||||||
.seq_show = bfqg_print_rwstat,
|
.seq_show = bfqg_print_rwstat,
|
||||||
},
|
},
|
||||||
#endif /* CONFIG_DEBUG_BLK_CGROUP */
|
#endif /* CONFIG_BFQ_CGROUP_DEBUG */
|
||||||
|
|
||||||
/* the same statistics which cover the bfqg and its descendants */
|
/* the same statistics which cover the bfqg and its descendants */
|
||||||
{
|
{
|
||||||
|
@ -1105,7 +1199,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
|
||||||
.private = (unsigned long)&blkcg_policy_bfq,
|
.private = (unsigned long)&blkcg_policy_bfq,
|
||||||
.seq_show = blkg_print_stat_ios_recursive,
|
.seq_show = blkg_print_stat_ios_recursive,
|
||||||
},
|
},
|
||||||
#ifdef CONFIG_DEBUG_BLK_CGROUP
|
#ifdef CONFIG_BFQ_CGROUP_DEBUG
|
||||||
{
|
{
|
||||||
.name = "bfq.time_recursive",
|
.name = "bfq.time_recursive",
|
||||||
.private = offsetof(struct bfq_group, stats.time),
|
.private = offsetof(struct bfq_group, stats.time),
|
||||||
|
@ -1159,7 +1253,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
|
||||||
.private = offsetof(struct bfq_group, stats.dequeue),
|
.private = offsetof(struct bfq_group, stats.dequeue),
|
||||||
.seq_show = bfqg_print_stat,
|
.seq_show = bfqg_print_stat,
|
||||||
},
|
},
|
||||||
#endif /* CONFIG_DEBUG_BLK_CGROUP */
|
#endif /* CONFIG_BFQ_CGROUP_DEBUG */
|
||||||
{ } /* terminate */
|
{ } /* terminate */
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
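The bfqg_prfill_stat_recursive() change above open-codes the hierarchical sum that the removed blkg_stat_recursive_sum() helper used to provide: a pre-order walk over the cgroup subtree that skips offline groups and adds each group's current count plus its aux count. A minimal userspace sketch of that walk, assuming a toy binary tree in place of the real cgroup hierarchy (struct toy_blkg and toy_recursive_sum are illustrative stand-ins, not kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* toy stand-in for struct blkcg_gq plus the stat being summed */
struct toy_blkg {
	bool online;
	int64_t cnt;		/* current count */
	int64_t aux_cnt;	/* count inherited from exited children */
	struct toy_blkg *child[2];
};

/* models the blkg_for_each_descendant_pre() loop: visit every node,
 * skip groups that are not online (but still descend into them), and
 * accumulate cnt + aux_cnt for the rest */
static int64_t toy_recursive_sum(const struct toy_blkg *blkg)
{
	int64_t sum = 0;
	int i;

	if (!blkg)
		return 0;
	if (blkg->online)
		sum += blkg->cnt + blkg->aux_cnt;
	for (i = 0; i < 2; i++)
		sum += toy_recursive_sum(blkg->child[i]);
	return sum;
}
```

The offline check mirrors the `if (!pos_blkg->online) continue;` in the diff: an offline group contributes nothing itself, but the walk does not stop there.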
[File diff suppressed because it is too large]
block/bfq-iosched.h:

@@ -357,6 +357,24 @@ struct bfq_queue {

 	/* max service rate measured so far */
 	u32 max_service_rate;
+
+	/*
+	 * Pointer to the waker queue for this queue, i.e., to the
+	 * queue Q such that this queue happens to get new I/O right
+	 * after some I/O request of Q is completed. For details, see
+	 * the comments on the choice of the queue for injection in
+	 * bfq_select_queue().
+	 */
+	struct bfq_queue *waker_bfqq;
+	/* node for woken_list, see below */
+	struct hlist_node woken_list_node;
+	/*
+	 * Head of the list of the woken queues for this queue, i.e.,
+	 * of the list of the queues for which this queue is a waker
+	 * queue. This list is used to reset the waker_bfqq pointer in
+	 * the woken queues when this queue exits.
+	 */
+	struct hlist_head woken_list;
 };

 /**

@@ -533,6 +551,9 @@ struct bfq_data {
 	/* time of last request completion (ns) */
 	u64 last_completion;

+	/* bfqq owning the last completed rq */
+	struct bfq_queue *last_completed_rq_bfqq;
+
 	/* time of last transition from empty to non-empty (ns) */
 	u64 last_empty_occupied_ns;

@@ -743,7 +764,8 @@ enum bfqq_state_flags {
 				 * update
 				 */
 	BFQQF_coop,		/* bfqq is shared */
-	BFQQF_split_coop	/* shared bfqq will be split */
+	BFQQF_split_coop,	/* shared bfqq will be split */
+	BFQQF_has_waker		/* bfqq has a waker queue */
 };

 #define BFQ_BFQQ_FNS(name)						\

@@ -763,6 +785,7 @@ BFQ_BFQQ_FNS(in_large_burst);
 BFQ_BFQQ_FNS(coop);
 BFQ_BFQQ_FNS(split_coop);
 BFQ_BFQQ_FNS(softrt_update);
+BFQ_BFQQ_FNS(has_waker);
 #undef BFQ_BFQQ_FNS

 /* Expiration reasons. */

@@ -777,8 +800,13 @@ enum bfqq_expiration {
 	BFQQE_PREEMPTED		/* preemption in progress */
 };

+struct bfq_stat {
+	struct percpu_counter		cpu_cnt;
+	atomic64_t			aux_cnt;
+};
+
 struct bfqg_stats {
-#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
+#ifdef CONFIG_BFQ_CGROUP_DEBUG
 	/* number of ios merged */
 	struct blkg_rwstat merged;
 	/* total time spent on device in ns, may not be accurate w/ queueing */

@@ -788,25 +816,25 @@ struct bfqg_stats {
 	/* number of IOs queued up */
 	struct blkg_rwstat queued;
 	/* total disk time and nr sectors dispatched by this group */
-	struct blkg_stat		time;
+	struct bfq_stat			time;
 	/* sum of number of ios queued across all samples */
-	struct blkg_stat		avg_queue_size_sum;
+	struct bfq_stat			avg_queue_size_sum;
 	/* count of samples taken for average */
-	struct blkg_stat		avg_queue_size_samples;
+	struct bfq_stat			avg_queue_size_samples;
 	/* how many times this group has been removed from service tree */
-	struct blkg_stat		dequeue;
+	struct bfq_stat			dequeue;
 	/* total time spent waiting for it to be assigned a timeslice. */
-	struct blkg_stat		group_wait_time;
+	struct bfq_stat			group_wait_time;
 	/* time spent idling for this blkcg_gq */
-	struct blkg_stat		idle_time;
+	struct bfq_stat			idle_time;
 	/* total time with empty current active q with other requests queued */
-	struct blkg_stat		empty_time;
+	struct bfq_stat			empty_time;
 	/* fields after this shouldn't be cleared on stat reset */
 	u64				start_group_wait_time;
 	u64				start_idle_time;
 	u64				start_empty_time;
 	uint16_t			flags;
-#endif	/* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
+#endif	/* CONFIG_BFQ_CGROUP_DEBUG */
 };

 #ifdef CONFIG_BFQ_GROUP_IOSCHED
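The new struct bfq_stat above pairs a fast per-CPU counter with an aux counter that absorbs the totals of groups that have gone away, which is how bfq keeps stats over its private copy of the old blkg_stat pattern. A minimal userspace sketch of that pattern, assuming plain arrays in place of percpu_counter/atomic64_t (NCPU and the toy_stat_* helpers are illustrative stand-ins, not kernel APIs):

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4	/* illustrative stand-in for the number of CPUs */

/* models struct bfq_stat: percpu cpu_cnt plus an aux_cnt */
struct toy_stat {
	int64_t cpu_cnt[NCPU];
	int64_t aux_cnt;
};

/* fast path: bump only the local CPU's slot, no shared cacheline */
static inline void toy_stat_add(struct toy_stat *s, int cpu, int64_t v)
{
	s->cpu_cnt[cpu] += v;
}

/* slow path: a read sums every CPU slot plus the aux count */
static inline int64_t toy_stat_read(const struct toy_stat *s)
{
	int64_t sum = s->aux_cnt;
	int cpu;

	for (cpu = 0; cpu < NCPU; cpu++)
		sum += s->cpu_cnt[cpu];
	return sum;
}

/* when a group exits, fold its whole total into the parent's aux count
 * (the role of bfq_stat_add_aux in the diffs above) */
static inline void toy_stat_add_aux(struct toy_stat *to,
				    const struct toy_stat *from)
{
	to->aux_cnt += toy_stat_read(from);
}
```

Updates stay cheap because only reads have to visit every CPU slot; the aux counter is what lets a parent keep a dead child's contribution without walking it again.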
block/bio.c (96 changed lines):
@@ -558,14 +558,6 @@ void bio_put(struct bio *bio)
 }
 EXPORT_SYMBOL(bio_put);

-int bio_phys_segments(struct request_queue *q, struct bio *bio)
-{
-	if (unlikely(!bio_flagged(bio, BIO_SEG_VALID)))
-		blk_recount_segments(q, bio);
-
-	return bio->bi_phys_segments;
-}
-
 /**
  * __bio_clone_fast - clone a bio that shares the original bio's biovec
  * @bio:	destination bio

@@ -731,10 +723,10 @@ static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 		}
 	}

-	if (bio_full(bio))
+	if (bio_full(bio, len))
 		return 0;

-	if (bio->bi_phys_segments >= queue_max_segments(q))
+	if (bio->bi_vcnt >= queue_max_segments(q))
 		return 0;

 	bvec = &bio->bi_io_vec[bio->bi_vcnt];

@@ -744,8 +736,6 @@ static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 	bio->bi_vcnt++;
 done:
 	bio->bi_iter.bi_size += len;
-	bio->bi_phys_segments = bio->bi_vcnt;
-	bio_set_flag(bio, BIO_SEG_VALID);
 	return len;
 }

@@ -807,7 +797,7 @@ void __bio_add_page(struct bio *bio, struct page *page,
 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt];

 	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
-	WARN_ON_ONCE(bio_full(bio));
+	WARN_ON_ONCE(bio_full(bio, len));

 	bv->bv_page = page;
 	bv->bv_offset = off;

@@ -834,7 +824,7 @@ int bio_add_page(struct bio *bio, struct page *page,
 	bool same_page = false;

 	if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) {
-		if (bio_full(bio))
+		if (bio_full(bio, len))
 			return 0;
 		__bio_add_page(bio, page, len, offset);
 	}

@@ -842,22 +832,19 @@ int bio_add_page(struct bio *bio, struct page *page,
 }
 EXPORT_SYMBOL(bio_add_page);

-static void bio_get_pages(struct bio *bio)
+void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
 	struct bvec_iter_all iter_all;
 	struct bio_vec *bvec;

-	bio_for_each_segment_all(bvec, bio, iter_all)
-		get_page(bvec->bv_page);
-}
-
-static void bio_release_pages(struct bio *bio)
-{
-	struct bvec_iter_all iter_all;
-	struct bio_vec *bvec;
-
-	bio_for_each_segment_all(bvec, bio, iter_all)
+	if (bio_flagged(bio, BIO_NO_PAGE_REF))
+		return;
+
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		if (mark_dirty && !PageCompound(bvec->bv_page))
+			set_page_dirty_lock(bvec->bv_page);
 		put_page(bvec->bv_page);
+	}
 }

 static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)

@@ -922,7 +909,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 			if (same_page)
 				put_page(page);
 		} else {
-			if (WARN_ON_ONCE(bio_full(bio)))
+			if (WARN_ON_ONCE(bio_full(bio, len)))
 				return -EINVAL;
 			__bio_add_page(bio, page, len, offset);
 		}

@@ -966,13 +953,10 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 			ret = __bio_iov_bvec_add_pages(bio, iter);
 		else
 			ret = __bio_iov_iter_get_pages(bio, iter);
-	} while (!ret && iov_iter_count(iter) && !bio_full(bio));
+	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));

-	if (iov_iter_bvec_no_ref(iter))
+	if (is_bvec)
 		bio_set_flag(bio, BIO_NO_PAGE_REF);
-	else if (is_bvec)
-		bio_get_pages(bio);

 	return bio->bi_vcnt ? 0 : ret;
 }

@@ -1124,8 +1108,7 @@ static struct bio_map_data *bio_alloc_map_data(struct iov_iter *data,
 	if (data->nr_segs > UIO_MAXIOV)
 		return NULL;

-	bmd = kmalloc(sizeof(struct bio_map_data) +
-		       sizeof(struct iovec) * data->nr_segs, gfp_mask);
+	bmd = kmalloc(struct_size(bmd, iov, data->nr_segs), gfp_mask);
 	if (!bmd)
 		return NULL;
 	memcpy(bmd->iov, data->iov, sizeof(struct iovec) * data->nr_segs);

@@ -1371,8 +1354,6 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 	int j;
 	struct bio *bio;
 	int ret;
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;

 	if (!iov_iter_count(iter))
 		return ERR_PTR(-EINVAL);

@@ -1439,31 +1420,11 @@ struct bio *bio_map_user_iov(struct request_queue *q,
 	return bio;

 out_unmap:
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		put_page(bvec->bv_page);
-	}
+	bio_release_pages(bio, false);
 	bio_put(bio);
 	return ERR_PTR(ret);
 }

-static void __bio_unmap_user(struct bio *bio)
-{
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
-
-	/*
-	 * make sure we dirty pages we wrote to
-	 */
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		if (bio_data_dir(bio) == READ)
-			set_page_dirty_lock(bvec->bv_page);
-
-		put_page(bvec->bv_page);
-	}
-
-	bio_put(bio);
-}
-
 /**
  *	bio_unmap_user	-	unmap a bio
  *	@bio:		the bio being unmapped

@@ -1475,7 +1436,8 @@ static void __bio_unmap_user(struct bio *bio)
  */
 void bio_unmap_user(struct bio *bio)
 {
-	__bio_unmap_user(bio);
+	bio_release_pages(bio, bio_data_dir(bio) == READ);
+	bio_put(bio);
 	bio_put(bio);
 }

@@ -1695,9 +1657,7 @@ static void bio_dirty_fn(struct work_struct *work)
 	while ((bio = next) != NULL) {
 		next = bio->bi_private;

-		bio_set_pages_dirty(bio);
-		if (!bio_flagged(bio, BIO_NO_PAGE_REF))
-			bio_release_pages(bio);
+		bio_release_pages(bio, true);
 		bio_put(bio);
 	}
 }

@@ -1713,8 +1673,7 @@ void bio_check_pages_dirty(struct bio *bio)
 			goto defer;
 	}

-	if (!bio_flagged(bio, BIO_NO_PAGE_REF))
-		bio_release_pages(bio);
+	bio_release_pages(bio, false);
 	bio_put(bio);
 	return;
 defer:

@@ -1775,18 +1734,6 @@ void generic_end_io_acct(struct request_queue *q, int req_op,
 }
 EXPORT_SYMBOL(generic_end_io_acct);

-#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
-void bio_flush_dcache_pages(struct bio *bi)
-{
-	struct bio_vec bvec;
-	struct bvec_iter iter;
-
-	bio_for_each_segment(bvec, bi, iter)
-		flush_dcache_page(bvec.bv_page);
-}
-EXPORT_SYMBOL(bio_flush_dcache_pages);
-#endif
-
 static inline bool bio_remaining_done(struct bio *bio)
 {
 	/*

@@ -1914,10 +1861,7 @@ void bio_trim(struct bio *bio, int offset, int size)
 	if (offset == 0 && size == bio->bi_iter.bi_size)
 		return;

-	bio_clear_flag(bio, BIO_SEG_VALID);
-
 	bio_advance(bio, offset << 9);

 	bio->bi_iter.bi_size = size;

 	if (bio_integrity(bio))
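The block/bio.c changes above fold the old get/put and dirty-on-read paths into one exported bio_release_pages(bio, mark_dirty) helper that is a no-op when the bio never took page references (BIO_NO_PAGE_REF, the ITER_BVEC case). A minimal userspace sketch of that control flow, assuming toy stand-ins for the kernel types (struct toy_page, struct toy_bio and toy_release_pages are illustrative, not kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>

/* toy stand-ins for struct page and struct bio */
struct toy_page {
	int refcount;
	bool dirty;
};

struct toy_bio {
	bool no_page_ref;	/* models the BIO_NO_PAGE_REF flag */
	int nr_pages;
	struct toy_page *pages[8];
};

/* models bio_release_pages(bio, mark_dirty): optionally dirty, then
 * drop, every page reference the bio holds */
static void toy_release_pages(struct toy_bio *bio, bool mark_dirty)
{
	int i;

	if (bio->no_page_ref)	/* ITER_BVEC: no references were taken */
		return;

	for (i = 0; i < bio->nr_pages; i++) {
		if (mark_dirty)		/* read completions dirty the page */
			bio->pages[i]->dirty = true;
		bio->pages[i]->refcount--;	/* models put_page() */
	}
}
```

Callers then pick the policy at the call site, as the diff does: `bio_release_pages(bio, bio_data_dir(bio) == READ)` in bio_unmap_user(), `true` in bio_dirty_fn(), `false` on error paths.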
block/blk-cgroup.c:

@@ -79,6 +79,7 @@ static void blkg_free(struct blkcg_gq *blkg)

 	blkg_rwstat_exit(&blkg->stat_ios);
 	blkg_rwstat_exit(&blkg->stat_bytes);
+	percpu_ref_exit(&blkg->refcnt);
 	kfree(blkg);
 }

@@ -86,8 +87,6 @@ static void __blkg_release(struct rcu_head *rcu)
 {
 	struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);

-	percpu_ref_exit(&blkg->refcnt);
-
 	/* release the blkcg and parent blkg refs this blkg has been holding */
 	css_put(&blkg->blkcg->css);
 	if (blkg->parent)

@@ -132,6 +131,9 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
 	if (!blkg)
 		return NULL;

+	if (percpu_ref_init(&blkg->refcnt, blkg_release, 0, gfp_mask))
+		goto err_free;
+
 	if (blkg_rwstat_init(&blkg->stat_bytes, gfp_mask) ||
 	    blkg_rwstat_init(&blkg->stat_ios, gfp_mask))
 		goto err_free;

@@ -244,11 +246,6 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 		blkg_get(blkg->parent);
 	}

-	ret = percpu_ref_init(&blkg->refcnt, blkg_release, 0,
-			      GFP_NOWAIT | __GFP_NOWARN);
-	if (ret)
-		goto err_cancel_ref;
-
 	/* invoke per-policy init */
 	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];

@@ -281,8 +278,6 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 	blkg_put(blkg);
 	return ERR_PTR(ret);

-err_cancel_ref:
-	percpu_ref_exit(&blkg->refcnt);
 err_put_congested:
 	wb_congested_put(wb_congested);
 err_put_css:

@@ -549,7 +544,7 @@ EXPORT_SYMBOL_GPL(__blkg_prfill_u64);
  * Print @rwstat to @sf for the device assocaited with @pd.
  */
 u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
-			 const struct blkg_rwstat *rwstat)
+			 const struct blkg_rwstat_sample *rwstat)
 {
 	static const char *rwstr[] = {
 		[BLKG_RWSTAT_READ]	= "Read",

@@ -567,30 +562,16 @@ u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,

 	for (i = 0; i < BLKG_RWSTAT_NR; i++)
 		seq_printf(sf, "%s %s %llu\n", dname, rwstr[i],
-			   (unsigned long long)atomic64_read(&rwstat->aux_cnt[i]));
+			   rwstat->cnt[i]);

-	v = atomic64_read(&rwstat->aux_cnt[BLKG_RWSTAT_READ]) +
-		atomic64_read(&rwstat->aux_cnt[BLKG_RWSTAT_WRITE]) +
-		atomic64_read(&rwstat->aux_cnt[BLKG_RWSTAT_DISCARD]);
-	seq_printf(sf, "%s Total %llu\n", dname, (unsigned long long)v);
+	v = rwstat->cnt[BLKG_RWSTAT_READ] +
+		rwstat->cnt[BLKG_RWSTAT_WRITE] +
+		rwstat->cnt[BLKG_RWSTAT_DISCARD];
+	seq_printf(sf, "%s Total %llu\n", dname, v);
 	return v;
 }
 EXPORT_SYMBOL_GPL(__blkg_prfill_rwstat);

-/**
- * blkg_prfill_stat - prfill callback for blkg_stat
- * @sf: seq_file to print to
- * @pd: policy private data of interest
- * @off: offset to the blkg_stat in @pd
- *
- * prfill callback for printing a blkg_stat.
- */
-u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd, int off)
-{
-	return __blkg_prfill_u64(sf, pd, blkg_stat_read((void *)pd + off));
-}
-EXPORT_SYMBOL_GPL(blkg_prfill_stat);
-
 /**
  * blkg_prfill_rwstat - prfill callback for blkg_rwstat
  * @sf: seq_file to print to

@@ -602,8 +583,9 @@ EXPORT_SYMBOL_GPL(blkg_prfill_stat);
 u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
 		       int off)
 {
-	struct blkg_rwstat rwstat = blkg_rwstat_read((void *)pd + off);
+	struct blkg_rwstat_sample rwstat = { };

+	blkg_rwstat_read((void *)pd + off, &rwstat);
 	return __blkg_prfill_rwstat(sf, pd, &rwstat);
 }
 EXPORT_SYMBOL_GPL(blkg_prfill_rwstat);

@@ -611,8 +593,9 @@ EXPORT_SYMBOL_GPL(blkg_prfill_rwstat);
 static u64 blkg_prfill_rwstat_field(struct seq_file *sf,
 				    struct blkg_policy_data *pd, int off)
 {
-	struct blkg_rwstat rwstat = blkg_rwstat_read((void *)pd->blkg + off);
+	struct blkg_rwstat_sample rwstat = { };

+	blkg_rwstat_read((void *)pd->blkg + off, &rwstat);
 	return __blkg_prfill_rwstat(sf, pd, &rwstat);
 }

@@ -654,8 +637,9 @@ static u64 blkg_prfill_rwstat_field_recursive(struct seq_file *sf,
 					      struct blkg_policy_data *pd,
 					      int off)
 {
-	struct blkg_rwstat rwstat = blkg_rwstat_recursive_sum(pd->blkg,
-							      NULL, off);
+	struct blkg_rwstat_sample rwstat;
+
+	blkg_rwstat_recursive_sum(pd->blkg, NULL, off, &rwstat);
 	return __blkg_prfill_rwstat(sf, pd, &rwstat);
 }

@@ -689,53 +673,12 @@ int blkg_print_stat_ios_recursive(struct seq_file *sf, void *v)
 }
 EXPORT_SYMBOL_GPL(blkg_print_stat_ios_recursive);

-/**
- * blkg_stat_recursive_sum - collect hierarchical blkg_stat
- * @blkg: blkg of interest
- * @pol: blkcg_policy which contains the blkg_stat
- * @off: offset to the blkg_stat in blkg_policy_data or @blkg
- *
- * Collect the blkg_stat specified by @blkg, @pol and @off and all its
- * online descendants and their aux counts.  The caller must be holding the
- * queue lock for online tests.
- *
- * If @pol is NULL, blkg_stat is at @off bytes into @blkg; otherwise, it is
- * at @off bytes into @blkg's blkg_policy_data of the policy.
- */
-u64 blkg_stat_recursive_sum(struct blkcg_gq *blkg,
-			    struct blkcg_policy *pol, int off)
-{
-	struct blkcg_gq *pos_blkg;
-	struct cgroup_subsys_state *pos_css;
-	u64 sum = 0;
-
-	lockdep_assert_held(&blkg->q->queue_lock);
-
-	rcu_read_lock();
-	blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
-		struct blkg_stat *stat;
-
-		if (!pos_blkg->online)
-			continue;
-
-		if (pol)
-			stat = (void *)blkg_to_pd(pos_blkg, pol) + off;
-		else
-			stat = (void *)blkg + off;
-
-		sum += blkg_stat_read(stat) + atomic64_read(&stat->aux_cnt);
-	}
-	rcu_read_unlock();
-
-	return sum;
-}
-EXPORT_SYMBOL_GPL(blkg_stat_recursive_sum);
-
 /**
  * blkg_rwstat_recursive_sum - collect hierarchical blkg_rwstat
  * @blkg: blkg of interest
  * @pol: blkcg_policy which contains the blkg_rwstat
* @off: offset to the blkg_rwstat in blkg_policy_data or @blkg
|
* @off: offset to the blkg_rwstat in blkg_policy_data or @blkg
|
||||||
|
* @sum: blkg_rwstat_sample structure containing the results
|
||||||
*
|
*
|
||||||
* Collect the blkg_rwstat specified by @blkg, @pol and @off and all its
|
* Collect the blkg_rwstat specified by @blkg, @pol and @off and all its
|
||||||
* online descendants and their aux counts. The caller must be holding the
|
* online descendants and their aux counts. The caller must be holding the
|
||||||
|
@ -744,13 +687,12 @@ EXPORT_SYMBOL_GPL(blkg_stat_recursive_sum);
|
||||||
* If @pol is NULL, blkg_rwstat is at @off bytes into @blkg; otherwise, it
|
* If @pol is NULL, blkg_rwstat is at @off bytes into @blkg; otherwise, it
|
||||||
* is at @off bytes into @blkg's blkg_policy_data of the policy.
|
* is at @off bytes into @blkg's blkg_policy_data of the policy.
|
||||||
*/
|
*/
|
||||||
struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
|
void blkg_rwstat_recursive_sum(struct blkcg_gq *blkg, struct blkcg_policy *pol,
|
||||||
struct blkcg_policy *pol, int off)
|
int off, struct blkg_rwstat_sample *sum)
|
||||||
{
|
{
|
||||||
struct blkcg_gq *pos_blkg;
|
struct blkcg_gq *pos_blkg;
|
||||||
struct cgroup_subsys_state *pos_css;
|
struct cgroup_subsys_state *pos_css;
|
||||||
struct blkg_rwstat sum = { };
|
unsigned int i;
|
||||||
int i;
|
|
||||||
|
|
||||||
lockdep_assert_held(&blkg->q->queue_lock);
|
lockdep_assert_held(&blkg->q->queue_lock);
|
||||||
|
|
||||||
|
@ -767,13 +709,9 @@ struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
|
||||||
rwstat = (void *)pos_blkg + off;
|
rwstat = (void *)pos_blkg + off;
|
||||||
|
|
||||||
for (i = 0; i < BLKG_RWSTAT_NR; i++)
|
for (i = 0; i < BLKG_RWSTAT_NR; i++)
|
||||||
atomic64_add(atomic64_read(&rwstat->aux_cnt[i]) +
|
sum->cnt[i] = blkg_rwstat_read_counter(rwstat, i);
|
||||||
percpu_counter_sum_positive(&rwstat->cpu_cnt[i]),
|
|
||||||
&sum.aux_cnt[i]);
|
|
||||||
}
|
}
|
||||||
rcu_read_unlock();
|
rcu_read_unlock();
|
||||||
|
|
||||||
return sum;
|
|
||||||
}
|
}
|
||||||
EXPORT_SYMBOL_GPL(blkg_rwstat_recursive_sum);
|
EXPORT_SYMBOL_GPL(blkg_rwstat_recursive_sum);
|
||||||
|
|
||||||
@@ -939,7 +877,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
 		const char *dname;
 		char *buf;
-		struct blkg_rwstat rwstat;
+		struct blkg_rwstat_sample rwstat;
 		u64 rbytes, wbytes, rios, wios, dbytes, dios;
 		size_t size = seq_get_buf(sf, &buf), off = 0;
 		int i;
@@ -959,17 +897,17 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 
 		spin_lock_irq(&blkg->q->queue_lock);
 
-		rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
-					offsetof(struct blkcg_gq, stat_bytes));
-		rbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
-		wbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
-		dbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_DISCARD]);
+		blkg_rwstat_recursive_sum(blkg, NULL,
+				offsetof(struct blkcg_gq, stat_bytes), &rwstat);
+		rbytes = rwstat.cnt[BLKG_RWSTAT_READ];
+		wbytes = rwstat.cnt[BLKG_RWSTAT_WRITE];
+		dbytes = rwstat.cnt[BLKG_RWSTAT_DISCARD];
 
-		rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
-					offsetof(struct blkcg_gq, stat_ios));
-		rios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
-		wios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
-		dios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_DISCARD]);
+		blkg_rwstat_recursive_sum(blkg, NULL,
+				offsetof(struct blkcg_gq, stat_ios), &rwstat);
+		rios = rwstat.cnt[BLKG_RWSTAT_READ];
+		wios = rwstat.cnt[BLKG_RWSTAT_WRITE];
+		dios = rwstat.cnt[BLKG_RWSTAT_DISCARD];
 
 		spin_unlock_irq(&blkg->q->queue_lock);
 
@@ -1006,8 +944,12 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 		}
 next:
 		if (has_stats) {
-			off += scnprintf(buf+off, size-off, "\n");
-			seq_commit(sf, off);
+			if (off < size - 1) {
+				off += scnprintf(buf+off, size-off, "\n");
+				seq_commit(sf, off);
+			} else {
+				seq_commit(sf, -1);
+			}
 		}
 	}
 
@@ -1391,7 +1333,8 @@ pd_prealloc:
 
 	spin_lock_irq(&q->queue_lock);
 
-	list_for_each_entry(blkg, &q->blkg_list, q_node) {
+	/* blkg_list is pushed at the head, reverse walk to init parents first */
+	list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
 		struct blkg_policy_data *pd;
 
 		if (blkg->pd[pol->plid])

--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -120,6 +120,42 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
 }
 EXPORT_SYMBOL(blk_rq_init);
 
+#define REQ_OP_NAME(name) [REQ_OP_##name] = #name
+static const char *const blk_op_name[] = {
+	REQ_OP_NAME(READ),
+	REQ_OP_NAME(WRITE),
+	REQ_OP_NAME(FLUSH),
+	REQ_OP_NAME(DISCARD),
+	REQ_OP_NAME(SECURE_ERASE),
+	REQ_OP_NAME(ZONE_RESET),
+	REQ_OP_NAME(WRITE_SAME),
+	REQ_OP_NAME(WRITE_ZEROES),
+	REQ_OP_NAME(SCSI_IN),
+	REQ_OP_NAME(SCSI_OUT),
+	REQ_OP_NAME(DRV_IN),
+	REQ_OP_NAME(DRV_OUT),
+};
+#undef REQ_OP_NAME
+
+/**
+ * blk_op_str - Return string XXX in the REQ_OP_XXX.
+ * @op: REQ_OP_XXX.
+ *
+ * Description: Centralize block layer function to convert REQ_OP_XXX into
+ * string format. Useful in the debugging and tracing bio or request. For
+ * invalid REQ_OP_XXX it returns string "UNKNOWN".
+ */
+inline const char *blk_op_str(unsigned int op)
+{
+	const char *op_str = "UNKNOWN";
+
+	if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
+		op_str = blk_op_name[op];
+
+	return op_str;
+}
+EXPORT_SYMBOL_GPL(blk_op_str);
+
 static const struct {
 	int		errno;
 	const char	*name;
@@ -167,18 +203,23 @@ int blk_status_to_errno(blk_status_t status)
 }
 EXPORT_SYMBOL_GPL(blk_status_to_errno);
 
-static void print_req_error(struct request *req, blk_status_t status)
+static void print_req_error(struct request *req, blk_status_t status,
+		const char *caller)
 {
 	int idx = (__force int)status;
 
 	if (WARN_ON_ONCE(idx >= ARRAY_SIZE(blk_errors)))
 		return;
 
-	printk_ratelimited(KERN_ERR "%s: %s error, dev %s, sector %llu flags %x\n",
-				__func__, blk_errors[idx].name,
-				req->rq_disk ? req->rq_disk->disk_name : "?",
-				(unsigned long long)blk_rq_pos(req),
-				req->cmd_flags);
+	printk_ratelimited(KERN_ERR
+		"%s: %s error, dev %s, sector %llu op 0x%x:(%s) flags 0x%x "
+		"phys_seg %u prio class %u\n",
+		caller, blk_errors[idx].name,
+		req->rq_disk ? req->rq_disk->disk_name : "?",
+		blk_rq_pos(req), req_op(req), blk_op_str(req_op(req)),
+		req->cmd_flags & ~REQ_OP_MASK,
+		req->nr_phys_segments,
+		IOPRIO_PRIO_CLASS(req->ioprio));
 }
 
 static void req_bio_endio(struct request *rq, struct bio *bio,
@@ -550,15 +591,15 @@ void blk_put_request(struct request *req)
 }
 EXPORT_SYMBOL(blk_put_request);
 
-bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
-			    struct bio *bio)
+bool bio_attempt_back_merge(struct request *req, struct bio *bio,
+		unsigned int nr_segs)
 {
 	const int ff = bio->bi_opf & REQ_FAILFAST_MASK;
 
-	if (!ll_back_merge_fn(q, req, bio))
+	if (!ll_back_merge_fn(req, bio, nr_segs))
 		return false;
 
-	trace_block_bio_backmerge(q, req, bio);
+	trace_block_bio_backmerge(req->q, req, bio);
 
 	if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
 		blk_rq_set_mixed_merge(req);
@@ -571,15 +612,15 @@ bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
 	return true;
 }
 
-bool bio_attempt_front_merge(struct request_queue *q, struct request *req,
-			     struct bio *bio)
+bool bio_attempt_front_merge(struct request *req, struct bio *bio,
+		unsigned int nr_segs)
 {
 	const int ff = bio->bi_opf & REQ_FAILFAST_MASK;
 
-	if (!ll_front_merge_fn(q, req, bio))
+	if (!ll_front_merge_fn(req, bio, nr_segs))
 		return false;
 
-	trace_block_bio_frontmerge(q, req, bio);
+	trace_block_bio_frontmerge(req->q, req, bio);
 
 	if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
 		blk_rq_set_mixed_merge(req);
@@ -621,6 +662,7 @@ no_merge:
  * blk_attempt_plug_merge - try to merge with %current's plugged list
  * @q: request_queue new bio is being queued at
  * @bio: new bio being queued
+ * @nr_segs: number of segments in @bio
  * @same_queue_rq: pointer to &struct request that gets filled in when
  *                 another request associated with @q is found on the plug list
  *                 (optional, may be %NULL)
@@ -639,7 +681,7 @@ no_merge:
  * Caller must ensure !blk_queue_nomerges(q) beforehand.
  */
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
-			    struct request **same_queue_rq)
+		unsigned int nr_segs, struct request **same_queue_rq)
 {
 	struct blk_plug *plug;
 	struct request *rq;
@@ -668,10 +710,10 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
 
 		switch (blk_try_merge(rq, bio)) {
 		case ELEVATOR_BACK_MERGE:
-			merged = bio_attempt_back_merge(q, rq, bio);
+			merged = bio_attempt_back_merge(rq, bio, nr_segs);
 			break;
 		case ELEVATOR_FRONT_MERGE:
-			merged = bio_attempt_front_merge(q, rq, bio);
+			merged = bio_attempt_front_merge(rq, bio, nr_segs);
 			break;
 		case ELEVATOR_DISCARD_MERGE:
 			merged = bio_attempt_discard_merge(q, rq, bio);
@@ -687,18 +729,6 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
 	return false;
 }
 
-void blk_init_request_from_bio(struct request *req, struct bio *bio)
-{
-	if (bio->bi_opf & REQ_RAHEAD)
-		req->cmd_flags |= REQ_FAILFAST_MASK;
-
-	req->__sector = bio->bi_iter.bi_sector;
-	req->ioprio = bio_prio(bio);
-	req->write_hint = bio->bi_write_hint;
-	blk_rq_bio_prep(req->q, req, bio);
-}
-EXPORT_SYMBOL_GPL(blk_init_request_from_bio);
-
 static void handle_bad_sector(struct bio *bio, sector_t maxsector)
 {
 	char b[BDEVNAME_SIZE];
@@ -1163,7 +1193,7 @@ static int blk_cloned_rq_check_limits(struct request_queue *q,
 	 * Recalculate it to check the request correctly on this queue's
 	 * limitation.
 	 */
-	blk_recalc_rq_segments(rq);
+	rq->nr_phys_segments = blk_recalc_rq_segments(rq);
 	if (rq->nr_phys_segments > queue_max_segments(q)) {
 		printk(KERN_ERR "%s: over max segments limit. (%hu > %hu)\n",
 			__func__, rq->nr_phys_segments, queue_max_segments(q));
@@ -1348,7 +1378,7 @@ EXPORT_SYMBOL_GPL(blk_steal_bios);
  *
  *     This special helper function is only for request stacking drivers
  *     (e.g. request-based dm) so that they can handle partial completion.
- *     Actual device drivers should use blk_end_request instead.
+ *     Actual device drivers should use blk_mq_end_request instead.
  *
  *     Passing the result of blk_rq_bytes() as @nr_bytes guarantees
  *     %false return from this function.
@@ -1373,7 +1403,7 @@ bool blk_update_request(struct request *req, blk_status_t error,
 
 	if (unlikely(error && !blk_rq_is_passthrough(req) &&
 		     !(req->rq_flags & RQF_QUIET)))
-		print_req_error(req, error);
+		print_req_error(req, error, __func__);
 
 	blk_account_io_completion(req, nr_bytes);
 
@@ -1432,28 +1462,13 @@ bool blk_update_request(struct request *req, blk_status_t error,
 		}
 
 		/* recalculate the number of segments */
-		blk_recalc_rq_segments(req);
+		req->nr_phys_segments = blk_recalc_rq_segments(req);
 	}
 
 	return true;
 }
 EXPORT_SYMBOL_GPL(blk_update_request);
 
-void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
-		     struct bio *bio)
-{
-	if (bio_has_data(bio))
-		rq->nr_phys_segments = bio_phys_segments(q, bio);
-	else if (bio_op(bio) == REQ_OP_DISCARD)
-		rq->nr_phys_segments = 1;
-
-	rq->__data_len = bio->bi_iter.bi_size;
-	rq->bio = rq->biotail = bio;
-
-	if (bio->bi_disk)
-		rq->rq_disk = bio->bi_disk;
-}
-
 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
 /**
  * rq_flush_dcache_pages - Helper function to flush all pages in a request
@@ -618,8 +618,11 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
 
 		inflight = atomic_dec_return(&rqw->inflight);
 		WARN_ON_ONCE(inflight < 0);
-		if (iolat->min_lat_nsec == 0)
-			goto next;
+		/*
+		 * If bi_status is BLK_STS_AGAIN, the bio wasn't actually
+		 * submitted, so do not account for it.
+		 */
+		if (iolat->min_lat_nsec && bio->bi_status != BLK_STS_AGAIN) {
 			iolatency_record_time(iolat, &bio->bi_issue, now,
 					      issue_as_root);
 			window_start = atomic64_read(&iolat->window_start);
@@ -629,29 +632,8 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
 					window_start, now) == window_start)
 				iolatency_check_latencies(iolat, now);
 		}
-next:
 		wake_up(&rqw->wait);
 		blkg = blkg->parent;
 	}
 }
 
-static void blkcg_iolatency_cleanup(struct rq_qos *rqos, struct bio *bio)
-{
-	struct blkcg_gq *blkg;
-
-	blkg = bio->bi_blkg;
-	while (blkg && blkg->parent) {
-		struct rq_wait *rqw;
-		struct iolatency_grp *iolat;
-
-		iolat = blkg_to_lat(blkg);
-		if (!iolat)
-			goto next;
-
-		rqw = &iolat->rq_wait;
-		atomic_dec(&rqw->inflight);
-		wake_up(&rqw->wait);
-next:
-		blkg = blkg->parent;
-	}
-}
-
@@ -667,7 +649,6 @@ static void blkcg_iolatency_exit(struct rq_qos *rqos)
 
 static struct rq_qos_ops blkcg_iolatency_ops = {
 	.throttle = blkcg_iolatency_throttle,
-	.cleanup = blkcg_iolatency_cleanup,
 	.done_bio = blkcg_iolatency_done_bio,
 	.exit = blkcg_iolatency_exit,
 };
@@ -778,8 +759,10 @@ static int iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val)
 
 	if (!oldval && val)
 		return 1;
-	if (oldval && !val)
+	if (oldval && !val) {
+		blkcg_clear_delay(blkg);
 		return -1;
+	}
 	return 0;
 }
 
@@ -18,13 +18,19 @@
 int blk_rq_append_bio(struct request *rq, struct bio **bio)
 {
 	struct bio *orig_bio = *bio;
+	struct bvec_iter iter;
+	struct bio_vec bv;
+	unsigned int nr_segs = 0;
 
 	blk_queue_bounce(rq->q, bio);
 
+	bio_for_each_bvec(bv, *bio, iter)
+		nr_segs++;
+
 	if (!rq->bio) {
-		blk_rq_bio_prep(rq->q, rq, *bio);
+		blk_rq_bio_prep(rq, *bio, nr_segs);
 	} else {
-		if (!ll_back_merge_fn(rq->q, rq, *bio)) {
+		if (!ll_back_merge_fn(rq, *bio, nr_segs)) {
 			if (orig_bio != *bio) {
 				bio_put(*bio);
 				*bio = orig_bio;
@@ -105,7 +105,7 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
 static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
 		struct bio *bio, struct bio_set *bs, unsigned *nsegs)
 {
-	*nsegs = 1;
+	*nsegs = 0;
 
 	if (!q->limits.max_write_zeroes_sectors)
 		return NULL;
@@ -202,8 +202,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
 	struct bvec_iter iter;
 	unsigned nsegs = 0, sectors = 0;
-	bool do_split = true;
-	struct bio *new = NULL;
 	const unsigned max_sectors = get_max_io_size(q, bio);
 	const unsigned max_segs = queue_max_segments(q);
 
@@ -245,45 +243,36 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		}
 	}
 
-	do_split = false;
+	*segs = nsegs;
+	return NULL;
 split:
 	*segs = nsegs;
-
-	if (do_split) {
-		new = bio_split(bio, sectors, GFP_NOIO, bs);
-		if (new)
-			bio = new;
-	}
-
-	return do_split ? new : NULL;
+	return bio_split(bio, sectors, GFP_NOIO, bs);
 }
 
-void blk_queue_split(struct request_queue *q, struct bio **bio)
+void __blk_queue_split(struct request_queue *q, struct bio **bio,
+		unsigned int *nr_segs)
 {
-	struct bio *split, *res;
-	unsigned nsegs;
+	struct bio *split;
 
 	switch (bio_op(*bio)) {
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
-		split = blk_bio_discard_split(q, *bio, &q->bio_split, &nsegs);
+		split = blk_bio_discard_split(q, *bio, &q->bio_split, nr_segs);
 		break;
 	case REQ_OP_WRITE_ZEROES:
-		split = blk_bio_write_zeroes_split(q, *bio, &q->bio_split, &nsegs);
+		split = blk_bio_write_zeroes_split(q, *bio, &q->bio_split,
+				nr_segs);
 		break;
 	case REQ_OP_WRITE_SAME:
-		split = blk_bio_write_same_split(q, *bio, &q->bio_split, &nsegs);
+		split = blk_bio_write_same_split(q, *bio, &q->bio_split,
				nr_segs);
 		break;
 	default:
-		split = blk_bio_segment_split(q, *bio, &q->bio_split, &nsegs);
+		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
 		break;
 	}
 
-	/* physical segments can be figured out during splitting */
-	res = split ? split : *bio;
-	res->bi_phys_segments = nsegs;
-	bio_set_flag(res, BIO_SEG_VALID);
-
 	if (split) {
 		/* there isn't chance to merge the splitted bio */
 		split->bi_opf |= REQ_NOMERGE;
@@ -304,19 +293,25 @@ void blk_queue_split(struct request_queue *q, struct bio **bio)
 		*bio = split;
 	}
 }
+
+void blk_queue_split(struct request_queue *q, struct bio **bio)
+{
+	unsigned int nr_segs;
+
+	__blk_queue_split(q, bio, &nr_segs);
+}
 EXPORT_SYMBOL(blk_queue_split);
 
-static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
-					     struct bio *bio)
+unsigned int blk_recalc_rq_segments(struct request *rq)
 {
 	unsigned int nr_phys_segs = 0;
-	struct bvec_iter iter;
+	struct req_iterator iter;
 	struct bio_vec bv;
 
-	if (!bio)
+	if (!rq->bio)
 		return 0;
 
-	switch (bio_op(bio)) {
+	switch (bio_op(rq->bio)) {
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
 	case REQ_OP_WRITE_ZEROES:
@@ -325,30 +320,11 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 		return 1;
 	}
 
-	for_each_bio(bio) {
-		bio_for_each_bvec(bv, bio, iter)
-			bvec_split_segs(q, &bv, &nr_phys_segs, NULL, UINT_MAX);
-	}
+	rq_for_each_bvec(bv, rq, iter)
+		bvec_split_segs(rq->q, &bv, &nr_phys_segs, NULL, UINT_MAX);
 
 	return nr_phys_segs;
 }
 
-void blk_recalc_rq_segments(struct request *rq)
-{
-	rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio);
-}
-
-void blk_recount_segments(struct request_queue *q, struct bio *bio)
-{
-	struct bio *nxt = bio->bi_next;
-
-	bio->bi_next = NULL;
-	bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio);
-	bio->bi_next = nxt;
-
-	bio_set_flag(bio, BIO_SEG_VALID);
-}
-
 static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
 		struct scatterlist *sglist)
 {
@@ -519,16 +495,13 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 }
 EXPORT_SYMBOL(blk_rq_map_sg);
 
-static inline int ll_new_hw_segment(struct request_queue *q,
-				    struct request *req,
-				    struct bio *bio)
+static inline int ll_new_hw_segment(struct request *req, struct bio *bio,
+		unsigned int nr_phys_segs)
 {
-	int nr_phys_segs = bio_phys_segments(q, bio);
-
-	if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(q))
+	if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(req->q))
 		goto no_merge;
 
-	if (blk_integrity_merge_bio(q, req, bio) == false)
+	if (blk_integrity_merge_bio(req->q, req, bio) == false)
 		goto no_merge;
 
 	/*
@@ -539,12 +512,11 @@ static inline int ll_new_hw_segment(struct request_queue *q,
 	return 1;
 
 no_merge:
-	req_set_nomerge(q, req);
+	req_set_nomerge(req->q, req);
 	return 0;
 }
 
-int ll_back_merge_fn(struct request_queue *q, struct request *req,
-		     struct bio *bio)
+int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs)
 {
 	if (req_gap_back_merge(req, bio))
 		return 0;
@@ -553,21 +525,15 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req,
 		return 0;
 	if (blk_rq_sectors(req) + bio_sectors(bio) >
 	    blk_rq_get_max_sectors(req, blk_rq_pos(req))) {
-		req_set_nomerge(q, req);
+		req_set_nomerge(req->q, req);
 		return 0;
 	}
-	if (!bio_flagged(req->biotail, BIO_SEG_VALID))
-		blk_recount_segments(q, req->biotail);
-	if (!bio_flagged(bio, BIO_SEG_VALID))
-		blk_recount_segments(q, bio);
 
-	return ll_new_hw_segment(q, req, bio);
+	return ll_new_hw_segment(req, bio, nr_segs);
 }
 
-int ll_front_merge_fn(struct request_queue *q, struct request *req,
-		      struct bio *bio)
+int ll_front_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs)
 {
 
 	if (req_gap_front_merge(req, bio))
@@ -575,15 +541,11 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
 		return 0;
 	if (blk_rq_sectors(req) + bio_sectors(bio) >
 	    blk_rq_get_max_sectors(req, bio->bi_iter.bi_sector)) {
-		req_set_nomerge(q, req);
+		req_set_nomerge(req->q, req);
 		return 0;
 	}
-	if (!bio_flagged(bio, BIO_SEG_VALID))
-		blk_recount_segments(q, bio);
-	if (!bio_flagged(req->bio, BIO_SEG_VALID))
-		blk_recount_segments(q, req->bio);
 
-	return ll_new_hw_segment(q, req, bio);
+	return ll_new_hw_segment(req, bio, nr_segs);
 }
 
 static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
@@ -17,7 +17,7 @@
 static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
 {
 	if (stat->nr_samples) {
-		seq_printf(m, "samples=%d, mean=%lld, min=%llu, max=%llu",
+		seq_printf(m, "samples=%d, mean=%llu, min=%llu, max=%llu",
 			   stat->nr_samples, stat->mean, stat->min, stat->max);
 	} else {
 		seq_puts(m, "samples=0");
@@ -29,13 +29,13 @@ static int queue_poll_stat_show(void *data, struct seq_file *m)
 	struct request_queue *q = data;
 	int bucket;
 
-	for (bucket = 0; bucket < BLK_MQ_POLL_STATS_BKTS/2; bucket++) {
-		seq_printf(m, "read  (%d Bytes): ", 1 << (9+bucket));
-		print_stat(m, &q->poll_stat[2*bucket]);
+	for (bucket = 0; bucket < (BLK_MQ_POLL_STATS_BKTS / 2); bucket++) {
+		seq_printf(m, "read  (%d Bytes): ", 1 << (9 + bucket));
+		print_stat(m, &q->poll_stat[2 * bucket]);
 		seq_puts(m, "\n");
 
-		seq_printf(m, "write (%d Bytes): ", 1 << (9+bucket));
-		print_stat(m, &q->poll_stat[2*bucket+1]);
+		seq_printf(m, "write (%d Bytes): ", 1 << (9 + bucket));
+		print_stat(m, &q->poll_stat[2 * bucket + 1]);
 		seq_puts(m, "\n");
 	}
 	return 0;
||||||
|
@ -261,23 +261,6 @@ static int hctx_flags_show(void *data, struct seq_file *m)
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
#define REQ_OP_NAME(name) [REQ_OP_##name] = #name
|
|
||||||
static const char *const op_name[] = {
|
|
||||||
REQ_OP_NAME(READ),
|
|
||||||
REQ_OP_NAME(WRITE),
|
|
||||||
REQ_OP_NAME(FLUSH),
|
|
||||||
REQ_OP_NAME(DISCARD),
|
|
||||||
REQ_OP_NAME(SECURE_ERASE),
|
|
||||||
REQ_OP_NAME(ZONE_RESET),
|
|
||||||
REQ_OP_NAME(WRITE_SAME),
|
|
||||||
REQ_OP_NAME(WRITE_ZEROES),
|
|
||||||
REQ_OP_NAME(SCSI_IN),
|
|
||||||
REQ_OP_NAME(SCSI_OUT),
|
|
||||||
REQ_OP_NAME(DRV_IN),
|
|
||||||
REQ_OP_NAME(DRV_OUT),
|
|
||||||
};
|
|
||||||
#undef REQ_OP_NAME
|
|
||||||
|
|
||||||
#define CMD_FLAG_NAME(name) [__REQ_##name] = #name
|
#define CMD_FLAG_NAME(name) [__REQ_##name] = #name
|
||||||
static const char *const cmd_flag_name[] = {
|
static const char *const cmd_flag_name[] = {
|
||||||
CMD_FLAG_NAME(FAILFAST_DEV),
|
CMD_FLAG_NAME(FAILFAST_DEV),
|
||||||
|
@ -341,13 +324,14 @@ static const char *blk_mq_rq_state_name(enum mq_rq_state rq_state)
|
||||||
int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq)
|
int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq)
|
||||||
{
|
{
|
||||||
const struct blk_mq_ops *const mq_ops = rq->q->mq_ops;
|
const struct blk_mq_ops *const mq_ops = rq->q->mq_ops;
|
||||||
const unsigned int op = rq->cmd_flags & REQ_OP_MASK;
|
const unsigned int op = req_op(rq);
|
||||||
|
const char *op_str = blk_op_str(op);
|
||||||
|
|
||||||
seq_printf(m, "%p {.op=", rq);
|
seq_printf(m, "%p {.op=", rq);
|
||||||
if (op < ARRAY_SIZE(op_name) && op_name[op])
|
if (strcmp(op_str, "UNKNOWN") == 0)
|
||||||
seq_printf(m, "%s", op_name[op]);
|
seq_printf(m, "%u", op);
|
||||||
else
|
else
|
||||||
seq_printf(m, "%d", op);
|
seq_printf(m, "%s", op_str);
|
||||||
seq_puts(m, ", .cmd_flags=");
|
seq_puts(m, ", .cmd_flags=");
|
||||||
blk_flags_show(m, rq->cmd_flags & ~REQ_OP_MASK, cmd_flag_name,
|
blk_flags_show(m, rq->cmd_flags & ~REQ_OP_MASK, cmd_flag_name,
|
||||||
ARRAY_SIZE(cmd_flag_name));
|
ARRAY_SIZE(cmd_flag_name));
|
||||||
|
@ -779,7 +763,7 @@ static int blk_mq_debugfs_release(struct inode *inode, struct file *file)
|
||||||
|
|
||||||
if (attr->show)
|
if (attr->show)
|
||||||
return single_release(inode, file);
|
return single_release(inode, file);
|
||||||
else
|
|
||||||
return seq_release(inode, file);
|
return seq_release(inode, file);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -224,7 +224,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-        struct request **merged_request)
+        unsigned int nr_segs, struct request **merged_request)
 {
     struct request *rq;
 
@@ -232,7 +232,7 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
     case ELEVATOR_BACK_MERGE:
         if (!blk_mq_sched_allow_merge(q, rq, bio))
             return false;
-        if (!bio_attempt_back_merge(q, rq, bio))
+        if (!bio_attempt_back_merge(rq, bio, nr_segs))
             return false;
         *merged_request = attempt_back_merge(q, rq);
         if (!*merged_request)
@@ -241,7 +241,7 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
     case ELEVATOR_FRONT_MERGE:
         if (!blk_mq_sched_allow_merge(q, rq, bio))
             return false;
-        if (!bio_attempt_front_merge(q, rq, bio))
+        if (!bio_attempt_front_merge(rq, bio, nr_segs))
             return false;
         *merged_request = attempt_front_merge(q, rq);
         if (!*merged_request)
@@ -260,7 +260,7 @@ EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
  * of them.
  */
 bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
-        struct bio *bio)
+        struct bio *bio, unsigned int nr_segs)
 {
     struct request *rq;
     int checked = 8;
@@ -277,11 +277,13 @@ bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
         switch (blk_try_merge(rq, bio)) {
         case ELEVATOR_BACK_MERGE:
             if (blk_mq_sched_allow_merge(q, rq, bio))
-                merged = bio_attempt_back_merge(q, rq, bio);
+                merged = bio_attempt_back_merge(rq, bio,
+                        nr_segs);
             break;
         case ELEVATOR_FRONT_MERGE:
             if (blk_mq_sched_allow_merge(q, rq, bio))
-                merged = bio_attempt_front_merge(q, rq, bio);
+                merged = bio_attempt_front_merge(rq, bio,
+                        nr_segs);
             break;
         case ELEVATOR_DISCARD_MERGE:
             merged = bio_attempt_discard_merge(q, rq, bio);
@@ -304,13 +306,14 @@ EXPORT_SYMBOL_GPL(blk_mq_bio_list_merge);
  */
 static bool blk_mq_attempt_merge(struct request_queue *q,
                  struct blk_mq_hw_ctx *hctx,
-                 struct blk_mq_ctx *ctx, struct bio *bio)
+                 struct blk_mq_ctx *ctx, struct bio *bio,
+                 unsigned int nr_segs)
 {
     enum hctx_type type = hctx->type;
 
     lockdep_assert_held(&ctx->lock);
 
-    if (blk_mq_bio_list_merge(q, &ctx->rq_lists[type], bio)) {
+    if (blk_mq_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs)) {
         ctx->rq_merged++;
         return true;
     }
@@ -318,7 +321,8 @@ static bool blk_mq_attempt_merge(struct request_queue *q,
     return false;
 }
 
-bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
+bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
+        unsigned int nr_segs)
 {
     struct elevator_queue *e = q->elevator;
     struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
@@ -326,21 +330,18 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
     bool ret = false;
     enum hctx_type type;
 
-    if (e && e->type->ops.bio_merge) {
-        blk_mq_put_ctx(ctx);
-        return e->type->ops.bio_merge(hctx, bio);
-    }
+    if (e && e->type->ops.bio_merge)
+        return e->type->ops.bio_merge(hctx, bio, nr_segs);
 
     type = hctx->type;
     if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
         !list_empty_careful(&ctx->rq_lists[type])) {
         /* default per sw-queue merge */
         spin_lock(&ctx->lock);
-        ret = blk_mq_attempt_merge(q, hctx, ctx, bio);
+        ret = blk_mq_attempt_merge(q, hctx, ctx, bio, nr_segs);
         spin_unlock(&ctx->lock);
     }
 
-    blk_mq_put_ctx(ctx);
     return ret;
 }
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -12,8 +12,9 @@ void blk_mq_sched_assign_ioc(struct request *rq);
 
 void blk_mq_sched_request_inserted(struct request *rq);
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-        struct request **merged_request);
-bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
+        unsigned int nr_segs, struct request **merged_request);
+bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
+        unsigned int nr_segs);
 bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
 void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx);
 void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
@@ -31,12 +32,13 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
 void blk_mq_sched_free_requests(struct request_queue *q);
 
 static inline bool
-blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
+blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
+        unsigned int nr_segs)
 {
     if (blk_queue_nomerges(q) || !bio_mergeable(bio))
         return false;
 
-    return __blk_mq_sched_bio_merge(q, bio);
+    return __blk_mq_sched_bio_merge(q, bio, nr_segs);
 }
 
 static inline bool
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -113,7 +113,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
     struct sbq_wait_state *ws;
     DEFINE_SBQ_WAIT(wait);
     unsigned int tag_offset;
-    bool drop_ctx;
     int tag;
 
     if (data->flags & BLK_MQ_REQ_RESERVED) {
@@ -136,7 +135,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
         return BLK_MQ_TAG_FAIL;
 
     ws = bt_wait_ptr(bt, data->hctx);
-    drop_ctx = data->ctx == NULL;
     do {
         struct sbitmap_queue *bt_prev;
 
@@ -161,9 +159,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
         if (tag != -1)
             break;
 
-        if (data->ctx)
-            blk_mq_put_ctx(data->ctx);
-
         bt_prev = bt;
         io_schedule();
 
@@ -189,9 +184,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
         ws = bt_wait_ptr(bt, data->hctx);
     } while (1);
 
-    if (drop_ctx && data->ctx)
-        blk_mq_put_ctx(data->ctx);
-
     sbitmap_finish_wait(bt, ws, &wait);
 
 found_tag:
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -355,13 +355,13 @@ static struct request *blk_mq_get_request(struct request_queue *q,
     struct elevator_queue *e = q->elevator;
     struct request *rq;
     unsigned int tag;
-    bool put_ctx_on_error = false;
+    bool clear_ctx_on_error = false;
 
     blk_queue_enter_live(q);
     data->q = q;
     if (likely(!data->ctx)) {
         data->ctx = blk_mq_get_ctx(q);
-        put_ctx_on_error = true;
+        clear_ctx_on_error = true;
     }
     if (likely(!data->hctx))
         data->hctx = blk_mq_map_queue(q, data->cmd_flags,
@@ -387,10 +387,8 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 
     tag = blk_mq_get_tag(data);
     if (tag == BLK_MQ_TAG_FAIL) {
-        if (put_ctx_on_error) {
-            blk_mq_put_ctx(data->ctx);
+        if (clear_ctx_on_error)
             data->ctx = NULL;
-        }
         blk_queue_exit(q);
         return NULL;
     }
@@ -427,8 +425,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
     if (!rq)
         return ERR_PTR(-EWOULDBLOCK);
 
-    blk_mq_put_ctx(alloc_data.ctx);
-
     rq->__data_len = 0;
     rq->__sector = (sector_t) -1;
     rq->bio = rq->biotail = NULL;
@@ -1764,9 +1760,15 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
     }
 }
 
-static void blk_mq_bio_to_request(struct request *rq, struct bio *bio)
+static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
+        unsigned int nr_segs)
 {
-    blk_init_request_from_bio(rq, bio);
+    if (bio->bi_opf & REQ_RAHEAD)
+        rq->cmd_flags |= REQ_FAILFAST_MASK;
+
+    rq->__sector = bio->bi_iter.bi_sector;
+    rq->write_hint = bio->bi_write_hint;
+    blk_rq_bio_prep(rq, bio, nr_segs);
 
     blk_account_io_start(rq, true);
 }
@@ -1936,20 +1938,20 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
     struct request *rq;
     struct blk_plug *plug;
     struct request *same_queue_rq = NULL;
+    unsigned int nr_segs;
     blk_qc_t cookie;
 
     blk_queue_bounce(q, &bio);
-    blk_queue_split(q, &bio);
+    __blk_queue_split(q, &bio, &nr_segs);
 
     if (!bio_integrity_prep(bio))
         return BLK_QC_T_NONE;
 
     if (!is_flush_fua && !blk_queue_nomerges(q) &&
-        blk_attempt_plug_merge(q, bio, &same_queue_rq))
+        blk_attempt_plug_merge(q, bio, nr_segs, &same_queue_rq))
         return BLK_QC_T_NONE;
 
-    if (blk_mq_sched_bio_merge(q, bio))
+    if (blk_mq_sched_bio_merge(q, bio, nr_segs))
         return BLK_QC_T_NONE;
 
     rq_qos_throttle(q, bio);
@@ -1969,11 +1971,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
     cookie = request_to_qc_t(data.hctx, rq);
 
+    blk_mq_bio_to_request(rq, bio, nr_segs);
+
     plug = current->plug;
     if (unlikely(is_flush_fua)) {
-        blk_mq_put_ctx(data.ctx);
-        blk_mq_bio_to_request(rq, bio);
-
         /* bypass scheduler for flush rq */
         blk_insert_flush(rq);
         blk_mq_run_hw_queue(data.hctx, true);
@@ -1985,9 +1986,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
         unsigned int request_count = plug->rq_count;
         struct request *last = NULL;
 
-        blk_mq_put_ctx(data.ctx);
-        blk_mq_bio_to_request(rq, bio);
-
         if (!request_count)
             trace_block_plug(q);
         else
@@ -2001,8 +1999,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
         blk_add_rq_to_plug(plug, rq);
     } else if (plug && !blk_queue_nomerges(q)) {
-        blk_mq_bio_to_request(rq, bio);
-
         /*
          * We do limited plugging. If the bio can be merged, do that.
          * Otherwise the existing request in the plug list will be
@@ -2019,8 +2015,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
             blk_add_rq_to_plug(plug, rq);
             trace_block_plug(q);
 
-            blk_mq_put_ctx(data.ctx);
-
             if (same_queue_rq) {
                 data.hctx = same_queue_rq->mq_hctx;
                 trace_block_unplug(q, 1, true);
@@ -2029,12 +2023,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
         }
     } else if ((q->nr_hw_queues > 1 && is_sync) || (!q->elevator &&
             !data.hctx->dispatch_busy)) {
-        blk_mq_put_ctx(data.ctx);
-        blk_mq_bio_to_request(rq, bio);
         blk_mq_try_issue_directly(data.hctx, rq, &cookie);
     } else {
-        blk_mq_put_ctx(data.ctx);
-        blk_mq_bio_to_request(rq, bio);
         blk_mq_sched_insert_request(rq, false, true, true);
     }
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -151,12 +151,7 @@ static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
  */
 static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
 {
-    return __blk_mq_get_ctx(q, get_cpu());
-}
-
-static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
-{
-    put_cpu();
+    return __blk_mq_get_ctx(q, raw_smp_processor_id());
 }
 
 struct blk_mq_alloc_data {
--- a/block/blk.h
+++ b/block/blk.h
@@ -51,8 +51,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
         int node, int cmd_size, gfp_t flags);
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
-void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
-        struct bio *bio);
 void blk_freeze_queue(struct request_queue *q);
 
 static inline void blk_queue_enter_live(struct request_queue *q)
@@ -101,6 +99,18 @@ static inline bool bvec_gap_to_prev(struct request_queue *q,
     return __bvec_gap_to_prev(q, bprv, offset);
 }
 
+static inline void blk_rq_bio_prep(struct request *rq, struct bio *bio,
+        unsigned int nr_segs)
+{
+    rq->nr_phys_segments = nr_segs;
+    rq->__data_len = bio->bi_iter.bi_size;
+    rq->bio = rq->biotail = bio;
+    rq->ioprio = bio_prio(bio);
+
+    if (bio->bi_disk)
+        rq->rq_disk = bio->bi_disk;
+}
+
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 void blk_flush_integrity(void);
 bool __bio_integrity_endio(struct bio *);
@@ -154,14 +164,14 @@ static inline bool bio_integrity_endio(struct bio *bio)
 unsigned long blk_rq_timeout(unsigned long timeout);
 void blk_add_timer(struct request *req);
 
-bool bio_attempt_front_merge(struct request_queue *q, struct request *req,
-        struct bio *bio);
-bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
-        struct bio *bio);
+bool bio_attempt_front_merge(struct request *req, struct bio *bio,
+        unsigned int nr_segs);
+bool bio_attempt_back_merge(struct request *req, struct bio *bio,
+        unsigned int nr_segs);
 bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
         struct bio *bio);
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
-        struct request **same_queue_rq);
+        unsigned int nr_segs, struct request **same_queue_rq);
 
 void blk_account_io_start(struct request *req, bool new_io);
 void blk_account_io_completion(struct request *req, unsigned int bytes);
@@ -202,15 +212,17 @@ static inline int blk_should_fake_timeout(struct request_queue *q)
 }
 #endif
 
-int ll_back_merge_fn(struct request_queue *q, struct request *req,
-        struct bio *bio);
-int ll_front_merge_fn(struct request_queue *q, struct request *req,
-        struct bio *bio);
+void __blk_queue_split(struct request_queue *q, struct bio **bio,
+        unsigned int *nr_segs);
+int ll_back_merge_fn(struct request *req, struct bio *bio,
+        unsigned int nr_segs);
+int ll_front_merge_fn(struct request *req, struct bio *bio,
+        unsigned int nr_segs);
 struct request *attempt_back_merge(struct request_queue *q, struct request *rq);
 struct request *attempt_front_merge(struct request_queue *q, struct request *rq);
 int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
         struct request *next);
-void blk_recalc_rq_segments(struct request *rq);
+unsigned int blk_recalc_rq_segments(struct request *rq);
 void blk_rq_set_mixed_merge(struct request *rq);
 bool blk_rq_merge_ok(struct request *rq, struct bio *bio);
 enum elv_merge blk_try_merge(struct request *rq, struct bio *bio);
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1281,7 +1281,6 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno)
     struct disk_part_tbl *new_ptbl;
     int len = old_ptbl ? old_ptbl->len : 0;
     int i, target;
-    size_t size;
 
     /*
      * check for int overflow, since we can get here from blkpg_ioctl()
@@ -1298,8 +1297,8 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno)
     if (target <= len)
         return 0;
 
-    size = sizeof(*new_ptbl) + target * sizeof(new_ptbl->part[0]);
-    new_ptbl = kzalloc_node(size, GFP_KERNEL, disk->node_id);
+    new_ptbl = kzalloc_node(struct_size(new_ptbl, part, target), GFP_KERNEL,
+                disk->node_id);
     if (!new_ptbl)
         return -ENOMEM;
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -562,7 +562,8 @@ static void kyber_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
     }
 }
 
-static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
+static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio,
+        unsigned int nr_segs)
 {
     struct kyber_hctx_data *khd = hctx->sched_data;
     struct blk_mq_ctx *ctx = blk_mq_get_ctx(hctx->queue);
@@ -572,9 +573,8 @@ static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
     bool merged;
 
     spin_lock(&kcq->lock);
-    merged = blk_mq_bio_list_merge(hctx->queue, rq_list, bio);
+    merged = blk_mq_bio_list_merge(hctx->queue, rq_list, bio, nr_segs);
     spin_unlock(&kcq->lock);
-    blk_mq_put_ctx(ctx);
 
     return merged;
 }
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -469,7 +469,8 @@ static int dd_request_merge(struct request_queue *q, struct request **rq,
     return ELEVATOR_NO_MERGE;
 }
 
-static bool dd_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
+static bool dd_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio,
+        unsigned int nr_segs)
 {
     struct request_queue *q = hctx->queue;
     struct deadline_data *dd = q->elevator->elevator_data;
@@ -477,7 +478,7 @@ static bool dd_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
     bool ret;
 
     spin_lock(&dd->lock);
-    ret = blk_mq_sched_try_merge(q, bio, &free);
+    ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
     spin_unlock(&dd->lock);
 
     if (free)
--- a/block/opal_proto.h
+++ b/block/opal_proto.h
@@ -98,6 +98,7 @@ enum opal_uid {
     OPAL_ENTERPRISE_BANDMASTER0_UID,
     OPAL_ENTERPRISE_ERASEMASTER_UID,
     /* tables */
+    OPAL_TABLE_TABLE,
     OPAL_LOCKINGRANGE_GLOBAL,
     OPAL_LOCKINGRANGE_ACE_RDLOCKED,
     OPAL_LOCKINGRANGE_ACE_WRLOCKED,
@@ -152,6 +153,21 @@ enum opal_token {
     OPAL_STARTCOLUMN = 0x03,
     OPAL_ENDCOLUMN = 0x04,
     OPAL_VALUES = 0x01,
+    /* table table */
+    OPAL_TABLE_UID = 0x00,
+    OPAL_TABLE_NAME = 0x01,
+    OPAL_TABLE_COMMON = 0x02,
+    OPAL_TABLE_TEMPLATE = 0x03,
+    OPAL_TABLE_KIND = 0x04,
+    OPAL_TABLE_COLUMN = 0x05,
+    OPAL_TABLE_COLUMNS = 0x06,
+    OPAL_TABLE_ROWS = 0x07,
+    OPAL_TABLE_ROWS_FREE = 0x08,
+    OPAL_TABLE_ROW_BYTES = 0x09,
+    OPAL_TABLE_LASTID = 0x0A,
+    OPAL_TABLE_MIN = 0x0B,
+    OPAL_TABLE_MAX = 0x0C,
+
     /* authority table */
     OPAL_PIN = 0x03,
     /* locking tokens */
--- a/block/sed-opal.c
+++ b/block/sed-opal.c
@@ -26,6 +26,9 @@
 #define IO_BUFFER_LENGTH 2048
 #define MAX_TOKS 64
 
+/* Number of bytes needed by cmd_finalize. */
+#define CMD_FINALIZE_BYTES_NEEDED 7
+
 struct opal_step {
     int (*fn)(struct opal_dev *dev, void *data);
     void *data;
@@ -127,6 +130,8 @@ static const u8 opaluid[][OPAL_UID_LENGTH] = {
 
     /* tables */
 
+    [OPAL_TABLE_TABLE] =
+        { 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01 },
     [OPAL_LOCKINGRANGE_GLOBAL] =
         { 0x00, 0x00, 0x08, 0x02, 0x00, 0x00, 0x00, 0x01 },
     [OPAL_LOCKINGRANGE_ACE_RDLOCKED] =
@@ -523,12 +528,17 @@ static int opal_discovery0_step(struct opal_dev *dev)
     return execute_step(dev, &discovery0_step, 0);
 }
 
+static size_t remaining_size(struct opal_dev *cmd)
+{
+    return IO_BUFFER_LENGTH - cmd->pos;
+}
+
 static bool can_add(int *err, struct opal_dev *cmd, size_t len)
 {
     if (*err)
         return false;
 
-    if (len > IO_BUFFER_LENGTH || cmd->pos > IO_BUFFER_LENGTH - len) {
+    if (remaining_size(cmd) < len) {
         pr_debug("Error adding %zu bytes: end of buffer.\n", len);
         *err = -ERANGE;
         return false;
@@ -674,7 +684,11 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, u32 tsn)
     struct opal_header *hdr;
     int err = 0;
 
-    /* close the parameter list opened from cmd_start */
+    /*
+     * Close the parameter list opened from cmd_start.
+     * The number of bytes added must be equal to
+     * CMD_FINALIZE_BYTES_NEEDED.
+     */
     add_token_u8(&err, cmd, OPAL_ENDLIST);
 
     add_token_u8(&err, cmd, OPAL_ENDOFDATA);
@@ -1119,6 +1133,29 @@ static int generic_get_column(struct opal_dev *dev, const u8 *table,
     return finalize_and_send(dev, parse_and_check_status);
 }
 
+/*
+ * see TCG SAS 5.3.2.3 for a description of the available columns
+ *
+ * the result is provided in dev->resp->tok[4]
+ */
+static int generic_get_table_info(struct opal_dev *dev, enum opal_uid table,
+                  u64 column)
+{
+    u8 uid[OPAL_UID_LENGTH];
+    const unsigned int half = OPAL_UID_LENGTH/2;
+
+    /* sed-opal UIDs can be split in two halves:
+     *  first: actual table index
+     *  second: relative index in the table
+     * so we have to get the first half of the OPAL_TABLE_TABLE and use the
+     * first part of the target table as relative index into that table
+     */
+    memcpy(uid, opaluid[OPAL_TABLE_TABLE], half);
+    memcpy(uid+half, opaluid[table], half);
+
+    return generic_get_column(dev, uid, column);
+}
+
 static int gen_key(struct opal_dev *dev, void *data)
 {
     u8 uid[OPAL_UID_LENGTH];
@@ -1307,6 +1344,7 @@ static int start_generic_opal_session(struct opal_dev *dev,
         break;
     case OPAL_ADMIN1_UID:
     case OPAL_SID_UID:
+    case OPAL_PSID_UID:
         add_token_u8(&err, dev, OPAL_STARTNAME);
         add_token_u8(&err, dev, 0); /* HostChallenge */
         add_token_bytestring(&err, dev, key, key_len);
@@ -1367,6 +1405,16 @@ static int start_admin1LSP_opal_session(struct opal_dev *dev, void *data)
                 key->key, key->key_len);
 }
||||||
|
|
||||||
|
static int start_PSID_opal_session(struct opal_dev *dev, void *data)
|
||||||
|
{
|
||||||
|
const struct opal_key *okey = data;
|
||||||
|
|
||||||
|
return start_generic_opal_session(dev, OPAL_PSID_UID,
|
||||||
|
OPAL_ADMINSP_UID,
|
||||||
|
okey->key,
|
||||||
|
okey->key_len);
|
||||||
|
}
|
||||||
|
|
||||||
static int start_auth_opal_session(struct opal_dev *dev, void *data)
|
static int start_auth_opal_session(struct opal_dev *dev, void *data)
|
||||||
{
|
{
|
||||||
struct opal_session_info *session = data;
|
struct opal_session_info *session = data;
|
||||||
|
@ -1525,6 +1573,72 @@ static int set_mbr_enable_disable(struct opal_dev *dev, void *data)
|
||||||
return finalize_and_send(dev, parse_and_check_status);
|
return finalize_and_send(dev, parse_and_check_status);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static int write_shadow_mbr(struct opal_dev *dev, void *data)
|
||||||
|
{
|
||||||
|
struct opal_shadow_mbr *shadow = data;
|
||||||
|
const u8 __user *src;
|
||||||
|
u8 *dst;
|
||||||
|
size_t off = 0;
|
||||||
|
u64 len;
|
||||||
|
int err = 0;
|
||||||
|
|
||||||
|
/* do we fit in the available shadow mbr space? */
|
||||||
|
err = generic_get_table_info(dev, OPAL_MBR, OPAL_TABLE_ROWS);
|
||||||
|
if (err) {
|
||||||
|
pr_debug("MBR: could not get shadow size\n");
|
||||||
|
return err;
|
||||||
|
}
|
||||||
|
|
||||||
|
len = response_get_u64(&dev->parsed, 4);
|
||||||
|
if (shadow->size > len || shadow->offset > len - shadow->size) {
|
||||||
|
pr_debug("MBR: does not fit in shadow (%llu vs. %llu)\n",
|
||||||
|
shadow->offset + shadow->size, len);
|
||||||
|
return -ENOSPC;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* do the actual transmission(s) */
|
||||||
|
src = (u8 __user *)(uintptr_t)shadow->data;
|
||||||
|
while (off < shadow->size) {
|
||||||
|
err = cmd_start(dev, opaluid[OPAL_MBR], opalmethod[OPAL_SET]);
|
||||||
|
add_token_u8(&err, dev, OPAL_STARTNAME);
|
||||||
|
add_token_u8(&err, dev, OPAL_WHERE);
|
||||||
|
add_token_u64(&err, dev, shadow->offset + off);
|
||||||
|
add_token_u8(&err, dev, OPAL_ENDNAME);
|
||||||
|
|
||||||
|
add_token_u8(&err, dev, OPAL_STARTNAME);
|
||||||
|
add_token_u8(&err, dev, OPAL_VALUES);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* The bytestring header is either 1 or 2 bytes, so assume 2.
|
||||||
|
* There also needs to be enough space to accommodate the
|
||||||
|
* trailing OPAL_ENDNAME (1 byte) and tokens added by
|
||||||
|
* cmd_finalize.
|
||||||
|
*/
|
||||||
|
len = min(remaining_size(dev) - (2+1+CMD_FINALIZE_BYTES_NEEDED),
|
||||||
|
(size_t)(shadow->size - off));
|
||||||
|
pr_debug("MBR: write bytes %zu+%llu/%llu\n",
|
||||||
|
off, len, shadow->size);
|
||||||
|
|
||||||
|
dst = add_bytestring_header(&err, dev, len);
|
||||||
|
if (!dst)
|
||||||
|
break;
|
||||||
|
if (copy_from_user(dst, src + off, len))
|
||||||
|
err = -EFAULT;
|
||||||
|
dev->pos += len;
|
||||||
|
|
||||||
|
add_token_u8(&err, dev, OPAL_ENDNAME);
|
||||||
|
if (err)
|
||||||
|
break;
|
||||||
|
|
||||||
|
err = finalize_and_send(dev, parse_and_check_status);
|
||||||
|
if (err)
|
||||||
|
break;
|
||||||
|
|
||||||
|
off += len;
|
||||||
|
}
|
||||||
|
return err;
|
||||||
|
}
|
||||||
|
|
||||||
static int generic_pw_cmd(u8 *key, size_t key_len, u8 *cpin_uid,
|
static int generic_pw_cmd(u8 *key, size_t key_len, u8 *cpin_uid,
|
||||||
struct opal_dev *dev)
|
struct opal_dev *dev)
|
||||||
{
|
{
|
||||||
|
@ -1978,6 +2092,50 @@ static int opal_enable_disable_shadow_mbr(struct opal_dev *dev,
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static int opal_set_mbr_done(struct opal_dev *dev,
|
||||||
|
struct opal_mbr_done *mbr_done)
|
||||||
|
{
|
||||||
|
u8 mbr_done_tf = mbr_done->done_flag == OPAL_MBR_DONE ?
|
||||||
|
OPAL_TRUE : OPAL_FALSE;
|
||||||
|
|
||||||
|
const struct opal_step mbr_steps[] = {
|
||||||
|
{ start_admin1LSP_opal_session, &mbr_done->key },
|
||||||
|
{ set_mbr_done, &mbr_done_tf },
|
||||||
|
{ end_opal_session, }
|
||||||
|
};
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
if (mbr_done->done_flag != OPAL_MBR_DONE &&
|
||||||
|
mbr_done->done_flag != OPAL_MBR_NOT_DONE)
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
mutex_lock(&dev->dev_lock);
|
||||||
|
setup_opal_dev(dev);
|
||||||
|
ret = execute_steps(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
|
||||||
|
mutex_unlock(&dev->dev_lock);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int opal_write_shadow_mbr(struct opal_dev *dev,
|
||||||
|
struct opal_shadow_mbr *info)
|
||||||
|
{
|
||||||
|
const struct opal_step mbr_steps[] = {
|
||||||
|
{ start_admin1LSP_opal_session, &info->key },
|
||||||
|
{ write_shadow_mbr, info },
|
||||||
|
{ end_opal_session, }
|
||||||
|
};
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
if (info->size == 0)
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
mutex_lock(&dev->dev_lock);
|
||||||
|
setup_opal_dev(dev);
|
||||||
|
ret = execute_steps(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
|
||||||
|
mutex_unlock(&dev->dev_lock);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
static int opal_save(struct opal_dev *dev, struct opal_lock_unlock *lk_unlk)
|
static int opal_save(struct opal_dev *dev, struct opal_lock_unlock *lk_unlk)
|
||||||
{
|
{
|
||||||
struct opal_suspend_data *suspend;
|
struct opal_suspend_data *suspend;
|
||||||
|
@ -2030,17 +2188,28 @@ static int opal_add_user_to_lr(struct opal_dev *dev,
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
|
|
||||||
static int opal_reverttper(struct opal_dev *dev, struct opal_key *opal)
|
static int opal_reverttper(struct opal_dev *dev, struct opal_key *opal, bool psid)
|
||||||
{
|
{
|
||||||
|
/* controller will terminate session */
|
||||||
const struct opal_step revert_steps[] = {
|
const struct opal_step revert_steps[] = {
|
||||||
{ start_SIDASP_opal_session, opal },
|
{ start_SIDASP_opal_session, opal },
|
||||||
{ revert_tper, } /* controller will terminate session */
|
{ revert_tper, }
|
||||||
};
|
};
|
||||||
|
const struct opal_step psid_revert_steps[] = {
|
||||||
|
{ start_PSID_opal_session, opal },
|
||||||
|
{ revert_tper, }
|
||||||
|
};
|
||||||
|
|
||||||
int ret;
|
int ret;
|
||||||
|
|
||||||
mutex_lock(&dev->dev_lock);
|
mutex_lock(&dev->dev_lock);
|
||||||
setup_opal_dev(dev);
|
setup_opal_dev(dev);
|
||||||
ret = execute_steps(dev, revert_steps, ARRAY_SIZE(revert_steps));
|
if (psid)
|
||||||
|
ret = execute_steps(dev, psid_revert_steps,
|
||||||
|
ARRAY_SIZE(psid_revert_steps));
|
||||||
|
else
|
||||||
|
ret = execute_steps(dev, revert_steps,
|
||||||
|
ARRAY_SIZE(revert_steps));
|
||||||
mutex_unlock(&dev->dev_lock);
|
mutex_unlock(&dev->dev_lock);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -2092,8 +2261,7 @@ static int opal_lock_unlock(struct opal_dev *dev,
|
||||||
{
|
{
|
||||||
int ret;
|
int ret;
|
||||||
|
|
||||||
if (lk_unlk->session.who < OPAL_ADMIN1 ||
|
if (lk_unlk->session.who > OPAL_USER9)
|
||||||
lk_unlk->session.who > OPAL_USER9)
|
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
|
||||||
mutex_lock(&dev->dev_lock);
|
mutex_lock(&dev->dev_lock);
|
||||||
|
@ -2171,9 +2339,7 @@ static int opal_set_new_pw(struct opal_dev *dev, struct opal_new_pw *opal_pw)
|
||||||
};
|
};
|
||||||
int ret;
|
int ret;
|
||||||
|
|
||||||
if (opal_pw->session.who < OPAL_ADMIN1 ||
|
if (opal_pw->session.who > OPAL_USER9 ||
|
||||||
opal_pw->session.who > OPAL_USER9 ||
|
|
||||||
opal_pw->new_user_pw.who < OPAL_ADMIN1 ||
|
|
||||||
opal_pw->new_user_pw.who > OPAL_USER9)
|
opal_pw->new_user_pw.who > OPAL_USER9)
|
||||||
return -EINVAL;
|
return -EINVAL;
|
||||||
|
|
||||||
|
@ -2280,7 +2446,7 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
|
||||||
ret = opal_activate_user(dev, p);
|
ret = opal_activate_user(dev, p);
|
||||||
break;
|
break;
|
||||||
case IOC_OPAL_REVERT_TPR:
|
case IOC_OPAL_REVERT_TPR:
|
||||||
ret = opal_reverttper(dev, p);
|
ret = opal_reverttper(dev, p, false);
|
||||||
break;
|
break;
|
||||||
case IOC_OPAL_LR_SETUP:
|
case IOC_OPAL_LR_SETUP:
|
||||||
ret = opal_setup_locking_range(dev, p);
|
ret = opal_setup_locking_range(dev, p);
|
||||||
|
@ -2291,12 +2457,21 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
|
||||||
case IOC_OPAL_ENABLE_DISABLE_MBR:
|
case IOC_OPAL_ENABLE_DISABLE_MBR:
|
||||||
ret = opal_enable_disable_shadow_mbr(dev, p);
|
ret = opal_enable_disable_shadow_mbr(dev, p);
|
||||||
break;
|
break;
|
||||||
|
case IOC_OPAL_MBR_DONE:
|
||||||
|
ret = opal_set_mbr_done(dev, p);
|
||||||
|
break;
|
||||||
|
case IOC_OPAL_WRITE_SHADOW_MBR:
|
||||||
|
ret = opal_write_shadow_mbr(dev, p);
|
||||||
|
break;
|
||||||
case IOC_OPAL_ERASE_LR:
|
case IOC_OPAL_ERASE_LR:
|
||||||
ret = opal_erase_locking_range(dev, p);
|
ret = opal_erase_locking_range(dev, p);
|
||||||
break;
|
break;
|
||||||
case IOC_OPAL_SECURE_ERASE_LR:
|
case IOC_OPAL_SECURE_ERASE_LR:
|
||||||
ret = opal_secure_erase_locking_range(dev, p);
|
ret = opal_secure_erase_locking_range(dev, p);
|
||||||
break;
|
break;
|
||||||
|
case IOC_OPAL_PSID_REVERT_TPR:
|
||||||
|
ret = opal_reverttper(dev, p, true);
|
||||||
|
break;
|
||||||
default:
|
default:
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
|
--- a/drivers/block/drbd/drbd_debugfs.c
+++ b/drivers/block/drbd/drbd_debugfs.c
@@ -465,35 +465,20 @@ static const struct file_operations in_flight_summary_fops = {
 void drbd_debugfs_resource_add(struct drbd_resource *resource)
 {
 	struct dentry *dentry;
-	if (!drbd_debugfs_resources)
-		return;
 
 	dentry = debugfs_create_dir(resource->name, drbd_debugfs_resources);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	resource->debugfs_res = dentry;
 
 	dentry = debugfs_create_dir("volumes", resource->debugfs_res);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	resource->debugfs_res_volumes = dentry;
 
 	dentry = debugfs_create_dir("connections", resource->debugfs_res);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	resource->debugfs_res_connections = dentry;
 
 	dentry = debugfs_create_file("in_flight_summary", 0440,
 			resource->debugfs_res, resource,
 			&in_flight_summary_fops);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	resource->debugfs_res_in_flight_summary = dentry;
-	return;
-
-fail:
-	drbd_debugfs_resource_cleanup(resource);
-	drbd_err(resource, "failed to create debugfs dentry\n");
 }
 
 static void drbd_debugfs_remove(struct dentry **dp)
@@ -636,35 +621,22 @@ void drbd_debugfs_connection_add(struct drbd_connection *connection)
 {
 	struct dentry *conns_dir = connection->resource->debugfs_res_connections;
 	struct dentry *dentry;
-	if (!conns_dir)
-		return;
 
 	/* Once we enable mutliple peers,
 	 * these connections will have descriptive names.
 	 * For now, it is just the one connection to the (only) "peer". */
 	dentry = debugfs_create_dir("peer", conns_dir);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	connection->debugfs_conn = dentry;
 
 	dentry = debugfs_create_file("callback_history", 0440,
 			connection->debugfs_conn, connection,
 			&connection_callback_history_fops);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	connection->debugfs_conn_callback_history = dentry;
 
 	dentry = debugfs_create_file("oldest_requests", 0440,
 			connection->debugfs_conn, connection,
 			&connection_oldest_requests_fops);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	connection->debugfs_conn_oldest_requests = dentry;
-	return;
-
-fail:
-	drbd_debugfs_connection_cleanup(connection);
-	drbd_err(connection, "failed to create debugfs dentry\n");
 }
 
 void drbd_debugfs_connection_cleanup(struct drbd_connection *connection)
@@ -809,8 +781,6 @@ void drbd_debugfs_device_add(struct drbd_device *device)
 
 	snprintf(vnr_buf, sizeof(vnr_buf), "%u", device->vnr);
 	dentry = debugfs_create_dir(vnr_buf, vols_dir);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	device->debugfs_vol = dentry;
 
 	snprintf(minor_buf, sizeof(minor_buf), "%u", device->minor);
@@ -819,18 +789,14 @@ void drbd_debugfs_device_add(struct drbd_device *device)
 	if (!slink_name)
 		goto fail;
 	dentry = debugfs_create_symlink(minor_buf, drbd_debugfs_minors, slink_name);
+	device->debugfs_minor = dentry;
 	kfree(slink_name);
 	slink_name = NULL;
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
-	device->debugfs_minor = dentry;
 
 #define DCF(name)	do {				\
 	dentry = debugfs_create_file(#name, 0440,	\
 			device->debugfs_vol, device,	\
 			&device_ ## name ## _fops);	\
-	if (IS_ERR_OR_NULL(dentry))			\
-		goto fail;				\
 	device->debugfs_vol_ ## name = dentry;		\
 	} while (0)
 
@@ -864,19 +830,9 @@ void drbd_debugfs_peer_device_add(struct drbd_peer_device *peer_device)
 	struct dentry *dentry;
 	char vnr_buf[8];
 
-	if (!conn_dir)
-		return;
-
 	snprintf(vnr_buf, sizeof(vnr_buf), "%u", peer_device->device->vnr);
 	dentry = debugfs_create_dir(vnr_buf, conn_dir);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	peer_device->debugfs_peer_dev = dentry;
-	return;
-
-fail:
-	drbd_debugfs_peer_device_cleanup(peer_device);
-	drbd_err(peer_device, "failed to create debugfs entries\n");
 }
 
 void drbd_debugfs_peer_device_cleanup(struct drbd_peer_device *peer_device)
@@ -917,35 +873,19 @@ void drbd_debugfs_cleanup(void)
 	drbd_debugfs_remove(&drbd_debugfs_root);
 }
 
-int __init drbd_debugfs_init(void)
+void __init drbd_debugfs_init(void)
 {
 	struct dentry *dentry;
 
 	dentry = debugfs_create_dir("drbd", NULL);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	drbd_debugfs_root = dentry;
 
 	dentry = debugfs_create_file("version", 0444, drbd_debugfs_root, NULL, &drbd_version_fops);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	drbd_debugfs_version = dentry;
 
 	dentry = debugfs_create_dir("resources", drbd_debugfs_root);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	drbd_debugfs_resources = dentry;
 
 	dentry = debugfs_create_dir("minors", drbd_debugfs_root);
-	if (IS_ERR_OR_NULL(dentry))
-		goto fail;
 	drbd_debugfs_minors = dentry;
-	return 0;
-
-fail:
-	drbd_debugfs_cleanup();
-	if (dentry)
-		return PTR_ERR(dentry);
-	else
-		return -EINVAL;
 }

--- a/drivers/block/drbd/drbd_debugfs.h
+++ b/drivers/block/drbd/drbd_debugfs.h
@@ -6,7 +6,7 @@
 #include "drbd_int.h"
 
 #ifdef CONFIG_DEBUG_FS
-int __init drbd_debugfs_init(void);
+void __init drbd_debugfs_init(void);
 void drbd_debugfs_cleanup(void);
 
 void drbd_debugfs_resource_add(struct drbd_resource *resource);
@@ -22,7 +22,7 @@ void drbd_debugfs_peer_device_add(struct drbd_peer_device *peer_device);
 void drbd_debugfs_peer_device_cleanup(struct drbd_peer_device *peer_device);
 #else
 
-static inline int __init drbd_debugfs_init(void) { return -ENODEV; }
+static inline void __init drbd_debugfs_init(void) { }
 static inline void drbd_debugfs_cleanup(void) { }
 
 static inline void drbd_debugfs_resource_add(struct drbd_resource *resource) { }

--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -3009,8 +3009,7 @@ static int __init drbd_init(void)
 	spin_lock_init(&retry.lock);
 	INIT_LIST_HEAD(&retry.writes);
 
-	if (drbd_debugfs_init())
-		pr_notice("failed to initialize debugfs -- will not be available\n");
+	drbd_debugfs_init();
 
 	pr_info("initialized. "
 	       "Version: " REL_VERSION " (api:%d/proto:%d-%d)\n",

--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3900,7 +3900,7 @@ static void __init config_types(void)
 	if (!UDP->cmos)
 		UDP->cmos = FLOPPY0_TYPE;
 	drive = 1;
-	if (!UDP->cmos && FLOPPY1_TYPE)
+	if (!UDP->cmos)
 		UDP->cmos = FLOPPY1_TYPE;
 
 	/* FIXME: additional physical CMOS drive detection should go here */

--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -264,20 +264,12 @@ lo_do_transfer(struct loop_device *lo, int cmd,
 	return ret;
 }
 
-static inline void loop_iov_iter_bvec(struct iov_iter *i,
-		unsigned int direction, const struct bio_vec *bvec,
-		unsigned long nr_segs, size_t count)
-{
-	iov_iter_bvec(i, direction, bvec, nr_segs, count);
-	i->type |= ITER_BVEC_FLAG_NO_REF;
-}
-
 static int lo_write_bvec(struct file *file, struct bio_vec *bvec, loff_t *ppos)
 {
 	struct iov_iter i;
 	ssize_t bw;
 
-	loop_iov_iter_bvec(&i, WRITE, bvec, 1, bvec->bv_len);
+	iov_iter_bvec(&i, WRITE, bvec, 1, bvec->bv_len);
 
 	file_start_write(file);
 	bw = vfs_iter_write(file, &i, ppos, 0);
@@ -355,7 +347,7 @@ static int lo_read_simple(struct loop_device *lo, struct request *rq,
 	ssize_t len;
 
 	rq_for_each_segment(bvec, rq, iter) {
-		loop_iov_iter_bvec(&i, READ, &bvec, 1, bvec.bv_len);
+		iov_iter_bvec(&i, READ, &bvec, 1, bvec.bv_len);
 		len = vfs_iter_read(lo->lo_backing_file, &i, &pos, 0);
 		if (len < 0)
 			return len;
@@ -396,7 +388,7 @@ static int lo_read_transfer(struct loop_device *lo, struct request *rq,
 		b.bv_offset = 0;
 		b.bv_len = bvec.bv_len;
 
-		loop_iov_iter_bvec(&i, READ, &b, 1, b.bv_len);
+		iov_iter_bvec(&i, READ, &b, 1, b.bv_len);
 		len = vfs_iter_read(lo->lo_backing_file, &i, &pos, 0);
 		if (len < 0) {
 			ret = len;
@@ -563,7 +555,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	}
 	atomic_set(&cmd->ref, 2);
 
-	loop_iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
+	iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
 	iter.iov_offset = offset;
 
 	cmd->iocb.ki_pos = pos;

--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -1577,7 +1577,6 @@ static int exec_drive_command(struct mtip_port *port, u8 *command,
 				ATA_SECT_SIZE * xfer_sz);
 			return -ENOMEM;
 		}
-		memset(buf, 0, ATA_SECT_SIZE * xfer_sz);
 	}
 
 	/* Build the FIS. */
@@ -2776,7 +2775,6 @@ static int mtip_dma_alloc(struct driver_data *dd)
 					&port->block1_dma, GFP_KERNEL);
 	if (!port->block1)
 		return -ENOMEM;
-	memset(port->block1, 0, BLOCK_DMA_ALLOC_SZ);
 
 	/* Allocate dma memory for command list */
 	port->command_list =
@@ -2789,7 +2787,6 @@ static int mtip_dma_alloc(struct driver_data *dd)
 		port->block1_dma = 0;
 		return -ENOMEM;
 	}
-	memset(port->command_list, 0, AHCI_CMD_TBL_SZ);
 
 	/* Setup all pointers into first DMA region */
 	port->rxfis = port->block1 + AHCI_RX_FIS_OFFSET;
@@ -3529,8 +3526,6 @@ static int mtip_init_cmd(struct blk_mq_tag_set *set, struct request *rq,
 	if (!cmd->command)
 		return -ENOMEM;
 
-	memset(cmd->command, 0, CMD_DMA_ALLOC_SZ);
-
 	sg_init_table(cmd->sg, MTIP_MAX_SG);
 	return 0;
 }

--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -327,11 +327,12 @@ static ssize_t nullb_device_power_store(struct config_item *item,
 		set_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
 		dev->power = newp;
 	} else if (dev->power && !newp) {
-		mutex_lock(&lock);
-		dev->power = newp;
-		null_del_dev(dev->nullb);
-		mutex_unlock(&lock);
-		clear_bit(NULLB_DEV_FL_UP, &dev->flags);
+		if (test_and_clear_bit(NULLB_DEV_FL_UP, &dev->flags)) {
+			mutex_lock(&lock);
+			dev->power = newp;
+			null_del_dev(dev->nullb);
+			mutex_unlock(&lock);
+		}
 		clear_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
 	}
 
@@ -1197,7 +1198,7 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd)
 	if (!cmd->error && dev->zoned) {
 		sector_t sector;
 		unsigned int nr_sectors;
-		int op;
+		enum req_opf op;
 
 		if (dev->queue_mode == NULL_Q_BIO) {
 			op = bio_op(cmd->bio);
@@ -1488,7 +1489,6 @@ static int setup_queues(struct nullb *nullb)
 	if (!nullb->queues)
 		return -ENOMEM;
 
-	nullb->nr_queues = 0;
 	nullb->queue_depth = nullb->dev->hw_queue_depth;
 
 	return 0;

--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -2694,7 +2694,6 @@ static int skd_cons_skmsg(struct skd_device *skdev)
 			 (FIT_QCMD_ALIGN - 1),
 			 "not aligned: msg_buf %p mb_dma_address %pad\n",
 			 skmsg->msg_buf, &skmsg->mb_dma_address);
-		memset(skmsg->msg_buf, 0, SKD_N_FITMSG_BYTES);
 	}
 
 err_out:

--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -478,7 +478,7 @@ static void __nvm_remove_target(struct nvm_target *t, bool graceful)
  */
 static int nvm_remove_tgt(struct nvm_ioctl_remove *remove)
 {
-	struct nvm_target *t;
+	struct nvm_target *t = NULL;
 	struct nvm_dev *dev;
 
 	down_read(&nvm_lock);

--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -323,14 +323,16 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
 void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off,
 			 int nr_pages)
 {
-	struct bio_vec bv;
-	int i;
+	struct bio_vec *bv;
+	struct page *page;
+	int i, e, nbv = 0;
 
-	WARN_ON(off + nr_pages != bio->bi_vcnt);
-
-	for (i = off; i < nr_pages + off; i++) {
-		bv = bio->bi_io_vec[i];
-		mempool_free(bv.bv_page, &pblk->page_bio_pool);
+	for (i = 0; i < bio->bi_vcnt; i++) {
+		bv = &bio->bi_io_vec[i];
+		page = bv->bv_page;
+		for (e = 0; e < bv->bv_len; e += PBLK_EXPOSED_PAGE_SIZE, nbv++)
+			if (nbv >= off)
+				mempool_free(page++, &pblk->page_bio_pool);
 	}
 }

--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -393,6 +393,11 @@ long bch_bucket_alloc(struct cache *ca, unsigned int reserve, bool wait)
 	struct bucket *b;
 	long r;
 
+
+	/* No allocation if CACHE_SET_IO_DISABLE bit is set */
+	if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &ca->set->flags)))
+		return -1;
+
 	/* fastpath */
 	if (fifo_pop(&ca->free[RESERVE_NONE], r) ||
 	    fifo_pop(&ca->free[reserve], r))
@@ -484,6 +489,10 @@ int __bch_bucket_alloc_set(struct cache_set *c, unsigned int reserve,
 {
 	int i;
 
+	/* No allocation if CACHE_SET_IO_DISABLE bit is set */
+	if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags)))
+		return -1;
+
 	lockdep_assert_held(&c->bucket_lock);
 	BUG_ON(!n || n > c->caches_loaded || n > MAX_CACHES_PER_SET);
 

--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -705,8 +705,8 @@ struct cache_set {
 	atomic_long_t		writeback_keys_failed;
 
 	atomic_long_t		reclaim;
+	atomic_long_t		reclaimed_journal_buckets;
 	atomic_long_t		flush_write;
-	atomic_long_t		retry_flush_write;
 
 	enum			{
 		ON_ERROR_UNREGISTER,
@@ -726,8 +726,6 @@ struct cache_set {
 
 #define BUCKET_HASH_BITS	12
 	struct hlist_head	bucket_hash[1 << BUCKET_HASH_BITS];
-
-	DECLARE_HEAP(struct btree *, flush_btree);
 };
 
 struct bbio {
@@ -1006,7 +1004,7 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size);
 int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 			  uint8_t *set_uuid);
 void bch_cached_dev_detach(struct cached_dev *dc);
-void bch_cached_dev_run(struct cached_dev *dc);
+int bch_cached_dev_run(struct cached_dev *dc);
 void bcache_device_stop(struct bcache_device *d);
 
 void bch_cache_set_unregister(struct cache_set *c);

@@ -347,22 +347,19 @@ EXPORT_SYMBOL(bch_btree_keys_alloc);
 void bch_btree_keys_init(struct btree_keys *b, const struct btree_keys_ops *ops,
 			 bool *expensive_debug_checks)
 {
-	unsigned int i;
-
 	b->ops = ops;
 	b->expensive_debug_checks = expensive_debug_checks;
 	b->nsets = 0;
 	b->last_set_unwritten = 0;
 
-	/* XXX: shouldn't be needed */
-	for (i = 0; i < MAX_BSETS; i++)
-		b->set[i].size = 0;
 	/*
-	 * Second loop starts at 1 because b->keys[0]->data is the memory we
-	 * allocated
+	 * struct btree_keys in embedded in struct btree, and struct
+	 * bset_tree is embedded into struct btree_keys. They are all
+	 * initialized as 0 by kzalloc() in mca_bucket_alloc(), and
+	 * b->set[0].data is allocated in bch_btree_keys_alloc(), so we
+	 * don't have to initiate b->set[].size and b->set[].data here
+	 * any more.
 	 */
-	for (i = 1; i < MAX_BSETS; i++)
-		b->set[i].data = NULL;
 }
 EXPORT_SYMBOL(bch_btree_keys_init);
@@ -970,45 +967,25 @@ static struct bset_search_iter bset_search_tree(struct bset_tree *t,
 	unsigned int inorder, j, n = 1;
 
 	do {
-		/*
-		 * A bit trick here.
-		 * If p < t->size, (int)(p - t->size) is a minus value and
-		 * the most significant bit is set, right shifting 31 bits
-		 * gets 1. If p >= t->size, the most significant bit is
-		 * not set, right shifting 31 bits gets 0.
-		 * So the following 2 lines equals to
-		 *	if (p >= t->size)
-		 *		p = 0;
-		 * but a branch instruction is avoided.
-		 */
 		unsigned int p = n << 4;
-		p &= ((int) (p - t->size)) >> 31;
 
-		prefetch(&t->tree[p]);
+		if (p < t->size)
+			prefetch(&t->tree[p]);
 
 		j = n;
 		f = &t->tree[j];
-
-		/*
-		 * Similar bit trick, use subtract operation to avoid a branch
-		 * instruction.
-		 *
-		 * n = (f->mantissa > bfloat_mantissa())
-		 *	? j * 2
-		 *	: j * 2 + 1;
-		 *
-		 * We need to subtract 1 from f->mantissa for the sign bit trick
-		 * to work - that's done in make_bfloat()
-		 */
-		if (likely(f->exponent != 127))
-			n = j * 2 + (((unsigned int)
-				      (f->mantissa -
-				       bfloat_mantissa(search, f))) >> 31);
-		else
-			n = (bkey_cmp(tree_to_bkey(t, j), search) > 0)
-				? j * 2
-				: j * 2 + 1;
+		if (likely(f->exponent != 127)) {
+			if (f->mantissa >= bfloat_mantissa(search, f))
+				n = j * 2;
+			else
+				n = j * 2 + 1;
+		} else {
+			if (bkey_cmp(tree_to_bkey(t, j), search) > 0)
+				n = j * 2;
+			else
+				n = j * 2 + 1;
+		}
 	} while (n < t->size);
 
 	inorder = to_inorder(j, t);
@@ -35,7 +35,7 @@
 #include <linux/rcupdate.h>
 #include <linux/sched/clock.h>
 #include <linux/rculist.h>
+#include <linux/delay.h>
 #include <trace/events/bcache.h>
 
 /*
@@ -613,6 +613,10 @@ static void mca_data_alloc(struct btree *b, struct bkey *k, gfp_t gfp)
 static struct btree *mca_bucket_alloc(struct cache_set *c,
 				      struct bkey *k, gfp_t gfp)
 {
+	/*
+	 * kzalloc() is necessary here for initialization,
+	 * see code comments in bch_btree_keys_init().
+	 */
 	struct btree *b = kzalloc(sizeof(struct btree), gfp);
 
 	if (!b)
@@ -655,7 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
 		up(&b->io_mutex);
 	}
 
+retry:
+	/*
+	 * BTREE_NODE_dirty might be cleared in btree_flush_btree() by
+	 * __bch_btree_node_write(). To avoid an extra flush, acquire
+	 * b->write_lock before checking BTREE_NODE_dirty bit.
+	 */
 	mutex_lock(&b->write_lock);
+	/*
+	 * If this btree node is selected in btree_flush_write() by journal
+	 * code, delay and retry until the node is flushed by journal code
+	 * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+	 */
+	if (btree_node_journal_flush(b)) {
+		pr_debug("bnode %p is flushing by journal, retry", b);
+		mutex_unlock(&b->write_lock);
+		udelay(1);
+		goto retry;
+	}
 
 	if (btree_node_dirty(b))
 		__bch_btree_node_write(b, &cl);
 	mutex_unlock(&b->write_lock);
@@ -778,10 +800,15 @@ void bch_btree_cache_free(struct cache_set *c)
 	while (!list_empty(&c->btree_cache)) {
 		b = list_first_entry(&c->btree_cache, struct btree, list);
 
-		if (btree_node_dirty(b))
+		/*
+		 * This function is called by cache_set_free(), no I/O
+		 * request on cache now, it is unnecessary to acquire
+		 * b->write_lock before clearing BTREE_NODE_dirty anymore.
+		 */
+		if (btree_node_dirty(b)) {
 			btree_complete_write(b, btree_current_write(b));
-		clear_bit(BTREE_NODE_dirty, &b->flags);
+			clear_bit(BTREE_NODE_dirty, &b->flags);
+		}
 		mca_data_free(b);
 	}
@@ -1067,11 +1094,25 @@ static void btree_node_free(struct btree *b)
 
 	BUG_ON(b == b->c->root);
 
+retry:
 	mutex_lock(&b->write_lock);
+	/*
+	 * If the btree node is selected and flushing in btree_flush_write(),
+	 * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+	 * then it is safe to free the btree node here. Otherwise this btree
+	 * node will be in race condition.
+	 */
+	if (btree_node_journal_flush(b)) {
+		mutex_unlock(&b->write_lock);
+		pr_debug("bnode %p journal_flush set, retry", b);
+		udelay(1);
+		goto retry;
+	}
 
-	if (btree_node_dirty(b))
+	if (btree_node_dirty(b)) {
 		btree_complete_write(b, btree_current_write(b));
-	clear_bit(BTREE_NODE_dirty, &b->flags);
+		clear_bit(BTREE_NODE_dirty, &b->flags);
+	}
 
 	mutex_unlock(&b->write_lock);
 
@@ -158,11 +158,13 @@ enum btree_flags {
 	BTREE_NODE_io_error,
 	BTREE_NODE_dirty,
 	BTREE_NODE_write_idx,
+	BTREE_NODE_journal_flush,
 };
 
 BTREE_FLAG(io_error);
 BTREE_FLAG(dirty);
 BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
 
 static inline struct btree_write *btree_current_write(struct btree *b)
 {
@@ -58,6 +58,18 @@ void bch_count_backing_io_errors(struct cached_dev *dc, struct bio *bio)
 
 	WARN_ONCE(!dc, "NULL pointer of struct cached_dev");
 
+	/*
+	 * Read-ahead requests on a degrading and recovering md raid
+	 * (e.g. raid6) device might be failured immediately by md
+	 * raid code, which is not a real hardware media failure. So
+	 * we shouldn't count failed REQ_RAHEAD bio to dc->io_errors.
+	 */
+	if (bio->bi_opf & REQ_RAHEAD) {
+		pr_warn_ratelimited("%s: Read-ahead I/O failed on backing device, ignore",
+				    dc->backing_dev_name);
+		return;
+	}
+
 	errors = atomic_add_return(1, &dc->io_errors);
 	if (errors < dc->error_limit)
 		pr_err("%s: IO error on backing device, unrecoverable",
@@ -100,6 +100,20 @@ reread:		left = ca->sb.bucket_size - offset;
 
 			blocks = set_blocks(j, block_bytes(ca->set));
 
+			/*
+			 * Nodes in 'list' are in linear increasing order of
+			 * i->j.seq, the node on head has the smallest (oldest)
+			 * journal seq, the node on tail has the biggest
+			 * (latest) journal seq.
+			 */
+
+			/*
+			 * Check from the oldest jset for last_seq. If
+			 * i->j.seq < j->last_seq, it means the oldest jset
+			 * in list is expired and useless, remove it from
+			 * this list. Otherwise, j is a condidate jset for
+			 * further following checks.
+			 */
 			while (!list_empty(list)) {
 				i = list_first_entry(list,
 					struct journal_replay, list);
@@ -109,13 +123,22 @@ reread:		left = ca->sb.bucket_size - offset;
 				kfree(i);
 			}
 
+			/* iterate list in reverse order (from latest jset) */
 			list_for_each_entry_reverse(i, list, list) {
 				if (j->seq == i->j.seq)
 					goto next_set;
 
+				/*
+				 * if j->seq is less than any i->j.last_seq
+				 * in list, j is an expired and useless jset.
+				 */
 				if (j->seq < i->j.last_seq)
 					goto next_set;
 
+				/*
+				 * 'where' points to first jset in list which
+				 * is elder then j.
+				 */
 				if (j->seq > i->j.seq) {
 					where = &i->list;
 					goto add;
@@ -129,9 +152,11 @@ add:
 		if (!i)
 			return -ENOMEM;
 		memcpy(&i->j, j, bytes);
+		/* Add to the location after 'where' points to */
 		list_add(&i->list, where);
 		ret = 1;
 
-		ja->seq[bucket_index] = j->seq;
+		if (j->seq > ja->seq[bucket_index])
+			ja->seq[bucket_index] = j->seq;
 next_set:
 		offset += blocks * ca->sb.block_size;
@@ -268,7 +293,7 @@ bsearch:
 					    struct journal_replay,
 					    list)->j.seq;
 
-	return ret;
+	return 0;
 #undef read_bucket
 }
@@ -391,60 +416,90 @@ err:
 }
 
 /* Journalling */
-#define journal_max_cmp(l, r) \
-	(fifo_idx(&c->journal.pin, btree_current_write(l)->journal) < \
-	 fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
-#define journal_min_cmp(l, r) \
-	(fifo_idx(&c->journal.pin, btree_current_write(l)->journal) > \
-	 fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
 
 static void btree_flush_write(struct cache_set *c)
 {
-	/*
-	 * Try to find the btree node with that references the oldest journal
-	 * entry, best is our current candidate and is locked if non NULL:
-	 */
-	struct btree *b;
-	int i;
+	struct btree *b, *t, *btree_nodes[BTREE_FLUSH_NR];
+	unsigned int i, n;
+
+	if (c->journal.btree_flushing)
+		return;
+
+	spin_lock(&c->journal.flush_write_lock);
+	if (c->journal.btree_flushing) {
+		spin_unlock(&c->journal.flush_write_lock);
+		return;
+	}
+	c->journal.btree_flushing = true;
+	spin_unlock(&c->journal.flush_write_lock);
 
 	atomic_long_inc(&c->flush_write);
+	memset(btree_nodes, 0, sizeof(btree_nodes));
+	n = 0;
 
-retry:
-	spin_lock(&c->journal.lock);
-	if (heap_empty(&c->flush_btree)) {
-		for_each_cached_btree(b, c, i)
-			if (btree_current_write(b)->journal) {
-				if (!heap_full(&c->flush_btree))
-					heap_add(&c->flush_btree, b,
-						 journal_max_cmp);
-				else if (journal_max_cmp(b,
-					 heap_peek(&c->flush_btree))) {
-					c->flush_btree.data[0] = b;
-					heap_sift(&c->flush_btree, 0,
-						  journal_max_cmp);
-				}
-			}
-
-		for (i = c->flush_btree.used / 2 - 1; i >= 0; --i)
-			heap_sift(&c->flush_btree, i, journal_min_cmp);
-	}
-
-	b = NULL;
-	heap_pop(&c->flush_btree, b, journal_min_cmp);
-	spin_unlock(&c->journal.lock);
+	mutex_lock(&c->bucket_lock);
+	list_for_each_entry_safe_reverse(b, t, &c->btree_cache, list) {
+		if (btree_node_journal_flush(b))
+			pr_err("BUG: flush_write bit should not be set here!");
 
-	if (b) {
 		mutex_lock(&b->write_lock);
 
+		if (!btree_node_dirty(b)) {
+			mutex_unlock(&b->write_lock);
+			continue;
+		}
+
 		if (!btree_current_write(b)->journal) {
 			mutex_unlock(&b->write_lock);
-			/* We raced */
-			atomic_long_inc(&c->retry_flush_write);
-			goto retry;
+			continue;
 		}
 
+		set_btree_node_journal_flush(b);
+
+		mutex_unlock(&b->write_lock);
+
+		btree_nodes[n++] = b;
+		if (n == BTREE_FLUSH_NR)
+			break;
+	}
+	mutex_unlock(&c->bucket_lock);
+
+	for (i = 0; i < n; i++) {
+		b = btree_nodes[i];
+		if (!b) {
+			pr_err("BUG: btree_nodes[%d] is NULL", i);
+			continue;
+		}
+
+		/* safe to check without holding b->write_lock */
+		if (!btree_node_journal_flush(b)) {
+			pr_err("BUG: bnode %p: journal_flush bit cleaned", b);
+			continue;
+		}
+
+		mutex_lock(&b->write_lock);
+		if (!btree_current_write(b)->journal) {
+			clear_bit(BTREE_NODE_journal_flush, &b->flags);
+			mutex_unlock(&b->write_lock);
+			pr_debug("bnode %p: written by others", b);
+			continue;
+		}
+
+		if (!btree_node_dirty(b)) {
+			clear_bit(BTREE_NODE_journal_flush, &b->flags);
+			mutex_unlock(&b->write_lock);
+			pr_debug("bnode %p: dirty bit cleaned by others", b);
+			continue;
+		}
+
 		__bch_btree_node_write(b, NULL);
+		clear_bit(BTREE_NODE_journal_flush, &b->flags);
 		mutex_unlock(&b->write_lock);
 	}
+
+	spin_lock(&c->journal.flush_write_lock);
+	c->journal.btree_flushing = false;
+	spin_unlock(&c->journal.flush_write_lock);
 }
 
 #define last_seq(j)	((j)->seq - fifo_used(&(j)->pin) + 1)
@@ -559,6 +614,7 @@ static void journal_reclaim(struct cache_set *c)
 		k->ptr[n++] = MAKE_PTR(0,
 				  bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
 				  ca->sb.nr_this_dev);
+		atomic_long_inc(&c->reclaimed_journal_buckets);
 	}
 
 	if (n) {
@@ -811,6 +867,10 @@ atomic_t *bch_journal(struct cache_set *c,
 	struct journal_write *w;
 	atomic_t *ret;
 
+	/* No journaling if CACHE_SET_IO_DISABLE set already */
+	if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags)))
+		return NULL;
+
 	if (!CACHE_SYNC(&c->sb))
 		return NULL;
 
@@ -855,7 +915,6 @@ void bch_journal_free(struct cache_set *c)
 	free_pages((unsigned long) c->journal.w[1].data, JSET_BITS);
 	free_pages((unsigned long) c->journal.w[0].data, JSET_BITS);
 	free_fifo(&c->journal.pin);
-	free_heap(&c->flush_btree);
 }
 
 int bch_journal_alloc(struct cache_set *c)
@@ -863,6 +922,7 @@ int bch_journal_alloc(struct cache_set *c)
 	struct journal *j = &c->journal;
 
 	spin_lock_init(&j->lock);
+	spin_lock_init(&j->flush_write_lock);
 	INIT_DELAYED_WORK(&j->work, journal_write_work);
 
 	c->journal_delay_ms = 100;
@@ -870,8 +930,7 @@ int bch_journal_alloc(struct cache_set *c)
 	j->w[0].c = c;
 	j->w[1].c = c;
 
-	if (!(init_heap(&c->flush_btree, 128, GFP_KERNEL)) ||
-	    !(init_fifo(&j->pin, JOURNAL_PIN, GFP_KERNEL)) ||
+	if (!(init_fifo(&j->pin, JOURNAL_PIN, GFP_KERNEL)) ||
 	    !(j->w[0].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)) ||
 	    !(j->w[1].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)))
 		return -ENOMEM;
@@ -103,6 +103,8 @@ struct journal_write {
 /* Embedded in struct cache_set */
 struct journal {
 	spinlock_t		lock;
+	spinlock_t		flush_write_lock;
+	bool			btree_flushing;
 	/* used when waiting because the journal was full */
 	struct closure_waitlist	wait;
 	struct closure		io;
@@ -154,6 +156,8 @@ struct journal_device {
 	struct bio_vec		bv[8];
 };
 
+#define BTREE_FLUSH_NR	8
+
 #define journal_pin_cmp(c, l, r)				\
 	(fifo_idx(&(c)->journal.pin, (l)) > fifo_idx(&(c)->journal.pin, (r)))
@@ -40,6 +40,7 @@ static const char invalid_uuid[] = {
 
 static struct kobject *bcache_kobj;
 struct mutex bch_register_lock;
+bool bcache_is_reboot;
 LIST_HEAD(bch_cache_sets);
 static LIST_HEAD(uncached_devices);
 
@@ -49,6 +50,7 @@ static wait_queue_head_t unregister_wait;
 struct workqueue_struct *bcache_wq;
 struct workqueue_struct *bch_journal_wq;
 
+
 #define BTREE_MAX_PAGES		(256 * 1024 / PAGE_SIZE)
 /* limitation of partitions number on single bcache device */
 #define BCACHE_MINORS		128
@@ -197,7 +199,9 @@ err:
 static void write_bdev_super_endio(struct bio *bio)
 {
 	struct cached_dev *dc = bio->bi_private;
-	/* XXX: error checking */
+
+	if (bio->bi_status)
+		bch_count_backing_io_errors(dc, bio);
 
 	closure_put(&dc->sb_write);
 }
@@ -691,6 +695,7 @@ static void bcache_device_link(struct bcache_device *d, struct cache_set *c,
 {
 	unsigned int i;
 	struct cache *ca;
+	int ret;
 
 	for_each_cache(ca, d->c, i)
 		bd_link_disk_holder(ca->bdev, d->disk);
@@ -698,9 +703,13 @@ static void bcache_device_link(struct bcache_device *d, struct cache_set *c,
 	snprintf(d->name, BCACHEDEVNAME_SIZE,
 		 "%s%u", name, d->id);
 
-	WARN(sysfs_create_link(&d->kobj, &c->kobj, "cache") ||
-	     sysfs_create_link(&c->kobj, &d->kobj, d->name),
-	     "Couldn't create device <-> cache set symlinks");
+	ret = sysfs_create_link(&d->kobj, &c->kobj, "cache");
+	if (ret < 0)
+		pr_err("Couldn't create device -> cache set symlink");
+
+	ret = sysfs_create_link(&c->kobj, &d->kobj, d->name);
+	if (ret < 0)
+		pr_err("Couldn't create cache set -> device symlink");
 
 	clear_bit(BCACHE_DEV_UNLINK_DONE, &d->flags);
 }
@@ -908,7 +917,7 @@ static int cached_dev_status_update(void *arg)
 }
 
 
-void bch_cached_dev_run(struct cached_dev *dc)
+int bch_cached_dev_run(struct cached_dev *dc)
 {
 	struct bcache_device *d = &dc->disk;
 	char *buf = kmemdup_nul(dc->sb.label, SB_LABEL_SIZE, GFP_KERNEL);
@@ -919,11 +928,19 @@ int bch_cached_dev_run(struct cached_dev *dc)
 		NULL,
 	};
 
+	if (dc->io_disable) {
+		pr_err("I/O disabled on cached dev %s",
+		       dc->backing_dev_name);
+		return -EIO;
+	}
+
 	if (atomic_xchg(&dc->running, 1)) {
 		kfree(env[1]);
 		kfree(env[2]);
 		kfree(buf);
-		return;
+		pr_info("cached dev %s is running already",
+			dc->backing_dev_name);
+		return -EBUSY;
 	}
 
 	if (!d->c &&
@@ -949,8 +966,11 @@ int bch_cached_dev_run(struct cached_dev *dc)
 	kfree(buf);
 
 	if (sysfs_create_link(&d->kobj, &disk_to_dev(d->disk)->kobj, "dev") ||
-	    sysfs_create_link(&disk_to_dev(d->disk)->kobj, &d->kobj, "bcache"))
-		pr_debug("error creating sysfs link");
+	    sysfs_create_link(&disk_to_dev(d->disk)->kobj,
+			      &d->kobj, "bcache")) {
+		pr_err("Couldn't create bcache dev <-> disk sysfs symlinks");
+		return -ENOMEM;
+	}
 
 	dc->status_update_thread = kthread_run(cached_dev_status_update,
 					       dc, "bcache_status_update");
@@ -959,6 +979,8 @@ int bch_cached_dev_run(struct cached_dev *dc)
 			 "continue to run without monitoring backing "
 			 "device status");
 	}
+
+	return 0;
 }
 
 /*
@@ -996,7 +1018,6 @@ static void cached_dev_detach_finish(struct work_struct *w)
 	BUG_ON(!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags));
 	BUG_ON(refcount_read(&dc->count));
 
-	mutex_lock(&bch_register_lock);
 
 	if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
 		cancel_writeback_rate_update_dwork(dc);
@@ -1012,6 +1033,8 @@ static void cached_dev_detach_finish(struct work_struct *w)
 	bch_write_bdev_super(dc, &cl);
 	closure_sync(&cl);
 
+	mutex_lock(&bch_register_lock);
+
 	calc_cached_dev_sectors(dc->disk.c);
 	bcache_device_detach(&dc->disk);
 	list_move(&dc->list, &uncached_devices);
@@ -1054,6 +1077,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 	uint32_t rtime = cpu_to_le32((u32)ktime_get_real_seconds());
 	struct uuid_entry *u;
 	struct cached_dev *exist_dc, *t;
+	int ret = 0;
 
 	if ((set_uuid && memcmp(set_uuid, c->sb.set_uuid, 16)) ||
 	    (!set_uuid && memcmp(dc->sb.set_uuid, c->sb.set_uuid, 16)))
@@ -1153,6 +1177,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 	down_write(&dc->writeback_lock);
 	if (bch_cached_dev_writeback_start(dc)) {
 		up_write(&dc->writeback_lock);
+		pr_err("Couldn't start writeback facilities for %s",
+		       dc->disk.disk->disk_name);
 		return -ENOMEM;
 	}
 
@@ -1163,7 +1189,22 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
 
 	bch_sectors_dirty_init(&dc->disk);
 
-	bch_cached_dev_run(dc);
+	ret = bch_cached_dev_run(dc);
+	if (ret && (ret != -EBUSY)) {
+		up_write(&dc->writeback_lock);
+		/*
+		 * bch_register_lock is held, bcache_device_stop() is not
+		 * able to be directly called. The kthread and kworker
+		 * created previously in bch_cached_dev_writeback_start()
+		 * have to be stopped manually here.
+		 */
+		kthread_stop(dc->writeback_thread);
+		cancel_writeback_rate_update_dwork(dc);
+		pr_err("Couldn't run cached device %s",
+		       dc->backing_dev_name);
+		return ret;
+	}
+
 	bcache_device_link(&dc->disk, c, "bdev");
 	atomic_inc(&c->attached_dev_nr);
 
@@ -1190,18 +1231,16 @@ static void cached_dev_free(struct closure *cl)
 {
 	struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);
 
-	mutex_lock(&bch_register_lock);
-
 	if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
 		cancel_writeback_rate_update_dwork(dc);
 
 	if (!IS_ERR_OR_NULL(dc->writeback_thread))
 		kthread_stop(dc->writeback_thread);
-	if (dc->writeback_write_wq)
-		destroy_workqueue(dc->writeback_write_wq);
 	if (!IS_ERR_OR_NULL(dc->status_update_thread))
 		kthread_stop(dc->status_update_thread);
 
+	mutex_lock(&bch_register_lock);
+
 	if (atomic_read(&dc->running))
 		bd_unlink_disk_holder(dc->bdev, dc->disk.disk);
 	bcache_device_free(&dc->disk);
@@ -1290,6 +1329,7 @@ static int register_bdev(struct cache_sb *sb, struct page *sb_page,
 {
 	const char *err = "cannot allocate memory";
 	struct cache_set *c;
+	int ret = -ENOMEM;
 
 	bdevname(bdev, dc->backing_dev_name);
 	memcpy(&dc->sb, sb, sizeof(struct cache_sb));
@@ -1319,14 +1359,18 @@ static int register_bdev(struct cache_sb *sb, struct page *sb_page,
 		bch_cached_dev_attach(dc, c, NULL);
 
 	if (BDEV_STATE(&dc->sb) == BDEV_STATE_NONE ||
-	    BDEV_STATE(&dc->sb) == BDEV_STATE_STALE)
-		bch_cached_dev_run(dc);
+	    BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) {
+		err = "failed to run cached device";
+		ret = bch_cached_dev_run(dc);
+		if (ret)
+			goto err;
+	}
 
 	return 0;
 err:
 	pr_notice("error %s: %s", dc->backing_dev_name, err);
 	bcache_device_stop(&dc->disk);
-	return -EIO;
+	return ret;
 }
 
 /* Flash only volumes */
@@ -1437,8 +1481,6 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size)
 
 bool bch_cached_dev_error(struct cached_dev *dc)
 {
-	struct cache_set *c;
-
 	if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags))
 		return false;
 
@@ -1449,21 +1491,6 @@ bool bch_cached_dev_error(struct cached_dev *dc)
 	pr_err("stop %s: too many IO errors on backing device %s\n",
 	       dc->disk.disk->disk_name, dc->backing_dev_name);
 
-	/*
-	 * If the cached device is still attached to a cache set,
-	 * even dc->io_disable is true and no more I/O requests
-	 * accepted, cache device internal I/O (writeback scan or
-	 * garbage collection) may still prevent bcache device from
-	 * being stopped. So here CACHE_SET_IO_DISABLE should be
-	 * set to c->flags too, to make the internal I/O to cache
-	 * device rejected and stopped immediately.
-	 * If c is NULL, that means the bcache device is not attached
-	 * to any cache set, then no CACHE_SET_IO_DISABLE bit to set.
-	 */
-	c = dc->disk.c;
-	if (c && test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags))
-		pr_info("CACHE_SET_IO_DISABLE already set");
-
 	bcache_device_stop(&dc->disk);
 	return true;
 }
@@ -1564,13 +1591,17 @@ static void cache_set_flush(struct closure *cl)
 	kobject_put(&c->internal);
 	kobject_del(&c->kobj);
 
-	if (c->gc_thread)
+	if (!IS_ERR_OR_NULL(c->gc_thread))
 		kthread_stop(c->gc_thread);
 
 	if (!IS_ERR_OR_NULL(c->root))
 		list_add(&c->root->list, &c->btree_cache);
 
-	/* Should skip this if we're unregistering because of an error */
-	list_for_each_entry(b, &c->btree_cache, list) {
-		mutex_lock(&b->write_lock);
-		if (btree_node_dirty(b))
+	/*
+	 * Avoid flushing cached nodes if cache set is retiring
+	 * due to too many I/O errors detected.
+	 */
+	if (!test_bit(CACHE_SET_IO_DISABLE, &c->flags))
+		list_for_each_entry(b, &c->btree_cache, list) {
+			mutex_lock(&b->write_lock);
+			if (btree_node_dirty(b))
@@ -1849,6 +1880,23 @@ static int run_cache_set(struct cache_set *c)
 		if (bch_btree_check(c))
 			goto err;
 
+		/*
+		 * bch_btree_check() may occupy too much system memory which
+		 * has negative effects to user space application (e.g. data
+		 * base) performance. Shrink the mca cache memory proactively
+		 * here to avoid competing memory with user space workloads..
+		 */
+		if (!c->shrinker_disabled) {
+			struct shrink_control sc;
+
+			sc.gfp_mask = GFP_KERNEL;
+			sc.nr_to_scan = c->btree_cache_used * c->btree_pages;
+			/* first run to clear b->accessed tag */
+			c->shrink.scan_objects(&c->shrink, &sc);
+			/* second run to reap non-accessed nodes */
+			c->shrink.scan_objects(&c->shrink, &sc);
+		}
+
 		bch_journal_mark(c, &journal);
 		bch_initial_gc_finish(c);
 		pr_debug("btree_check() done");
@@ -1957,7 +2005,7 @@ err:
 	}
 
 	closure_sync(&cl);
-	/* XXX: test this, it's broken */
+
 	bch_cache_set_error(c, "%s", err);
 
 	return -EIO;
@@ -2251,9 +2299,13 @@ err:
 
 static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 			       const char *buffer, size_t size);
+static ssize_t bch_pending_bdevs_cleanup(struct kobject *k,
+					 struct kobj_attribute *attr,
+					 const char *buffer, size_t size);
 
 kobj_attribute_write(register, register_bcache);
 kobj_attribute_write(register_quiet, register_bcache);
+kobj_attribute_write(pendings_cleanup, bch_pending_bdevs_cleanup);
 
 static bool bch_is_open_backing(struct block_device *bdev)
 {
@@ -2301,6 +2353,11 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	if (!try_module_get(THIS_MODULE))
 		return -EBUSY;
 
+	/* For latest state of bcache_is_reboot */
+	smp_mb();
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	path = kstrndup(buffer, size, GFP_KERNEL);
 	if (!path)
 		goto err;
@@ -2378,8 +2435,61 @@ err:
 	goto out;
 }
 
+struct pdev {
+	struct list_head list;
+	struct cached_dev *dc;
+};
+
+static ssize_t bch_pending_bdevs_cleanup(struct kobject *k,
+					 struct kobj_attribute *attr,
+					 const char *buffer,
+					 size_t size)
+{
+	LIST_HEAD(pending_devs);
+	ssize_t ret = size;
+	struct cached_dev *dc, *tdc;
+	struct pdev *pdev, *tpdev;
+	struct cache_set *c, *tc;
+
+	mutex_lock(&bch_register_lock);
+	list_for_each_entry_safe(dc, tdc, &uncached_devices, list) {
+		pdev = kmalloc(sizeof(struct pdev), GFP_KERNEL);
+		if (!pdev)
+			break;
+		pdev->dc = dc;
+		list_add(&pdev->list, &pending_devs);
+	}
+
+	list_for_each_entry_safe(pdev, tpdev, &pending_devs, list) {
+		list_for_each_entry_safe(c, tc, &bch_cache_sets, list) {
+			char *pdev_set_uuid = pdev->dc->sb.set_uuid;
+			char *set_uuid = c->sb.uuid;
+
+			if (!memcmp(pdev_set_uuid, set_uuid, 16)) {
+				list_del(&pdev->list);
+				kfree(pdev);
+				break;
+			}
+		}
+	}
+	mutex_unlock(&bch_register_lock);
+
+	list_for_each_entry_safe(pdev, tpdev, &pending_devs, list) {
+		pr_info("delete pdev %p", pdev);
+		list_del(&pdev->list);
+		bcache_device_stop(&pdev->dc->disk);
+		kfree(pdev);
+	}
+
+	return ret;
+}
+
 static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x)
 {
+	if (bcache_is_reboot)
+		return NOTIFY_DONE;
+
 	if (code == SYS_DOWN ||
 	    code == SYS_HALT ||
 	    code == SYS_POWER_OFF) {
@@ -2392,19 +2502,45 @@ static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x)
 
 		mutex_lock(&bch_register_lock);
 
+		if (bcache_is_reboot)
+			goto out;
+
+		/* New registration is rejected since now */
+		bcache_is_reboot = true;
+		/*
+		 * Make registering caller (if there is) on other CPU
+		 * core know bcache_is_reboot set to true earlier
+		 */
+		smp_mb();
+
 		if (list_empty(&bch_cache_sets) &&
 		    list_empty(&uncached_devices))
 			goto out;
 
+		mutex_unlock(&bch_register_lock);
+
 		pr_info("Stopping all devices:");
 
+		/*
+		 * The reason bch_register_lock is not held to call
+		 * bch_cache_set_stop() and bcache_device_stop() is to
+		 * avoid potential deadlock during reboot, because cache
+		 * set or bcache device stopping process will acqurie
+		 * bch_register_lock too.
+		 *
+		 * We are safe here because bcache_is_reboot sets to
+		 * true already, register_bcache() will reject new
+		 * registration now. bcache_is_reboot also makes sure
+		 * bcache_reboot() won't be re-entered on by other thread,
+		 * so there is no race in following list iteration by
+		 * list_for_each_entry_safe().
+		 */
 		list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
 			bch_cache_set_stop(c);
 
 		list_for_each_entry_safe(dc, tdc, &uncached_devices, list)
 			bcache_device_stop(&dc->disk);
 
-		mutex_unlock(&bch_register_lock);
-
 		/*
 		 * Give an early chance for other kthreads and
@@ -2496,6 +2632,7 @@ static int __init bcache_init(void)
 	static const struct attribute *files[] = {
 		&ksysfs_register.attr,
 		&ksysfs_register_quiet.attr,
+		&ksysfs_pendings_cleanup.attr,
 		NULL
 	};
 
@@ -2531,6 +2668,8 @@ static int __init bcache_init(void)
 	bch_debug_init();
 	closure_debug_init();
 
+	bcache_is_reboot = false;
+
 	return 0;
 err:
 	bcache_exit();
@@ -16,33 +16,31 @@
 #include <linux/sort.h>
 #include <linux/sched/clock.h>
 
+extern bool bcache_is_reboot;
+
 /* Default is 0 ("writethrough") */
 static const char * const bch_cache_modes[] = {
 	"writethrough",
 	"writeback",
 	"writearound",
-	"none",
-	NULL
+	"none"
 };
 
 /* Default is 0 ("auto") */
 static const char * const bch_stop_on_failure_modes[] = {
 	"auto",
-	"always",
-	NULL
+	"always"
 };
 
 static const char * const cache_replacement_policies[] = {
 	"lru",
 	"fifo",
-	"random",
-	NULL
+	"random"
 };
 
 static const char * const error_actions[] = {
 	"unregister",
-	"panic",
-	NULL
+	"panic"
 };
 
 write_attribute(attach);
@@ -84,8 +82,8 @@ read_attribute(bset_tree_stats);
 read_attribute(state);
 read_attribute(cache_read_races);
 read_attribute(reclaim);
+read_attribute(reclaimed_journal_buckets);
 read_attribute(flush_write);
-read_attribute(retry_flush_write);
 read_attribute(writeback_keys_done);
 read_attribute(writeback_keys_failed);
 read_attribute(io_errors);
@@ -180,7 +178,7 @@ SHOW(__bch_cached_dev)
 	var_print(writeback_percent);
 	sysfs_hprint(writeback_rate,
 		     wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
-	sysfs_hprint(io_errors, atomic_read(&dc->io_errors));
+	sysfs_printf(io_errors, "%i", atomic_read(&dc->io_errors));
 	sysfs_printf(io_error_limit, "%i", dc->error_limit);
 	sysfs_printf(io_disable, "%i", dc->io_disable);
 	var_print(writeback_rate_update_seconds);
@@ -271,6 +269,10 @@ STORE(__cached_dev)
 	struct cache_set *c;
 	struct kobj_uevent_env *env;
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 #define d_strtoul(var) sysfs_strtoul(var, dc->var)
 #define d_strtoul_nonzero(var) sysfs_strtoul_clamp(var, dc->var, 1, INT_MAX)
 #define d_strtoi_h(var) sysfs_hatoi(var, dc->var)
@@ -329,11 +331,14 @@ STORE(__cached_dev)
 		bch_cache_accounting_clear(&dc->accounting);
 
 	if (attr == &sysfs_running &&
-	    strtoul_or_return(buf))
-		bch_cached_dev_run(dc);
+	    strtoul_or_return(buf)) {
+		v = bch_cached_dev_run(dc);
+		if (v)
+			return v;
+	}
 
 	if (attr == &sysfs_cache_mode) {
-		v = __sysfs_match_string(bch_cache_modes, -1, buf);
+		v = sysfs_match_string(bch_cache_modes, buf);
 		if (v < 0)
 			return v;
 
@@ -344,7 +349,7 @@ STORE(__cached_dev)
 	}
 
 	if (attr == &sysfs_stop_when_cache_set_failed) {
-		v = __sysfs_match_string(bch_stop_on_failure_modes, -1, buf);
+		v = sysfs_match_string(bch_stop_on_failure_modes, buf);
 		if (v < 0)
 			return v;
 
@@ -408,6 +413,10 @@ STORE(bch_cached_dev)
 	struct cached_dev *dc = container_of(kobj, struct cached_dev,
 					     disk.kobj);
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	mutex_lock(&bch_register_lock);
 	size = __cached_dev_store(kobj, attr, buf, size);
 
@@ -464,7 +473,7 @@ static struct attribute *bch_cached_dev_files[] = {
 	&sysfs_writeback_rate_p_term_inverse,
 	&sysfs_writeback_rate_minimum,
 	&sysfs_writeback_rate_debug,
-	&sysfs_errors,
+	&sysfs_io_errors,
 	&sysfs_io_error_limit,
 	&sysfs_io_disable,
 	&sysfs_dirty_data,
@@ -511,6 +520,10 @@ STORE(__bch_flash_dev)
 					       kobj);
 	struct uuid_entry *u = &d->c->uuids[d->id];
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	sysfs_strtoul(data_csum, d->data_csum);
 
 	if (attr == &sysfs_size) {
@@ -693,12 +706,12 @@ SHOW(__bch_cache_set)
 	sysfs_print(reclaim,
 		    atomic_long_read(&c->reclaim));
 
+	sysfs_print(reclaimed_journal_buckets,
+		    atomic_long_read(&c->reclaimed_journal_buckets));
+
 	sysfs_print(flush_write,
 		    atomic_long_read(&c->flush_write));
 
-	sysfs_print(retry_flush_write,
-		    atomic_long_read(&c->retry_flush_write));
-
 	sysfs_print(writeback_keys_done,
 		    atomic_long_read(&c->writeback_keys_done));
 	sysfs_print(writeback_keys_failed,
@@ -746,6 +759,10 @@ STORE(__bch_cache_set)
 	struct cache_set *c = container_of(kobj, struct cache_set, kobj);
 	ssize_t v;
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	if (attr == &sysfs_unregister)
 		bch_cache_set_unregister(c);
 
@@ -799,7 +816,7 @@ STORE(__bch_cache_set)
 					  0, UINT_MAX);
 
 	if (attr == &sysfs_errors) {
-		v = __sysfs_match_string(error_actions, -1, buf);
+		v = sysfs_match_string(error_actions, buf);
 		if (v < 0)
 			return v;
 
@@ -865,6 +882,10 @@ STORE(bch_cache_set_internal)
 {
 	struct cache_set *c = container_of(kobj, struct cache_set, internal);
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	return bch_cache_set_store(&c->kobj, attr, buf, size);
 }
 
@@ -914,8 +935,8 @@ static struct attribute *bch_cache_set_internal_files[] = {
 	&sysfs_bset_tree_stats,
 	&sysfs_cache_read_races,
 	&sysfs_reclaim,
+	&sysfs_reclaimed_journal_buckets,
 	&sysfs_flush_write,
-	&sysfs_retry_flush_write,
 	&sysfs_writeback_keys_done,
 	&sysfs_writeback_keys_failed,
 
@@ -1050,6 +1071,10 @@ STORE(__bch_cache)
 	struct cache *ca = container_of(kobj, struct cache, kobj);
 	ssize_t v;
 
+	/* no user space access if system is rebooting */
+	if (bcache_is_reboot)
+		return -EBUSY;
+
 	if (attr == &sysfs_discard) {
 		bool v = strtoul_or_return(buf);
 
@@ -1063,7 +1088,7 @@ STORE(__bch_cache)
 	}
 
 	if (attr == &sysfs_cache_replacement_policy) {
-		v = __sysfs_match_string(cache_replacement_policies, -1, buf);
+		v = sysfs_match_string(cache_replacement_policies, buf);
 		if (v < 0)
 			return v;
 
@@ -113,8 +113,6 @@ do { \
 
 #define heap_full(h) ((h)->used == (h)->size)
 
-#define heap_empty(h) ((h)->used == 0)
-
 #define DECLARE_FIFO(type, name) \
 	struct { \
 		size_t front, back, size, mask; \
@@ -122,6 +122,9 @@ static void __update_writeback_rate(struct cached_dev *dc)
 static bool set_at_max_writeback_rate(struct cache_set *c,
 				      struct cached_dev *dc)
 {
+	/* Don't set max writeback rate if gc is running */
+	if (!c->gc_mark_valid)
+		return false;
 	/*
 	 * Idle_counter is increased everytime when update_writeback_rate() is
 	 * called. If all backing devices attached to the same cache set have
@@ -735,6 +738,10 @@ static int bch_writeback_thread(void *arg)
 		}
 	}
 
+	if (dc->writeback_write_wq) {
+		flush_workqueue(dc->writeback_write_wq);
+		destroy_workqueue(dc->writeback_write_wq);
+	}
 	cached_dev_put(dc);
 	wait_for_kthread_stop();
 
@@ -830,6 +837,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
 					      "bcache_writeback");
 	if (IS_ERR(dc->writeback_thread)) {
 		cached_dev_put(dc);
+		destroy_workqueue(dc->writeback_write_wq);
 		return PTR_ERR(dc->writeback_thread);
 	}
 	dc->writeback_running = true;
 
@@ -1790,6 +1790,8 @@ void md_bitmap_destroy(struct mddev *mddev)
 		return;
 
 	md_bitmap_wait_behind_writes(mddev);
+	mempool_destroy(mddev->wb_info_pool);
+	mddev->wb_info_pool = NULL;
 
 	mutex_lock(&mddev->bitmap_info.mutex);
 	spin_lock(&mddev->lock);
@@ -1900,10 +1902,14 @@ int md_bitmap_load(struct mddev *mddev)
 	sector_t start = 0;
 	sector_t sector = 0;
 	struct bitmap *bitmap = mddev->bitmap;
+	struct md_rdev *rdev;
 
 	if (!bitmap)
 		goto out;
 
+	rdev_for_each(rdev, mddev)
+		mddev_create_wb_pool(mddev, rdev, true);
+
 	if (mddev_is_clustered(mddev))
 		md_cluster_ops->load_bitmaps(mddev, mddev->bitmap_info.nodes);
 
@@ -2462,12 +2468,26 @@ static ssize_t
 backlog_store(struct mddev *mddev, const char *buf, size_t len)
 {
 	unsigned long backlog;
+	unsigned long old_mwb = mddev->bitmap_info.max_write_behind;
 	int rv = kstrtoul(buf, 10, &backlog);
 	if (rv)
 		return rv;
 	if (backlog > COUNTER_MAX)
 		return -EINVAL;
 	mddev->bitmap_info.max_write_behind = backlog;
+	if (!backlog && mddev->wb_info_pool) {
+		/* wb_info_pool is not needed if backlog is zero */
+		mempool_destroy(mddev->wb_info_pool);
+		mddev->wb_info_pool = NULL;
+	} else if (backlog && !mddev->wb_info_pool) {
+		/* wb_info_pool is needed since backlog is not zero */
+		struct md_rdev *rdev;
+
+		rdev_for_each(rdev, mddev)
+			mddev_create_wb_pool(mddev, rdev, false);
+	}
+	if (old_mwb != backlog)
+		md_bitmap_update_sb(mddev->bitmap);
 	return len;
 }
 
 drivers/md/md.c | 129
@@ -37,6 +37,7 @@
 
 */
 
+#include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/kthread.h>
 #include <linux/blkdev.h>
@@ -124,6 +125,77 @@ static inline int speed_max(struct mddev *mddev)
 			mddev->sync_speed_max : sysctl_speed_limit_max;
 }
 
+static int rdev_init_wb(struct md_rdev *rdev)
+{
+	if (rdev->bdev->bd_queue->nr_hw_queues == 1)
+		return 0;
+
+	spin_lock_init(&rdev->wb_list_lock);
+	INIT_LIST_HEAD(&rdev->wb_list);
+	init_waitqueue_head(&rdev->wb_io_wait);
+	set_bit(WBCollisionCheck, &rdev->flags);
+
+	return 1;
+}
+
+/*
+ * Create wb_info_pool if rdev is the first multi-queue device flaged
+ * with writemostly, also write-behind mode is enabled.
+ */
+void mddev_create_wb_pool(struct mddev *mddev, struct md_rdev *rdev,
+			  bool is_suspend)
+{
+	if (mddev->bitmap_info.max_write_behind == 0)
+		return;
+
+	if (!test_bit(WriteMostly, &rdev->flags) || !rdev_init_wb(rdev))
+		return;
+
+	if (mddev->wb_info_pool == NULL) {
+		unsigned int noio_flag;
+
+		if (!is_suspend)
+			mddev_suspend(mddev);
+		noio_flag = memalloc_noio_save();
+		mddev->wb_info_pool = mempool_create_kmalloc_pool(NR_WB_INFOS,
+						sizeof(struct wb_info));
+		memalloc_noio_restore(noio_flag);
+		if (!mddev->wb_info_pool)
+			pr_err("can't alloc memory pool for writemostly\n");
+		if (!is_suspend)
+			mddev_resume(mddev);
+	}
+}
+EXPORT_SYMBOL_GPL(mddev_create_wb_pool);
+
+/*
+ * destroy wb_info_pool if rdev is the last device flaged with WBCollisionCheck.
+ */
+static void mddev_destroy_wb_pool(struct mddev *mddev, struct md_rdev *rdev)
+{
+	if (!test_and_clear_bit(WBCollisionCheck, &rdev->flags))
+		return;
+
+	if (mddev->wb_info_pool) {
+		struct md_rdev *temp;
+		int num = 0;
+
+		/*
+		 * Check if other rdevs need wb_info_pool.
+		 */
+		rdev_for_each(temp, mddev)
+			if (temp != rdev &&
+			    test_bit(WBCollisionCheck, &temp->flags))
+				num++;
+		if (!num) {
+			mddev_suspend(rdev->mddev);
+			mempool_destroy(mddev->wb_info_pool);
+			mddev->wb_info_pool = NULL;
+			mddev_resume(rdev->mddev);
+		}
+	}
+}
+
 static struct ctl_table_header *raid_table_header;
 
 static struct ctl_table raid_table[] = {
@@ -2210,6 +2282,9 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
 	rdev->mddev = mddev;
 	pr_debug("md: bind<%s>\n", b);
 
+	if (mddev->raid_disks)
+		mddev_create_wb_pool(mddev, rdev, false);
+
 	if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b)))
 		goto fail;
 
@@ -2246,6 +2321,7 @@ static void unbind_rdev_from_array(struct md_rdev *rdev)
 	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	pr_debug("md: unbind<%s>\n", bdevname(rdev->bdev,b));
+	mddev_destroy_wb_pool(rdev->mddev, rdev);
 	rdev->mddev = NULL;
 	sysfs_remove_link(&rdev->kobj, "block");
 	sysfs_put(rdev->sysfs_state);
@@ -2758,8 +2834,10 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 		}
 	} else if (cmd_match(buf, "writemostly")) {
 		set_bit(WriteMostly, &rdev->flags);
+		mddev_create_wb_pool(rdev->mddev, rdev, false);
 		err = 0;
 	} else if (cmd_match(buf, "-writemostly")) {
+		mddev_destroy_wb_pool(rdev->mddev, rdev);
 		clear_bit(WriteMostly, &rdev->flags);
 		err = 0;
 	} else if (cmd_match(buf, "blocked")) {
@@ -3356,7 +3434,7 @@ rdev_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 	if (!entry->show)
 		return -EIO;
 	if (!rdev->mddev)
-		return -EBUSY;
+		return -ENODEV;
 	return entry->show(rdev, page);
 }
 
@@ -5588,15 +5666,28 @@ int md_run(struct mddev *mddev)
 			mddev->bitmap = bitmap;
 
 	}
-	if (err) {
-		mddev_detach(mddev);
-		if (mddev->private)
-			pers->free(mddev, mddev->private);
-		mddev->private = NULL;
-		module_put(pers->owner);
-		md_bitmap_destroy(mddev);
-		goto abort;
+	if (err)
+		goto bitmap_abort;
+
+	if (mddev->bitmap_info.max_write_behind > 0) {
+		bool creat_pool = false;
+
+		rdev_for_each(rdev, mddev) {
+			if (test_bit(WriteMostly, &rdev->flags) &&
+			    rdev_init_wb(rdev))
+				creat_pool = true;
+		}
+		if (creat_pool && mddev->wb_info_pool == NULL) {
+			mddev->wb_info_pool =
+				mempool_create_kmalloc_pool(NR_WB_INFOS,
+						    sizeof(struct wb_info));
+			if (!mddev->wb_info_pool) {
+				err = -ENOMEM;
+				goto bitmap_abort;
+			}
+		}
 	}
 
 	if (mddev->queue) {
 		bool nonrot = true;
@@ -5639,8 +5730,7 @@ int md_run(struct mddev *mddev)
 	spin_unlock(&mddev->lock);
 	rdev_for_each(rdev, mddev)
 		if (rdev->raid_disk >= 0)
-			if (sysfs_link_rdev(mddev, rdev))
-				/* failure here is OK */;
+			sysfs_link_rdev(mddev, rdev); /* failure here is OK */
 
 	if (mddev->degraded && !mddev->ro)
 		/* This ensures that recovering status is reported immediately
@@ -5658,6 +5748,13 @@ int md_run(struct mddev *mddev)
 	sysfs_notify(&mddev->kobj, NULL, "degraded");
 	return 0;
 
+bitmap_abort:
+	mddev_detach(mddev);
+	if (mddev->private)
+		pers->free(mddev, mddev->private);
+	mddev->private = NULL;
+	module_put(pers->owner);
+	md_bitmap_destroy(mddev);
 abort:
 	bioset_exit(&mddev->bio_set);
 	bioset_exit(&mddev->sync_set);
@@ -5826,6 +5923,8 @@ static void __md_stop_writes(struct mddev *mddev)
 		mddev->in_sync = 1;
 		md_update_sb(mddev, 1);
 	}
+	mempool_destroy(mddev->wb_info_pool);
+	mddev->wb_info_pool = NULL;
 }
 
 void md_stop_writes(struct mddev *mddev)
@@ -8198,8 +8297,7 @@ void md_do_sync(struct md_thread *thread)
 {
 	struct mddev *mddev = thread->mddev;
 	struct mddev *mddev2;
-	unsigned int currspeed = 0,
-		window;
+	unsigned int currspeed = 0, window;
 	sector_t max_sectors,j, io_sectors, recovery_done;
 	unsigned long mark[SYNC_MARKS];
 	unsigned long update_time;
@@ -8256,7 +8354,7 @@ void md_do_sync(struct md_thread *thread)
 	 * 0 == not engaged in resync at all
 	 * 2 == checking that there is no conflict with another sync
 	 * 1 == like 2, but have yielded to allow conflicting resync to
|
* 1 == like 2, but have yielded to allow conflicting resync to
|
||||||
* commense
|
* commence
|
||||||
* other == active in resync - this many blocks
|
* other == active in resync - this many blocks
|
||||||
*
|
*
|
||||||
* Before starting a resync we must have set curr_resync to
|
* Before starting a resync we must have set curr_resync to
|
||||||
|
@ -8387,7 +8485,7 @@ void md_do_sync(struct md_thread *thread)
|
||||||
/*
|
/*
|
||||||
* Tune reconstruction:
|
* Tune reconstruction:
|
||||||
*/
|
*/
|
||||||
window = 32*(PAGE_SIZE/512);
|
window = 32 * (PAGE_SIZE / 512);
|
||||||
pr_debug("md: using %dk window, over a total of %lluk.\n",
|
pr_debug("md: using %dk window, over a total of %lluk.\n",
|
||||||
window/2, (unsigned long long)max_sectors/2);
|
window/2, (unsigned long long)max_sectors/2);
|
||||||
|
|
||||||
|
@ -9200,7 +9298,6 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
|
||||||
* perform resync with the new activated disk */
|
* perform resync with the new activated disk */
|
||||||
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
|
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
|
||||||
md_wakeup_thread(mddev->thread);
|
md_wakeup_thread(mddev->thread);
|
||||||
|
|
||||||
}
|
}
|
||||||
/* device faulty
|
/* device faulty
|
||||||
* We just want to do the minimum to mark the disk
|
* We just want to do the minimum to mark the disk
|
||||||
|
|
|
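The md_run() hunk above creates the write-behind mempool only when at least one member device is flagged WriteMostly and its per-device write-behind state initializes successfully. That decision can be isolated into a standalone sketch (the struct and field names here are hypothetical stand-ins, not the kernel's `md_rdev`/`rdev_init_wb` API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's per-device state. */
struct rdev {
	bool write_mostly;	/* models test_bit(WriteMostly, &rdev->flags) */
	bool wb_init_ok;	/* models rdev_init_wb(rdev) succeeding */
};

/*
 * Mirrors the rdev_for_each() loop in md_run(): the wb_info mempool is
 * wanted only if some WriteMostly device set up its write-behind state.
 */
static bool need_wb_pool(const struct rdev *rdevs, size_t n)
{
	bool creat_pool = false;
	size_t i;

	for (i = 0; i < n; i++)
		if (rdevs[i].write_mostly && rdevs[i].wb_init_ok)
			creat_pool = true;
	return creat_pool;
}
```

Note that, as in the kernel hunk, the loop visits every device rather than breaking early, since `rdev_init_wb()` has a side effect on each WriteMostly device.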
diff --git a/drivers/md/md.h b/drivers/md/md.h
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -109,6 +109,14 @@ struct md_rdev {
 					 * for reporting to userspace and storing
 					 * in superblock.
 					 */
+
+	/*
+	 * The members for check collision of write behind IOs.
+	 */
+	struct list_head wb_list;
+	spinlock_t wb_list_lock;
+	wait_queue_head_t wb_io_wait;
+
 	struct work_struct del_work;	/* used for delayed sysfs removal */
 
 	struct kernfs_node *sysfs_state; /* handle for 'state'
@@ -193,6 +201,10 @@ enum flag_bits {
 				 * it didn't fail, so don't use FailFast
 				 * any more for metadata
 				 */
+	WBCollisionCheck,	/*
+				 * multiqueue device should check if there
+				 * is collision between write behind bios.
+				 */
 };
 
 static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
@@ -245,6 +257,14 @@ enum mddev_sb_flags {
 	MD_SB_NEED_REWRITE, /* metadata write needs to be repeated */
 };
 
+#define NR_WB_INFOS	8
+/* record current range of write behind IOs */
+struct wb_info {
+	sector_t lo;
+	sector_t hi;
+	struct list_head list;
+};
+
 struct mddev {
 	void *private;
 	struct md_personality *pers;
@@ -461,6 +481,7 @@ struct mddev {
 				 */
 	struct work_struct flush_work;
 	struct work_struct event_work;	/* used by dm to report failure event */
+	mempool_t *wb_info_pool;
 	void (*sync_super)(struct mddev *mddev, struct md_rdev *rdev);
 	struct md_cluster_info *cluster_info;
 	unsigned int good_device_nr;	/* good device num within cluster raid */
@@ -709,6 +730,8 @@ extern struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
 extern void md_kick_rdev_from_array(struct md_rdev * rdev);
+extern void mddev_create_wb_pool(struct mddev *mddev, struct md_rdev *rdev,
+				 bool is_suspend);
 struct md_rdev *md_find_rdev_nr_rcu(struct mddev *mddev, int nr);
 struct md_rdev *md_find_rdev_rcu(struct mddev *mddev, dev_t dev);
 
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -3,12 +3,42 @@
 #define RESYNC_BLOCK_SIZE (64*1024)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 
+/*
+ * Number of guaranteed raid bios in case of extreme VM load:
+ */
+#define NR_RAID_BIOS 256
+
+/* when we get a read error on a read-only array, we redirect to another
+ * device without failing the first device, or trying to over-write to
+ * correct the read error. To keep track of bad blocks on a per-bio
+ * level, we store IO_BLOCKED in the appropriate 'bios' pointer
+ */
+#define IO_BLOCKED ((struct bio *)1)
+/* When we successfully write to a known bad-block, we need to remove the
+ * bad-block marking which must be done from process context. So we record
+ * the success by setting devs[n].bio to IO_MADE_GOOD
+ */
+#define IO_MADE_GOOD ((struct bio *)2)
+
+#define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
+
+/* When there are this many requests queue to be written by
+ * the raid thread, we become 'congested' to provide back-pressure
+ * for writeback.
+ */
+static int max_queued_requests = 1024;
+
 /* for managing resync I/O pages */
 struct resync_pages {
 	void *raid_bio;
 	struct page *pages[RESYNC_PAGES];
 };
 
+static void rbio_pool_free(void *rbio, void *data)
+{
+	kfree(rbio);
+}
+
 static inline int resync_alloc_pages(struct resync_pages *rp,
 				     gfp_t gfp_flags)
 {
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -42,31 +42,6 @@
 	 (1L << MD_HAS_PPL) |		\
 	 (1L << MD_HAS_MULTIPLE_PPLS))
 
-/*
- * Number of guaranteed r1bios in case of extreme VM load:
- */
-#define NR_RAID1_BIOS 256
-
-/* when we get a read error on a read-only array, we redirect to another
- * device without failing the first device, or trying to over-write to
- * correct the read error. To keep track of bad blocks on a per-bio
- * level, we store IO_BLOCKED in the appropriate 'bios' pointer
- */
-#define IO_BLOCKED ((struct bio *)1)
-/* When we successfully write to a known bad-block, we need to remove the
- * bad-block marking which must be done from process context. So we record
- * the success by setting devs[n].bio to IO_MADE_GOOD
- */
-#define IO_MADE_GOOD ((struct bio *)2)
-
-#define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
-
-/* When there are this many requests queue to be written by
- * the raid1 thread, we become 'congested' to provide back-pressure
- * for writeback.
- */
-static int max_queued_requests = 1024;
-
 static void allow_barrier(struct r1conf *conf, sector_t sector_nr);
 static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
 
@@ -75,6 +50,57 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
 
 #include "raid1-10.c"
 
+static int check_and_add_wb(struct md_rdev *rdev, sector_t lo, sector_t hi)
+{
+	struct wb_info *wi, *temp_wi;
+	unsigned long flags;
+	int ret = 0;
+	struct mddev *mddev = rdev->mddev;
+
+	wi = mempool_alloc(mddev->wb_info_pool, GFP_NOIO);
+
+	spin_lock_irqsave(&rdev->wb_list_lock, flags);
+	list_for_each_entry(temp_wi, &rdev->wb_list, list) {
+		/* collision happened */
+		if (hi > temp_wi->lo && lo < temp_wi->hi) {
+			ret = -EBUSY;
+			break;
+		}
+	}
+
+	if (!ret) {
+		wi->lo = lo;
+		wi->hi = hi;
+		list_add(&wi->list, &rdev->wb_list);
+	} else
+		mempool_free(wi, mddev->wb_info_pool);
+	spin_unlock_irqrestore(&rdev->wb_list_lock, flags);
+
+	return ret;
+}
+
+static void remove_wb(struct md_rdev *rdev, sector_t lo, sector_t hi)
+{
+	struct wb_info *wi;
+	unsigned long flags;
+	int found = 0;
+	struct mddev *mddev = rdev->mddev;
+
+	spin_lock_irqsave(&rdev->wb_list_lock, flags);
+	list_for_each_entry(wi, &rdev->wb_list, list)
+		if (hi == wi->hi && lo == wi->lo) {
+			list_del(&wi->list);
+			mempool_free(wi, mddev->wb_info_pool);
+			found = 1;
+			break;
+		}
+
+	if (!found)
+		WARN(1, "The write behind IO is not recorded\n");
+	spin_unlock_irqrestore(&rdev->wb_list_lock, flags);
+	wake_up(&rdev->wb_io_wait);
+}
+
 /*
  * for resync bio, r1bio pointer can be retrieved from the per-bio
  * 'struct resync_pages'.
@@ -93,11 +119,6 @@ static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
 	return kzalloc(size, gfp_flags);
 }
 
-static void r1bio_pool_free(void *r1_bio, void *data)
-{
-	kfree(r1_bio);
-}
-
 #define RESYNC_DEPTH 32
 #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 #define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
@@ -173,7 +194,7 @@ out_free_bio:
 	kfree(rps);
 
 out_free_r1bio:
-	r1bio_pool_free(r1_bio, data);
+	rbio_pool_free(r1_bio, data);
 	return NULL;
 }
 
@@ -193,7 +214,7 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
 	/* resync pages array stored in the 1st bio's .bi_private */
 	kfree(rp);
 
-	r1bio_pool_free(r1bio, data);
+	rbio_pool_free(r1bio, data);
 }
 
 static void put_all_bios(struct r1conf *conf, struct r1bio *r1_bio)
@@ -476,6 +497,12 @@ static void raid1_end_write_request(struct bio *bio)
 	}
 
 	if (behind) {
+		if (test_bit(WBCollisionCheck, &rdev->flags)) {
+			sector_t lo = r1_bio->sector;
+			sector_t hi = r1_bio->sector + r1_bio->sectors;
+
+			remove_wb(rdev, lo, hi);
+		}
 		if (test_bit(WriteMostly, &rdev->flags))
 			atomic_dec(&r1_bio->behind_remaining);
 
@@ -1449,7 +1476,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		if (!r1_bio->bios[i])
 			continue;
 
-
 		if (first_clone) {
 			/* do behind I/O ?
 			 * Not if there are too many, or cannot
@@ -1474,7 +1500,16 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 			mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
 
 		if (r1_bio->behind_master_bio) {
-			if (test_bit(WriteMostly, &conf->mirrors[i].rdev->flags))
+			struct md_rdev *rdev = conf->mirrors[i].rdev;
+
+			if (test_bit(WBCollisionCheck, &rdev->flags)) {
+				sector_t lo = r1_bio->sector;
+				sector_t hi = r1_bio->sector + r1_bio->sectors;
+
+				wait_event(rdev->wb_io_wait,
+					   check_and_add_wb(rdev, lo, hi) == 0);
+			}
+			if (test_bit(WriteMostly, &rdev->flags))
 				atomic_inc(&r1_bio->behind_remaining);
 		}
 
@@ -1729,9 +1764,8 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		first = last = rdev->saved_raid_disk;
 
 	for (mirror = first; mirror <= last; mirror++) {
-		p = conf->mirrors+mirror;
+		p = conf->mirrors + mirror;
 		if (!p->rdev) {
-
 			if (mddev->gendisk)
 				disk_stack_limits(mddev->gendisk, rdev->bdev,
 						  rdev->data_offset << 9);
@@ -2888,7 +2922,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 			if (read_targets == 1)
 				bio->bi_opf &= ~MD_FAILFAST;
 			generic_make_request(bio);
-
 		}
 	return nr_sectors;
 }
@@ -2947,8 +2980,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	if (!conf->poolinfo)
 		goto abort;
 	conf->poolinfo->raid_disks = mddev->raid_disks * 2;
-	err = mempool_init(&conf->r1bio_pool, NR_RAID1_BIOS, r1bio_pool_alloc,
-			   r1bio_pool_free, conf->poolinfo);
+	err = mempool_init(&conf->r1bio_pool, NR_RAID_BIOS, r1bio_pool_alloc,
+			   rbio_pool_free, conf->poolinfo);
 	if (err)
 		goto abort;
 
@@ -3089,7 +3122,7 @@ static int raid1_run(struct mddev *mddev)
 	}
 
 	mddev->degraded = 0;
-	for (i=0; i < conf->raid_disks; i++)
+	for (i = 0; i < conf->raid_disks; i++)
 		if (conf->mirrors[i].rdev == NULL ||
 		    !test_bit(In_sync, &conf->mirrors[i].rdev->flags) ||
 		    test_bit(Faulty, &conf->mirrors[i].rdev->flags))
@@ -3232,8 +3265,8 @@ static int raid1_reshape(struct mddev *mddev)
 	newpoolinfo->mddev = mddev;
 	newpoolinfo->raid_disks = raid_disks * 2;
 
-	ret = mempool_init(&newpool, NR_RAID1_BIOS, r1bio_pool_alloc,
-			   r1bio_pool_free, newpoolinfo);
+	ret = mempool_init(&newpool, NR_RAID_BIOS, r1bio_pool_alloc,
+			   rbio_pool_free, newpoolinfo);
 	if (ret) {
 		kfree(newpoolinfo);
 		return ret;
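The check_and_add_wb() hunk above treats two write-behind ranges as colliding when they overlap as half-open sector intervals [lo, hi). The overlap predicate it relies on can be isolated and exercised on its own (plain C sketch; the kernel's list walking, mempool, and locking are deliberately omitted):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's sector_t. */
typedef unsigned long long sector_t;

/*
 * Same test as the "collision happened" branch in check_and_add_wb():
 * two half-open ranges [lo, hi) and [wi_lo, wi_hi) overlap exactly when
 * each range starts strictly before the other one ends.
 */
static bool wb_collides(sector_t lo, sector_t hi,
			sector_t wi_lo, sector_t wi_hi)
{
	return hi > wi_lo && lo < wi_hi;
}
```

Because the comparison is strict, back-to-back ranges such as [0, 8) and [8, 16) do not collide, so adjacent write-behind I/Os are never serialized against each other.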
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -64,31 +64,6 @@
  *    [B A] [D C]    [B A] [E C D]
  */
 
-/*
- * Number of guaranteed r10bios in case of extreme VM load:
- */
-#define NR_RAID10_BIOS 256
-
-/* when we get a read error on a read-only array, we redirect to another
- * device without failing the first device, or trying to over-write to
- * correct the read error. To keep track of bad blocks on a per-bio
- * level, we store IO_BLOCKED in the appropriate 'bios' pointer
- */
-#define IO_BLOCKED ((struct bio *)1)
-/* When we successfully write to a known bad-block, we need to remove the
- * bad-block marking which must be done from process context. So we record
- * the success by setting devs[n].bio to IO_MADE_GOOD
- */
-#define IO_MADE_GOOD ((struct bio *)2)
-
-#define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
-
-/* When there are this many requests queued to be written by
- * the raid10 thread, we become 'congested' to provide back-pressure
- * for writeback.
- */
-static int max_queued_requests = 1024;
-
 static void allow_barrier(struct r10conf *conf);
 static void lower_barrier(struct r10conf *conf);
 static int _enough(struct r10conf *conf, int previous, int ignore);
@@ -123,11 +98,6 @@ static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
 	return kzalloc(size, gfp_flags);
 }
 
-static void r10bio_pool_free(void *r10_bio, void *data)
-{
-	kfree(r10_bio);
-}
-
 #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 /* amount of memory to reserve for resync requests */
 #define RESYNC_WINDOW (1024*1024)
@@ -233,7 +203,7 @@ out_free_bio:
 	}
 	kfree(rps);
 out_free_r10bio:
-	r10bio_pool_free(r10_bio, conf);
+	rbio_pool_free(r10_bio, conf);
 	return NULL;
 }
 
@@ -261,7 +231,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
 	/* resync pages array stored in the 1st bio's .bi_private */
 	kfree(rp);
 
-	r10bio_pool_free(r10bio, conf);
+	rbio_pool_free(r10bio, conf);
 }
 
 static void put_all_bios(struct r10conf *conf, struct r10bio *r10_bio)
@@ -737,15 +707,19 @@ static struct md_rdev *read_balance(struct r10conf *conf,
 	int sectors = r10_bio->sectors;
 	int best_good_sectors;
 	sector_t new_distance, best_dist;
-	struct md_rdev *best_rdev, *rdev = NULL;
+	struct md_rdev *best_dist_rdev, *best_pending_rdev, *rdev = NULL;
 	int do_balance;
-	int best_slot;
+	int best_dist_slot, best_pending_slot;
+	bool has_nonrot_disk = false;
+	unsigned int min_pending;
 	struct geom *geo = &conf->geo;
 
 	raid10_find_phys(conf, r10_bio);
 	rcu_read_lock();
-	best_slot = -1;
-	best_rdev = NULL;
+	best_dist_slot = -1;
+	min_pending = UINT_MAX;
+	best_dist_rdev = NULL;
+	best_pending_rdev = NULL;
 	best_dist = MaxSector;
 	best_good_sectors = 0;
 	do_balance = 1;
@@ -767,6 +741,8 @@ static struct md_rdev *read_balance(struct r10conf *conf,
 		sector_t first_bad;
 		int bad_sectors;
 		sector_t dev_sector;
+		unsigned int pending;
+		bool nonrot;
 
 		if (r10_bio->devs[slot].bio == IO_BLOCKED)
 			continue;
@@ -803,8 +779,8 @@ static struct md_rdev *read_balance(struct r10conf *conf,
 					first_bad - dev_sector;
 				if (good_sectors > best_good_sectors) {
 					best_good_sectors = good_sectors;
-					best_slot = slot;
-					best_rdev = rdev;
+					best_dist_slot = slot;
+					best_dist_rdev = rdev;
 				}
 				if (!do_balance)
 					/* Must read from here */
@@ -817,14 +793,23 @@ static struct md_rdev *read_balance(struct r10conf *conf,
 		if (!do_balance)
 			break;
 
-		if (best_slot >= 0)
+		nonrot = blk_queue_nonrot(bdev_get_queue(rdev->bdev));
+		has_nonrot_disk |= nonrot;
+		pending = atomic_read(&rdev->nr_pending);
+		if (min_pending > pending && nonrot) {
+			min_pending = pending;
+			best_pending_slot = slot;
+			best_pending_rdev = rdev;
+		}
+
+		if (best_dist_slot >= 0)
 			/* At least 2 disks to choose from so failfast is OK */
 			set_bit(R10BIO_FailFast, &r10_bio->state);
 		/* This optimisation is debatable, and completely destroys
 		 * sequential read speed for 'far copies' arrays. So only
 		 * keep it for 'near' arrays, and review those later.
 		 */
-		if (geo->near_copies > 1 && !atomic_read(&rdev->nr_pending))
+		if (geo->near_copies > 1 && !pending)
 			new_distance = 0;
 
 		/* for far > 1 always use the lowest address */
@@ -833,15 +818,21 @@ static struct md_rdev *read_balance(struct r10conf *conf,
 		else
 			new_distance = abs(r10_bio->devs[slot].addr -
 					   conf->mirrors[disk].head_position);
+
 		if (new_distance < best_dist) {
 			best_dist = new_distance;
-			best_slot = slot;
-			best_rdev = rdev;
+			best_dist_slot = slot;
+			best_dist_rdev = rdev;
 		}
 	}
 	if (slot >= conf->copies) {
-		slot = best_slot;
-		rdev = best_rdev;
+		if (has_nonrot_disk) {
+			slot = best_pending_slot;
+			rdev = best_pending_rdev;
+		} else {
+			slot = best_dist_slot;
+			rdev = best_dist_rdev;
+		}
 	}
 
 	if (slot >= 0) {
@@ -3675,8 +3666,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 
 	conf->geo = geo;
 	conf->copies = copies;
-	err = mempool_init(&conf->r10bio_pool, NR_RAID10_BIOS, r10bio_pool_alloc,
-			   r10bio_pool_free, conf);
+	err = mempool_init(&conf->r10bio_pool, NR_RAID_BIOS, r10bio_pool_alloc,
+			   rbio_pool_free, conf);
 	if (err)
 		goto out;
 
@@ -4780,8 +4771,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
 	int idx = 0;
 	struct page **pages;
 
-	r10b = kmalloc(sizeof(*r10b) +
-	       sizeof(struct r10dev) * conf->copies, GFP_NOIO);
+	r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO);
 	if (!r10b) {
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 		return -ENOMEM;
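The reworked read_balance() above tracks two candidates per request: the slot with the fewest in-flight I/Os among non-rotational devices, and the slot with the shortest seek distance. If any SSD is present it takes the idlest SSD, otherwise it falls back to the nearest head position. A condensed sketch of that final choice (invented `struct slot`, not the kernel's r10bio/rdev types):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-slot summary of what read_balance() inspects. */
struct slot {
	bool nonrot;		/* blk_queue_nonrot() result */
	unsigned int pending;	/* atomic_read(&rdev->nr_pending) */
	unsigned long distance;	/* |addr - head_position| */
};

/* Returns the chosen slot index, mirroring the has_nonrot_disk branch. */
static int choose_slot(const struct slot *s, size_t n)
{
	bool has_nonrot = false;
	unsigned int min_pending = UINT_MAX;
	unsigned long best_dist = ULONG_MAX;
	int best_pending_slot = -1, best_dist_slot = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		has_nonrot |= s[i].nonrot;
		/* idlest non-rotational device wins among SSDs */
		if (s[i].nonrot && s[i].pending < min_pending) {
			min_pending = s[i].pending;
			best_pending_slot = (int)i;
		}
		/* classic shortest-seek candidate for spinning disks */
		if (s[i].distance < best_dist) {
			best_dist = s[i].distance;
			best_dist_slot = (int)i;
		}
	}
	return has_nonrot ? best_pending_slot : best_dist_slot;
}
```

The design point is the same as the commit's: seek distance is meaningless on SSDs, so queue depth is the better load signal there, while rotational arrays keep the old nearest-head heuristic.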
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5251,7 +5251,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 		rcu_read_unlock();
 		raid_bio->bi_next = (void*)rdev;
 		bio_set_dev(align_bi, rdev->bdev);
-		bio_clear_flag(align_bi, BIO_SEG_VALID);
 
 		if (is_badblock(rdev, align_bi->bi_iter.bi_sector,
 				bio_sectors(align_bi),
@@ -7672,7 +7671,7 @@ abort:
 static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 {
 	struct r5conf *conf = mddev->private;
-	int err = -EEXIST;
+	int ret, err = -EEXIST;
 	int disk;
 	struct disk_info *p;
 	int first = 0;
@@ -7687,7 +7686,14 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		 * The array is in readonly mode if journal is missing, so no
 		 * write requests running. We should be safe
 		 */
-		log_init(conf, rdev, false);
+		ret = log_init(conf, rdev, false);
+		if (ret)
+			return ret;
+
+		ret = r5l_start(conf->log);
+		if (ret)
+			return ret;
+
 		return 0;
 	}
 	if (mddev->recovery_disabled == conf->recovery_disabled)
@ -1113,15 +1113,15 @@ static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
|
||||||
return id;
|
return id;
|
||||||
}
|
}
|
||||||
|
|
||||||
static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
|
static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned int fid,
|
||||||
void *buffer, size_t buflen, u32 *result)
|
unsigned int dword11, void *buffer, size_t buflen, u32 *result)
|
||||||
{
|
{
|
||||||
struct nvme_command c;
|
struct nvme_command c;
|
||||||
union nvme_result res;
|
union nvme_result res;
|
||||||
int ret;
|
int ret;
|
||||||
|
|
||||||
memset(&c, 0, sizeof(c));
|
memset(&c, 0, sizeof(c));
|
||||||
c.features.opcode = nvme_admin_set_features;
|
c.features.opcode = op;
|
||||||
c.features.fid = cpu_to_le32(fid);
|
c.features.fid = cpu_to_le32(fid);
|
||||||
c.features.dword11 = cpu_to_le32(dword11);
|
c.features.dword11 = cpu_to_le32(dword11);
|
||||||
|
|
||||||
|
@ -1132,6 +1132,24 @@ static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
|
||||||
|
unsigned int dword11, void *buffer, size_t buflen,
|
||||||
|
u32 *result)
|
||||||
|
{
|
||||||
|
return nvme_features(dev, nvme_admin_set_features, fid, dword11, buffer,
|
||||||
|
buflen, result);
|
||||||
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(nvme_set_features);
|
||||||
|
|
||||||
|
int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid,
|
||||||
|
unsigned int dword11, void *buffer, size_t buflen,
|
||||||
|
u32 *result)
|
||||||
|
{
|
||||||
|
return nvme_features(dev, nvme_admin_get_features, fid, dword11, buffer,
|
||||||
|
buflen, result);
|
||||||
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(nvme_get_features);
|
||||||
|
|
||||||
int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
|
int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
|
||||||
{
|
{
|
||||||
u32 q_count = (*count - 1) | ((*count - 1) << 16);
|
u32 q_count = (*count - 1) | ((*count - 1) << 16);
|
||||||
|
@ -3318,7 +3336,7 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
|
||||||
device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups);
|
device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups);
|
||||||
|
|
||||||
nvme_mpath_add_disk(ns, id);
|
nvme_mpath_add_disk(ns, id);
|
||||||
nvme_fault_inject_init(ns);
|
nvme_fault_inject_init(&ns->fault_inject, ns->disk->disk_name);
|
||||||
kfree(id);
|
kfree(id);
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
|
@@ -3343,7 +3361,15 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 	if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
 		return;
 
-	nvme_fault_inject_fini(ns);
+	nvme_fault_inject_fini(&ns->fault_inject);
 
+	mutex_lock(&ns->ctrl->subsys->lock);
+	list_del_rcu(&ns->siblings);
+	mutex_unlock(&ns->ctrl->subsys->lock);
+	synchronize_rcu(); /* guarantee not available in head->list */
+	nvme_mpath_clear_current_path(ns);
+	synchronize_srcu(&ns->head->srcu); /* wait for concurrent submissions */
+
 	if (ns->disk && ns->disk->flags & GENHD_FL_UP) {
 		del_gendisk(ns->disk);
 		blk_cleanup_queue(ns->queue);
@@ -3351,16 +3377,10 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 		blk_integrity_unregister(ns->disk);
 	}
 
-	mutex_lock(&ns->ctrl->subsys->lock);
-	list_del_rcu(&ns->siblings);
-	nvme_mpath_clear_current_path(ns);
-	mutex_unlock(&ns->ctrl->subsys->lock);
-
 	down_write(&ns->ctrl->namespaces_rwsem);
 	list_del_init(&ns->list);
 	up_write(&ns->ctrl->namespaces_rwsem);
 
-	synchronize_srcu(&ns->head->srcu);
 	nvme_mpath_check_last_path(ns);
 	nvme_put_ns(ns);
 }
@@ -3702,6 +3722,7 @@ EXPORT_SYMBOL_GPL(nvme_start_ctrl);
 
 void nvme_uninit_ctrl(struct nvme_ctrl *ctrl)
 {
+	nvme_fault_inject_fini(&ctrl->fault_inject);
 	dev_pm_qos_hide_latency_tolerance(ctrl->device);
 	cdev_device_del(&ctrl->cdev, ctrl->device);
 }
@@ -3797,6 +3818,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	dev_pm_qos_update_user_latency_tolerance(ctrl->device,
 		min(default_ps_max_latency_us, (unsigned long)S32_MAX));
 
+	nvme_fault_inject_init(&ctrl->fault_inject, dev_name(ctrl->device));
+
 	return 0;
 out_free_name:
 	kfree_const(ctrl->device->kobj.name);
@@ -578,7 +578,7 @@ bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 	switch (ctrl->state) {
 	case NVME_CTRL_NEW:
 	case NVME_CTRL_CONNECTING:
-		if (req->cmd->common.opcode == nvme_fabrics_command &&
+		if (nvme_is_fabrics(req->cmd) &&
 		    req->cmd->fabrics.fctype == nvme_fabrics_type_connect)
 			return true;
 		break;
@@ -15,11 +15,10 @@ static DECLARE_FAULT_ATTR(fail_default_attr);
 static char *fail_request;
 module_param(fail_request, charp, 0000);
 
-void nvme_fault_inject_init(struct nvme_ns *ns)
+void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
+			    const char *dev_name)
 {
 	struct dentry *dir, *parent;
-	char *name = ns->disk->disk_name;
-	struct nvme_fault_inject *fault_inj = &ns->fault_inject;
 	struct fault_attr *attr = &fault_inj->attr;
 
 	/* set default fault injection attribute */
@@ -27,20 +26,20 @@ void nvme_fault_inject_init(struct nvme_ns *ns)
 	setup_fault_attr(&fail_default_attr, fail_request);
 
 	/* create debugfs directory and attribute */
-	parent = debugfs_create_dir(name, NULL);
+	parent = debugfs_create_dir(dev_name, NULL);
 	if (!parent) {
-		pr_warn("%s: failed to create debugfs directory\n", name);
+		pr_warn("%s: failed to create debugfs directory\n", dev_name);
 		return;
 	}
 
 	*attr = fail_default_attr;
 	dir = fault_create_debugfs_attr("fault_inject", parent, attr);
 	if (IS_ERR(dir)) {
-		pr_warn("%s: failed to create debugfs attr\n", name);
+		pr_warn("%s: failed to create debugfs attr\n", dev_name);
 		debugfs_remove_recursive(parent);
 		return;
 	}
-	ns->fault_inject.parent = parent;
+	fault_inj->parent = parent;
 
 	/* create debugfs for status code and dont_retry */
 	fault_inj->status = NVME_SC_INVALID_OPCODE;
@@ -49,29 +48,33 @@ void nvme_fault_inject_init(struct nvme_ns *ns)
 	debugfs_create_bool("dont_retry", 0600, dir, &fault_inj->dont_retry);
 }
 
-void nvme_fault_inject_fini(struct nvme_ns *ns)
+void nvme_fault_inject_fini(struct nvme_fault_inject *fault_inject)
 {
 	/* remove debugfs directories */
-	debugfs_remove_recursive(ns->fault_inject.parent);
+	debugfs_remove_recursive(fault_inject->parent);
 }
 
 void nvme_should_fail(struct request *req)
 {
 	struct gendisk *disk = req->rq_disk;
-	struct nvme_ns *ns = NULL;
+	struct nvme_fault_inject *fault_inject = NULL;
 	u16 status;
 
-	/*
-	 * make sure this request is coming from a valid namespace
-	 */
-	if (!disk)
-		return;
+	if (disk) {
+		struct nvme_ns *ns = disk->private_data;
 
-	ns = disk->private_data;
-	if (ns && should_fail(&ns->fault_inject.attr, 1)) {
+		if (ns)
+			fault_inject = &ns->fault_inject;
+		else
+			WARN_ONCE(1, "No namespace found for request\n");
+	} else {
+		fault_inject = &nvme_req(req)->ctrl->fault_inject;
+	}
+
+	if (fault_inject && should_fail(&fault_inject->attr, 1)) {
 		/* inject status code and DNR bit */
-		status = ns->fault_inject.status;
-		if (ns->fault_inject.dont_retry)
+		status = fault_inject->status;
+		if (fault_inject->dont_retry)
 			status |= NVME_SC_DNR;
 		nvme_req(req)->status = status;
 	}
@@ -2607,6 +2607,12 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	if (nvme_fc_ctlr_active_on_rport(ctrl))
 		return -ENOTUNIQ;
 
+	dev_info(ctrl->ctrl.device,
+		"NVME-FC{%d}: create association : host wwpn 0x%016llx "
+		" rport wwpn 0x%016llx: NQN \"%s\"\n",
+		ctrl->cnum, ctrl->lport->localport.port_name,
+		ctrl->rport->remoteport.port_name, ctrl->ctrl.opts->subsysnqn);
+
 	/*
 	 * Create the admin queue
 	 */
@@ -660,7 +660,7 @@ static struct request *nvme_nvm_alloc_request(struct request_queue *q,
 	rq->cmd_flags &= ~REQ_FAILFAST_DRIVER;
 
 	if (rqd->bio)
-		blk_init_request_from_bio(rq, rqd->bio);
+		blk_rq_append_bio(rq, &rqd->bio);
 	else
 		rq->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
@@ -146,6 +146,15 @@ enum nvme_ctrl_state {
 	NVME_CTRL_DEAD,
 };
 
+struct nvme_fault_inject {
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+	struct fault_attr attr;
+	struct dentry *parent;
+	bool dont_retry;	/* DNR, do not retry */
+	u16 status;		/* status code */
+#endif
+};
+
 struct nvme_ctrl {
 	bool comp_seen;
 	enum nvme_ctrl_state state;
@@ -247,6 +256,8 @@ struct nvme_ctrl {
 
 	struct page *discard_page;
 	unsigned long discard_page_busy;
+
+	struct nvme_fault_inject fault_inject;
 };
 
 enum nvme_iopolicy {
@@ -313,15 +324,6 @@ struct nvme_ns_head {
 #endif
 };
 
-#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
-struct nvme_fault_inject {
-	struct fault_attr attr;
-	struct dentry *parent;
-	bool dont_retry;	/* DNR, do not retry */
-	u16 status;		/* status code */
-};
-#endif
-
 struct nvme_ns {
 	struct list_head list;
 
@@ -349,9 +351,7 @@ struct nvme_ns {
 #define NVME_NS_ANA_PENDING	2
 	u16 noiob;
 
-#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
 	struct nvme_fault_inject fault_inject;
-#endif
 
 };
@@ -372,12 +372,18 @@ struct nvme_ctrl_ops {
 };
 
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
-void nvme_fault_inject_init(struct nvme_ns *ns);
-void nvme_fault_inject_fini(struct nvme_ns *ns);
+void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
+			    const char *dev_name);
+void nvme_fault_inject_fini(struct nvme_fault_inject *fault_inject);
 void nvme_should_fail(struct request *req);
 #else
-static inline void nvme_fault_inject_init(struct nvme_ns *ns) {}
-static inline void nvme_fault_inject_fini(struct nvme_ns *ns) {}
+static inline void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
+					  const char *dev_name)
+{
+}
+static inline void nvme_fault_inject_fini(struct nvme_fault_inject *fault_inj)
+{
+}
 static inline void nvme_should_fail(struct request *req) {}
 #endif
@@ -459,6 +465,12 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		union nvme_result *result, void *buffer, unsigned bufflen,
 		unsigned timeout, int qid, int at_head,
 		blk_mq_req_flags_t flags, bool poll);
+int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
+		unsigned int dword11, void *buffer, size_t buflen,
+		u32 *result);
+int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid,
+		unsigned int dword11, void *buffer, size_t buflen,
+		u32 *result);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
@@ -18,6 +18,7 @@
 #include <linux/mutex.h>
 #include <linux/once.h>
 #include <linux/pci.h>
+#include <linux/suspend.h>
 #include <linux/t10-pi.h>
 #include <linux/types.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
|
||||||
module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
|
module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
|
||||||
MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
|
MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
|
||||||
|
|
||||||
static int queue_count_set(const char *val, const struct kernel_param *kp);
|
|
||||||
static const struct kernel_param_ops queue_count_ops = {
|
|
||||||
.set = queue_count_set,
|
|
||||||
.get = param_get_int,
|
|
||||||
};
|
|
||||||
|
|
||||||
static int write_queues;
|
static int write_queues;
|
||||||
module_param_cb(write_queues, &queue_count_ops, &write_queues, 0644);
|
module_param(write_queues, int, 0644);
|
||||||
MODULE_PARM_DESC(write_queues,
|
MODULE_PARM_DESC(write_queues,
|
||||||
"Number of queues to use for writes. If not set, reads and writes "
|
"Number of queues to use for writes. If not set, reads and writes "
|
||||||
"will share a queue set.");
|
"will share a queue set.");
|
||||||
|
|
||||||
static int poll_queues = 0;
|
static int poll_queues;
|
||||||
module_param_cb(poll_queues, &queue_count_ops, &poll_queues, 0644);
|
module_param(poll_queues, int, 0644);
|
||||||
MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO.");
|
MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO.");
|
||||||
|
|
||||||
struct nvme_dev;
|
struct nvme_dev;
|
||||||
|
@@ -116,6 +111,7 @@ struct nvme_dev {
 	u32 cmbsz;
 	u32 cmbloc;
 	struct nvme_ctrl ctrl;
+	u32 last_ps;
 
 	mempool_t *iod_mempool;
 
@@ -144,19 +140,6 @@ static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
 	return param_set_int(val, kp);
 }
 
-static int queue_count_set(const char *val, const struct kernel_param *kp)
-{
-	int n, ret;
-
-	ret = kstrtoint(val, 10, &n);
-	if (ret)
-		return ret;
-	if (n > num_possible_cpus())
-		n = num_possible_cpus();
-
-	return param_set_int(val, kp);
-}
-
 static inline unsigned int sq_idx(unsigned int qid, u32 stride)
 {
 	return qid * 2 * stride;
@@ -2068,6 +2051,7 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
 		.priv		= dev,
 	};
 	unsigned int irq_queues, this_p_queues;
+	unsigned int nr_cpus = num_possible_cpus();
 
 	/*
 	 * Poll queues don't need interrupts, but we need at least one IO
@@ -2078,6 +2062,9 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
 		this_p_queues = nr_io_queues - 1;
 		irq_queues = 1;
 	} else {
-		irq_queues = nr_io_queues - this_p_queues + 1;
+		if (nr_cpus < nr_io_queues - this_p_queues)
+			irq_queues = nr_cpus + 1;
+		else
+			irq_queues = nr_io_queues - this_p_queues + 1;
 	}
 	dev->io_queues[HCTX_TYPE_POLL] = this_p_queues;
@@ -2464,10 +2451,8 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 	kfree(dev);
 }
 
-static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
+static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
 {
-	dev_warn(dev->ctrl.device, "Removing after probe failure status: %d\n", status);
-
 	nvme_get_ctrl(&dev->ctrl);
 	nvme_dev_disable(dev, false);
 	nvme_kill_queues(&dev->ctrl);
@@ -2480,11 +2465,13 @@ static void nvme_reset_work(struct work_struct *work)
 	struct nvme_dev *dev =
 		container_of(work, struct nvme_dev, ctrl.reset_work);
 	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
-	int result = -ENODEV;
+	int result;
 	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
 
-	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
+	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING)) {
+		result = -ENODEV;
 		goto out;
+	}
 
 	/*
 	 * If we're called to reset a live controller first shut it down before
@@ -2528,6 +2515,7 @@ static void nvme_reset_work(struct work_struct *work)
 	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING)) {
 		dev_warn(dev->ctrl.device,
 			"failed to mark controller CONNECTING\n");
+		result = -EBUSY;
 		goto out;
 	}
 
@@ -2588,6 +2576,7 @@ static void nvme_reset_work(struct work_struct *work)
 	if (!nvme_change_ctrl_state(&dev->ctrl, new_state)) {
 		dev_warn(dev->ctrl.device,
 			"failed to mark controller state %d\n", new_state);
+		result = -ENODEV;
 		goto out;
 	}
 
@@ -2597,7 +2586,10 @@ static void nvme_reset_work(struct work_struct *work)
 out_unlock:
 	mutex_unlock(&dev->shutdown_lock);
 out:
-	nvme_remove_dead_ctrl(dev, result);
+	if (result)
+		dev_warn(dev->ctrl.device,
+			 "Removing after probe failure status: %d\n", result);
+	nvme_remove_dead_ctrl(dev);
 }
 
 static void nvme_remove_dead_ctrl_work(struct work_struct *work)
@@ -2835,16 +2827,94 @@ static void nvme_remove(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_PM_SLEEP
+static int nvme_get_power_state(struct nvme_ctrl *ctrl, u32 *ps)
+{
+	return nvme_get_features(ctrl, NVME_FEAT_POWER_MGMT, 0, NULL, 0, ps);
+}
+
+static int nvme_set_power_state(struct nvme_ctrl *ctrl, u32 ps)
+{
+	return nvme_set_features(ctrl, NVME_FEAT_POWER_MGMT, ps, NULL, 0, NULL);
+}
+
+static int nvme_resume(struct device *dev)
+{
+	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
+	struct nvme_ctrl *ctrl = &ndev->ctrl;
+
+	if (pm_resume_via_firmware() || !ctrl->npss ||
+	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
+		nvme_reset_ctrl(ctrl);
+	return 0;
+}
+
 static int nvme_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
+	struct nvme_ctrl *ctrl = &ndev->ctrl;
+	int ret = -EBUSY;
+
+	/*
+	 * The platform does not remove power for a kernel managed suspend so
+	 * use host managed nvme power settings for lowest idle power if
+	 * possible. This should have quicker resume latency than a full device
+	 * shutdown. But if the firmware is involved after the suspend or the
+	 * device does not support any non-default power states, shut down the
+	 * device fully.
+	 */
+	if (pm_suspend_via_firmware() || !ctrl->npss) {
+		nvme_dev_disable(ndev, true);
+		return 0;
+	}
+
+	nvme_start_freeze(ctrl);
+	nvme_wait_freeze(ctrl);
+	nvme_sync_queues(ctrl);
+
+	if (ctrl->state != NVME_CTRL_LIVE &&
+	    ctrl->state != NVME_CTRL_ADMIN_ONLY)
+		goto unfreeze;
+
+	ndev->last_ps = 0;
+	ret = nvme_get_power_state(ctrl, &ndev->last_ps);
+	if (ret < 0)
+		goto unfreeze;
+
+	ret = nvme_set_power_state(ctrl, ctrl->npss);
+	if (ret < 0)
+		goto unfreeze;
+
+	if (ret) {
+		/*
+		 * Clearing npss forces a controller reset on resume. The
+		 * correct value will be rediscovered then.
+		 */
+		nvme_dev_disable(ndev, true);
+		ctrl->npss = 0;
+		ret = 0;
+		goto unfreeze;
+	}
+
+	/*
+	 * A saved state prevents pci pm from generically controlling the
+	 * device's power. If we're using protocol specific settings, we don't
+	 * want pci interfering.
+	 */
+	pci_save_state(pdev);
+unfreeze:
+	nvme_unfreeze(ctrl);
+	return ret;
+}
+
+static int nvme_simple_suspend(struct device *dev)
+{
+	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
 
 	nvme_dev_disable(ndev, true);
 	return 0;
 }
 
-static int nvme_resume(struct device *dev)
+static int nvme_simple_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
@@ -2852,9 +2922,16 @@ static int nvme_resume(struct device *dev)
 	nvme_reset_ctrl(&ndev->ctrl);
 	return 0;
 }
-#endif
 
-static SIMPLE_DEV_PM_OPS(nvme_dev_pm_ops, nvme_suspend, nvme_resume);
+const struct dev_pm_ops nvme_dev_pm_ops = {
+	.suspend	= nvme_suspend,
+	.resume		= nvme_resume,
+	.freeze		= nvme_simple_suspend,
+	.thaw		= nvme_simple_resume,
+	.poweroff	= nvme_simple_suspend,
+	.restore	= nvme_simple_resume,
+};
+#endif /* CONFIG_PM_SLEEP */
 
 static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 		pci_channel_state_t state)
@@ -2959,9 +3036,11 @@ static struct pci_driver nvme_driver = {
 	.probe		= nvme_probe,
 	.remove		= nvme_remove,
 	.shutdown	= nvme_shutdown,
+#ifdef CONFIG_PM_SLEEP
 	.driver		= {
 		.pm	= &nvme_dev_pm_ops,
 	},
+#endif
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
 };
@@ -135,6 +135,69 @@ const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p,
 	}
 }
 
+static const char *nvme_trace_fabrics_property_set(struct trace_seq *p, u8 *spc)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u8 attrib = spc[0];
+	u32 ofst = get_unaligned_le32(spc + 4);
+	u64 value = get_unaligned_le64(spc + 8);
+
+	trace_seq_printf(p, "attrib=%u, ofst=0x%x, value=0x%llx",
+			 attrib, ofst, value);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+static const char *nvme_trace_fabrics_connect(struct trace_seq *p, u8 *spc)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u16 recfmt = get_unaligned_le16(spc);
+	u16 qid = get_unaligned_le16(spc + 2);
+	u16 sqsize = get_unaligned_le16(spc + 4);
+	u8 cattr = spc[6];
+	u32 kato = get_unaligned_le32(spc + 8);
+
+	trace_seq_printf(p, "recfmt=%u, qid=%u, sqsize=%u, cattr=%u, kato=%u",
+			 recfmt, qid, sqsize, cattr, kato);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+static const char *nvme_trace_fabrics_property_get(struct trace_seq *p, u8 *spc)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u8 attrib = spc[0];
+	u32 ofst = get_unaligned_le32(spc + 4);
+
+	trace_seq_printf(p, "attrib=%u, ofst=0x%x", attrib, ofst);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+static const char *nvme_trace_fabrics_common(struct trace_seq *p, u8 *spc)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	trace_seq_printf(p, "specific=%*ph", 24, spc);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+const char *nvme_trace_parse_fabrics_cmd(struct trace_seq *p,
+		u8 fctype, u8 *spc)
+{
+	switch (fctype) {
+	case nvme_fabrics_type_property_set:
+		return nvme_trace_fabrics_property_set(p, spc);
+	case nvme_fabrics_type_connect:
+		return nvme_trace_fabrics_connect(p, spc);
+	case nvme_fabrics_type_property_get:
+		return nvme_trace_fabrics_property_get(p, spc);
+	default:
+		return nvme_trace_fabrics_common(p, spc);
+	}
+}
+
 const char *nvme_trace_disk_name(struct trace_seq *p, char *name)
 {
 	const char *ret = trace_seq_buffer_ptr(p);
@@ -145,6 +208,5 @@ const char *nvme_trace_disk_name(struct trace_seq *p, char *name)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(nvme_trace_disk_name);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(nvme_sq);
@@ -16,59 +16,19 @@
 
 #include "nvme.h"
 
-#define nvme_admin_opcode_name(opcode)	{ opcode, #opcode }
-#define show_admin_opcode_name(val)					\
-	__print_symbolic(val,						\
-		nvme_admin_opcode_name(nvme_admin_delete_sq),		\
-		nvme_admin_opcode_name(nvme_admin_create_sq),		\
-		nvme_admin_opcode_name(nvme_admin_get_log_page),	\
-		nvme_admin_opcode_name(nvme_admin_delete_cq),		\
-		nvme_admin_opcode_name(nvme_admin_create_cq),		\
-		nvme_admin_opcode_name(nvme_admin_identify),		\
-		nvme_admin_opcode_name(nvme_admin_abort_cmd),		\
-		nvme_admin_opcode_name(nvme_admin_set_features),	\
-		nvme_admin_opcode_name(nvme_admin_get_features),	\
-		nvme_admin_opcode_name(nvme_admin_async_event),		\
-		nvme_admin_opcode_name(nvme_admin_ns_mgmt),		\
-		nvme_admin_opcode_name(nvme_admin_activate_fw),		\
-		nvme_admin_opcode_name(nvme_admin_download_fw),		\
-		nvme_admin_opcode_name(nvme_admin_ns_attach),		\
-		nvme_admin_opcode_name(nvme_admin_keep_alive),		\
-		nvme_admin_opcode_name(nvme_admin_directive_send),	\
-		nvme_admin_opcode_name(nvme_admin_directive_recv),	\
-		nvme_admin_opcode_name(nvme_admin_dbbuf),		\
-		nvme_admin_opcode_name(nvme_admin_format_nvm),		\
-		nvme_admin_opcode_name(nvme_admin_security_send),	\
-		nvme_admin_opcode_name(nvme_admin_security_recv),	\
-		nvme_admin_opcode_name(nvme_admin_sanitize_nvm))
-
-#define nvme_opcode_name(opcode)	{ opcode, #opcode }
-#define show_nvm_opcode_name(val)				\
-	__print_symbolic(val,					\
-		nvme_opcode_name(nvme_cmd_flush),		\
-		nvme_opcode_name(nvme_cmd_write),		\
-		nvme_opcode_name(nvme_cmd_read),		\
-		nvme_opcode_name(nvme_cmd_write_uncor),		\
-		nvme_opcode_name(nvme_cmd_compare),		\
-		nvme_opcode_name(nvme_cmd_write_zeroes),	\
-		nvme_opcode_name(nvme_cmd_dsm),			\
-		nvme_opcode_name(nvme_cmd_resv_register),	\
-		nvme_opcode_name(nvme_cmd_resv_report),		\
-		nvme_opcode_name(nvme_cmd_resv_acquire),	\
-		nvme_opcode_name(nvme_cmd_resv_release))
-
-#define show_opcode_name(qid, opcode)					\
-	(qid ? show_nvm_opcode_name(opcode) : show_admin_opcode_name(opcode))
-
 const char *nvme_trace_parse_admin_cmd(struct trace_seq *p, u8 opcode,
 		u8 *cdw10);
 const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p, u8 opcode,
 		u8 *cdw10);
+const char *nvme_trace_parse_fabrics_cmd(struct trace_seq *p, u8 fctype,
+		u8 *spc);
 
-#define parse_nvme_cmd(qid, opcode, cdw10) \
-	(qid ? \
+#define parse_nvme_cmd(qid, opcode, fctype, cdw10)		\
+	((opcode) == nvme_fabrics_command ?			\
+	 nvme_trace_parse_fabrics_cmd(p, fctype, cdw10) :	\
+	((qid) ?						\
 	 nvme_trace_parse_nvm_cmd(p, opcode, cdw10) :		\
-	 nvme_trace_parse_admin_cmd(p, opcode, cdw10))
+	 nvme_trace_parse_admin_cmd(p, opcode, cdw10)))
 
 const char *nvme_trace_disk_name(struct trace_seq *p, char *name);
 #define __print_disk_name(name) \
@@ -93,6 +53,7 @@ TRACE_EVENT(nvme_setup_cmd,
 		__field(int, qid)
 		__field(u8, opcode)
 		__field(u8, flags)
+		__field(u8, fctype)
 		__field(u16, cid)
 		__field(u32, nsid)
 		__field(u64, metadata)
@@ -106,6 +67,7 @@ TRACE_EVENT(nvme_setup_cmd,
 		__entry->cid = cmd->common.command_id;
 		__entry->nsid = le32_to_cpu(cmd->common.nsid);
 		__entry->metadata = le64_to_cpu(cmd->common.metadata);
+		__entry->fctype = cmd->fabrics.fctype;
 		__assign_disk_name(__entry->disk, req->rq_disk);
 		memcpy(__entry->cdw10, &cmd->common.cdw10,
 		       sizeof(__entry->cdw10));
@ -114,8 +76,10 @@ TRACE_EVENT(nvme_setup_cmd,
|
||||||
__entry->ctrl_id, __print_disk_name(__entry->disk),
|
__entry->ctrl_id, __print_disk_name(__entry->disk),
|
||||||
__entry->qid, __entry->cid, __entry->nsid,
|
__entry->qid, __entry->cid, __entry->nsid,
|
||||||
__entry->flags, __entry->metadata,
|
__entry->flags, __entry->metadata,
|
||||||
show_opcode_name(__entry->qid, __entry->opcode),
|
show_opcode_name(__entry->qid, __entry->opcode,
|
||||||
parse_nvme_cmd(__entry->qid, __entry->opcode, __entry->cdw10))
|
__entry->fctype),
|
||||||
|
parse_nvme_cmd(__entry->qid, __entry->opcode,
|
||||||
|
__entry->fctype, __entry->cdw10))
|
||||||
);
|
);
|
||||||
|
|
||||||
TRACE_EVENT(nvme_complete_rq,
|
TRACE_EVENT(nvme_complete_rq,
|
||||||
|
@ -141,7 +105,7 @@ TRACE_EVENT(nvme_complete_rq,
|
||||||
__entry->status = nvme_req(req)->status;
|
__entry->status = nvme_req(req)->status;
|
||||||
__assign_disk_name(__entry->disk, req->rq_disk);
|
__assign_disk_name(__entry->disk, req->rq_disk);
|
||||||
),
|
),
|
||||||
TP_printk("nvme%d: %sqid=%d, cmdid=%u, res=%llu, retries=%u, flags=0x%x, status=%u",
|
TP_printk("nvme%d: %sqid=%d, cmdid=%u, res=%#llx, retries=%u, flags=0x%x, status=%#x",
|
||||||
__entry->ctrl_id, __print_disk_name(__entry->disk),
|
__entry->ctrl_id, __print_disk_name(__entry->disk),
|
||||||
__entry->qid, __entry->cid, __entry->result,
|
__entry->qid, __entry->cid, __entry->result,
|
||||||
__entry->retries, __entry->flags, __entry->status)
|
__entry->retries, __entry->flags, __entry->status)
|
||||||
|
|
|
@@ -1,5 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
+ccflags-y				+= -I$(src)
+
 obj-$(CONFIG_NVME_TARGET)		+= nvmet.o
 obj-$(CONFIG_NVME_TARGET_LOOP)		+= nvme-loop.o
 obj-$(CONFIG_NVME_TARGET_RDMA)		+= nvmet-rdma.o
@@ -14,3 +16,4 @@ nvmet-rdma-y	+= rdma.o
 nvmet-fc-y	+= fc.o
 nvme-fcloop-y	+= fcloop.o
 nvmet-tcp-y	+= tcp.o
+nvmet-$(CONFIG_TRACING)	+= trace.o
@@ -10,6 +10,9 @@
 #include <linux/pci-p2pdma.h>
 #include <linux/scatterlist.h>
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 #include "nvmet.h"
 
 struct workqueue_struct *buffered_io_wq;
@@ -311,6 +314,7 @@ int nvmet_enable_port(struct nvmet_port *port)
		port->inline_data_size = 0;
 
	port->enabled = true;
+	port->tr_ops = ops;
	return 0;
 }
 
@@ -321,6 +325,7 @@ void nvmet_disable_port(struct nvmet_port *port)
	lockdep_assert_held(&nvmet_config_sem);
 
	port->enabled = false;
+	port->tr_ops = NULL;
 
	ops = nvmet_transports[port->disc_addr.trtype];
	ops->remove_port(port);
@@ -689,6 +694,9 @@ static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
 
	if (unlikely(status))
		nvmet_set_error(req, status);
+
+	trace_nvmet_req_complete(req);
+
	if (req->ns)
		nvmet_put_namespace(req->ns);
	req->ops->queue_response(req);
@@ -848,6 +856,8 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
	req->error_loc = NVMET_NO_ERROR_LOC;
	req->error_slba = 0;
 
+	trace_nvmet_req_init(req, req->cmd);
+
	/* no support for fused commands yet */
	if (unlikely(flags & (NVME_CMD_FUSE_FIRST | NVME_CMD_FUSE_SECOND))) {
		req->error_loc = offsetof(struct nvme_common_command, flags);
@@ -871,7 +881,7 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
		status = nvmet_parse_connect_cmd(req);
	else if (likely(req->sq->qid != 0))
		status = nvmet_parse_io_cmd(req);
-	else if (req->cmd->common.opcode == nvme_fabrics_command)
+	else if (nvme_is_fabrics(req->cmd))
		status = nvmet_parse_fabrics_cmd(req);
	else if (req->sq->ctrl->subsys->type == NVME_NQN_DISC)
		status = nvmet_parse_discovery_cmd(req);
@@ -41,6 +41,10 @@ void nvmet_port_disc_changed(struct nvmet_port *port,
			__nvmet_disc_changed(port, ctrl);
	}
	mutex_unlock(&nvmet_disc_subsys->lock);
+
+	/* If transport can signal change, notify transport */
+	if (port->tr_ops && port->tr_ops->discovery_chg)
+		port->tr_ops->discovery_chg(port);
 }
 
 static void __nvmet_subsys_disc_changed(struct nvmet_port *port,
@@ -268,7 +268,7 @@ u16 nvmet_parse_connect_cmd(struct nvmet_req *req)
 {
	struct nvme_command *cmd = req->cmd;
 
-	if (cmd->common.opcode != nvme_fabrics_command) {
+	if (!nvme_is_fabrics(cmd)) {
		pr_err("invalid command 0x%x on unconnected queue.\n",
			cmd->fabrics.opcode);
		req->error_loc = offsetof(struct nvme_common_command, opcode);
@@ -1806,7 +1806,7 @@ nvmet_fc_prep_fcp_rsp(struct nvmet_fc_tgtport *tgtport,
	 */
	rspcnt = atomic_inc_return(&fod->queue->zrspcnt);
	if (!(rspcnt % fod->queue->ersp_ratio) ||
-	    sqe->opcode == nvme_fabrics_command ||
+	    nvme_is_fabrics((struct nvme_command *) sqe) ||
	    xfr_length != fod->req.transfer_len ||
	    (le16_to_cpu(cqe->status) & 0xFFFE) || cqewd[0] || cqewd[1] ||
	    (sqe->flags & (NVME_CMD_FUSE_FIRST | NVME_CMD_FUSE_SECOND)) ||
@@ -2549,6 +2549,16 @@ nvmet_fc_remove_port(struct nvmet_port *port)
		kfree(pe);
 }
 
+static void
+nvmet_fc_discovery_chg(struct nvmet_port *port)
+{
+	struct nvmet_fc_port_entry *pe = port->priv;
+	struct nvmet_fc_tgtport *tgtport = pe->tgtport;
+
+	if (tgtport && tgtport->ops->discovery_event)
+		tgtport->ops->discovery_event(&tgtport->fc_target_port);
+}
+
 static const struct nvmet_fabrics_ops nvmet_fc_tgt_fcp_ops = {
	.owner			= THIS_MODULE,
	.type			= NVMF_TRTYPE_FC,
@@ -2557,6 +2567,7 @@ static const struct nvmet_fabrics_ops nvmet_fc_tgt_fcp_ops = {
	.remove_port		= nvmet_fc_remove_port,
	.queue_response		= nvmet_fc_fcp_nvme_cmd_done,
	.delete_ctrl		= nvmet_fc_delete_ctrl,
+	.discovery_chg		= nvmet_fc_discovery_chg,
 };
 
 static int __init nvmet_fc_init_module(void)
@@ -231,6 +231,11 @@ struct fcloop_lsreq {
	int			status;
 };
 
+struct fcloop_rscn {
+	struct fcloop_tport		*tport;
+	struct work_struct		work;
+};
+
 enum {
	INI_IO_START		= 0,
	INI_IO_ACTIVE		= 1,
@@ -348,6 +353,37 @@ fcloop_xmt_ls_rsp(struct nvmet_fc_target_port *tport,
	return 0;
 }
 
+/*
+ * Simulate reception of RSCN and converting it to a initiator transport
+ * call to rescan a remote port.
+ */
+static void
+fcloop_tgt_rscn_work(struct work_struct *work)
+{
+	struct fcloop_rscn *tgt_rscn =
+		container_of(work, struct fcloop_rscn, work);
+	struct fcloop_tport *tport = tgt_rscn->tport;
+
+	if (tport->remoteport)
+		nvme_fc_rescan_remoteport(tport->remoteport);
+	kfree(tgt_rscn);
+}
+
+static void
+fcloop_tgt_discovery_evt(struct nvmet_fc_target_port *tgtport)
+{
+	struct fcloop_rscn *tgt_rscn;
+
+	tgt_rscn = kzalloc(sizeof(*tgt_rscn), GFP_KERNEL);
+	if (!tgt_rscn)
+		return;
+
+	tgt_rscn->tport = tgtport->private;
+	INIT_WORK(&tgt_rscn->work, fcloop_tgt_rscn_work);
+
+	schedule_work(&tgt_rscn->work);
+}
+
 static void
 fcloop_tfcp_req_free(struct kref *ref)
 {
@@ -839,6 +875,7 @@ static struct nvmet_fc_target_template tgttemplate = {
	.fcp_op			= fcloop_fcp_op,
	.fcp_abort		= fcloop_tgt_fcp_abort,
	.fcp_req_release	= fcloop_fcp_req_release,
+	.discovery_event	= fcloop_tgt_discovery_evt,
	.max_hw_queues		= FCLOOP_HW_QUEUES,
	.max_sgl_segments	= FCLOOP_SGL_SEGS,
	.max_dif_sgl_segments	= FCLOOP_SGL_SEGS,
@@ -140,6 +140,7 @@ struct nvmet_port {
	void				*priv;
	bool				enabled;
	int				inline_data_size;
+	const struct nvmet_fabrics_ops	*tr_ops;
 };
 
 static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@@ -277,6 +278,7 @@ struct nvmet_fabrics_ops {
	void (*disc_traddr)(struct nvmet_req *req,
			struct nvmet_port *port, char *traddr);
	u16 (*install_queue)(struct nvmet_sq *nvme_sq);
+	void (*discovery_chg)(struct nvmet_port *port);
 };
 
 #define NVMET_MAX_INLINE_BIOVEC	8
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVM Express target device driver tracepoints
+ * Copyright (c) 2018 Johannes Thumshirn, SUSE Linux GmbH
+ */
+
+#include <asm/unaligned.h>
+#include "trace.h"
+
+static const char *nvmet_trace_admin_identify(struct trace_seq *p, u8 *cdw10)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u8 cns = cdw10[0];
+	u16 ctrlid = get_unaligned_le16(cdw10 + 2);
+
+	trace_seq_printf(p, "cns=%u, ctrlid=%u", cns, ctrlid);
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
+static const char *nvmet_trace_admin_get_features(struct trace_seq *p,
+		u8 *cdw10)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u8 fid = cdw10[0];
+	u8 sel = cdw10[1] & 0x7;
+	u32 cdw11 = get_unaligned_le32(cdw10 + 4);
+
+	trace_seq_printf(p, "fid=0x%x sel=0x%x cdw11=0x%x", fid, sel, cdw11);
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
+static const char *nvmet_trace_read_write(struct trace_seq *p, u8 *cdw10)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u64 slba = get_unaligned_le64(cdw10);
+	u16 length = get_unaligned_le16(cdw10 + 8);
+	u16 control = get_unaligned_le16(cdw10 + 10);
+	u32 dsmgmt = get_unaligned_le32(cdw10 + 12);
+	u32 reftag = get_unaligned_le32(cdw10 + 16);
+
+	trace_seq_printf(p,
+			 "slba=%llu, len=%u, ctrl=0x%x, dsmgmt=%u, reftag=%u",
+			 slba, length, control, dsmgmt, reftag);
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
+static const char *nvmet_trace_dsm(struct trace_seq *p, u8 *cdw10)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	trace_seq_printf(p, "nr=%u, attributes=%u",
+			 get_unaligned_le32(cdw10),
+			 get_unaligned_le32(cdw10 + 4));
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
+static const char *nvmet_trace_common(struct trace_seq *p, u8 *cdw10)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	trace_seq_printf(p, "cdw10=%*ph", 24, cdw10);
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
+const char *nvmet_trace_parse_admin_cmd(struct trace_seq *p,
+		u8 opcode, u8 *cdw10)
+{
+	switch (opcode) {
+	case nvme_admin_identify:
+		return nvmet_trace_admin_identify(p, cdw10);
+	case nvme_admin_get_features:
+		return nvmet_trace_admin_get_features(p, cdw10);
+	default:
+		return nvmet_trace_common(p, cdw10);
+	}
+}
+
+const char *nvmet_trace_parse_nvm_cmd(struct trace_seq *p,
+		u8 opcode, u8 *cdw10)
+{
+	switch (opcode) {
+	case nvme_cmd_read:
+	case nvme_cmd_write:
+	case nvme_cmd_write_zeroes:
+		return nvmet_trace_read_write(p, cdw10);
+	case nvme_cmd_dsm:
+		return nvmet_trace_dsm(p, cdw10);
+	default:
+		return nvmet_trace_common(p, cdw10);
+	}
+}
+
+static const char *nvmet_trace_fabrics_property_set(struct trace_seq *p,
+		u8 *spc)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u8 attrib = spc[0];
+	u32 ofst = get_unaligned_le32(spc + 4);
+	u64 value = get_unaligned_le64(spc + 8);
+
+	trace_seq_printf(p, "attrib=%u, ofst=0x%x, value=0x%llx",
+			 attrib, ofst, value);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+static const char *nvmet_trace_fabrics_connect(struct trace_seq *p,
+		u8 *spc)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u16 recfmt = get_unaligned_le16(spc);
+	u16 qid = get_unaligned_le16(spc + 2);
+	u16 sqsize = get_unaligned_le16(spc + 4);
+	u8 cattr = spc[6];
+	u32 kato = get_unaligned_le32(spc + 8);
+
+	trace_seq_printf(p, "recfmt=%u, qid=%u, sqsize=%u, cattr=%u, kato=%u",
+			 recfmt, qid, sqsize, cattr, kato);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+static const char *nvmet_trace_fabrics_property_get(struct trace_seq *p,
+		u8 *spc)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	u8 attrib = spc[0];
+	u32 ofst = get_unaligned_le32(spc + 4);
+
+	trace_seq_printf(p, "attrib=%u, ofst=0x%x", attrib, ofst);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+static const char *nvmet_trace_fabrics_common(struct trace_seq *p, u8 *spc)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	trace_seq_printf(p, "spcecific=%*ph", 24, spc);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+const char *nvmet_trace_parse_fabrics_cmd(struct trace_seq *p,
+		u8 fctype, u8 *spc)
+{
+	switch (fctype) {
+	case nvme_fabrics_type_property_set:
+		return nvmet_trace_fabrics_property_set(p, spc);
+	case nvme_fabrics_type_connect:
+		return nvmet_trace_fabrics_connect(p, spc);
+	case nvme_fabrics_type_property_get:
+		return nvmet_trace_fabrics_property_get(p, spc);
+	default:
+		return nvmet_trace_fabrics_common(p, spc);
+	}
+}
+
+const char *nvmet_trace_disk_name(struct trace_seq *p, char *name)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	if (*name)
+		trace_seq_printf(p, "disk=%s, ", name);
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+
+const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	/*
+	 * XXX: We don't know the controller instance before executing the
+	 * connect command itself because the connect command for the admin
+	 * queue will not provide the cntlid which will be allocated in this
+	 * command.  In case of io queues, the controller instance will be
+	 * mapped by the extra data of the connect command.
+	 * If we can know the extra data of the connect command in this stage,
+	 * we can update this print statement later.
+	 */
+	if (ctrl)
+		trace_seq_printf(p, "%d", ctrl->cntlid);
+	else
+		trace_seq_printf(p, "_");
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
@@ -0,0 +1,141 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * NVM Express target device driver tracepoints
+ * Copyright (c) 2018 Johannes Thumshirn, SUSE Linux GmbH
+ *
+ * This is entirely based on drivers/nvme/host/trace.h
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM nvmet
+
+#if !defined(_TRACE_NVMET_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NVMET_H
+
+#include <linux/nvme.h>
+#include <linux/tracepoint.h>
+#include <linux/trace_seq.h>
+
+#include "nvmet.h"
+
+const char *nvmet_trace_parse_admin_cmd(struct trace_seq *p, u8 opcode,
+		u8 *cdw10);
+const char *nvmet_trace_parse_nvm_cmd(struct trace_seq *p, u8 opcode,
+		u8 *cdw10);
+const char *nvmet_trace_parse_fabrics_cmd(struct trace_seq *p, u8 fctype,
+		u8 *spc);
+
+#define parse_nvme_cmd(qid, opcode, fctype, cdw10)		\
+	((opcode) == nvme_fabrics_command ?			\
+	 nvmet_trace_parse_fabrics_cmd(p, fctype, cdw10) :	\
+	(qid ?							\
+	 nvmet_trace_parse_nvm_cmd(p, opcode, cdw10) :		\
+	 nvmet_trace_parse_admin_cmd(p, opcode, cdw10)))
+
+const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl);
+#define __print_ctrl_name(ctrl)				\
+	nvmet_trace_ctrl_name(p, ctrl)
+
+const char *nvmet_trace_disk_name(struct trace_seq *p, char *name);
+#define __print_disk_name(name)				\
+	nvmet_trace_disk_name(p, name)
+
+#ifndef TRACE_HEADER_MULTI_READ
+static inline struct nvmet_ctrl *nvmet_req_to_ctrl(struct nvmet_req *req)
+{
+	return req->sq->ctrl;
+}
+
+static inline void __assign_disk_name(char *name, struct nvmet_req *req,
+		bool init)
+{
+	struct nvmet_ctrl *ctrl = nvmet_req_to_ctrl(req);
+	struct nvmet_ns *ns;
+
+	if ((init && req->sq->qid) || (!init && req->cq->qid)) {
+		ns = nvmet_find_namespace(ctrl, req->cmd->rw.nsid);
+		strncpy(name, ns->device_path, DISK_NAME_LEN);
+		return;
+	}
+
+	memset(name, 0, DISK_NAME_LEN);
+}
+#endif
+
+TRACE_EVENT(nvmet_req_init,
+	TP_PROTO(struct nvmet_req *req, struct nvme_command *cmd),
+	TP_ARGS(req, cmd),
+	TP_STRUCT__entry(
+		__field(struct nvme_command *, cmd)
+		__field(struct nvmet_ctrl *, ctrl)
+		__array(char, disk, DISK_NAME_LEN)
+		__field(int, qid)
+		__field(u16, cid)
+		__field(u8, opcode)
+		__field(u8, fctype)
+		__field(u8, flags)
+		__field(u32, nsid)
+		__field(u64, metadata)
+		__array(u8, cdw10, 24)
+	),
+	TP_fast_assign(
+		__entry->cmd = cmd;
+		__entry->ctrl = nvmet_req_to_ctrl(req);
+		__assign_disk_name(__entry->disk, req, true);
+		__entry->qid = req->sq->qid;
+		__entry->cid = cmd->common.command_id;
+		__entry->opcode = cmd->common.opcode;
+		__entry->fctype = cmd->fabrics.fctype;
+		__entry->flags = cmd->common.flags;
+		__entry->nsid = le32_to_cpu(cmd->common.nsid);
+		__entry->metadata = le64_to_cpu(cmd->common.metadata);
+		memcpy(__entry->cdw10, &cmd->common.cdw10,
+		       sizeof(__entry->cdw10));
+	),
+	TP_printk("nvmet%s: %sqid=%d, cmdid=%u, nsid=%u, flags=%#x, "
+		  "meta=%#llx, cmd=(%s, %s)",
+		__print_ctrl_name(__entry->ctrl),
+		__print_disk_name(__entry->disk),
+		__entry->qid, __entry->cid, __entry->nsid,
+		__entry->flags, __entry->metadata,
+		show_opcode_name(__entry->qid, __entry->opcode,
+				__entry->fctype),
+		parse_nvme_cmd(__entry->qid, __entry->opcode,
+				__entry->fctype, __entry->cdw10))
+);
+
+TRACE_EVENT(nvmet_req_complete,
+	TP_PROTO(struct nvmet_req *req),
+	TP_ARGS(req),
+	TP_STRUCT__entry(
+		__field(struct nvmet_ctrl *, ctrl)
+		__array(char, disk, DISK_NAME_LEN)
+		__field(int, qid)
+		__field(int, cid)
+		__field(u64, result)
+		__field(u16, status)
+	),
+	TP_fast_assign(
+		__entry->ctrl = nvmet_req_to_ctrl(req);
+		__entry->qid = req->cq->qid;
+		__entry->cid = req->cqe->command_id;
+		__entry->result = le64_to_cpu(req->cqe->result.u64);
+		__entry->status = le16_to_cpu(req->cqe->status) >> 1;
+		__assign_disk_name(__entry->disk, req, false);
+	),
+	TP_printk("nvmet%s: %sqid=%d, cmdid=%u, res=%#llx, status=%#x",
+		__print_ctrl_name(__entry->ctrl),
+		__print_disk_name(__entry->disk),
+		__entry->qid, __entry->cid, __entry->result, __entry->status)
+);
+
+#endif /* _TRACE_NVMET_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
@@ -274,6 +274,7 @@ struct lpfc_stats {
	uint32_t elsXmitADISC;
	uint32_t elsXmitLOGO;
	uint32_t elsXmitSCR;
+	uint32_t elsXmitRSCN;
	uint32_t elsXmitRNID;
	uint32_t elsXmitFARP;
	uint32_t elsXmitFARPR;
@@ -819,6 +820,7 @@ struct lpfc_hba {
	uint32_t cfg_use_msi;
	uint32_t cfg_auto_imax;
	uint32_t cfg_fcp_imax;
+	uint32_t cfg_force_rscn;
	uint32_t cfg_cq_poll_threshold;
	uint32_t cfg_cq_max_proc_limit;
	uint32_t cfg_fcp_cpu_map;
@@ -4958,6 +4958,64 @@ static DEVICE_ATTR(lpfc_req_fw_upgrade, S_IRUGO | S_IWUSR,
		   lpfc_request_firmware_upgrade_show,
		   lpfc_request_firmware_upgrade_store);
 
+/**
+ * lpfc_force_rscn_store
+ *
+ * @dev: class device that is converted into a Scsi_host.
+ * @attr: device attribute, not used.
+ * @buf: unused string
+ * @count: unused variable.
+ *
+ * Description:
+ * Force the switch to send a RSCN to all other NPorts in our zone
+ * If we are direct connect pt2pt, build the RSCN command ourself
+ * and send to the other NPort. Not supported for private loop.
+ *
+ * Returns:
+ * 0      - on success
+ * -EIO   - if command is not sent
+ **/
+static ssize_t
+lpfc_force_rscn_store(struct device *dev, struct device_attribute *attr,
+		      const char *buf, size_t count)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct lpfc_vport *vport = (struct lpfc_vport *)shost->hostdata;
+	int i;
+
+	i = lpfc_issue_els_rscn(vport, 0);
+	if (i)
+		return -EIO;
+	return strlen(buf);
+}
+
+/*
+ * lpfc_force_rscn: Force an RSCN to be sent to all remote NPorts
+ * connected to the HBA.
+ *
+ * Value range is any ascii value
+ */
+static int lpfc_force_rscn;
+module_param(lpfc_force_rscn, int, 0644);
+MODULE_PARM_DESC(lpfc_force_rscn,
+		 "Force an RSCN to be sent to all remote NPorts");
+lpfc_param_show(force_rscn)
+
+/**
+ * lpfc_force_rscn_init - Force an RSCN to be sent to all remote NPorts
+ * @phba: lpfc_hba pointer.
+ * @val: unused value.
+ *
+ * Returns:
+ * zero if val saved.
+ **/
+static int
+lpfc_force_rscn_init(struct lpfc_hba *phba, int val)
+{
+	return 0;
+}
+static DEVICE_ATTR_RW(lpfc_force_rscn);
+
 /**
  * lpfc_fcp_imax_store
  *
@@ -5958,6 +6016,7 @@ struct device_attribute *lpfc_hba_attrs[] = {
	&dev_attr_lpfc_nvme_oas,
	&dev_attr_lpfc_nvme_embed_cmd,
	&dev_attr_lpfc_fcp_imax,
+	&dev_attr_lpfc_force_rscn,
	&dev_attr_lpfc_cq_poll_threshold,
	&dev_attr_lpfc_cq_max_proc_limit,
	&dev_attr_lpfc_fcp_cpu_map,
@@ -7005,6 +7064,7 @@ lpfc_get_cfgparam(struct lpfc_hba *phba)
	lpfc_nvme_oas_init(phba, lpfc_nvme_oas);
	lpfc_nvme_embed_cmd_init(phba, lpfc_nvme_embed_cmd);
	lpfc_fcp_imax_init(phba, lpfc_fcp_imax);
+	lpfc_force_rscn_init(phba, lpfc_force_rscn);
	lpfc_cq_poll_threshold_init(phba, lpfc_cq_poll_threshold);
	lpfc_cq_max_proc_limit_init(phba, lpfc_cq_max_proc_limit);
	lpfc_fcp_cpu_map_init(phba, lpfc_fcp_cpu_map);
@@ -141,6 +141,7 @@ int lpfc_issue_els_adisc(struct lpfc_vport *, struct lpfc_nodelist *, uint8_t);
 int lpfc_issue_els_logo(struct lpfc_vport *, struct lpfc_nodelist *, uint8_t);
 int lpfc_issue_els_npiv_logo(struct lpfc_vport *, struct lpfc_nodelist *);
 int lpfc_issue_els_scr(struct lpfc_vport *, uint32_t, uint8_t);
+int lpfc_issue_els_rscn(struct lpfc_vport *vport, uint8_t retry);
 int lpfc_issue_fabric_reglogin(struct lpfc_vport *);
 int lpfc_els_free_iocb(struct lpfc_hba *, struct lpfc_iocbq *);
 int lpfc_ct_free_iocb(struct lpfc_hba *, struct lpfc_iocbq *);
@@ -355,6 +356,7 @@ void lpfc_mbox_timeout_handler(struct lpfc_hba *);
 struct lpfc_nodelist *lpfc_findnode_did(struct lpfc_vport *, uint32_t);
 struct lpfc_nodelist *lpfc_findnode_wwpn(struct lpfc_vport *,
					 struct lpfc_name *);
+struct lpfc_nodelist *lpfc_findnode_mapped(struct lpfc_vport *vport);
 
 int lpfc_sli_issue_mbox_wait(struct lpfc_hba *, LPFC_MBOXQ_t *, uint32_t);
@@ -555,6 +557,8 @@ void lpfc_ras_stop_fwlog(struct lpfc_hba *phba);
 int lpfc_check_fwlog_support(struct lpfc_hba *phba);
 
 /* NVME interfaces. */
+void lpfc_nvme_rescan_port(struct lpfc_vport *vport,
+			   struct lpfc_nodelist *ndlp);
 void lpfc_nvme_unregister_port(struct lpfc_vport *vport,
			struct lpfc_nodelist *ndlp);
 int lpfc_nvme_register_port(struct lpfc_vport *vport,
@@ -30,6 +30,8 @@
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_transport_fc.h>
+#include <uapi/scsi/fc/fc_fs.h>
+#include <uapi/scsi/fc/fc_els.h>
 
 #include "lpfc_hw4.h"
 #include "lpfc_hw.h"
@@ -3078,6 +3080,116 @@ lpfc_issue_els_scr(struct lpfc_vport *vport, uint32_t nportid, uint8_t retry)
	return 0;
 }
 
+/**
+ * lpfc_issue_els_rscn - Issue an RSCN to the Fabric Controller (Fabric)
+ *   or the other nport (pt2pt).
+ * @vport: pointer to a host virtual N_Port data structure.
+ * @retry: number of retries to the command IOCB.
+ *
+ * This routine issues a RSCN to the Fabric Controller (DID 0xFFFFFD)
+ * when connected to a fabric, or to the remote port when connected
+ * in point-to-point mode. When sent to the Fabric Controller, it will
+ * replay the RSCN to registered recipients.
+ *
+ * Note that, in lpfc_prep_els_iocb() routine, the reference count of ndlp
+ * will be incremented by 1 for holding the ndlp and the reference to ndlp
+ * will be stored into the context1 field of the IOCB for the completion
+ * callback function to the RSCN ELS command.
+ *
+ * Return code
+ *   0 - Successfully issued RSCN command
+ *   1 - Failed to issue RSCN command
+ **/
+int
+lpfc_issue_els_rscn(struct lpfc_vport *vport, uint8_t retry)
+{
+	struct lpfc_hba *phba = vport->phba;
+	struct lpfc_iocbq *elsiocb;
+	struct lpfc_nodelist *ndlp;
+	struct {
+		struct fc_els_rscn rscn;
+		struct fc_els_rscn_page portid;
+	} *event;
+	uint32_t nportid;
+	uint16_t cmdsize = sizeof(*event);
+
+	/* Not supported for private loop */
+	if (phba->fc_topology == LPFC_TOPOLOGY_LOOP &&
+	    !(vport->fc_flag & FC_PUBLIC_LOOP))
+		return 1;
+
+	if (vport->fc_flag & FC_PT2PT) {
+		/* find any mapped nport - that would be the other nport */
+		ndlp = lpfc_findnode_mapped(vport);
+		if (!ndlp)
+			return 1;
+	} else {
+		nportid = FC_FID_FCTRL;
+		/* find the fabric controller node */
+		ndlp = lpfc_findnode_did(vport, nportid);
+		if (!ndlp) {
+			/* if one didn't exist, make one */
+			ndlp = lpfc_nlp_init(vport, nportid);
+			if (!ndlp)
+				return 1;
+			lpfc_enqueue_node(vport, ndlp);
+		} else if (!NLP_CHK_NODE_ACT(ndlp)) {
+			ndlp = lpfc_enable_node(vport, ndlp,
+						NLP_STE_UNUSED_NODE);
+			if (!ndlp)
+				return 1;
+		}
+	}
+
+	elsiocb = lpfc_prep_els_iocb(vport, 1, cmdsize, retry, ndlp,
+				     ndlp->nlp_DID, ELS_CMD_RSCN_XMT);
+
+	if (!elsiocb) {
+		/* This will trigger the release of the node just
+		 * allocated
+		 */
+		lpfc_nlp_put(ndlp);
+		return 1;
+	}
+
+	event = ((struct lpfc_dmabuf *)elsiocb->context2)->virt;
+
+	event->rscn.rscn_cmd = ELS_RSCN;
+	event->rscn.rscn_page_len = sizeof(struct fc_els_rscn_page);
+	event->rscn.rscn_plen = cpu_to_be16(cmdsize);
+
+	nportid = vport->fc_myDID;
+	/* appears that page flags must be 0 for fabric to broadcast RSCN */
+	event->portid.rscn_page_flags = 0;
+	event->portid.rscn_fid[0] = (nportid & 0x00FF0000) >> 16;
+	event->portid.rscn_fid[1] = (nportid & 0x0000FF00) >> 8;
+	event->portid.rscn_fid[2] = nportid & 0x000000FF;
+
+	lpfc_debugfs_disc_trc(vport, LPFC_DISC_TRC_ELS_CMD,
+			      "Issue RSCN: did:x%x",
+			      ndlp->nlp_DID, 0, 0);
+
+	phba->fc_stat.elsXmitRSCN++;
+	elsiocb->iocb_cmpl = lpfc_cmpl_els_cmd;
+	if (lpfc_sli_issue_iocb(phba, LPFC_ELS_RING, elsiocb, 0) ==
+	    IOCB_ERROR) {
+		/* The additional lpfc_nlp_put will cause the following
+		 * lpfc_els_free_iocb routine to trigger the release of
+		 * the node.
+		 */
+		lpfc_nlp_put(ndlp);
+		lpfc_els_free_iocb(phba, elsiocb);
+		return 1;
+	}
+	/* This will cause the callback-function lpfc_cmpl_els_cmd to
+	 * trigger the release of node.
+	 */
+	if (!(vport->fc_flag & FC_PT2PT))
+		lpfc_nlp_put(ndlp);
+
+	return 0;
+}
+
 /**
  * lpfc_issue_els_farpr - Issue a farp to an node on a vport
  * @vport: pointer to a host virtual N_Port data structure.
@@ -6214,6 +6326,8 @@ lpfc_rscn_recovery_check(struct lpfc_vport *vport)
			continue;
		}
 
+		if (ndlp->nlp_fc4_type & NLP_FC4_NVME)
+			lpfc_nvme_rescan_port(vport, ndlp);
+
		lpfc_disc_state_machine(vport, ndlp, NULL,
					NLP_EVT_DEVICE_RECOVERY);
@@ -6318,6 +6432,19 @@ lpfc_els_rcv_rscn(struct lpfc_vport *vport, struct lpfc_iocbq *cmdiocb,
		fc_host_post_event(shost, fc_get_event_number(),
				   FCH_EVT_RSCN, lp[i]);
 
+	/* Check if RSCN is coming from a direct-connected remote NPort */
+	if (vport->fc_flag & FC_PT2PT) {
+		/* If so, just ACC it, no other action needed for now */
+		lpfc_printf_vlog(vport, KERN_INFO, LOG_ELS,
+				 "2024 pt2pt RSCN %08x Data: x%x x%x\n",
+				 *lp, vport->fc_flag, payload_len);
+		lpfc_els_rsp_acc(vport, ELS_CMD_ACC, cmdiocb, ndlp, NULL);
+
+		if (ndlp->nlp_fc4_type & NLP_FC4_NVME)
+			lpfc_nvme_rescan_port(vport, ndlp);
+		return 0;
+	}
+
	/* If we are about to begin discovery, just ACC the RSCN.
	 * Discovery processing will satisfy it.
	 */
@@ -5276,6 +5276,41 @@ lpfc_findnode_did(struct lpfc_vport *vport, uint32_t did)
	return ndlp;
 }
 
+struct lpfc_nodelist *
+lpfc_findnode_mapped(struct lpfc_vport *vport)
+{
+	struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
+	struct lpfc_nodelist *ndlp;
+	uint32_t data1;
+	unsigned long iflags;
+
+	spin_lock_irqsave(shost->host_lock, iflags);
+
+	list_for_each_entry(ndlp, &vport->fc_nodes, nlp_listp) {
+		if (ndlp->nlp_state == NLP_STE_UNMAPPED_NODE ||
+		    ndlp->nlp_state == NLP_STE_MAPPED_NODE) {
+			data1 = (((uint32_t)ndlp->nlp_state << 24) |
+				 ((uint32_t)ndlp->nlp_xri << 16) |
+				 ((uint32_t)ndlp->nlp_type << 8) |
+				 ((uint32_t)ndlp->nlp_rpi & 0xff));
+			spin_unlock_irqrestore(shost->host_lock, iflags);
+			lpfc_printf_vlog(vport, KERN_INFO, LOG_NODE,
+					 "2025 FIND node DID "
+					 "Data: x%p x%x x%x x%x %p\n",
+					 ndlp, ndlp->nlp_DID,
+					 ndlp->nlp_flag, data1,
+					 ndlp->active_rrqs_xri_bitmap);
+			return ndlp;
+		}
+	}
+	spin_unlock_irqrestore(shost->host_lock, iflags);
+
+	/* FIND node did <did> NOT FOUND */
+	lpfc_printf_vlog(vport, KERN_INFO, LOG_NODE,
+			 "2026 FIND mapped did NOT FOUND.\n");
+	return NULL;
+}
+
 struct lpfc_nodelist *
 lpfc_setup_disc_node(struct lpfc_vport *vport, uint32_t did)
 {
@@ -601,6 +601,7 @@ struct fc_vft_header {
 #define ELS_CMD_RPL       0x57000000
 #define ELS_CMD_FAN       0x60000000
 #define ELS_CMD_RSCN      0x61040000
+#define ELS_CMD_RSCN_XMT  0x61040008
 #define ELS_CMD_SCR       0x62000000
 #define ELS_CMD_RNID      0x78000000
 #define ELS_CMD_LIRR      0x7A000000
@@ -642,6 +643,7 @@ struct fc_vft_header {
 #define ELS_CMD_RPL       0x57
 #define ELS_CMD_FAN       0x60
 #define ELS_CMD_RSCN      0x0461
+#define ELS_CMD_RSCN_XMT  0x08000461
 #define ELS_CMD_SCR       0x62
 #define ELS_CMD_RNID      0x78
 #define ELS_CMD_LIRR      0x7A
@@ -2402,6 +2402,50 @@ lpfc_nvme_register_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
 #endif
 }
 
+/**
+ * lpfc_nvme_rescan_port - Check to see if we should rescan this remoteport
+ *
+ * If the ndlp represents an NVME Target, that we are logged into,
+ * ping the NVME FC Transport layer to initiate a device rescan
+ * on this remote NPort.
+ */
+void
+lpfc_nvme_rescan_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
+{
+#if (IS_ENABLED(CONFIG_NVME_FC))
+	struct lpfc_nvme_rport *rport;
+	struct nvme_fc_remote_port *remoteport;
+
+	rport = ndlp->nrport;
+
+	lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
+			 "6170 Rescan NPort DID x%06x type x%x "
+			 "state x%x rport %p\n",
+			 ndlp->nlp_DID, ndlp->nlp_type, ndlp->nlp_state, rport);
+	if (!rport)
+		goto input_err;
+	remoteport = rport->remoteport;
+	if (!remoteport)
+		goto input_err;
+
+	/* Only rescan if we are an NVME target in the MAPPED state */
+	if (remoteport->port_role & FC_PORT_ROLE_NVME_DISCOVERY &&
+	    ndlp->nlp_state == NLP_STE_MAPPED_NODE) {
+		nvme_fc_rescan_remoteport(remoteport);
+
+		lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
+				 "6172 NVME rescanned DID x%06x "
+				 "port_state x%x\n",
+				 ndlp->nlp_DID, remoteport->port_state);
+	}
+	return;
+ input_err:
+	lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
+			 "6169 State error: lport %p, rport%p FCID x%06x\n",
+			 vport->localport, ndlp->rport, ndlp->nlp_DID);
+#endif
+}
+
 /* lpfc_nvme_unregister_port - unbind the DID and port_role from this rport.
  *
  * There is no notion of Devloss or rport recovery from the current
@@ -1139,6 +1139,22 @@ lpfc_nvmet_defer_rcv(struct nvmet_fc_target_port *tgtport,
	spin_unlock_irqrestore(&ctxp->ctxlock, iflag);
 }
 
+static void
+lpfc_nvmet_discovery_event(struct nvmet_fc_target_port *tgtport)
+{
+	struct lpfc_nvmet_tgtport *tgtp;
+	struct lpfc_hba *phba;
+	uint32_t rc;
+
+	tgtp = tgtport->private;
+	phba = tgtp->phba;
+
+	rc = lpfc_issue_els_rscn(phba->pport, 0);
+	lpfc_printf_log(phba, KERN_ERR, LOG_NVME,
+			"6420 NVMET subsystem change: Notification %s\n",
+			(rc) ? "Failed" : "Sent");
+}
+
 static struct nvmet_fc_target_template lpfc_tgttemplate = {
	.targetport_delete = lpfc_nvmet_targetport_delete,
	.xmt_ls_rsp     = lpfc_nvmet_xmt_ls_rsp,
@@ -1146,6 +1162,7 @@ static struct nvmet_fc_target_template lpfc_tgttemplate = {
	.fcp_abort      = lpfc_nvmet_xmt_fcp_abort,
	.fcp_req_release = lpfc_nvmet_xmt_fcp_release,
	.defer_rcv	= lpfc_nvmet_defer_rcv,
+	.discovery_event = lpfc_nvmet_discovery_event,
 
	.max_hw_queues  = 1,
	.max_sgl_segments = LPFC_NVMET_DEFAULT_SEGS,
@@ -9398,6 +9398,7 @@ lpfc_sli4_iocb2wqe(struct lpfc_hba *phba, struct lpfc_iocbq *iocbq,
		if (if_type >= LPFC_SLI_INTF_IF_TYPE_2) {
			if (pcmd && (*pcmd == ELS_CMD_FLOGI ||
				*pcmd == ELS_CMD_SCR ||
+				*pcmd == ELS_CMD_RSCN_XMT ||
				*pcmd == ELS_CMD_FDISC ||
				*pcmd == ELS_CMD_LOGO ||
				*pcmd == ELS_CMD_PLOGI)) {
@@ -203,13 +203,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 {
	struct file *file = iocb->ki_filp;
	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
-	struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs, *bvec;
+	struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs;
	loff_t pos = iocb->ki_pos;
	bool should_dirty = false;
	struct bio bio;
	ssize_t ret;
	blk_qc_t qc;
-	struct bvec_iter_all iter_all;
 
	if ((pos | iov_iter_alignment(iter)) &
	    (bdev_logical_block_size(bdev) - 1))
@@ -259,13 +258,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
	}
	__set_current_state(TASK_RUNNING);
 
-	bio_for_each_segment_all(bvec, &bio, iter_all) {
-		if (should_dirty && !PageCompound(bvec->bv_page))
-			set_page_dirty_lock(bvec->bv_page);
-		if (!bio_flagged(&bio, BIO_NO_PAGE_REF))
-			put_page(bvec->bv_page);
-	}
+	bio_release_pages(&bio, should_dirty);
 
	if (unlikely(bio.bi_status))
		ret = blk_status_to_errno(bio.bi_status);
@@ -335,13 +328,7 @@ static void blkdev_bio_end_io(struct bio *bio)
		if (should_dirty) {
			bio_check_pages_dirty(bio);
		} else {
-			if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
-				struct bvec_iter_all iter_all;
-				struct bio_vec *bvec;
-
-				bio_for_each_segment_all(bvec, bio, iter_all)
-					put_page(bvec->bv_page);
-			}
+			bio_release_pages(bio, false);
			bio_put(bio);
		}
	}
@@ -538,8 +538,8 @@ static struct bio *dio_await_one(struct dio *dio)
  */
 static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
 {
-	struct bio_vec *bvec;
	blk_status_t err = bio->bi_status;
+	bool should_dirty = dio->op == REQ_OP_READ && dio->should_dirty;
 
	if (err) {
		if (err == BLK_STS_AGAIN && (bio->bi_opf & REQ_NOWAIT))
@@ -548,19 +548,10 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
			dio->io_error = -EIO;
	}
 
-	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
+	if (dio->is_async && should_dirty) {
		bio_check_pages_dirty(bio);	/* transfers ownership */
	} else {
-		struct bvec_iter_all iter_all;
-
-		bio_for_each_segment_all(bvec, bio, iter_all) {
-			struct page *page = bvec->bv_page;
-
-			if (dio->op == REQ_OP_READ && !PageCompound(page) &&
-			    dio->should_dirty)
-				set_page_dirty_lock(page);
-			put_page(page);
-		}
+		bio_release_pages(bio, should_dirty);
		bio_put(bio);
	}
	return err;
@@ -715,6 +715,7 @@ void wbc_detach_inode(struct writeback_control *wbc)
 void wbc_account_io(struct writeback_control *wbc, struct page *page,
		    size_t bytes)
 {
+	struct cgroup_subsys_state *css;
	int id;
 
	/*
@@ -726,7 +727,12 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
	if (!wbc->wb)
		return;
 
-	id = mem_cgroup_css_from_page(page)->id;
+	css = mem_cgroup_css_from_page(page);
+	/* dead cgroups shouldn't contribute to inode ownership arbitration */
+	if (!(css->flags & CSS_ONLINE))
+		return;
+
+	id = css->id;
+
	if (id == wbc->wb_id) {
		wbc->wb_bytes += bytes;
@@ -998,9 +998,6 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
	iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
	if (offset)
		iov_iter_advance(iter, offset);
-
-	/* don't drop a reference to these pages */
-	iter->type |= ITER_BVEC_FLAG_NO_REF;
	return 0;
 }
fs/iomap.c
@@ -333,7 +333,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
	if (iop)
		atomic_inc(&iop->read_count);
 
-	if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
+	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -1599,13 +1599,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
		if (should_dirty) {
			bio_check_pages_dirty(bio);
		} else {
-			if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
-				struct bvec_iter_all iter_all;
-				struct bio_vec *bvec;
-
-				bio_for_each_segment_all(bvec, bio, iter_all)
-					put_page(bvec->bv_page);
-			}
+			bio_release_pages(bio, false);
			bio_put(bio);
		}
	}
@@ -782,7 +782,7 @@ xfs_add_to_ioend(
		atomic_inc(&iop->write_count);
 
	if (!merged) {
-		if (bio_full(wpc->ioend->io_bio))
+		if (bio_full(wpc->ioend->io_bio, len))
			xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
		bio_add_page(wpc->ioend->io_bio, page, len, poff);
	}
@@ -102,9 +102,23 @@ static inline void *bio_data(struct bio *bio)
	return NULL;
 }
 
-static inline bool bio_full(struct bio *bio)
+/**
+ * bio_full - check if the bio is full
+ * @bio:	bio to check
+ * @len:	length of one segment to be added
+ *
+ * Return true if @bio is full and one segment with @len bytes can't be
+ * added to the bio, otherwise return false
+ */
+static inline bool bio_full(struct bio *bio, unsigned len)
 {
-	return bio->bi_vcnt >= bio->bi_max_vecs;
+	if (bio->bi_vcnt >= bio->bi_max_vecs)
+		return true;
+
+	if (bio->bi_iter.bi_size > UINT_MAX - len)
+		return true;
+
+	return false;
 }
 
 static inline bool bio_next_segment(const struct bio *bio,
@@ -408,7 +422,6 @@ static inline void bio_wouldblock_error(struct bio *bio)
 }
 
 struct request_queue;
-extern int bio_phys_segments(struct request_queue *, struct bio *);
 
 extern int submit_bio_wait(struct bio *bio);
 extern void bio_advance(struct bio *, unsigned);
@@ -427,6 +440,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 void __bio_add_page(struct bio *bio, struct page *page,
		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
+void bio_release_pages(struct bio *bio, bool mark_dirty);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
			struct iov_iter *, gfp_t);
@@ -444,17 +458,6 @@ void generic_end_io_acct(struct request_queue *q, int op,
			struct hd_struct *part,
			unsigned long start_time);
 
-#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
-# error	"You should define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE for your platform"
-#endif
-#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
-extern void bio_flush_dcache_pages(struct bio *bi);
-#else
-static inline void bio_flush_dcache_pages(struct bio *bi)
-{
-}
-#endif
-
 extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
			       struct bio *src, struct bvec_iter *src_iter);
 extern void bio_copy_data(struct bio *dst, struct bio *src);
@@ -63,19 +63,17 @@ struct blkcg {
 
 /*
  * blkg_[rw]stat->aux_cnt is excluded for local stats but included for
- * recursive.  Used to carry stats of dead children, and, for blkg_rwstat,
- * to carry result values from read and sum operations.
+ * recursive.  Used to carry stats of dead children.
  */
-struct blkg_stat {
-	struct percpu_counter		cpu_cnt;
-	atomic64_t			aux_cnt;
-};
-
 struct blkg_rwstat {
	struct percpu_counter		cpu_cnt[BLKG_RWSTAT_NR];
	atomic64_t			aux_cnt[BLKG_RWSTAT_NR];
 };
 
+struct blkg_rwstat_sample {
+	u64				cnt[BLKG_RWSTAT_NR];
+};
+
 /*
  * A blkcg_gq (blkg) is association between a block cgroup (blkcg) and a
  * request_queue (q).  This is used by blkcg policies which need to track
@@ -198,6 +196,13 @@ int blkcg_activate_policy(struct request_queue *q,
 void blkcg_deactivate_policy(struct request_queue *q,
			     const struct blkcg_policy *pol);
 
+static inline u64 blkg_rwstat_read_counter(struct blkg_rwstat *rwstat,
+		unsigned int idx)
+{
+	return atomic64_read(&rwstat->aux_cnt[idx]) +
+		percpu_counter_sum_positive(&rwstat->cpu_cnt[idx]);
+}
+
 const char *blkg_dev_name(struct blkcg_gq *blkg);
 void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg,
		       u64 (*prfill)(struct seq_file *,
@@ -206,8 +211,7 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg,
		       bool show_total);
 u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v);
 u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
-			 const struct blkg_rwstat *rwstat);
-u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd, int off);
+			 const struct blkg_rwstat_sample *rwstat);
 u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
		       int off);
 int blkg_print_stat_bytes(struct seq_file *sf, void *v);
@@ -215,10 +219,8 @@ int blkg_print_stat_ios(struct seq_file *sf, void *v);
 int blkg_print_stat_bytes_recursive(struct seq_file *sf, void *v);
 int blkg_print_stat_ios_recursive(struct seq_file *sf, void *v);
 
-u64 blkg_stat_recursive_sum(struct blkcg_gq *blkg,
-			    struct blkcg_policy *pol, int off);
-struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
-					     struct blkcg_policy *pol, int off);
+void blkg_rwstat_recursive_sum(struct blkcg_gq *blkg, struct blkcg_policy *pol,
+		int off, struct blkg_rwstat_sample *sum);
 
 struct blkg_conf_ctx {
	struct gendisk			*disk;
@@ -569,69 +571,6 @@ static inline void blkg_put(struct blkcg_gq *blkg)
		if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css),	\
					      (p_blkg)->q, false)))
 
-static inline int blkg_stat_init(struct blkg_stat *stat, gfp_t gfp)
-{
-	int ret;
-
-	ret = percpu_counter_init(&stat->cpu_cnt, 0, gfp);
-	if (ret)
-		return ret;
-
-	atomic64_set(&stat->aux_cnt, 0);
-	return 0;
-}
-
-static inline void blkg_stat_exit(struct blkg_stat *stat)
-{
-	percpu_counter_destroy(&stat->cpu_cnt);
-}
-
-/**
- * blkg_stat_add - add a value to a blkg_stat
- * @stat: target blkg_stat
- * @val: value to add
- *
- * Add @val to @stat.  The caller must ensure that IRQ on the same CPU
- * don't re-enter this function for the same counter.
- */
-static inline void blkg_stat_add(struct blkg_stat *stat, uint64_t val)
-{
-	percpu_counter_add_batch(&stat->cpu_cnt, val, BLKG_STAT_CPU_BATCH);
-}
-
-/**
- * blkg_stat_read - read the current value of a blkg_stat
- * @stat: blkg_stat to read
- */
-static inline uint64_t blkg_stat_read(struct blkg_stat *stat)
-{
-	return percpu_counter_sum_positive(&stat->cpu_cnt);
-}
-
-/**
- * blkg_stat_reset - reset a blkg_stat
- * @stat: blkg_stat to reset
- */
-static inline void blkg_stat_reset(struct blkg_stat *stat)
-{
-	percpu_counter_set(&stat->cpu_cnt, 0);
-	atomic64_set(&stat->aux_cnt, 0);
-}
-
-/**
- * blkg_stat_add_aux - add a blkg_stat into another's aux count
- * @to: the destination blkg_stat
- * @from: the source
- *
- * Add @from's count including the aux one to @to's aux count.
- */
-static inline void blkg_stat_add_aux(struct blkg_stat *to,
-				     struct blkg_stat *from)
-{
-	atomic64_add(blkg_stat_read(from) + atomic64_read(&from->aux_cnt),
-		     &to->aux_cnt);
-}
-
 static inline int blkg_rwstat_init(struct blkg_rwstat *rwstat, gfp_t gfp)
 {
	int i, ret;
@@ -693,15 +632,14 @@ static inline void blkg_rwstat_add(struct blkg_rwstat *rwstat,
  *
  * Read the current snapshot of @rwstat and return it in the aux counts.
  */
-static inline struct blkg_rwstat blkg_rwstat_read(struct blkg_rwstat *rwstat)
+static inline void blkg_rwstat_read(struct blkg_rwstat *rwstat,
+		struct blkg_rwstat_sample *result)
 {
-	struct blkg_rwstat result;
	int i;
 
	for (i = 0; i < BLKG_RWSTAT_NR; i++)
-		atomic64_set(&result.aux_cnt[i],
-			     percpu_counter_sum_positive(&rwstat->cpu_cnt[i]));
-	return result;
+		result->cnt[i] =
+			percpu_counter_sum_positive(&rwstat->cpu_cnt[i]);
 }
 
 /**
@@ -714,10 +652,10 @@ static inline struct blkg_rwstat blkg_rwstat_read(struct blkg_rwstat *rwstat)
  */
 static inline uint64_t blkg_rwstat_total(struct blkg_rwstat *rwstat)
 {
-	struct blkg_rwstat tmp = blkg_rwstat_read(rwstat);
+	struct blkg_rwstat_sample tmp = { };
 
-	return atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_READ]) +
-		atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_WRITE]);
+	blkg_rwstat_read(rwstat, &tmp);
+	return tmp.cnt[BLKG_RWSTAT_READ] + tmp.cnt[BLKG_RWSTAT_WRITE];
 }
 
 /**
@@ -306,7 +306,7 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs
 bool blk_mq_complete_request(struct request *rq);
 void blk_mq_complete_request_sync(struct request *rq);
 bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
-			   struct bio *bio);
+			   struct bio *bio, unsigned int nr_segs);
 bool blk_mq_queue_stopped(struct request_queue *q);
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
 void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx);
@@ -154,11 +154,6 @@ struct bio {
 	blk_status_t		bi_status;
 	u8			bi_partno;
 
-	/* Number of segments in this BIO after
-	 * physical address coalescing is performed.
-	 */
-	unsigned int		bi_phys_segments;
-
 	struct bvec_iter	bi_iter;
 
 	atomic_t		__bi_remaining;
@@ -210,7 +205,6 @@ struct bio {
  */
 enum {
 	BIO_NO_PAGE_REF,	/* don't put release vec pages */
-	BIO_SEG_VALID,		/* bi_phys_segments valid */
 	BIO_CLONED,		/* doesn't own data */
 	BIO_BOUNCED,		/* bio is a bounce bio */
 	BIO_USER_MAPPED,	/* contains user pages */
@@ -137,11 +137,11 @@ struct request {
 	unsigned int cmd_flags;		/* op and common flags */
 	req_flags_t rq_flags;
 
+	int tag;
 	int internal_tag;
 
 	/* the following two fields are internal, NEVER access directly */
 	unsigned int __data_len;	/* total data len */
-	int tag;
 	sector_t __sector;		/* sector cursor */
 
 	struct bio *bio;
@@ -828,7 +828,6 @@ extern void blk_unregister_queue(struct gendisk *disk);
 extern blk_qc_t generic_make_request(struct bio *bio);
 extern blk_qc_t direct_make_request(struct bio *bio);
 extern void blk_rq_init(struct request_queue *q, struct request *rq);
-extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
 extern void blk_put_request(struct request *);
 extern struct request *blk_get_request(struct request_queue *, unsigned int op,
				       blk_mq_req_flags_t flags);
@@ -842,7 +841,6 @@ extern blk_status_t blk_insert_cloned_request(struct request_queue *q,
				     struct request *rq);
 extern int blk_rq_append_bio(struct request *rq, struct bio **bio);
 extern void blk_queue_split(struct request_queue *, struct bio **);
-extern void blk_recount_segments(struct request_queue *, struct bio *);
 extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
 extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
			      unsigned int, void __user *);
@@ -867,6 +865,9 @@ extern void blk_execute_rq(struct request_queue *, struct gendisk *,
 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
				  struct request *, int, rq_end_io_fn *);
 
+/* Helper to convert REQ_OP_XXX to its string format XXX */
+extern const char *blk_op_str(unsigned int op);
+
 int blk_status_to_errno(blk_status_t status);
 blk_status_t errno_to_blk_status(int errno);
 
@@ -1026,21 +1027,9 @@ void blk_steal_bios(struct bio_list *list, struct request *rq);
  *
  * blk_update_request() completes given number of bytes and updates
  * the request without completing it.
- *
- * blk_end_request() and friends.  __blk_end_request() must be called
- * with the request queue spinlock acquired.
- *
- * Several drivers define their own end_request and call
- * blk_end_request() for parts of the original function.
- * This prevents code duplication in drivers.
  */
 extern bool blk_update_request(struct request *rq, blk_status_t error,
			       unsigned int nr_bytes);
-extern void blk_end_request_all(struct request *rq, blk_status_t error);
-extern bool __blk_end_request(struct request *rq, blk_status_t error,
-			      unsigned int nr_bytes);
-extern void __blk_end_request_all(struct request *rq, blk_status_t error);
-extern bool __blk_end_request_cur(struct request *rq, blk_status_t error);
-
 extern void __blk_complete_request(struct request *);
 extern void blk_abort_request(struct request *);
@@ -34,7 +34,7 @@ struct elevator_mq_ops {
 	void (*depth_updated)(struct blk_mq_hw_ctx *);
 
 	bool (*allow_merge)(struct request_queue *, struct request *, struct bio *);
-	bool (*bio_merge)(struct blk_mq_hw_ctx *, struct bio *);
+	bool (*bio_merge)(struct blk_mq_hw_ctx *, struct bio *, unsigned int);
 	int (*request_merge)(struct request_queue *q, struct request **, struct bio *);
 	void (*request_merged)(struct request_queue *, struct request *, enum elv_merge);
 	void (*requests_merged)(struct request_queue *, struct request *, struct request *);
@@ -791,6 +791,11 @@ struct nvmet_fc_target_port {
  *       nvmefc_tgt_fcp_req.
  *       Entrypoint is Optional.
  *
+ * @discovery_event:  Called by the transport to generate an RSCN
+ *       change notifications to NVME initiators. The RSCN notifications
+ *       should cause the initiator to rescan the discovery controller
+ *       on the targetport.
+ *
  * @max_hw_queues:  indicates the maximum number of hw queues the LLDD
  *       supports for cpu affinitization.
  *       Value is Mandatory. Must be at least 1.
@@ -832,6 +837,7 @@ struct nvmet_fc_target_template {
				struct nvmefc_tgt_fcp_req *fcpreq);
 	void (*defer_rcv)(struct nvmet_fc_target_port *tgtport,
				struct nvmefc_tgt_fcp_req *fcpreq);
+	void (*discovery_event)(struct nvmet_fc_target_port *tgtport);
 
 	u32 max_hw_queues;
 	u16 max_sgl_segments;
@@ -562,6 +562,22 @@ enum nvme_opcode {
 	nvme_cmd_resv_release	= 0x15,
 };
 
+#define nvme_opcode_name(opcode)	{ opcode, #opcode }
+#define show_nvm_opcode_name(val)				\
+	__print_symbolic(val,					\
+		nvme_opcode_name(nvme_cmd_flush),		\
+		nvme_opcode_name(nvme_cmd_write),		\
+		nvme_opcode_name(nvme_cmd_read),		\
+		nvme_opcode_name(nvme_cmd_write_uncor),		\
+		nvme_opcode_name(nvme_cmd_compare),		\
+		nvme_opcode_name(nvme_cmd_write_zeroes),	\
+		nvme_opcode_name(nvme_cmd_dsm),			\
+		nvme_opcode_name(nvme_cmd_resv_register),	\
+		nvme_opcode_name(nvme_cmd_resv_report),		\
+		nvme_opcode_name(nvme_cmd_resv_acquire),	\
+		nvme_opcode_name(nvme_cmd_resv_release))
+
+
 /*
  * Descriptor subtype - lower 4 bits of nvme_(keyed_)sgl_desc identifier
  *
@@ -794,6 +810,32 @@ enum nvme_admin_opcode {
 	nvme_admin_sanitize_nvm		= 0x84,
 };
 
+#define nvme_admin_opcode_name(opcode)	{ opcode, #opcode }
+#define show_admin_opcode_name(val)					\
+	__print_symbolic(val,						\
+		nvme_admin_opcode_name(nvme_admin_delete_sq),		\
+		nvme_admin_opcode_name(nvme_admin_create_sq),		\
+		nvme_admin_opcode_name(nvme_admin_get_log_page),	\
+		nvme_admin_opcode_name(nvme_admin_delete_cq),		\
+		nvme_admin_opcode_name(nvme_admin_create_cq),		\
+		nvme_admin_opcode_name(nvme_admin_identify),		\
+		nvme_admin_opcode_name(nvme_admin_abort_cmd),		\
+		nvme_admin_opcode_name(nvme_admin_set_features),	\
+		nvme_admin_opcode_name(nvme_admin_get_features),	\
+		nvme_admin_opcode_name(nvme_admin_async_event),		\
+		nvme_admin_opcode_name(nvme_admin_ns_mgmt),		\
+		nvme_admin_opcode_name(nvme_admin_activate_fw),		\
+		nvme_admin_opcode_name(nvme_admin_download_fw),		\
+		nvme_admin_opcode_name(nvme_admin_ns_attach),		\
+		nvme_admin_opcode_name(nvme_admin_keep_alive),		\
+		nvme_admin_opcode_name(nvme_admin_directive_send),	\
+		nvme_admin_opcode_name(nvme_admin_directive_recv),	\
+		nvme_admin_opcode_name(nvme_admin_dbbuf),		\
+		nvme_admin_opcode_name(nvme_admin_format_nvm),		\
+		nvme_admin_opcode_name(nvme_admin_security_send),	\
+		nvme_admin_opcode_name(nvme_admin_security_recv),	\
+		nvme_admin_opcode_name(nvme_admin_sanitize_nvm))
+
 enum {
 	NVME_QUEUE_PHYS_CONTIG	= (1 << 0),
 	NVME_CQ_IRQ_ENABLED	= (1 << 1),
@@ -1008,6 +1050,23 @@ enum nvmf_capsule_command {
 	nvme_fabrics_type_property_get	= 0x04,
 };
 
+#define nvme_fabrics_type_name(type)	{ type, #type }
+#define show_fabrics_type_name(type)					\
+	__print_symbolic(type,						\
+		nvme_fabrics_type_name(nvme_fabrics_type_property_set),	\
+		nvme_fabrics_type_name(nvme_fabrics_type_connect),	\
+		nvme_fabrics_type_name(nvme_fabrics_type_property_get))
+
+/*
+ * If not fabrics command, fctype will be ignored.
+ */
+#define show_opcode_name(qid, opcode, fctype)			\
+	((opcode) == nvme_fabrics_command ?			\
+	 show_fabrics_type_name(fctype) :			\
+	((qid) ?						\
+	 show_nvm_opcode_name(opcode) :				\
+	 show_admin_opcode_name(opcode)))
+
 struct nvmf_common_command {
 	__u8	opcode;
 	__u8	resv1;
@@ -1165,6 +1224,11 @@ struct nvme_command {
 	};
 };
 
+static inline bool nvme_is_fabrics(struct nvme_command *cmd)
+{
+	return cmd->common.opcode == nvme_fabrics_command;
+}
+
 struct nvme_error_slot {
 	__le64		error_count;
 	__le16		sqid;
@@ -1186,7 +1250,7 @@ static inline bool nvme_is_write(struct nvme_command *cmd)
 	 *
	 * Why can't we simply have a Fabrics In and Fabrics out command?
	 */
-	if (unlikely(cmd->common.opcode == nvme_fabrics_command))
+	if (unlikely(nvme_is_fabrics(cmd)))
		return cmd->fabrics.fctype & 1;
	return cmd->common.opcode & 1;
 }
@@ -39,6 +39,9 @@ static inline bool is_sed_ioctl(unsigned int cmd)
 	case IOC_OPAL_ENABLE_DISABLE_MBR:
 	case IOC_OPAL_ERASE_LR:
 	case IOC_OPAL_SECURE_ERASE_LR:
+	case IOC_OPAL_PSID_REVERT_TPR:
+	case IOC_OPAL_MBR_DONE:
+	case IOC_OPAL_WRITE_SHADOW_MBR:
		return true;
	}
	return false;
@@ -19,9 +19,6 @@ struct kvec {
 };
 
 enum iter_type {
-	/* set if ITER_BVEC doesn't hold a bv_page ref */
-	ITER_BVEC_FLAG_NO_REF = 2,
-
	/* iter types */
	ITER_IOVEC = 4,
	ITER_KVEC = 8,
@@ -56,7 +53,7 @@ struct iov_iter {
 
 static inline enum iter_type iov_iter_type(const struct iov_iter *i)
 {
-	return i->type & ~(READ | WRITE | ITER_BVEC_FLAG_NO_REF);
+	return i->type & ~(READ | WRITE);
 }
 
 static inline bool iter_is_iovec(const struct iov_iter *i)
@@ -89,11 +86,6 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
	return i->type & (READ | WRITE);
 }
 
-static inline bool iov_iter_bvec_no_ref(const struct iov_iter *i)
-{
-	return (i->type & ITER_BVEC_FLAG_NO_REF) != 0;
-}
-
 /*
  * Total number of bytes covered by an iovec.
  *
@@ -76,16 +76,7 @@ TRACE_DEFINE_ENUM(CP_TRIMMED);
 #define show_bio_type(op,op_flags)	show_bio_op(op),		\
						show_bio_op_flags(op_flags)
 
-#define show_bio_op(op)							\
-	__print_symbolic(op,						\
-		{ REQ_OP_READ,			"READ" },		\
-		{ REQ_OP_WRITE,			"WRITE" },		\
-		{ REQ_OP_FLUSH,			"FLUSH" },		\
-		{ REQ_OP_DISCARD,		"DISCARD" },		\
-		{ REQ_OP_SECURE_ERASE,		"SECURE_ERASE" },	\
-		{ REQ_OP_ZONE_RESET,		"ZONE_RESET" },		\
-		{ REQ_OP_WRITE_SAME,		"WRITE_SAME" },		\
-		{ REQ_OP_WRITE_ZEROES,		"WRITE_ZEROES" })
+#define show_bio_op(op)		blk_op_str(op)
 
 #define show_bio_op_flags(flags)					\
	__print_flags(F2FS_BIO_FLAG_MASK(flags), "|",			\