
for-5.3/block-20190708

-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl0jrIMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptlFD/9CNsBX+Aap2lO6wKNr6QISwNAK76GMzEay
 s4LSY2kGkXvzv8i89mCuY+8UVNI8WH2/22WnU+8CBAJOjWyFQMsIwH/mrq0oZWRD
 J6STJE8rTr6Fc2MvJUWryp/xdBh3+eDIsAdIZVHVAkIzqYPBnpIAwEIeIw8t0xsm
 v9ngpQ3WD6ep8tOj9pnG1DGKFg1CmukZCC/Y4CQV1vZtmm2I935zUwNV/TB+Egfx
 G8JSC0cSV02LMK88HCnA6MnC/XSUC0qgfXbnmP+TpKlgjVX+P/fuB3oIYcZEu2Rk
 3YBpIkhsQytKYbF42KRLsmBH72u6oB9G+tNZTgB1STUDrZqdtD9xwX1rjDlY0ZzP
 EUDnk48jl/cxbs+VZrHoE2TcNonLiymV7Kb92juHXdIYmKFQStprGcQUbMaTkMfB
 6BYrYLifWx0leu1JJ1i7qhNmug94BYCSCxcRmH0p6kPazPcY9LXNmDWMfMuBPZT7
 z79VLZnHF2wNXJyT1cBluwRYYJRT4osWZ3XUaBWFKDgf1qyvXJfrN/4zmgkEIyW7
 ivXC+KLlGkhntDlWo2pLKbbyOIKY1HmU6aROaI11k5Zyh0ixKB7tHKavK39l+NOo
 YB41+4l6VEpQEyxyRk8tO0sbHpKaKB+evVIK3tTwbY+Q0qTExErxjfWUtOgRWhjx
 iXJssPRo4w==
 =VSYT
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "This is the main block updates for 5.3. Nothing earth shattering or
  major in here, just fixes, additions, and improvements all over the
  map. This contains:

   - Series of documentation fixes (Bart)

   - Optimization of the blk-mq ctx get/put (Bart)

   - null_blk removal race condition fix (Bob)

   - req/bio_op() cleanups (Chaitanya)

   - Series cleaning up the segment accounting, and request/bio mapping
     (Christoph)

   - Series cleaning up the page getting/putting for bios (Christoph)

   - block cgroup cleanups and moving it to where it is used (Christoph)

   - block cgroup fixes (Tejun)

   - Series of fixes and improvements to bcache, most notably a write
     deadlock fix (Coly)

   - blk-iolatency STS_AGAIN and accounting fixes (Dennis)

   - Series of improvements and fixes to BFQ (Douglas, Paolo)

   - debugfs_create() return value check removal for drbd (Greg)

   - Use struct_size(), where appropriate (Gustavo)

   - Two lightnvm fixes (Heiner, Geert)

   - MD fixes, including a read balance and corruption fix (Guoqing,
     Marcos, Xiao, Yufen)

   - block opal shadow mbr additions (Jonas, Revanth)

   - sbitmap compare-and-exchange improvements (Pavel)

   - Fix for potential bio->bi_size overflow (Ming)

   - NVMe pull requests:
       - improved PCIe suspend support (Keith Busch)
       - error injection support for the admin queue (Akinobu Mita)
       - Fibre Channel discovery improvements (James Smart)
       - tracing improvements including nvmet tracing support (Minwoo Im)
       - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
         Kulkarni)

   - Various little fixes and improvements to drivers and core"

* tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
  blk-iolatency: fix STS_AGAIN handling
  block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
  blk-mq: simplify blk_mq_make_request()
  blk-mq: remove blk_mq_put_ctx()
  sbitmap: Replace cmpxchg with xchg
  block: fix .bi_size overflow
  block: sed-opal: check size of shadow mbr
  block: sed-opal: ioctl for writing to shadow mbr
  block: sed-opal: add ioctl for done-mark of shadow mbr
  block: never take page references for ITER_BVEC
  direct-io: use bio_release_pages in dio_bio_complete
  block_dev: use bio_release_pages in bio_unmap_user
  block_dev: use bio_release_pages in blkdev_bio_end_io
  iomap: use bio_release_pages in iomap_dio_bio_end_io
  block: use bio_release_pages in bio_map_user_iov
  block: use bio_release_pages in bio_unmap_user
  block: optionally mark pages dirty in bio_release_pages
  block: move the BIO_NO_PAGE_REF check into bio_release_pages
  block: skd_main.c: Remove call to memset after dma_alloc_coherent
  block: mtip32xx: Remove call to memset after dma_alloc_coherent
  ...
Linus Torvalds 2019-07-09 10:45:06 -07:00
commit 3b99107f0e
104 changed files with 3370 additions and 1556 deletions

View File

@@ -38,13 +38,13 @@ stack). To give an idea of the limits with BFQ, on slow or average
 CPUs, here are, first, the limits of BFQ for three different CPUs, on,
 respectively, an average laptop, an old desktop, and a cheap embedded
 system, in case full hierarchical support is enabled (i.e.,
-CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_DEBUG_BLK_CGROUP is not
+CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not
 set (Section 4-2):
 - Intel i7-4850HQ: 400 KIOPS
 - AMD A8-3850: 250 KIOPS
 - ARM CortexTM-A53 Octa-core: 80 KIOPS
-If CONFIG_DEBUG_BLK_CGROUP is set (and of course full hierarchical
+If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical
 support is enabled), then the sustainable throughput with BFQ
 decreases, because all blkio.bfq* statistics are created and updated
 (Section 4-2). For BFQ, this leads to the following maximum
@@ -537,19 +537,19 @@ or io.bfq.weight.
 As for cgroups-v1 (blkio controller), the exact set of stat files
 created, and kept up-to-date by bfq, depends on whether
-CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all
+CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all
 the stat files documented in
 Documentation/cgroup-v1/blkio-controller.rst. If, instead,
-CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files
+CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files
 blkio.bfq.io_service_bytes
 blkio.bfq.io_service_bytes_recursive
 blkio.bfq.io_serviced
 blkio.bfq.io_serviced_recursive
-The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum
+The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum
 throughput sustainable with bfq, because updating the blkio.bfq.*
 stats is rather costly, especially for some of the stats enabled by
-CONFIG_DEBUG_BLK_CGROUP.
+CONFIG_BFQ_CGROUP_DEBUG.
 Parameters to set
 -----------------

View File

@@ -436,7 +436,6 @@ struct bio {
       struct bvec_iter bi_iter;      /* current index into bio_vec array */
       unsigned int bi_size;          /* total size in bytes */
-      unsigned short bi_phys_segments; /* segments after physaddr coalesce*/
       unsigned short bi_hw_segments; /* segments after DMA remapping */
       unsigned int bi_max;           /* max bio_vecs we can hold
                                         used as index into pool */

View File

@@ -14,6 +14,15 @@ add_random (RW)
 This file allows to turn off the disk entropy contribution. Default
 value of this file is '1'(on).
 
+chunk_sectors (RO)
+------------------
+This has different meaning depending on the type of the block device.
+For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
+of the RAID volume stripe segment. For a zoned block device, either host-aware
+or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
+of the device, with the eventual exception of the last zone of the device which
+may be smaller.
+
 dax (RO)
 --------
 This file indicates whether the device supports Direct Access (DAX),
@@ -43,6 +52,16 @@ large discards are issued, setting this value lower will make Linux issue
 smaller discards and potentially help reduce latencies induced by large
 discard operations.
 
+discard_zeroes_data (RO)
+------------------------
+Obsolete. Always zero.
+
+fua (RO)
+--------
+Whether or not the block driver supports the FUA flag for write requests.
+FUA stands for Force Unit Access. If the FUA flag is set that means that
+write requests must bypass the volatile cache of the storage device.
+
 hw_sector_size (RO)
 -------------------
 This is the hardware sector size of the device, in bytes.
@@ -83,14 +102,19 @@ logical_block_size (RO)
 -----------------------
 This is the logical block size of the device, in bytes.
 
+max_discard_segments (RO)
+-------------------------
+The maximum number of DMA scatter/gather entries in a discard request.
+
 max_hw_sectors_kb (RO)
 ----------------------
 This is the maximum number of kilobytes supported in a single data transfer.
 
 max_integrity_segments (RO)
 ---------------------------
-When read, this file shows the max limit of integrity segments as
-set by block layer which a hardware controller can handle.
+Maximum number of elements in a DMA scatter/gather list with integrity
+data that will be submitted by the block layer core to the associated
+block driver.
 
 max_sectors_kb (RW)
 -------------------
@@ -100,11 +124,12 @@ size allowed by the hardware.
 
 max_segments (RO)
 -----------------
-Maximum number of segments of the device.
+Maximum number of elements in a DMA scatter/gather list that is submitted
+to the associated block driver.
 
 max_segment_size (RO)
 ---------------------
-Maximum segment size of the device.
+Maximum size in bytes of a single element in a DMA scatter/gather list.
 
 minimum_io_size (RO)
 --------------------
@@ -132,6 +157,12 @@ per-block-cgroup request pool. IOW, if there are N block cgroups,
 each request queue may have up to N request pools, each independently
 regulated by nr_requests.
 
+nr_zones (RO)
+-------------
+For zoned block devices (zoned attribute indicating "host-managed" or
+"host-aware"), this indicates the total number of zones of the device.
+This is always 0 for regular block devices.
+
 optimal_io_size (RO)
 --------------------
 This is the optimal IO size reported by the device.
@@ -185,8 +216,8 @@ This is the number of bytes the device can write in a single write-same
 command. A value of '0' means write-same is not supported by this
 device.
 
-wb_lat_usec (RW)
-----------------
+wbt_lat_usec (RW)
+-----------------
 If the device is registered for writeback throttling, then this file shows
 the target minimum read latency. If this latency is exceeded in a given
 window of time (see wb_window_usec), then the writeback throttling will start
@@ -201,6 +232,12 @@ blk-throttle makes decision based on the samplings. Lower time means cgroups
 have more smooth throughput, but higher CPU overhead. This exists only when
 CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
 
+write_zeroes_max_bytes (RO)
+---------------------------
+For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
+bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
+is not supported.
+
 zoned (RO)
 ----------
 This indicates if the device is a zoned block device and the zone model of the
@@ -213,19 +250,4 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC
 do not support zone commands, they will be treated as regular block devices
 and zoned will report "none".
 
-nr_zones (RO)
--------------
-For zoned block devices (zoned attribute indicating "host-managed" or
-"host-aware"), this indicates the total number of zones of the device.
-This is always 0 for regular block devices.
-
-chunk_sectors (RO)
-------------------
-This has different meaning depending on the type of the block device.
-For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
-of the RAID volume stripe segment. For a zoned block device, either host-aware
-or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
-of the device, with the eventual exception of the last zone of the device which
-may be smaller.
-
 Jens Axboe <jens.axboe@oracle.com>, February 2009
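
The attributes documented above, both the new entries and the reworded ones, are
plain text files under /sys/block/<disk>/queue/. As an illustration only (not part
of this patch set), a small user-space sketch that dumps a few of them could look
like the following; the device name nvme0n1 and the chosen attribute list are
arbitrary assumptions:

	/* Sketch: print a few of the queue attributes documented above.
	 * Assumes a device named "nvme0n1"; adjust as needed.
	 */
	#include <stdio.h>

	int main(void)
	{
		static const char *attrs[] = {
			"chunk_sectors", "nr_zones", "zoned", "fua",
			"max_discard_segments", "write_zeroes_max_bytes",
			"wbt_lat_usec",
		};
		char path[256], value[128];
		unsigned int i;

		for (i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
			FILE *f;

			snprintf(path, sizeof(path),
				 "/sys/block/nvme0n1/queue/%s", attrs[i]);
			f = fopen(path, "r");
			if (!f)
				continue; /* attribute absent on older kernels */
			if (fgets(value, sizeof(value), f))
				printf("%s: %s", attrs[i], value);
			fclose(f);
		}
		return 0;
	}

On kernels older than 5.3 some of these attributes do not exist, which the sketch
tolerates by simply skipping files it cannot open.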

View File

@@ -82,7 +82,7 @@ Various user visible config options
 CONFIG_BLK_CGROUP
 	- Block IO controller.
 
-CONFIG_DEBUG_BLK_CGROUP
+CONFIG_BFQ_CGROUP_DEBUG
 	- Debug help. Right now some additional stats file show up in cgroup
 	  if this option is enabled.
 
@@ -202,13 +202,13 @@ Proportional weight policy files
 	  write, sync or async.
 
 - blkio.avg_queue_size
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 	  The average queue size for this cgroup over the entire time of this
 	  cgroup's existence. Queue size samples are taken each time one of the
 	  queues of this cgroup gets a timeslice.
 
 - blkio.group_wait_time
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 	  This is the amount of time the cgroup had to wait since it became busy
 	  (i.e., went from 0 to 1 request queued) to get a timeslice for one of
 	  its queues. This is different from the io_wait_time which is the
@@ -219,7 +219,7 @@ Proportional weight policy files
 	  got a timeslice and will not include the current delta.
 
 - blkio.empty_time
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 	  This is the amount of time a cgroup spends without any pending
 	  requests when not being served, i.e., it does not include any time
 	  spent idling for one of the queues of the cgroup. This is in
@@ -228,7 +228,7 @@ Proportional weight policy files
 	  time it had a pending request and will not include the current delta.
 
 - blkio.idle_time
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 	  This is the amount of time spent by the IO scheduler idling for a
 	  given cgroup in anticipation of a better request than the existing ones
 	  from other queues/cgroups. This is in nanoseconds. If this is read
@@ -237,7 +237,7 @@ Proportional weight policy files
 	  the current delta.
 
 - blkio.dequeue
-	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. This
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
 	  gives the statistics about how many a times a group was dequeued
 	  from service tree of the device. First two fields specify the major
 	  and minor number of the device and third field specifies the number

View File

@ -114,3 +114,59 @@ R13: ffff88011a3c9680 R14: 0000000000000000 R15: 0000000000000000
cpu_startup_entry+0x6f/0x80
start_secondary+0x187/0x1e0
secondary_startup_64+0xa5/0xb0

Example 3: Inject an error into the 10th admin command
------------------------------------------------------
echo 100 > /sys/kernel/debug/nvme0/fault_inject/probability
echo 10 > /sys/kernel/debug/nvme0/fault_inject/space
echo 1 > /sys/kernel/debug/nvme0/fault_inject/times
nvme reset /dev/nvme0
Expected Result:
After NVMe controller reset, the reinitialization may or may not succeed.
It depends on which admin command is actually forced to fail.
Message from dmesg:
nvme nvme0: resetting controller
FAULT_INJECTION: forcing a failure.
name fault_inject, interval 1, probability 100, space 1, times 1
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2
Hardware name: MSI MS-7A45/B150M MORTAR ARCTIC (MS-7A45), BIOS 1.50 04/25/2017
Call Trace:
<IRQ>
dump_stack+0x63/0x85
should_fail+0x14a/0x170
nvme_should_fail+0x38/0x80 [nvme_core]
nvme_irq+0x129/0x280 [nvme]
? blk_mq_end_request+0xb3/0x120
__handle_irq_event_percpu+0x84/0x1a0
handle_irq_event_percpu+0x32/0x80
handle_irq_event+0x3b/0x60
handle_edge_irq+0x7f/0x1a0
handle_irq+0x20/0x30
do_IRQ+0x4e/0xe0
common_interrupt+0xf/0xf
</IRQ>
RIP: 0010:cpuidle_enter_state+0xc5/0x460
Code: ff e8 8f 5f 86 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 69 03 00 00 31 ff e8 62 aa 8c ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 37 03 00 00 4c 8b 45 d0 4c 2b 45 b8 48 ba cf f7 53
RSP: 0018:ffffffff88c03dd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc
RAX: ffff9dac25a2ac80 RBX: ffffffff88d53760 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 000000002d958403 RDI: 0000000000000000
RBP: ffffffff88c03e18 R08: fffffff75e35ffb7 R09: 00000a49a56c0b48
R10: ffffffff88c03da0 R11: 0000000000001b0c R12: ffff9dac25a34d00
R13: 0000000000000006 R14: 0000000000000006 R15: ffffffff88d53760
cpuidle_enter+0x2e/0x40
call_cpuidle+0x23/0x40
do_idle+0x201/0x280
cpu_startup_entry+0x1d/0x20
rest_init+0xaa/0xb0
arch_call_rest_init+0xe/0x1b
start_kernel+0x51c/0x53b
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x74/0x77
secondary_startup_64+0xa4/0xb0
nvme nvme0: Could not set queue count (16385)
nvme nvme0: IO queues not created

View File

@@ -36,6 +36,13 @@ config BFQ_GROUP_IOSCHED
 	  Enable hierarchical scheduling in BFQ, using the blkio
 	  (cgroups-v1) or io (cgroups-v2) controller.
 
+config BFQ_CGROUP_DEBUG
+	bool "BFQ IO controller debugging"
+	depends on BFQ_GROUP_IOSCHED
+	---help---
+	  Enable some debugging help. Currently it exports additional stat
+	  files in a cgroup which can be useful for debugging.
+
 endmenu
 
 endif

View File

@ -15,7 +15,83 @@
#include "bfq-iosched.h" #include "bfq-iosched.h"
#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP) #ifdef CONFIG_BFQ_CGROUP_DEBUG
static int bfq_stat_init(struct bfq_stat *stat, gfp_t gfp)
{
int ret;
ret = percpu_counter_init(&stat->cpu_cnt, 0, gfp);
if (ret)
return ret;
atomic64_set(&stat->aux_cnt, 0);
return 0;
}
static void bfq_stat_exit(struct bfq_stat *stat)
{
percpu_counter_destroy(&stat->cpu_cnt);
}
/**
* bfq_stat_add - add a value to a bfq_stat
* @stat: target bfq_stat
* @val: value to add
*
* Add @val to @stat. The caller must ensure that IRQ on the same CPU
* don't re-enter this function for the same counter.
*/
static inline void bfq_stat_add(struct bfq_stat *stat, uint64_t val)
{
percpu_counter_add_batch(&stat->cpu_cnt, val, BLKG_STAT_CPU_BATCH);
}
/**
* bfq_stat_read - read the current value of a bfq_stat
* @stat: bfq_stat to read
*/
static inline uint64_t bfq_stat_read(struct bfq_stat *stat)
{
return percpu_counter_sum_positive(&stat->cpu_cnt);
}
/**
* bfq_stat_reset - reset a bfq_stat
* @stat: bfq_stat to reset
*/
static inline void bfq_stat_reset(struct bfq_stat *stat)
{
percpu_counter_set(&stat->cpu_cnt, 0);
atomic64_set(&stat->aux_cnt, 0);
}
/**
* bfq_stat_add_aux - add a bfq_stat into another's aux count
* @to: the destination bfq_stat
* @from: the source
*
* Add @from's count including the aux one to @to's aux count.
*/
static inline void bfq_stat_add_aux(struct bfq_stat *to,
struct bfq_stat *from)
{
atomic64_add(bfq_stat_read(from) + atomic64_read(&from->aux_cnt),
&to->aux_cnt);
}
/**
* blkg_prfill_stat - prfill callback for bfq_stat
* @sf: seq_file to print to
* @pd: policy private data of interest
* @off: offset to the bfq_stat in @pd
*
* prfill callback for printing a bfq_stat.
*/
static u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd,
int off)
{
return __blkg_prfill_u64(sf, pd, bfq_stat_read((void *)pd + off));
}
/* bfqg stats flags */ /* bfqg stats flags */
enum bfqg_stats_flags { enum bfqg_stats_flags {
@ -53,7 +129,7 @@ static void bfqg_stats_update_group_wait_time(struct bfqg_stats *stats)
now = ktime_get_ns(); now = ktime_get_ns();
if (now > stats->start_group_wait_time) if (now > stats->start_group_wait_time)
blkg_stat_add(&stats->group_wait_time, bfq_stat_add(&stats->group_wait_time,
now - stats->start_group_wait_time); now - stats->start_group_wait_time);
bfqg_stats_clear_waiting(stats); bfqg_stats_clear_waiting(stats);
} }
@ -82,14 +158,14 @@ static void bfqg_stats_end_empty_time(struct bfqg_stats *stats)
now = ktime_get_ns(); now = ktime_get_ns();
if (now > stats->start_empty_time) if (now > stats->start_empty_time)
blkg_stat_add(&stats->empty_time, bfq_stat_add(&stats->empty_time,
now - stats->start_empty_time); now - stats->start_empty_time);
bfqg_stats_clear_empty(stats); bfqg_stats_clear_empty(stats);
} }
void bfqg_stats_update_dequeue(struct bfq_group *bfqg) void bfqg_stats_update_dequeue(struct bfq_group *bfqg)
{ {
blkg_stat_add(&bfqg->stats.dequeue, 1); bfq_stat_add(&bfqg->stats.dequeue, 1);
} }
void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg) void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg)
@ -119,7 +195,7 @@ void bfqg_stats_update_idle_time(struct bfq_group *bfqg)
u64 now = ktime_get_ns(); u64 now = ktime_get_ns();
if (now > stats->start_idle_time) if (now > stats->start_idle_time)
blkg_stat_add(&stats->idle_time, bfq_stat_add(&stats->idle_time,
now - stats->start_idle_time); now - stats->start_idle_time);
bfqg_stats_clear_idling(stats); bfqg_stats_clear_idling(stats);
} }
@ -137,9 +213,9 @@ void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg)
{ {
struct bfqg_stats *stats = &bfqg->stats; struct bfqg_stats *stats = &bfqg->stats;
blkg_stat_add(&stats->avg_queue_size_sum, bfq_stat_add(&stats->avg_queue_size_sum,
blkg_rwstat_total(&stats->queued)); blkg_rwstat_total(&stats->queued));
blkg_stat_add(&stats->avg_queue_size_samples, 1); bfq_stat_add(&stats->avg_queue_size_samples, 1);
bfqg_stats_update_group_wait_time(stats); bfqg_stats_update_group_wait_time(stats);
} }
@ -176,7 +252,7 @@ void bfqg_stats_update_completion(struct bfq_group *bfqg, u64 start_time_ns,
io_start_time_ns - start_time_ns); io_start_time_ns - start_time_ns);
} }
#else /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */ #else /* CONFIG_BFQ_CGROUP_DEBUG */
void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq, void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
unsigned int op) { } unsigned int op) { }
@ -190,7 +266,7 @@ void bfqg_stats_update_idle_time(struct bfq_group *bfqg) { }
void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg) { } void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg) { }
void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg) { } void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg) { }
#endif /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */ #endif /* CONFIG_BFQ_CGROUP_DEBUG */
#ifdef CONFIG_BFQ_GROUP_IOSCHED #ifdef CONFIG_BFQ_GROUP_IOSCHED
@ -274,18 +350,18 @@ void bfqg_and_blkg_put(struct bfq_group *bfqg)
/* @stats = 0 */ /* @stats = 0 */
static void bfqg_stats_reset(struct bfqg_stats *stats) static void bfqg_stats_reset(struct bfqg_stats *stats)
{ {
#ifdef CONFIG_DEBUG_BLK_CGROUP #ifdef CONFIG_BFQ_CGROUP_DEBUG
/* queued stats shouldn't be cleared */ /* queued stats shouldn't be cleared */
blkg_rwstat_reset(&stats->merged); blkg_rwstat_reset(&stats->merged);
blkg_rwstat_reset(&stats->service_time); blkg_rwstat_reset(&stats->service_time);
blkg_rwstat_reset(&stats->wait_time); blkg_rwstat_reset(&stats->wait_time);
blkg_stat_reset(&stats->time); bfq_stat_reset(&stats->time);
blkg_stat_reset(&stats->avg_queue_size_sum); bfq_stat_reset(&stats->avg_queue_size_sum);
blkg_stat_reset(&stats->avg_queue_size_samples); bfq_stat_reset(&stats->avg_queue_size_samples);
blkg_stat_reset(&stats->dequeue); bfq_stat_reset(&stats->dequeue);
blkg_stat_reset(&stats->group_wait_time); bfq_stat_reset(&stats->group_wait_time);
blkg_stat_reset(&stats->idle_time); bfq_stat_reset(&stats->idle_time);
blkg_stat_reset(&stats->empty_time); bfq_stat_reset(&stats->empty_time);
#endif #endif
} }
@ -295,19 +371,19 @@ static void bfqg_stats_add_aux(struct bfqg_stats *to, struct bfqg_stats *from)
if (!to || !from) if (!to || !from)
return; return;
#ifdef CONFIG_DEBUG_BLK_CGROUP #ifdef CONFIG_BFQ_CGROUP_DEBUG
/* queued stats shouldn't be cleared */ /* queued stats shouldn't be cleared */
blkg_rwstat_add_aux(&to->merged, &from->merged); blkg_rwstat_add_aux(&to->merged, &from->merged);
blkg_rwstat_add_aux(&to->service_time, &from->service_time); blkg_rwstat_add_aux(&to->service_time, &from->service_time);
blkg_rwstat_add_aux(&to->wait_time, &from->wait_time); blkg_rwstat_add_aux(&to->wait_time, &from->wait_time);
blkg_stat_add_aux(&from->time, &from->time); bfq_stat_add_aux(&from->time, &from->time);
blkg_stat_add_aux(&to->avg_queue_size_sum, &from->avg_queue_size_sum); bfq_stat_add_aux(&to->avg_queue_size_sum, &from->avg_queue_size_sum);
blkg_stat_add_aux(&to->avg_queue_size_samples, bfq_stat_add_aux(&to->avg_queue_size_samples,
&from->avg_queue_size_samples); &from->avg_queue_size_samples);
blkg_stat_add_aux(&to->dequeue, &from->dequeue); bfq_stat_add_aux(&to->dequeue, &from->dequeue);
blkg_stat_add_aux(&to->group_wait_time, &from->group_wait_time); bfq_stat_add_aux(&to->group_wait_time, &from->group_wait_time);
blkg_stat_add_aux(&to->idle_time, &from->idle_time); bfq_stat_add_aux(&to->idle_time, &from->idle_time);
blkg_stat_add_aux(&to->empty_time, &from->empty_time); bfq_stat_add_aux(&to->empty_time, &from->empty_time);
#endif #endif
} }
@ -355,35 +431,35 @@ void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg)
static void bfqg_stats_exit(struct bfqg_stats *stats) static void bfqg_stats_exit(struct bfqg_stats *stats)
{ {
#ifdef CONFIG_DEBUG_BLK_CGROUP #ifdef CONFIG_BFQ_CGROUP_DEBUG
blkg_rwstat_exit(&stats->merged); blkg_rwstat_exit(&stats->merged);
blkg_rwstat_exit(&stats->service_time); blkg_rwstat_exit(&stats->service_time);
blkg_rwstat_exit(&stats->wait_time); blkg_rwstat_exit(&stats->wait_time);
blkg_rwstat_exit(&stats->queued); blkg_rwstat_exit(&stats->queued);
blkg_stat_exit(&stats->time); bfq_stat_exit(&stats->time);
blkg_stat_exit(&stats->avg_queue_size_sum); bfq_stat_exit(&stats->avg_queue_size_sum);
blkg_stat_exit(&stats->avg_queue_size_samples); bfq_stat_exit(&stats->avg_queue_size_samples);
blkg_stat_exit(&stats->dequeue); bfq_stat_exit(&stats->dequeue);
blkg_stat_exit(&stats->group_wait_time); bfq_stat_exit(&stats->group_wait_time);
blkg_stat_exit(&stats->idle_time); bfq_stat_exit(&stats->idle_time);
blkg_stat_exit(&stats->empty_time); bfq_stat_exit(&stats->empty_time);
#endif #endif
} }
static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp) static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
{ {
#ifdef CONFIG_DEBUG_BLK_CGROUP #ifdef CONFIG_BFQ_CGROUP_DEBUG
if (blkg_rwstat_init(&stats->merged, gfp) || if (blkg_rwstat_init(&stats->merged, gfp) ||
blkg_rwstat_init(&stats->service_time, gfp) || blkg_rwstat_init(&stats->service_time, gfp) ||
blkg_rwstat_init(&stats->wait_time, gfp) || blkg_rwstat_init(&stats->wait_time, gfp) ||
blkg_rwstat_init(&stats->queued, gfp) || blkg_rwstat_init(&stats->queued, gfp) ||
blkg_stat_init(&stats->time, gfp) || bfq_stat_init(&stats->time, gfp) ||
blkg_stat_init(&stats->avg_queue_size_sum, gfp) || bfq_stat_init(&stats->avg_queue_size_sum, gfp) ||
blkg_stat_init(&stats->avg_queue_size_samples, gfp) || bfq_stat_init(&stats->avg_queue_size_samples, gfp) ||
blkg_stat_init(&stats->dequeue, gfp) || bfq_stat_init(&stats->dequeue, gfp) ||
blkg_stat_init(&stats->group_wait_time, gfp) || bfq_stat_init(&stats->group_wait_time, gfp) ||
blkg_stat_init(&stats->idle_time, gfp) || bfq_stat_init(&stats->idle_time, gfp) ||
blkg_stat_init(&stats->empty_time, gfp)) { bfq_stat_init(&stats->empty_time, gfp)) {
bfqg_stats_exit(stats); bfqg_stats_exit(stats);
return -ENOMEM; return -ENOMEM;
} }
@ -909,7 +985,7 @@ static ssize_t bfq_io_set_weight(struct kernfs_open_file *of,
return ret ?: nbytes; return ret ?: nbytes;
} }
#ifdef CONFIG_DEBUG_BLK_CGROUP #ifdef CONFIG_BFQ_CGROUP_DEBUG
static int bfqg_print_stat(struct seq_file *sf, void *v) static int bfqg_print_stat(struct seq_file *sf, void *v)
{ {
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat, blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
@ -927,17 +1003,34 @@ static int bfqg_print_rwstat(struct seq_file *sf, void *v)
static u64 bfqg_prfill_stat_recursive(struct seq_file *sf, static u64 bfqg_prfill_stat_recursive(struct seq_file *sf,
struct blkg_policy_data *pd, int off) struct blkg_policy_data *pd, int off)
{ {
u64 sum = blkg_stat_recursive_sum(pd_to_blkg(pd), struct blkcg_gq *blkg = pd_to_blkg(pd);
&blkcg_policy_bfq, off); struct blkcg_gq *pos_blkg;
struct cgroup_subsys_state *pos_css;
u64 sum = 0;
lockdep_assert_held(&blkg->q->queue_lock);
rcu_read_lock();
blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
struct bfq_stat *stat;
if (!pos_blkg->online)
continue;
stat = (void *)blkg_to_pd(pos_blkg, &blkcg_policy_bfq) + off;
sum += bfq_stat_read(stat) + atomic64_read(&stat->aux_cnt);
}
rcu_read_unlock();
return __blkg_prfill_u64(sf, pd, sum); return __blkg_prfill_u64(sf, pd, sum);
} }
static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf, static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf,
struct blkg_policy_data *pd, int off) struct blkg_policy_data *pd, int off)
{ {
struct blkg_rwstat sum = blkg_rwstat_recursive_sum(pd_to_blkg(pd), struct blkg_rwstat_sample sum;
&blkcg_policy_bfq,
off); blkg_rwstat_recursive_sum(pd_to_blkg(pd), &blkcg_policy_bfq, off, &sum);
return __blkg_prfill_rwstat(sf, pd, &sum); return __blkg_prfill_rwstat(sf, pd, &sum);
} }
@ -975,12 +1068,13 @@ static int bfqg_print_stat_sectors(struct seq_file *sf, void *v)
static u64 bfqg_prfill_sectors_recursive(struct seq_file *sf, static u64 bfqg_prfill_sectors_recursive(struct seq_file *sf,
struct blkg_policy_data *pd, int off) struct blkg_policy_data *pd, int off)
{ {
struct blkg_rwstat tmp = blkg_rwstat_recursive_sum(pd->blkg, NULL, struct blkg_rwstat_sample tmp;
offsetof(struct blkcg_gq, stat_bytes));
u64 sum = atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_READ]) +
atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_WRITE]);
return __blkg_prfill_u64(sf, pd, sum >> 9); blkg_rwstat_recursive_sum(pd->blkg, NULL,
offsetof(struct blkcg_gq, stat_bytes), &tmp);
return __blkg_prfill_u64(sf, pd,
(tmp.cnt[BLKG_RWSTAT_READ] + tmp.cnt[BLKG_RWSTAT_WRITE]) >> 9);
} }
static int bfqg_print_stat_sectors_recursive(struct seq_file *sf, void *v) static int bfqg_print_stat_sectors_recursive(struct seq_file *sf, void *v)
@ -995,11 +1089,11 @@ static u64 bfqg_prfill_avg_queue_size(struct seq_file *sf,
struct blkg_policy_data *pd, int off) struct blkg_policy_data *pd, int off)
{ {
struct bfq_group *bfqg = pd_to_bfqg(pd); struct bfq_group *bfqg = pd_to_bfqg(pd);
u64 samples = blkg_stat_read(&bfqg->stats.avg_queue_size_samples); u64 samples = bfq_stat_read(&bfqg->stats.avg_queue_size_samples);
u64 v = 0; u64 v = 0;
if (samples) { if (samples) {
v = blkg_stat_read(&bfqg->stats.avg_queue_size_sum); v = bfq_stat_read(&bfqg->stats.avg_queue_size_sum);
v = div64_u64(v, samples); v = div64_u64(v, samples);
} }
__blkg_prfill_u64(sf, pd, v); __blkg_prfill_u64(sf, pd, v);
@ -1014,7 +1108,7 @@ static int bfqg_print_avg_queue_size(struct seq_file *sf, void *v)
0, false); 0, false);
return 0; return 0;
} }
#endif /* CONFIG_DEBUG_BLK_CGROUP */ #endif /* CONFIG_BFQ_CGROUP_DEBUG */
struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node) struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
{ {
@ -1062,7 +1156,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
.private = (unsigned long)&blkcg_policy_bfq, .private = (unsigned long)&blkcg_policy_bfq,
.seq_show = blkg_print_stat_ios, .seq_show = blkg_print_stat_ios,
}, },
#ifdef CONFIG_DEBUG_BLK_CGROUP #ifdef CONFIG_BFQ_CGROUP_DEBUG
{ {
.name = "bfq.time", .name = "bfq.time",
.private = offsetof(struct bfq_group, stats.time), .private = offsetof(struct bfq_group, stats.time),
@ -1092,7 +1186,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
.private = offsetof(struct bfq_group, stats.queued), .private = offsetof(struct bfq_group, stats.queued),
.seq_show = bfqg_print_rwstat, .seq_show = bfqg_print_rwstat,
}, },
#endif /* CONFIG_DEBUG_BLK_CGROUP */ #endif /* CONFIG_BFQ_CGROUP_DEBUG */
/* the same statistics which cover the bfqg and its descendants */ /* the same statistics which cover the bfqg and its descendants */
{ {
@ -1105,7 +1199,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
.private = (unsigned long)&blkcg_policy_bfq, .private = (unsigned long)&blkcg_policy_bfq,
.seq_show = blkg_print_stat_ios_recursive, .seq_show = blkg_print_stat_ios_recursive,
}, },
#ifdef CONFIG_DEBUG_BLK_CGROUP #ifdef CONFIG_BFQ_CGROUP_DEBUG
{ {
.name = "bfq.time_recursive", .name = "bfq.time_recursive",
.private = offsetof(struct bfq_group, stats.time), .private = offsetof(struct bfq_group, stats.time),
@ -1159,7 +1253,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
.private = offsetof(struct bfq_group, stats.dequeue), .private = offsetof(struct bfq_group, stats.dequeue),
.seq_show = bfqg_print_stat, .seq_show = bfqg_print_stat,
}, },
#endif /* CONFIG_DEBUG_BLK_CGROUP */ #endif /* CONFIG_BFQ_CGROUP_DEBUG */
{ } /* terminate */ { } /* terminate */
}; };

File diff suppressed because it is too large

View File

@ -357,6 +357,24 @@ struct bfq_queue {
/* max service rate measured so far */ /* max service rate measured so far */
u32 max_service_rate; u32 max_service_rate;
/*
* Pointer to the waker queue for this queue, i.e., to the
* queue Q such that this queue happens to get new I/O right
* after some I/O request of Q is completed. For details, see
* the comments on the choice of the queue for injection in
* bfq_select_queue().
*/
struct bfq_queue *waker_bfqq;
/* node for woken_list, see below */
struct hlist_node woken_list_node;
/*
* Head of the list of the woken queues for this queue, i.e.,
* of the list of the queues for which this queue is a waker
* queue. This list is used to reset the waker_bfqq pointer in
* the woken queues when this queue exits.
*/
struct hlist_head woken_list;
}; };
/** /**
@ -533,6 +551,9 @@ struct bfq_data {
/* time of last request completion (ns) */ /* time of last request completion (ns) */
u64 last_completion; u64 last_completion;
/* bfqq owning the last completed rq */
struct bfq_queue *last_completed_rq_bfqq;
/* time of last transition from empty to non-empty (ns) */ /* time of last transition from empty to non-empty (ns) */
u64 last_empty_occupied_ns; u64 last_empty_occupied_ns;
@ -743,7 +764,8 @@ enum bfqq_state_flags {
* update * update
*/ */
BFQQF_coop, /* bfqq is shared */ BFQQF_coop, /* bfqq is shared */
BFQQF_split_coop /* shared bfqq will be split */ BFQQF_split_coop, /* shared bfqq will be split */
BFQQF_has_waker /* bfqq has a waker queue */
}; };
#define BFQ_BFQQ_FNS(name) \ #define BFQ_BFQQ_FNS(name) \
@ -763,6 +785,7 @@ BFQ_BFQQ_FNS(in_large_burst);
BFQ_BFQQ_FNS(coop); BFQ_BFQQ_FNS(coop);
BFQ_BFQQ_FNS(split_coop); BFQ_BFQQ_FNS(split_coop);
BFQ_BFQQ_FNS(softrt_update); BFQ_BFQQ_FNS(softrt_update);
BFQ_BFQQ_FNS(has_waker);
#undef BFQ_BFQQ_FNS #undef BFQ_BFQQ_FNS
/* Expiration reasons. */ /* Expiration reasons. */
@ -777,8 +800,13 @@ enum bfqq_expiration {
BFQQE_PREEMPTED /* preemption in progress */ BFQQE_PREEMPTED /* preemption in progress */
}; };
struct bfq_stat {
struct percpu_counter cpu_cnt;
atomic64_t aux_cnt;
};
struct bfqg_stats { struct bfqg_stats {
#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP) #ifdef CONFIG_BFQ_CGROUP_DEBUG
/* number of ios merged */ /* number of ios merged */
struct blkg_rwstat merged; struct blkg_rwstat merged;
/* total time spent on device in ns, may not be accurate w/ queueing */ /* total time spent on device in ns, may not be accurate w/ queueing */
@ -788,25 +816,25 @@ struct bfqg_stats {
/* number of IOs queued up */ /* number of IOs queued up */
struct blkg_rwstat queued; struct blkg_rwstat queued;
/* total disk time and nr sectors dispatched by this group */ /* total disk time and nr sectors dispatched by this group */
struct blkg_stat time; struct bfq_stat time;
/* sum of number of ios queued across all samples */ /* sum of number of ios queued across all samples */
struct blkg_stat avg_queue_size_sum; struct bfq_stat avg_queue_size_sum;
/* count of samples taken for average */ /* count of samples taken for average */
struct blkg_stat avg_queue_size_samples; struct bfq_stat avg_queue_size_samples;
/* how many times this group has been removed from service tree */ /* how many times this group has been removed from service tree */
struct blkg_stat dequeue; struct bfq_stat dequeue;
/* total time spent waiting for it to be assigned a timeslice. */ /* total time spent waiting for it to be assigned a timeslice. */
struct blkg_stat group_wait_time; struct bfq_stat group_wait_time;
/* time spent idling for this blkcg_gq */ /* time spent idling for this blkcg_gq */
struct blkg_stat idle_time; struct bfq_stat idle_time;
/* total time with empty current active q with other requests queued */ /* total time with empty current active q with other requests queued */
struct blkg_stat empty_time; struct bfq_stat empty_time;
/* fields after this shouldn't be cleared on stat reset */ /* fields after this shouldn't be cleared on stat reset */
u64 start_group_wait_time; u64 start_group_wait_time;
u64 start_idle_time; u64 start_idle_time;
u64 start_empty_time; u64 start_empty_time;
uint16_t flags; uint16_t flags;
#endif /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */ #endif /* CONFIG_BFQ_CGROUP_DEBUG */
}; };
#ifdef CONFIG_BFQ_GROUP_IOSCHED #ifdef CONFIG_BFQ_GROUP_IOSCHED

View File

@ -558,14 +558,6 @@ void bio_put(struct bio *bio)
} }
EXPORT_SYMBOL(bio_put); EXPORT_SYMBOL(bio_put);
int bio_phys_segments(struct request_queue *q, struct bio *bio)
{
if (unlikely(!bio_flagged(bio, BIO_SEG_VALID)))
blk_recount_segments(q, bio);
return bio->bi_phys_segments;
}
/** /**
* __bio_clone_fast - clone a bio that shares the original bio's biovec * __bio_clone_fast - clone a bio that shares the original bio's biovec
* @bio: destination bio * @bio: destination bio
@ -731,10 +723,10 @@ static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
} }
} }
if (bio_full(bio)) if (bio_full(bio, len))
return 0; return 0;
if (bio->bi_phys_segments >= queue_max_segments(q)) if (bio->bi_vcnt >= queue_max_segments(q))
return 0; return 0;
bvec = &bio->bi_io_vec[bio->bi_vcnt]; bvec = &bio->bi_io_vec[bio->bi_vcnt];
@ -744,8 +736,6 @@ static int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
bio->bi_vcnt++; bio->bi_vcnt++;
done: done:
bio->bi_iter.bi_size += len; bio->bi_iter.bi_size += len;
bio->bi_phys_segments = bio->bi_vcnt;
bio_set_flag(bio, BIO_SEG_VALID);
return len; return len;
} }
@ -807,7 +797,7 @@ void __bio_add_page(struct bio *bio, struct page *page,
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt]; struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt];
WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
WARN_ON_ONCE(bio_full(bio)); WARN_ON_ONCE(bio_full(bio, len));
bv->bv_page = page; bv->bv_page = page;
bv->bv_offset = off; bv->bv_offset = off;
@ -834,7 +824,7 @@ int bio_add_page(struct bio *bio, struct page *page,
bool same_page = false; bool same_page = false;
if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) { if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) {
if (bio_full(bio)) if (bio_full(bio, len))
return 0; return 0;
__bio_add_page(bio, page, len, offset); __bio_add_page(bio, page, len, offset);
} }
@ -842,22 +832,19 @@ int bio_add_page(struct bio *bio, struct page *page,
} }
EXPORT_SYMBOL(bio_add_page); EXPORT_SYMBOL(bio_add_page);
static void bio_get_pages(struct bio *bio) void bio_release_pages(struct bio *bio, bool mark_dirty)
{ {
struct bvec_iter_all iter_all; struct bvec_iter_all iter_all;
struct bio_vec *bvec; struct bio_vec *bvec;
bio_for_each_segment_all(bvec, bio, iter_all) if (bio_flagged(bio, BIO_NO_PAGE_REF))
get_page(bvec->bv_page); return;
}
static void bio_release_pages(struct bio *bio) bio_for_each_segment_all(bvec, bio, iter_all) {
{ if (mark_dirty && !PageCompound(bvec->bv_page))
struct bvec_iter_all iter_all; set_page_dirty_lock(bvec->bv_page);
struct bio_vec *bvec;
bio_for_each_segment_all(bvec, bio, iter_all)
put_page(bvec->bv_page); put_page(bvec->bv_page);
}
} }
static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter) static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
@ -922,7 +909,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
if (same_page) if (same_page)
put_page(page); put_page(page);
} else { } else {
if (WARN_ON_ONCE(bio_full(bio))) if (WARN_ON_ONCE(bio_full(bio, len)))
return -EINVAL; return -EINVAL;
__bio_add_page(bio, page, len, offset); __bio_add_page(bio, page, len, offset);
} }
@ -966,13 +953,10 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
ret = __bio_iov_bvec_add_pages(bio, iter); ret = __bio_iov_bvec_add_pages(bio, iter);
else else
ret = __bio_iov_iter_get_pages(bio, iter); ret = __bio_iov_iter_get_pages(bio, iter);
} while (!ret && iov_iter_count(iter) && !bio_full(bio)); } while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
if (iov_iter_bvec_no_ref(iter)) if (is_bvec)
bio_set_flag(bio, BIO_NO_PAGE_REF); bio_set_flag(bio, BIO_NO_PAGE_REF);
else if (is_bvec)
bio_get_pages(bio);
return bio->bi_vcnt ? 0 : ret; return bio->bi_vcnt ? 0 : ret;
} }
@ -1124,8 +1108,7 @@ static struct bio_map_data *bio_alloc_map_data(struct iov_iter *data,
if (data->nr_segs > UIO_MAXIOV) if (data->nr_segs > UIO_MAXIOV)
return NULL; return NULL;
bmd = kmalloc(sizeof(struct bio_map_data) + bmd = kmalloc(struct_size(bmd, iov, data->nr_segs), gfp_mask);
sizeof(struct iovec) * data->nr_segs, gfp_mask);
if (!bmd) if (!bmd)
return NULL; return NULL;
memcpy(bmd->iov, data->iov, sizeof(struct iovec) * data->nr_segs); memcpy(bmd->iov, data->iov, sizeof(struct iovec) * data->nr_segs);
@ -1371,8 +1354,6 @@ struct bio *bio_map_user_iov(struct request_queue *q,
int j; int j;
struct bio *bio; struct bio *bio;
int ret; int ret;
struct bio_vec *bvec;
struct bvec_iter_all iter_all;
if (!iov_iter_count(iter)) if (!iov_iter_count(iter))
return ERR_PTR(-EINVAL); return ERR_PTR(-EINVAL);
@ -1439,31 +1420,11 @@ struct bio *bio_map_user_iov(struct request_queue *q,
return bio; return bio;
out_unmap: out_unmap:
bio_for_each_segment_all(bvec, bio, iter_all) { bio_release_pages(bio, false);
put_page(bvec->bv_page);
}
bio_put(bio); bio_put(bio);
return ERR_PTR(ret); return ERR_PTR(ret);
} }
static void __bio_unmap_user(struct bio *bio)
{
struct bio_vec *bvec;
struct bvec_iter_all iter_all;
/*
* make sure we dirty pages we wrote to
*/
bio_for_each_segment_all(bvec, bio, iter_all) {
if (bio_data_dir(bio) == READ)
set_page_dirty_lock(bvec->bv_page);
put_page(bvec->bv_page);
}
bio_put(bio);
}
/** /**
* bio_unmap_user - unmap a bio * bio_unmap_user - unmap a bio
* @bio: the bio being unmapped * @bio: the bio being unmapped
@ -1475,7 +1436,8 @@ static void __bio_unmap_user(struct bio *bio)
*/ */
void bio_unmap_user(struct bio *bio) void bio_unmap_user(struct bio *bio)
{ {
__bio_unmap_user(bio); bio_release_pages(bio, bio_data_dir(bio) == READ);
bio_put(bio);
bio_put(bio); bio_put(bio);
} }
@ -1695,9 +1657,7 @@ static void bio_dirty_fn(struct work_struct *work)
while ((bio = next) != NULL) { while ((bio = next) != NULL) {
next = bio->bi_private; next = bio->bi_private;
bio_set_pages_dirty(bio); bio_release_pages(bio, true);
if (!bio_flagged(bio, BIO_NO_PAGE_REF))
bio_release_pages(bio);
bio_put(bio); bio_put(bio);
} }
} }
@ -1713,8 +1673,7 @@ void bio_check_pages_dirty(struct bio *bio)
goto defer; goto defer;
} }
if (!bio_flagged(bio, BIO_NO_PAGE_REF)) bio_release_pages(bio, false);
bio_release_pages(bio);
bio_put(bio); bio_put(bio);
return; return;
defer: defer:
@ -1775,18 +1734,6 @@ void generic_end_io_acct(struct request_queue *q, int req_op,
} }
EXPORT_SYMBOL(generic_end_io_acct); EXPORT_SYMBOL(generic_end_io_acct);
#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
void bio_flush_dcache_pages(struct bio *bi)
{
struct bio_vec bvec;
struct bvec_iter iter;
bio_for_each_segment(bvec, bi, iter)
flush_dcache_page(bvec.bv_page);
}
EXPORT_SYMBOL(bio_flush_dcache_pages);
#endif
static inline bool bio_remaining_done(struct bio *bio) static inline bool bio_remaining_done(struct bio *bio)
{ {
/* /*
@ -1914,10 +1861,7 @@ void bio_trim(struct bio *bio, int offset, int size)
if (offset == 0 && size == bio->bi_iter.bi_size) if (offset == 0 && size == bio->bi_iter.bi_size)
return; return;
bio_clear_flag(bio, BIO_SEG_VALID);
bio_advance(bio, offset << 9); bio_advance(bio, offset << 9);
bio->bi_iter.bi_size = size; bio->bi_iter.bi_size = size;
if (bio_integrity(bio)) if (bio_integrity(bio))
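
One net effect of the bio changes above is that completion paths no longer
open-code a bio_for_each_segment_all() loop with put_page() and
set_page_dirty_lock(); they call bio_release_pages(), which also honours the
BIO_NO_PAGE_REF flag internally. A minimal sketch of the resulting calling
convention follows; the handler name my_dio_bio_end_io is a hypothetical
placeholder, not code from this series:

	#include <linux/bio.h>

	/* Hypothetical end_io handler illustrating the reworked helper:
	 * for reads, dirty the user pages while dropping the page
	 * references, all in one call. When the bio carries
	 * BIO_NO_PAGE_REF (e.g. ITER_BVEC payloads), bio_release_pages()
	 * does nothing, so the caller no longer tests the flag itself.
	 */
	static void my_dio_bio_end_io(struct bio *bio)
	{
		bio_release_pages(bio, bio_data_dir(bio) == READ);
		bio_put(bio);
	}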

View File

@ -79,6 +79,7 @@ static void blkg_free(struct blkcg_gq *blkg)
blkg_rwstat_exit(&blkg->stat_ios); blkg_rwstat_exit(&blkg->stat_ios);
blkg_rwstat_exit(&blkg->stat_bytes); blkg_rwstat_exit(&blkg->stat_bytes);
percpu_ref_exit(&blkg->refcnt);
kfree(blkg); kfree(blkg);
} }
@ -86,8 +87,6 @@ static void __blkg_release(struct rcu_head *rcu)
{ {
struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head); struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);
percpu_ref_exit(&blkg->refcnt);
/* release the blkcg and parent blkg refs this blkg has been holding */ /* release the blkcg and parent blkg refs this blkg has been holding */
css_put(&blkg->blkcg->css); css_put(&blkg->blkcg->css);
if (blkg->parent) if (blkg->parent)
@ -132,6 +131,9 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
if (!blkg) if (!blkg)
return NULL; return NULL;
if (percpu_ref_init(&blkg->refcnt, blkg_release, 0, gfp_mask))
goto err_free;
if (blkg_rwstat_init(&blkg->stat_bytes, gfp_mask) || if (blkg_rwstat_init(&blkg->stat_bytes, gfp_mask) ||
blkg_rwstat_init(&blkg->stat_ios, gfp_mask)) blkg_rwstat_init(&blkg->stat_ios, gfp_mask))
goto err_free; goto err_free;
@ -244,11 +246,6 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
blkg_get(blkg->parent); blkg_get(blkg->parent);
} }
ret = percpu_ref_init(&blkg->refcnt, blkg_release, 0,
GFP_NOWAIT | __GFP_NOWARN);
if (ret)
goto err_cancel_ref;
/* invoke per-policy init */ /* invoke per-policy init */
for (i = 0; i < BLKCG_MAX_POLS; i++) { for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i]; struct blkcg_policy *pol = blkcg_policy[i];
@ -281,8 +278,6 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
blkg_put(blkg); blkg_put(blkg);
return ERR_PTR(ret); return ERR_PTR(ret);
err_cancel_ref:
percpu_ref_exit(&blkg->refcnt);
err_put_congested: err_put_congested:
wb_congested_put(wb_congested); wb_congested_put(wb_congested);
err_put_css: err_put_css:
@ -549,7 +544,7 @@ EXPORT_SYMBOL_GPL(__blkg_prfill_u64);
* Print @rwstat to @sf for the device assocaited with @pd. * Print @rwstat to @sf for the device assocaited with @pd.
*/ */
u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd, u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
const struct blkg_rwstat *rwstat) const struct blkg_rwstat_sample *rwstat)
{ {
static const char *rwstr[] = { static const char *rwstr[] = {
[BLKG_RWSTAT_READ] = "Read", [BLKG_RWSTAT_READ] = "Read",
@ -567,30 +562,16 @@ u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
for (i = 0; i < BLKG_RWSTAT_NR; i++) for (i = 0; i < BLKG_RWSTAT_NR; i++)
seq_printf(sf, "%s %s %llu\n", dname, rwstr[i], seq_printf(sf, "%s %s %llu\n", dname, rwstr[i],
(unsigned long long)atomic64_read(&rwstat->aux_cnt[i])); rwstat->cnt[i]);
v = atomic64_read(&rwstat->aux_cnt[BLKG_RWSTAT_READ]) + v = rwstat->cnt[BLKG_RWSTAT_READ] +
atomic64_read(&rwstat->aux_cnt[BLKG_RWSTAT_WRITE]) + rwstat->cnt[BLKG_RWSTAT_WRITE] +
atomic64_read(&rwstat->aux_cnt[BLKG_RWSTAT_DISCARD]); rwstat->cnt[BLKG_RWSTAT_DISCARD];
seq_printf(sf, "%s Total %llu\n", dname, (unsigned long long)v); seq_printf(sf, "%s Total %llu\n", dname, v);
return v; return v;
} }
EXPORT_SYMBOL_GPL(__blkg_prfill_rwstat); EXPORT_SYMBOL_GPL(__blkg_prfill_rwstat);
/**
* blkg_prfill_stat - prfill callback for blkg_stat
* @sf: seq_file to print to
* @pd: policy private data of interest
* @off: offset to the blkg_stat in @pd
*
* prfill callback for printing a blkg_stat.
*/
u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd, int off)
{
return __blkg_prfill_u64(sf, pd, blkg_stat_read((void *)pd + off));
}
EXPORT_SYMBOL_GPL(blkg_prfill_stat);
/** /**
* blkg_prfill_rwstat - prfill callback for blkg_rwstat * blkg_prfill_rwstat - prfill callback for blkg_rwstat
* @sf: seq_file to print to * @sf: seq_file to print to
@ -602,8 +583,9 @@ EXPORT_SYMBOL_GPL(blkg_prfill_stat);
u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd, u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
int off) int off)
{ {
struct blkg_rwstat rwstat = blkg_rwstat_read((void *)pd + off); struct blkg_rwstat_sample rwstat = { };
blkg_rwstat_read((void *)pd + off, &rwstat);
return __blkg_prfill_rwstat(sf, pd, &rwstat); return __blkg_prfill_rwstat(sf, pd, &rwstat);
} }
EXPORT_SYMBOL_GPL(blkg_prfill_rwstat); EXPORT_SYMBOL_GPL(blkg_prfill_rwstat);
@ -611,8 +593,9 @@ EXPORT_SYMBOL_GPL(blkg_prfill_rwstat);
static u64 blkg_prfill_rwstat_field(struct seq_file *sf, static u64 blkg_prfill_rwstat_field(struct seq_file *sf,
struct blkg_policy_data *pd, int off) struct blkg_policy_data *pd, int off)
{ {
struct blkg_rwstat rwstat = blkg_rwstat_read((void *)pd->blkg + off); struct blkg_rwstat_sample rwstat = { };
blkg_rwstat_read((void *)pd->blkg + off, &rwstat);
return __blkg_prfill_rwstat(sf, pd, &rwstat); return __blkg_prfill_rwstat(sf, pd, &rwstat);
} }
@ -654,8 +637,9 @@ static u64 blkg_prfill_rwstat_field_recursive(struct seq_file *sf,
struct blkg_policy_data *pd, struct blkg_policy_data *pd,
int off) int off)
{ {
struct blkg_rwstat rwstat = blkg_rwstat_recursive_sum(pd->blkg, struct blkg_rwstat_sample rwstat;
NULL, off);
blkg_rwstat_recursive_sum(pd->blkg, NULL, off, &rwstat);
return __blkg_prfill_rwstat(sf, pd, &rwstat); return __blkg_prfill_rwstat(sf, pd, &rwstat);
} }
@ -689,53 +673,12 @@ int blkg_print_stat_ios_recursive(struct seq_file *sf, void *v)
} }
EXPORT_SYMBOL_GPL(blkg_print_stat_ios_recursive); EXPORT_SYMBOL_GPL(blkg_print_stat_ios_recursive);
/**
* blkg_stat_recursive_sum - collect hierarchical blkg_stat
* @blkg: blkg of interest
* @pol: blkcg_policy which contains the blkg_stat
* @off: offset to the blkg_stat in blkg_policy_data or @blkg
*
* Collect the blkg_stat specified by @blkg, @pol and @off and all its
* online descendants and their aux counts. The caller must be holding the
* queue lock for online tests.
*
* If @pol is NULL, blkg_stat is at @off bytes into @blkg; otherwise, it is
* at @off bytes into @blkg's blkg_policy_data of the policy.
*/
u64 blkg_stat_recursive_sum(struct blkcg_gq *blkg,
struct blkcg_policy *pol, int off)
{
struct blkcg_gq *pos_blkg;
struct cgroup_subsys_state *pos_css;
u64 sum = 0;
lockdep_assert_held(&blkg->q->queue_lock);
rcu_read_lock();
blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
struct blkg_stat *stat;
if (!pos_blkg->online)
continue;
if (pol)
stat = (void *)blkg_to_pd(pos_blkg, pol) + off;
else
stat = (void *)blkg + off;
sum += blkg_stat_read(stat) + atomic64_read(&stat->aux_cnt);
}
rcu_read_unlock();
return sum;
}
EXPORT_SYMBOL_GPL(blkg_stat_recursive_sum);
/** /**
* blkg_rwstat_recursive_sum - collect hierarchical blkg_rwstat * blkg_rwstat_recursive_sum - collect hierarchical blkg_rwstat
* @blkg: blkg of interest * @blkg: blkg of interest
* @pol: blkcg_policy which contains the blkg_rwstat * @pol: blkcg_policy which contains the blkg_rwstat
* @off: offset to the blkg_rwstat in blkg_policy_data or @blkg * @off: offset to the blkg_rwstat in blkg_policy_data or @blkg
* @sum: blkg_rwstat_sample structure containing the results
* *
* Collect the blkg_rwstat specified by @blkg, @pol and @off and all its * Collect the blkg_rwstat specified by @blkg, @pol and @off and all its
* online descendants and their aux counts. The caller must be holding the * online descendants and their aux counts. The caller must be holding the
@ -744,13 +687,12 @@ EXPORT_SYMBOL_GPL(blkg_stat_recursive_sum);
* If @pol is NULL, blkg_rwstat is at @off bytes into @blkg; otherwise, it * If @pol is NULL, blkg_rwstat is at @off bytes into @blkg; otherwise, it
* is at @off bytes into @blkg's blkg_policy_data of the policy. * is at @off bytes into @blkg's blkg_policy_data of the policy.
*/ */
struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg, void blkg_rwstat_recursive_sum(struct blkcg_gq *blkg, struct blkcg_policy *pol,
struct blkcg_policy *pol, int off) int off, struct blkg_rwstat_sample *sum)
{ {
struct blkcg_gq *pos_blkg; struct blkcg_gq *pos_blkg;
struct cgroup_subsys_state *pos_css; struct cgroup_subsys_state *pos_css;
struct blkg_rwstat sum = { }; unsigned int i;
int i;
lockdep_assert_held(&blkg->q->queue_lock); lockdep_assert_held(&blkg->q->queue_lock);
@ -767,13 +709,9 @@ struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
rwstat = (void *)pos_blkg + off; rwstat = (void *)pos_blkg + off;
for (i = 0; i < BLKG_RWSTAT_NR; i++) for (i = 0; i < BLKG_RWSTAT_NR; i++)
atomic64_add(atomic64_read(&rwstat->aux_cnt[i]) + sum->cnt[i] = blkg_rwstat_read_counter(rwstat, i);
percpu_counter_sum_positive(&rwstat->cpu_cnt[i]),
&sum.aux_cnt[i]);
} }
rcu_read_unlock(); rcu_read_unlock();
return sum;
} }
EXPORT_SYMBOL_GPL(blkg_rwstat_recursive_sum); EXPORT_SYMBOL_GPL(blkg_rwstat_recursive_sum);
@@ -939,7 +877,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
        hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
                const char *dname;
                char *buf;
-               struct blkg_rwstat rwstat;
+               struct blkg_rwstat_sample rwstat;
                u64 rbytes, wbytes, rios, wios, dbytes, dios;
                size_t size = seq_get_buf(sf, &buf), off = 0;
                int i;
@@ -959,17 +897,17 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
                spin_lock_irq(&blkg->q->queue_lock);
-               rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
-                               offsetof(struct blkcg_gq, stat_bytes));
-               rbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
-               wbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
-               dbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_DISCARD]);
+               blkg_rwstat_recursive_sum(blkg, NULL,
+                               offsetof(struct blkcg_gq, stat_bytes), &rwstat);
+               rbytes = rwstat.cnt[BLKG_RWSTAT_READ];
+               wbytes = rwstat.cnt[BLKG_RWSTAT_WRITE];
+               dbytes = rwstat.cnt[BLKG_RWSTAT_DISCARD];
-               rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
-                               offsetof(struct blkcg_gq, stat_ios));
-               rios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
-               wios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
-               dios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_DISCARD]);
+               blkg_rwstat_recursive_sum(blkg, NULL,
+                               offsetof(struct blkcg_gq, stat_ios), &rwstat);
+               rios = rwstat.cnt[BLKG_RWSTAT_READ];
+               wios = rwstat.cnt[BLKG_RWSTAT_WRITE];
+               dios = rwstat.cnt[BLKG_RWSTAT_DISCARD];
                spin_unlock_irq(&blkg->q->queue_lock);
@@ -1006,8 +944,12 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
                }
next:
                if (has_stats) {
-                       off += scnprintf(buf+off, size-off, "\n");
-                       seq_commit(sf, off);
+                       if (off < size - 1) {
+                               off += scnprintf(buf+off, size-off, "\n");
+                               seq_commit(sf, off);
+                       } else {
+                               seq_commit(sf, -1);
+                       }
                }
        }
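The overflow handling added above follows the usual seq_file contract; a standalone sketch (not from the patch) of that pattern, with a hypothetical example_show():

static int example_show(struct seq_file *sf, void *v)
{
        char *buf;
        size_t size = seq_get_buf(sf, &buf), off = 0;   /* borrow the seq_file buffer */

        off += scnprintf(buf + off, size - off, "example line\n");
        if (off < size - 1)
                seq_commit(sf, off);    /* publish the bytes actually written */
        else
                seq_commit(sf, -1);     /* overflow: seq_file retries with a larger buffer */
        return 0;
}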
@@ -1391,7 +1333,8 @@ pd_prealloc:
        spin_lock_irq(&q->queue_lock);
-       list_for_each_entry(blkg, &q->blkg_list, q_node) {
+       /* blkg_list is pushed at the head, reverse walk to init parents first */
+       list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
                struct blkg_policy_data *pd;
                if (blkg->pd[pol->plid])


@@ -120,6 +120,42 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
}
EXPORT_SYMBOL(blk_rq_init);
#define REQ_OP_NAME(name) [REQ_OP_##name] = #name
static const char *const blk_op_name[] = {
REQ_OP_NAME(READ),
REQ_OP_NAME(WRITE),
REQ_OP_NAME(FLUSH),
REQ_OP_NAME(DISCARD),
REQ_OP_NAME(SECURE_ERASE),
REQ_OP_NAME(ZONE_RESET),
REQ_OP_NAME(WRITE_SAME),
REQ_OP_NAME(WRITE_ZEROES),
REQ_OP_NAME(SCSI_IN),
REQ_OP_NAME(SCSI_OUT),
REQ_OP_NAME(DRV_IN),
REQ_OP_NAME(DRV_OUT),
};
#undef REQ_OP_NAME
/**
* blk_op_str - Return string XXX in the REQ_OP_XXX.
* @op: REQ_OP_XXX.
*
* Description: Centralize block layer function to convert REQ_OP_XXX into
* string format. Useful in the debugging and tracing bio or request. For
* invalid REQ_OP_XXX it returns string "UNKNOWN".
*/
inline const char *blk_op_str(unsigned int op)
{
const char *op_str = "UNKNOWN";
if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
op_str = blk_op_name[op];
return op_str;
}
EXPORT_SYMBOL_GPL(blk_op_str);
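A minimal sketch (not from the patch) of the intended use of the new helper, turning a request's opcode into something printable in a driver's debug output:

static void example_log_request(struct request *rq)        /* hypothetical */
{
        pr_debug("queued %s request at sector %llu\n",
                 blk_op_str(req_op(rq)),
                 (unsigned long long)blk_rq_pos(rq));
}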
static const struct {
        int errno;
        const char *name;
@@ -167,18 +203,23 @@ int blk_status_to_errno(blk_status_t status)
}
EXPORT_SYMBOL_GPL(blk_status_to_errno);
-static void print_req_error(struct request *req, blk_status_t status)
+static void print_req_error(struct request *req, blk_status_t status,
+               const char *caller)
{
        int idx = (__force int)status;
        if (WARN_ON_ONCE(idx >= ARRAY_SIZE(blk_errors)))
                return;
-       printk_ratelimited(KERN_ERR "%s: %s error, dev %s, sector %llu flags %x\n",
-               __func__, blk_errors[idx].name,
-               req->rq_disk ? req->rq_disk->disk_name : "?",
-               (unsigned long long)blk_rq_pos(req),
-               req->cmd_flags);
+       printk_ratelimited(KERN_ERR
+               "%s: %s error, dev %s, sector %llu op 0x%x:(%s) flags 0x%x "
+               "phys_seg %u prio class %u\n",
+               caller, blk_errors[idx].name,
+               req->rq_disk ? req->rq_disk->disk_name : "?",
+               blk_rq_pos(req), req_op(req), blk_op_str(req_op(req)),
+               req->cmd_flags & ~REQ_OP_MASK,
+               req->nr_phys_segments,
+               IOPRIO_PRIO_CLASS(req->ioprio));
}
static void req_bio_endio(struct request *rq, struct bio *bio, static void req_bio_endio(struct request *rq, struct bio *bio,
@ -550,15 +591,15 @@ void blk_put_request(struct request *req)
} }
EXPORT_SYMBOL(blk_put_request); EXPORT_SYMBOL(blk_put_request);
bool bio_attempt_back_merge(struct request_queue *q, struct request *req, bool bio_attempt_back_merge(struct request *req, struct bio *bio,
struct bio *bio) unsigned int nr_segs)
{ {
const int ff = bio->bi_opf & REQ_FAILFAST_MASK; const int ff = bio->bi_opf & REQ_FAILFAST_MASK;
if (!ll_back_merge_fn(q, req, bio)) if (!ll_back_merge_fn(req, bio, nr_segs))
return false; return false;
trace_block_bio_backmerge(q, req, bio); trace_block_bio_backmerge(req->q, req, bio);
if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
blk_rq_set_mixed_merge(req); blk_rq_set_mixed_merge(req);
@ -571,15 +612,15 @@ bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
return true; return true;
} }
bool bio_attempt_front_merge(struct request_queue *q, struct request *req, bool bio_attempt_front_merge(struct request *req, struct bio *bio,
struct bio *bio) unsigned int nr_segs)
{ {
const int ff = bio->bi_opf & REQ_FAILFAST_MASK; const int ff = bio->bi_opf & REQ_FAILFAST_MASK;
if (!ll_front_merge_fn(q, req, bio)) if (!ll_front_merge_fn(req, bio, nr_segs))
return false; return false;
trace_block_bio_frontmerge(q, req, bio); trace_block_bio_frontmerge(req->q, req, bio);
if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
blk_rq_set_mixed_merge(req); blk_rq_set_mixed_merge(req);
@ -621,6 +662,7 @@ no_merge:
* blk_attempt_plug_merge - try to merge with %current's plugged list * blk_attempt_plug_merge - try to merge with %current's plugged list
* @q: request_queue new bio is being queued at * @q: request_queue new bio is being queued at
* @bio: new bio being queued * @bio: new bio being queued
* @nr_segs: number of segments in @bio
* @same_queue_rq: pointer to &struct request that gets filled in when * @same_queue_rq: pointer to &struct request that gets filled in when
* another request associated with @q is found on the plug list * another request associated with @q is found on the plug list
* (optional, may be %NULL) * (optional, may be %NULL)
@ -639,7 +681,7 @@ no_merge:
* Caller must ensure !blk_queue_nomerges(q) beforehand. * Caller must ensure !blk_queue_nomerges(q) beforehand.
*/ */
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio, bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
struct request **same_queue_rq) unsigned int nr_segs, struct request **same_queue_rq)
{ {
struct blk_plug *plug; struct blk_plug *plug;
struct request *rq; struct request *rq;
@ -668,10 +710,10 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
switch (blk_try_merge(rq, bio)) { switch (blk_try_merge(rq, bio)) {
case ELEVATOR_BACK_MERGE: case ELEVATOR_BACK_MERGE:
merged = bio_attempt_back_merge(q, rq, bio); merged = bio_attempt_back_merge(rq, bio, nr_segs);
break; break;
case ELEVATOR_FRONT_MERGE: case ELEVATOR_FRONT_MERGE:
merged = bio_attempt_front_merge(q, rq, bio); merged = bio_attempt_front_merge(rq, bio, nr_segs);
break; break;
case ELEVATOR_DISCARD_MERGE: case ELEVATOR_DISCARD_MERGE:
merged = bio_attempt_discard_merge(q, rq, bio); merged = bio_attempt_discard_merge(q, rq, bio);
@ -687,18 +729,6 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
return false; return false;
} }
void blk_init_request_from_bio(struct request *req, struct bio *bio)
{
if (bio->bi_opf & REQ_RAHEAD)
req->cmd_flags |= REQ_FAILFAST_MASK;
req->__sector = bio->bi_iter.bi_sector;
req->ioprio = bio_prio(bio);
req->write_hint = bio->bi_write_hint;
blk_rq_bio_prep(req->q, req, bio);
}
EXPORT_SYMBOL_GPL(blk_init_request_from_bio);
static void handle_bad_sector(struct bio *bio, sector_t maxsector) static void handle_bad_sector(struct bio *bio, sector_t maxsector)
{ {
char b[BDEVNAME_SIZE]; char b[BDEVNAME_SIZE];
@ -1163,7 +1193,7 @@ static int blk_cloned_rq_check_limits(struct request_queue *q,
* Recalculate it to check the request correctly on this queue's * Recalculate it to check the request correctly on this queue's
* limitation. * limitation.
*/ */
blk_recalc_rq_segments(rq); rq->nr_phys_segments = blk_recalc_rq_segments(rq);
if (rq->nr_phys_segments > queue_max_segments(q)) { if (rq->nr_phys_segments > queue_max_segments(q)) {
printk(KERN_ERR "%s: over max segments limit. (%hu > %hu)\n", printk(KERN_ERR "%s: over max segments limit. (%hu > %hu)\n",
__func__, rq->nr_phys_segments, queue_max_segments(q)); __func__, rq->nr_phys_segments, queue_max_segments(q));
@ -1348,7 +1378,7 @@ EXPORT_SYMBOL_GPL(blk_steal_bios);
* *
* This special helper function is only for request stacking drivers * This special helper function is only for request stacking drivers
* (e.g. request-based dm) so that they can handle partial completion. * (e.g. request-based dm) so that they can handle partial completion.
* Actual device drivers should use blk_end_request instead. * Actual device drivers should use blk_mq_end_request instead.
* *
* Passing the result of blk_rq_bytes() as @nr_bytes guarantees * Passing the result of blk_rq_bytes() as @nr_bytes guarantees
* %false return from this function. * %false return from this function.
@ -1373,7 +1403,7 @@ bool blk_update_request(struct request *req, blk_status_t error,
if (unlikely(error && !blk_rq_is_passthrough(req) && if (unlikely(error && !blk_rq_is_passthrough(req) &&
!(req->rq_flags & RQF_QUIET))) !(req->rq_flags & RQF_QUIET)))
print_req_error(req, error); print_req_error(req, error, __func__);
blk_account_io_completion(req, nr_bytes); blk_account_io_completion(req, nr_bytes);
@ -1432,28 +1462,13 @@ bool blk_update_request(struct request *req, blk_status_t error,
} }
/* recalculate the number of segments */ /* recalculate the number of segments */
blk_recalc_rq_segments(req); req->nr_phys_segments = blk_recalc_rq_segments(req);
} }
return true; return true;
} }
EXPORT_SYMBOL_GPL(blk_update_request); EXPORT_SYMBOL_GPL(blk_update_request);
void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio)
{
if (bio_has_data(bio))
rq->nr_phys_segments = bio_phys_segments(q, bio);
else if (bio_op(bio) == REQ_OP_DISCARD)
rq->nr_phys_segments = 1;
rq->__data_len = bio->bi_iter.bi_size;
rq->bio = rq->biotail = bio;
if (bio->bi_disk)
rq->rq_disk = bio->bi_disk;
}
#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
/** /**
* rq_flush_dcache_pages - Helper function to flush all pages in a request * rq_flush_dcache_pages - Helper function to flush all pages in a request


@@ -618,44 +618,26 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
                inflight = atomic_dec_return(&rqw->inflight);
                WARN_ON_ONCE(inflight < 0);
-               if (iolat->min_lat_nsec == 0)
-                       goto next;
-               iolatency_record_time(iolat, &bio->bi_issue, now,
-                                     issue_as_root);
-               window_start = atomic64_read(&iolat->window_start);
-               if (now > window_start &&
-                   (now - window_start) >= iolat->cur_win_nsec) {
-                       if (atomic64_cmpxchg(&iolat->window_start,
-                                       window_start, now) == window_start)
-                               iolatency_check_latencies(iolat, now);
+               /*
+                * If bi_status is BLK_STS_AGAIN, the bio wasn't actually
+                * submitted, so do not account for it.
+                */
+               if (iolat->min_lat_nsec && bio->bi_status != BLK_STS_AGAIN) {
+                       iolatency_record_time(iolat, &bio->bi_issue, now,
+                                             issue_as_root);
+                       window_start = atomic64_read(&iolat->window_start);
+                       if (now > window_start &&
+                           (now - window_start) >= iolat->cur_win_nsec) {
+                               if (atomic64_cmpxchg(&iolat->window_start,
+                                               window_start, now) == window_start)
+                                       iolatency_check_latencies(iolat, now);
+                       }
                }
-next:
                wake_up(&rqw->wait);
                blkg = blkg->parent;
        }
}
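For context, BLK_STS_AGAIN marks a REQ_NOWAIT bio that was ended early instead of being issued, which is exactly the case the hunk above stops accounting. A hedged sketch (not from the patch) of a completion handler that distinguishes it:

static void example_end_io(struct bio *bio)                /* hypothetical */
{
        if (bio->bi_status == BLK_STS_AGAIN) {
                /* never reached the device; the caller may retry without REQ_NOWAIT */
                pr_debug("nowait bio would have blocked\n");
        }
        bio_put(bio);
}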
static void blkcg_iolatency_cleanup(struct rq_qos *rqos, struct bio *bio)
{
struct blkcg_gq *blkg;
blkg = bio->bi_blkg;
while (blkg && blkg->parent) {
struct rq_wait *rqw;
struct iolatency_grp *iolat;
iolat = blkg_to_lat(blkg);
if (!iolat)
goto next;
rqw = &iolat->rq_wait;
atomic_dec(&rqw->inflight);
wake_up(&rqw->wait);
next:
blkg = blkg->parent;
}
}
static void blkcg_iolatency_exit(struct rq_qos *rqos) static void blkcg_iolatency_exit(struct rq_qos *rqos)
{ {
struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos); struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
@ -667,7 +649,6 @@ static void blkcg_iolatency_exit(struct rq_qos *rqos)
static struct rq_qos_ops blkcg_iolatency_ops = { static struct rq_qos_ops blkcg_iolatency_ops = {
.throttle = blkcg_iolatency_throttle, .throttle = blkcg_iolatency_throttle,
.cleanup = blkcg_iolatency_cleanup,
.done_bio = blkcg_iolatency_done_bio, .done_bio = blkcg_iolatency_done_bio,
.exit = blkcg_iolatency_exit, .exit = blkcg_iolatency_exit,
}; };
@@ -778,8 +759,10 @@ static int iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val)
        if (!oldval && val)
                return 1;
-       if (oldval && !val)
+       if (oldval && !val) {
+               blkcg_clear_delay(blkg);
                return -1;
+       }
        return 0;
}


@@ -18,13 +18,19 @@
int blk_rq_append_bio(struct request *rq, struct bio **bio)
{
        struct bio *orig_bio = *bio;
+       struct bvec_iter iter;
+       struct bio_vec bv;
+       unsigned int nr_segs = 0;
        blk_queue_bounce(rq->q, bio);
+       bio_for_each_bvec(bv, *bio, iter)
+               nr_segs++;
        if (!rq->bio) {
-               blk_rq_bio_prep(rq->q, rq, *bio);
+               blk_rq_bio_prep(rq, *bio, nr_segs);
        } else {
-               if (!ll_back_merge_fn(rq->q, rq, *bio)) {
+               if (!ll_back_merge_fn(rq, *bio, nr_segs)) {
                        if (orig_bio != *bio) {
                                bio_put(*bio);
                                *bio = orig_bio;
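The counting loop added above is the general idiom for deriving nr_segs from a bio; as a standalone sketch (not from the patch):

static unsigned int example_count_bvecs(struct bio *bio)   /* hypothetical */
{
        struct bvec_iter iter;
        struct bio_vec bv;
        unsigned int nr_segs = 0;

        bio_for_each_bvec(bv, bio, iter)        /* walks multi-page bvecs */
                nr_segs++;
        return nr_segs;
}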


@@ -105,7 +105,7 @@ static struct bio *blk_bio_discard_split(struct request_queue *q,
static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
                struct bio *bio, struct bio_set *bs, unsigned *nsegs)
{
-       *nsegs = 1;
+       *nsegs = 0;
        if (!q->limits.max_write_zeroes_sectors)
                return NULL;
@@ -202,8 +202,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
        struct bio_vec bv, bvprv, *bvprvp = NULL;
        struct bvec_iter iter;
        unsigned nsegs = 0, sectors = 0;
-       bool do_split = true;
-       struct bio *new = NULL;
        const unsigned max_sectors = get_max_io_size(q, bio);
        const unsigned max_segs = queue_max_segments(q);
@@ -245,45 +243,36 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
                }
        }
-       do_split = false;
+       *segs = nsegs;
+       return NULL;
split:
        *segs = nsegs;
-       if (do_split) {
-               new = bio_split(bio, sectors, GFP_NOIO, bs);
-               if (new)
-                       bio = new;
-       }
-       return do_split ? new : NULL;
+       return bio_split(bio, sectors, GFP_NOIO, bs);
}
-void blk_queue_split(struct request_queue *q, struct bio **bio)
+void __blk_queue_split(struct request_queue *q, struct bio **bio,
+               unsigned int *nr_segs)
{
-       struct bio *split, *res;
-       unsigned nsegs;
+       struct bio *split;
        switch (bio_op(*bio)) {
        case REQ_OP_DISCARD:
        case REQ_OP_SECURE_ERASE:
-               split = blk_bio_discard_split(q, *bio, &q->bio_split, &nsegs);
+               split = blk_bio_discard_split(q, *bio, &q->bio_split, nr_segs);
                break;
        case REQ_OP_WRITE_ZEROES:
-               split = blk_bio_write_zeroes_split(q, *bio, &q->bio_split, &nsegs);
+               split = blk_bio_write_zeroes_split(q, *bio, &q->bio_split,
+                               nr_segs);
                break;
        case REQ_OP_WRITE_SAME:
-               split = blk_bio_write_same_split(q, *bio, &q->bio_split, &nsegs);
+               split = blk_bio_write_same_split(q, *bio, &q->bio_split,
+                               nr_segs);
                break;
        default:
-               split = blk_bio_segment_split(q, *bio, &q->bio_split, &nsegs);
+               split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
                break;
        }
-       /* physical segments can be figured out during splitting */
-       res = split ? split : *bio;
-       res->bi_phys_segments = nsegs;
-       bio_set_flag(res, BIO_SEG_VALID);
        if (split) {
                /* there isn't chance to merge the splitted bio */
                split->bi_opf |= REQ_NOMERGE;
@@ -304,19 +293,25 @@ void blk_queue_split(struct request_queue *q, struct bio **bio)
                *bio = split;
        }
}
void blk_queue_split(struct request_queue *q, struct bio **bio)
{
unsigned int nr_segs;
__blk_queue_split(q, bio, &nr_segs);
}
EXPORT_SYMBOL(blk_queue_split);
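Drivers keep calling the exported wrapper and never see the segment count; a minimal sketch (not from the patch) of a bio-based driver's make_request path under that assumption:

static blk_qc_t example_make_request(struct request_queue *q, struct bio *bio)
{
        blk_queue_split(q, &bio);       /* may replace *bio with the front split */
        /* ... map and complete the (possibly split) bio ... */
        bio_endio(bio);
        return BLK_QC_T_NONE;
}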
-static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
-               struct bio *bio)
+unsigned int blk_recalc_rq_segments(struct request *rq)
{
        unsigned int nr_phys_segs = 0;
-       struct bvec_iter iter;
+       struct req_iterator iter;
        struct bio_vec bv;
-       if (!bio)
+       if (!rq->bio)
                return 0;
-       switch (bio_op(bio)) {
+       switch (bio_op(rq->bio)) {
        case REQ_OP_DISCARD:
        case REQ_OP_SECURE_ERASE:
        case REQ_OP_WRITE_ZEROES:
@@ -325,30 +320,11 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
                return 1;
        }
-       for_each_bio(bio) {
-               bio_for_each_bvec(bv, bio, iter)
-                       bvec_split_segs(q, &bv, &nr_phys_segs, NULL, UINT_MAX);
-       }
+       rq_for_each_bvec(bv, rq, iter)
+               bvec_split_segs(rq->q, &bv, &nr_phys_segs, NULL, UINT_MAX);
        return nr_phys_segs;
}
void blk_recalc_rq_segments(struct request *rq)
{
rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio);
}
void blk_recount_segments(struct request_queue *q, struct bio *bio)
{
struct bio *nxt = bio->bi_next;
bio->bi_next = NULL;
bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio);
bio->bi_next = nxt;
bio_set_flag(bio, BIO_SEG_VALID);
}
static inline struct scatterlist *blk_next_sg(struct scatterlist **sg, static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
struct scatterlist *sglist) struct scatterlist *sglist)
{ {
@ -519,16 +495,13 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
} }
EXPORT_SYMBOL(blk_rq_map_sg); EXPORT_SYMBOL(blk_rq_map_sg);
static inline int ll_new_hw_segment(struct request_queue *q, static inline int ll_new_hw_segment(struct request *req, struct bio *bio,
struct request *req, unsigned int nr_phys_segs)
struct bio *bio)
{ {
int nr_phys_segs = bio_phys_segments(q, bio); if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(req->q))
if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(q))
goto no_merge; goto no_merge;
if (blk_integrity_merge_bio(q, req, bio) == false) if (blk_integrity_merge_bio(req->q, req, bio) == false)
goto no_merge; goto no_merge;
/* /*
@ -539,12 +512,11 @@ static inline int ll_new_hw_segment(struct request_queue *q,
return 1; return 1;
no_merge: no_merge:
req_set_nomerge(q, req); req_set_nomerge(req->q, req);
return 0; return 0;
} }
int ll_back_merge_fn(struct request_queue *q, struct request *req, int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs)
struct bio *bio)
{ {
if (req_gap_back_merge(req, bio)) if (req_gap_back_merge(req, bio))
return 0; return 0;
@ -553,21 +525,15 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req,
return 0; return 0;
if (blk_rq_sectors(req) + bio_sectors(bio) > if (blk_rq_sectors(req) + bio_sectors(bio) >
blk_rq_get_max_sectors(req, blk_rq_pos(req))) { blk_rq_get_max_sectors(req, blk_rq_pos(req))) {
req_set_nomerge(q, req); req_set_nomerge(req->q, req);
return 0; return 0;
} }
if (!bio_flagged(req->biotail, BIO_SEG_VALID))
blk_recount_segments(q, req->biotail);
if (!bio_flagged(bio, BIO_SEG_VALID))
blk_recount_segments(q, bio);
return ll_new_hw_segment(q, req, bio); return ll_new_hw_segment(req, bio, nr_segs);
} }
int ll_front_merge_fn(struct request_queue *q, struct request *req, int ll_front_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs)
struct bio *bio)
{ {
if (req_gap_front_merge(req, bio)) if (req_gap_front_merge(req, bio))
return 0; return 0;
if (blk_integrity_rq(req) && if (blk_integrity_rq(req) &&
@ -575,15 +541,11 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
return 0; return 0;
if (blk_rq_sectors(req) + bio_sectors(bio) > if (blk_rq_sectors(req) + bio_sectors(bio) >
blk_rq_get_max_sectors(req, bio->bi_iter.bi_sector)) { blk_rq_get_max_sectors(req, bio->bi_iter.bi_sector)) {
req_set_nomerge(q, req); req_set_nomerge(req->q, req);
return 0; return 0;
} }
if (!bio_flagged(bio, BIO_SEG_VALID))
blk_recount_segments(q, bio);
if (!bio_flagged(req->bio, BIO_SEG_VALID))
blk_recount_segments(q, req->bio);
return ll_new_hw_segment(q, req, bio); return ll_new_hw_segment(req, bio, nr_segs);
} }
static bool req_attempt_discard_merge(struct request_queue *q, struct request *req, static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,


@ -17,7 +17,7 @@
static void print_stat(struct seq_file *m, struct blk_rq_stat *stat) static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
{ {
if (stat->nr_samples) { if (stat->nr_samples) {
seq_printf(m, "samples=%d, mean=%lld, min=%llu, max=%llu", seq_printf(m, "samples=%d, mean=%llu, min=%llu, max=%llu",
stat->nr_samples, stat->mean, stat->min, stat->max); stat->nr_samples, stat->mean, stat->min, stat->max);
} else { } else {
seq_puts(m, "samples=0"); seq_puts(m, "samples=0");
@ -29,13 +29,13 @@ static int queue_poll_stat_show(void *data, struct seq_file *m)
struct request_queue *q = data; struct request_queue *q = data;
int bucket; int bucket;
for (bucket = 0; bucket < BLK_MQ_POLL_STATS_BKTS/2; bucket++) { for (bucket = 0; bucket < (BLK_MQ_POLL_STATS_BKTS / 2); bucket++) {
seq_printf(m, "read (%d Bytes): ", 1 << (9+bucket)); seq_printf(m, "read (%d Bytes): ", 1 << (9 + bucket));
print_stat(m, &q->poll_stat[2*bucket]); print_stat(m, &q->poll_stat[2 * bucket]);
seq_puts(m, "\n"); seq_puts(m, "\n");
seq_printf(m, "write (%d Bytes): ", 1 << (9+bucket)); seq_printf(m, "write (%d Bytes): ", 1 << (9 + bucket));
print_stat(m, &q->poll_stat[2*bucket+1]); print_stat(m, &q->poll_stat[2 * bucket + 1]);
seq_puts(m, "\n"); seq_puts(m, "\n");
} }
return 0; return 0;
@ -261,23 +261,6 @@ static int hctx_flags_show(void *data, struct seq_file *m)
return 0; return 0;
} }
#define REQ_OP_NAME(name) [REQ_OP_##name] = #name
static const char *const op_name[] = {
REQ_OP_NAME(READ),
REQ_OP_NAME(WRITE),
REQ_OP_NAME(FLUSH),
REQ_OP_NAME(DISCARD),
REQ_OP_NAME(SECURE_ERASE),
REQ_OP_NAME(ZONE_RESET),
REQ_OP_NAME(WRITE_SAME),
REQ_OP_NAME(WRITE_ZEROES),
REQ_OP_NAME(SCSI_IN),
REQ_OP_NAME(SCSI_OUT),
REQ_OP_NAME(DRV_IN),
REQ_OP_NAME(DRV_OUT),
};
#undef REQ_OP_NAME
#define CMD_FLAG_NAME(name) [__REQ_##name] = #name
static const char *const cmd_flag_name[] = {
        CMD_FLAG_NAME(FAILFAST_DEV),
@@ -341,13 +324,14 @@ static const char *blk_mq_rq_state_name(enum mq_rq_state rq_state)
int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq)
{
        const struct blk_mq_ops *const mq_ops = rq->q->mq_ops;
-       const unsigned int op = rq->cmd_flags & REQ_OP_MASK;
+       const unsigned int op = req_op(rq);
+       const char *op_str = blk_op_str(op);
        seq_printf(m, "%p {.op=", rq);
-       if (op < ARRAY_SIZE(op_name) && op_name[op])
-               seq_printf(m, "%s", op_name[op]);
+       if (strcmp(op_str, "UNKNOWN") == 0)
+               seq_printf(m, "%u", op);
        else
-               seq_printf(m, "%d", op);
+               seq_printf(m, "%s", op_str);
        seq_puts(m, ", .cmd_flags=");
        blk_flags_show(m, rq->cmd_flags & ~REQ_OP_MASK, cmd_flag_name,
                       ARRAY_SIZE(cmd_flag_name));
@@ -779,8 +763,8 @@ static int blk_mq_debugfs_release(struct inode *inode, struct file *file)
        if (attr->show)
                return single_release(inode, file);
-       else
-               return seq_release(inode, file);
+
+       return seq_release(inode, file);
}
static const struct file_operations blk_mq_debugfs_fops = {


@ -224,7 +224,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
} }
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio, bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
struct request **merged_request) unsigned int nr_segs, struct request **merged_request)
{ {
struct request *rq; struct request *rq;
@ -232,7 +232,7 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
case ELEVATOR_BACK_MERGE: case ELEVATOR_BACK_MERGE:
if (!blk_mq_sched_allow_merge(q, rq, bio)) if (!blk_mq_sched_allow_merge(q, rq, bio))
return false; return false;
if (!bio_attempt_back_merge(q, rq, bio)) if (!bio_attempt_back_merge(rq, bio, nr_segs))
return false; return false;
*merged_request = attempt_back_merge(q, rq); *merged_request = attempt_back_merge(q, rq);
if (!*merged_request) if (!*merged_request)
@ -241,7 +241,7 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
case ELEVATOR_FRONT_MERGE: case ELEVATOR_FRONT_MERGE:
if (!blk_mq_sched_allow_merge(q, rq, bio)) if (!blk_mq_sched_allow_merge(q, rq, bio))
return false; return false;
if (!bio_attempt_front_merge(q, rq, bio)) if (!bio_attempt_front_merge(rq, bio, nr_segs))
return false; return false;
*merged_request = attempt_front_merge(q, rq); *merged_request = attempt_front_merge(q, rq);
if (!*merged_request) if (!*merged_request)
@ -260,7 +260,7 @@ EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
* of them. * of them.
*/ */
bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list, bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
struct bio *bio) struct bio *bio, unsigned int nr_segs)
{ {
struct request *rq; struct request *rq;
int checked = 8; int checked = 8;
@ -277,11 +277,13 @@ bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
switch (blk_try_merge(rq, bio)) { switch (blk_try_merge(rq, bio)) {
case ELEVATOR_BACK_MERGE: case ELEVATOR_BACK_MERGE:
if (blk_mq_sched_allow_merge(q, rq, bio)) if (blk_mq_sched_allow_merge(q, rq, bio))
merged = bio_attempt_back_merge(q, rq, bio); merged = bio_attempt_back_merge(rq, bio,
nr_segs);
break; break;
case ELEVATOR_FRONT_MERGE: case ELEVATOR_FRONT_MERGE:
if (blk_mq_sched_allow_merge(q, rq, bio)) if (blk_mq_sched_allow_merge(q, rq, bio))
merged = bio_attempt_front_merge(q, rq, bio); merged = bio_attempt_front_merge(rq, bio,
nr_segs);
break; break;
case ELEVATOR_DISCARD_MERGE: case ELEVATOR_DISCARD_MERGE:
merged = bio_attempt_discard_merge(q, rq, bio); merged = bio_attempt_discard_merge(q, rq, bio);
@ -304,13 +306,14 @@ EXPORT_SYMBOL_GPL(blk_mq_bio_list_merge);
*/ */
static bool blk_mq_attempt_merge(struct request_queue *q, static bool blk_mq_attempt_merge(struct request_queue *q,
struct blk_mq_hw_ctx *hctx, struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx, struct bio *bio) struct blk_mq_ctx *ctx, struct bio *bio,
unsigned int nr_segs)
{ {
enum hctx_type type = hctx->type; enum hctx_type type = hctx->type;
lockdep_assert_held(&ctx->lock); lockdep_assert_held(&ctx->lock);
if (blk_mq_bio_list_merge(q, &ctx->rq_lists[type], bio)) { if (blk_mq_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs)) {
ctx->rq_merged++; ctx->rq_merged++;
return true; return true;
} }
@ -318,7 +321,8 @@ static bool blk_mq_attempt_merge(struct request_queue *q,
return false; return false;
} }
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio) bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs)
{ {
struct elevator_queue *e = q->elevator; struct elevator_queue *e = q->elevator;
struct blk_mq_ctx *ctx = blk_mq_get_ctx(q); struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
@ -326,21 +330,18 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
bool ret = false; bool ret = false;
enum hctx_type type; enum hctx_type type;
if (e && e->type->ops.bio_merge) { if (e && e->type->ops.bio_merge)
blk_mq_put_ctx(ctx); return e->type->ops.bio_merge(hctx, bio, nr_segs);
return e->type->ops.bio_merge(hctx, bio);
}
type = hctx->type; type = hctx->type;
if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) && if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
!list_empty_careful(&ctx->rq_lists[type])) { !list_empty_careful(&ctx->rq_lists[type])) {
/* default per sw-queue merge */ /* default per sw-queue merge */
spin_lock(&ctx->lock); spin_lock(&ctx->lock);
ret = blk_mq_attempt_merge(q, hctx, ctx, bio); ret = blk_mq_attempt_merge(q, hctx, ctx, bio, nr_segs);
spin_unlock(&ctx->lock); spin_unlock(&ctx->lock);
} }
blk_mq_put_ctx(ctx);
return ret; return ret;
} }


@ -12,8 +12,9 @@ void blk_mq_sched_assign_ioc(struct request *rq);
void blk_mq_sched_request_inserted(struct request *rq); void blk_mq_sched_request_inserted(struct request *rq);
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio, bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
struct request **merged_request); unsigned int nr_segs, struct request **merged_request);
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio); bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs);
bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq); bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx); void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx); void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
@ -31,12 +32,13 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
void blk_mq_sched_free_requests(struct request_queue *q); void blk_mq_sched_free_requests(struct request_queue *q);
static inline bool static inline bool
blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio) blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs)
{ {
if (blk_queue_nomerges(q) || !bio_mergeable(bio)) if (blk_queue_nomerges(q) || !bio_mergeable(bio))
return false; return false;
return __blk_mq_sched_bio_merge(q, bio); return __blk_mq_sched_bio_merge(q, bio, nr_segs);
} }
static inline bool static inline bool


@ -113,7 +113,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
struct sbq_wait_state *ws; struct sbq_wait_state *ws;
DEFINE_SBQ_WAIT(wait); DEFINE_SBQ_WAIT(wait);
unsigned int tag_offset; unsigned int tag_offset;
bool drop_ctx;
int tag; int tag;
if (data->flags & BLK_MQ_REQ_RESERVED) { if (data->flags & BLK_MQ_REQ_RESERVED) {
@ -136,7 +135,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
return BLK_MQ_TAG_FAIL; return BLK_MQ_TAG_FAIL;
ws = bt_wait_ptr(bt, data->hctx); ws = bt_wait_ptr(bt, data->hctx);
drop_ctx = data->ctx == NULL;
do { do {
struct sbitmap_queue *bt_prev; struct sbitmap_queue *bt_prev;
@ -161,9 +159,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
if (tag != -1) if (tag != -1)
break; break;
if (data->ctx)
blk_mq_put_ctx(data->ctx);
bt_prev = bt; bt_prev = bt;
io_schedule(); io_schedule();
@ -189,9 +184,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
ws = bt_wait_ptr(bt, data->hctx); ws = bt_wait_ptr(bt, data->hctx);
} while (1); } while (1);
if (drop_ctx && data->ctx)
blk_mq_put_ctx(data->ctx);
sbitmap_finish_wait(bt, ws, &wait); sbitmap_finish_wait(bt, ws, &wait);
found_tag: found_tag:


@ -355,13 +355,13 @@ static struct request *blk_mq_get_request(struct request_queue *q,
struct elevator_queue *e = q->elevator; struct elevator_queue *e = q->elevator;
struct request *rq; struct request *rq;
unsigned int tag; unsigned int tag;
bool put_ctx_on_error = false; bool clear_ctx_on_error = false;
blk_queue_enter_live(q); blk_queue_enter_live(q);
data->q = q; data->q = q;
if (likely(!data->ctx)) { if (likely(!data->ctx)) {
data->ctx = blk_mq_get_ctx(q); data->ctx = blk_mq_get_ctx(q);
put_ctx_on_error = true; clear_ctx_on_error = true;
} }
if (likely(!data->hctx)) if (likely(!data->hctx))
data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->hctx = blk_mq_map_queue(q, data->cmd_flags,
@ -387,10 +387,8 @@ static struct request *blk_mq_get_request(struct request_queue *q,
tag = blk_mq_get_tag(data); tag = blk_mq_get_tag(data);
if (tag == BLK_MQ_TAG_FAIL) { if (tag == BLK_MQ_TAG_FAIL) {
if (put_ctx_on_error) { if (clear_ctx_on_error)
blk_mq_put_ctx(data->ctx);
data->ctx = NULL; data->ctx = NULL;
}
blk_queue_exit(q); blk_queue_exit(q);
return NULL; return NULL;
} }
@ -427,8 +425,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
if (!rq) if (!rq)
return ERR_PTR(-EWOULDBLOCK); return ERR_PTR(-EWOULDBLOCK);
blk_mq_put_ctx(alloc_data.ctx);
rq->__data_len = 0; rq->__data_len = 0;
rq->__sector = (sector_t) -1; rq->__sector = (sector_t) -1;
rq->bio = rq->biotail = NULL; rq->bio = rq->biotail = NULL;
@ -1764,9 +1760,15 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
} }
} }
static void blk_mq_bio_to_request(struct request *rq, struct bio *bio) static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
unsigned int nr_segs)
{ {
blk_init_request_from_bio(rq, bio); if (bio->bi_opf & REQ_RAHEAD)
rq->cmd_flags |= REQ_FAILFAST_MASK;
rq->__sector = bio->bi_iter.bi_sector;
rq->write_hint = bio->bi_write_hint;
blk_rq_bio_prep(rq, bio, nr_segs);
blk_account_io_start(rq, true); blk_account_io_start(rq, true);
} }
@ -1936,20 +1938,20 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
struct request *rq; struct request *rq;
struct blk_plug *plug; struct blk_plug *plug;
struct request *same_queue_rq = NULL; struct request *same_queue_rq = NULL;
unsigned int nr_segs;
blk_qc_t cookie; blk_qc_t cookie;
blk_queue_bounce(q, &bio); blk_queue_bounce(q, &bio);
__blk_queue_split(q, &bio, &nr_segs);
blk_queue_split(q, &bio);
if (!bio_integrity_prep(bio)) if (!bio_integrity_prep(bio))
return BLK_QC_T_NONE; return BLK_QC_T_NONE;
if (!is_flush_fua && !blk_queue_nomerges(q) && if (!is_flush_fua && !blk_queue_nomerges(q) &&
blk_attempt_plug_merge(q, bio, &same_queue_rq)) blk_attempt_plug_merge(q, bio, nr_segs, &same_queue_rq))
return BLK_QC_T_NONE; return BLK_QC_T_NONE;
if (blk_mq_sched_bio_merge(q, bio)) if (blk_mq_sched_bio_merge(q, bio, nr_segs))
return BLK_QC_T_NONE; return BLK_QC_T_NONE;
rq_qos_throttle(q, bio); rq_qos_throttle(q, bio);
@ -1969,11 +1971,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
cookie = request_to_qc_t(data.hctx, rq); cookie = request_to_qc_t(data.hctx, rq);
blk_mq_bio_to_request(rq, bio, nr_segs);
plug = current->plug; plug = current->plug;
if (unlikely(is_flush_fua)) { if (unlikely(is_flush_fua)) {
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
/* bypass scheduler for flush rq */ /* bypass scheduler for flush rq */
blk_insert_flush(rq); blk_insert_flush(rq);
blk_mq_run_hw_queue(data.hctx, true); blk_mq_run_hw_queue(data.hctx, true);
@ -1985,9 +1986,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
unsigned int request_count = plug->rq_count; unsigned int request_count = plug->rq_count;
struct request *last = NULL; struct request *last = NULL;
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
if (!request_count) if (!request_count)
trace_block_plug(q); trace_block_plug(q);
else else
@ -2001,8 +1999,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_add_rq_to_plug(plug, rq); blk_add_rq_to_plug(plug, rq);
} else if (plug && !blk_queue_nomerges(q)) { } else if (plug && !blk_queue_nomerges(q)) {
blk_mq_bio_to_request(rq, bio);
/* /*
* We do limited plugging. If the bio can be merged, do that. * We do limited plugging. If the bio can be merged, do that.
* Otherwise the existing request in the plug list will be * Otherwise the existing request in the plug list will be
@ -2019,8 +2015,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_add_rq_to_plug(plug, rq); blk_add_rq_to_plug(plug, rq);
trace_block_plug(q); trace_block_plug(q);
blk_mq_put_ctx(data.ctx);
if (same_queue_rq) { if (same_queue_rq) {
data.hctx = same_queue_rq->mq_hctx; data.hctx = same_queue_rq->mq_hctx;
trace_block_unplug(q, 1, true); trace_block_unplug(q, 1, true);
@ -2029,12 +2023,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
} }
} else if ((q->nr_hw_queues > 1 && is_sync) || (!q->elevator && } else if ((q->nr_hw_queues > 1 && is_sync) || (!q->elevator &&
!data.hctx->dispatch_busy)) { !data.hctx->dispatch_busy)) {
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_try_issue_directly(data.hctx, rq, &cookie); blk_mq_try_issue_directly(data.hctx, rq, &cookie);
} else { } else {
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_sched_insert_request(rq, false, true, true); blk_mq_sched_insert_request(rq, false, true, true);
} }


@@ -151,12 +151,7 @@ static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
 */
static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
{
-       return __blk_mq_get_ctx(q, get_cpu());
-}
-
-static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
-{
-       put_cpu();
+       return __blk_mq_get_ctx(q, raw_smp_processor_id());
}
struct blk_mq_alloc_data { struct blk_mq_alloc_data {


@ -51,8 +51,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
int node, int cmd_size, gfp_t flags); int node, int cmd_size, gfp_t flags);
void blk_free_flush_queue(struct blk_flush_queue *q); void blk_free_flush_queue(struct blk_flush_queue *q);
void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio);
void blk_freeze_queue(struct request_queue *q); void blk_freeze_queue(struct request_queue *q);
static inline void blk_queue_enter_live(struct request_queue *q) static inline void blk_queue_enter_live(struct request_queue *q)
@ -101,6 +99,18 @@ static inline bool bvec_gap_to_prev(struct request_queue *q,
return __bvec_gap_to_prev(q, bprv, offset); return __bvec_gap_to_prev(q, bprv, offset);
} }
static inline void blk_rq_bio_prep(struct request *rq, struct bio *bio,
unsigned int nr_segs)
{
rq->nr_phys_segments = nr_segs;
rq->__data_len = bio->bi_iter.bi_size;
rq->bio = rq->biotail = bio;
rq->ioprio = bio_prio(bio);
if (bio->bi_disk)
rq->rq_disk = bio->bi_disk;
}
#ifdef CONFIG_BLK_DEV_INTEGRITY #ifdef CONFIG_BLK_DEV_INTEGRITY
void blk_flush_integrity(void); void blk_flush_integrity(void);
bool __bio_integrity_endio(struct bio *); bool __bio_integrity_endio(struct bio *);
@ -154,14 +164,14 @@ static inline bool bio_integrity_endio(struct bio *bio)
unsigned long blk_rq_timeout(unsigned long timeout); unsigned long blk_rq_timeout(unsigned long timeout);
void blk_add_timer(struct request *req); void blk_add_timer(struct request *req);
bool bio_attempt_front_merge(struct request_queue *q, struct request *req, bool bio_attempt_front_merge(struct request *req, struct bio *bio,
struct bio *bio); unsigned int nr_segs);
bool bio_attempt_back_merge(struct request_queue *q, struct request *req, bool bio_attempt_back_merge(struct request *req, struct bio *bio,
struct bio *bio); unsigned int nr_segs);
bool bio_attempt_discard_merge(struct request_queue *q, struct request *req, bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
struct bio *bio); struct bio *bio);
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio, bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
struct request **same_queue_rq); unsigned int nr_segs, struct request **same_queue_rq);
void blk_account_io_start(struct request *req, bool new_io); void blk_account_io_start(struct request *req, bool new_io);
void blk_account_io_completion(struct request *req, unsigned int bytes); void blk_account_io_completion(struct request *req, unsigned int bytes);
@ -202,15 +212,17 @@ static inline int blk_should_fake_timeout(struct request_queue *q)
} }
#endif #endif
int ll_back_merge_fn(struct request_queue *q, struct request *req, void __blk_queue_split(struct request_queue *q, struct bio **bio,
struct bio *bio); unsigned int *nr_segs);
int ll_front_merge_fn(struct request_queue *q, struct request *req, int ll_back_merge_fn(struct request *req, struct bio *bio,
struct bio *bio); unsigned int nr_segs);
int ll_front_merge_fn(struct request *req, struct bio *bio,
unsigned int nr_segs);
struct request *attempt_back_merge(struct request_queue *q, struct request *rq); struct request *attempt_back_merge(struct request_queue *q, struct request *rq);
struct request *attempt_front_merge(struct request_queue *q, struct request *rq); struct request *attempt_front_merge(struct request_queue *q, struct request *rq);
int blk_attempt_req_merge(struct request_queue *q, struct request *rq, int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
struct request *next); struct request *next);
void blk_recalc_rq_segments(struct request *rq); unsigned int blk_recalc_rq_segments(struct request *rq);
void blk_rq_set_mixed_merge(struct request *rq); void blk_rq_set_mixed_merge(struct request *rq);
bool blk_rq_merge_ok(struct request *rq, struct bio *bio); bool blk_rq_merge_ok(struct request *rq, struct bio *bio);
enum elv_merge blk_try_merge(struct request *rq, struct bio *bio); enum elv_merge blk_try_merge(struct request *rq, struct bio *bio);


@@ -1281,7 +1281,6 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno)
        struct disk_part_tbl *new_ptbl;
        int len = old_ptbl ? old_ptbl->len : 0;
        int i, target;
-       size_t size;
        /*
         * check for int overflow, since we can get here from blkpg_ioctl()
@@ -1298,8 +1297,8 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno)
        if (target <= len)
                return 0;
-       size = sizeof(*new_ptbl) + target * sizeof(new_ptbl->part[0]);
-       new_ptbl = kzalloc_node(size, GFP_KERNEL, disk->node_id);
+       new_ptbl = kzalloc_node(struct_size(new_ptbl, part, target), GFP_KERNEL,
+                               disk->node_id);
        if (!new_ptbl)
                return -ENOMEM;
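struct_size() replaces the open-coded sizeof arithmetic with an overflow-checked form; a sketch (not from the patch) using a hypothetical flexible-array structure:

struct example_tbl {
        int len;
        void *entry[];                  /* flexible array member */
};

static struct example_tbl *example_alloc(int n, int node)
{
        struct example_tbl *tbl;

        /* sizeof(*tbl) + n * sizeof(tbl->entry[0]), saturating to SIZE_MAX on overflow */
        tbl = kzalloc_node(struct_size(tbl, entry, n), GFP_KERNEL, node);
        return tbl;
}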


@ -562,7 +562,8 @@ static void kyber_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
} }
} }
static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio) static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio,
unsigned int nr_segs)
{ {
struct kyber_hctx_data *khd = hctx->sched_data; struct kyber_hctx_data *khd = hctx->sched_data;
struct blk_mq_ctx *ctx = blk_mq_get_ctx(hctx->queue); struct blk_mq_ctx *ctx = blk_mq_get_ctx(hctx->queue);
@ -572,9 +573,8 @@ static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
bool merged; bool merged;
spin_lock(&kcq->lock); spin_lock(&kcq->lock);
merged = blk_mq_bio_list_merge(hctx->queue, rq_list, bio); merged = blk_mq_bio_list_merge(hctx->queue, rq_list, bio, nr_segs);
spin_unlock(&kcq->lock); spin_unlock(&kcq->lock);
blk_mq_put_ctx(ctx);
return merged; return merged;
} }


@ -469,7 +469,8 @@ static int dd_request_merge(struct request_queue *q, struct request **rq,
return ELEVATOR_NO_MERGE; return ELEVATOR_NO_MERGE;
} }
static bool dd_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio) static bool dd_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio,
unsigned int nr_segs)
{ {
struct request_queue *q = hctx->queue; struct request_queue *q = hctx->queue;
struct deadline_data *dd = q->elevator->elevator_data; struct deadline_data *dd = q->elevator->elevator_data;
@ -477,7 +478,7 @@ static bool dd_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
bool ret; bool ret;
spin_lock(&dd->lock); spin_lock(&dd->lock);
ret = blk_mq_sched_try_merge(q, bio, &free); ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
spin_unlock(&dd->lock); spin_unlock(&dd->lock);
if (free) if (free)


@ -98,6 +98,7 @@ enum opal_uid {
OPAL_ENTERPRISE_BANDMASTER0_UID, OPAL_ENTERPRISE_BANDMASTER0_UID,
OPAL_ENTERPRISE_ERASEMASTER_UID, OPAL_ENTERPRISE_ERASEMASTER_UID,
/* tables */ /* tables */
OPAL_TABLE_TABLE,
OPAL_LOCKINGRANGE_GLOBAL, OPAL_LOCKINGRANGE_GLOBAL,
OPAL_LOCKINGRANGE_ACE_RDLOCKED, OPAL_LOCKINGRANGE_ACE_RDLOCKED,
OPAL_LOCKINGRANGE_ACE_WRLOCKED, OPAL_LOCKINGRANGE_ACE_WRLOCKED,
@ -152,6 +153,21 @@ enum opal_token {
OPAL_STARTCOLUMN = 0x03, OPAL_STARTCOLUMN = 0x03,
OPAL_ENDCOLUMN = 0x04, OPAL_ENDCOLUMN = 0x04,
OPAL_VALUES = 0x01, OPAL_VALUES = 0x01,
/* table table */
OPAL_TABLE_UID = 0x00,
OPAL_TABLE_NAME = 0x01,
OPAL_TABLE_COMMON = 0x02,
OPAL_TABLE_TEMPLATE = 0x03,
OPAL_TABLE_KIND = 0x04,
OPAL_TABLE_COLUMN = 0x05,
OPAL_TABLE_COLUMNS = 0x06,
OPAL_TABLE_ROWS = 0x07,
OPAL_TABLE_ROWS_FREE = 0x08,
OPAL_TABLE_ROW_BYTES = 0x09,
OPAL_TABLE_LASTID = 0x0A,
OPAL_TABLE_MIN = 0x0B,
OPAL_TABLE_MAX = 0x0C,
/* authority table */ /* authority table */
OPAL_PIN = 0x03, OPAL_PIN = 0x03,
/* locking tokens */ /* locking tokens */


@ -26,6 +26,9 @@
#define IO_BUFFER_LENGTH 2048 #define IO_BUFFER_LENGTH 2048
#define MAX_TOKS 64 #define MAX_TOKS 64
/* Number of bytes needed by cmd_finalize. */
#define CMD_FINALIZE_BYTES_NEEDED 7
struct opal_step { struct opal_step {
int (*fn)(struct opal_dev *dev, void *data); int (*fn)(struct opal_dev *dev, void *data);
void *data; void *data;
@ -127,6 +130,8 @@ static const u8 opaluid[][OPAL_UID_LENGTH] = {
/* tables */ /* tables */
[OPAL_TABLE_TABLE]
{ 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01 },
[OPAL_LOCKINGRANGE_GLOBAL] = [OPAL_LOCKINGRANGE_GLOBAL] =
{ 0x00, 0x00, 0x08, 0x02, 0x00, 0x00, 0x00, 0x01 }, { 0x00, 0x00, 0x08, 0x02, 0x00, 0x00, 0x00, 0x01 },
[OPAL_LOCKINGRANGE_ACE_RDLOCKED] = [OPAL_LOCKINGRANGE_ACE_RDLOCKED] =
@ -523,12 +528,17 @@ static int opal_discovery0_step(struct opal_dev *dev)
return execute_step(dev, &discovery0_step, 0); return execute_step(dev, &discovery0_step, 0);
} }
static size_t remaining_size(struct opal_dev *cmd)
{
return IO_BUFFER_LENGTH - cmd->pos;
}
static bool can_add(int *err, struct opal_dev *cmd, size_t len) static bool can_add(int *err, struct opal_dev *cmd, size_t len)
{ {
if (*err) if (*err)
return false; return false;
if (len > IO_BUFFER_LENGTH || cmd->pos > IO_BUFFER_LENGTH - len) { if (remaining_size(cmd) < len) {
pr_debug("Error adding %zu bytes: end of buffer.\n", len); pr_debug("Error adding %zu bytes: end of buffer.\n", len);
*err = -ERANGE; *err = -ERANGE;
return false; return false;
@ -674,7 +684,11 @@ static int cmd_finalize(struct opal_dev *cmd, u32 hsn, u32 tsn)
struct opal_header *hdr; struct opal_header *hdr;
int err = 0; int err = 0;
/* close the parameter list opened from cmd_start */ /*
* Close the parameter list opened from cmd_start.
* The number of bytes added must be equal to
* CMD_FINALIZE_BYTES_NEEDED.
*/
add_token_u8(&err, cmd, OPAL_ENDLIST); add_token_u8(&err, cmd, OPAL_ENDLIST);
add_token_u8(&err, cmd, OPAL_ENDOFDATA); add_token_u8(&err, cmd, OPAL_ENDOFDATA);
@ -1119,6 +1133,29 @@ static int generic_get_column(struct opal_dev *dev, const u8 *table,
return finalize_and_send(dev, parse_and_check_status); return finalize_and_send(dev, parse_and_check_status);
} }
/*
* see TCG SAS 5.3.2.3 for a description of the available columns
*
* the result is provided in dev->resp->tok[4]
*/
static int generic_get_table_info(struct opal_dev *dev, enum opal_uid table,
u64 column)
{
u8 uid[OPAL_UID_LENGTH];
const unsigned int half = OPAL_UID_LENGTH/2;
/* sed-opal UIDs can be split in two halves:
* first: actual table index
* second: relative index in the table
* so we have to get the first half of the OPAL_TABLE_TABLE and use the
* first part of the target table as relative index into that table
*/
memcpy(uid, opaluid[OPAL_TABLE_TABLE], half);
memcpy(uid+half, opaluid[table], half);
return generic_get_column(dev, uid, column);
}
static int gen_key(struct opal_dev *dev, void *data) static int gen_key(struct opal_dev *dev, void *data)
{ {
u8 uid[OPAL_UID_LENGTH]; u8 uid[OPAL_UID_LENGTH];
@ -1307,6 +1344,7 @@ static int start_generic_opal_session(struct opal_dev *dev,
break; break;
case OPAL_ADMIN1_UID: case OPAL_ADMIN1_UID:
case OPAL_SID_UID: case OPAL_SID_UID:
case OPAL_PSID_UID:
add_token_u8(&err, dev, OPAL_STARTNAME); add_token_u8(&err, dev, OPAL_STARTNAME);
add_token_u8(&err, dev, 0); /* HostChallenge */ add_token_u8(&err, dev, 0); /* HostChallenge */
add_token_bytestring(&err, dev, key, key_len); add_token_bytestring(&err, dev, key, key_len);
@ -1367,6 +1405,16 @@ static int start_admin1LSP_opal_session(struct opal_dev *dev, void *data)
key->key, key->key_len); key->key, key->key_len);
} }
static int start_PSID_opal_session(struct opal_dev *dev, void *data)
{
const struct opal_key *okey = data;
return start_generic_opal_session(dev, OPAL_PSID_UID,
OPAL_ADMINSP_UID,
okey->key,
okey->key_len);
}
static int start_auth_opal_session(struct opal_dev *dev, void *data) static int start_auth_opal_session(struct opal_dev *dev, void *data)
{ {
struct opal_session_info *session = data; struct opal_session_info *session = data;
@ -1525,6 +1573,72 @@ static int set_mbr_enable_disable(struct opal_dev *dev, void *data)
return finalize_and_send(dev, parse_and_check_status); return finalize_and_send(dev, parse_and_check_status);
} }
static int write_shadow_mbr(struct opal_dev *dev, void *data)
{
struct opal_shadow_mbr *shadow = data;
const u8 __user *src;
u8 *dst;
size_t off = 0;
u64 len;
int err = 0;
/* do we fit in the available shadow mbr space? */
err = generic_get_table_info(dev, OPAL_MBR, OPAL_TABLE_ROWS);
if (err) {
pr_debug("MBR: could not get shadow size\n");
return err;
}
len = response_get_u64(&dev->parsed, 4);
if (shadow->size > len || shadow->offset > len - shadow->size) {
pr_debug("MBR: does not fit in shadow (%llu vs. %llu)\n",
shadow->offset + shadow->size, len);
return -ENOSPC;
}
/* do the actual transmission(s) */
src = (u8 __user *)(uintptr_t)shadow->data;
while (off < shadow->size) {
err = cmd_start(dev, opaluid[OPAL_MBR], opalmethod[OPAL_SET]);
add_token_u8(&err, dev, OPAL_STARTNAME);
add_token_u8(&err, dev, OPAL_WHERE);
add_token_u64(&err, dev, shadow->offset + off);
add_token_u8(&err, dev, OPAL_ENDNAME);
add_token_u8(&err, dev, OPAL_STARTNAME);
add_token_u8(&err, dev, OPAL_VALUES);
/*
* The bytestring header is either 1 or 2 bytes, so assume 2.
* There also needs to be enough space to accommodate the
* trailing OPAL_ENDNAME (1 byte) and tokens added by
* cmd_finalize.
*/
len = min(remaining_size(dev) - (2+1+CMD_FINALIZE_BYTES_NEEDED),
(size_t)(shadow->size - off));
pr_debug("MBR: write bytes %zu+%llu/%llu\n",
off, len, shadow->size);
dst = add_bytestring_header(&err, dev, len);
if (!dst)
break;
if (copy_from_user(dst, src + off, len))
err = -EFAULT;
dev->pos += len;
add_token_u8(&err, dev, OPAL_ENDNAME);
if (err)
break;
err = finalize_and_send(dev, parse_and_check_status);
if (err)
break;
off += len;
}
return err;
}
static int generic_pw_cmd(u8 *key, size_t key_len, u8 *cpin_uid, static int generic_pw_cmd(u8 *key, size_t key_len, u8 *cpin_uid,
struct opal_dev *dev) struct opal_dev *dev)
{ {
@ -1978,6 +2092,50 @@ static int opal_enable_disable_shadow_mbr(struct opal_dev *dev,
return ret; return ret;
} }
static int opal_set_mbr_done(struct opal_dev *dev,
struct opal_mbr_done *mbr_done)
{
u8 mbr_done_tf = mbr_done->done_flag == OPAL_MBR_DONE ?
OPAL_TRUE : OPAL_FALSE;
const struct opal_step mbr_steps[] = {
{ start_admin1LSP_opal_session, &mbr_done->key },
{ set_mbr_done, &mbr_done_tf },
{ end_opal_session, }
};
int ret;
if (mbr_done->done_flag != OPAL_MBR_DONE &&
mbr_done->done_flag != OPAL_MBR_NOT_DONE)
return -EINVAL;
mutex_lock(&dev->dev_lock);
setup_opal_dev(dev);
ret = execute_steps(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
mutex_unlock(&dev->dev_lock);
return ret;
}
static int opal_write_shadow_mbr(struct opal_dev *dev,
struct opal_shadow_mbr *info)
{
const struct opal_step mbr_steps[] = {
{ start_admin1LSP_opal_session, &info->key },
{ write_shadow_mbr, info },
{ end_opal_session, }
};
int ret;
if (info->size == 0)
return 0;
mutex_lock(&dev->dev_lock);
setup_opal_dev(dev);
ret = execute_steps(dev, mbr_steps, ARRAY_SIZE(mbr_steps));
mutex_unlock(&dev->dev_lock);
return ret;
}
static int opal_save(struct opal_dev *dev, struct opal_lock_unlock *lk_unlk) static int opal_save(struct opal_dev *dev, struct opal_lock_unlock *lk_unlk)
{ {
struct opal_suspend_data *suspend; struct opal_suspend_data *suspend;
@@ -2030,17 +2188,28 @@ static int opal_add_user_to_lr(struct opal_dev *dev,
        return ret;
}
-static int opal_reverttper(struct opal_dev *dev, struct opal_key *opal)
+static int opal_reverttper(struct opal_dev *dev, struct opal_key *opal, bool psid)
{
+       /* controller will terminate session */
        const struct opal_step revert_steps[] = {
                { start_SIDASP_opal_session, opal },
-               { revert_tper, } /* controller will terminate session */
+               { revert_tper, }
        };
+       const struct opal_step psid_revert_steps[] = {
+               { start_PSID_opal_session, opal },
+               { revert_tper, }
+       };
        int ret;
        mutex_lock(&dev->dev_lock);
        setup_opal_dev(dev);
-       ret = execute_steps(dev, revert_steps, ARRAY_SIZE(revert_steps));
+       if (psid)
+               ret = execute_steps(dev, psid_revert_steps,
+                                   ARRAY_SIZE(psid_revert_steps));
+       else
+               ret = execute_steps(dev, revert_steps,
+                                   ARRAY_SIZE(revert_steps));
        mutex_unlock(&dev->dev_lock);
        /*
@@ -2092,8 +2261,7 @@ static int opal_lock_unlock(struct opal_dev *dev,
{
        int ret;
-       if (lk_unlk->session.who < OPAL_ADMIN1 ||
-           lk_unlk->session.who > OPAL_USER9)
+       if (lk_unlk->session.who > OPAL_USER9)
                return -EINVAL;
        mutex_lock(&dev->dev_lock);
@@ -2171,9 +2339,7 @@ static int opal_set_new_pw(struct opal_dev *dev, struct opal_new_pw *opal_pw)
        };
        int ret;
-       if (opal_pw->session.who < OPAL_ADMIN1 ||
-           opal_pw->session.who > OPAL_USER9 ||
-           opal_pw->new_user_pw.who < OPAL_ADMIN1 ||
+       if (opal_pw->session.who > OPAL_USER9 ||
            opal_pw->new_user_pw.who > OPAL_USER9)
                return -EINVAL;
@@ -2280,7 +2446,7 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
                ret = opal_activate_user(dev, p);
                break;
        case IOC_OPAL_REVERT_TPR:
-               ret = opal_reverttper(dev, p);
+               ret = opal_reverttper(dev, p, false);
                break;
        case IOC_OPAL_LR_SETUP:
                ret = opal_setup_locking_range(dev, p);
@@ -2291,12 +2457,21 @@ int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
        case IOC_OPAL_ENABLE_DISABLE_MBR:
                ret = opal_enable_disable_shadow_mbr(dev, p);
                break;
+       case IOC_OPAL_MBR_DONE:
+               ret = opal_set_mbr_done(dev, p);
+               break;
+       case IOC_OPAL_WRITE_SHADOW_MBR:
+               ret = opal_write_shadow_mbr(dev, p);
+               break;
        case IOC_OPAL_ERASE_LR:
                ret = opal_erase_locking_range(dev, p);
                break;
        case IOC_OPAL_SECURE_ERASE_LR:
                ret = opal_secure_erase_locking_range(dev, p);
                break;
+       case IOC_OPAL_PSID_REVERT_TPR:
+               ret = opal_reverttper(dev, p, true);
+               break;
        default:
                break;
        }
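A hedged userspace sketch (not part of the series) of driving the new shadow-MBR ioctls end to end; the opal_shadow_mbr and opal_mbr_done field names follow the uapi structures referenced above, but exact types and layout are assumptions:

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sed-opal.h>

static int example_load_pba(int fd, const char *pw, const void *img, __u64 len)
{
        struct opal_shadow_mbr mbr = {
                .data = (__u64)(uintptr_t)img,  /* user buffer, chunked by the kernel */
                .offset = 0,
                .size = len,
        };
        struct opal_mbr_done done = { .done_flag = OPAL_MBR_DONE };

        mbr.key.key_len = strlen(pw);
        memcpy(mbr.key.key, pw, mbr.key.key_len);
        if (ioctl(fd, IOC_OPAL_WRITE_SHADOW_MBR, &mbr))
                return -1;

        done.key = mbr.key;
        return ioctl(fd, IOC_OPAL_MBR_DONE, &done);
}

IOC_OPAL_PSID_REVERT_TPR is driven the same way but takes a bare struct opal_key carrying the PSID printed on the drive label.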


@ -465,35 +465,20 @@ static const struct file_operations in_flight_summary_fops = {
void drbd_debugfs_resource_add(struct drbd_resource *resource) void drbd_debugfs_resource_add(struct drbd_resource *resource)
{ {
struct dentry *dentry; struct dentry *dentry;
if (!drbd_debugfs_resources)
return;
dentry = debugfs_create_dir(resource->name, drbd_debugfs_resources); dentry = debugfs_create_dir(resource->name, drbd_debugfs_resources);
if (IS_ERR_OR_NULL(dentry))
goto fail;
resource->debugfs_res = dentry; resource->debugfs_res = dentry;
dentry = debugfs_create_dir("volumes", resource->debugfs_res); dentry = debugfs_create_dir("volumes", resource->debugfs_res);
if (IS_ERR_OR_NULL(dentry))
goto fail;
resource->debugfs_res_volumes = dentry; resource->debugfs_res_volumes = dentry;
dentry = debugfs_create_dir("connections", resource->debugfs_res); dentry = debugfs_create_dir("connections", resource->debugfs_res);
if (IS_ERR_OR_NULL(dentry))
goto fail;
resource->debugfs_res_connections = dentry; resource->debugfs_res_connections = dentry;
dentry = debugfs_create_file("in_flight_summary", 0440, dentry = debugfs_create_file("in_flight_summary", 0440,
resource->debugfs_res, resource, resource->debugfs_res, resource,
&in_flight_summary_fops); &in_flight_summary_fops);
if (IS_ERR_OR_NULL(dentry))
goto fail;
resource->debugfs_res_in_flight_summary = dentry; resource->debugfs_res_in_flight_summary = dentry;
return;
fail:
drbd_debugfs_resource_cleanup(resource);
drbd_err(resource, "failed to create debugfs dentry\n");
} }
static void drbd_debugfs_remove(struct dentry **dp) static void drbd_debugfs_remove(struct dentry **dp)
@ -636,35 +621,22 @@ void drbd_debugfs_connection_add(struct drbd_connection *connection)
{ {
struct dentry *conns_dir = connection->resource->debugfs_res_connections; struct dentry *conns_dir = connection->resource->debugfs_res_connections;
struct dentry *dentry; struct dentry *dentry;
if (!conns_dir)
return;
/* Once we enable mutliple peers, /* Once we enable mutliple peers,
* these connections will have descriptive names. * these connections will have descriptive names.
* For now, it is just the one connection to the (only) "peer". */ * For now, it is just the one connection to the (only) "peer". */
dentry = debugfs_create_dir("peer", conns_dir); dentry = debugfs_create_dir("peer", conns_dir);
if (IS_ERR_OR_NULL(dentry))
goto fail;
connection->debugfs_conn = dentry; connection->debugfs_conn = dentry;
dentry = debugfs_create_file("callback_history", 0440, dentry = debugfs_create_file("callback_history", 0440,
connection->debugfs_conn, connection, connection->debugfs_conn, connection,
&connection_callback_history_fops); &connection_callback_history_fops);
if (IS_ERR_OR_NULL(dentry))
goto fail;
connection->debugfs_conn_callback_history = dentry; connection->debugfs_conn_callback_history = dentry;
dentry = debugfs_create_file("oldest_requests", 0440, dentry = debugfs_create_file("oldest_requests", 0440,
connection->debugfs_conn, connection, connection->debugfs_conn, connection,
&connection_oldest_requests_fops); &connection_oldest_requests_fops);
if (IS_ERR_OR_NULL(dentry))
goto fail;
connection->debugfs_conn_oldest_requests = dentry; connection->debugfs_conn_oldest_requests = dentry;
return;
fail:
drbd_debugfs_connection_cleanup(connection);
drbd_err(connection, "failed to create debugfs dentry\n");
} }
void drbd_debugfs_connection_cleanup(struct drbd_connection *connection) void drbd_debugfs_connection_cleanup(struct drbd_connection *connection)
@ -809,8 +781,6 @@ void drbd_debugfs_device_add(struct drbd_device *device)
snprintf(vnr_buf, sizeof(vnr_buf), "%u", device->vnr); snprintf(vnr_buf, sizeof(vnr_buf), "%u", device->vnr);
dentry = debugfs_create_dir(vnr_buf, vols_dir); dentry = debugfs_create_dir(vnr_buf, vols_dir);
if (IS_ERR_OR_NULL(dentry))
goto fail;
device->debugfs_vol = dentry; device->debugfs_vol = dentry;
snprintf(minor_buf, sizeof(minor_buf), "%u", device->minor); snprintf(minor_buf, sizeof(minor_buf), "%u", device->minor);
@ -819,18 +789,14 @@ void drbd_debugfs_device_add(struct drbd_device *device)
if (!slink_name) if (!slink_name)
goto fail; goto fail;
dentry = debugfs_create_symlink(minor_buf, drbd_debugfs_minors, slink_name); dentry = debugfs_create_symlink(minor_buf, drbd_debugfs_minors, slink_name);
device->debugfs_minor = dentry;
kfree(slink_name); kfree(slink_name);
slink_name = NULL; slink_name = NULL;
if (IS_ERR_OR_NULL(dentry))
goto fail;
device->debugfs_minor = dentry;
#define DCF(name) do { \ #define DCF(name) do { \
dentry = debugfs_create_file(#name, 0440, \ dentry = debugfs_create_file(#name, 0440, \
device->debugfs_vol, device, \ device->debugfs_vol, device, \
&device_ ## name ## _fops); \ &device_ ## name ## _fops); \
if (IS_ERR_OR_NULL(dentry)) \
goto fail; \
device->debugfs_vol_ ## name = dentry; \ device->debugfs_vol_ ## name = dentry; \
} while (0) } while (0)
@ -864,19 +830,9 @@ void drbd_debugfs_peer_device_add(struct drbd_peer_device *peer_device)
struct dentry *dentry; struct dentry *dentry;
char vnr_buf[8]; char vnr_buf[8];
if (!conn_dir)
return;
snprintf(vnr_buf, sizeof(vnr_buf), "%u", peer_device->device->vnr); snprintf(vnr_buf, sizeof(vnr_buf), "%u", peer_device->device->vnr);
dentry = debugfs_create_dir(vnr_buf, conn_dir); dentry = debugfs_create_dir(vnr_buf, conn_dir);
if (IS_ERR_OR_NULL(dentry))
goto fail;
peer_device->debugfs_peer_dev = dentry; peer_device->debugfs_peer_dev = dentry;
return;
fail:
drbd_debugfs_peer_device_cleanup(peer_device);
drbd_err(peer_device, "failed to create debugfs entries\n");
} }
void drbd_debugfs_peer_device_cleanup(struct drbd_peer_device *peer_device) void drbd_debugfs_peer_device_cleanup(struct drbd_peer_device *peer_device)
@ -917,35 +873,19 @@ void drbd_debugfs_cleanup(void)
drbd_debugfs_remove(&drbd_debugfs_root); drbd_debugfs_remove(&drbd_debugfs_root);
} }
int __init drbd_debugfs_init(void) void __init drbd_debugfs_init(void)
{ {
struct dentry *dentry; struct dentry *dentry;
dentry = debugfs_create_dir("drbd", NULL); dentry = debugfs_create_dir("drbd", NULL);
if (IS_ERR_OR_NULL(dentry))
goto fail;
drbd_debugfs_root = dentry; drbd_debugfs_root = dentry;
dentry = debugfs_create_file("version", 0444, drbd_debugfs_root, NULL, &drbd_version_fops); dentry = debugfs_create_file("version", 0444, drbd_debugfs_root, NULL, &drbd_version_fops);
if (IS_ERR_OR_NULL(dentry))
goto fail;
drbd_debugfs_version = dentry; drbd_debugfs_version = dentry;
dentry = debugfs_create_dir("resources", drbd_debugfs_root); dentry = debugfs_create_dir("resources", drbd_debugfs_root);
if (IS_ERR_OR_NULL(dentry))
goto fail;
drbd_debugfs_resources = dentry; drbd_debugfs_resources = dentry;
dentry = debugfs_create_dir("minors", drbd_debugfs_root); dentry = debugfs_create_dir("minors", drbd_debugfs_root);
if (IS_ERR_OR_NULL(dentry))
goto fail;
drbd_debugfs_minors = dentry; drbd_debugfs_minors = dentry;
return 0;
fail:
drbd_debugfs_cleanup();
if (dentry)
return PTR_ERR(dentry);
else
return -EINVAL;
} }


@ -6,7 +6,7 @@
#include "drbd_int.h" #include "drbd_int.h"
#ifdef CONFIG_DEBUG_FS #ifdef CONFIG_DEBUG_FS
int __init drbd_debugfs_init(void); void __init drbd_debugfs_init(void);
void drbd_debugfs_cleanup(void); void drbd_debugfs_cleanup(void);
void drbd_debugfs_resource_add(struct drbd_resource *resource); void drbd_debugfs_resource_add(struct drbd_resource *resource);
@ -22,7 +22,7 @@ void drbd_debugfs_peer_device_add(struct drbd_peer_device *peer_device);
void drbd_debugfs_peer_device_cleanup(struct drbd_peer_device *peer_device); void drbd_debugfs_peer_device_cleanup(struct drbd_peer_device *peer_device);
#else #else
static inline int __init drbd_debugfs_init(void) { return -ENODEV; } static inline void __init drbd_debugfs_init(void) { }
static inline void drbd_debugfs_cleanup(void) { } static inline void drbd_debugfs_cleanup(void) { }
static inline void drbd_debugfs_resource_add(struct drbd_resource *resource) { } static inline void drbd_debugfs_resource_add(struct drbd_resource *resource) { }


@ -3009,8 +3009,7 @@ static int __init drbd_init(void)
spin_lock_init(&retry.lock); spin_lock_init(&retry.lock);
INIT_LIST_HEAD(&retry.writes); INIT_LIST_HEAD(&retry.writes);
if (drbd_debugfs_init()) drbd_debugfs_init();
pr_notice("failed to initialize debugfs -- will not be available\n");
pr_info("initialized. " pr_info("initialized. "
"Version: " REL_VERSION " (api:%d/proto:%d-%d)\n", "Version: " REL_VERSION " (api:%d/proto:%d-%d)\n",


@ -3900,7 +3900,7 @@ static void __init config_types(void)
if (!UDP->cmos) if (!UDP->cmos)
UDP->cmos = FLOPPY0_TYPE; UDP->cmos = FLOPPY0_TYPE;
drive = 1; drive = 1;
if (!UDP->cmos && FLOPPY1_TYPE) if (!UDP->cmos)
UDP->cmos = FLOPPY1_TYPE; UDP->cmos = FLOPPY1_TYPE;
/* FIXME: additional physical CMOS drive detection should go here */ /* FIXME: additional physical CMOS drive detection should go here */


@ -264,20 +264,12 @@ lo_do_transfer(struct loop_device *lo, int cmd,
return ret; return ret;
} }
static inline void loop_iov_iter_bvec(struct iov_iter *i,
unsigned int direction, const struct bio_vec *bvec,
unsigned long nr_segs, size_t count)
{
iov_iter_bvec(i, direction, bvec, nr_segs, count);
i->type |= ITER_BVEC_FLAG_NO_REF;
}
static int lo_write_bvec(struct file *file, struct bio_vec *bvec, loff_t *ppos) static int lo_write_bvec(struct file *file, struct bio_vec *bvec, loff_t *ppos)
{ {
struct iov_iter i; struct iov_iter i;
ssize_t bw; ssize_t bw;
loop_iov_iter_bvec(&i, WRITE, bvec, 1, bvec->bv_len); iov_iter_bvec(&i, WRITE, bvec, 1, bvec->bv_len);
file_start_write(file); file_start_write(file);
bw = vfs_iter_write(file, &i, ppos, 0); bw = vfs_iter_write(file, &i, ppos, 0);
@ -355,7 +347,7 @@ static int lo_read_simple(struct loop_device *lo, struct request *rq,
ssize_t len; ssize_t len;
rq_for_each_segment(bvec, rq, iter) { rq_for_each_segment(bvec, rq, iter) {
loop_iov_iter_bvec(&i, READ, &bvec, 1, bvec.bv_len); iov_iter_bvec(&i, READ, &bvec, 1, bvec.bv_len);
len = vfs_iter_read(lo->lo_backing_file, &i, &pos, 0); len = vfs_iter_read(lo->lo_backing_file, &i, &pos, 0);
if (len < 0) if (len < 0)
return len; return len;
@ -396,7 +388,7 @@ static int lo_read_transfer(struct loop_device *lo, struct request *rq,
b.bv_offset = 0; b.bv_offset = 0;
b.bv_len = bvec.bv_len; b.bv_len = bvec.bv_len;
loop_iov_iter_bvec(&i, READ, &b, 1, b.bv_len); iov_iter_bvec(&i, READ, &b, 1, b.bv_len);
len = vfs_iter_read(lo->lo_backing_file, &i, &pos, 0); len = vfs_iter_read(lo->lo_backing_file, &i, &pos, 0);
if (len < 0) { if (len < 0) {
ret = len; ret = len;
@ -563,7 +555,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
} }
atomic_set(&cmd->ref, 2); atomic_set(&cmd->ref, 2);
loop_iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq)); iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
iter.iov_offset = offset; iter.iov_offset = offset;
cmd->iocb.ki_pos = pos; cmd->iocb.ki_pos = pos;


@ -1577,7 +1577,6 @@ static int exec_drive_command(struct mtip_port *port, u8 *command,
ATA_SECT_SIZE * xfer_sz); ATA_SECT_SIZE * xfer_sz);
return -ENOMEM; return -ENOMEM;
} }
memset(buf, 0, ATA_SECT_SIZE * xfer_sz);
} }
/* Build the FIS. */ /* Build the FIS. */
@ -2776,7 +2775,6 @@ static int mtip_dma_alloc(struct driver_data *dd)
&port->block1_dma, GFP_KERNEL); &port->block1_dma, GFP_KERNEL);
if (!port->block1) if (!port->block1)
return -ENOMEM; return -ENOMEM;
memset(port->block1, 0, BLOCK_DMA_ALLOC_SZ);
/* Allocate dma memory for command list */ /* Allocate dma memory for command list */
port->command_list = port->command_list =
@ -2789,7 +2787,6 @@ static int mtip_dma_alloc(struct driver_data *dd)
port->block1_dma = 0; port->block1_dma = 0;
return -ENOMEM; return -ENOMEM;
} }
memset(port->command_list, 0, AHCI_CMD_TBL_SZ);
/* Setup all pointers into first DMA region */ /* Setup all pointers into first DMA region */
port->rxfis = port->block1 + AHCI_RX_FIS_OFFSET; port->rxfis = port->block1 + AHCI_RX_FIS_OFFSET;
@ -3529,8 +3526,6 @@ static int mtip_init_cmd(struct blk_mq_tag_set *set, struct request *rq,
if (!cmd->command) if (!cmd->command)
return -ENOMEM; return -ENOMEM;
memset(cmd->command, 0, CMD_DMA_ALLOC_SZ);
sg_init_table(cmd->sg, MTIP_MAX_SG); sg_init_table(cmd->sg, MTIP_MAX_SG);
return 0; return 0;
} }


@ -327,11 +327,12 @@ static ssize_t nullb_device_power_store(struct config_item *item,
set_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags); set_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
dev->power = newp; dev->power = newp;
} else if (dev->power && !newp) { } else if (dev->power && !newp) {
mutex_lock(&lock); if (test_and_clear_bit(NULLB_DEV_FL_UP, &dev->flags)) {
dev->power = newp; mutex_lock(&lock);
null_del_dev(dev->nullb); dev->power = newp;
mutex_unlock(&lock); null_del_dev(dev->nullb);
clear_bit(NULLB_DEV_FL_UP, &dev->flags); mutex_unlock(&lock);
}
clear_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags); clear_bit(NULLB_DEV_FL_CONFIGURED, &dev->flags);
} }
@ -1197,7 +1198,7 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd)
if (!cmd->error && dev->zoned) { if (!cmd->error && dev->zoned) {
sector_t sector; sector_t sector;
unsigned int nr_sectors; unsigned int nr_sectors;
int op; enum req_opf op;
if (dev->queue_mode == NULL_Q_BIO) { if (dev->queue_mode == NULL_Q_BIO) {
op = bio_op(cmd->bio); op = bio_op(cmd->bio);
@ -1488,7 +1489,6 @@ static int setup_queues(struct nullb *nullb)
if (!nullb->queues) if (!nullb->queues)
return -ENOMEM; return -ENOMEM;
nullb->nr_queues = 0;
nullb->queue_depth = nullb->dev->hw_queue_depth; nullb->queue_depth = nullb->dev->hw_queue_depth;
return 0; return 0;


@ -2694,7 +2694,6 @@ static int skd_cons_skmsg(struct skd_device *skdev)
(FIT_QCMD_ALIGN - 1), (FIT_QCMD_ALIGN - 1),
"not aligned: msg_buf %p mb_dma_address %pad\n", "not aligned: msg_buf %p mb_dma_address %pad\n",
skmsg->msg_buf, &skmsg->mb_dma_address); skmsg->msg_buf, &skmsg->mb_dma_address);
memset(skmsg->msg_buf, 0, SKD_N_FITMSG_BYTES);
} }
err_out: err_out:


@ -478,7 +478,7 @@ static void __nvm_remove_target(struct nvm_target *t, bool graceful)
*/ */
static int nvm_remove_tgt(struct nvm_ioctl_remove *remove) static int nvm_remove_tgt(struct nvm_ioctl_remove *remove)
{ {
struct nvm_target *t; struct nvm_target *t = NULL;
struct nvm_dev *dev; struct nvm_dev *dev;
down_read(&nvm_lock); down_read(&nvm_lock);


@ -323,14 +323,16 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off, void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off,
int nr_pages) int nr_pages)
{ {
struct bio_vec bv; struct bio_vec *bv;
int i; struct page *page;
int i, e, nbv = 0;
WARN_ON(off + nr_pages != bio->bi_vcnt); for (i = 0; i < bio->bi_vcnt; i++) {
bv = &bio->bi_io_vec[i];
for (i = off; i < nr_pages + off; i++) { page = bv->bv_page;
bv = bio->bi_io_vec[i]; for (e = 0; e < bv->bv_len; e += PBLK_EXPOSED_PAGE_SIZE, nbv++)
mempool_free(bv.bv_page, &pblk->page_bio_pool); if (nbv >= off)
mempool_free(page++, &pblk->page_bio_pool);
} }
} }


@ -393,6 +393,11 @@ long bch_bucket_alloc(struct cache *ca, unsigned int reserve, bool wait)
struct bucket *b; struct bucket *b;
long r; long r;
/* No allocation if CACHE_SET_IO_DISABLE bit is set */
if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &ca->set->flags)))
return -1;
/* fastpath */ /* fastpath */
if (fifo_pop(&ca->free[RESERVE_NONE], r) || if (fifo_pop(&ca->free[RESERVE_NONE], r) ||
fifo_pop(&ca->free[reserve], r)) fifo_pop(&ca->free[reserve], r))
@ -484,6 +489,10 @@ int __bch_bucket_alloc_set(struct cache_set *c, unsigned int reserve,
{ {
int i; int i;
/* No allocation if CACHE_SET_IO_DISABLE bit is set */
if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags)))
return -1;
lockdep_assert_held(&c->bucket_lock); lockdep_assert_held(&c->bucket_lock);
BUG_ON(!n || n > c->caches_loaded || n > MAX_CACHES_PER_SET); BUG_ON(!n || n > c->caches_loaded || n > MAX_CACHES_PER_SET);


@ -705,8 +705,8 @@ struct cache_set {
atomic_long_t writeback_keys_failed; atomic_long_t writeback_keys_failed;
atomic_long_t reclaim; atomic_long_t reclaim;
atomic_long_t reclaimed_journal_buckets;
atomic_long_t flush_write; atomic_long_t flush_write;
atomic_long_t retry_flush_write;
enum { enum {
ON_ERROR_UNREGISTER, ON_ERROR_UNREGISTER,
@ -726,8 +726,6 @@ struct cache_set {
#define BUCKET_HASH_BITS 12 #define BUCKET_HASH_BITS 12
struct hlist_head bucket_hash[1 << BUCKET_HASH_BITS]; struct hlist_head bucket_hash[1 << BUCKET_HASH_BITS];
DECLARE_HEAP(struct btree *, flush_btree);
}; };
struct bbio { struct bbio {
@ -1006,7 +1004,7 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size);
int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c, int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
uint8_t *set_uuid); uint8_t *set_uuid);
void bch_cached_dev_detach(struct cached_dev *dc); void bch_cached_dev_detach(struct cached_dev *dc);
void bch_cached_dev_run(struct cached_dev *dc); int bch_cached_dev_run(struct cached_dev *dc);
void bcache_device_stop(struct bcache_device *d); void bcache_device_stop(struct bcache_device *d);
void bch_cache_set_unregister(struct cache_set *c); void bch_cache_set_unregister(struct cache_set *c);


@ -347,22 +347,19 @@ EXPORT_SYMBOL(bch_btree_keys_alloc);
void bch_btree_keys_init(struct btree_keys *b, const struct btree_keys_ops *ops, void bch_btree_keys_init(struct btree_keys *b, const struct btree_keys_ops *ops,
bool *expensive_debug_checks) bool *expensive_debug_checks)
{ {
unsigned int i;
b->ops = ops; b->ops = ops;
b->expensive_debug_checks = expensive_debug_checks; b->expensive_debug_checks = expensive_debug_checks;
b->nsets = 0; b->nsets = 0;
b->last_set_unwritten = 0; b->last_set_unwritten = 0;
/* XXX: shouldn't be needed */
for (i = 0; i < MAX_BSETS; i++)
b->set[i].size = 0;
/* /*
* Second loop starts at 1 because b->keys[0]->data is the memory we * struct btree_keys is embedded in struct btree, and struct
* allocated * bset_tree is embedded into struct btree_keys. They are all
* initialized as 0 by kzalloc() in mca_bucket_alloc(), and
* b->set[0].data is allocated in bch_btree_keys_alloc(), so we
* don't have to initialize b->set[].size and b->set[].data here
* any more.
*/ */
for (i = 1; i < MAX_BSETS; i++)
b->set[i].data = NULL;
} }
EXPORT_SYMBOL(bch_btree_keys_init); EXPORT_SYMBOL(bch_btree_keys_init);
@ -970,45 +967,25 @@ static struct bset_search_iter bset_search_tree(struct bset_tree *t,
unsigned int inorder, j, n = 1; unsigned int inorder, j, n = 1;
do { do {
/*
* A bit trick here.
* If p < t->size, (int)(p - t->size) is a minus value and
* the most significant bit is set, right shifting 31 bits
* gets 1. If p >= t->size, the most significant bit is
* not set, right shifting 31 bits gets 0.
* So the following 2 lines equals to
* if (p >= t->size)
* p = 0;
* but a branch instruction is avoided.
*/
unsigned int p = n << 4; unsigned int p = n << 4;
p &= ((int) (p - t->size)) >> 31; if (p < t->size)
prefetch(&t->tree[p]);
prefetch(&t->tree[p]);
j = n; j = n;
f = &t->tree[j]; f = &t->tree[j];
/* if (likely(f->exponent != 127)) {
* Similar bit trick, use subtract operation to avoid a branch if (f->mantissa >= bfloat_mantissa(search, f))
* instruction. n = j * 2;
* else
* n = (f->mantissa > bfloat_mantissa()) n = j * 2 + 1;
* ? j * 2 } else {
* : j * 2 + 1; if (bkey_cmp(tree_to_bkey(t, j), search) > 0)
* n = j * 2;
* We need to subtract 1 from f->mantissa for the sign bit trick else
* to work - that's done in make_bfloat() n = j * 2 + 1;
*/ }
if (likely(f->exponent != 127))
n = j * 2 + (((unsigned int)
(f->mantissa -
bfloat_mantissa(search, f))) >> 31);
else
n = (bkey_cmp(tree_to_bkey(t, j), search) > 0)
? j * 2
: j * 2 + 1;
} while (n < t->size); } while (n < t->size);
inorder = to_inorder(j, t); inorder = to_inorder(j, t);
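The rewrite above replaces two branch-avoidance bit tricks with plain conditionals (and simply skips the prefetch when p >= t->size instead of prefetching index 0). For the prefetch clamp specifically, the dropped expression p &= ((int)(p - t->size)) >> 31 zeroes p whenever p >= t->size. A standalone sketch, not from the patch, that checks the equivalence, assuming two's complement and the arithmetic right shift of negative values the kernel relies on:

#include <assert.h>

/* Branchless clamp as removed from bset_search_tree(): keep p when
 * p < size, otherwise force it to 0. */
static unsigned int clamp_branchless(unsigned int p, unsigned int size)
{
	p &= ((int)(p - size)) >> 31;	/* all-ones mask when p < size, else 0 */
	return p;
}

/* The readable form the patch switches to (modulo the prefetch call). */
static unsigned int clamp_branchy(unsigned int p, unsigned int size)
{
	return p < size ? p : 0;
}

int main(void)
{
	for (unsigned int size = 1; size < 64; size++)
		for (unsigned int p = 0; p < 4 * size; p++)
			assert(clamp_branchless(p, size) == clamp_branchy(p, size));
	return 0;
}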


@ -35,7 +35,7 @@
#include <linux/rcupdate.h> #include <linux/rcupdate.h>
#include <linux/sched/clock.h> #include <linux/sched/clock.h>
#include <linux/rculist.h> #include <linux/rculist.h>
#include <linux/delay.h>
#include <trace/events/bcache.h> #include <trace/events/bcache.h>
/* /*
@ -613,6 +613,10 @@ static void mca_data_alloc(struct btree *b, struct bkey *k, gfp_t gfp)
static struct btree *mca_bucket_alloc(struct cache_set *c, static struct btree *mca_bucket_alloc(struct cache_set *c,
struct bkey *k, gfp_t gfp) struct bkey *k, gfp_t gfp)
{ {
/*
* kzalloc() is necessary here for initialization,
* see code comments in bch_btree_keys_init().
*/
struct btree *b = kzalloc(sizeof(struct btree), gfp); struct btree *b = kzalloc(sizeof(struct btree), gfp);
if (!b) if (!b)
@ -655,7 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex); up(&b->io_mutex);
} }
retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock); mutex_lock(&b->write_lock);
/*
* If this btree node is selected in btree_flush_write() by journal
* code, delay and retry until the node is flushed by journal code
* and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
*/
if (btree_node_journal_flush(b)) {
pr_debug("bnode %p is flushing by journal, retry", b);
mutex_unlock(&b->write_lock);
udelay(1);
goto retry;
}
if (btree_node_dirty(b)) if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl); __bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock); mutex_unlock(&b->write_lock);
@ -778,10 +800,15 @@ void bch_btree_cache_free(struct cache_set *c)
while (!list_empty(&c->btree_cache)) { while (!list_empty(&c->btree_cache)) {
b = list_first_entry(&c->btree_cache, struct btree, list); b = list_first_entry(&c->btree_cache, struct btree, list);
if (btree_node_dirty(b)) /*
* This function is called by cache_set_free(), no I/O
* request on cache now, it is unnecessary to acquire
* b->write_lock before clearing BTREE_NODE_dirty anymore.
*/
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b)); btree_complete_write(b, btree_current_write(b));
clear_bit(BTREE_NODE_dirty, &b->flags); clear_bit(BTREE_NODE_dirty, &b->flags);
}
mca_data_free(b); mca_data_free(b);
} }
@ -1067,11 +1094,25 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root); BUG_ON(b == b->c->root);
retry:
mutex_lock(&b->write_lock); mutex_lock(&b->write_lock);
/*
* If the btree node is selected and being flushed in btree_flush_write(),
* delay and retry until the BTREE_NODE_journal_flush bit is cleared;
* only then is it safe to free the btree node here. Otherwise the free
* would race with the in-flight journal flush.
*/
if (btree_node_journal_flush(b)) {
mutex_unlock(&b->write_lock);
pr_debug("bnode %p journal_flush set, retry", b);
udelay(1);
goto retry;
}
if (btree_node_dirty(b)) if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b)); btree_complete_write(b, btree_current_write(b));
clear_bit(BTREE_NODE_dirty, &b->flags); clear_bit(BTREE_NODE_dirty, &b->flags);
}
mutex_unlock(&b->write_lock); mutex_unlock(&b->write_lock);


@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error, BTREE_NODE_io_error,
BTREE_NODE_dirty, BTREE_NODE_dirty,
BTREE_NODE_write_idx, BTREE_NODE_write_idx,
BTREE_NODE_journal_flush,
}; };
BTREE_FLAG(io_error); BTREE_FLAG(io_error);
BTREE_FLAG(dirty); BTREE_FLAG(dirty);
BTREE_FLAG(write_idx); BTREE_FLAG(write_idx);
BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b) static inline struct btree_write *btree_current_write(struct btree *b)
{ {


@ -58,6 +58,18 @@ void bch_count_backing_io_errors(struct cached_dev *dc, struct bio *bio)
WARN_ONCE(!dc, "NULL pointer of struct cached_dev"); WARN_ONCE(!dc, "NULL pointer of struct cached_dev");
/*
* Read-ahead requests on a degrading and recovering md raid
* (e.g. raid6) device might be failed immediately by md
* raid code, which is not a real hardware media failure. So
* we shouldn't count failed REQ_RAHEAD bio to dc->io_errors.
*/
if (bio->bi_opf & REQ_RAHEAD) {
pr_warn_ratelimited("%s: Read-ahead I/O failed on backing device, ignore",
dc->backing_dev_name);
return;
}
errors = atomic_add_return(1, &dc->io_errors); errors = atomic_add_return(1, &dc->io_errors);
if (errors < dc->error_limit) if (errors < dc->error_limit)
pr_err("%s: IO error on backing device, unrecoverable", pr_err("%s: IO error on backing device, unrecoverable",


@ -100,6 +100,20 @@ reread: left = ca->sb.bucket_size - offset;
blocks = set_blocks(j, block_bytes(ca->set)); blocks = set_blocks(j, block_bytes(ca->set));
/*
* Nodes in 'list' are in linear increasing order of
* i->j.seq, the node on head has the smallest (oldest)
* journal seq, the node on tail has the biggest
* (latest) journal seq.
*/
/*
* Check from the oldest jset for last_seq. If
* i->j.seq < j->last_seq, it means the oldest jset
* in list is expired and useless, remove it from
* this list. Otherwise, j is a candidate jset for
* further following checks.
*/
while (!list_empty(list)) { while (!list_empty(list)) {
i = list_first_entry(list, i = list_first_entry(list,
struct journal_replay, list); struct journal_replay, list);
@ -109,13 +123,22 @@ reread: left = ca->sb.bucket_size - offset;
kfree(i); kfree(i);
} }
/* iterate list in reverse order (from latest jset) */
list_for_each_entry_reverse(i, list, list) { list_for_each_entry_reverse(i, list, list) {
if (j->seq == i->j.seq) if (j->seq == i->j.seq)
goto next_set; goto next_set;
/*
* if j->seq is less than any i->j.last_seq
* in list, j is an expired and useless jset.
*/
if (j->seq < i->j.last_seq) if (j->seq < i->j.last_seq)
goto next_set; goto next_set;
/*
* 'where' points to the first jset in the list which
* is older than j.
*/
if (j->seq > i->j.seq) { if (j->seq > i->j.seq) {
where = &i->list; where = &i->list;
goto add; goto add;
@ -129,10 +152,12 @@ add:
if (!i) if (!i)
return -ENOMEM; return -ENOMEM;
memcpy(&i->j, j, bytes); memcpy(&i->j, j, bytes);
/* Add to the location after 'where' points to */
list_add(&i->list, where); list_add(&i->list, where);
ret = 1; ret = 1;
ja->seq[bucket_index] = j->seq; if (j->seq > ja->seq[bucket_index])
ja->seq[bucket_index] = j->seq;
next_set: next_set:
offset += blocks * ca->sb.block_size; offset += blocks * ca->sb.block_size;
len -= blocks * ca->sb.block_size; len -= blocks * ca->sb.block_size;
@ -268,7 +293,7 @@ bsearch:
struct journal_replay, struct journal_replay,
list)->j.seq; list)->j.seq;
return ret; return 0;
#undef read_bucket #undef read_bucket
} }
@ -391,60 +416,90 @@ err:
} }
/* Journalling */ /* Journalling */
#define journal_max_cmp(l, r) \
(fifo_idx(&c->journal.pin, btree_current_write(l)->journal) < \
fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
#define journal_min_cmp(l, r) \
(fifo_idx(&c->journal.pin, btree_current_write(l)->journal) > \
fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal))
static void btree_flush_write(struct cache_set *c) static void btree_flush_write(struct cache_set *c)
{ {
/* struct btree *b, *t, *btree_nodes[BTREE_FLUSH_NR];
* Try to find the btree node with that references the oldest journal unsigned int i, n;
* entry, best is our current candidate and is locked if non NULL:
*/ if (c->journal.btree_flushing)
struct btree *b; return;
int i;
spin_lock(&c->journal.flush_write_lock);
if (c->journal.btree_flushing) {
spin_unlock(&c->journal.flush_write_lock);
return;
}
c->journal.btree_flushing = true;
spin_unlock(&c->journal.flush_write_lock);
atomic_long_inc(&c->flush_write); atomic_long_inc(&c->flush_write);
memset(btree_nodes, 0, sizeof(btree_nodes));
n = 0;
retry: mutex_lock(&c->bucket_lock);
spin_lock(&c->journal.lock); list_for_each_entry_safe_reverse(b, t, &c->btree_cache, list) {
if (heap_empty(&c->flush_btree)) { if (btree_node_journal_flush(b))
for_each_cached_btree(b, c, i) pr_err("BUG: flush_write bit should not be set here!");
if (btree_current_write(b)->journal) {
if (!heap_full(&c->flush_btree))
heap_add(&c->flush_btree, b,
journal_max_cmp);
else if (journal_max_cmp(b,
heap_peek(&c->flush_btree))) {
c->flush_btree.data[0] = b;
heap_sift(&c->flush_btree, 0,
journal_max_cmp);
}
}
for (i = c->flush_btree.used / 2 - 1; i >= 0; --i)
heap_sift(&c->flush_btree, i, journal_min_cmp);
}
b = NULL;
heap_pop(&c->flush_btree, b, journal_min_cmp);
spin_unlock(&c->journal.lock);
if (b) {
mutex_lock(&b->write_lock); mutex_lock(&b->write_lock);
if (!btree_node_dirty(b)) {
mutex_unlock(&b->write_lock);
continue;
}
if (!btree_current_write(b)->journal) { if (!btree_current_write(b)->journal) {
mutex_unlock(&b->write_lock); mutex_unlock(&b->write_lock);
/* We raced */ continue;
atomic_long_inc(&c->retry_flush_write); }
goto retry;
set_btree_node_journal_flush(b);
mutex_unlock(&b->write_lock);
btree_nodes[n++] = b;
if (n == BTREE_FLUSH_NR)
break;
}
mutex_unlock(&c->bucket_lock);
for (i = 0; i < n; i++) {
b = btree_nodes[i];
if (!b) {
pr_err("BUG: btree_nodes[%d] is NULL", i);
continue;
}
/* safe to check without holding b->write_lock */
if (!btree_node_journal_flush(b)) {
pr_err("BUG: bnode %p: journal_flush bit cleaned", b);
continue;
}
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
pr_debug("bnode %p: written by others", b);
continue;
}
if (!btree_node_dirty(b)) {
clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
pr_debug("bnode %p: dirty bit cleaned by others", b);
continue;
} }
__bch_btree_node_write(b, NULL); __bch_btree_node_write(b, NULL);
clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock); mutex_unlock(&b->write_lock);
} }
spin_lock(&c->journal.flush_write_lock);
c->journal.btree_flushing = false;
spin_unlock(&c->journal.flush_write_lock);
} }
#define last_seq(j) ((j)->seq - fifo_used(&(j)->pin) + 1) #define last_seq(j) ((j)->seq - fifo_used(&(j)->pin) + 1)
@ -559,6 +614,7 @@ static void journal_reclaim(struct cache_set *c)
k->ptr[n++] = MAKE_PTR(0, k->ptr[n++] = MAKE_PTR(0,
bucket_to_sector(c, ca->sb.d[ja->cur_idx]), bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
ca->sb.nr_this_dev); ca->sb.nr_this_dev);
atomic_long_inc(&c->reclaimed_journal_buckets);
} }
if (n) { if (n) {
@ -811,6 +867,10 @@ atomic_t *bch_journal(struct cache_set *c,
struct journal_write *w; struct journal_write *w;
atomic_t *ret; atomic_t *ret;
/* No journaling if CACHE_SET_IO_DISABLE set already */
if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags)))
return NULL;
if (!CACHE_SYNC(&c->sb)) if (!CACHE_SYNC(&c->sb))
return NULL; return NULL;
@ -855,7 +915,6 @@ void bch_journal_free(struct cache_set *c)
free_pages((unsigned long) c->journal.w[1].data, JSET_BITS); free_pages((unsigned long) c->journal.w[1].data, JSET_BITS);
free_pages((unsigned long) c->journal.w[0].data, JSET_BITS); free_pages((unsigned long) c->journal.w[0].data, JSET_BITS);
free_fifo(&c->journal.pin); free_fifo(&c->journal.pin);
free_heap(&c->flush_btree);
} }
int bch_journal_alloc(struct cache_set *c) int bch_journal_alloc(struct cache_set *c)
@ -863,6 +922,7 @@ int bch_journal_alloc(struct cache_set *c)
struct journal *j = &c->journal; struct journal *j = &c->journal;
spin_lock_init(&j->lock); spin_lock_init(&j->lock);
spin_lock_init(&j->flush_write_lock);
INIT_DELAYED_WORK(&j->work, journal_write_work); INIT_DELAYED_WORK(&j->work, journal_write_work);
c->journal_delay_ms = 100; c->journal_delay_ms = 100;
@ -870,8 +930,7 @@ int bch_journal_alloc(struct cache_set *c)
j->w[0].c = c; j->w[0].c = c;
j->w[1].c = c; j->w[1].c = c;
if (!(init_heap(&c->flush_btree, 128, GFP_KERNEL)) || if (!(init_fifo(&j->pin, JOURNAL_PIN, GFP_KERNEL)) ||
!(init_fifo(&j->pin, JOURNAL_PIN, GFP_KERNEL)) ||
!(j->w[0].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)) || !(j->w[0].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)) ||
!(j->w[1].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS))) !(j->w[1].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)))
return -ENOMEM; return -ENOMEM;


@ -103,6 +103,8 @@ struct journal_write {
/* Embedded in struct cache_set */ /* Embedded in struct cache_set */
struct journal { struct journal {
spinlock_t lock; spinlock_t lock;
spinlock_t flush_write_lock;
bool btree_flushing;
/* used when waiting because the journal was full */ /* used when waiting because the journal was full */
struct closure_waitlist wait; struct closure_waitlist wait;
struct closure io; struct closure io;
@ -154,6 +156,8 @@ struct journal_device {
struct bio_vec bv[8]; struct bio_vec bv[8];
}; };
#define BTREE_FLUSH_NR 8
#define journal_pin_cmp(c, l, r) \ #define journal_pin_cmp(c, l, r) \
(fifo_idx(&(c)->journal.pin, (l)) > fifo_idx(&(c)->journal.pin, (r))) (fifo_idx(&(c)->journal.pin, (l)) > fifo_idx(&(c)->journal.pin, (r)))


@ -40,6 +40,7 @@ static const char invalid_uuid[] = {
static struct kobject *bcache_kobj; static struct kobject *bcache_kobj;
struct mutex bch_register_lock; struct mutex bch_register_lock;
bool bcache_is_reboot;
LIST_HEAD(bch_cache_sets); LIST_HEAD(bch_cache_sets);
static LIST_HEAD(uncached_devices); static LIST_HEAD(uncached_devices);
@ -49,6 +50,7 @@ static wait_queue_head_t unregister_wait;
struct workqueue_struct *bcache_wq; struct workqueue_struct *bcache_wq;
struct workqueue_struct *bch_journal_wq; struct workqueue_struct *bch_journal_wq;
#define BTREE_MAX_PAGES (256 * 1024 / PAGE_SIZE) #define BTREE_MAX_PAGES (256 * 1024 / PAGE_SIZE)
/* limitation of partitions number on single bcache device */ /* limitation of partitions number on single bcache device */
#define BCACHE_MINORS 128 #define BCACHE_MINORS 128
@ -197,7 +199,9 @@ err:
static void write_bdev_super_endio(struct bio *bio) static void write_bdev_super_endio(struct bio *bio)
{ {
struct cached_dev *dc = bio->bi_private; struct cached_dev *dc = bio->bi_private;
/* XXX: error checking */
if (bio->bi_status)
bch_count_backing_io_errors(dc, bio);
closure_put(&dc->sb_write); closure_put(&dc->sb_write);
} }
@ -691,6 +695,7 @@ static void bcache_device_link(struct bcache_device *d, struct cache_set *c,
{ {
unsigned int i; unsigned int i;
struct cache *ca; struct cache *ca;
int ret;
for_each_cache(ca, d->c, i) for_each_cache(ca, d->c, i)
bd_link_disk_holder(ca->bdev, d->disk); bd_link_disk_holder(ca->bdev, d->disk);
@ -698,9 +703,13 @@ static void bcache_device_link(struct bcache_device *d, struct cache_set *c,
snprintf(d->name, BCACHEDEVNAME_SIZE, snprintf(d->name, BCACHEDEVNAME_SIZE,
"%s%u", name, d->id); "%s%u", name, d->id);
WARN(sysfs_create_link(&d->kobj, &c->kobj, "cache") || ret = sysfs_create_link(&d->kobj, &c->kobj, "cache");
sysfs_create_link(&c->kobj, &d->kobj, d->name), if (ret < 0)
"Couldn't create device <-> cache set symlinks"); pr_err("Couldn't create device -> cache set symlink");
ret = sysfs_create_link(&c->kobj, &d->kobj, d->name);
if (ret < 0)
pr_err("Couldn't create cache set -> device symlink");
clear_bit(BCACHE_DEV_UNLINK_DONE, &d->flags); clear_bit(BCACHE_DEV_UNLINK_DONE, &d->flags);
} }
@ -908,7 +917,7 @@ static int cached_dev_status_update(void *arg)
} }
void bch_cached_dev_run(struct cached_dev *dc) int bch_cached_dev_run(struct cached_dev *dc)
{ {
struct bcache_device *d = &dc->disk; struct bcache_device *d = &dc->disk;
char *buf = kmemdup_nul(dc->sb.label, SB_LABEL_SIZE, GFP_KERNEL); char *buf = kmemdup_nul(dc->sb.label, SB_LABEL_SIZE, GFP_KERNEL);
@ -919,11 +928,19 @@ void bch_cached_dev_run(struct cached_dev *dc)
NULL, NULL,
}; };
if (dc->io_disable) {
pr_err("I/O disabled on cached dev %s",
dc->backing_dev_name);
return -EIO;
}
if (atomic_xchg(&dc->running, 1)) { if (atomic_xchg(&dc->running, 1)) {
kfree(env[1]); kfree(env[1]);
kfree(env[2]); kfree(env[2]);
kfree(buf); kfree(buf);
return; pr_info("cached dev %s is running already",
dc->backing_dev_name);
return -EBUSY;
} }
if (!d->c && if (!d->c &&
@ -949,8 +966,11 @@ void bch_cached_dev_run(struct cached_dev *dc)
kfree(buf); kfree(buf);
if (sysfs_create_link(&d->kobj, &disk_to_dev(d->disk)->kobj, "dev") || if (sysfs_create_link(&d->kobj, &disk_to_dev(d->disk)->kobj, "dev") ||
sysfs_create_link(&disk_to_dev(d->disk)->kobj, &d->kobj, "bcache")) sysfs_create_link(&disk_to_dev(d->disk)->kobj,
pr_debug("error creating sysfs link"); &d->kobj, "bcache")) {
pr_err("Couldn't create bcache dev <-> disk sysfs symlinks");
return -ENOMEM;
}
dc->status_update_thread = kthread_run(cached_dev_status_update, dc->status_update_thread = kthread_run(cached_dev_status_update,
dc, "bcache_status_update"); dc, "bcache_status_update");
@ -959,6 +979,8 @@ void bch_cached_dev_run(struct cached_dev *dc)
"continue to run without monitoring backing " "continue to run without monitoring backing "
"device status"); "device status");
} }
return 0;
} }
/* /*
@ -996,7 +1018,6 @@ static void cached_dev_detach_finish(struct work_struct *w)
BUG_ON(!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)); BUG_ON(!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags));
BUG_ON(refcount_read(&dc->count)); BUG_ON(refcount_read(&dc->count));
mutex_lock(&bch_register_lock);
if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
cancel_writeback_rate_update_dwork(dc); cancel_writeback_rate_update_dwork(dc);
@ -1012,6 +1033,8 @@ static void cached_dev_detach_finish(struct work_struct *w)
bch_write_bdev_super(dc, &cl); bch_write_bdev_super(dc, &cl);
closure_sync(&cl); closure_sync(&cl);
mutex_lock(&bch_register_lock);
calc_cached_dev_sectors(dc->disk.c); calc_cached_dev_sectors(dc->disk.c);
bcache_device_detach(&dc->disk); bcache_device_detach(&dc->disk);
list_move(&dc->list, &uncached_devices); list_move(&dc->list, &uncached_devices);
@ -1054,6 +1077,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
uint32_t rtime = cpu_to_le32((u32)ktime_get_real_seconds()); uint32_t rtime = cpu_to_le32((u32)ktime_get_real_seconds());
struct uuid_entry *u; struct uuid_entry *u;
struct cached_dev *exist_dc, *t; struct cached_dev *exist_dc, *t;
int ret = 0;
if ((set_uuid && memcmp(set_uuid, c->sb.set_uuid, 16)) || if ((set_uuid && memcmp(set_uuid, c->sb.set_uuid, 16)) ||
(!set_uuid && memcmp(dc->sb.set_uuid, c->sb.set_uuid, 16))) (!set_uuid && memcmp(dc->sb.set_uuid, c->sb.set_uuid, 16)))
@ -1153,6 +1177,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
down_write(&dc->writeback_lock); down_write(&dc->writeback_lock);
if (bch_cached_dev_writeback_start(dc)) { if (bch_cached_dev_writeback_start(dc)) {
up_write(&dc->writeback_lock); up_write(&dc->writeback_lock);
pr_err("Couldn't start writeback facilities for %s",
dc->disk.disk->disk_name);
return -ENOMEM; return -ENOMEM;
} }
@ -1163,7 +1189,22 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
bch_sectors_dirty_init(&dc->disk); bch_sectors_dirty_init(&dc->disk);
bch_cached_dev_run(dc); ret = bch_cached_dev_run(dc);
if (ret && (ret != -EBUSY)) {
up_write(&dc->writeback_lock);
/*
* bch_register_lock is held, bcache_device_stop() is not
* able to be directly called. The kthread and kworker
* created previously in bch_cached_dev_writeback_start()
* have to be stopped manually here.
*/
kthread_stop(dc->writeback_thread);
cancel_writeback_rate_update_dwork(dc);
pr_err("Couldn't run cached device %s",
dc->backing_dev_name);
return ret;
}
bcache_device_link(&dc->disk, c, "bdev"); bcache_device_link(&dc->disk, c, "bdev");
atomic_inc(&c->attached_dev_nr); atomic_inc(&c->attached_dev_nr);
@ -1190,18 +1231,16 @@ static void cached_dev_free(struct closure *cl)
{ {
struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl); struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);
mutex_lock(&bch_register_lock);
if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
cancel_writeback_rate_update_dwork(dc); cancel_writeback_rate_update_dwork(dc);
if (!IS_ERR_OR_NULL(dc->writeback_thread)) if (!IS_ERR_OR_NULL(dc->writeback_thread))
kthread_stop(dc->writeback_thread); kthread_stop(dc->writeback_thread);
if (dc->writeback_write_wq)
destroy_workqueue(dc->writeback_write_wq);
if (!IS_ERR_OR_NULL(dc->status_update_thread)) if (!IS_ERR_OR_NULL(dc->status_update_thread))
kthread_stop(dc->status_update_thread); kthread_stop(dc->status_update_thread);
mutex_lock(&bch_register_lock);
if (atomic_read(&dc->running)) if (atomic_read(&dc->running))
bd_unlink_disk_holder(dc->bdev, dc->disk.disk); bd_unlink_disk_holder(dc->bdev, dc->disk.disk);
bcache_device_free(&dc->disk); bcache_device_free(&dc->disk);
@ -1290,6 +1329,7 @@ static int register_bdev(struct cache_sb *sb, struct page *sb_page,
{ {
const char *err = "cannot allocate memory"; const char *err = "cannot allocate memory";
struct cache_set *c; struct cache_set *c;
int ret = -ENOMEM;
bdevname(bdev, dc->backing_dev_name); bdevname(bdev, dc->backing_dev_name);
memcpy(&dc->sb, sb, sizeof(struct cache_sb)); memcpy(&dc->sb, sb, sizeof(struct cache_sb));
@ -1319,14 +1359,18 @@ static int register_bdev(struct cache_sb *sb, struct page *sb_page,
bch_cached_dev_attach(dc, c, NULL); bch_cached_dev_attach(dc, c, NULL);
if (BDEV_STATE(&dc->sb) == BDEV_STATE_NONE || if (BDEV_STATE(&dc->sb) == BDEV_STATE_NONE ||
BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) {
bch_cached_dev_run(dc); err = "failed to run cached device";
ret = bch_cached_dev_run(dc);
if (ret)
goto err;
}
return 0; return 0;
err: err:
pr_notice("error %s: %s", dc->backing_dev_name, err); pr_notice("error %s: %s", dc->backing_dev_name, err);
bcache_device_stop(&dc->disk); bcache_device_stop(&dc->disk);
return -EIO; return ret;
} }
/* Flash only volumes */ /* Flash only volumes */
@ -1437,8 +1481,6 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size)
bool bch_cached_dev_error(struct cached_dev *dc) bool bch_cached_dev_error(struct cached_dev *dc)
{ {
struct cache_set *c;
if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags)) if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags))
return false; return false;
@ -1449,21 +1491,6 @@ bool bch_cached_dev_error(struct cached_dev *dc)
pr_err("stop %s: too many IO errors on backing device %s\n", pr_err("stop %s: too many IO errors on backing device %s\n",
dc->disk.disk->disk_name, dc->backing_dev_name); dc->disk.disk->disk_name, dc->backing_dev_name);
/*
* If the cached device is still attached to a cache set,
* even dc->io_disable is true and no more I/O requests
* accepted, cache device internal I/O (writeback scan or
* garbage collection) may still prevent bcache device from
* being stopped. So here CACHE_SET_IO_DISABLE should be
* set to c->flags too, to make the internal I/O to cache
* device rejected and stopped immediately.
* If c is NULL, that means the bcache device is not attached
* to any cache set, then no CACHE_SET_IO_DISABLE bit to set.
*/
c = dc->disk.c;
if (c && test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags))
pr_info("CACHE_SET_IO_DISABLE already set");
bcache_device_stop(&dc->disk); bcache_device_stop(&dc->disk);
return true; return true;
} }
@ -1564,19 +1591,23 @@ static void cache_set_flush(struct closure *cl)
kobject_put(&c->internal); kobject_put(&c->internal);
kobject_del(&c->kobj); kobject_del(&c->kobj);
if (c->gc_thread) if (!IS_ERR_OR_NULL(c->gc_thread))
kthread_stop(c->gc_thread); kthread_stop(c->gc_thread);
if (!IS_ERR_OR_NULL(c->root)) if (!IS_ERR_OR_NULL(c->root))
list_add(&c->root->list, &c->btree_cache); list_add(&c->root->list, &c->btree_cache);
/* Should skip this if we're unregistering because of an error */ /*
list_for_each_entry(b, &c->btree_cache, list) { * Avoid flushing cached nodes if cache set is retiring
mutex_lock(&b->write_lock); * due to too many I/O errors detected.
if (btree_node_dirty(b)) */
__bch_btree_node_write(b, NULL); if (!test_bit(CACHE_SET_IO_DISABLE, &c->flags))
mutex_unlock(&b->write_lock); list_for_each_entry(b, &c->btree_cache, list) {
} mutex_lock(&b->write_lock);
if (btree_node_dirty(b))
__bch_btree_node_write(b, NULL);
mutex_unlock(&b->write_lock);
}
for_each_cache(ca, c, i) for_each_cache(ca, c, i)
if (ca->alloc_thread) if (ca->alloc_thread)
@ -1849,6 +1880,23 @@ static int run_cache_set(struct cache_set *c)
if (bch_btree_check(c)) if (bch_btree_check(c))
goto err; goto err;
/*
* bch_btree_check() may occupy too much system memory which
* has negative effects on user space applications (e.g. database)
* performance. Shrink the mca cache memory proactively here to
* avoid competing for memory with user space workloads.
*/
if (!c->shrinker_disabled) {
struct shrink_control sc;
sc.gfp_mask = GFP_KERNEL;
sc.nr_to_scan = c->btree_cache_used * c->btree_pages;
/* first run to clear b->accessed tag */
c->shrink.scan_objects(&c->shrink, &sc);
/* second run to reap non-accessed nodes */
c->shrink.scan_objects(&c->shrink, &sc);
}
bch_journal_mark(c, &journal); bch_journal_mark(c, &journal);
bch_initial_gc_finish(c); bch_initial_gc_finish(c);
pr_debug("btree_check() done"); pr_debug("btree_check() done");
@ -1957,7 +2005,7 @@ err:
} }
closure_sync(&cl); closure_sync(&cl);
/* XXX: test this, it's broken */
bch_cache_set_error(c, "%s", err); bch_cache_set_error(c, "%s", err);
return -EIO; return -EIO;
@ -2251,9 +2299,13 @@ err:
static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
const char *buffer, size_t size); const char *buffer, size_t size);
static ssize_t bch_pending_bdevs_cleanup(struct kobject *k,
struct kobj_attribute *attr,
const char *buffer, size_t size);
kobj_attribute_write(register, register_bcache); kobj_attribute_write(register, register_bcache);
kobj_attribute_write(register_quiet, register_bcache); kobj_attribute_write(register_quiet, register_bcache);
kobj_attribute_write(pendings_cleanup, bch_pending_bdevs_cleanup);
static bool bch_is_open_backing(struct block_device *bdev) static bool bch_is_open_backing(struct block_device *bdev)
{ {
@ -2301,6 +2353,11 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
if (!try_module_get(THIS_MODULE)) if (!try_module_get(THIS_MODULE))
return -EBUSY; return -EBUSY;
/* For latest state of bcache_is_reboot */
smp_mb();
if (bcache_is_reboot)
return -EBUSY;
path = kstrndup(buffer, size, GFP_KERNEL); path = kstrndup(buffer, size, GFP_KERNEL);
if (!path) if (!path)
goto err; goto err;
@ -2378,8 +2435,61 @@ err:
goto out; goto out;
} }
struct pdev {
struct list_head list;
struct cached_dev *dc;
};
static ssize_t bch_pending_bdevs_cleanup(struct kobject *k,
struct kobj_attribute *attr,
const char *buffer,
size_t size)
{
LIST_HEAD(pending_devs);
ssize_t ret = size;
struct cached_dev *dc, *tdc;
struct pdev *pdev, *tpdev;
struct cache_set *c, *tc;
mutex_lock(&bch_register_lock);
list_for_each_entry_safe(dc, tdc, &uncached_devices, list) {
pdev = kmalloc(sizeof(struct pdev), GFP_KERNEL);
if (!pdev)
break;
pdev->dc = dc;
list_add(&pdev->list, &pending_devs);
}
list_for_each_entry_safe(pdev, tpdev, &pending_devs, list) {
list_for_each_entry_safe(c, tc, &bch_cache_sets, list) {
char *pdev_set_uuid = pdev->dc->sb.set_uuid;
char *set_uuid = c->sb.uuid;
if (!memcmp(pdev_set_uuid, set_uuid, 16)) {
list_del(&pdev->list);
kfree(pdev);
break;
}
}
}
mutex_unlock(&bch_register_lock);
list_for_each_entry_safe(pdev, tpdev, &pending_devs, list) {
pr_info("delete pdev %p", pdev);
list_del(&pdev->list);
bcache_device_stop(&pdev->dc->disk);
kfree(pdev);
}
return ret;
}
static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x) static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x)
{ {
if (bcache_is_reboot)
return NOTIFY_DONE;
if (code == SYS_DOWN || if (code == SYS_DOWN ||
code == SYS_HALT || code == SYS_HALT ||
code == SYS_POWER_OFF) { code == SYS_POWER_OFF) {
@ -2392,19 +2502,45 @@ static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x)
mutex_lock(&bch_register_lock); mutex_lock(&bch_register_lock);
if (bcache_is_reboot)
goto out;
/* New registration is rejected since now */
bcache_is_reboot = true;
/*
* Make registering caller (if there is) on other CPU
* core know bcache_is_reboot set to true earlier
*/
smp_mb();
if (list_empty(&bch_cache_sets) && if (list_empty(&bch_cache_sets) &&
list_empty(&uncached_devices)) list_empty(&uncached_devices))
goto out; goto out;
mutex_unlock(&bch_register_lock);
pr_info("Stopping all devices:"); pr_info("Stopping all devices:");
/*
* The reason bch_register_lock is not held to call
* bch_cache_set_stop() and bcache_device_stop() is to
* avoid potential deadlock during reboot, because cache
* set or bcache device stopping process will acquire
* bch_register_lock too.
*
* We are safe here because bcache_is_reboot sets to
* true already, register_bcache() will reject new
* registration now. bcache_is_reboot also makes sure
* bcache_reboot() won't be re-entered by another thread,
* so there is no race in following list iteration by
* list_for_each_entry_safe().
*/
list_for_each_entry_safe(c, tc, &bch_cache_sets, list) list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
bch_cache_set_stop(c); bch_cache_set_stop(c);
list_for_each_entry_safe(dc, tdc, &uncached_devices, list) list_for_each_entry_safe(dc, tdc, &uncached_devices, list)
bcache_device_stop(&dc->disk); bcache_device_stop(&dc->disk);
mutex_unlock(&bch_register_lock);
/* /*
* Give an early chance for other kthreads and * Give an early chance for other kthreads and
@ -2496,6 +2632,7 @@ static int __init bcache_init(void)
static const struct attribute *files[] = { static const struct attribute *files[] = {
&ksysfs_register.attr, &ksysfs_register.attr,
&ksysfs_register_quiet.attr, &ksysfs_register_quiet.attr,
&ksysfs_pendings_cleanup.attr,
NULL NULL
}; };
@ -2531,6 +2668,8 @@ static int __init bcache_init(void)
bch_debug_init(); bch_debug_init();
closure_debug_init(); closure_debug_init();
bcache_is_reboot = false;
return 0; return 0;
err: err:
bcache_exit(); bcache_exit();


@ -16,33 +16,31 @@
#include <linux/sort.h> #include <linux/sort.h>
#include <linux/sched/clock.h> #include <linux/sched/clock.h>
extern bool bcache_is_reboot;
/* Default is 0 ("writethrough") */ /* Default is 0 ("writethrough") */
static const char * const bch_cache_modes[] = { static const char * const bch_cache_modes[] = {
"writethrough", "writethrough",
"writeback", "writeback",
"writearound", "writearound",
"none", "none"
NULL
}; };
/* Default is 0 ("auto") */ /* Default is 0 ("auto") */
static const char * const bch_stop_on_failure_modes[] = { static const char * const bch_stop_on_failure_modes[] = {
"auto", "auto",
"always", "always"
NULL
}; };
static const char * const cache_replacement_policies[] = { static const char * const cache_replacement_policies[] = {
"lru", "lru",
"fifo", "fifo",
"random", "random"
NULL
}; };
static const char * const error_actions[] = { static const char * const error_actions[] = {
"unregister", "unregister",
"panic", "panic"
NULL
}; };
write_attribute(attach); write_attribute(attach);
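The trailing NULL sentinels disappear because the store hooks below move from __sysfs_match_string(array, -1, buf), which walks entries until it hits a NULL, to sysfs_match_string(array, buf), which (as of this kernel) is a macro passing ARRAY_SIZE(array) itself. A minimal in-kernel sketch of the resulting pattern, paraphrasing the helper from include/linux/string.h rather than code from this series:

#include <linux/string.h>	/* sysfs_match_string() */

/* Sketch only: with sysfs_match_string() the table needs no NULL
 * terminator, since ARRAY_SIZE() bounds the search at compile time. */
static const char * const demo_modes[] = {
	"writethrough", "writeback", "writearound", "none"
};

static int demo_store_mode(const char *buf)
{
	/* returns the matched index, or -EINVAL for an unknown string */
	return sysfs_match_string(demo_modes, buf);
}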
@ -84,8 +82,8 @@ read_attribute(bset_tree_stats);
read_attribute(state); read_attribute(state);
read_attribute(cache_read_races); read_attribute(cache_read_races);
read_attribute(reclaim); read_attribute(reclaim);
read_attribute(reclaimed_journal_buckets);
read_attribute(flush_write); read_attribute(flush_write);
read_attribute(retry_flush_write);
read_attribute(writeback_keys_done); read_attribute(writeback_keys_done);
read_attribute(writeback_keys_failed); read_attribute(writeback_keys_failed);
read_attribute(io_errors); read_attribute(io_errors);
@ -180,7 +178,7 @@ SHOW(__bch_cached_dev)
var_print(writeback_percent); var_print(writeback_percent);
sysfs_hprint(writeback_rate, sysfs_hprint(writeback_rate,
wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0); wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
sysfs_hprint(io_errors, atomic_read(&dc->io_errors)); sysfs_printf(io_errors, "%i", atomic_read(&dc->io_errors));
sysfs_printf(io_error_limit, "%i", dc->error_limit); sysfs_printf(io_error_limit, "%i", dc->error_limit);
sysfs_printf(io_disable, "%i", dc->io_disable); sysfs_printf(io_disable, "%i", dc->io_disable);
var_print(writeback_rate_update_seconds); var_print(writeback_rate_update_seconds);
@ -271,6 +269,10 @@ STORE(__cached_dev)
struct cache_set *c; struct cache_set *c;
struct kobj_uevent_env *env; struct kobj_uevent_env *env;
/* no user space access if system is rebooting */
if (bcache_is_reboot)
return -EBUSY;
#define d_strtoul(var) sysfs_strtoul(var, dc->var) #define d_strtoul(var) sysfs_strtoul(var, dc->var)
#define d_strtoul_nonzero(var) sysfs_strtoul_clamp(var, dc->var, 1, INT_MAX) #define d_strtoul_nonzero(var) sysfs_strtoul_clamp(var, dc->var, 1, INT_MAX)
#define d_strtoi_h(var) sysfs_hatoi(var, dc->var) #define d_strtoi_h(var) sysfs_hatoi(var, dc->var)
@ -329,11 +331,14 @@ STORE(__cached_dev)
bch_cache_accounting_clear(&dc->accounting); bch_cache_accounting_clear(&dc->accounting);
if (attr == &sysfs_running && if (attr == &sysfs_running &&
strtoul_or_return(buf)) strtoul_or_return(buf)) {
bch_cached_dev_run(dc); v = bch_cached_dev_run(dc);
if (v)
return v;
}
if (attr == &sysfs_cache_mode) { if (attr == &sysfs_cache_mode) {
v = __sysfs_match_string(bch_cache_modes, -1, buf); v = sysfs_match_string(bch_cache_modes, buf);
if (v < 0) if (v < 0)
return v; return v;
@ -344,7 +349,7 @@ STORE(__cached_dev)
} }
if (attr == &sysfs_stop_when_cache_set_failed) { if (attr == &sysfs_stop_when_cache_set_failed) {
v = __sysfs_match_string(bch_stop_on_failure_modes, -1, buf); v = sysfs_match_string(bch_stop_on_failure_modes, buf);
if (v < 0) if (v < 0)
return v; return v;
@ -408,6 +413,10 @@ STORE(bch_cached_dev)
struct cached_dev *dc = container_of(kobj, struct cached_dev, struct cached_dev *dc = container_of(kobj, struct cached_dev,
disk.kobj); disk.kobj);
/* no user space access if system is rebooting */
if (bcache_is_reboot)
return -EBUSY;
mutex_lock(&bch_register_lock); mutex_lock(&bch_register_lock);
size = __cached_dev_store(kobj, attr, buf, size); size = __cached_dev_store(kobj, attr, buf, size);
@ -464,7 +473,7 @@ static struct attribute *bch_cached_dev_files[] = {
&sysfs_writeback_rate_p_term_inverse, &sysfs_writeback_rate_p_term_inverse,
&sysfs_writeback_rate_minimum, &sysfs_writeback_rate_minimum,
&sysfs_writeback_rate_debug, &sysfs_writeback_rate_debug,
&sysfs_errors, &sysfs_io_errors,
&sysfs_io_error_limit, &sysfs_io_error_limit,
&sysfs_io_disable, &sysfs_io_disable,
&sysfs_dirty_data, &sysfs_dirty_data,
@ -511,6 +520,10 @@ STORE(__bch_flash_dev)
kobj); kobj);
struct uuid_entry *u = &d->c->uuids[d->id]; struct uuid_entry *u = &d->c->uuids[d->id];
/* no user space access if system is rebooting */
if (bcache_is_reboot)
return -EBUSY;
sysfs_strtoul(data_csum, d->data_csum); sysfs_strtoul(data_csum, d->data_csum);
if (attr == &sysfs_size) { if (attr == &sysfs_size) {
@ -693,12 +706,12 @@ SHOW(__bch_cache_set)
sysfs_print(reclaim, sysfs_print(reclaim,
atomic_long_read(&c->reclaim)); atomic_long_read(&c->reclaim));
sysfs_print(reclaimed_journal_buckets,
atomic_long_read(&c->reclaimed_journal_buckets));
sysfs_print(flush_write, sysfs_print(flush_write,
atomic_long_read(&c->flush_write)); atomic_long_read(&c->flush_write));
sysfs_print(retry_flush_write,
atomic_long_read(&c->retry_flush_write));
sysfs_print(writeback_keys_done, sysfs_print(writeback_keys_done,
atomic_long_read(&c->writeback_keys_done)); atomic_long_read(&c->writeback_keys_done));
sysfs_print(writeback_keys_failed, sysfs_print(writeback_keys_failed,
@ -746,6 +759,10 @@ STORE(__bch_cache_set)
struct cache_set *c = container_of(kobj, struct cache_set, kobj); struct cache_set *c = container_of(kobj, struct cache_set, kobj);
ssize_t v; ssize_t v;
/* no user space access if system is rebooting */
if (bcache_is_reboot)
return -EBUSY;
if (attr == &sysfs_unregister) if (attr == &sysfs_unregister)
bch_cache_set_unregister(c); bch_cache_set_unregister(c);
@ -799,7 +816,7 @@ STORE(__bch_cache_set)
0, UINT_MAX); 0, UINT_MAX);
if (attr == &sysfs_errors) { if (attr == &sysfs_errors) {
v = __sysfs_match_string(error_actions, -1, buf); v = sysfs_match_string(error_actions, buf);
if (v < 0) if (v < 0)
return v; return v;
@ -865,6 +882,10 @@ STORE(bch_cache_set_internal)
{ {
struct cache_set *c = container_of(kobj, struct cache_set, internal); struct cache_set *c = container_of(kobj, struct cache_set, internal);
/* no user space access if system is rebooting */
if (bcache_is_reboot)
return -EBUSY;
return bch_cache_set_store(&c->kobj, attr, buf, size); return bch_cache_set_store(&c->kobj, attr, buf, size);
} }
@ -914,8 +935,8 @@ static struct attribute *bch_cache_set_internal_files[] = {
&sysfs_bset_tree_stats, &sysfs_bset_tree_stats,
&sysfs_cache_read_races, &sysfs_cache_read_races,
&sysfs_reclaim, &sysfs_reclaim,
&sysfs_reclaimed_journal_buckets,
&sysfs_flush_write, &sysfs_flush_write,
&sysfs_retry_flush_write,
&sysfs_writeback_keys_done, &sysfs_writeback_keys_done,
&sysfs_writeback_keys_failed, &sysfs_writeback_keys_failed,
@ -1050,6 +1071,10 @@ STORE(__bch_cache)
struct cache *ca = container_of(kobj, struct cache, kobj); struct cache *ca = container_of(kobj, struct cache, kobj);
ssize_t v; ssize_t v;
/* no user space access if system is rebooting */
if (bcache_is_reboot)
return -EBUSY;
if (attr == &sysfs_discard) { if (attr == &sysfs_discard) {
bool v = strtoul_or_return(buf); bool v = strtoul_or_return(buf);
@ -1063,7 +1088,7 @@ STORE(__bch_cache)
} }
if (attr == &sysfs_cache_replacement_policy) { if (attr == &sysfs_cache_replacement_policy) {
v = __sysfs_match_string(cache_replacement_policies, -1, buf); v = sysfs_match_string(cache_replacement_policies, buf);
if (v < 0) if (v < 0)
return v; return v;
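
The two hunks above replace the open-coded __sysfs_match_string(table, -1, buf) calls with the sysfs_match_string() macro, which takes the table size from ARRAY_SIZE() instead of a -1 sentinel. A rough user-space illustration of the same matching idea (hypothetical stand-ins, not the kernel helper), tolerating the trailing newline that sysfs writes usually carry:

#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the error_actions table in the bcache code. */
static const char *const error_actions[] = { "unregister", "panic", NULL };

/* Return the matching index, or -1 if the buffer names no known action. */
static int match_option(const char *const *options, const char *buf)
{
	size_t len = strcspn(buf, "\n");	/* ignore a trailing newline */

	for (int i = 0; options[i]; i++)
		if (strlen(options[i]) == len && !strncmp(options[i], buf, len))
			return i;
	return -1;
}

int main(void)
{
	printf("%d\n", match_option(error_actions, "panic\n"));	/* 1  */
	printf("%d\n", match_option(error_actions, "reboot\n"));	/* -1 */
	return 0;
}
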


@ -113,8 +113,6 @@ do { \
#define heap_full(h) ((h)->used == (h)->size) #define heap_full(h) ((h)->used == (h)->size)
#define heap_empty(h) ((h)->used == 0)
#define DECLARE_FIFO(type, name) \ #define DECLARE_FIFO(type, name) \
struct { \ struct { \
size_t front, back, size, mask; \ size_t front, back, size, mask; \


@ -122,6 +122,9 @@ static void __update_writeback_rate(struct cached_dev *dc)
static bool set_at_max_writeback_rate(struct cache_set *c, static bool set_at_max_writeback_rate(struct cache_set *c,
struct cached_dev *dc) struct cached_dev *dc)
{ {
/* Don't set max writeback rate if gc is running */
if (!c->gc_mark_valid)
return false;
/* /*
* Idle_counter is increased everytime when update_writeback_rate() is * Idle_counter is increased everytime when update_writeback_rate() is
* called. If all backing devices attached to the same cache set have * called. If all backing devices attached to the same cache set have
@ -735,6 +738,10 @@ static int bch_writeback_thread(void *arg)
} }
} }
if (dc->writeback_write_wq) {
flush_workqueue(dc->writeback_write_wq);
destroy_workqueue(dc->writeback_write_wq);
}
cached_dev_put(dc); cached_dev_put(dc);
wait_for_kthread_stop(); wait_for_kthread_stop();
@ -830,6 +837,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback"); "bcache_writeback");
if (IS_ERR(dc->writeback_thread)) { if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc); cached_dev_put(dc);
destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread); return PTR_ERR(dc->writeback_thread);
} }
dc->writeback_running = true; dc->writeback_running = true;


@ -1790,6 +1790,8 @@ void md_bitmap_destroy(struct mddev *mddev)
return; return;
md_bitmap_wait_behind_writes(mddev); md_bitmap_wait_behind_writes(mddev);
mempool_destroy(mddev->wb_info_pool);
mddev->wb_info_pool = NULL;
mutex_lock(&mddev->bitmap_info.mutex); mutex_lock(&mddev->bitmap_info.mutex);
spin_lock(&mddev->lock); spin_lock(&mddev->lock);
@ -1900,10 +1902,14 @@ int md_bitmap_load(struct mddev *mddev)
sector_t start = 0; sector_t start = 0;
sector_t sector = 0; sector_t sector = 0;
struct bitmap *bitmap = mddev->bitmap; struct bitmap *bitmap = mddev->bitmap;
struct md_rdev *rdev;
if (!bitmap) if (!bitmap)
goto out; goto out;
rdev_for_each(rdev, mddev)
mddev_create_wb_pool(mddev, rdev, true);
if (mddev_is_clustered(mddev)) if (mddev_is_clustered(mddev))
md_cluster_ops->load_bitmaps(mddev, mddev->bitmap_info.nodes); md_cluster_ops->load_bitmaps(mddev, mddev->bitmap_info.nodes);
@ -2462,12 +2468,26 @@ static ssize_t
backlog_store(struct mddev *mddev, const char *buf, size_t len) backlog_store(struct mddev *mddev, const char *buf, size_t len)
{ {
unsigned long backlog; unsigned long backlog;
unsigned long old_mwb = mddev->bitmap_info.max_write_behind;
int rv = kstrtoul(buf, 10, &backlog); int rv = kstrtoul(buf, 10, &backlog);
if (rv) if (rv)
return rv; return rv;
if (backlog > COUNTER_MAX) if (backlog > COUNTER_MAX)
return -EINVAL; return -EINVAL;
mddev->bitmap_info.max_write_behind = backlog; mddev->bitmap_info.max_write_behind = backlog;
if (!backlog && mddev->wb_info_pool) {
/* wb_info_pool is not needed if backlog is zero */
mempool_destroy(mddev->wb_info_pool);
mddev->wb_info_pool = NULL;
} else if (backlog && !mddev->wb_info_pool) {
/* wb_info_pool is needed since backlog is not zero */
struct md_rdev *rdev;
rdev_for_each(rdev, mddev)
mddev_create_wb_pool(mddev, rdev, false);
}
if (old_mwb != backlog)
md_bitmap_update_sb(mddev->bitmap);
return len; return len;
} }
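
With this change, backlog_store() creates the write-behind info pool the first time max_write_behind goes from zero to nonzero and frees it when it is written back to zero. A toy user-space sketch of that lazy create/teardown around a threshold (invented pool type, no locking, not the md code):

#include <stdio.h>
#include <stdlib.h>

static void *wb_pool;			/* stands in for mddev->wb_info_pool */
static unsigned long max_write_behind;

/* Mirror of the store logic: keep the pool allocated iff backlog != 0. */
static int set_backlog(unsigned long backlog)
{
	if (!backlog && wb_pool) {
		free(wb_pool);		/* pool no longer needed */
		wb_pool = NULL;
	} else if (backlog && !wb_pool) {
		wb_pool = malloc(64);	/* pool needed from now on */
		if (!wb_pool)
			return -1;
	}
	max_write_behind = backlog;
	return 0;
}

int main(void)
{
	set_backlog(256);
	printf("pool %s\n", wb_pool ? "allocated" : "absent");
	set_backlog(0);
	printf("pool %s\n", wb_pool ? "allocated" : "absent");
	return 0;
}
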


@ -37,6 +37,7 @@
*/ */
#include <linux/sched/mm.h>
#include <linux/sched/signal.h> #include <linux/sched/signal.h>
#include <linux/kthread.h> #include <linux/kthread.h>
#include <linux/blkdev.h> #include <linux/blkdev.h>
@ -124,6 +125,77 @@ static inline int speed_max(struct mddev *mddev)
mddev->sync_speed_max : sysctl_speed_limit_max; mddev->sync_speed_max : sysctl_speed_limit_max;
} }
static int rdev_init_wb(struct md_rdev *rdev)
{
if (rdev->bdev->bd_queue->nr_hw_queues == 1)
return 0;
spin_lock_init(&rdev->wb_list_lock);
INIT_LIST_HEAD(&rdev->wb_list);
init_waitqueue_head(&rdev->wb_io_wait);
set_bit(WBCollisionCheck, &rdev->flags);
return 1;
}
/*
* Create wb_info_pool if rdev is the first multi-queue device flagged
* with writemostly, also write-behind mode is enabled.
*/
void mddev_create_wb_pool(struct mddev *mddev, struct md_rdev *rdev,
bool is_suspend)
{
if (mddev->bitmap_info.max_write_behind == 0)
return;
if (!test_bit(WriteMostly, &rdev->flags) || !rdev_init_wb(rdev))
return;
if (mddev->wb_info_pool == NULL) {
unsigned int noio_flag;
if (!is_suspend)
mddev_suspend(mddev);
noio_flag = memalloc_noio_save();
mddev->wb_info_pool = mempool_create_kmalloc_pool(NR_WB_INFOS,
sizeof(struct wb_info));
memalloc_noio_restore(noio_flag);
if (!mddev->wb_info_pool)
pr_err("can't alloc memory pool for writemostly\n");
if (!is_suspend)
mddev_resume(mddev);
}
}
EXPORT_SYMBOL_GPL(mddev_create_wb_pool);
/*
* destroy wb_info_pool if rdev is the last device flagged with WBCollisionCheck.
*/
static void mddev_destroy_wb_pool(struct mddev *mddev, struct md_rdev *rdev)
{
if (!test_and_clear_bit(WBCollisionCheck, &rdev->flags))
return;
if (mddev->wb_info_pool) {
struct md_rdev *temp;
int num = 0;
/*
* Check if other rdevs need wb_info_pool.
*/
rdev_for_each(temp, mddev)
if (temp != rdev &&
test_bit(WBCollisionCheck, &temp->flags))
num++;
if (!num) {
mddev_suspend(rdev->mddev);
mempool_destroy(mddev->wb_info_pool);
mddev->wb_info_pool = NULL;
mddev_resume(rdev->mddev);
}
}
}
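
mddev_destroy_wb_pool() above only tears the shared pool down once no other rdev still has WBCollisionCheck set. A rough user-space sketch of that "last user frees the shared resource" pattern, with invented types standing in for md_rdev and the mempool:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define NDEVS 3

struct fake_rdev { bool collision_check; };	/* stands in for md_rdev */

static struct fake_rdev devs[NDEVS];
static void *shared_pool;			/* stands in for mddev->wb_info_pool */

static void drop_collision_check(struct fake_rdev *rdev)
{
	if (!rdev->collision_check)
		return;
	rdev->collision_check = false;

	/* Free the pool only when no other device still needs it. */
	for (int i = 0; i < NDEVS; i++)
		if (devs[i].collision_check)
			return;
	free(shared_pool);
	shared_pool = NULL;
	printf("pool destroyed\n");
}

int main(void)
{
	shared_pool = malloc(64);
	devs[0].collision_check = devs[1].collision_check = true;
	drop_collision_check(&devs[0]);		/* kept: devs[1] still flagged */
	drop_collision_check(&devs[1]);		/* last user: pool destroyed   */
	return 0;
}
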
static struct ctl_table_header *raid_table_header; static struct ctl_table_header *raid_table_header;
static struct ctl_table raid_table[] = { static struct ctl_table raid_table[] = {
@ -2210,6 +2282,9 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
rdev->mddev = mddev; rdev->mddev = mddev;
pr_debug("md: bind<%s>\n", b); pr_debug("md: bind<%s>\n", b);
if (mddev->raid_disks)
mddev_create_wb_pool(mddev, rdev, false);
if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b))) if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b)))
goto fail; goto fail;
@ -2246,6 +2321,7 @@ static void unbind_rdev_from_array(struct md_rdev *rdev)
bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk); bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
list_del_rcu(&rdev->same_set); list_del_rcu(&rdev->same_set);
pr_debug("md: unbind<%s>\n", bdevname(rdev->bdev,b)); pr_debug("md: unbind<%s>\n", bdevname(rdev->bdev,b));
mddev_destroy_wb_pool(rdev->mddev, rdev);
rdev->mddev = NULL; rdev->mddev = NULL;
sysfs_remove_link(&rdev->kobj, "block"); sysfs_remove_link(&rdev->kobj, "block");
sysfs_put(rdev->sysfs_state); sysfs_put(rdev->sysfs_state);
@ -2758,8 +2834,10 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
} }
} else if (cmd_match(buf, "writemostly")) { } else if (cmd_match(buf, "writemostly")) {
set_bit(WriteMostly, &rdev->flags); set_bit(WriteMostly, &rdev->flags);
mddev_create_wb_pool(rdev->mddev, rdev, false);
err = 0; err = 0;
} else if (cmd_match(buf, "-writemostly")) { } else if (cmd_match(buf, "-writemostly")) {
mddev_destroy_wb_pool(rdev->mddev, rdev);
clear_bit(WriteMostly, &rdev->flags); clear_bit(WriteMostly, &rdev->flags);
err = 0; err = 0;
} else if (cmd_match(buf, "blocked")) { } else if (cmd_match(buf, "blocked")) {
@ -3356,7 +3434,7 @@ rdev_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
if (!entry->show) if (!entry->show)
return -EIO; return -EIO;
if (!rdev->mddev) if (!rdev->mddev)
return -EBUSY; return -ENODEV;
return entry->show(rdev, page); return entry->show(rdev, page);
} }
@ -5588,15 +5666,28 @@ int md_run(struct mddev *mddev)
mddev->bitmap = bitmap; mddev->bitmap = bitmap;
} }
if (err) { if (err)
mddev_detach(mddev); goto bitmap_abort;
if (mddev->private)
pers->free(mddev, mddev->private); if (mddev->bitmap_info.max_write_behind > 0) {
mddev->private = NULL; bool creat_pool = false;
module_put(pers->owner);
md_bitmap_destroy(mddev); rdev_for_each(rdev, mddev) {
goto abort; if (test_bit(WriteMostly, &rdev->flags) &&
rdev_init_wb(rdev))
creat_pool = true;
}
if (creat_pool && mddev->wb_info_pool == NULL) {
mddev->wb_info_pool =
mempool_create_kmalloc_pool(NR_WB_INFOS,
sizeof(struct wb_info));
if (!mddev->wb_info_pool) {
err = -ENOMEM;
goto bitmap_abort;
}
}
} }
if (mddev->queue) { if (mddev->queue) {
bool nonrot = true; bool nonrot = true;
@ -5639,8 +5730,7 @@ int md_run(struct mddev *mddev)
spin_unlock(&mddev->lock); spin_unlock(&mddev->lock);
rdev_for_each(rdev, mddev) rdev_for_each(rdev, mddev)
if (rdev->raid_disk >= 0) if (rdev->raid_disk >= 0)
if (sysfs_link_rdev(mddev, rdev)) sysfs_link_rdev(mddev, rdev); /* failure here is OK */
/* failure here is OK */;
if (mddev->degraded && !mddev->ro) if (mddev->degraded && !mddev->ro)
/* This ensures that recovering status is reported immediately /* This ensures that recovering status is reported immediately
@ -5658,6 +5748,13 @@ int md_run(struct mddev *mddev)
sysfs_notify(&mddev->kobj, NULL, "degraded"); sysfs_notify(&mddev->kobj, NULL, "degraded");
return 0; return 0;
bitmap_abort:
mddev_detach(mddev);
if (mddev->private)
pers->free(mddev, mddev->private);
mddev->private = NULL;
module_put(pers->owner);
md_bitmap_destroy(mddev);
abort: abort:
bioset_exit(&mddev->bio_set); bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set); bioset_exit(&mddev->sync_set);
@ -5826,6 +5923,8 @@ static void __md_stop_writes(struct mddev *mddev)
mddev->in_sync = 1; mddev->in_sync = 1;
md_update_sb(mddev, 1); md_update_sb(mddev, 1);
} }
mempool_destroy(mddev->wb_info_pool);
mddev->wb_info_pool = NULL;
} }
void md_stop_writes(struct mddev *mddev) void md_stop_writes(struct mddev *mddev)
@ -8198,8 +8297,7 @@ void md_do_sync(struct md_thread *thread)
{ {
struct mddev *mddev = thread->mddev; struct mddev *mddev = thread->mddev;
struct mddev *mddev2; struct mddev *mddev2;
unsigned int currspeed = 0, unsigned int currspeed = 0, window;
window;
sector_t max_sectors,j, io_sectors, recovery_done; sector_t max_sectors,j, io_sectors, recovery_done;
unsigned long mark[SYNC_MARKS]; unsigned long mark[SYNC_MARKS];
unsigned long update_time; unsigned long update_time;
@ -8256,7 +8354,7 @@ void md_do_sync(struct md_thread *thread)
* 0 == not engaged in resync at all * 0 == not engaged in resync at all
* 2 == checking that there is no conflict with another sync * 2 == checking that there is no conflict with another sync
* 1 == like 2, but have yielded to allow conflicting resync to * 1 == like 2, but have yielded to allow conflicting resync to
* commense * commence
* other == active in resync - this many blocks * other == active in resync - this many blocks
* *
* Before starting a resync we must have set curr_resync to * Before starting a resync we must have set curr_resync to
@ -8387,7 +8485,7 @@ void md_do_sync(struct md_thread *thread)
/* /*
* Tune reconstruction: * Tune reconstruction:
*/ */
window = 32*(PAGE_SIZE/512); window = 32 * (PAGE_SIZE / 512);
pr_debug("md: using %dk window, over a total of %lluk.\n", pr_debug("md: using %dk window, over a total of %lluk.\n",
window/2, (unsigned long long)max_sectors/2); window/2, (unsigned long long)max_sectors/2);
@ -9200,7 +9298,6 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
* perform resync with the new activated disk */ * perform resync with the new activated disk */
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread); md_wakeup_thread(mddev->thread);
} }
/* device faulty /* device faulty
* We just want to do the minimum to mark the disk * We just want to do the minimum to mark the disk


@ -109,6 +109,14 @@ struct md_rdev {
* for reporting to userspace and storing * for reporting to userspace and storing
* in superblock. * in superblock.
*/ */
/*
* The members for check collision of write behind IOs.
*/
struct list_head wb_list;
spinlock_t wb_list_lock;
wait_queue_head_t wb_io_wait;
struct work_struct del_work; /* used for delayed sysfs removal */ struct work_struct del_work; /* used for delayed sysfs removal */
struct kernfs_node *sysfs_state; /* handle for 'state' struct kernfs_node *sysfs_state; /* handle for 'state'
@ -193,6 +201,10 @@ enum flag_bits {
* it didn't fail, so don't use FailFast * it didn't fail, so don't use FailFast
* any more for metadata * any more for metadata
*/ */
WBCollisionCheck, /*
* multiqueue device should check if there
* is collision between write behind bios.
*/
}; };
static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors, static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
@ -245,6 +257,14 @@ enum mddev_sb_flags {
MD_SB_NEED_REWRITE, /* metadata write needs to be repeated */ MD_SB_NEED_REWRITE, /* metadata write needs to be repeated */
}; };
#define NR_WB_INFOS 8
/* record current range of write behind IOs */
struct wb_info {
sector_t lo;
sector_t hi;
struct list_head list;
};
struct mddev { struct mddev {
void *private; void *private;
struct md_personality *pers; struct md_personality *pers;
@ -461,6 +481,7 @@ struct mddev {
*/ */
struct work_struct flush_work; struct work_struct flush_work;
struct work_struct event_work; /* used by dm to report failure event */ struct work_struct event_work; /* used by dm to report failure event */
mempool_t *wb_info_pool;
void (*sync_super)(struct mddev *mddev, struct md_rdev *rdev); void (*sync_super)(struct mddev *mddev, struct md_rdev *rdev);
struct md_cluster_info *cluster_info; struct md_cluster_info *cluster_info;
unsigned int good_device_nr; /* good device num within cluster raid */ unsigned int good_device_nr; /* good device num within cluster raid */
@ -709,6 +730,8 @@ extern struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
extern void md_reload_sb(struct mddev *mddev, int raid_disk); extern void md_reload_sb(struct mddev *mddev, int raid_disk);
extern void md_update_sb(struct mddev *mddev, int force); extern void md_update_sb(struct mddev *mddev, int force);
extern void md_kick_rdev_from_array(struct md_rdev * rdev); extern void md_kick_rdev_from_array(struct md_rdev * rdev);
extern void mddev_create_wb_pool(struct mddev *mddev, struct md_rdev *rdev,
bool is_suspend);
struct md_rdev *md_find_rdev_nr_rcu(struct mddev *mddev, int nr); struct md_rdev *md_find_rdev_nr_rcu(struct mddev *mddev, int nr);
struct md_rdev *md_find_rdev_rcu(struct mddev *mddev, dev_t dev); struct md_rdev *md_find_rdev_rcu(struct mddev *mddev, dev_t dev);


@ -3,12 +3,42 @@
#define RESYNC_BLOCK_SIZE (64*1024) #define RESYNC_BLOCK_SIZE (64*1024)
#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE) #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
/*
* Number of guaranteed raid bios in case of extreme VM load:
*/
#define NR_RAID_BIOS 256
/* when we get a read error on a read-only array, we redirect to another
* device without failing the first device, or trying to over-write to
* correct the read error. To keep track of bad blocks on a per-bio
* level, we store IO_BLOCKED in the appropriate 'bios' pointer
*/
#define IO_BLOCKED ((struct bio *)1)
/* When we successfully write to a known bad-block, we need to remove the
* bad-block marking which must be done from process context. So we record
* the success by setting devs[n].bio to IO_MADE_GOOD
*/
#define IO_MADE_GOOD ((struct bio *)2)
#define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
/* When there are this many requests queue to be written by
* the raid thread, we become 'congested' to provide back-pressure
* for writeback.
*/
static int max_queued_requests = 1024;
/* for managing resync I/O pages */ /* for managing resync I/O pages */
struct resync_pages { struct resync_pages {
void *raid_bio; void *raid_bio;
struct page *pages[RESYNC_PAGES]; struct page *pages[RESYNC_PAGES];
}; };
static void rbio_pool_free(void *rbio, void *data)
{
kfree(rbio);
}
static inline int resync_alloc_pages(struct resync_pages *rp, static inline int resync_alloc_pages(struct resync_pages *rp,
gfp_t gfp_flags) gfp_t gfp_flags)
{ {


@ -42,31 +42,6 @@
(1L << MD_HAS_PPL) | \ (1L << MD_HAS_PPL) | \
(1L << MD_HAS_MULTIPLE_PPLS)) (1L << MD_HAS_MULTIPLE_PPLS))
/*
* Number of guaranteed r1bios in case of extreme VM load:
*/
#define NR_RAID1_BIOS 256
/* when we get a read error on a read-only array, we redirect to another
* device without failing the first device, or trying to over-write to
* correct the read error. To keep track of bad blocks on a per-bio
* level, we store IO_BLOCKED in the appropriate 'bios' pointer
*/
#define IO_BLOCKED ((struct bio *)1)
/* When we successfully write to a known bad-block, we need to remove the
* bad-block marking which must be done from process context. So we record
* the success by setting devs[n].bio to IO_MADE_GOOD
*/
#define IO_MADE_GOOD ((struct bio *)2)
#define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
/* When there are this many requests queue to be written by
* the raid1 thread, we become 'congested' to provide back-pressure
* for writeback.
*/
static int max_queued_requests = 1024;
static void allow_barrier(struct r1conf *conf, sector_t sector_nr); static void allow_barrier(struct r1conf *conf, sector_t sector_nr);
static void lower_barrier(struct r1conf *conf, sector_t sector_nr); static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
@ -75,6 +50,57 @@ static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
#include "raid1-10.c" #include "raid1-10.c"
static int check_and_add_wb(struct md_rdev *rdev, sector_t lo, sector_t hi)
{
struct wb_info *wi, *temp_wi;
unsigned long flags;
int ret = 0;
struct mddev *mddev = rdev->mddev;
wi = mempool_alloc(mddev->wb_info_pool, GFP_NOIO);
spin_lock_irqsave(&rdev->wb_list_lock, flags);
list_for_each_entry(temp_wi, &rdev->wb_list, list) {
/* collision happened */
if (hi > temp_wi->lo && lo < temp_wi->hi) {
ret = -EBUSY;
break;
}
}
if (!ret) {
wi->lo = lo;
wi->hi = hi;
list_add(&wi->list, &rdev->wb_list);
} else
mempool_free(wi, mddev->wb_info_pool);
spin_unlock_irqrestore(&rdev->wb_list_lock, flags);
return ret;
}
static void remove_wb(struct md_rdev *rdev, sector_t lo, sector_t hi)
{
struct wb_info *wi;
unsigned long flags;
int found = 0;
struct mddev *mddev = rdev->mddev;
spin_lock_irqsave(&rdev->wb_list_lock, flags);
list_for_each_entry(wi, &rdev->wb_list, list)
if (hi == wi->hi && lo == wi->lo) {
list_del(&wi->list);
mempool_free(wi, mddev->wb_info_pool);
found = 1;
break;
}
if (!found)
WARN(1, "The write behind IO is not recorded\n");
spin_unlock_irqrestore(&rdev->wb_list_lock, flags);
wake_up(&rdev->wb_io_wait);
}
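
check_and_add_wb() treats each in-flight write-behind bio as a sector range [lo, hi) and refuses to queue a new one that overlaps an existing entry; remove_wb() drops the range on completion and wakes waiters. A small user-space sketch of the same overlap test on a singly linked list (simplified: no locking, plain malloc instead of a mempool):

#include <stdio.h>
#include <stdlib.h>

typedef unsigned long long sector_t;

struct range {
	sector_t lo, hi;	/* covers [lo, hi) */
	struct range *next;
};

static struct range *in_flight;

/* Mirror of the collision test: two ranges collide when they intersect. */
static int add_range(sector_t lo, sector_t hi)
{
	for (struct range *r = in_flight; r; r = r->next)
		if (hi > r->lo && lo < r->hi)
			return -1;		/* collision: caller must wait */

	struct range *n = malloc(sizeof(*n));
	n->lo = lo;
	n->hi = hi;
	n->next = in_flight;
	in_flight = n;
	return 0;
}

int main(void)
{
	printf("%d\n", add_range(0, 8));	/* 0: accepted          */
	printf("%d\n", add_range(4, 12));	/* -1: overlaps [0, 8)  */
	printf("%d\n", add_range(8, 16));	/* 0: adjacent is fine  */
	return 0;
}
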
/* /*
* for resync bio, r1bio pointer can be retrieved from the per-bio * for resync bio, r1bio pointer can be retrieved from the per-bio
* 'struct resync_pages'. * 'struct resync_pages'.
@ -93,11 +119,6 @@ static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
return kzalloc(size, gfp_flags); return kzalloc(size, gfp_flags);
} }
static void r1bio_pool_free(void *r1_bio, void *data)
{
kfree(r1_bio);
}
#define RESYNC_DEPTH 32 #define RESYNC_DEPTH 32
#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9) #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
#define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH) #define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
@ -173,7 +194,7 @@ out_free_bio:
kfree(rps); kfree(rps);
out_free_r1bio: out_free_r1bio:
r1bio_pool_free(r1_bio, data); rbio_pool_free(r1_bio, data);
return NULL; return NULL;
} }
@ -193,7 +214,7 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
/* resync pages array stored in the 1st bio's .bi_private */ /* resync pages array stored in the 1st bio's .bi_private */
kfree(rp); kfree(rp);
r1bio_pool_free(r1bio, data); rbio_pool_free(r1bio, data);
} }
static void put_all_bios(struct r1conf *conf, struct r1bio *r1_bio) static void put_all_bios(struct r1conf *conf, struct r1bio *r1_bio)
@ -476,6 +497,12 @@ static void raid1_end_write_request(struct bio *bio)
} }
if (behind) { if (behind) {
if (test_bit(WBCollisionCheck, &rdev->flags)) {
sector_t lo = r1_bio->sector;
sector_t hi = r1_bio->sector + r1_bio->sectors;
remove_wb(rdev, lo, hi);
}
if (test_bit(WriteMostly, &rdev->flags)) if (test_bit(WriteMostly, &rdev->flags))
atomic_dec(&r1_bio->behind_remaining); atomic_dec(&r1_bio->behind_remaining);
@ -1449,7 +1476,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
if (!r1_bio->bios[i]) if (!r1_bio->bios[i])
continue; continue;
if (first_clone) { if (first_clone) {
/* do behind I/O ? /* do behind I/O ?
* Not if there are too many, or cannot * Not if there are too many, or cannot
@ -1474,7 +1500,16 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set); mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
if (r1_bio->behind_master_bio) { if (r1_bio->behind_master_bio) {
if (test_bit(WriteMostly, &conf->mirrors[i].rdev->flags)) struct md_rdev *rdev = conf->mirrors[i].rdev;
if (test_bit(WBCollisionCheck, &rdev->flags)) {
sector_t lo = r1_bio->sector;
sector_t hi = r1_bio->sector + r1_bio->sectors;
wait_event(rdev->wb_io_wait,
check_and_add_wb(rdev, lo, hi) == 0);
}
if (test_bit(WriteMostly, &rdev->flags))
atomic_inc(&r1_bio->behind_remaining); atomic_inc(&r1_bio->behind_remaining);
} }
@ -1729,9 +1764,8 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
first = last = rdev->saved_raid_disk; first = last = rdev->saved_raid_disk;
for (mirror = first; mirror <= last; mirror++) { for (mirror = first; mirror <= last; mirror++) {
p = conf->mirrors+mirror; p = conf->mirrors + mirror;
if (!p->rdev) { if (!p->rdev) {
if (mddev->gendisk) if (mddev->gendisk)
disk_stack_limits(mddev->gendisk, rdev->bdev, disk_stack_limits(mddev->gendisk, rdev->bdev,
rdev->data_offset << 9); rdev->data_offset << 9);
@ -2888,7 +2922,6 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
if (read_targets == 1) if (read_targets == 1)
bio->bi_opf &= ~MD_FAILFAST; bio->bi_opf &= ~MD_FAILFAST;
generic_make_request(bio); generic_make_request(bio);
} }
return nr_sectors; return nr_sectors;
} }
@ -2947,8 +2980,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
if (!conf->poolinfo) if (!conf->poolinfo)
goto abort; goto abort;
conf->poolinfo->raid_disks = mddev->raid_disks * 2; conf->poolinfo->raid_disks = mddev->raid_disks * 2;
err = mempool_init(&conf->r1bio_pool, NR_RAID1_BIOS, r1bio_pool_alloc, err = mempool_init(&conf->r1bio_pool, NR_RAID_BIOS, r1bio_pool_alloc,
r1bio_pool_free, conf->poolinfo); rbio_pool_free, conf->poolinfo);
if (err) if (err)
goto abort; goto abort;
@ -3089,7 +3122,7 @@ static int raid1_run(struct mddev *mddev)
} }
mddev->degraded = 0; mddev->degraded = 0;
for (i=0; i < conf->raid_disks; i++) for (i = 0; i < conf->raid_disks; i++)
if (conf->mirrors[i].rdev == NULL || if (conf->mirrors[i].rdev == NULL ||
!test_bit(In_sync, &conf->mirrors[i].rdev->flags) || !test_bit(In_sync, &conf->mirrors[i].rdev->flags) ||
test_bit(Faulty, &conf->mirrors[i].rdev->flags)) test_bit(Faulty, &conf->mirrors[i].rdev->flags))
@ -3124,7 +3157,7 @@ static int raid1_run(struct mddev *mddev)
mddev->queue); mddev->queue);
} }
ret = md_integrity_register(mddev); ret = md_integrity_register(mddev);
if (ret) { if (ret) {
md_unregister_thread(&mddev->thread); md_unregister_thread(&mddev->thread);
raid1_free(mddev, conf); raid1_free(mddev, conf);
@ -3232,8 +3265,8 @@ static int raid1_reshape(struct mddev *mddev)
newpoolinfo->mddev = mddev; newpoolinfo->mddev = mddev;
newpoolinfo->raid_disks = raid_disks * 2; newpoolinfo->raid_disks = raid_disks * 2;
ret = mempool_init(&newpool, NR_RAID1_BIOS, r1bio_pool_alloc, ret = mempool_init(&newpool, NR_RAID_BIOS, r1bio_pool_alloc,
r1bio_pool_free, newpoolinfo); rbio_pool_free, newpoolinfo);
if (ret) { if (ret) {
kfree(newpoolinfo); kfree(newpoolinfo);
return ret; return ret;


@ -64,31 +64,6 @@
* [B A] [D C] [B A] [E C D] * [B A] [D C] [B A] [E C D]
*/ */
/*
* Number of guaranteed r10bios in case of extreme VM load:
*/
#define NR_RAID10_BIOS 256
/* when we get a read error on a read-only array, we redirect to another
* device without failing the first device, or trying to over-write to
* correct the read error. To keep track of bad blocks on a per-bio
* level, we store IO_BLOCKED in the appropriate 'bios' pointer
*/
#define IO_BLOCKED ((struct bio *)1)
/* When we successfully write to a known bad-block, we need to remove the
* bad-block marking which must be done from process context. So we record
* the success by setting devs[n].bio to IO_MADE_GOOD
*/
#define IO_MADE_GOOD ((struct bio *)2)
#define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
/* When there are this many requests queued to be written by
* the raid10 thread, we become 'congested' to provide back-pressure
* for writeback.
*/
static int max_queued_requests = 1024;
static void allow_barrier(struct r10conf *conf); static void allow_barrier(struct r10conf *conf);
static void lower_barrier(struct r10conf *conf); static void lower_barrier(struct r10conf *conf);
static int _enough(struct r10conf *conf, int previous, int ignore); static int _enough(struct r10conf *conf, int previous, int ignore);
@ -123,11 +98,6 @@ static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
return kzalloc(size, gfp_flags); return kzalloc(size, gfp_flags);
} }
static void r10bio_pool_free(void *r10_bio, void *data)
{
kfree(r10_bio);
}
#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9) #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
/* amount of memory to reserve for resync requests */ /* amount of memory to reserve for resync requests */
#define RESYNC_WINDOW (1024*1024) #define RESYNC_WINDOW (1024*1024)
@ -233,7 +203,7 @@ out_free_bio:
} }
kfree(rps); kfree(rps);
out_free_r10bio: out_free_r10bio:
r10bio_pool_free(r10_bio, conf); rbio_pool_free(r10_bio, conf);
return NULL; return NULL;
} }
@ -261,7 +231,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
/* resync pages array stored in the 1st bio's .bi_private */ /* resync pages array stored in the 1st bio's .bi_private */
kfree(rp); kfree(rp);
r10bio_pool_free(r10bio, conf); rbio_pool_free(r10bio, conf);
} }
static void put_all_bios(struct r10conf *conf, struct r10bio *r10_bio) static void put_all_bios(struct r10conf *conf, struct r10bio *r10_bio)
@ -737,15 +707,19 @@ static struct md_rdev *read_balance(struct r10conf *conf,
int sectors = r10_bio->sectors; int sectors = r10_bio->sectors;
int best_good_sectors; int best_good_sectors;
sector_t new_distance, best_dist; sector_t new_distance, best_dist;
struct md_rdev *best_rdev, *rdev = NULL; struct md_rdev *best_dist_rdev, *best_pending_rdev, *rdev = NULL;
int do_balance; int do_balance;
int best_slot; int best_dist_slot, best_pending_slot;
bool has_nonrot_disk = false;
unsigned int min_pending;
struct geom *geo = &conf->geo; struct geom *geo = &conf->geo;
raid10_find_phys(conf, r10_bio); raid10_find_phys(conf, r10_bio);
rcu_read_lock(); rcu_read_lock();
best_slot = -1; best_dist_slot = -1;
best_rdev = NULL; min_pending = UINT_MAX;
best_dist_rdev = NULL;
best_pending_rdev = NULL;
best_dist = MaxSector; best_dist = MaxSector;
best_good_sectors = 0; best_good_sectors = 0;
do_balance = 1; do_balance = 1;
@ -767,6 +741,8 @@ static struct md_rdev *read_balance(struct r10conf *conf,
sector_t first_bad; sector_t first_bad;
int bad_sectors; int bad_sectors;
sector_t dev_sector; sector_t dev_sector;
unsigned int pending;
bool nonrot;
if (r10_bio->devs[slot].bio == IO_BLOCKED) if (r10_bio->devs[slot].bio == IO_BLOCKED)
continue; continue;
@ -803,8 +779,8 @@ static struct md_rdev *read_balance(struct r10conf *conf,
first_bad - dev_sector; first_bad - dev_sector;
if (good_sectors > best_good_sectors) { if (good_sectors > best_good_sectors) {
best_good_sectors = good_sectors; best_good_sectors = good_sectors;
best_slot = slot; best_dist_slot = slot;
best_rdev = rdev; best_dist_rdev = rdev;
} }
if (!do_balance) if (!do_balance)
/* Must read from here */ /* Must read from here */
@ -817,14 +793,23 @@ static struct md_rdev *read_balance(struct r10conf *conf,
if (!do_balance) if (!do_balance)
break; break;
if (best_slot >= 0) nonrot = blk_queue_nonrot(bdev_get_queue(rdev->bdev));
has_nonrot_disk |= nonrot;
pending = atomic_read(&rdev->nr_pending);
if (min_pending > pending && nonrot) {
min_pending = pending;
best_pending_slot = slot;
best_pending_rdev = rdev;
}
if (best_dist_slot >= 0)
/* At least 2 disks to choose from so failfast is OK */ /* At least 2 disks to choose from so failfast is OK */
set_bit(R10BIO_FailFast, &r10_bio->state); set_bit(R10BIO_FailFast, &r10_bio->state);
/* This optimisation is debatable, and completely destroys /* This optimisation is debatable, and completely destroys
* sequential read speed for 'far copies' arrays. So only * sequential read speed for 'far copies' arrays. So only
* keep it for 'near' arrays, and review those later. * keep it for 'near' arrays, and review those later.
*/ */
if (geo->near_copies > 1 && !atomic_read(&rdev->nr_pending)) if (geo->near_copies > 1 && !pending)
new_distance = 0; new_distance = 0;
/* for far > 1 always use the lowest address */ /* for far > 1 always use the lowest address */
@ -833,15 +818,21 @@ static struct md_rdev *read_balance(struct r10conf *conf,
else else
new_distance = abs(r10_bio->devs[slot].addr - new_distance = abs(r10_bio->devs[slot].addr -
conf->mirrors[disk].head_position); conf->mirrors[disk].head_position);
if (new_distance < best_dist) { if (new_distance < best_dist) {
best_dist = new_distance; best_dist = new_distance;
best_slot = slot; best_dist_slot = slot;
best_rdev = rdev; best_dist_rdev = rdev;
} }
} }
if (slot >= conf->copies) { if (slot >= conf->copies) {
slot = best_slot; if (has_nonrot_disk) {
rdev = best_rdev; slot = best_pending_slot;
rdev = best_pending_rdev;
} else {
slot = best_dist_slot;
rdev = best_dist_rdev;
}
} }
if (slot >= 0) { if (slot >= 0) {
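
The reworked read_balance() above now tracks two candidates per pass, the rotational disk with the shortest seek distance and the non-rotational disk with the fewest pending requests, and prefers the latter whenever the array contains an SSD. A condensed user-space sketch of that selection (made-up arrays instead of rdevs, not the raid10 code):

#include <stdio.h>
#include <stdbool.h>
#include <limits.h>

struct mirror {
	bool nonrot;		/* non-rotational (SSD)       */
	unsigned int pending;	/* queued I/O on this device  */
	long distance;		/* |target - head_position|   */
};

/* Pick the SSD with the least pending I/O if any SSD exists,
 * otherwise the rotational disk with the shortest distance. */
static int pick_mirror(const struct mirror *m, int n)
{
	int best_pending_slot = -1, best_dist_slot = -1;
	unsigned int min_pending = UINT_MAX;
	long best_dist = LONG_MAX;
	bool has_nonrot = false;

	for (int i = 0; i < n; i++) {
		has_nonrot |= m[i].nonrot;
		if (m[i].nonrot && m[i].pending < min_pending) {
			min_pending = m[i].pending;
			best_pending_slot = i;
		}
		if (m[i].distance < best_dist) {
			best_dist = m[i].distance;
			best_dist_slot = i;
		}
	}
	return has_nonrot ? best_pending_slot : best_dist_slot;
}

int main(void)
{
	struct mirror m[] = {
		{ .nonrot = false, .pending = 0, .distance = 10 },
		{ .nonrot = true,  .pending = 7, .distance = 900 },
		{ .nonrot = true,  .pending = 2, .distance = 900 },
	};
	printf("%d\n", pick_mirror(m, 3));	/* 2: least-loaded SSD */
	return 0;
}
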
@ -3675,8 +3666,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
conf->geo = geo; conf->geo = geo;
conf->copies = copies; conf->copies = copies;
err = mempool_init(&conf->r10bio_pool, NR_RAID10_BIOS, r10bio_pool_alloc, err = mempool_init(&conf->r10bio_pool, NR_RAID_BIOS, r10bio_pool_alloc,
r10bio_pool_free, conf); rbio_pool_free, conf);
if (err) if (err)
goto out; goto out;
@ -4780,8 +4771,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
int idx = 0; int idx = 0;
struct page **pages; struct page **pages;
r10b = kmalloc(sizeof(*r10b) + r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO);
sizeof(struct r10dev) * conf->copies, GFP_NOIO);
if (!r10b) { if (!r10b) {
set_bit(MD_RECOVERY_INTR, &mddev->recovery); set_bit(MD_RECOVERY_INTR, &mddev->recovery);
return -ENOMEM; return -ENOMEM;
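
The temporary r10bio allocation above now uses struct_size(), the overflow-checked helper for a struct followed by a flexible array of conf->copies entries. A rough user-space equivalent built on the compiler overflow builtins (illustrative only; the kernel helper lives in <linux/overflow.h>):

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>

struct dev { int slot; };

struct rbio {
	int sectors;
	struct dev devs[];	/* flexible array, one entry per copy */
};

/* Overflow-checked size of a struct with a flexible trailing array. */
static size_t sized_alloc(size_t base, size_t elem, size_t count)
{
	size_t bytes, total;

	if (__builtin_mul_overflow(elem, count, &bytes) ||
	    __builtin_add_overflow(base, bytes, &total))
		return SIZE_MAX;	/* saturate so malloc() fails cleanly */
	return total;
}

int main(void)
{
	size_t copies = 3;
	size_t sz = sized_alloc(sizeof(struct rbio), sizeof(struct dev), copies);
	struct rbio *r = malloc(sz);

	if (!r)
		return 1;
	printf("allocated %zu bytes\n", sz);
	free(r);
	return 0;
}
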


@ -5251,7 +5251,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
rcu_read_unlock(); rcu_read_unlock();
raid_bio->bi_next = (void*)rdev; raid_bio->bi_next = (void*)rdev;
bio_set_dev(align_bi, rdev->bdev); bio_set_dev(align_bi, rdev->bdev);
bio_clear_flag(align_bi, BIO_SEG_VALID);
if (is_badblock(rdev, align_bi->bi_iter.bi_sector, if (is_badblock(rdev, align_bi->bi_iter.bi_sector,
bio_sectors(align_bi), bio_sectors(align_bi),
@ -7672,7 +7671,7 @@ abort:
static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev) static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
{ {
struct r5conf *conf = mddev->private; struct r5conf *conf = mddev->private;
int err = -EEXIST; int ret, err = -EEXIST;
int disk; int disk;
struct disk_info *p; struct disk_info *p;
int first = 0; int first = 0;
@ -7687,7 +7686,14 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
* The array is in readonly mode if journal is missing, so no * The array is in readonly mode if journal is missing, so no
* write requests running. We should be safe * write requests running. We should be safe
*/ */
log_init(conf, rdev, false); ret = log_init(conf, rdev, false);
if (ret)
return ret;
ret = r5l_start(conf->log);
if (ret)
return ret;
return 0; return 0;
} }
if (mddev->recovery_disabled == conf->recovery_disabled) if (mddev->recovery_disabled == conf->recovery_disabled)


@ -1113,15 +1113,15 @@ static struct nvme_id_ns *nvme_identify_ns(struct nvme_ctrl *ctrl,
return id; return id;
} }
static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11, static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned int fid,
void *buffer, size_t buflen, u32 *result) unsigned int dword11, void *buffer, size_t buflen, u32 *result)
{ {
struct nvme_command c; struct nvme_command c;
union nvme_result res; union nvme_result res;
int ret; int ret;
memset(&c, 0, sizeof(c)); memset(&c, 0, sizeof(c));
c.features.opcode = nvme_admin_set_features; c.features.opcode = op;
c.features.fid = cpu_to_le32(fid); c.features.fid = cpu_to_le32(fid);
c.features.dword11 = cpu_to_le32(dword11); c.features.dword11 = cpu_to_le32(dword11);
@ -1132,6 +1132,24 @@ static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword
return ret; return ret;
} }
int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
unsigned int dword11, void *buffer, size_t buflen,
u32 *result)
{
return nvme_features(dev, nvme_admin_set_features, fid, dword11, buffer,
buflen, result);
}
EXPORT_SYMBOL_GPL(nvme_set_features);
int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid,
unsigned int dword11, void *buffer, size_t buflen,
u32 *result)
{
return nvme_features(dev, nvme_admin_get_features, fid, dword11, buffer,
buflen, result);
}
EXPORT_SYMBOL_GPL(nvme_get_features);
int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count) int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
{ {
u32 q_count = (*count - 1) | ((*count - 1) << 16); u32 q_count = (*count - 1) | ((*count - 1) << 16);
@ -3318,7 +3336,7 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups); device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups);
nvme_mpath_add_disk(ns, id); nvme_mpath_add_disk(ns, id);
nvme_fault_inject_init(ns); nvme_fault_inject_init(&ns->fault_inject, ns->disk->disk_name);
kfree(id); kfree(id);
return 0; return 0;
@ -3343,7 +3361,15 @@ static void nvme_ns_remove(struct nvme_ns *ns)
if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags)) if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
return; return;
nvme_fault_inject_fini(ns); nvme_fault_inject_fini(&ns->fault_inject);
mutex_lock(&ns->ctrl->subsys->lock);
list_del_rcu(&ns->siblings);
mutex_unlock(&ns->ctrl->subsys->lock);
synchronize_rcu(); /* guarantee not available in head->list */
nvme_mpath_clear_current_path(ns);
synchronize_srcu(&ns->head->srcu); /* wait for concurrent submissions */
if (ns->disk && ns->disk->flags & GENHD_FL_UP) { if (ns->disk && ns->disk->flags & GENHD_FL_UP) {
del_gendisk(ns->disk); del_gendisk(ns->disk);
blk_cleanup_queue(ns->queue); blk_cleanup_queue(ns->queue);
@ -3351,16 +3377,10 @@ static void nvme_ns_remove(struct nvme_ns *ns)
blk_integrity_unregister(ns->disk); blk_integrity_unregister(ns->disk);
} }
mutex_lock(&ns->ctrl->subsys->lock);
list_del_rcu(&ns->siblings);
nvme_mpath_clear_current_path(ns);
mutex_unlock(&ns->ctrl->subsys->lock);
down_write(&ns->ctrl->namespaces_rwsem); down_write(&ns->ctrl->namespaces_rwsem);
list_del_init(&ns->list); list_del_init(&ns->list);
up_write(&ns->ctrl->namespaces_rwsem); up_write(&ns->ctrl->namespaces_rwsem);
synchronize_srcu(&ns->head->srcu);
nvme_mpath_check_last_path(ns); nvme_mpath_check_last_path(ns);
nvme_put_ns(ns); nvme_put_ns(ns);
} }
@ -3702,6 +3722,7 @@ EXPORT_SYMBOL_GPL(nvme_start_ctrl);
void nvme_uninit_ctrl(struct nvme_ctrl *ctrl) void nvme_uninit_ctrl(struct nvme_ctrl *ctrl)
{ {
nvme_fault_inject_fini(&ctrl->fault_inject);
dev_pm_qos_hide_latency_tolerance(ctrl->device); dev_pm_qos_hide_latency_tolerance(ctrl->device);
cdev_device_del(&ctrl->cdev, ctrl->device); cdev_device_del(&ctrl->cdev, ctrl->device);
} }
@ -3797,6 +3818,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
dev_pm_qos_update_user_latency_tolerance(ctrl->device, dev_pm_qos_update_user_latency_tolerance(ctrl->device,
min(default_ps_max_latency_us, (unsigned long)S32_MAX)); min(default_ps_max_latency_us, (unsigned long)S32_MAX));
nvme_fault_inject_init(&ctrl->fault_inject, dev_name(ctrl->device));
return 0; return 0;
out_free_name: out_free_name:
kfree_const(ctrl->device->kobj.name); kfree_const(ctrl->device->kobj.name);


@ -578,7 +578,7 @@ bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
switch (ctrl->state) { switch (ctrl->state) {
case NVME_CTRL_NEW: case NVME_CTRL_NEW:
case NVME_CTRL_CONNECTING: case NVME_CTRL_CONNECTING:
if (req->cmd->common.opcode == nvme_fabrics_command && if (nvme_is_fabrics(req->cmd) &&
req->cmd->fabrics.fctype == nvme_fabrics_type_connect) req->cmd->fabrics.fctype == nvme_fabrics_type_connect)
return true; return true;
break; break;


@ -15,11 +15,10 @@ static DECLARE_FAULT_ATTR(fail_default_attr);
static char *fail_request; static char *fail_request;
module_param(fail_request, charp, 0000); module_param(fail_request, charp, 0000);
void nvme_fault_inject_init(struct nvme_ns *ns) void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
const char *dev_name)
{ {
struct dentry *dir, *parent; struct dentry *dir, *parent;
char *name = ns->disk->disk_name;
struct nvme_fault_inject *fault_inj = &ns->fault_inject;
struct fault_attr *attr = &fault_inj->attr; struct fault_attr *attr = &fault_inj->attr;
/* set default fault injection attribute */ /* set default fault injection attribute */
@ -27,20 +26,20 @@ void nvme_fault_inject_init(struct nvme_ns *ns)
setup_fault_attr(&fail_default_attr, fail_request); setup_fault_attr(&fail_default_attr, fail_request);
/* create debugfs directory and attribute */ /* create debugfs directory and attribute */
parent = debugfs_create_dir(name, NULL); parent = debugfs_create_dir(dev_name, NULL);
if (!parent) { if (!parent) {
pr_warn("%s: failed to create debugfs directory\n", name); pr_warn("%s: failed to create debugfs directory\n", dev_name);
return; return;
} }
*attr = fail_default_attr; *attr = fail_default_attr;
dir = fault_create_debugfs_attr("fault_inject", parent, attr); dir = fault_create_debugfs_attr("fault_inject", parent, attr);
if (IS_ERR(dir)) { if (IS_ERR(dir)) {
pr_warn("%s: failed to create debugfs attr\n", name); pr_warn("%s: failed to create debugfs attr\n", dev_name);
debugfs_remove_recursive(parent); debugfs_remove_recursive(parent);
return; return;
} }
ns->fault_inject.parent = parent; fault_inj->parent = parent;
/* create debugfs for status code and dont_retry */ /* create debugfs for status code and dont_retry */
fault_inj->status = NVME_SC_INVALID_OPCODE; fault_inj->status = NVME_SC_INVALID_OPCODE;
@ -49,29 +48,33 @@ void nvme_fault_inject_init(struct nvme_ns *ns)
debugfs_create_bool("dont_retry", 0600, dir, &fault_inj->dont_retry); debugfs_create_bool("dont_retry", 0600, dir, &fault_inj->dont_retry);
} }
void nvme_fault_inject_fini(struct nvme_ns *ns) void nvme_fault_inject_fini(struct nvme_fault_inject *fault_inject)
{ {
/* remove debugfs directories */ /* remove debugfs directories */
debugfs_remove_recursive(ns->fault_inject.parent); debugfs_remove_recursive(fault_inject->parent);
} }
void nvme_should_fail(struct request *req) void nvme_should_fail(struct request *req)
{ {
struct gendisk *disk = req->rq_disk; struct gendisk *disk = req->rq_disk;
struct nvme_ns *ns = NULL; struct nvme_fault_inject *fault_inject = NULL;
u16 status; u16 status;
/* if (disk) {
* make sure this request is coming from a valid namespace struct nvme_ns *ns = disk->private_data;
*/
if (!disk)
return;
ns = disk->private_data; if (ns)
if (ns && should_fail(&ns->fault_inject.attr, 1)) { fault_inject = &ns->fault_inject;
else
WARN_ONCE(1, "No namespace found for request\n");
} else {
fault_inject = &nvme_req(req)->ctrl->fault_inject;
}
if (fault_inject && should_fail(&fault_inject->attr, 1)) {
/* inject status code and DNR bit */ /* inject status code and DNR bit */
status = ns->fault_inject.status; status = fault_inject->status;
if (ns->fault_inject.dont_retry) if (fault_inject->dont_retry)
status |= NVME_SC_DNR; status |= NVME_SC_DNR;
nvme_req(req)->status = status; nvme_req(req)->status = status;
} }


@ -2607,6 +2607,12 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
if (nvme_fc_ctlr_active_on_rport(ctrl)) if (nvme_fc_ctlr_active_on_rport(ctrl))
return -ENOTUNIQ; return -ENOTUNIQ;
dev_info(ctrl->ctrl.device,
"NVME-FC{%d}: create association : host wwpn 0x%016llx "
" rport wwpn 0x%016llx: NQN \"%s\"\n",
ctrl->cnum, ctrl->lport->localport.port_name,
ctrl->rport->remoteport.port_name, ctrl->ctrl.opts->subsysnqn);
/* /*
* Create the admin queue * Create the admin queue
*/ */


@ -660,7 +660,7 @@ static struct request *nvme_nvm_alloc_request(struct request_queue *q,
rq->cmd_flags &= ~REQ_FAILFAST_DRIVER; rq->cmd_flags &= ~REQ_FAILFAST_DRIVER;
if (rqd->bio) if (rqd->bio)
blk_init_request_from_bio(rq, rqd->bio); blk_rq_append_bio(rq, &rqd->bio);
else else
rq->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM); rq->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);


@ -146,6 +146,15 @@ enum nvme_ctrl_state {
NVME_CTRL_DEAD, NVME_CTRL_DEAD,
}; };
struct nvme_fault_inject {
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
struct fault_attr attr;
struct dentry *parent;
bool dont_retry; /* DNR, do not retry */
u16 status; /* status code */
#endif
};
struct nvme_ctrl { struct nvme_ctrl {
bool comp_seen; bool comp_seen;
enum nvme_ctrl_state state; enum nvme_ctrl_state state;
@ -247,6 +256,8 @@ struct nvme_ctrl {
struct page *discard_page; struct page *discard_page;
unsigned long discard_page_busy; unsigned long discard_page_busy;
struct nvme_fault_inject fault_inject;
}; };
enum nvme_iopolicy { enum nvme_iopolicy {
@ -313,15 +324,6 @@ struct nvme_ns_head {
#endif #endif
}; };
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
struct nvme_fault_inject {
struct fault_attr attr;
struct dentry *parent;
bool dont_retry; /* DNR, do not retry */
u16 status; /* status code */
};
#endif
struct nvme_ns { struct nvme_ns {
struct list_head list; struct list_head list;
@ -349,9 +351,7 @@ struct nvme_ns {
#define NVME_NS_ANA_PENDING 2 #define NVME_NS_ANA_PENDING 2
u16 noiob; u16 noiob;
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
struct nvme_fault_inject fault_inject; struct nvme_fault_inject fault_inject;
#endif
}; };
@ -372,12 +372,18 @@ struct nvme_ctrl_ops {
}; };
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
void nvme_fault_inject_init(struct nvme_ns *ns); void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
void nvme_fault_inject_fini(struct nvme_ns *ns); const char *dev_name);
void nvme_fault_inject_fini(struct nvme_fault_inject *fault_inject);
void nvme_should_fail(struct request *req); void nvme_should_fail(struct request *req);
#else #else
static inline void nvme_fault_inject_init(struct nvme_ns *ns) {} static inline void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
static inline void nvme_fault_inject_fini(struct nvme_ns *ns) {} const char *dev_name)
{
}
static inline void nvme_fault_inject_fini(struct nvme_fault_inject *fault_inj)
{
}
static inline void nvme_should_fail(struct request *req) {} static inline void nvme_should_fail(struct request *req) {}
#endif #endif
@ -459,6 +465,12 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
union nvme_result *result, void *buffer, unsigned bufflen, union nvme_result *result, void *buffer, unsigned bufflen,
unsigned timeout, int qid, int at_head, unsigned timeout, int qid, int at_head,
blk_mq_req_flags_t flags, bool poll); blk_mq_req_flags_t flags, bool poll);
int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
unsigned int dword11, void *buffer, size_t buflen,
u32 *result);
int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid,
unsigned int dword11, void *buffer, size_t buflen,
u32 *result);
int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count); int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
void nvme_stop_keep_alive(struct nvme_ctrl *ctrl); void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
int nvme_reset_ctrl(struct nvme_ctrl *ctrl); int nvme_reset_ctrl(struct nvme_ctrl *ctrl);


@ -18,6 +18,7 @@
#include <linux/mutex.h> #include <linux/mutex.h>
#include <linux/once.h> #include <linux/once.h>
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/suspend.h>
#include <linux/t10-pi.h> #include <linux/t10-pi.h>
#include <linux/types.h> #include <linux/types.h>
#include <linux/io-64-nonatomic-lo-hi.h> #include <linux/io-64-nonatomic-lo-hi.h>
@ -67,20 +68,14 @@ static int io_queue_depth = 1024;
module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644); module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2"); MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
static int queue_count_set(const char *val, const struct kernel_param *kp);
static const struct kernel_param_ops queue_count_ops = {
.set = queue_count_set,
.get = param_get_int,
};
static int write_queues; static int write_queues;
module_param_cb(write_queues, &queue_count_ops, &write_queues, 0644); module_param(write_queues, int, 0644);
MODULE_PARM_DESC(write_queues, MODULE_PARM_DESC(write_queues,
"Number of queues to use for writes. If not set, reads and writes " "Number of queues to use for writes. If not set, reads and writes "
"will share a queue set."); "will share a queue set.");
static int poll_queues = 0; static int poll_queues;
module_param_cb(poll_queues, &queue_count_ops, &poll_queues, 0644); module_param(poll_queues, int, 0644);
MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO."); MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO.");
struct nvme_dev; struct nvme_dev;
@ -116,6 +111,7 @@ struct nvme_dev {
u32 cmbsz; u32 cmbsz;
u32 cmbloc; u32 cmbloc;
struct nvme_ctrl ctrl; struct nvme_ctrl ctrl;
u32 last_ps;
mempool_t *iod_mempool; mempool_t *iod_mempool;
@ -144,19 +140,6 @@ static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
return param_set_int(val, kp); return param_set_int(val, kp);
} }
static int queue_count_set(const char *val, const struct kernel_param *kp)
{
int n, ret;
ret = kstrtoint(val, 10, &n);
if (ret)
return ret;
if (n > num_possible_cpus())
n = num_possible_cpus();
return param_set_int(val, kp);
}
static inline unsigned int sq_idx(unsigned int qid, u32 stride) static inline unsigned int sq_idx(unsigned int qid, u32 stride)
{ {
return qid * 2 * stride; return qid * 2 * stride;
@ -2068,6 +2051,7 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
.priv = dev, .priv = dev,
}; };
unsigned int irq_queues, this_p_queues; unsigned int irq_queues, this_p_queues;
unsigned int nr_cpus = num_possible_cpus();
/* /*
* Poll queues don't need interrupts, but we need at least one IO * Poll queues don't need interrupts, but we need at least one IO
@ -2078,7 +2062,10 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
this_p_queues = nr_io_queues - 1; this_p_queues = nr_io_queues - 1;
irq_queues = 1; irq_queues = 1;
} else { } else {
irq_queues = nr_io_queues - this_p_queues + 1; if (nr_cpus < nr_io_queues - this_p_queues)
irq_queues = nr_cpus + 1;
else
irq_queues = nr_io_queues - this_p_queues + 1;
} }
dev->io_queues[HCTX_TYPE_POLL] = this_p_queues; dev->io_queues[HCTX_TYPE_POLL] = this_p_queues;
@ -2464,10 +2451,8 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
kfree(dev); kfree(dev);
} }
static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status) static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
{ {
dev_warn(dev->ctrl.device, "Removing after probe failure status: %d\n", status);
nvme_get_ctrl(&dev->ctrl); nvme_get_ctrl(&dev->ctrl);
nvme_dev_disable(dev, false); nvme_dev_disable(dev, false);
nvme_kill_queues(&dev->ctrl); nvme_kill_queues(&dev->ctrl);
@ -2480,11 +2465,13 @@ static void nvme_reset_work(struct work_struct *work)
struct nvme_dev *dev = struct nvme_dev *dev =
container_of(work, struct nvme_dev, ctrl.reset_work); container_of(work, struct nvme_dev, ctrl.reset_work);
bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL); bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
int result = -ENODEV; int result;
enum nvme_ctrl_state new_state = NVME_CTRL_LIVE; enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING)) if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING)) {
result = -ENODEV;
goto out; goto out;
}
/* /*
* If we're called to reset a live controller first shut it down before * If we're called to reset a live controller first shut it down before
@ -2528,6 +2515,7 @@ static void nvme_reset_work(struct work_struct *work)
if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING)) { if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING)) {
dev_warn(dev->ctrl.device, dev_warn(dev->ctrl.device,
"failed to mark controller CONNECTING\n"); "failed to mark controller CONNECTING\n");
result = -EBUSY;
goto out; goto out;
} }
@ -2588,6 +2576,7 @@ static void nvme_reset_work(struct work_struct *work)
if (!nvme_change_ctrl_state(&dev->ctrl, new_state)) { if (!nvme_change_ctrl_state(&dev->ctrl, new_state)) {
dev_warn(dev->ctrl.device, dev_warn(dev->ctrl.device,
"failed to mark controller state %d\n", new_state); "failed to mark controller state %d\n", new_state);
result = -ENODEV;
goto out; goto out;
} }
@ -2597,7 +2586,10 @@ static void nvme_reset_work(struct work_struct *work)
out_unlock: out_unlock:
mutex_unlock(&dev->shutdown_lock); mutex_unlock(&dev->shutdown_lock);
out: out:
nvme_remove_dead_ctrl(dev, result); if (result)
dev_warn(dev->ctrl.device,
"Removing after probe failure status: %d\n", result);
nvme_remove_dead_ctrl(dev);
} }
static void nvme_remove_dead_ctrl_work(struct work_struct *work) static void nvme_remove_dead_ctrl_work(struct work_struct *work)
@ -2835,16 +2827,94 @@ static void nvme_remove(struct pci_dev *pdev)
} }
#ifdef CONFIG_PM_SLEEP #ifdef CONFIG_PM_SLEEP
static int nvme_get_power_state(struct nvme_ctrl *ctrl, u32 *ps)
{
return nvme_get_features(ctrl, NVME_FEAT_POWER_MGMT, 0, NULL, 0, ps);
}
static int nvme_set_power_state(struct nvme_ctrl *ctrl, u32 ps)
{
return nvme_set_features(ctrl, NVME_FEAT_POWER_MGMT, ps, NULL, 0, NULL);
}
static int nvme_resume(struct device *dev)
{
struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
struct nvme_ctrl *ctrl = &ndev->ctrl;
if (pm_resume_via_firmware() || !ctrl->npss ||
nvme_set_power_state(ctrl, ndev->last_ps) != 0)
nvme_reset_ctrl(ctrl);
return 0;
}
static int nvme_suspend(struct device *dev) static int nvme_suspend(struct device *dev)
{ {
struct pci_dev *pdev = to_pci_dev(dev); struct pci_dev *pdev = to_pci_dev(dev);
struct nvme_dev *ndev = pci_get_drvdata(pdev); struct nvme_dev *ndev = pci_get_drvdata(pdev);
struct nvme_ctrl *ctrl = &ndev->ctrl;
int ret = -EBUSY;
/*
* The platform does not remove power for a kernel managed suspend so
* use host managed nvme power settings for lowest idle power if
* possible. This should have quicker resume latency than a full device
* shutdown. But if the firmware is involved after the suspend or the
* device does not support any non-default power states, shut down the
* device fully.
*/
if (pm_suspend_via_firmware() || !ctrl->npss) {
nvme_dev_disable(ndev, true);
return 0;
}
nvme_start_freeze(ctrl);
nvme_wait_freeze(ctrl);
nvme_sync_queues(ctrl);
if (ctrl->state != NVME_CTRL_LIVE &&
ctrl->state != NVME_CTRL_ADMIN_ONLY)
goto unfreeze;
ndev->last_ps = 0;
ret = nvme_get_power_state(ctrl, &ndev->last_ps);
if (ret < 0)
goto unfreeze;
ret = nvme_set_power_state(ctrl, ctrl->npss);
if (ret < 0)
goto unfreeze;
if (ret) {
/*
* Clearing npss forces a controller reset on resume. The
* correct value will be rediscovered then.
*/
nvme_dev_disable(ndev, true);
ctrl->npss = 0;
ret = 0;
goto unfreeze;
}
/*
* A saved state prevents pci pm from generically controlling the
* device's power. If we're using protocol specific settings, we don't
* want pci interfering.
*/
pci_save_state(pdev);
unfreeze:
nvme_unfreeze(ctrl);
return ret;
}
static int nvme_simple_suspend(struct device *dev)
{
struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
nvme_dev_disable(ndev, true); nvme_dev_disable(ndev, true);
return 0; return 0;
} }
static int nvme_resume(struct device *dev) static int nvme_simple_resume(struct device *dev)
{ {
struct pci_dev *pdev = to_pci_dev(dev); struct pci_dev *pdev = to_pci_dev(dev);
struct nvme_dev *ndev = pci_get_drvdata(pdev); struct nvme_dev *ndev = pci_get_drvdata(pdev);
@ -2852,9 +2922,16 @@ static int nvme_resume(struct device *dev)
nvme_reset_ctrl(&ndev->ctrl); nvme_reset_ctrl(&ndev->ctrl);
return 0; return 0;
} }
#endif
static SIMPLE_DEV_PM_OPS(nvme_dev_pm_ops, nvme_suspend, nvme_resume); const struct dev_pm_ops nvme_dev_pm_ops = {
.suspend = nvme_suspend,
.resume = nvme_resume,
.freeze = nvme_simple_suspend,
.thaw = nvme_simple_resume,
.poweroff = nvme_simple_suspend,
.restore = nvme_simple_resume,
};
#endif /* CONFIG_PM_SLEEP */
static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev, static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
pci_channel_state_t state) pci_channel_state_t state)
@ -2959,9 +3036,11 @@ static struct pci_driver nvme_driver = {
.probe = nvme_probe, .probe = nvme_probe,
.remove = nvme_remove, .remove = nvme_remove,
.shutdown = nvme_shutdown, .shutdown = nvme_shutdown,
#ifdef CONFIG_PM_SLEEP
.driver = { .driver = {
.pm = &nvme_dev_pm_ops, .pm = &nvme_dev_pm_ops,
}, },
#endif
.sriov_configure = pci_sriov_configure_simple, .sriov_configure = pci_sriov_configure_simple,
.err_handler = &nvme_err_handler, .err_handler = &nvme_err_handler,
}; };


@ -135,6 +135,69 @@ const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p,
} }
} }
static const char *nvme_trace_fabrics_property_set(struct trace_seq *p, u8 *spc)
{
const char *ret = trace_seq_buffer_ptr(p);
u8 attrib = spc[0];
u32 ofst = get_unaligned_le32(spc + 4);
u64 value = get_unaligned_le64(spc + 8);
trace_seq_printf(p, "attrib=%u, ofst=0x%x, value=0x%llx",
attrib, ofst, value);
trace_seq_putc(p, 0);
return ret;
}
static const char *nvme_trace_fabrics_connect(struct trace_seq *p, u8 *spc)
{
const char *ret = trace_seq_buffer_ptr(p);
u16 recfmt = get_unaligned_le16(spc);
u16 qid = get_unaligned_le16(spc + 2);
u16 sqsize = get_unaligned_le16(spc + 4);
u8 cattr = spc[6];
u32 kato = get_unaligned_le32(spc + 8);
trace_seq_printf(p, "recfmt=%u, qid=%u, sqsize=%u, cattr=%u, kato=%u",
recfmt, qid, sqsize, cattr, kato);
trace_seq_putc(p, 0);
return ret;
}
static const char *nvme_trace_fabrics_property_get(struct trace_seq *p, u8 *spc)
{
const char *ret = trace_seq_buffer_ptr(p);
u8 attrib = spc[0];
u32 ofst = get_unaligned_le32(spc + 4);
trace_seq_printf(p, "attrib=%u, ofst=0x%x", attrib, ofst);
trace_seq_putc(p, 0);
return ret;
}
static const char *nvme_trace_fabrics_common(struct trace_seq *p, u8 *spc)
{
const char *ret = trace_seq_buffer_ptr(p);
trace_seq_printf(p, "spcecific=%*ph", 24, spc);
trace_seq_putc(p, 0);
return ret;
}
const char *nvme_trace_parse_fabrics_cmd(struct trace_seq *p,
u8 fctype, u8 *spc)
{
switch (fctype) {
case nvme_fabrics_type_property_set:
return nvme_trace_fabrics_property_set(p, spc);
case nvme_fabrics_type_connect:
return nvme_trace_fabrics_connect(p, spc);
case nvme_fabrics_type_property_get:
return nvme_trace_fabrics_property_get(p, spc);
default:
return nvme_trace_fabrics_common(p, spc);
}
}
const char *nvme_trace_disk_name(struct trace_seq *p, char *name) const char *nvme_trace_disk_name(struct trace_seq *p, char *name)
{ {
const char *ret = trace_seq_buffer_ptr(p); const char *ret = trace_seq_buffer_ptr(p);
@ -145,6 +208,5 @@ const char *nvme_trace_disk_name(struct trace_seq *p, char *name)
return ret; return ret;
} }
EXPORT_SYMBOL_GPL(nvme_trace_disk_name);
EXPORT_TRACEPOINT_SYMBOL_GPL(nvme_sq); EXPORT_TRACEPOINT_SYMBOL_GPL(nvme_sq);
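
The new fabrics tracing callbacks above pull fixed-offset little-endian fields out of the 24-byte command-specific area with get_unaligned_le16/32/64(). A user-space sketch of the connect decoder using memcpy-based loads (field offsets taken from the function above; the buffer contents in main() are invented):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Unaligned little-endian loads; assumes a little-endian host for brevity. */
static uint16_t le16(const uint8_t *p) { uint16_t v; memcpy(&v, p, 2); return v; }
static uint32_t le32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }

static void decode_connect(const uint8_t *spc)
{
	uint16_t recfmt = le16(spc);
	uint16_t qid    = le16(spc + 2);
	uint16_t sqsize = le16(spc + 4);
	uint8_t  cattr  = spc[6];
	uint32_t kato   = le32(spc + 8);

	printf("recfmt=%u, qid=%u, sqsize=%u, cattr=%u, kato=%u\n",
	       recfmt, qid, sqsize, cattr, kato);
}

int main(void)
{
	uint8_t spc[24] = { 0 };

	spc[2] = 1;			/* qid = 1        */
	spc[4] = 0x7f;			/* sqsize = 127   */
	spc[8] = 0x88; spc[9] = 0x13;	/* kato = 5000 ms */
	decode_connect(spc);
	return 0;
}
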


@ -16,59 +16,19 @@
#include "nvme.h" #include "nvme.h"
#define nvme_admin_opcode_name(opcode) { opcode, #opcode }
#define show_admin_opcode_name(val) \
__print_symbolic(val, \
nvme_admin_opcode_name(nvme_admin_delete_sq), \
nvme_admin_opcode_name(nvme_admin_create_sq), \
nvme_admin_opcode_name(nvme_admin_get_log_page), \
nvme_admin_opcode_name(nvme_admin_delete_cq), \
nvme_admin_opcode_name(nvme_admin_create_cq), \
nvme_admin_opcode_name(nvme_admin_identify), \
nvme_admin_opcode_name(nvme_admin_abort_cmd), \
nvme_admin_opcode_name(nvme_admin_set_features), \
nvme_admin_opcode_name(nvme_admin_get_features), \
nvme_admin_opcode_name(nvme_admin_async_event), \
nvme_admin_opcode_name(nvme_admin_ns_mgmt), \
nvme_admin_opcode_name(nvme_admin_activate_fw), \
nvme_admin_opcode_name(nvme_admin_download_fw), \
nvme_admin_opcode_name(nvme_admin_ns_attach), \
nvme_admin_opcode_name(nvme_admin_keep_alive), \
nvme_admin_opcode_name(nvme_admin_directive_send), \
nvme_admin_opcode_name(nvme_admin_directive_recv), \
nvme_admin_opcode_name(nvme_admin_dbbuf), \
nvme_admin_opcode_name(nvme_admin_format_nvm), \
nvme_admin_opcode_name(nvme_admin_security_send), \
nvme_admin_opcode_name(nvme_admin_security_recv), \
nvme_admin_opcode_name(nvme_admin_sanitize_nvm))
#define nvme_opcode_name(opcode) { opcode, #opcode }
#define show_nvm_opcode_name(val) \
__print_symbolic(val, \
nvme_opcode_name(nvme_cmd_flush), \
nvme_opcode_name(nvme_cmd_write), \
nvme_opcode_name(nvme_cmd_read), \
nvme_opcode_name(nvme_cmd_write_uncor), \
nvme_opcode_name(nvme_cmd_compare), \
nvme_opcode_name(nvme_cmd_write_zeroes), \
nvme_opcode_name(nvme_cmd_dsm), \
nvme_opcode_name(nvme_cmd_resv_register), \
nvme_opcode_name(nvme_cmd_resv_report), \
nvme_opcode_name(nvme_cmd_resv_acquire), \
nvme_opcode_name(nvme_cmd_resv_release))
#define show_opcode_name(qid, opcode) \
(qid ? show_nvm_opcode_name(opcode) : show_admin_opcode_name(opcode))
const char *nvme_trace_parse_admin_cmd(struct trace_seq *p, u8 opcode, const char *nvme_trace_parse_admin_cmd(struct trace_seq *p, u8 opcode,
u8 *cdw10); u8 *cdw10);
const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p, u8 opcode, const char *nvme_trace_parse_nvm_cmd(struct trace_seq *p, u8 opcode,
u8 *cdw10); u8 *cdw10);
const char *nvme_trace_parse_fabrics_cmd(struct trace_seq *p, u8 fctype,
u8 *spc);
#define parse_nvme_cmd(qid, opcode, cdw10) \
(qid ? \
nvme_trace_parse_nvm_cmd(p, opcode, cdw10) : \
nvme_trace_parse_admin_cmd(p, opcode, cdw10))
#define parse_nvme_cmd(qid, opcode, fctype, cdw10) \
((opcode) == nvme_fabrics_command ? \
nvme_trace_parse_fabrics_cmd(p, fctype, cdw10) : \
((qid) ? \
nvme_trace_parse_nvm_cmd(p, opcode, cdw10) : \
nvme_trace_parse_admin_cmd(p, opcode, cdw10)))
const char *nvme_trace_disk_name(struct trace_seq *p, char *name); const char *nvme_trace_disk_name(struct trace_seq *p, char *name);
#define __print_disk_name(name) \ #define __print_disk_name(name) \
@ -93,6 +53,7 @@ TRACE_EVENT(nvme_setup_cmd,
__field(int, qid) __field(int, qid)
__field(u8, opcode) __field(u8, opcode)
__field(u8, flags) __field(u8, flags)
__field(u8, fctype)
__field(u16, cid) __field(u16, cid)
__field(u32, nsid) __field(u32, nsid)
__field(u64, metadata) __field(u64, metadata)
@ -106,6 +67,7 @@ TRACE_EVENT(nvme_setup_cmd,
__entry->cid = cmd->common.command_id; __entry->cid = cmd->common.command_id;
__entry->nsid = le32_to_cpu(cmd->common.nsid); __entry->nsid = le32_to_cpu(cmd->common.nsid);
__entry->metadata = le64_to_cpu(cmd->common.metadata); __entry->metadata = le64_to_cpu(cmd->common.metadata);
__entry->fctype = cmd->fabrics.fctype;
__assign_disk_name(__entry->disk, req->rq_disk); __assign_disk_name(__entry->disk, req->rq_disk);
memcpy(__entry->cdw10, &cmd->common.cdw10, memcpy(__entry->cdw10, &cmd->common.cdw10,
sizeof(__entry->cdw10)); sizeof(__entry->cdw10));
@ -114,8 +76,10 @@ TRACE_EVENT(nvme_setup_cmd,
__entry->ctrl_id, __print_disk_name(__entry->disk), __entry->ctrl_id, __print_disk_name(__entry->disk),
__entry->qid, __entry->cid, __entry->nsid, __entry->qid, __entry->cid, __entry->nsid,
__entry->flags, __entry->metadata, __entry->flags, __entry->metadata,
show_opcode_name(__entry->qid, __entry->opcode),
parse_nvme_cmd(__entry->qid, __entry->opcode, __entry->cdw10))
show_opcode_name(__entry->qid, __entry->opcode,
__entry->fctype),
parse_nvme_cmd(__entry->qid, __entry->opcode,
__entry->fctype, __entry->cdw10))
); );
TRACE_EVENT(nvme_complete_rq, TRACE_EVENT(nvme_complete_rq,
@ -141,7 +105,7 @@ TRACE_EVENT(nvme_complete_rq,
__entry->status = nvme_req(req)->status; __entry->status = nvme_req(req)->status;
__assign_disk_name(__entry->disk, req->rq_disk); __assign_disk_name(__entry->disk, req->rq_disk);
), ),
TP_printk("nvme%d: %sqid=%d, cmdid=%u, res=%llu, retries=%u, flags=0x%x, status=%u", TP_printk("nvme%d: %sqid=%d, cmdid=%u, res=%#llx, retries=%u, flags=0x%x, status=%#x",
__entry->ctrl_id, __print_disk_name(__entry->disk), __entry->ctrl_id, __print_disk_name(__entry->disk),
__entry->qid, __entry->cid, __entry->result, __entry->qid, __entry->cid, __entry->result,
__entry->retries, __entry->flags, __entry->status) __entry->retries, __entry->flags, __entry->status)
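One note on the reworked parse_nvme_cmd() macro above: fabrics commands reuse a single opcode, so the fctype byte has to be checked before the qid-based admin/IO split. A standalone sketch of that dispatch order follows, with plain functions standing in for the trace_seq decoders; the helper names and the 0x7f opcode constant are written out here for illustration, not taken verbatim from the header.

#include <stdint.h>
#include <stdio.h>

#define FABRICS_OPCODE 0x7f	/* fabrics command set opcode in the NVMe spec */

static const char *parse_admin(uint8_t op)   { return op ? "admin" : "admin"; }
static const char *parse_io(uint8_t op)      { return op ? "io" : "io"; }
static const char *parse_fabrics(uint8_t ft) { return ft ? "fabrics" : "fabrics"; }

/* Mirror of the macro's decision order: fabrics first, then qid. */
static const char *parse_cmd(int qid, uint8_t opcode, uint8_t fctype)
{
	if (opcode == FABRICS_OPCODE)
		return parse_fabrics(fctype);
	return qid ? parse_io(opcode) : parse_admin(opcode);
}

int main(void)
{
	printf("%s\n", parse_cmd(0, FABRICS_OPCODE, 0x01)); /* fabrics */
	printf("%s\n", parse_cmd(0, 0x06, 0));              /* admin */
	printf("%s\n", parse_cmd(1, 0x02, 0));              /* io */
	return 0;
}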

View File

@ -1,5 +1,7 @@
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: GPL-2.0
ccflags-y += -I$(src)
obj-$(CONFIG_NVME_TARGET) += nvmet.o obj-$(CONFIG_NVME_TARGET) += nvmet.o
obj-$(CONFIG_NVME_TARGET_LOOP) += nvme-loop.o obj-$(CONFIG_NVME_TARGET_LOOP) += nvme-loop.o
obj-$(CONFIG_NVME_TARGET_RDMA) += nvmet-rdma.o obj-$(CONFIG_NVME_TARGET_RDMA) += nvmet-rdma.o
@ -14,3 +16,4 @@ nvmet-rdma-y += rdma.o
nvmet-fc-y += fc.o nvmet-fc-y += fc.o
nvme-fcloop-y += fcloop.o nvme-fcloop-y += fcloop.o
nvmet-tcp-y += tcp.o nvmet-tcp-y += tcp.o
nvmet-$(CONFIG_TRACING) += trace.o

View File

@ -10,6 +10,9 @@
#include <linux/pci-p2pdma.h> #include <linux/pci-p2pdma.h>
#include <linux/scatterlist.h> #include <linux/scatterlist.h>
#define CREATE_TRACE_POINTS
#include "trace.h"
#include "nvmet.h" #include "nvmet.h"
struct workqueue_struct *buffered_io_wq; struct workqueue_struct *buffered_io_wq;
@ -311,6 +314,7 @@ int nvmet_enable_port(struct nvmet_port *port)
port->inline_data_size = 0; port->inline_data_size = 0;
port->enabled = true; port->enabled = true;
port->tr_ops = ops;
return 0; return 0;
} }
@ -321,6 +325,7 @@ void nvmet_disable_port(struct nvmet_port *port)
lockdep_assert_held(&nvmet_config_sem); lockdep_assert_held(&nvmet_config_sem);
port->enabled = false; port->enabled = false;
port->tr_ops = NULL;
ops = nvmet_transports[port->disc_addr.trtype]; ops = nvmet_transports[port->disc_addr.trtype];
ops->remove_port(port); ops->remove_port(port);
@ -689,6 +694,9 @@ static void __nvmet_req_complete(struct nvmet_req *req, u16 status)
if (unlikely(status)) if (unlikely(status))
nvmet_set_error(req, status); nvmet_set_error(req, status);
trace_nvmet_req_complete(req);
if (req->ns) if (req->ns)
nvmet_put_namespace(req->ns); nvmet_put_namespace(req->ns);
req->ops->queue_response(req); req->ops->queue_response(req);
@ -848,6 +856,8 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
req->error_loc = NVMET_NO_ERROR_LOC; req->error_loc = NVMET_NO_ERROR_LOC;
req->error_slba = 0; req->error_slba = 0;
trace_nvmet_req_init(req, req->cmd);
/* no support for fused commands yet */ /* no support for fused commands yet */
if (unlikely(flags & (NVME_CMD_FUSE_FIRST | NVME_CMD_FUSE_SECOND))) { if (unlikely(flags & (NVME_CMD_FUSE_FIRST | NVME_CMD_FUSE_SECOND))) {
req->error_loc = offsetof(struct nvme_common_command, flags); req->error_loc = offsetof(struct nvme_common_command, flags);
@ -871,7 +881,7 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
status = nvmet_parse_connect_cmd(req); status = nvmet_parse_connect_cmd(req);
else if (likely(req->sq->qid != 0)) else if (likely(req->sq->qid != 0))
status = nvmet_parse_io_cmd(req); status = nvmet_parse_io_cmd(req);
else if (req->cmd->common.opcode == nvme_fabrics_command)
else if (nvme_is_fabrics(req->cmd))
status = nvmet_parse_fabrics_cmd(req); status = nvmet_parse_fabrics_cmd(req);
else if (req->sq->ctrl->subsys->type == NVME_NQN_DISC) else if (req->sq->ctrl->subsys->type == NVME_NQN_DISC)
status = nvmet_parse_discovery_cmd(req); status = nvmet_parse_discovery_cmd(req);

View File

@ -41,6 +41,10 @@ void nvmet_port_disc_changed(struct nvmet_port *port,
__nvmet_disc_changed(port, ctrl); __nvmet_disc_changed(port, ctrl);
} }
mutex_unlock(&nvmet_disc_subsys->lock); mutex_unlock(&nvmet_disc_subsys->lock);
/* If transport can signal change, notify transport */
if (port->tr_ops && port->tr_ops->discovery_chg)
port->tr_ops->discovery_chg(port);
} }
static void __nvmet_subsys_disc_changed(struct nvmet_port *port, static void __nvmet_subsys_disc_changed(struct nvmet_port *port,

View File

@ -268,7 +268,7 @@ u16 nvmet_parse_connect_cmd(struct nvmet_req *req)
{ {
struct nvme_command *cmd = req->cmd; struct nvme_command *cmd = req->cmd;
if (cmd->common.opcode != nvme_fabrics_command) {
if (!nvme_is_fabrics(cmd)) {
pr_err("invalid command 0x%x on unconnected queue.\n", pr_err("invalid command 0x%x on unconnected queue.\n",
cmd->fabrics.opcode); cmd->fabrics.opcode);
req->error_loc = offsetof(struct nvme_common_command, opcode); req->error_loc = offsetof(struct nvme_common_command, opcode);

View File

@ -1806,7 +1806,7 @@ nvmet_fc_prep_fcp_rsp(struct nvmet_fc_tgtport *tgtport,
*/ */
rspcnt = atomic_inc_return(&fod->queue->zrspcnt); rspcnt = atomic_inc_return(&fod->queue->zrspcnt);
if (!(rspcnt % fod->queue->ersp_ratio) || if (!(rspcnt % fod->queue->ersp_ratio) ||
sqe->opcode == nvme_fabrics_command ||
nvme_is_fabrics((struct nvme_command *) sqe) ||
xfr_length != fod->req.transfer_len || xfr_length != fod->req.transfer_len ||
(le16_to_cpu(cqe->status) & 0xFFFE) || cqewd[0] || cqewd[1] || (le16_to_cpu(cqe->status) & 0xFFFE) || cqewd[0] || cqewd[1] ||
(sqe->flags & (NVME_CMD_FUSE_FIRST | NVME_CMD_FUSE_SECOND)) || (sqe->flags & (NVME_CMD_FUSE_FIRST | NVME_CMD_FUSE_SECOND)) ||
@ -2549,6 +2549,16 @@ nvmet_fc_remove_port(struct nvmet_port *port)
kfree(pe); kfree(pe);
} }
static void
nvmet_fc_discovery_chg(struct nvmet_port *port)
{
struct nvmet_fc_port_entry *pe = port->priv;
struct nvmet_fc_tgtport *tgtport = pe->tgtport;
if (tgtport && tgtport->ops->discovery_event)
tgtport->ops->discovery_event(&tgtport->fc_target_port);
}
static const struct nvmet_fabrics_ops nvmet_fc_tgt_fcp_ops = { static const struct nvmet_fabrics_ops nvmet_fc_tgt_fcp_ops = {
.owner = THIS_MODULE, .owner = THIS_MODULE,
.type = NVMF_TRTYPE_FC, .type = NVMF_TRTYPE_FC,
@ -2557,6 +2567,7 @@ static const struct nvmet_fabrics_ops nvmet_fc_tgt_fcp_ops = {
.remove_port = nvmet_fc_remove_port, .remove_port = nvmet_fc_remove_port,
.queue_response = nvmet_fc_fcp_nvme_cmd_done, .queue_response = nvmet_fc_fcp_nvme_cmd_done,
.delete_ctrl = nvmet_fc_delete_ctrl, .delete_ctrl = nvmet_fc_delete_ctrl,
.discovery_chg = nvmet_fc_discovery_chg,
}; };
static int __init nvmet_fc_init_module(void) static int __init nvmet_fc_init_module(void)

View File

@ -231,6 +231,11 @@ struct fcloop_lsreq {
int status; int status;
}; };
struct fcloop_rscn {
struct fcloop_tport *tport;
struct work_struct work;
};
enum { enum {
INI_IO_START = 0, INI_IO_START = 0,
INI_IO_ACTIVE = 1, INI_IO_ACTIVE = 1,
@ -348,6 +353,37 @@ fcloop_xmt_ls_rsp(struct nvmet_fc_target_port *tport,
return 0; return 0;
} }
/*
* Simulate reception of an RSCN and convert it into an initiator transport
* call to rescan a remote port.
*/
static void
fcloop_tgt_rscn_work(struct work_struct *work)
{
struct fcloop_rscn *tgt_rscn =
container_of(work, struct fcloop_rscn, work);
struct fcloop_tport *tport = tgt_rscn->tport;
if (tport->remoteport)
nvme_fc_rescan_remoteport(tport->remoteport);
kfree(tgt_rscn);
}
static void
fcloop_tgt_discovery_evt(struct nvmet_fc_target_port *tgtport)
{
struct fcloop_rscn *tgt_rscn;
tgt_rscn = kzalloc(sizeof(*tgt_rscn), GFP_KERNEL);
if (!tgt_rscn)
return;
tgt_rscn->tport = tgtport->private;
INIT_WORK(&tgt_rscn->work, fcloop_tgt_rscn_work);
schedule_work(&tgt_rscn->work);
}
static void static void
fcloop_tfcp_req_free(struct kref *ref) fcloop_tfcp_req_free(struct kref *ref)
{ {
@ -839,6 +875,7 @@ static struct nvmet_fc_target_template tgttemplate = {
.fcp_op = fcloop_fcp_op, .fcp_op = fcloop_fcp_op,
.fcp_abort = fcloop_tgt_fcp_abort, .fcp_abort = fcloop_tgt_fcp_abort,
.fcp_req_release = fcloop_fcp_req_release, .fcp_req_release = fcloop_fcp_req_release,
.discovery_event = fcloop_tgt_discovery_evt,
.max_hw_queues = FCLOOP_HW_QUEUES, .max_hw_queues = FCLOOP_HW_QUEUES,
.max_sgl_segments = FCLOOP_SGL_SEGS, .max_sgl_segments = FCLOOP_SGL_SEGS,
.max_dif_sgl_segments = FCLOOP_SGL_SEGS, .max_dif_sgl_segments = FCLOOP_SGL_SEGS,
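fcloop_tgt_discovery_evt() above uses the common fire-and-forget pattern: allocate a small context, INIT_WORK() it, schedule it, and let the work handler free it. A minimal out-of-tree module sketch of the same pattern is below; the module name, log text, and payload field are made up for illustration.

// SPDX-License-Identifier: GPL-2.0
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_evt {
	int port_id;			/* payload carried to the handler */
	struct work_struct work;
};

static void demo_evt_work(struct work_struct *work)
{
	struct demo_evt *evt = container_of(work, struct demo_evt, work);

	pr_info("demo: handling event for port %d\n", evt->port_id);
	kfree(evt);			/* the handler owns and frees the context */
}

static void demo_fire_event(int port_id)
{
	struct demo_evt *evt = kzalloc(sizeof(*evt), GFP_KERNEL);

	if (!evt)
		return;			/* best-effort notification, as in fcloop */
	evt->port_id = port_id;
	INIT_WORK(&evt->work, demo_evt_work);
	schedule_work(&evt->work);
}

static int __init demo_init(void)
{
	demo_fire_event(1);
	return 0;
}

static void __exit demo_exit(void)
{
	flush_scheduled_work();		/* make sure no queued work outlives us */
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");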

View File

@ -140,6 +140,7 @@ struct nvmet_port {
void *priv; void *priv;
bool enabled; bool enabled;
int inline_data_size; int inline_data_size;
const struct nvmet_fabrics_ops *tr_ops;
}; };
static inline struct nvmet_port *to_nvmet_port(struct config_item *item) static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@ -277,6 +278,7 @@ struct nvmet_fabrics_ops {
void (*disc_traddr)(struct nvmet_req *req, void (*disc_traddr)(struct nvmet_req *req,
struct nvmet_port *port, char *traddr); struct nvmet_port *port, char *traddr);
u16 (*install_queue)(struct nvmet_sq *nvme_sq); u16 (*install_queue)(struct nvmet_sq *nvme_sq);
void (*discovery_chg)(struct nvmet_port *port);
}; };
#define NVMET_MAX_INLINE_BIOVEC 8 #define NVMET_MAX_INLINE_BIOVEC 8

View File

@ -0,0 +1,201 @@
// SPDX-License-Identifier: GPL-2.0
/*
* NVM Express target device driver tracepoints
* Copyright (c) 2018 Johannes Thumshirn, SUSE Linux GmbH
*/
#include <asm/unaligned.h>
#include "trace.h"
static const char *nvmet_trace_admin_identify(struct trace_seq *p, u8 *cdw10)
{
const char *ret = trace_seq_buffer_ptr(p);
u8 cns = cdw10[0];
u16 ctrlid = get_unaligned_le16(cdw10 + 2);
trace_seq_printf(p, "cns=%u, ctrlid=%u", cns, ctrlid);
trace_seq_putc(p, 0);
return ret;
}
static const char *nvmet_trace_admin_get_features(struct trace_seq *p,
u8 *cdw10)
{
const char *ret = trace_seq_buffer_ptr(p);
u8 fid = cdw10[0];
u8 sel = cdw10[1] & 0x7;
u32 cdw11 = get_unaligned_le32(cdw10 + 4);
trace_seq_printf(p, "fid=0x%x sel=0x%x cdw11=0x%x", fid, sel, cdw11);
trace_seq_putc(p, 0);
return ret;
}
static const char *nvmet_trace_read_write(struct trace_seq *p, u8 *cdw10)
{
const char *ret = trace_seq_buffer_ptr(p);
u64 slba = get_unaligned_le64(cdw10);
u16 length = get_unaligned_le16(cdw10 + 8);
u16 control = get_unaligned_le16(cdw10 + 10);
u32 dsmgmt = get_unaligned_le32(cdw10 + 12);
u32 reftag = get_unaligned_le32(cdw10 + 16);
trace_seq_printf(p,
"slba=%llu, len=%u, ctrl=0x%x, dsmgmt=%u, reftag=%u",
slba, length, control, dsmgmt, reftag);
trace_seq_putc(p, 0);
return ret;
}
static const char *nvmet_trace_dsm(struct trace_seq *p, u8 *cdw10)
{
const char *ret = trace_seq_buffer_ptr(p);
trace_seq_printf(p, "nr=%u, attributes=%u",
get_unaligned_le32(cdw10),
get_unaligned_le32(cdw10 + 4));
trace_seq_putc(p, 0);
return ret;
}
static const char *nvmet_trace_common(struct trace_seq *p, u8 *cdw10)
{
const char *ret = trace_seq_buffer_ptr(p);
trace_seq_printf(p, "cdw10=%*ph", 24, cdw10);
trace_seq_putc(p, 0);
return ret;
}
const char *nvmet_trace_parse_admin_cmd(struct trace_seq *p,
u8 opcode, u8 *cdw10)
{
switch (opcode) {
case nvme_admin_identify:
return nvmet_trace_admin_identify(p, cdw10);
case nvme_admin_get_features:
return nvmet_trace_admin_get_features(p, cdw10);
default:
return nvmet_trace_common(p, cdw10);
}
}
const char *nvmet_trace_parse_nvm_cmd(struct trace_seq *p,
u8 opcode, u8 *cdw10)
{
switch (opcode) {
case nvme_cmd_read:
case nvme_cmd_write:
case nvme_cmd_write_zeroes:
return nvmet_trace_read_write(p, cdw10);
case nvme_cmd_dsm:
return nvmet_trace_dsm(p, cdw10);
default:
return nvmet_trace_common(p, cdw10);
}
}
static const char *nvmet_trace_fabrics_property_set(struct trace_seq *p,
u8 *spc)
{
const char *ret = trace_seq_buffer_ptr(p);
u8 attrib = spc[0];
u32 ofst = get_unaligned_le32(spc + 4);
u64 value = get_unaligned_le64(spc + 8);
trace_seq_printf(p, "attrib=%u, ofst=0x%x, value=0x%llx",
attrib, ofst, value);
trace_seq_putc(p, 0);
return ret;
}
static const char *nvmet_trace_fabrics_connect(struct trace_seq *p,
u8 *spc)
{
const char *ret = trace_seq_buffer_ptr(p);
u16 recfmt = get_unaligned_le16(spc);
u16 qid = get_unaligned_le16(spc + 2);
u16 sqsize = get_unaligned_le16(spc + 4);
u8 cattr = spc[6];
u32 kato = get_unaligned_le32(spc + 8);
trace_seq_printf(p, "recfmt=%u, qid=%u, sqsize=%u, cattr=%u, kato=%u",
recfmt, qid, sqsize, cattr, kato);
trace_seq_putc(p, 0);
return ret;
}
static const char *nvmet_trace_fabrics_property_get(struct trace_seq *p,
u8 *spc)
{
const char *ret = trace_seq_buffer_ptr(p);
u8 attrib = spc[0];
u32 ofst = get_unaligned_le32(spc + 4);
trace_seq_printf(p, "attrib=%u, ofst=0x%x", attrib, ofst);
trace_seq_putc(p, 0);
return ret;
}
static const char *nvmet_trace_fabrics_common(struct trace_seq *p, u8 *spc)
{
const char *ret = trace_seq_buffer_ptr(p);
trace_seq_printf(p, "spcecific=%*ph", 24, spc);
trace_seq_putc(p, 0);
return ret;
}
const char *nvmet_trace_parse_fabrics_cmd(struct trace_seq *p,
u8 fctype, u8 *spc)
{
switch (fctype) {
case nvme_fabrics_type_property_set:
return nvmet_trace_fabrics_property_set(p, spc);
case nvme_fabrics_type_connect:
return nvmet_trace_fabrics_connect(p, spc);
case nvme_fabrics_type_property_get:
return nvmet_trace_fabrics_property_get(p, spc);
default:
return nvmet_trace_fabrics_common(p, spc);
}
}
const char *nvmet_trace_disk_name(struct trace_seq *p, char *name)
{
const char *ret = trace_seq_buffer_ptr(p);
if (*name)
trace_seq_printf(p, "disk=%s, ", name);
trace_seq_putc(p, 0);
return ret;
}
const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl)
{
const char *ret = trace_seq_buffer_ptr(p);
/*
* XXX: We don't know the controller instance before executing the
* connect command itself because the connect command for the admin
* queue will not provide the cntlid which will be allocated in this
* command. In case of io queues, the controller instance will be
* mapped by the extra data of the connect command.
* If we can know the extra data of the connect command in this stage,
* we can update this print statement later.
*/
if (ctrl)
trace_seq_printf(p, "%d", ctrl->cntlid);
else
trace_seq_printf(p, "_");
trace_seq_putc(p, 0);
return ret;
}

View File

@ -0,0 +1,141 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* NVM Express target device driver tracepoints
* Copyright (c) 2018 Johannes Thumshirn, SUSE Linux GmbH
*
* This is entirely based on drivers/nvme/host/trace.h
*/
#undef TRACE_SYSTEM
#define TRACE_SYSTEM nvmet
#if !defined(_TRACE_NVMET_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_NVMET_H
#include <linux/nvme.h>
#include <linux/tracepoint.h>
#include <linux/trace_seq.h>
#include "nvmet.h"
const char *nvmet_trace_parse_admin_cmd(struct trace_seq *p, u8 opcode,
u8 *cdw10);
const char *nvmet_trace_parse_nvm_cmd(struct trace_seq *p, u8 opcode,
u8 *cdw10);
const char *nvmet_trace_parse_fabrics_cmd(struct trace_seq *p, u8 fctype,
u8 *spc);
#define parse_nvme_cmd(qid, opcode, fctype, cdw10) \
((opcode) == nvme_fabrics_command ? \
nvmet_trace_parse_fabrics_cmd(p, fctype, cdw10) : \
(qid ? \
nvmet_trace_parse_nvm_cmd(p, opcode, cdw10) : \
nvmet_trace_parse_admin_cmd(p, opcode, cdw10)))
const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl);
#define __print_ctrl_name(ctrl) \
nvmet_trace_ctrl_name(p, ctrl)
const char *nvmet_trace_disk_name(struct trace_seq *p, char *name);
#define __print_disk_name(name) \
nvmet_trace_disk_name(p, name)
#ifndef TRACE_HEADER_MULTI_READ
static inline struct nvmet_ctrl *nvmet_req_to_ctrl(struct nvmet_req *req)
{
return req->sq->ctrl;
}
static inline void __assign_disk_name(char *name, struct nvmet_req *req,
bool init)
{
struct nvmet_ctrl *ctrl = nvmet_req_to_ctrl(req);
struct nvmet_ns *ns;
if ((init && req->sq->qid) || (!init && req->cq->qid)) {
ns = nvmet_find_namespace(ctrl, req->cmd->rw.nsid);
strncpy(name, ns->device_path, DISK_NAME_LEN);
return;
}
memset(name, 0, DISK_NAME_LEN);
}
#endif
TRACE_EVENT(nvmet_req_init,
TP_PROTO(struct nvmet_req *req, struct nvme_command *cmd),
TP_ARGS(req, cmd),
TP_STRUCT__entry(
__field(struct nvme_command *, cmd)
__field(struct nvmet_ctrl *, ctrl)
__array(char, disk, DISK_NAME_LEN)
__field(int, qid)
__field(u16, cid)
__field(u8, opcode)
__field(u8, fctype)
__field(u8, flags)
__field(u32, nsid)
__field(u64, metadata)
__array(u8, cdw10, 24)
),
TP_fast_assign(
__entry->cmd = cmd;
__entry->ctrl = nvmet_req_to_ctrl(req);
__assign_disk_name(__entry->disk, req, true);
__entry->qid = req->sq->qid;
__entry->cid = cmd->common.command_id;
__entry->opcode = cmd->common.opcode;
__entry->fctype = cmd->fabrics.fctype;
__entry->flags = cmd->common.flags;
__entry->nsid = le32_to_cpu(cmd->common.nsid);
__entry->metadata = le64_to_cpu(cmd->common.metadata);
memcpy(__entry->cdw10, &cmd->common.cdw10,
sizeof(__entry->cdw10));
),
TP_printk("nvmet%s: %sqid=%d, cmdid=%u, nsid=%u, flags=%#x, "
"meta=%#llx, cmd=(%s, %s)",
__print_ctrl_name(__entry->ctrl),
__print_disk_name(__entry->disk),
__entry->qid, __entry->cid, __entry->nsid,
__entry->flags, __entry->metadata,
show_opcode_name(__entry->qid, __entry->opcode,
__entry->fctype),
parse_nvme_cmd(__entry->qid, __entry->opcode,
__entry->fctype, __entry->cdw10))
);
TRACE_EVENT(nvmet_req_complete,
TP_PROTO(struct nvmet_req *req),
TP_ARGS(req),
TP_STRUCT__entry(
__field(struct nvmet_ctrl *, ctrl)
__array(char, disk, DISK_NAME_LEN)
__field(int, qid)
__field(int, cid)
__field(u64, result)
__field(u16, status)
),
TP_fast_assign(
__entry->ctrl = nvmet_req_to_ctrl(req);
__entry->qid = req->cq->qid;
__entry->cid = req->cqe->command_id;
__entry->result = le64_to_cpu(req->cqe->result.u64);
__entry->status = le16_to_cpu(req->cqe->status) >> 1;
__assign_disk_name(__entry->disk, req, false);
),
TP_printk("nvmet%s: %sqid=%d, cmdid=%u, res=%#llx, status=%#x",
__print_ctrl_name(__entry->ctrl),
__print_disk_name(__entry->disk),
__entry->qid, __entry->cid, __entry->result, __entry->status)
);
#endif /* _TRACE_NVMET_H */
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_FILE trace
/* This part must be outside protection */
#include <trace/define_trace.h>

View File

@ -274,6 +274,7 @@ struct lpfc_stats {
uint32_t elsXmitADISC; uint32_t elsXmitADISC;
uint32_t elsXmitLOGO; uint32_t elsXmitLOGO;
uint32_t elsXmitSCR; uint32_t elsXmitSCR;
uint32_t elsXmitRSCN;
uint32_t elsXmitRNID; uint32_t elsXmitRNID;
uint32_t elsXmitFARP; uint32_t elsXmitFARP;
uint32_t elsXmitFARPR; uint32_t elsXmitFARPR;
@ -819,6 +820,7 @@ struct lpfc_hba {
uint32_t cfg_use_msi; uint32_t cfg_use_msi;
uint32_t cfg_auto_imax; uint32_t cfg_auto_imax;
uint32_t cfg_fcp_imax; uint32_t cfg_fcp_imax;
uint32_t cfg_force_rscn;
uint32_t cfg_cq_poll_threshold; uint32_t cfg_cq_poll_threshold;
uint32_t cfg_cq_max_proc_limit; uint32_t cfg_cq_max_proc_limit;
uint32_t cfg_fcp_cpu_map; uint32_t cfg_fcp_cpu_map;

View File

@ -4958,6 +4958,64 @@ static DEVICE_ATTR(lpfc_req_fw_upgrade, S_IRUGO | S_IWUSR,
lpfc_request_firmware_upgrade_show, lpfc_request_firmware_upgrade_show,
lpfc_request_firmware_upgrade_store); lpfc_request_firmware_upgrade_store);
/**
* lpfc_force_rscn_store
*
* @dev: class device that is converted into a Scsi_host.
* @attr: device attribute, not used.
* @buf: unused string
* @count: unused variable.
*
* Description:
* Force the switch to send an RSCN to all other NPorts in our zone.
* If we are directly connected pt2pt, build the RSCN command ourselves
* and send it to the other NPort. Not supported for private loop.
*
* Returns:
* 0 - on success
* -EIO - if command is not sent
**/
static ssize_t
lpfc_force_rscn_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct Scsi_Host *shost = class_to_shost(dev);
struct lpfc_vport *vport = (struct lpfc_vport *)shost->hostdata;
int i;
i = lpfc_issue_els_rscn(vport, 0);
if (i)
return -EIO;
return strlen(buf);
}
/*
* lpfc_force_rscn: Force an RSCN to be sent to all remote NPorts
* connected to the HBA.
*
* Value range is any ASCII value.
*/
static int lpfc_force_rscn;
module_param(lpfc_force_rscn, int, 0644);
MODULE_PARM_DESC(lpfc_force_rscn,
"Force an RSCN to be sent to all remote NPorts");
lpfc_param_show(force_rscn)
/**
* lpfc_force_rscn_init - Force an RSCN to be sent to all remote NPorts
* @phba: lpfc_hba pointer.
* @val: unused value.
*
* Returns:
* zero if val saved.
**/
static int
lpfc_force_rscn_init(struct lpfc_hba *phba, int val)
{
return 0;
}
static DEVICE_ATTR_RW(lpfc_force_rscn);
/** /**
* lpfc_fcp_imax_store * lpfc_fcp_imax_store
* *
@ -5958,6 +6016,7 @@ struct device_attribute *lpfc_hba_attrs[] = {
&dev_attr_lpfc_nvme_oas, &dev_attr_lpfc_nvme_oas,
&dev_attr_lpfc_nvme_embed_cmd, &dev_attr_lpfc_nvme_embed_cmd,
&dev_attr_lpfc_fcp_imax, &dev_attr_lpfc_fcp_imax,
&dev_attr_lpfc_force_rscn,
&dev_attr_lpfc_cq_poll_threshold, &dev_attr_lpfc_cq_poll_threshold,
&dev_attr_lpfc_cq_max_proc_limit, &dev_attr_lpfc_cq_max_proc_limit,
&dev_attr_lpfc_fcp_cpu_map, &dev_attr_lpfc_fcp_cpu_map,
@ -7005,6 +7064,7 @@ lpfc_get_cfgparam(struct lpfc_hba *phba)
lpfc_nvme_oas_init(phba, lpfc_nvme_oas); lpfc_nvme_oas_init(phba, lpfc_nvme_oas);
lpfc_nvme_embed_cmd_init(phba, lpfc_nvme_embed_cmd); lpfc_nvme_embed_cmd_init(phba, lpfc_nvme_embed_cmd);
lpfc_fcp_imax_init(phba, lpfc_fcp_imax); lpfc_fcp_imax_init(phba, lpfc_fcp_imax);
lpfc_force_rscn_init(phba, lpfc_force_rscn);
lpfc_cq_poll_threshold_init(phba, lpfc_cq_poll_threshold); lpfc_cq_poll_threshold_init(phba, lpfc_cq_poll_threshold);
lpfc_cq_max_proc_limit_init(phba, lpfc_cq_max_proc_limit); lpfc_cq_max_proc_limit_init(phba, lpfc_cq_max_proc_limit);
lpfc_fcp_cpu_map_init(phba, lpfc_fcp_cpu_map); lpfc_fcp_cpu_map_init(phba, lpfc_fcp_cpu_map);
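The new lpfc_force_rscn attribute above is a write-only trigger: the store routine ignores the written value and simply calls lpfc_issue_els_rscn(). A small userspace sketch for poking such a trigger is shown below; the exact sysfs path depends on the adapter's SCSI host number and is an assumption here (something like /sys/class/scsi_host/hostN/lpfc_force_rscn), so it is taken from the command line.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : NULL;
	int fd;

	if (!path) {
		fprintf(stderr, "usage: %s <sysfs attribute path>\n", argv[0]);
		return 1;
	}

	fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* The store routine ignores the value; any write triggers the RSCN. */
	if (write(fd, "1", 1) != 1) {
		perror("write");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}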

View File

@ -141,6 +141,7 @@ int lpfc_issue_els_adisc(struct lpfc_vport *, struct lpfc_nodelist *, uint8_t);
int lpfc_issue_els_logo(struct lpfc_vport *, struct lpfc_nodelist *, uint8_t); int lpfc_issue_els_logo(struct lpfc_vport *, struct lpfc_nodelist *, uint8_t);
int lpfc_issue_els_npiv_logo(struct lpfc_vport *, struct lpfc_nodelist *); int lpfc_issue_els_npiv_logo(struct lpfc_vport *, struct lpfc_nodelist *);
int lpfc_issue_els_scr(struct lpfc_vport *, uint32_t, uint8_t); int lpfc_issue_els_scr(struct lpfc_vport *, uint32_t, uint8_t);
int lpfc_issue_els_rscn(struct lpfc_vport *vport, uint8_t retry);
int lpfc_issue_fabric_reglogin(struct lpfc_vport *); int lpfc_issue_fabric_reglogin(struct lpfc_vport *);
int lpfc_els_free_iocb(struct lpfc_hba *, struct lpfc_iocbq *); int lpfc_els_free_iocb(struct lpfc_hba *, struct lpfc_iocbq *);
int lpfc_ct_free_iocb(struct lpfc_hba *, struct lpfc_iocbq *); int lpfc_ct_free_iocb(struct lpfc_hba *, struct lpfc_iocbq *);
@ -355,6 +356,7 @@ void lpfc_mbox_timeout_handler(struct lpfc_hba *);
struct lpfc_nodelist *lpfc_findnode_did(struct lpfc_vport *, uint32_t); struct lpfc_nodelist *lpfc_findnode_did(struct lpfc_vport *, uint32_t);
struct lpfc_nodelist *lpfc_findnode_wwpn(struct lpfc_vport *, struct lpfc_nodelist *lpfc_findnode_wwpn(struct lpfc_vport *,
struct lpfc_name *); struct lpfc_name *);
struct lpfc_nodelist *lpfc_findnode_mapped(struct lpfc_vport *vport);
int lpfc_sli_issue_mbox_wait(struct lpfc_hba *, LPFC_MBOXQ_t *, uint32_t); int lpfc_sli_issue_mbox_wait(struct lpfc_hba *, LPFC_MBOXQ_t *, uint32_t);
@ -555,6 +557,8 @@ void lpfc_ras_stop_fwlog(struct lpfc_hba *phba);
int lpfc_check_fwlog_support(struct lpfc_hba *phba); int lpfc_check_fwlog_support(struct lpfc_hba *phba);
/* NVME interfaces. */ /* NVME interfaces. */
void lpfc_nvme_rescan_port(struct lpfc_vport *vport,
struct lpfc_nodelist *ndlp);
void lpfc_nvme_unregister_port(struct lpfc_vport *vport, void lpfc_nvme_unregister_port(struct lpfc_vport *vport,
struct lpfc_nodelist *ndlp); struct lpfc_nodelist *ndlp);
int lpfc_nvme_register_port(struct lpfc_vport *vport, int lpfc_nvme_register_port(struct lpfc_vport *vport,

View File

@ -30,6 +30,8 @@
#include <scsi/scsi_device.h> #include <scsi/scsi_device.h>
#include <scsi/scsi_host.h> #include <scsi/scsi_host.h>
#include <scsi/scsi_transport_fc.h> #include <scsi/scsi_transport_fc.h>
#include <uapi/scsi/fc/fc_fs.h>
#include <uapi/scsi/fc/fc_els.h>
#include "lpfc_hw4.h" #include "lpfc_hw4.h"
#include "lpfc_hw.h" #include "lpfc_hw.h"
@ -3078,6 +3080,116 @@ lpfc_issue_els_scr(struct lpfc_vport *vport, uint32_t nportid, uint8_t retry)
return 0; return 0;
} }
/**
* lpfc_issue_els_rscn - Issue an RSCN to the Fabric Controller (Fabric)
* or the other nport (pt2pt).
* @vport: pointer to a host virtual N_Port data structure.
* @retry: number of retries to the command IOCB.
*
* This routine issues an RSCN to the Fabric Controller (DID 0xFFFFFD)
* when connected to a fabric, or to the remote port when connected
* in point-to-point mode. When sent to the Fabric Controller, it will
* relay the RSCN to registered recipients.
*
* Note that, in lpfc_prep_els_iocb() routine, the reference count of ndlp
* will be incremented by 1 for holding the ndlp and the reference to ndlp
* will be stored into the context1 field of the IOCB for the completion
* callback function to the RSCN ELS command.
*
* Return code
* 0 - Successfully issued RSCN command
* 1 - Failed to issue RSCN command
**/
int
lpfc_issue_els_rscn(struct lpfc_vport *vport, uint8_t retry)
{
struct lpfc_hba *phba = vport->phba;
struct lpfc_iocbq *elsiocb;
struct lpfc_nodelist *ndlp;
struct {
struct fc_els_rscn rscn;
struct fc_els_rscn_page portid;
} *event;
uint32_t nportid;
uint16_t cmdsize = sizeof(*event);
/* Not supported for private loop */
if (phba->fc_topology == LPFC_TOPOLOGY_LOOP &&
!(vport->fc_flag & FC_PUBLIC_LOOP))
return 1;
if (vport->fc_flag & FC_PT2PT) {
/* find any mapped nport - that would be the other nport */
ndlp = lpfc_findnode_mapped(vport);
if (!ndlp)
return 1;
} else {
nportid = FC_FID_FCTRL;
/* find the fabric controller node */
ndlp = lpfc_findnode_did(vport, nportid);
if (!ndlp) {
/* if one didn't exist, make one */
ndlp = lpfc_nlp_init(vport, nportid);
if (!ndlp)
return 1;
lpfc_enqueue_node(vport, ndlp);
} else if (!NLP_CHK_NODE_ACT(ndlp)) {
ndlp = lpfc_enable_node(vport, ndlp,
NLP_STE_UNUSED_NODE);
if (!ndlp)
return 1;
}
}
elsiocb = lpfc_prep_els_iocb(vport, 1, cmdsize, retry, ndlp,
ndlp->nlp_DID, ELS_CMD_RSCN_XMT);
if (!elsiocb) {
/* This will trigger the release of the node just
* allocated
*/
lpfc_nlp_put(ndlp);
return 1;
}
event = ((struct lpfc_dmabuf *)elsiocb->context2)->virt;
event->rscn.rscn_cmd = ELS_RSCN;
event->rscn.rscn_page_len = sizeof(struct fc_els_rscn_page);
event->rscn.rscn_plen = cpu_to_be16(cmdsize);
nportid = vport->fc_myDID;
/* appears that page flags must be 0 for fabric to broadcast RSCN */
event->portid.rscn_page_flags = 0;
event->portid.rscn_fid[0] = (nportid & 0x00FF0000) >> 16;
event->portid.rscn_fid[1] = (nportid & 0x0000FF00) >> 8;
event->portid.rscn_fid[2] = nportid & 0x000000FF;
lpfc_debugfs_disc_trc(vport, LPFC_DISC_TRC_ELS_CMD,
"Issue RSCN: did:x%x",
ndlp->nlp_DID, 0, 0);
phba->fc_stat.elsXmitRSCN++;
elsiocb->iocb_cmpl = lpfc_cmpl_els_cmd;
if (lpfc_sli_issue_iocb(phba, LPFC_ELS_RING, elsiocb, 0) ==
IOCB_ERROR) {
/* The additional lpfc_nlp_put will cause the following
* lpfc_els_free_iocb routine to trigger the release of
* the node.
*/
lpfc_nlp_put(ndlp);
lpfc_els_free_iocb(phba, elsiocb);
return 1;
}
/* This will cause the callback-function lpfc_cmpl_els_cmd to
* trigger the release of node.
*/
if (!(vport->fc_flag & FC_PT2PT))
lpfc_nlp_put(ndlp);
return 0;
}
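The payload built above is an RSCN with a single page carrying the local 24-bit N_Port ID split across three bytes, high byte first. A standalone sketch of just that packing step, mirroring the three assignments above (no lpfc types involved):

#include <stdint.h>
#include <stdio.h>

/* Pack a 24-bit FC N_Port ID into the three fid bytes of an RSCN page,
 * high byte first, as the lpfc code above does. */
static void pack_rscn_fid(uint32_t nportid, uint8_t fid[3])
{
	fid[0] = (nportid & 0x00FF0000) >> 16;
	fid[1] = (nportid & 0x0000FF00) >> 8;
	fid[2] = nportid & 0x000000FF;
}

int main(void)
{
	uint8_t fid[3];

	pack_rscn_fid(0x010203, fid);	/* example N_Port ID */
	printf("%02x %02x %02x\n", fid[0], fid[1], fid[2]);
	return 0;
}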
/** /**
* lpfc_issue_els_farpr - Issue a farp to a node on a vport
* @vport: pointer to a host virtual N_Port data structure. * @vport: pointer to a host virtual N_Port data structure.
@ -6214,6 +6326,8 @@ lpfc_rscn_recovery_check(struct lpfc_vport *vport)
continue; continue;
} }
if (ndlp->nlp_fc4_type & NLP_FC4_NVME)
lpfc_nvme_rescan_port(vport, ndlp);
lpfc_disc_state_machine(vport, ndlp, NULL, lpfc_disc_state_machine(vport, ndlp, NULL,
NLP_EVT_DEVICE_RECOVERY); NLP_EVT_DEVICE_RECOVERY);
@ -6318,6 +6432,19 @@ lpfc_els_rcv_rscn(struct lpfc_vport *vport, struct lpfc_iocbq *cmdiocb,
fc_host_post_event(shost, fc_get_event_number(), fc_host_post_event(shost, fc_get_event_number(),
FCH_EVT_RSCN, lp[i]); FCH_EVT_RSCN, lp[i]);
/* Check if RSCN is coming from a direct-connected remote NPort */
if (vport->fc_flag & FC_PT2PT) {
/* If so, just ACC it, no other action needed for now */
lpfc_printf_vlog(vport, KERN_INFO, LOG_ELS,
"2024 pt2pt RSCN %08x Data: x%x x%x\n",
*lp, vport->fc_flag, payload_len);
lpfc_els_rsp_acc(vport, ELS_CMD_ACC, cmdiocb, ndlp, NULL);
if (ndlp->nlp_fc4_type & NLP_FC4_NVME)
lpfc_nvme_rescan_port(vport, ndlp);
return 0;
}
/* If we are about to begin discovery, just ACC the RSCN. /* If we are about to begin discovery, just ACC the RSCN.
* Discovery processing will satisfy it. * Discovery processing will satisfy it.
*/ */

View File

@ -5276,6 +5276,41 @@ lpfc_findnode_did(struct lpfc_vport *vport, uint32_t did)
return ndlp; return ndlp;
} }
struct lpfc_nodelist *
lpfc_findnode_mapped(struct lpfc_vport *vport)
{
struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
struct lpfc_nodelist *ndlp;
uint32_t data1;
unsigned long iflags;
spin_lock_irqsave(shost->host_lock, iflags);
list_for_each_entry(ndlp, &vport->fc_nodes, nlp_listp) {
if (ndlp->nlp_state == NLP_STE_UNMAPPED_NODE ||
ndlp->nlp_state == NLP_STE_MAPPED_NODE) {
data1 = (((uint32_t)ndlp->nlp_state << 24) |
((uint32_t)ndlp->nlp_xri << 16) |
((uint32_t)ndlp->nlp_type << 8) |
((uint32_t)ndlp->nlp_rpi & 0xff));
spin_unlock_irqrestore(shost->host_lock, iflags);
lpfc_printf_vlog(vport, KERN_INFO, LOG_NODE,
"2025 FIND node DID "
"Data: x%p x%x x%x x%x %p\n",
ndlp, ndlp->nlp_DID,
ndlp->nlp_flag, data1,
ndlp->active_rrqs_xri_bitmap);
return ndlp;
}
}
spin_unlock_irqrestore(shost->host_lock, iflags);
/* FIND node did <did> NOT FOUND */
lpfc_printf_vlog(vport, KERN_INFO, LOG_NODE,
"2026 FIND mapped did NOT FOUND.\n");
return NULL;
}
struct lpfc_nodelist * struct lpfc_nodelist *
lpfc_setup_disc_node(struct lpfc_vport *vport, uint32_t did) lpfc_setup_disc_node(struct lpfc_vport *vport, uint32_t did)
{ {

View File

@ -601,6 +601,7 @@ struct fc_vft_header {
#define ELS_CMD_RPL 0x57000000 #define ELS_CMD_RPL 0x57000000
#define ELS_CMD_FAN 0x60000000 #define ELS_CMD_FAN 0x60000000
#define ELS_CMD_RSCN 0x61040000 #define ELS_CMD_RSCN 0x61040000
#define ELS_CMD_RSCN_XMT 0x61040008
#define ELS_CMD_SCR 0x62000000 #define ELS_CMD_SCR 0x62000000
#define ELS_CMD_RNID 0x78000000 #define ELS_CMD_RNID 0x78000000
#define ELS_CMD_LIRR 0x7A000000 #define ELS_CMD_LIRR 0x7A000000
@ -642,6 +643,7 @@ struct fc_vft_header {
#define ELS_CMD_RPL 0x57 #define ELS_CMD_RPL 0x57
#define ELS_CMD_FAN 0x60 #define ELS_CMD_FAN 0x60
#define ELS_CMD_RSCN 0x0461 #define ELS_CMD_RSCN 0x0461
#define ELS_CMD_RSCN_XMT 0x08000461
#define ELS_CMD_SCR 0x62 #define ELS_CMD_SCR 0x62
#define ELS_CMD_RNID 0x78 #define ELS_CMD_RNID 0x78
#define ELS_CMD_LIRR 0x7A #define ELS_CMD_LIRR 0x7A

View File

@ -2402,6 +2402,50 @@ lpfc_nvme_register_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
#endif #endif
} }
/**
* lpfc_nvme_rescan_port - Check to see if we should rescan this remoteport
*
* If the ndlp represents an NVME Target that we are logged into,
* ping the NVME FC Transport layer to initiate a device rescan
* on this remote NPort.
*/
void
lpfc_nvme_rescan_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
{
#if (IS_ENABLED(CONFIG_NVME_FC))
struct lpfc_nvme_rport *rport;
struct nvme_fc_remote_port *remoteport;
rport = ndlp->nrport;
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
"6170 Rescan NPort DID x%06x type x%x "
"state x%x rport %p\n",
ndlp->nlp_DID, ndlp->nlp_type, ndlp->nlp_state, rport);
if (!rport)
goto input_err;
remoteport = rport->remoteport;
if (!remoteport)
goto input_err;
/* Only rescan if we are an NVME target in the MAPPED state */
if (remoteport->port_role & FC_PORT_ROLE_NVME_DISCOVERY &&
ndlp->nlp_state == NLP_STE_MAPPED_NODE) {
nvme_fc_rescan_remoteport(remoteport);
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
"6172 NVME rescanned DID x%06x "
"port_state x%x\n",
ndlp->nlp_DID, remoteport->port_state);
}
return;
input_err:
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
"6169 State error: lport %p, rport%p FCID x%06x\n",
vport->localport, ndlp->rport, ndlp->nlp_DID);
#endif
}
/* lpfc_nvme_unregister_port - unbind the DID and port_role from this rport. /* lpfc_nvme_unregister_port - unbind the DID and port_role from this rport.
* *
* There is no notion of Devloss or rport recovery from the current * There is no notion of Devloss or rport recovery from the current

View File

@ -1139,6 +1139,22 @@ lpfc_nvmet_defer_rcv(struct nvmet_fc_target_port *tgtport,
spin_unlock_irqrestore(&ctxp->ctxlock, iflag); spin_unlock_irqrestore(&ctxp->ctxlock, iflag);
} }
static void
lpfc_nvmet_discovery_event(struct nvmet_fc_target_port *tgtport)
{
struct lpfc_nvmet_tgtport *tgtp;
struct lpfc_hba *phba;
uint32_t rc;
tgtp = tgtport->private;
phba = tgtp->phba;
rc = lpfc_issue_els_rscn(phba->pport, 0);
lpfc_printf_log(phba, KERN_ERR, LOG_NVME,
"6420 NVMET subsystem change: Notification %s\n",
(rc) ? "Failed" : "Sent");
}
static struct nvmet_fc_target_template lpfc_tgttemplate = { static struct nvmet_fc_target_template lpfc_tgttemplate = {
.targetport_delete = lpfc_nvmet_targetport_delete, .targetport_delete = lpfc_nvmet_targetport_delete,
.xmt_ls_rsp = lpfc_nvmet_xmt_ls_rsp, .xmt_ls_rsp = lpfc_nvmet_xmt_ls_rsp,
@ -1146,6 +1162,7 @@ static struct nvmet_fc_target_template lpfc_tgttemplate = {
.fcp_abort = lpfc_nvmet_xmt_fcp_abort, .fcp_abort = lpfc_nvmet_xmt_fcp_abort,
.fcp_req_release = lpfc_nvmet_xmt_fcp_release, .fcp_req_release = lpfc_nvmet_xmt_fcp_release,
.defer_rcv = lpfc_nvmet_defer_rcv, .defer_rcv = lpfc_nvmet_defer_rcv,
.discovery_event = lpfc_nvmet_discovery_event,
.max_hw_queues = 1, .max_hw_queues = 1,
.max_sgl_segments = LPFC_NVMET_DEFAULT_SEGS, .max_sgl_segments = LPFC_NVMET_DEFAULT_SEGS,

View File

@ -9398,6 +9398,7 @@ lpfc_sli4_iocb2wqe(struct lpfc_hba *phba, struct lpfc_iocbq *iocbq,
if (if_type >= LPFC_SLI_INTF_IF_TYPE_2) { if (if_type >= LPFC_SLI_INTF_IF_TYPE_2) {
if (pcmd && (*pcmd == ELS_CMD_FLOGI || if (pcmd && (*pcmd == ELS_CMD_FLOGI ||
*pcmd == ELS_CMD_SCR || *pcmd == ELS_CMD_SCR ||
*pcmd == ELS_CMD_RSCN_XMT ||
*pcmd == ELS_CMD_FDISC || *pcmd == ELS_CMD_FDISC ||
*pcmd == ELS_CMD_LOGO || *pcmd == ELS_CMD_LOGO ||
*pcmd == ELS_CMD_PLOGI)) { *pcmd == ELS_CMD_PLOGI)) {

View File

@ -203,13 +203,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
{ {
struct file *file = iocb->ki_filp; struct file *file = iocb->ki_filp;
struct block_device *bdev = I_BDEV(bdev_file_inode(file)); struct block_device *bdev = I_BDEV(bdev_file_inode(file));
struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs, *bvec;
struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs;
loff_t pos = iocb->ki_pos; loff_t pos = iocb->ki_pos;
bool should_dirty = false; bool should_dirty = false;
struct bio bio; struct bio bio;
ssize_t ret; ssize_t ret;
blk_qc_t qc; blk_qc_t qc;
struct bvec_iter_all iter_all;
if ((pos | iov_iter_alignment(iter)) & if ((pos | iov_iter_alignment(iter)) &
(bdev_logical_block_size(bdev) - 1)) (bdev_logical_block_size(bdev) - 1))
@ -259,13 +258,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
} }
__set_current_state(TASK_RUNNING); __set_current_state(TASK_RUNNING);
bio_for_each_segment_all(bvec, &bio, iter_all) {
if (should_dirty && !PageCompound(bvec->bv_page))
set_page_dirty_lock(bvec->bv_page);
if (!bio_flagged(&bio, BIO_NO_PAGE_REF))
put_page(bvec->bv_page);
}
bio_release_pages(&bio, should_dirty);
if (unlikely(bio.bi_status)) if (unlikely(bio.bi_status))
ret = blk_status_to_errno(bio.bi_status); ret = blk_status_to_errno(bio.bi_status);
@ -335,13 +328,7 @@ static void blkdev_bio_end_io(struct bio *bio)
if (should_dirty) { if (should_dirty) {
bio_check_pages_dirty(bio); bio_check_pages_dirty(bio);
} else { } else {
if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
struct bvec_iter_all iter_all;
struct bio_vec *bvec;
bio_for_each_segment_all(bvec, bio, iter_all)
put_page(bvec->bv_page);
}
bio_release_pages(bio, false);
bio_put(bio); bio_put(bio);
} }
} }
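Several hunks in this series (block_dev, direct-io, iomap, bio_map_user_iov) replace the same open-coded cleanup loop with bio_release_pages(bio, mark_dirty). Based on the loops being removed here, a sketch of roughly what such a helper has to do is shown below; it illustrates the semantics and is not the actual block-layer implementation.

#include <linux/bio.h>
#include <linux/mm.h>

/* Sketch only: drop the page references a bio took at submit time,
 * optionally dirtying the pages first (reads into user memory). */
static void sketch_bio_release_pages(struct bio *bio, bool mark_dirty)
{
	struct bvec_iter_all iter_all;
	struct bio_vec *bvec;

	/* Bios built from ITER_BVEC never took page references. */
	if (bio_flagged(bio, BIO_NO_PAGE_REF))
		return;

	bio_for_each_segment_all(bvec, bio, iter_all) {
		if (mark_dirty && !PageCompound(bvec->bv_page))
			set_page_dirty_lock(bvec->bv_page);
		put_page(bvec->bv_page);
	}
}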

View File

@ -538,8 +538,8 @@ static struct bio *dio_await_one(struct dio *dio)
*/ */
static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio) static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
{ {
struct bio_vec *bvec;
blk_status_t err = bio->bi_status; blk_status_t err = bio->bi_status;
bool should_dirty = dio->op == REQ_OP_READ && dio->should_dirty;
if (err) { if (err) {
if (err == BLK_STS_AGAIN && (bio->bi_opf & REQ_NOWAIT)) if (err == BLK_STS_AGAIN && (bio->bi_opf & REQ_NOWAIT))
@ -548,19 +548,10 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
dio->io_error = -EIO; dio->io_error = -EIO;
} }
if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
if (dio->is_async && should_dirty) {
bio_check_pages_dirty(bio); /* transfers ownership */ bio_check_pages_dirty(bio); /* transfers ownership */
} else { } else {
struct bvec_iter_all iter_all;
bio_for_each_segment_all(bvec, bio, iter_all) {
struct page *page = bvec->bv_page;
if (dio->op == REQ_OP_READ && !PageCompound(page) &&
dio->should_dirty)
set_page_dirty_lock(page);
put_page(page);
}
bio_release_pages(bio, should_dirty);
bio_put(bio); bio_put(bio);
} }
return err; return err;

View File

@ -715,6 +715,7 @@ void wbc_detach_inode(struct writeback_control *wbc)
void wbc_account_io(struct writeback_control *wbc, struct page *page, void wbc_account_io(struct writeback_control *wbc, struct page *page,
size_t bytes) size_t bytes)
{ {
struct cgroup_subsys_state *css;
int id; int id;
/* /*
@ -726,7 +727,12 @@ void wbc_account_io(struct writeback_control *wbc, struct page *page,
if (!wbc->wb) if (!wbc->wb)
return; return;
id = mem_cgroup_css_from_page(page)->id;
css = mem_cgroup_css_from_page(page);
/* dead cgroups shouldn't contribute to inode ownership arbitration */
if (!(css->flags & CSS_ONLINE))
return;
id = css->id;
if (id == wbc->wb_id) { if (id == wbc->wb_id) {
wbc->wb_bytes += bytes; wbc->wb_bytes += bytes;

View File

@ -998,9 +998,6 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len); iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
if (offset) if (offset)
iov_iter_advance(iter, offset); iov_iter_advance(iter, offset);
/* don't drop a reference to these pages */
iter->type |= ITER_BVEC_FLAG_NO_REF;
return 0; return 0;
} }

View File

@ -333,7 +333,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
if (iop) if (iop)
atomic_inc(&iop->read_count); atomic_inc(&iop->read_count);
if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL); gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT; int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
@ -1599,13 +1599,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
if (should_dirty) { if (should_dirty) {
bio_check_pages_dirty(bio); bio_check_pages_dirty(bio);
} else { } else {
if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
struct bvec_iter_all iter_all;
struct bio_vec *bvec;
bio_for_each_segment_all(bvec, bio, iter_all)
put_page(bvec->bv_page);
}
bio_release_pages(bio, false);
bio_put(bio); bio_put(bio);
} }
} }

View File

@ -782,7 +782,7 @@ xfs_add_to_ioend(
atomic_inc(&iop->write_count); atomic_inc(&iop->write_count);
if (!merged) { if (!merged) {
if (bio_full(wpc->ioend->io_bio))
if (bio_full(wpc->ioend->io_bio, len))
xfs_chain_bio(wpc->ioend, wbc, bdev, sector); xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
bio_add_page(wpc->ioend->io_bio, page, len, poff); bio_add_page(wpc->ioend->io_bio, page, len, poff);
} }

View File

@ -102,9 +102,23 @@ static inline void *bio_data(struct bio *bio)
return NULL; return NULL;
} }
static inline bool bio_full(struct bio *bio)
/**
* bio_full - check if the bio is full
* @bio: bio to check
* @len: length of one segment to be added
*
* Return true if @bio is full and one segment with @len bytes can't be
* added to the bio, otherwise return false
*/
static inline bool bio_full(struct bio *bio, unsigned len)
{ {
return bio->bi_vcnt >= bio->bi_max_vecs;
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
if (bio->bi_iter.bi_size > UINT_MAX - len)
return true;
return false;
} }
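The second test added to bio_full() above is the .bi_size overflow fix from this pull: bi_size is a 32-bit byte count, so a segment is refused once adding len would wrap past UINT_MAX. A standalone illustration of that guard:

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Refuse to grow a 32-bit byte count if adding len would overflow,
 * mirroring the bi_size check added to bio_full() above. */
static bool would_overflow(unsigned int size, unsigned int len)
{
	return size > UINT_MAX - len;
}

int main(void)
{
	printf("%d\n", would_overflow(UINT_MAX - 512, 4096));	/* 1: refuse */
	printf("%d\n", would_overflow(1U << 20, 4096));		/* 0: fine */
	return 0;
}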
static inline bool bio_next_segment(const struct bio *bio, static inline bool bio_next_segment(const struct bio *bio,
@ -408,7 +422,6 @@ static inline void bio_wouldblock_error(struct bio *bio)
} }
struct request_queue; struct request_queue;
extern int bio_phys_segments(struct request_queue *, struct bio *);
extern int submit_bio_wait(struct bio *bio); extern int submit_bio_wait(struct bio *bio);
extern void bio_advance(struct bio *, unsigned); extern void bio_advance(struct bio *, unsigned);
@ -427,6 +440,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
void __bio_add_page(struct bio *bio, struct page *page, void __bio_add_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int off); unsigned int len, unsigned int off);
int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter); int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
void bio_release_pages(struct bio *bio, bool mark_dirty);
struct rq_map_data; struct rq_map_data;
extern struct bio *bio_map_user_iov(struct request_queue *, extern struct bio *bio_map_user_iov(struct request_queue *,
struct iov_iter *, gfp_t); struct iov_iter *, gfp_t);
@ -444,17 +458,6 @@ void generic_end_io_acct(struct request_queue *q, int op,
struct hd_struct *part, struct hd_struct *part,
unsigned long start_time); unsigned long start_time);
#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
# error "You should define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE for your platform"
#endif
#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
extern void bio_flush_dcache_pages(struct bio *bi);
#else
static inline void bio_flush_dcache_pages(struct bio *bi)
{
}
#endif
extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter, extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
struct bio *src, struct bvec_iter *src_iter); struct bio *src, struct bvec_iter *src_iter);
extern void bio_copy_data(struct bio *dst, struct bio *src); extern void bio_copy_data(struct bio *dst, struct bio *src);

View File

@ -63,19 +63,17 @@ struct blkcg {
/* /*
* blkg_[rw]stat->aux_cnt is excluded for local stats but included for * blkg_[rw]stat->aux_cnt is excluded for local stats but included for
* recursive. Used to carry stats of dead children, and, for blkg_rwstat,
* to carry result values from read and sum operations.
* recursive. Used to carry stats of dead children.
*/ */
struct blkg_stat {
struct percpu_counter cpu_cnt;
atomic64_t aux_cnt;
};
struct blkg_rwstat { struct blkg_rwstat {
struct percpu_counter cpu_cnt[BLKG_RWSTAT_NR]; struct percpu_counter cpu_cnt[BLKG_RWSTAT_NR];
atomic64_t aux_cnt[BLKG_RWSTAT_NR]; atomic64_t aux_cnt[BLKG_RWSTAT_NR];
}; };
struct blkg_rwstat_sample {
u64 cnt[BLKG_RWSTAT_NR];
};
/* /*
* A blkcg_gq (blkg) is association between a block cgroup (blkcg) and a * A blkcg_gq (blkg) is association between a block cgroup (blkcg) and a
* request_queue (q). This is used by blkcg policies which need to track * request_queue (q). This is used by blkcg policies which need to track
@ -198,6 +196,13 @@ int blkcg_activate_policy(struct request_queue *q,
void blkcg_deactivate_policy(struct request_queue *q, void blkcg_deactivate_policy(struct request_queue *q,
const struct blkcg_policy *pol); const struct blkcg_policy *pol);
static inline u64 blkg_rwstat_read_counter(struct blkg_rwstat *rwstat,
unsigned int idx)
{
return atomic64_read(&rwstat->aux_cnt[idx]) +
percpu_counter_sum_positive(&rwstat->cpu_cnt[idx]);
}
const char *blkg_dev_name(struct blkcg_gq *blkg); const char *blkg_dev_name(struct blkcg_gq *blkg);
void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg, void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg,
u64 (*prfill)(struct seq_file *, u64 (*prfill)(struct seq_file *,
@ -206,8 +211,7 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg,
bool show_total); bool show_total);
u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v); u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v);
u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd, u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
const struct blkg_rwstat *rwstat);
const struct blkg_rwstat_sample *rwstat);
u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd, int off);
u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd, u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
int off); int off);
int blkg_print_stat_bytes(struct seq_file *sf, void *v); int blkg_print_stat_bytes(struct seq_file *sf, void *v);
@ -215,10 +219,8 @@ int blkg_print_stat_ios(struct seq_file *sf, void *v);
int blkg_print_stat_bytes_recursive(struct seq_file *sf, void *v); int blkg_print_stat_bytes_recursive(struct seq_file *sf, void *v);
int blkg_print_stat_ios_recursive(struct seq_file *sf, void *v); int blkg_print_stat_ios_recursive(struct seq_file *sf, void *v);
u64 blkg_stat_recursive_sum(struct blkcg_gq *blkg,
struct blkcg_policy *pol, int off);
struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
struct blkcg_policy *pol, int off);
void blkg_rwstat_recursive_sum(struct blkcg_gq *blkg, struct blkcg_policy *pol,
int off, struct blkg_rwstat_sample *sum);
struct blkg_conf_ctx { struct blkg_conf_ctx {
struct gendisk *disk; struct gendisk *disk;
@ -569,69 +571,6 @@ static inline void blkg_put(struct blkcg_gq *blkg)
if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \ if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \
(p_blkg)->q, false))) (p_blkg)->q, false)))
static inline int blkg_stat_init(struct blkg_stat *stat, gfp_t gfp)
{
int ret;
ret = percpu_counter_init(&stat->cpu_cnt, 0, gfp);
if (ret)
return ret;
atomic64_set(&stat->aux_cnt, 0);
return 0;
}
static inline void blkg_stat_exit(struct blkg_stat *stat)
{
percpu_counter_destroy(&stat->cpu_cnt);
}
/**
* blkg_stat_add - add a value to a blkg_stat
* @stat: target blkg_stat
* @val: value to add
*
* Add @val to @stat. The caller must ensure that IRQ on the same CPU
* don't re-enter this function for the same counter.
*/
static inline void blkg_stat_add(struct blkg_stat *stat, uint64_t val)
{
percpu_counter_add_batch(&stat->cpu_cnt, val, BLKG_STAT_CPU_BATCH);
}
/**
* blkg_stat_read - read the current value of a blkg_stat
* @stat: blkg_stat to read
*/
static inline uint64_t blkg_stat_read(struct blkg_stat *stat)
{
return percpu_counter_sum_positive(&stat->cpu_cnt);
}
/**
* blkg_stat_reset - reset a blkg_stat
* @stat: blkg_stat to reset
*/
static inline void blkg_stat_reset(struct blkg_stat *stat)
{
percpu_counter_set(&stat->cpu_cnt, 0);
atomic64_set(&stat->aux_cnt, 0);
}
/**
* blkg_stat_add_aux - add a blkg_stat into another's aux count
* @to: the destination blkg_stat
* @from: the source
*
* Add @from's count including the aux one to @to's aux count.
*/
static inline void blkg_stat_add_aux(struct blkg_stat *to,
struct blkg_stat *from)
{
atomic64_add(blkg_stat_read(from) + atomic64_read(&from->aux_cnt),
&to->aux_cnt);
}
static inline int blkg_rwstat_init(struct blkg_rwstat *rwstat, gfp_t gfp) static inline int blkg_rwstat_init(struct blkg_rwstat *rwstat, gfp_t gfp)
{ {
int i, ret; int i, ret;
@ -693,15 +632,14 @@ static inline void blkg_rwstat_add(struct blkg_rwstat *rwstat,
* *
* Read the current snapshot of @rwstat and return it in the aux counts. * Read the current snapshot of @rwstat and return it in the aux counts.
*/ */
static inline struct blkg_rwstat blkg_rwstat_read(struct blkg_rwstat *rwstat)
static inline void blkg_rwstat_read(struct blkg_rwstat *rwstat,
struct blkg_rwstat_sample *result)
{ {
struct blkg_rwstat result;
int i; int i;
for (i = 0; i < BLKG_RWSTAT_NR; i++) for (i = 0; i < BLKG_RWSTAT_NR; i++)
atomic64_set(&result.aux_cnt[i],
percpu_counter_sum_positive(&rwstat->cpu_cnt[i]));
result->cnt[i] =
percpu_counter_sum_positive(&rwstat->cpu_cnt[i]);
return result;
} }
/** /**
@ -714,10 +652,10 @@ static inline struct blkg_rwstat blkg_rwstat_read(struct blkg_rwstat *rwstat)
*/ */
static inline uint64_t blkg_rwstat_total(struct blkg_rwstat *rwstat) static inline uint64_t blkg_rwstat_total(struct blkg_rwstat *rwstat)
{ {
struct blkg_rwstat tmp = blkg_rwstat_read(rwstat);
return atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_READ]) +
atomic64_read(&tmp.aux_cnt[BLKG_RWSTAT_WRITE]);
struct blkg_rwstat_sample tmp = { };
blkg_rwstat_read(rwstat, &tmp);
return tmp.cnt[BLKG_RWSTAT_READ] + tmp.cnt[BLKG_RWSTAT_WRITE];
} }
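The reworked blkcg helpers above all reduce to the same arithmetic: a read/write counter is the sum of its per-CPU deltas plus an aux count carried over from dead children, clamped so the per-CPU sum never goes negative. A standalone sketch of that reduction, with plain arrays standing in for percpu_counter:

#include <stdint.h>
#include <stdio.h>

#define NR_CPUS   4
#define RWSTAT_NR 2				/* read and write buckets, simplified */

struct rwstat {
	int64_t cpu_cnt[RWSTAT_NR][NR_CPUS];	/* per-CPU deltas */
	int64_t aux_cnt[RWSTAT_NR];		/* carried over from dead children */
};

/* Mirror of blkg_rwstat_read_counter(): aux count plus the clamped
 * sum of the per-CPU counters. */
static uint64_t rwstat_read_counter(const struct rwstat *s, int idx)
{
	int64_t percpu = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		percpu += s->cpu_cnt[idx][cpu];
	if (percpu < 0)		/* percpu_counter_sum_positive() clamps at 0 */
		percpu = 0;
	return (uint64_t)(s->aux_cnt[idx] + percpu);
}

int main(void)
{
	struct rwstat s = { .aux_cnt = { 100, 0 } };

	s.cpu_cnt[0][0] = 10;
	s.cpu_cnt[0][3] = 5;
	printf("reads=%llu\n", (unsigned long long)rwstat_read_counter(&s, 0));
	return 0;
}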
/** /**

View File

@ -306,7 +306,7 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs
bool blk_mq_complete_request(struct request *rq); bool blk_mq_complete_request(struct request *rq);
void blk_mq_complete_request_sync(struct request *rq); void blk_mq_complete_request_sync(struct request *rq);
bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list, bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
struct bio *bio);
struct bio *bio, unsigned int nr_segs);
bool blk_mq_queue_stopped(struct request_queue *q); bool blk_mq_queue_stopped(struct request_queue *q);
void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx); void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx); void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx);

View File

@ -154,11 +154,6 @@ struct bio {
blk_status_t bi_status; blk_status_t bi_status;
u8 bi_partno; u8 bi_partno;
/* Number of segments in this BIO after
* physical address coalescing is performed.
*/
unsigned int bi_phys_segments;
struct bvec_iter bi_iter; struct bvec_iter bi_iter;
atomic_t __bi_remaining; atomic_t __bi_remaining;
@ -210,7 +205,6 @@ struct bio {
*/ */
enum { enum {
BIO_NO_PAGE_REF, /* don't put release vec pages */ BIO_NO_PAGE_REF, /* don't put release vec pages */
BIO_SEG_VALID, /* bi_phys_segments valid */
BIO_CLONED, /* doesn't own data */ BIO_CLONED, /* doesn't own data */
BIO_BOUNCED, /* bio is a bounce bio */ BIO_BOUNCED, /* bio is a bounce bio */
BIO_USER_MAPPED, /* contains user pages */ BIO_USER_MAPPED, /* contains user pages */

View File

@@ -137,11 +137,11 @@ struct request {
 	unsigned int cmd_flags;		/* op and common flags */
 	req_flags_t rq_flags;
+	int tag;
 	int internal_tag;
 	/* the following two fields are internal, NEVER access directly */
 	unsigned int __data_len;	/* total data len */
-	int tag;
 	sector_t __sector;		/* sector cursor */
 	struct bio *bio;
@@ -828,7 +828,6 @@ extern void blk_unregister_queue(struct gendisk *disk);
 extern blk_qc_t generic_make_request(struct bio *bio);
 extern blk_qc_t direct_make_request(struct bio *bio);
 extern void blk_rq_init(struct request_queue *q, struct request *rq);
-extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
 extern void blk_put_request(struct request *);
 extern struct request *blk_get_request(struct request_queue *, unsigned int op,
 				       blk_mq_req_flags_t flags);
@@ -842,7 +841,6 @@ extern blk_status_t blk_insert_cloned_request(struct request_queue *q,
 				     struct request *rq);
 extern int blk_rq_append_bio(struct request *rq, struct bio **bio);
 extern void blk_queue_split(struct request_queue *, struct bio **);
-extern void blk_recount_segments(struct request_queue *, struct bio *);
 extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int);
 extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t,
 			      unsigned int, void __user *);
@@ -867,6 +865,9 @@ extern void blk_execute_rq(struct request_queue *, struct gendisk *,
 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
 				  struct request *, int, rq_end_io_fn *);
+/* Helper to convert REQ_OP_XXX to its string format XXX */
+extern const char *blk_op_str(unsigned int op);
 int blk_status_to_errno(blk_status_t status);
 blk_status_t errno_to_blk_status(int errno);
@@ -1026,21 +1027,9 @@ void blk_steal_bios(struct bio_list *list, struct request *rq);
  *
  * blk_update_request() completes given number of bytes and updates
  * the request without completing it.
- *
- * blk_end_request() and friends.  __blk_end_request() must be called
- * with the request queue spinlock acquired.
- *
- * Several drivers define their own end_request and call
- * blk_end_request() for parts of the original function.
- * This prevents code duplication in drivers.
  */
 extern bool blk_update_request(struct request *rq, blk_status_t error,
 			       unsigned int nr_bytes);
-extern void blk_end_request_all(struct request *rq, blk_status_t error);
-extern bool __blk_end_request(struct request *rq, blk_status_t error,
-			      unsigned int nr_bytes);
-extern void __blk_end_request_all(struct request *rq, blk_status_t error);
-extern bool __blk_end_request_cur(struct request *rq, blk_status_t error);
 extern void __blk_complete_request(struct request *);
 extern void blk_abort_request(struct request *);
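
The new blk_op_str() helper gives drivers and tracing code one place to turn a REQ_OP_* value into its name. A minimal usage sketch, assuming a driver-side error path (my_log_failed_rq and the message text are illustrative):

	#include <linux/blkdev.h>

	/* Sketch: report a failed request using the op name from blk_op_str(). */
	static void my_log_failed_rq(struct request *rq, blk_status_t status)
	{
		pr_warn("%s request at sector %llu failed: %d\n",
			blk_op_str(req_op(rq)),
			(unsigned long long)blk_rq_pos(rq),
			blk_status_to_errno(status));
	}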


@@ -34,7 +34,7 @@ struct elevator_mq_ops {
 	void (*depth_updated)(struct blk_mq_hw_ctx *);
 	bool (*allow_merge)(struct request_queue *, struct request *, struct bio *);
-	bool (*bio_merge)(struct blk_mq_hw_ctx *, struct bio *);
+	bool (*bio_merge)(struct blk_mq_hw_ctx *, struct bio *, unsigned int);
 	int (*request_merge)(struct request_queue *q, struct request **, struct bio *);
 	void (*request_merged)(struct request_queue *, struct request *, enum elv_merge);
 	void (*requests_merged)(struct request_queue *, struct request *, struct request *);
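
The extra unsigned int is the segment count the caller has already computed for the bio, now that bi_phys_segments is gone. A hedged sketch of a scheduler callback with the new prototype, modelled on what mq-deadline does in this release (my_sched_data and its lock are hypothetical; blk_mq_sched_try_merge() lives in the private block/blk-mq-sched.h header, so this builds only inside block/):

	#include <linux/blk-mq.h>
	#include <linux/elevator.h>
	#include <linux/spinlock.h>

	struct my_sched_data {			/* hypothetical per-queue data */
		spinlock_t lock;
	};

	/* Sketch: elevator_mq_ops.bio_merge with the added nr_segs argument. */
	static bool my_sched_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio,
				       unsigned int nr_segs)
	{
		struct request_queue *q = hctx->queue;
		struct my_sched_data *sd = q->elevator->elevator_data;
		struct request *free = NULL;
		bool merged;

		spin_lock(&sd->lock);
		merged = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
		spin_unlock(&sd->lock);

		if (free)
			blk_mq_free_request(free);
		return merged;
	}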


@@ -791,6 +791,11 @@ struct nvmet_fc_target_port {
  *       nvmefc_tgt_fcp_req.
  *       Entrypoint is Optional.
  *
+ * @discovery_event:  Called by the transport to generate an RSCN
+ *       change notifications to NVME initiators. The RSCN notifications
+ *       should cause the initiator to rescan the discovery controller
+ *       on the targetport.
+ *
  * @max_hw_queues:  indicates the maximum number of hw queues the LLDD
  *       supports for cpu affinitization.
  *       Value is Mandatory. Must be at least 1.
@@ -832,6 +837,7 @@ struct nvmet_fc_target_template {
 			struct nvmefc_tgt_fcp_req *fcpreq);
 	void (*defer_rcv)(struct nvmet_fc_target_port *tgtport,
 			struct nvmefc_tgt_fcp_req *fcpreq);
+	void (*discovery_event)(struct nvmet_fc_target_port *tgtport);
 	u32 max_hw_queues;
 	u16 max_sgl_segments;
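
A hedged sketch of how an FC target LLDD might wire up the new entrypoint; my_lldd_* names and the RSCN work item are hypothetical, the other (mandatory) template fields are elided, and the use of the target port's private pointer is an assumption about the LLDD's own bookkeeping:

	#include <linux/nvme-fc-driver.h>
	#include <linux/workqueue.h>

	struct my_lldd_port {			/* hypothetical LLDD port state */
		struct work_struct rscn_work;
	};

	/* Sketch: emit an RSCN so initiators rescan the discovery controller. */
	static void my_lldd_discovery_event(struct nvmet_fc_target_port *tgtport)
	{
		struct my_lldd_port *lport = tgtport->private;

		schedule_work(&lport->rscn_work);
	}

	static struct nvmet_fc_target_template my_lldd_tgt_template = {
		/* .xmt_ls_rsp, .fcp_op, .defer_rcv, ... elided ... */
		.discovery_event	= my_lldd_discovery_event,
		.max_hw_queues		= 4,
		.max_sgl_segments	= 256,
	};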


@@ -562,6 +562,22 @@ enum nvme_opcode {
 	nvme_cmd_resv_release	= 0x15,
 };
+#define nvme_opcode_name(opcode)	{ opcode, #opcode }
+#define show_nvm_opcode_name(val)				\
+	__print_symbolic(val,					\
+		nvme_opcode_name(nvme_cmd_flush),		\
+		nvme_opcode_name(nvme_cmd_write),		\
+		nvme_opcode_name(nvme_cmd_read),		\
+		nvme_opcode_name(nvme_cmd_write_uncor),		\
+		nvme_opcode_name(nvme_cmd_compare),		\
+		nvme_opcode_name(nvme_cmd_write_zeroes),	\
+		nvme_opcode_name(nvme_cmd_dsm),			\
+		nvme_opcode_name(nvme_cmd_resv_register),	\
+		nvme_opcode_name(nvme_cmd_resv_report),		\
+		nvme_opcode_name(nvme_cmd_resv_acquire),	\
+		nvme_opcode_name(nvme_cmd_resv_release))
 /*
  * Descriptor subtype - lower 4 bits of nvme_(keyed_)sgl_desc identifier
  *
@@ -794,6 +810,32 @@ enum nvme_admin_opcode {
 	nvme_admin_sanitize_nvm		= 0x84,
 };
+#define nvme_admin_opcode_name(opcode)	{ opcode, #opcode }
+#define show_admin_opcode_name(val)					\
+	__print_symbolic(val,						\
+		nvme_admin_opcode_name(nvme_admin_delete_sq),		\
+		nvme_admin_opcode_name(nvme_admin_create_sq),		\
+		nvme_admin_opcode_name(nvme_admin_get_log_page),	\
+		nvme_admin_opcode_name(nvme_admin_delete_cq),		\
+		nvme_admin_opcode_name(nvme_admin_create_cq),		\
+		nvme_admin_opcode_name(nvme_admin_identify),		\
+		nvme_admin_opcode_name(nvme_admin_abort_cmd),		\
+		nvme_admin_opcode_name(nvme_admin_set_features),	\
+		nvme_admin_opcode_name(nvme_admin_get_features),	\
+		nvme_admin_opcode_name(nvme_admin_async_event),		\
+		nvme_admin_opcode_name(nvme_admin_ns_mgmt),		\
+		nvme_admin_opcode_name(nvme_admin_activate_fw),		\
+		nvme_admin_opcode_name(nvme_admin_download_fw),		\
+		nvme_admin_opcode_name(nvme_admin_ns_attach),		\
+		nvme_admin_opcode_name(nvme_admin_keep_alive),		\
+		nvme_admin_opcode_name(nvme_admin_directive_send),	\
+		nvme_admin_opcode_name(nvme_admin_directive_recv),	\
+		nvme_admin_opcode_name(nvme_admin_dbbuf),		\
+		nvme_admin_opcode_name(nvme_admin_format_nvm),		\
+		nvme_admin_opcode_name(nvme_admin_security_send),	\
+		nvme_admin_opcode_name(nvme_admin_security_recv),	\
+		nvme_admin_opcode_name(nvme_admin_sanitize_nvm))
 enum {
 	NVME_QUEUE_PHYS_CONTIG	= (1 << 0),
 	NVME_CQ_IRQ_ENABLED	= (1 << 1),
@@ -1008,6 +1050,23 @@ enum nvmf_capsule_command {
 	nvme_fabrics_type_property_get	= 0x04,
 };
+#define nvme_fabrics_type_name(type)	{ type, #type }
+#define show_fabrics_type_name(type)					\
+	__print_symbolic(type,						\
+		nvme_fabrics_type_name(nvme_fabrics_type_property_set),	\
+		nvme_fabrics_type_name(nvme_fabrics_type_connect),	\
+		nvme_fabrics_type_name(nvme_fabrics_type_property_get))
+/*
+ * If not fabrics command, fctype will be ignored.
+ */
+#define show_opcode_name(qid, opcode, fctype)			\
+	((opcode) == nvme_fabrics_command ?			\
+	 show_fabrics_type_name(fctype) :			\
+	((qid) ?						\
+	 show_nvm_opcode_name(opcode) :				\
+	 show_admin_opcode_name(opcode)))
 struct nvmf_common_command {
 	__u8	opcode;
 	__u8	resv1;
@@ -1165,6 +1224,11 @@ struct nvme_command {
 	};
 };
+static inline bool nvme_is_fabrics(struct nvme_command *cmd)
+{
+	return cmd->common.opcode == nvme_fabrics_command;
+}
 struct nvme_error_slot {
 	__le64	error_count;
 	__le16	sqid;
@@ -1186,7 +1250,7 @@ static inline bool nvme_is_write(struct nvme_command *cmd)
 	 *
 	 * Why can't we simply have a Fabrics In and Fabrics out command?
 	 */
-	if (unlikely(cmd->common.opcode == nvme_fabrics_command))
+	if (unlikely(nvme_is_fabrics(cmd)))
 		return cmd->fabrics.fctype & 1;
 	return cmd->common.opcode & 1;
 }
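
These additions back the new NVMe tracing support: the show_*_name() macros are meant for __print_symbolic() in trace events, and nvme_is_fabrics() replaces open-coded opcode comparisons such as the one in nvme_is_write() above. A small hedged sketch of the same dispatch logic in ordinary driver code (my_account_cmd and the messages are illustrative):

	#include <linux/nvme.h>
	#include <linux/printk.h>

	/* Sketch: classify a submitted command the same way show_opcode_name()
	 * does, using nvme_is_fabrics() instead of comparing the opcode by hand. */
	static void my_account_cmd(struct nvme_command *cmd, u16 qid)
	{
		if (nvme_is_fabrics(cmd))
			pr_debug("fabrics cmd, fctype=0x%x\n", cmd->fabrics.fctype);
		else if (qid)
			pr_debug("I/O cmd, opcode=0x%x\n", cmd->common.opcode);
		else
			pr_debug("admin cmd, opcode=0x%x\n", cmd->common.opcode);
	}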


@@ -39,6 +39,9 @@ static inline bool is_sed_ioctl(unsigned int cmd)
 	case IOC_OPAL_ENABLE_DISABLE_MBR:
 	case IOC_OPAL_ERASE_LR:
 	case IOC_OPAL_SECURE_ERASE_LR:
+	case IOC_OPAL_PSID_REVERT_TPR:
+	case IOC_OPAL_MBR_DONE:
+	case IOC_OPAL_WRITE_SHADOW_MBR:
 		return true;
 	}
 	return false;
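
is_sed_ioctl() is the filter a block driver applies before handing a command to the sed-opal core, so listing the new PSID-revert and shadow-MBR commands here is what makes them reachable through the existing driver hooks. A sketch of the usual dispatch pattern (my_dev and my_blkdev_ioctl are hypothetical; sd and nvme do the equivalent):

	#include <linux/blkdev.h>
	#include <linux/sed-opal.h>

	struct my_dev {				/* hypothetical driver data */
		struct opal_dev *opal_dev;
	};

	/* Sketch: forward all Opal ioctls, including the new shadow-MBR ones,
	 * to the sed-opal core; everything else stays driver-specific. */
	static int my_blkdev_ioctl(struct block_device *bdev, fmode_t mode,
				   unsigned int cmd, unsigned long arg)
	{
		struct my_dev *dev = bdev->bd_disk->private_data;
		void __user *argp = (void __user *)arg;

		if (is_sed_ioctl(cmd))
			return sed_ioctl(dev->opal_dev, cmd, argp);

		/* ... handle driver-specific ioctls ... */
		return -ENOTTY;
	}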


@@ -19,9 +19,6 @@ struct kvec {
 };
 enum iter_type {
-	/* set if ITER_BVEC doesn't hold a bv_page ref */
-	ITER_BVEC_FLAG_NO_REF = 2,
 	/* iter types */
 	ITER_IOVEC = 4,
 	ITER_KVEC = 8,
@@ -56,7 +53,7 @@ struct iov_iter {
 static inline enum iter_type iov_iter_type(const struct iov_iter *i)
 {
-	return i->type & ~(READ | WRITE | ITER_BVEC_FLAG_NO_REF);
+	return i->type & ~(READ | WRITE);
 }
 static inline bool iter_is_iovec(const struct iov_iter *i)
@@ -89,11 +86,6 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 	return i->type & (READ | WRITE);
 }
-static inline bool iov_iter_bvec_no_ref(const struct iov_iter *i)
-{
-	return (i->type & ITER_BVEC_FLAG_NO_REF) != 0;
-}
 /*
  * Total number of bytes covered by an iovec.
  *
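
With ITER_BVEC_FLAG_NO_REF removed, callers can no longer ask the iterator whether page references were taken; the block layer instead never takes references for ITER_BVEC and records that per bio via BIO_NO_PAGE_REF, which bio_release_pages() honours at completion. A hedged sketch of the completion-side pattern, assuming the two-argument bio_release_pages(bio, mark_dirty) introduced in this cycle (my_dio_bio_end_io and the dirtying policy are illustrative):

	#include <linux/bio.h>

	/* Sketch: direct-I/O style completion.  bio_iov_iter_get_pages() marks
	 * bvec-backed bios BIO_NO_PAGE_REF, and bio_release_pages() checks that
	 * flag, so nothing here needs the removed iov_iter_bvec_no_ref(). */
	static void my_dio_bio_end_io(struct bio *bio)
	{
		bool should_dirty = bio_data_dir(bio) == READ;	/* illustrative policy */

		bio_release_pages(bio, should_dirty);
		bio_put(bio);
	}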


@@ -76,16 +76,7 @@ TRACE_DEFINE_ENUM(CP_TRIMMED);
 #define show_bio_type(op,op_flags)	show_bio_op(op),		\
 					show_bio_op_flags(op_flags)
-#define show_bio_op(op)							\
-	__print_symbolic(op,						\
-		{ REQ_OP_READ,			"READ" },		\
-		{ REQ_OP_WRITE,			"WRITE" },		\
-		{ REQ_OP_FLUSH,			"FLUSH" },		\
-		{ REQ_OP_DISCARD,		"DISCARD" },		\
-		{ REQ_OP_SECURE_ERASE,		"SECURE_ERASE" },	\
-		{ REQ_OP_ZONE_RESET,		"ZONE_RESET" },		\
-		{ REQ_OP_WRITE_SAME,		"WRITE_SAME" },		\
-		{ REQ_OP_WRITE_ZEROES,		"WRITE_ZEROES" })
+#define show_bio_op(op)		blk_op_str(op)
 #define show_bio_op_flags(flags) \
 	__print_flags(F2FS_BIO_FLAG_MASK(flags), "|", \

Some files were not shown because too many files have changed in this diff.