alistair23-linux

redonkable

Author	SHA1	Message	Date
Martin K. Petersen	86b3728141	block: Expose discard granularity While SSDs track block usage on a per-sector basis, RAID arrays often have allocation blocks that are bigger. Allow the discard granularity and alignment to be set and teach the topology stacking logic how to handle them. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-11-10 11:50:21 +01:00
Jens Axboe	b9d128f108	block: move bdi/address_space unplug functions to backing-dev.h There's nothing block related about them, the backing device is used by things like NFS etc as well. This gets rid of the need to protect such calls by CONFIG_BLOCK. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-29 13:59:26 +01:00
Jens Axboe	23e018a1b0	block: get rid of kblock_schedule_delayed_work() It was briefly introduced to allow CFQ to to delayed scheduling, but we ended up removing that feature again. So lets kill the function and export, and just switch CFQ back to the normal work schedule since it is now passing in a '0' delay from all call sites. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-05 11:03:58 +02:00
Martin K. Petersen	ac481c20ef	block: Topology ioctls Not all users of the topology information want to use libblkid. Provide the topology information through bdev ioctls. Also clarify sector size comments for existing BLK ioctls. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-03 20:52:01 +02:00
Jens Axboe	8e29675555	cfq-iosched: implement slower async initiate and queue ramp up This slowly ramps up the async queue depth based on the time passed since the sync IO, and doesn't allow async at all until a sync slice period has passed. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-03 16:27:13 +02:00
Christoph Hellwig	67efc92580	block: allow large discard requests Currently we set the bio size to the byte equivalent of the blocks to be trimmed when submitting the initial DISCARD ioctl. That means it is subject to the max_hw_sectors limitation of the HBA which is much lower than the size of a DISCARD request we can support. Add a separate max_discard_sectors tunable to limit the size for discard requests. We limit the max discard request size in bytes to 32bit as that is the limit for bio->bi_size. This could be much larger if we had a way to pass that information through the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-01 21:19:34 +02:00
Christoph Hellwig	c15227de13	block: use normal I/O path for discard requests prepare_discard_fn() was being called in a place where memory allocation was effectively impossible. This makes it inappropriate for all but the most trivial translations of Linux's DISCARD operation to the block command set. Additionally adding a payload there makes the ownership of the bio backing unclear as it's now allocated by the device driver and not the submitter as usual. It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether the queue supports discard operations or not. blkdev_issue_discard now allocates a one-page, sector-length payload which is the right thing for the common ATA and SCSI implementations. The mtd implementation of prepare_discard_fn() is replaced with simply checking for the request being a discard. Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx> which did the prepare_discard_fn but not the different payload allocation yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-01 21:19:30 +02:00
Christoph Hellwig	746cd1e7e4	block: use blkdev_issue_discard in blk_ioctl_discard blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard, the only difference between the two is that blkdev_issue_discard needs to send a barrier discard request and blk_ioctl_discard a non-barrier one, and blk_ioctl_discard needs to wait on the request. To facilitates this add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting funcitonality for other callers. Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-14 08:24:53 +02:00
Martin K. Petersen	3c5820c743	block: Optimal I/O limit wrapper Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper around it. DM needs this to avoid poking at the queue_limits directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-14 08:24:52 +02:00
Jens Axboe	01e97f6b89	block: enable rq CPU completion affinity by default Test results here look good, and on big OLTP runs it has also shown to significantly increase cycles attributed to the database and cause a performance boost. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:34:33 +02:00
Jens Axboe	fb1e75389b	block: improve queue_should_plug() by looking at IO depths Instead of just checking whether this device uses block layer tagging, we can improve the detection by looking at the maximum queue depth it has reached. If that crosses 4, then deem it a queuing device. This is important on high IOPS devices, since plugging hurts the performance there (it can be as much as 10-15% of the sys time). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:31 +02:00
Jens Axboe	1f98a13f62	bio: first step in sanitizing the bio->bi_rw flag testing Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:31 +02:00
Tejun Heo	80a761fd33	block: implement mixed merge of different failfast requests Failfast has characteristics from other attributes. When issuing, executing and successuflly completing requests, failfast doesn't make any difference. It only affects how a request is handled on failure. Allowing requests with different failfast settings to be merged cause normal IOs to fail prematurely while not allowing has performance penalties as failfast is used for read aheads which are likely to be located near in-flight or to-be-issued normal IOs. This patch introduces the concept of 'mixed merge'. A request is a mixed merge if it is merge of segments which require different handling on failure. Currently the only mixable attributes are failfast ones (or lack thereof). When a bio with different failfast settings is added to an existing request or requests of different failfast settings are merged, the merged request is marked mixed. Each bio carries failfast settings and the request always tracks failfast state of the first bio. When the request fails, blk_rq_err_bytes() can be used to determine how many bytes can be safely failed without crossing into an area which requires further retrials. This allows request merging regardless of failfast settings while keeping the failure handling correct. This patch only implements mixed merge but doesn't enable it. The next one will update SCSI to make use of mixed merge. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:30 +02:00
Tejun Heo	a82afdfcb8	block: use the same failfast bits for bio and request bio and request use the same set of failfast bits. This patch makes the following changes to simplify things. * enumify BIO_RW* bits and reorder bits such that BIOS_RW_FAILFAST_* bits coincide with __REQ_FAILFAST_* bits. * The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV but the matching is useless anyway. init_request_from_bio() is responsible for setting FAILFAST bits on FS requests and non-FS requests never use BIO_RW_AHEAD. Drop the code and comment from blk_rq_bio_prep(). * Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and simplify FAILFAST flags handling in init_request_from_bio(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:27 +02:00
Martin K. Petersen	7c958e3264	block: Add a wrapper for setting minimum request size without a queue Introduce blk_limits_io_min() and make blk_queue_io_min() call it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-08-01 10:24:35 +02:00
Trond Myklebust	373c0a7ed3	Fix compile error due to congestion_wait() changes Move the definition of BLK_RW_ASYNC/BLK_RW_SYNC into linux/backing-dev.h so that it is available to all callers of set/clear_bdi_congested(). This replaces commit `097041e576` ("fuse: Fix build error"), which will be reverted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-11 11:22:26 -07:00
FUJITA Tomonori	ecb554a846	block: fix sg SG_DXFER_TO_FROM_DEV regression I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use the block layer mapping API (2.6.28). Douglas Gilbert explained SG_DXFER_TO_FROM_DEV: http://www.spinics.net/lists/linux-scsi/msg37135.html = The semantics of SG_DXFER_TO_FROM_DEV were: - copy user space buffer to kernel (LLD) buffer - do SCSI command which is assumed to be of the DATA_IN (data from device) variety. This would overwrite some or all of the kernel buffer - copy kernel (LLD) buffer back to the user space. The idea was to detect short reads by filling the original user space buffer with some marker bytes ("0xec" it would seem in this report). The "resid" value is a better way of detecting short reads but that was only added this century and requires co-operation from the LLD. = This patch changes the block layer mapping API to support this semantics. This simply adds another field to struct rq_map_data and enables __bio_copy_iov() to copy data from user space even with READ requests. It's better to add the flags field and kills null_mapped and the new from_user fields in struct rq_map_data but that approach makes it difficult to send this patch to stable trees because st and osst drivers use struct rq_map_data (they were converted to use the block layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer mapping API. zhou sf reported this regiression and tested this patch: http://www.spinics.net/lists/linux-scsi/msg37128.html http://www.spinics.net/lists/linux-scsi/msg37168.html Reported-by: zhou sf <sxzzsf@gmail.com> Tested-by: zhou sf <sxzzsf@gmail.com> Cc: stable@kernel.org Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:53 +02:00
Jens Axboe	8aa7e847d8	Fix congestion_wait() sync/async vs read/write confusion Commit `1faa16d228` accidentally broke the bdi congestion wait queue logic, causing us to wait on congestion for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:53 +02:00
Jens Axboe	018e044689	block: get rid of queue-private command filter The initial patches to support this through sysfs export were broken and have been if 0'ed out in any release. So lets just kill the code and reclaim some space in struct request_queue, if anyone would later like to fixup the sysfs bits, the git history can easily restore the removed bits. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-01 10:56:26 +02:00
Martin K. Petersen	e475bba2fd	block: Introduce helper to reset queue limits to default values DM reuses the request queue when swapping in a new device table Introduce blk_set_default_limits() which can be used to reset the the queue_limits prior to stacking devices. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 08:23:52 +02:00
Linus Torvalds	d614aec475	Merge branch 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (29 commits) ide: re-implement ide_pci_init_one() on top of ide_pci_init_two() ide: unexport ide_find_dma_mode() ide: fix PowerMac bootup oops ide: skip probe if there are no devices on the port (v2) sl82c105: add printk() logging facility ide-tape: fix proc warning ide: add IDE_DFLAG_NIEN_QUIRK device flag ide: respect quirk_drives[] list on all controllers hpt366: enable all quirks for devices on quirk_drives[] list hpt366: sync quirk_drives[] list with pdc202xx_{new,old}.c ide: remove superfluous SELECT_MASK() call from do_rw_taskfile() ide: remove superfluous SELECT_MASK() call from ide_driveid_update() icside: remove superfluous ->maskproc method ide-tape: fix IDE_AFLAG_* atomic accesses ide-tape: change IDE_AFLAG_IGNORE_DSC non-atomically pdc202xx_old: kill resetproc() method pdc202xx_old: don't call pdc202xx_reset() on IRQ timeout pdc202xx_old: use ide_dma_test_irq() ide: preserve Host Protected Area by default (v2) ide-gd: implement block device ->set_capacity method (v2) ...	2009-06-12 09:29:42 -07:00
Kiyoshi Ueda	b0fd271d5f	block: add request clone interface (v2) This patch adds the following 2 interfaces for request-stacking drivers: - blk_rq_prep_clone(struct request clone, struct request orig, struct bio_set bs, gfp_t gfp_mask, int (bio_ctr)(struct bio , struct bio, void ), void data) * Clones bios in the original request to the clone request (bio_ctr is called for each cloned bios.) * Copies attributes of the original request to the clone request. The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not copied. - blk_rq_unprep_clone(struct request clone) Frees cloned bios from the clone request. Request stacking drivers (e.g. request-based dm) need to make a clone request for a submitted request and dispatch it to other devices. To allocate request for the clone, request stacking drivers may not be able to use blk_get_request() because the allocation may be done in an irq-disabled context. So blk_rq_prep_clone() takes a request allocated by the caller as an argument. For each clone bio in the clone request, request stacking drivers should be able to set up their own completion handler. So blk_rq_prep_clone() takes a callback function which is called for each clone bio, and a pointer for private data which is passed to the callback. NOTE: blk_rq_prep_clone() doesn't copy any actual data of the original request. Pages are shared between original bios and cloned bios. So caller must not complete the original request before the clone request. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-11 13:11:05 +02:00
Jens Axboe	9df1bb9b51	Revert "block: Fix bounce limit setting in DM" This reverts commit `a05c0205ba`. DM doesn't need to access the bounce_pfn directly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-09 06:22:57 +02:00
Bartlomiej Zolnierkiewicz	db429e9ec0	partitions: add ->set_capacity block device method * Add ->set_capacity block device method and use it in rescan_partitions() to attempt enabling native capacity of the device upon detecting the partition which exceeds device capacity. * Add GENHD_FL_NATIVE_CAPACITY flag to try limit attempts of enabling native capacity during partition scan. Together with the consecutive patch implementing ->set_capacity method in ide-gd device driver this allows automatic disabling of Host Protected Area (HPA) if any partitions overlapping HPA are detected. Cc: Robert Hancock <hancockrwd@gmail.com> Cc: Frans Pop <elendil@planet.nl> Cc: "Andries E. Brouwer" <Andries.Brouwer@cwi.nl> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Emphatically-Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2009-06-07 13:52:52 +02:00
Martin K. Petersen	a05c0205ba	block: Fix bounce limit setting in DM blk_queue_bounce_limit() is more than a wrapper about the request queue limits.bounce_pfn variable. Introduce blk_queue_bounce_pfn() which can be called by stacking drivers that wish to set the bounce limit explicitly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-03 09:33:18 +02:00
Martin K. Petersen	c72758f337	block: Export I/O topology for block devices and partitions To support devices with physical block sizes bigger than 512 bytes we need to ensure proper alignment. This patch adds support for exposing I/O topology characteristics as devices are stacked. logical_block_size is the smallest unit the device can address. physical_block_size indicates the smallest I/O the device can write without incurring a read-modify-write penalty. The io_min parameter is the smallest preferred I/O size reported by the device. In many cases this is the same as the physical block size. However, the io_min parameter can be scaled up when stacking (RAID5 chunk size > physical block size). The io_opt characteristic indicates the optimal I/O size reported by the device. This is usually the stripe width for arrays. The alignment_offset parameter indicates the number of bytes the start of the device/partition is offset from the device's natural alignment. Partition tools and MD/DM utilities can use this to pad their offsets so filesystems start on proper boundaries. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:55 +02:00
Martin K. Petersen	025146e13b	block: Move queue limits to an embedded struct To accommodate stacking drivers that do not have an associated request queue we're moving the limits to a separate, embedded structure. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:55 +02:00
Martin K. Petersen	ae03bf639a	block: Use accessor functions for queue limits Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Martin K. Petersen	e1defc4ff0	block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Jens Axboe	e4b636366c	Merge branch 'master' into for-2.6.31 Conflicts: drivers/block/hd.c drivers/block/mg_disk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 20:25:34 +02:00
Jens Axboe	0a7ae2ff0d	block: change the tag sync vs async restriction logic Make them fully share the tag space, but disallow async requests using the last any two slots. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-20 08:54:31 +02:00
Boaz Harrosh	a411f4bbb8	block: Un-export blk_rq_append_bio OSD was the last in-tree user of blk_rq_append_bio(). Now that it is fixed blk_rq_append_bio is un-exported and is only used internally by block layer. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-19 12:14:56 +02:00
Boaz Harrosh	79eb63e9e5	block: Add blk_make_request(), takes bio, returns a request New block API: given a struct bio allocates a new request. This is the parallel of generic_make_request for BLOCK_PC commands users. The passed bio may be a chained-bio. The bio is bounced if needed inside the call to this member. This is in the effort of un-exporting blk_rq_append_bio(). Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> CC: Jeff Garzik <jeff@garzik.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-19 12:14:56 +02:00
FUJITA Tomonori	b1f744937f	block: move completion related functions back to blk-core.c Let's put the completion related functions back to block/blk-core.c where they have lived. We can also unexport blk_end_bidi_request() and __blk_end_bidi_request(), which nobody uses. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 11:06:48 +02:00
FUJITA Tomonori	1822952ba2	block: let blk_end_request_all handle bidi requests blk_end_request_all() and __blk_end_request_all() should finish all bytes including bidi, by definition. That's what all bidi users need , bidi requests must be complete as a whole (partial completion is impossible). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 11:06:47 +02:00
Tejun Heo	9934c8c045	block: implement and enforce request peek/start/fetch Till now block layer allowed two separate modes of request execution. A request is always acquired from the request queue via elv_next_request(). After that, drivers are free to either dequeue it or process it without dequeueing. Dequeue allows elv_next_request() to return the next request so that multiple requests can be in flight. Executing requests without dequeueing has its merits mostly in allowing drivers for simpler devices which can't do sg to deal with segments only without considering request boundary. However, the benefit this brings is dubious and declining while the cost of the API ambiguity is increasing. Segment based drivers are usually for very old or limited devices and as converting to dequeueing model isn't difficult, it doesn't justify the API overhead it puts on block layer and its more modern users. Previous patches converted all block low level drivers to dequeueing model. This patch completes the API transition by... * renaming elv_next_request() to blk_peek_request() * renaming blkdev_dequeue_request() to blk_start_request() * adding blk_fetch_request() which is combination of peek and start * disallowing completion of queued (not started) requests * applying new API to all LLDs Renamings are for consistency and to break out of tree code so that it's apparent that out of tree drivers need updating. [ Impact: block request issue API cleanup, no functional change ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Mike Miller <mike.miller@hp.com> Cc: unsik Kim <donari75@gmail.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Laurent Vivier <Laurent@lvivier.info> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: Stefan Weinhuber <wein@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:52:18 +02:00
Tejun Heo	a2dec7b363	block: hide request sector and data_len Block low level drivers for some reason have been pretty good at abusing block layer API. Especially struct request's fields tend to get violated in all possible ways. Make it clear that low level drivers MUST NOT access or manipulate rq->sector and rq->data_len directly by prefixing them with double underscores. This change is also necessary to break build of out-of-tree codes which assume the previous block API where internal fields can be manipulated and rq->data_len carries residual count on completion. [ Impact: hide internal fields, block API change ] Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:55 +02:00
Tejun Heo	2e46e8b27a	block: drop request->hard_* and nr_sectors struct request has had a few different ways to represent some properties of a request. ->hard_ represent block layer's view of the request progress (completion cursor) and the ones without the prefix are supposed to represent the issue cursor and allowed to be updated as necessary by the low level drivers. The thing is that as block layer supports partial completion, the two cursors really aren't necessary and only cause confusion. In addition, manual management of request detail from low level drivers is cumbersome and error-prone at the very least. Another interesting duplicate fields are rq->[hard_]nr_sectors and rq->{hard_cur\|current}_nr_sectors against rq->data_len and rq->bio->bi_size. This is more convoluted than the hard_ case. rq->[hard_]nr_sectors are initialized for requests with bio but blk_rq_bytes() uses it only for !pc requests. rq->data_len is initialized for all request but blk_rq_bytes() uses it only for pc requests. This causes good amount of confusion throughout block layer and its drivers and determining the request length has been a bit of black magic which may or may not work depending on circumstances and what the specific LLD is actually doing. rq->{hard_cur\|current}_nr_sectors represent the number of sectors in the contiguous data area at the front. This is mainly used by drivers which transfers data by walking request segment-by-segment. This value always equals rq->bio->bi_size >> 9. However, data length for pc requests may not be multiple of 512 bytes and using this field becomes a bit confusing. In general, having multiple fields to represent the same property leads only to confusion and subtle bugs. With recent block low level driver cleanups, no driver is accessing or manipulating these duplicate fields directly. Drop all the duplicates. Now rq->sector means the current sector, rq->data_len the current total length and rq->bio->bi_size the current segment length. Everything else is defined in terms of these three and available only through accessors. * blk_recalc_rq_sectors() is collapsed into blk_update_request() and now handles pc and fs requests equally other than rq->sector update. This means that now pc requests can use partial completion too (no in-kernel user yet tho). * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer now uses byte count as the primary data length. * blk_rq_pos() is now guranteed to be always correct. In-block users converted. * blk_rq_bytes() is now guaranteed to be always valid as is blk_rq_sectors(). In-block users converted. * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9. More convenient one is used. * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const pointer to request. [ Impact: API cleanup, single way to represent one property of a request ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:54 +02:00
Tejun Heo	5b93629b45	block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones Implement accessors - blk_rq_pos(), blk_rq_sectors() and blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors and rq->hard_cur_sectors respectively and convert direct references of the said fields to the accessors. This is in preparation of request data length handling cleanup. Geert : suggested adding const to struct request * parameter to accessors Sergei : spotted error in patch description [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Ackec-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:53 +02:00
Tejun Heo	c3a4d78c58	block: add rq->resid_len rq->data_len served two purposes - the length of data buffer on issue and the residual count on completion. This duality creates some headaches. First of all, block layer and low level drivers can't really determine what rq->data_len contains while a request is executing. It could be the total request length or it coulde be anything else one of the lower layers is using to keep track of residual count. This complicates things because blk_rq_bytes() and thus [__]blk_end_request_all() relies on rq->data_len for PC commands. Drivers which want to report residual count should first cache the total request length, update rq->data_len and then complete the request with the cached data length. Secondly, it makes requests default to reporting full residual count, ie. reporting that no data transfer occurred. The residual count is an exception not the norm; however, the driver should clear rq->data_len to zero to signify the normal cases while leaving it alone means no data transfer occurred at all. This reverse default behavior complicates code unnecessarily and renders block PC on some drivers (ide-tape/floppy) unuseable. This patch adds rq->resid_len which is used only for residual count. While at it, remove now unnecessasry blk_rq_bytes() caching in ide_pc_intr() as rq->data_len is not changed anymore. Boaz : spotted missing conversion in osd Sergei : spotted too early conversion to blk_rq_bytes() in ide-tape [ Impact: cleanup residual count handling, report 0 resid by default ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Darrick J. Wong <djwong@us.ibm.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:53 +02:00
Linus Torvalds	7b39da786a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: ide-cd: fix REQ_QUIET tests in cdrom_decode_status Fix up trivial conflicts in include/linux/blkdev.h	2009-05-02 16:48:32 -07:00
Borislav Petkov	96c1674397	ide-cd: fix REQ_QUIET tests in cdrom_decode_status Original patch (`dfa4411cc3`) was buggy. This is a more proper fix which introduces blk_rq_quiet() macro alleviating the need for dumb, too short caching variables. Thanks to Helge Deller and Bart for debugging this. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Reported-and-tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2009-04-30 18:24:34 +02:00
Tejun Heo	9fd8d0e1bc	block: make blk_end_request_cur() return bool In the process of mindlessly copying [__]blk_end_request_all(), [__]blk_end_request_cur() ended up returning void even though they're partial completion functions. Fix it. [ Impact: fix braindead API ] Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-28 08:14:49 +02:00
Tejun Heo	731ec497e5	block: kill rq->data Now that all block request data transfer is done via bio, rq->data isn't used. Kill it. While at it, make the roles of rq->special and buffer clear. [ Impact: drop now unncessary field from struct request ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Boaz Harrosh <bharrosh@panasas.com>	2009-04-28 07:37:36 +02:00
Tejun Heo	f06d9a2b52	block: replace end_request() with [__]blk_end_request_cur() end_request() has been kept around for backward compatibility; however, it's about time for it to go away. * There aren't too many users left. * Its use of @updtodate is pretty confusing. * In some cases, newer code ends up using mixture of end_request() and [__]blk_end_request[_all](), which is way too confusing. So, add [__]blk_end_request_cur() and replace end_request() with it. Most conversions are straightforward. Noteworthy ones are... * paride/pcd: next_request() updated to take 0/-errno instead of 1/0. * paride/pf: pf_end_request() and next_request() updated to take 0/-errno instead of 1/0. * xd: xd_readwrite() updated to return 0/-errno instead of 1/0. * mtd/mtd_blkdevs: blktrans_discard_request() updated to return 0/-errno instead of 1/0. Unnecessary local variable res initialization removed from mtd_blktrans_thread(). [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Joerg Dorchain <joerg@dorchain.net> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Laurent Vivier <Laurent@lvivier.info> Cc: Tim Waugh <tim@cyberelk.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: unsik Kim <donari75@gmail.com>	2009-04-28 07:37:36 +02:00
Tejun Heo	40cbbb781d	block: implement and use [__]blk_end_request_all() There are many [__]blk_end_request() call sites which call it with full request length and expect full completion. Many of them ensure that the request actually completes by doing BUG_ON() the return value, which is awkward and error-prone. This patch adds [__]blk_end_request_all() which takes @rq and @error and fully completes the request. BUG_ON() is added to to ensure that this actually happens. Most conversions are simple but there are a few noteworthy ones. * cdrom/viocd: viocd_end_request() replaced with direct calls to __blk_end_request_all(). * s390/block/dasd: dasd_end_request() replaced with direct calls to __blk_end_request_all(). * s390/char/tape_block: tapeblock_end_request() replaced with direct calls to blk_end_request_all(). [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mike Miller <mike.miller@hp.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-04-28 07:37:35 +02:00
Tejun Heo	2e60e02297	block: clean up request completion API Request completion has gone through several changes and became a bit messy over the time. Clean it up. 1. end_that_request_data() is a thin wrapper around end_that_request_data_first() which checks whether bio is NULL before doing anything and handles bidi completion. blk_update_request() is a thin wrapper around end_that_request_data() which clears nr_sectors on the last iteration but doesn't use the bidi completion. Clean it up by moving the initial bio NULL check and nr_sectors clearing on the last iteration into end_that_request_data() and renaming it to blk_update_request(), which makes blk_end_io() the only user of end_that_request_data(). Collapse end_that_request_data() into blk_end_io(). 2. There are four visible completion variants - blk_end_request(), __blk_end_request(), blk_end_bidi_request() and end_request(). blk_end_request() and blk_end_bidi_request() uses blk_end_request() as the backend but __blk_end_request() and end_request() use separate implementation in __blk_end_request() due to different locking rules. blk_end_bidi_request() is identical to blk_end_io(). Collapse blk_end_io() into blk_end_bidi_request(), separate out request update into internal helper blk_update_bidi_request() and add __blk_end_bidi_request(). Redefine [__]blk_end_request() as thin inline wrappers around [__]blk_end_bidi_request(). 3. As the whole request issue/completion usages are about to be modified and audited, it's a good chance to convert completion functions return bool which better indicates the intended meaning of return values. 4. The function name end_that_request_last() is from the days when it was a public interface and slighly confusing. Give it a proper internal name - blk_finish_request(). 5. Add description explaning that blk_end_bidi_request() can be safely used for uni requests as suggested by Boaz Harrosh. The only visible behavior change is from #1. nr_sectors counts are cleared after the final iteration no matter which function is used to complete the request. I couldn't find any place where the code assumes those nr_sectors counters contain the values for the last segment and this change is good as it makes the API much more consistent as the end result is now same whether a request is completed using [__]blk_end_request() alone or in combination with blk_update_request(). API further cleaned up per Christoph's suggestion. [ Impact: cleanup, rq->*nr_sectors always updated after req completion ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Christoph Hellwig <hch@infradead.org>	2009-04-28 07:37:35 +02:00
Tejun Heo	0b302d5aa7	block: kill blk_end_request_callback() With recent IDE updates, blk_end_request_callback() doesn't have any user now. Kill it. [ Impact: removal of unused convoluted interface ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:34 +02:00
Tejun Heo	5efccd17ce	block: reorder request completion functions Reorder request completion functions such that * All request completion functions are located together. * Functions which are used by only one caller is put right above the caller. * end_request() is put after other completion functions but before blk_update_request(). This change is for completion function cleanup which will follow. [ Impact: cleanup, code reorganization ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:34 +02:00
Tejun Heo	a7f5579234	block: kill blk_start_queueing() blk_start_queueing() is identical to __blk_run_queue() except that it doesn't check for recursion. None of the current users depends on blk_start_queueing() running request_fn directly. Replace usages of blk_start_queueing() with [__]blk_run_queue() and kill it. [ Impact: removal of mostly duplicate interface function ] Signed-off-by: Tejun Heo <tj@kernel.org>	2009-04-28 07:37:33 +02:00
Jerome Marchand	42dad7647a	block: simplify I/O stat accounting This simplifies I/O stat accounting switching code and separates it completely from I/O scheduler switch code. Requests are accounted according to the state of their request queue at the time of the request allocation. There is no need anymore to flush the request queue when switching I/O accounting state. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-24 08:54:21 +02:00
Jens Axboe	2385327725	block: remove unused REQ_UNPLUG The request inherits the unplug flag from the bio, but it isn't actually used. The bio flag stops at __make_request(), which tells it to unplug after submission. Passing it on to the request doesn't make any sense. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-07 08:59:11 +02:00
Jens Axboe	aeb6fafb8f	block: Add flag for telling the IO schedulers NOT to anticipate more IO By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 08:04:54 -07:00
Jens Axboe	1faa16d228	block: change the request allocation/congestion logic to be sync/async based This makes sure that we never wait on async IO for sync requests, instead of doing the split on writes vs reads. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 08:04:53 -07:00
Jens Axboe	1e42807918	block: reduce stack footprint of blk_recount_segments() blk_recalc_rq_segments() requires a request structure passed in, which we don't have from blk_recount_segments(). So the latter allocates one on the stack, using > 400 bytes of stack for that. This can cause us to spill over one page of stack from ext4 at least: 0) 4560 400 blk_recount_segments+0x43/0x62 1) 4160 32 bio_phys_segments+0x1c/0x24 2) 4128 32 blk_rq_bio_prep+0x2a/0xf9 3) 4096 32 init_request_from_bio+0xf9/0xfe 4) 4064 112 __make_request+0x33c/0x3f6 5) 3952 144 generic_make_request+0x2d1/0x321 6) 3808 64 submit_bio+0xb9/0xc3 7) 3744 48 submit_bh+0xea/0x10e 8) 3696 368 ext4_mb_init_cache+0x257/0xa6a [ext4] 9) 3328 288 ext4_mb_regular_allocator+0x421/0xcd9 [ext4] 10) 3040 160 ext4_mb_new_blocks+0x211/0x4b4 [ext4] 11) 2880 336 ext4_ext_get_blocks+0xb61/0xd45 [ext4] 12) 2544 96 ext4_get_blocks_wrap+0xf2/0x200 [ext4] 13) 2448 80 ext4_da_get_block_write+0x6e/0x16b [ext4] 14) 2368 352 mpage_da_map_blocks+0x7e/0x4b3 [ext4] 15) 2016 352 ext4_da_writepages+0x2ce/0x43c [ext4] 16) 1664 32 do_writepages+0x2d/0x3c 17) 1632 144 __writeback_single_inode+0x162/0x2cd 18) 1488 96 generic_sync_sb_inodes+0x1e3/0x32b 19) 1392 16 sync_sb_inodes+0xe/0x10 20) 1376 48 writeback_inodes+0x69/0xb3 21) 1328 208 balance_dirty_pages_ratelimited_nr+0x187/0x2f9 22) 1120 224 generic_file_buffered_write+0x1d4/0x2c4 23) 896 176 __generic_file_aio_write_nolock+0x35f/0x393 24) 720 80 generic_file_aio_write+0x6c/0xc8 25) 640 80 ext4_file_write+0xa9/0x137 [ext4] 26) 560 320 do_sync_write+0xf0/0x137 27) 240 48 vfs_write+0xb3/0x13c 28) 192 64 sys_write+0x4c/0x74 29) 128 128 system_call_fastpath+0x16/0x1b Split the segment counting out into a __blk_recalc_rq_segments() helper to avoid allocating an onstack request just for checking the physical segment count. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-26 10:45:48 +01:00
Jens Axboe	0648e10d71	block: fix inconsistent parenthesisation of QUEUE_FLAG_DEFAULT Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-02 08:43:48 +01:00
Jens Axboe	bc58ba9468	block: add sysfs file for controlling io stats accounting This allows us to turn off disk stat accounting completely, for the cases where the 0.5-1% reduction in system time is important. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:38 +01:00
Jens Axboe	213d9417fe	block: seperate bio/request unplug and sync bits Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:37 +01:00
FUJITA Tomonori	97ae77a1cd	[SCSI] block: make blk_rq_map_user take a NULL user-space buffer for WRITE The commit `818827669d` (block: make blk_rq_map_user take a NULL user-space buffer) extended blk_rq_map_user to accept a NULL user-space buffer with a READ command. It was necessary to convert sg to use the block layer mapping API. This patch extends blk_rq_map_user again for a WRITE command. It is necessary to convert st and osst drivers to use the block layer apping API. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-02 11:10:35 -06:00
FUJITA Tomonori	56c451f4b5	[SCSI] block: fix the partial mappings with struct rq_map_data This fixes bio_copy_user_iov to properly handle the partial mappings with struct rq_map_data (which only sg uses for now but st and osst will shortly). It adds the offset member to struct rq_map_data and changes blk_rq_map_user to update it so that bio_copy_user_iov can add an appropriate page frame via bio_add_pc_page(). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-02 11:10:08 -06:00
Jens Axboe	b374d18a4b	block: get rid of elevator_t typedef Just use struct elevator_queue everywhere instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:50 +01:00
Tejun Heo	58eea927d2	block: simplify empty barrier implementation Empty barrier required special handling in __elv_next_request() to complete it without letting the low level driver see it. With previous changes, barrier code is now flexible enough to skip the BAR step using the same barrier sequence selection mechanism. Drop the special handling and mask off q->ordered from start_ordered(). Remove blk_empty_barrier() test which now has no user. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:45 +01:00
Tejun Heo	8f11b3e99a	block: make barrier completion more robust Barrier completion had the following assumptions. * start_ordered() couldn't finish the whole sequence properly. If all actions are to be skipped, q->ordseq is set correctly but the actual completion was never triggered thus hanging the barrier request. * Drain completion in elv_complete_request() assumed that there's always at least one request in the queue when drain completes. Both assumptions are true but these assumptions need to be removed to improve empty barrier implementation. This patch makes the following changes. * Make start_ordered() use blk_ordered_complete_seq() to mark skipped steps complete and notify __elv_next_request() that it should fetch the next request if the whole barrier has completed inside start_ordered(). * Make drain completion path in elv_complete_request() check whether the queue is empty. Empty queue also indicates drain completion. * While at it, convert 0/1 return from blk_do_ordered() to false/true. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:45 +01:00
Tejun Heo	f671620e7d	block: make every barrier action optional In all barrier sequences, the barrier write itself was always assumed to be issued and thus didn't have corresponding control flag. This patch adds QUEUE_ORDERED_DO_BAR and unify action mask handling in start_ordered() such that any barrier action can be skipped. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:45 +01:00
Tejun Heo	313e42999d	block: reorganize QUEUE_ORDERED_* constants Separate out ordering type (drain,) and action masks (preflush, postflush, fua) from visible ordering mode selectors (QUEUE_ORDERED_). Ordering types are now named QUEUE_ORDERED_BY_ while action masks are named QUEUE_ORDERED_DO_*. This change is necessary to add QUEUE_ORDERED_DO_BAR and make it optional to improve empty barrier implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Cheng Renquan	64d01dc9e1	block: use cancel_work_sync() instead of kblockd_flush_work() After many improvements on kblockd_flush_work, it is now identical to cancel_work_sync, so a direct call to cancel_work_sync is suggested. The only difference is that cancel_work_sync is a GPL symbol, so no non-GPL modules anymore. Signed-off-by: Cheng Renquan <crquan@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Fernando Luis Vázquez Cao	88e740f165	block: add queue flag for paravirt frontend drivers As is the case with SSD devices, we do not want to idle in AS/CFQ when the block device is a paravirt front-end driver. This patch adds a flag (QUEUE_FLAG_VIRT) which should be used by front-end drivers such as virtio_blk and xen-blkfront to indicate a paravirtualized device. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:41 +01:00
Linus Torvalds	f2f1fa78a1	Enforce a minimum SG_IO timeout There's no point in having too short SG_IO timeouts, since if the command does end up timing out, we'll end up through the reset sequence that is several seconds long in order to abort the command that timed out. As a result, shorter timeouts than a few seconds simply do not make sense, as the recovery would be longer than the timeout itself. Add a BLK_MIN_SG_TIMEOUT to match the existign BLK_DEFAULT_SG_TIMEOUT. Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-05 14:49:18 -08:00
Milan Broz	0e435ac26e	block: fix setting of max_segment_size and seg_boundary mask Fix setting of max_segment_size and seg_boundary mask for stacked md/dm devices. When stacking devices (LVM over MD over SCSI) some of the request queue parameters are not set up correctly in some cases by default, namely max_segment_size and and seg_boundary mask. If you create MD device over SCSI, these attributes are zeroed. Problem become when there is over this mapping next device-mapper mapping - queue attributes are set in DM this way: request_queue max_segment_size seg_boundary_mask SCSI 65536 0xffffffff MD RAID1 0 0 LVM 65536 -1 (64bit) Unfortunately bio_add_page (resp. bio_phys_segments) calculates number of physical segments according to these parameters. During the generic_make_request() is segment cout recalculated and can increase bio->bi_phys_segments count over the allowed limit. (After bio_clone() in stack operation.) Thi is specially problem in CCISS driver, where it produce OOPS here BUG_ON(creq->nr_phys_segments > MAXSGENTRIES); (MAXSEGENTRIES is 31 by default.) Sometimes even this command is enough to cause oops: dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10 This command generates bios with 250 sectors, allocated in 32 4k-pages (last page uses only 1024 bytes). For LVM layer, it allocates bio with 31 segments (still OK for CCISS), unfortunatelly on lower layer it is recalculated to 32 segments and this violates CCISS restriction and triggers BUG_ON(). The patch tries to fix it by: * initializing attributes above in queue request constructor blk_queue_make_request() * make sure that blk_queue_stack_limits() inherits setting (DM uses its own function to set the limits because it blk_queue_stack_limits() was introduced later. It should probably switch to use generic stack limit function too.) * sets the default seg_boundary value in one place (blkdev.h) * use this mask as default in DM (instead of -1, which differs in 64bit) Bugs related to this: https://bugzilla.redhat.com/show_bug.cgi?id=471639 http://bugzilla.kernel.org/show_bug.cgi?id=8672 Signed-off-by: Milan Broz <mbroz@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-03 12:55:55 +01:00
Tejun Heo	53a08807c0	block: internal dequeue shouldn't start timer blkdev_dequeue_request() and elv_dequeue_request() are equivalent and both start the timeout timer. Barrier code dequeues the original barrier request but doesn't passes the request itself to lower level driver, only broken down proxy requests; however, as the original barrier code goes through the same dequeue path and timeout timer is started on it. If barrier sequence takes long enough, this timer expires but the low level driver has no idea about this request and oops follows. Timeout timer shouldn't have been started on the original barrier request as it never goes through actual IO. This patch unexports elv_dequeue_request(), which has no external user anyway, and makes it operate on elevator proper w/o adding the timer and make blkdev_dequeue_request() call elv_dequeue_request() and add timer. Internal users which don't pass the request to driver - barrier code and end_that_request_last() - are converted to use elv_dequeue_request(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-03 12:41:26 +01:00
Al Viro	90b8f2824c	[PATCH] end of methods switch: remove the old ones Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:48:52 -04:00
Al Viro	d4430d62fa	[PATCH] beginning of methods conversion To keep the size of changesets sane we split the switch by drivers; to keep the damn thing bisectable we do the following: 1) rename the affected methods, add ones with correct prototypes, make (few) callers handle both. That's this changeset. 2) for each driver convert to new methods. ALL drivers are converted in this series. 3) kill the old (renamed) methods. Note that it _is_ a flagday; all in-tree drivers are converted and by the end of this series no trace of old methods remain. The only reason why we do that this way is to keep the damn thing bisectable and allow per-driver debugging if anything goes wrong. New methods: open(bdev, mode) release(disk, mode) ioctl(bdev, mode, cmd, arg) /* Called without BKL / compat_ioctl(bdev, mode, cmd, arg) locked_ioctl(bdev, mode, cmd, arg) / Called with BKL, legacy */ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:32 -04:00
Al Viro	633a08b812	[PATCH] introduce __blkdev_driver_ioctl() Analog of blkdev_driver_ioctl() with sane arguments. For now uses fake struct file, by the end of the series it won't and blkdev_driver_ioctl() will become a wrapper around it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:26 -04:00
Al Viro	08f8585121	[PATCH] move block_device_operations to blkdev.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:20 -04:00
Al Viro	74f3c8aff3	[PATCH] switch scsi_cmd_ioctl() to passing fmode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:14 -04:00
Al Viro	e915e872ed	[PATCH] switch sg_scsi_ioctl() to passing fmode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:12 -04:00
Al Viro	aeb5d72706	[PATCH] introduce fmode_t, do annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:06 -04:00
Linus Torvalds	c53dbf5486	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: remove __generic_unplug_device() from exports block: move q->unplug_work initialization blktrace: pass zfcp driver data blktrace: add support for driver data block: fix current kernel-doc warnings block: only call ->request_fn when the queue is not stopped block: simplify string handling in elv_iosched_store() block: fix kernel-doc for blk_alloc_devt() block: fix nr_phys_segments miscalculation bug block: add partition attribute for partition number block: add BIG FAT WARNING to CONFIG_DEBUG_BLOCK_EXT_DEVT softirq: Add support for triggering softirq work on softirqs.	2008-10-17 09:29:55 -07:00
Jens Axboe	f73e2d13a1	block: remove __generic_unplug_device() from exports The only out-of-core user is IDE, and that should be using blk_start_queueing() instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-17 14:03:08 +02:00
Mike Christie	6000a368cd	[SCSI] block: separate failfast into multiple bits. Multipath is best at handling transport errors. If it gets a device error then there is not much the multipath layer can do. It will just access the same device but from a different path. This patch breaks up failfast into device, transport and driver errors. The multipath layers (md and dm mutlipath) only ask the lower levels to fast fail transport errors. The user of failfast, read ahead, will ask to fast fail on all errors. Note that blk_noretry_request will return true if any failfast bit is set. This allows drivers that do not support the multipath failfast bits to continue to fail on any failfast error like before. Drivers like scsi that are able to fail fast specific errors can check for the specific fail fast type. In the next patch I will convert scsi. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-10-13 09:28:52 -04:00
Martin K. Petersen	b02739b01c	block: gendisk integrity wrapper This is a wrapper for accessing a gendisk's integrity bits. It allows the integrity support in MD to be compiled with BLK_DEV_INTEGRITY off. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:22 +02:00
Martin K. Petersen	ad7fce9314	block: Switch blk_integrity_compare from bdev to gendisk The DM and MD integrity support now depends on being able to use gendisks instead of block_devices when comparing integrity profiles. Change function parameters accordingly. Also update comparison logic so that two NULL profiles are a valid configuration. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:21 +02:00
Jens Axboe	b04accc425	block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1 We need bdev_get_integrity() to support the pending md/dm patches. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:21 +02:00
Kiyoshi Ueda	d00e29fd99	block: remove end_{queued\|dequeued}_request() This patch removes end_queued_request() and end_dequeued_request(), which are no longer used. As a results, users of __end_request() became only end_request(). So the actual code in __end_request() is moved to end_request() and __end_request() is removed. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:21 +02:00
Kiyoshi Ueda	ef9e3facdf	block: add lld busy state exporting interface This patch adds an new interface, blk_lld_busy(), to check lld's busy state from the block layer. blk_lld_busy() calls down into low-level drivers for the checking if the drivers set q->lld_busy_fn() using blk_queue_lld_busy(). This resolves a performance problem on request stacking devices below. Some drivers like scsi mid layer stop dispatching request when they detect busy state on its low-level device like host/target/device. It allows other requests to stay in the I/O scheduler's queue for a chance of merging. Request stacking drivers like request-based dm should follow the same logic. However, there is no generic interface for the stacked device to check if the underlying device(s) are busy. If the request stacking driver dispatches and submits requests to the busy underlying device, the requests will stay in the underlying device's queue without a chance of merging. This causes performance problem on burst I/O load. With this patch, busy state of the underlying device is exported via q->lld_busy_fn(). So the request stacking driver can check it and stop dispatching requests if busy. The underlying device driver must return the busy state appropriately: 1: when the device driver can't process requests immediately. 0: when the device driver can process requests immediately, including abnormal situations where the device driver needs to kill all requests. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:20 +02:00
Jens Axboe	a68bbddba4	block: add queue flag for SSD/non-rotational devices We don't want to idle in AS/CFQ if the device doesn't have a seek penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational device, low level drivers should set this flag upon discovery of an SSD or similar device type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:19 +02:00
Kiyoshi Ueda	4ee5eaf451	block: add a queue flag for request stacking support This patch adds a queue flag to indicate the block device can be used for request stacking. Request stacking drivers need to stack their devices on top of only devices of which q->request_fn is functional. Since bio stacking drivers (e.g. md, loop) basically initialize their queue using blk_alloc_queue() and don't set q->request_fn, the check of (q->request_fn == NULL) looks enough for that purpose. However, dm will become both types of stacking driver (bio-based and request-based). And dm will always set q->request_fn even if the dm device is bio-based of which q->request_fn is not functional actually. So we need something else to distinguish the type of the device. Adding a queue flag is a solution for that. The reason why dm always sets q->request_fn is to keep the compatibility of dm user-space tools. Currently, all dm user-space tools are using bio-based dm without specifying the type of the dm device they use. To use request-based dm without changing such tools, the kernel must decide the type of the dm device automatically. The automatic type decision can't be done at the device creation time and needs to be deferred until such tools load a mapping table, since the actual type is decided by dm target type included in the mapping table. So a dm device has to be initialized using blk_init_queue() so that we can load either type of table. Then, all queue stuffs are set (e.g. q->request_fn) and we have no element to distinguish that it is bio-based or request-based, even after a table is loaded and the type of the device is decided. By the way, some stuffs of the queue (e.g. request_list, elevator) are needless when the dm device is used as bio-based. But the memory size is not so large (about 20[KB] per queue on ia64), so I hope the memory loss can be acceptable for bio-based dm users. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:18 +02:00
Kiyoshi Ueda	82124d6035	block: add request submission interface This patch adds blk_insert_cloned_request(), a generic request submission interface for request stacking drivers. Request-based dm will use it to submit their clones to underlying devices. blk_rq_check_limits() is also added because it is possible that the lower queue has stronger limitations than the upper queue if multiple drivers are stacking at request-level. Not only for blk_insert_cloned_request()'s internal use, the function will be used by request-based dm when the queue limitation is modified (e.g. by replacing dm's table). Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:18 +02:00
Kiyoshi Ueda	32fab448e5	block: add request update interface This patch adds blk_update_request(), which updates struct request with completing its data part, but doesn't complete the struct request itself. Though it looks like end_that_request_first() of older kernels, blk_update_request() should be used only by request stacking drivers. Request-based dm will use it in bio->bi_end_io callback to update the original request when a data part of a cloned request completes. Followings are additional background information of why request-based dm needs this interface. - Request stacking drivers can't use blk_end_request() directly from the lower driver's completion context (bio->bi_end_io or rq->end_io), because some device drivers (e.g. ide) may try to complete their request with queue lock held, and it may cause deadlock. See below for detailed description of possible deadlock: <http://marc.info/?l=linux-kernel&m=120311479108569&w=2> - To solve that, request-based dm offloads the completion of cloned struct request to softirq context (i.e. using blk_complete_request() from rq->end_io). - Though it is possible to use the same solution from bio->bi_end_io, it will delay the notification of bio completion to the original submitter. Also, it will cause inefficient partial completion, because the lower driver can't perform the cloned request anymore and request-based dm needs to requeue and redispatch it to the lower driver again later. That's not good. - So request-based dm needs blk_update_request() to perform the bio completion in the lower driver's completion context, which is more efficient. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:18 +02:00
Jens Axboe	9c02f2b02e	block: cleanup some of the integrity stuff in blkdev.h Don't put functions that are only used in fs/bio-integrity.c in blkdev.h, it's much cleaner to just keep it in there. Also kill completely unused bdev_get_tag_size() Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:17 +02:00
Jens Axboe	581d4e28d9	block: add fault injection mechanism for faking request timeouts Only works for the generic request timer handling. Allows one to sporadically ignore request completions, thus exercising the timeout handling. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:17 +02:00
Hugh Dickins	3e6053d76d	block: adjust blkdev_issue_discard for swap Two mods to blkdev_issue_discard(), thinking ahead to its use on swap: 1. Add gfp_mask argument, so swap allocation can use it where GFP_KERNEL might deadlock but GFP_NOIO is safe. 2. Enlarge nr_sects argument from unsigned to sector_t: unsigned long is enough to cover a whole swap area, but sector_t suits any partition. Change sb_issue_discard()'s nr_blocks to sector_t too; but no need seen for a gfp_mask there, just pass GFP_KERNEL down to blkdev_issue_discard(). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:17 +02:00
Mike Anderson	11914a53d2	block: Add interface to abort queued requests Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:13 +02:00
Jens Axboe	242f9dcb8b	block: unify request timeout handling Right now SCSI and others do their own command timeout handling. Move those bits to the block layer. Instead of having a timer per command, we try to be a bit more clever and simply have one per-queue. This avoids the overhead of having to tear down and setup a timer for each command, so it will result in a lot less timer fiddling. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:13 +02:00
FUJITA Tomonori	879040742c	block: add blk_rq_aligned helper function This adds blk_rq_aligned helper function to see if alignment and padding requirement is satisfied for DMA transfer. This also converts blk_rq_map_kern and __blk_rq_map_user to use the helper function. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:11 +02:00
FUJITA Tomonori	152e283fdf	block: introduce struct rq_map_data to use reserved pages This patch introduces struct rq_map_data to enable bio_copy_use_iov() use reserved pages. Currently, bio_copy_user_iov allocates bounce pages but drivers/scsi/sg.c wants to allocate pages by itself and use them. struct rq_map_data can be used to pass allocated pages to bio_copy_user_iov. The current users of bio_copy_user_iov simply passes NULL (they don't want to use pre-allocated pages). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Douglas Gilbert <dougg@torque.net> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:10 +02:00
FUJITA Tomonori	a3bce90edd	block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov Currently, blk_rq_map_user and blk_rq_map_user_iov always do GFP_KERNEL allocation. This adds gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov so sg can use it (sg always does GFP_ATOMIC allocation). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Douglas Gilbert <dougg@torque.net> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:10 +02:00
Jens Axboe	ab780f1ece	block: inherit CPU completion on bio->rq and rq->rq merges Somewhat incomplete, as we do allow merges of requests and bios that have different completion CPUs given. This is done on the assumption that a larger IO is still more beneficial than CPU locality. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:09 +02:00
Jens Axboe	c7c22e4d5c	block: add support for IO CPU affinity This patch adds support for controlling the IO completion CPU of either all requests on a queue, or on a per-request basis. We export a sysfs variable (rq_affinity) which, if set, migrates completions of requests to the CPU that originally submitted it. A bio helper (bio_set_completion_cpu()) is also added, so that queuers can ask for completion on that specific CPU. In testing, this has been show to cut the system time by as much as 20-40% on synthetic workloads where CPU affinity is desired. This requires a little help from the architecture, so it'll only work as designed for archs that are using the new generic smp helper infrastructure. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:09 +02:00
Jens Axboe	18887ad910	block: make kblockd_schedule_work() take the queue as parameter Preparatory patch for checking queuing affinity. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:09 +02:00
Mikulas Patocka	5df97b91b5	drop vmerge accounting Remove hw_segments field from struct bio and struct request. Without virtual merge accounting they have no purpose. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:03 +02:00
Fernando Luis Vázquez Cao	766ca4428d	virtio_blk: use a wrapper function to access io context information of IO requests struct request has an ioprio member but it is never updated because currently bios do not hold io context information. The implication of this is that virtio_blk ends up passing useless information to the backend driver. That said, some IO schedulers such as CFQ do store io context information in struct request, but use private members for that, which means that that information cannot be directly accessed in a IO scheduler-independent way. This patch adds a function to obtain the ioprio of a request. We should avoid accessing ioprio directly and use this function instead, so that its users do not have to care about future changes in block layer structures or what the currently active IO controller is. This patch does not introduce any functional changes but paves the way for future clean-ups and enhancements. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:02 +02:00
David Woodhouse	1a8e2bddd5	Kill REQ_TYPE_FLUSH It was only used by ps3disk, and it should probably have been REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:02 +02:00
David Woodhouse	e17fc0a1cc	Allow elevators to sort/merge discard requests But blkdev_issue_discard() still emits requests which are interpreted as soft barriers, because naïve callers might otherwise issue subsequent writes to those same sectors, which might cross on the queue (if they're reallocated quickly enough). Callers still _can_ issue non-barrier discard requests, but they have to take care of queue ordering for themselves. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:02 +02:00
David Woodhouse	eae9acd13a	Support 'discard sectors' operation in translation layer support core Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:01 +02:00
David Woodhouse	fb2dce862d	Add 'discard' request handling Some block devices benefit from a hint that they can forget the contents of certain sectors. Add basic support for this to the block core, along with a 'blkdev_issue_discard()' helper function which issues such requests. The caller doesn't get to provide an end_io functio, since blkdev_issue_discard() will automatically split the request up into multiple bios if appropriate. Neither does the function wait for completion -- it's expected that callers won't care about when, or even _if_, the request completes. It's only a hint to the device anyway. By definition, the file system doesn't _care_ about these sectors any more. [With feedback from OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> and Jens Axboe <jens.axboe@oracle.com] Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:01 +02:00
David Woodhouse	d628eaef31	Fix up comments about matching flags between bio and rq Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:01 +02:00
Jens Axboe	2dc75d3c3b	block: disable sysfs parts of the disk command filter We still have life time issues with the sysfs command filter kobject, so disable it for 2.6.27 release. We can revisit this and make it work properly for 2.6.28, for 2.6.27 release it's too risky. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-09-11 14:20:23 +02:00
Jens Axboe	5168c47b4c	block: remove blk_queue_tag_depth() and blk_queue_tag_queue() They are unused and ->busy doesn't exist anymore. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-08-27 09:50:20 +02:00
FUJITA Tomonori	4beab5c623	block: rename blk_scsi_cmd_filter to blk_cmd_filter Technically, the cmd_filter would be applied to other protocols though it's unlikely to happen. Putting SCSI stuff to request_queue is kinda layer violation. So let's rename it. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-08-27 09:50:19 +02:00
FUJITA Tomonori	abf5439370	block: move cmdfilter from gendisk to request_queue cmd_filter works only for the block layer SG_IO with SCSI block devices. It breaks scsi/sg.c, bsg, and the block layer SG_IO with SCSI character devices (such as st). We hit a kernel crash with them. The problem is that cmd_filter code accesses to gendisk (having struct blk_scsi_cmd_filter) via inode->i_bdev->bd_disk. It works for only SCSI block device files. With character device files, inode->i_bdev leads you to struct cdev. inode->i_bdev->bd_disk->blk_scsi_cmd_filter isn't safe. SCSI ULDs don't expose gendisk; they keep it private. bsg needs to be independent on any protocols. We shouldn't change ULDs to expose their gendisk. This patch moves struct blk_scsi_cmd_filter from gendisk to request_queue, a common object, which eveyone can access to. The user interface doesn't change; users can change the filters via /sys/block/. gendisk has a pointer to request_queue so the cmd_filter code accesses to struct blk_scsi_cmd_filter. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-08-27 09:50:19 +02:00
Jens Axboe	6c5e0c4d51	block: add a blk_plug_device_unlocked() that grabs the queue lock blk_plug_device() must be called with the queue lock held, so callers often just grab and release the lock for that purpose. Add a helper that does just that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-08-01 20:31:32 +02:00
Martin K. Petersen	d442cc44c0	block: Trivial fix for blk_integrity_rq() Fail integrity check gracefully when request does not have a bio attached (BLOCK_PC). Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-16 14:51:41 -07:00
Linus Torvalds	98339cbd36	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (80 commits) ide-floppy: fix unfortunate function naming ide-tape: unify idetape_create_read/write_cmd ide: add ide_pc_intr() helper ide-{floppy,scsi}: read Status Register before stopping DMA engine ide-scsi: add more debugging to idescsi_pc_intr() ide-scsi: use pc->callback ide-floppy: add more debugging to idefloppy_pc_intr() ide-tape: always log debug info in idetape_pc_intr() if debugging is enabled ide-tape: add ide_tape_io_buffers() helper ide-tape: factor out DSC handling from idetape_pc_intr() ide-{floppy,tape}: move checking of ->failed_pc to ->callback ide: add ide_issue_pc() helper ide: add PC_FLAG_DRQ_INTERRUPT pc flag ide-scsi: move idescsi_map_sg() call out from idescsi_issue_pc() ide: add ide_transfer_pc() helper ide-scsi: set drive->scsi flag for devices handled by the driver ide-{cd,floppy,tape}: remove checking for drive->scsi ide: add PC_FLAG_ZIP_DRIVE pc flag ide-tape: factor out waiting for good ireason from idetape_transfer_pc() ide-tape: set PC_FLAG_DMA_IN_PROGRESS flag in idetape_transfer_pc() ...	2008-07-15 11:15:36 -07:00
FUJITA Tomonori	681a561b7e	block: unexport blk_end_sync_rq All the users of blk_end_sync_rq has gone (they are converted to use blk_execute_rq). This unexports blk_end_sync_rq. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-15 21:21:45 +02:00
FUJITA Tomonori	27f8221af4	block: add blk_queue_update_dma_pad This adds blk_queue_update_dma_pad to prevent LLDs from overwriting the dma pad mask wrongly (we added blk_queue_update_dma_alignment due to the same reason). This also converts libata to use blk_queue_update_dma_pad instead of blk_queue_dma_pad. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-04 09:52:13 +02:00
Jens Axboe	e48ec69005	block: extend queue_flag bitops Add test_and_clear and test_and_set. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:15 +02:00
Alasdair G Kergon	cc371e66e3	Add bvec_merge_data to handle stacked devices and ->merge_bvec() When devices are stacked, one device's merge_bvec_fn may need to perform the mapping and then call one or more functions for its underlying devices. The following bio fields are used: bio->bi_sector bio->bi_bdev bio->bi_size bio->bi_rw using bio_data_dir() This patch creates a new struct bvec_merge_data holding a copy of those fields to avoid having to change them directly in the struct bio when going down the stack only to have to change them back again on the way back up. (And then when the bio gets mapped for real, the whole exercise gets repeated, but that's a problem for another day...) Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: Milan Broz <mbroz@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:15 +02:00
Jens Axboe	b24498d477	block: integrity flags can't use bit ops on unsigned short Just use normal open coded bit operations instead, they need not be atomic. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:15 +02:00
Adel Gadllah	0b07de85a7	allow userspace to modify scsi command filter on per device basis This patch exports the per-gendisk command filter to user space through sysfs, so it can be changed by the system administrator. All users of the old cmd filter have been converted to use the new one. Original patch from Peter Jones. Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com> Signed-off-by: Peter Jones <pjones@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:14 +02:00
Jens Axboe	6e2401ad6f	block: integrity cleanups - No need to check for NULL bio, we'll get an immediate oops anyway. - Make bio_integrity() a proper function. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:14 +02:00
Jens Axboe	da9cbc8739	block: blkdev.h cleanup, move iocontext stuff to iocontext.h Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:14 +02:00
Martin K. Petersen	7ba1ba12ee	block: Block layer data integrity support Some block devices support verifying the integrity of requests by way of checksums or other protection information that is submitted along with the I/O. This patch implements support for generating and verifying integrity metadata, as well as correctly merging, splitting and cloning bios and requests that have this extra information attached. See Documentation/block/data-integrity.txt for more information. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:13 +02:00
Jens Axboe	244b4d56f8	block: kill request_queue_t Everything was moved to struct request_queue a few kernel revisions ago, maintaining the deprecated typedef to avoid breaking things. Now the time has come to get rid of that typedef. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:12 +02:00
Jens Axboe	7663c1e279	Improve queue_is_locked() spin_is_locked() doesn't work on UP without spinlock debugging. Make it safer and just return 1 on UP, so we don't get false positives. The plan is to kill this debug function during the -rc cycle. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 12:36:54 -07:00
Linus Torvalds	8f45c1a58a	block: fix queue locking verification The new queue_flag_set/clear() functions verify that the queue is locked, but in doing so they will actually instead oops if the queue lock hasn't been initialized at all. So fix the lock debug test to consider the "no lock" case to be unlocked. This way you get a nice WARN_ON_ONCE() instead of a fatal oops. Bug introduced by commit `75ad23bc0f` ("block: make queue flags non-atomic"). Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 10:16:38 -07:00
Alan D. Brunelle	ac9fafa124	block: Skip I/O merges when disabled The block I/O + elevator + I/O scheduler code spend a lot of time trying to merge I/Os -- rightfully so under "normal" circumstances. However, if one were to know that the incoming I/O stream was /very/ random in nature, the cycles are wasted. This patch adds a per-request_queue tunable that (when set) disables merge attempts (beyond the simple one-hit cache check), thus freeing up a non-trivial amount of CPU cycles. Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 14:48:55 +02:00
FUJITA Tomonori	d7e3c3249e	block: add large command support This patch changes rq->cmd from the static array to a pointer to support large commands. We rarely handle large commands. So for optimization, a struct request still has a static array for a command. rq_init sets rq->cmd pointer to the static array. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 14:48:55 +02:00
FUJITA Tomonori	2a4aa30c5f	block: rename and export rq_init() This rename rq_init() blk_rq_init() and export it. Any path that hands the request to the block layer needs to call it to initialize the request. This is a preparation for large command support, which needs to initialize the request in a proper way (that is, just doing a memset() will not work). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 14:48:55 +02:00
Nick Piggin	75ad23bc0f	block: make queue flags non-atomic We can save some atomic ops in the IO path, if we clearly define the rules of how to modify the queue flags. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 14:48:33 +02:00
Andi Kleen	2472892a3c	block: fix memory hotplug and bouncing in block layer Only noticed this while hacking something else, no test case. blk_max_low_pfn is initialized once at bootup by the block layer from max_low_pfn. But max_low_pfn is not necessarily constant over the runtime of the system when you consider memory hotplug. What could happen if that someone adds memory later the block layer wouldn't get updated and then start bouncing memory unnecessarily. Also on 64bit blk_max_low_pfn actually isn't needed because it just disables bouncing essentially and there is no highmem. And nobody can pass pfns > max_low_pfn to the block layer, because those wouldn't have a struct page and I suspect block layer wouldn't be very happy without that. So set BLK_BOUNCE_HIGH to infinity (-1ULL) on 64bit. That avoids the problem of having to update it on memory hotadd. On 32bit I kept the same behaviour because at least on i386 memory hotadd only adds HIGHMEM, never lowmem. BLK_BOUNCE_ANY is always set to infinity on both 32 and 64bit. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-21 09:51:05 +02:00
FUJITA Tomonori	f18573abcc	block: move the padding adjustment to blk_rq_map_sg blk_rq_map_user adjusts bi_size of the last bio. It breaks the rule that req->data_len (the true data length) is equal to sum(bio). It broke the scsi command completion code. commit `e97a294ef6` was introduced to fix the above issue. However, the partial completion code doesn't work with it. The commit is also a layer violation (scsi mid-layer should not know about the block layer's padding). This patch moves the padding adjustment to blk_rq_map_sg (suggested by James). The padding works like the drain buffer. This patch breaks the rule that req->data_len is equal to sum(sg), however, the drain buffer already broke it. So this patch just restores the rule that req->data_len is equal to sub(bio) without breaking anything new. Now when a low level driver needs padding, blk_rq_map_user and blk_rq_map_user_iov guarantee there's enough room for padding. blk_rq_map_sg can safely extend the last entry of a scatter list. blk_rq_map_sg must extend the last entry of a scatter list only for a request that got through bio_copy_user_iov. This patches introduces new REQ_COPY_USER flag. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-21 09:50:08 +02:00
Tejun Heo	e3790c7d42	block: separate out padding from alignment Block layer alignment was used for two different purposes - memory alignment and padding. This causes problems in lower layers because drivers which only require memory alignment ends up with adjusted rq->data_len. Separate out padding such that padding occurs iff driver explicitly requests it. Tomo: restorethe code to update bio in blk_rq_map_user introduced by the commit `40b01b9bbd` according to padding alignment. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-04 11:18:17 +01:00
FUJITA Tomonori	7a85f8896f	block: restore the meaning of rq->data_len to the true data length The meaning of rq->data_len was changed to the length of an allocated buffer from the true data length. It breaks SG_IO friends and bsg. This patch restores the meaning of rq->data_len to the true data length and adds rq->extra_len to store an extended length (due to drain buffer and padding). This patch also removes the code to update bio in blk_rq_map_user introduced by the commit `40b01b9bbd`. The commit adjusts bio according to memory alignment (queue_dma_alignment). However, memory alignment is NOT padding alignment. This adjustment also breaks SG_IO friends and bsg. Padding alignment needs to be fixed in a proper way (by a separate patch). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>	2008-03-04 11:17:11 +01:00
Tejun Heo	2fb98e8414	block: implement request_queue->dma_drain_needed Draining shouldn't be done for commands where overflow may indicate data integrity issues. Add dma_drain_needed callback to request_queue. Drain buffer is appened iff this function returns non-zero. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-19 11:36:53 +01:00
Tejun Heo	6b00769fe1	block: add request->raw_data_len With padding and draining moved into it, block layer now may extend requests as directed by queue parameters, so now a request has two sizes - the original request size and the extended size which matches the size of area pointed to by bios and later by sgs. The latter size is what lower layers are primarily interested in when allocating, filling up DMA tables and setting up the controller. Both padding and draining extend the data area to accomodate controller characteristics. As any controller which speaks SCSI can handle underflows, feeding larger data area is safe. So, this patch makes the primary data length field, request->data_len, indicate the size of full data area and add a separate length field, request->raw_data_len, for the unmodified request size. The latter is used to report to higher layer (userland) and where the original request size should be fed to the controller or device. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-19 11:36:35 +01:00
Jens Axboe	63a7138671	block: fixup rq_init() a bit Rearrange fields in cache order and initialize some fields that we didn't previously init. Remove init of ->completion_data, it's part of a union with ->hash. Luckily clearing the rb node is the same as setting it to null! Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-08 12:41:03 +01:00
Jens Axboe	3bc217ffe6	block: kill swap_io_context() It blindly copies everything in the io_context, including the lock. That doesn't work so well for either lock ordering or lockdep. There seems zero point in swapping io contexts on a request to request merge, so the best point of action is to just remove it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-01 11:34:49 +01:00
Jens Axboe	22b132102f	block: new end request handling interface should take unsigned byte counts No point in passing signed integers as the byte count, they can never be negative. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-01 09:26:33 +01:00
Jens Axboe	023ccde109	block: fix warning on compile with CONFIG_BLOCK struct io_context was not defined, just add an empty forward decl. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-29 21:55:14 +01:00
Linus Torvalds	8d01eddf29	Merge branch 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block: block: implement drain buffers __bio_clone: don't calculate hw/phys segment counts block: allow queue dma_alignment of zero blktrace: Add blktrace ioctls to SCSI generic devices	2008-01-29 08:51:56 +11:00
Linus Torvalds	f0f0052069	Merge branch 'blk-end-request' of git://git.kernel.dk/linux-2.6-block * 'blk-end-request' of git://git.kernel.dk/linux-2.6-block: (30 commits) blk_end_request: changing xsysace (take 4) blk_end_request: changing ub (take 4) blk_end_request: cleanup of request completion (take 4) blk_end_request: cleanup 'uptodate' related code (take 4) blk_end_request: remove/unexport end_that_request_* (take 4) blk_end_request: changing scsi (take 4) blk_end_request: add bidi completion interface (take 4) blk_end_request: changing ide-cd (take 4) blk_end_request: add callback feature (take 4) blk_end_request: changing ide normal caller (take 4) blk_end_request: changing cpqarray (take 4) blk_end_request: changing cciss (take 4) blk_end_request: changing ide-scsi (take 4) blk_end_request: changing s390 (take 4) blk_end_request: changing mmc (take 4) blk_end_request: changing i2o_block (take 4) blk_end_request: changing viocd (take 4) blk_end_request: changing xen-blkfront (take 4) blk_end_request: changing viodasd (take 4) blk_end_request: changing sx8 (take 4) ...	2008-01-29 08:51:32 +11:00
James Bottomley	fa0ccd837e	block: implement drain buffers These DMA drain buffer implementations in drivers are pretty horrible to do in terms of manipulating the scatterlist. Plus they're being done at least in drivers/ide and drivers/ata, so we now have code duplication. The one use case for this, as I understand it is AHCI controllers doing PIO mode to mmc devices but translating this to DMA at the controller level. So, what about adding a callback to the block layer that permits the adding of the drain buffer for the problem devices. The idea is that you'd do this in slave_configure after you find one of these devices. The beauty of doing it in the block layer is that it quietly adds the drain buffer to the end of the sg list, so it automatically gets mapped (and unmapped) without anything unusual having to be done to the scatterlist in driver/scsi or drivers/ata and without any alteration to the transfer length. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:54:11 +01:00
Jens Axboe	d38ecf935f	io context sharing: preliminary support Detach task state from ioc, instead keep track of how many processes are accessing the ioc. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:50:31 +01:00
Jens Axboe	fd0928df98	ioprio: move io priority from task_struct to io_context This is where it belongs and then it doesn't take up space for a process that doesn't do IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:50:29 +01:00
Kiyoshi Ueda	5450d3e1d6	blk_end_request: cleanup 'uptodate' related code (take 4) This patch converts 'uptodate' arguments of no longer exported interfaces, end_that_request_first/last, to 'error', and removes internal conversions for it in blk_end_request interfaces. Also, this patch removes no longer needed end_io_error(). Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:37:13 +01:00
Kiyoshi Ueda	3bcddeac1c	blk_end_request: remove/unexport end_that_request_* (take 4) This patch removes the following functions: o end_that_request_first() o end_that_request_chunk() and stops exporting the functions below: o end_that_request_last() Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:37:12 +01:00
Kiyoshi Ueda	e3a04fe34a	blk_end_request: add bidi completion interface (take 4) This patch adds a variant of the interface, blk_end_bidi_request(), which completes a bidi request. Bidi request must be completed as a whole, both rq and rq->next_rq at once. So the interface has 2 arguments for completion size. As for ->end_io, only rq->end_io is called (rq->next_rq->end_io is not called). So if special completion handling is needed, the handler must be set to rq->end_io. And the handler must take care of freeing next_rq too, since the interface doesn't care of it if rq->end_io is not NULL. Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:37:08 +01:00
Kiyoshi Ueda	e19a3ab058	blk_end_request: add callback feature (take 4) This patch adds a variant of the interface, blk_end_request_callback(), which has driver callback feature. Drivers may need to do special works between end_that_request_first() and end_that_request_last(). For such drivers, blk_end_request_callback() allows it to pass a callback function which is called between end_that_request_first() and end_that_request_last(). This interface is only for fallback of other blk_end_request interfaces. Drivers should avoid their tricky behaviors and use other interfaces as much as possible. Currently, only one driver, ide-cd, needs this interface. So this interface should/will be removed, after the driver removes such tricky behaviors. o ide-cd (cdrom_newpc_intr()) In PIO mode, cdrom_newpc_intr() needs to defer end_that_request_last() until the device clears DRQ_STAT and raises an interrupt after end_that_request_first(). So end_that_request_first() and end_that_request_last() are called separately in cdrom_newpc_intr(). This means blk_end_request_callback() has to return without completing request even if no leftover in the request. To satisfy the requirement, callback function has return value so that drivers can tell blk_end_request_callback() to return without completing request. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:37:04 +01:00
Kiyoshi Ueda	3b11313a6c	blk_end_request: add/export functions to get request size (take 4) This patch adds/exports functions to get the size of request in bytes. They are useful because blk_end_request interfaces take bytes as a completed I/O size instead of sectors. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:35:56 +01:00
Kiyoshi Ueda	336cdb4003	blk_end_request: add new request completion interface (take 4) This patch adds 2 new interfaces for request completion: o blk_end_request() : called without queue lock o __blk_end_request() : called with queue lock held blk_end_request takes 'error' as an argument instead of 'uptodate', which current end_that_request_* take. The meanings of values are below and the value is used when bio is completed. 0 : success < 0 : error Some device drivers call some generic functions below between end_that_request_{first/chunk} and end_that_request_last(). o add_disk_randomness() o blk_queue_end_tag() o blkdev_dequeue_request() These are called in the blk_end_request interfaces as a part of generic request completion. So all device drivers become to call above functions. To decide whether to call blkdev_dequeue_request(), blk_end_request uses list_empty(&rq->queuelist) (blk_queued_rq() macro is added for it). So drivers must re-initialize it using list_init() or so before calling blk_end_request if drivers use it for its specific purpose. (Currently, there is no driver which completes request without re-initializing the queuelist after used it. So rq->queuelist can be used for the purpose above.) "Normal" drivers can be converted to use blk_end_request() in a standard way shown below. a) end_that_request_{chunk/first} spin_lock_irqsave() (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request()) end_that_request_last() spin_unlock_irqrestore() => blk_end_request() b) spin_lock_irqsave() end_that_request_{chunk/first} (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request()) end_that_request_last() spin_unlock_irqrestore() => spin_lock_irqsave() __blk_end_request() spin_unlock_irqsave() c) spin_lock_irqsave() (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request()) end_that_request_last() spin_unlock_irqrestore() => blk_end_request() or spin_lock_irqsave() __blk_end_request() spin_unlock_irqrestore() Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:35:53 +01:00
Pete Wyckoff	482eb68916	block: allow queue dma_alignment of zero Let queue_dma_alignment return 0 if it was specifically set to 0. This permits devices with no particular alignment restrictions to use arbitrary user space buffers without copying. Signed-off-by: Pete Wyckoff <pw@osc.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:04:46 +01:00
Bartlomiej Zolnierkiewicz	7267c33774	ide: remove REQ_TYPE_ATA_CMD Based on the earlier work by Tejun Heo. All users are gone so we can finally remove it. Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-01-26 20:13:13 +01:00
Linus Torvalds	9b73e76f3c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits) [SCSI] usbstorage: use last_sector_bug flag universally [SCSI] libsas: abstract STP task status into a function [SCSI] ultrastor: clean up inline asm warnings [SCSI] aic7xxx: fix firmware build [SCSI] aacraid: fib context lock for management ioctls [SCSI] ch: remove forward declarations [SCSI] ch: fix device minor number management bug [SCSI] ch: handle class_device_create failure properly [SCSI] NCR5380: fix section mismatch [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices [SCSI] IB/iSER: add logical unit reset support [SCSI] don't use __GFP_DMA for sense buffers if not required [SCSI] use dynamically allocated sense buffer [SCSI] scsi.h: add macro for enclosure bit of inquiry data [SCSI] sd: add fix for devices with last sector access problems [SCSI] fix pcmcia compile problem [SCSI] aacraid: add Voodoo Lite class of cards. [SCSI] aacraid: add new driver features flags [SCSI] qla2xxx: Update version number to 8.02.00-k7. [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command. ...	2008-01-25 17:19:08 -08:00
Bartlomiej Zolnierkiewicz	29ed2a5f8c	ide: remove REQ_TYPE_ATA_TASK Based on the earlier work by Tejun Heo. All users are gone so we can finally remove it. Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-01-25 22:17:11 +01:00
James Bottomley	11c3e689f1	[SCSI] block: Introduce new blk_queue_update_dma_alignment interface The purpose of this is to allow stacked alignment settings, with the ultimate queue alignment being set to the largest alignment requirement in the stack. The reason for this is so that the SCSI mid-layer can relax the default alignment requirements (which are basically causing a lot of superfluous copying to go on in the SG_IO interface) while allowing transports, devices or HBAs to add stricter limits if they need them. Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-01-11 18:29:20 -06:00
Alan D. Brunelle	2ad8b1ef11	Add UNPLUG traces to all appropriate places Added blk_unplug interface, allowing all invocations of unplugs to result in a generated blktrace UNPLUG. Signed-off-by: Alan D. Brunelle <Alan.Brunelle@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-09 13:41:32 +01:00
Jens Axboe	6eca9004df	[BLOCK] Fix bad sharing of tag busy list on queues with shared tag maps For the locking to work, only the tag map and tag bit map may be shared (incidentally, I was just explaining this to Nick yesterday, but I apparently didn't review the code well enough myself). But we also share the busy list! The busy_list must be queue private, or we need a block_queue_tag covering lock as well. So we have to move the busy_list to the queue. This'll work fine, and it'll actually also fix a problem with blk_queue_invalidate_tags() which will invalidate tags across all shared queues. This is a bit confusing, the low level driver should call it for each queue seperately since otherwise you cannot kill tags on just a single queue for eg a hard drive that stops responding. Since the function has no callers currently, it's not an issue. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-29 11:33:06 +01:00
Jens Axboe	fd5d806266	block: convert blkdev_issue_flush() to use empty barriers Then we can get rid of ->issue_flush_fn() and all the driver private implementations of that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:05:02 +02:00
Jens Axboe	bf2de6f5a4	block: Initial support for data-less (or empty) barrier support This implements functionality to pass down or insert a barrier in a queue, without having data attached to it. The ->prepare_flush_fn() infrastructure from data barriers are reused to provide this functionality. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:03:56 +02:00
Jens Axboe	a0cd128542	block: add end_queued_request() and end_dequeued_request() helpers We can use this helper in the elevator core for BLKPREP_KILL, and it'll also be useful for the empty barrier patch. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:03:53 +02:00
Jens Axboe	2da96acde0	[BLOCK] Move sector_div() from blkdev.h to kernel.h We need it even if CONFIG_BLOCK is disabled, so move it outside of the block layer include system. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-12 12:40:38 +02:00
NeilBrown	d24517d793	Remove flush_dry_bio_endio The entire function of flush_dry_bio_endio is to undo the effects of bio_endio (when called on a barrier request). So remove the function and the call to bio_endio. This allows us to remove "bi_size" from "struct request_queue". Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./block/ll_rw_blk.c \| 39 ++------------------------------------- ./include/linux/blkdev.h \| 1 - 2 files changed, 2 insertions(+), 38 deletions(-) diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:57 +02:00
Jens Axboe	f5ff8422bb	Fix warnings with !CONFIG_BLOCK Hide everything in blkdev.h with CONFIG_BLOCK isn't set, and fixup the (few) files that fail to build because they were relying on blkdev.h pulling in extra includes for them. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:57 +02:00
NeilBrown	66846572bf	Stop exporting blk_rq_bio_prep blk_rq_bio_prep is exported for use in exactly one place. That place can benefit from using the new blk_rq_append_bio instead. So - change dm-emc to call blk_rq_append_bio - stop exporting blk_rq_bio_prep, and - initialise rq_disk in blk_rq_bio_prep, as dm-emc needs it. Signed-off-by: Neil Brown <neilb@suse.de> diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:56 +02:00
NeilBrown	3001ca7712	New function blk_req_append_bio ll_back_merge_fn is currently exported to SCSI where is it used, together with blk_rq_bio_prep, in exactly the same way these functions are used in __blk_rq_map_user. So move the common code into a new function (blk_rq_append_bio), and don't export ll_back_merge_fn any longer. Signed-off-by: Neil Brown <neilb@suse.de> diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:56 +02:00
NeilBrown	5705f70217	Introduce rq_for_each_segment replacing rq_for_each_bio Every usage of rq_for_each_bio wraps a usage of bio_for_each_segment, so these can be combined into rq_for_each_segment. We define "struct req_iterator" to hold the 'bio' and 'index' that are needed for the double iteration. Signed-off-by: Neil Brown <neilb@suse.de> Various compile fixes by me... Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:56 +02:00
Qi Yong	4e97182a22	[patch] QUEUE_FLAG_READFULL QUEUE_FLAG_WRITEFULL comment fix The two comments were transposed. Signed-off-by: Qi Yong <qiyong@mail.fc-cn.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-27 08:08:24 +02:00
Jens Axboe	71f65e6bd7	[BLOCK] Add request_queue_t and mark it deprecated Andrew thinks I should be nice and allow outside code to at least just compile, so add the request_queue_t typedef back and mark it deprecated. It'll warn people that this type is going away soonish. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-24 10:29:42 +02:00
Jens Axboe	165125e1e4	[BLOCK] Get rid of request_queue_t typedef Some of the code has been gradually transitioned to using the proper struct request_queue, but there's lots left. So do a full sweet of the kernel and get rid of this typedef and replace its uses with the proper type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-24 09:28:11 +02:00
FUJITA Tomonori	41e1703b9b	[SCSI] bsg: unexport sg v3 helper functions blk_fill_sghdr_rq, blk_unmap_sghdr_rq, and blk_complete_sghdr_rq were exported for bsg, however bsg was changed to support only sg v4. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-22 08:48:41 -05:00
Christoph Lameter	2a7326b5bb	CONFIG_BOUNCE to avoid useless inclusion of bounce buffer logic The bounce buffer logic is included on systems that do not need it. If a system does not have zones like ZONE_DMA and ZONE_HIGHMEM that can lead to the use of bounce buffers then there is no need to reserve memory pools etc etc. This is true f.e. for SGI Altix. Also nicifies the Makefile and gets rid of the tricky "and" there. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:02 -07:00
FUJITA Tomonori	abae1fde63	add a struct request pointer to the request structure This adds a struct request pointer to the request structure for the second data phase (bidi for now). A request queue supporting bidi requests sets QUEUE_FLAG_BIDI. This prevents sending bidi requests to a non-bidi queue. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-16 08:52:46 +02:00
FUJITA Tomonori	d351af01b9	bsg: bind bsg to request_queue instead of gendisk This patch binds bsg devices to request_queue instead of gendisk. Any objects (like transport entities) can define own request_handler and create own bsg device. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-16 08:52:46 +02:00
FUJITA Tomonori	45e79a3acd	bsg: add a request_queue argument to scsi_cmd_ioctl() bsg uses scsi_cmd_ioctl() for some SCSI/sg ioctl commands. scsi_cmd_ioctl() gets a request queue from a gendisk arguement. This prevents bsg being bound to SCSI devices that don't have a gendisk (like OSD). This adds a request_queue argument to scsi_cmd_ioctl(). The SCSI/sg ioctl commands doesn't use a gendisk so it's safe for any SCSI devices to use scsi_cmd_ioctl(). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-16 08:52:45 +02:00
FUJITA Tomonori	337ad41dea	block: export blk_verify_command for SG v4 blk_fill_sghdr_rq doesn't work for SG v4 so verify_command needed to be exported. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-16 08:52:44 +02:00
Jens Axboe	3d6392cfbd	bsg: support for full generic block layer SG v3 Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-16 08:52:44 +02:00
Benjamin Gilbert	7deeed1317	[TRIVIAL PATCH] Kill blk_congestion_wait() stub for !CONFIG_BLOCK blk_congestion_wait() doesn't exist anymore, but there's still a stub in blkdev.h for the !CONFIG_BLOCK case. Kill it. Signed-off-by: Benjamin Gilbert <bgilbert@cs.cmu.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:03:32 +02:00
Andrew Morton	19a75d83ff	kblockd: use flush_work Switch the kblockd flushing from a global flush to a more specific flush_work(). (akpm: bypassed maintainers, sorry. There are other patches which depend on this) Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <axboe@suse.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Jens Axboe	4e521c27ee	ll_rw_blk: add io_context private pointer To be used by as/cfq as they see fit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Jens Axboe	aaf1228ddf	cfq-iosched: remove cfq_io_context last_queue It hasn't been used for a while, kill it off and remove the old if 0 code chunk. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	8e5cfc45e7	[PATCH] Fixup blk_rq_unmap_user() API The blk_rq_unmap_user() API is not very nice. It expects the caller to know that rq->bio has to be reset to the original bio, and it will silently do nothing if that is not done. Instead make it explicit that we need to pass in the first bio, by expecting a bio argument. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 11:12:46 +01:00
Jens Axboe	1aa4f24fe9	[PATCH] Remove queue merging hooks We have full flexibility of merging parameters now, so we can remove the hooks that define back/front/request merge strategies. Nobody is using them anymore. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 08:33:11 +01:00
Boaz Harrosh	2b02a17920	[PATCH] remove blk_queue_activity_fn While working on bidi support at struct request level I have found that blk_queue_activity_fn is actually never used. The only user is in ide-probe.c with this code: /* enable led activity for disk drives only */ if (drive->media == ide_disk && hwif->led_act) blk_queue_activity_fn(q, hwif->led_act, drive); And led_act is never initialized anywhere. (Looking back at older kernels it was used in the PPC arch, but was removed around 2.6.18) Unless it is all for future use off course. (this patch is against linux-2.6-block.git as off 2006/12/4) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-12 10:22:23 +01:00
Mike Christie	0e75f9063f	[PATCH] block: support larger block pc requests This patch modifies blk_rq_map/unmap_user() and the cdrom and scsi_ioctl.c users so that it supports requests larger than bio by chaining them together. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-01 10:40:55 +01:00
Andrew Morton	3fcfab16c5	[PATCH] separate bdi congestion functions from queue congestion functions Separate out the concept of "queue congestion" from "backing-dev congestion". Congestion is a backing-dev concept, not a queue concept. The blk_* congestion functions are retained, as wrappers around the core backing-dev congestion functions. This proper layering is needed so that NFS can cleanly use the congestion functions, and so that CONFIG_BLOCK=n actually links. Cc: "Thomas Maier" <balagi@justmail.de> Cc: "Jens Axboe" <jens.axboe@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: David Howells <dhowells@redhat.com> Cc: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:35 -07:00
Thomas Maier	79e2de4bc5	[PATCH] export clear_queue_congested and set_queue_congested Export the clear_queue_congested() and set_queue_congested() functions located in ll_rw_blk.c The functions are renamed to blk_clear_queue_congested() and blk_set_queue_congested(). (needed in the pktcdvd driver's bio write congestion control) Signed-off-by: Thomas Maier <balagi@justmail.de> Cc: Peter Osterlund <petero2@telia.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:35 -07:00
Jens Axboe	cea2885a2e	[PATCH] ide-cd: fix breakage with internally queued commands We still need to maintain a private PC style command, since it isn't completely unified with REQ_TYPE_BLOCK_PC yet. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-10-12 15:08:51 +02:00
David C Somayajulu	f583f4924d	[PATCH] helper function for retrieving scsi_cmd given host based block layer tag This was necessitated by the need for a function to get back to a scsi_cmnd, when an hba the posts its (corresponding) completion interrupt with a block layer tag as its reference. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: David Somayajulu <david.somayajulu@qlogic.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-10-04 19:32:09 +02:00
Andrew Morton	bcfd8d3615	[PATCH] CONFIG_BLOCK: blk_congestion_wait() fix Don't just do nothing: it'll cause busywaits all over writeback and page reclaim. For now, take a fixed-length nap. Will improve when NFS starts waking up throttled processes. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:33 +02:00
David Howells	9361401eb7	[PATCH] BLOCK: Make it possible to disable the block layer [try #6 ] Make it possible to disable the block layer. Not all embedded devices require it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require the block layer to be present. This patch does the following: () Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev support. () Adds dependencies on CONFIG_BLOCK to any configuration item that controls an item that uses the block layer. This includes: () Block I/O tracing. () Disk partition code. () All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS. () The SCSI layer. As far as I can tell, even SCSI chardevs use the block layer to do scheduling. Some drivers that use SCSI facilities - such as USB storage - end up disabled indirectly from this. () Various block-based device drivers, such as IDE and the old CDROM drivers. () MTD blockdev handling and FTL. () JFFS - which uses set_bdev_super(), something it could avoid doing by taking a leaf out of JFFS2's book. () Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is, however, still used in places, and so is still available. () Also made contingent are the contents of linux/mpage.h, linux/genhd.h and parts of linux/fs.h. () Makes a number of files in fs/ contingent on CONFIG_BLOCK. () Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK. () set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK is not enabled. () fs/no-block.c is created to hold out-of-line stubs and things that are required when CONFIG_BLOCK is not set: () Default blockdev file operations (to give error ENODEV on opening). () Makes some /proc changes: () /proc/devices does not list any blockdevs. () /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK. () Makes some compat ioctl handling contingent on CONFIG_BLOCK. () If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if given command other than Q_SYNC or if a special device is specified. () In init/do_mounts.c, no reference is made to the blockdev routines if CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2. () The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return error ENOSYS by way of cond_syscall if so). () The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if CONFIG_BLOCK is not set, since they can't then happen. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:31 +02:00
Jens Axboe	5404bc7a87	[PATCH] Allow file systems to differentiate between data and meta reads We can use this information for making more intelligent priority decisions, and it will also be useful for blktrace. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:42 +02:00
Jens Axboe	dc72ef4ae3	[PATCH] Add blk_start_queueing() helper CFQ implements this on its own now, but it's really block layer knowledge. Tells a device queue to start dispatching requests to the driver, taking care to unplug if needed. Also fixes the issue where as/cfq will invoke a stopped queue, which we really don't want. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:40 +02:00
Jens Axboe	b5deef9012	[PATCH] Make sure all block/io scheduler setups are node aware Some were kmalloc_node(), some were still kmalloc(). Change them all to kmalloc_node(). Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:39 +02:00
Jens Axboe	a3b05e8f58	[PATCH] Kill various deprecated/unused block layer defines/functions Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:38 +02:00
Jens Axboe	fc46379daf	[PATCH] cfq-iosched: kill cfq_exit_lock cfq_exit_lock is protecting two things now: - The per-ioc rbtree of cfq_io_contexts - The per-cfqd linked list of cfq_io_contexts The per-cfqd linked list can be protected by the queue lock, as it is (by definition) per cfqd as the queue lock is. The per-ioc rbtree is mainly used and updated by the process itself only. The only outside use is the io priority changing. If we move the priority changing to not browsing the rbtree, we can remove any locking from the rbtree updates and lookup completely. Let the sys_ioprio syscall just mark processes as having the iopriority changed and lazily update the private cfq io contexts the next time io is queued, and we can remove this locking as well. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:36 +02:00
Jens Axboe	e6a1c874a0	[PATCH] struct request: shrink and optimize some more Move some members around and unionize completion_data and rb_node since they cannot ever be used at the same time. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:35 +02:00
Jens Axboe	cdd6026217	[PATCH] Remove ->rq_status from struct request After Christophs SCSI change, the only usage left is RQ_ACTIVE and RQ_INACTIVE. The block layer sets RQ_INACTIVE right before freeing the request, so any check for RQ_INACTIVE in a driver is a bug and indicates use-after-free. So kill/clean the remaining users, straight forward. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:23 +02:00
Jens Axboe	49171e5c6f	[PATCH] Remove struct request_list from struct request It is always identical to &q->rq, and we only use it for detecting whether this request came out of our mempool or not. So replace it with an additional ->flags bit flag. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:22 +02:00
Jens Axboe	c00895ab2f	[PATCH] Remove ->waiting member from struct request As the comments indicates in blkdev.h, we can fold it into ->end_io_data usage as that is really what ->waiting is. Fixup the users of blk_end_sync_rq(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:29:12 +02:00
Jens Axboe	ff7d145fd9	[PATCH] Add one more pointer to struct request for IO scheduler usage Then we have enough room in the request to get rid of the dynamic allocations in CFQ/AS. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:27:01 +02:00
Jens Axboe	9e2585a8a2	[PATCH] as-iosched: remove arq->is_sync member We can track this in struct request. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>	2006-09-30 20:27:00 +02:00
Jens Axboe	2e662b65f0	[PATCH] elevator: abstract out the rbtree sort handling The rbtree sort/lookup/reposition logic is mostly duplicated in cfq/deadline/as, so move it to the elevator core. The io schedulers still provide the actual rb root, as we don't want to impose any sort of specific handling on the schedulers. Introduce the helpers and rb_node in struct request to help migrate the IO schedulers. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:57 +02:00
Jens Axboe	9817064b68	[PATCH] elevator: move the backmerging logic into the elevator core Right now, every IO scheduler implements its own backmerging (except for noop, which does no merging). That results in duplicated code for essentially the same operation, which is never a good thing. This patch moves the backmerging out of the io schedulers and into the elevator core. We save 1.6kb of text and as a bonus get backmerging for noop as well. Win-win! Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:56 +02:00
Jens Axboe	4aff5e2333	[PATCH] Split struct request ->flags into two parts Right now ->flags is a bit of a mess: some are request types, and others are just modifiers. Clean this up by splitting it into ->cmd_type and ->cmd_flags. This allows introduction of generic Linux block message types, useful for sending generic Linux commands to block devices. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:23:37 +02:00
Alexey Dobriyan	6c5c934153	[PATCH] ifdef blktrace debugging fields Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:09 -07:00
James Bottomley	1aedf2ccc6	Merge mulgrave-w:git/linux-2.6 Conflicts: include/linux/blkdev.h Trivial merge to incorporate tag prototypes.	2006-09-23 21:03:52 -05:00
Trond Myklebust	275a082fe9	Add a real API for dealing with blk_congestion_wait() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
James Bottomley	492dfb4896	[SCSI] block: add support for shared tag maps The current block queue implementation already contains most of the machinery for shared tag maps. The only remaining pieces are a way to allocate and destroy a tag map independently of the queues (so that the maps can be managed on the life cycle of the overseeing entity) Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2006-08-31 11:17:18 -04:00
Jens Axboe	8f34ee75de	[PATCH] Rearrange a few struct request members This saves 8 bytes of data in 64-bit archs. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	ad3caddaa1	[PATCH] Get rid of struct request request_pm_state member The IDE power management can just use the ->end_io_data member to store it's data. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	b31dc66a54	[PATCH] Kill PF_SYNCWRITE flag A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Linus Torvalds	28e4b22495	Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (85 commits) [SCSI] 53c700: remove reliance on deprecated cmnd fields [SCSI] hptiop: don't use cmnd->bufflen [SCSI] hptiop: HighPoint RocketRAID 3xxx controller driver [SCSI] aacraid: small misc. cleanups [SCSI] aacraid: Update supported product information [SCSI] aacraid: Fix return code interpretation [SCSI] scsi_transport_sas: fix panic in sas_free_rphy [SCSI] remove RQ_SCSI_* flags [SCSI] remove scsi_request infrastructure [SCSI] mptfusion: change driver revision to 3.03.10 [SCSI] mptfc: abort of board reset leaves port dead requiring reboot [SCSI] mptfc: fix fibre channel infinite request/response loop [SCSI] mptfc: set fibre channel fw target missing timers to one second [SCSI] mptfusion: move fc event/reset handling to mptfc [SCSI] spi transport: don't allow dt to be set on SE or HVD buses [SCSI] aic7xxx: expose the bus setting to sysfs [SCSI] scsi: remove Documentation/scsi/cpqfc.txt [SCSI] drivers/scsi: Use ARRAY_SIZE macro [SCSI] Remove last page_address from dc395x.c [SCSI] hptiop: HighPoint RocketRAID 3xxx controller driver ... Fixed up conflicts in drivers/message/fusion/mptbase.c manually (due to the sparc interrupt cleanups)	2006-06-21 11:18:25 -07:00
Christoph Hellwig	8d7feac3c7	[SCSI] remove RQ_SCSI_* flags The RQ_SCSI_* flags are a vestiage of a long past history. The EH code still sets them but we never make use of that information. The other users is pluto.c which never had a chance to work but needs to be kept compiling to keep Davem happy, so copy over the definition there. We could probably get rid of RQ_ACTIVE/RQ_INACTIVE aswell with some work, there's only two more or less bogus looking uses in ubd and scsi. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2006-06-10 16:25:21 -05:00
David Woodhouse	62c4f0a2d5	Don't include linux/config.h from anywhere else in include/ Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-04-26 12:56:16 +01:00
Christoph Hellwig	21b2f0c803	[SCSI] unify SCSI_IOCTL_SEND_COMMAND implementations We currently have two implementations of this obsolete ioctl, one in the block layer and one in the scsi code. Both of them have drawbacks. This patch kills the scsi layer version after updating the block version with the missing bits: - argument checking - use scatterlist I/O - set number of retries based on the submitted command This is the last user of non-S/G I/O except for the gdth driver, so getting this in ASAP and through the scsi tree would be nie to kill the non-S/G I/O path. Jens, what do you think about adding a check for non-S/G I/O in the midlayer? Thanks to Or Gerlitz for testing this patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2006-04-13 10:13:15 -05:00
Jens Axboe	206dc69b31	[BLOCK] cfq-iosched: seek and async performance fixes Detect whether a given process is seeky and if so disable (mostly) the idle window if it is. We still allow just a little idle time, just enough to allow that process to submit a new request. That is needed to maintain fairness across priority groups. In some cases, we could setup several async queues. This is not optimal from a performance POV, since we want all async io in one queue to perform good sorting on it. It also impacted sync queues, as async io got too much slice time. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-28 13:03:44 +02:00
Jens Axboe	e2d74ac066	[PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree On setups with many disks, we spend a considerable amount of time looking up the process-disk mapping on each queue of io. Testing with a NULL based block driver, this costs 40-50% reduction in throughput for 1000 disks. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-28 08:59:01 +02:00
Jens Axboe	2056a782f8	[PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23 Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-23 20:00:26 +01:00
Al Viro	483f4afc42	[PATCH] fix sysfs interaction and lifetime rules handling for queues	2006-03-18 18:34:37 -05:00
Al Viro	d9ff418793	[PATCH] make cfq_exit_queue() prune the cfq_io_context for that queue Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-03-18 18:34:07 -05:00
Al Viro	12a0573215	[PATCH] keep sync and async cfq_queue separate Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-03-18 18:34:02 -05:00
Jens Axboe	2cb2e147a6	[BLOCK] ll_rw_blk: make max_sectors and max_hw_sectors unsigned ints IDE lba48 can support full 64k request size, which overflows the max_hw_sectors variable. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-01-24 10:06:19 +01:00
Linus Torvalds	e2688f00dc	Merge branch 'blk-softirq' of git://brick.kernel.dk/data/git/linux-2.6-block Manual merge for trivial #include changes	2006-01-09 09:26:40 -08:00
Jens Axboe	ff856bad67	[BLOCK] ll_rw_blk: Enable out-of-order request completions through softirq Request completion can be a quite heavy process, since it needs to iterate through the entire request and complete the bio's it holds. This patch adds blk_complete_request() which moves this processing into a dedicated block softirq. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-01-09 16:02:34 +01:00
Jens Axboe	356cebea11	[BLOCK] Kill blk_attempt_remerge() It's a broken interface, it's done way too late. And apparently it triggers slab problems in recent kernels as well (most likely after the generic dispatch code was merged). So kill it, ide-cd is the only user of it. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-01-09 15:30:20 +01:00
Jens Axboe	15fc858a00	[BLOCK] Correct blk_execute_rq_nowait() prototype	2006-01-06 10:00:50 +01:00
Tejun Heo	797e7dbbee	[BLOCK] reimplement handling of barrier request Reimplement handling of barrier requests. * Flexible handling to deal with various capabilities of target devices. * Retry support for falling back. * Tagged queues which don't support ordered tag can do ordered. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-01-06 09:51:03 +01:00
Tejun Heo	8ffdc6550c	[BLOCK] add @uptodate to end_that_request_last() and @error to rq_end_io_fn() add @uptodate argument to end_that_request_last() and @error to rq_end_io_fn(). there's no generic way to pass error code to request completion function, making generic error handling of non-fs request difficult (rq->errors is driver-specific and each driver uses it differently). this patch adds @uptodate to end_that_request_last() and @error to rq_end_io_fn(). for fs requests, this doesn't really matter, so just using the same uptodate argument used in the last call to end_that_request_first() should suffice. imho, this can also help the generic command-carrying request jens is working on. Signed-off-by: tejun heo <htejun@gmail.com> Signed-Off-By: Jens Axboe <axboe@suse.de>	2006-01-06 09:49:03 +01:00
Mike Christie	defd94b754	[SCSI] seperate max_sectors from max_hw_sectors - export __blk_put_request and blk_execute_rq_nowait needed for async REQ_BLOCK_PC requests - seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was already testing against max_sectors and SCSI-ml was setting max_sectors and max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set a valid max_hw_sectors for all LLDs. Today if a LLD does not set it SCSI-ml sets it to a safe default and some LLDs set it to a artificial low value to overcome memory and feedback issues. Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024, drivers that used to call blk_queue_max_sectors with a large value of max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2005-12-15 15:11:40 -08:00
Mike Christie	17e01f216b	[SCSI] add retries field to request for REQ_BLOCK_PC use For tape we need to control the retries. This patch adds a retries counter on the request for REQ_BLOCK_PC commands originating from scsi_execute* to use. REQ_BLOCK_PC commands comming from the block layer SG_IO path continue to use the retires set in the ULD init_command. (scsi_execute* does not set the gendisk so we do not execute the init_command in that path). Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2005-12-14 19:04:11 -08:00
Mike Christie	6e39b69e7e	[SCSI] export blk layer functions needed for blk_execute_rq_nowait To send async requests we need these two functions exported. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2005-12-14 19:00:50 -08:00
Tejun Heo	15853af9f0	[BLOCK] Implement elv_drain_elevator for improved switch error detection This patch adds request_queue->nr_sorted which keeps the number of requests in the iosched and implement elv_drain_elevator which performs forced dispatching. elv_drain_elevator checks whether iosched actually dispatches all requests it has and prints error message if it doesn't. As buggy forced dispatching can result in wrong barrier operations, I think this extra check is worthwhile. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2005-11-12 10:56:06 +01:00
Linus Torvalds	5dd962494f	Merge branch 'elevator-switch' of git://brick.kernel.dk/data/git/linux-2.6-block Manual fixup for trivial "gfp_t" changes.	2005-10-28 08:56:34 -07:00
Linus Torvalds	28d721e24c	Merge branch 'generic-dispatch' of git://brick.kernel.dk/data/git/linux-2.6-block	2005-10-28 08:53:49 -07:00
Al Viro	8267e268e0	[PATCH] gfp_t: block layer core Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-10-28 08:16:47 -07:00
Jens Axboe	64521d1a3b	[BLOCK] elevator switch fixes/cleanup - 100msec sleep is a little excessive, lots of requests can complete in that timeframe. Use 10msec instead. - Rename QUEUE_FLAG_BYPASS to QUEUE_FLAG_ELVSWITCH to indicate what is going on. Signed-off-by: Jens Axboe <axboe@suse.de>	2005-10-28 08:48:23 +02:00
Tejun Heo	cb98fc8bb9	[BLOCK] Reimplement elevator switch This patch reimplements elevator switch. This patch assumes generic dispatch queue patchset is applied. * Each request is tagged with REQ_ELVPRIV flag if it has its elevator private data set. * Requests which doesn't have REQ_ELVPRIV flag set never enter iosched. They are always directly back inserted to dispatch queue. Of course, elevator_put_req_fn is called only for requests which have its REQ_ELVPRIV set. * Request queue maintains the current number of requests which have its elevator data set (elevator_set_req_fn called) in q->rq->elvpriv. * If a request queue has QUEUE_FLAG_BYPASS set, elevator private data is not allocated for new requests. To switch to another iosched, we set QUEUE_FLAG_BYPASS and wait until elvpriv goes to zero; then, we attach the new iosched and clears QUEUE_FLAG_BYPASS. New implementation is much simpler and main code paths are less cluttered, IMHO. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2005-10-28 08:48:12 +02:00
Tejun Heo	cb19833dcc	[BLOCK] kill generic max_back_kb handling This patch kills max_back_kb handling from elv_dispatch_sort() and kills max_back_kb field from struct request_queue. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2005-10-28 08:46:01 +02:00
Tejun Heo	06b86245c0	[PATCH] 03/05 move last_merge handlin into generic elevator code Currently, both generic elevator code and specific ioscheds participate in the management and usage of last_merge. This and the following patches move last_merge handling into generic elevator code. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2005-10-28 08:45:20 +02:00
Jens Axboe	1b47f531e2	[PATCH] generic dispatch fixes - Split elv_dispatch_insert() into two functions - Rename rq_last_sector() to rq_end_sector() Signed-off-by: Jens Axboe <axboe@suse.de>	2005-10-28 08:44:37 +02:00
Tejun Heo	8922e16cf6	[PATCH] 01/05 Implement generic dispatch queue Implements generic dispatch queue which can replace all dispatch queues implemented by each iosched. This reduces code duplication, eases enforcing semantics over dispatch queue, and simplifies specific ioscheds. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2005-10-28 08:44:24 +02:00
Adrian Bunk	2befb9e36d	[PATCH] include/linux/blkdev.h: "extern inline" -> "static inline" "extern inline" doesn't make much sense. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-09-10 10:06:34 -07:00
James Bottomley	31151ba2ce	fix mismerge in ll_rw_blk.c	2005-08-28 10:43:07 -05:00
Tejun Heo	ba02508248	[PATCH] blk: fix tag shrinking (revive real_max_size) My patch in commit `fa72b903f7` incorrectly removed blk_queue_tag->real_max_depth. The original resize implementation was incorrect in the following points. * actual allocation size of tag_index was shorter than real_max_size, but assumed to be of the same size, possibly causing memory access beyond the allocated area. * bits in tag_map between max_deptn and real_max_depth were initialized to 1's, making the tags permanently reserved. In an attempt to fix above two bugs, I had removed allocation optimization in init_tag_map and real_max_size. Tag map/index were allocated and freed immediately during resize. Unfortunately, I wasn't considering that tag map/index can be resized dynamically with tags beyond new_depth active. This led to accessing freed area after shrinking tags and led to the following bug reporting thread on linux-scsi. http://marc.theaimsgroup.com/?l=linux-scsi&m=112319898111885&w=2 To fix the problem, I've revived real_max_depth without allocation optimization in init_tag_map, and Andrew Vasquez confirmed that the problem was fixed. As Jens is not going to be available for a week, he asked me to make sure that this patch reaches you. http://marc.theaimsgroup.com/?l=linux-scsi&m=112325778530886&w=2 Also, a comment was added to make sure that real_max_size is needed for dynamic shrinking. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-05 13:43:16 -07:00
Nick Piggin	fb3cc4320e	[PATCH] blk: light iocontext ops get_io_context needlessly turned off interrupts and checked for racing io context creations. Both of which aren't needed, because the io context can only be created while in process context of the current process. Also, split the function in 2. A light version, current_io_context does not elevate the reference count specifically, but can be used when in process context, because the process holds a reference itself. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-28 21:20:35 -07:00
Jens Axboe	22e2c507c3	[PATCH] Update cfq io scheduler to time sliced design This updates the CFQ io scheduler to the new time sliced design (cfq v3). It provides full process fairness, while giving excellent aggregate system throughput even for many competing processes. It supports io priorities, either inherited from the cpu nice value or set directly with the ioprio_get/set syscalls. The latter closely mimic set/getpriority. This import is based on my latest from -mm. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-27 14:33:29 -07:00
Adrian Bunk	93d17d3d84	[PATCH] drivers/block/ll_rw_blk.c: cleanups This patch contains the following cleanups: - make needlessly global code static - remove the following unused global functions: - blkdev_scsi_issue_flush_fn - __blk_attempt_remerge - remove the following unused EXPORT_SYMBOL's: - blk_phys_contig_segment - blk_hw_contig_segment - blkdev_scsi_issue_flush_fn - __blk_attempt_remerge Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:25:05 -07:00
Tejun Heo	f7d37d028d	[PATCH] blk: remove BLK_TAGS_{PER_LONG\|MASK} Replace BLK_TAGS_PER_LONG with BITS_PER_LONG and remove unused BLK_TAGS_MASK. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:15 -07:00
Tejun Heo	fa72b903f7	[PATCH] blk: remove blk_queue_tag->real_max_depth optimization blk_queue_tag->real_max_depth was used to optimize out unnecessary allocations/frees on tag resize. However, the whole thing was very broken - tag_map was never allocated to real_max_depth resulting in access beyond the end of the map, bits in [max_depth..real_max_depth] were set when initializing a map and copied when resizing resulting in pre-occupied tags. As the gain of the optimization is very small, well, almost nill, remove the whole thing. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:15 -07:00

... 3 4 5 6 7 ...

458 Commits (180b2f95dd331010a9930a65c8a18d6d81b94dc1)