Commit message  Author  Age
* Comment typo fixes for 'descriptor'Justin P. Mattock2011-01-19
| | | | | Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* replace obsolete CONFIG_DYNAMIC_PRINTK_DEBUG with CONFIG_DYNAMIC_DEBUGJim Cromie2011-01-19
    The former is obsoleted by the latter, as done by

      commit e9d376f0fa66bd630fe27403669c6ae6c22a868f
      Author: Jason Baron <jbaron@redhat.com>
      Date:   Thu Feb 5 11:51:38 2009 -0500

    Most defconfig mentions have been removed in the big defconfig cleanup,
    but the one in s6105_defconfig remains.

    Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* scsi: Remove unnecessary casts of void ptr returning alloc function return values  Jesper Juhl  2011-01-19
    The [vk][cmz]alloc(_node) family of functions returns void pointers,
    which it is completely unnecessary/pointless to cast to other pointer
    types since the conversion happens implicitly.

    This patch removes such casts from drivers/scsi/

    Signed-off-by: Jesper Juhl <jj@chaosbits.net>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
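As a user-space illustration of the point made in this patch: in C a `void *` converts implicitly to any object pointer type, so casting the allocator's return value adds nothing. The sketch below uses `calloc` from the C library as a stand-in for the kernel allocators; `struct widget` and `widget_alloc` are hypothetical names, not taken from drivers/scsi/.

```c
#include <stdlib.h>

/* Hypothetical structure, used only for illustration. */
struct widget {
    int id;
    char name[16];
};

/*
 * Allocate a zeroed widget. No cast is needed on the calloc() result:
 * void * converts implicitly to struct widget * in C.
 */
struct widget *widget_alloc(int id)
{
    struct widget *w = calloc(1, sizeof(*w));

    if (w)
        w->id = id;
    return w;
}
```

Writing `sizeof(*w)` rather than `sizeof(struct widget)` keeps the allocation correct if the pointer type changes later, which is one more reason the cast is not missed.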
* drivers/message/, i2o: Remove unnecessary casts of void ptr returning alloc function return values  Jesper Juhl  2011-01-19
    The [vk][cmz]alloc(_node) family of functions returns void pointers,
    which it is completely unnecessary/pointless to cast to other pointer
    types since the conversion happens implicitly.

    This patch removes such casts from drivers/message/

    Signed-off-by: Jesper Juhl <jj@chaosbits.net>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* GPU DRM: Remove unnecessary casts of void ptr returning alloc function return values  Jesper Juhl  2011-01-19
    The [vk][cmz]alloc(_node) family of functions returns void pointers,
    which it is completely unnecessary/pointless to cast to other pointer
    types since the conversion happens implicitly.

    This patch removes such casts from drivers/gpu/drm/

    Signed-off-by: Jesper Juhl <jj@chaosbits.net>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* Kconfig: BLK_THROTTLE -> BLK_DEV_THROTTLINGMichael Witten2011-01-17
    It would seem that `CONFIG_BLK_THROTTLE' doesn't exist, as it is only
    referenced in the documentation for `CONFIG_BLK_CGROUP'. The only other
    choice is `CONFIG_BLK_DEV_THROTTLING':

      $ git grep --cached THROTTL -- \*Kconfig
      block/Kconfig:config BLK_DEV_THROTTLING
      init/Kconfig:     CONFIG_BLK_THROTTLE=y.

    Signed-off-by: Michael Witten <mfwitten@gmail.com>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* Kconfig: Typo: seti -> setMichael Witten2011-01-17
| | | | | | | Also, I introduced some punctuation to facilitate reading. Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* Documentation: dm-crypt: update cryptsetup homepageAndrea Gelmini2011-01-13
| | | | | | Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Acked-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* Merge branch 'release' of ↵Linus Torvalds2011-01-13
|\ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Fix format warning in arch/ia64/kernel/acpi.c
| * [IA64] Fix format warning in arch/ia64/kernel/acpi.cTony Luck2011-01-12
| | | | | | | | | | | | | | | | | | arch/ia64/kernel/acpi.c:481: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long unsigned int’ Introduced by commit 05f2f274c8a8747bbfb13ac8ee0c27d5f2ad8510 [IA64] Avoid array overflow if there are too many cpus in SRAT table Signed-off-by: Tony Luck <tony.luck@intel.com>
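The warning quoted above is the usual mismatch between a `%d` conversion and an `unsigned long` argument. The actual message text in arch/ia64/kernel/acpi.c is not shown in this log, so the following is only a sketch with a hypothetical helper, `format_cpu_limit`, showing the matching `%lu` conversion.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format an unsigned long count into buf. %lu matches the argument
 * type; using %d here would trigger the warning quoted above.
 * The message text is made up for this example.
 */
int format_cpu_limit(char *buf, size_t len, unsigned long max_cpus)
{
    return snprintf(buf, len, "max %lu cpus", max_cpus);
}
```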
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-01-13
|\ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: firewire: ohci: fix compilation on arches without PAGE_KERNEL_RO
| * | firewire: ohci: fix compilation on arches without PAGE_KERNEL_ROClemens Ladisch2011-01-13
    PAGE_KERNEL_RO is not available on all architectures, so its use in the
    new AR code broke compilation on sparc64.  Because the read-only mapping
    was just a debugging aid, just use PAGE_KERNEL instead.

    Signed-off-by: Clemens Ladisch <clemens@ladisch.de>

    James Bottomley wrote:
    > On Thu, 2011-01-13 at 08:27 +0100, Clemens Ladisch wrote:
    >> firewire: ohci: fix compilation on arches without PAGE_KERNEL_RO, e.g. sparc
    >>
    >> PAGE_KERNEL_RO is not available on all architectures, so its use in the
    >> new AR code broke compilation on sparc64.
    >>
    >> Because the R/O mapping is only used to catch drivers that try to write
    >> to the reception buffer and not actually required for correct operation,
    >> we can just use a normal PAGE_KERNEL mapping where _RO is not available.
    [...]
    >> +/*
    >> + * For archs where PAGE_KERNEL_RO is not supported;
    >> + * mapping the AR buffers readonly for the CPU is just a debugging aid.
    >> + */
    >> +#ifndef PAGE_KERNEL_RO
    >> +#define PAGE_KERNEL_RO PAGE_KERNEL
    >> +#endif
    >
    > This might cause interesting issues on sparc64 if it ever acquired a
    > PAGE_KERNEL_RO.  Sparc64 has extern pgprot_t for its PAGE_KERNEL types
    > rather than #defines, so the #ifdef check wouldn't see this.
    >
    > I think either PAGE_PROT_RO becomes part of our arch API (so all
    > architectures are forced to add it), or, if it's not part of the API,
    > ohci isn't entitled to use it.  The latter seems simplest since you have
    > no real use for write protection anyway.

    Reported-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
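The objection in the quoted review is that `#ifndef` only sees preprocessor macros, never variables. The user-space sketch below reproduces that effect under stated assumptions: `pgprot_demo_t` and the numeric values are made up, and the two globals merely stand in for an architecture that exports its protection values as variables. It illustrates the rejected fallback, not the committed fix, which simply uses PAGE_KERNEL.

```c
/* Hypothetical stand-in for pgprot_t. */
typedef unsigned long pgprot_demo_t;

/* Variable-style definitions, as an architecture might provide them. */
pgprot_demo_t PAGE_KERNEL = 0x3;     /* assumed value: read-write */
pgprot_demo_t PAGE_KERNEL_RO = 0x1;  /* assumed value: read-only */

/*
 * Fallback from the quoted patch. The preprocessor cannot see the
 * variable above, so this branch is taken even though a real
 * PAGE_KERNEL_RO exists.
 */
#ifndef PAGE_KERNEL_RO
#define PAGE_KERNEL_RO PAGE_KERNEL
#endif

/* Returns the protection a driver would end up using. */
pgprot_demo_t ar_buffer_prot(void)
{
    return PAGE_KERNEL_RO;   /* silently expands to PAGE_KERNEL */
}
```

The driver compiles cleanly and quietly gets the read-write mapping, which is the "interesting issue" the review warns about.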
* | | Merge branch 'for-2.6.38/drivers' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2011-01-13
|\ \ \ | | | | | | | | | | | | | | | | | | | | * 'for-2.6.38/drivers' of git://git.kernel.dk/linux-2.6-block: cciss: reinstate proper FIFO order of command queue list floppy: replace NO_GEOM macro with a function
| * | | cciss: reinstate proper FIFO order of command queue listJens Axboe2011-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8a3173de inadvertently changed the ordering when switching to hlists. Change to regular list heads so we can use tail list adds, this improves performance. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
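To illustrate why the insertion point matters for queue ordering, here is a minimal, self-contained sketch. It does not use the kernel list API or the cciss data structures; `struct cmd`, `struct cmd_queue` and the three helpers are hypothetical. Head insertion yields last-in-first-out order, tail insertion restores first-in-first-out.

```c
#include <stddef.h>

/* Hypothetical command and queue types, for illustration only. */
struct cmd {
    int tag;
    struct cmd *next;
};

struct cmd_queue {
    struct cmd *head;
    struct cmd *tail;
};

/* Head insertion: the newest command is dequeued first (LIFO). */
void queue_add_head(struct cmd_queue *q, struct cmd *c)
{
    c->next = q->head;
    q->head = c;
    if (!q->tail)
        q->tail = c;
}

/* Tail insertion: commands are dequeued in submission order (FIFO). */
void queue_add_tail(struct cmd_queue *q, struct cmd *c)
{
    c->next = NULL;
    if (q->tail)
        q->tail->next = c;
    else
        q->head = c;
    q->tail = c;
}

/* Remove and return the first command, or NULL if the queue is empty. */
struct cmd *queue_pop(struct cmd_queue *q)
{
    struct cmd *c = q->head;

    if (c) {
        q->head = c->next;
        if (!q->head)
            q->tail = NULL;
    }
    return c;
}
```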
| * | | floppy: replace NO_GEOM macro with a functionPekka Enberg2010-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces the NO_GEOM macro with a proper static inline function and converts an open-coded caller in check_floppy_change() to use it. Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | | Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2011-01-13
|\ \ \ \
    * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
      block: ensure that completion error gets properly traced
      blktrace: add missing probe argument to block_bio_complete
      block cfq: don't use atomic_t for cfq_group
      block cfq: don't use atomic_t for cfq_queue
      block: trace event block fix unassigned field
      block: add internal hd part table references
      block: fix accounting bug on cross partition merges
      kref: add kref_test_and_get
      bio-integrity: mark kintegrityd_wq highpri and CPU intensive
      block: make kblockd_workqueue smarter
      Revert "sd: implement sd_check_events()"
      block: Clean up exit_io_context() source code.
      Fix compile warnings due to missing removal of a 'ret' variable
      fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
      block: convert !IS_ERR(p) && p to !IS_ERR_OR_NULL(p)
      cfq-iosched: don't check cfqg in choose_service_tree()
      fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
      cdrom: export cdrom_check_events()
      sd: implement sd_check_events()
      sr: implement sr_check_events()
      ...
| * \ \ \ Merge branch 'for-2.6.38/event-handling' into for-2.6.38/coreJens Axboe2011-01-13
| |\ \ \ \
| | * | | | Revert "sd: implement sd_check_events()"Jens Axboe2010-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit c8d2e937355d02db3055c2fc203e5f017297ee1f. We run into merging problems with the SCSI tree, revert this one so it can be handled by a postmerge tree there. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | cdrom: export cdrom_check_events()Jens Axboe2010-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's used by sr, so we need to export it. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | sd: implement sd_check_events()Tejun Heo2010-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace sd_media_change() with sd_check_events(). sd used to set the changed state whenever the device is not ready, which can cause event loop while the device is not ready. Media presence handling code is changed such that the changed state is set iff the media presence actually changes. UA still always sets the changed state and NOT_READY always (at least where it used to set ->changed) clears media presence, so no event is lost. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | sr: implement sr_check_events()Tejun Heo2010-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace sr_media_change() with sr_check_events(). It normally only uses GET_EVENT_STATUS_NOTIFICATION to check both media change and eject request. If @clearing includes DISK_EVENT_MEDIA_CHANGE, it issues TUR and compares whether media presence has changed. The SCSI specific media change uevent is kept for compatibility. sr_media_change() was doing both media change check and revalidation. The revalidation part is split into sr_block_revalidate_disk(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | scsi: replace sr_test_unit_ready() with scsi_test_unit_ready()Tejun Heo2010-12-16
    The usage of TUR has been confusing involving several different commits
    updating different parts over time.  Currently, the only differences
    between scsi_test_unit_ready() and sr_test_unit_ready() are,

    * scsi_test_unit_ready() also sets sdev->changed on NOT_READY.

    * scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or
      NOT_READY.

    Due to the above two differences, sr is using its own
    sr_test_unit_ready(), but sd - the sole user of the above extra
    handling - doesn't even need them.

    Where scsi_test_unit_ready() is used in sd_media_changed(), the code is
    looking for device ready w/ media present state which is true iff TUR
    succeeds w/o sense data or UA, and when the device is not ready for
    whatever reason sd_media_changed() explicitly marks media as missing so
    there's no reason to set sdev->changed automatically from
    scsi_test_unit_ready() on NOT_READY.

    Drop both special handlings from scsi_test_unit_ready(), which makes it
    equivalent to sr_test_unit_ready(), and replace sr_test_unit_ready()
    with scsi_test_unit_ready().  Also, drop the unnecessary explicit
    NOT_READY check from sd_media_changed().  Checking return value is
    enough for testing device readiness.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | scsi: fix TUR error handling in sr_media_change()Tejun Heo2010-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sr_test_unit_ready() returns 0 iff TUR succeeded - IOW, when media is present and the device is actually ready, so the return value wouldn't be zero when TUR ends with sense data. sr_media_change() incorrectly tests (retval || (scsi_sense_valid(sshdr)...)) when it tries to test whether TUR failed without sense data or with sense data indicating media-not-present. Fix the test using scsi_status_is_good() and update comments. - Fixed a comment typo spotted by Eike. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | cdrom: add ->check_events() supportTejun Heo2010-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In principle, cdrom just needs to pass through ->check_events() but CDROM_MEDIA_CHANGED ioctl makes things a bit more complex. Just as with ->media_changed() support, cdrom code needs to buffer the events and serve them to ioctl and vfs as requested. As the code has to deal with both ->check_events() and ->media_changed(), and vfs and ioctl event buffering, this patch adds check_events caching on top of the existing cdi->mc_flags buffering. It may be a good idea to deprecate CDROM_MEDIA_CHANGED ioctl and remove all this mess. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | implement in-kernel gendisk events handlingTejun Heo2010-12-16
    Currently, media presence polling for removable block devices is done
    from userland.  There are several issues with this.

    * Polling is done by periodically opening the device.  For SCSI
      devices, the command sequence generated by such action involves a
      few different commands including TEST_UNIT_READY.  This behavior,
      while perfectly legal, is different from Windows which only issues
      a single command, GET_EVENT_STATUS_NOTIFICATION.  Unfortunately,
      some ATAPI devices lock up after being periodically queried with
      such command sequences.

    * There is no reliable and unintrusive way for a userland program to
      tell whether the target device is safe for media presence polling.
      For example, polling for media presence during an on-going burning
      session can make it fail.  The polling program can avoid this by
      opening the device with O_EXCL but then it risks making a valid
      exclusive user of the device fail w/ -EBUSY.

    * Userland polling is unnecessarily heavy and in-kernel implementation
      is lighter and better coordinated (workqueue, timer slack).

    This patch implements a framework for in-kernel disk event handling,
    which includes media presence polling.

    * bdops->check_events() is added, which supersedes ->media_changed().
      It should check whether there's any pending event and return them
      if so.  Currently, two events are defined - DISK_EVENT_MEDIA_CHANGE
      and DISK_EVENT_EJECT_REQUEST.  ->check_events() is guaranteed not to
      be called in parallel.

    * gendisk->events and ->async_events are added.  These should be
      initialized by block driver before passing the device to add_disk().
      The former contains the mask of all supported events and the latter
      the mask of all events which the device can report without polling.
      /sys/block/*/events[_async] export these to userland.

    * Kernel parameter block.events_dfl_poll_msecs controls the system
      polling interval (default is 0 which means disable) and
      /sys/block/*/events_poll_msecs control polling intervals for
      individual devices (default is -1 meaning use system setting).  Note
      that if a device can report all supported events asynchronously and
      its polling interval isn't explicitly set, the device won't be
      polled regardless of the system polling interval.

    * If a device is opened exclusively with write access, event checking
      is automatically disabled until all write exclusive accesses are
      released.

    * There are event 'clearing' events.  For example, both of currently
      defined events are cleared after the device has been successfully
      opened.  This information is passed to ->check_events() callback
      using @clearing argument as a hint.

    * Event checking is always performed from system_nrt_wq and timer
      slack is set to 25% for polling.

    * Nothing changes for drivers which implement ->media_changed() but
      not ->check_events().  Going forward, all drivers will be converted
      to ->check_events() and ->media_changed() will be dropped.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: Jan Kara <jack@suse.cz>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
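The polling-interval rules stated in this commit message can be summarized in a small decision function. This is a sketch of the rules as described, not the kernel's code: `effective_poll_msecs` is a hypothetical helper, and the numeric values of the event bits are assumed for the example.

```c
/* Event bits; the names come from the commit message, the values are assumed. */
enum {
    DISK_EVENT_MEDIA_CHANGE  = 1 << 0,
    DISK_EVENT_EJECT_REQUEST = 1 << 1,
};

/*
 * Return the effective polling interval in milliseconds; 0 means the
 * device is not polled.
 *
 * events         - mask of all events the device supports
 * async_events   - mask of events the device reports without polling
 * dev_poll_msecs - per-device setting, -1 meaning "use system setting"
 * dfl_poll_msecs - system-wide default, 0 meaning "disabled"
 */
unsigned long effective_poll_msecs(unsigned int events,
                                   unsigned int async_events,
                                   long dev_poll_msecs,
                                   unsigned long dfl_poll_msecs)
{
    /* An explicit per-device setting always wins. */
    if (dev_poll_msecs >= 0)
        return (unsigned long)dev_poll_msecs;

    /* Not set explicitly: poll only if some event requires polling. */
    if (events & ~async_events)
        return dfl_poll_msecs;

    /* Every supported event is reported asynchronously. */
    return 0;
}
```

For example, a device that reports all of its events asynchronously is left alone under a 2000 ms system default, but is polled every 500 ms once its own interval is set to 500.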
| | * | | | block: move register_disk() and del_gendisk() to block/genhd.cTejun Heo2010-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no reason for register_disk() and del_gendisk() to be in fs/partitions/check.c. Move both to genhd.c. While at it, collapse unlink_gendisk(), which was artificially in a separate function due to genhd.c / check.c split, into del_gendisk(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | block: kill genhd_media_change_notify()Tejun Heo2010-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no user of the facility. Kill it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | block: ensure that completion error gets properly tracedJens Axboe2011-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We normally just use the BIO_UPTODATE flag to signal 0/-EIO. If we have more information available, we should pass that along to the trace output. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | blktrace: add missing probe argument to block_bio_completeMathieu Desnoyers2011-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blktrace.c block bio complete callback needs to gain a new argument to reflect the newly added "error" tracepoint argument. This is needed to match the new block_bio_complete TRACE_EVENT as of commit de983a7bfcb7c020901ca6e2314cf55a4207ab5a. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Jeff Moyer <jmoyer@redhat.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | block cfq: don't use atomic_t for cfq_groupShaohua Li2011-01-07
    cfq_group->ref is used with queue_lock held; the only exception is
    cfq_set_request, which looks like a bug to me. So ref doesn't need to
    be an atomic, and atomic operations are slower.

    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | block cfq: don't use atomic_t for cfq_queueShaohua Li2011-01-07
    cfq_queue->ref is used with queue_lock held, so ref doesn't need to be
    an atomic, and atomic operations are slower.

    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | block: trace event block fix unassigned fieldJeff Moyer2011-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "error" field in block_bio_complete is not assigned, leaving the memory area uninitialized (keeping garbage data). Pass an additional tracepoint argument to this event to initialize this field. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: Li Zefan <lizf@cn.fujitsu.com> CC: Alan.Brunelle@hp.com Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | block: add internal hd part table referencesJens Axboe2011-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't use krefs since it's apparently restricted to very basic reference counting. This reverts commit e4a683c8. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | block: fix accounting bug on cross partition mergesJerome Marchand2011-01-05
    /proc/diskstats would display a strange output as follows.

    $ cat /proc/diskstats |grep sda
    8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
    8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                           ~~~~~~~~~~
    8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
    8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92
    8 4 sda4 4 0 8 0 0 0 0 0 0 0 0
    8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

    The cause is incorrect accounting of hd_struct->in_flight when a bio is
    merged into a request belonging to a different partition by
    ELEVATOR_FRONT_MERGE.

    The detailed root cause is as follows.

    Assuming that there are two partitions, sda1 and sda2.

    1. A request for sda2 is in request_queue. Hence sda1's
       hd_struct->in_flight is 0 and sda2's is 1.

            | hd_struct->in_flight
       ---------------------------
       sda1 |          0
       sda2 |          1
       ---------------------------

    2. A bio belonging to sda1 is issued and is merged into the request
       mentioned in step 1 by ELEVATOR_FRONT_MERGE. The first sector of the
       request is changed from the sda2 region to the sda1 region. However,
       the two partitions' hd_struct->in_flight values are not changed.

            | hd_struct->in_flight
       ---------------------------
       sda1 |          0
       sda2 |          1
       ---------------------------

    3. The request is finished and blk_account_io_done() is called. In this
       case, sda1's hd_struct->in_flight, not sda2's, is decremented.

            | hd_struct->in_flight
       ---------------------------
       sda1 |         -1
       sda2 |          1
       ---------------------------

    The patch fixes the problem by caching the partition lookup
    inside the request structure, hence making sure that the increment
    and decrement will always happen on the same partition struct. This
    also speeds up IO with accounting enabled, since it cuts down on
    the number of lookups we have to do.

    Also add a refcount to struct hd_struct to keep the partition in
    memory as long as users exist. We use kref_test_and_get() to ensure
    we don't add a reference to a partition which is going away.

    Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
    Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
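The accounting bug and its fix can be reproduced in a few lines of user-space C. This is a simplified model under stated assumptions, not the block layer code: `struct part`, `struct request` and all helper names here are hypothetical, and partitions are plain sector ranges. The buggy completion path looks the partition up again from the (possibly changed) first sector; the fixed path uses the partition cached at start time.

```c
#include <stddef.h>

/* Hypothetical partition: sector range [start, end) plus a counter. */
struct part {
    unsigned long start, end;
    int in_flight;
};

/* Hypothetical request: first sector and the partition cached at start. */
struct request {
    unsigned long sector;
    struct part *part;
};

/* Find the partition containing sector, or NULL if there is none. */
static struct part *lookup(struct part *tbl, size_t n, unsigned long sector)
{
    for (size_t i = 0; i < n; i++)
        if (sector >= tbl[i].start && sector < tbl[i].end)
            return &tbl[i];
    return NULL;
}

/* Start accounting: look up once and cache the result in the request. */
void account_start(struct part *tbl, size_t n, struct request *rq)
{
    rq->part = lookup(tbl, n, rq->sector);
    if (rq->part)
        rq->part->in_flight++;
}

/* A front merge moves the first sector of the request. */
void front_merge(struct request *rq, unsigned long new_first_sector)
{
    rq->sector = new_first_sector;
}

/* Buggy completion: a fresh lookup may hit a different partition. */
void account_done_buggy(struct part *tbl, size_t n, struct request *rq)
{
    struct part *p = lookup(tbl, n, rq->sector);

    if (p)
        p->in_flight--;
}

/* Fixed completion: decrement the partition that was incremented. */
void account_done_fixed(struct request *rq)
{
    if (rq->part)
        rq->part->in_flight--;
}
```

With two partitions covering sectors 0-99 and 100-199, a request starting at sector 150 that is front-merged down to sector 90 leaves the counters at -1 and 1 on the buggy path, matching step 3 of the commit message, and at 0 and 0 on the fixed path.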
| * | | | | kref: add kref_test_and_getJerome Marchand2011-01-05
    Add kref_test_and_get() function, which atomically adds a reference only
    if refcount is not zero. This prevents adding a reference to an object
    that is already being removed.

    Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
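The "take a reference only if the count is not zero" operation described here can be sketched in portable C11 with a compare-and-exchange loop. This is an illustration of the technique, not the kernel's kref code; `struct ref` and `ref_test_and_get` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference counter, for illustration only. */
struct ref {
    atomic_int count;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true if a reference was taken, false if the object is
 * already on its way out (count == 0).
 */
bool ref_test_and_get(struct ref *r)
{
    int old = atomic_load(&r->count);

    while (old != 0) {
        /* On failure, old is updated to the current value and we retry. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;
}
```

A plain increment followed by a check would briefly resurrect a zero count; the compare-and-exchange makes the test and the increment a single atomic step.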
| * | | | | bio-integrity: mark kintegrityd_wq highpri and CPU intensiveTejun Heo2011-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Work items processed by kintegrityd_wq won't block much, may burn a lot of CPU cycles and affect IO latency. Use alloc_workqueue() to mark it highpri and CPU intensive with max concurrency of 1. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | block: make kblockd_workqueue smarterTejun Heo2011-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kblockd is used for unplugging and may affect IO latency and throughput and the max number of concurrent work items are bound by the number of block devices. Make it HIGHPRI workqueue w/ default max concurrency. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | block: Clean up exit_io_context() source code.Bart Van Assche2010-12-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a spelling error in a source code comment and removes superfluous braces in the function exit_io_context(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | Fix compile warnings due to missing removal of a 'ret' variableJens Axboe2010-12-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a8adbe3 forgot to remove the return variable, kill it. drivers/block/loop.c: In function 'lo_splice_actor': drivers/block/loop.c:398: warning: unused variable 'ret' [...] fs/nfsd/vfs.c: In function 'nfsd_splice_actor': fs/nfsd/vfs.c:848: warning: unused variable 'ret' Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | fs/block: type signature of major_to_index(int) to major_to_index(unsigned)Yang Zhang2010-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The major/minor device numbers are always defined and used as `unsigned'. Signed-off-by: Yang Zhang <kthreadd@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
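The commit message only says that device numbers are always unsigned. One practical consequence of hashing a signed value, shown below as an assumption rather than as the patch's stated motivation, is that C's `%` operator keeps the sign of the dividend, so a negative input would give a negative table index. The table size of 255 and both function names are chosen for this sketch.

```c
/* Assumed hash table size, for illustration only. */
#define MAJOR_HASH_SIZE 255

/* Signed variant: a negative major yields a negative index. */
int major_to_index_signed(int major)
{
    return major % MAJOR_HASH_SIZE;
}

/* Unsigned variant: the result is always in [0, MAJOR_HASH_SIZE). */
unsigned major_to_index_unsigned(unsigned major)
{
    return major % MAJOR_HASH_SIZE;
}
```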
| * | | | | block: convert !IS_ERR(p) && p to !IS_ERR_OR_NULL(p)  Yang Zhang  2010-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Yang Zhang <kthreadd@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
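The conversion in this commit relies on `!IS_ERR(p) && p` and `!IS_ERR_OR_NULL(p)` being equivalent. The following user-space sketch models the kernel's error-pointer helpers to show that; it follows the well-known scheme of encoding small negative errno values in the top of the address range, but it is a simplified re-implementation written for this example, not a copy of the kernel header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest errno value that can be encoded in a pointer. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* True if p lies in the last MAX_ERRNO addresses, i.e. encodes an error. */
static inline bool IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* True if p is NULL or encodes an error. */
static inline bool IS_ERR_OR_NULL(const void *p)
{
    return !p || IS_ERR(p);
}
```

By De Morgan's law, `!(!p || IS_ERR(p))` is `p && !IS_ERR(p)`, so the single helper reads better without changing behavior.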
| * | | | | cfq-iosched: don't check cfqg in choose_service_tree()Gui Jianfeng2010-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When cfq_choose_cfqg() is called in select_queue(), there must be at least one backlogged CFQ queue waiting for dispatching, hence there must be at least one backlogged CFQ group on service tree. So we never call choose_service_tree() with cfqg == NULL. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | fs/splice: Pull buf->ops->confirm() from splice_from_pipe actorsMichał Mirosław2010-12-17
| |/ / / /
    This patch pulls calls to buf->ops->confirm() from all actors passed
    (also indirectly) to splice_from_pipe_feed().

    Is avoiding the call to buf->ops->confirm() while splice()ing to
    /dev/null an intentional optimization? No other user does that
    and this will remove this special case.

    Against current linux.git 6313e3c21743cc88bb5bd8aa72948ee1e83937b6.

    Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | block cfq: select new workload if priority changedShaohua Li writes2010-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If priority is changed, continuing to check workload_expires and service tree count of the previous workload does not make sense. We should always choose the workload with lowest key of new priority in such case. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | cfq-iosched: Get rid of on_st flagGui Jianfeng2010-11-30
    Whether a CFQ group is on a service tree can be determined by checking
    "cfqg->rb_node". There's no need to maintain an extra flag here.

    Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | cfq-iosched: Get rid of st->activeGui Jianfeng2010-11-30
    When a cfq group is running, it won't be dequeued from service tree, so
    there's no need to store the active one in st->active. Just get rid of it.

    Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | Merge branch 'cleanup-bd_claim' of ↵Jens Axboe2010-11-27
| |\ \ \ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into for-2.6.38/core
| | * | | | block: clean up blkdev_get() wrappers and their usersTejun Heo2010-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After recent blkdev_get() modifications, open_by_devnum() and open_bdev_exclusive() are simple wrappers around blkdev_get(). Replace them with blkdev_get_by_dev() and blkdev_get_by_path(). blkdev_get_by_dev() is identical to open_by_devnum(). blkdev_get_by_path() is slightly different in that it doesn't automatically add %FMODE_EXCL to @mode. All users are converted. Most conversions are mechanical and don't introduce any behavior difference. There are several exceptions. * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no reason to OR it explicitly on blkdev_put(). * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in sb->s_mode. * With the above changes, sb->s_mode now always should contain FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect errors. The new blkdev_get_*() functions are with proper docbook comments. While at it, add function description to blkdev_get() too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: reiserfs-devel@vger.kernel.org Cc: xfs-masters@oss.sgi.com Cc: Alexander Viro <viro@zeniv.linux.org.uk>
| | * | | | block: check bdev_read_only() from blkdev_get()Tejun Heo2010-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdev read-only status can be queried using bdev_read_only() and may change while the device is being opened. Enforce it by checking it from blkdev_get() after open succeeds. This makes bdev_read_only() check in open_bdev_exclusive() and fsg_lun_open() unnecessary. Drop them. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Brownell <dbrownell@users.sourceforge.net> Cc: linux-usb@vger.kernel.org
| | * | | | block: reorganize claim/release implementationTejun Heo2010-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With claim/release rolled into blkdev_get/put(), there's no reason to keep bd_abort/finish_claim(), __bd_claim() and bd_release() as separate functions. It only makes the code difficult to follow. Collapse them into blkdev_get/put(). This will ease future changes around claim/release. Signed-off-by: Tejun Heo <tj@kernel.org>