aboutsummaryrefslogtreecommitdiffstats
path: root/block/genhd.c
Commit message (Collapse)AuthorAge
* block: add one-hit cache for disk partition lookupJens Axboe2008-12-29
| | | | | | | | | | | | | | | | | disk_map_sector_rcu() returns a partition from a sector offset, which we use for IO statistics on a per-partition basis. The lookup itself is an O(N) list lookup, where N is the number of partitions. This actually hurts performance quite a bit, even on the lower end partitions. On higher numbered partitions, it can get pretty bad. Solve this by adding a one-hit cache for partition lookup. This makes the lookup O(1) for the case where we do most IO to one partition. Even for mixed partition workloads, amortized cost is pretty close to O(1) since the natural IO batching makes the one-hit cache last for lots of IOs. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: set disk->node_id before it's being usedCheng Renquan2008-12-03
| | | | | | | | disk->node_id will be refered in allocating in disk_expand_part_tbl, so we should set it before disk->node_id is refered. Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nashZhang, Yanmin2008-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We run into system boot failure with kernel 2.6.28-rc. We found it on a couple of machines, including T61 notebook, nehalem machine, and another HPC NX6325 notebook. All the machines use FedoraCore 8 or FedoraCore 9. With kernel prior to 2.6.28-rc, system boot doesn't fail. I debug it and locate the root cause. Pls. see http://bugzilla.kernel.org/show_bug.cgi?id=11899 https://bugzilla.redhat.com/show_bug.cgi?id=471517 As a matter of fact, there are 2 bugs. 1)root=/dev/sda1, system boot randomly fails. Mostly, boot for 5 times and fails once. nash has a bug. Some of its functions misuse return value 0. Sometimes, 0 means timeout and no uevent available. Sometimes, 0 means nash gets an uevent, but the uevent isn't block-related (for exmaple, usb). If by coincidence, kernel tells nash that uevents are available, but kernel also set timeout, nash might stops collecting other uevents in queue if current uevent isn't block-related. I work out a patch for nash to fix it. http://bugzilla.kernel.org/attachment.cgi?id=18858 2) root=LABEL=/, system always can't boot. initrd init reports switchroot fails. Here is an executation branch of nash when booting: (1) nash read /sys/block/sda/dev; Assume major is 8 (on my desktop) (2) nash query /proc/devices with the major number; It found line "8 sd"; (3) nash use 'sd' to search its own probe table to find device (DISK) type for the device and add it to its own list; (4) Later on, it probes all devices in its list to get filesystem labels; scsi register "8 sd" always. When major is 259, nash fails to find the device(DISK) type. I enables CONFIG_DEBUG_BLOCK_EXT_DEVT=y when compiling kernel, so 259 is picked up for device /dev/sda1, which causes nash to fail to find device (DISK) type. To fixing issue 2), I create a patch for nash and another patch for kernel. http://bugzilla.kernel.org/attachment.cgi?id=18859 http://bugzilla.kernel.org/attachment.cgi?id=18837 Below is the patch for kernel 2.6.28-rc4. It registers blkext, a new block device in proc/devices. With 2 patches on nash and 1 patch on kernel, I boot my machines for dozens of times without failure. Signed-off-by Zhang Yanmin <yanmin.zhang@linux.intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* proc: move /proc/diskstats boilerplate to block/genhd.cAlexey Dobriyan2008-10-23
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Jens Axboe <jens.axboe@oracle.com>
* proc: move rest of /proc/partitions code to block/genhd.cAlexey Dobriyan2008-10-23
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Jens Axboe <jens.axboe@oracle.com>
* block: fix current kernel-doc warningsRandy Dunlap2008-10-17
| | | | | | | | | | | | Fix block kernel-doc warnings: Warning(linux-2.6.27-git4//fs/block_dev.c:1272): No description found for parameter 'path' Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'cpu' Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'part' Warning(/var/linsrc/linux-2.6.27-git4//block/genhd.c:544): No description found for parameter 'partno' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: fix kernel-doc for blk_alloc_devt()Li Zefan2008-10-17
| | | | | | | No argument 'gfp_mask' for blk_alloc_devt(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: add fault injection mechanism for faking request timeoutsJens Axboe2008-10-09
| | | | | | | | Only works for the generic request timer handling. Allows one to sporadically ignore request completions, thus exercising the timeout handling. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: fix duplicate headers for /proc/partitionsTejun Heo2008-10-09
| | | | | | | | seqf can be started multiple times for a read and the header should be printed only for the initial one. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: don't test for partition size in bdget_disk() and blk_lookup_devt()Tejun Heo2008-10-09
| | | | | | | | | | | | | | | | | bdget_disk() and blk_lookup_devt() never cared whether the specified partition (or disk) is zero sized or not. I got confused while converting those not to depend on consecutive minor numbers in commit 5a6411b1178baf534aa9138052864dfa89d3eada and later when dev0 was added it broke callers which expected to get valid return for zero sized disk devices. So, they never needed nr_sects checks in the first place. Kill them. This problem was spotted and debugged by Bartlmoiej Zolnierkiewicz. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: kmalloc args reversed, small function definition fixesHarvey Harrison2008-10-09
| | | | | | | | | | | | | | | | Noticed by sparse: block/blk-softirq.c:156:12: warning: symbol 'blk_softirq_init' was not declared. Should it be static? block/genhd.c:583:28: warning: function 'bdget_disk' with external linkage has definition block/genhd.c:659:17: warning: incorrect type in argument 1 (different base types) block/genhd.c:659:17: expected unsigned int [unsigned] [usertype] size block/genhd.c:659:17: got restricted gfp_t block/genhd.c:659:29: warning: incorrect type in argument 2 (different base types) block/genhd.c:659:29: expected restricted gfp_t [usertype] flags block/genhd.c:659:29: got unsigned int block: kmalloc args reversed Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: allow disk to have extended device numberTejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | | | Now that disk and partition handlings are mostly unified, it's easy to allow disk to have extended device number. This patch makes add_disk() use extended device number if disk->minors is zero. Both sd and ide-disk are updated to use this. * sd_format_disk_name() is implemented which can generically determine the drive name. This removes disk number restriction stemming from limited device names. * If sd index goes over SD_MAX_DISKS (which can be increased now BTW), sd simply doesn't initialize minors letting block layer choose extended device number. * If CONFIG_DEBUG_EXT_DEVT is set, both sd and ide-disk always set minors to 0 and use extended device numbers. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: replace @ext_minors with GENHD_FL_EXT_DEVTTejun Heo2008-10-09
| | | | | | | | | | | | | | | With previous changes, it's meaningless to limit the number of partitions. Replace @ext_minors with GENHD_FL_EXT_DEVT such that setting the flag allows the disk to have maximum number of allowed partitions (only limited by the number of entries in parsed_partitions as determined by MAX_PART constant). This kills not-too-pretty alloc_disk_ext[_node]() functions and makes @minors parameter to alloc_disk[_node]() unnecessary. The parameter is left alone to avoid disturbing the users. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: make partition array dynamicTejun Heo2008-10-09
| | | | | | | | | | | | disk->__part used to be statically allocated to the maximum possible number of partitions. This patch makes partition array allocation dynamic. The added overhead is minimal as only real change is one memory dereference changed to RCU one. This saves both a bit of memory and cpu cycles iterating through unoccupied slots and makes increasing partition limit easier. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: move stats from disk to part0Tejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move stats related fields - stamp, in_flight, dkstats - from disk to part0 and unify stat handling such that... * part_stat_*() now updates part0 together if the specified partition is not part0. ie. part_stat_*() are now essentially all_stat_*(). * {disk|all}_stat_*() are gone. * part_round_stats() is updated similary. It handles part0 stats automatically and disk_round_stats() is killed. * part_{inc|dec}_in_fligh() is implemented which automatically updates part0 stats for parts other than part0. * disk_map_sector_rcu() is updated to return part0 if no part matches. Combined with the above changes, this makes NULL special case handling in callers unnecessary. * Separate stats show code paths for disk are collapsed into part stats show code paths. * Rename disk_stat_lock/unlock() to part_stat_lock/unlock() While at it, reposition stat handling macros a bit and add missing parentheses around macro parameters. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: kill GENHD_FL_FAIL and use part0->make_it_failTejun Heo2008-10-09
| | | | | | | | GENHD_FL_FAIL for disk is what make_it_fail is for parts. Kill it and use part0->make_it_fail. Sysfs node handling is unified too. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: move policy from disk to part0Tejun Heo2008-10-09
| | | | | | | Move disk->policy to part0->policy. Implement and use get_disk_ro(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: unify sysfs size node handlingTejun Heo2008-10-09
| | | | | | | | Now that capacity and __dev are moved to part0, part0 and others can share the same method. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: move __dev from disk to part0Tejun Heo2008-10-09
| | | | | | | | | | | | | | | Move disk->__dev to part0->__dev. This simplifies bdget_disk() and lookup_devt() and allows common sysfs attributes to be unified. part_to_disk() is updated to handle part0 -> disk. Updated to include a fix from Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>, he writes: "part0 is a "special" partition and doesn't need to have capacity set - this fixes regression caused by "block: move __dev from disk to part0" commit." Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: introduce partition 0Tejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | | | | | genhd and partition code handled disk and partitions separately. All information about the whole disk was in struct genhd and partitions in struct hd_struct. However, the whole disk (part0) and other partitions have a lot in common and the data structures end up having good number of common fields and thus separate code paths doing the same thing. Also, the partition array was indexed by partno - 1 which gets pretty confusing at times. This patch introduces partition 0 and makes the partition array indexed by partno. Following patches will unify the handling of disk and parts piece-by-piece. This patch also implements disk_partitionable() which tests whether a disk is partitionable. With coming dynamic partition array change, the most common usage of disk_max_parts() will be testing whether a disk is partitionable and the number of max partitions will become much less important. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: implement and use {disk|part}_to_dev()Tejun Heo2008-10-09
| | | | | | | | | | | | Implement {disk|part}_to_dev() and use them to access generic device instead of directly dereferencing {disk|part}->dev. To make sure no user is left behind, rename generic devices fields to __dev. This is in preparation of unifying partition 0 handling with other partitions. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: implement CONFIG_DEBUG_BLOCK_EXT_DEVTTejun Heo2008-10-09
| | | | | | | | | | | | | | | | Extended devt introduces non-contiguos device numbers. This patch implements a debug option which forces most devt allocations to be from the extended area and spreads them out. This is enabled by default if DEBUG_KERNEL is set and achieves... 1. Detects code paths in kernel or userland which expect predetermined consecutive device numbers. 2. When something goes wrong, avoid corruption as adding to the minor of earlier partition won't lead to the wrong but valid device. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: adjust formatting for large minors and add ext_range sysfs attrTejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | | With extended minors and the soon-to-follow debug feature, large minor numbers for block devices will be common. This patch does the followings to make printouts pretty. * Adapt print formats such that large minors don't break the formatting. * For extended MAJ:MIN, %02x%02x for MAJ:MIN used in printk_all_partitions() doesn't cut it anymore. Update it such that %03x:%05x is used if either MAJ or MIN doesn't fit in %02x. * Implement ext_range sysfs attribute which shows total minors the device can use including both conventional minor space and the extended one. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: implement extended dev numbersTejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | | Implement extended device numbers. A block driver can tell block layer that it wants to use extended device numbers. After the usual minor space is used up, block layer automatically allocates devt's from EXT_BLOCK_MAJOR. Currently only one major number is allocated for this but as the allocation is strictly on-demand, ~1mil minor space under it should suffice unless the system actually has more than ~1mil partitions and if that ever happens adding more majors to the extended devt area is easy. Due to internal implementation issues, the first partition can't be allocated on the extended area. In other words, genhd->minors should at least be 1. This limitation will be lifted by later changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: fix diskstats accessTejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two variants of stat functions - ones prefixed with double underbars which don't care about preemption and ones without which disable preemption before manipulating per-cpu counters. It's unclear whether the underbarred ones assume that preemtion is disabled on entry as some callers don't do that. This patch unifies diskstats access by implementing disk_stat_lock() and disk_stat_unlock() which take care of both RCU (for partition access) and preemption (for per-cpu counter access). diskstats access should always be enclosed between the two functions. As such, there's no need for the versions which disables preemption. They're removed and double underbars ones are renamed to drop the underbars. As an extra argument is added, there's no danger of using the old version unconverted. disk_stat_lock() uses get_cpu() and returns the cpu index and all diskstat functions which access per-cpu counters now has @cpu argument to help RT. This change adds RCU or preemption operations at some places but also collapses several preemption ops into one at others. Overall, the performance difference should be negligible as all involved ops are very lightweight per-cpu ones. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: fix disk->part[] dereferencing raceTejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disk->part[] is protected by its matching bdev's lock. However, non-critical accesses like collecting stats and printing out sysfs and proc information used to be performed without any locking. As partitions can come and go dynamically, partitions can go away underneath those non-critical accesses. As some of those accesses are writes, this theoretically can lead to silent corruption. This patch fixes the race by using RCU for the partition array and dev reference counter to hold partitions. * Rename disk->part[] to disk->__part[] to make sure no one outside genhd layer proper accesses it directly. * Use RCU for disk->__part[] dereferencing. * Implement disk_{get|put}_part() which can be used to get and put partitions from gendisk respectively. * Iterators are implemented to help iterate through all partitions safely. * Functions which require RCU readlock are marked with _rcu suffix. * Use disk_put_part() in __blkdev_put() instead of directly putting the contained kobject. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: don't depend on consecutive minor spaceTejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | * Implement disk_devt() and part_devt() and use them to directly access devt instead of computing it from ->major and ->first_minor. Note that all references to ->major and ->first_minor outside of block layer is used to determine devt of the disk (the part0) and as ->major and ->first_minor will continue to represent devt for the disk, converting these users aren't strictly necessary. However, convert them for consistency. * Implement disk_max_parts() to avoid directly deferencing genhd->minors. * Update bdget_disk() such that it doesn't assume consecutive minor space. * Move devt computation from register_disk() to add_disk() and make it the only one (all other usages use the initially determined value). These changes clean up the code and will help disk->part dereference fix and extended block device numbers. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: make variable and argument names more consistentTejun Heo2008-10-09
| | | | | | | | | | | | | | In hd_struct, @partno is used to denote partition number and a number of other places use @part to denote hd_struct. Functions use @part and @index instead. This causes confusion and makes it difficult to use consistent variable names for hd_struct. Always use @partno if a variable represents partition number. Also, print out functions use @f or @part for seq_file argument. Use @seqf uniformly instead. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: misc updatesTejun Heo2008-10-09
| | | | | | | | | | | | | | | | | This patch makes the following misc updates in preparation for disk->part dereference fix and extended block devt support. * implment part_to_disk() * fix comment about gendisk->part indexing * rename get_part() to disk_map_sector() * don't use n which is always zero while printing disk information in diskstats_show() Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: use class_dev_iterator instead of class_for_each_device()Tejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | Recent block_class iteration updates 5c6f35c5..27f3025 converted all class device iteration to class_for_each_device() and class_find_device(), which are correct but pain in the ass to use. This pach converts them to newly introduced class_dev_iterator so that they can use more natural control structures instead of separate callbacks and struct to pass parameters to them. This results in smaller and easier code. This patch also restores the original behavior of not printing header in /proc/partitions if there's no partition to print. This is trivial but still user-visible behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: don't grab block_class_lock unnecessarilyTejun Heo2008-10-09
| | | | | | | | | block_class_lock protects major_names array and bdev_map and doesn't have anything to do with block class devices. Don't grab them while iterating over block class devices. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: fix partition info printoutsTejun Heo2008-10-09
| | | | | | | | | | | | | | | | | | | | | Recent block_class iteration updates 5c6f35c5..27f3025 broke partition info printouts. * printk_all_partitions(): Partition print out stops when it meets a partition hole. Partition printing inner loop should continue instead of exiting on empty partition slot. * /proc/partitions and /proc/diskstats: If all information can't be read in single read(), the information is truncated. This is because find_start() doesn't actually update the counter containing the initial seek. It runs to the end and ends up always reporting EOF on the second read. This patch fixes both problems. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Add some block/ source files to the kernel-api docbook. Fix kernel-doc ↵Randy Dunlap2008-10-09
| | | | | | | notation in them as needed. Fix changed function parameter names. Fix typos/spellos. In comments, change REQ_SPECIAL to REQ_TYPE_SPECIAL and REQ_BLOCK_PC to REQ_TYPE_BLOCK_PC. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: restore original behavior of /proc/partition when there's no partitionTejun Heo2008-09-01
| | | | | | | | | | | | | | | /proc/partitions didn't use to write out the header if there was no partition. However, recent commit 66c64afe changed the behavior. This is nothing major but there's no reason to change user visible behavior without a good rationale. Restore the original behavior. Note that 2.6.28 has clean up changes scheduled which will replace this rather hacky implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* remove blk_register_filter and blk_unregister_filter in gendiskFUJITA Tomonori2008-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch remove blk_register_filter and blk_unregister_filter in gendisk, and adds them to sd.c, sr.c. and ide-cd.c The commit abf5439370491dd6fbb4fe1a7939680d2a9bc9d4 moved cmdfilter from gendisk to request_queue. It turned out that in some subsystems multiple gendisks share a single request_queue. So we get: Using physmap partition information Creating 3 MTD partitions on "physmap-flash": 0x00000000-0x01c00000 : "User FS" 0x01c00000-0x01c40000 : "booter" kobject (8511c410): tried to init an initialized object, something is seriously wrong. Call Trace: [<8036644c>] dump_stack+0x8/0x34 [<8021f050>] kobject_init+0x50/0xcc [<8021fa18>] kobject_init_and_add+0x24/0x58 [<8021d20c>] blk_register_filter+0x4c/0x64 [<8021c194>] add_disk+0x78/0xe0 [<8027d14c>] add_mtd_blktrans_dev+0x254/0x278 [<8027c8f0>] blktrans_notify_add+0x40/0x78 [<80279c00>] add_mtd_device+0xd0/0x150 [<8027b090>] add_mtd_partitions+0x568/0x5d8 [<80285458>] physmap_flash_probe+0x2ac/0x334 [<802644f8>] driver_probe_device+0x12c/0x244 [<8026465c>] __driver_attach+0x4c/0x84 [<80263c64>] bus_for_each_dev+0x58/0xac [<802633ec>] bus_add_driver+0xc4/0x24c [<802648e0>] driver_register+0xcc/0x184 [<80100460>] _stext+0x60/0x1bc In the long term, we need to fix such subsystems but we need a quick fix now. This patch add the command filter support to only sd and sr though it might be useful for other SG_IO users (such as cciss). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reported-by: Manuel Lauss <mano@roarinelk.homelinux.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: drop references taken by class_find_device()Kay Sievers2008-08-21
| | | | | | | | | Otherwise we leak references, which is not a good thing to do. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: fix partial read() of /proc/{partitions,diskstats}Kay Sievers2008-08-21
| | | | | | | | | | | The proc files get truncated if they do not fit into the buffer with a single read(). We need to move the seq_file index from the callback of class_find_device() to the caller of class_find_device(), to keep its value across multiple invocations of the callback. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: make /proc/partitions and /proc/diskstats use class_find_device()Greg Kroah-Hartman2008-07-22
| | | | | | | | Use the proper class iterator function instead of mucking around in the internals of the class structures. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: move header for /proc/partitions to seq_startGreg Kroah-Hartman2008-07-22
| | | | | | | | | The seq_start call is the better place for the header for the file, that way we don't have to be mucking in the class structure to try to figure out if this is the first partition or not. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: make proc files seq_start use the class_find_device()Greg Kroah-Hartman2008-07-22
| | | | | | | | Use the proper class iterator function instead of mucking around in the internals of the class structures. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: make /proc/diskstats only build if CONFIG_PROC_FS is enabledRandy Dunlap2008-07-22
| | | | | | | | | | | These functions are only needed if CONFIG_PROC_FS is enabled, so save the space when it is not. This also makes it easier for a patch later in this series to work properly if CONFIG_PROC_FS is not enabled. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: make blk_lookup_devt use the class iterator functionGreg Kroah-Hartman2008-07-22
| | | | | | | | Use the proper class iterator function instead of mucking around in the internals of the class structures. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: make printk_partition use the class iterator functionGreg Kroah-Hartman2008-07-22
| | | | | | | | Use the proper class iterator function instead of mucking around in the internals of the class structures. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: fix compiler warning in genhd.cGreg Kroah-Hartman2008-07-22
| | | | | | | Warn if something really bad happens if we can't create this link. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minorDan Williams2008-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Why?: There are occasions where userspace would like to access sysfs attributes for a device but it may not know how sysfs has named the device or the path. For example what is the sysfs path for /dev/disk/by-id/ata-ST3160827AS_5MT004CK? With this change a call to stat(2) returns the major:minor then userspace can see that /sys/dev/block/8:32 links to /sys/block/sdc. What are the alternatives?: 1/ Add an ioctl to return the path: Doable, but sysfs is meant to reduce the need to proliferate ioctl interfaces into the kernel, so this seems counter productive. 2/ Use udev to create these symlinks: Also doable, but it adds a udev dependency to utilities that might be running in a limited environment like an initramfs. 3/ Do a full-tree search of sysfs. [kay.sievers@vrfy.org: fix duplicate registrations] [kay.sievers@vrfy.org: cleanup suggestions] Cc: Neil Brown <neilb@suse.de> Cc: Tejun Heo <htejun@gmail.com> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Reviewed-by: SL Baur <steve@xemacs.org> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Mark Lord <lkml@rtr.ca> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* allow userspace to modify scsi command filter on per device basisAdel Gadllah2008-07-03
| | | | | | | | | | | | This patch exports the per-gendisk command filter to user space through sysfs, so it can be changed by the system administrator. All users of the old cmd filter have been converted to use the new one. Original patch from Peter Jones. Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com> Signed-off-by: Peter Jones <pjones@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: export "ro" attributeKay Sievers2008-07-03
| | | | | Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Fix invalid access errors in blk_lookup_devtLinus Torvalds2008-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 30f2f0eb4bd2c43d10a8b0d872c6e5ad8f31c9a0 ("block: do_mounts - accept root=<non-existant partition>") extended blk_lookup_devt() to be able to look up partitions that had not yet been registered, but in the process made the assumption that the '&block_class.devices' list only contains disk devices and that you can do 'dev_to_disk(dev)' on them. That isn't actually true. The block_class device list also contains the partitions we've discovered so far, and you can't just do a 'dev_to_disk()' on those. So make sure to only work on devices that block/genhd.c has registered itself, something we can test by checking the 'dev->type' member. This makes the loop in blk_lookup_devt() match the other such loops in this file. [ We may want to do an alternate version that knows to handle _either_ whole-disk devices or partitions, but for now this is the minimal fix for a series of crashes reported by Mariusz Kozlowski in http://lkml.org/lkml/2008/5/25/25 and Ingo in http://lkml.org/lkml/2008/6/9/39 ] Reported-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Reported-by: Ingo Molnar <mingo@elte.hu> Cc: Neil Brown <neilb@suse.de> Cc: Joao Luis Meloni Assirati <assirati@nonada.if.usp.br> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* block: do_mounts - accept root=<non-existant partition>Kay Sievers2008-05-14
| | | | | | | | | | | | | | | | Some devices, like md, may create partitions only at first access, so allow root= to be set to a valid non-existant partition of an existing disk. This applies only to non-initramfs root mounting. This fixes a regression from 2.6.24 which did allow this to happen and broke some users machines :( Acked-by: Neil Brown <neilb@suse.de> Tested-by: Joao Luis Meloni Assirati <assirati@nonada.if.usp.br> Cc: stable <stable@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* mm: bdi: export BDI attributes in sysfsPeter Zijlstra2008-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object. This allows us to see and set the various BDI specific variables. In particular this properly exposes the read-ahead window for all relevant users and /sys/block/<block>/queue/read_ahead_kb should be deprecated. With patient help from Kay Sievers and Greg KH [mszeredi@suse.cz] - split off NFS and FUSE changes into separate patches - document new sysfs attributes under Documentation/ABI - do bdi_class_init as a core_initcall, otherwise the "default" BDI won't be initialized - remove bdi_init_fmt macro, it's not used very much [akpm@linux-foundation.org: fix ia64 warning] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Greg KH <greg@kroah.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>