aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAge
* Merge branch 'for-3.0-important' of git://git.drbd.org/linux-2.6-drbd into ↵Jens Axboe2011-06-30
|\ | | | | | | for-linus
| * drbd: we should write meta data updates with FLUSH FUALars Ellenberg2011-06-30
| | | | | | | | | | | | | | | | | | | | | | | | We used to write these with BIO_RW_BARRIER aka REQ_HARDBARRIER (unless disabled in the configuration). The correct semantic now would be to write with FLUSH/FUA. For example, with activity log transactions, FUA alone is not enough, we need the corresponding bitmap update (and all related application updates) on stable storage as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: when receive times out on meta socket, also check last receive time on ↵Lars Ellenberg2011-06-30
| | | | | | | | | | | | | | | | | | | | | | | | data socket If we have an asymetrically congested network, we may send P_PING, but due to congestion, the corresponding P_PING_ACK would time out, and we would drop a (congested, but otherwise) healthy connection ("PingAck did not arrive in time.") Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: account bitmap IO during resync as resync-(related-)-ioLars Ellenberg2011-06-30
| | | | | | | | | | | | | | | | | | If we have a good resync rate, we will frequently update the on-disk bitmap, which, if not accounted for as resync io, may let an otherwise idle device appear to be "busy", and cause us to throttle resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: don't cond_resched_lock with IRQs disabledLars Ellenberg2011-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | The last commit, drbd: add missing spinlock to bitmap receive, introduced a cond_resched_lock(), where the lock in question is taken with irqs disabled. As we must not schedule with IRQs disabled, and cond_resched_lock_irq() does not exist, yet, we re-aquire the spin_lock_irq() for each bitmap page processed in turn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: add missing spinlock to bitmap receiveLars Ellenberg2011-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During bitmap exchange, when using the RLE bitmap compression scheme, we have a code path that can set the whole bitmap at once. To avoid holding spin_lock_irq() for too long, we used to lock out other bitmap modifications during bitmap exchange by other means, and then, knowing we have exclusive access to the bitmap, modify it without the spinlock, and with IRQs enabled. Since we now allow local IO to continue, potentially setting additional bits during the bitmap receive phase, this is no longer true, and we get uncoordinated updates of bitmap members, causing bm_set to no longer accurately reflect the total number of set bits. To actually see this, you'd need to have a large bitmap, use RLE bitmap compression, and have busy IO during sync handshake and bitmap exchange. Fix this by taking the spin_lock_irq() in this code path as well, but calling cond_resched_lock() after each page worth of bits processed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Use the correct max_bio_size when creating resync requestsPhilipp Reisner2011-06-30
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2011-06-03
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-block: block: Use hlist_entry() for io_context.cic_list.first cfq-iosched: Remove bogus check in queue_fail path xen/blkback: potential null dereference in error handling xen/blkback: don't call vbd_size() if bd_disk is NULL block: blkdev_get() should access ->bd_disk only after success CFQ: Fix typo and remove unnecessary semicolon block: remove unwanted semicolons Revert "block: Remove extra discard_alignment from hd_struct." nbd: adjust 'max_part' according to part_shift nbd: limit module parameters to a sane value nbd: pass MSG_* flags to kernel_recvmsg() block: improve the bio_add_page() and bio_add_pc_page() descriptions
| * | xen/blkback: potential null dereference in error handlingDan Carpenter2011-06-01
| | | | | | | | | | | | | | | | | | | | | | | | blkbk->pending_pages can be NULL here so I added a check for it. Signed-off-by: Dan Carpenter <error27@gmail.com> [v1: Redid the loop a bit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen/blkback: don't call vbd_size() if bd_disk is NULLLaszlo Ersek2011-06-01
| | | | | | | | | | | | | | | | | | | | | ...because vbd_size() dereferences bd_disk if bd_part is NULL. Signed-off-by: Laszlo Ersek<lersek@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | nbd: adjust 'max_part' according to part_shiftNamhyung Kim2011-05-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'max_part' parameter determines how many partitions are supported on each nbd device. However the actual number can be changed to the power of 2 minus 1 form during the module initialization as alloc_disk() is called with (1 << part_shift) for some reason. So adjust 'max_part' also at least for consistency with loop and brd. It is exported via sysfs already, and a user should check this value after module loading if [s]he wants to use that number correctly (i.e. fdisk or something). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Cc: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | nbd: limit module parameters to a sane valueNamhyung Kim2011-05-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'max_part' parameter controls the number of maximum partition a nbd device can have. However if a user specifies very large value it would exceed the limitation of device minor number and can cause a kernel oops (or, at least, produce invalid device nodes in some cases). In addition, specifying large 'nbds_max' value causes same problem for the same reason. On my desktop, following command results to the kernel bug: $ sudo modprobe nbd max_part=100000 kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/virtual/block/nbd4/range CPU 1 Modules linked in: nbd(+) bridge stp llc kvm_intel kvm asus_atk0110 sg sr_mod cdrom Pid: 2522, comm: modprobe Tainted: G W 2.6.39-leonard+ #159 System manufacturer System Product Name/P5G41TD-M PRO RIP: 0010:[<ffffffff8115aa08>] [<ffffffff8115aa08>] internal_create_group+0x2f/0x166 RSP: 0018:ffff8801009f1de8 EFLAGS: 00010246 RAX: 00000000ffffffef RBX: ffff880103920478 RCX: 00000000000a7bd3 RDX: ffffffff81a2dbe0 RSI: 0000000000000000 RDI: ffff880103920478 RBP: ffff8801009f1e38 R08: ffff880103920468 R09: ffff880103920478 R10: ffff8801009f1de8 R11: ffff88011eccbb68 R12: ffffffff81a2dbe0 R13: ffff880103920468 R14: 0000000000000000 R15: ffff880103920400 FS: 00007f3c49de9700(0000) GS:ffff88011f800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f3b7fe7c000 CR3: 00000000cd58d000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 2522, threadinfo ffff8801009f0000, task ffff8801009a93a0) Stack: ffff8801009f1e58 ffffffff812e8f6e ffff8801009f1e58 ffffffff812e7a80 ffff880000000010 ffff880103920400 ffff8801002fd0c0 ffff880103920468 0000000000000011 ffff880103920400 ffff8801009f1e48 ffffffff8115ab6a Call Trace: [<ffffffff812e8f6e>] ? device_add+0x4f1/0x5e4 [<ffffffff812e7a80>] ? dev_set_name+0x41/0x43 [<ffffffff8115ab6a>] sysfs_create_group+0x13/0x15 [<ffffffff810b857e>] blk_trace_init_sysfs+0x14/0x16 [<ffffffff811ee58b>] blk_register_queue+0x4c/0xfd [<ffffffff811f3bdf>] add_disk+0xe4/0x29c [<ffffffffa007e2ab>] nbd_init+0x2ab/0x30d [nbd] [<ffffffffa007e000>] ? 0xffffffffa007dfff [<ffffffff8100020f>] do_one_initcall+0x7f/0x13e [<ffffffff8107ab0a>] sys_init_module+0xa1/0x1e3 [<ffffffff814f3542>] system_call_fastpath+0x16/0x1b Code: 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 48 89 fb 41 89 f6 49 89 d4 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe b9 ea ff ff ff 48 83 7f 30 00 0f 84 09 01 00 00 49 RIP [<ffffffff8115aa08>] internal_create_group+0x2f/0x166 RSP <ffff8801009f1de8> ---[ end trace 753285ffbf72c57c ]--- Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | nbd: pass MSG_* flags to kernel_recvmsg()Namhyung Kim2011-05-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike kernel_sendmsg(), kernel_recvmsg() requires passing flags explicitly via last parameter instead of struct msghdr.msg_flags. Therefore calls to sock_xmit(lo, 0, ..., MSG_WAITALL) have not been processed properly by tcp layer wrt. the flag. Fix it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | block: fix mismerge of the DISK_EVENT_MEDIA_CHANGE removalLinus Torvalds2011-06-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jens' back-merge commit 698567f3fa79 ("Merge commit 'v2.6.39' into for-2.6.40/core") was incorrectly done, and re-introduced the DISK_EVENT_MEDIA_CHANGE lines that had been removed earlier in commits - 9fd097b14918 ("block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers") - 7eec77a1816a ("ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd") because of conflicts with the "g->flags" updates near-by by commit d4dc210f69bc ("block: don't block events on excl write for non-optical devices") As a result, we re-introduced the hanging behavior due to infinite disk media change reports. Tssk, tssk, people! Don't do back-merges at all, and *definitely* don't do them to hide merge conflicts from me - especially as I'm likely better at merging them than you are, since I do so many merges. Reported-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | drivers, block: virtio_blk: Replace cryptic number with the macroLiu Yuan2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | It is easier to figure out the context by reading SCSI_SENSE_BUFFERSIZE instead of plain '96'. Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | | virtio_blk: allow re-reading config space at runtimeChristoph Hellwig2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wire up the virtio_driver config_changed method to get notified about config changes raised by the host. For now we just re-read the device size to support online resizing of devices, but once we add more attributes that might be changeable they could be added as well. Note that the config_changed method is called from irq context, so we'll have to use the workqueue infrastructure to provide us a proper user context for our changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | | Merge branch 'idle-release' of ↵Linus Torvalds2011-05-29
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param x86 idle: deprecate "no-hlt" cmdline param x86 idle APM: deprecate CONFIG_APM_CPU_IDLE x86 idle floppy: deprecate disable_hlt() x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it x86 idle: clarify AMD erratum 400 workaround idle governor: Avoid lock acquisition to read pm_qos before entering idle cpuidle: menu: fixed wrapping timers at 4.294 seconds
| * | x86 idle floppy: deprecate disable_hlt()Len Brown2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Plan to remove floppy_disable_hlt in 2012, an ancient workaround with comments that it should be removed. This allows us to remove clutter and a run-time branch from the idle code. WARN_ONCE() on invocation until it is removed. cc: x86@kernel.org cc: stable@kernel.org # .39.x Signed-off-by: Len Brown <len.brown@intel.com>
* | | loop: export module parametersNamhyung Kim2011-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export 'max_loop' and 'max_part' parameters to sysfs so user can know that how many devices are allowed and how many partitions are supported. If 'max_loop' is 0, there is no restriction on the number of loop devices. User can create/use the devices as many as minor numbers available. If 'max_part' is 0, it means simply the device doesn't support partitioning. Also note that 'max_part' can be adjusted to power of 2 minus 1 form if needed. User should check this value after the module loading if he/she want to use that number correctly (i.e. fdisk, mknod, etc.). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | brd: export module parametersNamhyung Kim2011-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export 'rd_nr', 'rd_size' and 'max_part' parameters to sysfs so user can know that how many devices are allowed, how big each device is and how many partitions are supported. If 'max_part' is 0, it means simply the device doesn't support partitioning. Also note that 'max_part' can be adjusted to power of 2 minus 1 form if needed. User should check this value after the module loading if he/she want to use that number correctly (i.e. fdisk, mknod, etc.). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | brd: fix comment on initial device creationNamhyung Kim2011-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | If 'rd_nr' param was not specified, 16 (can be adjusted via CONFIG_BLK_DEV_RAM_COUNT) devices would be created by default but comment said 1. Fix it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | brd: handle on-demand devices correctlyNamhyung Kim2011-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When finding or allocating a ram disk device, brd_probe() did not take partition numbers into account so that it can result to a different device. Consider following example (I set CONFIG_BLK_DEV_RAM_COUNT=4 for simplicity) : $ sudo modprobe brd max_part=15 $ ls -l /dev/ram* brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0 brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1 brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2 brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3 $ sudo mknod /dev/ram4 b 1 64 $ sudo dd if=/dev/zero of=/dev/ram4 bs=4k count=256 256+0 records in 256+0 records out 1048576 bytes (1.0 MB) copied, 0.00215578 s, 486 MB/s namhyung@leonhard:linux$ ls -l /dev/ram* brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0 brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1 brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2 brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3 brw-r--r-- 1 root root 1, 64 2011-05-25 15:45 /dev/ram4 brw-rw---- 1 root disk 1, 1024 2011-05-25 15:44 /dev/ram64 After this patch, /dev/ram4 - instead of /dev/ram64 - was accessed correctly. In addition, 'range' passed to blk_register_region() should include all range of dev_t that RAMDISK_MAJOR can address. It does not need to be limited by partition numbers unless 'rd_nr' param was specified. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | brd: limit 'max_part' module param to DISK_MAX_PARTSNamhyung Kim2011-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'max_part' parameter controls the number of maximum partition a brd device can have. However if a user specifies very large value it would exceed the limitation of device minor number and can cause a kernel panic (or, at least, produce invalid device nodes in some cases). On my desktop system, following command kills the kernel. On qemu, it triggers similar oops but the kernel was alive: $ sudo modprobe brd max_part=100000 BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [<ffffffff81110a9a>] sysfs_create_dir+0x2d/0xae PGD 7af1067 PUD 7b19067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: CPU 0 Modules linked in: brd(+) Pid: 44, comm: insmod Tainted: G W 2.6.39-qemu+ #158 Bochs Bochs RIP: 0010:[<ffffffff81110a9a>] [<ffffffff81110a9a>] sysfs_create_dir+0x2d/0xae RSP: 0018:ffff880007b15d78 EFLAGS: 00000286 RAX: ffff880007b05478 RBX: ffff880007a52760 RCX: ffff880007b15dc8 RDX: ffff880007a4f900 RSI: ffff880007b15e48 RDI: ffff880007a52760 RBP: ffff880007b15da8 R08: 0000000000000002 R09: 0000000000000000 R10: ffff880007b15e48 R11: ffff880007b05478 R12: 0000000000000000 R13: ffff880007b05478 R14: 0000000000400920 R15: 0000000000000063 FS: 0000000002160880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000058 CR3: 0000000007b1c000 CR4: 00000000000006b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000 Process insmod (pid: 44, threadinfo ffff880007b14000, task ffff880007acb980) Stack: ffff880007b15dc8 ffff880007b05478 ffff880007b15da8 00000000fffffffe ffff880007a52760 ffff880007b05478 ffff880007b15de8 ffffffff81143c0a 0000000000400920 ffff880007a52760 ffff880007b05478 0000000000000000 Call Trace: [<ffffffff81143c0a>] kobject_add_internal+0xdf/0x1a0 [<ffffffff81143da1>] kobject_add_varg+0x41/0x50 [<ffffffff81143e6b>] kobject_add+0x64/0x66 [<ffffffff8113bbe7>] blk_register_queue+0x5f/0xb8 [<ffffffff81140f72>] add_disk+0xdf/0x289 [<ffffffffa00040df>] brd_init+0xdf/0x1aa [brd] [<ffffffffa0004000>] ? 0xffffffffa0003fff [<ffffffffa0004000>] ? 0xffffffffa0003fff [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e [<ffffffff8108516c>] sys_init_module+0x9c/0x1dc [<ffffffff812ff4bb>] system_call_fastpath+0x16/0x1b Code: 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 18 48 85 ff 75 04 0f 0b eb fe 48 8b 47 18 49 c7 c4 70 1e 4d 81 48 85 c0 74 04 4c 8b 60 30 8b 44 24 58 45 31 ed 0f b6 c4 85 c0 74 0d 48 8b 43 28 48 89 RIP [<ffffffff81110a9a>] sysfs_create_dir+0x2d/0xae RSP <ffff880007b15d78> CR2: 0000000000000058 ---[ end trace aebb1175ce1f6739 ]--- Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | brd: get rid of unused members from struct brd_deviceNamhyung Kim2011-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | brd_refcnt, brd_offset, brd_sizelimit and brd_blocksize in struct brd_device seem to be copied from struct loop_device but they're not used anywhere. Let get rid of them. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2011-05-25
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) ceph: fix cap flush race reentrancy libceph: subscribe to osdmap when cluster is full libceph: handle new osdmap down/state change encoding rbd: handle online resize of underlying rbd image ceph: avoid inode lookup on nfs fh reconnect ceph: use LOOKUPINO to make unconnected nfs fh more reliable rbd: use snprintf for disk->disk_name rbd: cleanup: make kfree match kmalloc rbd: warn on update_snaps failure on notify ceph: check return value for start_request in writepages ceph: remove useless check libceph: add missing breaks in addr_set_port libceph: fix TAG_WAIT case ceph: fix broken comparison in readdir loop libceph: fix osdmap timestamp assignment ceph: fix rare potential cap leak libceph: use snprintf for unknown addrs libceph: use snprintf for formatting object name ceph: use snprintf for dirstat content libceph: fix uninitialized value when no get_authorizer method is set ...
| * | | rbd: handle online resize of underlying rbd imageSage Weil2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If we get a notification that the image header has changed, check for a change in the image size. Signed-off-by: Sage Weil <sage@newdream.net>
| * | | rbd: use snprintf for disk->disk_nameSage Weil2011-05-24
| | | | | | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * | | rbd: cleanup: make kfree match kmallocSage Weil2011-05-24
| | | | | | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * | | rbd: warn on update_snaps failure on notifySage Weil2011-05-19
| | | | | | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
* | | | Merge branch 'for-2.6.40/drivers' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2011-05-25
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.40/drivers' of git://git.kernel.dk/linux-2.6-block: (110 commits) loop: handle on-demand devices correctly loop: limit 'max_part' module param to DISK_MAX_PARTS drbd: fix warning drbd: fix warning drbd: Fix spelling drbd: fix schedule in atomic drbd: Take a more conservative approach when deciding max_bio_size drbd: Fixed state transitions after async outdate-peer-handler returned drbd: Disallow the peer_disk_state to be D_OUTDATED while connected drbd: Fix for the connection problems on high latency links drbd: fix potential activity log refcount imbalance in error path drbd: Only downgrade the disk state in case of disk failures drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int drbd: fix potential distributed deadlock lru_cache.h: fix comments referring to ts_ instead of lc_ drbd: Fix for application IO with the on-io-error=pass-on policy xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions. xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. xen/blkback: don't fail empty barrier requests xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end() ...
| * | | loop: handle on-demand devices correctlyNamhyung Kim2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When finding or allocating a loop device, loop_probe() did not take partition numbers into account so that it can result to a different device. Consider following example: $ sudo modprobe loop max_part=15 $ ls -l /dev/loop* brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0 brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1 brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2 brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3 brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4 brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5 brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6 brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7 $ sudo mknod /dev/loop8 b 7 128 $ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img $ sudo losetup -a /dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img) $ ls -l /dev/loop* brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0 brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1 brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128 brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1 brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2 brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3 brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2 brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3 brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4 brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5 brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6 brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7 brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8 After this patch, /dev/loop8 - instead of /dev/loop128 - was accessed correctly. In addition, 'range' passed to blk_register_region() should include all range of dev_t that LOOP_MAJOR can address. It does not need to be limited by partition numbers unless 'max_loop' param was specified. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | loop: limit 'max_part' module param to DISK_MAX_PARTSNamhyung Kim2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'max_part' parameter controls the number of maximum partition a loop block device can have. However if a user specifies very large value it would exceed the limitation of device minor number and can cause a kernel panic (or, at least, produce invalid device nodes in some cases). On my desktop system, following command kills the kernel. On qemu, it triggers similar oops but the kernel was alive: $ sudo modprobe loop max_part0000 ------------[ cut here ]------------ kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65! invalid opcode: 0000 [#1] SMP last sysfs file: CPU 0 Modules linked in: loop(+) Pid: 43, comm: insmod Tainted: G W 2.6.39-qemu+ #155 Bochs Bochs RIP: 0010:[<ffffffff8113ce61>] [<ffffffff8113ce61>] internal_create_group= +0x2a/0x170 RSP: 0018:ffff880007b3fde8 EFLAGS: 00000246 RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4 RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878 RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000 R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50 R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800 FS: 0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:00000000000000= 00 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000 Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c= 0) Stack: ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b 0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868 0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8 Call Trace: [<ffffffff811e66dd>] ? device_add+0x4bc/0x5af [<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e [<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12 [<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16 [<ffffffff8116a090>] blk_register_queue+0x47/0xf7 [<ffffffff8116f527>] add_disk+0xdf/0x290 [<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop] [<ffffffffa0006000>] ? 0xffffffffa0005fff [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e [<ffffffff81096804>] sys_init_module+0x9c/0x1e0 [<ffffffff813329bb>] system_call_fastpath+0x16/0x1b Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb= 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe = 48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49 RIP [<ffffffff8113ce61>] internal_create_group+0x2a/0x170 RSP <ffff880007b3fde8> ---[ end trace a123eb592043acad ]--- Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Laurent Vivier <Laurent.Vivier@bull.net> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | drbd: fix warningAndrew Morton2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In file included from drivers/block/drbd/drbd_main.c:54: drivers/block/drbd/drbd_int.h:1190: warning: parameter has incomplete type Forward declarations of enums do not work. Fix it unpleasantly by moving the prototype. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Lars Ellenberg <drbd-dev@lists.linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | drbd: fix warningPhilipp Reisner2011-05-24
| | | | | | | | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| * | | drbd: Fix spellingBart Van Assche2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Found these with the help of ispell -l. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| * | | drbd: fix schedule in atomicLars Ellenberg2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An administrative detach used to request a state change directly to D_DISKLESS, first suspending IO to avoid the last put_ldev() occuring from an endio handler, potentially in irq context. This is not enough on the receiving side (typically secondary), we may miss some peer_req on the way to local disk, which then may do the last put_ldev() from their drbd_peer_request_endio(). This patch makes the detach always go through the intermediate D_FAILED state. We may consider to rename it D_DETACHING. Alternative approach would be to create yet an other work item to be scheduled on the worker, do the destructor work from there, and get the timing right. manually picked commit 564040f from the drbd 8.4 branch. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | drbd: Take a more conservative approach when deciding max_bio_sizePhilipp Reisner2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old (optimistic) implementation could shrink the bio size on an primary device. Shrinking the bio size on a primary device is bad. Since there we might get BIOs with the old (bigger) size shortly after we published the new size. The new implementation is more conservative, and eventually increases the max_bio_size on a primary device (which is valid). It does so, when it knows the local limit AND the remote limit. We cache the last seen max_bio_size of the peer in the meta data, and rely on that, to make the operation of single nodes more efficient. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | drbd: Fixed state transitions after async outdate-peer-handler returnedPhilipp Reisner2011-05-24
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | drbd: Disallow the peer_disk_state to be D_OUTDATED while connectedPhilipp Reisner2011-05-24
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | drbd: Fix for the connection problems on high latency linksPhilipp Reisner2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems that the real cause of all the issues where that we did not noticed in drbd_try_connect() when the other guy closes one socket if the round trip time gets higher than 100ms. There were that 100ms hard coded! Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | drbd: fix potential activity log refcount imbalance in error pathLars Ellenberg2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is no longer sufficient to trigger on local WRITE, we need to check on (rq_state & RQ_IN_ACT_LOG) before calling drbd_al_complete_io also in the error path. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | drbd: Only downgrade the disk state in case of disk failuresPhilipp Reisner2011-05-24
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | drbd: fix disconnect/reconnect loop, if ping-timeout == ping-intLars Ellenberg2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is no replication traffic within the idle timeout (ping-int seconds), DRBD will send a P_PING, and adjust the timeout to ping-timeout. If there is no P_PING_ACK received within this ping-timeout, DRBD finally drops the connection, and tries to re-establish it. To decide which timeout was active, we compared the current timeout with the ping-timeout, and dropped the connection, if that was the case. By default, ping-int is 10 seconds, ping-timeout is 500 ms. Unfortunately, if you configure ping-timeout to be the same as ping-int, expiry of the idle-timeout had been mistaken for a missing ping ack, and caused an immediate reconnection attempt. Fix: Allow both timeouts to be equal, use a local variable to store which timeout is active. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | drbd: fix potential distributed deadlockLars Ellenberg2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We limit ourselves to a configurable maximum number of pages used as temporary bio pages. If the configured "max_buffers" is not big enough to match the bandwidth of the respective deployment, a distributed deadlock could be triggered by e.g. fast online verify and heavy application IO. TCP connections would block on congestion, because both receivers would wait on pages to become available. Fortunately the respective senders in this case would be able to give back some pages already. So do that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | drbd: Fix for application IO with the on-io-error=pass-on policyPhilipp Reisner2011-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case a write failes on the local disk, go into D_INCONSISTENT disk state. That causes future reads of that block to be shipped to the peer. Read retry remote was already in place. Actually the documentation needs to get fixed now. Since the application is still shielded from the error. (as long as we have only a single disk failing) The difference to detach is that we keep the disk. And therefore might keep all the other, still working sectors up to date. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | Merge branches 'for-jens/xen-backend-fixes' and 'for-jens/xen-blkback-v3.3' ↵Jens Axboe2011-05-19
| |\ \ \ | | | | | | | | | | | | | | | of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-2.6.40/drivers
| | * | | xen/blkback: don't fail empty barrier requestsJan Beulich2011-05-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sector number on empty barrier requests may (will?) be -1, which, given that it's being treated as unsigned 64-bit quantity, will almost always exceed the actual (virtual) disk's size. Inspired by Konrad's "When writting barriers set the sector number to zero...". While at it also add overflow checking to the math in vbd_translate(). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/blkback: fix xenbus_transaction_start() hang caused by double ↵Laszlo Ersek2011-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xenbus_transaction_end() vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via double xenbus_transaction_end() calls. The next down_read() in xenbus_transaction_start() (at eg. the next resize attempt) hangs. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317 Acked-by: Jan Beulich <jbeulich@novell.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/blkback: Align the tabs on the structure.Konrad Rzeszutek Wilk2011-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent changes caused this field of the structure to be offset a bit. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/blkback: if log_stats is enabled print out the data.Konrad Rzeszutek Wilk2011-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | And not depend on the driver being built with -DDEBUG flag. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>