aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAge
...
| * | | drbd: cleanup bogus assert messageLars Ellenberg2013-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes ASSERT( mdev->state.disk == D_FAILED ) in drivers/block/drbd/drbd_main.c When we detach from local disk, we let the local refcount hit zero twice. First, we transition to D_FAILED, so we won't give out new references to incoming requests; we still may give out *internal* references, though. Once the refcount hits zero [1] while in D_FAILED, we queue a transition to D_DISKLESS to our worker. We need to queue it, because we may be in atomic context when putting the reference. Once the transition to D_DISKLESS actually happened [2] from worker context, we don't give out new internal references either. Between hitting zero the first time [1] and actually transition to D_DISKLESS [2], there may be a few very short lived internal get/put, so we may hit zero more than once while being in D_FAILED, or even see a race where a an internal get_ldev() happened while D_FAILED, but the corresponding put_ldev() happens just after the transition to D_DISKLESS. That's why we have the additional test_and_set_bit(GO_DISKLESS,); and that's why the assert was placed wrong. Since there was exactly one code path left to drbd_go_diskless(), and that checks already for D_FAILED, drop that assert, and fold in the drbd_queue_work(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-blockLinus Torvalds2013-05-08
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block core updates from Jens Axboe: - Major bit is Kents prep work for immutable bio vecs. - Stable candidate fix for a scheduling-while-atomic in the queue bypass operation. - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging discard bios. - Tejuns changes to convert the writeback thread pool to the generic workqueue mechanism. - Runtime PM framework, SCSI patches exists on top of these in James' tree. - A few random fixes. * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits) relay: move remove_buf_file inside relay_close_buf partitions/efi.c: replace useless kzalloc's by kmalloc's fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read() block: fix max discard sectors limit blkcg: fix "scheduling while atomic" in blk_queue_bypass_start Documentation: cfq-iosched: update documentation help for cfq tunables writeback: expose the bdi_wq workqueue writeback: replace custom worker pool implementation with unbound workqueue writeback: remove unused bdi_pending_list aoe: Fix unitialized var usage bio-integrity: Add explicit field for owner of bip_buf block: Add an explicit bio flag for bios that own their bvec block: Add bio_alloc_pages() block: Convert some code to bio_for_each_segment_all() block: Add bio_for_each_segment_all() bounce: Refactor __blk_queue_bounce to not use bi_io_vec raid1: use bio_copy_data() pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage pktcdvd: use bio_copy_data() block: Add bio_copy_data() ...
| * \ \ \ Merge branch 'writeback-workqueue' of ↵Jens Axboe2013-04-02
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-3.10/core Tejun writes: ----- This is the pull request for the earlier patchset[1] with the same name. It's only three patches (the first one was committed to workqueue tree) but the merge strategy is a bit involved due to the dependencies. * Because the conversion needs features from wq/for-3.10, block/for-3.10/core is based on rc3, and wq/for-3.10 has conflicts with rc3, I pulled mainline (rc5) into wq/for-3.10 to prevent those workqueue conflicts from flaring up in block tree. * Resolving the issue that Jan and Dave raised about debugging requires arch-wide changes. The patchset is being worked on[2] but it'll have to go through -mm after these changes show up in -next, and not included in this pull request. The three commits are located in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git writeback-workqueue Pulling it into block/for-3.10/core produces a conflict in drivers/md/raid5.c between the following two commits. e3620a3ad5 ("MD RAID5: Avoid accessing gendisk or queue structs when not available") 2f6db2a707 ("raid5: use bio_reset()") The conflict is trivial - one removes an "if ()" conditional while the other removes "rbi->bi_next = NULL" right above it. We just need to remove both. The merged branch is available at git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git block-test-merge so that you can use it for verification. The test merge commit has proper merge description. While these changes are a bit of pain to route, they make code simpler and even have, while minute, measureable performance gain[3] even on a workload which isn't particularly favorable to showing the benefits of this conversion. ---- Fixed up the conflict. Conflicts: drivers/md/raid5.c Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | aoe: Fix unitialized var usageKent Overstreet2013-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4f2ac93c175c4922bdddbfec6cad94b32cea0070 (block: Remove bi_idx references) accidently removed the bit that set bv - readd that. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: fengguang.wu@intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | block: Add bio_for_each_segment_all()Kent Overstreet2013-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __bio_for_each_segment() iterates bvecs from the specified index instead of bio->bv_idx. Currently, the only usage is to walk all the bvecs after the bio has been advanced by specifying 0 index. For immutable bvecs, we need to split these apart; bio_for_each_segment() is going to have a different implementation. This will also help document the intent of code that's using it - bio_for_each_segment_all() is only legal to use for code that owns the bio. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Neil Brown <neilb@suse.de> CC: Boaz Harrosh <bharrosh@panasas.com>
| * | | | | pktcdvd: Use bio_reset() in disabled code to kill bi_idx usageKent Overstreet2013-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the short term this'll help with code auditing, and if this code ever gets used now it's converted :) Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jiri Kosina <jkosina@suse.cz>
| * | | | | pktcdvd: use bio_copy_data()Kent Overstreet2013-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Jiri Kosina <jkosina@suse.cz>
| * | | | | block: Remove bi_idx referencesKent Overstreet2013-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For immutable bvecs, all bi_idx usage needs to be audited - so here we're removing all the unnecessary uses. Most of these are places where it was being initialized on a bio that was just allocated, a few others are conversions to standard macros. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk>
| * | | | | block: Use bio_sectors() more consistentlyKent Overstreet2013-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bunch of places in the code weren't using it where they could be - this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: "Ed L. Cashin" <ecashin@coraid.com> CC: Nick Piggin <npiggin@kernel.dk> CC: Jiri Kosina <jkosina@suse.cz> CC: Jim Paris <jim@jtan.com> CC: Geoff Levand <geoff@infradead.org> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Acked-by: Ed Cashin <ecashin@coraid.com>
| * | | | | block: Add bio_end_sector()Kent Overstreet2013-03-23
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just a little convenience macro - main reason to add it now is preparing for immutable bio vecs, it'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Lars Ellenberg <drbd-dev@lists.linbit.com> CC: Jiri Kosina <jkosina@suse.cz> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Heiko Carstens <heiko.carstens@de.ibm.com> CC: linux-s390@vger.kernel.org CC: Chris Mason <chris.mason@fusionio.com> CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-05-07
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "A couple of fixes + getting rid of __blkdev_put() return value" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: proc: Use PDE attribute setting accessor functions make blkdev_put() return void block_device_operations->release() should return void mtd_blktrans_ops->release() should return void hfs: SMP race on directory close()
| * | | | | block_device_operations->release() should return voidAl Viro2013-05-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value passed is 0 in all but "it can never happen" cases (and those only in a couple of drivers) *and* it would've been lost on the way out anyway, even if something tried to pass something meaningful. Just don't bother. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-05-06
|\ \ \ \ \ \ | |/ / / / / |/| | | | / | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Alex Elder: "This is a big pull. Most of it is culmination of Alex's work to implement RBD image layering, which is now complete (yay!). There is also some work from Yan to fix i_mutex behavior surrounding writes in cephfs, a sync write fix, a fix for RBD images that get resized while they are mapped, and a few patches from me that resolve annoying auth warnings and fix several bugs in the ceph auth code." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (254 commits) rbd: fix image request leak on parent read libceph: use slab cache for osd client requests libceph: allocate ceph message data with a slab allocator libceph: allocate ceph messages with a slab allocator rbd: allocate image object names with a slab allocator rbd: allocate object requests with a slab allocator rbd: allocate name separate from obj_request rbd: allocate image requests with a slab allocator rbd: use binary search for snapshot lookup rbd: clear EXISTS flag if mapped snapshot disappears rbd: kill off the snapshot list rbd: define rbd_snap_size() and rbd_snap_features() rbd: use snap_id not index to look up snap info rbd: look up snapshot name in names buffer rbd: drop obj_request->version rbd: drop rbd_obj_method_sync() version parameter rbd: more version parameter removal rbd: get rid of some version parameters rbd: stop tracking header object version rbd: snap names are pointer to constant data ...
| * | | | rbd: fix image request leak on parent readAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a read for a layered image object finds the target object doesn't exist, a read image request for the parent image is created and submitted. When that completes, the callback routine was not releasing that parent image request. Fix that. The slab allocation stuff just added has greatly simplified the search for the source of this memory leak. This resolves: http://tracker.ceph.com/issues/4803 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: allocate image object names with a slab allocatorAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The names of objects used for image object requests are always fixed size. So create a slab cache to manage them. Define a new function rbd_segment_name_free() to match rbd_segment_name() (which is what supplies the dynamically-allocated name buffer). This is part of: http://tracker.ceph.com/issues/3926 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: allocate object requests with a slab allocatorAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a slab cache to manage rbd_obj_request allocation. We aren't using a constructor, and we'll zero-fill object request structures when they're allocated. This is part of: http://tracker.ceph.com/issues/3926 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: allocate name separate from obj_requestAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The next patch will define a slab allocator for a object requests. To use that we'll need to allocate the name of an object separate from the request structure itself. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: allocate image requests with a slab allocatorAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a slab cache to manage rbd_img_request allocation. Nothing too fancy at this point--we'll still initialize everything at allocation time (no constructor) This is part of: http://tracker.ceph.com/issues/3926 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: use binary search for snapshot lookupAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use bsearch(3) to make snapshot lookup by id more efficient. (There could be thousands of snapshots, and conceivably many more.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: clear EXISTS flag if mapped snapshot disappearsAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This functionality inadvertently disappeared in the last patch. Image snapshots can get removed at just about any time. In particular it can disappear even if it is in use by an rbd client as a mapped image. The rbd client deals with such a disappearance by responding to new requests with ENXIO. This is implemented by each rbd device maintaining an EXISTS flag, which is normally set but cleared if a snapshot disappears. This patch (re-)implements the clearing of that flag. Whenever mapped image header information is refreshed, if the mapping is for a snapshot, verify the mapped snapshot is still present in the updated snapshot context. If it is not, clear the flag. It is not necessary to check this in the initial probe, because the probe will not succeed if the snapshot doesn't exist. This resolves: http://tracker.ceph.com/issues/4880 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: kill off the snapshot listAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We no longer use the snapshot list for anything. When we need to look up a snapshot name, id, size, or feature mask, we just do it directly rather than relying on this list being updated with every refresh. The main reason it existed was for the benefit of the device/sysfs entries that previously were associated with snapshots. So get rid of the snapshot list, and struct rbd_snap, and the hundreds of lines of code that supported them. This resolves: http://tracker.ceph.com/issues/4868 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: define rbd_snap_size() and rbd_snap_features()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch defines a handful of new functions that will allow us to get rid of the rbd device structure's list of snapshots. Define rbd_snap_id_by_name() to look up a snapshot id given its name. This is efficient for format 1 images but not for format 2. Fortunately it only gets called at mapping time so it's not that critical. Use rbd_snap_id_by_name() to find out the id for a snapshot getting mapped, and pass that id to new functions rbd_snap_size() and rbd_snap_features() to look up information about a given snapshot's size and feature mask given its snapshot id. All this gets done in rbd_dev_mapping_set(). As a result, snap_by_name() is no longer needed, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: use snap_id not index to look up snap infoAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to align with what was needed for format 1 rbd images, rbd_dev_v2_snap_info() was set up to take as argument an index into the array of snapshot ids in a rbd device's snapshot context. This switches that around, so we pass the snapshot id instead. In doing this, rbd_snap_name() now returns a dynamically-allocated string rather than a fixed one, so there's no need to make a duplicate in its caller, rbd_dev_spec_update(). This means the following functions take a snapshot id where they previously used an index value: rbd_dev_snap_info() rbd_dev_v1_snap_info() rbd_dev_v2_snap_info() A new function, rbd_dev_snap_index(), determines the snap index for format 1 images and uses it to look up the name. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: look up snapshot name in names bufferAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than scanning the list of snapshot structures for it, scan the snapshot context buffer containing snapshot names in order to determine for a format 1 image the name associated with a given snapshot id. Pull out the part of rbd_dev_v1_snap_info() that does this scan into a new function, _rbd_dev_v1_snap_name(). Have that function return a dynamically-allocated copy of the name, and don't duplicate it in rbd_dev_v1_snap_info(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: drop obj_request->versionAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nothing ever uses the version field maintained in the object request structure any more, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: drop rbd_obj_method_sync() version parameterAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only NULL is passed as the version argument to rbd_obj_method_sync(), so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: more version parameter removalAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continued from the last patch, more parameters that can go away because we no longer have a need to track object versions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: get rid of some version parametersAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several functions in rbd have parameters meant to allow the version of an object to be passed in or out. The purpose of those was to allow the version of a header object to be maintained, but we no longer do that. As a result, these parameters are never actually needed or used, so get rid of them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: stop tracking header object versionAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rbd code takes care to maintain the version of the header object. This was done in hopes of using it to detect a change in the object between reading it and setting up a watch request to be notified of changes. The mechanism was never fully implemented, however. And we now avoid the original problem by setting up the watch request before ever reading the content of the header. The osd doesn't interpret the object version supplied with a WATCH osd op, nor does it use the version supplied with a NOTIFY_ACK op (we can just supply 0 for both). There is therefore no need to maintain the header's object version any more, so stop doing so. We'll be able to simplify some more rbd code in the next few patches as a result of this. This resolves: http://tracker.ceph.com/issues/3952 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: snap names are pointer to constant dataAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make explicit that snapshot names don't change by making functions return and take parameters that that point to const qualified data. This resolves: http://tracker.ceph.com/issues/4867 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: don't revalidate so muchAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whenever a header object event causes a mapped rbd image to refresh its header information, revalidate_disk() is being called. This was done in rbd_dev_refresh() outside the control mutex in order to avoid a lock inversion. Although a an event like this *might* indicate the image has changed size, most of the time it does not. Record the image size before and after the refresh, and only call revalidate_disk() if it changes. This resolves: http://tracker.ceph.com/issues/4867 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: fix up the layering warning messageAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A warning gets spewed for any image being probed, including parent images. Set up a condition such that the warning message only gets printed for the image being mapped, not any of its parents. Also, I didn't like the way the warning ended up being so long. Make it a terse warning instead. People experimenting with layering will know what the message means. This is part of: http://tracker.ceph.com/issues/4867 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | ceph: use ceph_create_snap_context()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have a library routine to create snap contexts, use it. This is part of: http://tracker.ceph.com/issues/4857 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: set up devices only for mapped imagesAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stop setting up Linux devices during the image probe operation. Instead, set up the devices as a separate step after the image probe, in rbd_add(). A consequence of this is that only mapped images get devices assigned to them, which is pretty sweet. This resolves: http://tracker.ceph.com/issues/4774 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: don't have device release destroy rbd_devAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently an rbd_device structure gets destroyed from the release routine for the device embedded within it. Stop doing that, instead calling rbd_dev_image_release() right after rbd_bus_del_dev() wherever the latter is called. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: define rbd_dev_unprobe()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define a new function rbd_dev_unprobe() which undoes state changes that occur from calling rbd_dev_v1_probe() or rbd_dev_v2_probe(). Note that this is a superset of rbd_header_free(), which is now getting removed (it seems to have been used improperly anyway). Flesh out rbd_dev_image_release() so it undoes exactly what rbd_dev_image_probe() does. This means that: - rbd_dev_device_release() gets called when the last device reference gets dropped; - that undoes everything done by the rbd_dev_device_setup() call at the end of rbd_dev_image_probe() (and nothing more), ending by calling rbd_dev_image_release(); and - rbd_dev_image_release() undoes everything else done by rbd_dev_image_probe() (and this includes a call to rbd_dev_unprobe(). This means the image and device portions of an rbd device are fairly cleanly separated now, so error paths should be a little easier to verify than they used to be. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: don't destroy rbd_dev in device release functionAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename rbd_dev_probe_finish() to be rbd_dev_device_setup(). Its purpose is to set up the Linux side of an rbd device mapping. Rename rbd_dev_release() to be rbd_dev_device_release(), making it more obvious it serves as the inverse of the setup function (or it will). Encapsulate some of what was done in rbd_dev_release() into a new function rbd_dev_image_release(), which serves as the inverse of setting up the ceph side of the mapped rbd image. Define a new helper rbd_dev_clear_mapping() to simply zero out the fields of a mapping structure--the inverse of rbd_dev_set_mapping(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: drop module laterAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop the module reference at the end of rbd_remove() for symmetry with adding a reference at the top of rbd_add(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: set up watch in rbd_dev_image_probe()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move setting up the watch request for an image so it's done in rbd_dev_image_probe() rather than rbd_dev_probe_finish(). Move it all the way up to before doing the initial probe. This avoids a potential race condition, in which we get (and use) the initial snapshot context for an image, and it gets changed between that time and the time we get the watch set up. This resolves: http://tracker.ceph.com/issues/3871 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: don't bother checking whether order changesAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a format 2 image is refreshed, code is in place to verify that the object order never changes from what it was originally. This relies on the fact that the refresh will occur *after* an initial load of information about the image. An upcoming patch makes it possible for the refresh to occur first, so we can no longer make this order check. The order really can't ever change anyway--this was just a sanity check. So get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: don't clean up watch in device release functionAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, a watch on an rbd device header object gets torn down when its final Linux device reference gets dropped. Instead, tear it down when removing the device. If an error occurs cleaning up the watch event when unmapping, abort the unmap request. All images (including parents) still get watch requests set up, so tear these down also, in rbd_dev_remove_parent(). For now, ignore any errors that occur in this case. Get rid of local variable "rc" in rbd_remove(); use "ret" instead (they both somehow ended up defined in the function and only one is needed). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: define rbd_header_name()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define a new function rbd_header_name(), which allocates and formats the name of the header object for the rbd device. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: move more initialization into rbd_dev_image_probe()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move a block of initialization related to the "ceph-side" of an rbd image out of rbd_dev_probe_finish() and into rbd_dev_image_probe(). Add appropriate error handling to clean things up in the event any of these new functions return an error. We know that rbd_dev_snaps_update(), rbd_dev_spec_update(), and rbd_dev_probe_parent() all clean up after themselves before they return an error, so no special cleanup is required except when an earlier call succeeds. Since rbd_dev_spec_update() only updates the spec field (whose cleanup will be handled by dropping the last reference to the spec) there is no cleanup action associatied with that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: probe for the parent earlierAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Probe for a parent device earlier in rbd_dev_probe_finish(), before starting to set up the Linux side of the rbd device. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: remove parent devices on probe errorAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an error occurs while finishing probing a device it is assumed that parent devices get cleaned up when deleting a device. They don't. Add a call to clean them up. Note that this means the parent spec will already be cleaned up so it doesn't have to be in one of the rbd_add() error paths. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: fix rbd_dev_remove_parent()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In certain error paths, it is possible for an rbd device to have a parent spec but no parent rbd_dev. In rbd_dev_remove_parent() use the parent field rather than parent_spec in determining whether to try to remove any parent devices. Use assertions to indicate that any non-null parent pointer has parent_spec associated with it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: kill __rbd_remove()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function __rbd_remove() is used in two spots, and it's fairly simple. It combines cleanup of part of the ceph-side state as well as cleaning up the Linux-side state. Just open code it in the two callers and eliminate the function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: set mapping info earlierAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set the mapping size and features earlier in rbd_dev_probe_finish(). Define rbd_dev_mapping_clear() as an inverse for setting those fields, and use it both in error handling in rbd_dev_image_probe() and in the final cleanup in rbd_dev_release(). Change the name of rbd_dev_set_mapping() to of rbd_dev_mapping_set(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: encapsulate removing parent devicesAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encapsulate the code that removes an rbd device's parent images into a new function, rbd_dev_remove_parent(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: encapsulate probing for parent devicesAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encapsulate the code that probes for an rbd device's parent images into a new function, rbd_dev_probe_parent(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>