aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAge
...
| | | * | | rbd: separate layout initAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull a block of code that initializes the layout structure in an osd request into its own function so it can be reused. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: only get snap context for write requestsAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now we get the snapshot context for an rbd image (under protection of the header semaphore) for every request processed. There's no need to get the snap context if we're doing a read, so avoid doing so in that case. Note that we no longer need to hold the header semaphore to check the rbd_dev's existence flag. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: make exists flag atomicAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rbd_device->exists field can be updated asynchronously, changing from set to clear if a mapped snapshot disappears from the base image's snapshot context. Currently, value of the "exists" flag is only read and modified under protection of the header semaphore, but that will change with the next patch. Making it atomic ensures this won't be a problem because the a the non-existence of device will be immediately known. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: a little more cleanup of rbd_rq_fn()Alex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that a big hunk in the middle of rbd_rq_fn() has been moved into its own routine we can simplify it a little more. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: end request on error in rbd_do_request() callerAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only one of the three callers of rbd_do_request() provide a collection structure to aggregate status. If an error occurs in rbd_do_request(), have the caller take care of calling rbd_coll_end_req() if necessary in that one spot. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: encapsulate handling for a single requestAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In rbd_rq_fn(), requests are fetched from the block layer and each request is processed, looping through the request's list of bio's until they've all been consumed. Separate the handling for a single request into its own function to make it a bit easier to see what's going on. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: be picky about osd request status typeAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The result field in a ceph osd reply header is a signed 32-bit type, but rbd code often casually uses int to represent it. The following changes the types of variables that handle this result value to be "s32" instead of "int" to be completely explicit about it. Only at the point we pass that result to __blk_end_request() does the type get converted to the plain old int defined for that interface. There is almost certainly no binary impact of this change, but I prefer to show the exact size and signedness of the value since we know it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
| | | * | | rbd: standardize ceph_osd_request variable namesAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are spots where a ceph_osds_request pointer variable is given the name "req". Since we're dealing with (at least) three types of requests (block layer, rbd, and osd), I find this slightly distracting. Change such instances to use "osd_req" consistently to make the abstraction represented a little more obvious. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: standardize rbd_request variable namesAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two names used for items of rbd_request structure type: "req" and "req_data". The former name is also used to represent items of pointers to struct ceph_osd_request. Change all variables that have these names so they are instead called "rbd_req" consistently. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: add warnings to rbd_dev_probe_update_spec()Alex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Josh suggested adding warnings to this function to help users diagnose problems. Other than memory allocatino errors, there are two places where errors can be returned. Both represent problems that should have been caught earlier, and as such might well have been handled with BUG_ON() calls. But if either ever did manage to happen, it will be reported. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: add a warning in bio_chain_clone_range()Alex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a warning in bio_chain_clone_range() to help a user determine what exactly might have led to a failure. There is only one; please say something if you disagree with the following reasoning. There are three places this can return abnormally: - Initially, if there is nothing to clone. It turns out that right now this cannot happen anyway. The test is in place because the code below it doesn't work if those conditions don't hold. As such they could be assertions but since I can return a null to indicate an error I just do that instead. I have not added a warning here because it won't happen. - While processing bio's, if none remain but there are supposed to be more bytes to clone. Here I have added a warning. - If bio_clone_range() returns a null pointer. That function will have already produced a warning (at least the first time, via WARN_ON_ONCE()) to distinguish the cause of the error. The only exception is memory exhaustion, and I'd rather not pepper the code with warnings in all those spots. So no warning is added in that place. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: add warning messages for missing argumentsAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tell the user (via dmesg) what was wrong with the arguments provided via /sys/bus/rbd/add. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
| | | * | | rbd: define and use rbd_warn()Alex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define a new function rbd_warn() that produces a boilerplate warning message, identifying in the resulting message the affected rbd device in the best way available. Use it in a few places that now use pr_warning(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: use kmemdup()Alex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces two kmalloc()/memcpy() combinations with a single call to kmemdup(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: kill rbd_spec->image_id_lenAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no real benefit to keeping the length of an image id, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: kill rbd_spec->image_name_lenAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There may have been a benefit to hanging on to the length of an image name before, but there is really none now. The only time it's used is when probing for rbd images, so we can just compute the length then. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| | | * | | rbd: document rbd_spec structureAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I promised Josh I would document whether there were any restrictions needed for accessing fields of an rbd_spec structure. This adds a big block of comments that documents the structure and how it is used--including the fact that we don't attempt to synchronize access to it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* | | | | | Merge branch 'for-3.9/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2013-02-28
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver bits from Jens Axboe: "After the block IO core bits are in, please grab the driver updates from below as well. It contains: - Fix ancient regression in dac960. Nobody must be using that anymore... - Some good fixes from Guo Ghao for loop, fixing both potential oopses and deadlocks. - Improve mtip32xx for NUMA systems, by being a bit more clever in distributing work. - Add IBM RamSan 70/80 driver. A second round of fixes for that is pending, that will come in through for-linus during the 3.9 cycle as per usual. - A few xen-blk{back,front} fixes from Konrad and Roger. - Other minor fixes and improvements." * 'for-3.9/drivers' of git://git.kernel.dk/linux-block: loopdev: ignore negative offset when calculate loop device size loopdev: remove an user triggerable oops loopdev: move common code into loop_figure_size() loopdev: update block device size in loop_set_status() loopdev: fix a deadlock xen-blkback: use balloon pages for persistent grants xen-blkfront: drop the use of llist_for_each_entry_safe xen/blkback: Don't trust the handle from the frontend. xen-blkback: do not leak mode property block: IBM RamSan 70/80 driver fixes rsxx: add slab.h include to dma.c drivers/block/mtip32xx: add missing GENERIC_HARDIRQS dependency block: remove new __devinit/exit annotations on ramsam driver block: IBM RamSan 70/80 device driver drivers/block/mtip32xx/mtip32xx.c:1726:5: sparse: symbol 'mtip_send_trim' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c:4029:1: sparse: symbol 'mtip_workq_sdbf0' was not declared. Should it be static? dac960: return success instead of -ENOTTY mtip32xx: add trim support mtip32xx: Add workqueue and NUMA support block: delete super ancient PC-XT driver for 1980's hardware
| * | | | | | loopdev: ignore negative offset when calculate loop device sizeGuo Chao2013-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Negative offset may cause loop device size larger than backing file size. $ fallocate -l 1M a $ losetup --offset 0xffffffffffff0000 /dev/loop0 a $ blockdev --getsize64 /dev/loop0 1114112 $ ls -l a -rw-r--r-- 1 root root 1048576 Jan 23 12:46 a $ cat /dev/loop0 cat: /dev/loop0: Input/output error It makes no sense to do that. Only apply offset when it's positive. Fix a typo in the comment by the way. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | loopdev: remove an user triggerable oopsGuo Chao2013-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When loopdev is built as module and we pass an invalid parameter, loop_init() will return directly without deregister misc device, which will cause an oops when insert loop module next time because we left some garbage in the misc device list. Test case: sudo modprobe loop max_part=1024 (failed due to invalid parameter) sudo modprobe loop (oops) Clean up nicely to avoid such oops. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | loopdev: move common code into loop_figure_size()Guo Chao2013-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update block device size in accord with gendisk size and let userspace know the change in loop_figure_size(). This is a clean up to remove common code of loop_figure_size()'s two callers. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | loopdev: update block device size in loop_set_status()Guo Chao2013-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Loop device driver sometimes fails to impose the size limit on the device. Keep issuing following two commands: losetup --offset 7517244416 --sizelimit 3224971264 /dev/loop0 backed_file blockdev --getsize64 /dev/loop0 blockdev reports file size instead of sizelimit several out of 100 times. The problems are: - losetup set up the device in two ioctl: LOOP_SET_FD and LOOP_SET_STATUS64. - LOOP_SET_STATUS64 only update size of gendisk. Block device size will be updated lazily when device comes to use. If udev rushes in between the two ioctl, it will bring in a block device whose size is backing file size. If the device is not released after LOOP_SET_STATUS64 ioctl, blockdev will not see the updated size. Update block size in LOOP_SET_STATUS64 ioctl. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Reported-by: M. Hindess <hindessm@uk.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | loopdev: fix a deadlockGuo Chao2013-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bd_mutex and lo_ctl_mutex can be held in different order. Path #1: blkdev_open blkdev_get __blkdev_get (hold bd_mutex) lo_open (hold lo_ctl_mutex) Path #2: blkdev_ioctl lo_ioctl (hold lo_ctl_mutex) lo_set_capacity (hold bd_mutex) Lockdep does not report it, because path #2 actually holds a subclass of lo_ctl_mutex. This subclass seems creep into the code by mistake. The patch author actually just mentioned it in the changelog, see commit f028f3b2 ("loop: fix circular locking in loop_clr_fd()"), also see: http://marc.info/?l=linux-kernel&m=123806169129727&w=2 Path #2 hold bd_mutex to call bd_set_size(), I've protected it with i_mutex in a previous patch, so drop bd_mutex at this site. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | Merge branch 'stable/for-jens-3.9' of ↵Jens Axboe2013-02-20
| |\ \ \ \ \ \ | | | |_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.9/drivers Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.9 which has bug-fixes that did not make it in v3.8. They all are marked as material for the stable tree as well. There are two bug-fixes for the code that has been in there for some time (that is the Jan's fix and one of mine). And there are two bug-fixes for the persistent grant feature that debuted in v3.8 for xen blk[back|front]end.
| | * | | | | xen-blkback: use balloon pages for persistent grantsRoger Pau Monne2013-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With current persistent grants implementation we are not freeing the persistent grants after we disconnect the device. Since grant map operations change the mfn of the allocated page, and we can no longer pass it to __free_page without setting the mfn to a sane value, use balloon grant pages instead, as the gntdev device does. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: stable@vger.kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | | | xen-blkfront: drop the use of llist_for_each_entry_safeKonrad Rzeszutek Wilk2013-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace llist_for_each_entry_safe with a while loop. llist_for_each_entry_safe can trigger a bug in GCC 4.1, so it's best to remove it and use a while loop and do the deletion manually. Specifically this bug can be triggered by hot-unplugging a disk, either by doing xm block-detach or by save/restore cycle. BUG: unable to handle kernel paging request at fffffffffffffff0 IP: [<ffffffffa0047223>] blkif_free+0x63/0x130 [xen_blkfront] The crash call trace is: ... bad_area_nosemaphore+0x13/0x20 do_page_fault+0x25e/0x4b0 page_fault+0x25/0x30 ? blkif_free+0x63/0x130 [xen_blkfront] blkfront_resume+0x46/0xa0 [xen_blkfront] xenbus_dev_resume+0x6c/0x140 pm_op+0x192/0x1b0 device_resume+0x82/0x1e0 dpm_resume+0xc9/0x1a0 dpm_resume_end+0x15/0x30 do_suspend+0x117/0x1e0 When drilling down to the assembler code, on newer GCC it does .L29: cmpq $-16, %r12 #, persistent_gnt check je .L30 #, out of the loop .L25: ... code in the loop testq %r13, %r13 # n je .L29 #, back to the top of the loop cmpq $-16, %r12 #, persistent_gnt check movq 16(%r12), %r13 # <variable>.node.next, n jne .L25 #, back to the top of the loop .L30: While on GCC 4.1, it is: L78: ... code in the loop testq %r13, %r13 # n je .L78 #, back to the top of the loop movq 16(%rbx), %r13 # <variable>.node.next, n jmp .L78 #, back to the top of the loop Which basically means that the exit loop condition instead of being: &(pos)->member != NULL; is: ; which makes the loop unbound. Since xen-blkfront is the only user of the llist_for_each_entry_safe macro remove it from llist.h. Orabug: 16263164 CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | | | xen/blkback: Don't trust the handle from the frontend.Konrad Rzeszutek Wilk2013-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'handle' is the device that the request is from. For the life-time of the ring we copy it from a request to a response so that the frontend is not surprised by it. But we do not need it - when we start processing I/Os we have our own 'struct phys_req' which has only most essential information about the request. In fact the 'vbd_translate' ends up over-writing the preq.dev with a value from the backend. This assignment of preq.dev with the 'handle' value is superfluous so lets not do it. Cc: stable@vger.kernel.org Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | | | xen-blkback: do not leak mode propertyJan Beulich2013-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "be->mode" is obtained from xenbus_read(), which does a kmalloc() for the message body. The short string is never released, so do it along with freeing "be" itself, and make sure the string isn't kept when backend_changed() doesn't complete successfully (which made it desirable to slightly re-structure that function, so that the error cleanup can be done in one place). Reported-by: Olaf Hering <olaf@aepfle.de> CC: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | | block: IBM RamSan 70/80 driver fixesPhilip J Kelleher2013-02-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch includes the following driver fixes for the IBM RamSan 70/80 driver: o Changed the creg_ctrl lock from a mutex to a spinlock. o Added a count check for ioctl calls. o Removed unnecessary casting of void pointers. o Made every function static that needed to be. o Added comments to explain things more thoroughly. Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | Merge branch 'delete-xt-disk' of ↵Jens Axboe2013-02-14
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux into for-3.9/drivers Paul writes: Please pull the following to get the removal of the original IBM PC-XT hard disk driver from the block layer (drivers/block/xd.c). As near as I can tell, it hasn't seen a run time fix in over a dozen years, and with drive sizes of 10-20MB, and performance of about 128kB/s maximum, it is no surprise that it has been completely unused for well over a decade. The removal was originally posted[1] well over a month ago, and since then, there has been nobody objecting to the removal, aside from someone who had mistakenly confused it with a completely different driver (hd.c)
| | * | | | | | block: delete super ancient PC-XT driver for 1980's hardwarePaul Gortmaker2013-01-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This driver was for the 8 bit ISA cards that were installed in the PC-XT machines of 1980 vintage. They supported the dual ribbon cable MFM drives of 10-20MB capacity, and ran at a 3:1 interleave, giving performance on the order of 128kB/s. By the introduction of the PC-AT (286) these controllers were already scrapped in favour of 16 bit controllers with some onboard RAM that could support a 1:1 interleave. The git history doesn't show any evidence of runtime fixes that would reflect active usage; instead just the usual tree-wide API type changes/cleanups. Going back to in-source changelogs, the last "runtime" fix that is evident is something I did over a dozen years ago[1] -- and even back then, the hardware was long since unavailable, so that ancient fix was also not runtime tested. The time is long overdue for this to get flushed, so lets get rid of it before anyone wastes more time doing builds and sparse checks etc. on long since dead code. [1] http://lkml.indiana.edu/hypermail/linux/kernel/0102.2/0027.html Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * | | | | | | rsxx: add slab.h include to dma.cJens Axboe2013-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kbuild test robot says: tree: git://git.kernel.dk/linux-block.git for-3.9/drivers head: 1262e24a59a052f9a98383e47e7c903712490d5c commit: 8722ff8cdbfac9c1b20e67bb067b455c48cb8e93 [6/8] block: IBM RamSan 70/80 device driver config: make ARCH=alpha allyesconfig All error/warnings: drivers/block/rsxx/dma.c: In function 'rsxx_complete_dma': >> drivers/block/rsxx/dma.c:251:2: error: implicit declaration of function 'kmem_cache_free' [-Werror=implicit-function-declaration] drivers/block/rsxx/dma.c: In function 'rsxx_queue_discard': >> drivers/block/rsxx/dma.c:567:2: error: implicit declaration of function 'kmem_cache_alloc' [-Werror=implicit-function-declaration] >> drivers/block/rsxx/dma.c:567:6: warning: assignment makes pointer from integer without a cast [enabled by default] drivers/block/rsxx/dma.c: In function 'rsxx_queue_dma': >> drivers/block/rsxx/dma.c:601:6: warning: assignment makes pointer from integer without a cast [enabled by default] drivers/block/rsxx/dma.c: In function 'rsxx_dma_init': >> drivers/block/rsxx/dma.c:985:2: error: implicit declaration of function 'KMEM_CACHE' [-Werror=implicit-function-declaration] >> drivers/block/rsxx/dma.c:985:29: error: 'rsxx_dma' undeclared (first use in this function) drivers/block/rsxx/dma.c:985:29: note: each undeclared identifier is reported only once for each function it appears in >> drivers/block/rsxx/dma.c:985:39: error: 'SLAB_HWCACHE_ALIGN' undeclared (first use in this function) drivers/block/rsxx/dma.c: In function 'rsxx_dma_cleanup': >> drivers/block/rsxx/dma.c:995:2: error: implicit declaration of function 'kmem_cache_destroy' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | drivers/block/mtip32xx: add missing GENERIC_HARDIRQS dependencyHeiko Carstens2013-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MTIP32XX driver calls devm_request_irq() and therefore needs a GENERIC_HARDIRQS dependency to prevent building it on s390. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | block: remove new __devinit/exit annotations on ramsam driverStephen Rothwell2013-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | block: IBM RamSan 70/80 device driverjosh.h.morris@us.ibm.com2013-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch includes the device driver for the IBM RamSan family of PCI SSD flash storage cards. This driver will include support for the RamSan 70 and 80. The driver presents a block device for device I/O. Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | drivers/block/mtip32xx/mtip32xx.c:1726:5: sparse: symbol 'mtip_send_trim' ↵Fengguang Wu2013-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | was not declared. Should it be static? Hi Asai, FYI, there are new sparse warnings show up in tree: git://git.kernel.dk/linux-block.git for-3.9/drivers head: 3d6a87430e764cf4132710538139c10970929189 commit: 152834694dcb90d3088bc77fa436755c66553929 [2/3] mtip32xx: add trim support >> drivers/block/mtip32xx/mtip32xx.c:1726:5: sparse: symbol 'mtip_send_trim' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c:3348:17: sparse: cast to restricted __le32 drivers/block/mtip32xx/mtip32xx.c:4125:1: sparse: symbol 'mtip_workq_sdbf0' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c:4126:1: sparse: symbol 'mtip_workq_sdbf1' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c:4127:1: sparse: symbol 'mtip_workq_sdbf2' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c:4128:1: sparse: symbol 'mtip_workq_sdbf3' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c:4129:1: sparse: symbol 'mtip_workq_sdbf4' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c:4130:1: sparse: symbol 'mtip_workq_sdbf5' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c:4131:1: sparse: symbol 'mtip_workq_sdbf6' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c:4132:1: sparse: symbol 'mtip_workq_sdbf7' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c: In function 'mtip_hw_read_flags': drivers/block/mtip32xx/mtip32xx.c:2804:1: warning: the frame size of 1036 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/block/mtip32xx/mtip32xx.c: In function 'mtip_hw_read_registers': drivers/block/mtip32xx/mtip32xx.c:2781:1: warning: the frame size of 1044 bytes is larger than 1024 bytes [-Wframe-larger-than=] Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | drivers/block/mtip32xx/mtip32xx.c:4029:1: sparse: symbol 'mtip_workq_sdbf0' ↵Fengguang Wu2013-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | was not declared. Should it be static? Hi Asai, FYI, there are new sparse warnings show up in tree: git://git.kernel.dk/linux-block.git for-3.9/drivers head: 3d6a87430e764cf4132710538139c10970929189 commit: 16c906e51c6f08a15763a85b5686e1ded35e77ab [1/3] mtip32xx: Add workqueue and NUMA support drivers/block/mtip32xx/mtip32xx.c:3267:17: sparse: cast to restricted __le32 >> drivers/block/mtip32xx/mtip32xx.c:4029:1: sparse: symbol 'mtip_workq_sdbf0' was not declared. Should it be static? >> drivers/block/mtip32xx/mtip32xx.c:4030:1: sparse: symbol 'mtip_workq_sdbf1' was not declared. Should it be static? >> drivers/block/mtip32xx/mtip32xx.c:4031:1: sparse: symbol 'mtip_workq_sdbf2' was not declared. Should it be static? >> drivers/block/mtip32xx/mtip32xx.c:4032:1: sparse: symbol 'mtip_workq_sdbf3' was not declared. Should it be static? >> drivers/block/mtip32xx/mtip32xx.c:4033:1: sparse: symbol 'mtip_workq_sdbf4' was not declared. Should it be static? >> drivers/block/mtip32xx/mtip32xx.c:4034:1: sparse: symbol 'mtip_workq_sdbf5' was not declared. Should it be static? >> drivers/block/mtip32xx/mtip32xx.c:4035:1: sparse: symbol 'mtip_workq_sdbf6' was not declared. Should it be static? >> drivers/block/mtip32xx/mtip32xx.c:4036:1: sparse: symbol 'mtip_workq_sdbf7' was not declared. Should it be static? drivers/block/mtip32xx/mtip32xx.c: In function 'mtip_hw_read_flags': drivers/block/mtip32xx/mtip32xx.c:2723:1: warning: the frame size of 1036 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/block/mtip32xx/mtip32xx.c: In function 'mtip_hw_read_registers': drivers/block/mtip32xx/mtip32xx.c:2700:1: warning: the frame size of 1044 bytes is larger than 1024 bytes [-Wframe-larger-than=] Please consider folding the below diff :-) Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | dac960: return success instead of -ENOTTYDan Carpenter2013-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a missing break statement here. This used to return directly but we re-worked it in 2008 to add locking as part of the BKL push down. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | mtip32xx: add trim supportAsai Thambi S P2013-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TRIM support added through vendor unique command. Signed-off-by: Sam Bradshaw < sbradshaw@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | mtip32xx: Add workqueue and NUMA supportAsai Thambi S P2013-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains * parallel command completion using workers * bind the workers to the chosen numa node * bind isr to the chosen numa node * allocating memory in the chosen numa node Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | | | Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-blockLinus Torvalds2013-02-28
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block IO core bits from Jens Axboe: "Below are the core block IO bits for 3.9. It was delayed a few days since my workstation kept crashing every 2-8h after pulling it into current -git, but turns out it is a bug in the new pstate code (divide by zero, will report separately). In any case, it contains: - The big cfq/blkcg update from Tejun and and Vivek. - Additional block and writeback tracepoints from Tejun. - Improvement of the should sort (based on queues) logic in the plug flushing. - _io() variants of the wait_for_completion() interface, using io_schedule() instead of schedule() to contribute to io wait properly. - Various little fixes. You'll get two trivial merge conflicts, which should be easy enough to fix up" Fix up the trivial conflicts due to hlist traversal cleanups (commit b67bfe0d42ca: "hlist: drop the node parameter from iterators"). * 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits) block: remove redundant check to bd_openers() block: use i_size_write() in bd_set_size() cfq: fix lock imbalance with failed allocations drivers/block/swim3.c: fix null pointer dereference block: don't select PERCPU_RWSEM block: account iowait time when waiting for completion of IO request sched: add wait_for_completion_io[_timeout] writeback: add more tracepoints block: add block_{touch|dirty}_buffer tracepoint buffer: make touch_buffer() an exported function block: add @req to bio_{front|back}_merge tracepoints block: add missing block_bio_complete() tracepoint block: Remove should_sort judgement when flush blk_plug block,elevator: use new hashtable implementation cfq-iosched: add hierarchical cfq_group statistics cfq-iosched: collect stats from dead cfqgs cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats() blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock block: RCU free request_queue blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge() ...
| * | | | | | | | drivers/block/swim3.c: fix null pointer dereferenceCong Ding2013-02-22
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of pointer fs should be after the null check. Signed-off-by: Cong Ding <dinggnu@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | | | nbd: fix sparse warningAlex Elder2013-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I just fixed this in "drivers/block/rbd.c" and I noticed that "drivers/block/nbd.c" has the same problem. Fix a warning issued by sparse by adding some lockdep annotations to indicate the queue lock gets dropped (because it's held when do_nbd_request() is called) and re-acquired within the function. Signed-off-by: Alex Elder <elder@inktank.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Paul Clements <paul.clements@us.sios.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | nbd: show read-only state in sysfsPaolo Bonzini2013-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the read-only flag to set_device_ro, so that it will be visible to the block layer and in sysfs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: Alex Bligh <alex@alex.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | nbd: fsync and kill block device on shutdownPaolo Bonzini2013-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two problems with shutdown in the NBD driver. 1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem. This patch adds the sync operation into __nbd_ioctl()'s NBD_DISCONNECT handler. This is useful because BLKFLSBUF is restricted to processes that have CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the block device does not sync the filesystem, either). 2: Once we clear the socket we have no guarantee that later reads will come from the same backing storage. The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket clearing code so the page cache is cleaned, lest reads that hit on the page cache will return stale data from the previously-accessible disk. Example: # qemu-nbd -r -c/dev/nbd0 /dev/sr0 # file -s /dev/nbd0 /dev/stdin: # UDF filesystem data (version 1.5) etc. # qemu-nbd -d /dev/nbd0 # qemu-nbd -r -c/dev/nbd0 /dev/sda # file -s /dev/nbd0 /dev/stdin: # UDF filesystem data (version 1.5) etc. While /dev/sda has: # file -s /dev/sda /dev/sda: x86 boot sector; etc. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paul Clements <Paul.Clements@steeleye.com> Cc: Alex Bligh <alex@alex.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | nbd: support FLUSH requestsAlex Bligh2013-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the NBD device does not accept flush requests from the Linux block layer. If the NBD server opened the target with neither O_SYNC nor O_DSYNC, however, the device will be effectively backed by a writeback cache. Without issuing flushes properly, operation of the NBD device will not be safe against power losses. The NBD protocol has support for both a cache flush command and a FUA command flag; the server will also pass a flag to note its support for these features. This patch adds support for the cache flush command and flag. In the kernel, we receive the flags via the NBD_SET_FLAGS ioctl, and map NBD_FLAG_SEND_FLUSH to the argument of blk_queue_flush. When the flag is active the block layer will send REQ_FLUSH requests, which we translate to NBD_CMD_FLUSH commands. FUA support is not included in this patch because all free software servers implement it with a full fdatasync; thus it has no advantage over supporting flush only. Because I [Paolo] cannot really benchmark it in a realistic scenario, I cannot tell if it is a good idea or not. It is also not clear if it is valid for an NBD server to support FUA but not flush. The Linux block layer gives a warning for this combination, the NBD protocol documentation says nothing about it. The patch also fixes a small problem in the handling of flags: nbd->flags must be cleared at the end of NBD_DO_IT, but the driver was not doing that. The bug manifests itself as follows. Suppose you two different client/server pairs to start the NBD device. Suppose also that the first client supports NBD_SET_FLAGS, and the first server sends NBD_FLAG_SEND_FLUSH; the second pair instead does neither of these two things. Before this patch, the second invocation of NBD_DO_IT will use a stale value of nbd->flags, and the second server will issue an error every time it receives an NBD_CMD_FLUSH command. This bug is pre-existing, but it becomes much more important after this patch; flush failures make the device pretty much unusable, unlike Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Bligh <alex@alex.org.uk> Acked-by: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | drbd: convert to idr_alloc()Tejun Heo2013-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | block/loop: convert to idr_alloc()Tejun Heo2013-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | block/loop: don't use idr_remove_all()Tejun Heo2013-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-02-26
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...