aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAge
...
| * | | | rbd: record aggregate image transfer countAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compute the total number of bytes transferred for an image request--the sum across each of the request's object requests. To avoid contention do it only when all object requests are complete, in rbd_img_request_complete(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: record overall image request resultAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If any image object request produces a non-zero result, preserve that as the result of the overall image request. If multiple objects have non-zero results, save only the first one. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: update feature bitsAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a new rbd feature bit defined for "fancy striping." Add it to the ones defined in the kernel client. Change RBD_FEATURES_ALL so it represents the set of all feature bits (rather than just the ones we support). Define a new symbol RBD_FEATURES_SUPPORTED to indicate the supported ones. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: make method call data be a separate data itemAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now the data for a method call is specified via a pointer and length, and it's copied--along with the class and method name--into a pagelist data item to be sent to the osd. Instead, encode the data in a data item separate from the class and method names. This will allow large amounts of data to be supplied to methods without copying. Only rbd uses the class functionality right now, and when it really needs this it will probably need to use a page array rather than a page list. But this simple implementation demonstrates the functionality on the osd client, and that's enough for now. This resolves: http://tracker.ceph.com/issues/4104 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: combine initializing and setting osd dataAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This ends up being a rather large patch but what it's doing is somewhat straightforward. Basically, this is replacing two calls with one. The first of the two calls is initializing a struct ceph_osd_data with data (either a page array, a page list, or a bio list); the second is setting an osd request op so it associates that data with one of the op's parameters. In place of those two will be a single function that initializes the op directly. That means we sort of fan out a set of the needed functions: - extent ops with pages data - extent ops with pagelist data - extent ops with bio list data and - class ops with page data for receiving a response We also have define another one, but it's only used internally: - class ops with pagelist data for request parameters Note that we *still* haven't gotten rid of the osd request's r_data_in and r_data_out fields. All the osd ops refer to them for their data. For now, these data fields are pointers assigned to the appropriate r_data_* field when these new functions are called. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: rearrange some code for consistencyAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch just trivially moves around some code for consistency. In preparation for initializing osd request data fields in ceph_osdc_build_request(), I wanted to verify that rbd did in fact call that immediately before it called ceph_osdc_start_request(). It was true (although image requests are built in a group and then started as a group). But I made the changes here just to make it more obvious, by making all of the calls follow a common sequence: osd_req_op_<optype>_init(); ceph_osd_data_<type>_init() osd_req_op_<optype>_<datafield>() rbd_osd_req_format() ... ret = rbd_obj_request_submit() I moved the initialization of the callback for image object requests into rbd_img_request_fill_bio(), again, for consistency. To avoid a forward reference, I moved the definition of rbd_img_obj_callback() up in the file. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: separate initialization of osd dataAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The osd data for a request is currently initialized inside rbd_osd_req_create(), but that assumes an object request's data belongs in the osd request's data in or data out field. There are only three places where requests with data are set up, and it turns out it's easier to call just the osd data init routines directly there rather than handling it in rbd_osd_req_create(). (The real motivation here is moving toward getting rid of the osd request in and out data fields.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: don't set data in rbd_osd_req_format_op()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently an object request has its osd request's data field set in rbd_osd_req_format_op(). That assumes a single osd op per object request, and that won't be the case for long. Move the code that sets this out and into the caller. Rename rbd_osd_req_format_op() to be just rbd_osd_req_format(), removing the notion that it's doing anything op-specific. This and the next patch resolve: http://tracker.ceph.com/issues/4658 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: specify osd op by index in requestAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An osd request now holds all of its source op structures, and every place that initializes one of these is in fact initializing one of the entries in the the osd request's array. So rather than supplying the address of the op to initialize, have caller specify the osd request and an indication of which op it would like to initialize. This better hides the details the op structure (and faciltates moving the data pointers they use). Since osd_req_op_init() is a common routine, and it's not used outside the osd client code, give it static scope. Also make it return the address of the specified op (so all the other init routines don't have to repeat that code). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: add data pointers in osd op structuresAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An extent type osd operation currently implies that there will be corresponding data supplied in the data portion of the request (for write) or response (for read) message. Similarly, an osd class method operation implies a data item will be supplied to receive the response data from the operation. Add a ceph_osd_data pointer to each of those structures, and assign it to point to eithre the incoming or the outgoing data structure in the osd message. The data is not always available when an op is initially set up, so add two new functions to allow setting them after the op has been initialized. Begin to make use of the data item pointer available in the osd operation rather than the request data in or out structure in places where it's convenient. Add some assertions to verify pointers are always set the way they're expected to be. This is a sort of stepping stone toward really moving the data into the osd request ops, to allow for some validation before making that jump. This is the first in a series of patches that resolve: http://tracker.ceph.com/issues/4657 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: keep source rather than message osd op arrayAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An osd request keeps a pointer to the osd operations (ops) array that it builds in its request message. In order to allow each op in the array to have its own distinct data, we will need to keep track of each op's data, and that information does not go over the wire. As long as we're tracking the data we might as well just track the entire (source) op definition for each of the ops. And if we're doing that, we'll have no more need to keep a pointer to the wire-encoded version. This patch makes the array of source ops be kept with the osd request structure, and uses that instead of the version encoded in the message in places where that was previously used. The array will be embedded in the request structure, and the maximum number of ops we ever actually use is currently 2. So reduce CEPH_OSD_MAX_OP to 2 to reduce the size of the structure. The result of doing this sort of ripples back up, and as a result various function parameters and local variables become unnecessary. Make r_num_ops be unsigned, and move the definition of struct ceph_osd_req_op earlier to ensure it's defined where needed. It does not yet add per-op data, that's coming soon. This resolves: http://tracker.ceph.com/issues/4656 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: define rbd_osd_req_format_op()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define rbd_osd_req_format_op(), which encapsulates formatting an osd op into an object request's osd request message. Only one op is supported right now. Stop calling ceph_osdc_build_request() in rbd_osd_req_create(). Instead, call rbd_osd_req_format_op() in each of the callers of rbd_osd_req_create(). This is to prepare for the next patch, in which the source ops for an osd request will be held in the osd request itself. Because of that, we won't have the source op to work with until after the request is created, so we can't format the op until then. This an the next patch resolve: http://tracker.ceph.com/issues/4656 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: define osd data initialization helpersAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define and use functions that encapsulate the initializion of a ceph_osd_data structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | rbd: define inbound data size for method opsAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When rbd creates an object request containing an object method call operation it is passing 0 for the size. I originally thought this was because the length was not needed for method calls, but I think it really should be supplied, to describe how much space is available to receive response data. So provide the supplied length. This resolves: http://tracker.ceph.com/issues/4659 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: record length of bio list with bioAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When assigning a bio pointer to an osd request, we don't have an efficient way of knowing the total length bytes in the bio list. That information is available at the point it's set up by the rbd code, so record it with the osd data when it's set. This and the next patch are related to maintaining the length of a message's data independent of the message header, as described here: http://tracker.ceph.com/issues/4589 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: define source request op functionsAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rbd code has a function that allocates and populates a ceph_osd_req_op structure (the in-core version of an osd request operation). When reviewed, Josh suggested two things: that the big varargs function might be better split into type-specific functions; and that this functionality really belongs in the osd client rather than rbd. This patch implements both of Josh's suggestions. It breaks up the rbd function into separate functions and defines them in the osd client module as exported interfaces. Unlike the rbd version, however, the functions don't allocate an osd_req_op structure; they are provided the address of one and that is initialized instead. The rbd function has been eliminated and calls to it have been replaced by calls to the new routines. The rbd code now now use a stack (struct) variable to hold the op rather than allocating and freeing it each time. For now only the capabilities used by rbd are implemented. Implementing all the other osd op types, and making the rest of the code use it will be done separately, in the next few patches. Note that only the extent, cls, and watch portions of the ceph_osd_req_op structure are currently used. Delete the others (xattr, pgls, and snap) from its definition so nobody thinks it's actually implemented or needed. We can add it back again later if needed, when we know it's been tested. This (and a few follow-on patches) resolves: http://tracker.ceph.com/issues/3861 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | ceph: move max constant definitionsAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move some definitions for max integer values out of the rbd code and into the more central "decode.h" header file. These really belong in a Linux (or libc) header somewhere, but I haven't gotten around to proposing that yet. This is in preparation for moving some code out of rbd.c and into the osd client. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: let osd ops determine request data lengthAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The length of outgoing data in an osd request is dependent on the osd ops that are embedded in that request. Each op is encoded into a request message using osd_req_encode_op(), so that should be used to determine the amount of outgoing data implied by the op as it is encoded. Have osd_req_encode_op() return the number of bytes of outgoing data implied by the op being encoded, and accumulate and use that in ceph_osdc_build_request(). As a result, ceph_osdc_build_request() no longer requires its "len" parameter, so get rid of it. Using the sum of the op lengths rather than the length provided is a valid change because: - The only callers of osd ceph_osdc_build_request() are rbd and the osd client (in ceph_osdc_new_request() on behalf of the file system). - When rbd calls it, the length provided is only non-zero for write requests, and in that case the single op has the same length value as what was passed here. - When called from ceph_osdc_new_request(), (it's not all that easy to see, but) the length passed is also always the same as the extent length encoded in its (single) write op if present. This resolves: http://tracker.ceph.com/issues/4406 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: record byte count not page countAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Record the byte count for an osd request rather than the page count. The number of pages can always be derived from the byte count (and alignment/offset) but the reverse is not true. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: separate read and write dataAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An osd request defines information about where data to be read should be placed as well as where data to write comes from. Currently these are represented by common fields. Keep information about data for writing separate from data to be read by splitting these into data_in and data_out fields. This is the key patch in this whole series, in that it actually identifies which osd requests generate outgoing data and which generate incoming data. It's less obvious (currently) that an osd CALL op generates both outgoing and incoming data; that's the focus of some upcoming work. This resolves: http://tracker.ceph.com/issues/4127 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: distinguish page and bio requestsAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An osd request uses either pages or a bio list for its data. Use a union to record information about the two, and add a data type tag to select between them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
| * | | | libceph: separate osd request data infoAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull the fields in an osd request structure that define the data for the request out into a separate structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* | | | | Merge tag 'virtio-next-for-linus' of ↵Linus Torvalds2013-05-02
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio & lguest updates from Rusty Russell: "Lots of virtio work which wasn't quite ready for last merge window. Plus I dived into lguest again, reworking the pagetable code so we can move the switcher page: our fixmaps sometimes take more than 2MB now..." Ugh. Annoying conflicts with the tcm_vhost -> vhost_scsi rename. Hopefully correctly resolved. * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits) caif_virtio: Remove bouncing email addresses lguest: improve code readability in lg_cpu_start. virtio-net: fill only rx queues which are being used lguest: map Switcher below fixmap. lguest: cache last cpu we ran on. lguest: map Switcher text whenever we allocate a new pagetable. lguest: don't share Switcher PTE pages between guests. lguest: expost switcher_pages array (as lg_switcher_pages). lguest: extract shadow PTE walking / allocating. lguest: make check_gpte et. al return bool. lguest: assume Switcher text is a single page. lguest: rename switcher_page to switcher_pages. lguest: remove RESERVE_MEM constant. lguest: check vaddr not pgd for Switcher protection. lguest: prepare to make SWITCHER_ADDR a variable. virtio: console: replace EMFILE with EBUSY for already-open port virtio-scsi: reset virtqueue affinity when doing cpu hotplug virtio-scsi: introduce multiqueue support virtio-scsi: push vq lock/unlock into virtscsi_vq_done virtio-scsi: pass struct virtio_scsi to virtqueue completion function ...
| * | | | | virtio_blk: remove nents member.Rusty Russell2013-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's simply a flag as to whether we have data now, so make it an explicit function parameter rather than a member of struct virtblk_req. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Asias He <asias@redhat.com>
| * | | | | virtio-blk: use virtqueue_add_sgs on req pathPaolo Bonzini2013-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (This is a respin of Paolo Bonzini's patch, but it calls virtqueue_add_sgs() instead of his multi-part API). This is similar to the previous patch, but a bit more radical because the bio and req paths now share the buffer construction code. Because the req path doesn't use vbr->sg, however, we need to add a couple of arguments to __virtblk_add_req. We also need to teach __virtblk_add_req how to build SCSI command requests. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Asias He <asias@redhat.com>
| * | | | | virtio-blk: use virtqueue_add_sgs on bio pathPaolo Bonzini2013-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (This is a respin of Paolo Bonzini's patch, but it calls virtqueue_add_sgs() instead of his multi-part API). Move the creation of the request header and response footer to __virtblk_add_req. vbr->sg only contains the data scatterlist, the header/footer are added separately using virtqueue_add_sgs(). With this change, virtio-blk (with use_bio) is not relying anymore on the virtio functions ignoring the end markers in a scatterlist. The next patch will do the same for the other path. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Asias He <asias@redhat.com>
| * | | | | virtio-blk: reorganize virtblk_add_reqPaolo Bonzini2013-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, both virtblk_add_req and virtblk_add_req_wait call virtqueue_add_buf. To prepare for the next patches, abstract the call to virtqueue_add_buf into a new function __virtblk_add_req, and include the waiting logic directly in virtblk_add_req. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | virtio-blk: emit udev event when device is resizedMilos Vyletel2013-03-12
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When virtio-blk device is resized from host (using block_resize from QEMU) emit KOBJ_CHANGE uevent to notify guest about such change. This allows user to have custom udev rules which would take whatever action if such event occurs. As a proof of concept I've created simple udev rule that automatically resize filesystem on virtio-blk device. ACTION=="change", KERNEL=="vd*", \ ENV{RESIZE}=="1", \ ENV{ID_FS_TYPE}=="ext[3-4]", \ RUN+="/sbin/resize2fs /dev/%k" ACTION=="change", KERNEL=="vd*", \ ENV{RESIZE}=="1", \ ENV{ID_FS_TYPE}=="LVM2_member", \ RUN+="/sbin/pvresize /dev/%k" Signed-off-by: Milos Vyletel <milos.vyletel@sde.cz> Tested-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor simplification)
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-05-01
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...
| * | | | | procfs: new helper - PDE_DATA(inode)Al Viro2013-04-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only part of proc_dir_entry the code outside of fs/proc really cares about is PDE(inode)->data. Provide a helper for that; static inline for now, eventually will be moved to fs/proc, along with the knowledge of struct proc_dir_entry layout. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | cciss: switch to ->show_info()Al Viro2013-04-09
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | lift sb_start_write() out of ->write()Al Viro2013-04-09
| | |_|_|/ | |/| | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | aoe: replace kmalloc and then memcpy with kmemdupMihnea Dobrescu-Balaur2013-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Mihnea Dobrescu-Balaur <mihneadb@gmail.com> Cc: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | nbd: increase default and max request sizesMichal Belczyk2013-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Raise the default max request size for nbd to 128KB (from 127KB) to get it 4KB aligned. This patch also allows the max request size to be increased (via /sys/block/nbd<x>/queue/max_sectors_kb) to 32MB. The patch makes nbd network traffic more efficient by: - reducing request fragmentation (4KB alignment) - reducing the number of requests (fewer round trips, less network overhead) Especially in high latency networks, larger request size can make a dramatic Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Michal Belczyk <belczyk@bsd.krakow.pl> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | drbd: rename random32() to prandom_u32()Akinobu Mita2013-04-29
| |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-04-17
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "It's a simple fix for a hard to hit race, but low-risk and clearly correct" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: do a safe list traversal in rbd_img_request_submit()
| * | | | rbd: do a safe list traversal in rbd_img_request_submit()Alex Elder2013-04-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's possible that the reference to the object request dropped inside the loop in rbd_img_request_submit() will be the last one, in which case the content of the object pointer can't be trusted. Use a safe form of the object request list traversal to avoid problems. This resolves: http://tracker.ceph.com/issues/4705 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* | | | | Merge tag 'for-linus-20130409' of git://git.kernel.dk/linux-blockLinus Torvalds2013-04-09
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "I've got a few smaller fixes queued up for 3.9 that should go in. The major one is the loop regression, the others are nice fixes on their own though. It contains: - Fix for unitialized var in the block sysfs code, courtesy of Arnd and gcc-4.8. - Two fixes for mtip32xx, fixing probe and command timeout. Also a debug measure that could have waited for 3.10, but it's driver only, so I let it slip in. - Revert the loop partition cleanup fix, it could cause a deadlock on auto-teardown as part of umount. The fix is clear, but at this point we just want to revert it and get a real fix in for 3.10." * tag 'for-linus-20130409' of git://git.kernel.dk/linux-block: Revert "loop: cleanup partitions when detaching loop device" mtip32xx: fix two smatch warnings mtip32xx: Add debugfs entry device_status mtip32xx: return 0 from pci probe in case of rebuild mtip32xx: recovery from command timeout block: avoid using uninitialized value in from queue_var_store
| * | | | | Revert "loop: cleanup partitions when detaching loop device"Jens Axboe2013-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 8761a3dc1f07b163414e2215a2cadbb4cfe2a107. There are situations where the destruction path is called with the bdev->bd_mutex already held, which then deadlocks in loop_clr_fd(). The normal partition cleanup does a trylock() on the mutex, but it'd be nice to have a more bullet proof method in loop. So punt this more involved fix to the next merge window, and just back out this buggy fix for now. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | mtip32xx: fix two smatch warningsJens Axboe2013-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan reports: New smatch warnings: drivers/block/mtip32xx/mtip32xx.c:2728 show_device_status() warn: variable dereferenced before check 'dd' (see line 2727) drivers/block/mtip32xx/mtip32xx.c:2758 show_device_status() warn: variable dereferenced before check 'dd' (see line 2757) which are checking if dd == NULL, in a list_for_each_entry() type loop. Get rid of the check, dd can never be NULL here. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | mtip32xx: Add debugfs entry device_statusAsai Thambi S P2013-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new debugfs entry 'device_status' in /sys/kernel/debug/rssd. The value of this entry shows devices online and those in the process of removing. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | mtip32xx: return 0 from pci probe in case of rebuildAsai Thambi S P2013-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix to return 0 from pci probe in case of rebuild. If not, pci consider probe has failed, and crash during rmmod. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | mtip32xx: recovery from command timeoutAsai Thambi S P2013-04-03
| | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To recover from command timeouts, reset the device. In addition to that improved timeout handling of PIO commands. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | loop: prevent bdev freeing while device in useAnatol Pomozov2013-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct block_device lifecycle is defined by its inode (see fs/block_dev.c) - block_device allocated first time we access /dev/loopXX and deallocated on bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile" we want that block_device stay alive until we destroy the loop device with "losetup -d". But because we do not hold /dev/loopXX inode its counter goes 0, and inode/bdev can be destroyed at any moment. Usually it happens at memory pressure or when user drops inode cache (like in the test below). When later in loop_clr_fd() we want to use bdev we have use-after-free error with following stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000280 bd_set_size+0x10/0xa0 loop_clr_fd+0x1f8/0x420 [loop] lo_ioctl+0x200/0x7e0 [loop] lo_compat_ioctl+0x47/0xe0 [loop] compat_blkdev_ioctl+0x341/0x1290 do_filp_open+0x42/0xa0 compat_sys_ioctl+0xc1/0xf20 do_sys_open+0x16e/0x1d0 sysenter_dispatch+0x7/0x1a To prevent use-after-free we need to grab the device in loop_set_fd() and put it later in loop_clr_fd(). The issue is reprodusible on current Linus head and v3.3. Here is the test: dd if=/dev/zero of=loop.file bs=1M count=1 while [ true ]; do losetup /dev/loop0 loop.file echo 2 > /proc/sys/vm/drop_caches losetup -d /dev/loop0 done [ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every time we call loop_set_fd() we check that loop_device->lo_state is Lo_unbound and set it to Lo_bound If somebody will try to set_fd again it will get EBUSY. And if we try to loop_clr_fd() on unbound loop device we'll get ENXIO. loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under loop_device->lo_ctl_mutex. ] Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-04-01
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) sadb_msg prepared for IPSEC userspace forgets to initialize the satype field, fix from Nicolas Dichtel. 2) Fix mac80211 synchronization during station removal, from Johannes Berg. 3) Fix IPSEC sequence number notifications when they wrap, from Steffen Klassert. 4) Fix cfg80211 wdev tracing crashes when add_virtual_intf() returns an error pointer, from Johannes Berg. 5) In mac80211, don't call into the channel context code with the interface list mutex held. From Johannes Berg. 6) In mac80211, if we don't actually associate, do not restart the STA timer, otherwise we can crash. From Ben Greear. 7) Missing dma_mapping_error() check in e1000, ixgb, and e1000e. From Christoph Paasch. 8) Fix sja1000 driver defines to not conflict with SH port, from Marc Kleine-Budde. 9) Don't call il4965_rs_use_green with a NULL station, from Colin Ian King. 10) Suspend/Resume in the FEC driver fail because the buffer descriptors are not initialized at all the moments in which they should. Fix from Frank Li. 11) cpsw and davinci_emac drivers both use the wrong interface to restart a stopped TX queue. Use netif_wake_queue not netif_start_queue, the latter is for initialization/bringup not active management of the queue. From Mugunthan V N. 12) Fix regression in rate calculations done by psched_ratecfg_precompute(), missing u64 type promotion. From Sergey Popovich. 13) Fix length overflow in tg3 VPD parsing, from Kees Cook. 14) AOE driver fails to allocate enough headroom, resulting in crashes. Fix from Eric Dumazet. 15) RX overflow happens too quickly in sky2 driver because pause packet thresholds are not programmed correctly. From Mirko Lindner. 16) Bonding driver manages arp_interval and miimon settings incorrectly, disabling one unintentionally disables both. Fix from Nikolay Aleksandrov. 17) smsc75xx drivers don't program the RX mac properly for jumbo frames. Fix from Steve Glendinning. 18) Fix off-by-one in Codel packet scheduler. From Vijay Subramanian. 19) Fix packet corruption in atl1c by disabling MSI support, from Hannes Frederic Sowa. 20) netdev_rx_handler_unregister() needs a synchronize_net() to fix crashes in bonding driver unload stress tests. From Eric Dumazet. 21) rxlen field of ks8851 RX packet descriptors not interpreted correctly (it is 12 bits not 16 bits, so needs to be masked after shifting the 32-bit value down 16 bits). Fix from Max Nekludov. 22) Fix missed RX/TX enable in sh_eth driver due to mishandling of link change indications. From Sergei Shtylyov. 23) Fix crashes during spurious ECI interrupts in sh_eth driver, also from Sergei Shtylyov. 24) dm9000 driver initialization is done wrong for revision B devices with DSP PHY, from Joseph CHANG. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits) DM9000B: driver initialization upgrade sh_eth: make 'link' field of 'struct sh_eth_private' *int* sh_eth: workaround for spurious ECI interrupt sh_eth: fix handling of no LINK signal ks8851: Fix interpretation of rxlen field. net: add a synchronize_net() in netdev_rx_handler_unregister() MAINTAINERS: Update netxen_nic maintainers list atl1e: drop pci-msi support because of packet corruption net: fq_codel: Fix off-by-one error net: calxedaxgmac: Wake-on-LAN fixes net: calxedaxgmac: fix rx ring handling when OOM net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb' smsc75xx: fix jumbo frame support net: fix the use of this_cpu_ptr bonding: fix disabling of arp_interval and miimon ipv6: don't accept node local multicast traffic from the wire sky2: Threshold for Pause Packet is set wrong sky2: Receive Overflows not counted aoe: reserve enough headroom on skbs line up comment for ndo_bridge_getlink ...
| * | | | aoe: reserve enough headroom on skbsEric Dumazet2013-03-28
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some network drivers use a non default hard_header_len Transmitted skb should take into account dev->hard_header_len, or risk crashes or expensive reallocations. In the case of aoe, lets reserve MAX_HEADER bytes. David reported a crash in defxx driver, solved by this patch. Reported-by: David Oostdyk <daveo@ll.mit.edu> Tested-by: David Oostdyk <daveo@ll.mit.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ed Cashin <ecashin@coraid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge tag 'for-linus-20130331' of git://git.kernel.dk/linux-blockLinus Torvalds2013-03-31
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "Alright, this time from 10K up in the air. Collection of fixes that have been queued up since the merge window opened, hence postponed until later in the cycle. The pull request contains: - A bunch of fixes for the xen blk front/back driver. - A round of fixes for the new IBM RamSan driver, fixing various nasty issues. - Fixes for multiple drives from Wei Yongjun, bad handling of return values and wrong pointer math. - A fix for loop properly killing partitions when being detached." * tag 'for-linus-20130331' of git://git.kernel.dk/linux-block: (25 commits) mg_disk: fix error return code in mg_probe() rsxx: remove unused variable rsxx: enable error return of rsxx_eeh_save_issued_dmas() block: removes dynamic allocation on stack Block: blk-flush: Fixed indent code style cciss: fix invalid use of sizeof in cciss_find_cfgtables() loop: cleanup partitions when detaching loop device loop: fix error return code in loop_add() mtip32xx: fix error return code in mtip_pci_probe() xen-blkfront: remove frame list from blk_shadow xen-blkfront: pre-allocate pages for requests xen-blkback: don't store dev_bus_addr xen-blkfront: switch from llist to list xen-blkback: fix foreach_grant_safe to handle empty lists xen-blkfront: replace kmalloc and then memcpy with kmemdup xen-blkback: fix dispatch_rw_block_io() error path rsxx: fix missing unlock on error return in rsxx_eeh_remap_dmas() Adding in EEH support to the IBM FlashSystem 70/80 device driver block: IBM RamSan 70/80 error message bug fix. block: IBM RamSan 70/80 branding changes. ...
| * | | | mg_disk: fix error return code in mg_probe()Wei Yongjun2013-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | rsxx: remove unused variablePhilip J Kelleher2013-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | rsxx: enable error return of rsxx_eeh_save_issued_dmas()Philip J Kelleher2013-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d8d595df introduced a bug where we did not check for a NULL return from kmalloc(). Make rsxx_eeh_save_issued_dmas() return an error for that case, and make the callers handle that. Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>