aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/ceph/osd_client.h
Commit message (Collapse)AuthorAge
...
* libceph: set the data pointers when encoding opsAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | Still using the osd request r_data_in and r_data_out pointer, but we're basically only referring to it via the data pointers in the osd ops. And we're transferring that information to the request or reply message only when the op indicates it's needed, in osd_req_encode_op(). To avoid a forward reference, ceph_osdc_msg_data_set() was moved up in the file. Don't bother calling ceph_osd_data_init(), in ceph_osd_alloc(), because the ops array will already be zeroed anyway. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: combine initializing and setting osd dataAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This ends up being a rather large patch but what it's doing is somewhat straightforward. Basically, this is replacing two calls with one. The first of the two calls is initializing a struct ceph_osd_data with data (either a page array, a page list, or a bio list); the second is setting an osd request op so it associates that data with one of the op's parameters. In place of those two will be a single function that initializes the op directly. That means we sort of fan out a set of the needed functions: - extent ops with pages data - extent ops with pagelist data - extent ops with bio list data and - class ops with page data for receiving a response We also have define another one, but it's only used internally: - class ops with pagelist data for request parameters Note that we *still* haven't gotten rid of the osd request's r_data_in and r_data_out fields. All the osd ops refer to them for their data. For now, these data fields are pointers assigned to the appropriate r_data_* field when these new functions are called. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: format class info at init timeAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | An object class method is formatted using a pagelist which contains the class name, the method name, and the data concatenated into an osd request's outbound data. Currently when a class op is initialized in osd_req_op_cls_init(), the lengths of and pointers to these three items are recorded. Later, when the op is getting formatted into the request message, a new pagelist is created and that is when these items get copied into the pagelist. This patch makes it so the pagelist to hold these items is created when the op is initialized instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: specify osd op by index in requestAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | An osd request now holds all of its source op structures, and every place that initializes one of these is in fact initializing one of the entries in the the osd request's array. So rather than supplying the address of the op to initialize, have caller specify the osd request and an indication of which op it would like to initialize. This better hides the details the op structure (and faciltates moving the data pointers they use). Since osd_req_op_init() is a common routine, and it's not used outside the osd client code, give it static scope. Also make it return the address of the specified op (so all the other init routines don't have to repeat that code). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: add data pointers in osd op structuresAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | An extent type osd operation currently implies that there will be corresponding data supplied in the data portion of the request (for write) or response (for read) message. Similarly, an osd class method operation implies a data item will be supplied to receive the response data from the operation. Add a ceph_osd_data pointer to each of those structures, and assign it to point to eithre the incoming or the outgoing data structure in the osd message. The data is not always available when an op is initially set up, so add two new functions to allow setting them after the op has been initialized. Begin to make use of the data item pointer available in the osd operation rather than the request data in or out structure in places where it's convenient. Add some assertions to verify pointers are always set the way they're expected to be. This is a sort of stepping stone toward really moving the data into the osd request ops, to allow for some validation before making that jump. This is the first in a series of patches that resolve: http://tracker.ceph.com/issues/4657 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: rename data out field in osd request opAlex Elder2013-05-02
| | | | | | | | | | | | | | There are fields "indata" and "indata_len" defined the ceph osd request op structure. The "in" part is with from the point of view of the osd server, but is a little confusing here on the client side. Change their names to use "request" instead of "in" to indicate that it defines data provided with the request (as opposed the data returned in the response). Rename the local variable in osd_req_encode_op() to match. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: keep source rather than message osd op arrayAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An osd request keeps a pointer to the osd operations (ops) array that it builds in its request message. In order to allow each op in the array to have its own distinct data, we will need to keep track of each op's data, and that information does not go over the wire. As long as we're tracking the data we might as well just track the entire (source) op definition for each of the ops. And if we're doing that, we'll have no more need to keep a pointer to the wire-encoded version. This patch makes the array of source ops be kept with the osd request structure, and uses that instead of the version encoded in the message in places where that was previously used. The array will be embedded in the request structure, and the maximum number of ops we ever actually use is currently 2. So reduce CEPH_OSD_MAX_OP to 2 to reduce the size of the structure. The result of doing this sort of ripples back up, and as a result various function parameters and local variables become unnecessary. Make r_num_ops be unsigned, and move the definition of struct ceph_osd_req_op earlier to ensure it's defined where needed. It does not yet add per-op data, that's coming soon. This resolves: http://tracker.ceph.com/issues/4656 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: define osd data initialization helpersAlex Elder2013-05-02
| | | | | | | | Define and use functions that encapsulate the initializion of a ceph_osd_data structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* ceph: build osd request message later for writepagesAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | Hold off building the osd request message in ceph_writepages_start() until just before it will be submitted to the osd client for execution. We'll still create the request and allocate the page pointer array after we learn we have at least one page to write. A local variable will be used to keep track of the allocated array of pages. Wait until just before submitting the request for assigning that page array pointer to the request message. Create ands use a new function osd_req_op_extent_update() whose purpose is to serve this one spot where the length value supplied when an osd request's op was initially formatted might need to get changed (reduced, never increased) before submitting the request. Previously, ceph_writepages_start() assigned the message header's data length because of this update. That's no longer necessary, because ceph_osdc_build_request() will recalculate the right value to use based on the content of the ops in the request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: don't build request in ceph_osdc_new_request()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the call to ceph_osdc_build_request() out of ceph_osdc_new_request() and into its caller. This is in order to defer formatting osd operation information into the request message until just before request is started. The only unusual (ab)user of ceph_osdc_build_request() is ceph_writepages_start(), where the final length of write request may change (downward) based on the current inode size or the oldest snapshot context with dirty data for the inode. The remaining callers don't change anything in the request after has been built. This means the ops array is now supplied by the caller. It also means there is no need to pass the mtime to ceph_osdc_new_request() (it gets provided to ceph_osdc_build_request()). And rather than passing a do_sync flag, have the number of ops in the ops array supplied imply adding a second STARTSYNC operation after the READ or WRITE requested. This and some of the patches that follow are related to having the messenger (only) be responsible for filling the content of the message header, as described here: http://tracker.ceph.com/issues/4589 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: record length of bio list with bioAlex Elder2013-05-02
| | | | | | | | | | | | | | When assigning a bio pointer to an osd request, we don't have an efficient way of knowing the total length bytes in the bio list. That information is available at the point it's set up by the rbd code, so record it with the osd data when it's set. This and the next patch are related to maintaining the length of a message's data independent of the message header, as described here: http://tracker.ceph.com/issues/4589 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: drop ceph_osd_request->r_con_filling_msgAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A field in an osd request keeps track of whether a connection is currently filling the request's reply message. This patch gets rid of that field. An osd request includes two messages--a request and a reply--and they're both associated with the connection that existed to its the target osd at the time the request was created. An osd request can be dropped early, even when it's in flight. And at that time both messages are released. It's possible the reply message has been supplied to its connection to receive an incoming response message at the time the osd request gets dropped. So ceph_osdc_release_request() revokes that message from the connection before releasing it so things get cleaned up properly. Previously this may have caused a problem, because the connection that a message was associated with might have gone away before the revoke request. And to avoid any problems using that connection, the osd client held a reference to it when it supplies its response message. However since this commit: 38941f80 libceph: have messages point to their connection all messages hold a reference to the connection they are associated with whenever the connection is actively operating on the message (i.e. while the message is queued to send or sending, and when it data is being received into it). And if a message has no connection associated with it, ceph_msg_revoke_incoming() won't do anything when asked to revoke it. As a result, there is no need to keep an additional reference to the connection associated with a message when we hand the message to the messenger when it calls our alloc_msg() method to receive something. If the connection *were* operating on it, it would have its own reference, and if not, there's no work to be done when we need to revoke it. So get rid of the osd request's r_con_filling_msg field. This resolves: http://tracker.ceph.com/issues/4647 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: define source request op functionsAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rbd code has a function that allocates and populates a ceph_osd_req_op structure (the in-core version of an osd request operation). When reviewed, Josh suggested two things: that the big varargs function might be better split into type-specific functions; and that this functionality really belongs in the osd client rather than rbd. This patch implements both of Josh's suggestions. It breaks up the rbd function into separate functions and defines them in the osd client module as exported interfaces. Unlike the rbd version, however, the functions don't allocate an osd_req_op structure; they are provided the address of one and that is initialized instead. The rbd function has been eliminated and calls to it have been replaced by calls to the new routines. The rbd code now now use a stack (struct) variable to hold the op rather than allocating and freeing it each time. For now only the capabilities used by rbd are implemented. Implementing all the other osd op types, and making the rest of the code use it will be done separately, in the next few patches. Note that only the extent, cls, and watch portions of the ceph_osd_req_op structure are currently used. Delete the others (xattr, pgls, and snap) from its definition so nobody thinks it's actually implemented or needed. We can add it back again later if needed, when we know it's been tested. This (and a few follow-on patches) resolves: http://tracker.ceph.com/issues/3861 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: kill osd request r_trailAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The osd trail is a pagelist, used only for a CALL osd operation to hold the class and method names, along with any input data for the call. It is only currently used by the rbd client, and when it's used it is the only bit of outbound data in the osd request. Since we already support (non-trail) pagelist data in a message, we can just save this outbound CALL data in the "normal" pagelist rather than the trail, and get rid of the trail entirely. The existing pagelist support depends on the pagelist being dynamically allocated, and ownership of it is passed to the messenger once it's been attached to a message. (That is to say, the messenger releases and frees the pagelist when it's done with it). That means we need to dynamically allocate the pagelist also. Note that we simply assert that the allocation of a pagelist structure succeeds. Appending to a pagelist might require a dynamic allocation, so we're already assuming we won't run into trouble doing so (we're just ignore any failures--and that should be fixed at some point). This resolves: http://tracker.ceph.com/issues/4407 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: have osd requests support pagelist dataAlex Elder2013-05-02
| | | | | | | | Add support for recording a ceph pagelist as data associated with an osd request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: let osd ops determine request data lengthAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The length of outgoing data in an osd request is dependent on the osd ops that are embedded in that request. Each op is encoded into a request message using osd_req_encode_op(), so that should be used to determine the amount of outgoing data implied by the op as it is encoded. Have osd_req_encode_op() return the number of bytes of outgoing data implied by the op being encoded, and accumulate and use that in ceph_osdc_build_request(). As a result, ceph_osdc_build_request() no longer requires its "len" parameter, so get rid of it. Using the sum of the op lengths rather than the length provided is a valid change because: - The only callers of osd ceph_osdc_build_request() are rbd and the osd client (in ceph_osdc_new_request() on behalf of the file system). - When rbd calls it, the length provided is only non-zero for write requests, and in that case the single op has the same length value as what was passed here. - When called from ceph_osdc_new_request(), (it's not all that easy to see, but) the length passed is also always the same as the extent length encoded in its (single) write op if present. This resolves: http://tracker.ceph.com/issues/4406 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: record byte count not page countAlex Elder2013-05-02
| | | | | | | | | Record the byte count for an osd request rather than the page count. The number of pages can always be derived from the byte count (and alignment/offset) but the reverse is not true. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: separate read and write dataAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | An osd request defines information about where data to be read should be placed as well as where data to write comes from. Currently these are represented by common fields. Keep information about data for writing separate from data to be read by splitting these into data_in and data_out fields. This is the key patch in this whole series, in that it actually identifies which osd requests generate outgoing data and which generate incoming data. It's less obvious (currently) that an osd CALL op generates both outgoing and incoming data; that's the focus of some upcoming work. This resolves: http://tracker.ceph.com/issues/4127 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: distinguish page and bio requestsAlex Elder2013-05-02
| | | | | | | | | An osd request uses either pages or a bio list for its data. Use a union to record information about the two, and add a data type tag to select between them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: separate osd request data infoAlex Elder2013-05-02
| | | | | | | | Pull the fields in an osd request structure that define the data for the request out into a separate structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: don't assign page info in ceph_osdc_new_request()Alex Elder2013-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently ceph_osdc_new_request() assigns an osd request's r_num_pages and r_alignment fields. The only thing it does after that is call ceph_osdc_build_request(), and that doesn't need those fields to be assigned. Move the assignment of those fields out of ceph_osdc_new_request() and into its caller. As a result, the page_align parameter is no longer used, so get rid of it. Note that in ceph_sync_write(), the value for req->r_num_pages had already been calculated earlier (as num_pages, and fortunately it was computed the same way). So don't bother recomputing it, but because it's not needed earlier, move that calculation after the call to ceph_osdc_new_request(). Hold off making the assignment to r_alignment, doing it instead r_pages and r_num_pages are getting set. Similarly, in start_read(), nr_pages already holds the number of pages in the array (and is calculated the same way), so there's no need to recompute it. Move the assignment of the page alignment down with the others there as well. This and the next few patches are preparation work for: http://tracker.ceph.com/issues/4127 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: use (void *) for untyped data in osd opsAlex Elder2013-05-02
| | | | | | | | | Two of the fields defining osd operations are defined using (char *) while the data they represent are really untyped, not character strings. Change them to have type (void *). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: complete lingering requests only onceAlex Elder2013-05-02
| | | | | | | | | | | | | | | | | An osd request marked to linger will be re-submitted in the event a connection to the target osd gets dropped. Currently, if there is a callback function associated with a request it will be called each time a request is submitted--which for lingering requests can be more than once. Change it so a request--including lingering ones--will get completed (from the perspective of the user of the osd client) exactly once. This resolves: http://tracker.ceph.com/issues/3967 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: update osd request/reply encodingSage Weil2013-02-26
| | | | | | | | | | | | | | | | Use the new version of the encoding for osd requests and replies. In the process, update the way we are tracking request ops and reply lengths and results in the struct ceph_osd_request. Update the rbd and fs/ceph users appropriately. The main changes are: - we keep pointers into the request memory for fields we need to update each time the request is sent out over the wire - we keep information about the result in an array in the request struct where the users can easily get at it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
* libceph: calculate placement based on the internal data typesSage Weil2013-02-26
| | | | | | | | | | | Instead of using the old ceph_object_layout struct, update our internal ceph_calc_object_layout method to use the ceph_pg type. This allows us to pass the full 32-bit precision of the pgid.seed to the callers. It also allows some callers to avoid reaching into the request structures for the struct ceph_object_layout fields. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
* libceph: decode into cpu-native ceph_pg typeSage Weil2013-02-26
| | | | | | | | | Always decode data into our cpu-native ceph_pg type that has the correct field widths. Limit any remaining uses of ceph_pg_v1 to dealing with the legacy protocol. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
* libceph: rename ceph_pg -> ceph_pg_v1Sage Weil2013-02-26
| | | | | | | Rename the old version this type to distinguish it from the new version. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
* libceph: kill ceph_osdc_wait_event()Alex Elder2013-02-18
| | | | | | | | | | | | There are no actual users of ceph_osdc_wait_event(). This would have been one-shot events, but we no longer support those so just get rid of this function. Since this leaves nothing else that waits for the completion of an event, we can get rid of the completion in a struct ceph_osd_event. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: kill ceph_osdc_create_event() "one_shot" parameterAlex Elder2013-02-18
| | | | | | | | | | | | | | | There is only one caller of ceph_osdc_create_event(), and it provides 0 as its "one_shot" argument. Get rid of that argument and just use 0 in its place. Replace the code in handle_watch_notify() that executes if one_shot is nonzero in the event with a BUG_ON() call. While modifying "osd_client.c", give handle_watch_notify() static scope. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: kill ceph_calc_raw_layout()Alex Elder2013-02-18
| | | | | | | | | | | | | | | | | There is no caller of ceph_calc_raw_layout() outside of libceph, so there's no need to export from the module. Furthermore, there is only one caller, in calc_layout(), and it is not much more than a simple wrapper for that function. So get rid of ceph_calc_raw_layout() and embed it instead within calc_layout(). While touching "osd_client.c", get rid of the unnecessary forward declaration of __send_request(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* ceph: kill ceph_osdc_new_request() "num_reply" parameterAlex Elder2013-02-18
| | | | | | | | | | | | The "num_reply" parameter to ceph_osdc_new_request() is never used inside that function, so get rid of it. Note that ceph_sync_write() passes 2 for that argument, while all other callers pass 1. It doesn't matter, but perhaps someone should verify this doesn't indicate a problem. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* ceph: kill ceph_osdc_writepages() "flags" parameterAlex Elder2013-02-18
| | | | | | | | | There is only one caller of ceph_osdc_writepages(), and it always passes 0 as its "flags" argument. Get rid of that argument and replace its use in ceph_osdc_writepages() with 0. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* ceph: kill ceph_osdc_writepages() "dosync" parameterAlex Elder2013-02-18
| | | | | | | | | There is only one caller of ceph_osdc_writepages(), and it always passes 0 as its "dosync" argument. Get rid of that argument and replace its use in ceph_osdc_writepages() with 0. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* ceph: kill ceph_osdc_writepages() "nofail" parameterAlex Elder2013-02-18
| | | | | | | | | | | | | There is only one caller of ceph_osdc_writepages(), and it always passes the value true as its "nofail" argument. Get rid of that argument and replace its use in ceph_osdc_writepages() with the constant value true. This and a number of cleanup patches that follow resolve: http://tracker.ceph.com/issues/4126 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: improve packing in struct ceph_osd_req_opAlex Elder2013-02-18
| | | | | | | | | | | | The layout of struct ceph_osd_req_op leaves lots of holes. Rearranging things a little for better field alignment reduces the size by a third. This resolves: http://tracker.ceph.com/issues/4163 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* rbd: kill ceph_osd_req_op->flagsAlex Elder2013-01-17
| | | | | | | | The flags field of struct ceph_osd_req_op is never used, so just get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: pass num_op with opsAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | Both ceph_osdc_alloc_request() and ceph_osdc_build_request() are provided an array of ceph osd request operations. Rather than just passing the number of operations in the array, the caller is required append an additional zeroed operation structure to signal the end of the array. All callers know the number of operations at the time these functions are called, so drop the silly zero entry and supply that number directly. As a result, get_num_ops() is no longer needed. This also means that ceph_osdc_alloc_request() never uses its ops argument, so that can be dropped. Also rbd_create_rw_ops() no longer needs to add one to reserve room for the additional op. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: don't set pages or bio in ceph_osdc_alloc_request()Alex Elder2013-01-17
| | | | | | | | | | | | | Only one of the two callers of ceph_osdc_alloc_request() provides page or bio data for its payload. And essentially all that function was doing with those arguments was assigning them to fields in the osd request structure. Simplify ceph_osdc_alloc_request() by having the caller take care of making those assignments Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: don't set flags in ceph_osdc_alloc_request()Alex Elder2013-01-17
| | | | | | | | | | | | | The only thing ceph_osdc_alloc_request() really does with the flags value it is passed is assign it to the newly-created osd request structure. Do that in the caller instead. Both callers subsequently call ceph_osdc_build_request(), so have that function (instead of ceph_osdc_alloc_request()) issue a warning if a request comes through with neither the read nor write flags set. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: drop osdc from ceph_calc_raw_layout()Alex Elder2013-01-17
| | | | | | | | | The osdc parameter to ceph_calc_raw_layout() is not used, so get rid of it. Consequently, the corresponding parameter in calc_layout() becomes unused, so get rid of that as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: drop snapid in ceph_calc_raw_layout()Alex Elder2013-01-17
| | | | | | | | | | | | | | A snapshot id must be provided to ceph_calc_raw_layout() even though it is not needed at all for calculating the layout. Where the snapshot id *is* needed is when building the request message for an osd operation. Drop the snapid parameter from ceph_calc_raw_layout() and pass that value instead in ceph_osdc_build_request(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: pass length to ceph_osdc_build_request()Alex Elder2013-01-17
| | | | | | | | | | The len argument to ceph_osdc_build_request() is set up to be passed by address, but that function never updates its value so there's no need to do this. Tighten up the interface by passing the length directly. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: always allow trail in osd requestAlex Elder2013-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An osd request structure contains an optional trail portion, which if present will contain data to be passed in the payload portion of the message containing the request. The trail field is a ceph_pagelist pointer, and if null it indicates there is no trail. A ceph_pagelist structure contains a length field, and it can legitimately hold value 0. Make use of this to change the interpretation of the "trail" of an osd request so that every osd request has trailing data, it just might have length 0. This means we change the r_trail field in a ceph_osd_request structure from a pointer to a structure that is always initialized. Note that in ceph_osdc_start_request(), the trail pointer (or now address of that structure) is assigned to a ceph message's trail field. Here's why that's still OK (looking at net/ceph/messenger.c): - What would have resulted in a null pointer previously will now refer to a 0-length page list. That message trail pointer is used in two functions, write_partial_msg_pages() and out_msg_pos_next(). - In write_partial_msg_pages(), a null page list pointer is handled the same as a message with 0-length trail, and both result in a "in_trail" variable set to false. The trail pointer is only used if in_trail is true. - The only other place the message trail pointer is used is out_msg_pos_next(). That function is only called by write_partial_msg_pages() and only touches the trail pointer if the in_trail value it is passed is true. Therefore a null ceph_msg->trail pointer is equivalent to a non-null pointer referring to a 0-length page list structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* rbd: drop oid parameters from ceph_osdc_build_request()Alex Elder2013-01-17
| | | | | | | | | | The last two parameters to ceph_osd_build_request() describe the object id, but the values passed always come from the osd request structure whose address is also provided. Get rid of those last two parameters. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: check for invalid mappingSage Weil2012-10-01
| | | | | | | | If we encounter an invalid (e.g., zeroed) mapping, return an error and avoid a divide by zero. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
* ceph: define ceph_auth_handshake typeAlex Elder2012-05-17
| | | | | | | | | | | | | The definitions for the ceph_mds_session and ceph_osd both contain five fields related only to "authorizers." Encapsulate those fields into their own struct type, allowing for better isolation in some upcoming patches. Fix the #includes in "linux/ceph/osd_client.h" to lay out their more complete canonical path. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* libceph: Allocate larger oid buffer in request msgsStratos Psomadakis2011-11-11
| | | | | | | | | | | | | ceph_osd_request struct allocates a 40-byte buffer for object names. RBD image names can be up to 96 chars long (100 with the .rbd suffix), which results in the object name for the image being truncated, and a subsequent map failure. Increase the oid buffer in request messages, in order to avoid the truncation. Signed-off-by: Stratos Psomadakis <psomas@grnet.gr> Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: add lingering request and watch/notify event frameworkYehuda Sadeh2011-03-22
| | | | | | | | | | | | | | Lingering requests are requests that are sent to the OSD normally but tracked also after we get a successful request. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix osd request queuing on osdmap updatesSage Weil2011-03-21
| | | | | | | | | | | | | | | | | | | | | | | If we send a request to osd A, and the request's pg remaps to osd B and then back to A in quick succession, we need to resend the request to A. The old code was only calling kick_requests after processing all incremental maps in a message, so it was very possible to not resend a request that needed to be resent. This would make the osd eventually time out (at least with the current default of osd timeouts enabled). The correct approach is to scan requests on every map incremental. This patch refactors the kick code in a few ways: - all requests are either on req_lru (in flight), req_unsent (ready to send), or req_notarget (currently map to no up osd) - mapping always done by map_request (previous map_osds) - if the mapping changes, we requeue. requests are resent only after all map incrementals are processed. - some osd reset code is moved out of kick_requests into a separate function - the "kick this osd" functionality is moved to kick_osd_requests, as it is unrelated to scanning for request->pg->osd mapping changes Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: make page alignment explicit in osd interfaceSage Weil2010-11-09
| | | | | | | | | | | We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: Sage Weil <sage@newdream.net>