Diffstat (limited to 'Documentation/networking/rxrpc.txt')
-rw-r--r-- | Documentation/networking/rxrpc.txt | 859 |
1 files changed, 859 insertions, 0 deletions
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
new file mode 100644
index 000000000000..cae231b1c134
--- /dev/null
+++ b/Documentation/networking/rxrpc.txt
@@ -0,0 +1,859 @@
1 | ====================== | ||
2 | RxRPC NETWORK PROTOCOL | ||
3 | ====================== | ||
4 | |||
5 | The RxRPC protocol driver provides a reliable two-phase transport on top of UDP | ||
6 | that can be used to perform RxRPC remote operations. This is done over sockets | ||
7 | of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and | ||
8 | receive data, aborts and errors. | ||
9 | |||
10 | Contents of this document: | ||
11 | |||
12 | (*) Overview. | ||
13 | |||
14 | (*) RxRPC protocol summary. | ||
15 | |||
16 | (*) AF_RXRPC driver model. | ||
17 | |||
18 | (*) Control messages. | ||
19 | |||
20 | (*) Socket options. | ||
21 | |||
22 | (*) Security. | ||
23 | |||
24 | (*) Example client usage. | ||
25 | |||
26 | (*) Example server usage. | ||
27 | |||
28 | (*) AF_RXRPC kernel interface. | ||
29 | |||
30 | |||
31 | ======== | ||
32 | OVERVIEW | ||
33 | ======== | ||
34 | |||
35 | RxRPC is a two-layer protocol. There is a session layer which provides | ||
36 | reliable virtual connections using UDP over IPv4 (or IPv6) as the transport | ||
37 | layer, but implements a real network protocol; and there's the presentation | ||
38 | layer which renders structured data to binary blobs and back again using XDR | ||
39 | (as does SunRPC): | ||
40 | |||
41 | +-------------+ | ||
42 | | Application | | ||
43 | +-------------+ | ||
44 | | XDR | Presentation | ||
45 | +-------------+ | ||
46 | | RxRPC | Session | ||
47 | +-------------+ | ||
48 | | UDP | Transport | ||
49 | +-------------+ | ||
50 | |||
51 | |||
52 | AF_RXRPC provides: | ||
53 | |||
54 | (1) Part of an RxRPC facility for both kernel and userspace applications by | ||
55 | making the session part of it a Linux network protocol (AF_RXRPC). | ||
56 | |||
57 | (2) A two-phase protocol. The client transmits a blob (the request) and then | ||
58 | receives a blob (the reply), and the server receives the request and then | ||
59 | transmits the reply. | ||
60 | |||
61 | (3) Retention of the reusable bits of the transport system set up for one call | ||
62 | to speed up subsequent calls. | ||
63 | |||
64 | (4) A secure protocol, using the Linux kernel's key retention facility to | ||
65 | manage security on the client end. The server end must of necessity be | ||
66 | more active in security negotiations. | ||
67 | |||
68 | AF_RXRPC does not provide XDR marshalling/presentation facilities. That is | ||
69 | left to the application. AF_RXRPC only deals in blobs. Even the operation ID | ||
70 | is just the first four bytes of the request blob, and as such is beyond the | ||
71 | kernel's interest. | ||
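
Since AF_RXRPC only deals in blobs, the application marshals the operation ID
itself. The following is a minimal sketch of that, assuming XDR-style
big-endian 32-bit words. The function names build_request() and
request_op_id(), and the single integer argument, are hypothetical and are not
part of AF_RXRPC.

```c
#include <assert.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a request blob holding a 32-bit operation ID followed by one 32-bit
 * argument, both in network byte order (the XDR encoding of an unsigned int).
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
size_t build_request(uint8_t *buf, size_t buflen, uint32_t op_id, uint32_t arg)
{
	uint32_t words[2] = { htonl(op_id), htonl(arg) };

	if (buflen < sizeof(words))
		return 0;
	memcpy(buf, words, sizeof(words));
	return sizeof(words);
}

/* Recover the operation ID from the first four bytes of a request blob. */
uint32_t request_op_id(const uint8_t *buf)
{
	uint32_t word;

	memcpy(&word, buf, sizeof(word));
	return ntohl(word);
}
```

The kernel never looks at these four bytes; only the peer application
interprets them.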
72 | |||
73 | |||
74 | Sockets of AF_RXRPC family are: | ||
75 | |||
76 | (1) created as type SOCK_DGRAM; | ||
77 | |||
78 | (2) provided with a protocol of the type of underlying transport they're going | ||
79 | to use - currently only PF_INET is supported. | ||
80 | |||
81 | |||
82 | The Andrew File System (AFS) is an example of an application that uses this and | ||
83 | that has both kernel (filesystem) and userspace (utility) components. | ||
84 | |||
85 | |||
86 | ====================== | ||
87 | RXRPC PROTOCOL SUMMARY | ||
88 | ====================== | ||
89 | |||
90 | An overview of the RxRPC protocol: | ||
91 | |||
92 | (*) RxRPC sits on top of another networking protocol (UDP is the only option | ||
93 | currently), and uses this to provide network transport. UDP ports, for | ||
94 | example, provide transport endpoints. | ||
95 | |||
96 | (*) RxRPC supports multiple virtual "connections" from any given transport | ||
97 | endpoint, thus allowing the endpoints to be shared, even to the same | ||
98 | remote endpoint. | ||
99 | |||
100 | (*) Each connection goes to a particular "service". A connection may not go | ||
101 | to multiple services. A service may be considered the RxRPC equivalent of | ||
102 | a port number. AF_RXRPC permits multiple services to share an endpoint. | ||
103 | |||
104 | (*) Client-originating packets are marked, thus a transport endpoint can be | ||
105 | shared between client and server connections (connections have a | ||
106 | direction). | ||
107 | |||
108 | (*) Up to a billion connections may be supported concurrently between one | ||
109 | local transport endpoint and one service on one remote endpoint. An RxRPC | ||
110 | connection is described by seven numbers: | ||
111 | |||
112 | Local address } | ||
113 | Local port } Transport (UDP) address | ||
114 | Remote address } | ||
115 | Remote port } | ||
116 | Direction | ||
117 | Connection ID | ||
118 | Service ID | ||
119 | |||
120 | (*) Each RxRPC operation is a "call". A connection may make up to four | ||
121 | billion calls, but only up to four calls may be in progress on a | ||
122 | connection at any one time. | ||
123 | |||
124 | (*) Calls are two-phase and asymmetric: the client sends its request data, | ||
125 | which the service receives; then the service transmits the reply data | ||
126 | which the client receives. | ||
127 | |||
128 | (*) The data blobs are of indefinite size; the end of a phase is marked with a | ||
129 | flag in the packet. The number of packets of data making up one blob may | ||
130 | not exceed 4 billion, however, as this would cause the sequence number to | ||
131 | wrap. | ||
132 | |||
133 | (*) The first four bytes of the request data are the service operation ID. | ||
134 | |||
135 | (*) Security is negotiated on a per-connection basis. The connection is | ||
136 | initiated by the first data packet on it arriving. If security is | ||
137 | requested, the server then issues a "challenge" and then the client | ||
138 | replies with a "response". If the response is successful, the security is | ||
139 | set for the lifetime of that connection, and all subsequent calls made | ||
140 | upon it use that same security. In the event that the server lets a | ||
141 | connection lapse before the client, the security will be renegotiated if | ||
142 | the client uses the connection again. | ||
143 | |||
144 | (*) Calls use ACK packets to handle reliability. Data packets are also | ||
145 | explicitly sequenced per call. | ||
146 | |||
147 | (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs. | ||
148 | A hard-ACK indicates to the far side that all the data received to a point | ||
149 | has been received and processed; a soft-ACK indicates that the data has | ||
150 | been received but may yet be discarded and re-requested. The sender may | ||
151 | not discard any transmittable packets until they've been hard-ACK'd. | ||
152 | |||
153 | (*) Reception of a reply data packet implicitly hard-ACK's all the data | ||
154 | packets that make up the request. | ||
155 | |||
156 | (*) A call is complete when the request has been sent, the reply has been | ||
157 | received and the final hard-ACK on the last packet of the reply has | ||
158 | reached the server. | ||
159 | |||
160 | (*) A call may be aborted by either end at any time up to its completion. | ||
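
The difference between hard-ACKs and soft-ACKs can be illustrated with a toy
model of the sender's side of one call. This is only a sketch of the rule
stated above (packets are retained until hard-ACK'd), not the driver's actual
implementation. The structure, the function names and the window size of 8 are
all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TX_WINDOW 8	/* hypothetical limit on retained packets */

struct tx_state {
	uint32_t hard_acked;		/* every seq <= this is hard-ACK'd */
	uint32_t next_seq;		/* seq to give to the next new packet */
	uint8_t	 soft_acked[TX_WINDOW];	/* soft-ACK flag per retained packet */
};

void tx_init(struct tx_state *tx)
{
	memset(tx, 0, sizeof(*tx));
	tx->next_seq = 1;		/* sequence numbers start at 1 here */
}

/* Number of packets the sender must still hold on to. */
unsigned int tx_retained(const struct tx_state *tx)
{
	return tx->next_seq - 1 - tx->hard_acked;
}

/* Queue a new packet; returns its sequence number, or 0 if the window is full. */
uint32_t tx_queue(struct tx_state *tx)
{
	if (tx_retained(tx) >= TX_WINDOW)
		return 0;
	tx->soft_acked[tx->next_seq % TX_WINDOW] = 0;
	return tx->next_seq++;
}

/* A soft-ACK marks a packet as received, but it must still be retained. */
void tx_soft_ack(struct tx_state *tx, uint32_t seq)
{
	if (seq > tx->hard_acked && seq < tx->next_seq)
		tx->soft_acked[seq % TX_WINDOW] = 1;
}

/* A hard-ACK lets the sender discard everything up to and including upto. */
void tx_hard_ack(struct tx_state *tx, uint32_t upto)
{
	if (upto >= tx->next_seq)
		upto = tx->next_seq - 1;
	if (upto > tx->hard_acked)
		tx->hard_acked = upto;
}

/* Retained packets that have not even been soft-ACK'd yet. */
unsigned int tx_unacked(const struct tx_state *tx)
{
	unsigned int n = 0;
	uint32_t seq;

	for (seq = tx->hard_acked + 1; seq < tx->next_seq; seq++)
		if (!tx->soft_acked[seq % TX_WINDOW])
			n++;
	return n;
}
```

In this model a soft-ACK never shrinks tx_retained(); only a hard-ACK does.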
161 | |||
162 | |||
163 | ===================== | ||
164 | AF_RXRPC DRIVER MODEL | ||
165 | ===================== | ||
166 | |||
167 | About the AF_RXRPC driver: | ||
168 | |||
169 | (*) The AF_RXRPC protocol transparently uses internal sockets of the transport | ||
170 | protocol to represent transport endpoints. | ||
171 | |||
172 | (*) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC | ||
173 | connections are handled transparently. One client socket may be used to | ||
174 | make multiple simultaneous calls to the same service. One server socket | ||
175 | may handle calls from many clients. | ||
176 | |||
177 | (*) Additional parallel client connections will be initiated to support extra | ||
178 | concurrent calls, up to a tunable limit. | ||
179 | |||
180 | (*) Each connection is retained for a certain amount of time [tunable] after | ||
181 | the last call currently using it has completed in case a new call is made | ||
182 | that could reuse it. | ||
183 | |||
184 | (*) Each internal UDP socket is retained [tunable] for a certain amount of | ||
185 | time [tunable] after the last connection using it is discarded, in case a new | ||
186 | connection is made that could use it. | ||
187 | |||
188 | (*) A client-side connection is only shared between calls if they have | ||
189 | the same key struct describing their security (and assuming the calls | ||
190 | would otherwise share the connection). Non-secured calls would also be | ||
191 | able to share connections with each other. | ||
192 | |||
193 | (*) A server-side connection is shared if the client says it is. | ||
194 | |||
195 | (*) ACK'ing is handled by the protocol driver automatically, including ping | ||
196 | replying. | ||
197 | |||
198 | (*) SO_KEEPALIVE automatically pings the other side to keep the connection | ||
199 | alive [TODO]. | ||
200 | |||
201 | (*) If an ICMP error is received, all calls affected by that error will be | ||
202 | aborted with an appropriate network error passed through recvmsg(). | ||
203 | |||
204 | |||
205 | Interaction with the user of the RxRPC socket: | ||
206 | |||
207 | (*) A socket is made into a server socket by binding an address with a | ||
208 | non-zero service ID. | ||
209 | |||
210 | (*) In the client, sending a request is achieved with one or more sendmsgs, | ||
211 | followed by the reply being received with one or more recvmsgs. | ||
212 | |||
213 | (*) The first sendmsg for a request to be sent from a client contains a tag to | ||
214 | be used in all other sendmsgs or recvmsgs associated with that call. The | ||
215 | tag is carried in the control data. | ||
216 | |||
217 | (*) connect() is used to supply a default destination address for a client | ||
218 | socket. This may be overridden by supplying an alternate address to the | ||
219 | first sendmsg() of a call (struct msghdr::msg_name). | ||
220 | |||
221 | (*) If connect() is called on an unbound client, a random local port will | ||
222 | be bound before the operation takes place. | ||
223 | |||
224 | (*) A server socket may also be used to make client calls. To do this, the | ||
225 | first sendmsg() of the call must specify the target address. The server's | ||
226 | transport endpoint is used to send the packets. | ||
227 | |||
228 | (*) Once the application has received the last message associated with a call, | ||
229 | the tag is guaranteed not to be seen again, and so it can be used to pin | ||
230 | client resources. A new call can then be initiated with the same tag | ||
231 | without fear of interference. | ||
232 | |||
233 | (*) In the server, a request is received with one or more recvmsgs, then the | ||
234 | reply is transmitted with one or more sendmsgs, and then the final ACK | ||
235 | is received with a last recvmsg. | ||
236 | |||
237 | (*) When sending data for a call, sendmsg is given MSG_MORE if there's more | ||
238 | data to come on that call. | ||
239 | |||
240 | (*) When receiving data for a call, recvmsg flags MSG_MORE if there's more | ||
241 | data to come for that call. | ||
242 | |||
243 | (*) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg | ||
244 | to indicate the terminal message for that call. | ||
245 | |||
246 | (*) A call may be aborted by adding an abort control message to the control | ||
247 | data. Issuing an abort terminates the kernel's use of that call's tag. | ||
248 | Any messages waiting in the receive queue for that call will be discarded. | ||
249 | |||
250 | (*) Aborts, busy notifications and challenge packets are delivered by recvmsg, | ||
251 | and control data messages will be set to indicate the context. Receiving | ||
252 | an abort or a busy message terminates the kernel's use of that call's tag. | ||
253 | |||
254 | (*) The control data part of the msghdr struct is used for a number of things: | ||
255 | |||
256 | (*) The tag of the intended or affected call. | ||
257 | |||
258 | (*) Sending or receiving errors, aborts and busy notifications. | ||
259 | |||
260 | (*) Notifications of incoming calls. | ||
261 | |||
262 | (*) Sending debug requests and receiving debug replies [TODO]. | ||
263 | |||
264 | (*) When the kernel has received and set up an incoming call, it sends a | ||
265 | message to the server application to let it know there's a new call awaiting | ||
266 | its acceptance [recvmsg reports a special control message]. The server | ||
267 | application then uses sendmsg to assign a tag to the new call. Once that | ||
268 | is done, the first part of the request data will be delivered by recvmsg. | ||
269 | |||
270 | (*) The server application has to provide the server socket with a keyring of | ||
271 | secret keys corresponding to the security types it permits. When a secure | ||
272 | connection is being set up, the kernel looks up the appropriate secret key | ||
273 | in the keyring and then sends a challenge packet to the client and | ||
274 | receives a response packet. The kernel then checks the authorisation of | ||
275 | the packet and either aborts the connection or sets up the security. | ||
276 | |||
277 | (*) The name of the key a client will use to secure its communications is | ||
278 | nominated by a socket option. | ||
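
Because the tag is an unsigned long that the kernel hands back unchanged, one
way to use it for pinning client resources is to store a pointer to per-call
state in it. The following is a minimal sketch of that idea. struct app_call
and the three functions are hypothetical application code, and the flags
argument stands for the msg_flags value reported by recvmsg().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>	/* MSG_MORE, MSG_EOR */

/* Hypothetical per-call state owned by the application. */
struct app_call {
	int	operation;	/* what this call is doing */
	size_t	reply_len;	/* reply bytes collected so far */
};

/* Allocate the state for a new call. */
struct app_call *call_begin(int operation)
{
	struct app_call *call = calloc(1, sizeof(*call));

	if (call)
		call->operation = operation;
	return call;
}

/* The value to pass as the call's tag in the control data. */
unsigned long call_tag(struct app_call *call)
{
	return (unsigned long)(uintptr_t)call;
}

/*
 * Account for one message received for the call identified by tag.
 * Returns 1 if this was the terminal message (MSG_EOR), after which the tag
 * will not be seen again and the caller may release the state; 0 otherwise.
 */
int call_on_message(unsigned long tag, size_t data_len, int flags)
{
	struct app_call *call = (struct app_call *)(uintptr_t)tag;

	call->reply_len += data_len;
	return (flags & MSG_EOR) ? 1 : 0;
}
```

Once call_on_message() returns 1, the state can be freed and the same tag
value reused for a new call.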
279 | |||
280 | |||
281 | Notes on recvmsg: | ||
282 | |||
283 | (1) If there's a sequence of data messages belonging to a particular call on | ||
284 | the receive queue, then recvmsg will keep working through them until: | ||
285 | |||
286 | (a) it meets the end of that call's received data, | ||
287 | |||
288 | (b) it meets a non-data message, | ||
289 | |||
290 | (c) it meets a message belonging to a different call, or | ||
291 | |||
292 | (d) it fills the user buffer. | ||
293 | |||
294 | If recvmsg is called in blocking mode, it will keep sleeping, awaiting the | ||
295 | reception of further data, until one of the above four conditions is met. | ||
296 | |||
297 | (2) MSG_PEEK operates similarly, but will return immediately if it has put any | ||
298 | data in the buffer rather than sleeping until it can fill the buffer. | ||
299 | |||
300 | (3) If a data message is only partially consumed in filling a user buffer, | ||
301 | then the remainder of that message will be left on the front of the queue | ||
302 | for the next taker. MSG_TRUNC will never be flagged. | ||
303 | |||
304 | (4) If there is more data to be had on a call (it hasn't copied the last byte | ||
305 | of the last data message in that phase yet), then MSG_MORE will be | ||
306 | flagged. | ||
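
The stopping conditions above can be modelled in a few lines. This is a toy
model of what one recvmsg() call consumes from data already on the queue. It
does not model sleeping in blocking mode and is not the driver's code. struct
rx_msg and model_recvmsg() are hypothetical names, and the "more" result
follows only the MSG_MORE rule given above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one entry on the socket's receive queue. */
struct rx_msg {
	unsigned long	call;		/* user call ID it belongs to */
	int		is_data;	/* 1 = data, 0 = abort, ACK, error, ... */
	int		last;		/* 1 = last data message of the phase */
	size_t		len;		/* bytes not yet consumed */
};

/*
 * Model one recvmsg() over the queue q[0..n) with a user buffer of buflen
 * bytes.  Copying is represented by counting bytes.  Returns the number of
 * bytes "copied"; *more is set if the phase still has data to deliver.
 */
size_t model_recvmsg(struct rx_msg *q, size_t n, size_t buflen, int *more)
{
	size_t copied = 0, i;
	int done = 0;

	for (i = 0; i < n && !done && copied < buflen; i++) {
		size_t take = q[i].len;

		if (!q[i].is_data || q[i].call != q[0].call)
			break;			/* conditions (b) and (c) */
		if (take > buflen - copied)
			take = buflen - copied;	/* condition (d) */
		copied += take;
		q[i].len -= take;		/* remainder stays queued */
		if (q[i].len == 0 && q[i].last)
			done = 1;		/* condition (a) */
	}
	*more = (n > 0 && q[0].is_data && !done);
	return copied;
}
```

A partially consumed message keeps its remaining length, so the next call
starts from where this one stopped.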
307 | |||
308 | |||
309 | ================ | ||
310 | CONTROL MESSAGES | ||
311 | ================ | ||
312 | |||
313 | AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex | ||
314 | calls, to invoke certain actions and to report certain conditions. These are: | ||
315 | |||
316 | MESSAGE ID SRT DATA MEANING | ||
317 | ======================= === =========== =============================== | ||
318 | RXRPC_USER_CALL_ID sr- User ID App's call specifier | ||
319 | RXRPC_ABORT srt Abort code Abort code to issue/received | ||
320 | RXRPC_ACK -rt n/a Final ACK received | ||
321 | RXRPC_NET_ERROR -rt error num Network error on call | ||
322 | RXRPC_BUSY -rt n/a Call rejected (server busy) | ||
323 | RXRPC_LOCAL_ERROR -rt error num Local error encountered | ||
324 | RXRPC_NEW_CALL -r- n/a New call received | ||
325 | RXRPC_ACCEPT s-- n/a Accept new call | ||
326 | |||
327 | (SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message) | ||
328 | |||
329 | (*) RXRPC_USER_CALL_ID | ||
330 | |||
331 | This is used to indicate the application's call ID. It's an unsigned long | ||
332 | that the app specifies in the client by attaching it to the first data | ||
333 | message or in the server by passing it in association with an RXRPC_ACCEPT | ||
334 | message. recvmsg() passes it in conjunction with all messages except | ||
335 | those of the RXRPC_NEW_CALL message. | ||
336 | |||
337 | (*) RXRPC_ABORT | ||
338 | |||
339 | This can be used by an application to abort a call by passing it to | ||
340 | sendmsg, or it can be delivered by recvmsg to indicate a remote abort was | ||
341 | received. Either way, it must be associated with an RXRPC_USER_CALL_ID to | ||
342 | specify the call affected. If an abort is being sent, then error EBADSLT | ||
343 | will be returned if there is no call with that user ID. | ||
344 | |||
345 | (*) RXRPC_ACK | ||
346 | |||
347 | This is delivered to a server application to indicate that the final ACK | ||
348 | of a call was received from the client. It will be associated with an | ||
349 | RXRPC_USER_CALL_ID to indicate the call that's now complete. | ||
350 | |||
351 | (*) RXRPC_NET_ERROR | ||
352 | |||
353 | This is delivered to an application to indicate that an ICMP error message | ||
354 | was encountered in the process of trying to talk to the peer. An | ||
355 | errno-class integer value will be included in the control message data | ||
356 | indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call | ||
357 | affected. | ||
358 | |||
359 | (*) RXRPC_BUSY | ||
360 | |||
361 | This is delivered to a client application to indicate that a call was | ||
362 | rejected by the server due to the server being busy. It will be | ||
363 | associated with an RXRPC_USER_CALL_ID to indicate the rejected call. | ||
364 | |||
365 | (*) RXRPC_LOCAL_ERROR | ||
366 | |||
367 | This is delivered to an application to indicate that a local error was | ||
368 | encountered and that a call has been aborted because of it. An | ||
369 | errno-class integer value will be included in the control message data | ||
370 | indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call | ||
371 | affected. | ||
372 | |||
373 | (*) RXRPC_NEW_CALL | ||
374 | |||
375 | This is delivered to indicate to a server application that a new call has | ||
376 | arrived and is awaiting acceptance. No user ID is associated with this, | ||
377 | as a user ID must subsequently be assigned by doing an RXRPC_ACCEPT. | ||
378 | |||
379 | (*) RXRPC_ACCEPT | ||
380 | |||
381 | This is used by a server application to attempt to accept a call and | ||
382 | assign it a user ID. It should be associated with an RXRPC_USER_CALL_ID | ||
383 | to indicate the user ID to be assigned. If there is no call to be | ||
384 | accepted (it may have timed out, been aborted, etc.), then sendmsg will | ||
385 | return error ENODATA. If the user ID is already in use by another call, | ||
386 | then error EBADSLT will be returned. | ||
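
An application might walk the control messages returned by recvmsg() along the
following lines. This is a sketch, not a reference implementation. The EX_*
constants are local stand-ins whose values are assumed; real code should take
SOL_RXRPC and the RXRPC_* message IDs from the system headers. struct
call_event, put_cmsg(), parse_cmsgs() and is_terminal() are hypothetical
helpers, and put_cmsg() exists only so that the parser can be exercised
without a socket.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272	/* assumed value */
#endif

enum {			/* assumed numbering of the control messages */
	EX_RXRPC_USER_CALL_ID	= 1,
	EX_RXRPC_ABORT		= 2,
	EX_RXRPC_ACK		= 3,
	EX_RXRPC_NET_ERROR	= 5,
	EX_RXRPC_BUSY		= 6,
	EX_RXRPC_LOCAL_ERROR	= 7,
	EX_RXRPC_NEW_CALL	= 8,
};

struct call_event {
	unsigned long	call_id;	/* valid if have_call_id is set */
	int		have_call_id;
	int		type;		/* 0 = plain data, else a message ID */
	uint32_t	value;		/* abort code or error number */
};

/* Write one control message at offset off; returns the next offset. */
size_t put_cmsg(void *control, size_t off, int type,
		const void *data, size_t len)
{
	struct cmsghdr *cmsg = (struct cmsghdr *)((char *)control + off);

	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = type;
	cmsg->cmsg_len = CMSG_LEN(len);
	if (len)
		memcpy(CMSG_DATA(cmsg), data, len);
	return off + CMSG_SPACE(len);
}

/* Collect the call ID and any condition reported in msg's control data. */
void parse_cmsgs(struct msghdr *msg, struct call_event *ev)
{
	struct cmsghdr *cmsg;

	memset(ev, 0, sizeof(*ev));
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level != SOL_RXRPC)
			continue;
		switch (cmsg->cmsg_type) {
		case EX_RXRPC_USER_CALL_ID:
			memcpy(&ev->call_id, CMSG_DATA(cmsg),
			       sizeof(ev->call_id));
			ev->have_call_id = 1;
			break;
		case EX_RXRPC_ABORT:
		case EX_RXRPC_NET_ERROR:
		case EX_RXRPC_LOCAL_ERROR:
			memcpy(&ev->value, CMSG_DATA(cmsg), sizeof(ev->value));
			ev->type = cmsg->cmsg_type;
			break;
		case EX_RXRPC_ACK:
		case EX_RXRPC_BUSY:
		case EX_RXRPC_NEW_CALL:
			ev->type = cmsg->cmsg_type;
			break;
		}
	}
}

/* The "T" column of the table: does this message end the call? */
int is_terminal(int type)
{
	return type == EX_RXRPC_ABORT || type == EX_RXRPC_ACK ||
	       type == EX_RXRPC_NET_ERROR || type == EX_RXRPC_BUSY ||
	       type == EX_RXRPC_LOCAL_ERROR;
}
```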
387 | |||
388 | |||
389 | ============== | ||
390 | SOCKET OPTIONS | ||
391 | ============== | ||
392 | |||
393 | AF_RXRPC sockets support a few socket options at the SOL_RXRPC level: | ||
394 | |||
395 | (*) RXRPC_SECURITY_KEY | ||
396 | |||
397 | This is used to specify the description of the key to be used. The key is | ||
398 | extracted from the calling process's keyrings with request_key() and | ||
399 | should be of "rxrpc" type. | ||
400 | |||
401 | The optval pointer points to the description string, and optlen indicates | ||
402 | how long the string is, without the NUL terminator. | ||
403 | |||
404 | (*) RXRPC_SECURITY_KEYRING | ||
405 | |||
406 | Similar to above but specifies a keyring of server secret keys to use (key | ||
407 | type "keyring"). See the "Security" section. | ||
408 | |||
409 | (*) RXRPC_EXCLUSIVE_CONNECTION | ||
410 | |||
411 | This is used to request that new connections should be used for each call | ||
412 | made subsequently on this socket. optval should be NULL and optlen 0. | ||
413 | |||
414 | (*) RXRPC_MIN_SECURITY_LEVEL | ||
415 | |||
416 | This is used to specify the minimum security level required for calls on | ||
417 | this socket. optval must point to an int containing one of the following | ||
418 | values: | ||
419 | |||
420 | (a) RXRPC_SECURITY_PLAIN | ||
421 | |||
422 | Encrypted checksum only. | ||
423 | |||
424 | (b) RXRPC_SECURITY_AUTH | ||
425 | |||
426 | Encrypted checksum plus packet padded and first eight bytes of packet | ||
427 | encrypted - which includes the actual packet length. | ||
428 | |||
429 | (c) RXRPC_SECURITY_ENCRYPTED | ||
430 | |||
431 | Encrypted checksum plus entire packet padded and encrypted, including | ||
432 | actual packet length. | ||
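
A small wrapper for setting the minimum security level might look like the
following sketch. The EX_* names are local stand-ins with assumed values; real
code should use SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL and the security level
constants from the system headers. set_min_security_level() is a hypothetical
helper, not part of the AF_RXRPC interface.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272			/* assumed value */
#endif
#define EX_RXRPC_MIN_SECURITY_LEVEL 4	/* assumed value */

enum ex_security_level {		/* assumed values */
	EX_SECURITY_PLAIN	= 0,	/* encrypted checksum only */
	EX_SECURITY_AUTH	= 1,	/* plus first eight bytes encrypted */
	EX_SECURITY_ENCRYPTED	= 2,	/* plus entire packet encrypted */
};

/*
 * Ask for at least the given security level on calls made through fd.
 * Returns 0 on success or -1 with errno set, as setsockopt() does.
 */
int set_min_security_level(int fd, int level)
{
	if (level < EX_SECURITY_PLAIN || level > EX_SECURITY_ENCRYPTED) {
		errno = EINVAL;		/* reject values not listed above */
		return -1;
	}
	return setsockopt(fd, SOL_RXRPC, EX_RXRPC_MIN_SECURITY_LEVEL,
			  &level, sizeof(level));
}
```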
433 | |||
434 | |||
435 | ======== | ||
436 | SECURITY | ||
437 | ======== | ||
438 | |||
439 | Currently, only the Kerberos 4 equivalent protocol has been implemented | ||
440 | (security index 2 - rxkad). This requires the rxkad module to be loaded and, | ||
441 | on the client, tickets of the appropriate type to be obtained from the AFS | ||
442 | kaserver or the Kerberos server and installed as "rxrpc" type keys. This is | ||
443 | normally done using the klog program. An example simple klog program can be | ||
444 | found at: | ||
445 | |||
446 | http://people.redhat.com/~dhowells/rxrpc/klog.c | ||
447 | |||
448 | The payload provided to add_key() on the client should be of the following | ||
449 | form: | ||
450 | |||
451 | struct rxrpc_key_sec2_v1 { | ||
452 | uint16_t security_index; /* 2 */ | ||
453 | uint16_t ticket_length; /* length of ticket[] */ | ||
454 | uint32_t expiry; /* time at which expires */ | ||
455 | uint8_t kvno; /* key version number */ | ||
456 | uint8_t __pad[3]; | ||
457 | uint8_t session_key[8]; /* DES session key */ | ||
458 | uint8_t ticket[0]; /* the encrypted ticket */ | ||
459 | }; | ||
460 | |||
461 | Where the ticket blob is just appended to the above structure. | ||
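
Assembling such a payload could be sketched as below. The structure is copied
from above, with the trailing array written as a C99 flexible array member.
build_rxkad_payload() is a hypothetical helper. Passing the result to
add_key() is left out, and the ticket and session key are assumed to have been
obtained elsewhere.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct rxrpc_key_sec2_v1 {
	uint16_t	security_index;	/* 2 */
	uint16_t	ticket_length;	/* length of ticket[] */
	uint32_t	expiry;		/* time at which expires */
	uint8_t		kvno;		/* key version number */
	uint8_t		__pad[3];
	uint8_t		session_key[8];	/* DES session key */
	uint8_t		ticket[];	/* the encrypted ticket */
};

/*
 * Allocate a payload with the ticket appended to the fixed header.
 * Returns NULL on allocation failure; otherwise the caller frees the result
 * and *payload_len holds the size to hand to add_key().
 */
struct rxrpc_key_sec2_v1 *build_rxkad_payload(uint32_t expiry, uint8_t kvno,
					      const uint8_t session_key[8],
					      const void *ticket,
					      uint16_t ticket_len,
					      size_t *payload_len)
{
	size_t len = sizeof(struct rxrpc_key_sec2_v1) + ticket_len;
	struct rxrpc_key_sec2_v1 *p = calloc(1, len);

	if (!p)
		return NULL;
	p->security_index = 2;		/* rxkad */
	p->ticket_length = ticket_len;
	p->expiry = expiry;
	p->kvno = kvno;
	memcpy(p->session_key, session_key, sizeof(p->session_key));
	memcpy(p->ticket, ticket, ticket_len);
	*payload_len = len;
	return p;
}
```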
462 | |||
463 | |||
464 | On the server side, keys of type "rxrpc_s" must be made available. | ||
465 | They have a description of "<serviceID>:<securityIndex>" (eg: "52:2" for an | ||
466 | rxkad key for the AFS VL service). When such a key is created, it should be | ||
467 | given the server's secret key as the instantiation data (see the example | ||
468 | below). | ||
469 | |||
470 | add_key("rxrpc_s", "52:2", secret_key, 8, keyring); | ||
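
The description string can be generated rather than hard-coded. The helper
below is a hypothetical sketch that produces the
"<serviceID>:<securityIndex>" form described above; it does not call
add_key() itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the description of a server key into buf.
 * Returns 0 on success, or -1 if the buffer is too small.
 */
int format_server_key_desc(char *buf, size_t buflen,
			   unsigned int service_id,
			   unsigned int security_index)
{
	int n = snprintf(buf, buflen, "%u:%u", service_id, security_index);

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```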
471 | |||
472 | A keyring is passed to the server socket by naming it in a sockopt. The server | ||
473 | socket then looks the server secret keys up in this keyring when secure | ||
474 | incoming connections are made. This can be seen in an example program that can | ||
475 | be found at: | ||
476 | |||
477 | http://people.redhat.com/~dhowells/rxrpc/listen.c | ||
478 | |||
479 | |||
480 | ==================== | ||
481 | EXAMPLE CLIENT USAGE | ||
482 | ==================== | ||
483 | |||
484 | A client would issue an operation as follows: | ||
485 | |||
486 | (1) An RxRPC socket is set up by: | ||
487 | |||
488 | client = socket(AF_RXRPC, SOCK_DGRAM, PF_INET); | ||
489 | |||
490 | Where the third parameter indicates the protocol family of the transport | ||
491 | socket used - usually IPv4 but it can also be IPv6 [TODO]. | ||
492 | |||
493 | (2) A local address can optionally be bound: | ||
494 | |||
495 | struct sockaddr_rxrpc srx = { | ||
496 | .srx_family = AF_RXRPC, | ||
497 | .srx_service = 0, /* we're a client */ | ||
498 | .transport_type = SOCK_DGRAM, /* type of transport socket */ | ||
499 | .transport.sin.sin_family = AF_INET, | ||
500 | .transport.sin.sin_port = htons(7000), /* AFS callback */ | ||
501 | .transport.sin.sin_addr.s_addr = 0, /* all local interfaces */ | ||
502 | }; | ||
503 | bind(client, (struct sockaddr *)&srx, sizeof(srx)); | ||
504 | |||
505 | This specifies the local UDP port to be used. If not given, a random | ||
506 | non-privileged port will be used. A UDP port may be shared between | ||
507 | several unrelated RxRPC sockets. Security is handled separately for each | ||
508 | RxRPC virtual connection. | ||
509 | |||
510 | (3) The security is set: | ||
511 | |||
512 | const char *key = "AFS:cambridge.redhat.com"; | ||
513 | setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key)); | ||
514 | |||
515 | This issues a request_key() to get the key representing the security | ||
516 | context. The minimum security level can be set: | ||
517 | |||
518 | unsigned int sec = RXRPC_SECURITY_ENCRYPTED; | ||
519 | setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL, | ||
520 | &sec, sizeof(sec)); | ||
521 | |||
522 | (4) The server to be contacted can then be specified (alternatively this can | ||
523 | be done through sendmsg): | ||
524 | |||
525 | struct sockaddr_rxrpc srx = { | ||
526 | .srx_family = AF_RXRPC, | ||
527 | .srx_service = VL_SERVICE_ID, | ||
528 | .transport_type = SOCK_DGRAM, /* type of transport socket */ | ||
529 | .transport.sin.sin_family = AF_INET, | ||
530 | .transport.sin.sin_port = htons(7005), /* AFS volume manager */ | ||
531 | .transport.sin.sin_addr.s_addr = ..., | ||
532 | }; | ||
533 | connect(client, (struct sockaddr *)&srx, sizeof(srx)); | ||
534 | |||
535 | (5) The request data should then be posted to the client socket using a series | ||
536 | of sendmsg() calls, each with the following control message attached: | ||
537 | |||
538 | RXRPC_USER_CALL_ID - specifies the user ID for this call | ||
539 | |||
540 | MSG_MORE should be set in msghdr::msg_flags on all but the last part of | ||
541 | the request. Multiple requests may be made simultaneously. | ||
542 | |||
543 | If a call is intended to go to a destination other than the default | ||
544 | specified through connect(), then msghdr::msg_name should be set on the | ||
545 | first request message of that call. | ||
546 | |||
547 | (6) The reply data will then be posted to the client socket for recvmsg() to | ||
548 | pick up. MSG_MORE will be flagged by recvmsg() if there's more reply data | ||
549 | for a particular call to be read. MSG_EOR will be set on the terminal | ||
550 | read for a call. | ||
551 | |||
552 | All data will be delivered with the following control message attached: | ||
553 | |||
554 | RXRPC_USER_CALL_ID - specifies the user ID for this call | ||
555 | |||
556 | If an abort or error occurred, this will be returned in the control data | ||
557 | buffer instead, and MSG_EOR will be flagged to indicate the end of that | ||
558 | call. | ||
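
Sending one part of a request from userspace might be sketched as follows.
The sketch assumes that MSG_MORE is passed through the flags argument of
sendmsg(), since a userspace caller supplies send flags there rather than in
the msghdr. SOL_RXRPC and EX_RXRPC_USER_CALL_ID are given assumed values; real
code should take them from the system headers. send_request_part() is a
hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272			/* assumed value */
#endif
#define EX_RXRPC_USER_CALL_ID 1		/* assumed value */

/*
 * Send one part of the request for the call tagged call_id.  last is
 * non-zero for the final part; every other part is sent with MSG_MORE.
 * Returns whatever sendmsg() returns.
 */
ssize_t send_request_part(int fd, unsigned long call_id,
			  const void *data, size_t len, int last)
{
	union {				/* keeps the buffer cmsghdr-aligned */
		char buf[CMSG_SPACE(sizeof(unsigned long))];
		struct cmsghdr align;
	} control;
	struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&control, 0, sizeof(control));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	/* Every part carries the tag so the kernel can find the call. */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = EX_RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

	return sendmsg(fd, &msg, last ? 0 : MSG_MORE);
}
```

A request split into three buffers would be sent with last set to 0, 0 and 1
in turn, all with the same call_id.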
559 | |||
560 | |||
561 | ==================== | ||
562 | EXAMPLE SERVER USAGE | ||
563 | ==================== | ||
564 | |||
565 | A server would be set up to accept operations in the following manner: | ||
566 | |||
567 | (1) An RxRPC socket is created by: | ||
568 | |||
569 | server = socket(AF_RXRPC, SOCK_DGRAM, PF_INET); | ||
570 | |||
571 | Where the third parameter indicates the address type of the transport | ||
572 | socket used - usually IPv4. | ||
573 | |||
574 | (2) Security is set up if desired by giving the socket a keyring with server | ||
575 | secret keys in it: | ||
576 | |||
577 | keyring = add_key("keyring", "AFSkeys", NULL, 0, | ||
578 | KEY_SPEC_PROCESS_KEYRING); | ||
579 | |||
580 | const char secret_key[8] = { | ||
581 | 0xa7, 0x83, 0x8a, 0xcb, 0xc7, 0x83, 0xec, 0x94 }; | ||
582 | add_key("rxrpc_s", "52:2", secret_key, 8, keyring); | ||
583 | |||
584 | setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7); | ||
585 | |||
586 | The keyring can be manipulated after it has been given to the socket. This | ||
587 | permits the server to add more keys, replace keys, etc. whilst it is live. | ||
588 | |||
589 | (3) A local address must then be bound: | ||
590 | |||
591 | struct sockaddr_rxrpc srx = { | ||
592 | .srx_family = AF_RXRPC, | ||
593 | .srx_service = VL_SERVICE_ID, /* RxRPC service ID */ | ||
594 | .transport_type = SOCK_DGRAM, /* type of transport socket */ | ||
595 | .transport.sin.sin_family = AF_INET, | ||
596 | .transport.sin.sin_port = htons(7000), /* AFS callback */ | ||
597 | .transport.sin.sin_addr.s_addr = 0, /* all local interfaces */ | ||
598 | }; | ||
599 | bind(server, (struct sockaddr *)&srx, sizeof(srx)); | ||
600 | |||
601 | (4) The server is then set to listen out for incoming calls: | ||
602 | |||
603 | listen(server, 100); | ||
604 | |||
605 | (5) The kernel notifies the server of pending incoming connections by sending | ||
606 | it a message for each. This is received with recvmsg() on the server | ||
607 | socket. It has no data, and has a single dataless control message | ||
608 | attached: | ||
609 | |||
610 | RXRPC_NEW_CALL | ||
611 | |||
612 | The address that can be passed back by recvmsg() at this point should be | ||
613 | ignored since the call for which the message was posted may have gone by | ||
614 | the time it is accepted - in which case the first call still on the queue | ||
615 | will be accepted. | ||
616 | |||
617 | (6) The server then accepts the new call by issuing a sendmsg() with two | ||
618 | pieces of control data and no actual data: | ||
619 | |||
620 | RXRPC_ACCEPT - indicate connection acceptance | ||
621 | RXRPC_USER_CALL_ID - specify user ID for this call | ||
622 | |||
623 | (7) The first request data packet will then be posted to the server socket for | ||
624 | recvmsg() to pick up. At that point, the RxRPC address for the call can | ||
625 | be read from the address fields in the msghdr struct. | ||
626 | |||
627 | Subsequent request data will be posted to the server socket for recvmsg() | ||
628 | to collect as it arrives. All but the last piece of the request data will | ||
629 | be delivered with MSG_MORE flagged. | ||
630 | |||
631 | All data will be delivered with the following control message attached: | ||
632 | |||
633 | RXRPC_USER_CALL_ID - specifies the user ID for this call | ||
634 | |||
635 | (8) The reply data should then be posted to the server socket using a series | ||
636 | of sendmsg() calls, each with the following control messages attached: | ||
637 | |||
638 | RXRPC_USER_CALL_ID - specifies the user ID for this call | ||
639 | |||
640 | MSG_MORE should be set in msghdr::msg_flags on all but the last message | ||
641 | for a particular call. | ||
642 | |||
643 | (9) The final ACK from the client will be posted for retrieval by recvmsg() | ||
644 | when it is received. It will take the form of a dataless message with two | ||
645 | control messages attached: | ||
646 | |||
647 | RXRPC_USER_CALL_ID - specifies the user ID for this call | ||
648 | RXRPC_ACK - indicates final ACK (no data) | ||
649 | |||
650 | MSG_EOR will be flagged to indicate that this is the final message for | ||
651 | this call. | ||
652 | |||
653 | (10) Up to the point the final packet of reply data is sent, the call can be | ||
654 | aborted by calling sendmsg() with a dataless message with the following | ||
655 | control messages attached: | ||
656 | |||
657 | RXRPC_USER_CALL_ID - specifies the user ID for this call | ||
658 | RXRPC_ABORT - indicates abort code (4 byte data) | ||
659 | |||
660 | Any packets waiting in the socket's receive queue will be discarded if | ||
661 | this is issued. | ||
662 | |||
663 | Note that all the communications for a particular service take place through | ||
664 | the one server socket, using control messages on sendmsg() and recvmsg() to | ||
665 | determine the call affected. | ||
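
The dataless accept and abort messages described above carry two control
messages each. Building that control data might be sketched as follows. The
EX_* constants and the SOL_RXRPC fallback are assumed values standing in for
the definitions in the system headers, and put_cmsg(), build_accept() and
build_abort() are hypothetical helpers. The resulting buffer would be passed
to sendmsg() through msg_control and msg_controllen with no data attached.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_RXRPC
#define SOL_RXRPC 272	/* assumed value */
#endif

enum {			/* assumed numbering of the control messages */
	EX_RXRPC_USER_CALL_ID	= 1,
	EX_RXRPC_ABORT		= 2,
	EX_RXRPC_ACCEPT		= 9,
};

/* Write one control message at offset off; returns the next offset. */
static size_t put_cmsg(void *control, size_t off, int type,
		       const void *data, size_t len)
{
	struct cmsghdr *cmsg = (struct cmsghdr *)((char *)control + off);

	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = type;
	cmsg->cmsg_len = CMSG_LEN(len);
	if (len)
		memcpy(CMSG_DATA(cmsg), data, len);
	return off + CMSG_SPACE(len);
}

/*
 * Control data to accept a new call and give it the tag call_id.
 * Returns the msg_controllen to use, or 0 if the buffer is too small.
 */
size_t build_accept(void *control, size_t control_len, unsigned long call_id)
{
	size_t off = 0;

	if (control_len < CMSG_SPACE(sizeof(call_id)) + CMSG_SPACE(0))
		return 0;
	memset(control, 0, control_len);
	off = put_cmsg(control, off, EX_RXRPC_USER_CALL_ID,
		       &call_id, sizeof(call_id));
	off = put_cmsg(control, off, EX_RXRPC_ACCEPT, NULL, 0);
	return off;
}

/*
 * Control data to abort the call tagged call_id with a 4-byte abort code.
 * Returns the msg_controllen to use, or 0 if the buffer is too small.
 */
size_t build_abort(void *control, size_t control_len, unsigned long call_id,
		   uint32_t abort_code)
{
	size_t off = 0;

	if (control_len < CMSG_SPACE(sizeof(call_id)) +
			  CMSG_SPACE(sizeof(abort_code)))
		return 0;
	memset(control, 0, control_len);
	off = put_cmsg(control, off, EX_RXRPC_USER_CALL_ID,
		       &call_id, sizeof(call_id));
	off = put_cmsg(control, off, EX_RXRPC_ABORT,
		       &abort_code, sizeof(abort_code));
	return off;
}
```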
666 | |||
667 | |||
668 | ========================= | ||
669 | AF_RXRPC KERNEL INTERFACE | ||
670 | ========================= | ||
671 | |||
672 | The AF_RXRPC module also provides an interface for use by in-kernel utilities | ||
673 | such as the AFS filesystem. This permits such a utility to: | ||
674 | |||
675 | (1) Use different keys directly on individual client calls on one socket | ||
676 | rather than having to open a whole slew of sockets, one for each key it | ||
677 | might want to use. | ||
678 | |||
679 | (2) Avoid having RxRPC call request_key() at the point of issue of a call or | ||
680 | opening of a socket. Instead the utility is responsible for requesting a | ||
681 | key at the appropriate point. AFS, for instance, would do this during VFS | ||
682 | operations such as open() or unlink(). The key is then handed through | ||
683 | when the call is initiated. | ||
684 | |||
685 | (3) Request the use of something other than GFP_KERNEL to allocate memory. | ||
686 | |||
687 | (4) Avoid the overhead of using the recvmsg() call. RxRPC messages can be | ||
688 | intercepted before they get put into the socket Rx queue and the socket | ||
689 | buffers manipulated directly. | ||
690 | |||
691 | To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket, | ||
692 | bind an address as appropriate and listen if it's to be a server socket, but | ||
693 | then it passes this to the kernel interface functions. | ||
694 | |||
695 | The kernel interface functions are as follows: | ||
696 | |||
697 | (*) Begin a new client call. | ||
698 | |||
699 | struct rxrpc_call * | ||
700 | rxrpc_kernel_begin_call(struct socket *sock, | ||
701 |                         struct sockaddr_rxrpc *srx, | ||
702 |                         struct key *key, | ||
703 |                         unsigned long user_call_ID, | ||
704 |                         gfp_t gfp); | ||
705 | |||
706 | This allocates the infrastructure to make a new RxRPC call and assigns | ||
707 | call and connection numbers. The call will be made on the UDP port that | ||
708 | the socket is bound to. The call will go to the destination address of a | ||
709 | connected client socket unless an alternative is supplied (srx is | ||
710 | non-NULL). | ||
711 | |||
712 | If a key is supplied then this will be used to secure the call instead of | ||
713 | the key bound to the socket with the RXRPC_SECURITY_KEY sockopt. Calls | ||
714 | secured in this way will still share connections if at all possible. | ||
715 | |||
716 | The user_call_ID is equivalent to that supplied to sendmsg() in the | ||
717 | control data buffer. It is entirely feasible to use this to point to a | ||
718 | kernel data structure. | ||
719 | |||
720 | If this function is successful, an opaque reference to the RxRPC call is | ||
721 | returned. The caller now holds a reference on this and it must be | ||
722 | properly ended. | ||
723 | |||
724 | (*) End a client call. | ||
725 | |||
726 | void rxrpc_kernel_end_call(struct rxrpc_call *call); | ||
727 | |||
728 | This is used to end a previously begun call. The user_call_ID is expunged | ||
729 | from AF_RXRPC's knowledge and will not be seen again in association with | ||
730 | the specified call. | ||
731 | |||
732 | (*) Send data through a call. | ||
733 | |||
734 | int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg, | ||
735 |                            size_t len); | ||
736 | |||
737 | This is used to supply either the request part of a client call or the | ||
738 | reply part of a server call. msg.msg_iovlen and msg.msg_iov specify the | ||
739 | data buffers to be used. msg_iov may not be NULL and must point | ||
740 | exclusively to in-kernel virtual addresses. msg.msg_flags may be given | ||
741 | MSG_MORE if there will be subsequent data sends for this call. | ||
742 | |||
743 | The msg must not specify a destination address, control data or any flags | ||
744 | other than MSG_MORE. len is the total amount of data to transmit. | ||
745 | |||
746 | (*) Abort a call. | ||
747 | |||
748 | void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code); | ||
749 | |||
750 | This is used to abort a call if it's still in an abortable state. The | ||
751 | abort code specified will be placed in the ABORT message sent. | ||
752 | |||
753 | (*) Intercept received RxRPC messages. | ||
754 | |||
755 | typedef void (*rxrpc_interceptor_t)(struct sock *sk, | ||
756 |                                     unsigned long user_call_ID, | ||
757 |                                     struct sk_buff *skb); | ||
758 | |||
759 | void | ||
760 | rxrpc_kernel_intercept_rx_messages(struct socket *sock, | ||
761 |                                    rxrpc_interceptor_t interceptor); | ||
762 | |||
763 | This installs an interceptor function on the specified AF_RXRPC socket. | ||
764 | All messages that would otherwise wind up in the socket's Rx queue are | ||
765 | then diverted to this function. Note that care must be taken to process | ||
766 | the messages in the right order to maintain DATA message sequentiality. | ||
767 | |||
768 | The interceptor function itself is provided with the address of the socket | ||
769 | handling the incoming message, the ID assigned by the kernel utility | ||
770 | to the call and the socket buffer containing the message. | ||
771 | |||
772 | The skb->mark field indicates the type of message: | ||
773 | |||
774 | MARK                            MEANING | ||
775 | =============================== ======================================= | ||
776 | RXRPC_SKB_MARK_DATA             Data message | ||
777 | RXRPC_SKB_MARK_FINAL_ACK        Final ACK received for an incoming call | ||
778 | RXRPC_SKB_MARK_BUSY             Client call rejected as server busy | ||
779 | RXRPC_SKB_MARK_REMOTE_ABORT     Call aborted by peer | ||
780 | RXRPC_SKB_MARK_NET_ERROR        Network error detected | ||
781 | RXRPC_SKB_MARK_LOCAL_ERROR      Local error encountered | ||
782 | RXRPC_SKB_MARK_NEW_CALL         New incoming call awaiting acceptance | ||
783 | |||
784 | The remote abort message can be probed with rxrpc_kernel_get_abort_code(). | ||
785 | The two error messages can be probed with rxrpc_kernel_get_error_number(). | ||
786 | A new call can be accepted with rxrpc_kernel_accept_call(). | ||
787 | |||
788 | Data messages can have their contents extracted with the usual bunch of | ||
789 | socket buffer manipulation functions. A data message can be determined to | ||
790 | be the last one in a sequence with rxrpc_kernel_is_data_last(). When a | ||
791 | data message has been used up, rxrpc_kernel_data_delivered() should be | ||
792 | called on it. | ||
793 | |||
794 | Non-data messages should be handed to rxrpc_kernel_free_skb() for disposal. | ||
795 | It is possible to get extra refs on all types of message for later | ||
796 | freeing, but this may pin the state of a call until the message is finally | ||
797 | freed. | ||
798 | |||
799 | (*) Accept an incoming call. | ||
800 | |||
801 | struct rxrpc_call * | ||
802 | rxrpc_kernel_accept_call(struct socket *sock, | ||
803 |                          unsigned long user_call_ID); | ||
804 | |||
805 | This is used to accept an incoming call and to assign it a call ID. This | ||
806 | function is similar to rxrpc_kernel_begin_call() and calls accepted must | ||
807 | be ended in the same way. | ||
808 | |||
809 | If this function is successful, an opaque reference to the RxRPC call is | ||
810 | returned. The caller now holds a reference on this and it must be | ||
811 | properly ended. | ||
812 | |||
813 | (*) Reject an incoming call. | ||
814 | |||
815 | int rxrpc_kernel_reject_call(struct socket *sock); | ||
816 | |||
817 | This is used to reject the first incoming call on the socket's queue with | ||
818 | a BUSY message. -ENODATA is returned if there were no incoming calls. | ||
819 | Other errors may be returned if the call had been aborted (-ECONNABORTED) | ||
820 | or had timed out (-ETIME). | ||
821 | |||
822 | (*) Record the delivery of a data message and free it. | ||
823 | |||
824 | void rxrpc_kernel_data_delivered(struct sk_buff *skb); | ||
825 | |||
826 | This is used to record a data message as having been delivered and to | ||
827 | update the ACK state for the call. The socket buffer will be freed. | ||
828 | |||
829 | (*) Free a message. | ||
830 | |||
831 | void rxrpc_kernel_free_skb(struct sk_buff *skb); | ||
832 | |||
833 | This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC | ||
834 | socket. | ||
835 | |||
836 | (*) Determine if a data message is the last one on a call. | ||
837 | |||
838 | bool rxrpc_kernel_is_data_last(struct sk_buff *skb); | ||
839 | |||
840 | This is used to determine if a socket buffer holds the last data message | ||
841 | to be received for a call (true will be returned if it does, false | ||
842 | if not). | ||
843 | |||
844 | The data message will be part of the reply on a client call and the | ||
845 | request on an incoming call. In the latter case there will be more | ||
846 | messages, but in the former case there will not. | ||
847 | |||
848 | (*) Get the abort code from an abort message. | ||
849 | |||
850 | u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb); | ||
851 | |||
852 | This is used to extract the abort code from a remote abort message. | ||
853 | |||
854 | (*) Get the error number from a local or network error message. | ||
855 | |||
856 | int rxrpc_kernel_get_error_number(struct sk_buff *skb); | ||
857 | |||
858 | This is used to extract the error number from a message indicating that | ||
859 | either a local error or a network error occurred. | ||
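
Pulling the message-handling rules above together, the following userspace
model shows one way an interceptor might decide, from the message type alone,
which helper to consult and which function to use for disposal. It is only a
sketch and is not kernel code: struct mock_skb, the MOCK_MARK_* values, the
PROBE_* and DISPOSE_* enumerators and plan_for() are all hypothetical stand-ins
introduced here, and the numeric values of the marks are placeholders rather
than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the RXRPC_SKB_MARK_* types; the values are placeholders. */
enum mock_mark {
	MOCK_MARK_DATA,
	MOCK_MARK_FINAL_ACK,
	MOCK_MARK_BUSY,
	MOCK_MARK_REMOTE_ABORT,
	MOCK_MARK_NET_ERROR,
	MOCK_MARK_LOCAL_ERROR,
	MOCK_MARK_NEW_CALL
};

/* Stand-in for struct sk_buff, reduced to the one field used here. */
struct mock_skb {
	unsigned int mark;
};

/* Which helper named in the text applies to the message. */
enum probe {
	PROBE_NONE,
	PROBE_IS_DATA_LAST,	/* rxrpc_kernel_is_data_last() */
	PROBE_GET_ABORT_CODE,	/* rxrpc_kernel_get_abort_code() */
	PROBE_GET_ERROR_NUMBER,	/* rxrpc_kernel_get_error_number() */
	PROBE_ACCEPT_CALL	/* rxrpc_kernel_accept_call() */
};

/* Which function named in the text disposes of the message. */
enum disposal {
	DISPOSE_DATA_DELIVERED,	/* rxrpc_kernel_data_delivered() */
	DISPOSE_FREE_SKB	/* rxrpc_kernel_free_skb() */
};

struct handling {
	enum probe probe;
	enum disposal disposal;
};

/* Map a message type to the helper and disposal function for it. */
static struct handling plan_for(const struct mock_skb *skb)
{
	/* Non-data messages are handed to rxrpc_kernel_free_skb(). */
	struct handling h = { PROBE_NONE, DISPOSE_FREE_SKB };

	assert(skb != NULL);
	switch (skb->mark) {
	case MOCK_MARK_DATA:
		h.probe = PROBE_IS_DATA_LAST;
		h.disposal = DISPOSE_DATA_DELIVERED;
		break;
	case MOCK_MARK_REMOTE_ABORT:
		h.probe = PROBE_GET_ABORT_CODE;
		break;
	case MOCK_MARK_NET_ERROR:
	case MOCK_MARK_LOCAL_ERROR:
		h.probe = PROBE_GET_ERROR_NUMBER;
		break;
	case MOCK_MARK_NEW_CALL:
		h.probe = PROBE_ACCEPT_CALL;
		break;
	default:	/* final ACK or busy: nothing further to probe */
		break;
	}
	return h;
}
```

In a real interceptor the same switch would be made on skb->mark, with the
calls to the rxrpc_kernel_*() functions taking the place of the enumerators.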