Diffstat (limited to 'Documentation/infiniband')
-rw-r--r-- | Documentation/infiniband/user_verbs.txt | 69 |
1 files changed, 69 insertions, 0 deletions
diff --git a/Documentation/infiniband/user_verbs.txt b/Documentation/infiniband/user_verbs.txt
new file mode 100644
index 000000000000..f847501e50b5
--- /dev/null
+++ b/Documentation/infiniband/user_verbs.txt
@@ -0,0 +1,69 @@
1 | USERSPACE VERBS ACCESS | ||
2 | |||
3 | The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS, | ||
4 | enables direct userspace access to IB hardware via "verbs," as | ||
5 | described in chapter 11 of the InfiniBand Architecture Specification. | ||
6 | |||
7 | To use the verbs, the libibverbs library, available from | ||
8 | <http://openib.org/>, is required. libibverbs contains a | ||
9 | device-independent API for using the ib_uverbs interface. | ||
10 | libibverbs also requires the appropriate device-dependent kernel and | ||
11 | userspace drivers for your InfiniBand hardware. For example, to use | ||
12 | a Mellanox HCA, you will need the ib_mthca kernel module and the | ||
13 | libmthca userspace driver to be installed. | ||
14 | |||
15 | User-kernel communication | ||
16 | |||
17 | Userspace communicates with the kernel for slow-path resource | ||
18 | management operations via the /dev/infiniband/uverbsN character | ||
19 | devices. Fast-path operations are typically performed by writing | ||
20 | directly to hardware registers mmap()ed into userspace, with no | ||
21 | system call or context switch into the kernel. | ||
22 | |||
23 | Commands are sent to the kernel via write()s on these device files. | ||
24 | The ABI is defined in drivers/infiniband/include/ib_user_verbs.h. | ||
25 | The structs for commands that require a response from the kernel | ||
26 | contain a 64-bit field used to pass a pointer to an output buffer. | ||
27 | Status is returned to userspace as the return value of the write() | ||
28 | system call. | ||
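
As a rough illustration of the command format described above, the
following sketch packs a header and a command carrying a 64-bit
response pointer into a buffer suitable for write(). All struct and
function names here (example_cmd_hdr, example_query_cmd,
example_query_resp, example_build_query) are hypothetical and chosen
for illustration only; the real layouts are those in the ABI header
named above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout, for illustration only; the real structs
 * are defined in the ABI header named in the text. */
struct example_cmd_hdr {
    uint32_t command;    /* which operation to perform */
    uint16_t in_words;   /* total command size in 32-bit words */
    uint16_t out_words;  /* response size in 32-bit words */
};

struct example_query_cmd {
    uint64_t response;   /* userspace pointer to the output buffer */
};

struct example_query_resp {
    uint32_t value;
    uint32_t reserved;
};

/* Pack header + command into buf; returns the byte count for write(). */
size_t example_build_query(void *buf, size_t len, uint32_t opcode,
                           struct example_query_resp *resp)
{
    struct example_cmd_hdr hdr;
    struct example_query_cmd cmd;
    size_t total = sizeof hdr + sizeof cmd;

    assert(buf != NULL && len >= total);

    hdr.command = opcode;
    hdr.in_words = (uint16_t)(total / 4);
    hdr.out_words = (uint16_t)(sizeof *resp / 4);
    /* A fixed 64-bit field holds the pointer for both 32-bit and
     * 64-bit userspace, so the struct layout does not change. */
    cmd.response = (uint64_t)(uintptr_t)resp;

    memcpy(buf, &hdr, sizeof hdr);
    memcpy((char *)buf + sizeof hdr, &cmd, sizeof cmd);
    return total;
}
```

A caller would pass the filled buffer and the returned length to
write() on the uverbs device file and take the status from write()'s
return value.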
29 | |||
30 | Resource management | ||
31 | |||
32 | Since creation and destruction of all IB resources is done by | ||
33 | commands passed through a file descriptor, the kernel can keep track | ||
34 | of which resources are attached to a given userspace context. The | ||
35 | ib_uverbs module maintains idr tables that are used to translate | ||
36 | between kernel pointers and opaque userspace handles, so that kernel | ||
37 | pointers are never exposed to userspace and userspace cannot trick | ||
38 | the kernel into following a bogus pointer. | ||
39 | |||
40 | This also allows the kernel to clean up when a process exits and | ||
41 | prevent one process from touching another process's resources. | ||
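
The handle translation described in this section can be sketched in
plain C as follows. This is a toy, fixed-size table and not the
kernel's idr API; the names (example_handle_table,
example_handle_alloc, example_handle_lookup,
example_handle_release_ctx) and the integer context id are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_HANDLES 8

/* Toy stand-in for an idr table: the slot index is the opaque handle. */
struct example_handle_table {
    void *obj[EXAMPLE_MAX_HANDLES];
    int owner[EXAMPLE_MAX_HANDLES];  /* context that created the object */
};

/* Store obj for the given context; returns a handle, or -1 if full. */
int example_handle_alloc(struct example_handle_table *t, void *obj, int ctx)
{
    assert(t != NULL && obj != NULL);
    for (int i = 0; i < EXAMPLE_MAX_HANDLES; i++) {
        if (t->obj[i] == NULL) {
            t->obj[i] = obj;
            t->owner[i] = ctx;
            return i;
        }
    }
    return -1;
}

/* Translate a handle to a pointer; bogus handles and handles owned by
 * another context yield NULL instead of a dereferenceable pointer. */
void *example_handle_lookup(const struct example_handle_table *t,
                            int handle, int ctx)
{
    if (handle < 0 || handle >= EXAMPLE_MAX_HANDLES)
        return NULL;
    if (t->obj[handle] == NULL || t->owner[handle] != ctx)
        return NULL;
    return t->obj[handle];
}

/* Drop every entry a context owns, as cleanup on process exit would.
 * Returns the number of entries released. */
int example_handle_release_ctx(struct example_handle_table *t, int ctx)
{
    int freed = 0;
    for (int i = 0; i < EXAMPLE_MAX_HANDLES; i++) {
        if (t->obj[i] != NULL && t->owner[i] == ctx) {
            t->obj[i] = NULL;
            freed++;
        }
    }
    return freed;
}
```

Because userspace only ever sees the small integer handle, an invalid
value fails the lookup rather than being followed as a pointer.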
42 | |||
43 | Memory pinning | ||
44 | |||
45 | Direct userspace I/O requires that memory regions that are potential | ||
46 | I/O targets be kept resident at the same physical address. The | ||
47 | ib_uverbs module manages pinning and unpinning memory regions via | ||
48 | get_user_pages() and put_page() calls. It also accounts for the | ||
49 | amount of memory pinned in the process's locked_vm, and checks that | ||
50 | unprivileged processes do not exceed their RLIMIT_MEMLOCK limit. | ||
51 | |||
52 | Pages that are pinned multiple times are counted each time they are | ||
53 | pinned, so the value of locked_vm may be an overestimate of the | ||
54 | number of pages pinned by a process. | ||
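
The accounting behaviour described above, including the
overestimate for repeated pins, can be modelled with a small sketch.
It only mimics the bookkeeping and does not pin any memory; the
struct and function names (example_mm, example_npages,
example_account_pin, example_account_unpin) are hypothetical, and the
limit is assumed to have already been converted from bytes to pages.

```c
#include <assert.h>

/* Hypothetical per-process accounting state; names are illustrative. */
struct example_mm {
    unsigned long locked_vm;   /* pages currently accounted as pinned */
    unsigned long lock_limit;  /* RLIMIT_MEMLOCK converted to pages */
    int privileged;            /* nonzero: the limit is not enforced */
};

/* Count the pages spanned by the byte range [addr, addr + len). */
unsigned long example_npages(unsigned long addr, unsigned long len,
                             unsigned long page_size)
{
    assert(page_size != 0 && len != 0);
    return (addr + len - 1) / page_size - addr / page_size + 1;
}

/* Charge npages against the limit; returns 0 on success, -1 if the
 * charge would push an unprivileged process over its limit. */
int example_account_pin(struct example_mm *mm, unsigned long npages)
{
    if (!mm->privileged && mm->locked_vm + npages > mm->lock_limit)
        return -1;
    mm->locked_vm += npages;   /* repeated pins are not de-duplicated */
    return 0;
}

/* Undo a previous charge when the region is unpinned. */
void example_account_unpin(struct example_mm *mm, unsigned long npages)
{
    assert(mm->locked_vm >= npages);
    mm->locked_vm -= npages;
}
```

In this model, pinning the same two-page region twice charges four
pages, which is why the counter can exceed the number of distinct
pages actually pinned.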
55 | |||
56 | /dev files | ||
57 | |||
58 | To create the appropriate character device files automatically with | ||
59 | udev, a rule like | ||
60 | |||
61 | KERNEL="uverbs*", NAME="infiniband/%k" | ||
62 | |||
63 | can be used. This will create device nodes named | ||
64 | |||
65 | /dev/infiniband/uverbs0 | ||
66 | |||
67 | and so on. Since the InfiniBand userspace verbs should be safe for | ||
68 | use by non-privileged processes, it may be useful to add an | ||
69 | appropriate MODE or GROUP to the udev rule. | ||
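
For instance, a rule along the following lines would restrict the
device nodes to one group. It reuses the rule syntax shown above; the
group name "rdma" and the mode 0660 are assumptions chosen for
illustration, not values taken from this patch.

```
KERNEL="uverbs*", NAME="infiniband/%k", MODE="0660", GROUP="rdma"
```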