Diffstat (limited to 'Documentation/infiniband')
-rw-r--r-- Documentation/infiniband/ipoib.txt 56
-rw-r--r-- Documentation/infiniband/sysfs.txt 66
-rw-r--r-- Documentation/infiniband/user_mad.txt 99
3 files changed, 221 insertions, 0 deletions
diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
new file mode 100644
index 000000000000..18e184b56040
--- /dev/null
+++ b/Documentation/infiniband/ipoib.txt
@@ -0,0 +1,56 @@
1 | IP OVER INFINIBAND | ||
2 | |||
3 | The ib_ipoib driver is an implementation of the IP over InfiniBand | ||
4 | protocol as specified by the latest Internet-Drafts issued by the | ||
5 | IETF ipoib working group. It is a "native" implementation in the | ||
6 | sense of setting the interface type to ARPHRD_INFINIBAND and the | ||
7 | hardware address length to 20 (earlier proprietary implementations | ||
8 | masqueraded to the kernel as ethernet interfaces). | ||
9 | |||
10 | Partitions and P_Keys | ||
11 | |||
12 | When the IPoIB driver is loaded, it creates one interface for each | ||
13 | port using the P_Key at index 0. To create an interface with a | ||
14 | different P_Key, write the desired P_Key into the main interface's | ||
15 | /sys/class/net/<intf name>/create_child file. For example: | ||
16 | |||
17 | echo 0x8001 > /sys/class/net/ib0/create_child | ||
18 | |||
19 | This will create an interface named ib0.8001 with P_Key 0x8001. To | ||
20 | remove a subinterface, use the "delete_child" file: | ||
21 | |||
22 | echo 0x8001 > /sys/class/net/ib0/delete_child | ||
23 | |||
24 | The P_Key for any interface is given by the "pkey" file, and the | ||
25 | main interface for a subinterface is given by the "parent" file. | ||
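The naming rule shown in the example above (parent ib0 plus P_Key 0x8001 gives ib0.8001) can be sketched in C. This is a minimal illustration and not the driver's own code: the helper name ipoib_child_name is hypothetical, and the four-digit lowercase hex format is an assumption inferred from that single example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the subinterface name for a parent
 * interface and a 16-bit P_Key, following the ib0 + 0x8001 -> ib0.8001
 * example. The "%s.%04x" format is an assumption based on that example.
 * Returns 0 on success, -1 if the name does not fit in buf.
 */
static int ipoib_child_name(char *buf, size_t len,
                            const char *parent, unsigned int pkey)
{
    int n = snprintf(buf, len, "%s.%04x", parent, pkey & 0xffff);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

A caller could use the resulting name to locate /sys/class/net/<name>/pkey after writing the P_Key to create_child.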
26 | |||
27 | Debugging Information | ||
28 | |||
29 | By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set | ||
30 | to 'y', tracing messages are compiled into the driver. They are | ||
31 | turned on by setting the module parameters debug_level and | ||
32 | mcast_debug_level to 1. These parameters can be controlled at | ||
33 | runtime through files in /sys/module/ib_ipoib/. | ||
34 | |||
35 | CONFIG_INFINIBAND_IPOIB_DEBUG also enables the "ipoib_debugfs" | ||
36 | virtual filesystem. By mounting this filesystem, for example with | ||
37 | |||
38 | mkdir -p /ipoib_debugfs | ||
39 | mount -t ipoib_debugfs none /ipoib_debugfs | ||
40 | |||
41 | it is possible to get statistics about multicast groups from the | ||
42 | files /ipoib_debugfs/ib0_mcg and so on. | ||
43 | |||
44 | The performance impact of this option is negligible, so it | ||
45 | is safe to enable this option with debug_level set to 0 for normal | ||
46 | operation. | ||
47 | |||
48 | CONFIG_INFINIBAND_IPOIB_DEBUG_DATA enables even more debug output in | ||
49 | the data path when data_debug_level is set to 1. However, even with | ||
50 | the output disabled, enabling this configuration option will affect | ||
51 | performance, because it adds tests to the fast path. | ||
52 | |||
53 | References | ||
54 | |||
55 | IETF IP over InfiniBand (ipoib) Working Group | ||
56 | http://ietf.org/html.charters/ipoib-charter.html | ||
diff --git a/Documentation/infiniband/sysfs.txt b/Documentation/infiniband/sysfs.txt
new file mode 100644
index 000000000000..ddd519b72ee1
--- /dev/null
+++ b/Documentation/infiniband/sysfs.txt
@@ -0,0 +1,66 @@
1 | SYSFS FILES | ||
2 | |||
3 | For each InfiniBand device, the InfiniBand drivers create the | ||
4 | following files under /sys/class/infiniband/<device name>: | ||
5 | |||
6 | node_type - Node type (CA, switch or router) | ||
7 | node_guid - Node GUID | ||
8 | sys_image_guid - System image GUID | ||
9 | |||
10 | In addition, there is a "ports" subdirectory, with one subdirectory | ||
11 | for each port. For example, if mthca0 is a 2-port HCA, there will | ||
12 | be two directories: | ||
13 | |||
14 | /sys/class/infiniband/mthca0/ports/1 | ||
15 | /sys/class/infiniband/mthca0/ports/2 | ||
16 | |||
17 | (A switch will only have a single "0" subdirectory for switch port | ||
18 | 0; no subdirectory is created for normal switch ports.) | ||
19 | |||
20 | In each port subdirectory, the following files are created: | ||
21 | |||
22 | cap_mask - Port capability mask | ||
23 | lid - Port LID | ||
24 | lid_mask_count - Port LID mask count | ||
25 | rate - Port data rate (active width * active speed) | ||
26 | sm_lid - Subnet manager LID for port's subnet | ||
27 | sm_sl - Subnet manager SL for port's subnet | ||
28 | state - Port state (DOWN, INIT, ARMED, ACTIVE or ACTIVE_DEFER) | ||
29 | phys_state - Port physical state (Sleep, Polling, LinkUp, etc.) | ||
30 | |||
31 | There is also a "counters" subdirectory, with files | ||
32 | |||
33 | VL15_dropped | ||
34 | excessive_buffer_overrun_errors | ||
35 | link_downed | ||
36 | link_error_recovery | ||
37 | local_link_integrity_errors | ||
38 | port_rcv_constraint_errors | ||
39 | port_rcv_data | ||
40 | port_rcv_errors | ||
41 | port_rcv_packets | ||
42 | port_rcv_remote_physical_errors | ||
43 | port_rcv_switch_relay_errors | ||
44 | port_xmit_constraint_errors | ||
45 | port_xmit_data | ||
46 | port_xmit_discards | ||
47 | port_xmit_packets | ||
48 | symbol_error | ||
49 | |||
50 | Each of these files contains the corresponding value from the port's | ||
51 | Performance Management PortCounters attribute, as described in | ||
52 | section 16.1.3.5 of the InfiniBand Architecture Specification. | ||
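Reading one of the counter files from a program might look like the sketch below. It assumes each file holds a single unsigned decimal value in text form, which the text above does not state explicitly; the helper name read_counter is hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one counter file and parse it as an
 * unsigned decimal number. Assumes the file holds a single decimal
 * value in text form. Returns 0 on success, -1 on error.
 */
static int read_counter(const char *path, unsigned long long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%llu", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, passing "/sys/class/infiniband/mthca0/ports/1/counters/symbol_error" as path would fetch the symbol error count for port 1 of mthca0.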
53 | |||
54 | The "pkeys" and "gids" subdirectories contain one file for each | ||
55 | entry in the port's P_Key or GID table respectively. For example, | ||
56 | ports/1/pkeys/10 contains the value at index 10 in port 1's P_Key | ||
57 | table. | ||
58 | |||
59 | MTHCA | ||
60 | |||
61 | The Mellanox HCA driver also creates the files: | ||
62 | |||
63 | hw_rev - Hardware revision number | ||
64 | fw_ver - Firmware version | ||
65 | hca_type - HCA type: "MT23108", "MT25208 (MT23108 compat mode)", | ||
66 | or "MT25208" | ||
diff --git a/Documentation/infiniband/user_mad.txt b/Documentation/infiniband/user_mad.txt
new file mode 100644
index 000000000000..cae0c83f1ee9
--- /dev/null
+++ b/Documentation/infiniband/user_mad.txt
@@ -0,0 +1,99 @@
1 | USERSPACE MAD ACCESS | ||
2 | |||
3 | Device files | ||
4 | |||
5 | Each port of each InfiniBand device has a "umad" device and an | ||
6 | "issm" device attached. For example, a two-port HCA will have two | ||
7 | umad devices and two issm devices, while a switch will have one | ||
8 | device of each type (for switch port 0). | ||
9 | |||
10 | Creating MAD agents | ||
11 | |||
12 | A MAD agent can be created by filling in a struct ib_user_mad_reg_req | ||
13 | and then calling the IB_USER_MAD_REGISTER_AGENT ioctl on a file | ||
14 | descriptor for the appropriate device file. If the registration | ||
15 | request succeeds, a 32-bit id will be returned in the structure. | ||
16 | For example: | ||
17 | |||
18 | struct ib_user_mad_reg_req req = { /* ... */ }; | ||
19 | ret = ioctl(fd, IB_USER_MAD_REGISTER_AGENT, (char *) &req); | ||
20 | if (!ret) | ||
21 | my_agent = req.id; | ||
22 | else | ||
23 | perror("agent register"); | ||
24 | |||
25 | Agents can be unregistered with the IB_USER_MAD_UNREGISTER_AGENT | ||
26 | ioctl. Also, all agents registered through a file descriptor will | ||
27 | be unregistered when the descriptor is closed. | ||
28 | |||
29 | Receiving MADs | ||
30 | |||
31 | MADs are received using read(). The buffer passed to read() must be | ||
32 | large enough to hold at least one struct ib_user_mad. For example: | ||
33 | |||
34 | struct ib_user_mad mad; | ||
35 | ret = read(fd, &mad, sizeof mad); | ||
36 | if (ret != sizeof mad) | ||
37 | perror("read"); | ||
38 | |||
39 | In addition to the actual MAD contents, the other struct ib_user_mad | ||
40 | fields will be filled in with information on the received MAD. For | ||
41 | example, the remote LID will be in mad.lid. | ||
42 | |||
43 | If a send times out, a receive will be generated with mad.status set | ||
44 | to ETIMEDOUT. Otherwise, when a MAD has been successfully received, | ||
45 | mad.status will be 0. | ||
46 | |||
47 | poll()/select() may be used to wait until a MAD can be read. | ||
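Waiting for a MAD with poll() before calling read() could be sketched as follows. The helper name wait_readable and its return convention are illustrative assumptions; only the standard poll() call is used, and it works on any file descriptor, including one opened on a umad device.

```c
#include <poll.h>

/*
 * Hypothetical helper: wait up to timeout_ms milliseconds for fd to
 * become readable. Returns 1 if readable, 0 on timeout, -1 on error
 * (including an error or hangup condition reported without POLLIN).
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A receive loop would call wait_readable() first and only then read() a struct ib_user_mad, so that it can handle timeouts without blocking indefinitely.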
48 | |||
49 | Sending MADs | ||
50 | |||
51 | MADs are sent using write(). The agent ID for sending should be | ||
52 | filled into the id field of the MAD, the destination LID should be | ||
53 | filled into the lid field, and so on. For example: | ||
54 | |||
55 | struct ib_user_mad mad; | ||
56 | |||
57 | /* fill in mad.data */ | ||
58 | |||
59 | mad.id = my_agent; /* req.id from agent registration */ | ||
60 | mad.lid = my_dest; /* in network byte order... */ | ||
61 | /* etc. */ | ||
62 | |||
63 | ret = write(fd, &mad, sizeof mad); | ||
64 | if (ret != sizeof mad) | ||
65 | perror("write"); | ||
66 | |||
67 | Setting IsSM Capability Bit | ||
68 | |||
69 | To set the IsSM capability bit for a port, simply open the | ||
70 | corresponding issm device file. If the IsSM bit is already set, | ||
71 | then the open call will block until the bit is cleared (or return | ||
72 | immediately with errno set to EAGAIN if the O_NONBLOCK flag is | ||
73 | passed to open()). The IsSM bit will be cleared when the issm file | ||
74 | is closed. No read, write or other operations can be performed on | ||
75 | the issm file. | ||
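The non-blocking variant described above could be sketched like this. The helper and enum names are hypothetical, and the O_RDONLY access mode is an assumption, since the text does not say which access mode the issm file expects.

```c
#include <errno.h>
#include <fcntl.h>

enum issm_result { ISSM_SET, ISSM_BUSY, ISSM_ERROR };

/*
 * Hypothetical helper: try to set the IsSM bit without blocking.
 * On ISSM_SET, *fd_out holds the open descriptor; the bit stays set
 * until that descriptor is closed. ISSM_BUSY means open() failed with
 * EAGAIN, i.e. the bit is already set. O_RDONLY is an assumption.
 */
static enum issm_result issm_try_set(const char *issm_path, int *fd_out)
{
    int fd = open(issm_path, O_RDONLY | O_NONBLOCK);

    if (fd >= 0) {
        *fd_out = fd;
        return ISSM_SET;
    }
    return errno == EAGAIN ? ISSM_BUSY : ISSM_ERROR;
}
```

Returning the descriptor to the caller matters here because closing it is the only way to clear the bit again.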
76 | |||
77 | /dev files | ||
78 | |||
79 | To create the appropriate character device files automatically with | ||
80 | udev, a rule like | ||
81 | |||
82 | KERNEL=="umad*", NAME="infiniband/%k" | ||
83 | KERNEL=="issm*", NAME="infiniband/%k" | ||
84 | |||
85 | can be used. This will create device nodes named | ||
86 | |||
87 | /dev/infiniband/umad0 | ||
88 | /dev/infiniband/issm0 | ||
89 | |||
90 | for the first port, and so on. The InfiniBand device and port | ||
91 | associated with these devices can be determined from the files | ||
92 | |||
93 | /sys/class/infiniband_mad/umad0/ibdev | ||
94 | /sys/class/infiniband_mad/umad0/port | ||
95 | |||
96 | and | ||
97 | |||
98 | /sys/class/infiniband_mad/issm0/ibdev | ||
99 | /sys/class/infiniband_mad/issm0/port | ||