Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/00-INDEX | 3 | ||||
-rw-r--r-- | Documentation/networking/ip-sysctl.txt | 3 | ||||
-rw-r--r-- | Documentation/networking/l2tp.txt | 169 | ||||
-rw-r--r-- | Documentation/networking/multiqueue.txt | 111 | ||||
-rw-r--r-- | Documentation/networking/netdevices.txt | 38 | ||||
-rw-r--r-- | Documentation/networking/sk98lin.txt | 568 | ||||
-rw-r--r-- | Documentation/networking/spider_net.txt | 204 |
7 files changed, 517 insertions, 579 deletions
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 153d84d281e6..d63f480afb74 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX | |||
@@ -96,9 +96,6 @@ routing.txt | |||
96 | - the new routing mechanism | 96 | - the new routing mechanism |
97 | shaper.txt | 97 | shaper.txt |
98 | - info on the module that can shape/limit transmitted traffic. | 98 | - info on the module that can shape/limit transmitted traffic. |
99 | sk98lin.txt | ||
100 | - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit | ||
101 | Ethernet Adapter family driver info | ||
102 | skfp.txt | 99 | skfp.txt |
103 | - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. | 100 | - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. |
104 | smc9.txt | 101 | smc9.txt |
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 8f6067ea5e3e..32c2e9da5f3a 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt | |||
@@ -880,8 +880,7 @@ accept_redirects - BOOLEAN | |||
880 | accept_source_route - INTEGER | 880 | accept_source_route - INTEGER |
881 | Accept source routing (routing extension header). | 881 | Accept source routing (routing extension header). |
882 | 882 | ||
883 | > 0: Accept routing header. | 883 | >= 0: Accept only routing header type 2. |
884 | = 0: Accept only routing header type 2. | ||
885 | < 0: Do not accept routing header. | 884 | < 0: Do not accept routing header. |
886 | 885 | ||
887 | Default: 0 | 886 | Default: 0 |
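The new accept_source_route semantics in this hunk can be sketched in C as follows. The helper name rt_hdr_accepted() is hypothetical and is not a kernel function; the sketch only models the two documented cases.

```c
#include <stdbool.h>

#define RT_HDR_TYPE_2 2

/*
 * Hypothetical helper (not a kernel function): decide whether a routing
 * header of the given type is accepted under the documented setting.
 *   accept_source_route >= 0: accept only routing header type 2
 *   accept_source_route <  0: do not accept any routing header
 */
bool rt_hdr_accepted(int accept_source_route, int rt_hdr_type)
{
	if (accept_source_route < 0)
		return false;
	return rt_hdr_type == RT_HDR_TYPE_2;
}
```

With the default of 0, a type 2 header is accepted and every other type is rejected; a positive value no longer widens what is accepted.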
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt new file mode 100644 index 000000000000..2451f551c505 --- /dev/null +++ b/Documentation/networking/l2tp.txt | |||
@@ -0,0 +1,169 @@ | |||
1 | This brief document describes how to use the kernel's PPPoL2TP driver | ||
2 | to provide L2TP functionality. L2TP is a protocol that tunnels one or | ||
3 | more PPP sessions over a UDP tunnel. It is commonly used for VPNs | ||
4 | (L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP | ||
5 | network infrastructure. | ||
6 | |||
7 | Design | ||
8 | ====== | ||
9 | |||
10 | The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by | ||
11 | which PPP frames carried through an L2TP session are passed through | ||
12 | the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all | ||
13 | PPP interaction with the peer. PPP network interfaces are created for | ||
14 | each local PPP endpoint. | ||
15 | |||
16 | The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP | ||
17 | control and data frames. L2TP control frames carry messages between | ||
18 | L2TP clients/servers and are used to set up / tear down tunnels and | ||
19 | sessions. An L2TP client or server is implemented in userspace and | ||
20 | will use a regular UDP socket per tunnel. L2TP data frames carry PPP | ||
21 | frames, which may be PPP control or PPP data. The kernel's PPP | ||
22 | subsystem arranges for PPP control frames to be delivered to pppd, | ||
23 | while data frames are forwarded as usual. | ||
24 | |||
25 | Each tunnel and session within a tunnel is assigned a unique tunnel_id | ||
26 | and session_id. These ids are carried in the L2TP header of every | ||
27 | control and data packet. The pppol2tp driver uses them to look up | ||
28 | internal tunnel and/or session contexts. Zero tunnel / session ids are | ||
29 | treated specially - zero ids are never assigned to tunnels or sessions | ||
30 | in the network. In the driver, the tunnel context keeps a pointer to | ||
31 | the tunnel UDP socket. The session context keeps a pointer to the | ||
32 | PPPoL2TP socket, as well as other data that lets the driver interface | ||
33 | to the kernel PPP subsystem. | ||
34 | |||
35 | Note that the pppol2tp kernel driver handles only L2TP data frames; | ||
36 | L2TP control frames are simply passed up to userspace in the UDP | ||
37 | tunnel socket. The kernel handles all datapath aspects of the | ||
38 | protocol, including data packet resequencing (if enabled). | ||
39 | |||
40 | There are a number of requirements on the userspace L2TP daemon in | ||
41 | order to use the pppol2tp driver. | ||
42 | |||
43 | 1. Use a UDP socket per tunnel. | ||
44 | |||
45 | 2. Create a single PPPoL2TP socket per tunnel bound to a special null | ||
46 | session id. This is used only for communicating with the driver but | ||
47 | must remain open while the tunnel is active. Opening this tunnel | ||
48 | management socket causes the driver to mark the tunnel socket as an | ||
49 | L2TP UDP encapsulation socket and flags it for use by the | ||
50 | referenced tunnel id. This hooks up the UDP receive path via | ||
51 | udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed | ||
52 | in this special PPPoX socket. | ||
53 | |||
54 | 3. Create a PPPoL2TP socket per L2TP session. This is typically done | ||
55 | by starting pppd with the pppol2tp plugin and appropriate | ||
56 | arguments. A PPPoL2TP tunnel management socket (Step 2) must be | ||
57 | created before the first PPPoL2TP session socket is created. | ||
58 | |||
59 | When creating PPPoL2TP sockets, the application provides information | ||
60 | to the driver about the socket in a socket connect() call. Source and | ||
61 | destination tunnel and session ids are provided, as well as the file | ||
62 | descriptor of a UDP socket. See struct pppol2tp_addr in | ||
63 | include/linux/if_ppp.h. Note that zero tunnel / session ids are | ||
64 | treated specially. When creating the per-tunnel PPPoL2TP management | ||
65 | socket in Step 2 above, zero source and destination session ids are | ||
66 | specified, which tells the driver to prepare the supplied UDP file | ||
67 | descriptor for use as an L2TP tunnel socket. | ||
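As a rough illustration of the zero-id rule described above, the following C sketch tells a tunnel management socket from a session socket by its ids. The struct l2tp_ids type and the l2tp_is_mgmt_socket() helper are hypothetical stand-ins for the id fields of struct pppol2tp_addr; they are not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the id fields of struct pppol2tp_addr. */
struct l2tp_ids {
	uint16_t s_tunnel, s_session;	/* source (local) ids */
	uint16_t d_tunnel, d_session;	/* destination (peer) ids */
};

/*
 * Hypothetical helper: zero source and destination session ids mark the
 * per-tunnel management socket; anything else is a session socket.
 */
bool l2tp_is_mgmt_socket(const struct l2tp_ids *ids)
{
	return ids->s_session == 0 && ids->d_session == 0;
}
```

An address with s_tunnel set and both session ids left at zero is classified as the management socket, which matches the values used in Step 1 of the sample userspace code.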
68 | |||
69 | Userspace may control behavior of the tunnel or session using | ||
70 | setsockopt and ioctl on the PPPoX socket. The following socket | ||
71 | options are supported:- | ||
72 | |||
73 | DEBUG - bitmask of debug message categories. See below. | ||
74 | SENDSEQ - 0 => don't send packets with sequence numbers | ||
75 | 1 => send packets with sequence numbers | ||
76 | RECVSEQ - 0 => receive packet sequence numbers are optional | ||
77 | 1 => drop receive packets without sequence numbers | ||
78 | LNSMODE - 0 => act as LAC. | ||
79 | 1 => act as LNS. | ||
80 | REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder. | ||
81 | |||
82 | Only the DEBUG option is supported by the special tunnel management | ||
83 | PPPoX socket. | ||
84 | |||
85 | In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided | ||
86 | to retrieve tunnel and session statistics from the kernel using the | ||
87 | PPPoX socket of the appropriate tunnel or session. | ||
88 | |||
89 | Debugging | ||
90 | ========= | ||
91 | |||
92 | The driver supports a flexible debug scheme where kernel trace | ||
93 | messages may be optionally enabled per tunnel and per session. Care is | ||
94 | needed when debugging a live system since the messages are not | ||
95 | rate-limited and a busy system could be swamped. Userspace uses | ||
96 | setsockopt on the PPPoX socket to set a debug mask. | ||
97 | |||
98 | The following debug mask bits are available: | ||
99 | |||
100 | PPPOL2TP_MSG_DEBUG verbose debug (if compiled in) | ||
101 | PPPOL2TP_MSG_CONTROL userspace - kernel interface | ||
102 | PPPOL2TP_MSG_SEQ sequence numbers handling | ||
103 | PPPOL2TP_MSG_DATA data packets | ||
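A debug mask is built by OR-ing the bits listed above. The sketch below defines the bits locally so that it is self-contained; the numeric values are an assumption meant to mirror the driver's header, and l2tp_msg_enabled() is a hypothetical helper.

```c
/* Assumed bit values; the authoritative definitions live in the kernel header. */
enum {
	PPPOL2TP_MSG_DEBUG   = (1 << 0),	/* verbose debug (if compiled in) */
	PPPOL2TP_MSG_CONTROL = (1 << 1),	/* userspace - kernel interface */
	PPPOL2TP_MSG_SEQ     = (1 << 2),	/* sequence numbers handling */
	PPPOL2TP_MSG_DATA    = (1 << 3),	/* data packets */
};

/* Hypothetical helper: is a given message category enabled in the mask? */
int l2tp_msg_enabled(int debug_mask, int category)
{
	return (debug_mask & category) != 0;
}
```

A mask of PPPOL2TP_MSG_CONTROL | PPPOL2TP_MSG_SEQ enables control and sequence tracing while leaving the per-packet data messages off, which is the safer choice on a busy system. The mask would then be passed to setsockopt on the PPPoX socket, presumably as the DEBUG option (PPPOL2TP_SO_DEBUG at level SOL_PPPOL2TP).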
104 | |||
105 | Sample Userspace Code | ||
106 | ===================== | ||
107 | |||
108 | 1. Create tunnel management PPPoX socket | ||
109 | |||
110 | kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); | ||
111 | if (kernel_fd >= 0) { | ||
112 | struct sockaddr_pppol2tp sax; | ||
113 | struct sockaddr_in const *peer_addr; | ||
114 | |||
115 | peer_addr = l2tp_tunnel_get_peer_addr(tunnel); | ||
116 | memset(&sax, 0, sizeof(sax)); | ||
117 | sax.sa_family = AF_PPPOX; | ||
118 | sax.sa_protocol = PX_PROTO_OL2TP; | ||
119 | sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */ | ||
120 | sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr; | ||
121 | sax.pppol2tp.addr.sin_port = peer_addr->sin_port; | ||
122 | sax.pppol2tp.addr.sin_family = AF_INET; | ||
123 | sax.pppol2tp.s_tunnel = tunnel_id; | ||
124 | sax.pppol2tp.s_session = 0; /* special case: mgmt socket */ | ||
125 | sax.pppol2tp.d_tunnel = 0; | ||
126 | sax.pppol2tp.d_session = 0; /* special case: mgmt socket */ | ||
127 | |||
128 | if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) { | ||
129 | perror("connect failed"); | ||
130 | result = -errno; | ||
131 | goto err; | ||
132 | } | ||
133 | } | ||
134 | |||
135 | 2. Create session PPPoX data socket | ||
136 | |||
137 | struct sockaddr_pppol2tp sax; | ||
138 | int fd; | ||
139 | |||
140 | /* Note, the target socket must be bound already, else it will not be ready */ | ||
141 | sax.sa_family = AF_PPPOX; | ||
142 | sax.sa_protocol = PX_PROTO_OL2TP; | ||
143 | sax.pppol2tp.fd = tunnel_fd; | ||
144 | sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr; | ||
145 | sax.pppol2tp.addr.sin_port = addr->sin_port; | ||
146 | sax.pppol2tp.addr.sin_family = AF_INET; | ||
147 | sax.pppol2tp.s_tunnel = tunnel_id; | ||
148 | sax.pppol2tp.s_session = session_id; | ||
149 | sax.pppol2tp.d_tunnel = peer_tunnel_id; | ||
150 | sax.pppol2tp.d_session = peer_session_id; | ||
151 | |||
152 | /* session_fd is the fd of the session's PPPoL2TP socket. | ||
153 | * tunnel_fd is the fd of the tunnel UDP socket. | ||
154 | */ | ||
155 | fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax)); | ||
156 | if (fd < 0 ) { | ||
157 | return -errno; | ||
158 | } | ||
159 | return 0; | ||
160 | |||
161 | Miscellaneous | ||
162 | ============= | ||
163 | |||
164 | The PPPoL2TP driver was developed as part of the OpenL2TP project by | ||
165 | Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server, | ||
166 | designed from the ground up to have the L2TP datapath in the | ||
167 | kernel. The project also implemented the pppol2tp plugin for pppd | ||
168 | which allows pppd to use the kernel driver. Details can be found at | ||
169 | http://openl2tp.sourceforge.net. | ||
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt new file mode 100644 index 000000000000..00b60cce2224 --- /dev/null +++ b/Documentation/networking/multiqueue.txt | |||
@@ -0,0 +1,111 @@ | |||
1 | |||
2 | HOWTO for multiqueue network device support | ||
3 | =========================================== | ||
4 | |||
5 | Section 1: Base driver requirements for implementing multiqueue support | ||
6 | Section 2: Qdisc support for multiqueue devices | ||
7 | Section 3: Brief howto using PRIO or RR for multiqueue devices | ||
8 | |||
9 | |||
10 | Intro: Kernel support for multiqueue devices | ||
11 | --------------------------------------------------------- | ||
12 | |||
13 | Kernel support for multiqueue devices is only an API that is presented to the | ||
14 | netdevice layer for base drivers to implement. This feature is part of the | ||
15 | core networking stack, and all network devices will be running on the | ||
16 | multiqueue-aware stack. If a base driver only has one queue, then these | ||
17 | changes are transparent to that driver. | ||
18 | |||
19 | |||
20 | Section 1: Base driver requirements for implementing multiqueue support | ||
21 | ----------------------------------------------------------------------- | ||
22 | |||
23 | Base drivers are required to use the new alloc_etherdev_mq() or | ||
24 | alloc_netdev_mq() functions to allocate the subqueues for the device. The | ||
25 | underlying kernel API will take care of the allocation and deallocation of | ||
26 | the subqueue memory, as well as netdev configuration of where the queues | ||
27 | exist in memory. | ||
28 | |||
29 | The base driver will also need to manage the queues as it does the global | ||
30 | netdev->queue_lock today. Therefore base drivers should use the | ||
31 | netif_{start|stop|wake}_subqueue() functions to manage each queue while the | ||
32 | device is still operational. netdev->queue_lock is still used when the device | ||
33 | comes online or when it's completely shut down (unregister_netdev(), etc.). | ||
34 | |||
35 | Finally, the base driver should indicate that it is a multiqueue device. The | ||
36 | feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features | ||
37 | bitmap on device initialization. Below is an example from e1000: | ||
38 | |||
39 | #ifdef CONFIG_E1000_MQ | ||
40 | if ( (adapter->hw.mac.type == e1000_82571) || | ||
41 | (adapter->hw.mac.type == e1000_82572) || | ||
42 | (adapter->hw.mac.type == e1000_80003es2lan)) | ||
43 | netdev->features |= NETIF_F_MULTI_QUEUE; | ||
44 | #endif | ||
45 | |||
46 | |||
47 | Section 2: Qdisc support for multiqueue devices | ||
48 | ----------------------------------------------- | ||
49 | |||
50 | Currently two qdiscs support multiqueue devices: a new round-robin qdisc, | ||
51 | sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to | ||
52 | bands and queues, and will store the queue mapping into skb->queue_mapping. | ||
53 | Use this field in the base driver to determine which queue to send the skb | ||
54 | to. | ||
55 | |||
56 | sch_rr has been added for hardware that doesn't want scheduling policies from | ||
57 | software, so it's a straight round-robin qdisc. It uses the same syntax and | ||
58 | classification priomap that sch_prio uses, so it should be intuitive to | ||
59 | configure for people who've used sch_prio. | ||
60 | |||
61 | The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been | ||
62 | built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of | ||
63 | bands requested is equal to the number of queues on the hardware. If they | ||
64 | are equal, it sets a one-to-one mapping up between the queues and bands. If | ||
65 | they're not equal, it will not load the qdisc. This is the same behavior | ||
66 | for RR. Once the association is made, any skb that is classified will have | ||
67 | skb->queue_mapping set, which will allow the driver to properly queue skb's | ||
68 | to multiple queues. | ||
69 | |||
70 | |||
71 | Section 3: Brief howto using PRIO and RR for multiqueue devices | ||
72 | --------------------------------------------------------------- | ||
73 | |||
74 | The userspace command 'tc,' part of the iproute2 package, is used to configure | ||
75 | qdiscs. To add the PRIO qdisc to your network device, assuming the device is | ||
76 | called eth0, run the following command: | ||
77 | |||
78 | # tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue | ||
79 | |||
80 | This will create 4 bands, 0 being highest priority, and associate those bands | ||
81 | to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping | ||
82 | would look like: | ||
83 | |||
84 | band 0 => queue 0 | ||
85 | band 1 => queue 1 | ||
86 | band 2 => queue 2 | ||
87 | band 3 => queue 3 | ||
88 | |||
89 | Traffic will begin flowing through each queue if your TOS values are assigning | ||
90 | traffic across the various bands. For example, ssh traffic will always try to | ||
91 | go out band 0 based on TOS -> Linux priority conversion (realtime traffic), | ||
92 | so it will be sent out queue 0. ICMP traffic (pings) falls into the "normal" | ||
93 | traffic classification, which is band 1. Therefore pings will be sent out | ||
94 | queue 1 on the NIC. | ||
95 | |||
96 | Note the use of the multiqueue keyword. This is only in versions of iproute2 | ||
97 | that support multiqueue networking devices; if this is omitted when loading | ||
98 | a qdisc onto a multiqueue device, the qdisc will load and operate the same | ||
99 | as if it were loaded onto a single-queue device (i.e. - sends all traffic to | ||
100 | queue 0). | ||
101 | |||
102 | An alternative way to allocate bands is to use the | ||
103 | multiqueue option and specify 0 bands. If this is the case, the qdisc will | ||
104 | allocate the number of bands to equal the number of queues that the device | ||
105 | reports, and bring the qdisc online. | ||
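The band/queue rules described in Sections 2 and 3 can be modelled with a small C sketch. Both function names are hypothetical and this is not the sch_prio or sch_rr source; it only restates the documented behavior: 0 bands means "use the device's queue count", and any other mismatch means the qdisc does not load.

```c
/*
 * Hypothetical model of the documented check. Returns the number of
 * bands the qdisc would use, or -1 if it would refuse to load.
 */
int mq_resolve_bands(int requested_bands, int hw_queues)
{
	if (requested_bands == 0)
		return hw_queues;	/* 0 bands: size to the device's queues */
	if (requested_bands != hw_queues)
		return -1;		/* mismatch: qdisc is not loaded */
	return requested_bands;		/* one-to-one band => queue mapping */
}

/* With a one-to-one mapping, band N is served by hardware queue N. */
int mq_band_to_queue(int band)
{
	return band;
}
```

On a device with 4 Tx queues, requesting 4 bands or 0 bands both yield 4 bands, while requesting 3 bands fails.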
106 | |||
107 | The behavior of tc filters remains the same, where it will override TOS priority | ||
108 | classification. | ||
109 | |||
110 | |||
111 | Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> | ||
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index ce1361f95243..37869295fc70 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt | |||
@@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If | |||
20 | separately allocated data is attached to the network device | 20 | separately allocated data is attached to the network device |
21 | (dev->priv) then it is up to the module exit handler to free that. | 21 | (dev->priv) then it is up to the module exit handler to free that. |
22 | 22 | ||
23 | MTU | ||
24 | === | ||
25 | Each network device has a Maximum Transfer Unit. The MTU does not | ||
26 | include any link layer protocol overhead. Upper layer protocols must | ||
27 | not pass a socket buffer (skb) to a device to transmit with more data | ||
28 | than the MTU. Because link layer header overhead is excluded, for | ||
29 | example on Ethernet, if the standard MTU of 1500 bytes is used, the | ||
30 | actual skb will contain up to 1514 bytes because of the Ethernet | ||
31 | header. Devices should allow for the 4 byte VLAN header as well. | ||
32 | |||
33 | Segmentation Offload (GSO, TSO) is an exception to this rule. The | ||
34 | upper layer protocol may pass a large socket buffer to the device | ||
35 | transmit routine, and the device will break that up into separate | ||
36 | packets based on the current MTU. | ||
37 | |||
38 | MTU is symmetrical and applies both to receive and transmit. A device | ||
39 | must be able to receive at least the maximum size packet allowed by | ||
40 | the MTU. A network device may use the MTU as a mechanism to size receive | ||
41 | buffers, but the device should allow packets with VLAN header. With | ||
42 | standard Ethernet mtu of 1500 bytes, the device should allow up to | ||
43 | 1518 byte packets (1500 + 14 header + 4 tag). The device may | ||
44 | drop, truncate, or pass up oversize packets, but dropping oversize | ||
45 | packets is preferred. | ||
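The frame-size arithmetic in this section can be written out as a short C sketch. The constant and function names are illustrative, chosen for this example rather than taken from kernel headers.

```c
#define ETH_HEADER_LEN	14	/* destination MAC + source MAC + ethertype */
#define VLAN_TAG_LEN	4	/* 802.1Q tag */

/* Largest skb an upper layer may hand to an Ethernet device for a given MTU. */
unsigned int eth_max_tx_skb_len(unsigned int mtu)
{
	return mtu + ETH_HEADER_LEN;
}

/* Smallest receive size a device should allow, including a VLAN tag. */
unsigned int eth_min_rx_frame_len(unsigned int mtu)
{
	return mtu + ETH_HEADER_LEN + VLAN_TAG_LEN;
}
```

For the standard MTU of 1500 bytes this gives 1514 bytes on transmit and 1518 bytes on receive, matching the figures quoted in the text.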
46 | |||
23 | 47 | ||
24 | struct net_device synchronization rules | 48 | struct net_device synchronization rules |
25 | ======================================= | 49 | ======================================= |
@@ -43,16 +67,17 @@ dev->get_stats: | |||
43 | 67 | ||
44 | dev->hard_start_xmit: | 68 | dev->hard_start_xmit: |
45 | Synchronization: netif_tx_lock spinlock. | 69 | Synchronization: netif_tx_lock spinlock. |
70 | |||
46 | When the driver sets NETIF_F_LLTX in dev->features this will be | 71 | When the driver sets NETIF_F_LLTX in dev->features this will be |
47 | called without holding netif_tx_lock. In this case the driver | 72 | called without holding netif_tx_lock. In this case the driver |
48 | has to lock by itself when needed. It is recommended to use a try lock | 73 | has to lock by itself when needed. It is recommended to use a try lock |
49 | for this and return -1 when the spin lock fails. | 74 | for this and return NETDEV_TX_LOCKED when the spin lock fails. |
50 | The locking there should also properly protect against | 75 | The locking there should also properly protect against |
51 | set_multicast_list | 76 | set_multicast_list. |
52 | Context: Process with BHs disabled or BH (timer). | 77 | |
53 | Notes: netif_queue_stopped() is guaranteed false | 78 | Context: Process with BHs disabled or BH (timer), |
54 | Interrupts must be enabled when calling hard_start_xmit. | 79 | will be called with interrupts disabled by netconsole. |
55 | (Interrupts must also be enabled when enabling the BH handler.) | 80 | |
56 | Return codes: | 81 | Return codes: |
57 | o NETDEV_TX_OK everything ok. | 82 | o NETDEV_TX_OK everything ok. |
58 | o NETDEV_TX_BUSY Cannot transmit packet, try later | 83 | o NETDEV_TX_BUSY Cannot transmit packet, try later |
@@ -74,4 +99,5 @@ dev->poll: | |||
74 | Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See | 99 | Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See |
75 | dev_close code and comments in net/core/dev.c for more info. | 100 | dev_close code and comments in net/core/dev.c for more info. |
76 | Context: softirq | 101 | Context: softirq |
102 | will be called with interrupts disabled by netconsole. | ||
77 | 103 | ||
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt deleted file mode 100644 index 8590a954df1d..000000000000 --- a/Documentation/networking/sk98lin.txt +++ /dev/null | |||
@@ -1,568 +0,0 @@ | |||
1 | (C)Copyright 1999-2004 Marvell(R). | ||
2 | All rights reserved | ||
3 | =========================================================================== | ||
4 | |||
5 | sk98lin.txt created 13-Feb-2004 | ||
6 | |||
7 | Readme File for sk98lin v6.23 | ||
8 | Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX | ||
9 | |||
10 | This file contains | ||
11 | 1 Overview | ||
12 | 2 Required Files | ||
13 | 3 Installation | ||
14 | 3.1 Driver Installation | ||
15 | 3.2 Inclusion of adapter at system start | ||
16 | 4 Driver Parameters | ||
17 | 4.1 Per-Port Parameters | ||
18 | 4.2 Adapter Parameters | ||
19 | 5 Large Frame Support | ||
20 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) | ||
21 | 7 Troubleshooting | ||
22 | |||
23 | =========================================================================== | ||
24 | |||
25 | |||
26 | 1 Overview | ||
27 | =========== | ||
28 | |||
29 | The sk98lin driver supports the Marvell Yukon and SysKonnect | ||
30 | SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has | ||
31 | been tested with Linux on Intel/x86 machines. | ||
32 | *** | ||
33 | |||
34 | |||
35 | 2 Required Files | ||
36 | ================= | ||
37 | |||
38 | The linux kernel source. | ||
39 | No additional files required. | ||
40 | *** | ||
41 | |||
42 | |||
43 | 3 Installation | ||
44 | =============== | ||
45 | |||
46 | It is recommended to download the latest version of the driver from the | ||
47 | SysKonnect web site www.syskonnect.com. If you have downloaded the latest | ||
48 | driver, the Linux kernel has to be patched before the driver can be | ||
49 | installed. For details on how to patch a Linux kernel, refer to the | ||
50 | patch.txt file. | ||
51 | |||
52 | 3.1 Driver Installation | ||
53 | ------------------------ | ||
54 | |||
55 | The following steps describe the actions that are required to install | ||
56 | the driver and to start it manually. These steps should be carried | ||
57 | out for the initial driver setup. Once confirmed to be ok, they can | ||
58 | be included in the system start. | ||
59 | |||
60 | NOTE 1: To perform the following tasks you need 'root' access. | ||
61 | |||
62 | NOTE 2: In case of problems, please read the section "Troubleshooting" | ||
63 | below. | ||
64 | |||
65 | The driver can either be integrated into the kernel or it can be compiled | ||
66 | as a module. Select the appropriate option during the kernel | ||
67 | configuration. | ||
68 | |||
69 | Compile/use the driver as a module | ||
70 | ---------------------------------- | ||
71 | To compile the driver, go to the directory /usr/src/linux and | ||
72 | execute the command "make menuconfig" or "make xconfig" and proceed as | ||
73 | follows: | ||
74 | |||
75 | To integrate the driver permanently into the kernel, proceed as follows: | ||
76 | |||
77 | 1. Select the menu "Network device support" and then "Ethernet(1000Mbit)" | ||
78 | 2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" | ||
79 | with (*) | ||
80 | 3. Build a new kernel when the configuration of the above options is | ||
81 | finished. | ||
82 | 4. Install the new kernel. | ||
83 | 5. Reboot your system. | ||
84 | |||
85 | To use the driver as a module, proceed as follows: | ||
86 | |||
87 | 1. Enable 'loadable module support' in the kernel. | ||
88 | 2. For automatic driver start, enable the 'Kernel module loader'. | ||
89 | 3. Select the menu "Network device support" and then "Ethernet(1000Mbit)" | ||
90 | 4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" | ||
91 | with (M) | ||
92 | 5. Execute the command "make modules". | ||
93 | 6. Execute the command "make modules_install". | ||
94 | The appropriate modules will be installed. | ||
95 | 7. Reboot your system. | ||
96 | |||
97 | |||
98 | Load the module manually | ||
99 | ------------------------ | ||
100 | To load the module manually, proceed as follows: | ||
101 | |||
102 | 1. Enter "modprobe sk98lin". | ||
103 | 2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in | ||
104 | your computer and you have a /proc file system, execute the command: | ||
105 | "ls /proc/net/sk98lin/" | ||
106 | This should produce an output containing a line with the following | ||
107 | format: | ||
108 | eth0 eth1 ... | ||
109 | which indicates that your adapter has been found and initialized. | ||
110 | |||
111 | NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx | ||
112 | adapter installed, the adapters will be listed as 'eth0', | ||
113 | 'eth1', 'eth2', etc. | ||
114 | For each adapter, repeat steps 3 and 4 below. | ||
115 | |||
116 | NOTE 2: If you have other Ethernet adapters installed, your Marvell | ||
117 | Yukon or SysKonnect SK-98xx adapter will be mapped to the | ||
118 | next available number, e.g. 'eth1'. The mapping is executed | ||
119 | automatically. | ||
120 | The module installation message (displayed either in a system | ||
121 | log file or on the console) prints a line for each adapter | ||
122 | found containing the corresponding 'ethX'. | ||
123 | |||
124 | 3. Select an IP address and assign it to the respective adapter by | ||
125 | entering: | ||
126 | ifconfig eth0 <ip-address> | ||
127 | With this command, the adapter is connected to the Ethernet. | ||
128 | |||
129 | SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter | ||
130 | is now active, the link status LED of the primary port is active and | ||
131 | the link status LED of the secondary port (on dual port adapters) is | ||
132 | blinking (if the ports are connected to a switch or hub). | ||
133 | SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active. | ||
134 | In addition, you will receive a status message on the console stating | ||
135 | "ethX: network connection up using port Y" and showing the selected | ||
136 | connection parameters (x stands for the ethernet device number | ||
137 | (0,1,2, etc), y stands for the port name (A or B)). | ||
138 | |||
139 | NOTE: If you are in doubt about IP addresses, ask your network | ||
140 | administrator for assistance. | ||
141 | |||
142 | 4. Your adapter should now be fully operational. | ||
143 | Use 'ping <otherstation>' to verify the connection to other computers | ||
144 | on your network. | ||
145 | 5. To check the adapter configuration view /proc/net/sk98lin/[devicename]. | ||
146 | For example by executing: | ||
147 | "cat /proc/net/sk98lin/eth0" | ||
148 | |||
149 | Unload the module | ||
150 | ----------------- | ||
151 | To stop and unload the driver modules, proceed as follows: | ||
152 | |||
153 | 1. Execute the command "ifconfig eth0 down". | ||
154 | 2. Execute the command "rmmod sk98lin". | ||
155 | |||
156 | 3.2 Inclusion of adapter at system start | ||
157 | ----------------------------------------- | ||
158 | |||
159 | Since a large number of different Linux distributions are | ||
160 | available, we are unable to describe a general installation procedure | ||
161 | for the driver module. | ||
162 | Because the driver is now integrated in the kernel, installation should | ||
163 | be easy, using the standard mechanism of your distribution. | ||
164 | Refer to the distribution's manual for installation of ethernet adapters. | ||
165 | |||
166 | *** | ||
167 | |||
168 | 4 Driver Parameters | ||
169 | ==================== | ||
170 | |||
171 | Parameters can be set at the command line after the module has been | ||
172 | loaded with the command 'modprobe'. | ||
173 | In some distributions, the configuration tools are able to pass parameters | ||
174 | to the driver module. | ||
175 | |||
176 | If you use the kernel module loader, you can set driver parameters | ||
177 | in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier). | ||
178 | To set the driver parameters in this file, proceed as follows: | ||
179 | |||
180 | 1. Insert a line of the form : | ||
181 | options sk98lin ... | ||
182 | For "...", the same syntax is required as described for the command | ||
183 | line parameters of modprobe below. | ||
184 | 2. To activate the new parameters, either reboot your computer | ||
185 | or | ||
186 | unload and reload the driver. | ||
187 | The syntax of the driver parameters is: | ||
188 | |||
189 | modprobe sk98lin parameter=value1[,value2[,value3...]] | ||
190 | |||
191 | where value1 refers to the first adapter, value2 to the second etc. | ||
192 | |||
193 | NOTE: All parameters are case sensitive. Write them exactly as shown | ||
194 | below. | ||
195 | |||
196 | Example: | ||
197 | Suppose you have two adapters. You want to set auto-negotiation | ||
198 | on the first adapter to ON and on the second adapter to OFF. | ||
199 | You also want to set DuplexCapabilities on the first adapter | ||
200 | to FULL, and on the second adapter to HALF. | ||
201 | Then, you must enter: | ||
202 | |||
203 | modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half | ||
204 | |||
205 | NOTE: The number of adapters that can be configured this way is | ||
206 | limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM). | ||
207 | The current limit is 16. If you happen to install | ||
208 | more adapters, adjust this and recompile. | ||
209 | |||
210 | |||
211 | 4.1 Per-Port Parameters | ||
212 | ------------------------ | ||
213 | |||
214 | These settings are available for each port on the adapter. | ||
215 | In the following description, '?' stands for the port for | ||
216 | which you set the parameter (A or B). | ||
217 | |||
218 | Speed | ||
219 | ----- | ||
220 | Parameter: Speed_? | ||
221 | Values: 10, 100, 1000, Auto | ||
222 | Default: Auto | ||
223 | |||
224 | This parameter is used to set the speed capabilities. It is only valid | ||
225 | for the SK-98xx V2.0 copper adapters. | ||
226 | Usually, the speed is negotiated between the two ports during link | ||
227 | establishment. If this fails, a port can be forced to a specific setting | ||
228 | with this parameter. | ||
229 | |||
230 | Auto-Negotiation | ||
231 | ---------------- | ||
232 | Parameter: AutoNeg_? | ||
233 | Values: On, Off, Sense | ||
234 | Default: On | ||
235 | |||
236 | The "Sense"-mode automatically detects whether the link partner supports | ||
237 | auto-negotiation or not. | ||
238 | |||
239 | Duplex Capabilities | ||
240 | ------------------- | ||
241 | Parameter: DupCap_? | ||
242 | Values: Half, Full, Both | ||
243 | Default: Both | ||
244 | |||
245 | This parameter is only relevant if auto-negotiation for this port is | ||
246 | not set to "Sense". If auto-negotiation is set to "On", all three values | ||
247 | are possible. If it is set to "Off", only "Full" and "Half" are allowed. | ||
248 | This parameter is useful if your link partner does not support all | ||
249 | possible combinations. | ||
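The dependency between AutoNeg_? and DupCap_? described above can be written down as a small lookup. This is a Python sketch of the rule as stated in the text, not the driver's actual parameter check; the function name allowed_dupcap is an assumption made for illustration.

```python
def allowed_dupcap(autoneg):
    """Return the DupCap values permitted for a given AutoNeg setting.

    Returns None for "Sense", where DupCap is not relevant.
    Values are case sensitive, as noted for all driver parameters.
    """
    if autoneg == "On":
        return {"Half", "Full", "Both"}
    if autoneg == "Off":
        return {"Half", "Full"}
    if autoneg == "Sense":
        return None
    raise ValueError(f"unknown AutoNeg value: {autoneg!r}")


if __name__ == "__main__":
    for mode in ("On", "Off", "Sense"):
        print(mode, allowed_dupcap(mode))
```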
250 | |||
251 | Flow Control | ||
252 | ------------ | ||
253 | Parameter: FlowCtrl_? | ||
254 | Values: Sym, SymOrRem, LocSend, None | ||
255 | Default: SymOrRem | ||
256 | |||
257 | This parameter can be used to set the flow control capabilities the | ||
258 | port reports during auto-negotiation. It can be set for each port | ||
259 | individually. | ||
260 | Possible modes: | ||
261 | -- Sym = Symmetric: both link partners are allowed to send | ||
262 | PAUSE frames | ||
263 | -- SymOrRem = SymmetricOrRemote: both or only remote partner | ||
264 | are allowed to send PAUSE frames | ||
265 | -- LocSend = LocalSend: only local link partner is allowed | ||
266 | to send PAUSE frames | ||
267 | -- None = no link partner is allowed to send PAUSE frames | ||
268 | |||
269 | NOTE: This parameter is ignored if auto-negotiation is set to "Off". | ||
270 | |||
271 | Role in Master-Slave-Negotiation (1000Base-T only) | ||
272 | -------------------------------------------------- | ||
273 | Parameter: Role_? | ||
274 | Values: Auto, Master, Slave | ||
275 | Default: Auto | ||
276 | |||
277 | This parameter is only valid for the SK-9821 and SK-9822 adapters. | ||
278 | For two 1000Base-T ports to communicate, one must take the role of the | ||
279 | master (providing timing information), while the other must be the | ||
280 | slave. Usually, this is negotiated between the two ports during link | ||
281 | establishment. If this fails, a port can be forced to a specific setting | ||
282 | with this parameter. | ||
283 | |||
284 | |||
285 | 4.2 Adapter Parameters | ||
286 | ----------------------- | ||
287 | |||
288 | Connection Type (SK-98xx V2.0 copper adapters only) | ||
289 | --------------- | ||
290 | Parameter: ConType | ||
291 | Values: Auto, 100FD, 100HD, 10FD, 10HD | ||
292 | Default: Auto | ||
293 | |||
294 | The parameter 'ConType' is a combination of all five per-port parameters | ||
295 | within one single parameter. This simplifies the configuration of both ports | ||
296 | of an adapter card! The different values of this variable reflect the most | ||
297 | meaningful combinations of port parameters. | ||
298 | |||
299 | The following table shows the values of 'ConType' and the corresponding | ||
300 | combinations of the per-port parameters: | ||
301 | |||
302 | ConType | DupCap AutoNeg FlowCtrl Role Speed | ||
303 | ----------+------------------------------------------------------ | ||
304 | Auto | Both On SymOrRem Auto Auto | ||
305 | 100FD | Full Off None Auto (ignored) 100 | ||
306 | 100HD | Half Off None Auto (ignored) 100 | ||
307 | 10FD | Full Off None Auto (ignored) 10 | ||
308 | 10HD | Half Off None Auto (ignored) 10 | ||
309 | |||
310 | Stating any other port parameter together with this 'ConType' variable | ||
311 | will result in a merged configuration of those settings. This is due to | ||
312 | the fact that the per-port parameters (e.g. Speed_?) have a higher | ||
313 | priority than the combined variable 'ConType'. | ||
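The merge rule can be pictured as a preset that is overridden by any explicitly given per-port parameter. The Python sketch below encodes the table above; the names CONTYPE_PRESETS and effective_port_settings are hypothetical, and the driver's real parsing code may be organized differently.

```python
# Presets copied from the ConType table. Role is listed as "Auto (ignored)"
# for the forced modes; it is stored here simply as "Auto".
CONTYPE_PRESETS = {
    "Auto":  {"DupCap": "Both", "AutoNeg": "On",  "FlowCtrl": "SymOrRem", "Role": "Auto", "Speed": "Auto"},
    "100FD": {"DupCap": "Full", "AutoNeg": "Off", "FlowCtrl": "None",     "Role": "Auto", "Speed": "100"},
    "100HD": {"DupCap": "Half", "AutoNeg": "Off", "FlowCtrl": "None",     "Role": "Auto", "Speed": "100"},
    "10FD":  {"DupCap": "Full", "AutoNeg": "Off", "FlowCtrl": "None",     "Role": "Auto", "Speed": "10"},
    "10HD":  {"DupCap": "Half", "AutoNeg": "Off", "FlowCtrl": "None",     "Role": "Auto", "Speed": "10"},
}


def effective_port_settings(con_type, **per_port):
    """Start from the ConType preset, then let per-port parameters win."""
    settings = dict(CONTYPE_PRESETS[con_type])
    settings.update(per_port)  # per-port parameters have higher priority
    return settings


if __name__ == "__main__":
    # ConType=100FD combined with an explicit per-port speed of 10.
    print(effective_port_settings("100FD", Speed="10"))
```

In this model, ConType=100FD together with a per-port speed of 10 yields full duplex at 10 Mbit/s, which is the merged configuration the text describes.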
314 | |||
315 | NOTE: This parameter is always used on both ports of the adapter card. | ||
316 | |||
317 | Interrupt Moderation | ||
318 | -------------------- | ||
319 | Parameter: Moderation | ||
320 | Values: None, Static, Dynamic | ||
321 | Default: None | ||
322 | |||
323 | Interrupt moderation is employed to limit the maximum number of interrupts | ||
324 | the driver has to serve. That is, one or more interrupts (which indicate any | ||
325 | transmit or receive packet to be processed) are queued until the driver | ||
326 | processes them. The time at which queued interrupts are served is determined | ||
327 | by the 'IntsPerSec' parameter, which is explained below. | ||
328 | |||
329 | Possible modes: | ||
330 | |||
331 | -- None - No interrupt moderation is applied on the adapter card. | ||
332 | Therefore, each transmit or receive interrupt is served immediately | ||
333 | as soon as it appears on the interrupt line of the adapter card. | ||
334 | |||
335 | -- Static - Interrupt moderation is applied on the adapter card. | ||
336 | All transmit and receive interrupts are queued until a complete | ||
337 | moderation interval ends. If such a moderation interval ends, all | ||
338 | queued interrupts are processed in one big bunch without any delay. | ||
339 | The term 'static' reflects the fact that interrupt moderation is | ||
340 | always enabled, regardless of how much network load is currently | ||
341 | passing via a particular interface. In addition, the duration of | ||
342 | the moderation interval has a fixed length that never changes while | ||
343 | the driver is operational. | ||
344 | |||
345 | -- Dynamic - Interrupt moderation might be applied on the adapter card, | ||
346 | depending on the load of the system. If the driver detects that the | ||
347 | system load is too high, the driver tries to shield the system against | ||
348 | too much network load by enabling interrupt moderation. If - at a later | ||
349 | time - the CPU utilization decreases again (or if the network load is | ||
350 | negligible) the interrupt moderation will automatically be disabled. | ||
351 | |||
352 | Interrupt moderation should be used when the driver has to handle one or more | ||
353 | interfaces with a high network load, which - as a consequence - leads also to a | ||
354 | high CPU utilization. When moderation is applied in such high network load | ||
355 | situations, CPU load might be reduced by 20-30%. | ||
356 | |||
357 | NOTE: The drawback of using interrupt moderation is an increase of the round- | ||
358 | trip-time (RTT), due to the queueing and serving of interrupts at dedicated | ||
359 | moderation times. | ||
360 | |||
361 | Interrupts per second | ||
362 | --------------------- | ||
363 | Parameter: IntsPerSec | ||
364 | Values: 30...40000 (interrupts per second) | ||
365 | Default: 2000 | ||
366 | |||
367 | This parameter is only used if either static or dynamic interrupt moderation | ||
368 | is used on a network adapter card. Using this parameter if no moderation is | ||
369 | applied will lead to no action performed. | ||
370 | |||
371 | This parameter determines the length of any interrupt moderation interval. | ||
372 | Assuming that static interrupt moderation is to be used, an 'IntsPerSec' | ||
373 | parameter value of 2000 will lead to an interrupt moderation interval of | ||
374 | 500 microseconds. | ||
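The relation between 'IntsPerSec' and the moderation interval is a simple reciprocal. A minimal Python sketch of the arithmetic follows; the function name moderation_interval_us is an assumption made for illustration, and the range check mirrors the documented values 30...40000.

```python
def moderation_interval_us(ints_per_sec):
    """Length of one moderation interval in microseconds."""
    if not 30 <= ints_per_sec <= 40000:
        raise ValueError("IntsPerSec must be in the range 30...40000")
    return 1_000_000 / ints_per_sec


if __name__ == "__main__":
    for rate in (100, 2000, 40000):
        print(rate, "interrupts/s ->", moderation_interval_us(rate), "us")
```

The same arithmetic shows why very low rates are costly: at 100 interrupts per second a packet may wait up to 10 milliseconds before it is processed.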
375 | |||
376 | NOTE: The duration of the moderation interval is to be chosen with care. | ||
377 | At first glance, selecting a very long duration (e.g. only 100 interrupts per | ||
378 | second) seems to be meaningful, but the increase of packet-processing delay | ||
379 | is tremendous. On the other hand, selecting a very short moderation time might | ||
380 | cancel out the benefit of applying moderation at all. | ||
381 | |||
382 | |||
383 | Preferred Port | ||
384 | -------------- | ||
385 | Parameter: PrefPort | ||
386 | Values: A, B | ||
387 | Default: A | ||
388 | |||
389 | This is used to force the preferred port to A or B (on dual-port network | ||
390 | adapters). The preferred port is the one that is used if both are detected | ||
391 | as fully functional. | ||
392 | |||
393 | RLMT Mode (Redundant Link Management Technology) | ||
394 | ------------------------------------------------ | ||
395 | Parameter: RlmtMode | ||
396 | Values: CheckLinkState, CheckLocalPort, CheckSeg, DualNet | ||
397 | Default: CheckLinkState | ||
398 | |||
399 | RLMT monitors the status of the port. If the link of the active port | ||
400 | fails, RLMT switches immediately to the standby link. The virtual link is | ||
401 | maintained as long as at least one 'physical' link is up. | ||
402 | |||
403 | Possible modes: | ||
404 | |||
405 | -- CheckLinkState - Check link state only: RLMT uses the link state | ||
406 | reported by the adapter hardware for each individual port to | ||
407 | determine whether a port can be used for all network traffic or | ||
408 | not. | ||
409 | |||
410 | -- CheckLocalPort - In this mode, RLMT monitors the network path | ||
411 | between the two ports of an adapter by regularly exchanging packets | ||
412 | between them. This mode requires a network configuration in which | ||
413 | the two ports are able to "see" each other (i.e. there must not be | ||
414 | any router between the ports). | ||
415 | |||
416 | -- CheckSeg - Check local port and segmentation: This mode supports the | ||
417 | same functions as the CheckLocalPort mode and additionally checks | ||
418 | network segmentation between the ports. Therefore, this mode is only | ||
419 | to be used if Gigabit Ethernet switches are installed on the network | ||
420 | that have been configured to use the Spanning Tree protocol. | ||
421 | |||
422 | -- DualNet - In this mode, ports A and B are used as separate devices. | ||
423 | If you have a dual port adapter, port A will be configured as eth0 | ||
424 | and port B as eth1. Both ports can be used independently with | ||
425 | distinct IP addresses. The preferred port setting is not used. | ||
426 | RLMT is turned off. | ||
427 | |||
428 | NOTE: RLMT modes CheckLocalPort (CLP) and CheckSeg (CLPSS) are designed | ||
429 | to operate in configurations where a network path between the ports | ||
430 | on one adapter exists. Moreover, they are not designed to work | ||
431 | where adapters are connected back-to-back. | ||
432 | *** | ||
433 | |||
434 | |||
435 | 5 Large Frame Support | ||
436 | ====================== | ||
437 | |||
438 | The driver supports large frames (also called jumbo frames). Using large | ||
439 | frames can result in an improved throughput if transferring large amounts | ||
440 | of data. | ||
441 | To enable large frames, set the MTU (maximum transmission unit) of the | ||
442 | interface to the desired value (up to 9000) by executing the following | ||
443 | command: | ||
444 | ifconfig eth0 mtu 9000 | ||
445 | This will only work if you have two adapters connected back-to-back | ||
446 | or if you use a switch that supports large frames. When using a switch, | ||
447 | it should be configured to allow large frames and auto-negotiation should | ||
448 | be set to OFF. The setting must be configured on all adapters that can be | ||
449 | reached by the large frames. If one adapter is not set to receive large | ||
450 | frames, it will simply drop them. | ||
451 | |||
452 | You can switch back to the standard ethernet frame size by executing the | ||
453 | following command: | ||
454 | ifconfig eth0 mtu 1500 | ||
455 | |||
456 | To permanently configure this setting, add a script with the 'ifconfig' | ||
457 | line to the system startup sequence (named something like "S99sk98lin" | ||
458 | in /etc/rc.d/rc2.d). | ||
459 | *** | ||
460 | |||
461 | |||
462 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) | ||
463 | ================================================================== | ||
464 | |||
465 | The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and | ||
466 | Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad. | ||
467 | These features are only available after installation of open source | ||
468 | modules available on the Internet: | ||
469 | For VLAN go to: http://www.candelatech.com/~greear/vlan.html | ||
470 | For Link Aggregation go to: http://www.st.rim.or.jp/~yumo | ||
471 | |||
472 | NOTE: SysKonnect GmbH does not offer any support for these open source | ||
473 | modules and does not take the responsibility for any kind of | ||
474 | failures or problems arising in connection with these modules. | ||
475 | |||
476 | NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may | ||
477 | cause problems when unloading the driver. | ||
478 | |||
479 | |||
480 | 7 Troubleshooting | ||
481 | ================== | ||
482 | |||
483 | If any problems occur during the installation process, check the | ||
484 | following list: | ||
485 | |||
486 | |||
487 | Problem: The SK-98xx adapter cannot be found by the driver. | ||
488 | Solution: In /proc/pci search for the following entry: | ||
489 | 'Ethernet controller: SysKonnect SK-98xx ...' | ||
490 | If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has | ||
491 | been found by the system and should be operational. | ||
492 | If this entry does not exist or if the file '/proc/pci' is not | ||
493 | found, there may be a hardware problem or the PCI support may | ||
494 | not be enabled in your kernel. | ||
495 | The adapter can be checked using the diagnostics program which | ||
496 | is available on the SysKonnect web site: | ||
497 | www.syskonnect.com | ||
498 | |||
499 | Some COMPAQ machines have problems dealing with PCI under Linux. | ||
500 | This problem is described in the 'PCI howto' document | ||
501 | (included in some distributions or available from the | ||
502 | web, e.g. at 'www.linux.org'). | ||
503 | |||
504 | |||
505 | Problem: Programs such as 'ifconfig' or 'route' cannot be found or the | ||
506 | error message 'Operation not permitted' is displayed. | ||
507 | Reason: You are not logged in as user 'root'. | ||
508 | Solution: Logout and login as 'root' or change to 'root' via 'su'. | ||
509 | |||
510 | |||
511 | Problem: Upon use of the command 'ping <address>' the message | ||
512 | "ping: sendto: Network is unreachable" is displayed. | ||
513 | Reason: Your route is not set correctly. | ||
514 | Solution: If you are using RedHat, you probably forgot to set up the | ||
515 | route in the 'network configuration'. | ||
516 | Check the existing routes with the 'route' command and check | ||
517 | if an entry for 'eth0' exists, and if so, if it is set correctly. | ||
518 | |||
519 | |||
520 | Problem: The driver can be started, the adapter is connected to the | ||
521 | network, but you cannot receive or transmit any packets; | ||
522 | e.g. 'ping' does not work. | ||
523 | Reason: There is an incorrect route in your routing table. | ||
524 | Solution: Check the routing table with the command 'route' and read the | ||
525 | manual help pages dealing with routes (enter 'man route'). | ||
526 | |||
527 | NOTE: Although the 2.2.x kernel versions generate the routing entry | ||
528 | automatically, problems of this kind may occur here as well. We've | ||
529 | come across a situation in which the driver started correctly at | ||
530 | system start, but after the driver has been removed and reloaded, | ||
531 | the route of the adapter's network pointed to the 'dummy0' device | ||
532 | and had to be corrected manually. | ||
533 | |||
534 | |||
535 | Problem: Your computer should act as a router between multiple | ||
536 | IP subnetworks (using multiple adapters), but computers in | ||
537 | other subnetworks cannot be reached. | ||
538 | Reason: Either the router's kernel is not configured for IP forwarding | ||
539 | or the routing table and gateway configuration of at least one | ||
540 | computer is not working. | ||
541 | |||
542 | Problem: Upon driver start, the following error message is displayed: | ||
543 | "eth0: -- ERROR -- | ||
544 | Class: internal Software error | ||
545 | Nr: 0xcc | ||
546 | Msg: SkGeInitPort() cannot init running ports" | ||
547 | Reason: You are using a driver compiled for single processor machines | ||
548 | on a multiprocessor machine with SMP (Symmetric MultiProcessor) | ||
549 | kernel. | ||
550 | Solution: Configure your kernel appropriately and recompile the kernel or | ||
551 | the modules. | ||
552 | |||
553 | |||
554 | |||
555 | If your problem is not listed here, please contact SysKonnect's technical | ||
556 | support for help (linux@syskonnect.de). | ||
557 | When contacting our technical support, please ensure that the following | ||
558 | information is available: | ||
559 | - System manufacturer and hardware information (CPU, memory, ...) | ||
560 | - PCI-Boards in your system | ||
561 | - Distribution | ||
562 | - Kernel version | ||
563 | - Driver version | ||
564 | *** | ||
565 | |||
566 | |||
567 | |||
568 | ***End of Readme File*** | ||
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt new file mode 100644 index 000000000000..4b4adb8eb14f --- /dev/null +++ b/Documentation/networking/spider_net.txt | |||
@@ -0,0 +1,204 @@ | |||
1 | |||
2 | The Spidernet Device Driver | ||
3 | =========================== | ||
4 | |||
5 | Written by Linas Vepstas <linas@austin.ibm.com> | ||
6 | |||
7 | Version of 7 June 2007 | ||
8 | |||
9 | Abstract | ||
10 | ======== | ||
11 | This document sketches the structure of portions of the spidernet | ||
12 | device driver in the Linux kernel tree. The spidernet is a gigabit | ||
13 | ethernet device built into the Toshiba southbridge commonly used | ||
14 | in the SONY Playstation 3 and the IBM QS20 Cell blade. | ||
15 | |||
16 | The Structure of the RX Ring. | ||
17 | ============================= | ||
18 | The receive (RX) ring is a circular linked list of RX descriptors, | ||
19 | together with three pointers into the ring that are used to manage its | ||
20 | contents. | ||
21 | |||
22 | The elements of the ring are called "descriptors" or "descrs"; they | ||
23 | describe the received data. This includes a pointer to a buffer | ||
24 | containing the received data, the buffer size, and various status bits. | ||
25 | |||
26 | There are three primary states that a descriptor can be in: "empty", | ||
27 | "full" and "not-in-use". An "empty" or "ready" descriptor is ready | ||
28 | to receive data from the hardware. A "full" descriptor has data in it, | ||
29 | and is waiting to be emptied and processed by the OS. A "not-in-use" | ||
30 | descriptor is neither empty nor full; it is simply not ready. It may | ||
31 | not even have a data buffer in it, or is otherwise unusable. | ||
32 | |||
33 | During normal operation, on device startup, the OS (specifically, the | ||
34 | spidernet device driver) allocates a set of RX descriptors and RX | ||
35 | buffers. These are all marked "empty", ready to receive data. This | ||
36 | ring is handed off to the hardware, which sequentially fills in the | ||
37 | buffers, and marks them "full". The OS follows up, taking the full | ||
38 | buffers, processing them, and re-marking them empty. | ||
39 | |||
40 | This filling and emptying is managed by three pointers, the "head" | ||
41 | and "tail" pointers, managed by the OS, and a hardware current | ||
42 | descriptor pointer (GDACTDPA). The GDACTDPA points at the descr | ||
43 | currently being filled. When this descr is filled, the hardware | ||
44 | marks it full, and advances the GDACTDPA by one. Thus, when there is | ||
45 | flowing RX traffic, every descr behind it should be marked "full", | ||
46 | and everything in front of it should be "empty". If the hardware | ||
47 | discovers that the current descr is not empty, it will signal an | ||
48 | interrupt, and halt processing. | ||
49 | |||
50 | The tail pointer tails or trails the hardware pointer. When the | ||
51 | hardware is ahead, the tail pointer will be pointing at a "full" | ||
52 | descr. The OS will process this descr, and then mark it "not-in-use", | ||
53 | and advance the tail pointer. Thus, when there is flowing RX traffic, | ||
54 | all of the descrs in front of the tail pointer should be "full", and | ||
55 | all of those behind it should be "not-in-use". When RX traffic is not | ||
56 | flowing, then the tail pointer can catch up to the hardware pointer. | ||
57 | The OS will then note that the current tail is "empty", and halt | ||
58 | processing. | ||
59 | |||
60 | The head pointer (somewhat mis-named) follows after the tail pointer. | ||
61 | When traffic is flowing, then the head pointer will be pointing at | ||
62 | a "not-in-use" descr. The OS will perform various housekeeping duties | ||
63 | on this descr. This includes allocating a new data buffer and | ||
64 | dma-mapping it so as to make it visible to the hardware. The OS will | ||
65 | then mark the descr as "empty", ready to receive data. Thus, when there | ||
66 | is flowing RX traffic, everything in front of the head pointer should | ||
67 | be "not-in-use", and everything behind it should be "empty". If no | ||
68 | RX traffic is flowing, then the head pointer can catch up to the tail | ||
69 | pointer, at which point the OS will notice that the head descr is | ||
70 | "empty", and it will halt processing. | ||
71 | |||
72 | Thus, in an idle system, the GDACTDPA, tail and head pointers will | ||
73 | all be pointing at the same descr, which should be "empty". All of the | ||
74 | other descrs in the ring should be "empty" as well. | ||
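The interplay of the three pointers can be modeled in a few lines. The following Python sketch is purely illustrative: the class RxRing and its method names are hypothetical and do not appear in the driver, and real descriptors carry data buffers and DMA mappings that are omitted here.

```python
EMPTY, FULL, NOT_IN_USE = "empty", "full", "not-in-use"


class RxRing:
    """Toy model of the RX ring with hardware, tail and head pointers."""

    def __init__(self, size):
        self.state = [EMPTY] * size
        self.hw = 0    # stands in for GDACTDPA
        self.tail = 0  # next descr the OS will process
        self.head = 0  # next descr the OS will refill

    def hw_receive(self):
        """Hardware fills the current descr; False means it must halt."""
        if self.state[self.hw] != EMPTY:
            return False
        self.state[self.hw] = FULL
        self.hw = (self.hw + 1) % len(self.state)
        return True

    def os_process_tail(self):
        """OS consumes one full descr at the tail and marks it not-in-use."""
        if self.state[self.tail] != FULL:
            return False
        self.state[self.tail] = NOT_IN_USE
        self.tail = (self.tail + 1) % len(self.state)
        return True

    def os_refill_head(self):
        """OS gives one not-in-use descr a fresh buffer and marks it empty."""
        if self.state[self.head] != NOT_IN_USE:
            return False
        self.state[self.head] = EMPTY
        self.head = (self.head + 1) % len(self.state)
        return True


if __name__ == "__main__":
    ring = RxRing(4)
    for _ in range(3):
        ring.hw_receive()
    while ring.os_process_tail():
        pass
    while ring.os_refill_head():
        pass
    print(ring.hw, ring.tail, ring.head, ring.state)
```

Once the OS has drained and refilled everything, all three pointers in the model coincide and every descr is "empty", which is the idle state described above.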
75 | |||
76 | The show_rx_chain() routine will print out the locations of the | ||
77 | GDACTDPA, tail and head pointers. It will also summarize the contents | ||
78 | of the ring, starting at the tail pointer, and listing the status | ||
79 | of the descrs that follow. | ||
80 | |||
81 | A typical example of the output, for a nearly idle system, might be | ||
82 | |||
83 | net eth1: Total number of descrs=256 | ||
84 | net eth1: Chain tail located at descr=20 | ||
85 | net eth1: Chain head is at 20 | ||
86 | net eth1: HW curr desc (GDACTDPA) is at 21 | ||
87 | net eth1: Have 1 descrs with stat=x40800101 | ||
88 | net eth1: HW next desc (GDACNEXTDA) is at 22 | ||
89 | net eth1: Last 255 descrs with stat=xa0800000 | ||
90 | |||
91 | In the above, the hardware has filled in one descr, number 20. Both | ||
92 | head and tail are pointing at 20, because it has not yet been emptied. | ||
93 | Meanwhile, hw is pointing at 21, which is free. | ||
94 | |||
95 | The "Have nnn descrs" refers to the descr starting at the tail: in this | ||
96 | case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers | ||
97 | to all of the rest of the descrs, from the last status change. The "nnn" | ||
98 | is a count of how many descrs have exactly the same status. | ||
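The grouping of descrs by identical status is essentially a run-length summary of the ring, walked once starting at the tail. The Python sketch below shows that idea only; it is not the show_rx_chain() source, the name summarize_ring is hypothetical, and the real routine also prints the pointer locations.

```python
def summarize_ring(statuses, tail):
    """Run-length summary of descr statuses, walking the ring from the tail.

    Returns a list of (count, status) pairs in ring order.
    """
    runs = []
    n = len(statuses)
    for i in range(n):
        stat = statuses[(tail + i) % n]
        if runs and runs[-1][1] == stat:
            runs[-1][0] += 1          # same status as previous descr
        else:
            runs.append([1, stat])    # status changed: start a new run
    return [(count, stat) for count, stat in runs]


if __name__ == "__main__":
    # Ring from the nearly idle example: descr 20 is full, the rest are empty.
    statuses = [0xA0800000] * 256
    statuses[20] = 0x40800101
    for count, stat in summarize_ring(statuses, tail=20):
        print(f"{count} descrs with stat=x{stat:08x}")
```

For the nearly idle ring, the sketch yields one descr with status x40800101 followed by 255 descrs with status xa0800000, matching the counts in the sample output.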
99 | |||
100 | The status x4... corresponds to "full" and status xa... corresponds | ||
101 | to "empty". The actual value printed is RXCOMST_A. | ||
102 | |||
103 | In the device driver source code, a different set of names is | ||
104 | used for these same concepts, so that | ||
105 | |||
106 | "empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa | ||
107 | "full" == SPIDER_NET_DESCR_FRAME_END == 0x4 | ||
108 | "not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf | ||
109 | |||
110 | |||
111 | The RX RAM full bug/feature | ||
112 | =========================== | ||
113 | |||
114 | As long as the OS can empty out the RX buffers at a rate faster than | ||
115 | the hardware can fill them, there is no problem. If, for some reason, | ||
116 | the OS fails to empty the RX ring fast enough, the hardware GDACTDPA | ||
117 | pointer will catch up to the head, notice the not-empty condition, | ||
118 | and stop. However, RX packets may still continue arriving on the wire. | ||
119 | The spidernet chip can save some limited number of these in local RAM. | ||
120 | When this local ram fills up, the spider chip will issue an interrupt | ||
121 | indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit | ||
122 | will be set in GHIINT1STS). When the RX ram full condition occurs, | ||
123 | a certain bug/feature is triggered that has to be specially handled. | ||
124 | This section describes the special handling for this condition. | ||
125 | |||
126 | When the OS finally has a chance to run, it will empty out the RX ring. | ||
127 | In particular, it will clear the descriptor on which the hardware had | ||
128 | stopped. However, once the hardware has decided that a certain | ||
129 | descriptor is invalid, it will not restart at that descriptor; instead | ||
130 | it will restart at the next descr. This potentially will lead to a | ||
131 | deadlock condition, as the tail pointer will be pointing at this descr, | ||
132 | which, from the OS point of view, is empty; the OS will be waiting for | ||
133 | this descr to be filled. However, the hardware has skipped this descr, | ||
134 | and is filling the next descrs. Since the OS doesn't see this, there | ||
135 | is a potential deadlock, with the OS waiting for one descr to fill, | ||
136 | while the hardware is waiting for a different set of descrs to become | ||
137 | empty. | ||
138 | |||
139 | A call to show_rx_chain() at this point indicates the nature of the | ||
140 | problem. A typical print when the network is hung shows the following: | ||
141 | |||
142 | net eth1: Spider RX RAM full, incoming packets might be discarded! | ||
143 | net eth1: Total number of descrs=256 | ||
144 | net eth1: Chain tail located at descr=255 | ||
145 | net eth1: Chain head is at 255 | ||
146 | net eth1: HW curr desc (GDACTDPA) is at 0 | ||
147 | net eth1: Have 1 descrs with stat=xa0800000 | ||
148 | net eth1: HW next desc (GDACNEXTDA) is at 1 | ||
149 | net eth1: Have 127 descrs with stat=x40800101 | ||
150 | net eth1: Have 1 descrs with stat=x40800001 | ||
151 | net eth1: Have 126 descrs with stat=x40800101 | ||
152 | net eth1: Last 1 descrs with stat=xa0800000 | ||
153 | |||
154 | Both the tail and head pointers are pointing at descr 255, which is | ||
155 | marked xa... which is "empty". Thus, from the OS point of view, there | ||
156 | is nothing to be done. In particular, there is the implicit assumption | ||
157 | that everything in front of the "empty" descr must surely also be empty, | ||
158 | as explained in the last section. The OS is waiting for descr 255 to | ||
159 | become non-empty, which, in this case, will never happen. | ||
160 | |||
161 | The HW pointer is at descr 0. This descr is marked 0x4.. or "full". | ||
162 | Since it's already full, the hardware can do nothing more, and thus has | ||
163 | halted processing. Notice that descrs 0 through 253 are all marked | ||
164 | "full", while descr 254 and 255 are empty. (The "Last 1 descrs" is | ||
165 | descr 254, since tail was at 255.) Thus, the system is deadlocked, | ||
166 | and there can be no forward progress; the OS thinks there's nothing | ||
167 | to do, and the hardware has nowhere to put incoming data. | ||
168 | |||
169 | This bug/feature is worked around with the spider_net_resync_head_ptr() | ||
170 | routine. When the driver receives RX interrupts, but an examination | ||
171 | of the RX chain seems to show it is empty, then it is probable that | ||
172 | the hardware has skipped a descr or two (sometimes dozens under heavy | ||
173 | network conditions). The spider_net_resync_head_ptr() subroutine will | ||
174 | search the ring for the next full descr, and the driver will resume | ||
175 | operations there. Since this will leave "holes" in the ring, there | ||
176 | is also a spider_net_resync_tail_ptr() that will skip over such holes. | ||
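The core of the workaround, as described, is a search around the ring for the next "full" descr. A hedged Python sketch of that search follows. It models only the idea in the text and is not the code of spider_net_resync_head_ptr(); the function name find_next_full and the string states are assumptions made for illustration.

```python
def find_next_full(states, start):
    """Return the index of the first "full" descr at or after start.

    The ring is walked at most once, wrapping around at the end.
    Returns None if no descr is full.
    """
    n = len(states)
    for i in range(n):
        idx = (start + i) % n
        if states[idx] == "full":
            return idx
    return None


if __name__ == "__main__":
    # Hung ring from the example: descrs 254 and 255 empty, the rest full,
    # with the OS pointers stuck at descr 255.
    states = ["full"] * 254 + ["empty"] * 2
    print(find_next_full(states, start=255))
```

Applied to the hung ring from the example, the search starting at descr 255 wraps around and lands on descr 0, where processing could resume.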
177 | |||
178 | As of this writing, the spider_net_resync() strategy seems to work very | ||
179 | well, even under heavy network loads. | ||
180 | |||
181 | |||
182 | The TX ring | ||
183 | =========== | ||
184 | The TX ring uses a low-watermark interrupt scheme to make sure that | ||
185 | the TX queue is appropriately serviced for large packet sizes. | ||
186 | |||
187 | For packet sizes greater than about 1KBytes, the kernel can fill | ||
188 | the TX ring quicker than the device can drain it. Once the ring | ||
189 | is full, the netdev is stopped. When there is room in the ring, | ||
190 | the netdev needs to be reawakened, so that more TX packets are placed | ||
191 | in the ring. The hardware can empty the ring about four times per jiffy, | ||
192 | so it's not appropriate to wait for the poll routine to refill, since | ||
193 | the poll routine runs only once per jiffy. The low-watermark mechanism | ||
194 | marks a descr about 1/4th of the way from the bottom of the queue, so | ||
195 | that an interrupt is generated when the descr is processed. This | ||
196 | interrupt wakes up the netdev, which can then refill the queue. | ||
197 | For large packets, this mechanism generates a relatively small number | ||
198 | of interrupts, about 1K/sec. For smaller packets, this will drop to zero | ||
199 | interrupts, as the hardware can empty the queue faster than the kernel | ||
200 | can fill it. | ||
201 | |||
202 | |||
203 | ======= END OF DOCUMENT ======== | ||
204 | |||