diff options
author | Jeff Garzik <jgarzik@pobox.com> | 2005-08-29 16:40:27 -0400 |
---|---|---|
committer | Jeff Garzik <jgarzik@pobox.com> | 2005-08-29 16:40:27 -0400 |
commit | c1b054d03f5b31c33eaa0b267c629b118eaf3790 (patch) | |
tree | 9333907ca767be24fcb3667877242976c3e3c8dd /Documentation/networking | |
parent | 559fb51ba7e66fe298b8355fabde1275b7def35f (diff) | |
parent | bf4e70e54cf31dcca48d279c7f7e71328eebe749 (diff) |
Merge /spare/repo/linux-2.6/
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/00-INDEX | 4 | ||||
-rw-r--r-- | Documentation/networking/bonding.txt | 978 | ||||
-rw-r--r-- | Documentation/networking/dmfe.txt | 82 | ||||
-rw-r--r-- | Documentation/networking/fib_trie.txt | 145 | ||||
-rw-r--r-- | Documentation/networking/ip-sysctl.txt | 56 | ||||
-rw-r--r-- | Documentation/networking/phy.txt | 288 | ||||
-rw-r--r-- | Documentation/networking/tcp.txt | 69 | ||||
-rw-r--r-- | Documentation/networking/wanpipe.txt | 622 |
8 files changed, 1228 insertions, 1016 deletions
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 834993d26730..5b01d5cc4e95 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX | |||
@@ -114,9 +114,7 @@ tuntap.txt | |||
114 | vortex.txt | 114 | vortex.txt |
115 | - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. | 115 | - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. |
116 | wan-router.txt | 116 | wan-router.txt |
117 | - Wan router documentation | 117 | - WAN router documentation |
118 | wanpipe.txt | ||
119 | - WANPIPE(tm) Multiprotocol WAN Driver for Linux WAN Router | ||
120 | wavelan.txt | 118 | wavelan.txt |
121 | - AT&T GIS (nee NCR) WaveLAN card: An Ethernet-like radio transceiver | 119 | - AT&T GIS (nee NCR) WaveLAN card: An Ethernet-like radio transceiver |
122 | x25.txt | 120 | x25.txt |
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 0bc2ed136a38..24d029455baa 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt | |||
@@ -1,5 +1,7 @@ | |||
1 | 1 | ||
2 | Linux Ethernet Bonding Driver HOWTO | 2 | Linux Ethernet Bonding Driver HOWTO |
3 | |||
4 | Latest update: 21 June 2005 | ||
3 | 5 | ||
4 | Initial release : Thomas Davis <tadavis at lbl.gov> | 6 | Initial release : Thomas Davis <tadavis at lbl.gov> |
5 | Corrections, HA extensions : 2000/10/03-15 : | 7 | Corrections, HA extensions : 2000/10/03-15 : |
@@ -11,15 +13,22 @@ Corrections, HA extensions : 2000/10/03-15 : | |||
11 | 13 | ||
12 | Reorganized and updated Feb 2005 by Jay Vosburgh | 14 | Reorganized and updated Feb 2005 by Jay Vosburgh |
13 | 15 | ||
14 | Note : | 16 | Introduction |
15 | ------ | 17 | ============ |
18 | |||
19 | The Linux bonding driver provides a method for aggregating | ||
20 | multiple network interfaces into a single logical "bonded" interface. | ||
21 | The behavior of the bonded interfaces depends upon the mode; generally | ||
22 | speaking, modes provide either hot standby or load balancing services. | ||
23 | Additionally, link integrity monitoring may be performed. | ||
16 | 24 | ||
17 | The bonding driver originally came from Donald Becker's beowulf patches for | 25 | The bonding driver originally came from Donald Becker's |
18 | kernel 2.0. It has changed quite a bit since, and the original tools from | 26 | beowulf patches for kernel 2.0. It has changed quite a bit since, and |
19 | extreme-linux and beowulf sites will not work with this version of the driver. | 27 | the original tools from extreme-linux and beowulf sites will not work |
28 | with this version of the driver. | ||
20 | 29 | ||
21 | For new versions of the driver, patches for older kernels and the updated | 30 | For new versions of the driver, updated userspace tools, and |
22 | userspace tools, please follow the links at the end of this file. | 31 | who to ask for help, please follow the links at the end of this file. |
23 | 32 | ||
24 | Table of Contents | 33 | Table of Contents |
25 | ================= | 34 | ================= |
@@ -30,9 +39,13 @@ Table of Contents | |||
30 | 39 | ||
31 | 3. Configuring Bonding Devices | 40 | 3. Configuring Bonding Devices |
32 | 3.1 Configuration with sysconfig support | 41 | 3.1 Configuration with sysconfig support |
42 | 3.1.1 Using DHCP with sysconfig | ||
43 | 3.1.2 Configuring Multiple Bonds with sysconfig | ||
33 | 3.2 Configuration with initscripts support | 44 | 3.2 Configuration with initscripts support |
45 | 3.2.1 Using DHCP with initscripts | ||
46 | 3.2.2 Configuring Multiple Bonds with initscripts | ||
34 | 3.3 Configuring Bonding Manually | 47 | 3.3 Configuring Bonding Manually |
35 | 3.4 Configuring Multiple Bonds | 48 | 3.3.1 Configuring Multiple Bonds Manually |
36 | 49 | ||
37 | 5. Querying Bonding Configuration | 50 | 5. Querying Bonding Configuration |
38 | 5.1 Bonding Configuration | 51 | 5.1 Bonding Configuration |
@@ -56,21 +69,30 @@ Table of Contents | |||
56 | 69 | ||
57 | 11. Promiscuous mode | 70 | 11. Promiscuous mode |
58 | 71 | ||
59 | 12. High Availability Information | 72 | 12. Configuring Bonding for High Availability |
60 | 12.1 High Availability in a Single Switch Topology | 73 | 12.1 High Availability in a Single Switch Topology |
61 | 12.1.1 Bonding Mode Selection for Single Switch Topology | ||
62 | 12.1.2 Link Monitoring for Single Switch Topology | ||
63 | 12.2 High Availability in a Multiple Switch Topology | 74 | 12.2 High Availability in a Multiple Switch Topology |
64 | 12.2.1 Bonding Mode Selection for Multiple Switch Topology | 75 | 12.2.1 HA Bonding Mode Selection for Multiple Switch Topology |
65 | 12.2.2 Link Monitoring for Multiple Switch Topology | 76 | 12.2.2 HA Link Monitoring for Multiple Switch Topology |
66 | 12.3 Switch Behavior Issues for High Availability | 77 | |
78 | 13. Configuring Bonding for Maximum Throughput | ||
79 | 13.1 Maximum Throughput in a Single Switch Topology | ||
80 | 13.1.1 MT Bonding Mode Selection for Single Switch Topology | ||
81 | 13.1.2 MT Link Monitoring for Single Switch Topology | ||
82 | 13.2 Maximum Throughput in a Multiple Switch Topology | ||
83 | 13.2.1 MT Bonding Mode Selection for Multiple Switch Topology | ||
84 | 13.2.2 MT Link Monitoring for Multiple Switch Topology | ||
67 | 85 | ||
68 | 13. Hardware Specific Considerations | 86 | 14. Switch Behavior Issues |
69 | 13.1 IBM BladeCenter | 87 | 14.1 Link Establishment and Failover Delays |
88 | 14.2 Duplicated Incoming Packets | ||
70 | 89 | ||
71 | 14. Frequently Asked Questions | 90 | 15. Hardware Specific Considerations |
91 | 15.1 IBM BladeCenter | ||
72 | 92 | ||
73 | 15. Resources and Links | 93 | 16. Frequently Asked Questions |
94 | |||
95 | 17. Resources and Links | ||
74 | 96 | ||
75 | 97 | ||
76 | 1. Bonding Driver Installation | 98 | 1. Bonding Driver Installation |
@@ -86,16 +108,10 @@ the following steps: | |||
86 | 1.1 Configure and build the kernel with bonding | 108 | 1.1 Configure and build the kernel with bonding |
87 | ----------------------------------------------- | 109 | ----------------------------------------------- |
88 | 110 | ||
89 | The latest version of the bonding driver is available in the | 111 | The current version of the bonding driver is available in the |
90 | drivers/net/bonding subdirectory of the most recent kernel source | 112 | drivers/net/bonding subdirectory of the most recent kernel source |
91 | (which is available on http://kernel.org). | 113 | (which is available on http://kernel.org). Most users "rolling their |
92 | 114 | own" will want to use the most recent kernel from kernel.org. | |
93 | Prior to the 2.4.11 kernel, the bonding driver was maintained | ||
94 | largely outside the kernel tree; patches for some earlier kernels are | ||
95 | available on the bonding sourceforge site, although those patches are | ||
96 | still several years out of date. Most users will want to use either | ||
97 | the most recent kernel from kernel.org or whatever kernel came with | ||
98 | their distro. | ||
99 | 115 | ||
100 | Configure kernel with "make menuconfig" (or "make xconfig" or | 116 | Configure kernel with "make menuconfig" (or "make xconfig" or |
101 | "make config"), then select "Bonding driver support" in the "Network | 117 | "make config"), then select "Bonding driver support" in the "Network |
@@ -103,8 +119,8 @@ device support" section. It is recommended that you configure the | |||
103 | driver as module since it is currently the only way to pass parameters | 119 | driver as module since it is currently the only way to pass parameters |
104 | to the driver or configure more than one bonding device. | 120 | to the driver or configure more than one bonding device. |
105 | 121 | ||
106 | Build and install the new kernel and modules, then proceed to | 122 | Build and install the new kernel and modules, then continue |
107 | step 2. | 123 | below to install ifenslave. |
108 | 124 | ||
109 | 1.2 Install ifenslave Control Utility | 125 | 1.2 Install ifenslave Control Utility |
110 | ------------------------------------- | 126 | ------------------------------------- |
@@ -147,9 +163,9 @@ default kernel source include directory. | |||
147 | Options for the bonding driver are supplied as parameters to | 163 | Options for the bonding driver are supplied as parameters to |
148 | the bonding module at load time. They may be given as command line | 164 | the bonding module at load time. They may be given as command line |
149 | arguments to the insmod or modprobe command, but are usually specified | 165 | arguments to the insmod or modprobe command, but are usually specified |
150 | in either the /etc/modprobe.conf configuration file, or in a | 166 | in either the /etc/modules.conf or /etc/modprobe.conf configuration |
151 | distro-specific configuration file (some of which are detailed in the | 167 | file, or in a distro-specific configuration file (some of which are |
152 | next section). | 168 | detailed in the next section). |
153 | 169 | ||
154 | The available bonding driver parameters are listed below. If a | 170 | The available bonding driver parameters are listed below. If a |
155 | parameter is not specified the default value is used. When initially | 171 | parameter is not specified the default value is used. When initially |
@@ -162,34 +178,34 @@ degradation will occur during link failures. Very few devices do not | |||
162 | support at least miimon, so there is really no reason not to use it. | 178 | support at least miimon, so there is really no reason not to use it. |
163 | 179 | ||
164 | Options with textual values will accept either the text name | 180 | Options with textual values will accept either the text name |
165 | or, for backwards compatibility, the option value. E.g., | 181 | or, for backwards compatibility, the option value. E.g., |
166 | "mode=802.3ad" and "mode=4" set the same mode. | 182 | "mode=802.3ad" and "mode=4" set the same mode. |
167 | 183 | ||
168 | The parameters are as follows: | 184 | The parameters are as follows: |
169 | 185 | ||
170 | arp_interval | 186 | arp_interval |
171 | 187 | ||
172 | Specifies the ARP monitoring frequency in milli-seconds. If | 188 | Specifies the ARP link monitoring frequency in milliseconds. |
173 | ARP monitoring is used in a load-balancing mode (mode 0 or 2), | 189 | If ARP monitoring is used in an etherchannel compatible mode |
174 | the switch should be configured in a mode that evenly | 190 | (modes 0 and 2), the switch should be configured in a mode |
175 | distributes packets across all links - such as round-robin. If | 191 | that evenly distributes packets across all links. If the |
176 | the switch is configured to distribute the packets in an XOR | 192 | switch is configured to distribute the packets in an XOR |
177 | fashion, all replies from the ARP targets will be received on | 193 | fashion, all replies from the ARP targets will be received on |
178 | the same link which could cause the other team members to | 194 | the same link which could cause the other team members to |
179 | fail. ARP monitoring should not be used in conjunction with | 195 | fail. ARP monitoring should not be used in conjunction with |
180 | miimon. A value of 0 disables ARP monitoring. The default | 196 | miimon. A value of 0 disables ARP monitoring. The default |
181 | value is 0. | 197 | value is 0. |
182 | 198 | ||
183 | arp_ip_target | 199 | arp_ip_target |
184 | 200 | ||
185 | Specifies the ip addresses to use when arp_interval is > 0. | 201 | Specifies the IP addresses to use as ARP monitoring peers when |
186 | These are the targets of the ARP request sent to determine the | 202 | arp_interval is > 0. These are the targets of the ARP request |
187 | health of the link to the targets. Specify these values in | 203 | sent to determine the health of the link to the targets. |
188 | ddd.ddd.ddd.ddd format. Multiple ip adresses must be | 204 | Specify these values in ddd.ddd.ddd.ddd format. Multiple IP |
189 | seperated by a comma. At least one IP address must be given | 205 | addresses must be separated by a comma. At least one IP |
190 | for ARP monitoring to function. The maximum number of targets | 206 | address must be given for ARP monitoring to function. The |
191 | that can be specified is 16. The default value is no IP | 207 | maximum number of targets that can be specified is 16. The |
192 | addresses. | 208 | default value is no IP addresses. |
193 | 209 | ||
194 | downdelay | 210 | downdelay |
195 | 211 | ||
@@ -207,11 +223,13 @@ lacp_rate | |||
207 | are: | 223 | are: |
208 | 224 | ||
209 | slow or 0 | 225 | slow or 0 |
210 | Request partner to transmit LACPDUs every 30 seconds (default) | 226 | Request partner to transmit LACPDUs every 30 seconds |
211 | 227 | ||
212 | fast or 1 | 228 | fast or 1 |
213 | Request partner to transmit LACPDUs every 1 second | 229 | Request partner to transmit LACPDUs every 1 second |
214 | 230 | ||
231 | The default is slow. | ||
232 | |||
215 | max_bonds | 233 | max_bonds |
216 | 234 | ||
217 | Specifies the number of bonding devices to create for this | 235 | Specifies the number of bonding devices to create for this |
@@ -221,10 +239,11 @@ max_bonds | |||
221 | 239 | ||
222 | miimon | 240 | miimon |
223 | 241 | ||
224 | Specifies the frequency in milli-seconds that MII link | 242 | Specifies the MII link monitoring frequency in milliseconds. |
225 | monitoring will occur. A value of zero disables MII link | 243 | This determines how often the link state of each slave is |
226 | monitoring. A value of 100 is a good starting point. The | 244 | inspected for link failures. A value of zero disables MII |
227 | use_carrier option, below, affects how the link state is | 245 | link monitoring. A value of 100 is a good starting point. |
246 | The use_carrier option, below, affects how the link state is | ||
228 | determined. See the High Availability section for additional | 247 | determined. See the High Availability section for additional |
229 | information. The default value is 0. | 248 | information. The default value is 0. |
230 | 249 | ||
@@ -246,17 +265,31 @@ mode | |||
246 | active. A different slave becomes active if, and only | 265 | active. A different slave becomes active if, and only |
247 | if, the active slave fails. The bond's MAC address is | 266 | if, the active slave fails. The bond's MAC address is |
248 | externally visible on only one port (network adapter) | 267 | externally visible on only one port (network adapter) |
249 | to avoid confusing the switch. This mode provides | 268 | to avoid confusing the switch. |
250 | fault tolerance. The primary option affects the | 269 | |
251 | behavior of this mode. | 270 | In bonding version 2.6.2 or later, when a failover |
271 | occurs in active-backup mode, bonding will issue one | ||
272 | or more gratuitous ARPs on the newly active slave. | ||
273 | One gratutious ARP is issued for the bonding master | ||
274 | interface and each VLAN interfaces configured above | ||
275 | it, provided that the interface has at least one IP | ||
276 | address configured. Gratuitous ARPs issued for VLAN | ||
277 | interfaces are tagged with the appropriate VLAN id. | ||
278 | |||
279 | This mode provides fault tolerance. The primary | ||
280 | option, documented below, affects the behavior of this | ||
281 | mode. | ||
252 | 282 | ||
253 | balance-xor or 2 | 283 | balance-xor or 2 |
254 | 284 | ||
255 | XOR policy: Transmit based on [(source MAC address | 285 | XOR policy: Transmit based on the selected transmit |
256 | XOR'd with destination MAC address) modulo slave | 286 | hash policy. The default policy is a simple [(source |
257 | count]. This selects the same slave for each | 287 | MAC address XOR'd with destination MAC address) modulo |
258 | destination MAC address. This mode provides load | 288 | slave count]. Alternate transmit policies may be |
259 | balancing and fault tolerance. | 289 | selected via the xmit_hash_policy option, described |
290 | below. | ||
291 | |||
292 | This mode provides load balancing and fault tolerance. | ||
260 | 293 | ||
261 | broadcast or 3 | 294 | broadcast or 3 |
262 | 295 | ||
@@ -270,7 +303,17 @@ mode | |||
270 | duplex settings. Utilizes all slaves in the active | 303 | duplex settings. Utilizes all slaves in the active |
271 | aggregator according to the 802.3ad specification. | 304 | aggregator according to the 802.3ad specification. |
272 | 305 | ||
273 | Pre-requisites: | 306 | Slave selection for outgoing traffic is done according |
307 | to the transmit hash policy, which may be changed from | ||
308 | the default simple XOR policy via the xmit_hash_policy | ||
309 | option, documented below. Note that not all transmit | ||
310 | policies may be 802.3ad compliant, particularly in | ||
311 | regards to the packet mis-ordering requirements of | ||
312 | section 43.2.4 of the 802.3ad standard. Differing | ||
313 | peer implementations will have varying tolerances for | ||
314 | noncompliance. | ||
315 | |||
316 | Prerequisites: | ||
274 | 317 | ||
275 | 1. Ethtool support in the base drivers for retrieving | 318 | 1. Ethtool support in the base drivers for retrieving |
276 | the speed and duplex of each slave. | 319 | the speed and duplex of each slave. |
@@ -333,7 +376,7 @@ mode | |||
333 | 376 | ||
334 | When a link is reconnected or a new slave joins the | 377 | When a link is reconnected or a new slave joins the |
335 | bond the receive traffic is redistributed among all | 378 | bond the receive traffic is redistributed among all |
336 | active slaves in the bond by intiating ARP Replies | 379 | active slaves in the bond by initiating ARP Replies |
337 | with the selected mac address to each of the | 380 | with the selected mac address to each of the |
338 | clients. The updelay parameter (detailed below) must | 381 | clients. The updelay parameter (detailed below) must |
339 | be set to a value equal or greater than the switch's | 382 | be set to a value equal or greater than the switch's |
@@ -396,6 +439,60 @@ use_carrier | |||
396 | 0 will use the deprecated MII / ETHTOOL ioctls. The default | 439 | 0 will use the deprecated MII / ETHTOOL ioctls. The default |
397 | value is 1. | 440 | value is 1. |
398 | 441 | ||
442 | xmit_hash_policy | ||
443 | |||
444 | Selects the transmit hash policy to use for slave selection in | ||
445 | balance-xor and 802.3ad modes. Possible values are: | ||
446 | |||
447 | layer2 | ||
448 | |||
449 | Uses XOR of hardware MAC addresses to generate the | ||
450 | hash. The formula is | ||
451 | |||
452 | (source MAC XOR destination MAC) modulo slave count | ||
453 | |||
454 | This algorithm will place all traffic to a particular | ||
455 | network peer on the same slave. | ||
456 | |||
457 | This algorithm is 802.3ad compliant. | ||
458 | |||
459 | layer3+4 | ||
460 | |||
461 | This policy uses upper layer protocol information, | ||
462 | when available, to generate the hash. This allows for | ||
463 | traffic to a particular network peer to span multiple | ||
464 | slaves, although a single connection will not span | ||
465 | multiple slaves. | ||
466 | |||
467 | The formula for unfragmented TCP and UDP packets is | ||
468 | |||
469 | ((source port XOR dest port) XOR | ||
470 | ((source IP XOR dest IP) AND 0xffff) | ||
471 | modulo slave count | ||
472 | |||
473 | For fragmented TCP or UDP packets and all other IP | ||
474 | protocol traffic, the source and destination port | ||
475 | information is omitted. For non-IP traffic, the | ||
476 | formula is the same as for the layer2 transmit hash | ||
477 | policy. | ||
478 | |||
479 | This policy is intended to mimic the behavior of | ||
480 | certain switches, notably Cisco switches with PFC2 as | ||
481 | well as some Foundry and IBM products. | ||
482 | |||
483 | This algorithm is not fully 802.3ad compliant. A | ||
484 | single TCP or UDP conversation containing both | ||
485 | fragmented and unfragmented packets will see packets | ||
486 | striped across two interfaces. This may result in out | ||
487 | of order delivery. Most traffic types will not meet | ||
488 | this criteria, as TCP rarely fragments traffic, and | ||
489 | most UDP traffic is not involved in extended | ||
490 | conversations. Other implementations of 802.3ad may | ||
491 | or may not tolerate this noncompliance. | ||
492 | |||
493 | The default value is layer2. This option was added in bonding | ||
494 | version 2.6.3. In earlier versions of bonding, this parameter does | ||
495 | not exist, and the layer2 policy is the only policy. | ||
399 | 496 | ||
400 | 497 | ||
401 | 3. Configuring Bonding Devices | 498 | 3. Configuring Bonding Devices |
@@ -448,8 +545,9 @@ Bonding devices can be managed by hand, however, as follows. | |||
448 | slave devices. On SLES 9, this is most easily done by running the | 545 | slave devices. On SLES 9, this is most easily done by running the |
449 | yast2 sysconfig configuration utility. The goal is for to create an | 546 | yast2 sysconfig configuration utility. The goal is for to create an |
450 | ifcfg-id file for each slave device. The simplest way to accomplish | 547 | ifcfg-id file for each slave device. The simplest way to accomplish |
451 | this is to configure the devices for DHCP. The name of the | 548 | this is to configure the devices for DHCP (this is only to get the |
452 | configuration file for each device will be of the form: | 549 | file ifcfg-id file created; see below for some issues with DHCP). The |
550 | name of the configuration file for each device will be of the form: | ||
453 | 551 | ||
454 | ifcfg-id-xx:xx:xx:xx:xx:xx | 552 | ifcfg-id-xx:xx:xx:xx:xx:xx |
455 | 553 | ||
@@ -459,7 +557,7 @@ the device's permanent MAC address. | |||
459 | Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been | 557 | Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been |
460 | created, it is necessary to edit the configuration files for the slave | 558 | created, it is necessary to edit the configuration files for the slave |
461 | devices (the MAC addresses correspond to those of the slave devices). | 559 | devices (the MAC addresses correspond to those of the slave devices). |
462 | Before editing, the file will contain muliple lines, and will look | 560 | Before editing, the file will contain multiple lines, and will look |
463 | something like this: | 561 | something like this: |
464 | 562 | ||
465 | BOOTPROTO='dhcp' | 563 | BOOTPROTO='dhcp' |
@@ -496,16 +594,11 @@ STARTMODE="onboot" | |||
496 | BONDING_MASTER="yes" | 594 | BONDING_MASTER="yes" |
497 | BONDING_MODULE_OPTS="mode=active-backup miimon=100" | 595 | BONDING_MODULE_OPTS="mode=active-backup miimon=100" |
498 | BONDING_SLAVE0="eth0" | 596 | BONDING_SLAVE0="eth0" |
499 | BONDING_SLAVE1="eth1" | 597 | BONDING_SLAVE1="bus-pci-0000:06:08.1" |
500 | 598 | ||
501 | Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK | 599 | Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK |
502 | values with the appropriate values for your network. | 600 | values with the appropriate values for your network. |
503 | 601 | ||
504 | Note that configuring the bonding device with BOOTPROTO='dhcp' | ||
505 | does not work; the scripts attempt to obtain the device address from | ||
506 | DHCP prior to adding any of the slave devices. Without active slaves, | ||
507 | the DHCP requests are not sent to the network. | ||
508 | |||
509 | The STARTMODE specifies when the device is brought online. | 602 | The STARTMODE specifies when the device is brought online. |
510 | The possible values are: | 603 | The possible values are: |
511 | 604 | ||
@@ -531,9 +624,17 @@ for the bonding mode, link monitoring, and so on here. Do not include | |||
531 | the max_bonds bonding parameter; this will confuse the configuration | 624 | the max_bonds bonding parameter; this will confuse the configuration |
532 | system if you have multiple bonding devices. | 625 | system if you have multiple bonding devices. |
533 | 626 | ||
534 | Finally, supply one BONDING_SLAVEn="ethX" for each slave, | 627 | Finally, supply one BONDING_SLAVEn="slave device" for each |
535 | where "n" is an increasing value, one for each slave, and "ethX" is | 628 | slave. where "n" is an increasing value, one for each slave. The |
536 | the name of the slave device (eth0, eth1, etc). | 629 | "slave device" is either an interface name, e.g., "eth0", or a device |
630 | specifier for the network device. The interface name is easier to | ||
631 | find, but the ethN names are subject to change at boot time if, e.g., | ||
632 | a device early in the sequence has failed. The device specifiers | ||
633 | (bus-pci-0000:06:08.1 in the example above) specify the physical | ||
634 | network device, and will not change unless the device's bus location | ||
635 | changes (for example, it is moved from one PCI slot to another). The | ||
636 | example above uses one of each type for demonstration purposes; most | ||
637 | configurations will choose one or the other for all slave devices. | ||
537 | 638 | ||
538 | When all configuration files have been modified or created, | 639 | When all configuration files have been modified or created, |
539 | networking must be restarted for the configuration changes to take | 640 | networking must be restarted for the configuration changes to take |
@@ -544,7 +645,7 @@ effect. This can be accomplished via the following: | |||
544 | Note that the network control script (/sbin/ifdown) will | 645 | Note that the network control script (/sbin/ifdown) will |
545 | remove the bonding module as part of the network shutdown processing, | 646 | remove the bonding module as part of the network shutdown processing, |
546 | so it is not necessary to remove the module by hand if, e.g., the | 647 | so it is not necessary to remove the module by hand if, e.g., the |
547 | module paramters have changed. | 648 | module parameters have changed. |
548 | 649 | ||
549 | Also, at this writing, YaST/YaST2 will not manage bonding | 650 | Also, at this writing, YaST/YaST2 will not manage bonding |
550 | devices (they do not show bonding interfaces on its list of network | 651 | devices (they do not show bonding interfaces on its list of network |
@@ -559,12 +660,37 @@ format can be found in an example ifcfg template file: | |||
559 | Note that the template does not document the various BONDING_ | 660 | Note that the template does not document the various BONDING_ |
560 | settings described above, but does describe many of the other options. | 661 | settings described above, but does describe many of the other options. |
561 | 662 | ||
663 | 3.1.1 Using DHCP with sysconfig | ||
664 | ------------------------------- | ||
665 | |||
666 | Under sysconfig, configuring a device with BOOTPROTO='dhcp' | ||
667 | will cause it to query DHCP for its IP address information. At this | ||
668 | writing, this does not function for bonding devices; the scripts | ||
669 | attempt to obtain the device address from DHCP prior to adding any of | ||
670 | the slave devices. Without active slaves, the DHCP requests are not | ||
671 | sent to the network. | ||
672 | |||
673 | 3.1.2 Configuring Multiple Bonds with sysconfig | ||
674 | ----------------------------------------------- | ||
675 | |||
676 | The sysconfig network initialization system is capable of | ||
677 | handling multiple bonding devices. All that is necessary is for each | ||
678 | bonding instance to have an appropriately configured ifcfg-bondX file | ||
679 | (as described above). Do not specify the "max_bonds" parameter to any | ||
680 | instance of bonding, as this will confuse sysconfig. If you require | ||
681 | multiple bonding devices with identical parameters, create multiple | ||
682 | ifcfg-bondX files. | ||
683 | |||
684 | Because the sysconfig scripts supply the bonding module | ||
685 | options in the ifcfg-bondX file, it is not necessary to add them to | ||
686 | the system /etc/modules.conf or /etc/modprobe.conf configuration file. | ||
687 | |||
562 | 3.2 Configuration with initscripts support | 688 | 3.2 Configuration with initscripts support |
563 | ------------------------------------------ | 689 | ------------------------------------------ |
564 | 690 | ||
565 | This section applies to distros using a version of initscripts | 691 | This section applies to distros using a version of initscripts |
566 | with bonding support, for example, Red Hat Linux 9 or Red Hat | 692 | with bonding support, for example, Red Hat Linux 9 or Red Hat |
567 | Enterprise Linux version 3. On these systems, the network | 693 | Enterprise Linux version 3 or 4. On these systems, the network |
568 | initialization scripts have some knowledge of bonding, and can be | 694 | initialization scripts have some knowledge of bonding, and can be |
569 | configured to control bonding devices. | 695 | configured to control bonding devices. |
570 | 696 | ||
@@ -614,10 +740,11 @@ USERCTL=no | |||
614 | Be sure to change the networking specific lines (IPADDR, | 740 | Be sure to change the networking specific lines (IPADDR, |
615 | NETMASK, NETWORK and BROADCAST) to match your network configuration. | 741 | NETMASK, NETWORK and BROADCAST) to match your network configuration. |
616 | 742 | ||
617 | Finally, it is necessary to edit /etc/modules.conf to load the | 743 | Finally, it is necessary to edit /etc/modules.conf (or |
618 | bonding module when the bond0 interface is brought up. The following | 744 | /etc/modprobe.conf, depending upon your distro) to load the bonding |
619 | sample lines in /etc/modules.conf will load the bonding module, and | 745 | module with your desired options when the bond0 interface is brought |
620 | select its options: | 746 | up. The following lines in /etc/modules.conf (or modprobe.conf) will |
747 | load the bonding module, and select its options: | ||
621 | 748 | ||
622 | alias bond0 bonding | 749 | alias bond0 bonding |
623 | options bond0 mode=balance-alb miimon=100 | 750 | options bond0 mode=balance-alb miimon=100 |
@@ -629,6 +756,33 @@ options for your configuration. | |||
629 | will restart the networking subsystem and your bond link should be now | 756 | will restart the networking subsystem and your bond link should be now |
630 | up and running. | 757 | up and running. |
631 | 758 | ||
759 | 3.2.1 Using DHCP with initscripts | ||
760 | --------------------------------- | ||
761 | |||
762 | Recent versions of initscripts (the version supplied with | ||
763 | Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do | ||
764 | have support for assigning IP information to bonding devices via DHCP. | ||
765 | |||
766 | To configure bonding for DHCP, configure it as described | ||
767 | above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" | ||
768 | and add a line consisting of "TYPE=Bonding". Note that the TYPE value | ||
769 | is case sensitive. | ||
770 | |||
771 | 3.2.2 Configuring Multiple Bonds with initscripts | ||
772 | ------------------------------------------------- | ||
773 | |||
774 | At this writing, the initscripts package does not directly | ||
775 | support loading the bonding driver multiple times, so the process for | ||
776 | doing so is the same as described in the "Configuring Multiple Bonds | ||
777 | Manually" section, below. | ||
778 | |||
779 | NOTE: It has been observed that some Red Hat supplied kernels | ||
780 | are apparently unable to rename modules at load time (the "-obonding1" | ||
781 | part). Attempts to pass that option to modprobe will produce an | ||
782 | "Operation not permitted" error. This has been reported on some | ||
783 | Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels | ||
784 | exhibiting this problem, it will be impossible to configure multiple | ||
785 | bonds with differing parameters. | ||
632 | 786 | ||
633 | 3.3 Configuring Bonding Manually | 787 | 3.3 Configuring Bonding Manually |
634 | -------------------------------- | 788 | -------------------------------- |
@@ -638,10 +792,11 @@ scripts (the sysconfig or initscripts package) do not have specific | |||
638 | knowledge of bonding. One such distro is SuSE Linux Enterprise Server | 792 | knowledge of bonding. One such distro is SuSE Linux Enterprise Server |
639 | version 8. | 793 | version 8. |
640 | 794 | ||
641 | The general methodology for these systems is to place the | 795 | The general method for these systems is to place the bonding |
642 | bonding module parameters into /etc/modprobe.conf, then add modprobe | 796 | module parameters into /etc/modules.conf or /etc/modprobe.conf (as |
643 | and/or ifenslave commands to the system's global init script. The | 797 | appropriate for the installed distro), then add modprobe and/or |
644 | name of the global init script differs; for sysconfig, it is | 798 | ifenslave commands to the system's global init script. The name of |
799 | the global init script differs; for sysconfig, it is | ||
645 | /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. | 800 | /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. |
646 | 801 | ||
647 | For example, if you wanted to make a simple bond of two e100 | 802 | For example, if you wanted to make a simple bond of two e100 |
@@ -649,7 +804,7 @@ devices (presumed to be eth0 and eth1), and have it persist across | |||
649 | reboots, edit the appropriate file (/etc/init.d/boot.local or | 804 | reboots, edit the appropriate file (/etc/init.d/boot.local or |
650 | /etc/rc.d/rc.local), and add the following: | 805 | /etc/rc.d/rc.local), and add the following: |
651 | 806 | ||
652 | modprobe bonding -obond0 mode=balance-alb miimon=100 | 807 | modprobe bonding mode=balance-alb miimon=100 |
653 | modprobe e100 | 808 | modprobe e100 |
654 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up | 809 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up |
655 | ifenslave bond0 eth0 | 810 | ifenslave bond0 eth0 |
@@ -657,11 +812,7 @@ ifenslave bond0 eth1 | |||
657 | 812 | ||
658 | Replace the example bonding module parameters and bond0 | 813 | Replace the example bonding module parameters and bond0 |
659 | network configuration (IP address, netmask, etc) with the appropriate | 814 | network configuration (IP address, netmask, etc) with the appropriate |
660 | values for your configuration. The above example loads the bonding | 815 | values for your configuration. |
661 | module with the name "bond0," this simplifies the naming if multiple | ||
662 | bonding modules are loaded (each successive instance of the module is | ||
663 | given a different name, and the module instance names match the | ||
664 | bonding interface names). | ||
665 | 816 | ||
666 | Unfortunately, this method will not provide support for the | 817 | Unfortunately, this method will not provide support for the |
667 | ifup and ifdown scripts on the bond devices. To reload the bonding | 818 | ifup and ifdown scripts on the bond devices. To reload the bonding |
@@ -684,20 +835,23 @@ appropriate device driver modules. For our example above, you can do | |||
684 | the following: | 835 | the following: |
685 | 836 | ||
686 | # ifconfig bond0 down | 837 | # ifconfig bond0 down |
687 | # rmmod bond0 | 838 | # rmmod bonding |
688 | # rmmod e100 | 839 | # rmmod e100 |
689 | 840 | ||
690 | Again, for convenience, it may be desirable to create a script | 841 | Again, for convenience, it may be desirable to create a script |
691 | with these commands. | 842 | with these commands. |
692 | 843 | ||
693 | 844 | ||
694 | 3.4 Configuring Multiple Bonds | 845 | 3.3.1 Configuring Multiple Bonds Manually |
695 | ------------------------------ | 846 | ----------------------------------------- |
696 | 847 | ||
697 | This section contains information on configuring multiple | 848 | This section contains information on configuring multiple |
698 | bonding devices with differing options. If you require multiple | 849 | bonding devices with differing options for those systems whose network |
699 | bonding devices, but all with the same options, see the "max_bonds" | 850 | initialization scripts lack support for configuring multiple bonds. |
700 | module paramter, documented above. | 851 | |
852 | If you require multiple bonding devices, but all with the same | ||
853 | options, you may wish to use the "max_bonds" module parameter, | ||
854 | documented above. | ||
701 | 855 | ||
702 | To create multiple bonding devices with differing options, it | 856 | To create multiple bonding devices with differing options, it |
703 | is necessary to load the bonding driver multiple times. Note that | 857 | is necessary to load the bonding driver multiple times. Note that |
@@ -724,11 +878,16 @@ named "bond0" and creates the bond0 device in balance-rr mode with an | |||
724 | miimon of 100. The second instance is named "bond1" and creates the | 878 | miimon of 100. The second instance is named "bond1" and creates the |
725 | bond1 device in balance-alb mode with an miimon of 50. | 879 | bond1 device in balance-alb mode with an miimon of 50. |
726 | 880 | ||
881 | In some circumstances (typically with older distributions), | ||
882 | the above does not work, and the second bonding instance never sees | ||
883 | its options. In that case, the second options line can be substituted | ||
884 | as follows: | ||
885 | |||
886 | install bonding1 /sbin/modprobe bonding -obond1 mode=balance-alb miimon=50 | ||
887 | |||
727 | This may be repeated any number of times, specifying a new and | 888 | This may be repeated any number of times, specifying a new and |
728 | unique name in place of bond0 or bond1 for each instance. | 889 | unique name in place of bond1 for each subsequent instance. |
729 | 890 | ||
730 | When the appropriate module paramters are in place, then | ||
731 | configure bonding according to the instructions for your distro. | ||
732 | 891 | ||
733 | 5. Querying Bonding Configuration | 892 | 5. Querying Bonding Configuration |
734 | ================================= | 893 | ================================= |
@@ -846,8 +1005,8 @@ tagged internally by bonding itself. As a result, bonding must | |||
846 | self generated packets. | 1005 | self generated packets. |
847 | 1006 | ||
848 | For reasons of simplicity, and to support the use of adapters | 1007 | For reasons of simplicity, and to support the use of adapters |
849 | that can do VLAN hardware acceleration offloding, the bonding | 1008 | that can do VLAN hardware acceleration offloading, the bonding |
850 | interface declares itself as fully hardware offloaing capable, it gets | 1009 | interface declares itself as fully hardware offloading capable, it gets |
851 | the add_vid/kill_vid notifications to gather the necessary | 1010 | the add_vid/kill_vid notifications to gather the necessary |
852 | information, and it propagates those actions to the slaves. In case | 1011 | information, and it propagates those actions to the slaves. In case |
853 | of mixed adapter types, hardware accelerated tagged packets that | 1012 | of mixed adapter types, hardware accelerated tagged packets that |
@@ -880,7 +1039,7 @@ bond interface: | |||
880 | matches the hardware address of the VLAN interfaces. | 1039 | matches the hardware address of the VLAN interfaces. |
881 | 1040 | ||
882 | Note that changing a VLAN interface's HW address would set the | 1041 | Note that changing a VLAN interface's HW address would set the |
883 | underlying device -- i.e. the bonding interface -- to promiscouos | 1042 | underlying device -- i.e. the bonding interface -- to promiscuous |
884 | mode, which might not be what you want. | 1043 | mode, which might not be what you want. |
885 | 1044 | ||
886 | 1045 | ||
@@ -923,7 +1082,7 @@ down or have a problem making it unresponsive to ARP requests. Having | |||
923 | an additional target (or several) increases the reliability of the ARP | 1082 | an additional target (or several) increases the reliability of the ARP |
924 | monitoring. | 1083 | monitoring. |
925 | 1084 | ||
926 | Multiple ARP targets must be seperated by commas as follows: | 1085 | Multiple ARP targets must be separated by commas as follows: |
927 | 1086 | ||
928 | # example options for ARP monitoring with three targets | 1087 | # example options for ARP monitoring with three targets |
929 | alias bond0 bonding | 1088 | alias bond0 bonding |
@@ -1045,7 +1204,7 @@ install bonding /sbin/modprobe tg3; /sbin/modprobe e1000; | |||
1045 | This will, when loading the bonding module, rather than | 1204 | This will, when loading the bonding module, rather than |
1046 | performing the normal action, instead execute the provided command. | 1205 | performing the normal action, instead execute the provided command. |
1047 | This command loads the device drivers in the order needed, then calls | 1206 | This command loads the device drivers in the order needed, then calls |
1048 | modprobe with --ingore-install to cause the normal action to then take | 1207 | modprobe with --ignore-install to cause the normal action to then take |
1049 | place. Full documentation on this can be found in the modprobe.conf | 1208 | place. Full documentation on this can be found in the modprobe.conf |
1050 | and modprobe manual pages. | 1209 | and modprobe manual pages. |
1051 | 1210 | ||
@@ -1130,14 +1289,14 @@ association. | |||
1130 | common to enable promiscuous mode on the device, so that all traffic | 1289 | common to enable promiscuous mode on the device, so that all traffic |
1131 | is seen (instead of seeing only traffic destined for the local host). | 1290 | is seen (instead of seeing only traffic destined for the local host). |
1132 | The bonding driver handles promiscuous mode changes to the bonding | 1291 | The bonding driver handles promiscuous mode changes to the bonding |
1133 | master device (e.g., bond0), and propogates the setting to the slave | 1292 | master device (e.g., bond0), and propagates the setting to the slave |
1134 | devices. | 1293 | devices. |
1135 | 1294 | ||
1136 | For the balance-rr, balance-xor, broadcast, and 802.3ad modes, | 1295 | For the balance-rr, balance-xor, broadcast, and 802.3ad modes, |
1137 | the promiscuous mode setting is propogated to all slaves. | 1296 | the promiscuous mode setting is propagated to all slaves. |
1138 | 1297 | ||
1139 | For the active-backup, balance-tlb and balance-alb modes, the | 1298 | For the active-backup, balance-tlb and balance-alb modes, the |
1140 | promiscuous mode setting is propogated only to the active slave. | 1299 | promiscuous mode setting is propagated only to the active slave. |
1141 | 1300 | ||
1142 | For balance-tlb mode, the active slave is the slave currently | 1301 | For balance-tlb mode, the active slave is the slave currently |
1143 | receiving inbound traffic. | 1302 | receiving inbound traffic. |
@@ -1148,46 +1307,182 @@ sending to peers that are unassigned or if the load is unbalanced. | |||
1148 | 1307 | ||
1149 | For the active-backup, balance-tlb and balance-alb modes, when | 1308 | For the active-backup, balance-tlb and balance-alb modes, when |
1150 | the active slave changes (e.g., due to a link failure), the | 1309 | the active slave changes (e.g., due to a link failure), the |
1151 | promiscuous setting will be propogated to the new active slave. | 1310 | promiscuous setting will be propagated to the new active slave. |
1152 | 1311 | ||
1153 | 12. High Availability Information | 1312 | 12. Configuring Bonding for High Availability |
1154 | ================================= | 1313 | ============================================= |
1155 | 1314 | ||
1156 | High Availability refers to configurations that provide | 1315 | High Availability refers to configurations that provide |
1157 | maximum network availability by having redundant or backup devices, | 1316 | maximum network availability by having redundant or backup devices, |
1158 | links and switches between the host and the rest of the world. | 1317 | links or switches between the host and the rest of the world. The |
1159 | 1318 | goal is to provide the maximum availability of network connectivity | |
1160 | There are currently two basic methods for configuring to | 1319 | (i.e., the network always works), even though other configurations |
1161 | maximize availability. They are dependent on the network topology and | 1320 | could provide higher throughput. |
1162 | the primary goal of the configuration, but in general, a configuration | ||
1163 | can be optimized for maximum available bandwidth, or for maximum | ||
1164 | network availability. | ||
1165 | 1321 | ||
1166 | 12.1 High Availability in a Single Switch Topology | 1322 | 12.1 High Availability in a Single Switch Topology |
1167 | -------------------------------------------------- | 1323 | -------------------------------------------------- |
1168 | 1324 | ||
1169 | If two hosts (or a host and a switch) are directly connected | 1325 | If two hosts (or a host and a single switch) are directly |
1170 | via multiple physical links, then there is no network availability | 1326 | connected via multiple physical links, then there is no availability |
1171 | penalty for optimizing for maximum bandwidth: there is only one switch | 1327 | penalty to optimizing for maximum bandwidth. In this case, there is |
1172 | (or peer), so if it fails, you have no alternative access to fail over | 1328 | only one switch (or peer), so if it fails, there is no alternative |
1173 | to. | 1329 | access to fail over to. Additionally, the bonding load balance modes |
1330 | support link monitoring of their members, so if individual links fail, | ||
1331 | the load will be rebalanced across the remaining devices. | ||
1332 | |||
1333 | See Section 13, "Configuring Bonding for Maximum Throughput" | ||
1334 | for information on configuring bonding with one peer device. | ||
1335 | |||
1336 | 12.2 High Availability in a Multiple Switch Topology | ||
1337 | ---------------------------------------------------- | ||
1338 | |||
1339 | With multiple switches, the configuration of bonding and the | ||
1340 | network changes dramatically. In multiple switch topologies, there is | ||
1341 | a trade off between network availability and usable bandwidth. | ||
1342 | |||
1343 | Below is a sample network, configured to maximize the | ||
1344 | availability of the network: | ||
1174 | 1345 | ||
1175 | Example 1 : host to switch (or other host) | 1346 | | | |
1347 | |port3 port3| | ||
1348 | +-----+----+ +-----+----+ | ||
1349 | | |port2 ISL port2| | | ||
1350 | | switch A +--------------------------+ switch B | | ||
1351 | | | | | | ||
1352 | +-----+----+ +-----++---+ | ||
1353 | |port1 port1| | ||
1354 | | +-------+ | | ||
1355 | +-------------+ host1 +---------------+ | ||
1356 | eth0 +-------+ eth1 | ||
1176 | 1357 | ||
1177 | +----------+ +----------+ | 1358 | In this configuration, there is a link between the two |
1178 | | |eth0 eth0| switch | | 1359 | switches (ISL, or inter switch link), and multiple ports connecting to |
1179 | | Host A +--------------------------+ or | | 1360 | the outside world ("port3" on each switch). There is no technical |
1180 | | +--------------------------+ other | | 1361 | reason that this could not be extended to a third switch. |
1181 | | |eth1 eth1| host | | ||
1182 | +----------+ +----------+ | ||
1183 | 1362 | ||
1363 | 12.2.1 HA Bonding Mode Selection for Multiple Switch Topology | ||
1364 | ------------------------------------------------------------- | ||
1184 | 1365 | ||
1185 | 12.1.1 Bonding Mode Selection for single switch topology | 1366 | In a topology such as the example above, the active-backup and |
1186 | -------------------------------------------------------- | 1367 | broadcast modes are the only useful bonding modes when optimizing for |
1368 | availability; the other modes require all links to terminate on the | ||
1369 | same peer for them to behave rationally. | ||
1370 | |||
1371 | active-backup: This is generally the preferred mode, particularly if | ||
1372 | the switches have an ISL and play together well. If the | ||
1373 | network configuration is such that one switch is specifically | ||
1374 | a backup switch (e.g., has lower capacity, higher cost, etc), | ||
1375 | then the primary option can be used to insure that the | ||
1376 | preferred link is always used when it is available. | ||
1377 | |||
1378 | broadcast: This mode is really a special purpose mode, and is suitable | ||
1379 | only for very specific needs. For example, if the two | ||
1380 | switches are not connected (no ISL), and the networks beyond | ||
1381 | them are totally independent. In this case, if it is | ||
1382 | necessary for some specific one-way traffic to reach both | ||
1383 | independent networks, then the broadcast mode may be suitable. | ||
1384 | |||
1385 | 12.2.2 HA Link Monitoring Selection for Multiple Switch Topology | ||
1386 | ---------------------------------------------------------------- | ||
1387 | |||
1388 | The choice of link monitoring ultimately depends upon your | ||
1389 | switch. If the switch can reliably fail ports in response to other | ||
1390 | failures, then either the MII or ARP monitors should work. For | ||
1391 | example, in the above example, if the "port3" link fails at the remote | ||
1392 | end, the MII monitor has no direct means to detect this. The ARP | ||
1393 | monitor could be configured with a target at the remote end of port3, | ||
1394 | thus detecting that failure without switch support. | ||
1395 | |||
1396 | In general, however, in a multiple switch topology, the ARP | ||
1397 | monitor can provide a higher level of reliability in detecting end to | ||
1398 | end connectivity failures (which may be caused by the failure of any | ||
1399 | individual component to pass traffic for any reason). Additionally, | ||
1400 | the ARP monitor should be configured with multiple targets (at least | ||
1401 | one for each switch in the network). This will insure that, | ||
1402 | regardless of which switch is active, the ARP monitor has a suitable | ||
1403 | target to query. | ||
1404 | |||
1405 | |||
1406 | 13. Configuring Bonding for Maximum Throughput | ||
1407 | ============================================== | ||
1408 | |||
1409 | 13.1 Maximizing Throughput in a Single Switch Topology | ||
1410 | ------------------------------------------------------ | ||
1411 | |||
1412 | In a single switch configuration, the best method to maximize | ||
1413 | throughput depends upon the application and network environment. The | ||
1414 | various load balancing modes each have strengths and weaknesses in | ||
1415 | different environments, as detailed below. | ||
1416 | |||
1417 | For this discussion, we will break down the topologies into | ||
1418 | two categories. Depending upon the destination of most traffic, we | ||
1419 | categorize them into either "gatewayed" or "local" configurations. | ||
1420 | |||
1421 | In a gatewayed configuration, the "switch" is acting primarily | ||
1422 | as a router, and the majority of traffic passes through this router to | ||
1423 | other networks. An example would be the following: | ||
1424 | |||
1425 | |||
1426 | +----------+ +----------+ | ||
1427 | | |eth0 port1| | to other networks | ||
1428 | | Host A +---------------------+ router +-------------------> | ||
1429 | | +---------------------+ | Hosts B and C are out | ||
1430 | | |eth1 port2| | here somewhere | ||
1431 | +----------+ +----------+ | ||
1432 | |||
1433 | The router may be a dedicated router device, or another host | ||
1434 | acting as a gateway. For our discussion, the important point is that | ||
1435 | the majority of traffic from Host A will pass through the router to | ||
1436 | some other network before reaching its final destination. | ||
1437 | |||
1438 | In a gatewayed network configuration, although Host A may | ||
1439 | communicate with many other systems, all of its traffic will be sent | ||
1440 | and received via one other peer on the local network, the router. | ||
1441 | |||
1442 | Note that the case of two systems connected directly via | ||
1443 | multiple physical links is, for purposes of configuring bonding, the | ||
1444 | same as a gatewayed configuration. In that case, it happens that all | ||
1445 | traffic is destined for the "gateway" itself, not some other network | ||
1446 | beyond the gateway. | ||
1447 | |||
1448 | In a local configuration, the "switch" is acting primarily as | ||
1449 | a switch, and the majority of traffic passes through this switch to | ||
1450 | reach other stations on the same network. An example would be the | ||
1451 | following: | ||
1452 | |||
1453 | +----------+ +----------+ +--------+ | ||
1454 | | |eth0 port1| +-------+ Host B | | ||
1455 | | Host A +------------+ switch |port3 +--------+ | ||
1456 | | +------------+ | +--------+ | ||
1457 | | |eth1 port2| +------------------+ Host C | | ||
1458 | +----------+ +----------+port4 +--------+ | ||
1459 | |||
1460 | |||
1461 | Again, the switch may be a dedicated switch device, or another | ||
1462 | host acting as a gateway. For our discussion, the important point is | ||
1463 | that the majority of traffic from Host A is destined for other hosts | ||
1464 | on the same local network (Hosts B and C in the above example). | ||
1465 | |||
1466 | In summary, in a gatewayed configuration, traffic to and from | ||
1467 | the bonded device will be to the same MAC level peer on the network | ||
1468 | (the gateway itself, i.e., the router), regardless of its final | ||
1469 | destination. In a local configuration, traffic flows directly to and | ||
1470 | from the final destinations, thus, each destination (Host B, Host C) | ||
1471 | will be addressed directly by their individual MAC addresses. | ||
1472 | |||
1473 | This distinction between a gatewayed and a local network | ||
1474 | configuration is important because many of the load balancing modes | ||
1475 | available use the MAC addresses of the local network source and | ||
1476 | destination to make load balancing decisions. The behavior of each | ||
1477 | mode is described below. | ||
1478 | |||
1479 | |||
1480 | 13.1.1 MT Bonding Mode Selection for Single Switch Topology | ||
1481 | ----------------------------------------------------------- | ||
1187 | 1482 | ||
1188 | This configuration is the easiest to set up and to understand, | 1483 | This configuration is the easiest to set up and to understand, |
1189 | although you will have to decide which bonding mode best suits your | 1484 | although you will have to decide which bonding mode best suits your |
1190 | needs. The tradeoffs for each mode are detailed below: | 1485 | needs. The trade offs for each mode are detailed below: |
1191 | 1486 | ||
1192 | balance-rr: This mode is the only mode that will permit a single | 1487 | balance-rr: This mode is the only mode that will permit a single |
1193 | TCP/IP connection to stripe traffic across multiple | 1488 | TCP/IP connection to stripe traffic across multiple |
@@ -1206,6 +1501,23 @@ balance-rr: This mode is the only mode that will permit a single | |||
1206 | interface's worth of throughput, even after adjusting | 1501 | interface's worth of throughput, even after adjusting |
1207 | tcp_reordering. | 1502 | tcp_reordering. |
1208 | 1503 | ||
1504 | Note that this out of order delivery occurs when both the | ||
1505 | sending and receiving systems are utilizing a multiple | ||
1506 | interface bond. Consider a configuration in which a | ||
1507 | balance-rr bond feeds into a single higher capacity network | ||
1508 | channel (e.g., multiple 100Mb/sec ethernets feeding a single | ||
1509 | gigabit ethernet via an etherchannel capable switch). In this | ||
1510 | configuration, traffic sent from the multiple 100Mb devices to | ||
1511 | a destination connected to the gigabit device will not see | ||
1512 | packets out of order. However, traffic sent from the gigabit | ||
1513 | device to the multiple 100Mb devices may or may not see | ||
1514 | traffic out of order, depending upon the balance policy of the | ||
1515 | switch. Many switches do not support any modes that stripe | ||
1516 | traffic (instead choosing a port based upon IP or MAC level | ||
1517 | addresses); for those devices, traffic flowing from the | ||
1518 | gigabit device to the many 100Mb devices will only utilize one | ||
1519 | interface. | ||
1520 | |||
1209 | If you are utilizing protocols other than TCP/IP, UDP for | 1521 | If you are utilizing protocols other than TCP/IP, UDP for |
1210 | example, and your application can tolerate out of order | 1522 | example, and your application can tolerate out of order |
1211 | delivery, then this mode can allow for single stream datagram | 1523 | delivery, then this mode can allow for single stream datagram |
@@ -1220,16 +1532,21 @@ active-backup: There is not much advantage in this network topology to | |||
1220 | connected to the same peer as the primary. In this case, a | 1532 | connected to the same peer as the primary. In this case, a |
1221 | load balancing mode (with link monitoring) will provide the | 1533 | load balancing mode (with link monitoring) will provide the |
1222 | same level of network availability, but with increased | 1534 | same level of network availability, but with increased |
1223 | available bandwidth. On the plus side, it does not require | 1535 | available bandwidth. On the plus side, active-backup mode |
1224 | any configuration of the switch. | 1536 | does not require any configuration of the switch, so it may |
1537 | have value if the hardware available does not support any of | ||
1538 | the load balance modes. | ||
1225 | 1539 | ||
1226 | balance-xor: This mode will limit traffic such that packets destined | 1540 | balance-xor: This mode will limit traffic such that packets destined |
1227 | for specific peers will always be sent over the same | 1541 | for specific peers will always be sent over the same |
1228 | interface. Since the destination is determined by the MAC | 1542 | interface. Since the destination is determined by the MAC |
1229 | addresses involved, this may be desirable if you have a large | 1543 | addresses involved, this mode works best in a "local" network |
1230 | network with many hosts. It is likely to be suboptimal if all | 1544 | configuration (as described above), with destinations all on |
1231 | your traffic is passed through a single router, however. As | 1545 | the same local network. This mode is likely to be suboptimal |
1232 | with balance-rr, the switch ports need to be configured for | 1546 | if all your traffic is passed through a single router (i.e., a |
1547 | "gatewayed" network configuration, as described above). | ||
1548 | |||
1549 | As with balance-rr, the switch ports need to be configured for | ||
1233 | "etherchannel" or "trunking." | 1550 | "etherchannel" or "trunking." |
1234 | 1551 | ||
1235 | broadcast: Like active-backup, there is not much advantage to this | 1552 | broadcast: Like active-backup, there is not much advantage to this |
@@ -1241,122 +1558,131 @@ broadcast: Like active-backup, there is not much advantage to this | |||
1241 | protocol includes automatic configuration of the aggregates, | 1558 | protocol includes automatic configuration of the aggregates, |
1242 | so minimal manual configuration of the switch is needed | 1559 | so minimal manual configuration of the switch is needed |
1243 | (typically only to designate that some set of devices is | 1560 | (typically only to designate that some set of devices is |
1244 | usable for 802.3ad). The 802.3ad standard also mandates that | 1561 | available for 802.3ad). The 802.3ad standard also mandates |
1245 | frames be delivered in order (within certain limits), so in | 1562 | that frames be delivered in order (within certain limits), so |
1246 | general single connections will not see misordering of | 1563 | in general single connections will not see misordering of |
1247 | packets. The 802.3ad mode does have some drawbacks: the | 1564 | packets. The 802.3ad mode does have some drawbacks: the |
1248 | standard mandates that all devices in the aggregate operate at | 1565 | standard mandates that all devices in the aggregate operate at |
1249 | the same speed and duplex. Also, as with all bonding load | 1566 | the same speed and duplex. Also, as with all bonding load |
1250 | balance modes other than balance-rr, no single connection will | 1567 | balance modes other than balance-rr, no single connection will |
1251 | be able to utilize more than a single interface's worth of | 1568 | be able to utilize more than a single interface's worth of |
1252 | bandwidth. Additionally, the linux bonding 802.3ad | 1569 | bandwidth. |
1253 | implementation distributes traffic by peer (using an XOR of | 1570 | |
1254 | MAC addresses), so in general all traffic to a particular | 1571 | Additionally, the linux bonding 802.3ad implementation |
1255 | destination will use the same interface. Finally, the 802.3ad | 1572 | distributes traffic by peer (using an XOR of MAC addresses), |
1256 | mode mandates the use of the MII monitor, therefore, the ARP | 1573 | so in a "gatewayed" configuration, all outgoing traffic will |
1257 | monitor is not available in this mode. | 1574 | generally use the same device. Incoming traffic may also end |
1258 | 1575 | up on a single device, but that is dependent upon the | |
1259 | balance-tlb: This mode is also a good choice for this type of | 1576 | balancing policy of the peer's 8023.ad implementation. In a |
1260 | topology. It has no special switch configuration | 1577 | "local" configuration, traffic will be distributed across the |
1261 | requirements, and balances outgoing traffic by peer, in a | 1578 | devices in the bond. |
1262 | vaguely intelligent manner (not a simple XOR as in balance-xor | 1579 | |
1263 | or 802.3ad mode), so that unlucky MAC addresses will not all | 1580 | Finally, the 802.3ad mode mandates the use of the MII monitor, |
1264 | "bunch up" on a single interface. Interfaces may be of | 1581 | therefore, the ARP monitor is not available in this mode. |
1265 | differing speeds. On the down side, in this mode all incoming | 1582 | |
1266 | traffic arrives over a single interface, this mode requires | 1583 | balance-tlb: The balance-tlb mode balances outgoing traffic by peer. |
1267 | certain ethtool support in the network device driver of the | 1584 | Since the balancing is done according to MAC address, in a |
1268 | slave interfaces, and the ARP monitor is not available. | 1585 | "gatewayed" configuration (as described above), this mode will |
1269 | 1586 | send all traffic across a single device. However, in a | |
1270 | balance-alb: This mode is everything that balance-tlb is, and more. It | 1587 | "local" network configuration, this mode balances multiple |
1271 | has all of the features (and restrictions) of balance-tlb, and | 1588 | local network peers across devices in a vaguely intelligent |
1272 | will also balance incoming traffic from peers (as described in | 1589 | manner (not a simple XOR as in balance-xor or 802.3ad mode), |
1273 | the Bonding Module Options section, above). The only extra | 1590 | so that mathematically unlucky MAC addresses (i.e., ones that |
1274 | down side to this mode is that the network device driver must | 1591 | XOR to the same value) will not all "bunch up" on a single |
1275 | support changing the hardware address while the device is | 1592 | interface. |
1276 | open. | 1593 | |
1277 | 1594 | Unlike 802.3ad, interfaces may be of differing speeds, and no | |
1278 | 12.1.2 Link Monitoring for Single Switch Topology | 1595 | special switch configuration is required. On the down side, |
1279 | ------------------------------------------------- | 1596 | in this mode all incoming traffic arrives over a single |
1597 | interface, this mode requires certain ethtool support in the | ||
1598 | network device driver of the slave interfaces, and the ARP | ||
1599 | monitor is not available. | ||
1600 | |||
1601 | balance-alb: This mode is everything that balance-tlb is, and more. | ||
1602 | It has all of the features (and restrictions) of balance-tlb, | ||
1603 | and will also balance incoming traffic from local network | ||
1604 | peers (as described in the Bonding Module Options section, | ||
1605 | above). | ||
1606 | |||
1607 | The only additional down side to this mode is that the network | ||
1608 | device driver must support changing the hardware address while | ||
1609 | the device is open. | ||
1610 | |||
1611 | 13.1.2 MT Link Monitoring for Single Switch Topology | ||
1612 | ---------------------------------------------------- | ||
1280 | 1613 | ||
1281 | The choice of link monitoring may largely depend upon which | 1614 | The choice of link monitoring may largely depend upon which |
1282 | mode you choose to use. The more advanced load balancing modes do not | 1615 | mode you choose to use. The more advanced load balancing modes do not |
1283 | support the use of the ARP monitor, and are thus restricted to using | 1616 | support the use of the ARP monitor, and are thus restricted to using |
1284 | the MII monitor (which does not provide as high a level of assurance | 1617 | the MII monitor (which does not provide as high a level of end to end |
1285 | as the ARP monitor). | 1618 | assurance as the ARP monitor). |
1286 | 1619 | ||
1287 | 1620 | 13.2 Maximum Throughput in a Multiple Switch Topology | |
1288 | 12.2 High Availability in a Multiple Switch Topology | 1621 | ----------------------------------------------------- |
1289 | ---------------------------------------------------- | 1622 | |
1290 | 1623 | Multiple switches may be utilized to optimize for throughput | |
1291 | With multiple switches, the configuration of bonding and the | 1624 | when they are configured in parallel as part of an isolated network |
1292 | network changes dramatically. In multiple switch topologies, there is | 1625 | between two or more systems, for example: |
1293 | a tradeoff between network availability and usable bandwidth. | 1626 | |
1294 | 1627 | +-----------+ | |
1295 | Below is a sample network, configured to maximize the | 1628 | | Host A | |
1296 | availability of the network: | 1629 | +-+---+---+-+ |
1297 | 1630 | | | | | |
1298 | | | | 1631 | +--------+ | +---------+ |
1299 | |port3 port3| | 1632 | | | | |
1300 | +-----+----+ +-----+----+ | 1633 | +------+---+ +-----+----+ +-----+----+ |
1301 | | |port2 ISL port2| | | 1634 | | Switch A | | Switch B | | Switch C | |
1302 | | switch A +--------------------------+ switch B | | 1635 | +------+---+ +-----+----+ +-----+----+ |
1303 | | | | | | 1636 | | | | |
1304 | +-----+----+ +-----++---+ | 1637 | +--------+ | +---------+ |
1305 | |port1 port1| | 1638 | | | | |
1306 | | +-------+ | | 1639 | +-+---+---+-+ |
1307 | +-------------+ host1 +---------------+ | 1640 | | Host B | |
1308 | eth0 +-------+ eth1 | 1641 | +-----------+ |
1309 | 1642 | ||
1310 | In this configuration, there is a link between the two | 1643 | In this configuration, the switches are isolated from one |
1311 | switches (ISL, or inter switch link), and multiple ports connecting to | 1644 | another. One reason to employ a topology such as this is for an |
1312 | the outside world ("port3" on each switch). There is no technical | 1645 | isolated network with many hosts (a cluster configured for high |
1313 | reason that this could not be extended to a third switch. | 1646 | performance, for example), using multiple smaller switches can be more |
1314 | 1647 | cost effective than a single larger switch, e.g., on a network with 24 | |
1315 | 12.2.1 Bonding Mode Selection for Multiple Switch Topology | 1648 | hosts, three 24 port switches can be significantly less expensive than |
1316 | ---------------------------------------------------------- | 1649 | a single 72 port switch. |
1317 | 1650 | ||
1318 | In a topology such as this, the active-backup and broadcast | 1651 | If access beyond the network is required, an individual host |
1319 | modes are the only useful bonding modes; the other modes require all | 1652 | can be equipped with an additional network device connected to an |
1320 | links to terminate on the same peer for them to behave rationally. | 1653 | external network; this host then additionally acts as a gateway. |
1321 | 1654 | ||
1322 | active-backup: This is generally the preferred mode, particularly if | 1655 | 13.2.1 MT Bonding Mode Selection for Multiple Switch Topology |
1323 | the switches have an ISL and play together well. If the | ||
1324 | network configuration is such that one switch is specifically | ||
1325 | a backup switch (e.g., has lower capacity, higher cost, etc), | ||
1326 | then the primary option can be used to insure that the | ||
1327 | preferred link is always used when it is available. | ||
1328 | |||
1329 | broadcast: This mode is really a special purpose mode, and is suitable | ||
1330 | only for very specific needs. For example, if the two | ||
1331 | switches are not connected (no ISL), and the networks beyond | ||
1332 | them are totally independant. In this case, if it is | ||
1333 | necessary for some specific one-way traffic to reach both | ||
1334 | independent networks, then the broadcast mode may be suitable. | ||
1335 | |||
1336 | 12.2.2 Link Monitoring Selection for Multiple Switch Topology | ||
1337 | ------------------------------------------------------------- | 1656 | ------------------------------------------------------------- |
1338 | 1657 | ||
1339 | The choice of link monitoring ultimately depends upon your | 1658 | In actual practice, the bonding mode typically employed in |
1340 | switch. If the switch can reliably fail ports in response to other | 1659 | configurations of this type is balance-rr. Historically, in this |
1341 | failures, then either the MII or ARP monitors should work. For | 1660 | network configuration, the usual caveats about out of order packet |
1342 | example, in the above example, if the "port3" link fails at the remote | 1661 | delivery are mitigated by the use of network adapters that do not do |
1343 | end, the MII monitor has no direct means to detect this. The ARP | 1662 | any kind of packet coalescing (via the use of NAPI, or because the |
1344 | monitor could be configured with a target at the remote end of port3, | 1663 | device itself does not generate interrupts until some number of |
1345 | thus detecting that failure without switch support. | 1664 | packets has arrived). When employed in this fashion, the balance-rr |
1665 | mode allows individual connections between two hosts to effectively | ||
1666 | utilize greater than one interface's bandwidth. | ||
1346 | 1667 | ||
1347 | In general, however, in a multiple switch topology, the ARP | 1668 | 13.2.2 MT Link Monitoring for Multiple Switch Topology |
1348 | monitor can provide a higher level of reliability in detecting link | 1669 | ------------------------------------------------------ |
1349 | failures. Additionally, it should be configured with multiple targets | ||
1350 | (at least one for each switch in the network). This will insure that, | ||
1351 | regardless of which switch is active, the ARP monitor has a suitable | ||
1352 | target to query. | ||
1353 | 1670 | ||
1671 | Again, in actual practice, the MII monitor is most often used | ||
1672 | in this configuration, as performance is given preference over | ||
1673 | availability. The ARP monitor will function in this topology, but its | ||
1674 | advantages over the MII monitor are mitigated by the volume of probes | ||
1675 | needed as the number of systems involved grows (remember that each | ||
1676 | host in the network is configured with bonding). | ||
1354 | 1677 | ||
1355 | 12.3 Switch Behavior Issues for High Availability | 1678 | 14. Switch Behavior Issues |
1356 | ------------------------------------------------- | 1679 | ========================== |
1357 | 1680 | ||
1358 | You may encounter issues with the timing of link up and down | 1681 | 14.1 Link Establishment and Failover Delays |
1359 | reporting by the switch. | 1682 | ------------------------------------------- |
1683 | |||
1684 | Some switches exhibit undesirable behavior with regard to the | ||
1685 | timing of link up and down reporting by the switch. | ||
1360 | 1686 | ||
1361 | First, when a link comes up, some switches may indicate that | 1687 | First, when a link comes up, some switches may indicate that |
1362 | the link is up (carrier available), but not pass traffic over the | 1688 | the link is up (carrier available), but not pass traffic over the |
@@ -1370,30 +1696,70 @@ relevant interface(s). | |||
1370 | Second, some switches may "bounce" the link state one or more | 1696 | Second, some switches may "bounce" the link state one or more |
1371 | times while a link is changing state. This occurs most commonly while | 1697 | times while a link is changing state. This occurs most commonly while |
1372 | the switch is initializing. Again, an appropriate updelay value may | 1698 | the switch is initializing. Again, an appropriate updelay value may |
1373 | help, but note that if all links are down, then updelay is ignored | 1699 | help. |
1374 | when any link becomes active (the slave closest to completing its | ||
1375 | updelay is chosen). | ||
1376 | 1700 | ||
1377 | Note that when a bonding interface has no active links, the | 1701 | Note that when a bonding interface has no active links, the |
1378 | driver will immediately reuse the first link that goes up, even if | 1702 | driver will immediately reuse the first link that goes up, even if the |
1379 | updelay parameter was specified. If there are slave interfaces | 1703 | updelay parameter has been specified (the updelay is ignored in this |
1380 | waiting for the updelay timeout to expire, the interface that first | 1704 | case). If there are slave interfaces waiting for the updelay timeout |
1381 | went into that state will be immediately reused. This reduces down | 1705 | to expire, the interface that first went into that state will be |
1382 | time of the network if the value of updelay has been overestimated. | 1706 | immediately reused. This reduces down time of the network if the |
1707 | value of updelay has been overestimated, and since this occurs only in | ||
1708 | cases with no connectivity, there is no additional penalty for | ||
1709 | ignoring the updelay. | ||
1383 | 1710 | ||
1384 | In addition to the concerns about switch timings, if your | 1711 | In addition to the concerns about switch timings, if your |
1385 | switches take a long time to go into backup mode, it may be desirable | 1712 | switches take a long time to go into backup mode, it may be desirable |
1386 | to not activate a backup interface immediately after a link goes down. | 1713 | to not activate a backup interface immediately after a link goes down. |
1387 | Failover may be delayed via the downdelay bonding module option. | 1714 | Failover may be delayed via the downdelay bonding module option. |
1388 | 1715 | ||
1389 | 13. Hardware Specific Considerations | 1716 | 14.2 Duplicated Incoming Packets |
1717 | -------------------------------- | ||
1718 | |||
1719 | It is not uncommon to observe a short burst of duplicated | ||
1720 | traffic when the bonding device is first used, or after it has been | ||
1721 | idle for some period of time. This is most easily observed by issuing | ||
1722 | a "ping" to some other host on the network, and noticing that the | ||
1723 | output from ping flags duplicates (typically one per slave). | ||
1724 | |||
1725 | For example, on a bond in active-backup mode with five slaves | ||
1726 | all connected to one switch, the output may appear as follows: | ||
1727 | |||
1728 | # ping -n 10.0.4.2 | ||
1729 | PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data. | ||
1730 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms | ||
1731 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1732 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1733 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1734 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1735 | 64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms | ||
1736 | 64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms | ||
1737 | 64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms | ||
1738 | |||
1739 | This is not due to an error in the bonding driver, rather, it | ||
1740 | is a side effect of how many switches update their MAC forwarding | ||
1741 | tables. Initially, the switch does not associate the MAC address in | ||
1742 | the packet with a particular switch port, and so it may send the | ||
1743 | traffic to all ports until its MAC forwarding table is updated. Since | ||
1744 | the interfaces attached to the bond may occupy multiple ports on a | ||
1745 | single switch, when the switch (temporarily) floods the traffic to all | ||
1746 | ports, the bond device receives multiple copies of the same packet | ||
1747 | (one per slave device). | ||
1748 | |||
1749 | The duplicated packet behavior is switch dependent, some | ||
1750 | switches exhibit this, and some do not. On switches that display this | ||
1751 | behavior, it can be induced by clearing the MAC forwarding table (on | ||
1752 | most Cisco switches, the privileged command "clear mac address-table | ||
1753 | dynamic" will accomplish this). | ||
1754 | |||
1755 | 15. Hardware Specific Considerations | ||
1390 | ==================================== | 1756 | ==================================== |
1391 | 1757 | ||
1392 | This section contains additional information for configuring | 1758 | This section contains additional information for configuring |
1393 | bonding on specific hardware platforms, or for interfacing bonding | 1759 | bonding on specific hardware platforms, or for interfacing bonding |
1394 | with particular switches or other devices. | 1760 | with particular switches or other devices. |
1395 | 1761 | ||
1396 | 13.1 IBM BladeCenter | 1762 | 15.1 IBM BladeCenter |
1397 | -------------------- | 1763 | -------------------- |
1398 | 1764 | ||
1399 | This applies to the JS20 and similar systems. | 1765 | This applies to the JS20 and similar systems. |
@@ -1407,12 +1773,12 @@ JS20 network adapter information | |||
1407 | -------------------------------- | 1773 | -------------------------------- |
1408 | 1774 | ||
1409 | All JS20s come with two Broadcom Gigabit Ethernet ports | 1775 | All JS20s come with two Broadcom Gigabit Ethernet ports |
1410 | integrated on the planar. In the BladeCenter chassis, the eth0 port | 1776 | integrated on the planar (that's "motherboard" in IBM-speak). In the |
1411 | of all JS20 blades is hard wired to I/O Module #1; similarly, all eth1 | 1777 | BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to |
1412 | ports are wired to I/O Module #2. An add-on Broadcom daughter card | 1778 | I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2. |
1413 | can be installed on a JS20 to provide two more Gigabit Ethernet ports. | 1779 | An add-on Broadcom daughter card can be installed on a JS20 to provide |
1414 | These ports, eth2 and eth3, are wired to I/O Modules 3 and 4, | 1780 | two more Gigabit Ethernet ports. These ports, eth2 and eth3, are |
1415 | respectively. | 1781 | wired to I/O Modules 3 and 4, respectively. |
1416 | 1782 | ||
1417 | Each I/O Module may contain either a switch or a passthrough | 1783 | Each I/O Module may contain either a switch or a passthrough |
1418 | module (which allows ports to be directly connected to an external | 1784 | module (which allows ports to be directly connected to an external |
@@ -1432,29 +1798,30 @@ BladeCenter networking configuration | |||
1432 | of ways, this discussion will be confined to describing basic | 1798 | of ways, this discussion will be confined to describing basic |
1433 | configurations. | 1799 | configurations. |
1434 | 1800 | ||
1435 | Normally, Ethernet Switch Modules (ESM) are used in I/O | 1801 | Normally, Ethernet Switch Modules (ESMs) are used in I/O |
1436 | modules 1 and 2. In this configuration, the eth0 and eth1 ports of a | 1802 | modules 1 and 2. In this configuration, the eth0 and eth1 ports of a |
1437 | JS20 will be connected to different internal switches (in the | 1803 | JS20 will be connected to different internal switches (in the |
1438 | respective I/O modules). | 1804 | respective I/O modules). |
1439 | 1805 | ||
1440 | An optical passthru module (OPM) connects the I/O module | 1806 | A passthrough module (OPM or CPM, optical or copper, |
1441 | directly to an external switch. By using OPMs in I/O module #1 and | 1807 | passthrough module) connects the I/O module directly to an external |
1442 | #2, the eth0 and eth1 interfaces of a JS20 can be redirected to the | 1808 | switch. By using PMs in I/O module #1 and #2, the eth0 and eth1 |
1443 | outside world and connected to a common external switch. | 1809 | interfaces of a JS20 can be redirected to the outside world and |
1444 | 1810 | connected to a common external switch. | |
1445 | Depending upon the mix of ESM and OPM modules, the network | 1811 | |
1446 | will appear to bonding as either a single switch topology (all OPM | 1812 | Depending upon the mix of ESMs and PMs, the network will |
1447 | modules) or as a multiple switch topology (one or more ESM modules, | 1813 | appear to bonding as either a single switch topology (all PMs) or as a |
1448 | zero or more OPM modules). It is also possible to connect ESM modules | 1814 | multiple switch topology (one or more ESMs, zero or more PMs). It is |
1449 | together, resulting in a configuration much like the example in "High | 1815 | also possible to connect ESMs together, resulting in a configuration |
1450 | Availability in a multiple switch topology." | 1816 | much like the example in "High Availability in a Multiple Switch |
1451 | 1817 | Topology," above. | |
1452 | Requirements for specifc modes | 1818 | |
1453 | ------------------------------ | 1819 | Requirements for specific modes |
1454 | 1820 | ------------------------------- | |
1455 | The balance-rr mode requires the use of OPM modules for | 1821 | |
1456 | devices in the bond, all connected to an common external switch. That | 1822 | The balance-rr mode requires the use of passthrough modules |
1457 | switch must be configured for "etherchannel" or "trunking" on the | 1823 | for devices in the bond, all connected to an common external switch. |
1824 | That switch must be configured for "etherchannel" or "trunking" on the | ||
1458 | appropriate ports, as is usual for balance-rr. | 1825 | appropriate ports, as is usual for balance-rr. |
1459 | 1826 | ||
1460 | The balance-alb and balance-tlb modes will function with | 1827 | The balance-alb and balance-tlb modes will function with |
@@ -1484,17 +1851,18 @@ connected to the JS20 system. | |||
1484 | Other concerns | 1851 | Other concerns |
1485 | -------------- | 1852 | -------------- |
1486 | 1853 | ||
1487 | The Serial Over LAN link is established over the primary | 1854 | The Serial Over LAN (SoL) link is established over the primary |
1488 | ethernet (eth0) only, therefore, any loss of link to eth0 will result | 1855 | ethernet (eth0) only, therefore, any loss of link to eth0 will result |
1489 | in losing your SoL connection. It will not fail over with other | 1856 | in losing your SoL connection. It will not fail over with other |
1490 | network traffic. | 1857 | network traffic, as the SoL system is beyond the control of the |
1858 | bonding driver. | ||
1491 | 1859 | ||
1492 | It may be desirable to disable spanning tree on the switch | 1860 | It may be desirable to disable spanning tree on the switch |
1493 | (either the internal Ethernet Switch Module, or an external switch) to | 1861 | (either the internal Ethernet Switch Module, or an external switch) to |
1494 | avoid fail-over delays issues when using bonding. | 1862 | avoid fail-over delay issues when using bonding. |
1495 | 1863 | ||
1496 | 1864 | ||
1497 | 14. Frequently Asked Questions | 1865 | 16. Frequently Asked Questions |
1498 | ============================== | 1866 | ============================== |
1499 | 1867 | ||
1500 | 1. Is it SMP safe? | 1868 | 1. Is it SMP safe? |
@@ -1505,8 +1873,8 @@ The new driver was designed to be SMP safe from the start. | |||
1505 | 2. What type of cards will work with it? | 1873 | 2. What type of cards will work with it? |
1506 | 1874 | ||
1507 | Any Ethernet type cards (you can even mix cards - a Intel | 1875 | Any Ethernet type cards (you can even mix cards - a Intel |
1508 | EtherExpress PRO/100 and a 3com 3c905b, for example). They need not | 1876 | EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, |
1509 | be of the same speed. | 1877 | devices need not be of the same speed. |
1510 | 1878 | ||
1511 | 3. How many bonding devices can I have? | 1879 | 3. How many bonding devices can I have? |
1512 | 1880 | ||
@@ -1524,11 +1892,12 @@ system. | |||
1524 | disabled. The active-backup mode will fail over to a backup link, and | 1892 | disabled. The active-backup mode will fail over to a backup link, and |
1525 | other modes will ignore the failed link. The link will continue to be | 1893 | other modes will ignore the failed link. The link will continue to be |
1526 | monitored, and should it recover, it will rejoin the bond (in whatever | 1894 | monitored, and should it recover, it will rejoin the bond (in whatever |
1527 | manner is appropriate for the mode). See the section on High | 1895 | manner is appropriate for the mode). See the sections on High |
1528 | Availability for additional information. | 1896 | Availability and the documentation for each mode for additional |
1897 | information. | ||
1529 | 1898 | ||
1530 | Link monitoring can be enabled via either the miimon or | 1899 | Link monitoring can be enabled via either the miimon or |
1531 | arp_interval paramters (described in the module paramters section, | 1900 | arp_interval parameters (described in the module parameters section, |
1532 | above). In general, miimon monitors the carrier state as sensed by | 1901 | above). In general, miimon monitors the carrier state as sensed by |
1533 | the underlying network device, and the arp monitor (arp_interval) | 1902 | the underlying network device, and the arp monitor (arp_interval) |
1534 | monitors connectivity to another host on the local network. | 1903 | monitors connectivity to another host on the local network. |
@@ -1536,7 +1905,7 @@ monitors connectivity to another host on the local network. | |||
1536 | If no link monitoring is configured, the bonding driver will | 1905 | If no link monitoring is configured, the bonding driver will |
1537 | be unable to detect link failures, and will assume that all links are | 1906 | be unable to detect link failures, and will assume that all links are |
1538 | always available. This will likely result in lost packets, and a | 1907 | always available. This will likely result in lost packets, and a |
1539 | resulting degredation of performance. The precise performance loss | 1908 | resulting degradation of performance. The precise performance loss |
1540 | depends upon the bonding mode and network configuration. | 1909 | depends upon the bonding mode and network configuration. |
1541 | 1910 | ||
1542 | 6. Can bonding be used for High Availability? | 1911 | 6. Can bonding be used for High Availability? |
@@ -1550,12 +1919,12 @@ depends upon the bonding mode and network configuration. | |||
1550 | In the basic balance modes (balance-rr and balance-xor), it | 1919 | In the basic balance modes (balance-rr and balance-xor), it |
1551 | works with any system that supports etherchannel (also called | 1920 | works with any system that supports etherchannel (also called |
1552 | trunking). Most managed switches currently available have such | 1921 | trunking). Most managed switches currently available have such |
1553 | support, and many unmananged switches as well. | 1922 | support, and many unmanaged switches as well. |
1554 | 1923 | ||
1555 | The advanced balance modes (balance-tlb and balance-alb) do | 1924 | The advanced balance modes (balance-tlb and balance-alb) do |
1556 | not have special switch requirements, but do need device drivers that | 1925 | not have special switch requirements, but do need device drivers that |
1557 | support specific features (described in the appropriate section under | 1926 | support specific features (described in the appropriate section under |
1558 | module paramters, above). | 1927 | module parameters, above). |
1559 | 1928 | ||
1560 | In 802.3ad mode, it works with with systems that support IEEE | 1929 | In 802.3ad mode, it works with with systems that support IEEE |
1561 | 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged | 1930 | 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged |
@@ -1565,17 +1934,19 @@ switches currently available support 802.3ad. | |||
1565 | 1934 | ||
1566 | 8. Where does a bonding device get its MAC address from? | 1935 | 8. Where does a bonding device get its MAC address from? |
1567 | 1936 | ||
1568 | If not explicitly configured with ifconfig, the MAC address of | 1937 | If not explicitly configured (with ifconfig or ip link), the |
1569 | the bonding device is taken from its first slave device. This MAC | 1938 | MAC address of the bonding device is taken from its first slave |
1570 | address is then passed to all following slaves and remains persistent | 1939 | device. This MAC address is then passed to all following slaves and |
1571 | (even if the the first slave is removed) until the bonding device is | 1940 | remains persistent (even if the the first slave is removed) until the |
1572 | brought down or reconfigured. | 1941 | bonding device is brought down or reconfigured. |
1573 | 1942 | ||
1574 | If you wish to change the MAC address, you can set it with | 1943 | If you wish to change the MAC address, you can set it with |
1575 | ifconfig: | 1944 | ifconfig or ip link: |
1576 | 1945 | ||
1577 | # ifconfig bond0 hw ether 00:11:22:33:44:55 | 1946 | # ifconfig bond0 hw ether 00:11:22:33:44:55 |
1578 | 1947 | ||
1948 | # ip link set bond0 address 66:77:88:99:aa:bb | ||
1949 | |||
1579 | The MAC address can be also changed by bringing down/up the | 1950 | The MAC address can be also changed by bringing down/up the |
1580 | device and then changing its slaves (or their order): | 1951 | device and then changing its slaves (or their order): |
1581 | 1952 | ||
@@ -1591,23 +1962,28 @@ from the bond (`ifenslave -d bond0 eth0'). The bonding driver will | |||
1591 | then restore the MAC addresses that the slaves had before they were | 1962 | then restore the MAC addresses that the slaves had before they were |
1592 | enslaved. | 1963 | enslaved. |
1593 | 1964 | ||
1594 | 15. Resources and Links | 1965 | 16. Resources and Links |
1595 | ======================= | 1966 | ======================= |
1596 | 1967 | ||
1597 | The latest version of the bonding driver can be found in the latest | 1968 | The latest version of the bonding driver can be found in the latest |
1598 | version of the linux kernel, found on http://kernel.org | 1969 | version of the linux kernel, found on http://kernel.org |
1599 | 1970 | ||
1971 | The latest version of this document can be found in either the latest | ||
1972 | kernel source (named Documentation/networking/bonding.txt), or on the | ||
1973 | bonding sourceforge site: | ||
1974 | |||
1975 | http://www.sourceforge.net/projects/bonding | ||
1976 | |||
1600 | Discussions regarding the bonding driver take place primarily on the | 1977 | Discussions regarding the bonding driver take place primarily on the |
1601 | bonding-devel mailing list, hosted at sourceforge.net. If you have | 1978 | bonding-devel mailing list, hosted at sourceforge.net. If you have |
1602 | questions or problems, post them to the list. | 1979 | questions or problems, post them to the list. The list address is: |
1603 | 1980 | ||
1604 | bonding-devel@lists.sourceforge.net | 1981 | bonding-devel@lists.sourceforge.net |
1605 | 1982 | ||
1606 | https://lists.sourceforge.net/lists/listinfo/bonding-devel | 1983 | The administrative interface (to subscribe or unsubscribe) can |
1607 | 1984 | be found at: | |
1608 | There is also a project site on sourceforge. | ||
1609 | 1985 | ||
1610 | http://www.sourceforge.net/projects/bonding | 1986 | https://lists.sourceforge.net/lists/listinfo/bonding-devel |
1611 | 1987 | ||
1612 | Donald Becker's Ethernet Drivers and diag programs may be found at : | 1988 | Donald Becker's Ethernet Drivers and diag programs may be found at : |
1613 | - http://www.scyld.com/network/ | 1989 | - http://www.scyld.com/network/ |
diff --git a/Documentation/networking/dmfe.txt b/Documentation/networking/dmfe.txt index c0e8398674ef..046363552d09 100644 --- a/Documentation/networking/dmfe.txt +++ b/Documentation/networking/dmfe.txt | |||
@@ -1,59 +1,65 @@ | |||
1 | dmfe.c: Version 1.28 01/18/2000 | 1 | Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux. |
2 | 2 | ||
3 | A Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux. | 3 | This program is free software; you can redistribute it and/or |
4 | Copyright (C) 1997 Sten Wang | 4 | modify it under the terms of the GNU General Public License |
5 | as published by the Free Software Foundation; either version 2 | ||
6 | of the License, or (at your option) any later version. | ||
5 | 7 | ||
6 | This program is free software; you can redistribute it and/or | 8 | This program is distributed in the hope that it will be useful, |
7 | modify it under the terms of the GNU General Public License | 9 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
8 | as published by the Free Software Foundation; either version 2 | 10 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
9 | of the License, or (at your option) any later version. | 11 | GNU General Public License for more details. |
10 | 12 | ||
11 | This program is distributed in the hope that it will be useful, | ||
12 | but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
13 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
14 | GNU General Public License for more details. | ||
15 | 13 | ||
14 | This driver provides kernel support for Davicom DM9102(A)/DM9132/DM9801 ethernet cards ( CNET | ||
15 | 10/100 ethernet cards uses Davicom chipset too, so this driver supports CNET cards too ).If you | ||
16 | didn't compile this driver as a module, it will automatically load itself on boot and print a | ||
17 | line similar to : | ||
16 | 18 | ||
17 | A. Compiler command: | 19 | dmfe: Davicom DM9xxx net driver, version 1.36.4 (2002-01-17) |
18 | 20 | ||
19 | A-1: For normal single or multiple processor kernel | 21 | If you compiled this driver as a module, you have to load it on boot.You can load it with command : |
20 | "gcc -DMODULE -D__KERNEL__ -I/usr/src/linux/net/inet -Wall | ||
21 | -Wstrict-prototypes -O6 -c dmfe.c" | ||
22 | 22 | ||
23 | A-2: For single or multiple processor with kernel module version function | 23 | insmod dmfe |
24 | "gcc -DMODULE -DMODVERSIONS -D__KERNEL__ -I/usr/src/linux/net/inet | ||
25 | -Wall -Wstrict-prototypes -O6 -c dmfe.c" | ||
26 | 24 | ||
25 | This way it will autodetect the device mode.This is the suggested way to load the module.Or you can pass | ||
26 | a mode= setting to module while loading, like : | ||
27 | 27 | ||
28 | B. The following steps teach you how to activate a DM9102 board: | 28 | insmod dmfe mode=0 # Force 10M Half Duplex |
29 | insmod dmfe mode=1 # Force 100M Half Duplex | ||
30 | insmod dmfe mode=4 # Force 10M Full Duplex | ||
31 | insmod dmfe mode=5 # Force 100M Full Duplex | ||
29 | 32 | ||
30 | 1. Used the upper compiler command to compile dmfe.c | 33 | Next you should configure your network interface with a command similar to : |
31 | 34 | ||
32 | 2. Insert dmfe module into kernel | 35 | ifconfig eth0 172.22.3.18 |
33 | "insmod dmfe" ;;Auto Detection Mode (Suggest) | 36 | ^^^^^^^^^^^ |
34 | "insmod dmfe mode=0" ;;Force 10M Half Duplex | 37 | Your IP Adress |
35 | "insmod dmfe mode=1" ;;Force 100M Half Duplex | ||
36 | "insmod dmfe mode=4" ;;Force 10M Full Duplex | ||
37 | "insmod dmfe mode=5" ;;Force 100M Full Duplex | ||
38 | 38 | ||
39 | 3. Config a dm9102 network interface | 39 | Then you may have to modify the default routing table with command : |
40 | "ifconfig eth0 172.22.3.18" | ||
41 | ^^^^^^^^^^^ Your IP address | ||
42 | 40 | ||
43 | 4. Activate the IP routing table. For some distributions, it is not | 41 | route add default eth0 |
44 | necessary. You can type "route" to check. | ||
45 | 42 | ||
46 | "route add default eth0" | ||
47 | 43 | ||
44 | Now your ethernet card should be up and running. | ||
48 | 45 | ||
49 | 5. Well done. Your DM9102 adapter is now activated. | ||
50 | 46 | ||
47 | TODO: | ||
51 | 48 | ||
52 | C. Object files description: | 49 | Implement pci_driver::suspend() and pci_driver::resume() power management methods. |
53 | 1. dmfe_rh61.o: For Redhat 6.1 | 50 | Check on 64 bit boxes. |
51 | Check and fix on big endian boxes. | ||
52 | Test and make sure PCI latency is now correct for all cases. | ||
54 | 53 | ||
55 | If you can make sure your kernel version, you can rename | ||
56 | to dmfe.o and directly use it without re-compiling. | ||
57 | 54 | ||
55 | Authors: | ||
58 | 56 | ||
59 | Author: Sten Wang, 886-3-5798797-8517, E-mail: sten_wang@davicom.com.tw | 57 | Sten Wang <sten_wang@davicom.com.tw > : Original Author |
58 | Tobias Ringstrom <tori@unhappy.mine.nu> : Current Maintainer | ||
59 | |||
60 | Contributors: | ||
61 | |||
62 | Marcelo Tosatti <marcelo@conectiva.com.br> | ||
63 | Alan Cox <alan@redhat.com> | ||
64 | Jeff Garzik <jgarzik@pobox.com> | ||
65 | Vojtech Pavlik <vojtech@suse.cz> | ||
diff --git a/Documentation/networking/fib_trie.txt b/Documentation/networking/fib_trie.txt new file mode 100644 index 000000000000..f50d0c673c57 --- /dev/null +++ b/Documentation/networking/fib_trie.txt | |||
@@ -0,0 +1,145 @@ | |||
1 | LC-trie implementation notes. | ||
2 | |||
3 | Node types | ||
4 | ---------- | ||
5 | leaf | ||
6 | An end node with data. This has a copy of the relevant key, along | ||
7 | with 'hlist' with routing table entries sorted by prefix length. | ||
8 | See struct leaf and struct leaf_info. | ||
9 | |||
10 | trie node or tnode | ||
11 | An internal node, holding an array of child (leaf or tnode) pointers, | ||
12 | indexed through a subset of the key. See Level Compression. | ||
13 | |||
14 | A few concepts explained | ||
15 | ------------------------ | ||
16 | Bits (tnode) | ||
17 | The number of bits in the key segment used for indexing into the | ||
18 | child array - the "child index". See Level Compression. | ||
19 | |||
20 | Pos (tnode) | ||
21 | The position (in the key) of the key segment used for indexing into | ||
22 | the child array. See Path Compression. | ||
23 | |||
24 | Path Compression / skipped bits | ||
25 | Any given tnode is linked to from the child array of its parent, using | ||
26 | a segment of the key specified by the parent's "pos" and "bits" | ||
27 | In certain cases, this tnode's own "pos" will not be immediately | ||
28 | adjacent to the parent (pos+bits), but there will be some bits | ||
29 | in the key skipped over because they represent a single path with no | ||
30 | deviations. These "skipped bits" constitute Path Compression. | ||
31 | Note that the search algorithm will simply skip over these bits when | ||
32 | searching, making it necessary to save the keys in the leaves to | ||
33 | verify that they actually do match the key we are searching for. | ||
34 | |||
35 | Level Compression / child arrays | ||
36 | the trie is kept level balanced moving, under certain conditions, the | ||
37 | children of a full child (see "full_children") up one level, so that | ||
38 | instead of a pure binary tree, each internal node ("tnode") may | ||
39 | contain an arbitrarily large array of links to several children. | ||
40 | Conversely, a tnode with a mostly empty child array (see empty_children) | ||
41 | may be "halved", having some of its children moved downwards one level, | ||
42 | in order to avoid ever-increasing child arrays. | ||
43 | |||
44 | empty_children | ||
45 | the number of positions in the child array of a given tnode that are | ||
46 | NULL. | ||
47 | |||
48 | full_children | ||
49 | the number of children of a given tnode that aren't path compressed. | ||
50 | (in other words, they aren't NULL or leaves and their "pos" is equal | ||
51 | to this tnode's "pos"+"bits"). | ||
52 | |||
53 | (The word "full" here is used more in the sense of "complete" than | ||
54 | as the opposite of "empty", which might be a tad confusing.) | ||
55 | |||
56 | Comments | ||
57 | --------- | ||
58 | |||
59 | We have tried to keep the structure of the code as close to fib_hash as | ||
60 | possible to allow verification and help up reviewing. | ||
61 | |||
62 | fib_find_node() | ||
63 | A good start for understanding this code. This function implements a | ||
64 | straightforward trie lookup. | ||
65 | |||
66 | fib_insert_node() | ||
67 | Inserts a new leaf node in the trie. This is bit more complicated than | ||
68 | fib_find_node(). Inserting a new node means we might have to run the | ||
69 | level compression algorithm on part of the trie. | ||
70 | |||
71 | trie_leaf_remove() | ||
72 | Looks up a key, deletes it and runs the level compression algorithm. | ||
73 | |||
74 | trie_rebalance() | ||
75 | The key function for the dynamic trie after any change in the trie | ||
76 | it is run to optimize and reorganize. Tt will walk the trie upwards | ||
77 | towards the root from a given tnode, doing a resize() at each step | ||
78 | to implement level compression. | ||
79 | |||
80 | resize() | ||
81 | Analyzes a tnode and optimizes the child array size by either inflating | ||
82 | or shrinking it repeatedly until it fullfills the criteria for optimal | ||
83 | level compression. This part follows the original paper pretty closely | ||
84 | and there may be some room for experimentation here. | ||
85 | |||
86 | inflate() | ||
87 | Doubles the size of the child array within a tnode. Used by resize(). | ||
88 | |||
89 | halve() | ||
90 | Halves the size of the child array within a tnode - the inverse of | ||
91 | inflate(). Used by resize(); | ||
92 | |||
93 | fn_trie_insert(), fn_trie_delete(), fn_trie_select_default() | ||
94 | The route manipulation functions. Should conform pretty closely to the | ||
95 | corresponding functions in fib_hash. | ||
96 | |||
97 | fn_trie_flush() | ||
98 | This walks the full trie (using nextleaf()) and searches for empty | ||
99 | leaves which have to be removed. | ||
100 | |||
101 | fn_trie_dump() | ||
102 | Dumps the routing table ordered by prefix length. This is somewhat | ||
103 | slower than the corresponding fib_hash function, as we have to walk the | ||
104 | entire trie for each prefix length. In comparison, fib_hash is organized | ||
105 | as one "zone"/hash per prefix length. | ||
106 | |||
107 | Locking | ||
108 | ------- | ||
109 | |||
110 | fib_lock is used for an RW-lock in the same way that this is done in fib_hash. | ||
111 | However, the functions are somewhat separated for other possible locking | ||
112 | scenarios. It might conceivably be possible to run trie_rebalance via RCU | ||
113 | to avoid read_lock in the fn_trie_lookup() function. | ||
114 | |||
115 | Main lookup mechanism | ||
116 | --------------------- | ||
117 | fn_trie_lookup() is the main lookup function. | ||
118 | |||
119 | The lookup is in its simplest form just like fib_find_node(). We descend the | ||
120 | trie, key segment by key segment, until we find a leaf. check_leaf() does | ||
121 | the fib_semantic_match in the leaf's sorted prefix hlist. | ||
122 | |||
123 | If we find a match, we are done. | ||
124 | |||
125 | If we don't find a match, we enter prefix matching mode. The prefix length, | ||
126 | starting out at the same as the key length, is reduced one step at a time, | ||
127 | and we backtrack upwards through the trie trying to find a longest matching | ||
128 | prefix. The goal is always to reach a leaf and get a positive result from the | ||
129 | fib_semantic_match mechanism. | ||
130 | |||
131 | Inside each tnode, the search for longest matching prefix consists of searching | ||
132 | through the child array, chopping off (zeroing) the least significant "1" of | ||
133 | the child index until we find a match or the child index consists of nothing but | ||
134 | zeros. | ||
135 | |||
136 | At this point we backtrack (t->stats.backtrack++) up the trie, continuing to | ||
137 | chop off part of the key in order to find the longest matching prefix. | ||
138 | |||
139 | At this point we will repeatedly descend subtries to look for a match, and there | ||
140 | are some optimizations available that can provide us with "shortcuts" to avoid | ||
141 | descending into dead ends. Look for "HL_OPTIMIZE" sections in the code. | ||
142 | |||
143 | To alleviate any doubts about the correctness of the route selection process, | ||
144 | a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which | ||
145 | gives userland access to fib_lookup(). | ||
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index a2c893a7475d..ab65714d95fc 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt | |||
@@ -304,57 +304,6 @@ tcp_low_latency - BOOLEAN | |||
304 | changed would be a Beowulf compute cluster. | 304 | changed would be a Beowulf compute cluster. |
305 | Default: 0 | 305 | Default: 0 |
306 | 306 | ||
307 | tcp_westwood - BOOLEAN | ||
308 | Enable TCP Westwood+ congestion control algorithm. | ||
309 | TCP Westwood+ is a sender-side only modification of the TCP Reno | ||
310 | protocol stack that optimizes the performance of TCP congestion | ||
311 | control. It is based on end-to-end bandwidth estimation to set | ||
312 | congestion window and slow start threshold after a congestion | ||
313 | episode. Using this estimation, TCP Westwood+ adaptively sets a | ||
314 | slow start threshold and a congestion window which takes into | ||
315 | account the bandwidth used at the time congestion is experienced. | ||
316 | TCP Westwood+ significantly increases fairness wrt TCP Reno in | ||
317 | wired networks and throughput over wireless links. | ||
318 | Default: 0 | ||
319 | |||
320 | tcp_vegas_cong_avoid - BOOLEAN | ||
321 | Enable TCP Vegas congestion avoidance algorithm. | ||
322 | TCP Vegas is a sender-side only change to TCP that anticipates | ||
323 | the onset of congestion by estimating the bandwidth. TCP Vegas | ||
324 | adjusts the sending rate by modifying the congestion | ||
325 | window. TCP Vegas should provide less packet loss, but it is | ||
326 | not as aggressive as TCP Reno. | ||
327 | Default:0 | ||
328 | |||
329 | tcp_bic - BOOLEAN | ||
330 | Enable BIC TCP congestion control algorithm. | ||
331 | BIC-TCP is a sender-side only change that ensures a linear RTT | ||
332 | fairness under large windows while offering both scalability and | ||
333 | bounded TCP-friendliness. The protocol combines two schemes | ||
334 | called additive increase and binary search increase. When the | ||
335 | congestion window is large, additive increase with a large | ||
336 | increment ensures linear RTT fairness as well as good | ||
337 | scalability. Under small congestion windows, binary search | ||
338 | increase provides TCP friendliness. | ||
339 | Default: 0 | ||
340 | |||
341 | tcp_bic_low_window - INTEGER | ||
342 | Sets the threshold window (in packets) where BIC TCP starts to | ||
343 | adjust the congestion window. Below this threshold BIC TCP behaves | ||
344 | the same as the default TCP Reno. | ||
345 | Default: 14 | ||
346 | |||
347 | tcp_bic_fast_convergence - BOOLEAN | ||
348 | Forces BIC TCP to more quickly respond to changes in congestion | ||
349 | window. Allows two flows sharing the same connection to converge | ||
350 | more rapidly. | ||
351 | Default: 1 | ||
352 | |||
353 | tcp_default_win_scale - INTEGER | ||
354 | Sets the minimum window scale TCP will negotiate for on all | ||
355 | conections. | ||
356 | Default: 7 | ||
357 | |||
358 | tcp_tso_win_divisor - INTEGER | 307 | tcp_tso_win_divisor - INTEGER |
359 | This allows control over what percentage of the congestion window | 308 | This allows control over what percentage of the congestion window |
360 | can be consumed by a single TSO frame. | 309 | can be consumed by a single TSO frame. |
@@ -368,6 +317,11 @@ tcp_frto - BOOLEAN | |||
368 | where packet loss is typically due to random radio interference | 317 | where packet loss is typically due to random radio interference |
369 | rather than intermediate router congestion. | 318 | rather than intermediate router congestion. |
370 | 319 | ||
320 | tcp_congestion_control - STRING | ||
321 | Set the congestion control algorithm to be used for new | ||
322 | connections. The algorithm "reno" is always available, but | ||
323 | additional choices may be available based on kernel configuration. | ||
324 | |||
371 | somaxconn - INTEGER | 325 | somaxconn - INTEGER |
372 | Limit of socket listen() backlog, known in userspace as SOMAXCONN. | 326 | Limit of socket listen() backlog, known in userspace as SOMAXCONN. |
373 | Defaults to 128. See also tcp_max_syn_backlog for additional tuning | 327 | Defaults to 128. See also tcp_max_syn_backlog for additional tuning |
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt new file mode 100644 index 000000000000..29ccae409031 --- /dev/null +++ b/Documentation/networking/phy.txt | |||
@@ -0,0 +1,288 @@ | |||
1 | |||
2 | ------- | ||
3 | PHY Abstraction Layer | ||
4 | (Updated 2005-07-21) | ||
5 | |||
6 | Purpose | ||
7 | |||
8 | Most network devices consist of set of registers which provide an interface | ||
9 | to a MAC layer, which communicates with the physical connection through a | ||
10 | PHY. The PHY concerns itself with negotiating link parameters with the link | ||
11 | partner on the other side of the network connection (typically, an ethernet | ||
12 | cable), and provides a register interface to allow drivers to determine what | ||
13 | settings were chosen, and to configure what settings are allowed. | ||
14 | |||
15 | While these devices are distinct from the network devices, and conform to a | ||
16 | standard layout for the registers, it has been common practice to integrate | ||
17 | the PHY management code with the network driver. This has resulted in large | ||
18 | amounts of redundant code. Also, on embedded systems with multiple (and | ||
19 | sometimes quite different) ethernet controllers connected to the same | ||
20 | management bus, it is difficult to ensure safe use of the bus. | ||
21 | |||
22 | Since the PHYs are devices, and the management busses through which they are | ||
23 | accessed are, in fact, busses, the PHY Abstraction Layer treats them as such. | ||
24 | In doing so, it has these goals: | ||
25 | |||
26 | 1) Increase code-reuse | ||
27 | 2) Increase overall code-maintainability | ||
28 | 3) Speed development time for new network drivers, and for new systems | ||
29 | |||
30 | Basically, this layer is meant to provide an interface to PHY devices which | ||
31 | allows network driver writers to write as little code as possible, while | ||
32 | still providing a full feature set. | ||
33 | |||
34 | The MDIO bus | ||
35 | |||
36 | Most network devices are connected to a PHY by means of a management bus. | ||
37 | Different devices use different busses (though some share common interfaces). | ||
38 | In order to take advantage of the PAL, each bus interface needs to be | ||
39 | registered as a distinct device. | ||
40 | |||
41 | 1) read and write functions must be implemented. Their prototypes are: | ||
42 | |||
43 | int write(struct mii_bus *bus, int mii_id, int regnum, u16 value); | ||
44 | int read(struct mii_bus *bus, int mii_id, int regnum); | ||
45 | |||
46 | mii_id is the address on the bus for the PHY, and regnum is the register | ||
47 | number. These functions are guaranteed not to be called from interrupt | ||
48 | time, so it is safe for them to block, waiting for an interrupt to signal | ||
49 | the operation is complete | ||
50 | |||
51 | 2) A reset function is necessary. This is used to return the bus to an | ||
52 | initialized state. | ||
53 | |||
54 | 3) A probe function is needed. This function should set up anything the bus | ||
55 | driver needs, setup the mii_bus structure, and register with the PAL using | ||
56 | mdiobus_register. Similarly, there's a remove function to undo all of | ||
57 | that (use mdiobus_unregister). | ||
58 | |||
59 | 4) Like any driver, the device_driver structure must be configured, and init | ||
60 | exit functions are used to register the driver. | ||
61 | |||
62 | 5) The bus must also be declared somewhere as a device, and registered. | ||
63 | |||
64 | As an example for how one driver implemented an mdio bus driver, see | ||
65 | drivers/net/gianfar_mii.c and arch/ppc/syslib/mpc85xx_devices.c | ||
66 | |||
67 | Connecting to a PHY | ||
68 | |||
69 | Sometime during startup, the network driver needs to establish a connection | ||
70 | between the PHY device, and the network device. At this time, the PHY's bus | ||
71 | and drivers need to all have been loaded, so it is ready for the connection. | ||
72 | At this point, there are several ways to connect to the PHY: | ||
73 | |||
74 | 1) The PAL handles everything, and only calls the network driver when | ||
75 | the link state changes, so it can react. | ||
76 | |||
77 | 2) The PAL handles everything except interrupts (usually because the | ||
78 | controller has the interrupt registers). | ||
79 | |||
80 | 3) The PAL handles everything, but checks in with the driver every second, | ||
81 | allowing the network driver to react first to any changes before the PAL | ||
82 | does. | ||
83 | |||
84 | 4) The PAL serves only as a library of functions, with the network device | ||
85 | manually calling functions to update status, and configure the PHY | ||
86 | |||
87 | |||
88 | Letting the PHY Abstraction Layer do Everything | ||
89 | |||
90 | If you choose option 1 (The hope is that every driver can, but to still be | ||
91 | useful to drivers that can't), connecting to the PHY is simple: | ||
92 | |||
93 | First, you need a function to react to changes in the link state. This | ||
94 | function follows this protocol: | ||
95 | |||
96 | static void adjust_link(struct net_device *dev); | ||
97 | |||
98 | Next, you need to know the device name of the PHY connected to this device. | ||
99 | The name will look something like, "phy0:0", where the first number is the | ||
100 | bus id, and the second is the PHY's address on that bus. | ||
101 | |||
102 | Now, to connect, just call this function: | ||
103 | |||
104 | phydev = phy_connect(dev, phy_name, &adjust_link, flags); | ||
105 | |||
106 | phydev is a pointer to the phy_device structure which represents the PHY. If | ||
107 | phy_connect is successful, it will return the pointer. dev, here, is the | ||
108 | pointer to your net_device. Once done, this function will have started the | ||
109 | PHY's software state machine, and registered for the PHY's interrupt, if it | ||
110 | has one. The phydev structure will be populated with information about the | ||
111 | current state, though the PHY will not yet be truly operational at this | ||
112 | point. | ||
113 | |||
114 | flags is a u32 which can optionally contain phy-specific flags. | ||
115 | This is useful if the system has put hardware restrictions on | ||
116 | the PHY/controller, of which the PHY needs to be aware. | ||
117 | |||
118 | Now just make sure that phydev->supported and phydev->advertising have any | ||
119 | values pruned from them which don't make sense for your controller (a 10/100 | ||
120 | controller may be connected to a gigabit capable PHY, so you would need to | ||
121 | mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions | ||
122 | for these bitfields. Note that you should not SET any bits, or the PHY may | ||
123 | get put into an unsupported state. | ||
124 | |||
125 | Lastly, once the controller is ready to handle network traffic, you call | ||
126 | phy_start(phydev). This tells the PAL that you are ready, and configures the | ||
127 | PHY to connect to the network. If you want to handle your own interrupts, | ||
128 | just set phydev->irq to PHY_IGNORE_INTERRUPT before you call phy_start. | ||
129 | Similarly, if you don't want to use interrupts, set phydev->irq to PHY_POLL. | ||
130 | |||
131 | When you want to disconnect from the network (even if just briefly), you call | ||
132 | phy_stop(phydev). | ||
133 | |||
134 | Keeping Close Tabs on the PAL | ||
135 | |||
136 | It is possible that the PAL's built-in state machine needs a little help to | ||
137 | keep your network device and the PHY properly in sync. If so, you can | ||
138 | register a helper function when connecting to the PHY, which will be called | ||
139 | every second before the state machine reacts to any changes. To do this, you | ||
140 | need to manually call phy_attach() and phy_prepare_link(), and then call | ||
141 | phy_start_machine() with the second argument set to point to your special | ||
142 | handler. | ||
143 | |||
144 | Currently there are no examples of how to use this functionality, and testing | ||
145 | on it has been limited because the author does not have any drivers which use | ||
146 | it (they all use option 1). So Caveat Emptor. | ||
147 | |||
148 | Doing it all yourself | ||
149 | |||
150 | There's a remote chance that the PAL's built-in state machine cannot track | ||
151 | the complex interactions between the PHY and your network device. If this is | ||
152 | so, you can simply call phy_attach(), and not call phy_start_machine or | ||
153 | phy_prepare_link(). This will mean that phydev->state is entirely yours to | ||
154 | handle (phy_start and phy_stop toggle between some of the states, so you | ||
155 | might need to avoid them). | ||
156 | |||
157 | An effort has been made to make sure that useful functionality can be | ||
158 | accessed without the state-machine running, and most of these functions are | ||
159 | descended from functions which did not interact with a complex state-machine. | ||
160 | However, again, no effort has been made so far to test running without the | ||
161 | state machine, so tryer beware. | ||
162 | |||
163 | Here is a brief rundown of the functions: | ||
164 | |||
165 | int phy_read(struct phy_device *phydev, u16 regnum); | ||
166 | int phy_write(struct phy_device *phydev, u16 regnum, u16 val); | ||
167 | |||
168 | Simple read/write primitives. They invoke the bus's read/write function | ||
169 | pointers. | ||
170 | |||
171 | void phy_print_status(struct phy_device *phydev); | ||
172 | |||
173 | A convenience function to print out the PHY status neatly. | ||
174 | |||
175 | int phy_clear_interrupt(struct phy_device *phydev); | ||
176 | int phy_config_interrupt(struct phy_device *phydev, u32 interrupts); | ||
177 | |||
178 | Clear the PHY's interrupt, and configure which ones are allowed, | ||
179 | respectively. Currently only supports all on, or all off. | ||
180 | |||
181 | int phy_enable_interrupts(struct phy_device *phydev); | ||
182 | int phy_disable_interrupts(struct phy_device *phydev); | ||
183 | |||
184 | Functions which enable/disable PHY interrupts, clearing them | ||
185 | before and after, respectively. | ||
186 | |||
187 | int phy_start_interrupts(struct phy_device *phydev); | ||
188 | int phy_stop_interrupts(struct phy_device *phydev); | ||
189 | |||
190 | Requests the IRQ for the PHY interrupts, then enables them for | ||
191 | start, or disables then frees them for stop. | ||
192 | |||
193 | struct phy_device * phy_attach(struct net_device *dev, const char *phy_id, | ||
194 | u32 flags); | ||
195 | |||
196 | Attaches a network device to a particular PHY, binding the PHY to a generic | ||
197 | driver if none was found during bus initialization. Passes in | ||
198 | any phy-specific flags as needed. | ||
199 | |||
200 | int phy_start_aneg(struct phy_device *phydev); | ||
201 | |||
202 | Using variables inside the phydev structure, either configures advertising | ||
203 | and resets autonegotiation, or disables autonegotiation, and configures | ||
204 | forced settings. | ||
205 | |||
206 | static inline int phy_read_status(struct phy_device *phydev); | ||
207 | |||
208 | Fills the phydev structure with up-to-date information about the current | ||
209 | settings in the PHY. | ||
210 | |||
211 | void phy_sanitize_settings(struct phy_device *phydev) | ||
212 | |||
213 | Resolves differences between currently desired settings, and | ||
214 | supported settings for the given PHY device. Does not make | ||
215 | the changes in the hardware, though. | ||
216 | |||
217 | int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd); | ||
218 | int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd); | ||
219 | |||
220 | Ethtool convenience functions. | ||
221 | |||
222 | int phy_mii_ioctl(struct phy_device *phydev, | ||
223 | struct mii_ioctl_data *mii_data, int cmd); | ||
224 | |||
225 | The MII ioctl. Note that this function will completely screw up the state | ||
226 | machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to | ||
227 | use this only to write registers which are not standard, and don't set off | ||
228 | a renegotiation. | ||
229 | |||
230 | |||
231 | PHY Device Drivers | ||
232 | |||
233 | With the PHY Abstraction Layer, adding support for new PHYs is | ||
234 | quite easy. In some cases, no work is required at all! However, | ||
235 | many PHYs require a little hand-holding to get up-and-running. | ||
236 | |||
237 | Generic PHY driver | ||
238 | |||
239 | If the desired PHY doesn't have any errata, quirks, or special | ||
240 | features you want to support, then it may be best to not add | ||
241 | support, and let the PHY Abstraction Layer's Generic PHY Driver | ||
242 | do all of the work. | ||
243 | |||
244 | Writing a PHY driver | ||
245 | |||
246 | If you do need to write a PHY driver, the first thing to do is | ||
247 | make sure it can be matched with an appropriate PHY device. | ||
248 | This is done during bus initialization by reading the device's | ||
249 | UID (stored in registers 2 and 3), then comparing it to each | ||
250 | driver's phy_id field by ANDing it with each driver's | ||
251 | phy_id_mask field. Also, it needs a name. Here's an example: | ||
252 | |||
253 | static struct phy_driver dm9161_driver = { | ||
254 | .phy_id = 0x0181b880, | ||
255 | .name = "Davicom DM9161E", | ||
256 | .phy_id_mask = 0x0ffffff0, | ||
257 | ... | ||
258 | } | ||
259 | |||
260 | Next, you need to specify what features (speed, duplex, autoneg, | ||
261 | etc) your PHY device and driver support. Most PHYs support | ||
262 | PHY_BASIC_FEATURES, but you can look in include/mii.h for other | ||
263 | features. | ||
264 | |||
265 | Each driver consists of a number of function pointers: | ||
266 | |||
267 | config_init: configures PHY into a sane state after a reset. | ||
268 | For instance, a Davicom PHY requires descrambling disabled. | ||
269 | probe: Does any setup needed by the driver | ||
270 | suspend/resume: power management | ||
271 | config_aneg: Changes the speed/duplex/negotiation settings | ||
272 | read_status: Reads the current speed/duplex/negotiation settings | ||
273 | ack_interrupt: Clear a pending interrupt | ||
274 | config_intr: Enable or disable interrupts | ||
275 | remove: Does any driver take-down | ||
276 | |||
277 | Of these, only config_aneg and read_status are required to be | ||
278 | assigned by the driver code. The rest are optional. Also, it is | ||
279 | preferred to use the generic phy driver's versions of these two | ||
280 | functions if at all possible: genphy_read_status and | ||
281 | genphy_config_aneg. If this is not possible, it is likely that | ||
282 | you only need to perform some actions before and after invoking | ||
283 | these functions, and so your functions will wrap the generic | ||
284 | ones. | ||
285 | |||
286 | Feel free to look at the Marvell, Cicada, and Davicom drivers in | ||
287 | drivers/net/phy/ for examples (the lxt and qsemi drivers have | ||
288 | not been tested as of this writing) | ||
diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt index 71749007091e..0fa300425575 100644 --- a/Documentation/networking/tcp.txt +++ b/Documentation/networking/tcp.txt | |||
@@ -1,5 +1,72 @@ | |||
1 | How the new TCP output machine [nyi] works. | 1 | TCP protocol |
2 | ============ | ||
3 | |||
4 | Last updated: 21 June 2005 | ||
5 | |||
6 | Contents | ||
7 | ======== | ||
8 | |||
9 | - Congestion control | ||
10 | - How the new TCP output machine [nyi] works | ||
11 | |||
12 | Congestion control | ||
13 | ================== | ||
14 | |||
15 | The following variables are used in the tcp_sock for congestion control: | ||
16 | snd_cwnd The size of the congestion window | ||
17 | snd_ssthresh Slow start threshold. We are in slow start if | ||
18 | snd_cwnd is less than this. | ||
19 | snd_cwnd_cnt A counter used to slow down the rate of increase | ||
20 | once we exceed slow start threshold. | ||
21 | snd_cwnd_clamp This is the maximum size that snd_cwnd can grow to. | ||
22 | snd_cwnd_stamp Timestamp for when congestion window last validated. | ||
23 | snd_cwnd_used Used as a highwater mark for how much of the | ||
24 | congestion window is in use. It is used to adjust | ||
25 | snd_cwnd down when the link is limited by the | ||
26 | application rather than the network. | ||
27 | |||
28 | As of 2.6.13, Linux supports pluggable congestion control algorithms. | ||
29 | A congestion control mechanism can be registered through functions in | ||
30 | tcp_cong.c. The functions used by the congestion control mechanism are | ||
31 | registered via passing a tcp_congestion_ops struct to | ||
32 | tcp_register_congestion_control. As a minimum name, ssthresh, | ||
33 | cong_avoid, min_cwnd must be valid. | ||
2 | 34 | ||
35 | Private data for a congestion control mechanism is stored in tp->ca_priv. | ||
36 | tcp_ca(tp) returns a pointer to this space. This is preallocated space - it | ||
37 | is important to check the size of your private data will fit this space, or | ||
38 | alternatively space could be allocated elsewhere and a pointer to it could | ||
39 | be stored here. | ||
40 | |||
41 | There are three kinds of congestion control algorithms currently: The | ||
42 | simplest ones are derived from TCP reno (highspeed, scalable) and just | ||
43 | provide an alternative the congestion window calculation. More complex | ||
44 | ones like BIC try to look at other events to provide better | ||
45 | heuristics. There are also round trip time based algorithms like | ||
46 | Vegas and Westwood+. | ||
47 | |||
48 | Good TCP congestion control is a complex problem because the algorithm | ||
49 | needs to maintain fairness and performance. Please review current | ||
50 | research and RFC's before developing new modules. | ||
51 | |||
52 | The method that is used to determine which congestion control mechanism is | ||
53 | determined by the setting of the sysctl net.ipv4.tcp_congestion_control. | ||
54 | The default congestion control will be the last one registered (LIFO); | ||
55 | so if you built everything as modules. the default will be reno. If you | ||
56 | build with the default's from Kconfig, then BIC will be builtin (not a module) | ||
57 | and it will end up the default. | ||
58 | |||
59 | If you really want a particular default value then you will need | ||
60 | to set it with the sysctl. If you use a sysctl, the module will be autoloaded | ||
61 | if needed and you will get the expected protocol. If you ask for an | ||
62 | unknown congestion method, then the sysctl attempt will fail. | ||
63 | |||
64 | If you remove a tcp congestion control module, then you will get the next | ||
65 | available one. Since reno can not be built as a module, and can not be | ||
66 | deleted, it will always be available. | ||
67 | |||
68 | How the new TCP output machine [nyi] works. | ||
69 | =========================================== | ||
3 | 70 | ||
4 | Data is kept on a single queue. The skb->users flag tells us if the frame is | 71 | Data is kept on a single queue. The skb->users flag tells us if the frame is |
5 | one that has been queued already. To add a frame we throw it on the end. Ack | 72 | one that has been queued already. To add a frame we throw it on the end. Ack |
diff --git a/Documentation/networking/wanpipe.txt b/Documentation/networking/wanpipe.txt deleted file mode 100644 index aea20cd2a56e..000000000000 --- a/Documentation/networking/wanpipe.txt +++ /dev/null | |||
@@ -1,622 +0,0 @@ | |||
1 | ------------------------------------------------------------------------------ | ||
2 | Linux WAN Router Utilities Package | ||
3 | ------------------------------------------------------------------------------ | ||
4 | Version 2.2.1 | ||
5 | Mar 28, 2001 | ||
6 | Author: Nenad Corbic <ncorbic@sangoma.com> | ||
7 | Copyright (c) 1995-2001 Sangoma Technologies Inc. | ||
8 | ------------------------------------------------------------------------------ | ||
9 | |||
10 | INTRODUCTION | ||
11 | |||
12 | Wide Area Networks (WANs) are used to interconnect Local Area Networks (LANs) | ||
13 | and/or stand-alone hosts over vast distances with data transfer rates | ||
14 | significantly higher than those achievable with commonly used dial-up | ||
15 | connections. | ||
16 | |||
17 | Usually an external device called `WAN router' sitting on your local network | ||
18 | or connected to your machine's serial port provides physical connection to | ||
19 | WAN. Although router's job may be as simple as taking your local network | ||
20 | traffic, converting it to WAN format and piping it through the WAN link, these | ||
21 | devices are notoriously expensive, with prices as much as 2 - 5 times higher | ||
22 | then the price of a typical PC box. | ||
23 | |||
24 | Alternatively, considering robustness and multitasking capabilities of Linux, | ||
25 | an internal router can be built (most routers use some sort of stripped down | ||
26 | Unix-like operating system anyway). With a number of relatively inexpensive WAN | ||
27 | interface cards available on the market, a perfectly usable router can be | ||
28 | built for less than half a price of an external router. Yet a Linux box | ||
29 | acting as a router can still be used for other purposes, such as fire-walling, | ||
30 | running FTP, WWW or DNS server, etc. | ||
31 | |||
32 | This kernel module introduces the notion of a WAN Link Driver (WLD) to Linux | ||
33 | operating system and provides generic hardware-independent services for such | ||
34 | drivers. Why can existing Linux network device interface not be used for | ||
35 | this purpose? Well, it can. However, there are a few key differences between | ||
36 | a typical network interface (e.g. Ethernet) and a WAN link. | ||
37 | |||
38 | Many WAN protocols, such as X.25 and frame relay, allow for multiple logical | ||
39 | connections (known as `virtual circuits' in X.25 terminology) over a single | ||
40 | physical link. Each such virtual circuit may (and almost always does) lead | ||
41 | to a different geographical location and, therefore, different network. As a | ||
42 | result, it is the virtual circuit, not the physical link, that represents a | ||
43 | route and, therefore, a network interface in Linux terms. | ||
44 | |||
45 | To further complicate things, virtual circuits are usually volatile in nature | ||
46 | (excluding so called `permanent' virtual circuits or PVCs). With almost no | ||
47 | time required to set up and tear down a virtual circuit, it is highly desirable | ||
48 | to implement on-demand connections in order to minimize network charges. So | ||
49 | unlike a typical network driver, the WAN driver must be able to handle multiple | ||
50 | network interfaces and cope as multiple virtual circuits come into existence | ||
51 | and go away dynamically. | ||
52 | |||
53 | Last, but not least, WAN configuration is much more complex than that of say | ||
54 | Ethernet and may well amount to several dozens of parameters. Some of them | ||
55 | are "link-wide" while others are virtual circuit-specific. The same holds | ||
56 | true for WAN statistics which is by far more extensive and extremely useful | ||
57 | when troubleshooting WAN connections. Extending the ifconfig utility to suit | ||
58 | these needs may be possible, but does not seem quite reasonable. Therefore, a | ||
59 | WAN configuration utility and corresponding application programmer's interface | ||
60 | is needed for this purpose. | ||
61 | |||
62 | Most of these problems are taken care of by this module. Its goal is to | ||
63 | provide a user with more-or-less standard look and feel for all WAN devices and | ||
64 | assist a WAN device driver writer by providing common services, such as: | ||
65 | |||
66 | o User-level interface via /proc file system | ||
67 | o Centralized configuration | ||
68 | o Device management (setup, shutdown, etc.) | ||
69 | o Network interface management (dynamic creation/destruction) | ||
70 | o Protocol encapsulation/decapsulation | ||
71 | |||
72 | To ba able to use the Linux WAN Router you will also need a WAN Tools package | ||
73 | available from | ||
74 | |||
75 | ftp.sangoma.com/pub/linux/current_wanpipe/wanpipe-X.Y.Z.tgz | ||
76 | |||
77 | where vX.Y.Z represent the wanpipe version number. | ||
78 | |||
79 | For technical questions and/or comments please e-mail to ncorbic@sangoma.com. | ||
80 | For general inquiries please contact Sangoma Technologies Inc. by | ||
81 | |||
82 | Hotline: 1-800-388-2475 (USA and Canada, toll free) | ||
83 | Phone: (905) 474-1990 ext: 106 | ||
84 | Fax: (905) 474-9223 | ||
85 | E-mail: dm@sangoma.com (David Mandelstam) | ||
86 | WWW: http://www.sangoma.com | ||
87 | |||
88 | |||
89 | INSTALLATION | ||
90 | |||
91 | Please read the WanpipeForLinux.pdf manual on how to | ||
92 | install the WANPIPE tools and drivers properly. | ||
93 | |||
94 | |||
95 | After installing wanpipe package: /usr/local/wanrouter/doc. | ||
96 | On the ftp.sangoma.com : /linux/current_wanpipe/doc | ||
97 | |||
98 | |||
99 | COPYRIGHT AND LICENSING INFORMATION | ||
100 | |||
101 | This program is free software; you can redistribute it and/or modify it under | ||
102 | the terms of the GNU General Public License as published by the Free Software | ||
103 | Foundation; either version 2, or (at your option) any later version. | ||
104 | |||
105 | This program is distributed in the hope that it will be useful, but WITHOUT | ||
106 | ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS | ||
107 | FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. | ||
108 | |||
109 | You should have received a copy of the GNU General Public License along with | ||
110 | this program; if not, write to the Free Software Foundation, Inc., 675 Mass | ||
111 | Ave, Cambridge, MA 02139, USA. | ||
112 | |||
113 | |||
114 | |||
115 | ACKNOWLEDGEMENTS | ||
116 | |||
117 | This product is based on the WANPIPE(tm) Multiprotocol WAN Router developed | ||
118 | by Sangoma Technologies Inc. for Linux 2.0.x and 2.2.x. Success of the WANPIPE | ||
119 | together with the next major release of Linux kernel in summer 1996 commanded | ||
120 | adequate changes to the WANPIPE code to take full advantage of new Linux | ||
121 | features. | ||
122 | |||
123 | Instead of continuing developing proprietary interface tied to Sangoma WAN | ||
124 | cards, we decided to separate all hardware-independent code into a separate | ||
125 | module and defined two levels of interfaces - one for user-level applications | ||
126 | and another for kernel-level WAN drivers. WANPIPE is now implemented as a | ||
127 | WAN driver compliant with the WAN Link Driver interface. Also a general | ||
128 | purpose WAN configuration utility and a set of shell scripts was developed to | ||
129 | support WAN router at the user level. | ||
130 | |||
131 | Many useful ideas concerning hardware-independent interface implementation | ||
132 | were given by Mike McLagan <mike.mclagan@linux.org> and his implementation | ||
133 | of the Frame Relay router and drivers for Sangoma cards (dlci/sdla). | ||
134 | |||
135 | With the new implementation of the APIs being incorporated into the WANPIPE, | ||
136 | a special thank goes to Alan Cox in providing insight into BSD sockets. | ||
137 | |||
138 | Special thanks to all the WANPIPE users who performed field-testing, reported | ||
139 | bugs and made valuable comments and suggestions that help us to improve this | ||
140 | product. | ||
141 | |||
142 | |||
143 | |||
144 | NEW IN THIS RELEASE | ||
145 | |||
146 | o Updated the WANCFG utility | ||
147 | Calls the pppconfig to configure the PPPD | ||
148 | for async connections. | ||
149 | |||
150 | o Added the PPPCONFIG utility | ||
151 | Used to configure the PPPD dameon for the | ||
152 | WANPIPE Async PPP and standard serial port. | ||
153 | The wancfg calls the pppconfig to configure | ||
154 | the pppd. | ||
155 | |||
156 | o Fixed the PCI autodetect feature. | ||
157 | The SLOT 0 was used as an autodetect option | ||
158 | however, some high end PC's slot numbers start | ||
159 | from 0. | ||
160 | |||
161 | o This release has been tested with the new backupd | ||
162 | daemon release. | ||
163 | |||
164 | |||
165 | PRODUCT COMPONENTS AND RELATED FILES | ||
166 | |||
167 | /etc: (or user defined) | ||
168 | wanpipe1.conf default router configuration file | ||
169 | |||
170 | /lib/modules/X.Y.Z/misc: | ||
171 | wanrouter.o router kernel loadable module | ||
172 | af_wanpipe.o wanpipe api socket module | ||
173 | |||
174 | /lib/modules/X.Y.Z/net: | ||
175 | sdladrv.o Sangoma SDLA support module | ||
176 | wanpipe.o Sangoma WANPIPE(tm) driver module | ||
177 | |||
178 | /proc/net/wanrouter | ||
179 | Config reads current router configuration | ||
180 | Status reads current router status | ||
181 | {name} reads WAN driver statistics | ||
182 | |||
183 | /usr/sbin: | ||
184 | wanrouter wanrouter start-up script | ||
185 | wanconfig wanrouter configuration utility | ||
186 | sdladump WANPIPE adapter memory dump utility | ||
187 | fpipemon Monitor for Frame Relay | ||
188 | cpipemon Monitor for Cisco HDLC | ||
189 | ppipemon Monitor for PPP | ||
190 | xpipemon Monitor for X25 | ||
191 | wpkbdmon WANPIPE keyboard led monitor/debugger | ||
192 | |||
193 | /usr/local/wanrouter: | ||
194 | README this file | ||
195 | COPYING GNU General Public License | ||
196 | Setup installation script | ||
197 | Filelist distribution definition file | ||
198 | wanrouter.rc meta-configuration file | ||
199 | (used by the Setup and wanrouter script) | ||
200 | |||
201 | /usr/local/wanrouter/doc: | ||
202 | wanpipeForLinux.pdf WAN Router User's Manual | ||
203 | |||
204 | /usr/local/wanrouter/patches: | ||
205 | wanrouter-v2213.gz patch for Linux kernels 2.2.11 up to 2.2.13. | ||
206 | wanrouter-v2214.gz patch for Linux kernel 2.2.14. | ||
207 | wanrouter-v2215.gz patch for Linux kernels 2.2.15 to 2.2.17. | ||
208 | wanrouter-v2218.gz patch for Linux kernels 2.2.18 and up. | ||
209 | wanrouter-v240.gz patch for Linux kernel 2.4.0. | ||
210 | wanrouter-v242.gz patch for Linux kernel 2.4.2 and up. | ||
211 | wanrouter-v2034.gz patch for Linux kernel 2.0.34 | ||
212 | wanrouter-v2036.gz patch for Linux kernel 2.0.36 and up. | ||
213 | |||
214 | /usr/local/wanrouter/patches/kdrivers: | ||
215 | Sources of the latest WANPIPE device drivers. | ||
216 | These are used to UPGRADE the linux kernel to the newest | ||
217 | version if the kernel source has already been pathced with | ||
218 | WANPIPE drivers. | ||
219 | |||
220 | /usr/local/wanrouter/samples: | ||
221 | interface sample interface configuration file | ||
222 | wanpipe1.cpri CHDLC primary port | ||
223 | wanpipe2.csec CHDLC secondary port | ||
224 | wanpipe1.fr Frame Relay protocol | ||
225 | wanpipe1.ppp PPP protocol ) | ||
226 | wanpipe1.asy CHDLC ASYNC protocol | ||
227 | wanpipe1.x25 X25 protocol | ||
228 | wanpipe1.stty Sync TTY driver (Used by Kernel PPPD daemon) | ||
229 | wanpipe1.atty Async TTY driver (Used by Kernel PPPD daemon) | ||
230 | wanrouter.rc sample meta-configuration file | ||
231 | |||
232 | /usr/local/wanrouter/util: | ||
233 | * wan-tools utilities source code | ||
234 | |||
235 | /usr/local/wanrouter/api/x25: | ||
236 | * x25 api sample programs. | ||
237 | /usr/local/wanrouter/api/chdlc: | ||
238 | * chdlc api sample programs. | ||
239 | /usr/local/wanrouter/api/fr: | ||
240 | * fr api sample programs. | ||
241 | /usr/local/wanrouter/config/wancfg: | ||
242 | wancfg WANPIPE GUI configuration program. | ||
243 | Creates wanpipe#.conf files. | ||
244 | /usr/local/wanrouter/config/cfgft1: | ||
245 | cfgft1 GUI CSU/DSU configuration program. | ||
246 | |||
247 | /usr/include/linux: | ||
248 | wanrouter.h router API definitions | ||
249 | wanpipe.h WANPIPE API definitions | ||
250 | sdladrv.h SDLA support module API definitions | ||
251 | sdlasfm.h SDLA firmware module definitions | ||
252 | if_wanpipe.h WANPIPE Socket definitions | ||
253 | if_wanpipe_common.h WANPIPE Socket/Driver common definitions. | ||
254 | sdlapci.h WANPIPE PCI definitions | ||
255 | |||
256 | |||
257 | /usr/src/linux/net/wanrouter: | ||
258 | * wanrouter source code | ||
259 | |||
260 | /var/log: | ||
261 | wanrouter wanrouter start-up log (created by the Setup script) | ||
262 | |||
263 | /var/lock: (or /var/lock/subsys for RedHat) | ||
264 | wanrouter wanrouter lock file (created by the Setup script) | ||
265 | |||
266 | /usr/local/wanrouter/firmware: | ||
267 | fr514.sfm Frame relay firmware for Sangoma S508/S514 card | ||
268 | cdual514.sfm Dual Port Cisco HDLC firmware for Sangoma S508/S514 card | ||
269 | ppp514.sfm PPP Firmware for Sangoma S508 and S514 cards | ||
270 | x25_508.sfm X25 Firmware for Sangoma S508 card. | ||
271 | |||
272 | |||
273 | REVISION HISTORY | ||
274 | |||
275 | 1.0.0 December 31, 1996 Initial version | ||
276 | |||
277 | 1.0.1 January 30, 1997 Status and statistics can be read via /proc | ||
278 | filesystem entries. | ||
279 | |||
280 | 1.0.2 April 30, 1997 Added UDP management via monitors. | ||
281 | |||
282 | 1.0.3 June 3, 1997 UDP management for multiple boards using Frame | ||
283 | Relay and PPP | ||
284 | Enabled continuous transmission of Configure | ||
285 | Request Packet for PPP (for 508 only) | ||
286 | Connection Timeout for PPP changed from 900 to 0 | ||
287 | Flow Control Problem fixed for Frame Relay | ||
288 | |||
289 | 1.0.4 July 10, 1997 S508/FT1 monitoring capability in fpipemon and | ||
290 | ppipemon utilities. | ||
291 | Configurable TTL for UDP packets. | ||
292 | Multicast and Broadcast IP source addresses are | ||
293 | silently discarded. | ||
294 | |||
295 | 1.0.5 July 28, 1997 Configurable T391,T392,N391,N392,N393 for Frame | ||
296 | Relay in router.conf. | ||
297 | Configurable Memory Address through router.conf | ||
298 | for Frame Relay, PPP and X.25. (commenting this | ||
299 | out enables auto-detection). | ||
300 | Fixed freeing up received buffers using kfree() | ||
301 | for Frame Relay and X.25. | ||
302 | Protect sdla_peek() by calling save_flags(), | ||
303 | cli() and restore_flags(). | ||
304 | Changed number of Trace elements from 32 to 20 | ||
305 | Added DLCI specific data monitoring in FPIPEMON. | ||
306 | 2.0.0 Nov 07, 1997 Implemented protection of RACE conditions by | ||
307 | critical flags for FRAME RELAY and PPP. | ||
308 | DLCI List interrupt mode implemented. | ||
309 | IPX support in FRAME RELAY and PPP. | ||
310 | IPX Server Support (MARS) | ||
311 | More driver specific stats included in FPIPEMON | ||
312 | and PIPEMON. | ||
313 | |||
314 | 2.0.1 Nov 28, 1997 Bug Fixes for version 2.0.0. | ||
315 | Protection of "enable_irq()" while | ||
316 | "disable_irq()" has been enabled from any other | ||
317 | routine (for Frame Relay, PPP and X25). | ||
318 | Added additional Stats for Fpipemon and Ppipemon | ||
319 | Improved Load Sharing for multiple boards | ||
320 | |||
321 | 2.0.2 Dec 09, 1997 Support for PAP and CHAP for ppp has been | ||
322 | implemented. | ||
323 | |||
324 | 2.0.3 Aug 15, 1998 New release supporting Cisco HDLC, CIR for Frame | ||
325 | relay, Dynamic IP assignment for PPP and Inverse | ||
326 | Arp support for Frame-relay. Man Pages are | ||
327 | included for better support and a new utility | ||
328 | for configuring FT1 cards. | ||
329 | |||
330 | 2.0.4 Dec 09, 1998 Dual Port support for Cisco HDLC. | ||
331 | Support for HDLC (LAPB) API. | ||
332 | Supports BiSync Streaming code for S502E | ||
333 | and S503 cards. | ||
334 | Support for Streaming HDLC API. | ||
335 | Provides a BSD socket interface for | ||
336 | creating applications using BiSync | ||
337 | streaming. | ||
338 | |||
339 | 2.0.5 Aug 04, 1999 CHDLC initializatin bug fix. | ||
340 | PPP interrupt driven driver: | ||
341 | Fix to the PPP line hangup problem. | ||
342 | New PPP firmware | ||
343 | Added comments to the startup SYSTEM ERROR messages | ||
344 | Xpipemon debugging application for the X25 protocol | ||
345 | New USER_MANUAL.txt | ||
346 | Fixed the odd boundary 4byte writes to the board. | ||
347 | BiSync Streaming code has been taken out. | ||
348 | Available as a patch. | ||
349 | Streaming HDLC API has been taken out. | ||
350 | Available as a patch. | ||
351 | |||
352 | 2.0.6 Aug 17, 1999 Increased debugging in statup scripts | ||
353 | Fixed insallation bugs from 2.0.5 | ||
354 | Kernel patch works for both 2.2.10 and 2.2.11 kernels. | ||
355 | There is no functional difference between the two packages | ||
356 | |||
357 | 2.0.7 Aug 26, 1999 o Merged X25API code into WANPIPE. | ||
358 | o Fixed a memeory leak for X25API | ||
359 | o Updated the X25API code for 2.2.X kernels. | ||
360 | o Improved NEM handling. | ||
361 | |||
362 | 2.1.0 Oct 25, 1999 o New code for S514 PCI Card | ||
363 | o New CHDLC and Frame Relay drivers | ||
364 | o PPP and X25 are not supported in this release | ||
365 | |||
366 | 2.1.1 Nov 30, 1999 o PPP support for S514 PCI Cards | ||
367 | |||
368 | 2.1.3 Apr 06, 2000 o Socket based x25api | ||
369 | o Socket based chdlc api | ||
370 | o Socket based fr api | ||
371 | o Dual Port Receive only CHDLC support. | ||
372 | o Asynchronous CHDLC support (Secondary Port) | ||
373 | o cfgft1 GUI csu/dsu configurator | ||
374 | o wancfg GUI configuration file | ||
375 | configurator. | ||
376 | o Architectual directory changes. | ||
377 | |||
378 | beta-2.1.4 Jul 2000 o Dynamic interface configuration: | ||
379 | Network interfaces reflect the state | ||
380 | of protocol layer. If the protocol becomes | ||
381 | disconnected, driver will bring down | ||
382 | the interface. Once the protocol reconnects | ||
383 | the interface will be brought up. | ||
384 | |||
385 | Note: This option is turned off by default. | ||
386 | |||
387 | o Dynamic wanrouter setup using 'wanconfig': | ||
388 | wanconfig utility can be used to | ||
389 | shutdown,restart,start or reconfigure | ||
390 | a virtual circuit dynamically. | ||
391 | |||
392 | Frame Relay: Each DLCI can be: | ||
393 | created,stopped,restarted and reconfigured | ||
394 | dynamically using wanconfig. | ||
395 | |||
396 | ex: wanconfig card wanpipe1 dev wp1_fr16 up | ||
397 | |||
398 | o Wanrouter startup via command line arguments: | ||
399 | wanconfig also supports wanrouter startup via command line | ||
400 | arguments. Thus, there is no need to create a wanpipe#.conf | ||
401 | configuration file. | ||
402 | |||
403 | o Socket based x25api update/bug fixes. | ||
404 | Added support for LCN numbers greater than 255. | ||
405 | Option to pass up modem messages. | ||
406 | Provided a PCI IRQ check, so a single S514 | ||
407 | card is guaranteed to have a non-sharing interrupt. | ||
408 | |||
409 | o Fixes to the wancfg utility. | ||
410 | o New FT1 debugging support via *pipemon utilities. | ||
411 | o Frame Relay ARP support Enabled. | ||
412 | |||
413 | beta3-2.1.4 Jul 2000 o X25 M_BIT Problem fix. | ||
414 | o Added the Multi-Port PPP | ||
415 | Updated utilites for the Multi-Port PPP. | ||
416 | |||
417 | 2.1.4 Aut 2000 | ||
418 | o In X25API: | ||
419 | Maximum packet an application can send | ||
420 | to the driver has been extended to 4096 bytes. | ||
421 | |||
422 | Fixed the x25 startup bug. Enable | ||
423 | communications only after all interfaces | ||
424 | come up. HIGH SVC/PVC is used to calculate | ||
425 | the number of channels. | ||
426 | Enable protocol only after all interfaces | ||
427 | are enabled. | ||
428 | |||
429 | o Added an extra state to the FT1 config, kernel module. | ||
430 | o Updated the pipemon debuggers. | ||
431 | |||
432 | o Blocked the Multi-Port PPP from running on kernels | ||
433 | 2.2.16 or greater, due to syncppp kernel module | ||
434 | change. | ||
435 | |||
436 | beta1-2.1.5 Nov 15 2000 | ||
437 | o Fixed the MulitPort PPP Support for kernels 2.2.16 and above. | ||
438 | 2.2.X kernels only | ||
439 | |||
440 | o Secured the driver UDP debugging calls | ||
441 | - All illegal netowrk debugging calls are reported to | ||
442 | the log. | ||
443 | - Defined a set of allowed commands, all other denied. | ||
444 | |||
445 | o Cpipemon | ||
446 | - Added set FT1 commands to the cpipemon. Thus CSU/DSU | ||
447 | configuraiton can be performed using cpipemon. | ||
448 | All systems that cannot run cfgft1 GUI utility should | ||
449 | use cpipemon to configure the on board CSU/DSU. | ||
450 | |||
451 | |||
452 | o Keyboard Led Monitor/Debugger | ||
453 | - A new utilty /usr/sbin/wpkbdmon uses keyboard leds | ||
454 | to convey operatinal statistic information of the | ||
455 | Sangoma WANPIPE cards. | ||
456 | NUM_LOCK = Line State (On=connected, Off=disconnected) | ||
457 | CAPS_LOCK = Tx data (On=transmitting, Off=no tx data) | ||
458 | SCROLL_LOCK = Rx data (On=receiving, Off=no rx data | ||
459 | |||
460 | o Hardware probe on module load and dynamic device allocation | ||
461 | - During WANPIPE module load, all Sangoma cards are probed | ||
462 | and found information is printed in the /var/log/messages. | ||
463 | - If no cards are found, the module load fails. | ||
464 | - Appropriate number of devices are dynamically loaded | ||
465 | based on the number of Sangoma cards found. | ||
466 | |||
467 | Note: The kernel configuraiton option | ||
468 | CONFIG_WANPIPE_CARDS has been taken out. | ||
469 | |||
470 | o Fixed the Frame Relay and Chdlc network interfaces so they are | ||
471 | compatible with libpcap libraries. Meaning, tcpdump, snort, | ||
472 | ethereal, and all other packet sniffers and debuggers work on | ||
473 | all WANPIPE netowrk interfaces. | ||
474 | - Set the network interface encoding type to ARPHRD_PPP. | ||
475 | This tell the sniffers that data obtained from the | ||
476 | network interface is in pure IP format. | ||
477 | Fix for 2.2.X kernels only. | ||
478 | |||
479 | o True interface encoding option for Frame Relay and CHDLC | ||
480 | - The above fix sets the network interface encoding | ||
481 | type to ARPHRD_PPP, however some customers use | ||
482 | the encoding interface type to determine the | ||
483 | protocol running. Therefore, the TURE ENCODING | ||
484 | option will set the interface type back to the | ||
485 | original value. | ||
486 | |||
487 | NOTE: If this option is used with Frame Relay and CHDLC | ||
488 | libpcap library support will be broken. | ||
489 | i.e. tcpdump will not work. | ||
490 | Fix for 2.2.x Kernels only. | ||
491 | |||
492 | o Ethernet Bridgind over Frame Relay | ||
493 | - The Frame Relay bridging has been developed by | ||
494 | Kristian Hoffmann and Mark Wells. | ||
495 | - The Linux kernel bridge is used to send ethernet | ||
496 | data over the frame relay links. | ||
497 | For 2.2.X Kernels only. | ||
498 | |||
499 | o Added extensive 2.0.X support. Most new features of | ||
500 | 2.1.5 for protocols Frame Relay, PPP and CHDLC are | ||
501 | supported under 2.0.X kernels. | ||
502 | |||
503 | beta1-2.2.0 Dec 30 2000 | ||
504 | o Updated drivers for 2.4.X kernels. | ||
505 | o Updated drivers for SMP support. | ||
506 | o X25API is now able to share PCI interrupts. | ||
507 | o Took out a general polling routine that was used | ||
508 | only by X25API. | ||
509 | o Added appropriate locks to the dynamic reconfiguration | ||
510 | code. | ||
511 | o Fixed a bug in the keyboard debug monitor. | ||
512 | |||
513 | beta2-2.2.0 Jan 8 2001 | ||
514 | o Patches for 2.4.0 kernel | ||
515 | o Patches for 2.2.18 kernel | ||
516 | o Minor updates to PPP and CHLDC drivers. | ||
517 | Note: No functinal difference. | ||
518 | |||
519 | beta3-2.2.9 Jan 10 2001 | ||
520 | o I missed the 2.2.18 kernel patches in beta2-2.2.0 | ||
521 | release. They are included in this release. | ||
522 | |||
523 | Stable Release | ||
524 | 2.2.0 Feb 01 2001 | ||
525 | o Bug fix in wancfg GUI configurator. | ||
526 | The edit function didn't work properly. | ||
527 | |||
528 | |||
529 | bata1-2.2.1 Feb 09 2001 | ||
530 | o WANPIPE TTY Driver emulation. | ||
531 | Two modes of operation Sync and Async. | ||
532 | Sync: Using the PPPD daemon, kernel SyncPPP layer | ||
533 | and the Wanpipe sync TTY driver: a PPP protocol | ||
534 | connection can be established via Sangoma adapter, over | ||
535 | a T1 leased line. | ||
536 | |||
537 | The 2.4.0 kernel PPP layer supports MULTILINK | ||
538 | protocol, that can be used to bundle any number of Sangoma | ||
539 | adapters (T1 lines) into one, under a single IP address. | ||
540 | Thus, efficiently obtaining multiple T1 throughput. | ||
541 | |||
542 | NOTE: The remote side must also implement MULTILINK PPP | ||
543 | protocol. | ||
544 | |||
545 | Async:Using the PPPD daemon, kernel AsyncPPP layer | ||
546 | and the WANPIPE async TTY driver: a PPP protocol | ||
547 | connection can be established via Sangoma adapter and | ||
548 | a modem, over a telephone line. | ||
549 | |||
550 | Thus, the WANPIPE async TTY driver simulates a serial | ||
551 | TTY driver that would normally be used to interface the | ||
552 | MODEM to the linux kernel. | ||
553 | |||
554 | o WANPIPE PPP Backup Utility | ||
555 | This utility will monitor the state of the PPP T1 line. | ||
556 | In case of failure, a dial up connection will be established | ||
557 | via pppd daemon, ether via a serial tty driver (serial port), | ||
558 | or a WANPIPE async TTY driver (in case serial port is unavailable). | ||
559 | |||
560 | Furthermore, while in dial up mode, the primary PPP T1 link | ||
561 | will be monitored for signs of life. | ||
562 | |||
563 | If the PPP T1 link comes back to life, the dial up connection | ||
564 | will be shutdown and T1 line re-established. | ||
565 | |||
566 | |||
567 | o New Setup installation script. | ||
568 | Option to UPGRADE device drivers if the kernel source has | ||
569 | already been patched with WANPIPE. | ||
570 | |||
571 | Option to COMPILE WANPIPE modules against the currently | ||
572 | running kernel, thus no need for manual kernel and module | ||
573 | re-compilatin. | ||
574 | |||
575 | o Updates and Bug Fixes to wancfg utility. | ||
576 | |||
577 | bata2-2.2.1 Feb 20 2001 | ||
578 | |||
579 | o Bug fixes to the CHDLC device drivers. | ||
580 | The driver had compilation problems under kernels | ||
581 | 2.2.14 or lower. | ||
582 | |||
583 | o Bug fixes to the Setup installation script. | ||
584 | The device drivers compilation options didn't work | ||
585 | properly. | ||
586 | |||
587 | o Update to the wpbackupd daemon. | ||
588 | Optimized the cross-over times, between the primary | ||
589 | link and the backup dialup. | ||
590 | |||
591 | beta3-2.2.1 Mar 02 2001 | ||
592 | o Patches for 2.4.2 kernel. | ||
593 | |||
594 | o Bug fixes to util/ make files. | ||
595 | o Bug fixes to the Setup installation script. | ||
596 | |||
597 | o Took out the backupd support and made it into | ||
598 | as separate package. | ||
599 | |||
600 | beta4-2.2.1 Mar 12 2001 | ||
601 | |||
602 | o Fix to the Frame Relay Device driver. | ||
603 | IPSAC sends a packet of zero length | ||
604 | header to the frame relay driver. The | ||
605 | driver tries to push its own 2 byte header | ||
606 | into the packet, which causes the driver to | ||
607 | crash. | ||
608 | |||
609 | o Fix the WANPIPE re-configuration code. | ||
610 | Bug was found by trying to run the cfgft1 while the | ||
611 | interface was already running. | ||
612 | |||
613 | o Updates to cfgft1. | ||
614 | Writes a wanpipe#.cfgft1 configuration file | ||
615 | once the CSU/DSU is configured. This file can | ||
616 | holds the current CSU/DSU configuration. | ||
617 | |||
618 | |||
619 | |||
620 | >>>>>> END OF README <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< | ||
621 | |||
622 | |||