diff options
author | Jay Vosburgh <fubar@us.ibm.com> | 2005-07-21 15:18:02 -0400 |
---|---|---|
committer | Jeff Garzik <jgarzik@pobox.com> | 2005-07-31 00:46:48 -0400 |
commit | 00354cfb92bd819a0d09d3ef9988e509b6675fdd (patch) | |
tree | f1e338c59bd5d5e5af223cccc476e42826a67d21 /Documentation/networking | |
parent | f2e1e47d14aae1f33e98cf8d9ea93e2fe9e2f521 (diff) |
[PATCH] bonding: documentation update
Contains general updates (additional configuration info, hopefully
better examples, updated some out of date info, and a bonus pass
through ispell to banish the "paramters.") and info specific to
gratuitous ARP and xmit policy functionality already in 2.6.13-rc2.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/bonding.txt | 978 |
1 files changed, 677 insertions, 301 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 0bc2ed136a38..24d029455baa 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt | |||
@@ -1,5 +1,7 @@ | |||
1 | 1 | ||
2 | Linux Ethernet Bonding Driver HOWTO | 2 | Linux Ethernet Bonding Driver HOWTO |
3 | |||
4 | Latest update: 21 June 2005 | ||
3 | 5 | ||
4 | Initial release : Thomas Davis <tadavis at lbl.gov> | 6 | Initial release : Thomas Davis <tadavis at lbl.gov> |
5 | Corrections, HA extensions : 2000/10/03-15 : | 7 | Corrections, HA extensions : 2000/10/03-15 : |
@@ -11,15 +13,22 @@ Corrections, HA extensions : 2000/10/03-15 : | |||
11 | 13 | ||
12 | Reorganized and updated Feb 2005 by Jay Vosburgh | 14 | Reorganized and updated Feb 2005 by Jay Vosburgh |
13 | 15 | ||
14 | Note : | 16 | Introduction |
15 | ------ | 17 | ============ |
18 | |||
19 | The Linux bonding driver provides a method for aggregating | ||
20 | multiple network interfaces into a single logical "bonded" interface. | ||
21 | The behavior of the bonded interfaces depends upon the mode; generally | ||
22 | speaking, modes provide either hot standby or load balancing services. | ||
23 | Additionally, link integrity monitoring may be performed. | ||
16 | 24 | ||
17 | The bonding driver originally came from Donald Becker's beowulf patches for | 25 | The bonding driver originally came from Donald Becker's |
18 | kernel 2.0. It has changed quite a bit since, and the original tools from | 26 | beowulf patches for kernel 2.0. It has changed quite a bit since, and |
19 | extreme-linux and beowulf sites will not work with this version of the driver. | 27 | the original tools from extreme-linux and beowulf sites will not work |
28 | with this version of the driver. | ||
20 | 29 | ||
21 | For new versions of the driver, patches for older kernels and the updated | 30 | For new versions of the driver, updated userspace tools, and |
22 | userspace tools, please follow the links at the end of this file. | 31 | who to ask for help, please follow the links at the end of this file. |
23 | 32 | ||
24 | Table of Contents | 33 | Table of Contents |
25 | ================= | 34 | ================= |
@@ -30,9 +39,13 @@ Table of Contents | |||
30 | 39 | ||
31 | 3. Configuring Bonding Devices | 40 | 3. Configuring Bonding Devices |
32 | 3.1 Configuration with sysconfig support | 41 | 3.1 Configuration with sysconfig support |
42 | 3.1.1 Using DHCP with sysconfig | ||
43 | 3.1.2 Configuring Multiple Bonds with sysconfig | ||
33 | 3.2 Configuration with initscripts support | 44 | 3.2 Configuration with initscripts support |
45 | 3.2.1 Using DHCP with initscripts | ||
46 | 3.2.2 Configuring Multiple Bonds with initscripts | ||
34 | 3.3 Configuring Bonding Manually | 47 | 3.3 Configuring Bonding Manually |
35 | 3.4 Configuring Multiple Bonds | 48 | 3.3.1 Configuring Multiple Bonds Manually |
36 | 49 | ||
37 | 5. Querying Bonding Configuration | 50 | 5. Querying Bonding Configuration |
38 | 5.1 Bonding Configuration | 51 | 5.1 Bonding Configuration |
@@ -56,21 +69,30 @@ Table of Contents | |||
56 | 69 | ||
57 | 11. Promiscuous mode | 70 | 11. Promiscuous mode |
58 | 71 | ||
59 | 12. High Availability Information | 72 | 12. Configuring Bonding for High Availability |
60 | 12.1 High Availability in a Single Switch Topology | 73 | 12.1 High Availability in a Single Switch Topology |
61 | 12.1.1 Bonding Mode Selection for Single Switch Topology | ||
62 | 12.1.2 Link Monitoring for Single Switch Topology | ||
63 | 12.2 High Availability in a Multiple Switch Topology | 74 | 12.2 High Availability in a Multiple Switch Topology |
64 | 12.2.1 Bonding Mode Selection for Multiple Switch Topology | 75 | 12.2.1 HA Bonding Mode Selection for Multiple Switch Topology |
65 | 12.2.2 Link Monitoring for Multiple Switch Topology | 76 | 12.2.2 HA Link Monitoring for Multiple Switch Topology |
66 | 12.3 Switch Behavior Issues for High Availability | 77 | |
78 | 13. Configuring Bonding for Maximum Throughput | ||
79 | 13.1 Maximum Throughput in a Single Switch Topology | ||
80 | 13.1.1 MT Bonding Mode Selection for Single Switch Topology | ||
81 | 13.1.2 MT Link Monitoring for Single Switch Topology | ||
82 | 13.2 Maximum Throughput in a Multiple Switch Topology | ||
83 | 13.2.1 MT Bonding Mode Selection for Multiple Switch Topology | ||
84 | 13.2.2 MT Link Monitoring for Multiple Switch Topology | ||
67 | 85 | ||
68 | 13. Hardware Specific Considerations | 86 | 14. Switch Behavior Issues |
69 | 13.1 IBM BladeCenter | 87 | 14.1 Link Establishment and Failover Delays |
88 | 14.2 Duplicated Incoming Packets | ||
70 | 89 | ||
71 | 14. Frequently Asked Questions | 90 | 15. Hardware Specific Considerations |
91 | 15.1 IBM BladeCenter | ||
72 | 92 | ||
73 | 15. Resources and Links | 93 | 16. Frequently Asked Questions |
94 | |||
95 | 17. Resources and Links | ||
74 | 96 | ||
75 | 97 | ||
76 | 1. Bonding Driver Installation | 98 | 1. Bonding Driver Installation |
@@ -86,16 +108,10 @@ the following steps: | |||
86 | 1.1 Configure and build the kernel with bonding | 108 | 1.1 Configure and build the kernel with bonding |
87 | ----------------------------------------------- | 109 | ----------------------------------------------- |
88 | 110 | ||
89 | The latest version of the bonding driver is available in the | 111 | The current version of the bonding driver is available in the |
90 | drivers/net/bonding subdirectory of the most recent kernel source | 112 | drivers/net/bonding subdirectory of the most recent kernel source |
91 | (which is available on http://kernel.org). | 113 | (which is available on http://kernel.org). Most users "rolling their |
92 | 114 | own" will want to use the most recent kernel from kernel.org. | |
93 | Prior to the 2.4.11 kernel, the bonding driver was maintained | ||
94 | largely outside the kernel tree; patches for some earlier kernels are | ||
95 | available on the bonding sourceforge site, although those patches are | ||
96 | still several years out of date. Most users will want to use either | ||
97 | the most recent kernel from kernel.org or whatever kernel came with | ||
98 | their distro. | ||
99 | 115 | ||
100 | Configure kernel with "make menuconfig" (or "make xconfig" or | 116 | Configure kernel with "make menuconfig" (or "make xconfig" or |
101 | "make config"), then select "Bonding driver support" in the "Network | 117 | "make config"), then select "Bonding driver support" in the "Network |
@@ -103,8 +119,8 @@ device support" section. It is recommended that you configure the | |||
103 | driver as module since it is currently the only way to pass parameters | 119 | driver as module since it is currently the only way to pass parameters |
104 | to the driver or configure more than one bonding device. | 120 | to the driver or configure more than one bonding device. |
105 | 121 | ||
106 | Build and install the new kernel and modules, then proceed to | 122 | Build and install the new kernel and modules, then continue |
107 | step 2. | 123 | below to install ifenslave. |
108 | 124 | ||
109 | 1.2 Install ifenslave Control Utility | 125 | 1.2 Install ifenslave Control Utility |
110 | ------------------------------------- | 126 | ------------------------------------- |
@@ -147,9 +163,9 @@ default kernel source include directory. | |||
147 | Options for the bonding driver are supplied as parameters to | 163 | Options for the bonding driver are supplied as parameters to |
148 | the bonding module at load time. They may be given as command line | 164 | the bonding module at load time. They may be given as command line |
149 | arguments to the insmod or modprobe command, but are usually specified | 165 | arguments to the insmod or modprobe command, but are usually specified |
150 | in either the /etc/modprobe.conf configuration file, or in a | 166 | in either the /etc/modules.conf or /etc/modprobe.conf configuration |
151 | distro-specific configuration file (some of which are detailed in the | 167 | file, or in a distro-specific configuration file (some of which are |
152 | next section). | 168 | detailed in the next section). |
153 | 169 | ||
154 | The available bonding driver parameters are listed below. If a | 170 | The available bonding driver parameters are listed below. If a |
155 | parameter is not specified the default value is used. When initially | 171 | parameter is not specified the default value is used. When initially |
@@ -162,34 +178,34 @@ degradation will occur during link failures. Very few devices do not | |||
162 | support at least miimon, so there is really no reason not to use it. | 178 | support at least miimon, so there is really no reason not to use it. |
163 | 179 | ||
164 | Options with textual values will accept either the text name | 180 | Options with textual values will accept either the text name |
165 | or, for backwards compatibility, the option value. E.g., | 181 | or, for backwards compatibility, the option value. E.g., |
166 | "mode=802.3ad" and "mode=4" set the same mode. | 182 | "mode=802.3ad" and "mode=4" set the same mode. |
167 | 183 | ||
168 | The parameters are as follows: | 184 | The parameters are as follows: |
169 | 185 | ||
170 | arp_interval | 186 | arp_interval |
171 | 187 | ||
172 | Specifies the ARP monitoring frequency in milli-seconds. If | 188 | Specifies the ARP link monitoring frequency in milliseconds. |
173 | ARP monitoring is used in a load-balancing mode (mode 0 or 2), | 189 | If ARP monitoring is used in an etherchannel compatible mode |
174 | the switch should be configured in a mode that evenly | 190 | (modes 0 and 2), the switch should be configured in a mode |
175 | distributes packets across all links - such as round-robin. If | 191 | that evenly distributes packets across all links. If the |
176 | the switch is configured to distribute the packets in an XOR | 192 | switch is configured to distribute the packets in an XOR |
177 | fashion, all replies from the ARP targets will be received on | 193 | fashion, all replies from the ARP targets will be received on |
178 | the same link which could cause the other team members to | 194 | the same link which could cause the other team members to |
179 | fail. ARP monitoring should not be used in conjunction with | 195 | fail. ARP monitoring should not be used in conjunction with |
180 | miimon. A value of 0 disables ARP monitoring. The default | 196 | miimon. A value of 0 disables ARP monitoring. The default |
181 | value is 0. | 197 | value is 0. |
182 | 198 | ||
183 | arp_ip_target | 199 | arp_ip_target |
184 | 200 | ||
185 | Specifies the ip addresses to use when arp_interval is > 0. | 201 | Specifies the IP addresses to use as ARP monitoring peers when |
186 | These are the targets of the ARP request sent to determine the | 202 | arp_interval is > 0. These are the targets of the ARP request |
187 | health of the link to the targets. Specify these values in | 203 | sent to determine the health of the link to the targets. |
188 | ddd.ddd.ddd.ddd format. Multiple ip adresses must be | 204 | Specify these values in ddd.ddd.ddd.ddd format. Multiple IP |
189 | seperated by a comma. At least one IP address must be given | 205 | addresses must be separated by a comma. At least one IP |
190 | for ARP monitoring to function. The maximum number of targets | 206 | address must be given for ARP monitoring to function. The |
191 | that can be specified is 16. The default value is no IP | 207 | maximum number of targets that can be specified is 16. The |
192 | addresses. | 208 | default value is no IP addresses. |
193 | 209 | ||
194 | downdelay | 210 | downdelay |
195 | 211 | ||
@@ -207,11 +223,13 @@ lacp_rate | |||
207 | are: | 223 | are: |
208 | 224 | ||
209 | slow or 0 | 225 | slow or 0 |
210 | Request partner to transmit LACPDUs every 30 seconds (default) | 226 | Request partner to transmit LACPDUs every 30 seconds |
211 | 227 | ||
212 | fast or 1 | 228 | fast or 1 |
213 | Request partner to transmit LACPDUs every 1 second | 229 | Request partner to transmit LACPDUs every 1 second |
214 | 230 | ||
231 | The default is slow. | ||
232 | |||
215 | max_bonds | 233 | max_bonds |
216 | 234 | ||
217 | Specifies the number of bonding devices to create for this | 235 | Specifies the number of bonding devices to create for this |
@@ -221,10 +239,11 @@ max_bonds | |||
221 | 239 | ||
222 | miimon | 240 | miimon |
223 | 241 | ||
224 | Specifies the frequency in milli-seconds that MII link | 242 | Specifies the MII link monitoring frequency in milliseconds. |
225 | monitoring will occur. A value of zero disables MII link | 243 | This determines how often the link state of each slave is |
226 | monitoring. A value of 100 is a good starting point. The | 244 | inspected for link failures. A value of zero disables MII |
227 | use_carrier option, below, affects how the link state is | 245 | link monitoring. A value of 100 is a good starting point. |
246 | The use_carrier option, below, affects how the link state is | ||
228 | determined. See the High Availability section for additional | 247 | determined. See the High Availability section for additional |
229 | information. The default value is 0. | 248 | information. The default value is 0. |
230 | 249 | ||
@@ -246,17 +265,31 @@ mode | |||
246 | active. A different slave becomes active if, and only | 265 | active. A different slave becomes active if, and only |
247 | if, the active slave fails. The bond's MAC address is | 266 | if, the active slave fails. The bond's MAC address is |
248 | externally visible on only one port (network adapter) | 267 | externally visible on only one port (network adapter) |
249 | to avoid confusing the switch. This mode provides | 268 | to avoid confusing the switch. |
250 | fault tolerance. The primary option affects the | 269 | |
251 | behavior of this mode. | 270 | In bonding version 2.6.2 or later, when a failover |
271 | occurs in active-backup mode, bonding will issue one | ||
272 | or more gratuitous ARPs on the newly active slave. | ||
273 | One gratutious ARP is issued for the bonding master | ||
274 | interface and each VLAN interfaces configured above | ||
275 | it, provided that the interface has at least one IP | ||
276 | address configured. Gratuitous ARPs issued for VLAN | ||
277 | interfaces are tagged with the appropriate VLAN id. | ||
278 | |||
279 | This mode provides fault tolerance. The primary | ||
280 | option, documented below, affects the behavior of this | ||
281 | mode. | ||
252 | 282 | ||
253 | balance-xor or 2 | 283 | balance-xor or 2 |
254 | 284 | ||
255 | XOR policy: Transmit based on [(source MAC address | 285 | XOR policy: Transmit based on the selected transmit |
256 | XOR'd with destination MAC address) modulo slave | 286 | hash policy. The default policy is a simple [(source |
257 | count]. This selects the same slave for each | 287 | MAC address XOR'd with destination MAC address) modulo |
258 | destination MAC address. This mode provides load | 288 | slave count]. Alternate transmit policies may be |
259 | balancing and fault tolerance. | 289 | selected via the xmit_hash_policy option, described |
290 | below. | ||
291 | |||
292 | This mode provides load balancing and fault tolerance. | ||
260 | 293 | ||
261 | broadcast or 3 | 294 | broadcast or 3 |
262 | 295 | ||
@@ -270,7 +303,17 @@ mode | |||
270 | duplex settings. Utilizes all slaves in the active | 303 | duplex settings. Utilizes all slaves in the active |
271 | aggregator according to the 802.3ad specification. | 304 | aggregator according to the 802.3ad specification. |
272 | 305 | ||
273 | Pre-requisites: | 306 | Slave selection for outgoing traffic is done according |
307 | to the transmit hash policy, which may be changed from | ||
308 | the default simple XOR policy via the xmit_hash_policy | ||
309 | option, documented below. Note that not all transmit | ||
310 | policies may be 802.3ad compliant, particularly in | ||
311 | regards to the packet mis-ordering requirements of | ||
312 | section 43.2.4 of the 802.3ad standard. Differing | ||
313 | peer implementations will have varying tolerances for | ||
314 | noncompliance. | ||
315 | |||
316 | Prerequisites: | ||
274 | 317 | ||
275 | 1. Ethtool support in the base drivers for retrieving | 318 | 1. Ethtool support in the base drivers for retrieving |
276 | the speed and duplex of each slave. | 319 | the speed and duplex of each slave. |
@@ -333,7 +376,7 @@ mode | |||
333 | 376 | ||
334 | When a link is reconnected or a new slave joins the | 377 | When a link is reconnected or a new slave joins the |
335 | bond the receive traffic is redistributed among all | 378 | bond the receive traffic is redistributed among all |
336 | active slaves in the bond by intiating ARP Replies | 379 | active slaves in the bond by initiating ARP Replies |
337 | with the selected mac address to each of the | 380 | with the selected mac address to each of the |
338 | clients. The updelay parameter (detailed below) must | 381 | clients. The updelay parameter (detailed below) must |
339 | be set to a value equal or greater than the switch's | 382 | be set to a value equal or greater than the switch's |
@@ -396,6 +439,60 @@ use_carrier | |||
396 | 0 will use the deprecated MII / ETHTOOL ioctls. The default | 439 | 0 will use the deprecated MII / ETHTOOL ioctls. The default |
397 | value is 1. | 440 | value is 1. |
398 | 441 | ||
442 | xmit_hash_policy | ||
443 | |||
444 | Selects the transmit hash policy to use for slave selection in | ||
445 | balance-xor and 802.3ad modes. Possible values are: | ||
446 | |||
447 | layer2 | ||
448 | |||
449 | Uses XOR of hardware MAC addresses to generate the | ||
450 | hash. The formula is | ||
451 | |||
452 | (source MAC XOR destination MAC) modulo slave count | ||
453 | |||
454 | This algorithm will place all traffic to a particular | ||
455 | network peer on the same slave. | ||
456 | |||
457 | This algorithm is 802.3ad compliant. | ||
458 | |||
459 | layer3+4 | ||
460 | |||
461 | This policy uses upper layer protocol information, | ||
462 | when available, to generate the hash. This allows for | ||
463 | traffic to a particular network peer to span multiple | ||
464 | slaves, although a single connection will not span | ||
465 | multiple slaves. | ||
466 | |||
467 | The formula for unfragmented TCP and UDP packets is | ||
468 | |||
469 | ((source port XOR dest port) XOR | ||
470 | ((source IP XOR dest IP) AND 0xffff) | ||
471 | modulo slave count | ||
472 | |||
473 | For fragmented TCP or UDP packets and all other IP | ||
474 | protocol traffic, the source and destination port | ||
475 | information is omitted. For non-IP traffic, the | ||
476 | formula is the same as for the layer2 transmit hash | ||
477 | policy. | ||
478 | |||
479 | This policy is intended to mimic the behavior of | ||
480 | certain switches, notably Cisco switches with PFC2 as | ||
481 | well as some Foundry and IBM products. | ||
482 | |||
483 | This algorithm is not fully 802.3ad compliant. A | ||
484 | single TCP or UDP conversation containing both | ||
485 | fragmented and unfragmented packets will see packets | ||
486 | striped across two interfaces. This may result in out | ||
487 | of order delivery. Most traffic types will not meet | ||
488 | this criteria, as TCP rarely fragments traffic, and | ||
489 | most UDP traffic is not involved in extended | ||
490 | conversations. Other implementations of 802.3ad may | ||
491 | or may not tolerate this noncompliance. | ||
492 | |||
493 | The default value is layer2. This option was added in bonding | ||
494 | version 2.6.3. In earlier versions of bonding, this parameter does | ||
495 | not exist, and the layer2 policy is the only policy. | ||
399 | 496 | ||
400 | 497 | ||
401 | 3. Configuring Bonding Devices | 498 | 3. Configuring Bonding Devices |
@@ -448,8 +545,9 @@ Bonding devices can be managed by hand, however, as follows. | |||
448 | slave devices. On SLES 9, this is most easily done by running the | 545 | slave devices. On SLES 9, this is most easily done by running the |
449 | yast2 sysconfig configuration utility. The goal is for to create an | 546 | yast2 sysconfig configuration utility. The goal is for to create an |
450 | ifcfg-id file for each slave device. The simplest way to accomplish | 547 | ifcfg-id file for each slave device. The simplest way to accomplish |
451 | this is to configure the devices for DHCP. The name of the | 548 | this is to configure the devices for DHCP (this is only to get the |
452 | configuration file for each device will be of the form: | 549 | file ifcfg-id file created; see below for some issues with DHCP). The |
550 | name of the configuration file for each device will be of the form: | ||
453 | 551 | ||
454 | ifcfg-id-xx:xx:xx:xx:xx:xx | 552 | ifcfg-id-xx:xx:xx:xx:xx:xx |
455 | 553 | ||
@@ -459,7 +557,7 @@ the device's permanent MAC address. | |||
459 | Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been | 557 | Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been |
460 | created, it is necessary to edit the configuration files for the slave | 558 | created, it is necessary to edit the configuration files for the slave |
461 | devices (the MAC addresses correspond to those of the slave devices). | 559 | devices (the MAC addresses correspond to those of the slave devices). |
462 | Before editing, the file will contain muliple lines, and will look | 560 | Before editing, the file will contain multiple lines, and will look |
463 | something like this: | 561 | something like this: |
464 | 562 | ||
465 | BOOTPROTO='dhcp' | 563 | BOOTPROTO='dhcp' |
@@ -496,16 +594,11 @@ STARTMODE="onboot" | |||
496 | BONDING_MASTER="yes" | 594 | BONDING_MASTER="yes" |
497 | BONDING_MODULE_OPTS="mode=active-backup miimon=100" | 595 | BONDING_MODULE_OPTS="mode=active-backup miimon=100" |
498 | BONDING_SLAVE0="eth0" | 596 | BONDING_SLAVE0="eth0" |
499 | BONDING_SLAVE1="eth1" | 597 | BONDING_SLAVE1="bus-pci-0000:06:08.1" |
500 | 598 | ||
501 | Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK | 599 | Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK |
502 | values with the appropriate values for your network. | 600 | values with the appropriate values for your network. |
503 | 601 | ||
504 | Note that configuring the bonding device with BOOTPROTO='dhcp' | ||
505 | does not work; the scripts attempt to obtain the device address from | ||
506 | DHCP prior to adding any of the slave devices. Without active slaves, | ||
507 | the DHCP requests are not sent to the network. | ||
508 | |||
509 | The STARTMODE specifies when the device is brought online. | 602 | The STARTMODE specifies when the device is brought online. |
510 | The possible values are: | 603 | The possible values are: |
511 | 604 | ||
@@ -531,9 +624,17 @@ for the bonding mode, link monitoring, and so on here. Do not include | |||
531 | the max_bonds bonding parameter; this will confuse the configuration | 624 | the max_bonds bonding parameter; this will confuse the configuration |
532 | system if you have multiple bonding devices. | 625 | system if you have multiple bonding devices. |
533 | 626 | ||
534 | Finally, supply one BONDING_SLAVEn="ethX" for each slave, | 627 | Finally, supply one BONDING_SLAVEn="slave device" for each |
535 | where "n" is an increasing value, one for each slave, and "ethX" is | 628 | slave. where "n" is an increasing value, one for each slave. The |
536 | the name of the slave device (eth0, eth1, etc). | 629 | "slave device" is either an interface name, e.g., "eth0", or a device |
630 | specifier for the network device. The interface name is easier to | ||
631 | find, but the ethN names are subject to change at boot time if, e.g., | ||
632 | a device early in the sequence has failed. The device specifiers | ||
633 | (bus-pci-0000:06:08.1 in the example above) specify the physical | ||
634 | network device, and will not change unless the device's bus location | ||
635 | changes (for example, it is moved from one PCI slot to another). The | ||
636 | example above uses one of each type for demonstration purposes; most | ||
637 | configurations will choose one or the other for all slave devices. | ||
537 | 638 | ||
538 | When all configuration files have been modified or created, | 639 | When all configuration files have been modified or created, |
539 | networking must be restarted for the configuration changes to take | 640 | networking must be restarted for the configuration changes to take |
@@ -544,7 +645,7 @@ effect. This can be accomplished via the following: | |||
544 | Note that the network control script (/sbin/ifdown) will | 645 | Note that the network control script (/sbin/ifdown) will |
545 | remove the bonding module as part of the network shutdown processing, | 646 | remove the bonding module as part of the network shutdown processing, |
546 | so it is not necessary to remove the module by hand if, e.g., the | 647 | so it is not necessary to remove the module by hand if, e.g., the |
547 | module paramters have changed. | 648 | module parameters have changed. |
548 | 649 | ||
549 | Also, at this writing, YaST/YaST2 will not manage bonding | 650 | Also, at this writing, YaST/YaST2 will not manage bonding |
550 | devices (they do not show bonding interfaces on its list of network | 651 | devices (they do not show bonding interfaces on its list of network |
@@ -559,12 +660,37 @@ format can be found in an example ifcfg template file: | |||
559 | Note that the template does not document the various BONDING_ | 660 | Note that the template does not document the various BONDING_ |
560 | settings described above, but does describe many of the other options. | 661 | settings described above, but does describe many of the other options. |
561 | 662 | ||
663 | 3.1.1 Using DHCP with sysconfig | ||
664 | ------------------------------- | ||
665 | |||
666 | Under sysconfig, configuring a device with BOOTPROTO='dhcp' | ||
667 | will cause it to query DHCP for its IP address information. At this | ||
668 | writing, this does not function for bonding devices; the scripts | ||
669 | attempt to obtain the device address from DHCP prior to adding any of | ||
670 | the slave devices. Without active slaves, the DHCP requests are not | ||
671 | sent to the network. | ||
672 | |||
673 | 3.1.2 Configuring Multiple Bonds with sysconfig | ||
674 | ----------------------------------------------- | ||
675 | |||
676 | The sysconfig network initialization system is capable of | ||
677 | handling multiple bonding devices. All that is necessary is for each | ||
678 | bonding instance to have an appropriately configured ifcfg-bondX file | ||
679 | (as described above). Do not specify the "max_bonds" parameter to any | ||
680 | instance of bonding, as this will confuse sysconfig. If you require | ||
681 | multiple bonding devices with identical parameters, create multiple | ||
682 | ifcfg-bondX files. | ||
683 | |||
684 | Because the sysconfig scripts supply the bonding module | ||
685 | options in the ifcfg-bondX file, it is not necessary to add them to | ||
686 | the system /etc/modules.conf or /etc/modprobe.conf configuration file. | ||
687 | |||
562 | 3.2 Configuration with initscripts support | 688 | 3.2 Configuration with initscripts support |
563 | ------------------------------------------ | 689 | ------------------------------------------ |
564 | 690 | ||
565 | This section applies to distros using a version of initscripts | 691 | This section applies to distros using a version of initscripts |
566 | with bonding support, for example, Red Hat Linux 9 or Red Hat | 692 | with bonding support, for example, Red Hat Linux 9 or Red Hat |
567 | Enterprise Linux version 3. On these systems, the network | 693 | Enterprise Linux version 3 or 4. On these systems, the network |
568 | initialization scripts have some knowledge of bonding, and can be | 694 | initialization scripts have some knowledge of bonding, and can be |
569 | configured to control bonding devices. | 695 | configured to control bonding devices. |
570 | 696 | ||
@@ -614,10 +740,11 @@ USERCTL=no | |||
614 | Be sure to change the networking specific lines (IPADDR, | 740 | Be sure to change the networking specific lines (IPADDR, |
615 | NETMASK, NETWORK and BROADCAST) to match your network configuration. | 741 | NETMASK, NETWORK and BROADCAST) to match your network configuration. |
616 | 742 | ||
617 | Finally, it is necessary to edit /etc/modules.conf to load the | 743 | Finally, it is necessary to edit /etc/modules.conf (or |
618 | bonding module when the bond0 interface is brought up. The following | 744 | /etc/modprobe.conf, depending upon your distro) to load the bonding |
619 | sample lines in /etc/modules.conf will load the bonding module, and | 745 | module with your desired options when the bond0 interface is brought |
620 | select its options: | 746 | up. The following lines in /etc/modules.conf (or modprobe.conf) will |
747 | load the bonding module, and select its options: | ||
621 | 748 | ||
622 | alias bond0 bonding | 749 | alias bond0 bonding |
623 | options bond0 mode=balance-alb miimon=100 | 750 | options bond0 mode=balance-alb miimon=100 |
@@ -629,6 +756,33 @@ options for your configuration. | |||
629 | will restart the networking subsystem and your bond link should be now | 756 | will restart the networking subsystem and your bond link should be now |
630 | up and running. | 757 | up and running. |
631 | 758 | ||
759 | 3.2.1 Using DHCP with initscripts | ||
760 | --------------------------------- | ||
761 | |||
762 | Recent versions of initscripts (the version supplied with | ||
763 | Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do | ||
764 | have support for assigning IP information to bonding devices via DHCP. | ||
765 | |||
766 | To configure bonding for DHCP, configure it as described | ||
767 | above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" | ||
768 | and add a line consisting of "TYPE=Bonding". Note that the TYPE value | ||
769 | is case sensitive. | ||
770 | |||
771 | 3.2.2 Configuring Multiple Bonds with initscripts | ||
772 | ------------------------------------------------- | ||
773 | |||
774 | At this writing, the initscripts package does not directly | ||
775 | support loading the bonding driver multiple times, so the process for | ||
776 | doing so is the same as described in the "Configuring Multiple Bonds | ||
777 | Manually" section, below. | ||
778 | |||
779 | NOTE: It has been observed that some Red Hat supplied kernels | ||
780 | are apparently unable to rename modules at load time (the "-obonding1" | ||
781 | part). Attempts to pass that option to modprobe will produce an | ||
782 | "Operation not permitted" error. This has been reported on some | ||
783 | Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels | ||
784 | exhibiting this problem, it will be impossible to configure multiple | ||
785 | bonds with differing parameters. | ||
632 | 786 | ||
633 | 3.3 Configuring Bonding Manually | 787 | 3.3 Configuring Bonding Manually |
634 | -------------------------------- | 788 | -------------------------------- |
@@ -638,10 +792,11 @@ scripts (the sysconfig or initscripts package) do not have specific | |||
638 | knowledge of bonding. One such distro is SuSE Linux Enterprise Server | 792 | knowledge of bonding. One such distro is SuSE Linux Enterprise Server |
639 | version 8. | 793 | version 8. |
640 | 794 | ||
641 | The general methodology for these systems is to place the | 795 | The general method for these systems is to place the bonding |
642 | bonding module parameters into /etc/modprobe.conf, then add modprobe | 796 | module parameters into /etc/modules.conf or /etc/modprobe.conf (as |
643 | and/or ifenslave commands to the system's global init script. The | 797 | appropriate for the installed distro), then add modprobe and/or |
644 | name of the global init script differs; for sysconfig, it is | 798 | ifenslave commands to the system's global init script. The name of |
799 | the global init script differs; for sysconfig, it is | ||
645 | /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. | 800 | /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. |
646 | 801 | ||
647 | For example, if you wanted to make a simple bond of two e100 | 802 | For example, if you wanted to make a simple bond of two e100 |
@@ -649,7 +804,7 @@ devices (presumed to be eth0 and eth1), and have it persist across | |||
649 | reboots, edit the appropriate file (/etc/init.d/boot.local or | 804 | reboots, edit the appropriate file (/etc/init.d/boot.local or |
650 | /etc/rc.d/rc.local), and add the following: | 805 | /etc/rc.d/rc.local), and add the following: |
651 | 806 | ||
652 | modprobe bonding -obond0 mode=balance-alb miimon=100 | 807 | modprobe bonding mode=balance-alb miimon=100 |
653 | modprobe e100 | 808 | modprobe e100 |
654 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up | 809 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up |
655 | ifenslave bond0 eth0 | 810 | ifenslave bond0 eth0 |
@@ -657,11 +812,7 @@ ifenslave bond0 eth1 | |||
657 | 812 | ||
658 | Replace the example bonding module parameters and bond0 | 813 | Replace the example bonding module parameters and bond0 |
659 | network configuration (IP address, netmask, etc) with the appropriate | 814 | network configuration (IP address, netmask, etc) with the appropriate |
660 | values for your configuration. The above example loads the bonding | 815 | values for your configuration. |
661 | module with the name "bond0," this simplifies the naming if multiple | ||
662 | bonding modules are loaded (each successive instance of the module is | ||
663 | given a different name, and the module instance names match the | ||
664 | bonding interface names). | ||
665 | 816 | ||
666 | Unfortunately, this method will not provide support for the | 817 | Unfortunately, this method will not provide support for the |
667 | ifup and ifdown scripts on the bond devices. To reload the bonding | 818 | ifup and ifdown scripts on the bond devices. To reload the bonding |
@@ -684,20 +835,23 @@ appropriate device driver modules. For our example above, you can do | |||
684 | the following: | 835 | the following: |
685 | 836 | ||
686 | # ifconfig bond0 down | 837 | # ifconfig bond0 down |
687 | # rmmod bond0 | 838 | # rmmod bonding |
688 | # rmmod e100 | 839 | # rmmod e100 |
689 | 840 | ||
690 | Again, for convenience, it may be desirable to create a script | 841 | Again, for convenience, it may be desirable to create a script |
691 | with these commands. | 842 | with these commands. |
692 | 843 | ||
693 | 844 | ||
694 | 3.4 Configuring Multiple Bonds | 845 | 3.3.1 Configuring Multiple Bonds Manually |
695 | ------------------------------ | 846 | ----------------------------------------- |
696 | 847 | ||
697 | This section contains information on configuring multiple | 848 | This section contains information on configuring multiple |
698 | bonding devices with differing options. If you require multiple | 849 | bonding devices with differing options for those systems whose network |
699 | bonding devices, but all with the same options, see the "max_bonds" | 850 | initialization scripts lack support for configuring multiple bonds. |
700 | module paramter, documented above. | 851 | |
852 | If you require multiple bonding devices, but all with the same | ||
853 | options, you may wish to use the "max_bonds" module parameter, | ||
854 | documented above. | ||
701 | 855 | ||
702 | To create multiple bonding devices with differing options, it | 856 | To create multiple bonding devices with differing options, it |
703 | is necessary to load the bonding driver multiple times. Note that | 857 | is necessary to load the bonding driver multiple times. Note that |
@@ -724,11 +878,16 @@ named "bond0" and creates the bond0 device in balance-rr mode with an | |||
724 | miimon of 100. The second instance is named "bond1" and creates the | 878 | miimon of 100. The second instance is named "bond1" and creates the |
725 | bond1 device in balance-alb mode with an miimon of 50. | 879 | bond1 device in balance-alb mode with an miimon of 50. |
726 | 880 | ||
881 | In some circumstances (typically with older distributions), | ||
882 | the above does not work, and the second bonding instance never sees | ||
883 | its options. In that case, the second options line can be substituted | ||
884 | as follows: | ||
885 | |||
886 | install bonding1 /sbin/modprobe bonding -obond1 mode=balance-alb miimon=50 | ||
887 | |||
727 | This may be repeated any number of times, specifying a new and | 888 | This may be repeated any number of times, specifying a new and |
728 | unique name in place of bond0 or bond1 for each instance. | 889 | unique name in place of bond1 for each subsequent instance. |
729 | 890 | ||
730 | When the appropriate module paramters are in place, then | ||
731 | configure bonding according to the instructions for your distro. | ||
732 | 891 | ||
733 | 5. Querying Bonding Configuration | 892 | 5. Querying Bonding Configuration |
734 | ================================= | 893 | ================================= |
@@ -846,8 +1005,8 @@ tagged internally by bonding itself. As a result, bonding must | |||
846 | self generated packets. | 1005 | self generated packets. |
847 | 1006 | ||
848 | For reasons of simplicity, and to support the use of adapters | 1007 | For reasons of simplicity, and to support the use of adapters |
849 | that can do VLAN hardware acceleration offloding, the bonding | 1008 | that can do VLAN hardware acceleration offloading, the bonding |
850 | interface declares itself as fully hardware offloaing capable, it gets | 1009 | interface declares itself as fully hardware offloading capable, it gets |
851 | the add_vid/kill_vid notifications to gather the necessary | 1010 | the add_vid/kill_vid notifications to gather the necessary |
852 | information, and it propagates those actions to the slaves. In case | 1011 | information, and it propagates those actions to the slaves. In case |
853 | of mixed adapter types, hardware accelerated tagged packets that | 1012 | of mixed adapter types, hardware accelerated tagged packets that |
@@ -880,7 +1039,7 @@ bond interface: | |||
880 | matches the hardware address of the VLAN interfaces. | 1039 | matches the hardware address of the VLAN interfaces. |
881 | 1040 | ||
882 | Note that changing a VLAN interface's HW address would set the | 1041 | Note that changing a VLAN interface's HW address would set the |
883 | underlying device -- i.e. the bonding interface -- to promiscouos | 1042 | underlying device -- i.e. the bonding interface -- to promiscuous |
884 | mode, which might not be what you want. | 1043 | mode, which might not be what you want. |
885 | 1044 | ||
886 | 1045 | ||
@@ -923,7 +1082,7 @@ down or have a problem making it unresponsive to ARP requests. Having | |||
923 | an additional target (or several) increases the reliability of the ARP | 1082 | an additional target (or several) increases the reliability of the ARP |
924 | monitoring. | 1083 | monitoring. |
925 | 1084 | ||
926 | Multiple ARP targets must be seperated by commas as follows: | 1085 | Multiple ARP targets must be separated by commas as follows: |
927 | 1086 | ||
928 | # example options for ARP monitoring with three targets | 1087 | # example options for ARP monitoring with three targets |
929 | alias bond0 bonding | 1088 | alias bond0 bonding |
@@ -1045,7 +1204,7 @@ install bonding /sbin/modprobe tg3; /sbin/modprobe e1000; | |||
1045 | This will, when loading the bonding module, rather than | 1204 | This will, when loading the bonding module, rather than |
1046 | performing the normal action, instead execute the provided command. | 1205 | performing the normal action, instead execute the provided command. |
1047 | This command loads the device drivers in the order needed, then calls | 1206 | This command loads the device drivers in the order needed, then calls |
1048 | modprobe with --ingore-install to cause the normal action to then take | 1207 | modprobe with --ignore-install to cause the normal action to then take |
1049 | place. Full documentation on this can be found in the modprobe.conf | 1208 | place. Full documentation on this can be found in the modprobe.conf |
1050 | and modprobe manual pages. | 1209 | and modprobe manual pages. |
1051 | 1210 | ||
@@ -1130,14 +1289,14 @@ association. | |||
1130 | common to enable promiscuous mode on the device, so that all traffic | 1289 | common to enable promiscuous mode on the device, so that all traffic |
1131 | is seen (instead of seeing only traffic destined for the local host). | 1290 | is seen (instead of seeing only traffic destined for the local host). |
1132 | The bonding driver handles promiscuous mode changes to the bonding | 1291 | The bonding driver handles promiscuous mode changes to the bonding |
1133 | master device (e.g., bond0), and propogates the setting to the slave | 1292 | master device (e.g., bond0), and propagates the setting to the slave |
1134 | devices. | 1293 | devices. |
1135 | 1294 | ||
1136 | For the balance-rr, balance-xor, broadcast, and 802.3ad modes, | 1295 | For the balance-rr, balance-xor, broadcast, and 802.3ad modes, |
1137 | the promiscuous mode setting is propogated to all slaves. | 1296 | the promiscuous mode setting is propagated to all slaves. |
1138 | 1297 | ||
1139 | For the active-backup, balance-tlb and balance-alb modes, the | 1298 | For the active-backup, balance-tlb and balance-alb modes, the |
1140 | promiscuous mode setting is propogated only to the active slave. | 1299 | promiscuous mode setting is propagated only to the active slave. |
1141 | 1300 | ||
1142 | For balance-tlb mode, the active slave is the slave currently | 1301 | For balance-tlb mode, the active slave is the slave currently |
1143 | receiving inbound traffic. | 1302 | receiving inbound traffic. |
@@ -1148,46 +1307,182 @@ sending to peers that are unassigned or if the load is unbalanced. | |||
1148 | 1307 | ||
1149 | For the active-backup, balance-tlb and balance-alb modes, when | 1308 | For the active-backup, balance-tlb and balance-alb modes, when |
1150 | the active slave changes (e.g., due to a link failure), the | 1309 | the active slave changes (e.g., due to a link failure), the |
1151 | promiscuous setting will be propogated to the new active slave. | 1310 | promiscuous setting will be propagated to the new active slave. |
1152 | 1311 | ||
1153 | 12. High Availability Information | 1312 | 12. Configuring Bonding for High Availability |
1154 | ================================= | 1313 | ============================================= |
1155 | 1314 | ||
1156 | High Availability refers to configurations that provide | 1315 | High Availability refers to configurations that provide |
1157 | maximum network availability by having redundant or backup devices, | 1316 | maximum network availability by having redundant or backup devices, |
1158 | links and switches between the host and the rest of the world. | 1317 | links or switches between the host and the rest of the world. The |
1159 | 1318 | goal is to provide the maximum availability of network connectivity | |
1160 | There are currently two basic methods for configuring to | 1319 | (i.e., the network always works), even though other configurations |
1161 | maximize availability. They are dependent on the network topology and | 1320 | could provide higher throughput. |
1162 | the primary goal of the configuration, but in general, a configuration | ||
1163 | can be optimized for maximum available bandwidth, or for maximum | ||
1164 | network availability. | ||
1165 | 1321 | ||
1166 | 12.1 High Availability in a Single Switch Topology | 1322 | 12.1 High Availability in a Single Switch Topology |
1167 | -------------------------------------------------- | 1323 | -------------------------------------------------- |
1168 | 1324 | ||
1169 | If two hosts (or a host and a switch) are directly connected | 1325 | If two hosts (or a host and a single switch) are directly |
1170 | via multiple physical links, then there is no network availability | 1326 | connected via multiple physical links, then there is no availability |
1171 | penalty for optimizing for maximum bandwidth: there is only one switch | 1327 | penalty to optimizing for maximum bandwidth. In this case, there is |
1172 | (or peer), so if it fails, you have no alternative access to fail over | 1328 | only one switch (or peer), so if it fails, there is no alternative |
1173 | to. | 1329 | access to fail over to. Additionally, the bonding load balance modes |
1330 | support link monitoring of their members, so if individual links fail, | ||
1331 | the load will be rebalanced across the remaining devices. | ||
1332 | |||
1333 | See Section 13, "Configuring Bonding for Maximum Throughput" | ||
1334 | for information on configuring bonding with one peer device. | ||
1335 | |||
1336 | 12.2 High Availability in a Multiple Switch Topology | ||
1337 | ---------------------------------------------------- | ||
1338 | |||
1339 | With multiple switches, the configuration of bonding and the | ||
1340 | network changes dramatically. In multiple switch topologies, there is | ||
1341 | a trade off between network availability and usable bandwidth. | ||
1342 | |||
1343 | Below is a sample network, configured to maximize the | ||
1344 | availability of the network: | ||
1174 | 1345 | ||
1175 | Example 1 : host to switch (or other host) | 1346 | | | |
1347 | |port3 port3| | ||
1348 | +-----+----+ +-----+----+ | ||
1349 | | |port2 ISL port2| | | ||
1350 | | switch A +--------------------------+ switch B | | ||
1351 | | | | | | ||
1352 | +-----+----+ +-----++---+ | ||
1353 | |port1 port1| | ||
1354 | | +-------+ | | ||
1355 | +-------------+ host1 +---------------+ | ||
1356 | eth0 +-------+ eth1 | ||
1176 | 1357 | ||
1177 | +----------+ +----------+ | 1358 | In this configuration, there is a link between the two |
1178 | | |eth0 eth0| switch | | 1359 | switches (ISL, or inter switch link), and multiple ports connecting to |
1179 | | Host A +--------------------------+ or | | 1360 | the outside world ("port3" on each switch). There is no technical |
1180 | | +--------------------------+ other | | 1361 | reason that this could not be extended to a third switch. |
1181 | | |eth1 eth1| host | | ||
1182 | +----------+ +----------+ | ||
1183 | 1362 | ||
1363 | 12.2.1 HA Bonding Mode Selection for Multiple Switch Topology | ||
1364 | ------------------------------------------------------------- | ||
1184 | 1365 | ||
1185 | 12.1.1 Bonding Mode Selection for single switch topology | 1366 | In a topology such as the example above, the active-backup and |
1186 | -------------------------------------------------------- | 1367 | broadcast modes are the only useful bonding modes when optimizing for |
1368 | availability; the other modes require all links to terminate on the | ||
1369 | same peer for them to behave rationally. | ||
1370 | |||
1371 | active-backup: This is generally the preferred mode, particularly if | ||
1372 | the switches have an ISL and play together well. If the | ||
1373 | network configuration is such that one switch is specifically | ||
1374 | a backup switch (e.g., has lower capacity, higher cost, etc), | ||
1375 | then the primary option can be used to insure that the | ||
1376 | preferred link is always used when it is available. | ||
1377 | |||
1378 | broadcast: This mode is really a special purpose mode, and is suitable | ||
1379 | only for very specific needs. For example, if the two | ||
1380 | switches are not connected (no ISL), and the networks beyond | ||
1381 | them are totally independent. In this case, if it is | ||
1382 | necessary for some specific one-way traffic to reach both | ||
1383 | independent networks, then the broadcast mode may be suitable. | ||
1384 | |||
1385 | 12.2.2 HA Link Monitoring Selection for Multiple Switch Topology | ||
1386 | ---------------------------------------------------------------- | ||
1387 | |||
1388 | The choice of link monitoring ultimately depends upon your | ||
1389 | switch. If the switch can reliably fail ports in response to other | ||
1390 | failures, then either the MII or ARP monitors should work. For | ||
1391 | example, in the above example, if the "port3" link fails at the remote | ||
1392 | end, the MII monitor has no direct means to detect this. The ARP | ||
1393 | monitor could be configured with a target at the remote end of port3, | ||
1394 | thus detecting that failure without switch support. | ||
1395 | |||
1396 | In general, however, in a multiple switch topology, the ARP | ||
1397 | monitor can provide a higher level of reliability in detecting end to | ||
1398 | end connectivity failures (which may be caused by the failure of any | ||
1399 | individual component to pass traffic for any reason). Additionally, | ||
1400 | the ARP monitor should be configured with multiple targets (at least | ||
1401 | one for each switch in the network). This will insure that, | ||
1402 | regardless of which switch is active, the ARP monitor has a suitable | ||
1403 | target to query. | ||
1404 | |||
1405 | |||
1406 | 13. Configuring Bonding for Maximum Throughput | ||
1407 | ============================================== | ||
1408 | |||
1409 | 13.1 Maximizing Throughput in a Single Switch Topology | ||
1410 | ------------------------------------------------------ | ||
1411 | |||
1412 | In a single switch configuration, the best method to maximize | ||
1413 | throughput depends upon the application and network environment. The | ||
1414 | various load balancing modes each have strengths and weaknesses in | ||
1415 | different environments, as detailed below. | ||
1416 | |||
1417 | For this discussion, we will break down the topologies into | ||
1418 | two categories. Depending upon the destination of most traffic, we | ||
1419 | categorize them into either "gatewayed" or "local" configurations. | ||
1420 | |||
1421 | In a gatewayed configuration, the "switch" is acting primarily | ||
1422 | as a router, and the majority of traffic passes through this router to | ||
1423 | other networks. An example would be the following: | ||
1424 | |||
1425 | |||
1426 | +----------+ +----------+ | ||
1427 | | |eth0 port1| | to other networks | ||
1428 | | Host A +---------------------+ router +-------------------> | ||
1429 | | +---------------------+ | Hosts B and C are out | ||
1430 | | |eth1 port2| | here somewhere | ||
1431 | +----------+ +----------+ | ||
1432 | |||
1433 | The router may be a dedicated router device, or another host | ||
1434 | acting as a gateway. For our discussion, the important point is that | ||
1435 | the majority of traffic from Host A will pass through the router to | ||
1436 | some other network before reaching its final destination. | ||
1437 | |||
1438 | In a gatewayed network configuration, although Host A may | ||
1439 | communicate with many other systems, all of its traffic will be sent | ||
1440 | and received via one other peer on the local network, the router. | ||
1441 | |||
1442 | Note that the case of two systems connected directly via | ||
1443 | multiple physical links is, for purposes of configuring bonding, the | ||
1444 | same as a gatewayed configuration. In that case, it happens that all | ||
1445 | traffic is destined for the "gateway" itself, not some other network | ||
1446 | beyond the gateway. | ||
1447 | |||
1448 | In a local configuration, the "switch" is acting primarily as | ||
1449 | a switch, and the majority of traffic passes through this switch to | ||
1450 | reach other stations on the same network. An example would be the | ||
1451 | following: | ||
1452 | |||
1453 | +----------+ +----------+ +--------+ | ||
1454 | | |eth0 port1| +-------+ Host B | | ||
1455 | | Host A +------------+ switch |port3 +--------+ | ||
1456 | | +------------+ | +--------+ | ||
1457 | | |eth1 port2| +------------------+ Host C | | ||
1458 | +----------+ +----------+port4 +--------+ | ||
1459 | |||
1460 | |||
1461 | Again, the switch may be a dedicated switch device, or another | ||
1462 | host acting as a gateway. For our discussion, the important point is | ||
1463 | that the majority of traffic from Host A is destined for other hosts | ||
1464 | on the same local network (Hosts B and C in the above example). | ||
1465 | |||
1466 | In summary, in a gatewayed configuration, traffic to and from | ||
1467 | the bonded device will be to the same MAC level peer on the network | ||
1468 | (the gateway itself, i.e., the router), regardless of its final | ||
1469 | destination. In a local configuration, traffic flows directly to and | ||
1470 | from the final destinations, thus, each destination (Host B, Host C) | ||
1471 | will be addressed directly by their individual MAC addresses. | ||
1472 | |||
1473 | This distinction between a gatewayed and a local network | ||
1474 | configuration is important because many of the load balancing modes | ||
1475 | available use the MAC addresses of the local network source and | ||
1476 | destination to make load balancing decisions. The behavior of each | ||
1477 | mode is described below. | ||
1478 | |||
1479 | |||
1480 | 13.1.1 MT Bonding Mode Selection for Single Switch Topology | ||
1481 | ----------------------------------------------------------- | ||
1187 | 1482 | ||
1188 | This configuration is the easiest to set up and to understand, | 1483 | This configuration is the easiest to set up and to understand, |
1189 | although you will have to decide which bonding mode best suits your | 1484 | although you will have to decide which bonding mode best suits your |
1190 | needs. The tradeoffs for each mode are detailed below: | 1485 | needs. The trade offs for each mode are detailed below: |
1191 | 1486 | ||
1192 | balance-rr: This mode is the only mode that will permit a single | 1487 | balance-rr: This mode is the only mode that will permit a single |
1193 | TCP/IP connection to stripe traffic across multiple | 1488 | TCP/IP connection to stripe traffic across multiple |
@@ -1206,6 +1501,23 @@ balance-rr: This mode is the only mode that will permit a single | |||
1206 | interface's worth of throughput, even after adjusting | 1501 | interface's worth of throughput, even after adjusting |
1207 | tcp_reordering. | 1502 | tcp_reordering. |
1208 | 1503 | ||
1504 | Note that this out of order delivery occurs when both the | ||
1505 | sending and receiving systems are utilizing a multiple | ||
1506 | interface bond. Consider a configuration in which a | ||
1507 | balance-rr bond feeds into a single higher capacity network | ||
1508 | channel (e.g., multiple 100Mb/sec ethernets feeding a single | ||
1509 | gigabit ethernet via an etherchannel capable switch). In this | ||
1510 | configuration, traffic sent from the multiple 100Mb devices to | ||
1511 | a destination connected to the gigabit device will not see | ||
1512 | packets out of order. However, traffic sent from the gigabit | ||
1513 | device to the multiple 100Mb devices may or may not see | ||
1514 | traffic out of order, depending upon the balance policy of the | ||
1515 | switch. Many switches do not support any modes that stripe | ||
1516 | traffic (instead choosing a port based upon IP or MAC level | ||
1517 | addresses); for those devices, traffic flowing from the | ||
1518 | gigabit device to the many 100Mb devices will only utilize one | ||
1519 | interface. | ||
1520 | |||
1209 | If you are utilizing protocols other than TCP/IP, UDP for | 1521 | If you are utilizing protocols other than TCP/IP, UDP for |
1210 | example, and your application can tolerate out of order | 1522 | example, and your application can tolerate out of order |
1211 | delivery, then this mode can allow for single stream datagram | 1523 | delivery, then this mode can allow for single stream datagram |
@@ -1220,16 +1532,21 @@ active-backup: There is not much advantage in this network topology to | |||
1220 | connected to the same peer as the primary. In this case, a | 1532 | connected to the same peer as the primary. In this case, a |
1221 | load balancing mode (with link monitoring) will provide the | 1533 | load balancing mode (with link monitoring) will provide the |
1222 | same level of network availability, but with increased | 1534 | same level of network availability, but with increased |
1223 | available bandwidth. On the plus side, it does not require | 1535 | available bandwidth. On the plus side, active-backup mode |
1224 | any configuration of the switch. | 1536 | does not require any configuration of the switch, so it may |
1537 | have value if the hardware available does not support any of | ||
1538 | the load balance modes. | ||
1225 | 1539 | ||
1226 | balance-xor: This mode will limit traffic such that packets destined | 1540 | balance-xor: This mode will limit traffic such that packets destined |
1227 | for specific peers will always be sent over the same | 1541 | for specific peers will always be sent over the same |
1228 | interface. Since the destination is determined by the MAC | 1542 | interface. Since the destination is determined by the MAC |
1229 | addresses involved, this may be desirable if you have a large | 1543 | addresses involved, this mode works best in a "local" network |
1230 | network with many hosts. It is likely to be suboptimal if all | 1544 | configuration (as described above), with destinations all on |
1231 | your traffic is passed through a single router, however. As | 1545 | the same local network. This mode is likely to be suboptimal |
1232 | with balance-rr, the switch ports need to be configured for | 1546 | if all your traffic is passed through a single router (i.e., a |
1547 | "gatewayed" network configuration, as described above). | ||
1548 | |||
1549 | As with balance-rr, the switch ports need to be configured for | ||
1233 | "etherchannel" or "trunking." | 1550 | "etherchannel" or "trunking." |
1234 | 1551 | ||
1235 | broadcast: Like active-backup, there is not much advantage to this | 1552 | broadcast: Like active-backup, there is not much advantage to this |
@@ -1241,122 +1558,131 @@ broadcast: Like active-backup, there is not much advantage to this | |||
1241 | protocol includes automatic configuration of the aggregates, | 1558 | protocol includes automatic configuration of the aggregates, |
1242 | so minimal manual configuration of the switch is needed | 1559 | so minimal manual configuration of the switch is needed |
1243 | (typically only to designate that some set of devices is | 1560 | (typically only to designate that some set of devices is |
1244 | usable for 802.3ad). The 802.3ad standard also mandates that | 1561 | available for 802.3ad). The 802.3ad standard also mandates |
1245 | frames be delivered in order (within certain limits), so in | 1562 | that frames be delivered in order (within certain limits), so |
1246 | general single connections will not see misordering of | 1563 | in general single connections will not see misordering of |
1247 | packets. The 802.3ad mode does have some drawbacks: the | 1564 | packets. The 802.3ad mode does have some drawbacks: the |
1248 | standard mandates that all devices in the aggregate operate at | 1565 | standard mandates that all devices in the aggregate operate at |
1249 | the same speed and duplex. Also, as with all bonding load | 1566 | the same speed and duplex. Also, as with all bonding load |
1250 | balance modes other than balance-rr, no single connection will | 1567 | balance modes other than balance-rr, no single connection will |
1251 | be able to utilize more than a single interface's worth of | 1568 | be able to utilize more than a single interface's worth of |
1252 | bandwidth. Additionally, the linux bonding 802.3ad | 1569 | bandwidth. |
1253 | implementation distributes traffic by peer (using an XOR of | 1570 | |
1254 | MAC addresses), so in general all traffic to a particular | 1571 | Additionally, the linux bonding 802.3ad implementation |
1255 | destination will use the same interface. Finally, the 802.3ad | 1572 | distributes traffic by peer (using an XOR of MAC addresses), |
1256 | mode mandates the use of the MII monitor, therefore, the ARP | 1573 | so in a "gatewayed" configuration, all outgoing traffic will |
1257 | monitor is not available in this mode. | 1574 | generally use the same device. Incoming traffic may also end |
1258 | 1575 | up on a single device, but that is dependent upon the | |
1259 | balance-tlb: This mode is also a good choice for this type of | 1576 | balancing policy of the peer's 8023.ad implementation. In a |
1260 | topology. It has no special switch configuration | 1577 | "local" configuration, traffic will be distributed across the |
1261 | requirements, and balances outgoing traffic by peer, in a | 1578 | devices in the bond. |
1262 | vaguely intelligent manner (not a simple XOR as in balance-xor | 1579 | |
1263 | or 802.3ad mode), so that unlucky MAC addresses will not all | 1580 | Finally, the 802.3ad mode mandates the use of the MII monitor, |
1264 | "bunch up" on a single interface. Interfaces may be of | 1581 | therefore, the ARP monitor is not available in this mode. |
1265 | differing speeds. On the down side, in this mode all incoming | 1582 | |
1266 | traffic arrives over a single interface, this mode requires | 1583 | balance-tlb: The balance-tlb mode balances outgoing traffic by peer. |
1267 | certain ethtool support in the network device driver of the | 1584 | Since the balancing is done according to MAC address, in a |
1268 | slave interfaces, and the ARP monitor is not available. | 1585 | "gatewayed" configuration (as described above), this mode will |
1269 | 1586 | send all traffic across a single device. However, in a | |
1270 | balance-alb: This mode is everything that balance-tlb is, and more. It | 1587 | "local" network configuration, this mode balances multiple |
1271 | has all of the features (and restrictions) of balance-tlb, and | 1588 | local network peers across devices in a vaguely intelligent |
1272 | will also balance incoming traffic from peers (as described in | 1589 | manner (not a simple XOR as in balance-xor or 802.3ad mode), |
1273 | the Bonding Module Options section, above). The only extra | 1590 | so that mathematically unlucky MAC addresses (i.e., ones that |
1274 | down side to this mode is that the network device driver must | 1591 | XOR to the same value) will not all "bunch up" on a single |
1275 | support changing the hardware address while the device is | 1592 | interface. |
1276 | open. | 1593 | |
1277 | 1594 | Unlike 802.3ad, interfaces may be of differing speeds, and no | |
1278 | 12.1.2 Link Monitoring for Single Switch Topology | 1595 | special switch configuration is required. On the down side, |
1279 | ------------------------------------------------- | 1596 | in this mode all incoming traffic arrives over a single |
1597 | interface, this mode requires certain ethtool support in the | ||
1598 | network device driver of the slave interfaces, and the ARP | ||
1599 | monitor is not available. | ||
1600 | |||
1601 | balance-alb: This mode is everything that balance-tlb is, and more. | ||
1602 | It has all of the features (and restrictions) of balance-tlb, | ||
1603 | and will also balance incoming traffic from local network | ||
1604 | peers (as described in the Bonding Module Options section, | ||
1605 | above). | ||
1606 | |||
1607 | The only additional down side to this mode is that the network | ||
1608 | device driver must support changing the hardware address while | ||
1609 | the device is open. | ||
1610 | |||
1611 | 13.1.2 MT Link Monitoring for Single Switch Topology | ||
1612 | ---------------------------------------------------- | ||
1280 | 1613 | ||
1281 | The choice of link monitoring may largely depend upon which | 1614 | The choice of link monitoring may largely depend upon which |
1282 | mode you choose to use. The more advanced load balancing modes do not | 1615 | mode you choose to use. The more advanced load balancing modes do not |
1283 | support the use of the ARP monitor, and are thus restricted to using | 1616 | support the use of the ARP monitor, and are thus restricted to using |
1284 | the MII monitor (which does not provide as high a level of assurance | 1617 | the MII monitor (which does not provide as high a level of end to end |
1285 | as the ARP monitor). | 1618 | assurance as the ARP monitor). |
1286 | 1619 | ||
1287 | 1620 | 13.2 Maximum Throughput in a Multiple Switch Topology | |
1288 | 12.2 High Availability in a Multiple Switch Topology | 1621 | ----------------------------------------------------- |
1289 | ---------------------------------------------------- | 1622 | |
1290 | 1623 | Multiple switches may be utilized to optimize for throughput | |
1291 | With multiple switches, the configuration of bonding and the | 1624 | when they are configured in parallel as part of an isolated network |
1292 | network changes dramatically. In multiple switch topologies, there is | 1625 | between two or more systems, for example: |
1293 | a tradeoff between network availability and usable bandwidth. | 1626 | |
1294 | 1627 | +-----------+ | |
1295 | Below is a sample network, configured to maximize the | 1628 | | Host A | |
1296 | availability of the network: | 1629 | +-+---+---+-+ |
1297 | 1630 | | | | | |
1298 | | | | 1631 | +--------+ | +---------+ |
1299 | |port3 port3| | 1632 | | | | |
1300 | +-----+----+ +-----+----+ | 1633 | +------+---+ +-----+----+ +-----+----+ |
1301 | | |port2 ISL port2| | | 1634 | | Switch A | | Switch B | | Switch C | |
1302 | | switch A +--------------------------+ switch B | | 1635 | +------+---+ +-----+----+ +-----+----+ |
1303 | | | | | | 1636 | | | | |
1304 | +-----+----+ +-----++---+ | 1637 | +--------+ | +---------+ |
1305 | |port1 port1| | 1638 | | | | |
1306 | | +-------+ | | 1639 | +-+---+---+-+ |
1307 | +-------------+ host1 +---------------+ | 1640 | | Host B | |
1308 | eth0 +-------+ eth1 | 1641 | +-----------+ |
1309 | 1642 | ||
1310 | In this configuration, there is a link between the two | 1643 | In this configuration, the switches are isolated from one |
1311 | switches (ISL, or inter switch link), and multiple ports connecting to | 1644 | another. One reason to employ a topology such as this is for an |
1312 | the outside world ("port3" on each switch). There is no technical | 1645 | isolated network with many hosts (a cluster configured for high |
1313 | reason that this could not be extended to a third switch. | 1646 | performance, for example), using multiple smaller switches can be more |
1314 | 1647 | cost effective than a single larger switch, e.g., on a network with 24 | |
1315 | 12.2.1 Bonding Mode Selection for Multiple Switch Topology | 1648 | hosts, three 24 port switches can be significantly less expensive than |
1316 | ---------------------------------------------------------- | 1649 | a single 72 port switch. |
1317 | 1650 | ||
1318 | In a topology such as this, the active-backup and broadcast | 1651 | If access beyond the network is required, an individual host |
1319 | modes are the only useful bonding modes; the other modes require all | 1652 | can be equipped with an additional network device connected to an |
1320 | links to terminate on the same peer for them to behave rationally. | 1653 | external network; this host then additionally acts as a gateway. |
1321 | 1654 | ||
1322 | active-backup: This is generally the preferred mode, particularly if | 1655 | 13.2.1 MT Bonding Mode Selection for Multiple Switch Topology |
1323 | the switches have an ISL and play together well. If the | ||
1324 | network configuration is such that one switch is specifically | ||
1325 | a backup switch (e.g., has lower capacity, higher cost, etc), | ||
1326 | then the primary option can be used to insure that the | ||
1327 | preferred link is always used when it is available. | ||
1328 | |||
1329 | broadcast: This mode is really a special purpose mode, and is suitable | ||
1330 | only for very specific needs. For example, if the two | ||
1331 | switches are not connected (no ISL), and the networks beyond | ||
1332 | them are totally independant. In this case, if it is | ||
1333 | necessary for some specific one-way traffic to reach both | ||
1334 | independent networks, then the broadcast mode may be suitable. | ||
1335 | |||
1336 | 12.2.2 Link Monitoring Selection for Multiple Switch Topology | ||
1337 | ------------------------------------------------------------- | 1656 | ------------------------------------------------------------- |
1338 | 1657 | ||
1339 | The choice of link monitoring ultimately depends upon your | 1658 | In actual practice, the bonding mode typically employed in |
1340 | switch. If the switch can reliably fail ports in response to other | 1659 | configurations of this type is balance-rr. Historically, in this |
1341 | failures, then either the MII or ARP monitors should work. For | 1660 | network configuration, the usual caveats about out of order packet |
1342 | example, in the above example, if the "port3" link fails at the remote | 1661 | delivery are mitigated by the use of network adapters that do not do |
1343 | end, the MII monitor has no direct means to detect this. The ARP | 1662 | any kind of packet coalescing (via the use of NAPI, or because the |
1344 | monitor could be configured with a target at the remote end of port3, | 1663 | device itself does not generate interrupts until some number of |
1345 | thus detecting that failure without switch support. | 1664 | packets has arrived). When employed in this fashion, the balance-rr |
1665 | mode allows individual connections between two hosts to effectively | ||
1666 | utilize greater than one interface's bandwidth. | ||
1346 | 1667 | ||
1347 | In general, however, in a multiple switch topology, the ARP | 1668 | 13.2.2 MT Link Monitoring for Multiple Switch Topology |
1348 | monitor can provide a higher level of reliability in detecting link | 1669 | ------------------------------------------------------ |
1349 | failures. Additionally, it should be configured with multiple targets | ||
1350 | (at least one for each switch in the network). This will insure that, | ||
1351 | regardless of which switch is active, the ARP monitor has a suitable | ||
1352 | target to query. | ||
1353 | 1670 | ||
1671 | Again, in actual practice, the MII monitor is most often used | ||
1672 | in this configuration, as performance is given preference over | ||
1673 | availability. The ARP monitor will function in this topology, but its | ||
1674 | advantages over the MII monitor are mitigated by the volume of probes | ||
1675 | needed as the number of systems involved grows (remember that each | ||
1676 | host in the network is configured with bonding). | ||
1354 | 1677 | ||
1355 | 12.3 Switch Behavior Issues for High Availability | 1678 | 14. Switch Behavior Issues |
1356 | ------------------------------------------------- | 1679 | ========================== |
1357 | 1680 | ||
1358 | You may encounter issues with the timing of link up and down | 1681 | 14.1 Link Establishment and Failover Delays |
1359 | reporting by the switch. | 1682 | ------------------------------------------- |
1683 | |||
1684 | Some switches exhibit undesirable behavior with regard to the | ||
1685 | timing of link up and down reporting by the switch. | ||
1360 | 1686 | ||
1361 | First, when a link comes up, some switches may indicate that | 1687 | First, when a link comes up, some switches may indicate that |
1362 | the link is up (carrier available), but not pass traffic over the | 1688 | the link is up (carrier available), but not pass traffic over the |
@@ -1370,30 +1696,70 @@ relevant interface(s). | |||
1370 | Second, some switches may "bounce" the link state one or more | 1696 | Second, some switches may "bounce" the link state one or more |
1371 | times while a link is changing state. This occurs most commonly while | 1697 | times while a link is changing state. This occurs most commonly while |
1372 | the switch is initializing. Again, an appropriate updelay value may | 1698 | the switch is initializing. Again, an appropriate updelay value may |
1373 | help, but note that if all links are down, then updelay is ignored | 1699 | help. |
1374 | when any link becomes active (the slave closest to completing its | ||
1375 | updelay is chosen). | ||
1376 | 1700 | ||
1377 | Note that when a bonding interface has no active links, the | 1701 | Note that when a bonding interface has no active links, the |
1378 | driver will immediately reuse the first link that goes up, even if | 1702 | driver will immediately reuse the first link that goes up, even if the |
1379 | updelay parameter was specified. If there are slave interfaces | 1703 | updelay parameter has been specified (the updelay is ignored in this |
1380 | waiting for the updelay timeout to expire, the interface that first | 1704 | case). If there are slave interfaces waiting for the updelay timeout |
1381 | went into that state will be immediately reused. This reduces down | 1705 | to expire, the interface that first went into that state will be |
1382 | time of the network if the value of updelay has been overestimated. | 1706 | immediately reused. This reduces down time of the network if the |
1707 | value of updelay has been overestimated, and since this occurs only in | ||
1708 | cases with no connectivity, there is no additional penalty for | ||
1709 | ignoring the updelay. | ||
1383 | 1710 | ||
1384 | In addition to the concerns about switch timings, if your | 1711 | In addition to the concerns about switch timings, if your |
1385 | switches take a long time to go into backup mode, it may be desirable | 1712 | switches take a long time to go into backup mode, it may be desirable |
1386 | to not activate a backup interface immediately after a link goes down. | 1713 | to not activate a backup interface immediately after a link goes down. |
1387 | Failover may be delayed via the downdelay bonding module option. | 1714 | Failover may be delayed via the downdelay bonding module option. |
1388 | 1715 | ||
1389 | 13. Hardware Specific Considerations | 1716 | 14.2 Duplicated Incoming Packets |
1717 | -------------------------------- | ||
1718 | |||
1719 | It is not uncommon to observe a short burst of duplicated | ||
1720 | traffic when the bonding device is first used, or after it has been | ||
1721 | idle for some period of time. This is most easily observed by issuing | ||
1722 | a "ping" to some other host on the network, and noticing that the | ||
1723 | output from ping flags duplicates (typically one per slave). | ||
1724 | |||
1725 | For example, on a bond in active-backup mode with five slaves | ||
1726 | all connected to one switch, the output may appear as follows: | ||
1727 | |||
1728 | # ping -n 10.0.4.2 | ||
1729 | PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data. | ||
1730 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms | ||
1731 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1732 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1733 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1734 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1735 | 64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms | ||
1736 | 64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms | ||
1737 | 64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms | ||
1738 | |||
1739 | This is not due to an error in the bonding driver, rather, it | ||
1740 | is a side effect of how many switches update their MAC forwarding | ||
1741 | tables. Initially, the switch does not associate the MAC address in | ||
1742 | the packet with a particular switch port, and so it may send the | ||
1743 | traffic to all ports until its MAC forwarding table is updated. Since | ||
1744 | the interfaces attached to the bond may occupy multiple ports on a | ||
1745 | single switch, when the switch (temporarily) floods the traffic to all | ||
1746 | ports, the bond device receives multiple copies of the same packet | ||
1747 | (one per slave device). | ||
1748 | |||
1749 | The duplicated packet behavior is switch dependent, some | ||
1750 | switches exhibit this, and some do not. On switches that display this | ||
1751 | behavior, it can be induced by clearing the MAC forwarding table (on | ||
1752 | most Cisco switches, the privileged command "clear mac address-table | ||
1753 | dynamic" will accomplish this). | ||
1754 | |||
1755 | 15. Hardware Specific Considerations | ||
1390 | ==================================== | 1756 | ==================================== |
1391 | 1757 | ||
1392 | This section contains additional information for configuring | 1758 | This section contains additional information for configuring |
1393 | bonding on specific hardware platforms, or for interfacing bonding | 1759 | bonding on specific hardware platforms, or for interfacing bonding |
1394 | with particular switches or other devices. | 1760 | with particular switches or other devices. |
1395 | 1761 | ||
1396 | 13.1 IBM BladeCenter | 1762 | 15.1 IBM BladeCenter |
1397 | -------------------- | 1763 | -------------------- |
1398 | 1764 | ||
1399 | This applies to the JS20 and similar systems. | 1765 | This applies to the JS20 and similar systems. |
@@ -1407,12 +1773,12 @@ JS20 network adapter information | |||
1407 | -------------------------------- | 1773 | -------------------------------- |
1408 | 1774 | ||
1409 | All JS20s come with two Broadcom Gigabit Ethernet ports | 1775 | All JS20s come with two Broadcom Gigabit Ethernet ports |
1410 | integrated on the planar. In the BladeCenter chassis, the eth0 port | 1776 | integrated on the planar (that's "motherboard" in IBM-speak). In the |
1411 | of all JS20 blades is hard wired to I/O Module #1; similarly, all eth1 | 1777 | BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to |
1412 | ports are wired to I/O Module #2. An add-on Broadcom daughter card | 1778 | I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2. |
1413 | can be installed on a JS20 to provide two more Gigabit Ethernet ports. | 1779 | An add-on Broadcom daughter card can be installed on a JS20 to provide |
1414 | These ports, eth2 and eth3, are wired to I/O Modules 3 and 4, | 1780 | two more Gigabit Ethernet ports. These ports, eth2 and eth3, are |
1415 | respectively. | 1781 | wired to I/O Modules 3 and 4, respectively. |
1416 | 1782 | ||
1417 | Each I/O Module may contain either a switch or a passthrough | 1783 | Each I/O Module may contain either a switch or a passthrough |
1418 | module (which allows ports to be directly connected to an external | 1784 | module (which allows ports to be directly connected to an external |
@@ -1432,29 +1798,30 @@ BladeCenter networking configuration | |||
1432 | of ways, this discussion will be confined to describing basic | 1798 | of ways, this discussion will be confined to describing basic |
1433 | configurations. | 1799 | configurations. |
1434 | 1800 | ||
1435 | Normally, Ethernet Switch Modules (ESM) are used in I/O | 1801 | Normally, Ethernet Switch Modules (ESMs) are used in I/O |
1436 | modules 1 and 2. In this configuration, the eth0 and eth1 ports of a | 1802 | modules 1 and 2. In this configuration, the eth0 and eth1 ports of a |
1437 | JS20 will be connected to different internal switches (in the | 1803 | JS20 will be connected to different internal switches (in the |
1438 | respective I/O modules). | 1804 | respective I/O modules). |
1439 | 1805 | ||
1440 | An optical passthru module (OPM) connects the I/O module | 1806 | A passthrough module (OPM or CPM, optical or copper, |
1441 | directly to an external switch. By using OPMs in I/O module #1 and | 1807 | passthrough module) connects the I/O module directly to an external |
1442 | #2, the eth0 and eth1 interfaces of a JS20 can be redirected to the | 1808 | switch. By using PMs in I/O module #1 and #2, the eth0 and eth1 |
1443 | outside world and connected to a common external switch. | 1809 | interfaces of a JS20 can be redirected to the outside world and |
1444 | 1810 | connected to a common external switch. | |
1445 | Depending upon the mix of ESM and OPM modules, the network | 1811 | |
1446 | will appear to bonding as either a single switch topology (all OPM | 1812 | Depending upon the mix of ESMs and PMs, the network will |
1447 | modules) or as a multiple switch topology (one or more ESM modules, | 1813 | appear to bonding as either a single switch topology (all PMs) or as a |
1448 | zero or more OPM modules). It is also possible to connect ESM modules | 1814 | multiple switch topology (one or more ESMs, zero or more PMs). It is |
1449 | together, resulting in a configuration much like the example in "High | 1815 | also possible to connect ESMs together, resulting in a configuration |
1450 | Availability in a multiple switch topology." | 1816 | much like the example in "High Availability in a Multiple Switch |
1451 | 1817 | Topology," above. | |
1452 | Requirements for specifc modes | 1818 | |
1453 | ------------------------------ | 1819 | Requirements for specific modes |
1454 | 1820 | ------------------------------- | |
1455 | The balance-rr mode requires the use of OPM modules for | 1821 | |
1456 | devices in the bond, all connected to an common external switch. That | 1822 | The balance-rr mode requires the use of passthrough modules |
1457 | switch must be configured for "etherchannel" or "trunking" on the | 1823 | for devices in the bond, all connected to an common external switch. |
1824 | That switch must be configured for "etherchannel" or "trunking" on the | ||
1458 | appropriate ports, as is usual for balance-rr. | 1825 | appropriate ports, as is usual for balance-rr. |
1459 | 1826 | ||
1460 | The balance-alb and balance-tlb modes will function with | 1827 | The balance-alb and balance-tlb modes will function with |
@@ -1484,17 +1851,18 @@ connected to the JS20 system. | |||
1484 | Other concerns | 1851 | Other concerns |
1485 | -------------- | 1852 | -------------- |
1486 | 1853 | ||
1487 | The Serial Over LAN link is established over the primary | 1854 | The Serial Over LAN (SoL) link is established over the primary |
1488 | ethernet (eth0) only, therefore, any loss of link to eth0 will result | 1855 | ethernet (eth0) only, therefore, any loss of link to eth0 will result |
1489 | in losing your SoL connection. It will not fail over with other | 1856 | in losing your SoL connection. It will not fail over with other |
1490 | network traffic. | 1857 | network traffic, as the SoL system is beyond the control of the |
1858 | bonding driver. | ||
1491 | 1859 | ||
1492 | It may be desirable to disable spanning tree on the switch | 1860 | It may be desirable to disable spanning tree on the switch |
1493 | (either the internal Ethernet Switch Module, or an external switch) to | 1861 | (either the internal Ethernet Switch Module, or an external switch) to |
1494 | avoid fail-over delays issues when using bonding. | 1862 | avoid fail-over delay issues when using bonding. |
1495 | 1863 | ||
1496 | 1864 | ||
1497 | 14. Frequently Asked Questions | 1865 | 16. Frequently Asked Questions |
1498 | ============================== | 1866 | ============================== |
1499 | 1867 | ||
1500 | 1. Is it SMP safe? | 1868 | 1. Is it SMP safe? |
@@ -1505,8 +1873,8 @@ The new driver was designed to be SMP safe from the start. | |||
1505 | 2. What type of cards will work with it? | 1873 | 2. What type of cards will work with it? |
1506 | 1874 | ||
1507 | Any Ethernet type cards (you can even mix cards - a Intel | 1875 | Any Ethernet type cards (you can even mix cards - a Intel |
1508 | EtherExpress PRO/100 and a 3com 3c905b, for example). They need not | 1876 | EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, |
1509 | be of the same speed. | 1877 | devices need not be of the same speed. |
1510 | 1878 | ||
1511 | 3. How many bonding devices can I have? | 1879 | 3. How many bonding devices can I have? |
1512 | 1880 | ||
@@ -1524,11 +1892,12 @@ system. | |||
1524 | disabled. The active-backup mode will fail over to a backup link, and | 1892 | disabled. The active-backup mode will fail over to a backup link, and |
1525 | other modes will ignore the failed link. The link will continue to be | 1893 | other modes will ignore the failed link. The link will continue to be |
1526 | monitored, and should it recover, it will rejoin the bond (in whatever | 1894 | monitored, and should it recover, it will rejoin the bond (in whatever |
1527 | manner is appropriate for the mode). See the section on High | 1895 | manner is appropriate for the mode). See the sections on High |
1528 | Availability for additional information. | 1896 | Availability and the documentation for each mode for additional |
1897 | information. | ||
1529 | 1898 | ||
1530 | Link monitoring can be enabled via either the miimon or | 1899 | Link monitoring can be enabled via either the miimon or |
1531 | arp_interval paramters (described in the module paramters section, | 1900 | arp_interval parameters (described in the module parameters section, |
1532 | above). In general, miimon monitors the carrier state as sensed by | 1901 | above). In general, miimon monitors the carrier state as sensed by |
1533 | the underlying network device, and the arp monitor (arp_interval) | 1902 | the underlying network device, and the arp monitor (arp_interval) |
1534 | monitors connectivity to another host on the local network. | 1903 | monitors connectivity to another host on the local network. |
@@ -1536,7 +1905,7 @@ monitors connectivity to another host on the local network. | |||
1536 | If no link monitoring is configured, the bonding driver will | 1905 | If no link monitoring is configured, the bonding driver will |
1537 | be unable to detect link failures, and will assume that all links are | 1906 | be unable to detect link failures, and will assume that all links are |
1538 | always available. This will likely result in lost packets, and a | 1907 | always available. This will likely result in lost packets, and a |
1539 | resulting degredation of performance. The precise performance loss | 1908 | resulting degradation of performance. The precise performance loss |
1540 | depends upon the bonding mode and network configuration. | 1909 | depends upon the bonding mode and network configuration. |
1541 | 1910 | ||
1542 | 6. Can bonding be used for High Availability? | 1911 | 6. Can bonding be used for High Availability? |
@@ -1550,12 +1919,12 @@ depends upon the bonding mode and network configuration. | |||
1550 | In the basic balance modes (balance-rr and balance-xor), it | 1919 | In the basic balance modes (balance-rr and balance-xor), it |
1551 | works with any system that supports etherchannel (also called | 1920 | works with any system that supports etherchannel (also called |
1552 | trunking). Most managed switches currently available have such | 1921 | trunking). Most managed switches currently available have such |
1553 | support, and many unmananged switches as well. | 1922 | support, and many unmanaged switches as well. |
1554 | 1923 | ||
1555 | The advanced balance modes (balance-tlb and balance-alb) do | 1924 | The advanced balance modes (balance-tlb and balance-alb) do |
1556 | not have special switch requirements, but do need device drivers that | 1925 | not have special switch requirements, but do need device drivers that |
1557 | support specific features (described in the appropriate section under | 1926 | support specific features (described in the appropriate section under |
1558 | module paramters, above). | 1927 | module parameters, above). |
1559 | 1928 | ||
1560 | In 802.3ad mode, it works with with systems that support IEEE | 1929 | In 802.3ad mode, it works with with systems that support IEEE |
1561 | 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged | 1930 | 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged |
@@ -1565,17 +1934,19 @@ switches currently available support 802.3ad. | |||
1565 | 1934 | ||
1566 | 8. Where does a bonding device get its MAC address from? | 1935 | 8. Where does a bonding device get its MAC address from? |
1567 | 1936 | ||
1568 | If not explicitly configured with ifconfig, the MAC address of | 1937 | If not explicitly configured (with ifconfig or ip link), the |
1569 | the bonding device is taken from its first slave device. This MAC | 1938 | MAC address of the bonding device is taken from its first slave |
1570 | address is then passed to all following slaves and remains persistent | 1939 | device. This MAC address is then passed to all following slaves and |
1571 | (even if the the first slave is removed) until the bonding device is | 1940 | remains persistent (even if the the first slave is removed) until the |
1572 | brought down or reconfigured. | 1941 | bonding device is brought down or reconfigured. |
1573 | 1942 | ||
1574 | If you wish to change the MAC address, you can set it with | 1943 | If you wish to change the MAC address, you can set it with |
1575 | ifconfig: | 1944 | ifconfig or ip link: |
1576 | 1945 | ||
1577 | # ifconfig bond0 hw ether 00:11:22:33:44:55 | 1946 | # ifconfig bond0 hw ether 00:11:22:33:44:55 |
1578 | 1947 | ||
1948 | # ip link set bond0 address 66:77:88:99:aa:bb | ||
1949 | |||
1579 | The MAC address can be also changed by bringing down/up the | 1950 | The MAC address can be also changed by bringing down/up the |
1580 | device and then changing its slaves (or their order): | 1951 | device and then changing its slaves (or their order): |
1581 | 1952 | ||
@@ -1591,23 +1962,28 @@ from the bond (`ifenslave -d bond0 eth0'). The bonding driver will | |||
1591 | then restore the MAC addresses that the slaves had before they were | 1962 | then restore the MAC addresses that the slaves had before they were |
1592 | enslaved. | 1963 | enslaved. |
1593 | 1964 | ||
1594 | 15. Resources and Links | 1965 | 16. Resources and Links |
1595 | ======================= | 1966 | ======================= |
1596 | 1967 | ||
1597 | The latest version of the bonding driver can be found in the latest | 1968 | The latest version of the bonding driver can be found in the latest |
1598 | version of the linux kernel, found on http://kernel.org | 1969 | version of the linux kernel, found on http://kernel.org |
1599 | 1970 | ||
1971 | The latest version of this document can be found in either the latest | ||
1972 | kernel source (named Documentation/networking/bonding.txt), or on the | ||
1973 | bonding sourceforge site: | ||
1974 | |||
1975 | http://www.sourceforge.net/projects/bonding | ||
1976 | |||
1600 | Discussions regarding the bonding driver take place primarily on the | 1977 | Discussions regarding the bonding driver take place primarily on the |
1601 | bonding-devel mailing list, hosted at sourceforge.net. If you have | 1978 | bonding-devel mailing list, hosted at sourceforge.net. If you have |
1602 | questions or problems, post them to the list. | 1979 | questions or problems, post them to the list. The list address is: |
1603 | 1980 | ||
1604 | bonding-devel@lists.sourceforge.net | 1981 | bonding-devel@lists.sourceforge.net |
1605 | 1982 | ||
1606 | https://lists.sourceforge.net/lists/listinfo/bonding-devel | 1983 | The administrative interface (to subscribe or unsubscribe) can |
1607 | 1984 | be found at: | |
1608 | There is also a project site on sourceforge. | ||
1609 | 1985 | ||
1610 | http://www.sourceforge.net/projects/bonding | 1986 | https://lists.sourceforge.net/lists/listinfo/bonding-devel |
1611 | 1987 | ||
1612 | Donald Becker's Ethernet Drivers and diag programs may be found at : | 1988 | Donald Becker's Ethernet Drivers and diag programs may be found at : |
1613 | - http://www.scyld.com/network/ | 1989 | - http://www.scyld.com/network/ |