aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--Documentation/networking/bonding.txt978
1 files changed, 677 insertions, 301 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 0bc2ed136a38..24d029455baa 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -1,5 +1,7 @@
1 1
2 Linux Ethernet Bonding Driver HOWTO 2 Linux Ethernet Bonding Driver HOWTO
3
4 Latest update: 21 June 2005
3 5
4Initial release : Thomas Davis <tadavis at lbl.gov> 6Initial release : Thomas Davis <tadavis at lbl.gov>
5Corrections, HA extensions : 2000/10/03-15 : 7Corrections, HA extensions : 2000/10/03-15 :
@@ -11,15 +13,22 @@ Corrections, HA extensions : 2000/10/03-15 :
11 13
12Reorganized and updated Feb 2005 by Jay Vosburgh 14Reorganized and updated Feb 2005 by Jay Vosburgh
13 15
14Note : 16Introduction
15------ 17============
18
19 The Linux bonding driver provides a method for aggregating
20multiple network interfaces into a single logical "bonded" interface.
21The behavior of the bonded interfaces depends upon the mode; generally
22speaking, modes provide either hot standby or load balancing services.
23Additionally, link integrity monitoring may be performed.
16 24
17The bonding driver originally came from Donald Becker's beowulf patches for 25 The bonding driver originally came from Donald Becker's
18kernel 2.0. It has changed quite a bit since, and the original tools from 26beowulf patches for kernel 2.0. It has changed quite a bit since, and
19extreme-linux and beowulf sites will not work with this version of the driver. 27the original tools from extreme-linux and beowulf sites will not work
28with this version of the driver.
20 29
21For new versions of the driver, patches for older kernels and the updated 30 For new versions of the driver, updated userspace tools, and
22userspace tools, please follow the links at the end of this file. 31who to ask for help, please follow the links at the end of this file.
23 32
24Table of Contents 33Table of Contents
25================= 34=================
@@ -30,9 +39,13 @@ Table of Contents
30 39
313. Configuring Bonding Devices 403. Configuring Bonding Devices
323.1 Configuration with sysconfig support 413.1 Configuration with sysconfig support
423.1.1 Using DHCP with sysconfig
433.1.2 Configuring Multiple Bonds with sysconfig
333.2 Configuration with initscripts support 443.2 Configuration with initscripts support
453.2.1 Using DHCP with initscripts
463.2.2 Configuring Multiple Bonds with initscripts
343.3 Configuring Bonding Manually 473.3 Configuring Bonding Manually
353.4 Configuring Multiple Bonds 483.3.1 Configuring Multiple Bonds Manually
36 49
375. Querying Bonding Configuration 505. Querying Bonding Configuration
385.1 Bonding Configuration 515.1 Bonding Configuration
@@ -56,21 +69,30 @@ Table of Contents
56 69
5711. Promiscuous mode 7011. Promiscuous mode
58 71
5912. High Availability Information 7212. Configuring Bonding for High Availability
6012.1 High Availability in a Single Switch Topology 7312.1 High Availability in a Single Switch Topology
6112.1.1 Bonding Mode Selection for Single Switch Topology
6212.1.2 Link Monitoring for Single Switch Topology
6312.2 High Availability in a Multiple Switch Topology 7412.2 High Availability in a Multiple Switch Topology
6412.2.1 Bonding Mode Selection for Multiple Switch Topology 7512.2.1 HA Bonding Mode Selection for Multiple Switch Topology
6512.2.2 Link Monitoring for Multiple Switch Topology 7612.2.2 HA Link Monitoring for Multiple Switch Topology
6612.3 Switch Behavior Issues for High Availability 77
7813. Configuring Bonding for Maximum Throughput
7913.1 Maximum Throughput in a Single Switch Topology
8013.1.1 MT Bonding Mode Selection for Single Switch Topology
8113.1.2 MT Link Monitoring for Single Switch Topology
8213.2 Maximum Throughput in a Multiple Switch Topology
8313.2.1 MT Bonding Mode Selection for Multiple Switch Topology
8413.2.2 MT Link Monitoring for Multiple Switch Topology
67 85
6813. Hardware Specific Considerations 8614. Switch Behavior Issues
6913.1 IBM BladeCenter 8714.1 Link Establishment and Failover Delays
8814.2 Duplicated Incoming Packets
70 89
7114. Frequently Asked Questions 9015. Hardware Specific Considerations
9115.1 IBM BladeCenter
72 92
7315. Resources and Links 9316. Frequently Asked Questions
94
9517. Resources and Links
74 96
75 97
761. Bonding Driver Installation 981. Bonding Driver Installation
@@ -86,16 +108,10 @@ the following steps:
861.1 Configure and build the kernel with bonding 1081.1 Configure and build the kernel with bonding
87----------------------------------------------- 109-----------------------------------------------
88 110
89 The latest version of the bonding driver is available in the 111 The current version of the bonding driver is available in the
90drivers/net/bonding subdirectory of the most recent kernel source 112drivers/net/bonding subdirectory of the most recent kernel source
91(which is available on http://kernel.org). 113(which is available on http://kernel.org). Most users "rolling their
92 114own" will want to use the most recent kernel from kernel.org.
93 Prior to the 2.4.11 kernel, the bonding driver was maintained
94largely outside the kernel tree; patches for some earlier kernels are
95available on the bonding sourceforge site, although those patches are
96still several years out of date. Most users will want to use either
97the most recent kernel from kernel.org or whatever kernel came with
98their distro.
99 115
100 Configure kernel with "make menuconfig" (or "make xconfig" or 116 Configure kernel with "make menuconfig" (or "make xconfig" or
101"make config"), then select "Bonding driver support" in the "Network 117"make config"), then select "Bonding driver support" in the "Network
@@ -103,8 +119,8 @@ device support" section. It is recommended that you configure the
103driver as module since it is currently the only way to pass parameters 119driver as module since it is currently the only way to pass parameters
104to the driver or configure more than one bonding device. 120to the driver or configure more than one bonding device.
105 121
106 Build and install the new kernel and modules, then proceed to 122 Build and install the new kernel and modules, then continue
107step 2. 123below to install ifenslave.
108 124
1091.2 Install ifenslave Control Utility 1251.2 Install ifenslave Control Utility
110------------------------------------- 126-------------------------------------
@@ -147,9 +163,9 @@ default kernel source include directory.
147 Options for the bonding driver are supplied as parameters to 163 Options for the bonding driver are supplied as parameters to
148the bonding module at load time. They may be given as command line 164the bonding module at load time. They may be given as command line
149arguments to the insmod or modprobe command, but are usually specified 165arguments to the insmod or modprobe command, but are usually specified
150in either the /etc/modprobe.conf configuration file, or in a 166in either the /etc/modules.conf or /etc/modprobe.conf configuration
151distro-specific configuration file (some of which are detailed in the 167file, or in a distro-specific configuration file (some of which are
152next section). 168detailed in the next section).
153 169
154 The available bonding driver parameters are listed below. If a 170 The available bonding driver parameters are listed below. If a
155parameter is not specified the default value is used. When initially 171parameter is not specified the default value is used. When initially
@@ -162,34 +178,34 @@ degradation will occur during link failures. Very few devices do not
162support at least miimon, so there is really no reason not to use it. 178support at least miimon, so there is really no reason not to use it.
163 179
164 Options with textual values will accept either the text name 180 Options with textual values will accept either the text name
165 or, for backwards compatibility, the option value. E.g., 181or, for backwards compatibility, the option value. E.g.,
166 "mode=802.3ad" and "mode=4" set the same mode. 182"mode=802.3ad" and "mode=4" set the same mode.
167 183
168 The parameters are as follows: 184 The parameters are as follows:
169 185
170arp_interval 186arp_interval
171 187
172 Specifies the ARP monitoring frequency in milli-seconds. If 188 Specifies the ARP link monitoring frequency in milliseconds.
173 ARP monitoring is used in a load-balancing mode (mode 0 or 2), 189 If ARP monitoring is used in an etherchannel compatible mode
174 the switch should be configured in a mode that evenly 190 (modes 0 and 2), the switch should be configured in a mode
175 distributes packets across all links - such as round-robin. If 191 that evenly distributes packets across all links. If the
176 the switch is configured to distribute the packets in an XOR 192 switch is configured to distribute the packets in an XOR
177 fashion, all replies from the ARP targets will be received on 193 fashion, all replies from the ARP targets will be received on
178 the same link which could cause the other team members to 194 the same link which could cause the other team members to
179 fail. ARP monitoring should not be used in conjunction with 195 fail. ARP monitoring should not be used in conjunction with
180 miimon. A value of 0 disables ARP monitoring. The default 196 miimon. A value of 0 disables ARP monitoring. The default
181 value is 0. 197 value is 0.
182 198
183arp_ip_target 199arp_ip_target
184 200
185 Specifies the ip addresses to use when arp_interval is > 0. 201 Specifies the IP addresses to use as ARP monitoring peers when
186 These are the targets of the ARP request sent to determine the 202 arp_interval is > 0. These are the targets of the ARP request
187 health of the link to the targets. Specify these values in 203 sent to determine the health of the link to the targets.
188 ddd.ddd.ddd.ddd format. Multiple ip adresses must be 204 Specify these values in ddd.ddd.ddd.ddd format. Multiple IP
189 seperated by a comma. At least one IP address must be given 205 addresses must be separated by a comma. At least one IP
190 for ARP monitoring to function. The maximum number of targets 206 address must be given for ARP monitoring to function. The
191 that can be specified is 16. The default value is no IP 207 maximum number of targets that can be specified is 16. The
192 addresses. 208 default value is no IP addresses.
193 209
194downdelay 210downdelay
195 211
@@ -207,11 +223,13 @@ lacp_rate
207 are: 223 are:
208 224
209 slow or 0 225 slow or 0
210 Request partner to transmit LACPDUs every 30 seconds (default) 226 Request partner to transmit LACPDUs every 30 seconds
211 227
212 fast or 1 228 fast or 1
213 Request partner to transmit LACPDUs every 1 second 229 Request partner to transmit LACPDUs every 1 second
214 230
231 The default is slow.
232
215max_bonds 233max_bonds
216 234
217 Specifies the number of bonding devices to create for this 235 Specifies the number of bonding devices to create for this
@@ -221,10 +239,11 @@ max_bonds
221 239
222miimon 240miimon
223 241
224 Specifies the frequency in milli-seconds that MII link 242 Specifies the MII link monitoring frequency in milliseconds.
225 monitoring will occur. A value of zero disables MII link 243 This determines how often the link state of each slave is
226 monitoring. A value of 100 is a good starting point. The 244 inspected for link failures. A value of zero disables MII
227 use_carrier option, below, affects how the link state is 245 link monitoring. A value of 100 is a good starting point.
246 The use_carrier option, below, affects how the link state is
228 determined. See the High Availability section for additional 247 determined. See the High Availability section for additional
229 information. The default value is 0. 248 information. The default value is 0.
230 249
@@ -246,17 +265,31 @@ mode
246 active. A different slave becomes active if, and only 265 active. A different slave becomes active if, and only
247 if, the active slave fails. The bond's MAC address is 266 if, the active slave fails. The bond's MAC address is
248 externally visible on only one port (network adapter) 267 externally visible on only one port (network adapter)
249 to avoid confusing the switch. This mode provides 268 to avoid confusing the switch.
250 fault tolerance. The primary option affects the 269
251 behavior of this mode. 270 In bonding version 2.6.2 or later, when a failover
271 occurs in active-backup mode, bonding will issue one
272 or more gratuitous ARPs on the newly active slave.
273 One gratutious ARP is issued for the bonding master
274 interface and each VLAN interfaces configured above
275 it, provided that the interface has at least one IP
276 address configured. Gratuitous ARPs issued for VLAN
277 interfaces are tagged with the appropriate VLAN id.
278
279 This mode provides fault tolerance. The primary
280 option, documented below, affects the behavior of this
281 mode.
252 282
253 balance-xor or 2 283 balance-xor or 2
254 284
255 XOR policy: Transmit based on [(source MAC address 285 XOR policy: Transmit based on the selected transmit
256 XOR'd with destination MAC address) modulo slave 286 hash policy. The default policy is a simple [(source
257 count]. This selects the same slave for each 287 MAC address XOR'd with destination MAC address) modulo
258 destination MAC address. This mode provides load 288 slave count]. Alternate transmit policies may be
259 balancing and fault tolerance. 289 selected via the xmit_hash_policy option, described
290 below.
291
292 This mode provides load balancing and fault tolerance.
260 293
261 broadcast or 3 294 broadcast or 3
262 295
@@ -270,7 +303,17 @@ mode
270 duplex settings. Utilizes all slaves in the active 303 duplex settings. Utilizes all slaves in the active
271 aggregator according to the 802.3ad specification. 304 aggregator according to the 802.3ad specification.
272 305
273 Pre-requisites: 306 Slave selection for outgoing traffic is done according
307 to the transmit hash policy, which may be changed from
308 the default simple XOR policy via the xmit_hash_policy
309 option, documented below. Note that not all transmit
310 policies may be 802.3ad compliant, particularly in
311 regards to the packet mis-ordering requirements of
312 section 43.2.4 of the 802.3ad standard. Differing
313 peer implementations will have varying tolerances for
314 noncompliance.
315
316 Prerequisites:
274 317
275 1. Ethtool support in the base drivers for retrieving 318 1. Ethtool support in the base drivers for retrieving
276 the speed and duplex of each slave. 319 the speed and duplex of each slave.
@@ -333,7 +376,7 @@ mode
333 376
334 When a link is reconnected or a new slave joins the 377 When a link is reconnected or a new slave joins the
335 bond the receive traffic is redistributed among all 378 bond the receive traffic is redistributed among all
336 active slaves in the bond by intiating ARP Replies 379 active slaves in the bond by initiating ARP Replies
337 with the selected mac address to each of the 380 with the selected mac address to each of the
338 clients. The updelay parameter (detailed below) must 381 clients. The updelay parameter (detailed below) must
339 be set to a value equal or greater than the switch's 382 be set to a value equal or greater than the switch's
@@ -396,6 +439,60 @@ use_carrier
396 0 will use the deprecated MII / ETHTOOL ioctls. The default 439 0 will use the deprecated MII / ETHTOOL ioctls. The default
397 value is 1. 440 value is 1.
398 441
442xmit_hash_policy
443
444 Selects the transmit hash policy to use for slave selection in
445 balance-xor and 802.3ad modes. Possible values are:
446
447 layer2
448
449 Uses XOR of hardware MAC addresses to generate the
450 hash. The formula is
451
452 (source MAC XOR destination MAC) modulo slave count
453
454 This algorithm will place all traffic to a particular
455 network peer on the same slave.
456
457 This algorithm is 802.3ad compliant.
458
459 layer3+4
460
461 This policy uses upper layer protocol information,
462 when available, to generate the hash. This allows for
463 traffic to a particular network peer to span multiple
464 slaves, although a single connection will not span
465 multiple slaves.
466
467 The formula for unfragmented TCP and UDP packets is
468
469 ((source port XOR dest port) XOR
470 ((source IP XOR dest IP) AND 0xffff)
471 modulo slave count
472
473 For fragmented TCP or UDP packets and all other IP
474 protocol traffic, the source and destination port
475 information is omitted. For non-IP traffic, the
476 formula is the same as for the layer2 transmit hash
477 policy.
478
479 This policy is intended to mimic the behavior of
480 certain switches, notably Cisco switches with PFC2 as
481 well as some Foundry and IBM products.
482
483 This algorithm is not fully 802.3ad compliant. A
484 single TCP or UDP conversation containing both
485 fragmented and unfragmented packets will see packets
486 striped across two interfaces. This may result in out
487 of order delivery. Most traffic types will not meet
488 this criteria, as TCP rarely fragments traffic, and
489 most UDP traffic is not involved in extended
490 conversations. Other implementations of 802.3ad may
491 or may not tolerate this noncompliance.
492
493 The default value is layer2. This option was added in bonding
494version 2.6.3. In earlier versions of bonding, this parameter does
495not exist, and the layer2 policy is the only policy.
399 496
400 497
4013. Configuring Bonding Devices 4983. Configuring Bonding Devices
@@ -448,8 +545,9 @@ Bonding devices can be managed by hand, however, as follows.
448slave devices. On SLES 9, this is most easily done by running the 545slave devices. On SLES 9, this is most easily done by running the
449yast2 sysconfig configuration utility. The goal is for to create an 546yast2 sysconfig configuration utility. The goal is for to create an
450ifcfg-id file for each slave device. The simplest way to accomplish 547ifcfg-id file for each slave device. The simplest way to accomplish
451this is to configure the devices for DHCP. The name of the 548this is to configure the devices for DHCP (this is only to get the
452configuration file for each device will be of the form: 549file ifcfg-id file created; see below for some issues with DHCP). The
550name of the configuration file for each device will be of the form:
453 551
454ifcfg-id-xx:xx:xx:xx:xx:xx 552ifcfg-id-xx:xx:xx:xx:xx:xx
455 553
@@ -459,7 +557,7 @@ the device's permanent MAC address.
459 Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been 557 Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been
460created, it is necessary to edit the configuration files for the slave 558created, it is necessary to edit the configuration files for the slave
461devices (the MAC addresses correspond to those of the slave devices). 559devices (the MAC addresses correspond to those of the slave devices).
462Before editing, the file will contain muliple lines, and will look 560Before editing, the file will contain multiple lines, and will look
463something like this: 561something like this:
464 562
465BOOTPROTO='dhcp' 563BOOTPROTO='dhcp'
@@ -496,16 +594,11 @@ STARTMODE="onboot"
496BONDING_MASTER="yes" 594BONDING_MASTER="yes"
497BONDING_MODULE_OPTS="mode=active-backup miimon=100" 595BONDING_MODULE_OPTS="mode=active-backup miimon=100"
498BONDING_SLAVE0="eth0" 596BONDING_SLAVE0="eth0"
499BONDING_SLAVE1="eth1" 597BONDING_SLAVE1="bus-pci-0000:06:08.1"
500 598
501 Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK 599 Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK
502values with the appropriate values for your network. 600values with the appropriate values for your network.
503 601
504 Note that configuring the bonding device with BOOTPROTO='dhcp'
505does not work; the scripts attempt to obtain the device address from
506DHCP prior to adding any of the slave devices. Without active slaves,
507the DHCP requests are not sent to the network.
508
509 The STARTMODE specifies when the device is brought online. 602 The STARTMODE specifies when the device is brought online.
510The possible values are: 603The possible values are:
511 604
@@ -531,9 +624,17 @@ for the bonding mode, link monitoring, and so on here. Do not include
531the max_bonds bonding parameter; this will confuse the configuration 624the max_bonds bonding parameter; this will confuse the configuration
532system if you have multiple bonding devices. 625system if you have multiple bonding devices.
533 626
534 Finally, supply one BONDING_SLAVEn="ethX" for each slave, 627 Finally, supply one BONDING_SLAVEn="slave device" for each
535where "n" is an increasing value, one for each slave, and "ethX" is 628slave. where "n" is an increasing value, one for each slave. The
536the name of the slave device (eth0, eth1, etc). 629"slave device" is either an interface name, e.g., "eth0", or a device
630specifier for the network device. The interface name is easier to
631find, but the ethN names are subject to change at boot time if, e.g.,
632a device early in the sequence has failed. The device specifiers
633(bus-pci-0000:06:08.1 in the example above) specify the physical
634network device, and will not change unless the device's bus location
635changes (for example, it is moved from one PCI slot to another). The
636example above uses one of each type for demonstration purposes; most
637configurations will choose one or the other for all slave devices.
537 638
538 When all configuration files have been modified or created, 639 When all configuration files have been modified or created,
539networking must be restarted for the configuration changes to take 640networking must be restarted for the configuration changes to take
@@ -544,7 +645,7 @@ effect. This can be accomplished via the following:
544 Note that the network control script (/sbin/ifdown) will 645 Note that the network control script (/sbin/ifdown) will
545remove the bonding module as part of the network shutdown processing, 646remove the bonding module as part of the network shutdown processing,
546so it is not necessary to remove the module by hand if, e.g., the 647so it is not necessary to remove the module by hand if, e.g., the
547module paramters have changed. 648module parameters have changed.
548 649
549 Also, at this writing, YaST/YaST2 will not manage bonding 650 Also, at this writing, YaST/YaST2 will not manage bonding
550devices (they do not show bonding interfaces on its list of network 651devices (they do not show bonding interfaces on its list of network
@@ -559,12 +660,37 @@ format can be found in an example ifcfg template file:
559 Note that the template does not document the various BONDING_ 660 Note that the template does not document the various BONDING_
560settings described above, but does describe many of the other options. 661settings described above, but does describe many of the other options.
561 662
6633.1.1 Using DHCP with sysconfig
664-------------------------------
665
666 Under sysconfig, configuring a device with BOOTPROTO='dhcp'
667will cause it to query DHCP for its IP address information. At this
668writing, this does not function for bonding devices; the scripts
669attempt to obtain the device address from DHCP prior to adding any of
670the slave devices. Without active slaves, the DHCP requests are not
671sent to the network.
672
6733.1.2 Configuring Multiple Bonds with sysconfig
674-----------------------------------------------
675
676 The sysconfig network initialization system is capable of
677handling multiple bonding devices. All that is necessary is for each
678bonding instance to have an appropriately configured ifcfg-bondX file
679(as described above). Do not specify the "max_bonds" parameter to any
680instance of bonding, as this will confuse sysconfig. If you require
681multiple bonding devices with identical parameters, create multiple
682ifcfg-bondX files.
683
684 Because the sysconfig scripts supply the bonding module
685options in the ifcfg-bondX file, it is not necessary to add them to
686the system /etc/modules.conf or /etc/modprobe.conf configuration file.
687
5623.2 Configuration with initscripts support 6883.2 Configuration with initscripts support
563------------------------------------------ 689------------------------------------------
564 690
565 This section applies to distros using a version of initscripts 691 This section applies to distros using a version of initscripts
566with bonding support, for example, Red Hat Linux 9 or Red Hat 692with bonding support, for example, Red Hat Linux 9 or Red Hat
567Enterprise Linux version 3. On these systems, the network 693Enterprise Linux version 3 or 4. On these systems, the network
568initialization scripts have some knowledge of bonding, and can be 694initialization scripts have some knowledge of bonding, and can be
569configured to control bonding devices. 695configured to control bonding devices.
570 696
@@ -614,10 +740,11 @@ USERCTL=no
614 Be sure to change the networking specific lines (IPADDR, 740 Be sure to change the networking specific lines (IPADDR,
615NETMASK, NETWORK and BROADCAST) to match your network configuration. 741NETMASK, NETWORK and BROADCAST) to match your network configuration.
616 742
617 Finally, it is necessary to edit /etc/modules.conf to load the 743 Finally, it is necessary to edit /etc/modules.conf (or
618bonding module when the bond0 interface is brought up. The following 744/etc/modprobe.conf, depending upon your distro) to load the bonding
619sample lines in /etc/modules.conf will load the bonding module, and 745module with your desired options when the bond0 interface is brought
620select its options: 746up. The following lines in /etc/modules.conf (or modprobe.conf) will
747load the bonding module, and select its options:
621 748
622alias bond0 bonding 749alias bond0 bonding
623options bond0 mode=balance-alb miimon=100 750options bond0 mode=balance-alb miimon=100
@@ -629,6 +756,33 @@ options for your configuration.
629will restart the networking subsystem and your bond link should be now 756will restart the networking subsystem and your bond link should be now
630up and running. 757up and running.
631 758
7593.2.1 Using DHCP with initscripts
760---------------------------------
761
762 Recent versions of initscripts (the version supplied with
763Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do
764have support for assigning IP information to bonding devices via DHCP.
765
766 To configure bonding for DHCP, configure it as described
767above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp"
768and add a line consisting of "TYPE=Bonding". Note that the TYPE value
769is case sensitive.
770
7713.2.2 Configuring Multiple Bonds with initscripts
772-------------------------------------------------
773
774 At this writing, the initscripts package does not directly
775support loading the bonding driver multiple times, so the process for
776doing so is the same as described in the "Configuring Multiple Bonds
777Manually" section, below.
778
779 NOTE: It has been observed that some Red Hat supplied kernels
780are apparently unable to rename modules at load time (the "-obonding1"
781part). Attempts to pass that option to modprobe will produce an
782"Operation not permitted" error. This has been reported on some
783Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels
784exhibiting this problem, it will be impossible to configure multiple
785bonds with differing parameters.
632 786
6333.3 Configuring Bonding Manually 7873.3 Configuring Bonding Manually
634-------------------------------- 788--------------------------------
@@ -638,10 +792,11 @@ scripts (the sysconfig or initscripts package) do not have specific
638knowledge of bonding. One such distro is SuSE Linux Enterprise Server 792knowledge of bonding. One such distro is SuSE Linux Enterprise Server
639version 8. 793version 8.
640 794
641 The general methodology for these systems is to place the 795 The general method for these systems is to place the bonding
642bonding module parameters into /etc/modprobe.conf, then add modprobe 796module parameters into /etc/modules.conf or /etc/modprobe.conf (as
643and/or ifenslave commands to the system's global init script. The 797appropriate for the installed distro), then add modprobe and/or
644name of the global init script differs; for sysconfig, it is 798ifenslave commands to the system's global init script. The name of
799the global init script differs; for sysconfig, it is
645/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. 800/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local.
646 801
647 For example, if you wanted to make a simple bond of two e100 802 For example, if you wanted to make a simple bond of two e100
@@ -649,7 +804,7 @@ devices (presumed to be eth0 and eth1), and have it persist across
649reboots, edit the appropriate file (/etc/init.d/boot.local or 804reboots, edit the appropriate file (/etc/init.d/boot.local or
650/etc/rc.d/rc.local), and add the following: 805/etc/rc.d/rc.local), and add the following:
651 806
652modprobe bonding -obond0 mode=balance-alb miimon=100 807modprobe bonding mode=balance-alb miimon=100
653modprobe e100 808modprobe e100
654ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up 809ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
655ifenslave bond0 eth0 810ifenslave bond0 eth0
@@ -657,11 +812,7 @@ ifenslave bond0 eth1
657 812
658 Replace the example bonding module parameters and bond0 813 Replace the example bonding module parameters and bond0
659network configuration (IP address, netmask, etc) with the appropriate 814network configuration (IP address, netmask, etc) with the appropriate
660values for your configuration. The above example loads the bonding 815values for your configuration.
661module with the name "bond0," this simplifies the naming if multiple
662bonding modules are loaded (each successive instance of the module is
663given a different name, and the module instance names match the
664bonding interface names).
665 816
666 Unfortunately, this method will not provide support for the 817 Unfortunately, this method will not provide support for the
667ifup and ifdown scripts on the bond devices. To reload the bonding 818ifup and ifdown scripts on the bond devices. To reload the bonding
@@ -684,20 +835,23 @@ appropriate device driver modules. For our example above, you can do
684the following: 835the following:
685 836
686# ifconfig bond0 down 837# ifconfig bond0 down
687# rmmod bond0 838# rmmod bonding
688# rmmod e100 839# rmmod e100
689 840
690 Again, for convenience, it may be desirable to create a script 841 Again, for convenience, it may be desirable to create a script
691with these commands. 842with these commands.
692 843
693 844
6943.4 Configuring Multiple Bonds 8453.3.1 Configuring Multiple Bonds Manually
695------------------------------ 846-----------------------------------------
696 847
697 This section contains information on configuring multiple 848 This section contains information on configuring multiple
698bonding devices with differing options. If you require multiple 849bonding devices with differing options for those systems whose network
699bonding devices, but all with the same options, see the "max_bonds" 850initialization scripts lack support for configuring multiple bonds.
700module paramter, documented above. 851
852 If you require multiple bonding devices, but all with the same
853options, you may wish to use the "max_bonds" module parameter,
854documented above.
701 855
702 To create multiple bonding devices with differing options, it 856 To create multiple bonding devices with differing options, it
703is necessary to load the bonding driver multiple times. Note that 857is necessary to load the bonding driver multiple times. Note that
@@ -724,11 +878,16 @@ named "bond0" and creates the bond0 device in balance-rr mode with an
724miimon of 100. The second instance is named "bond1" and creates the 878miimon of 100. The second instance is named "bond1" and creates the
725bond1 device in balance-alb mode with an miimon of 50. 879bond1 device in balance-alb mode with an miimon of 50.
726 880
881 In some circumstances (typically with older distributions),
882the above does not work, and the second bonding instance never sees
883its options. In that case, the second options line can be substituted
884as follows:
885
886install bonding1 /sbin/modprobe bonding -obond1 mode=balance-alb miimon=50
887
727 This may be repeated any number of times, specifying a new and 888 This may be repeated any number of times, specifying a new and
728unique name in place of bond0 or bond1 for each instance. 889unique name in place of bond1 for each subsequent instance.
729 890
730 When the appropriate module paramters are in place, then
731configure bonding according to the instructions for your distro.
732 891
7335. Querying Bonding Configuration 8925. Querying Bonding Configuration
734================================= 893=================================
@@ -846,8 +1005,8 @@ tagged internally by bonding itself. As a result, bonding must
846self generated packets. 1005self generated packets.
847 1006
848 For reasons of simplicity, and to support the use of adapters 1007 For reasons of simplicity, and to support the use of adapters
849that can do VLAN hardware acceleration offloding, the bonding 1008that can do VLAN hardware acceleration offloading, the bonding
850interface declares itself as fully hardware offloaing capable, it gets 1009interface declares itself as fully hardware offloading capable, it gets
851the add_vid/kill_vid notifications to gather the necessary 1010the add_vid/kill_vid notifications to gather the necessary
852information, and it propagates those actions to the slaves. In case 1011information, and it propagates those actions to the slaves. In case
853of mixed adapter types, hardware accelerated tagged packets that 1012of mixed adapter types, hardware accelerated tagged packets that
@@ -880,7 +1039,7 @@ bond interface:
880matches the hardware address of the VLAN interfaces. 1039matches the hardware address of the VLAN interfaces.
881 1040
882 Note that changing a VLAN interface's HW address would set the 1041 Note that changing a VLAN interface's HW address would set the
883underlying device -- i.e. the bonding interface -- to promiscouos 1042underlying device -- i.e. the bonding interface -- to promiscuous
884mode, which might not be what you want. 1043mode, which might not be what you want.
885 1044
886 1045
@@ -923,7 +1082,7 @@ down or have a problem making it unresponsive to ARP requests. Having
923an additional target (or several) increases the reliability of the ARP 1082an additional target (or several) increases the reliability of the ARP
924monitoring. 1083monitoring.
925 1084
926 Multiple ARP targets must be seperated by commas as follows: 1085 Multiple ARP targets must be separated by commas as follows:
927 1086
928# example options for ARP monitoring with three targets 1087# example options for ARP monitoring with three targets
929alias bond0 bonding 1088alias bond0 bonding
@@ -1045,7 +1204,7 @@ install bonding /sbin/modprobe tg3; /sbin/modprobe e1000;
1045 This will, when loading the bonding module, rather than 1204 This will, when loading the bonding module, rather than
1046performing the normal action, instead execute the provided command. 1205performing the normal action, instead execute the provided command.
1047This command loads the device drivers in the order needed, then calls 1206This command loads the device drivers in the order needed, then calls
1048modprobe with --ingore-install to cause the normal action to then take 1207modprobe with --ignore-install to cause the normal action to then take
1049place. Full documentation on this can be found in the modprobe.conf 1208place. Full documentation on this can be found in the modprobe.conf
1050and modprobe manual pages. 1209and modprobe manual pages.
1051 1210
@@ -1130,14 +1289,14 @@ association.
1130common to enable promiscuous mode on the device, so that all traffic 1289common to enable promiscuous mode on the device, so that all traffic
1131is seen (instead of seeing only traffic destined for the local host). 1290is seen (instead of seeing only traffic destined for the local host).
1132The bonding driver handles promiscuous mode changes to the bonding 1291The bonding driver handles promiscuous mode changes to the bonding
1133master device (e.g., bond0), and propogates the setting to the slave 1292master device (e.g., bond0), and propagates the setting to the slave
1134devices. 1293devices.
1135 1294
1136 For the balance-rr, balance-xor, broadcast, and 802.3ad modes, 1295 For the balance-rr, balance-xor, broadcast, and 802.3ad modes,
1137the promiscuous mode setting is propogated to all slaves. 1296the promiscuous mode setting is propagated to all slaves.
1138 1297
1139 For the active-backup, balance-tlb and balance-alb modes, the 1298 For the active-backup, balance-tlb and balance-alb modes, the
1140promiscuous mode setting is propogated only to the active slave. 1299promiscuous mode setting is propagated only to the active slave.
1141 1300
1142 For balance-tlb mode, the active slave is the slave currently 1301 For balance-tlb mode, the active slave is the slave currently
1143receiving inbound traffic. 1302receiving inbound traffic.
@@ -1148,46 +1307,182 @@ sending to peers that are unassigned or if the load is unbalanced.
1148 1307
1149 For the active-backup, balance-tlb and balance-alb modes, when 1308 For the active-backup, balance-tlb and balance-alb modes, when
1150the active slave changes (e.g., due to a link failure), the 1309the active slave changes (e.g., due to a link failure), the
1151promiscuous setting will be propogated to the new active slave. 1310promiscuous setting will be propagated to the new active slave.
1152 1311
115312. High Availability Information 131212. Configuring Bonding for High Availability
1154================================= 1313=============================================
1155 1314
1156 High Availability refers to configurations that provide 1315 High Availability refers to configurations that provide
1157maximum network availability by having redundant or backup devices, 1316maximum network availability by having redundant or backup devices,
1158links and switches between the host and the rest of the world. 1317links or switches between the host and the rest of the world. The
1159 1318goal is to provide the maximum availability of network connectivity
1160 There are currently two basic methods for configuring to 1319(i.e., the network always works), even though other configurations
1161maximize availability. They are dependent on the network topology and 1320could provide higher throughput.
1162the primary goal of the configuration, but in general, a configuration
1163can be optimized for maximum available bandwidth, or for maximum
1164network availability.
1165 1321
116612.1 High Availability in a Single Switch Topology 132212.1 High Availability in a Single Switch Topology
1167-------------------------------------------------- 1323--------------------------------------------------
1168 1324
1169 If two hosts (or a host and a switch) are directly connected 1325 If two hosts (or a host and a single switch) are directly
1170via multiple physical links, then there is no network availability 1326connected via multiple physical links, then there is no availability
1171penalty for optimizing for maximum bandwidth: there is only one switch 1327penalty to optimizing for maximum bandwidth. In this case, there is
1172(or peer), so if it fails, you have no alternative access to fail over 1328only one switch (or peer), so if it fails, there is no alternative
1173to. 1329access to fail over to. Additionally, the bonding load balance modes
1330support link monitoring of their members, so if individual links fail,
1331the load will be rebalanced across the remaining devices.
1332
1333 See Section 13, "Configuring Bonding for Maximum Throughput"
1334for information on configuring bonding with one peer device.
1335
133612.2 High Availability in a Multiple Switch Topology
1337----------------------------------------------------
1338
1339 With multiple switches, the configuration of bonding and the
1340network changes dramatically. In multiple switch topologies, there is
1341a trade off between network availability and usable bandwidth.
1342
1343 Below is a sample network, configured to maximize the
1344availability of the network:
1174 1345
1175Example 1 : host to switch (or other host) 1346 | |
1347 |port3 port3|
1348 +-----+----+ +-----+----+
1349 | |port2 ISL port2| |
1350 | switch A +--------------------------+ switch B |
1351 | | | |
1352 +-----+----+ +-----++---+
1353 |port1 port1|
1354 | +-------+ |
1355 +-------------+ host1 +---------------+
1356 eth0 +-------+ eth1
1176 1357
1177 +----------+ +----------+ 1358 In this configuration, there is a link between the two
1178 | |eth0 eth0| switch | 1359switches (ISL, or inter switch link), and multiple ports connecting to
1179 | Host A +--------------------------+ or | 1360the outside world ("port3" on each switch). There is no technical
1180 | +--------------------------+ other | 1361reason that this could not be extended to a third switch.
1181 | |eth1 eth1| host |
1182 +----------+ +----------+
1183 1362
136312.2.1 HA Bonding Mode Selection for Multiple Switch Topology
1364-------------------------------------------------------------
1184 1365
118512.1.1 Bonding Mode Selection for single switch topology 1366 In a topology such as the example above, the active-backup and
1186-------------------------------------------------------- 1367broadcast modes are the only useful bonding modes when optimizing for
1368availability; the other modes require all links to terminate on the
1369same peer for them to behave rationally.
1370
1371active-backup: This is generally the preferred mode, particularly if
1372 the switches have an ISL and play together well. If the
1373 network configuration is such that one switch is specifically
1374 a backup switch (e.g., has lower capacity, higher cost, etc),
1375 then the primary option can be used to insure that the
1376 preferred link is always used when it is available.
1377
1378broadcast: This mode is really a special purpose mode, and is suitable
1379 only for very specific needs. For example, if the two
1380 switches are not connected (no ISL), and the networks beyond
1381 them are totally independent. In this case, if it is
1382 necessary for some specific one-way traffic to reach both
1383 independent networks, then the broadcast mode may be suitable.
1384
138512.2.2 HA Link Monitoring Selection for Multiple Switch Topology
1386----------------------------------------------------------------
1387
1388 The choice of link monitoring ultimately depends upon your
1389switch. If the switch can reliably fail ports in response to other
1390failures, then either the MII or ARP monitors should work. For
1391example, in the above example, if the "port3" link fails at the remote
1392end, the MII monitor has no direct means to detect this. The ARP
1393monitor could be configured with a target at the remote end of port3,
1394thus detecting that failure without switch support.
1395
1396 In general, however, in a multiple switch topology, the ARP
1397monitor can provide a higher level of reliability in detecting end to
1398end connectivity failures (which may be caused by the failure of any
1399individual component to pass traffic for any reason). Additionally,
1400the ARP monitor should be configured with multiple targets (at least
1401one for each switch in the network). This will insure that,
1402regardless of which switch is active, the ARP monitor has a suitable
1403target to query.
1404
1405
140613. Configuring Bonding for Maximum Throughput
1407==============================================
1408
140913.1 Maximizing Throughput in a Single Switch Topology
1410------------------------------------------------------
1411
1412 In a single switch configuration, the best method to maximize
1413throughput depends upon the application and network environment. The
1414various load balancing modes each have strengths and weaknesses in
1415different environments, as detailed below.
1416
1417 For this discussion, we will break down the topologies into
1418two categories. Depending upon the destination of most traffic, we
1419categorize them into either "gatewayed" or "local" configurations.
1420
1421 In a gatewayed configuration, the "switch" is acting primarily
1422as a router, and the majority of traffic passes through this router to
1423other networks. An example would be the following:
1424
1425
1426 +----------+ +----------+
1427 | |eth0 port1| | to other networks
1428 | Host A +---------------------+ router +------------------->
1429 | +---------------------+ | Hosts B and C are out
1430 | |eth1 port2| | here somewhere
1431 +----------+ +----------+
1432
1433 The router may be a dedicated router device, or another host
1434acting as a gateway. For our discussion, the important point is that
1435the majority of traffic from Host A will pass through the router to
1436some other network before reaching its final destination.
1437
1438 In a gatewayed network configuration, although Host A may
1439communicate with many other systems, all of its traffic will be sent
1440and received via one other peer on the local network, the router.
1441
1442 Note that the case of two systems connected directly via
1443multiple physical links is, for purposes of configuring bonding, the
1444same as a gatewayed configuration. In that case, it happens that all
1445traffic is destined for the "gateway" itself, not some other network
1446beyond the gateway.
1447
1448 In a local configuration, the "switch" is acting primarily as
1449a switch, and the majority of traffic passes through this switch to
1450reach other stations on the same network. An example would be the
1451following:
1452
1453 +----------+ +----------+ +--------+
1454 | |eth0 port1| +-------+ Host B |
1455 | Host A +------------+ switch |port3 +--------+
1456 | +------------+ | +--------+
1457 | |eth1 port2| +------------------+ Host C |
1458 +----------+ +----------+port4 +--------+
1459
1460
1461 Again, the switch may be a dedicated switch device, or another
1462host acting as a gateway. For our discussion, the important point is
1463that the majority of traffic from Host A is destined for other hosts
1464on the same local network (Hosts B and C in the above example).
1465
1466 In summary, in a gatewayed configuration, traffic to and from
1467the bonded device will be to the same MAC level peer on the network
1468(the gateway itself, i.e., the router), regardless of its final
1469destination. In a local configuration, traffic flows directly to and
1470from the final destinations, thus, each destination (Host B, Host C)
1471will be addressed directly by their individual MAC addresses.
1472
1473 This distinction between a gatewayed and a local network
1474configuration is important because many of the load balancing modes
1475available use the MAC addresses of the local network source and
1476destination to make load balancing decisions. The behavior of each
1477mode is described below.
1478
1479
148013.1.1 MT Bonding Mode Selection for Single Switch Topology
1481-----------------------------------------------------------
1187 1482
1188 This configuration is the easiest to set up and to understand, 1483 This configuration is the easiest to set up and to understand,
1189although you will have to decide which bonding mode best suits your 1484although you will have to decide which bonding mode best suits your
1190needs. The tradeoffs for each mode are detailed below: 1485needs. The trade offs for each mode are detailed below:
1191 1486
1192balance-rr: This mode is the only mode that will permit a single 1487balance-rr: This mode is the only mode that will permit a single
1193 TCP/IP connection to stripe traffic across multiple 1488 TCP/IP connection to stripe traffic across multiple
@@ -1206,6 +1501,23 @@ balance-rr: This mode is the only mode that will permit a single
1206 interface's worth of throughput, even after adjusting 1501 interface's worth of throughput, even after adjusting
1207 tcp_reordering. 1502 tcp_reordering.
1208 1503
1504 Note that this out of order delivery occurs when both the
1505 sending and receiving systems are utilizing a multiple
1506 interface bond. Consider a configuration in which a
1507 balance-rr bond feeds into a single higher capacity network
1508 channel (e.g., multiple 100Mb/sec ethernets feeding a single
1509 gigabit ethernet via an etherchannel capable switch). In this
1510 configuration, traffic sent from the multiple 100Mb devices to
1511 a destination connected to the gigabit device will not see
1512 packets out of order. However, traffic sent from the gigabit
1513 device to the multiple 100Mb devices may or may not see
1514 traffic out of order, depending upon the balance policy of the
1515 switch. Many switches do not support any modes that stripe
1516 traffic (instead choosing a port based upon IP or MAC level
1517 addresses); for those devices, traffic flowing from the
1518 gigabit device to the many 100Mb devices will only utilize one
1519 interface.
1520
1209 If you are utilizing protocols other than TCP/IP, UDP for 1521 If you are utilizing protocols other than TCP/IP, UDP for
1210 example, and your application can tolerate out of order 1522 example, and your application can tolerate out of order
1211 delivery, then this mode can allow for single stream datagram 1523 delivery, then this mode can allow for single stream datagram
@@ -1220,16 +1532,21 @@ active-backup: There is not much advantage in this network topology to
1220 connected to the same peer as the primary. In this case, a 1532 connected to the same peer as the primary. In this case, a
1221 load balancing mode (with link monitoring) will provide the 1533 load balancing mode (with link monitoring) will provide the
1222 same level of network availability, but with increased 1534 same level of network availability, but with increased
1223 available bandwidth. On the plus side, it does not require 1535 available bandwidth. On the plus side, active-backup mode
1224 any configuration of the switch. 1536 does not require any configuration of the switch, so it may
1537 have value if the hardware available does not support any of
1538 the load balance modes.
1225 1539
1226balance-xor: This mode will limit traffic such that packets destined 1540balance-xor: This mode will limit traffic such that packets destined
1227 for specific peers will always be sent over the same 1541 for specific peers will always be sent over the same
1228 interface. Since the destination is determined by the MAC 1542 interface. Since the destination is determined by the MAC
1229 addresses involved, this may be desirable if you have a large 1543 addresses involved, this mode works best in a "local" network
1230 network with many hosts. It is likely to be suboptimal if all 1544 configuration (as described above), with destinations all on
1231 your traffic is passed through a single router, however. As 1545 the same local network. This mode is likely to be suboptimal
1232 with balance-rr, the switch ports need to be configured for 1546 if all your traffic is passed through a single router (i.e., a
1547 "gatewayed" network configuration, as described above).
1548
1549 As with balance-rr, the switch ports need to be configured for
1233 "etherchannel" or "trunking." 1550 "etherchannel" or "trunking."
1234 1551
1235broadcast: Like active-backup, there is not much advantage to this 1552broadcast: Like active-backup, there is not much advantage to this
@@ -1241,122 +1558,131 @@ broadcast: Like active-backup, there is not much advantage to this
1241 protocol includes automatic configuration of the aggregates, 1558 protocol includes automatic configuration of the aggregates,
1242 so minimal manual configuration of the switch is needed 1559 so minimal manual configuration of the switch is needed
1243 (typically only to designate that some set of devices is 1560 (typically only to designate that some set of devices is
1244 usable for 802.3ad). The 802.3ad standard also mandates that 1561 available for 802.3ad). The 802.3ad standard also mandates
1245 frames be delivered in order (within certain limits), so in 1562 that frames be delivered in order (within certain limits), so
1246 general single connections will not see misordering of 1563 in general single connections will not see misordering of
1247 packets. The 802.3ad mode does have some drawbacks: the 1564 packets. The 802.3ad mode does have some drawbacks: the
1248 standard mandates that all devices in the aggregate operate at 1565 standard mandates that all devices in the aggregate operate at
1249 the same speed and duplex. Also, as with all bonding load 1566 the same speed and duplex. Also, as with all bonding load
1250 balance modes other than balance-rr, no single connection will 1567 balance modes other than balance-rr, no single connection will
1251 be able to utilize more than a single interface's worth of 1568 be able to utilize more than a single interface's worth of
1252 bandwidth. Additionally, the linux bonding 802.3ad 1569 bandwidth.
1253 implementation distributes traffic by peer (using an XOR of 1570
1254 MAC addresses), so in general all traffic to a particular 1571 Additionally, the linux bonding 802.3ad implementation
1255 destination will use the same interface. Finally, the 802.3ad 1572 distributes traffic by peer (using an XOR of MAC addresses),
1256 mode mandates the use of the MII monitor, therefore, the ARP 1573 so in a "gatewayed" configuration, all outgoing traffic will
1257 monitor is not available in this mode. 1574 generally use the same device. Incoming traffic may also end
1258 1575 up on a single device, but that is dependent upon the
1259balance-tlb: This mode is also a good choice for this type of 1576 balancing policy of the peer's 8023.ad implementation. In a
1260 topology. It has no special switch configuration 1577 "local" configuration, traffic will be distributed across the
1261 requirements, and balances outgoing traffic by peer, in a 1578 devices in the bond.
1262 vaguely intelligent manner (not a simple XOR as in balance-xor 1579
1263 or 802.3ad mode), so that unlucky MAC addresses will not all 1580 Finally, the 802.3ad mode mandates the use of the MII monitor,
1264 "bunch up" on a single interface. Interfaces may be of 1581 therefore, the ARP monitor is not available in this mode.
1265 differing speeds. On the down side, in this mode all incoming 1582
1266 traffic arrives over a single interface, this mode requires 1583balance-tlb: The balance-tlb mode balances outgoing traffic by peer.
1267 certain ethtool support in the network device driver of the 1584 Since the balancing is done according to MAC address, in a
1268 slave interfaces, and the ARP monitor is not available. 1585 "gatewayed" configuration (as described above), this mode will
1269 1586 send all traffic across a single device. However, in a
1270balance-alb: This mode is everything that balance-tlb is, and more. It 1587 "local" network configuration, this mode balances multiple
1271 has all of the features (and restrictions) of balance-tlb, and 1588 local network peers across devices in a vaguely intelligent
1272 will also balance incoming traffic from peers (as described in 1589 manner (not a simple XOR as in balance-xor or 802.3ad mode),
1273 the Bonding Module Options section, above). The only extra 1590 so that mathematically unlucky MAC addresses (i.e., ones that
1274 down side to this mode is that the network device driver must 1591 XOR to the same value) will not all "bunch up" on a single
1275 support changing the hardware address while the device is 1592 interface.
1276 open. 1593
1277 1594 Unlike 802.3ad, interfaces may be of differing speeds, and no
127812.1.2 Link Monitoring for Single Switch Topology 1595 special switch configuration is required. On the down side,
1279------------------------------------------------- 1596 in this mode all incoming traffic arrives over a single
1597 interface, this mode requires certain ethtool support in the
1598 network device driver of the slave interfaces, and the ARP
1599 monitor is not available.
1600
1601balance-alb: This mode is everything that balance-tlb is, and more.
1602 It has all of the features (and restrictions) of balance-tlb,
1603 and will also balance incoming traffic from local network
1604 peers (as described in the Bonding Module Options section,
1605 above).
1606
1607 The only additional down side to this mode is that the network
1608 device driver must support changing the hardware address while
1609 the device is open.
1610
161113.1.2 MT Link Monitoring for Single Switch Topology
1612----------------------------------------------------
1280 1613
1281 The choice of link monitoring may largely depend upon which 1614 The choice of link monitoring may largely depend upon which
1282mode you choose to use. The more advanced load balancing modes do not 1615mode you choose to use. The more advanced load balancing modes do not
1283support the use of the ARP monitor, and are thus restricted to using 1616support the use of the ARP monitor, and are thus restricted to using
1284the MII monitor (which does not provide as high a level of assurance 1617the MII monitor (which does not provide as high a level of end to end
1285as the ARP monitor). 1618assurance as the ARP monitor).
1286 1619
1287 162013.2 Maximum Throughput in a Multiple Switch Topology
128812.2 High Availability in a Multiple Switch Topology 1621-----------------------------------------------------
1289---------------------------------------------------- 1622
1290 1623 Multiple switches may be utilized to optimize for throughput
1291 With multiple switches, the configuration of bonding and the 1624when they are configured in parallel as part of an isolated network
1292network changes dramatically. In multiple switch topologies, there is 1625between two or more systems, for example:
1293a tradeoff between network availability and usable bandwidth. 1626
1294 1627 +-----------+
1295 Below is a sample network, configured to maximize the 1628 | Host A |
1296availability of the network: 1629 +-+---+---+-+
1297 1630 | | |
1298 | | 1631 +--------+ | +---------+
1299 |port3 port3| 1632 | | |
1300 +-----+----+ +-----+----+ 1633 +------+---+ +-----+----+ +-----+----+
1301 | |port2 ISL port2| | 1634 | Switch A | | Switch B | | Switch C |
1302 | switch A +--------------------------+ switch B | 1635 +------+---+ +-----+----+ +-----+----+
1303 | | | | 1636 | | |
1304 +-----+----+ +-----++---+ 1637 +--------+ | +---------+
1305 |port1 port1| 1638 | | |
1306 | +-------+ | 1639 +-+---+---+-+
1307 +-------------+ host1 +---------------+ 1640 | Host B |
1308 eth0 +-------+ eth1 1641 +-----------+
1309 1642
1310 In this configuration, there is a link between the two 1643 In this configuration, the switches are isolated from one
1311switches (ISL, or inter switch link), and multiple ports connecting to 1644another. One reason to employ a topology such as this is for an
1312the outside world ("port3" on each switch). There is no technical 1645isolated network with many hosts (a cluster configured for high
1313reason that this could not be extended to a third switch. 1646performance, for example), using multiple smaller switches can be more
1314 1647cost effective than a single larger switch, e.g., on a network with 24
131512.2.1 Bonding Mode Selection for Multiple Switch Topology 1648hosts, three 24 port switches can be significantly less expensive than
1316---------------------------------------------------------- 1649a single 72 port switch.
1317 1650
1318 In a topology such as this, the active-backup and broadcast 1651 If access beyond the network is required, an individual host
1319modes are the only useful bonding modes; the other modes require all 1652can be equipped with an additional network device connected to an
1320links to terminate on the same peer for them to behave rationally. 1653external network; this host then additionally acts as a gateway.
1321 1654
1322active-backup: This is generally the preferred mode, particularly if 165513.2.1 MT Bonding Mode Selection for Multiple Switch Topology
1323 the switches have an ISL and play together well. If the
1324 network configuration is such that one switch is specifically
1325 a backup switch (e.g., has lower capacity, higher cost, etc),
1326 then the primary option can be used to insure that the
1327 preferred link is always used when it is available.
1328
1329broadcast: This mode is really a special purpose mode, and is suitable
1330 only for very specific needs. For example, if the two
1331 switches are not connected (no ISL), and the networks beyond
1332 them are totally independant. In this case, if it is
1333 necessary for some specific one-way traffic to reach both
1334 independent networks, then the broadcast mode may be suitable.
1335
133612.2.2 Link Monitoring Selection for Multiple Switch Topology
1337------------------------------------------------------------- 1656-------------------------------------------------------------
1338 1657
1339 The choice of link monitoring ultimately depends upon your 1658 In actual practice, the bonding mode typically employed in
1340switch. If the switch can reliably fail ports in response to other 1659configurations of this type is balance-rr. Historically, in this
1341failures, then either the MII or ARP monitors should work. For 1660network configuration, the usual caveats about out of order packet
1342example, in the above example, if the "port3" link fails at the remote 1661delivery are mitigated by the use of network adapters that do not do
1343end, the MII monitor has no direct means to detect this. The ARP 1662any kind of packet coalescing (via the use of NAPI, or because the
1344monitor could be configured with a target at the remote end of port3, 1663device itself does not generate interrupts until some number of
1345thus detecting that failure without switch support. 1664packets has arrived). When employed in this fashion, the balance-rr
1665mode allows individual connections between two hosts to effectively
1666utilize greater than one interface's bandwidth.
1346 1667
1347 In general, however, in a multiple switch topology, the ARP 166813.2.2 MT Link Monitoring for Multiple Switch Topology
1348monitor can provide a higher level of reliability in detecting link 1669------------------------------------------------------
1349failures. Additionally, it should be configured with multiple targets
1350(at least one for each switch in the network). This will insure that,
1351regardless of which switch is active, the ARP monitor has a suitable
1352target to query.
1353 1670
1671 Again, in actual practice, the MII monitor is most often used
1672in this configuration, as performance is given preference over
1673availability. The ARP monitor will function in this topology, but its
1674advantages over the MII monitor are mitigated by the volume of probes
1675needed as the number of systems involved grows (remember that each
1676host in the network is configured with bonding).
1354 1677
135512.3 Switch Behavior Issues for High Availability 167814. Switch Behavior Issues
1356------------------------------------------------- 1679==========================
1357 1680
1358 You may encounter issues with the timing of link up and down 168114.1 Link Establishment and Failover Delays
1359reporting by the switch. 1682-------------------------------------------
1683
1684 Some switches exhibit undesirable behavior with regard to the
1685timing of link up and down reporting by the switch.
1360 1686
1361 First, when a link comes up, some switches may indicate that 1687 First, when a link comes up, some switches may indicate that
1362the link is up (carrier available), but not pass traffic over the 1688the link is up (carrier available), but not pass traffic over the
@@ -1370,30 +1696,70 @@ relevant interface(s).
1370 Second, some switches may "bounce" the link state one or more 1696 Second, some switches may "bounce" the link state one or more
1371times while a link is changing state. This occurs most commonly while 1697times while a link is changing state. This occurs most commonly while
1372the switch is initializing. Again, an appropriate updelay value may 1698the switch is initializing. Again, an appropriate updelay value may
1373help, but note that if all links are down, then updelay is ignored 1699help.
1374when any link becomes active (the slave closest to completing its
1375updelay is chosen).
1376 1700
1377 Note that when a bonding interface has no active links, the 1701 Note that when a bonding interface has no active links, the
1378driver will immediately reuse the first link that goes up, even if 1702driver will immediately reuse the first link that goes up, even if the
1379updelay parameter was specified. If there are slave interfaces 1703updelay parameter has been specified (the updelay is ignored in this
1380waiting for the updelay timeout to expire, the interface that first 1704case). If there are slave interfaces waiting for the updelay timeout
1381went into that state will be immediately reused. This reduces down 1705to expire, the interface that first went into that state will be
1382time of the network if the value of updelay has been overestimated. 1706immediately reused. This reduces down time of the network if the
1707value of updelay has been overestimated, and since this occurs only in
1708cases with no connectivity, there is no additional penalty for
1709ignoring the updelay.
1383 1710
1384 In addition to the concerns about switch timings, if your 1711 In addition to the concerns about switch timings, if your
1385switches take a long time to go into backup mode, it may be desirable 1712switches take a long time to go into backup mode, it may be desirable
1386to not activate a backup interface immediately after a link goes down. 1713to not activate a backup interface immediately after a link goes down.
1387Failover may be delayed via the downdelay bonding module option. 1714Failover may be delayed via the downdelay bonding module option.
1388 1715
138913. Hardware Specific Considerations 171614.2 Duplicated Incoming Packets
1717--------------------------------
1718
1719 It is not uncommon to observe a short burst of duplicated
1720traffic when the bonding device is first used, or after it has been
1721idle for some period of time. This is most easily observed by issuing
1722a "ping" to some other host on the network, and noticing that the
1723output from ping flags duplicates (typically one per slave).
1724
1725 For example, on a bond in active-backup mode with five slaves
1726all connected to one switch, the output may appear as follows:
1727
1728# ping -n 10.0.4.2
1729PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data.
173064 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
173164 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
173264 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
173364 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
173464 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
173564 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms
173664 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms
173764 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms
1738
1739 This is not due to an error in the bonding driver, rather, it
1740is a side effect of how many switches update their MAC forwarding
1741tables. Initially, the switch does not associate the MAC address in
1742the packet with a particular switch port, and so it may send the
1743traffic to all ports until its MAC forwarding table is updated. Since
1744the interfaces attached to the bond may occupy multiple ports on a
1745single switch, when the switch (temporarily) floods the traffic to all
1746ports, the bond device receives multiple copies of the same packet
1747(one per slave device).
1748
1749 The duplicated packet behavior is switch dependent, some
1750switches exhibit this, and some do not. On switches that display this
1751behavior, it can be induced by clearing the MAC forwarding table (on
1752most Cisco switches, the privileged command "clear mac address-table
1753dynamic" will accomplish this).
1754
175515. Hardware Specific Considerations
1390==================================== 1756====================================
1391 1757
1392 This section contains additional information for configuring 1758 This section contains additional information for configuring
1393bonding on specific hardware platforms, or for interfacing bonding 1759bonding on specific hardware platforms, or for interfacing bonding
1394with particular switches or other devices. 1760with particular switches or other devices.
1395 1761
139613.1 IBM BladeCenter 176215.1 IBM BladeCenter
1397-------------------- 1763--------------------
1398 1764
1399 This applies to the JS20 and similar systems. 1765 This applies to the JS20 and similar systems.
@@ -1407,12 +1773,12 @@ JS20 network adapter information
1407-------------------------------- 1773--------------------------------
1408 1774
1409 All JS20s come with two Broadcom Gigabit Ethernet ports 1775 All JS20s come with two Broadcom Gigabit Ethernet ports
1410integrated on the planar. In the BladeCenter chassis, the eth0 port 1776integrated on the planar (that's "motherboard" in IBM-speak). In the
1411of all JS20 blades is hard wired to I/O Module #1; similarly, all eth1 1777BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to
1412ports are wired to I/O Module #2. An add-on Broadcom daughter card 1778I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2.
1413can be installed on a JS20 to provide two more Gigabit Ethernet ports. 1779An add-on Broadcom daughter card can be installed on a JS20 to provide
1414These ports, eth2 and eth3, are wired to I/O Modules 3 and 4, 1780two more Gigabit Ethernet ports. These ports, eth2 and eth3, are
1415respectively. 1781wired to I/O Modules 3 and 4, respectively.
1416 1782
1417 Each I/O Module may contain either a switch or a passthrough 1783 Each I/O Module may contain either a switch or a passthrough
1418module (which allows ports to be directly connected to an external 1784module (which allows ports to be directly connected to an external
@@ -1432,29 +1798,30 @@ BladeCenter networking configuration
1432of ways, this discussion will be confined to describing basic 1798of ways, this discussion will be confined to describing basic
1433configurations. 1799configurations.
1434 1800
1435 Normally, Ethernet Switch Modules (ESM) are used in I/O 1801 Normally, Ethernet Switch Modules (ESMs) are used in I/O
1436modules 1 and 2. In this configuration, the eth0 and eth1 ports of a 1802modules 1 and 2. In this configuration, the eth0 and eth1 ports of a
1437JS20 will be connected to different internal switches (in the 1803JS20 will be connected to different internal switches (in the
1438respective I/O modules). 1804respective I/O modules).
1439 1805
1440 An optical passthru module (OPM) connects the I/O module 1806 A passthrough module (OPM or CPM, optical or copper,
1441directly to an external switch. By using OPMs in I/O module #1 and 1807passthrough module) connects the I/O module directly to an external
1442#2, the eth0 and eth1 interfaces of a JS20 can be redirected to the 1808switch. By using PMs in I/O module #1 and #2, the eth0 and eth1
1443outside world and connected to a common external switch. 1809interfaces of a JS20 can be redirected to the outside world and
1444 1810connected to a common external switch.
1445 Depending upon the mix of ESM and OPM modules, the network 1811
1446will appear to bonding as either a single switch topology (all OPM 1812 Depending upon the mix of ESMs and PMs, the network will
1447modules) or as a multiple switch topology (one or more ESM modules, 1813appear to bonding as either a single switch topology (all PMs) or as a
1448zero or more OPM modules). It is also possible to connect ESM modules 1814multiple switch topology (one or more ESMs, zero or more PMs). It is
1449together, resulting in a configuration much like the example in "High 1815also possible to connect ESMs together, resulting in a configuration
1450Availability in a multiple switch topology." 1816much like the example in "High Availability in a Multiple Switch
1451 1817Topology," above.
1452Requirements for specifc modes 1818
1453------------------------------ 1819Requirements for specific modes
1454 1820-------------------------------
1455 The balance-rr mode requires the use of OPM modules for 1821
1456devices in the bond, all connected to an common external switch. That 1822 The balance-rr mode requires the use of passthrough modules
1457switch must be configured for "etherchannel" or "trunking" on the 1823for devices in the bond, all connected to an common external switch.
1824That switch must be configured for "etherchannel" or "trunking" on the
1458appropriate ports, as is usual for balance-rr. 1825appropriate ports, as is usual for balance-rr.
1459 1826
1460 The balance-alb and balance-tlb modes will function with 1827 The balance-alb and balance-tlb modes will function with
@@ -1484,17 +1851,18 @@ connected to the JS20 system.
1484Other concerns 1851Other concerns
1485-------------- 1852--------------
1486 1853
1487 The Serial Over LAN link is established over the primary 1854 The Serial Over LAN (SoL) link is established over the primary
1488ethernet (eth0) only, therefore, any loss of link to eth0 will result 1855ethernet (eth0) only, therefore, any loss of link to eth0 will result
1489in losing your SoL connection. It will not fail over with other 1856in losing your SoL connection. It will not fail over with other
1490network traffic. 1857network traffic, as the SoL system is beyond the control of the
1858bonding driver.
1491 1859
1492 It may be desirable to disable spanning tree on the switch 1860 It may be desirable to disable spanning tree on the switch
1493(either the internal Ethernet Switch Module, or an external switch) to 1861(either the internal Ethernet Switch Module, or an external switch) to
1494avoid fail-over delays issues when using bonding. 1862avoid fail-over delay issues when using bonding.
1495 1863
1496 1864
149714. Frequently Asked Questions 186516. Frequently Asked Questions
1498============================== 1866==============================
1499 1867
15001. Is it SMP safe? 18681. Is it SMP safe?
@@ -1505,8 +1873,8 @@ The new driver was designed to be SMP safe from the start.
15052. What type of cards will work with it? 18732. What type of cards will work with it?
1506 1874
1507 Any Ethernet type cards (you can even mix cards - a Intel 1875 Any Ethernet type cards (you can even mix cards - a Intel
1508EtherExpress PRO/100 and a 3com 3c905b, for example). They need not 1876EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes,
1509be of the same speed. 1877devices need not be of the same speed.
1510 1878
15113. How many bonding devices can I have? 18793. How many bonding devices can I have?
1512 1880
@@ -1524,11 +1892,12 @@ system.
1524disabled. The active-backup mode will fail over to a backup link, and 1892disabled. The active-backup mode will fail over to a backup link, and
1525other modes will ignore the failed link. The link will continue to be 1893other modes will ignore the failed link. The link will continue to be
1526monitored, and should it recover, it will rejoin the bond (in whatever 1894monitored, and should it recover, it will rejoin the bond (in whatever
1527manner is appropriate for the mode). See the section on High 1895manner is appropriate for the mode). See the sections on High
1528Availability for additional information. 1896Availability and the documentation for each mode for additional
1897information.
1529 1898
1530 Link monitoring can be enabled via either the miimon or 1899 Link monitoring can be enabled via either the miimon or
1531arp_interval paramters (described in the module paramters section, 1900arp_interval parameters (described in the module parameters section,
1532above). In general, miimon monitors the carrier state as sensed by 1901above). In general, miimon monitors the carrier state as sensed by
1533the underlying network device, and the arp monitor (arp_interval) 1902the underlying network device, and the arp monitor (arp_interval)
1534monitors connectivity to another host on the local network. 1903monitors connectivity to another host on the local network.
@@ -1536,7 +1905,7 @@ monitors connectivity to another host on the local network.
1536 If no link monitoring is configured, the bonding driver will 1905 If no link monitoring is configured, the bonding driver will
1537be unable to detect link failures, and will assume that all links are 1906be unable to detect link failures, and will assume that all links are
1538always available. This will likely result in lost packets, and a 1907always available. This will likely result in lost packets, and a
1539resulting degredation of performance. The precise performance loss 1908resulting degradation of performance. The precise performance loss
1540depends upon the bonding mode and network configuration. 1909depends upon the bonding mode and network configuration.
1541 1910
15426. Can bonding be used for High Availability? 19116. Can bonding be used for High Availability?
@@ -1550,12 +1919,12 @@ depends upon the bonding mode and network configuration.
1550 In the basic balance modes (balance-rr and balance-xor), it 1919 In the basic balance modes (balance-rr and balance-xor), it
1551works with any system that supports etherchannel (also called 1920works with any system that supports etherchannel (also called
1552trunking). Most managed switches currently available have such 1921trunking). Most managed switches currently available have such
1553support, and many unmananged switches as well. 1922support, and many unmanaged switches as well.
1554 1923
1555 The advanced balance modes (balance-tlb and balance-alb) do 1924 The advanced balance modes (balance-tlb and balance-alb) do
1556not have special switch requirements, but do need device drivers that 1925not have special switch requirements, but do need device drivers that
1557support specific features (described in the appropriate section under 1926support specific features (described in the appropriate section under
1558module paramters, above). 1927module parameters, above).
1559 1928
1560 In 802.3ad mode, it works with with systems that support IEEE 1929 In 802.3ad mode, it works with with systems that support IEEE
1561802.3ad Dynamic Link Aggregation. Most managed and many unmanaged 1930802.3ad Dynamic Link Aggregation. Most managed and many unmanaged
@@ -1565,17 +1934,19 @@ switches currently available support 802.3ad.
1565 1934
15668. Where does a bonding device get its MAC address from? 19358. Where does a bonding device get its MAC address from?
1567 1936
1568 If not explicitly configured with ifconfig, the MAC address of 1937 If not explicitly configured (with ifconfig or ip link), the
1569the bonding device is taken from its first slave device. This MAC 1938MAC address of the bonding device is taken from its first slave
1570address is then passed to all following slaves and remains persistent 1939device. This MAC address is then passed to all following slaves and
1571(even if the the first slave is removed) until the bonding device is 1940remains persistent (even if the the first slave is removed) until the
1572brought down or reconfigured. 1941bonding device is brought down or reconfigured.
1573 1942
1574 If you wish to change the MAC address, you can set it with 1943 If you wish to change the MAC address, you can set it with
1575ifconfig: 1944ifconfig or ip link:
1576 1945
1577# ifconfig bond0 hw ether 00:11:22:33:44:55 1946# ifconfig bond0 hw ether 00:11:22:33:44:55
1578 1947
1948# ip link set bond0 address 66:77:88:99:aa:bb
1949
1579 The MAC address can be also changed by bringing down/up the 1950 The MAC address can be also changed by bringing down/up the
1580device and then changing its slaves (or their order): 1951device and then changing its slaves (or their order):
1581 1952
@@ -1591,23 +1962,28 @@ from the bond (`ifenslave -d bond0 eth0'). The bonding driver will
1591then restore the MAC addresses that the slaves had before they were 1962then restore the MAC addresses that the slaves had before they were
1592enslaved. 1963enslaved.
1593 1964
159415. Resources and Links 196516. Resources and Links
1595======================= 1966=======================
1596 1967
1597The latest version of the bonding driver can be found in the latest 1968The latest version of the bonding driver can be found in the latest
1598version of the linux kernel, found on http://kernel.org 1969version of the linux kernel, found on http://kernel.org
1599 1970
1971The latest version of this document can be found in either the latest
1972kernel source (named Documentation/networking/bonding.txt), or on the
1973bonding sourceforge site:
1974
1975http://www.sourceforge.net/projects/bonding
1976
1600Discussions regarding the bonding driver take place primarily on the 1977Discussions regarding the bonding driver take place primarily on the
1601bonding-devel mailing list, hosted at sourceforge.net. If you have 1978bonding-devel mailing list, hosted at sourceforge.net. If you have
1602questions or problems, post them to the list. 1979questions or problems, post them to the list. The list address is:
1603 1980
1604bonding-devel@lists.sourceforge.net 1981bonding-devel@lists.sourceforge.net
1605 1982
1606https://lists.sourceforge.net/lists/listinfo/bonding-devel 1983 The administrative interface (to subscribe or unsubscribe) can
1607 1984be found at:
1608There is also a project site on sourceforge.
1609 1985
1610http://www.sourceforge.net/projects/bonding 1986https://lists.sourceforge.net/lists/listinfo/bonding-devel
1611 1987
1612Donald Becker's Ethernet Drivers and diag programs may be found at : 1988Donald Becker's Ethernet Drivers and diag programs may be found at :
1613 - http://www.scyld.com/network/ 1989 - http://www.scyld.com/network/