author | Jay Vosburgh <fubar@us.ibm.com> | 2007-11-13 23:25:48 -0500 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2008-01-28 18:03:48 -0500 |
commit | 9a6c686799346b6c95405c9e051f5023873504fa (patch) | |
tree | f1a8414921b8535a5fb4acdf8d57e69459c48289 | |
parent | 7a47dd7a2f178cc4e87d584b0469eef4b58b7aea (diff) |
[BONDING]: Documentation update
Update the bonding documentation: more discussion on
initialization and configuration, changes to discussion of packet
reordering in balance-rr, update some out of date information.
Based in part on input from Rick Jones <rick.jones2@hp.com>
and Andy Gospodarek <andy@greyhouse.net>.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-rw-r--r-- | Documentation/networking/bonding.txt | 204 |
1 files changed, 143 insertions, 61 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 6cc30e0d5795..a0cda062bc33 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -1,7 +1,7 @@ | |||
1 | 1 | ||
2 | Linux Ethernet Bonding Driver HOWTO | 2 | Linux Ethernet Bonding Driver HOWTO |
3 | 3 | ||
4 | Latest update: 24 April 2006 | 4 | Latest update: 12 November 2007 |
5 | 5 | ||
6 | Initial release : Thomas Davis <tadavis at lbl.gov> | 6 | Initial release : Thomas Davis <tadavis at lbl.gov> |
7 | Corrections, HA extensions : 2000/10/03-15 : | 7 | Corrections, HA extensions : 2000/10/03-15 : |
@@ -166,12 +166,17 @@ to use ifenslave. | |||
166 | 2. Bonding Driver Options | 166 | 2. Bonding Driver Options |
167 | ========================= | 167 | ========================= |
168 | 168 | ||
169 | Options for the bonding driver are supplied as parameters to | 169 | Options for the bonding driver are supplied as parameters to the |
170 | the bonding module at load time. They may be given as command line | 170 | bonding module at load time, or are specified via sysfs. |
171 | arguments to the insmod or modprobe command, but are usually specified | 171 | |
172 | in either the /etc/modules.conf or /etc/modprobe.conf configuration | 172 | Module options may be given as command line arguments to the |
173 | file, or in a distro-specific configuration file (some of which are | 173 | insmod or modprobe command, but are usually specified in either the |
174 | detailed in the next section). | 174 | /etc/modules.conf or /etc/modprobe.conf configuration file, or in a |
175 | distro-specific configuration file (some of which are detailed in the next | ||
176 | section). | ||
177 | |||
178 | Details on bonding support for sysfs are provided in the | ||
179 | "Configuring Bonding Manually via Sysfs" section, below. | ||
175 | 180 | ||
176 | The available bonding driver parameters are listed below. If a | 181 | The available bonding driver parameters are listed below. If a |
177 | parameter is not specified the default value is used. When initially | 182 | parameter is not specified the default value is used. When initially |
@@ -812,11 +817,13 @@ the system /etc/modules.conf or /etc/modprobe.conf configuration file. | |||
812 | 3.2 Configuration with Initscripts Support | 817 | 3.2 Configuration with Initscripts Support |
813 | ------------------------------------------ | 818 | ------------------------------------------ |
814 | 819 | ||
815 | This section applies to distros using a version of initscripts | 820 | This section applies to distros using a recent version of |
816 | with bonding support, for example, Red Hat Linux 9 or Red Hat | 821 | initscripts with bonding support, for example, Red Hat Enterprise Linux |
817 | Enterprise Linux version 3 or 4. On these systems, the network | 822 | version 3 or later, Fedora, etc. On these systems, the network |
818 | initialization scripts have some knowledge of bonding, and can be | 823 | initialization scripts have knowledge of bonding, and can be configured to |
819 | configured to control bonding devices. | 824 | control bonding devices. Note that older versions of the initscripts |
825 | package have lower levels of support for bonding; this will be noted where | ||
826 | applicable. | ||
820 | 827 | ||
821 | These distros will not automatically load the network adapter | 828 | These distros will not automatically load the network adapter |
822 | driver unless the ethX device is configured with an IP address. | 829 | driver unless the ethX device is configured with an IP address. |
@@ -864,11 +871,31 @@ USERCTL=no | |||
864 | Be sure to change the networking specific lines (IPADDR, | 871 | Be sure to change the networking specific lines (IPADDR, |
865 | NETMASK, NETWORK and BROADCAST) to match your network configuration. | 872 | NETMASK, NETWORK and BROADCAST) to match your network configuration. |
866 | 873 | ||
867 | Finally, it is necessary to edit /etc/modules.conf (or | 874 | For later versions of initscripts, such as that found with Fedora |
868 | /etc/modprobe.conf, depending upon your distro) to load the bonding | 875 | 7 and Red Hat Enterprise Linux version 5 (or later), it is possible, and, |
869 | module with your desired options when the bond0 interface is brought | 876 | indeed, preferable, to specify the bonding options in the ifcfg-bond0 |
870 | up. The following lines in /etc/modules.conf (or modprobe.conf) will | 877 | file, e.g. a line of the format: |
871 | load the bonding module, and select its options: | 878 | |
879 | BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=+192.168.1.254" | ||
880 | |||
881 | will configure the bond with the specified options. The options | ||
882 | specified in BONDING_OPTS are identical to the bonding module parameters | ||
883 | except for the arp_ip_target field. Each target should be included as a | ||
884 | separate option and should be preceded by a '+' to indicate it should be | ||
885 | added to the list of queried targets, e.g., | ||
886 | |||
887 | arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2 | ||
888 | |||
889 | is the proper syntax to specify multiple targets. When specifying | ||
890 | options via BONDING_OPTS, it is not necessary to edit /etc/modules.conf or | ||
891 | /etc/modprobe.conf. | ||
892 | |||
893 | For older versions of initscripts that do not support | ||
894 | BONDING_OPTS, it is necessary to edit /etc/modules.conf (or | ||
895 | /etc/modprobe.conf, depending upon your distro) to load the bonding module | ||
896 | with your desired options when the bond0 interface is brought up. The | ||
897 | following lines in /etc/modules.conf (or modprobe.conf) will load the | ||
898 | bonding module, and select its options: | ||
872 | 899 | ||
873 | alias bond0 bonding | 900 | alias bond0 bonding |
874 | options bond0 mode=balance-alb miimon=100 | 901 | options bond0 mode=balance-alb miimon=100 |
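The BONDING_OPTS syntax introduced in the hunk above (each ARP target given as its own arp_ip_target option, prefixed with '+') can be illustrated with a small sketch. The helper name bonding_opts and its parameters are hypothetical and are not part of initscripts or the bonding driver; the sketch only assembles the text of the line that would be placed in ifcfg-bond0.

```python
def bonding_opts(mode, arp_interval=None, arp_ip_targets=(), **extra):
    """Assemble a BONDING_OPTS line for an ifcfg-bondX file.

    Hypothetical helper for illustration only; it formats text and
    does not touch the system configuration.
    """
    parts = [f"mode={mode}"]
    if arp_interval is not None:
        parts.append(f"arp_interval={arp_interval}")
    # Each ARP target is a separate option, and the leading '+' asks for
    # it to be added to the list of queried targets.
    parts.extend(f"arp_ip_target=+{target}" for target in arp_ip_targets)
    # Any other module parameter (e.g. miimon) is passed through as name=value.
    parts.extend(f"{name}={value}" for name, value in extra.items())
    return 'BONDING_OPTS="' + " ".join(parts) + '"'


if __name__ == "__main__":
    print(bonding_opts("active-backup", arp_interval=60,
                       arp_ip_targets=["192.168.1.1", "192.168.1.2"]))
```

Passing two targets yields two separate arp_ip_target=+... entries in the generated line, matching the multiple-target form shown in the documentation text.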
@@ -883,9 +910,10 @@ up and running. | |||
883 | 3.2.1 Using DHCP with Initscripts | 910 | 3.2.1 Using DHCP with Initscripts |
884 | --------------------------------- | 911 | --------------------------------- |
885 | 912 | ||
886 | Recent versions of initscripts (the version supplied with | 913 | Recent versions of initscripts (the versions supplied with Fedora |
887 | Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do | 914 | Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to |
888 | have support for assigning IP information to bonding devices via DHCP. | 915 | work) have support for assigning IP information to bonding devices via |
916 | DHCP. | ||
889 | 917 | ||
890 | To configure bonding for DHCP, configure it as described | 918 | To configure bonding for DHCP, configure it as described |
891 | above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" | 919 | above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" |
@@ -895,18 +923,14 @@ is case sensitive. | |||
895 | 3.2.2 Configuring Multiple Bonds with Initscripts | 923 | 3.2.2 Configuring Multiple Bonds with Initscripts |
896 | ------------------------------------------------- | 924 | ------------------------------------------------- |
897 | 925 | ||
898 | At this writing, the initscripts package does not directly | 926 | Initscripts packages that are included with Fedora 7 and Red Hat |
899 | support loading the bonding driver multiple times, so the process for | 927 | Enterprise Linux 5 support multiple bonding interfaces by simply |
900 | doing so is the same as described in the "Configuring Multiple Bonds | 928 | specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the |
901 | Manually" section, below. | 929 | number of the bond. This support requires sysfs support in the kernel, |
902 | 930 | and a bonding driver of version 3.0.0 or later. Other configurations may | |
903 | NOTE: It has been observed that some Red Hat supplied kernels | 931 | not support this method for specifying multiple bonding interfaces; for |
904 | are apparently unable to rename modules at load time (the "-o bond1" | 932 | those instances, see the "Configuring Multiple Bonds Manually" section, |
905 | part). Attempts to pass that option to modprobe will produce an | 933 | below. |
906 | "Operation not permitted" error. This has been reported on some | ||
907 | Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels | ||
908 | exhibiting this problem, it will be impossible to configure multiple | ||
909 | bonds with differing parameters. | ||
910 | 934 | ||
911 | 3.3 Configuring Bonding Manually with Ifenslave | 935 | 3.3 Configuring Bonding Manually with Ifenslave |
912 | ----------------------------------------------- | 936 | ----------------------------------------------- |
@@ -977,15 +1001,58 @@ initialization scripts lack support for configuring multiple bonds. | |||
977 | options, you may wish to use the "max_bonds" module parameter, | 1001 | options, you may wish to use the "max_bonds" module parameter, |
978 | documented above. | 1002 | documented above. |
979 | 1003 | ||
980 | To create multiple bonding devices with differing options, it | 1004 | To create multiple bonding devices with differing options, it is |
981 | is necessary to use bonding parameters exported by sysfs, documented | 1005 | preferable to use bonding parameters exported by sysfs, documented in the |
982 | in the section below. | 1006 | section below. |
1007 | |||
1008 | For versions of bonding without sysfs support, the only means to | ||
1009 | provide multiple instances of bonding with differing options is to load | ||
1010 | the bonding driver multiple times. Note that current versions of the | ||
1011 | sysconfig network initialization scripts handle this automatically; if | ||
1012 | your distro uses these scripts, no special action is needed. See the | ||
1013 | section Configuring Bonding Devices, above, if you're not sure about your | ||
1014 | network initialization scripts. | ||
1015 | |||
1016 | To load multiple instances of the module, it is necessary to | ||
1017 | specify a different name for each instance (the module loading system | ||
1018 | requires that every loaded module, even multiple instances of the same | ||
1019 | module, have a unique name). This is accomplished by supplying multiple | ||
1020 | sets of bonding options in /etc/modprobe.conf, for example: | ||
1021 | |||
1022 | alias bond0 bonding | ||
1023 | options bond0 -o bond0 mode=balance-rr miimon=100 | ||
1024 | |||
1025 | alias bond1 bonding | ||
1026 | options bond1 -o bond1 mode=balance-alb miimon=50 | ||
1027 | |||
1028 | will load the bonding module two times. The first instance is | ||
1029 | named "bond0" and creates the bond0 device in balance-rr mode with an | ||
1030 | miimon of 100. The second instance is named "bond1" and creates the | ||
1031 | bond1 device in balance-alb mode with an miimon of 50. | ||
1032 | |||
1033 | In some circumstances (typically with older distributions), | ||
1034 | the above does not work, and the second bonding instance never sees | ||
1035 | its options. In that case, the second options line can be substituted | ||
1036 | as follows: | ||
1037 | |||
1038 | install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ | ||
1039 | mode=balance-alb miimon=50 | ||
983 | 1040 | ||
1041 | This may be repeated any number of times, specifying a new and | ||
1042 | unique name in place of bond1 for each subsequent instance. | ||
1043 | |||
1044 | It has been observed that some Red Hat supplied kernels are unable | ||
1045 | to rename modules at load time (the "-o bond1" part). Attempts to pass | ||
1046 | that option to modprobe will produce an "Operation not permitted" error. | ||
1047 | This has been reported on some Fedora Core kernels, and has been seen on | ||
1048 | RHEL 4 as well. On kernels exhibiting this problem, it will be impossible | ||
1049 | to configure multiple bonds with differing parameters (as they are older | ||
1050 | kernels, and also lack sysfs support). | ||
984 | 1051 | ||
985 | 3.4 Configuring Bonding Manually via Sysfs | 1052 | 3.4 Configuring Bonding Manually via Sysfs |
986 | ------------------------------------------ | 1053 | ------------------------------------------ |
987 | 1054 | ||
988 | Starting with version 3.0, Channel Bonding may be configured | 1055 | Starting with version 3.0.0, Channel Bonding may be configured |
989 | via the sysfs interface. This interface allows dynamic configuration | 1056 | via the sysfs interface. This interface allows dynamic configuration |
990 | of all bonds in the system without unloading the module. It also | 1057 | of all bonds in the system without unloading the module. It also |
991 | allows for adding and removing bonds at runtime. Ifenslave is no | 1058 | allows for adding and removing bonds at runtime. Ifenslave is no |
@@ -1030,9 +1097,6 @@ To enslave interface eth0 to bond bond0: | |||
1030 | To free slave eth0 from bond bond0: | 1097 | To free slave eth0 from bond bond0: |
1031 | # echo -eth0 > /sys/class/net/bond0/bonding/slaves | 1098 | # echo -eth0 > /sys/class/net/bond0/bonding/slaves |
1032 | 1099 | ||
1033 | NOTE: The bond must be up before slaves can be added. All | ||
1034 | slaves are freed when the interface is brought down. | ||
1035 | |||
1036 | When an interface is enslaved to a bond, symlinks between the | 1100 | When an interface is enslaved to a bond, symlinks between the |
1037 | two are created in the sysfs filesystem. In this case, you would get | 1101 | two are created in the sysfs filesystem. In this case, you would get |
1038 | /sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and | 1102 | /sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and |
@@ -1622,6 +1686,15 @@ one for each switch in the network). This will insure that, | |||
1622 | regardless of which switch is active, the ARP monitor has a suitable | 1686 | regardless of which switch is active, the ARP monitor has a suitable |
1623 | target to query. | 1687 | target to query. |
1624 | 1688 | ||
1689 | Note, also, that of late many switches now support a functionality | ||
1690 | generally referred to as "trunk failover." This is a feature of the | ||
1691 | switch that causes the link state of a particular switch port to be set | ||
1692 | down (or up) when the state of another switch port goes down (or up). | ||
1693 | Its purpose is to propagate link failures from logically "exterior" ports | ||
1694 | to the logically "interior" ports that bonding is able to monitor via | ||
1695 | miimon. Availability and configuration for trunk failover varies by | ||
1696 | switch, but this can be a viable alternative to the ARP monitor when using | ||
1697 | suitable switches. | ||
1625 | 1698 | ||
1626 | 12. Configuring Bonding for Maximum Throughput | 1699 | 12. Configuring Bonding for Maximum Throughput |
1627 | ============================================== | 1700 | ============================================== |
@@ -1709,7 +1782,7 @@ balance-rr: This mode is the only mode that will permit a single | |||
1709 | interfaces. It is therefore the only mode that will allow a | 1782 | interfaces. It is therefore the only mode that will allow a |
1710 | single TCP/IP stream to utilize more than one interface's | 1783 | single TCP/IP stream to utilize more than one interface's |
1711 | worth of throughput. This comes at a cost, however: the | 1784 | worth of throughput. This comes at a cost, however: the |
1712 | striping often results in peer systems receiving packets out | 1785 | striping generally results in peer systems receiving packets out |
1713 | of order, causing TCP/IP's congestion control system to kick | 1786 | of order, causing TCP/IP's congestion control system to kick |
1714 | in, often by retransmitting segments. | 1787 | in, often by retransmitting segments. |
1715 | 1788 | ||
@@ -1721,22 +1794,20 @@ balance-rr: This mode is the only mode that will permit a single | |||
1721 | interface's worth of throughput, even after adjusting | 1794 | interface's worth of throughput, even after adjusting |
1722 | tcp_reordering. | 1795 | tcp_reordering. |
1723 | 1796 | ||
1724 | Note that this out of order delivery occurs when both the | 1797 | Note that the fraction of packets that will be delivered out of |
1725 | sending and receiving systems are utilizing a multiple | 1798 | order is highly variable, and is unlikely to be zero. The level |
1726 | interface bond. Consider a configuration in which a | 1799 | of reordering depends upon a variety of factors, including the |
1727 | balance-rr bond feeds into a single higher capacity network | 1800 | networking interfaces, the switch, and the topology of the |
1728 | channel (e.g., multiple 100Mb/sec ethernets feeding a single | 1801 | configuration. Speaking in general terms, higher speed network |
1729 | gigabit ethernet via an etherchannel capable switch). In this | 1802 | cards produce more reordering (due to factors such as packet |
1730 | configuration, traffic sent from the multiple 100Mb devices to | 1803 | coalescing), and a "many to many" topology will reorder at a |
1731 | a destination connected to the gigabit device will not see | 1804 | higher rate than a "many slow to one fast" configuration. |
1732 | packets out of order. However, traffic sent from the gigabit | 1805 | |
1733 | device to the multiple 100Mb devices may or may not see | 1806 | Many switches do not support any modes that stripe traffic |
1734 | traffic out of order, depending upon the balance policy of the | 1807 | (instead choosing a port based upon IP or MAC level addresses); |
1735 | switch. Many switches do not support any modes that stripe | 1808 | for those devices, traffic for a particular connection flowing |
1736 | traffic (instead choosing a port based upon IP or MAC level | 1809 | through the switch to a balance-rr bond will not utilize greater |
1737 | addresses); for those devices, traffic flowing from the | 1810 | than one interface's worth of bandwidth. |
1738 | gigabit device to the many 100Mb devices will only utilize one | ||
1739 | interface. | ||
1740 | 1811 | ||
1741 | If you are utilizing protocols other than TCP/IP, UDP for | 1812 | If you are utilizing protocols other than TCP/IP, UDP for |
1742 | example, and your application can tolerate out of order | 1813 | example, and your application can tolerate out of order |
@@ -1936,6 +2007,10 @@ Failover may be delayed via the downdelay bonding module option. | |||
1936 | 13.2 Duplicated Incoming Packets | 2007 | 13.2 Duplicated Incoming Packets |
1937 | -------------------------------- | 2008 | -------------------------------- |
1938 | 2009 | ||
2010 | NOTE: Starting with version 3.0.2, the bonding driver has logic to | ||
2011 | suppress duplicate packets, which should largely eliminate this problem. | ||
2012 | The following description is kept for reference. | ||
2013 | |||
1939 | It is not uncommon to observe a short burst of duplicated | 2014 | It is not uncommon to observe a short burst of duplicated |
1940 | traffic when the bonding device is first used, or after it has been | 2015 | traffic when the bonding device is first used, or after it has been |
1941 | idle for some period of time. This is most easily observed by issuing | 2016 | idle for some period of time. This is most easily observed by issuing |
@@ -2096,6 +2171,9 @@ The new driver was designed to be SMP safe from the start. | |||
2096 | EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, | 2171 | EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, |
2097 | devices need not be of the same speed. | 2172 | devices need not be of the same speed. |
2098 | 2173 | ||
2174 | Starting with version 3.2.1, bonding also supports Infiniband | ||
2175 | slaves in active-backup mode. | ||
2176 | |||
2099 | 3. How many bonding devices can I have? | 2177 | 3. How many bonding devices can I have? |
2100 | 2178 | ||
2101 | There is no limit. | 2179 | There is no limit. |
@@ -2154,11 +2232,15 @@ switches currently available support 802.3ad. | |||
2154 | 2232 | ||
2155 | 8. Where does a bonding device get its MAC address from? | 2233 | 8. Where does a bonding device get its MAC address from? |
2156 | 2234 | ||
2157 | If not explicitly configured (with ifconfig or ip link), the | 2235 | When using slave devices that have fixed MAC addresses, or when |
2158 | MAC address of the bonding device is taken from its first slave | 2236 | the fail_over_mac option is enabled, the bonding device's MAC address is |
2159 | device. This MAC address is then passed to all following slaves and | 2237 | the MAC address of the active slave. |
2160 | remains persistent (even if the first slave is removed) until the | 2238 | |
2161 | bonding device is brought down or reconfigured. | 2239 | For other configurations, if not explicitly configured (with |
2240 | ifconfig or ip link), the MAC address of the bonding device is taken from | ||
2241 | its first slave device. This MAC address is then passed to all following | ||
2242 | slaves and remains persistent (even if the first slave is removed) until | ||
2243 | the bonding device is brought down or reconfigured. | ||
2162 | 2244 | ||
2163 | If you wish to change the MAC address, you can set it with | 2245 | If you wish to change the MAC address, you can set it with |
2164 | ifconfig or ip link: | 2246 | ifconfig or ip link: |