author     Trond Myklebust <Trond.Myklebust@netapp.com>  2006-06-20 08:59:45 -0400
committer  Trond Myklebust <Trond.Myklebust@netapp.com>  2006-06-20 08:59:45 -0400
commit     d59bf96cdde5b874a57bfd1425faa45da915d0b7
tree       351a40b72514d620e5bebea2de38c26f23277ffc /Documentation
parent     28df955a2ad484d602314b30183ea8496a9aa34a
parent     25f42b6af09e34c3f92107b36b5aa6edc2fdba2f
Merge branch 'master' of /home/trondmy/kernel/linux-2.6/
Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/feature-removal-schedule.txt  |   9
-rw-r--r--  Documentation/infiniband/ipoib.txt          |  12
-rw-r--r--  Documentation/kernel-parameters.txt         |   9
-rw-r--r--  Documentation/memory-barriers.txt           | 348
-rw-r--r--  Documentation/networking/README.ipw2200     |  10
-rw-r--r--  Documentation/networking/bonding.txt        | 323
-rw-r--r--  Documentation/networking/ip-sysctl.txt      |   7
-rw-r--r--  Documentation/networking/netdevices.txt     |   8
8 files changed, 532 insertions, 194 deletions
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 43ab119963d5..f50cf8fac3f0 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -212,15 +212,6 @@ Who: Greg Kroah-Hartman <gregkh@suse.de>
 
 ---------------------------
 
-What: Support for NEC DDB5074 and DDB5476 evaluation boards.
-When: June 2006
-Why: Board specific code doesn't build anymore since ~2.6.0 and no
-     users have complained indicating there is no more need for these
-     boards. This should really be considered a last call.
-Who: Ralf Baechle <ralf@linux-mips.org>
-
----------------------------
-
 What: USB driver API moves to EXPORT_SYMBOL_GPL
 When: Febuary 2008
 Files: include/linux/usb.h, drivers/usb/core/driver.c
diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
index 5c5a4ccce76a..187035560d7f 100644
--- a/Documentation/infiniband/ipoib.txt
+++ b/Documentation/infiniband/ipoib.txt
@@ -1,10 +1,10 @@
 IP OVER INFINIBAND
 
   The ib_ipoib driver is an implementation of the IP over InfiniBand
-  protocol as specified by the latest Internet-Drafts issued by the
-  IETF ipoib working group. It is a "native" implementation in the
-  sense of setting the interface type to ARPHRD_INFINIBAND and the
-  hardware address length to 20 (earlier proprietary implementations
+  protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib
+  working group. It is a "native" implementation in the sense of
+  setting the interface type to ARPHRD_INFINIBAND and the hardware
+  address length to 20 (earlier proprietary implementations
   masqueraded to the kernel as ethernet interfaces).
 
 Partitions and P_Keys
@@ -53,3 +53,7 @@ References
 
   IETF IP over InfiniBand (ipoib) Working Group
     http://ietf.org/html.charters/ipoib-charter.html
+  Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
+    http://ietf.org/rfc/rfc4391.txt
+  IP over InfiniBand (IPoIB) Architecture (RFC 4392)
+    http://ietf.org/rfc/rfc4392.txt
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b3a6187e5305..a9d3a1794b23 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1402,6 +1402,15 @@ running once the system is up.
                         If enabled at boot time, /selinux/disable can be used
                         later to disable prior to initial policy load.
 
+        selinux_compat_net =
+                        [SELINUX] Set initial selinux_compat_net flag value.
+                        Format: { "0" | "1" }
+                        0 -- use new secmark-based packet controls
+                        1 -- use legacy packet controls
+                        Default value is 0 (preferred).
+                        Value can be changed at runtime via
+                        /selinux/compat_net.
+
         serialnumber    [BUGS=IA-32]
 
         sg_def_reserved_size=   [SCSI]
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index c61d8b876fdb..4710845dbac4 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -19,6 +19,7 @@ Contents:
      - Control dependencies.
      - SMP barrier pairing.
      - Examples of memory barrier sequences.
+     - Read memory barriers vs load speculation.
 
  (*) Explicit kernel barriers.
 
@@ -248,7 +249,7 @@ And there are a number of things that _must_ or _must_not_ be assumed:
      we may get either of:
 
         STORE *A = X; Y = LOAD *A;
-        STORE *A = Y;
+        STORE *A = Y = X;
 
 
 =========================
@@ -344,9 +345,12 @@ Memory barriers come in four basic varieties:
 
  (4) General memory barriers.
 
-     A general memory barrier is a combination of both a read memory barrier
-     and a write memory barrier. It is a partial ordering over both loads and
-     stores.
+     A general memory barrier gives a guarantee that all the LOAD and STORE
+     operations specified before the barrier will appear to happen before all
+     the LOAD and STORE operations specified after the barrier with respect to
+     the other components of the system.
+
+     A general memory barrier is a partial ordering over both loads and stores.
 
      General memory barriers imply both read and write memory barriers, and so
      can substitute for either.
@@ -546,9 +550,9 @@ write barrier, though, again, a general barrier is viable:
        ===============         ===============
        a = 1;
        <write barrier>
-       b = 2;                  x = a;
+       b = 2;                  x = b;
                                <read barrier>
-                               y = b;
+                               y = a;
 
 Or:
 
@@ -563,6 +567,18 @@ Or:
 Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.
 
+[!] Note that the stores before the write barrier would normally be expected to
+match the loads after the read barrier or data dependency barrier, and vice
+versa:
+
+       CPU 1                           CPU 2
+       ===============                 ===============
+       a = 1;           }----   --->{  v = c
+       b = 2;           }    \ /    {  w = d
+       <write barrier>        \        <read barrier>
+       c = 3;           }    / \    {  x = a;
+       d = 4;           }----   --->{  y = b;
+
 
 EXAMPLES OF MEMORY BARRIER SEQUENCES
 ------------------------------------
@@ -600,8 +616,8 @@ STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E
600 | | +------+ 616 | | +------+
601 +-------+ : : 617 +-------+ : :
602 | 618 |
603 | Sequence in which stores committed to memory system 619 | Sequence in which stores are committed to the
604 | by CPU 1 620 | memory system by CPU 1
605 V 621 V
606 622
607 623
@@ -683,14 +699,12 @@ then the following will occur:
683 | : : | | 699 | : : | |
684 | : : | CPU 2 | 700 | : : | CPU 2 |
685 | +-------+ | | 701 | +-------+ | |
686 \ | X->9 |------>| | 702 | | X->9 |------>| |
687 \ +-------+ | | 703 | +-------+ | |
688 ----->| B->2 | | | 704 Makes sure all effects ---> \ ddddddddddddddddd | |
689 +-------+ | | 705 prior to the store of C \ +-------+ | |
690 Makes sure all effects ---> ddddddddddddddddd | | 706 are perceptible to ----->| B->2 |------>| |
691 prior to the store of C +-------+ | | 707 subsequent loads +-------+ | |
692 are perceptible to | B->2 |------>| |
693 successive loads +-------+ | |
694 : : +-------+ 708 : : +-------+
695 709
696 710
@@ -699,73 +713,239 @@ following sequence of events:
699 713
700 CPU 1 CPU 2 714 CPU 1 CPU 2
701 ======================= ======================= 715 ======================= =======================
716 { A = 0, B = 9 }
702 STORE A=1 717 STORE A=1
703 STORE B=2
704 STORE C=3
705 <write barrier> 718 <write barrier>
706 STORE D=4 719 STORE B=2
707 STORE E=5
708 LOAD A
709 LOAD B 720 LOAD B
710 LOAD C 721 LOAD A
711 LOAD D
712 LOAD E
713 722
714Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in 723Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in
715some effectively random order, despite the write barrier issued by CPU 1: 724some effectively random order, despite the write barrier issued by CPU 1:
716 725
717 +-------+ : : 726 +-------+ : : : :
718 | | +------+ 727 | | +------+ +-------+
719 | |------>| C=3 | } 728 | |------>| A=1 |------ --->| A->0 |
720 | | : +------+ } 729 | | +------+ \ +-------+
721 | | : | A=1 | } 730 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
722 | | : +------+ } 731 | | +------+ | +-------+
723 | CPU 1 | : | B=2 | }--- 732 | |------>| B=2 |--- | : :
724 | | +------+ } \ 733 | | +------+ \ | : : +-------+
725 | | wwwwwwwwwwwww} \ 734 +-------+ : : \ | +-------+ | |
726 | | +------+ } \ : : +-------+ 735 ---------->| B->2 |------>| |
727 | | : | E=5 | } \ +-------+ | | 736 | +-------+ | CPU 2 |
728 | | : +------+ } \ { | C->3 |------>| | 737 | | A->0 |------>| |
729 | |------>| D=4 | } \ { +-------+ : | | 738 | +-------+ | |
730 | | +------+ \ { | E->5 | : | | 739 | : : +-------+
731 +-------+ : : \ { +-------+ : | | 740 \ : :
732 Transfer -->{ | A->1 | : | CPU 2 | 741 \ +-------+
733 from CPU 1 { +-------+ : | | 742 ---->| A->1 |
734 to CPU 2 { | D->4 | : | | 743 +-------+
735 { +-------+ : | | 744 : :
736 { | B->2 |------>| |
737 +-------+ | |
738 : : +-------+
739
740
741If, however, a read barrier were to be placed between the load of C and the
742load of D on CPU 2, then the partial ordering imposed by CPU 1 will be
743perceived correctly by CPU 2.
744 745
745 +-------+ : : 746
746 | | +------+ 747If, however, a read barrier were to be placed between the load of E and the
747 | |------>| C=3 | } 748load of A on CPU 2:
748 | | : +------+ } 749
749 | | : | A=1 | }--- 750 CPU 1 CPU 2
750 | | : +------+ } \ 751 ======================= =======================
751 | CPU 1 | : | B=2 | } \ 752 { A = 0, B = 9 }
752 | | +------+ \ 753 STORE A=1
753 | | wwwwwwwwwwwwwwww \ 754 <write barrier>
754 | | +------+ \ : : +-------+ 755 STORE B=2
755 | | : | E=5 | } \ +-------+ | | 756 LOAD B
756 | | : +------+ }--- \ { | C->3 |------>| | 757 <read barrier>
757 | |------>| D=4 | } \ \ { +-------+ : | | 758 LOAD A
758 | | +------+ \ -->{ | B->2 | : | | 759
759 +-------+ : : \ { +-------+ : | | 760then the partial ordering imposed by CPU 1 will be perceived correctly by CPU
760 \ { | A->1 | : | CPU 2 | 7612:
761 \ +-------+ | | 762
762 At this point the read ----> \ rrrrrrrrrrrrrrrrr | | 763 +-------+ : : : :
763 barrier causes all effects \ +-------+ | | 764 | | +------+ +-------+
764 prior to the storage of C \ { | E->5 | : | | 765 | |------>| A=1 |------ --->| A->0 |
765 to be perceptible to CPU 2 -->{ +-------+ : | | 766 | | +------+ \ +-------+
766 { | D->4 |------>| | 767 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
767 +-------+ | | 768 | | +------+ | +-------+
768 : : +-------+ 769 | |------>| B=2 |--- | : :
770 | | +------+ \ | : : +-------+
771 +-------+ : : \ | +-------+ | |
772 ---------->| B->2 |------>| |
773 | +-------+ | CPU 2 |
774 | : : | |
775 | : : | |
776 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
777 barrier causes all effects \ +-------+ | |
778 prior to the storage of B ---->| A->1 |------>| |
779 to be perceptible to CPU 2 +-------+ | |
780 : : +-------+
781
782
783To illustrate this more completely, consider what could happen if the code
784contained a load of A either side of the read barrier:
785
786 CPU 1 CPU 2
787 ======================= =======================
788 { A = 0, B = 9 }
789 STORE A=1
790 <write barrier>
791 STORE B=2
792 LOAD B
793 LOAD A [first load of A]
794 <read barrier>
795 LOAD A [second load of A]
796
797Even though the two loads of A both occur after the load of B, they may both
798come up with different values:
799
800 +-------+ : : : :
801 | | +------+ +-------+
802 | |------>| A=1 |------ --->| A->0 |
803 | | +------+ \ +-------+
804 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
805 | | +------+ | +-------+
806 | |------>| B=2 |--- | : :
807 | | +------+ \ | : : +-------+
808 +-------+ : : \ | +-------+ | |
809 ---------->| B->2 |------>| |
810 | +-------+ | CPU 2 |
811 | : : | |
812 | : : | |
813 | +-------+ | |
814 | | A->0 |------>| 1st |
815 | +-------+ | |
816 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
817 barrier causes all effects \ +-------+ | |
818 prior to the storage of B ---->| A->1 |------>| 2nd |
819 to be perceptible to CPU 2 +-------+ | |
820 : : +-------+
821
822
823But it may be that the update to A from CPU 1 becomes perceptible to CPU 2
824before the read barrier completes anyway:
825
826 +-------+ : : : :
827 | | +------+ +-------+
828 | |------>| A=1 |------ --->| A->0 |
829 | | +------+ \ +-------+
830 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
831 | | +------+ | +-------+
832 | |------>| B=2 |--- | : :
833 | | +------+ \ | : : +-------+
834 +-------+ : : \ | +-------+ | |
835 ---------->| B->2 |------>| |
836 | +-------+ | CPU 2 |
837 | : : | |
838 \ : : | |
839 \ +-------+ | |
840 ---->| A->1 |------>| 1st |
841 +-------+ | |
842 rrrrrrrrrrrrrrrrr | |
843 +-------+ | |
844 | A->1 |------>| 2nd |
845 +-------+ | |
846 : : +-------+
847
848
849The guarantee is that the second load will always come up with A == 1 if the
850load of B came up with B == 2. No such guarantee exists for the first load of
851A; that may come up with either A == 0 or A == 1.
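
As a rough user-space illustration of the guarantee stated above, the sketch
below replays the same sequence with C11 atomics and POSIX threads. It is not
kernel code. The release and acquire fences are assumed stand-ins for the
write and read barriers (for this pattern they are at least as strong), and
the names cpu1, cpu2, struct observed and run_once are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared locations; run_once() resets them to { A = 0, B = 9 }. */
static atomic_int A;
static atomic_int B;

/* Values seen by the reader thread. */
struct observed {
    int b;         /* LOAD B */
    int a_first;   /* LOAD A before the read barrier */
    int a_second;  /* LOAD A after the read barrier */
};

/* Plays the role of CPU 1: STORE A=1, <write barrier>, STORE B=2. */
static void *cpu1(void *unused)
{
    (void)unused;
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* assumed write barrier */
    atomic_store_explicit(&B, 2, memory_order_relaxed);
    return NULL;
}

/* Plays the role of CPU 2: LOAD B, LOAD A, <read barrier>, LOAD A. */
static void *cpu2(void *arg)
{
    struct observed *o = arg;

    o->b = atomic_load_explicit(&B, memory_order_relaxed);
    o->a_first = atomic_load_explicit(&A, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);  /* assumed read barrier */
    o->a_second = atomic_load_explicit(&A, memory_order_relaxed);
    return NULL;
}

/*
 * Runs one round.  Returns 1 if the guarantee held (B == 2 implies the
 * second load saw A == 1), 0 if it was violated, -1 if a thread could
 * not be created.  a_first is deliberately not checked: it may be 0 or 1.
 */
static int run_once(struct observed *o)
{
    pthread_t t1, t2;

    atomic_store(&A, 0);
    atomic_store(&B, 9);
    if (pthread_create(&t1, NULL, cpu1, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, cpu2, o) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return (o->b != 2) || (o->a_second == 1);
}
```

Calling run_once() in a loop from a test harness should never return 0,
whereas a_first can legitimately be 0 in a round where b is 2. Whether that
case actually shows up depends on the hardware and on thread timing.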
852
853
854READ MEMORY BARRIERS VS LOAD SPECULATION
855----------------------------------------
856
857Many CPUs speculate with loads: that is they see that they will need to load an
858item from memory, and they find a time where they're not using the bus for any
859other loads, and so do the load in advance - even though they haven't actually
860got to that point in the instruction execution flow yet. This permits the
861actual load instruction to potentially complete immediately because the CPU
862already has the value to hand.
863
864It may turn out that the CPU didn't actually need the value - perhaps because a
865branch circumvented the load - in which case it can discard the value or just
866cache it for later use.
867
868Consider:
869
870 CPU 1 CPU 2
871 ======================= =======================
872 LOAD B
873 DIVIDE } Divide instructions generally
874 DIVIDE } take a long time to perform
875 LOAD A
876
877Which might appear as this:
878
879 : : +-------+
880 +-------+ | |
881 --->| B->2 |------>| |
882 +-------+ | CPU 2 |
883 : :DIVIDE | |
884 +-------+ | |
885 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
886 division speculates on the +-------+ ~ | |
887 LOAD of A : : ~ | |
888 : :DIVIDE | |
889 : : ~ | |
890 Once the divisions are complete --> : : ~-->| |
891 the CPU can then perform the : : | |
892 LOAD with immediate effect : : +-------+
893
894
895Placing a read barrier or a data dependency barrier just before the second
896load:
897
898 CPU 1 CPU 2
899 ======================= =======================
900 LOAD B
901 DIVIDE
902 DIVIDE
903 <read barrier>
904 LOAD A
905
906will force any value speculatively obtained to be reconsidered to an extent
907dependent on the type of barrier used. If there was no change made to the
908speculated memory location, then the speculated value will just be used:
909
910 : : +-------+
911 +-------+ | |
912 --->| B->2 |------>| |
913 +-------+ | CPU 2 |
914 : :DIVIDE | |
915 +-------+ | |
916 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
917 division speculates on the +-------+ ~ | |
918 LOAD of A : : ~ | |
919 : :DIVIDE | |
920 : : ~ | |
921 : : ~ | |
922 rrrrrrrrrrrrrrrr~ | |
923 : : ~ | |
924 : : ~-->| |
925 : : | |
926 : : +-------+
927
928
929but if there was an update or an invalidation from another CPU pending, then
930the speculation will be cancelled and the value reloaded:
931
932 : : +-------+
933 +-------+ | |
934 --->| B->2 |------>| |
935 +-------+ | CPU 2 |
936 : :DIVIDE | |
937 +-------+ | |
938 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
939 division speculates on the +-------+ ~ | |
940 LOAD of A : : ~ | |
941 : :DIVIDE | |
942 : : ~ | |
943 : : ~ | |
944 rrrrrrrrrrrrrrrrr | |
945 +-------+ | |
946 The speculation is discarded ---> --->| A->1 |------>| |
947 and an updated value is +-------+ | |
948 retrieved : : +-------+
769 949
770 950
771======================== 951========================
@@ -901,7 +1081,7 @@ IMPLICIT KERNEL MEMORY BARRIERS
 ===============================
 
 Some of the other functions in the linux kernel imply memory barriers, amongst
-which are locking, scheduling and memory allocation functions.
+which are locking and scheduling functions.
 
 This specification is a _minimum_ guarantee; any particular architecture may
 provide more substantial guarantees, but these may not be relied upon outside
@@ -966,6 +1146,20 @@ equivalent to a full barrier, but a LOCK followed by an UNLOCK is not.
     barriers is that the effects instructions outside of a critical section may
     seep into the inside of the critical section.
 
+A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
+because it is possible for an access preceding the LOCK to happen after the
+LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
+two accesses can themselves then cross:
+
+       *A = a;
+       LOCK
+       UNLOCK
+       *B = b;
+
+may occur as:
+
+       LOCK, STORE *B, STORE *A, UNLOCK
+
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
 systems, and so cannot be counted on in such a situation to actually achieve
 anything at all - especially with respect to I/O accesses - unless combined
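
The LOCK followed by UNLOCK case added in the hunk above can be sketched in
user-space C11, assuming a toy spinlock whose LOCK has acquire semantics and
whose UNLOCK has release semantics. This is an illustration only, not the
kernel's locking API: sketch_lock, sketch_unlock and both store helpers are
hypothetical names, and the seq_cst fence is an assumed stand-in for a
general memory barrier.

```c
#include <stdatomic.h>

static atomic_flag lock_word = ATOMIC_FLAG_INIT;
static atomic_int A;
static atomic_int B;

/* LOCK: acquire only.  Later accesses cannot move before it, but
 * earlier accesses are allowed to move after it. */
static void sketch_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&lock_word,
                                             memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

/* UNLOCK: release only.  Earlier accesses cannot move after it, but
 * later accesses are allowed to move before it. */
static void sketch_unlock(void)
{
    atomic_flag_clear_explicit(&lock_word, memory_order_release);
}

/* *A = a; LOCK; UNLOCK; *B = b;
 * Another CPU may perceive this as LOCK, STORE *B, STORE *A, UNLOCK,
 * because neither store is pinned on both sides. */
static void store_around_empty_section(int a, int b)
{
    atomic_store_explicit(&A, a, memory_order_relaxed);
    sketch_lock();
    sketch_unlock();
    atomic_store_explicit(&B, b, memory_order_relaxed);
}

/* The same two stores separated by a full fence, which does keep the
 * store to A ahead of the store to B as seen by other threads that
 * use matching barriers. */
static void store_with_full_barrier(int a, int b)
{
    atomic_store_explicit(&A, a, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* assumed general barrier */
    atomic_store_explicit(&B, b, memory_order_relaxed);
}
```

On a single thread both helpers leave the same final values. The difference
lies only in what orderings another CPU is permitted to observe, which is why
the empty critical section cannot substitute for the explicit barrier.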
@@ -1016,8 +1210,6 @@ Other functions that imply barriers:
 
  (*) schedule() and similar imply full memory barriers.
 
- (*) Memory allocation and release functions imply full memory barriers.
-
 
 =================================
 INTER-CPU LOCKING BARRIER EFFECTS
diff --git a/Documentation/networking/README.ipw2200 b/Documentation/networking/README.ipw2200
index acb30c5dcff3..4f2a40f1dbc6 100644
--- a/Documentation/networking/README.ipw2200
+++ b/Documentation/networking/README.ipw2200
@@ -14,8 +14,8 @@ Copyright (C) 2004-2006, Intel Corporation
 
 README.ipw2200
 
-Version: 1.0.8
-Date : October 20, 2005
+Version: 1.1.2
+Date : March 30, 2006
 
 
 Index
@@ -103,7 +103,7 @@ file.
 
 1.1. Overview of Features
 -----------------------------------------------
-The current release (1.0.8) supports the following features:
+The current release (1.1.2) supports the following features:
 
 + BSS mode (Infrastructure, Managed)
 + IBSS mode (Ad-Hoc)
@@ -247,8 +247,8 @@ and can set the contents via echo. For example:
 % cat /sys/bus/pci/drivers/ipw2200/debug_level
 
 Will report the current debug level of the driver's logging subsystem
-(only available if CONFIG_IPW_DEBUG was configured when the driver was
-built).
+(only available if CONFIG_IPW2200_DEBUG was configured when the driver
+was built).
 
 You can set the debug level via:
 
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 8d8b4e5ea184..afac780445cd 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -1,7 +1,7 @@
 
                Linux Ethernet Bonding Driver HOWTO
 
-               Latest update: 21 June 2005
+               Latest update: 24 April 2006
 
 Initial release : Thomas Davis <tadavis at lbl.gov>
 Corrections, HA extensions : 2000/10/03-15 :
@@ -12,6 +12,8 @@ Corrections, HA extensions : 2000/10/03-15 :
   - Jay Vosburgh <fubar at us dot ibm dot com>
 
 Reorganized and updated Feb 2005 by Jay Vosburgh
+Added Sysfs information: 2006/04/24
+  - Mitch Williams <mitch.a.williams at intel.com>
 
 Introduction
 ============
@@ -38,61 +40,62 @@ Table of Contents
382. Bonding Driver Options 402. Bonding Driver Options
39 41
403. Configuring Bonding Devices 423. Configuring Bonding Devices
413.1 Configuration with sysconfig support 433.1 Configuration with Sysconfig Support
423.1.1 Using DHCP with sysconfig 443.1.1 Using DHCP with Sysconfig
433.1.2 Configuring Multiple Bonds with sysconfig 453.1.2 Configuring Multiple Bonds with Sysconfig
443.2 Configuration with initscripts support 463.2 Configuration with Initscripts Support
453.2.1 Using DHCP with initscripts 473.2.1 Using DHCP with Initscripts
463.2.2 Configuring Multiple Bonds with initscripts 483.2.2 Configuring Multiple Bonds with Initscripts
473.3 Configuring Bonding Manually 493.3 Configuring Bonding Manually with Ifenslave
483.3.1 Configuring Multiple Bonds Manually 503.3.1 Configuring Multiple Bonds Manually
513.4 Configuring Bonding Manually via Sysfs
49 52
505. Querying Bonding Configuration 534. Querying Bonding Configuration
515.1 Bonding Configuration 544.1 Bonding Configuration
525.2 Network Configuration 554.2 Network Configuration
53 56
546. Switch Configuration 575. Switch Configuration
55 58
567. 802.1q VLAN Support 596. 802.1q VLAN Support
57 60
588. Link Monitoring 617. Link Monitoring
598.1 ARP Monitor Operation 627.1 ARP Monitor Operation
608.2 Configuring Multiple ARP Targets 637.2 Configuring Multiple ARP Targets
618.3 MII Monitor Operation 647.3 MII Monitor Operation
62 65
639. Potential Trouble Sources 668. Potential Trouble Sources
649.1 Adventures in Routing 678.1 Adventures in Routing
659.2 Ethernet Device Renaming 688.2 Ethernet Device Renaming
669.3 Painfully Slow Or No Failed Link Detection By Miimon 698.3 Painfully Slow Or No Failed Link Detection By Miimon
67 70
6810. SNMP agents 719. SNMP agents
69 72
7011. Promiscuous mode 7310. Promiscuous mode
71 74
7212. Configuring Bonding for High Availability 7511. Configuring Bonding for High Availability
7312.1 High Availability in a Single Switch Topology 7611.1 High Availability in a Single Switch Topology
7412.2 High Availability in a Multiple Switch Topology 7711.2 High Availability in a Multiple Switch Topology
7512.2.1 HA Bonding Mode Selection for Multiple Switch Topology 7811.2.1 HA Bonding Mode Selection for Multiple Switch Topology
7612.2.2 HA Link Monitoring for Multiple Switch Topology 7911.2.2 HA Link Monitoring for Multiple Switch Topology
77 80
7813. Configuring Bonding for Maximum Throughput 8112. Configuring Bonding for Maximum Throughput
7913.1 Maximum Throughput in a Single Switch Topology 8212.1 Maximum Throughput in a Single Switch Topology
8013.1.1 MT Bonding Mode Selection for Single Switch Topology 8312.1.1 MT Bonding Mode Selection for Single Switch Topology
8113.1.2 MT Link Monitoring for Single Switch Topology 8412.1.2 MT Link Monitoring for Single Switch Topology
8213.2 Maximum Throughput in a Multiple Switch Topology 8512.2 Maximum Throughput in a Multiple Switch Topology
8313.2.1 MT Bonding Mode Selection for Multiple Switch Topology 8612.2.1 MT Bonding Mode Selection for Multiple Switch Topology
8413.2.2 MT Link Monitoring for Multiple Switch Topology 8712.2.2 MT Link Monitoring for Multiple Switch Topology
85 88
8614. Switch Behavior Issues 8913. Switch Behavior Issues
8714.1 Link Establishment and Failover Delays 9013.1 Link Establishment and Failover Delays
8814.2 Duplicated Incoming Packets 9113.2 Duplicated Incoming Packets
89 92
9015. Hardware Specific Considerations 9314. Hardware Specific Considerations
9115.1 IBM BladeCenter 9414.1 IBM BladeCenter
92 95
9316. Frequently Asked Questions 9615. Frequently Asked Questions
94 97
9517. Resources and Links 9816. Resources and Links
96 99
97 100
981. Bonding Driver Installation 1011. Bonding Driver Installation
@@ -156,6 +159,9 @@ you're trying to build it for. Some distros (e.g., Red Hat from 7.1
 onwards) do not have /usr/include/linux symbolically linked to the
 default kernel source include directory.
 
+SECOND IMPORTANT NOTE:
+        If you plan to configure bonding using sysfs, you do not need
+to use ifenslave.
 
 2. Bonding Driver Options
 =========================
@@ -270,7 +276,7 @@ mode
                In bonding version 2.6.2 or later, when a failover
                occurs in active-backup mode, bonding will issue one
                or more gratuitous ARPs on the newly active slave.
-               One gratutious ARP is issued for the bonding master
+               One gratuitous ARP is issued for the bonding master
                interface and each VLAN interfaces configured above
                it, provided that the interface has at least one IP
                address configured. Gratuitous ARPs issued for VLAN
@@ -377,7 +383,7 @@ mode
                When a link is reconnected or a new slave joins the
                bond the receive traffic is redistributed among all
                active slaves in the bond by initiating ARP Replies
-               with the selected mac address to each of the
+               with the selected MAC address to each of the
                clients. The updelay parameter (detailed below) must
                be set to a value equal or greater than the switch's
                forwarding delay so that the ARP Replies sent to the
@@ -498,11 +504,12 @@ not exist, and the layer2 policy is the only policy.
 3. Configuring Bonding Devices
 ==============================
 
-        There are, essentially, two methods for configuring bonding:
-with support from the distro's network initialization scripts, and
-without. Distros generally use one of two packages for the network
-initialization scripts: initscripts or sysconfig. Recent versions of
-these packages have support for bonding, while older versions do not.
+        You can configure bonding using either your distro's network
+initialization scripts, or manually using either ifenslave or the
+sysfs interface. Distros generally use one of two packages for the
+network initialization scripts: initscripts or sysconfig. Recent
+versions of these packages have support for bonding, while older
+versions do not.
 
         We will first describe the options for configuring bonding for
 distros using versions of initscripts and sysconfig with full or
@@ -530,7 +537,7 @@ $ grep ifenslave /sbin/ifup
         If this returns any matches, then your initscripts or
 sysconfig has support for bonding.
 
-3.1 Configuration with sysconfig support
+3.1 Configuration with Sysconfig Support
 ----------------------------------------
 
         This section applies to distros using a version of sysconfig
@@ -538,7 +545,7 @@ with bonding support, for example, SuSE Linux Enterprise Server 9.
 
         SuSE SLES 9's networking configuration system does support
 bonding, however, at this writing, the YaST system configuration
-frontend does not provide any means to work with bonding devices.
+front end does not provide any means to work with bonding devices.
 Bonding devices can be managed by hand, however, as follows.
 
         First, if they have not already been configured, configure the
@@ -660,7 +667,7 @@ format can be found in an example ifcfg template file:
         Note that the template does not document the various BONDING_
 settings described above, but does describe many of the other options.
 
-3.1.1 Using DHCP with sysconfig
+3.1.1 Using DHCP with Sysconfig
 -------------------------------
 
         Under sysconfig, configuring a device with BOOTPROTO='dhcp'
@@ -670,7 +677,7 @@ attempt to obtain the device address from DHCP prior to adding any of
 the slave devices. Without active slaves, the DHCP requests are not
 sent to the network.
 
-3.1.2 Configuring Multiple Bonds with sysconfig
+3.1.2 Configuring Multiple Bonds with Sysconfig
 -----------------------------------------------
 
         The sysconfig network initialization system is capable of
@@ -685,7 +692,7 @@ ifcfg-bondX files.
 options in the ifcfg-bondX file, it is not necessary to add them to
 the system /etc/modules.conf or /etc/modprobe.conf configuration file.
 
-3.2 Configuration with initscripts support
+3.2 Configuration with Initscripts Support
 ------------------------------------------
 
         This section applies to distros using a version of initscripts
@@ -756,7 +763,7 @@ options for your configuration.
 will restart the networking subsystem and your bond link should be now
 up and running.
 
-3.2.1 Using DHCP with initscripts
+3.2.1 Using DHCP with Initscripts
 ---------------------------------
 
         Recent versions of initscripts (the version supplied with
@@ -768,7 +775,7 @@ above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp"
 and add a line consisting of "TYPE=Bonding". Note that the TYPE value
 is case sensitive.
 
-3.2.2 Configuring Multiple Bonds with initscripts
+3.2.2 Configuring Multiple Bonds with Initscripts
 -------------------------------------------------
 
         At this writing, the initscripts package does not directly
@@ -784,8 +791,8 @@ Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels
 exhibiting this problem, it will be impossible to configure multiple
 bonds with differing parameters.
 
-3.3 Configuring Bonding Manually
+3.3 Configuring Bonding Manually with Ifenslave
788-------------------------------- 795-----------------------------------------------
789 796
790 This section applies to distros whose network initialization 797 This section applies to distros whose network initialization
791scripts (the sysconfig or initscripts package) do not have specific 798scripts (the sysconfig or initscripts package) do not have specific
@@ -889,11 +896,139 @@ install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
889 This may be repeated any number of times, specifying a new and 896 This may be repeated any number of times, specifying a new and
890unique name in place of bond1 for each subsequent instance. 897unique name in place of bond1 for each subsequent instance.
891 898
8993.4 Configuring Bonding Manually via Sysfs
900------------------------------------------
901
902 Starting with version 3.0, Channel Bonding may be configured
903via the sysfs interface. This interface allows dynamic configuration
904of all bonds in the system without unloading the module. It also
905allows for adding and removing bonds at runtime. Ifenslave is no
906longer required, though it is still supported.
907
908 Use of the sysfs interface allows you to use multiple bonds
909with different configurations without having to reload the module.
910It also allows you to use multiple, differently configured bonds when
911bonding is compiled into the kernel.
912
913 You must have the sysfs filesystem mounted to configure
914bonding this way. The examples in this document assume that you
915are using the standard mount point for sysfs, i.e. /sys. If your
916sysfs filesystem is mounted elsewhere, you will need to adjust the
917example paths accordingly.
918
919Creating and Destroying Bonds
920-----------------------------
921To add a new bond foo:
922# echo +foo > /sys/class/net/bonding_masters
923
924To remove an existing bond bar:
925# echo -bar > /sys/class/net/bonding_masters
926
927To show all existing bonds:
928# cat /sys/class/net/bonding_masters
929
930NOTE: Due to the 4K size limitation of sysfs files, this list may be
931truncated if you have more than a few hundred bonds. This is unlikely
932to occur under normal operating conditions.
933
934Adding and Removing Slaves
935--------------------------
936 Interfaces may be enslaved to a bond using the file
937/sys/class/net/<bond>/bonding/slaves. The semantics for this file
938are the same as for the bonding_masters file.
939
940To enslave interface eth0 to bond bond0:
941# ifconfig bond0 up
942# echo +eth0 > /sys/class/net/bond0/bonding/slaves
943
944To free slave eth0 from bond bond0:
945# echo -eth0 > /sys/class/net/bond0/bonding/slaves
946
947 NOTE: The bond must be up before slaves can be added. All
948slaves are freed when the interface is brought down.
949
950 When an interface is enslaved to a bond, symlinks between the
951two are created in the sysfs filesystem. In this case, you would get
952/sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and
953/sys/class/net/eth0/master pointing to /sys/class/net/bond0.
954
955 This means that you can tell quickly whether or not an
956interface is enslaved by looking for the master symlink. Thus:
957# echo -eth0 > /sys/class/net/eth0/master/bonding/slaves
958will free eth0 from whatever bond it is enslaved to, regardless of
959the name of the bond interface.
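
The master-symlink check described above can be wrapped in a small helper.
This is only a sketch: the function name release_from_bond and the
SYSFS_NET variable are illustrative assumptions, not part of the bonding
driver or of ifenslave. SYSFS_NET exists so the helper can be tried against
a scratch directory; it defaults to the standard /sys/class/net location.

```shell
#!/bin/sh
# Hypothetical override for the sysfs network class directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# release_from_bond IFACE
# Frees IFACE from whatever bond it is enslaved to, using the master
# symlink so the bond's name does not need to be known in advance.
release_from_bond() {
    iface="$1"
    master="$SYSFS_NET/$iface/master"

    # The master symlink is present only while the interface is enslaved.
    if [ -e "$master" ]; then
        echo "-$iface" > "$master/bonding/slaves"
    else
        echo "$iface is not enslaved to any bond" >&2
        return 1
    fi
}
```

For example, "release_from_bond eth0" would be expected to behave like the
echo command shown above, while also reporting the case where eth0 has no
master.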
960
961Changing a Bond's Configuration
962-------------------------------
963 Each bond may be configured individually by manipulating the
964files located in /sys/class/net/<bond name>/bonding
965
966 The names of these files correspond directly with the
967command-line parameters described elsewhere in this file, and, with the
968exception of arp_ip_target, they accept the same values. To see the
969current setting, simply cat the appropriate file.
970
971 A few examples will be given here; for specific usage
972guidelines for each parameter, see the appropriate section in this
973document.
974
975To configure bond0 for balance-alb mode:
976# ifconfig bond0 down
977# echo 6 > /sys/class/net/bond0/bonding/mode
978 - or -
979# echo balance-alb > /sys/class/net/bond0/bonding/mode
980 NOTE: The bond interface must be down before the mode can be
981changed.
982
983To enable MII monitoring on bond0 with a 1 second interval:
984# echo 1000 > /sys/class/net/bond0/bonding/miimon
985 NOTE: If ARP monitoring is enabled, it will be disabled when MII
986monitoring is enabled, and vice-versa.
987
988To add ARP targets:
989# echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
990# echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target
991 NOTE: Up to 16 target addresses may be specified.
992
993To remove an ARP target:
994# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
995
996Example Configuration
997---------------------
998 We begin with the same example that is shown in section 3.3,
999executed with sysfs, and without using ifenslave.
1000
1001 To make a simple bond of two e100 devices (presumed to be eth0
1002and eth1), and have it persist across reboots, edit the appropriate
1003file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the
1004following:
1005
1006modprobe bonding
1007modprobe e100
1008echo balance-alb > /sys/class/net/bond0/bonding/mode
1009ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
1010echo 100 > /sys/class/net/bond0/bonding/miimon
1011echo +eth0 > /sys/class/net/bond0/bonding/slaves
1012echo +eth1 > /sys/class/net/bond0/bonding/slaves
1013
1014 To add a second bond, with two e1000 interfaces in
1015active-backup mode, using ARP monitoring, add the following lines to
1016your init script:
1017
1018modprobe e1000
1019echo +bond1 > /sys/class/net/bonding_masters
1020echo active-backup > /sys/class/net/bond1/bonding/mode
1021ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up
1022echo +192.168.2.100 > /sys/class/net/bond1/bonding/arp_ip_target
1023echo 2000 > /sys/class/net/bond1/bonding/arp_interval
1024echo +eth2 > /sys/class/net/bond1/bonding/slaves
1025echo +eth3 > /sys/class/net/bond1/bonding/slaves
1026
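
After running an init script such as the ones above, it can help to read the
settings back from sysfs. The following is a minimal sketch, not part of the
bonding package: the function name show_bonds and the SYSFS_NET variable are
assumptions made for illustration, and the exact text the mode and slaves
files return depends on the driver version.

```shell
#!/bin/sh
# Hypothetical override for the sysfs network class directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# show_bonds
# Prints one line per bond listed in bonding_masters, with the contents
# of that bond's mode and slaves files.
show_bonds() {
    masters="$SYSFS_NET/bonding_masters"

    # bonding_masters is absent when the bonding driver is not loaded.
    if [ ! -r "$masters" ]; then
        echo "bonding_masters not found; is bonding loaded?" >&2
        return 1
    fi

    # The file holds a whitespace-separated list of bond names.
    for bond in $(cat "$masters"); do
        dir="$SYSFS_NET/$bond/bonding"
        echo "$bond: mode=$(cat "$dir/mode") slaves=$(cat "$dir/slaves")"
    done
}
```

Calling show_bonds at the end of rc.local gives a quick check that each bond
came up with the intended mode and slave set.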
892 1027
8935. Querying Bonding Configuration 10284. Querying Bonding Configuration
894================================= 1029=================================
895 1030
8965.1 Bonding Configuration 10314.1 Bonding Configuration
897------------------------- 1032-------------------------
898 1033
899 Each bonding device has a read-only file residing in the 1034 Each bonding device has a read-only file residing in the
@@ -923,7 +1058,7 @@ generally as follows:
923 The precise format and contents will change depending upon the 1058 The precise format and contents will change depending upon the
924bonding configuration, state, and version of the bonding driver. 1059bonding configuration, state, and version of the bonding driver.
925 1060
9265.2 Network configuration 10614.2 Network configuration
927------------------------- 1062-------------------------
928 1063
929 The network configuration can be inspected using the ifconfig 1064 The network configuration can be inspected using the ifconfig
@@ -958,7 +1093,7 @@ eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
958 collisions:0 txqueuelen:100 1093 collisions:0 txqueuelen:100
959 Interrupt:9 Base address:0x1400 1094 Interrupt:9 Base address:0x1400
960 1095
9616. Switch Configuration 10965. Switch Configuration
962======================= 1097=======================
963 1098
964 For this section, "switch" refers to whatever system the 1099 For this section, "switch" refers to whatever system the
@@ -991,7 +1126,7 @@ transmit policy for an EtherChannel group; all three will interoperate
991with another EtherChannel group. 1126with another EtherChannel group.
992 1127
993 1128
9947. 802.1q VLAN Support 11296. 802.1q VLAN Support
995====================== 1130======================
996 1131
997 It is possible to configure VLAN devices over a bond interface 1132 It is possible to configure VLAN devices over a bond interface
@@ -1042,7 +1177,7 @@ underlying device -- i.e. the bonding interface -- to promiscuous
1042mode, which might not be what you want. 1177mode, which might not be what you want.
1043 1178
1044 1179
10458. Link Monitoring 11807. Link Monitoring
1046================== 1181==================
1047 1182
1048 The bonding driver at present supports two schemes for 1183 The bonding driver at present supports two schemes for
@@ -1053,7 +1188,7 @@ monitor.
1053bonding driver itself, it is not possible to enable both ARP and MII 1188bonding driver itself, it is not possible to enable both ARP and MII
1054monitoring simultaneously. 1189monitoring simultaneously.
1055 1190
10568.1 ARP Monitor Operation 11917.1 ARP Monitor Operation
1057------------------------- 1192-------------------------
1058 1193
1059 The ARP monitor operates as its name suggests: it sends ARP 1194 The ARP monitor operates as its name suggests: it sends ARP
@@ -1071,7 +1206,7 @@ those slaves will stay down. If networking monitoring (tcpdump, etc)
1071shows the ARP requests and replies on the network, then it may be that 1206shows the ARP requests and replies on the network, then it may be that
1072your device driver is not updating last_rx and trans_start. 1207your device driver is not updating last_rx and trans_start.
1073 1208
10748.2 Configuring Multiple ARP Targets 12097.2 Configuring Multiple ARP Targets
1075------------------------------------ 1210------------------------------------
1076 1211
1077 While ARP monitoring can be done with just one target, it can 1212 While ARP monitoring can be done with just one target, it can
@@ -1094,7 +1229,7 @@ alias bond0 bonding
1094options bond0 arp_interval=60 arp_ip_target=192.168.0.100 1229options bond0 arp_interval=60 arp_ip_target=192.168.0.100
1095 1230
1096 1231
10978.3 MII Monitor Operation 12327.3 MII Monitor Operation
1098------------------------- 1233-------------------------
1099 1234
1100 The MII monitor monitors only the carrier state of the local 1235 The MII monitor monitors only the carrier state of the local
@@ -1120,14 +1255,14 @@ does not support or had some error in processing both the MII register
1120and ethtool requests), then the MII monitor will assume the link is 1255and ethtool requests), then the MII monitor will assume the link is
1121up. 1256up.
1122 1257
11239. Potential Sources of Trouble 12588. Potential Sources of Trouble
1124=============================== 1259===============================
1125 1260
11269.1 Adventures in Routing 12618.1 Adventures in Routing
1127------------------------- 1262-------------------------
1128 1263
1129 When bonding is configured, it is important that the slave 1264 When bonding is configured, it is important that the slave
1130devices not have routes that supercede routes of the master (or, 1265devices not have routes that supersede routes of the master (or,
1131generally, not have routes at all). For example, suppose the bonding 1266generally, not have routes at all). For example, suppose the bonding
1132device bond0 has two slaves, eth0 and eth1, and the routing table is 1267device bond0 has two slaves, eth0 and eth1, and the routing table is
1133as follows: 1268as follows:
@@ -1154,11 +1289,11 @@ by the state of the routing table.
1154 1289
1155 The solution here is simply to ensure that slaves do not have 1290 The solution here is simply to ensure that slaves do not have
1156routes of their own, and if for some reason they must, those routes do 1291routes of their own, and if for some reason they must, those routes do
1157not supercede routes of their master. This should generally be the 1292not supersede routes of their master. This should generally be the
1158case, but unusual configurations or errant manual or automatic static 1293case, but unusual configurations or errant manual or automatic static
1159route additions may cause trouble. 1294route additions may cause trouble.
1160 1295
11619.2 Ethernet Device Renaming 12968.2 Ethernet Device Renaming
1162---------------------------- 1297----------------------------
1163 1298
1164 On systems with network configuration scripts that do not 1299 On systems with network configuration scripts that do not
@@ -1207,7 +1342,7 @@ modprobe with --ignore-install to cause the normal action to then take
1207place. Full documentation on this can be found in the modprobe.conf 1342place. Full documentation on this can be found in the modprobe.conf
1208and modprobe manual pages. 1343and modprobe manual pages.
1209 1344
12109.3. Painfully Slow Or No Failed Link Detection By Miimon 13458.3. Painfully Slow Or No Failed Link Detection By Miimon
1211--------------------------------------------------------- 1346---------------------------------------------------------
1212 1347
1213 By default, bonding enables the use_carrier option, which 1348 By default, bonding enables the use_carrier option, which
@@ -1235,7 +1370,7 @@ carrier state. It has no way to determine the state of devices on or
1235beyond other ports of a switch, or if a switch is refusing to pass 1370beyond other ports of a switch, or if a switch is refusing to pass
1236traffic while still maintaining carrier on. 1371traffic while still maintaining carrier on.
1237 1372
123810. SNMP agents 13739. SNMP agents
1239=============== 1374===============
1240 1375
1241 If running SNMP agents, the bonding driver should be loaded 1376 If running SNMP agents, the bonding driver should be loaded
@@ -1281,7 +1416,7 @@ ifDescr, the association between the IP address and IfIndex remains
1281and SNMP functions such as Interface_Scan_Next will report that 1416and SNMP functions such as Interface_Scan_Next will report that
1282association. 1417association.
1283 1418
128411. Promiscuous mode 141910. Promiscuous mode
1285==================== 1420====================
1286 1421
1287 When running network monitoring tools, e.g., tcpdump, it is 1422 When running network monitoring tools, e.g., tcpdump, it is
@@ -1308,7 +1443,7 @@ sending to peers that are unassigned or if the load is unbalanced.
1308the active slave changes (e.g., due to a link failure), the 1443the active slave changes (e.g., due to a link failure), the
1309promiscuous setting will be propagated to the new active slave. 1444promiscuous setting will be propagated to the new active slave.
1310 1445
131112. Configuring Bonding for High Availability 144611. Configuring Bonding for High Availability
1312============================================= 1447=============================================
1313 1448
1314 High Availability refers to configurations that provide 1449 High Availability refers to configurations that provide
@@ -1318,7 +1453,7 @@ goal is to provide the maximum availability of network connectivity
1318(i.e., the network always works), even though other configurations 1453(i.e., the network always works), even though other configurations
1319could provide higher throughput. 1454could provide higher throughput.
1320 1455
132112.1 High Availability in a Single Switch Topology 145611.1 High Availability in a Single Switch Topology
1322-------------------------------------------------- 1457--------------------------------------------------
1323 1458
1324 If two hosts (or a host and a single switch) are directly 1459 If two hosts (or a host and a single switch) are directly
@@ -1332,7 +1467,7 @@ the load will be rebalanced across the remaining devices.
1332 See Section 13, "Configuring Bonding for Maximum Throughput" 1467 See Section 13, "Configuring Bonding for Maximum Throughput"
1333for information on configuring bonding with one peer device. 1468for information on configuring bonding with one peer device.
1334 1469
133512.2 High Availability in a Multiple Switch Topology 147011.2 High Availability in a Multiple Switch Topology
1336---------------------------------------------------- 1471----------------------------------------------------
1337 1472
1338 With multiple switches, the configuration of bonding and the 1473 With multiple switches, the configuration of bonding and the
@@ -1359,7 +1494,7 @@ switches (ISL, or inter switch link), and multiple ports connecting to
1359the outside world ("port3" on each switch). There is no technical 1494the outside world ("port3" on each switch). There is no technical
1360reason that this could not be extended to a third switch. 1495reason that this could not be extended to a third switch.
1361 1496
136212.2.1 HA Bonding Mode Selection for Multiple Switch Topology 149711.2.1 HA Bonding Mode Selection for Multiple Switch Topology
1363------------------------------------------------------------- 1498-------------------------------------------------------------
1364 1499
1365 In a topology such as the example above, the active-backup and 1500 In a topology such as the example above, the active-backup and
@@ -1381,7 +1516,7 @@ broadcast: This mode is really a special purpose mode, and is suitable
1381 necessary for some specific one-way traffic to reach both 1516 necessary for some specific one-way traffic to reach both
1382 independent networks, then the broadcast mode may be suitable. 1517 independent networks, then the broadcast mode may be suitable.
1383 1518
138412.2.2 HA Link Monitoring Selection for Multiple Switch Topology 151911.2.2 HA Link Monitoring Selection for Multiple Switch Topology
1385---------------------------------------------------------------- 1520----------------------------------------------------------------
1386 1521
1387 The choice of link monitoring ultimately depends upon your 1522 The choice of link monitoring ultimately depends upon your
@@ -1402,10 +1537,10 @@ regardless of which switch is active, the ARP monitor has a suitable
1402target to query. 1537target to query.
1403 1538
1404 1539
140513. Configuring Bonding for Maximum Throughput 154012. Configuring Bonding for Maximum Throughput
1406============================================== 1541==============================================
1407 1542
140813.1 Maximizing Throughput in a Single Switch Topology 154312.1 Maximizing Throughput in a Single Switch Topology
1409------------------------------------------------------ 1544------------------------------------------------------
1410 1545
1411 In a single switch configuration, the best method to maximize 1546 In a single switch configuration, the best method to maximize
@@ -1476,7 +1611,7 @@ destination to make load balancing decisions. The behavior of each
1476mode is described below. 1611mode is described below.
1477 1612
1478 1613
147913.1.1 MT Bonding Mode Selection for Single Switch Topology 161412.1.1 MT Bonding Mode Selection for Single Switch Topology
1480----------------------------------------------------------- 1615-----------------------------------------------------------
1481 1616
1482 This configuration is the easiest to set up and to understand, 1617 This configuration is the easiest to set up and to understand,
@@ -1607,7 +1742,7 @@ balance-alb: This mode is everything that balance-tlb is, and more.
1607 device driver must support changing the hardware address while 1742 device driver must support changing the hardware address while
1608 the device is open. 1743 the device is open.
1609 1744
161013.1.2 MT Link Monitoring for Single Switch Topology 174512.1.2 MT Link Monitoring for Single Switch Topology
1611---------------------------------------------------- 1746----------------------------------------------------
1612 1747
1613 The choice of link monitoring may largely depend upon which 1748 The choice of link monitoring may largely depend upon which
@@ -1616,7 +1751,7 @@ support the use of the ARP monitor, and are thus restricted to using
1616the MII monitor (which does not provide as high a level of end to end 1751the MII monitor (which does not provide as high a level of end to end
1617assurance as the ARP monitor). 1752assurance as the ARP monitor).
1618 1753
161913.2 Maximum Throughput in a Multiple Switch Topology 175412.2 Maximum Throughput in a Multiple Switch Topology
1620----------------------------------------------------- 1755-----------------------------------------------------
1621 1756
1622 Multiple switches may be utilized to optimize for throughput 1757 Multiple switches may be utilized to optimize for throughput
@@ -1651,7 +1786,7 @@ a single 72 port switch.
1651can be equipped with an additional network device connected to an 1786can be equipped with an additional network device connected to an
1652external network; this host then additionally acts as a gateway. 1787external network; this host then additionally acts as a gateway.
1653 1788
165413.2.1 MT Bonding Mode Selection for Multiple Switch Topology 178912.2.1 MT Bonding Mode Selection for Multiple Switch Topology
1655------------------------------------------------------------- 1790-------------------------------------------------------------
1656 1791
1657 In actual practice, the bonding mode typically employed in 1792 In actual practice, the bonding mode typically employed in
@@ -1664,7 +1799,7 @@ packets has arrived). When employed in this fashion, the balance-rr
1664mode allows individual connections between two hosts to effectively 1799mode allows individual connections between two hosts to effectively
1665utilize greater than one interface's bandwidth. 1800utilize greater than one interface's bandwidth.
1666 1801
166713.2.2 MT Link Monitoring for Multiple Switch Topology 180212.2.2 MT Link Monitoring for Multiple Switch Topology
1668------------------------------------------------------ 1803------------------------------------------------------
1669 1804
1670 Again, in actual practice, the MII monitor is most often used 1805 Again, in actual practice, the MII monitor is most often used
@@ -1674,10 +1809,10 @@ advantages over the MII monitor are mitigated by the volume of probes
1674needed as the number of systems involved grows (remember that each 1809needed as the number of systems involved grows (remember that each
1675host in the network is configured with bonding). 1810host in the network is configured with bonding).
1676 1811
167714. Switch Behavior Issues 181213. Switch Behavior Issues
1678========================== 1813==========================
1679 1814
168014.1 Link Establishment and Failover Delays 181513.1 Link Establishment and Failover Delays
1681------------------------------------------- 1816-------------------------------------------
1682 1817
1683 Some switches exhibit undesirable behavior with regard to the 1818 Some switches exhibit undesirable behavior with regard to the
@@ -1712,7 +1847,7 @@ switches take a long time to go into backup mode, it may be desirable
1712to not activate a backup interface immediately after a link goes down. 1847to not activate a backup interface immediately after a link goes down.
1713Failover may be delayed via the downdelay bonding module option. 1848Failover may be delayed via the downdelay bonding module option.
1714 1849
171514.2 Duplicated Incoming Packets 185013.2 Duplicated Incoming Packets
1716-------------------------------- 1851--------------------------------
1717 1852
1718 It is not uncommon to observe a short burst of duplicated 1853 It is not uncommon to observe a short burst of duplicated
@@ -1751,14 +1886,14 @@ behavior, it can be induced by clearing the MAC forwarding table (on
1751most Cisco switches, the privileged command "clear mac address-table 1886most Cisco switches, the privileged command "clear mac address-table
1752dynamic" will accomplish this). 1887dynamic" will accomplish this).
1753 1888
175415. Hardware Specific Considerations 188914. Hardware Specific Considerations
1755==================================== 1890====================================
1756 1891
1757 This section contains additional information for configuring 1892 This section contains additional information for configuring
1758bonding on specific hardware platforms, or for interfacing bonding 1893bonding on specific hardware platforms, or for interfacing bonding
1759with particular switches or other devices. 1894with particular switches or other devices.
1760 1895
176115.1 IBM BladeCenter 189614.1 IBM BladeCenter
1762-------------------- 1897--------------------
1763 1898
1764 This applies to the JS20 and similar systems. 1899 This applies to the JS20 and similar systems.
@@ -1861,7 +1996,7 @@ bonding driver.
1861avoid fail-over delay issues when using bonding. 1996avoid fail-over delay issues when using bonding.
1862 1997
1863 1998
186416. Frequently Asked Questions 199915. Frequently Asked Questions
1865============================== 2000==============================
1866 2001
18671. Is it SMP safe? 20021. Is it SMP safe?
@@ -1925,7 +2060,7 @@ not have special switch requirements, but do need device drivers that
1925support specific features (described in the appropriate section under 2060support specific features (described in the appropriate section under
1926module parameters, above). 2061module parameters, above).
1927 2062
1928 In 802.3ad mode, it works with with systems that support IEEE 2063 In 802.3ad mode, it works with systems that support IEEE
1929802.3ad Dynamic Link Aggregation. Most managed and many unmanaged 2064802.3ad Dynamic Link Aggregation. Most managed and many unmanaged
1930switches currently available support 802.3ad. 2065switches currently available support 802.3ad.
1931 2066
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f12007b80a46..d46338af6002 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -362,6 +362,13 @@ tcp_workaround_signed_windows - BOOLEAN
362 not receive a window scaling option from them. 362 not receive a window scaling option from them.
363 Default: 0 363 Default: 0
364 364
365tcp_slow_start_after_idle - BOOLEAN
366 If set, provide RFC2861 behavior and time out the congestion
367 window after an idle period. An idle period is defined at
368 the current RTO. If unset, the congestion window will not
369 be timed out after an idle period.
370 Default: 1
371
365IP Variables: 372IP Variables:
366 373
367ip_local_port_range - 2 INTEGERS 374ip_local_port_range - 2 INTEGERS
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index 3c0a5ba614d7..847cedb238f6 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -42,9 +42,9 @@ dev->get_stats:
42 Context: nominally process, but don't sleep inside an rwlock 42 Context: nominally process, but don't sleep inside an rwlock
43 43
44dev->hard_start_xmit: 44dev->hard_start_xmit:
45 Synchronization: dev->xmit_lock spinlock. 45 Synchronization: netif_tx_lock spinlock.
46 When the driver sets NETIF_F_LLTX in dev->features this will be 46 When the driver sets NETIF_F_LLTX in dev->features this will be
47 called without holding xmit_lock. In this case the driver 47 called without holding netif_tx_lock. In this case the driver
48 has to lock by itself when needed. It is recommended to use a try lock 48 has to lock by itself when needed. It is recommended to use a try lock
49 for this and return -1 when the spin lock fails. 49 for this and return -1 when the spin lock fails.
50 The locking there should also properly protect against 50 The locking there should also properly protect against
@@ -62,12 +62,12 @@ dev->hard_start_xmit:
62 Only valid when NETIF_F_LLTX is set. 62 Only valid when NETIF_F_LLTX is set.
63 63
64dev->tx_timeout: 64dev->tx_timeout:
65 Synchronization: dev->xmit_lock spinlock. 65 Synchronization: netif_tx_lock spinlock.
66 Context: BHs disabled 66 Context: BHs disabled
67 Notes: netif_queue_stopped() is guaranteed true 67 Notes: netif_queue_stopped() is guaranteed true
68 68
69dev->set_multicast_list: 69dev->set_multicast_list:
70 Synchronization: dev->xmit_lock spinlock. 70 Synchronization: netif_tx_lock spinlock.
71 Context: BHs disabled 71 Context: BHs disabled
72 72
73dev->poll: 73dev->poll: