author | Jeff Garzik <jgarzik@pobox.com> | 2005-08-14 23:10:00 -0400 |
---|---|---|
committer | Jeff Garzik <jgarzik@pobox.com> | 2005-08-14 23:10:00 -0400 |
commit | 4c0e176dd5e4c44dd60f398518f75eedbe1a65f3 (patch) | |
tree | 07aea7539f78f221c6fc535a94a07befa2afdb63 /Documentation | |
parent | f241be74b803dcf9d70c9978292946370654320f (diff) | |
parent | 2ba84684e8cf6f980e4e95a2300f53a505eb794e (diff) |
Merge /spare/repo/linux-2.6/
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/SubmittingPatches | 5 | ||||
-rw-r--r-- | Documentation/arm/Samsung-S3C24XX/USB-Host.txt | 93 | ||||
-rw-r--r-- | Documentation/dontdiff | 1 | ||||
-rw-r--r-- | Documentation/fb/vesafb.txt | 16 | ||||
-rw-r--r-- | Documentation/kprobes.txt | 588 | ||||
-rw-r--r-- | Documentation/networking/bonding.txt | 978 | ||||
-rw-r--r-- | Documentation/usb/usbmon.txt | 2 | ||||
-rw-r--r-- | Documentation/video4linux/CARDLIST.cx88 | 1 | ||||
-rw-r--r-- | Documentation/video4linux/CARDLIST.tuner | 2 | ||||
-rw-r--r-- | Documentation/video4linux/bttv/Insmod-options | 3 | ||||
-rw-r--r-- | Documentation/x86_64/boot-options.txt | 5 |
11 files changed, 1391 insertions, 303 deletions
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 6761a7b241a5..7f43b040311e 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches | |||
@@ -149,6 +149,11 @@ USB, framebuffer devices, the VFS, the SCSI subsystem, etc. See the | |||
149 | MAINTAINERS file for a mailing list that relates specifically to | 149 | MAINTAINERS file for a mailing list that relates specifically to |
150 | your change. | 150 | your change. |
151 | 151 | ||
152 | If changes affect userland-kernel interfaces, please send | ||
153 | the MAN-PAGES maintainer (as listed in the MAINTAINERS file) | ||
154 | a man-pages patch, or at least a notification of the change, | ||
155 | so that some information makes its way into the manual pages. | ||
156 | |||
152 | Even if the maintainer did not respond in step #4, make sure to ALWAYS | 157 | Even if the maintainer did not respond in step #4, make sure to ALWAYS |
153 | copy the maintainer when you change their code. | 158 | copy the maintainer when you change their code. |
154 | 159 | ||
diff --git a/Documentation/arm/Samsung-S3C24XX/USB-Host.txt b/Documentation/arm/Samsung-S3C24XX/USB-Host.txt new file mode 100644 index 000000000000..b93b68e2b143 --- /dev/null +++ b/Documentation/arm/Samsung-S3C24XX/USB-Host.txt | |||
@@ -0,0 +1,93 @@ | |||
1 | S3C24XX USB Host support | ||
2 | ======================== | ||
3 | |||
4 | |||
5 | |||
6 | Introduction | ||
7 | ------------ | ||
8 | |||
9 | This document details the S3C2410/S3C2440 in-built OHCI USB host support. | ||
10 | |||
11 | Configuration | ||
12 | ------------- | ||
13 | |||
14 | Enable at least the following kernel options: | ||
15 | |||
16 | menuconfig: | ||
17 | |||
18 | Device Drivers ---> | ||
19 | USB support ---> | ||
20 | <*> Support for Host-side USB | ||
21 | <*> OHCI HCD support | ||
22 | |||
23 | |||
24 | .config: | ||
25 | CONFIG_USB | ||
26 | CONFIG_USB_OHCI_HCD | ||
27 | |||
28 | |||
29 | Once these options are configured, the standard set of USB device | ||
30 | drivers can be configured and used. | ||
31 | |||
32 | |||
33 | Board Support | ||
34 | ------------- | ||
35 | |||
36 | The driver attaches to a platform device, which will need to be | ||
37 | added by the board specific support file in linux/arch/arm/mach-s3c2410, | ||
38 | such as mach-bast.c or mach-smdk2410.c | ||
39 | |||
40 | The platform device's platform_data field is only needed if the | ||
41 | board implements extra power control or over-current monitoring. | ||
42 | |||
43 | The OHCI driver does not ensure the state of the S3C2410's MISCCTRL | ||
44 | register, so if both ports are to be used for the host, then it is | ||
45 | the board support file's responsibility to ensure that the second | ||
46 | port is configured to be connected to the OHCI core. | ||
47 | |||
48 | |||
49 | Platform Data | ||
50 | ------------- | ||
51 | |||
52 | See linux/include/asm-arm/arch-s3c2410/usb-control.h for the | ||
53 | descriptions of the platform device data. An implementation | ||
54 | can be found in linux/arch/arm/mach-s3c2410/usb-simtec.c . | ||
55 | |||
56 | The `struct s3c2410_hcd_info` contains a pair of functions | ||
57 | that get called to enable over-current detection, and to | ||
58 | control the port power status. | ||
59 | |||
60 | The ports are numbered 0 and 1. | ||
61 | |||
62 | power_control: | ||
63 | |||
64 | Called to enable or disable the power on the port. | ||
65 | |||
66 | enable_oc: | ||
67 | |||
68 | Called to enable or disable the over-current monitoring. | ||
69 | This should claim or release the resources being used to | ||
70 | check the power condition on the port, such as an IRQ. | ||
71 | |||
72 | report_oc: | ||
73 | |||
74 | The OHCI driver fills this field in for the over-current code | ||
75 | to call when there is a change to the over-current state on | ||
76 | a port. The ports argument is a bitmask of 1 bit per port, | ||
77 | with bit X being 1 for an over-current on port X. | ||
78 | |||
79 | The function s3c2410_usb_report_oc() has been provided to | ||
80 | ensure this is called correctly. | ||
81 | |||
82 | port[x]: | ||
83 | |||
84 | This struct describes each port, 0 or 1. The platform driver | ||
85 | should set the flags field of each port to S3C_HCDFLG_USED if | ||
86 | the port is enabled. | ||
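The one-bit-per-port convention described for report_oc can be illustrated in plain user-space C. This is only a sketch: the helper names oc_mask_set_port() and oc_mask_port_tripped() and the TOY_NUM_PORTS constant are hypothetical and are not part of the kernel or of usb-control.h. Only the port numbering (0 and 1) and the meaning of each bit come from the text above.

```c
#include <stdbool.h>

/* Hypothetical helpers, not kernel API: they model the "ports" bitmask
 * described for report_oc, where bit X is 1 when port X is in an
 * over-current state. The ports are numbered 0 and 1. */
#define TOY_NUM_PORTS 2

/* Return the mask with the bit for `port` set or cleared. */
static unsigned int oc_mask_set_port(unsigned int mask, int port, bool tripped)
{
	if (port < 0 || port >= TOY_NUM_PORTS)
		return mask;	/* ignore ports that do not exist */
	if (tripped)
		return mask | (1u << port);
	return mask & ~(1u << port);
}

/* Report whether the bit for `port` is set in the mask. */
static bool oc_mask_port_tripped(unsigned int mask, int port)
{
	if (port < 0 || port >= TOY_NUM_PORTS)
		return false;
	return (mask >> port) & 1u;
}
```

With these helpers, an over-current on port 1 alone corresponds to the mask value 2, and an over-current on both ports to the value 3.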
87 | |||
88 | |||
89 | |||
90 | Document Author | ||
91 | --------------- | ||
92 | |||
93 | Ben Dooks, (c) 2005 Simtec Electronics | ||
diff --git a/Documentation/dontdiff b/Documentation/dontdiff index b974cf595d01..96bea278bbf6 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff | |||
@@ -104,6 +104,7 @@ logo_*.c | |||
104 | logo_*_clut224.c | 104 | logo_*_clut224.c |
105 | logo_*_mono.c | 105 | logo_*_mono.c |
106 | lxdialog | 106 | lxdialog |
107 | mach-types | ||
107 | mach-types.h | 108 | mach-types.h |
108 | make_times_h | 109 | make_times_h |
109 | map | 110 | map |
diff --git a/Documentation/fb/vesafb.txt b/Documentation/fb/vesafb.txt index 814e2f56a6ad..62db6758d1c1 100644 --- a/Documentation/fb/vesafb.txt +++ b/Documentation/fb/vesafb.txt | |||
@@ -144,7 +144,21 @@ vgapal Use the standard vga registers for palette changes. | |||
144 | This is the default. | 144 | This is the default. |
145 | pmipal Use the protected mode interface for palette changes. | 145 | pmipal Use the protected mode interface for palette changes. |
146 | 146 | ||
147 | mtrr setup memory type range registers for the vesafb framebuffer. | 147 | mtrr:n setup memory type range registers for the vesafb framebuffer |
148 | where n: | ||
149 | 0 - disabled (equivalent to nomtrr) | ||
150 | 1 - uncachable | ||
151 | 2 - write-back | ||
152 | 3 - write-combining (default) | ||
153 | 4 - write-through | ||
154 | |||
155 | If you see the following in dmesg, choose the type that matches the | ||
156 | old one. In this example, use "mtrr:2". | ||
157 | ... | ||
158 | mtrr: type mismatch for e0000000,8000000 old: write-back new: write-combining | ||
159 | ... | ||
160 | |||
161 | nomtrr disable mtrr | ||
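The rule "choose the type that matches the old one" amounts to a lookup in the table above. The following user-space C sketch is illustrative only; mtrr_option_for() is a hypothetical helper and is not part of vesafb. It takes the type string printed after "old:" in the dmesg line and returns the n to use in mtrr:n.

```c
#include <string.h>

/* Type names in the order of the mtrr:n table above; index == n.
 * Index 0 means "disabled" and never appears as an "old:" type. */
static const char *const mtrr_type_names[] = {
	"disabled", "uncachable", "write-back",
	"write-combining", "write-through",
};

/* Hypothetical helper: returns n for mtrr:n, or -1 if the type
 * string is not one of the four listed types. */
static int mtrr_option_for(const char *old_type)
{
	int n;

	for (n = 1; n < 5; n++)
		if (strcmp(old_type, mtrr_type_names[n]) == 0)
			return n;
	return -1;
}
```

For the dmesg line in the example, the old type is "write-back", so the helper returns 2, which matches the suggested "mtrr:2".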
148 | 162 | ||
149 | vremap:n | 163 | vremap:n |
150 | remap 'n' MiB of video RAM. If 0 or not specified, remap memory | 164 | remap 'n' MiB of video RAM. If 0 or not specified, remap memory |
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt new file mode 100644 index 000000000000..0541fe1de704 --- /dev/null +++ b/Documentation/kprobes.txt | |||
@@ -0,0 +1,588 @@ | |||
1 | Title : Kernel Probes (Kprobes) | ||
2 | Authors : Jim Keniston <jkenisto@us.ibm.com> | ||
3 | : Prasanna S Panchamukhi <prasanna@in.ibm.com> | ||
4 | |||
5 | CONTENTS | ||
6 | |||
7 | 1. Concepts: Kprobes, Jprobes, Return Probes | ||
8 | 2. Architectures Supported | ||
9 | 3. Configuring Kprobes | ||
10 | 4. API Reference | ||
11 | 5. Kprobes Features and Limitations | ||
12 | 6. Probe Overhead | ||
13 | 7. TODO | ||
14 | 8. Kprobes Example | ||
15 | 9. Jprobes Example | ||
16 | 10. Kretprobes Example | ||
17 | |||
18 | 1. Concepts: Kprobes, Jprobes, Return Probes | ||
19 | |||
20 | Kprobes enables you to dynamically break into any kernel routine and | ||
21 | collect debugging and performance information non-disruptively. You | ||
22 | can trap at almost any kernel code address, specifying a handler | ||
23 | routine to be invoked when the breakpoint is hit. | ||
24 | |||
25 | There are currently three types of probes: kprobes, jprobes, and | ||
26 | kretprobes (also called return probes). A kprobe can be inserted | ||
27 | on virtually any instruction in the kernel. A jprobe is inserted at | ||
28 | the entry to a kernel function, and provides convenient access to the | ||
29 | function's arguments. A return probe fires when a specified function | ||
30 | returns. | ||
31 | |||
32 | In the typical case, Kprobes-based instrumentation is packaged as | ||
33 | a kernel module. The module's init function installs ("registers") | ||
34 | one or more probes, and the exit function unregisters them. A | ||
35 | registration function such as register_kprobe() specifies where | ||
36 | the probe is to be inserted and what handler is to be called when | ||
37 | the probe is hit. | ||
38 | |||
39 | The next three subsections explain how the different types of | ||
40 | probes work. They explain certain things that you'll need to | ||
41 | know in order to make the best use of Kprobes -- e.g., the | ||
42 | difference between a pre_handler and a post_handler, and how | ||
43 | to use the maxactive and nmissed fields of a kretprobe. But | ||
44 | if you're in a hurry to start using Kprobes, you can skip ahead | ||
45 | to section 2. | ||
46 | |||
47 | 1.1 How Does a Kprobe Work? | ||
48 | |||
49 | When a kprobe is registered, Kprobes makes a copy of the probed | ||
50 | instruction and replaces the first byte(s) of the probed instruction | ||
51 | with a breakpoint instruction (e.g., int3 on i386 and x86_64). | ||
52 | |||
53 | When a CPU hits the breakpoint instruction, a trap occurs, the CPU's | ||
54 | registers are saved, and control passes to Kprobes via the | ||
55 | notifier_call_chain mechanism. Kprobes executes the "pre_handler" | ||
56 | associated with the kprobe, passing the handler the addresses of the | ||
57 | kprobe struct and the saved registers. | ||
58 | |||
59 | Next, Kprobes single-steps its copy of the probed instruction. | ||
60 | (It would be simpler to single-step the actual instruction in place, | ||
61 | but then Kprobes would have to temporarily remove the breakpoint | ||
62 | instruction. This would open a small time window when another CPU | ||
63 | could sail right past the probepoint.) | ||
64 | |||
65 | After the instruction is single-stepped, Kprobes executes the | ||
66 | "post_handler," if any, that is associated with the kprobe. | ||
67 | Execution then continues with the instruction following the probepoint. | ||
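The order of events on a probe hit can be modelled in a few lines of user-space C. This is a toy model, not kernel code: struct toy_probe, toy_probe_hit() and the toy_* handlers are hypothetical names, and the "instruction copy" is just a function pointer standing in for the copied instruction that Kprobes single-steps.

```c
#include <stddef.h>
#include <string.h>

/* Toy user-space model, NOT the kernel's struct kprobe. */
struct toy_probe {
	void (*pre_handler)(struct toy_probe *p);
	void (*post_handler)(struct toy_probe *p);
	void (*insn_copy)(struct toy_probe *p);	/* stands in for the copied instruction */
	char trace[64];				/* records the order of events */
};

/* Append a step name to the trace, truncating if the buffer is full. */
static void toy_log(struct toy_probe *p, const char *step)
{
	strncat(p->trace, step, sizeof(p->trace) - strlen(p->trace) - 1);
}

static void toy_pre(struct toy_probe *p)  { toy_log(p, "pre,"); }
static void toy_insn(struct toy_probe *p) { toy_log(p, "step,"); }
static void toy_post(struct toy_probe *p) { toy_log(p, "post"); }

/* What happens, in order, when the breakpoint is hit. */
static void toy_probe_hit(struct toy_probe *p)
{
	if (p->pre_handler)
		p->pre_handler(p);
	p->insn_copy(p);	/* single-step the copy, not the original */
	if (p->post_handler)
		p->post_handler(p);
}
```

Calling toy_probe_hit() on a probe with all three pointers set leaves "pre,step,post" in the trace. With both handlers NULL, only the copied instruction runs.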
68 | |||
69 | 1.2 How Does a Jprobe Work? | ||
70 | |||
71 | A jprobe is implemented using a kprobe that is placed on a function's | ||
72 | entry point. It employs a simple mirroring principle to allow | ||
73 | seamless access to the probed function's arguments. The jprobe | ||
74 | handler routine should have the same signature (arg list and return | ||
75 | type) as the function being probed, and must always end by calling | ||
76 | the Kprobes function jprobe_return(). | ||
77 | |||
78 | Here's how it works. When the probe is hit, Kprobes makes a copy of | ||
79 | the saved registers and a generous portion of the stack (see below). | ||
80 | Kprobes then points the saved instruction pointer at the jprobe's | ||
81 | handler routine, and returns from the trap. As a result, control | ||
82 | passes to the handler, which is presented with the same register and | ||
83 | stack contents as the probed function. When it is done, the handler | ||
84 | calls jprobe_return(), which traps again to restore the original stack | ||
85 | contents and processor state and switch to the probed function. | ||
86 | |||
87 | By convention, the callee owns its arguments, so gcc may produce code | ||
88 | that unexpectedly modifies that portion of the stack. This is why | ||
89 | Kprobes saves a copy of the stack and restores it after the jprobe | ||
90 | handler has run. Up to MAX_STACK_SIZE bytes are copied -- e.g., | ||
91 | 64 bytes on i386. | ||
92 | |||
93 | Note that the probed function's args may be passed on the stack | ||
94 | or in registers (e.g., for x86_64 or for an i386 fastcall function). | ||
95 | The jprobe will work in either case, so long as the handler's | ||
96 | prototype matches that of the probed function. | ||
97 | |||
98 | 1.3 How Does a Return Probe Work? | ||
99 | |||
100 | When you call register_kretprobe(), Kprobes establishes a kprobe at | ||
101 | the entry to the function. When the probed function is called and this | ||
102 | probe is hit, Kprobes saves a copy of the return address, and replaces | ||
103 | the return address with the address of a "trampoline." The trampoline | ||
104 | is an arbitrary piece of code -- typically just a nop instruction. | ||
105 | At boot time, Kprobes registers a kprobe at the trampoline. | ||
106 | |||
107 | When the probed function executes its return instruction, control | ||
108 | passes to the trampoline and that probe is hit. Kprobes' trampoline | ||
109 | handler calls the user-specified handler associated with the kretprobe, | ||
110 | then sets the saved instruction pointer to the saved return address, | ||
111 | and that's where execution resumes upon return from the trap. | ||
112 | |||
113 | While the probed function is executing, its return address is | ||
114 | stored in an object of type kretprobe_instance. Before calling | ||
115 | register_kretprobe(), the user sets the maxactive field of the | ||
116 | kretprobe struct to specify how many instances of the specified | ||
117 | function can be probed simultaneously. register_kretprobe() | ||
118 | pre-allocates the indicated number of kretprobe_instance objects. | ||
119 | |||
120 | For example, if the function is non-recursive and is called with a | ||
121 | spinlock held, maxactive = 1 should be enough. If the function is | ||
122 | non-recursive and can never relinquish the CPU (e.g., via a semaphore | ||
123 | or preemption), NR_CPUS should be enough. If maxactive <= 0, it is | ||
124 | set to a default value. If CONFIG_PREEMPT is enabled, the default | ||
125 | is max(10, 2*NR_CPUS). Otherwise, the default is NR_CPUS. | ||
126 | |||
127 | It's not a disaster if you set maxactive too low; you'll just miss | ||
128 | some probes. In the kretprobe struct, the nmissed field is set to | ||
129 | zero when the return probe is registered, and is incremented every | ||
130 | time the probed function is entered but there is no kretprobe_instance | ||
131 | object available for establishing the return probe. | ||
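The maxactive default and the nmissed accounting described above can be sketched as a small user-space model. Everything here is hypothetical and for illustration only: struct toy_kretprobe and the toy_* functions are not the kernel's data structures, and nr_cpus and preempt are passed as plain parameters in place of NR_CPUS and CONFIG_PREEMPT.

```c
/* Toy user-space model, NOT the kernel's struct kretprobe. */
struct toy_kretprobe {
	int maxactive;	/* instances pre-allocated at registration */
	int in_use;	/* instances currently holding a return address */
	int nmissed;	/* entries that found no free instance */
};

/* Apply the default described above when maxactive <= 0. */
static int toy_effective_maxactive(int maxactive, int nr_cpus, int preempt)
{
	if (maxactive > 0)
		return maxactive;
	if (preempt)
		return (2 * nr_cpus > 10) ? 2 * nr_cpus : 10;
	return nr_cpus;
}

/* Registration: fix up maxactive and reset the counters. */
static void toy_register(struct toy_kretprobe *rp, int nr_cpus, int preempt)
{
	rp->maxactive = toy_effective_maxactive(rp->maxactive, nr_cpus, preempt);
	rp->in_use = 0;
	rp->nmissed = 0;
}

/* Function entry: returns 1 if the return will be probed, 0 if missed. */
static int toy_function_entry(struct toy_kretprobe *rp)
{
	if (rp->in_use >= rp->maxactive) {
		rp->nmissed++;
		return 0;
	}
	rp->in_use++;
	return 1;
}

/* Function return: the instance becomes available again. */
static void toy_function_return(struct toy_kretprobe *rp)
{
	if (rp->in_use > 0)
		rp->in_use--;
}
```

With maxactive = 1, a second entry before the first return is counted in nmissed and is not probed. Once the first call returns, the next entry is probed again.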
132 | |||
133 | 2. Architectures Supported | ||
134 | |||
135 | Kprobes, jprobes, and return probes are implemented on the following | ||
136 | architectures: | ||
137 | |||
138 | - i386 | ||
139 | - x86_64 (AMD-64, EM64T) | ||
140 | - ppc64 | ||
141 | - ia64 (Support for probes on certain instruction types is still in progress.) | ||
142 | - sparc64 (Return probes not yet implemented.) | ||
143 | |||
144 | 3. Configuring Kprobes | ||
145 | |||
146 | When configuring the kernel using make menuconfig/xconfig/oldconfig, | ||
147 | ensure that CONFIG_KPROBES is set to "y". Under "Kernel hacking", | ||
148 | look for "Kprobes". You may have to enable "Kernel debugging" | ||
149 | (CONFIG_DEBUG_KERNEL) before you can enable Kprobes. | ||
150 | |||
151 | You may also want to ensure that CONFIG_KALLSYMS and perhaps even | ||
152 | CONFIG_KALLSYMS_ALL are set to "y", since kallsyms_lookup_name() | ||
153 | is a handy, version-independent way to find a function's address. | ||
154 | |||
155 | If you need to insert a probe in the middle of a function, you may find | ||
156 | it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO), | ||
157 | so you can use "objdump -d -l vmlinux" to see the source-to-object | ||
158 | code mapping. | ||
159 | |||
160 | 4. API Reference | ||
161 | |||
162 | The Kprobes API includes a "register" function and an "unregister" | ||
163 | function for each type of probe. Here are terse, mini-man-page | ||
164 | specifications for these functions and the associated probe handlers | ||
165 | that you'll write. See the latter half of this document for examples. | ||
166 | |||
167 | 4.1 register_kprobe | ||
168 | |||
169 | #include <linux/kprobes.h> | ||
170 | int register_kprobe(struct kprobe *kp); | ||
171 | |||
172 | Sets a breakpoint at the address kp->addr. When the breakpoint is | ||
173 | hit, Kprobes calls kp->pre_handler. After the probed instruction | ||
174 | is single-stepped, Kprobes calls kp->post_handler. If a fault | ||
175 | occurs during execution of kp->pre_handler or kp->post_handler, | ||
176 | or during single-stepping of the probed instruction, Kprobes calls | ||
177 | kp->fault_handler. Any or all handlers can be NULL. | ||
178 | |||
179 | register_kprobe() returns 0 on success, or a negative errno otherwise. | ||
180 | |||
181 | User's pre-handler (kp->pre_handler): | ||
182 | #include <linux/kprobes.h> | ||
183 | #include <linux/ptrace.h> | ||
184 | int pre_handler(struct kprobe *p, struct pt_regs *regs); | ||
185 | |||
186 | Called with p pointing to the kprobe associated with the breakpoint, | ||
187 | and regs pointing to the struct containing the registers saved when | ||
188 | the breakpoint was hit. Return 0 here unless you're a Kprobes geek. | ||
189 | |||
190 | User's post-handler (kp->post_handler): | ||
191 | #include <linux/kprobes.h> | ||
192 | #include <linux/ptrace.h> | ||
193 | void post_handler(struct kprobe *p, struct pt_regs *regs, | ||
194 | unsigned long flags); | ||
195 | |||
196 | p and regs are as described for the pre_handler. flags always seems | ||
197 | to be zero. | ||
198 | |||
199 | User's fault-handler (kp->fault_handler): | ||
200 | #include <linux/kprobes.h> | ||
201 | #include <linux/ptrace.h> | ||
202 | int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr); | ||
203 | |||
204 | p and regs are as described for the pre_handler. trapnr is the | ||
205 | architecture-specific trap number associated with the fault (e.g., | ||
206 | on i386, 13 for a general protection fault or 14 for a page fault). | ||
207 | Returns 1 if it successfully handled the exception. | ||
208 | |||
209 | 4.2 register_jprobe | ||
210 | |||
211 | #include <linux/kprobes.h> | ||
212 | int register_jprobe(struct jprobe *jp); | ||
213 | |||
214 | Sets a breakpoint at the address jp->kp.addr, which must be the address | ||
215 | of the first instruction of a function. When the breakpoint is hit, | ||
216 | Kprobes runs the handler whose address is jp->entry. | ||
217 | |||
218 | The handler should have the same arg list and return type as the probed | ||
219 | function; and just before it returns, it must call jprobe_return(). | ||
220 | (The handler never actually returns, since jprobe_return() returns | ||
221 | control to Kprobes.) If the probed function is declared asmlinkage, | ||
222 | fastcall, or anything else that affects how args are passed, the | ||
223 | handler's declaration must match. | ||
224 | |||
225 | register_jprobe() returns 0 on success, or a negative errno otherwise. | ||
226 | |||
227 | 4.3 register_kretprobe | ||
228 | |||
229 | #include <linux/kprobes.h> | ||
230 | int register_kretprobe(struct kretprobe *rp); | ||
231 | |||
232 | Establishes a return probe for the function whose address is | ||
233 | rp->kp.addr. When that function returns, Kprobes calls rp->handler. | ||
234 | You must set rp->maxactive appropriately before you call | ||
235 | register_kretprobe(); see "How Does a Return Probe Work?" for details. | ||
236 | |||
237 | register_kretprobe() returns 0 on success, or a negative errno | ||
238 | otherwise. | ||
239 | |||
240 | User's return-probe handler (rp->handler): | ||
241 | #include <linux/kprobes.h> | ||
242 | #include <linux/ptrace.h> | ||
243 | int kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs); | ||
244 | |||
245 | regs is as described for kprobe.pre_handler. ri points to the | ||
246 | kretprobe_instance object, of which the following fields may be | ||
247 | of interest: | ||
248 | - ret_addr: the return address | ||
249 | - rp: points to the corresponding kretprobe object | ||
250 | - task: points to the corresponding task struct | ||
251 | The handler's return value is currently ignored. | ||
252 | |||
253 | 4.4 unregister_*probe | ||
254 | |||
255 | #include <linux/kprobes.h> | ||
256 | void unregister_kprobe(struct kprobe *kp); | ||
257 | void unregister_jprobe(struct jprobe *jp); | ||
258 | void unregister_kretprobe(struct kretprobe *rp); | ||
259 | |||
260 | Removes the specified probe. The unregister function can be called | ||
261 | at any time after the probe has been registered. | ||
262 | |||
263 | 5. Kprobes Features and Limitations | ||
264 | |||
265 | As of Linux v2.6.12, Kprobes allows multiple probes at the same | ||
266 | address. Currently, however, there cannot be multiple jprobes on | ||
267 | the same function at the same time. | ||
268 | |||
269 | In general, you can install a probe anywhere in the kernel. | ||
270 | In particular, you can probe interrupt handlers. Known exceptions | ||
271 | are discussed in this section. | ||
272 | |||
273 | For obvious reasons, it's a bad idea to install a probe in | ||
274 | the code that implements Kprobes (mostly kernel/kprobes.c and | ||
275 | arch/*/kernel/kprobes.c). A patch in the v2.6.13 timeframe instructs | ||
276 | Kprobes to reject such requests. | ||
277 | |||
278 | If you install a probe in an inline-able function, Kprobes makes | ||
279 | no attempt to chase down all inline instances of the function and | ||
280 | install probes there. gcc may inline a function without being asked, | ||
281 | so keep this in mind if you're not seeing the probe hits you expect. | ||
282 | |||
283 | A probe handler can modify the environment of the probed function | ||
284 | -- e.g., by modifying kernel data structures, or by modifying the | ||
285 | contents of the pt_regs struct (which are restored to the registers | ||
286 | upon return from the breakpoint). So Kprobes can be used, for example, | ||
287 | to install a bug fix or to inject faults for testing. Kprobes, of | ||
288 | course, has no way to distinguish the deliberately injected faults | ||
289 | from the accidental ones. Don't drink and probe. | ||
290 | |||
291 | Kprobes makes no attempt to prevent probe handlers from stepping on | ||
292 | each other -- e.g., probing printk() and then calling printk() from a | ||
293 | probe handler. As of Linux v2.6.12, if a probe handler hits a probe, | ||
294 | that second probe's handlers won't be run in that instance. | ||
295 | |||
296 | In Linux v2.6.12 and previous versions, Kprobes' data structures are | ||
297 | protected by a single lock that is held during probe registration and | ||
298 | unregistration and while handlers are run. Thus, no two handlers | ||
299 | can run simultaneously. To improve scalability on SMP systems, | ||
300 | this restriction will probably be removed soon, in which case | ||
301 | multiple handlers (or multiple instances of the same handler) may | ||
302 | run concurrently on different CPUs. Code your handlers accordingly. | ||
303 | |||
304 | Kprobes does not use semaphores or allocate memory except during | ||
305 | registration and unregistration. | ||
306 | |||
307 | Probe handlers are run with preemption disabled. Depending on the | ||
308 | architecture, handlers may also run with interrupts disabled. In any | ||
309 | case, your handler should not yield the CPU (e.g., by attempting to | ||
310 | acquire a semaphore). | ||
311 | |||
312 | Since a return probe is implemented by replacing the return | ||
313 | address with the trampoline's address, stack backtraces and calls | ||
314 | to __builtin_return_address() will typically yield the trampoline's | ||
315 | address instead of the real return address for kretprobed functions. | ||
316 | (As far as we can tell, __builtin_return_address() is used only | ||
317 | for instrumentation and error reporting.) | ||
318 | |||
319 | If the number of times a function is called does not match the | ||
320 | number of times it returns, registering a return probe on that | ||
321 | function may produce undesirable results. We have the do_exit() | ||
322 | and do_execve() cases covered. do_fork() is not an issue. We're | ||
323 | unaware of other specific cases where this could be a problem. | ||
324 | |||
325 | 6. Probe Overhead | ||
326 | |||
327 | On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0 | ||
328 | microseconds to process. Specifically, a benchmark that hits the same | ||
329 | probepoint repeatedly, firing a simple handler each time, reports 1-2 | ||
330 | million hits per second, depending on the architecture. A jprobe or | ||
331 | return-probe hit typically takes 50-75% longer than a kprobe hit. | ||
332 | When you have a return probe set on a function, adding a kprobe at | ||
333 | the entry to that function adds essentially no overhead. | ||
334 | |||
335 | Here are sample overhead figures (in usec) for different architectures. | ||
336 | k = kprobe; j = jprobe; r = return probe; kr = kprobe + return probe | ||
337 | on same function; jr = jprobe + return probe on same function | ||
338 | |||
339 | i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips | ||
340 | k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40 | ||
341 | |||
342 | x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips | ||
343 | k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07 | ||
344 | |||
345 | ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU) | ||
346 | k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 | ||
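The relationship between per-hit cost and hit rate, and the "50-75% longer" comparison, are simple arithmetic. The following user-space C sketch spells it out; hits_per_second() and percent_longer() are hypothetical helper names used only for this illustration.

```c
/* Hits per second sustained if each hit costs `usec` microseconds. */
static double hits_per_second(double usec)
{
	return 1e6 / usec;
}

/* How much longer, in percent, a hit costing b_usec takes than
 * one costing a_usec. */
static double percent_longer(double a_usec, double b_usec)
{
	return (b_usec - a_usec) / a_usec * 100.0;
}
```

Applied to the figures above: 0.5 and 1.0 usec per hit correspond to 2 million and 1 million hits per second. On the i386 sample, k = 0.57 usec works out to about 1.75 million hits per second, and j = 1.00 is roughly 75% longer than k. On x86_64, j = 0.76 is about 55% longer than k = 0.49.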
347 | |||
348 | 7. TODO | ||
349 | |||
350 | a. SystemTap (http://sourceware.org/systemtap): Work in progress | ||
351 | to provide a simplified programming interface for probe-based | ||
352 | instrumentation. | ||
353 | b. Improved SMP scalability: Currently, work is in progress to handle | ||
354 | multiple kprobes in parallel. | ||
355 | c. Kernel return probes for sparc64. | ||
356 | d. Support for other architectures. | ||
357 | e. User-space probes. | ||
358 | |||
359 | 8. Kprobes Example | ||
360 | |||
361 | Here's a sample kernel module showing the use of kprobes to dump a | ||
362 | stack trace and selected i386 registers when do_fork() is called. | ||
363 | ----- cut here ----- | ||
364 | /*kprobe-example.c*/ | ||
365 | #include <linux/kernel.h> | ||
366 | #include <linux/module.h> | ||
367 | #include <linux/kprobes.h> | ||
368 | #include <linux/kallsyms.h> | ||
369 | #include <linux/sched.h> | ||
370 | |||
371 | /*For each probe you need to allocate a kprobe structure*/ | ||
372 | static struct kprobe kp; | ||
373 | |||
374 | /*kprobe pre_handler: called just before the probed instruction is executed*/ | ||
375 | int handler_pre(struct kprobe *p, struct pt_regs *regs) | ||
376 | { | ||
377 | printk("pre_handler: p->addr=0x%p, eip=%lx, eflags=0x%lx\n", | ||
378 | p->addr, regs->eip, regs->eflags); | ||
379 | dump_stack(); | ||
380 | return 0; | ||
381 | } | ||
382 | |||
383 | /*kprobe post_handler: called after the probed instruction is executed*/ | ||
384 | void handler_post(struct kprobe *p, struct pt_regs *regs, unsigned long flags) | ||
385 | { | ||
386 | printk("post_handler: p->addr=0x%p, eflags=0x%lx\n", | ||
387 | p->addr, regs->eflags); | ||
388 | } | ||
389 | |||
390 | /* fault_handler: this is called if an exception is generated for any | ||
391 | * instruction within the pre- or post-handler, or when Kprobes | ||
392 | * single-steps the probed instruction. | ||
393 | */ | ||
394 | int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr) | ||
395 | { | ||
396 | printk("fault_handler: p->addr=0x%p, trap #%d\n", | ||
397 | p->addr, trapnr); | ||
398 | /* Return 0 because we don't handle the fault. */ | ||
399 | return 0; | ||
400 | } | ||
401 | |||
402 | int init_module(void) | ||
403 | { | ||
404 | int ret; | ||
405 | kp.pre_handler = handler_pre; | ||
406 | kp.post_handler = handler_post; | ||
407 | kp.fault_handler = handler_fault; | ||
408 | kp.addr = (kprobe_opcode_t*) kallsyms_lookup_name("do_fork"); | ||
409 | /* register the kprobe now */ | ||
410 | if (!kp.addr) { | ||
411 | printk("Couldn't find %s to plant kprobe\n", "do_fork"); | ||
412 | return -1; | ||
413 | } | ||
414 | if ((ret = register_kprobe(&kp)) < 0) { | ||
415 | printk("register_kprobe failed, returned %d\n", ret); | ||
416 | return -1; | ||
417 | } | ||
418 | printk("kprobe registered\n"); | ||
419 | return 0; | ||
420 | } | ||
421 | |||
422 | void cleanup_module(void) | ||
423 | { | ||
424 | unregister_kprobe(&kp); | ||
425 | printk("kprobe unregistered\n"); | ||
426 | } | ||
427 | |||
428 | MODULE_LICENSE("GPL"); | ||
429 | ----- cut here ----- | ||
430 | |||
431 | You can build the kernel module, kprobe-example.ko, using the following | ||
432 | Makefile: | ||
433 | ----- cut here ----- | ||
434 | obj-m := kprobe-example.o | ||
435 | KDIR := /lib/modules/$(shell uname -r)/build | ||
436 | PWD := $(shell pwd) | ||
437 | default: | ||
438 | $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules | ||
439 | clean: | ||
440 | rm -f *.mod.c *.ko *.o | ||
441 | ----- cut here ----- | ||
442 | |||
443 | $ make | ||
444 | $ su - | ||
445 | ... | ||
446 | # insmod kprobe-example.ko | ||
447 | |||
448 | You will see the trace data in /var/log/messages and on the console | ||
449 | whenever do_fork() is invoked to create a new process. | ||
450 | |||
451 | 9. Jprobes Example | ||
452 | |||
453 | Here's a sample kernel module showing the use of jprobes to dump | ||
454 | the arguments of do_fork(). | ||
455 | ----- cut here ----- | ||
456 | /*jprobe-example.c */ | ||
457 | #include <linux/kernel.h> | ||
458 | #include <linux/module.h> | ||
459 | #include <linux/fs.h> | ||
460 | #include <linux/uio.h> | ||
461 | #include <linux/kprobes.h> | ||
462 | #include <linux/kallsyms.h> | ||
463 | |||
464 | /* | ||
465 | * Jumper probe for do_fork. | ||
466 | * Mirror principle enables access to arguments of the probed routine | ||
467 | * from the probe handler. | ||
468 | */ | ||
469 | |||
470 | /* Proxy routine having the same arguments as actual do_fork() routine */ | ||
471 | long jdo_fork(unsigned long clone_flags, unsigned long stack_start, | ||
472 | struct pt_regs *regs, unsigned long stack_size, | ||
473 | int __user * parent_tidptr, int __user * child_tidptr) | ||
474 | { | ||
475 | printk("jprobe: clone_flags=0x%lx, stack_size=0x%lx, regs=0x%p\n", | ||
476 | clone_flags, stack_size, regs); | ||
477 | /* Always end with a call to jprobe_return(). */ | ||
478 | jprobe_return(); | ||
479 | /*NOTREACHED*/ | ||
480 | return 0; | ||
481 | } | ||
482 | |||
483 | static struct jprobe my_jprobe = { | ||
484 | .entry = (kprobe_opcode_t *) jdo_fork | ||
485 | }; | ||
486 | |||
487 | int init_module(void) | ||
488 | { | ||
489 | int ret; | ||
490 | my_jprobe.kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("do_fork"); | ||
491 | if (!my_jprobe.kp.addr) { | ||
492 | printk("Couldn't find %s to plant jprobe\n", "do_fork"); | ||
493 | return -1; | ||
494 | } | ||
495 | |||
496 | if ((ret = register_jprobe(&my_jprobe)) <0) { | ||
497 | printk("register_jprobe failed, returned %d\n", ret); | ||
498 | return -1; | ||
499 | } | ||
500 | printk("Planted jprobe at %p, handler addr %p\n", | ||
501 | my_jprobe.kp.addr, my_jprobe.entry); | ||
502 | return 0; | ||
503 | } | ||
504 | |||
505 | void cleanup_module(void) | ||
506 | { | ||
507 | unregister_jprobe(&my_jprobe); | ||
508 | printk("jprobe unregistered\n"); | ||
509 | } | ||
510 | |||
511 | MODULE_LICENSE("GPL"); | ||
512 | ----- cut here ----- | ||
513 | |||
514 | Build and insert the kernel module as shown in the above kprobe | ||
515 | example. You will see the trace data in /var/log/messages and on | ||
516 | the console whenever do_fork() is invoked to create a new process. | ||
517 | (Some messages may be suppressed if syslogd is configured to | ||
518 | eliminate duplicate messages.) | ||
519 | |||
520 | 10. Kretprobes Example | ||
521 | |||
522 | Here's a sample kernel module showing the use of return probes to | ||
523 | report failed calls to sys_open(). | ||
524 | ----- cut here ----- | ||
525 | /*kretprobe-example.c*/ | ||
526 | #include <linux/kernel.h> | ||
527 | #include <linux/module.h> | ||
528 | #include <linux/kprobes.h> | ||
529 | #include <linux/kallsyms.h> | ||
530 | |||
531 | static const char *probed_func = "sys_open"; | ||
532 | |||
533 | /* Return-probe handler: If the probed function fails, log the return value. */ | ||
534 | static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs) | ||
535 | { | ||
536 | // Substitute the appropriate register name for your architecture -- | ||
537 | // e.g., regs->rax for x86_64, regs->gpr[3] for ppc64. | ||
538 | int retval = (int) regs->eax; | ||
539 | if (retval < 0) { | ||
540 | printk("%s returns %d\n", probed_func, retval); | ||
541 | } | ||
542 | return 0; | ||
543 | } | ||
544 | |||
545 | static struct kretprobe my_kretprobe = { | ||
546 | .handler = ret_handler, | ||
547 | /* Probe up to 20 instances concurrently. */ | ||
548 | .maxactive = 20 | ||
549 | }; | ||
550 | |||
551 | int init_module(void) | ||
552 | { | ||
553 | int ret; | ||
554 | my_kretprobe.kp.addr = | ||
555 | (kprobe_opcode_t *) kallsyms_lookup_name(probed_func); | ||
556 | if (!my_kretprobe.kp.addr) { | ||
557 | printk("Couldn't find %s to plant return probe\n", probed_func); | ||
558 | return -1; | ||
559 | } | ||
560 | if ((ret = register_kretprobe(&my_kretprobe)) < 0) { | ||
561 | printk("register_kretprobe failed, returned %d\n", ret); | ||
562 | return -1; | ||
563 | } | ||
564 | printk("Planted return probe at %p\n", my_kretprobe.kp.addr); | ||
565 | return 0; | ||
566 | } | ||
567 | |||
568 | void cleanup_module(void) | ||
569 | { | ||
570 | unregister_kretprobe(&my_kretprobe); | ||
571 | printk("kretprobe unregistered\n"); | ||
572 | /* nmissed > 0 suggests that maxactive was set too low. */ | ||
573 | printk("Missed probing %d instances of %s\n", | ||
574 | my_kretprobe.nmissed, probed_func); | ||
575 | } | ||
576 | |||
577 | MODULE_LICENSE("GPL"); | ||
578 | ----- cut here ----- | ||
579 | |||
580 | Build and insert the kernel module as shown in the above kprobe | ||
581 | example. You will see the trace data in /var/log/messages and on the | ||
582 | console whenever sys_open() returns a negative value. (Some messages | ||
583 | may be suppressed if syslogd is configured to eliminate duplicate | ||
584 | messages.) | ||
585 | |||
586 | For additional information on Kprobes, refer to the following URLs: | ||
587 | http://www-106.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe | ||
588 | http://www.redhat.com/magazine/005mar05/features/kprobes/ | ||
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 0bc2ed136a38..24d029455baa 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt | |||
@@ -1,5 +1,7 @@ | |||
1 | 1 | ||
2 | Linux Ethernet Bonding Driver HOWTO | 2 | Linux Ethernet Bonding Driver HOWTO |
3 | |||
4 | Latest update: 21 June 2005 | ||
3 | 5 | ||
4 | Initial release : Thomas Davis <tadavis at lbl.gov> | 6 | Initial release : Thomas Davis <tadavis at lbl.gov> |
5 | Corrections, HA extensions : 2000/10/03-15 : | 7 | Corrections, HA extensions : 2000/10/03-15 : |
@@ -11,15 +13,22 @@ Corrections, HA extensions : 2000/10/03-15 : | |||
11 | 13 | ||
12 | Reorganized and updated Feb 2005 by Jay Vosburgh | 14 | Reorganized and updated Feb 2005 by Jay Vosburgh |
13 | 15 | ||
14 | Note : | 16 | Introduction |
15 | ------ | 17 | ============ |
18 | |||
19 | The Linux bonding driver provides a method for aggregating | ||
20 | multiple network interfaces into a single logical "bonded" interface. | ||
21 | The behavior of the bonded interfaces depends upon the mode; generally | ||
22 | speaking, modes provide either hot standby or load balancing services. | ||
23 | Additionally, link integrity monitoring may be performed. | ||
16 | 24 | ||
17 | The bonding driver originally came from Donald Becker's beowulf patches for | 25 | The bonding driver originally came from Donald Becker's |
18 | kernel 2.0. It has changed quite a bit since, and the original tools from | 26 | beowulf patches for kernel 2.0. It has changed quite a bit since, and |
19 | extreme-linux and beowulf sites will not work with this version of the driver. | 27 | the original tools from extreme-linux and beowulf sites will not work |
28 | with this version of the driver. | ||
20 | 29 | ||
21 | For new versions of the driver, patches for older kernels and the updated | 30 | For new versions of the driver, updated userspace tools, and |
22 | userspace tools, please follow the links at the end of this file. | 31 | who to ask for help, please follow the links at the end of this file. |
23 | 32 | ||
24 | Table of Contents | 33 | Table of Contents |
25 | ================= | 34 | ================= |
@@ -30,9 +39,13 @@ Table of Contents | |||
30 | 39 | ||
31 | 3. Configuring Bonding Devices | 40 | 3. Configuring Bonding Devices |
32 | 3.1 Configuration with sysconfig support | 41 | 3.1 Configuration with sysconfig support |
42 | 3.1.1 Using DHCP with sysconfig | ||
43 | 3.1.2 Configuring Multiple Bonds with sysconfig | ||
33 | 3.2 Configuration with initscripts support | 44 | 3.2 Configuration with initscripts support |
45 | 3.2.1 Using DHCP with initscripts | ||
46 | 3.2.2 Configuring Multiple Bonds with initscripts | ||
34 | 3.3 Configuring Bonding Manually | 47 | 3.3 Configuring Bonding Manually |
35 | 3.4 Configuring Multiple Bonds | 48 | 3.3.1 Configuring Multiple Bonds Manually |
36 | 49 | ||
37 | 5. Querying Bonding Configuration | 50 | 5. Querying Bonding Configuration |
38 | 5.1 Bonding Configuration | 51 | 5.1 Bonding Configuration |
@@ -56,21 +69,30 @@ Table of Contents | |||
56 | 69 | ||
57 | 11. Promiscuous mode | 70 | 11. Promiscuous mode |
58 | 71 | ||
59 | 12. High Availability Information | 72 | 12. Configuring Bonding for High Availability |
60 | 12.1 High Availability in a Single Switch Topology | 73 | 12.1 High Availability in a Single Switch Topology |
61 | 12.1.1 Bonding Mode Selection for Single Switch Topology | ||
62 | 12.1.2 Link Monitoring for Single Switch Topology | ||
63 | 12.2 High Availability in a Multiple Switch Topology | 74 | 12.2 High Availability in a Multiple Switch Topology |
64 | 12.2.1 Bonding Mode Selection for Multiple Switch Topology | 75 | 12.2.1 HA Bonding Mode Selection for Multiple Switch Topology |
65 | 12.2.2 Link Monitoring for Multiple Switch Topology | 76 | 12.2.2 HA Link Monitoring for Multiple Switch Topology |
66 | 12.3 Switch Behavior Issues for High Availability | 77 | |
78 | 13. Configuring Bonding for Maximum Throughput | ||
79 | 13.1 Maximum Throughput in a Single Switch Topology | ||
80 | 13.1.1 MT Bonding Mode Selection for Single Switch Topology | ||
81 | 13.1.2 MT Link Monitoring for Single Switch Topology | ||
82 | 13.2 Maximum Throughput in a Multiple Switch Topology | ||
83 | 13.2.1 MT Bonding Mode Selection for Multiple Switch Topology | ||
84 | 13.2.2 MT Link Monitoring for Multiple Switch Topology | ||
67 | 85 | ||
68 | 13. Hardware Specific Considerations | 86 | 14. Switch Behavior Issues |
69 | 13.1 IBM BladeCenter | 87 | 14.1 Link Establishment and Failover Delays |
88 | 14.2 Duplicated Incoming Packets | ||
70 | 89 | ||
71 | 14. Frequently Asked Questions | 90 | 15. Hardware Specific Considerations |
91 | 15.1 IBM BladeCenter | ||
72 | 92 | ||
73 | 15. Resources and Links | 93 | 16. Frequently Asked Questions |
94 | |||
95 | 17. Resources and Links | ||
74 | 96 | ||
75 | 97 | ||
76 | 1. Bonding Driver Installation | 98 | 1. Bonding Driver Installation |
@@ -86,16 +108,10 @@ the following steps: | |||
86 | 1.1 Configure and build the kernel with bonding | 108 | 1.1 Configure and build the kernel with bonding |
87 | ----------------------------------------------- | 109 | ----------------------------------------------- |
88 | 110 | ||
89 | The latest version of the bonding driver is available in the | 111 | The current version of the bonding driver is available in the |
90 | drivers/net/bonding subdirectory of the most recent kernel source | 112 | drivers/net/bonding subdirectory of the most recent kernel source |
91 | (which is available on http://kernel.org). | 113 | (which is available on http://kernel.org). Most users "rolling their |
92 | 114 | own" will want to use the most recent kernel from kernel.org. | |
93 | Prior to the 2.4.11 kernel, the bonding driver was maintained | ||
94 | largely outside the kernel tree; patches for some earlier kernels are | ||
95 | available on the bonding sourceforge site, although those patches are | ||
96 | still several years out of date. Most users will want to use either | ||
97 | the most recent kernel from kernel.org or whatever kernel came with | ||
98 | their distro. | ||
99 | 115 | ||
100 | Configure kernel with "make menuconfig" (or "make xconfig" or | 116 | Configure kernel with "make menuconfig" (or "make xconfig" or |
101 | "make config"), then select "Bonding driver support" in the "Network | 117 | "make config"), then select "Bonding driver support" in the "Network |
@@ -103,8 +119,8 @@ device support" section. It is recommended that you configure the | |||
103 | driver as module since it is currently the only way to pass parameters | 119 | driver as module since it is currently the only way to pass parameters |
104 | to the driver or configure more than one bonding device. | 120 | to the driver or configure more than one bonding device. |
105 | 121 | ||
106 | Build and install the new kernel and modules, then proceed to | 122 | Build and install the new kernel and modules, then continue |
107 | step 2. | 123 | below to install ifenslave. |
108 | 124 | ||
109 | 1.2 Install ifenslave Control Utility | 125 | 1.2 Install ifenslave Control Utility |
110 | ------------------------------------- | 126 | ------------------------------------- |
@@ -147,9 +163,9 @@ default kernel source include directory. | |||
147 | Options for the bonding driver are supplied as parameters to | 163 | Options for the bonding driver are supplied as parameters to |
148 | the bonding module at load time. They may be given as command line | 164 | the bonding module at load time. They may be given as command line |
149 | arguments to the insmod or modprobe command, but are usually specified | 165 | arguments to the insmod or modprobe command, but are usually specified |
150 | in either the /etc/modprobe.conf configuration file, or in a | 166 | in either the /etc/modules.conf or /etc/modprobe.conf configuration |
151 | distro-specific configuration file (some of which are detailed in the | 167 | file, or in a distro-specific configuration file (some of which are |
152 | next section). | 168 | detailed in the next section). |
153 | 169 | ||
154 | The available bonding driver parameters are listed below. If a | 170 | The available bonding driver parameters are listed below. If a |
155 | parameter is not specified the default value is used. When initially | 171 | parameter is not specified the default value is used. When initially |
@@ -162,34 +178,34 @@ degradation will occur during link failures. Very few devices do not | |||
162 | support at least miimon, so there is really no reason not to use it. | 178 | support at least miimon, so there is really no reason not to use it. |
163 | 179 | ||
164 | Options with textual values will accept either the text name | 180 | Options with textual values will accept either the text name |
165 | or, for backwards compatibility, the option value. E.g., | 181 | or, for backwards compatibility, the option value. E.g., |
166 | "mode=802.3ad" and "mode=4" set the same mode. | 182 | "mode=802.3ad" and "mode=4" set the same mode. |
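The name/number equivalence described above can be sketched in C as a small lookup table. This is an illustration only, not the driver's own option parser: parse_mode() and mode_table are hypothetical names, and the table simply restates the mode list from this document (balance-rr=0 through balance-alb=6).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table pairing each bonding mode's text name with its number. */
struct mode_entry {
	const char *name;
	int value;
};

static const struct mode_entry mode_table[] = {
	{ "balance-rr",    0 },
	{ "active-backup", 1 },
	{ "balance-xor",   2 },
	{ "broadcast",     3 },
	{ "802.3ad",       4 },
	{ "balance-tlb",   5 },
	{ "balance-alb",   6 },
};

/*
 * Hypothetical helper: accept either the text name or the numeric value
 * and return the mode number, or -1 if the argument matches neither.
 */
static int parse_mode(const char *arg)
{
	size_t n = sizeof(mode_table) / sizeof(mode_table[0]);
	size_t i;
	char *end;
	long v;

	assert(arg != NULL);

	/* Try the text names first, so "802.3ad" is not read as a number. */
	for (i = 0; i < n; i++)
		if (strcmp(arg, mode_table[i].name) == 0)
			return mode_table[i].value;

	/* Fall back to a purely numeric argument such as "4". */
	v = strtol(arg, &end, 10);
	if (end == arg || *end != '\0')
		return -1;
	for (i = 0; i < n; i++)
		if (mode_table[i].value == v)
			return mode_table[i].value;
	return -1;
}
```

With this sketch, parse_mode("802.3ad") and parse_mode("4") yield the same mode number, while an unknown name or an out-of-range number yields -1.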
167 | 183 | ||
168 | The parameters are as follows: | 184 | The parameters are as follows: |
169 | 185 | ||
170 | arp_interval | 186 | arp_interval |
171 | 187 | ||
172 | Specifies the ARP monitoring frequency in milli-seconds. If | 188 | Specifies the ARP link monitoring frequency in milliseconds. |
173 | ARP monitoring is used in a load-balancing mode (mode 0 or 2), | 189 | If ARP monitoring is used in an etherchannel compatible mode |
174 | the switch should be configured in a mode that evenly | 190 | (modes 0 and 2), the switch should be configured in a mode |
175 | distributes packets across all links - such as round-robin. If | 191 | that evenly distributes packets across all links. If the |
176 | the switch is configured to distribute the packets in an XOR | 192 | switch is configured to distribute the packets in an XOR |
177 | fashion, all replies from the ARP targets will be received on | 193 | fashion, all replies from the ARP targets will be received on |
178 | the same link which could cause the other team members to | 194 | the same link which could cause the other team members to |
179 | fail. ARP monitoring should not be used in conjunction with | 195 | fail. ARP monitoring should not be used in conjunction with |
180 | miimon. A value of 0 disables ARP monitoring. The default | 196 | miimon. A value of 0 disables ARP monitoring. The default |
181 | value is 0. | 197 | value is 0. |
182 | 198 | ||
183 | arp_ip_target | 199 | arp_ip_target |
184 | 200 | ||
185 | Specifies the ip addresses to use when arp_interval is > 0. | 201 | Specifies the IP addresses to use as ARP monitoring peers when |
186 | These are the targets of the ARP request sent to determine the | 202 | arp_interval is > 0. These are the targets of the ARP request |
187 | health of the link to the targets. Specify these values in | 203 | sent to determine the health of the link to the targets. |
188 | ddd.ddd.ddd.ddd format. Multiple ip adresses must be | 204 | Specify these values in ddd.ddd.ddd.ddd format. Multiple IP |
189 | seperated by a comma. At least one IP address must be given | 205 | addresses must be separated by a comma. At least one IP |
190 | for ARP monitoring to function. The maximum number of targets | 206 | address must be given for ARP monitoring to function. The |
191 | that can be specified is 16. The default value is no IP | 207 | maximum number of targets that can be specified is 16. The |
192 | addresses. | 208 | default value is no IP addresses. |
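The arp_ip_target rules above (dotted-decimal addresses, comma separated, at least one and at most 16) can be checked with a short userspace sketch. count_arp_targets() and MAX_ARP_TARGETS are hypothetical names introduced for illustration; the driver's actual parsing may differ. The sketch relies on the POSIX inet_pton() call to validate each address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>

#define MAX_ARP_TARGETS 16	/* limit stated in the arp_ip_target text */

/*
 * Hypothetical helper: return the number of addresses in a
 * comma-separated list, or -1 if the list is empty, contains an
 * invalid IPv4 address, or names more than MAX_ARP_TARGETS targets.
 */
static int count_arp_targets(const char *list)
{
	char buf[INET_ADDRSTRLEN];
	struct in_addr addr;
	int count = 0;

	assert(list != NULL);

	while (*list != '\0') {
		size_t len = strcspn(list, ",");

		if (len == 0 || len >= sizeof(buf))
			return -1;	/* empty or oversized field */
		memcpy(buf, list, len);
		buf[len] = '\0';

		if (inet_pton(AF_INET, buf, &addr) != 1)
			return -1;	/* not a valid dotted-decimal address */
		if (++count > MAX_ARP_TARGETS)
			return -1;	/* too many targets */

		list += len;
		if (*list == ',') {
			list++;
			if (*list == '\0')
				return -1;	/* trailing comma */
		}
	}
	return count > 0 ? count : -1;
}
```

A list such as "10.0.0.1,10.0.0.2" is accepted with a count of 2; an empty string, a trailing comma, or a seventeenth address is rejected.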
193 | 209 | ||
194 | downdelay | 210 | downdelay |
195 | 211 | ||
@@ -207,11 +223,13 @@ lacp_rate | |||
207 | are: | 223 | are: |
208 | 224 | ||
209 | slow or 0 | 225 | slow or 0 |
210 | Request partner to transmit LACPDUs every 30 seconds (default) | 226 | Request partner to transmit LACPDUs every 30 seconds |
211 | 227 | ||
212 | fast or 1 | 228 | fast or 1 |
213 | Request partner to transmit LACPDUs every 1 second | 229 | Request partner to transmit LACPDUs every 1 second |
214 | 230 | ||
231 | The default is slow. | ||
232 | |||
215 | max_bonds | 233 | max_bonds |
216 | 234 | ||
217 | Specifies the number of bonding devices to create for this | 235 | Specifies the number of bonding devices to create for this |
@@ -221,10 +239,11 @@ max_bonds | |||
221 | 239 | ||
222 | miimon | 240 | miimon |
223 | 241 | ||
224 | Specifies the frequency in milli-seconds that MII link | 242 | Specifies the MII link monitoring frequency in milliseconds. |
225 | monitoring will occur. A value of zero disables MII link | 243 | This determines how often the link state of each slave is |
226 | monitoring. A value of 100 is a good starting point. The | 244 | inspected for link failures. A value of zero disables MII |
227 | use_carrier option, below, affects how the link state is | 245 | link monitoring. A value of 100 is a good starting point. |
246 | The use_carrier option, below, affects how the link state is | ||
228 | determined. See the High Availability section for additional | 247 | determined. See the High Availability section for additional |
229 | information. The default value is 0. | 248 | information. The default value is 0. |
230 | 249 | ||
@@ -246,17 +265,31 @@ mode | |||
246 | active. A different slave becomes active if, and only | 265 | active. A different slave becomes active if, and only |
247 | if, the active slave fails. The bond's MAC address is | 266 | if, the active slave fails. The bond's MAC address is |
248 | externally visible on only one port (network adapter) | 267 | externally visible on only one port (network adapter) |
249 | to avoid confusing the switch. This mode provides | 268 | to avoid confusing the switch. |
250 | fault tolerance. The primary option affects the | 269 | |
251 | behavior of this mode. | 270 | In bonding version 2.6.2 or later, when a failover |
271 | occurs in active-backup mode, bonding will issue one | ||
272 | or more gratuitous ARPs on the newly active slave. | ||
273 | One gratuitous ARP is issued for the bonding master | ||
274 | interface and each VLAN interface configured above | ||
275 | it, provided that the interface has at least one IP | ||
276 | address configured. Gratuitous ARPs issued for VLAN | ||
277 | interfaces are tagged with the appropriate VLAN id. | ||
278 | |||
279 | This mode provides fault tolerance. The primary | ||
280 | option, documented below, affects the behavior of this | ||
281 | mode. | ||
252 | 282 | ||
253 | balance-xor or 2 | 283 | balance-xor or 2 |
254 | 284 | ||
255 | XOR policy: Transmit based on [(source MAC address | 285 | XOR policy: Transmit based on the selected transmit |
256 | XOR'd with destination MAC address) modulo slave | 286 | hash policy. The default policy is a simple [(source |
257 | count]. This selects the same slave for each | 287 | MAC address XOR'd with destination MAC address) modulo |
258 | destination MAC address. This mode provides load | 288 | slave count]. Alternate transmit policies may be |
259 | balancing and fault tolerance. | 289 | selected via the xmit_hash_policy option, described |
290 | below. | ||
291 | |||
292 | This mode provides load balancing and fault tolerance. | ||
260 | 293 | ||
261 | broadcast or 3 | 294 | broadcast or 3 |
262 | 295 | ||
@@ -270,7 +303,17 @@ mode | |||
270 | duplex settings. Utilizes all slaves in the active | 303 | duplex settings. Utilizes all slaves in the active |
271 | aggregator according to the 802.3ad specification. | 304 | aggregator according to the 802.3ad specification. |
272 | 305 | ||
273 | Pre-requisites: | 306 | Slave selection for outgoing traffic is done according |
307 | to the transmit hash policy, which may be changed from | ||
308 | the default simple XOR policy via the xmit_hash_policy | ||
309 | option, documented below. Note that not all transmit | ||
310 | policies may be 802.3ad compliant, particularly in | ||
311 | regards to the packet mis-ordering requirements of | ||
312 | section 43.2.4 of the 802.3ad standard. Differing | ||
313 | peer implementations will have varying tolerances for | ||
314 | noncompliance. | ||
315 | |||
316 | Prerequisites: | ||
274 | 317 | ||
275 | 1. Ethtool support in the base drivers for retrieving | 318 | 1. Ethtool support in the base drivers for retrieving |
276 | the speed and duplex of each slave. | 319 | the speed and duplex of each slave. |
@@ -333,7 +376,7 @@ mode | |||
333 | 376 | ||
334 | When a link is reconnected or a new slave joins the | 377 | When a link is reconnected or a new slave joins the |
335 | bond the receive traffic is redistributed among all | 378 | bond the receive traffic is redistributed among all |
336 | active slaves in the bond by intiating ARP Replies | 379 | active slaves in the bond by initiating ARP Replies |
337 | with the selected mac address to each of the | 380 | with the selected mac address to each of the |
338 | clients. The updelay parameter (detailed below) must | 381 | clients. The updelay parameter (detailed below) must |
339 | be set to a value equal or greater than the switch's | 382 | be set to a value equal or greater than the switch's |
@@ -396,6 +439,60 @@ use_carrier | |||
396 | 0 will use the deprecated MII / ETHTOOL ioctls. The default | 439 | 0 will use the deprecated MII / ETHTOOL ioctls. The default |
397 | value is 1. | 440 | value is 1. |
398 | 441 | ||
442 | xmit_hash_policy | ||
443 | |||
444 | Selects the transmit hash policy to use for slave selection in | ||
445 | balance-xor and 802.3ad modes. Possible values are: | ||
446 | |||
447 | layer2 | ||
448 | |||
449 | Uses XOR of hardware MAC addresses to generate the | ||
450 | hash. The formula is | ||
451 | |||
452 | (source MAC XOR destination MAC) modulo slave count | ||
453 | |||
454 | This algorithm will place all traffic to a particular | ||
455 | network peer on the same slave. | ||
456 | |||
457 | This algorithm is 802.3ad compliant. | ||
458 | |||
459 | layer3+4 | ||
460 | |||
461 | This policy uses upper layer protocol information, | ||
462 | when available, to generate the hash. This allows for | ||
463 | traffic to a particular network peer to span multiple | ||
464 | slaves, although a single connection will not span | ||
465 | multiple slaves. | ||
466 | |||
467 | The formula for unfragmented TCP and UDP packets is | ||
468 | |||
469 | ((source port XOR dest port) XOR | ||
470 | ((source IP XOR dest IP) AND 0xffff)) | ||
471 | modulo slave count | ||
472 | |||
473 | For fragmented TCP or UDP packets and all other IP | ||
474 | protocol traffic, the source and destination port | ||
475 | information is omitted. For non-IP traffic, the | ||
476 | formula is the same as for the layer2 transmit hash | ||
477 | policy. | ||
478 | |||
479 | This policy is intended to mimic the behavior of | ||
480 | certain switches, notably Cisco switches with PFC2 as | ||
481 | well as some Foundry and IBM products. | ||
482 | |||
483 | This algorithm is not fully 802.3ad compliant. A | ||
484 | single TCP or UDP conversation containing both | ||
485 | fragmented and unfragmented packets will see packets | ||
486 | striped across two interfaces. This may result in out | ||
487 | of order delivery. Most traffic types will not meet | ||
488 | this criterion, as TCP rarely fragments traffic, and | ||
489 | most UDP traffic is not involved in extended | ||
490 | conversations. Other implementations of 802.3ad may | ||
491 | or may not tolerate this noncompliance. | ||
492 | |||
493 | The default value is layer2. This option was added in bonding | ||
494 | version 2.6.3. In earlier versions of bonding, this parameter does | ||
495 | not exist, and the layer2 policy is the only policy. | ||
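The two transmit hash formulas above can be sketched in userspace C as follows. This is a minimal illustration under stated assumptions, not the driver's code: hash_layer2() and hash_layer34() are hypothetical names, the layer2 sketch XORs only the last octet of each MAC address, and IP addresses and ports are taken in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * layer2: (source MAC XOR destination MAC) modulo slave count.
 * Assumption: only the last octet of each address is hashed.
 */
static unsigned int hash_layer2(const uint8_t src_mac[6],
				const uint8_t dst_mac[6],
				unsigned int slave_count)
{
	assert(slave_count > 0);
	return (unsigned int)(src_mac[5] ^ dst_mac[5]) % slave_count;
}

/*
 * layer3+4, unfragmented TCP/UDP case:
 * ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff))
 *	modulo slave count
 */
static unsigned int hash_layer34(uint32_t src_ip, uint32_t dst_ip,
				 uint16_t src_port, uint16_t dst_port,
				 unsigned int slave_count)
{
	uint32_t l4 = (uint32_t)(src_port ^ dst_port);
	uint32_t l3 = (src_ip ^ dst_ip) & 0xffff;

	assert(slave_count > 0);
	return (l4 ^ l3) % slave_count;
}
```

With two slaves, two connections from 192.168.0.1 to port 80 on 192.168.0.2 that differ only in source port (40000 and 40001) hash to different slaves under layer3+4, whereas layer2 always selects the same slave for a given pair of MAC addresses.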
399 | 496 | ||
400 | 497 | ||
401 | 3. Configuring Bonding Devices | 498 | 3. Configuring Bonding Devices |
@@ -448,8 +545,9 @@ Bonding devices can be managed by hand, however, as follows. | |||
448 | slave devices. On SLES 9, this is most easily done by running the | 545 | slave devices. On SLES 9, this is most easily done by running the |
449 | yast2 sysconfig configuration utility. The goal is to create an | 546 | yast2 sysconfig configuration utility. The goal is to create an |
450 | ifcfg-id file for each slave device. The simplest way to accomplish | 547 | ifcfg-id file for each slave device. The simplest way to accomplish |
451 | this is to configure the devices for DHCP. The name of the | 548 | this is to configure the devices for DHCP (this is only to get the |
452 | configuration file for each device will be of the form: | 549 | ifcfg-id file created; see below for some issues with DHCP). The |
550 | name of the configuration file for each device will be of the form: | ||
453 | 551 | ||
454 | ifcfg-id-xx:xx:xx:xx:xx:xx | 552 | ifcfg-id-xx:xx:xx:xx:xx:xx |
455 | 553 | ||
@@ -459,7 +557,7 @@ the device's permanent MAC address. | |||
459 | Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been | 557 | Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been |
460 | created, it is necessary to edit the configuration files for the slave | 558 | created, it is necessary to edit the configuration files for the slave |
461 | devices (the MAC addresses correspond to those of the slave devices). | 559 | devices (the MAC addresses correspond to those of the slave devices). |
462 | Before editing, the file will contain muliple lines, and will look | 560 | Before editing, the file will contain multiple lines, and will look |
463 | something like this: | 561 | something like this: |
464 | 562 | ||
465 | BOOTPROTO='dhcp' | 563 | BOOTPROTO='dhcp' |
@@ -496,16 +594,11 @@ STARTMODE="onboot" | |||
496 | BONDING_MASTER="yes" | 594 | BONDING_MASTER="yes" |
497 | BONDING_MODULE_OPTS="mode=active-backup miimon=100" | 595 | BONDING_MODULE_OPTS="mode=active-backup miimon=100" |
498 | BONDING_SLAVE0="eth0" | 596 | BONDING_SLAVE0="eth0" |
499 | BONDING_SLAVE1="eth1" | 597 | BONDING_SLAVE1="bus-pci-0000:06:08.1" |
500 | 598 | ||
501 | Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK | 599 | Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK |
502 | values with the appropriate values for your network. | 600 | values with the appropriate values for your network. |
503 | 601 | ||
504 | Note that configuring the bonding device with BOOTPROTO='dhcp' | ||
505 | does not work; the scripts attempt to obtain the device address from | ||
506 | DHCP prior to adding any of the slave devices. Without active slaves, | ||
507 | the DHCP requests are not sent to the network. | ||
508 | |||
509 | The STARTMODE specifies when the device is brought online. | 602 | The STARTMODE specifies when the device is brought online. |
510 | The possible values are: | 603 | The possible values are: |
511 | 604 | ||
@@ -531,9 +624,17 @@ for the bonding mode, link monitoring, and so on here. Do not include | |||
531 | the max_bonds bonding parameter; this will confuse the configuration | 624 | the max_bonds bonding parameter; this will confuse the configuration |
532 | system if you have multiple bonding devices. | 625 | system if you have multiple bonding devices. |
533 | 626 | ||
534 | Finally, supply one BONDING_SLAVEn="ethX" for each slave, | 627 | Finally, supply one BONDING_SLAVEn="slave device" for each |
535 | where "n" is an increasing value, one for each slave, and "ethX" is | 628 | slave, where "n" is an increasing value, one for each slave. The |
536 | the name of the slave device (eth0, eth1, etc). | 629 | "slave device" is either an interface name, e.g., "eth0", or a device |
630 | specifier for the network device. The interface name is easier to | ||
631 | find, but the ethN names are subject to change at boot time if, e.g., | ||
632 | a device early in the sequence has failed. The device specifiers | ||
633 | (bus-pci-0000:06:08.1 in the example above) specify the physical | ||
634 | network device, and will not change unless the device's bus location | ||
635 | changes (for example, it is moved from one PCI slot to another). The | ||
636 | example above uses one of each type for demonstration purposes; most | ||
637 | configurations will choose one or the other for all slave devices. | ||
537 | 638 | ||
538 | When all configuration files have been modified or created, | 639 | When all configuration files have been modified or created, |
539 | networking must be restarted for the configuration changes to take | 640 | networking must be restarted for the configuration changes to take |
@@ -544,7 +645,7 @@ effect. This can be accomplished via the following: | |||
544 | Note that the network control script (/sbin/ifdown) will | 645 | Note that the network control script (/sbin/ifdown) will |
545 | remove the bonding module as part of the network shutdown processing, | 646 | remove the bonding module as part of the network shutdown processing, |
546 | so it is not necessary to remove the module by hand if, e.g., the | 647 | so it is not necessary to remove the module by hand if, e.g., the |
547 | module paramters have changed. | 648 | module parameters have changed. |
548 | 649 | ||
549 | Also, at this writing, YaST/YaST2 will not manage bonding | 650 | Also, at this writing, YaST/YaST2 will not manage bonding |
550 | devices (they do not show bonding interfaces on its list of network | 651 | devices (they do not show bonding interfaces on its list of network |
@@ -559,12 +660,37 @@ format can be found in an example ifcfg template file: | |||
559 | Note that the template does not document the various BONDING_ | 660 | Note that the template does not document the various BONDING_ |
560 | settings described above, but does describe many of the other options. | 661 | settings described above, but does describe many of the other options. |
561 | 662 | ||
663 | 3.1.1 Using DHCP with sysconfig | ||
664 | ------------------------------- | ||
665 | |||
666 | Under sysconfig, configuring a device with BOOTPROTO='dhcp' | ||
667 | will cause it to query DHCP for its IP address information. At this | ||
668 | writing, this does not function for bonding devices; the scripts | ||
669 | attempt to obtain the device address from DHCP prior to adding any of | ||
670 | the slave devices. Without active slaves, the DHCP requests are not | ||
671 | sent to the network. | ||
672 | |||
673 | 3.1.2 Configuring Multiple Bonds with sysconfig | ||
674 | ----------------------------------------------- | ||
675 | |||
676 | The sysconfig network initialization system is capable of | ||
677 | handling multiple bonding devices. All that is necessary is for each | ||
678 | bonding instance to have an appropriately configured ifcfg-bondX file | ||
679 | (as described above). Do not specify the "max_bonds" parameter to any | ||
680 | instance of bonding, as this will confuse sysconfig. If you require | ||
681 | multiple bonding devices with identical parameters, create multiple | ||
682 | ifcfg-bondX files. | ||
683 | |||
684 | Because the sysconfig scripts supply the bonding module | ||
685 | options in the ifcfg-bondX file, it is not necessary to add them to | ||
686 | the system /etc/modules.conf or /etc/modprobe.conf configuration file. | ||
687 | |||
562 | 3.2 Configuration with initscripts support | 688 | 3.2 Configuration with initscripts support |
563 | ------------------------------------------ | 689 | ------------------------------------------ |
564 | 690 | ||
565 | This section applies to distros using a version of initscripts | 691 | This section applies to distros using a version of initscripts |
566 | with bonding support, for example, Red Hat Linux 9 or Red Hat | 692 | with bonding support, for example, Red Hat Linux 9 or Red Hat |
567 | Enterprise Linux version 3. On these systems, the network | 693 | Enterprise Linux version 3 or 4. On these systems, the network |
568 | initialization scripts have some knowledge of bonding, and can be | 694 | initialization scripts have some knowledge of bonding, and can be |
569 | configured to control bonding devices. | 695 | configured to control bonding devices. |
570 | 696 | ||
@@ -614,10 +740,11 @@ USERCTL=no | |||
614 | Be sure to change the networking specific lines (IPADDR, | 740 | Be sure to change the networking specific lines (IPADDR, |
615 | NETMASK, NETWORK and BROADCAST) to match your network configuration. | 741 | NETMASK, NETWORK and BROADCAST) to match your network configuration. |
616 | 742 | ||
617 | Finally, it is necessary to edit /etc/modules.conf to load the | 743 | Finally, it is necessary to edit /etc/modules.conf (or |
618 | bonding module when the bond0 interface is brought up. The following | 744 | /etc/modprobe.conf, depending upon your distro) to load the bonding |
619 | sample lines in /etc/modules.conf will load the bonding module, and | 745 | module with your desired options when the bond0 interface is brought |
620 | select its options: | 746 | up. The following lines in /etc/modules.conf (or modprobe.conf) will |
747 | load the bonding module, and select its options: | ||
621 | 748 | ||
622 | alias bond0 bonding | 749 | alias bond0 bonding |
623 | options bond0 mode=balance-alb miimon=100 | 750 | options bond0 mode=balance-alb miimon=100 |
@@ -629,6 +756,33 @@ options for your configuration. | |||
629 | will restart the networking subsystem and your bond link should be now | 756 | will restart the networking subsystem and your bond link should be now |
630 | up and running. | 757 | up and running. |
631 | 758 | ||
759 | 3.2.1 Using DHCP with initscripts | ||
760 | --------------------------------- | ||
761 | |||
762 | Recent versions of initscripts (the version supplied with | ||
763 | Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do | ||
764 | have support for assigning IP information to bonding devices via DHCP. | ||
765 | |||
766 | To configure bonding for DHCP, configure it as described | ||
767 | above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" | ||
768 | and add a line consisting of "TYPE=Bonding". Note that the TYPE value | ||
769 | is case sensitive. | ||
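As a rough illustration of the DHCP variant described above, an ifcfg-bond0 file might look like the sketch below. Only the BOOTPROTO and TYPE lines come from this section; the DEVICE, ONBOOT and USERCTL lines are assumed to be carried over unchanged from the static example, and the static addressing lines (IPADDR, NETMASK, NETWORK, BROADCAST) are assumed to be unnecessary once DHCP supplies them.

```shell
# Sketch of an ifcfg-bond0 file for a DHCP-configured bond.
# DEVICE, ONBOOT and USERCTL are assumptions taken from the static example.
DEVICE=bond0
ONBOOT=yes
# Changed from BOOTPROTO=none so that the address comes from DHCP.
BOOTPROTO=dhcp
# The TYPE value is case sensitive.
TYPE=Bonding
USERCTL=no
```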
770 | |||
771 | 3.2.2 Configuring Multiple Bonds with initscripts | ||
772 | ------------------------------------------------- | ||
773 | |||
774 | At this writing, the initscripts package does not directly | ||
775 | support loading the bonding driver multiple times, so the process for | ||
776 | doing so is the same as described in the "Configuring Multiple Bonds | ||
777 | Manually" section, below. | ||
778 | |||
779 | NOTE: It has been observed that some Red Hat supplied kernels | ||
780 | are apparently unable to rename modules at load time (the "-obonding1" | ||
781 | part). Attempts to pass that option to modprobe will produce an | ||
782 | "Operation not permitted" error. This has been reported on some | ||
783 | Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels | ||
784 | exhibiting this problem, it will be impossible to configure multiple | ||
785 | bonds with differing parameters. | ||
632 | 786 | ||
633 | 3.3 Configuring Bonding Manually | 787 | 3.3 Configuring Bonding Manually |
634 | -------------------------------- | 788 | -------------------------------- |
@@ -638,10 +792,11 @@ scripts (the sysconfig or initscripts package) do not have specific | |||
638 | knowledge of bonding. One such distro is SuSE Linux Enterprise Server | 792 | knowledge of bonding. One such distro is SuSE Linux Enterprise Server |
639 | version 8. | 793 | version 8. |
640 | 794 | ||
641 | The general methodology for these systems is to place the | 795 | The general method for these systems is to place the bonding |
642 | bonding module parameters into /etc/modprobe.conf, then add modprobe | 796 | module parameters into /etc/modules.conf or /etc/modprobe.conf (as |
643 | and/or ifenslave commands to the system's global init script. The | 797 | appropriate for the installed distro), then add modprobe and/or |
644 | name of the global init script differs; for sysconfig, it is | 798 | ifenslave commands to the system's global init script. The name of |
799 | the global init script differs; for sysconfig, it is | ||
645 | /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. | 800 | /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. |
646 | 801 | ||
647 | For example, if you wanted to make a simple bond of two e100 | 802 | For example, if you wanted to make a simple bond of two e100 |
@@ -649,7 +804,7 @@ devices (presumed to be eth0 and eth1), and have it persist across | |||
649 | reboots, edit the appropriate file (/etc/init.d/boot.local or | 804 | reboots, edit the appropriate file (/etc/init.d/boot.local or |
650 | /etc/rc.d/rc.local), and add the following: | 805 | /etc/rc.d/rc.local), and add the following: |
651 | 806 | ||
652 | modprobe bonding -obond0 mode=balance-alb miimon=100 | 807 | modprobe bonding mode=balance-alb miimon=100 |
653 | modprobe e100 | 808 | modprobe e100 |
654 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up | 809 | ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up |
655 | ifenslave bond0 eth0 | 810 | ifenslave bond0 eth0 |
@@ -657,11 +812,7 @@ ifenslave bond0 eth1 | |||
657 | 812 | ||
658 | Replace the example bonding module parameters and bond0 | 813 | Replace the example bonding module parameters and bond0 |
659 | network configuration (IP address, netmask, etc) with the appropriate | 814 | network configuration (IP address, netmask, etc) with the appropriate |
660 | values for your configuration. The above example loads the bonding | 815 | values for your configuration. |
661 | module with the name "bond0," this simplifies the naming if multiple | ||
662 | bonding modules are loaded (each successive instance of the module is | ||
663 | given a different name, and the module instance names match the | ||
664 | bonding interface names). | ||
665 | 816 | ||
666 | Unfortunately, this method will not provide support for the | 817 | Unfortunately, this method will not provide support for the |
667 | ifup and ifdown scripts on the bond devices. To reload the bonding | 818 | ifup and ifdown scripts on the bond devices. To reload the bonding |
@@ -684,20 +835,23 @@ appropriate device driver modules. For our example above, you can do | |||
684 | the following: | 835 | the following: |
685 | 836 | ||
686 | # ifconfig bond0 down | 837 | # ifconfig bond0 down |
687 | # rmmod bond0 | 838 | # rmmod bonding |
688 | # rmmod e100 | 839 | # rmmod e100 |
689 | 840 | ||
690 | Again, for convenience, it may be desirable to create a script | 841 | Again, for convenience, it may be desirable to create a script |
691 | with these commands. | 842 | with these commands. |
692 | 843 | ||
693 | 844 | ||
694 | 3.4 Configuring Multiple Bonds | 845 | 3.3.1 Configuring Multiple Bonds Manually |
695 | ------------------------------ | 846 | ----------------------------------------- |
696 | 847 | ||
697 | This section contains information on configuring multiple | 848 | This section contains information on configuring multiple |
698 | bonding devices with differing options. If you require multiple | 849 | bonding devices with differing options for those systems whose network |
699 | bonding devices, but all with the same options, see the "max_bonds" | 850 | initialization scripts lack support for configuring multiple bonds. |
700 | module paramter, documented above. | 851 | |
852 | If you require multiple bonding devices, but all with the same | ||
853 | options, you may wish to use the "max_bonds" module parameter, | ||
854 | documented above. | ||
701 | 855 | ||
702 | To create multiple bonding devices with differing options, it | 856 | To create multiple bonding devices with differing options, it |
703 | is necessary to load the bonding driver multiple times. Note that | 857 | is necessary to load the bonding driver multiple times. Note that |
@@ -724,11 +878,16 @@ named "bond0" and creates the bond0 device in balance-rr mode with an | |||
724 | miimon of 100. The second instance is named "bond1" and creates the | 878 | miimon of 100. The second instance is named "bond1" and creates the |
725 | bond1 device in balance-alb mode with an miimon of 50. | 879 | bond1 device in balance-alb mode with an miimon of 50. |
726 | 880 | ||
881 | In some circumstances (typically with older distributions), | ||
882 | the above does not work, and the second bonding instance never sees | ||
883 | its options. In that case, the second options line can be substituted | ||
884 | as follows: | ||
885 | |||
886 | install bonding1 /sbin/modprobe bonding -obond1 mode=balance-alb miimon=50 | ||
887 | |||
727 | This may be repeated any number of times, specifying a new and | 888 | This may be repeated any number of times, specifying a new and |
728 | unique name in place of bond0 or bond1 for each instance. | 889 | unique name in place of bond1 for each subsequent instance. |
729 | 890 | ||
730 | When the appropriate module paramters are in place, then | ||
731 | configure bonding according to the instructions for your distro. | ||
732 | 891 | ||
733 | 5. Querying Bonding Configuration | 892 | 5. Querying Bonding Configuration |
734 | ================================= | 893 | ================================= |
@@ -846,8 +1005,8 @@ tagged internally by bonding itself. As a result, bonding must | |||
846 | self generated packets. | 1005 | self generated packets. |
847 | 1006 | ||
848 | For reasons of simplicity, and to support the use of adapters | 1007 | For reasons of simplicity, and to support the use of adapters |
849 | that can do VLAN hardware acceleration offloding, the bonding | 1008 | that can do VLAN hardware acceleration offloading, the bonding |
850 | interface declares itself as fully hardware offloaing capable, it gets | 1009 | interface declares itself as fully hardware offloading capable, it gets |
851 | the add_vid/kill_vid notifications to gather the necessary | 1010 | the add_vid/kill_vid notifications to gather the necessary |
852 | information, and it propagates those actions to the slaves. In case | 1011 | information, and it propagates those actions to the slaves. In case |
853 | of mixed adapter types, hardware accelerated tagged packets that | 1012 | of mixed adapter types, hardware accelerated tagged packets that |
@@ -880,7 +1039,7 @@ bond interface: | |||
880 | matches the hardware address of the VLAN interfaces. | 1039 | matches the hardware address of the VLAN interfaces. |
881 | 1040 | ||
882 | Note that changing a VLAN interface's HW address would set the | 1041 | Note that changing a VLAN interface's HW address would set the |
883 | underlying device -- i.e. the bonding interface -- to promiscouos | 1042 | underlying device -- i.e. the bonding interface -- to promiscuous |
884 | mode, which might not be what you want. | 1043 | mode, which might not be what you want. |
885 | 1044 | ||
886 | 1045 | ||
@@ -923,7 +1082,7 @@ down or have a problem making it unresponsive to ARP requests. Having | |||
923 | an additional target (or several) increases the reliability of the ARP | 1082 | an additional target (or several) increases the reliability of the ARP |
924 | monitoring. | 1083 | monitoring. |
925 | 1084 | ||
926 | Multiple ARP targets must be seperated by commas as follows: | 1085 | Multiple ARP targets must be separated by commas as follows: |
927 | 1086 | ||
928 | # example options for ARP monitoring with three targets | 1087 | # example options for ARP monitoring with three targets |
929 | alias bond0 bonding | 1088 | alias bond0 bonding |
@@ -1045,7 +1204,7 @@ install bonding /sbin/modprobe tg3; /sbin/modprobe e1000; | |||
1045 | This will, when loading the bonding module, rather than | 1204 | This will, when loading the bonding module, rather than |
1046 | performing the normal action, instead execute the provided command. | 1205 | performing the normal action, instead execute the provided command. |
1047 | This command loads the device drivers in the order needed, then calls | 1206 | This command loads the device drivers in the order needed, then calls |
1048 | modprobe with --ingore-install to cause the normal action to then take | 1207 | modprobe with --ignore-install to cause the normal action to then take |
1049 | place. Full documentation on this can be found in the modprobe.conf | 1208 | place. Full documentation on this can be found in the modprobe.conf |
1050 | and modprobe manual pages. | 1209 | and modprobe manual pages. |
1051 | 1210 | ||
@@ -1130,14 +1289,14 @@ association. | |||
1130 | common to enable promiscuous mode on the device, so that all traffic | 1289 | common to enable promiscuous mode on the device, so that all traffic |
1131 | is seen (instead of seeing only traffic destined for the local host). | 1290 | is seen (instead of seeing only traffic destined for the local host). |
1132 | The bonding driver handles promiscuous mode changes to the bonding | 1291 | The bonding driver handles promiscuous mode changes to the bonding |
1133 | master device (e.g., bond0), and propogates the setting to the slave | 1292 | master device (e.g., bond0), and propagates the setting to the slave |
1134 | devices. | 1293 | devices. |
1135 | 1294 | ||
1136 | For the balance-rr, balance-xor, broadcast, and 802.3ad modes, | 1295 | For the balance-rr, balance-xor, broadcast, and 802.3ad modes, |
1137 | the promiscuous mode setting is propogated to all slaves. | 1296 | the promiscuous mode setting is propagated to all slaves. |
1138 | 1297 | ||
1139 | For the active-backup, balance-tlb and balance-alb modes, the | 1298 | For the active-backup, balance-tlb and balance-alb modes, the |
1140 | promiscuous mode setting is propogated only to the active slave. | 1299 | promiscuous mode setting is propagated only to the active slave. |
1141 | 1300 | ||
1142 | For balance-tlb mode, the active slave is the slave currently | 1301 | For balance-tlb mode, the active slave is the slave currently |
1143 | receiving inbound traffic. | 1302 | receiving inbound traffic. |
@@ -1148,46 +1307,182 @@ sending to peers that are unassigned or if the load is unbalanced. | |||
1148 | 1307 | ||
1149 | For the active-backup, balance-tlb and balance-alb modes, when | 1308 | For the active-backup, balance-tlb and balance-alb modes, when |
1150 | the active slave changes (e.g., due to a link failure), the | 1309 | the active slave changes (e.g., due to a link failure), the |
1151 | promiscuous setting will be propogated to the new active slave. | 1310 | promiscuous setting will be propagated to the new active slave. |
1152 | 1311 | ||
1153 | 12. High Availability Information | 1312 | 12. Configuring Bonding for High Availability |
1154 | ================================= | 1313 | ============================================= |
1155 | 1314 | ||
1156 | High Availability refers to configurations that provide | 1315 | High Availability refers to configurations that provide |
1157 | maximum network availability by having redundant or backup devices, | 1316 | maximum network availability by having redundant or backup devices, |
1158 | links and switches between the host and the rest of the world. | 1317 | links or switches between the host and the rest of the world. The |
1159 | 1318 | goal is to provide the maximum availability of network connectivity | |
1160 | There are currently two basic methods for configuring to | 1319 | (i.e., the network always works), even though other configurations |
1161 | maximize availability. They are dependent on the network topology and | 1320 | could provide higher throughput. |
1162 | the primary goal of the configuration, but in general, a configuration | ||
1163 | can be optimized for maximum available bandwidth, or for maximum | ||
1164 | network availability. | ||
1165 | 1321 | ||
1166 | 12.1 High Availability in a Single Switch Topology | 1322 | 12.1 High Availability in a Single Switch Topology |
1167 | -------------------------------------------------- | 1323 | -------------------------------------------------- |
1168 | 1324 | ||
1169 | If two hosts (or a host and a switch) are directly connected | 1325 | If two hosts (or a host and a single switch) are directly |
1170 | via multiple physical links, then there is no network availability | 1326 | connected via multiple physical links, then there is no availability |
1171 | penalty for optimizing for maximum bandwidth: there is only one switch | 1327 | penalty to optimizing for maximum bandwidth. In this case, there is |
1172 | (or peer), so if it fails, you have no alternative access to fail over | 1328 | only one switch (or peer), so if it fails, there is no alternative |
1173 | to. | 1329 | access to fail over to. Additionally, the bonding load balance modes |
1330 | support link monitoring of their members, so if individual links fail, | ||
1331 | the load will be rebalanced across the remaining devices. | ||
1332 | |||
1333 | See Section 13, "Configuring Bonding for Maximum Throughput" | ||
1334 | for information on configuring bonding with one peer device. | ||
1335 | |||
1336 | 12.2 High Availability in a Multiple Switch Topology | ||
1337 | ---------------------------------------------------- | ||
1338 | |||
1339 | With multiple switches, the configuration of bonding and the | ||
1340 | network changes dramatically. In multiple switch topologies, there is | ||
1341 | a trade off between network availability and usable bandwidth. | ||
1342 | |||
1343 | Below is a sample network, configured to maximize the | ||
1344 | availability of the network: | ||
1174 | 1345 | ||
1175 | Example 1 : host to switch (or other host) | 1346 | | | |
1347 | |port3 port3| | ||
1348 | +-----+----+ +-----+----+ | ||
1349 | | |port2 ISL port2| | | ||
1350 | | switch A +--------------------------+ switch B | | ||
1351 | | | | | | ||
1352 | +-----+----+ +-----++---+ | ||
1353 | |port1 port1| | ||
1354 | | +-------+ | | ||
1355 | +-------------+ host1 +---------------+ | ||
1356 | eth0 +-------+ eth1 | ||
1176 | 1357 | ||
1177 | +----------+ +----------+ | 1358 | In this configuration, there is a link between the two |
1178 | | |eth0 eth0| switch | | 1359 | switches (ISL, or inter switch link), and multiple ports connecting to |
1179 | | Host A +--------------------------+ or | | 1360 | the outside world ("port3" on each switch). There is no technical |
1180 | | +--------------------------+ other | | 1361 | reason that this could not be extended to a third switch. |
1181 | | |eth1 eth1| host | | ||
1182 | +----------+ +----------+ | ||
1183 | 1362 | ||
1363 | 12.2.1 HA Bonding Mode Selection for Multiple Switch Topology | ||
1364 | ------------------------------------------------------------- | ||
1184 | 1365 | ||
1185 | 12.1.1 Bonding Mode Selection for single switch topology | 1366 | In a topology such as the example above, the active-backup and |
1186 | -------------------------------------------------------- | 1367 | broadcast modes are the only useful bonding modes when optimizing for |
1368 | availability; the other modes require all links to terminate on the | ||
1369 | same peer for them to behave rationally. | ||
1370 | |||
1371 | active-backup: This is generally the preferred mode, particularly if | ||
1372 | the switches have an ISL and play together well. If the | ||
1373 | network configuration is such that one switch is specifically | ||
1374 | a backup switch (e.g., has lower capacity, higher cost, etc), | ||
1375 | then the primary option can be used to ensure that the | ||
1376 | preferred link is always used when it is available. | ||
1377 | |||
1378 | broadcast: This mode is really a special purpose mode, and is suitable | ||
1379 | only for very specific needs. For example, if the two | ||
1380 | switches are not connected (no ISL), and the networks beyond | ||
1381 | them are totally independent. In this case, if it is | ||
1382 | necessary for some specific one-way traffic to reach both | ||
1383 | independent networks, then the broadcast mode may be suitable. | ||
1384 | |||
1385 | 12.2.2 HA Link Monitoring Selection for Multiple Switch Topology | ||
1386 | ---------------------------------------------------------------- | ||
1387 | |||
1388 | The choice of link monitoring ultimately depends upon your | ||
1389 | switch. If the switch can reliably fail ports in response to other | ||
1390 | failures, then either the MII or ARP monitors should work. For | ||
1391 | instance, in the example above, if the "port3" link fails at the remote | ||
1392 | end, the MII monitor has no direct means to detect this. The ARP | ||
1393 | monitor could be configured with a target at the remote end of port3, | ||
1394 | thus detecting that failure without switch support. | ||
1395 | |||
1396 | In general, however, in a multiple switch topology, the ARP | ||
1397 | monitor can provide a higher level of reliability in detecting end to | ||
1398 | end connectivity failures (which may be caused by the failure of any | ||
1399 | individual component to pass traffic for any reason). Additionally, | ||
1400 | the ARP monitor should be configured with multiple targets (at least | ||
1401 | one for each switch in the network). This will ensure that, | ||
1402 | regardless of which switch is active, the ARP monitor has a suitable | ||
1403 | target to query. | ||
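The recommendation above might translate into a module options line along the following lines. This is only a sketch: the arp_options helper is hypothetical, the interval of 1000 (milliseconds) is an arbitrary choice, and the two addresses are placeholders standing in for one target reachable through each switch.

```shell
#!/bin/bash
# Hypothetical helper: join the ARP targets with commas and print an
# options line suitable for /etc/modules.conf or /etc/modprobe.conf.
arp_options() {
    local interval=$1
    shift
    # With IFS set to a comma, "$*" expands to the targets joined by commas.
    local IFS=,
    echo "options bond0 mode=active-backup arp_interval=${interval} arp_ip_target=$*"
}

# Placeholder targets: one assumed to sit behind switch A, one behind switch B.
arp_options 1000 10.0.0.1 10.0.1.1
# prints: options bond0 mode=active-backup arp_interval=1000 arp_ip_target=10.0.0.1,10.0.1.1
```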
1404 | |||
1405 | |||
1406 | 13. Configuring Bonding for Maximum Throughput | ||
1407 | ============================================== | ||
1408 | |||
1409 | 13.1 Maximizing Throughput in a Single Switch Topology | ||
1410 | ------------------------------------------------------ | ||
1411 | |||
1412 | In a single switch configuration, the best method to maximize | ||
1413 | throughput depends upon the application and network environment. The | ||
1414 | various load balancing modes each have strengths and weaknesses in | ||
1415 | different environments, as detailed below. | ||
1416 | |||
1417 | For this discussion, we will break down the topologies into | ||
1418 | two categories. Depending upon the destination of most traffic, we | ||
1419 | categorize them into either "gatewayed" or "local" configurations. | ||
1420 | |||
1421 | In a gatewayed configuration, the "switch" is acting primarily | ||
1422 | as a router, and the majority of traffic passes through this router to | ||
1423 | other networks. An example would be the following: | ||
1424 | |||
1425 | |||
1426 | +----------+ +----------+ | ||
1427 | | |eth0 port1| | to other networks | ||
1428 | | Host A +---------------------+ router +-------------------> | ||
1429 | | +---------------------+ | Hosts B and C are out | ||
1430 | | |eth1 port2| | here somewhere | ||
1431 | +----------+ +----------+ | ||
1432 | |||
1433 | The router may be a dedicated router device, or another host | ||
1434 | acting as a gateway. For our discussion, the important point is that | ||
1435 | the majority of traffic from Host A will pass through the router to | ||
1436 | some other network before reaching its final destination. | ||
1437 | |||
1438 | In a gatewayed network configuration, although Host A may | ||
1439 | communicate with many other systems, all of its traffic will be sent | ||
1440 | and received via one other peer on the local network, the router. | ||
1441 | |||
1442 | Note that the case of two systems connected directly via | ||
1443 | multiple physical links is, for purposes of configuring bonding, the | ||
1444 | same as a gatewayed configuration. In that case, it happens that all | ||
1445 | traffic is destined for the "gateway" itself, not some other network | ||
1446 | beyond the gateway. | ||
1447 | |||
1448 | In a local configuration, the "switch" is acting primarily as | ||
1449 | a switch, and the majority of traffic passes through this switch to | ||
1450 | reach other stations on the same network. An example would be the | ||
1451 | following: | ||
1452 | |||
1453 | +----------+ +----------+ +--------+ | ||
1454 | | |eth0 port1| +-------+ Host B | | ||
1455 | | Host A +------------+ switch |port3 +--------+ | ||
1456 | | +------------+ | +--------+ | ||
1457 | | |eth1 port2| +------------------+ Host C | | ||
1458 | +----------+ +----------+port4 +--------+ | ||
1459 | |||
1460 | |||
1461 | Again, the switch may be a dedicated switch device, or another | ||
1462 | host acting as a gateway. For our discussion, the important point is | ||
1463 | that the majority of traffic from Host A is destined for other hosts | ||
1464 | on the same local network (Hosts B and C in the above example). | ||
1465 | |||
1466 | In summary, in a gatewayed configuration, traffic to and from | ||
1467 | the bonded device will be to the same MAC level peer on the network | ||
1468 | (the gateway itself, i.e., the router), regardless of its final | ||
1469 | destination. In a local configuration, traffic flows directly to and | ||
1470 | from the final destinations, thus, each destination (Host B, Host C) | ||
1471 | will be addressed directly by its individual MAC address. | ||
1472 | |||
1473 | This distinction between a gatewayed and a local network | ||
1474 | configuration is important because many of the load balancing modes | ||
1475 | available use the MAC addresses of the local network source and | ||
1476 | destination to make load balancing decisions. The behavior of each | ||
1477 | mode is described below. | ||
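To make the distinction concrete, the sketch below mimics a simple XOR policy of the form (source MAC XOR destination MAC) modulo slave count. It is a simplification: the xor_slave helper is hypothetical, the MAC addresses are made up, and only the last octet of each address is used, which is an assumption made for brevity rather than a description of the driver's exact computation.

```shell
#!/bin/bash
# Hypothetical helper: choose a slave index from the XOR of the last
# octets of the source and destination MAC addresses.
xor_slave() {
    local src=$1 dst=$2 slaves=$3
    # ${var##*:} strips everything up to the last colon, leaving one octet.
    echo $(( (0x${src##*:} ^ 0x${dst##*:}) % slaves ))
}

# Gatewayed: every frame is addressed to the router's MAC, so the result
# is the same regardless of the final destination.
xor_slave 00:11:22:33:44:01 00:aa:bb:cc:dd:10 2    # prints 1

# Local: Hosts B and C have their own MAC addresses, so traffic can
# land on different slaves.
xor_slave 00:11:22:33:44:01 00:aa:bb:cc:dd:21 2    # prints 0
xor_slave 00:11:22:33:44:01 00:aa:bb:cc:dd:22 2    # prints 1
```

In the gatewayed case the destination MAC never changes, so a policy of this kind sends all outgoing traffic over one device no matter how many remote hosts are involved.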
1478 | |||
1479 | |||
1480 | 13.1.1 MT Bonding Mode Selection for Single Switch Topology | ||
1481 | ----------------------------------------------------------- | ||
1187 | 1482 | ||
1188 | This configuration is the easiest to set up and to understand, | 1483 | This configuration is the easiest to set up and to understand, |
1189 | although you will have to decide which bonding mode best suits your | 1484 | although you will have to decide which bonding mode best suits your |
1190 | needs. The tradeoffs for each mode are detailed below: | 1485 | needs. The trade offs for each mode are detailed below: |
1191 | 1486 | ||
1192 | balance-rr: This mode is the only mode that will permit a single | 1487 | balance-rr: This mode is the only mode that will permit a single |
1193 | TCP/IP connection to stripe traffic across multiple | 1488 | TCP/IP connection to stripe traffic across multiple |
@@ -1206,6 +1501,23 @@ balance-rr: This mode is the only mode that will permit a single | |||
1206 | interface's worth of throughput, even after adjusting | 1501 | interface's worth of throughput, even after adjusting |
1207 | tcp_reordering. | 1502 | tcp_reordering. |
1208 | 1503 | ||
1504 | Note that this out of order delivery occurs when both the | ||
1505 | sending and receiving systems are utilizing a multiple | ||
1506 | interface bond. Consider a configuration in which a | ||
1507 | balance-rr bond feeds into a single higher capacity network | ||
1508 | channel (e.g., multiple 100Mb/sec ethernets feeding a single | ||
1509 | gigabit ethernet via an etherchannel capable switch). In this | ||
1510 | configuration, traffic sent from the multiple 100Mb devices to | ||
1511 | a destination connected to the gigabit device will not see | ||
1512 | packets out of order. However, traffic sent from the gigabit | ||
1513 | device to the multiple 100Mb devices may or may not see | ||
1514 | traffic out of order, depending upon the balance policy of the | ||
1515 | switch. Many switches do not support any modes that stripe | ||
1516 | traffic (instead choosing a port based upon IP or MAC level | ||
1517 | addresses); for those devices, traffic flowing from the | ||
1518 | gigabit device to the many 100Mb devices will only utilize one | ||
1519 | interface. | ||
1520 | |||
1209 | If you are utilizing protocols other than TCP/IP, UDP for | 1521 | If you are utilizing protocols other than TCP/IP, UDP for |
1210 | example, and your application can tolerate out of order | 1522 | example, and your application can tolerate out of order |
1211 | delivery, then this mode can allow for single stream datagram | 1523 | delivery, then this mode can allow for single stream datagram |
@@ -1220,16 +1532,21 @@ active-backup: There is not much advantage in this network topology to | |||
1220 | connected to the same peer as the primary. In this case, a | 1532 | connected to the same peer as the primary. In this case, a |
1221 | load balancing mode (with link monitoring) will provide the | 1533 | load balancing mode (with link monitoring) will provide the |
1222 | same level of network availability, but with increased | 1534 | same level of network availability, but with increased |
1223 | available bandwidth. On the plus side, it does not require | 1535 | available bandwidth. On the plus side, active-backup mode |
1224 | any configuration of the switch. | 1536 | does not require any configuration of the switch, so it may |
1537 | have value if the hardware available does not support any of | ||
1538 | the load balance modes. | ||
1225 | 1539 | ||
1226 | balance-xor: This mode will limit traffic such that packets destined | 1540 | balance-xor: This mode will limit traffic such that packets destined |
1227 | for specific peers will always be sent over the same | 1541 | for specific peers will always be sent over the same |
1228 | interface. Since the destination is determined by the MAC | 1542 | interface. Since the destination is determined by the MAC |
1229 | addresses involved, this may be desirable if you have a large | 1543 | addresses involved, this mode works best in a "local" network |
1230 | network with many hosts. It is likely to be suboptimal if all | 1544 | configuration (as described above), with destinations all on |
1231 | your traffic is passed through a single router, however. As | 1545 | the same local network. This mode is likely to be suboptimal |
1232 | with balance-rr, the switch ports need to be configured for | 1546 | if all your traffic is passed through a single router (i.e., a |
1547 | "gatewayed" network configuration, as described above). | ||
1548 | |||
1549 | As with balance-rr, the switch ports need to be configured for | ||
1233 | "etherchannel" or "trunking." | 1550 | "etherchannel" or "trunking." |
1234 | 1551 | ||
1235 | broadcast: Like active-backup, there is not much advantage to this | 1552 | broadcast: Like active-backup, there is not much advantage to this |
@@ -1241,122 +1558,131 @@ broadcast: Like active-backup, there is not much advantage to this | |||
1241 | protocol includes automatic configuration of the aggregates, | 1558 | protocol includes automatic configuration of the aggregates, |
1242 | so minimal manual configuration of the switch is needed | 1559 | so minimal manual configuration of the switch is needed |
1243 | (typically only to designate that some set of devices is | 1560 | (typically only to designate that some set of devices is |
1244 | usable for 802.3ad). The 802.3ad standard also mandates that | 1561 | available for 802.3ad). The 802.3ad standard also mandates |
1245 | frames be delivered in order (within certain limits), so in | 1562 | that frames be delivered in order (within certain limits), so |
1246 | general single connections will not see misordering of | 1563 | in general single connections will not see misordering of |
1247 | packets. The 802.3ad mode does have some drawbacks: the | 1564 | packets. The 802.3ad mode does have some drawbacks: the |
1248 | standard mandates that all devices in the aggregate operate at | 1565 | standard mandates that all devices in the aggregate operate at |
1249 | the same speed and duplex. Also, as with all bonding load | 1566 | the same speed and duplex. Also, as with all bonding load |
1250 | balance modes other than balance-rr, no single connection will | 1567 | balance modes other than balance-rr, no single connection will |
1251 | be able to utilize more than a single interface's worth of | 1568 | be able to utilize more than a single interface's worth of |
1252 | bandwidth. Additionally, the linux bonding 802.3ad | 1569 | bandwidth. |
1253 | implementation distributes traffic by peer (using an XOR of | 1570 | |
1254 | MAC addresses), so in general all traffic to a particular | 1571 | Additionally, the Linux bonding 802.3ad implementation |
1255 | destination will use the same interface. Finally, the 802.3ad | 1572 | distributes traffic by peer (using an XOR of MAC addresses), |
1256 | mode mandates the use of the MII monitor, therefore, the ARP | 1573 | so in a "gatewayed" configuration, all outgoing traffic will |
1257 | monitor is not available in this mode. | 1574 | generally use the same device. Incoming traffic may also end |
1258 | 1575 | up on a single device, but that is dependent upon the | |
1259 | balance-tlb: This mode is also a good choice for this type of | 1576 | balancing policy of the peer's 802.3ad implementation. In a |
1260 | topology. It has no special switch configuration | 1577 | "local" configuration, traffic will be distributed across the |
1261 | requirements, and balances outgoing traffic by peer, in a | 1578 | devices in the bond. |
1262 | vaguely intelligent manner (not a simple XOR as in balance-xor | 1579 | |
1263 | or 802.3ad mode), so that unlucky MAC addresses will not all | 1580 | Finally, the 802.3ad mode mandates the use of the MII monitor, |
1264 | "bunch up" on a single interface. Interfaces may be of | 1581 | therefore, the ARP monitor is not available in this mode. |
1265 | differing speeds. On the down side, in this mode all incoming | 1582 | |
1266 | traffic arrives over a single interface, this mode requires | 1583 | balance-tlb: The balance-tlb mode balances outgoing traffic by peer. |
1267 | certain ethtool support in the network device driver of the | 1584 | Since the balancing is done according to MAC address, in a |
1268 | slave interfaces, and the ARP monitor is not available. | 1585 | "gatewayed" configuration (as described above), this mode will |
1269 | 1586 | send all traffic across a single device. However, in a | |
1270 | balance-alb: This mode is everything that balance-tlb is, and more. It | 1587 | "local" network configuration, this mode balances multiple |
1271 | has all of the features (and restrictions) of balance-tlb, and | 1588 | local network peers across devices in a vaguely intelligent |
1272 | will also balance incoming traffic from peers (as described in | 1589 | manner (not a simple XOR as in balance-xor or 802.3ad mode), |
1273 | the Bonding Module Options section, above). The only extra | 1590 | so that mathematically unlucky MAC addresses (i.e., ones that |
1274 | down side to this mode is that the network device driver must | 1591 | XOR to the same value) will not all "bunch up" on a single |
1275 | support changing the hardware address while the device is | 1592 | interface. |
1276 | open. | 1593 | |
1277 | 1594 | Unlike 802.3ad, interfaces may be of differing speeds, and no | |
1278 | 12.1.2 Link Monitoring for Single Switch Topology | 1595 | special switch configuration is required. On the down side, |
1279 | ------------------------------------------------- | 1596 | in this mode all incoming traffic arrives over a single |
1597 | interface, this mode requires certain ethtool support in the | ||
1598 | network device driver of the slave interfaces, and the ARP | ||
1599 | monitor is not available. | ||
1600 | |||
1601 | balance-alb: This mode is everything that balance-tlb is, and more. | ||
1602 | It has all of the features (and restrictions) of balance-tlb, | ||
1603 | and will also balance incoming traffic from local network | ||
1604 | peers (as described in the Bonding Module Options section, | ||
1605 | above). | ||
1606 | |||
1607 | The only additional down side to this mode is that the network | ||
1608 | device driver must support changing the hardware address while | ||
1609 | the device is open. | ||
1610 | |||
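The XOR-based distribution mentioned above for balance-xor and 802.3ad
can be illustrated with a small sketch. It assumes the policy is
(source MAC XOR destination MAC) modulo slave count, computed here on
the last byte of each address only. The function name and the sample
addresses are hypothetical, and the actual driver may differ in detail.

```python
def xor_slave_index(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick a slave by XORing the last byte of each MAC (illustrative assumption)."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % slave_count


BOND_MAC = "00:11:22:33:44:55"      # hypothetical bond address
GATEWAY_MAC = "00:aa:bb:cc:dd:01"   # hypothetical router address

# "Gatewayed": every remote destination is reached through the same
# next-hop MAC, so every frame hashes to the same slave.
gatewayed = {xor_slave_index(BOND_MAC, GATEWAY_MAC, 2) for _ in range(100)}

# "Local": peers have distinct MACs, so frames spread across the slaves.
local_peers = [f"00:aa:bb:cc:dd:{i:02x}" for i in range(1, 9)]
local = {xor_slave_index(BOND_MAC, peer, 2) for peer in local_peers}

print("gatewayed slaves used:", sorted(gatewayed))
print("local slaves used:", sorted(local))
```

With two slaves, the gatewayed case uses one slave only, while the
local peers land on both. Peers whose addresses XOR to the same value
still share a slave, which is the "bunching" that balance-tlb avoids.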
1611 | 13.1.2 MT Link Monitoring for Single Switch Topology | ||
1612 | ---------------------------------------------------- | ||
1280 | 1613 | ||
1281 | The choice of link monitoring may largely depend upon which | 1614 | The choice of link monitoring may largely depend upon which |
1282 | mode you choose to use. The more advanced load balancing modes do not | 1615 | mode you choose to use. The more advanced load balancing modes do not |
1283 | support the use of the ARP monitor, and are thus restricted to using | 1616 | support the use of the ARP monitor, and are thus restricted to using |
1284 | the MII monitor (which does not provide as high a level of assurance | 1617 | the MII monitor (which does not provide as high a level of end to end |
1285 | as the ARP monitor). | 1618 | assurance as the ARP monitor). |
1286 | 1619 | ||
1287 | 1620 | 13.2 Maximum Throughput in a Multiple Switch Topology | |
1288 | 12.2 High Availability in a Multiple Switch Topology | 1621 | ----------------------------------------------------- |
1289 | ---------------------------------------------------- | 1622 | |
1290 | 1623 | Multiple switches may be utilized to optimize for throughput | |
1291 | With multiple switches, the configuration of bonding and the | 1624 | when they are configured in parallel as part of an isolated network |
1292 | network changes dramatically. In multiple switch topologies, there is | 1625 | between two or more systems, for example: |
1293 | a tradeoff between network availability and usable bandwidth. | 1626 | |
1294 | 1627 | +-----------+ | |
1295 | Below is a sample network, configured to maximize the | 1628 | | Host A | |
1296 | availability of the network: | 1629 | +-+---+---+-+ |
1297 | 1630 | | | | | |
1298 | | | | 1631 | +--------+ | +---------+ |
1299 | |port3 port3| | 1632 | | | | |
1300 | +-----+----+ +-----+----+ | 1633 | +------+---+ +-----+----+ +-----+----+ |
1301 | | |port2 ISL port2| | | 1634 | | Switch A | | Switch B | | Switch C | |
1302 | | switch A +--------------------------+ switch B | | 1635 | +------+---+ +-----+----+ +-----+----+ |
1303 | | | | | | 1636 | | | | |
1304 | +-----+----+ +-----++---+ | 1637 | +--------+ | +---------+ |
1305 | |port1 port1| | 1638 | | | | |
1306 | | +-------+ | | 1639 | +-+---+---+-+ |
1307 | +-------------+ host1 +---------------+ | 1640 | | Host B | |
1308 | eth0 +-------+ eth1 | 1641 | +-----------+ |
1309 | 1642 | ||
1310 | In this configuration, there is a link between the two | 1643 | In this configuration, the switches are isolated from one |
1311 | switches (ISL, or inter switch link), and multiple ports connecting to | 1644 | another. One reason to employ a topology such as this is for an |
1312 | the outside world ("port3" on each switch). There is no technical | 1645 | isolated network with many hosts (a cluster configured for high |
1313 | reason that this could not be extended to a third switch. | 1646 | performance, for example), using multiple smaller switches can be more |
1314 | 1647 | cost effective than a single larger switch, e.g., on a network with 24 | |
1315 | 12.2.1 Bonding Mode Selection for Multiple Switch Topology | 1648 | hosts, three 24 port switches can be significantly less expensive than |
1316 | ---------------------------------------------------------- | 1649 | a single 72 port switch. |
1317 | 1650 | ||
1318 | In a topology such as this, the active-backup and broadcast | 1651 | If access beyond the network is required, an individual host |
1319 | modes are the only useful bonding modes; the other modes require all | 1652 | can be equipped with an additional network device connected to an |
1320 | links to terminate on the same peer for them to behave rationally. | 1653 | external network; this host then additionally acts as a gateway. |
1321 | 1654 | ||
1322 | active-backup: This is generally the preferred mode, particularly if | 1655 | 13.2.1 MT Bonding Mode Selection for Multiple Switch Topology |
1323 | the switches have an ISL and play together well. If the | ||
1324 | network configuration is such that one switch is specifically | ||
1325 | a backup switch (e.g., has lower capacity, higher cost, etc), | ||
1326 | then the primary option can be used to insure that the | ||
1327 | preferred link is always used when it is available. | ||
1328 | |||
1329 | broadcast: This mode is really a special purpose mode, and is suitable | ||
1330 | only for very specific needs. For example, if the two | ||
1331 | switches are not connected (no ISL), and the networks beyond | ||
1332 | them are totally independant. In this case, if it is | ||
1333 | necessary for some specific one-way traffic to reach both | ||
1334 | independent networks, then the broadcast mode may be suitable. | ||
1335 | |||
1336 | 12.2.2 Link Monitoring Selection for Multiple Switch Topology | ||
1337 | ------------------------------------------------------------- | 1656 | ------------------------------------------------------------- |
1338 | 1657 | ||
1339 | The choice of link monitoring ultimately depends upon your | 1658 | In actual practice, the bonding mode typically employed in |
1340 | switch. If the switch can reliably fail ports in response to other | 1659 | configurations of this type is balance-rr. Historically, in this |
1341 | failures, then either the MII or ARP monitors should work. For | 1660 | network configuration, the usual caveats about out of order packet |
1342 | example, in the above example, if the "port3" link fails at the remote | 1661 | delivery are mitigated by the use of network adapters that do not do |
1343 | end, the MII monitor has no direct means to detect this. The ARP | 1662 | any kind of packet coalescing (via the use of NAPI, or because the |
1344 | monitor could be configured with a target at the remote end of port3, | 1663 | device itself does not generate interrupts until some number of |
1345 | thus detecting that failure without switch support. | 1664 | packets has arrived). When employed in this fashion, the balance-rr |
1665 | mode allows individual connections between two hosts to effectively | ||
1666 | utilize greater than one interface's bandwidth. | ||
1346 | 1667 | ||
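As a rough illustration of why balance-rr lets one connection exceed a
single interface's bandwidth, the sketch below stripes the packets of
one connection across the slaves in rotation. The interface names and
the helper function are hypothetical; this models only the transmit
ordering, not the reordering effects described above.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(packet_count: int, slaves: list[str]) -> list[str]:
    """Return the slave used for each packet of one connection, in send order."""
    rotation = cycle(slaves)
    return [next(rotation) for _ in range(packet_count)]


slaves = ["eth0", "eth1", "eth2"]   # hypothetical: one link per isolated switch
assignment = round_robin_assign(9, slaves)
per_slave = Counter(assignment)

print(assignment)
print(per_slave)
```

Each slave carries an equal share of the connection's packets, so the
aggregate rate can approach the sum of the links, provided the receiver
tolerates whatever reordering results.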
1347 | In general, however, in a multiple switch topology, the ARP | 1668 | 13.2.2 MT Link Monitoring for Multiple Switch Topology |
1348 | monitor can provide a higher level of reliability in detecting link | 1669 | ------------------------------------------------------ |
1349 | failures. Additionally, it should be configured with multiple targets | ||
1350 | (at least one for each switch in the network). This will insure that, | ||
1351 | regardless of which switch is active, the ARP monitor has a suitable | ||
1352 | target to query. | ||
1353 | 1670 | ||
1671 | Again, in actual practice, the MII monitor is most often used | ||
1672 | in this configuration, as performance is given preference over | ||
1673 | availability. The ARP monitor will function in this topology, but its | ||
1674 | advantages over the MII monitor are mitigated by the volume of probes | ||
1675 | needed as the number of systems involved grows (remember that each | ||
1676 | host in the network is configured with bonding). | ||
1354 | 1677 | ||
1355 | 12.3 Switch Behavior Issues for High Availability | 1678 | 14. Switch Behavior Issues |
1356 | ------------------------------------------------- | 1679 | ========================== |
1357 | 1680 | ||
1358 | You may encounter issues with the timing of link up and down | 1681 | 14.1 Link Establishment and Failover Delays |
1359 | reporting by the switch. | 1682 | ------------------------------------------- |
1683 | |||
1684 | Some switches exhibit undesirable behavior with regard to the | ||
1685 | timing of link up and down reporting by the switch. | ||
1360 | 1686 | ||
1361 | First, when a link comes up, some switches may indicate that | 1687 | First, when a link comes up, some switches may indicate that |
1362 | the link is up (carrier available), but not pass traffic over the | 1688 | the link is up (carrier available), but not pass traffic over the |
@@ -1370,30 +1696,70 @@ relevant interface(s). | |||
1370 | Second, some switches may "bounce" the link state one or more | 1696 | Second, some switches may "bounce" the link state one or more |
1371 | times while a link is changing state. This occurs most commonly while | 1697 | times while a link is changing state. This occurs most commonly while |
1372 | the switch is initializing. Again, an appropriate updelay value may | 1698 | the switch is initializing. Again, an appropriate updelay value may |
1373 | help, but note that if all links are down, then updelay is ignored | 1699 | help. |
1374 | when any link becomes active (the slave closest to completing its | ||
1375 | updelay is chosen). | ||
1376 | 1700 | ||
1377 | Note that when a bonding interface has no active links, the | 1701 | Note that when a bonding interface has no active links, the |
1378 | driver will immediately reuse the first link that goes up, even if | 1702 | driver will immediately reuse the first link that goes up, even if the |
1379 | updelay parameter was specified. If there are slave interfaces | 1703 | updelay parameter has been specified (the updelay is ignored in this |
1380 | waiting for the updelay timeout to expire, the interface that first | 1704 | case). If there are slave interfaces waiting for the updelay timeout |
1381 | went into that state will be immediately reused. This reduces down | 1705 | to expire, the interface that first went into that state will be |
1382 | time of the network if the value of updelay has been overestimated. | 1706 | immediately reused. This reduces down time of the network if the |
1707 | value of updelay has been overestimated, and since this occurs only in | ||
1708 | cases with no connectivity, there is no additional penalty for | ||
1709 | ignoring the updelay. | ||
1383 | 1710 | ||
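The updelay rule described above can be summarized in a few lines. This
is a sketch of the decision only, with hypothetical function and
parameter names; it is not the driver's code.

```python
def can_use_link(ms_since_link_up: int, updelay_ms: int, active_links: int) -> bool:
    """Decide whether a slave whose link just came up may carry traffic."""
    if active_links == 0:
        # No connectivity at all: updelay is ignored, as the text describes.
        return True
    return ms_since_link_up >= updelay_ms


def first_waiting_slave(link_up_times: dict[str, int]) -> str:
    """With no active links, reuse the slave that began waiting first."""
    return min(link_up_times, key=link_up_times.get)


print(can_use_link(0, 5000, active_links=0))
print(can_use_link(100, 5000, active_links=1))
print(first_waiting_slave({"eth0": 1200, "eth1": 800}))
```

The second function takes the time at which each waiting slave's link
came up, so the smallest value identifies the slave that entered the
updelay wait first.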
1384 | In addition to the concerns about switch timings, if your | 1711 | In addition to the concerns about switch timings, if your |
1385 | switches take a long time to go into backup mode, it may be desirable | 1712 | switches take a long time to go into backup mode, it may be desirable |
1386 | to not activate a backup interface immediately after a link goes down. | 1713 | to not activate a backup interface immediately after a link goes down. |
1387 | Failover may be delayed via the downdelay bonding module option. | 1714 | Failover may be delayed via the downdelay bonding module option. |
1388 | 1715 | ||
1389 | 13. Hardware Specific Considerations | 1716 | 14.2 Duplicated Incoming Packets |
1717 | -------------------------------- | ||
1718 | |||
1719 | It is not uncommon to observe a short burst of duplicated | ||
1720 | traffic when the bonding device is first used, or after it has been | ||
1721 | idle for some period of time. This is most easily observed by issuing | ||
1722 | a "ping" to some other host on the network, and noticing that the | ||
1723 | output from ping flags duplicates (typically one per slave). | ||
1724 | |||
1725 | For example, on a bond in active-backup mode with five slaves | ||
1726 | all connected to one switch, the output may appear as follows: | ||
1727 | |||
1728 | # ping -n 10.0.4.2 | ||
1729 | PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data. | ||
1730 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms | ||
1731 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1732 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1733 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1734 | 64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!) | ||
1735 | 64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms | ||
1736 | 64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms | ||
1737 | 64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms | ||
1738 | |||
1739 | This is not due to an error in the bonding driver; rather, it | ||
1740 | is a side effect of how many switches update their MAC forwarding | ||
1741 | tables. Initially, the switch does not associate the MAC address in | ||
1742 | the packet with a particular switch port, and so it may send the | ||
1743 | traffic to all ports until its MAC forwarding table is updated. Since | ||
1744 | the interfaces attached to the bond may occupy multiple ports on a | ||
1745 | single switch, when the switch (temporarily) floods the traffic to all | ||
1746 | ports, the bond device receives multiple copies of the same packet | ||
1747 | (one per slave device). | ||
1748 | |||
1749 | The duplicated packet behavior is switch dependent, some | ||
1750 | switches exhibit this, and some do not. On switches that display this | ||
1751 | behavior, it can be induced by clearing the MAC forwarding table (on | ||
1752 | most Cisco switches, the privileged command "clear mac address-table | ||
1753 | dynamic" will accomplish this). | ||
1754 | |||
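The flooding behavior described above can be reproduced with a toy
model of a learning switch. The class, port numbers and MAC addresses
are all hypothetical, and real switches age out and relearn entries in
ways this sketch does not attempt to model.

```python
class LearningSwitch:
    """Toy model of a switch MAC forwarding table (not any vendor's behavior)."""

    def __init__(self, ports: list[int]) -> None:
        self.ports = ports
        self.mac_table: dict[str, int] = {}

    def receive(self, src_mac: str, dst_mac: str, in_port: int) -> list[int]:
        """Learn the source port, then return the ports the frame is sent to."""
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        # Unknown destination: flood to every port except the ingress port.
        return [p for p in self.ports if p != in_port]


BOND_MAC = "00:11:22:33:44:55"   # hypothetical; shared by all slaves
PEER_MAC = "00:aa:bb:cc:dd:01"   # hypothetical peer
bond_ports = [1, 2, 3, 4, 5]     # five slaves on one switch
switch = LearningSwitch(ports=bond_ports + [6])  # peer sits on port 6

# The bond transmits, then the table is cleared (as an administrator might do).
switch.receive(BOND_MAC, PEER_MAC, in_port=1)
switch.mac_table.clear()

# The peer's reply now has an unknown destination and is flooded.
flooded = switch.receive(PEER_MAC, BOND_MAC, in_port=6)
copies_seen_by_bond = len([p for p in flooded if p in bond_ports])

# Once the bond transmits again, the entry is relearned and flooding stops.
switch.receive(BOND_MAC, PEER_MAC, in_port=1)
after_learning = switch.receive(PEER_MAC, BOND_MAC, in_port=6)

print(copies_seen_by_bond, after_learning)
```

In this model the flooded reply reaches all five bond ports, matching
the one original plus four duplicates in the ping output above, and the
next reply is delivered to a single port.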
1755 | 15. Hardware Specific Considerations | ||
1390 | ==================================== | 1756 | ==================================== |
1391 | 1757 | ||
1392 | This section contains additional information for configuring | 1758 | This section contains additional information for configuring |
1393 | bonding on specific hardware platforms, or for interfacing bonding | 1759 | bonding on specific hardware platforms, or for interfacing bonding |
1394 | with particular switches or other devices. | 1760 | with particular switches or other devices. |
1395 | 1761 | ||
1396 | 13.1 IBM BladeCenter | 1762 | 15.1 IBM BladeCenter |
1397 | -------------------- | 1763 | -------------------- |
1398 | 1764 | ||
1399 | This applies to the JS20 and similar systems. | 1765 | This applies to the JS20 and similar systems. |
@@ -1407,12 +1773,12 @@ JS20 network adapter information | |||
1407 | -------------------------------- | 1773 | -------------------------------- |
1408 | 1774 | ||
1409 | All JS20s come with two Broadcom Gigabit Ethernet ports | 1775 | All JS20s come with two Broadcom Gigabit Ethernet ports |
1410 | integrated on the planar. In the BladeCenter chassis, the eth0 port | 1776 | integrated on the planar (that's "motherboard" in IBM-speak). In the |
1411 | of all JS20 blades is hard wired to I/O Module #1; similarly, all eth1 | 1777 | BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to |
1412 | ports are wired to I/O Module #2. An add-on Broadcom daughter card | 1778 | I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2. |
1413 | can be installed on a JS20 to provide two more Gigabit Ethernet ports. | 1779 | An add-on Broadcom daughter card can be installed on a JS20 to provide |
1414 | These ports, eth2 and eth3, are wired to I/O Modules 3 and 4, | 1780 | two more Gigabit Ethernet ports. These ports, eth2 and eth3, are |
1415 | respectively. | 1781 | wired to I/O Modules 3 and 4, respectively. |
1416 | 1782 | ||
1417 | Each I/O Module may contain either a switch or a passthrough | 1783 | Each I/O Module may contain either a switch or a passthrough |
1418 | module (which allows ports to be directly connected to an external | 1784 | module (which allows ports to be directly connected to an external |
@@ -1432,29 +1798,30 @@ BladeCenter networking configuration | |||
1432 | of ways, this discussion will be confined to describing basic | 1798 | of ways, this discussion will be confined to describing basic |
1433 | configurations. | 1799 | configurations. |
1434 | 1800 | ||
1435 | Normally, Ethernet Switch Modules (ESM) are used in I/O | 1801 | Normally, Ethernet Switch Modules (ESMs) are used in I/O |
1436 | modules 1 and 2. In this configuration, the eth0 and eth1 ports of a | 1802 | modules 1 and 2. In this configuration, the eth0 and eth1 ports of a |
1437 | JS20 will be connected to different internal switches (in the | 1803 | JS20 will be connected to different internal switches (in the |
1438 | respective I/O modules). | 1804 | respective I/O modules). |
1439 | 1805 | ||
1440 | An optical passthru module (OPM) connects the I/O module | 1806 | A passthrough module (OPM or CPM, optical or copper, |
1441 | directly to an external switch. By using OPMs in I/O module #1 and | 1807 | passthrough module) connects the I/O module directly to an external |
1442 | #2, the eth0 and eth1 interfaces of a JS20 can be redirected to the | 1808 | switch. By using PMs in I/O module #1 and #2, the eth0 and eth1 |
1443 | outside world and connected to a common external switch. | 1809 | interfaces of a JS20 can be redirected to the outside world and |
1444 | 1810 | connected to a common external switch. | |
1445 | Depending upon the mix of ESM and OPM modules, the network | 1811 | |
1446 | will appear to bonding as either a single switch topology (all OPM | 1812 | Depending upon the mix of ESMs and PMs, the network will |
1447 | modules) or as a multiple switch topology (one or more ESM modules, | 1813 | appear to bonding as either a single switch topology (all PMs) or as a |
1448 | zero or more OPM modules). It is also possible to connect ESM modules | 1814 | multiple switch topology (one or more ESMs, zero or more PMs). It is |
1449 | together, resulting in a configuration much like the example in "High | 1815 | also possible to connect ESMs together, resulting in a configuration |
1450 | Availability in a multiple switch topology." | 1816 | much like the example in "High Availability in a Multiple Switch |
1451 | 1817 | Topology," above. | |
1452 | Requirements for specifc modes | 1818 | |
1453 | ------------------------------ | 1819 | Requirements for specific modes |
1454 | 1820 | ------------------------------- | |
1455 | The balance-rr mode requires the use of OPM modules for | 1821 | |
1456 | devices in the bond, all connected to a common external switch. That | 1823 | for devices in the bond, all connected to a common external switch. |
1457 | switch must be configured for "etherchannel" or "trunking" on the | 1823 | for devices in the bond, all connected to an common external switch. |
1824 | That switch must be configured for "etherchannel" or "trunking" on the | ||
1458 | appropriate ports, as is usual for balance-rr. | 1825 | appropriate ports, as is usual for balance-rr. |
1459 | 1826 | ||
1460 | The balance-alb and balance-tlb modes will function with | 1827 | The balance-alb and balance-tlb modes will function with |
@@ -1484,17 +1851,18 @@ connected to the JS20 system. | |||
1484 | Other concerns | 1851 | Other concerns |
1485 | -------------- | 1852 | -------------- |
1486 | 1853 | ||
1487 | The Serial Over LAN link is established over the primary | 1854 | The Serial Over LAN (SoL) link is established over the primary |
1488 | ethernet (eth0) only, therefore, any loss of link to eth0 will result | 1855 | ethernet (eth0) only, therefore, any loss of link to eth0 will result |
1489 | in losing your SoL connection. It will not fail over with other | 1856 | in losing your SoL connection. It will not fail over with other |
1490 | network traffic. | 1857 | network traffic, as the SoL system is beyond the control of the |
1858 | bonding driver. | ||
1491 | 1859 | ||
1492 | It may be desirable to disable spanning tree on the switch | 1860 | It may be desirable to disable spanning tree on the switch |
1493 | (either the internal Ethernet Switch Module, or an external switch) to | 1861 | (either the internal Ethernet Switch Module, or an external switch) to |
1494 | avoid fail-over delays issues when using bonding. | 1862 | avoid fail-over delay issues when using bonding. |
1495 | 1863 | ||
1496 | 1864 | ||
1497 | 14. Frequently Asked Questions | 1865 | 16. Frequently Asked Questions |
1498 | ============================== | 1866 | ============================== |
1499 | 1867 | ||
1500 | 1. Is it SMP safe? | 1868 | 1. Is it SMP safe? |
@@ -1505,8 +1873,8 @@ The new driver was designed to be SMP safe from the start. | |||
1505 | 2. What type of cards will work with it? | 1873 | 2. What type of cards will work with it? |
1506 | 1874 | ||
1507 | Any Ethernet type cards (you can even mix cards - an Intel | 1875 | Any Ethernet type cards (you can even mix cards - an Intel |
1508 | EtherExpress PRO/100 and a 3com 3c905b, for example). They need not | 1876 | EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, |
1509 | be of the same speed. | 1877 | devices need not be of the same speed. |
1510 | 1878 | ||
1511 | 3. How many bonding devices can I have? | 1879 | 3. How many bonding devices can I have? |
1512 | 1880 | ||
@@ -1524,11 +1892,12 @@ system. | |||
1524 | disabled. The active-backup mode will fail over to a backup link, and | 1892 | disabled. The active-backup mode will fail over to a backup link, and |
1525 | other modes will ignore the failed link. The link will continue to be | 1893 | other modes will ignore the failed link. The link will continue to be |
1526 | monitored, and should it recover, it will rejoin the bond (in whatever | 1894 | monitored, and should it recover, it will rejoin the bond (in whatever |
1527 | manner is appropriate for the mode). See the section on High | 1895 | manner is appropriate for the mode). See the sections on High |
1528 | Availability for additional information. | 1896 | Availability and the documentation for each mode for additional |
1897 | information. | ||
1529 | 1898 | ||
1530 | Link monitoring can be enabled via either the miimon or | 1899 | Link monitoring can be enabled via either the miimon or |
1531 | arp_interval paramters (described in the module paramters section, | 1900 | arp_interval parameters (described in the module parameters section, |
1532 | above). In general, miimon monitors the carrier state as sensed by | 1901 | above). In general, miimon monitors the carrier state as sensed by |
1533 | the underlying network device, and the arp monitor (arp_interval) | 1902 | the underlying network device, and the arp monitor (arp_interval) |
1534 | monitors connectivity to another host on the local network. | 1903 | monitors connectivity to another host on the local network. |
@@ -1536,7 +1905,7 @@ monitors connectivity to another host on the local network. | |||
1536 | If no link monitoring is configured, the bonding driver will | 1905 | If no link monitoring is configured, the bonding driver will |
1537 | be unable to detect link failures, and will assume that all links are | 1906 | be unable to detect link failures, and will assume that all links are |
1538 | always available. This will likely result in lost packets, and a | 1907 | always available. This will likely result in lost packets, and a |
1539 | resulting degredation of performance. The precise performance loss | 1908 | resulting degradation of performance. The precise performance loss |
1540 | depends upon the bonding mode and network configuration. | 1909 | depends upon the bonding mode and network configuration. |
1541 | 1910 | ||
1542 | 6. Can bonding be used for High Availability? | 1911 | 6. Can bonding be used for High Availability? |
@@ -1550,12 +1919,12 @@ depends upon the bonding mode and network configuration. | |||
1550 | In the basic balance modes (balance-rr and balance-xor), it | 1919 | In the basic balance modes (balance-rr and balance-xor), it |
1551 | works with any system that supports etherchannel (also called | 1920 | works with any system that supports etherchannel (also called |
1552 | trunking). Most managed switches currently available have such | 1921 | trunking). Most managed switches currently available have such |
1553 | support, and many unmananged switches as well. | 1922 | support, and many unmanaged switches as well. |
1554 | 1923 | ||
1555 | The advanced balance modes (balance-tlb and balance-alb) do | 1924 | The advanced balance modes (balance-tlb and balance-alb) do |
1556 | not have special switch requirements, but do need device drivers that | 1925 | not have special switch requirements, but do need device drivers that |
1557 | support specific features (described in the appropriate section under | 1926 | support specific features (described in the appropriate section under |
1558 | module paramters, above). | 1927 | module parameters, above). |
1559 | 1928 | ||
1560 | In 802.3ad mode, it works with systems that support IEEE | 1929 | In 802.3ad mode, it works with systems that support IEEE |
1561 | 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged | 1930 | 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged |
@@ -1565,17 +1934,19 @@ switches currently available support 802.3ad. | |||
1565 | 1934 | ||
1566 | 8. Where does a bonding device get its MAC address from? | 1935 | 8. Where does a bonding device get its MAC address from? |
1567 | 1936 | ||
1568 | If not explicitly configured with ifconfig, the MAC address of | 1937 | If not explicitly configured (with ifconfig or ip link), the |
1569 | the bonding device is taken from its first slave device. This MAC | 1938 | MAC address of the bonding device is taken from its first slave |
1570 | address is then passed to all following slaves and remains persistent | 1939 | device. This MAC address is then passed to all following slaves and |
1571 | (even if the first slave is removed) until the bonding device is | 1940 | remains persistent (even if the first slave is removed) until the |
1572 | brought down or reconfigured. | 1941 | bonding device is brought down or reconfigured. |
1573 | 1942 | ||
1574 | If you wish to change the MAC address, you can set it with | 1943 | If you wish to change the MAC address, you can set it with |
1575 | ifconfig: | 1944 | ifconfig or ip link: |
1576 | 1945 | ||
1577 | # ifconfig bond0 hw ether 00:11:22:33:44:55 | 1946 | # ifconfig bond0 hw ether 00:11:22:33:44:55 |
1578 | 1947 | ||
1948 | # ip link set bond0 address 66:77:88:99:aa:bb | ||
1949 | |||
1579 | The MAC address can be also changed by bringing down/up the | 1950 | The MAC address can be also changed by bringing down/up the |
1580 | device and then changing its slaves (or their order): | 1951 | device and then changing its slaves (or their order): |
1581 | 1952 | ||
@@ -1591,23 +1962,28 @@ from the bond (`ifenslave -d bond0 eth0'). The bonding driver will | |||
1591 | then restore the MAC addresses that the slaves had before they were | 1962 | then restore the MAC addresses that the slaves had before they were |
1592 | enslaved. | 1963 | enslaved. |
1593 | 1964 | ||
1594 | 15. Resources and Links | 1965 | 17. Resources and Links |
1595 | ======================= | 1966 | ======================= |
1596 | 1967 | ||
1597 | The latest version of the bonding driver can be found in the latest | 1968 | The latest version of the bonding driver can be found in the latest |
1598 | version of the linux kernel, found on http://kernel.org | 1969 | version of the linux kernel, found on http://kernel.org |
1599 | 1970 | ||
1971 | The latest version of this document can be found in either the latest | ||
1972 | kernel source (named Documentation/networking/bonding.txt), or on the | ||
1973 | bonding sourceforge site: | ||
1974 | |||
1975 | http://www.sourceforge.net/projects/bonding | ||
1976 | |||
1600 | Discussions regarding the bonding driver take place primarily on the | 1977 | Discussions regarding the bonding driver take place primarily on the |
1601 | bonding-devel mailing list, hosted at sourceforge.net. If you have | 1978 | bonding-devel mailing list, hosted at sourceforge.net. If you have |
1602 | questions or problems, post them to the list. | 1979 | questions or problems, post them to the list. The list address is: |
1603 | 1980 | ||
1604 | bonding-devel@lists.sourceforge.net | 1981 | bonding-devel@lists.sourceforge.net |
1605 | 1982 | ||
1606 | https://lists.sourceforge.net/lists/listinfo/bonding-devel | 1983 | The administrative interface (to subscribe or unsubscribe) can |
1607 | 1984 | be found at: | |
1608 | There is also a project site on sourceforge. | ||
1609 | 1985 | ||
1610 | http://www.sourceforge.net/projects/bonding | 1986 | https://lists.sourceforge.net/lists/listinfo/bonding-devel |
1611 | 1987 | ||
1612 | Donald Becker's Ethernet Drivers and diag programs may be found at : | 1988 | Donald Becker's Ethernet Drivers and diag programs may be found at : |
1613 | - http://www.scyld.com/network/ | 1989 | - http://www.scyld.com/network/ |
diff --git a/Documentation/usb/usbmon.txt b/Documentation/usb/usbmon.txt index f1896ee3bb2a..63cb7edd177e 100644 --- a/Documentation/usb/usbmon.txt +++ b/Documentation/usb/usbmon.txt | |||
@@ -102,7 +102,7 @@ Here is the list of words, from left to right: | |||
102 | - URB Status. This field makes no sense for submissions, but is present | 102 | - URB Status. This field makes no sense for submissions, but is present |
103 | to help scripts with parsing. In error case, it contains the error code. | 103 | to help scripts with parsing. In error case, it contains the error code. |
104 | In case of a setup packet, it contains a Setup Tag. If scripts read a number | 104 | In case of a setup packet, it contains a Setup Tag. If scripts read a number |
105 | in this field, the proceed to read Data Length. Otherwise, they read | 105 | in this field, they proceed to read Data Length. Otherwise, they read |
106 | the setup packet before reading the Data Length. | 106 | the setup packet before reading the Data Length. |
107 | - Setup packet, if present, consists of 5 words: one of each for bmRequestType, | 107 | - Setup packet, if present, consists of 5 words: one of each for bmRequestType, |
108 | bRequest, wValue, wIndex, wLength, as specified by the USB Specification 2.0. | 108 | bRequest, wValue, wIndex, wLength, as specified by the USB Specification 2.0. |
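Note on the usbmon.txt hunk above: the corrected sentence describes a branching rule for scripts that parse usbmon text lines. The sketch below is one possible reading of that rule in Python and is not part of the patch. The function name `parse_status_and_length` is hypothetical, and the sketch assumes that the input is the list of whitespace-separated words starting at the URB Status word and that Data Length is printed in decimal.

```python
def parse_status_and_length(words):
    """Parse the words of a usbmon text line starting at the URB Status word.

    Assumption: `words` is a whitespace-split list, and Data Length is decimal.
    Returns a dict with the status (or Setup Tag), setup words, and data length.
    """
    first = words[0]
    try:
        status = int(first)
    except ValueError:
        # Not a number: treat the word as a Setup Tag. The setup packet
        # (5 words: bmRequestType, bRequest, wValue, wIndex, wLength)
        # comes before the Data Length.
        setup = words[1:6]
        if len(setup) != 5 or len(words) < 7:
            raise ValueError("truncated setup packet")
        return {
            "status": None,
            "setup_tag": first,
            "setup": setup,  # kept as strings; no radix is assumed here
            "data_length": int(words[6]),
        }
    # A number was read: Data Length is the very next word.
    return {
        "status": status,
        "setup_tag": None,
        "setup": None,
        "data_length": int(words[1]),
    }
```

For example, `parse_status_and_length("0 4 = 01050000".split())` takes the numeric branch, while a word list that starts with a non-numeric tag takes the setup-packet branch and skips five words before reading the length.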
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88 index 6d44958289de..03deb0726aa4 100644 --- a/Documentation/video4linux/CARDLIST.cx88 +++ b/Documentation/video4linux/CARDLIST.cx88 | |||
@@ -29,3 +29,4 @@ card=27 - PixelView PlayTV Ultra Pro (Stereo) | |||
29 | card=28 - DViCO FusionHDTV 3 Gold-T | 29 | card=28 - DViCO FusionHDTV 3 Gold-T |
30 | card=29 - ADS Tech Instant TV DVB-T PCI | 30 | card=29 - ADS Tech Instant TV DVB-T PCI |
31 | card=30 - TerraTec Cinergy 1400 DVB-T | 31 | card=30 - TerraTec Cinergy 1400 DVB-T |
32 | card=31 - DViCO FusionHDTV 5 Gold | ||
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner index d1b9d21ffd89..f3302e1b1b9c 100644 --- a/Documentation/video4linux/CARDLIST.tuner +++ b/Documentation/video4linux/CARDLIST.tuner | |||
@@ -62,3 +62,5 @@ tuner=60 - Thomson DDT 7611 (ATSC/NTSC) | |||
62 | tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF | 62 | tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF |
63 | tuner=62 - Philips TEA5767HN FM Radio | 63 | tuner=62 - Philips TEA5767HN FM Radio |
64 | tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner | 64 | tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner |
65 | tuner=64 - LG TDVS-H062F/TUA6034 | ||
66 | tuner=65 - Ymec TVF66T5-B/DFF | ||
diff --git a/Documentation/video4linux/bttv/Insmod-options b/Documentation/video4linux/bttv/Insmod-options index 7bb5a50b0779..fc94ff235ffa 100644 --- a/Documentation/video4linux/bttv/Insmod-options +++ b/Documentation/video4linux/bttv/Insmod-options | |||
@@ -44,6 +44,9 @@ bttv.o | |||
44 | push used by bttv. bttv will disable overlay | 44 | push used by bttv. bttv will disable overlay |
45 | by default on this hardware to avoid crashes. | 45 | by default on this hardware to avoid crashes. |
46 | With this insmod option you can override this. | 46 | With this insmod option you can override this. |
47 | no_overlay=1 Disable overlay. It should be used by broken | ||
48 | hardware that doesn't support PCI2PCI direct | ||
49 | transfers. | ||
47 | automute=0/1 Automatically mutes the sound if there is | 50 | automute=0/1 Automatically mutes the sound if there is |
48 | no TV signal, on by default. You might try | 51 | no TV signal, on by default. You might try |
49 | to disable this if you have bad input signal | 52 | to disable this if you have bad input signal |
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index 476c0c22fbb7..678e8f192db2 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt | |||
@@ -6,6 +6,11 @@ only the AMD64 specific ones are listed here. | |||
6 | Machine check | 6 | Machine check |
7 | 7 | ||
8 | mce=off disable machine check | 8 | mce=off disable machine check |
9 | mce=bootlog Enable logging of machine checks left over from booting. | ||
10 | Disabled by default because some BIOS leave bogus ones. | ||
11 | If your BIOS doesn't do that it's a good idea to enable though | ||
12 | to make sure you log even machine check events that result | ||
13 | in a reboot. | ||
9 | 14 | ||
10 | nomce (for compatibility with i386): same as mce=off | 15 | nomce (for compatibility with i386): same as mce=off |
11 | 16 | ||