author     Christoph Lameter <christoph@graphe.net>  2005-03-30 16:34:31 -0500
committer  Jeff Garzik <jgarzik@pobox.com>  2005-05-15 19:15:02 -0400
commit     8199d3a79c224bbe5943fa08684e1f93a17881b0
tree       77726ddade7ca4282bc12315abcb01fdf241be74
parent     88d7bd8cb9eb8d64bf7997600b0d64f7834047c5
[PATCH] A new 10GB Ethernet Driver by Chelsio Communications
A Linux driver for the Chelsio 10Gb Ethernet Network Controller by Chelsio
(http://www.chelsio.com). This driver supports the Chelsio N210 NIC and is
backward compatible with the Chelsio N110 model 10Gb NICs. It supports
AMD64, EM64T and x86 systems.

Signed-off-by: Tina Yang <tinay@chelsio.com>
Signed-off-by: Scott Bardone <sbardone@chelsio.com>
Signed-off-by: Christoph Lameter <christoph@lameter.com>

Adrian said:

- my3126.c is unused (because t1_my3126_ops isn't used anywhere)
- what are the EXTRA_CFLAGS in drivers/net/chelsio/Makefile for?
- $(cxgb-y) in drivers/net/chelsio/Makefile seems to be unneeded
- completely unused global functions:
  - espi.c: t1_espi_get_intr_counts
  - sge.c: t1_sge_get_intr_counts
- the following functions can be made static:
  - sge.c: t1_espi_workaround
  - sge.c: t1_sge_tx
  - subr.c: __t1_tpi_read
  - subr.c: __t1_tpi_write
  - subr.c: t1_wait_op_done

shemminger said:

The performance recommendations in cxgb.txt are common to all fast devices,
and should be in one file rather than just for this device. I would rather
see ip-sysctl.txt updated or a new file on tuning recommendations started.

Some of them have consequences that aren't documented well. For example,
turning off TCP timestamps risks data corruption from sequence wrap.

A new driver shouldn't need so many #ifdef's unless you want to put it on
older vendor versions of 2.4.

Some accessor and wrapper functions like:
    t1_pci_read_config_4
    adapter_name
    t1_malloc
are just annoying noise.

Why have useless dead code like:

/* Interrupt handler */
+static int pm3393_interrupt_handler(struct cmac *cmac)
+{
+	u32 master_intr_status;
+/*
+	1. Read master interrupt register.
+	2. Read BLOCK's interrupt status registers.
+	3. Handle BLOCK interrupts.
+*/

Jeff said:

step 1: kill all the OS wrappers.

And do you really need hooks for multiple MACs, when only one MAC is really
supported? Typically these hooks are at a higher level anyway -- struct
net_device.
From: Christoph Lameter <christoph@lameter

Driver modified as suggested by Pekka Enberg, Stephen Hemminger and Adrian
Bunk. Reduces the size of the driver to ~260k.

- clean up tabs
- removed my3126.c
- removed 85% of suni1x10gexp_regs.h
- removed 80% of regs.h
- removed various calls, renamed variables/functions.
- removed system specific and other wrappers (usleep, msleep)
- removed dead code
- dropped redundant casts in osdep.h
- dropped redundant check of kfree
- dropped weird code (MODVERSIONS stuff)
- reduced number of #ifdefs
- use kcalloc now instead of kmalloc
- Add information about known issues with the driver
- Add information about authors

Signed-off-by: Scott Bardone <sbardone@chelsio.com>
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>

diff -puN /dev/null Documentation/networking/cxgb.txt
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/networking/cxgb.txt 322
1 files changed, 322 insertions, 0 deletions
diff --git a/Documentation/networking/cxgb.txt b/Documentation/networking/cxgb.txt
new file mode 100644
index 000000000000..9f2eb646c6f5
--- /dev/null
+++ b/Documentation/networking/cxgb.txt
@@ -0,0 +1,322 @@
                 Chelsio N210 10Gb Ethernet Network Controller

                         Driver Release Notes for Linux

                                 Version 2.1.0

                                 March 8, 2005

CONTENTS
========
 INTRODUCTION
 FEATURES
 PERFORMANCE
 DRIVER MESSAGES
 KNOWN ISSUES
 SUPPORT


INTRODUCTION
============

  This document describes the Linux driver for the Chelsio 10Gb Ethernet
  Network Controller. The driver supports the Chelsio N210 NIC and is
  backward compatible with the Chelsio N110 model 10Gb NICs. It supports
  AMD64, EM64T and x86 systems.


FEATURES
========

 Adaptive Interrupts (adaptive-rx)
 ---------------------------------

  This feature provides an adaptive algorithm that adjusts the interrupt
  coalescing parameters, allowing the driver to tune its interrupt latency
  dynamically for the best performance under varying network loads.

  This feature is controlled with ethtool. Please see the ethtool manpage
  for additional usage information.

  By default, adaptive-rx is disabled. To enable adaptive-rx:

      ethtool -C <interface> adaptive-rx on

  To disable adaptive-rx:

      ethtool -C <interface> adaptive-rx off

  After adaptive-rx is disabled, the timer latency is set to 50us. You may
  then set a different timer latency:

      ethtool -C <interface> rx-usecs <microseconds>

  For example, to set the timer latency to 100us on eth0:

      ethtool -C eth0 rx-usecs 100

  You may also specify a timer latency while disabling adaptive-rx:

      ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

  If adaptive-rx is disabled and a timer latency is specified, the timer
  keeps the specified value until the user changes it or until adaptive-rx
  is enabled again.

  To view the adaptive-rx status and the timer latency:

      ethtool -c <interface>
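
  As a hedged illustration, the two shell functions below wrap the commands
  shown above. One disables adaptive-rx and sets a fixed timer. The other
  extracts the rx-usecs value from "ethtool -c" output. The function names
  are hypothetical. The parser assumes ethtool prints a line of the form
  "rx-usecs: <n>", and that format may vary between ethtool versions.

```shell
#!/bin/sh
# Hypothetical helpers around the ethtool commands documented above.

# Disable adaptive-rx and pin the RX interrupt timer to a fixed latency.
#   $1 = interface (e.g. eth0), $2 = timer latency in microseconds
set_fixed_rx_latency() {
    ethtool -C "$1" adaptive-rx off rx-usecs "$2"
}

# Read "ethtool -c <interface>" output on stdin and print the rx-usecs value.
# Assumes a line of the form "rx-usecs: 100". Lines such as "rx-usecs-irq: 0"
# are ignored because the field name must match exactly.
rx_usecs_from_output() {
    awk -F: '$1 == "rx-usecs" { gsub(/[ \t]/, "", $2); print $2; exit }'
}
```

  For example, "set_fixed_rx_latency eth0 100" followed by
  "ethtool -c eth0 | rx_usecs_from_output" should print 100 if the driver
  accepted the new value.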


 TCP Segmentation Offloading (TSO) Support
 -----------------------------------------

  This feature, also known as "large send", enables a system's protocol
  stack to offload portions of outbound TCP processing to the network
  interface card, thereby reducing system CPU utilization and enhancing
  performance.

  This feature is controlled with ethtool version 1.8 or higher. Please see
  the ethtool manpage for additional usage information.

  By default, TSO is enabled. To disable TSO:

      ethtool -K <interface> tso off

  To enable TSO:

      ethtool -K <interface> tso on

  To view the status of TSO:

      ethtool -k <interface>
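
  The current TSO state can also be read from a script. The sketch below is
  hypothetical. It accepts both the older "tcp segmentation offload: on"
  wording and the newer hyphenated "tcp-segmentation-offload: on" wording
  of "ethtool -k". Other ethtool versions may format this line differently.

```shell
#!/bin/sh
# Hypothetical helper: print "on" or "off" for TSO, reading the output of
# "ethtool -k <interface>" on stdin. Older ethtool releases print
# "tcp segmentation offload: on"; newer ones print
# "tcp-segmentation-offload: on", sometimes followed by "[fixed]".
tso_state() {
    awk -F: '
        $1 == "tcp segmentation offload" || $1 == "tcp-segmentation-offload" {
            split($2, words, " ")   # " on [fixed]" -> words[1] = "on"
            print words[1]
            exit
        }'
}
```

  For example, after "ethtool -K eth0 tso off", the command
  "ethtool -k eth0 | tso_state" should print off.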


PERFORMANCE
===========

 The following information is provided as an example of how to change
 system parameters for "performance tuning" and which values to use. You
 may or may not want to change these parameters, depending on your
 server/workstation application. Doing so is not warranted in any way by
 Chelsio Communications, and is done at "YOUR OWN RISK". Chelsio will not
 be held responsible for loss of data or damage to equipment.

 Your distribution may have a different way of doing things, or you may
 prefer a different method. These commands are shown only as examples and
 are by no means definitive.

 Any of the following system changes lasts only until you reboot your
 system. You may want to write a script that runs at boot-up and applies
 the optimal settings for your system.
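
  One way to reapply settings at boot is a small shell script called from
  your distribution's startup files, for example rc.local where it exists.
  Alternatively, key=value lines placed in /etc/sysctl.conf are loaded by
  "sysctl -p". The sketch below is hypothetical. The function name, the
  DRY_RUN switch and the chosen subset of settings are assumptions, and the
  values are copied from the examples in this section.

```shell
#!/bin/sh
# Hypothetical boot-time tuning script. The values are a subset of the
# examples in this section; review each one before using it.
# Run with DRY_RUN=1 to print the commands instead of executing them.

apply_tuning() {
    # One "key=value" per line; values may contain spaces (e.g. tcp_rmem).
    while IFS= read -r setting; do
        [ -n "$setting" ] || continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'sysctl -w "%s"\n' "$setting"
        else
            sysctl -w "$setting"
        fi
    done <<EOF
net.core.rmem_max=524287
net.core.wmem_max=524287
net.core.netdev_max_backlog=300000
net.ipv4.tcp_rmem=10000000 10000000 10000000
EOF
}
```

  "DRY_RUN=1 apply_tuning" lists the commands without changing anything.
  To apply them at boot, add a final line calling apply_tuning and run the
  script as root from a startup file.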

  Setting PCI Latency Timer:
      setpci -d 1425:* 0x0c.l=0x0000F800
      (1425 is Chelsio's PCI vendor ID. This 32-bit write at configuration
      offset 0x0c sets the latency timer byte, at offset 0x0d, to 0xF8.)

  Disabling TCP timestamp:
      sysctl -w net.ipv4.tcp_timestamps=0
      (This also disables PAWS, the protection against wrapped sequence
      numbers. At 10Gb speeds the 32-bit sequence space can wrap within a
      few seconds, so disabling timestamps risks data corruption.)

  Disabling SACK:
      sysctl -w net.ipv4.tcp_sack=0

  Setting TCP read buffers (min/default/max):
      sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

  Setting TCP write buffers (min/default/max):
      sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

  Setting TCP buffer space (min/pressure/max, in pages):
      sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"

  Setting the maximum number of queued incoming connection requests
  (2.6.x only):
      sysctl -w net.ipv4.tcp_max_syn_backlog=3000

  Setting maximum receive socket buffer size:
      sysctl -w net.core.rmem_max=524287

  Setting maximum send socket buffer size:
      sysctl -w net.core.wmem_max=524287

  Setting default receive socket buffer size:
      sysctl -w net.core.rmem_default=524287

  Setting default send socket buffer size:
      sysctl -w net.core.wmem_default=524287

  Setting maximum option memory buffers:
      sysctl -w net.core.optmem_max=524287

  Setting maximum backlog (number of unprocessed packets before the kernel
  drops them):
      sysctl -w net.core.netdev_max_backlog=300000

  Setting smp_affinity (on a multiprocessor system) to a single CPU:
      echo 00000001 > /proc/irq/<interrupt_number>/smp_affinity
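
  The smp_affinity value is a hexadecimal bitmask of allowed CPUs. For
  example, 00000001 selects CPU 0, 2 selects CPU 1 and 4 selects CPU 2. The
  hypothetical helpers below compute that mask and look up the IRQ in
  /proc/interrupts. The lookup makes two assumptions. First, the driver
  registered its interrupt under the interface name (for example eth2).
  Second, that name is the last field on the line, which may not hold for
  shared interrupts.

```shell
#!/bin/sh
# Hypothetical helpers for binding a NIC interrupt to one CPU.

# Print the smp_affinity bitmask (in hex) that selects only CPU $1.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Print the IRQ number(s) whose /proc/interrupts line ends with device $1.
# $2 optionally names another file with the same format (useful for testing).
irq_for_dev() {
    awk -v dev="$1" '$NF == dev { sub(/:$/, "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# Bind every IRQ found for device $1 to CPU $2 (default CPU 0). Needs root.
pin_dev_irq() {
    mask=$(cpu_mask "${2:-0}")
    for irq in $(irq_for_dev "$1"); do
        echo "$mask" > "/proc/irq/$irq/smp_affinity"
    done
}
```

  For example, "pin_dev_irq eth2 0" writes the mask 1 for each IRQ
  registered as eth2.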

  TCP window size for single connections:
      The receive buffer (RX_WINDOW) size must be at least as large as the
      Bandwidth-Delay Product of the communication link between the sender
      and receiver. Because the round-trip time (RTT) varies, you may want
      to increase the buffer size up to 2 times the Bandwidth-Delay
      Product. See page 289 of "TCP/IP Illustrated, Volume 1, The
      Protocols" by W. Richard Stevens.
      At 10Gb speeds (1.25 GBytes/s, i.e. 1.25 MBytes per millisecond), use
      the following formula:
          RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
      Example for an RTT of 100us:
          RX_WINDOW = (1,250,000 * 0.1) = 125,000 bytes
      RX_WINDOW sizes of 256KB - 512KB should be sufficient.
      Setting the min, max, and default receive buffer (RX_WINDOW) size:
          sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
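
  At 10 Gb/s, 1.25 MBytes per millisecond equals 1,250 bytes per
  microsecond. With the RTT in microseconds, the formula above therefore
  becomes integer shell arithmetic. The function name and the optional
  safety factor argument are illustrative assumptions. Use a factor of 1
  for the minimum and 2 for the doubled value suggested above.

```shell
#!/bin/sh
# Hypothetical helper: receive window (bytes) for one connection on a
# 10 Gb/s link. 10^10 bit/s / 8 = 1,250,000,000 bytes/s = 1,250 bytes/us.
#   $1 = RTT in microseconds (integer)
#   $2 = optional safety factor, e.g. 2 to allow for RTT variation (default 1)
rx_window() {
    echo $(( 1250 * $1 * ${2:-1} ))
}
```

  "rx_window 100" prints 125000, matching the example above, and
  "rx_window 100 2" prints 250000.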

  TCP window size for multiple connections:
      The receive buffer (RX_WINDOW) size may be calculated as for a single
      connection, but should then be divided by the number of connections.
      The smaller window prevents congestion and facilitates better pacing,
      especially if/when MAC level flow control does not work well or is not
      supported on the machine. Experimentation may be necessary to attain
      the correct value; this method is provided as a starting point for the
      correct receive buffer size.
      Setting the min, max, and default receive buffer (RX_WINDOW) size is
      performed in the same manner as for a single connection.
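
  The per-connection starting point can be computed the same way. It is
  the 10 Gb/s Bandwidth-Delay Product divided by the number of concurrent
  connections. This hypothetical helper is a sketch of that division
  only. The result still needs the experimentation described above.

```shell
#!/bin/sh
# Hypothetical helper: per-connection receive window (bytes) on a 10 Gb/s
# link, i.e. the single-connection value (1,250 bytes per microsecond of RTT)
# divided by the number of connections (integer division).
#   $1 = RTT in microseconds, $2 = number of concurrent connections,
#   $3 = optional safety factor (default 1)
rx_window_per_conn() {
    if [ "$2" -le 0 ]; then
        echo "number of connections must be positive" >&2
        return 1
    fi
    echo $(( 1250 * $1 * ${3:-1} / $2 ))
}
```

  For instance, eight connections sharing a 200us path would each start
  from "rx_window_per_conn 200 8", which prints 31250 bytes.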


DRIVER MESSAGES
===============

 The following messages are the most common messages logged by syslog.
 They may be found in /var/log/messages.

  Driver up:
      Chelsio Network Driver - version 2.1.0

  NIC detected:
      eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

  Link up:
      eth#: link is up at 10 Gbps, full duplex

  Link down:
      eth#: link is down
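
  To check the most recent link state reported for an interface, you can
  search the syslog file with a hypothetical helper like the one below. It
  assumes the messages keep the "eth#: link is ..." wording shown above
  and that syslog writes them to /var/log/messages. Some distributions use
  a different log file.

```shell
#!/bin/sh
# Hypothetical helper: print the latest "link is up/down" message for
# interface $1 from a syslog file ($2, default /var/log/messages).
last_link_event() {
    grep -E "(^|[^[:alnum:]])$1: link is (up|down)" "${2:-/var/log/messages}" |
        tail -n 1
}
```

  For example, "last_link_event eth2" prints the last up or down message
  logged for eth2.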


KNOWN ISSUES
============

 These issues have been identified during testing. A workaround is given
 for each one. In some cases, the problem is inherent to Linux or to a
 particular Linux distribution and/or hardware platform.

 1. Large number of TCP retransmits on a multiprocessor (SMP) system.

      On a system with multiple CPUs, the interrupt (IRQ) for the network
      controller may be bound to more than one CPU. This can cause TCP
      retransmits when packet processing is split across different CPUs
      and the packets are reassembled in a different order than expected.

      To eliminate the TCP retransmits, set smp_affinity on the particular
      interrupt to a single CPU. You can locate the interrupt (IRQ) used by
      the N110/N210 with ifconfig:
          ifconfig <dev_name> | grep Interrupt
      Set the smp_affinity to a single CPU:
          echo 1 > /proc/irq/<interrupt_number>/smp_affinity

      It is strongly recommended that you do not run the irqbalance daemon
      on your system, as it will change any smp_affinity setting you have
      applied. The irqbalance daemon runs on a 10 second interval and binds
      interrupts to the CPU it determines to be least loaded. To disable
      this daemon:
          chkconfig --level 2345 irqbalance off

      By default, some Linux distributions enable the kernel feature,
      irqbalance, which performs the same function as the daemon. To
      disable this feature, add the following option to the kernel command
      line in your bootloader configuration:
          noirqbalance

      Example using the Grub bootloader:
          title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
          root (hd0,0)
          kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
          initrd /initrd-2.4.21-27.ELsmp.img

 2. After running insmod, the driver is loaded and the incorrect network
    interface is brought up without running ifup.

      When using 2.4.x kernels, including RHEL kernels, the Linux kernel
      invokes a script named "hotplug". This script is primarily used to
      automatically bring up USB devices when they are plugged in; however,
      it also attempts to automatically bring up a network interface after
      the kernel module is loaded. The hotplug script does this by scanning
      the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
      for HWADDR=<mac_address>.

      If the hotplug script does not find the HWADDR within any of the
      ifcfg-eth# files, it will bring up the device with the next available
      interface name. If this interface is already configured for a
      different network card, your new interface will have an incorrect IP
      address and network settings.

      To solve this issue, add the HWADDR=<mac_address> key to the
      interface config file of your network controller.

      To disable this "hotplug" feature, you may add the driver (module
      name) to the "blacklist" file located in /etc/hotplug. However, this
      has been reported not to work for network devices, because the
      net.agent script does not use the blacklist file. Instead, simply
      remove or rename the net.agent script located in /etc/hotplug.

 3. Transport Protocol (TP) hangs when running heavy multi-connection
    traffic on an AMD Opteron system with HyperTransport PCI-X Tunnel
    chipset.

      If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X
      Tunnel chipset, you may experience the "133-MHz Mode Split Completion
      Data Corruption" bug identified by AMD while using a 133MHz PCI-X card
      on the PCI-X bus.

      AMD states, "Under highly specific conditions, the AMD-8131 PCI-X
      Tunnel can provide stale data via split completion cycles to a PCI-X
      card that is operating at 133 MHz", causing data corruption.

      AMD provides three workarounds for this problem; Chelsio recommends
      the first option for best performance with this bug:

        For 133MHz secondary bus operation, limit the transaction length
        and the number of outstanding transactions, via BIOS configuration
        programming of the PCI-X card, to the following:

            Data Length (bytes): 2k
            Total allowed outstanding transactions: 1

      Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
      section 56, "133-MHz Mode Split Completion Data Corruption" for more
      details on this bug and the workarounds suggested by AMD.
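
  For known issue 2, you can generate the HWADDR line from the
  interface's own MAC address. The hypothetical helper below reads
  "ifconfig <dev_name>" output on stdin. It assumes the classic net-tools
  format, where the MAC address follows the word "HWaddr". Newer ifconfig
  versions print "ether" instead, and this sketch does not handle them.

```shell
#!/bin/sh
# Hypothetical helper: turn "ifconfig <dev_name>" output (on stdin) into the
# HWADDR=<mac_address> line used by the ifcfg-eth# files.
# Assumes the classic net-tools layout: "... HWaddr 00:11:22:33:44:55".
hwaddr_line() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "HWaddr") { print "HWADDR=" $(i + 1); exit }
    }'
}
```

  For example, "ifconfig eth2 | hwaddr_line" prints the line to add to
  /etc/sysconfig/network-scripts/ifcfg-eth2. Check first that the file
  does not already contain an HWADDR entry.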


SUPPORT
=======

 If you have problems with the software or hardware, please contact our
 customer support team via email at support@chelsio.com or check our website
 at http://www.chelsio.com

===============================================================================

 Chelsio Communications
 370 San Aleso Ave.
 Suite 100
 Sunnyvale, CA 94085
 http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

 Copyright (c) 2003-2005 Chelsio Communications. All rights reserved.

===============================================================================