2 files changed, 73 insertions, 52 deletions
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index a2c893a7475d..ab65714d95fc 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -304,57 +304,6 @@ tcp_low_latency - BOOLEAN
        changed would be a Beowulf compute cluster.
        Default: 0
-tcp_westwood - BOOLEAN
-        Enable TCP Westwood+ congestion control algorithm.
-        TCP Westwood+ is a sender-side only modification of the TCP Reno 
-        protocol stack that optimizes the performance of TCP congestion 
-        control. It is based on end-to-end bandwidth estimation to set 
-        congestion window and slow start threshold after a congestion 
-        episode. Using this estimation, TCP Westwood+ adaptively sets a 
-        slow start threshold and a congestion window which takes into 
-        account the bandwidth used  at the time congestion is experienced. 
-        TCP Westwood+ significantly increases fairness wrt TCP Reno in 
-        wired networks and throughput over wireless links.   
-        Default: 0
-tcp_vegas_cong_avoid - BOOLEAN
-        Enable TCP Vegas congestion avoidance algorithm.
-        TCP Vegas is a sender-side only change to TCP that anticipates
-        the onset of congestion by estimating the bandwidth. TCP Vegas
-        adjusts the sending rate by modifying the congestion
-        window. TCP Vegas should provide less packet loss, but it is
-        not as aggressive as TCP Reno.
-        Default:0
-tcp_bic - BOOLEAN
-        Enable BIC TCP congestion control algorithm.
-        BIC-TCP is a sender-side only change that ensures a linear RTT
-        fairness under large windows while offering both scalability and
-        bounded TCP-friendliness. The protocol combines two schemes
-        called additive increase and binary search increase. When the
-        congestion window is large, additive increase with a large
-        increment ensures linear RTT fairness as well as good
-        scalability. Under small congestion windows, binary search
-        increase provides TCP friendliness.
-        Default: 0
-tcp_bic_low_window - INTEGER
-        Sets the threshold window (in packets) where BIC TCP starts to
-        adjust the congestion window. Below this threshold BIC TCP behaves
-        the same as the default TCP Reno. 
-        Default: 14
-tcp_bic_fast_convergence - BOOLEAN
-        Forces BIC TCP to more quickly respond to changes in congestion
-        window. Allows two flows sharing the same connection to converge
-        more rapidly.
-        Default: 1
-tcp_default_win_scale - INTEGER
-        Sets the minimum window scale TCP will negotiate for on all
-        conections.
-        Default: 7
 tcp_tso_win_divisor - INTEGER
       This allows control over what percentage of the congestion window
       can be consumed by a single TSO frame.
@@ -368,6 +317,11 @@ tcp_frto - BOOLEAN
        where packet loss is typically due to random radio interference
        rather than intermediate router congestion.
+tcp_congestion_control - STRING
+        Set the congestion control algorithm to be used for new
+        connections. The algorithm "reno" is always available, but
+        additional choices may be available based on kernel configuration.
 somaxconn - INTEGER
        Limit of socket listen() backlog, known in userspace as SOMAXCONN.
        Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt
index 71749007091e..0fa300425575 100644
--- a/Documentation/networking/tcp.txt
+++ b/Documentation/networking/tcp.txt
@@ -1,5 +1,72 @@
-How the new TCP output machine [nyi] works.
+TCP protocol
+============
+Last updated: 21 June 2005
+Contents
+========
+- Congestion control
+- How the new TCP output machine [nyi] works
+Congestion control
+==================
+The following variables are used in the tcp_sock for congestion control:
+snd_cwnd                The size of the congestion window
+snd_ssthresh            Slow start threshold. We are in slow start if
+                        snd_cwnd is less than this.
+snd_cwnd_cnt            A counter used to slow down the rate of increase
+                        once we exceed slow start threshold.
+snd_cwnd_clamp          This is the maximum size that snd_cwnd can grow to.
+snd_cwnd_stamp          Timestamp of when the congestion window was last validated.
+snd_cwnd_used           Used as a highwater mark for how much of the
+                        congestion window is in use. It is used to adjust
+                        snd_cwnd down when the link is limited by the
+                        application rather than the network.
+As of 2.6.13, Linux supports pluggable congestion control algorithms.
+A congestion control mechanism can be registered through functions in
+tcp_cong.c. The functions used by the congestion control mechanism are
+registered via passing a tcp_congestion_ops struct to
+tcp_register_congestion_control. At a minimum, name, ssthresh,
+cong_avoid, and min_cwnd must be valid.
+Private data for a congestion control mechanism is stored in tp->ca_priv.
+tcp_ca(tp) returns a pointer to this space.  This space is preallocated, so
+it is important to check that your private data fits in it. Alternatively,
+space can be allocated elsewhere and a pointer to that allocation stored
+here.
+There are three kinds of congestion control algorithms currently: The
+simplest ones are derived from TCP reno (highspeed, scalable) and just
+provide an alternative to the congestion window calculation. More complex
+ones like BIC try to look at other events to provide better
+heuristics.  There are also round trip time based algorithms like
+Vegas and Westwood+.
+Good TCP congestion control is a complex problem because the algorithm
+needs to maintain fairness and performance. Please review current
+research and RFCs before developing new modules.
+The congestion control mechanism that is used is determined by the
+setting of the sysctl net.ipv4.tcp_congestion_control.
+The default congestion control will be the last one registered (LIFO);
+so if you built everything as modules, the default will be reno. If you
+build with the defaults from Kconfig, then BIC will be builtin (not a module)
+and it will end up the default.
+If you really want a particular default value then you will need
+to set it with the sysctl.  If you use a sysctl, the module will be autoloaded
+if needed and you will get the expected protocol. If you ask for an
+unknown congestion method, then the sysctl attempt will fail.
+If you remove a tcp congestion control module, then you will get the next
+available one. Since reno can not be built as a module, and can not be
+deleted, it will always be available.
+How the new TCP output machine [nyi] works.
+===========================================
 Data is kept on a single queue. The skb->users flag tells us if the frame is
 one that has been queued already. To add a frame we throw it on the end. Ack
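
Reviewer note on the "Congestion control" variables: the relationship between
snd_cwnd, snd_ssthresh, snd_cwnd_cnt and snd_cwnd_clamp described in the new
tcp.txt text can be illustrated with a small userspace model. This is only a
sketch under stated assumptions, not the kernel code: struct model_cwnd and
model_reno_on_ack() are hypothetical names, the struct only mirrors the field
names from the patch, and the growth rule is the textbook Reno behaviour (one
segment per ACK in slow start, roughly one segment per window of ACKs
afterwards).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the tcp_sock fields named in tcp.txt. */
struct model_cwnd {
	unsigned int snd_cwnd;       /* congestion window, in segments */
	unsigned int snd_ssthresh;   /* slow start threshold */
	unsigned int snd_cwnd_cnt;   /* ACK counter used above the threshold */
	unsigned int snd_cwnd_clamp; /* upper bound for snd_cwnd */
};

/* Hypothetical helper: account for one ACK using a simplified Reno rule. */
static void model_reno_on_ack(struct model_cwnd *s)
{
	assert(s != NULL);

	if (s->snd_cwnd < s->snd_ssthresh) {
		/* Slow start: grow by one segment per ACK, up to the clamp. */
		if (s->snd_cwnd < s->snd_cwnd_clamp)
			s->snd_cwnd++;
		return;
	}

	/*
	 * Past the threshold: count ACKs and grow by one segment only
	 * after a full window's worth of them has arrived.
	 */
	if (++s->snd_cwnd_cnt >= s->snd_cwnd) {
		if (s->snd_cwnd < s->snd_cwnd_clamp)
			s->snd_cwnd++;
		s->snd_cwnd_cnt = 0;
	}
}
```

In this model, starting from snd_cwnd = 2 with snd_ssthresh = 4, two ACKs bring
the window to 4, and four further ACKs are then needed to reach 5. That is the
"slow down the rate of increase" role the patch assigns to snd_cwnd_cnt.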

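Reviewer note on registration order: the LIFO default and the "next available
one" fallback described in the new tcp.txt text can be sketched as a linked
list where registration pushes onto the head. Everything below is a
hypothetical userspace model: struct model_cong_ops only borrows the member
names the patch lists as mandatory (name, ssthresh, cong_avoid, min_cwnd), the
callback signatures are simplified placeholders rather than the kernel's, and
none of the model_* functions exist in tcp_cong.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for tcp_congestion_ops. */
struct model_cong_ops {
	const char *name;
	unsigned int (*ssthresh)(unsigned int cwnd);
	void (*cong_avoid)(unsigned int *cwnd);
	unsigned int (*min_cwnd)(unsigned int cwnd);
	struct model_cong_ops *next;
};

/* Head of the list; the head entry is treated as the default. */
static struct model_cong_ops *model_list;

/* Reject entries missing any of the four mandatory members. */
static int model_register(struct model_cong_ops *ca)
{
	if (!ca->name || !ca->ssthresh || !ca->cong_avoid || !ca->min_cwnd)
		return -1;
	ca->next = model_list;  /* push on the head: last registered wins */
	model_list = ca;
	return 0;
}

/* Unlink an entry; whatever follows it becomes reachable in its place. */
static void model_unregister(struct model_cong_ops *ca)
{
	struct model_cong_ops **pp;

	for (pp = &model_list; *pp; pp = &(*pp)->next) {
		if (*pp == ca) {
			*pp = ca->next;
			ca->next = NULL;
			return;
		}
	}
}

static const char *model_default_name(void)
{
	return model_list ? model_list->name : NULL;
}

/* Placeholder callbacks, only here so the entries pass validation. */
static unsigned int model_half_ssthresh(unsigned int cwnd)
{
	return cwnd / 2 > 2 ? cwnd / 2 : 2;
}

static void model_grow(unsigned int *cwnd)
{
	(*cwnd)++;
}

static unsigned int model_min_cwnd(unsigned int cwnd)
{
	return cwnd / 2;
}

static struct model_cong_ops model_reno = {
	.name = "reno", .ssthresh = model_half_ssthresh,
	.cong_avoid = model_grow, .min_cwnd = model_min_cwnd,
};

static struct model_cong_ops model_bic = {
	.name = "bic", .ssthresh = model_half_ssthresh,
	.cong_avoid = model_grow, .min_cwnd = model_min_cwnd,
};
```

Registering model_reno first and model_bic second makes "bic" the default in
this model; unregistering model_bic falls back to "reno". This matches the
behaviour the patch describes for a builtin BIC and for module removal.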