author Eric Dumazet <eric.dumazet@gmail.com> 2012-07-11 01:50:31 -0400
committer David S. Miller <davem@davemloft.net> 2012-07-11 21:12:59 -0400
commit 46d3ceabd8d98ed0ad10f20c595ca784e34786c5 (patch)
tree 771200292431be56c6ebcb23af9206bc03d40e65 /Documentation/networking
parent 2100844ca9d7055d5cddce2f8ed13af94c01f85b (diff)
tcp: TCP Small Queues
This introduces TSQ (TCP Small Queues).

TSQ's goal is to reduce the number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit, allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a given time.

TSO packets are sized/capped to half the limit, so that we have two TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the standard gso max limit (65536) to 40000/2: having smaller TSO packets can help to reduce latencies of high prio packets.

This means we divert sock_wfree() to a tcp_wfree() handler, to queue/send following frames when skb_orphan() [2] is called for the already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive, using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender:
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf session, and socket autotuning on both sides no longer uses 4 MBytes.

As the skb destructor cannot restart xmit itself (as the qdisc lock might be taken at this point), we delegate the work to a tasklet. We use one tasklet per cpu for performance reasons.

If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED flag. This flag is tested in a new protocol method called from release_sock(), to eventually send new segments.

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
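
The sizing rules in the commit message (a per-socket byte limit, with TSO packets capped to half of it) can be illustrated with a small user-space sketch. This is not the kernel code from this commit: the helper names tso_size_cap() and tsq_should_throttle(), the GSO_MAX_SIZE macro, and the choice of ">=" for the comparison are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Standard GSO max limit quoted in the commit message. */
#define GSO_MAX_SIZE 65536u

/*
 * Hypothetical helper: largest TSO packet allowed for a given
 * tcp_limit_output_bytes value. The packet is capped to half the
 * limit (so two packets can be in flight) and never exceeds the
 * GSO maximum.
 */
static unsigned int tso_size_cap(unsigned int limit_output_bytes)
{
	unsigned int half = limit_output_bytes / 2;

	return half < GSO_MAX_SIZE ? half : GSO_MAX_SIZE;
}

/*
 * Hypothetical helper: true when the bytes already queued for this
 * socket in the qdisc/device layers (what sk->sk_wmem_alloc tracks)
 * have reached the limit, so the sender should stop queueing more.
 * Whether the real check uses ">=" or ">" is an assumption here.
 */
static bool tsq_should_throttle(unsigned int wmem_alloc,
				unsigned int limit_output_bytes)
{
	return wmem_alloc >= limit_output_bytes;
}
```

With the default limit of 131072 the cap equals the GSO maximum (65536), while a limit of 40000 gives a cap of 20000, matching the 40000/2 example in the message.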
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- Documentation/networking/ip-sysctl.txt 14
1 files changed, 14 insertions, 0 deletions
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 47b6c79e9b05..e20c17a7d34e 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -551,6 +551,20 @@ tcp_thin_dupack - BOOLEAN
 	Documentation/networking/tcp-thin.txt
 	Default: 0
 
+tcp_limit_output_bytes - INTEGER
+	Controls TCP Small Queue limit per tcp socket.
+	TCP bulk sender tends to increase packets in flight until it
+	gets losses notifications. With SNDBUF autotuning, this can
+	result in a large amount of packets queued in qdisc/device
+	on the local machine, hurting latency of other flows, for
+	typical pfifo_fast qdiscs.
+	tcp_limit_output_bytes limits the number of bytes on qdisc
+	or device to reduce artificial RTT/cwnd and reduce bufferbloat.
+	Note: For GSO/TSO enabled flows, we try to have at least two
+	packets in flight. Reducing tcp_limit_output_bytes might also
+	reduce the size of individual GSO packet (64KB being the max)
+	Default: 131072
+
 UDP variables:
 
 udp_mem - vector of 3 INTEGERs: min, pressure, max
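
For checking the value of the new tunable from a program, a minimal user-space sketch is shown below. The path comes from footnote [1] of the commit message; the helper name read_limit_output_bytes() and the TSQ_SYSCTL_PATH macro are hypothetical, and the sketch assumes the file holds a single decimal integer, as the INTEGER type in the documentation suggests.

```c
#include <stdio.h>

/* Path of the tunable, as given in the commit message. */
#define TSQ_SYSCTL_PATH "/proc/sys/net/ipv4/tcp_limit_output_bytes"

/*
 * Hypothetical helper: read one decimal integer from 'path' into
 * '*value'. Returns 0 on success, -1 if the file cannot be opened
 * or does not start with an integer.
 */
static int read_limit_output_bytes(const char *path, long *value)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;

	parsed = (fscanf(f, "%ld", value) == 1);
	fclose(f);

	return parsed ? 0 : -1;
}
```

Calling read_limit_output_bytes(TSQ_SYSCTL_PATH, &limit) on a kernel that carries this patch should yield 131072 unless the administrator has changed the setting; on kernels without the tunable the call returns -1.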