author | Eric Dumazet <eric.dumazet@gmail.com> | 2012-07-11 01:50:31 -0400 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2012-07-11 21:12:59 -0400 |
commit | 46d3ceabd8d98ed0ad10f20c595ca784e34786c5 (patch) | |
tree | 771200292431be56c6ebcb23af9206bc03d40e65 /Documentation/networking/ip-sysctl.txt | |
parent | 2100844ca9d7055d5cddce2f8ed13af94c01f85b (diff) |
tcp: TCP Small Queues
This introduces TSQ (TCP Small Queues).
The goal of TSQ is to reduce the number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.
sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.
TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.
As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.
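
The halving rule described above can be checked with a few lines of arithmetic. This is a minimal sketch of the sizing rule as stated in this message, not the kernel code; the function name `tso_size_cap` is made up for illustration.

```python
GSO_MAX_SIZE = 65536  # "standard gso max limit" quoted in the commit message


def tso_size_cap(limit_bytes: int, gso_max: int = GSO_MAX_SIZE) -> int:
    """Simplified model: a TSO packet is capped to half the TSQ limit,
    and never exceeds the usual GSO maximum."""
    return min(gso_max, limit_bytes // 2)


# Default limit from the patch, then the 40000 example from the message.
for limit in (131072, 40000):
    print(limit, "->", tso_size_cap(limit))
```

With the default limit the cap equals the usual GSO maximum, so TSO sizing is unchanged; only a smaller limit shrinks individual TSO packets.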
This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.
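
A toy model may help picture the throttle-and-resume cycle described so far. It is only a sketch under simplifying assumptions: `ToySocket`, `throttled` and `pending` are invented names, packet sizes stand in for skbs, and the resume happens inline, whereas the patch defers it to a tasklet.

```python
class ToySocket:
    """Toy model of TSQ throttling; this is not kernel code."""

    def __init__(self, limit_bytes=131072):
        self.limit = limit_bytes
        self.wmem_alloc = 0     # bytes currently charged to qdisc/device queues
        self.throttled = False  # hypothetical flag: sender stopped by the limit
        self.pending = []       # sizes of packets not yet handed to the qdisc

    def write_xmit(self):
        """Queue pending packets until the limit is reached."""
        while self.pending:
            if self.wmem_alloc >= self.limit:
                self.throttled = True
                return
            self.wmem_alloc += self.pending.pop(0)

    def wfree(self, size):
        """Stand-in for the destructor: uncharge the packet, then resume
        sending if the socket had been throttled (inline here for brevity)."""
        self.wmem_alloc -= size
        if self.throttled:
            self.throttled = False
            self.write_xmit()


sock = ToySocket(limit_bytes=40000)
sock.pending = [20000] * 5
sock.write_xmit()   # two packets fit, then the socket is throttled
sock.wfree(20000)   # one packet completes, so one more is queued
```

At any time the toy socket has at most two 20000-byte packets charged, which mirrors the "two TSO packets in flight" goal stated above.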
Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.
Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)
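
As a rough cross-check of these figures, the time needed to drain a full per-socket limit at line rate can be computed directly. This is a back-of-envelope sketch, not a measurement: `drain_time_ms` is an illustrative helper, and it ignores framing and per-packet memory overhead, so it only gives the order of magnitude.

```python
def drain_time_ms(queued_bytes: int, link_bits_per_sec: float) -> float:
    """Time to serialize queued_bytes onto the wire at line rate, in ms."""
    return queued_bytes * 8 / link_bits_per_sec * 1000.0


LIMIT = 131072  # default tcp_limit_output_bytes from the patch

for name, rate in (("1 Gbit", 1e9), ("100 Mbit", 1e8)):
    print(f"{name}: {drain_time_ms(LIMIT, rate):.2f} ms")
```

The results come out at roughly 1 ms and 10 ms, the same order as the measured values quoted above.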
I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and socket autotuning on both sides no longer uses 4 Mbytes.
As the skb destructor cannot restart xmit itself (as the qdisc lock might be
taken at this point), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.
If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.
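
The deferral logic in the last two paragraphs can be sketched as a small simulation. It is a sketch under assumptions, not the kernel implementation: `Sock`, `ToyTasklet`, `tsq_owned` and `release` are illustrative stand-ins for the socket, the per-cpu tasklet, the TSQ_OWNED flag and release_sock().

```python
from collections import deque


class Sock:
    def __init__(self, name):
        self.name = name
        self.owned_by_user = False  # True while a process holds the socket
        self.tsq_owned = False      # stand-in for the TSQ_OWNED flag
        self.sent = 0               # counts calls to the toy transmit path

    def write_xmit(self):
        self.sent += 1

    def release(self):
        """Stand-in for release_sock(): run the deferred send, then unlock."""
        if self.tsq_owned:
            self.tsq_owned = False
            self.write_xmit()
        self.owned_by_user = False


class ToyTasklet:
    """The patch uses one tasklet per cpu; a single queue is modelled here."""

    def __init__(self):
        self.queue = deque()

    def schedule(self, sk):
        # Called from the skb destructor, which must not transmit itself.
        self.queue.append(sk)

    def run(self):
        while self.queue:
            sk = self.queue.popleft()
            if sk.owned_by_user:
                sk.tsq_owned = True  # leave the work to release()
            else:
                sk.write_xmit()


tasklet = ToyTasklet()
idle, busy = Sock("idle"), Sock("busy")
busy.owned_by_user = True
tasklet.schedule(idle)
tasklet.schedule(busy)
tasklet.run()    # idle sends now; busy is only flagged
busy.release()   # the flagged socket sends when the user releases it
```

The point of the flag is that the tasklet never transmits on a socket the user currently owns; the send is postponed to the moment the socket is released.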
[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
but some drivers call it in their start_xmit() handler.
These drivers should at least use BQL, or else a single TCP
session can still fill the whole NIC TX ring, since TSQ will
have no effect.
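
For footnote [1], a short script can read the current value of the tunable on a running system. It is a minimal sketch: the path is the one given in [1], the helper name `read_tsq_limit` is made up, and the function returns None where the file is missing (for example on a kernel without this patch, or on a non-Linux host).

```python
from pathlib import Path

# Path taken from footnote [1] of the commit message.
TSQ_SYSCTL = Path("/proc/sys/net/ipv4/tcp_limit_output_bytes")


def read_tsq_limit(path: Path = TSQ_SYSCTL):
    """Return the TSQ limit in bytes, or None if the file cannot be read."""
    try:
        return int(path.read_text().strip())
    except OSError:
        return None


limit = read_tsq_limit()
if limit is None:
    print("tcp_limit_output_bytes is not available on this system")
else:
    print(f"tcp_limit_output_bytes = {limit} bytes")
```

The value reported depends on the running kernel and on any local tuning, so it need not match the 131072 default introduced here. Changing the limit requires root, typically via `sysctl -w net.ipv4.tcp_limit_output_bytes=<bytes>`.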
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking/ip-sysctl.txt')
-rw-r--r-- | Documentation/networking/ip-sysctl.txt | 14 |
1 files changed, 14 insertions, 0 deletions
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 47b6c79e9b05..e20c17a7d34e 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -551,6 +551,20 @@ tcp_thin_dupack - BOOLEAN
 	Documentation/networking/tcp-thin.txt
 	Default: 0
 
+tcp_limit_output_bytes - INTEGER
+	Controls TCP Small Queue limit per tcp socket.
+	TCP bulk sender tends to increase packets in flight until it
+	gets losses notifications. With SNDBUF autotuning, this can
+	result in a large amount of packets queued in qdisc/device
+	on the local machine, hurting latency of other flows, for
+	typical pfifo_fast qdiscs.
+	tcp_limit_output_bytes limits the number of bytes on qdisc
+	or device to reduce artificial RTT/cwnd and reduce bufferbloat.
+	Note: For GSO/TSO enabled flows, we try to have at least two
+	packets in flight. Reducing tcp_limit_output_bytes might also
+	reduce the size of individual GSO packet (64KB being the max)
+	Default: 131072
+
 UDP variables:
 
 udp_mem - vector of 3 INTEGERs: min, pressure, max