author: Neal Cardwell <ncardwell@google.com> 2015-02-06 16:04:38 -0500
committer: David S. Miller <davem@davemloft.net> 2015-02-08 04:03:12 -0500
commit: 032ee4236954eb214651cb9bfc1b38ffa8fd7a01
tree: df165996666757322162c263cebcec8fe3c93d1a /Documentation/networking
parent: ca539345f8767cca221b5aa77bf4329c725d0d7e
tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks

Helpers for mitigating ACK loops by rate-limiting dupacks sent in
response to incoming out-of-window packets.

This patch includes:
  - rate-limiting logic
  - sysctl to control how often we allow dupacks to out-of-window packets
  - SNMP counter for cases where we rate-limited our dupack sending

The rate-limiting logic in this patch decides to not send dupacks in
response to out-of-window segments if (a) they are SYNs or pure ACKs
and (b) the remote endpoint is sending them faster than the configured
rate limit.

We rate-limit our responses rather than blocking them entirely or
resetting the connection, because legitimate connections can rely on
dupacks in response to some out-of-window segments. For example, zero
window probes are typically sent with a sequence number that is below
the current window, and ZWPs thus expect to elicit a dupack in
response.

We allow dupacks in response to TCP segments with data, because these
may be spurious retransmissions for which the remote endpoint wants to
receive DSACKs. This is safe because segments with data can't
realistically be part of ACK loops, which by their nature consist of
each side sending pure/data-less ACKs to each other.

The dupack interval is controlled by a new sysctl knob,
tcp_invalid_ratelimit, given in milliseconds, in case an administrator
needs to dial this upward in the face of a high-rate DoS attack. The
name and units are chosen to be analogous to the existing knob for
ICMP, icmp_ratelimit.

The default value for tcp_invalid_ratelimit is 500ms, which allows at
most one such dupack per 500ms. This is chosen to be 2x faster than
the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
2.4). We allow the extra 2x factor because network delay variations
can cause packets sent at 1 second intervals to be compressed and
arrive much closer together.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

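The rate-limiting decision described in the commit message can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the code from the patch: the struct oow_state, the function oow_rate_limited, and its parameters are hypothetical names, and time is assumed to be a millisecond tick counter in which 0 means that no dupack has been sent yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; not the kernel's actual structure. */
struct oow_state {
    uint32_t last_dupack_ms; /* when we last sent a dupack for an invalid
                                segment; 0 means "none sent yet" */
    uint32_t suppressed;     /* stand-in for the SNMP counter */
};

/*
 * Returns true if the dupack for an invalid segment should be suppressed.
 * ratelimit_ms plays the role of tcp_invalid_ratelimit (0 disables limiting).
 */
bool oow_rate_limited(struct oow_state *st, uint32_t now_ms,
                      uint32_t ratelimit_ms, bool has_data, bool is_syn)
{
    /* Segments carrying data (and not SYNs) always get a dupack: they may
     * be spurious retransmissions whose sender wants a DSACK. */
    if (has_data && !is_syn)
        return false;

    if (st->last_dupack_ms != 0) {
        /* Signed difference tolerates wraparound of the tick counter. */
        int32_t elapsed = (int32_t)(now_ms - st->last_dupack_ms);

        if (elapsed >= 0 && (uint32_t)elapsed < ratelimit_ms) {
            st->suppressed++;
            return true; /* too soon since the last dupack */
        }
    }

    /* Allowed: remember when this dupack goes out. */
    st->last_dupack_ms = now_ms;
    return false;
}
```

With a 500 ms limit, a pure ACK arriving 200 ms after a dupack was sent is suppressed, while one arriving 500 ms or more later elicits a new dupack. Passing 0 as the limit never suppresses anything, which matches the documented meaning of 0 for the sysctl.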
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- Documentation/networking/ip-sysctl.txt 22
1 files changed, 22 insertions, 0 deletions
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index a5e4c813f17f..1b8c964b0d17 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -290,6 +290,28 @@ tcp_frto - INTEGER
 
 	By default it's enabled with a non-zero value. 0 disables F-RTO.
 
+tcp_invalid_ratelimit - INTEGER
+	Limit the maximal rate for sending duplicate acknowledgments
+	in response to incoming TCP packets that are for an existing
+	connection but that are invalid due to any of these reasons:
+
+	  (a) out-of-window sequence number,
+	  (b) out-of-window acknowledgment number, or
+	  (c) PAWS (Protection Against Wrapped Sequence numbers) check failure
+
+	This can help mitigate simple "ack loop" DoS attacks, wherein
+	a buggy or malicious middlebox or man-in-the-middle can
+	rewrite TCP header fields in manner that causes each endpoint
+	to think that the other is sending invalid TCP segments, thus
+	causing each side to send an unterminating stream of duplicate
+	acknowledgments for invalid segments.
+
+	Using 0 disables rate-limiting of dupacks in response to
+	invalid segments; otherwise this value specifies the minimal
+	space between sending such dupacks, in milliseconds.
+
+	Default: 500 (milliseconds).
+
 tcp_keepalive_time - INTEGER
 	How often TCP sends out keepalive messages when keepalive is enabled.
 	Default: 2hours.
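
The documented knob can be inspected from user space. The following is a minimal sketch, assuming the value is exposed at /proc/sys/net/ipv4/tcp_invalid_ratelimit (the usual location for sysctls documented in ip-sysctl.txt, but an assumption here); the helper names parse_ratelimit_ms and read_ratelimit_ms are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed procfs location of the knob; verify on the target kernel. */
#define RATELIMIT_PATH "/proc/sys/net/ipv4/tcp_invalid_ratelimit"

/*
 * Parse sysctl text such as "500\n" into milliseconds.
 * Returns 0 on success, -1 on malformed or negative input.
 */
int parse_ratelimit_ms(const char *text, long *out_ms)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || value < 0)
        return -1;

    /* Allow trailing whitespace, reject anything else. */
    while (*end == '\n' || *end == ' ' || *end == '\t')
        end++;
    if (*end != '\0')
        return -1;

    *out_ms = value;
    return 0;
}

/* Read the knob from path; returns -1 if it cannot be read or parsed. */
int read_ratelimit_ms(const char *path, long *out_ms)
{
    char buf[32];
    char *line;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line != NULL ? parse_ratelimit_ms(buf, out_ms) : -1;
}
```

A caller would invoke read_ratelimit_ms(RATELIMIT_PATH, &ms) and treat a result of 0 in ms as "rate-limiting disabled"; any other value is the minimum spacing between such dupacks in milliseconds. On kernels without this patch the file is expected to be absent, in which case the function returns -1.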