author	Eric Dumazet <edumazet@google.com>	2014-06-02 08:26:03 -0400
committer	David S. Miller <davem@davemloft.net>	2014-06-02 14:00:41 -0400
commit	73f156a6e8c1074ac6327e0abd1169e95eb66463 (patch)
tree	2c8b222f21784e738c397ba95dee70a8f256ea64 /net/ipv4/inetpeer.c
parent	e067ee336a9d3f038ffa9699c59f2abec3376bf7 (diff)
inetpeer: get rid of ip_id_count
Ideally, we would generate the IP ID using a per-destination-IP
generator.
Linux kernels used the inet_peer cache for this purpose, but this had a huge
cost on servers with MTU discovery disabled.
1) each inet_peer struct consumes 192 bytes
2) inetpeer cache uses a binary tree of inet_peer structs,
with a nominal size of ~66000 elements under load.
3) lookups in this tree are hitting a lot of cache lines, as tree depth
is about 20.
4) If the server deals with many TCP flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one and inserting it in
   the tree with the same initial ip_id_count (cf. secure_ip_id()).
5) We garbage collect inet_peer aggressively.
IP ID generation does not have to be 'perfect'.
The goal is to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.
We simply use an array of generators, and a Jenkins hash using the dst IP
as a key.
ipv6_select_ident() is put back into net/ipv6/ip6_output.c, where it
belongs (it is only used from this file).
secure_ip_id() and secure_ipv6_id() are no longer needed.
Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/ipv4/inetpeer.c')
-rw-r--r--	net/ipv4/inetpeer.c | 18 ------------------
1 file changed, 0 insertions(+), 18 deletions(-)
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index c98cf141f4ed..4ced1b9a97f0 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -26,20 +26,7 @@
  * Theory of operations.
  * We keep one entry for each peer IP address. The nodes contains long-living
  * information about the peer which doesn't depend on routes.
- * At this moment this information consists only of ID field for the next
- * outgoing IP packet. This field is incremented with each packet as encoded
- * in inet_getid() function (include/net/inetpeer.h).
- * At the moment of writing this notes identifier of IP packets is generated
- * to be unpredictable using this code only for packets subjected
- * (actually or potentially) to defragmentation. I.e. DF packets less than
- * PMTU in size when local fragmentation is disabled use a constant ID and do
- * not use this code (see ip_select_ident() in include/net/ip.h).
  *
- * Route cache entries hold references to our nodes.
- * New cache entries get references via lookup by destination IP address in
- * the avl tree. The reference is grabbed only when it's needed i.e. only
- * when we try to output IP packet which needs an unpredictable ID (see
- * __ip_select_ident() in net/ipv4/route.c).
  * Nodes are removed only when reference counter goes to 0.
  * When it's happened the node may be removed when a sufficient amount of
  * time has been passed since its last use. The less-recently-used entry can
@@ -62,7 +49,6 @@
  * refcnt: atomically against modifications on other CPU;
  *   usually under some other lock to prevent node disappearing
  * daddr: unchangeable
- * ip_id_count: atomic value (no lock needed)
  */
 
 static struct kmem_cache *peer_cachep __read_mostly;
@@ -497,10 +483,6 @@ relookup:
 		p->daddr = *daddr;
 		atomic_set(&p->refcnt, 1);
 		atomic_set(&p->rid, 0);
-		atomic_set(&p->ip_id_count,
-			   (daddr->family == AF_INET) ?
-			   secure_ip_id(daddr->addr.a4) :
-			   secure_ipv6_id(daddr->addr.a6));
 		p->metrics[RTAX_LOCK-1] = INETPEER_METRICS_NEW;
 		p->rate_tokens = 0;
 		/* 60*HZ is arbitrary, but chosen enough high so that the first