diff options
author | Eric Dumazet <eric.dumazet@gmail.com> | 2009-06-11 05:55:43 -0400 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2009-06-11 05:55:43 -0400 |
commit | 2b85a34e911bf483c27cfdd124aeb1605145dc80 (patch) | |
tree | 3cea3e8a27b62de2f92e759641c27200d8bde421 /net/ipv4/ip_output.c | |
parent | f2333a014c1e13ac8e1b73a6fd77731c524eff78 (diff) |
net: No more expensive sock_hold()/sock_put() on each tx
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.
This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.
We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.
We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.
As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.
skb_set_owner_w() doesnt call sock_hold() anymore
sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.
Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.
Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/ipv4/ip_output.c')
-rw-r--r-- | net/ipv4/ip_output.c | 1 |
1 files changed, 0 insertions, 1 deletions
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 9248d2807ba6..247026282669 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c | |||
@@ -498,7 +498,6 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) | |||
498 | 498 | ||
499 | BUG_ON(frag->sk); | 499 | BUG_ON(frag->sk); |
500 | if (skb->sk) { | 500 | if (skb->sk) { |
501 | sock_hold(skb->sk); | ||
502 | frag->sk = skb->sk; | 501 | frag->sk = skb->sk; |
503 | frag->destructor = sock_wfree; | 502 | frag->destructor = sock_wfree; |
504 | truesizes += frag->truesize; | 503 | truesizes += frag->truesize; |