author    Nik Unger <njunger@uwaterloo.ca>    2017-03-13 13:16:58 -0400
committer David S. Miller <davem@davemloft.net>    2017-03-16 23:14:06 -0400
commit    5080f39e8c72e01cf37e8359023e7018e2a4901e (patch)
tree      853526eed80a9df6b650ccc92adfd99b5334b08b /net/sched
parent    cd918afd023a7c213a5b0a397c94c91c96b36e35 (diff)
netem: apply correct delay when rate throttling
I recently reported on the netem list that iperf network benchmarks show unexpected results when a bandwidth throttling rate has been configured for netem. Specifically:

1) The measured link bandwidth *increases* when a higher delay is added
2) The measured link bandwidth appears higher than the specified limit
3) The measured link bandwidth for the same very slow settings varies significantly across machines

The issue can be reproduced by using tc to configure netem with a 512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a veth pair between network namespaces, and then using iperf (or any other network benchmarking tool) to test throughput. Complete detailed instructions are in the original email chain here:
https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html

There appear to be two underlying bugs causing these effects:

- The first issue causes long delays when the rate is slow and no delay is configured (e.g., "rate 512kbit"). This is because SKBs are not orphaned when no delay is configured, so orphaning does not occur until *after* the rate-induced delay has been applied. For this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us") dramatically increases the measured bandwidth.

- The second issue is that rate-induced delays are not correctly applied, allowing SKB delays to occur in parallel. The intended approach is to compute the delay for an SKB and to add this delay to the end of the current queue. However, the code does not detect existing SKBs in the queue due to improperly testing sch->q.qlen, which is nonzero even when packets exist only in the rbtree. Consequently, new SKBs do not wait for the current queue to empty. When packet delays vary significantly (e.g., if packet sizes are different), then this also causes unintended reordering.

I modified the code to expect a delay (and orphan the SKB) when a rate is configured. I also added some defensive tests that correctly find the latest scheduled delivery time, even if it is (unexpectedly) for a packet in sch->q.

I have tested these changes on the latest kernel (4.11.0-rc1+) and the iperf / ping test results are as expected.

Signed-off-by: Nik Unger <njunger@uwaterloo.ca>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/sched')
-rw-r--r--	net/sched/sch_netem.c	26
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index c8bb62a1e744..94b4928ad413 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -462,7 +462,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	/* If a delay is expected, orphan the skb. (orphaning usually takes
 	 * place at TX completion time, so _before_ the link transit delay)
 	 */
-	if (q->latency || q->jitter)
+	if (q->latency || q->jitter || q->rate)
 		skb_orphan_partial(skb);
 
 	/*
@@ -530,21 +530,31 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		now = psched_get_time();
 
 		if (q->rate) {
-			struct sk_buff *last;
+			struct netem_skb_cb *last = NULL;
+
+			if (sch->q.tail)
+				last = netem_skb_cb(sch->q.tail);
+			if (q->t_root.rb_node) {
+				struct sk_buff *t_skb;
+				struct netem_skb_cb *t_last;
+
+				t_skb = netem_rb_to_skb(rb_last(&q->t_root));
+				t_last = netem_skb_cb(t_skb);
+				if (!last ||
+				    t_last->time_to_send > last->time_to_send) {
+					last = t_last;
+				}
+			}
 
-			if (sch->q.qlen)
-				last = sch->q.tail;
-			else
-				last = netem_rb_to_skb(rb_last(&q->t_root));
 			if (last) {
 				/*
 				 * Last packet in queue is reference point (now),
 				 * calculate this time bonus and subtract
 				 * from delay.
 				 */
-				delay -= netem_skb_cb(last)->time_to_send - now;
+				delay -= last->time_to_send - now;
 				delay = max_t(psched_tdiff_t, 0, delay);
-				now = netem_skb_cb(last)->time_to_send;
+				now = last->time_to_send;
 			}
 
 			delay += packet_len_2_sched_time(qdisc_pkt_len(skb), q);