From 9b99b7e90bf5cd9a88823c49574ba80d92a98262 Mon Sep 17 00:00:00 2001 From: Paolo Valente Date: Tue, 5 Mar 2013 08:04:57 +0000 Subject: pkt_sched: sch_qfq: properly cap timestamps in charge_actual_service QFQ+ schedules the active aggregates in a group using a bucket list (one list per group). The bucket in which each aggregate is inserted depends on the aggregate's timestamps, and the number of buckets in a group is enough to accomodate the possible (range of) values of the timestamps of all the aggregates in the group. For this property to hold, timestamps must however be computed correctly. One necessary condition for computing timestamps correctly is that the number of bits dequeued for each aggregate, while the aggregate is in service, does not exceed the maximum budget budgetmax assigned to the aggregate. For each aggregate, budgetmax is proportional to the number of classes in the aggregate. If the number of classes of the aggregate is decreased through qfq_change_class(), then budgetmax is decreased automatically as well. Problems may occur if the aggregate is in service when budgetmax is decreased, because the current remaining budget of the aggregate and/or the service already received by the aggregate may happen to be larger than the new value of budgetmax. In this case, when the aggregate is eventually deselected and its timestamps are updated, the aggregate may happen to have received an amount of service larger than budgetmax. This may cause the aggregate to be assigned a higher virtual finish time than the maximum acceptable value for the last bucket in the bucket list of the group. This fix introduces a cap that addresses this issue. Signed-off-by: Paolo Valente Reviewed-by: Fabio Checconi Signed-off-by: David S. Miller --- net/sched/sch_qfq.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) (limited to 'net/sched') diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index e9a77f621c3d..489edd929932 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -298,6 +298,10 @@ static void qfq_update_agg(struct qfq_sched *q, struct qfq_aggregate *agg, new_num_classes == q->max_agg_classes - 1) /* agg no more full */ hlist_add_head(&agg->nonfull_next, &q->nonfull_aggs); + /* The next assignment may let + * agg->initial_budget > agg->budgetmax + * hold, we will take it into account in charge_actual_service(). + */ agg->budgetmax = new_num_classes * agg->lmax; new_agg_weight = agg->class_weight * new_num_classes; agg->inv_w = ONE_FP/new_agg_weight; @@ -988,8 +992,13 @@ static inline struct sk_buff *qfq_peek_skb(struct qfq_aggregate *agg, /* Update F according to the actual service received by the aggregate. */ static inline void charge_actual_service(struct qfq_aggregate *agg) { - /* compute the service received by the aggregate */ - u32 service_received = agg->initial_budget - agg->budget; + /* Compute the service received by the aggregate, taking into + * account that, after decreasing the number of classes in + * agg, it may happen that + * agg->initial_budget - agg->budget > agg->bugdetmax + */ + u32 service_received = min(agg->budgetmax, + agg->initial_budget - agg->budget); agg->F = agg->S + (u64)service_received * agg->inv_w; } -- cgit v1.2.2 From 624b85fb96206879c8146abdba071222bbd25259 Mon Sep 17 00:00:00 2001 From: Paolo Valente Date: Tue, 5 Mar 2013 08:04:58 +0000 Subject: pkt_sched: sch_qfq: fix the update of eligible-group sets Between two invocations of make_eligible, the system virtual time may happen to grow enough that, in its binary representation, a bit with higher order than 31 flips. This happens especially with TSO/GSO. Before this fix, the mask used in make_eligible was computed as (1UL< 31. The fix just replaces 1UL with 1ULL. Signed-off-by: Paolo Valente Reviewed-by: Fabio Checconi Signed-off-by: David S. Miller --- net/sched/sch_qfq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/sched') diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index 489edd929932..39d85cc0db26 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -821,7 +821,7 @@ static void qfq_make_eligible(struct qfq_sched *q) unsigned long old_vslot = q->oldV >> q->min_slot_shift; if (vslot != old_vslot) { - unsigned long mask = (1UL << fls(vslot ^ old_vslot)) - 1; + unsigned long mask = (1ULL << fls(vslot ^ old_vslot)) - 1; qfq_move_groups(q, mask, IR, ER); qfq_move_groups(q, mask, IB, EB); } -- cgit v1.2.2 From 2f3b89a1fe0823fceb544856c9eeb036a75ff091 Mon Sep 17 00:00:00 2001 From: Paolo Valente Date: Tue, 5 Mar 2013 08:04:59 +0000 Subject: pkt_sched: sch_qfq: serve activated aggregates immediately if the scheduler is empty If no aggregate is in service, then the function qfq_dequeue() does not dequeue any packet. For this reason, to guarantee QFQ+ to be work conserving, a just-activated aggregate must be set as in service immediately if it happens to be the only active aggregate. This is done by the function qfq_enqueue(). Unfortunately, the function qfq_add_to_agg(), used to add a class to an aggregate, does not perform this important additional operation. In particular, if: 1) qfq_add_to_agg() is invoked to complete the move of a class from a source aggregate, becoming, for this move, inactive, to a destination aggregate, becoming instead active, and 2) the destination aggregate becomes the only active aggregate, then this aggregate is not however set as in service. QFQ+ remains then in a non-work-conserving state until a new invocation of qfq_enqueue() recovers the situation. This fix solves the problem by moving the logic for setting an aggregate as in service directly into the function qfq_activate_agg(). Hence, from whatever point qfq_activate_aggregate() is invoked, QFQ+ remains work conserving. Since the more-complex logic of this new version of activate_aggregate() is not necessary, in qfq_dequeue(), to reschedule an aggregate that finishes its budget, then the aggregate is now rescheduled by invoking directly the functions needed. Signed-off-by: Paolo Valente Reviewed-by: Fabio Checconi Signed-off-by: David S. Miller --- net/sched/sch_qfq.c | 36 ++++++++++++++++++++++-------------- 1 file changed, 22 insertions(+), 14 deletions(-) (limited to 'net/sched') diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index 39d85cc0db26..8c5172c7f193 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -1003,6 +1003,12 @@ static inline void charge_actual_service(struct qfq_aggregate *agg) agg->F = agg->S + (u64)service_received * agg->inv_w; } +static inline void qfq_update_agg_ts(struct qfq_sched *q, + struct qfq_aggregate *agg, + enum update_reason reason); + +static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg); + static struct sk_buff *qfq_dequeue(struct Qdisc *sch) { struct qfq_sched *q = qdisc_priv(sch); @@ -1030,7 +1036,7 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch) in_serv_agg->initial_budget = in_serv_agg->budget = in_serv_agg->budgetmax; - if (!list_empty(&in_serv_agg->active)) + if (!list_empty(&in_serv_agg->active)) { /* * Still active: reschedule for * service. Possible optimization: if no other @@ -1041,8 +1047,9 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch) * handle it, we would need to maintain an * extra num_active_aggs field. */ - qfq_activate_agg(q, in_serv_agg, requeue); - else if (sch->q.qlen == 0) { /* no aggregate to serve */ + qfq_update_agg_ts(q, in_serv_agg, requeue); + qfq_schedule_agg(q, in_serv_agg); + } else if (sch->q.qlen == 0) { /* no aggregate to serve */ q->in_serv_agg = NULL; return NULL; } @@ -1226,17 +1233,11 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) cl->deficit = agg->lmax; list_add_tail(&cl->alist, &agg->active); - if (list_first_entry(&agg->active, struct qfq_class, alist) != cl) - return err; /* aggregate was not empty, nothing else to do */ - - /* recharge budget */ - agg->initial_budget = agg->budget = agg->budgetmax; + if (list_first_entry(&agg->active, struct qfq_class, alist) != cl || + q->in_serv_agg == agg) + return err; /* non-empty or in service, nothing else to do */ - qfq_update_agg_ts(q, agg, enqueue); - if (q->in_serv_agg == NULL) - q->in_serv_agg = agg; - else if (agg != q->in_serv_agg) - qfq_schedule_agg(q, agg); + qfq_activate_agg(q, agg, enqueue); return err; } @@ -1293,8 +1294,15 @@ skip_update: static void qfq_activate_agg(struct qfq_sched *q, struct qfq_aggregate *agg, enum update_reason reason) { + agg->initial_budget = agg->budget = agg->budgetmax; /* recharge budg. */ + qfq_update_agg_ts(q, agg, reason); - qfq_schedule_agg(q, agg); + if (q->in_serv_agg == NULL) { /* no aggr. in service or scheduled */ + q->in_serv_agg = agg; /* start serving this aggregate */ + /* update V: to be in service, agg must be eligible */ + q->oldV = q->V = agg->S; + } else if (agg != q->in_serv_agg) + qfq_schedule_agg(q, agg); } static void qfq_slot_remove(struct qfq_sched *q, struct qfq_group *grp, -- cgit v1.2.2 From a0143efa96671dc51dab9bba776a66f9bfa1757f Mon Sep 17 00:00:00 2001 From: Paolo Valente Date: Tue, 5 Mar 2013 08:05:00 +0000 Subject: pkt_sched: sch_qfq: prevent budget from wrapping around after a dequeue Aggregate budgets are computed so as to guarantee that, after an aggregate has been selected for service, that aggregate has enough budget to serve at least one maximum-size packet for the classes it contains. For this reason, after a new aggregate has been selected for service, its next packet is immediately dequeued, without any further control. The maximum packet size for a class, lmax, can be changed through qfq_change_class(). In case the user sets lmax to a lower value than the the size of some of the still-to-arrive packets, QFQ+ will automatically push up lmax as it enqueues these packets. This automatic push up is likely to happen with TSO/GSO. In any case, if lmax is assigned a lower value than the size of some of the packets already enqueued for the class, then the following problem may occur: the size of the next packet to dequeue for the class may happen to be larger than lmax, after the aggregate to which the class belongs has been just selected for service. In this case, even the budget of the aggregate, which is an unsigned value, may be lower than the size of the next packet to dequeue. After dequeueing this packet and subtracting its size from the budget, the latter would wrap around. This fix prevents the budget from wrapping around after any packet dequeue. Signed-off-by: Paolo Valente Reviewed-by: Fabio Checconi Signed-off-by: David S. Miller --- net/sched/sch_qfq.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'net/sched') diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index 8c5172c7f193..c34af93ddfe2 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -1068,7 +1068,15 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch) qdisc_bstats_update(sch, skb); agg_dequeue(in_serv_agg, cl, len); - in_serv_agg->budget -= len; + /* If lmax is lowered, through qfq_change_class, for a class + * owning pending packets with larger size than the new value + * of lmax, then the following condition may hold. + */ + if (unlikely(in_serv_agg->budget < len)) + in_serv_agg->budget = 0; + else + in_serv_agg->budget -= len; + q->V += (u64)len * IWSUM; pr_debug("qfq dequeue: len %u F %lld now %lld\n", len, (unsigned long long) in_serv_agg->F, -- cgit v1.2.2 From 40dd2d546198e7bbb8d3fe718957b158caa3fe52 Mon Sep 17 00:00:00 2001 From: Paolo Valente Date: Tue, 5 Mar 2013 08:05:01 +0000 Subject: pkt_sched: sch_qfq: do not allow virtual time to jump if an aggregate is in service By definition of (the algorithm of) QFQ+, the system virtual time must be pushed up only if there is no 'eligible' aggregate, i.e. no aggregate that would have started to be served also in the ideal system emulated by QFQ+. QFQ+ serves only eligible aggregates, hence the aggregate currently in service is eligible. As a consequence, to decide whether there is no eligible aggregate, QFQ+ must also check whether there is no aggregate in service. Signed-off-by: Paolo Valente Reviewed-by: Fabio Checconi Signed-off-by: David S. Miller --- net/sched/sch_qfq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'net/sched') diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index c34af93ddfe2..6b7ce803d07c 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -1279,7 +1279,8 @@ static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg) /* group was surely ineligible, remove */ __clear_bit(grp->index, &q->bitmaps[IR]); __clear_bit(grp->index, &q->bitmaps[IB]); - } else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V)) + } else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V) && + q->in_serv_agg == NULL) q->V = roundedS; grp->S = roundedS; -- cgit v1.2.2 From 76e4cb0d3a583900f844f1704b19b7e8c5df8837 Mon Sep 17 00:00:00 2001 From: Paolo Valente Date: Tue, 5 Mar 2013 08:05:02 +0000 Subject: pkt_sched: sch_qfq: remove a useless invocation of qfq_update_eligible QFQ+ can select for service only 'eligible' aggregates, i.e., aggregates that would have started to be served also in the emulated ideal system. As a consequence, for QFQ+ to be work conserving, at least one of the active aggregates must be eligible when it is time to choose the next aggregate to serve. The set of eligible aggregates is updated through the function qfq_update_eligible(), which does guarantee that, after its invocation, at least one of the active aggregates is eligible. Because of this property, this function is invoked in qfq_deactivate_agg() to guarantee that at least one of the active aggregates is still eligible after an aggregate has been deactivated. In particular, the critical case is when there are other active aggregates, but the aggregate being deactivated happens to be the only one eligible. However, this precaution is not needed for QFQ+ to be work conserving, because update_eligible() is always invoked also at the beginning of qfq_choose_next_agg(). This patch removes the additional invocation of update_eligible() in qfq_deactivate_agg(). Signed-off-by: Paolo Valente Reviewed-by: Fabio Checconi Signed-off-by: David S. Miller --- net/sched/sch_qfq.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'net/sched') diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index 6b7ce803d07c..d51852bba01c 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -1383,8 +1383,6 @@ static void qfq_deactivate_agg(struct qfq_sched *q, struct qfq_aggregate *agg) __set_bit(grp->index, &q->bitmaps[s]); } } - - qfq_update_eligible(q); } static void qfq_qlen_notify(struct Qdisc *sch, unsigned long arg) -- cgit v1.2.2