summaryrefslogtreecommitdiffstats
path: root/block/bfq-iosched.c
diff options
context:
space:
mode:
authorPaolo Valente <paolo.valente@linaro.org>2017-04-12 12:23:11 -0400
committerJens Axboe <axboe@fb.com>2017-04-19 10:30:26 -0400
commitc074170e65995706be78e8c57ed2017c638d5464 (patch)
tree380cebbd58df1c0ecfb033e1b19755cf829e2530 /block/bfq-iosched.c
parentab0e43e9cea047873599bc8041cd6278781fd4e0 (diff)
block, bfq: add more fairness with writes and slow processes
This patch deals with two sources of unfairness, which can also cause high latencies and throughput loss. The first source is related to write requests. Write requests tend to starve read requests, basically because, on one side, writes are slower than reads, whereas, on the other side, storage devices confuse schedulers by deceptively signaling the completion of write requests immediately after receiving them. This patch addresses this issue by just throttling writes. In particular, after a write request is dispatched for a queue, the budget of the queue is decremented by the number of sectors to write, multiplied by an (over)charge coefficient. The value of the coefficient is the result of our tuning with different devices. The second source of unfairness has to do with slowness detection: when the in-service queue is expired, BFQ also controls whether the queue has been "too slow", i.e., has consumed its last-assigned budget at such a low rate that it would have been impossible to consume all of this budget within the maximum time slice T_max (Subsec. 3.5 in [1]). In this case, the queue is always (over)charged the whole budget, to reduce its utilization of the device. Both this overcharge and the slowness-detection criterion may cause unfairness. First, always charging a full budget to a slow queue is too coarse. It is much more accurate, and this patch lets BFQ do so, to charge an amount of service 'equivalent' to the amount of time during which the queue has been in service. As explained in more detail in the comments on the code, this enables BFQ to provide time fairness among slow queues. Secondly, because of ZBR, a queue may be deemed as slow when its associated process is performing I/O on the slowest zones of a disk. However, unless the process is truly too slow, not reducing the disk utilization of the queue is more profitable in terms of disk throughput than the opposite. A similar problem is caused by logical block mapping on non-rotational devices. For this reason, this patch lets a queue be charged time, and not budget, only if the queue has consumed less than 2/3 of its assigned budget. As an additional, important benefit, this tolerance allows BFQ to preserve enough elasticity to still perform bandwidth, and not time, distribution with little unlucky or quasi-sequential processes. Finally, for the same reasons as above, this patch makes slowness detection itself much less harsh: a queue is deemed slow only if it has consumed its budget at less than half of the peak rate. [1] P. Valente and M. Andreolini, "Improving Application Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of the 5th Annual International Systems and Storage Conference (SYSTOR '12), June 2012. Slightly extended version: http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite- results.pdf Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Diffstat (limited to 'block/bfq-iosched.c')
-rw-r--r--block/bfq-iosched.c120
1 files changed, 85 insertions, 35 deletions
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 61d880b90882..dce273b91015 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -753,6 +753,13 @@ static const int bfq_stats_min_budgets = 194;
753/* Default maximum budget values, in sectors and number of requests. */ 753/* Default maximum budget values, in sectors and number of requests. */
754static const int bfq_default_max_budget = 16 * 1024; 754static const int bfq_default_max_budget = 16 * 1024;
755 755
756/*
757 * Async to sync throughput distribution is controlled as follows:
758 * when an async request is served, the entity is charged the number
759 * of sectors of the request, multiplied by the factor below
760 */
761static const int bfq_async_charge_factor = 10;
762
756/* Default timeout values, in jiffies, approximating CFQ defaults. */ 763/* Default timeout values, in jiffies, approximating CFQ defaults. */
757static const int bfq_timeout = HZ / 8; 764static const int bfq_timeout = HZ / 8;
758 765
@@ -1571,22 +1578,52 @@ static void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
1571} 1578}
1572 1579
1573/** 1580/**
1574 * bfq_bfqq_charge_full_budget - set the service to the entity budget. 1581 * bfq_bfqq_charge_time - charge an amount of service equivalent to the length
1582 * of the time interval during which bfqq has been in
1583 * service.
1584 * @bfqd: the device
1575 * @bfqq: the queue that needs a service update. 1585 * @bfqq: the queue that needs a service update.
1586 * @time_ms: the amount of time during which the queue has received service
1576 * 1587 *
1577 * When it's not possible to be fair in the service domain, because 1588 * If a queue does not consume its budget fast enough, then providing
1578 * a queue is not consuming its budget fast enough (the meaning of 1589 * the queue with service fairness may impair throughput, more or less
1579 * fast depends on the timeout parameter), we charge it a full 1590 * severely. For this reason, queues that consume their budget slowly
1580 * budget. In this way we should obtain a sort of time-domain 1591 * are provided with time fairness instead of service fairness. This
1581 * fairness among all the seeky/slow queues. 1592 * goal is achieved through the BFQ scheduling engine, even if such an
1593 * engine works in the service, and not in the time domain. The trick
1594 * is charging these queues with an inflated amount of service, equal
1595 * to the amount of service that they would have received during their
1596 * service slot if they had been fast, i.e., if their requests had
1597 * been dispatched at a rate equal to the estimated peak rate.
1598 *
1599 * It is worth noting that time fairness can cause important
1600 * distortions in terms of bandwidth distribution, on devices with
1601 * internal queueing. The reason is that I/O requests dispatched
1602 * during the service slot of a queue may be served after that service
1603 * slot is finished, and may have a total processing time loosely
1604 * correlated with the duration of the service slot. This is
1605 * especially true for short service slots.
1582 */ 1606 */
1583static void bfq_bfqq_charge_full_budget(struct bfq_queue *bfqq) 1607static void bfq_bfqq_charge_time(struct bfq_data *bfqd, struct bfq_queue *bfqq,
1608 unsigned long time_ms)
1584{ 1609{
1585 struct bfq_entity *entity = &bfqq->entity; 1610 struct bfq_entity *entity = &bfqq->entity;
1611 int tot_serv_to_charge = entity->service;
1612 unsigned int timeout_ms = jiffies_to_msecs(bfq_timeout);
1613
1614 if (time_ms > 0 && time_ms < timeout_ms)
1615 tot_serv_to_charge =
1616 (bfqd->bfq_max_budget * time_ms) / timeout_ms;
1586 1617
1587 bfq_log_bfqq(bfqq->bfqd, bfqq, "charge_full_budget"); 1618 if (tot_serv_to_charge < entity->service)
1619 tot_serv_to_charge = entity->service;
1588 1620
1589 bfq_bfqq_served(bfqq, entity->budget - entity->service); 1621 /* Increase budget to avoid inconsistencies */
1622 if (tot_serv_to_charge > entity->budget)
1623 entity->budget = tot_serv_to_charge;
1624
1625 bfq_bfqq_served(bfqq,
1626 max_t(int, 0, tot_serv_to_charge - entity->service));
1590} 1627}
1591 1628
1592static void bfq_update_fin_time_enqueue(struct bfq_entity *entity, 1629static void bfq_update_fin_time_enqueue(struct bfq_entity *entity,
@@ -3572,10 +3609,14 @@ static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
3572 return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last)); 3609 return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
3573} 3610}
3574 3611
3612/* see the definition of bfq_async_charge_factor for details */
3575static unsigned long bfq_serv_to_charge(struct request *rq, 3613static unsigned long bfq_serv_to_charge(struct request *rq,
3576 struct bfq_queue *bfqq) 3614 struct bfq_queue *bfqq)
3577{ 3615{
3578 return blk_rq_sectors(rq); 3616 if (bfq_bfqq_sync(bfqq))
3617 return blk_rq_sectors(rq);
3618
3619 return blk_rq_sectors(rq) * bfq_async_charge_factor;
3579} 3620}
3580 3621
3581/** 3622/**
@@ -4676,28 +4717,24 @@ static unsigned long bfq_smallest_from_now(void)
4676 * @compensate: if true, compensate for the time spent idling. 4717 * @compensate: if true, compensate for the time spent idling.
4677 * @reason: the reason causing the expiration. 4718 * @reason: the reason causing the expiration.
4678 * 4719 *
4720 * If the process associated with bfqq does slow I/O (e.g., because it
4721 * issues random requests), we charge bfqq with the time it has been
4722 * in service instead of the service it has received (see
4723 * bfq_bfqq_charge_time for details on how this goal is achieved). As
4724 * a consequence, bfqq will typically get higher timestamps upon
4725 * reactivation, and hence it will be rescheduled as if it had
4726 * received more service than what it has actually received. In the
4727 * end, bfqq receives less service in proportion to how slowly its
4728 * associated process consumes its budgets (and hence how seriously it
4729 * tends to lower the throughput). In addition, this time-charging
4730 * strategy guarantees time fairness among slow processes. In
4731 * contrast, if the process associated with bfqq is not slow, we
4732 * charge bfqq exactly with the service it has received.
4679 * 4733 *
4680 * If the process associated with the queue is slow (i.e., seeky), or 4734 * Charging time to the first type of queues and the exact service to
4681 * in case of budget timeout, or, finally, if it is async, we 4735 * the other has the effect of using the WF2Q+ policy to schedule the
4682 * artificially charge it an entire budget (independently of the 4736 * former on a timeslice basis, without violating service domain
4683 * actual service it received). As a consequence, the queue will get 4737 * guarantees among the latter.
4684 * higher timestamps than the correct ones upon reactivation, and
4685 * hence it will be rescheduled as if it had received more service
4686 * than what it actually received. In the end, this class of processes
4687 * will receive less service in proportion to how slowly they consume
4688 * their budgets (and hence how seriously they tend to lower the
4689 * throughput).
4690 *
4691 * In contrast, when a queue expires because it has been idling for
4692 * too much or because it exhausted its budget, we do not touch the
4693 * amount of service it has received. Hence when the queue will be
4694 * reactivated and its timestamps updated, the latter will be in sync
4695 * with the actual service received by the queue until expiration.
4696 *
4697 * Charging a full budget to the first type of queues and the exact
4698 * service to the others has the effect of using the WF2Q+ policy to
4699 * schedule the former on a timeslice basis, without violating the
4700 * service domain guarantees of the latter.
4701 */ 4738 */
4702static void bfq_bfqq_expire(struct bfq_data *bfqd, 4739static void bfq_bfqq_expire(struct bfq_data *bfqd,
4703 struct bfq_queue *bfqq, 4740 struct bfq_queue *bfqq,
@@ -4715,11 +4752,24 @@ static void bfq_bfqq_expire(struct bfq_data *bfqd,
4715 slow = bfq_bfqq_is_slow(bfqd, bfqq, compensate, reason, &delta); 4752 slow = bfq_bfqq_is_slow(bfqd, bfqq, compensate, reason, &delta);
4716 4753
4717 /* 4754 /*
4718 * As above explained, 'punish' slow (i.e., seeky), timed-out 4755 * As above explained, charge slow (typically seeky) and
4719 * and async queues, to favor sequential sync workloads. 4756 * timed-out queues with the time and not the service
4757 * received, to favor sequential workloads.
4758 *
4759 * Processes doing I/O in the slower disk zones will tend to
4760 * be slow(er) even if not seeky. Therefore, since the
4761 * estimated peak rate is actually an average over the disk
4762 * surface, these processes may timeout just for bad luck. To
4763 * avoid punishing them, do not charge time to processes that
4764 * succeeded in consuming at least 2/3 of their budget. This
4765 * allows BFQ to preserve enough elasticity to still perform
4766 * bandwidth, and not time, distribution with little unlucky
4767 * or quasi-sequential processes.
4720 */ 4768 */
4721 if (slow || reason == BFQQE_BUDGET_TIMEOUT) 4769 if (slow ||
4722 bfq_bfqq_charge_full_budget(bfqq); 4770 (reason == BFQQE_BUDGET_TIMEOUT &&
4771 bfq_bfqq_budget_left(bfqq) >= entity->budget / 3))
4772 bfq_bfqq_charge_time(bfqd, bfqq, delta);
4723 4773
4724 if (reason == BFQQE_TOO_IDLE && 4774 if (reason == BFQQE_TOO_IDLE &&
4725 entity->service <= 2 * entity->budget / 10) 4775 entity->service <= 2 * entity->budget / 10)