author	Paolo Valente <paolo.valente@linaro.org>	2019-03-12 04:59:29 -0400
committer	Jens Axboe <axboe@kernel.dk>	2019-04-01 10:15:39 -0400
commit	2341d662e9a2a5751ff8ac4ffa640fb493b0ee84 (patch)
tree	9087f602e9181eb0e2fa16016acca3f1adf1b9d1 /block
parent	fb53ac6cd0269987b1b77f957db453b3ec7bf7e4 (diff)
block, bfq: tune service injection basing on request service times
The processes associated with a bfq_queue, say Q, may happen to generate their cumulative I/O at a lower rate than the rate at which the device could serve the same I/O. This is rather probable, e.g., if only one process is associated with Q and the device is an SSD. It results in Q becoming often empty while in service. If BFQ is not allowed to switch to another queue when Q becomes empty, then, during the service of Q, there will be frequent "service holes", i.e., time intervals during which Q gets empty and the device can only consume the I/O already queued in its hardware queues. This easily causes a considerable loss of throughput.

To counter this problem, BFQ implements a request injection mechanism, which tries to fill the above service holes with I/O requests taken from other bfq_queues. The hard part in this mechanism is finding the right amount of I/O to inject, so as to both boost throughput and not break Q's bandwidth and latency guarantees. To this end, the current version of this mechanism measures the bandwidth enjoyed by Q while it is being served, and tries to inject the maximum possible amount of extra service that does not cause Q's bandwidth to decrease too much.

This solution has an important shortcoming: for bandwidth measurements to be stable and reliable, Q must remain in service for a much longer time than that needed to serve a single I/O request. Unfortunately, this does not hold with many workloads.

This commit addresses this issue by changing the way the allowed amount of injection is dynamically computed: it tunes injection as a function of the service times of single I/O requests of Q, instead of Q's bandwidth. Single-request service times are meaningful even if Q gets very few I/O requests completed while it is in service.

As a testbed for this new solution, we measured the throughput reached by BFQ for one of the nastiest workloads and configurations for this scheduler: the workload generated by the dbench test (in the Phoronix suite), with 6 clients, on a filesystem with journaling, and with the journaling daemon enjoying a higher weight than normal processes. With this commit, the throughput grows from ~100 MB/s to ~150 MB/s on a PLEXTOR PX-256M5.

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Francesco Pollicino <fra.fra.800@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
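For illustration, the new tuning rule boils down to a threshold test on the measured total service time of a "first" request. The following user-space sketch is illustrative only and not part of the patch (the names and the standalone form are made up; the actual in-kernel logic is bfq_update_inject_limit() in the diff below):

    #include <stdint.h>

    /*
     * Sketch of the limit-update rule: shrink the limit when injection
     * inflates the total service time of a first request beyond 3/2 of
     * the baseline; otherwise grow it, up to twice the maximum number
     * of requests seen in the drive.
     */
    static void update_inject_limit(uint64_t tot_time_ns, uint64_t base_ns,
                                    unsigned int *limit,
                                    unsigned int max_rq_in_driver)
    {
            uint64_t threshold = (base_ns * 3) >> 1; /* 3/2 * baseline */

            if (tot_time_ns >= threshold && *limit > 0)
                    (*limit)--;     /* injection hurts: back off */
            else if (tot_time_ns < threshold &&
                     *limit < max_rq_in_driver << 1)
                    (*limit)++;     /* no perceivable harm: inject more */
    }

For example, with a baseline of 100 us the threshold is 150 us: a first request completing in 160 us lowers the limit, while one completing in 140 us raises it.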
Diffstat (limited to 'block')
-rw-r--r--	block/bfq-iosched.c | 417
-rw-r--r--	block/bfq-iosched.h |  51
2 files changed, 409 insertions(+), 59 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 2eb587fe7c1a..f59efee7a601 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1721,6 +1721,123 @@ static void bfq_add_request(struct request *rq)
 	bfqq->queued[rq_is_sync(rq)]++;
 	bfqd->queued++;
 
+	if (RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_bfqq_sync(bfqq)) {
+		/*
+		 * Periodically reset inject limit, to make sure that
+		 * the latter eventually drops in case workload
+		 * changes, see step (3) in the comments on
+		 * bfq_update_inject_limit().
+		 */
+		if (time_is_before_eq_jiffies(bfqq->decrease_time_jif +
+					      msecs_to_jiffies(1000))) {
+			/* invalidate baseline total service time */
+			bfqq->last_serv_time_ns = 0;
+
+			/*
+			 * Reset pointer in case we are waiting for
+			 * some request completion.
+			 */
+			bfqd->waited_rq = NULL;
+
+			/*
+			 * If bfqq has a short think time, then start
+			 * by setting the inject limit to 0
+			 * prudentially, because the service time of
+			 * an injected I/O request may be higher than
+			 * the think time of bfqq, and therefore, if
+			 * one request was injected when bfqq remains
+			 * empty, this injected request might delay
+			 * the service of the next I/O request for
+			 * bfqq significantly. In case bfqq can
+			 * actually tolerate some injection, then the
+			 * adaptive update will however raise the
+			 * limit soon. This lucky circumstance holds
+			 * exactly because bfqq has a short think
+			 * time, and thus, after remaining empty, is
+			 * likely to get new I/O enqueued---and then
+			 * completed---before being expired. This is
+			 * the very pattern that gives the
+			 * limit-update algorithm the chance to
+			 * measure the effect of injection on request
+			 * service times, and then to update the limit
+			 * accordingly.
+			 *
+			 * On the opposite end, if bfqq has a long
+			 * think time, then start directly with 1,
+			 * because:
+			 * a) on the bright side, keeping at most one
+			 * request in service in the drive is unlikely
+			 * to cause any harm to the latency of bfqq's
+			 * requests, as the service time of a single
+			 * request is likely to be lower than the
+			 * think time of bfqq;
+			 * b) on the downside, after becoming empty,
+			 * bfqq is likely to expire before getting its
+			 * next request. With this request arrival
+			 * pattern, it is very hard to sample total
+			 * service times and update the inject limit
+			 * accordingly (see comments on
+			 * bfq_update_inject_limit()). So the limit is
+			 * likely to be never, or at least seldom,
+			 * updated. As a consequence, by setting the
+			 * limit to 1, we make sure that injection is
+			 * not completely ruled out for bfqq. On the
+			 * downside, this proactive step further
+			 * reduces chances to actually compute the
+			 * baseline total service time. Thus it
+			 * reduces chances to execute the limit-update
+			 * algorithm and possibly raise the limit to
+			 * more than 1.
+			 */
+			if (bfq_bfqq_has_short_ttime(bfqq))
+				bfqq->inject_limit = 0;
+			else
+				bfqq->inject_limit = 1;
+			bfqq->decrease_time_jif = jiffies;
+		}
+
+		/*
+		 * The following conditions must hold to setup a new
+		 * sampling of total service time, and then a new
+		 * update of the inject limit:
+		 * - bfqq is in service, because the total service
+		 *   time is evaluated only for the I/O requests of
+		 *   the queues in service;
+		 * - this is the right occasion to compute or to
+		 *   lower the baseline total service time, because
+		 *   there are actually no requests in the drive,
+		 *   or
+		 *   the baseline total service time is available, and
+		 *   this is the right occasion to compute the other
+		 *   quantity needed to update the inject limit, i.e.,
+		 *   the total service time caused by the amount of
+		 *   injection allowed by the current value of the
+		 *   limit. It is the right occasion because injection
+		 *   has actually been performed during the service
+		 *   hole, and there are still in-flight requests,
+		 *   which are very likely to be exactly the injected
+		 *   requests, or part of them;
+		 * - the minimum interval for sampling the total
+		 *   service time and updating the inject limit has
+		 *   elapsed.
+		 */
+		if (bfqq == bfqd->in_service_queue &&
+		    (bfqd->rq_in_driver == 0 ||
+		     (bfqq->last_serv_time_ns > 0 &&
+		      bfqd->rqs_injected && bfqd->rq_in_driver > 0)) &&
+		    time_is_before_eq_jiffies(bfqq->decrease_time_jif +
+					      msecs_to_jiffies(100))) {
+			bfqd->last_empty_occupied_ns = ktime_get_ns();
+			/*
+			 * Start the state machine for measuring the
+			 * total service time of rq: setting
+			 * wait_dispatch will cause bfqd->waited_rq to
+			 * be set when rq will be dispatched.
+			 */
+			bfqd->wait_dispatch = true;
+			bfqd->rqs_injected = false;
+		}
+	}
+
 	elv_rb_add(&bfqq->sort_list, rq);
 
 	/*
@@ -2566,6 +2683,8 @@ static void bfq_arm_slice_timer(struct bfq_data *bfqd)
 	sl = max_t(u32, sl, 20ULL * NSEC_PER_MSEC);
 
 	bfqd->last_idling_start = ktime_get();
+	bfqd->last_idling_start_jiffies = jiffies;
+
 	hrtimer_start(&bfqd->idle_slice_timer, ns_to_ktime(sl),
 		      HRTIMER_MODE_REL);
 	bfqg_stats_set_start_idle_time(bfqq_group(bfqq));
@@ -3240,13 +3359,6 @@ static unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
 		    jiffies + nsecs_to_jiffies(bfqq->bfqd->bfq_slice_idle) + 4);
 }
 
-static bool bfq_bfqq_injectable(struct bfq_queue *bfqq)
-{
-	return BFQQ_SEEKY(bfqq) && bfqq->wr_coeff == 1 &&
-		blk_queue_nonrot(bfqq->bfqd->queue) &&
-		bfqq->bfqd->hw_tag;
-}
-
 /**
  * bfq_bfqq_expire - expire a queue.
  * @bfqd: device owning the queue.
@@ -3362,6 +3474,14 @@ void bfq_bfqq_expire(struct bfq_data *bfqd,
 		slow, bfqq->dispatched, bfq_bfqq_has_short_ttime(bfqq));
 
 	/*
+	 * bfqq expired, so no total service time needs to be computed
+	 * any longer: reset state machine for measuring total service
+	 * times.
+	 */
+	bfqd->rqs_injected = bfqd->wait_dispatch = false;
+	bfqd->waited_rq = NULL;
+
+	/*
 	 * Increase, decrease or leave budget unchanged according to
 	 * reason.
 	 */
@@ -3372,8 +3492,6 @@ void bfq_bfqq_expire(struct bfq_data *bfqd,
 	if (ref == 1) /* bfqq is gone, no more actions on it */
 		return;
 
-	bfqq->injected_service = 0;
-
 	/* mark bfqq as waiting a request only if a bic still points to it */
 	if (!bfq_bfqq_busy(bfqq) &&
 	    reason != BFQQE_BUDGET_TIMEOUT &&
@@ -3767,26 +3885,98 @@ static bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
 	return RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_better_to_idle(bfqq);
 }
 
-static struct bfq_queue *bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
+/*
+ * This function chooses the queue from which to pick the next extra
+ * I/O request to inject, if it finds a compatible queue. See the
+ * comments on bfq_update_inject_limit() for details on the injection
+ * mechanism, and for the definitions of the quantities mentioned
+ * below.
+ */
+static struct bfq_queue *
+bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
 {
-	struct bfq_queue *bfqq;
+	struct bfq_queue *bfqq, *in_serv_bfqq = bfqd->in_service_queue;
+	unsigned int limit = in_serv_bfqq->inject_limit;
+	/*
+	 * If
+	 * - bfqq is not weight-raised and therefore does not carry
+	 *   time-critical I/O,
+	 * or
+	 * - regardless of whether bfqq is weight-raised, bfqq has
+	 *   however a long think time, during which it can absorb the
+	 *   effect of an appropriate number of extra I/O requests
+	 *   from other queues (see bfq_update_inject_limit for
+	 *   details on the computation of this number);
+	 * then injection can be performed without restrictions.
+	 */
+	bool in_serv_always_inject = in_serv_bfqq->wr_coeff == 1 ||
+		!bfq_bfqq_has_short_ttime(in_serv_bfqq);
 
 	/*
-	 * A linear search; but, with a high probability, very few
-	 * steps are needed to find a candidate queue, i.e., a queue
-	 * with enough budget left for its next request. In fact:
+	 * If
+	 * - the baseline total service time could not be sampled yet,
+	 *   so the inject limit happens to be still 0, and
+	 * - a lot of time has elapsed since the plugging of I/O
+	 *   dispatching started, so drive speed is being wasted
+	 *   significantly;
+	 * then temporarily raise the inject limit to one request.
+	 */
+	if (limit == 0 && in_serv_bfqq->last_serv_time_ns == 0 &&
+	    bfq_bfqq_wait_request(in_serv_bfqq) &&
+	    time_is_before_eq_jiffies(bfqd->last_idling_start_jiffies +
+				      bfqd->bfq_slice_idle))
+		limit = 1;
+
+	if (bfqd->rq_in_driver >= limit)
+		return NULL;
+
+	/*
+	 * Linear search of the source queue for injection; but, with
+	 * a high probability, very few steps are needed to find a
+	 * candidate queue, i.e., a queue with enough budget left for
+	 * its next request. In fact:
 	 * - BFQ dynamically updates the budget of every queue so as
 	 *   to accommodate the expected backlog of the queue;
 	 * - if a queue gets all its requests dispatched as injected
 	 *   service, then the queue is removed from the active list
-	 *   (and re-added only if it gets new requests, but with
-	 *   enough budget for its new backlog).
+	 *   (and re-added only if it gets new requests, but then it
+	 *   is assigned again enough budget for its new backlog).
 	 */
 	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
 		if (!RB_EMPTY_ROOT(&bfqq->sort_list) &&
+		    (in_serv_always_inject || bfqq->wr_coeff > 1) &&
 		    bfq_serv_to_charge(bfqq->next_rq, bfqq) <=
-		    bfq_bfqq_budget_left(bfqq))
-			return bfqq;
+		    bfq_bfqq_budget_left(bfqq)) {
+			/*
+			 * Allow for only one large in-flight request
+			 * on non-rotational devices, for the
+			 * following reason. On non-rotational drives,
+			 * large requests take much longer than
+			 * smaller requests to be served. In addition,
+			 * the drive prefers to serve large requests
+			 * w.r.t. small ones, if it can choose. So,
+			 * having more than one large request queued
+			 * in the drive may easily make the next first
+			 * request of the in-service queue wait so
+			 * long as to break bfqq's service guarantees.
+			 * On the bright side, large requests let the
+			 * drive reach a very high throughput, even if
+			 * there is only one in-flight large request
+			 * at a time.
+			 */
+			if (blk_queue_nonrot(bfqd->queue) &&
+			    blk_rq_sectors(bfqq->next_rq) >=
+			    BFQQ_SECT_THR_NONROT)
+				limit = min_t(unsigned int, 1, limit);
+			else
+				limit = in_serv_bfqq->inject_limit;
+
+			if (bfqd->rq_in_driver < limit) {
+				bfqd->rqs_injected = true;
+				return bfqq;
+			}
+		}
 
 	return NULL;
 }
@@ -3873,14 +4063,32 @@ check_queue:
 	 * for a new request, or has requests waiting for a completion and
 	 * may idle after their completion, then keep it anyway.
 	 *
-	 * Yet, to boost throughput, inject service from other queues if
-	 * possible.
+	 * Yet, inject service from other queues if it boosts
+	 * throughput and is possible.
 	 */
 	if (bfq_bfqq_wait_request(bfqq) ||
 	    (bfqq->dispatched != 0 && bfq_better_to_idle(bfqq))) {
-		if (bfq_bfqq_injectable(bfqq) &&
-		    bfqq->injected_service * bfqq->inject_coeff <
-		    bfqq->entity.service * 10)
+		struct bfq_queue *async_bfqq =
+			bfqq->bic && bfqq->bic->bfqq[0] &&
+			bfq_bfqq_busy(bfqq->bic->bfqq[0]) ?
+			bfqq->bic->bfqq[0] : NULL;
+
+		/*
+		 * If the process associated with bfqq also has async
+		 * I/O pending, then inject it
+		 * unconditionally. Injecting I/O from the same
+		 * process can cause no harm to the process. On the
+		 * contrary, it can only increase bandwidth and reduce
+		 * latency for the process.
+		 */
+		if (async_bfqq &&
+		    icq_to_bic(async_bfqq->next_rq->elv.icq) == bfqq->bic &&
+		    bfq_serv_to_charge(async_bfqq->next_rq, async_bfqq) <=
+		    bfq_bfqq_budget_left(async_bfqq))
+			bfqq = bfqq->bic->bfqq[0];
+		else if (!idling_boosts_thr_without_issues(bfqd, bfqq) &&
+			 (bfqq->wr_coeff == 1 || bfqd->wr_busy_queues > 1 ||
+			  !bfq_bfqq_has_short_ttime(bfqq)))
 			bfqq = bfq_choose_bfqq_for_injection(bfqd);
 		else
 			bfqq = NULL;
@@ -3972,15 +4180,15 @@ static struct request *bfq_dispatch_rq_from_bfqq(struct bfq_data *bfqd,
 
 	bfq_bfqq_served(bfqq, service_to_charge);
 
-	bfq_dispatch_remove(bfqd->queue, rq);
+	if (bfqq == bfqd->in_service_queue && bfqd->wait_dispatch) {
+		bfqd->wait_dispatch = false;
+		bfqd->waited_rq = rq;
+	}
 
-	if (bfqq != bfqd->in_service_queue) {
-		if (likely(bfqd->in_service_queue))
-			bfqd->in_service_queue->injected_service +=
-				bfq_serv_to_charge(rq, bfqq);
+	bfq_dispatch_remove(bfqd->queue, rq);
 
+	if (bfqq != bfqd->in_service_queue)
 		goto return_rq;
-	}
 
 	/*
 	 * If weight raising has to terminate for bfqq, then next
@@ -4411,13 +4619,6 @@ static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		bfq_mark_bfqq_has_short_ttime(bfqq);
 		bfq_mark_bfqq_sync(bfqq);
 		bfq_mark_bfqq_just_created(bfqq);
-		/*
-		 * Aggressively inject a lot of service: up to 90%.
-		 * This coefficient remains constant during bfqq life,
-		 * but this behavior might be changed, after enough
-		 * testing and tuning.
-		 */
-		bfqq->inject_coeff = 1;
 	} else
 		bfq_clear_bfqq_sync(bfqq);
 
@@ -4977,6 +5178,147 @@ static void bfq_finish_requeue_request_body(struct bfq_queue *bfqq)
 }
 
 /*
+ * The processes associated with bfqq may happen to generate their
+ * cumulative I/O at a lower rate than the rate at which the device
+ * could serve the same I/O. This is rather probable, e.g., if only
+ * one process is associated with bfqq and the device is an SSD. It
+ * results in bfqq becoming often empty while in service. In this
+ * respect, if BFQ is allowed to switch to another queue when bfqq
+ * remains empty, then the device goes on being fed with I/O requests,
+ * and the throughput is not affected. In contrast, if BFQ is not
+ * allowed to switch to another queue---because bfqq is sync and
+ * I/O-dispatch needs to be plugged while bfqq is temporarily
+ * empty---then, during the service of bfqq, there will be frequent
+ * "service holes", i.e., time intervals during which bfqq gets empty
+ * and the device can only consume the I/O already queued in its
+ * hardware queues. During service holes, the device may even end up
+ * idle. In the end, during the service of bfqq, the device is driven
+ * at a lower speed than the one it can reach with the kind of I/O
+ * flowing through bfqq.
+ *
+ * To counter this loss of throughput, BFQ implements a "request
+ * injection mechanism", which tries to fill the above service holes
+ * with I/O requests taken from other queues. The hard part in this
+ * mechanism is finding the right amount of I/O to inject, so as to
+ * both boost throughput and not break bfqq's bandwidth and latency
+ * guarantees. In this respect, the mechanism maintains a per-queue
+ * inject limit, computed as below. While bfqq is empty, the injection
+ * mechanism dispatches extra I/O requests only if the total number
+ * of I/O requests in flight---i.e., already dispatched but not yet
+ * completed---remains lower than this limit.
+ *
+ * A first definition comes in handy to introduce the algorithm by
+ * which the inject limit is computed. We define as first request for
+ * bfqq an I/O request for bfqq that arrives while bfqq is in
+ * service, and causes bfqq to switch from empty to non-empty. The
+ * algorithm updates the limit as a function of the effect of
+ * injection on the service times of only the first requests of
+ * bfqq. The reason for this restriction is that these are the
+ * requests whose service time is affected most, because they are the
+ * first to arrive after injection possibly occurred.
+ *
+ * To evaluate the effect of injection, the algorithm measures the
+ * "total service time" of first requests. We define as total service
+ * time of an I/O request the time that elapses since when the
+ * request is enqueued into bfqq, to when it is completed. This
+ * quantity allows the whole effect of injection to be measured. It is
+ * easy to see why. Suppose that some requests of other queues are
+ * actually injected while bfqq is empty, and that a new request R
+ * then arrives for bfqq. If the device does start to serve all or
+ * part of the injected requests during the service hole, then,
+ * because of this extra service, it may delay the next invocation of
+ * the dispatch hook of BFQ. Then, even after R gets eventually
+ * dispatched, the device may delay the actual service of R if it is
+ * still busy serving the extra requests, or if it decides to serve,
+ * before R, some extra request still present in its queues. As a
+ * conclusion, the cumulative extra delay caused by injection can be
+ * easily evaluated by just comparing the total service time of first
+ * requests with and without injection.
+ *
+ * The limit-update algorithm works as follows. On the arrival of a
+ * first request of bfqq, the algorithm measures the total service
+ * time of the request only if one of the three cases below holds,
+ * and, for each case, it updates the limit as described below:
+ *
+ * (1) If there is no in-flight request. This gives a baseline for the
+ *     total service time of the requests of bfqq. If the baseline has
+ *     not been computed yet, then, after computing it, the limit is
+ *     set to 1, to start boosting throughput, and to prepare the
+ *     ground for the next case. If the baseline has already been
+ *     computed, then it is updated, in case it turns out to be lower
+ *     than the previous value.
+ *
+ * (2) If the limit is higher than 0 and there are in-flight
+ *     requests. By comparing the total service time in this case with
+ *     the above baseline, it is possible to know to what extent the
+ *     current value of the limit is inflating the total service
+ *     time. If the inflation is below a certain threshold, then bfqq
+ *     is assumed to be suffering from no perceivable loss of its
+ *     service guarantees, and the limit is even tentatively
+ *     increased. If the inflation is above the threshold, then the
+ *     limit is decreased. Due to the lack of any hysteresis, this
+ *     logic makes the limit oscillate even in steady workload
+ *     conditions. Yet we opted for it, because it is fast in reaching
+ *     the best value for the limit, as a function of the current I/O
+ *     workload. To reduce oscillations, this step is disabled for a
+ *     short time interval after the limit happens to be decreased.
+ *
+ * (3) Periodically, after resetting the limit, to make sure that the
+ *     limit eventually drops in case the workload changes. This is
+ *     needed because, after the limit has gone safely up for a
+ *     certain workload, it is impossible to guess whether the
+ *     baseline total service time may have changed, without measuring
+ *     it again without injection. A more effective version of this
+ *     step might be to just sample the baseline, by interrupting
+ *     injection only once, and then to reset/lower the limit only if
+ *     the total service time with the current limit does happen to be
+ *     too large.
+ *
+ * More details on each step are provided in the comments on the
+ * pieces of code that implement these steps: the branch handling the
+ * transition from empty to non-empty in bfq_add_request(), the branch
+ * handling injection in bfq_select_queue(), and the function
+ * bfq_choose_bfqq_for_injection(). These comments also explain some
+ * exceptions, made by the injection mechanism in some special cases.
+ */
+static void bfq_update_inject_limit(struct bfq_data *bfqd,
+				    struct bfq_queue *bfqq)
+{
+	u64 tot_time_ns = ktime_get_ns() - bfqd->last_empty_occupied_ns;
+	unsigned int old_limit = bfqq->inject_limit;
+
+	if (bfqq->last_serv_time_ns > 0) {
+		u64 threshold = (bfqq->last_serv_time_ns * 3) >> 1;
+
+		if (tot_time_ns >= threshold && old_limit > 0) {
+			bfqq->inject_limit--;
+			bfqq->decrease_time_jif = jiffies;
+		} else if (tot_time_ns < threshold &&
+			   old_limit < bfqd->max_rq_in_driver << 1)
+			bfqq->inject_limit++;
+	}
+
+	/*
+	 * Either we still have to compute the base value for the
+	 * total service time, and there seem to be the right
+	 * conditions to do it, or we can lower the last base value
+	 * computed.
+	 */
+	if ((bfqq->last_serv_time_ns == 0 && bfqd->rq_in_driver == 0) ||
+	    tot_time_ns < bfqq->last_serv_time_ns) {
+		bfqq->last_serv_time_ns = tot_time_ns;
+		/*
+		 * Now we certainly have a base value: make sure we
+		 * start trying injection.
+		 */
+		bfqq->inject_limit = max_t(unsigned int, 1, old_limit);
+	}
+
+	/* update complete, not waiting for any request completion any longer */
+	bfqd->waited_rq = NULL;
+}
+
+/*
  * Handle either a requeue or a finish for rq. The things to do are
  * the same in both cases: all references to rq are to be dropped. In
  * particular, rq is considered completed from the point of view of
@@ -5020,6 +5362,9 @@ static void bfq_finish_requeue_request(struct request *rq)
 
 	spin_lock_irqsave(&bfqd->lock, flags);
 
+	if (rq == bfqd->waited_rq)
+		bfq_update_inject_limit(bfqd, bfqq);
+
 	bfq_completed_request(bfqq, bfqd);
 	bfq_finish_requeue_request_body(bfqq);
 
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 81cabf51a87e..26869cfbbfa9 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -240,6 +240,13 @@ struct bfq_queue {
 	/* next ioprio and ioprio class if a change is in progress */
 	unsigned short new_ioprio, new_ioprio_class;
 
+	/* last total-service-time sample, see bfq_update_inject_limit() */
+	u64 last_serv_time_ns;
+	/* limit for request injection */
+	unsigned int inject_limit;
+	/* last time the inject limit has been decreased, in jiffies */
+	unsigned long decrease_time_jif;
+
 	/*
 	 * Shared bfq_queue if queue is cooperating with one or more
 	 * other queues.
@@ -357,29 +364,6 @@ struct bfq_queue {
 
 	/* max service rate measured so far */
 	u32 max_service_rate;
-	/*
-	 * Ratio between the service received by bfqq while it is in
-	 * service, and the cumulative service (of requests of other
-	 * queues) that may be injected while bfqq is empty but still
-	 * in service. To increase precision, the coefficient is
-	 * measured in tenths of unit. Here are some example of (1)
-	 * ratios, (2) resulting percentages of service injected
-	 * w.r.t. to the total service dispatched while bfqq is in
-	 * service, and (3) corresponding values of the coefficient:
-	 * 1 (50%) -> 10
-	 * 2 (33%) -> 20
-	 * 10 (9%) -> 100
-	 * 9.9 (9%) -> 99
-	 * 1.5 (40%) -> 15
-	 * 0.5 (66%) -> 5
-	 * 0.1 (90%) -> 1
-	 *
-	 * So, if the coefficient is lower than 10, then
-	 * injected service is more than bfqq service.
-	 */
-	unsigned int inject_coeff;
-	/* amount of service injected in current service slot */
-	unsigned int injected_service;
 };
 
 /**
@@ -544,6 +528,26 @@ struct bfq_data {
 	/* time of last request completion (ns) */
 	u64 last_completion;
 
+	/* time of last transition from empty to non-empty (ns) */
+	u64 last_empty_occupied_ns;
+
+	/*
+	 * Flag set to activate the sampling of the total service time
+	 * of a just-arrived first I/O request (see
+	 * bfq_update_inject_limit()). This will cause the setting of
+	 * waited_rq when the request is finally dispatched.
+	 */
+	bool wait_dispatch;
+	/*
+	 * If set, then bfq_update_inject_limit() is invoked when
+	 * waited_rq is eventually completed.
+	 */
+	struct request *waited_rq;
+	/*
+	 * True if some request has been injected during the last service hole.
+	 */
+	bool rqs_injected;
+
 	/* time of first rq dispatch in current observation interval (ns) */
 	u64 first_dispatch;
 	/* time of last rq dispatch in current observation interval (ns) */
@@ -553,6 +557,7 @@ struct bfq_data {
 	ktime_t last_budget_start;
 	/* beginning of the last idle slice */
 	ktime_t last_idling_start;
+	unsigned long last_idling_start_jiffies;
 
 	/* number of samples in current observation interval */
 	int peak_rate_samples;