author	Paolo Valente <paolo.valente@linaro.org>	2017-04-12 12:23:18 -0400
committer	Jens Axboe <axboe@fb.com>	2017-04-19 10:30:26 -0400
commit	bf2b79e7c4b312aa6e1c661fb27e0dc4dd42f2c2 (patch)
tree	2e3964bbff70ae4e4102886a27ac381d04589e93 /block/bfq-iosched.c
parent	1de0c4cd9ea65f99910ae0b77fce2cd1a8e5de01 (diff)
block, bfq: boost the throughput on NCQ-capable flash-based devices
This patch boosts the throughput on NCQ-capable flash-based devices, while still preserving latency guarantees for interactive and soft real-time applications. The throughput is boosted by just not idling the device when the in-service queue remains empty, even if the queue is sync and has a non-null idle window. This helps to keep the drive's internal queue full, which is necessary to achieve maximum performance. This solution to boost the throughput is a port of commits a68bbdd and f7d7b7a for CFQ.

As already highlighted in a previous patch, allowing the device to prefetch and internally reorder requests trivially causes loss of control on the request service order, and hence on service guarantees. Fortunately, as discussed in detail in the comments on the function bfq_bfqq_may_idle(), if every process has to receive the same fraction of the throughput, then the service order enforced by the internal scheduler of a flash-based device is relatively close to that enforced by BFQ. In particular, it is close enough to let service guarantees be substantially preserved.

Things change in an asymmetric scenario, i.e., if not every process has to receive the same fraction of the throughput. In this case, to guarantee the desired throughput distribution, the device must be prevented from prefetching requests. This is exactly what this patch does in asymmetric scenarios.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
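As a quick orientation before the diff: the patch boils down to two boolean conditions combined at the end of bfq_bfqq_may_idle(). The following userspace model is a minimal sketch under stated assumptions: the struct fields and may_idle() are hypothetical stand-ins for the kernel's bfqd/bfqq state, and the final return is simplified relative to the kernel function, which additionally gates idling_boosts_thr on the number of weight-raised busy queues.

/* Userspace model (not kernel code) of the decision this patch
 * implements in bfq_bfqq_may_idle(). */
#include <stdbool.h>

struct model {
	bool hw_tag;        /* device is NCQ-capable */
	bool nonrot;        /* device is flash-based (non-rotational) */
	bool io_bound;      /* queue's request pattern is I/O-bound */
	bool sync;          /* queue is sync */
	bool weight_raised; /* bfqq->wr_coeff > 1 */
	bool symmetric;     /* bfq_symmetric_scenario() would return true */
};

static bool may_idle(const struct model *m)
{
	/* Hunk 1: idling boosts throughput only without NCQ, or on a
	 * rotational device serving an I/O-bound queue. */
	bool idling_boosts_thr = !m->hw_tag || (!m->nonrot && m->io_bound);

	/* Hunk 3: the scenario is asymmetric if bfqq is weight-raised
	 * or if queue weights/numbers differ. */
	bool asymmetric_scenario = m->weight_raised || !m->symmetric;

	/* Simplified combination: idle only for sync queues, and only
	 * if idling either boosts the throughput or is needed to
	 * preserve guarantees. On an NCQ-capable flash-based device in
	 * a symmetric scenario both disjuncts are false, so the device
	 * is kept busy. */
	return m->sync && (idling_boosts_thr || asymmetric_scenario);
}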
Diffstat (limited to 'block/bfq-iosched.c')
-rw-r--r--	block/bfq-iosched.c	154
1 file changed, 106 insertions(+), 48 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index b97801ff3de0..208178478dc4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6442,15 +6442,25 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
 	 * The value of the variable is computed considering that
 	 * idling is usually beneficial for the throughput if:
 	 * (a) the device is not NCQ-capable, or
-	 * (b) regardless of the presence of NCQ, the request pattern
-	 *     for bfqq is I/O-bound (possible throughput losses
-	 *     caused by granting idling to seeky queues are mitigated
-	 *     by the fact that, in all scenarios where boosting
-	 *     throughput is the best thing to do, i.e., in all
-	 *     symmetric scenarios, only a minimal idle time is
-	 *     allowed to seeky queues).
+	 * (b) regardless of the presence of NCQ, the device is rotational
+	 *     and the request pattern for bfqq is I/O-bound (possible
+	 *     throughput losses caused by granting idling to seeky queues
+	 *     are mitigated by the fact that, in all scenarios where
+	 *     boosting throughput is the best thing to do, i.e., in all
+	 *     symmetric scenarios, only a minimal idle time is allowed to
+	 *     seeky queues).
+	 *
+	 * Secondly, and in contrast to the above item (b), idling an
+	 * NCQ-capable flash-based device would not boost the
+	 * throughput even with intense I/O; rather it would lower
+	 * the throughput in proportion to how fast the device
+	 * is. Accordingly, the next variable is true if any of the
+	 * above conditions (a) and (b) is true, and, in particular,
+	 * happens to be false if bfqd is an NCQ-capable flash-based
+	 * device.
 	 */
-	idling_boosts_thr = !bfqd->hw_tag || bfq_bfqq_IO_bound(bfqq);
+	idling_boosts_thr = !bfqd->hw_tag ||
+		(!blk_queue_nonrot(bfqd->queue) && bfq_bfqq_IO_bound(bfqq));
 
 	/*
 	 * The value of the next variable,
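To make the new idling_boosts_thr expression from this hunk concrete, here is a small standalone C sketch (plain userspace code, not kernel code; the three loop variables stand in for bfqd->hw_tag, blk_queue_nonrot() and bfq_bfqq_IO_bound()) that enumerates all eight combinations:

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	/* hw_tag   ~ bfqd->hw_tag         (device is NCQ-capable)
	 * nonrot   ~ blk_queue_nonrot()   (device is flash-based)
	 * io_bound ~ bfq_bfqq_IO_bound()  (queue's pattern is I/O-bound) */
	for (int hw_tag = 0; hw_tag <= 1; hw_tag++)
		for (int nonrot = 0; nonrot <= 1; nonrot++)
			for (int io_bound = 0; io_bound <= 1; io_bound++) {
				bool idling_boosts_thr = !hw_tag ||
					(!nonrot && io_bound);
				printf("NCQ=%d rotational=%d IO-bound=%d -> idling_boosts_thr=%d\n",
				       hw_tag, !nonrot, io_bound,
				       (int)idling_boosts_thr);
			}
	return 0;
}

Running it shows that whenever NCQ is present and the device is non-rotational, idling_boosts_thr is false regardless of the request pattern: exactly the NCQ-capable flash-based case this patch targets.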
@@ -6491,14 +6501,16 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
 		bfqd->wr_busy_queues == 0;
 
 	/*
-	 * There is then a case where idling must be performed not for
-	 * throughput concerns, but to preserve service guarantees. To
-	 * introduce it, we can note that allowing the drive to
-	 * enqueue more than one request at a time, and hence
+	 * There is then a case where idling must be performed not
+	 * for throughput concerns, but to preserve service
+	 * guarantees.
+	 *
+	 * To introduce this case, we can note that allowing the drive
+	 * to enqueue more than one request at a time, and hence
 	 * delegating de facto final scheduling decisions to the
-	 * drive's internal scheduler, causes loss of control on the
+	 * drive's internal scheduler, entails loss of control on the
 	 * actual request service order. In particular, the critical
-	 * situation is when requests from different processes happens
+	 * situation is when requests from different processes happen
 	 * to be present, at the same time, in the internal queue(s)
 	 * of the drive. In such a situation, the drive, by deciding
 	 * the service order of the internally-queued requests, does
@@ -6509,51 +6521,97 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
 	 * the service distribution enforced by the drive's internal
 	 * scheduler is likely to coincide with the desired
 	 * device-throughput distribution only in a completely
-	 * symmetric scenario where: (i) each of these processes must
-	 * get the same throughput as the others; (ii) all these
-	 * processes have the same I/O pattern (either sequential or
-	 * random). In fact, in such a scenario, the drive will tend
-	 * to treat the requests of each of these processes in about
-	 * the same way as the requests of the others, and thus to
-	 * provide each of these processes with about the same
-	 * throughput (which is exactly the desired throughput
-	 * distribution). In contrast, in any asymmetric scenario,
-	 * device idling is certainly needed to guarantee that bfqq
-	 * receives its assigned fraction of the device throughput
-	 * (see [1] for details).
+	 * symmetric scenario where:
+	 * (i)  each of these processes must get the same throughput as
+	 *      the others;
+	 * (ii) all these processes have the same I/O pattern
+	 *      (either sequential or random).
+	 * In fact, in such a scenario, the drive will tend to treat
+	 * the requests of each of these processes in about the same
+	 * way as the requests of the others, and thus to provide
+	 * each of these processes with about the same throughput
+	 * (which is exactly the desired throughput distribution). In
+	 * contrast, in any asymmetric scenario, device idling is
+	 * certainly needed to guarantee that bfqq receives its
+	 * assigned fraction of the device throughput (see [1] for
+	 * details).
+	 *
+	 * We address this issue by controlling, actually, only the
+	 * symmetry sub-condition (i), i.e., provided that
+	 * sub-condition (i) holds, idling is not performed,
+	 * regardless of whether sub-condition (ii) holds. In other
+	 * words, only if sub-condition (i) holds, then idling is
+	 * allowed, and the device tends to be prevented from queueing
+	 * many requests, possibly of several processes. The reason
+	 * for not controlling also sub-condition (ii) is that we
+	 * exploit preemption to preserve guarantees in case of
+	 * symmetric scenarios, even if (ii) does not hold, as
+	 * explained in the next two paragraphs.
+	 *
+	 * Even if a queue, say Q, is expired when it remains idle, Q
+	 * can still preempt the new in-service queue if the next
+	 * request of Q arrives soon (see the comments on
+	 * bfq_bfqq_update_budg_for_activation). If all queues and
+	 * groups have the same weight, this form of preemption,
+	 * combined with the hole-recovery heuristic described in the
+	 * comments on function bfq_bfqq_update_budg_for_activation,
+	 * are enough to preserve a correct bandwidth distribution in
+	 * the mid term, even without idling. In fact, even if not
+	 * idling allows the internal queues of the device to contain
+	 * many requests, and thus to reorder requests, we can rather
+	 * safely assume that the internal scheduler still preserves a
+	 * minimum of mid-term fairness. The motivation for using
+	 * preemption instead of idling is that, by not idling,
+	 * service guarantees are preserved without minimally
+	 * sacrificing throughput. In other words, both a high
+	 * throughput and its desired distribution are obtained.
+	 *
+	 * More precisely, this preemption-based, idleless approach
+	 * provides fairness in terms of IOPS, and not sectors per
+	 * second. This can be seen with a simple example. Suppose
+	 * that there are two queues with the same weight, but that
+	 * the first queue receives requests of 8 sectors, while the
+	 * second queue receives requests of 1024 sectors. In
+	 * addition, suppose that each of the two queues contains at
+	 * most one request at a time, which implies that each queue
+	 * always remains idle after it is served. Finally, after
+	 * remaining idle, each queue receives very quickly a new
+	 * request. It follows that the two queues are served
+	 * alternatively, preempting each other if needed. This
+	 * implies that, although both queues have the same weight,
+	 * the queue with large requests receives a service that is
+	 * 1024/8 times as high as the service received by the other
+	 * queue.
 	 *
-	 * As for sub-condition (i), actually we check only whether
-	 * bfqq is being weight-raised. In fact, if bfqq is not being
-	 * weight-raised, we have that:
-	 * - if the process associated with bfqq is not I/O-bound, then
-	 *   it is not either latency- or throughput-critical; therefore
-	 *   idling is not needed for bfqq;
-	 * - if the process asociated with bfqq is I/O-bound, then
-	 *   idling is already granted with bfqq (see the comments on
-	 *   idling_boosts_thr).
+	 * On the other hand, device idling is performed, and thus
+	 * pure sector-domain guarantees are provided, for the
+	 * following queues, which are likely to need stronger
+	 * throughput guarantees: weight-raised queues, and queues
+	 * with a higher weight than other queues. When such queues
+	 * are active, sub-condition (i) is false, which triggers
+	 * device idling.
 	 *
-	 * We do not check sub-condition (ii) at all, i.e., the next
-	 * variable is true if and only if bfqq is being
-	 * weight-raised. We do not need to control sub-condition (ii)
-	 * for the following reason:
-	 * - if bfqq is being weight-raised, then idling is already
-	 *   guaranteed to bfqq by sub-condition (i);
-	 * - if bfqq is not being weight-raised, then idling is
-	 *   already guaranteed to bfqq (only) if it matters, i.e., if
-	 *   bfqq is associated to a currently I/O-bound process (see
-	 *   the above comment on sub-condition (i)).
+	 * According to the above considerations, the next variable is
+	 * true (only) if sub-condition (i) holds. To compute the
+	 * value of this variable, we not only use the return value of
+	 * the function bfq_symmetric_scenario(), but also check
+	 * whether bfqq is being weight-raised, because
+	 * bfq_symmetric_scenario() does not take into account also
+	 * weight-raised queues (see comments on
+	 * bfq_weights_tree_add()).
 	 *
 	 * As a side note, it is worth considering that the above
 	 * device-idling countermeasures may however fail in the
 	 * following unlucky scenario: if idling is (correctly)
-	 * disabled in a time period during which the symmetry
-	 * sub-condition holds, and hence the device is allowed to
+	 * disabled in a time period during which all symmetry
+	 * sub-conditions hold, and hence the device is allowed to
 	 * enqueue many requests, but at some later point in time some
 	 * sub-condition stops to hold, then it may become impossible
 	 * to let requests be served in the desired order until all
 	 * the requests already queued in the device have been served.
 	 */
-	asymmetric_scenario = bfqq->wr_coeff > 1;
+	asymmetric_scenario = bfqq->wr_coeff > 1 ||
+		!bfq_symmetric_scenario(bfqd);
 
 	/*
 	 * We have now all the components we need to compute the return
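The 8-sector vs 1024-sector example in the comment above can be checked with a few lines of arithmetic. The sketch below is a hypothetical userspace illustration, not kernel code; the request count is arbitrary:

#include <stdio.h>

int main(void)
{
	const long reqs = 1000;              /* requests served per queue */
	const long small = 8, large = 1024;  /* request sizes in sectors */

	/* Strict alternation means both queues complete the same
	 * number of requests (equal IOPS), but very different
	 * amounts of sector-domain service. */
	printf("IOPS ratio:   %ld : %ld\n", reqs, reqs);
	printf("sector ratio: %ld : %ld  (%ldx)\n",
	       reqs * small, reqs * large, large / small);
	return 0;
}

Both queues complete requests at the same rate, yet the large-request queue gets 128 times the bandwidth; this is the IOPS-domain fairness that the preemption-based, idleless approach accepts in symmetric scenarios, while weight-raised or higher-weight queues keep sector-domain guarantees via idling.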