path: root/block/bfq-iosched.c
author Paolo Valente <paolo.valente@linaro.org> 2018-05-31 10:45:06 -0400
committer Jens Axboe <axboe@kernel.dk> 2018-05-31 10:54:38 -0400
commit e24f1c245fb61b799137b586ea7ac3c6a5e952be
tree ca5fda33e39c6a8d62565e4991c78c640f8e4ec7 /block/bfq-iosched.c
parent 4029eef1be4c869ae4c1bdcdc0010a1f2a5b888f
block, bfq: remove slow-system class
BFQ computes the duration of weight raising for interactive applications automatically, using some reference parameters. In particular, BFQ uses the best durations (see comments in the code for how these durations have been assessed) for two classes of systems: slow and fast ones. Examples of slow systems are old phones or systems using micro HDDs. Fast systems are all the remaining ones. Using these parameters, BFQ computes the actual duration of the weight raising, for the system at hand, as a function of the speed of that system relative to the speed of a reference system belonging to the same class.

This slow vs fast differentiation proved useful in the past, but has little meaning with current hardware. Even worse, it causes problems in virtual systems, where the speed of the system can vary frequently, and so widely as to confuse the class-detection mechanism and, as we have verified experimentally, to cause BFQ to compute nonsensical weight-raising durations.

This commit addresses this issue by removing the slow class and the class-detection mechanism.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
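As a rough illustration of the computation described above, the following standalone sketch evaluates duration = (ref_rate / peak_rate) * ref_wr_duration with the reference values that remain after this patch. It is not the kernel code: the helper names `initial_peak_rate` and `wr_duration_ms` are hypothetical, durations are kept in milliseconds rather than jiffies, and any adjustment the kernel applies after the division is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reference values copied from the patched file: index 0 is a
 * rotational device, index 1 a non-rotational one. Rates are in
 * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
 */
static const uint64_t ref_rate[2] = {14000, 33000};

/*
 * Assumption: durations are kept in milliseconds here; the kernel
 * stores them in jiffies via msecs_to_jiffies().
 */
static const uint64_t ref_wr_duration_ms[2] = {7000, 2500};

/* Hypothetical helper: initial guess, 2/3 of the reference rate. */
static uint64_t initial_peak_rate(int nonrot)
{
	return ref_rate[nonrot] * 2 / 3;
}

/*
 * Hypothetical helper: duration = (ref_rate / peak_rate) * ref_wr_duration.
 * The product is formed first and divided once, so that the integer
 * division does not discard the ratio's fractional part.
 */
static uint64_t wr_duration_ms(int nonrot, uint64_t peak_rate)
{
	uint64_t rate_dur_prod = ref_rate[nonrot] * ref_wr_duration_ms[nonrot];

	assert(peak_rate > 0); /* avoid dividing by zero */
	return rate_dur_prod / peak_rate;
}
```

With these numbers, a rotational device whose estimated peak rate equals the reference rate gets the reference duration (7000 ms), while one running at half that rate (7000) gets twice as long (14000 ms). The initial guess of 9333 for a rotational device yields 10500 ms.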
Diffstat (limited to 'block/bfq-iosched.c')
-rw-r--r-- block/bfq-iosched.c | 137
1 files changed, 42 insertions, 95 deletions
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index e68d0a4159c4..21011019f5df 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -251,55 +251,43 @@ static struct kmem_cache *bfq_pool;
  * When configured for computing the duration of the weight-raising
  * for interactive queues automatically (see the comments at the
  * beginning of this file), BFQ does it using the following formula:
- * duration = (R / r) * T,
- * where r is the peak rate of the device, and R
- * and T are two reference parameters. In particular,
- * R is the peak rate of the reference device (see below), and
- * T is a reference time: given the systems that are likely
- * to be installed on the reference device according to its speed
- * class, T is about the maximum time needed, under BFQ and
- * while reading two files in parallel, to load typical large
- * applications on these systems (see the comments on
- * max_service_from_wr below, for more details on how T is
- * obtained). In practice, the slower/faster the device at hand is,
- * the more/less it takes to load applications with respect to the
+ * duration = (ref_rate / r) * ref_wr_duration,
+ * where r is the peak rate of the device, and ref_rate and
+ * ref_wr_duration are two reference parameters. In particular,
+ * ref_rate is the peak rate of the reference storage device (see
+ * below), and ref_wr_duration is about the maximum time needed, with
+ * BFQ and while reading two files in parallel, to load typical large
+ * applications on the reference device (see the comments on
+ * max_service_from_wr below, for more details on how ref_wr_duration
+ * is obtained). In practice, the slower/faster the device at hand
+ * is, the more/less it takes to load applications with respect to the
  * reference device. Accordingly, the longer/shorter BFQ grants
  * weight raising to interactive applications.
  *
- * BFQ uses four different reference pairs (R, T), depending on:
- * . whether the device is rotational or non-rotational;
- * . whether the device is slow, such as old or portable HDDs, as well as
- *   SD cards, or fast, such as newer HDDs and SSDs.
+ * BFQ uses two different reference pairs (ref_rate, ref_wr_duration),
+ * depending on whether the device is rotational or non-rotational.
  *
- * The device's speed class is dynamically (re)detected in
- * bfq_update_peak_rate() every time the estimated peak rate is updated.
+ * In the following definitions, ref_rate[0] and ref_wr_duration[0]
+ * are the reference values for a rotational device, whereas
+ * ref_rate[1] and ref_wr_duration[1] are the reference values for a
+ * non-rotational device. The reference rates are not the actual peak
+ * rates of the devices used as a reference, but slightly lower
+ * values. The reason for using slightly lower values is that the
+ * peak-rate estimator tends to yield slightly lower values than the
+ * actual peak rate (it can yield the actual peak rate only if there
+ * is only one process doing I/O, and the process does sequential
+ * I/O).
  *
- * In the following definitions, R_slow[0]/R_fast[0] and
- * T_slow[0]/T_fast[0] are the reference values for a slow/fast
- * rotational device, whereas R_slow[1]/R_fast[1] and
- * T_slow[1]/T_fast[1] are the reference values for a slow/fast
- * non-rotational device. Finally, device_speed_thresh are the
- * thresholds used to switch between speed classes. The reference
- * rates are not the actual peak rates of the devices used as a
- * reference, but slightly lower values. The reason for using these
- * slightly lower values is that the peak-rate estimator tends to
- * yield slightly lower values than the actual peak rate (it can yield
- * the actual peak rate only if there is only one process doing I/O,
- * and the process does sequential I/O).
- *
- * Both the reference peak rates and the thresholds are measured in
- * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
+ * The reference peak rates are measured in sectors/usec, left-shifted
+ * by BFQ_RATE_SHIFT.
  */
-static int R_slow[2] = {1000, 10700};
-static int R_fast[2] = {14000, 33000};
+static int ref_rate[2] = {14000, 33000};
 /*
- * To improve readability, a conversion function is used to initialize the
- * following arrays, which entails that they can be initialized only in a
- * function.
+ * To improve readability, a conversion function is used to initialize
+ * the following array, which entails that the array can be
+ * initialized only in a function.
  */
-static int T_slow[2];
-static int T_fast[2];
-static int device_speed_thresh[2];
+static int ref_wr_duration[2];
 
 /*
  * BFQ uses the above-detailed, time-based weight-raising mechanism to
@@ -884,7 +872,7 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
 	if (bfqd->bfq_wr_max_time > 0)
 		return bfqd->bfq_wr_max_time;
 
-	dur = bfqd->RT_prod;
+	dur = bfqd->rate_dur_prod;
 	do_div(dur, bfqd->peak_rate);
 
 	/*
@@ -2492,37 +2480,15 @@ static unsigned long bfq_calc_max_budget(struct bfq_data *bfqd)
 /*
  * Update parameters related to throughput and responsiveness, as a
  * function of the estimated peak rate. See comments on
- * bfq_calc_max_budget(), and on T_slow and T_fast arrays.
+ * bfq_calc_max_budget(), and on the ref_wr_duration array.
  */
 static void update_thr_responsiveness_params(struct bfq_data *bfqd)
 {
-	int dev_type = blk_queue_nonrot(bfqd->queue);
-
-	if (bfqd->bfq_user_max_budget == 0)
+	if (bfqd->bfq_user_max_budget == 0) {
 		bfqd->bfq_max_budget =
 			bfq_calc_max_budget(bfqd);
-
-	if (bfqd->device_speed == BFQ_BFQD_FAST &&
-	    bfqd->peak_rate < device_speed_thresh[dev_type]) {
-		bfqd->device_speed = BFQ_BFQD_SLOW;
-		bfqd->RT_prod = R_slow[dev_type] *
-			T_slow[dev_type];
-	} else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
-		   bfqd->peak_rate > device_speed_thresh[dev_type]) {
-		bfqd->device_speed = BFQ_BFQD_FAST;
-		bfqd->RT_prod = R_fast[dev_type] *
-			T_fast[dev_type];
+		bfq_log(bfqd, "new max_budget = %d", bfqd->bfq_max_budget);
 	}
-
-	bfq_log(bfqd,
-"dev_type %s dev_speed_class = %s (%llu sects/sec), thresh %llu setcs/sec",
-		dev_type == 0 ? "ROT" : "NONROT",
-		bfqd->device_speed == BFQ_BFQD_FAST ? "FAST" : "SLOW",
-		bfqd->device_speed == BFQ_BFQD_FAST ?
-		(USEC_PER_SEC*(u64)R_fast[dev_type])>>BFQ_RATE_SHIFT :
-		(USEC_PER_SEC*(u64)R_slow[dev_type])>>BFQ_RATE_SHIFT,
-		(USEC_PER_SEC*(u64)device_speed_thresh[dev_type])>>
-		BFQ_RATE_SHIFT);
 }
 
 static void bfq_reset_rate_computation(struct bfq_data *bfqd,
@@ -5311,14 +5277,12 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	bfqd->wr_busy_queues = 0;
 
 	/*
-	 * Begin by assuming, optimistically, that the device is a
-	 * high-speed one, and that its peak rate is equal to 2/3 of
-	 * the highest reference rate.
+	 * Begin by assuming, optimistically, that the device peak
+	 * rate is equal to 2/3 of the highest reference rate.
 	 */
-	bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
-			T_fast[blk_queue_nonrot(bfqd->queue)];
-	bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)] * 2 / 3;
-	bfqd->device_speed = BFQ_BFQD_FAST;
+	bfqd->rate_dur_prod = ref_rate[blk_queue_nonrot(bfqd->queue)] *
+		ref_wr_duration[blk_queue_nonrot(bfqd->queue)];
+	bfqd->peak_rate = ref_rate[blk_queue_nonrot(bfqd->queue)] * 2 / 3;
 
 	spin_lock_init(&bfqd->lock);
 
@@ -5626,8 +5590,8 @@ static int __init bfq_init(void)
 	/*
 	 * Times to load large popular applications for the typical
 	 * systems installed on the reference devices (see the
-	 * comments before the definitions of the next two
-	 * arrays). Actually, we use slightly slower values, as the
+	 * comments before the definition of the next
+	 * array). Actually, we use slightly lower values, as the
 	 * estimated peak rate tends to be smaller than the actual
 	 * peak rate. The reason for this last fact is that estimates
 	 * are computed over much shorter time intervals than the long
@@ -5636,25 +5600,8 @@ static int __init bfq_init(void)
 	 * scheduler cannot rely on a peak-rate-evaluation workload to
 	 * be run for a long time.
 	 */
-	T_slow[0] = msecs_to_jiffies(3500); /* actually 4 sec */
-	T_slow[1] = msecs_to_jiffies(6000); /* actually 6.5 sec */
-	T_fast[0] = msecs_to_jiffies(7000); /* actually 8 sec */
-	T_fast[1] = msecs_to_jiffies(2500); /* actually 3 sec */
-
-	/*
-	 * Thresholds that determine the switch between speed classes
-	 * (see the comments before the definition of the array
-	 * device_speed_thresh). These thresholds are biased towards
-	 * transitions to the fast class. This is safer than the
-	 * opposite bias. In fact, a wrong transition to the slow
-	 * class results in short weight-raising periods, because the
-	 * speed of the device then tends to be higher that the
-	 * reference peak rate. On the opposite end, a wrong
-	 * transition to the fast class tends to increase
-	 * weight-raising periods, because of the opposite reason.
-	 */
-	device_speed_thresh[0] = (4 * R_slow[0]) / 3;
-	device_speed_thresh[1] = (4 * R_slow[1]) / 3;
+	ref_wr_duration[0] = msecs_to_jiffies(7000); /* actually 8 sec */
+	ref_wr_duration[1] = msecs_to_jiffies(2500); /* actually 3 sec */
 
 	ret = elv_register(&iosched_bfq_mq);
 	if (ret)