writeback: comment on the bdi dirty threshold

We use "floating proportions" to let active devices grow their target share of dirty pages and to let stalled/inactive devices shrink theirs over time. This works well except when an inactive disk suddenly goes busy, where the initial target share may be too small. To mitigate this, bdi_position_ratio() has the line below to raise a small bdi_thresh when it is safe to do so, so that the disk is fed enough dirty pages for efficient IO and, in turn, a fast ramp-up of bdi_thresh:

	bdi_thresh = max(bdi_thresh, (limit - dirty) / 8);

balance_dirty_pages() normally does negative feedback control, adjusting ratelimit to balance the bdi dirty pages around the target. In some extreme cases where that is not enough, it has to block the tasks completely until the bdi dirty pages drop below bdi_thresh.

Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
author: Wu Fengguang <fengguang.wu@intel.com> 2011-11-23 12:44:41 -0500
committer: Wu Fengguang <fengguang.wu@intel.com> 2011-12-07 21:49:20 -0500
commit: aed21ad28b1323b2807faea019e5ac388a7bc837 (patch)
tree: 64d6bf0e86b7d256621420d2266d5c7c29bb5d50 /mm/page-writeback.c
parent: a50527b19c62c808a7fca022816fff88a50b948d (diff)
1 files changed, 14 insertions, 2 deletions
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 71252486bc6f..155efca4c123 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -411,8 +411,13 @@ void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty)
 *
 * Returns @bdi's dirty limit in pages. The term "dirty" in the context of
 * dirty balancing includes all PG_dirty, PG_writeback and NFS unstable pages.
- * And the "limit" in the name is not seriously taken as hard limit in
- * balance_dirty_pages().
+ *
+ * Note that balance_dirty_pages() will only seriously take it as a hard limit
+ * when sleeping max_pause per page is not enough to keep the dirty pages under
+ * control. For example, when the device is completely stalled due to some error
+ * conditions, or when there are 1000 dd tasks writing to a slow 10MB/s USB key.
+ * In the other normal situations, it acts more gently by throttling the tasks
+ * more (rather than completely block them) when the bdi dirty pages go high.
 *
 * It allocates high/low dirty limits to fast/slow devices, in order to prevent
 * - starving fast devices
@@ -594,6 +599,13 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
         */
        if (unlikely(bdi_thresh > thresh))
                bdi_thresh = thresh;
+        /*
+         * It's very possible that bdi_thresh is close to 0 not because the
+         * device is slow, but that it has remained inactive for long time.
+         * Honour such devices a reasonable good (hopefully IO efficient)
+         * threshold, so that the occasional writes won't be blocked and active
+         * writes can rampup the threshold quickly.
+         */
        bdi_thresh = max(bdi_thresh, (limit - dirty) / 8);
        /*
         * scale global setpoint to bdi's:
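
To make the effect of the bdi_thresh floor concrete, here is a minimal standalone sketch in user-space C. It is not the kernel code: the helper names max_ul() and raise_bdi_thresh() are hypothetical, and the guard for dirty >= limit is an assumption added here so that the unsigned subtraction cannot wrap. Only the max() expression itself comes from the patch.

```c
/* Hypothetical stand-in for the kernel's max() macro. */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical helper wrapping the single line quoted in the patch.
 * All values are in pages.
 */
static unsigned long raise_bdi_thresh(unsigned long bdi_thresh,
				      unsigned long limit,
				      unsigned long dirty)
{
	/* Assumption: with no headroom left, leave bdi_thresh untouched. */
	if (dirty >= limit)
		return bdi_thresh;

	/* Floor bdi_thresh at 1/8 of the remaining global headroom. */
	return max_ul(bdi_thresh, (limit - dirty) / 8);
}
```

With limit = 100000 and dirty = 20000, a long-idle device whose bdi_thresh has decayed to 10 is raised to 10000, while a device already at 30000 is left alone. As dirty approaches limit the floor shrinks (dirty = 99200 gives a floor of only 100), which is one way to read the "when it's safe to do so" wording in the commit message.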