author	Wu Fengguang <fengguang.wu@intel.com>	2010-08-29 13:22:30 -0400
committer	Wu Fengguang <fengguang.wu@intel.com>	2011-07-10 01:09:01 -0400
commit	e98be2d599207c6b31e9bb340d52a231b2f3662d (patch)
tree	3ae28e7d621a6e2ddf8e7462f8d282901c113d5c /fs
parent	f7d2b1ecd0c714adefc7d3a942ef87beb828a763 (diff)
writeback: bdi write bandwidth estimation
The estimation starts at 100MB/s and adapts to the real bandwidth within seconds. It tries to update the bandwidth only when the disk is fully utilized; any inactive period of more than one second is skipped.

The estimated bandwidth reflects how fast the device can write out when _fully utilized_, and won't drop to 0 when the disk goes idle: the value remains constant during idle time. At busy write time, fluctuations aside, it will also remain high unless knocked down by concurrent reads that compete with async writes for disk time and bandwidth.

The estimation is not done purely in the flusher because there is no guarantee that write_cache_pages() will return in time to update the bandwidth.

The bdi->avg_write_bandwidth smoothing is very effective at filtering out sudden spikes, though it may be slightly biased in the long term.

The overheads are low because the bdi bandwidth update only occurs at 200ms intervals.

The 200ms update interval is suitable because it's not possible to measure the instantaneous bandwidth at all, due to large fluctuations: NFS commits can be as large as seconds worth of data; one XFS completion may be as large as half a second worth of data if we are going to increase the write chunk to that size; in ext4, fluctuations with a period of around 5 seconds are observed, and there is another pattern of irregular periods of up to 20 seconds on SSD tests. That's why we not only do the estimation at 200ms intervals, but also average the samples over a period of 3 seconds and then apply another level of smoothing in avg_write_bandwidth.

CC: Li Shaohua <shaohua.li@intel.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Diffstat (limited to 'fs')
-rw-r--r--	fs/fs-writeback.c	13
1 file changed, 13 insertions, 0 deletions
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 2c947da39f6e..5826992910e9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -693,6 +693,16 @@ static inline bool over_bground_thresh(void)
 }
 
 /*
+ * Called under wb->list_lock. If there are multiple wb per bdi,
+ * only the flusher working on the first wb should do it.
+ */
+static void wb_update_bandwidth(struct bdi_writeback *wb,
+				unsigned long start_time)
+{
+	__bdi_update_bandwidth(wb->bdi, start_time);
+}
+
+/*
  * Explicit flushing or periodic writeback of "old" data.
  *
  * Define "old": the first time one of an inode's pages is dirtied, we mark the
@@ -710,6 +720,7 @@ static inline bool over_bground_thresh(void)
 static long wb_writeback(struct bdi_writeback *wb,
 			 struct wb_writeback_work *work)
 {
+	unsigned long wb_start = jiffies;
 	long nr_pages = work->nr_pages;
 	unsigned long oldest_jif;
 	struct inode *inode;
@@ -758,6 +769,8 @@ static long wb_writeback(struct bdi_writeback *wb,
 		progress = __writeback_inodes_wb(wb, work);
 		trace_writeback_written(wb->bdi, work);
 
+		wb_update_bandwidth(wb, wb_start);
+
 		/*
 		 * Did we write something? Try for more
 		 *