diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2013-02-28 15:52:24 -0500 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2013-02-28 15:52:24 -0500 |
commit | ee89f81252179dcbf6cd65bd48299f5e52292d88 (patch) | |
tree | 805846cd12821f84cfe619d44c9e3e36e0b0f9e6 /Documentation/block | |
parent | 21f3b24da9328415792efc780f50b9f434c12465 (diff) | |
parent | de33127d8d3f1d570aad8c2223cd81b206636bc1 (diff) |
Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block
Pull block IO core bits from Jens Axboe:
"Below are the core block IO bits for 3.9. It was delayed a few days
since my workstation kept crashing every 2-8h after pulling it into
current -git, but turns out it is a bug in the new pstate code (divide
by zero, will report separately). In any case, it contains:
- The big cfq/blkcg update from Tejun and and Vivek.
- Additional block and writeback tracepoints from Tejun.
- Improvement of the should sort (based on queues) logic in the plug
flushing.
- _io() variants of the wait_for_completion() interface, using
io_schedule() instead of schedule() to contribute to io wait
properly.
- Various little fixes.
You'll get two trivial merge conflicts, which should be easy enough to
fix up"
Fix up the trivial conflicts due to hlist traversal cleanups (commit
b67bfe0d42ca: "hlist: drop the node parameter from iterators").
* 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
block: remove redundant check to bd_openers()
block: use i_size_write() in bd_set_size()
cfq: fix lock imbalance with failed allocations
drivers/block/swim3.c: fix null pointer dereference
block: don't select PERCPU_RWSEM
block: account iowait time when waiting for completion of IO request
sched: add wait_for_completion_io[_timeout]
writeback: add more tracepoints
block: add block_{touch|dirty}_buffer tracepoint
buffer: make touch_buffer() an exported function
block: add @req to bio_{front|back}_merge tracepoints
block: add missing block_bio_complete() tracepoint
block: Remove should_sort judgement when flush blk_plug
block,elevator: use new hashtable implementation
cfq-iosched: add hierarchical cfq_group statistics
cfq-iosched: collect stats from dead cfqgs
cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
block: RCU free request_queue
blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
...
Diffstat (limited to 'Documentation/block')
-rw-r--r-- | Documentation/block/cfq-iosched.txt | 58 |
1 files changed, 58 insertions, 0 deletions
diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt index d89b4fe724d7..a5eb7d19a65d 100644 --- a/Documentation/block/cfq-iosched.txt +++ b/Documentation/block/cfq-iosched.txt | |||
@@ -102,6 +102,64 @@ processing of request. Therefore, increasing the value can imporve the | |||
102 | performace although this can cause the latency of some I/O to increase due | 102 | performace although this can cause the latency of some I/O to increase due |
103 | to more number of requests. | 103 | to more number of requests. |
104 | 104 | ||
105 | CFQ Group scheduling | ||
106 | ==================== | ||
107 | |||
108 | CFQ supports blkio cgroup and has "blkio." prefixed files in each | ||
109 | blkio cgroup directory. It is weight-based and there are four knobs | ||
110 | for configuration - weight[_device] and leaf_weight[_device]. | ||
111 | Internal cgroup nodes (the ones with children) can also have tasks in | ||
112 | them, so the former two configure how much proportion the cgroup as a | ||
113 | whole is entitled to at its parent's level while the latter two | ||
114 | configure how much proportion the tasks in the cgroup have compared to | ||
115 | its direct children. | ||
116 | |||
117 | Another way to think about it is assuming that each internal node has | ||
118 | an implicit leaf child node which hosts all the tasks whose weight is | ||
119 | configured by leaf_weight[_device]. Let's assume a blkio hierarchy | ||
120 | composed of five cgroups - root, A, B, AA and AB - with the following | ||
121 | weights where the names represent the hierarchy. | ||
122 | |||
123 | weight leaf_weight | ||
124 | root : 125 125 | ||
125 | A : 500 750 | ||
126 | B : 250 500 | ||
127 | AA : 500 500 | ||
128 | AB : 1000 500 | ||
129 | |||
130 | root never has a parent making its weight is meaningless. For backward | ||
131 | compatibility, weight is always kept in sync with leaf_weight. B, AA | ||
132 | and AB have no child and thus its tasks have no children cgroup to | ||
133 | compete with. They always get 100% of what the cgroup won at the | ||
134 | parent level. Considering only the weights which matter, the hierarchy | ||
135 | looks like the following. | ||
136 | |||
137 | root | ||
138 | / | \ | ||
139 | A B leaf | ||
140 | 500 250 125 | ||
141 | / | \ | ||
142 | AA AB leaf | ||
143 | 500 1000 750 | ||
144 | |||
145 | If all cgroups have active IOs and competing with each other, disk | ||
146 | time will be distributed like the following. | ||
147 | |||
148 | Distribution below root. The total active weight at this level is | ||
149 | A:500 + B:250 + C:125 = 875. | ||
150 | |||
151 | root-leaf : 125 / 875 =~ 14% | ||
152 | A : 500 / 875 =~ 57% | ||
153 | B(-leaf) : 250 / 875 =~ 28% | ||
154 | |||
155 | A has children and further distributes its 57% among the children and | ||
156 | the implicit leaf node. The total active weight at this level is | ||
157 | AA:500 + AB:1000 + A-leaf:750 = 2250. | ||
158 | |||
159 | A-leaf : ( 750 / 2250) * A =~ 19% | ||
160 | AA(-leaf) : ( 500 / 2250) * A =~ 12% | ||
161 | AB(-leaf) : (1000 / 2250) * A =~ 25% | ||
162 | |||
105 | CFQ IOPS Mode for group scheduling | 163 | CFQ IOPS Mode for group scheduling |
106 | =================================== | 164 | =================================== |
107 | Basic CFQ design is to provide priority based time slices. Higher priority | 165 | Basic CFQ design is to provide priority based time slices. Higher priority |