aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorTejun Heo <tj@kernel.org>2013-01-09 11:05:11 -0500
committerTejun Heo <tj@kernel.org>2013-01-09 11:05:11 -0500
commitd02f7aa8dce8166dbbc515ce393912aa45e6b8a6 (patch)
tree8a3e8e54bed797bb084a83008ca47065a261a9d6 /Documentation
parent41cad6ab2cb9ccb3b11546ad56b8b285e47c6279 (diff)
cfq-iosched: enable full blkcg hierarchy support
With the previous two patches, all cfqg scheduling decisions are based on vfraction and ready for hierarchy support. The only thing which keeps the behavior flat is cfqg_flat_parent() which makes vfraction calculation consider all non-root cfqgs children of the root cfqg. Replace it with cfqg_parent() which returns the real parent. This enables full blkcg hierarchy support for cfq-iosched. For example, consider the following hierarchy. root / \ A:500 B:250 / \ AA:500 AB:1000 For simplicity, let's say all the leaf nodes have active tasks and are on service tree. For each leaf node, vfraction would be AA: (500 / 1500) * (500 / 750) =~ 0.2222 AB: (1000 / 1500) * (500 / 750) =~ 0.4444 B: (250 / 750) =~ 0.3333 and vdisktime will be distributed accordingly. For more detail, please refer to Documentation/block/cfq-iosched.txt. v2: cfq-iosched.txt updated to describe group scheduling as suggested by Vivek. v3: blkio-controller.txt updated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/block/cfq-iosched.txt58
-rw-r--r--Documentation/cgroups/blkio-controller.txt35
2 files changed, 82 insertions, 11 deletions
diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt
index d89b4fe724d7..a5eb7d19a65d 100644
--- a/Documentation/block/cfq-iosched.txt
+++ b/Documentation/block/cfq-iosched.txt
@@ -102,6 +102,64 @@ processing of request. Therefore, increasing the value can imporve the
102performace although this can cause the latency of some I/O to increase due 102performace although this can cause the latency of some I/O to increase due
103to more number of requests. 103to more number of requests.
104 104
105CFQ Group scheduling
106====================
107
108CFQ supports blkio cgroup and has "blkio." prefixed files in each
109blkio cgroup directory. It is weight-based and there are four knobs
110for configuration - weight[_device] and leaf_weight[_device].
111Internal cgroup nodes (the ones with children) can also have tasks in
112them, so the former two configure how much proportion the cgroup as a
113whole is entitled to at its parent's level while the latter two
114configure how much proportion the tasks in the cgroup have compared to
115its direct children.
116
117Another way to think about it is assuming that each internal node has
118an implicit leaf child node which hosts all the tasks whose weight is
119configured by leaf_weight[_device]. Let's assume a blkio hierarchy
120composed of five cgroups - root, A, B, AA and AB - with the following
121weights where the names represent the hierarchy.
122
123 weight leaf_weight
124 root : 125 125
125 A : 500 750
126 B : 250 500
127 AA : 500 500
128 AB : 1000 500
129
130root never has a parent making its weight is meaningless. For backward
131compatibility, weight is always kept in sync with leaf_weight. B, AA
132and AB have no child and thus its tasks have no children cgroup to
133compete with. They always get 100% of what the cgroup won at the
134parent level. Considering only the weights which matter, the hierarchy
135looks like the following.
136
137 root
138 / | \
139 A B leaf
140 500 250 125
141 / | \
142 AA AB leaf
143 500 1000 750
144
145If all cgroups have active IOs and competing with each other, disk
146time will be distributed like the following.
147
148Distribution below root. The total active weight at this level is
149A:500 + B:250 + C:125 = 875.
150
151 root-leaf : 125 / 875 =~ 14%
152 A : 500 / 875 =~ 57%
153 B(-leaf) : 250 / 875 =~ 28%
154
155A has children and further distributes its 57% among the children and
156the implicit leaf node. The total active weight at this level is
157AA:500 + AB:1000 + A-leaf:750 = 2250.
158
159 A-leaf : ( 750 / 2250) * A =~ 19%
160 AA(-leaf) : ( 500 / 2250) * A =~ 12%
161 AB(-leaf) : (1000 / 2250) * A =~ 25%
162
105CFQ IOPS Mode for group scheduling 163CFQ IOPS Mode for group scheduling
106=================================== 164===================================
107Basic CFQ design is to provide priority based time slices. Higher priority 165Basic CFQ design is to provide priority based time slices. Higher priority
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index b4b1fb3a83f0..1b70843c574e 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -94,13 +94,11 @@ Throttling/Upper Limit policy
94 94
95Hierarchical Cgroups 95Hierarchical Cgroups
96==================== 96====================
97- Currently none of the IO control policy supports hierarchical groups. But 97- Currently only CFQ supports hierarchical groups. For throttling,
98 cgroup interface does allow creation of hierarchical cgroups and internally 98 cgroup interface does allow creation of hierarchical cgroups and
99 IO policies treat them as flat hierarchy. 99 internally it treats them as flat hierarchy.
100 100
101 So this patch will allow creation of cgroup hierarchcy but at the backend 101 If somebody created a hierarchy like as follows.
102 everything will be treated as flat. So if somebody created a hierarchy like
103 as follows.
104 102
105 root 103 root
106 / \ 104 / \
@@ -108,16 +106,20 @@ Hierarchical Cgroups
108 | 106 |
109 test3 107 test3
110 108
111 CFQ and throttling will practically treat all groups at same level. 109 CFQ will handle the hierarchy correctly but and throttling will
110 practically treat all groups at same level. For details on CFQ
111 hierarchy support, refer to Documentation/block/cfq-iosched.txt.
112 Throttling will treat the hierarchy as if it looks like the
113 following.
112 114
113 pivot 115 pivot
114 / / \ \ 116 / / \ \
115 root test1 test2 test3 117 root test1 test2 test3
116 118
117 Down the line we can implement hierarchical accounting/control support 119 Nesting cgroups, while allowed, isn't officially supported and blkio
118 and also introduce a new cgroup file "use_hierarchy" which will control 120 genereates warning when cgroups nest. Once throttling implements
119 whether cgroup hierarchy is viewed as flat or hierarchical by the policy.. 121 hierarchy support, hierarchy will be supported and the warning will
120 This is how memory controller also has implemented the things. 122 be removed.
121 123
122Various user visible config options 124Various user visible config options
123=================================== 125===================================
@@ -172,6 +174,12 @@ Proportional weight policy files
172 dev weight 174 dev weight
173 8:16 300 175 8:16 300
174 176
177- blkio.leaf_weight[_device]
178 - Equivalents of blkio.weight[_device] for the purpose of
179 deciding how much weight tasks in the given cgroup has while
180 competing with the cgroup's child cgroups. For details,
181 please refer to Documentation/block/cfq-iosched.txt.
182
175- blkio.time 183- blkio.time
176 - disk time allocated to cgroup per device in milliseconds. First 184 - disk time allocated to cgroup per device in milliseconds. First
177 two fields specify the major and minor number of the device and 185 two fields specify the major and minor number of the device and
@@ -279,6 +287,11 @@ Proportional weight policy files
279 and minor number of the device and third field specifies the number 287 and minor number of the device and third field specifies the number
280 of times a group was dequeued from a particular device. 288 of times a group was dequeued from a particular device.
281 289
290- blkio.*_recursive
291 - Recursive version of various stats. These files show the
292 same information as their non-recursive counterparts but
293 include stats from all the descendant cgroups.
294
282Throttling/Upper limit policy files 295Throttling/Upper limit policy files
283----------------------------------- 296-----------------------------------
284- blkio.throttle.read_bps_device 297- blkio.throttle.read_bps_device