diff options
-rw-r--r-- | Documentation/block/cfq-iosched.txt | 71 |
1 files changed, 71 insertions, 0 deletions
diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt index e578feed6d81..6d670f570451 100644 --- a/Documentation/block/cfq-iosched.txt +++ b/Documentation/block/cfq-iosched.txt | |||
@@ -43,3 +43,74 @@ If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches | |||
43 | to IOPS mode and starts providing fairness in terms of number of requests | 43 | to IOPS mode and starts providing fairness in terms of number of requests |
44 | dispatched. Note that this mode switching takes effect only for group | 44 | dispatched. Note that this mode switching takes effect only for group |
45 | scheduling. For non-cgroup users nothing should change. | 45 | scheduling. For non-cgroup users nothing should change. |
46 | |||
47 | CFQ IO scheduler Idling Theory | ||
48 | =============================== | ||
49 | Idling on a queue is primarily about waiting for the next request to come | ||
50 | on same queue after completion of a request. In this process CFQ will not | ||
51 | dispatch requests from other cfq queues even if requests are pending there. | ||
52 | |||
53 | The rationale behind idling is that it can cut down on number of seeks | ||
54 | on rotational media. For example, if a process is doing dependent | ||
55 | sequential reads (next read will come on only after completion of previous | ||
56 | one), then not dispatching request from other queue should help as we | ||
57 | did not move the disk head and kept on dispatching sequential IO from | ||
58 | one queue. | ||
59 | |||
60 | CFQ has following service trees and various queues are put on these trees. | ||
61 | |||
62 | sync-idle sync-noidle async | ||
63 | |||
64 | All cfq queues doing synchronous sequential IO go on to sync-idle tree. | ||
65 | On this tree we idle on each queue individually. | ||
66 | |||
67 | All synchronous non-sequential queues go on sync-noidle tree. Also any | ||
68 | request which are marked with REQ_NOIDLE go on this service tree. On this | ||
69 | tree we do not idle on individual queues instead idle on the whole group | ||
70 | of queues or the tree. So if there are 4 queues waiting for IO to dispatch | ||
71 | we will idle only once last queue has dispatched the IO and there is | ||
72 | no more IO on this service tree. | ||
73 | |||
74 | All async writes go on async service tree. There is no idling on async | ||
75 | queues. | ||
76 | |||
77 | CFQ has some optimizations for SSDs and if it detects a non-rotational | ||
78 | media which can support higher queue depth (multiple requests at in | ||
79 | flight at a time), then it cuts down on idling of individual queues and | ||
80 | all the queues move to sync-noidle tree and only tree idle remains. This | ||
81 | tree idling provides isolation with buffered write queues on async tree. | ||
82 | |||
83 | FAQ | ||
84 | === | ||
85 | Q1. Why to idle at all on queues marked with REQ_NOIDLE. | ||
86 | |||
87 | A1. We only do tree idle (all queues on sync-noidle tree) on queues marked | ||
88 | with REQ_NOIDLE. This helps in providing isolation with all the sync-idle | ||
89 | queues. Otherwise in presence of many sequential readers, other | ||
90 | synchronous IO might not get fair share of disk. | ||
91 | |||
92 | For example, if there are 10 sequential readers doing IO and they get | ||
93 | 100ms each. If a REQ_NOIDLE request comes in, it will be scheduled | ||
94 | roughly after 1 second. If after completion of REQ_NOIDLE request we | ||
95 | do not idle, and after a couple of milli seconds a another REQ_NOIDLE | ||
96 | request comes in, again it will be scheduled after 1second. Repeat it | ||
97 | and notice how a workload can lose its disk share and suffer due to | ||
98 | multiple sequential readers. | ||
99 | |||
100 | fsync can generate dependent IO where bunch of data is written in the | ||
101 | context of fsync, and later some journaling data is written. Journaling | ||
102 | data comes in only after fsync has finished its IO (atleast for ext4 | ||
103 | that seemed to be the case). Now if one decides not to idle on fsync | ||
104 | thread due to REQ_NOIDLE, then next journaling write will not get | ||
105 | scheduled for another second. A process doing small fsync, will suffer | ||
106 | badly in presence of multiple sequential readers. | ||
107 | |||
108 | Hence doing tree idling on threads using REQ_NOIDLE flag on requests | ||
109 | provides isolation from multiple sequential readers and at the same | ||
110 | time we do not idle on individual threads. | ||
111 | |||
112 | Q2. When to specify REQ_NOIDLE | ||
113 | A2. I would think whenever one is doing synchronous write and not expecting | ||
114 | more writes to be dispatched from same context soon, should be able | ||
115 | to specify REQ_NOIDLE on writes and that probably should work well for | ||
116 | most of the cases. | ||