author		Tejun Heo <tj@kernel.org>	2012-07-14 01:16:45 -0400
committer	Tejun Heo <tj@kernel.org>	2012-07-14 01:24:45 -0400
commit		3270476a6c0ce322354df8679652f060d66526dc (patch)
tree		db58846beb7c5e1c1b50b7e2f1c2538320408c26 /Documentation/workqueue.txt
parent		4ce62e9e30cacc26885cab133ad1de358dd79f21 (diff)
workqueue: reimplement WQ_HIGHPRI using a separate worker_pool
WQ_HIGHPRI was implemented by queueing highpri work items at the head
of the global worklist. Other than queueing at the head, they weren't
handled differently; unfortunately, this could lead to execution
latency of a few seconds on heavily loaded systems.
Now that workqueue code has been updated to deal with multiple
worker_pools per global_cwq, this patch reimplements WQ_HIGHPRI using
a separate worker_pool. NR_WORKER_POOLS is bumped to two and
gcwq->pools[0] is used for normal pri work items and ->pools[1] for
highpri. Highpri workers get a -20 nice level and have an 'H' suffix in
their names. Note that this change increases the number of kworkers
per cpu.
POOL_HIGHPRI_PENDING, pool_determine_ins_pos() and the highpri chain
wakeup code in process_one_work() are no longer used and are removed.
This allows proper prioritization of highpri work items and removes
high execution latency of highpri work items.
v2: nr_running indexing bug in get_pool_nr_running() fixed.
v3: Refreshed for the get_pool_nr_running() update in the previous
patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josh Hunt <joshhunt00@gmail.com>
LKML-Reference: <CAKA=qzaHqwZ8eqpLNFjxnO2fX-tgAOjmpvxgBFjv6dJeQaOW1w@mail.gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Diffstat (limited to 'Documentation/workqueue.txt')
-rw-r--r--	Documentation/workqueue.txt	103
1 file changed, 38 insertions(+), 65 deletions(-)
diff --git a/Documentation/workqueue.txt b/Documentation/workqueue.txt
index a0b577de918f..a6ab4b62d926 100644
--- a/Documentation/workqueue.txt
+++ b/Documentation/workqueue.txt
@@ -89,25 +89,28 @@ called thread-pools.
 
 The cmwq design differentiates between the user-facing workqueues that
 subsystems and drivers queue work items on and the backend mechanism
-which manages thread-pool and processes the queued work items.
+which manages thread-pools and processes the queued work items.
 
 The backend is called gcwq. There is one gcwq for each possible CPU
-and one gcwq to serve work items queued on unbound workqueues.
+and one gcwq to serve work items queued on unbound workqueues. Each
+gcwq has two thread-pools - one for normal work items and the other
+for high priority ones.
 
 Subsystems and drivers can create and queue work items through special
 workqueue API functions as they see fit. They can influence some
 aspects of the way the work items are executed by setting flags on the
 workqueue they are putting the work item on. These flags include
-things like CPU locality, reentrancy, concurrency limits and more. To
-get a detailed overview refer to the API description of
+things like CPU locality, reentrancy, concurrency limits, priority and
+more. To get a detailed overview refer to the API description of
 alloc_workqueue() below.
 
-When a work item is queued to a workqueue, the target gcwq is
-determined according to the queue parameters and workqueue attributes
-and appended on the shared worklist of the gcwq. For example, unless
-specifically overridden, a work item of a bound workqueue will be
-queued on the worklist of exactly that gcwq that is associated to the
-CPU the issuer is running on.
+When a work item is queued to a workqueue, the target gcwq and
+thread-pool is determined according to the queue parameters and
+workqueue attributes and appended on the shared worklist of the
+thread-pool. For example, unless specifically overridden, a work item
+of a bound workqueue will be queued on the worklist of either normal
+or highpri thread-pool of the gcwq that is associated to the CPU the
+issuer is running on.
 
 For any worker pool implementation, managing the concurrency level
 (how many execution contexts are active) is an important issue. cmwq
@@ -115,26 +118,26 @@ tries to keep the concurrency at a minimal but sufficient level.
 Minimal to save resources and sufficient in that the system is used at
 its full capacity.
 
-Each gcwq bound to an actual CPU implements concurrency management by
-hooking into the scheduler. The gcwq is notified whenever an active
-worker wakes up or sleeps and keeps track of the number of the
-currently runnable workers. Generally, work items are not expected to
-hog a CPU and consume many cycles. That means maintaining just enough
-concurrency to prevent work processing from stalling should be
-optimal. As long as there are one or more runnable workers on the
-CPU, the gcwq doesn't start execution of a new work, but, when the
-last running worker goes to sleep, it immediately schedules a new
-worker so that the CPU doesn't sit idle while there are pending work
-items. This allows using a minimal number of workers without losing
-execution bandwidth.
+Each thread-pool bound to an actual CPU implements concurrency
+management by hooking into the scheduler. The thread-pool is notified
+whenever an active worker wakes up or sleeps and keeps track of the
+number of the currently runnable workers. Generally, work items are
+not expected to hog a CPU and consume many cycles. That means
+maintaining just enough concurrency to prevent work processing from
+stalling should be optimal. As long as there are one or more runnable
+workers on the CPU, the thread-pool doesn't start execution of a new
+work, but, when the last running worker goes to sleep, it immediately
+schedules a new worker so that the CPU doesn't sit idle while there
+are pending work items. This allows using a minimal number of workers
+without losing execution bandwidth.
 
 Keeping idle workers around doesn't cost other than the memory space
 for kthreads, so cmwq holds onto idle ones for a while before killing
 them.
 
 For an unbound wq, the above concurrency management doesn't apply and
-the gcwq for the pseudo unbound CPU tries to start executing all work
-items as soon as possible. The responsibility of regulating
+the thread-pools for the pseudo unbound CPU try to start executing all
+work items as soon as possible. The responsibility of regulating
 concurrency level is on the users. There is also a flag to mark a
 bound wq to ignore the concurrency management. Please refer to the
 API section for details.
@@ -205,31 +208,22 @@ resources, scheduled and executed.
 
   WQ_HIGHPRI
 
-	Work items of a highpri wq are queued at the head of the
-	worklist of the target gcwq and start execution regardless of
-	the current concurrency level. In other words, highpri work
-	items will always start execution as soon as execution
-	resource is available.
+	Work items of a highpri wq are queued to the highpri
+	thread-pool of the target gcwq. Highpri thread-pools are
+	served by worker threads with elevated nice level.
 
-	Ordering among highpri work items is preserved - a highpri
-	work item queued after another highpri work item will start
-	execution after the earlier highpri work item starts.
-
-	Although highpri work items are not held back by other
-	runnable work items, they still contribute to the concurrency
-	level. Highpri work items in runnable state will prevent
-	non-highpri work items from starting execution.
-
-	This flag is meaningless for unbound wq.
+	Note that normal and highpri thread-pools don't interact with
+	each other. Each maintain its separate pool of workers and
+	implements concurrency management among its workers.
 
   WQ_CPU_INTENSIVE
 
 	Work items of a CPU intensive wq do not contribute to the
 	concurrency level. In other words, runnable CPU intensive
-	work items will not prevent other work items from starting
-	execution. This is useful for bound work items which are
-	expected to hog CPU cycles so that their execution is
-	regulated by the system scheduler.
+	work items will not prevent other work items in the same
+	thread-pool from starting execution. This is useful for bound
+	work items which are expected to hog CPU cycles so that their
+	execution is regulated by the system scheduler.
 
 	Although CPU intensive work items don't contribute to the
 	concurrency level, start of their executions is still
@@ -239,14 +233,6 @@ resources, scheduled and executed.
 
 	This flag is meaningless for unbound wq.
 
-  WQ_HIGHPRI | WQ_CPU_INTENSIVE
-
-	This combination makes the wq avoid interaction with
-	concurrency management completely and behave as a simple
-	per-CPU execution context provider. Work items queued on a
-	highpri CPU-intensive wq start execution as soon as resources
-	are available and don't affect execution of other work items.
-
 @max_active:
 
 @max_active determines the maximum number of execution contexts per
@@ -328,20 +314,7 @@ If @max_active == 2,
  35		w2 wakes up and finishes
 
 Now, let's assume w1 and w2 are queued to a different wq q1 which has
-WQ_HIGHPRI set,
-
- TIME IN MSECS	EVENT
- 0		w1 and w2 start and burn CPU
- 5		w1 sleeps
- 10		w2 sleeps
- 10		w0 starts and burns CPU
- 15		w0 sleeps
- 15		w1 wakes up and finishes
- 20		w2 wakes up and finishes
- 25		w0 wakes up and burns CPU
- 30		w0 finishes
-
-If q1 has WQ_CPU_INTENSIVE set,
+WQ_CPU_INTENSIVE set,
 
  TIME IN MSECS	EVENT
  0		w0 starts and burns CPU
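
For reference, the q1/WQ_CPU_INTENSIVE scenario in the hunk above
corresponds roughly to the following sketch (names and timings mirror
the document's example; mdelay() busy-waits to stand in for "burns
CPU", msleep() for "sleeps"):

	#include <linux/delay.h>
	#include <linux/workqueue.h>

	static void burn_then_sleep(struct work_struct *work)
	{
		mdelay(5);	/* burn CPU for 5ms */
		msleep(10);	/* sleep for 10ms */
	}
	static DECLARE_WORK(w1, burn_then_sleep);
	static DECLARE_WORK(w2, burn_then_sleep);

	static struct workqueue_struct *q1;

	static void example(void)
	{
		/* CPU-intensive work items don't count toward the
		 * thread-pool's concurrency level, so w1/w2 burning
		 * CPU won't hold back other work items */
		q1 = alloc_workqueue("q1", WQ_CPU_INTENSIVE, 0);
		if (!q1)
			return;

		queue_work(q1, &w1);
		queue_work(q1, &w2);
	}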