aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/workqueue.txt
diff options
context:
space:
mode:
authorTejun Heo <tj@kernel.org>2012-07-14 01:16:45 -0400
committerTejun Heo <tj@kernel.org>2012-07-14 01:24:45 -0400
commit3270476a6c0ce322354df8679652f060d66526dc (patch)
treedb58846beb7c5e1c1b50b7e2f1c2538320408c26 /Documentation/workqueue.txt
parent4ce62e9e30cacc26885cab133ad1de358dd79f21 (diff)
workqueue: reimplement WQ_HIGHPRI using a separate worker_pool
WQ_HIGHPRI was implemented by queueing highpri work items at the head of the global worklist. Other than queueing at the head, they weren't handled differently; unfortunately, this could lead to execution latency of a few seconds on heavily loaded systems. Now that workqueue code has been updated to deal with multiple worker_pools per global_cwq, this patch reimplements WQ_HIGHPRI using a separate worker_pool. NR_WORKER_POOLS is bumped to two and gcwq->pools[0] is used for normal pri work items and ->pools[1] for highpri. Highpri workers get -20 nice level and has 'H' suffix in their names. Note that this change increases the number of kworkers per cpu. POOL_HIGHPRI_PENDING, pool_determine_ins_pos() and highpri chain wakeup code in process_one_work() are no longer used and removed. This allows proper prioritization of highpri work items and removes high execution latency of highpri work items. v2: nr_running indexing bug in get_pool_nr_running() fixed. v3: Refreshed for the get_pool_nr_running() update in the previous patch. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Josh Hunt <joshhunt00@gmail.com> LKML-Reference: <CAKA=qzaHqwZ8eqpLNFjxnO2fX-tgAOjmpvxgBFjv6dJeQaOW1w@mail.gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com>
Diffstat (limited to 'Documentation/workqueue.txt')
-rw-r--r--Documentation/workqueue.txt103
1 files changed, 38 insertions, 65 deletions
diff --git a/Documentation/workqueue.txt b/Documentation/workqueue.txt
index a0b577de918f..a6ab4b62d926 100644
--- a/Documentation/workqueue.txt
+++ b/Documentation/workqueue.txt
@@ -89,25 +89,28 @@ called thread-pools.
89 89
90The cmwq design differentiates between the user-facing workqueues that 90The cmwq design differentiates between the user-facing workqueues that
91subsystems and drivers queue work items on and the backend mechanism 91subsystems and drivers queue work items on and the backend mechanism
92which manages thread-pool and processes the queued work items. 92which manages thread-pools and processes the queued work items.
93 93
94The backend is called gcwq. There is one gcwq for each possible CPU 94The backend is called gcwq. There is one gcwq for each possible CPU
95and one gcwq to serve work items queued on unbound workqueues. 95and one gcwq to serve work items queued on unbound workqueues. Each
96gcwq has two thread-pools - one for normal work items and the other
97for high priority ones.
96 98
97Subsystems and drivers can create and queue work items through special 99Subsystems and drivers can create and queue work items through special
98workqueue API functions as they see fit. They can influence some 100workqueue API functions as they see fit. They can influence some
99aspects of the way the work items are executed by setting flags on the 101aspects of the way the work items are executed by setting flags on the
100workqueue they are putting the work item on. These flags include 102workqueue they are putting the work item on. These flags include
101things like CPU locality, reentrancy, concurrency limits and more. To 103things like CPU locality, reentrancy, concurrency limits, priority and
102get a detailed overview refer to the API description of 104more. To get a detailed overview refer to the API description of
103alloc_workqueue() below. 105alloc_workqueue() below.
104 106
105When a work item is queued to a workqueue, the target gcwq is 107When a work item is queued to a workqueue, the target gcwq and
106determined according to the queue parameters and workqueue attributes 108thread-pool is determined according to the queue parameters and
107and appended on the shared worklist of the gcwq. For example, unless 109workqueue attributes and appended on the shared worklist of the
108specifically overridden, a work item of a bound workqueue will be 110thread-pool. For example, unless specifically overridden, a work item
109queued on the worklist of exactly that gcwq that is associated to the 111of a bound workqueue will be queued on the worklist of either normal
110CPU the issuer is running on. 112or highpri thread-pool of the gcwq that is associated to the CPU the
113issuer is running on.
111 114
112For any worker pool implementation, managing the concurrency level 115For any worker pool implementation, managing the concurrency level
113(how many execution contexts are active) is an important issue. cmwq 116(how many execution contexts are active) is an important issue. cmwq
@@ -115,26 +118,26 @@ tries to keep the concurrency at a minimal but sufficient level.
115Minimal to save resources and sufficient in that the system is used at 118Minimal to save resources and sufficient in that the system is used at
116its full capacity. 119its full capacity.
117 120
118Each gcwq bound to an actual CPU implements concurrency management by 121Each thread-pool bound to an actual CPU implements concurrency
119hooking into the scheduler. The gcwq is notified whenever an active 122management by hooking into the scheduler. The thread-pool is notified
120worker wakes up or sleeps and keeps track of the number of the 123whenever an active worker wakes up or sleeps and keeps track of the
121currently runnable workers. Generally, work items are not expected to 124number of the currently runnable workers. Generally, work items are
122hog a CPU and consume many cycles. That means maintaining just enough 125not expected to hog a CPU and consume many cycles. That means
123concurrency to prevent work processing from stalling should be 126maintaining just enough concurrency to prevent work processing from
124optimal. As long as there are one or more runnable workers on the 127stalling should be optimal. As long as there are one or more runnable
125CPU, the gcwq doesn't start execution of a new work, but, when the 128workers on the CPU, the thread-pool doesn't start execution of a new
126last running worker goes to sleep, it immediately schedules a new 129work, but, when the last running worker goes to sleep, it immediately
127worker so that the CPU doesn't sit idle while there are pending work 130schedules a new worker so that the CPU doesn't sit idle while there
128items. This allows using a minimal number of workers without losing 131are pending work items. This allows using a minimal number of workers
129execution bandwidth. 132without losing execution bandwidth.
130 133
131Keeping idle workers around doesn't cost other than the memory space 134Keeping idle workers around doesn't cost other than the memory space
132for kthreads, so cmwq holds onto idle ones for a while before killing 135for kthreads, so cmwq holds onto idle ones for a while before killing
133them. 136them.
134 137
135For an unbound wq, the above concurrency management doesn't apply and 138For an unbound wq, the above concurrency management doesn't apply and
136the gcwq for the pseudo unbound CPU tries to start executing all work 139the thread-pools for the pseudo unbound CPU try to start executing all
137items as soon as possible. The responsibility of regulating 140work items as soon as possible. The responsibility of regulating
138concurrency level is on the users. There is also a flag to mark a 141concurrency level is on the users. There is also a flag to mark a
139bound wq to ignore the concurrency management. Please refer to the 142bound wq to ignore the concurrency management. Please refer to the
140API section for details. 143API section for details.
@@ -205,31 +208,22 @@ resources, scheduled and executed.
205 208
206 WQ_HIGHPRI 209 WQ_HIGHPRI
207 210
208 Work items of a highpri wq are queued at the head of the 211 Work items of a highpri wq are queued to the highpri
209 worklist of the target gcwq and start execution regardless of 212 thread-pool of the target gcwq. Highpri thread-pools are
210 the current concurrency level. In other words, highpri work 213 served by worker threads with elevated nice level.
211 items will always start execution as soon as execution
212 resource is available.
213 214
214 Ordering among highpri work items is preserved - a highpri 215 Note that normal and highpri thread-pools don't interact with
215 work item queued after another highpri work item will start 216 each other. Each maintain its separate pool of workers and
216 execution after the earlier highpri work item starts. 217 implements concurrency management among its workers.
217
218 Although highpri work items are not held back by other
219 runnable work items, they still contribute to the concurrency
220 level. Highpri work items in runnable state will prevent
221 non-highpri work items from starting execution.
222
223 This flag is meaningless for unbound wq.
224 218
225 WQ_CPU_INTENSIVE 219 WQ_CPU_INTENSIVE
226 220
227 Work items of a CPU intensive wq do not contribute to the 221 Work items of a CPU intensive wq do not contribute to the
228 concurrency level. In other words, runnable CPU intensive 222 concurrency level. In other words, runnable CPU intensive
229 work items will not prevent other work items from starting 223 work items will not prevent other work items in the same
230 execution. This is useful for bound work items which are 224 thread-pool from starting execution. This is useful for bound
231 expected to hog CPU cycles so that their execution is 225 work items which are expected to hog CPU cycles so that their
232 regulated by the system scheduler. 226 execution is regulated by the system scheduler.
233 227
234 Although CPU intensive work items don't contribute to the 228 Although CPU intensive work items don't contribute to the
235 concurrency level, start of their executions is still 229 concurrency level, start of their executions is still
@@ -239,14 +233,6 @@ resources, scheduled and executed.
239 233
240 This flag is meaningless for unbound wq. 234 This flag is meaningless for unbound wq.
241 235
242 WQ_HIGHPRI | WQ_CPU_INTENSIVE
243
244 This combination makes the wq avoid interaction with
245 concurrency management completely and behave as a simple
246 per-CPU execution context provider. Work items queued on a
247 highpri CPU-intensive wq start execution as soon as resources
248 are available and don't affect execution of other work items.
249
250@max_active: 236@max_active:
251 237
252@max_active determines the maximum number of execution contexts per 238@max_active determines the maximum number of execution contexts per
@@ -328,20 +314,7 @@ If @max_active == 2,
328 35 w2 wakes up and finishes 314 35 w2 wakes up and finishes
329 315
330Now, let's assume w1 and w2 are queued to a different wq q1 which has 316Now, let's assume w1 and w2 are queued to a different wq q1 which has
331WQ_HIGHPRI set, 317WQ_CPU_INTENSIVE set,
332
333 TIME IN MSECS EVENT
334 0 w1 and w2 start and burn CPU
335 5 w1 sleeps
336 10 w2 sleeps
337 10 w0 starts and burns CPU
338 15 w0 sleeps
339 15 w1 wakes up and finishes
340 20 w2 wakes up and finishes
341 25 w0 wakes up and burns CPU
342 30 w0 finishes
343
344If q1 has WQ_CPU_INTENSIVE set,
345 318
346 TIME IN MSECS EVENT 319 TIME IN MSECS EVENT
347 0 w0 starts and burns CPU 320 0 w0 starts and burns CPU