author     Linus Torvalds <torvalds@linux-foundation.org>    2013-04-29 22:07:40 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>    2013-04-29 22:07:40 -0400
commit     46d9be3e5eb01f71fc02653755d970247174b400 (patch)
tree       01534c9ebfa5f52a7133e34354d2831fe6704f15 /include
parent     ce8aa48929449b491149b6c87861ac69cb797a42 (diff)
parent     cece95dfe5aa56ba99e51b4746230ff0b8542abd (diff)
Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "A lot of activities on the workqueue side this time. The changes achieve the following.

  - WQ_UNBOUND workqueues - the workqueues which are not per-cpu - are updated to be able to
    interface with multiple backend worker pools. This involved a lot of churning but the end
    result seems actually neater as unbound workqueues are now a lot closer to per-cpu ones.

  - The ability to interface with multiple backend worker pools is used to implement unbound
    workqueues with custom attributes. Currently the supported attributes are the nice level
    and CPU affinity. It may be expanded to include cgroup association in future. The
    attributes can be specified either by calling apply_workqueue_attrs() or through
    /sys/bus/workqueue/WQ_NAME/* if the workqueue in question is exported through sysfs.

    The backend worker pools are keyed by the actual attributes and shared by any workqueues
    which share the same attributes. When attributes of a workqueue are changed, the workqueue
    binds to the worker pool with the specified attributes while leaving the work items which
    are already executing in its previous worker pools alone.

    This allows converting custom worker pool implementations which want worker attribute
    tuning to use workqueues. The writeback pool is already converted in the block tree and a
    couple of others are likely to follow, including the btrfs io workers.

  - WQ_UNBOUND's ability to bind to multiple worker pools is also used to make it NUMA-aware.
    Because there's no association between the work item issuer and the specific worker
    assigned to execute it, before this change, using an unbound workqueue led to unnecessary
    cross-node bouncing, and it couldn't be helped by autonuma as it requires tasks to have
    implicit node affinity while workers are assigned randomly.

    After these changes, an unbound workqueue now binds to multiple NUMA-affine worker pools
    so that queued work items are executed in the same node. This is turned on by default but
    can be disabled system-wide or for individual workqueues.

    Crypto was requesting NUMA affinity as encrypting data across different nodes can
    contribute noticeable overhead, and doing it per-cpu was too limiting for certain cases
    where IO throughput could be bottlenecked by one CPU being fully occupied while others
    have idle cycles.

  While the new features required a lot of changes including restructuring locking, it didn't
  complicate the execution paths much. The unbound workqueue handling is now closer to per-cpu
  ones and the new features are implemented by simply associating a workqueue with different
  sets of backend worker pools without changing queue, execution or flush paths.

  As such, even though the amount of change is very high, I feel relatively safe in that it
  isn't likely to cause subtle issues with basic correctness of work item execution and
  handling. If something is wrong, it's likely to show up as being associated with worker
  pools with the wrong attributes, or an OOPS while workqueue attributes are being changed or
  during CPU hotplug.

  While this creates more backend worker pools, it doesn't add too many more workers unless,
  of course, there are many workqueues with unique combinations of attributes. Assuming
  everything else is the same, NUMA awareness costs an extra worker pool per NUMA node with
  online CPUs.

  There are also a couple of things which are being routed outside the workqueue tree.

  - The block tree pulled in workqueue for-3.10 so that the writeback worker pool can be
    converted to an unbound workqueue with sysfs control exposed. This simplifies the code,
    makes writeback workers NUMA-aware and allows tuning the nice level and CPU affinity via
    sysfs.

  - The conversion to workqueue means there's no longer a 1:1 association between writeback
    work and a specific worker task, which makes writeback folks unhappy as they want to be
    able to tell which filesystem caused a problem from a backtrace on systems with many
    filesystems mounted. This is resolved by allowing work items to set a debug info string
    which is printed when the task is dumped. As this change involves unifying implementations
    of dump_stack() and friends in arch codes, it's being routed through Andrew's -mm tree."

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
  workqueue: use kmem_cache_free() instead of kfree()
  workqueue: avoid false negative WARN_ON() in destroy_workqueue()
  workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
  workqueue: implement NUMA affinity for unbound workqueues
  workqueue: introduce put_pwq_unlocked()
  workqueue: introduce numa_pwq_tbl_install()
  workqueue: use NUMA-aware allocation for pool_workqueues
  workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
  workqueue: map an unbound workqueues to multiple per-node pool_workqueues
  workqueue: move hot fields of workqueue_struct to the end
  workqueue: make workqueue->name[] fixed len
  workqueue: add workqueue->unbound_attrs
  workqueue: determine NUMA node of workers accourding to the allowed cpumask
  workqueue: drop 'H' from kworker names of unbound worker pools
  workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
  workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
  workqueue: fix memory leak in apply_workqueue_attrs()
  workqueue: fix unbound workqueue attrs hashing / comparison
  workqueue: fix race condition in unbound workqueue free path
  workqueue: remove pwq_lock which is no longer used
  ...
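To make the new attribute interface concrete, here is a minimal sketch (not code from this merge) of how a driver could create an unbound workqueue and retune it with apply_workqueue_attrs(); the workqueue name, nice level and node-0 cpumask are illustrative assumptions only:

#include <linux/workqueue.h>
#include <linux/cpumask.h>
#include <linux/slab.h>

static struct workqueue_struct *example_wq;	/* hypothetical workqueue */

static int example_setup(void)
{
	struct workqueue_attrs *attrs;
	int ret;

	/* WQ_UNBOUND: not per-cpu; WQ_SYSFS: expose knobs under /sys/bus/workqueue/ */
	example_wq = alloc_workqueue("example_wq", WQ_UNBOUND | WQ_SYSFS, 0);
	if (!example_wq)
		return -ENOMEM;

	attrs = alloc_workqueue_attrs(GFP_KERNEL);
	if (!attrs) {
		destroy_workqueue(example_wq);
		return -ENOMEM;
	}

	attrs->nice = -5;					/* run workers at a higher priority */
	cpumask_copy(attrs->cpumask, cpumask_of_node(0));	/* restrict workers to node 0 CPUs */

	ret = apply_workqueue_attrs(example_wq, attrs);		/* rebind to a matching backend pool */
	free_workqueue_attrs(attrs);
	return ret;
}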
Diffstat (limited to 'include')
-rw-r--r--   include/linux/cpumask.h     15
-rw-r--r--   include/linux/device.h       2
-rw-r--r--   include/linux/sched.h        2
-rw-r--r--   include/linux/workqueue.h  166
4 files changed, 165 insertions, 20 deletions
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 032560295fcb..d08e4d2a9b92 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -591,6 +591,21 @@ static inline int cpulist_scnprintf(char *buf, int len,
 }
 
 /**
+ * cpumask_parse - extract a cpumask from from a string
+ * @buf: the buffer to extract from
+ * @dstp: the cpumask to set.
+ *
+ * Returns -errno, or 0 for success.
+ */
+static inline int cpumask_parse(const char *buf, struct cpumask *dstp)
+{
+	char *nl = strchr(buf, '\n');
+	int len = nl ? nl - buf : strlen(buf);
+
+	return bitmap_parse(buf, len, cpumask_bits(dstp), nr_cpumask_bits);
+}
+
+/**
  * cpulist_parse - extract a cpumask from a user string of ranges
  * @buf: the buffer to extract from
  * @dstp: the cpumask to set.
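For reference, a hedged sketch of how cpumask_parse() is meant to be used (the function and buffer contents below are hypothetical, not from this patch): it accepts the fixed hexadecimal format written to sysfs cpumask files, with or without a trailing newline, whereas cpulist_parse() takes a range list such as "0-3,8".

/* sketch: parse a sysfs-style hex mask such as "f\n" (CPUs 0-3) */
static int example_parse_mask(const char *buf)
{
	cpumask_var_t mask;
	int ret;

	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
		return -ENOMEM;

	ret = cpumask_parse(buf, mask);		/* handles the trailing newline itself */
	if (!ret)
		ret = cpumask_weight(mask);	/* e.g. 4 for "f\n" */

	free_cpumask_var(mask);
	return ret;
}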
diff --git a/include/linux/device.h b/include/linux/device.h
index 88615ccaf23a..711793b145ff 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -297,6 +297,8 @@ void subsys_interface_unregister(struct subsys_interface *sif);
 
 int subsys_system_register(struct bus_type *subsys,
 			   const struct attribute_group **groups);
+int subsys_virtual_register(struct bus_type *subsys,
+			    const struct attribute_group **groups);
 
 /**
  * struct class - device classes
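For context, subsys_virtual_register() is the hook the new workqueue sysfs interface uses to register a bus whose devices live under /sys/devices/virtual/ rather than /sys/devices/system/. A rough sketch of a caller, with an illustrative bus name (not the exact workqueue code):

#include <linux/device.h>
#include <linux/init.h>

static struct bus_type example_subsys = {
	.name = "example",	/* would show up as /sys/bus/example */
};

static int __init example_subsys_init(void)
{
	/* second argument: optional attribute groups for the subsystem root device */
	return subsys_virtual_register(&example_subsys, NULL);
}
core_initcall(example_subsys_init);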
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2d02c76a01be..bcbc30397f23 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1793,7 +1793,7 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
 #define PF_SWAPWRITE	0x00800000	/* Allowed to write to swap */
 #define PF_SPREAD_PAGE	0x01000000	/* Spread page cache over cpuset */
 #define PF_SPREAD_SLAB	0x02000000	/* Spread some slab caches over cpuset */
-#define PF_THREAD_BOUND	0x04000000	/* Thread bound to specific cpu */
+#define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_allowed */
 #define PF_MCE_EARLY	0x08000000	/* Early kill for mce process policy */
 #define PF_MEMPOLICY	0x10000000	/* Non-default NUMA mempolicy */
 #define PF_MUTEX_TESTER	0x20000000	/* Thread belongs to the rt mutex tester */
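The PF_THREAD_BOUND -> PF_NO_SETAFFINITY rename reflects the flag's new meaning: it no longer implies the task is bound to a single CPU, only that userland must not change its cpus_allowed. Roughly (a sketch of the kernel-internal usage, not the exact code), kernel threads whose affinity is managed by the kernel set the flag on themselves and the affinity syscall path rejects flagged tasks:

	/* e.g. in a rescuer or unbound worker whose cpumask the kernel manages */
	current->flags |= PF_NO_SETAFFINITY;

	/* ... and in the sched_setaffinity() path */
	if (p->flags & PF_NO_SETAFFINITY)
		return -EINVAL;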
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 8afab27cdbc2..717975639378 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -11,6 +11,7 @@
 #include <linux/lockdep.h>
 #include <linux/threads.h>
 #include <linux/atomic.h>
+#include <linux/cpumask.h>
 
 struct workqueue_struct;
 
@@ -68,7 +69,7 @@ enum {
 	WORK_STRUCT_COLOR_BITS,
 
 	/* data contains off-queue information when !WORK_STRUCT_PWQ */
-	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_FLAG_BITS,
+	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_COLOR_SHIFT,
 
 	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
 
@@ -115,6 +116,20 @@ struct delayed_work {
 	int cpu;
 };
 
+/*
+ * A struct for workqueue attributes.  This can be used to change
+ * attributes of an unbound workqueue.
+ *
+ * Unlike other fields, ->no_numa isn't a property of a worker_pool.  It
+ * only modifies how apply_workqueue_attrs() select pools and thus doesn't
+ * participate in pool hash calculations or equality comparisons.
+ */
+struct workqueue_attrs {
+	int		nice;		/* nice level */
+	cpumask_var_t	cpumask;	/* allowed CPUs */
+	bool		no_numa;	/* disable NUMA affinity */
+};
+
 static inline struct delayed_work *to_delayed_work(struct work_struct *work)
 {
 	return container_of(work, struct delayed_work, work);
@@ -283,9 +298,10 @@ enum {
 	WQ_MEM_RECLAIM		= 1 << 3, /* may be used for memory reclaim */
 	WQ_HIGHPRI		= 1 << 4, /* high priority */
 	WQ_CPU_INTENSIVE	= 1 << 5, /* cpu instensive workqueue */
+	WQ_SYSFS		= 1 << 6, /* visible in sysfs, see wq_sysfs_register() */
 
-	WQ_DRAINING		= 1 << 6, /* internal: workqueue is draining */
-	WQ_RESCUER		= 1 << 7, /* internal: workqueue has rescuer */
+	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
+	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
@@ -388,7 +404,7 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active,
  * Pointer to the allocated workqueue on success, %NULL on failure.
  */
 #define alloc_ordered_workqueue(fmt, flags, args...)			\
-	alloc_workqueue(fmt, WQ_UNBOUND | (flags), 1, ##args)
+	alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args)
 
 #define create_workqueue(name)						\
 	alloc_workqueue((name), WQ_MEM_RECLAIM, 1)
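Folding __WQ_ORDERED into the macro doesn't change how callers allocate ordered workqueues; a minimal, hypothetical example:

	/* inside some init function: an unbound workqueue that executes
	 * at most one work item at a time, in queueing order */
	struct workqueue_struct *ordered_wq;

	ordered_wq = alloc_ordered_workqueue("example_ordered", WQ_MEM_RECLAIM);
	if (!ordered_wq)
		return -ENOMEM;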
@@ -399,30 +415,23 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active,
 
 extern void destroy_workqueue(struct workqueue_struct *wq);
 
+struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask);
+void free_workqueue_attrs(struct workqueue_attrs *attrs);
+int apply_workqueue_attrs(struct workqueue_struct *wq,
+			  const struct workqueue_attrs *attrs);
+
 extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
 			struct work_struct *work);
-extern bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
 extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
-extern bool queue_delayed_work(struct workqueue_struct *wq,
-			struct delayed_work *work, unsigned long delay);
 extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay);
-extern bool mod_delayed_work(struct workqueue_struct *wq,
-			struct delayed_work *dwork, unsigned long delay);
 
 extern void flush_workqueue(struct workqueue_struct *wq);
 extern void drain_workqueue(struct workqueue_struct *wq);
 extern void flush_scheduled_work(void);
 
-extern bool schedule_work_on(int cpu, struct work_struct *work);
-extern bool schedule_work(struct work_struct *work);
-extern bool schedule_delayed_work_on(int cpu, struct delayed_work *work,
-				     unsigned long delay);
-extern bool schedule_delayed_work(struct delayed_work *work,
-				  unsigned long delay);
 extern int schedule_on_each_cpu(work_func_t func);
-extern int keventd_up(void);
 
 int execute_in_process_context(work_func_t fn, struct execute_work *);
 
@@ -435,9 +444,121 @@ extern bool cancel_delayed_work_sync(struct delayed_work *dwork);
 
 extern void workqueue_set_max_active(struct workqueue_struct *wq,
 				     int max_active);
-extern bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq);
+extern bool current_is_workqueue_rescuer(void);
+extern bool workqueue_congested(int cpu, struct workqueue_struct *wq);
 extern unsigned int work_busy(struct work_struct *work);
 
+/**
+ * queue_work - queue work on a workqueue
+ * @wq: workqueue to use
+ * @work: work to queue
+ *
+ * Returns %false if @work was already on a queue, %true otherwise.
+ *
+ * We queue the work to the CPU on which it was submitted, but if the CPU dies
+ * it can be processed by another CPU.
+ */
+static inline bool queue_work(struct workqueue_struct *wq,
+			      struct work_struct *work)
+{
+	return queue_work_on(WORK_CPU_UNBOUND, wq, work);
+}
+
+/**
+ * queue_delayed_work - queue work on a workqueue after delay
+ * @wq: workqueue to use
+ * @dwork: delayable work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * Equivalent to queue_delayed_work_on() but tries to use the local CPU.
+ */
+static inline bool queue_delayed_work(struct workqueue_struct *wq,
+				      struct delayed_work *dwork,
+				      unsigned long delay)
+{
+	return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
+}
+
+/**
+ * mod_delayed_work - modify delay of or queue a delayed work
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * mod_delayed_work_on() on local CPU.
+ */
+static inline bool mod_delayed_work(struct workqueue_struct *wq,
+				    struct delayed_work *dwork,
+				    unsigned long delay)
+{
+	return mod_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
+}
+
+/**
+ * schedule_work_on - put work task on a specific cpu
+ * @cpu: cpu to put the work task on
+ * @work: job to be done
+ *
+ * This puts a job on a specific cpu
+ */
+static inline bool schedule_work_on(int cpu, struct work_struct *work)
+{
+	return queue_work_on(cpu, system_wq, work);
+}
+
+/**
+ * schedule_work - put work task in global workqueue
+ * @work: job to be done
+ *
+ * Returns %false if @work was already on the kernel-global workqueue and
+ * %true otherwise.
+ *
+ * This puts a job in the kernel-global workqueue if it was not already
+ * queued and leaves it in the same position on the kernel-global
+ * workqueue otherwise.
+ */
+static inline bool schedule_work(struct work_struct *work)
+{
+	return queue_work(system_wq, work);
+}
+
+/**
+ * schedule_delayed_work_on - queue work in global workqueue on CPU after delay
+ * @cpu: cpu to use
+ * @dwork: job to be done
+ * @delay: number of jiffies to wait
+ *
+ * After waiting for a given time this puts a job in the kernel-global
+ * workqueue on the specified CPU.
+ */
+static inline bool schedule_delayed_work_on(int cpu, struct delayed_work *dwork,
+					    unsigned long delay)
+{
+	return queue_delayed_work_on(cpu, system_wq, dwork, delay);
+}
+
+/**
+ * schedule_delayed_work - put work task in global workqueue after delay
+ * @dwork: job to be done
+ * @delay: number of jiffies to wait or 0 for immediate execution
+ *
+ * After waiting for a given time this puts a job in the kernel-global
+ * workqueue.
+ */
+static inline bool schedule_delayed_work(struct delayed_work *dwork,
+					 unsigned long delay)
+{
+	return queue_delayed_work(system_wq, dwork, delay);
+}
+
+/**
+ * keventd_up - is workqueue initialized yet?
+ */
+static inline bool keventd_up(void)
+{
+	return system_wq != NULL;
+}
+
 /*
  * Like above, but uses del_timer() instead of del_timer_sync(). This means,
  * if it returns 0 the timer function may be running and the queueing is in
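These wrappers used to be exported functions in kernel/workqueue.c; moving them into the header as static inlines doesn't change how they are called. A minimal, hypothetical user for reference:

#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void example_fn(struct work_struct *work)
{
	/* runs later, in process context, on the system ("events") workqueue */
}

static DECLARE_WORK(example_work, example_fn);
static DECLARE_DELAYED_WORK(example_dwork, example_fn);

static void example_kick(void)
{
	schedule_work(&example_work);			/* now expands to queue_work(system_wq, ...) */
	schedule_delayed_work(&example_dwork, HZ);	/* run roughly one second from now */
}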
@@ -466,12 +587,12 @@ static inline bool __deprecated flush_delayed_work_sync(struct delayed_work *dwo
 }
 
 #ifndef CONFIG_SMP
-static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
+static inline long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 {
 	return fn(arg);
 }
 #else
-long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
+long work_on_cpu(int cpu, long (*fn)(void *), void *arg);
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_FREEZER
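work_on_cpu() runs a function synchronously in process context on the requested CPU (and simply calls it directly on !CONFIG_SMP builds); the cpu parameter changing from unsigned int to int matches the rest of the workqueue API. A hypothetical caller:

#include <linux/workqueue.h>
#include <linux/smp.h>
#include <linux/printk.h>

static long example_probe(void *arg)
{
	/* executes in a worker bound to the CPU passed to work_on_cpu() */
	return raw_smp_processor_id();
}

static void example(void)
{
	long cpu = work_on_cpu(1, example_probe, NULL);	/* blocks until example_probe() returns */
	pr_info("example_probe() ran on cpu %ld\n", cpu);
}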
@@ -480,4 +601,11 @@ extern bool freeze_workqueues_busy(void);
 extern void thaw_workqueues(void);
 #endif /* CONFIG_FREEZER */
 
+#ifdef CONFIG_SYSFS
+int workqueue_sysfs_register(struct workqueue_struct *wq);
+#else	/* CONFIG_SYSFS */
+static inline int workqueue_sysfs_register(struct workqueue_struct *wq)
+{ return 0; }
+#endif	/* CONFIG_SYSFS */
+
 #endif