author     Linus Torvalds <torvalds@linux-foundation.org>    2013-04-29 22:07:40 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>    2013-04-29 22:07:40 -0400
commit     46d9be3e5eb01f71fc02653755d970247174b400 (patch)
tree       01534c9ebfa5f52a7133e34354d2831fe6704f15 /include
parent     ce8aa48929449b491149b6c87861ac69cb797a42 (diff)
parent     cece95dfe5aa56ba99e51b4746230ff0b8542abd (diff)
Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "A lot of activities on the workqueue side this time. The changes achieve the following.

  - WQ_UNBOUND workqueues - the workqueues which are not per-cpu - are updated to be able to
    interface with multiple backend worker pools. This involved a lot of churning but the end
    result seems actually neater as unbound workqueues are now a lot closer to per-cpu ones.

  - The ability to interface with multiple backend worker pools is used to implement unbound
    workqueues with custom attributes. Currently the supported attributes are the nice level
    and CPU affinity. It may be expanded to include cgroup association in future. The
    attributes can be specified either by calling apply_workqueue_attrs() or through
    /sys/bus/workqueue/WQ_NAME/* if the workqueue in question is exported through sysfs.

    The backend worker pools are keyed by the actual attributes and shared by any workqueues
    which share the same attributes. When attributes of a workqueue are changed, the workqueue
    binds to the worker pool with the specified attributes while leaving the work items which
    are already executing in its previous worker pools alone.

    This allows converting custom worker pool implementations which want worker attribute
    tuning to use workqueues. The writeback pool is already converted in the block tree and a
    couple of others are likely to follow, including the btrfs io workers.

  - WQ_UNBOUND's ability to bind to multiple worker pools is also used to make it NUMA-aware.
    Because there's no association between the work item issuer and the specific worker
    assigned to execute it, before this change, using an unbound workqueue led to unnecessary
    cross-node bouncing, and it couldn't be helped by autonuma as it requires tasks to have
    implicit node affinity while workers are assigned randomly.

    After these changes, an unbound workqueue now binds to multiple NUMA-affine worker pools
    so that queued work items are executed in the same node. This is turned on by default but
    can be disabled system-wide or for individual workqueues.

    Crypto was requesting NUMA affinity as encrypting data across different nodes can
    contribute noticeable overhead, and doing it per-cpu was too limiting for certain cases
    where IO throughput could be bottlenecked by one CPU being fully occupied while others
    have idle cycles.

  While the new features required a lot of changes including restructuring locking, it didn't
  complicate the execution paths much. The unbound workqueue handling is now closer to per-cpu
  ones and the new features are implemented by simply associating a workqueue with different
  sets of backend worker pools without changing queue, execution or flush paths.

  As such, even though the amount of change is very high, I feel relatively safe in that it
  isn't likely to cause subtle issues with basic correctness of work item execution and
  handling. If something is wrong, it's likely to show up as being associated with worker
  pools with the wrong attributes, or an OOPS while workqueue attributes are being changed or
  during CPU hotplug.

  While this creates more backend worker pools, it doesn't add too many more workers unless,
  of course, there are many workqueues with unique combinations of attributes. Assuming
  everything else is the same, NUMA awareness costs an extra worker pool per NUMA node with
  online CPUs.

  There are also a couple of things which are being routed outside the workqueue tree.

  - The block tree pulled in workqueue for-3.10 so that the writeback worker pool can be
    converted to an unbound workqueue with sysfs control exposed. This simplifies the code,
    makes writeback workers NUMA-aware and allows tuning the nice level and CPU affinity via
    sysfs.

  - The conversion to workqueue means there's no longer a 1:1 association between writeback
    work and a specific worker task, which makes writeback folks unhappy as they want to be
    able to tell which filesystem caused a problem from a backtrace on systems with many
    filesystems mounted. This is resolved by allowing work items to set a debug info string
    which is printed when the task is dumped. As this change involves unifying implementations
    of dump_stack() and friends in arch codes, it's being routed through Andrew's -mm tree."

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
  workqueue: use kmem_cache_free() instead of kfree()
  workqueue: avoid false negative WARN_ON() in destroy_workqueue()
  workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
  workqueue: implement NUMA affinity for unbound workqueues
  workqueue: introduce put_pwq_unlocked()
  workqueue: introduce numa_pwq_tbl_install()
  workqueue: use NUMA-aware allocation for pool_workqueues
  workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
  workqueue: map an unbound workqueues to multiple per-node pool_workqueues
  workqueue: move hot fields of workqueue_struct to the end
  workqueue: make workqueue->name[] fixed len
  workqueue: add workqueue->unbound_attrs
  workqueue: determine NUMA node of workers accourding to the allowed cpumask
  workqueue: drop 'H' from kworker names of unbound worker pools
  workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
  workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
  workqueue: fix memory leak in apply_workqueue_attrs()
  workqueue: fix unbound workqueue attrs hashing / comparison
  workqueue: fix race condition in unbound workqueue free path
  workqueue: remove pwq_lock which is no longer used
  ...
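To make the new attribute interface concrete, here is a minimal sketch (not code from this merge) of how a driver could create an unbound workqueue and retune it with apply_workqueue_attrs(); the workqueue name, nice level and node-0 cpumask are illustrative assumptions only:

#include <linux/workqueue.h>
#include <linux/cpumask.h>
#include <linux/slab.h>

static struct workqueue_struct *example_wq;	/* hypothetical workqueue */

static int example_setup(void)
{
	struct workqueue_attrs *attrs;
	int ret;

	/* WQ_UNBOUND: not per-cpu; WQ_SYSFS: expose knobs under /sys/bus/workqueue/ */
	example_wq = alloc_workqueue("example_wq", WQ_UNBOUND | WQ_SYSFS, 0);
	if (!example_wq)
		return -ENOMEM;

	attrs = alloc_workqueue_attrs(GFP_KERNEL);
	if (!attrs) {
		destroy_workqueue(example_wq);
		return -ENOMEM;
	}

	attrs->nice = -5;					/* run workers at a higher priority */
	cpumask_copy(attrs->cpumask, cpumask_of_node(0));	/* restrict workers to node 0 CPUs */

	ret = apply_workqueue_attrs(example_wq, attrs);		/* rebind to a matching backend pool */
	free_workqueue_attrs(attrs);
	return ret;
}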
Diffstat (limited to 'include')
-rw-r--r--   include/linux/cpumask.h     15
-rw-r--r--   include/linux/device.h       2
-rw-r--r--   include/linux/sched.h        2
-rw-r--r--   include/linux/workqueue.h  166
4 files changed, 165 insertions, 20 deletions
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 032560295fcb..d08e4d2a9b92 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -591,6 +591,21 @@ static inline int cpulist_scnprintf(char *buf, int len,
 }
 
 /**
+ * cpumask_parse - extract a cpumask from from a string
+ * @buf: the buffer to extract from
+ * @dstp: the cpumask to set.
+ *
+ * Returns -errno, or 0 for success.
+ */
+static inline int cpumask_parse(const char *buf, struct cpumask *dstp)
+{
+	char *nl = strchr(buf, '\n');
+	int len = nl ? nl - buf : strlen(buf);
+
+	return bitmap_parse(buf, len, cpumask_bits(dstp), nr_cpumask_bits);
+}
+
+/**
  * cpulist_parse - extract a cpumask from a user string of ranges
  * @buf: the buffer to extract from
  * @dstp: the cpumask to set.
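For reference, a hedged sketch of how cpumask_parse() is meant to be used (the function and buffer contents below are hypothetical, not from this patch): it accepts the fixed hexadecimal format written to sysfs cpumask files, with or without a trailing newline, whereas cpulist_parse() takes a range list such as "0-3,8".

/* sketch: parse a sysfs-style hex mask such as "f\n" (CPUs 0-3) */
static int example_parse_mask(const char *buf)
{
	cpumask_var_t mask;
	int ret;

	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
		return -ENOMEM;

	ret = cpumask_parse(buf, mask);		/* handles the trailing newline itself */
	if (!ret)
		ret = cpumask_weight(mask);	/* e.g. 4 for "f\n" */

	free_cpumask_var(mask);
	return ret;
}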
diff --git a/include/linux/device.h b/include/linux/device.h
index 88615ccaf23a..711793b145ff 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -297,6 +297,8 @@ void subsys_interface_unregister(struct subsys_interface *sif);
 
 int subsys_system_register(struct bus_type *subsys,
 			   const struct attribute_group **groups);
+int subsys_virtual_register(struct bus_type *subsys,
+			    const struct attribute_group **groups);
 
 /**
  * struct class - device classes
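For context, subsys_virtual_register() is the hook the new workqueue sysfs interface uses to register a bus whose devices live under /sys/devices/virtual/ rather than /sys/devices/system/. A rough sketch of a caller, with an illustrative bus name (not the exact workqueue code):

#include <linux/device.h>
#include <linux/init.h>

static struct bus_type example_subsys = {
	.name = "example",	/* would show up as /sys/bus/example */
};

static int __init example_subsys_init(void)
{
	/* second argument: optional attribute groups for the subsystem root device */
	return subsys_virtual_register(&example_subsys, NULL);
}
core_initcall(example_subsys_init);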
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2d02c76a01be..bcbc30397f23 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1793,7 +1793,7 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
 #define PF_SWAPWRITE	0x00800000	/* Allowed to write to swap */
 #define PF_SPREAD_PAGE	0x01000000	/* Spread page cache over cpuset */
 #define PF_SPREAD_SLAB	0x02000000	/* Spread some slab caches over cpuset */
-#define PF_THREAD_BOUND	0x04000000	/* Thread bound to specific cpu */
+#define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_allowed */
 #define PF_MCE_EARLY	0x08000000	/* Early kill for mce process policy */
 #define PF_MEMPOLICY	0x10000000	/* Non-default NUMA mempolicy */
 #define PF_MUTEX_TESTER	0x20000000	/* Thread belongs to the rt mutex tester */
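The PF_THREAD_BOUND -> PF_NO_SETAFFINITY rename reflects the flag's new meaning: it no longer implies the task is bound to a single CPU, only that userland must not change its cpus_allowed. Roughly (a sketch of the kernel-internal usage, not the exact code), kernel threads whose affinity is managed by the kernel set the flag on themselves and the affinity syscall path rejects flagged tasks:

	/* e.g. in a rescuer or unbound worker whose cpumask the kernel manages */
	current->flags |= PF_NO_SETAFFINITY;

	/* ... and in the sched_setaffinity() path */
	if (p->flags & PF_NO_SETAFFINITY)
		return -EINVAL;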
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 8afab27cdbc2..717975639378 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -11,6 +11,7 @@
 #include <linux/lockdep.h>
 #include <linux/threads.h>
 #include <linux/atomic.h>
+#include <linux/cpumask.h>
 
 struct workqueue_struct;
 
@@ -68,7 +69,7 @@ enum {
 	WORK_STRUCT_COLOR_BITS,
 
 	/* data contains off-queue information when !WORK_STRUCT_PWQ */
-	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_FLAG_BITS,
+	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_COLOR_SHIFT,
 
 	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
 
@@ -115,6 +116,20 @@ struct delayed_work {
 	int cpu;
 };
 
+/*
+ * A struct for workqueue attributes.  This can be used to change
+ * attributes of an unbound workqueue.
+ *
+ * Unlike other fields, ->no_numa isn't a property of a worker_pool.  It
+ * only modifies how apply_workqueue_attrs() select pools and thus doesn't
+ * participate in pool hash calculations or equality comparisons.
+ */
+struct workqueue_attrs {
+	int		nice;		/* nice level */
+	cpumask_var_t	cpumask;	/* allowed CPUs */
+	bool		no_numa;	/* disable NUMA affinity */
+};
+
 static inline struct delayed_work *to_delayed_work(struct work_struct *work)
 {
 	return container_of(work, struct delayed_work, work);
@@ -283,9 +298,10 @@ enum {
 	WQ_MEM_RECLAIM		= 1 << 3, /* may be used for memory reclaim */
 	WQ_HIGHPRI		= 1 << 4, /* high priority */
 	WQ_CPU_INTENSIVE	= 1 << 5, /* cpu instensive workqueue */
+	WQ_SYSFS		= 1 << 6, /* visible in sysfs, see wq_sysfs_register() */
 
-	WQ_DRAINING		= 1 << 6, /* internal: workqueue is draining */
-	WQ_RESCUER		= 1 << 7, /* internal: workqueue has rescuer */
+	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
+	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
@@ -388,7 +404,7 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active,
  * Pointer to the allocated workqueue on success, %NULL on failure.
  */
 #define alloc_ordered_workqueue(fmt, flags, args...)			\
-	alloc_workqueue(fmt, WQ_UNBOUND | (flags), 1, ##args)
+	alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args)
 
 #define create_workqueue(name)						\
 	alloc_workqueue((name), WQ_MEM_RECLAIM, 1)
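Folding __WQ_ORDERED into the macro doesn't change how callers allocate ordered workqueues; a minimal, hypothetical example:

	/* inside some init function: an unbound workqueue that executes
	 * at most one work item at a time, in queueing order */
	struct workqueue_struct *ordered_wq;

	ordered_wq = alloc_ordered_workqueue("example_ordered", WQ_MEM_RECLAIM);
	if (!ordered_wq)
		return -ENOMEM;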
@@ -399,30 +415,23 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active,
 
 extern void destroy_workqueue(struct workqueue_struct *wq);
 
+struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask);
+void free_workqueue_attrs(struct workqueue_attrs *attrs);
+int apply_workqueue_attrs(struct workqueue_struct *wq,
+			  const struct workqueue_attrs *attrs);
+
 extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
 			struct work_struct *work);
-extern bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
 extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
-extern bool queue_delayed_work(struct workqueue_struct *wq,
-			struct delayed_work *work, unsigned long delay);
 extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay);
-extern bool mod_delayed_work(struct workqueue_struct *wq,
-			struct delayed_work *dwork, unsigned long delay);
 
 extern void flush_workqueue(struct workqueue_struct *wq);
 extern void drain_workqueue(struct workqueue_struct *wq);
 extern void flush_scheduled_work(void);
 
-extern bool schedule_work_on(int cpu, struct work_struct *work);
-extern bool schedule_work(struct work_struct *work);
-extern bool schedule_delayed_work_on(int cpu, struct delayed_work *work,
-				     unsigned long delay);
-extern bool schedule_delayed_work(struct delayed_work *work,
-				  unsigned long delay);
 extern int schedule_on_each_cpu(work_func_t func);
-extern int keventd_up(void);
 
 int execute_in_process_context(work_func_t fn, struct execute_work *);
 
@@ -435,9 +444,121 @@ extern bool cancel_delayed_work_sync(struct delayed_work *dwork);
 
 extern void workqueue_set_max_active(struct workqueue_struct *wq,
 				     int max_active);
-extern bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq);
+extern bool current_is_workqueue_rescuer(void);
+extern bool workqueue_congested(int cpu, struct workqueue_struct *wq);
 extern unsigned int work_busy(struct work_struct *work);
 
+/**
+ * queue_work - queue work on a workqueue
+ * @wq: workqueue to use
+ * @work: work to queue
+ *
+ * Returns %false if @work was already on a queue, %true otherwise.
+ *
+ * We queue the work to the CPU on which it was submitted, but if the CPU dies
+ * it can be processed by another CPU.
+ */
+static inline bool queue_work(struct workqueue_struct *wq,
+			      struct work_struct *work)
+{
+	return queue_work_on(WORK_CPU_UNBOUND, wq, work);
+}
+
+/**
+ * queue_delayed_work - queue work on a workqueue after delay
+ * @wq: workqueue to use
+ * @dwork: delayable work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * Equivalent to queue_delayed_work_on() but tries to use the local CPU.
+ */
+static inline bool queue_delayed_work(struct workqueue_struct *wq,
+				      struct delayed_work *dwork,
+				      unsigned long delay)
+{
+	return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
+}
+
+/**
+ * mod_delayed_work - modify delay of or queue a delayed work
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * mod_delayed_work_on() on local CPU.
+ */
+static inline bool mod_delayed_work(struct workqueue_struct *wq,
+				    struct delayed_work *dwork,
+				    unsigned long delay)
+{
+	return mod_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
+}
+
+/**
+ * schedule_work_on - put work task on a specific cpu
+ * @cpu: cpu to put the work task on
+ * @work: job to be done
+ *
+ * This puts a job on a specific cpu
+ */
+static inline bool schedule_work_on(int cpu, struct work_struct *work)
+{
+	return queue_work_on(cpu, system_wq, work);
+}
+
+/**
+ * schedule_work - put work task in global workqueue
+ * @work: job to be done
+ *
+ * Returns %false if @work was already on the kernel-global workqueue and
+ * %true otherwise.
+ *
+ * This puts a job in the kernel-global workqueue if it was not already
+ * queued and leaves it in the same position on the kernel-global
+ * workqueue otherwise.
+ */
+static inline bool schedule_work(struct work_struct *work)
+{
+	return queue_work(system_wq, work);
+}
+
+/**
+ * schedule_delayed_work_on - queue work in global workqueue on CPU after delay
+ * @cpu: cpu to use
+ * @dwork: job to be done
+ * @delay: number of jiffies to wait
+ *
+ * After waiting for a given time this puts a job in the kernel-global
+ * workqueue on the specified CPU.
+ */
+static inline bool schedule_delayed_work_on(int cpu, struct delayed_work *dwork,
+					    unsigned long delay)
+{
+	return queue_delayed_work_on(cpu, system_wq, dwork, delay);
+}
+
+/**
+ * schedule_delayed_work - put work task in global workqueue after delay
+ * @dwork: job to be done
+ * @delay: number of jiffies to wait or 0 for immediate execution
+ *
+ * After waiting for a given time this puts a job in the kernel-global
+ * workqueue.
+ */
+static inline bool schedule_delayed_work(struct delayed_work *dwork,
+					 unsigned long delay)
+{
+	return queue_delayed_work(system_wq, dwork, delay);
+}
+
+/**
+ * keventd_up - is workqueue initialized yet?
+ */
+static inline bool keventd_up(void)
+{
+	return system_wq != NULL;
+}
+
 /*
  * Like above, but uses del_timer() instead of del_timer_sync(). This means,
  * if it returns 0 the timer function may be running and the queueing is in
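These wrappers used to be exported functions in kernel/workqueue.c; moving them into the header as static inlines doesn't change how they are called. A minimal, hypothetical user for reference:

#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void example_fn(struct work_struct *work)
{
	/* runs later, in process context, on the system ("events") workqueue */
}

static DECLARE_WORK(example_work, example_fn);
static DECLARE_DELAYED_WORK(example_dwork, example_fn);

static void example_kick(void)
{
	schedule_work(&example_work);			/* now expands to queue_work(system_wq, ...) */
	schedule_delayed_work(&example_dwork, HZ);	/* run roughly one second from now */
}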
@@ -466,12 +587,12 @@ static inline bool __deprecated flush_delayed_work_sync(struct delayed_work *dwo
 }
 
 #ifndef CONFIG_SMP
-static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
+static inline long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 {
 	return fn(arg);
 }
 #else
-long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
+long work_on_cpu(int cpu, long (*fn)(void *), void *arg);
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_FREEZER
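work_on_cpu() runs a function synchronously in process context on the requested CPU (and simply calls it directly on !CONFIG_SMP builds); the cpu parameter changing from unsigned int to int matches the rest of the workqueue API. A hypothetical caller:

#include <linux/workqueue.h>
#include <linux/smp.h>
#include <linux/printk.h>

static long example_probe(void *arg)
{
	/* executes in a worker bound to the CPU passed to work_on_cpu() */
	return raw_smp_processor_id();
}

static void example(void)
{
	long cpu = work_on_cpu(1, example_probe, NULL);	/* blocks until example_probe() returns */
	pr_info("example_probe() ran on cpu %ld\n", cpu);
}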
@@ -480,4 +601,11 @@ extern bool freeze_workqueues_busy(void);
 extern void thaw_workqueues(void);
 #endif /* CONFIG_FREEZER */
 
+#ifdef CONFIG_SYSFS
+int workqueue_sysfs_register(struct workqueue_struct *wq);
+#else	/* CONFIG_SYSFS */
+static inline int workqueue_sysfs_register(struct workqueue_struct *wq)
+{ return 0; }
+#endif	/* CONFIG_SYSFS */
+
 #endif