author    Linus Torvalds <torvalds@linux-foundation.org>  2013-04-29 22:07:40 -0400
committer Linus Torvalds <torvalds@linux-foundation.org>  2013-04-29 22:07:40 -0400
commit    46d9be3e5eb01f71fc02653755d970247174b400
tree      01534c9ebfa5f52a7133e34354d2831fe6704f15 /drivers
parent    ce8aa48929449b491149b6c87861ac69cb797a42
parent    cece95dfe5aa56ba99e51b4746230ff0b8542abd

Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:
 "A lot of activities on workqueue side this time.  The changes achieve
  the followings.

   - WQ_UNBOUND workqueues - the workqueues which are not per-cpu - are
     updated to be able to interface with multiple backend worker pools.
     This involved a lot of churning but the end result seems actually
     neater as unbound workqueues are now a lot closer to per-cpu ones.

   - The ability to interface with multiple backend worker pools are
     used to implement unbound workqueues with custom attributes.
     Currently the supported attributes are the nice level and CPU
     affinity.  It may be expanded to include cgroup association in
     future.  The attributes can be specified either by calling
     apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if
     the workqueue in question is exported through sysfs.

     The backend worker pools are keyed by the actual attributes and
     shared by any workqueues which share the same attributes.  When
     attributes of a workqueue are changed, the workqueue binds to the
     worker pool with the specified attributes while leaving the work
     items which are already executing in its previous worker pools
     alone.

     This allows converting custom worker pool implementations which
     want worker attribute tuning to use workqueues.  The writeback pool
     is already converted in block tree and there are a couple others
     are likely to follow including btrfs io workers.

   - WQ_UNBOUND's ability to bind to multiple worker pools is also used
     to make it NUMA-aware.  Because there's no association between work
     item issuer and the specific worker assigned to execute it, before
     this change, using unbound workqueue led to unnecessary cross-node
     bouncing and it couldn't be helped by autonuma as it requires tasks
     to have implicit node affinity and workers are assigned randomly.

     After these changes, an unbound workqueue now binds to multiple
     NUMA-affine worker pools so that queued work items are executed in
     the same node.  This is turned on by default but can be disabled
     system-wide or for individual workqueues.

     Crypto was requesting NUMA affinity as encrypting data across
     different nodes can contribute noticeable overhead and doing it
     per-cpu was too limiting for certain cases and IO throughput could
     be bottlenecked by one CPU being fully occupied while others have
     idle cycles.

  While the new features required a lot of changes including
  restructuring locking, it didn't complicate the execution paths much.
  The unbound workqueue handling is now closer to per-cpu ones and the
  new features are implemented by simply associating a workqueue with
  different sets of backend worker pools without changing queue,
  execution or flush paths.

  As such, even though the amount of change is very high, I feel
  relatively safe in that it isn't likely to cause subtle issues with
  basic correctness of work item execution and handling.  If something
  is wrong, it's likely to show up as being associated with worker pools
  with the wrong attributes or OOPS while workqueue attributes are being
  changed or during CPU hotplug.

  While this creates more backend worker pools, it doesn't add too many
  more workers unless, of course, there are many workqueues with unique
  combinations of attributes.  Assuming everything else is the same,
  NUMA awareness costs an extra worker pool per NUMA node with online
  CPUs.

  There are also a couple things which are being routed outside the
  workqueue tree.

   - block tree pulled in workqueue for-3.10 so that writeback worker
     pool can be converted to unbound workqueue with sysfs control
     exposed.  This simplifies the code, makes writeback workers
     NUMA-aware and allows tuning nice level and CPU affinity via sysfs.

   - The conversion to workqueue means that there's no 1:1 association
     between a specific worker, which makes writeback folks unhappy as
     they want to be able to tell which filesystem caused a problem from
     backtrace on systems with many filesystems mounted.  This is
     resolved by allowing work items to set debug info string which is
     printed when the task is dumped.  As this change involves unifying
     implementations of dump_stack() and friends in arch codes, it's
     being routed through Andrew's -mm tree."

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
  workqueue: use kmem_cache_free() instead of kfree()
  workqueue: avoid false negative WARN_ON() in destroy_workqueue()
  workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
  workqueue: implement NUMA affinity for unbound workqueues
  workqueue: introduce put_pwq_unlocked()
  workqueue: introduce numa_pwq_tbl_install()
  workqueue: use NUMA-aware allocation for pool_workqueues
  workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
  workqueue: map an unbound workqueues to multiple per-node pool_workqueues
  workqueue: move hot fields of workqueue_struct to the end
  workqueue: make workqueue->name[] fixed len
  workqueue: add workqueue->unbound_attrs
  workqueue: determine NUMA node of workers accourding to the allowed cpumask
  workqueue: drop 'H' from kworker names of unbound worker pools
  workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
  workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
  workqueue: fix memory leak in apply_workqueue_attrs()
  workqueue: fix unbound workqueue attrs hashing / comparison
  workqueue: fix race condition in unbound workqueue free path
  workqueue: remove pwq_lock which is no longer used
  ...

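
Editor's sketch (not part of the commit): the message says that backend worker pools are keyed by their attributes and shared by every workqueue that asks for the same attributes. The userspace C model below illustrates only that lookup-or-create idea. The names `struct pool_attrs`, `struct pool` and `get_pool()` are hypothetical, the linear search stands in for whatever keyed lookup the kernel really uses, and a single `unsigned long` stands in for a cpumask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's structures. */
struct pool_attrs {
	int nice;		/* nice level of the pool's workers */
	unsigned long cpumask;	/* allowed CPUs, one bit per CPU (assumption) */
};

struct pool {
	struct pool_attrs attrs;
	int refcnt;		/* number of workqueues bound to this pool */
};

#define MAX_POOLS 16

static struct pool pools[MAX_POOLS];
static int nr_pools;

/*
 * Return the pool whose attributes equal @attrs, creating it on first
 * use. Workqueues with identical attributes therefore share one pool.
 * Returns NULL when the fixed-size table is full.
 */
struct pool *get_pool(const struct pool_attrs *attrs)
{
	int i;

	assert(attrs != NULL);

	for (i = 0; i < nr_pools; i++) {
		if (pools[i].attrs.nice == attrs->nice &&
		    pools[i].attrs.cpumask == attrs->cpumask) {
			pools[i].refcnt++;
			return &pools[i];
		}
	}

	if (nr_pools == MAX_POOLS)
		return NULL;

	pools[nr_pools].attrs = *attrs;
	pools[nr_pools].refcnt = 1;
	return &pools[nr_pools++];
}
```

In this model, two callers passing equal attributes get the same `struct pool` pointer, while changing either the nice level or the CPU mask yields a separate pool. That mirrors the cost estimate in the message: extra pools appear only for unique attribute combinations.
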
Diffstat (limited to 'drivers')
-rw-r--r-- drivers/base/base.h |  2
-rw-r--r-- drivers/base/bus.c  | 73
-rw-r--r-- drivers/base/core.c |  2
3 files changed, 55 insertions, 22 deletions

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 6ee17bb391a9..b8bdfe61daa6 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -101,6 +101,8 @@ static inline int hypervisor_init(void) { return 0; }
 extern int platform_bus_init(void);
 extern void cpu_dev_init(void);
 
+struct kobject *virtual_device_parent(struct device *dev);
+
 extern int bus_add_device(struct device *dev);
 extern void bus_probe_device(struct device *dev);
 extern void bus_remove_device(struct device *dev);
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 8a00dec574d6..1a68f947ded8 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -1205,26 +1205,10 @@ static void system_root_device_release(struct device *dev)
 {
 	kfree(dev);
 }
-/**
- * subsys_system_register - register a subsystem at /sys/devices/system/
- * @subsys: system subsystem
- * @groups: default attributes for the root device
- *
- * All 'system' subsystems have a /sys/devices/system/<name> root device
- * with the name of the subsystem. The root device can carry subsystem-
- * wide attributes. All registered devices are below this single root
- * device and are named after the subsystem with a simple enumeration
- * number appended. The registered devices are not explicitely named;
- * only 'id' in the device needs to be set.
- *
- * Do not use this interface for anything new, it exists for compatibility
- * with bad ideas only. New subsystems should use plain subsystems; and
- * add the subsystem-wide attributes should be added to the subsystem
- * directory itself and not some create fake root-device placed in
- * /sys/devices/system/<name>.
- */
-int subsys_system_register(struct bus_type *subsys,
-			   const struct attribute_group **groups)
+
+static int subsys_register(struct bus_type *subsys,
+			   const struct attribute_group **groups,
+			   struct kobject *parent_of_root)
 {
 	struct device *dev;
 	int err;
@@ -1243,7 +1227,7 @@ int subsys_system_register(struct bus_type *subsys,
 	if (err < 0)
 		goto err_name;
 
-	dev->kobj.parent = &system_kset->kobj;
+	dev->kobj.parent = parent_of_root;
 	dev->groups = groups;
 	dev->release = system_root_device_release;
 
@@ -1263,8 +1247,55 @@ err_dev:
 	bus_unregister(subsys);
 	return err;
 }
+
+/**
+ * subsys_system_register - register a subsystem at /sys/devices/system/
+ * @subsys: system subsystem
+ * @groups: default attributes for the root device
+ *
+ * All 'system' subsystems have a /sys/devices/system/<name> root device
+ * with the name of the subsystem. The root device can carry subsystem-
+ * wide attributes. All registered devices are below this single root
+ * device and are named after the subsystem with a simple enumeration
+ * number appended. The registered devices are not explicitely named;
+ * only 'id' in the device needs to be set.
+ *
+ * Do not use this interface for anything new, it exists for compatibility
+ * with bad ideas only. New subsystems should use plain subsystems; and
+ * add the subsystem-wide attributes should be added to the subsystem
+ * directory itself and not some create fake root-device placed in
+ * /sys/devices/system/<name>.
+ */
+int subsys_system_register(struct bus_type *subsys,
+			   const struct attribute_group **groups)
+{
+	return subsys_register(subsys, groups, &system_kset->kobj);
+}
 EXPORT_SYMBOL_GPL(subsys_system_register);
 
+/**
+ * subsys_virtual_register - register a subsystem at /sys/devices/virtual/
+ * @subsys: virtual subsystem
+ * @groups: default attributes for the root device
+ *
+ * All 'virtual' subsystems have a /sys/devices/system/<name> root device
+ * with the name of the subystem. The root device can carry subsystem-wide
+ * attributes. All registered devices are below this single root device.
+ * There's no restriction on device naming. This is for kernel software
+ * constructs which need sysfs interface.
+ */
+int subsys_virtual_register(struct bus_type *subsys,
+			    const struct attribute_group **groups)
+{
+	struct kobject *virtual_dir;
+
+	virtual_dir = virtual_device_parent(NULL);
+	if (!virtual_dir)
+		return -ENOMEM;
+
+	return subsys_register(subsys, groups, virtual_dir);
+}
+
 int __init buses_init(void)
 {
 	bus_kset = kset_create_and_add("bus", &bus_uevent_ops, NULL);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index f88d9e259a32..016312437577 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -703,7 +703,7 @@ void device_initialize(struct device *dev)
 	set_dev_node(dev, -1);
 }
 
-static struct kobject *virtual_device_parent(struct device *dev)
+struct kobject *virtual_device_parent(struct device *dev)
 {
 	static struct kobject *virtual_dir = NULL;
 
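
Editor's sketch (not part of the commit): the bus.c change moves the body of subsys_system_register() into a shared helper that takes the parent kobject of the root device, then adds a second thin wrapper for /sys/devices/virtual/. The userspace C model below shows only that shape, using path strings in place of kobjects. All `model_*` names are hypothetical, and the missing-parent check is folded into the helper here for brevity, whereas in the diff it sits in subsys_virtual_register().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace model only: computes where a subsystem's root device would
 * appear. Strings stand in for kobjects; nothing here is kernel API.
 */
int model_subsys_register(const char *name, const char *parent_dir,
			  char *buf, size_t len)
{
	int n;

	assert(name != NULL && buf != NULL && len > 0);

	if (!parent_dir)
		return -1;	/* models the error path when no parent exists */

	n = snprintf(buf, len, "%s/%s", parent_dir, name);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* path did not fit in the caller's buffer */
	return 0;
}

/* Thin wrapper: root device under the 'system' directory. */
int model_system_register(const char *name, char *buf, size_t len)
{
	return model_subsys_register(name, "/sys/devices/system", buf, len);
}

/* Thin wrapper: root device under the 'virtual' directory. */
int model_virtual_register(const char *name, char *buf, size_t len)
{
	return model_subsys_register(name, "/sys/devices/virtual", buf, len);
}
```

The only difference between the two wrappers is the parent they pass down, which is the design choice the diff makes by replacing the hard-coded `&system_kset->kobj` with the `parent_of_root` parameter.
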