aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
...
| * | | | | | | | | | | | | | | | cpuset: introduce ->css_on/offline()Tejun Heo2013-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add cpuset_css_on/offline() and rearrange css init/exit such that, * Allocation and clearing to the default values happen in css_alloc(). Allocation now uses kzalloc(). * Config inheritance and registration happen in css_online(). * css_offline() undoes what css_online() did. * css_free() frees. This doesn't introduce any visible behavior changes. This will help cleaning up locking. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
| * | | | | | | | | | | | | | | | cpuset: remove fast exit path from remove_tasks_in_empty_cpuset()Tejun Heo2013-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function isn't that hot, the overhead of missing the fast exit is low, the test itself depends heavily on cgroup internals, and it's gonna be a hindrance when trying to decouple cpuset locking from cgroup core. Remove the fast exit path. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
| * | | | | | | | | | | | | | | | cpuset: remove unused cpuset_unlock()Tejun Heo2013-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | | | | | | | | | | | | | | | | Merge branch 'for-3.9' of ↵Linus Torvalds2013-02-20
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | |_|_|_|_|_|_|_|_|_|_|_|_|_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "Nothing too drastic. - Removal of synchronize_rcu() from userland visible paths. - Various fixes and cleanups from Li. - cgroup_rightmost_descendant() added which will be used by cpuset changes (it will be a separate pull request)." * 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fail if monitored file and event_control are in different cgroup cgroup: fix cgroup_rmdir() vs close(eventfd) race cpuset: fix cpuset_print_task_mems_allowed() vs rename() race cgroup: fix exit() vs rmdir() race cgroup: remove bogus comments in cgroup_diput() cgroup: remove synchronize_rcu() from cgroup_diput() cgroup: remove duplicate RCU free on struct cgroup sched: remove redundant NULL cgroup check in task_group_path() sched: split out css_online/css_offline from tg creation/destruction cgroup: initialize cgrp->dentry before css_alloc() cgroup: remove a NULL check in cgroup_exit() cgroup: fix bogus kernel warnings when cgroup_create() failed cgroup: remove synchronize_rcu() from rebind_subsystems() cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}() cgroup: use new hashtable implementation cgroups: fix cgroup_event_listener error handling cgroups: move cgroup_event_listener.c to tools/cgroup cgroup: implement cgroup_rightmost_descendant() cgroup: remove unused dummy cgroup_fork_callbacks()
| * | | | | | | | | | | | | | | | cgroup: fail if monitored file and event_control are in different cgroupLi Zefan2013-02-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we pass fd of memory.usage_in_bytes of cgroup A to cgroup.event_control of cgroup B, then we won't get memory usage notification from A but B! What's worse, if A and B are in different mount hierarchy, we'll end up accessing NULL pointer! Disallow this kind of invalid usage. Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | cgroup: fix cgroup_rmdir() vs close(eventfd) raceLi Zefan2013-02-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep warning for event_control") solved a deadlock by introducing a new bug. Move cgrp->event_list to a temporary list doesn't mean you can traverse this list locklessly, because at the same time cgroup_event_wake() can be called and remove the event from the list. The result of this race is disastrous. We adopt the way how kvm irqfd code implements race-free event removal, which is now described in the comments in cgroup_event_wake(). v3: - call eventfd_signal() no matter it's eventfd close or cgroup removal that removes the cgroup event. Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | cpuset: fix cpuset_print_task_mems_allowed() vs rename() raceLi Zefan2013-02-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rename() will change dentry->d_name. The result of this race can be worse than seeing partially rewritten name, but we might access a stale pointer because rename() will re-allocate memory to hold a longer name. It's safe in the protection of dentry->d_lock. v2: check NULL dentry before acquiring dentry lock. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
| * | | | | | | | | | | | | | | | cgroup: fix exit() vs rmdir() raceLi Zefan2013-02-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cgroup_exit() put_css_set_taskexit() is called without any lock, which might lead to accessing a freed cgroup: thread1 thread2 --------------------------------------------- exit() cgroup_exit() put_css_set_taskexit() atomic_dec(cgrp->count); rmdir(); /* not safe !! */ check_for_release(cgrp); rcu_read_lock() can be used to make sure the cgroup is alive. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
| * | | | | | | | | | | | | | | | cgroup: remove bogus comments in cgroup_diput()Li Zefan2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 48ddbe194623ae089cc0576e60363f2d2e85662a ("cgroup: make css->refcnt clearing on cgroup removal optional"), each css holds a ref on cgroup's dentry, so cgroup_diput() won't be called until all css' refs go down to 0, which invalids the comments. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | cgroup: remove synchronize_rcu() from cgroup_diput()Li Zefan2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Free cgroup via call_rcu(). The actual work is done through workqueue. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | cgroup: remove duplicate RCU free on struct cgroupLi Zefan2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When destroying a cgroup, though in cgroup_diput() we've called synchronize_rcu(), we then still have to free it via call_rcu(). The story is, long ago to fix a race between reading /proc/sched_debug and freeing cgroup, the code was changed to utilize call_rcu(). See commit a47295e6bc42ad35f9c15ac66f598aa24debd4e2 ("cgroups: make cgroup_path() RCU-safe") As we've fixed cpu cgroup that cpu_cgroup_offline_css() is used to unregister a task_group so there won't be concurrent access to this task_group after synchronize_rcu() in diput(). Now we can just kfree(cgrp). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | sched: remove redundant NULL cgroup check in task_group_path()Li Zefan2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A task_group won't be online (thus no one can see it) until cpu_cgroup_css_online(), and at that time tg->css.cgroup has been initialized, so this NULL check is redundant. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | sched: split out css_online/css_offline from tg creation/destructionLi Zefan2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a preparaton for later patches. - What do we gain from cpu_cgroup_css_online(): After ss->css_alloc() and before ss->css_online(), there's a small window that tg->css.cgroup is NULL. With this change, tg won't be seen before ss->css_online(), where it's added to the global list, so we're guaranteed we'll never see NULL tg->css.cgroup. - What do we gain from cpu_cgroup_css_offline(): tg is freed via RCU, so is cgroup. Without this change, This is how synchronization works: cgroup_rmdir() no ss->css_offline() diput() syncornize_rcu() ss->css_free() <-- unregister tg, and free it via call_rcu() kfree_rcu(cgroup) <-- wait possible refs to cgroup, and free cgroup We can't just kfree(cgroup), because tg might access tg->css.cgroup. With this change: cgroup_rmdir() ss->css_offline() <-- unregister tg diput() synchronize_rcu() <-- wait possible refs to tg and cgroup ss->css_free() <-- free tg kfree_rcu(cgroup) <-- free cgroup As you see, kfree_rcu() is redundant now. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | | | | | | | | | cgroup: initialize cgrp->dentry before css_alloc()Li Zefan2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this change, we're guaranteed that cgroup_path() won't see NULL cgrp->dentry, and thus we can remove the NULL check in it. (Well, it's not strictly true, because dummptop.dentry is always NULL but we already handle that separately.) Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | cgroup: remove a NULL check in cgroup_exit()Li Zefan2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | init_task.cgroups is initialized at boot phase, and whenver a ask is forked, it's cgroups pointer is inherited from its parent, and it's never set to NULL afterwards. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | cgroup: fix bogus kernel warnings when cgroup_create() failedLi Zefan2013-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If cgroup_create() failed and cgroup_destroy_locked() is called to do cleanup, we'll see a bunch of warnings: cgroup_addrm_files: failed to remove 2MB.limit_in_bytes, err=-2 cgroup_addrm_files: failed to remove 2MB.usage_in_bytes, err=-2 cgroup_addrm_files: failed to remove 2MB.max_usage_in_bytes, err=-2 cgroup_addrm_files: failed to remove 2MB.failcnt, err=-2 cgroup_addrm_files: failed to remove prioidx, err=-2 cgroup_addrm_files: failed to remove ifpriomap, err=-2 ... We failed to remove those files, because cgroup_create() has failed before creating those cgroup files. To fix this, we simply don't warn if cgroup_rm_file() can't find the cft entry. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | cgroup: remove synchronize_rcu() from rebind_subsystems()Li Zefan2013-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nothing's protected by RCU in rebind_subsystems(), and I can't think of a reason why it is needed. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}()Li Zefan2013-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These 2 syncronize_rcu()s make attaching a task to a cgroup quite slow, and it can't be ignored in some situations. A real case from Colin Cross: Android uses cgroups heavily to manage thread priorities, putting threads in a background group with reduced cpu.shares when they are not visible to the user, and in a foreground group when they are. Some RPCs from foreground threads to background threads will temporarily move the background thread into the foreground group for the duration of the RPC. This results in many calls to cgroup_attach_task. In cgroup_attach_task() it's task->cgroups that is protected by RCU, and put_css_set() calls kfree_rcu() to free it. If we remove this synchronize_rcu(), there can be threads in RCU-read sections accessing their old cgroup via current->cgroups with concurrent rmdir operation, but this is safe. # time for ((i=0; i<50; i++)) { echo $$ > /mnt/sub/tasks; echo $$ > /mnt/tasks; } real 0m2.524s user 0m0.008s sys 0m0.004s With this patch: real 0m0.004s user 0m0.004s sys 0m0.000s tj: These synchronize_rcu()s are utterly confused. synchornize_rcu() necessarily has to come between two operations to guarantee that the changes made by the former operation are visible to all rcu readers before proceeding to the latter operation. Here, synchornize_rcu() are at the end of attach operations with nothing beyond it. Its only effect would be delaying completion of write(2) to sysfs tasks/procs files until all rcu readers see the change, which doesn't mean anything. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Colin Cross <ccross@google.com>
| * | | | | | | | | | | | | | | | cgroup: use new hashtable implementationLi Zefan2013-01-10
| |/ / / / / / / / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch cgroup to use the new hashtable implementation. No functional changes. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | / / / / / / / / / / / / cgroup: implement cgroup_rightmost_descendant()Tejun Heo2013-01-07
| | |_|/ / / / / / / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement cgroup_rightmost_descendant() which returns the right most descendant of the specified cgroup. This can be used to skip the cgroup's subtree while iterating with cgroup_for_each_descendant_pre(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizefan@huawei.com>
* | | | | | | | | | | | | | | Merge branch 'for-3.9-async' of ↵Linus Torvalds2013-02-20
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull async changes from Tejun Heo: "These are followups for the earlier deadlock issue involving async ending up waiting for itself through block requesting module[1]. The following changes are made by these commits. - Instead of requesting default elevator on each request_queue init, block now requests it once early during boot. - Kmod triggers warning if invoked from an async worker. - Async synchronization implementation has been reimplemented. It's a lot simpler now." * 'for-3.9-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: async: initialise list heads to fix crash async: replace list of active domains with global list of pending items async: keep pending tasks on async_domain and remove async_pending async: use ULLONG_MAX for infinity cookie value async: bring sanity to the use of words domain and running async, kmod: warn on synchronous request_module() from async workers block: don't request module during elevator init init, block: try to load default elevator module early during boot
| * | | | | | | | | | | | | | | async: initialise list heads to fix crashJames Hogan2013-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 9fdb04cdc55 ("async: replace list of active domains with global list of pending items") added a struct list_head global_list in struct async_entry, which isn't initialised. This means that if !domain->registered at __async_schedule(), then list_del_init() will be called on the list head in async_run_entry_fn with both pointers NULL, causing a crash. This is fixed by initialising both the global_list and domain_list list_heads after kzalloc'ing the entry. This was noticed due to dapm_power_widgets() which uses ASYNC_DOMAIN_EXCLUSIVE, which initialises the domain->registered to 0. Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: James Hogan <james.hogan@imgtec.com> Reported-by: Stephen Warren <swarren@wwwdotorg.org>
| * | | | | | | | | | | | | | | async: replace list of active domains with global list of pending itemsTejun Heo2013-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Global synchronization - async_synchronize_full() - is currently implemented by keeping a list of all active registered domains and syncing them one by one until no domain is active. While this isn't necessarily a complex scheme, it can easily be simplified by keeping global list of the pending items of all registered active domains instead of list of domains and simply using the globl pending list the same way as domain syncing. This patch replaces async_domains with async_global_pending and update lowest_in_progress() to use the global pending list if @domain is %NULL. async_synchronize_full_domain(NULL) is now allowed and equivalent to async_synchronize_full(). As no one is calling with NULL domain, this doesn't affect any existing users. async_register_mutex is no longer necessary and dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <djbw@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | | async: keep pending tasks on async_domain and remove async_pendingTejun Heo2013-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Async kept single global pending list and per-domain running lists. When an async item is queued, it's put on the global pending list. The item is moved to the per-domain running list when its execution starts. At this point, this design complicates execution and synchronization without bringing any benefit. The list only matters for synchronization which doesn't care whether a given async item is pending or executing. Also, global synchronization is done by iterating through all active registered async_domains, so the global async_pending list doesn't help anything either. Rename async_domain->running to async_domain->pending and put async items directly there and remove when execution completes. This simplifies lowest_in_progress() a lot - the first item on the pending list is the one with the lowest cookie, and async_run_entry_fn() doesn't have to mess with moving the item from pending to running. After the change, whether a domain is empty or not can be trivially determined by looking at async_domain->pending. Remove async_domain->count and use list_empty() on pending instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <djbw@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | | async: use ULLONG_MAX for infinity cookie valueTejun Heo2013-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, next_cookie is used as the infinity value. In most cases, this should work fine but it theoretically could bring subtle behavior difference between async_synchronize_full() and async_synchronize_full_domain(). async_synchronize_full() keeps waiting until there's no registered async_entry left regardless of what next_cookie was when the function was called. It guarantees that the queue is completely drained at least once before returning. However, async_synchronize_full_domain() doesn't. It synchronizes upto next_cookie and if further async jobs are queued after the next_cookie value to synchronize is decided, they won't be waited for. For unrelated async jobs, the behavior difference doesn't matter; however, if async jobs which are related (nested or otherwise) to the executing ones are queued while sychronization is in progress, the resulting behavior difference could be problematic. This can be easily fixed by using ULLONG_MAX as the infinity value instead. Define ASYNC_COOKIE_MAX as ULLONG_MAX and use it as the infinity value for synchronization. This makes async_synchronize_full_domain() fully drain the domain at least once before returning, making its behavior match async_synchronize_full(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <djbw@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | | async: bring sanity to the use of words domain and runningTejun Heo2013-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the beginning, running lists were literal struct list_heads. Later on, struct async_domain was added. For some reason, while the conversion substituted list_heads with async_domains, the variable names weren't fully converted. In more places, "running" was used for struct async_domain while other places adopted new "domain" name. The situation is made much worse by having async_domain's running list named "domain" and async_entry's field pointing to async_domain named "running". So, we end up with mix of "running" and "domain" for variable names for async_domain, with the field names of async_domain and async_entry swapped between "running" and "domain". It feels almost intentionally made to be as confusing as possible. Bring some sanity by * Renaming all async_domain variables "domain". * s/async_running/async_dfl_domain/ * s/async_domain->domain/async_domain->running/ * s/async_entry->running/async_entry->domain/ Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <djbw@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | | Merge branch 'master' into for-3.9-asyncTejun Heo2013-01-23
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | |_|/ / / / / / / / / / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To receive f56c3196f251012de9b3ebaff55732a9074fdaae ("async: fix __lowest_in_progress()"). Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | async, kmod: warn on synchronous request_module() from async workersTejun Heo2013-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Synchronous requet_module() from an async worker can lead to deadlock because module init path may invoke async_synchronize_full(). The async worker waits for request_module() to complete and the module loading waits for the async task to finish. This bug happened in the block layer because of default elevator auto-loading. Block layer has been updated not to do default elevator auto-loading and it has been decided to disallow synchronous request_module() from async workers. Trigger WARN_ON_ONCE() on synchronous request_module() from async workers. For more details, please refer to the following thread. http://thread.gmane.org/gmane.linux.kernel/1420814 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alex Riesen <raa.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com>
* | | | | | | | | | | | | | | | Merge branch 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2013-02-20
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull workqueue changes from Tejun Heo: "A lot of reorganization is going on mostly to prepare for worker pools with custom attributes so that workqueue can replace custom pool implementations in places including writeback and btrfs and make CPU assignment in crypto more flexible. workqueue evolved from purely per-cpu design and implementation, so there are a lot of assumptions regarding being bound to CPUs and even unbound workqueues are implemented as an extension of the model - workqueues running on the special unbound CPU. Bulk of changes this round are about promoting worker_pools as the top level abstraction replacing global_cwq (global cpu workqueue). At this point, I'm fairly confident about getting custom worker pools working pretty soon and ready for the next merge window. Lai's patches are replacing the convoluted mb() dancing workqueue has been doing with much simpler mechanism which only depends on assignment atomicity of long. For details, please read the commit message of 0b3dae68ac ("workqueue: simplify is-work-item-queued-here test"). While the change ends up adding one pointer to struct delayed_work, the inflation in percentage is less than five percent and it decouples delayed_work logic a lot more cleaner from usual work handling, removes the unusual memory barrier dancing, and allows for further simplification, so I think the trade-off is acceptable. There will be two more workqueue related pull requests and there are some shared commits among them. I'll write further pull requests assuming this pull request is pulled first." * 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (37 commits) workqueue: un-GPL function delayed_work_timer_fn() workqueue: rename cpu_workqueue to pool_workqueue workqueue: reimplement is_chained_work() using current_wq_worker() workqueue: fix is_chained_work() regression workqueue: pick cwq instead of pool in __queue_work() workqueue: make get_work_pool_id() cheaper workqueue: move nr_running into worker_pool workqueue: cosmetic update in try_to_grab_pending() workqueue: simplify is-work-item-queued-here test workqueue: make work->data point to pool after try_to_grab_pending() workqueue: add delayed_work->wq to simplify reentrancy handling workqueue: make work_busy() test WORK_STRUCT_PENDING first workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END workqueue: post global_cwq removal cleanups workqueue: rename nr_running variables workqueue: remove global_cwq workqueue: remove worker_pool->gcwq workqueue: replace for_each_worker_pool() with for_each_std_worker_pool() workqueue: make freezing/thawing per-pool workqueue: make hotplug processing per-pool ...
| * | | | | | | | | | | | | | | | workqueue: un-GPL function delayed_work_timer_fn()Konstantin Khlebnikov2013-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d8e794dfd51c368ed3f686b7f4172830b60ae47b ("workqueue: set delayed_work->timer function on initialization") exports function delayed_work_timer_fn() only for GPL modules. This makes delayed-works unusable for non-GPL modules, because initialization macro now requires GPL symbol. For example schedule_delayed_work() available for non-GPL. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # 3.7
| * | | | | | | | | | | | | | | | workqueue: rename cpu_workqueue to pool_workqueueTejun Heo2013-02-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | workqueue has moved away from global_cwqs to worker_pools and with the scheduled custom worker pools, wforkqueues will be associated with pools which don't have anything to do with CPUs. The workqueue code went through significant amount of changes recently and mass renaming isn't likely to hurt much additionally. Let's replace 'cpu' with 'pool' so that it reflects the current design. * s/struct cpu_workqueue_struct/struct pool_workqueue/ * s/cpu_wq/pool_wq/ * s/cwq/pwq/ This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: reimplement is_chained_work() using current_wq_worker()Tejun Heo2013-02-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | is_chained_work() was added before current_wq_worker() and implemented its own ham-fisted way of finding out whether %current is a workqueue worker - it iterates through all possible workers. Drop the custom implementation and reimplement using current_wq_worker(). Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: fix is_chained_work() regressionTejun Heo2013-02-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | c9e7cf273f ("workqueue: move busy_hash from global_cwq to worker_pool") incorrectly converted is_chained_work() to use get_gcwq() inside for_each_gcwq_cpu() while removing get_gcwq(). As cwq might not exist for all possible workqueue CPUs, @cwq can be NULL and the following cwq deferences can lead to oops. Fix it by using for_each_cwq_cpu() instead, which is the better one to use anyway as we only need to check pools that the wq is associated with. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: pick cwq instead of pool in __queue_work()Lai Jiangshan2013-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, __queue_work() chooses the pool to queue a work item to and then determines cwq from the target wq and the chosen pool. This is a bit backwards in that we can determine cwq first and simply use cwq->pool. This way, we can skip get_std_worker_pool() in queueing path which will be a hurdle when implementing custom worker pools. Update __queue_work() such that it chooses the target cwq and then use cwq->pool instead of the other way around. While at it, add missing {} in an if statement. This patch doesn't introduce any functional changes. tj: The original patch had two get_cwq() calls - the first to determine the pool by doing get_cwq(cpu, wq)->pool and the second to determine the matching cwq from get_cwq(pool->cpu, wq). Updated the function such that it chooses cwq instead of pool and removed the second call. Rewrote the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: make get_work_pool_id() cheaperLai Jiangshan2013-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | get_work_pool_id() currently first obtains pool using get_work_pool() and then return pool->id. For an off-queue work item, this involves obtaining pool ID from worker->data, performing idr_find() to find the matching pool and then returning its pool->id which of course is the same as the one which went into idr_find(). Just open code WORK_STRUCT_CWQ case and directly return pool ID from work->data. tj: The original patch dropped on-queue work item handling and renamed the function to offq_work_pool_id(). There isn't much benefit in doing so. Handling it only requires a single if() and we need at least BUG_ON(), which is also a branch, even if we drop on-queue handling. Open code WORK_STRUCT_CWQ case and keep the function in line with get_work_pool(). Rewrote the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: move nr_running into worker_poolTejun Heo2013-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As nr_running is likely to be accessed from other CPUs during try_to_wake_up(), it was kept outside worker_pool; however, while less frequent, other fields in worker_pool are accessed from other CPUs for, e.g., non-reentrancy check. Also, with recent pool related changes, accessing nr_running matching the worker_pool isn't as simple as it used to be. Move nr_running inside worker_pool. Keep it aligned to cacheline and define CPU pools using DEFINE_PER_CPU_SHARED_ALIGNED(). This should give at least the same cacheline behavior. get_pool_nr_running() is replaced with direct pool->nr_running accesses. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Joonsoo Kim <js1304@gmail.com>
| * | | | | | | | | | | | | | | | workqueue: cosmetic update in try_to_grab_pending()Tejun Heo2013-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the recent is-work-queued-here test simplification, the nested if() in try_to_grab_pending() can be collapsed. Collapse it. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
| * | | | | | | | | | | | | | | | workqueue: simplify is-work-item-queued-here testLai Jiangshan2013-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, determining whether a work item is queued on a locked pool involves somewhat convoluted memory barrier dancing. It goes like the following. * When a work item is queued on a pool, work->data is updated before work->entry is linked to the pending list with a wmb() inbetween. * When trying to determine whether a work item is currently queued on a pool pointed to by work->data, it locks the pool and looks at work->entry. If work->entry is linked, we then do rmb() and then check whether work->data points to the current pool. This works because, work->data can only point to a pool if it currently is or were on the pool and, * If it currently is on the pool, the tests would obviously succeed. * It it left the pool, its work->entry was cleared under pool->lock, so if we're seeing non-empty work->entry, it has to be from the work item being linked on another pool. Because work->data is updated before work->entry is linked with wmb() inbetween, work->data update from another pool is guaranteed to be visible if we do rmb() after seeing non-empty work->entry. So, we either see empty work->entry or we see updated work->data pointin to another pool. While this works, it's convoluted, to put it mildly. With recent updates, it's now guaranteed that work->data points to cwq only while the work item is queued and that updating work->data to point to cwq or back to pool is done under pool->lock, so we can simply test whether work->data points to cwq which is associated with the currently locked pool instead of the convoluted memory barrier dancing. This patch replaces the memory barrier based "are you still here, really?" test with much simpler "does work->data points to me?" test - if work->data points to a cwq which is associated with the currently locked pool, the work item is guaranteed to be queued on the pool as work->data can start and stop pointing to such cwq only under pool->lock and the start and stop coincide with queue and dequeue. tj: Rewrote the comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: make work->data point to pool after try_to_grab_pending()Lai Jiangshan2013-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We plan to use work->data pointing to cwq as the synchronization invariant when determining whether a given work item is on a locked pool or not, which requires work->data pointing to cwq only while the work item is queued on the associated pool. With delayed_work updated not to overload work->data for target workqueue recording, the only case where we still have off-queue work->data pointing to cwq is try_to_grab_pending() which doesn't update work->data after stealing a queued work item. There's no reason for try_to_grab_pending() to not update work->data to point to the pool instead of cwq, like the normal execution does. This patch adds set_work_pool_and_keep_pending() which makes work->data point to pool instead of cwq but keeps the pending bit unlike set_work_pool_and_clear_pending() (surprise!). After this patch, it's guaranteed that only queued work items point to cwqs. This patch doesn't introduce any visible behavior change. tj: Renamed the new helper function to match set_work_pool_and_clear_pending() and rewrote the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: add delayed_work->wq to simplify reentrancy handlingLai Jiangshan2013-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid executing the same work item from multiple CPUs concurrently, a work_struct records the last pool it was on in its ->data so that, on the next queueing, the pool can be queried to determine whether the work item is still executing or not. A delayed_work goes through timer before actually being queued on the target workqueue and the timer needs to know the target workqueue and CPU. This is currently achieved by modifying delayed_work->work.data such that it points to the cwq which points to the target workqueue and the last CPU the work item was on. __queue_delayed_work() extracts the last CPU from delayed_work->work.data and then combines it with the target workqueue to create new work.data. The only thing this rather ugly hack achieves is encoding the target workqueue into delayed_work->work.data without using a separate field, which could be a trade off one can make; unfortunately, this entangles work->data management between regular workqueue and delayed_work code by setting cwq pointer before the work item is actually queued and becomes a hindrance for further improvements of work->data handling. This can be easily made sane by adding a target workqueue field to delayed_work. While delayed_work is used widely in the kernel and this does make it a bit larger (<5%), I think this is the right trade-off especially given the prospect of much saner handling of work->data which currently involves quite tricky memory barrier dancing, and don't expect to see any measureable effect. Add delayed_work->wq and drop the delayed_work->work.data overloading. tj: Rewrote the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: make work_busy() test WORK_STRUCT_PENDING firstLai Jiangshan2013-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, work_busy() first tests whether the work has a pool associated with it and if not, considers it idle. This works fine even for delayed_work.work queued on timer, as __queue_delayed_work() sets cwq on delayed_work.work - a queued delayed_work always has its cwq and thus pool associated with it. However, we're about to update delayed_work queueing and this won't hold. Update work_busy() such that it tests WORK_STRUCT_PENDING before the associated pool. This doesn't make any noticeable behavior difference now. With work_pending() test moved, the function read a lot better with "if (!pool)" test flipped to positive. Flip it. While at it, lose the comment about now non-existent reentrant workqueues. tj: Reorganized the function and rewrote the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_ENDLai Jiangshan2013-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that workqueue has moved away from gcwqs, workqueue no longer has the need to have a CPU identifier indicating "no cpu associated" - we now use WORK_OFFQ_POOL_NONE instead - and most uses of WORK_CPU_NONE are gone. The only left usage is as the end marker for for_each_*wq*() iterators, where the name WORK_CPU_NONE is confusing w/o actual WORK_CPU_NONE usages. Similarly, WORK_CPU_LAST which equals WORK_CPU_NONE no longer makes sense. Replace both WORK_CPU_NONE and LAST with WORK_CPU_END. This patch doesn't introduce any functional difference. tj: s/WORK_CPU_LAST/WORK_CPU_END/ and rewrote the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | | | | | | workqueue: post global_cwq removal cleanupsTejun Heo2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove remaining references to gcwq. * __next_gcwq_cpu() steals __next_wq_cpu() name. The original __next_wq_cpu() became __next_cwq_cpu(). * s/for_each_gcwq_cpu/for_each_wq_cpu/ s/for_each_online_gcwq_cpu/for_each_online_wq_cpu/ * s/gcwq_mayday_timeout/pool_mayday_timeout/ * s/gcwq_unbind_fn/wq_unbind_fn/ * Drop references to gcwq in comments. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
| * | | | | | | | | | | | | | | | workqueue: rename nr_running variablesTejun Heo2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename per-cpu and unbound nr_running variables such that they match the pool variables. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
| * | | | | | | | | | | | | | | | workqueue: remove global_cwqTejun Heo2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | global_cwq is now nothing but a container for per-cpu standard worker_pools. Declare the worker pools directly as cpu/unbound_std_worker_pools[] and remove global_cwq. * ____cacheline_aligned_in_smp moved from global_cwq to worker_pool. This probably would have made sense even before this change as we want each pool to be aligned. * get_gcwq() is replaced with std_worker_pools() which returns the pointer to the standard pool array for a given CPU. * __alloc_workqueue_key() updated to use get_std_worker_pool() instead of open-coding pool determination. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. v2: Joonsoo pointed out that it'd better to align struct worker_pool rather than the array so that every pool is aligned. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Joonsoo Kim <js1304@gmail.com>
| * | | | | | | | | | | | | | | | workqueue: remove worker_pool->gcwqTejun Heo2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only remaining user of pool->gcwq is std_worker_pool_pri(). Reimplement it using get_gcwq() and remove worker_pool->gcwq. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
| * | | | | | | | | | | | | | | | workqueue: replace for_each_worker_pool() with for_each_std_worker_pool()Tejun Heo2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for_each_std_worker_pool() takes @cpu instead of @gcwq. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
| * | | | | | | | | | | | | | | | workqueue: make freezing/thawing per-poolTejun Heo2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of holding locks from both pools and then processing the pools together, make freezing/thwaing per-pool - grab locks of one pool, process it, release it and then proceed to the next pool. While this patch changes processing order across pools, order within each pool remains the same. As each pool is independent, this shouldn't break anything. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
| * | | | | | | | | | | | | | | | workqueue: make hotplug processing per-poolTejun Heo2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of holding locks from both pools and then processing the pools together, make hotplug processing per-pool - grab locks of one pool, process it, release it and then proceed to the next pool. rebind_workers() is updated to take and process @pool instead of @gcwq which results in a lot of de-indentation. gcwq_claim_assoc_and_lock() and its counterpart are replaced with in-line per-pool locking. While this patch changes processing order across pools, order within each pool remains the same. As each pool is independent, this shouldn't break anything. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
| * | | | | | | | | | | | | | | | workqueue: move global_cwq->lock to worker_poolTejun Heo2013-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move gcwq->lock to pool->lock. The conversion is mostly straight-forward. Things worth noting are * In many places, this removes the need to use gcwq completely. pool is used directly instead. get_std_worker_pool() is added to help some of these conversions. This also leaves get_work_gcwq() without any user. Removed. * In hotplug and freezer paths, the pools belonging to a CPU are often processed together. This patch makes those paths hold locks of all pools, with highpri lock nested inside, to keep the conversion straight-forward. These nested lockings will be removed by following patches. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>