aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/cgroup.h
Commit message (Collapse)AuthorAge
...
* | cgroup: make flags and subsys_masks unsigned intTejun Heo2014-05-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no reason to use atomic bitops for cgroup_subsys_state->flags, cgroup_root->flags and various subsys_masks. This patch updates those to use bitwise and/or operations instead and converts them form unsigned long to unsigned int. This makes the fields occupy (marginally) smaller space and makes it clear that they don't require atomicity. This patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | cgroup: implement cgroup.populated for the default hierarchyTejun Heo2014-04-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup users often need a way to determine when a cgroup's subhierarchy becomes empty so that it can be cleaned up. cgroup currently provides release_agent for it; unfortunately, this mechanism is riddled with issues. * It delivers events by forking and execing a userland binary specified as the release_agent. This is a long deprecated method of notification delivery. It's extremely heavy, slow and cumbersome to integrate with larger infrastructure. * There is single monitoring point at the root. There's no way to delegate management of a subtree. * The event isn't recursive. It triggers when a cgroup doesn't have any tasks or child cgroups. Events for internal nodes trigger only after all children are removed. This again makes it impossible to delegate management of a subtree. * Events are filtered from the kernel side. "notify_on_release" file is used to subscribe to or suppress release event. This is unnecessarily complicated and probably done this way because event delivery itself was expensive. This patch implements interface file "cgroup.populated" which can be used to monitor whether the cgroup's subhierarchy has tasks in it or not. Its value is 0 if there is no task in the cgroup and its descendants; otherwise, 1, and kernfs_notify() notificaiton is triggers when the value changes, which can be monitored through poll and [di]notify. This is a lot ligther and simpler and trivially allows delegating management of subhierarchy - subhierarchy monitoring can block further propgation simply by putting itself or another process in the root of the subhierarchy and monitor events that it's interested in from there without interfering with monitoring higher in the tree. v2: Patch description updated as per Serge. v3: "cgroup.subtree_populated" renamed to "cgroup.populated". The subtree_ prefix was a bit confusing because "cgroup.subtree_control" uses it to denote the tree rooted at the cgroup sans the cgroup itself while the populated state includes the cgroup itself. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Lennart Poettering <lennart@poettering.net>
* | cgroup: implement dynamic subtree controller enable/disable on the default ↵Tejun Heo2014-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hierarchy cgroup is switching away from multiple hierarchies and will use one unified default hierarchy where controllers can be dynamically enabled and disabled per subtree. The default hierarchy will serve as the unified hierarchy to which all controllers are attached and a css on the default hierarchy would need to also serve the tasks of descendant cgroups which don't have the controller enabled - ie. the tree may be collapsed from leaf towards root when viewed from specific controllers. This has been implemented through effective css in the previous patches. This patch finally implements dynamic subtree controller enable/disable on the default hierarchy via a new knob - "cgroup.subtree_control" which controls which controllers are enabled on the child cgroups. Let's assume a hierarchy like the following. root - A - B - C \ D root's "cgroup.subtree_control" determines which controllers are enabled on A. A's on B. B's on C and D. This coincides with the fact that controllers on the immediate sub-level are used to distribute the resources of the parent. In fact, it's natural to assume that resource control knobs of a child belong to its parent. Enabling a controller in "cgroup.subtree_control" declares that distribution of the respective resources of the cgroup will be controlled. Note that this means that controller enable states are shared among siblings. The default hierarchy has an extra restriction - only cgroups which don't contain any task may have controllers enabled in "cgroup.subtree_control". Combined with the other properties of the default hierarchy, this guarantees that, from the view point of controllers, tasks are only on the leaf cgroups. In other words, only leaf csses may contain tasks. This rules out situations where child cgroups compete against internal tasks of the parent, which is a competition between two different types of entities without any clear way to determine resource distribution between the two. Different controllers handle it differently and all the implemented behaviors are ambiguous, ad-hoc, cumbersome and/or just wrong. Having this structural constraints imposed from cgroup core removes the burden from controller implementations and enables showing one consistent behavior across all controllers. When a controller is enabled or disabled, css associations for the controller in the subtrees of each child should be updated. After enabling, the whole subtree of a child should point to the new css of the child. After disabling, the whole subtree of a child should point to the cgroup's css. This is implemented by first updating cgroup states such that cgroup_e_css() result points to the appropriate css and then invoking cgroup_update_dfl_csses() which migrates all tasks in the affected subtrees to the self cgroup on the default hierarchy. * When read, "cgroup.subtree_control" lists all the currently enabled controllers on the children of the cgroup. * White-space separated list of controller names prefixed with either '+' or '-' can be written to "cgroup.subtree_control". The ones prefixed with '+' are enabled on the controller and '-' disabled. * A controller can be enabled iff the parent's "cgroup.subtree_control" enables it and disabled iff no child's "cgroup.subtree_control" has it enabled. * If a cgroup has tasks, no controller can be enabled via "cgroup.subtree_control". Likewise, if "cgroup.subtree_control" has some controllers enabled, tasks can't be migrated into the cgroup. * All controllers which aren't bound on other hierarchies are automatically associated with the root cgroup of the default hierarchy. All the controllers which are bound to the default hierarchy are listed in the read-only file "cgroup.controllers" in the root directory. * "cgroup.controllers" in all non-root cgroups is read-only file whose content is equal to that of "cgroup.subtree_control" of the parent. This indicates which controllers can be used in the cgroup's "cgroup.subtree_control". This is still experimental and there are some holes, one of which is that ->can_attach() failure during cgroup_update_dfl_csses() may leave the cgroups in an undefined state. The issues will be addressed by future patches. v2: Non-root cgroups now also have "cgroup.controllers". Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | cgroup: add css_set->dfl_cgrpTejun Heo2014-04-23
| | | | | | | | | | | | | | | | | | | | To implement the unified hierarchy behavior, we'll need to be able to determine the associated cgroup on the default hierarchy from css_set. Let's add css_set->dfl_cgrp so that it can be accessed conveniently and efficiently. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | cgroup: teach css_task_iter about effective cssesTejun Heo2014-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, css_task_iter iterates tasks associated with a css by visiting each css_set associated with the owning cgroup and walking tasks of each of them. This works fine for !unified hierarchies as each cgroup has its own css for each associated subsystem on the hierarchy; however, on the planned unified hierarchy, a cgroup may not have csses associated and its tasks would be considered associated with the matching css of the nearest ancestor which has the subsystem enabled. This means that on the default unified hierarchy, just walking all tasks associated with a cgroup isn't enough to walk all tasks which are associated with the specified css. If any of its children doesn't have the matching css enabled, task iteration should also include all tasks from the subtree. We already added cgroup->e_csets[] to list all css_sets effectively associated with a given css and walk css_sets on that list instead to achieve such iteration. This patch updates css_task_iter iteration such that it walks css_sets on cgroup->e_csets[] instead of cgroup->cset_links if iteration is requested on an non-dummy css. Thanks to the previous iteration update, this change can be achieved with the addition of css_task_iter->ss and minimal updates to css_advance_task_iter() and css_task_iter_start(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | cgroup: reorganize css_task_iterTejun Heo2014-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reorganizes css_task_iter so that adding effective css support is easier. * s/->cset_link/->cset_pos/ and s/->task/->task_pos/ for consistency * ->origin_css is used to determine whether the iteration reached the last css_set. Replace it with explicit ->cset_head so that css_advance_task_iter() doesn't have to know the termination condition directly. * css_task_iter_next() currently assumes that it's walking list of cgrp_cset_link and reaches into the current cset through the current link to determine the termination conditions for task walking. As this won't always be true for effective css walking, add ->tasks_head and ->mg_tasks_head and use them to control task walking so that css_task_iter_next() doesn't have to know how css_sets are being walked. This patch doesn't make any behavior changes. The iteration logic stays unchanged after the patch. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | cgroup: implement cgroup->e_csets[]Tejun Heo2014-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the default unified hierarchy, a cgroup may be associated with csses of its ancestors, which means that a css of a given cgroup may be associated with css_sets of descendant cgroups. This means that we can't walk all tasks associated with a css by iterating the css_sets associated with the cgroup as there are css_sets which are pointing to the css but linked on the descendants. This patch adds per-subsystem list heads cgroup->e_csets[]. Any css_set which is pointing to a css is linked to css->cgroup->e_csets[$SUBSYS_ID] through css_set->e_cset_node[$SUBSYS_ID]. The lists are protected by css_set_rwsem and will allow us to walk all css_sets associated with a given css so that we can find out all associated tasks. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore ↵Tejun Heo2014-04-23
|/ | | | | | | | | | | | | | | | | | | | | | | cgroup_root->subsys_mask 944196278d3d ("cgroup: move ->subsys_mask from cgroupfs_root to cgroup") moved ->subsys_mask from cgroup_root to cgroup to prepare for the unified hierarhcy; however, it turns out that carrying the subsys_mask of the children in the parent, instead of itself, is a lot more natural. This patch restores cgroup_root->subsys_mask and morphs cgroup->subsys_mask into cgroup->child_subsys_mask. * Uses of root->cgrp.subsys_mask are restored to root->subsys_mask. * Remove automatic setting and clearing of cgrp->subsys_mask and instead just inherit ->child_subsys_mask from the parent during cgroup creation. Note that this doesn't affect any current behaviors. * Undo __kill_css() separation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: remove useless argument from cgroup_exit()Li Zefan2014-03-29
| | | | | Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: implement CFTYPE_ONLY_ON_DFLTejun Heo2014-03-19
| | | | | | | | This cftype flag makes the file only appear on the default hierarchy. This will later be used for cgroup.controllers file. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: make cgrp_dfl_root mountableTejun Heo2014-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgrp_dfl_root will be used as the default unified hierarchy. This patch makes cgrp_dfl_root mountable by making the following changes. * cgroup_init_early() now initializes cgrp_dfl_root w/ CGRP_ROOT_SANE_BEHAVIOR. The default hierarchy is always sane. * parse_cgroupfs_options() and cgroup_mount() are updated such that cgrp_dfl_root is mounted if sane_behavior is specified w/o any subsystems. * rebind_subsystems() now populates the root directory of cgrp_dfl_root. Note that the function still guarantees success of rebinding subsystems to cgrp_dfl_root. If populating fails while rebinding to cgrp_dfl_root, it whines but ignores the error. * For backward compatibility, the default hierarchy shows up in /proc/$PID/cgroup only after it's explicitly mounted so that userland which doesn't make use of it doesn't see any change. * "current_css_set_cg_links" file of debug cgroup now treats the default hierarchy the same as other hierarchies. This is visible to userland. Given that it's for debug controller, this should be fine. * While at it, implement cgroup_on_dfl() which tests whether a give cgroup is on the default hierarchy or not. The above changes make cgrp_dfl_root mostly equivalent to other controllers but the actual unified hierarchy behaviors are not implemented yet. Let's plug child cgroup creation in cgrp_dfl_root from create_cgroup() for now. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: drop const from @buffer of cftype->write_string()Tejun Heo2014-03-19
| | | | | | | | | | | | | | | | | | | cftype->write_string() just passes on the writeable buffer from kernfs and there's no reason to add const restriction on the buffer. The only thing const achieves is unnecessarily complicating parsing of the buffer. Drop const from @buffer. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* cgroup: rename cgroup_dummy_root and related namesTejun Heo2014-03-19
| | | | | | | | | | | | | | The dummy root will be repurposed to serve as the default unified hierarchy. Let's rename things in preparation. * s/cgroup_dummy_root/cgrp_dfl_root/ * s/cgroupfs_root/cgroup_root/ as we don't do fs part directly anymore * s/cgroup_root->top_cgroup/cgroup_root->cgrp/ for brevity This is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: move ->subsys_mask from cgroupfs_root to cgroupTejun Heo2014-03-19
| | | | | | | | | | | | | | | | | | | | | | cgroupfs_root->subsys_mask represents the controllers attached to the hierarchy. This patch moves the field to cgroup. Subsystem initialization and rebinding updates the top cgroup's subsys_mask. For !root cgroups, the subsys_mask bits are set from create_css() and cleared from kill_css(), which effectively means that all cgroups will have the same subsys_mask as the top cgroup. While this doesn't make any difference now, this will help implementation of the default unified hierarchy where !root cgroups may have subsets of the top_cgroup's subsys_mask. While at it, __kill_css() is split out of kill_css(). The former doesn't care about the subsys_mask while the latter becomes noop if the controller is already killed and clears the matching bit if not before proceeding to killing the css. This will be used later by the default unified hierarchy implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()Tejun Heo2014-03-19
| | | | | | | | | The dummy hierarchy is now a fully functional one and dummy_top has a kernfs_node associated with it. Drop the NULL checks in [pr_cont_]cont_{name|path}() which are no longer necessary. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: drop task_lock() protection around task->cgroupsTejun Heo2014-02-25
| | | | | | | | | | | For optimization, task_lock() is additionally used to protect task->cgroups. The optimization is pretty dubious as either css_set_rwsem is grabbed anyway or PF_EXITING already protects task->cgroups. It adds only overhead and confusion at this point. Let's drop task_[un]lock() and update comments accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: split process / task migration into four stepsTejun Heo2014-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, process / task migration is a single operation which may fail depending on memory pressure or the involved controllers' ->can_attach() callbacks. One problem with this approach is migration of multiple targets. It's impossible to tell whether a given target will be successfully migrated beforehand and cgroup core can't keep track of enough states to roll back after intermediate failure. This is already an issue with cgroup_transfer_tasks(). Also, we're gonna need multiple target migration for unified hierarchy. This patch splits migration into four stages - cgroup_migrate_add_src(), cgroup_migrate_prepare_dst(), cgroup_migrate() and cgroup_migrate_finish(), where cgroup_migrate_prepare_dst() performs all the operations which may fail due to allocation failure without actually migrating the target. The four separate stages mean that, disregarding ->can_attach() failures, the success or failure of multi target migration can be determined before performing any actual migration. If preparations of all targets succeed, the whole thing will succeed. If not, the whole operation can fail without any side-effect. Since the previous patch to use css_set->mg_tasks to keep track of migration targets, the only thing which may need memory allocation during migration is the target css_sets. cgroup_migrate_prepare() pins all source and target css_sets and link them up. Note that this can be performed without holding threadgroup_lock even if the target is a process. As long as cgroup_mutex is held, no new css_set can be put into play. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: use css_set->mg_tasks to track target tasks during migrationTejun Heo2014-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, while migrating tasks from one cgroup to another, cgroup_attach_task() builds a flex array of all target tasks; unfortunately, this has a couple issues. * Flex array has size limit. On 64bit, struct task_and_cgroup is 24bytes making the flex element limit around 87k. It is a high number but not impossible to hit. This means that the current cgroup implementation can't migrate a process with more than 87k threads. * Process migration involves memory allocation whose size is dependent on the number of threads the process has. This means that cgroup core can't guarantee success or failure of multi-process migrations as memory allocation failure can happen in the middle. This is in part because cgroup can't grab threadgroup locks of multiple processes at the same time, so when there are multiple processes to migrate, it is imposible to tell how many tasks are to be migrated beforehand. Note that this already affects cgroup_transfer_tasks(). cgroup currently cannot guarantee atomic success or failure of the operation. It may fail in the middle and after such failure cgroup doesn't have enough information to roll back properly. It just aborts with some tasks migrated and others not. To resolve the situation, this patch updates the migration path to use task->cg_list to track target tasks. The previous patch already added css_set->mg_tasks and updated iterations in non-migration paths to include them during task migration. This patch updates migration path to actually make use of it. Instead of putting onto a flex_array, each target task is moved from its css_set->tasks list to css_set->mg_tasks and the migration path keeps trace of all the source css_sets and the associated cgroups. Once all source css_sets are determined, the destination css_set for each is determined, linked to the matching source css_set and put on a separate list. To iterate the target tasks, migration path just needs to iterat through either the source or target css_sets, depending on whether migration has been committed or not, and the tasks on their ->mg_tasks lists. cgroup_taskset is updated to contain the list_heads for source and target css_sets and the iteration cursor. cgroup_taskset_*() are accordingly updated to walk through css_sets and their ->mg_tasks. This resolves the above listed issues with moderate additional complexity. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: add css_set->mg_tasksTejun Heo2014-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, while migrating tasks from one cgroup to another, cgroup_attach_task() builds a flex array of all target tasks; unfortunately, this has a couple issues. * Flex array has size limit. On 64bit, struct task_and_cgroup is 24bytes making the flex element limit around 87k. It is a high number but not impossible to hit. This means that the current cgroup implementation can't migrate a process with more than 87k threads. * Process migration involves memory allocation whose size is dependent on the number of threads the process has. This means that cgroup core can't guarantee success or failure of multi-process migrations as memory allocation failure can happen in the middle. This is in part because cgroup can't grab threadgroup locks of multiple processes at the same time, so when there are multiple processes to migrate, it is imposible to tell how many tasks are to be migrated beforehand. Note that this already affects cgroup_transfer_tasks(). cgroup currently cannot guarantee atomic success or failure of the operation. It may fail in the middle and after such failure cgroup doesn't have enough information to roll back properly. It just aborts with some tasks migrated and others not. To resolve the situation, we're going to use task->cg_list during migration too. Instead of building a separate array, target tasks will be linked into a dedicated migration list_head on the owning css_set. Tasks on the migration list are treated the same as tasks on the usual tasks list; however, being on a separate list allows cgroup migration code path to keep track of the target tasks by simply keeping the list of css_sets with tasks being migrated, making unpredictable dynamic allocation unnecessary. In prepartion of such migration path update, this patch introduces css_set->mg_tasks list and updates css_set task iterations so that they walk both css_set->tasks and ->mg_tasks. Note that ->mg_tasks isn't used yet. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: deal with dummp_top in cgroup_name() and cgroup_path()Li Zefan2014-02-14
| | | | | | | | | | My kernel fails to boot, because blkcg calls cgroup_path() while cgroupfs is not mounted. Fix both cgroup_name() and cgroup_path(). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: remove cgroup_taskset_cur_css() and cgroup_taskset_size()Tejun Heo2014-02-13
| | | | | | | | The two functions don't have any users left. Remove them along with cgroup_taskset->cur_cgrp. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: drop @skip_css from cgroup_taskset_for_each()Tejun Heo2014-02-13
| | | | | | | | | | | | | | | | | If !NULL, @skip_css makes cgroup_taskset_for_each() skip the matching css. The intention of the interface is to make it easy to skip css's (cgroup_subsys_states) which already match the migration target; however, this is entirely unnecessary as migration taskset doesn't include tasks which are already in the target cgroup. Drop @skip_css from cgroup_taskset_for_each(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Daniel Borkmann <dborkman@redhat.com>
* cgroup: remove css_scan_tasks()Tejun Heo2014-02-13
| | | | | | | css_scan_tasks() doesn't have any user left. Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: implement cgroup_has_tasks() and unexport cgroup_task_count()Tejun Heo2014-02-13
| | | | | | | | | | | | | | | | | | | | cgroup_task_count() read-locks css_set_lock and walks all tasks to count them and then returns the result. The only thing all the users want is determining whether the cgroup is empty or not. This patch implements cgroup_has_tasks() which tests whether cgroup->cset_links is empty, replaces all cgroup_task_count() usages and unexports it. Note that the test isn't synchronized. This is the same as before. The test has always been racy. This will help planned css_set locking update. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* cgroup: drop CGRP_ROOT_SUBSYS_BOUNDTejun Heo2014-02-13
| | | | | | | | | | | | | | | | | Before kernfs conversion, due to the way super_block lookup works, cgroup roots were created and made visible before being fully initialized. This in turn required a special flag to mark that the root hasn't been fully initialized so that the destruction path can tell fully bound ones from half initialized. That flag is CGRP_ROOT_SUBSYS_BOUND and no longer necessary after the kernfs conversion as the lookup and creation of new root are atomic w.r.t. cgroup_mutex. This patch removes the flag and passes the requests subsystem mask to cgroup_setup_root() so that it can set the respective mask bits as subsystems are bound. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: disallow xattr, release_agent and name if sane_behaviorTejun Heo2014-02-13
| | | | | | | | | | | Disallow more mount options if sane_behavior. Note that xattr used to generate warning. While at it, simplify option check in cgroup_mount() and update sane_behavior comment in cgroup.h. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: remove cgroupfs_root->refcntTejun Heo2014-02-12
| | | | | | | | | | | | | | | | | | | Currently, cgroupfs_root and its ->top_cgroup are separated reference counted and the latter's is ignored. There's no reason to do this separately. This patch removes cgroupfs_root->refcnt and destroys cgroupfs_root when the top_cgroup is released. * cgroup_put() updated to ignore cgroup_is_dead() test for top cgroups. cgroup_free_fn() updated to handle root destruction when releasing a top cgroup. * As root destruction is now bounced through cgroup destruction, it is asynchronous. Update cgroup_mount() so that it waits for pending release which is currently implemented using msleep(). Converting this to proper wait_queue isn't hard but likely unnecessary. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: rename cgroupfs_root->number_of_cgroups to ->nr_cgrps and make it ↵Tejun Heo2014-02-12
| | | | | | | | | | | | | | | | | | | | | | | atomic_t root->number_of_cgroups is currently an integer protected with cgroup_mutex. Except for sanity checks and proc reporting, the only place it's used is to check whether the root has any child during remount; however, this is a bit flawed as the counter is not decremented when the cgroup is unlinked but when it's released, meaning that there could be an extended period where all cgroups are removed but remount is still not allowed because some internal objects are lingering. While not perfect either, it'd be better to use emptiness test on root->top_cgroup.children. This patch updates cgroup_remount() to test top_cgroup's children instead, which makes number_of_cgroups only actual usage statistics printing in proc implemented in proc_cgroupstats_show(). Let's shorten its name and make it an atomic_t so that we don't have to worry about its synchronization. It's purely auxiliary at this point. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: remove cgroup->nameTejun Heo2014-02-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup->name handling became quite complicated over time involving dedicated struct cgroup_name for RCU protection. Now that cgroup is on kernfs, we can drop all of it and simply use kernfs_name/path() and friends. Replace cgroup->name and all related code with kernfs name/path constructs. * Reimplement cgroup_name() and cgroup_path() as thin wrappers on top of kernfs counterparts, which involves semantic changes. pr_cont_cgroup_name() and pr_cont_cgroup_path() added. * cgroup->name handling dropped from cgroup_rename(). * All users of cgroup_name/path() updated to the new semantics. Users which were formatting the string just to printk them are converted to use pr_cont_cgroup_name/path() instead, which simplifies things quite a bit. As cgroup_name() no longer requires RCU read lock around it, RCU lockings which were protecting only cgroup_name() are removed. v2: Comment above oom_info_lock updated as suggested by Michal. v3: dummy_top doesn't have a kn associated and pr_cont_cgroup_name/path() ended up calling the matching kernfs functions with NULL kn leading to oops. Test for NULL kn and print "/" if so. This issue was reported by Fengguang Wu. v4: Rebased on top of 0ab02ca8f887 ("cgroup: protect modifications to cgroup_idr with cgroup_mutex"). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* cgroup: remove cftype_setTejun Heo2014-02-12
| | | | | | | | | | | | | | | | | cftype_set was added primarily to allow registering the same cftype array more than once for different subsystems. Nobody uses or needs such thing and it's already broken because each cftype has ->ss pointer which is initialized during registration. Let's add list_head ->node to cftype and use the first cftype entry in the array to link them instead of allocating separate cftype_set. While at it, trigger WARN if cft seems previously initialized during registration. This simplifies cftype handling a bit. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: warn if "xattr" is specified with "sane_behavior"Tejun Heo2014-02-12
| | | | | | | | | Mount option "xattr" is no longer necessary as it's enabled by default on kernfs. Warn if "xattr" is specified with "sane_behavior" so that the option can be removed in the future. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: convert to kernfsTejun Heo2014-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup filesystem code was derived from the original sysfs implementation which was heavily intertwined with vfs objects and locking with the goal of re-using the existing vfs infrastructure. That experiment turned out rather disastrous and sysfs switched, a long time ago, to distributed filesystem model where a separate representation is maintained which is queried by vfs. Unfortunately, cgroup stuck with the failed experiment all these years and accumulated even more problems over time. Locking and object lifetime management being entangled with vfs is probably the most egregious. vfs is never designed to be misused like this and cgroup ends up jumping through various convoluted dancing to make things work. Even then, operations across multiple cgroups can't be done safely as it'll deadlock with rename locking. Recently, kernfs is separated out from sysfs so that it can be used by users other than sysfs. This patch converts cgroup to use kernfs, which will bring the following benefits. * Separation from vfs internals. Locking and object lifetime management is contained in cgroup proper making things a lot simpler. This removes significant amount of locking convolutions, hairy object lifetime rules and the restriction on multi-cgroup operations. * Can drop a lot of code to implement filesystem interface as most are provided by kernfs. * Proper "severing" semantics, which allows controllers to not worry about lingering file accesses after offline. While the preceding patches did as much as possible to make the transition less painful, large part of the conversion has to be one discrete step making this patch rather large. The rest of the commit message lists notable changes in different areas. Overall ------- * vfs constructs replaced with kernfs ones. cgroup->dentry w/ ->kn, cgroupfs_root->sb w/ ->kf_root. * All dentry accessors are removed. Helpers to map from kernfs constructs are added. * All vfs plumbing around dentry, inode and bdi removed. * cgroup_mount() now directly looks for matching root and then proceeds to create a new one if not found. Synchronization and object lifetime ----------------------------------- * vfs inode locking removed. Among other things, this removes the need for the convolution in cgroup_cfts_commit(). Future patches will further simplify it. * vfs refcnting replaced with cgroup internal ones. cgroup->refcnt, cgroupfs_root->refcnt added. cgroup_put_root() now directly puts root->refcnt and when it reaches zero proceeds to destroy it thus merging cgroup_put_root() and the former cgroup_kill_sb(). Simliarly, cgroup_put() now directly schedules cgroup_free_rcu() when refcnt reaches zero. * Unlike before, kernfs objects don't hold onto cgroup objects. When cgroup destroys a kernfs node, all existing operations are drained and the association is broken immediately. The same for cgroupfs_roots and mounts. * All operations which come through kernfs guarantee that the associated cgroup is and stays valid for the duration of operation; however, there are two paths which need to find out the associated cgroup from dentry without going through kernfs - css_tryget_from_dir() and cgroupstats_build(). For these two, kernfs_node->priv is RCU managed so that they can dereference it under RCU read lock. File and directory handling --------------------------- * File and directory operations converted to kernfs_ops and kernfs_syscall_ops. * xattrs is implicitly supported by kernfs. No need to worry about it from cgroup. This means that "xattr" mount option is no longer necessary. A future patch will add a deprecated warning message when sane_behavior. * When cftype->max_write_len > PAGE_SIZE, it's necessary to make a private copy of one of the kernfs_ops to set its atomic_write_len. cftype->kf_ops is added and cgroup_init/exit_cftypes() are updated to handle it. * cftype->lockdep_key added so that kernfs lockdep annotation can be per cftype. * Inidividual file entries and open states are now managed by kernfs. No need to worry about them from cgroup. cfent, cgroup_open_file and their friends are removed. * kernfs_nodes are created deactivated and kernfs_activate() invocations added to places where creation of new nodes are committed. * cgroup_rmdir() uses kernfs_[un]break_active_protection() for self-removal. v2: - Li pointed out in an earlier patch that specifying "name=" during mount without subsystem specification should succeed if there's an existing hierarchy with a matching name although it should fail with -EINVAL if a new hierarchy should be created. Prior to the conversion, this used by handled by deferring failure from NULL return from cgroup_root_from_opts(), which was necessary because root was being created before checking for existing ones. Note that cgroup_root_from_opts() returned an ERR_PTR() value for error conditions which require immediate mount failure. As we now have separate search and creation steps, deferring failure from cgroup_root_from_opts() is no longer necessary. cgroup_root_from_opts() is updated to always return ERR_PTR() value on failure. - The logic to match existing roots is updated so that a mount attempt with a matching name but different subsys_mask are rejected. This was handled by a separate matching loop under the comment "Check for name clashes with existing mounts" but got lost during conversion. Merge the check into the main search loop. - Add __rcu __force casting in RCU_INIT_POINTER() in cgroup_destroy_locked() to avoid the sparse address space warning reported by kbuild test bot. Maybe we want an explicit interface to use kn->priv as RCU protected pointer? v3: Make CONFIG_CGROUPS select CONFIG_KERNFS. v4: Rebased on top of 0ab02ca8f887 ("cgroup: protect modifications to cgroup_idr with cgroup_mutex"). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: kbuild test robot fengguang.wu@intel.com>
* cgroup: misc preps for kernfs conversionTejun Heo2014-02-11
| | | | | | | | | | | | | | | | | | | | | | | | * Un-inline seq_css(). After kernfs conversion, the function will need to dereference internal data structures. * Add cgroup_get/put_root() and replace direct super_block->s_active manipulatinos with them. These will be converted to kernfs_root refcnting. * Add cgroup_get/put() and replace dget/put() on cgrp->dentry with them. These will be converted to kernfs refcnting. * Update current_css_set_cg_links_read() to use cgroup_name() instead of reaching into the dentry name. The end result is the same. These changes don't make functional differences but will make transition to kernfs easier. v2: Rebased on top of 0ab02ca8f887 ("cgroup: protect modifications to cgroup_idr with cgroup_mutex"). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: introduce cgroup_ino()Tejun Heo2014-02-11
| | | | | | | | | | | | | | | | | mm/memory-failure.c::hwpoison_filter_task() has been reaching into cgroup to extract the associated ino to be used as a filtering criterion. This is an implementation detail which shouldn't be depended upon from outside cgroup proper and is about to change with the scheduled kernfs conversion. This patch introduces a proper interface to determine the associated ino, cgroup_ino(), and updates hwpoison_filter_task() to use it instead of reaching directly into cgroup. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Wu Fengguang <fengguang.wu@intel.com>
* cgroup: update the meaning of cftype->max_write_lenTejun Heo2014-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | cftype->max_write_len is used to extend the maximum size of writes. It's interpreted in such a way that the actual maximum size is one less than the specified value. The default size is defined by CGROUP_LOCAL_BUFFER_SIZE. Its interpretation is quite confusing - its value is decremented by 1 and then compared for equality with max size, which means that the actual default size is CGROUP_LOCAL_BUFFER_SIZE - 2, which is 62 chars. There's no point in having a limit that low. Update its definition so that it means the actual string length sans termination and anything below PAGE_SIZE-1 is treated as PAGE_SIZE-1. .max_write_len for "release_agent" is updated to PATH_MAX-1 and cgroup_release_agent_write() is updated so that the redundant strlen() check is removed and it uses strlcpy() instead of strcpy(). .max_write_len initializations in blk-throttle.c and cfq-iosched.c are no longer necessary and removed. The one in cpuset is kept unchanged as it's an approximated value to begin with. This will also make transition to kernfs smoother. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: make cgroup_subsys->base_cftypes use cgroup_add_cftypes()Tejun Heo2014-02-11
| | | | | | | | | | | | | | | | | | | | | Currently, cgroup_subsys->base_cftypes registration is different from dynamic cftypes registartion. Instead of going through cgroup_add_cftypes(), cgroup_init_subsys() invokes cgroup_init_cftsets() which makes use of cgroup_subsys->base_cftset which doesn't involve dynamic allocation. While avoiding dynamic allocation is somewhat nice, having two separate paths for cftypes registration is nasty, especially as we're planning to add more operations during cftypes registration. This patch drops cgroup_init_cftsets() and cgroup_subsys->base_cftset and registers base_cftypes using cgroup_add_cftypes(). This is done as a separate step in cgroup_init() instead of a part of cgroup_init_subsys(). This is because cgroup_init_subsys() can be called very early during boot when kmalloc() isn't available yet. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: improve css_from_dir() into css_tryget_from_dir()Tejun Heo2014-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | css_from_dir() returns the matching css (cgroup_subsys_state) given a dentry and subsystem. The function doesn't pin the css before returning and requires the caller to be holding RCU read lock or cgroup_mutex and handling pinning on the caller side. Given that users of the function are likely to want to pin the returned css (both existing users do) and that getting and putting css's are very cheap, there's no reason for the interface to be tricky like this. Rename css_from_dir() to css_tryget_from_dir() and make it try to pin the found css and return it only if pinning succeeded. The callers are updated so that they no longer do RCU locking and pinning around the function and just use the returned css. This will also ease converting cgroup to kernfs. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* Merge branch 'cgroup/for-3.14-fixes' into cgroup/for-3.15Tejun Heo2014-02-11
|\ | | | | | | | | | | | | | | Pull for-3.14-fixes to receive 0ab02ca8f887 ("cgroup: protect modifications to cgroup_idr with cgroup_mutex") prior to kernfs conversion series to avoid non-trivial conflicts. Signed-off-by: Tejun Heo <tj@kernel.org>
| * cgroup: protect modifications to cgroup_idr with cgroup_mutexLi Zefan2014-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setup cgroupfs like this: # mount -t cgroup -o cpuacct xxx /cgroup # mkdir /cgroup/sub1 # mkdir /cgroup/sub2 Then run these two commands: # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } & # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } & After seconds you may see this warning: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0() idr_remove called for id=6 which is not allocated. ... Call Trace: [<ffffffff8156063c>] dump_stack+0x7a/0x96 [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0 [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50 [<ffffffff81300aa7>] sub_remove+0x87/0x1b0 [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0 [<ffffffff81300bf5>] idr_remove+0x25/0xd0 [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0 [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0 [<ffffffff8107cdbc>] process_one_work+0x26c/0x550 [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0 [<ffffffff81085f96>] kthread+0xe6/0xf0 [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0 ---[ end trace 2d1577ec10cf80d0 ]--- It's because allocating/removing cgroup ID is not properly synchronized. The bug was introduced when we converted cgroup_ida to cgroup_idr. While synchronization is already done inside ida_simple_{get,remove}(), users are responsible for concurrent calls to idr_{alloc,remove}(). tj: Refreshed on top of b58c89986a77 ("cgroup: fix error return from cgroup_create()"). Fixes: 4e96ee8e981b ("cgroup: convert cgroup_ida to cgroup_idr") Cc: <stable@vger.kernel.org> #3.12+ Reported-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | cgroup: rename cgroup_subsys->subsys_id to ->idTejun Heo2014-02-08
| | | | | | | | | | | | | | | | | | | | It's no longer referenced outside cgroup core, so renaming is easy. Let's rename it for consistency & brevity. This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | cgroup: clean up cgroup_subsys names and initializationTejun Heo2014-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup_subsys is a bit messier than it needs to be. * The name of a subsys can be different from its internal identifier defined in cgroup_subsys.h. Most subsystems use the matching name but three - cpu, memory and perf_event - use different ones. * cgroup_subsys_id enums are postfixed with _subsys_id and each cgroup_subsys is postfixed with _subsys. cgroup.h is widely included throughout various subsystems, it doesn't and shouldn't have claim on such generic names which don't have any qualifier indicating that they belong to cgroup. * cgroup_subsys->subsys_id should always equal the matching cgroup_subsys_id enum; however, we require each controller to initialize it and then BUG if they don't match, which is a bit silly. This patch cleans up cgroup_subsys names and initialization by doing the followings. * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each cgroup_subsys with _cgrp_subsys. * With the above, renaming subsys identifiers to match the userland visible names doesn't cause any naming conflicts. All non-matching identifiers are renamed to match the official names. cpu_cgroup -> cpu mem_cgroup -> memory perf -> perf_event * controllers no longer need to initialize ->subsys_id and ->name. They're generated in cgroup core and set automatically during boot. * Redundant cgroup_subsys declarations removed. * While updating BUG_ON()s in cgroup_init_early(), convert them to WARN()s. BUGging that early during boot is stupid - the kernel can't print anything, even through serial console and the trap handler doesn't even link stack frame properly for back-tracing. This patch doesn't introduce any behavior changes. v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs classid handling into core"). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Ingo Molnar <mingo@redhat.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Thomas Graf <tgraf@suug.ch>
* | cgroup: drop module supportTejun Heo2014-02-08
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With module supported dropped from net_prio, no controller is using cgroup module support. None of actual resource controllers can be built as a module and we aren't gonna add new controllers which don't control resources. This patch drops module support from cgroup. * cgroup_[un]load_subsys() and cgroup_subsys->module removed. * As there's no point in distinguishing IS_BUILTIN() and IS_MODULE(), cgroup_subsys.h now uses IS_ENABLED() directly. * enum cgroup_subsys_id now exactly matches the list of enabled controllers as ordered in cgroup_subsys.h. * cgroup_subsys[] is now a contiguously occupied array. Size specification is no longer necessary and dropped. * for_each_builtin_subsys() is removed and for_each_subsys() is updated to not require any locking. * module ref handling is removed from rebind_subsystems(). * Module related comments dropped. v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs classid handling into core"). v3: Added {} around the if (need_forkexit_callback) block in cgroup_post_fork() for readability as suggested by Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: remove stray references to css_idHugh Dickins2014-01-13
| | | | | | | | Trivial: remove the few stray references to css_id, which itself was removed in v3.13's 2ff2a7d03bbe "cgroup: kill css_id". Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: remove for_each_root_subsys()Tejun Heo2013-12-06
| | | | | | | | | | | | After the previous patch which introduced for_each_css(), for_each_root_subsys() only has two users left. This patch replaces it with for_each_subsys() + explicit subsys_mask testing and remove for_each_root_subsys() along with cgroupfs_root->subsys_list handling. This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: unify pidlist and other file handlingTejun Heo2013-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation of conversion to kernfs, cgroup file handling is updated so that it can be easily mapped to kernfs. With the previous changes, the difference between pidlist and other files are very small. Both are served by seq_file in a pretty standard way with the only difference being !pidlist files use single_open(). This patch adds cftype->seq_start(), ->seq_next and ->seq_stop() and implements the matching cgroup_seqfile_start/next/stop() which either emulates single_open() behavior or invokes cftype->seq_*() operations if specified. This allows using single seq_operations for both pidlist and other files and makes cgroup_pidlist_operations and cgorup_pidlist_open() no longer necessary. As cgroup_pidlist_open() was the only user of cftype->open(), the method is dropped together. This brings cftype file interface very close to kernfs interface and mapping shouldn't be too difficult. Once converted to kernfs, most of the plumbing code including cgroup_seqfile_*() will be removed as kernfs provides those facilities. This patch does not introduce any behavior changes. v2: Refreshed on top of the updated "cgroup: introduce struct cgroup_pidlist_open_file". v3: Refreshed on top of the updated "cgroup: attach cgroup_open_file to all cgroup files". Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: replace cftype->read_seq_string() with cftype->seq_show()Tejun Heo2013-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation of conversion to kernfs, cgroup file handling is updated so that it can be easily mapped to kernfs. This patch replaces cftype->read_seq_string() with cftype->seq_show() which is not limited to single_open() operation and will map directcly to kernfs seq_file interface. The conversions are mechanical. As ->seq_show() doesn't have @css and @cft, the functions which make use of them are converted to use seq_css() and seq_cft() respectively. In several occassions, e.f. if it has seq_string in its name, the function name is updated to fit the new method better. This patch does not introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Neil Horman <nhorman@tuxdriver.com>
* cgroup: attach cgroup_open_file to all cgroup filesTejun Heo2013-12-05
| | | | | | | | | | | | | | | | | | | | | | | In preparation of conversion to kernfs, cgroup file handling is updated so that it can be easily mapped to kernfs. This patch attaches cgroup_open_file, which used to be attached to pidlist files, to all cgroup files, introduces seq_css/cft() accessors to determine the cgroup_subsys_state and cftype associated with a given cgroup seq_file, exports them as public interface. This doesn't cause any behavior changes but unifies cgroup file handling across different file types and will help converting them to kernfs seq_show() interface. v2: Li pointed out that the original patch was using single_open_size() incorrectly assuming that the size param is private data size. Fix it by allocating @of separately and passing it to single_open() and explicitly freeing it in the release path. This isn't the prettiest but this path is gonna be restructured by the following patches pretty soon. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: remove cftype->read(), ->read_map() and ->write()Tejun Heo2013-12-05
| | | | | | | | | | | | In preparation of conversion to kernfs, cgroup file handling is being consolidated so that it can be easily mapped to the seq_file based interface of kernfs. After recent updates, ->read() and ->read_map() don't have any user left and ->write() never had any user. Remove them. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: don't guarantee cgroup.procs is sorted if sane_behaviorTejun Heo2013-11-29
| | | | | | | | | | | | | | | For some reason, tasks and cgroup.procs guarantee that the result is sorted. This is the only reason this whole pidlist logic is necessary instead of just iterating through sorted member tasks. We can't do anything about the existing interface but at least ensure that such expectation doesn't exist for the new interface so that pidlist logic may be removed in the distant future. This patch scrambles the sort order if sane_behavior so that the output is usually not sorted in the new interface. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: remove cftype->release()Tejun Heo2013-11-29
| | | | | | | | Now that pidlist files don't use cftype->release(), it doesn't have any user left. Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>