diff options
author | Joel Becker <joel.becker@oracle.com> | 2005-12-15 17:29:43 -0500 |
---|---|---|
committer | Joel Becker <joel.becker@oracle.com> | 2006-01-03 14:45:28 -0500 |
commit | 7063fbf2261194f72ee75afca67b3b38b554b5fa (patch) | |
tree | 7bfe4eeab8ce784b767cf30886623d456c384718 /Documentation/filesystems/configfs/configfs.txt | |
parent | 88026842b0a760145aa71d69e74fbc9ec118ca44 (diff) |
[PATCH] configfs: User-driven configuration filesystem
Configfs, a file system for userspace-driven kernel object configuration.
The OCFS2 stack makes extensive use of this for propagation of cluster
configuration information into kernel.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Diffstat (limited to 'Documentation/filesystems/configfs/configfs.txt')
-rw-r--r-- | Documentation/filesystems/configfs/configfs.txt | 434 |
1 files changed, 434 insertions, 0 deletions
diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt new file mode 100644 index 000000000000..c4ff96b7c4e0 --- /dev/null +++ b/Documentation/filesystems/configfs/configfs.txt | |||
@@ -0,0 +1,434 @@ | |||
1 | |||
2 | configfs - Userspace-driven kernel object configuation. | ||
3 | |||
4 | Joel Becker <joel.becker@oracle.com> | ||
5 | |||
6 | Updated: 31 March 2005 | ||
7 | |||
8 | Copyright (c) 2005 Oracle Corporation, | ||
9 | Joel Becker <joel.becker@oracle.com> | ||
10 | |||
11 | |||
12 | [What is configfs?] | ||
13 | |||
14 | configfs is a ram-based filesystem that provides the converse of | ||
15 | sysfs's functionality. Where sysfs is a filesystem-based view of | ||
16 | kernel objects, configfs is a filesystem-based manager of kernel | ||
17 | objects, or config_items. | ||
18 | |||
19 | With sysfs, an object is created in kernel (for example, when a device | ||
20 | is discovered) and it is registered with sysfs. Its attributes then | ||
21 | appear in sysfs, allowing userspace to read the attributes via | ||
22 | readdir(3)/read(2). It may allow some attributes to be modified via | ||
23 | write(2). The important point is that the object is created and | ||
24 | destroyed in kernel, the kernel controls the lifecycle of the sysfs | ||
25 | representation, and sysfs is merely a window on all this. | ||
26 | |||
27 | A configfs config_item is created via an explicit userspace operation: | ||
28 | mkdir(2). It is destroyed via rmdir(2). The attributes appear at | ||
29 | mkdir(2) time, and can be read or modified via read(2) and write(2). | ||
30 | As with sysfs, readdir(3) queries the list of items and/or attributes. | ||
31 | symlink(2) can be used to group items together. Unlike sysfs, the | ||
32 | lifetime of the representation is completely driven by userspace. The | ||
33 | kernel modules backing the items must respond to this. | ||
34 | |||
35 | Both sysfs and configfs can and should exist together on the same | ||
36 | system. One is not a replacement for the other. | ||
37 | |||
38 | [Using configfs] | ||
39 | |||
40 | configfs can be compiled as a module or into the kernel. You can access | ||
41 | it by doing | ||
42 | |||
43 | mount -t configfs none /config | ||
44 | |||
45 | The configfs tree will be empty unless client modules are also loaded. | ||
46 | These are modules that register their item types with configfs as | ||
47 | subsystems. Once a client subsystem is loaded, it will appear as a | ||
48 | subdirectory (or more than one) under /config. Like sysfs, the | ||
49 | configfs tree is always there, whether mounted on /config or not. | ||
50 | |||
51 | An item is created via mkdir(2). The item's attributes will also | ||
52 | appear at this time. readdir(3) can determine what the attributes are, | ||
53 | read(2) can query their default values, and write(2) can store new | ||
54 | values. Like sysfs, attributes should be ASCII text files, preferably | ||
55 | with only one value per file. The same efficiency caveats from sysfs | ||
56 | apply. Don't mix more than one attribute in one attribute file. | ||
57 | |||
58 | Like sysfs, configfs expects write(2) to store the entire buffer at | ||
59 | once. When writing to configfs attributes, userspace processes should | ||
60 | first read the entire file, modify the portions they wish to change, and | ||
61 | then write the entire buffer back. Attribute files have a maximum size | ||
62 | of one page (PAGE_SIZE, 4096 on i386). | ||
63 | |||
64 | When an item needs to be destroyed, remove it with rmdir(2). An | ||
65 | item cannot be destroyed if any other item has a link to it (via | ||
66 | symlink(2)). Links can be removed via unlink(2). | ||
67 | |||
68 | [Configuring FakeNBD: an Example] | ||
69 | |||
70 | Imagine there's a Network Block Device (NBD) driver that allows you to | ||
71 | access remote block devices. Call it FakeNBD. FakeNBD uses configfs | ||
72 | for its configuration. Obviously, there will be a nice program that | ||
73 | sysadmins use to configure FakeNBD, but somehow that program has to tell | ||
74 | the driver about it. Here's where configfs comes in. | ||
75 | |||
76 | When the FakeNBD driver is loaded, it registers itself with configfs. | ||
77 | readdir(3) sees this just fine: | ||
78 | |||
79 | # ls /config | ||
80 | fakenbd | ||
81 | |||
82 | A fakenbd connection can be created with mkdir(2). The name is | ||
83 | arbitrary, but likely the tool will make some use of the name. Perhaps | ||
84 | it is a uuid or a disk name: | ||
85 | |||
86 | # mkdir /config/fakenbd/disk1 | ||
87 | # ls /config/fakenbd/disk1 | ||
88 | target device rw | ||
89 | |||
90 | The target attribute contains the IP address of the server FakeNBD will | ||
91 | connect to. The device attribute is the device on the server. | ||
92 | Predictably, the rw attribute determines whether the connection is | ||
93 | read-only or read-write. | ||
94 | |||
95 | # echo 10.0.0.1 > /config/fakenbd/disk1/target | ||
96 | # echo /dev/sda1 > /config/fakenbd/disk1/device | ||
97 | # echo 1 > /config/fakenbd/disk1/rw | ||
98 | |||
99 | That's it. That's all there is. Now the device is configured, via the | ||
100 | shell no less. | ||
101 | |||
102 | [Coding With configfs] | ||
103 | |||
104 | Every object in configfs is a config_item. A config_item reflects an | ||
105 | object in the subsystem. It has attributes that match values on that | ||
106 | object. configfs handles the filesystem representation of that object | ||
107 | and its attributes, allowing the subsystem to ignore all but the | ||
108 | basic show/store interaction. | ||
109 | |||
110 | Items are created and destroyed inside a config_group. A group is a | ||
111 | collection of items that share the same attributes and operations. | ||
112 | Items are created by mkdir(2) and removed by rmdir(2), but configfs | ||
113 | handles that. The group has a set of operations to perform these tasks | ||
114 | |||
115 | A subsystem is the top level of a client module. During initialization, | ||
116 | the client module registers the subsystem with configfs, the subsystem | ||
117 | appears as a directory at the top of the configfs filesystem. A | ||
118 | subsystem is also a config_group, and can do everything a config_group | ||
119 | can. | ||
120 | |||
121 | [struct config_item] | ||
122 | |||
123 | struct config_item { | ||
124 | char *ci_name; | ||
125 | char ci_namebuf[UOBJ_NAME_LEN]; | ||
126 | struct kref ci_kref; | ||
127 | struct list_head ci_entry; | ||
128 | struct config_item *ci_parent; | ||
129 | struct config_group *ci_group; | ||
130 | struct config_item_type *ci_type; | ||
131 | struct dentry *ci_dentry; | ||
132 | }; | ||
133 | |||
134 | void config_item_init(struct config_item *); | ||
135 | void config_item_init_type_name(struct config_item *, | ||
136 | const char *name, | ||
137 | struct config_item_type *type); | ||
138 | struct config_item *config_item_get(struct config_item *); | ||
139 | void config_item_put(struct config_item *); | ||
140 | |||
141 | Generally, struct config_item is embedded in a container structure, a | ||
142 | structure that actually represents what the subsystem is doing. The | ||
143 | config_item portion of that structure is how the object interacts with | ||
144 | configfs. | ||
145 | |||
146 | Whether statically defined in a source file or created by a parent | ||
147 | config_group, a config_item must have one of the _init() functions | ||
148 | called on it. This initializes the reference count and sets up the | ||
149 | appropriate fields. | ||
150 | |||
151 | All users of a config_item should have a reference on it via | ||
152 | config_item_get(), and drop the reference when they are done via | ||
153 | config_item_put(). | ||
154 | |||
155 | By itself, a config_item cannot do much more than appear in configfs. | ||
156 | Usually a subsystem wants the item to display and/or store attributes, | ||
157 | among other things. For that, it needs a type. | ||
158 | |||
159 | [struct config_item_type] | ||
160 | |||
161 | struct configfs_item_operations { | ||
162 | void (*release)(struct config_item *); | ||
163 | ssize_t (*show_attribute)(struct config_item *, | ||
164 | struct configfs_attribute *, | ||
165 | char *); | ||
166 | ssize_t (*store_attribute)(struct config_item *, | ||
167 | struct configfs_attribute *, | ||
168 | const char *, size_t); | ||
169 | int (*allow_link)(struct config_item *src, | ||
170 | struct config_item *target); | ||
171 | int (*drop_link)(struct config_item *src, | ||
172 | struct config_item *target); | ||
173 | }; | ||
174 | |||
175 | struct config_item_type { | ||
176 | struct module *ct_owner; | ||
177 | struct configfs_item_operations *ct_item_ops; | ||
178 | struct configfs_group_operations *ct_group_ops; | ||
179 | struct configfs_attribute **ct_attrs; | ||
180 | }; | ||
181 | |||
182 | The most basic function of a config_item_type is to define what | ||
183 | operations can be performed on a config_item. All items that have been | ||
184 | allocated dynamically will need to provide the ct_item_ops->release() | ||
185 | method. This method is called when the config_item's reference count | ||
186 | reaches zero. Items that wish to display an attribute need to provide | ||
187 | the ct_item_ops->show_attribute() method. Similarly, storing a new | ||
188 | attribute value uses the store_attribute() method. | ||
189 | |||
190 | [struct configfs_attribute] | ||
191 | |||
192 | struct configfs_attribute { | ||
193 | char *ca_name; | ||
194 | struct module *ca_owner; | ||
195 | mode_t ca_mode; | ||
196 | }; | ||
197 | |||
198 | When a config_item wants an attribute to appear as a file in the item's | ||
199 | configfs directory, it must define a configfs_attribute describing it. | ||
200 | It then adds the attribute to the NULL-terminated array | ||
201 | config_item_type->ct_attrs. When the item appears in configfs, the | ||
202 | attribute file will appear with the configfs_attribute->ca_name | ||
203 | filename. configfs_attribute->ca_mode specifies the file permissions. | ||
204 | |||
205 | If an attribute is readable and the config_item provides a | ||
206 | ct_item_ops->show_attribute() method, that method will be called | ||
207 | whenever userspace asks for a read(2) on the attribute. The converse | ||
208 | will happen for write(2). | ||
209 | |||
210 | [struct config_group] | ||
211 | |||
212 | A config_item cannot live in a vaccum. The only way one can be created | ||
213 | is via mkdir(2) on a config_group. This will trigger creation of a | ||
214 | child item. | ||
215 | |||
216 | struct config_group { | ||
217 | struct config_item cg_item; | ||
218 | struct list_head cg_children; | ||
219 | struct configfs_subsystem *cg_subsys; | ||
220 | struct config_group **default_groups; | ||
221 | }; | ||
222 | |||
223 | void config_group_init(struct config_group *group); | ||
224 | void config_group_init_type_name(struct config_group *group, | ||
225 | const char *name, | ||
226 | struct config_item_type *type); | ||
227 | |||
228 | |||
229 | The config_group structure contains a config_item. Properly configuring | ||
230 | that item means that a group can behave as an item in its own right. | ||
231 | However, it can do more: it can create child items or groups. This is | ||
232 | accomplished via the group operations specified on the group's | ||
233 | config_item_type. | ||
234 | |||
235 | struct configfs_group_operations { | ||
236 | struct config_item *(*make_item)(struct config_group *group, | ||
237 | const char *name); | ||
238 | struct config_group *(*make_group)(struct config_group *group, | ||
239 | const char *name); | ||
240 | int (*commit_item)(struct config_item *item); | ||
241 | void (*drop_item)(struct config_group *group, | ||
242 | struct config_item *item); | ||
243 | }; | ||
244 | |||
245 | A group creates child items by providing the | ||
246 | ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the group's directory. The subsystem allocates a new | ||
247 | config_item (or more likely, its container structure), initializes it, | ||
248 | and returns it to configfs. Configfs will then populate the filesystem | ||
249 | tree to reflect the new item. | ||
250 | |||
251 | If the subsystem wants the child to be a group itself, the subsystem | ||
252 | provides ct_group_ops->make_group(). Everything else behaves the same, | ||
253 | using the group _init() functions on the group. | ||
254 | |||
255 | Finally, when userspace calls rmdir(2) on the item or group, | ||
256 | ct_group_ops->drop_item() is called. As a config_group is also a | ||
257 | config_item, it is not necessary for a seperate drop_group() method. | ||
258 | The subsystem must config_item_put() the reference that was initialized | ||
259 | upon item allocation. If a subsystem has no work to do, it may omit | ||
260 | the ct_group_ops->drop_item() method, and configfs will call | ||
261 | config_item_put() on the item on behalf of the subsystem. | ||
262 | |||
263 | IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2) | ||
264 | is called, configfs WILL remove the item from the filesystem tree | ||
265 | (assuming that it has no children to keep it busy). The subsystem is | ||
266 | responsible for responding to this. If the subsystem has references to | ||
267 | the item in other threads, the memory is safe. It may take some time | ||
268 | for the item to actually disappear from the subsystem's usage. But it | ||
269 | is gone from configfs. | ||
270 | |||
271 | A config_group cannot be removed while it still has child items. This | ||
272 | is implemented in the configfs rmdir(2) code. ->drop_item() will not be | ||
273 | called, as the item has not been dropped. rmdir(2) will fail, as the | ||
274 | directory is not empty. | ||
275 | |||
276 | [struct configfs_subsystem] | ||
277 | |||
278 | A subsystem must register itself, ususally at module_init time. This | ||
279 | tells configfs to make the subsystem appear in the file tree. | ||
280 | |||
281 | struct configfs_subsystem { | ||
282 | struct config_group su_group; | ||
283 | struct semaphore su_sem; | ||
284 | }; | ||
285 | |||
286 | int configfs_register_subsystem(struct configfs_subsystem *subsys); | ||
287 | void configfs_unregister_subsystem(struct configfs_subsystem *subsys); | ||
288 | |||
289 | A subsystem consists of a toplevel config_group and a semaphore. | ||
290 | The group is where child config_items are created. For a subsystem, | ||
291 | this group is usually defined statically. Before calling | ||
292 | configfs_register_subsystem(), the subsystem must have initialized the | ||
293 | group via the usual group _init() functions, and it must also have | ||
294 | initialized the semaphore. | ||
295 | When the register call returns, the subsystem is live, and it | ||
296 | will be visible via configfs. At that point, mkdir(2) can be called and | ||
297 | the subsystem must be ready for it. | ||
298 | |||
299 | [An Example] | ||
300 | |||
301 | The best example of these basic concepts is the simple_children | ||
302 | subsystem/group and the simple_child item in configfs_example.c It | ||
303 | shows a trivial object displaying and storing an attribute, and a simple | ||
304 | group creating and destroying these children. | ||
305 | |||
306 | [Hierarchy Navigation and the Subsystem Semaphore] | ||
307 | |||
308 | There is an extra bonus that configfs provides. The config_groups and | ||
309 | config_items are arranged in a hierarchy due to the fact that they | ||
310 | appear in a filesystem. A subsystem is NEVER to touch the filesystem | ||
311 | parts, but the subsystem might be interested in this hierarchy. For | ||
312 | this reason, the hierarchy is mirrored via the config_group->cg_children | ||
313 | and config_item->ci_parent structure members. | ||
314 | |||
315 | A subsystem can navigate the cg_children list and the ci_parent pointer | ||
316 | to see the tree created by the subsystem. This can race with configfs' | ||
317 | management of the hierarchy, so configfs uses the subsystem semaphore to | ||
318 | protect modifications. Whenever a subsystem wants to navigate the | ||
319 | hierarchy, it must do so under the protection of the subsystem | ||
320 | semaphore. | ||
321 | |||
322 | A subsystem will be prevented from acquiring the semaphore while a newly | ||
323 | allocated item has not been linked into this hierarchy. Similarly, it | ||
324 | will not be able to acquire the semaphore while a dropping item has not | ||
325 | yet been unlinked. This means that an item's ci_parent pointer will | ||
326 | never be NULL while the item is in configfs, and that an item will only | ||
327 | be in its parent's cg_children list for the same duration. This allows | ||
328 | a subsystem to trust ci_parent and cg_children while they hold the | ||
329 | semaphore. | ||
330 | |||
331 | [Item Aggregation Via symlink(2)] | ||
332 | |||
333 | configfs provides a simple group via the group->item parent/child | ||
334 | relationship. Often, however, a larger environment requires aggregation | ||
335 | outside of the parent/child connection. This is implemented via | ||
336 | symlink(2). | ||
337 | |||
338 | A config_item may provide the ct_item_ops->allow_link() and | ||
339 | ct_item_ops->drop_link() methods. If the ->allow_link() method exists, | ||
340 | symlink(2) may be called with the config_item as the source of the link. | ||
341 | These links are only allowed between configfs config_items. Any | ||
342 | symlink(2) attempt outside the configfs filesystem will be denied. | ||
343 | |||
344 | When symlink(2) is called, the source config_item's ->allow_link() | ||
345 | method is called with itself and a target item. If the source item | ||
346 | allows linking to target item, it returns 0. A source item may wish to | ||
347 | reject a link if it only wants links to a certain type of object (say, | ||
348 | in its own subsystem). | ||
349 | |||
350 | When unlink(2) is called on the symbolic link, the source item is | ||
351 | notified via the ->drop_link() method. Like the ->drop_item() method, | ||
352 | this is a void function and cannot return failure. The subsystem is | ||
353 | responsible for responding to the change. | ||
354 | |||
355 | A config_item cannot be removed while it links to any other item, nor | ||
356 | can it be removed while an item links to it. Dangling symlinks are not | ||
357 | allowed in configfs. | ||
358 | |||
359 | [Automatically Created Subgroups] | ||
360 | |||
361 | A new config_group may want to have two types of child config_items. | ||
362 | While this could be codified by magic names in ->make_item(), it is much | ||
363 | more explicit to have a method whereby userspace sees this divergence. | ||
364 | |||
365 | Rather than have a group where some items behave differently than | ||
366 | others, configfs provides a method whereby one or many subgroups are | ||
367 | automatically created inside the parent at its creation. Thus, | ||
368 | mkdir("parent) results in "parent", "parent/subgroup1", up through | ||
369 | "parent/subgroupN". Items of type 1 can now be created in | ||
370 | "parent/subgroup1", and items of type N can be created in | ||
371 | "parent/subgroupN". | ||
372 | |||
373 | These automatic subgroups, or default groups, do not preclude other | ||
374 | children of the parent group. If ct_group_ops->make_group() exists, | ||
375 | other child groups can be created on the parent group directly. | ||
376 | |||
377 | A configfs subsystem specifies default groups by filling in the | ||
378 | NULL-terminated array default_groups on the config_group structure. | ||
379 | Each group in that array is populated in the configfs tree at the same | ||
380 | time as the parent group. Similarly, they are removed at the same time | ||
381 | as the parent. No extra notification is provided. When a ->drop_item() | ||
382 | method call notifies the subsystem the parent group is going away, it | ||
383 | also means every default group child associated with that parent group. | ||
384 | |||
385 | As a consequence of this, default_groups cannot be removed directly via | ||
386 | rmdir(2). They also are not considered when rmdir(2) on the parent | ||
387 | group is checking for children. | ||
388 | |||
389 | [Committable Items] | ||
390 | |||
391 | NOTE: Committable items are currently unimplemented. | ||
392 | |||
393 | Some config_items cannot have a valid initial state. That is, no | ||
394 | default values can be specified for the item's attributes such that the | ||
395 | item can do its work. Userspace must configure one or more attributes, | ||
396 | after which the subsystem can start whatever entity this item | ||
397 | represents. | ||
398 | |||
399 | Consider the FakeNBD device from above. Without a target address *and* | ||
400 | a target device, the subsystem has no idea what block device to import. | ||
401 | The simple example assumes that the subsystem merely waits until all the | ||
402 | appropriate attributes are configured, and then connects. This will, | ||
403 | indeed, work, but now every attribute store must check if the attributes | ||
404 | are initialized. Every attribute store must fire off the connection if | ||
405 | that condition is met. | ||
406 | |||
407 | Far better would be an explicit action notifying the subsystem that the | ||
408 | config_item is ready to go. More importantly, an explicit action allows | ||
409 | the subsystem to provide feedback as to whether the attibutes are | ||
410 | initialized in a way that makes sense. configfs provides this as | ||
411 | committable items. | ||
412 | |||
413 | configfs still uses only normal filesystem operations. An item is | ||
414 | committed via rename(2). The item is moved from a directory where it | ||
415 | can be modified to a directory where it cannot. | ||
416 | |||
417 | Any group that provides the ct_group_ops->commit_item() method has | ||
418 | committable items. When this group appears in configfs, mkdir(2) will | ||
419 | not work directly in the group. Instead, the group will have two | ||
420 | subdirectories: "live" and "pending". The "live" directory does not | ||
421 | support mkdir(2) or rmdir(2) either. It only allows rename(2). The | ||
422 | "pending" directory does allow mkdir(2) and rmdir(2). An item is | ||
423 | created in the "pending" directory. Its attributes can be modified at | ||
424 | will. Userspace commits the item by renaming it into the "live" | ||
425 | directory. At this point, the subsystem recieves the ->commit_item() | ||
426 | callback. If all required attributes are filled to satisfaction, the | ||
427 | method returns zero and the item is moved to the "live" directory. | ||
428 | |||
429 | As rmdir(2) does not work in the "live" directory, an item must be | ||
430 | shutdown, or "uncommitted". Again, this is done via rename(2), this | ||
431 | time from the "live" directory back to the "pending" one. The subsystem | ||
432 | is notified by the ct_group_ops->uncommit_object() method. | ||
433 | |||
434 | |||