diff options
author | Greg Kroah-Hartman <gregkh@suse.de> | 2006-01-06 15:59:59 -0500 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@suse.de> | 2006-01-06 15:59:59 -0500 |
commit | ccf18968b1bbc2fb117190a1984ac2a826dac228 (patch) | |
tree | 7bc8fbf5722aecf1e84fa50c31c657864cba1daa /Documentation | |
parent | e91c021c487110386a07facd0396e6c3b7cf9c1f (diff) | |
parent | d99cf9d679a520d67f81d805b7cb91c68e1847f0 (diff) |
Merge ../torvalds-2.6/
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/block/biodoc.txt | 10 | ||||
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 11 | ||||
-rw-r--r-- | Documentation/filesystems/00-INDEX | 6 | ||||
-rw-r--r-- | Documentation/filesystems/configfs/configfs.txt | 434 | ||||
-rw-r--r-- | Documentation/filesystems/configfs/configfs_example.c | 474 | ||||
-rw-r--r-- | Documentation/filesystems/dlmfs.txt | 130 | ||||
-rw-r--r-- | Documentation/filesystems/ocfs2.txt | 55 | ||||
-rw-r--r-- | Documentation/keys.txt | 18 | ||||
-rw-r--r-- | Documentation/md.txt | 120 | ||||
-rw-r--r-- | Documentation/power/interface.txt | 11 | ||||
-rw-r--r-- | Documentation/power/swsusp.txt | 5 |
11 files changed, 1237 insertions, 37 deletions
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index 303c57a7fad9..8e63831971d5 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt | |||
@@ -263,14 +263,8 @@ A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o. | |||
263 | The generic i/o scheduler would make sure that it places the barrier request and | 263 | The generic i/o scheduler would make sure that it places the barrier request and |
264 | all other requests coming after it after all the previous requests in the | 264 | all other requests coming after it after all the previous requests in the |
265 | queue. Barriers may be implemented in different ways depending on the | 265 | queue. Barriers may be implemented in different ways depending on the |
266 | driver. A SCSI driver for example could make use of ordered tags to | 266 | driver. For more details regarding I/O barriers, please read barrier.txt |
267 | preserve the necessary ordering with a lower impact on throughput. For IDE | 267 | in this directory. |
268 | this might be two sync cache flush: a pre and post flush when encountering | ||
269 | a barrier write. | ||
270 | |||
271 | There is a provision for queues to indicate what kind of barriers they | ||
272 | can provide. This is as of yet unmerged, details will be added here once it | ||
273 | is in the kernel. | ||
274 | 268 | ||
275 | 1.2.2 Request Priority/Latency | 269 | 1.2.2 Request Priority/Latency |
276 | 270 | ||
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index cb13b963f7ae..9474501dd6cc 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -47,17 +47,6 @@ Who: Paul E. McKenney <paulmck@us.ibm.com> | |||
47 | 47 | ||
48 | --------------------------- | 48 | --------------------------- |
49 | 49 | ||
50 | What: IEEE1394 Audio and Music Data Transmission Protocol driver, | ||
51 | Connection Management Procedures driver | ||
52 | When: November 2005 | ||
53 | Files: drivers/ieee1394/{amdtp,cmp}* | ||
54 | Why: These are incomplete, have never worked, and are better implemented | ||
55 | in userland via raw1394 (see http://freebob.sourceforge.net/ for | ||
56 | example.) | ||
57 | Who: Jody McIntyre <scjody@steamballoon.com> | ||
58 | |||
59 | --------------------------- | ||
60 | |||
61 | What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN | 50 | What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN |
62 | When: November 2005 | 51 | When: November 2005 |
63 | Why: Deprecated in favour of the new ioctl-based rawiso interface, which is | 52 | Why: Deprecated in favour of the new ioctl-based rawiso interface, which is |
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 7e17712f3229..74052d22d868 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX | |||
@@ -12,10 +12,14 @@ cifs.txt | |||
12 | - description of the CIFS filesystem | 12 | - description of the CIFS filesystem |
13 | coda.txt | 13 | coda.txt |
14 | - description of the CODA filesystem. | 14 | - description of the CODA filesystem. |
15 | configfs/ | ||
16 | - directory containing configfs documentation and example code. | ||
15 | cramfs.txt | 17 | cramfs.txt |
16 | - info on the cram filesystem for small storage (ROMs etc) | 18 | - info on the cram filesystem for small storage (ROMs etc) |
17 | devfs/ | 19 | devfs/ |
18 | - directory containing devfs documentation. | 20 | - directory containing devfs documentation. |
21 | dlmfs.txt | ||
22 | - info on the userspace interface to the OCFS2 DLM. | ||
19 | ext2.txt | 23 | ext2.txt |
20 | - info, mount options and specifications for the Ext2 filesystem. | 24 | - info, mount options and specifications for the Ext2 filesystem. |
21 | hpfs.txt | 25 | hpfs.txt |
@@ -30,6 +34,8 @@ ntfs.txt | |||
30 | - info and mount options for the NTFS filesystem (Windows NT). | 34 | - info and mount options for the NTFS filesystem (Windows NT). |
31 | proc.txt | 35 | proc.txt |
32 | - info on Linux's /proc filesystem. | 36 | - info on Linux's /proc filesystem. |
37 | ocfs2.txt | ||
38 | - info and mount options for the OCFS2 clustered filesystem. | ||
33 | romfs.txt | 39 | romfs.txt |
34 | - Description of the ROMFS filesystem. | 40 | - Description of the ROMFS filesystem. |
35 | smbfs.txt | 41 | smbfs.txt |
diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt new file mode 100644 index 000000000000..c4ff96b7c4e0 --- /dev/null +++ b/Documentation/filesystems/configfs/configfs.txt | |||
@@ -0,0 +1,434 @@ | |||
1 | |||
2 | configfs - Userspace-driven kernel object configuation. | ||
3 | |||
4 | Joel Becker <joel.becker@oracle.com> | ||
5 | |||
6 | Updated: 31 March 2005 | ||
7 | |||
8 | Copyright (c) 2005 Oracle Corporation, | ||
9 | Joel Becker <joel.becker@oracle.com> | ||
10 | |||
11 | |||
12 | [What is configfs?] | ||
13 | |||
14 | configfs is a ram-based filesystem that provides the converse of | ||
15 | sysfs's functionality. Where sysfs is a filesystem-based view of | ||
16 | kernel objects, configfs is a filesystem-based manager of kernel | ||
17 | objects, or config_items. | ||
18 | |||
19 | With sysfs, an object is created in kernel (for example, when a device | ||
20 | is discovered) and it is registered with sysfs. Its attributes then | ||
21 | appear in sysfs, allowing userspace to read the attributes via | ||
22 | readdir(3)/read(2). It may allow some attributes to be modified via | ||
23 | write(2). The important point is that the object is created and | ||
24 | destroyed in kernel, the kernel controls the lifecycle of the sysfs | ||
25 | representation, and sysfs is merely a window on all this. | ||
26 | |||
27 | A configfs config_item is created via an explicit userspace operation: | ||
28 | mkdir(2). It is destroyed via rmdir(2). The attributes appear at | ||
29 | mkdir(2) time, and can be read or modified via read(2) and write(2). | ||
30 | As with sysfs, readdir(3) queries the list of items and/or attributes. | ||
31 | symlink(2) can be used to group items together. Unlike sysfs, the | ||
32 | lifetime of the representation is completely driven by userspace. The | ||
33 | kernel modules backing the items must respond to this. | ||
34 | |||
35 | Both sysfs and configfs can and should exist together on the same | ||
36 | system. One is not a replacement for the other. | ||
37 | |||
38 | [Using configfs] | ||
39 | |||
40 | configfs can be compiled as a module or into the kernel. You can access | ||
41 | it by doing | ||
42 | |||
43 | mount -t configfs none /config | ||
44 | |||
45 | The configfs tree will be empty unless client modules are also loaded. | ||
46 | These are modules that register their item types with configfs as | ||
47 | subsystems. Once a client subsystem is loaded, it will appear as a | ||
48 | subdirectory (or more than one) under /config. Like sysfs, the | ||
49 | configfs tree is always there, whether mounted on /config or not. | ||
50 | |||
51 | An item is created via mkdir(2). The item's attributes will also | ||
52 | appear at this time. readdir(3) can determine what the attributes are, | ||
53 | read(2) can query their default values, and write(2) can store new | ||
54 | values. Like sysfs, attributes should be ASCII text files, preferably | ||
55 | with only one value per file. The same efficiency caveats from sysfs | ||
56 | apply. Don't mix more than one attribute in one attribute file. | ||
57 | |||
58 | Like sysfs, configfs expects write(2) to store the entire buffer at | ||
59 | once. When writing to configfs attributes, userspace processes should | ||
60 | first read the entire file, modify the portions they wish to change, and | ||
61 | then write the entire buffer back. Attribute files have a maximum size | ||
62 | of one page (PAGE_SIZE, 4096 on i386). | ||
63 | |||
64 | When an item needs to be destroyed, remove it with rmdir(2). An | ||
65 | item cannot be destroyed if any other item has a link to it (via | ||
66 | symlink(2)). Links can be removed via unlink(2). | ||
67 | |||
68 | [Configuring FakeNBD: an Example] | ||
69 | |||
70 | Imagine there's a Network Block Device (NBD) driver that allows you to | ||
71 | access remote block devices. Call it FakeNBD. FakeNBD uses configfs | ||
72 | for its configuration. Obviously, there will be a nice program that | ||
73 | sysadmins use to configure FakeNBD, but somehow that program has to tell | ||
74 | the driver about it. Here's where configfs comes in. | ||
75 | |||
76 | When the FakeNBD driver is loaded, it registers itself with configfs. | ||
77 | readdir(3) sees this just fine: | ||
78 | |||
79 | # ls /config | ||
80 | fakenbd | ||
81 | |||
82 | A fakenbd connection can be created with mkdir(2). The name is | ||
83 | arbitrary, but likely the tool will make some use of the name. Perhaps | ||
84 | it is a uuid or a disk name: | ||
85 | |||
86 | # mkdir /config/fakenbd/disk1 | ||
87 | # ls /config/fakenbd/disk1 | ||
88 | target device rw | ||
89 | |||
90 | The target attribute contains the IP address of the server FakeNBD will | ||
91 | connect to. The device attribute is the device on the server. | ||
92 | Predictably, the rw attribute determines whether the connection is | ||
93 | read-only or read-write. | ||
94 | |||
95 | # echo 10.0.0.1 > /config/fakenbd/disk1/target | ||
96 | # echo /dev/sda1 > /config/fakenbd/disk1/device | ||
97 | # echo 1 > /config/fakenbd/disk1/rw | ||
98 | |||
99 | That's it. That's all there is. Now the device is configured, via the | ||
100 | shell no less. | ||
101 | |||
102 | [Coding With configfs] | ||
103 | |||
104 | Every object in configfs is a config_item. A config_item reflects an | ||
105 | object in the subsystem. It has attributes that match values on that | ||
106 | object. configfs handles the filesystem representation of that object | ||
107 | and its attributes, allowing the subsystem to ignore all but the | ||
108 | basic show/store interaction. | ||
109 | |||
110 | Items are created and destroyed inside a config_group. A group is a | ||
111 | collection of items that share the same attributes and operations. | ||
112 | Items are created by mkdir(2) and removed by rmdir(2), but configfs | ||
113 | handles that. The group has a set of operations to perform these tasks | ||
114 | |||
115 | A subsystem is the top level of a client module. During initialization, | ||
116 | the client module registers the subsystem with configfs, the subsystem | ||
117 | appears as a directory at the top of the configfs filesystem. A | ||
118 | subsystem is also a config_group, and can do everything a config_group | ||
119 | can. | ||
120 | |||
121 | [struct config_item] | ||
122 | |||
123 | struct config_item { | ||
124 | char *ci_name; | ||
125 | char ci_namebuf[UOBJ_NAME_LEN]; | ||
126 | struct kref ci_kref; | ||
127 | struct list_head ci_entry; | ||
128 | struct config_item *ci_parent; | ||
129 | struct config_group *ci_group; | ||
130 | struct config_item_type *ci_type; | ||
131 | struct dentry *ci_dentry; | ||
132 | }; | ||
133 | |||
134 | void config_item_init(struct config_item *); | ||
135 | void config_item_init_type_name(struct config_item *, | ||
136 | const char *name, | ||
137 | struct config_item_type *type); | ||
138 | struct config_item *config_item_get(struct config_item *); | ||
139 | void config_item_put(struct config_item *); | ||
140 | |||
141 | Generally, struct config_item is embedded in a container structure, a | ||
142 | structure that actually represents what the subsystem is doing. The | ||
143 | config_item portion of that structure is how the object interacts with | ||
144 | configfs. | ||
145 | |||
146 | Whether statically defined in a source file or created by a parent | ||
147 | config_group, a config_item must have one of the _init() functions | ||
148 | called on it. This initializes the reference count and sets up the | ||
149 | appropriate fields. | ||
150 | |||
151 | All users of a config_item should have a reference on it via | ||
152 | config_item_get(), and drop the reference when they are done via | ||
153 | config_item_put(). | ||
154 | |||
155 | By itself, a config_item cannot do much more than appear in configfs. | ||
156 | Usually a subsystem wants the item to display and/or store attributes, | ||
157 | among other things. For that, it needs a type. | ||
158 | |||
159 | [struct config_item_type] | ||
160 | |||
161 | struct configfs_item_operations { | ||
162 | void (*release)(struct config_item *); | ||
163 | ssize_t (*show_attribute)(struct config_item *, | ||
164 | struct configfs_attribute *, | ||
165 | char *); | ||
166 | ssize_t (*store_attribute)(struct config_item *, | ||
167 | struct configfs_attribute *, | ||
168 | const char *, size_t); | ||
169 | int (*allow_link)(struct config_item *src, | ||
170 | struct config_item *target); | ||
171 | int (*drop_link)(struct config_item *src, | ||
172 | struct config_item *target); | ||
173 | }; | ||
174 | |||
175 | struct config_item_type { | ||
176 | struct module *ct_owner; | ||
177 | struct configfs_item_operations *ct_item_ops; | ||
178 | struct configfs_group_operations *ct_group_ops; | ||
179 | struct configfs_attribute **ct_attrs; | ||
180 | }; | ||
181 | |||
182 | The most basic function of a config_item_type is to define what | ||
183 | operations can be performed on a config_item. All items that have been | ||
184 | allocated dynamically will need to provide the ct_item_ops->release() | ||
185 | method. This method is called when the config_item's reference count | ||
186 | reaches zero. Items that wish to display an attribute need to provide | ||
187 | the ct_item_ops->show_attribute() method. Similarly, storing a new | ||
188 | attribute value uses the store_attribute() method. | ||
189 | |||
190 | [struct configfs_attribute] | ||
191 | |||
192 | struct configfs_attribute { | ||
193 | char *ca_name; | ||
194 | struct module *ca_owner; | ||
195 | mode_t ca_mode; | ||
196 | }; | ||
197 | |||
198 | When a config_item wants an attribute to appear as a file in the item's | ||
199 | configfs directory, it must define a configfs_attribute describing it. | ||
200 | It then adds the attribute to the NULL-terminated array | ||
201 | config_item_type->ct_attrs. When the item appears in configfs, the | ||
202 | attribute file will appear with the configfs_attribute->ca_name | ||
203 | filename. configfs_attribute->ca_mode specifies the file permissions. | ||
204 | |||
205 | If an attribute is readable and the config_item provides a | ||
206 | ct_item_ops->show_attribute() method, that method will be called | ||
207 | whenever userspace asks for a read(2) on the attribute. The converse | ||
208 | will happen for write(2). | ||
209 | |||
210 | [struct config_group] | ||
211 | |||
212 | A config_item cannot live in a vaccum. The only way one can be created | ||
213 | is via mkdir(2) on a config_group. This will trigger creation of a | ||
214 | child item. | ||
215 | |||
216 | struct config_group { | ||
217 | struct config_item cg_item; | ||
218 | struct list_head cg_children; | ||
219 | struct configfs_subsystem *cg_subsys; | ||
220 | struct config_group **default_groups; | ||
221 | }; | ||
222 | |||
223 | void config_group_init(struct config_group *group); | ||
224 | void config_group_init_type_name(struct config_group *group, | ||
225 | const char *name, | ||
226 | struct config_item_type *type); | ||
227 | |||
228 | |||
229 | The config_group structure contains a config_item. Properly configuring | ||
230 | that item means that a group can behave as an item in its own right. | ||
231 | However, it can do more: it can create child items or groups. This is | ||
232 | accomplished via the group operations specified on the group's | ||
233 | config_item_type. | ||
234 | |||
235 | struct configfs_group_operations { | ||
236 | struct config_item *(*make_item)(struct config_group *group, | ||
237 | const char *name); | ||
238 | struct config_group *(*make_group)(struct config_group *group, | ||
239 | const char *name); | ||
240 | int (*commit_item)(struct config_item *item); | ||
241 | void (*drop_item)(struct config_group *group, | ||
242 | struct config_item *item); | ||
243 | }; | ||
244 | |||
245 | A group creates child items by providing the | ||
246 | ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the group's directory. The subsystem allocates a new | ||
247 | config_item (or more likely, its container structure), initializes it, | ||
248 | and returns it to configfs. Configfs will then populate the filesystem | ||
249 | tree to reflect the new item. | ||
250 | |||
251 | If the subsystem wants the child to be a group itself, the subsystem | ||
252 | provides ct_group_ops->make_group(). Everything else behaves the same, | ||
253 | using the group _init() functions on the group. | ||
254 | |||
255 | Finally, when userspace calls rmdir(2) on the item or group, | ||
256 | ct_group_ops->drop_item() is called. As a config_group is also a | ||
257 | config_item, it is not necessary for a seperate drop_group() method. | ||
258 | The subsystem must config_item_put() the reference that was initialized | ||
259 | upon item allocation. If a subsystem has no work to do, it may omit | ||
260 | the ct_group_ops->drop_item() method, and configfs will call | ||
261 | config_item_put() on the item on behalf of the subsystem. | ||
262 | |||
263 | IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2) | ||
264 | is called, configfs WILL remove the item from the filesystem tree | ||
265 | (assuming that it has no children to keep it busy). The subsystem is | ||
266 | responsible for responding to this. If the subsystem has references to | ||
267 | the item in other threads, the memory is safe. It may take some time | ||
268 | for the item to actually disappear from the subsystem's usage. But it | ||
269 | is gone from configfs. | ||
270 | |||
271 | A config_group cannot be removed while it still has child items. This | ||
272 | is implemented in the configfs rmdir(2) code. ->drop_item() will not be | ||
273 | called, as the item has not been dropped. rmdir(2) will fail, as the | ||
274 | directory is not empty. | ||
275 | |||
276 | [struct configfs_subsystem] | ||
277 | |||
278 | A subsystem must register itself, ususally at module_init time. This | ||
279 | tells configfs to make the subsystem appear in the file tree. | ||
280 | |||
281 | struct configfs_subsystem { | ||
282 | struct config_group su_group; | ||
283 | struct semaphore su_sem; | ||
284 | }; | ||
285 | |||
286 | int configfs_register_subsystem(struct configfs_subsystem *subsys); | ||
287 | void configfs_unregister_subsystem(struct configfs_subsystem *subsys); | ||
288 | |||
289 | A subsystem consists of a toplevel config_group and a semaphore. | ||
290 | The group is where child config_items are created. For a subsystem, | ||
291 | this group is usually defined statically. Before calling | ||
292 | configfs_register_subsystem(), the subsystem must have initialized the | ||
293 | group via the usual group _init() functions, and it must also have | ||
294 | initialized the semaphore. | ||
295 | When the register call returns, the subsystem is live, and it | ||
296 | will be visible via configfs. At that point, mkdir(2) can be called and | ||
297 | the subsystem must be ready for it. | ||
298 | |||
299 | [An Example] | ||
300 | |||
301 | The best example of these basic concepts is the simple_children | ||
302 | subsystem/group and the simple_child item in configfs_example.c It | ||
303 | shows a trivial object displaying and storing an attribute, and a simple | ||
304 | group creating and destroying these children. | ||
305 | |||
306 | [Hierarchy Navigation and the Subsystem Semaphore] | ||
307 | |||
308 | There is an extra bonus that configfs provides. The config_groups and | ||
309 | config_items are arranged in a hierarchy due to the fact that they | ||
310 | appear in a filesystem. A subsystem is NEVER to touch the filesystem | ||
311 | parts, but the subsystem might be interested in this hierarchy. For | ||
312 | this reason, the hierarchy is mirrored via the config_group->cg_children | ||
313 | and config_item->ci_parent structure members. | ||
314 | |||
315 | A subsystem can navigate the cg_children list and the ci_parent pointer | ||
316 | to see the tree created by the subsystem. This can race with configfs' | ||
317 | management of the hierarchy, so configfs uses the subsystem semaphore to | ||
318 | protect modifications. Whenever a subsystem wants to navigate the | ||
319 | hierarchy, it must do so under the protection of the subsystem | ||
320 | semaphore. | ||
321 | |||
322 | A subsystem will be prevented from acquiring the semaphore while a newly | ||
323 | allocated item has not been linked into this hierarchy. Similarly, it | ||
324 | will not be able to acquire the semaphore while a dropping item has not | ||
325 | yet been unlinked. This means that an item's ci_parent pointer will | ||
326 | never be NULL while the item is in configfs, and that an item will only | ||
327 | be in its parent's cg_children list for the same duration. This allows | ||
328 | a subsystem to trust ci_parent and cg_children while they hold the | ||
329 | semaphore. | ||
330 | |||
331 | [Item Aggregation Via symlink(2)] | ||
332 | |||
333 | configfs provides a simple group via the group->item parent/child | ||
334 | relationship. Often, however, a larger environment requires aggregation | ||
335 | outside of the parent/child connection. This is implemented via | ||
336 | symlink(2). | ||
337 | |||
338 | A config_item may provide the ct_item_ops->allow_link() and | ||
339 | ct_item_ops->drop_link() methods. If the ->allow_link() method exists, | ||
340 | symlink(2) may be called with the config_item as the source of the link. | ||
341 | These links are only allowed between configfs config_items. Any | ||
342 | symlink(2) attempt outside the configfs filesystem will be denied. | ||
343 | |||
344 | When symlink(2) is called, the source config_item's ->allow_link() | ||
345 | method is called with itself and a target item. If the source item | ||
346 | allows linking to target item, it returns 0. A source item may wish to | ||
347 | reject a link if it only wants links to a certain type of object (say, | ||
348 | in its own subsystem). | ||
349 | |||
350 | When unlink(2) is called on the symbolic link, the source item is | ||
351 | notified via the ->drop_link() method. Like the ->drop_item() method, | ||
352 | this is a void function and cannot return failure. The subsystem is | ||
353 | responsible for responding to the change. | ||
354 | |||
355 | A config_item cannot be removed while it links to any other item, nor | ||
356 | can it be removed while an item links to it. Dangling symlinks are not | ||
357 | allowed in configfs. | ||
358 | |||
359 | [Automatically Created Subgroups] | ||
360 | |||
361 | A new config_group may want to have two types of child config_items. | ||
362 | While this could be codified by magic names in ->make_item(), it is much | ||
363 | more explicit to have a method whereby userspace sees this divergence. | ||
364 | |||
365 | Rather than have a group where some items behave differently than | ||
366 | others, configfs provides a method whereby one or many subgroups are | ||
367 | automatically created inside the parent at its creation. Thus, | ||
368 | mkdir("parent) results in "parent", "parent/subgroup1", up through | ||
369 | "parent/subgroupN". Items of type 1 can now be created in | ||
370 | "parent/subgroup1", and items of type N can be created in | ||
371 | "parent/subgroupN". | ||
372 | |||
373 | These automatic subgroups, or default groups, do not preclude other | ||
374 | children of the parent group. If ct_group_ops->make_group() exists, | ||
375 | other child groups can be created on the parent group directly. | ||
376 | |||
377 | A configfs subsystem specifies default groups by filling in the | ||
378 | NULL-terminated array default_groups on the config_group structure. | ||
379 | Each group in that array is populated in the configfs tree at the same | ||
380 | time as the parent group. Similarly, they are removed at the same time | ||
381 | as the parent. No extra notification is provided. When a ->drop_item() | ||
382 | method call notifies the subsystem the parent group is going away, it | ||
383 | also means every default group child associated with that parent group. | ||
384 | |||
385 | As a consequence of this, default_groups cannot be removed directly via | ||
386 | rmdir(2). They also are not considered when rmdir(2) on the parent | ||
387 | group is checking for children. | ||
388 | |||
389 | [Committable Items] | ||
390 | |||
391 | NOTE: Committable items are currently unimplemented. | ||
392 | |||
393 | Some config_items cannot have a valid initial state. That is, no | ||
394 | default values can be specified for the item's attributes such that the | ||
395 | item can do its work. Userspace must configure one or more attributes, | ||
396 | after which the subsystem can start whatever entity this item | ||
397 | represents. | ||
398 | |||
399 | Consider the FakeNBD device from above. Without a target address *and* | ||
400 | a target device, the subsystem has no idea what block device to import. | ||
401 | The simple example assumes that the subsystem merely waits until all the | ||
402 | appropriate attributes are configured, and then connects. This will, | ||
403 | indeed, work, but now every attribute store must check if the attributes | ||
404 | are initialized. Every attribute store must fire off the connection if | ||
405 | that condition is met. | ||
406 | |||
407 | Far better would be an explicit action notifying the subsystem that the | ||
408 | config_item is ready to go. More importantly, an explicit action allows | ||
409 | the subsystem to provide feedback as to whether the attibutes are | ||
410 | initialized in a way that makes sense. configfs provides this as | ||
411 | committable items. | ||
412 | |||
413 | configfs still uses only normal filesystem operations. An item is | ||
414 | committed via rename(2). The item is moved from a directory where it | ||
415 | can be modified to a directory where it cannot. | ||
416 | |||
417 | Any group that provides the ct_group_ops->commit_item() method has | ||
418 | committable items. When this group appears in configfs, mkdir(2) will | ||
419 | not work directly in the group. Instead, the group will have two | ||
420 | subdirectories: "live" and "pending". The "live" directory does not | ||
421 | support mkdir(2) or rmdir(2) either. It only allows rename(2). The | ||
422 | "pending" directory does allow mkdir(2) and rmdir(2). An item is | ||
423 | created in the "pending" directory. Its attributes can be modified at | ||
424 | will. Userspace commits the item by renaming it into the "live" | ||
425 | directory. At this point, the subsystem recieves the ->commit_item() | ||
426 | callback. If all required attributes are filled to satisfaction, the | ||
427 | method returns zero and the item is moved to the "live" directory. | ||
428 | |||
429 | As rmdir(2) does not work in the "live" directory, an item must be | ||
430 | shutdown, or "uncommitted". Again, this is done via rename(2), this | ||
431 | time from the "live" directory back to the "pending" one. The subsystem | ||
432 | is notified by the ct_group_ops->uncommit_object() method. | ||
433 | |||
434 | |||
diff --git a/Documentation/filesystems/configfs/configfs_example.c b/Documentation/filesystems/configfs/configfs_example.c new file mode 100644 index 000000000000..f3c6e4946f98 --- /dev/null +++ b/Documentation/filesystems/configfs/configfs_example.c | |||
@@ -0,0 +1,474 @@ | |||
1 | /* | ||
2 | * vim: noexpandtab ts=8 sts=0 sw=8: | ||
3 | * | ||
4 | * configfs_example.c - This file is a demonstration module containing | ||
5 | * a number of configfs subsystems. | ||
6 | * | ||
7 | * This program is free software; you can redistribute it and/or | ||
8 | * modify it under the terms of the GNU General Public | ||
9 | * License as published by the Free Software Foundation; either | ||
10 | * version 2 of the License, or (at your option) any later version. | ||
11 | * | ||
12 | * This program is distributed in the hope that it will be useful, | ||
13 | * but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
14 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | ||
15 | * General Public License for more details. | ||
16 | * | ||
17 | * You should have received a copy of the GNU General Public | ||
18 | * License along with this program; if not, write to the | ||
19 | * Free Software Foundation, Inc., 59 Temple Place - Suite 330, | ||
20 | * Boston, MA 021110-1307, USA. | ||
21 | * | ||
22 | * Based on sysfs: | ||
23 | * sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel | ||
24 | * | ||
25 | * configfs Copyright (C) 2005 Oracle. All rights reserved. | ||
26 | */ | ||
27 | |||
28 | #include <linux/init.h> | ||
29 | #include <linux/module.h> | ||
30 | #include <linux/slab.h> | ||
31 | |||
32 | #include <linux/configfs.h> | ||
33 | |||
34 | |||
35 | |||
36 | /* | ||
37 | * 01-childless | ||
38 | * | ||
39 | * This first example is a childless subsystem. It cannot create | ||
40 | * any config_items. It just has attributes. | ||
41 | * | ||
42 | * Note that we are enclosing the configfs_subsystem inside a container. | ||
43 | * This is not necessary if a subsystem has no attributes directly | ||
44 | * on the subsystem. See the next example, 02-simple-children, for | ||
45 | * such a subsystem. | ||
46 | */ | ||
47 | |||
48 | struct childless { | ||
49 | struct configfs_subsystem subsys; | ||
50 | int showme; | ||
51 | int storeme; | ||
52 | }; | ||
53 | |||
54 | struct childless_attribute { | ||
55 | struct configfs_attribute attr; | ||
56 | ssize_t (*show)(struct childless *, char *); | ||
57 | ssize_t (*store)(struct childless *, const char *, size_t); | ||
58 | }; | ||
59 | |||
60 | static inline struct childless *to_childless(struct config_item *item) | ||
61 | { | ||
62 | return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL; | ||
63 | } | ||
64 | |||
65 | static ssize_t childless_showme_read(struct childless *childless, | ||
66 | char *page) | ||
67 | { | ||
68 | ssize_t pos; | ||
69 | |||
70 | pos = sprintf(page, "%d\n", childless->showme); | ||
71 | childless->showme++; | ||
72 | |||
73 | return pos; | ||
74 | } | ||
75 | |||
76 | static ssize_t childless_storeme_read(struct childless *childless, | ||
77 | char *page) | ||
78 | { | ||
79 | return sprintf(page, "%d\n", childless->storeme); | ||
80 | } | ||
81 | |||
82 | static ssize_t childless_storeme_write(struct childless *childless, | ||
83 | const char *page, | ||
84 | size_t count) | ||
85 | { | ||
86 | unsigned long tmp; | ||
87 | char *p = (char *) page; | ||
88 | |||
89 | tmp = simple_strtoul(p, &p, 10); | ||
90 | if (!p || (*p && (*p != '\n'))) | ||
91 | return -EINVAL; | ||
92 | |||
93 | if (tmp > INT_MAX) | ||
94 | return -ERANGE; | ||
95 | |||
96 | childless->storeme = tmp; | ||
97 | |||
98 | return count; | ||
99 | } | ||
100 | |||
101 | static ssize_t childless_description_read(struct childless *childless, | ||
102 | char *page) | ||
103 | { | ||
104 | return sprintf(page, | ||
105 | "[01-childless]\n" | ||
106 | "\n" | ||
107 | "The childless subsystem is the simplest possible subsystem in\n" | ||
108 | "configfs. It does not support the creation of child config_items.\n" | ||
109 | "It only has a few attributes. In fact, it isn't much different\n" | ||
110 | "than a directory in /proc.\n"); | ||
111 | } | ||
112 | |||
113 | static struct childless_attribute childless_attr_showme = { | ||
114 | .attr = { .ca_owner = THIS_MODULE, .ca_name = "showme", .ca_mode = S_IRUGO }, | ||
115 | .show = childless_showme_read, | ||
116 | }; | ||
117 | static struct childless_attribute childless_attr_storeme = { | ||
118 | .attr = { .ca_owner = THIS_MODULE, .ca_name = "storeme", .ca_mode = S_IRUGO | S_IWUSR }, | ||
119 | .show = childless_storeme_read, | ||
120 | .store = childless_storeme_write, | ||
121 | }; | ||
122 | static struct childless_attribute childless_attr_description = { | ||
123 | .attr = { .ca_owner = THIS_MODULE, .ca_name = "description", .ca_mode = S_IRUGO }, | ||
124 | .show = childless_description_read, | ||
125 | }; | ||
126 | |||
127 | static struct configfs_attribute *childless_attrs[] = { | ||
128 | &childless_attr_showme.attr, | ||
129 | &childless_attr_storeme.attr, | ||
130 | &childless_attr_description.attr, | ||
131 | NULL, | ||
132 | }; | ||
133 | |||
134 | static ssize_t childless_attr_show(struct config_item *item, | ||
135 | struct configfs_attribute *attr, | ||
136 | char *page) | ||
137 | { | ||
138 | struct childless *childless = to_childless(item); | ||
139 | struct childless_attribute *childless_attr = | ||
140 | container_of(attr, struct childless_attribute, attr); | ||
141 | ssize_t ret = 0; | ||
142 | |||
143 | if (childless_attr->show) | ||
144 | ret = childless_attr->show(childless, page); | ||
145 | return ret; | ||
146 | } | ||
147 | |||
148 | static ssize_t childless_attr_store(struct config_item *item, | ||
149 | struct configfs_attribute *attr, | ||
150 | const char *page, size_t count) | ||
151 | { | ||
152 | struct childless *childless = to_childless(item); | ||
153 | struct childless_attribute *childless_attr = | ||
154 | container_of(attr, struct childless_attribute, attr); | ||
155 | ssize_t ret = -EINVAL; | ||
156 | |||
157 | if (childless_attr->store) | ||
158 | ret = childless_attr->store(childless, page, count); | ||
159 | return ret; | ||
160 | } | ||
161 | |||
162 | static struct configfs_item_operations childless_item_ops = { | ||
163 | .show_attribute = childless_attr_show, | ||
164 | .store_attribute = childless_attr_store, | ||
165 | }; | ||
166 | |||
167 | static struct config_item_type childless_type = { | ||
168 | .ct_item_ops = &childless_item_ops, | ||
169 | .ct_attrs = childless_attrs, | ||
170 | .ct_owner = THIS_MODULE, | ||
171 | }; | ||
172 | |||
173 | static struct childless childless_subsys = { | ||
174 | .subsys = { | ||
175 | .su_group = { | ||
176 | .cg_item = { | ||
177 | .ci_namebuf = "01-childless", | ||
178 | .ci_type = &childless_type, | ||
179 | }, | ||
180 | }, | ||
181 | }, | ||
182 | }; | ||
183 | |||
184 | |||
185 | /* ----------------------------------------------------------------- */ | ||
186 | |||
187 | /* | ||
188 | * 02-simple-children | ||
189 | * | ||
190 | * This example merely has a simple one-attribute child. Note that | ||
191 | * there is no extra attribute structure, as the child's attribute is | ||
192 | * known from the get-go. Also, there is no container for the | ||
193 | * subsystem, as it has no attributes of its own. | ||
194 | */ | ||
195 | |||
196 | struct simple_child { | ||
197 | struct config_item item; | ||
198 | int storeme; | ||
199 | }; | ||
200 | |||
201 | static inline struct simple_child *to_simple_child(struct config_item *item) | ||
202 | { | ||
203 | return item ? container_of(item, struct simple_child, item) : NULL; | ||
204 | } | ||
205 | |||
206 | static struct configfs_attribute simple_child_attr_storeme = { | ||
207 | .ca_owner = THIS_MODULE, | ||
208 | .ca_name = "storeme", | ||
209 | .ca_mode = S_IRUGO | S_IWUSR, | ||
210 | }; | ||
211 | |||
212 | static struct configfs_attribute *simple_child_attrs[] = { | ||
213 | &simple_child_attr_storeme, | ||
214 | NULL, | ||
215 | }; | ||
216 | |||
217 | static ssize_t simple_child_attr_show(struct config_item *item, | ||
218 | struct configfs_attribute *attr, | ||
219 | char *page) | ||
220 | { | ||
221 | ssize_t count; | ||
222 | struct simple_child *simple_child = to_simple_child(item); | ||
223 | |||
224 | count = sprintf(page, "%d\n", simple_child->storeme); | ||
225 | |||
226 | return count; | ||
227 | } | ||
228 | |||
229 | static ssize_t simple_child_attr_store(struct config_item *item, | ||
230 | struct configfs_attribute *attr, | ||
231 | const char *page, size_t count) | ||
232 | { | ||
233 | struct simple_child *simple_child = to_simple_child(item); | ||
234 | unsigned long tmp; | ||
235 | char *p = (char *) page; | ||
236 | |||
237 | tmp = simple_strtoul(p, &p, 10); | ||
238 | if (!p || (*p && (*p != '\n'))) | ||
239 | return -EINVAL; | ||
240 | |||
241 | if (tmp > INT_MAX) | ||
242 | return -ERANGE; | ||
243 | |||
244 | simple_child->storeme = tmp; | ||
245 | |||
246 | return count; | ||
247 | } | ||
248 | |||
249 | static void simple_child_release(struct config_item *item) | ||
250 | { | ||
251 | kfree(to_simple_child(item)); | ||
252 | } | ||
253 | |||
254 | static struct configfs_item_operations simple_child_item_ops = { | ||
255 | .release = simple_child_release, | ||
256 | .show_attribute = simple_child_attr_show, | ||
257 | .store_attribute = simple_child_attr_store, | ||
258 | }; | ||
259 | |||
260 | static struct config_item_type simple_child_type = { | ||
261 | .ct_item_ops = &simple_child_item_ops, | ||
262 | .ct_attrs = simple_child_attrs, | ||
263 | .ct_owner = THIS_MODULE, | ||
264 | }; | ||
265 | |||
266 | |||
267 | static struct config_item *simple_children_make_item(struct config_group *group, const char *name) | ||
268 | { | ||
269 | struct simple_child *simple_child; | ||
270 | |||
271 | simple_child = kmalloc(sizeof(struct simple_child), GFP_KERNEL); | ||
272 | if (!simple_child) | ||
273 | return NULL; | ||
274 | |||
275 | memset(simple_child, 0, sizeof(struct simple_child)); | ||
276 | |||
277 | config_item_init_type_name(&simple_child->item, name, | ||
278 | &simple_child_type); | ||
279 | |||
280 | simple_child->storeme = 0; | ||
281 | |||
282 | return &simple_child->item; | ||
283 | } | ||
284 | |||
285 | static struct configfs_attribute simple_children_attr_description = { | ||
286 | .ca_owner = THIS_MODULE, | ||
287 | .ca_name = "description", | ||
288 | .ca_mode = S_IRUGO, | ||
289 | }; | ||
290 | |||
291 | static struct configfs_attribute *simple_children_attrs[] = { | ||
292 | &simple_children_attr_description, | ||
293 | NULL, | ||
294 | }; | ||
295 | |||
296 | static ssize_t simple_children_attr_show(struct config_item *item, | ||
297 | struct configfs_attribute *attr, | ||
298 | char *page) | ||
299 | { | ||
300 | return sprintf(page, | ||
301 | "[02-simple-children]\n" | ||
302 | "\n" | ||
303 | "This subsystem allows the creation of child config_items. These\n" | ||
304 | "items have only one attribute that is readable and writeable.\n"); | ||
305 | } | ||
306 | |||
307 | static struct configfs_item_operations simple_children_item_ops = { | ||
308 | .show_attribute = simple_children_attr_show, | ||
309 | }; | ||
310 | |||
311 | /* | ||
312 | * Note that, since no extra work is required on ->drop_item(), | ||
313 | * no ->drop_item() is provided. | ||
314 | */ | ||
315 | static struct configfs_group_operations simple_children_group_ops = { | ||
316 | .make_item = simple_children_make_item, | ||
317 | }; | ||
318 | |||
319 | static struct config_item_type simple_children_type = { | ||
320 | .ct_item_ops = &simple_children_item_ops, | ||
321 | .ct_group_ops = &simple_children_group_ops, | ||
322 | .ct_attrs = simple_children_attrs, | ||
323 | }; | ||
324 | |||
325 | static struct configfs_subsystem simple_children_subsys = { | ||
326 | .su_group = { | ||
327 | .cg_item = { | ||
328 | .ci_namebuf = "02-simple-children", | ||
329 | .ci_type = &simple_children_type, | ||
330 | }, | ||
331 | }, | ||
332 | }; | ||
333 | |||
334 | |||
335 | /* ----------------------------------------------------------------- */ | ||
336 | |||
337 | /* | ||
338 | * 03-group-children | ||
339 | * | ||
340 | * This example reuses the simple_children group from above. However, | ||
341 | * the simple_children group is not the subsystem itself, it is a | ||
342 | * child of the subsystem. Creation of a group in the subsystem creates | ||
343 | * a new simple_children group. That group can then have simple_child | ||
344 | * children of its own. | ||
345 | */ | ||
346 | |||
347 | struct simple_children { | ||
348 | struct config_group group; | ||
349 | }; | ||
350 | |||
351 | static struct config_group *group_children_make_group(struct config_group *group, const char *name) | ||
352 | { | ||
353 | struct simple_children *simple_children; | ||
354 | |||
355 | simple_children = kmalloc(sizeof(struct simple_children), | ||
356 | GFP_KERNEL); | ||
357 | if (!simple_children) | ||
358 | return NULL; | ||
359 | |||
360 | memset(simple_children, 0, sizeof(struct simple_children)); | ||
361 | |||
362 | config_group_init_type_name(&simple_children->group, name, | ||
363 | &simple_children_type); | ||
364 | |||
365 | return &simple_children->group; | ||
366 | } | ||
367 | |||
368 | static struct configfs_attribute group_children_attr_description = { | ||
369 | .ca_owner = THIS_MODULE, | ||
370 | .ca_name = "description", | ||
371 | .ca_mode = S_IRUGO, | ||
372 | }; | ||
373 | |||
374 | static struct configfs_attribute *group_children_attrs[] = { | ||
375 | &group_children_attr_description, | ||
376 | NULL, | ||
377 | }; | ||
378 | |||
379 | static ssize_t group_children_attr_show(struct config_item *item, | ||
380 | struct configfs_attribute *attr, | ||
381 | char *page) | ||
382 | { | ||
383 | return sprintf(page, | ||
384 | "[03-group-children]\n" | ||
385 | "\n" | ||
386 | "This subsystem allows the creation of child config_groups. These\n" | ||
387 | "groups are like the subsystem simple-children.\n"); | ||
388 | } | ||
389 | |||
390 | static struct configfs_item_operations group_children_item_ops = { | ||
391 | .show_attribute = group_children_attr_show, | ||
392 | }; | ||
393 | |||
394 | /* | ||
395 | * Note that, since no extra work is required on ->drop_item(), | ||
396 | * no ->drop_item() is provided. | ||
397 | */ | ||
398 | static struct configfs_group_operations group_children_group_ops = { | ||
399 | .make_group = group_children_make_group, | ||
400 | }; | ||
401 | |||
402 | static struct config_item_type group_children_type = { | ||
403 | .ct_item_ops = &group_children_item_ops, | ||
404 | .ct_group_ops = &group_children_group_ops, | ||
405 | .ct_attrs = group_children_attrs, | ||
406 | }; | ||
407 | |||
408 | static struct configfs_subsystem group_children_subsys = { | ||
409 | .su_group = { | ||
410 | .cg_item = { | ||
411 | .ci_namebuf = "03-group-children", | ||
412 | .ci_type = &group_children_type, | ||
413 | }, | ||
414 | }, | ||
415 | }; | ||
416 | |||
417 | /* ----------------------------------------------------------------- */ | ||
418 | |||
419 | /* | ||
420 | * We're now done with our subsystem definitions. | ||
421 | * For convenience in this module, here's a list of them all. It | ||
422 | * allows the init function to easily register them. Most modules | ||
423 | * will only have one subsystem, and will only call register_subsystem | ||
424 | * on it directly. | ||
425 | */ | ||
426 | static struct configfs_subsystem *example_subsys[] = { | ||
427 | &childless_subsys.subsys, | ||
428 | &simple_children_subsys, | ||
429 | &group_children_subsys, | ||
430 | NULL, | ||
431 | }; | ||
432 | |||
433 | static int __init configfs_example_init(void) | ||
434 | { | ||
435 | int ret; | ||
436 | int i; | ||
437 | struct configfs_subsystem *subsys; | ||
438 | |||
439 | for (i = 0; example_subsys[i]; i++) { | ||
440 | subsys = example_subsys[i]; | ||
441 | |||
442 | config_group_init(&subsys->su_group); | ||
443 | init_MUTEX(&subsys->su_sem); | ||
444 | ret = configfs_register_subsystem(subsys); | ||
445 | if (ret) { | ||
446 | printk(KERN_ERR "Error %d while registering subsystem %s\n", | ||
447 | ret, | ||
448 | subsys->su_group.cg_item.ci_namebuf); | ||
449 | goto out_unregister; | ||
450 | } | ||
451 | } | ||
452 | |||
453 | return 0; | ||
454 | |||
455 | out_unregister: | ||
456 | for (; i >= 0; i--) { | ||
457 | configfs_unregister_subsystem(example_subsys[i]); | ||
458 | } | ||
459 | |||
460 | return ret; | ||
461 | } | ||
462 | |||
463 | static void __exit configfs_example_exit(void) | ||
464 | { | ||
465 | int i; | ||
466 | |||
467 | for (i = 0; example_subsys[i]; i++) { | ||
468 | configfs_unregister_subsystem(example_subsys[i]); | ||
469 | } | ||
470 | } | ||
471 | |||
472 | module_init(configfs_example_init); | ||
473 | module_exit(configfs_example_exit); | ||
474 | MODULE_LICENSE("GPL"); | ||
diff --git a/Documentation/filesystems/dlmfs.txt b/Documentation/filesystems/dlmfs.txt new file mode 100644 index 000000000000..9afab845a906 --- /dev/null +++ b/Documentation/filesystems/dlmfs.txt | |||
@@ -0,0 +1,130 @@ | |||
1 | dlmfs | ||
2 | ================== | ||
3 | A minimal DLM userspace interface implemented via a virtual file | ||
4 | system. | ||
5 | |||
6 | dlmfs is built with OCFS2 as it requires most of its infrastructure. | ||
7 | |||
8 | Project web page: http://oss.oracle.com/projects/ocfs2 | ||
9 | Tools web page: http://oss.oracle.com/projects/ocfs2-tools | ||
10 | OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/ | ||
11 | |||
12 | All code copyright 2005 Oracle except when otherwise noted. | ||
13 | |||
14 | CREDITS | ||
15 | ======= | ||
16 | |||
17 | Some code taken from ramfs which is Copyright (C) 2000 Linus Torvalds | ||
18 | and Transmeta Corp. | ||
19 | |||
20 | Mark Fasheh <mark.fasheh@oracle.com> | ||
21 | |||
22 | Caveats | ||
23 | ======= | ||
24 | - Right now it only works with the OCFS2 DLM, though support for other | ||
25 | DLM implementations should not be a major issue. | ||
26 | |||
27 | Mount options | ||
28 | ============= | ||
29 | None | ||
30 | |||
31 | Usage | ||
32 | ===== | ||
33 | |||
34 | If you're just interested in OCFS2, then please see ocfs2.txt. The | ||
35 | rest of this document will be geared towards those who want to use | ||
36 | dlmfs for easy to setup and easy to use clustered locking in | ||
37 | userspace. | ||
38 | |||
39 | Setup | ||
40 | ===== | ||
41 | |||
42 | dlmfs requires that the OCFS2 cluster infrastructure be in | ||
43 | place. Please download ocfs2-tools from the above url and configure a | ||
44 | cluster. | ||
45 | |||
46 | You'll want to start heartbeating on a volume which all the nodes in | ||
47 | your lockspace can access. The easiest way to do this is via | ||
48 | ocfs2_hb_ctl (distributed with ocfs2-tools). Right now it requires | ||
49 | that an OCFS2 file system be in place so that it can automatically | ||
50 | find it's heartbeat area, though it will eventually support heartbeat | ||
51 | against raw disks. | ||
52 | |||
53 | Please see the ocfs2_hb_ctl and mkfs.ocfs2 manual pages distributed | ||
54 | with ocfs2-tools. | ||
55 | |||
56 | Once you're heartbeating, DLM lock 'domains' can be easily created / | ||
57 | destroyed and locks within them accessed. | ||
58 | |||
59 | Locking | ||
60 | ======= | ||
61 | |||
62 | Users may access dlmfs via standard file system calls, or they can use | ||
63 | 'libo2dlm' (distributed with ocfs2-tools) which abstracts the file | ||
64 | system calls and presents a more traditional locking api. | ||
65 | |||
66 | dlmfs handles lock caching automatically for the user, so a lock | ||
67 | request for an already acquired lock will not generate another DLM | ||
68 | call. Userspace programs are assumed to handle their own local | ||
69 | locking. | ||
70 | |||
71 | Two levels of locks are supported - Shared Read, and Exlcusive. | ||
72 | Also supported is a Trylock operation. | ||
73 | |||
74 | For information on the libo2dlm interface, please see o2dlm.h, | ||
75 | distributed with ocfs2-tools. | ||
76 | |||
77 | Lock value blocks can be read and written to a resource via read(2) | ||
78 | and write(2) against the fd obtained via your open(2) call. The | ||
79 | maximum currently supported LVB length is 64 bytes (though that is an | ||
80 | OCFS2 DLM limitation). Through this mechanism, users of dlmfs can share | ||
81 | small amounts of data amongst their nodes. | ||
82 | |||
83 | mkdir(2) signals dlmfs to join a domain (which will have the same name | ||
84 | as the resulting directory) | ||
85 | |||
86 | rmdir(2) signals dlmfs to leave the domain | ||
87 | |||
88 | Locks for a given domain are represented by regular inodes inside the | ||
89 | domain directory. Locking against them is done via the open(2) system | ||
90 | call. | ||
91 | |||
92 | The open(2) call will not return until your lock has been granted or | ||
93 | an error has occurred, unless it has been instructed to do a trylock | ||
94 | operation. If the lock succeeds, you'll get an fd. | ||
95 | |||
96 | open(2) with O_CREAT to ensure the resource inode is created - dlmfs does | ||
97 | not automatically create inodes for existing lock resources. | ||
98 | |||
99 | Open Flag Lock Request Type | ||
100 | --------- ----------------- | ||
101 | O_RDONLY Shared Read | ||
102 | O_RDWR Exclusive | ||
103 | |||
104 | Open Flag Resulting Locking Behavior | ||
105 | --------- -------------------------- | ||
106 | O_NONBLOCK Trylock operation | ||
107 | |||
108 | You must provide exactly one of O_RDONLY or O_RDWR. | ||
109 | |||
110 | If O_NONBLOCK is also provided and the trylock operation was valid but | ||
111 | could not lock the resource then open(2) will return ETXTBUSY. | ||
112 | |||
113 | close(2) drops the lock associated with your fd. | ||
114 | |||
115 | Modes passed to mkdir(2) or open(2) are adhered to locally. Chown is | ||
116 | supported locally as well. This means you can use them to restrict | ||
117 | access to the resources via dlmfs on your local node only. | ||
118 | |||
119 | The resource LVB may be read from the fd in either Shared Read or | ||
120 | Exclusive modes via the read(2) system call. It can be written via | ||
121 | write(2) only when open in Exclusive mode. | ||
122 | |||
123 | Once written, an LVB will be visible to other nodes who obtain Read | ||
124 | Only or higher level locks on the resource. | ||
125 | |||
126 | See Also | ||
127 | ======== | ||
128 | http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf | ||
129 | |||
130 | For more information on the VMS distributed locking API. | ||
diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt new file mode 100644 index 000000000000..f2595caf052e --- /dev/null +++ b/Documentation/filesystems/ocfs2.txt | |||
@@ -0,0 +1,55 @@ | |||
1 | OCFS2 filesystem | ||
2 | ================== | ||
3 | OCFS2 is a general purpose extent based shared disk cluster file | ||
4 | system with many similarities to ext3. It supports 64 bit inode | ||
5 | numbers, and has automatically extending metadata groups which may | ||
6 | also make it attractive for non-clustered use. | ||
7 | |||
8 | You'll want to install the ocfs2-tools package in order to at least | ||
9 | get "mount.ocfs2" and "ocfs2_hb_ctl". | ||
10 | |||
11 | Project web page: http://oss.oracle.com/projects/ocfs2 | ||
12 | Tools web page: http://oss.oracle.com/projects/ocfs2-tools | ||
13 | OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/ | ||
14 | |||
15 | All code copyright 2005 Oracle except when otherwise noted. | ||
16 | |||
17 | CREDITS: | ||
18 | Lots of code taken from ext3 and other projects. | ||
19 | |||
20 | Authors in alphabetical order: | ||
21 | Joel Becker <joel.becker@oracle.com> | ||
22 | Zach Brown <zach.brown@oracle.com> | ||
23 | Mark Fasheh <mark.fasheh@oracle.com> | ||
24 | Kurt Hackel <kurt.hackel@oracle.com> | ||
25 | Sunil Mushran <sunil.mushran@oracle.com> | ||
26 | Manish Singh <manish.singh@oracle.com> | ||
27 | |||
28 | Caveats | ||
29 | ======= | ||
30 | Features which OCFS2 does not support yet: | ||
31 | - sparse files | ||
32 | - extended attributes | ||
33 | - shared writeable mmap | ||
34 | - loopback is supported, but data written will not | ||
35 | be cluster coherent. | ||
36 | - quotas | ||
37 | - cluster aware flock | ||
38 | - Directory change notification (F_NOTIFY) | ||
39 | - Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease) | ||
40 | - POSIX ACLs | ||
41 | - readpages / writepages (not user visible) | ||
42 | |||
43 | Mount options | ||
44 | ============= | ||
45 | |||
46 | OCFS2 supports the following mount options: | ||
47 | (*) == default | ||
48 | |||
49 | barrier=1 This enables/disables barriers. barrier=0 disables it, | ||
50 | barrier=1 enables it. | ||
51 | errors=remount-ro(*) Remount the filesystem read-only on an error. | ||
52 | errors=panic Panic and halt the machine if an error occurs. | ||
53 | intr (*) Allow signals to interrupt cluster operations. | ||
54 | nointr Do not allow signals to interrupt cluster | ||
55 | operations. | ||
diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 31154882000a..6304db59bfe4 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt | |||
@@ -860,24 +860,6 @@ The structure has a number of fields, some of which are mandatory: | |||
860 | It is safe to sleep in this method. | 860 | It is safe to sleep in this method. |
861 | 861 | ||
862 | 862 | ||
863 | (*) int (*duplicate)(struct key *key, const struct key *source); | ||
864 | |||
865 | If this type of key can be duplicated, then this method should be | ||
866 | provided. It is called to copy the payload attached to the source into the | ||
867 | new key. The data length on the new key will have been updated and the | ||
868 | quota adjusted already. | ||
869 | |||
870 | This method will be called with the source key's semaphore read-locked to | ||
871 | prevent its payload from being changed, thus RCU constraints need not be | ||
872 | applied to the source key. | ||
873 | |||
874 | This method does not have to lock the destination key in order to attach a | ||
875 | payload. The fact that KEY_FLAG_INSTANTIATED is not set in key->flags | ||
876 | prevents anything else from gaining access to the key. | ||
877 | |||
878 | It is safe to sleep in this method. | ||
879 | |||
880 | |||
881 | (*) int (*update)(struct key *key, const void *data, size_t datalen); | 863 | (*) int (*update)(struct key *key, const void *data, size_t datalen); |
882 | 864 | ||
883 | If this type of key can be updated, then this method should be provided. | 865 | If this type of key can be updated, then this method should be provided. |
diff --git a/Documentation/md.txt b/Documentation/md.txt index 23e6cce40f9c..03a13c462cf2 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt | |||
@@ -51,6 +51,30 @@ superblock can be autodetected and run at boot time. | |||
51 | The kernel parameter "raid=partitionable" (or "raid=part") means | 51 | The kernel parameter "raid=partitionable" (or "raid=part") means |
52 | that all auto-detected arrays are assembled as partitionable. | 52 | that all auto-detected arrays are assembled as partitionable. |
53 | 53 | ||
54 | Boot time assembly of degraded/dirty arrays | ||
55 | ------------------------------------------- | ||
56 | |||
57 | If a raid5 or raid6 array is both dirty and degraded, it could have | ||
58 | undetectable data corruption. This is because the fact that it is | ||
59 | 'dirty' means that the parity cannot be trusted, and the fact that it | ||
60 | is degraded means that some datablocks are missing and cannot reliably | ||
61 | be reconstructed (due to no parity). | ||
62 | |||
63 | For this reason, md will normally refuse to start such an array. This | ||
64 | requires the sysadmin to take action to explicitly start the array | ||
65 | desipite possible corruption. This is normally done with | ||
66 | mdadm --assemble --force .... | ||
67 | |||
68 | This option is not really available if the array has the root | ||
69 | filesystem on it. In order to support this booting from such an | ||
70 | array, md supports a module parameter "start_dirty_degraded" which, | ||
71 | when set to 1, bypassed the checks and will allows dirty degraded | ||
72 | arrays to be started. | ||
73 | |||
74 | So, to boot with a root filesystem of a dirty degraded raid[56], use | ||
75 | |||
76 | md-mod.start_dirty_degraded=1 | ||
77 | |||
54 | 78 | ||
55 | Superblock formats | 79 | Superblock formats |
56 | ------------------ | 80 | ------------------ |
@@ -141,6 +165,70 @@ All md devices contain: | |||
141 | in a fully functional array. If this is not yet known, the file | 165 | in a fully functional array. If this is not yet known, the file |
142 | will be empty. If an array is being resized (not currently | 166 | will be empty. If an array is being resized (not currently |
143 | possible) this will contain the larger of the old and new sizes. | 167 | possible) this will contain the larger of the old and new sizes. |
168 | Some raid level (RAID1) allow this value to be set while the | ||
169 | array is active. This will reconfigure the array. Otherwise | ||
170 | it can only be set while assembling an array. | ||
171 | |||
172 | chunk_size | ||
173 | This is the size if bytes for 'chunks' and is only relevant to | ||
174 | raid levels that involve striping (1,4,5,6,10). The address space | ||
175 | of the array is conceptually divided into chunks and consecutive | ||
176 | chunks are striped onto neighbouring devices. | ||
177 | The size should be atleast PAGE_SIZE (4k) and should be a power | ||
178 | of 2. This can only be set while assembling an array | ||
179 | |||
180 | component_size | ||
181 | For arrays with data redundancy (i.e. not raid0, linear, faulty, | ||
182 | multipath), all components must be the same size - or at least | ||
183 | there must a size that they all provide space for. This is a key | ||
184 | part or the geometry of the array. It is measured in sectors | ||
185 | and can be read from here. Writing to this value may resize | ||
186 | the array if the personality supports it (raid1, raid5, raid6), | ||
187 | and if the component drives are large enough. | ||
188 | |||
189 | metadata_version | ||
190 | This indicates the format that is being used to record metadata | ||
191 | about the array. It can be 0.90 (traditional format), 1.0, 1.1, | ||
192 | 1.2 (newer format in varying locations) or "none" indicating that | ||
193 | the kernel isn't managing metadata at all. | ||
194 | |||
195 | level | ||
196 | The raid 'level' for this array. The name will often (but not | ||
197 | always) be the same as the name of the module that implements the | ||
198 | level. To be auto-loaded the module must have an alias | ||
199 | md-$LEVEL e.g. md-raid5 | ||
200 | This can be written only while the array is being assembled, not | ||
201 | after it is started. | ||
202 | |||
203 | new_dev | ||
204 | This file can be written but not read. The value written should | ||
205 | be a block device number as major:minor. e.g. 8:0 | ||
206 | This will cause that device to be attached to the array, if it is | ||
207 | available. It will then appear at md/dev-XXX (depending on the | ||
208 | name of the device) and further configuration is then possible. | ||
209 | |||
210 | sync_speed_min | ||
211 | sync_speed_max | ||
212 | This are similar to /proc/sys/dev/raid/speed_limit_{min,max} | ||
213 | however they only apply to the particular array. | ||
214 | If no value has been written to these, of if the word 'system' | ||
215 | is written, then the system-wide value is used. If a value, | ||
216 | in kibibytes-per-second is written, then it is used. | ||
217 | When the files are read, they show the currently active value | ||
218 | followed by "(local)" or "(system)" depending on whether it is | ||
219 | a locally set or system-wide value. | ||
220 | |||
221 | sync_completed | ||
222 | This shows the number of sectors that have been completed of | ||
223 | whatever the current sync_action is, followed by the number of | ||
224 | sectors in total that could need to be processed. The two | ||
225 | numbers are separated by a '/' thus effectively showing one | ||
226 | value, a fraction of the process that is complete. | ||
227 | |||
228 | sync_speed | ||
229 | This shows the current actual speed, in K/sec, of the current | ||
230 | sync_action. It is averaged over the last 30 seconds. | ||
231 | |||
144 | 232 | ||
145 | As component devices are added to an md array, they appear in the 'md' | 233 | As component devices are added to an md array, they appear in the 'md' |
146 | directory as new directories named | 234 | directory as new directories named |
@@ -167,6 +255,38 @@ Each directory contains: | |||
167 | of being recoverred to | 255 | of being recoverred to |
168 | This list make grow in future. | 256 | This list make grow in future. |
169 | 257 | ||
258 | errors | ||
259 | An approximate count of read errors that have been detected on | ||
260 | this device but have not caused the device to be evicted from | ||
261 | the array (either because they were corrected or because they | ||
262 | happened while the array was read-only). When using version-1 | ||
263 | metadata, this value persists across restarts of the array. | ||
264 | |||
265 | This value can be written while assembling an array thus | ||
266 | providing an ongoing count for arrays with metadata managed by | ||
267 | userspace. | ||
268 | |||
269 | slot | ||
270 | This gives the role that the device has in the array. It will | ||
271 | either be 'none' if the device is not active in the array | ||
272 | (i.e. is a spare or has failed) or an integer less than the | ||
273 | 'raid_disks' number for the array indicating which possition | ||
274 | it currently fills. This can only be set while assembling an | ||
275 | array. A device for which this is set is assumed to be working. | ||
276 | |||
277 | offset | ||
278 | This gives the location in the device (in sectors from the | ||
279 | start) where data from the array will be stored. Any part of | ||
280 | the device before this offset us not touched, unless it is | ||
281 | used for storing metadata (Formats 1.1 and 1.2). | ||
282 | |||
283 | size | ||
284 | The amount of the device, after the offset, that can be used | ||
285 | for storage of data. This will normally be the same as the | ||
286 | component_size. This can be written while assembling an | ||
287 | array. If a value less than the current component_size is | ||
288 | written, component_size will be reduced to this value. | ||
289 | |||
170 | 290 | ||
171 | An active md device will also contain and entry for each active device | 291 | An active md device will also contain and entry for each active device |
172 | in the array. These are named | 292 | in the array. These are named |
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt index f5ebda5f4276..bd4ffb5bd49a 100644 --- a/Documentation/power/interface.txt +++ b/Documentation/power/interface.txt | |||
@@ -41,3 +41,14 @@ to. Writing to this file will accept one of | |||
41 | It will only change to 'firmware' or 'platform' if the system supports | 41 | It will only change to 'firmware' or 'platform' if the system supports |
42 | it. | 42 | it. |
43 | 43 | ||
44 | /sys/power/image_size controls the size of the image created by | ||
45 | the suspend-to-disk mechanism. It can be written a string | ||
46 | representing a non-negative integer that will be used as an upper | ||
47 | limit of the image size, in megabytes. The suspend-to-disk mechanism will | ||
48 | do its best to ensure the image size will not exceed that number. However, | ||
49 | if this turns out to be impossible, it will try to suspend anyway using the | ||
50 | smallest image possible. In particular, if "0" is written to this file, the | ||
51 | suspend image will be as small as possible. | ||
52 | |||
53 | Reading from this file will display the current image size limit, which | ||
54 | is set to 500 MB by default. | ||
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt index b0d50840788e..cd0fcd89a6f0 100644 --- a/Documentation/power/swsusp.txt +++ b/Documentation/power/swsusp.txt | |||
@@ -27,6 +27,11 @@ echo shutdown > /sys/power/disk; echo disk > /sys/power/state | |||
27 | 27 | ||
28 | echo platform > /sys/power/disk; echo disk > /sys/power/state | 28 | echo platform > /sys/power/disk; echo disk > /sys/power/state |
29 | 29 | ||
30 | If you want to limit the suspend image size to N megabytes, do | ||
31 | |||
32 | echo N > /sys/power/image_size | ||
33 | |||
34 | before suspend (it is limited to 500 MB by default). | ||
30 | 35 | ||
31 | Encrypted suspend image: | 36 | Encrypted suspend image: |
32 | ------------------------ | 37 | ------------------------ |