-rw-r--r-- | Documentation/filesystems/autofs4-mount-control.txt | 393 |
1 files changed, 393 insertions, 0 deletions
diff --git a/Documentation/filesystems/autofs4-mount-control.txt b/Documentation/filesystems/autofs4-mount-control.txt
new file mode 100644
index 000000000000..c6341745df37
--- /dev/null
+++ b/Documentation/filesystems/autofs4-mount-control.txt
@@ -0,0 +1,393 @@ | |||
1 | |||
2 | Miscellaneous Device control operations for the autofs4 kernel module | ||
3 | ==================================================================== | ||
4 | |||
5 | The problem | ||
6 | =========== | ||
7 | |||
8 | There is a problem with active restarts in autofs (that is to say | ||
9 | restarting autofs when there are busy mounts). | ||
10 | |||
11 | During normal operation autofs uses a file descriptor opened on the | ||
12 | directory that is being managed in order to be able to issue control | ||
13 | operations. Using a file descriptor gives ioctl operations access to | ||
14 | autofs specific information stored in the super block. The operations | ||
15 | are things such as setting an autofs mount catatonic, setting the | ||
16 | expire timeout and requesting expire checks. As is explained below, | ||
17 | certain types of autofs triggered mounts can end up covering an autofs | ||
18 | mount itself which prevents us being able to use open(2) to obtain a | ||
19 | file descriptor for these operations if we don't already have one open. | ||
20 | |||
21 | Currently autofs uses "umount -l" (lazy umount) to clear active mounts | ||
22 | at restart. While using lazy umount works for most cases, anything that | ||
23 | needs to walk back up the mount tree to construct a path, such as | ||
24 | getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works | ||
25 | because the point from which the path is constructed has been detached | ||
26 | from the mount tree. | ||
27 | |||
28 | The actual problem with autofs is that it can't reconnect to existing | ||
29 | mounts. Immediately one thinks that just adding the ability to remount | ||
30 | autofs file systems would solve it, but alas, that can't work. This is | ||
31 | because autofs direct mounts and the implementation of "on demand mount | ||
32 | and expire" of nested mount trees have the file system mounted directly | ||
33 | on top of the mount trigger directory dentry. | ||
34 | |||
35 | For example, there are two types of automount maps, direct (in the kernel | ||
36 | module source you will see a third type called an offset, which is just | ||
37 | a direct mount in disguise) and indirect. | ||
38 | |||
39 | Here is a master map with direct and indirect map entries: | ||
40 | |||
41 | /- /etc/auto.direct | ||
42 | /test /etc/auto.indirect | ||
43 | |||
44 | and the corresponding map files: | ||
45 | |||
46 | /etc/auto.direct: | ||
47 | |||
48 | /automount/dparse/g6 budgie:/autofs/export1 | ||
49 | /automount/dparse/g1 shark:/autofs/export1 | ||
50 | and so on. | ||
51 | |||
52 | /etc/auto.indirect: | ||
53 | |||
54 | g1 shark:/autofs/export1 | ||
55 | g6 budgie:/autofs/export1 | ||
56 | and so on. | ||
57 | |||
58 | For the above indirect map an autofs file system is mounted on /test and | ||
59 | mounts are triggered for each sub-directory key by the inode lookup | ||
60 | operation. So we see a mount of shark:/autofs/export1 on /test/g1, for | ||
61 | example. | ||
62 | |||
63 | The way that direct mounts are handled is by making an autofs mount on | ||
64 | each full path, such as /automount/dparse/g1, and using it as a mount | ||
65 | trigger. So when we walk on the path we mount shark:/autofs/export1 "on | ||
66 | top of this mount point". Since these are always directories we can | ||
67 | use the follow_link inode operation to trigger the mount. | ||
68 | |||
69 | But, each entry in direct and indirect maps can have offsets (making | ||
70 | them multi-mount map entries). | ||
71 | |||
72 | For example, an indirect mount map entry could also be: | ||
73 | |||
74 | g1 \ | ||
75 | / shark:/autofs/export5/testing/test \ | ||
76 | /s1 shark:/autofs/export/testing/test/s1 \ | ||
77 | /s2 shark:/autofs/export5/testing/test/s2 \ | ||
78 | /s1/ss1 shark:/autofs/export1 \ | ||
79 | /s2/ss2 shark:/autofs/export2 | ||
80 | |||
81 | and similarly a direct mount map entry could also be: | ||
82 | |||
83 | /automount/dparse/g1 \ | ||
84 | / shark:/autofs/export5/testing/test \ | ||
85 | /s1 shark:/autofs/export/testing/test/s1 \ | ||
86 | /s2 shark:/autofs/export5/testing/test/s2 \ | ||
87 | /s1/ss1 shark:/autofs/export2 \ | ||
88 | /s2/ss2 shark:/autofs/export2 | ||
89 | |||
90 | One of the issues with version 4 of autofs was that, when mounting an | ||
91 | entry with a large number of offsets, possibly with nesting, we needed | ||
92 | to mount and umount all of the offsets as a single unit. Not really a | ||
93 | problem, except for people with a large number of offsets in map entries. | ||
94 | This mechanism is used for the well known "hosts" map and we have seen | ||
95 | cases (in 2.4) where the available number of mounts is exhausted or | ||
96 | where the number of privileged ports available is exhausted. | ||
97 | |||
98 | In version 5 we mount only as we go down the tree of offsets and | ||
99 | similarly for expiring them which resolves the above problem. There is | ||
100 | somewhat more detail to the implementation but it isn't needed for the | ||
101 | sake of the problem explanation. The one important detail is that these | ||
102 | offsets are implemented using the same mechanism as the direct mounts | ||
103 | above and so the mount points can be covered by a mount. | ||
104 | |||
105 | The current autofs implementation uses an ioctl file descriptor opened | ||
106 | on the mount point for control operations. The references held by the | ||
107 | descriptor are accounted for in checks made to determine if a mount is | ||
108 | in use and the descriptor is also used to access autofs file system | ||
109 | information held in the mount super block. So the use of a file handle | ||
110 | needs to be retained. | ||
111 | |||
112 | |||
113 | The Solution | ||
114 | ============ | ||
115 | |||
116 | To be able to restart autofs leaving existing direct, indirect and | ||
117 | offset mounts in place we need to be able to obtain a file handle | ||
118 | for these potentially covered autofs mount points. Rather than just | ||
119 | implement an isolated operation it was decided to re-implement the | ||
120 | existing ioctl interface and add new operations to provide this | ||
121 | functionality. | ||
122 | |||
123 | In addition, to be able to reconstruct a mount tree that has busy mounts, | ||
124 | the uid and gid of the last user that triggered the mount need to be | ||
125 | available because these can be used as macro substitution variables in | ||
126 | autofs maps. They are recorded at mount request time and an operation | ||
127 | has been added to retrieve them. | ||
128 | |||
129 | Since we're re-implementing the control interface, a couple of other | ||
130 | problems with the existing interface have been addressed. First, when | ||
131 | a mount or expire operation completes a status is returned to the | ||
132 | kernel by either a "send ready" or a "send fail" operation. The | ||
133 | "send fail" operation of the ioctl interface could only ever send | ||
134 | ENOENT so the re-implementation allows user space to send an actual | ||
135 | status. Another expensive operation in user space, for those using | ||
136 | very large maps, is discovering if a mount is present. Usually this | ||
137 | involves scanning /proc/mounts and since it needs to be done quite | ||
138 | often it can introduce significant overhead when there are many entries | ||
139 | in the mount table. An operation to look up the mount status of a mount | ||
140 | point dentry (covered or not) has also been added. | ||
141 | |||
142 | Current kernel development policy recommends avoiding the use of the | ||
143 | ioctl mechanism in favor of systems such as Netlink. An implementation | ||
144 | using this system was attempted to evaluate its suitability and it was | ||
145 | found to be inadequate, in this case. The Generic Netlink system was | ||
146 | used for this as raw Netlink would lead to a significant increase in | ||
147 | complexity. There's no question that the Generic Netlink system is an | ||
148 | elegant solution for common case ioctl functions but it's not a complete | ||
149 | replacement probably because its primary purpose in life is to be a | ||
150 | message bus implementation rather than specifically an ioctl replacement. | ||
151 | While it would be possible to work around this there is one concern | ||
152 | that led to the decision to not use it. This is that the autofs | ||
153 | expire in the daemon has become far too complex because umount | ||
154 | candidates are enumerated, almost for no other reason than to "count" | ||
155 | the number of times to call the expire ioctl. This involves scanning | ||
156 | the mount table which has proved to be a big overhead for users with | ||
157 | large maps. The best way to improve this is to try to get back to the | ||
158 | way the expire was done long ago. That is, when an expire request is | ||
159 | issued for a mount (file handle) we should continually call back to | ||
160 | the daemon until we can't umount any more mounts, then return the | ||
161 | appropriate status to the daemon. At the moment we just expire one | ||
162 | mount at a time. A Generic Netlink implementation would exclude this | ||
163 | possibility for future development due to the requirements of the | ||
164 | message bus architecture. | ||
165 | |||
166 | |||
167 | autofs4 Miscellaneous Device mount control interface | ||
168 | ==================================================== | ||
169 | |||
170 | The control interface is accessed by opening a device node, typically /dev/autofs. | ||
171 | |||
172 | All the ioctls use a common structure to pass the needed parameter | ||
173 | information and return operation results: | ||
174 | |||
175 | struct autofs_dev_ioctl { | ||
176 | __u32 ver_major; | ||
177 | __u32 ver_minor; | ||
178 | __u32 size; /* total size of data passed in | ||
179 | * including this struct */ | ||
180 | __s32 ioctlfd; /* automount command fd */ | ||
181 | |||
182 | __u32 arg1; /* Command parameters */ | ||
183 | __u32 arg2; | ||
184 | |||
185 | char path[0]; | ||
186 | }; | ||
187 | |||
188 | The ioctlfd field is a mount point file descriptor of an autofs mount | ||
189 | point. It is returned by the open call and is used by all calls except | ||
190 | the check for whether a given path is a mount point, where it may | ||
191 | optionally be used to check a specific mount corresponding to a given | ||
192 | mount point file descriptor, and when requesting the uid and gid of the | ||
193 | last successful mount on a directory within the autofs file system. | ||
194 | |||
195 | The fields arg1 and arg2 are used to communicate parameters and results of | ||
196 | calls made as described below. | ||
197 | |||
198 | The path field is used to pass a path where it is needed and the size field | ||
199 | is used to account for the increased structure length when translating the | ||
200 | structure sent from user space. | ||
201 | |||
202 | This structure can be initialized before setting specific fields by using | ||
203 | the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *). | ||
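As a rough illustration of how the path and size fields fit together, the following user space sketch allocates a parameter block with an optional trailing path. It is only a sketch under stated assumptions: the structure is a local mirror of the one shown above (so it builds without kernel headers), the version numbers are placeholders, the helper name sketch_alloc_param() is hypothetical, and starting ioctlfd at -1 is an assumption about what an initialized structure looks like.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the structure shown above so the sketch builds without
 * kernel headers; real code would use the kernel's own definition. */
struct autofs_dev_ioctl_sketch {
	uint32_t ver_major;
	uint32_t ver_minor;
	uint32_t size;      /* total size passed in, including this struct */
	int32_t  ioctlfd;   /* automount command fd */
	uint32_t arg1;      /* command parameters */
	uint32_t arg2;
	char     path[];    /* flexible array member, same layout as path[0] */
};

/* Placeholder version numbers (assumption, for illustration only). */
#define SKETCH_VERSION_MAJOR 1u
#define SKETCH_VERSION_MINOR 0u

/* Hypothetical helper: allocate and initialize a parameter block,
 * optionally followed by a NUL terminated path. Caller frees the result. */
static struct autofs_dev_ioctl_sketch *sketch_alloc_param(const char *path)
{
	size_t path_len = 0;
	struct autofs_dev_ioctl_sketch *param;

	if (path != NULL) {
		assert(path[0] == '/');        /* the path must begin with "/" */
		path_len = strlen(path) + 1;   /* include the terminating NUL */
	}

	param = calloc(1, sizeof(*param) + path_len);
	if (param == NULL)
		return NULL;

	param->ver_major = SKETCH_VERSION_MAJOR;
	param->ver_minor = SKETCH_VERSION_MINOR;
	/* size covers the fixed part plus the path, as described above */
	param->size = (uint32_t)(sizeof(*param) + path_len);
	param->ioctlfd = -1;               /* assumed: no mount descriptor yet */
	if (path != NULL)
		memcpy(param->path, path, path_len);

	return param;
}
```

Calling sketch_alloc_param("/test/g1") would give a block whose size field is the fixed structure size plus nine bytes (eight path characters and the terminating NUL).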
204 | |||
205 | All of the ioctls perform a copy of this structure from user space to | ||
206 | kernel space and return -EINVAL if the size parameter is smaller than | ||
207 | the structure size itself, -ENOMEM if the kernel memory allocation fails | ||
208 | or -EFAULT if the copy itself fails. Other checks include a version check | ||
209 | of the compiled in user space version against the module version and a | ||
210 | mismatch results in a -EINVAL return. If the size field is greater than | ||
211 | the structure size then a path is assumed to be present and is checked to | ||
212 | ensure it begins with a "/" and is NUL terminated, otherwise -EINVAL is | ||
213 | returned. Following these checks, for all ioctl commands except | ||
214 | AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and | ||
215 | AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD the ioctlfd is validated and if it is | ||
216 | not a valid descriptor or doesn't correspond to an autofs mount point | ||
217 | an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is | ||
218 | returned. | ||
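The size and path checks described above can be modelled in user space, which may be useful for rejecting a malformed request before issuing the ioctl. This is a minimal sketch of the checks as this document describes them, not the kernel's code; the function name sketch_validate() and its parameters are hypothetical, and the version and descriptor checks are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user space model of the size and path checks.
 *   size        - the size field of the request
 *   header_size - size of the fixed part of struct autofs_dev_ioctl
 *   path        - start of the bytes that follow the fixed part
 * Returns 0 if the request passes, -EINVAL otherwise. */
static int sketch_validate(uint32_t size, uint32_t header_size,
			   const char *path)
{
	size_t path_len;

	assert(path != NULL);

	/* The size must at least cover the structure itself. */
	if (size < header_size)
		return -EINVAL;

	/* A larger size means a path is assumed to be present. */
	if (size > header_size) {
		path_len = size - header_size;

		/* The path must begin with "/" ... */
		if (path[0] != '/')
			return -EINVAL;

		/* ... and be NUL terminated within the passed length. */
		if (memchr(path, '\0', path_len) == NULL)
			return -EINVAL;
	}

	return 0;
}
```

Here a request whose size equals the fixed structure size carries no path, so only the first check applies.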
219 | |||
220 | |||
221 | The ioctls | ||
222 | ========== | ||
223 | |||
224 | An example of an implementation which uses this interface can be seen | ||
225 | in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the | ||
226 | distribution tar available for download from kernel.org in directory | ||
227 | /pub/linux/daemons/autofs/v5. | ||
228 | |||
229 | The device node ioctl operations implemented by this interface are: | ||
230 | |||
231 | |||
232 | AUTOFS_DEV_IOCTL_VERSION | ||
233 | ------------------------ | ||
234 | |||
235 | Get the major and minor version of the autofs4 device ioctl kernel module | ||
236 | implementation. It requires an initialized struct autofs_dev_ioctl as an | ||
237 | input parameter and sets the version information in the passed in structure. | ||
238 | It returns 0 on success or the error -EINVAL if a version mismatch is | ||
239 | detected. | ||
240 | |||
241 | |||
242 | AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD | ||
243 | ------------------------------------------------------------------ | ||
244 | |||
245 | Get the major and minor version of the autofs4 protocol version understood | ||
246 | by the loaded module. This call requires an initialized struct autofs_dev_ioctl | ||
247 | with the ioctlfd field set to a valid autofs mount point descriptor | ||
248 | and sets the requested version number in structure field arg1. These | ||
249 | commands return 0 on success or one of the negative error codes if | ||
250 | validation fails. | ||
251 | |||
252 | |||
253 | AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT | ||
254 | ---------------------------------------------------------- | ||
255 | |||
256 | Obtain and release a file descriptor for an autofs managed mount point | ||
257 | path. The open call requires an initialized struct autofs_dev_ioctl with | ||
258 | the path field set and the size field adjusted appropriately as well | ||
259 | as the arg1 field set to the device number of the autofs mount. The | ||
260 | device number can be obtained from the mount options shown in | ||
261 | /proc/mounts. The close call requires an initialized struct | ||
262 | autofs_dev_ioctl with the ioctlfd field set to the descriptor obtained | ||
263 | from the open call. The release of the file descriptor can also be done | ||
264 | with close(2) so any open descriptors will also be closed at process exit. | ||
265 | The close call is included in the implemented operations largely for | ||
266 | completeness and to provide for a consistent user space implementation. | ||
267 | |||
268 | |||
269 | AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD | ||
270 | -------------------------------------------------------- | ||
271 | |||
272 | Return mount and expire result status from user space to the kernel. | ||
273 | Both of these calls require an initialized struct autofs_dev_ioctl | ||
274 | with the ioctlfd field set to the descriptor obtained from the open | ||
275 | call and the arg1 field set to the wait queue token number, received | ||
276 | by user space in the foregoing mount or expire request. The arg2 field | ||
277 | is set to the status to be returned. For the ready call this is always | ||
278 | 0 and for the fail call it is set to the errno of the operation. | ||
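The field usage for the two calls can be sketched as follows. This only fills in the parameter block; issuing the ioctl on the control device is left out. The structure is a local mirror of the one shown earlier, the helper names are hypothetical, and passing the errno as a positive value is an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct autofs_dev_ioctl as described in this document. */
struct autofs_dev_ioctl_sketch {
	uint32_t ver_major;
	uint32_t ver_minor;
	uint32_t size;
	int32_t  ioctlfd;
	uint32_t arg1;
	uint32_t arg2;
	char     path[];
};

/* Hypothetical helper: prepare a "send ready" request for a wait queue
 * token received in a mount or expire request. The status is always 0. */
static void sketch_set_ready(struct autofs_dev_ioctl_sketch *param,
			     int ioctlfd, uint32_t token)
{
	param->ioctlfd = ioctlfd;
	param->arg1 = token;
	param->arg2 = 0;
}

/* Hypothetical helper: prepare a "send fail" request carrying the errno
 * of the failed operation (assumed positive here). */
static void sketch_set_fail(struct autofs_dev_ioctl_sketch *param,
			    int ioctlfd, uint32_t token, int err)
{
	assert(err > 0);
	param->ioctlfd = ioctlfd;
	param->arg1 = token;
	param->arg2 = (uint32_t)err;
}
```

Unlike the older interface, which could only report ENOENT, the fail helper passes through whatever errno the caller supplies.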
279 | |||
280 | |||
281 | AUTOFS_DEV_IOCTL_SETPIPEFD_CMD | ||
282 | ------------------------------ | ||
283 | |||
284 | Set the pipe file descriptor used for kernel communication to the daemon. | ||
285 | Normally this is set at mount time using an option but when reconnecting | ||
286 | to an existing mount we need to use this to tell the autofs mount about | ||
287 | the new kernel pipe descriptor. In order to protect mounts against | ||
288 | incorrectly setting the pipe descriptor we also require that the autofs | ||
289 | mount be catatonic (see next call). | ||
290 | |||
291 | The call requires an initialized struct autofs_dev_ioctl with the | ||
292 | ioctlfd field set to the descriptor obtained from the open call and | ||
293 | the arg1 field set to the descriptor of the pipe. On success the call | ||
294 | also sets the process group id used to identify the controlling process | ||
295 | (e.g. the owning automount(8) daemon) to the process group of the caller. | ||
296 | |||
297 | |||
298 | AUTOFS_DEV_IOCTL_CATATONIC_CMD | ||
299 | ------------------------------ | ||
300 | |||
301 | Make the autofs mount point catatonic. The autofs mount will no longer | ||
302 | issue mount requests, the kernel communication pipe descriptor is released | ||
303 | and any remaining waits in the queue are released. | ||
304 | |||
305 | The call requires an initialized struct autofs_dev_ioctl with the | ||
306 | ioctlfd field set to the descriptor obtained from the open call. | ||
307 | |||
308 | |||
309 | AUTOFS_DEV_IOCTL_TIMEOUT_CMD | ||
310 | ---------------------------- | ||
311 | |||
312 | Set the expire timeout for mounts within an autofs mount point. | ||
313 | |||
314 | The call requires an initialized struct autofs_dev_ioctl with the | ||
315 | ioctlfd field set to the descriptor obtained from the open call. | ||
316 | |||
317 | |||
318 | AUTOFS_DEV_IOCTL_REQUESTER_CMD | ||
319 | ------------------------------ | ||
320 | |||
321 | Return the uid and gid of the last process to successfully trigger a | ||
322 | mount on the given path dentry. | ||
323 | |||
324 | The call requires an initialized struct autofs_dev_ioctl with the path | ||
325 | field set to the mount point in question and the size field adjusted | ||
326 | appropriately as well as the arg1 field set to the device number of the | ||
327 | containing autofs mount. Upon return the struct field arg1 contains the | ||
328 | uid and arg2 the gid. | ||
329 | |||
330 | When reconstructing an autofs mount tree with active mounts we need to | ||
331 | re-connect to mounts that may have used the original process uid and | ||
332 | gid (or string variations of them) for mount lookups within the map entry. | ||
333 | This call provides the ability to obtain this uid and gid so they may be | ||
334 | used by user space for the mount map lookups. | ||
335 | |||
336 | |||
337 | AUTOFS_DEV_IOCTL_EXPIRE_CMD | ||
338 | --------------------------- | ||
339 | |||
340 | Issue an expire request to the kernel for an autofs mount. Typically | ||
341 | this ioctl is called until no further expire candidates are found. | ||
342 | |||
343 | The call requires an initialized struct autofs_dev_ioctl with the | ||
344 | ioctlfd field set to the descriptor obtained from the open call. In | ||
345 | addition an immediate expire, independent of the mount timeout, can be | ||
346 | requested by setting the arg1 field to 1. If no expire candidates can | ||
347 | be found the ioctl returns -1 with errno set to EAGAIN. | ||
348 | |||
349 | This call causes the kernel module to check the mount corresponding | ||
350 | to the given ioctlfd for mounts that can be expired, issues an expire | ||
351 | request back to the daemon and waits for completion. | ||
352 | |||
353 | AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD | ||
354 | ------------------------------ | ||
355 | |||
356 | Checks if an autofs mount point is in use. | ||
357 | |||
358 | The call requires an initialized struct autofs_dev_ioctl with the | ||
359 | ioctlfd field set to the descriptor obtained from the open call and | ||
360 | it returns the result in the arg1 field, 1 for busy and 0 otherwise. | ||
361 | |||
362 | |||
363 | AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD | ||
364 | --------------------------------- | ||
365 | |||
366 | Check if the given path is a mountpoint. | ||
367 | |||
368 | The call requires an initialized struct autofs_dev_ioctl. There are two | ||
369 | possible variations. Both use the path field set to the path of the mount | ||
370 | point to check and the size field adjusted appropriately. One uses the | ||
371 | ioctlfd field to identify a specific mount point to check while the other | ||
372 | variation uses the path and optionally arg1 set to an autofs mount type. | ||
373 | The call returns 1 if this is a mount point and sets arg1 to the device | ||
374 | number of the mount and field arg2 to the relevant super block magic | ||
375 | number (described below) or 0 if it isn't a mountpoint. In both cases | ||
376 | the device number (as returned by new_encode_dev()) is returned | ||
377 | in field arg1. | ||
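User space that wants the major and minor numbers back from the returned device number has to undo the new_encode_dev() packing. The sketch below assumes the usual kernel layout (low 8 bits of the minor, then 12 bits of major, then the remaining minor bits); the function names are hypothetical and the layout should be checked against the kernel source in use.

```c
#include <assert.h>
#include <stdint.h>

/* Pack major and minor the way new_encode_dev() is assumed to:
 * bits 0-7 low minor, bits 8-19 major, bits 20-31 high minor. */
static uint32_t sketch_encode_dev(uint32_t major, uint32_t minor)
{
	assert(major <= 0xfffu && minor <= 0xfffffu);
	return (minor & 0xffu) | (major << 8) | ((minor & ~0xffu) << 12);
}

/* Extract the major number from an encoded device number. */
static uint32_t sketch_dev_major(uint32_t dev)
{
	return (dev & 0xfff00u) >> 8;
}

/* Extract the minor number from an encoded device number. */
static uint32_t sketch_dev_minor(uint32_t dev)
{
	return (dev & 0xffu) | ((dev >> 12) & 0xfff00u);
}
```

With this layout a device with major 0 and a minor below 256, which is typical for anonymous devices backing virtual file systems, encodes to the minor number itself.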
378 | |||
379 | If supplied with a file descriptor we're looking for a specific mount, | ||
380 | not necessarily at the top of the mounted stack. In this case the path | ||
381 | the descriptor corresponds to is considered a mountpoint if it is itself | ||
382 | a mountpoint or contains a mount, such as a multi-mount without a root | ||
383 | mount. In this case we return 1 if the descriptor corresponds to a mount | ||
384 | point and also return the super magic of the covering mount if there | ||
385 | is one or 0 if it isn't a mountpoint. | ||
386 | |||
387 | If a path is supplied (and the ioctlfd field is set to -1) then the path | ||
388 | is looked up and is checked to see if it is the root of a mount. If a | ||
389 | type is also given we are looking for a particular autofs mount and if | ||
390 | a match isn't found a fail is returned. If the located path is the | ||
391 | root of a mount 1 is returned along with the super magic of the mount | ||
392 | or 0 otherwise. | ||
393 | |||