| -rw-r--r-- | Documentation/filesystems/autofs4-mount-control.txt | 393 |
1 files changed, 393 insertions, 0 deletions
diff --git a/Documentation/filesystems/autofs4-mount-control.txt b/Documentation/filesystems/autofs4-mount-control.txt
new file mode 100644
index 000000000000..c6341745df37
--- /dev/null
+++ b/Documentation/filesystems/autofs4-mount-control.txt
| @@ -0,0 +1,393 @@ | |||
| 1 | |||
| 2 | Miscellaneous Device control operations for the autofs4 kernel module | ||
| 3 | ==================================================================== | ||
| 4 | |||
| 5 | The problem | ||
| 6 | =========== | ||
| 7 | |||
| 8 | There is a problem with active restarts in autofs (that is to say | ||
| 9 | restarting autofs when there are busy mounts). | ||
| 10 | |||
| 11 | During normal operation autofs uses a file descriptor opened on the | ||
| 12 | directory that is being managed in order to be able to issue control | ||
| 13 | operations. Using a file descriptor gives ioctl operations access to | ||
| 14 | autofs specific information stored in the super block. The operations | ||
| 15 | are things such as setting an autofs mount catatonic, setting the | ||
| 16 | expire timeout and requesting expire checks. As is explained below, | ||
| 17 | certain types of autofs triggered mounts can end up covering an autofs | ||
| 18 | mount itself, which prevents us from using open(2) to obtain a | ||
| 19 | file descriptor for these operations if we don't already have one open. | ||
| 20 | |||
| 21 | Currently autofs uses "umount -l" (lazy umount) to clear active mounts | ||
| 22 | at restart. While using lazy umount works for most cases, anything that | ||
| 23 | needs to walk back up the mount tree to construct a path, such as | ||
| 24 | getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works | ||
| 25 | because the point from which the path is constructed has been detached | ||
| 26 | from the mount tree. | ||
| 27 | |||
| 28 | The actual problem with autofs is that it can't reconnect to existing | ||
| 29 | mounts. One immediately thinks that just adding the ability to remount | ||
| 30 | autofs file systems would solve it, but alas, that can't work. This is | ||
| 31 | because autofs direct mounts and the implementation of "on demand mount | ||
| 32 | and expire" of nested mount trees have the file system mounted directly | ||
| 33 | on top of the mount trigger directory dentry. | ||
| 34 | |||
| 35 | For example, there are two types of automount maps, direct (in the kernel | ||
| 36 | module source you will see a third type called an offset, which is just | ||
| 37 | a direct mount in disguise) and indirect. | ||
| 38 | |||
| 39 | Here is a master map with direct and indirect map entries: | ||
| 40 | |||
| 41 | /- /etc/auto.direct | ||
| 42 | /test /etc/auto.indirect | ||
| 43 | |||
| 44 | and the corresponding map files: | ||
| 45 | |||
| 46 | /etc/auto.direct: | ||
| 47 | |||
| 48 | /automount/dparse/g6 budgie:/autofs/export1 | ||
| 49 | /automount/dparse/g1 shark:/autofs/export1 | ||
| 50 | and so on. | ||
| 51 | |||
| 52 | /etc/auto.indirect: | ||
| 53 | |||
| 54 | g1 shark:/autofs/export1 | ||
| 55 | g6 budgie:/autofs/export1 | ||
| 56 | and so on. | ||
| 57 | |||
| 58 | For the above indirect map an autofs file system is mounted on /test and | ||
| 59 | mounts are triggered for each sub-directory key by the inode lookup | ||
| 60 | operation. So we see a mount of shark:/autofs/export1 on /test/g1, for | ||
| 61 | example. | ||
| 62 | |||
| 63 | The way that direct mounts are handled is by making an autofs mount on | ||
| 64 | each full path, such as /automount/dparse/g1, and using it as a mount | ||
| 65 | trigger. So when we walk on the path we mount shark:/autofs/export1 "on | ||
| 66 | top of this mount point". Since these are always directories we can | ||
| 67 | use the follow_link inode operation to trigger the mount. | ||
| 68 | |||
| 69 | But, each entry in direct and indirect maps can have offsets (making | ||
| 70 | them multi-mount map entries). | ||
| 71 | |||
| 72 | For example, an indirect mount map entry could also be: | ||
| 73 | |||
| 74 | g1 \ | ||
| 75 | / shark:/autofs/export5/testing/test \ | ||
| 76 | /s1 shark:/autofs/export/testing/test/s1 \ | ||
| 77 | /s2 shark:/autofs/export5/testing/test/s2 \ | ||
| 78 | /s1/ss1 shark:/autofs/export1 \ | ||
| 79 | /s2/ss2 shark:/autofs/export2 | ||
| 80 | |||
| 81 | and similarly a direct mount map entry could also be: | ||
| 82 | |||
| 83 | /automount/dparse/g1 \ | ||
| 84 | / shark:/autofs/export5/testing/test \ | ||
| 85 | /s1 shark:/autofs/export/testing/test/s1 \ | ||
| 86 | /s2 shark:/autofs/export5/testing/test/s2 \ | ||
| 87 | /s1/ss1 shark:/autofs/export2 \ | ||
| 88 | /s2/ss2 shark:/autofs/export2 | ||
| 89 | |||
| 90 | One of the issues with version 4 of autofs was that, when mounting an | ||
| 91 | entry with a large number of offsets, possibly with nesting, we needed | ||
| 92 | to mount and umount all of the offsets as a single unit. Not really a | ||
| 93 | problem, except for people with a large number of offsets in map entries. | ||
| 94 | This mechanism is used for the well known "hosts" map and we have seen | ||
| 95 | cases (in 2.4) where the available number of mounts is exhausted or | ||
| 96 | where the number of privileged ports available is exhausted. | ||
| 97 | |||
| 98 | In version 5 we mount only as we go down the tree of offsets and | ||
| 99 | similarly for expiring them which resolves the above problem. There is | ||
| 100 | somewhat more detail to the implementation but it isn't needed for the | ||
| 101 | sake of the problem explanation. The one important detail is that these | ||
| 102 | offsets are implemented using the same mechanism as the direct mounts | ||
| 103 | above and so the mount points can be covered by a mount. | ||
| 104 | |||
| 105 | The current autofs implementation uses an ioctl file descriptor opened | ||
| 106 | on the mount point for control operations. The references held by the | ||
| 107 | descriptor are accounted for in checks made to determine if a mount is | ||
| 108 | in use, and the descriptor is also used to access autofs file system | ||
| 109 | information held in the mount super block. So the use of a file handle | ||
| 110 | needs to be retained. | ||
| 111 | |||
| 112 | |||
| 113 | The Solution | ||
| 114 | ============ | ||
| 115 | |||
| 116 | To be able to restart autofs leaving existing direct, indirect and | ||
| 117 | offset mounts in place we need to be able to obtain a file handle | ||
| 118 | for these potentially covered autofs mount points. Rather than just | ||
| 119 | implement an isolated operation it was decided to re-implement the | ||
| 120 | existing ioctl interface and add new operations to provide this | ||
| 121 | functionality. | ||
| 122 | |||
| 123 | In addition, to be able to reconstruct a mount tree that has busy mounts, | ||
| 124 | the uid and gid of the last user that triggered the mount need to be | ||
| 125 | available because these can be used as macro substitution variables in | ||
| 126 | autofs maps. They are recorded at mount request time and an operation | ||
| 127 | has been added to retrieve them. | ||
| 128 | |||
| 129 | Since we're re-implementing the control interface, a couple of other | ||
| 130 | problems with the existing interface have been addressed. First, when | ||
| 131 | a mount or expire operation completes a status is returned to the | ||
| 132 | kernel by either a "send ready" or a "send fail" operation. The | ||
| 133 | "send fail" operation of the ioctl interface could only ever send | ||
| 134 | ENOENT so the re-implementation allows user space to send an actual | ||
| 135 | status. Another expensive operation in user space, for those using | ||
| 136 | very large maps, is discovering if a mount is present. Usually this | ||
| 137 | involves scanning /proc/mounts and since it needs to be done quite | ||
| 138 | often it can introduce significant overhead when there are many entries | ||
| 139 | in the mount table. An operation to look up the mount status of a mount | ||
| 140 | point dentry (covered or not) has also been added. | ||
| 141 | |||
| 142 | Current kernel development policy recommends avoiding the use of the | ||
| 143 | ioctl mechanism in favor of systems such as Netlink. An implementation | ||
| 144 | using this system was attempted to evaluate its suitability and it was | ||
| 145 | found to be inadequate, in this case. The Generic Netlink system was | ||
| 146 | used for this as raw Netlink would lead to a significant increase in | ||
| 147 | complexity. There's no question that the Generic Netlink system is an | ||
| 148 | elegant solution for common case ioctl functions but it's not a complete | ||
| 149 | replacement, probably because its primary purpose in life is to be a | ||
| 150 | message bus implementation rather than specifically an ioctl replacement. | ||
| 151 | While it would be possible to work around this there is one concern | ||
| 152 | that led to the decision to not use it. This is that the autofs | ||
| 153 | expire in the daemon has become far too complex because umount | ||
| 154 | candidates are enumerated, almost for no other reason than to "count" | ||
| 155 | the number of times to call the expire ioctl. This involves scanning | ||
| 156 | the mount table which has proved to be a big overhead for users with | ||
| 157 | large maps. The best way to improve this is to try to get back to the | ||
| 158 | way the expire was done long ago. That is, when an expire request is | ||
| 159 | issued for a mount (file handle) we should continually call back to | ||
| 160 | the daemon until we can't umount any more mounts, then return the | ||
| 161 | appropriate status to the daemon. At the moment we just expire one | ||
| 162 | mount at a time. A Generic Netlink implementation would exclude this | ||
| 163 | possibility for future development due to the requirements of the | ||
| 164 | message bus architecture. | ||
| 165 | |||
| 166 | |||
| 167 | autofs4 Miscellaneous Device mount control interface | ||
| 168 | ==================================================== | ||
| 169 | |||
| 170 | The control interface is accessed by opening a device node, typically /dev/autofs. | ||
| 171 | |||
| 172 | All the ioctls use a common structure to pass the needed parameter | ||
| 173 | information and return operation results: | ||
| 174 | |||
| 175 | struct autofs_dev_ioctl { | ||
| 176 |     __u32 ver_major; | ||
| 177 |     __u32 ver_minor; | ||
| 178 |     __u32 size;      /* total size of data passed in | ||
| 179 |                       * including this struct */ | ||
| 180 |     __s32 ioctlfd;   /* automount command fd */ | ||
| 181 | |||
| 182 |     __u32 arg1;      /* Command parameters */ | ||
| 183 |     __u32 arg2; | ||
| 184 | |||
| 185 |     char path[0]; | ||
| 186 | }; | ||
| 187 | |||
| 188 | The ioctlfd field is a mount point file descriptor of an autofs mount | ||
| 189 | point. It is returned by the open call and is used by all calls except | ||
| 190 | the check for whether a given path is a mount point, where it may | ||
| 191 | optionally be used to check a specific mount corresponding to a given | ||
| 192 | mount point file descriptor, and when requesting the uid and gid of the | ||
| 193 | last successful mount on a directory within the autofs file system. | ||
| 194 | |||
| 195 | The fields arg1 and arg2 are used to communicate parameters and results of | ||
| 196 | calls made as described below. | ||
| 197 | |||
| 198 | The path field is used to pass a path where it is needed and the size field | ||
| 199 | is used account for the increased structure length when translating the | ||
| 200 | structure sent from user space. | ||
| 201 | |||
| 202 | This structure can be initialized before setting specific fields by using | ||
| 203 | the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *). | ||
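
As a rough illustration of the initialization step, the sketch below shows what such a helper might do. The structure is a local copy of the layout shown above, the function name example_init_dev_ioctl() is a hypothetical stand-in for init_autofs_dev_ioctl(), and the version numbers and the -1 descriptor default are assumptions made for the example. Real code should take the structure, the helper and the version constants from the kernel header rather than redefining them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the structure described above, for illustration only. */
struct autofs_dev_ioctl {
	uint32_t ver_major;
	uint32_t ver_minor;
	uint32_t size;		/* total size passed in, including this struct */
	int32_t ioctlfd;	/* automount command fd */
	uint32_t arg1;		/* command parameters */
	uint32_t arg2;
	char path[];		/* optional NUL terminated path */
};

/* Placeholder version numbers: assumed values, not taken from the text. */
#define EXAMPLE_VERSION_MAJOR 1
#define EXAMPLE_VERSION_MINOR 0

/* Hypothetical stand-in for init_autofs_dev_ioctl(). */
void example_init_dev_ioctl(struct autofs_dev_ioctl *in)
{
	assert(in != NULL);

	/* Clear every fixed field, including arg1 and arg2. */
	memset(in, 0, sizeof(*in));
	in->ver_major = EXAMPLE_VERSION_MAJOR;
	in->ver_minor = EXAMPLE_VERSION_MINOR;
	/* No path yet, so the size is just the structure itself. */
	in->size = sizeof(*in);
	/* Assumed marker for "no mount point descriptor yet". */
	in->ioctlfd = -1;
}
```

Starting every call from a known state like this means a command only has to set the fields it actually uses.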
| 204 | |||
| 205 | All of the ioctls perform a copy of this structure from user space to | ||
| 206 | kernel space and return -EINVAL if the size parameter is smaller than | ||
| 207 | the structure size itself, -ENOMEM if the kernel memory allocation fails | ||
| 208 | or -EFAULT if the copy itself fails. Other checks include a version check | ||
| 209 | of the compiled in user space version against the module version and a | ||
| 210 | mismatch results in a -EINVAL return. If the size field is greater than | ||
| 211 | the structure size then a path is assumed to be present and is checked to | ||
| 212 | ensure it begins with a "/" and is NULL terminated, otherwise -EINVAL is | ||
| 213 | returned. Following these checks, for all ioctl commands except | ||
| 214 | AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and | ||
| 215 | AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD the ioctlfd is validated and if it is | ||
| 216 | not a valid descriptor or doesn't correspond to an autofs mount point | ||
| 217 | an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is | ||
| 218 | returned. | ||
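
The parameter checks described above can be mirrored in user space before a request is sent. The following is only a sketch under stated assumptions: the structure is a local copy of the one shown earlier, example_validate_dev_ioctl() is a hypothetical name, the version numbers are placeholders, and the version policy used here (major must match, minor must not be newer) is an assumption, since the text only says that a mismatch fails. It is not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the structure described above, for illustration only. */
struct autofs_dev_ioctl {
	uint32_t ver_major;
	uint32_t ver_minor;
	uint32_t size;
	int32_t ioctlfd;
	uint32_t arg1;
	uint32_t arg2;
	char path[];
};

/* Placeholder version numbers: assumed values, not taken from the text. */
#define EXAMPLE_VERSION_MAJOR 1
#define EXAMPLE_VERSION_MINOR 0

/*
 * Check a parameter block held in a buffer of buf_len bytes.
 * Returns 0 if it passes the checks described above, -EINVAL otherwise.
 */
int example_validate_dev_ioctl(const struct autofs_dev_ioctl *param,
			       size_t buf_len)
{
	assert(param != NULL);

	/* The size may not be smaller than the structure or exceed the buffer. */
	if (buf_len < sizeof(*param) || param->size < sizeof(*param) ||
	    param->size > buf_len)
		return -EINVAL;

	/* Assumed policy: same major version, minor not newer than ours. */
	if (param->ver_major != EXAMPLE_VERSION_MAJOR ||
	    param->ver_minor > EXAMPLE_VERSION_MINOR)
		return -EINVAL;

	/* A size larger than the structure means a path is present. */
	if (param->size > sizeof(*param)) {
		size_t path_len = param->size - sizeof(*param);

		if (param->path[0] != '/')
			return -EINVAL;
		if (memchr(param->path, '\0', path_len) == NULL)
			return -EINVAL;
	}
	return 0;
}
```

Validation of the ioctlfd field is left out because it needs a real descriptor and, as described above, is skipped for some commands.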
| 219 | |||
| 220 | |||
| 221 | The ioctls | ||
| 222 | ========== | ||
| 223 | |||
| 224 | An example of an implementation which uses this interface can be seen | ||
| 225 | in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the | ||
| 226 | distribution tar available for download from kernel.org in directory | ||
| 227 | /pub/linux/daemons/autofs/v5. | ||
| 228 | |||
| 229 | The device node ioctl operations implemented by this interface are: | ||
| 230 | |||
| 231 | |||
| 232 | AUTOFS_DEV_IOCTL_VERSION | ||
| 233 | ------------------------ | ||
| 234 | |||
| 235 | Get the major and minor version of the autofs4 device ioctl kernel module | ||
| 236 | implementation. It requires an initialized struct autofs_dev_ioctl as an | ||
| 237 | input parameter and sets the version information in the passed in structure. | ||
| 238 | It returns 0 on success or the error -EINVAL if a version mismatch is | ||
| 239 | detected. | ||
| 240 | |||
| 241 | |||
| 242 | AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD | ||
| 243 | ------------------------------------------------------------------ | ||
| 244 | |||
| 245 | Get the major and minor version of the autofs4 protocol version understood | ||
| 246 | by the loaded module. This call requires an initialized struct autofs_dev_ioctl | ||
| 247 | with the ioctlfd field set to a valid autofs mount point descriptor | ||
| 248 | and sets the requested version number in structure field arg1. These | ||
| 249 | commands return 0 on success or one of the negative error codes if | ||
| 250 | validation fails. | ||
| 251 | |||
| 252 | |||
| 253 | AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT | ||
| 254 | ---------------------------------------------------------- | ||
| 255 | |||
| 256 | Obtain and release a file descriptor for an autofs managed mount point | ||
| 257 | path. The open call requires an initialized struct autofs_dev_ioctl with | ||
| 258 | the path field set and the size field adjusted appropriately as well | ||
| 259 | as the arg1 field set to the device number of the autofs mount. The | ||
| 260 | device number can be obtained from the mount options shown in | ||
| 261 | /proc/mounts. The close call requires an initialized struct | ||
| 262 | autofs_dev_ioctl with the ioctlfd field set to the descriptor obtained | ||
| 263 | from the open call. The release of the file descriptor can also be done | ||
| 264 | with close(2) so any open descriptors will also be closed at process exit. | ||
| 265 | The close call is included in the implemented operations largely for | ||
| 266 | completeness and to provide for a consistent user space implementation. | ||
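
To make the "path field set and the size field adjusted" requirement concrete, here is a minimal sketch of building the parameter block for an open request. The structure is a local copy of the one shown earlier, and example_alloc_openmount() and the version numbers are hypothetical names and values chosen for illustration. The ioctl(2) call itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the structure described above, for illustration only. */
struct autofs_dev_ioctl {
	uint32_t ver_major;
	uint32_t ver_minor;
	uint32_t size;
	int32_t ioctlfd;
	uint32_t arg1;
	uint32_t arg2;
	char path[];
};

/*
 * Allocate a parameter block with room for the path and fill in the
 * fields an open request needs. Returns NULL if the path does not start
 * with "/" or the allocation fails. The caller releases it with free().
 */
struct autofs_dev_ioctl *example_alloc_openmount(const char *path,
						 uint32_t devid)
{
	struct autofs_dev_ioctl *param;
	size_t path_size, size;

	assert(path != NULL);
	if (path[0] != '/')
		return NULL;

	path_size = strlen(path) + 1;	/* include the terminating NUL */
	size = sizeof(*param) + path_size;

	param = calloc(1, size);
	if (param == NULL)
		return NULL;

	param->ver_major = 1;		/* assumed version numbers */
	param->ver_minor = 0;
	param->size = (uint32_t)size;	/* structure plus path */
	param->ioctlfd = -1;		/* filled in by the open call */
	param->arg1 = devid;		/* device number of the autofs mount */
	memcpy(param->path, path, path_size);
	return param;
}
```

The block would then be passed to ioctl(2) on the control device descriptor with the open mount command. On success the ioctlfd field holds the descriptor to use in later calls.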
| 267 | |||
| 268 | |||
| 269 | AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD | ||
| 270 | -------------------------------------------------------- | ||
| 271 | |||
| 272 | Return mount and expire result status from user space to the kernel. | ||
| 273 | Both of these calls require an initialized struct autofs_dev_ioctl | ||
| 274 | with the ioctlfd field set to the descriptor obtained from the open | ||
| 275 | call and the arg1 field set to the wait queue token number, received | ||
| 276 | by user space in the foregoing mount or expire request. The arg2 field | ||
| 277 | is set to the status to be returned. For the ready call this is always | ||
| 278 | 0 and for the fail call it is set to the errno of the operation. | ||
| 279 | |||
| 280 | |||
| 281 | AUTOFS_DEV_IOCTL_SETPIPEFD_CMD | ||
| 282 | ------------------------------ | ||
| 283 | |||
| 284 | Set the pipe file descriptor used for kernel communication to the daemon. | ||
| 285 | Normally this is set at mount time using an option but when reconnecting | ||
| 286 | to an existing mount we need to use this to tell the autofs mount about | ||
| 287 | the new kernel pipe descriptor. In order to protect mounts against | ||
| 288 | incorrectly setting the pipe descriptor we also require that the autofs | ||
| 289 | mount be catatonic (see next call). | ||
| 290 | |||
| 291 | The call requires an initialized struct autofs_dev_ioctl with the | ||
| 292 | ioctlfd field set to the descriptor obtained from the open call and | ||
| 293 | the arg1 field set to the descriptor of the pipe. On success the call | ||
| 294 | also sets the process group id used to identify the controlling process | ||
| 295 | (e.g. the owning automount(8) daemon) to the process group of the caller. | ||
| 296 | |||
| 297 | |||
| 298 | AUTOFS_DEV_IOCTL_CATATONIC_CMD | ||
| 299 | ------------------------------ | ||
| 300 | |||
| 301 | Make the autofs mount point catatonic. The autofs mount will no longer | ||
| 302 | issue mount requests, the kernel communication pipe descriptor is released | ||
| 303 | and any remaining waits in the queue are released. | ||
| 304 | |||
| 305 | The call requires an initialized struct autofs_dev_ioctl with the | ||
| 306 | ioctlfd field set to the descriptor obtained from the open call. | ||
| 307 | |||
| 308 | |||
| 309 | AUTOFS_DEV_IOCTL_TIMEOUT_CMD | ||
| 310 | ---------------------------- | ||
| 311 | |||
| 312 | Set the expire timeout for mounts within an autofs mount point. | ||
| 313 | |||
| 314 | The call requires an initialized struct autofs_dev_ioctl with the | ||
| 315 | ioctlfd field set to the descriptor obtained from the open call. | ||
| 316 | |||
| 317 | |||
| 318 | AUTOFS_DEV_IOCTL_REQUESTER_CMD | ||
| 319 | ------------------------------ | ||
| 320 | |||
| 321 | Return the uid and gid of the last process to successfully trigger the | ||
| 322 | mount on the given path dentry. | ||
| 323 | |||
| 324 | The call requires an initialized struct autofs_dev_ioctl with the path | ||
| 325 | field set to the mount point in question and the size field adjusted | ||
| 326 | appropriately as well as the arg1 field set to the device number of the | ||
| 327 | containing autofs mount. Upon return the struct field arg1 contains the | ||
| 328 | uid and arg2 the gid. | ||
| 329 | |||
| 330 | When reconstructing an autofs mount tree with active mounts we need to | ||
| 331 | re-connect to mounts that may have used the original process uid and | ||
| 332 | gid (or string variations of them) for mount lookups within the map entry. | ||
| 333 | This call provides the ability to obtain this uid and gid so they may be | ||
| 334 | used by user space for the mount map lookups. | ||
| 335 | |||
| 336 | |||
| 337 | AUTOFS_DEV_IOCTL_EXPIRE_CMD | ||
| 338 | --------------------------- | ||
| 339 | |||
| 340 | Issue an expire request to the kernel for an autofs mount. Typically | ||
| 341 | this ioctl is called until no further expire candidates are found. | ||
| 342 | |||
| 343 | The call requires an initialized struct autofs_dev_ioctl with the | ||
| 344 | ioctlfd field set to the descriptor obtained from the open call. In | ||
| 345 | addition an immediate expire, independent of the mount timeout, can be | ||
| 346 | requested by setting the arg1 field to 1. If no expire candidates can | ||
| 347 | be found the ioctl returns -1 with errno set to EAGAIN. | ||
| 348 | |||
| 349 | This call causes the kernel module to check the mount corresponding | ||
| 350 | to the given ioctlfd for mounts that can be expired, issues an expire | ||
| 351 | request back to the daemon and waits for completion. | ||
| 352 | |||
| 353 | AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD | ||
| 354 | ------------------------------ | ||
| 355 | |||
| 356 | Checks if an autofs mount point is in use. | ||
| 357 | |||
| 358 | The call requires an initialized struct autofs_dev_ioctl with the | ||
| 359 | ioctlfd field set to the descriptor obtained from the open call and | ||
| 360 | it returns the result in the arg1 field, 1 for busy and 0 otherwise. | ||
| 361 | |||
| 362 | |||
| 363 | AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD | ||
| 364 | --------------------------------- | ||
| 365 | |||
| 366 | Check if the given path is a mountpoint. | ||
| 367 | |||
| 368 | The call requires an initialized struct autofs_dev_ioctl. There are two | ||
| 369 | possible variations. Both use the path field set to the path of the mount | ||
| 370 | point to check and the size field adjusted appropriately. One uses the | ||
| 371 | ioctlfd field to identify a specific mount point to check while the other | ||
| 372 | variation uses the path and optionally arg1 set to an autofs mount type. | ||
| 373 | The call returns 1 if this is a mount point and sets arg1 to the device | ||
| 374 | number of the mount and field arg2 to the relevant super block magic | ||
| 375 | number (described below) or 0 if it isn't a mountpoint. In both cases | ||
| 376 | the device number (as returned by new_encode_dev()) is returned | ||
| 377 | in field arg1. | ||
| 378 | |||
| 379 | If supplied with a file descriptor we're looking for a specific mount, | ||
| 380 | not necessarily at the top of the mounted stack. In this case the path | ||
| 381 | the descriptor corresponds to is considered a mountpoint if it is itself | ||
| 382 | a mountpoint or contains a mount, such as a multi-mount without a root | ||
| 383 | mount. In this case we return 1 if the descriptor corresponds to a mount | ||
| 384 | point and also return the super magic of the covering mount if there | ||
| 385 | is one or 0 if it isn't a mountpoint. | ||
| 386 | |||
| 387 | If a path is supplied (and the ioctlfd field is set to -1) then the path | ||
| 388 | is looked up and is checked to see if it is the root of a mount. If a | ||
| 389 | type is also given we are looking for a particular autofs mount and if | ||
| 390 | a match isn't found a fail is returned. If the located path is the | ||
| 391 | root of a mount 1 is returned along with the super magic of the mount | ||
| 392 | or 0 otherwise. | ||
| 393 | |||
