Diffstat (limited to 'Documentation/filesystems/devfs/README')
-rw-r--r-- | Documentation/filesystems/devfs/README | 1964 |
1 files changed, 1964 insertions, 0 deletions
diff --git a/Documentation/filesystems/devfs/README b/Documentation/filesystems/devfs/README new file mode 100644 index 000000000000..54366ecc241f --- /dev/null +++ b/Documentation/filesystems/devfs/README | |||
@@ -0,0 +1,1964 @@ | |||
1 | Devfs (Device File System) FAQ | ||
2 | |||
3 | |||
4 | Linux Devfs (Device File System) FAQ | ||
5 | Richard Gooch | ||
6 | 20-AUG-2002 | ||
7 | |||
8 | |||
17 | ----------------------------------------------------------------------------- | ||
18 | |||
19 | NOTE: the master copy of this document is available online at: | ||
20 | |||
21 | http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html | ||
22 | and looks much better than the text version distributed with the | ||
23 | kernel sources. A mirror site is available at: | ||
24 | |||
25 | http://www.ras.ucalgary.ca/~rgooch/linux/docs/devfs.html | ||
26 | |||
27 | There is also an optional daemon that may be used with devfs. You can | ||
28 | find out more about it at: | ||
29 | |||
30 | http://www.atnf.csiro.au/~rgooch/linux/ | ||
31 | |||
32 | A mailing list is available which you may subscribe to. Send | ||
33 | email | ||
34 | to majordomo@oss.sgi.com with the following line in the | ||
35 | body of the message: | ||
36 | subscribe devfs | ||
37 | To unsubscribe, send the message body: | ||
38 | unsubscribe devfs | ||
39 | instead. The list is archived at | ||
40 | |||
41 | http://oss.sgi.com/projects/devfs/archive/. | ||
42 | |||
43 | ----------------------------------------------------------------------------- | ||
44 | |||
45 | Contents | ||
46 | |||
47 | |||
48 | What is it? | ||
49 | |||
50 | Why do it? | ||
51 | |||
52 | Who else does it? | ||
53 | |||
54 | How it works | ||
55 | |||
56 | Operational issues (essential reading) | ||
57 | |||
58 | Instructions for the impatient | ||
59 | Permissions persistence across reboots | ||
60 | Dealing with drivers without devfs support | ||
61 | All the way with Devfs | ||
62 | Other Issues | ||
63 | Kernel Naming Scheme | ||
64 | Devfsd Naming Scheme | ||
65 | Old Compatibility Names | ||
66 | SCSI Host Probing Issues | ||
67 | |||
68 | |||
69 | |||
70 | Device drivers currently ported | ||
71 | |||
72 | Allocation of Device Numbers | ||
73 | |||
74 | Questions and Answers | ||
75 | |||
76 | Making things work | ||
77 | Alternatives to devfs | ||
78 | What I don't like about devfs | ||
79 | How to report bugs | ||
80 | Strange kernel messages | ||
81 | Compilation problems with devfsd | ||
82 | |||
83 | |||
84 | Other resources | ||
85 | |||
86 | Translations of this document | ||
87 | |||
88 | |||
89 | ----------------------------------------------------------------------------- | ||
90 | |||
91 | |||
92 | What is it? | ||
93 | |||
94 | Devfs is an alternative to "real" character and block special devices | ||
95 | on your root filesystem. Kernel device drivers can register devices by | ||
96 | name rather than major and minor numbers. These devices will appear in | ||
97 | devfs automatically, with whatever default ownership and | ||
98 | protection the driver specified. A daemon (devfsd) can be used to | ||
99 | override these defaults. Devfs has been in the kernel since 2.3.46. | ||
100 | |||
101 | NOTE that devfs is entirely optional. If you prefer the old | ||
102 | disc-based device nodes, then simply leave CONFIG_DEVFS_FS=n (the | ||
103 | default). In this case, nothing will change. ALSO NOTE that if you do | ||
104 | enable devfs, the defaults are such that full compatibility is | ||
105 | maintained with the old device names. | ||
106 | |||
107 | There are two aspects to devfs: one is the underlying device | ||
108 | namespace, which is a namespace just like any mounted filesystem. The | ||
109 | other aspect is the filesystem code which provides a view of the | ||
110 | device namespace. The reason I make a distinction is because devfs | ||
111 | can be mounted many times, with each mount showing the same device | ||
112 | namespace. Changes made are global to all mounted devfs filesystems. | ||
113 | Also, because the devfs namespace exists without any devfs mounts, you | ||
114 | can easily mount the root filesystem by referring to an entry in the | ||
115 | devfs namespace. | ||
116 | |||
117 | |||
118 | The cost of devfs is a small increase in kernel code size and memory | ||
119 | usage: about 7 pages of code (some of that in __init sections) and 72 | ||
120 | bytes for each entry in the namespace. A modest system has only a | ||
121 | couple of hundred device entries, so this costs a few more | ||
122 | pages. Compare this with the suggestion to put /dev on a | ||
123 | ramdisc. | ||
124 | |||
125 | On a typical machine, the cost is under 0.2 percent. On a modest | ||
126 | system with 64 MBytes of RAM, the cost is under 0.1 percent. The | ||
127 | accusations of "bloatware" levelled at devfs are not justified. | ||
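The figures above can be checked with a little arithmetic. The following is only a rough sketch: the 4 KiB page size is an assumption (typical for x86), and 200 entries stands in for "a couple of hundred".

```python
# Back-of-the-envelope devfs memory cost, using the figures quoted above.
PAGE_SIZE = 4096        # assumption: 4 KiB pages, as on x86
CODE_PAGES = 7          # "about 7 pages of code"
BYTES_PER_ENTRY = 72    # "72 bytes for each entry in the namespace"


def devfs_cost_bytes(num_entries):
    """Estimated devfs footprint in bytes for num_entries device entries."""
    return CODE_PAGES * PAGE_SIZE + num_entries * BYTES_PER_ENTRY


def cost_percent(num_entries, ram_bytes):
    """The same footprint expressed as a percentage of total RAM."""
    return 100.0 * devfs_cost_bytes(num_entries) / ram_bytes


ram = 64 * 1024 * 1024   # the 64 MByte system mentioned in the text
entries = 200            # assumption: "a couple of hundred" entries

print(devfs_cost_bytes(entries))              # 43072
print(round(cost_percent(entries, ram), 3))   # 0.064
```

With these assumptions the total is about 42 kBytes, which is consistent with the "under 0.1 percent" claim for a 64 MByte machine.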
128 | |||
129 | ----------------------------------------------------------------------------- | ||
130 | |||
131 | |||
132 | Why do it? | ||
133 | |||
134 | There are several problems that devfs addresses. Some of these | ||
135 | problems are more serious than others (depending on your point of | ||
136 | view), and some can be solved without devfs. However, the totality of | ||
137 | these problems really calls out for devfs. | ||
138 | |||
139 | The choice is a patchwork of inefficient user space solutions, which | ||
140 | are complex and likely to be fragile, or to use a simple and efficient | ||
141 | devfs which is robust. | ||
142 | |||
143 | There have been many counter-proposals to devfs, all seeking to | ||
144 | provide some of the benefits without actually implementing devfs. So | ||
145 | far there has been an absence of code and no proposed alternative has | ||
146 | been able to provide all the features that devfs does. Further, | ||
147 | alternative proposals require far more complexity in user-space (and | ||
148 | still deliver less functionality than devfs). Some people have the | ||
149 | mantra of reducing "kernel bloat", but don't consider the effects on | ||
150 | user-space. | ||
151 | |||
152 | A good solution limits the total complexity of kernel-space and | ||
153 | user-space. | ||
154 | |||
155 | |||
156 | Major&minor allocation | ||
157 | |||
158 | The existing scheme requires the allocation of major and minor device | ||
159 | numbers for each and every device. This means that a central | ||
160 | co-ordinating authority is required to issue these device numbers | ||
161 | (unless you're developing a "private" device driver), in order to | ||
162 | preserve uniqueness. Devfs shifts the burden to a namespace. This may | ||
163 | not seem like a huge benefit, but actually it is. Since driver authors | ||
164 | will naturally choose a device name which reflects the functionality | ||
165 | of the device, there is far less potential for namespace conflict. | ||
166 | Solving this requires a kernel change. | ||
167 | |||
168 | /dev management | ||
169 | |||
170 | Because you currently access devices through device nodes, these must | ||
171 | be created by the system administrator. For standard devices you can | ||
172 | usually find a MAKEDEV programme which creates all these (hundreds!) | ||
173 | of nodes. This means that changes in the kernel must be reflected by | ||
174 | changes in the MAKEDEV programme, or else the system administrator | ||
175 | creates device nodes by hand. | ||
176 | |||
177 | The basic problem is that there are two separate databases of | ||
178 | major and minor numbers. One is in the kernel and one is in /dev (or | ||
179 | in a MAKEDEV programme, if you want to look at it that way). This is | ||
180 | duplication of information, which is not good practice. | ||
181 | Solving this requires a kernel change. | ||
182 | |||
183 | /dev growth | ||
184 | |||
185 | A typical /dev has over 1200 nodes! Most of these devices simply don't | ||
186 | exist because the hardware is not available. A huge /dev increases the | ||
187 | time to access devices (I'm just referring to the dentry lookup times | ||
188 | and the time taken to read inodes off disc: the next subsection shows | ||
189 | some more horrors). | ||
190 | |||
191 | An example of how big /dev can grow is if we consider SCSI devices: | ||
192 | |||
193 | host 6 bits (say up to 64 hosts on a really big machine) | ||
194 | channel 4 bits (say up to 16 SCSI buses per host) | ||
195 | id 4 bits | ||
196 | lun 3 bits | ||
197 | partition 6 bits | ||
198 | TOTAL 23 bits | ||
199 | |||
200 | |||
201 | This requires 8 Mega (8*1024*1024) inodes if we want to store all | ||
202 | possible device nodes. Even if we scrap everything but id,partition | ||
203 | and assume a single host adapter with a single SCSI bus and only one | ||
204 | logical unit per SCSI target (id), that's still 10 bits or 1024 | ||
205 | inodes. Each VFS inode takes around 256 bytes (kernel 2.1.78), so | ||
206 | that's 256 kBytes of inode storage on disc (assuming real inodes take | ||
207 | a similar amount of space as VFS inodes). This is actually not so bad, | ||
208 | because disc is cheap these days. Embedded systems would care about | ||
209 | 256 kBytes of /dev inodes, but you could argue that embedded systems | ||
210 | would have hand-tuned /dev directories. I've had to do just that on my | ||
211 | embedded systems, but I would rather just leave it to devfs. | ||
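The numbers in this subsection follow directly from the bit widths in the table. A small sketch of the calculation, using only the field widths and the 256 byte inode size quoted above:

```python
# Bit widths of each SCSI address component, as listed in the table above.
SCSI_FIELD_BITS = {"host": 6, "channel": 4, "id": 4, "lun": 3, "partition": 6}
VFS_INODE_BYTES = 256   # "each VFS inode takes around 256 bytes"


def node_count(fields):
    """Number of distinct device nodes addressable with the given fields."""
    bits = sum(SCSI_FIELD_BITS[name] for name in fields)
    return 1 << bits


all_nodes = node_count(SCSI_FIELD_BITS)           # every field: 23 bits
reduced_nodes = node_count(["id", "partition"])   # one host, one bus, one LUN

print(all_nodes)                                  # 8388608
print(reduced_nodes)                              # 1024
print(reduced_nodes * VFS_INODE_BYTES // 1024)    # 256
```

The last line is the 256 kBytes of inode storage mentioned above, for the reduced case only.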
212 | |||
213 | Another issue is the time taken to look up an inode when first | ||
214 | referenced. Not only does this take time in scanning through a list in | ||
215 | memory, but also the seek times to read the inodes off disc. | ||
216 | This could be solved in user-space using a clever programme which | ||
217 | scanned the kernel logs and deleted /dev entries which are not | ||
218 | available and created them when they were available. This programme | ||
219 | would need to be run every time a new module was loaded, which would | ||
220 | slow things down a lot. | ||
221 | |||
222 | There is an existing programme called scsidev which will automatically | ||
223 | create device nodes for SCSI devices. It can do this by scanning files | ||
224 | in /proc/scsi. Unfortunately, to extend this idea to other device | ||
225 | nodes would require significant modifications to existing drivers (so | ||
226 | they too would provide information in /proc). This is a non-trivial | ||
227 | change (I should know: devfs has had to do something similar). Once | ||
228 | you go to this much effort, you may as well use devfs itself (which | ||
229 | also provides this information). Furthermore, such a system would | ||
230 | likely be implemented in an ad-hoc fashion, as different drivers will | ||
231 | provide their information in different ways. | ||
232 | |||
233 | Devfs is much cleaner, because it (naturally) has a uniform mechanism | ||
234 | to provide this information: the device nodes themselves! | ||
235 | |||
236 | |||
237 | Node to driver file_operations translation | ||
238 | |||
239 | There is an important difference between the way disc-based character | ||
240 | and block nodes and devfs entries make the connection between an entry | ||
241 | in /dev and the actual device driver. | ||
242 | |||
243 | With the current 8 bit major and minor numbers the connection between | ||
244 | disc-based c&b nodes and per-major drivers is done through a | ||
245 | fixed-length table of 128 entries. The various filesystem types set | ||
246 | the inode operations for c&b nodes to {chr,blk}dev_inode_operations, | ||
247 | so when a device is opened a few quick levels of indirection bring us | ||
248 | to the driver file_operations. | ||
249 | |||
250 | For miscellaneous character devices a second step is required: there | ||
251 | is a scan for the driver entry with the same minor number as the file | ||
252 | that was opened, and the appropriate minor open method is called. This | ||
253 | scanning is done *every time* you open a device node. Potentially, you | ||
254 | may be searching through dozens of misc. entries before you find your | ||
255 | open method. While not an enormous performance overhead, this does | ||
256 | seem pointless. | ||
257 | |||
258 | Linux *must* move beyond the 8 bit major and minor barrier, | ||
259 | somehow. If we simply increase each to 16 bits, then the indexing | ||
260 | scheme used for major driver lookup becomes untenable, because the | ||
261 | major tables (one each for character and block devices) would need to | ||
262 | be 64 k entries long (512 kBytes on x86, 1 MByte for 64 bit | ||
263 | systems). So we would have to use a scheme like that used for | ||
264 | miscellaneous character devices, which means the search time goes up | ||
265 | linearly with the average number of major device drivers on your | ||
266 | system. Not all "devices" are hardware; some are higher-level drivers | ||
267 | like KGI, so you can get more "devices" without adding hardware. | ||
268 | You can improve this by creating an ordered (balanced:-) | ||
269 | binary tree, in which case your search time becomes log(N). | ||
270 | Alternatively, you can use hashing to speed up the search. | ||
271 | But why do that search at all if you don't have to? Once again, it | ||
272 | seems pointless. | ||
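The table sizes quoted above can be reproduced as follows. This sketch assumes each table entry holds two pointers (a driver name and a file_operations pointer); that layout is an assumption made for the arithmetic, not something stated in this document.

```python
def major_table_bytes(major_bits, pointer_bytes, pointers_per_entry=2):
    """Size of a directly indexed major table.

    pointers_per_entry=2 is an assumption: one pointer for the driver
    name and one for its file_operations.
    """
    return (1 << major_bits) * pointers_per_entry * pointer_bytes


print(major_table_bytes(16, 4) // 1024)   # 512  (kBytes, 32 bit pointers)
print(major_table_bytes(16, 8) // 1024)   # 1024 (kBytes, 64 bit pointers)
```

Remember that there are two such tables, one for character devices and one for block devices.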
273 | |||
274 | Note that devfs doesn't use the major&minor system. For devfs | ||
275 | entries, the connection is made when you look up the /dev entry. When | ||
276 | devfs_register() is called, an entry holding the entry name and the | ||
277 | file_operations is appended to an internal table. If the dentry cache doesn't | ||
278 | have the /dev entry already, this internal table is scanned to get the | ||
279 | file_operations, and an inode is created. If the dentry cache already | ||
280 | has the entry, there is *no lookup time* (other than the dentry scan | ||
281 | itself, but we can't avoid that anyway, and besides Linux dentries | ||
282 | cream other OS's which don't have them:-). Furthermore, the number of | ||
283 | node entries in a devfs is only the number of available device | ||
284 | entries, not the number of *conceivable* entries. Even if you remove | ||
285 | unnecessary entries in a disc-based /dev, the number of conceivable | ||
286 | entries remains the same: you just limit yourself in order to save | ||
287 | space. | ||
288 | |||
289 | Devfs provides a fast connection between a VFS node and the device | ||
290 | driver, in a scalable way. | ||
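To make the difference concrete, here is a toy user-space model in Python (not kernel code) of the two lookup styles described above: a per-open linear scan, as used for miscellaneous character devices, and a name table that is only scanned when the dentry cache misses. The class names, the step counting and the device names are all hypothetical and exist only for illustration.

```python
class MiscList:
    """Linear list of (minor, fops) pairs, scanned on every open."""

    def __init__(self):
        self.entries = []

    def register(self, minor, fops):
        self.entries.append((minor, fops))

    def lookup(self, minor):
        steps = 0
        for entry_minor, fops in self.entries:
            steps += 1
            if entry_minor == minor:
                return fops, steps
        return None, steps


class ToyDevfsTable:
    """Name table scanned only when the (modelled) dentry cache misses."""

    def __init__(self):
        self.entries = []   # (name, fops) pairs appended by register()
        self.dcache = {}    # stands in for the VFS dentry cache

    def register(self, name, fops):
        self.entries.append((name, fops))

    def lookup(self, name):
        if name in self.dcache:
            return self.dcache[name], 0      # cached: no table scan at all
        steps = 0
        for entry_name, fops in self.entries:
            steps += 1
            if entry_name == name:
                self.dcache[name] = fops
                return fops, steps
        return None, steps


misc, devfs = MiscList(), ToyDevfsTable()
for n in range(30):
    misc.register(n, "fops%d" % n)
    devfs.register("demo/%d" % n, "fops%d" % n)

print(misc.lookup(29)[1], misc.lookup(29)[1])             # 30 30
print(devfs.lookup("demo/29")[1], devfs.lookup("demo/29")[1])   # 30 0
```

The scan cost is paid on every open in the first model, but only once per entry in the second.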
291 | |||
292 | /dev as a system administration tool | ||
293 | |||
294 | Right now /dev contains a list of conceivable devices, most of which I | ||
295 | don't have. Devfs only shows those devices available on my | ||
296 | system. This means that listing /dev is a handy way of checking what | ||
297 | devices are available. | ||
298 | |||
299 | Major&minor size | ||
300 | |||
301 | Existing major and minor numbers are limited to 8 bits each. This is | ||
302 | now a limiting factor for some drivers, particularly the SCSI disc | ||
303 | driver, which consumes a single major number. Only 16 discs are | ||
304 | supported, and each disc may have only 15 partitions. Maybe this isn't | ||
305 | a problem for you, but some of us are building huge Linux systems with | ||
306 | disc arrays. With devfs an arbitrary pointer can be associated with | ||
307 | each device entry, which can be used to give an effective 32 bit | ||
308 | device identifier (i.e. that's like having a 32 bit minor | ||
309 | number). Since this is private to the kernel, there are no C library | ||
310 | compatibility issues which you would have with increasing major and | ||
311 | minor number sizes. See the section on "Allocation of Device Numbers" | ||
312 | for details on maintaining compatibility with userspace. | ||
313 | |||
314 | Solving this requires a kernel change. | ||
315 | |||
316 | Since writing this, the kernel has been modified so that the SCSI disc | ||
317 | driver has more major numbers allocated to it and now supports up to | ||
318 | 128 discs. Since these major numbers are non-contiguous (a result of | ||
319 | unplanned expansion), the implementation is a little more cumbersome | ||
320 | than originally. | ||
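The limits quoted in this subsection fall out of how an 8 bit minor number is split between disc and partition. The sketch below assumes a 4 bit partition field, which is what "16 discs with 15 partitions each" implies (partition 0 being the whole disc); the function names are made up for illustration.

```python
MINOR_BITS = 8        # existing minor numbers are 8 bits
PARTITION_BITS = 4    # assumption: 16 minors per disc (whole disc + 15 partitions)


def discs_per_major():
    """How many discs fit under a single major number."""
    return (1 << MINOR_BITS) >> PARTITION_BITS


def partitions_per_disc():
    """Usable partitions per disc; minor offset 0 is the whole disc."""
    return (1 << PARTITION_BITS) - 1


def sd_minor(disc_in_major, partition):
    """Minor number for a partition of the Nth disc under one major."""
    return (disc_in_major << PARTITION_BITS) | partition


def majors_needed(num_discs):
    """Major numbers required to address num_discs discs (rounded up)."""
    return -(-num_discs // discs_per_major())


print(discs_per_major())       # 16
print(partitions_per_disc())   # 15
print(sd_minor(1, 2))          # 18
print(majors_needed(128))      # 8
```

So supporting 128 discs under this scheme takes eight major numbers, which is why the expansion had to allocate several additional majors.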
321 | |||
322 | Just like the changes to IPv4 to fix impending limitations in the | ||
323 | address space, people find ways around the limitations. In the long | ||
324 | run, however, solutions like IPv6 or devfs can't be put off forever. | ||
325 | |||
326 | Read-only root filesystem | ||
327 | |||
328 | Having your device nodes on the root filesystem means that you can't | ||
329 | operate properly with a read-only root filesystem. This is because you | ||
330 | want to change ownerships and protections of tty devices. Existing | ||
331 | practice prevents you using a CD-ROM as your root filesystem for a | ||
332 | *real* system. Sure, you can boot off a CD-ROM, but you can't change | ||
333 | tty ownerships, so it's only good for installing. | ||
334 | |||
335 | Also, you can't use a shared NFS root filesystem for a cluster of | ||
336 | discless Linux machines (having tty ownerships changed on a common | ||
337 | /dev is not good). Nor can you embed your root filesystem in a | ||
338 | ROM-FS. | ||
339 | |||
340 | You can get around this by creating a RAMDISC at boot time, making | ||
341 | an ext2 filesystem in it, mounting it somewhere and copying the | ||
342 | contents of /dev into it, then unmounting it and mounting it over | ||
343 | /dev. | ||
344 | |||
345 | A devfs is a cleaner way of solving this. | ||
346 | |||
347 | Non-Unix root filesystem | ||
348 | |||
349 | Non-Unix filesystems (such as NTFS) can't be used for a root | ||
350 | filesystem because they variously don't support character and block | ||
351 | special files or symbolic links. You can't have a separate disc-based | ||
352 | or RAMDISC-based filesystem mounted on /dev because you need device | ||
353 | nodes before you can mount these. Devfs can be mounted without any | ||
354 | device nodes. Devlinks won't work because symlinks aren't supported. | ||
355 | An alternative solution is to use initrd to mount a RAMDISC initial | ||
356 | root filesystem (which is populated with a minimal set of device | ||
357 | nodes), and then construct a new /dev in another RAMDISC, and finally | ||
358 | switch to your non-Unix root filesystem. This requires clever boot | ||
359 | scripts and a fragile and conceptually complex boot procedure. | ||
360 | |||
361 | Devfs solves this in a robust and conceptually simple way. | ||
362 | |||
363 | PTY security | ||
364 | |||
365 | Current pseudo-tty (pty) devices are owned by root and read-writable | ||
366 | by everyone. The user of a pty-pair cannot change | ||
367 | ownership/protections without being suid-root. | ||
368 | |||
369 | This could be solved with a secure user-space daemon which runs as | ||
370 | root and does the actual creation of pty-pairs. Such a daemon would | ||
371 | require modification to *every* programme that wants to use this new | ||
372 | mechanism. It also slows down creation of pty-pairs. | ||
373 | |||
374 | An alternative is to create a new open_pty() syscall which does much | ||
375 | the same thing as the user-space daemon. Once again, this requires | ||
376 | modifications to pty-handling programmes. | ||
377 | |||
378 | The devfs solution allows a device driver to "tag" certain device | ||
379 | files so that when an unopened device is opened, the ownerships are | ||
380 | changed to the current euid and egid of the opening process, and the | ||
381 | protections are changed to the default registered by the driver. When | ||
382 | the device is closed ownership is set back to root and protections are | ||
383 | set back to read-write for everybody. No programme need be changed. | ||
384 | The devpts filesystem provides this auto-ownership feature for Unix98 | ||
385 | ptys. It doesn't support old-style pty devices, nor does it have all | ||
386 | the other features of devfs. | ||
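The auto-ownership behaviour described above can be pictured as a small state machine. This is a toy Python model, not the devfs implementation; in particular, resetting ownership only on the last close is an assumption of the sketch.

```python
class AutoOwnerNode:
    """Toy model of a device entry "tagged" for automatic ownership."""

    def __init__(self, default_mode):
        self.default_mode = default_mode   # default registered by the driver
        self.open_count = 0
        self._reset()

    def _reset(self):
        # Idle state: owned by root, read-write for everybody.
        self.uid, self.gid, self.mode = 0, 0, 0o666

    def open(self, euid, egid):
        if self.open_count == 0:
            # First open of an unopened device: hand it to the opener.
            self.uid, self.gid, self.mode = euid, egid, self.default_mode
        self.open_count += 1

    def close(self):
        self.open_count -= 1
        if self.open_count == 0:   # assumption: reset on the last close only
            self._reset()


pty = AutoOwnerNode(default_mode=0o600)
pty.open(euid=1000, egid=100)
print(pty.uid, pty.gid, oct(pty.mode))   # 1000 100 0o600
pty.close()
print(pty.uid, pty.gid, oct(pty.mode))   # 0 0 0o666
```

No application has to change its behaviour for this to work: the ownership change is a side effect of the ordinary open and close calls.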
387 | |||
388 | Intelligent device management | ||
389 | |||
390 | Devfs implements a simple yet powerful protocol for communication with | ||
391 | a device management daemon (devfsd) which runs in user space. It is | ||
392 | possible to send a message (either synchronously or asynchronously) to | ||
393 | devfsd on any event, such as registration/unregistration of device | ||
394 | entries, opening and closing devices, looking up inodes, scanning | ||
395 | directories and more. This has many possibilities. Some of these are | ||
396 | already implemented. See: | ||
397 | |||
398 | |||
399 | http://www.atnf.csiro.au/~rgooch/linux/ | ||
400 | |||
401 | Device entry registration events can be used by devfsd to change | ||
402 | permissions of newly-created device nodes. This is one mechanism to | ||
403 | control device permissions. | ||
404 | |||
405 | Device entry registration/unregistration events can be used to run | ||
406 | programmes or scripts. This can be used to provide automatic mounting | ||
407 | of filesystems when new block device media is inserted into the | ||
408 | drive. | ||
409 | |||
410 | Asynchronous device open and close events can be used to implement | ||
411 | clever permissions management. For example, the default permissions on | ||
412 | /dev/dsp do not allow everybody to read from the device. This is | ||
413 | sensible, as you don't want some remote user recording what you say at | ||
414 | your console. However, the console user is also prevented from | ||
415 | recording. This behaviour is not desirable. With asynchronous device | ||
416 | open and close events, you can have devfsd run a programme or script | ||
417 | when console devices are opened to change the ownerships for *other* | ||
418 | device nodes (such as /dev/dsp). On closure, you can run a different | ||
419 | script to restore permissions. An advantage of this scheme over | ||
420 | modifying the C library tty handling is that this works even if your | ||
421 | programme crashes (how many times have you seen the utmp database with | ||
422 | lingering entries for non-existent logins?). | ||
423 | |||
424 | Synchronous device open events can be used to perform intelligent | ||
425 | device access protections. Before the device driver open() method is | ||
426 | called, the daemon must first validate the open attempt, by running an | ||
427 | external programme or script. This is far more flexible than access | ||
428 | control lists, as access can be determined on the basis of other | ||
429 | system conditions instead of just the UID and GID. | ||
430 | |||
431 | Inode lookup events can be used to authenticate module autoload | ||
432 | requests. Instead of using kmod directly, the event is sent to | ||
433 | devfsd which can implement an arbitrary authentication before loading | ||
434 | the module itself. | ||
435 | |||
436 | Inode lookup events can also be used to construct arbitrary | ||
437 | namespaces, without having to resort to populating devfs with symlinks | ||
438 | to devices that don't exist. | ||
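The difference between asynchronous and synchronous events can be sketched with a toy dispatcher. This is a Python model of the idea only: the class, the event labels ("register", "open") and the uid value are hypothetical, and none of it reflects the real devfsd protocol or its configuration syntax.

```python
import re


class ToyDevfsd:
    """Toy model: routes (event, device name) pairs to Python callbacks."""

    def __init__(self):
        self.rules = []   # (event, compiled pattern, handler)

    def on(self, event, pattern, handler):
        self.rules.append((event, re.compile(pattern), handler))

    def _matching(self, event, devname):
        return [h for ev, rx, h in self.rules
                if ev == event and rx.search(devname)]

    def notify(self, event, devname, uid=0):
        """Asynchronous style: run the handlers, ignore their results."""
        for handler in self._matching(event, devname):
            handler(devname, uid)

    def validate(self, event, devname, uid=0):
        """Synchronous style: proceed only if every handler approves."""
        return all(h(devname, uid) for h in self._matching(event, devname))


perms = {}
CONSOLE_UID = 1000   # hypothetical uid of the console user


def set_sound_perms(devname, uid):
    perms[devname] = 0o660


def only_console_user(devname, uid):
    return uid == CONSOLE_UID


daemon = ToyDevfsd()
daemon.on("register", r"^sound/", set_sound_perms)
daemon.on("open", r"^sound/dsp$", only_console_user)

daemon.notify("register", "sound/dsp")
print(oct(perms["sound/dsp"]))                          # 0o660
print(daemon.validate("open", "sound/dsp", uid=1000))   # True
print(daemon.validate("open", "sound/dsp", uid=1001))   # False
```

The point of the synchronous form is that the decision can depend on any system condition the handler cares to inspect, not just the UID and GID.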
439 | |||
440 | Speculative Device Scanning | ||
441 | |||
442 | Consider an application (like cdparanoia) that wants to find all | ||
443 | CD-ROM devices on the system (SCSI, IDE and other types), whether or | ||
444 | not their respective modules are loaded. The application must | ||
445 | speculatively open certain device nodes (such as /dev/sr0 for the SCSI | ||
446 | CD-ROMs) in order to make sure the module is loaded. This requires | ||
447 | that all Linux distributions follow the standard device naming scheme | ||
448 | (last time I looked RedHat did things differently). Devfs solves the | ||
449 | naming problem. | ||
450 | |||
451 | The same application also wants to see which devices are actually | ||
452 | available on the system. With the existing system it needs to read the | ||
453 | /dev directory and speculatively open each /dev/sr* device to | ||
454 | determine if the device exists or not. With a large /dev this is an | ||
455 | inefficient operation, especially if there are many /dev/sr* nodes. A | ||
456 | solution like scsidev could reduce the number of /dev/sr* entries (but | ||
457 | of course that also requires all that inefficient directory scanning). | ||
458 | |||
459 | With devfs, the application can open the /dev/sr directory | ||
460 | (which triggers the module autoloading if required), and proceed to | ||
461 | read /dev/sr. Since only the available devices will have | ||
462 | entries, there are no inefficiencies in directory scanning or device | ||
463 | openings. | ||
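The two scanning strategies can be compared with a short user-space sketch. It uses a temporary directory with ordinary files standing in for device nodes, so it only illustrates the number of operations involved; the function names and the limit of 16 candidate names are assumptions of the example.

```python
import os
import tempfile


def probe_by_open(dev_dir, candidates):
    """Old style: try to open every conceivable node and count the attempts."""
    found, attempts = [], 0
    for name in candidates:
        attempts += 1
        try:
            fd = os.open(os.path.join(dev_dir, name), os.O_RDONLY)
        except OSError:
            continue           # node missing or device not present
        os.close(fd)
        found.append(name)
    return found, attempts


def probe_by_listing(dev_dir):
    """Devfs style: the directory lists only the devices that exist."""
    return sorted(os.listdir(dev_dir))


with tempfile.TemporaryDirectory() as dev_dir:
    for name in ("0", "1"):    # two "devices" are actually available
        open(os.path.join(dev_dir, name), "w").close()

    candidates = [str(n) for n in range(16)]
    print(probe_by_open(dev_dir, candidates))   # (['0', '1'], 16)
    print(probe_by_listing(dev_dir))            # ['0', '1']
```

Both approaches find the same two entries, but the first needs one open attempt per conceivable device while the second needs a single directory read.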
464 | |||
465 | ----------------------------------------------------------------------------- | ||
466 | |||
467 | Who else does it? | ||
468 | |||
469 | FreeBSD has a devfs implementation. Solaris and AIX each have a | ||
470 | pseudo-devfs (something akin to scsidev but for all devices, with some | ||
471 | unspecified kernel support). BeOS, Plan9 and QNX also have it. SGI's | ||
472 | IRIX 6.4 and above also have a device filesystem. | ||
473 | |||
474 | While we shouldn't just automatically do something because others do | ||
475 | it, we should not ignore the work of others either. FreeBSD has a lot | ||
476 | of competent people working on it, so their opinion should not be | ||
477 | blithely ignored. | ||
478 | |||
479 | ----------------------------------------------------------------------------- | ||
480 | |||
481 | |||
482 | How it works | ||
483 | |||
484 | Registering device entries | ||
485 | |||
486 | For every entry (device node) in a devfs-based /dev a driver must call | ||
487 | devfs_register(). This adds the name of the device entry, the | ||
488 | file_operations structure pointer and a few other things to an | ||
489 | internal table. Device entries may be added and removed at any | ||
490 | time. When a device entry is registered, it automagically appears in | ||
491 | any mounted devfs. | ||
492 | |||
493 | Inode lookup | ||
494 | |||
495 | When a lookup operation is performed on an entry and there is no | ||
496 | driver information for that entry, devfs will attempt to call | ||
497 | devfsd. If still no driver information can be found then a negative | ||
498 | dentry is yielded and the next stage operation will be called by the | ||
499 | VFS (such as create() or mknod() inode methods). If driver information | ||
500 | can be found, an inode is created (if one does not exist already) and | ||
501 | all is well. | ||
502 | |||
503 | Manually creating device nodes | ||
504 | |||
505 | The mknod() method allows you to create an ordinary named pipe in the | ||
506 | devfs, or you can create a character or block special inode if one | ||
507 | does not already exist. You may wish to create a character or block | ||
508 | special inode so that you can set permissions and ownership. Later, if | ||
509 | a device driver registers an entry with the same name, the | ||
510 | permissions, ownership and times are retained. This is how you can set | ||
511 | the protections on a device even before the driver is loaded. Once you | ||
512 | create an inode it appears in the directory listing. | ||
513 | |||
514 | Unregistering device entries | ||
515 | |||
516 | A device driver calls devfs_unregister() to unregister an entry. | ||
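The registration, lookup and mknod() semantics described in the preceding subsections can be summarised in a toy Python model. It is a simplification written for this explanation, not the devfs implementation: the class, its methods, the helper callback (standing in for devfsd) and the device names are all hypothetical.

```python
class ToyDevfs:
    """Toy model of the devfs namespace: register, mknod, lookup, unregister."""

    def __init__(self, helper=None):
        self.entries = {}      # name -> dict with fops, mode, uid, gid
        self.helper = helper   # stands in for devfsd

    def register(self, name, fops, mode=0o600):
        entry = self.entries.get(name)
        if entry is None:
            entry = {"fops": None, "mode": mode, "uid": 0, "gid": 0}
            self.entries[name] = entry
        # An inode made earlier with mknod() keeps its mode and ownership.
        entry["fops"] = fops

    def unregister(self, name):
        self.entries.pop(name, None)

    def mknod(self, name, mode, uid=0, gid=0):
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = {"fops": None, "mode": mode, "uid": uid, "gid": gid}

    def lookup(self, name):
        entry = self.entries.get(name)
        if (entry is None or entry["fops"] is None) and self.helper is not None:
            self.helper(self, name)        # e.g. load a module that registers it
            entry = self.entries.get(name)
        return entry                       # None models a negative dentry


def autoload(fs, name):
    # Hypothetical helper: pretend a module provides "demo/autoloaded".
    if name == "demo/autoloaded":
        fs.register(name, fops="autoloaded_fops")


fs = ToyDevfs(helper=autoload)
fs.mknod("demo/0", mode=0o640, uid=5)       # set protections before the driver loads
fs.register("demo/0", fops="demo_fops")     # driver registers later
print(oct(fs.lookup("demo/0")["mode"]), fs.lookup("demo/0")["uid"])   # 0o640 5
print(fs.lookup("demo/autoloaded")["fops"])                           # autoloaded_fops
print(fs.lookup("demo/missing"))                                      # None
```

The first lookup shows the mknod() permissions surviving driver registration; the second shows the helper supplying driver information on demand; the third is the negative dentry case.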
517 | |||
518 | Chroot() gaols | ||
519 | |||
520 | 2.2.x kernels | ||
521 | |||
522 | The semantics of inode creation are different when devfs is mounted | ||
523 | with the "explicit" option. Now, when a device entry is registered, it | ||
524 | will not appear until you use mknod() to create the device. It doesn't | ||
525 | matter if you mknod() before or after the device is registered with | ||
526 | devfs_register(). The purpose of this behaviour is to support | ||
527 | chroot(2) gaols, where you want to mount a minimal devfs inside the | ||
528 | gaol. Only the devices you specifically want to be available (through | ||
529 | your mknod() setup) will be accessible. | ||
530 | |||
531 | 2.4.x kernels | ||
532 | |||
533 | As of kernel 2.3.99, the VFS has had the ability to rebind parts of | ||
534 | the global filesystem namespace into another part of the namespace. | ||
535 | This now works even at the leaf-node level, which means that | ||
536 | individual files and device nodes may be bound into other parts of the | ||
537 | namespace. This is like making links, but better, because it works | ||
538 | across filesystems (unlike hard links) and works through chroot() | ||
539 | gaols (unlike symbolic links). | ||
540 | |||
541 | Because of these improvements to the VFS, the multi-mount capability | ||
542 | in devfs is no longer needed. The administrator may create a minimal | ||
543 | device tree inside a chroot(2) gaol by using VFS bindings. As this | ||
544 | provides most of the features of the devfs multi-mount capability, I | ||
545 | removed the multi-mount support code (after issuing an RFC). This | ||
546 | yielded code size reductions and simplifications. | ||
547 | |||
548 | If you want to construct a minimal chroot() gaol, the following | ||
549 | command should suffice: | ||
550 | |||
551 | mount --bind /dev/null /gaol/dev/null | ||
552 | |||
553 | |||
554 | Repeat for other device nodes you want to expose. Simple! | ||
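If there are several nodes to expose, the repetition can be scripted. The following Python sketch only builds the argument lists for the mount command shown above; the helper name and the list of nodes are made up for the example. Actually running the commands requires root, and each target must already exist inside the gaol.

```python
import posixpath


def bind_commands(nodes, gaol_root):
    """Build one "mount --bind" argument list per device node."""
    commands = []
    for node in nodes:
        target = posixpath.join(gaol_root, node.lstrip("/"))
        commands.append(["mount", "--bind", node, target])
    return commands


for cmd in bind_commands(["/dev/null", "/dev/zero"], "/gaol"):
    print(" ".join(cmd))
# mount --bind /dev/null /gaol/dev/null
# mount --bind /dev/zero /gaol/dev/zero

# To execute them (as root, with the targets already created):
#     import subprocess
#     for cmd in bind_commands(nodes, "/gaol"):
#         subprocess.run(cmd, check=True)
```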
555 | |||
556 | ----------------------------------------------------------------------------- | ||
557 | |||
558 | |||
559 | Operational issues | ||
560 | |||
561 | |||
562 | Instructions for the impatient | ||
563 | |||
564 | Nobody likes reading documentation. People just want to get in there | ||
565 | and play. So this section tells you quickly the steps you need to take | ||
566 | to run with devfs mounted over /dev. Skip these steps and you will end | ||
567 | up with a nearly unbootable system. Subsequent sections describe the | ||
568 | issues in more detail, and discuss non-essential configuration | ||
569 | options. | ||
570 | |||
571 | Devfsd | ||
572 | OK, if you're reading this, I assume you want to play with | ||
573 | devfs. First you should ensure that /usr/src/linux contains a | ||
574 | recent kernel source tree. Then you need to compile devfsd, the device | ||
575 | management daemon, available at | ||
576 | |||
577 | http://www.atnf.csiro.au/~rgooch/linux/. | ||
578 | Because the kernel has a naming scheme | ||
579 | which is quite different from the old naming scheme, you need to | ||
580 | install devfsd so that software and configuration files that use the | ||
581 | old naming scheme will not break. | ||
582 | |||
583 | Compile and install devfsd. You will be provided with a default | ||
584 | configuration file /etc/devfsd.conf which will provide | ||
585 | compatibility symlinks for the old naming scheme. Don't change this | ||
586 | config file unless you know what you're doing. Even if you think you | ||
587 | do know what you're doing, don't change it until you've followed all | ||
588 | the steps below and booted a devfs-enabled system and verified that it | ||
589 | works. | ||
590 | |||
591 | Now edit your main system boot script so that devfsd is started at the | ||
592 | very beginning (before any filesystem | ||
593 | checks). /etc/rc.d/rc.sysinit is often the main boot script | ||
594 | on systems with SysV-style boot scripts. On systems with BSD-style | ||
595 | boot scripts it is often /etc/rc. Also check | ||
596 | /sbin/rc. | ||
597 | |||
598 | NOTE that the line you put into the boot | ||
599 | script should be exactly: | ||
600 | |||
601 | /sbin/devfsd /dev | ||
602 | |||
603 | DO NOT use some special daemon-launching | ||
604 | programme, otherwise the boot script may not wait for devfsd to finish | ||
605 | initialising. | ||
606 | |||
607 | System Libraries | ||
608 | There may still be some problems because of broken software making | ||
609 | assumptions about device names. In particular, some software does not | ||
610 | handle devices which are symbolic links. If you are running a libc 5 | ||
611 | based system, install libc 5.4.44 (if you have libc 5.4.46, go back to | ||
612 | libc 5.4.44, which is actually correct). If you are running a glibc | ||
613 | based system, make sure you have glibc 2.1.3 or later. | ||
614 | |||
615 | /etc/securetty | ||
616 | PAM (Pluggable Authentication Modules) is supposed to be a flexible | ||
617 | mechanism for providing better user authentication and access to | ||
618 | services. Unfortunately, it's also fragile, complex and undocumented | ||
619 | (check out RedHat 6.1, and probably other distributions as well). PAM | ||
620 | has problems with symbolic links. Append the following lines to your | ||
621 | /etc/securetty file: | ||
622 | |||
623 | vc/1 | ||
624 | vc/2 | ||
625 | vc/3 | ||
626 | vc/4 | ||
627 | vc/5 | ||
628 | vc/6 | ||
629 | vc/7 | ||
630 | vc/8 | ||
631 | |||
632 | This will not weaken security. If you have a version of util-linux | ||
633 | earlier than 2.10.h, please upgrade to 2.10.h or later. If you | ||
634 | absolutely cannot upgrade, then also append the following lines to | ||
635 | your /etc/securetty file: | ||
636 | |||
637 | 1 | ||
638 | 2 | ||
639 | 3 | ||
640 | 4 | ||
641 | 5 | ||
642 | 6 | ||
643 | 7 | ||
644 | 8 | ||
645 | |||
646 | This may potentially weaken security by allowing root logins over the | ||
647 | network (a password is still required, though). However, since there | ||
648 | are problems with dealing with symlinks, I'm suspicious of the level | ||
649 | of security offered in any case. | ||
650 | |||
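The vc/1 to vc/8 entries listed above can also be appended by a short
script instead of by hand. The following is only a minimal sketch: the
function name append_vc_entries is hypothetical (it is not part of
devfsd or util-linux), and the file to edit is passed as an argument so
that the function can be tried on a copy before touching the real
/etc/securetty.

```shell
# Append vc/1 .. vc/8 to a securetty-style file.
# Entries that are already present are skipped, so running the
# function twice does not create duplicates.
# append_vc_entries is a hypothetical helper name.
append_vc_entries() {
    file=$1
    for n in 1 2 3 4 5 6 7 8; do
        # -q: quiet, -x: match the whole line, -F: fixed string
        if ! grep -qxF "vc/$n" "$file" 2>/dev/null; then
            echo "vc/$n" >> "$file"
        fi
    done
}

# Example, run on a copy first:
#   cp /etc/securetty /tmp/securetty.test
#   append_vc_entries /tmp/securetty.test
```
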
651 | XFree86 | ||
652 | While not essential, it's probably a good idea to upgrade to XFree86 | ||
653 | 4.0, as patches went in to make it more devfs-friendly. If you don't, | ||
654 | you'll probably need to apply the following patch to | ||
655 | /etc/security/console.perms so that ordinary users can run | ||
656 | startx. Note that not all distributions have this file (e.g. Debian), | ||
657 | so if it's not present, don't worry about it. | ||
658 | |||
659 | --- /etc/security/console.perms.orig Sat Apr 17 16:26:47 1999 | ||
660 | +++ /etc/security/console.perms Fri Feb 25 23:53:55 2000 | ||
661 | @@ -14,7 +14,7 @@ | ||
662 | # man 5 console.perms | ||
663 | |||
664 | # file classes -- these are regular expressions | ||
665 | -<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9] | ||
666 | +<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9] | ||
667 | |||
668 | # device classes -- these are shell-style globs | ||
669 | <floppy>=/dev/fd[0-1]* | ||
670 | |||
671 | If the patch does not apply, then change the line: | ||
672 | |||
673 | <console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9] | ||
674 | |||
675 | with: | ||
676 | |||
677 | <console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9] | ||
678 | |||
679 | |||
680 | Disable devpts | ||
681 | I've had a report of devpts mounted on /dev/pts not working | ||
682 | correctly. Since devfs will also manage /dev/pts, there is no | ||
683 | need to mount devpts as well. You should either edit your | ||
684 | /etc/fstab so devpts is not mounted, or disable devpts from | ||
685 | your kernel configuration. | ||
686 | |||
687 | Unsupported drivers | ||
688 | Not all drivers have devfs support. If you depend on one of these | ||
689 | drivers, you will need to create a script or tarfile that you can use | ||
690 | at boot time to create device nodes as appropriate. There is a | ||
691 | section which describes this. Another | ||
692 | section lists the drivers which have | ||
693 | devfs support. | ||
694 | |||
695 | /dev/mouse | ||
696 | |||
697 | Many distributions configure /dev/mouse to be the mouse device | ||
698 | for XFree86 and GPM. I actually think this is a bad idea, because it | ||
699 | adds another level of indirection. When looking at a config file, if | ||
700 | you see /dev/mouse you're left wondering which mouse | ||
701 | is being referred to. Hence I recommend putting the actual mouse | ||
702 | device (for example /dev/psaux) into your | ||
703 | /etc/X11/XF86Config file (and similarly for the GPM | ||
704 | configuration file). | ||
705 | |||
706 | Alternatively, use the same technique used for unsupported drivers | ||
707 | described above. | ||
708 | |||
709 | The Kernel | ||
710 | Finally, you need to make sure devfs is compiled into your kernel. Set | ||
711 | CONFIG_EXPERIMENTAL=y, CONFIG_DEVFS_FS=y and CONFIG_DEVFS_MOUNT=y by | ||
712 | using your favourite configuration tool (e.g. make config or | ||
713 | make xconfig), then run make clean and recompile your kernel and | ||
714 | modules. At boot, devfs will be mounted onto /dev. | ||
715 | |||
716 | If you encounter problems booting (for example if you forgot a | ||
717 | configuration step), you can pass devfs=nomount at the kernel | ||
718 | boot command line. This will prevent the kernel from mounting devfs at | ||
719 | boot time onto /dev. | ||
720 | |||
721 | In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting | ||
722 | devfs onto /dev is completely safe, and requires no | ||
723 | configuration changes. One exception to take note of is when | ||
724 | LABEL= directives are used in /etc/fstab. In this | ||
725 | case you will be unable to boot properly. This is because the | ||
726 | mount(8) programme uses /proc/partitions as part of | ||
727 | the volume label search process, and the device names it finds are not | ||
728 | available, because setting CONFIG_DEVFS_FS=y changes the names in | ||
729 | /proc/partitions, irrespective of whether devfs is mounted. | ||
730 | |||
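Before booting a kernel built with CONFIG_DEVFS_FS=y, it may be worth
checking whether /etc/fstab uses LABEL= at all. A minimal sketch of such
a check follows; the function name fstab_label_entries is hypothetical,
and the file defaults to /etc/fstab only when no argument is given.

```shell
# Print the fstab entries whose first field starts with LABEL=.
# Comment lines (first field starting with #) and blank lines are
# ignored. fstab_label_entries is a hypothetical helper name.
fstab_label_entries() {
    awk '$1 !~ /^#/ && $1 ~ /^LABEL=/ { print }' "${1:-/etc/fstab}"
}

# Example:
#   fstab_label_entries /etc/fstab
# No output means no LABEL= directives were found.
```
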
731 | Now you've finished all the steps required. You're now ready to boot | ||
732 | your shiny new kernel. Enjoy. | ||
733 | |||
734 | Changing the configuration | ||
735 | |||
736 | OK, you've now booted a devfs-enabled system, and everything works. | ||
737 | Now you may feel like changing the configuration (common targets are | ||
738 | /etc/fstab and /etc/devfsd.conf). Since you have a | ||
739 | system that works, if you make any changes and it doesn't work, you | ||
740 | now know that you only have to restore your configuration files to the | ||
741 | default and it will work again. | ||
742 | |||
743 | |||
744 | Permissions persistence across reboots | ||
745 | |||
746 | If you don't use mknod(2) to create a device file, nor use chmod(2) or | ||
747 | chown(2) to change the ownerships/permissions, the inode ctime will | ||
748 | remain at 0 (the epoch, 12 am, 1-JAN-1970, GMT). Anything with a ctime | ||
749 | later than this has had its ownership/permissions changed. Hence, a | ||
750 | simple script or programme may be used to tar up all changed inodes, | ||
751 | prior to shutdown. Although effective, many consider this approach a | ||
752 | kludge. | ||
753 | |||
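The "tar up all changed inodes" idea can be sketched as a small shell
function that lists every entry whose ctime is later than the epoch.
This is an illustration only, not a script shipped with devfs: the name
list_changed_nodes is hypothetical, and it assumes GNU stat (where
"-c %Z" prints the ctime in seconds since the epoch). On an ordinary
disc-based directory every entry has a non-zero ctime, so the filter is
only meaningful on a devfs mount.

```shell
# List entries below a directory whose inode ctime is greater than 0.
# Assumes GNU stat; list_changed_nodes is a hypothetical helper name.
list_changed_nodes() {
    dir=$1
    find "$dir" -mindepth 1 | while IFS= read -r path; do
        # Skip entries that vanish between find and stat.
        ctime=$(stat -c %Z "$path") || continue
        if [ "$ctime" -gt 0 ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Example (assumes GNU tar; the archive path is a placeholder):
#   list_changed_nodes /dev | tar -cf /var/tmp/dev-changed.tar --no-recursion -T -
```
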
754 | A much better approach is to use devfsd to save and restore | ||
755 | permissions. It may be configured to record changes in permissions and | ||
756 | will save them in a database (in fact a directory tree), and restore | ||
757 | these upon boot. This is an efficient method and results in immediate | ||
758 | saving of current permissions (unlike the tar approach, which saves | ||
759 | permissions at some unspecified future time). | ||
760 | |||
761 | The default configuration file supplied with devfsd has config entries | ||
762 | which you may uncomment to enable persistence management. | ||
763 | |||
764 | If you decide to use the tar approach anyway, be aware that tar will | ||
765 | first unlink(2) an inode before creating a new device node. The | ||
766 | unlink(2) has the effect of breaking the connection between a devfs | ||
767 | entry and the device driver. If you use the "devfs=only" boot option, | ||
768 | you lose access to the device driver, requiring you to reload the | ||
769 | module. I consider this a bug in tar (there is no real need to | ||
770 | unlink(2) the inode first). | ||
771 | |||
772 | Alternatively, you can use devfsd to provide more sophisticated | ||
773 | management of device permissions. You can use devfsd to store | ||
774 | permissions for whole groups of devices with a single configuration | ||
775 | entry, rather than the conventional single entry per device entry. | ||
776 | |||
777 | Permissions database stored in mounted-over /dev | ||
778 | |||
779 | If you wish to save and restore your device permissions into the | ||
780 | disc-based /dev while still mounting devfs onto /dev | ||
781 | you may do so. This requires a 2.4.x kernel (in fact, 2.3.99 or | ||
782 | later), which has the VFS binding facility. You need to do the | ||
783 | following to set this up: | ||
784 | |||
785 | |||
786 | |||
787 | make sure the kernel does not mount devfs at boot time | ||
788 | |||
789 | |||
790 | make sure you have a correct /dev/console entry in your | ||
791 | root file-system (where your disc-based /dev lives) | ||
792 | |||
793 | create the /dev-state directory | ||
794 | |||
795 | |||
796 | add the following lines near the very beginning of your boot | ||
797 | scripts: | ||
798 | |||
799 | mount --bind /dev /dev-state | ||
800 | mount -t devfs none /dev | ||
801 | devfsd /dev | ||
802 | |||
803 | |||
804 | |||
805 | |||
806 | add the following lines to your /etc/devfsd.conf file: | ||
807 | |||
808 | REGISTER ^pt[sy] IGNORE | ||
809 | CREATE ^pt[sy] IGNORE | ||
810 | CHANGE ^pt[sy] IGNORE | ||
811 | DELETE ^pt[sy] IGNORE | ||
812 | REGISTER .* COPY /dev-state/$devname $devpath | ||
813 | CREATE .* COPY $devpath /dev-state/$devname | ||
814 | CHANGE .* COPY $devpath /dev-state/$devname | ||
815 | DELETE .* CFUNCTION GLOBAL unlink /dev-state/$devname | ||
816 | RESTORE /dev-state | ||
817 | |||
818 | Note that the sample devfsd.conf file contains these lines, | ||
819 | as well as other sample configurations you may find useful. See the | ||
820 | devfsd distribution. | ||
821 | |||
822 | |||
823 | reboot. | ||
824 | |||
825 | |||
826 | |||
827 | |||
828 | Permissions database stored in normal directory | ||
829 | |||
830 | If you are using an older kernel which doesn't support VFS binding, | ||
831 | then you won't be able to have the permissions database in a | ||
832 | mounted-over /dev. However, you can still use a regular | ||
833 | directory to store the database. The sample /etc/devfsd.conf | ||
834 | file above may still be used. You will need to create the | ||
835 | /dev-state directory prior to installing devfsd. If you have | ||
836 | old permissions in /dev, then just copy (or move) the device | ||
837 | nodes over to the new directory. | ||
838 | |||
839 | Which method is better? | ||
840 | |||
841 | The best method is to have the permissions database stored in the | ||
842 | mounted-over /dev. This is because you will not need to copy | ||
843 | device nodes over to /dev-state, and because it allows you to | ||
844 | switch between devfs and non-devfs kernels, without requiring you to | ||
845 | copy permissions between /dev-state (for devfs) and | ||
846 | /dev (for non-devfs). | ||
847 | |||
848 | |||
849 | Dealing with drivers without devfs support | ||
850 | |||
851 | Currently, not all device drivers in the kernel have been modified to | ||
852 | use devfs. Device drivers which do not yet have devfs support will not | ||
853 | automagically appear in devfs. The simplest way to create device nodes | ||
854 | for these drivers is to unpack a tarfile containing the required | ||
855 | device nodes. You can do this in your boot scripts. All your drivers | ||
856 | will now work as before. | ||
857 | |||
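Unpacking such a tarfile from a boot script could look like the sketch
below. The function name restore_device_nodes and the archive path in
the example are hypothetical. Creating real device nodes requires root,
although the function itself can be tried on an archive of ordinary
files.

```shell
# Unpack a tarfile of device nodes into a target directory
# (/dev unless another directory is given).
# -p keeps the permissions recorded in the archive.
# restore_device_nodes is a hypothetical helper name.
restore_device_nodes() {
    tarfile=$1
    target=${2:-/dev}
    # Fail quietly if the archive is missing or unreadable.
    [ -r "$tarfile" ] || return 1
    tar -xpf "$tarfile" -C "$target"
}

# Example for a boot script (archive path is a placeholder):
#   restore_device_nodes /etc/dev-extra.tar /dev
```
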
858 | Hopefully for most people devfs will have enough support so that they | ||
859 | can mount devfs directly over /dev without losing most functionality | ||
860 | (i.e. losing access to various devices). As of 22-JAN-1998 (devfs | ||
861 | patch version 10) I am now running this way. All the devices I have | ||
862 | are available in devfs, so I don't lose anything. | ||
863 | |||
864 | WARNING: if your configuration requires the old-style device names | ||
865 | (e.g. /dev/hda1 or /dev/sda1), you must install devfsd and configure | ||
866 | it to maintain compatibility entries. It is almost certain that you | ||
867 | will require this. Note that the kernel creates a compatibility entry | ||
868 | for the root device, so you don't need initrd. | ||
869 | |||
870 | Note that you no longer need to mount devpts if you use Unix98 PTYs, | ||
871 | as devfs can manage /dev/pts itself. This saves you some RAM, as you | ||
872 | don't need to compile and install devpts. Note that some versions of | ||
873 | glibc have a bug with Unix98 pty handling on devfs systems. Contact | ||
874 | the glibc maintainers for a fix. Glibc 2.1.3 has the fix. | ||
875 | |||
876 | Note also that apart from editing /etc/fstab, other things will need | ||
877 | to be changed if you *don't* install devfsd. Some software (like the X | ||
878 | server) hard-wire device names in their source. It really is much | ||
879 | easier to install devfsd so that compatibility entries are created. | ||
880 | You can then slowly migrate your system to using the new device names | ||
881 | (for example, by starting with /etc/fstab), and then limiting the | ||
882 | compatibility entries that devfsd creates. | ||
883 | |||
884 | IF YOU CONFIGURE TO MOUNT DEVFS AT BOOT, MAKE SURE YOU INSTALL DEVFSD | ||
885 | BEFORE YOU BOOT A DEVFS-ENABLED KERNEL! | ||
886 | |||
887 | Now that devfs has gone into the 2.3.46 kernel, I'm getting a lot of | ||
888 | reports back. Many of these are because people are trying to run | ||
889 | without devfsd, and hence some things break. Please just run devfsd if | ||
890 | things break. I want to concentrate on real bugs rather than | ||
891 | misconfiguration problems at the moment. If people are willing to fix | ||
892 | bugs/false assumptions in other code (e.g. glibc, X server) and submit | ||
893 | that to the respective maintainers, that would be great. | ||
894 | |||
895 | |||
896 | All the way with Devfs | ||
897 | |||
898 | The devfs kernel patch creates a rationalised device tree. As stated | ||
899 | above, if you want to keep using the old /dev naming scheme, | ||
900 | you just need to configure devfsd appropriately (see the man | ||
901 | page). People who prefer the old names can ignore this section. For | ||
902 | those of us who like the rationalised names and an uncluttered | ||
903 | /dev, read on. | ||
904 | |||
905 | If you don't run devfsd, or don't enable compatibility entry | ||
906 | management, then you will have to configure your system to use the new | ||
907 | names. For example, you will then need to edit your | ||
908 | /etc/fstab to use the new disc naming scheme. If you want to | ||
909 | be able to boot non-devfs kernels, you will need compatibility | ||
910 | symlinks in the underlying disc-based /dev pointing back to | ||
911 | the old-style names for when you boot a kernel without devfs. | ||
912 | |||
913 | You can selectively decide which devices you want compatibility | ||
914 | entries for. For example, you may only want compatibility entries for | ||
915 | BSD pseudo-terminal devices (otherwise you'll have to patch your C | ||
916 | library or use Unix98 ptys instead). It's just a matter of putting in | ||
917 | the correct regular expression into /etc/devfsd.conf. | ||
918 | |||
919 | There are other choices of naming schemes that you may prefer. For | ||
920 | example, I don't use the kernel-supplied | ||
921 | names, because they are too verbose. A common misconception is | ||
922 | that the kernel-supplied names are meant to be used directly in | ||
923 | configuration files. This is not the case. They are designed to | ||
924 | reflect the layout of the devices attached and to provide easy | ||
925 | classification. | ||
926 | |||
927 | If you like the kernel-supplied names, that's fine. If you don't then | ||
928 | you should be using devfsd to construct a namespace more to your | ||
929 | liking. Devfsd has built-in code to construct a | ||
930 | namespace that is both logical and easy to | ||
931 | manage. In essence, it creates a convenient abbreviation of the | ||
932 | kernel-supplied namespace. | ||
933 | |||
934 | You are of course free to build your own namespace. Devfsd has all the | ||
935 | infrastructure required to make this easy for you. All you need do is | ||
936 | write a script. You can even write some C code and devfsd can load the | ||
937 | shared object as a callable extension. | ||
938 | |||
939 | |||
940 | Other Issues | ||
941 | |||
942 | The init programme | ||
943 | Another thing to take note of is whether your init programme | ||
944 | creates a Unix socket /dev/telinit. Some versions of init | ||
945 | create /dev/telinit so that the telinit programme can | ||
946 | communicate with the init process. If you have such a system you need | ||
947 | to make sure that devfs is mounted over /dev *before* init | ||
948 | starts. In other words, you can't leave the mounting of devfs to | ||
949 | /etc/rc, since this is executed after init. Other | ||
950 | versions of init require a named pipe /dev/initctl | ||
951 | which must exist *before* init starts. Once again, you need to | ||
952 | mount devfs and then create the named pipe *before* init | ||
953 | starts. | ||
954 | |||
955 | The default behaviour now is not to mount devfs onto /dev at | ||
956 | boot time for 2.3.x and later kernels. You can correct this with the | ||
957 | "devfs=mount" boot option. This solves any problems with init, | ||
958 | and also prevents the dreaded: | ||
959 | |||
960 | Cannot open initial console | ||
961 | |||
962 | message. For 2.2.x kernels where you need to apply the devfs patch, | ||
963 | the default is to mount. | ||
964 | |||
965 | If you have automatic mounting of devfs onto /dev then you | ||
966 | may need to create /dev/initctl in your boot scripts. The | ||
967 | following lines should suffice: | ||
968 | |||
969 | mknod /dev/initctl p | ||
970 | kill -SIGUSR1 1 # tell init that /dev/initctl now exists | ||
971 | |||
972 | Alternatively, if you don't want the kernel to mount devfs onto | ||
973 | /dev, then you could use the following procedure as a | ||
974 | guideline for how to get around /dev/initctl problems: | ||
975 | |||
976 | # cd /sbin | ||
977 | # mv init init.real | ||
978 | # cat > init | ||
979 | #! /bin/sh | ||
980 | mount -n -t devfs none /dev | ||
981 | mknod /dev/initctl p | ||
982 | exec /sbin/init.real "$@" | ||
983 | [control-D] | ||
984 | # chmod a+x init | ||
985 | |||
986 | Note that newer versions of init create /dev/initctl | ||
987 | automatically, so you don't have to worry about this. | ||
988 | |||
989 | Module autoloading | ||
990 | You will need to configure devfsd to enable module | ||
991 | autoloading. The following lines should be placed in your | ||
992 | /etc/devfsd.conf file: | ||
993 | |||
994 | LOOKUP .* MODLOAD | ||
995 | |||
996 | |||
997 | As of devfsd-v1.3.10, a generic /etc/modules.devfs | ||
998 | configuration file is installed, which is used by the MODLOAD | ||
999 | action. This should be sufficient for most configurations. If you | ||
1000 | require further configuration, edit your /etc/modules.conf | ||
1001 | file. The way module autoloading works with devfs is: | ||
1002 | |||
1003 | |||
1004 | a process attempts to lookup a device node (e.g. /dev/fred) | ||
1005 | |||
1006 | |||
1007 | if that device node does not exist, the full pathname is passed to | ||
1008 | devfsd as a string | ||
1009 | |||
1010 | |||
1011 | devfsd will pass the string to the modprobe programme (provided the | ||
1012 | configuration line shown above is present), and specifies that | ||
1013 | /etc/modules.devfs is the configuration file | ||
1014 | |||
1015 | |||
1016 | /etc/modules.devfs includes /etc/modules.conf to | ||
1017 | access local configurations | ||
1018 | |||
1019 | modprobe will search its configuration files, looking for an alias | ||
1020 | that translates the pathname into a module name | ||
1021 | |||
1022 | |||
1023 | the translated pathname is then used to load the module. | ||
1024 | |||
1025 | |||
1026 | If you wanted a lookup of /dev/fred to load the | ||
1027 | mymod module, you would require the following configuration | ||
1028 | line in /etc/modules.conf: | ||
1029 | |||
1030 | alias /dev/fred mymod | ||
1031 | |||
1032 | The /etc/modules.devfs configuration file provides many such | ||
1033 | aliases for standard device names. If you look closely at this file, | ||
1034 | you will note that some modules require multiple alias configuration | ||
1035 | lines. This is required to support module autoloading for old and new | ||
1036 | device names. | ||
1037 | |||
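The alias translation step can be illustrated with a few lines of
shell. This sketch only does an exact match on "alias <pathname>
<module>" lines; it is not how modprobe is implemented, and the name
lookup_alias is hypothetical.

```shell
# Print the module name that a configuration file maps a device
# pathname to, using the first matching "alias <pathname> <module>"
# line. Exact string match only.
# lookup_alias is a hypothetical helper name.
lookup_alias() {
    conf=$1
    path=$2
    awk -v p="$path" '$1 == "alias" && $2 == p { print $3; exit }' "$conf"
}

# Example:
#   lookup_alias /etc/modules.conf /dev/fred
# With the alias line shown above in place, this would print the
# module name given on that line.
```
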
1038 | Mounting root off a devfs device | ||
1039 | If you wish to mount root off a devfs device when you pass the | ||
1040 | "devfs=only" boot option, then you need to pass in the | ||
1041 | "root=<device>" option to the kernel when booting. If you use | ||
1042 | LILO, then you must have this in lilo.conf: | ||
1043 | |||
1044 | append = "root=<device>" | ||
1045 | |||
1046 | Surprised? Yep, so was I. It turns out if you have (as most people | ||
1047 | do): | ||
1048 | |||
1049 | root = <device> | ||
1050 | |||
1051 | |||
1052 | then LILO will determine the device number of <device> and will | ||
1053 | write that device number into a special place in the kernel image | ||
1054 | before starting the kernel, and the kernel will use that device number | ||
1055 | to mount the root filesystem. So, using the "append" variety ensures | ||
1056 | that LILO passes the root filesystem device as a string, which devfs | ||
1057 | can then use. | ||
1058 | |||
1059 | Note that this isn't an issue if you don't pass "devfs=only". | ||
1060 | |||
1061 | TTY issues | ||
1062 | The ttyname(3) function in some versions of the C library makes | ||
1063 | false assumptions about device entries which are symbolic links. The | ||
1064 | tty(1) programme is one that depends on this function. I've | ||
1065 | written a patch to libc 5.4.43 which fixes this. This has been | ||
1066 | included in libc 5.4.44 and a similar fix is in glibc 2.1.3. | ||
1067 | |||
1068 | |||
1069 | Kernel Naming Scheme | ||
1070 | |||
1071 | The kernel provides a default naming scheme. This scheme is designed | ||
1072 | to make it easy to search for specific devices or device types, and to | ||
1073 | view the available devices. Some device types (such as hard discs) | ||
1074 | have a directory of entries, making it easy to see what devices of | ||
1075 | that class are available. Often, the entries are symbolic links into a | ||
1076 | directory tree that reflects the topology of available devices. The | ||
1077 | topological tree is useful for finding how your devices are arranged. | ||
1078 | |||
1079 | Below is a list of the naming schemes for the most common drivers. A | ||
1080 | list of reserved device names is | ||
1081 | available for reference. Please send email to | ||
1082 | rgooch@atnf.csiro.au to obtain an allocation. Please be | ||
1083 | patient (the maintainer is busy). An alternative name may be allocated | ||
1084 | instead of the requested name, at the discretion of the maintainer. | ||
1085 | |||
1086 | Disc Devices | ||
1087 | |||
1088 | All discs, whether SCSI, IDE or whatever, are placed under the | ||
1089 | /dev/discs hierarchy: | ||
1090 | |||
1091 | /dev/discs/disc0 first disc | ||
1092 | /dev/discs/disc1 second disc | ||
1093 | |||
1094 | |||
1095 | Each of these entries is a symbolic link to the directory for that | ||
1096 | device. The device directory contains: | ||
1097 | |||
1098 | disc for the whole disc | ||
1099 | part* for individual partitions | ||
1100 | |||
1101 | |||
1102 | CD-ROM Devices | ||
1103 | |||
1104 | All CD-ROMs, whether SCSI, IDE or whatever, are placed under the | ||
1105 | /dev/cdroms hierarchy: | ||
1106 | |||
1107 | /dev/cdroms/cdrom0 first CD-ROM | ||
1108 | /dev/cdroms/cdrom1 second CD-ROM | ||
1109 | |||
1110 | |||
1111 | Each of these entries is a symbolic link to the real device entry for | ||
1112 | that device. | ||
1113 | |||
1114 | Tape Devices | ||
1115 | |||
1116 | All tapes, whether SCSI, IDE or whatever, are placed under the | ||
1117 | /dev/tapes hierarchy: | ||
1118 | |||
1119 | /dev/tapes/tape0 first tape | ||
1120 | /dev/tapes/tape1 second tape | ||
1121 | |||
1122 | |||
1123 | Each of these entries is a symbolic link to the directory for that | ||
1124 | device. The device directory contains: | ||
1125 | |||
1126 | mt for mode 0 | ||
1127 | mtl for mode 1 | ||
1128 | mtm for mode 2 | ||
1129 | mta for mode 3 | ||
1130 | mtn for mode 0, no rewind | ||
1131 | mtln for mode 1, no rewind | ||
1132 | mtmn for mode 2, no rewind | ||
1133 | mtan for mode 3, no rewind | ||
1134 | |||
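The tape entry names in the list above follow a simple pattern: a
per-mode base name, with "n" appended for the no-rewind variant. A
minimal sketch of that pattern follows; the function name
tape_entry_name and its "yes"/"no" argument are hypothetical.

```shell
# Map a tape mode (0-3) and an optional no-rewind flag to the entry
# name used inside a tape device directory.
# tape_entry_name is a hypothetical helper name.
tape_entry_name() {
    mode=$1
    norewind=${2:-no}   # "yes" selects the no-rewind variant
    case $mode in
        0) name=mt ;;
        1) name=mtl ;;
        2) name=mtm ;;
        3) name=mta ;;
        *) echo "unknown tape mode: $mode" >&2; return 1 ;;
    esac
    if [ "$norewind" = yes ]; then
        name=${name}n
    fi
    printf '%s\n' "$name"
}

# Example: mode 2, no rewind
#   tape_entry_name 2 yes
```
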
1135 | |||
1136 | SCSI Devices | ||
1137 | |||
1138 | To uniquely identify any SCSI device requires the following | ||
1139 | information: | ||
1140 | |||
1141 | controller (host adapter) | ||
1142 | bus (SCSI channel) | ||
1143 | target (SCSI ID) | ||
1144 | unit (Logical Unit Number) | ||
1145 | |||
1146 | |||
1147 | All SCSI devices are placed under /dev/scsi (assuming devfs | ||
1148 | is mounted on /dev). Hence, a SCSI device with the following | ||
1149 | parameters: c=1,b=2,t=3,u=4 would appear as: | ||
1150 | |||
1151 | /dev/scsi/host1/bus2/target3/lun4 device directory | ||
1152 | |||
1153 | |||
1154 | Inside this directory, a number of device entries may be created, | ||
1155 | depending on which SCSI device-type drivers were installed. | ||
1156 | |||
1157 | See the section on the disc naming scheme to see what entries the SCSI | ||
1158 | disc driver creates. | ||
1159 | |||
1160 | See the section on the tape naming scheme to see what entries the SCSI | ||
1161 | tape driver creates. | ||
1162 | |||
1163 | The SCSI CD-ROM driver creates: | ||
1164 | |||
1165 | cd | ||
1166 | |||
1167 | |||
1168 | The SCSI generic driver creates: | ||
1169 | |||
1170 | generic | ||
1171 | |||
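The device directory for a given controller, bus, target and unit can
be built mechanically from the four numbers. A minimal sketch,
assuming devfs is mounted on /dev (the function name scsi_devfs_dir is
hypothetical):

```shell
# Print the devfs device directory for a SCSI device, given its
# controller (host), bus, target and unit (LUN) numbers.
# scsi_devfs_dir is a hypothetical helper name.
scsi_devfs_dir() {
    printf '/dev/scsi/host%s/bus%s/target%s/lun%s\n' "$1" "$2" "$3" "$4"
}

# Example: c=1,b=2,t=3,u=4
#   scsi_devfs_dir 1 2 3 4
```
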
1172 | |||
1173 | IDE Devices | ||
1174 | |||
1175 | To uniquely identify any IDE device requires the following | ||
1176 | information: | ||
1177 | |||
1178 | controller | ||
1179 | bus (aka. primary/secondary) | ||
1180 | target (aka. master/slave) | ||
1181 | unit | ||
1182 | |||
1183 | |||
1184 | All IDE devices are placed under /dev/ide, and use a similar | ||
1185 | naming scheme to the SCSI subsystem. | ||
1186 | |||
1187 | XT Hard Discs | ||
1188 | |||
1189 | All XT discs are placed under /dev/xd. The first XT disc has | ||
1190 | the directory /dev/xd/disc0. | ||
1191 | |||
1192 | TTY devices | ||
1193 | |||
1194 | The tty devices now appear as: | ||
1195 | |||
1196 | New name              Old-name             Device Type | ||
1197 | --------              --------             ----------- | ||
1198 | /dev/tts/{0,1,...}    /dev/ttyS{0,1,...}   Serial ports | ||
1199 | /dev/cua/{0,1,...}    /dev/cua{0,1,...}    Call out devices | ||
1200 | /dev/vc/0             /dev/tty             Current virtual console | ||
1201 | /dev/vc/{1,2,...}     /dev/tty{1...63}     Virtual consoles | ||
1202 | /dev/vcc/{0,1,...}    /dev/vcs{1...63}     Virtual consoles | ||
1203 | /dev/pty/m{0,1,...}   /dev/ptyp??          PTY masters | ||
1204 | /dev/pty/s{0,1,...}   /dev/ttyp??          PTY slaves | ||
1205 | |||
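When migrating configuration files by hand, the numbered rows of the
table above can be applied mechanically. The sketch below covers only
serial ports, call out devices and numbered virtual consoles; it
deliberately leaves out the PTY and vcs rows. The function name
old_to_new_tty is hypothetical.

```shell
# Translate an old-style tty name into the devfs name.
# Handles ttyS<n>, cua<n> and tty<n> only; anything else fails.
# old_to_new_tty is a hypothetical helper name.
old_to_new_tty() {
    old=${1#/dev/}          # accept names with or without /dev/
    case $old in
        ttyS*) dir=tts; n=${old#ttyS} ;;
        cua*)  dir=cua; n=${old#cua} ;;
        tty*)  dir=vc;  n=${old#tty} ;;
        *)     return 1 ;;
    esac
    # Reject anything that is not a plain number (e.g. "tty", "ttyp0").
    case $n in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '/dev/%s/%s\n' "$dir" "$n"
}

# Example:
#   old_to_new_tty /dev/ttyS1
```
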
1206 | |||
1207 | RAMDISCS | ||
1208 | |||
1209 | The RAMDISCS are placed in their own directory, and are named thus: | ||
1210 | |||
1211 | /dev/rd/{0,1,2,...} | ||
1212 | |||
1213 | |||
1214 | Meta Devices | ||
1215 | |||
1216 | The meta devices are placed in their own directory, and are named | ||
1217 | thus: | ||
1218 | |||
1219 | /dev/md/{0,1,2,...} | ||
1220 | |||
1221 | |||
1222 | Floppy discs | ||
1223 | |||
1224 | Floppy discs are placed in the /dev/floppy directory. | ||
1225 | |||
1226 | Loop devices | ||
1227 | |||
1228 | Loop devices are placed in the /dev/loop directory. | ||
1229 | |||
1230 | Sound devices | ||
1231 | |||
1232 | Sound devices are placed in the /dev/sound directory | ||
1233 | (audio, sequencer, ...). | ||
1234 | |||
1235 | |||
1236 | Devfsd Naming Scheme | ||
1237 | |||
1238 | Devfsd provides a naming scheme which is a convenient abbreviation of | ||
1239 | the kernel-supplied namespace. In some | ||
1240 | cases, the kernel-supplied naming scheme is quite convenient, so | ||
1241 | devfsd does not provide another naming scheme. The convenience names | ||
1242 | that devfsd creates are in fact the same names as the original devfs | ||
1243 | kernel patch created (before Linus mandated the Big Name | ||
1244 | Change). These are referred to as "new compatibility entries". | ||
1245 | |||
1246 | In order to configure devfsd to create these convenience names, the | ||
1247 | following lines should be placed in your /etc/devfsd.conf: | ||
1248 | |||
1249 | REGISTER .* MKNEWCOMPAT | ||
1250 | UNREGISTER .* RMNEWCOMPAT | ||
1251 | |||
1252 | This will cause devfsd to create (and destroy) symbolic links which | ||
1253 | point to the kernel-supplied names. | ||
1254 | |||
1255 | SCSI Hard Discs | ||
1256 | |||
1257 | All SCSI discs are placed under /dev/sd (assuming devfs is | ||
1258 | mounted on /dev). Hence, a SCSI disc with the following | ||
1259 | parameters: c=1,b=2,t=3,u=4 would appear as: | ||
1260 | |||
1261 | /dev/sd/c1b2t3u4 for the whole disc | ||
1262 | /dev/sd/c1b2t3u4p5 for the 5th partition | ||
1263 | /dev/sd/c1b2t3u4p5s6 for the 6th slice in the 5th partition | ||
1264 | |||
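The convenience names above are a direct concatenation of the device
parameters. A minimal sketch of how such a name is put together, with
the partition and slice numbers optional (the function name
sd_compat_name is hypothetical):

```shell
# Build a devfsd-style SCSI disc name from controller, bus, target
# and unit, plus an optional partition and an optional slice.
# sd_compat_name is a hypothetical helper name.
sd_compat_name() {
    name="/dev/sd/c${1}b${2}t${3}u${4}"
    if [ -n "${5:-}" ]; then
        name="${name}p${5}"
    fi
    if [ -n "${6:-}" ]; then
        name="${name}s${6}"
    fi
    printf '%s\n' "$name"
}

# Example: 6th slice in the 5th partition of c=1,b=2,t=3,u=4
#   sd_compat_name 1 2 3 4 5 6
```
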
1265 | |||
1266 | SCSI Tapes | ||
1267 | |||
1268 | All SCSI tapes are placed under /dev/st. A similar naming | ||
1269 | scheme is used as for SCSI discs. A SCSI tape with the | ||
1270 | parameters: c=1,b=2,t=3,u=4 would appear as: | ||
1271 | |||
1272 | /dev/st/c1b2t3u4m0 for mode 0 | ||
1273 | /dev/st/c1b2t3u4m1 for mode 1 | ||
1274 | /dev/st/c1b2t3u4m2 for mode 2 | ||
1275 | /dev/st/c1b2t3u4m3 for mode 3 | ||
1276 | /dev/st/c1b2t3u4m0n for mode 0, no rewind | ||
1277 | /dev/st/c1b2t3u4m1n for mode 1, no rewind | ||
1278 | /dev/st/c1b2t3u4m2n for mode 2, no rewind | ||
1279 | /dev/st/c1b2t3u4m3n for mode 3, no rewind | ||
1280 | |||
1281 | |||
1282 | SCSI CD-ROMs | ||
1283 | |||
1284 | All SCSI CD-ROMs are placed under /dev/sr. A similar naming | ||
1285 | scheme is used as for SCSI discs. A SCSI CD-ROM with the | ||
1286 | parameters: c=1,b=2,t=3,u=4 would appear as: | ||
1287 | |||
1288 | /dev/sr/c1b2t3u4 | ||
1289 | |||
1290 | |||
1291 | SCSI Generic Devices | ||
1292 | |||
1293 | The generic (aka. raw) interfaces for all SCSI devices are placed under | ||
1294 | /dev/sg. A similar naming scheme is used as for SCSI discs. A | ||
1295 | SCSI generic device with the parameters: c=1,b=2,t=3,u=4 would appear | ||
1296 | as: | ||
1297 | |||
1298 | /dev/sg/c1b2t3u4 | ||
1299 | |||
1300 | |||
1301 | IDE Hard Discs | ||
1302 | |||
1303 | All IDE discs are placed under /dev/ide/hd, using a similar | ||
1304 | convention to SCSI discs. The following mappings exist between the new | ||
1305 | and the old names: | ||
1306 | |||
1307 | /dev/hda /dev/ide/hd/c0b0t0u0 | ||
1308 | /dev/hdb /dev/ide/hd/c0b0t1u0 | ||
1309 | /dev/hdc /dev/ide/hd/c0b1t0u0 | ||
1310 | /dev/hdd /dev/ide/hd/c0b1t1u0 | ||
1311 | |||
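The four mappings above can be written as a small lookup. This sketch
covers only /dev/hda to /dev/hdd, because those are the only names
listed here; it does not guess at names for further controllers. The
function name ide_new_name is hypothetical.

```shell
# Translate an old IDE disc name (hda..hdd) into the devfsd name.
# Any other name fails with a non-zero exit status.
# ide_new_name is a hypothetical helper name.
ide_new_name() {
    case ${1#/dev/} in
        hda) echo /dev/ide/hd/c0b0t0u0 ;;
        hdb) echo /dev/ide/hd/c0b0t1u0 ;;
        hdc) echo /dev/ide/hd/c0b1t0u0 ;;
        hdd) echo /dev/ide/hd/c0b1t1u0 ;;
        *)   return 1 ;;
    esac
}

# Example:
#   ide_new_name /dev/hdc
```
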
1312 | |||
1313 | IDE Tapes | ||
1314 | |||
1315 | A similar naming scheme is used as for IDE discs. The entries will | ||
1316 | appear in the /dev/ide/mt directory. | ||
1317 | |||
1318 | IDE CD-ROM | ||
1319 | |||
1320 | A similar naming scheme is used as for IDE discs. The entries will | ||
1321 | appear in the /dev/ide/cd directory. | ||
1322 | |||
1323 | IDE Floppies | ||
1324 | |||
1325 | A similar naming scheme is used as for IDE discs. The entries will | ||
1326 | appear in the /dev/ide/fd directory. | ||
1327 | |||
1328 | XT Hard Discs | ||
1329 | |||
1330 | All XT discs are placed under /dev/xd. The first XT disc | ||
1331 | would appear as /dev/xd/c0t0. | ||
1332 | |||
1333 | |||
1334 | Old Compatibility Names | ||
1335 | |||
1336 | The old compatibility names are the legacy device names, such as | ||
1337 | /dev/hda, /dev/sda, /dev/rtc and so on. | ||
1338 | Devfsd can be configured to create compatibility symlinks so that you | ||
1339 | may continue to use the old names in your configuration files and so | ||
1340 | that old applications will continue to function correctly. | ||
1341 | |||
1342 | In order to configure devfsd to create these legacy names, the | ||
1343 | following lines should be placed in your /etc/devfsd.conf: | ||
1344 | |||
1345 | REGISTER .* MKOLDCOMPAT | ||
1346 | UNREGISTER .* RMOLDCOMPAT | ||
1347 | |||
1348 | This will cause devfsd to create (and destroy) symbolic links which | ||
1349 | point to the kernel-supplied names. | ||
1350 | |||
1351 | |||
1352 | ----------------------------------------------------------------------------- | ||
1353 | |||
1354 | |||
1355 | Device drivers currently ported | ||
1356 | |||
1357 | - All miscellaneous character devices support devfs (this is done | ||
1358 | transparently through misc_register()) | ||
1359 | |||
1360 | - SCSI discs and generic hard discs | ||
1361 | |||
1362 | - Character memory devices (null, zero, full and so on) | ||
1363 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
1364 | |||
1365 | - Loop devices (/dev/loop?) | ||
1366 | |||
1367 | - TTY devices (console, serial ports, terminals and pseudo-terminals) | ||
1368 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
1369 | |||
1370 | - SCSI tapes (/dev/scsi and /dev/tapes) | ||
1371 | |||
1372 | - SCSI CD-ROMs (/dev/scsi and /dev/cdroms) | ||
1373 | |||
1374 | - SCSI generic devices (/dev/scsi) | ||
1375 | |||
1376 | - Ramdiscs (/dev/ram?) | ||
1377 | |||
1378 | - Meta Devices (/dev/md*) | ||
1379 | |||
1380 | - Floppy discs (/dev/floppy) | ||
1381 | |||
1382 | - Parallel port printers (/dev/printers) | ||
1383 | |||
1384 | - Sound devices (/dev/sound) | ||
1385 | Thanks to Eric Dumas <dumas@linux.eu.org> and | ||
1386 | C. Scott Ananian <cananian@alumni.princeton.edu> | ||
1387 | |||
1388 | - Joysticks (/dev/joysticks) | ||
1389 | |||
1390 | - Sparc keyboard (/dev/kbd) | ||
1391 | |||
1392 | - DSP56001 digital signal processor (/dev/dsp56k) | ||
1393 | |||
1394 | - Apple Desktop Bus (/dev/adb) | ||
1395 | |||
1396 | - Coda network file system (/dev/cfs*) | ||
1397 | |||
1398 | - Virtual console capture devices (/dev/vcc) | ||
1399 | Thanks to Dennis Hou <smilax@mindmeld.yi.org> | ||
1400 | |||
1401 | - Frame buffer devices (/dev/fb) | ||
1402 | |||
1403 | - Video capture devices (/dev/v4l) | ||
1404 | |||
1405 | |||
1406 | ----------------------------------------------------------------------------- | ||
1407 | |||
1408 | |||
1409 | Allocation of Device Numbers | ||
1410 | |||
1411 | Devfs allows you to write a driver which doesn't need to allocate a | ||
1412 | device number (major and minor numbers) for the internal operation of the | ||
1413 | kernel. However, there are a number of userspace programmes that use | ||
1414 | the device number as a unique handle for a device. An example is the | ||
1415 | find programme, which uses device numbers to determine whether | ||
1416 | an inode is on a different filesystem than another inode. The device | ||
1417 | number used is the one for the block device which a filesystem is | ||
1418 | using. To preserve compatibility with userspace programmes, block | ||
1419 | devices using devfs need to have unique device numbers allocated to | ||
1420 | them. Furthermore, POSIX specifies device numbers, so some kind of | ||
1421 | device number needs to be presented to userspace. | ||
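
The kind of check described above can be illustrated with a short Python
sketch. It relies only on the standard os.stat() call, whose st_dev field is
the device number of the filesystem holding the file. The helper name
same_filesystem is hypothetical, and this is an illustration of the idea
rather than a copy of what find does internally.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same filesystem device number."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# A directory is trivially on the same filesystem as itself.
print(same_filesystem(".", "."))
```

If two block devices were given the same device number, a comparison like
this one would wrongly report their filesystems as identical, which is why
the numbers must be unique.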
1422 | |||
1423 | The simplest option (especially when porting drivers to devfs) is to | ||
1424 | keep using the old major and minor numbers. Devfs will take whatever | ||
1425 | values are given for major and minor and pass them on to userspace. | ||
1426 | |||
1427 | This device number is a 16-bit number, so this leaves plenty of space | ||
1428 | for large numbers of discs and partitions. This scheme can also be | ||
1429 | used for character devices, in particular the tty devices, which are | ||
1430 | currently limited to 256 pseudo-ttys (this limits the total number of | ||
1431 | simultaneous xterms and remote logins). Note that the device number | ||
1432 | is limited to the range 36864-61439 (majors 144-239), in order to | ||
1433 | avoid any possible conflicts with existing official allocations. | ||
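
The range quoted above follows from the 16-bit layout, with the major number
in the high 8 bits and the minor number in the low 8 bits. The sketch below
reproduces the arithmetic; the helper names make_devnum and split_devnum are
hypothetical and chosen only for this illustration.

```python
MINOR_BITS = 8  # low 8 bits hold the minor, high 8 bits hold the major


def make_devnum(major, minor):
    """Pack 8-bit major and minor numbers into a 16-bit device number."""
    if not (0 <= major <= 255 and 0 <= minor <= 255):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << MINOR_BITS) | minor


def split_devnum(devnum):
    """Unpack a 16-bit device number into (major, minor)."""
    return devnum >> MINOR_BITS, devnum & 0xFF


print(make_devnum(144, 0), make_devnum(239, 255))  # 36864 61439
```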
1434 | |||
1435 | Please note that using dynamically allocated block device numbers may | ||
1436 | break the NFS daemons (both user and kernel mode), which expect dev_t | ||
1437 | for a given device to be constant over the lifetime of remote mounts. | ||
1438 | |||
1439 | A final note on this scheme: since it doesn't increase the size of | ||
1440 | device numbers, there are no compatibility issues with userspace. | ||
1441 | |||
1442 | ----------------------------------------------------------------------------- | ||
1443 | |||
1444 | |||
1445 | Questions and Answers | ||
1446 | |||
1447 | |||
1448 | Making things work | ||
1449 | Alternatives to devfs | ||
1450 | What I don't like about devfs | ||
1451 | How to report bugs | ||
1452 | Strange kernel messages | ||
1453 | Compilation problems with devfsd | ||
1454 | |||
1455 | |||
1456 | |||
1457 | Making things work | ||
1458 | |||
1459 | Here are some common questions and answers. | ||
1460 | |||
1461 | |||
1462 | |||
1463 | Devfsd doesn't start | ||
1464 | |||
1465 | - Make sure you have compiled and installed devfsd | ||
1466 | - Make sure devfsd is being started from your boot | ||
1467 |   scripts | ||
1468 | - Make sure you have configured your kernel to enable devfs (see | ||
1469 |   below) | ||
1470 | - Make sure devfs is mounted (see below) | ||
1471 | |||
1472 | |||
1473 | Devfsd is not managing all my permissions | ||
1474 | |||
1475 | Make sure you are capturing the appropriate events. For example, | ||
1476 | device entries created by the kernel generate REGISTER events, | ||
1477 | but those created by devfsd generate CREATE events. | ||
1478 | |||
1479 | |||
1480 | Devfsd is not capturing all REGISTER events | ||
1481 | |||
1482 | See the previous entry: you may need to capture CREATE events. | ||
1483 | |||
1484 | |||
1485 | X will not start | ||
1486 | |||
1487 | Make sure you followed the steps | ||
1488 | outlined above. | ||
1489 | |||
1490 | |||
1491 | Why don't my network devices appear in devfs? | ||
1492 | |||
1493 | This is not a bug. Network devices have their own, completely separate | ||
1494 | namespace. They are accessed via socket(2) and | ||
1495 | setsockopt(2) calls, and thus require no device nodes. I have | ||
1496 | raised the possibility of moving network devices into the device | ||
1497 | namespace, but have had no response. | ||
1498 | |||
1499 | |||
1500 | How can I test if I have devfs compiled into my kernel? | ||
1501 | |||
1502 | All filesystems built-in or currently loaded are listed in | ||
1503 | /proc/filesystems. If you see a devfs entry, then | ||
1504 | you know that devfs was compiled into your kernel. If you have | ||
1505 | correctly configured and rebuilt your kernel, then devfs will be | ||
1506 | built-in. If you think you've configured it in, but | ||
1507 | /proc/filesystems doesn't show it, you've made a mistake. | ||
1508 | Common mistakes include: | ||
1509 | |||
1510 | - Using a 2.2.x kernel without applying the devfs patch (if you | ||
1511 |   don't know how to patch your kernel, use 2.4.x instead; don't bother | ||
1512 |   asking me how to patch) | ||
1513 | - Forgetting to set CONFIG_EXPERIMENTAL=y | ||
1514 | - Forgetting to set CONFIG_DEVFS_FS=y | ||
1515 | - Forgetting to set CONFIG_DEVFS_MOUNT=y (if you want devfs | ||
1516 |   to be automatically mounted at boot) | ||
1517 | - Editing your .config manually, instead of using make | ||
1518 |   config or make xconfig | ||
1519 | - Forgetting to run make dep; make clean after changing the | ||
1520 |   configuration and before compiling | ||
1521 | - Forgetting to compile your kernel and modules | ||
1522 | - Forgetting to install your kernel | ||
1523 | - Forgetting to install your modules | ||
1524 | |||
1525 | Please check twice that you've done all these steps before sending in | ||
1526 | a bug report. | ||
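
The /proc/filesystems test described above is easy to script. Each line of
that file holds a filesystem name, optionally preceded by the word "nodev".
The following Python sketch looks for a devfs entry; the function names
has_filesystem and devfs_compiled_in are hypothetical helpers for this
illustration.

```python
def has_filesystem(proc_filesystems_text, name="devfs"):
    """Return True if name appears as a filesystem in the given text."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        # The filesystem name is the last field; "nodev" may precede it.
        if fields and fields[-1] == name:
            return True
    return False


def devfs_compiled_in(path="/proc/filesystems"):
    """Read the running kernel's list and look for a devfs entry."""
    with open(path) as listing:
        return has_filesystem(listing.read())
```

On a running system, devfs_compiled_in() returning False corresponds to the
"you've made a mistake" case above.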
1527 | |||
1528 | |||
1529 | |||
1530 | How can I test if devfs is mounted on /dev? | ||
1531 | |||
1532 | The device filesystem will always create an entry called | ||
1533 | ".devfsd", which is used to communicate with the daemon. Even | ||
1534 | if the daemon is not running, this entry will exist. Testing for the | ||
1535 | existence of this entry is the approved method of determining if devfs | ||
1536 | is mounted or not. Note that the type of entry (i.e. regular file, | ||
1537 | character device, named pipe, etc.) may change without notice. Only | ||
1538 | the existence of the entry should be relied upon. | ||
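
A minimal Python sketch of this test follows. It checks only that the entry
exists and never looks at its type, in line with the advice above. The
function name devfs_mounted is a hypothetical helper.

```python
import os


def devfs_mounted(dev_dir="/dev"):
    """Return True if the .devfsd entry exists in dev_dir."""
    # lexists() tests for existence only and ignores the type of the entry.
    return os.path.lexists(os.path.join(dev_dir, ".devfsd"))
```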
1539 | |||
1540 | |||
1541 | When I start devfsd, I see the error: | ||
1542 | Error opening file: ".devfsd" No such file or directory? | ||
1543 | |||
1544 | This means that devfs is not mounted. Make sure you have devfs mounted. | ||
1545 | |||
1546 | |||
1547 | How do I mount devfs? | ||
1548 | |||
1549 | First make sure you have devfs compiled into your kernel (see | ||
1550 | above). Then you will need to do one of the following: | ||
1551 | |||
1552 | - set CONFIG_DEVFS_MOUNT=y in your kernel config | ||
1553 | - pass devfs=mount to your boot loader | ||
1554 | - mount devfs manually in your boot scripts with: | ||
1555 |     mount -t devfs none /dev | ||
1556 | |||
1557 | |||
1558 | |||
1559 | Mount by volume LABEL=<label> doesn't work with | ||
1560 | devfs | ||
1561 | |||
1562 | Most probably you are not mounting devfs onto /dev. What | ||
1563 | happens is that if your kernel config has CONFIG_DEVFS_FS=y | ||
1564 | then the contents of /proc/partitions will have the devfs | ||
1565 | names (such as scsi/host0/bus0/target0/lun0/part1). The | ||
1566 | contents of /proc/partitions are used by mount(8) when | ||
1567 | mounting by volume label. If devfs is not mounted on /dev, | ||
1568 | then mount(8) will fail to find devices. The solution is to | ||
1569 | make sure that devfs is mounted on /dev. See above for how to | ||
1570 | do that. | ||
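
To see whether this is the problem, the names in /proc/partitions can be
compared against what actually exists under /dev. The Python sketch below
assumes the usual layout of /proc/partitions, in which each data row starts
with the numeric major number and carries the name in the fourth column. The
helper names partition_names and missing_nodes are hypothetical.

```python
import os


def partition_names(proc_partitions_text):
    """Extract the name column from the text of /proc/partitions."""
    names = []
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric major; skip header and blank lines.
        if len(fields) >= 4 and fields[0].isdigit():
            names.append(fields[3])
    return names


def missing_nodes(proc_partitions_text, dev_dir="/dev"):
    """Return the listed names that have no matching entry under dev_dir."""
    return [name for name in partition_names(proc_partitions_text)
            if not os.path.exists(os.path.join(dev_dir, name))]
```

If missing_nodes() returns devfs-style names, then devfs is probably not
mounted on /dev.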
1571 | |||
1572 | |||
1573 | I have extra or incorrect entries in /dev | ||
1574 | |||
1575 | You may have stale entries in your dev-state area. Check for a | ||
1576 | RESTORE configuration line in your devfsd configuration | ||
1577 | (typically /etc/devfsd.conf). If you have this line, check | ||
1578 | the contents of the specified directory for stale entries. Remove | ||
1579 | any entries which are incorrect, then reboot. | ||
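
Finding the dev-state directory can be scripted. The Python sketch below
assumes that a RESTORE line consists of the keyword followed by a directory
name, and that anything else (including comment lines) can be ignored. The
function name restore_directories is hypothetical, and /dev-state in the
usage line is only a sample value.

```python
def restore_directories(conf_text):
    """Return the directories named by RESTORE lines in devfsd.conf text."""
    directories = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Assumed form: "RESTORE <directory>"; other lines are skipped.
        if len(fields) >= 2 and fields[0] == "RESTORE":
            directories.append(fields[1])
    return directories


sample = "REGISTER .* MKOLDCOMPAT\nRESTORE /dev-state\n"
print(restore_directories(sample))  # ['/dev-state']
```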
1580 | |||
1581 | |||
1582 | I get "Unable to open initial console" messages at boot | ||
1583 | |||
1584 | This usually happens when you don't have devfs automounted onto | ||
1585 | /dev at boot time, and there is no valid | ||
1586 | /dev/console entry on your root filesystem. Create a valid | ||
1587 | /dev/console device node. | ||
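
Whether an existing node is valid can be checked from Python. The sketch
below assumes the conventional numbering for the system console, character
device major 5 and minor 1, which comes from the standard Linux device list
and is not stated in this document. The function name is_console_node is
hypothetical.

```python
import os
import stat

CONSOLE_MAJOR = 5  # assumption: conventional major for /dev/console
CONSOLE_MINOR = 1  # assumption: conventional minor for /dev/console


def is_console_node(path="/dev/console"):
    """Return True if path is a character device numbered 5, 1."""
    try:
        info = os.lstat(path)
    except OSError:
        return False  # missing or unreadable entry
    return bool(stat.S_ISCHR(info.st_mode)
                and os.major(info.st_rdev) == CONSOLE_MAJOR
                and os.minor(info.st_rdev) == CONSOLE_MINOR)
```

The check should be run against the /dev directory of the root filesystem
itself, not against a devfs mounted over it.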
1588 | |||
1589 | |||
1590 | |||
1591 | |||
1592 | |||
1593 | Alternatives to devfs | ||
1594 | |||
1595 | I've attempted to collate all the anti-devfs proposals and explain | ||
1596 | their limitations. Under construction. | ||
1597 | |||
1598 | |||
1599 | Why not just pass device create/remove events to a daemon? | ||
1600 | |||
1601 | Here the suggestion is to develop an API in the kernel so that devices | ||
1602 | can register create and remove events, and a daemon listens for those | ||
1603 | events. The daemon would then populate/depopulate /dev (which | ||
1604 | resides on disc). | ||
1605 | |||
1606 | This has several limitations: | ||
1607 | |||
1608 | |||
1609 | it only works for modules loaded and unloaded (or devices inserted | ||
1610 | and removed) after the kernel has finished booting. Without a database | ||
1611 | of events, there is no way the daemon could fully populate | ||
1612 | /dev | ||
1613 | |||
1614 | |||
1615 | if you add a database to this scheme, the question is then how to | ||
1616 | present that database to user-space. If you make it a list of strings | ||
1617 | with embedded event codes which are passed through a pipe to the | ||
1618 | daemon, then this is only of use to the daemon. I would argue that the | ||
1619 | natural way to present this data is via a filesystem (since many of | ||
1620 | the events will be of a hierarchical nature), such as devfs. | ||
1621 | Presenting the data as a filesystem makes it easy for the user to see | ||
1622 | what is available and also makes it easy to write scripts to scan the | ||
1623 | "database" | ||
1624 | |||
1625 | |||
1626 | the tight binding between device nodes and drivers is no longer | ||
1627 | possible (requiring the otherwise perfectly avoidable | ||
1628 | table lookups) | ||
1629 | |||
1630 | |||
1631 | you cannot catch inode lookup events on /dev which means | ||
1632 | that module autoloading requires device nodes to be created. This is a | ||
1633 | problem, particularly for drivers where only a few inodes are created | ||
1634 | from a potentially large set | ||
1635 | |||
1636 | |||
1637 | this technique can't be used when the root FS is mounted | ||
1638 | read-only | ||
1639 | |||
1640 | |||
1641 | |||
1642 | |||
1643 | Just implement a better scsidev | ||
1644 | |||
1645 | This suggestion involves taking the scsidev programme and | ||
1646 | extending it to scan for all devices, not just SCSI devices. The | ||
1647 | scsidev programme works by scanning /proc/scsi. | ||
1648 | |||
1649 | Problems: | ||
1650 | |||
1651 | |||
1652 | the kernel does not currently provide a list of all devices | ||
1653 | available. Not all drivers register entries in /proc or | ||
1654 | generate kernel messages | ||
1655 | |||
1656 | |||
1657 | there is no uniform mechanism to register devices other than the | ||
1658 | devfs API | ||
1659 | |||
1660 | |||
1661 | implementing such an API is then the same as the | ||
1662 | proposal above | ||
1663 | |||
1664 | |||
1665 | |||
1666 | |||
1667 | Put /dev on a ramdisc | ||
1668 | |||
1669 | This suggestion involves creating a ramdisc and populating it with | ||
1670 | device nodes and then mounting it over /dev. | ||
1671 | |||
1672 | Problems: | ||
1673 | |||
1674 | |||
1675 | |||
1676 | this doesn't help when mounting the root filesystem, since you | ||
1677 | still need a device node to do that | ||
1678 | |||
1679 | |||
1680 | if you want to use this technique for the root device node as | ||
1681 | well, you need to use initrd. This complicates the booting sequence | ||
1682 | and makes it significantly harder to administer and configure. The | ||
1683 | initrd is essentially opaque, robbing the system administrator of easy | ||
1684 | configuration | ||
1685 | |||
1686 | |||
1687 | insufficient information is available to correctly populate the | ||
1688 | ramdisc. So we come back to the | ||
1689 | proposal above to "solve" this | ||
1690 | |||
1691 | |||
1692 | a ramdisc-based solution would take more kernel memory, since the | ||
1693 | backing store would be (at best) normal VFS inodes and dentries, which | ||
1694 | take 284 bytes and 112 bytes, respectively, for each entry. Compare | ||
1695 | that to 72 bytes for devfs | ||
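
The memory comparison in the last point can be worked through with the
figures quoted there. In the Python sketch below, the entry count of 1000 is
a hypothetical example and the function names are chosen for illustration
only.

```python
VFS_INODE_BYTES = 284    # per-entry inode size quoted above
VFS_DENTRY_BYTES = 112   # per-entry dentry size quoted above
DEVFS_ENTRY_BYTES = 72   # per-entry devfs size quoted above


def ramdisc_bytes(entries):
    """Memory for entries held as VFS inodes plus dentries."""
    return entries * (VFS_INODE_BYTES + VFS_DENTRY_BYTES)


def devfs_bytes(entries):
    """Memory for the same number of entries held by devfs."""
    return entries * DEVFS_ENTRY_BYTES


entries = 1000  # hypothetical number of device entries
print(ramdisc_bytes(entries), devfs_bytes(entries))  # 396000 72000
```

With these figures, each entry costs 396 bytes on a ramdisc against 72 bytes
in devfs, a factor of 5.5.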
1696 | |||
1697 | |||
1698 | |||
1699 | |||
1700 | Do nothing: there's no problem | ||
1701 | |||
1702 | Sometimes people can be heard to claim that the existing scheme is | ||
1703 | fine. This is what they're ignoring: | ||
1704 | |||
1705 | |||
1706 | device number size (8 bits each for major and minor) is a real | ||
1707 | limitation, and must be fixed somehow. Systems with large numbers of | ||
1708 | SCSI devices, for example, will continue to consume the remaining | ||
1709 | unallocated major numbers. USB will also need to push beyond the 8 bit | ||
1710 | minor limitation | ||
1711 | |||
1712 | |||
1713 | simply increasing the device number size is insufficient. Apart | ||
1714 | from causing a lot of pain, it doesn't solve the management issues | ||
1715 | of a /dev with thousands or more device nodes | ||
1716 | |||
1717 | |||
1718 | ignoring the problem of a huge /dev will not make it go | ||
1719 | away, and dismisses the legitimacy of a large number of people who | ||
1720 | want a dynamic /dev | ||
1721 | |||
1722 | |||
1723 | the standard response then becomes: "write a device management | ||
1724 | daemon", which brings us back to the | ||
1725 | proposal above | ||
1726 | |||
1727 | |||
1728 | |||
1729 | |||
1730 | What I don't like about devfs | ||
1731 | |||
1732 | Here are some common complaints about devfs, and some suggestions and | ||
1733 | solutions that may make it more palatable for you. I can't please | ||
1734 | everybody, but I do try :-) | ||
1735 | |||
1736 | I hate the naming scheme | ||
1737 | |||
1738 | First, remember that no naming scheme will please everybody. You hate | ||
1739 | the scheme, others love it. Who's to say who's right and who's wrong? | ||
1740 | Ultimately, the person who writes the code gets to choose, and what | ||
1741 | exists now is a combination of the choices made by the | ||
1742 | devfs author and the | ||
1743 | kernel maintainer (Linus). | ||
1744 | |||
1745 | However, not all is lost. If you want to create your own naming | ||
1746 | scheme, it is a simple matter to write a standalone script, hack | ||
1747 | devfsd, or write a script called by devfsd. You can create whatever | ||
1748 | naming scheme you like. | ||
1749 | |||
1750 | Further, if you want to remove all traces of the devfs naming scheme | ||
1751 | from /dev, you can mount devfs elsewhere (say | ||
1752 | /devfs) and populate /dev with links into | ||
1753 | /devfs. This population can be automated using devfsd if you | ||
1754 | wish. | ||
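
Populating /dev with links by a standalone script might look like the Python
sketch below. It assumes devfs is mounted on /devfs as in the example above.
The function name populate_dev and the name mapping passed to it are
hypothetical, and existing entries are left untouched.

```python
import os


def populate_dev(dev_dir, devfs_root, names):
    """Create symlinks in dev_dir that point into devfs_root.

    names maps a link name (e.g. "hda") to a path relative to devfs_root.
    Returns the list of links that were created.
    """
    created = []
    for link_name, devfs_relative in sorted(names.items()):
        link = os.path.join(dev_dir, link_name)
        if os.path.lexists(link):
            continue  # never overwrite an existing entry
        os.symlink(os.path.join(devfs_root, devfs_relative), link)
        created.append(link)
    return created
```

For example, populate_dev("/dev", "/devfs", {"hda": "ide/hd/c0b0t0u0"})
would create /dev/hda as a link to /devfs/ide/hd/c0b0t0u0.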
1755 | |||
1756 | You can even use the VFS binding facility to make the links, rather | ||
1757 | than using symbolic links. This way, you don't even have to see the | ||
1758 | "destination" of these symbolic links. | ||
1759 | |||
1760 | Devfs puts policy into the kernel | ||
1761 | |||
1762 | There's already policy in the kernel. Device numbers are in fact | ||
1763 | policy (why should the kernel dictate what device numbers I use?). | ||
1764 | Face it, some policy has to be in the kernel. The real difference | ||
1765 | between device names as policy and device numbers as policy is that | ||
1766 | no one will use device numbers directly, because device | ||
1767 | numbers are devoid of meaning to humans and are ugly. At least with | ||
1768 | the devfs device names (even though you can add your own naming | ||
1769 | scheme), some people will use the devfs-supplied names directly. This | ||
1770 | offends some people :-) | ||
1771 | |||
1772 | Devfs is bloatware | ||
1773 | |||
1774 | This is not even remotely true. As shown above, | ||
1775 | both code and data size are quite modest. | ||
1776 | |||
1777 | |||
1778 | How to report bugs | ||
1779 | |||
1780 | If you have (or think you have) a bug with devfs, please follow the | ||
1781 | steps below: | ||
1782 | |||
1783 | |||
1784 | |||
1785 | make sure you have enabled debugging output when configuring your | ||
1786 | kernel. You will need to set (at least) the following config options: | ||
1787 | |||
1788 | CONFIG_DEVFS_DEBUG=y | ||
1789 | CONFIG_DEBUG_KERNEL=y | ||
1790 | CONFIG_DEBUG_SLAB=y | ||
1791 | |||
1792 | |||
1793 | |||
1794 | please make sure you have the latest devfs patches applied. The | ||
1795 | latest kernel version might not have the latest devfs patches applied | ||
1796 | yet (Linus is very busy) | ||
1797 | |||
1798 | |||
1799 | save a copy of your complete kernel logs (preferably by | ||
1800 | using the dmesg programme) for later inclusion in your bug | ||
1801 | report. You may need to use the -s switch to increase the | ||
1802 | internal buffer size so you can capture all the boot messages. | ||
1803 | Don't edit or trim the dmesg output | ||
1804 | |||
1805 | |||
1806 | |||
1807 | |||
1808 | try booting with devfs=dall passed to the kernel boot | ||
1809 | command line (read the documentation on your bootloader on how to do | ||
1810 | this), and save the result to a file. This may be quite verbose, and | ||
1811 | it may overflow the messages buffer, but try to get as much of it as | ||
1812 | you can | ||
1813 | |||
1814 | |||
1815 | if you get an Oops, run ksymoops to decode it so that the | ||
1816 | names of the offending functions are provided. A non-decoded Oops is | ||
1817 | pretty useless | ||
1818 | |||
1819 | |||
1820 | send a copy of your devfsd configuration file(s) | ||
1821 | |||
1822 | send the bug report to me first. | ||
1823 | Don't expect that I will see it if you post it to the linux-kernel | ||
1824 | mailing list. Include all the information listed above, plus | ||
1825 | anything else that you think might be relevant. Put the string | ||
1826 | devfs somewhere in the subject line, so my mail filters mark | ||
1827 | it as urgent | ||
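
The debugging options from the first step can be verified before a bug
report is sent. The Python sketch below only reads the text of a kernel
.config and never modifies it. It assumes the usual "NAME=y" line format;
the function name missing_options is hypothetical.

```python
REQUIRED_DEBUG_OPTIONS = (
    "CONFIG_DEVFS_DEBUG",
    "CONFIG_DEBUG_KERNEL",
    "CONFIG_DEBUG_SLAB",
)


def missing_options(config_text, required=REQUIRED_DEBUG_OPTIONS):
    """Return the required options that are not set to y in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_X is not set" are comments.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value == "y":
            enabled.add(key)
    return [option for option in required if option not in enabled]
```

An empty result means all three options listed above are enabled.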
1828 | |||
1829 | |||
1830 | |||
1831 | |||
1832 | Here is a general guide on how to ask questions in a way that greatly | ||
1833 | improves your chances of getting a reply: | ||
1834 | |||
1835 | http://www.tuxedo.org/~esr/faqs/smart-questions.html. If you have | ||
1836 | a bug to report, you should also read | ||
1837 | |||
1838 | http://www.chiark.greenend.org.uk/~sgtatham/bugs.html. | ||
1839 | |||
1840 | |||
1841 | Strange kernel messages | ||
1842 | |||
1843 | You may see devfs-related messages in your kernel logs. Below are some | ||
1844 | messages and what they mean (and what you should do about them, if | ||
1845 | anything). | ||
1846 | |||
1847 | |||
1848 | |||
1849 | devfs_register(fred): could not append to parent, err: -17 | ||
1850 | |||
1851 | You need to check what the error code means, but usually 17 means | ||
1852 | EEXIST. This means that a driver attempted to create an entry | ||
1853 | fred in a directory, but there already was an entry with that | ||
1854 | name. This is often caused by flawed boot scripts which untar a bunch | ||
1855 | of inodes into /dev, as a way to restore permissions. This | ||
1856 | message is harmless, as the device nodes will still | ||
1857 | provide access to the driver (unless you use the devfs=only | ||
1858 | boot option, which is only for dedicated souls:-). If you want to get | ||
1859 | rid of these annoying messages, upgrade to devfsd-v1.3.20 and use the | ||
1860 | recommended RESTORE directive to restore permissions. | ||
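
Checking what an error code means can be done with Python's errno module, as
sketched below. The kernel prints the code negated, so the sign is dropped
first. The function name describe_kernel_error is hypothetical, and the
lookup uses the errno table of the machine running the script, which on
Linux matches the kernel's numbering.

```python
import errno
import os


def describe_kernel_error(err):
    """Turn a (usually negative) kernel error code into a readable string."""
    code = abs(err)
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


print(describe_kernel_error(-17))
```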
1861 | |||
1862 | |||
1863 | devfs_mk_dir(bill): using old entry in dir: c1808724 "" | ||
1864 | |||
1865 | This is similar to the message above, except that a driver attempted | ||
1866 | to create a directory named bill, and the parent directory | ||
1867 | has an entry with the same name. In this case, to ensure that drivers | ||
1868 | continue to work properly, the old entry is re-used and given to the | ||
1869 | driver. In 2.5 kernels, the driver is given a NULL entry, and thus, | ||
1870 | under rare circumstances, may not create the required device nodes. | ||
1871 | The solution is the same as above. | ||
1872 | |||
1873 | |||
1874 | |||
1875 | |||
1876 | |||
1877 | Compilation problems with devfsd | ||
1878 | |||
1879 | Usually, you can compile devfsd just by typing in | ||
1880 | make in the source directory, followed by a make | ||
1881 | install (as root). Sometimes, you may have problems, particularly | ||
1882 | on broken configurations. | ||
1883 | |||
1884 | |||
1885 | |||
1886 | error messages relating to DEVFSD_NOTIFY_DELETE | ||
1887 | |||
1888 | This happens because you have an ancient set of kernel headers | ||
1889 | installed in /usr/include/linux or /usr/src/linux. | ||
1890 | Install kernel 2.4.10 or later. You may need to pass the | ||
1891 | KERNEL_DIR variable to make (if you did not install | ||
1892 | the new kernel sources as /usr/src/linux), or you may copy | ||
1893 | the devfs_fs.h file in the kernel source tree into | ||
1894 | /usr/include/linux. | ||
1895 | |||
1896 | |||
1897 | |||
1898 | |||
1899 | ----------------------------------------------------------------------------- | ||
1900 | |||
1901 | |||
1902 | Other resources | ||
1903 | |||
1904 | |||
1905 | |||
1906 | Douglas Gilbert has written a useful document at | ||
1907 | |||
1908 | http://www.torque.net/sg/devfs_scsi.html which | ||
1909 | explores the SCSI subsystem and how it interacts with devfs | ||
1910 | |||
1911 | |||
1912 | Douglas Gilbert has written another useful document at | ||
1913 | |||
1914 | http://www.torque.net/scsi/SCSI-2.4-HOWTO/ which | ||
1915 | discusses the Linux SCSI subsystem in 2.4. | ||
1916 | |||
1917 | |||
1918 | Johannes Erdfelt has started a discussion paper on Linux and | ||
1919 | hot-swap devices, describing what the requirements are for a scalable | ||
1920 | solution and how and why he's used devfs+devfsd. Note that this is an | ||
1921 | early draft only, available in plain text form at: | ||
1922 | |||
1923 | http://johannes.erdfelt.com/hotswap.txt. | ||
1924 | Johannes has promised an HTML version will follow. | ||
1925 | |||
1926 | |||
1927 | I presented an invited | ||
1928 | paper | ||
1929 | at the | ||
1930 | |||
1931 | 2nd Annual Storage Management Workshop held in Miami, Florida, | ||
1932 | U.S.A. in October 2000. | ||
1933 | |||
1934 | |||
1935 | |||
1936 | |||
1937 | ----------------------------------------------------------------------------- | ||
1938 | |||
1939 | |||
1940 | Translations of this document | ||
1941 | |||
1942 | This document has been translated into other languages. | ||
1943 | |||
1944 | |||
1945 | |||
1946 | |||
1947 | The document master (in English) by rgooch@atnf.csiro.au is | ||
1948 | available at | ||
1949 | |||
1950 | http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html | ||
1951 | |||
1952 | |||
1953 | |||
1954 | A Korean translation by viatoris@nownuri.net is available at | ||
1955 | |||
1956 | http://your.destiny.pe.kr/devfs/devfs.html | ||
1957 | |||
1958 | |||
1959 | |||
1960 | |||
1961 | ----------------------------------------------------------------------------- | ||
1962 | Most flags courtesy of ITA's | ||
1963 | Flags of All Countries | ||
1964 | used with permission. | ||