Diffstat (limited to 'Documentation/filesystems')
| -rw-r--r-- | Documentation/filesystems/00-INDEX | 6 | ||||
| -rw-r--r-- | Documentation/filesystems/nfsroot.txt | 270 | ||||
| -rw-r--r-- | Documentation/filesystems/rpc-cache.txt | 202 | ||||
| -rw-r--r-- | Documentation/filesystems/seq_file.txt | 283 |
4 files changed, 761 insertions, 0 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index e68021c08fbd..52cd611277a3 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX | |||
| @@ -66,6 +66,8 @@ mandatory-locking.txt | |||
| 66 | - info on the Linux implementation of Sys V mandatory file locking. | 66 | - info on the Linux implementation of Sys V mandatory file locking. |
| 67 | ncpfs.txt | 67 | ncpfs.txt |
| 68 | - info on Novell Netware(tm) filesystem using NCP protocol. | 68 | - info on Novell Netware(tm) filesystem using NCP protocol. |
| 69 | nfsroot.txt | ||
| 70 | - short guide on setting up a diskless box with NFS root filesystem. | ||
| 69 | ntfs.txt | 71 | ntfs.txt |
| 70 | - info and mount options for the NTFS filesystem (Windows NT). | 72 | - info and mount options for the NTFS filesystem (Windows NT). |
| 71 | ocfs2.txt | 73 | ocfs2.txt |
| @@ -82,6 +84,10 @@ relay.txt | |||
| 82 | - info on relay, for efficient streaming from kernel to user space. | 84 | - info on relay, for efficient streaming from kernel to user space. |
| 83 | romfs.txt | 85 | romfs.txt |
| 84 | - description of the ROMFS filesystem. | 86 | - description of the ROMFS filesystem. |
| 87 | rpc-cache.txt | ||
| 88 | - introduction to the caching mechanisms in the sunrpc layer. | ||
| 89 | seq_file.txt | ||
| 90 | - how to use the seq_file API | ||
| 85 | sharedsubtree.txt | 91 | sharedsubtree.txt |
| 86 | - a description of shared subtrees for namespaces. | 92 | - a description of shared subtrees for namespaces. |
| 87 | smbfs.txt | 93 | smbfs.txt |
diff --git a/Documentation/filesystems/nfsroot.txt b/Documentation/filesystems/nfsroot.txt new file mode 100644 index 000000000000..31b329172343 --- /dev/null +++ b/Documentation/filesystems/nfsroot.txt | |||
| @@ -0,0 +1,270 @@ | |||
| 1 | Mounting the root filesystem via NFS (nfsroot) | ||
| 2 | =============================================== | ||
| 3 | |||
| 4 | Written 1996 by Gero Kuhlmann <gero@gkminix.han.de> | ||
| 5 | Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz> | ||
| 6 | Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org> | ||
| 7 | Updated 2006 by Horms <horms@verge.net.au> | ||
| 8 | |||
| 9 | |||
| 10 | |||
| 11 | In order to use a diskless system, such as an X-terminal or printer server | ||
| 12 | for example, it is necessary for the root filesystem to be present on a | ||
| 13 | non-disk device. This may be an initramfs (see Documentation/filesystems/ | ||
| 14 | ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a | ||
| 15 | filesystem mounted via NFS. The following text describes how to use NFS | ||
| 16 | for the root filesystem. For the rest of this text 'client' means the | ||
| 17 | diskless system, and 'server' means the NFS server. | ||
| 18 | |||
| 19 | |||
| 20 | |||
| 21 | |||
| 22 | 1.) Enabling nfsroot capabilities | ||
| 23 | ----------------------------- | ||
| 24 | |||
| 25 | In order to use nfsroot, NFS client support needs to be selected as | ||
| 26 | built-in during configuration. Once this has been selected, the nfsroot | ||
| 27 | option will become available, which should also be selected. | ||
| 28 | |||
| 29 | In the networking options, kernel level autoconfiguration can be selected, | ||
| 30 | along with the types of autoconfiguration to support. Selecting all of | ||
| 31 | DHCP, BOOTP and RARP is safe. | ||
| 32 | |||
| 33 | |||
| 34 | |||
| 35 | |||
| 36 | 2.) Kernel command line | ||
| 37 | ------------------- | ||
| 38 | |||
| 39 | When the kernel has been loaded by a boot loader (see below) it needs to be | ||
| 40 | told what root fs device to use and, in the case of nfsroot, where to find | ||
| 41 | both the server and the name of the directory on the server to mount as root. | ||
| 42 | This can be established using the following kernel command line parameters: | ||
| 43 | |||
| 44 | |||
| 45 | root=/dev/nfs | ||
| 46 | |||
| 47 | This is necessary to enable the pseudo-NFS-device. Note that it's not a | ||
| 48 | real device but just a synonym to tell the kernel to use NFS instead of | ||
| 49 | a real device. | ||
| 50 | |||
| 51 | |||
| 52 | nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] | ||
| 53 | |||
| 54 | If the `nfsroot' parameter is NOT given on the command line, | ||
| 55 | the default "/tftpboot/%s" will be used. | ||
| 56 | |||
| 57 | <server-ip> Specifies the IP address of the NFS server. | ||
| 58 | The default address is determined by the `ip' parameter | ||
| 59 | (see below). This parameter allows the use of different | ||
| 60 | servers for IP autoconfiguration and NFS. | ||
| 61 | |||
| 62 | <root-dir> Name of the directory on the server to mount as root. | ||
| 63 | If there is a "%s" token in the string, it will be | ||
| 64 | replaced by the ASCII-representation of the client's | ||
| 65 | IP address. | ||
| 66 | |||
| 67 | <nfs-options> Standard NFS options. All options are separated by commas. | ||
| 68 | The following defaults are used: | ||
| 69 | port = as given by server portmap daemon | ||
| 70 | rsize = 4096 | ||
| 71 | wsize = 4096 | ||
| 72 | timeo = 7 | ||
| 73 | retrans = 3 | ||
| 74 | acregmin = 3 | ||
| 75 | acregmax = 60 | ||
| 76 | acdirmin = 30 | ||
| 77 | acdirmax = 60 | ||
| 78 | flags = hard, nointr, noposix, cto, ac | ||
| 79 | |||
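The "%s" substitution in <root-dir> can be illustrated with a small user-space sketch. This is not the kernel's code: the helper name expand_root_dir and the sample address used with it are hypothetical. It only mirrors the rule stated above, that a "%s" token is replaced by the client's IP address in ASCII.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: expand the first "%s" in root_dir with client_ip.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int expand_root_dir(const char *root_dir, const char *client_ip,
                           char *out, size_t out_len)
{
    const char *tok = strstr(root_dir, "%s");
    int n;

    if (!tok)       /* no token: use the path unchanged */
        n = snprintf(out, out_len, "%s", root_dir);
    else            /* prefix + client IP + suffix */
        n = snprintf(out, out_len, "%.*s%s%s",
                     (int)(tok - root_dir), root_dir, client_ip, tok + 2);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With the default "/tftpboot/%s" and an assumed client address of 192.168.1.5, the helper yields /tftpboot/192.168.1.5.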
| 80 | |||
| 81 | ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf> | ||
| 82 | |||
| 83 | This parameter tells the kernel how to configure IP addresses of devices | ||
| 84 | and also how to set up the IP routing table. It was originally called | ||
| 85 | `nfsaddrs', but now the boot-time IP configuration works independently of | ||
| 86 | NFS, so it was renamed to `ip' and the old name remained as an alias for | ||
| 87 | compatibility reasons. | ||
| 88 | |||
| 89 | If this parameter is missing from the kernel command line, all fields are | ||
| 90 | assumed to be empty, and the defaults mentioned below apply. In general | ||
| 91 | this means that the kernel tries to configure everything using | ||
| 92 | autoconfiguration. | ||
| 93 | |||
| 94 | The <autoconf> parameter can appear alone as the value to the `ip' | ||
| 95 | parameter (without all the ':' characters before). If the value is | ||
| 96 | "ip=off" or "ip=none", no autoconfiguration will take place, otherwise | ||
| 97 | autoconfiguration will take place. The most common way to use this | ||
| 98 | is "ip=dhcp". | ||
| 99 | |||
| 100 | <client-ip> IP address of the client. | ||
| 101 | |||
| 102 | Default: Determined using autoconfiguration. | ||
| 103 | |||
| 104 | <server-ip> IP address of the NFS server. If RARP is used to determine | ||
| 105 | the client address and this parameter is NOT empty only | ||
| 106 | replies from the specified server are accepted. | ||
| 107 | |||
| 108 | Only required for NFS root. That is, autoconfiguration | ||
| 109 | will not be triggered if it is missing and NFS root is not | ||
| 110 | in operation. | ||
| 111 | |||
| 112 | Default: Determined using autoconfiguration. | ||
| 113 | The address of the autoconfiguration server is used. | ||
| 114 | |||
| 115 | <gw-ip> IP address of a gateway if the server is on a different subnet. | ||
| 116 | |||
| 117 | Default: Determined using autoconfiguration. | ||
| 118 | |||
| 119 | <netmask> Netmask for local network interface. If unspecified | ||
| 120 | the netmask is derived from the client IP address assuming | ||
| 121 | classful addressing. | ||
| 122 | |||
| 123 | Default: Determined using autoconfiguration. | ||
| 124 | |||
| 125 | <hostname> Name of the client. May be supplied by autoconfiguration, | ||
| 126 | but its absence will not trigger autoconfiguration. | ||
| 127 | |||
| 128 | Default: Client IP address is used in ASCII notation. | ||
| 129 | |||
| 130 | <device> Name of network device to use. | ||
| 131 | |||
| 132 | Default: If the host only has one device, it is used. | ||
| 133 | Otherwise the device is determined using | ||
| 134 | autoconfiguration. This is done by sending | ||
| 135 | autoconfiguration requests out of all devices, | ||
| 136 | and using the device that received the first reply. | ||
| 137 | |||
| 138 | <autoconf> Method to use for autoconfiguration. In the case of options | ||
| 139 | which specify multiple autoconfiguration protocols, | ||
| 140 | requests are sent using all protocols, and the first one | ||
| 141 | to reply is used. | ||
| 142 | |||
| 143 | Only autoconfiguration protocols that have been compiled | ||
| 144 | into the kernel will be used, regardless of the value of | ||
| 145 | this option. | ||
| 146 | |||
| 147 | off or none: don't use autoconfiguration | ||
| 148 | (do static IP assignment instead) | ||
| 149 | on or any: use any protocol available in the kernel | ||
| 150 | (default) | ||
| 151 | dhcp: use DHCP | ||
| 152 | bootp: use BOOTP | ||
| 153 | rarp: use RARP | ||
| 154 | both: use both BOOTP and RARP but not DHCP | ||
| 155 | (old option kept for backwards compatibility) | ||
| 156 | |||
| 157 | Default: any | ||
| 158 | |||
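The field layout of the `ip' parameter can be sketched in user space as follows. This is an illustration under assumptions, not the kernel's parser: struct ip_param, parse_ip_param and the 64-byte field size are hypothetical. Empty fields are kept as empty strings (meaning "use the default"), and a value without any ':' is treated as <autoconf> alone, as described above.

```c
#include <string.h>

#define IP_FIELDS 7

/* Field order as documented: client, server, gateway, netmask,
 * hostname, device, autoconf. */
struct ip_param {
    char field[IP_FIELDS][64];
};

/*
 * Hypothetical user-space parser for the value of "ip=".
 * Returns 0 on success, -1 on too many fields or an over-long field.
 */
static int parse_ip_param(const char *value, struct ip_param *out)
{
    const char *p = value;
    int i = 0;

    memset(out, 0, sizeof(*out));

    /* "ip=dhcp" style: the value is <autoconf> on its own */
    if (!strchr(value, ':')) {
        if (strlen(value) >= sizeof(out->field[0]))
            return -1;
        strcpy(out->field[IP_FIELDS - 1], value);
        return 0;
    }

    for (;;) {
        size_t len = strcspn(p, ":");

        if (i >= IP_FIELDS || len >= sizeof(out->field[0]))
            return -1;
        memcpy(out->field[i++], p, len);   /* buffer is already NUL-filled */
        if (p[len] == '\0')
            return 0;
        p += len + 1;                      /* skip the ':' */
    }
}
```

For example, the assumed value "192.168.1.5:192.168.1.1::255.255.255.0:client1:eth0:off" leaves the gateway field empty and sets <autoconf> to "off".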
| 159 | |||
| 160 | |||
| 161 | |||
| 162 | 3.) Boot Loader | ||
| 163 | ---------- | ||
| 164 | |||
| 165 | To get the kernel into memory different approaches can be used. | ||
| 166 | They depend on various facilities being available: | ||
| 167 | |||
| 168 | |||
| 169 | 3.1) Booting from a floppy using syslinux | ||
| 170 | |||
| 171 | When building kernels, an easy way to create a boot floppy that uses | ||
| 172 | syslinux is to use the zdisk or bzdisk make targets which use | ||
| 173 | zimage and bzimage images respectively. Both targets accept the | ||
| 174 | FDARGS parameter which can be used to set the kernel command line. | ||
| 175 | |||
| 176 | e.g. | ||
| 177 | make bzdisk FDARGS="root=/dev/nfs" | ||
| 178 | |||
| 179 | Note that the user running this command will need to have | ||
| 180 | access to the floppy drive device, /dev/fd0 | ||
| 181 | |||
| 182 | For more information on syslinux, including how to create bootdisks | ||
| 183 | for prebuilt kernels, see http://syslinux.zytor.com/ | ||
| 184 | |||
| 185 | N.B: Previously it was possible to write a kernel directly to | ||
| 186 | a floppy using dd, configure the boot device using rdev, and | ||
| 187 | boot using the resulting floppy. Linux no longer supports this | ||
| 188 | method of booting. | ||
| 189 | |||
| 190 | 3.2) Booting from a cdrom using isolinux | ||
| 191 | |||
| 192 | When building kernels, an easy way to create a bootable cdrom that | ||
| 193 | uses isolinux is to use the isoimage target which uses a bzimage | ||
| 194 | image. Like zdisk and bzdisk, this target accepts the FDARGS | ||
| 195 | parameter which can be used to set the kernel command line. | ||
| 196 | |||
| 197 | e.g. | ||
| 198 | make isoimage FDARGS="root=/dev/nfs" | ||
| 199 | |||
| 200 | The resulting iso image will be arch/<ARCH>/boot/image.iso | ||
| 201 | This can be written to a cdrom using a variety of tools including | ||
| 202 | cdrecord. | ||
| 203 | |||
| 204 | e.g. | ||
| 205 | cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso | ||
| 206 | |||
| 207 | For more information on isolinux, including how to create bootdisks | ||
| 208 | for prebuilt kernels, see http://syslinux.zytor.com/ | ||
| 209 | |||
| 210 | 3.2) Using LILO | ||
| 211 | When using LILO all the necessary command line parameters may be | ||
| 212 | specified using the 'append=' directive in the LILO configuration | ||
| 213 | file. | ||
| 214 | |||
| 215 | However, to use the 'root=' directive you also need to create | ||
| 216 | a dummy root device, which may be removed after LILO is run. | ||
| 217 | |||
| 218 | mknod /dev/boot255 c 0 255 | ||
| 219 | |||
| 220 | For information on configuring LILO, please refer to its documentation. | ||
| 221 | |||
| 222 | 3.3) Using GRUB | ||
| 223 | When using GRUB, kernel parameters are simply appended after the kernel | ||
| 224 | specification: kernel <kernel> <parameters> | ||
| 225 | |||
| 226 | 3.4) Using loadlin | ||
| 227 | loadlin may be used to boot Linux from a DOS command prompt without | ||
| 228 | requiring a local hard disk to mount as root. This has not been | ||
| 229 | thoroughly tested by the authors of this document, but in general | ||
| 230 | it should be possible to configure the kernel command line similarly | ||
| 231 | to the configuration of LILO. | ||
| 232 | |||
| 233 | Please refer to the loadlin documentation for further information. | ||
| 234 | |||
| 235 | 3.5) Using a boot ROM | ||
| 236 | This is probably the most elegant way of booting a diskless client. | ||
| 237 | With a boot ROM the kernel is loaded using the TFTP protocol. The | ||
| 238 | authors of this document are not aware of any commercial boot | ||
| 239 | ROMs that support booting Linux over the network. However, there | ||
| 240 | are two free implementations of a boot ROM, netboot-nfs and | ||
| 241 | etherboot, both of which are available on sunsite.unc.edu, and both | ||
| 242 | of which contain everything you need to boot a diskless Linux client. | ||
| 243 | |||
| 244 | 3.6) Using pxelinux | ||
| 245 | Pxelinux may be used to boot linux using the PXE boot loader | ||
| 246 | which is present on many modern network cards. | ||
| 247 | |||
| 248 | When using pxelinux, the kernel image is specified using | ||
| 249 | "kernel <relative-path-below /tftpboot>". The nfsroot parameters | ||
| 250 | are passed to the kernel by adding them to the "append" line. | ||
| 251 | It is common to use serial console in conjunction with pxelinux, | ||
| 252 | see Documentation/serial-console.txt for more information. | ||
| 253 | |||
| 254 | For more information on pxelinux, including how to create bootdisks | ||
| 255 | for prebuilt kernels, see http://syslinux.zytor.com/ | ||
| 256 | |||
| 257 | |||
| 258 | |||
| 259 | |||
| 260 | 4.) Credits | ||
| 261 | ------- | ||
| 262 | |||
| 263 | The nfsroot code in the kernel and the RARP support have been written | ||
| 264 | by Gero Kuhlmann <gero@gkminix.han.de>. | ||
| 265 | |||
| 266 | The rest of the IP layer autoconfiguration code has been written | ||
| 267 | by Martin Mares <mj@atrey.karlin.mff.cuni.cz>. | ||
| 268 | |||
| 269 | In order to write the initial version of nfsroot I would like to thank | ||
| 270 | Jens-Uwe Mager <jum@anubis.han.de> for his help. | ||
diff --git a/Documentation/filesystems/rpc-cache.txt b/Documentation/filesystems/rpc-cache.txt new file mode 100644 index 000000000000..8a382bea6808 --- /dev/null +++ b/Documentation/filesystems/rpc-cache.txt | |||
| @@ -0,0 +1,202 @@ | |||
| 1 | This document gives a brief introduction to the caching | ||
| 2 | mechanisms in the sunrpc layer that is used, in particular, | ||
| 3 | for NFS authentication. | ||
| 4 | |||
| 5 | CACHES | ||
| 6 | ====== | ||
| 7 | The caching replaces the old exports table and allows for | ||
| 8 | a wide variety of values to be cached. | ||
| 9 | |||
| 10 | There are a number of caches that are similar in structure though | ||
| 11 | quite possibly very different in content and use. There is a corpus | ||
| 12 | of common code for managing these caches. | ||
| 13 | |||
| 14 | Examples of caches that are likely to be needed are: | ||
| 15 | - mapping from IP address to client name | ||
| 16 | - mapping from client name and filesystem to export options | ||
| 17 | - mapping from UID to list of GIDs, to work around NFS's limitation | ||
| 18 | of 16 gids. | ||
| 19 | - mappings between local UID/GID and remote UID/GID for sites that | ||
| 20 | do not have uniform uid assignment | ||
| 21 | - mapping from network identity to public key for crypto authentication. | ||
| 22 | |||
| 23 | The common code handles such things as: | ||
| 24 | - general cache lookup with correct locking | ||
| 25 | - supporting 'NEGATIVE' as well as positive entries | ||
| 26 | - allowing an EXPIRED time on cache items, and removing | ||
| 27 | items after they expire, and are no longer in-use. | ||
| 28 | - making requests to user-space to fill in cache entries | ||
| 29 | - allowing user-space to directly set entries in the cache | ||
| 30 | - delaying RPC requests that depend on as-yet incomplete | ||
| 31 | cache entries, and replaying those requests when the cache entry | ||
| 32 | is complete. | ||
| 33 | - clean out old entries as they expire. | ||
| 34 | |||
| 35 | Creating a Cache | ||
| 36 | ---------------- | ||
| 37 | |||
| 38 | 1/ A cache needs a datum to store. This is in the form of a | ||
| 39 | structure definition that must contain a | ||
| 40 | struct cache_head | ||
| 41 | as an element, usually the first. | ||
| 42 | It will also contain a key and some content. | ||
| 43 | Each cache element is reference counted and contains | ||
| 44 | expiry and update times for use in cache management. | ||
| 45 | 2/ A cache needs a "cache_detail" structure that | ||
| 46 | describes the cache. This stores the hash table, some | ||
| 47 | parameters for cache management, and some operations detailing how | ||
| 48 | to work with particular cache items. | ||
| 49 | The operations required are: | ||
| 50 | struct cache_head *alloc(void) | ||
| 51 | This simply allocates appropriate memory and returns | ||
| 52 | a pointer to the cache_head embedded within the | ||
| 53 | structure | ||
| 54 | void cache_put(struct kref *) | ||
| 55 | This is called when the last reference to an item is | ||
| 56 | dropped. The pointer passed is to the 'ref' field | ||
| 57 | in the cache_head. cache_put should release any | ||
| 58 | references created by 'cache_init' and, if CACHE_VALID | ||
| 59 | is set, any references created by cache_update. | ||
| 60 | It should then release the memory allocated by | ||
| 61 | 'alloc'. | ||
| 62 | int match(struct cache_head *orig, struct cache_head *new) | ||
| 63 | test if the keys in the two structures match. Return | ||
| 64 | 1 if they do, 0 if they don't. | ||
| 65 | void init(struct cache_head *orig, struct cache_head *new) | ||
| 66 | Set the 'key' fields in 'new' from 'orig'. This may | ||
| 67 | include taking references to shared objects. | ||
| 68 | void update(struct cache_head *orig, struct cache_head *new) | ||
| 69 | Set the 'content' fields in 'new' from 'orig'. | ||
| 70 | int cache_show(struct seq_file *m, struct cache_detail *cd, | ||
| 71 | struct cache_head *h) | ||
| 72 | Optional. Used to provide a /proc file that lists the | ||
| 73 | contents of a cache. This should show one item, | ||
| 74 | usually on just one line. | ||
| 75 | int cache_request(struct cache_detail *cd, struct cache_head *h, | ||
| 76 | char **bpp, int *blen) | ||
| 77 | Format a request to be sent to user-space for an item | ||
| 78 | to be instantiated. *bpp is a buffer of size *blen. | ||
| 79 | bpp should be moved forward over the encoded message, | ||
| 80 | and *blen should be reduced to show how much free | ||
| 81 | space remains. Return 0 on success or <0 if not | ||
| 82 | enough room or other problem. | ||
| 83 | int cache_parse(struct cache_detail *cd, char *buf, int len) | ||
| 84 | A message from user space has arrived to fill out a | ||
| 85 | cache entry. It is in 'buf' of length 'len'. | ||
| 86 | cache_parse should parse this, find the item in the | ||
| 87 | cache with sunrpc_cache_lookup, and update the item | ||
| 88 | with sunrpc_cache_update. | ||
| 89 | |||
| 90 | |||
| 91 | 3/ A cache needs to be registered using cache_register(). This | ||
| 92 | includes it on a list of caches that will be regularly | ||
| 93 | cleaned to discard old data. | ||
| 94 | |||
| 95 | Using a cache | ||
| 96 | ------------- | ||
| 97 | |||
| 98 | To find a value in a cache, call sunrpc_cache_lookup passing a pointer | ||
| 99 | to the cache_head in a sample item with the 'key' fields filled in. | ||
| 100 | This will be passed to ->match to identify the target entry. If no | ||
| 101 | entry is found, a new entry will be created, added to the cache, and | ||
| 102 | marked as not containing valid data. | ||
| 103 | |||
| 104 | The item returned is typically passed to cache_check which will check | ||
| 105 | if the data is valid, and may initiate an up-call to get fresh data. | ||
| 106 | cache_check will return -ENOENT if the entry is negative or if an | ||
| 107 | upcall is needed but not possible, -EAGAIN if an upcall is pending, | ||
| 108 | or 0 if the data is valid. | ||
| 109 | |||
| 110 | cache_check can be passed a "struct cache_req *". This structure is | ||
| 111 | typically embedded in the actual request and can be used to create a | ||
| 112 | deferred copy of the request (struct cache_deferred_req). This is | ||
| 113 | done when the found cache item is not uptodate, but there is reason to | ||
| 114 | believe that userspace might provide information soon. When the cache | ||
| 115 | item does become valid, the deferred copy of the request will be | ||
| 116 | revisited (->revisit). It is expected that this method will | ||
| 117 | reschedule the request for processing. | ||
| 118 | |||
| 119 | The value returned by sunrpc_cache_lookup can also be passed to | ||
| 120 | sunrpc_cache_update to set the content for the item. A second item is | ||
| 121 | passed which should hold the content. If the item found by _lookup | ||
| 122 | has valid data, then it is discarded and a new item is created. This | ||
| 123 | saves any user of an item from worrying about content changing while | ||
| 124 | it is being inspected. If the item found by _lookup does not contain | ||
| 125 | valid data, then the content is copied across and CACHE_VALID is set. | ||
| 126 | |||
| 127 | Populating a cache | ||
| 128 | ------------------ | ||
| 129 | |||
| 130 | Each cache has a name, and when the cache is registered, a directory | ||
| 131 | with that name is created in /proc/net/rpc | ||
| 132 | |||
| 133 | This directory contains a file called 'channel' which is a channel | ||
| 134 | for communicating between kernel and user for populating the cache. | ||
| 135 | This directory may later contain other files for interacting | ||
| 136 | with the cache. | ||
| 137 | |||
| 138 | The 'channel' works a bit like a datagram socket. Each 'write' is | ||
| 139 | passed as a whole to the cache for parsing and interpretation. | ||
| 140 | Each cache can treat the write requests differently, but it is | ||
| 141 | expected that a message written will contain: | ||
| 142 | - a key | ||
| 143 | - an expiry time | ||
| 144 | - a content. | ||
| 145 | with the intention that an item in the cache with the given key | ||
| 146 | should be created or updated to have the given content, and the | ||
| 147 | expiry time should be set on that item. | ||
| 148 | |||
| 149 | Reading from a channel is a bit more interesting. When a cache | ||
| 150 | lookup fails, or when it succeeds but finds an entry that may soon | ||
| 151 | expire, a request is lodged for that cache item to be updated by | ||
| 152 | user-space. These requests appear in the channel file. | ||
| 153 | |||
| 154 | Successive reads will return successive requests. | ||
| 155 | If there are no more requests to return, read will return EOF, but a | ||
| 156 | select or poll for read will block waiting for another request to be | ||
| 157 | added. | ||
| 158 | |||
| 159 | Thus a user-space helper is likely to: | ||
| 160 | open the channel. | ||
| 161 | select for readable | ||
| 162 | read a request | ||
| 163 | write a response | ||
| 164 | loop. | ||
| 165 | |||
| 166 | If it dies and needs to be restarted, any requests that have not been | ||
| 167 | answered will still appear in the file and will be read by the new | ||
| 168 | instance of the helper. | ||
| 169 | |||
| 170 | Each cache should define a "cache_parse" method which takes a message | ||
| 171 | written from user-space and processes it. It should return an error | ||
| 172 | (which propagates back to the write syscall) or 0. | ||
| 173 | |||
| 174 | Each cache should also define a "cache_request" method which | ||
| 175 | takes a cache item and encodes a request into the buffer | ||
| 176 | provided. | ||
| 177 | |||
| 178 | Note: If a cache has no active readers on the channel, and has had no | ||
| 179 | active readers for more than 60 seconds, further requests will not be | ||
| 180 | added to the channel but instead all lookups that do not find a valid | ||
| 181 | entry will fail. This is partly for backward compatibility: The | ||
| 182 | previous nfs exports table was deemed to be authoritative and a | ||
| 183 | failed lookup meant a definite 'no'. | ||
| 184 | |||
| 185 | request/response format | ||
| 186 | ----------------------- | ||
| 187 | |||
| 188 | While each cache is free to use its own format for requests | ||
| 189 | and responses over the channel, the following is recommended as | ||
| 190 | appropriate and support routines are available to help: | ||
| 191 | Each request or response record should be printable ASCII | ||
| 192 | with precisely one newline character which should be at the end. | ||
| 193 | Fields within the record should be separated by spaces, normally one. | ||
| 194 | If spaces, newlines, or nul characters are needed in a field they | ||
| 195 | must be quoted. Two mechanisms are available: | ||
| 196 | 1/ If a field begins '\x' then it must contain an even number of | ||
| 197 | hex digits, and pairs of these digits provide the bytes in the | ||
| 198 | field. | ||
| 199 | 2/ otherwise a \ in the field must be followed by 3 octal digits | ||
| 200 | which give the code for a byte. Other characters are treated | ||
| 201 | as themselves. At the very least, space, newline, nul, and | ||
| 202 | '\' must be quoted in this way. | ||
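The second (octal) quoting mechanism can be sketched as below. The function name quote_field is hypothetical and this is a user-space illustration, not the kernel's support routine. It quotes exactly the four bytes the text names as the minimum (space, newline, nul and backslash) and copies every other byte unchanged.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one field using octal quoting.
 * Space, newline, NUL and backslash become a backslash followed by
 * three octal digits; all other bytes are copied as-is.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if out is too small.
 */
static int quote_field(const unsigned char *in, size_t in_len,
                       char *out, size_t out_len)
{
    size_t o = 0;

    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        int special = (c == ' ' || c == '\n' || c == '\0' || c == '\\');
        size_t need = special ? 4 : 1;

        if (o + need + 1 > out_len)     /* keep room for the NUL */
            return -1;
        if (special)
            o += (size_t)sprintf(out + o, "\\%03o", (unsigned)c);
        else
            out[o++] = (char)c;
    }
    if (o + 1 > out_len)
        return -1;
    out[o] = '\0';
    return (int)o;
}
```

Since every backslash in the input is itself quoted, a field produced this way never begins with a literal '\x' and so cannot be mistaken for the hex form.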
diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt new file mode 100644 index 000000000000..cc6cdb95b73a --- /dev/null +++ b/Documentation/filesystems/seq_file.txt | |||
| @@ -0,0 +1,283 @@ | |||
| 1 | The seq_file interface | ||
| 2 | |||
| 3 | Copyright 2003 Jonathan Corbet <corbet@lwn.net> | ||
| 4 | This file is originally from the LWN.net Driver Porting series at | ||
| 5 | http://lwn.net/Articles/driver-porting/ | ||
| 6 | |||
| 7 | |||
| 8 | There are numerous ways for a device driver (or other kernel component) to | ||
| 9 | provide information to the user or system administrator. One useful | ||
| 10 | technique is the creation of virtual files, in debugfs, /proc or elsewhere. | ||
| 11 | Virtual files can provide human-readable output that is easy to get at | ||
| 12 | without any special utility programs; they can also make life easier for | ||
| 13 | script writers. It is not surprising that the use of virtual files has | ||
| 14 | grown over the years. | ||
| 15 | |||
| 16 | Creating those files correctly has always been a bit of a challenge, | ||
| 17 | however. It is not that hard to make a virtual file which returns a | ||
| 18 | string. But life gets trickier if the output is long - anything greater | ||
| 19 | than an application is likely to read in a single operation. Handling | ||
| 20 | multiple reads (and seeks) requires careful attention to the reader's | ||
| 21 | position within the virtual file - that position is, likely as not, in the | ||
| 22 | middle of a line of output. The kernel has traditionally had a number of | ||
| 23 | implementations that got this wrong. | ||
| 24 | |||
| 25 | The 2.6 kernel contains a set of functions (implemented by Alexander Viro) | ||
| 26 | which are designed to make it easy for virtual file creators to get it | ||
| 27 | right. | ||
| 28 | |||
| 29 | The seq_file interface is available via <linux/seq_file.h>. There are | ||
| 30 | three aspects to seq_file: | ||
| 31 | |||
| 32 | * An iterator interface which lets a virtual file implementation | ||
| 33 | step through the objects it is presenting. | ||
| 34 | |||
| 35 | * Some utility functions for formatting objects for output without | ||
| 36 | needing to worry about things like output buffers. | ||
| 37 | |||
| 38 | * A set of canned file_operations which implement most operations on | ||
| 39 | the virtual file. | ||
| 40 | |||
| 41 | We'll look at the seq_file interface via an extremely simple example: a | ||
| 42 | loadable module which creates a file called /proc/sequence. The file, when | ||
| 43 | read, simply produces a set of increasing integer values, one per line. The | ||
| 44 | sequence will continue until the user loses patience and finds something | ||
| 45 | better to do. The file is seekable, in that one can do something like the | ||
| 46 | following: | ||
| 47 | |||
| 48 | dd if=/proc/sequence of=out1 count=1 | ||
| 49 | dd if=/proc/sequence skip=1 of=out2 count=1 | ||
| 50 | |||
| 51 | Then concatenate the output files out1 and out2 and get the right | ||
| 52 | result. Yes, it is a thoroughly useless module, but the point is to show | ||
| 53 | how the mechanism works without getting lost in other details. (Those | ||
| 54 | wanting to see the full source for this module can find it at | ||
| 55 | http://lwn.net/Articles/22359/). | ||
| 56 | |||
| 57 | |||
| 58 | The iterator interface | ||
| 59 | |||
| 60 | Modules implementing a virtual file with seq_file must implement a simple | ||
| 61 | iterator object that allows stepping through the data of interest. | ||
| 62 | Iterators must be able to move to a specific position - like the file they | ||
| 63 | implement - but the interpretation of that position is up to the iterator | ||
| 64 | itself. A seq_file implementation that is formatting firewall rules, for | ||
| 65 | example, could interpret position N as the Nth rule in the chain. | ||
| 66 | Positioning can thus be done in whatever way makes the most sense for the | ||
| 67 | generator of the data, which need not be aware of how a position translates | ||
| 68 | to an offset in the virtual file. The one obvious exception is that a | ||
| 69 | position of zero should indicate the beginning of the file. | ||
| 70 | |||
| 71 | The /proc/sequence iterator just uses the count of the next number it | ||
| 72 | will output as its position. | ||
| 73 | |||
| 74 | Four functions must be implemented to make the iterator work. The first, | ||
| 75 | called start() takes a position as an argument and returns an iterator | ||
| 76 | which will start reading at that position. For our simple sequence example, | ||
| 77 | the start() function looks like: | ||
| 78 | |||
| 79 | static void *ct_seq_start(struct seq_file *s, loff_t *pos) | ||
| 80 | { | ||
| 81 | loff_t *spos = kmalloc(sizeof(loff_t), GFP_KERNEL); | ||
| 82 | if (!spos) | ||
| 83 | return NULL; | ||
| 84 | *spos = *pos; | ||
| 85 | return spos; | ||
| 86 | } | ||
| 87 | |||
| 88 | The entire data structure for this iterator is a single loff_t value | ||
| 89 | holding the current position. There is no upper bound for the sequence | ||
| 90 | iterator, but that will not be the case for most other seq_file | ||
| 91 | implementations; in most cases the start() function should check for a | ||
| 92 | "past end of file" condition and return NULL if need be. | ||
| 93 | |||
| 94 | For more complicated applications, the private field of the seq_file | ||
| 95 | structure can be used. There is also a special value which can be returned | ||
| 96 | by the start() function called SEQ_START_TOKEN; it can be used if you wish | ||
| 97 | to instruct your show() function (described below) to print a header at the | ||
| 98 | top of the output. SEQ_START_TOKEN should only be used if the offset is | ||
| 99 | zero, however. | ||
| 100 | |||
| 101 | The next function to implement is called, amazingly, next(); its job is to | ||
| 102 | move the iterator forward to the next position in the sequence. The | ||
| 103 | example module can simply increment the position by one; more useful | ||
| 104 | modules will do what is needed to step through some data structure. The | ||
| 105 | next() function returns a new iterator, or NULL if the sequence is | ||
| 106 | complete. Here's the example version: | ||
| 107 | |||
| 108 | static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos) | ||
| 109 | { | ||
| 110 | loff_t *spos = v; | ||
| 111 | *pos = ++*spos; | ||
| 112 | return spos; | ||
| 113 | } | ||
| 114 | |||
| 115 | The stop() function is called when iteration is complete; its job, of | ||
| 116 | course, is to clean up. If dynamic memory is allocated for the iterator, | ||
| 117 | stop() is the place to free it. | ||
| 118 | |||
| 119 | static void ct_seq_stop(struct seq_file *s, void *v) | ||
| 120 | { | ||
| 121 | kfree(v); | ||
| 122 | } | ||
| 123 | |||
| 124 | Finally, the show() function should format the object currently pointed to | ||
| 125 | by the iterator for output. It should return zero, or an error code if | ||
| 126 | something goes wrong. The example module's show() function is: | ||
| 127 | |||
| 128 | static int ct_seq_show(struct seq_file *s, void *v) | ||
| 129 | { | ||
| 130 | loff_t *spos = v; | ||
| 131 | seq_printf(s, "%lld\n", (long long)*spos); | ||
| 132 | return 0; | ||
| 133 | } | ||
| 134 | |||
| 135 | We will look at seq_printf() in a moment. But first, the definition of the | ||
| 136 | seq_file iterator is finished by creating a seq_operations structure with | ||
| 137 | the four functions we have just defined: | ||
| 138 | |||
| 139 | static const struct seq_operations ct_seq_ops = { | ||
| 140 | .start = ct_seq_start, | ||
| 141 | .next = ct_seq_next, | ||
| 142 | .stop = ct_seq_stop, | ||
| 143 | .show = ct_seq_show | ||
| 144 | }; | ||
| 145 | |||
| 146 | This structure will be needed to tie our iterator to the /proc file in | ||
| 147 | a little bit. | ||
| 148 | |||
| 149 | It's worth noting that the iterator value returned by start() and | ||
| 150 | manipulated by the other functions is considered to be completely opaque by | ||
| 151 | the seq_file code. It can thus be anything that is useful in stepping | ||
| 152 | through the data to be output. Counters can be useful, but it could also be | ||
| 153 | a direct pointer into an array or linked list. Anything goes, as long as | ||
| 154 | the programmer is aware that things can happen between calls to the | ||
| 155 | iterator function. However, the seq_file code (by design) will not sleep | ||
| 156 | between the calls to start() and stop(), so holding a lock during that time | ||
| 157 | is a reasonable thing to do. The seq_file code will also avoid taking any | ||
| 158 | other locks while the iterator is active. | ||
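
The calling order of the four iterator functions can be approximated in
user space. The sketch below is not the kernel's seq_read(); struct fake_seq,
the ct_* stand-ins and drive() are hypothetical names invented for this
illustration, and the real code also deals with buffer sizing and partial
reads. It only shows start() being called once, show() and next()
alternating, and stop() freeing the iterator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for struct seq_file; not a kernel type. */
struct fake_seq {
	char buf[128];
	size_t len;
};

/* Allocate the iterator and seed it with the requested position. */
static void *ct_start(struct fake_seq *s, long long *pos)
{
	long long *spos = malloc(sizeof(*spos));

	(void)s;
	if (!spos)
		return NULL;
	*spos = *pos;
	return spos;
}

/* Advance the iterator and report the new position to the caller. */
static void *ct_next(struct fake_seq *s, void *v, long long *pos)
{
	long long *spos = v;

	(void)s;
	*pos = ++*spos;
	return spos;
}

/* Release the iterator; free(NULL) is harmless. */
static void ct_stop(struct fake_seq *s, void *v)
{
	(void)s;
	free(v);
}

/* Append the current value to the output buffer. */
static int ct_show(struct fake_seq *s, void *v)
{
	long long *spos = v;
	size_t room = sizeof(s->buf) - s->len;
	int n = snprintf(s->buf + s->len, room, "%lld\n", *spos);

	if (n < 0 || (size_t)n >= room) {
		s->buf[s->len] = '\0';	/* drop the partial record */
		return -1;
	}
	s->len += (size_t)n;
	return 0;
}

/* Emit at most max_items records starting at *pos; return the count. */
static int drive(struct fake_seq *s, long long *pos, int max_items)
{
	int emitted = 0;
	void *v;

	assert(s && pos);
	v = ct_start(s, pos);
	while (v && emitted < max_items) {
		if (ct_show(s, v) != 0)
			break;
		emitted++;
		v = ct_next(s, v, pos);
	}
	ct_stop(s, v);
	return emitted;
}
```

Calling drive() a second time with the same pos variable resumes where the
first call stopped, which is the property the position argument exists to
provide.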
| 159 | |||
| 160 | |||
| 161 | Formatted output | ||
| 162 | |||
| 163 | The seq_file code manages positioning within the output created by the | ||
| 164 | iterator and getting it into the user's buffer. But, for that to work, that | ||
| 165 | output must be passed to the seq_file code. Some utility functions have | ||
| 166 | been defined which make this task easy. | ||
| 167 | |||
| 168 | Most code will simply use seq_printf(), which works pretty much like | ||
| 169 | printk(), but which requires the seq_file pointer as an argument. It is | ||
| 170 | common to ignore the return value from seq_printf(), but a function | ||
| 171 | producing complicated output may want to check that value and quit if | ||
| 172 | something non-zero is returned; an error return means that the seq_file | ||
| 173 | buffer has been filled and further output will be discarded. | ||
| 174 | |||
| 175 | For straight character output, the following functions may be used: | ||
| 176 | |||
| 177 | int seq_putc(struct seq_file *m, char c); | ||
| 178 | int seq_puts(struct seq_file *m, const char *s); | ||
| 179 | int seq_escape(struct seq_file *m, const char *s, const char *esc); | ||
| 180 | |||
| 181 | The first two output a single character and a string, just like one would | ||
| 182 | expect. seq_escape() is like seq_puts(), except that any character in s | ||
| 183 | which is in the string esc will be represented in octal form in the output. | ||
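
To make the "octal form" concrete, here is a user-space sketch of the kind
of escaping described above. escape_octal() is a hypothetical helper written
for this illustration, not a kernel function, and it assumes the
backslash-plus-three-octal-digits representation (the form seen in files
such as /proc/mounts, where a space appears as \040).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy s into dst, replacing every character that appears in esc with a
 * backslash followed by three octal digits. Returns 0 on success, or -1
 * if dst (of the given size) is too small to hold the result.
 */
static int escape_octal(char *dst, size_t size, const char *s, const char *esc)
{
	char *p = dst;
	char *end = dst + size;

	assert(dst && s && esc);
	if (size == 0)
		return -1;

	for (; *s != '\0'; s++) {
		unsigned char c = (unsigned char)*s;

		if (strchr(esc, c) == NULL) {
			/* Plain character: need room for it and the NUL. */
			if ((size_t)(end - p) < 2)
				return -1;
			*p++ = (char)c;
			continue;
		}
		/* Escaped character: four bytes plus the NUL. */
		if ((size_t)(end - p) < 5)
			return -1;
		*p++ = '\\';
		*p++ = (char)('0' + ((c & 0300) >> 6));
		*p++ = (char)('0' + ((c & 070) >> 3));
		*p++ = (char)('0' + (c & 07));
	}
	*p = '\0';
	return 0;
}
```

With esc set to " ", the input "a b" comes out as the six characters a\040b.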
| 184 | |||
| 185 | There is also a function for printing filenames: | ||
| 186 | |||
| 187 | int seq_path(struct seq_file *m, struct path *path, char *esc); | ||
| 188 | |||
| 189 | Here, path indicates the file of interest, and esc is a set of characters | ||
| 190 | which should be escaped in the output. | ||
| 191 | |||
| 192 | |||
| 193 | Making it all work | ||
| 194 | |||
| 195 | So far, we have a nice set of functions which can produce output within the | ||
| 196 | seq_file system, but we have not yet turned them into a file that a user | ||
| 197 | can see. Creating a file within the kernel requires, of course, the | ||
| 198 | creation of a set of file_operations which implement the operations on that | ||
| 199 | file. The seq_file interface provides a set of canned operations which do | ||
| 200 | most of the work. The virtual file author still must implement the open() | ||
| 201 | method, however, to hook everything up. The open function is often a single | ||
| 202 | line, as in the example module: | ||
| 203 | |||
| 204 | static int ct_open(struct inode *inode, struct file *file) | ||
| 205 | { | ||
| 206 | return seq_open(file, &ct_seq_ops); | ||
| 207 | } | ||
| 208 | |||
| 209 | Here, the call to seq_open() takes the seq_operations structure we created | ||
| 210 | before, and gets set up to iterate through the virtual file. | ||
| 211 | |||
| 212 | On a successful open, seq_open() stores the struct seq_file pointer in | ||
| 213 | file->private_data. If you have an application where the same iterator can | ||
| 214 | be used for more than one file, you can store an arbitrary pointer in the | ||
| 215 | private field of the seq_file structure; that value can then be retrieved | ||
| 216 | by the iterator functions. | ||
| 217 | |||
| 218 | The other operations of interest - read(), llseek(), and release() - are | ||
| 219 | all implemented by the seq_file code itself. So a virtual file's | ||
| 220 | file_operations structure will look like: | ||
| 221 | |||
| 222 | static const struct file_operations ct_file_ops = { | ||
| 223 | .owner = THIS_MODULE, | ||
| 224 | .open = ct_open, | ||
| 225 | .read = seq_read, | ||
| 226 | .llseek = seq_lseek, | ||
| 227 | .release = seq_release | ||
| 228 | }; | ||
| 229 | |||
| 230 | There is also a seq_release_private() which passes the contents of the | ||
| 231 | seq_file private field to kfree() before releasing the structure. | ||
| 232 | |||
| 233 | The final step is the creation of the /proc file itself. In the example | ||
| 234 | code, that is done in the initialization code in the usual way: | ||
| 235 | |||
| 236 | static int ct_init(void) | ||
| 237 | { | ||
| 238 | struct proc_dir_entry *entry; | ||
| 239 | |||
| 240 | entry = create_proc_entry("sequence", 0, NULL); | ||
| 241 | if (entry) | ||
| 242 | entry->proc_fops = &ct_file_ops; | ||
| 243 | return 0; | ||
| 244 | } | ||
| 245 | |||
| 246 | module_init(ct_init); | ||
| 247 | |||
| 248 | And that is pretty much it. | ||
| 249 | |||
| 250 | |||
| 251 | seq_list | ||
| 252 | |||
| 253 | If your file will be iterating through a linked list, you may find these | ||
| 254 | routines useful: | ||
| 255 | |||
| 256 | struct list_head *seq_list_start(struct list_head *head, | ||
| 257 | loff_t pos); | ||
| 258 | struct list_head *seq_list_start_head(struct list_head *head, | ||
| 259 | loff_t pos); | ||
| 260 | struct list_head *seq_list_next(void *v, struct list_head *head, | ||
| 261 | loff_t *ppos); | ||
| 262 | |||
| 263 | These helpers will interpret pos as a position within the list and iterate | ||
| 264 | accordingly. Your start() and next() functions need only invoke the | ||
| 265 | seq_list_* helpers with a pointer to the appropriate list_head structure. | ||
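
What "interpret pos as a position within the list" amounts to can be
sketched in user space. The code below does not use the kernel's list_head
or the seq_list_* helpers themselves; struct node, node_init(),
node_add_tail(), list_start() and list_next() are hypothetical stand-ins
that assume a circular doubly-linked list with a dummy head, which is how
kernel lists are laid out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list node; stand-in for list_head. */
struct node {
	struct node *next, *prev;
};

static void node_init(struct node *head)
{
	head->next = head->prev = head;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Return the entry at index pos, or NULL if the list is shorter. */
static struct node *list_start(struct node *head, long long pos)
{
	struct node *n;

	assert(head);
	for (n = head->next; n != head; n = n->next)
		if (pos-- == 0)
			return n;
	return NULL;
}

/* Step to the following entry and bump the position; NULL at the end. */
static struct node *list_next(struct node *v, struct node *head,
			      long long *ppos)
{
	assert(v && head && ppos);
	++*ppos;
	return v->next == head ? NULL : v->next;
}
```

A start() function built on this pattern returns NULL once pos runs past
the last entry, which gives the "past end of file" check for free.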
| 266 | |||
| 267 | |||
| 268 | The extra-simple version | ||
| 269 | |||
| 270 | For extremely simple virtual files, there is an even easier interface. A | ||
| 271 | module can define only the show() function, which should create all the | ||
| 272 | output that the virtual file will contain. The file's open() method then | ||
| 273 | calls: | ||
| 274 | |||
| 275 | int single_open(struct file *file, | ||
| 276 | int (*show)(struct seq_file *m, void *p), | ||
| 277 | void *data); | ||
| 278 | |||
| 279 | When output time comes, the show() function will be called once. The data | ||
| 280 | value given to single_open() can be found in the private field of the | ||
| 281 | seq_file structure. When using single_open(), the programmer should use | ||
| 282 | single_release() instead of seq_release() in the file_operations structure | ||
| 283 | to avoid a memory leak. | ||
