| author | J. Bruce Fields <bfields@citi.umich.edu> | 2009-10-27 14:41:35 -0400 |
|---|---|---|
| committer | J. Bruce Fields <bfields@citi.umich.edu> | 2009-10-27 19:34:04 -0400 |
| commit | dc7a08166f3a5f23e79e839a8a88849bd3397c32 (patch) | |
| tree | 2feb8aed7b6142467e6b8833fbfd9838bda69c39 /Documentation/filesystems/nfs | |
| parent | e343eb0d60f74547e0aeb5bd151105c2e6cfe588 (diff) | |
nfs: new subdir Documentation/filesystems/nfs
We're adding enough nfs documentation that it may as well have its own
subdirectory.
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Diffstat (limited to 'Documentation/filesystems/nfs')
| -rw-r--r-- | Documentation/filesystems/nfs/00-INDEX | 12 | ||||
| -rw-r--r-- | Documentation/filesystems/nfs/Exporting | 147 | ||||
| -rw-r--r-- | Documentation/filesystems/nfs/nfs-rdma.txt | 271 | ||||
| -rw-r--r-- | Documentation/filesystems/nfs/nfs.txt | 98 | ||||
| -rw-r--r-- | Documentation/filesystems/nfs/nfs41-server.txt | 222 | ||||
| -rw-r--r-- | Documentation/filesystems/nfs/nfsroot.txt | 270 |
6 files changed, 1020 insertions, 0 deletions
diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX new file mode 100644 index 000000000000..6ff3d212027b --- /dev/null +++ b/Documentation/filesystems/nfs/00-INDEX | |||
| @@ -0,0 +1,12 @@ | |||
| 1 | 00-INDEX | ||
| 2 | - this file (nfs-related documentation). | ||
| 3 | Exporting | ||
| 4 | - explanation of how to make filesystems exportable. | ||
| 5 | nfs.txt | ||
| 6 | - nfs client, and DNS resolution for fs_locations. | ||
| 7 | nfs41-server.txt | ||
| 8 | - info on the Linux server implementation of NFSv4 minor version 1. | ||
| 9 | nfs-rdma.txt | ||
| 10 | - how to install and set up the Linux NFS/RDMA client and server software | ||
| 11 | nfsroot.txt | ||
| 12 | - short guide on setting up a diskless box with NFS root filesystem. | ||
diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting new file mode 100644 index 000000000000..87019d2b5981 --- /dev/null +++ b/Documentation/filesystems/nfs/Exporting | |||
| @@ -0,0 +1,147 @@ | |||
| 1 | |||
| 2 | Making Filesystems Exportable | ||
| 3 | ============================= | ||
| 4 | |||
| 5 | Overview | ||
| 6 | -------- | ||
| 7 | |||
| 8 | All filesystem operations require a dentry (or two) as a starting | ||
| 9 | point. Local applications have a reference-counted hold on suitable | ||
| 10 | dentries via open file descriptors or cwd/root. However remote | ||
| 11 | applications that access a filesystem via a remote filesystem protocol | ||
| 12 | such as NFS may not be able to hold such a reference, and so need a | ||
| 13 | different way to refer to a particular dentry. As the alternative | ||
| 14 | form of reference needs to be stable across renames, truncates, and | ||
| 15 | server-reboot (among other things, though these tend to be the most | ||
| 16 | problematic), there is no simple answer like 'filename'. | ||
| 17 | |||
| 18 | The mechanism discussed here allows each filesystem implementation to | ||
| 19 | specify how to generate an opaque (outside of the filesystem) byte | ||
| 20 | string for any dentry, and how to find an appropriate dentry for any | ||
| 21 | given opaque byte string. | ||
| 22 | This byte string will be called a "filehandle fragment" as it | ||
| 23 | corresponds to part of an NFS filehandle. | ||
| 24 | |||
| 25 | A filesystem which supports the mapping between filehandle fragments | ||
| 26 | and dentries will be termed "exportable". | ||
| 27 | |||
| 28 | |||
| 29 | |||
| 30 | Dcache Issues | ||
| 31 | ------------- | ||
| 32 | |||
| 33 | The dcache normally contains a proper prefix of any given filesystem | ||
| 34 | tree. This means that if any filesystem object is in the dcache, then | ||
| 35 | all of the ancestors of that filesystem object are also in the dcache. | ||
| 36 | As normal access is by filename this prefix is created naturally and | ||
| 37 | maintained easily (by each object maintaining a reference count on | ||
| 38 | its parent). | ||
| 39 | |||
| 40 | However when objects are included into the dcache by interpreting a | ||
| 41 | filehandle fragment, there is no automatic creation of a path prefix | ||
| 42 | for the object. This leads to two related but distinct features of | ||
| 43 | the dcache that are not needed for normal filesystem access. | ||
| 44 | |||
| 45 | 1/ The dcache must sometimes contain objects that are not part of the | ||
| 46 | proper prefix, i.e. that are not connected to the root. | ||
| 47 | 2/ The dcache must be prepared for a newly found (via ->lookup) directory | ||
| 48 | to already have a (non-connected) dentry, and must be able to move | ||
| 49 | that dentry into place (based on the parent and name in the | ||
| 50 | ->lookup). This is particularly needed for directories as | ||
| 51 | it is a dcache invariant that directories only have one dentry. | ||
| 52 | |||
| 53 | To implement these features, the dcache has: | ||
| 54 | |||
| 55 | a/ A dentry flag DCACHE_DISCONNECTED which is set on | ||
| 56 | any dentry that might not be part of the proper prefix. | ||
| 57 | This is set when anonymous dentries are created, and cleared when a | ||
| 58 | dentry is noticed to be a child of a dentry which is in the proper | ||
| 59 | prefix. | ||
| 60 | |||
| 61 | b/ A per-superblock list "s_anon" of dentries which are the roots of | ||
| 62 | subtrees that are not in the proper prefix. These dentries, as | ||
| 63 | well as the proper prefix, need to be released at unmount time. As | ||
| 64 | these dentries will not be hashed, they are linked together on the | ||
| 65 | d_hash list_head. | ||
| 66 | |||
| 67 | c/ Helper routines to allocate anonymous dentries, and to help attach | ||
| 68 | loose directory dentries at lookup time. They are: | ||
| 69 | d_alloc_anon(inode) will return a dentry for the given inode. | ||
| 70 | If the inode already has a dentry, one of those is returned. | ||
| 71 | If it doesn't, a new anonymous (IS_ROOT and | ||
| 72 | DCACHE_DISCONNECTED) dentry is allocated and attached. | ||
| 73 | In the case of a directory, care is taken that only one dentry | ||
| 74 | can ever be attached. | ||
| 75 | d_splice_alias(inode, dentry) will make sure that there is a | ||
| 76 | dentry with the same name and parent as the given dentry, and | ||
| 77 | which refers to the given inode. | ||
| 78 | If the inode is a directory and already has a dentry, then that | ||
| 79 | dentry is d_moved over the given dentry. | ||
| 80 | If the passed dentry gets attached, care is taken that this is | ||
| 81 | mutually exclusive to a d_alloc_anon operation. | ||
| 82 | If the passed dentry is used, NULL is returned, else the used | ||
| 83 | dentry is returned. This corresponds to the calling pattern of | ||
| 84 | ->lookup. | ||
| 85 | |||
| 86 | |||
| 87 | Filesystem Issues | ||
| 88 | ----------------- | ||
| 89 | |||
| 90 | For a filesystem to be exportable it must: | ||
| 91 | |||
| 92 | 1/ provide the filehandle fragment routines described below. | ||
| 93 | 2/ make sure that d_splice_alias is used rather than d_add | ||
| 94 | when ->lookup finds an inode for a given parent and name. | ||
| 95 | Typically the ->lookup routine will end with a: | ||
| 96 | |||
| 97 | return d_splice_alias(inode, dentry); | ||
| 98 | } | ||
| 99 | |||
| 100 | |||
| 101 | |||
| 102 | A file system implementation declares that instances of the filesystem | ||
| 103 | are exportable by setting the s_export_op field in the struct | ||
| 104 | super_block. This field must point to a "struct export_operations" | ||
| 105 | struct which has the following members: | ||
| 106 | |||
| 107 | encode_fh (optional) | ||
| 108 | Takes a dentry and creates a filehandle fragment which can later be used | ||
| 109 | to find or create a dentry for the same object. The default | ||
| 110 | implementation creates a filehandle fragment that encodes a 32bit inode | ||
| 111 | and generation number for the inode encoded, and if necessary the | ||
| 112 | same information for the parent. | ||
| 113 | |||
| 114 | fh_to_dentry (mandatory) | ||
| 115 | Given a filehandle fragment, this should find the implied object and | ||
| 116 | create a dentry for it (possibly with d_alloc_anon). | ||
| 117 | |||
| 118 | fh_to_parent (optional but strongly recommended) | ||
| 119 | Given a filehandle fragment, this should find the parent of the | ||
| 120 | implied object and create a dentry for it (possibly with d_alloc_anon). | ||
| 121 | May fail if the filehandle fragment is too small. | ||
| 122 | |||
| 123 | get_parent (optional but strongly recommended) | ||
| 124 | When given a dentry for a directory, this should return a dentry for | ||
| 125 | the parent. Quite possibly the parent dentry will have been allocated | ||
| 126 | by d_alloc_anon. The default get_parent function just returns an error | ||
| 127 | so any filehandle lookup that requires finding a parent will fail. | ||
| 128 | ->lookup("..") is *not* used as a default as it can leave ".." entries | ||
| 129 | in the dcache which are too messy to work with. | ||
| 130 | |||
| 131 | get_name (optional) | ||
| 132 | When given a parent dentry and a child dentry, this should find a name | ||
| 133 | in the directory identified by the parent dentry, which leads to the | ||
| 134 | object identified by the child dentry. If no get_name function is | ||
| 135 | supplied, a default implementation is provided which uses vfs_readdir | ||
| 136 | to find potential names, and matches inode numbers to find the correct | ||
| 137 | match. | ||
| 138 | |||
| 139 | |||
| 140 | A filehandle fragment consists of an array of 1 or more 4-byte words, | ||
| 141 | together with a one-byte "type". | ||
| 142 | The fh_to_dentry and fh_to_parent routines should not depend on the | ||
| 143 | stated size that is passed to them. This size may be larger than the | ||
| 144 | original filehandle generated by encode_fh, in which case it will have | ||
| 145 | been padded with NULs. Rather, the encode_fh routine should choose a | ||
| 146 | "type" which tells the decoding routines how much of the filehandle is | ||
| 147 | valid, and how it should be interpreted. | ||
diff --git a/Documentation/filesystems/nfs/nfs-rdma.txt b/Documentation/filesystems/nfs/nfs-rdma.txt new file mode 100644 index 000000000000..e386f7e4bcee --- /dev/null +++ b/Documentation/filesystems/nfs/nfs-rdma.txt | |||
| @@ -0,0 +1,271 @@ | |||
| 1 | ################################################################################ | ||
| 2 | # # | ||
| 3 | # NFS/RDMA README # | ||
| 4 | # # | ||
| 5 | ################################################################################ | ||
| 6 | |||
| 7 | Author: NetApp and Open Grid Computing | ||
| 8 | Date: May 29, 2008 | ||
| 9 | |||
| 10 | Table of Contents | ||
| 11 | ~~~~~~~~~~~~~~~~~ | ||
| 12 | - Overview | ||
| 13 | - Getting Help | ||
| 14 | - Installation | ||
| 15 | - Check RDMA and NFS Setup | ||
| 16 | - NFS/RDMA Setup | ||
| 17 | |||
| 18 | Overview | ||
| 19 | ~~~~~~~~ | ||
| 20 | |||
| 21 | This document describes how to install and set up the Linux NFS/RDMA client | ||
| 22 | and server software. | ||
| 23 | |||
| 24 | The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server | ||
| 25 | was first included in the following release, Linux 2.6.25. | ||
| 26 | |||
| 27 | In our testing, we have obtained excellent performance results (full 10Gbit | ||
| 28 | wire bandwidth at minimal client CPU) under many workloads. The code passes | ||
| 29 | the full Connectathon test suite and operates over both InfiniBand and iWARP | ||
| 30 | RDMA adapters. | ||
| 31 | |||
| 32 | Getting Help | ||
| 33 | ~~~~~~~~~~~~ | ||
| 34 | |||
| 35 | If you get stuck, you can ask questions on the | ||
| 36 | |||
| 37 | nfs-rdma-devel@lists.sourceforge.net | ||
| 38 | |||
| 39 | mailing list. | ||
| 40 | |||
| 41 | Installation | ||
| 42 | ~~~~~~~~~~~~ | ||
| 43 | |||
| 44 | These instructions are a step by step guide to building a machine for | ||
| 45 | use with NFS/RDMA. | ||
| 46 | |||
| 47 | - Install an RDMA device | ||
| 48 | |||
| 49 | Any device supported by the drivers in drivers/infiniband/hw is acceptable. | ||
| 50 | |||
| 51 | Testing has been performed using several Mellanox-based IB cards, the | ||
| 52 | Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. | ||
| 53 | |||
| 54 | - Install a Linux distribution and tools | ||
| 55 | |||
| 56 | The first kernel release to contain both the NFS/RDMA client and server was | ||
| 57 | Linux 2.6.25. Therefore, a distribution compatible with this and subsequent | ||
| 58 | Linux kernel releases should be installed. | ||
| 59 | |||
| 60 | The procedures described in this document have been tested with | ||
| 61 | distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). | ||
| 62 | |||
| 63 | - Install nfs-utils-1.1.2 or greater on the client | ||
| 64 | |||
| 65 | An NFS/RDMA mount point can be obtained by using the mount.nfs command in | ||
| 66 | nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils | ||
| 67 | version with support for NFS/RDMA mounts, but for various reasons we | ||
| 68 | recommend using nfs-utils-1.1.2 or greater). To see which version of | ||
| 69 | mount.nfs you are using, type: | ||
| 70 | |||
| 71 | $ /sbin/mount.nfs -V | ||
| 72 | |||
| 73 | If the version is less than 1.1.2 or the command does not exist, | ||
| 74 | you should install the latest version of nfs-utils. | ||
| 75 | |||
| 76 | Download the latest package from: | ||
| 77 | |||
| 78 | http://www.kernel.org/pub/linux/utils/nfs | ||
| 79 | |||
| 80 | Uncompress the package and follow the installation instructions. | ||
| 81 | |||
| 82 | If you will not need the idmapper and gssd executables (you do not need | ||
| 83 | these to create an NFS/RDMA enabled mount command), the installation | ||
| 84 | process can be simplified by disabling these features when running | ||
| 85 | configure: | ||
| 86 | |||
| 87 | $ ./configure --disable-gss --disable-nfsv4 | ||
| 88 | |||
| 89 | To build nfs-utils you will need the tcp_wrappers package installed. For | ||
| 90 | more information on this see the package's README and INSTALL files. | ||
| 91 | |||
| 92 | After building the nfs-utils package, there will be a mount.nfs binary in | ||
| 93 | the utils/mount directory. This binary can be used to initiate NFS v2, v3, | ||
| 94 | or v4 mounts. To initiate a v4 mount, the binary must be called | ||
| 95 | mount.nfs4. The standard technique is to create a symlink called | ||
| 96 | mount.nfs4 to mount.nfs. | ||
| 97 | |||
| 98 | This mount.nfs binary should be installed at /sbin/mount.nfs as follows: | ||
| 99 | |||
| 100 | $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs | ||
| 101 | |||
| 102 | In this location, mount.nfs will be invoked automatically for NFS mounts | ||
| 103 | by the system mount command. | ||
| 104 | |||
| 105 | NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed | ||
| 106 | on the NFS client machine. You do not need this specific version of | ||
| 107 | nfs-utils on the server. Furthermore, only the mount.nfs command from | ||
| 108 | nfs-utils-1.1.2 is needed on the client. | ||
| 109 | |||
| 110 | - Install a Linux kernel with NFS/RDMA | ||
| 111 | |||
| 112 | The NFS/RDMA client and server are both included in the mainline Linux | ||
| 113 | kernel version 2.6.25 and later. This and other versions of the 2.6 Linux | ||
| 114 | kernel can be found at: | ||
| 115 | |||
| 116 | ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ | ||
| 117 | |||
| 118 | Download the sources and place them in an appropriate location. | ||
| 119 | |||
| 120 | - Configure the RDMA stack | ||
| 121 | |||
| 122 | Make sure your kernel configuration has RDMA support enabled. Under | ||
| 123 | Device Drivers -> InfiniBand support, update the kernel configuration | ||
| 124 | to enable InfiniBand support [NOTE: the option name is misleading. Enabling | ||
| 125 | InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. | ||
| 126 | |||
| 127 | Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or | ||
| 128 | iWARP adapter support (amso, cxgb3, etc.). | ||
| 129 | |||
| 130 | If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. | ||
| 131 | |||
| 132 | - Configure the NFS client and server | ||
| 133 | |||
| 134 | Your kernel configuration must also have NFS file system support and/or | ||
| 135 | NFS server support enabled. These and other NFS related configuration | ||
| 136 | options can be found under File Systems -> Network File Systems. | ||
| 137 | |||
| 138 | - Build, install, reboot | ||
| 139 | |||
| 140 | The NFS/RDMA code will be enabled automatically if NFS and RDMA | ||
| 141 | are turned on. The NFS/RDMA client and server are configured via the hidden | ||
| 142 | SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The | ||
| 143 | value of SUNRPC_XPRT_RDMA will be: | ||
| 144 | |||
| 145 | - N if either SUNRPC or INFINIBAND is N; in this case the NFS/RDMA client | ||
| 146 | and server will not be built | ||
| 147 | - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M; | ||
| 148 | in this case the NFS/RDMA client and server will be built as modules | ||
| 149 | - Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA client | ||
| 150 | and server will be built into the kernel | ||
| 151 | |||
| 152 | Therefore, if you have followed the steps above and turned on NFS and RDMA, | ||
| 153 | the NFS/RDMA client and server will be built. | ||
| 154 | |||
| 155 | Build a new kernel, install it, boot it. | ||
| 156 | |||
| 157 | Check RDMA and NFS Setup | ||
| 158 | ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 159 | |||
| 160 | Before configuring the NFS/RDMA software, it is a good idea to test | ||
| 161 | your new kernel to ensure that the kernel is working correctly. | ||
| 162 | In particular, it is a good idea to verify that the RDMA stack | ||
| 163 | is functioning as expected and standard NFS over TCP/IP and/or UDP/IP | ||
| 164 | is working properly. | ||
| 165 | |||
| 166 | - Check RDMA Setup | ||
| 167 | |||
| 168 | If you built the RDMA components as modules, load them at | ||
| 169 | this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel | ||
| 170 | card: | ||
| 171 | |||
| 172 | $ modprobe ib_mthca | ||
| 173 | $ modprobe ib_ipoib | ||
| 174 | |||
| 175 | If you are using InfiniBand, make sure there is a Subnet Manager (SM) | ||
| 176 | running on the network. If your IB switch has an embedded SM, you can | ||
| 177 | use it. Otherwise, you will need to run an SM, such as OpenSM, on one | ||
| 178 | of your end nodes. | ||
| 179 | |||
| 180 | If an SM is running on your network, you should see the following: | ||
| 181 | |||
| 182 | $ cat /sys/class/infiniband/driverX/ports/1/state | ||
| 183 | 4: ACTIVE | ||
| 184 | |||
| 185 | where driverX is mthca0, ipath5, ehca3, etc. | ||
| 186 | |||
| 187 | To further test the InfiniBand software stack, use IPoIB (this | ||
| 188 | assumes you have two IB hosts named host1 and host2): | ||
| 189 | |||
| 190 | host1$ ifconfig ib0 a.b.c.x | ||
| 191 | host2$ ifconfig ib0 a.b.c.y | ||
| 192 | host1$ ping a.b.c.y | ||
| 193 | host2$ ping a.b.c.x | ||
| 194 | |||
| 195 | For other device types, follow the appropriate procedures. | ||
| 196 | |||
| 197 | - Check NFS Setup | ||
| 198 | |||
| 199 | For the NFS components enabled above (client and/or server), | ||
| 200 | test their functionality over standard Ethernet using TCP/IP or UDP/IP. | ||
| 201 | |||
| 202 | NFS/RDMA Setup | ||
| 203 | ~~~~~~~~~~~~~~ | ||
| 204 | |||
| 205 | We recommend that you use two machines, one to act as the client and | ||
| 206 | one to act as the server. | ||
| 207 | |||
| 208 | One time configuration: | ||
| 209 | |||
| 210 | - On the server system, configure the /etc/exports file and | ||
| 211 | start the NFS/RDMA server. | ||
| 212 | |||
| 213 | Exports entries with the following formats have been tested: | ||
| 214 | |||
| 215 | /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) | ||
| 216 | /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) | ||
| 217 | |||
| 218 | The IP address(es) are the client's IPoIB address for an InfiniBand | ||
| 219 | HCA or the client's iWARP address(es) for an RNIC. | ||
| 220 | |||
| 221 | NOTE: The "insecure" option must be used because the NFS/RDMA client does | ||
| 222 | not use a reserved port. | ||
| 223 | |||
| 224 | Each time a machine boots: | ||
| 225 | |||
| 226 | - Load and configure the RDMA drivers | ||
| 227 | |||
| 228 | For InfiniBand using a Mellanox adapter: | ||
| 229 | |||
| 230 | $ modprobe ib_mthca | ||
| 231 | $ modprobe ib_ipoib | ||
| 232 | $ ifconfig ib0 a.b.c.d | ||
| 233 | |||
| 234 | NOTE: use unique addresses for the client and server | ||
| 235 | |||
| 236 | - Start the NFS server | ||
| 237 | |||
| 238 | If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in | ||
| 239 | kernel config), load the RDMA transport module: | ||
| 240 | |||
| 241 | $ modprobe svcrdma | ||
| 242 | |||
| 243 | Regardless of how the server was built (module or built-in), start the | ||
| 244 | server: | ||
| 245 | |||
| 246 | $ /etc/init.d/nfs start | ||
| 247 | |||
| 248 | or | ||
| 249 | |||
| 250 | $ service nfs start | ||
| 251 | |||
| 252 | Instruct the server to listen on the RDMA transport: | ||
| 253 | |||
| 254 | $ echo rdma 20049 > /proc/fs/nfsd/portlist | ||
| 255 | |||
| 256 | - On the client system | ||
| 257 | |||
| 258 | If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in | ||
| 259 | kernel config), load the RDMA client module: | ||
| 260 | |||
| 261 | $ modprobe xprtrdma | ||
| 262 | |||
| 263 | Regardless of how the client was built (module or built-in), use this | ||
| 264 | command to mount the NFS/RDMA server: | ||
| 265 | |||
| 266 | $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt | ||
| 267 | |||
| 268 | To verify that the mount is using RDMA, run "cat /proc/mounts" and check | ||
| 269 | the "proto" field for the given mount. | ||
| 270 | |||
| 271 | Congratulations! You're using NFS/RDMA! | ||
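The build rules and the final verification step in nfs-rdma.txt can be sketched as two small shell helpers. This is only an illustration under stated assumptions: the function names xprt_rdma_value and mount_uses_rdma are hypothetical and not part of the kernel tree or nfs-utils, the config values are assumed to be the lowercase y/m/n used in a .config file, and the mount line is assumed to follow the usual /proc/mounts layout with the options in the fourth field.

```shell
#!/bin/sh
# Hypothetical helper (not shipped anywhere): derive the value of the hidden
# SUNRPC_XPRT_RDMA option from SUNRPC and INFINIBAND, following the three
# rules listed in nfs-rdma.txt. Inputs are assumed to be "y", "m" or "n".
xprt_rdma_value() {
    sunrpc="$1"
    infiniband="$2"
    if [ "$sunrpc" = "n" ] || [ "$infiniband" = "n" ]; then
        echo n      # either dependency off: NFS/RDMA is not built
    elif [ "$sunrpc" = "y" ] && [ "$infiniband" = "y" ]; then
        echo y      # both built in: NFS/RDMA is built in
    else
        echo m      # both on, at least one modular: NFS/RDMA is a module
    fi
}

# Hypothetical helper: succeed if a /proc/mounts-style line lists proto=rdma
# among its comma-separated mount options (assumed to be field 4).
mount_uses_rdma() {
    opts=$(printf '%s\n' "$1" | awk '{ print $4 }')
    case ",${opts}," in
        *,proto=rdma,*) return 0 ;;
        *)              return 1 ;;
    esac
}
```

On a live client, something like `while read -r line; do mount_uses_rdma "$line" && echo "$line"; done < /proc/mounts` would list the RDMA mounts, assuming the kernel reports the transport as proto=rdma in the options field.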
diff --git a/Documentation/filesystems/nfs/nfs.txt b/Documentation/filesystems/nfs/nfs.txt new file mode 100644 index 000000000000..f50f26ce6cd0 --- /dev/null +++ b/Documentation/filesystems/nfs/nfs.txt | |||
| @@ -0,0 +1,98 @@ | |||
| 1 | |||
| 2 | The NFS client | ||
| 3 | ============== | ||
| 4 | |||
| 5 | The NFS version 2 protocol was first documented in RFC1094 (March 1989). | ||
| 6 | Since then two more major releases of NFS have been published, with NFSv3 | ||
| 7 | being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April | ||
| 8 | 2003). | ||
| 9 | |||
| 10 | The Linux NFS client currently supports all the above published versions, | ||
| 11 | and work is in progress on adding support for minor version 1 of the NFSv4 | ||
| 12 | protocol. | ||
| 13 | |||
| 14 | The purpose of this document is to provide information on some of the | ||
| 15 | upcall interfaces that are used in order to provide the NFS client with | ||
| 16 | some of the information that it requires in order to fully comply with | ||
| 17 | the NFS spec. | ||
| 18 | |||
| 19 | The DNS resolver | ||
| 20 | ================ | ||
| 21 | |||
| 22 | NFSv4 allows for one server to refer the NFS client to data that has been | ||
| 23 | migrated onto another server by means of the special "fs_locations" | ||
| 24 | attribute. See | ||
| 25 | http://tools.ietf.org/html/rfc3530#section-6 | ||
| 26 | and | ||
| 27 | http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 | ||
| 28 | |||
| 29 | The fs_locations information can take the form of either an ip address and | ||
| 30 | a path, or a DNS hostname and a path. The latter requires the NFS client to | ||
| 31 | do a DNS lookup in order to mount the new volume, and hence the need for an | ||
| 32 | upcall to allow userland to provide this service. | ||
| 33 | |||
| 34 | Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual | ||
| 35 | /var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: | ||
| 36 | |||
| 37 | (1) The process checks the dns_resolve cache to see if it contains a | ||
| 38 | valid entry. If so, it returns that entry and exits. | ||
| 39 | |||
| 40 | (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' | ||
| 41 | (may be changed using the 'nfs.cache_getent' kernel boot parameter) | ||
| 42 | is run, with two arguments: | ||
| 43 | - the cache name, "dns_resolve" | ||
| 44 | - the hostname to resolve | ||
| 45 | |||
| 46 | (3) After looking up the corresponding ip address, the helper script | ||
| 47 | writes the result into the rpc_pipefs pseudo-file | ||
| 48 | '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' | ||
| 49 | in the following (text) format: | ||
| 50 | |||
| 51 | "<ip address> <hostname> <ttl>\n" | ||
| 52 | |||
| 53 | Where <ip address> is in the usual IPv4 (192.0.2.10) or IPv6 | ||
| 54 | (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. | ||
| 55 | <hostname> is identical to the second argument of the helper | ||
| 56 | script, and <ttl> is the 'time to live' of this cache entry (in | ||
| 57 | units of seconds). | ||
| 58 | |||
| 59 | Note: If <ip address> is invalid, say the string "0", then a negative | ||
| 60 | entry is created, which will cause the kernel to treat the hostname | ||
| 61 | as having no valid DNS translation. | ||
| 62 | |||
| 63 | |||
| 64 | |||
| 65 | |||
| 66 | A basic sample /sbin/nfs_cache_getent | ||
| 67 | ===================================== | ||
| 68 | |||
| 69 | #!/bin/bash | ||
| 70 | # | ||
| 71 | ttl=600 | ||
| 72 | # | ||
| 73 | cut=/usr/bin/cut | ||
| 74 | getent=/usr/bin/getent | ||
| 75 | rpc_pipefs=/var/lib/nfs/rpc_pipefs | ||
| 76 | # | ||
| 77 | die() | ||
| 78 | { | ||
| 79 | echo "Usage: $0 cache_name entry_name" | ||
| 80 | exit 1 | ||
| 81 | } | ||
| 82 | |||
| 83 | [ $# -lt 2 ] && die | ||
| 84 | cachename="$1" | ||
| 85 | cache_path=${rpc_pipefs}/cache/${cachename}/channel | ||
| 86 | |||
| 87 | case "${cachename}" in | ||
| 88 | dns_resolve) | ||
| 89 | name="$2" | ||
| 90 | result="$(${getent} hosts "${name}" | ${cut} -f1 -d\ )" | ||
| 91 | [ -z "${result}" ] && result="0" | ||
| 92 | ;; | ||
| 93 | *) | ||
| 94 | die | ||
| 95 | ;; | ||
| 96 | esac | ||
| 97 | echo "${result} ${name} ${ttl}" >${cache_path} | ||
| 98 | |||
diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt new file mode 100644 index 000000000000..1bd0d0c05171 --- /dev/null +++ b/Documentation/filesystems/nfs/nfs41-server.txt | |||
| @@ -0,0 +1,222 @@ | |||
| 1 | NFSv4.1 Server Implementation | ||
| 2 | |||
| 3 | Server support for minorversion 1 can be controlled using the | ||
| 4 | /proc/fs/nfsd/versions control file. The string output returned | ||
| 5 | by reading this file will contain either "+4.1" or "-4.1" | ||
| 6 | correspondingly. | ||
| 7 | |||
| 8 | Currently, server support for minorversion 1 is disabled by default. | ||
| 9 | It can be enabled at run time by writing the string "+4.1" to | ||
| 10 | the /proc/fs/nfsd/versions control file. Note that to write this | ||
| 11 | control file, the nfsd service must be taken down. Use your user-mode | ||
| 12 | nfs-utils to set this up; see rpc.nfsd(8). | ||
| 13 | |||
| 14 | (Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and | ||
| 15 | "-4", respectively. Therefore, code meant to work on both new and old | ||
| 16 | kernels must turn 4.1 on or off *before* turning support for version 4 | ||
| 17 | on or off; rpc.nfsd does this correctly.) | ||
| 18 | |||
| 19 | The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based | ||
| 20 | on the latest NFSv4.1 Internet Draft: | ||
| 21 | http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29 | ||
| 22 | |||
| 23 | Of the many new features in NFSv4.1, the current implementation | ||
| 24 | focuses on the mandatory-to-implement NFSv4.1 Sessions, providing | ||
| 25 | "exactly once" semantics and better control and throttling of the | ||
| 26 | resources allocated for each client. | ||
| 27 | |||
| 28 | Other NFSv4.1 features, Parallel NFS operations in particular, | ||
| 29 | are still under development out of tree. | ||
| 30 | See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design | ||
| 31 | for more information. | ||
| 32 | |||
| 33 | The current implementation is intended for developers only: while it | ||
| 34 | does support ordinary file operations on clients we have tested against | ||
| 35 | (including the linux client), it is incomplete in ways which may limit | ||
| 36 | features unexpectedly, cause known bugs in rare cases, or cause | ||
| 37 | interoperability problems with future clients. Known issues: | ||
| 38 | |||
| 39 | - gss support is questionable: currently mounts with kerberos | ||
| 40 | from a linux client are possible, but we aren't really | ||
| 41 | conformant with the spec (for example, we don't use kerberos | ||
| 42 | on the backchannel correctly). | ||
| 43 | - no trunking support: no clients currently take advantage of | ||
| 44 | trunking, but this is a mandatory feature, and its use is | ||
| 45 | recommended to clients in a number of places. (E.g. to ensure | ||
| 46 | timely renewal in case an existing connection's retry timeouts | ||
| 47 | have gotten too long; see section 8.3 of the draft.) | ||
| 48 | Therefore, lack of this feature may cause future clients to | ||
| 49 | fail. | ||
| 50 | - Incomplete backchannel support: incomplete backchannel gss | ||
| 51 | support and no support for BACKCHANNEL_CTL mean that | ||
| 52 | callbacks (hence delegations and layouts) may not be | ||
| 53 | available and clients confused by the incomplete | ||
| 54 | implementation may fail. | ||
| 55 | - Server reboot recovery is unsupported; if the server reboots, | ||
| 56 | clients may fail. | ||
| 57 | - We do not support SSV, which provides security for shared | ||
| 58 | client-server state (thus preventing unauthorized tampering | ||
| 59 | with locks and opens, for example). It is mandatory for | ||
| 60 | servers to support this, though no clients use it yet. | ||
| 61 | - Mandatory operations which we do not support, such as | ||
| 62 | DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and | ||
| 63 | TEST_STATEID, are not currently used by clients, but will be | ||
| 64 | (and the spec recommends their uses in common cases), and | ||
| 65 | clients should not be expected to know how to recover from the | ||
| 66 | case where they are not supported. This will eventually cause | ||
| 67 | interoperability failures. | ||
| 68 | |||
| 69 | In addition, some limitations are inherited from the current NFSv4 | ||
| 70 | implementation: | ||
| 71 | |||
| 72 | - Incomplete delegation enforcement: if a file is renamed or | ||
| 73 | unlinked, a client holding a delegation may continue to | ||
| 74 | indefinitely allow opens of the file under the old name. | ||
| 75 | |||
| 76 | The table below, taken from the NFSv4.1 document, lists | ||
| 77 | the operations that are mandatory to implement (REQ), optional | ||
| 78 | (OPT), and NFSv4.0 operations that must not be implemented (MNI) | ||
| 79 | in minor version 1. The first column indicates the operations that | ||
| 80 | are not supported yet by the linux server implementation. | ||
| 81 | |||
| 82 | The OPTIONAL features identified and their abbreviations are as follows: | ||
| 83 | pNFS Parallel NFS | ||
| 84 | FDELG File Delegations | ||
| 85 | DDELG Directory Delegations | ||
| 86 | |||
| 87 | The following abbreviations indicate the linux server implementation status. | ||
| 88 | I Implemented NFSv4.1 operations. | ||
| 89 | NS Not Supported. | ||
| 90 | NS* unimplemented optional feature. | ||
| 91 | P pNFS features implemented out of tree. | ||
| 92 | PNS pNFS features that are not supported yet (out of tree). | ||
| 93 | |||
| 94 | Operations | ||
| 95 | |||
| 96 | +----------------------+------------+--------------+----------------+ | ||
| 97 | | Operation | REQ, REC, | Feature | Definition | | ||
| 98 | | | OPT, or | (REQ, REC, | | | ||
| 99 | | | MNI | or OPT) | | | ||
| 100 | +----------------------+------------+--------------+----------------+ | ||
| 101 | | ACCESS | REQ | | Section 18.1 | | ||
| 102 | NS | BACKCHANNEL_CTL | REQ | | Section 18.33 | | ||
| 103 | NS | BIND_CONN_TO_SESSION | REQ | | Section 18.34 | | ||
| 104 | | CLOSE | REQ | | Section 18.2 | | ||
| 105 | | COMMIT | REQ | | Section 18.3 | | ||
| 106 | | CREATE | REQ | | Section 18.4 | | ||
| 107 | I | CREATE_SESSION | REQ | | Section 18.36 | | ||
| 108 | NS*| DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 | | ||
| 109 | | DELEGRETURN | OPT | FDELG, | Section 18.6 | | ||
| 110 | | | | DDELG, pNFS | | | ||
| 111 | | | | (REQ) | | | ||
| 112 | NS | DESTROY_CLIENTID | REQ | | Section 18.50 | | ||
| 113 | I | DESTROY_SESSION | REQ | | Section 18.37 | | ||
| 114 | I | EXCHANGE_ID | REQ | | Section 18.35 | | ||
| 115 | NS | FREE_STATEID | REQ | | Section 18.38 | | ||
| 116 | | GETATTR | REQ | | Section 18.7 | | ||
| 117 | P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | | ||
| 118 | P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | | ||
| 119 | | GETFH | REQ | | Section 18.8 | | ||
| 120 | NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 | | ||
| 121 | P | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | | ||
| 122 | P | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | | ||
| 123 | P | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | | ||
| 124 | | LINK | OPT | | Section 18.9 | | ||
| 125 | | LOCK | REQ | | Section 18.10 | | ||
| 126 | | LOCKT | REQ | | Section 18.11 | | ||
| 127 | | LOCKU | REQ | | Section 18.12 | | ||
| 128 | | LOOKUP | REQ | | Section 18.13 | | ||
| 129 | | LOOKUPP | REQ | | Section 18.14 | | ||
| 130 | | NVERIFY | REQ | | Section 18.15 | | ||
| 131 | | OPEN | REQ | | Section 18.16 | | ||
| 132 | NS*| OPENATTR | OPT | | Section 18.17 | | ||
| 133 | | OPEN_CONFIRM | MNI | | N/A | | ||
| 134 | | OPEN_DOWNGRADE | REQ | | Section 18.18 | | ||
| 135 | | PUTFH | REQ | | Section 18.19 | | ||
| 136 | | PUTPUBFH | REQ | | Section 18.20 | | ||
| 137 | | PUTROOTFH | REQ | | Section 18.21 | | ||
| 138 | | READ | REQ | | Section 18.22 | | ||
| 139 | | READDIR | REQ | | Section 18.23 | | ||
| 140 | | READLINK | OPT | | Section 18.24 | | ||
| 141 | NS | RECLAIM_COMPLETE | REQ | | Section 18.51 | | ||
| 142 | | RELEASE_LOCKOWNER | MNI | | N/A | | ||
| 143 | | REMOVE | REQ | | Section 18.25 | | ||
| 144 | | RENAME | REQ | | Section 18.26 | | ||
| 145 | | RENEW | MNI | | N/A | | ||
| 146 | | RESTOREFH | REQ | | Section 18.27 | | ||
| 147 | | SAVEFH | REQ | | Section 18.28 | | ||
| 148 | | SECINFO | REQ | | Section 18.29 | | ||
| 149 | NS | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, | | ||
| 150 | | | | layout (REQ) | Section 13.12 | | ||
| 151 | I | SEQUENCE | REQ | | Section 18.46 | | ||
| 152 | | SETATTR | REQ | | Section 18.30 | | ||
| 153 | | SETCLIENTID | MNI | | N/A | | ||
| 154 | | SETCLIENTID_CONFIRM | MNI | | N/A | | ||
| 155 | NS | SET_SSV | REQ | | Section 18.47 | | ||
| 156 | NS | TEST_STATEID | REQ | | Section 18.48 | | ||
| 157 | | VERIFY | REQ | | Section 18.31 | | ||
| 158 | NS*| WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 | | ||
| 159 | | WRITE | REQ | | Section 18.32 | | ||
| 160 | |||
| 161 | Callback Operations | ||
| 162 | |||
| 163 | +-------------------------+-----------+-------------+---------------+ | ||
| 164 | | Operation | REQ, REC, | Feature | Definition | | ||
| 165 | | | OPT, or | (REQ, REC, | | | ||
| 166 | | | MNI | or OPT) | | | ||
| 167 | +-------------------------+-----------+-------------+---------------+ | ||
| 168 | | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | | ||
| 169 | P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | | ||
| 170 | NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | | ||
| 171 | P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | | ||
| 172 | NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 | | ||
| 173 | NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 | | ||
| 174 | | CB_RECALL | OPT | FDELG, | Section 20.2 | | ||
| 175 | | | | DDELG, pNFS | | | ||
| 176 | | | | (REQ) | | | ||
| 177 | NS*| CB_RECALL_ANY | OPT | FDELG, | Section 20.6 | | ||
| 178 | | | | DDELG, pNFS | | | ||
| 179 | | | | (REQ) | | | ||
| 180 | NS | CB_RECALL_SLOT | REQ | | Section 20.8 | | ||
| 181 | NS*| CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS | Section 20.7 | | ||
| 182 | | | | (REQ) | | | ||
| 183 | I | CB_SEQUENCE | OPT | FDELG, | Section 20.9 | | ||
| 184 | | | | DDELG, pNFS | | | ||
| 185 | | | | (REQ) | | | ||
| 186 | NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 | | ||
| 187 | | | | DDELG, pNFS | | | ||
| 188 | | | | (REQ) | | | ||
| 189 | +-------------------------+-----------+-------------+---------------+ | ||
| 190 | |||
| 191 | Implementation notes: | ||
| 192 | |||
| 193 | DELEGPURGE: | ||
| 194 | * mandatory only for servers that support CLAIM_DELEGATE_PREV and/or | ||
| 195 | CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that | ||
| 196 | persist across client reboots). Thus we need not implement this for | ||
| 197 | now. | ||
| 198 | |||
| 199 | EXCHANGE_ID: | ||
| 200 | * only SP4_NONE state protection supported | ||
| 201 | * implementation ids are ignored | ||
| 202 | |||
| 203 | CREATE_SESSION: | ||
| 204 | * backchannel attributes are ignored | ||
| 205 | * backchannel security parameters are ignored | ||
| 206 | |||
| 207 | SEQUENCE: | ||
| 208 | * no support for dynamic slot table renegotiation (optional) | ||
| 209 | |||
| 210 | nfsv4.1 COMPOUND rules: | ||
| 211 | The following cases aren't supported yet: | ||
| 212 | * Enforcing of NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION, | ||
| 213 | DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID. | ||
| 214 | * DESTROY_SESSION MUST be the final operation in the COMPOUND request. | ||
| 215 | |||
| 216 | Nonstandard compound limitations: | ||
| 217 | * No support for a session's fore channel RPC compound that requires both a | ||
| 218 | ca_maxrequestsize request and a ca_maxresponsesize reply, so we may | ||
| 219 | fail to live up to the promise we made in CREATE_SESSION fore channel | ||
| 220 | negotiation. | ||
| 221 | * No more than one IO operation (read, write, readdir) allowed per | ||
| 222 | compound. | ||
diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt new file mode 100644 index 000000000000..3ba0b945aaf8 --- /dev/null +++ b/Documentation/filesystems/nfs/nfsroot.txt | |||
| @@ -0,0 +1,270 @@ | |||
| 1 | Mounting the root filesystem via NFS (nfsroot) | ||
| 2 | =============================================== | ||
| 3 | |||
| 4 | Written 1996 by Gero Kuhlmann <gero@gkminix.han.de> | ||
| 5 | Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz> | ||
| 6 | Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org> | ||
| 7 | Updated 2006 by Horms <horms@verge.net.au> | ||
| 8 | |||
| 9 | |||
| 10 | |||
| 11 | In order to use a diskless system, such as an X-terminal or printer server | ||
| 12 | for example, it is necessary for the root filesystem to be present on a | ||
| 13 | non-disk device. This may be an initramfs (see Documentation/filesystems/ | ||
| 14 | ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a | ||
| 15 | filesystem mounted via NFS. The following text describes how to use NFS | ||
| 16 | for the root filesystem. For the rest of this text 'client' means the | ||
| 17 | diskless system, and 'server' means the NFS server. | ||
| 18 | |||
| 19 | |||
| 20 | |||
| 21 | |||
| 22 | 1.) Enabling nfsroot capabilities | ||
| 23 | ----------------------------- | ||
| 24 | |||
| 25 | In order to use nfsroot, NFS client support needs to be selected as | ||
| 26 | built-in during configuration. Once this has been selected, the nfsroot | ||
| 27 | option will become available, which should also be selected. | ||
| 28 | |||
| 29 | In the networking options, kernel level autoconfiguration can be selected, | ||
| 30 | along with the types of autoconfiguration to support. Selecting all of | ||
| 31 | DHCP, BOOTP and RARP is safe. | ||
| 32 | |||
| 33 | |||
| 34 | |||
| 35 | |||
| 36 | 2.) Kernel command line | ||
| 37 | ------------------- | ||
| 38 | |||
| 39 | When the kernel has been loaded by a boot loader (see below) it needs to be | ||
| 40 | told what root fs device to use and, in the case of nfsroot, where to find | ||
| 41 | both the server and the name of the directory on the server to mount as root. | ||
| 42 | This can be established using the following kernel command line parameters: | ||
| 43 | |||
| 44 | |||
| 45 | root=/dev/nfs | ||
| 46 | |||
| 47 | This is necessary to enable the pseudo-NFS-device. Note that it's not a | ||
| 48 | real device but just a synonym to tell the kernel to use NFS instead of | ||
| 49 | a real device. | ||
| 50 | |||
| 51 | |||
| 52 | nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] | ||
| 53 | |||
| 54 | If the `nfsroot' parameter is NOT given on the command line, | ||
| 55 | the default "/tftpboot/%s" will be used. | ||
| 56 | |||
| 57 | <server-ip> Specifies the IP address of the NFS server. | ||
| 58 | The default address is determined by the `ip' parameter | ||
| 59 | (see below). This parameter allows the use of different | ||
| 60 | servers for IP autoconfiguration and NFS. | ||
| 61 | |||
| 62 | <root-dir> Name of the directory on the server to mount as root. | ||
| 63 | If there is a "%s" token in the string, it will be | ||
| 64 | replaced by the ASCII-representation of the client's | ||
| 65 | IP address. | ||
| 66 | |||
| 67 | <nfs-options> Standard NFS options. All options are separated by commas. | ||
| 68 | The following defaults are used: | ||
| 69 | port = as given by server portmap daemon | ||
| 70 | rsize = 4096 | ||
| 71 | wsize = 4096 | ||
| 72 | timeo = 7 | ||
| 73 | retrans = 3 | ||
| 74 | acregmin = 3 | ||
| 75 | acregmax = 60 | ||
| 76 | acdirmin = 30 | ||
| 77 | acdirmax = 60 | ||
| 78 | flags = hard, nointr, noposix, cto, ac | ||
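The nfsroot= syntax and the "%s" substitution described above can be sketched as follows. This is only an illustration in Python, not the kernel's code: the function names are hypothetical, and replacing only the first "%s" token is an assumption.

```python
DEFAULT_NFSROOT = "/tftpboot/%s"  # default used when nfsroot= is not given


def expand_root_dir(root_dir, client_ip):
    """Replace the "%s" token with the client's IP address in ASCII form.

    Assumption: only the first "%s" token is substituted.
    """
    return root_dir.replace("%s", client_ip, 1)


def build_nfsroot(root_dir=DEFAULT_NFSROOT, server_ip=None, options=()):
    """Assemble nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]."""
    value = root_dir
    if server_ip:
        # The optional server address is separated from the path by a colon.
        value = f"{server_ip}:{value}"
    if options:
        # All NFS options are separated by commas.
        value += "," + ",".join(options)
    return "nfsroot=" + value
```

For example, build_nfsroot("/srv/nfs/%s", "10.0.0.1", ["rsize=8192"]) gives the command line fragment, and expand_root_dir() shows the path a client with a given address would mount. The path and addresses here are made up for the example.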
| 79 | |||
| 80 | |||
| 81 | ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf> | ||
| 82 | |||
| 83 | This parameter tells the kernel how to configure IP addresses of devices | ||
| 84 | and also how to set up the IP routing table. It was originally called | ||
| 85 | `nfsaddrs', but now the boot-time IP configuration works independently of | ||
| 86 | NFS, so it was renamed to `ip' and the old name remained as an alias for | ||
| 87 | compatibility reasons. | ||
| 88 | |||
| 89 | If this parameter is missing from the kernel command line, all fields are | ||
| 90 | assumed to be empty, and the defaults mentioned below apply. In general | ||
| 91 | this means that the kernel tries to configure everything using | ||
| 92 | autoconfiguration. | ||
| 93 | |||
| 94 | The <autoconf> parameter can appear alone as the value to the `ip' | ||
| 95 | parameter (without all the ':' characters before). If the value is | ||
| 96 | "ip=off" or "ip=none", no autoconfiguration will take place, otherwise | ||
| 97 | autoconfiguration will take place. The most common way to use this | ||
| 98 | is "ip=dhcp". | ||
| 99 | |||
| 100 | <client-ip> IP address of the client. | ||
| 101 | |||
| 102 | Default: Determined using autoconfiguration. | ||
| 103 | |||
| 104 | <server-ip> IP address of the NFS server. If RARP is used to determine | ||
| 105 | the client address and this parameter is NOT empty only | ||
| 106 | replies from the specified server are accepted. | ||
| 107 | |||
| 108 | Only required for NFS root. That is, autoconfiguration | ||
| 109 | will not be triggered if it is missing and NFS root is not | ||
| 110 | in operation. | ||
| 111 | |||
| 112 | Default: Determined using autoconfiguration. | ||
| 113 | The address of the autoconfiguration server is used. | ||
| 114 | |||
| 115 | <gw-ip> IP address of a gateway if the server is on a different subnet. | ||
| 116 | |||
| 117 | Default: Determined using autoconfiguration. | ||
| 118 | |||
| 119 | <netmask> Netmask for local network interface. If unspecified | ||
| 120 | the netmask is derived from the client IP address assuming | ||
| 121 | classful addressing. | ||
| 122 | |||
| 123 | Default: Determined using autoconfiguration. | ||
| 124 | |||
| 125 | <hostname> Name of the client. May be supplied by autoconfiguration, | ||
| 126 | but its absence will not trigger autoconfiguration. | ||
| 127 | |||
| 128 | Default: Client IP address is used in ASCII notation. | ||
| 129 | |||
| 130 | <device> Name of network device to use. | ||
| 131 | |||
| 132 | Default: If the host only has one device, it is used. | ||
| 133 | Otherwise the device is determined using | ||
| 134 | autoconfiguration. This is done by sending | ||
| 135 | autoconfiguration requests out of all devices, | ||
| 136 | and using the device that received the first reply. | ||
| 137 | |||
| 138 | <autoconf> Method to use for autoconfiguration. In the case of options | ||
| 139 | which specify multiple autoconfiguration protocols, | ||
| 140 | requests are sent using all protocols, and the first one | ||
| 141 | to reply is used. | ||
| 142 | |||
| 143 | Only autoconfiguration protocols that have been compiled | ||
| 144 | into the kernel will be used, regardless of the value of | ||
| 145 | this option. | ||
| 146 | |||
| 147 | off or none: don't use autoconfiguration | ||
| 148 | (do static IP assignment instead) | ||
| 149 | on or any: use any protocol available in the kernel | ||
| 150 | (default) | ||
| 151 | dhcp: use DHCP | ||
| 152 | bootp: use BOOTP | ||
| 153 | rarp: use RARP | ||
| 154 | both: use both BOOTP and RARP but not DHCP | ||
| 155 | (old option kept for backwards compatibility) | ||
| 156 | |||
| 157 | Default: any | ||
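The field layout of the `ip' parameter and the classful netmask fallback described above can be sketched like this. It is a rough Python illustration under stated assumptions, not the kernel's parser: the function and field names are hypothetical, and a missing <autoconf> field is simply mapped to the documented default "any".

```python
import ipaddress

FIELDS = ("client_ip", "server_ip", "gw_ip", "netmask",
          "hostname", "device", "autoconf")
AUTOCONF_VALUES = {"off", "none", "on", "any", "dhcp", "bootp", "rarp", "both"}


def parse_ip_param(value):
    """Split an ip= value into its seven fields; empty fields become None."""
    if value in AUTOCONF_VALUES:
        # <autoconf> given alone, e.g. "ip=dhcp", without the ':' characters.
        parts = [""] * 6 + [value]
    else:
        parts = value.split(":")
        if len(parts) > len(FIELDS):
            raise ValueError("too many fields in ip= value")
        # Missing trailing fields are treated as empty.
        parts += [""] * (len(FIELDS) - len(parts))
    fields = {name: (part or None) for name, part in zip(FIELDS, parts)}
    if fields["autoconf"] is None:
        fields["autoconf"] = "any"  # documented default
    return fields


def classful_netmask(client_ip):
    """Derive a netmask from the address class (A, B or C) of client_ip."""
    first_octet = int(ipaddress.IPv4Address(client_ip)) >> 24
    if first_octet < 128:
        return "255.0.0.0"      # class A
    if first_octet < 192:
        return "255.255.0.0"    # class B
    if first_octet < 224:
        return "255.255.255.0"  # class C
    raise ValueError("class D/E addresses have no classful netmask")
```

A fully static setup would be written as, for instance, "ip=192.168.1.20:192.168.1.1::255.255.255.0:client1:eth0:off" (addresses and names made up for the example); the empty <gw-ip> field comes back as None from parse_ip_param().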
| 158 | |||
| 159 | |||
| 160 | |||
| 161 | |||
| 162 | 3.) Boot Loader | ||
| 163 | ---------- | ||
| 164 | |||
| 165 | To get the kernel into memory different approaches can be used. | ||
| 166 | They depend on various facilities being available: | ||
| 167 | |||
| 168 | |||
| 169 | 3.1) Booting from a floppy using syslinux | ||
| 170 | |||
| 171 | When building kernels, an easy way to create a boot floppy that uses | ||
| 172 | syslinux is to use the zdisk or bzdisk make targets which use zimage | ||
| 173 | and bzimage images respectively. Both targets accept the | ||
| 174 | FDARGS parameter which can be used to set the kernel command line. | ||
| 175 | |||
| 176 | e.g. | ||
| 177 | make bzdisk FDARGS="root=/dev/nfs" | ||
| 178 | |||
| 179 | Note that the user running this command will need to have | ||
| 180 | access to the floppy drive device, /dev/fd0 | ||
| 181 | |||
| 182 | For more information on syslinux, including how to create bootdisks | ||
| 183 | for prebuilt kernels, see http://syslinux.zytor.com/ | ||
| 184 | |||
| 185 | N.B: Previously it was possible to write a kernel directly to | ||
| 186 | a floppy using dd, configure the boot device using rdev, and | ||
| 187 | boot using the resulting floppy. Linux no longer supports this | ||
| 188 | method of booting. | ||
| 189 | |||
| 190 | 3.2) Booting from a cdrom using isolinux | ||
| 191 | |||
| 192 | When building kernels, an easy way to create a bootable cdrom that | ||
| 193 | uses isolinux is to use the isoimage target which uses a bzimage | ||
| 194 | image. Like zdisk and bzdisk, this target accepts the FDARGS | ||
| 195 | parameter which can be used to set the kernel command line. | ||
| 196 | |||
| 197 | e.g. | ||
| 198 | make isoimage FDARGS="root=/dev/nfs" | ||
| 199 | |||
| 200 | The resulting iso image will be arch/<ARCH>/boot/image.iso | ||
| 201 | This can be written to a cdrom using a variety of tools including | ||
| 202 | cdrecord. | ||
| 203 | |||
| 204 | e.g. | ||
| 205 | cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso | ||
| 206 | |||
| 207 | For more information on isolinux, including how to create bootdisks | ||
| 208 | for prebuilt kernels, see http://syslinux.zytor.com/ | ||
| 209 | |||
| 210 | 3.3) Using LILO | ||
| 211 | When using LILO all the necessary command line parameters may be | ||
| 212 | specified using the 'append=' directive in the LILO configuration | ||
| 213 | file. | ||
| 214 | |||
| 215 | However, to use the 'root=' directive you also need to create | ||
| 216 | a dummy root device, which may be removed after LILO is run. | ||
| 217 | |||
| 218 | mknod /dev/boot255 c 0 255 | ||
| 219 | |||
| 220 | For information on configuring LILO, please refer to its documentation. | ||
| 221 | |||
| 222 | 3.4) Using GRUB | ||
| 223 | When using GRUB, kernel parameters are simply appended after the kernel | ||
| 224 | specification: kernel <kernel> <parameters> | ||
| 225 | |||
| 226 | 3.5) Using loadlin | ||
| 227 | loadlin may be used to boot Linux from a DOS command prompt without | ||
| 228 | requiring a local hard disk to mount as root. This has not been | ||
| 229 | thoroughly tested by the authors of this document, but in general | ||
| 230 | it should be possible to configure the kernel command line similarly | ||
| 231 | to the configuration of LILO. | ||
| 232 | |||
| 233 | Please refer to the loadlin documentation for further information. | ||
| 234 | |||
| 235 | 3.6) Using a boot ROM | ||
| 236 | This is probably the most elegant way of booting a diskless client. | ||
| 237 | With a boot ROM the kernel is loaded using the TFTP protocol. The | ||
| 238 | authors of this document are not aware of any commercial boot | ||
| 239 | ROMs that support booting Linux over the network. However, there | ||
| 240 | are two free implementations of a boot ROM, netboot-nfs and | ||
| 241 | etherboot, both of which are available on sunsite.unc.edu, and both | ||
| 242 | of which contain everything you need to boot a diskless Linux client. | ||
| 243 | |||
| 244 | 3.7) Using pxelinux | ||
| 245 | Pxelinux may be used to boot Linux using the PXE boot loader | ||
| 246 | which is present on many modern network cards. | ||
| 247 | |||
| 248 | When using pxelinux, the kernel image is specified using | ||
| 249 | "kernel <relative-path-below /tftpboot>". The nfsroot parameters | ||
| 250 | are passed to the kernel by adding them to the "append" line. | ||
| 251 | It is common to use serial console in conjunction with pxelinux, | ||
| 252 | see Documentation/serial-console.txt for more information. | ||
| 253 | |||
| 254 | For more information on pxelinux, including how to create bootdisks | ||
| 255 | for prebuilt kernels, see http://syslinux.zytor.com/ | ||
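As a small illustration of how the "kernel" and "append" lines fit together, the following Python sketch renders one pxelinux configuration entry. The helper name, the label and the example values are hypothetical, and the output should be checked against the pxelinux documentation before use.

```python
def pxelinux_stanza(label, kernel_path, params):
    """Render one pxelinux configuration entry.

    kernel_path is relative to the TFTP root (/tftpboot in the text above);
    params are the kernel command line parameters for the "append" line.
    """
    return (
        f"label {label}\n"
        f"  kernel {kernel_path}\n"
        f"  append {' '.join(params)}\n"
    )
```

Calling pxelinux_stanza("nfsroot", "vmlinuz", ["root=/dev/nfs", "nfsroot=192.168.1.1:/srv/nfsroot", "ip=dhcp"]) yields an entry whose append line carries the nfsroot parameters described in section 2; the server address and export path are placeholders.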
| 256 | |||
| 257 | |||
| 258 | |||
| 259 | |||
| 260 | 4.) Credits | ||
| 261 | ------- | ||
| 262 | |||
| 263 | The nfsroot code in the kernel and the RARP support have been written | ||
| 264 | by Gero Kuhlmann <gero@gkminix.han.de>. | ||
| 265 | |||
| 266 | The rest of the IP layer autoconfiguration code has been written | ||
| 267 | by Martin Mares <mj@atrey.karlin.mff.cuni.cz>. | ||
| 268 | |||
| 269 | In order to write the initial version of nfsroot I would like to thank | ||
| 270 | Jens-Uwe Mager <jum@anubis.han.de> for his help. | ||
