author    J. Bruce Fields <bfields@citi.umich.edu>  2009-10-27 14:41:35 -0400
committer J. Bruce Fields <bfields@citi.umich.edu>  2009-10-27 19:34:04 -0400
commit    dc7a08166f3a5f23e79e839a8a88849bd3397c32 (patch)
tree      2feb8aed7b6142467e6b8833fbfd9838bda69c39 /Documentation/filesystems/nfs
parent    e343eb0d60f74547e0aeb5bd151105c2e6cfe588 (diff)
nfs: new subdir Documentation/filesystems/nfs
We're adding enough nfs documentation that it may as well have its own
subdirectory.

Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Diffstat (limited to 'Documentation/filesystems/nfs')
-rw-r--r--   Documentation/filesystems/nfs/00-INDEX           12
-rw-r--r--   Documentation/filesystems/nfs/Exporting         147
-rw-r--r--   Documentation/filesystems/nfs/nfs-rdma.txt      271
-rw-r--r--   Documentation/filesystems/nfs/nfs.txt            98
-rw-r--r--   Documentation/filesystems/nfs/nfs41-server.txt  222
-rw-r--r--   Documentation/filesystems/nfs/nfsroot.txt       270
6 files changed, 1020 insertions, 0 deletions
diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX
new file mode 100644
index 000000000000..6ff3d212027b
--- /dev/null
+++ b/Documentation/filesystems/nfs/00-INDEX
@@ -0,0 +1,12 @@
00-INDEX
    - this file (nfs-related documentation).
Exporting
    - explanation of how to make filesystems exportable.
nfs.txt
    - nfs client, and DNS resolution for fs_locations.
nfs41-server.txt
    - info on the Linux server implementation of NFSv4 minor version 1.
nfs-rdma.txt
    - how to install and set up the Linux NFS/RDMA client and server software.
nfsroot.txt
    - short guide on setting up a diskless box with an NFS root filesystem.
diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
new file mode 100644
index 000000000000..87019d2b5981
--- /dev/null
+++ b/Documentation/filesystems/nfs/Exporting
@@ -0,0 +1,147 @@

Making Filesystems Exportable
=============================

Overview
--------

All filesystem operations require a dentry (or two) as a starting
point.  Local applications have a reference-counted hold on suitable
dentries via open file descriptors or cwd/root.  However remote
applications that access a filesystem via a remote filesystem protocol
such as NFS may not be able to hold such a reference, and so need a
different way to refer to a particular dentry.  As the alternative
form of reference needs to be stable across renames, truncates, and
server-reboot (among other things, though these tend to be the most
problematic), there is no simple answer like 'filename'.

The mechanism discussed here allows each filesystem implementation to
specify how to generate an opaque (outside of the filesystem) byte
string for any dentry, and how to find an appropriate dentry for any
given opaque byte string.
This byte string will be called a "filehandle fragment" as it
corresponds to part of an NFS filehandle.

A filesystem which supports the mapping between filehandle fragments
and dentries will be termed "exportable".


Dcache Issues
-------------

The dcache normally contains a proper prefix of any given filesystem
tree.  This means that if any filesystem object is in the dcache, then
all of the ancestors of that filesystem object are also in the dcache.
As normal access is by filename this prefix is created naturally and
maintained easily (by each object maintaining a reference count on
its parent).

However, when objects are included into the dcache by interpreting a
filehandle fragment, there is no automatic creation of a path prefix
for the object.  This leads to two related but distinct features of
the dcache that are not needed for normal filesystem access.

1/ The dcache must sometimes contain objects that are not part of the
   proper prefix, i.e. that are not connected to the root.
2/ The dcache must be prepared for a newly found (via ->lookup) directory
   to already have a (non-connected) dentry, and must be able to move
   that dentry into place (based on the parent and name in the
   ->lookup).  This is particularly needed for directories as
   it is a dcache invariant that directories only have one dentry.

To implement these features, the dcache has:

a/ A dentry flag DCACHE_DISCONNECTED which is set on
   any dentry that might not be part of the proper prefix.
   This is set when anonymous dentries are created, and cleared when a
   dentry is noticed to be a child of a dentry which is in the proper
   prefix.

b/ A per-superblock list "s_anon" of dentries which are the roots of
   subtrees that are not in the proper prefix.  These dentries, as
   well as the proper prefix, need to be released at unmount time.  As
   these dentries will not be hashed, they are linked together on the
   d_hash list_head.

c/ Helper routines to allocate anonymous dentries, and to help attach
   loose directory dentries at lookup time.  They are:

     d_alloc_anon(inode) will return a dentry for the given inode.
       If the inode already has a dentry, one of those is returned.
       If it doesn't, a new anonymous (IS_ROOT and
       DCACHE_DISCONNECTED) dentry is allocated and attached.
       In the case of a directory, care is taken that only one dentry
       can ever be attached.

     d_splice_alias(inode, dentry) will make sure that there is a
       dentry with the same name and parent as the given dentry, and
       which refers to the given inode.
       If the inode is a directory and already has a dentry, then that
       dentry is d_moved over the given dentry.
       If the passed dentry gets attached, care is taken that this is
       mutually exclusive to a d_alloc_anon operation.
       If the passed dentry is used, NULL is returned, else the used
       dentry is returned.  This corresponds to the calling pattern of
       ->lookup.

Filesystem Issues
-----------------

For a filesystem to be exportable it must:

   1/ provide the filehandle fragment routines described below.
   2/ make sure that d_splice_alias is used rather than d_add
      when ->lookup finds an inode for a given parent and name.

      Typically the ->lookup routine will end with a:

		return d_splice_alias(inode, dentry);
	}

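As an illustration only, a minimal ->lookup might look like the sketch
below.  It assumes the 2.6-era prototypes, and myfs_get_inode() is a
hypothetical filesystem-specific helper that returns the inode for the
given parent and name (or NULL for a negative entry):

	static struct dentry *myfs_lookup(struct inode *dir,
					  struct dentry *dentry,
					  struct nameidata *nd)
	{
		struct inode *inode;

		/* hypothetical helper: map (dir, name) to an inode */
		inode = myfs_get_inode(dir, &dentry->d_name);
		if (IS_ERR(inode))
			return ERR_CAST(inode);

		/*
		 * d_splice_alias() either reuses a disconnected alias of
		 * the inode (returning it), or attaches the passed dentry
		 * and returns NULL, matching the ->lookup calling pattern.
		 */
		return d_splice_alias(inode, dentry);
	}
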
  A filesystem implementation declares that instances of the filesystem
are exportable by setting the s_export_op field in the struct
super_block.  This field must point to a "struct export_operations"
struct which has the following members:

  encode_fh (optional)
    Takes a dentry and creates a filehandle fragment which can later be used
    to find or create a dentry for the same object.  The default
    implementation creates a filehandle fragment that encodes a 32-bit inode
    number and generation number for the inode, and if necessary the
    same information for the parent.

  fh_to_dentry (mandatory)
    Given a filehandle fragment, this should find the implied object and
    create a dentry for it (possibly with d_alloc_anon).

  fh_to_parent (optional but strongly recommended)
    Given a filehandle fragment, this should find the parent of the
    implied object and create a dentry for it (possibly with d_alloc_anon).
    May fail if the filehandle fragment is too small.

  get_parent (optional but strongly recommended)
    When given a dentry for a directory, this should return a dentry for
    the parent.  Quite possibly the parent dentry will have been allocated
    by d_alloc_anon.  The default get_parent function just returns an error
    so any filehandle lookup that requires finding a parent will fail.
    ->lookup("..") is *not* used as a default as it can leave ".." entries
    in the dcache which are too messy to work with.

  get_name (optional)
    When given a parent dentry and a child dentry, this should find a name
    in the directory identified by the parent dentry, which leads to the
    object identified by the child dentry.  If no get_name function is
    supplied, a default implementation is provided which uses vfs_readdir
    to find potential names, and matches inode numbers to find the correct
    match.

A filehandle fragment consists of an array of 1 or more 4-byte words,
together with a one-byte "type".
The decode_fh routine should not depend on the stated size that is
passed to it.  This size may be larger than the original filehandle
generated by encode_fh, in which case it will have been padded with
nuls.  Rather, the encode_fh routine should choose a "type" which
tells decode_fh how much of the filehandle is valid, and how
it should be interpreted.
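
As a concrete (but purely illustrative) example, the skeleton below shows
how a filesystem might wire this up, assuming the 2.6-era prototypes from
include/linux/exportfs.h; myfs_iget() is a hypothetical helper that turns
an inode number and generation into an inode:

	static struct dentry *myfs_fh_to_dentry(struct super_block *sb,
						struct fid *fid,
						int fh_len, int fh_type)
	{
		struct inode *inode;

		/* expect the default 32-bit inode + generation layout */
		if (fh_len < 2 || fh_type != FILEID_INO32_GEN)
			return NULL;

		/* hypothetical helper: look up the inode on this sb */
		inode = myfs_iget(sb, fid->i32.ino, fid->i32.gen);
		if (IS_ERR(inode))
			return ERR_CAST(inode);

		/* get a dentry even if no connected path is in the dcache */
		return d_alloc_anon(inode);
	}

	static const struct export_operations myfs_export_ops = {
		/* the default encode_fh is used, so it is omitted here */
		.fh_to_dentry	= myfs_fh_to_dentry,
	};

The superblock would then be marked exportable in the fill_super routine:

	sb->s_export_op = &myfs_export_ops;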
diff --git a/Documentation/filesystems/nfs/nfs-rdma.txt b/Documentation/filesystems/nfs/nfs-rdma.txt
new file mode 100644
index 000000000000..e386f7e4bcee
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfs-rdma.txt
@@ -0,0 +1,271 @@
################################################################################
#                                                                              #
#                               NFS/RDMA README                                #
#                                                                              #
################################################################################

 Author: NetApp and Open Grid Computing
 Date: May 29, 2008

Table of Contents
~~~~~~~~~~~~~~~~~
 - Overview
 - Getting Help
 - Installation
 - Check RDMA and NFS Setup
 - NFS/RDMA Setup

Overview
~~~~~~~~

  This document describes how to install and set up the Linux NFS/RDMA
  client and server software.

  The NFS/RDMA client was first included in Linux 2.6.24.  The NFS/RDMA
  server was first included in the following release, Linux 2.6.25.

  In our testing, we have obtained excellent performance results (full 10Gbit
  wire bandwidth at minimal client CPU) under many workloads.  The code passes
  the full Connectathon test suite and operates over both Infiniband and iWARP
  RDMA adapters.

Getting Help
~~~~~~~~~~~~

  If you get stuck, you can ask questions on the

    nfs-rdma-devel@lists.sourceforge.net

  mailing list.

Installation
~~~~~~~~~~~~

  These instructions are a step by step guide to building a machine for
  use with NFS/RDMA.

  - Install an RDMA device

    Any device supported by the drivers in drivers/infiniband/hw is acceptable.

    Testing has been performed using several Mellanox-based IB cards, the
    Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.

  - Install a Linux distribution and tools

    The first kernel release to contain both the NFS/RDMA client and server
    was Linux 2.6.25.  Therefore, a distribution compatible with this and
    subsequent Linux kernel releases should be installed.

    The procedures described in this document have been tested with
    distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).

  - Install nfs-utils-1.1.2 or greater on the client

    An NFS/RDMA mount point can be obtained by using the mount.nfs command in
    nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
    version with support for NFS/RDMA mounts, but for various reasons we
    recommend using nfs-utils-1.1.2 or greater).  To see which version of
    mount.nfs you are using, type:

    $ /sbin/mount.nfs -V

    If the version is less than 1.1.2 or the command does not exist,
    you should install the latest version of nfs-utils.

    Download the latest package from:

    http://www.kernel.org/pub/linux/utils/nfs

    Uncompress the package and follow the installation instructions.

    If you will not need the idmapper and gssd executables (you do not need
    these to create an NFS/RDMA enabled mount command), the installation
    process can be simplified by disabling these features when running
    configure:

    $ ./configure --disable-gss --disable-nfsv4

    To build nfs-utils you will need the tcp_wrappers package installed.  For
    more information on this see the package's README and INSTALL files.

    After building the nfs-utils package, there will be a mount.nfs binary in
    the utils/mount directory.  This binary can be used to initiate NFS v2,
    v3, or v4 mounts.  To initiate a v4 mount, the binary must be called
    mount.nfs4.  The standard technique is to create a symlink called
    mount.nfs4 to mount.nfs.
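
    For example (assuming the standard install locations):

    $ sudo ln -s /sbin/mount.nfs /sbin/mount.nfs4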

    This mount.nfs binary should be installed at /sbin/mount.nfs as follows:

    $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs

    In this location, mount.nfs will be invoked automatically for NFS mounts
    by the system mount command.

    NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
    on the NFS client machine.  You do not need this specific version of
    nfs-utils on the server.  Furthermore, only the mount.nfs command from
    nfs-utils-1.1.2 is needed on the client.

  - Install a Linux kernel with NFS/RDMA

    The NFS/RDMA client and server are both included in the mainline Linux
    kernel version 2.6.25 and later.  This and other versions of the 2.6
    Linux kernel can be found at:

    ftp://ftp.kernel.org/pub/linux/kernel/v2.6/

    Download the sources and place them in an appropriate location.

  - Configure the RDMA stack

    Make sure your kernel configuration has RDMA support enabled.  Under
    Device Drivers -> InfiniBand support, update the kernel configuration
    to enable InfiniBand support [NOTE: the option name is misleading.
    Enabling InfiniBand support is required for all RDMA devices (IB, iWARP,
    etc.)].

    Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
    iWARP adapter support (amso, cxgb3, etc.).

    If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.

  - Configure the NFS client and server

    Your kernel configuration must also have NFS file system support and/or
    NFS server support enabled.  These and other NFS related configuration
    options can be found under File Systems -> Network File Systems.

  - Build, install, reboot

    The NFS/RDMA code will be enabled automatically if NFS and RDMA
    are turned on.  The NFS/RDMA client and server are configured via the
    hidden SUNRPC_XPRT_RDMA config option that depends on SUNRPC and
    INFINIBAND.  The value of SUNRPC_XPRT_RDMA will be:

     - N if either SUNRPC or INFINIBAND are N; in this case the NFS/RDMA
       client and server will not be built
     - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M;
       in this case the NFS/RDMA client and server will be built as modules
     - Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA
       client and server will be built into the kernel

    Therefore, if you have followed the steps above and turned on NFS and
    RDMA, the NFS/RDMA client and server will be built.
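
    As an illustration, a .config fragment such as the following (the values
    shown are just one valid combination) results in the NFS/RDMA client and
    server being built as modules:

    CONFIG_SUNRPC=m
    CONFIG_INFINIBAND=m
    # SUNRPC_XPRT_RDMA is hidden; it is selected automatically:
    CONFIG_SUNRPC_XPRT_RDMA=m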

    Build a new kernel, install it, boot it.

Check RDMA and NFS Setup
~~~~~~~~~~~~~~~~~~~~~~~~

  Before configuring the NFS/RDMA software, it is a good idea to test
  your new kernel to ensure that the kernel is working correctly.
  In particular, it is a good idea to verify that the RDMA stack
  is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
  is working properly.

  - Check RDMA Setup

    If you built the RDMA components as modules, load them at
    this time.  For example, if you are using a Mellanox Tavor/Sinai/Arbel
    card:

    $ modprobe ib_mthca
    $ modprobe ib_ipoib

    If you are using InfiniBand, make sure there is a Subnet Manager (SM)
    running on the network.  If your IB switch has an embedded SM, you can
    use it.  Otherwise, you will need to run an SM, such as OpenSM, on one
    of your end nodes.

    If an SM is running on your network, you should see the following:

    $ cat /sys/class/infiniband/driverX/ports/1/state
    4: ACTIVE

    where driverX is mthca0, ipath5, ehca3, etc.

    To further test the InfiniBand software stack, use IPoIB (this
    assumes you have two IB hosts named host1 and host2):

    host1$ ifconfig ib0 a.b.c.x
    host2$ ifconfig ib0 a.b.c.y
    host1$ ping a.b.c.y
    host2$ ping a.b.c.x

    For other device types, follow the appropriate procedures.

  - Check NFS Setup

    For the NFS components enabled above (client and/or server),
    test their functionality over standard Ethernet using TCP/IP or UDP/IP.

NFS/RDMA Setup
~~~~~~~~~~~~~~

  We recommend that you use two machines, one to act as the client and
  one to act as the server.

  One time configuration:

  - On the server system, configure the /etc/exports file and
    start the NFS/RDMA server.

    Exports entries with the following formats have been tested:

    /vol0   192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
    /vol0   192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)

    The IP address(es) is(are) the client's IPoIB address for an InfiniBand
    HCA or the client's iWARP address(es) for an RNIC.

    NOTE: The "insecure" option must be used because the NFS/RDMA client does
    not use a reserved port.

  Each time a machine boots:

  - Load and configure the RDMA drivers

    For InfiniBand using a Mellanox adapter:

    $ modprobe ib_mthca
    $ modprobe ib_ipoib
    $ ifconfig ib0 a.b.c.d

    NOTE: use unique addresses for the client and server

  - Start the NFS server

    If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m
    in kernel config), load the RDMA transport module:

    $ modprobe svcrdma

    Regardless of how the server was built (module or built-in), start the
    server:

    $ /etc/init.d/nfs start

    or

    $ service nfs start

    Instruct the server to listen on the RDMA transport:

    $ echo rdma 20049 > /proc/fs/nfsd/portlist

  - On the client system

    If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m
    in kernel config), load the RDMA client module:

    $ modprobe xprtrdma

    Regardless of how the client was built (module or built-in), use this
    command to mount the NFS/RDMA server:

    $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt

    To verify that the mount is using RDMA, run "cat /proc/mounts" and check
    the "proto" field for the given mount.

  Congratulations!  You're using NFS/RDMA!
diff --git a/Documentation/filesystems/nfs/nfs.txt b/Documentation/filesystems/nfs/nfs.txt
new file mode 100644
index 000000000000..f50f26ce6cd0
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfs.txt
@@ -0,0 +1,98 @@

The NFS client
==============

The NFS version 2 protocol was first documented in RFC1094 (March 1989).
Since then two more major releases of NFS have been published, with NFSv3
being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April
2003).

The Linux NFS client currently supports all the above published versions,
and work is in progress on adding support for minor version 1 of the NFSv4
protocol.

The purpose of this document is to provide information on some of the
upcall interfaces that are used in order to provide the NFS client with
some of the information that it requires in order to fully comply with
the NFS spec.

The DNS resolver
================

NFSv4 allows for one server to refer the NFS client to data that has been
migrated onto another server by means of the special "fs_locations"
attribute.  See
	http://tools.ietf.org/html/rfc3530#section-6
and
	http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00

The fs_locations information can take the form of either an ip address and
a path, or a DNS hostname and a path.  The latter requires the NFS client to
do a DNS lookup in order to mount the new volume, and hence the need for an
upcall to allow userland to provide this service.

Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps:

   (1) The process checks the dns_resolve cache to see if it contains a
       valid entry.  If so, it returns that entry and exits.

   (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent'
       (may be changed using the 'nfs.cache_getent' kernel boot parameter)
       is run, with two arguments:
       - the cache name, "dns_resolve"
       - the hostname to resolve

   (3) After looking up the corresponding ip address, the helper script
       writes the result into the rpc_pipefs pseudo-file
       '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel'
       in the following (text) format:

		"<ip address> <hostname> <ttl>\n"

       Where <ip address> is in the usual IPv4 (e.g. 192.168.0.1) or IPv6
       (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format.
       <hostname> is identical to the second argument of the helper
       script, and <ttl> is the 'time to live' of this cache entry (in
       units of seconds).

       Note: If <ip address> is invalid, say the string "0", then a negative
       entry is created, which will cause the kernel to treat the hostname
       as having no valid DNS translation.
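
       For testing, an entry can also be written into the channel by hand,
       exactly as the sample script below does (the address and hostname
       here are examples only):

           $ echo "192.168.0.10 nfsserver.example.com 600" > \
                 /var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel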


A basic sample /sbin/nfs_cache_getent
=====================================

#!/bin/bash
#
ttl=600
#
cut=/usr/bin/cut
getent=/usr/bin/getent
rpc_pipefs=/var/lib/nfs/rpc_pipefs
#
die()
{
	echo "Usage: $0 cache_name entry_name"
	exit 1
}

[ $# -lt 2 ] && die
cachename="$1"
cache_path=${rpc_pipefs}/cache/${cachename}/channel

case "${cachename}" in
	dns_resolve)
		name="$2"
		result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )"
		[ -z "${result}" ] && result="0"
		;;
	*)
		die
		;;
esac
echo "${result} ${name} ${ttl}" >${cache_path}
diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt
new file mode 100644
index 000000000000..1bd0d0c05171
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfs41-server.txt
@@ -0,0 +1,222 @@
NFSv4.1 Server Implementation

Server support for minorversion 1 can be controlled using the
/proc/fs/nfsd/versions control file.  The string output returned
by reading this file will contain either "+4.1" or "-4.1"
correspondingly.

Currently, server support for minorversion 1 is disabled by default.
It can be enabled at run time by writing the string "+4.1" to
the /proc/fs/nfsd/versions control file.  Note that to write this
control file, the nfsd service must be taken down.  Use your user-mode
nfs-utils to set this up; see rpc.nfsd(8).

(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
"-4", respectively.  Therefore, code meant to work on both new and old
kernels must turn 4.1 on or off *before* turning support for version 4
on or off; rpc.nfsd does this correctly.)
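
For example, one possible sequence (the thread count of 8 is arbitrary)
would be:

	$ rpc.nfsd 0
	$ echo "+4.1" > /proc/fs/nfsd/versions
	$ rpc.nfsd 8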

The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
on the latest NFSv4.1 Internet Draft:
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29

Of the many new features in NFSv4.1, the current implementation
focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
"exactly once" semantics and better control and throttling of the
resources allocated for each client.

Other NFSv4.1 features, Parallel NFS operations in particular,
are still under development out of tree.
See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design
for more information.

The current implementation is intended for developers only: while it
does support ordinary file operations on clients we have tested against
(including the Linux client), it is incomplete in ways which may limit
features unexpectedly, cause known bugs in rare cases, or cause
interoperability problems with future clients.  Known issues:

  - gss support is questionable: currently mounts with kerberos
    from a Linux client are possible, but we aren't really
    conformant with the spec (for example, we don't use kerberos
    on the backchannel correctly).
  - no trunking support: no clients currently take advantage of
    trunking, but this is a mandatory feature, and its use is
    recommended to clients in a number of places.  (E.g. to ensure
    timely renewal in case an existing connection's retry timeouts
    have gotten too long; see section 8.3 of the draft.)
    Therefore, lack of this feature may cause future clients to
    fail.
  - Incomplete backchannel support: incomplete backchannel gss
    support and no support for BACKCHANNEL_CTL mean that
    callbacks (hence delegations and layouts) may not be
    available and clients confused by the incomplete
    implementation may fail.
  - Server reboot recovery is unsupported; if the server reboots,
    clients may fail.
  - We do not support SSV, which provides security for shared
    client-server state (thus preventing unauthorized tampering
    with locks and opens, for example).  It is mandatory for
    servers to support this, though no clients use it yet.
  - Mandatory operations which we do not support, such as
    DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and
    TEST_STATEID, are not currently used by clients, but will be
    (and the spec recommends their uses in common cases), and
    clients should not be expected to know how to recover from the
    case where they are not supported.  This will eventually cause
    interoperability failures.

In addition, some limitations are inherited from the current NFSv4
implementation:

  - Incomplete delegation enforcement: if a file is renamed or
    unlinked, a client holding a delegation may continue to
    indefinitely allow opens of the file under the old name.

The table below, taken from the NFSv4.1 document, lists
the operations that are mandatory to implement (REQ), optional
(OPT), and NFSv4.0 operations that are required not to implement (MNI)
in minor version 1.  The first column indicates the operations that
are not supported yet by the Linux server implementation.

The OPTIONAL features identified and their abbreviations are as follows:
	pNFS	Parallel NFS
	FDELG	File Delegations
	DDELG	Directory Delegations

The following abbreviations indicate the Linux server implementation status.
	I	Implemented NFSv4.1 operations.
	NS	Not Supported.
	NS*	Unimplemented optional feature.
	P	pNFS features implemented out of tree.
	PNS	pNFS features that are not supported yet (out of tree).

Operations

   +----------------------+------------+--------------+----------------+
   | Operation            | REQ, REC,  | Feature      | Definition     |
   |                      |  OPT, or   | (REQ, REC,   |                |
   |                      |  MNI       |  or OPT)     |                |
   +----------------------+------------+--------------+----------------+
   | ACCESS               | REQ        |              | Section 18.1   |
NS | BACKCHANNEL_CTL      | REQ        |              | Section 18.33  |
NS | BIND_CONN_TO_SESSION | REQ        |              | Section 18.34  |
   | CLOSE                | REQ        |              | Section 18.2   |
   | COMMIT               | REQ        |              | Section 18.3   |
   | CREATE               | REQ        |              | Section 18.4   |
I  | CREATE_SESSION       | REQ        |              | Section 18.36  |
NS*| DELEGPURGE           | OPT        | FDELG (REQ)  | Section 18.5   |
   | DELEGRETURN          | OPT        | FDELG,       | Section 18.6   |
   |                      |            | DDELG, pNFS  |                |
   |                      |            | (REQ)        |                |
NS | DESTROY_CLIENTID     | REQ        |              | Section 18.50  |
I  | DESTROY_SESSION      | REQ        |              | Section 18.37  |
I  | EXCHANGE_ID          | REQ        |              | Section 18.35  |
NS | FREE_STATEID         | REQ        |              | Section 18.38  |
   | GETATTR              | REQ        |              | Section 18.7   |
P  | GETDEVICEINFO        | OPT        | pNFS (REQ)   | Section 18.40  |
P  | GETDEVICELIST        | OPT        | pNFS (OPT)   | Section 18.41  |
   | GETFH                | REQ        |              | Section 18.8   |
NS*| GET_DIR_DELEGATION   | OPT        | DDELG (REQ)  | Section 18.39  |
P  | LAYOUTCOMMIT         | OPT        | pNFS (REQ)   | Section 18.42  |
P  | LAYOUTGET            | OPT        | pNFS (REQ)   | Section 18.43  |
P  | LAYOUTRETURN         | OPT        | pNFS (REQ)   | Section 18.44  |
   | LINK                 | OPT        |              | Section 18.9   |
   | LOCK                 | REQ        |              | Section 18.10  |
   | LOCKT                | REQ        |              | Section 18.11  |
   | LOCKU                | REQ        |              | Section 18.12  |
   | LOOKUP               | REQ        |              | Section 18.13  |
   | LOOKUPP              | REQ        |              | Section 18.14  |
   | NVERIFY              | REQ        |              | Section 18.15  |
   | OPEN                 | REQ        |              | Section 18.16  |
NS*| OPENATTR             | OPT        |              | Section 18.17  |
   | OPEN_CONFIRM         | MNI        |              | N/A            |
   | OPEN_DOWNGRADE       | REQ        |              | Section 18.18  |
   | PUTFH                | REQ        |              | Section 18.19  |
   | PUTPUBFH             | REQ        |              | Section 18.20  |
   | PUTROOTFH            | REQ        |              | Section 18.21  |
   | READ                 | REQ        |              | Section 18.22  |
   | READDIR              | REQ        |              | Section 18.23  |
   | READLINK             | OPT        |              | Section 18.24  |
NS | RECLAIM_COMPLETE     | REQ        |              | Section 18.51  |
   | RELEASE_LOCKOWNER    | MNI        |              | N/A            |
   | REMOVE               | REQ        |              | Section 18.25  |
   | RENAME               | REQ        |              | Section 18.26  |
   | RENEW                | MNI        |              | N/A            |
   | RESTOREFH            | REQ        |              | Section 18.27  |
   | SAVEFH               | REQ        |              | Section 18.28  |
   | SECINFO              | REQ        |              | Section 18.29  |
NS | SECINFO_NO_NAME      | REC        | pNFS files   | Section 18.45, |
   |                      |            | layout (REQ) | Section 13.12  |
I  | SEQUENCE             | REQ        |              | Section 18.46  |
   | SETATTR              | REQ        |              | Section 18.30  |
   | SETCLIENTID          | MNI        |              | N/A            |
   | SETCLIENTID_CONFIRM  | MNI        |              | N/A            |
NS | SET_SSV              | REQ        |              | Section 18.47  |
NS | TEST_STATEID         | REQ        |              | Section 18.48  |
   | VERIFY               | REQ        |              | Section 18.31  |
NS*| WANT_DELEGATION      | OPT        | FDELG (OPT)  | Section 18.49  |
   | WRITE                | REQ        |              | Section 18.32  |
   +----------------------+------------+--------------+----------------+

162
163 +-------------------------+-----------+-------------+---------------+
164 | Operation | REQ, REC, | Feature | Definition |
165 | | OPT, or | (REQ, REC, | |
166 | | MNI | or OPT) | |
167 +-------------------------+-----------+-------------+---------------+
168 | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 |
169P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 |
170NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 |
171P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 |
172NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 |
173NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 |
174 | CB_RECALL | OPT | FDELG, | Section 20.2 |
175 | | | DDELG, pNFS | |
176 | | | (REQ) | |
177NS*| CB_RECALL_ANY | OPT | FDELG, | Section 20.6 |
178 | | | DDELG, pNFS | |
179 | | | (REQ) | |
180NS | CB_RECALL_SLOT | REQ | | Section 20.8 |
181NS*| CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS | Section 20.7 |
182 | | | (REQ) | |
183I | CB_SEQUENCE | OPT | FDELG, | Section 20.9 |
184 | | | DDELG, pNFS | |
185 | | | (REQ) | |
186NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 |
187 | | | DDELG, pNFS | |
188 | | | (REQ) | |
189 +-------------------------+-----------+-------------+---------------+
190
Implementation notes:

DELEGPURGE:
* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or
  CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that
  persist across client reboots).  Thus we need not implement this for
  now.

EXCHANGE_ID:
* only SP4_NONE state protection is supported
* implementation ids are ignored

CREATE_SESSION:
* backchannel attributes are ignored
* backchannel security parameters are ignored

SEQUENCE:
* no support for dynamic slot table renegotiation (optional)

NFSv4.1 COMPOUND rules:
The following cases aren't supported yet:
* Enforcing NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION,
  DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID.
* Enforcing that DESTROY_SESSION is the final operation in the COMPOUND
  request.

Nonstandard compound limitations:
* No support for a sessions fore channel RPC compound that requires both a
  ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
  fail to live up to the promise we made in CREATE_SESSION fore channel
  negotiation.
* No more than one IO operation (read, write, readdir) allowed per
  compound.
diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt
new file mode 100644
index 000000000000..3ba0b945aaf8
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfsroot.txt
@@ -0,0 +1,270 @@
Mounting the root filesystem via NFS (nfsroot)
==============================================

Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
Updated 2006 by Horms <horms@verge.net.au>



In order to use a diskless system, such as an X-terminal or printer server
for example, it is necessary for the root filesystem to be present on a
non-disk device.  This may be an initramfs (see Documentation/filesystems/
ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a
filesystem mounted via NFS.  The following text describes how to use NFS
for the root filesystem.  For the rest of this text, 'client' means the
diskless system, and 'server' means the NFS server.



1.) Enabling nfsroot capabilities
    -----------------------------

In order to use nfsroot, NFS client support needs to be selected as
built-in during configuration.  Once this has been selected, the nfsroot
option will become available, which should also be selected.

In the networking options, kernel level autoconfiguration can be selected,
along with the types of autoconfiguration to support.  Selecting all of
DHCP, BOOTP and RARP is safe.



2.) Kernel command line
    -------------------

When the kernel has been loaded by a boot loader (see below) it needs to be
told what root fs device to use and, in the case of nfsroot, where to find
both the server and the name of the directory on the server to mount as
root.  This can be established using the following kernel command line
parameters:


root=/dev/nfs

  This is necessary to enable the pseudo-NFS-device.  Note that it's not a
  real device but just a synonym to tell the kernel to use NFS instead of
  a real device.


nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]

  If the `nfsroot' parameter is NOT given on the command line,
  the default "/tftpboot/%s" will be used.

  <server-ip>	Specifies the IP address of the NFS server.
		The default address is determined by the `ip' parameter
		(see below).  This parameter allows the use of different
		servers for IP autoconfiguration and NFS.

  <root-dir>	Name of the directory on the server to mount as root.
		If there is a "%s" token in the string, it will be
		replaced by the ASCII-representation of the client's
		IP address.

  <nfs-options>	Standard NFS options.  All options are separated by commas.
		The following defaults are used:
			port		= as given by server portmap daemon
			rsize		= 4096
			wsize		= 4096
			timeo		= 7
			retrans		= 3
			acregmin	= 3
			acregmax	= 60
			acdirmin	= 30
			acdirmax	= 60
			flags		= hard, nointr, noposix, cto, ac
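
  For example (server address and export path are illustrative only):

     nfsroot=192.168.1.1:/srv/nfsroot/%s,rsize=8192,wsize=8192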


ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>

  This parameter tells the kernel how to configure IP addresses of devices
  and also how to set up the IP routing table.  It was originally called
  `nfsaddrs', but now the boot-time IP configuration works independently of
  NFS, so it was renamed to `ip' and the old name remained as an alias for
  compatibility reasons.

  If this parameter is missing from the kernel command line, all fields are
  assumed to be empty, and the defaults mentioned below apply.  In general
  this means that the kernel tries to configure everything using
  autoconfiguration.

  The <autoconf> parameter can appear alone as the value to the `ip'
  parameter (without all the ':' characters before).  If the value is
  "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
  autoconfiguration will take place.  The most common way to use this
  is "ip=dhcp".

  <client-ip>	IP address of the client.

		Default: Determined using autoconfiguration.

  <server-ip>	IP address of the NFS server.  If RARP is used to determine
		the client address and this parameter is NOT empty only
		replies from the specified server are accepted.

		Only required for NFS root.  That is autoconfiguration
		will not be triggered if it is missing and NFS root is not
		in operation.

		Default: Determined using autoconfiguration.
			 The address of the autoconfiguration server is used.

  <gw-ip>	IP address of a gateway if the server is on a different
		subnet.

		Default: Determined using autoconfiguration.

  <netmask>	Netmask for local network interface.  If unspecified
		the netmask is derived from the client IP address assuming
		classful addressing.

		Default: Determined using autoconfiguration.

  <hostname>	Name of the client.  May be supplied by autoconfiguration,
		but its absence will not trigger autoconfiguration.

		Default: Client IP address is used in ASCII notation.

  <device>	Name of network device to use.

		Default: If the host only has one device, it is used.
			 Otherwise the device is determined using
			 autoconfiguration.  This is done by sending
			 autoconfiguration requests out of all devices,
			 and using the device that received the first reply.

  <autoconf>	Method to use for autoconfiguration.  In the case of options
		which specify multiple autoconfiguration protocols,
		requests are sent using all protocols, and the first one
		to reply is used.

		Only autoconfiguration protocols that have been compiled
		into the kernel will be used, regardless of the value of
		this option.

		  off or none: don't use autoconfiguration
			       (do static IP assignment instead)
		  on or any:   use any protocol available in the kernel
			       (default)
		  dhcp:        use DHCP
		  bootp:       use BOOTP
		  rarp:        use RARP
		  both:        use both BOOTP and RARP but not DHCP
			       (old option kept for backwards compatibility)

		Default: any
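
  Putting the pieces together, a complete command line for a DHCP-configured
  client might look like this (addresses and paths are examples only):

     root=/dev/nfs nfsroot=192.168.1.1:/srv/nfsroot ip=dhcp

  and one with static addressing like this:

     root=/dev/nfs nfsroot=192.168.1.1:/srv/nfsroot \
        ip=192.168.1.10:192.168.1.1:192.168.1.254:255.255.255.0:client:eth0:off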



3.) Boot Loader
    ----------

To get the kernel into memory different approaches can be used.
They depend on various facilities being available:


3.1)  Booting from a floppy using syslinux

	When building kernels, an easy way to create a boot floppy that uses
	syslinux is to use the zdisk or bzdisk make targets which use zImage
	and bzImage images respectively.  Both targets accept the
	FDARGS parameter which can be used to set the kernel command line.

	e.g.
	   make bzdisk FDARGS="root=/dev/nfs"

	Note that the user running this command will need to have
	access to the floppy drive device, /dev/fd0

	For more information on syslinux, including how to create bootdisks
	for prebuilt kernels, see http://syslinux.zytor.com/

	N.B.: Previously it was possible to write a kernel directly to
	a floppy using dd, configure the boot device using rdev, and
	boot using the resulting floppy.  Linux no longer supports this
	method of booting.

3.2)  Booting from a cdrom using isolinux

	When building kernels, an easy way to create a bootable cdrom that
	uses isolinux is to use the isoimage target which uses a bzImage
	image.  Like zdisk and bzdisk, this target accepts the FDARGS
	parameter which can be used to set the kernel command line.

	e.g.
	  make isoimage FDARGS="root=/dev/nfs"

	The resulting iso image will be arch/<ARCH>/boot/image.iso
	This can be written to a cdrom using a variety of tools including
	cdrecord.

	e.g.
	  cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso

	For more information on isolinux, including how to create bootdisks
	for prebuilt kernels, see http://syslinux.zytor.com/

3.3)  Using LILO

	When using LILO all the necessary command line parameters may be
	specified using the 'append=' directive in the LILO configuration
	file.

	However, to use the 'root=' directive you also need to create
	a dummy root device, which may be removed after LILO is run.

	   mknod /dev/boot255 c 0 255

	For information on configuring LILO, please refer to its
	documentation.

3.4)  Using GRUB

	When using GRUB, kernel parameters are simply appended after the
	kernel specification: kernel <kernel> <parameters>

3.5)  Using loadlin

	loadlin may be used to boot Linux from a DOS command prompt without
	requiring a local hard disk to mount as root.  This has not been
	thoroughly tested by the authors of this document, but in general
	it should be possible to configure the kernel command line similarly
	to the configuration of LILO.

	Please refer to the loadlin documentation for further information.

3.6)  Using a boot ROM

	This is probably the most elegant way of booting a diskless client.
	With a boot ROM the kernel is loaded using the TFTP protocol.  The
	authors of this document are not aware of any commercial boot
	ROMs that support booting Linux over the network.  However, there
	are two free implementations of a boot ROM, netboot-nfs and
	etherboot, both of which are available on sunsite.unc.edu, and both
	of which contain everything you need to boot a diskless Linux client.

3.7)  Using pxelinux

	Pxelinux may be used to boot Linux using the PXE boot loader
	which is present on many modern network cards.

	When using pxelinux, the kernel image is specified using
	"kernel <relative-path-below /tftpboot>".  The nfsroot parameters
	are passed to the kernel by adding them to the "append" line.
	It is common to use serial console in conjunction with pxelinux;
	see Documentation/serial-console.txt for more information.
	An example configuration follows this section.

	For more information on pxelinux, including how to create bootdisks
	for prebuilt kernels, see http://syslinux.zytor.com/
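
	As an example only (the paths and addresses are hypothetical), a
	pxelinux.cfg entry for an nfsroot client might look like:

	   label linux
	     kernel vmlinuz
	     append root=/dev/nfs nfsroot=192.168.1.1:/srv/nfsroot ip=dhcp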




4.) Credits
    -------

  The nfsroot code in the kernel and the RARP support have been written
  by Gero Kuhlmann <gero@gkminix.han.de>.

  The rest of the IP layer autoconfiguration code has been written
  by Martin Mares <mj@atrey.karlin.mff.cuni.cz>.

  I would like to thank Jens-Uwe Mager <jum@anubis.han.de> for his help
  with the initial version of nfsroot.