diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
new file mode 100644
index 000000000000..87019d2b5981
--- /dev/null
+++ b/Documentation/filesystems/nfs/Exporting

Making Filesystems Exportable
=============================

Overview
--------

All filesystem operations require a dentry (or two) as a starting
point. Local applications have a reference-counted hold on suitable
dentries via open file descriptors or cwd/root. However, remote
applications that access a filesystem via a remote filesystem protocol
such as NFS may not be able to hold such a reference, and so need a
different way to refer to a particular dentry. As the alternative
form of reference needs to be stable across renames, truncates, and
server reboots (among other things, though these tend to be the most
problematic), there is no simple answer like 'filename'.

The mechanism discussed here allows each filesystem implementation to
specify how to generate an opaque (outside of the filesystem) byte
string for any dentry, and how to find an appropriate dentry for any
given opaque byte string. This byte string will be called a
"filehandle fragment" as it corresponds to part of an NFS filehandle.

A filesystem which supports the mapping between filehandle fragments
and dentries will be termed "exportable".
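As an illustration of why a name is not used, here is a minimal userspace sketch (not kernel code) of one common way to build such an opaque reference: record an inode number together with a generation number that changes whenever the inode slot is reused. Everything in it (the fixed-size table, `tbl_alloc`, `tbl_encode`, `tbl_decode`) is hypothetical. Because no name is stored, a rename cannot invalidate the handle, while reuse of the slot for a new file makes the old handle detectably stale instead of silently pointing at the wrong object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy inode: one slot in a fixed-size inode table. */
struct tbl_inode {
	uint32_t ino;        /* inode number (== slot index here) */
	uint32_t generation; /* bumped every time the slot is reused */
	int in_use;
};

#define TBL_NINODES 8
static struct tbl_inode tbl_table[TBL_NINODES]; /* zero-initialised */

/* Hypothetical opaque handle: note that no file name is recorded. */
struct tbl_handle {
	uint32_t ino;
	uint32_t generation;
};

/* Allocate (or reuse) slot 'ino', giving it a fresh generation. */
static struct tbl_inode *tbl_alloc(uint32_t ino)
{
	assert(ino < TBL_NINODES);
	tbl_table[ino].ino = ino;
	tbl_table[ino].generation++;
	tbl_table[ino].in_use = 1;
	return &tbl_table[ino];
}

static void tbl_free(struct tbl_inode *inode)
{
	inode->in_use = 0;
}

static struct tbl_handle tbl_encode(const struct tbl_inode *inode)
{
	struct tbl_handle fh = { inode->ino, inode->generation };
	return fh;
}

/* Return the inode named by 'fh', or NULL if the handle is stale. */
static struct tbl_inode *tbl_decode(struct tbl_handle fh)
{
	struct tbl_inode *inode;

	if (fh.ino >= TBL_NINODES)
		return NULL;
	inode = &tbl_table[fh.ino];
	if (!inode->in_use || inode->generation != fh.generation)
		return NULL; /* deleted, or slot reused for another file */
	return inode;
}
```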


Dcache Issues
-------------

The dcache normally contains a proper prefix of any given filesystem
tree. This means that if any filesystem object is in the dcache, then
all of the ancestors of that filesystem object are also in the dcache.
As normal access is by filename, this prefix is created naturally and
maintained easily (by each object maintaining a reference count on
its parent).
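The proper-prefix property can be phrased as a check: starting from any cached dentry and following parent pointers must eventually reach the filesystem root. The following userspace sketch assumes a hypothetical, heavily simplified `pfx_dentry` in which a root points to itself as its parent (as the kernel's IS_ROOT test does) and each child holds a reference count on its parent; it is not the kernel's `struct dentry`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified dentry. */
struct pfx_dentry {
	const char *name;
	struct pfx_dentry *parent; /* == self for a root */
	int refcount;              /* references held by children */
};

static bool pfx_is_root(const struct pfx_dentry *d)
{
	return d->parent == d;
}

/* Make 'd' a child of 'parent', pinning the parent with a reference
 * so it cannot be evicted while the child is cached. */
static void pfx_attach(struct pfx_dentry *d, const char *name,
		       struct pfx_dentry *parent)
{
	d->name = name;
	d->parent = parent;
	d->refcount = 0;
	parent->refcount++;
}

/* True if walking up from 'd' ends at 'fs_root', i.e. 'd' lies in
 * the proper prefix rooted at 'fs_root'. */
static bool pfx_is_connected(const struct pfx_dentry *d,
			     const struct pfx_dentry *fs_root)
{
	while (!pfx_is_root(d))
		d = d->parent;
	return d == fs_root;
}
```

A dentry whose walk ends at some other self-parented dentry is exactly the "not connected to the root" case discussed next.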

However, when objects are included into the dcache by interpreting a
filehandle fragment, there is no automatic creation of a path prefix
for the object. This leads to two related but distinct features of
the dcache that are not needed for normal filesystem access.

1/ The dcache must sometimes contain objects that are not part of the
   proper prefix, i.e. objects that are not connected to the root.
2/ The dcache must be prepared for a newly found (via ->lookup) directory
   to already have a (non-connected) dentry, and must be able to move
   that dentry into place (based on the parent and name in the
   ->lookup). This is particularly needed for directories as
   it is a dcache invariant that a directory has only one dentry.

To implement these features, the dcache has:

a/ A dentry flag DCACHE_DISCONNECTED which is set on
   any dentry that might not be part of the proper prefix.
   This is set when anonymous dentries are created, and cleared when a
   dentry is noticed to be a child of a dentry which is in the proper
   prefix.

b/ A per-superblock list "s_anon" of dentries which are the roots of
   subtrees that are not in the proper prefix. These dentries, as
   well as the proper prefix, need to be released at unmount time. As
   these dentries will not be hashed, they are linked together on the
   d_hash list_head.

c/ Helper routines to allocate anonymous dentries, and to help attach
   loose directory dentries at lookup time. They are:
     d_alloc_anon(inode) will return a dentry for the given inode.
       If the inode already has a dentry, one of those is returned.
       If it doesn't, a new anonymous (IS_ROOT and
       DCACHE_DISCONNECTED) dentry is allocated and attached.
       In the case of a directory, care is taken that only one dentry
       can ever be attached.
     d_splice_alias(inode, dentry) will make sure that there is a
       dentry with the same name and parent as the given dentry, and
       which refers to the given inode.
       If the inode is a directory and already has a dentry, then that
       dentry is d_moved over the given dentry.
       If the passed dentry gets attached, care is taken that this is
       mutually exclusive with a d_alloc_anon operation.
       If the passed dentry is used, NULL is returned, else the used
       dentry is returned. This corresponds to the calling pattern of
       ->lookup.
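The contracts of the two helpers can be modelled in a few lines. This is a single-threaded userspace sketch under simplifying assumptions: it tracks only directory inodes (so each inode has at most one alias), uses a hypothetical `DC_DISCONNECTED` bit in place of DCACHE_DISCONNECTED, and replaces the real d_move and locking with plain field updates. The names `dc_alloc_anon` and `dc_splice_alias` are hypothetical stand-ins, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DC_DISCONNECTED 0x1 /* stands in for DCACHE_DISCONNECTED */

struct dc_inode;

struct dc_dentry {
	const char *name;
	struct dc_dentry *parent; /* == self when IS_ROOT */
	struct dc_inode *inode;
	unsigned int flags;
};

/* Toy directory inode: a directory may have at most one dentry. */
struct dc_inode {
	struct dc_dentry *alias;
};

/* Modelled on d_alloc_anon(): reuse the existing alias, or create a
 * new anonymous dentry that is its own parent and is flagged as
 * disconnected. (Never freed in this toy.) */
static struct dc_dentry *dc_alloc_anon(struct dc_inode *inode)
{
	struct dc_dentry *d;

	if (inode->alias)
		return inode->alias;
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->name = "";
	d->parent = d;
	d->inode = inode;
	d->flags = DC_DISCONNECTED;
	inode->alias = d;
	return d;
}

/* Modelled on the d_splice_alias() contract: return NULL if 'dentry'
 * itself was used, otherwise return the pre-existing alias after
 * moving it to dentry's name and parent. */
static struct dc_dentry *dc_splice_alias(struct dc_inode *inode,
					 struct dc_dentry *dentry)
{
	struct dc_dentry *alias = inode->alias;

	if (!alias) {
		dentry->inode = inode;
		inode->alias = dentry;
		return NULL;
	}
	alias->name = dentry->name;
	alias->parent = dentry->parent;
	/* Now a child of a connected dentry: clear the flag. */
	if (!(dentry->parent->flags & DC_DISCONNECTED))
		alias->flags &= ~DC_DISCONNECTED;
	return alias;
}
```

In this model, a ->lookup that ends with `return d_splice_alias(inode, dentry);` hands back NULL when its own dentry was used and the relocated alias otherwise, matching the ->lookup calling pattern described above.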


Filesystem Issues
-----------------

For a filesystem to be exportable it must:

   1/ provide the filehandle fragment routines described below.
   2/ make sure that d_splice_alias is used rather than d_add
      when ->lookup finds an inode for a given parent and name.
      Typically the ->lookup routine will end with:

		return d_splice_alias(inode, dentry);
	}


A filesystem implementation declares that instances of the filesystem
are exportable by setting the s_export_op field in the struct
super_block. This field must point to a "struct export_operations"
structure which has the following members:

  encode_fh (optional)
    Takes a dentry and creates a filehandle fragment which can later be used
    to find or create a dentry for the same object. The default
    implementation creates a filehandle fragment that encodes a 32-bit inode
    number and generation number for the inode, and if necessary the
    same information for the parent.

  fh_to_dentry (mandatory)
    Given a filehandle fragment, this should find the implied object and
    create a dentry for it (possibly with d_alloc_anon).

  fh_to_parent (optional but strongly recommended)
    Given a filehandle fragment, this should find the parent of the
    implied object and create a dentry for it (possibly with d_alloc_anon).
    This may fail if the filehandle fragment is too small, for example if
    it was encoded without the parent's information.

  get_parent (optional but strongly recommended)
    When given a dentry for a directory, this should return a dentry for
    the parent. Quite possibly the parent dentry will have been allocated
    by d_alloc_anon. The default get_parent function just returns an error,
    so any filehandle lookup that requires finding a parent will fail.
    ->lookup("..") is *not* used as a default as it can leave ".." entries
    in the dcache which are too messy to work with.

  get_name (optional)
    When given a parent dentry and a child dentry, this should find a name
    in the directory identified by the parent dentry, which leads to the
    object identified by the child dentry. If no get_name function is
    supplied, a default implementation is provided which uses vfs_readdir
    to find potential names, and matches inode numbers to find the correct
    match.
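The default get_name strategy (scan the parent directory and compare inode numbers) has a direct userspace analogue using POSIX opendir/readdir and the st_ino field from stat. The sketch below is that analogue, not the in-kernel vfs_readdir-based code; `find_name_by_ino` and `find_name_of` are hypothetical names. It skips "." and "..", and for a file with several hard links it returns whichever matching name readdir reports first, which suffices because any name leading to the object will do.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Scan 'dirpath' for an entry whose inode number is 'ino'. On success
 * copy its name into 'buf' and return 0; return -1 if the directory
 * cannot be opened, no entry matches, or the name does not fit. */
static int find_name_by_ino(const char *dirpath, ino_t ino,
			    char *buf, size_t buflen)
{
	DIR *dir;
	struct dirent *de;
	int ret = -1;

	assert(buf != NULL && buflen > 0);
	dir = opendir(dirpath);
	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		if (de->d_ino != ino)
			continue;
		/* "." and ".." can share a directory's inode number */
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		if (strlen(de->d_name) < buflen) {
			strcpy(buf, de->d_name);
			ret = 0;
		}
		break;
	}
	closedir(dir);
	return ret;
}

/* Find a name under 'dirpath' that refers to the same inode as 'path'. */
static int find_name_of(const char *dirpath, const char *path,
			char *buf, size_t buflen)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return find_name_by_ino(dirpath, st.st_ino, buf, buflen);
}
```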


A filehandle fragment consists of an array of 1 or more 4-byte words,
together with a one-byte "type". The decoding routines (fh_to_dentry
and fh_to_parent) should not depend on the stated size that is passed
to them. This size may be larger than the original filehandle
generated by encode_fh, in which case it will have been padded with
NULs. Rather, the encode_fh routine should choose a "type" which
tells the decoding routines how much of the filehandle is valid, and
how it should be interpreted.
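The following userspace sketch shows that division of labour: the encoder picks a type that fully determines the layout, and the decoder trusts that type, using the stated length only to reject fragments that are too short. The layout mirrors the default encoding described for encode_fh (a 32-bit inode number and generation, optionally followed by the same pair for the parent); the type values 1 and 2, the `has_parent` flag, the -1 error return and the `fhf_*` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type codes chosen by the encoder. */
enum fhf_type {
	FHF_INO32_GEN = 1,        /* words: ino, gen */
	FHF_INO32_GEN_PARENT = 2, /* words: ino, gen, parent ino, parent gen */
};

struct fhf_fid {
	uint32_t ino, gen;
	uint32_t parent_ino, parent_gen;
	int has_parent;
};

/* Encode into 'fh', whose capacity is *len 4-byte words. On success
 * store the number of words used in *len and return the type; return
 * -1 if the buffer is too small. */
static int fhf_encode(const struct fhf_fid *fid, uint32_t *fh, int *len)
{
	int need = fid->has_parent ? 4 : 2;

	if (*len < need)
		return -1;
	fh[0] = fid->ino;
	fh[1] = fid->gen;
	if (fid->has_parent) {
		fh[2] = fid->parent_ino;
		fh[3] = fid->parent_gen;
	}
	*len = need;
	return fid->has_parent ? FHF_INO32_GEN_PARENT : FHF_INO32_GEN;
}

/* Decode using 'type' to decide which words are meaningful. 'len' is
 * only used to reject short fragments, never to infer the layout,
 * since it may include trailing NUL padding. */
static int fhf_decode(const uint32_t *fh, int len, int type,
		      struct fhf_fid *out)
{
	switch (type) {
	case FHF_INO32_GEN_PARENT:
		if (len < 4)
			return -1;
		out->parent_ino = fh[2];
		out->parent_gen = fh[3];
		out->has_parent = 1;
		break;
	case FHF_INO32_GEN:
		if (len < 2)
			return -1;
		out->parent_ino = 0;
		out->parent_gen = 0;
		out->has_parent = 0;
		break;
	default:
		return -1; /* unknown type */
	}
	out->ino = fh[0];
	out->gen = fh[1];
	return 0;
}
```

Decoding a 2-word fragment with a stated length of 6 words still yields just the inode and generation, because the trailing padding is never consulted.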