diff options
Diffstat (limited to 'Documentation/filesystems/caching/object.txt')
-rw-r--r-- | Documentation/filesystems/caching/object.txt | 313 |
1 files changed, 313 insertions, 0 deletions
diff --git a/Documentation/filesystems/caching/object.txt b/Documentation/filesystems/caching/object.txt new file mode 100644 index 000000000000..e8b0a35d8fe5 --- /dev/null +++ b/Documentation/filesystems/caching/object.txt | |||
@@ -0,0 +1,313 @@ | |||
1 | ==================================================== | ||
2 | IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT | ||
3 | ==================================================== | ||
4 | |||
5 | By: David Howells <dhowells@redhat.com> | ||
6 | |||
7 | Contents: | ||
8 | |||
9 | (*) Representation | ||
10 | |||
11 | (*) Object management state machine. | ||
12 | |||
13 | - Provision of cpu time. | ||
14 | - Locking simplification. | ||
15 | |||
16 | (*) The set of states. | ||
17 | |||
18 | (*) The set of events. | ||
19 | |||
20 | |||
21 | ============== | ||
22 | REPRESENTATION | ||
23 | ============== | ||
24 | |||
25 | FS-Cache maintains an in-kernel representation of each object that a netfs is | ||
26 | currently interested in. Such objects are represented by the fscache_cookie | ||
27 | struct and are referred to as cookies. | ||
28 | |||
29 | FS-Cache also maintains a separate in-kernel representation of the objects that | ||
30 | a cache backend is currently actively caching. Such objects are represented by | ||
31 | the fscache_object struct. The cache backends allocate these upon request, and | ||
32 | are expected to embed them in their own representations. These are referred to | ||
33 | as objects. | ||
34 | |||
35 | There is a 1:N relationship between cookies and objects. A cookie may be | ||
36 | represented by multiple objects - an index may exist in more than one cache - | ||
37 | or even by no objects (it may not be cached). | ||
38 | |||
39 | Furthermore, both cookies and objects are hierarchical. The two hierarchies | ||
40 | correspond, but the cookies tree is a superset of the union of the object trees | ||
41 | of multiple caches: | ||
42 | |||
43 | NETFS INDEX TREE : CACHE 1 : CACHE 2 | ||
44 | : : | ||
45 | : +-----------+ : | ||
46 | +----------->| IObject | : | ||
47 | +-----------+ | : +-----------+ : | ||
48 | | ICookie |-------+ : | : | ||
49 | +-----------+ | : | : +-----------+ | ||
50 | | +------------------------------>| IObject | | ||
51 | | : | : +-----------+ | ||
52 | | : V : | | ||
53 | | : +-----------+ : | | ||
54 | V +----------->| IObject | : | | ||
55 | +-----------+ | : +-----------+ : | | ||
56 | | ICookie |-------+ : | : V | ||
57 | +-----------+ | : | : +-----------+ | ||
58 | | +------------------------------>| IObject | | ||
59 | +-----+-----+ : | : +-----------+ | ||
60 | | | : | : | | ||
61 | V | : V : | | ||
62 | +-----------+ | : +-----------+ : | | ||
63 | | ICookie |------------------------->| IObject | : | | ||
64 | +-----------+ | : +-----------+ : | | ||
65 | | V : | : V | ||
66 | | +-----------+ : | : +-----------+ | ||
67 | | | ICookie |-------------------------------->| IObject | | ||
68 | | +-----------+ : | : +-----------+ | ||
69 | V | : V : | | ||
70 | +-----------+ | : +-----------+ : | | ||
71 | | DCookie |------------------------->| DObject | : | | ||
72 | +-----------+ | : +-----------+ : | | ||
73 | | : : | | ||
74 | +-------+-------+ : : | | ||
75 | | | : : | | ||
76 | V V : : V | ||
77 | +-----------+ +-----------+ : : +-----------+ | ||
78 | | DCookie | | DCookie |------------------------>| DObject | | ||
79 | +-----------+ +-----------+ : : +-----------+ | ||
80 | : : | ||
81 | |||
82 | In the above illustration, ICookie and IObject represent indices and DCookie | ||
83 | and DObject represent data storage objects. Indices may have representation in | ||
84 | multiple caches, but currently, non-index objects may not. Objects of any type | ||
85 | may also be entirely unrepresented. | ||
86 | |||
87 | As far as the netfs API goes, the netfs is only actually permitted to see | ||
88 | pointers to the cookies. The cookies themselves and any objects attached to | ||
89 | those cookies are hidden from it. | ||
90 | |||
91 | |||
92 | =============================== | ||
93 | OBJECT MANAGEMENT STATE MACHINE | ||
94 | =============================== | ||
95 | |||
96 | Within FS-Cache, each active object is managed by its own individual state | ||
97 | machine. The state for an object is kept in the fscache_object struct, in | ||
98 | object->state. A cookie may point to a set of objects that are in different | ||
99 | states. | ||
100 | |||
101 | Each state has an action associated with it that is invoked when the machine | ||
102 | wakes up in that state. There are four logical sets of states: | ||
103 | |||
104 | (1) Preparation: states that wait for the parent objects to become ready. The | ||
105 | representations are hierarchical, and it is expected that an object must | ||
106 | be created or accessed with respect to its parent object. | ||
107 | |||
108 | (2) Initialisation: states that perform lookups in the cache and validate | ||
109 | what's found and that create on disk any missing metadata. | ||
110 | |||
111 | (3) Normal running: states that allow netfs operations on objects to proceed | ||
112 | and that update the state of objects. | ||
113 | |||
114 | (4) Termination: states that detach objects from their netfs cookies, that | ||
115 | delete objects from disk, that handle disk and system errors and that free | ||
116 | up in-memory resources. | ||
117 | |||
118 | |||
119 | In most cases, transitioning between states is in response to signalled events. | ||
120 | When a state has finished processing, it will usually set the mask of events in | ||
121 | which it is interested (object->event_mask) and relinquish the worker thread. | ||
122 | Then when an event is raised (by calling fscache_raise_event()), if the event | ||
123 | is not masked, the object will be queued for processing (by calling | ||
124 | fscache_enqueue_object()). | ||
125 | |||
126 | |||
127 | PROVISION OF CPU TIME | ||
128 | --------------------- | ||
129 | |||
130 | The work to be done by the various states is given CPU time by the threads of | ||
131 | the slow work facility (see Documentation/slow-work.txt). This is used in | ||
132 | preference to the workqueue facility because: | ||
133 | |||
134 | (1) Threads may be completely occupied for very long periods of time by a | ||
135 | particular work item. These state actions may be doing sequences of | ||
136 | synchronous, journalled disk accesses (lookup, mkdir, create, setxattr, | ||
137 | getxattr, truncate, unlink, rmdir, rename). | ||
138 | |||
139 | (2) Threads may do little actual work, but may rather spend a lot of time | ||
140 | sleeping on I/O. This means that single-threaded and 1-per-CPU-threaded | ||
141 | workqueues don't necessarily have the right numbers of threads. | ||
142 | |||
143 | |||
144 | LOCKING SIMPLIFICATION | ||
145 | ---------------------- | ||
146 | |||
147 | Because only one worker thread may be operating on any particular object's | ||
148 | state machine at once, this simplifies the locking, particularly with respect | ||
149 | to disconnecting the netfs's representation of a cache object (fscache_cookie) | ||
150 | from the cache backend's representation (fscache_object) - which may be | ||
151 | requested from either end. | ||
152 | |||
153 | |||
154 | ================= | ||
155 | THE SET OF STATES | ||
156 | ================= | ||
157 | |||
158 | The object state machine has a set of states that it can be in. There are | ||
159 | preparation states in which the object sets itself up and waits for its parent | ||
160 | object to transit to a state that allows access to its children: | ||
161 | |||
162 | (1) State FSCACHE_OBJECT_INIT. | ||
163 | |||
164 | Initialise the object and wait for the parent object to become active. In | ||
165 | the cache, it is expected that it will not be possible to look an object | ||
166 | up from the parent object, until that parent object itself has been looked | ||
167 | up. | ||
168 | |||
169 | There are initialisation states in which the object sets itself up and accesses | ||
170 | disk for the object metadata: | ||
171 | |||
172 | (2) State FSCACHE_OBJECT_LOOKING_UP. | ||
173 | |||
174 | Look up the object on disk, using the parent as a starting point. | ||
175 | FS-Cache expects the cache backend to probe the cache to see whether this | ||
176 | object is represented there, and if it is, to see if it's valid (coherency | ||
177 | management). | ||
178 | |||
179 | The cache should call fscache_object_lookup_negative() to indicate lookup | ||
180 | failure for whatever reason, and should call fscache_obtained_object() to | ||
181 | indicate success. | ||
182 | |||
183 | At the completion of lookup, FS-Cache will let the netfs go ahead with | ||
184 | read operations, no matter whether the file is yet cached. If not yet | ||
185 | cached, read operations will be immediately rejected with ENODATA until | ||
186 | the first known page is uncached - as to that point there can be no data | ||
187 | to be read out of the cache for that file that isn't currently also held | ||
188 | in the pagecache. | ||
189 | |||
190 | (3) State FSCACHE_OBJECT_CREATING. | ||
191 | |||
192 | Create an object on disk, using the parent as a starting point. This | ||
193 | happens if the lookup failed to find the object, or if the object's | ||
194 | coherency data indicated what's on disk is out of date. In this state, | ||
195 | FS-Cache expects the cache to create | ||
196 | |||
197 | The cache should call fscache_obtained_object() if creation completes | ||
198 | successfully, fscache_object_lookup_negative() otherwise. | ||
199 | |||
200 | At the completion of creation, FS-Cache will start processing write | ||
201 | operations the netfs has queued for an object. If creation failed, the | ||
202 | write ops will be transparently discarded, and nothing recorded in the | ||
203 | cache. | ||
204 | |||
205 | There are some normal running states in which the object spends its time | ||
206 | servicing netfs requests: | ||
207 | |||
208 | (4) State FSCACHE_OBJECT_AVAILABLE. | ||
209 | |||
210 | A transient state in which pending operations are started, child objects | ||
211 | are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary | ||
212 | lookup data is freed. | ||
213 | |||
214 | (5) State FSCACHE_OBJECT_ACTIVE. | ||
215 | |||
216 | The normal running state. In this state, requests the netfs makes will be | ||
217 | passed on to the cache. | ||
218 | |||
219 | (6) State FSCACHE_OBJECT_UPDATING. | ||
220 | |||
221 | The state machine comes here to update the object in the cache from the | ||
222 | netfs's records. This involves updating the auxiliary data that is used | ||
223 | to maintain coherency. | ||
224 | |||
225 | And there are terminal states in which an object cleans itself up, deallocates | ||
226 | memory and potentially deletes stuff from disk: | ||
227 | |||
228 | (7) State FSCACHE_OBJECT_LC_DYING. | ||
229 | |||
230 | The object comes here if it is dying because of a lookup or creation | ||
231 | error. This would be due to a disk error or system error of some sort. | ||
232 | Temporary data is cleaned up, and the parent is released. | ||
233 | |||
234 | (8) State FSCACHE_OBJECT_DYING. | ||
235 | |||
236 | The object comes here if it is dying due to an error, because its parent | ||
237 | cookie has been relinquished by the netfs or because the cache is being | ||
238 | withdrawn. | ||
239 | |||
240 | Any child objects waiting on this one are given CPU time so that they too | ||
241 | can destroy themselves. This object waits for all its children to go away | ||
242 | before advancing to the next state. | ||
243 | |||
244 | (9) State FSCACHE_OBJECT_ABORT_INIT. | ||
245 | |||
246 | The object comes to this state if it was waiting on its parent in | ||
247 | FSCACHE_OBJECT_INIT, but its parent died. The object will destroy itself | ||
248 | so that the parent may proceed from the FSCACHE_OBJECT_DYING state. | ||
249 | |||
250 | (10) State FSCACHE_OBJECT_RELEASING. | ||
251 | (11) State FSCACHE_OBJECT_RECYCLING. | ||
252 | |||
253 | The object comes to one of these two states when dying once it is rid of | ||
254 | all its children, if it is dying because the netfs relinquished its | ||
255 | cookie. In the first state, the cached data is expected to persist, and | ||
256 | in the second it will be deleted. | ||
257 | |||
258 | (12) State FSCACHE_OBJECT_WITHDRAWING. | ||
259 | |||
260 | The object transits to this state if the cache decides it wants to | ||
261 | withdraw the object from service, perhaps to make space, but also due to | ||
262 | error or just because the whole cache is being withdrawn. | ||
263 | |||
264 | (13) State FSCACHE_OBJECT_DEAD. | ||
265 | |||
266 | The object transits to this state when the in-memory object record is | ||
267 | ready to be deleted. The object processor shouldn't ever see an object in | ||
268 | this state. | ||
269 | |||
270 | |||
271 | THE SET OF EVENTS | ||
272 | ----------------- | ||
273 | |||
274 | There are a number of events that can be raised to an object state machine: | ||
275 | |||
276 | (*) FSCACHE_OBJECT_EV_UPDATE | ||
277 | |||
278 | The netfs requested that an object be updated. The state machine will ask | ||
279 | the cache backend to update the object, and the cache backend will ask the | ||
280 | netfs for details of the change through its cookie definition ops. | ||
281 | |||
282 | (*) FSCACHE_OBJECT_EV_CLEARED | ||
283 | |||
284 | This is signalled in two circumstances: | ||
285 | |||
286 | (a) when an object's last child object is dropped and | ||
287 | |||
288 | (b) when the last operation outstanding on an object is completed. | ||
289 | |||
290 | This is used to proceed from the dying state. | ||
291 | |||
292 | (*) FSCACHE_OBJECT_EV_ERROR | ||
293 | |||
294 | This is signalled when an I/O error occurs during the processing of some | ||
295 | object. | ||
296 | |||
297 | (*) FSCACHE_OBJECT_EV_RELEASE | ||
298 | (*) FSCACHE_OBJECT_EV_RETIRE | ||
299 | |||
300 | These are signalled when the netfs relinquishes a cookie it was using. | ||
301 | The event selected depends on whether the netfs asks for the backing | ||
302 | object to be retired (deleted) or retained. | ||
303 | |||
304 | (*) FSCACHE_OBJECT_EV_WITHDRAW | ||
305 | |||
306 | This is signalled when the cache backend wants to withdraw an object. | ||
307 | This means that the object will have to be detached from the netfs's | ||
308 | cookie. | ||
309 | |||
310 | Because the withdrawing releasing/retiring events are all handled by the object | ||
311 | state machine, it doesn't matter if there's a collision with both ends trying | ||
312 | to sever the connection at the same time. The state machine can just pick | ||
313 | which one it wants to honour, and that effects the other. | ||