Diffstat (limited to 'Documentation/filesystems/fuse.txt')
-rw-r--r-- | Documentation/filesystems/fuse.txt | 118 |
1 files changed, 71 insertions, 47 deletions
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt
index 33f74310d161..a584f05403a4 100644
--- a/Documentation/filesystems/fuse.txt
+++ b/Documentation/filesystems/fuse.txt
@@ -18,6 +18,14 @@ Non-privileged mount (or user mount): | |||
18 | user. NOTE: this is not the same as mounts allowed with the "user" | 18 | user. NOTE: this is not the same as mounts allowed with the "user" |
19 | option in /etc/fstab, which is not discussed here. | 19 | option in /etc/fstab, which is not discussed here. |
20 | 20 | ||
21 | Filesystem connection: | ||
22 | |||
23 | A connection between the filesystem daemon and the kernel. The | ||
24 | connection exists until either the daemon dies, or the filesystem is | ||
25 | umounted. Note that detaching (or lazy umounting) the filesystem | ||
26 | does _not_ break the connection, in this case it will exist until | ||
27 | the last reference to the filesystem is released. | ||
28 | |||
21 | Mount owner: | 29 | Mount owner: |
22 | 30 | ||
23 | The user who does the mounting. | 31 | The user who does the mounting. |
@@ -86,16 +94,20 @@ Mount options | |||
86 | The default is infinite. Note that the size of read requests is | 94 | The default is infinite. Note that the size of read requests is |
87 | limited anyway to 32 pages (which is 128kbyte on i386). | 95 | limited anyway to 32 pages (which is 128kbyte on i386). |
88 | 96 | ||
89 | Sysfs | 97 | Control filesystem |
90 | ~~~~~ | 98 | ~~~~~~~~~~~~~~~~~~ |
99 | |||
100 | There's a control filesystem for FUSE, which can be mounted by: | ||
91 | 101 | ||
92 | FUSE sets up the following hierarchy in sysfs: | 102 | mount -t fusectl none /sys/fs/fuse/connections |
93 | 103 | ||
94 | /sys/fs/fuse/connections/N/ | 104 | Mounting it under the '/sys/fs/fuse/connections' directory makes it |
105 | backwards compatible with earlier versions. | ||
95 | 106 | ||
96 | where N is an increasing number allocated to each new connection. | 107 | Under the fuse control filesystem each connection has a directory |
108 | named by a unique number. | ||
97 | 109 | ||
98 | For each connection the following attributes are defined: | 110 | For each connection the following files exist within this directory: |
99 | 111 | ||
100 | 'waiting' | 112 | 'waiting' |
101 | 113 | ||
@@ -110,7 +122,47 @@ For each connection the following attributes are defined: | |||
110 | connection. This means that all waiting requests will be aborted and an | 122 | connection. This means that all waiting requests will be aborted and an |
111 | error returned for all aborted and new requests. | 123 | error returned for all aborted and new requests. |
112 | 124 | ||
113 | Only a privileged user may read or write these attributes. | 125 | Only the owner of the mount may read or write these files. |
126 | |||
127 | Interrupting filesystem operations | ||
128 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
129 | |||
130 | If a process issuing a FUSE filesystem request is interrupted, the | ||
131 | following will happen: | ||
132 | |||
133 | 1) If the request is not yet sent to userspace AND the signal is | ||
134 | fatal (SIGKILL or unhandled fatal signal), then the request is | ||
135 | dequeued and returns immediately. | ||
136 | |||
137 | 2) If the request is not yet sent to userspace AND the signal is not | ||
138 | fatal, then an 'interrupted' flag is set for the request. When | ||
139 | the request has been successfully transferred to userspace and | ||
140 | this flag is set, an INTERRUPT request is queued. | ||
141 | |||
142 | 3) If the request is already sent to userspace, then an INTERRUPT | ||
143 | request is queued. | ||
144 | |||
145 | INTERRUPT requests take precedence over other requests, so the | ||
146 | userspace filesystem will receive queued INTERRUPTs before any others. | ||
147 | |||
148 | The userspace filesystem may ignore the INTERRUPT requests entirely, | ||
149 | or may honor them by sending a reply to the _original_ request, with | ||
150 | the error set to EINTR. | ||
151 | |||
152 | It is also possible that there's a race between processing the | ||
153 | original request and its INTERRUPT request. There are two possibilities: | ||
154 | |||
155 | 1) The INTERRUPT request is processed before the original request is | ||
156 | processed | ||
157 | |||
158 | 2) The INTERRUPT request is processed after the original request has | ||
159 | been answered | ||
160 | |||
161 | If the filesystem cannot find the original request, it should wait for | ||
162 | some timeout and/or a number of new requests to arrive, after which it | ||
163 | should reply to the INTERRUPT request with an EAGAIN error. In case | ||
164 | 1) the INTERRUPT request will be requeued. In case 2) the INTERRUPT | ||
165 | reply will be ignored. | ||
114 | 166 | ||
115 | Aborting a filesystem connection | 167 | Aborting a filesystem connection |
116 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 168 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
@@ -139,8 +191,8 @@ the filesystem. There are several ways to do this: | |||
139 | - Use forced umount (umount -f). Works in all cases but only if | 191 | - Use forced umount (umount -f). Works in all cases but only if |
140 | filesystem is still attached (it hasn't been lazy unmounted) | 192 | filesystem is still attached (it hasn't been lazy unmounted) |
141 | 193 | ||
142 | - Abort filesystem through the sysfs interface. Most powerful | 194 | - Abort filesystem through the FUSE control filesystem. Most |
143 | method, always works. | 195 | powerful method, always works. |
144 | 196 | ||
145 | How do non-privileged mounts work? | 197 | How do non-privileged mounts work? |
146 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 198 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
@@ -304,25 +356,7 @@ Scenario 1 - Simple deadlock | |||
304 | | | for "file"] | 356 | | | for "file"] |
305 | | | *DEADLOCK* | 357 | | | *DEADLOCK* |
306 | 358 | ||
307 | The solution for this is to allow requests to be interrupted while | 359 | The solution for this is to allow the filesystem to be aborted. |
308 | they are in userspace: | ||
309 | |||
310 | | [interrupted by signal] | | ||
311 | | <fuse_unlink() | | ||
312 | | [release semaphore] | [semaphore acquired] | ||
313 | | <sys_unlink() | | ||
314 | | | >fuse_unlink() | ||
315 | | | [queue req on fc->pending] | ||
316 | | | [wake up fc->waitq] | ||
317 | | | [sleep on req->waitq] | ||
318 | |||
319 | If the filesystem daemon was single threaded, this will stop here, | ||
320 | since there's no other thread to dequeue and execute the request. | ||
321 | In this case the solution is to kill the FUSE daemon as well. If | ||
322 | there are multiple serving threads, you just have to kill them as | ||
323 | long as any remain. | ||
324 | |||
325 | Moral: a filesystem which deadlocks, can soon find itself dead. | ||
326 | 360 | ||
327 | Scenario 2 - Tricky deadlock | 361 | Scenario 2 - Tricky deadlock |
328 | ---------------------------- | 362 | ---------------------------- |
@@ -355,24 +389,14 @@ but is caused by a pagefault. | |||
355 | | | [lock page] | 389 | | | [lock page] |
356 | | | * DEADLOCK * | 390 | | | * DEADLOCK * |
357 | 391 | ||
358 | Solution is again to let the request be interrupted (not | 392 | Solution is basically the same as above. |
359 | elaborated further). | ||
360 | |||
361 | An additional problem is that while the write buffer is being | ||
362 | copied to the request, the request must not be interrupted. This | ||
363 | is because the destination address of the copy may not be valid | ||
364 | after the request is interrupted. | ||
365 | |||
366 | This is solved with doing the copy atomically, and allowing | ||
367 | interruption while the page(s) belonging to the write buffer are | ||
368 | faulted with get_user_pages(). The 'req->locked' flag indicates | ||
369 | when the copy is taking place, and interruption is delayed until | ||
370 | this flag is unset. | ||
371 | 393 | ||
372 | Scenario 3 - Tricky deadlock with asynchronous read | 394 | An additional problem is that while the write buffer is being copied |
373 | --------------------------------------------------- | 395 | to the request, the request must not be interrupted/aborted. This is |
396 | because the destination address of the copy may not be valid after the | ||
397 | request has returned. | ||
374 | 398 | ||
375 | The same situation as above, except thread-1 will wait on page lock | 399 | This is solved with doing the copy atomically, and allowing abort |
376 | and hence it will be uninterruptible as well. The solution is to | 400 | while the page(s) belonging to the write buffer are faulted with |
377 | abort the connection with forced umount (if mount is attached) or | 401 | get_user_pages(). The 'req->locked' flag indicates when the copy is |
378 | through the abort attribute in sysfs. | 402 | taking place, and abort is delayed until this flag is unset. |
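
Note on the "Interrupting filesystem operations" text added by this patch:
the userspace side of the protocol can be illustrated with a small toy
model. The sketch below is not libfuse code and not the kernel
implementation. The class ToyDaemon, its methods, and the 'patience'
parameter are hypothetical names invented for illustration. It assumes
the "wait for some timeout and/or a number of new requests" rule is
approximated by counting new requests only, and it does not model the
kernel requeueing an INTERRUPT after an EAGAIN reply.

```python
import errno


class ToyDaemon:
    """Toy model of a userspace filesystem's request table (hypothetical)."""

    def __init__(self, patience=2):
        self.in_flight = set()    # unique ids of requests being processed
        self.parked = {}          # INTERRUPT id -> new requests left to wait for
        self.replies = []         # (unique id, error) pairs "sent" to the kernel
        self.patience = patience  # assumption: wait this many new requests

    def receive(self, unique):
        """A normal request arrives; parked INTERRUPTs get one step older."""
        self.in_flight.add(unique)
        for intr in list(self.parked):
            self.parked[intr] -= 1
            if self.parked[intr] <= 0:
                # Waited long enough: reply to the INTERRUPT itself with EAGAIN.
                del self.parked[intr]
                self.replies.append((intr, -errno.EAGAIN))

    def finish(self, unique, error=0):
        """Reply to a request that is still in flight."""
        if unique in self.in_flight:
            self.in_flight.remove(unique)
            self.replies.append((unique, error))

    def interrupt(self, intr_unique, target):
        """An INTERRUPT request for the request 'target' arrives."""
        if target in self.in_flight:
            # Honor it: answer the *original* request with EINTR.
            self.finish(target, -errno.EINTR)
        else:
            # Original not found (race case 1 or 2): park the INTERRUPT.
            self.parked[intr_unique] = self.patience


if __name__ == "__main__":
    d = ToyDaemon(patience=2)
    d.receive(10)
    d.interrupt(11, target=10)   # original found: request 10 gets -EINTR
    d.interrupt(12, target=99)   # original unknown: INTERRUPT 12 is parked
    d.receive(13)
    d.receive(14)                # second new request: 12 is answered -EAGAIN
    print(d.replies)
```

In this model a filesystem that chooses to ignore INTERRUPT requests
entirely would simply make interrupt() a no-op, which the text above
also permits.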