-rw-r--r-- Documentation/filesystems/orangefs.txt 50
1 files changed, 46 insertions, 4 deletions
diff --git a/Documentation/filesystems/orangefs.txt b/Documentation/filesystems/orangefs.txt
index e1a0056a365f..1dfdec790946 100644
--- a/Documentation/filesystems/orangefs.txt
+++ b/Documentation/filesystems/orangefs.txt
@@ -281,7 +281,7 @@ on the wait queue and one attempt is made to recycle them. Obviously,
 if the client-core stays dead too long, the arbitrary userspace processes
 trying to use Orangefs will be negatively affected. Waiting ops
 that can't be serviced will be removed from the request list and
 have their states set to "given up". In-progress ops that can't
 be serviced will be removed from the in_progress hash table and
 have their states set to "given up".
 
@@ -338,7 +338,7 @@ particular response.
   PVFS2_VFS_OP_STATFS
     fill a pvfs2_statfs_response_t with useless info <g>. It is hard for
     us to know, in a timely fashion, these statistics about our
     distributed network filesystem.
 
   PVFS2_VFS_OP_FS_MOUNT
     fill a pvfs2_fs_mount_response_t which is just like a PVFS_object_kref
@@ -386,7 +386,7 @@ responses:
 
   io_array[1].iov_base = address of global variable "pdev_magic" (int32_t)
   io_array[1].iov_len = sizeof(int32_t)
 
   io_array[2].iov_base = address of parameter "tag" (PVFS_id_gen_t)
   io_array[2].iov_len = sizeof(int64_t)
 
@@ -402,5 +402,47 @@ Readdir responses initialize the fifth element io_array like this:
   io_array[4].iov_len = contents of member trailer_size (PVFS_size)
                         from out_downcall member of global variable
                         vfs_request
 
+Orangefs exploits the dcache in order to avoid sending redundant
+requests to userspace. We keep object inode attributes up-to-date with
+orangefs_inode_getattr. Orangefs_inode_getattr uses two arguments to
+help it decide whether or not to update an inode: "new" and "bypass".
+Orangefs keeps private data in an object's inode that includes a short
+timeout value, getattr_time, which allows any iteration of
+orangefs_inode_getattr to know how long it has been since the inode was
+updated. When the object is not new (new == 0) and the bypass flag is not
+set (bypass == 0) orangefs_inode_getattr returns without updating the inode
+if getattr_time has not timed out. Getattr_time is updated each time the
+inode is updated.
+
+Creation of a new object (file, dir, sym-link) includes the evaluation of
+its pathname, resulting in a negative directory entry for the object.
+A new inode is allocated and associated with the dentry, turning it from
+a negative dentry into a "productive full member of society". Orangefs
+obtains the new inode from Linux with new_inode() and associates
+the inode with the dentry by sending the pair back to Linux with
+d_instantiate().
+
+The evaluation of a pathname for an object resolves to its corresponding
+dentry. If there is no corresponding dentry, one is created for it in
+the dcache. Whenever a dentry is modified or verified Orangefs stores a
+short timeout value in the dentry's d_time, and the dentry will be trusted
+for that amount of time. Orangefs is a network filesystem, and objects
+can potentially change out-of-band with any particular Orangefs kernel module
+instance, so trusting a dentry is risky. The alternative to trusting
+dentries is to always obtain the needed information from userspace - at
+least a trip to the client-core, maybe to the servers. Obtaining information
+from a dentry is cheap, obtaining it from userspace is relatively expensive,
+hence the motivation to use the dentry when possible.
+
+The timeout values d_time and getattr_time are jiffy based, and the
+code is designed to avoid the jiffy-wrap problem:
+
+"In general, if the clock may have wrapped around more than once, there
+is no way to tell how much time has elapsed. However, if the times t1
+and t2 are known to be fairly close, we can reliably compute the
+difference in a way that takes into account the possibility that the
+clock may have wrapped between times."
+
+  from course notes by instructor Andy Wang
 
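The jiffy-wrap handling that the added text describes can be illustrated with a small userspace sketch. It is not the Orangefs code: the names tick_t, tick_after, tick_deadline and needs_refresh are hypothetical, a 32-bit counter stands in for jiffies so that the wrap is easy to reach, and it assumes the stored timeout value is an absolute expiry tick. The comparison follows the same signed-difference idea as the kernel's time_after() macro, which only gives the right answer when the two times are less than half the counter range apart.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the jiffies counter; 32 bits wide here so
 * that wraparound is easy to demonstrate. */
typedef uint32_t tick_t;

/* True if a is later than b, assuming the two values are less than
 * half the counter range apart. The unsigned subtraction wraps modulo
 * 2^32, and reading the result as signed recovers the small signed
 * distance between the two times. */
static bool tick_after(tick_t a, tick_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Expiry tick for a timeout starting now (assumption: the stored
 * timeout value is an absolute expiry time, not a duration). */
static tick_t tick_deadline(tick_t now, tick_t timeout)
{
    return now + timeout; /* may wrap past zero, which is fine */
}

/* Hypothetical mirror of the decision described in the text: always
 * refresh a new object or when bypass is set, otherwise refresh only
 * once the stored deadline has passed. */
static bool needs_refresh(tick_t now, tick_t deadline,
                          bool is_new, bool bypass)
{
    if (is_new || bypass)
        return true;
    return tick_after(now, deadline);
}
```

With now = 0xFFFFFFF0 and a timeout of 0x20 ticks the deadline wraps to 0x10. A plain "now > deadline" test would report the entry as expired immediately, while the signed-difference test keeps trusting it until the counter actually passes 0x10.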