
Orangefs: update orangefs.txt

Describe use of jiffy-based timeout values involved in inode maintenance.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
hifive-unleashed-5.1
Mike Marshall 2016-08-01 14:01:40 -04:00 committed by Martin Brandenburg
parent 8bbb20a863
commit 302f0493f0
1 changed files with 46 additions and 4 deletions

@@ -281,7 +281,7 @@ on the wait queue and one attempt is made to recycle them. Obviously,
if the client-core stays dead too long, the arbitrary userspace processes
trying to use Orangefs will be negatively affected. Waiting ops
that can't be serviced will be removed from the request list and
have their states set to "given up". In-progress ops that can't
be serviced will be removed from the in_progress hash table and
have their states set to "given up".
@@ -338,7 +338,7 @@ particular response.
PVFS2_VFS_OP_STATFS
fill a pvfs2_statfs_response_t with useless info <g>. It is hard for
us to know, in a timely fashion, these statistics about our
distributed network filesystem.
PVFS2_VFS_OP_FS_MOUNT
fill a pvfs2_fs_mount_response_t which is just like a PVFS_object_kref
@@ -386,7 +386,7 @@ responses:
io_array[1].iov_base = address of global variable "pdev_magic" (int32_t)
io_array[1].iov_len = sizeof(int32_t)
io_array[2].iov_base = address of parameter "tag" (PVFS_id_gen_t)
io_array[2].iov_len = sizeof(int64_t)
@@ -402,5 +402,47 @@ Readdir responses initialize the fifth element io_array like this:
io_array[4].iov_len = contents of member trailer_size (PVFS_size)
from out_downcall member of global variable
vfs_request

Orangefs exploits the dcache in order to avoid sending redundant
requests to userspace. We keep object inode attributes up-to-date with
orangefs_inode_getattr. orangefs_inode_getattr uses two arguments to
help it decide whether or not to update an inode: "new" and "bypass".
Orangefs keeps private data in an object's inode that includes a short
timeout value, getattr_time, which allows any invocation of
orangefs_inode_getattr to know how long it has been since the inode was
updated. When the object is not new (new == 0) and the bypass flag is not
set (bypass == 0), orangefs_inode_getattr returns without updating the inode
if getattr_time has not timed out. getattr_time is updated each time the
inode is updated.
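
The decision described above can be sketched in plain userspace C. This
is only an illustration, not the kernel code: struct sketch_inode and the
sketch_* helpers are hypothetical names, the current jiffy count is
passed in as "now", and getattr_time is assumed to hold the jiffy value
at which the cached attributes expire.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the private per-inode data; only the
 * timeout field described in the text is modeled. */
struct sketch_inode {
	unsigned long getattr_time;	/* assumed: jiffy value at which attributes expire */
};

/* Wrap-safe "a is before b": compare the signed difference. */
static bool sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Returns true when the inode attributes must be fetched again. */
static bool sketch_needs_getattr(const struct sketch_inode *inode,
				 unsigned long now, int new, int bypass)
{
	/* Not new, no bypass, timeout not reached: keep the cached copy. */
	if (!new && !bypass && sketch_time_before(now, inode->getattr_time))
		return false;
	return true;
}

/* Called each time the inode has been updated. */
static void sketch_mark_updated(struct sketch_inode *inode,
				unsigned long now, unsigned long timeout)
{
	inode->getattr_time = now + timeout;
}
```

With a timeout of 50 jiffies set at jiffy 1000, a call at jiffy 1010
with new == 0 and bypass == 0 skips the update, while either flag being
set, or a call at jiffy 1050 or later, forces one.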

Creation of a new object (file, dir, symlink) includes the evaluation of
its pathname, resulting in a negative directory entry for the object.
A new inode is allocated and associated with the dentry, turning it from
a negative dentry into a "productive full member of society". Orangefs
obtains the new inode from Linux with new_inode() and associates
the inode with the dentry by sending the pair back to Linux with
d_instantiate().

The evaluation of a pathname for an object resolves to its corresponding
dentry. If there is no corresponding dentry, one is created for it in
the dcache. Whenever a dentry is modified or verified, Orangefs stores a
short timeout value in the dentry's d_time, and the dentry will be trusted
for that amount of time. Orangefs is a network filesystem, and objects
can potentially change out-of-band with any particular Orangefs kernel module
instance, so trusting a dentry is risky. The alternative to trusting
dentries is to always obtain the needed information from userspace - at
least a trip to the client-core, maybe to the servers. Obtaining information
from a dentry is cheap; obtaining it from userspace is relatively expensive,
hence the motivation to use the dentry when possible.
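
The trade-off above, trusting the dentry while its timeout holds and
otherwise paying for a trip to userspace, might look roughly like this.
It is a userspace sketch under stated assumptions: struct sketch_dentry,
the sketch_* functions and the upcall callback are hypothetical, and
d_time is assumed to hold the jiffy value until which the dentry is
trusted.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a dentry; only d_time is modeled. */
struct sketch_dentry {
	unsigned long d_time;	/* assumed: jiffy value until which the dentry is trusted */
};

/* Hypothetical callback standing in for the expensive trip to userspace. */
typedef bool (*sketch_upcall_fn)(void *ctx);

/* Record that the dentry was just modified or verified. */
static void sketch_dentry_touch(struct sketch_dentry *d,
				unsigned long now, unsigned long timeout)
{
	d->d_time = now + timeout;
}

/* Wrap-safe check that "now" is still before the stored timeout. */
static bool sketch_dentry_trusted(const struct sketch_dentry *d,
				  unsigned long now)
{
	return (long)(now - d->d_time) < 0;
}

/* Cheap path first; ask userspace only when the timeout has passed. */
static bool sketch_revalidate(struct sketch_dentry *d, unsigned long now,
			      unsigned long timeout,
			      sketch_upcall_fn ask_userspace, void *ctx)
{
	if (sketch_dentry_trusted(d, now))
		return true;
	if (!ask_userspace(ctx))
		return false;	/* object changed or is gone */
	sketch_dentry_touch(d, now, timeout);
	return true;
}

/* Demo upcall: counts trips to userspace and reports the object as valid. */
static bool sketch_count_upcall(void *ctx)
{
	++*(int *)ctx;
	return true;
}
```

A shorter timeout narrows the window in which an out-of-band change can
go unnoticed, at the cost of more trips to userspace.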

The timeout values d_time and getattr_time are jiffy-based, and the
code is designed to avoid the jiffy-wrap problem:

"In general, if the clock may have wrapped around more than once, there
is no way to tell how much time has elapsed. However, if the times t1
and t2 are known to be fairly close, we can reliably compute the
difference in a way that takes into account the possibility that the
clock may have wrapped between times."

                      from course notes by instructor Andy Wang
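
The quoted idea can be shown with unsigned arithmetic in userspace C.
This is a sketch, not the Orangefs code: the function names are
hypothetical, and the signed-difference comparison is the same technique
the kernel's time_before() and time_after() helpers rely on. It is only
valid while the two times are less than half the counter range apart.

```c
#include <limits.h>
#include <stdbool.h>

/* Naive comparison: gives the wrong answer once the counter wraps. */
static bool sketch_naive_before(unsigned long a, unsigned long b)
{
	return a < b;
}

/* Wrap-safe comparison: unsigned subtraction wraps modulo the counter
 * range, and the sign of the result tells which time comes first. */
static bool sketch_wrap_safe_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Elapsed ticks from t1 to t2, correct across a single wrap. */
static unsigned long sketch_elapsed(unsigned long t1, unsigned long t2)
{
	return t2 - t1;
}
```

For example, with now = ULONG_MAX - 5 and a timeout of 10 ticks, the
stored expiry wraps around to 4. The naive test reports that the expiry
has already passed, while the wrap-safe test still sees "now" as before
the expiry, and sketch_elapsed(now, expiry) yields 10.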