author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-05-04 17:59:14 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-05-04 21:21:14 -0400 |
commit | 12f8ad4b0533d9212cb1d5e58ed73d2170114785 (patch) | |
tree | 6bab87d6d25b2ea246904aeabc3692e03c89b923 /fs/namei.c | |
parent | 4f988f152ee087831ea5c1c77cda4454cacc052c (diff) |
vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces
The calling conventions for __d_lookup_rcu() and dentry_cmp() are
annoying in different ways, and there is actually one single underlying
reason for both of the annoyances.
The fundamental reason is that we do the returned dentry sequence number
check inside __d_lookup_rcu() instead of doing it in the caller. This
results in two annoyances:
- __d_lookup_rcu() now not only needs to return the dentry and the
  sequence number that goes along with the lookup, it also needs to
  return the inode pointer that was validated by that sequence number
  check.
- and because we did the sequence number check early (to validate the
  name pointer and length) we also couldn't just pass the dentry itself
  to dentry_cmp(), we had to pass the counted string that contained the
  name.
So that sequence number decision caused two separate ugly calling
conventions.
Both of these problems would be solved if we just did the sequence
number check in the caller instead. There's only one caller, and that
caller already has to do the sequence number check for the parent
anyway, so just do that.
That allows us to stop returning the dentry->d_inode in that in-out
argument (pointer-to-pointer-to-inode), so we can make the inode
argument just a regular input inode pointer. The caller can just load
the inode from dentry->d_inode, and then do the sequence number check
after that to make sure that it's synchronized with the name we looked
up.
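For illustration only (this is not part of the patch): a minimal userspace sketch of the pattern just described, where the caller loads the inode from the dentry and only then re-checks the dentry's sequence count. The names `mini_dentry`, `seq_read_begin()`, `seq_read_retry()`, `seq_write_begin()`, `seq_write_end()` and `load_inode_checked()` are hypothetical stand-ins built on C11 atomics, not the kernel's seqcount API, and the sketch assumes sequentially consistent atomics for simplicity instead of the kernel's explicit barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's seqcount. */
struct seqcount {
	atomic_uint sequence;	/* odd while a writer is mid-update */
};

/* Hypothetical miniature dentry: d_inode is covered by d_seq. */
struct mini_dentry {
	struct seqcount d_seq;
	_Atomic(const void *) d_inode;
};

/* Reader side: sample a stable (even) sequence number. */
static unsigned seq_read_begin(struct seqcount *s)
{
	unsigned seq;

	while ((seq = atomic_load(&s->sequence)) & 1)
		;	/* writer in progress, spin until it finishes */
	return seq;
}

/* Reader side: nonzero if the count moved since seq was sampled. */
static int seq_read_retry(struct seqcount *s, unsigned seq)
{
	return atomic_load(&s->sequence) != seq;
}

/* Writer side: bracket every update of the protected fields. */
static void seq_write_begin(struct seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

static void seq_write_end(struct seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/*
 * Caller-side check: load the inode first, then validate it against
 * the sequence number that the lookup sampled. Returns 0 if the inode
 * belongs to the name that was looked up, -1 if the dentry changed in
 * the meantime and the result must be thrown away.
 */
static int load_inode_checked(struct mini_dentry *d, unsigned seq,
			      const void **inode)
{
	assert(inode != NULL);
	*inode = atomic_load(&d->d_inode);
	if (seq_read_retry(&d->d_seq, seq))
		return -1;
	return 0;
}
```

A caller that gets -1 back must not use the inode pointer; in the patch the corresponding path returns -ECHILD.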
And it allows us to just pass in the dentry to dentry_cmp(), which is
what all the callers really wanted. Sure, dentry_cmp() has to be a bit
careful about the dentry (which is not stable during RCU lookup), but
that's actually very simple.
And now that dentry_cmp() can clearly see that the first string argument
is a dentry, we can use the direct word access for that, instead of the
careful unaligned zero-padding. The dentry name is always properly
aligned, since it is a single path component that is either embedded
into the dentry itself, or was allocated with kmalloc() (see __d_alloc).
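For illustration only (this is not the kernel's dentry_cmp()): a simplified userspace sketch of comparing a word-aligned name against a query string one machine word at a time. The function name `name_cmp_words()` is hypothetical. The sketch assumes the first argument is aligned to `sizeof(unsigned long)`, loads words through `memcpy()` to stay within standard C aliasing rules, and, as a simplification, compares the trailing bytes one at a time instead of masking a final partial word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare the first len bytes of aligned_name and query.
 * Returns 0 if they match, 1 if they differ.
 *
 * Assumption: aligned_name is aligned to sizeof(unsigned long), as a
 * name embedded in a structure or returned by an allocator would be.
 */
static int name_cmp_words(const unsigned char *aligned_name,
			  const unsigned char *query, size_t len)
{
	unsigned long a, b;

	assert((uintptr_t)aligned_name % sizeof(unsigned long) == 0);

	/* Whole words: one load and one compare per sizeof(long) bytes. */
	while (len >= sizeof(unsigned long)) {
		memcpy(&a, aligned_name, sizeof(a));
		memcpy(&b, query, sizeof(b));
		if (a != b)
			return 1;
		aligned_name += sizeof(unsigned long);
		query += sizeof(unsigned long);
		len -= sizeof(unsigned long);
	}

	/* Tail: fewer than sizeof(long) bytes left, compare bytewise. */
	while (len--) {
		if (*aligned_name++ != *query++)
			return 1;
	}
	return 0;
}
```

The byte-wise tail keeps the sketch from ever reading past `len` bytes of either string, which is why it needs no zero-padding tricks.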
Finally, this also uninlines the nasty slow-case for dentry comparisons:
that one *does* need to do a sequence number check, since it will call
in to the low-level filesystems, and we want to give those a stable
inode pointer and path component length/start arguments. Doing an extra
sequence check for that slow case is not a problem, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'fs/namei.c')
-rw-r--r-- | fs/namei.c | 19 |
1 files changed, 16 insertions, 3 deletions
diff --git a/fs/namei.c b/fs/namei.c
index c42791914f82..46bd0045575d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1154,12 +1154,25 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
 	 */
 	if (nd->flags & LOOKUP_RCU) {
 		unsigned seq;
-		*inode = nd->inode;
-		dentry = __d_lookup_rcu(parent, name, &seq, inode);
+		dentry = __d_lookup_rcu(parent, name, &seq, nd->inode);
 		if (!dentry)
 			goto unlazy;
 
-		/* Memory barrier in read_seqcount_begin of child is enough */
+		/*
+		 * This sequence count validates that the inode matches
+		 * the dentry name information from lookup.
+		 */
+		*inode = dentry->d_inode;
+		if (read_seqcount_retry(&dentry->d_seq, seq))
+			return -ECHILD;
+
+		/*
+		 * This sequence count validates that the parent had no
+		 * changes while we did the lookup of the dentry above.
+		 *
+		 * The memory barrier in read_seqcount_begin of child is
+		 * enough, we can use __read_seqcount_retry here.
+		 */
 		if (__read_seqcount_retry(&parent->d_seq, nd->seq))
 			return -ECHILD;
 		nd->seq = seq;