diff options
Diffstat (limited to 'Documentation/filesystems/dentry-locking.txt')
-rw-r--r-- | Documentation/filesystems/dentry-locking.txt | 174 |
1 files changed, 0 insertions, 174 deletions
diff --git a/Documentation/filesystems/dentry-locking.txt b/Documentation/filesystems/dentry-locking.txt deleted file mode 100644 index 79334ed5daa7..000000000000 --- a/Documentation/filesystems/dentry-locking.txt +++ /dev/null | |||
@@ -1,174 +0,0 @@ | |||
1 | RCU-based dcache locking model | ||
2 | ============================== | ||
3 | |||
4 | On many workloads, the most common operation on dcache is to look up a | ||
5 | dentry, given a parent dentry and the name of the child. Typically, | ||
6 | for every open(), stat() etc., the dentry corresponding to the | ||
7 | pathname will be looked up by walking the tree starting with the first | ||
8 | component of the pathname and using that dentry along with the next | ||
9 | component to look up the next level and so on. Since it is a frequent | ||
10 | operation for workloads like multiuser environments and web servers, | ||
11 | it is important to optimize this path. | ||
12 | |||
13 | Prior to 2.5.10, dcache_lock was acquired in d_lookup and thus in | ||
14 | every component during path look-up. Since 2.5.10 onwards, fast-walk | ||
15 | algorithm changed this by holding the dcache_lock at the beginning and | ||
16 | walking as many cached path component dentries as possible. This | ||
17 | significantly decreases the number of acquisition of | ||
18 | dcache_lock. However it also increases the lock hold time | ||
19 | significantly and affects performance in large SMP machines. Since | ||
20 | 2.5.62 kernel, dcache has been using a new locking model that uses RCU | ||
21 | to make dcache look-up lock-free. | ||
22 | |||
23 | The current dcache locking model is not very different from the | ||
24 | existing dcache locking model. Prior to 2.5.62 kernel, dcache_lock | ||
25 | protected the hash chain, d_child, d_alias, d_lru lists as well as | ||
26 | d_inode and several other things like mount look-up. RCU-based changes | ||
27 | affect only the way the hash chain is protected. For everything else | ||
28 | the dcache_lock must be taken for both traversing as well as | ||
29 | updating. The hash chain updates too take the dcache_lock. The | ||
30 | significant change is the way d_lookup traverses the hash chain, it | ||
31 | doesn't acquire the dcache_lock for this and rely on RCU to ensure | ||
32 | that the dentry has not been *freed*. | ||
33 | |||
34 | |||
35 | Dcache locking details | ||
36 | ====================== | ||
37 | |||
38 | For many multi-user workloads, open() and stat() on files are very | ||
39 | frequently occurring operations. Both involve walking of path names to | ||
40 | find the dentry corresponding to the concerned file. In 2.4 kernel, | ||
41 | dcache_lock was held during look-up of each path component. Contention | ||
42 | and cache-line bouncing of this global lock caused significant | ||
43 | scalability problems. With the introduction of RCU in Linux kernel, | ||
44 | this was worked around by making the look-up of path components during | ||
45 | path walking lock-free. | ||
46 | |||
47 | |||
48 | Safe lock-free look-up of dcache hash table | ||
49 | =========================================== | ||
50 | |||
51 | Dcache is a complex data structure with the hash table entries also | ||
52 | linked together in other lists. In 2.4 kernel, dcache_lock protected | ||
53 | all the lists. We applied RCU only on hash chain walking. The rest of | ||
54 | the lists are still protected by dcache_lock. Some of the important | ||
55 | changes are : | ||
56 | |||
57 | 1. The deletion from hash chain is done using hlist_del_rcu() macro | ||
58 | which doesn't initialize next pointer of the deleted dentry and | ||
59 | this allows us to walk safely lock-free while a deletion is | ||
60 | happening. | ||
61 | |||
62 | 2. Insertion of a dentry into the hash table is done using | ||
63 | hlist_add_head_rcu() which take care of ordering the writes - the | ||
64 | writes to the dentry must be visible before the dentry is | ||
65 | inserted. This works in conjunction with hlist_for_each_rcu(), | ||
66 | which has since been replaced by hlist_for_each_entry_rcu(), while | ||
67 | walking the hash chain. The only requirement is that all | ||
68 | initialization to the dentry must be done before | ||
69 | hlist_add_head_rcu() since we don't have dcache_lock protection | ||
70 | while traversing the hash chain. This isn't different from the | ||
71 | existing code. | ||
72 | |||
73 | 3. The dentry looked up without holding dcache_lock by cannot be | ||
74 | returned for walking if it is unhashed. It then may have a NULL | ||
75 | d_inode or other bogosity since RCU doesn't protect the other | ||
76 | fields in the dentry. We therefore use a flag DCACHE_UNHASHED to | ||
77 | indicate unhashed dentries and use this in conjunction with a | ||
78 | per-dentry lock (d_lock). Once looked up without the dcache_lock, | ||
79 | we acquire the per-dentry lock (d_lock) and check if the dentry is | ||
80 | unhashed. If so, the look-up is failed. If not, the reference count | ||
81 | of the dentry is increased and the dentry is returned. | ||
82 | |||
83 | 4. Once a dentry is looked up, it must be ensured during the path walk | ||
84 | for that component it doesn't go away. In pre-2.5.10 code, this was | ||
85 | done holding a reference to the dentry. dcache_rcu does the same. | ||
86 | In some sense, dcache_rcu path walking looks like the pre-2.5.10 | ||
87 | version. | ||
88 | |||
89 | 5. All dentry hash chain updates must take the dcache_lock as well as | ||
90 | the per-dentry lock in that order. dput() does this to ensure that | ||
91 | a dentry that has just been looked up in another CPU doesn't get | ||
92 | deleted before dget() can be done on it. | ||
93 | |||
94 | 6. There are several ways to do reference counting of RCU protected | ||
95 | objects. One such example is in ipv4 route cache where deferred | ||
96 | freeing (using call_rcu()) is done as soon as the reference count | ||
97 | goes to zero. This cannot be done in the case of dentries because | ||
98 | tearing down of dentries require blocking (dentry_iput()) which | ||
99 | isn't supported from RCU callbacks. Instead, tearing down of | ||
100 | dentries happen synchronously in dput(), but actual freeing happens | ||
101 | later when RCU grace period is over. This allows safe lock-free | ||
102 | walking of the hash chains, but a matched dentry may have been | ||
103 | partially torn down. The checking of DCACHE_UNHASHED flag with | ||
104 | d_lock held detects such dentries and prevents them from being | ||
105 | returned from look-up. | ||
106 | |||
107 | |||
108 | Maintaining POSIX rename semantics | ||
109 | ================================== | ||
110 | |||
111 | Since look-up of dentries is lock-free, it can race against a | ||
112 | concurrent rename operation. For example, during rename of file A to | ||
113 | B, look-up of either A or B must succeed. So, if look-up of B happens | ||
114 | after A has been removed from the hash chain but not added to the new | ||
115 | hash chain, it may fail. Also, a comparison while the name is being | ||
116 | written concurrently by a rename may result in false positive matches | ||
117 | violating rename semantics. Issues related to race with rename are | ||
118 | handled as described below : | ||
119 | |||
120 | 1. Look-up can be done in two ways - d_lookup() which is safe from | ||
121 | simultaneous renames and __d_lookup() which is not. If | ||
122 | __d_lookup() fails, it must be followed up by a d_lookup() to | ||
123 | correctly determine whether a dentry is in the hash table or | ||
124 | not. d_lookup() protects look-ups using a sequence lock | ||
125 | (rename_lock). | ||
126 | |||
127 | 2. The name associated with a dentry (d_name) may be changed if a | ||
128 | rename is allowed to happen simultaneously. To avoid memcmp() in | ||
129 | __d_lookup() go out of bounds due to a rename and false positive | ||
130 | comparison, the name comparison is done while holding the | ||
131 | per-dentry lock. This prevents concurrent renames during this | ||
132 | operation. | ||
133 | |||
134 | 3. Hash table walking during look-up may move to a different bucket as | ||
135 | the current dentry is moved to a different bucket due to rename. | ||
136 | But we use hlists in dcache hash table and they are | ||
137 | null-terminated. So, even if a dentry moves to a different bucket, | ||
138 | hash chain walk will terminate. [with a list_head list, it may not | ||
139 | since termination is when the list_head in the original bucket is | ||
140 | reached]. Since we redo the d_parent check and compare name while | ||
141 | holding d_lock, lock-free look-up will not race against d_move(). | ||
142 | |||
143 | 4. There can be a theoretical race when a dentry keeps coming back to | ||
144 | original bucket due to double moves. Due to this look-up may | ||
145 | consider that it has never moved and can end up in a infinite loop. | ||
146 | But this is not any worse that theoretical livelocks we already | ||
147 | have in the kernel. | ||
148 | |||
149 | |||
150 | Important guidelines for filesystem developers related to dcache_rcu | ||
151 | ==================================================================== | ||
152 | |||
153 | 1. Existing dcache interfaces (pre-2.5.62) exported to filesystem | ||
154 | don't change. Only dcache internal implementation changes. However | ||
155 | filesystems *must not* delete from the dentry hash chains directly | ||
156 | using the list macros like allowed earlier. They must use dcache | ||
157 | APIs like d_drop() or __d_drop() depending on the situation. | ||
158 | |||
159 | 2. d_flags is now protected by a per-dentry lock (d_lock). All access | ||
160 | to d_flags must be protected by it. | ||
161 | |||
162 | 3. For a hashed dentry, checking of d_count needs to be protected by | ||
163 | d_lock. | ||
164 | |||
165 | |||
166 | Papers and other documentation on dcache locking | ||
167 | ================================================ | ||
168 | |||
169 | 1. Scaling dcache with RCU (http://linuxjournal.com/article.php?sid=7124). | ||
170 | |||
171 | 2. http://lse.sourceforge.net/locking/dcache/dcache.html | ||
172 | |||
173 | |||
174 | |||