diff options
Diffstat (limited to 'Documentation/filesystems')
| -rw-r--r-- | Documentation/filesystems/directory-locking | 31 | ||||
| -rw-r--r-- | Documentation/filesystems/porting | 8 |
2 files changed, 30 insertions, 9 deletions
diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking index ff7b611abf33..09bbf9a54f80 100644 --- a/Documentation/filesystems/directory-locking +++ b/Documentation/filesystems/directory-locking | |||
| @@ -2,6 +2,10 @@ | |||
| 2 | kinds of locks - per-inode (->i_mutex) and per-filesystem | 2 | kinds of locks - per-inode (->i_mutex) and per-filesystem |
| 3 | (->s_vfs_rename_mutex). | 3 | (->s_vfs_rename_mutex). |
| 4 | 4 | ||
| 5 | When taking the i_mutex on multiple non-directory objects, we | ||
| 6 | always acquire the locks in order by increasing address. We'll call | ||
| 7 | that "inode pointer" order in the following. | ||
| 8 | |||
| 5 | For our purposes all operations fall in 5 classes: | 9 | For our purposes all operations fall in 5 classes: |
| 6 | 10 | ||
| 7 | 1) read access. Locking rules: caller locks directory we are accessing. | 11 | 1) read access. Locking rules: caller locks directory we are accessing. |
| @@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem | |||
| 12 | locks victim and calls the method. | 16 | locks victim and calls the method. |
| 13 | 17 | ||
| 14 | 4) rename() that is _not_ cross-directory. Locking rules: caller locks | 18 | 4) rename() that is _not_ cross-directory. Locking rules: caller locks |
| 15 | the parent, finds source and target, if target already exists - locks it | 19 | the parent and finds source and target. If target already exists, lock |
| 16 | and then calls the method. | 20 | it. If source is a non-directory, lock it. If that means we need to |
| 21 | lock both, lock them in inode pointer order. | ||
| 17 | 22 | ||
| 18 | 5) link creation. Locking rules: | 23 | 5) link creation. Locking rules: |
| 19 | * lock parent | 24 | * lock parent |
| @@ -30,7 +35,9 @@ rules: | |||
| 30 | fail with -ENOTEMPTY | 35 | fail with -ENOTEMPTY |
| 31 | * if new parent is equal to or is a descendent of source | 36 | * if new parent is equal to or is a descendent of source |
| 32 | fail with -ELOOP | 37 | fail with -ELOOP |
| 33 | * if target exists - lock it. | 38 | * If target exists, lock it. If source is a non-directory, lock |
| 39 | it. In case that means we need to lock both source and target, | ||
| 40 | do so in inode pointer order. | ||
| 34 | * call the method. | 41 | * call the method. |
| 35 | 42 | ||
| 36 | 43 | ||
| @@ -56,9 +63,11 @@ objects - A < B iff A is an ancestor of B. | |||
| 56 | renames will be blocked on filesystem lock and we don't start changing | 63 | renames will be blocked on filesystem lock and we don't start changing |
| 57 | the order until we had acquired all locks). | 64 | the order until we had acquired all locks). |
| 58 | 65 | ||
| 59 | (3) any operation holds at most one lock on non-directory object and | 66 | (3) locks on non-directory objects are acquired only after locks on |
| 60 | that lock is acquired after all other locks. (Proof: see descriptions | 67 | directory objects, and are acquired in inode pointer order. |
| 61 | of operations). | 68 | (Proof: all operations but renames take lock on at most one |
| 69 | non-directory object, except renames, which take locks on source and | ||
| 70 | target in inode pointer order in the case they are not directories.) | ||
| 62 | 71 | ||
| 63 | Now consider the minimal deadlock. Each process is blocked on | 72 | Now consider the minimal deadlock. Each process is blocked on |
| 64 | attempt to acquire some lock and already holds at least one lock. Let's | 73 | attempt to acquire some lock and already holds at least one lock. Let's |
| @@ -66,9 +75,13 @@ consider the set of contended locks. First of all, filesystem lock is | |||
| 66 | not contended, since any process blocked on it is not holding any locks. | 75 | not contended, since any process blocked on it is not holding any locks. |
| 67 | Thus all processes are blocked on ->i_mutex. | 76 | Thus all processes are blocked on ->i_mutex. |
| 68 | 77 | ||
| 69 | Non-directory objects are not contended due to (3). Thus link | 78 | By (3), any process holding a non-directory lock can only be |
| 70 | creation can't be a part of deadlock - it can't be blocked on source | 79 | waiting on another non-directory lock with a larger address. Therefore |
| 71 | and it means that it doesn't hold any locks. | 80 | the process holding the "largest" such lock can always make progress, and |
| 81 | non-directory objects are not included in the set of contended locks. | ||
| 82 | |||
| 83 | Thus link creation can't be a part of deadlock - it can't be | ||
| 84 | blocked on source and it means that it doesn't hold any locks. | ||
| 72 | 85 | ||
| 73 | Any contended object is either held by cross-directory rename or | 86 | Any contended object is either held by cross-directory rename or |
| 74 | has a child that is also contended. Indeed, suppose that it is held by | 87 | has a child that is also contended. Indeed, suppose that it is held by |
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index f0890581f7f6..fe2b7ae6f962 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting | |||
| @@ -455,3 +455,11 @@ in your dentry operations instead. | |||
| 455 | vfs_follow_link has been removed. Filesystems must use nd_set_link | 455 | vfs_follow_link has been removed. Filesystems must use nd_set_link |
| 456 | from ->follow_link for normal symlinks, or nd_jump_link for magic | 456 | from ->follow_link for normal symlinks, or nd_jump_link for magic |
| 457 | /proc/<pid> style links. | 457 | /proc/<pid> style links. |
| 458 | -- | ||
| 459 | [mandatory] | ||
| 460 | iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be | ||
| 461 | called with both ->i_lock and inode_hash_lock held; the former is *not* | ||
| 462 | taken anymore, so verify that your callbacks do not rely on it (none | ||
| 463 | of the in-tree instances did). inode_hash_lock is still held, | ||
| 464 | of course, so they are still serialized wrt removal from inode hash, | ||
| 465 | as well as wrt set() callback of iget5_locked(). | ||
