diff options
Diffstat (limited to 'Documentation/locking/mutex-design.txt')
-rw-r--r-- | Documentation/locking/mutex-design.txt | 157 |
1 files changed, 157 insertions, 0 deletions
diff --git a/Documentation/locking/mutex-design.txt b/Documentation/locking/mutex-design.txt new file mode 100644 index 000000000000..ee231ed09ec6 --- /dev/null +++ b/Documentation/locking/mutex-design.txt | |||
@@ -0,0 +1,157 @@ | |||
1 | Generic Mutex Subsystem | ||
2 | |||
3 | started by Ingo Molnar <mingo@redhat.com> | ||
4 | updated by Davidlohr Bueso <davidlohr@hp.com> | ||
5 | |||
6 | What are mutexes? | ||
7 | ----------------- | ||
8 | |||
9 | In the Linux kernel, mutexes refer to a particular locking primitive | ||
10 | that enforces serialization on shared memory systems, and not only to | ||
11 | the generic term referring to 'mutual exclusion' found in academia | ||
12 | or similar theoretical text books. Mutexes are sleeping locks which | ||
13 | behave similarly to binary semaphores, and were introduced in 2006[1] | ||
14 | as an alternative to these. This new data structure provided a number | ||
15 | of advantages, including simpler interfaces, and at that time smaller | ||
16 | code (see Disadvantages). | ||
17 | |||
18 | [1] http://lwn.net/Articles/164802/ | ||
19 | |||
20 | Implementation | ||
21 | -------------- | ||
22 | |||
23 | Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h | ||
24 | and implemented in kernel/locking/mutex.c. These locks use a three | ||
25 | state atomic counter (->count) to represent the different possible | ||
26 | transitions that can occur during the lifetime of a lock: | ||
27 | |||
28 | 1: unlocked | ||
29 | 0: locked, no waiters | ||
30 | negative: locked, with potential waiters | ||
31 | |||
32 | In its most basic form it also includes a wait-queue and a spinlock | ||
33 | that serializes access to it. CONFIG_SMP systems can also include | ||
34 | a pointer to the lock task owner (->owner) as well as a spinner MCS | ||
35 | lock (->osq), both described below in (ii). | ||
36 | |||
37 | When acquiring a mutex, there are three possible paths that can be | ||
38 | taken, depending on the state of the lock: | ||
39 | |||
40 | (i) fastpath: tries to atomically acquire the lock by decrementing the | ||
41 | counter. If it was already taken by another task it goes to the next | ||
42 | possible path. This logic is architecture specific. On x86-64, the | ||
43 | locking fastpath is 2 instructions: | ||
44 | |||
45 | 0000000000000e10 <mutex_lock>: | ||
46 | e21: f0 ff 0b lock decl (%rbx) | ||
47 | e24: 79 08 jns e2e <mutex_lock+0x1e> | ||
48 | |||
49 | the unlocking fastpath is equally tight: | ||
50 | |||
51 | 0000000000000bc0 <mutex_unlock>: | ||
52 | bc8: f0 ff 07 lock incl (%rdi) | ||
53 | bcb: 7f 0a jg bd7 <mutex_unlock+0x17> | ||
54 | |||
55 | |||
56 | (ii) midpath: aka optimistic spinning, tries to spin for acquisition | ||
57 | while the lock owner is running and there are no other tasks ready | ||
58 | to run that have higher priority (need_resched). The rationale is | ||
59 | that if the lock owner is running, it is likely to release the lock | ||
60 | soon. The mutex spinners are queued up using MCS lock so that only | ||
61 | one spinner can compete for the mutex. | ||
62 | |||
63 | The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock | ||
64 | with the desirable properties of being fair and with each cpu trying | ||
65 | to acquire the lock spinning on a local variable. It avoids expensive | ||
66 | cacheline bouncing that common test-and-set spinlock implementations | ||
67 | incur. An MCS-like lock is specially tailored for optimistic spinning | ||
68 | for sleeping lock implementation. An important feature of the customized | ||
69 | MCS lock is that it has the extra property that spinners are able to exit | ||
70 | the MCS spinlock queue when they need to reschedule. This further helps | ||
71 | avoid situations where MCS spinners that need to reschedule would continue | ||
72 | waiting to spin on mutex owner, only to go directly to slowpath upon | ||
73 | obtaining the MCS lock. | ||
74 | |||
75 | |||
76 | (iii) slowpath: last resort, if the lock is still unable to be acquired, | ||
77 | the task is added to the wait-queue and sleeps until woken up by the | ||
78 | unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE. | ||
79 | |||
80 | While formally kernel mutexes are sleepable locks, it is path (ii) that | ||
81 | makes them more practically a hybrid type. By simply not interrupting a | ||
82 | task and busy-waiting for a few cycles instead of immediately sleeping, | ||
83 | the performance of this lock has been seen to significantly improve a | ||
84 | number of workloads. Note that this technique is also used for rw-semaphores. | ||
85 | |||
86 | Semantics | ||
87 | --------- | ||
88 | |||
89 | The mutex subsystem checks and enforces the following rules: | ||
90 | |||
91 | - Only one task can hold the mutex at a time. | ||
92 | - Only the owner can unlock the mutex. | ||
93 | - Multiple unlocks are not permitted. | ||
94 | - Recursive locking/unlocking is not permitted. | ||
95 | - A mutex must only be initialized via the API (see below). | ||
96 | - A task may not exit with a mutex held. | ||
97 | - Memory areas where held locks reside must not be freed. | ||
98 | - Held mutexes must not be reinitialized. | ||
99 | - Mutexes may not be used in hardware or software interrupt | ||
100 | contexts such as tasklets and timers. | ||
101 | |||
102 | These semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled. | ||
103 | In addition, the mutex debugging code also implements a number of other | ||
104 | features that make lock debugging easier and faster: | ||
105 | |||
106 | - Uses symbolic names of mutexes, whenever they are printed | ||
107 | in debug output. | ||
108 | - Point-of-acquire tracking, symbolic lookup of function names, | ||
109 | list of all locks held in the system, printout of them. | ||
110 | - Owner tracking. | ||
111 | - Detects self-recursing locks and prints out all relevant info. | ||
112 | - Detects multi-task circular deadlocks and prints out all affected | ||
113 | locks and tasks (and only those tasks). | ||
114 | |||
115 | |||
116 | Interfaces | ||
117 | ---------- | ||
118 | Statically define the mutex: | ||
119 | DEFINE_MUTEX(name); | ||
120 | |||
121 | Dynamically initialize the mutex: | ||
122 | mutex_init(mutex); | ||
123 | |||
124 | Acquire the mutex, uninterruptible: | ||
125 | void mutex_lock(struct mutex *lock); | ||
126 | void mutex_lock_nested(struct mutex *lock, unsigned int subclass); | ||
127 | int mutex_trylock(struct mutex *lock); | ||
128 | |||
129 | Acquire the mutex, interruptible: | ||
130 | int mutex_lock_interruptible_nested(struct mutex *lock, | ||
131 | unsigned int subclass); | ||
132 | int mutex_lock_interruptible(struct mutex *lock); | ||
133 | |||
134 | Acquire the mutex, interruptible, if dec to 0: | ||
135 | int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); | ||
136 | |||
137 | Unlock the mutex: | ||
138 | void mutex_unlock(struct mutex *lock); | ||
139 | |||
140 | Test if the mutex is taken: | ||
141 | int mutex_is_locked(struct mutex *lock); | ||
142 | |||
143 | Disadvantages | ||
144 | ------------- | ||
145 | |||
146 | Unlike its original design and purpose, 'struct mutex' is larger than | ||
147 | most locks in the kernel. E.g: on x86-64 it is 40 bytes, almost twice | ||
148 | as large as 'struct semaphore' (24 bytes) and 8 bytes shy of the | ||
149 | 'struct rw_semaphore' variant. Larger structure sizes mean more CPU | ||
150 | cache and memory footprint. | ||
151 | |||
152 | When to use mutexes | ||
153 | ------------------- | ||
154 | |||
155 | Unless the strict semantics of mutexes are unsuitable and/or the critical | ||
156 | region prevents the lock from being shared, always prefer them to any other | ||
157 | locking primitive. | ||