diff options
author	Linus Torvalds <torvalds@linux-foundation.org>	2014-06-12 21:48:15 -0400
committer	Linus Torvalds <torvalds@linux-foundation.org>	2014-06-12 21:48:15 -0400
commit	c29deef32e3699e40da3e9e82267610de04e6b54 (patch)
tree	820ab21fe399225f7341499e461ee793a180d414 /Documentation
parent	f9da455b93f6ba076935b4ef4589f61e529ae046 (diff)
parent	bd01ec1a13f9a327950c8e3080096446c7804753 (diff)
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more locking changes from Ingo Molnar:
"This is the second round of locking tree updates for v3.16, offering
large system scalability improvements:
- optimistic spinning for rwsems, from Davidlohr Bueso.
- 'qrwlocks' core code and x86 enablement, from Waiman Long and PeterZ"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, locking/rwlocks: Enable qrwlocks on x86
locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks
locking/mutexes: Documentation update/rewrite
locking/rwsem: Fix checkpatch.pl warnings
locking/rwsem: Fix warnings for CONFIG_RWSEM_GENERIC_SPINLOCK
locking/rwsem: Support optimistic spinning
Diffstat (limited to 'Documentation')
-rw-r--r--	Documentation/mutex-design.txt	252
1 file changed, 135 insertions(+), 117 deletions(-)
diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt
index 1dfe62c3641d..ee231ed09ec6 100644
--- a/Documentation/mutex-design.txt
+++ b/Documentation/mutex-design.txt
@@ -1,139 +1,157 @@
 Generic Mutex Subsystem
 
 started by Ingo Molnar <mingo@redhat.com>
+updated by Davidlohr Bueso <davidlohr@hp.com>
 
-"Why on earth do we need a new mutex subsystem, and what's wrong
-with semaphores?"
-
-firstly, there's nothing wrong with semaphores. But if the simpler
-mutex semantics are sufficient for your code, then there are a couple
-of advantages of mutexes:
-
- - 'struct mutex' is smaller on most architectures: E.g. on x86,
-   'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
-   A smaller structure size means less RAM footprint, and better
-   CPU-cache utilization.
-
- - tighter code. On x86 i get the following .text sizes when
-   switching all mutex-alike semaphores in the kernel to the mutex
-   subsystem:
-
-        text    data     bss     dec     hex filename
-     3280380  868188  396860 4545428  455b94 vmlinux-semaphore
-     3255329  865296  396732 4517357  44eded vmlinux-mutex
-
-   that's 25051 bytes of code saved, or a 0.76% win - off the hottest
-   codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
-   Smaller code means better icache footprint, which is one of the
-   major optimization goals in the Linux kernel currently.
-
- - the mutex subsystem is slightly faster and has better scalability for
-   contended workloads. On an 8-way x86 system, running a mutex-based
-   kernel and testing creat+unlink+close (of separate, per-task files)
-   in /tmp with 16 parallel tasks, the average number of ops/sec is:
-
-   Semaphores:                        Mutexes:
-
-   $ ./test-mutex V 16 10             $ ./test-mutex V 16 10
-   8 CPUs, running 16 tasks.          8 CPUs, running 16 tasks.
-   checking VFS performance.          checking VFS performance.
-   avg loops/sec:      34713          avg loops/sec:      84153
-   CPU utilization:      63%          CPU utilization:      22%
-
-   i.e. in this workload, the mutex based kernel was 2.4 times faster
-   than the semaphore based kernel, _and_ it also had 2.8 times less CPU
-   utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
-   performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
-   performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
-   more efficient.)
-
-   the scalability difference is visible even on a 2-way P4 HT box:
-
-   Semaphores:                        Mutexes:
-
-   $ ./test-mutex V 16 10             $ ./test-mutex V 16 10
-   4 CPUs, running 16 tasks.          8 CPUs, running 16 tasks.
-   checking VFS performance.          checking VFS performance.
-   avg loops/sec:      127659         avg loops/sec:      181082
-   CPU utilization:      100%         CPU utilization:      34%
-
-   (the straight performance advantage of mutexes is 41%, the per-cycle
-    efficiency of mutexes is 4.1 times better.)
-
- - there are no fastpath tradeoffs, the mutex fastpath is just as tight
-   as the semaphore fastpath. On x86, the locking fastpath is 2
-   instructions:
-
-    c0377ccb <mutex_lock>:
-    c0377ccb:       f0 ff 08                lock decl (%eax)
-    c0377cce:       78 0e                   js     c0377cde <.text..lock.mutex>
-    c0377cd0:       c3                      ret
+What are mutexes?
+-----------------
+
+In the Linux kernel, mutexes refer to a particular locking primitive
+that enforces serialization on shared memory systems, and not only to
+the generic term referring to 'mutual exclusion' found in academia
+or similar theoretical text books. Mutexes are sleeping locks which
+behave similarly to binary semaphores, and were introduced in 2006[1]
+as an alternative to these. This new data structure provided a number
+of advantages, including simpler interfaces, and at that time smaller
+code (see Disadvantages).
+
+[1] http://lwn.net/Articles/164802/
+
+Implementation
+--------------
+
+Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
+and implemented in kernel/locking/mutex.c. These locks use a three
+state atomic counter (->count) to represent the different possible
+transitions that can occur during the lifetime of a lock:
+
+	  1: unlocked
+	  0: locked, no waiters
+   negative: locked, with potential waiters
+
+In its most basic form it also includes a wait-queue and a spinlock
+that serializes access to it. CONFIG_SMP systems can also include
+a pointer to the lock task owner (->owner) as well as a spinner MCS
+lock (->osq), both described below in (ii).
+
+When acquiring a mutex, there are three possible paths that can be
+taken, depending on the state of the lock:
+
+(i) fastpath: tries to atomically acquire the lock by decrementing the
+    counter. If it was already taken by another task it goes to the next
+    possible path. This logic is architecture specific. On x86-64, the
+    locking fastpath is 2 instructions:
+
+    0000000000000e10 <mutex_lock>:
+    e21:   f0 ff 0b                lock decl (%rbx)
+    e24:   79 08                   jns    e2e <mutex_lock+0x1e>
 
    the unlocking fastpath is equally tight:
 
-    c0377cd1 <mutex_unlock>:
-    c0377cd1:       f0 ff 00                lock incl (%eax)
-    c0377cd4:       7e 0f                   jle    c0377ce5 <.text..lock.mutex+0x7>
-    c0377cd6:       c3                      ret
+    0000000000000bc0 <mutex_unlock>:
+    bc8:   f0 ff 07                lock incl (%rdi)
+    bcb:   7f 0a                   jg     bd7 <mutex_unlock+0x17>
+
 
- - 'struct mutex' semantics are well-defined and are enforced if
-   CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
-   virtually no debugging code or instrumentation. The mutex subsystem
-   checks and enforces the following rules:
-
-   * - only one task can hold the mutex at a time
-   * - only the owner can unlock the mutex
-   * - multiple unlocks are not permitted
-   * - recursive locking is not permitted
-   * - a mutex object must be initialized via the API
-   * - a mutex object must not be initialized via memset or copying
-   * - task may not exit with mutex held
-   * - memory areas where held locks reside must not be freed
-   * - held mutexes must not be reinitialized
-   * - mutexes may not be used in hardware or software interrupt
-   *   contexts such as tasklets and timers
-
-furthermore, there are also convenience features in the debugging
-code:
-
-   * - uses symbolic names of mutexes, whenever they are printed in debug output
-   * - point-of-acquire tracking, symbolic lookup of function names
-   * - list of all locks held in the system, printout of them
-   * - owner tracking
-   * - detects self-recursing locks and prints out all relevant info
-   * - detects multi-task circular deadlocks and prints out all affected
-   *   locks and tasks (and only those tasks)
+(ii) midpath: aka optimistic spinning, tries to spin for acquisition
+     while the lock owner is running and there are no other tasks ready
+     to run that have higher priority (need_resched). The rationale is
+     that if the lock owner is running, it is likely to release the lock
+     soon. The mutex spinners are queued up using MCS lock so that only
+     one spinner can compete for the mutex.
+
+     The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
+     with the desirable properties of being fair and with each cpu trying
+     to acquire the lock spinning on a local variable. It avoids expensive
+     cacheline bouncing that common test-and-set spinlock implementations
+     incur. An MCS-like lock is specially tailored for optimistic spinning
+     for sleeping lock implementation. An important feature of the customized
+     MCS lock is that it has the extra property that spinners are able to exit
+     the MCS spinlock queue when they need to reschedule. This further helps
+     avoid situations where MCS spinners that need to reschedule would continue
+     waiting to spin on mutex owner, only to go directly to slowpath upon
+     obtaining the MCS lock.
+
+
+(iii) slowpath: last resort, if the lock is still unable to be acquired,
+      the task is added to the wait-queue and sleeps until woken up by the
+      unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.
+
+While formally kernel mutexes are sleepable locks, it is path (ii) that
+makes them more practically a hybrid type. By simply not interrupting a
+task and busy-waiting for a few cycles instead of immediately sleeping,
+the performance of this lock has been seen to significantly improve a
+number of workloads. Note that this technique is also used for rw-semaphores.
+
+Semantics
+---------
+
+The mutex subsystem checks and enforces the following rules:
+
+    - Only one task can hold the mutex at a time.
+    - Only the owner can unlock the mutex.
+    - Multiple unlocks are not permitted.
+    - Recursive locking/unlocking is not permitted.
+    - A mutex must only be initialized via the API (see below).
+    - A task may not exit with a mutex held.
+    - Memory areas where held locks reside must not be freed.
+    - Held mutexes must not be reinitialized.
+    - Mutexes may not be used in hardware or software interrupt
+      contexts such as tasklets and timers.
+
+These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled.
+In addition, the mutex debugging code also implements a number of other
+features that make lock debugging easier and faster:
+
+    - Uses symbolic names of mutexes, whenever they are printed
+      in debug output.
+    - Point-of-acquire tracking, symbolic lookup of function names,
+      list of all locks held in the system, printout of them.
+    - Owner tracking.
+    - Detects self-recursing locks and prints out all relevant info.
+    - Detects multi-task circular deadlocks and prints out all affected
+      locks and tasks (and only those tasks).
+
+
+Interfaces
+----------
+Statically define the mutex:
+   DEFINE_MUTEX(name);
+
+Dynamically initialize the mutex:
+   mutex_init(mutex);
+
+Acquire the mutex, uninterruptible:
+   void mutex_lock(struct mutex *lock);
+   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
+   int  mutex_trylock(struct mutex *lock);
+
+Acquire the mutex, interruptible:
+   int mutex_lock_interruptible_nested(struct mutex *lock,
+                                       unsigned int subclass);
+   int mutex_lock_interruptible(struct mutex *lock);
+
+Acquire the mutex, interruptible, if dec to 0:
+   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
+
+Unlock the mutex:
+   void mutex_unlock(struct mutex *lock);
+
+Test if the mutex is taken:
+   int mutex_is_locked(struct mutex *lock);
 
 Disadvantages
 -------------
 
-The stricter mutex API means you cannot use mutexes the same way you
-can use semaphores: e.g. they cannot be used from an interrupt context,
-nor can they be unlocked from a different context that which acquired
-it. [ I'm not aware of any other (e.g. performance) disadvantages from
-using mutexes at the moment, please let me know if you find any. ]
-
-Implementation of mutexes
--------------------------
-
-'struct mutex' is the new mutex type, defined in include/linux/mutex.h and
-implemented in kernel/locking/mutex.c. It is a counter-based mutex with a
-spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", 0 for
-"locked" and negative numbers (usually -1) for "locked, potential waiters
-queued".
-
-the APIs of 'struct mutex' have been streamlined:
-
-  DEFINE_MUTEX(name);
+Unlike its original design and purpose, 'struct mutex' is larger than
+most locks in the kernel. E.g: on x86-64 it is 40 bytes, almost twice
+as large as 'struct semaphore' (24 bytes) and 8 bytes shy of the
+'struct rw_semaphore' variant. Larger structure sizes mean more CPU
+cache and memory footprint.
 
-  mutex_init(mutex);
+When to use mutexes
+-------------------
 
-  void mutex_lock(struct mutex *lock);
-  int mutex_lock_interruptible(struct mutex *lock);
-  int mutex_trylock(struct mutex *lock);
-  void mutex_unlock(struct mutex *lock);
-  int mutex_is_locked(struct mutex *lock);
-  void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
-  int mutex_lock_interruptible_nested(struct mutex *lock,
-                                      unsigned int subclass);
-  int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
+Unless the strict semantics of mutexes are unsuitable and/or the critical
+region prevents the lock from being shared, always prefer them to any other
+locking primitive.