diff options
Diffstat (limited to 'Documentation/mutex-design.txt')
-rw-r--r-- | Documentation/mutex-design.txt | 135 |
1 files changed, 135 insertions, 0 deletions
diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt new file mode 100644 index 000000000000..cbf79881a41c --- /dev/null +++ b/Documentation/mutex-design.txt | |||
@@ -0,0 +1,135 @@ | |||
1 | Generic Mutex Subsystem | ||
2 | |||
3 | started by Ingo Molnar <mingo@redhat.com> | ||
4 | |||
5 | "Why on earth do we need a new mutex subsystem, and what's wrong | ||
6 | with semaphores?" | ||
7 | |||
8 | firstly, there's nothing wrong with semaphores. But if the simpler | ||
9 | mutex semantics are sufficient for your code, then there are a couple | ||
10 | of advantages of mutexes: | ||
11 | |||
12 | - 'struct mutex' is smaller on most architectures: .e.g on x86, | ||
13 | 'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes. | ||
14 | A smaller structure size means less RAM footprint, and better | ||
15 | CPU-cache utilization. | ||
16 | |||
17 | - tighter code. On x86 i get the following .text sizes when | ||
18 | switching all mutex-alike semaphores in the kernel to the mutex | ||
19 | subsystem: | ||
20 | |||
21 | text data bss dec hex filename | ||
22 | 3280380 868188 396860 4545428 455b94 vmlinux-semaphore | ||
23 | 3255329 865296 396732 4517357 44eded vmlinux-mutex | ||
24 | |||
25 | that's 25051 bytes of code saved, or a 0.76% win - off the hottest | ||
26 | codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%) | ||
27 | Smaller code means better icache footprint, which is one of the | ||
28 | major optimization goals in the Linux kernel currently. | ||
29 | |||
30 | - the mutex subsystem is slightly faster and has better scalability for | ||
31 | contended workloads. On an 8-way x86 system, running a mutex-based | ||
32 | kernel and testing creat+unlink+close (of separate, per-task files) | ||
33 | in /tmp with 16 parallel tasks, the average number of ops/sec is: | ||
34 | |||
35 | Semaphores: Mutexes: | ||
36 | |||
37 | $ ./test-mutex V 16 10 $ ./test-mutex V 16 10 | ||
38 | 8 CPUs, running 16 tasks. 8 CPUs, running 16 tasks. | ||
39 | checking VFS performance. checking VFS performance. | ||
40 | avg loops/sec: 34713 avg loops/sec: 84153 | ||
41 | CPU utilization: 63% CPU utilization: 22% | ||
42 | |||
43 | i.e. in this workload, the mutex based kernel was 2.4 times faster | ||
44 | than the semaphore based kernel, _and_ it also had 2.8 times less CPU | ||
45 | utilization. (In terms of 'ops per CPU cycle', the semaphore kernel | ||
46 | performed 551 ops/sec per 1% of CPU time used, while the mutex kernel | ||
47 | performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times | ||
48 | more efficient.) | ||
49 | |||
50 | the scalability difference is visible even on a 2-way P4 HT box: | ||
51 | |||
52 | Semaphores: Mutexes: | ||
53 | |||
54 | $ ./test-mutex V 16 10 $ ./test-mutex V 16 10 | ||
55 | 4 CPUs, running 16 tasks. 8 CPUs, running 16 tasks. | ||
56 | checking VFS performance. checking VFS performance. | ||
57 | avg loops/sec: 127659 avg loops/sec: 181082 | ||
58 | CPU utilization: 100% CPU utilization: 34% | ||
59 | |||
60 | (the straight performance advantage of mutexes is 41%, the per-cycle | ||
61 | efficiency of mutexes is 4.1 times better.) | ||
62 | |||
63 | - there are no fastpath tradeoffs, the mutex fastpath is just as tight | ||
64 | as the semaphore fastpath. On x86, the locking fastpath is 2 | ||
65 | instructions: | ||
66 | |||
67 | c0377ccb <mutex_lock>: | ||
68 | c0377ccb: f0 ff 08 lock decl (%eax) | ||
69 | c0377cce: 78 0e js c0377cde <.text.lock.mutex> | ||
70 | c0377cd0: c3 ret | ||
71 | |||
72 | the unlocking fastpath is equally tight: | ||
73 | |||
74 | c0377cd1 <mutex_unlock>: | ||
75 | c0377cd1: f0 ff 00 lock incl (%eax) | ||
76 | c0377cd4: 7e 0f jle c0377ce5 <.text.lock.mutex+0x7> | ||
77 | c0377cd6: c3 ret | ||
78 | |||
79 | - 'struct mutex' semantics are well-defined and are enforced if | ||
80 | CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have | ||
81 | virtually no debugging code or instrumentation. The mutex subsystem | ||
82 | checks and enforces the following rules: | ||
83 | |||
84 | * - only one task can hold the mutex at a time | ||
85 | * - only the owner can unlock the mutex | ||
86 | * - multiple unlocks are not permitted | ||
87 | * - recursive locking is not permitted | ||
88 | * - a mutex object must be initialized via the API | ||
89 | * - a mutex object must not be initialized via memset or copying | ||
90 | * - task may not exit with mutex held | ||
91 | * - memory areas where held locks reside must not be freed | ||
92 | * - held mutexes must not be reinitialized | ||
93 | * - mutexes may not be used in irq contexts | ||
94 | |||
95 | furthermore, there are also convenience features in the debugging | ||
96 | code: | ||
97 | |||
98 | * - uses symbolic names of mutexes, whenever they are printed in debug output | ||
99 | * - point-of-acquire tracking, symbolic lookup of function names | ||
100 | * - list of all locks held in the system, printout of them | ||
101 | * - owner tracking | ||
102 | * - detects self-recursing locks and prints out all relevant info | ||
103 | * - detects multi-task circular deadlocks and prints out all affected | ||
104 | * locks and tasks (and only those tasks) | ||
105 | |||
106 | Disadvantages | ||
107 | ------------- | ||
108 | |||
109 | The stricter mutex API means you cannot use mutexes the same way you | ||
110 | can use semaphores: e.g. they cannot be used from an interrupt context, | ||
111 | nor can they be unlocked from a different context that which acquired | ||
112 | it. [ I'm not aware of any other (e.g. performance) disadvantages from | ||
113 | using mutexes at the moment, please let me know if you find any. ] | ||
114 | |||
115 | Implementation of mutexes | ||
116 | ------------------------- | ||
117 | |||
118 | 'struct mutex' is the new mutex type, defined in include/linux/mutex.h | ||
119 | and implemented in kernel/mutex.c. It is a counter-based mutex with a | ||
120 | spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", | ||
121 | 0 for "locked" and negative numbers (usually -1) for "locked, potential | ||
122 | waiters queued". | ||
123 | |||
124 | the APIs of 'struct mutex' have been streamlined: | ||
125 | |||
126 | DEFINE_MUTEX(name); | ||
127 | |||
128 | mutex_init(mutex); | ||
129 | |||
130 | void mutex_lock(struct mutex *lock); | ||
131 | int mutex_lock_interruptible(struct mutex *lock); | ||
132 | int mutex_trylock(struct mutex *lock); | ||
133 | void mutex_unlock(struct mutex *lock); | ||
134 | int mutex_is_locked(struct mutex *lock); | ||
135 | |||