aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--Documentation/DocBook/kernel-locking.tmpl22
-rw-r--r--Documentation/mutex-design.txt135
2 files changed, 149 insertions, 8 deletions
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index 90dc2de8e0af..158ffe9bfade 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -222,7 +222,7 @@
222 <title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title> 222 <title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title>
223 223
224 <para> 224 <para>
225 There are two main types of kernel locks. The fundamental type 225 There are three main types of kernel locks. The fundamental type
226 is the spinlock 226 is the spinlock
227 (<filename class="headerfile">include/asm/spinlock.h</filename>), 227 (<filename class="headerfile">include/asm/spinlock.h</filename>),
228 which is a very simple single-holder lock: if you can't get the 228 which is a very simple single-holder lock: if you can't get the
@@ -230,16 +230,22 @@
230 very small and fast, and can be used anywhere. 230 very small and fast, and can be used anywhere.
231 </para> 231 </para>
232 <para> 232 <para>
233 The second type is a semaphore 233 The second type is a mutex
234 (<filename class="headerfile">include/linux/mutex.h</filename>): it
235 is like a spinlock, but you may block holding a mutex.
236 If you can't lock a mutex, your task will suspend itself, and be woken
237 up when the mutex is released. This means the CPU can do something
238 else while you are waiting. There are many cases when you simply
239 can't sleep (see <xref linkend="sleeping-things"/>), and so have to
240 use a spinlock instead.
241 </para>
242 <para>
243 The third type is a semaphore
234 (<filename class="headerfile">include/asm/semaphore.h</filename>): it 244 (<filename class="headerfile">include/asm/semaphore.h</filename>): it
235 can have more than one holder at any time (the number decided at 245 can have more than one holder at any time (the number decided at
236 initialization time), although it is most commonly used as a 246 initialization time), although it is most commonly used as a
237 single-holder lock (a mutex). If you can't get a semaphore, 247 single-holder lock (a mutex). If you can't get a semaphore, your
238 your task will put itself on the queue, and be woken up when the 248 task will be suspended and later on woken up - just like for mutexes.
239 semaphore is released. This means the CPU will do something
240 else while you are waiting, but there are many cases when you
241 simply can't sleep (see <xref linkend="sleeping-things"/>), and so
242 have to use a spinlock instead.
243 </para> 249 </para>
244 <para> 250 <para>
245 Neither type of lock is recursive: see 251 Neither type of lock is recursive: see
diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt
new file mode 100644
index 000000000000..cbf79881a41c
--- /dev/null
+++ b/Documentation/mutex-design.txt
@@ -0,0 +1,135 @@
1Generic Mutex Subsystem
2
3started by Ingo Molnar <mingo@redhat.com>
4
5 "Why on earth do we need a new mutex subsystem, and what's wrong
6 with semaphores?"
7
8firstly, there's nothing wrong with semaphores. But if the simpler
9mutex semantics are sufficient for your code, then there are a couple
10of advantages of mutexes:
11
12 - 'struct mutex' is smaller on most architectures: .e.g on x86,
13 'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
14 A smaller structure size means less RAM footprint, and better
15 CPU-cache utilization.
16
17 - tighter code. On x86 i get the following .text sizes when
18 switching all mutex-alike semaphores in the kernel to the mutex
19 subsystem:
20
21 text data bss dec hex filename
22 3280380 868188 396860 4545428 455b94 vmlinux-semaphore
23 3255329 865296 396732 4517357 44eded vmlinux-mutex
24
25 that's 25051 bytes of code saved, or a 0.76% win - off the hottest
26 codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
27 Smaller code means better icache footprint, which is one of the
28 major optimization goals in the Linux kernel currently.
29
30 - the mutex subsystem is slightly faster and has better scalability for
31 contended workloads. On an 8-way x86 system, running a mutex-based
32 kernel and testing creat+unlink+close (of separate, per-task files)
33 in /tmp with 16 parallel tasks, the average number of ops/sec is:
34
35 Semaphores: Mutexes:
36
37 $ ./test-mutex V 16 10 $ ./test-mutex V 16 10
38 8 CPUs, running 16 tasks. 8 CPUs, running 16 tasks.
39 checking VFS performance. checking VFS performance.
40 avg loops/sec: 34713 avg loops/sec: 84153
41 CPU utilization: 63% CPU utilization: 22%
42
43 i.e. in this workload, the mutex based kernel was 2.4 times faster
44 than the semaphore based kernel, _and_ it also had 2.8 times less CPU
45 utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
46 performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
47 performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
48 more efficient.)
49
50 the scalability difference is visible even on a 2-way P4 HT box:
51
52 Semaphores: Mutexes:
53
54 $ ./test-mutex V 16 10 $ ./test-mutex V 16 10
55 4 CPUs, running 16 tasks. 8 CPUs, running 16 tasks.
56 checking VFS performance. checking VFS performance.
57 avg loops/sec: 127659 avg loops/sec: 181082
58 CPU utilization: 100% CPU utilization: 34%
59
60 (the straight performance advantage of mutexes is 41%, the per-cycle
61 efficiency of mutexes is 4.1 times better.)
62
63 - there are no fastpath tradeoffs, the mutex fastpath is just as tight
64 as the semaphore fastpath. On x86, the locking fastpath is 2
65 instructions:
66
67 c0377ccb <mutex_lock>:
68 c0377ccb: f0 ff 08 lock decl (%eax)
69 c0377cce: 78 0e js c0377cde <.text.lock.mutex>
70 c0377cd0: c3 ret
71
72 the unlocking fastpath is equally tight:
73
74 c0377cd1 <mutex_unlock>:
75 c0377cd1: f0 ff 00 lock incl (%eax)
76 c0377cd4: 7e 0f jle c0377ce5 <.text.lock.mutex+0x7>
77 c0377cd6: c3 ret
78
79 - 'struct mutex' semantics are well-defined and are enforced if
80 CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
81 virtually no debugging code or instrumentation. The mutex subsystem
82 checks and enforces the following rules:
83
84 * - only one task can hold the mutex at a time
85 * - only the owner can unlock the mutex
86 * - multiple unlocks are not permitted
87 * - recursive locking is not permitted
88 * - a mutex object must be initialized via the API
89 * - a mutex object must not be initialized via memset or copying
90 * - task may not exit with mutex held
91 * - memory areas where held locks reside must not be freed
92 * - held mutexes must not be reinitialized
93 * - mutexes may not be used in irq contexts
94
95 furthermore, there are also convenience features in the debugging
96 code:
97
98 * - uses symbolic names of mutexes, whenever they are printed in debug output
99 * - point-of-acquire tracking, symbolic lookup of function names
100 * - list of all locks held in the system, printout of them
101 * - owner tracking
102 * - detects self-recursing locks and prints out all relevant info
103 * - detects multi-task circular deadlocks and prints out all affected
104 * locks and tasks (and only those tasks)
105
106Disadvantages
107-------------
108
109The stricter mutex API means you cannot use mutexes the same way you
110can use semaphores: e.g. they cannot be used from an interrupt context,
111nor can they be unlocked from a different context that which acquired
112it. [ I'm not aware of any other (e.g. performance) disadvantages from
113using mutexes at the moment, please let me know if you find any. ]
114
115Implementation of mutexes
116-------------------------
117
118'struct mutex' is the new mutex type, defined in include/linux/mutex.h
119and implemented in kernel/mutex.c. It is a counter-based mutex with a
120spinlock and a wait-list. The counter has 3 states: 1 for "unlocked",
1210 for "locked" and negative numbers (usually -1) for "locked, potential
122waiters queued".
123
124the APIs of 'struct mutex' have been streamlined:
125
126 DEFINE_MUTEX(name);
127
128 mutex_init(mutex);
129
130 void mutex_lock(struct mutex *lock);
131 int mutex_lock_interruptible(struct mutex *lock);
132 int mutex_trylock(struct mutex *lock);
133 void mutex_unlock(struct mutex *lock);
134 int mutex_is_locked(struct mutex *lock);
135