author    Mauro Carvalho Chehab <mchehab+samsung@kernel.org>  2019-04-10 07:32:41 -0400
committer Mauro Carvalho Chehab <mchehab+samsung@kernel.org>  2019-07-15 07:53:27 -0400
commit    387b14684f94483cbbb72843db406ec9a8d0d6d2 (patch)
tree      95ef5ba916b0dd33a6ce5e740240ef523fdc5bd8 /Documentation/locking
parent    fec88ab0af9706b2201e5daf377c5031c62d11f7 (diff)
docs: locking: convert docs to ReST and rename to *.rst
Convert the locking documents to ReST and add them to the kernel
development book where they belong.

Most of the changes here just let Sphinx properly parse the text
files; they are already in good shape and do not require massive
changes in order to be parsed. The conversion amounts to:

- add blank lines and indentation in order to identify paragraphs;
- fix table markup;
- add some list markup;
- mark literal blocks;
- adjust title markup.

At its new index.rst, let's add a :orphan: while this is not linked
to the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
Diffstat (limited to 'Documentation/locking')
-rw-r--r--  Documentation/locking/index.rst                                                       |  24
-rw-r--r--  Documentation/locking/lockdep-design.rst (renamed from Documentation/locking/lockdep-design.txt)  |  51
-rw-r--r--  Documentation/locking/lockstat.rst                                                    | 204
-rw-r--r--  Documentation/locking/lockstat.txt                                                    | 183
-rw-r--r--  Documentation/locking/locktorture.rst (renamed from Documentation/locking/locktorture.txt)        | 105
-rw-r--r--  Documentation/locking/mutex-design.rst (renamed from Documentation/locking/mutex-design.txt)      |  26
-rw-r--r--  Documentation/locking/rt-mutex-design.rst (renamed from Documentation/locking/rt-mutex-design.txt) | 139
-rw-r--r--  Documentation/locking/rt-mutex.rst (renamed from Documentation/locking/rt-mutex.txt)              |  30
-rw-r--r--  Documentation/locking/spinlocks.rst (renamed from Documentation/locking/spinlocks.txt)            |  32
-rw-r--r--  Documentation/locking/ww-mutex-design.rst (renamed from Documentation/locking/ww-mutex-design.txt) |  82
10 files changed, 500 insertions, 376 deletions
diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
new file mode 100644
index 000000000000..ef5da7fe9aac
--- /dev/null
+++ b/Documentation/locking/index.rst
@@ -0,0 +1,24 @@
+:orphan:
+
+=======
+locking
+=======
+
+.. toctree::
+    :maxdepth: 1
+
+    lockdep-design
+    lockstat
+    locktorture
+    mutex-design
+    rt-mutex-design
+    rt-mutex
+    spinlocks
+    ww-mutex-design
+
+.. only:: subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/locking/lockdep-design.txt b/Documentation/locking/lockdep-design.rst
index f189d130e543..23fcbc4d3fc0 100644
--- a/Documentation/locking/lockdep-design.txt
+++ b/Documentation/locking/lockdep-design.rst
@@ -2,6 +2,7 @@ Runtime locking correctness validator
 =====================================
 
 started by Ingo Molnar <mingo@redhat.com>
+
 additions by Arjan van de Ven <arjan@linux.intel.com>
 
 Lock-class
@@ -56,7 +57,7 @@ where the last 1 category is:
 
 When locking rules are violated, these usage bits are presented in the
 locking error messages, inside curlies, with a total of 2 * n STATEs bits.
-A contrived example:
+A contrived example::
 
   modprobe/2287 is trying to acquire lock:
    (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
@@ -70,12 +71,14 @@ of the lock and readlock (if exists), for each of the n STATEs listed
 above respectively, and the character displayed at each bit position
 indicates:
 
+   === ===================================================
    '.' acquired while irqs disabled and not in irq context
    '-' acquired in irq context
    '+' acquired with irqs enabled
    '?' acquired in irq context with irqs enabled.
+   === ===================================================
 
-The bits are illustrated with an example:
+The bits are illustrated with an example::
 
     (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
                          ||||
@@ -90,13 +93,13 @@ context and whether that STATE is enabled yields four possible cases as
 shown in the table below. The bit character is able to indicate which
 exact case is for the lock as of the reporting time.
 
-   -------------------------------------------
+   +--------------+-------------+--------------+
    |              | irq enabled | irq disabled |
-   |-------------------------------------------|
+   +--------------+-------------+--------------+
    | ever in irq  |      ?      |       -      |
-   |-------------------------------------------|
+   +--------------+-------------+--------------+
    | never in irq |      +      |       .      |
-   -------------------------------------------
+   +--------------+-------------+--------------+
 
 The character '-' suggests irq is disabled because if otherwise the
 character '?' would have been shown instead. Similar deduction can be
@@ -113,7 +116,7 @@ is irq-unsafe means it was ever acquired with irq enabled.
 
 A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The
 following states must be exclusive: only one of them is allowed to be set
-for any lock-class based on its usage:
+for any lock-class based on its usage::
 
   <hardirq-safe> or <hardirq-unsafe>
   <softirq-safe> or <softirq-unsafe>
@@ -134,7 +137,7 @@ Multi-lock dependency rules:
 The same lock-class must not be acquired twice, because this could lead
 to lock recursion deadlocks.
 
-Furthermore, two locks can not be taken in inverse order:
+Furthermore, two locks can not be taken in inverse order::
 
  <L1> -> <L2>
  <L2> -> <L1>
@@ -148,7 +151,7 @@ operations; the validator will still find whether these locks can be
 acquired in a circular fashion.
 
 Furthermore, the following usage based lock dependencies are not allowed
-between any two lock-classes:
+between any two lock-classes::
 
  <hardirq-safe> -> <hardirq-unsafe>
  <softirq-safe> -> <softirq-unsafe>
@@ -204,16 +207,16 @@ the ordering is not static.
 In order to teach the validator about this correct usage model, new
 versions of the various locking primitives were added that allow you to
 specify a "nesting level". An example call, for the block device mutex,
-looks like this:
+looks like this::
 
-enum bdev_bd_mutex_lock_class
-{
-       BD_MUTEX_NORMAL,
-       BD_MUTEX_WHOLE,
-       BD_MUTEX_PARTITION
-};
+ enum bdev_bd_mutex_lock_class
+ {
+       BD_MUTEX_NORMAL,
+       BD_MUTEX_WHOLE,
+       BD_MUTEX_PARTITION
+ };
 
- mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
+mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
 
 In this case the locking is done on a bdev object that is known to be a
 partition.
@@ -234,7 +237,7 @@ must be held: lockdep_assert_held*(&lock) and lockdep_*pin_lock(&lock).
 As the name suggests, lockdep_assert_held* family of macros assert that a
 particular lock is held at a certain time (and generate a WARN() otherwise).
 This annotation is largely used all over the kernel, e.g. kernel/sched/
-core.c
+core.c::
 
   void update_rq_clock(struct rq *rq)
   {
@@ -253,7 +256,7 @@ out to be especially helpful to debug code with callbacks, where an upper
 layer assumes a lock remains taken, but a lower layer thinks it can maybe drop
 and reacquire the lock ("unwittingly" introducing races). lockdep_pin_lock()
 returns a 'struct pin_cookie' that is then used by lockdep_unpin_lock() to check
-that nobody tampered with the lock, e.g. kernel/sched/sched.h
+that nobody tampered with the lock, e.g. kernel/sched/sched.h::
 
   static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
   {
@@ -280,7 +283,7 @@ correctness) in the sense that for every simple, standalone single-task
 locking sequence that occurred at least once during the lifetime of the
 kernel, the validator proves it with a 100% certainty that no
 combination and timing of these locking sequences can cause any class of
-lock related deadlock. [*]
+lock related deadlock. [1]_
 
 I.e. complex multi-CPU and multi-task locking scenarios do not have to
 occur in practice to prove a deadlock: only the simple 'component'
@@ -299,7 +302,9 @@ possible combination of locking interaction between CPUs, combined with
 every possible hardirq and softirq nesting scenario (which is impossible
 to do in practice).
 
-[*] assuming that the validator itself is 100% correct, and no other
+.. [1]
+
+   assuming that the validator itself is 100% correct, and no other
     part of the system corrupts the state of the validator in any way.
     We also assume that all NMI/SMM paths [which could interrupt
    even hardirq-disabled codepaths] are correct and do not interfere
@@ -310,7 +315,7 @@ to do in practice).
 Performance:
 ------------
 
-The above rules require _massive_ amounts of runtime checking. If we did
+The above rules require **massive** amounts of runtime checking. If we did
 that for every lock taken and for every irqs-enable event, it would
 render the system practically unusably slow. The complexity of checking
 is O(N^2), so even with just a few hundred lock-classes we'd have to do
@@ -369,17 +374,17 @@ be harder to do than to say.
 
 Of course, if you do run out of lock classes, the next thing to do is
 to find the offending lock classes. First, the following command gives
-you the number of lock classes currently in use along with the maximum:
+you the number of lock classes currently in use along with the maximum::
 
         grep "lock-classes" /proc/lockdep_stats
 
-This command produces the following output on a modest system:
+This command produces the following output on a modest system::
 
          lock-classes:  748 [max: 8191]
 
 If the number allocated (748 above) increases continually over time,
 then there is likely a leak. The following command can be used to
-identify the leaking lock classes:
+identify the leaking lock classes::
 
         grep "BD" /proc/lockdep
 
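The leak check above is easy to script. A minimal sketch, assuming the `lock-classes:` line format shown in the example output (a sample line stands in for `grep "lock-classes" /proc/lockdep_stats` so it can run anywhere; the 90% threshold is an arbitrary choice for illustration):

```shell
# Hypothetical helper: warn when lock-class usage nears the maximum.
# "stats" mimics one line of /proc/lockdep_stats as shown above.
stats=' lock-classes:                          748 [max: 8191]'

# Field 2 is the current count; field 4 (brackets stripped) the maximum.
used=$(printf '%s\n' "$stats" | awk '{print $2}')
max=$(printf '%s\n' "$stats" | tr -d '[]' | awk '{print $4}')

echo "lock classes in use: $used of $max"
# Warn once usage crosses 90% of the maximum (arbitrary threshold).
if [ "$used" -gt $(( max * 9 / 10 )) ]; then
    echo "warning: lock classes nearly exhausted; check for a leak"
fi
```

On a real system, replace the sample line with the output of the `grep` command shown above and run the check periodically.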
diff --git a/Documentation/locking/lockstat.rst b/Documentation/locking/lockstat.rst
new file mode 100644
index 000000000000..536eab8dbd99
--- /dev/null
+++ b/Documentation/locking/lockstat.rst
@@ -0,0 +1,204 @@
+===============
+Lock Statistics
+===============
+
+What
+====
+
+As the name suggests, it provides statistics on locks.
+
+
+Why
+===
+
+Because things like lock contention can severely impact performance.
+
+How
+===
+
+Lockdep already has hooks in the lock functions and maps lock instances to
+lock classes. We build on that (see Documentation/locking/lockdep-design.rst).
+The graph below shows the relation between the lock functions and the various
+hooks therein::
+
+                __acquire
+                    |
+                   lock _____
+                    |        \
+                    |    __contended
+                    |         |
+                    |       <wait>
+                    | _______/
+                    |/
+                    |
+               __acquired
+                    |
+                    .
+                  <hold>
+                    .
+                    |
+               __release
+                    |
+                 unlock
+
+  lock, unlock  - the regular lock functions
+  __*           - the hooks
+  <>            - states
+
+With these hooks we provide the following statistics:
+
+ con-bounces
+   - number of lock contentions that involved x-cpu data
+ contentions
+   - number of lock acquisitions that had to wait
+ wait time
+   min
+     - shortest (non-0) time we ever had to wait for a lock
+   max
+     - longest time we ever had to wait for a lock
+   total
+     - total time we spent waiting on this lock
+   avg
+     - average time spent waiting on this lock
+ acq-bounces
+   - number of lock acquisitions that involved x-cpu data
+ acquisitions
+   - number of times we took the lock
+ hold time
+   min
+     - shortest (non-0) time we ever held the lock
+   max
+     - longest time we ever held the lock
+   total
+     - total time this lock was held
+   avg
+     - average time this lock was held
+
+These numbers are gathered per lock class, per read/write state (when
+applicable).
+
+It also tracks 4 contention points per class. A contention point is a call site
+that had to wait on lock acquisition.
+
+Configuration
+-------------
+
+Lock statistics are enabled via CONFIG_LOCK_STAT.
+
+Usage
+-----
+
+Enable collection of statistics::
+
+  # echo 1 >/proc/sys/kernel/lock_stat
+
+Disable collection of statistics::
+
+  # echo 0 >/proc/sys/kernel/lock_stat
+
+Look at the current lock statistics::
+
+  ( line numbers not part of actual output, done for clarity in the explanation
+    below )
+
+  # less /proc/lock_stat
+
+  01 lock_stat version 0.4
+  02-----------------------------------------------------------------------------------------------------------------------------------------------------------
+  03 class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
+  04-----------------------------------------------------------------------------------------------------------------------------------------------------------
+  05
+  06 &mm->mmap_sem-W: 46 84 0.26 939.10 16371.53 194.90 47291 2922365 0.16 2220301.69 17464026916.32 5975.99
+  07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77
+  08 ---------------
+  09 &mm->mmap_sem 1 [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280
+  10 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
+  11 &mm->mmap_sem 34 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
+  12 &mm->mmap_sem 17 [<ffffffff81127e71>] vm_munmap+0x41/0x80
+  13 ---------------
+  14 &mm->mmap_sem 1 [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0
+  15 &mm->mmap_sem 60 [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250
+  16 &mm->mmap_sem 41 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
+  17 &mm->mmap_sem 68 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
+  18
+  19.........................................................................................................................................................
+  20
+  21 unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48
+  22 ---------------
+  23 unix_table_lock 45 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
+  24 unix_table_lock 47 [<ffffffff8150b111>] unix_release_sock+0x31/0x250
+  25 unix_table_lock 15 [<ffffffff8150ca37>] unix_find_other+0x117/0x230
+  26 unix_table_lock 5 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
+  27 ---------------
+  28 unix_table_lock 39 [<ffffffff8150b111>] unix_release_sock+0x31/0x250
+  29 unix_table_lock 49 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
+  30 unix_table_lock 20 [<ffffffff8150ca37>] unix_find_other+0x117/0x230
+  31 unix_table_lock 4 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
+
+
+This excerpt shows the first two lock class statistics. Line 01 shows the
+output version - each time the format changes this will be updated. Lines 02-04
+show the header with column descriptions. Lines 05-18 and 20-31 show the actual
+statistics. These statistics come in two parts; the actual stats separated by a
+short separator (lines 08, 13) from the contention points.
+
+Lines 09-12 show the first 4 recorded contention points (the code
+which tries to get the lock) and lines 14-17 show the first 4 recorded
+contended points (the lock holder). It is possible that the max
+con-bounces point is missing in the statistics.
+
+The first lock (05-18) is a read/write lock, and shows two lines above the
+short separator. The contention points don't match the column descriptors,
+they have two: contentions and [<IP>] symbol. The second set of contention
+points are the points we're contending with.
+
+The integer part of the time values is in us.
+
+Dealing with nested locks, subclasses may appear::
+
+  32.........................................................................................................................................................
+  33
+  34 &rq->lock: 13128 13128 0.43 190.53 103881.26 7.91 97454 3453404 0.00 401.11 13224683.11 3.82
+  35 ---------
+  36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
+  37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
+  38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a
+  39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb
+  40 ---------
+  41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
+  42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
+  43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
+  44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8
+  45
+  46.........................................................................................................................................................
+  47
+  48 &rq->lock/1: 1526 11488 0.33 388.73 136294.31 11.86 21461 38404 0.00 37.93 109388.53 2.84
+  49 -----------
+  50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
+  51 -----------
+  52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
+  53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8
+  54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
+  55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
+
+Line 48 shows statistics for the second subclass (/1) of &rq->lock class
+(subclass starts from 0), since in this case, as line 50 suggests,
+double_rq_lock actually acquires a nested lock of two spinlocks.
+
+View the top contending locks::
+
+  # grep : /proc/lock_stat | head
+  clockevents_lock: 2926159 2947636 0.15 46882.81 1784540466.34 605.41 3381345 3879161 0.00 2260.97 53178395.68 13.71
+  tick_broadcast_lock: 346460 346717 0.18 2257.43 39364622.71 113.54 3642919 4242696 0.00 2263.79 49173646.60 11.59
+  &mapping->i_mmap_mutex: 203896 203899 3.36 645530.05 31767507988.39 155800.21 3361776 8893984 0.17 2254.15 14110121.02 1.59
+  &rq->lock: 135014 136909 0.18 606.09 842160.68 6.15 1540728 10436146 0.00 728.72 17606683.41 1.69
+  &(&zone->lru_lock)->rlock: 93000 94934 0.16 59.18 188253.78 1.98 1199912 3809894 0.15 391.40 3559518.81 0.93
+  tasklist_lock-W: 40667 41130 0.23 1189.42 428980.51 10.43 270278 510106 0.16 653.51 3939674.91 7.72
+  tasklist_lock-R: 21298 21305 0.20 1310.05 215511.12 10.12 186204 241258 0.14 1162.33 1179779.23 4.89
+  rcu_node_1: 47656 49022 0.16 635.41 193616.41 3.95 844888 1865423 0.00 764.26 1656226.96 0.89
+  &(&dentry->d_lockref.lock)->rlock: 39791 40179 0.15 1302.08 88851.96 2.21 2790851 12527025 0.10 1910.75 3379714.27 0.27
+  rcu_node_0: 29203 30064 0.16 786.55 1555573.00 51.74 88963 244254 0.00 398.87 428872.51 1.76
+
+Clear the statistics::
+
+  # echo 0 > /proc/lock_stat
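As a cross-check on the column layout, waittime-avg can be recomputed as waittime-total divided by contentions. A small sketch using the unix_table_lock line from the excerpt above (the field positions are assumed from the version 0.4 header and are not a stable ABI):

```shell
# Recompute waittime-avg for one line of the excerpt above.
# Fields: name, con-bounces, contentions, waittime-min, waittime-max,
# waittime-total, waittime-avg, ... (version 0.4 layout assumed).
line='unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48'

# waittime-total ($6) / contentions ($3), rounded to two decimals.
avg=$(printf '%s\n' "$line" | awk '{ printf "%.2f", $6 / $3 }')
echo "recomputed waittime-avg: $avg us"
```

The recomputed value matches the reported waittime-avg column (1.46 us here), confirming the reading of the columns.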
diff --git a/Documentation/locking/lockstat.txt b/Documentation/locking/lockstat.txt
deleted file mode 100644
index fdbeb0c45ef3..000000000000
--- a/Documentation/locking/lockstat.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-
-LOCK STATISTICS
-
-- WHAT
-
-As the name suggests, it provides statistics on locks.
-
-- WHY
-
-Because things like lock contention can severely impact performance.
-
-- HOW
-
-Lockdep already has hooks in the lock functions and maps lock instances to
-lock classes. We build on that (see Documentation/locking/lockdep-design.txt).
-The graph below shows the relation between the lock functions and the various
-hooks therein.
-
-               __acquire
-                   |
-                  lock _____
-                   |        \
-                   |    __contended
-                   |         |
-                   |      <wait>
-                   | _______/
-                   |/
-                   |
-              __acquired
-                   |
-                   .
-                 <hold>
-                   .
-                   |
-              __release
-                   |
-                unlock
-
-lock, unlock - the regular lock functions
-__*          - the hooks
-<>           - states
-
-With these hooks we provide the following statistics:
-
- con-bounces     - number of lock contention that involved x-cpu data
- contentions     - number of lock acquisitions that had to wait
- wait time min   - shortest (non-0) time we ever had to wait for a lock
-           max   - longest time we ever had to wait for a lock
-           total - total time we spend waiting on this lock
-           avg   - average time spent waiting on this lock
- acq-bounces     - number of lock acquisitions that involved x-cpu data
- acquisitions    - number of times we took the lock
- hold time min   - shortest (non-0) time we ever held the lock
-           max   - longest time we ever held the lock
-           total - total time this lock was held
-           avg   - average time this lock was held
-
-These numbers are gathered per lock class, per read/write state (when
-applicable).
-
-It also tracks 4 contention points per class. A contention point is a call site
-that had to wait on lock acquisition.
-
- - CONFIGURATION
-
-Lock statistics are enabled via CONFIG_LOCK_STAT.
-
- - USAGE
-
-Enable collection of statistics:
-
-# echo 1 >/proc/sys/kernel/lock_stat
-
-Disable collection of statistics:
-
-# echo 0 >/proc/sys/kernel/lock_stat
-
-Look at the current lock statistics:
-
-( line numbers not part of actual output, done for clarity in the explanation
-  below )
-
-# less /proc/lock_stat
-
-01 lock_stat version 0.4
-02-----------------------------------------------------------------------------------------------------------------------------------------------------------
-03 class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
-04-----------------------------------------------------------------------------------------------------------------------------------------------------------
-05
-06 &mm->mmap_sem-W: 46 84 0.26 939.10 16371.53 194.90 47291 2922365 0.16 2220301.69 17464026916.32 5975.99
-07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77
-08 ---------------
-09 &mm->mmap_sem 1 [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280
-10 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
-11 &mm->mmap_sem 34 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
-12 &mm->mmap_sem 17 [<ffffffff81127e71>] vm_munmap+0x41/0x80
-13 ---------------
-14 &mm->mmap_sem 1 [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0
-15 &mm->mmap_sem 60 [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250
-16 &mm->mmap_sem 41 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
-17 &mm->mmap_sem 68 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
-18
-19.........................................................................................................................................................
-20
-21 unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48
-22 ---------------
-23 unix_table_lock 45 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
-24 unix_table_lock 47 [<ffffffff8150b111>] unix_release_sock+0x31/0x250
-25 unix_table_lock 15 [<ffffffff8150ca37>] unix_find_other+0x117/0x230
-26 unix_table_lock 5 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
-27 ---------------
-28 unix_table_lock 39 [<ffffffff8150b111>] unix_release_sock+0x31/0x250
-29 unix_table_lock 49 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
-30 unix_table_lock 20 [<ffffffff8150ca37>] unix_find_other+0x117/0x230
-31 unix_table_lock 4 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
-
-
-This excerpt shows the first two lock class statistics. Line 01 shows the
-output version - each time the format changes this will be updated. Line 02-04
-show the header with column descriptions. Lines 05-18 and 20-31 show the actual
-statistics. These statistics come in two parts; the actual stats separated by a
-short separator (line 08, 13) from the contention points.
-
-Lines 09-12 show the first 4 recorded contention points (the code
-which tries to get the lock) and lines 14-17 show the first 4 recorded
-contended points (the lock holder). It is possible that the max
-con-bounces point is missing in the statistics.
-
-The first lock (05-18) is a read/write lock, and shows two lines above the
-short separator. The contention points don't match the column descriptors,
-they have two: contentions and [<IP>] symbol. The second set of contention
-points are the points we're contending with.
-
-The integer part of the time values is in us.
-
-Dealing with nested locks, subclasses may appear:
-
-32.........................................................................................................................................................
-33
-34 &rq->lock: 13128 13128 0.43 190.53 103881.26 7.91 97454 3453404 0.00 401.11 13224683.11 3.82
-35 ---------
-36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
-37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
-38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a
-39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb
-40 ---------
-41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
-42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
-43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
-44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8
-45
-46.........................................................................................................................................................
-47
-48 &rq->lock/1: 1526 11488 0.33 388.73 136294.31 11.86 21461 38404 0.00 37.93 109388.53 2.84
-49 -----------
-50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
-51 -----------
-52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
-53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8
-54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
-55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
-
-Line 48 shows statistics for the second subclass (/1) of &rq->lock class
-(subclass starts from 0), since in this case, as line 50 suggests,
-double_rq_lock actually acquires a nested lock of two spinlocks.
-
-View the top contending locks:
-
-# grep : /proc/lock_stat | head
- clockevents_lock: 2926159 2947636 0.15 46882.81 1784540466.34 605.41 3381345 3879161 0.00 2260.97 53178395.68 13.71
- tick_broadcast_lock: 346460 346717 0.18 2257.43 39364622.71 113.54 3642919 4242696 0.00 2263.79 49173646.60 11.59
- &mapping->i_mmap_mutex: 203896 203899 3.36 645530.05 31767507988.39 155800.21 3361776 8893984 0.17 2254.15 14110121.02 1.59
- &rq->lock: 135014 136909 0.18 606.09 842160.68 6.15 1540728 10436146 0.00 728.72 17606683.41 1.69
- &(&zone->lru_lock)->rlock: 93000 94934 0.16 59.18 188253.78 1.98 1199912 3809894 0.15 391.40 3559518.81 0.93
- tasklist_lock-W: 40667 41130 0.23 1189.42 428980.51 10.43 270278 510106 0.16 653.51 3939674.91 7.72
- tasklist_lock-R: 21298 21305 0.20 1310.05 215511.12 10.12 186204 241258 0.14 1162.33 1179779.23 4.89
- rcu_node_1: 47656 49022 0.16 635.41 193616.41 3.95 844888 1865423 0.00 764.26 1656226.96 0.89
- &(&dentry->d_lockref.lock)->rlock: 39791 40179 0.15 1302.08 88851.96 2.21 2790851 12527025 0.10 1910.75 3379714.27 0.27
- rcu_node_0: 29203 30064 0.16 786.55 1555573.00 51.74 88963 244254 0.00 398.87 428872.51 1.76
-
-Clear the statistics:
-
-# echo 0 > /proc/lock_stat
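The enable/run/disable procedure described in both versions of the document lends itself to a small wrapper. A hypothetical sketch; a temp file stands in for `/proc/sys/kernel/lock_stat` so it can run anywhere, and `LOCK_STAT` would point at the real control file on a CONFIG_LOCK_STAT kernel:

```shell
# Hypothetical measurement-session wrapper around the procedure above.
# On a CONFIG_LOCK_STAT kernel, set LOCK_STAT=/proc/sys/kernel/lock_stat.
LOCK_STAT=$(mktemp)

lockstat_session() {
    echo 1 > "$LOCK_STAT"   # enable collection
    "$@"                    # run the workload to be measured
    status=$?
    echo 0 > "$LOCK_STAT"   # disable collection again
    return $status
}

lockstat_session sleep 0    # stand-in workload
final=$(cat "$LOCK_STAT")   # collection left disabled: file contains 0
rm -f "$LOCK_STAT"
```

Between the session and clearing the statistics, `/proc/lock_stat` would be snapshotted for analysis on a real system.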
diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.rst
index 6a8df4cd19bf..e79eeeca3ac6 100644
--- a/Documentation/locking/locktorture.txt
+++ b/Documentation/locking/locktorture.rst
@@ -1,6 +1,9 @@
1==================================
1Kernel Lock Torture Test Operation 2Kernel Lock Torture Test Operation
3==================================
2 4
3CONFIG_LOCK_TORTURE_TEST 5CONFIG_LOCK_TORTURE_TEST
6========================
4 7
5The CONFIG LOCK_TORTURE_TEST config option provides a kernel module 8The CONFIG LOCK_TORTURE_TEST config option provides a kernel module
6that runs torture tests on core kernel locking primitives. The kernel 9that runs torture tests on core kernel locking primitives. The kernel
@@ -18,61 +21,77 @@ can be simulated by either enlarging this critical region hold time and/or
18creating more kthreads. 21creating more kthreads.
19 22
20 23
21MODULE PARAMETERS 24Module Parameters
25=================
22 26
23This module has the following parameters: 27This module has the following parameters:
24 28
25 29
26 ** Locktorture-specific ** 30Locktorture-specific
31--------------------
27 32
28nwriters_stress Number of kernel threads that will stress exclusive lock 33nwriters_stress
34 Number of kernel threads that will stress exclusive lock
29 ownership (writers). The default value is twice the number 35 ownership (writers). The default value is twice the number
30 of online CPUs. 36 of online CPUs.
31 37
32nreaders_stress Number of kernel threads that will stress shared lock 38nreaders_stress
39 Number of kernel threads that will stress shared lock
 33		ownership (readers). The default is the same as the number of 40		ownership (readers). The default is the same as the number of
 34		writers. If the user did not specify nwriters_stress, then 41		writers. If the user did not specify nwriters_stress, then
 35		both readers and writers will be the number of online CPUs. 42		both readers and writers will be the number of online CPUs.
36 43
37torture_type Type of lock to torture. By default, only spinlocks will 44torture_type
45 Type of lock to torture. By default, only spinlocks will
38 be tortured. This module can torture the following locks, 46 be tortured. This module can torture the following locks,
39 with string values as follows: 47 with string values as follows:
40 48
41 o "lock_busted": Simulates a buggy lock implementation. 49 - "lock_busted":
50 Simulates a buggy lock implementation.
42 51
43 o "spin_lock": spin_lock() and spin_unlock() pairs. 52 - "spin_lock":
53 spin_lock() and spin_unlock() pairs.
44 54
45 o "spin_lock_irq": spin_lock_irq() and spin_unlock_irq() 55 - "spin_lock_irq":
46 pairs. 56 spin_lock_irq() and spin_unlock_irq() pairs.
47 57
48 o "rw_lock": read/write lock() and unlock() rwlock pairs. 58 - "rw_lock":
59 read/write lock() and unlock() rwlock pairs.
49 60
50 o "rw_lock_irq": read/write lock_irq() and unlock_irq() 61 - "rw_lock_irq":
51 rwlock pairs. 62 read/write lock_irq() and unlock_irq()
63 rwlock pairs.
52 64
53 o "mutex_lock": mutex_lock() and mutex_unlock() pairs. 65 - "mutex_lock":
66 mutex_lock() and mutex_unlock() pairs.
54 67
55 o "rtmutex_lock": rtmutex_lock() and rtmutex_unlock() 68 - "rtmutex_lock":
56 pairs. Kernel must have CONFIG_RT_MUTEX=y. 69 rtmutex_lock() and rtmutex_unlock() pairs.
70 Kernel must have CONFIG_RT_MUTEX=y.
57 71
58 o "rwsem_lock": read/write down() and up() semaphore pairs. 72 - "rwsem_lock":
73 read/write down() and up() semaphore pairs.
59 74
60 75
61 ** Torture-framework (RCU + locking) ** 76Torture-framework (RCU + locking)
77---------------------------------
62 78
63shutdown_secs The number of seconds to run the test before terminating 79shutdown_secs
80 The number of seconds to run the test before terminating
64 the test and powering off the system. The default is 81 the test and powering off the system. The default is
65 zero, which disables test termination and system shutdown. 82 zero, which disables test termination and system shutdown.
66 This capability is useful for automated testing. 83 This capability is useful for automated testing.
67 84
68onoff_interval The number of seconds between each attempt to execute a 85onoff_interval
86 The number of seconds between each attempt to execute a
69 randomly selected CPU-hotplug operation. Defaults 87 randomly selected CPU-hotplug operation. Defaults
70 to zero, which disables CPU hotplugging. In 88 to zero, which disables CPU hotplugging. In
71 CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently 89 CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently
72 refuse to do any CPU-hotplug operations regardless of 90 refuse to do any CPU-hotplug operations regardless of
73 what value is specified for onoff_interval. 91 what value is specified for onoff_interval.
74 92
75onoff_holdoff The number of seconds to wait until starting CPU-hotplug 93onoff_holdoff
94 The number of seconds to wait until starting CPU-hotplug
76 operations. This would normally only be used when 95 operations. This would normally only be used when
77 locktorture was built into the kernel and started 96 locktorture was built into the kernel and started
78 automatically at boot time, in which case it is useful 97 automatically at boot time, in which case it is useful
@@ -80,53 +99,59 @@ onoff_holdoff The number of seconds to wait until starting CPU-hotplug
80 coming and going. This parameter is only useful if 99 coming and going. This parameter is only useful if
81 CONFIG_HOTPLUG_CPU is enabled. 100 CONFIG_HOTPLUG_CPU is enabled.
82 101
83stat_interval Number of seconds between statistics-related printk()s. 102stat_interval
103 Number of seconds between statistics-related printk()s.
84 By default, locktorture will report stats every 60 seconds. 104 By default, locktorture will report stats every 60 seconds.
85 Setting the interval to zero causes the statistics to 105 Setting the interval to zero causes the statistics to
86 be printed -only- when the module is unloaded, and this 106 be printed -only- when the module is unloaded, and this
87 is the default. 107 is the default.
88 108
89stutter The length of time to run the test before pausing for this 109stutter
110 The length of time to run the test before pausing for this
90 same period of time. Defaults to "stutter=5", so as 111 same period of time. Defaults to "stutter=5", so as
91 to run and pause for (roughly) five-second intervals. 112 to run and pause for (roughly) five-second intervals.
92 Specifying "stutter=0" causes the test to run continuously 113 Specifying "stutter=0" causes the test to run continuously
93 without pausing, which is the old default behavior. 114 without pausing, which is the old default behavior.
94 115
95shuffle_interval The number of seconds to keep the test threads affinitied 116shuffle_interval
117 The number of seconds to keep the test threads affinitied
96 to a particular subset of the CPUs, defaults to 3 seconds. 118 to a particular subset of the CPUs, defaults to 3 seconds.
97 Used in conjunction with test_no_idle_hz. 119 Used in conjunction with test_no_idle_hz.
98 120
99verbose Enable verbose debugging printing, via printk(). Enabled 121verbose
122 Enable verbose debugging printing, via printk(). Enabled
100 by default. This extra information is mostly related to 123 by default. This extra information is mostly related to
101 high-level errors and reports from the main 'torture' 124 high-level errors and reports from the main 'torture'
102 framework. 125 framework.
103 126
104 127
105STATISTICS 128Statistics
129==========
106 130
107Statistics are printed in the following format: 131Statistics are printed in the following format::
108 132
109spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0 133 spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0
110 (A) (B) (C) (D) (E) 134 (A) (B) (C) (D) (E)
111 135
112(A): Lock type that is being tortured -- torture_type parameter. 136 (A): Lock type that is being tortured -- torture_type parameter.
113 137
114(B): Number of writer lock acquisitions. If dealing with a read/write primitive 138 (B): Number of writer lock acquisitions. If dealing with a read/write
115 a second "Reads" statistics line is printed. 139 primitive a second "Reads" statistics line is printed.
116 140
117(C): Number of times the lock was acquired. 141 (C): Number of times the lock was acquired.
118 142
119(D): Min and max number of times threads failed to acquire the lock. 143 (D): Min and max number of times threads failed to acquire the lock.
120 144
121(E): true/false values if there were errors acquiring the lock. This should 145 (E): true/false values if there were errors acquiring the lock. This should
122 -only- be positive if there is a bug in the locking primitive's 146 -only- be positive if there is a bug in the locking primitive's
123 implementation. Otherwise a lock should never fail (i.e., spin_lock()). 147 implementation. Otherwise a lock should never fail (i.e., spin_lock()).
124 Of course, the same applies for (C), above. A dummy example of this is 148 Of course, the same applies for (C), above. A dummy example of this is
125 the "lock_busted" type. 149 the "lock_busted" type.
126 150
127USAGE 151Usage
152=====
128 153
129The following script may be used to torture locks: 154The following script may be used to torture locks::
130 155
131 #!/bin/sh 156 #!/bin/sh
132 157
diff --git a/Documentation/locking/mutex-design.txt b/Documentation/locking/mutex-design.rst
index 818aca19612f..4d8236b81fa5 100644
--- a/Documentation/locking/mutex-design.txt
+++ b/Documentation/locking/mutex-design.rst
@@ -1,6 +1,9 @@
1=======================
1Generic Mutex Subsystem 2Generic Mutex Subsystem
3=======================
2 4
3started by Ingo Molnar <mingo@redhat.com> 5started by Ingo Molnar <mingo@redhat.com>
6
4updated by Davidlohr Bueso <davidlohr@hp.com> 7updated by Davidlohr Bueso <davidlohr@hp.com>
5 8
6What are mutexes? 9What are mutexes?
@@ -23,7 +26,7 @@ Implementation
23Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h 26Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
24and implemented in kernel/locking/mutex.c. These locks use an atomic variable 27and implemented in kernel/locking/mutex.c. These locks use an atomic variable
25(->owner) to keep track of the lock state during its lifetime. Field owner 28(->owner) to keep track of the lock state during its lifetime. Field owner
26actually contains 'struct task_struct *' to the current lock owner and it is 29actually contains `struct task_struct *` to the current lock owner and it is
27therefore NULL if not currently owned. Since task_struct pointers are aligned 30therefore NULL if not currently owned. Since task_struct pointers are aligned
28to at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g., 31to at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g.,
29if waiter list is non-empty). In its most basic form it also includes a 32if waiter list is non-empty). In its most basic form it also includes a
@@ -101,29 +104,36 @@ features that make lock debugging easier and faster:
101 104
102Interfaces 105Interfaces
103---------- 106----------
104Statically define the mutex: 107Statically define the mutex::
108
105 DEFINE_MUTEX(name); 109 DEFINE_MUTEX(name);
106 110
107Dynamically initialize the mutex: 111Dynamically initialize the mutex::
112
108 mutex_init(mutex); 113 mutex_init(mutex);
109 114
110Acquire the mutex, uninterruptible: 115Acquire the mutex, uninterruptible::
116
111 void mutex_lock(struct mutex *lock); 117 void mutex_lock(struct mutex *lock);
112 void mutex_lock_nested(struct mutex *lock, unsigned int subclass); 118 void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
113 int mutex_trylock(struct mutex *lock); 119 int mutex_trylock(struct mutex *lock);
114 120
115Acquire the mutex, interruptible: 121Acquire the mutex, interruptible::
122
116 int mutex_lock_interruptible_nested(struct mutex *lock, 123 int mutex_lock_interruptible_nested(struct mutex *lock,
117 unsigned int subclass); 124 unsigned int subclass);
118 int mutex_lock_interruptible(struct mutex *lock); 125 int mutex_lock_interruptible(struct mutex *lock);
119 126
120Acquire the mutex, interruptible, if dec to 0: 127Acquire the mutex, interruptible, if dec to 0::
128
121 int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); 129 int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
122 130
123Unlock the mutex: 131Unlock the mutex::
132
124 void mutex_unlock(struct mutex *lock); 133 void mutex_unlock(struct mutex *lock);
125 134
126Test if the mutex is taken: 135Test if the mutex is taken::
136
127 int mutex_is_locked(struct mutex *lock); 137 int mutex_is_locked(struct mutex *lock);
128 138
129Disadvantages 139Disadvantages
diff --git a/Documentation/locking/rt-mutex-design.txt b/Documentation/locking/rt-mutex-design.rst
index 3d7b865539cc..59c2a64efb21 100644
--- a/Documentation/locking/rt-mutex-design.txt
+++ b/Documentation/locking/rt-mutex-design.rst
@@ -1,14 +1,15 @@
1# 1==============================
2# Copyright (c) 2006 Steven Rostedt
3# Licensed under the GNU Free Documentation License, Version 1.2
4#
5
6RT-mutex implementation design 2RT-mutex implementation design
7------------------------------ 3==============================
4
5Copyright (c) 2006 Steven Rostedt
6
7Licensed under the GNU Free Documentation License, Version 1.2
8
8 9
9This document tries to describe the design of the rtmutex.c implementation. 10This document tries to describe the design of the rtmutex.c implementation.
10It doesn't describe the reasons why rtmutex.c exists. For that please see 11It doesn't describe the reasons why rtmutex.c exists. For that please see
11Documentation/locking/rt-mutex.txt. Although this document does explain problems 12Documentation/locking/rt-mutex.rst. Although this document does explain problems
12that happen without this code, but only to build the concepts needed to 13that happen without this code, but only to build the concepts needed to
13understand what the code actually does. 14understand what the code actually does.
14 15
@@ -41,17 +42,17 @@ to release the lock, because for all we know, B is a CPU hog and will
41never give C a chance to release the lock. This is called unbounded priority 42never give C a chance to release the lock. This is called unbounded priority
42inversion. 43inversion.
43 44
44Here's a little ASCII art to show the problem. 45Here's a little ASCII art to show the problem::
45 46
46 grab lock L1 (owned by C) 47 grab lock L1 (owned by C)
47 | 48 |
48A ---+ 49 A ---+
49 C preempted by B 50 C preempted by B
50 | 51 |
51C +----+ 52 C +----+
52 53
53B +--------> 54 B +-------->
54 B now keeps A from running. 55 B now keeps A from running.
55 56
56 57
57Priority Inheritance (PI) 58Priority Inheritance (PI)
@@ -75,24 +76,29 @@ Terminology
75Here I explain some terminology that is used in this document to help describe 76Here I explain some terminology that is used in this document to help describe
76the design that is used to implement PI. 77the design that is used to implement PI.
77 78
78PI chain - The PI chain is an ordered series of locks and processes that cause 79PI chain
80 - The PI chain is an ordered series of locks and processes that cause
79 processes to inherit priorities from a previous process that is 81 processes to inherit priorities from a previous process that is
80 blocked on one of its locks. This is described in more detail 82 blocked on one of its locks. This is described in more detail
81 later in this document. 83 later in this document.
82 84
83mutex - In this document, to differentiate from locks that implement 85mutex
86 - In this document, to differentiate from locks that implement
84 PI and spin locks that are used in the PI code, from now on 87 PI and spin locks that are used in the PI code, from now on
85 the PI locks will be called a mutex. 88 the PI locks will be called a mutex.
86 89
87lock - In this document from now on, I will use the term lock when 90lock
91 - In this document from now on, I will use the term lock when
88 referring to spin locks that are used to protect parts of the PI 92 referring to spin locks that are used to protect parts of the PI
89 algorithm. These locks disable preemption for UP (when 93 algorithm. These locks disable preemption for UP (when
 90	      CONFIG_PREEMPT is enabled) and on SMP prevent multiple CPUs from 94	      CONFIG_PREEMPT is enabled) and on SMP prevent multiple CPUs from
91 entering critical sections simultaneously. 95 entering critical sections simultaneously.
92 96
93spin lock - Same as lock above. 97spin lock
98 - Same as lock above.
94 99
95waiter - A waiter is a struct that is stored on the stack of a blocked 100waiter
101 - A waiter is a struct that is stored on the stack of a blocked
96 process. Since the scope of the waiter is within the code for 102 process. Since the scope of the waiter is within the code for
97 a process being blocked on the mutex, it is fine to allocate 103 a process being blocked on the mutex, it is fine to allocate
98 the waiter on the process's stack (local variable). This 104 the waiter on the process's stack (local variable). This
@@ -104,14 +110,18 @@ waiter - A waiter is a struct that is stored on the stack of a blocked
104 waiter is sometimes used in reference to the task that is waiting 110 waiter is sometimes used in reference to the task that is waiting
105 on a mutex. This is the same as waiter->task. 111 on a mutex. This is the same as waiter->task.
106 112
107waiters - A list of processes that are blocked on a mutex. 113waiters
114 - A list of processes that are blocked on a mutex.
108 115
109top waiter - The highest priority process waiting on a specific mutex. 116top waiter
117 - The highest priority process waiting on a specific mutex.
110 118
111top pi waiter - The highest priority process waiting on one of the mutexes 119top pi waiter
120 - The highest priority process waiting on one of the mutexes
112 that a specific process owns. 121 that a specific process owns.
113 122
114Note: task and process are used interchangeably in this document, mostly to 123Note:
124 task and process are used interchangeably in this document, mostly to
115 differentiate between two processes that are being described together. 125 differentiate between two processes that are being described together.
116 126
117 127
@@ -123,7 +133,7 @@ inheritance to take place. Multiple chains may converge, but a chain
123would never diverge, since a process can't be blocked on more than one 133would never diverge, since a process can't be blocked on more than one
124mutex at a time. 134mutex at a time.
125 135
126Example: 136Example::
127 137
128 Process: A, B, C, D, E 138 Process: A, B, C, D, E
129 Mutexes: L1, L2, L3, L4 139 Mutexes: L1, L2, L3, L4
@@ -137,21 +147,21 @@ Example:
137 D owns L4 147 D owns L4
138 E blocked on L4 148 E blocked on L4
139 149
140The chain would be: 150The chain would be::
141 151
142 E->L4->D->L3->C->L2->B->L1->A 152 E->L4->D->L3->C->L2->B->L1->A
143 153
144To show where two chains merge, we could add another process F and 154To show where two chains merge, we could add another process F and
145another mutex L5 where B owns L5 and F is blocked on mutex L5. 155another mutex L5 where B owns L5 and F is blocked on mutex L5.
146 156
147The chain for F would be: 157The chain for F would be::
148 158
149 F->L5->B->L1->A 159 F->L5->B->L1->A
150 160
151Since a process may own more than one mutex, but never be blocked on more than 161Since a process may own more than one mutex, but never be blocked on more than
152one, the chains merge. 162one, the chains merge.
153 163
154Here we show both chains: 164Here we show both chains::
155 165
156 E->L4->D->L3->C->L2-+ 166 E->L4->D->L3->C->L2-+
157 | 167 |
@@ -165,12 +175,12 @@ than the processes to the left or below in the chain.
165 175
166Also since a mutex may have more than one process blocked on it, we can 176Also since a mutex may have more than one process blocked on it, we can
167have multiple chains merge at mutexes. If we add another process G that is 177have multiple chains merge at mutexes. If we add another process G that is
168blocked on mutex L2: 178blocked on mutex L2::
169 179
170 G->L2->B->L1->A 180 G->L2->B->L1->A
171 181
172And once again, to show how this can grow I will show the merging chains 182And once again, to show how this can grow I will show the merging chains
173again. 183again::
174 184
175 E->L4->D->L3->C-+ 185 E->L4->D->L3->C-+
176 +->L2-+ 186 +->L2-+
@@ -184,7 +194,7 @@ the chain (A and B in this example), must have their priorities increased
184to that of G. 194to that of G.
185 195
186Mutex Waiters Tree 196Mutex Waiters Tree
187----------------- 197------------------
188 198
189Every mutex keeps track of all the waiters that are blocked on itself. The 199Every mutex keeps track of all the waiters that are blocked on itself. The
190mutex has a rbtree to store these waiters by priority. This tree is protected 200mutex has a rbtree to store these waiters by priority. This tree is protected
@@ -219,19 +229,19 @@ defined. But is very complex to figure it out, since it depends on all
219the nesting of mutexes. Let's look at the example where we have 3 mutexes, 229the nesting of mutexes. Let's look at the example where we have 3 mutexes,
220L1, L2, and L3, and four separate functions func1, func2, func3 and func4. 230L1, L2, and L3, and four separate functions func1, func2, func3 and func4.
221The following shows a locking order of L1->L2->L3, but may not actually 231The following shows a locking order of L1->L2->L3, but may not actually
222be directly nested that way. 232be directly nested that way::
223 233
224void func1(void) 234 void func1(void)
225{ 235 {
226 mutex_lock(L1); 236 mutex_lock(L1);
227 237
228 /* do anything */ 238 /* do anything */
229 239
230 mutex_unlock(L1); 240 mutex_unlock(L1);
231} 241 }
232 242
233void func2(void) 243 void func2(void)
234{ 244 {
235 mutex_lock(L1); 245 mutex_lock(L1);
236 mutex_lock(L2); 246 mutex_lock(L2);
237 247
@@ -239,10 +249,10 @@ void func2(void)
239 249
240 mutex_unlock(L2); 250 mutex_unlock(L2);
241 mutex_unlock(L1); 251 mutex_unlock(L1);
242} 252 }
243 253
244void func3(void) 254 void func3(void)
245{ 255 {
246 mutex_lock(L2); 256 mutex_lock(L2);
247 mutex_lock(L3); 257 mutex_lock(L3);
248 258
@@ -250,30 +260,30 @@ void func3(void)
250 260
251 mutex_unlock(L3); 261 mutex_unlock(L3);
252 mutex_unlock(L2); 262 mutex_unlock(L2);
253} 263 }
254 264
255void func4(void) 265 void func4(void)
256{ 266 {
257 mutex_lock(L3); 267 mutex_lock(L3);
258 268
259 /* do something again */ 269 /* do something again */
260 270
261 mutex_unlock(L3); 271 mutex_unlock(L3);
262} 272 }
263 273
264Now we add 4 processes that run each of these functions separately. 274Now we add 4 processes that run each of these functions separately.
265Processes A, B, C, and D run functions func1, func2, func3 and func4 275Processes A, B, C, and D run functions func1, func2, func3 and func4
266respectively, with D running first and A last. With D being preempted 276respectively, with D running first and A last. With D being preempted
267in func4 in the "do something again" area, we have a locking that follows: 277in func4 in the "do something again" area, we have a locking that follows::
268 278
269D owns L3 279 D owns L3
270 C blocked on L3 280 C blocked on L3
271 C owns L2 281 C owns L2
272 B blocked on L2 282 B blocked on L2
273 B owns L1 283 B owns L1
274 A blocked on L1 284 A blocked on L1
275 285
276And thus we have the chain A->L1->B->L2->C->L3->D. 286 And thus we have the chain A->L1->B->L2->C->L3->D.
277 287
278This gives us a PI depth of 4 (four processes), but looking at any of the 288This gives us a PI depth of 4 (four processes), but looking at any of the
279functions individually, it seems as though they only have at most a locking 289functions individually, it seems as though they only have at most a locking
@@ -298,7 +308,7 @@ not true, the rtmutex.c code will be broken!), this allows for the least
298significant bit to be used as a flag. Bit 0 is used as the "Has Waiters" 308significant bit to be used as a flag. Bit 0 is used as the "Has Waiters"
299flag. It's set whenever there are waiters on a mutex. 309flag. It's set whenever there are waiters on a mutex.
300 310
301See Documentation/locking/rt-mutex.txt for further details. 311See Documentation/locking/rt-mutex.rst for further details.
302 312
303cmpxchg Tricks 313cmpxchg Tricks
304-------------- 314--------------
@@ -307,17 +317,17 @@ Some architectures implement an atomic cmpxchg (Compare and Exchange). This
307is used (when applicable) to keep the fast path of grabbing and releasing 317is used (when applicable) to keep the fast path of grabbing and releasing
308mutexes short. 318mutexes short.
309 319
310cmpxchg is basically the following function performed atomically: 320cmpxchg is basically the following function performed atomically::
311 321
312unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C) 322 unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C)
313{ 323 {
314 unsigned long T = *A; 324 unsigned long T = *A;
315 if (*A == *B) { 325 if (*A == *B) {
316 *A = *C; 326 *A = *C;
317 } 327 }
318 return T; 328 return T;
319} 329 }
320#define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c) 330 #define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c)
321 331
322This is really nice to have, since it allows you to only update a variable 332This is really nice to have, since it allows you to only update a variable
323if the variable is what you expect it to be. You know if it succeeded if 333if the variable is what you expect it to be. You know if it succeeded if
@@ -352,9 +362,10 @@ Then rt_mutex_setprio is called to adjust the priority of the task to the
352new priority. Note that rt_mutex_setprio is defined in kernel/sched/core.c 362new priority. Note that rt_mutex_setprio is defined in kernel/sched/core.c
353to implement the actual change in priority. 363to implement the actual change in priority.
354 364
355(Note: For the "prio" field in task_struct, the lower the number, the 365Note:
366 For the "prio" field in task_struct, the lower the number, the
356 higher the priority. A "prio" of 5 is of higher priority than a 367 higher the priority. A "prio" of 5 is of higher priority than a
357 "prio" of 10.) 368 "prio" of 10.
358 369
359It is interesting to note that rt_mutex_adjust_prio can either increase 370It is interesting to note that rt_mutex_adjust_prio can either increase
360or decrease the priority of the task. In the case that a higher priority 371or decrease the priority of the task. In the case that a higher priority
@@ -439,6 +450,7 @@ wait_lock, which this code currently holds. So setting the "Has Waiters" flag
439forces the current owner to synchronize with this code. 450forces the current owner to synchronize with this code.
440 451
441The lock is taken if the following are true: 452The lock is taken if the following are true:
453
442 1) The lock has no owner 454 1) The lock has no owner
443 2) The current task is the highest priority against all other 455 2) The current task is the highest priority against all other
444 waiters of the lock 456 waiters of the lock
@@ -546,10 +558,13 @@ Credits
546------- 558-------
547 559
548Author: Steven Rostedt <rostedt@goodmis.org> 560Author: Steven Rostedt <rostedt@goodmis.org>
561
549Updated: Alex Shi <alex.shi@linaro.org> - 7/6/2017 562Updated: Alex Shi <alex.shi@linaro.org> - 7/6/2017
550 563
551Original Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and 564Original Reviewers:
565 Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and
552 Randy Dunlap 566 Randy Dunlap
567
553Update (7/6/2017) Reviewers: Steven Rostedt and Sebastian Siewior 568Update (7/6/2017) Reviewers: Steven Rostedt and Sebastian Siewior
554 569
555Updates 570Updates
diff --git a/Documentation/locking/rt-mutex.txt b/Documentation/locking/rt-mutex.rst
index 35793e003041..c365dc302081 100644
--- a/Documentation/locking/rt-mutex.txt
+++ b/Documentation/locking/rt-mutex.rst
@@ -1,5 +1,6 @@
1==================================
1RT-mutex subsystem with PI support 2RT-mutex subsystem with PI support
2---------------------------------- 3==================================
3 4
4RT-mutexes with priority inheritance are used to support PI-futexes, 5RT-mutexes with priority inheritance are used to support PI-futexes,
5which enable pthread_mutex_t priority inheritance attributes 6which enable pthread_mutex_t priority inheritance attributes
@@ -46,27 +47,30 @@ The state of the rt-mutex is tracked via the owner field of the rt-mutex
46structure: 47structure:
47 48
48lock->owner holds the task_struct pointer of the owner. Bit 0 is used to 49lock->owner holds the task_struct pointer of the owner. Bit 0 is used to
49keep track of the "lock has waiters" state. 50keep track of the "lock has waiters" state:
50 51
51 owner bit0 52 ============ ======= ================================================
53 owner bit0 Notes
54 ============ ======= ================================================
52 NULL 0 lock is free (fast acquire possible) 55 NULL 0 lock is free (fast acquire possible)
53 NULL 1 lock is free and has waiters and the top waiter 56 NULL 1 lock is free and has waiters and the top waiter
54 is going to take the lock* 57 is going to take the lock [1]_
55 taskpointer 0 lock is held (fast release possible) 58 taskpointer 0 lock is held (fast release possible)
56 taskpointer 1 lock is held and has waiters** 59 taskpointer 1 lock is held and has waiters [2]_
60 ============ ======= ================================================
57 61
58The fast atomic compare exchange based acquire and release is only 62The fast atomic compare exchange based acquire and release is only
59possible when bit 0 of lock->owner is 0. 63possible when bit 0 of lock->owner is 0.
60 64
61(*) It also can be a transitional state when grabbing the lock 65.. [1] It also can be a transitional state when grabbing the lock
62with ->wait_lock is held. To prevent any fast path cmpxchg to the lock, 66 with ->wait_lock is held. To prevent any fast path cmpxchg to the lock,
63we need to set the bit0 before looking at the lock, and the owner may be 67 we need to set the bit0 before looking at the lock, and the owner may
64NULL in this small time, hence this can be a transitional state. 68 be NULL in this small time, hence this can be a transitional state.
65 69
66(**) There is a small time when bit 0 is set but there are no 70.. [2] There is a small time when bit 0 is set but there are no
67waiters. This can happen when grabbing the lock in the slow path. 71 waiters. This can happen when grabbing the lock in the slow path.
68To prevent a cmpxchg of the owner releasing the lock, we need to 72 To prevent a cmpxchg of the owner releasing the lock, we need to
69set this bit before looking at the lock. 73 set this bit before looking at the lock.
70 74
71BTW, there is still technically a "Pending Owner", it's just not called 75BTW, there is still technically a "Pending Owner", it's just not called
72that anymore. The pending owner happens to be the top_waiter of a lock 76that anymore. The pending owner happens to be the top_waiter of a lock
diff --git a/Documentation/locking/spinlocks.txt b/Documentation/locking/spinlocks.rst
index ff35e40bdf5b..098107fb7d86 100644
--- a/Documentation/locking/spinlocks.txt
+++ b/Documentation/locking/spinlocks.rst
@@ -1,8 +1,13 @@
1===============
2Locking lessons
3===============
4
1Lesson 1: Spin locks 5Lesson 1: Spin locks
6====================
2 7
3The most basic primitive for locking is spinlock. 8The most basic primitive for locking is spinlock::
4 9
5static DEFINE_SPINLOCK(xxx_lock); 10 static DEFINE_SPINLOCK(xxx_lock);
6 11
7 unsigned long flags; 12 unsigned long flags;
8 13
@@ -19,23 +24,25 @@ worry about UP vs SMP issues: the spinlocks work correctly under both.
    NOTE! Implications of spin_locks for memory are further described in:
 
 	Documentation/memory-barriers.txt
+
 	(5) LOCK operations.
+
 	(6) UNLOCK operations.
 
 The above is usually pretty simple (you usually need and want only one
 spinlock for most things - using more than one spinlock can make things a
 lot more complex and even slower and is usually worth it only for
-sequences that you _know_ need to be split up: avoid it at all cost if you
+sequences that you **know** need to be split up: avoid it at all cost if you
 aren't sure).
 
 This is really the only really hard part about spinlocks: once you start
 using spinlocks they tend to expand to areas you might not have noticed
 before, because you have to make sure the spinlocks correctly protect the
-shared data structures _everywhere_ they are used. The spinlocks are most
+shared data structures **everywhere** they are used. The spinlocks are most
 easily added to places that are completely independent of other code (for
 example, internal driver data structures that nobody else ever touches).
 
-   NOTE! The spin-lock is safe only when you _also_ use the lock itself
+   NOTE! The spin-lock is safe only when you **also** use the lock itself
    to do locking across CPU's, which implies that EVERYTHING that
    touches a shared variable has to agree about the spinlock they want
    to use.
@@ -43,6 +50,7 @@ example, internal driver data structures that nobody else ever touches).
 ----
 
 Lesson 2: reader-writer spinlocks.
+==================================
 
 If your data accesses have a very natural pattern where you usually tend
 to mostly read from the shared variables, the reader-writer locks
@@ -54,7 +62,7 @@ to change the variables it has to get an exclusive write lock.
    simple spinlocks. Unless the reader critical section is long, you
    are better off just using spinlocks.
 
-The routines look the same as above:
+The routines look the same as above::
 
 	rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
 
@@ -71,7 +79,7 @@ The routines look the same as above:
 The above kind of lock may be useful for complex data structures like
 linked lists, especially searching for entries without changing the list
 itself. The read lock allows many concurrent readers. Anything that
-_changes_ the list will have to get the write lock.
+**changes** the list will have to get the write lock.
 
    NOTE! RCU is better for list traversal, but requires careful
    attention to design detail (see Documentation/RCU/listRCU.txt).
@@ -87,10 +95,11 @@ to get the write-lock at the very beginning.
 ----
 
 Lesson 3: spinlocks revisited.
+==============================
 
 The single spin-lock primitives above are by no means the only ones. They
 are the most safe ones, and the ones that work under all circumstances,
-but partly _because_ they are safe they are also fairly slow. They are slower
+but partly **because** they are safe they are also fairly slow. They are slower
 than they'd need to be, because they do have to disable interrupts
 (which is just a single instruction on a x86, but it's an expensive one -
 and on other architectures it can be worse).
@@ -98,7 +107,7 @@ and on other architectures it can be worse).
 If you have a case where you have to protect a data structure across
 several CPU's and you want to use spinlocks you can potentially use
 cheaper versions of the spinlocks. IFF you know that the spinlocks are
-never used in interrupt handlers, you can use the non-irq versions:
+never used in interrupt handlers, you can use the non-irq versions::
 
 	spin_lock(&lock);
 	...
@@ -110,7 +119,7 @@ This is useful if you know that the data in question is only ever
 manipulated from a "process context", ie no interrupts involved.
 
 The reasons you mustn't use these versions if you have interrupts that
-play with the spinlock is that you can get deadlocks:
+play with the spinlock is that you can get deadlocks::
 
 	spin_lock(&lock);
 	...
@@ -147,9 +156,10 @@ indeed), while write-locks need to protect themselves against interrupts.
 ----
 
 Reference information:
+======================
 
 For dynamic initialization, use spin_lock_init() or rwlock_init() as
-appropriate:
+appropriate::
 
 	spinlock_t xxx_lock;
 	rwlock_t xxx_rw_lock;
diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.rst
index f0ed7c30e695..1846c199da23 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.rst
@@ -1,3 +1,4 @@
+======================================
 Wound/Wait Deadlock-Proof Mutex Design
 ======================================
 
@@ -85,6 +86,7 @@ Furthermore there are three different class of w/w lock acquire functions:
   no deadlock potential and hence the ww_mutex_lock call will block and not
   prematurely return -EDEADLK. The advantage of the _slow functions is in
   interface safety:
+
 - ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow
   has a void return type. Note that since ww mutex code needs loops/retries
   anyway the __must_check doesn't result in spurious warnings, even though the
@@ -115,36 +117,36 @@ expect the number of simultaneous competing transactions to be typically small,
 and you want to reduce the number of rollbacks.
 
 Three different ways to acquire locks within the same w/w class. Common
-definitions for methods #1 and #2:
+definitions for methods #1 and #2::
 
-static DEFINE_WW_CLASS(ww_class);
+  static DEFINE_WW_CLASS(ww_class);
 
-struct obj {
+  struct obj {
 	struct ww_mutex lock;
 	/* obj data */
-};
+  };
 
-struct obj_entry {
+  struct obj_entry {
 	struct list_head head;
 	struct obj *obj;
-};
+  };
 
 Method 1, using a list in execbuf->buffers that's not allowed to be reordered.
 This is useful if a list of required objects is already tracked somewhere.
 Furthermore the lock helper can use propagate the -EALREADY return code back to
 the caller as a signal that an object is twice on the list. This is useful if
 the list is constructed from userspace input and the ABI requires userspace to
-not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl).
+not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl)::
 
-int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
+  int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
 	struct obj *res_obj = NULL;
 	struct obj_entry *contended_entry = NULL;
 	struct obj_entry *entry;
 
 	ww_acquire_init(ctx, &ww_class);
 
-retry:
+  retry:
 	list_for_each_entry (entry, list, head) {
 		if (entry->obj == res_obj) {
 			res_obj = NULL;
@@ -160,7 +162,7 @@ retry:
 	ww_acquire_done(ctx);
 	return 0;
 
-err:
+  err:
 	list_for_each_entry_continue_reverse (entry, list, head)
 		ww_mutex_unlock(&entry->obj->lock);
 
@@ -176,14 +178,14 @@ err:
 	ww_acquire_fini(ctx);
 
 	return ret;
-}
+  }
 
 Method 2, using a list in execbuf->buffers that can be reordered. Same semantics
 of duplicate entry detection using -EALREADY as method 1 above. But the
-list-reordering allows for a bit more idiomatic code.
+list-reordering allows for a bit more idiomatic code::
 
-int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
+  int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
 	struct obj_entry *entry, *entry2;
 
 	ww_acquire_init(ctx, &ww_class);
@@ -216,24 +218,25 @@ int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
 
 	ww_acquire_done(ctx);
 	return 0;
-}
+  }
 
-Unlocking works the same way for both methods #1 and #2:
+Unlocking works the same way for both methods #1 and #2::
 
-void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
+  void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
 	struct obj_entry *entry;
 
 	list_for_each_entry (entry, list, head)
 		ww_mutex_unlock(&entry->obj->lock);
 
 	ww_acquire_fini(ctx);
-}
+  }
 
 Method 3 is useful if the list of objects is constructed ad-hoc and not upfront,
 e.g. when adjusting edges in a graph where each node has its own ww_mutex lock,
 and edges can only be changed when holding the locks of all involved nodes. w/w
 mutexes are a natural fit for such a case for two reasons:
+
 - They can handle lock-acquisition in any order which allows us to start walking
   a graph from a starting point and then iteratively discovering new edges and
   locking down the nodes those edges connect to.
@@ -243,6 +246,7 @@ mutexes are a natural fit for such a case for two reasons:
   as a starting point).
 
 Note that this approach differs in two important ways from the above methods:
+
 - Since the list of objects is dynamically constructed (and might very well be
   different when retrying due to hitting the -EDEADLK die condition) there's
   no need to keep any object on a persistent list when it's not locked. We can
@@ -260,17 +264,17 @@ any interface misuse for these cases.
 
 Also, method 3 can't fail the lock acquisition step since it doesn't return
 -EALREADY. Of course this would be different when using the _interruptible
-variants, but that's outside of the scope of these examples here.
+variants, but that's outside of the scope of these examples here::
 
-struct obj {
+  struct obj {
 	struct ww_mutex ww_mutex;
 	struct list_head locked_list;
-};
+  };
 
-static DEFINE_WW_CLASS(ww_class);
+  static DEFINE_WW_CLASS(ww_class);
 
-void __unlock_objs(struct list_head *list)
-{
+  void __unlock_objs(struct list_head *list)
+  {
 	struct obj *entry, *temp;
 
 	list_for_each_entry_safe (entry, temp, list, locked_list) {
@@ -279,15 +283,15 @@ void __unlock_objs(struct list_head *list)
 		list_del(&entry->locked_list);
 		ww_mutex_unlock(entry->ww_mutex)
 	}
-}
+  }
 
-void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
+  void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
 	struct obj *obj;
 
 	ww_acquire_init(ctx, &ww_class);
 
-retry:
+  retry:
 	/* re-init loop start state */
 	loop {
 		/* magic code which walks over a graph and decides which objects
@@ -312,13 +316,13 @@ retry:
 
 	ww_acquire_done(ctx);
 	return 0;
-}
+  }
 
-void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
+  void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
 	__unlock_objs(list);
 	ww_acquire_fini(ctx);
-}
+  }
 
 Method 4: Only lock one single objects. In that case deadlock detection and
 prevention is obviously overkill, since with grabbing just one lock you can't
@@ -329,11 +333,14 @@ Implementation Details
 ----------------------
 
 Design:
+^^^^^^^
+
   ww_mutex currently encapsulates a struct mutex, this means no extra overhead for
   normal mutex locks, which are far more common. As such there is only a small
   increase in code size if wait/wound mutexes are not used.
 
   We maintain the following invariants for the wait list:
+
   (1) Waiters with an acquire context are sorted by stamp order; waiters
       without an acquire context are interspersed in FIFO order.
   (2) For Wait-Die, among waiters with contexts, only the first one can have
@@ -355,6 +362,8 @@ Design:
   therefore be directed towards the uncontended cases.
 
 Lockdep:
+^^^^^^^^
+
   Special care has been taken to warn for as many cases of api abuse
   as possible. Some common api abuses will be caught with
   CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended.
@@ -379,5 +388,6 @@ Lockdep:
   having called ww_acquire_fini on the first.
   - 'normal' deadlocks that can occur.
 
-FIXME: Update this section once we have the TASK_DEADLOCK task state flag magic
-implemented.
+FIXME:
+  Update this section once we have the TASK_DEADLOCK task state flag magic
+  implemented.