author	Paul E. McKenney <paulmck@us.ibm.com>	2006-10-04 05:17:02 -0400
committer	Linus Torvalds <torvalds@g5.osdl.org>	2006-10-04 10:55:30 -0400
commit	621934ee7ed5b073c7fd638b347e632c53572761 (patch)
tree	5722f9cda22c099ad60545f963410dcbc762ee65
parent	95d77884c77beed676036d2f74d10b470a483c63 (diff)
[PATCH] srcu-3: RCU variant permitting read-side blocking
Updated patch adding a variant of RCU that permits sleeping in read-side
critical sections.  SRCU is as follows:

o	Each use of SRCU creates its own srcu_struct, and each srcu_struct
	has its own set of grace periods.  This is critical, as it prevents
	one subsystem with a blocking reader from holding up SRCU grace
	periods for other subsystems.

o	The SRCU primitives (srcu_read_lock(), srcu_read_unlock(), and
	synchronize_srcu()) all take a pointer to a srcu_struct.

o	The SRCU primitives must be called from process context.

o	srcu_read_lock() returns an int that must be passed to the matching
	srcu_read_unlock().  Realtime RCU avoids the need for this by
	storing the state in the task struct, but SRCU needs to allow a
	given code path to pass through multiple SRCU domains -- storing
	state in the task struct would therefore require either arbitrary
	space in the task struct or arbitrary limits on SRCU nesting.  So I
	kicked the state-storage problem up to the caller.  Of course, it
	is not permitted to call synchronize_srcu() while in an SRCU
	read-side critical section.

o	There is no call_srcu().  It would not be hard to implement one,
	but it seems like too easy a way to OOM the system.  (Hey, we have
	enough trouble with call_rcu(), which does -not- permit readers to
	sleep!!!)  So, if you want it, please tell me why...

[josht@us.ibm.com: sparse notation]
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-rw-r--r--	Documentation/RCU/checklist.txt	38
-rw-r--r--	Documentation/RCU/rcu.txt	3
-rw-r--r--	Documentation/RCU/whatisRCU.txt	3
-rw-r--r--	include/linux/srcu.h	49
-rw-r--r--	kernel/Makefile	2
-rw-r--r--	kernel/srcu.c	257
6 files changed, 350 insertions(+), 2 deletions(-)
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 1d50cf0c905e..f4dffadbcb00 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -221,3 +221,41 @@ over a rather long period of time, but improvements are always welcome!
 	disable irq on a given acquisition of that lock will result in
 	deadlock as soon as the RCU callback happens to interrupt that
 	acquisition's critical section.
+
+13.	SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
+	may only be invoked from process context.  Unlike other forms of
+	RCU, it -is- permissible to block in an SRCU read-side critical
+	section (demarked by srcu_read_lock() and srcu_read_unlock()),
+	hence the "SRCU": "sleepable RCU".  Please note that if you
+	don't need to sleep in read-side critical sections, you should
+	be using RCU rather than SRCU, because RCU is almost always
+	faster and easier to use than is SRCU.
+
+	Also unlike other forms of RCU, explicit initialization
+	and cleanup is required via init_srcu_struct() and
+	cleanup_srcu_struct().  These are passed a "struct srcu_struct"
+	that defines the scope of a given SRCU domain.  Once initialized,
+	the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock()
+	and synchronize_srcu().  A given synchronize_srcu() waits only
+	for SRCU read-side critical sections governed by srcu_read_lock()
+	and srcu_read_unlock() calls that have been passed the same
+	srcu_struct.  This property is what makes sleeping read-side
+	critical sections tolerable -- a given subsystem delays only
+	its own updates, not those of other subsystems using SRCU.
+	Therefore, SRCU is less prone to OOM the system than RCU would
+	be if RCU's read-side critical sections were permitted to
+	sleep.
+
+	The ability to sleep in read-side critical sections does not
+	come for free.  First, corresponding srcu_read_lock() and
+	srcu_read_unlock() calls must be passed the same srcu_struct.
+	Second, grace-period-detection overhead is amortized only
+	over those updates sharing a given srcu_struct, rather than
+	being globally amortized as they are for other forms of RCU.
+	Therefore, SRCU should be used in preference to rw_semaphore
+	only in extremely read-intensive situations, or in situations
+	requiring SRCU's read-side deadlock immunity or low read-side
+	realtime latency.
+
+	Note that rcu_assign_pointer() and rcu_dereference() relate to
+	SRCU just as they do to other forms of RCU.
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
index 02e27bf1d365..f84407cba816 100644
--- a/Documentation/RCU/rcu.txt
+++ b/Documentation/RCU/rcu.txt
@@ -45,7 +45,8 @@ o How can I see where RCU is currently used in the Linux kernel?
 
 	Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
 	"rcu_read_lock_bh", "rcu_read_unlock_bh", "call_rcu_bh",
-	"synchronize_rcu", and "synchronize_net".
+	"srcu_read_lock", "srcu_read_unlock", "synchronize_rcu",
+	"synchronize_net", and "synchronize_srcu".
 
 o	What guidelines should I follow when writing code that uses RCU?
 
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 820fee236967..e0d6d99b8f9b 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -778,6 +778,8 @@ Markers for RCU read-side critical sections:
 	rcu_read_unlock
 	rcu_read_lock_bh
 	rcu_read_unlock_bh
+	srcu_read_lock
+	srcu_read_unlock
 
 RCU pointer/list traversal:
 
@@ -804,6 +806,7 @@ RCU grace period:
 	synchronize_net
 	synchronize_sched
 	synchronize_rcu
+	synchronize_srcu
 	call_rcu
 	call_rcu_bh
 
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
new file mode 100644
index 000000000000..947fdab2ddb0
--- /dev/null
+++ b/include/linux/srcu.h
@@ -0,0 +1,49 @@
+/*
+ * Sleepable Read-Copy Update mechanism for mutual exclusion
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ *
+ * Author: Paul McKenney <paulmck@us.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ *	Documentation/RCU/ *.txt
+ *
+ */
+
+struct srcu_struct_array {
+	int c[2];
+};
+
+struct srcu_struct {
+	int completed;
+	struct srcu_struct_array *per_cpu_ref;
+	struct mutex mutex;
+};
+
+#ifndef CONFIG_PREEMPT
+#define srcu_barrier() barrier()
+#else /* #ifndef CONFIG_PREEMPT */
+#define srcu_barrier()
+#endif /* #else #ifndef CONFIG_PREEMPT */
+
+void init_srcu_struct(struct srcu_struct *sp);
+void cleanup_srcu_struct(struct srcu_struct *sp);
+int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
+void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
+void synchronize_srcu(struct srcu_struct *sp);
+long srcu_batches_completed(struct srcu_struct *sp);
+void cleanup_srcu_struct(struct srcu_struct *sp);
diff --git a/kernel/Makefile b/kernel/Makefile
index d948ca12acf0..5e3f3b75563a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -8,7 +8,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o profile.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
-	    hrtimer.o rwsem.o latency.o nsproxy.o
+	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
 
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
diff --git a/kernel/srcu.c b/kernel/srcu.c
new file mode 100644
index 000000000000..7e1979f624ba
--- /dev/null
+++ b/kernel/srcu.c
@@ -0,0 +1,257 @@
+/*
+ * Sleepable Read-Copy Update mechanism for mutual exclusion.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ *
+ * Author: Paul McKenney <paulmck@us.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ *	Documentation/RCU/ *.txt
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/srcu.h>
+
+/**
+ * init_srcu_struct - initialize a sleep-RCU structure
+ * @sp: structure to initialize.
+ *
+ * Must invoke this on a given srcu_struct before passing that srcu_struct
+ * to any other function.  Each srcu_struct represents a separate domain
+ * of SRCU protection.
+ */
+void init_srcu_struct(struct srcu_struct *sp)
+{
+	sp->completed = 0;
+	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+	mutex_init(&sp->mutex);
+}
+
+/*
+ * srcu_readers_active_idx -- returns approximate number of readers
+ *	active on the specified rank of per-CPU counters.
+ */
+
+static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
+{
+	int cpu;
+	int sum;
+
+	sum = 0;
+	for_each_possible_cpu(cpu)
+		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
+	return sum;
+}
+
+/**
+ * srcu_readers_active - returns approximate number of readers.
+ * @sp: which srcu_struct to count active readers (holding srcu_read_lock).
+ *
+ * Note that this is not an atomic primitive, and can therefore suffer
+ * severe errors when invoked on an active srcu_struct.  That said, it
+ * can be useful as an error check at cleanup time.
+ */
+int srcu_readers_active(struct srcu_struct *sp)
+{
+	return srcu_readers_active_idx(sp, 0) + srcu_readers_active_idx(sp, 1);
+}
+
+/**
+ * cleanup_srcu_struct - deconstruct a sleep-RCU structure
+ * @sp: structure to clean up.
+ *
+ * Must invoke this after you are finished using a given srcu_struct that
+ * was initialized via init_srcu_struct(), else you leak memory.
+ */
+void cleanup_srcu_struct(struct srcu_struct *sp)
+{
+	int sum;
+
+	sum = srcu_readers_active(sp);
+	WARN_ON(sum);	/* Leakage unless caller handles error. */
+	if (sum != 0)
+		return;
+	free_percpu(sp->per_cpu_ref);
+	sp->per_cpu_ref = NULL;
+}
+
+/**
+ * srcu_read_lock - register a new reader for an SRCU-protected structure.
+ * @sp: srcu_struct in which to register the new reader.
+ *
+ * Counts the new reader in the appropriate per-CPU element of the
+ * srcu_struct.  Must be called from process context.
+ * Returns an index that must be passed to the matching srcu_read_unlock().
+ */
+int srcu_read_lock(struct srcu_struct *sp)
+{
+	int idx;
+
+	preempt_disable();
+	idx = sp->completed & 0x1;
+	barrier();  /* ensure compiler looks -once- at sp->completed. */
+	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
+	srcu_barrier();  /* ensure compiler won't misorder critical section. */
+	preempt_enable();
+	return idx;
+}
+
+/**
+ * srcu_read_unlock - unregister an old reader from an SRCU-protected structure.
+ * @sp: srcu_struct in which to unregister the old reader.
+ * @idx: return value from corresponding srcu_read_lock().
+ *
+ * Removes the count for the old reader from the appropriate per-CPU
+ * element of the srcu_struct.  Note that this may well be a different
+ * CPU than that which was incremented by the corresponding srcu_read_lock().
+ * Must be called from process context.
+ */
+void srcu_read_unlock(struct srcu_struct *sp, int idx)
+{
+	preempt_disable();
+	srcu_barrier();  /* ensure compiler won't misorder critical section. */
+	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
+	preempt_enable();
+}
+
+/**
+ * synchronize_srcu - wait for prior SRCU read-side critical-section completion
+ * @sp: srcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates.  Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_srcu() from the corresponding
+ * SRCU read-side critical section; doing so will result in deadlock.
+ * However, it is perfectly legal to call synchronize_srcu() on one
+ * srcu_struct from some other srcu_struct's read-side critical section.
+ */
+void synchronize_srcu(struct srcu_struct *sp)
+{
+	int idx;
+
+	idx = sp->completed;
+	mutex_lock(&sp->mutex);
+
+	/*
+	 * Check to see if someone else did the work for us while we were
+	 * waiting to acquire the lock.  We need -two- advances of
+	 * the counter, not just one.  If there was but one, we might have
+	 * shown up -after- our helper's first synchronize_sched(), thus
+	 * having failed to prevent CPU-reordering races with concurrent
+	 * srcu_read_unlock()s on other CPUs (see comment below).  So we
+	 * either (1) wait for two or (2) supply the second ourselves.
+	 */
+
+	if ((sp->completed - idx) >= 2) {
+		mutex_unlock(&sp->mutex);
+		return;
+	}
+
+	synchronize_sched();  /* Force memory barrier on all CPUs. */
+
+	/*
+	 * The preceding synchronize_sched() ensures that any CPU that
+	 * sees the new value of sp->completed will also see any preceding
+	 * changes to data structures made by this CPU.  This prevents
+	 * some other CPU from reordering the accesses in its SRCU
+	 * read-side critical section to precede the corresponding
+	 * srcu_read_lock() -- ensuring that such references will in
+	 * fact be protected.
+	 *
+	 * So it is now safe to do the flip.
+	 */
+
+	idx = sp->completed & 0x1;
+	sp->completed++;
+
+	synchronize_sched();  /* Force memory barrier on all CPUs. */
+
+	/*
+	 * At this point, because of the preceding synchronize_sched(),
+	 * all srcu_read_lock() calls using the old counters have completed.
+	 * Their corresponding critical sections might well be still
+	 * executing, but the srcu_read_lock() primitives themselves
+	 * will have finished executing.
+	 */
+
+	while (srcu_readers_active_idx(sp, idx))
+		schedule_timeout_interruptible(1);
+
+	synchronize_sched();  /* Force memory barrier on all CPUs. */
+
+	/*
+	 * The preceding synchronize_sched() forces all srcu_read_unlock()
+	 * primitives that were executing concurrently with the preceding
+	 * for_each_possible_cpu() loop to have completed by this point.
+	 * More importantly, it also forces the corresponding SRCU read-side
+	 * critical sections to have also completed, and the corresponding
+	 * references to SRCU-protected data items to be dropped.
+	 *
+	 * Note:
+	 *
+	 *	Despite what you might think at first glance, the
+	 *	preceding synchronize_sched() -must- be within the
+	 *	critical section ended by the following mutex_unlock().
+	 *	Otherwise, a task taking the early exit can race
+	 *	with a srcu_read_unlock(), which might have executed
+	 *	just before the preceding srcu_readers_active() check,
+	 *	and whose CPU might have reordered the srcu_read_unlock()
+	 *	with the preceding critical section.  In this case, there
+	 *	is nothing preventing the synchronize_sched() task that is
+	 *	taking the early exit from freeing a data structure that
+	 *	is still being referenced (out of order) by the task
+	 *	doing the srcu_read_unlock().
+	 *
+	 *	Alternatively, the comparison with "2" on the early exit
+	 *	could be changed to "3", but this increases synchronize_srcu()
+	 *	latency for bulk loads.  So the current code is preferred.
+	 */
+
+	mutex_unlock(&sp->mutex);
+}
+
+/**
+ * srcu_batches_completed - return batches completed.
+ * @sp: srcu_struct on which to report batch completion.
+ *
+ * Report the number of batches, correlated with, but not necessarily
+ * precisely the same as, the number of grace periods that have elapsed.
+ */
+
+long srcu_batches_completed(struct srcu_struct *sp)
+{
+	return sp->completed;
+}
+
+EXPORT_SYMBOL_GPL(init_srcu_struct);
+EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
+EXPORT_SYMBOL_GPL(srcu_read_lock);
+EXPORT_SYMBOL_GPL(srcu_read_unlock);
+EXPORT_SYMBOL_GPL(synchronize_srcu);
+EXPORT_SYMBOL_GPL(srcu_batches_completed);
+EXPORT_SYMBOL_GPL(srcu_readers_active);