diff options
author | Ingo Molnar <mingo@elte.hu> | 2011-12-14 02:16:43 -0500 |
---|---|---|
committer | Ingo Molnar <mingo@elte.hu> | 2011-12-14 02:16:43 -0500 |
commit | 919b83452b2e7c1dbced0456015508b4b9585db3 (patch) | |
tree | 836d0c32b814f7bd5fed83e19b6a2ab77dcf6987 /Documentation | |
parent | 373da0a2a33018d560afcb2c77f8842985d79594 (diff) | |
parent | a513f6bab0939800dcf1e7c075e733420cf967c5 (diff) |
Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/RCU/checklist.txt | 6 | ||||
-rw-r--r-- | Documentation/RCU/rcu.txt | 10 | ||||
-rw-r--r-- | Documentation/RCU/stallwarn.txt | 16 | ||||
-rw-r--r-- | Documentation/RCU/torture.txt | 13 | ||||
-rw-r--r-- | Documentation/RCU/trace.txt | 4 | ||||
-rw-r--r-- | Documentation/RCU/whatisRCU.txt | 19 | ||||
-rw-r--r-- | Documentation/atomic_ops.txt | 87 | ||||
-rw-r--r-- | Documentation/lockdep-design.txt | 63 |
8 files changed, 198 insertions, 20 deletions
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 0c134f8afc6f..bff2d8be1e18 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt | |||
@@ -328,6 +328,12 @@ over a rather long period of time, but improvements are always welcome! | |||
328 | RCU rather than SRCU, because RCU is almost always faster and | 328 | RCU rather than SRCU, because RCU is almost always faster and |
329 | easier to use than is SRCU. | 329 | easier to use than is SRCU. |
330 | 330 | ||
331 | If you need to enter your read-side critical section in a | ||
332 | hardirq or exception handler, and then exit that same read-side | ||
333 | critical section in the task that was interrupted, then you need | ||
334 | to srcu_read_lock_raw() and srcu_read_unlock_raw(), which avoid | ||
335 | the lockdep checking that would otherwise this practice illegal. | ||
336 | |||
331 | Also unlike other forms of RCU, explicit initialization | 337 | Also unlike other forms of RCU, explicit initialization |
332 | and cleanup is required via init_srcu_struct() and | 338 | and cleanup is required via init_srcu_struct() and |
333 | cleanup_srcu_struct(). These are passed a "struct srcu_struct" | 339 | cleanup_srcu_struct(). These are passed a "struct srcu_struct" |
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 31852705b586..bf778332a28f 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt | |||
@@ -38,11 +38,11 @@ o How can the updater tell when a grace period has completed | |||
38 | 38 | ||
39 | Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the | 39 | Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the |
40 | same effect, but require that the readers manipulate CPU-local | 40 | same effect, but require that the readers manipulate CPU-local |
41 | counters. These counters allow limited types of blocking | 41 | counters. These counters allow limited types of blocking within |
42 | within RCU read-side critical sections. SRCU also uses | 42 | RCU read-side critical sections. SRCU also uses CPU-local |
43 | CPU-local counters, and permits general blocking within | 43 | counters, and permits general blocking within RCU read-side |
44 | RCU read-side critical sections. These two variants of | 44 | critical sections. These variants of RCU detect grace periods |
45 | RCU detect grace periods by sampling these counters. | 45 | by sampling these counters. |
46 | 46 | ||
47 | o If I am running on a uniprocessor kernel, which can only do one | 47 | o If I am running on a uniprocessor kernel, which can only do one |
48 | thing at a time, why should I wait for a grace period? | 48 | thing at a time, why should I wait for a grace period? |
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 4e959208f736..083d88cbc089 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt | |||
@@ -101,6 +101,11 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that | |||
101 | CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning | 101 | CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning |
102 | messages. | 102 | messages. |
103 | 103 | ||
104 | o A hardware or software issue shuts off the scheduler-clock | ||
105 | interrupt on a CPU that is not in dyntick-idle mode. This | ||
106 | problem really has happened, and seems to be most likely to | ||
107 | result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels. | ||
108 | |||
104 | o A bug in the RCU implementation. | 109 | o A bug in the RCU implementation. |
105 | 110 | ||
106 | o A hardware failure. This is quite unlikely, but has occurred | 111 | o A hardware failure. This is quite unlikely, but has occurred |
@@ -109,12 +114,11 @@ o A hardware failure. This is quite unlikely, but has occurred | |||
109 | This resulted in a series of RCU CPU stall warnings, eventually | 114 | This resulted in a series of RCU CPU stall warnings, eventually |
110 | leading the realization that the CPU had failed. | 115 | leading the realization that the CPU had failed. |
111 | 116 | ||
112 | The RCU, RCU-sched, and RCU-bh implementations have CPU stall | 117 | The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning. |
113 | warning. SRCU does not have its own CPU stall warnings, but its | 118 | SRCU does not have its own CPU stall warnings, but its calls to |
114 | calls to synchronize_sched() will result in RCU-sched detecting | 119 | synchronize_sched() will result in RCU-sched detecting RCU-sched-related |
115 | RCU-sched-related CPU stalls. Please note that RCU only detects | 120 | CPU stalls. Please note that RCU only detects CPU stalls when there is |
116 | CPU stalls when there is a grace period in progress. No grace period, | 121 | a grace period in progress. No grace period, no CPU stall warnings. |
117 | no CPU stall warnings. | ||
118 | 122 | ||
119 | To diagnose the cause of the stall, inspect the stack traces. | 123 | To diagnose the cause of the stall, inspect the stack traces. |
120 | The offending function will usually be near the top of the stack. | 124 | The offending function will usually be near the top of the stack. |
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 783d6c134d3f..d67068d0d2b9 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt | |||
@@ -61,11 +61,24 @@ nreaders This is the number of RCU reading threads supported. | |||
61 | To properly exercise RCU implementations with preemptible | 61 | To properly exercise RCU implementations with preemptible |
62 | read-side critical sections. | 62 | read-side critical sections. |
63 | 63 | ||
64 | onoff_interval | ||
65 | The number of seconds between each attempt to execute a | ||
66 | randomly selected CPU-hotplug operation. Defaults to | ||
67 | zero, which disables CPU hotplugging. In HOTPLUG_CPU=n | ||
68 | kernels, rcutorture will silently refuse to do any | ||
69 | CPU-hotplug operations regardless of what value is | ||
70 | specified for onoff_interval. | ||
71 | |||
64 | shuffle_interval | 72 | shuffle_interval |
65 | The number of seconds to keep the test threads affinitied | 73 | The number of seconds to keep the test threads affinitied |
66 | to a particular subset of the CPUs, defaults to 3 seconds. | 74 | to a particular subset of the CPUs, defaults to 3 seconds. |
67 | Used in conjunction with test_no_idle_hz. | 75 | Used in conjunction with test_no_idle_hz. |
68 | 76 | ||
77 | shutdown_secs The number of seconds to run the test before terminating | ||
78 | the test and powering off the system. The default is | ||
79 | zero, which disables test termination and system shutdown. | ||
80 | This capability is useful for automated testing. | ||
81 | |||
69 | stat_interval The number of seconds between output of torture | 82 | stat_interval The number of seconds between output of torture |
70 | statistics (via printk()). Regardless of the interval, | 83 | statistics (via printk()). Regardless of the interval, |
71 | statistics are printed when the module is unloaded. | 84 | statistics are printed when the module is unloaded. |
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index aaf65f6c6cd7..49587abfc2f7 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt | |||
@@ -105,14 +105,10 @@ o "dt" is the current value of the dyntick counter that is incremented | |||
105 | or one greater than the interrupt-nesting depth otherwise. | 105 | or one greater than the interrupt-nesting depth otherwise. |
106 | The number after the second "/" is the NMI nesting depth. | 106 | The number after the second "/" is the NMI nesting depth. |
107 | 107 | ||
108 | This field is displayed only for CONFIG_NO_HZ kernels. | ||
109 | |||
110 | o "df" is the number of times that some other CPU has forced a | 108 | o "df" is the number of times that some other CPU has forced a |
111 | quiescent state on behalf of this CPU due to this CPU being in | 109 | quiescent state on behalf of this CPU due to this CPU being in |
112 | dynticks-idle state. | 110 | dynticks-idle state. |
113 | 111 | ||
114 | This field is displayed only for CONFIG_NO_HZ kernels. | ||
115 | |||
116 | o "of" is the number of times that some other CPU has forced a | 112 | o "of" is the number of times that some other CPU has forced a |
117 | quiescent state on behalf of this CPU due to this CPU being | 113 | quiescent state on behalf of this CPU due to this CPU being |
118 | offline. In a perfect world, this might never happen, but it | 114 | offline. In a perfect world, this might never happen, but it |
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 6ef692667e2f..6bbe8dcdc3da 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt | |||
@@ -4,6 +4,7 @@ to start learning about RCU: | |||
4 | 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ | 4 | 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ |
5 | 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ | 5 | 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ |
6 | 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ | 6 | 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ |
7 | 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ | ||
7 | 8 | ||
8 | 9 | ||
9 | What is RCU? | 10 | What is RCU? |
@@ -834,6 +835,8 @@ SRCU: Critical sections Grace period Barrier | |||
834 | 835 | ||
835 | srcu_read_lock synchronize_srcu N/A | 836 | srcu_read_lock synchronize_srcu N/A |
836 | srcu_read_unlock synchronize_srcu_expedited | 837 | srcu_read_unlock synchronize_srcu_expedited |
838 | srcu_read_lock_raw | ||
839 | srcu_read_unlock_raw | ||
837 | srcu_dereference | 840 | srcu_dereference |
838 | 841 | ||
839 | SRCU: Initialization/cleanup | 842 | SRCU: Initialization/cleanup |
@@ -855,27 +858,33 @@ list can be helpful: | |||
855 | 858 | ||
856 | a. Will readers need to block? If so, you need SRCU. | 859 | a. Will readers need to block? If so, you need SRCU. |
857 | 860 | ||
858 | b. What about the -rt patchset? If readers would need to block | 861 | b. Is it necessary to start a read-side critical section in a |
862 | hardirq handler or exception handler, and then to complete | ||
863 | this read-side critical section in the task that was | ||
864 | interrupted? If so, you need SRCU's srcu_read_lock_raw() and | ||
865 | srcu_read_unlock_raw() primitives. | ||
866 | |||
867 | c. What about the -rt patchset? If readers would need to block | ||
859 | in an non-rt kernel, you need SRCU. If readers would block | 868 | in an non-rt kernel, you need SRCU. If readers would block |
860 | in a -rt kernel, but not in a non-rt kernel, SRCU is not | 869 | in a -rt kernel, but not in a non-rt kernel, SRCU is not |
861 | necessary. | 870 | necessary. |
862 | 871 | ||
863 | c. Do you need to treat NMI handlers, hardirq handlers, | 872 | d. Do you need to treat NMI handlers, hardirq handlers, |
864 | and code segments with preemption disabled (whether | 873 | and code segments with preemption disabled (whether |
865 | via preempt_disable(), local_irq_save(), local_bh_disable(), | 874 | via preempt_disable(), local_irq_save(), local_bh_disable(), |
866 | or some other mechanism) as if they were explicit RCU readers? | 875 | or some other mechanism) as if they were explicit RCU readers? |
867 | If so, you need RCU-sched. | 876 | If so, you need RCU-sched. |
868 | 877 | ||
869 | d. Do you need RCU grace periods to complete even in the face | 878 | e. Do you need RCU grace periods to complete even in the face |
870 | of softirq monopolization of one or more of the CPUs? For | 879 | of softirq monopolization of one or more of the CPUs? For |
871 | example, is your code subject to network-based denial-of-service | 880 | example, is your code subject to network-based denial-of-service |
872 | attacks? If so, you need RCU-bh. | 881 | attacks? If so, you need RCU-bh. |
873 | 882 | ||
874 | e. Is your workload too update-intensive for normal use of | 883 | f. Is your workload too update-intensive for normal use of |
875 | RCU, but inappropriate for other synchronization mechanisms? | 884 | RCU, but inappropriate for other synchronization mechanisms? |
876 | If so, consider SLAB_DESTROY_BY_RCU. But please be careful! | 885 | If so, consider SLAB_DESTROY_BY_RCU. But please be careful! |
877 | 886 | ||
878 | f. Otherwise, use RCU. | 887 | g. Otherwise, use RCU. |
879 | 888 | ||
880 | Of course, this all assumes that you have determined that RCU is in fact | 889 | Of course, this all assumes that you have determined that RCU is in fact |
881 | the right tool for your job. | 890 | the right tool for your job. |
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index 3bd585b44927..27f2b21a9d5c 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt | |||
@@ -84,6 +84,93 @@ compiler optimizes the section accessing atomic_t variables. | |||
84 | 84 | ||
85 | *** YOU HAVE BEEN WARNED! *** | 85 | *** YOU HAVE BEEN WARNED! *** |
86 | 86 | ||
87 | Properly aligned pointers, longs, ints, and chars (and unsigned | ||
88 | equivalents) may be atomically loaded from and stored to in the same | ||
89 | sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE() | ||
90 | macro should be used to prevent the compiler from using optimizations | ||
91 | that might otherwise optimize accesses out of existence on the one hand, | ||
92 | or that might create unsolicited accesses on the other. | ||
93 | |||
94 | For example consider the following code: | ||
95 | |||
96 | while (a > 0) | ||
97 | do_something(); | ||
98 | |||
99 | If the compiler can prove that do_something() does not store to the | ||
100 | variable a, then the compiler is within its rights transforming this to | ||
101 | the following: | ||
102 | |||
103 | tmp = a; | ||
104 | if (a > 0) | ||
105 | for (;;) | ||
106 | do_something(); | ||
107 | |||
108 | If you don't want the compiler to do this (and you probably don't), then | ||
109 | you should use something like the following: | ||
110 | |||
111 | while (ACCESS_ONCE(a) < 0) | ||
112 | do_something(); | ||
113 | |||
114 | Alternatively, you could place a barrier() call in the loop. | ||
115 | |||
116 | For another example, consider the following code: | ||
117 | |||
118 | tmp_a = a; | ||
119 | do_something_with(tmp_a); | ||
120 | do_something_else_with(tmp_a); | ||
121 | |||
122 | If the compiler can prove that do_something_with() does not store to the | ||
123 | variable a, then the compiler is within its rights to manufacture an | ||
124 | additional load as follows: | ||
125 | |||
126 | tmp_a = a; | ||
127 | do_something_with(tmp_a); | ||
128 | tmp_a = a; | ||
129 | do_something_else_with(tmp_a); | ||
130 | |||
131 | This could fatally confuse your code if it expected the same value | ||
132 | to be passed to do_something_with() and do_something_else_with(). | ||
133 | |||
134 | The compiler would be likely to manufacture this additional load if | ||
135 | do_something_with() was an inline function that made very heavy use | ||
136 | of registers: reloading from variable a could save a flush to the | ||
137 | stack and later reload. To prevent the compiler from attacking your | ||
138 | code in this manner, write the following: | ||
139 | |||
140 | tmp_a = ACCESS_ONCE(a); | ||
141 | do_something_with(tmp_a); | ||
142 | do_something_else_with(tmp_a); | ||
143 | |||
144 | For a final example, consider the following code, assuming that the | ||
145 | variable a is set at boot time before the second CPU is brought online | ||
146 | and never changed later, so that memory barriers are not needed: | ||
147 | |||
148 | if (a) | ||
149 | b = 9; | ||
150 | else | ||
151 | b = 42; | ||
152 | |||
153 | The compiler is within its rights to manufacture an additional store | ||
154 | by transforming the above code into the following: | ||
155 | |||
156 | b = 42; | ||
157 | if (a) | ||
158 | b = 9; | ||
159 | |||
160 | This could come as a fatal surprise to other code running concurrently | ||
161 | that expected b to never have the value 42 if a was zero. To prevent | ||
162 | the compiler from doing this, write something like: | ||
163 | |||
164 | if (a) | ||
165 | ACCESS_ONCE(b) = 9; | ||
166 | else | ||
167 | ACCESS_ONCE(b) = 42; | ||
168 | |||
169 | Don't even -think- about doing this without proper use of memory barriers, | ||
170 | locks, or atomic operations if variable a can change at runtime! | ||
171 | |||
172 | *** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! *** | ||
173 | |||
87 | Now, we move onto the atomic operation interfaces typically implemented with | 174 | Now, we move onto the atomic operation interfaces typically implemented with |
88 | the help of assembly code. | 175 | the help of assembly code. |
89 | 176 | ||
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt index abf768c681e2..5dbc99c04f6e 100644 --- a/Documentation/lockdep-design.txt +++ b/Documentation/lockdep-design.txt | |||
@@ -221,3 +221,66 @@ when the chain is validated for the first time, is then put into a hash | |||
221 | table, which hash-table can be checked in a lockfree manner. If the | 221 | table, which hash-table can be checked in a lockfree manner. If the |
222 | locking chain occurs again later on, the hash table tells us that we | 222 | locking chain occurs again later on, the hash table tells us that we |
223 | dont have to validate the chain again. | 223 | dont have to validate the chain again. |
224 | |||
225 | Troubleshooting: | ||
226 | ---------------- | ||
227 | |||
228 | The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes. | ||
229 | Exceeding this number will trigger the following lockdep warning: | ||
230 | |||
231 | (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS)) | ||
232 | |||
233 | By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical | ||
234 | desktop systems have less than 1,000 lock classes, so this warning | ||
235 | normally results from lock-class leakage or failure to properly | ||
236 | initialize locks. These two problems are illustrated below: | ||
237 | |||
238 | 1. Repeated module loading and unloading while running the validator | ||
239 | will result in lock-class leakage. The issue here is that each | ||
240 | load of the module will create a new set of lock classes for | ||
241 | that module's locks, but module unloading does not remove old | ||
242 | classes (see below discussion of reuse of lock classes for why). | ||
243 | Therefore, if that module is loaded and unloaded repeatedly, | ||
244 | the number of lock classes will eventually reach the maximum. | ||
245 | |||
246 | 2. Using structures such as arrays that have large numbers of | ||
247 | locks that are not explicitly initialized. For example, | ||
248 | a hash table with 8192 buckets where each bucket has its own | ||
249 | spinlock_t will consume 8192 lock classes -unless- each spinlock | ||
250 | is explicitly initialized at runtime, for example, using the | ||
251 | run-time spin_lock_init() as opposed to compile-time initializers | ||
252 | such as __SPIN_LOCK_UNLOCKED(). Failure to properly initialize | ||
253 | the per-bucket spinlocks would guarantee lock-class overflow. | ||
254 | In contrast, a loop that called spin_lock_init() on each lock | ||
255 | would place all 8192 locks into a single lock class. | ||
256 | |||
257 | The moral of this story is that you should always explicitly | ||
258 | initialize your locks. | ||
259 | |||
260 | One might argue that the validator should be modified to allow | ||
261 | lock classes to be reused. However, if you are tempted to make this | ||
262 | argument, first review the code and think through the changes that would | ||
263 | be required, keeping in mind that the lock classes to be removed are | ||
264 | likely to be linked into the lock-dependency graph. This turns out to | ||
265 | be harder to do than to say. | ||
266 | |||
267 | Of course, if you do run out of lock classes, the next thing to do is | ||
268 | to find the offending lock classes. First, the following command gives | ||
269 | you the number of lock classes currently in use along with the maximum: | ||
270 | |||
271 | grep "lock-classes" /proc/lockdep_stats | ||
272 | |||
273 | This command produces the following output on a modest system: | ||
274 | |||
275 | lock-classes: 748 [max: 8191] | ||
276 | |||
277 | If the number allocated (748 above) increases continually over time, | ||
278 | then there is likely a leak. The following command can be used to | ||
279 | identify the leaking lock classes: | ||
280 | |||
281 | grep "BD" /proc/lockdep | ||
282 | |||
283 | Run the command and save the output, then compare against the output from | ||
284 | a later run of this command to identify the leakers. This same output | ||
285 | can also help you find situations where runtime lock initialization has | ||
286 | been omitted. | ||