author: Glenn Elliott <gelliott@cs.unc.edu> 2012-03-04 19:47:13 -0500
committer: Glenn Elliott <gelliott@cs.unc.edu> 2012-03-04 19:47:13 -0500
commit: c71c03bda1e86c9d5198c5d83f712e695c4f2a1e
tree: ecb166cb3e2b7e2adb3b5e292245fefd23381ac8 /Documentation/RCU
parent: ea53c912f8a86a8567697115b6a0d8152beee5c8
parent: 6a00f206debf8a5c8899055726ad127dbeeed098

Merge branch 'mpi-master' into wip-k-fmlp
Conflicts: litmus/sched_cedf.c
Diffstat (limited to 'Documentation/RCU')
-rw-r--r--  Documentation/RCU/00-INDEX        2
-rw-r--r--  Documentation/RCU/checklist.txt  46
-rw-r--r--  Documentation/RCU/stallwarn.txt  41
-rw-r--r--  Documentation/RCU/trace.txt     440
-rw-r--r--  Documentation/RCU/whatisRCU.txt  31
5 files changed, 472 insertions, 88 deletions
diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX
index 71b6f500ddb9..1d7a885761f5 100644
--- a/Documentation/RCU/00-INDEX
+++ b/Documentation/RCU/00-INDEX
@@ -21,7 +21,7 @@ rcu.txt
21RTFP.txt 21RTFP.txt
22 - List of RCU papers (bibliography) going back to 1980. 22 - List of RCU papers (bibliography) going back to 1980.
23stallwarn.txt 23stallwarn.txt
24 - RCU CPU stall warnings (CONFIG_RCU_CPU_STALL_DETECTOR) 24 - RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress)
25torture.txt 25torture.txt
26 - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST) 26 - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST)
27trace.txt 27trace.txt
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 790d1a812376..0c134f8afc6f 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -218,13 +218,22 @@ over a rather long period of time, but improvements are always welcome!
218 include: 218 include:
219 219
220 a. Keeping a count of the number of data-structure elements 220 a. Keeping a count of the number of data-structure elements
221 used by the RCU-protected data structure, including those 221 used by the RCU-protected data structure, including
222 waiting for a grace period to elapse. Enforce a limit 222 those waiting for a grace period to elapse. Enforce a
223 on this number, stalling updates as needed to allow 223 limit on this number, stalling updates as needed to allow
224 previously deferred frees to complete. 224 previously deferred frees to complete. Alternatively,
225 225 limit only the number awaiting deferred free rather than
226 Alternatively, limit only the number awaiting deferred 226 the total number of elements.
227 free rather than the total number of elements. 227
228 One way to stall the updates is to acquire the update-side
229 mutex. (Don't try this with a spinlock -- other CPUs
230 spinning on the lock could prevent the grace period
231 from ever ending.) Another way to stall the updates
232 is for the updates to use a wrapper function around
233 the memory allocator, so that this wrapper function
234 simulates OOM when there is too much memory awaiting an
235 RCU grace period. There are of course many other
236 variations on this theme.
228 237
229 b. Limiting update rate. For example, if updates occur only 238 b. Limiting update rate. For example, if updates occur only
230 once per hour, then no explicit rate limiting is required, 239 once per hour, then no explicit rate limiting is required,
@@ -365,3 +374,26 @@ over a rather long period of time, but improvements are always welcome!
365 and the compiler to freely reorder code into and out of RCU 374 and the compiler to freely reorder code into and out of RCU
366 read-side critical sections. It is the responsibility of the 375 read-side critical sections. It is the responsibility of the
367 RCU update-side primitives to deal with this. 376 RCU update-side primitives to deal with this.
377
37817. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
379 the __rcu sparse checks to validate your RCU code. These
380 can help find problems as follows:
381
382 CONFIG_PROVE_RCU: check that accesses to RCU-protected data
383 structures are carried out under the proper RCU
384 read-side critical section, while holding the right
385 combination of locks, or whatever other conditions
386 are appropriate.
387
388 CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
389 same object to call_rcu() (or friends) before an RCU
390 grace period has elapsed since the last time that you
391 passed that same object to call_rcu() (or friends).
392
393 __rcu sparse checks: tag the pointer to the RCU-protected data
394 structure with __rcu, and sparse will warn you if you
395 access that pointer without the services of one of the
396 variants of rcu_dereference().
397
398 These debugging aids can help you find problems that are
399 otherwise extremely difficult to spot.
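
The allocator-wrapper approach added to item (a) of this checklist hunk can be modeled in a few lines. This is only a toy model in Python, not kernel code: the class and method names are hypothetical, and defer_free() and grace_period_elapsed() merely stand in for call_rcu() and for the callbacks running after a grace period.

```python
class DeferredFreeLimiter:
    """Toy model of an allocator wrapper that simulates OOM when too many
    elements are awaiting an RCU grace period (all names are hypothetical)."""

    def __init__(self, max_pending: int):
        self.max_pending = max_pending
        self.pending = 0  # elements queued for deferred free

    def alloc(self):
        # Refuse the allocation (simulate OOM) while the backlog is too large.
        if self.pending >= self.max_pending:
            return None
        return object()

    def defer_free(self, obj) -> None:
        # Stands in for handing the element to call_rcu().
        self.pending += 1

    def grace_period_elapsed(self) -> None:
        # Stands in for the deferred frees completing.
        self.pending = 0


limiter = DeferredFreeLimiter(max_pending=2)
first, second = limiter.alloc(), limiter.alloc()
limiter.defer_free(first)
limiter.defer_free(second)
blocked = limiter.alloc()        # None: the update must back off
limiter.grace_period_elapsed()
resumed = limiter.alloc()        # succeeds again once the backlog drains
```

Limiting only the number awaiting deferred free, as modeled here, is the alternative the checklist mentions to limiting the total element count.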
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 44c6dcc93d6d..4e959208f736 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -1,22 +1,25 @@
1Using RCU's CPU Stall Detector 1Using RCU's CPU Stall Detector
2 2
3The CONFIG_RCU_CPU_STALL_DETECTOR kernel config parameter enables 3The rcu_cpu_stall_suppress module parameter enables RCU's CPU stall
4RCU's CPU stall detector, which detects conditions that unduly delay 4detector, which detects conditions that unduly delay RCU grace periods.
5RCU grace periods. The stall detector's idea of what constitutes 5This module parameter enables CPU stall detection by default, but
6"unduly delayed" is controlled by a set of C preprocessor macros: 6may be overridden via boot-time parameter or at runtime via sysfs.
7The stall detector's idea of what constitutes "unduly delayed" is
8controlled by a set of kernel configuration variables and cpp macros:
7 9
8RCU_SECONDS_TILL_STALL_CHECK 10CONFIG_RCU_CPU_STALL_TIMEOUT
9 11
10 This macro defines the period of time that RCU will wait from 12 This kernel configuration parameter defines the period of time
11 the beginning of a grace period until it issues an RCU CPU 13 that RCU will wait from the beginning of a grace period until it
12 stall warning. This time period is normally ten seconds. 14 issues an RCU CPU stall warning. This time period is normally
15 ten seconds.
13 16
14RCU_SECONDS_TILL_STALL_RECHECK 17RCU_SECONDS_TILL_STALL_RECHECK
15 18
16 This macro defines the period of time that RCU will wait after 19 This macro defines the period of time that RCU will wait after
17 issuing a stall warning until it issues another stall warning 20 issuing a stall warning until it issues another stall warning
18 for the same stall. This time period is normally set to thirty 21 for the same stall. This time period is normally set to three
19 seconds. 22 times the check interval plus thirty seconds.
20 23
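
As a quick check of the arithmetic in the updated RCU_SECONDS_TILL_STALL_RECHECK text, here is a minimal sketch. It assumes the "check interval" is the CONFIG_RCU_CPU_STALL_TIMEOUT value in seconds; the function name is hypothetical.

```python
def stall_recheck_seconds(stall_timeout_s: int) -> int:
    """Delay between repeated warnings for the same stall, per the updated
    text: three times the check interval plus thirty seconds."""
    return 3 * stall_timeout_s + 30


# With the usual ten-second timeout, repeat warnings are 60 seconds apart.
recheck = stall_recheck_seconds(10)
```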
21RCU_STALL_RAT_DELAY 24RCU_STALL_RAT_DELAY
22 25
@@ -80,6 +83,24 @@ o A CPU looping with bottom halves disabled. This condition can
80o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel 83o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
81 without invoking schedule(). 84 without invoking schedule().
82 85
86o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
87 happen to preempt a low-priority task in the middle of an RCU
88 read-side critical section. This is especially damaging if
89 that low-priority task is not permitted to run on any other CPU,
90 in which case the next RCU grace period can never complete, which
91 will eventually cause the system to run out of memory and hang.
92 While the system is in the process of running itself out of
93 memory, you might see stall-warning messages.
94
95o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
96 is running at a higher priority than the RCU softirq threads.
97 This will prevent RCU callbacks from ever being invoked,
98 and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
99 RCU grace periods from ever completing. Either way, the
100 system will eventually run out of memory and hang. In the
101 CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
102 messages.
103
83o A bug in the RCU implementation. 104o A bug in the RCU implementation.
84 105
85o A hardware failure. This is quite unlikely, but has occurred 106o A hardware failure. This is quite unlikely, but has occurred
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index efd8cc95c06b..8173cec473aa 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -1,39 +1,55 @@
1CONFIG_RCU_TRACE debugfs Files and Formats 1CONFIG_RCU_TRACE debugfs Files and Formats
2 2
3 3
4The rcutree implementation of RCU provides debugfs trace output that 4The rcutree and rcutiny implementations of RCU provide debugfs trace
5summarizes counters and state. This information is useful for debugging 5output that summarizes counters and state. This information is useful for
6RCU itself, and can sometimes also help to debug abuses of RCU. 6debugging RCU itself, and can sometimes also help to debug abuses of RCU.
7The following sections describe the debugfs files and formats. 7The following sections describe the debugfs files and formats, first
8 8for rcutree and next for rcutiny.
9 9
10Hierarchical RCU debugfs Files and Formats 10
11 11CONFIG_TREE_RCU and CONFIG_TREE_PREEMPT_RCU debugfs Files and Formats
12This implementation of RCU provides three debugfs files under the 12
13top-level directory RCU: rcu/rcudata (which displays fields in struct 13These implementations of RCU provide several debugfs files under the
14rcu_data), rcu/rcugp (which displays grace-period counters), and 14top-level directory "rcu":
15rcu/rcuhier (which displays the struct rcu_node hierarchy). 15
16rcu/rcudata:
17 Displays fields in struct rcu_data.
18rcu/rcudata.csv:
19 Comma-separated values spreadsheet version of rcudata.
20rcu/rcugp:
21 Displays grace-period counters.
22rcu/rcuhier:
23 Displays the struct rcu_node hierarchy.
24rcu/rcu_pending:
25 Displays counts of the reasons rcu_pending() decided that RCU had
26 work to do.
27rcu/rcutorture:
28 Displays rcutorture test progress.
29rcu/rcuboost:
30 Displays RCU boosting statistics. Only present if
31 CONFIG_RCU_BOOST=y.
16 32
17The output of "cat rcu/rcudata" looks as follows: 33The output of "cat rcu/rcudata" looks as follows:
18 34
19rcu_sched: 35rcu_sched:
20 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10 36 0 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=545/1/0 df=50 of=0 ri=0 ql=163 qs=NRW. kt=0/W/0 ktl=ebc3 b=10 ci=153737 co=0 ca=0
21 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10 37 1 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=967/1/0 df=58 of=0 ri=0 ql=634 qs=NRW. kt=0/W/1 ktl=58c b=10 ci=191037 co=0 ca=0
22 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10 38 2 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=1081/1/0 df=175 of=0 ri=0 ql=74 qs=N.W. kt=0/W/2 ktl=da94 b=10 ci=75991 co=0 ca=0
23 3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1 dn=0 df=1545 of=0 ri=0 ql=0 b=10 39 3 c=20942 g=20943 pq=1 pqc=20942 qp=1 dt=1846/0/0 df=404 of=0 ri=0 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=72261 co=0 ca=0
24 4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1 dn=0 df=1992 of=0 ri=0 ql=0 b=10 40 4 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=369/1/0 df=83 of=0 ri=0 ql=48 qs=N.W. kt=0/W/4 ktl=e0e7 b=10 ci=128365 co=0 ca=0
25 5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1 dn=0 df=3331 of=0 ri=4 ql=2 b=10 41 5 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=381/1/0 df=64 of=0 ri=0 ql=169 qs=NRW. kt=0/W/5 ktl=fb2f b=10 ci=164360 co=0 ca=0
26 6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1 dn=0 df=3224 of=0 ri=0 ql=0 b=10 42 6 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=1037/1/0 df=183 of=0 ri=0 ql=62 qs=N.W. kt=0/W/6 ktl=d2ad b=10 ci=65663 co=0 ca=0
27 7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1 dn=0 df=1818 of=0 ri=0 ql=2 b=10 43 7 c=20897 g=20897 pq=1 pqc=20896 qp=0 dt=1572/0/0 df=382 of=0 ri=0 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=75006 co=0 ca=0
28rcu_bh: 44rcu_bh:
29 0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1 dn=0 df=0 of=0 ri=0 ql=0 b=10 45 0 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=545/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/0 ktl=ebc3 b=10 ci=0 co=0 ca=0
30 1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1 dn=0 df=13 of=0 ri=0 ql=0 b=10 46 1 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=967/1/0 df=3 of=0 ri=1 ql=0 qs=.... kt=0/W/1 ktl=58c b=10 ci=151 co=0 ca=0
31 2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 47 2 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=1081/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/2 ktl=da94 b=10 ci=0 co=0 ca=0
32 3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1 dn=0 df=9 of=0 ri=0 ql=0 b=10 48 3 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=1846/0/0 df=8 of=0 ri=1 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=0 co=0 ca=0
33 4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 49 4 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=369/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/4 ktl=e0e7 b=10 ci=0 co=0 ca=0
34 5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 50 5 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=381/1/0 df=4 of=0 ri=1 ql=0 qs=.... kt=0/W/5 ktl=fb2f b=10 ci=0 co=0 ca=0
35 6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 51 6 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=1037/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/6 ktl=d2ad b=10 ci=0 co=0 ca=0
36 7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 52 7 c=1474 g=1474 pq=1 pqc=1473 qp=0 dt=1572/0/0 df=8 of=0 ri=1 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=0 co=0 ca=0
37 53
38The first section lists the rcu_data structures for rcu_sched, the second 54The first section lists the rcu_data structures for rcu_sched, the second
39for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an 55for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an
@@ -48,17 +64,18 @@ o The number at the beginning of each line is the CPU number.
48 substantially larger than the number of actual CPUs. 64 substantially larger than the number of actual CPUs.
49 65
50o "c" is the count of grace periods that this CPU believes have 66o "c" is the count of grace periods that this CPU believes have
51 completed. CPUs in dynticks idle mode may lag quite a ways 67 completed. Offlined CPUs and CPUs in dynticks idle mode may
52 behind, for example, CPU 4 under "rcu_sched" above, which has 68 lag quite a ways behind, for example, CPU 6 under "rcu_sched"
53 slept through the past 25 RCU grace periods. It is not unusual 69 above, which has been offline through not quite 40,000 RCU grace
54 to see CPUs lagging by thousands of grace periods. 70 periods. It is not unusual to see CPUs lagging by thousands of
71 grace periods.
55 72
56o "g" is the count of grace periods that this CPU believes have 73o "g" is the count of grace periods that this CPU believes have
57 started. Again, CPUs in dynticks idle mode may lag behind. 74 started. Again, offlined CPUs and CPUs in dynticks idle mode
58 If the "c" and "g" values are equal, this CPU has already 75 may lag behind. If the "c" and "g" values are equal, this CPU
59 reported a quiescent state for the last RCU grace period that 76 has already reported a quiescent state for the last RCU grace
60 it is aware of, otherwise, the CPU believes that it owes RCU a 77 period that it is aware of, otherwise, the CPU believes that it
61 quiescent state. 78 owes RCU a quiescent state.
62 79
63o "pq" indicates that this CPU has passed through a quiescent state 80o "pq" indicates that this CPU has passed through a quiescent state
64 for the current grace period. It is possible for "pq" to be 81 for the current grace period. It is possible for "pq" to be
@@ -77,22 +94,16 @@ o "pqc" indicates which grace period the last-observed quiescent
77 the next grace period! 94 the next grace period!
78 95
79o "qp" indicates that RCU still expects a quiescent state from 96o "qp" indicates that RCU still expects a quiescent state from
80 this CPU. 97 this CPU. Offlined CPUs and CPUs in dyntick idle mode might
98 well have qp=1, which is OK: RCU is still ignoring them.
81 99
82o "dt" is the current value of the dyntick counter that is incremented 100o "dt" is the current value of the dyntick counter that is incremented
83 when entering or leaving dynticks idle state, either by the 101 when entering or leaving dynticks idle state, either by the
84 scheduler or by irq. The number after the "/" is the interrupt 102 scheduler or by irq. This number is even if the CPU is in
85 nesting depth when in dyntick-idle state, or one greater than 103 dyntick idle mode and odd otherwise. The number after the first
86 the interrupt-nesting depth otherwise. 104 "/" is the interrupt nesting depth when in dyntick-idle state,
87 105 or one greater than the interrupt-nesting depth otherwise.
88 This field is displayed only for CONFIG_NO_HZ kernels. 106 The number after the second "/" is the NMI nesting depth.
89
90o "dn" is the current value of the dyntick counter that is incremented
91 when entering or leaving dynticks idle state via NMI. If both
92 the "dt" and "dn" values are even, then this CPU is in dynticks
93 idle mode and may be ignored by RCU. If either of these two
94 counters is odd, then RCU must be alert to the possibility of
95 an RCU read-side critical section running on this CPU.
96 107
97 This field is displayed only for CONFIG_NO_HZ kernels. 108 This field is displayed only for CONFIG_NO_HZ kernels.
98 109
@@ -104,7 +115,7 @@ o "df" is the number of times that some other CPU has forced a
104 115
105o "of" is the number of times that some other CPU has forced a 116o "of" is the number of times that some other CPU has forced a
106 quiescent state on behalf of this CPU due to this CPU being 117 quiescent state on behalf of this CPU due to this CPU being
107 offline. In a perfect world, this might neve happen, but it 118 offline. In a perfect world, this might never happen, but it
108 turns out that offlining and onlining a CPU can take several grace 119 turns out that offlining and onlining a CPU can take several grace
109 periods, and so there is likely to be an extended period of time 120 periods, and so there is likely to be an extended period of time
110 when RCU believes that the CPU is online when it really is not. 121 when RCU believes that the CPU is online when it really is not.
@@ -121,10 +132,78 @@ o "ql" is the number of RCU callbacks currently residing on
121 of what state they are in (new, waiting for grace period to 132 of what state they are in (new, waiting for grace period to
122 start, waiting for grace period to end, ready to invoke). 133 start, waiting for grace period to end, ready to invoke).
123 134
135o "qs" gives an indication of the state of the callback queue
136 with four characters:
137
138 "N" Indicates that there are callbacks queued that are not
139 ready to be handled by the next grace period, and thus
140 will be handled by the grace period following the next
141 one.
142
143 "R" Indicates that there are callbacks queued that are
144 ready to be handled by the next grace period.
145
146 "W" Indicates that there are callbacks queued that are
147 waiting on the current grace period.
148
149 "D" Indicates that there are callbacks queued that have
150 already been handled by a prior grace period, and are
151 thus waiting to be invoked. Note that callbacks in
152 the process of being invoked are not counted here.
153 Callbacks in the process of being invoked are those
154 that have been removed from the rcu_data structures
155 queues by rcu_do_batch(), but which have not yet been
156 invoked.
157
158 If there are no callbacks in a given one of the above states,
159 the corresponding character is replaced by ".".
160
161o "kt" is the per-CPU kernel-thread state. The digit preceding
162 the first slash is zero if there is no work pending and 1
163 otherwise. The character between the first pair of slashes is
164 as follows:
165
166 "S" The kernel thread is stopped, in other words, all
167 CPUs corresponding to this rcu_node structure are
168 offline.
169
170 "R" The kernel thread is running.
171
172 "W" The kernel thread is waiting because there is no work
173 for it to do.
174
175 "O" The kernel thread is waiting because it has been
176 forced off of its designated CPU or because its
177 ->cpus_allowed mask permits it to run on other than
178 its designated CPU.
179
180 "Y" The kernel thread is yielding to avoid hogging CPU.
181
182 "?" Unknown value, indicates a bug.
183
184 The number after the final slash is the CPU that the kthread
185 is actually running on.
186
187o "ktl" is the low-order 16 bits (in hexadecimal) of the count of
188 the number of times that this CPU's per-CPU kthread has gone
189 through its loop servicing invoke_rcu_cpu_kthread() requests.
190
124o "b" is the batch limit for this CPU. If more than this number 191o "b" is the batch limit for this CPU. If more than this number
125 of RCU callbacks is ready to invoke, then the remainder will 192 of RCU callbacks is ready to invoke, then the remainder will
126 be deferred. 193 be deferred.
127 194
195o "ci" is the number of RCU callbacks that have been invoked for
196 this CPU. Note that ci+ql is the number of callbacks that have
197 been registered in absence of CPU-hotplug activity.
198
199o "co" is the number of RCU callbacks that have been orphaned due to
200 this CPU going offline. These orphaned callbacks have been moved
201 to an arbitrarily chosen online CPU.
202
203o "ca" is the number of RCU callbacks that have been adopted due to
204 other CPUs going offline. Note that ci+co-ca+ql is the number of
205 RCU callbacks registered on this CPU.
206
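
The field descriptions above can be exercised against one of the sample rcu_sched lines. This is a sketch under stated assumptions: the helper names are hypothetical, the line is assumed to have exactly the "CPU key=value ..." layout of the new sample output, and the "dt" field is assumed present (it is displayed only for CONFIG_NO_HZ kernels).

```python
def parse_rcudata_line(line: str) -> dict:
    """Split one rcudata line into the CPU number and its key=value fields."""
    cpu, *pairs = line.split()
    fields = dict(p.split("=", 1) for p in pairs)
    fields["cpu"] = int(cpu)
    return fields


def in_dyntick_idle(fields: dict) -> bool:
    # The first "dt" component is even in dyntick-idle mode and odd otherwise.
    return int(fields["dt"].split("/")[0]) % 2 == 0


def owes_quiescent_state(fields: dict) -> bool:
    # If "c" and "g" differ, the CPU believes it owes RCU a quiescent state.
    return fields["c"] != fields["g"]


def callback_queue_states(fields: dict) -> list:
    # "qs" has four positions (N, R, W, D); "." means no callbacks in that state.
    names = ["not-ready", "ready-next-gp", "waiting-current-gp", "done"]
    return [name for name, ch in zip(names, fields["qs"]) if ch != "."]


def callbacks_registered(fields: dict) -> int:
    # ci + co - ca + ql, as given in the description of "ca".
    return (int(fields["ci"]) + int(fields["co"])
            - int(fields["ca"]) + int(fields["ql"]))


sample = ("0 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=545/1/0 df=50 of=0 "
          "ri=0 ql=163 qs=NRW. kt=0/W/0 ktl=ebc3 b=10 ci=153737 co=0 ca=0")
f = parse_rcudata_line(sample)
pending, kthread_state, kthread_cpu = f["kt"].split("/")  # "0", "W", "0"
```

All values are kept as strings until a helper needs a number, because several fields ("dt", "kt", "qs", "ktl") are not plain decimal integers.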
128There is also an rcu/rcudata.csv file with the same information in 207There is also an rcu/rcudata.csv file with the same information in
129comma-separated-variable spreadsheet format. 208comma-separated-variable spreadsheet format.
130 209
@@ -157,15 +236,15 @@ o "gpnum" is the number of grace periods that have started. It is
157 236
158The output of "cat rcu/rcuhier" looks as follows, with very long lines: 237The output of "cat rcu/rcuhier" looks as follows, with very long lines:
159 238
160c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0 239c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6
1611/1 .>. 0:127 ^0 2401/1 ..>. 0:127 ^0
1623/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 2413/3 ..>. 0:35 ^0 0/0 ..>. 36:71 ^1 0/0 ..>. 72:107 ^2 0/0 ..>. 108:127 ^3
1633/3f .>. 0:5 ^0 2/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3 2423/3f ..>. 0:5 ^0 2/3 ..>. 6:11 ^1 0/0 ..>. 12:17 ^2 0/0 ..>. 18:23 ^3 0/0 ..>. 24:29 ^4 0/0 ..>. 30:35 ^5 0/0 ..>. 36:41 ^0 0/0 ..>. 42:47 ^1 0/0 ..>. 48:53 ^2 0/0 ..>. 54:59 ^3 0/0 ..>. 60:65 ^4 0/0 ..>. 66:71 ^5 0/0 ..>. 72:77 ^0 0/0 ..>. 78:83 ^1 0/0 ..>. 84:89 ^2 0/0 ..>. 90:95 ^3 0/0 ..>. 96:101 ^4 0/0 ..>. 102:107 ^5 0/0 ..>. 108:113 ^0 0/0 ..>. 114:119 ^1 0/0 ..>. 120:125 ^2 0/0 ..>. 126:127 ^3
164rcu_bh: 243rcu_bh:
165c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0 244c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0
1660/1 .>. 0:127 ^0 2450/1 ..>. 0:127 ^0
1670/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 2460/3 ..>. 0:35 ^0 0/0 ..>. 36:71 ^1 0/0 ..>. 72:107 ^2 0/0 ..>. 108:127 ^3
1680/3f .>. 0:5 ^0 0/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3 2470/3f ..>. 0:5 ^0 0/3 ..>. 6:11 ^1 0/0 ..>. 12:17 ^2 0/0 ..>. 18:23 ^3 0/0 ..>. 24:29 ^4 0/0 ..>. 30:35 ^5 0/0 ..>. 36:41 ^0 0/0 ..>. 42:47 ^1 0/0 ..>. 48:53 ^2 0/0 ..>. 54:59 ^3 0/0 ..>. 60:65 ^4 0/0 ..>. 66:71 ^5 0/0 ..>. 72:77 ^0 0/0 ..>. 78:83 ^1 0/0 ..>. 84:89 ^2 0/0 ..>. 90:95 ^3 0/0 ..>. 96:101 ^4 0/0 ..>. 102:107 ^5 0/0 ..>. 108:113 ^0 0/0 ..>. 114:119 ^1 0/0 ..>. 120:125 ^2 0/0 ..>. 126:127 ^3
169 248
170This is once again split into "rcu_sched" and "rcu_bh" portions, 249This is once again split into "rcu_sched" and "rcu_bh" portions,
171and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional 250and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional
@@ -180,7 +259,7 @@ o "s" is the "signaled" state that drives force_quiescent_state()'s
180 259
181o "jfq" is the number of jiffies remaining for this grace period 260o "jfq" is the number of jiffies remaining for this grace period
182 before force_quiescent_state() is invoked to help push things 261 before force_quiescent_state() is invoked to help push things
183 along. Note that CPUs in dyntick-idle mode thoughout the grace 262 along. Note that CPUs in dyntick-idle mode throughout the grace
184 period will not report on their own, but rather must be checked by 263 period will not report on their own, but rather must be checked by
185 some other CPU via force_quiescent_state(). 264 some other CPU via force_quiescent_state().
186 265
@@ -201,11 +280,6 @@ o "fqlh" is the number of calls to force_quiescent_state() that
201 exited immediately (without even being counted in nfqs above) 280 exited immediately (without even being counted in nfqs above)
202 due to contention on ->fqslock. 281 due to contention on ->fqslock.
203 282
204o "oqlen" is the number of callbacks on the "orphan" callback
205 list. RCU callbacks are placed on this list by CPUs going
206 offline, and are "adopted" either by the CPU helping the outgoing
207 CPU or by the next rcu_barrier*() call, whichever comes first.
208
209o Each element of the form "1/1 0:127 ^0" represents one struct 283o Each element of the form "1/1 0:127 ^0" represents one struct
210 rcu_node. Each line represents one level of the hierarchy, from 284 rcu_node. Each line represents one level of the hierarchy, from
211 root to leaves. It is best to think of the rcu_data structures 285 root to leaves. It is best to think of the rcu_data structures
@@ -229,13 +303,20 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
229 current grace period. 303 current grace period.
230 304
231 o The characters separated by the ">" indicate the state 305 o The characters separated by the ">" indicate the state
232 of the blocked-tasks lists. A "T" preceding the ">" 306 of the blocked-tasks lists. A "G" preceding the ">"
233 indicates that at least one task blocked in an RCU 307 indicates that at least one task blocked in an RCU
234 read-side critical section blocks the current grace 308 read-side critical section blocks the current grace
235 period, while a "." preceding the ">" indicates otherwise. 309 period, while an "E" preceding the ">" indicates that
236 The character following the ">" indicates similarly for 310 at least one task blocked in an RCU read-side critical
237 the next grace period. A "T" should appear in this 311 section blocks the current expedited grace period.
238 field only for rcu-preempt. 312 A "T" character following the ">" indicates that at
313 least one task is blocked within an RCU read-side
314 critical section, regardless of whether any current
315 grace period (expedited or normal) is inconvenienced.
316 A "." character appears if the corresponding condition
317 does not hold, so that "..>." indicates that no tasks
318 are blocked. In contrast, "GE>T" indicates maximal
319 inconvenience from blocked tasks.
239 320
240 o The numbers separated by the ":" are the range of CPUs 321 o The numbers separated by the ":" are the range of CPUs
241 served by this struct rcu_node. This can be helpful 322 served by this struct rcu_node. This can be helpful
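
The new three-character blocked-tasks field ("..>." through "GE>T") can be decoded as sketched below. The function names are hypothetical, and the code assumes the new four-token element layout shown in the updated sample output ("3/3f ..>. 0:5 ^0"). The mask token and the number after "^" are not described in this excerpt, so the mask is returned as a raw string and the "^" number is stored under the hypothetical key "group".

```python
def decode_blocked_tasks(flags: str) -> dict:
    """Decode the blocked-tasks field of one rcuhier element, e.g. "GE>T"."""
    before, after = flags.split(">")
    return {
        "blocks_normal_gp": before[0] == "G",
        "blocks_expedited_gp": before[1] == "E",
        "any_task_blocked": after == "T",
    }


def parse_rcu_node(tokens: list) -> dict:
    """Parse the four tokens that make up one struct rcu_node element."""
    masks, flags, cpus, group = tokens
    lo, hi = (int(n) for n in cpus.split(":"))
    node = {
        "masks": masks,                   # left undecoded in this sketch
        "cpus": range(lo, hi + 1),        # CPUs served by this rcu_node
        "group": int(group.lstrip("^")),  # number after "^"
    }
    node.update(decode_blocked_tasks(flags))
    return node


def parse_level(line: str) -> list:
    """Parse one rcuhier line, i.e. one level of the hierarchy."""
    tokens = line.split()
    return [parse_rcu_node(tokens[i:i + 4]) for i in range(0, len(tokens), 4)]


level = parse_level("3/3 ..>. 0:35 ^0 0/0 ..>. 36:71 ^1 "
                    "0/0 ..>. 72:107 ^2 0/0 ..>. 108:127 ^3")
```

The older ".>." form has only one character before the ">", so this sketch applies only to output produced after this change.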
@@ -315,3 +396,222 @@ o "nn" is the number of times that this CPU needed nothing. Alert
315 readers will note that the rcu "nn" number for a given CPU very 396 readers will note that the rcu "nn" number for a given CPU very
316 closely matches the rcu_bh "np" number for that same CPU. This 397 closely matches the rcu_bh "np" number for that same CPU. This
317 is due to short-circuit evaluation in rcu_pending(). 398 is due to short-circuit evaluation in rcu_pending().
399
400
401The output of "cat rcu/rcutorture" looks as follows:
402
403rcutorture test sequence: 0 (test in progress)
404rcutorture update version number: 615
405
406The first line shows the number of rcutorture tests that have completed
407since boot. If a test is currently running, the "(test in progress)"
408string will appear as shown above. The second line shows the number of
409update cycles that the current test has started, or zero if there is
410no test in progress.
411
412
413The output of "cat rcu/rcuboost" looks as follows:
414
4150:5 tasks=.... kt=W ntb=0 neb=0 nnb=0 j=2f95 bt=300f
416 balk: nt=0 egt=989 bt=0 nb=0 ny=0 nos=16
4176:7 tasks=.... kt=W ntb=0 neb=0 nnb=0 j=2f95 bt=300f
418 balk: nt=0 egt=225 bt=0 nb=0 ny=0 nos=6
419
420This information is output only for rcu_preempt. Each two-line entry
421corresponds to a leaf rcu_node structure. The fields are as follows:
422
423o "n:m" is the CPU-number range for the corresponding two-line
424 entry. In the sample output above, the first entry covers
425 CPUs zero through five and the second entry covers CPUs 6
426 and 7.
427
428o "tasks=TNEB" gives the state of the various segments of the
429 rnp->blocked_tasks list:
430
431 "T" This indicates that there are some tasks that blocked
432 while running on one of the corresponding CPUs while
433 in an RCU read-side critical section.
434
435 "N" This indicates that some of the blocked tasks are preventing
436 the current normal (non-expedited) grace period from
437 completing.
438
439 "E" This indicates that some of the blocked tasks are preventing
440 the current expedited grace period from completing.
441
442 "B" This indicates that some of the blocked tasks are in
443 need of RCU priority boosting.
444
445 Each character is replaced with "." if the corresponding
446 condition does not hold.
447
448o "kt" is the state of the RCU priority-boosting kernel
449 thread associated with the corresponding rcu_node structure.
450 The state can be one of the following:
451
452 "S" The kernel thread is stopped, in other words, all
453 CPUs corresponding to this rcu_node structure are
454 offline.
455
456 "R" The kernel thread is running.
457
458 "W" The kernel thread is waiting because there is no work
459 for it to do.
460
461 "Y" The kernel thread is yielding to avoid hogging CPU.
462
463 "?" Unknown value, indicates a bug.
464
465o "ntb" is the number of tasks boosted.
466
467o "neb" is the number of tasks boosted in order to complete an
468 expedited grace period.
469
470o "nnb" is the number of tasks boosted in order to complete a
471 normal (non-expedited) grace period. When boosting a task
472 that was blocking both an expedited and a normal grace period,
473 it is counted against the expedited total above.
474
475o "j" is the low-order 16 bits of the jiffies counter in
476 hexadecimal.
477
478o "bt" is the low-order 16 bits of the value that the jiffies
479 counter will have when we next start boosting, assuming that
480 the current grace period does not end beforehand. This is
481 also in hexadecimal.
482
483o "balk: nt" counts the number of times we didn't boost (in
484 other words, we balked) even though it was time to boost because
485 there were no blocked tasks to boost. This situation occurs
486 when there is one blocked task on one rcu_node structure and
487 none on some other rcu_node structure.
488
489o "egt" counts the number of times we balked because although
490 there were blocked tasks, none of them were blocking the
491 current grace period, whether expedited or otherwise.
492
493o "bt" counts the number of times we balked because boosting
494 had already been initiated for the current grace period.
495
496o "nb" counts the number of times we balked because there
497 was at least one task blocking the current non-expedited grace
498 period that never had blocked. If it is already running, it
499 just won't help to boost its priority!
500
501o "ny" counts the number of times we balked because it was
502 not yet time to start boosting.
503
504o "nos" counts the number of times we balked for other
505 reasons, e.g., the grace period ended first.
506
507
508CONFIG_TINY_RCU and CONFIG_TINY_PREEMPT_RCU debugfs Files and Formats
509
510These implementations of RCU provide a single debugfs file under the
511top-level directory RCU, namely rcu/rcudata, which displays fields in
512rcu_bh_ctrlblk, rcu_sched_ctrlblk and, for CONFIG_TINY_PREEMPT_RCU,
513rcu_preempt_ctrlblk.
514
515The output of "cat rcu/rcudata" is as follows:
516
517rcu_preempt: qlen=24 gp=1097669 g197/p197/c197 tasks=...
518 ttb=. btg=no ntb=184 neb=0 nnb=183 j=01f7 bt=0274
519 normal balk: nt=1097669 gt=0 bt=371 b=0 ny=25073378 nos=0
520 exp balk: bt=0 nos=0
521rcu_sched: qlen: 0
522rcu_bh: qlen: 0
523
524This is split into rcu_preempt, rcu_sched, and rcu_bh sections, with the
525rcu_preempt section appearing only in CONFIG_TINY_PREEMPT_RCU builds.
526The last three lines of the rcu_preempt section appear only in
527CONFIG_RCU_BOOST kernel builds. The fields are as follows:
528
529o "qlen" is the number of RCU callbacks currently waiting either
530 for an RCU grace period or to be invoked. This is the
531 only field present for rcu_sched and rcu_bh, due to the
532 short-circuiting of grace periods in those two cases.
533
534o "gp" is the number of grace periods that have completed.
535
536o "g197/p197/c197" displays the grace-period state, with the
537 "g" number being the number of grace periods that have started
538 (mod 256), the "p" number being the number of grace periods
539 that the CPU has responded to (also mod 256), and the "c"
540 number being the number of grace periods that have completed
541 (once again mod 256).
542
543 Why have both "gp" and "g"? Because the data flowing into
544 "gp" is only present in a CONFIG_RCU_TRACE kernel.
545
546o "tasks" is a set of bits. The first bit is "T" if there are
547 currently tasks that have recently blocked within an RCU
548 read-side critical section, the second bit is "N" if any of the
549 aforementioned tasks are blocking the current RCU grace period,
550 and the third bit is "E" if any of the aforementioned tasks are
551 blocking the current expedited grace period. Each bit is "."
552 if the corresponding condition does not hold.
553
554o "ttb" is a single bit. It is "B" if any of the blocked tasks
555 need to be priority boosted and "." otherwise.
556
557o "btg" indicates whether boosting has been carried out during
558 the current grace period, with "exp" indicating that boosting
559 is in progress for an expedited grace period, "no" indicating
560 that boosting has not yet started for a normal grace period,
561 "begun" indicating that boosting has begun for a normal grace
562 period, and "done" indicating that boosting has completed for
563 a normal grace period.
564
565o "ntb" is the total number of tasks subjected to RCU priority
566 boosting since boot.
567
568o "neb" is the number of expedited grace periods that have had
569 to resort to RCU priority boosting since boot.
570
571o "nnb" is the number of normal grace periods that have had
572 to resort to RCU priority boosting since boot.
573
574o "j" is the low-order 16 bits of the jiffies counter in hexadecimal.
575
576o "bt" is the low-order 16 bits of the value that the jiffies counter
577 will have at the next time that boosting is scheduled to begin.
578
579o In the line beginning with "normal balk", the fields are as follows:
580
581 o "nt" is the number of times that the system balked from
582 boosting because there were no blocked tasks to boost.
583 Note that the system will balk from boosting even if the
584 grace period is overdue when the currently running task
585 is looping within an RCU read-side critical section.
586 There is no point in boosting in this case, because
587 boosting a running task won't make it run any faster.
588
589 o "gt" is the number of times that the system balked
590 from boosting because, although there were blocked tasks,
591 none of them were preventing the current grace period
592 from completing.
593
594 o "bt" is the number of times that the system balked
595 from boosting because boosting was already in progress.
596
597 o "b" is the number of times that the system balked from
598 boosting because boosting had already completed for
599 the grace period in question.
600
601 o "ny" is the number of times that the system balked from
602 boosting because it was not yet time to start boosting
603 the grace period in question.
604
605 o "nos" is the number of times that the system balked from
606 boosting for inexplicable ("not otherwise specified")
607 reasons. This can actually happen due to races involving
608 increments of the jiffies counter.
609
610o In the line beginning with "exp balk", the fields are as follows:
611
612 o "bt" is the number of times that the system balked from
613 boosting because there were no blocked tasks to boost.
614
615 o "nos" is the number of times that the system balked from
616 boosting for inexplicable ("not otherwise specified")
617 reasons.
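
Because "j" and "bt" are both truncated to the low-order 16 bits of the
jiffies counter, the time remaining until boosting has to be computed
modulo 2^16. The following is a minimal user-space sketch of that
arithmetic. The helper name jiffies16_until_boost() is hypothetical and
is not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical user-space helper (not a kernel API): given the "j" and
 * "bt" fields as printed in hexadecimal, return the number of jiffies
 * remaining until boosting is scheduled to begin, modulo 2^16.
 */
static uint16_t jiffies16_until_boost(const char *j_hex, const char *bt_hex)
{
	assert(j_hex != NULL && bt_hex != NULL);

	uint16_t j = (uint16_t)strtoul(j_hex, NULL, 16);
	uint16_t bt = (uint16_t)strtoul(bt_hex, NULL, 16);

	/* Truncating the difference to 16 bits handles counter wraparound. */
	return (uint16_t)(bt - j);
}
```

For the sample output above (j=01f7 bt=0274), this yields 0x7d, or 125
jiffies. If the scheduled time has already passed, the result is a large
value close to 65535, so a caller may prefer to interpret it as a signed
16-bit quantity.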
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index cfaac34c4557..6ef692667e2f 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -849,6 +849,37 @@ All: lockdep-checked RCU-protected pointer access
849See the comment headers in the source code (or the docbook generated 849See the comment headers in the source code (or the docbook generated
850from them) for more information. 850from them) for more information.
851 851
852However, given that there are no fewer than four families of RCU APIs
853in the Linux kernel, how do you choose which one to use? The following
854list can be helpful:
855
856a. Will readers need to block? If so, you need SRCU.
857
858b. What about the -rt patchset? If readers would need to block
859 in a non-rt kernel, you need SRCU. If readers would block
860 in a -rt kernel, but not in a non-rt kernel, SRCU is not
861 necessary.
862
863c. Do you need to treat NMI handlers, hardirq handlers,
864 and code segments with preemption disabled (whether
865 via preempt_disable(), local_irq_save(), local_bh_disable(),
866 or some other mechanism) as if they were explicit RCU readers?
867 If so, you need RCU-sched.
868
869d. Do you need RCU grace periods to complete even in the face
870 of softirq monopolization of one or more of the CPUs? For
871 example, is your code subject to network-based denial-of-service
872 attacks? If so, you need RCU-bh.
873
874e. Is your workload too update-intensive for normal use of
875 RCU, but inappropriate for other synchronization mechanisms?
876 If so, consider SLAB_DESTROY_BY_RCU. But please be careful!
877
878f. Otherwise, use RCU.
879
880Of course, this all assumes that you have determined that RCU is in fact
881the right tool for your job.
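
The list above can be read as a first-match-wins sequence of questions.
The following sketch encodes it that way. This reading is an
interpretation, and every identifier in it (enum rcu_choice, struct
rcu_needs, choose_rcu_api()) is hypothetical and does not exist in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; none are kernel identifiers. */
enum rcu_choice {
	CHOOSE_SRCU,
	CHOOSE_RCU_SCHED,
	CHOOSE_RCU_BH,
	CHOOSE_SLAB_DESTROY_BY_RCU,
	CHOOSE_RCU,
};

struct rcu_needs {
	/* Items a and b: readers block even in a non-rt kernel. */
	bool readers_block_in_non_rt;
	/* Item c: NMI, hardirq, and preemption-disabled code act as readers. */
	bool implicit_readers;
	/* Item d: grace periods must complete despite softirq monopolization. */
	bool must_survive_softirq_flood;
	/* Item e: too update-intensive for normal use of RCU. */
	bool too_update_intensive;
};

/* Walk the questions in order and return the first answer that applies. */
static enum rcu_choice choose_rcu_api(const struct rcu_needs *n)
{
	assert(n != NULL);

	if (n->readers_block_in_non_rt)
		return CHOOSE_SRCU;
	if (n->implicit_readers)
		return CHOOSE_RCU_SCHED;
	if (n->must_survive_softirq_flood)
		return CHOOSE_RCU_BH;
	if (n->too_update_intensive)
		return CHOOSE_SLAB_DESTROY_BY_RCU;
	return CHOOSE_RCU;	/* Item f. */
}
```

Readers that block only in a -rt kernel leave readers_block_in_non_rt
false, matching item b.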
882
852 883
8538. ANSWERS TO QUICK QUIZZES 8848. ANSWERS TO QUICK QUIZZES
854 885