diff options
Diffstat (limited to 'Documentation/RCU/trace.txt')
-rw-r--r-- | Documentation/RCU/trace.txt | 254 |
1 files changed, 49 insertions, 205 deletions
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 187bbf10c923..8608fd85e921 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt | |||
@@ -1,185 +1,10 @@ | |||
1 | CONFIG_RCU_TRACE debugfs Files and Formats | 1 | CONFIG_RCU_TRACE debugfs Files and Formats |
2 | 2 | ||
3 | 3 | ||
4 | The rcupreempt and rcutree implementations of RCU provide debugfs trace | 4 | The rcutree implementation of RCU provides debugfs trace output that |
5 | output that summarizes counters and state. This information is useful for | 5 | summarizes counters and state. This information is useful for debugging |
6 | debugging RCU itself, and can sometimes also help to debug abuses of RCU. | 6 | RCU itself, and can sometimes also help to debug abuses of RCU. |
7 | Note that the rcuclassic implementation of RCU does not provide debugfs | 7 | The following sections describe the debugfs files and formats. |
8 | trace output. | ||
9 | |||
10 | The following sections describe the debugfs files and formats for | ||
11 | preemptable RCU (rcupreempt) and hierarchical RCU (rcutree). | ||
12 | |||
13 | |||
14 | Preemptable RCU debugfs Files and Formats | ||
15 | |||
16 | This implementation of RCU provides three debugfs files under the | ||
17 | top-level directory RCU: rcu/rcuctrs (which displays the per-CPU | ||
18 | counters used by preemptable RCU) rcu/rcugp (which displays grace-period | ||
19 | counters), and rcu/rcustats (which internal counters for debugging RCU). | ||
20 | |||
21 | The output of "cat rcu/rcuctrs" looks as follows: | ||
22 | |||
23 | CPU last cur F M | ||
24 | 0 5 -5 0 0 | ||
25 | 1 -1 0 0 0 | ||
26 | 2 0 1 0 0 | ||
27 | 3 0 1 0 0 | ||
28 | 4 0 1 0 0 | ||
29 | 5 0 1 0 0 | ||
30 | 6 0 2 0 0 | ||
31 | 7 0 -1 0 0 | ||
32 | 8 0 1 0 0 | ||
33 | ggp = 26226, state = waitzero | ||
34 | |||
35 | The per-CPU fields are as follows: | ||
36 | |||
37 | o "CPU" gives the CPU number. Offline CPUs are not displayed. | ||
38 | |||
39 | o "last" gives the value of the counter that is being decremented | ||
40 | for the current grace period phase. In the example above, | ||
41 | the counters sum to 4, indicating that there are still four | ||
42 | RCU read-side critical sections still running that started | ||
43 | before the last counter flip. | ||
44 | |||
45 | o "cur" gives the value of the counter that is currently being | ||
46 | both incremented (by rcu_read_lock()) and decremented (by | ||
47 | rcu_read_unlock()). In the example above, the counters sum to | ||
48 | 1, indicating that there is only one RCU read-side critical section | ||
49 | still running that started after the last counter flip. | ||
50 | |||
51 | o "F" indicates whether RCU is waiting for this CPU to acknowledge | ||
52 | a counter flip. In the above example, RCU is not waiting on any, | ||
53 | which is consistent with the state being "waitzero" rather than | ||
54 | "waitack". | ||
55 | |||
56 | o "M" indicates whether RCU is waiting for this CPU to execute a | ||
57 | memory barrier. In the above example, RCU is not waiting on any, | ||
58 | which is consistent with the state being "waitzero" rather than | ||
59 | "waitmb". | ||
60 | |||
61 | o "ggp" is the global grace-period counter. | ||
62 | |||
63 | o "state" is the RCU state, which can be one of the following: | ||
64 | |||
65 | o "idle": there is no grace period in progress. | ||
66 | |||
67 | o "waitack": RCU just incremented the global grace-period | ||
68 | counter, which has the effect of reversing the roles of | ||
69 | the "last" and "cur" counters above, and is waiting for | ||
70 | all the CPUs to acknowledge the flip. Once the flip has | ||
71 | been acknowledged, CPUs will no longer be incrementing | ||
72 | what are now the "last" counters, so that their sum will | ||
73 | decrease monotonically down to zero. | ||
74 | |||
75 | o "waitzero": RCU is waiting for the sum of the "last" counters | ||
76 | to decrease to zero. | ||
77 | |||
78 | o "waitmb": RCU is waiting for each CPU to execute a memory | ||
79 | barrier, which ensures that instructions from a given CPU's | ||
80 | last RCU read-side critical section cannot be reordered | ||
81 | with instructions following the memory-barrier instruction. | ||
82 | |||
83 | The output of "cat rcu/rcugp" looks as follows: | ||
84 | |||
85 | oldggp=48870 newggp=48873 | ||
86 | |||
87 | Note that reading from this file provokes a synchronize_rcu(). The | ||
88 | "oldggp" value is that of "ggp" from rcu/rcuctrs above, taken before | ||
89 | executing the synchronize_rcu(), and the "newggp" value is also the | ||
90 | "ggp" value, but taken after the synchronize_rcu() command returns. | ||
91 | |||
92 | |||
93 | The output of "cat rcu/rcugp" looks as follows: | ||
94 | |||
95 | na=1337955 nl=40 wa=1337915 wl=44 da=1337871 dl=0 dr=1337871 di=1337871 | ||
96 | 1=50989 e1=6138 i1=49722 ie1=82 g1=49640 a1=315203 ae1=265563 a2=49640 | ||
97 | z1=1401244 ze1=1351605 z2=49639 m1=5661253 me1=5611614 m2=49639 | ||
98 | |||
99 | These are counters tracking internal preemptable-RCU events, however, | ||
100 | some of them may be useful for debugging algorithms using RCU. In | ||
101 | particular, the "nl", "wl", and "dl" values track the number of RCU | ||
102 | callbacks in various states. The fields are as follows: | ||
103 | |||
104 | o "na" is the total number of RCU callbacks that have been enqueued | ||
105 | since boot. | ||
106 | |||
107 | o "nl" is the number of RCU callbacks waiting for the previous | ||
108 | grace period to end so that they can start waiting on the next | ||
109 | grace period. | ||
110 | |||
111 | o "wa" is the total number of RCU callbacks that have started waiting | ||
112 | for a grace period since boot. "na" should be roughly equal to | ||
113 | "nl" plus "wa". | ||
114 | |||
115 | o "wl" is the number of RCU callbacks currently waiting for their | ||
116 | grace period to end. | ||
117 | |||
118 | o "da" is the total number of RCU callbacks whose grace periods | ||
119 | have completed since boot. "wa" should be roughly equal to | ||
120 | "wl" plus "da". | ||
121 | |||
122 | o "dr" is the total number of RCU callbacks that have been removed | ||
123 | from the list of callbacks ready to invoke. "dr" should be roughly | ||
124 | equal to "da". | ||
125 | |||
126 | o "di" is the total number of RCU callbacks that have been invoked | ||
127 | since boot. "di" should be roughly equal to "da", though some | ||
128 | early versions of preemptable RCU had a bug so that only the | ||
129 | last CPU's count of invocations was displayed, rather than the | ||
130 | sum of all CPU's counts. | ||
131 | |||
132 | o "1" is the number of calls to rcu_try_flip(). This should be | ||
133 | roughly equal to the sum of "e1", "i1", "a1", "z1", and "m1" | ||
134 | described below. In other words, the number of times that | ||
135 | the state machine is visited should be equal to the sum of the | ||
136 | number of times that each state is visited plus the number of | ||
137 | times that the state-machine lock acquisition failed. | ||
138 | |||
139 | o "e1" is the number of times that rcu_try_flip() was unable to | ||
140 | acquire the fliplock. | ||
141 | |||
142 | o "i1" is the number of calls to rcu_try_flip_idle(). | ||
143 | |||
144 | o "ie1" is the number of times rcu_try_flip_idle() exited early | ||
145 | due to the calling CPU having no work for RCU. | ||
146 | |||
147 | o "g1" is the number of times that rcu_try_flip_idle() decided | ||
148 | to start a new grace period. "i1" should be roughly equal to | ||
149 | "ie1" plus "g1". | ||
150 | |||
151 | o "a1" is the number of calls to rcu_try_flip_waitack(). | ||
152 | |||
153 | o "ae1" is the number of times that rcu_try_flip_waitack() found | ||
154 | that at least one CPU had not yet acknowledge the new grace period | ||
155 | (AKA "counter flip"). | ||
156 | |||
157 | o "a2" is the number of time rcu_try_flip_waitack() found that | ||
158 | all CPUs had acknowledged. "a1" should be roughly equal to | ||
159 | "ae1" plus "a2". (This particular output was collected on | ||
160 | a 128-CPU machine, hence the smaller-than-usual fraction of | ||
161 | calls to rcu_try_flip_waitack() finding all CPUs having already | ||
162 | acknowledged.) | ||
163 | |||
164 | o "z1" is the number of calls to rcu_try_flip_waitzero(). | ||
165 | |||
166 | o "ze1" is the number of times that rcu_try_flip_waitzero() found | ||
167 | that not all of the old RCU read-side critical sections had | ||
168 | completed. | ||
169 | |||
170 | o "z2" is the number of times that rcu_try_flip_waitzero() finds | ||
171 | the sum of the counters equal to zero, in other words, that | ||
172 | all of the old RCU read-side critical sections had completed. | ||
173 | The value of "z1" should be roughly equal to "ze1" plus | ||
174 | "z2". | ||
175 | |||
176 | o "m1" is the number of calls to rcu_try_flip_waitmb(). | ||
177 | |||
178 | o "me1" is the number of times that rcu_try_flip_waitmb() finds | ||
179 | that at least one CPU has not yet executed a memory barrier. | ||
180 | |||
181 | o "m2" is the number of times that rcu_try_flip_waitmb() finds that | ||
182 | all CPUs have executed a memory barrier. | ||
183 | 8 | ||
184 | 9 | ||
185 | Hierarchical RCU debugfs Files and Formats | 10 | Hierarchical RCU debugfs Files and Formats |
@@ -210,9 +35,10 @@ rcu_bh: | |||
210 | 6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 | 35 | 6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 |
211 | 7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 | 36 | 7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10 |
212 | 37 | ||
213 | The first section lists the rcu_data structures for rcu, the second for | 38 | The first section lists the rcu_data structures for rcu_sched, the second |
214 | rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system. | 39 | for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an |
215 | The fields are as follows: | 40 | additional section for rcu_preempt. Each section has one line per CPU, |
41 | or eight for this 8-CPU system. The fields are as follows: | ||
216 | 42 | ||
217 | o The number at the beginning of each line is the CPU number. | 43 | o The number at the beginning of each line is the CPU number. |
218 | CPUs numbers followed by an exclamation mark are offline, | 44 | CPUs numbers followed by an exclamation mark are offline, |
@@ -223,9 +49,9 @@ o The number at the beginning of each line is the CPU number. | |||
223 | 49 | ||
224 | o "c" is the count of grace periods that this CPU believes have | 50 | o "c" is the count of grace periods that this CPU believes have |
225 | completed. CPUs in dynticks idle mode may lag quite a ways | 51 | completed. CPUs in dynticks idle mode may lag quite a ways |
226 | behind, for example, CPU 4 under "rcu" above, which has slept | 52 | behind, for example, CPU 4 under "rcu_sched" above, which has |
227 | through the past 25 RCU grace periods. It is not unusual to | 53 | slept through the past 25 RCU grace periods. It is not unusual |
228 | see CPUs lagging by thousands of grace periods. | 54 | to see CPUs lagging by thousands of grace periods. |
229 | 55 | ||
230 | o "g" is the count of grace periods that this CPU believes have | 56 | o "g" is the count of grace periods that this CPU believes have |
231 | started. Again, CPUs in dynticks idle mode may lag behind. | 57 | started. Again, CPUs in dynticks idle mode may lag behind. |
@@ -308,8 +134,10 @@ The output of "cat rcu/rcugp" looks as follows: | |||
308 | rcu_sched: completed=33062 gpnum=33063 | 134 | rcu_sched: completed=33062 gpnum=33063 |
309 | rcu_bh: completed=464 gpnum=464 | 135 | rcu_bh: completed=464 gpnum=464 |
310 | 136 | ||
311 | Again, this output is for both "rcu" and "rcu_bh". The fields are | 137 | Again, this output is for both "rcu_sched" and "rcu_bh". Note that |
312 | taken from the rcu_state structure, and are as follows: | 138 | kernels built with CONFIG_TREE_PREEMPT_RCU will have an additional |
139 | "rcu_preempt" line. The fields are taken from the rcu_state structure, | ||
140 | and are as follows: | ||
313 | 141 | ||
314 | o "completed" is the number of grace periods that have completed. | 142 | o "completed" is the number of grace periods that have completed. |
315 | It is comparable to the "c" field from rcu/rcudata in that a | 143 | It is comparable to the "c" field from rcu/rcudata in that a |
@@ -324,23 +152,24 @@ o "gpnum" is the number of grace periods that have started. It is | |||
324 | If these two fields are equal (as they are for "rcu_bh" above), | 152 | If these two fields are equal (as they are for "rcu_bh" above), |
325 | then there is no grace period in progress, in other words, RCU | 153 | then there is no grace period in progress, in other words, RCU |
326 | is idle. On the other hand, if the two fields differ (as they | 154 | is idle. On the other hand, if the two fields differ (as they |
327 | do for "rcu" above), then an RCU grace period is in progress. | 155 | do for "rcu_sched" above), then an RCU grace period is in progress. |
328 | 156 | ||
329 | 157 | ||
330 | The output of "cat rcu/rcuhier" looks as follows, with very long lines: | 158 | The output of "cat rcu/rcuhier" looks as follows, with very long lines: |
331 | 159 | ||
332 | c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 | 160 | c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0 |
333 | 1/1 0:127 ^0 | 161 | 1/1 .>. 0:127 ^0 |
334 | 3/3 0:35 ^0 0/0 36:71 ^1 0/0 72:107 ^2 0/0 108:127 ^3 | 162 | 3/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 |
335 | 3/3f 0:5 ^0 2/3 6:11 ^1 0/0 12:17 ^2 0/0 18:23 ^3 0/0 24:29 ^4 0/0 30:35 ^5 0/0 36:41 ^0 0/0 42:47 ^1 0/0 48:53 ^2 0/0 54:59 ^3 0/0 60:65 ^4 0/0 66:71 ^5 0/0 72:77 ^0 0/0 78:83 ^1 0/0 84:89 ^2 0/0 90:95 ^3 0/0 96:101 ^4 0/0 102:107 ^5 0/0 108:113 ^0 0/0 114:119 ^1 0/0 120:125 ^2 0/0 126:127 ^3 | 163 | 3/3f .>. 0:5 ^0 2/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3 |
336 | rcu_bh: | 164 | rcu_bh: |
337 | c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 | 165 | c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0 |
338 | 0/1 0:127 ^0 | 166 | 0/1 .>. 0:127 ^0 |
339 | 0/3 0:35 ^0 0/0 36:71 ^1 0/0 72:107 ^2 0/0 108:127 ^3 | 167 | 0/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 |
340 | 0/3f 0:5 ^0 0/3 6:11 ^1 0/0 12:17 ^2 0/0 18:23 ^3 0/0 24:29 ^4 0/0 30:35 ^5 0/0 36:41 ^0 0/0 42:47 ^1 0/0 48:53 ^2 0/0 54:59 ^3 0/0 60:65 ^4 0/0 66:71 ^5 0/0 72:77 ^0 0/0 78:83 ^1 0/0 84:89 ^2 0/0 90:95 ^3 0/0 96:101 ^4 0/0 102:107 ^5 0/0 108:113 ^0 0/0 114:119 ^1 0/0 120:125 ^2 0/0 126:127 ^3 | 168 | 0/3f .>. 0:5 ^0 0/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3 |
341 | 169 | ||
342 | This is once again split into "rcu" and "rcu_bh" portions. The fields are | 170 | This is once again split into "rcu_sched" and "rcu_bh" portions, |
343 | as follows: | 171 | and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional |
172 | "rcu_preempt" section. The fields are as follows: | ||
344 | 173 | ||
345 | o "c" is exactly the same as "completed" under rcu/rcugp. | 174 | o "c" is exactly the same as "completed" under rcu/rcugp. |
346 | 175 | ||
@@ -372,6 +201,11 @@ o "fqlh" is the number of calls to force_quiescent_state() that | |||
372 | exited immediately (without even being counted in nfqs above) | 201 | exited immediately (without even being counted in nfqs above) |
373 | due to contention on ->fqslock. | 202 | due to contention on ->fqslock. |
374 | 203 | ||
204 | o "oqlen" is the number of callbacks on the "orphan" callback | ||
205 | list. RCU callbacks are placed on this list by CPUs going | ||
206 | offline, and are "adopted" either by the CPU helping the outgoing | ||
207 | CPU or by the next rcu_barrier*() call, whichever comes first. | ||
208 | |||
375 | o Each element of the form "1/1 0:127 ^0" represents one struct | 209 | o Each element of the form "1/1 0:127 ^0" represents one struct |
376 | rcu_node. Each line represents one level of the hierarchy, from | 210 | rcu_node. Each line represents one level of the hierarchy, from |
377 | root to leaves. It is best to think of the rcu_data structures | 211 | root to leaves. It is best to think of the rcu_data structures |
@@ -379,7 +213,7 @@ o Each element of the form "1/1 0:127 ^0" represents one struct | |||
379 | might be either one, two, or three levels of rcu_node structures, | 213 | might be either one, two, or three levels of rcu_node structures, |
380 | depending on the relationship between CONFIG_RCU_FANOUT and | 214 | depending on the relationship between CONFIG_RCU_FANOUT and |
381 | CONFIG_NR_CPUS. | 215 | CONFIG_NR_CPUS. |
382 | 216 | ||
383 | o The numbers separated by the "/" are the qsmask followed | 217 | o The numbers separated by the "/" are the qsmask followed |
384 | by the qsmaskinit. The qsmask will have one bit | 218 | by the qsmaskinit. The qsmask will have one bit |
385 | set for each entity in the next lower level that | 219 | set for each entity in the next lower level that |
@@ -389,10 +223,19 @@ o Each element of the form "1/1 0:127 ^0" represents one struct | |||
389 | The value of qsmaskinit is assigned to that of qsmask | 223 | The value of qsmaskinit is assigned to that of qsmask |
390 | at the beginning of each grace period. | 224 | at the beginning of each grace period. |
391 | 225 | ||
392 | For example, for "rcu", the qsmask of the first entry | 226 | For example, for "rcu_sched", the qsmask of the first |
393 | of the lowest level is 0x14, meaning that we are still | 227 | entry of the lowest level is 0x14, meaning that we |
394 | waiting for CPUs 2 and 4 to check in for the current | 228 | are still waiting for CPUs 2 and 4 to check in for the |
395 | grace period. | 229 | current grace period. |
230 | |||
231 | o The characters separated by the ">" indicate the state | ||
232 | of the blocked-tasks lists. A "T" preceding the ">" | ||
233 | indicates that at least one task blocked in an RCU | ||
234 | read-side critical section blocks the current grace | ||
235 | period, while a "." preceding the ">" indicates otherwise. | ||
236 | The character following the ">" indicates similarly for | ||
237 | the next grace period. A "T" should appear in this | ||
238 | field only for rcu-preempt. | ||
396 | 239 | ||
397 | o The numbers separated by the ":" are the range of CPUs | 240 | o The numbers separated by the ":" are the range of CPUs |
398 | served by this struct rcu_node. This can be helpful | 241 | served by this struct rcu_node. This can be helpful |
@@ -431,8 +274,9 @@ rcu_bh: | |||
431 | 6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921 | 274 | 6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921 |
432 | 7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542 | 275 | 7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542 |
433 | 276 | ||
434 | As always, this is once again split into "rcu" and "rcu_bh" portions. | 277 | As always, this is once again split into "rcu_sched" and "rcu_bh" |
435 | The fields are as follows: | 278 | portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional |
279 | "rcu_preempt" section. The fields are as follows: | ||
436 | 280 | ||
437 | o "np" is the number of times that __rcu_pending() has been invoked | 281 | o "np" is the number of times that __rcu_pending() has been invoked |
438 | for the corresponding flavor of RCU. | 282 | for the corresponding flavor of RCU. |