author | Paul E. McKenney <paul.mckenney@linaro.org> | 2012-01-20 20:35:55 -0500 |
---|---|---|
committer | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2012-02-21 12:03:52 -0500 |
commit | 24cd7fd0eaa0d9f5e197ff77a83b006a86696068 (patch) | |
tree | 9dc9c058272951fa0e3d45dce111ef946ec4f59a /Documentation | |
parent | c13f3757d0fcdcc2b7fc5d5e38da76b8913e6648 (diff) |
rcu: Update stall-warning documentation

Add documentation of CONFIG_RCU_CPU_STALL_VERBOSE, CONFIG_RCU_CPU_STALL_INFO,
and RCU_STALL_DELAY_DELTA. Describe multiple stall-warning messages from
a single stall, and the timing of the subsequent messages. Add headings.
Remove RCU_SECONDS_TILL_STALL_RECHECK because this value is now computed
at runtime from RCU_CPU_STALL_TIMEOUT, so that sysfs changes to the timeout
value now directly affect the RCU_SECONDS_TILL_STALL_RECHECK value.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/RCU/stallwarn.txt | 87 |
1 files changed, 80 insertions, 7 deletions
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 083d88cbc089..523364e4e1f1 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -12,14 +12,38 @@ CONFIG_RCU_CPU_STALL_TIMEOUT | |||
12 | This kernel configuration parameter defines the period of time | 12 | This kernel configuration parameter defines the period of time |
13 | that RCU will wait from the beginning of a grace period until it | 13 | that RCU will wait from the beginning of a grace period until it |
14 | issues an RCU CPU stall warning. This time period is normally | 14 | issues an RCU CPU stall warning. This time period is normally |
15 | ten seconds. | 15 | sixty seconds. |
16 | 16 | ||
17 | RCU_SECONDS_TILL_STALL_RECHECK | 17 | This configuration parameter may be changed at runtime via the |
18 | /sys/module/rcutree/parameters/rcu_cpu_stall_timeout; however, | ||
19 | this parameter is checked only at the beginning of a cycle. | ||
20 | So if you are 30 seconds into a 70-second stall, setting this | ||
21 | sysfs parameter to (say) five will shorten the timeout for the | ||
22 | -next- stall, or the following warning for the current stall | ||
23 | (assuming the stall lasts long enough). It will not affect the | ||
24 | timing of the next warning for the current stall. | ||
18 | 25 | ||
19 | This macro defines the period of time that RCU will wait after | 26 | Stall-warning messages may be enabled and disabled completely via |
20 | issuing a stall warning until it issues another stall warning | 27 | /sys/module/rcutree/parameters/rcu_cpu_stall_suppress. |
21 | for the same stall. This time period is normally set to three | 28 | |
22 | times the check interval plus thirty seconds. | 29 | CONFIG_RCU_CPU_STALL_VERBOSE |
30 | |||
31 | This kernel configuration parameter causes the stall warning to | ||
32 | also dump the stacks of any tasks that are blocking the current | ||
33 | RCU-preempt grace period. | ||
34 | |||
35 | CONFIG_RCU_CPU_STALL_INFO | ||
36 | |||
37 | This kernel configuration parameter causes the stall warning to | ||
38 | print out additional per-CPU diagnostic information, including | ||
39 | information on scheduling-clock ticks and RCU's idle-CPU tracking. | ||
40 | |||
41 | RCU_STALL_DELAY_DELTA | ||
42 | |||
43 | Although the lockdep facility is extremely useful, it does add | ||
44 | some overhead. Therefore, under CONFIG_PROVE_RCU, the | ||
45 | RCU_STALL_DELAY_DELTA macro allows five extra seconds before | ||
46 | giving an RCU CPU stall warning message. | ||
23 | 47 | ||
24 | RCU_STALL_RAT_DELAY | 48 | RCU_STALL_RAT_DELAY |
25 | 49 | ||
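
The timing rules in this patch can be sketched numerically. The following is only an illustration under stated assumptions, not kernel code: the function name `stall_warning_times` and its parameters are hypothetical, the sixty-second default and the five-second RCU_STALL_DELAY_DELTA come from the text above, and the factor of three comes from the "Multiple Warnings From One Stall" section added later in this patch. The sketch also assumes that every later gap equals the first-to-second gap, which the text states only approximately.

```python
def stall_warning_times(timeout_s=60, prove_rcu=False, count=3):
    """Approximate times, in seconds since the start of the grace period,
    at which stall warnings would be printed for one long stall.

    Hypothetical helper for illustration only; it is not a kernel API.
    """
    # Under CONFIG_PROVE_RCU, RCU_STALL_DELAY_DELTA allows five extra seconds.
    delay_delta_s = 5 if prove_rcu else 0
    first = timeout_s + delay_delta_s

    times = [first]
    for _ in range(count - 1):
        # Assumption: each later message follows the previous one by about
        # three times the delay before the first message.
        times.append(times[-1] + 3 * first)
    return times


print(stall_warning_times())                # [60, 240, 420]
print(stall_warning_times(prove_rcu=True))  # [65, 260, 455]
```

If HZ were 1000 (an assumption, since the patch does not state it), the 65-second first warning under CONFIG_PROVE_RCU would be consistent with the "(t=65000 jiffies)" shown in the sample messages in this patch.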
@@ -64,6 +88,54 @@ INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffi | |||
64 | 88 | ||
65 | This is rare, but does happen from time to time in real life. | 89 | This is rare, but does happen from time to time in real life. |
66 | 90 | ||
91 | If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, | ||
92 | more information is printed with the stall-warning message, for example: | ||
93 | |||
94 | INFO: rcu_preempt detected stall on CPU | ||
95 | 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 | ||
96 | (t=65000 jiffies) | ||
97 | |||
98 | In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is | ||
99 | printed: | ||
100 | |||
101 | INFO: rcu_preempt detected stall on CPU | ||
102 | 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer=-1 | ||
103 | (t=65000 jiffies) | ||
104 | |||
105 | The "(64628 ticks this GP)" indicates that this CPU has taken more | ||
106 | than 64,000 scheduling-clock interrupts during the current stalled | ||
107 | grace period. If the CPU was not yet aware of the current grace | ||
108 | period (for example, if it was offline), then this part of the message | ||
109 | indicates how many grace periods behind the CPU is. | ||
110 | |||
111 | The "idle=" portion of the message prints the dyntick-idle state. | ||
112 | The hex number before the first "/" is the low-order 12 bits of the | ||
113 | dynticks counter, which will have an even-numbered value if the CPU is | ||
114 | in dyntick-idle mode and an odd-numbered value otherwise. The hex | ||
115 | number between the two "/"s is the value of the nesting, which will | ||
116 | be a small positive number if in the idle loop and a very large positive | ||
117 | number (as shown above) otherwise. | ||
118 | |||
119 | For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the | ||
120 | CPU is not in the process of trying to force itself into dyntick-idle | ||
121 | state, the "." indicates that the CPU has not given up forcing RCU | ||
122 | into dyntick-idle mode (it would be "H" otherwise), and the "timer=-1" | ||
123 | indicates that the CPU has not recently forced RCU into dyntick-idle | ||
124 | mode (it would otherwise indicate the number of microseconds remaining | ||
125 | in this forced state). | ||
126 | |||
127 | |||
128 | Multiple Warnings From One Stall | ||
129 | |||
130 | If a stall lasts long enough, multiple stall-warning messages will be | ||
131 | printed for it. The second and subsequent messages are printed at | ||
132 | longer intervals, so that the time between (say) the first and second | ||
133 | message will be about three times the interval between the beginning | ||
134 | of the stall and the first message. | ||
135 | |||
136 | |||
137 | What Causes RCU CPU Stall Warnings? | ||
138 | |||
67 | So your kernel printed an RCU CPU stall warning. The next question is | 139 | So your kernel printed an RCU CPU stall warning. The next question is |
68 | "What caused it?" The following problems can result in RCU CPU stall | 140 | "What caused it?" The following problems can result in RCU CPU stall |
69 | warnings: | 141 | warnings: |
@@ -128,4 +200,5 @@ is occurring, which will usually be in the function nearest the top of | |||
128 | that portion of the stack which remains the same from trace to trace. | 200 | that portion of the stack which remains the same from trace to trace. |
129 | If you can reliably trigger the stall, ftrace can be quite helpful. | 201 | If you can reliably trigger the stall, ftrace can be quite helpful. |
130 | 202 | ||
131 | RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE. | 203 | RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE |
204 | and with RCU's event tracing. | ||
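
The per-CPU line printed under CONFIG_RCU_CPU_STALL_INFO, as described in the second hunk, can be decoded mechanically. The sketch below is an illustration only: the regular expression and the name `parse_stall_info` are assumptions based solely on the two sample lines in this patch, and it handles only the "ticks this GP" form. The third "idle=" field is matched but not interpreted, because the text shown here does not describe it.

```python
import re

# Pattern inferred from the sample lines in the patch; the real kernel
# output may vary, so treat this as an assumption.
STALL_INFO_RE = re.compile(
    r"(?P<cpu>\d+): \((?P<ticks>\d+) ticks this GP\) "
    r"idle=(?P<dynticks>[0-9a-f]+)/(?P<nesting>[0-9a-f]+)/[0-9a-f]+"
)


def parse_stall_info(line):
    """Decode one per-CPU stall-info line, or return None if it does not match."""
    m = STALL_INFO_RE.search(line)
    if m is None:
        return None
    dynticks = int(m.group("dynticks"), 16)
    return {
        "cpu": int(m.group("cpu")),
        "ticks": int(m.group("ticks")),
        # Low-order 12 bits of the dynticks counter, printed in hex.
        "dynticks_low12": dynticks,
        # An even value means the CPU is in dyntick-idle mode.
        "in_dyntick_idle": dynticks % 2 == 0,
        # Nesting value, printed in hex.
        "nesting": int(m.group("nesting"), 16),
    }


info = parse_stall_info("0: (63959 ticks this GP) idle=241/3fffffffffffffff/0")
print(info["in_dyntick_idle"])  # False, because 0x241 is odd
```

For the sample line, the odd dynticks value and the very large nesting value both indicate that the CPU was not idle when the warning was printed.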