Diffstat (limited to 'Documentation/RCU/stallwarn.txt')
-rw-r--r-- Documentation/RCU/stallwarn.txt | 103
1 files changed, 13 insertions, 90 deletions
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 1927151b386..4e959208f73 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -12,38 +12,14 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
 	This kernel configuration parameter defines the period of time
 	that RCU will wait from the beginning of a grace period until it
 	issues an RCU CPU stall warning. This time period is normally
-	sixty seconds.
+	ten seconds.
 
-	This configuration parameter may be changed at runtime via the
-	/sys/module/rcutree/parameters/rcu_cpu_stall_timeout, however
-	this parameter is checked only at the beginning of a cycle.
-	So if you are 30 seconds into a 70-second stall, setting this
-	sysfs parameter to (say) five will shorten the timeout for the
-	-next- stall, or the following warning for the current stall
-	(assuming the stall lasts long enough). It will not affect the
-	timing of the next warning for the current stall.
+RCU_SECONDS_TILL_STALL_RECHECK
 
-	Stall-warning messages may be enabled and disabled completely via
-	/sys/module/rcutree/parameters/rcu_cpu_stall_suppress.
-
-CONFIG_RCU_CPU_STALL_VERBOSE
-
-	This kernel configuration parameter causes the stall warning to
-	also dump the stacks of any tasks that are blocking the current
-	RCU-preempt grace period.
-
-RCU_CPU_STALL_INFO
-
-	This kernel configuration parameter causes the stall warning to
-	print out additional per-CPU diagnostic information, including
-	information on scheduling-clock ticks and RCU's idle-CPU tracking.
-
-RCU_STALL_DELAY_DELTA
-
-	Although the lockdep facility is extremely useful, it does add
-	some overhead. Therefore, under CONFIG_PROVE_RCU, the
-	RCU_STALL_DELAY_DELTA macro allows five extra seconds before
-	giving an RCU CPU stall warning message.
+	This macro defines the period of time that RCU will wait after
+	issuing a stall warning until it issues another stall warning
+	for the same stall. This time period is normally set to three
+	times the check interval plus thirty seconds.
 
 RCU_STALL_RAT_DELAY
 
@@ -88,54 +64,6 @@ INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffi
 
 This is rare, but does happen from time to time in real life.
 
-If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
-more information is printed with the stall-warning message, for example:
-
-	INFO: rcu_preempt detected stall on CPU
-	0: (63959 ticks this GP) idle=241/3fffffffffffffff/0
-	(t=65000 jiffies)
-
-In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
-printed:
-
-	INFO: rcu_preempt detected stall on CPU
-	0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer not pending
-	(t=65000 jiffies)
-
-The "(64628 ticks this GP)" indicates that this CPU has taken more
-than 64,000 scheduling-clock interrupts during the current stalled
-grace period. If the CPU was not yet aware of the current grace
-period (for example, if it was offline), then this part of the message
-indicates how many grace periods behind the CPU is.
-
-The "idle=" portion of the message prints the dyntick-idle state.
-The hex number before the first "/" is the low-order 12 bits of the
-dynticks counter, which will have an even-numbered value if the CPU is
-in dyntick-idle mode and an odd-numbered value otherwise. The hex
-number between the two "/"s is the value of the nesting, which will
-be a small positive number if in the idle loop and a very large positive
-number (as shown above) otherwise.
-
-For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the CPU is
-not in the process of trying to force itself into dyntick-idle state, the
-"." indicates that the CPU has not given up forcing RCU into dyntick-idle
-mode (it would be "H" otherwise), and the "timer not pending" indicates
-that the CPU has not recently forced RCU into dyntick-idle mode (it
-would otherwise indicate the number of microseconds remaining in this
-forced state).
-
-
-Multiple Warnings From One Stall
-
-If a stall lasts long enough, multiple stall-warning messages will be
-printed for it. The second and subsequent messages are printed at
-longer intervals, so that the time between (say) the first and second
-message will be about three times the interval between the beginning
-of the stall and the first message.
-
-
-What Causes RCU CPU Stall Warnings?
-
 So your kernel printed an RCU CPU stall warning. The next question is
 "What caused it?" The following problems can result in RCU CPU stall
 warnings:
@@ -173,11 +101,6 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
 	CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
 	messages.
 
-o	A hardware or software issue shuts off the scheduler-clock
-	interrupt on a CPU that is not in dyntick-idle mode. This
-	problem really has happened, and seems to be most likely to
-	result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
-
 o	A bug in the RCU implementation.
 
 o	A hardware failure. This is quite unlikely, but has occurred
@@ -186,11 +109,12 @@ o A hardware failure. This is quite unlikely, but has occurred
 	This resulted in a series of RCU CPU stall warnings, eventually
 	leading the realization that the CPU had failed.
 
-The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning.
-SRCU does not have its own CPU stall warnings, but its calls to
-synchronize_sched() will result in RCU-sched detecting RCU-sched-related
-CPU stalls. Please note that RCU only detects CPU stalls when there is
-a grace period in progress. No grace period, no CPU stall warnings.
+The RCU, RCU-sched, and RCU-bh implementations have CPU stall
+warning. SRCU does not have its own CPU stall warnings, but its
+calls to synchronize_sched() will result in RCU-sched detecting
+RCU-sched-related CPU stalls. Please note that RCU only detects
+CPU stalls when there is a grace period in progress. No grace period,
+no CPU stall warnings.
 
 To diagnose the cause of the stall, inspect the stack traces.
 The offending function will usually be near the top of the stack.
@@ -200,5 +124,4 @@ is occurring, which will usually be in the function nearest the top of
 that portion of the stack which remains the same from trace to trace.
 If you can reliably trigger the stall, ftrace can be quite helpful.
 
-RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE
-and with RCU's event tracing.
+RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE.