| author | Paul E. McKenney <paul.mckenney@linaro.org> | 2012-01-20 20:35:55 -0500 |
|---|---|---|
| committer | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 2012-02-21 12:03:52 -0500 |
| commit | 24cd7fd0eaa0d9f5e197ff77a83b006a86696068 (patch) | |
| tree | 9dc9c058272951fa0e3d45dce111ef946ec4f59a /Documentation/RCU | |
| parent | c13f3757d0fcdcc2b7fc5d5e38da76b8913e6648 (diff) | |
rcu: Update stall-warning documentation

Add documentation of CONFIG_RCU_CPU_STALL_VERBOSE, CONFIG_RCU_CPU_STALL_INFO,
and RCU_STALL_DELAY_DELTA. Describe multiple stall-warning messages from
a single stall, and the timing of the subsequent messages. Add headings.

Remove RCU_SECONDS_TILL_STALL_RECHECK because this value is now computed
at runtime from RCU_CPU_STALL_TIMEOUT, so that sysfs changes to the timeout
value now directly affect the RCU_SECONDS_TILL_STALL_RECHECK value.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Diffstat (limited to 'Documentation/RCU')
| -rw-r--r-- | Documentation/RCU/stallwarn.txt | 87 |
1 files changed, 80 insertions, 7 deletions
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 083d88cbc089..523364e4e1f1 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt | |||
| @@ -12,14 +12,38 @@ CONFIG_RCU_CPU_STALL_TIMEOUT | |||
| 12 | This kernel configuration parameter defines the period of time | 12 | This kernel configuration parameter defines the period of time |
| 13 | that RCU will wait from the beginning of a grace period until it | 13 | that RCU will wait from the beginning of a grace period until it |
| 14 | issues an RCU CPU stall warning. This time period is normally | 14 | issues an RCU CPU stall warning. This time period is normally |
| 15 | ten seconds. | 15 | sixty seconds. |
| 16 | 16 | ||
| 17 | RCU_SECONDS_TILL_STALL_RECHECK | 17 | This configuration parameter may be changed at runtime via |
| 18 | /sys/module/rcutree/parameters/rcu_cpu_stall_timeout; however, | ||
| 19 | this parameter is checked only at the beginning of a cycle. | ||
| 20 | So if you are 30 seconds into a 70-second stall, setting this | ||
| 21 | sysfs parameter to (say) five will shorten the timeout for the | ||
| 22 | -next- stall, or the following warning for the current stall | ||
| 23 | (assuming the stall lasts long enough). It will not affect the | ||
| 24 | timing of the next warning for the current stall. | ||
| 18 | 25 | ||
| 19 | This macro defines the period of time that RCU will wait after | 26 | Stall-warning messages may be enabled and disabled completely via |
| 20 | issuing a stall warning until it issues another stall warning | 27 | /sys/module/rcutree/parameters/rcu_cpu_stall_suppress. |
| 21 | for the same stall. This time period is normally set to three | 28 | |
| 22 | times the check interval plus thirty seconds. | 29 | CONFIG_RCU_CPU_STALL_VERBOSE |
| 30 | |||
| 31 | This kernel configuration parameter causes the stall warning to | ||
| 32 | also dump the stacks of any tasks that are blocking the current | ||
| 33 | RCU-preempt grace period. | ||
| 34 | |||
| 35 | CONFIG_RCU_CPU_STALL_INFO | ||
| 36 | |||
| 37 | This kernel configuration parameter causes the stall warning to | ||
| 38 | print out additional per-CPU diagnostic information, including | ||
| 39 | information on scheduling-clock ticks and RCU's idle-CPU tracking. | ||
| 40 | |||
| 41 | RCU_STALL_DELAY_DELTA | ||
| 42 | |||
| 43 | Although the lockdep facility is extremely useful, it does add | ||
| 44 | some overhead. Therefore, under CONFIG_PROVE_RCU, the | ||
| 45 | RCU_STALL_DELAY_DELTA macro allows five extra seconds before | ||
| 46 | giving an RCU CPU stall warning message. | ||
| 23 | 47 | ||
| 24 | RCU_STALL_RAT_DELAY | 48 | RCU_STALL_RAT_DELAY |
| 25 | 49 | ||
| @@ -64,6 +88,54 @@ INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffi | |||
| 64 | 88 | ||
| 65 | This is rare, but does happen from time to time in real life. | 89 | This is rare, but does happen from time to time in real life. |
| 66 | 90 | ||
| 91 | If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, | ||
| 92 | more information is printed with the stall-warning message, for example: | ||
| 93 | |||
| 94 | INFO: rcu_preempt detected stall on CPU | ||
| 95 | 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 | ||
| 96 | (t=65000 jiffies) | ||
| 97 | |||
| 98 | In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is | ||
| 99 | printed: | ||
| 100 | |||
| 101 | INFO: rcu_preempt detected stall on CPU | ||
| 102 | 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer=-1 | ||
| 103 | (t=65000 jiffies) | ||
| 104 | |||
| 105 | The "(64628 ticks this GP)" indicates that this CPU has taken more | ||
| 106 | than 64,000 scheduling-clock interrupts during the current stalled | ||
| 107 | grace period. If the CPU was not yet aware of the current grace | ||
| 108 | period (for example, if it was offline), then this part of the message | ||
| 109 | indicates how many grace periods behind the CPU is. | ||
| 110 | |||
| 111 | The "idle=" portion of the message prints the dyntick-idle state. | ||
| 112 | The hex number before the first "/" is the low-order 12 bits of the | ||
| 113 | dynticks counter, which will have an even-numbered value if the CPU is | ||
| 114 | in dyntick-idle mode and an odd-numbered value otherwise. The hex | ||
| 115 | number between the two "/"s is the value of the nesting, which will | ||
| 116 | be a small positive number if in the idle loop and a very large positive | ||
| 117 | number (as shown above) otherwise. | ||
| 118 | |||
| 119 | For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the | ||
| 120 | CPU is not in the process of trying to force itself into dyntick-idle | ||
| 121 | state, the "." indicates that the CPU has not given up forcing RCU | ||
| 122 | into dyntick-idle mode (it would be "H" otherwise), and the "timer=-1" | ||
| 123 | indicates that the CPU has not recently forced RCU into dyntick-idle | ||
| 124 | mode (it would otherwise indicate the number of microseconds remaining | ||
| 125 | in this forced state). | ||
| 126 | |||
| 127 | |||
| 128 | Multiple Warnings From One Stall | ||
| 129 | |||
| 130 | If a stall lasts long enough, multiple stall-warning messages will be | ||
| 131 | printed for it. The second and subsequent messages are printed at | ||
| 132 | longer intervals, so that the time between (say) the first and second | ||
| 133 | message will be about three times the interval between the beginning | ||
| 134 | of the stall and the first message. | ||
| 135 | |||
| 136 | |||
| 137 | What Causes RCU CPU Stall Warnings? | ||
| 138 | |||
| 67 | So your kernel printed an RCU CPU stall warning. The next question is | 139 | So your kernel printed an RCU CPU stall warning. The next question is |
| 68 | "What caused it?" The following problems can result in RCU CPU stall | 140 | "What caused it?" The following problems can result in RCU CPU stall |
| 69 | warnings: | 141 | warnings: |
| @@ -128,4 +200,5 @@ is occurring, which will usually be in the function nearest the top of | |||
| 128 | that portion of the stack which remains the same from trace to trace. | 200 | that portion of the stack which remains the same from trace to trace. |
| 129 | If you can reliably trigger the stall, ftrace can be quite helpful. | 201 | If you can reliably trigger the stall, ftrace can be quite helpful. |
| 130 | 202 | ||
| 131 | RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE. | 203 | RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE |
| 204 | and with RCU's event tracing. | ||
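
The per-CPU line printed under CONFIG_RCU_CPU_STALL_INFO can be decoded mechanically. The following is a minimal Python sketch that assumes the line has exactly the shape of the two examples in the patch; it is not a kernel-provided tool. The function name decode_stall_line, the dictionary keys, and the regular expression are illustrative choices. The third "idle=" field is captured but not interpreted, because the patch text does not describe it, and the "grace periods behind" variant of the message is not handled.

```python
import re

# Shape taken from the two example lines in the patch. The drain/flag/timer
# part only appears on CONFIG_RCU_FAST_NO_HZ kernels, so it is optional here.
STALL_LINE = re.compile(
    r"(?P<cpu>\d+): \((?P<ticks>\d+) ticks this GP\) "
    r"idle=(?P<dynticks>[0-9a-f]+)/(?P<nesting>[0-9a-f]+)/(?P<third>[0-9a-f]+)"
    r"(?: drain=(?P<drain>-?\d+) (?P<flag>[.H]) timer=(?P<timer>-?\d+))?"
)


def decode_stall_line(line):
    """Return a dict describing one per-CPU stall line, or None if no match."""
    m = STALL_LINE.search(line)
    if m is None:
        return None
    dynticks = int(m["dynticks"], 16)  # low-order 12 bits of the counter
    info = {
        "cpu": int(m["cpu"]),
        "ticks": int(m["ticks"]),
        "dynticks_low12": dynticks,
        # Even value: CPU is in dyntick-idle mode; odd value: it is not.
        "in_dyntick_idle": dynticks % 2 == 0,
        "nesting": int(m["nesting"], 16),
    }
    if m["drain"] is not None:
        info["drain"] = int(m["drain"])
        # "." means the CPU has not given up forcing; "H" means it has.
        info["gave_up_forcing"] = m["flag"] == "H"
        info["timer"] = int(m["timer"])
    return info


if __name__ == "__main__":
    example = ("0: (64628 ticks this GP) "
               "idle=dd5/3fffffffffffffff/0 drain=0 . timer=-1")
    print(decode_stall_line(example))
```

For the example line, the dynticks value 0xdd5 is odd and the nesting value is very large, which matches the patch's description of a CPU that is neither in dyntick-idle mode nor in the idle loop.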

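The timing rules described in the patch (a sixty-second default timeout, five extra seconds under CONFIG_PROVE_RCU, and a gap before the second message of about three times the initial interval) can be worked through numerically. The sketch below is an approximation under stated assumptions, not the kernel's computation: the function name stall_warning_times is hypothetical, jiffy-level rounding is ignored, and it assumes that every later message keeps the same three-times spacing, whereas the patch only says that the second and subsequent messages come at longer intervals.

```python
def stall_warning_times(timeout_s=60, prove_rcu=False, count=3):
    """Approximate seconds from stall start at which warnings appear.

    timeout_s : stall timeout in seconds (sixty by default per the patch)
    prove_rcu : add the five-second RCU_STALL_DELAY_DELTA allowance
    count     : how many warning times to list
    """
    delta = 5 if prove_rcu else 0
    first = timeout_s + delta   # time from stall start to first warning
    gap = 3 * first             # assumed spacing of all later warnings
    return [first + i * gap for i in range(count)]


if __name__ == "__main__":
    print(stall_warning_times())                # default configuration
    print(stall_warning_times(prove_rcu=True))  # with CONFIG_PROVE_RCU
```

With the CONFIG_PROVE_RCU allowance the first warning lands at 65 seconds, which is consistent with the "(t=65000 jiffies)" shown in the example messages if the kernel that produced them ran at 1000 jiffies per second (an assumption; the patch does not state the tick rate).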