author: Don Zickus <dzickus@redhat.com> 2010-05-07 17:11:44 -0400
committer: Frederic Weisbecker <fweisbec@gmail.com> 2010-05-12 17:55:33 -0400
commit: 58687acba59266735adb8ccd9b5b9aa2c7cd205b
tree: 7236582375310b116eedec6facbee87d42e3dd6d /lib
parent: a9aa1d02de36b450990b0e25a88fc2ff1c3e6b94

lockup_detector: Combine nmi_watchdog and softlockup detector

The new nmi_watchdog (which uses the perf event subsystem) is very
similar in structure to the softlockup detector. Using Ingo's suggestion,
I combined the two functionalities into one file: kernel/watchdog.c.

Now both the nmi_watchdog (or hardlockup detector) and softlockup detector
sit on top of the perf event subsystem, which is run every 60 seconds or so
to see if there are any lockups.

To detect hardlockups, cpus not responding to interrupts, I implemented an
hrtimer that runs 5 times for every perf event overflow event. If that stops
counting on a cpu, then the cpu is most likely in trouble.

To detect softlockups, tasks not yielding to the scheduler, I used the
previous kthread idea that now gets kicked every time the hrtimer fires.
If the kthread isn't being scheduled, neither is anyone else, and the
warning is printed to the console.

I tested this on x86_64 and both the softlockup and hardlockup paths work.

V2:
- cleaned up the Kconfig and softlockup combination
- surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
- separated out the softlockup case from perf event subsystem
- re-arranged the enabling/disabling nmi watchdog from proc space
- added cpumasks for hardlockup failure cases
- removed fallback to soft events if no PMU exists for hard events

V3:
- comment cleanups
- drop support for older softlockup code
- per_cpu cleanups
- completely remove software clock base hardlockup detector
- use per_cpu masking on hard/soft lockup detection
- #ifdef cleanups
- rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
- documentation additions

V4:
- documentation fixes
- convert per_cpu to __get_cpu_var
- powerpc compile fixes

V5:
- split apart warn flags for hard and soft lockups

TODO:
- figure out how to make an arch-agnostic clock2cycles call
  (if possible) to feed into perf events as a sample period

[fweisbec: merged conflict patch]

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

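
The two checks described in the message can be sketched as a small
userspace model. This is not the code in kernel/watchdog.c: the struct,
the function names, and the simulate_* helpers are all hypothetical, and
the constants (a 12 second timer period, 5 timer ticks per NMI, a 60 second
soft threshold) are taken only from the figures quoted in this commit.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; names are illustrative, not the kernel's. */
struct cpu_watch {
	unsigned long hrtimer_ticks;     /* bumped on every hrtimer tick */
	unsigned long ticks_seen_by_nmi; /* snapshot taken at the last NMI */
	unsigned long last_touch;        /* last time the watchdog thread ran */
};

/* Assumed values, in seconds, based on the numbers quoted in the commit. */
enum { SOFT_THRESHOLD = 60, TIMER_PERIOD = 12, TICKS_PER_NMI = 5 };

/*
 * hrtimer tick: count the interrupt and kick the watchdog thread.
 * thread_ran models whether the scheduler let that thread run.
 * Returns true when the thread has been starved past the threshold.
 */
static bool timer_tick(struct cpu_watch *w, unsigned long now, bool thread_ran)
{
	w->hrtimer_ticks++;
	if (thread_ran)
		w->last_touch = now;
	return now - w->last_touch > SOFT_THRESHOLD;
}

/* NMI check: if the tick counter has not moved, interrupts are stuck. */
static bool nmi_check(struct cpu_watch *w)
{
	bool stuck = (w->hrtimer_ticks == w->ticks_seen_by_nmi);

	w->ticks_seen_by_nmi = w->hrtimer_ticks;
	return stuck;
}

/* The watchdog thread stops being scheduled at stall_at; returns the
 * time a softlockup is reported, or 0 if none is seen by end. */
static unsigned long simulate_soft_stall(unsigned long stall_at,
					 unsigned long end)
{
	struct cpu_watch w = {0};

	for (unsigned long now = TIMER_PERIOD; now <= end; now += TIMER_PERIOD)
		if (timer_tick(&w, now, now < stall_at))
			return now;
	return 0;
}

/* Timer interrupts stop being delivered at stall_at; returns the time
 * a hardlockup is reported, or 0 if none is seen by end. */
static unsigned long simulate_hard_stall(unsigned long stall_at,
					 unsigned long end)
{
	struct cpu_watch w = {0};

	for (unsigned long now = TIMER_PERIOD; now <= end; now += TIMER_PERIOD) {
		if (now < stall_at)
			timer_tick(&w, now, true);
		if (now % (TIMER_PERIOD * TICKS_PER_NMI) == 0 && nmi_check(&w))
			return now;
	}
	return 0;
}
```

In this model a hardlockup is only noticed at the first NMI that sees no
new timer ticks since the previous NMI, so detection can lag the stall by
up to two NMI periods.
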
Diffstat (limited to 'lib')
-rw-r--r-- lib/Kconfig.debug | 30
1 files changed, 20 insertions, 10 deletions
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 220ae6063b6f..49e285dcaf57 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -153,7 +153,7 @@ config DEBUG_SHIRQ
 	  points; some don't and need to be caught.
 
 config DETECT_SOFTLOCKUP
-	bool "Detect Soft Lockups"
+	bool
 	depends on DEBUG_KERNEL && !S390
 	default y
 	help
@@ -171,17 +171,27 @@ config DETECT_SOFTLOCKUP
 	  can be detected via the NMI-watchdog, on platforms that
 	  support it.)
 
-config NMI_WATCHDOG
-	bool "Detect Hard Lockups with an NMI Watchdog"
-	depends on DEBUG_KERNEL && PERF_EVENTS && PERF_EVENTS_NMI
+config LOCKUP_DETECTOR
+	bool "Detect Hard and Soft Lockups"
+	depends on DEBUG_KERNEL
+	default DETECT_SOFTLOCKUP
 	help
-	  Say Y here to enable the kernel to use the NMI as a watchdog
-	  to detect hard lockups. This is useful when a cpu hangs for no
-	  reason but can still respond to NMIs. A backtrace is displayed
-	  for reviewing and reporting.
+	  Say Y here to enable the kernel to act as a watchdog to detect
+	  hard and soft lockups.
+
+	  Softlockups are bugs that cause the kernel to loop in kernel
+	  mode for more than 60 seconds, without giving other tasks a
+	  chance to run. The current stack trace is displayed upon
+	  detection and the system will stay locked up.
+
+	  Hardlockups are bugs that cause the CPU to loop in kernel mode
+	  for more than 60 seconds, without letting other interrupts have a
+	  chance to run. The current stack trace is displayed upon detection
+	  and the system will stay locked up.
 
-	  The overhead should be minimal, just an extra NMI every few
-	  seconds.
+	  The overhead should be minimal. A periodic hrtimer runs to
+	  generate interrupts and kick the watchdog task every 10-12 seconds.
+	  An NMI is generated every 60 seconds or so to check for hardlockups.
 
 config BOOTPARAM_SOFTLOCKUP_PANIC
 	bool "Panic (Reboot) On Soft Lockups"