Diffstat (limited to 'Documentation/watchdog/hpwdt.txt')
-rw-r--r--  Documentation/watchdog/hpwdt.txt | 95
1 files changed, 95 insertions, 0 deletions
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
new file mode 100644
index 000000000000..9c24d5ffbb06
--- /dev/null
+++ b/Documentation/watchdog/hpwdt.txt
@@ -0,0 +1,95 @@
Last reviewed: 06/02/2009

HP iLO2 NMI Watchdog Driver
NMI sourcing for iLO2 based ProLiant Servers
Documentation and Driver by
Thomas Mingarelli <thomas.mingarelli@hp.com>

The HP iLO2 NMI Watchdog driver is a kernel module that provides basic
watchdog functionality and the added benefit of NMI sourcing. Both the
watchdog functionality and the NMI sourcing capability need to be enabled
by the user. The two features are independent of one another: a user can
have NMI sourcing without the watchdog timer and vice versa.

Watchdog functionality is enabled like any other common watchdog driver: an
application needs to be started that opens the watchdog device, which starts
the watchdog timer, and then keeps kicking it. A basic example application,
watchdog-test.c, is provided in the Documentation/watchdog/src directory;
simply compile the C file and run it. If the system gets into a bad state and
hangs, the HP ProLiant iLO2 timer register will not be updated in a timely
fashion and a hardware system reset, also known as an Automatic Server
Recovery (ASR) event, will occur.
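
The keepalive cycle described above can be sketched in C as follows. This is
a minimal illustration, not the watchdog-test.c program itself. It assumes
that the device node is /dev/watchdog (passed in as a parameter), that the
kick interval is shorter than the configured soft_margin, and that the driver
honors the "magic close" convention described in watchdog-api.txt. The
function name watchdog_keepalive is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the watchdog device `dev`, then write to it
 * once every `interval` seconds for `iterations` rounds. Each write kicks
 * the timer. Returns 0 on success, -1 if the device cannot be opened or
 * written. Assumes `interval` is shorter than the driver's timeout.
 */
int watchdog_keepalive(const char *dev, unsigned interval, unsigned iterations)
{
    int fd = open(dev, O_WRONLY);   /* on a real driver, this starts the timer */
    if (fd < 0) {
        perror("open");
        return -1;
    }

    for (unsigned i = 0; i < iterations; i++) {
        /* Any write counts as a keepalive; the byte value is irrelevant. */
        if (write(fd, "\0", 1) != 1) {
            perror("write");
            close(fd);
            return -1;
        }
        sleep(interval);
    }

    /* Magic close: writing 'V' before close() asks the driver to stop the
     * timer. A driver loaded with nowayout set ignores this request. */
    if (write(fd, "V", 1) != 1)
        perror("write");
    close(fd);
    return 0;
}
```

For example, watchdog_keepalive("/dev/watchdog", 10, 360) would kick the timer
every 10 seconds for about an hour. If the process hangs or is killed between
writes, the kicks stop and, with no magic close, the ASR follows once the
timeout expires.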

The hpwdt driver also has four module parameters:

  soft_margin - sets the watchdog timer value (the timeout).
  allow_kdump - allows the user to save a kernel dump image after an NMI.
  nowayout    - standard watchdog parameter; when set, the timer cannot be
                stopped once it has been started, so an impending ASR cannot
                be escaped.
  priority    - determines whether the hpwdt driver is first or last on the
                die_notify list of NMI handlers. The default value for this
                module parameter is 0 (LAST). To enable NMI sourcing, reload
                the hpwdt driver with priority=1 (and boot with
                nmi_watchdog=0).
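
Module parameters like these are passed as name=value pairs after the module
name on an insmod or modprobe command line. A small, hypothetical helper that
formats such a list for hpwdt might look like this; the values used in the
example below are placeholders, not the driver's defaults.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the four hpwdt module parameters as the
 * space-separated name=value list accepted after the module name by
 * insmod or modprobe. Returns snprintf's result: the length of the full
 * string, which is len or more if the output was truncated.
 */
int hpwdt_param_string(char *buf, size_t len, int soft_margin,
                       int allow_kdump, int nowayout, int priority)
{
    return snprintf(buf, len,
                    "soft_margin=%d allow_kdump=%d nowayout=%d priority=%d",
                    soft_margin, allow_kdump, nowayout, priority);
}
```

With soft_margin=30, allow_kdump=1, nowayout=0 and priority=1, the buffer holds
"soft_margin=30 allow_kdump=1 nowayout=0 priority=1", a list that can follow
the module path in an insmod command such as the one used later in this
document.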

NOTE: More information about watchdog drivers in general, including the ioctl
interface to /dev/watchdog, can be found in
Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
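
As an illustration of that ioctl interface, the sketch below requests a new
timeout at runtime with WDIOC_SETTIMEOUT, reads back the value the driver
accepted with WDIOC_GETTIMEOUT, and sends one keepalive with WDIOC_KEEPALIVE.
These requests are defined in <linux/watchdog.h>; whether a given driver
implements WDIOC_SETTIMEOUT, and which values it accepts, is an assumption to
verify against that driver. The function name is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/*
 * Hypothetical helper: request a timeout of `seconds` on the watchdog
 * device `dev`, then send one keepalive. Returns the timeout the driver
 * reports after the request (it may round or clamp it), or -1 on error.
 */
int watchdog_set_timeout(const char *dev, int seconds)
{
    int fd = open(dev, O_WRONLY);   /* on a real driver, this starts the timer */
    if (fd < 0) {
        perror("open");
        return -1;
    }

    int timeout = seconds;
    int dummy = 0;
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) < 0 ||
        ioctl(fd, WDIOC_GETTIMEOUT, &timeout) < 0 ||
        ioctl(fd, WDIOC_KEEPALIVE, &dummy) < 0) {
        perror("ioctl");
        timeout = -1;
    }

    /* Magic close: ask the driver to stop the timer (ignored with nowayout). */
    if (write(fd, "V", 1) != 1)
        perror("write");
    close(fd);
    return timeout;
}
```

A device that is not a watchdog, such as /dev/null, rejects these requests, so
the helper returns -1 for it.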

The priority parameter was introduced because other kernel software relies on
handling NMIs (for example, oprofile). Keeping hpwdt's priority at 0 (LAST)
allows other users of NMIs for non-critical events to work as expected.
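
Why the position on the list matters can be shown with a small userspace
model of a priority-ordered notifier chain. It mirrors the kernel rule that
callbacks with a higher priority run first and that a callback returning
NOTIFY_STOP ends the walk, but it is a simplified model, not kernel code: the
handler names, the priority numbers, and the assumption that hpwdt consumes
every NMI once NMI sourcing is enabled are illustrative.

```c
#include <stddef.h>
#include <stdlib.h>

/* Return values modeled on the kernel's NOTIFY_DONE / NOTIFY_STOP. */
enum { MODEL_DONE = 0, MODEL_STOP = 1 };

struct model_handler {
    const char *name;
    int priority;            /* higher runs earlier */
    int (*handle)(void);     /* returns MODEL_DONE or MODEL_STOP */
};

/* Stand-in for hpwdt with NMI sourcing enabled: consumes every NMI. */
static int claims_every_nmi(void) { return MODEL_STOP; }

/* Stand-in for a profiler such as oprofile: uses the NMI, lets others see it. */
static int samples_nmi(void) { return MODEL_DONE; }

/* qsort comparator: descending priority (equal priorities keep no set order). */
static int by_priority_desc(const void *a, const void *b)
{
    const struct model_handler *x = a, *y = b;
    return (y->priority > x->priority) - (y->priority < x->priority);
}

/* Deliver one NMI to the chain and return how many handlers ran. */
int deliver_nmi(struct model_handler *chain, size_t n)
{
    qsort(chain, n, sizeof *chain, by_priority_desc);

    int ran = 0;
    for (size_t i = 0; i < n; i++) {
        ran++;
        if (chain[i].handle() == MODEL_STOP)
            break;   /* lower-priority handlers never see this NMI */
    }
    return ran;
}
```

With hpwdt last, the profiler runs and then hpwdt; with hpwdt first, the walk
stops after one handler and the profiler never sees the NMI. That is the
trade-off the priority parameter exposes.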

The NMI sourcing capability is disabled by default because the Linux kernel
cannot distinguish between "NMI Watchdog Ticks" and "HW generated NMI events".
As a result, the hpwdt NMI handler code is called each time the NMI signal
fires, which can amount to several thousand NMIs in a matter of seconds. If a
user sees the Linux kernel's "dazed and confused" message in the logs, or if
the system gets into a hung state, the hpwdt driver can be reloaded with the
"priority" module parameter set (priority=1) as follows:

 1. If the kernel has not been booted with nmi_watchdog turned off, edit
    /boot/grub/menu.lst and append nmi_watchdog=0 to the kernel line of the
    currently booting kernel.
 2. Reboot the server.
 3. Once the system comes up, run rmmod hpwdt.
 4. insmod /lib/modules/`uname -r`/kernel/drivers/char/watchdog/hpwdt.ko priority=1
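
Before step 3, it can help to confirm that step 1 took effect. The running
kernel's command line is exposed in /proc/cmdline; the hypothetical helpers
below look for the exact token nmi_watchdog=0 in it. The sketch checks only
for that literal token and does not interpret other spellings or repeated
options.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the whitespace-separated command line contains the exact
 * token "nmi_watchdog=0", else 0. */
int cmdline_has_nmi_watchdog_off(const char *cmdline)
{
    const char *want = "nmi_watchdog=0";
    size_t wlen = strlen(want);
    const char *p = cmdline;

    while (*p) {
        /* Skip separators, then measure one token. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        if ((size_t)(p - start) == wlen && strncmp(start, want, wlen) == 0)
            return 1;
    }
    return 0;
}

/* Apply the check to the running kernel. Returns 1 or 0, or -1 if
 * /proc/cmdline cannot be read. */
int running_kernel_has_nmi_watchdog_off(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return cmdline_has_nmi_watchdog_off(buf);
}
```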

The hpwdt driver can now receive and source the NMI and log a message that
details the reason for the NMI (as determined by the HP BIOS).

Below is a list of NMIs the HP BIOS understands, along with the associated
code (reason):

  Reason                        Code
  ----------------------------  ----
  No source found               00h
  Uncorrectable Memory Error    01h
  ASR NMI                       1Bh
  PCI Parity Error              20h
  NMI Button Press              27h
  SB_BUS_NMI                    28h
  ILO Doorbell NMI              29h
  ILO IOP NMI                   2Ah
  ILO Watchdog NMI              2Bh
  Proc Throt NMI                2Ch
  Front Side Bus NMI            2Dh
  PCI Express Error             2Fh
  DMA controller NMI            30h
  Hypertransport/CSI Error      31h
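
Tooling that post-processes these log messages might carry the same list as
data. The sketch below, with a hypothetical function name, maps a reason code
to the description listed above and returns NULL for codes the list does not
contain; it does not parse any actual log format.

```c
#include <stddef.h>

struct nmi_reason {
    unsigned char code;
    const char *text;
};

/* Reason codes and descriptions copied from the list above. */
static const struct nmi_reason nmi_reasons[] = {
    { 0x00, "No source found" },
    { 0x01, "Uncorrectable Memory Error" },
    { 0x1B, "ASR NMI" },
    { 0x20, "PCI Parity Error" },
    { 0x27, "NMI Button Press" },
    { 0x28, "SB_BUS_NMI" },
    { 0x29, "ILO Doorbell NMI" },
    { 0x2A, "ILO IOP NMI" },
    { 0x2B, "ILO Watchdog NMI" },
    { 0x2C, "Proc Throt NMI" },
    { 0x2D, "Front Side Bus NMI" },
    { 0x2F, "PCI Express Error" },
    { 0x30, "DMA controller NMI" },
    { 0x31, "Hypertransport/CSI Error" },
};

/* Return the description for `code`, or NULL if it is not in the list. */
const char *nmi_reason_text(unsigned char code)
{
    for (size_t i = 0; i < sizeof nmi_reasons / sizeof nmi_reasons[0]; i++)
        if (nmi_reasons[i].code == code)
            return nmi_reasons[i].text;
    return NULL;
}
```

For example, nmi_reason_text(0x2B) returns "ILO Watchdog NMI", while an
unlisted code such as 0x2E yields NULL.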
-- Tom Mingarelli
(thomas.mingarelli@hp.com)