Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/watchdog/hpwdt.txt | 84 |
1 files changed, 84 insertions, 0 deletions
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
new file mode 100644
index 000000000000..127839e53043
--- /dev/null
+++ b/Documentation/watchdog/hpwdt.txt
@@ -0,0 +1,84 @@ | |||
1 | Last reviewed: 06/02/2009 | ||
2 | |||
3 | HP iLO2 NMI Watchdog Driver | ||
4 | NMI sourcing for iLO2 based ProLiant Servers | ||
5 | Documentation and Driver by | ||
6 | Thomas Mingarelli <thomas.mingarelli@hp.com> | ||
7 | |||
8 | The HP iLO2 NMI Watchdog driver is a kernel module that provides basic | ||
9 | watchdog functionality and the added benefit of NMI sourcing. Both the | ||
10 | watchdog functionality and the NMI sourcing capability need to be enabled | ||
11 | by the user. Remember that the two modes are not dependent on one another. | ||
12 | A user can have the NMI sourcing without the watchdog timer and vice-versa. | ||
13 | |||
14 | Watchdog functionality is enabled like any other common watchdog driver. That | ||
15 | is, an application needs to be started that kicks off the watchdog timer. A | ||
16 | basic application exists in the Documentation/watchdog/src directory called | ||
17 | watchdog-test.c. Simply compile the C file and kick it off. If the system | ||
18 | gets into a bad state and hangs, the HP ProLiant iLO 2 timer register will | ||
19 | not be updated in a timely fashion and a hardware system reset (also known as | ||
20 | an Automatic Server Recovery (ASR)) event will occur. | ||
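The keepalive loop described above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the contents of watchdog-test.c: the helper names (wd_open, wd_ping, wd_ping_interval, wd_run) and the "ping at half the timeout" policy are hypothetical. Only open(), ioctl(), and WDIOC_KEEPALIVE from <linux/watchdog.h> are standard interfaces.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Open a watchdog device node. Returns a file descriptor, or -1 on failure.
 * Note: opening the node normally starts the watchdog timer. */
int wd_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Ping the watchdog once. Returns 0 on success, -1 on failure. */
int wd_ping(int fd)
{
    int dummy = 0;
    return ioctl(fd, WDIOC_KEEPALIVE, &dummy);
}

/* Hypothetical policy: ping at half the timeout, but at least every second. */
unsigned int wd_ping_interval(unsigned int timeout_secs)
{
    unsigned int half = timeout_secs / 2;
    return half > 0 ? half : 1;
}

/* Ping the device at 'path' until a ping fails. Always returns -1, because
 * the loop only ends on an error (or the device could not be opened). */
int wd_run(const char *path, unsigned int timeout_secs)
{
    int fd = wd_open(path);
    if (fd < 0)
        return -1;
    while (wd_ping(fd) == 0)
        sleep(wd_ping_interval(timeout_secs));
    close(fd);
    return -1;
}
```

A caller would typically invoke something like wd_run("/dev/watchdog", 30), where 30 is an assumed timeout in seconds. If the process stops pinging because the system hangs, the timer expires and the ASR event described above occurs.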
21 | |||
22 | The hpwdt driver also has three module parameters: | ||
23 | |||
24 | soft_margin - allows the user to set the watchdog timer value | ||
25 | allow_kdump - allows the user to save off a kernel dump image after an NMI | ||
26 | nowayout - basic watchdog parameter that does not allow the timer to | ||
27 | be stopped once started, so an impending ASR cannot be escaped. | ||
28 | |||
29 | NOTE: More information about watchdog drivers in general, including the ioctl | ||
30 | interface to /dev/watchdog can be found in | ||
31 | Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt. | ||
32 | |||
33 | The NMI sourcing capability is disabled when the driver discovers that the | ||
34 | nmi_watchdog is turned on (nmi_watchdog=1). This is due to the inability to | ||
35 | distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the | ||
36 | Linux kernel. What this means is that the hpwdt NMI handler code is called | ||
37 | each time the NMI signal fires. This could amount to several thousand | ||
38 | NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and | ||
39 | confused" message in the logs or if the system gets into a hung state, then | ||
40 | the user should reboot with nmi_watchdog=0. | ||
41 | |||
42 | 1. If the kernel has not been booted with nmi_watchdog turned off, then | ||
43 | edit /boot/grub/menu.lst and append nmi_watchdog=0 to the end of the | ||
44 | kernel line of the entry that is currently booted. | ||
45 | 2. Reboot the server. | ||
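The precondition in step 1 (whether the running kernel was booted with nmi_watchdog=0) can be checked from the kernel command line, which Linux exposes in /proc/cmdline. The following is a minimal sketch, not part of the hpwdt driver. Both function names are hypothetical, and it only looks for the literal token nmi_watchdog=0; it does not resolve conflicting or repeated settings.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if 'cmdline' contains nmi_watchdog=0 as a whole
 * space-delimited token, 0 otherwise. */
int nmi_watchdog_disabled(const char *cmdline)
{
    static const char token[] = "nmi_watchdog=0";
    const size_t len = sizeof(token) - 1;
    const char *p = cmdline;

    while ((p = strstr(p, token)) != NULL) {
        /* The match must start and end on a token boundary. */
        int starts = (p == cmdline) || p[-1] == ' ';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}

/* Apply the check to the first line of a file such as /proc/cmdline.
 * Returns 1 or 0 as above, or -1 if the file cannot be opened. */
int nmi_watchdog_disabled_in_file(const char *path)
{
    char buf[4096];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL)
        buf[0] = '\0';
    fclose(f);
    return nmi_watchdog_disabled(buf);
}
```

Calling nmi_watchdog_disabled_in_file("/proc/cmdline") before loading the driver gives a quick indication of whether the menu.lst edit and reboot are still needed.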
46 | |||
47 | Now, the hpwdt driver can successfully receive and source the NMI and provide a log | ||
48 | message that details the reason for the NMI (as determined by the HP BIOS). | ||
49 | |||
50 | Below is a list of NMIs the HP BIOS understands along with the associated | ||
51 | code (reason): | ||
52 | |||
53 | No source found                00h | ||
54 | |||
55 | Uncorrectable Memory Error     01h | ||
56 | |||
57 | ASR NMI                        1Bh | ||
58 | |||
59 | PCI Parity Error               20h | ||
60 | |||
61 | NMI Button Press               27h | ||
62 | |||
63 | SB_BUS_NMI                     28h | ||
64 | |||
65 | ILO Doorbell NMI               29h | ||
66 | |||
67 | ILO IOP NMI                    2Ah | ||
68 | |||
69 | ILO Watchdog NMI               2Bh | ||
70 | |||
71 | Proc Throt NMI                 2Ch | ||
72 | |||
73 | Front Side Bus NMI             2Dh | ||
74 | |||
75 | PCI Express Error              2Fh | ||
76 | |||
77 | DMA controller NMI             30h | ||
78 | |||
79 | Hypertransport/CSI Error       31h | ||
80 | |||
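For tools that post-process log messages, the table above can be expressed as a small lookup. This is an illustrative sketch only, not code from the hpwdt driver: the struct and function names are hypothetical, and the codes and descriptions are copied from the list above.

```c
#include <stddef.h>

/* One row of the NMI reason table listed above. */
struct nmi_reason {
    unsigned char code;
    const char *text;
};

static const struct nmi_reason nmi_reasons[] = {
    { 0x00, "No source found" },
    { 0x01, "Uncorrectable Memory Error" },
    { 0x1B, "ASR NMI" },
    { 0x20, "PCI Parity Error" },
    { 0x27, "NMI Button Press" },
    { 0x28, "SB_BUS_NMI" },
    { 0x29, "ILO Doorbell NMI" },
    { 0x2A, "ILO IOP NMI" },
    { 0x2B, "ILO Watchdog NMI" },
    { 0x2C, "Proc Throt NMI" },
    { 0x2D, "Front Side Bus NMI" },
    { 0x2F, "PCI Express Error" },
    { 0x30, "DMA controller NMI" },
    { 0x31, "Hypertransport/CSI Error" },
};

/* Return the description for an NMI reason code, or NULL if the code
 * is not in the table. */
const char *nmi_reason_text(unsigned int code)
{
    size_t i;

    for (i = 0; i < sizeof(nmi_reasons) / sizeof(nmi_reasons[0]); i++)
        if (nmi_reasons[i].code == code)
            return nmi_reasons[i].text;
    return NULL;
}
```

Codes that do not appear in the list (for example 2Eh) return NULL, so a caller can print the raw code instead of a guessed description.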
81 | |||
82 | |||
83 | -- Tom Mingarelli | ||
84 | (thomas.mingarelli@hp.com) | ||