| author | Thomas Mingarelli <thomas.mingarelli@hp.com> | 2009-06-04 15:50:45 -0400 |
|---|---|---|
| committer | Wim Van Sebroeck <wim@iguana.be> | 2009-06-18 03:32:06 -0400 |
| commit | 47bece87b14b866872b52ff04d469832e4936756 (patch) | |
| tree | 812f6e1856cb322f1246a761a46ef20295b4689b /Documentation/watchdog | |
| parent | 55e8ddecec6a9dbe35a99d03cc4189fd7c56e600 (diff) | |
[WATCHDOG] hpwdt: Add NMI sourcing
Add NMI sourcing functionality (Can only be active if nmi_watchdog is
inactive).
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Diffstat (limited to 'Documentation/watchdog')
| -rw-r--r-- | Documentation/watchdog/hpwdt.txt | 84 |
1 files changed, 84 insertions, 0 deletions
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt new file mode 100644 index 000000000000..127839e53043 --- /dev/null +++ b/Documentation/watchdog/hpwdt.txt | |||
| @@ -0,0 +1,84 @@ | |||
| 1 | Last reviewed: 06/02/2009 | ||
| 2 | |||
| 3 | HP iLO2 NMI Watchdog Driver | ||
| 4 | NMI sourcing for iLO2 based ProLiant Servers | ||
| 5 | Documentation and Driver by | ||
| 6 | Thomas Mingarelli <thomas.mingarelli@hp.com> | ||
| 7 | |||
| 8 | The HP iLO2 NMI Watchdog driver is a kernel module that provides basic | ||
| 9 | watchdog functionality and the added benefit of NMI sourcing. Both the | ||
| 10 | watchdog functionality and the NMI sourcing capability need to be enabled | ||
| 11 | by the user. Remember that the two modes are not dependent on one another. | |
| 12 | A user can have the NMI sourcing without the watchdog timer and vice-versa. | ||
| 13 | |||
| 14 | Watchdog functionality is enabled like any other common watchdog driver. That | ||
| 15 | is, an application needs to be started that kicks off the watchdog timer. A | ||
| 16 | basic application exists in the Documentation/watchdog/src directory called | ||
| 17 | watchdog-test.c. Simply compile the C file and kick it off. If the system | ||
| 18 | gets into a bad state and hangs, the HP ProLiant iLO 2 timer register will | ||
| 19 | not be updated in a timely fashion and a hardware system reset (also known as | ||
| 20 | an Automatic Server Recovery (ASR)) event will occur. | ||
| 21 | |||
| 22 | The hpwdt driver also has three (3) module parameters. They are the following: | ||
| 23 | |||
| 24 | soft_margin - allows the user to set the watchdog timer value | ||
| 25 | allow_kdump - allows the user to save off a kernel dump image after an NMI | ||
| 26 | nowayout - basic watchdog parameter that does not allow the timer to | |
| 27 | be stopped once started, so an impending ASR cannot be escaped. | |
| 28 | |||
| 29 | NOTE: More information about watchdog drivers in general, including the ioctl | ||
| 30 | interface to /dev/watchdog can be found in | ||
| 31 | Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt. | ||
| 32 | |||
| 33 | The NMI sourcing capability is disabled when the driver discovers that the | ||
| 34 | nmi_watchdog is turned on (nmi_watchdog = 1). This is due to the inability to | ||
| 35 | distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the | ||
| 36 | Linux kernel. What this means is that the hpwdt nmi handler code is called | ||
| 37 | each time the NMI signal fires off. This could amount to several thousands of | ||
| 38 | NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and | ||
| 39 | confused" message in the logs or if the system gets into a hung state, then | ||
| 40 | the user should reboot with nmi_watchdog=0. | ||
| 41 | |||
| 42 | 1. If the kernel has not been booted with nmi_watchdog turned off, then | |
| 43 | edit /boot/grub/menu.lst and place nmi_watchdog=0 at the end of the | |
| 44 | currently booting kernel line. | |
| 45 | 2. Reboot the server. | |
| 46 | |||
| 47 | Now, the hpwdt can successfully receive and source the NMI and provide a log | ||
| 48 | message that details the reason for the NMI (as determined by the HP BIOS). | ||
| 49 | |||
| 50 | Below is a list of NMIs the HP BIOS understands along with the associated | ||
| 51 | code (reason): | ||
| 52 | |||
| 53 | No source found 00h | ||
| 54 | |||
| 55 | Uncorrectable Memory Error 01h | ||
| 56 | |||
| 57 | ASR NMI 1Bh | ||
| 58 | |||
| 59 | PCI Parity Error 20h | ||
| 60 | |||
| 61 | NMI Button Press 27h | ||
| 62 | |||
| 63 | SB_BUS_NMI 28h | ||
| 64 | |||
| 65 | ILO Doorbell NMI 29h | ||
| 66 | |||
| 67 | ILO IOP NMI 2Ah | ||
| 68 | |||
| 69 | ILO Watchdog NMI 2Bh | ||
| 70 | |||
| 71 | Proc Throt NMI 2Ch | ||
| 72 | |||
| 73 | Front Side Bus NMI 2Dh | ||
| 74 | |||
| 75 | PCI Express Error 2Fh | ||
| 76 | |||
| 77 | DMA controller NMI 30h | ||
| 78 | |||
| 79 | Hypertransport/CSI Error 31h | ||
| 80 | |||
| 81 | |||
| 82 | |||
| 83 | -- Tom Mingarelli | ||
| 84 | (thomas.mingarelli@hp.com) | ||
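
The NMI reason codes listed in hpwdt.txt lend themselves to a simple lookup table. The sketch below is illustrative only: the struct and function names (nmi_reason, nmi_reason_text) are hypothetical and are not taken from the hpwdt driver source; only the codes and descriptions come from the list in the document.

```c
#include <stddef.h>

/* One entry per NMI source listed in hpwdt.txt.
 * The type and function names here are hypothetical illustrations,
 * not identifiers from the hpwdt driver. */
struct nmi_reason {
    unsigned char code;   /* reason code reported by the HP BIOS */
    const char *text;     /* description as given in hpwdt.txt */
};

static const struct nmi_reason nmi_reasons[] = {
    { 0x00, "No source found" },
    { 0x01, "Uncorrectable Memory Error" },
    { 0x1B, "ASR NMI" },
    { 0x20, "PCI Parity Error" },
    { 0x27, "NMI Button Press" },
    { 0x28, "SB_BUS_NMI" },
    { 0x29, "ILO Doorbell NMI" },
    { 0x2A, "ILO IOP NMI" },
    { 0x2B, "ILO Watchdog NMI" },
    { 0x2C, "Proc Throt NMI" },
    { 0x2D, "Front Side Bus NMI" },
    { 0x2F, "PCI Express Error" },
    { 0x30, "DMA controller NMI" },
    { 0x31, "Hypertransport/CSI Error" },
};

/* Return the description for a reason code, or NULL if the code
 * is not in the documented list. */
const char *nmi_reason_text(unsigned char code)
{
    size_t n = sizeof(nmi_reasons) / sizeof(nmi_reasons[0]);

    for (size_t i = 0; i < n; i++) {
        if (nmi_reasons[i].code == code)
            return nmi_reasons[i].text;
    }
    return NULL;
}
```

A log-parsing tool could call nmi_reason_text() on the code found in the hpwdt log message and fall back to printing the raw hex value when NULL is returned.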

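
For readers who want to see what "kicking off the watchdog timer" looks like from user space, here is a minimal sketch in the spirit of watchdog-test.c. It is not that program. It assumes the generic Linux watchdog interface (the WDIOC_KEEPALIVE ioctl from linux/watchdog.h); the function names wdt_keepalive and wdt_feed, the device path argument, and the interval and count parameters are hypothetical.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ping the watchdog once through the generic keepalive ioctl.
 * Returns 0 on success, -1 on error. */
int wdt_keepalive(int fd)
{
    int dummy = 0;

    return ioctl(fd, WDIOC_KEEPALIVE, &dummy);
}

/* Open the watchdog device (which starts the timer), ping it
 * `count` times with `interval_sec` seconds between pings, then
 * close it. Returns 0 on success, -1 on any error.
 * All names and parameters here are hypothetical. */
int wdt_feed(const char *path, unsigned int interval_sec, unsigned int count)
{
    int rc = 0;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;

    for (unsigned int i = 0; i < count; i++) {
        if (wdt_keepalive(fd) < 0) {
            rc = -1;
            break;
        }
        sleep(interval_sec);
    }

    /* On drivers that support the "magic close" feature, writing
     * 'V' before close asks the driver to stop the timer. With
     * nowayout set, the timer keeps running regardless. */
    if (rc == 0 && write(fd, "V", 1) != 1)
        rc = -1;

    close(fd);
    return rc;
}
```

A real daemon would loop indefinitely, for example wdt_feed("/dev/watchdog", 10, ...) with an interval comfortably below soft_margin. If the process stops pinging because the system hangs, the timer expires and the ASR described above occurs.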