author | Jerry Hoemann <jerry.hoemann@hpe.com> | 2018-08-20 15:31:23 -0400 |
---|---|---|
committer | Wim Van Sebroeck <wim@linux-watchdog.org> | 2018-10-02 07:32:23 -0400 |
commit | 18bd1963aef94e0186ad435f8a497b74c00b73de (patch) | |
tree | a7fe5659b8f6b8f7a547607029aa618511082038 /Documentation/watchdog | |
parent | e1c7f79ea54cac01d88e45f05a4c411cdb33e862 (diff) |
watchdog: hpwdt: Update Driver Documentation.
Remove references to deprecated features like NMI sourcing
and obsoleted module parameters.
Add details concerning new module parameter pretimeout and tips
to programming it.
Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Diffstat (limited to 'Documentation/watchdog')
-rw-r--r-- | Documentation/watchdog/hpwdt.txt | 93 |
1 files changed, 31 insertions, 62 deletions
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
index 6d866c537127..55df692c5595 100644
--- a/Documentation/watchdog/hpwdt.txt
+++ b/Documentation/watchdog/hpwdt.txt
@@ -1,15 +1,12 @@ | |||
1 | Last reviewed: 05/20/2016 | 1 | Last reviewed: 08/20/2018 |
2 | 2 | ||
3 | HPE iLO NMI Watchdog Driver | 3 | HPE iLO NMI Watchdog Driver |
4 | NMI sourcing for iLO based ProLiant Servers | 4 | for iLO based ProLiant Servers |
5 | Documentation and Driver by | ||
6 | Thomas Mingarelli | ||
7 | 5 | ||
8 | The HPE iLO NMI Watchdog driver is a kernel module that provides basic | 6 | The HPE iLO NMI Watchdog driver is a kernel module that provides basic |
9 | watchdog functionality and the added benefit of NMI sourcing. Both the | 7 | watchdog functionality and handler for the iLO "Generate NMI to System" |
10 | watchdog functionality and the NMI sourcing capability need to be enabled | 8 | virtual button. |
11 | by the user. Remember that the two modes are not dependent on one another. | 9 | |
12 | A user can have the NMI sourcing without the watchdog timer and vice-versa. | ||
13 | All references to iLO in this document imply it also works on iLO2 and all | 10 | All references to iLO in this document imply it also works on iLO2 and all |
14 | subsequent generations. | 11 | subsequent generations. |
15 | 12 | ||
@@ -21,12 +18,16 @@ Last reviewed: 05/20/2016 | |||
21 | not be updated in a timely fashion and a hardware system reset (also known as | 18 | not be updated in a timely fashion and a hardware system reset (also known as |
22 | an Automatic Server Recovery (ASR)) event will occur. | 19 | an Automatic Server Recovery (ASR)) event will occur. |
23 | 20 | ||
24 | The hpwdt driver also has three (3) module parameters. They are the following: | 21 | The hpwdt driver also has the following module parameters: |
25 | 22 | ||
26 | soft_margin - allows the user to set the watchdog timer value. | 23 | soft_margin - allows the user to set the watchdog timer value. |
27 | Default value is 30 seconds. | 24 | Default value is 30 seconds. |
28 | allow_kdump - allows the user to save off a kernel dump image after an NMI. | 25 | timeout - an alias of soft_margin. |
29 | Default value is 1/ON | 26 | pretimeout - allows the user to set the watchdog pretimeout value. |
27 | This is the number of seconds before timeout when an | ||
28 | NMI is delivered to the system. Setting the value to | ||
29 | zero disables the pretimeout NMI. | ||
30 | Default value is 9 seconds. | ||
30 | nowayout - basic watchdog parameter that does not allow the timer to | 31 | nowayout - basic watchdog parameter that does not allow the timer to |
31 | be restarted or an impending ASR to be escaped. | 32 | be restarted or an impending ASR to be escaped. |
32 | Default value is set when compiling the kernel. If it is set | 33 | Default value is set when compiling the kernel. If it is set |
@@ -37,61 +38,29 @@ Last reviewed: 05/20/2016 | |||
37 | interface to /dev/watchdog can be found in | 38 | interface to /dev/watchdog can be found in |
38 | Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt. | 39 | Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt. |
39 | 40 | ||
40 | The NMI sourcing capability is disabled by default due to the inability to | 41 | Due to limitations in the iLO hardware, the NMI pretimeout if enabled, |
41 | distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the | 42 | can only be set to 9 seconds. Attempts to set pretimeout to other |
42 | Linux kernel. What this means is that the hpwdt nmi handler code is called | 43 | non-zero values will be rounded, possibly to zero. Users should verify |
43 | each time the NMI signal fires off. This could amount to several thousands of | 44 | the pretimeout value after attempting to set pretimeout or timeout. |
44 | NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and | ||
45 | confused" message in the logs or if the system gets into a hung state, then | ||
46 | the hpwdt driver can be reloaded. | ||
47 | |||
48 | 1. If the kernel has not been booted with nmi_watchdog turned off then | ||
49 | edit and place the nmi_watchdog=0 at the end of the currently booting | ||
50 | kernel line. Depending on your Linux distribution and platform setup: | ||
51 | For non-UEFI systems | ||
52 | /boot/grub/grub.conf or | ||
53 | /boot/grub/menu.lst | ||
54 | For UEFI systems | ||
55 | /boot/efi/EFI/distroname/grub.conf or | ||
56 | /boot/efi/efi/distroname/elilo.conf | ||
57 | 2. reboot the sever | ||
58 | 3. Once the system comes up perform a modprobe -r hpwdt | ||
59 | 4. modprobe /lib/modules/`uname -r`/kernel/drivers/watchdog/hpwdt.ko | ||
60 | |||
61 | Now, the hpwdt can successfully receive and source the NMI and provide a log | ||
62 | message that details the reason for the NMI (as determined by the HPE BIOS). | ||
63 | |||
64 | Below is a list of NMIs the HPE BIOS understands along with the associated | ||
65 | code (reason): | ||
66 | |||
67 | No source found 00h | ||
68 | |||
69 | Uncorrectable Memory Error 01h | ||
70 | |||
71 | ASR NMI 1Bh | ||
72 | |||
73 | PCI Parity Error 20h | ||
74 | |||
75 | NMI Button Press 27h | ||
76 | |||
77 | SB_BUS_NMI 28h | ||
78 | |||
79 | ILO Doorbell NMI 29h | ||
80 | |||
81 | ILO IOP NMI 2Ah | ||
82 | |||
83 | ILO Watchdog NMI 2Bh | ||
84 | |||
85 | Proc Throt NMI 2Ch | ||
86 | 45 | ||
87 | Front Side Bus NMI 2Dh | 46 | Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a |
47 | panic. This is to allow for a crash dump to be collected. It is incumbent | ||
48 | upon the user to have properly configured the system for kdump. | ||
88 | 49 | ||
89 | PCI Express Error 2Fh | 50 | The default Linux kernel behavior upon panic is to print a kernel tombstone |
51 | and loop forever. This is generally not what a watchdog user wants. | ||
90 | 52 | ||
91 | DMA controller NMI 30h | 53 | For those wishing to learn more please see: |
54 | Documentation/kdump/kdump.txt | ||
55 | Documentation/admin-guide/kernel-parameters.txt (panic=) | ||
56 | Your Linux Distribution specific documentation. | ||
92 | 57 | ||
93 | Hypertransport/CSI Error 31h | 58 | If the hpwdt does not receive the NMI associated with an expiring timer, |
59 | the iLO will proceed to reset the system at timeout if the timer hasn't | ||
60 | been updated. | ||
94 | 61 | ||
62 | -- | ||
95 | 63 | ||
64 | The HPE iLO NMI Watchdog Driver and documentation were originally developed | ||
65 | by Tom Mingarelli. | ||
96 | 66 | ||
97 | -- Tom Mingarelli | ||
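
The updated text advises users to verify the pretimeout value after setting pretimeout or timeout, since the iLO hardware may round a requested value, possibly to zero. The following is a minimal sketch of such a check, not part of the commit. It assumes the kernel exposes watchdog attributes through sysfs (CONFIG_WATCHDOG_SYSFS) and that hpwdt is registered as watchdog0; the helper names read_seconds, nmi_delay and check_pretimeout are hypothetical. Reading sysfs is used here because it does not open /dev/watchdog and therefore does not start the timer.

```python
from pathlib import Path

# Assumption: CONFIG_WATCHDOG_SYSFS is enabled and hpwdt is watchdog0.
# Adjust the path if another watchdog device is registered first.
SYSFS_DIR = Path("/sys/class/watchdog/watchdog0")


def read_seconds(sysfs_dir, name):
    """Read one integer attribute (in seconds) from a watchdog sysfs directory."""
    return int((Path(sysfs_dir) / name).read_text().strip())


def nmi_delay(timeout, pretimeout):
    """Seconds after the last keepalive at which the pretimeout NMI is due.

    Returns None when the pretimeout is disabled (zero) or is not
    smaller than the timeout.
    """
    if pretimeout <= 0 or pretimeout >= timeout:
        return None
    return timeout - pretimeout


def check_pretimeout(sysfs_dir=SYSFS_DIR, requested=9):
    """Compare the pretimeout actually in effect with the requested value."""
    timeout = read_seconds(sysfs_dir, "timeout")
    actual = read_seconds(sysfs_dir, "pretimeout")
    return {
        "timeout": timeout,
        "pretimeout": actual,
        "matches_request": actual == requested,
        "nmi_after": nmi_delay(timeout, actual),
    }


if __name__ == "__main__":
    if SYSFS_DIR.is_dir():
        print(check_pretimeout())
    else:
        print(f"{SYSFS_DIR} not found; watchdog sysfs support may be disabled")
```

With the documented defaults (timeout of 30 seconds, pretimeout of 9 seconds), the NMI would be due 21 seconds after the last keepalive, leaving 9 seconds before the iLO resets the system. A result with matches_request set to False indicates that the requested pretimeout was rounded.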