Diffstat (limited to 'Documentation/watchdog/watchdog-api.txt')
-rw-r--r-- | Documentation/watchdog/watchdog-api.txt | 399 |
1 files changed, 399 insertions, 0 deletions
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt new file mode 100644 index 000000000000..28388aa700c6 --- /dev/null +++ b/Documentation/watchdog/watchdog-api.txt | |||
@@ -0,0 +1,399 @@ | |||
1 | The Linux Watchdog driver API. | ||
2 | |||
3 | Copyright 2002 Christer Weingel <wingel@nano-system.com> | ||
4 | |||
5 | Some parts of this document are copied verbatim from the sbc60xxwdt | ||
6 | driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk> | ||
7 | |||
8 | This document describes the state of the Linux 2.4.18 kernel. | ||
9 | |||
10 | Introduction: | ||
11 | |||
12 | A Watchdog Timer (WDT) is a hardware circuit that can reset the | ||
13 | computer system in case of a software fault. You probably knew that | ||
14 | already. | ||
15 | |||
16 | Usually a userspace daemon will notify the kernel watchdog driver via the | ||
17 | /dev/watchdog special device file that userspace is still alive, at | ||
18 | regular intervals. When such a notification occurs, the driver will | ||
19 | usually tell the hardware watchdog that everything is in order, and | ||
20 | that the watchdog should wait for yet another little while to reset | ||
21 | the system. If userspace fails (RAM error, kernel bug, whatever), the | ||
22 | notifications cease to occur, and the hardware watchdog will reset the | ||
23 | system (causing a reboot) after the timeout occurs. | ||
24 | |||
25 | The Linux watchdog API is a rather ad hoc construction and different | ||
26 | drivers implement different, and sometimes incompatible, parts of it. | ||
27 | This file is an attempt to document the existing usage and allow | ||
28 | future driver writers to use it as a reference. | ||
29 | |||
30 | The simplest API: | ||
31 | |||
32 | All drivers support the basic mode of operation, where the watchdog | ||
33 | activates as soon as /dev/watchdog is opened and will reboot unless | ||
34 | the watchdog is pinged within a certain time; this time is called the | ||
35 | timeout or margin. The simplest way to ping the watchdog is to write | ||
36 | some data to the device. So a very simple watchdog daemon would look | ||
37 | like this: | ||
38 | |||
39 | int main(int argc, const char *argv[]) { | ||
40 | int fd=open("/dev/watchdog",O_WRONLY); | ||
41 | if (fd==-1) { | ||
42 | perror("watchdog"); | ||
43 | exit(1); | ||
44 | } | ||
45 | while(1) { | ||
46 | write(fd, "\0", 1); | ||
47 | sleep(10); | ||
48 | } | ||
49 | } | ||
50 | |||
51 | A more advanced daemon could for example check that an HTTP server is | ||
52 | still responding before doing the write call to ping the watchdog. | ||
53 | |||
54 | When the device is closed, the watchdog is disabled. This is not | ||
55 | always such a good idea, since if there is a bug in the watchdog | ||
56 | daemon and it crashes, the system will not reboot. Because of this, | ||
57 | some of the drivers support the configuration option "Disable watchdog | ||
58 | shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when | ||
59 | compiling the kernel, there is no way of disabling the watchdog once | ||
60 | it has been started. So, if the watchdog daemon crashes, the system | ||
61 | will reboot after the timeout has passed. | ||
62 | |||
63 | Some other drivers will not disable the watchdog, unless a specific | ||
64 | magic character 'V' has been sent to /dev/watchdog just before closing | ||
65 | the file. If the userspace daemon closes the file without sending | ||
66 | this special character, the driver will assume that the daemon (and | ||
67 | userspace in general) died, and will stop pinging the watchdog without | ||
68 | disabling it first. This will then cause a reboot. | ||
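The magic close sequence described above can be sketched as a small helper.
This is only an illustration: the name watchdog_magic_close is hypothetical,
and whether the write actually disables the watchdog depends on the driver
and on CONFIG_WATCHDOG_NOWAYOUT.

```c
#include <unistd.h>

/* Hypothetical helper (not part of any kernel or libc API): send the
 * magic character 'V' and then close the watchdog file descriptor.
 * Returns 0 on success, -1 on error. */
int watchdog_magic_close(int fd)
{
    const char magic = 'V';

    /* The magic character has to be written just before the close. */
    if (write(fd, &magic, 1) != 1) {
        /* Closing without 'V': a magic-close driver keeps the timer armed. */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A daemon would call this on its orderly shutdown path only, so that a crash
(which closes the file without the 'V') still leads to a reboot.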
69 | |||
70 | The ioctl API: | ||
71 | |||
72 | All conforming drivers also support an ioctl API. | ||
73 | |||
74 | Pinging the watchdog using an ioctl: | ||
75 | |||
76 | All drivers that have an ioctl interface support at least one ioctl, | ||
77 | KEEPALIVE. This ioctl does exactly the same thing as a write to the | ||
78 | watchdog device, so the main loop in the above program could be | ||
79 | replaced with: | ||
80 | |||
81 | while (1) { | ||
82 | ioctl(fd, WDIOC_KEEPALIVE, 0); | ||
83 | sleep(10); | ||
84 | } | ||
85 | |||
86 | The argument to the ioctl is ignored. | ||
87 | |||
88 | Setting and getting the timeout: | ||
89 | |||
90 | For some drivers it is possible to modify the watchdog timeout on the | ||
91 | fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT | ||
92 | flag set in their option field. The argument is an integer | ||
93 | representing the timeout in seconds. The driver returns the real | ||
94 | timeout used in the same variable, and this timeout might differ from | ||
95 | the requested one due to limitations of the hardware. | ||
96 | |||
97 | int timeout = 45; | ||
98 | ioctl(fd, WDIOC_SETTIMEOUT, &timeout); | ||
99 | printf("The timeout was set to %d seconds\n", timeout); | ||
100 | |||
101 | This example might actually print "The timeout was set to 60 seconds" | ||
102 | if the device has a granularity of minutes for its timeout. | ||
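To make the 45 -> 60 example concrete, here is a sketch of the arithmetic a
device with coarse granularity might apply. It is an assumption for
illustration only: the function name round_up_timeout is made up, and real
drivers may round differently or clamp to a fixed set of values.

```c
/* Illustrative only: round a requested timeout up to the next multiple
 * of the hardware granularity (both in seconds). Not taken from any
 * driver. Returns -1 for nonsensical input. */
int round_up_timeout(int requested, int granularity)
{
    if (requested <= 0 || granularity <= 0)
        return -1;
    return ((requested + granularity - 1) / granularity) * granularity;
}
```

With a granularity of 60, a request for 45 seconds comes back as 60, which
is why a daemon should use the value returned by the ioctl and not the value
it asked for when deciding how often to ping.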
103 | |||
104 | Starting with the Linux 2.4.18 kernel, it is possible to query the | ||
105 | current timeout using the GETTIMEOUT ioctl. | ||
106 | |||
107 | ioctl(fd, WDIOC_GETTIMEOUT, &timeout); | ||
108 | printf("The timeout is %d seconds\n", timeout); | ||
109 | |||
110 | Environmental monitoring: | ||
111 | |||
112 | All watchdog drivers are required to return more information about the system; | ||
113 | some do temperature, fan and power level monitoring, some can tell you | ||
114 | the reason for the last reboot of the system. The GETSUPPORT ioctl is | ||
115 | available to ask what the device can do: | ||
116 | |||
117 | struct watchdog_info ident; | ||
118 | ioctl(fd, WDIOC_GETSUPPORT, &ident); | ||
119 | |||
120 | The fields returned in the ident struct are: | ||
121 | |||
122 | identity a string identifying the watchdog driver | ||
123 | firmware_version the firmware version of the card if available | ||
124 | options flags describing what the device supports | ||
125 | |||
126 | The options field can have the following bits set, which describe the | ||
127 | kind of information that the GETSTATUS and GETBOOTSTATUS ioctls can | ||
128 | return. [FIXME -- Is this correct?] | ||
129 | |||
130 | WDIOF_OVERHEAT Reset due to CPU overheat | ||
131 | |||
132 | The machine was last rebooted by the watchdog because the thermal limit was | ||
133 | exceeded | ||
134 | |||
135 | WDIOF_FANFAULT Fan failed | ||
136 | |||
137 | A system fan monitored by the watchdog card has failed | ||
138 | |||
139 | WDIOF_EXTERN1 External relay 1 | ||
140 | |||
141 | External monitoring relay/source 1 was triggered. Controllers intended for | ||
142 | real world applications include external monitoring pins that will trigger | ||
143 | a reset. | ||
144 | |||
145 | WDIOF_EXTERN2 External relay 2 | ||
146 | |||
147 | External monitoring relay/source 2 was triggered | ||
148 | |||
149 | WDIOF_POWERUNDER Power bad/power fault | ||
150 | |||
151 | The machine is showing an undervoltage status | ||
152 | |||
153 | WDIOF_CARDRESET Card previously reset the CPU | ||
154 | |||
155 | The last reboot was caused by the watchdog card | ||
156 | |||
157 | WDIOF_POWEROVER Power over voltage | ||
158 | |||
159 | The machine is showing an overvoltage status. Note that if one level is | ||
160 | under and one over, both bits will be set - this may seem odd but makes | ||
161 | sense. | ||
162 | |||
163 | WDIOF_KEEPALIVEPING Keep alive ping reply | ||
164 | |||
165 | The watchdog saw a keepalive ping since it was last queried. | ||
166 | |||
167 | WDIOF_SETTIMEOUT Can set/get the timeout | ||
168 | |||
169 | |||
170 | For those drivers that return any bits set in the option field, the | ||
171 | GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current | ||
172 | status, and the status at the last reboot, respectively. | ||
173 | |||
174 | int flags; | ||
175 | ioctl(fd, WDIOC_GETSTATUS, &flags); | ||
176 | |||
177 | or | ||
178 | |||
179 | ioctl(fd, WDIOC_GETBOOTSTATUS, &flags); | ||
180 | |||
181 | Note that not all devices support these two calls, and some only | ||
182 | support the GETBOOTSTATUS call. | ||
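The flags word returned by GETSTATUS or GETBOOTSTATUS (or the options field
from GETSUPPORT) is a bitmask of the WDIOF_ values listed above. A minimal
sketch of decoding it follows; the helper name describe_flags and its
output format are my own, only the WDIOF_ constants come from
<linux/watchdog.h>.

```c
#include <stdio.h>
#include <stddef.h>
#include <linux/watchdog.h>

struct flag_name { int bit; const char *name; };

static const struct flag_name flag_names[] = {
    { WDIOF_OVERHEAT,      "OVERHEAT" },
    { WDIOF_FANFAULT,      "FANFAULT" },
    { WDIOF_EXTERN1,       "EXTERN1" },
    { WDIOF_EXTERN2,       "EXTERN2" },
    { WDIOF_POWERUNDER,    "POWERUNDER" },
    { WDIOF_CARDRESET,     "CARDRESET" },
    { WDIOF_POWEROVER,     "POWEROVER" },
    { WDIOF_KEEPALIVEPING, "KEEPALIVEPING" },
    { WDIOF_SETTIMEOUT,    "SETTIMEOUT" },
};

/* Hypothetical helper: write the names of the bits set in `flags` into
 * `buf`, separated by spaces. Returns the number of names written. */
int describe_flags(int flags, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
        int n;

        if (!(flags & flag_names[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", flag_names[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

A caller would pass the value filled in by ioctl(fd, WDIOC_GETBOOTSTATUS,
&flags) after checking that the ioctl itself did not return -1, since not
all drivers implement it.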
183 | |||
184 | Some drivers can measure the temperature using the GETTEMP ioctl. The | ||
185 | returned value is the temperature in degrees Fahrenheit. | ||
186 | |||
187 | int temperature; | ||
188 | ioctl(fd, WDIOC_GETTEMP, &temperature); | ||
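Since GETTEMP reports degrees Fahrenheit, a daemon that logs in Celsius has
to convert the value itself. A small sketch, with a helper name of my own
choosing:

```c
/* Hypothetical helper: convert the whole-degree Fahrenheit value
 * returned by WDIOC_GETTEMP to degrees Celsius. */
double fahrenheit_to_celsius(int fahrenheit)
{
    return (fahrenheit - 32) * 5.0 / 9.0;
}
```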
189 | |||
190 | Finally the SETOPTIONS ioctl can be used to control some aspects of | ||
191 | the card's operation; right now the pcwd driver is the only one | ||
192 | supporting this ioctl. | ||
193 | |||
194 | int options = 0; | ||
195 | ioctl(fd, WDIOC_SETOPTIONS, &options); | ||
196 | |||
197 | The following options are available: | ||
198 | |||
199 | WDIOS_DISABLECARD Turn off the watchdog timer | ||
200 | WDIOS_ENABLECARD Turn on the watchdog timer | ||
201 | WDIOS_TEMPPANIC Kernel panic on temperature trip | ||
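As a sketch of how these options might be passed, the wrapper below hands a
WDIOS_ value to the driver and reports failure to the caller. The function
name watchdog_set_options is hypothetical, and I am assuming the driver
reads the options through a pointer, as the other integer ioctls do.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical wrapper: pass a WDIOS_ option word to the driver.
 * Returns 0 on success and -1 on error (errno is set by ioctl). */
int watchdog_set_options(int fd, int options)
{
    /* Assumption: the driver copies the option word from user memory. */
    return ioctl(fd, WDIOC_SETOPTIONS, &options);
}
```

For example, watchdog_set_options(fd, WDIOS_DISABLECARD) would ask the
driver to turn the timer off; drivers without SETOPTIONS support are
expected to fail the call, so the return value should always be checked.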
202 | |||
203 | [FIXME -- better explanations] | ||
204 | |||
205 | Implementations in the current drivers in the kernel tree: | ||
206 | |||
207 | Here I have tried to summarize what the different drivers support and | ||
208 | where they do strange things compared to the other drivers. | ||
209 | |||
210 | acquirewdt.c -- Acquire Single Board Computer | ||
211 | |||
212 | This driver has a hardcoded timeout of 1 minute | ||
213 | |||
214 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
215 | |||
216 | GETSUPPORT returns KEEPALIVEPING. GETSTATUS will return 1 if | ||
217 | the device is open, 0 if not. [FIXME -- isn't this rather | ||
218 | silly? To be able to use the ioctl, the device must be open | ||
219 | and so GETSTATUS will always return 1]. | ||
220 | |||
221 | advantechwdt.c -- Advantech Single Board Computer | ||
222 | |||
223 | Timeout that defaults to 60 seconds, supports SETTIMEOUT. | ||
224 | |||
225 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
226 | |||
227 | GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. | ||
228 | The GETSTATUS call returns if the device is open or not. | ||
229 | [FIXME -- silliness again?] | ||
230 | |||
231 | eurotechwdt.c -- Eurotech CPU-1220/1410 | ||
232 | |||
233 | The timeout can be set using the SETTIMEOUT ioctl and defaults | ||
234 | to 60 seconds. | ||
235 | |||
236 | Also has a module parameter "ev" (event type) which controls | ||
237 | what should happen on a timeout: the string "int", or anything | ||
238 | else, which causes a reboot. [FIXME -- better description] | ||
239 | |||
240 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
241 | |||
242 | GETSUPPORT returns CARDRESET and WDIOF_SETTIMEOUT but | ||
243 | GETSTATUS is not supported and GETBOOTSTATUS just returns 0. | ||
244 | |||
245 | i810-tco.c -- Intel 810 chipset | ||
246 | |||
247 | Also has support for a lot of other i8x0 stuff, but the | ||
248 | watchdog is one of the things. | ||
249 | |||
250 | The timeout is set using the module parameter "i810_margin", | ||
251 | which is in steps of 0.6 seconds where 2<i810_margin<64. The | ||
252 | driver supports the SETTIMEOUT ioctl. | ||
253 | |||
254 | Supports CONFIG_WATCHDOG_NOWAYOUT. | ||
255 | |||
256 | GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call | ||
257 | returns some kind of timer value which is not compatible with | ||
258 | the other drivers. GETBOOTSTATUS returns some kind of | ||
259 | hardware specific boot status. [FIXME -- describe this] | ||
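The i810_margin arithmetic above (steps of 0.6 seconds, with
2<i810_margin<64) can be sketched as follows. The helper name is made up and
the range check simply mirrors the inequality in the text; it is not taken
from the driver source.

```c
/* Hypothetical helper: convert an i810_margin value to milliseconds,
 * assuming one step is 0.6 seconds and 2 < margin < 64 as described in
 * the text. Returns -1 if the margin is out of range. */
int i810_margin_to_ms(int margin)
{
    if (margin <= 2 || margin >= 64)
        return -1;
    return margin * 600;
}
```

Under these assumptions, an i810_margin of 50 corresponds to a timeout of
30 seconds, and the largest accepted value, 63, to 37.8 seconds.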
260 | |||
261 | ib700wdt.c -- IB700 Single Board Computer | ||
262 | |||
263 | Default timeout of 30 seconds and the timeout is settable | ||
264 | using the SETTIMEOUT ioctl. Note that only a few timeout | ||
265 | values are supported. | ||
266 | |||
267 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
268 | |||
269 | GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. | ||
270 | The GETSTATUS call returns if the device is open or not. | ||
271 | [FIXME -- silliness again?] | ||
272 | |||
273 | machzwd.c -- MachZ ZF-Logic | ||
274 | |||
275 | Hardcoded timeout of 10 seconds | ||
276 | |||
277 | Has a module parameter "action" that controls what happens | ||
278 | when the timeout runs out which can be 0 = RESET (default), | ||
279 | 1 = SMI, 2 = NMI, 3 = SCI. | ||
280 | |||
281 | Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character | ||
282 | 'V' close handling. | ||
283 | |||
284 | GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call | ||
285 | returns if the device is open or not. [FIXME -- silliness | ||
286 | again?] | ||
287 | |||
288 | mixcomwd.c -- MixCom Watchdog | ||
289 | |||
290 | [FIXME -- I'm unable to tell what the timeout is] | ||
291 | |||
292 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
293 | |||
294 | GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns if | ||
295 | the device is opened or not [FIXME -- I'm not really sure how | ||
296 | this works, there seems to be some magic connected to | ||
297 | CONFIG_WATCHDOG_NOWAYOUT] | ||
298 | |||
299 | pcwd.c -- Berkshire PC Watchdog | ||
300 | |||
301 | Hardcoded timeout of 1.5 seconds | ||
302 | |||
303 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
304 | |||
305 | GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both | ||
306 | GETSTATUS and GETBOOTSTATUS return something useful. | ||
307 | |||
308 | The SETOPTIONS call can be used to enable and disable the card | ||
309 | and to ask the driver to call panic if the system overheats. | ||
310 | |||
311 | sbc60xxwdt.c -- 60xx Single Board Computer | ||
312 | |||
313 | Hardcoded timeout of 10 seconds | ||
314 | |||
315 | Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic | ||
316 | character 'V' close handling. | ||
317 | |||
318 | No bits set in GETSUPPORT | ||
319 | |||
320 | scx200.c -- National SCx200 CPUs | ||
321 | |||
322 | Not in the kernel yet. | ||
323 | |||
324 | The timeout is set using a module parameter "margin" which | ||
325 | defaults to 60 seconds. The timeout can also be set using | ||
326 | SETTIMEOUT and read using GETTIMEOUT. | ||
327 | |||
328 | Supports a module parameter "nowayout" that is initialized | ||
329 | with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the | ||
330 | magic character 'V' handling. | ||
331 | |||
332 | shwdt.c -- SuperH 3/4 processors | ||
333 | |||
334 | [FIXME -- I'm unable to tell what the timeout is] | ||
335 | |||
336 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
337 | |||
338 | GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call | ||
339 | returns if the device is open or not. [FIXME -- silliness | ||
340 | again?] | ||
341 | |||
342 | softdog.c -- Software watchdog | ||
343 | |||
344 | The timeout is set with the module parameter "soft_margin" | ||
345 | which defaults to 60 seconds, the timeout is also settable | ||
346 | using the SETTIMEOUT ioctl. | ||
347 | |||
348 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
349 | |||
350 | WDIOF_SETTIMEOUT bit set in GETSUPPORT | ||
351 | |||
352 | w83877f_wdt.c -- W83877F Computer | ||
353 | |||
354 | Hardcoded timeout of 30 seconds | ||
355 | |||
356 | Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic | ||
357 | character 'V' close handling. | ||
358 | |||
359 | No bits set in GETSUPPORT | ||
360 | |||
361 | w83627hf_wdt.c -- w83627hf watchdog | ||
362 | |||
363 | Timeout that defaults to 60 seconds, supports SETTIMEOUT. | ||
364 | |||
365 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
366 | |||
367 | GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. | ||
368 | The GETSTATUS call returns if the device is open or not. | ||
369 | |||
370 | wdt.c -- ICS WDT500/501 ISA and | ||
371 | wdt_pci.c -- ICS WDT500/501 PCI | ||
372 | |||
373 | Default timeout of 60 seconds. The timeout is also settable | ||
374 | using the SETTIMEOUT ioctl. | ||
375 | |||
376 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
377 | |||
378 | GETSUPPORT returns with bits set depending on the actual | ||
379 | card. The WDT501 supports a lot of external monitoring, the | ||
380 | WDT500 much less. | ||
381 | |||
382 | wdt285.c -- Footbridge watchdog | ||
383 | |||
384 | The timeout is set with the module parameter "soft_margin" | ||
385 | which defaults to 60 seconds. The timeout is also settable | ||
386 | using the SETTIMEOUT ioctl. | ||
387 | |||
388 | Does not support CONFIG_WATCHDOG_NOWAYOUT | ||
389 | |||
390 | WDIOF_SETTIMEOUT bit set in GETSUPPORT | ||
391 | |||
392 | wdt977.c -- Netwinder W83977AF chip | ||
393 | |||
394 | Hardcoded timeout of 3 minutes | ||
395 | |||
396 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
397 | |||
398 | Does not support any ioctls at all. | ||
399 | |||