author | Carlos Maiolino <cmaiolino@redhat.com> | 2016-09-18 19:38:25 -0400 |
---|---|---|
committer | Dave Chinner <david@fromorbit.com> | 2016-09-18 19:38:25 -0400 |
commit | 5694fe9aadbb26874d2791de1db6ac08aa1b4c14 (patch) | |
tree | 89a355ce8e81699ad0012a0beecf01c570bab0ca | |
parent | 77169812739dd800bc3620d781a77c50c75165cc (diff) |
xfs: Document error handlers behavior
Document the implementation of error handlers into sysfs.
[dchinner: Added lots more detail.]
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
-rw-r--r-- | Documentation/filesystems/xfs.txt | 123 |
1 files changed, 123 insertions, 0 deletions
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 8146e9fd5ffc..c2d44e6e117b 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -348,3 +348,126 @@ Removed Sysctls | |||
348 | ---- ------- | 348 | ---- ------- |
349 | fs.xfs.xfsbufd_centisec v4.0 | 349 | fs.xfs.xfsbufd_centisec v4.0 |
350 | fs.xfs.age_buffer_centisecs v4.0 | 350 | fs.xfs.age_buffer_centisecs v4.0 |
351 | |||
352 | |||
353 | Error handling | ||
354 | ============== | ||
355 | |||
356 | XFS can act differently according to the type of error found during its | ||
357 | operation. The implementation introduces the following concepts to the error | ||
358 | handler: | ||
359 | |||
360 | -failure speed: | ||
361 | Defines how fast XFS should propagate an error upwards when a specific | ||
362 | error is found during the filesystem operation. It can propagate | ||
363 | immediately, after a defined number of retries, after a set time period, | ||
364 | or simply retry forever. | ||
365 | |||
366 | -error classes: | ||
367 | Specifies the subsystem the error configuration will apply to, such as | ||
368 | metadata IO or memory allocation. Different subsystems will have | ||
369 | different error handlers for which behaviour can be configured. | ||
370 | |||
371 | -error handlers: | ||
372 | Defines the behavior for a specific error. | ||
373 | |||
374 | The filesystem behavior during an error can be set via sysfs files. Each | ||
375 | error handler works independently - the first condition met by an error handler | ||
376 | for a specific class will cause the error to be propagated rather than reset and | ||
377 | retried. | ||
378 | |||
379 | The action taken by the filesystem when the error is propagated is context | ||
380 | dependent - it may cause a shut down in the case of an unrecoverable error, | ||
381 | it may be reported back to userspace, or it may even be ignored because | ||
382 | there's nothing useful we can do with the error or anyone we can report it to (e.g. | ||
383 | during unmount). | ||
384 | |||
385 | The configuration files are organized into the following hierarchy for each | ||
386 | mounted filesystem: | ||
387 | |||
388 | /sys/fs/xfs/<dev>/error/<class>/<error>/ | ||
389 | |||
390 | Where: | ||
391 | <dev> | ||
392 | The short device name of the mounted filesystem. This is the same device | ||
393 | name that shows up in XFS kernel error messages as "XFS(<dev>): ..." | ||
394 | |||
395 | <class> | ||
396 | The subsystem the error configuration belongs to. As of 4.9, the defined | ||
397 | classes are: | ||
398 | |||
399 | - "metadata": applies to metadata buffer write IO | ||
400 | |||
401 | <error> | ||
402 | The individual error handler configurations. | ||
403 | |||
404 | |||
405 | Each filesystem has "global" error configuration options defined in its top | ||
406 | level directory: | ||
407 | |||
408 | /sys/fs/xfs/<dev>/error/ | ||
409 | |||
410 | fail_at_unmount (Min: 0 Default: 1 Max: 1) | ||
411 | Defines the filesystem error behavior at unmount time. | ||
412 | |||
413 | If set to a value of 1, XFS will override all other error configurations | ||
414 | during unmount and replace them with "immediate fail" characteristics. | ||
415 | i.e. no retries, no retry timeout. This will always allow unmount to | ||
416 | succeed when there are persistent errors present. | ||
417 | |||
418 | If set to 0, the configured retry behaviour will continue until all | ||
419 | retries and/or timeouts have been exhausted. This will delay unmount | ||
420 | completion when there are persistent errors, and it may prevent the | ||
421 | filesystem from ever unmounting fully in the case of "retry forever" | ||
422 | handler configurations. | ||
423 | |||
424 | Note: there is no guarantee that fail_at_unmount can be set whilst an | ||
425 | unmount is in progress. It is possible that the sysfs entries are | ||
426 | removed by the unmounting filesystem before a "retry forever" error | ||
427 | handler configuration causes unmount to hang, and hence the filesystem | ||
428 | must be configured appropriately before unmount begins to prevent | ||
429 | unmount hangs. | ||
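
Because fail_at_unmount has to be set before unmount begins, it can be useful to script it. The following is a minimal sketch, assuming the sysfs layout described above; the helper names and the `sysfs_root` parameter are hypothetical and exist only so the code can be exercised against a scratch directory. Writing to the real sysfs file normally requires root.

```python
from pathlib import Path


def fail_at_unmount_path(dev: str, sysfs_root: str = "/sys/fs/xfs") -> Path:
    """Build <sysfs_root>/<dev>/error/fail_at_unmount (layout taken from the text above)."""
    return Path(sysfs_root) / dev / "error" / "fail_at_unmount"


def set_fail_at_unmount(dev: str, enabled: bool, sysfs_root: str = "/sys/fs/xfs") -> Path:
    """Write 1 (fail immediately at unmount) or 0 (keep configured retries)."""
    knob = fail_at_unmount_path(dev, sysfs_root)
    # The documented range is Min: 0, Max: 1, so only these two values are written.
    knob.write_text("1\n" if enabled else "0\n")
    return knob


def get_fail_at_unmount(dev: str, sysfs_root: str = "/sys/fs/xfs") -> bool:
    """Return True when the knob currently reads 1."""
    return fail_at_unmount_path(dev, sysfs_root).read_text().strip() == "1"
```

A caller would run `set_fail_at_unmount("<dev>", True)` ahead of `umount`, where `<dev>` is the short device name shown in XFS kernel messages.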
430 | |||
431 | Each filesystem has specific error class handlers that define the error | ||
432 | propagation behaviour for specific errors. There is also a "default" error | ||
433 | handler defined, which defines the behaviour for all errors that don't have | ||
434 | specific handlers defined. Where multiple retry constraints are configured for | ||
435 | a single error, the first retry configuration that expires will cause the error | ||
436 | to be propagated. The handler configurations are found in the directory: | ||
437 | |||
438 | /sys/fs/xfs/<dev>/error/<class>/<error>/ | ||
439 | |||
440 | max_retries (Min: -1 Default: Varies Max: INTMAX) | ||
441 | Defines the allowed number of retries of a specific error before | ||
442 | the filesystem will propagate the error. The retry count for a given | ||
443 | error context (e.g. a specific metadata buffer) is reset every time | ||
444 | there is a successful completion of the operation. | ||
445 | |||
446 | Setting the value to "-1" will cause XFS to retry forever for this | ||
447 | specific error. | ||
448 | |||
449 | Setting the value to "0" will cause XFS to fail immediately when the | ||
450 | specific error is reported. | ||
451 | |||
452 | Setting the value to "N" (where 0 < N < Max) will make XFS retry the | ||
453 | operation "N" times before propagating the error. | ||
454 | |||
455 | retry_timeout_seconds (Min: -1 Default: Varies Max: 1 day) | ||
456 | Defines the amount of time (in seconds) that the filesystem is | ||
457 | allowed to retry its operations when the specific error is | ||
458 | found. | ||
459 | |||
460 | Setting the value to "-1" will allow XFS to retry forever for this | ||
461 | specific error. | ||
462 | |||
463 | Setting the value to "0" will cause XFS to fail immediately when the | ||
464 | specific error is reported. | ||
465 | |||
466 | Setting the value to "N" (where 0 < N < Max) will allow XFS to retry the | ||
467 | operation for up to "N" seconds before propagating the error. | ||
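
The interaction of the two settings ("the first retry configuration that expires will cause the error to be propagated") can be modelled as a small predicate. This is only a sketch of the behaviour as documented here, not the kernel implementation; the function name, its arguments, and the exact boundary handling (propagating once N retries have been used or N seconds have passed) are assumptions.

```python
RETRY_FOREVER = -1  # documented meaning of "-1" for both settings


def should_propagate(retries_done: int, seconds_elapsed: float,
                     max_retries: int, retry_timeout_seconds: int) -> bool:
    """Return True once either configured retry constraint has expired."""
    # A constraint set to -1 never expires on its own.
    retries_exhausted = (max_retries != RETRY_FOREVER
                         and retries_done >= max_retries)
    timeout_expired = (retry_timeout_seconds != RETRY_FOREVER
                       and seconds_elapsed >= retry_timeout_seconds)
    # Whichever constraint expires first wins.
    return retries_exhausted or timeout_expired
```

Under this model a value of "0" in either setting fails immediately, and the error is retried forever only when both settings are "-1".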
468 | |||
469 | Note: The default behaviour for a specific error handler is dependent on both | ||
470 | the class and error context. For example, the default values for | ||
471 | "metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults | ||
472 | to "fail immediately" behaviour. This is done because ENODEV is a fatal, | ||
473 | unrecoverable error no matter how many times the metadata IO is retried. | ||
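
Since the defaults vary by class and error, reading them back is the only way to know what a given handler will do. The sketch below assumes the directory hierarchy and the two file names given above; the helper names and the `sysfs_root` parameter are hypothetical, added so the code can be tried against a scratch directory instead of a live filesystem.

```python
from pathlib import Path

HANDLER_FILES = ("max_retries", "retry_timeout_seconds")


def handler_dir(dev: str, error_class: str, error: str,
                sysfs_root: str = "/sys/fs/xfs") -> Path:
    """Build <sysfs_root>/<dev>/error/<class>/<error>/."""
    return Path(sysfs_root) / dev / "error" / error_class / error


def read_handler(dev: str, error_class: str, error: str,
                 sysfs_root: str = "/sys/fs/xfs") -> dict:
    """Read both retry settings of one error handler as integers."""
    directory = handler_dir(dev, error_class, error, sysfs_root)
    # int() tolerates the trailing newline that sysfs files usually carry.
    return {name: int((directory / name).read_text()) for name in HANDLER_FILES}


def fails_immediately(config: dict) -> bool:
    """True when either setting is 0, i.e. the documented "fail immediately" case."""
    return config["max_retries"] == 0 or config["retry_timeout_seconds"] == 0
```

For example, `fails_immediately(read_handler("<dev>", "metadata", "ENODEV"))` would be expected to return True on a filesystem that still has the defaults described in the note above.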