author Carlos Maiolino <cmaiolino@redhat.com> 2016-09-18 19:38:25 -0400
committer Dave Chinner <david@fromorbit.com> 2016-09-18 19:38:25 -0400
commit 5694fe9aadbb26874d2791de1db6ac08aa1b4c14
tree 89a355ce8e81699ad0012a0beecf01c570bab0ca
parent 77169812739dd800bc3620d781a77c50c75165cc
xfs: Document error handlers behavior
Document the implementation of error handlers into sysfs.

[dchinner: Added lots more detail.]

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
-rw-r--r-- Documentation/filesystems/xfs.txt 123
1 files changed, 123 insertions, 0 deletions
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 8146e9fd5ffc..c2d44e6e117b 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -348,3 +348,126 @@ Removed Sysctls
  ----                          -------
  fs.xfs.xfsbufd_centisec       v4.0
  fs.xfs.age_buffer_centisecs   v4.0


Error handling
==============

XFS can act differently according to the type of error found during its
operation. The implementation introduces the following concepts to the error
handler:

 -failure speed:
        Defines how fast XFS should propagate an error upwards when a specific
        error is found during the filesystem operation. It can propagate
        immediately, after a defined number of retries, after a set time period,
        or simply retry forever.

 -error classes:
        Specifies the subsystem the error configuration will apply to, such as
        metadata IO or memory allocation. Different subsystems will have
        different error handlers for which behaviour can be configured.

 -error handlers:
        Defines the behavior for a specific error.

The filesystem behavior during an error can be set via sysfs files. Each
error handler works independently - the first condition met by an error handler
for a specific class will cause the error to be propagated rather than reset and
retried.
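
The "first condition met" rule can be modelled in a few lines. The sketch
below only illustrates the rule as this document describes it; it is not the
kernel's implementation, and the function name and its parameters are
hypothetical. It assumes the convention of the handler files, where -1 means
"retry forever" and 0 means "fail immediately".

```python
RETRY_FOREVER = -1  # "retry forever" marker used by the handler files


def should_propagate(failures, elapsed_seconds, max_retries,
                     retry_timeout_seconds):
    """Return True once either configured retry limit has been reached.

    failures: failed attempts so far, including the one just observed
    elapsed_seconds: time since the first failure in this error context
    """
    # Retry-count limit: 0 fails on the first error, N allows N retries
    # after the initial failure.
    if max_retries != RETRY_FOREVER and failures > max_retries:
        return True
    # Time limit: 0 fails immediately, N allows retrying for up to N seconds.
    if (retry_timeout_seconds != RETRY_FOREVER
            and elapsed_seconds >= retry_timeout_seconds):
        return True
    # Neither limit has been hit, so the operation is retried.
    return False
```

Whichever limit is reached first decides the outcome: with max_retries=5 and
retry_timeout_seconds=60, a sixth failure after 10 seconds propagates the
error, and so does a second failure after 60 seconds.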

The action taken by the filesystem when the error is propagated is context
dependent - it may cause a shut down in the case of an unrecoverable error,
it may be reported back to userspace, or it may even be ignored because
there's nothing useful we can do with the error or anyone we can report it to
(e.g. during unmount).

The configuration files are organized into the following hierarchy for each
mounted filesystem:

  /sys/fs/xfs/<dev>/error/<class>/<error>/

Where:
  <dev>
        The short device name of the mounted filesystem. This is the same device
        name that shows up in XFS kernel error messages as "XFS(<dev>): ..."

  <class>
        The subsystem the error configuration belongs to. As of 4.9, the defined
        classes are:

                - "metadata": applies to metadata buffer write IO

  <error>
        The individual error handler configurations.
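
As a rough illustration of the hierarchy above, the following sketch builds a
handler directory path and lists the handlers present for a class. The helper
names and the device name "sda1" are hypothetical; only the path layout, the
"metadata" class and the "default" and "ENODEV" handler names come from this
document.

```python
from pathlib import Path

SYSFS_XFS_ROOT = Path("/sys/fs/xfs")


def handler_dir(dev, error_class, error, root=SYSFS_XFS_ROOT):
    """Return <root>/<dev>/error/<class>/<error> as a Path."""
    for part in (dev, error_class, error):
        # Each component must be a single, plain path element.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return Path(root) / dev / "error" / error_class / error


def list_handlers(dev, error_class="metadata", root=SYSFS_XFS_ROOT):
    """Return the sorted handler names found under one error class."""
    class_dir = Path(root) / dev / "error" / error_class
    return sorted(p.name for p in class_dir.iterdir() if p.is_dir())
```

For example, handler_dir("sda1", "metadata", "ENODEV") yields the path
/sys/fs/xfs/sda1/error/metadata/ENODEV. The root parameter exists so the
helpers can be exercised against a scratch directory instead of a live sysfs.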


Each filesystem has "global" error configuration options defined in its top
level directory:

  /sys/fs/xfs/<dev>/error/

  fail_at_unmount               (Min:  0  Default:  1  Max: 1)
        Defines the filesystem error behavior at unmount time.

        If set to a value of 1, XFS will override all other error configurations
        during unmount and replace them with "immediate fail" characteristics,
        i.e. no retries, no retry timeout. This will always allow unmount to
        succeed when there are persistent errors present.

        If set to 0, the configured retry behaviour will continue until all
        retries and/or timeouts have been exhausted. This will delay unmount
        completion when there are persistent errors, and it may prevent the
        filesystem from ever unmounting fully in the case of "retry forever"
        handler configurations.

        Note: there is no guarantee that fail_at_unmount can be set whilst an
        unmount is in progress. It is possible that the sysfs entries are
        removed by the unmounting filesystem before a "retry forever" error
        handler configuration causes unmount to hang, and hence the filesystem
        must be configured appropriately before unmount begins to prevent
        unmount hangs.
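
Because fail_at_unmount has to be configured before unmount begins, a
management script would typically set it ahead of time. A minimal sketch,
assuming a hypothetical helper name and a hypothetical device name in the
usage note; writing to the real sysfs file normally requires root privileges.

```python
from pathlib import Path


def set_fail_at_unmount(dev, enabled, root="/sys/fs/xfs"):
    """Write 1 (fail immediately at unmount) or 0 (honour retry settings)."""
    path = Path(root) / dev / "error" / "fail_at_unmount"
    if not path.is_file():
        # The entry exists only while the filesystem is mounted.
        raise FileNotFoundError(f"{path} not found; is {dev} mounted?")
    path.write_text("1" if enabled else "0")
    return path
```

Calling set_fail_at_unmount("sda1", True) before running umount matches the
default behaviour described above; passing False keeps the configured retry
behaviour active during unmount.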

Each filesystem has specific error class handlers that define the error
propagation behaviour for specific errors. There is also a "default" error
handler defined, which defines the behaviour for all errors that don't have
specific handlers defined. Where multiple retry constraints are configured for
a single error, the first retry configuration that expires will cause the error
to be propagated. The handler configurations are found in the directory:

  /sys/fs/xfs/<dev>/error/<class>/<error>/

  max_retries                   (Min: -1  Default: Varies  Max: INTMAX)
        Defines the allowed number of retries of a specific error before
        the filesystem will propagate the error. The retry count for a given
        error context (e.g. a specific metadata buffer) is reset every time
        there is a successful completion of the operation.

        Setting the value to "-1" will cause XFS to retry forever for this
        specific error.

        Setting the value to "0" will cause XFS to fail immediately when the
        specific error is reported.

        Setting the value to "N" (where 0 < N < Max) will make XFS retry the
        operation "N" times before propagating the error.

  retry_timeout_seconds         (Min: -1  Default: Varies  Max: 1 day)
        Defines the amount of time (in seconds) that the filesystem is
        allowed to retry its operations when the specific error is
        found.

        Setting the value to "-1" will allow XFS to retry forever for this
        specific error.

        Setting the value to "0" will cause XFS to fail immediately when the
        specific error is reported.

        Setting the value to "N" (where 0 < N < Max) will allow XFS to retry the
        operation for up to "N" seconds before propagating the error.
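
The documented ranges can be checked in userspace before a value is written.
The sketch below is one possible way to do that, not an official tool: the
helper name is hypothetical, "INTMAX" is assumed to be the maximum of a 32-bit
signed integer, and "1 day" is taken as 86400 seconds.

```python
from pathlib import Path

INT_MAX = 2**31 - 1            # assumption: INTMAX is a 32-bit signed maximum
ONE_DAY_SECONDS = 24 * 60 * 60  # "1 day" expressed in seconds

# Inclusive (min, max) bounds for each handler file, per the text above.
LIMITS = {
    "max_retries": (-1, INT_MAX),
    "retry_timeout_seconds": (-1, ONE_DAY_SECONDS),
}


def set_handler_value(handler_dir, name, value):
    """Range-check value, then write it to <handler_dir>/<name>."""
    if name not in LIMITS:
        raise KeyError(f"unknown handler file: {name}")
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("value must be an integer")
    low, high = LIMITS[name]
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    (Path(handler_dir) / name).write_text(str(value))
```

Here handler_dir is a directory of the form
/sys/fs/xfs/<dev>/error/<class>/<error>/. Checking the range first gives a
clearer error message than a write rejected by the kernel.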

Note: The default behaviour for a specific error handler is dependent on both
the class and error context. For example, the default values for
"metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
to "fail immediately" behaviour. This is done because ENODEV is a fatal,
unrecoverable error no matter how many times the metadata IO is retried.
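
Since defaults vary by handler, it can be useful to read back what a given
handler is currently set to. A small sketch with hypothetical helper names;
it assumes each handler file holds a single integer.

```python
from pathlib import Path

HANDLER_FILES = ("max_retries", "retry_timeout_seconds")


def read_handler(handler_dir):
    """Return the current integer value of each handler file."""
    return {
        name: int((Path(handler_dir) / name).read_text().strip())
        for name in HANDLER_FILES
    }


def describe(value):
    """Translate a handler value into the terms used in this document."""
    if value == -1:
        return "retry forever"
    if value == 0:
        return "fail immediately"
    return f"limit of {value}"
```

Pointing read_handler at a "metadata/ENODEV" handler directory and passing
each value through describe shows whether that handler is still at the "fail
immediately" defaults described in the note above.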