author | Tejun Heo <htejun@gmail.com> | 2005-09-25 22:28:47 -0400 |
---|---|---|
committer | Jeff Garzik <jgarzik@pobox.com> | 2005-09-28 12:16:54 -0400 |
commit | bfd00722ac230a39bc5234c5f7a514ea6a77996d (patch) | |
tree | 08b76d7cfe885f9cabd5a6502105f13c1b8cf6fb /Documentation | |
parent | 64f09c98d7fce21dcb8da9f248e4159eb1ec245e (diff) |
[PATCH] libata EH document update
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/DocBook/libata.tmpl | 356 |
1 files changed, 356 insertions, 0 deletions
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl
index 375ae760dc1e..fcbb58fc0fee 100644
--- a/Documentation/DocBook/libata.tmpl
+++ b/Documentation/DocBook/libata.tmpl
@@ -413,6 +413,362 @@ and other resources, etc. | |||
413 | </sect2> | 413 | </sect2> |
414 | 414 | ||
415 | </sect1> | 415 | </sect1> |
416 | <sect1> | ||
417 | <title>Error handling</title> | ||
418 | |||
419 | <para> | ||
420 | This chapter describes how errors are handled under libata. | ||
421 | Readers are advised to read SCSI EH | ||
422 | (Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first. | ||
423 | </para> | ||
424 | |||
425 | <sect2><title>Origins of commands</title> | ||
426 | <para> | ||
427 | In libata, a command is represented with struct ata_queued_cmd | ||
428 | or qc. qc's are preallocated during port initialization and | |
429 | reused for command execution. Currently only one | |
430 | qc is allocated per port, but the yet-to-be-merged NCQ branch | |
431 | allocates one for each tag and maps each qc to an NCQ tag 1-to-1. | |
432 | </para> | ||
433 | <para> | ||
434 | libata commands can originate from two sources - libata itself | ||
435 | and SCSI midlayer. libata internal commands are used for | ||
436 | initialization and error handling. All normal blk requests | ||
437 | and commands for SCSI emulation are passed as SCSI commands | ||
438 | through queuecommand callback of SCSI host template. | ||
439 | </para> | ||
440 | </sect2> | ||
441 | |||
442 | <sect2><title>How commands are issued</title> | ||
443 | |||
444 | <variablelist> | ||
445 | |||
446 | <varlistentry><term>Internal commands</term> | ||
447 | <listitem> | ||
448 | <para> | ||
449 | First, qc is allocated and initialized using | ||
450 | ata_qc_new_init(). Although ata_qc_new_init() doesn't | ||
451 | implement any wait or retry mechanism when qc is not | ||
452 | available, internal commands are currently issued only during | ||
453 | initialization and error recovery, so no other command is | ||
454 | active and allocation is guaranteed to succeed. | ||
455 | </para> | ||
456 | <para> | ||
457 | Once allocated, the qc's taskfile is initialized for the command to | |
458 | be executed. qc currently has two mechanisms to notify | ||
459 | completion. One is via qc->complete_fn() callback and the | ||
460 | other is completion qc->waiting. qc->complete_fn() callback | ||
461 | is the asynchronous path used by normal SCSI translated | ||
462 | commands and qc->waiting is the synchronous (issuer sleeps in | ||
463 | process context) path used by internal commands. | ||
464 | </para> | ||
465 | <para> | ||
466 | Once initialization is complete, host_set lock is acquired | ||
467 | and the qc is issued. | ||
468 | </para> | ||
469 | </listitem> | ||
470 | </varlistentry> | ||
471 | |||
472 | <varlistentry><term>SCSI commands</term> | ||
473 | <listitem> | ||
474 | <para> | ||
475 | All libata drivers use ata_scsi_queuecmd() as | ||
476 | hostt->queuecommand callback. scmds can either be simulated | ||
477 | or translated. No qc is involved in processing a simulated | ||
478 | scmd. The result is computed right away and the scmd is | ||
479 | completed. | ||
480 | </para> | ||
481 | <para> | ||
482 | For a translated scmd, ata_qc_new_init() is invoked to | ||
483 | allocate a qc and the scmd is translated into the qc. SCSI | ||
484 | midlayer's completion notification function pointer is stored | ||
485 | into qc->scsidone. | ||
486 | </para> | ||
487 | <para> | ||
488 | qc->complete_fn() callback is used for completion | ||
489 | notification. ATA commands use ata_scsi_qc_complete() while | ||
490 | ATAPI commands use atapi_qc_complete(). Both functions end up | ||
491 | calling qc->scsidone to notify upper layer when the qc is | ||
492 | finished. After translation is completed, the qc is issued | ||
493 | with ata_qc_issue(). | ||
494 | </para> | ||
495 | <para> | ||
496 | Note that the SCSI midlayer invokes hostt->queuecommand while | |
497 | holding host_set lock, so all of the above occurs while holding | |
498 | host_set lock. | |
499 | </para> | ||
500 | </listitem> | ||
501 | </varlistentry> | ||
502 | |||
503 | </variablelist> | ||
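The two completion notification paths described above can be illustrated with a small user-space model. This is only a sketch: every `model_*` name is hypothetical, the struct is not the kernel's struct ata_queued_cmd, and a plain integer flag stands in for the kernel completion that an internal command would sleep on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; not the kernel's data structures. */
struct model_completion {
    int done;                        /* set once the waiter may proceed */
};

struct model_qc {
    /* Asynchronous path, used by SCSI-translated commands. */
    int (*complete_fn)(struct model_qc *qc, int err);
    /* Synchronous path, used by internal commands. */
    struct model_completion *waiting;
    int scsi_notified;               /* stands in for calling qc->scsidone */
};

/* Stand-in for ata_scsi_qc_complete(): notify the upper layer. */
static int model_scsi_complete(struct model_qc *qc, int err)
{
    (void)err;
    qc->scsi_notified = 1;
    return 0;
}

/* Deliver a completion through whichever mechanism the qc has set up. */
static void model_notify(struct model_qc *qc, int err)
{
    if (qc->complete_fn)
        qc->complete_fn(qc, err);

    if (qc->waiting) {
        struct model_completion *w = qc->waiting;

        qc->waiting = NULL;          /* clear the pointer first */
        w->done = 1;                 /* then wake the (modelled) sleeper */
    }
}
```

In this model an internal command would set only `waiting`, while a translated SCSI command would set only `complete_fn`.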
504 | </sect2> | ||
505 | |||
506 | <sect2><title>How commands are processed</title> | ||
507 | <para> | ||
508 | Depending on which protocol and which controller are used, | ||
509 | commands are processed differently. For the purpose of | ||
510 | discussion, a controller which uses taskfile interface and all | ||
511 | standard callbacks is assumed. | ||
512 | </para> | ||
513 | <para> | ||
514 | Currently 6 ATA command protocols are used. They can be | ||
515 | sorted into the following four categories according to how | ||
516 | they are processed. | ||
517 | </para> | ||
518 | |||
519 | <variablelist> | ||
520 | <varlistentry><term>ATA NO DATA or DMA</term> | ||
521 | <listitem> | ||
522 | <para> | ||
523 | ATA_PROT_NODATA and ATA_PROT_DMA fall into this category. | ||
524 | These types of commands don't require any software | |
525 | intervention once issued. The device raises an interrupt on | |
526 | completion. | |
527 | </para> | ||
528 | </listitem> | ||
529 | </varlistentry> | ||
530 | |||
531 | <varlistentry><term>ATA PIO</term> | ||
532 | <listitem> | ||
533 | <para> | ||
534 | ATA_PROT_PIO is in this category. libata currently | ||
535 | implements PIO with polling. The ATA_NIEN bit is set to turn | |
536 | off interrupts, and pio_task on ata_wq performs polling and | |
537 | I/O. | |
538 | </para> | ||
539 | </listitem> | ||
540 | </varlistentry> | ||
541 | |||
542 | <varlistentry><term>ATAPI NODATA or DMA</term> | ||
543 | <listitem> | ||
544 | <para> | ||
545 | ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this | ||
546 | category. packet_task is used to poll BSY bit after | ||
547 | issuing PACKET command. Once BSY is turned off by the | ||
548 | device, packet_task transfers CDB and hands off processing | ||
549 | to interrupt handler. | ||
550 | </para> | ||
551 | </listitem> | ||
552 | </varlistentry> | ||
553 | |||
554 | <varlistentry><term>ATAPI PIO</term> | ||
555 | <listitem> | ||
556 | <para> | ||
557 | ATA_PROT_ATAPI is in this category. ATA_NIEN bit is set | ||
558 | and, as in ATAPI NODATA or DMA, packet_task submits cdb. | ||
559 | However, after submitting cdb, further processing (data | ||
560 | transfer) is handed off to pio_task. | ||
561 | </para> | ||
562 | </listitem> | ||
563 | </varlistentry> | ||
564 | </variablelist> | ||
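The mapping from the six protocols to the four processing categories can be written down as a lookup. The sketch below is illustrative only: the `MODEL_*` enums and both functions are hypothetical, the enum values are arbitrary rather than the kernel's ATA_PROT_* values, and the ATA_NIEN helper records only what the descriptions above state.

```c
#include <assert.h>

/* Hypothetical mirror of the ATA_PROT_* names; values are arbitrary. */
enum model_prot {
    MODEL_PROT_NODATA,
    MODEL_PROT_DMA,
    MODEL_PROT_PIO,
    MODEL_PROT_ATAPI,
    MODEL_PROT_ATAPI_NODATA,
    MODEL_PROT_ATAPI_DMA
};

enum model_category {
    MODEL_CAT_ATA_IRQ,    /* ATA NO DATA or DMA: interrupt on completion */
    MODEL_CAT_ATA_PIO,    /* ATA PIO: pio_task polls and moves data */
    MODEL_CAT_ATAPI_IRQ,  /* packet_task sends CDB, then interrupt handler */
    MODEL_CAT_ATAPI_PIO   /* packet_task sends CDB, then pio_task */
};

/* Sort a protocol into one of the four categories from the text. */
static enum model_category model_classify(enum model_prot prot)
{
    switch (prot) {
    case MODEL_PROT_NODATA:
    case MODEL_PROT_DMA:
        return MODEL_CAT_ATA_IRQ;
    case MODEL_PROT_PIO:
        return MODEL_CAT_ATA_PIO;
    case MODEL_PROT_ATAPI_NODATA:
    case MODEL_PROT_ATAPI_DMA:
        return MODEL_CAT_ATAPI_IRQ;
    case MODEL_PROT_ATAPI:
        return MODEL_CAT_ATAPI_PIO;
    }
    assert(!"unknown protocol");
    return MODEL_CAT_ATA_IRQ;
}

/* Categories for which the text says ATA_NIEN is set (polling paths). */
static int model_uses_nien(enum model_category cat)
{
    return cat == MODEL_CAT_ATA_PIO || cat == MODEL_CAT_ATAPI_PIO;
}
```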
565 | </sect2> | ||
566 | |||
567 | <sect2><title>How commands are completed</title> | ||
568 | <para> | ||
569 | Once issued, all qc's are either completed with | ||
570 | ata_qc_complete() or time out. For commands which are handled | ||
571 | by interrupts, ata_host_intr() invokes ata_qc_complete(), and, | ||
572 | for PIO tasks, pio_task invokes ata_qc_complete(). In error | ||
573 | cases, packet_task may also complete commands. | ||
574 | </para> | ||
575 | <para> | ||
576 | ata_qc_complete() does the following. | ||
577 | </para> | ||
578 | |||
579 | <orderedlist> | ||
580 | |||
581 | <listitem> | ||
582 | <para> | ||
583 | DMA memory is unmapped. | ||
584 | </para> | ||
585 | </listitem> | ||
586 | |||
587 | <listitem> | ||
588 | <para> | ||
589 | ATA_QCFLAG_ACTIVE is cleared from qc->flags. | |
590 | </para> | ||
591 | </listitem> | ||
592 | |||
593 | <listitem> | ||
594 | <para> | ||
595 | qc->complete_fn() callback is invoked. If the return value of | |
596 | the callback is not zero, completion is short-circuited and | |
597 | ata_qc_complete() returns. | |
598 | </para> | ||
599 | </listitem> | ||
600 | |||
601 | <listitem> | ||
602 | <para> | ||
603 | __ata_qc_complete() is called, which does the following: | |
604 | <orderedlist> | ||
605 | |||
606 | <listitem> | ||
607 | <para> | ||
608 | qc->flags is cleared to zero. | ||
609 | </para> | ||
610 | </listitem> | ||
611 | |||
612 | <listitem> | ||
613 | <para> | ||
614 | ap->active_tag and qc->tag are poisoned. | ||
615 | </para> | ||
616 | </listitem> | ||
617 | |||
618 | <listitem> | ||
619 | <para> | ||
620 | qc->waiting is cleared and completed (in that order). | |
621 | </para> | ||
622 | </listitem> | ||
623 | |||
624 | <listitem> | ||
625 | <para> | ||
626 | qc is deallocated by clearing appropriate bit in ap->qactive. | ||
627 | </para> | ||
628 | </listitem> | ||
629 | |||
630 | </orderedlist> | ||
631 | </para> | ||
632 | </listitem> | ||
633 | |||
634 | </orderedlist> | ||
635 | |||
636 | <para> | ||
637 | So, it basically notifies upper layer and deallocates qc. One | ||
638 | exception is short-circuit path in #3 which is used by | ||
639 | atapi_qc_complete(). | ||
640 | </para> | ||
641 | <para> | ||
642 | For all non-ATAPI commands, whether they fail or not, almost | |
643 | the same code path is taken and very little error handling | |
644 | takes place. A qc is completed with success status if it | |
645 | succeeded, with failed status otherwise. | |
646 | </para> | ||
647 | <para> | ||
648 | However, failed ATAPI commands require more handling as | ||
649 | REQUEST SENSE is needed to acquire sense data. If an ATAPI | ||
650 | command fails, ata_qc_complete() is invoked with error status, | ||
651 | which in turn invokes atapi_qc_complete() via | ||
652 | qc->complete_fn() callback. | ||
653 | </para> | ||
654 | <para> | ||
655 | This makes atapi_qc_complete() set scmd->result to | ||
656 | SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As | ||
657 | the sense data is empty but scmd->result is CHECK CONDITION, | ||
658 | SCSI midlayer will invoke EH for the scmd, and returning 1 | ||
659 | makes ata_qc_complete() return without deallocating the qc. | |
660 | This leads us to ata_scsi_error() with a partially completed qc. | |
661 | </para> | ||
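The completion steps and the ATAPI short-circuit path can be traced with a user-space model. This is a sketch under stated assumptions, not the kernel code: all `model_*` and `MODEL_*` names are hypothetical, the poison value is made up, an `int` flag stands in for the completion in qc->waiting, and DMA unmapping is reduced to clearing a flag.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QCFLAG_ACTIVE 0x1u   /* stands in for ATA_QCFLAG_ACTIVE */
#define MODEL_TAG_POISON    0xfau  /* hypothetical poison value */

struct model_port {
    unsigned long qactive;         /* one bit per allocated qc */
    unsigned int active_tag;
};

struct model_qc {
    struct model_port *ap;
    unsigned int tag;
    unsigned int flags;
    int dma_mapped;
    int *waiting;                  /* stands in for the completion */
    int (*complete_fn)(struct model_qc *qc, int err);
};

/* Step 4: models __ata_qc_complete(), which releases the qc. */
static void model_qc_free(struct model_qc *qc)
{
    struct model_port *ap = qc->ap;
    unsigned int tag = qc->tag;

    qc->flags = 0;
    ap->active_tag = MODEL_TAG_POISON;
    qc->tag = MODEL_TAG_POISON;
    if (qc->waiting) {
        int *w = qc->waiting;

        qc->waiting = NULL;        /* cleared first ... */
        *w = 1;                    /* ... then completed */
    }
    ap->qactive &= ~(1UL << tag);  /* deallocate the qc */
}

/* Models ata_qc_complete(): steps 1 to 4 from the list above. */
static void model_qc_complete(struct model_qc *qc, int err)
{
    assert(qc->flags & MODEL_QCFLAG_ACTIVE);

    qc->dma_mapped = 0;                          /* 1: unmap DMA memory */
    qc->flags &= ~MODEL_QCFLAG_ACTIVE;           /* 2: clear ACTIVE */
    if (qc->complete_fn && qc->complete_fn(qc, err))
        return;                                  /* 3: short circuit */
    model_qc_free(qc);                           /* 4: release the qc */
}

/* Models atapi_qc_complete(): nonzero on error keeps the qc for EH. */
static int model_atapi_complete(struct model_qc *qc, int err)
{
    (void)qc;
    return err ? 1 : 0;
}
```

With a failing ATAPI command the model leaves the bit in `qactive` set, which corresponds to the partially completed qc that EH later releases by calling the free step explicitly.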
662 | |||
663 | </sect2> | ||
664 | |||
665 | <sect2><title>ata_scsi_error()</title> | ||
666 | <para> | ||
667 | ata_scsi_error() is the current hostt->eh_strategy_handler() | ||
668 | for libata. As discussed above, this will be entered in two | ||
669 | cases - timeout and ATAPI error completion. This function | ||
670 | calls low level libata driver's eng_timeout() callback, the | ||
671 | standard callback for which is ata_eng_timeout(). It checks | ||
672 | if a qc is active and calls ata_qc_timeout() on the qc if so. | ||
673 | Actual error handling occurs in ata_qc_timeout(). | ||
674 | </para> | ||
675 | <para> | ||
676 | If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and | ||
677 | completes the qc. Note that as we're currently in EH, we | ||
678 | cannot call scsi_done. As described in SCSI EH doc, a | ||
679 | recovered scmd should be either retried with | ||
680 | scsi_queue_insert() or finished with scsi_finish_command(). | ||
681 | Here, we override qc->scsidone with scsi_finish_command() and | |
682 | call ata_qc_complete(). | |
683 | </para> | ||
684 | <para> | ||
685 | If EH is invoked due to a failed ATAPI qc, the qc here is | ||
686 | completed but not deallocated. The purpose of this | ||
687 | half-completion is to use the qc as a placeholder to make EH | |
688 | code reach this place. This is a bit hackish, but it works. | |
689 | </para> | ||
690 | <para> | ||
691 | Once control reaches here, the qc is deallocated by invoking | ||
692 | __ata_qc_complete() explicitly. Then, internal qc for REQUEST | ||
693 | SENSE is issued. Once sense data is acquired, scmd is | ||
694 | finished by directly invoking scsi_finish_command() on the | ||
695 | scmd. Note that as we already have completed and deallocated | ||
696 | the qc which was associated with the scmd, we don't need | ||
697 | to/cannot call ata_qc_complete() again. | ||
698 | </para> | ||
699 | |||
700 | </sect2> | ||
701 | |||
702 | <sect2><title>Problems with the current EH</title> | ||
703 | |||
704 | <itemizedlist> | ||
705 | |||
706 | <listitem> | ||
707 | <para> | ||
708 | Error representation is too crude. Currently any and all | ||
709 | error conditions are represented with ATA STATUS and ERROR | ||
710 | registers. Errors which aren't ATA device errors are treated | ||
711 | as ATA device errors by setting ATA_ERR bit. Better error | ||
712 | descriptor which can properly represent ATA and other | ||
713 | errors/exceptions is needed. | ||
714 | </para> | ||
715 | </listitem> | ||
716 | |||
717 | <listitem> | ||
718 | <para> | ||
719 | When handling timeouts, no action is taken to make the device | |
720 | forget about the timed-out command and become ready for new commands. | |
721 | </para> | ||
722 | </listitem> | ||
723 | |||
724 | <listitem> | ||
725 | <para> | ||
726 | EH handling via ata_scsi_error() is not properly protected | ||
727 | from usual command processing. On EH entrance, the device is | ||
728 | not in a quiescent state. Timed-out commands may succeed or | |
729 | fail at any time. pio_task and packet_task may still be running. | |
730 | </para> | ||
731 | </listitem> | ||
732 | |||
733 | <listitem> | ||
734 | <para> | ||
735 | Error recovery is too weak. Devices / controllers causing HSM | |
736 | mismatch errors and other errors quite often require a reset to | |
737 | return to a known state. Also, advanced error handling is | |
738 | necessary to support features like NCQ and hotplug. | |
739 | </para> | ||
740 | </listitem> | ||
741 | |||
742 | <listitem> | ||
743 | <para> | ||
744 | ATA errors are directly handled in the interrupt handler and | ||
745 | PIO errors in pio_task. This is problematic for advanced | ||
746 | error handling for the following reasons. | ||
747 | </para> | ||
748 | <para> | ||
749 | First, advanced error handling often requires context and | ||
750 | internal qc execution. | ||
751 | </para> | ||
752 | <para> | ||
753 | Second, even a simple failure (say, CRC error) needs | ||
754 | information gathering and could trigger complex error handling | ||
755 | (say, resetting & reconfiguring). Having multiple code | ||
756 | paths to gather information, enter EH and trigger actions | ||
757 | makes life painful. | ||
758 | </para> | ||
759 | <para> | ||
760 | Third, scattered EH code makes implementing low level drivers | ||
761 | difficult. Low level drivers override libata callbacks. If | ||
762 | EH is scattered over several places, each affected callback | |
763 | should perform its part of error handling. This can be error | |
764 | prone and painful. | |
765 | </para> | ||
766 | </listitem> | ||
767 | |||
768 | </itemizedlist> | ||
769 | </sect2> | ||
770 | |||
771 | </sect1> | ||
416 | </chapter> | 772 | </chapter> |
417 | 773 | ||
418 | <chapter id="libataExt"> | 774 | <chapter id="libataExt"> |