diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/DocBook/libata.tmpl | 356 |
1 files changed, 356 insertions, 0 deletions
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index 375ae760dc1e..b2ec780bcda1 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl | |||
@@ -415,6 +415,362 @@ and other resources, etc. | |||
415 | </sect1> | 415 | </sect1> |
416 | </chapter> | 416 | </chapter> |
417 | 417 | ||
418 | <chapter id="libataEH"> | ||
419 | <title>Error handling</title> | ||
420 | |||
421 | <para> | ||
422 | This chapter describes how errors are handled under libata. | ||
423 | Readers are advised to read SCSI EH | ||
424 | (Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first. | ||
425 | </para> | ||
426 | |||
427 | <sect1><title>Origins of commands</title> | ||
428 | <para> | ||
429 | In libata, a command is represented with struct ata_queued_cmd | ||
430 | or qc. qc's are preallocated during port initialization and | ||
431 | repetitively used for command executions. Currently only one | ||
432 | qc is allocated per port but yet-to-be-merged NCQ branch | ||
433 | allocates one for each tag and maps each qc to NCQ tag 1-to-1. | ||
434 | </para> | ||
435 | <para> | ||
436 | libata commands can originate from two sources - libata itself | ||
437 | and SCSI midlayer. libata internal commands are used for | ||
438 | initialization and error handling. All normal blk requests | ||
439 | and commands for SCSI emulation are passed as SCSI commands | ||
440 | through queuecommand callback of SCSI host template. | ||
441 | </para> | ||
442 | </sect1> | ||
443 | |||
444 | <sect1><title>How commands are issued</title> | ||
445 | |||
446 | <variablelist> | ||
447 | |||
448 | <varlistentry><term>Internal commands</term> | ||
449 | <listitem> | ||
450 | <para> | ||
451 | First, qc is allocated and initialized using | ||
452 | ata_qc_new_init(). Although ata_qc_new_init() doesn't | ||
453 | implement any wait or retry mechanism when qc is not | ||
454 | available, internal commands are currently issued only during | ||
455 | initialization and error recovery, so no other command is | ||
456 | active and allocation is guaranteed to succeed. | ||
457 | </para> | ||
458 | <para> | ||
459 | Once allocated qc's taskfile is initialized for the command to | ||
460 | be executed. qc currently has two mechanisms to notify | ||
461 | completion. One is via qc->complete_fn() callback and the | ||
462 | other is completion qc->waiting. qc->complete_fn() callback | ||
463 | is the asynchronous path used by normal SCSI translated | ||
464 | commands and qc->waiting is the synchronous (issuer sleeps in | ||
465 | process context) path used by internal commands. | ||
466 | </para> | ||
467 | <para> | ||
468 | Once initialization is complete, host_set lock is acquired | ||
469 | and the qc is issued. | ||
470 | </para> | ||
471 | </listitem> | ||
472 | </varlistentry> | ||
473 | |||
474 | <varlistentry><term>SCSI commands</term> | ||
475 | <listitem> | ||
476 | <para> | ||
477 | All libata drivers use ata_scsi_queuecmd() as | ||
478 | hostt->queuecommand callback. scmds can either be simulated | ||
479 | or translated. No qc is involved in processing a simulated | ||
480 | scmd. The result is computed right away and the scmd is | ||
481 | completed. | ||
482 | </para> | ||
483 | <para> | ||
484 | For a translated scmd, ata_qc_new_init() is invoked to | ||
485 | allocate a qc and the scmd is translated into the qc. SCSI | ||
486 | midlayer's completion notification function pointer is stored | ||
487 | into qc->scsidone. | ||
488 | </para> | ||
489 | <para> | ||
490 | qc->complete_fn() callback is used for completion | ||
491 | notification. ATA commands use ata_scsi_qc_complete() while | ||
492 | ATAPI commands use atapi_qc_complete(). Both functions end up | ||
493 | calling qc->scsidone to notify upper layer when the qc is | ||
494 | finished. After translation is completed, the qc is issued | ||
495 | with ata_qc_issue(). | ||
496 | </para> | ||
497 | <para> | ||
498 | Note that SCSI midlayer invokes hostt->queuecommand while | ||
499 | holding host_set lock, so all above occur while holding | ||
500 | host_set lock. | ||
501 | </para> | ||
502 | </listitem> | ||
503 | </varlistentry> | ||
504 | |||
505 | </variablelist> | ||
506 | </sect1> | ||
507 | |||
508 | <sect1><title>How commands are processed</title> | ||
509 | <para> | ||
510 | Depending on which protocol and which controller are used, | ||
511 | commands are processed differently. For the purpose of | ||
512 | discussion, a controller which uses taskfile interface and all | ||
513 | standard callbacks is assumed. | ||
514 | </para> | ||
515 | <para> | ||
516 | Currently 6 ATA command protocols are used. They can be | ||
517 | sorted into the following four categories according to how | ||
518 | they are processed. | ||
519 | </para> | ||
520 | |||
521 | <variablelist> | ||
522 | <varlistentry><term>ATA NO DATA or DMA</term> | ||
523 | <listitem> | ||
524 | <para> | ||
525 | ATA_PROT_NODATA and ATA_PROT_DMA fall into this category. | ||
526 | These types of commands don't require any software | ||
527 | intervention once issued. Device will raise interrupt on | ||
528 | completion. | ||
529 | </para> | ||
530 | </listitem> | ||
531 | </varlistentry> | ||
532 | |||
533 | <varlistentry><term>ATA PIO</term> | ||
534 | <listitem> | ||
535 | <para> | ||
536 | ATA_PROT_PIO is in this category. libata currently | ||
537 | implements PIO with polling. ATA_NIEN bit is set to turn | ||
538 | off interrupt and pio_task on ata_wq performs polling and | ||
539 | IO. | ||
540 | </para> | ||
541 | </listitem> | ||
542 | </varlistentry> | ||
543 | |||
544 | <varlistentry><term>ATAPI NODATA or DMA</term> | ||
545 | <listitem> | ||
546 | <para> | ||
547 | ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this | ||
548 | category. packet_task is used to poll BSY bit after | ||
549 | issuing PACKET command. Once BSY is turned off by the | ||
550 | device, packet_task transfers CDB and hands off processing | ||
551 | to interrupt handler. | ||
552 | </para> | ||
553 | </listitem> | ||
554 | </varlistentry> | ||
555 | |||
556 | <varlistentry><term>ATAPI PIO</term> | ||
557 | <listitem> | ||
558 | <para> | ||
559 | ATA_PROT_ATAPI is in this category. ATA_NIEN bit is set | ||
560 | and, as in ATAPI NODATA or DMA, packet_task submits cdb. | ||
561 | However, after submitting cdb, further processing (data | ||
562 | transfer) is handed off to pio_task. | ||
563 | </para> | ||
564 | </listitem> | ||
565 | </varlistentry> | ||
566 | </variablelist> | ||
567 | </sect1> | ||
568 | |||
569 | <sect1><title>How commands are completed</title> | ||
570 | <para> | ||
571 | Once issued, all qc's are either completed with | ||
572 | ata_qc_complete() or time out. For commands which are handled | ||
573 | by interrupts, ata_host_intr() invokes ata_qc_complete(), and, | ||
574 | for PIO tasks, pio_task invokes ata_qc_complete(). In error | ||
575 | cases, packet_task may also complete commands. | ||
576 | </para> | ||
577 | <para> | ||
578 | ata_qc_complete() does the following. | ||
579 | </para> | ||
580 | |||
581 | <orderedlist> | ||
582 | |||
583 | <listitem> | ||
584 | <para> | ||
585 | DMA memory is unmapped. | ||
586 | </para> | ||
587 | </listitem> | ||
588 | |||
589 | <listitem> | ||
590 | <para> | ||
591 | ATA_QCFLAG_ACTIVE is clared from qc->flags. | ||
592 | </para> | ||
593 | </listitem> | ||
594 | |||
595 | <listitem> | ||
596 | <para> | ||
597 | qc->complete_fn() callback is invoked. If the return value of | ||
598 | the callback is not zero. Completion is short circuited and | ||
599 | ata_qc_complete() returns. | ||
600 | </para> | ||
601 | </listitem> | ||
602 | |||
603 | <listitem> | ||
604 | <para> | ||
605 | __ata_qc_complete() is called, which does | ||
606 | <orderedlist> | ||
607 | |||
608 | <listitem> | ||
609 | <para> | ||
610 | qc->flags is cleared to zero. | ||
611 | </para> | ||
612 | </listitem> | ||
613 | |||
614 | <listitem> | ||
615 | <para> | ||
616 | ap->active_tag and qc->tag are poisoned. | ||
617 | </para> | ||
618 | </listitem> | ||
619 | |||
620 | <listitem> | ||
621 | <para> | ||
622 | qc->waiting is claread & completed (in that order). | ||
623 | </para> | ||
624 | </listitem> | ||
625 | |||
626 | <listitem> | ||
627 | <para> | ||
628 | qc is deallocated by clearing appropriate bit in ap->qactive. | ||
629 | </para> | ||
630 | </listitem> | ||
631 | |||
632 | </orderedlist> | ||
633 | </para> | ||
634 | </listitem> | ||
635 | |||
636 | </orderedlist> | ||
637 | |||
638 | <para> | ||
639 | So, it basically notifies upper layer and deallocates qc. One | ||
640 | exception is short-circuit path in #3 which is used by | ||
641 | atapi_qc_complete(). | ||
642 | </para> | ||
643 | <para> | ||
644 | For all non-ATAPI commands, whether it fails or not, almost | ||
645 | the same code path is taken and very little error handling | ||
646 | takes place. A qc is completed with success status if it | ||
647 | succeeded, with failed status otherwise. | ||
648 | </para> | ||
649 | <para> | ||
650 | However, failed ATAPI commands require more handling as | ||
651 | REQUEST SENSE is needed to acquire sense data. If an ATAPI | ||
652 | command fails, ata_qc_complete() is invoked with error status, | ||
653 | which in turn invokes atapi_qc_complete() via | ||
654 | qc->complete_fn() callback. | ||
655 | </para> | ||
656 | <para> | ||
657 | This makes atapi_qc_complete() set scmd->result to | ||
658 | SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As | ||
659 | the sense data is empty but scmd->result is CHECK CONDITION, | ||
660 | SCSI midlayer will invoke EH for the scmd, and returning 1 | ||
661 | makes ata_qc_complete() to return without deallocating the qc. | ||
662 | This leads us to ata_scsi_error() with partially completed qc. | ||
663 | </para> | ||
664 | |||
665 | </sect1> | ||
666 | |||
667 | <sect1><title>ata_scsi_error()</title> | ||
668 | <para> | ||
669 | ata_scsi_error() is the current hostt->eh_strategy_handler() | ||
670 | for libata. As discussed above, this will be entered in two | ||
671 | cases - timeout and ATAPI error completion. This function | ||
672 | calls low level libata driver's eng_timeout() callback, the | ||
673 | standard callback for which is ata_eng_timeout(). It checks | ||
674 | if a qc is active and calls ata_qc_timeout() on the qc if so. | ||
675 | Actual error handling occurs in ata_qc_timeout(). | ||
676 | </para> | ||
677 | <para> | ||
678 | If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and | ||
679 | completes the qc. Note that as we're currently in EH, we | ||
680 | cannot call scsi_done. As described in SCSI EH doc, a | ||
681 | recovered scmd should be either retried with | ||
682 | scsi_queue_insert() or finished with scsi_finish_command(). | ||
683 | Here, we override qc->scsidone with scsi_finish_command() and | ||
684 | calls ata_qc_complete(). | ||
685 | </para> | ||
686 | <para> | ||
687 | If EH is invoked due to a failed ATAPI qc, the qc here is | ||
688 | completed but not deallocated. The purpose of this | ||
689 | half-completion is to use the qc as place holder to make EH | ||
690 | code reach this place. This is a bit hackish, but it works. | ||
691 | </para> | ||
692 | <para> | ||
693 | Once control reaches here, the qc is deallocated by invoking | ||
694 | __ata_qc_complete() explicitly. Then, internal qc for REQUEST | ||
695 | SENSE is issued. Once sense data is acquired, scmd is | ||
696 | finished by directly invoking scsi_finish_command() on the | ||
697 | scmd. Note that as we already have completed and deallocated | ||
698 | the qc which was associated with the scmd, we don't need | ||
699 | to/cannot call ata_qc_complete() again. | ||
700 | </para> | ||
701 | |||
702 | </sect1> | ||
703 | |||
704 | <sect1><title>Problems with the current EH</title> | ||
705 | |||
706 | <itemizedlist> | ||
707 | |||
708 | <listitem> | ||
709 | <para> | ||
710 | Error representation is too crude. Currently any and all | ||
711 | error conditions are represented with ATA STATUS and ERROR | ||
712 | registers. Errors which aren't ATA device errors are treated | ||
713 | as ATA device errors by setting ATA_ERR bit. Better error | ||
714 | descriptor which can properly represent ATA and other | ||
715 | errors/exceptions is needed. | ||
716 | </para> | ||
717 | </listitem> | ||
718 | |||
719 | <listitem> | ||
720 | <para> | ||
721 | When handling timeouts, no action is taken to make device | ||
722 | forget about the timed out command and ready for new commands. | ||
723 | </para> | ||
724 | </listitem> | ||
725 | |||
726 | <listitem> | ||
727 | <para> | ||
728 | EH handling via ata_scsi_error() is not properly protected | ||
729 | from usual command processing. On EH entrance, the device is | ||
730 | not in quiescent state. Timed out commands may succeed or | ||
731 | fail any time. pio_task and atapi_task may still be running. | ||
732 | </para> | ||
733 | </listitem> | ||
734 | |||
735 | <listitem> | ||
736 | <para> | ||
737 | Too weak error recovery. Devices / controllers causing HSM | ||
738 | mismatch errors and other errors quite often require reset to | ||
739 | return to known state. Also, advanced error handling is | ||
740 | necessary to support features like NCQ and hotplug. | ||
741 | </para> | ||
742 | </listitem> | ||
743 | |||
744 | <listitem> | ||
745 | <para> | ||
746 | ATA errors are directly handled in the interrupt handler and | ||
747 | PIO errors in pio_task. This is problematic for advanced | ||
748 | error handling for the following reasons. | ||
749 | </para> | ||
750 | <para> | ||
751 | First, advanced error handling often requires context and | ||
752 | internal qc execution. | ||
753 | </para> | ||
754 | <para> | ||
755 | Second, even a simple failure (say, CRC error) needs | ||
756 | information gathering and could trigger complex error handling | ||
757 | (say, resetting & reconfiguring). Having multiple code | ||
758 | paths to gather information, enter EH and trigger actions | ||
759 | makes life painful. | ||
760 | </para> | ||
761 | <para> | ||
762 | Third, scattered EH code makes implementing low level drivers | ||
763 | difficult. Low level drivers override libata callbacks. If | ||
764 | EH is scattered over several places, each affected callbacks | ||
765 | should perform its part of error handling. This can be error | ||
766 | prone and painful. | ||
767 | </para> | ||
768 | </listitem> | ||
769 | |||
770 | </itemizedlist> | ||
771 | </sect1> | ||
772 | </chapter> | ||
773 | |||
418 | <chapter id="libataExt"> | 774 | <chapter id="libataExt"> |
419 | <title>libata Library</title> | 775 | <title>libata Library</title> |
420 | !Edrivers/scsi/libata-core.c | 776 | !Edrivers/scsi/libata-core.c |