| author | Tejun Heo <htejun@gmail.com> | 2005-10-01 22:54:29 -0400 |
|---|---|---|
| committer | Jeff Garzik <jgarzik@pobox.com> | 2005-10-03 22:11:29 -0400 |
| commit | fe998aa7e27f125f6768ec6b137b0ce2c9790509 (patch) | |
| tree | 124543efd939e2238d1b09a044969adbbef9b4bc | |
| parent | 31961943e3110c5a1c36b1e0069c29f7c4380e51 (diff) | |
[PATCH] libata: add ATA exceptions chapter to doc
Hello, Jeff.
This patch adds ATA errors & exceptions chapter to
Documentation/DocBook/libata.tmpl. As suggested, the chapter is
placed before low level driver specific chapters. Contents are
unchanged from the last posting.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
| -rw-r--r-- | Documentation/DocBook/libata.tmpl | 716 |
1 files changed, 716 insertions, 0 deletions
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl
index b2ec780bcda1..d260d92089ad 100644
--- a/Documentation/DocBook/libata.tmpl
+++ b/Documentation/DocBook/libata.tmpl
| @@ -787,6 +787,722 @@ and other resources, etc. | |||
| 787 | !Idrivers/scsi/libata-scsi.c | 787 | !Idrivers/scsi/libata-scsi.c |
| 788 | </chapter> | 788 | </chapter> |
| 789 | 789 | ||
| 790 | <chapter id="ataExceptions"> | ||
| 791 | <title>ATA errors & exceptions</title> | ||
| 792 | |||
| 793 | <para> | ||
| 794 | This chapter tries to identify what error/exception conditions exist | ||
| 795 | for ATA/ATAPI devices and describe how they should be handled in | ||
| 796 | an implementation-neutral way. | ||
| 797 | </para> | ||
| 798 | |||
| 799 | <para> | ||
| 800 | The term 'error' is used to describe conditions where either an | ||
| 801 | explicit error condition is reported from device or a command has | ||
| 802 | timed out. | ||
| 803 | </para> | ||
| 804 | |||
| 805 | <para> | ||
| 806 | The term 'exception' is either used to describe exceptional | ||
| 807 | conditions which are not errors (say, power or hotplug events), or | ||
| 808 | to describe both errors and non-error exceptional conditions. Where | ||
| 809 | explicit distinction between error and exception is necessary, the | ||
| 810 | term 'non-error exception' is used. | ||
| 811 | </para> | ||
| 812 | |||
| 813 | <sect1 id="excat"> | ||
| 814 | <title>Exception categories</title> | ||
| 815 | <para> | ||
| 816 | Exceptions are described primarily with respect to legacy | ||
| 817 | taskfile + bus master IDE interface. If a controller provides | ||
| 818 | other better mechanism for error reporting, mapping those into | ||
| 819 | categories described below shouldn't be difficult. | ||
| 820 | </para> | ||
| 821 | |||
| 822 | <para> | ||
| 823 | In the following sections, two recovery actions - reset and | ||
| 824 | reconfiguring transport - are mentioned. These are described | ||
| 825 | further in <xref linkend="exrec"/>. | ||
| 826 | </para> | ||
| 827 | |||
| 828 | <sect2 id="excatHSMviolation"> | ||
| 829 | <title>HSM violation</title> | ||
| 830 | <para> | ||
| 831 | This error is indicated when the STATUS value doesn't match the HSM | ||
| 832 | requirement while issuing or executing any ATA/ATAPI command. | ||
| 833 | </para> | ||
| 834 | |||
| 835 | <itemizedlist> | ||
| 836 | <title>Examples</title> | ||
| 837 | |||
| 838 | <listitem> | ||
| 839 | <para> | ||
| 840 | ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying | ||
| 841 | to issue a command. | ||
| 842 | </para> | ||
| 843 | </listitem> | ||
| 844 | |||
| 845 | <listitem> | ||
| 846 | <para> | ||
| 847 | !BSY && !DRQ during PIO data transfer. | ||
| 848 | </para> | ||
| 849 | </listitem> | ||
| 850 | |||
| 851 | <listitem> | ||
| 852 | <para> | ||
| 853 | DRQ on command completion. | ||
| 854 | </para> | ||
| 855 | </listitem> | ||
| 856 | |||
| 857 | <listitem> | ||
| 858 | <para> | ||
| 859 | !BSY && ERR after CDB transfer starts but before the | ||
| 860 | last byte of CDB is transferred. ATA/ATAPI standard states | ||
| 861 | that "The device shall not terminate the PACKET command | ||
| 862 | with an error before the last byte of the command packet has | ||
| 863 | been written" in the error outputs description of PACKET | ||
| 864 | command and the state diagram doesn't include such | ||
| 865 | transitions. | ||
| 866 | </para> | ||
| 867 | </listitem> | ||
| 868 | |||
| 869 | </itemizedlist> | ||
| 870 | |||
| 871 | <para> | ||
| 872 | In these cases, HSM is violated and not much information | ||
| 873 | regarding the error can be acquired from STATUS or ERROR | ||
| 874 | register. In other words, the cause can be anything: a driver bug or a | ||
| 875 | faulty device, controller, and/or cable. | ||
| 876 | </para> | ||
| 877 | |||
| 878 | <para> | ||
| 879 | As HSM is violated, reset is necessary to restore known state. | ||
| 880 | Reconfiguring transport for lower speed might be helpful too | ||
| 881 | as transmission errors sometimes cause this kind of error. | ||
| 882 | </para> | ||
| 883 | </sect2> | ||
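The STATUS checks listed in the examples above can be sketched in C as follows. This is a minimal illustration, not libata's implementation: the `hsm_phase` enum and the `hsm_violation()` helper are hypothetical names, and only the bit positions come from the ATA/ATAPI standard. Treating `!BSY && !DRQ` with ERR set as a device error rather than an HSM violation is an assumption based on the next section.

```c
#include <stdbool.h>
#include <stdint.h>

/* STATUS register bits as defined by the ATA/ATAPI standard. */
#define ST_BSY  0x80u
#define ST_DRDY 0x40u
#define ST_DRQ  0x08u
#define ST_ERR  0x01u

/* Hypothetical names for the points at which STATUS is sampled. */
enum hsm_phase { HSM_ISSUE, HSM_PIO_DATA, HSM_COMPLETE };

/* Returns true when the sampled STATUS value violates the HSM. */
static bool hsm_violation(enum hsm_phase phase, uint8_t status)
{
    switch (phase) {
    case HSM_ISSUE:
        /* A command may only be issued with !BSY && DRDY && !DRQ. */
        return (status & (ST_BSY | ST_DRDY | ST_DRQ)) != ST_DRDY;
    case HSM_PIO_DATA:
        /*
         * !BSY && !DRQ in the middle of a PIO transfer. Assumption:
         * if ERR is set, this is a device error, not an HSM violation.
         */
        return (status & (ST_BSY | ST_DRQ | ST_ERR)) == 0;
    case HSM_COMPLETE:
        /* DRQ must not remain set once the command has completed. */
        return (status & ST_BSY) == 0 && (status & ST_DRQ) != 0;
    }
    return true; /* unknown phase: be conservative */
}
```

A caller would sample STATUS at each phase and, when the helper returns true, schedule a reset as described above.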
| 884 | |||
| 885 | <sect2 id="excatDevErr"> | ||
| 886 | <title>ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</title> | ||
| 887 | |||
| 888 | <para> | ||
| 889 | These are errors detected and reported by ATA/ATAPI devices | ||
| 890 | indicating device problems. For this type of error, the STATUS | ||
| 891 | and ERROR register values are valid and describe the error | ||
| 892 | condition. Note that some ATA bus errors are detected by | ||
| 893 | ATA/ATAPI devices and reported using the same mechanism as | ||
| 894 | device errors. Those cases are described later in this | ||
| 895 | section. | ||
| 896 | </para> | ||
| 897 | |||
| 898 | <para> | ||
| 899 | For ATA commands, this type of error is indicated by !BSY | ||
| 900 | && ERR during command execution and on completion. | ||
| 901 | </para> | ||
| 902 | |||
| 903 | <para>For ATAPI commands,</para> | ||
| 904 | |||
| 905 | <itemizedlist> | ||
| 906 | |||
| 907 | <listitem> | ||
| 908 | <para> | ||
| 909 | !BSY && ERR && ABRT right after issuing PACKET | ||
| 910 | indicates that PACKET command is not supported and falls in | ||
| 911 | this category. | ||
| 912 | </para> | ||
| 913 | </listitem> | ||
| 914 | |||
| 915 | <listitem> | ||
| 916 | <para> | ||
| 917 | !BSY && ERR(==CHK) && !ABRT after the last | ||
| 918 | byte of CDB is transferred indicates CHECK CONDITION and | ||
| 919 | doesn't fall in this category. | ||
| 920 | </para> | ||
| 921 | </listitem> | ||
| 922 | |||
| 923 | <listitem> | ||
| 924 | <para> | ||
| 925 | !BSY && ERR(==CHK) && ABRT after the last byte | ||
| 926 | of CDB is transferred *probably* indicates CHECK CONDITION and | ||
| 927 | doesn't fall in this category. | ||
| 928 | </para> | ||
| 929 | </listitem> | ||
| 930 | |||
| 931 | </itemizedlist> | ||
| 932 | |||
| 933 | <para> | ||
| 934 | Of the errors detected as above, the following are not ATA/ATAPI | ||
| 935 | device errors but ATA bus errors and should be handled | ||
| 936 | according to <xref linkend="excatATAbusErr"/>. | ||
| 937 | </para> | ||
| 938 | |||
| 939 | <variablelist> | ||
| 940 | |||
| 941 | <varlistentry> | ||
| 942 | <term>CRC error during data transfer</term> | ||
| 943 | <listitem> | ||
| 944 | <para> | ||
| 945 | This is indicated by ICRC bit in the ERROR register and | ||
| 946 | means that corruption occurred during data transfer. Up to | ||
| 947 | ATA/ATAPI-7, the standard specifies that this bit is only | ||
| 948 | applicable to UDMA transfers but ATA/ATAPI-8 draft revision | ||
| 949 | 1f says that the bit may be applicable to multiword DMA and | ||
| 950 | PIO. | ||
| 951 | </para> | ||
| 952 | </listitem> | ||
| 953 | </varlistentry> | ||
| 954 | |||
| 955 | <varlistentry> | ||
| 956 | <term>ABRT error during data transfer or on completion</term> | ||
| 957 | <listitem> | ||
| 958 | <para> | ||
| 959 | Up to ATA/ATAPI-7, the standard specifies that ABRT could be | ||
| 960 | set on ICRC errors and on cases where a device is not able | ||
| 961 | to complete a command. Combined with the fact that MWDMA | ||
| 962 | and PIO transfer errors aren't allowed to use ICRC bit up to | ||
| 963 | ATA/ATAPI-7, it seems to imply that ABRT bit alone could | ||
| 964 | indicate transfer errors. | ||
| 965 | </para> | ||
| 966 | <para> | ||
| 967 | However, ATA/ATAPI-8 draft revision 1f removes the part | ||
| 968 | that ICRC errors can turn on ABRT. So, this is something of a | ||
| 969 | gray area. Some heuristics are needed here. | ||
| 970 | </para> | ||
| 971 | </listitem> | ||
| 972 | </varlistentry> | ||
| 973 | |||
| 974 | </variablelist> | ||
| 975 | |||
| 976 | <para> | ||
| 977 | ATA/ATAPI device errors can be further categorized as follows. | ||
| 978 | </para> | ||
| 979 | |||
| 980 | <variablelist> | ||
| 981 | |||
| 982 | <varlistentry> | ||
| 983 | <term>Media errors</term> | ||
| 984 | <listitem> | ||
| 985 | <para> | ||
| 986 | This is indicated by the UNC bit in the ERROR register. ATA | ||
| 987 | devices report a UNC error only after a certain number of | ||
| 988 | retries have failed to recover the data, so there is not much | ||
| 989 | else to do other than notify the upper layer. | ||
| 990 | </para> | ||
| 991 | <para> | ||
| 992 | READ and WRITE commands report CHS or LBA of the first | ||
| 993 | failed sector but ATA/ATAPI standard specifies that the | ||
| 994 | amount of transferred data on error completion is | ||
| 995 | indeterminate, so we cannot assume that sectors preceding | ||
| 996 | the failed sector have been transferred and thus cannot | ||
| 997 | complete those sectors successfully as SCSI does. | ||
| 998 | </para> | ||
| 999 | </listitem> | ||
| 1000 | </varlistentry> | ||
| 1001 | |||
| 1002 | <varlistentry> | ||
| 1003 | <term>Media changed / media change requested error</term> | ||
| 1004 | <listitem> | ||
| 1005 | <para> | ||
| 1006 | <<TODO: fill here>> | ||
| 1007 | </para> | ||
| 1008 | </listitem> | ||
| 1009 | </varlistentry> | ||
| 1010 | |||
| 1011 | <varlistentry><term>Address error</term> | ||
| 1012 | <listitem> | ||
| 1013 | <para> | ||
| 1014 | This is indicated by IDNF bit in the ERROR register. | ||
| 1015 | Report to upper layer. | ||
| 1016 | </para> | ||
| 1017 | </listitem> | ||
| 1018 | </varlistentry> | ||
| 1019 | |||
| 1020 | <varlistentry><term>Other errors</term> | ||
| 1021 | <listitem> | ||
| 1022 | <para> | ||
| 1023 | This can be invalid command or parameter indicated by ABRT | ||
| 1024 | ERROR bit or some other error condition. Note that ABRT | ||
| 1025 | bit can indicate a lot of things including ICRC and Address | ||
| 1026 | errors. Heuristics needed. | ||
| 1027 | </para> | ||
| 1028 | </listitem> | ||
| 1029 | </varlistentry> | ||
| 1030 | |||
| 1031 | </variablelist> | ||
| 1032 | |||
| 1033 | <para> | ||
| 1034 | Depending on commands, not all STATUS/ERROR bits are | ||
| 1035 | applicable. These non-applicable bits are marked with | ||
| 1036 | "na" in the output descriptions but up to ATA/ATAPI-7 | ||
| 1037 | no definition of "na" can be found. However, | ||
| 1038 | ATA/ATAPI-8 draft revision 1f describes "N/A" as | ||
| 1039 | follows. | ||
| 1040 | </para> | ||
| 1041 | |||
| 1042 | <blockquote> | ||
| 1043 | <variablelist> | ||
| 1044 | <varlistentry><term>3.2.3.3a N/A</term> | ||
| 1045 | <listitem> | ||
| 1046 | <para> | ||
| 1047 | A keyword the indicates a field has no defined value in | ||
| 1048 | this standard and should not be checked by the host or | ||
| 1049 | device. N/A fields should be cleared to zero. | ||
| 1050 | </para> | ||
| 1051 | </listitem> | ||
| 1052 | </varlistentry> | ||
| 1053 | </variablelist> | ||
| 1054 | </blockquote> | ||
| 1055 | |||
| 1056 | <para> | ||
| 1057 | So, it seems reasonable to assume that "na" bits are | ||
| 1058 | cleared to zero by devices and thus need no explicit masking. | ||
| 1059 | </para> | ||
| 1060 | |||
| 1061 | </sect2> | ||
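The categorization above can be sketched as a small C helper over the ERROR register. The `err_class` enum and `classify_dev_error()` are hypothetical names used only for illustration, and the bit positions are those of the ATA/ATAPI standard. The sketch deliberately leaves ABRT-only errors in the "other" class, since the text notes that heuristics are needed to decide whether they are really bus errors.

```c
#include <stdint.h>

/* ERROR register bits as defined by the ATA/ATAPI standard. */
#define ER_ICRC 0x80u /* interface CRC error */
#define ER_UNC  0x40u /* uncorrectable media error */
#define ER_IDNF 0x10u /* ID (address) not found */
#define ER_ABRT 0x04u /* command aborted */

/* Hypothetical classification result. */
enum err_class { EC_BUS, EC_MEDIA, EC_ADDRESS, EC_OTHER };

/*
 * Classify a valid ERROR register value (read while !BSY && ERR).
 * ICRC is checked first because it marks an ATA bus error that is
 * merely reported through the device error mechanism.
 */
static enum err_class classify_dev_error(uint8_t error)
{
    if (error & ER_ICRC)
        return EC_BUS;
    if (error & ER_UNC)
        return EC_MEDIA;
    if (error & ER_IDNF)
        return EC_ADDRESS;
    /* ABRT alone, or anything else: the caller needs heuristics. */
    return EC_OTHER;
}
```

Because "na" bits are assumed to read as zero, the helper does not mask the register per command before testing it.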
| 1062 | |||
| 1063 | <sect2 id="excatATAPIcc"> | ||
| 1064 | <title>ATAPI device CHECK CONDITION</title> | ||
| 1065 | |||
| 1066 | <para> | ||
| 1067 | ATAPI device CHECK CONDITION error is indicated by set CHK bit | ||
| 1068 | (ERR bit) in the STATUS register after the last byte of CDB is | ||
| 1069 | transferred for a PACKET command. For this kind of error, | ||
| 1070 | sense data should be acquired to gather information regarding | ||
| 1071 | the error. REQUEST SENSE packet command should be used to | ||
| 1072 | acquire sense data. | ||
| 1073 | </para> | ||
| 1074 | |||
| 1075 | <para> | ||
| 1076 | Once sense data is acquired, this type of error can be | ||
| 1077 | handled similarly to other SCSI errors. Note that sense data | ||
| 1078 | may indicate ATA bus error (e.g. Sense Key 04h HARDWARE ERROR | ||
| 1079 | && ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such | ||
| 1080 | cases, the error should be considered as an ATA bus error and | ||
| 1081 | handled according to <xref linkend="excatATAbusErr"/>. | ||
| 1082 | </para> | ||
| 1083 | |||
| 1084 | </sect2> | ||
| 1085 | |||
| 1086 | <sect2 id="excatNCQerr"> | ||
| 1087 | <title>ATA device error (NCQ)</title> | ||
| 1088 | |||
| 1089 | <para> | ||
| 1090 | NCQ command error is indicated by cleared BSY and set ERR bit | ||
| 1091 | during NCQ command phase (one or more NCQ commands | ||
| 1092 | outstanding). Although STATUS and ERROR registers will | ||
| 1093 | contain valid values describing the error, READ LOG EXT is | ||
| 1094 | required to clear the error condition, determine which command | ||
| 1095 | has failed and acquire more information. | ||
| 1096 | </para> | ||
| 1097 | |||
| 1098 | <para> | ||
| 1099 | READ LOG EXT Log Page 10h reports which tag has failed and | ||
| 1100 | taskfile register values describing the error. With this | ||
| 1101 | information the failed command can be handled as a normal ATA | ||
| 1102 | command error as in <xref linkend="excatDevErr"/> and all | ||
| 1103 | other in-flight commands must be retried. Note that this | ||
| 1104 | retry should not be counted - it's likely that commands | ||
| 1105 | retried this way would have completed normally if it were not | ||
| 1106 | for the failed command. | ||
| 1107 | </para> | ||
| 1108 | |||
| 1109 | <para> | ||
| 1110 | Note that ATA bus errors can be reported as ATA device NCQ | ||
| 1111 | errors. This should be handled as described in <xref | ||
| 1112 | linkend="excatATAbusErr"/>. | ||
| 1113 | </para> | ||
| 1114 | |||
| 1115 | <para> | ||
| 1116 | If READ LOG EXT Log Page 10h fails or reports NQ, we're | ||
| 1117 | thoroughly screwed. This condition should be treated | ||
| 1118 | according to <xref linkend="excatHSMviolation"/>. | ||
| 1119 | </para> | ||
| 1120 | |||
| 1121 | </sect2> | ||
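As a rough sketch of the READ LOG EXT step, the following C helper decodes the first bytes of Log Page 10h after the 512-byte page has been read. The `ncq_err` struct and `parse_ncq_error_log()` are hypothetical names. The byte layout used here (NQ in bit 7 and the tag in bits 4:0 of byte 0, STATUS in byte 2, ERROR in byte 3, and a page whose bytes sum to zero) follows the NCQ error log definition. Rejecting a page with a bad checksum is a design choice of this sketch, not a requirement stated in the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields this sketch extracts. */
struct ncq_err {
    uint8_t tag;    /* tag of the failed NCQ command */
    uint8_t status; /* STATUS value describing the failure */
    uint8_t error;  /* ERROR value describing the failure */
};

/*
 * Decode a 512-byte Log Page 10h buffer.
 * Returns 0 on success, -1 if the page is unusable (bad checksum
 * or NQ set), in which case the condition should be handled like
 * an HSM violation.
 */
static int parse_ncq_error_log(const uint8_t buf[512], struct ncq_err *out)
{
    uint8_t sum = 0;

    /* All 512 bytes of a valid page sum to zero modulo 256. */
    for (size_t i = 0; i < 512; i++)
        sum += buf[i];
    if (sum != 0)
        return -1;

    /* NQ set: the error does not belong to a queued command. */
    if (buf[0] & 0x80)
        return -1;

    out->tag = buf[0] & 0x1f;
    out->status = buf[2];
    out->error = buf[3];
    return 0;
}
```

On success, the command with the returned tag is failed with the decoded STATUS and ERROR values, and every other in-flight command is retried without counting the retry.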
| 1122 | |||
| 1123 | <sect2 id="excatATAbusErr"> | ||
| 1124 | <title>ATA bus error</title> | ||
| 1125 | |||
| 1126 | <para> | ||
| 1127 | ATA bus error means that data corruption occurred during | ||
| 1128 | transmission over ATA bus (SATA or PATA). This type of error | ||
| 1129 | can be indicated by | ||
| 1130 | </para> | ||
| 1131 | |||
| 1132 | <itemizedlist> | ||
| 1133 | |||
| 1134 | <listitem> | ||
| 1135 | <para> | ||
| 1136 | ICRC or ABRT error as described in <xref linkend="excatDevErr"/>. | ||
| 1137 | </para> | ||
| 1138 | </listitem> | ||
| 1139 | |||
| 1140 | <listitem> | ||
| 1141 | <para> | ||
| 1142 | Controller-specific error completion with error information | ||
| 1143 | indicating transmission error. | ||
| 1144 | </para> | ||
| 1145 | </listitem> | ||
| 1146 | |||
| 1147 | <listitem> | ||
| 1148 | <para> | ||
| 1149 | On some controllers, command timeout. In this case, there may | ||
| 1150 | be a mechanism to determine that the timeout is due to | ||
| 1151 | transmission error. | ||
| 1152 | </para> | ||
| 1153 | </listitem> | ||
| 1154 | |||
| 1155 | <listitem> | ||
| 1156 | <para> | ||
| 1157 | Unknown/random errors, timeouts and all sorts of weirdities. | ||
| 1158 | </para> | ||
| 1159 | </listitem> | ||
| 1160 | |||
| 1161 | </itemizedlist> | ||
| 1162 | |||
| 1163 | <para> | ||
| 1164 | As described above, transmission errors can cause a wide variety | ||
| 1165 | of symptoms ranging from device ICRC error to random device | ||
| 1166 | lockup, and, for many cases, there is no way to tell if an | ||
| 1167 | error condition is due to transmission error or not; | ||
| 1168 | therefore, it's necessary to employ some kind of heuristic | ||
| 1169 | when dealing with errors and timeouts. For example, | ||
| 1170 | encountering repeated ABRT errors for a command known to be | ||
| 1171 | supported is likely to indicate an ATA bus error. | ||
| 1172 | </para> | ||
| 1173 | |||
| 1174 | <para> | ||
| 1175 | Once it's determined that ATA bus errors have possibly | ||
| 1176 | occurred, lowering ATA bus transmission speed is one of | ||
| 1177 | the actions which may alleviate the problem. See <xref | ||
| 1178 | linkend="exrecReconf"/> for more information. | ||
| 1179 | </para> | ||
| 1180 | |||
| 1181 | </sect2> | ||
| 1182 | |||
| 1183 | <sect2 id="excatPCIbusErr"> | ||
| 1184 | <title>PCI bus error</title> | ||
| 1185 | |||
| 1186 | <para> | ||
| 1187 | Data corruption or other failures during transmission over PCI | ||
| 1188 | (or other system bus). For standard BMDMA, this is indicated | ||
| 1189 | by the Error bit in the BMDMA Status register. This type of | ||
| 1190 | error must be logged as it indicates something is very wrong | ||
| 1191 | with the system. Resetting host controller is recommended. | ||
| 1192 | </para> | ||
| 1193 | |||
| 1194 | </sect2> | ||
| 1195 | |||
| 1196 | <sect2 id="excatLateCompletion"> | ||
| 1197 | <title>Late completion</title> | ||
| 1198 | |||
| 1199 | <para> | ||
| 1200 | This occurs when a command times out and the timeout handler finds | ||
| 1201 | that the timed out command has completed, either successfully or | ||
| 1202 | with an error. This is usually caused by lost interrupts. This | ||
| 1203 | type of error must be logged. Resetting the host controller is | ||
| 1204 | recommended. | ||
| 1205 | </para> | ||
| 1206 | |||
| 1207 | </sect2> | ||
| 1208 | |||
| 1209 | <sect2 id="excatUnknown"> | ||
| 1210 | <title>Unknown error (timeout)</title> | ||
| 1211 | |||
| 1212 | <para> | ||
| 1213 | This is when a timeout occurs and the command is still | ||
| 1214 | processing or the host and device are in an unknown state. When | ||
| 1215 | this occurs, HSM could be in any valid or invalid state. To | ||
| 1216 | bring the device to known state and make it forget about the | ||
| 1217 | timed out command, resetting is necessary. The timed out | ||
| 1218 | command may be retried. | ||
| 1219 | </para> | ||
| 1220 | |||
| 1221 | <para> | ||
| 1222 | Timeouts can also be caused by transmission errors. Refer to | ||
| 1223 | <xref linkend="excatATAbusErr"/> for more details. | ||
| 1224 | </para> | ||
| 1225 | |||
| 1226 | </sect2> | ||
| 1227 | |||
| 1228 | <sect2 id="excatHoplugPM"> | ||
| 1229 | <title>Hotplug and power management exceptions</title> | ||
| 1230 | |||
| 1231 | <para> | ||
| 1232 | <<TODO: fill here>> | ||
| 1233 | </para> | ||
| 1234 | |||
| 1235 | </sect2> | ||
| 1236 | |||
| 1237 | </sect1> | ||
| 1238 | |||
| 1239 | <sect1 id="exrec"> | ||
| 1240 | <title>EH recovery actions</title> | ||
| 1241 | |||
| 1242 | <para> | ||
| 1243 | This section discusses several important recovery actions. | ||
| 1244 | </para> | ||
| 1245 | |||
| 1246 | <sect2 id="exrecClr"> | ||
| 1247 | <title>Clearing error condition</title> | ||
| 1248 | |||
| 1249 | <para> | ||
| 1250 | Many controllers require their error registers to be cleared by | ||
| 1251 | the error handler. Different controllers may have different | ||
| 1252 | requirements. | ||
| 1253 | </para> | ||
| 1254 | |||
| 1255 | <para> | ||
| 1256 | For SATA, it's strongly recommended to clear at least SError | ||
| 1257 | register during error handling. | ||
| 1258 | </para> | ||
| 1259 | </sect2> | ||
| 1260 | |||
| 1261 | <sect2 id="exrecRst"> | ||
| 1262 | <title>Reset</title> | ||
| 1263 | |||
| 1264 | <para> | ||
| 1265 | During EH, resetting is necessary in the following cases. | ||
| 1266 | </para> | ||
| 1267 | |||
| 1268 | <itemizedlist> | ||
| 1269 | |||
| 1270 | <listitem> | ||
| 1271 | <para> | ||
| 1272 | HSM is in unknown or invalid state | ||
| 1273 | </para> | ||
| 1274 | </listitem> | ||
| 1275 | |||
| 1276 | <listitem> | ||
| 1277 | <para> | ||
| 1278 | HBA is in unknown or invalid state | ||
| 1279 | </para> | ||
| 1280 | </listitem> | ||
| 1281 | |||
| 1282 | <listitem> | ||
| 1283 | <para> | ||
| 1284 | EH needs to make HBA/device forget about in-flight commands | ||
| 1285 | </para> | ||
| 1286 | </listitem> | ||
| 1287 | |||
| 1288 | <listitem> | ||
| 1289 | <para> | ||
| 1290 | HBA/device behaves weirdly | ||
| 1291 | </para> | ||
| 1292 | </listitem> | ||
| 1293 | |||
| 1294 | </itemizedlist> | ||
| 1295 | |||
| 1296 | <para> | ||
| 1297 | Resetting during EH might be a good idea regardless of error | ||
| 1298 | condition to improve EH robustness. Whether to reset both or | ||
| 1299 | either one of HBA and device depends on situation but the | ||
| 1300 | following scheme is recommended. | ||
| 1301 | </para> | ||
| 1302 | |||
| 1303 | <itemizedlist> | ||
| 1304 | |||
| 1305 | <listitem> | ||
| 1306 | <para> | ||
| 1307 | When it's known that the HBA is in ready state but the ATA/ATAPI | ||
| 1308 | device is in an unknown state, reset only the device. | ||
| 1309 | </para> | ||
| 1310 | </listitem> | ||
| 1311 | |||
| 1312 | <listitem> | ||
| 1313 | <para> | ||
| 1314 | If HBA is in unknown state, reset both HBA and device. | ||
| 1315 | </para> | ||
| 1316 | </listitem> | ||
| 1317 | |||
| 1318 | </itemizedlist> | ||
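The recommended scheme above amounts to a two-input decision, sketched here in C. The `reset_scope` enum and `choose_reset()` are hypothetical names. Returning `RESET_NONE` when both sides are in a known state is an assumption of this sketch, since the text also notes that resetting regardless of the error condition may improve robustness.

```c
#include <stdbool.h>

/* Hypothetical description of what EH decides to reset. */
enum reset_scope { RESET_NONE, RESET_DEVICE, RESET_HBA_AND_DEVICE };

/*
 * Pick the reset scope from what EH knows about each side.
 * An HBA in an unknown state forces both resets, because the device
 * cannot be reset reliably through an HBA that is not trusted.
 */
static enum reset_scope choose_reset(bool hba_known_ready, bool dev_known_good)
{
    if (!hba_known_ready)
        return RESET_HBA_AND_DEVICE;
    if (!dev_known_good)
        return RESET_DEVICE;
    return RESET_NONE; /* assumption: nothing to do in this sketch */
}
```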
| 1319 | |||
| 1320 | <para> | ||
| 1321 | HBA resetting is implementation specific. For a controller | ||
| 1322 | complying with taskfile/BMDMA PCI IDE, stopping active DMA | ||
| 1323 | transaction may be sufficient iff BMDMA state is the only HBA | ||
| 1324 | context. But even mostly taskfile/BMDMA PCI IDE complying | ||
| 1325 | controllers may have implementation specific requirements and | ||
| 1326 | mechanism to reset themselves. This must be addressed by | ||
| 1327 | specific drivers. | ||
| 1328 | </para> | ||
| 1329 | |||
| 1330 | <para> | ||
| 1331 | On the other hand, the ATA/ATAPI standard describes in detail ways | ||
| 1332 | to reset ATA/ATAPI devices. | ||
| 1333 | </para> | ||
| 1334 | |||
| 1335 | <variablelist> | ||
| 1336 | |||
| 1337 | <varlistentry><term>PATA hardware reset</term> | ||
| 1338 | <listitem> | ||
| 1339 | <para> | ||
| 1340 | This is hardware initiated device reset signalled with | ||
| 1341 | asserted PATA RESET- signal. There is no standard way to | ||
| 1342 | initiate hardware reset from software although some | ||
| 1343 | hardware provides registers that allow driver to directly | ||
| 1344 | tweak the RESET- signal. | ||
| 1345 | </para> | ||
| 1346 | </listitem> | ||
| 1347 | </varlistentry> | ||
| 1348 | |||
| 1349 | <varlistentry><term>Software reset</term> | ||
| 1350 | <listitem> | ||
| 1351 | <para> | ||
| 1352 | This is achieved by turning CONTROL SRST bit on for at | ||
| 1353 | least 5us. Both PATA and SATA support it but, in case of | ||
| 1354 | SATA, this may require controller-specific support as the | ||
| 1355 | second Register FIS to clear SRST should be transmitted | ||
| 1356 | while BSY bit is still set. Note that on PATA, this resets | ||
| 1357 | both master and slave devices on a channel. | ||
| 1358 | </para> | ||
| 1359 | </listitem> | ||
| 1360 | </varlistentry> | ||
| 1361 | |||
| 1362 | <varlistentry><term>EXECUTE DEVICE DIAGNOSTIC command</term> | ||
| 1363 | <listitem> | ||
| 1364 | <para> | ||
| 1365 | Although the ATA/ATAPI standard doesn't describe it exactly, EDD | ||
| 1366 | implies some level of resetting, possibly at a level similar | ||
| 1367 | to software reset. Host-side EDD protocol can be handled | ||
| 1368 | with normal command processing and most SATA controllers | ||
| 1369 | should be able to handle EDD's just like other commands. | ||
| 1370 | As in software reset, EDD affects both devices on a PATA | ||
| 1371 | bus. | ||
| 1372 | </para> | ||
| 1373 | <para> | ||
| 1374 | Although EDD does reset devices, this doesn't suit error | ||
| 1375 | handling as EDD cannot be issued while BSY is set and it's | ||
| 1376 | unclear how it will act when device is in unknown/weird | ||
| 1377 | state. | ||
| 1378 | </para> | ||
| 1379 | </listitem> | ||
| 1380 | </varlistentry> | ||
| 1381 | |||
| 1382 | <varlistentry><term>ATAPI DEVICE RESET command</term> | ||
| 1383 | <listitem> | ||
| 1384 | <para> | ||
| 1385 | This is very similar to software reset except that reset | ||
| 1386 | can be restricted to the selected device without affecting | ||
| 1387 | the other device sharing the cable. | ||
| 1388 | </para> | ||
| 1389 | </listitem> | ||
| 1390 | </varlistentry> | ||
| 1391 | |||
| 1392 | <varlistentry><term>SATA phy reset</term> | ||
| 1393 | <listitem> | ||
| 1394 | <para> | ||
| 1395 | This is the preferred way of resetting a SATA device. In | ||
| 1396 | effect, it's identical to PATA hardware reset. Note that | ||
| 1397 | this can be done with the standard SCR Control register. | ||
| 1398 | As such, it's usually easier to implement than software | ||
| 1399 | reset. | ||
| 1400 | </para> | ||
| 1401 | </listitem> | ||
| 1402 | </varlistentry> | ||
| 1403 | |||
| 1404 | </variablelist> | ||
| 1405 | |||
| 1406 | <para> | ||
| 1407 | One more thing to consider when resetting devices is that | ||
| 1408 | resetting clears certain configuration parameters and they | ||
| 1409 | need to be set to their previous or newly adjusted values | ||
| 1410 | after reset. | ||
| 1411 | </para> | ||
| 1412 | |||
| 1413 | <para> | ||
| 1414 | The affected parameters are: | ||
| 1415 | </para> | ||
| 1416 | |||
| 1417 | <itemizedlist> | ||
| 1418 | |||
| 1419 | <listitem> | ||
| 1420 | <para> | ||
| 1421 | CHS set up with INITIALIZE DEVICE PARAMETERS (seldom used) | ||
| 1422 | </para> | ||
| 1423 | </listitem> | ||
| 1424 | |||
| 1425 | <listitem> | ||
| 1426 | <para> | ||
| 1427 | Parameters set with SET FEATURES including transfer mode setting | ||
| 1428 | </para> | ||
| 1429 | </listitem> | ||
| 1430 | |||
| 1431 | <listitem> | ||
| 1432 | <para> | ||
| 1433 | Block count set with SET MULTIPLE MODE | ||
| 1434 | </para> | ||
| 1435 | </listitem> | ||
| 1436 | |||
| 1437 | <listitem> | ||
| 1438 | <para> | ||
| 1439 | Other parameters (SET MAX, MEDIA LOCK...) | ||
| 1440 | </para> | ||
| 1441 | </listitem> | ||
| 1442 | |||
| 1443 | </itemizedlist> | ||
| 1444 | |||
| 1445 | <para> | ||
| 1446 | ATA/ATAPI standard specifies that some parameters must be | ||
| 1447 | maintained across hardware or software reset, but doesn't | ||
| 1448 | strictly specify all of them. Always reconfiguring needed | ||
| 1449 | parameters after reset is required for robustness. Note that | ||
| 1450 | this also applies when resuming from deep sleep (power-off). | ||
| 1451 | </para> | ||
| 1452 | |||
| 1453 | <para> | ||
| 1454 | Also, ATA/ATAPI standard requires that IDENTIFY DEVICE / | ||
| 1455 | IDENTIFY PACKET DEVICE is issued after any configuration | ||
| 1456 | parameter is updated or a hardware reset and the result used | ||
| 1457 | for further operation. The OS driver is required to implement | ||
| 1458 | a revalidation mechanism to support this. | ||
| 1459 | </para> | ||
| 1460 | |||
| 1461 | </sect2> | ||
| 1462 | |||
| 1463 | <sect2 id="exrecReconf"> | ||
| 1464 | <title>Reconfigure transport</title> | ||
| 1465 | |||
| 1466 | <para> | ||
| 1467 | For both PATA and SATA, a lot of corners are cut for cheap | ||
| 1468 | connectors, cables or controllers and it's quite common to see | ||
| 1469 | a high transmission error rate. This can be mitigated by | ||
| 1470 | lowering transmission speed. | ||
| 1471 | </para> | ||
| 1472 | |||
| 1473 | <para> | ||
| 1474 | The following is a possible scheme Jeff Garzik suggested. | ||
| 1475 | </para> | ||
| 1476 | |||
| 1477 | <blockquote> | ||
| 1478 | <para> | ||
| 1479 | If more than $N (3?) transmission errors happen in 15 minutes, | ||
| 1480 | </para> | ||
| 1481 | <itemizedlist> | ||
| 1482 | <listitem> | ||
| 1483 | <para> | ||
| 1484 | if SATA, decrease SATA PHY speed. if speed cannot be decreased, | ||
| 1485 | </para> | ||
| 1486 | </listitem> | ||
| 1487 | <listitem> | ||
| 1488 | <para> | ||
| 1489 | decrease UDMA xfer speed. if at UDMA0, switch to PIO4, | ||
| 1490 | </para> | ||
| 1491 | </listitem> | ||
| 1492 | <listitem> | ||
| 1493 | <para> | ||
| 1494 | decrease PIO xfer speed. if at PIO3, complain, but continue | ||
| 1495 | </para> | ||
| 1496 | </listitem> | ||
| 1497 | </itemizedlist> | ||
| 1498 | </blockquote> | ||
| 1499 | |||
| 1500 | </sect2> | ||
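The speed-down scheme quoted above can be sketched in C as an error counter over a 15-minute window plus a step-down ladder. Everything named here (`link_state`, `record_error()`, `step_down()`, the `step_result` values) is hypothetical and only illustrates the idea. The limit of 3 follows the "$N (3?)" in the text and is not a settled value. Timestamps are plain seconds supplied by the caller.

```c
#include <stdbool.h>

#define ERR_WINDOW_SECS (15 * 60)
#define ERR_LIMIT       3 /* "$N" in the scheme; 3 is only a guess */

struct link_state {
    bool is_sata;
    int sata_gen;                 /* 1 is the lowest SATA PHY speed */
    int udma;                     /* UDMA mode, or -1 when using PIO */
    int pio;                      /* PIO mode */
    long err_time[ERR_LIMIT + 1]; /* ring of recent error timestamps */
    int err_count;                /* total errors recorded so far */
};

enum step_result {
    STEP_SATA_SPEED, STEP_UDMA, STEP_TO_PIO, STEP_PIO, STEP_FLOOR
};

/* Record one transmission error; true if more than ERR_LIMIT
 * errors fall within the last ERR_WINDOW_SECS seconds. */
static bool record_error(struct link_state *s, long now)
{
    long oldest;

    s->err_time[s->err_count % (ERR_LIMIT + 1)] = now;
    s->err_count++;
    if (s->err_count <= ERR_LIMIT)
        return false;
    /* The slot to be overwritten next is the oldest of the last N+1. */
    oldest = s->err_time[s->err_count % (ERR_LIMIT + 1)];
    return now - oldest <= ERR_WINDOW_SECS;
}

/* Apply one step of the ladder and report which step was taken. */
static enum step_result step_down(struct link_state *s)
{
    if (s->is_sata && s->sata_gen > 1) {
        s->sata_gen--;
        return STEP_SATA_SPEED;
    }
    if (s->udma > 0) {
        s->udma--;
        return STEP_UDMA;
    }
    if (s->udma == 0) {
        s->udma = -1; /* leave UDMA entirely */
        s->pio = 4;
        return STEP_TO_PIO;
    }
    if (s->pio > 3) {
        s->pio--;
        return STEP_PIO;
    }
    return STEP_FLOOR; /* at PIO3: complain, but continue */
}
```

A caller would invoke `step_down()` whenever `record_error()` returns true, then reprogram the transfer mode with SET FEATURES and revalidate the device as described in the reset section.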
| 1501 | |||
| 1502 | </sect1> | ||
| 1503 | |||
| 1504 | </chapter> | ||
| 1505 | |||
| 790 | <chapter id="PiixInt"> | 1506 | <chapter id="PiixInt"> |
| 791 | <title>ata_piix Internals</title> | 1507 | <title>ata_piix Internals</title> |
| 792 | !Idrivers/scsi/ata_piix.c | 1508 | !Idrivers/scsi/ata_piix.c |
