Ultra ATA Implementation Guide

T13/D98109R0 Ultra ATA Implementation Guide

To:    T13 Technical committee
From:  Mark Evans
       Quantum Corporation
       500 McCarthy Boulevard
       Milpitas, CA USA 95035
       Phone: 408 894 4019
       Fax: 408 952 3620
       Email: mark.evans@quantum.com
Date:  16 June 1998
Subj:  Ultra ATA implementation guide

Introduction:

The following is a detailed implementation guide written by Eric Kvamme of Quantum's Systems Engineering group. This document is intended as an aid to the implementation of Ultra DMA/33 in host systems, ATA controllers, and peripherals. Clarification of some aspects of the protocol and details not specifically stated in the original Quantum Ultra DMA specification or the ATA/ATAPI-4 standard have been included for the benefit of component designers and device driver engineers. This document is not intended to be comprehensive; rather, it covers Ultra DMA subjects which have already caused design questions or are deemed to be of special interest or note. Included are warnings about proper interpretation of sections of the standard where subtle aspects of the protocol make interpretation errors seem possible.


Table of Contents

1. Timing derivations
    1.1 Fundamental timings, skews and delays
    1.2 IC and PCB timings, delays, and skews
    1.3 System timing parameters
        1.3.1 tCYC
        1.3.2 t2CYC
        1.3.3 tDS
        1.3.4 tDH
        1.3.5 tDVH
        1.3.6 tDVS
        1.3.7 tFS
        1.3.8 tLI
        1.3.9 tMLI
        1.3.10 tUI
        1.3.11 tAZ
        1.3.12 tZAH
        1.3.13 tZAD
        1.3.14 tENV
        1.3.15 tSR
        1.3.16 tRFS
        1.3.17 tRP
        1.3.18 tIORDYZ, tZIORDY, tACK, tSS
2. Ultra DMA protocol topics
    2.1 tSR, tRFS, and number of additional transfers
    2.2 Reasons for tSR
    2.3 Reason that tZIORDY longer than tENV is not a problem
    2.4 Recipient pauses and implications for data handling and CRC calculation
    2.5 CRC calculation and comparison
    2.6 IDENTIFY DEVICE command
    2.7 Strobe minimums and maximums
    2.8 Typical strobe cycle timing
    2.9 Reasons for tACK timings
    2.10 Host chances to delay a burst and reasons for them
    2.11 Maximums on all control signals from the device
    2.12 Bus turnaround responsibilities

1. Timing derivations

The derivations for Ultra DMA that follow have been updated from the original timing derivations and, therefore, there are some minor differences between the values in the original timing table (see Table 32, Ultra DMA data burst timing requirements, in ATA/ATAPI-4 revision 17) and the values determined here. Additional simulations and lab measurements continue to be made. While the values for skew and delay in the system may change slightly as new simulations and measurements are made, the process of deriving the timing values and the assumptions about hardware will remain unchanged. When complete, these simulations and measurements will result in a proposal for changes to the ATA standard.

1.1 Fundamental timings, skews and delays

The timings in the following list are not under the control of either the IC or PCB designer.

Minimum typical cycle times (and their corresponding transfer rates)
    Mode 0 = 120 ns (16.7 megabytes per second)
    Mode 1 = 80 ns (25 megabytes per second)
    Mode 2 = 60 ns (33.3 megabytes per second)
Output termination resistor delays
    Rising transition delay = 0.34 ns minimum, 1.96 ns maximum
    Falling transition delay = 0.23 ns minimum, 2.61 ns maximum

Input termination resistor delays
    Data delay = -0.53 ns minimum, 0.76 ns maximum
    Control signal delay = -0.18 ns minimum, 0.12 ns maximum

Cable/system skews and delays
    Maximum negative skew from the output pin of the interface IC to the input pin of the ATA connector = 3.37 ns (minimum strobe delay - maximum data delay)
    Maximum positive skew from the output pin of the interface IC to the input pin of the ATA connector = 2.63 ns (maximum strobe delay - minimum data delay)
    Maximum delay from the output pin of the interface IC to the input pin of the ATA connector = 6.0 ns
    Maximum negative skew from the output pin of the interface IC to the input pin of the interface IC = 3.52 ns (minimum strobe delay - maximum data delay)
    Maximum positive skew from the output pin of the interface IC to the input pin of the interface IC = 2.73 ns (maximum strobe delay - minimum data delay)
    Maximum delay from the output pin of the interface IC to the input pin of the interface IC = 6.2 ns

1.2 IC and PCB timings, delays, and skews

The Ultra DMA timings in the following list are within the control of the IC and PCB designer. While it is recommended that these timings be met, they are not requirements. Meeting tighter timings in some areas will allow looser timings in others. A designer should take all the timings that are achieved for a given design and re-derive the worst case timings for the protocol to determine whether the timings for the protocol are met.

Possible clocks and characteristics
    All frequencies are assumed to have 60/40 percent asymmetry
    25 MHz (supports modes 0 and 1)
        Typical period = 40 ns
        Clock variation = 1 percent
    33 MHz clock (supports modes 0, 1, and 2)
        Typical period = 30 ns
        Clock variation = 1 percent
    33 / 30 MHz PCI clock (supports modes 0, 1, and 2)
        Typical period = 30 / 33.3 ns
        Clock variation = 1 percent
        Minimum high or low time = 11.3 ns
    50 MHz (supports modes 0, 1, and 2)
        Typical period = 20 ns
        Clock variation = 3.5 percent
    66 MHz (supports modes 0, 1, and 2)
        Typical period = 15 ns
        Clock variation = 3.5 percent
    Note that if 33 MHz or a multiple of 33 MHz is used, the typical cycle time for mode 1 will be 90 nanoseconds instead of the 80 nanoseconds achievable with 25 MHz or multiples thereof.

PCB traces
    Delay = 0.7 ns maximum
    Skew between signals due to traces = 0.15 ns maximum

IC inputs
    Input delay from I/O pin to internal FF (includes input buffer and routing) = 5.5 ns maximum
    Input skew from I/O pin to internal FF (+/-) between strobe and data = 3.7 ns maximum

IC outputs
    Output delay from internal system clock edge to I/O (including output buffer) = 18 ns maximum
    Output skew from internal system clock edge to I/O (plus and minus) between strobe and data (strobe edge may be rising or falling and data edges may be rising or falling; this timing must be met with any falling edge starting at the I/O cell's Voh level or Vcc5 of the system):
        With 25 MHz clock = 5.0 ns maximum
        With PCI clock = 5.8 ns maximum
        With 33, 50, or 66 MHz clock = 6.0 ns maximum
    Output rising vs. falling skew for a single buffer = 3.5 ns maximum
    Data requires three more nanoseconds of delay than strobe for 33 MHz and PCI cases only.
This is due to the fact that with these clocks the data is held by a half cycle, and a minimum half cycle is not sufficient to meet the output hold time given the output skews listed above. The strobe and data must be skewed so that the typical data delay is longer than the typical strobe delay. The only way to reduce this required skew and still meet the hold time would be to reduce the total output skew listed above. The reduction in required data delay is equal to the reduction in total output skew.

IC flip-flops
    Flip-flop setup time (internal) = 0.7 ns minimum
    Flip-flop hold time (internal) = 0.5 ns maximum

1.3 System timing parameters

All system timings for Ultra DMA (including tRP) are referenced to the connector of the agent responsible for the timing. Internally the IC must account for input and output delays and skews associated with all signals getting from the connector to the internal flip-flop of the IC and from the flip-flop of the IC to the connector.

All of the values presented in the system timing derivations below use the minimum and maximum timing characteristics listed above. While values are given for each possible clock frequency for some parameters, it is important to remember that each is only an example of what the system timings will be when the above listed timing characteristics are met. An IC designer should re-derive all applicable timings based on the characteristics of the available system clock, IC, and PCB in order to confirm that all system timing requirements are met.

1.3.1 tCYC

This minimum timing must account for STROBE asymmetry, output jitter, and clock variation. The worst case for minimum tCYC is generated by using the maximum output buffer skew for signals switching in opposite directions.
The formula for the minimum values is:

    + Number of clock cycles to meet the minimum typical cycle time, at the minimum clock period due to percent of clock variation
    - Maximum skew for switching in opposite directions on the same buffer

tCYC mode 0: 112 ns min
    w/ 25 MHz = 115.3 ns min
    w/ PCI @ 33 MHz = 115.3 ns min
    w/ 33 MHz = 115.3 ns min
    w/ 50 MHz = 112.3 ns min
    w/ 66 MHz = 112.3 ns min

tCYC mode 1: 73 ns min
    w/ 25 MHz = 75.7 ns min
    w/ PCI @ 33 MHz = 85.6 ns min
    w/ 33 MHz = 85.6 ns min
    w/ 50 MHz = 73.7 ns min
    w/ 66 MHz = 83.4 ns min

tCYC mode 2: 54 ns min
    w/ PCI @ 33 MHz = 55.9 ns min
    w/ 33 MHz = 55.9 ns min
    w/ 50 MHz = 54.4 ns min
    w/ 66 MHz = 54.4 ns min

Table 32 lists tCYC values of 114, 75, and 55. It will be recommended that these values be changed to 112, 73, and 54 respectively in future standards to reflect actual achievable values.
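The tCYC arithmetic above can be checked with a short script. This is only a sketch: the constants are copied from sections 1.1 and 1.2, while the function and table names are illustrative and not part of the guide or the standard. It assumes the cycle count is the smallest whole number of clocks that reaches the typical cycle time, which is what reproduces the listed values. The PCI @ 33 MHz case uses the same period and variation as the 33 MHz entry.

```python
import math

# Values copied from sections 1.1 and 1.2 of this guide (all times in ns).
TYPICAL_CYCLE_NS = {0: 120.0, 1: 80.0, 2: 60.0}
RISE_FALL_SKEW_NS = 3.5  # max rising vs. falling skew for a single buffer

# clock name -> (typical period, fractional clock variation, supported modes)
CLOCKS = {
    "25 MHz": (40.0, 0.01, (0, 1)),
    "33 MHz": (30.0, 0.01, (0, 1, 2)),
    "50 MHz": (20.0, 0.035, (0, 1, 2)),
    "66 MHz": (15.0, 0.035, (0, 1, 2)),
}


def clocks_per_cycle(mode, clock):
    """Whole clock periods needed to reach the typical cycle time (assumed)."""
    period_ns, _, modes = CLOCKS[clock]
    if mode not in modes:
        raise ValueError(f"{clock} does not support mode {mode}")
    return math.ceil(TYPICAL_CYCLE_NS[mode] / period_ns)


def t_cyc_min(mode, clock):
    """Minimum tCYC: n clocks at the shortest period, minus the buffer skew."""
    period_ns, variation, _ = CLOCKS[clock]
    n = clocks_per_cycle(mode, clock)
    return n * period_ns * (1.0 - variation) - RISE_FALL_SKEW_NS


if __name__ == "__main__":
    for clock, (_, _, modes) in CLOCKS.items():
        for mode in modes:
            print(f"mode {mode} w/ {clock}: {t_cyc_min(mode, clock):.2f} ns")
```

A designer re-deriving the timings for a different clock would only need to add an entry to the clock table with that clock's period and variation.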

1.3.2 t2CYC

Since this timing is taken from falling edge to falling edge or rising edge to rising edge of STROBE, asymmetry in rise and fall times has no effect on the timing. Clock variation is the only significant contributor to t2CYC variation. The formula for the minimum values is:

    + 2 x (Number of clock cycles to meet the minimum typical cycle time, at the minimum clock period due to percent of clock variation)

t2CYC mode 0: 230 ns min
    w/ 25 MHz = 237.6 ns min
    w/ PCI @ 33 MHz = 237.6 ns min
    w/ 33 MHz = 237.6 ns min
    w/ 50 MHz = 231.6 ns min
    w/ 66 MHz = 231.6 ns min

t2CYC mode 1: 154 ns min
    w/ 25 MHz = 158.4 ns min
    w/ PCI @ 33 MHz = 178.2 ns min
    w/ 33 MHz = 178.2 ns min
    w/ 50 MHz = 154.4 ns min
    w/ 66 MHz = 173.7 ns min

t2CYC mode 2: 115 ns min
    w/ PCI @ 33 MHz = 118.8 ns min
    w/ 33 MHz = 118.8 ns min
    w/ 50 MHz = 115.8 ns min
    w/ 66 MHz = 115.8 ns min

The minimum t2CYC timings for modes 0, 1, and 2 in Table 32 are 235, 156, and 117 nanoseconds respectively. Those numbers are based on a two percent clock frequency variation, whereas the actual clock frequency variation on some devices is higher. For a 3.5 percent clock variation (as listed above) the t2CYC times for modes 0, 1, and 2 should actually be 230, 154, and 115 nanoseconds respectively. As with tCYC, it will be recommended that these values be changed in future standards to reflect actual achievable values.

1.3.3 tDS

This is the data setup time at the receiver. Since timings are taken at the connector and not at the ASIC, the effect of the termination resistors and traces must be considered when generating this number. Depending on the direction of the data signal and STROBE transitions, the skew between the two can change in both the positive and negative direction. The longest data signal delay combined with the shortest STROBE delay is the worst case for setup time.

In order to meet the required input skews given above, the number of buffers or amount of logic between the incoming signals and the input latch must be minimized. It may require the data input buffers to be routed directly to the input latch with no delay elements and the STROBE signal routed directly from its input buffer to the input latch clock with no delay elements. The internal latch/flip-flop has a non-zero setup and hold time. tDS must be sufficiently long to guarantee that the setup time of the flip-flop is met. The formula for the worst tDS corner case is:

    + Maximum data input delay through series termination
    - Minimum STROBE input delay through series termination
    + Maximum PCB trace skew
    + Maximum input skew
    + Minimum flip-flop setup time

tDS all modes = 5.49 ns minimum

The specification for Ultra DMA adds margin by setting the minimum tDS values for modes 0, 1, and 2 to 15, 10, and 7 nanoseconds respectively.

1.3.4 tDH

As with the setup time tDS above, the hold time at the connector of the receiver must be of sufficient duration to guarantee that the hold time of the internal flip-flop is met. The longest STROBE delay combined with the shortest data delay is the worst case for hold time. The analysis is similar to the one for tDS above. The formula for the worst tDH corner case is:

    + Maximum STROBE input delay through series termination
    - Minimum data input delay through series termination
    + Maximum PCB trace skew
    + Maximum input skew
    + Minimum flip-flop hold time

tDH all modes = 5.0 ns minimum

The tDH specification for all modes is set to 5.0 ns.

1.3.5 tDVH

This is the hold time required at the sender. It is also one of the more complex values to determine. The determination of this parameter can be approached from many perspectives. One approach is to determine what value must be met in order to meet tDH. The minimum hold time required by the receiver has already been determined above (tDH), and the first approach will be based on this.

First of all, the timing requirement for the sender IC may be determined based on the tDH value. Hold time is reduced in the system when the STROBE delay is longer than the data delay. This is represented by the maximum positive skew above. The hold time required at the output IC is therefore derived from tDH, the maximum positive IC to IC skew, and the maximum trace skew:

IC requirement = 7.63 ns

Given this requirement as measured at the IC pin, the requirement for the timing as measured at the connector pin can be determined. Strobe delay reduces the hold time and data delay increases it, so the worst case would be as follows:

    + IC requirement just determined
    - Maximum trace skew
    - Maximum output falling delay through series termination
    + Minimum output rising delay through series termination

tDVH all modes = 5.21 ns minimum

This value is rounded up to the nearest nanosecond to add margin, so the tDVH specification for all modes is set to 6.0 nanoseconds.

After a timing requirement as measured at the connector is determined, the requirement to meet this timing as measured at the IC pin must be determined. A straightforward way to do this is to determine the extra margin added to tDVH and add that same margin to the IC requirement already determined. The same result is found by taking the tDVH specification value and adding back the trace and termination resistor skew maximums as follows:

    + tDVH specification
    + Maximum trace skew
    + Maximum output falling delay through series termination
    - Minimum output rising delay through series termination

Actual IC requirement to meet tDVH = 8.42 ns

Either the achievable tDVH or the hold at the IC pin can be determined and verified against the appropriate value listed above. The achievable tDVH can be determined as follows:

    + Minimum internal hold time: for a 25 or 33 MHz clock this will be a minimum half clock cycle time (clock at the minimum cycle time due to clock variation and the minimum asymmetry); for the PCI clock this will be the minimum PCI high or low time; for a 50 or 66 MHz clock this will be a minimum full clock cycle time (clock at the minimum cycle time due to clock variation)
    - Maximum IC output skew
    - Maximum trace skew
    - Maximum output falling delay through series termination
    + Minimum output rising delay through series termination
    + Extra IC data delay over strobe delay for 33 MHz and PCI clocks only (see the IC timings in section 1.2)

tDVH all modes: 6.0 ns min
    w/ 25 MHz = 8.4 ns min
    w/ PCI @ 33 MHz = 6.1 ns min
    w/ 33 MHz = 6.5 ns min
    w/ 50 MHz = 10.9 ns min
    w/ 66 MHz = 6.1 ns min

This demonstrates that when all characteristic timings given above are met, the tDVH timing will also be met.
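The achievable tDVH figures and the 8.42 ns IC pin requirement can be reproduced with the sketch below. The constants come from sections 1.1 and 1.2; the names are illustrative only. It assumes that "minimum asymmetry" means the 40 percent share of a 60/40 clock, which is the interpretation that matches the listed results.

```python
# Values copied from sections 1.1 and 1.2 of this guide (all times in ns).
RISE_MIN_NS = 0.34    # output termination resistor, rising, minimum
FALL_MAX_NS = 2.61    # output termination resistor, falling, maximum
TRACE_SKEW_NS = 0.15  # PCB trace skew, maximum
T_DVH_SPEC_NS = 6.0


def min_half_cycle(period_ns, variation, min_share=0.40):
    """Shortest half cycle: shortest period times the smaller 60/40 share.

    The 0.40 share is an assumed reading of "minimum asymmetry".
    """
    return period_ns * (1.0 - variation) * min_share


def min_full_cycle(period_ns, variation):
    """Shortest full cycle given the clock variation."""
    return period_ns * (1.0 - variation)


# clock -> (minimum internal hold, max output skew, extra data delay)
CASES = {
    "25 MHz": (min_half_cycle(40.0, 0.01), 5.0, 0.0),
    "PCI @ 33 MHz": (11.3, 5.8, 3.0),  # 11.3 ns is the PCI min high/low time
    "33 MHz": (min_half_cycle(30.0, 0.01), 6.0, 3.0),
    "50 MHz": (min_full_cycle(20.0, 0.035), 6.0, 0.0),
    "66 MHz": (min_full_cycle(15.0, 0.035), 6.0, 0.0),
}


def achievable_t_dvh(clock):
    """Hold time seen at the sender's connector for the given clock."""
    hold_ns, out_skew_ns, extra_ns = CASES[clock]
    return (hold_ns - out_skew_ns - TRACE_SKEW_NS
            - FALL_MAX_NS + RISE_MIN_NS + extra_ns)


def ic_pin_hold_for_spec():
    """Hold needed at the IC pin so the connector sees the 6.0 ns spec."""
    return T_DVH_SPEC_NS + TRACE_SKEW_NS + FALL_MAX_NS - RISE_MIN_NS


if __name__ == "__main__":
    print(f"IC pin requirement: {ic_pin_hold_for_spec():.2f} ns")
    for clock in CASES:
        print(f"w/ {clock}: {achievable_t_dvh(clock):.2f} ns")
```

Setting the extra data delay to zero for the 33 MHz or PCI entries shows why that delay is needed: without it the result falls below the 6.0 ns specification.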
As mentioned in the IC timing section, the extra delay required for data as compared to STROBE can be reduced or eliminated by reducing the output skew for the IC. No other output delays or skews are under the control of the IC, so this is the only way the required delay can be reduced. A full cycle time must not be used to hold data with a 33 MHz or PCI clock, because this would give a maximum internal hold time of over 30 nanoseconds, making it impossible to meet the required tDVS time.

Simple gate delays cannot be used to meet the hold time. Take, for example, an IC where the output buffer skews alone are reduced to two nanoseconds. In order to meet tDVH, the internal hold time would have to be a minimum of 10.4 nanoseconds. Delays from transistor to transistor in a single IC can be well matched, but transistor delay can vary by a factor of three over process, temperature, and voltage variations. Even if the process is controlled well enough to guarantee a delay variation of a factor of three, the maximum internal hold would be over 30 nanoseconds, and the tDVS time would not be met. While not recommended, it may be possible to use a calibrated delay path instead of the clock to produce the desired hold time.

1.3.6 tDVS

This time is the data signal setup time required at the sender. This time must be of sufficient length to ensure that the setup time at the receiver is met. In the case of Ultra DMA modes 0, 1, and 2, the data settle time can be long due to coupling between signals in the cable and loading on the receiver PCB. For this reason, the minimum setup time at the sender should be set as large as is achievable to give margin for settle time.

In order to achieve the maximum setup time, the hold time must be generated as described above in the tDVH derivation. This will be with a half clock cycle time for a 25 MHz, 33 MHz, or PCI clock at 33 MHz, and a full cycle for a 50 or 66 MHz clock. For the achievable setup time only, the maximum asymmetry on the half cycles must be taken into account rather than the minimum. The minimum setup times achievable are, therefore, as follows:

    + Number of clock cycles to meet the minimum typical cycle time, at the minimum clock period due to percent of clock variation
    - Maximum internal data hold time (1/2 clock cycle or full clock cycle at the frequency above depending on the system clock used, 1/2 cycle at maximum percent of asymmetry)
    - Maximum output skew for any two signals
    - Maximum PCB trace skew
    - Maximum output falling delay through series termination (data)
    + Minimum output rising delay through series termination (STROBE)

tDVS mode 0: 70 ns min
    w/ 25 MHz = 87.6 ns min
    w/ PCI @ 33 MHz = 89.8 ns min
    w/ 33 MHz = 89.6 ns min
    w/ 50 MHz = 88.1 ns min
    w/ 66 MHz = 92.9 ns min

tDVS mode 1: 48 ns min
    w/ 25 MHz = 48.0 ns min
    w/ PCI @ 33 MHz = 60.1 ns min
    w/ 33 MHz = 59.9 ns min
    w/ 50 MHz = 49.5 ns min
    w/ 66 MHz = 64.0 ns min

tDVS mode 2: 30 ns min
    w/ PCI @ 33 MHz = 30.4 ns min
    w/ 33 MHz = 30.2 ns min
    w/ 50 MHz = 30.2 ns min
    w/ 66 MHz = 35.0 ns min

The value in Table 32 for tDVS for Ultra DMA mode 2 is 34 nanoseconds. The only way this value may be achieved along with the required hold time is to reduce the output skew. If the total output skew is reduced by two nanoseconds, then the required data delay can also be reduced by two nanoseconds, making the achievable tDVS four nanoseconds longer. This would meet the specification. For a 50 MHz clock, the total output skew would have to be reduced by four nanoseconds. This will be difficult or impossible.
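The achievable tDVS values can be reproduced with the following sketch, using constants from sections 1.1 and 1.2 and illustrative names. Two points are assumptions made to match the listed results rather than statements from the formula itself: the half-cycle hold is taken as 60 percent of the minimum clock period (including for the PCI clock), and the three nanoseconds of extra data delay for the 33 MHz and PCI cases is subtracted from the setup time.

```python
import math

# Values copied from sections 1.1 and 1.2 of this guide (all times in ns).
TYPICAL_CYCLE_NS = {0: 120.0, 1: 80.0, 2: 60.0}
RISE_MIN_NS = 0.34    # output termination resistor, rising, minimum (STROBE)
FALL_MAX_NS = 2.61    # output termination resistor, falling, maximum (data)
TRACE_SKEW_NS = 0.15  # PCB trace skew, maximum
TABLE_32_MODE_2_NS = 34.0

# clock -> (period, variation, hold as share of a period, output skew,
#           extra data delay). The 0.6 share is the assumed "maximum
# asymmetry" half cycle; 1.0 means a full cycle is used for the hold.
# Note: the 25 MHz clock supports modes 0 and 1 only.
CLOCKS = {
    "25 MHz": (40.0, 0.01, 0.6, 5.0, 0.0),
    "PCI @ 33 MHz": (30.0, 0.01, 0.6, 5.8, 3.0),
    "33 MHz": (30.0, 0.01, 0.6, 6.0, 3.0),
    "50 MHz": (20.0, 0.035, 1.0, 6.0, 0.0),
    "66 MHz": (15.0, 0.035, 1.0, 6.0, 0.0),
}


def achievable_t_dvs(mode, clock):
    """Setup time at the sender's connector for the given mode and clock."""
    period_ns, variation, hold_share, out_skew_ns, extra_ns = CLOCKS[clock]
    min_period_ns = period_ns * (1.0 - variation)
    n = math.ceil(TYPICAL_CYCLE_NS[mode] / period_ns)
    hold_ns = hold_share * min_period_ns
    return (n * min_period_ns - hold_ns - out_skew_ns - TRACE_SKEW_NS
            - FALL_MAX_NS + RISE_MIN_NS - extra_ns)


if __name__ == "__main__":
    for clock in ("PCI @ 33 MHz", "33 MHz", "50 MHz", "66 MHz"):
        value = achievable_t_dvs(2, clock)
        verdict = "meets" if value >= TABLE_32_MODE_2_NS else "misses"
        print(f"mode 2 w/ {clock}: {value:.2f} ns ({verdict} 34 ns)")
```

Under these assumptions only the 66 MHz case reaches the 34 ns Table 32 value for mode 2, which is the shortfall the text describes.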
It will be recommended that the mode 2 tDVS value be changed in future standards to 30 nanoseconds.

1.3.7 tFS

This timing is used only for the beginning of a read command, from the STOP negation and/or HDMARDY assertion to the first DSTROBE (all falling edges). The device is required to sense that these two control signals from the host have changed state. In general, synchronization is done with two flip-flops. After synchronization is achieved, data must be driven onto the bus and clock cycles counted to meet the minimum setup time before the first STROBE is driven.

In order for an IC based on a 25 MHz, 33 MHz, or PCI clock to meet tFS, data must be driven onto the bus no later than about 2.5 clock cycles after the control signal transitions. This could be done either by synchronizing with both the active and inactive edges of the system clock or by using only active edges to synchronize and then driving data onto the bus on the next inactive edge of the clock after the signals are detected at the output of the second synchronization flip-flop. With a 50 MHz clock, the first word of data must be driven out no later than three cycles after the control transitions, and with a 66 MHz clock, this may be four cycles. The maximum tFS timing is the sum of the following:

    + Maximum input STROBE falling edge delay through termination resistor
    + Maximum PCB trace delay
    + Maximum IC input delay to flip-flop
    + Minimum flip-flop setup time
    + 2, 3, or 4 clock cycles at the maximum period due to frequency variation to synchronize the control signals and start the data transfer cycle (for 25 MHz, 33 MHz, and PCI based systems, the data would be driven out 1/2 cycle after this; for other clock frequencies, data must be driven out no later than the 3 or 4 cycles allowed here)
    + As many cycles as required to meet the tDVS minimum timing for the first word of data (worst case for tFS is these at the maximum period due to frequency variation; for 25 MHz, 33 MHz, and PCI based systems, the number of cycles would be whatever is required to meet the tCYC time; for 50 and 66 MHz clocks, this would be that value minus one)
    + Maximum output buffer delay
    + Maximum falling edge output termination resistor (33 ohm) delay

tFS mode 0: 230 ns max
    w/ 25 MHz = 229.6 ns max
    w/ PCI @ 30 MHz = 227.6 ns max
    w/ 33 MHz = 209.4 ns max
    w/ 50 MHz = 213.9 ns max
    w/ 66 MHz = 213.9 ns max

tFS mode 1: 200 ns max
    w/ 25 MHz = 189.2 ns max
    w/ PCI @ 30 MHz = 194.0 ns max
    w/ 33 MHz = 179.1 ns max
    w/ 50 MHz = 172.5 ns max
    w/ 66 MHz = 182.9 ns max

tFS mode 2: 170 ns max
    w/ PCI @ 30 MHz = 160.3 ns max
    w/ 33 MHz = 148.8 ns max
    w/ 50 MHz = 151.8 ns max
    w/ 66 MHz = 151.8 ns max

1.3.8 tLI

The value of tLI needs to be large enough to give one agent enough time to respond to an input signal from the other. The derivation of tLI is similar to that of tFS since both involve one agent responding to the control signal of another.
As with tFS, the number of clock cycles that an IC may take to respond is dependent on the frequency of the clock being used. For a 25 MHz or PCI clock, the maximum time to respond is three cycles, for a 33 MHz clock it is four cycles, for a 50 MHz clock it is five cycles, and for a 66 MHz clock it is seven cycles. The achievable values of tLI are derived as follows:

    + Maximum input delay through series termination resistors
    + Maximum PCB trace delay
    + Maximum IC input delay to flip-flop
    + Maximum flip-flop setup time
    + 3, 4, 5, or 7 clock periods (depending on clock used) at the maximum period due to frequency variation to synchronize the signals to the internal clock and respond appropriately
    + Maximum output buffer delay
    + Maximum output delay through series termination (falling)

tLI mode 0, 1 and 2: 150 ns max
    w/ 25 MHz = 148.8 ns max
    w/ PCI @ 30 MHz = 126.6 ns max
    w/ 33 MHz = 148.8 ns max
    w/ 50 MHz = 131.1 ns max
    w/ 66 MHz = 136.3 ns max

1.3.9 tMLI

This timing ensures that some control signals are in their proper state before DMACK is negated. It is important that STROBE and the control lines are in their proper states because all signals revert to their non-Ultra DMA ATA definitions at the negation of DMACK. If the signals are not in their proper state, the active device or another device may see a false read or write strobe or data request. All control signals must be in their proper state and detectable at the device ASIC pins before DMACK is negated, so tMLI must overcome the following:

    + Maximum IC to IC delay
    + Maximum IC input delay to flip-flop
    + Minimum flip-flop setup time

tMLI all modes = 12.4 ns

The 20 ns assigned to tMLI is more than enough to meet the requirement.

1.3.10 tUI

This timing is always measured from an action by a device to the reaction by the host. In order to allow the host to indefinitely delay the start of a read or write transfer, this value has no maximum. The minimum value is left at zero for modes 3 and 4.

1.3.11 tAZ

During data bus direction turn around, the current driver of the bus is required to release the data signals no later than on the same clock cycle as another action it is taking. For the beginning of a read burst, the host must release the bus before or on the same clock cycle that it asserts DMACK; for the end of a read burst, the device must release the bus before or on the same clock cycle that it negates DMARQ. If the same clock is used, the maximum delay can be:

    + Maximum total IC output skew
    + Maximum output (33 ohm) termination resistor delay

tAZ all modes = 5.9 ns

The specification allows for additional margin, and the tAZ requirement is set at ten nanoseconds.
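Looking back at the interlock timings, the tLI sum in section 1.3.8 and the tMLI sum in section 1.3.9 are simple enough to check with a short sketch. The constants come from sections 1.1 and 1.2 and the names are illustrative. It assumes the "maximum input delay through series termination" is the 0.12 ns control signal maximum. The PCI @ 30 MHz entry is left out because this sketch does not reproduce that value from the listed characteristics.

```python
# Values copied from sections 1.1 and 1.2 of this guide (all times in ns).
CONTROL_INPUT_MAX_NS = 0.12  # input termination, control signal, maximum
TRACE_DELAY_NS = 0.7         # PCB trace delay, maximum
IC_INPUT_DELAY_NS = 5.5      # I/O pin to internal flip-flop, maximum
FF_SETUP_NS = 0.7            # internal flip-flop setup time
OUTPUT_DELAY_NS = 18.0       # clock edge to I/O, maximum
FALL_MAX_NS = 2.61           # output termination, falling, maximum
IC_TO_IC_DELAY_NS = 6.2      # output pin to input pin, maximum

# clock -> (typical period, fractional variation, response cycles from 1.3.8)
RESPONSE = {
    "25 MHz": (40.0, 0.01, 3),
    "33 MHz": (30.0, 0.01, 4),
    "50 MHz": (20.0, 0.035, 5),
    "66 MHz": (15.0, 0.035, 7),
}


def t_li_worst_case(clock):
    """Longest connector-to-connector response time for the given clock."""
    period_ns, variation, cycles = RESPONSE[clock]
    max_period_ns = period_ns * (1.0 + variation)
    return (CONTROL_INPUT_MAX_NS + TRACE_DELAY_NS + IC_INPUT_DELAY_NS
            + FF_SETUP_NS + cycles * max_period_ns
            + OUTPUT_DELAY_NS + FALL_MAX_NS)


def t_mli_required():
    """Time for control signals to be detectable at the device flip-flops."""
    return IC_TO_IC_DELAY_NS + IC_INPUT_DELAY_NS + FF_SETUP_NS


if __name__ == "__main__":
    for clock in RESPONSE:
        print(f"tLI w/ {clock}: {t_li_worst_case(clock):.2f} ns")
    print(f"tMLI required: {t_mli_required():.2f} ns")
```

Each tLI result stays under the 150 ns value, and the tMLI requirement stays under the 20 ns assigned to it.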

1.3.12 tZAH

This timing is used only for the termination of a read, where the actions taken by both the host and device to change the direction of the data bus are measured from the same control signal (DMARQ). In this case the device is allowed to continue driving the bus for a maximum of tAZ after the DMARQ negation. The device is driving both DMARQ and the data bus to start. The host must wait tZAH after the DMARQ negation to drive the data. Skew on the cable is the major factor to consider here, and a longer data delay than the DMARQ delay (referred to here as maximum negative skew) is the worst case. To avoid bus contention, this value must be:

    + Maximum tAZ
    - Maximum negative IO to IO skew (overestimated by one set of termination resistors)

tZAH all modes = 13.8 ns

1.3.13 tZAD

This timing is used only for the initiation of a read operation where the direction of the data bus is changed. Unlike the termination of a read operation where tZAH is used, the bus high impedance time (bus released) and bus driven time are measured from two different control signals. Since these control signals must meet the tENV timing, which is a minimum of 20 nanoseconds, no additional delay is necessary based on the tZAH evaluation. The device must wait for the correct conditions to be present and then may immediately start driving the bus with no possibility of bus contention. In practice, the device will require two flip-flop delays to synchronize the control signals before it begins driving the bus. This value is, however, set to zero nanoseconds minimum.

1.3.14 tENV

This time is from the host's assertion of the DMACK signal (falling edge) at the beginning of a burst to the assertion or negation of other control signals from the host (all falling edges). Since tENV only applies to outputs from the host, the timings are synchronous with the host clock. Based on an argument similar to the one for tMLI, the minimum for tENV is set to 20 nanoseconds. This guarantees that all control signals at all the devices are in their proper (non-Ultra DMA ATA mode) states before DMACK is asserted and are sensed as changing only after DMACK has been asserted. This 20 nanoseconds more than accounts for cable and gate delays and skews between DMACK and the control signals on device inputs. Since tENV involves synchronous events only, and an increase in tENV reduces the performance of the specification, a maximum is applied.

Enough clock cycles must be used between the assertion of DMACK and the other control signals to ensure the tENV minimum is met. For a 25 MHz, 33 MHz, or PCI clock this is a single cycle; for 50 or 66 MHz clocks this must be two cycles. The minimum tENV value is therefore:

    + 1 or 2 system clock cycles (depending on frequency used) at the minimum period due to frequency variation to delay control signals inside the IC
    - Maximum total IC output skew
    - PCB trace skew
    - Maximum output falling delay through series termination
    + Minimum output falling delay through series termination

tENV mode 0, 1 and 2: 20 ns min
    w/ 25 MHz = 32.1 ns min
    w/ PCI @ 33 MHz = 21.4 ns min
    w/ 33 MHz = 21.2 ns min
    w/ 50 MHz = 30.1 ns min
    w/ 66 MHz = 20.4 ns min
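The minimum tENV values above, and the reason two cycles are needed at 50 and 66 MHz, can be seen in the sketch below. The constants come from sections 1.1 and 1.2; the function name and its optional cycle override are illustrative only.

```python
# Values copied from sections 1.1 and 1.2 of this guide (all times in ns).
FALL_MAX_NS = 2.61    # output termination, falling, maximum
FALL_MIN_NS = 0.23    # output termination, falling, minimum
TRACE_SKEW_NS = 0.15  # PCB trace skew, maximum
T_ENV_MIN_SPEC_NS = 20.0

# clock -> (typical period, fractional variation, delay cycles, output skew)
CLOCKS = {
    "25 MHz": (40.0, 0.01, 1, 5.0),
    "PCI @ 33 MHz": (30.0, 0.01, 1, 5.8),
    "33 MHz": (30.0, 0.01, 1, 6.0),
    "50 MHz": (20.0, 0.035, 2, 6.0),
    "66 MHz": (15.0, 0.035, 2, 6.0),
}


def t_env_min(clock, cycles=None):
    """Shortest DMACK-to-control spacing seen at the host connector.

    Pass cycles to try a delay other than the one listed for the clock.
    """
    period_ns, variation, default_cycles, out_skew_ns = CLOCKS[clock]
    n = default_cycles if cycles is None else cycles
    return (n * period_ns * (1.0 - variation) - out_skew_ns
            - TRACE_SKEW_NS - FALL_MAX_NS + FALL_MIN_NS)


if __name__ == "__main__":
    for clock in CLOCKS:
        print(f"w/ {clock}: {t_env_min(clock):.2f} ns")
    one_cycle = t_env_min("50 MHz", cycles=1)
    print(f"50 MHz with a single cycle: {one_cycle:.2f} ns")
```

With a single cycle at 50 MHz the result falls well below the 20 ns minimum, which is why the second cycle is required at the higher clock frequencies.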

The parameter tENV also has a maximum which must be met. For a 25 MHz or PCI clock a single cycle must still be used. For a 33 or 50 MHz clock a maximum of two cycles may be used, and with a 66 MHz clock a maximum of three cycles may be used. The maximum tENV can be determined as follows:

    + 1, 2, 3, or 4 clock cycles (depending on frequency used) at the maximum period due to frequency variation required to delay control signals inside the IC
    + Maximum total IC output skew
    + PCB trace skew
    + Maximum output falling delay through series termination
    - Minimum output falling delay through series termination

tENV mode 0, 1 and 2: 70 ns max
    w/ 25 MHz = 47.9 ns max
    w/ PCI @ 30 MHz = 42.0 ns max
    w/ 33 MHz = 69.1 ns max
    w/ 50 MHz = 49.0 ns max
    w/ 66 MHz = 59.4 ns max

Note that all the minimum and maximum numbers of clock cycles used are based on the timing characteristics given in this document, and fewer or more clock cycles may be used with some frequencies given reduced output skew. If the timing characteristics here are just met, then the internal IC delay must use the following number of clock cycles to be within the tENV minimum and maximum values:

    w/ 25 MHz, delay must be 1 cycle
    w/ PCI (30 or 33 MHz), delay must be 1 cycle
    w/ 33 MHz, delay 1 or 2 cycles
    w/ 50 MHz, delay must be 2 cycles
    w/ 66 MHz, delay 2 or 3 cycles

1.3.15 tSR

The value of tSR is determined in such a way as to guarantee that the receiver will get a maximum of one additional STROBE after it negates its DMARDY signal. As is stated later in this document, there is no real advantage to using synchronous pauses, so the derivation of the values is not included here.

1.3.16 tRFS

This timing must give the sender time to sense the negation of DMARDY and respond by not sending any more STROBES. Unlike all other interlock timings, where a delay in the timing does not affect the number of words transferred, a delay in tRFS timing does affect the number of words transferred.
Since tRFS involves a response to a request to pause, it should be as short as possible. The shortest possible asynchronous input synchronization method would be to use two flip-flops where the first is clocked on the active edge of the clock and the second on the normally unused (inactive) edge of the clock. The action to stop the STROBE signal would be taken on the next active clock edge (if there had been a STROBE scheduled for that edge it would not be sent). The hardware configuration just described is required for Ultra DMA operating with a 25, 33, or 50 MHz clock, or a PCI clock. A half cycle of any of these clocks gives adequate time to avoid metastability while synchronizing the signal. The following timing diagram shows possible cases:

[Timing diagram: Clock, STROBE, DMARDY, FF1, and FF2 waveforms, marking the tRFS range and the point where the next STROBE would have been.]

The drawing above attempts to show the full range of possible STROBE to DMARDY transition relationships and the possible synchronization flip-flop responses. When a 66 MHz or higher clock frequency is used, two clock periods may be used to synchronize the data so long as no STROBE edge is sent on the subsequent clock edges until the transfer is resumed.

The longest tRFS case is where the DMARDY transition occurs before a clock cycle but, due to skews and missed setup time, the transition is not clocked into the first flip-flop until the next clock (the dotted line transition on FF1 and later on FF2). When this happens one clock cycle before a STROBE transition is generated (as shown by the left tRFS range marker near the middle of the DMARDY transition range in the diagram above), the next STROBE transition will occur (as shown in dotted lines). For all other cases, the value of tRFS will be shorter. The formula for the maximum tRFS is therefore:

    + Maximum input falling delay through series termination
    + Maximum trace delay
    + Maximum total IC input delay to flip-flop
    + Minimum setup time for flip-flop
    + 1 or 2 clock cycles at the maximum system clock period due to frequency variation for synchronization
    + Maximum total IC output delay from clock to output pin
    + Maximum trace delay on STROBE
    + Maximum output delay through series resistor (falling)

tRFS mode 0: 75 ns max
    w/ 25 MHz = 68.7 ns max
    w/ PCI @ 30 MHz = 60.0 ns max
    w/ 33 MHz = 58.6 ns max
    w/ 50 MHz = 49.0 ns max
    w/ 66 MHz = 59.4 ns max

tRFS mode 1: 70 ns max
    w/ 25 MHz = 68.7 ns max
    w/ PCI @ 30 MHz = 60.0 ns max
    w/ 33 MHz = 58.6 ns max
    w/ 50 MHz = 49.0 ns max
    w/ 66 MHz = 59.4 ns max

tRFS mode 2: 60 ns max
    w/ PCI @ 30 MHz = 60.0 ns max
    w/ 33 MHz = 58.6 ns max
    w/ 50 MHz = 49.0 ns max
    w/ 66 MHz = 59.4 ns max

ATA/ATAPI-4 lists a tRFS value of 60 nanoseconds for mode 1 and 50 nanoseconds for mode 2. A system using a 25 MHz clock cannot meet the 60 nanosecond tRFS value.
Even worse, only a system using a 50 MHz clock can meet the current minimum for mode 2. No system or device with a clock frequency other than 50 MHz can meet this time for the worst case condition. Since both the mode 1 and mode 2 values can be changed without causing the length of time for the last STROBE to arrive at the receiver to exceed t RP, a correction to the values will not cause any backward compatibility problems. It will be recommended that this change be made in future standards.

1.3.17 t RP

This is the time from the receiver's negation of DMARDY until no more STROBES will be received. STROBE edges may arrive at the receiver until this time period has elapsed. Since this time parameter applies to the receiver only (it is the one responsible to wait for STROBES), it is referenced at the receiver connector. In light of this, the output delay of DMARDY from inside the IC to the connector and the input delay of a STROBE edge from the connector to the associated internal IC flip flop must be considered.

There are two ways to determine the t RP minimum. One method is to consider how long it will take from the negation of DMARDY at the receiver for the sender to see the negation and become fully paused. This would involve synchronizing DMARDY as it is done for t RFS above, and then taking one more system clock cycle to change the state of the state machine to a paused state.
Using this method, the minimum would be:

  + Maximum IC to IC delay (overestimates the delay by one termination resistor)
  + Maximum total IC input delay to flip flop
  + Minimum setup time for flip flop
  + 2 or 3 clock cycles (depending on clock used) at the maximum period due to clock frequency variation

t RP mode 0, 1 and 2: 160, 125, and 100 ns min
  w/ 25 MHz = 93.2 ns min
  w/ PCI @ 30 MHz = 79.7 ns min
  w/ 33 MHz = 73.0 ns min
  w/ 50 MHz = 53.8 ns min
  w/ 66 MHz = 58.9 ns min

A second method to calculate this value would be to consider how long it might be for the last STROBE to be detected after negating DMARDY and make sure t RP is long enough so that the internal assertion of STOP occurs after the last STROBE has latched the last word of data. This method is applied in the following formula:

  + Maximum IC to IC delay (overestimates the delay by one termination resistor)
  + Maximum t RFS for mode
  + Maximum IC to IC delay (overestimates the delay by one termination resistor)
  + Maximum total IC input delay to flip flop
  + Minimum flip flop setup time

  t RP mode 0 for all clocks = 94.2 ns
  t RP mode 1 for all clocks = 89.2 ns
  t RP mode 2 for all clocks = 79.2 ns

The above shows that t RP can be met and is sufficient to receive the last STROBE for all modes with all clock frequencies. An IC designer must remember that all the numbers are referenced at the connector and that the time to wait internally must be longer than the value of t RP. For higher frequency clocks, the internal delay may even need to be more than a clock cycle longer than the value of t RP to account for total output and input delays.
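As a cross-check of the second method, the fixed delay terms can be backed out of the results above by subtracting the t RFS value used for each mode. The sketch below is illustrative only: all numbers come from the tables in 1.3.16 and 1.3.17, the function names are hypothetical, and the margin calculation pessimistically assumes that every listed clock could be used in every mode.

```python
# All values are in nanoseconds, copied from the tables in 1.3.16 and 1.3.17.
T_RP_LIMIT = {0: 160.0, 1: 125.0, 2: 100.0}        # t RP per mode
T_RFS_MAX = {0: 75.0, 1: 70.0, 2: 60.0}            # t RFS table headings per mode
T_RP_SECOND_METHOD = {0: 94.2, 1: 89.2, 2: 79.2}   # second-method results
T_RP_FIRST_METHOD = {                              # first-method results per clock
    "25 MHz": 93.2,
    "PCI @ 30 MHz": 79.7,
    "33 MHz": 73.0,
    "50 MHz": 53.8,
    "66 MHz": 58.9,
}


def fixed_delay(mode):
    """Sum of the non-t RFS terms of the second method, implied by the table."""
    return round(T_RP_SECOND_METHOD[mode] - T_RFS_MAX[mode], 1)


def margin(mode):
    """Slack between the t RP value and the larger of the two estimates.

    Pessimistic assumption: every clock is treated as usable in every mode.
    """
    worst = max(T_RP_SECOND_METHOD[mode], max(T_RP_FIRST_METHOD.values()))
    return round(T_RP_LIMIT[mode] - worst, 1)


if __name__ == "__main__":
    for mode in (0, 1, 2):
        print("mode", mode, "fixed delay", fixed_delay(mode), "margin", margin(mode))
```

The implied fixed delay is the same 19.2 nanoseconds in all three modes, which is consistent with the formula: only the t RFS term changes with the mode.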

1.3.18 t IORDYZ, t ZIORDY, t ACK, t SS

The derivation of these values is left to interested parties.

2. Ultra DMA protocol topics

2.1 t SR, t RFS, and number of additional transfers

The statement that "If the recipient does not meet the [t SR] maximum value, then the burst may be paused with 0, 1 or 2 additional data transfers" does not imply that the sender is allowed to send up to two more strobes after it sees the negation of DMARDY. It must, in fact, never do this. Since t RFS is less than or equal to one transfer cycle time for all modes, sending two more strobes once DMARDY transitions at its end of the cable would always be a violation of t RFS in modes 0 and 1, and would always be a violation in mode 2. In many cases for modes 0 and 1, sending one more STROBE would be a violation of the t RFS timing. Under all conditions, t RFS must be met by the sender as required in the Ultra DMA specification: "The sender shall honor the recipient's negation of DMARDY within t RFS nanoseconds (by not sending any more strobes)."

While it would almost always be a violation for the sender to generate two more strobes once DMARDY is negated on its end of the cable, it is still possible for the recipient that is attempting to pause the burst to see two more strobes after it negates DMARDY without any violation of the protocol. This is due to the delay of the signals through the cable. Take, for instance, a theoretical case in mode 2 where the strobe time is the nominal 60 nanoseconds and signal delays add up to six nanoseconds:

[Timing diagram: STROBE @ sender, DMARDY @ sender, STROBE @ recipient, and DMARDY @ recipient, with intervals marked 5 ns, 6 ns, 60 ns, 49 ns, and 6 ns.]

In the case shown, both the STROBE signal from sender to recipient and DMARDY from recipient to sender experience a cable delay of six nanoseconds. While the recipient negates DMARDY after the instant that the sender toggles STROBE, the recipient does not see the STROBE transition until after the DMARDY negation. This would account for the first word received.
By the time the sender sees the DMARDY negation, there are only 49 nanoseconds until the next strobe, which is within t RFS, so the sender may choose to send the strobe without violating the protocol. To the recipient, this would be the second transfer after it negates DMARDY, but to the sender it would be the first and only allowable STROBE transition after seeing the DMARDY negation.

One can calculate that in the mode 2 corner cases where the cycle time is the minimum and the delays are maximized, any time that the recipient negates DMARDY longer than t SR after it receives a strobe edge, it may receive up to two more transfers. t SR is the only timing that is not required to be met; it defines the boundary between cases where it is possible for up to one word to be received after the negation of DMARDY (named a synchronous pause) and the cases where it is possible for up to two words to be received (named an asynchronous pause). By the same type of analysis used to show that up to two words may be received in cases similar to the one shown above, it can also be proven that when t SR timing is met by the recipient, it can only receive up to one more word without the sender being in violation of the protocol. A rigorous derivation as is done for most timings above is left to interested parties.

An IC designer must not forget the fact that this timing is measured at the connector and not inside the IC. Two more words may be received inside the IC after the clock edge that generates the negation of DMARDY without any part of the protocol or timing being violated. This is because there will be some output delay of DMARDY from inside the IC to the connector, and there will be input delay of a STROBE from the connector to inside the IC, even when t SR is met at the connector. The first word received inside the IC would be an edge that transitions at the connector before the negation of DMARDY gets there due to output delays. The second edge would be the single STROBE (at the connector) that is allowed in the t SR case.

Additionally, the recipient can never know exactly how many more words it will receive after negating DMARDY. Therefore, a design must never expect a fixed number of words after negating DMARDY. Every time a recipient begins a pause, it must be ready to accept zero more words, one more word, or two more words and handle whichever case happens without failure. To make the assumption that only one or two of the cases listed might occur and not design for the possibility that any of the three could occur during a pause would be a fatal design error.

In addition, it should be obvious that for the recipient to expect no more STROBES t RFS nanoseconds after it negates DMARDY would be an error. Instead the recipient must wait t RP before assuming that the sender is done sending STROBE transitions and has paused. The reception of two words after a pause has started should not be used by the recipient as an indication that the sender is fully paused. After two words the recipient may assume that no more words will be sent, but it must not act on that information (by terminating the burst, for example) until t RP after the pause was started. This is to allow the sender time to complete its process of going into a paused state, which may take additional system clocks after it has sent its last STROBE transition.
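The mode 2 cable-delay example given earlier in this section can be reproduced with a few lines of arithmetic. This is a simplified sketch under stated assumptions: time zero is the STROBE edge leaving the sender, the recipient negates DMARDY within the same strobe cycle, the t RFS default of 50 nanoseconds is the ATA/ATAPI-4 mode 2 value quoted in 1.3.16, and the function and key names are hypothetical.

```python
def pause_example(cycle_ns=60.0, cable_delay_ns=6.0, negate_at_ns=5.0,
                  t_rfs_ns=50.0):
    """Model one pause; all times are in ns from a STROBE edge at the sender.

    Assumes the DMARDY negation reaches the sender before the next scheduled
    STROBE edge (negate_at_ns + cable_delay_ns < cycle_ns).
    """
    negation_seen_by_sender = negate_at_ns + cable_delay_ns
    wait_for_next_strobe = cycle_ns - negation_seen_by_sender

    # The edge sent at t = 0 is still in the cable if it reaches the
    # recipient only after the recipient has negated DMARDY.
    in_flight = 1 if cable_delay_ns > negate_at_ns else 0

    # The sender may still send its next scheduled edge only if that edge
    # falls within t RFS of the negation arriving at the sender.
    allowed = 1 if 0 < wait_for_next_strobe <= t_rfs_ns else 0

    return {
        "negation_seen_by_sender_ns": negation_seen_by_sender,
        "wait_for_next_strobe_ns": wait_for_next_strobe,
        "strobes_sender_may_still_send": allowed,
        "edges_recipient_may_see": in_flight + allowed,
    }


if __name__ == "__main__":
    for key, value in pause_example().items():
        print(key, "=", value)
```

With the default values the sender sees the negation at 11 nanoseconds and has 49 nanoseconds until its next edge, so it may send one more STROBE while the recipient may observe two edges after its own negation. Because the sender "may or may not" send that edge, the recipient-side count is an upper bound, not a prediction.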
Based on what has been stated, it should be clear that it is impossible for an Ultra DMA recipient to stop a data transfer at an exact predetermined boundary. Even by meeting t RP timing, the recipient cannot avoid cases where the sender has the right to STROBE one additional word but may or may not. Please see the section on recipient pauses for additional implications of the t RFS timing.

2.2 Reasons for t SR

As described above, t SR is a timing which has no requirement to be met. Instead, it defines a boundary between different pause cases. t SR could have been omitted from the standard with the requirement only that the recipient should be ready for up to two more words for all recipient generated pauses. However, a design could be produced in such a way as to always meet t SR through synchronizing the outgoing DMARDY negation with the incoming STROBE signal from the sender. The design for a recipient would then be required to be ready for only up to one more transfer. Even though this kind of design adds complexity and provides little advantage, t SR was included for completeness. Other than for this unlikely architecture, t SR has no other design implications and for most designs should be ignored.

A system where DMARDY is negated asynchronously with respect to the incoming STROBE would not be in violation of the protocol and would be the preferred implementation. In this implementation, the negation of DMARDY for pauses would be controlled solely by the state of the FIFO. Once a near-full condition is sensed, DMARDY could be negated immediately. There is no advantage in regard to FIFO size in trying to meet t SR, because synchronizing the outgoing DMARDY signal with the incoming STROBE requires an additional STROBE to occur after a FIFO near-full condition is detected before DMARDY can be negated.
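The asynchronous approach can be pictured as a FIFO whose near-full mark leaves room for the two words that may still arrive after DMARDY is negated. The sketch below is a behavioral model only, not a hardware description: the class name, the method names, and the choice of placing the near-full mark exactly two words below the FIFO depth are assumptions made for illustration.

```python
from collections import deque


class RecipientFifo:
    """Behavioral model of a recipient FIFO with asynchronous DMARDY control."""

    # Up to two more words may arrive after DMARDY is negated.
    MAX_WORDS_AFTER_PAUSE = 2

    def __init__(self, depth):
        if depth <= self.MAX_WORDS_AFTER_PAUSE:
            raise ValueError("FIFO too shallow to absorb words after a pause")
        self.depth = depth
        self.words = deque()
        self.dmardy_asserted = True

    def near_full(self):
        """True once only the post-pause headroom is left."""
        return len(self.words) >= self.depth - self.MAX_WORDS_AFTER_PAUSE

    def on_strobe(self, word):
        """Latch one word; negate DMARDY as soon as near-full is sensed."""
        if len(self.words) >= self.depth:
            raise OverflowError("word received with no room left")
        self.words.append(word)
        if self.near_full():
            self.dmardy_asserted = False


if __name__ == "__main__":
    fifo = RecipientFifo(depth=8)
    for word in range(6):
        fifo.on_strobe(word)
    print("DMARDY asserted:", fifo.dmardy_asserted)
    # Zero, one, or two further words must all be handled without overflow.
    fifo.on_strobe(6)
    fifo.on_strobe(7)
    print("words held:", len(fifo.words))
```

The model deliberately ignores the internal input and output delays discussed in 2.1; a real design would size the headroom with those delays in mind.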
If the asynchronous method is selected as recommended, then the design shall always be ready for zero, one, or two more words after it negates DMARDY and must work under any of those three conditions.

2.3 Reason that t ZIORDY longer than t ENV is not a problem

t ZIORDY does not have a maximum limit while t ENV is bounded by both a minimum and maximum. In the initiation of a data in burst, this means that STOP may be negated and DMARDY asserted (transfer ready for the first STROBE edge) while the STROBE is in a high impedance state. For the initiation of a data out burst, the STOP may be negated (host ready to respond to a ready device) while DMARDY is in a high impedance state. In either case, the fact that the IORDY line is in a high impedance state at the active device does not pose a problem. IORDY in a high impedance state at the device will be seen as asserted at the host because the host is required to have a pull up on IORDY. PIO and DMA rely on this pull up to maintain an asserted level on IORDY, and these protocols only require this signal to be negated when a device is not ready. In the case of Ultra DMA, the specification requires IORDY to be driven only during a burst and never outside of one.

At the initiation of a data in burst the device may wait until the first STROBE edge is to be sent and simply change the signal from a high impedance state to driven low. If the device does not choose this implementation, it shall meet t ZIORDY, then drive the STROBE high before the first edge. At the first STROBE edge, the device would then change the signal from a driven high to a driven low state. In both cases the host sees a high to low transition for the first STROBE. The first STROBE of a burst shall never be a low to high transition.

At the initiation of a data out burst, the device may wait until a ready signal is required and simply change the DMARDY signal from a high impedance state to a driven low (asserted) state. If the device does not choose this implementation, it shall meet t ZIORDY, then negate DMARDY (drive it high) before the assertion of the signal is required, and then assert it at the proper time. As seen from the host, both implementations are the same since the driven high state will appear the same as the high impedance state.

2.4 Recipient pauses and implications for data handling and CRC calculation

The fact that the Ultra DMA/33 protocol gives the recipient the right to pause and then terminate a burst at any time, regardless of the state of STROBE or the data on the bus, can cause major problems if misunderstood or forgotten. A designer must take into account that, except for the first two words of a burst, there is never a guarantee that data put on the bus will be transferred.
Since a sender must stop toggling STROBE in less than one transfer cycle time when DMARDY negates at its input, it is practically impossible to avoid cases where data will be gated or latched to the bus but never strobed because it was sent before DMARDY was synchronized to the sender's clock. For example, one possible design implementation would be with a 33 MHz system clock and two flip flops to synchronize the DMARDY signal. The first flip flop would be on an active clock edge and the second on the normally unused clock edge. In this case t RFS is only long enough for the sender to synchronize the DMARDY signal and then immediately stop toggling STROBE. Any data placed on the bus but not yet strobed when DMARDY is internally synchronized shall not be strobed.

There is no minimum cycle time for DMARDY. The recipient does not have to wait for additional data words or for t RP from the time it negates DMARDY to the time it asserts DMARDY again when the recipient becomes ready. If, after negating DMARDY, the device becomes ready immediately, it may reassert DMARDY immediately. Based on the implementation of the sender's state machine, a negation and immediate assertion of DMARDY may cause a subsequent STROBE timing to be delayed. It is therefore recommended that a designer use some hysteresis in the FIFO trigger points for assertion and negation of DMARDY to avoid oscillation in the transfer (DMARDY being negated for every word or two).

The above information on recipient pauses has two major implications: the first is with output data handling, and the second is with CRC calculation. If an output register is used where data is transferred from memory to the register for presentation on the bus, a designer must make sure that no assumptions are made that the data has been or will be transferred.
If a pointer in memory is incremented or the data is cleared from memory when it is sent to the output register, then that data may be lost unless some recovery mechanism is implemented to decrement the pointer or restore the data if it is never strobed due to a permanent or temporary command termination after the pause. During the temporary suspension of a command, other bus activity (like a Status register read) may occur between when a burst is paused and its resumption. A design structure using an output register would have any data in that register overwritten during this other activity. Other structures may involve similar considerations. It is most important to remember that data on the bus is not sent and should not be treated as sent until there is a valid STROBE edge.

Besides careful handling of data to avoid the loss of a word, a designer must also pay attention to what data is used when calculating the CRC. The CRC notes in the Ultra DMA specification state that, "For each STROBE transition used for data transfer, both the host and device shall each calculate a new CRC value." Only words successfully transferred in the transfer phase of the burst are used to calculate CRC. This includes words legally transferred after a pause has been requested. Words put on the bus but never strobed shall NOT be used for CRC calculation. In addition, if STROBE is negated at the end of a pause and then the burst is terminated, the protocol requires STROBE to be returned to the asserted state after DMARQ is negated or STOP is asserted depending on the case (both conditions may be true when STROBE is returned to the asserted state). As stated in the Burst Termination notes 13 through 15, no data is transferred on this STROBE edge, and any data on the bus that was not strobed during the transfer phase of the burst must not be used in the CRC calculation on this return to asserted edge of STROBE.

2.5 CRC calculation and comparison

As stated in the section on recipient pauses and implications for data handling and CRC calculation, CRC must be calculated on successfully transferred data only. Data that is placed on the bus is not guaranteed to be sent. CRC must only be calculated on words that are properly strobed. For more detail on which words this includes and which it does not include, please read the section referred to above.

While CRC generation in its most basic form is a bit by bit serial shifting process, data on the AT bus is transferred one word at a time, making a serial implementation difficult. For Ultra DMA/33, short of having an internal clock with a period 16 times shorter than the minimum transfer cycle time, t CYC, a clock with a longer period and a parallel equivalent to the serial process must be used. Taking this into account, the Ultra DMA specification has included the equations which define the XOR manipulations to be made on each bit and the structure required to perform this calculation using a clock generated directly from the STROBE.
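The rule that a word counts as sent, and enters the CRC, only when it is strobed can be illustrated with a small behavioral model of a sender. This is a hedged sketch rather than the specification's structure: the seed 4ABAh is the value this guide cites from the specification, while the generator polynomial x^16 + x^12 + x^5 + 1 and the bit order (DD0 processed first) are assumptions stated from memory of ATA/ATAPI-4 and must be checked against the equations in the Ultra DMA specification, which remain authoritative. The class and function names are hypothetical, and the serial loop stands in for the parallel XOR network that hardware would use.

```python
CRC_SEED = 0x4ABA   # preset value cited in this guide
CRC_POLY = 0x1021   # assumed: x^16 + x^12 + x^5 + 1 (x^16 term implicit)


def crc_update(crc, word):
    """Fold one 16-bit word into the CRC, one bit at a time.

    Assumed bit order: DD0 first, each bit XORed with the current CRC bit 15.
    """
    for i in range(16):
        feedback = ((word >> i) & 1) ^ (crc >> 15)
        crc = (crc << 1) & 0xFFFF
        if feedback:
            crc ^= CRC_POLY
    return crc


class BurstSender:
    """Sender model: pointer and CRC advance only on a valid STROBE edge."""

    def __init__(self, words):
        self.words = list(words)
        self.next_index = 0
        self.crc = CRC_SEED

    def word_on_bus(self):
        """Word currently presented on the bus; presenting it changes nothing."""
        if self.next_index < len(self.words):
            return self.words[self.next_index]
        return None

    def strobe(self):
        """A valid STROBE edge: only now is the word sent and counted."""
        word = self.word_on_bus()
        if word is None:
            raise IndexError("no word left to strobe")
        self.crc = crc_update(self.crc, word)
        self.next_index += 1
        return word


if __name__ == "__main__":
    sender = BurstSender([0x1234, 0xABCD, 0x0F0F])
    sender.strobe()
    sender.strobe()
    # The third word is on the bus, but the burst is paused before its edge:
    # it stays unsent, stays out of the CRC, and is re-presented on resume.
    print("staged word:", hex(sender.word_on_bus()))
    print("words sent:", sender.next_index)
```

Because presenting a word has no side effects in this model, no recovery mechanism is needed if the burst is paused or terminated before the word is strobed.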
Through the given equations, the correct CRC can be calculated by using a small number of XOR gates, a single 16 bit latch, and a word clock (one clock per strobe edge). The equations define the value and order of each bit, and the order of each bit must be mapped directly to the same order lines of the bus. As defined in the Ultra DMA specification CRC note number 3, the CRC register must be pre set to 4ABAh. In the structure shown in the specification, this would involve pre setting the latch (CRCOUT) to 4ABAh before the first word clock occurs. After that, CRCIN15 to the latch is tied through to CRCOUT15. Once the burst is done, CRCOUT15 is the final CRC bit 15, which must be sent or received on line DD15 of the bus. This direct matching of bit order is true for all CRC bits.

The proper use of the data sent on the bus bits DD0 through DD15 during the burst transfer is defined in the equations. The data bit DD15 on the bus is no different than bit DD15 in the equations to calculate CRC. This direct mapping is true for all bits strobed on the bus during a burst.

Once the burst is terminated and the host sends the CRC data to the device (the host always sends the CRC independent of whether the burst was a data in or data out transfer), the device is required to compare the CRC sent from the host to the CRC it has calculated. While other CRC validation implementations may be possible, it is expected that a CRC input register be used on the device in combination with a digital comparator to verify that the CRC value in the input register matches the value in its own CRC calculation register.

2.6 IDENTIFY DEVICE command

The changes to the IDENTIFY DEVICE command are intended to define Ultra DMA to the driver in a way similar to PIO and Multiword DMA. There are, however, some major differences due to differences in the protocol. In the field validity word (word 53), bit 2 is used to indicate that the fields reported in word 88 are valid.
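A driver-side check of word 53 bit 2, followed by a decode of the word 88 mode bits that this section goes on to describe, might look like the sketch below. The function name and return shape are hypothetical. The bit positions within word 88 (bits 0 to 2 for supported modes, bits 8 to 10 for the selected mode) are not stated in this guide; they are an assumption based on ATA/ATAPI-4 and should be confirmed against the standard.

```python
WORD_53_ULTRA_DMA_VALID = 1 << 2   # word 53 bit 2: word 88 fields are valid


def ultra_dma_modes(identify_words):
    """Decode Ultra DMA information from IDENTIFY DEVICE data.

    identify_words is a sequence of 16-bit words indexed by word number.
    Returns None when word 53 bit 2 is clear (word 88 must not be trusted).
    """
    if not identify_words[53] & WORD_53_ULTRA_DMA_VALID:
        return None

    word88 = identify_words[88]
    # Assumed layout: bit n = mode n supported, bit 8 + n = mode n selected.
    supported = [mode for mode in range(3) if word88 & (1 << mode)]
    selected = [mode for mode in range(3) if word88 & (1 << (8 + mode))]
    return {
        "supported": supported,
        "selected": selected[0] if selected else None,
    }


if __name__ == "__main__":
    words = [0] * 256
    words[53] = 0x0004   # word 88 valid
    words[88] = 0x0407   # assumed: modes 0-2 supported, mode 2 selected
    print(ultra_dma_modes(words))
```

Returning None when the validity bit is clear keeps callers from acting on whatever happens to be in word 88 of a device that does not report Ultra DMA.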
Word 88 defines the Ultra DMA mode that the device is capable of and the mode that is currently set. The single bit in word 53 and the bits defined for word 88 are the only bits required for Ultra DMA in the IDENTIFY DEVICE information. For an Ultra DMA capable drive, word 53 bit 2 must always be set to one. This bit could, therefore, be used to determine if a device is Ultra DMA capable, but it does not indicate which modes are supported. The contents of Word 88 are required to determine which modes are supported. On the topic of which protocols are supported: there has been some discussion about what should be done with word 49 bit 8. First, ATA 3