High-Performance DDR2 SDRAM Interface Data Capture Using ISERDES and OSERDES Author: Maria George

Application Note: Virtex-4 FPGAs
XAPP721 (v2.2) July 29, 2009
High-Performance DDR2 SDRAM Interface Data Capture Using ISERDES and OSERDES
Author: Maria George

Summary

This application note describes a data capture technique for a high-performance DDR2 SDRAM interface. This technique uses the Input Serializer/Deserializer (ISERDES) and Output Serializer/Deserializer (OSERDES) features available in every Virtex-4 FPGA I/O.

Introduction

A DDR2 SDRAM interface is source-synchronous: the read data and read strobe are transmitted edge aligned. To capture this data using Virtex-4 FPGAs, either the strobe or the data can be delayed. In this design, the read data is captured in the delayed strobe domain and recaptured in the FPGA clock domain in the ISERDES. The ISERDES converts the received serial, double data rate (DDR) read data to 4-bit parallel data at the frequency of the interface. The 4-bit parallel data is at the interface frequency because the OCLK and CLKDIV inputs of the ISERDES in the memory mode are clocked by the same fast clock. The differential strobe is placed on a clock-capable I/O pair to access the BUFIO clock resource. The BUFIO clocking resource routes the delayed read DQS to its associated data ISERDES clock inputs.

The write data and strobe transmitted by the FPGA use the OSERDES during write transactions. The OSERDES converts 4-bit parallel data at half the frequency of the interface to DDR data at the interface frequency. The controller, datapath, user interface, and all other FPGA slice logic are clocked at half the frequency of the interface, resulting in improved design margin at frequencies of 267 MHz and above.

Clocking Scheme

Figure 1 shows the clocking scheme for this design, which includes one digital clock manager (DCM) and one phase-matched clock divider (PMCD). The controller is clocked at half the frequency of the interface using CLKdiv_0. Therefore, the address, bank address, and command signals (RAS_L, CAS_L, and WE_L) are asserted for two clock cycles of the fast memory interface clock (known as 2T timing). The control signals (CS_L, CKE, and ODT) are driven at twice the rate (DDR) of the half-frequency clock CLKdiv_0, ensuring that the control signals are asserted for just one clock cycle of the fast memory interface clock. The clock is forwarded to the external memory device using the Output Dual Data Rate (ODDR) flip-flops in the Virtex-4 FPGA I/O. This forwarded clock is 180 degrees out of phase with CLKfast_0.

[Figure 1: Clocking Scheme for the High-Performance Memory Interface Design. Labels: CLKfast input, system reset; DCM ports CLKIN, CLKFB, RST, CLK0, CLK90, CLKDV, LOCKED; PMCD ports CLKA, CLKB, CLKC, RST, REL, CLKA1, CLKA1D2, CLKB1, CLKC1; output clocks CLKfast_0, CLKdiv_0, CLKdiv_90.]

Figure 2 shows the command and control timing diagram.

© 2005-2009 Xilinx, Inc. XILINX, the Xilinx logo, Virtex, Spartan, ISE, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. All other trademarks are the property of their respective owners.

[Figure 2: Command and Control Timing. Waveforms: CLKdiv_0, CLKfast_0, memory device clock, command (WRITE, IDLE), control (CS_L).]

Write Datapath

The write datapath uses the built-in OSERDES available in every Virtex-4 FPGA I/O. The OSERDES transmits the data (DQ) and strobe (DQS) signals. The memory specification requires DQS to be transmitted center aligned with DQ. The strobe (DQS) forwarded to the memory is 180 degrees out of phase with CLKfast_0. Therefore, the OSERDES that transmits the write data must be clocked by a fast clock on its CLK input and by CLKdiv_90 on its CLKDIV input, as shown in Figure 3.

[Figure 3: Write Data Transmitted Using OSERDES. Write data words 0-3 drive the OSERDES inputs D1, D2, D3, and D4; the OSERDES has CLK and CLKDIV inputs, with CLKdiv_90 on CLKDIV; the OSERDES output drives DQ through the IOB.]

Figure 4 shows the timing diagram for write DQS and DQ signals.

[Figure 4: Write Strobe (DQS) and Data (DQ) Timing for a Write Latency of Four. Waveforms: CLKdiv_0, CLKfast_0, clock forwarded to memory device, command (WRITE, IDLE), control (CS_L), strobe (DQS), data (DQ) at the OSERDES output (D0, D1, D2, D3).]

Write Timing Analysis

Table 1 shows the write timing analysis for an interface at 300 MHz (600 Mb/s).

Table 1: Write Timing Analysis at 300 MHz

Uncertainty Parameter          Value (ps)  Before DQS (ps)  After DQS (ps)  Meaning
T_CLOCK                        3,333       -                -               Clock period.
T_MEMORY_DLL_DUTY_CYCLE_DIST   150         150              150             DCM duty-cycle distortion.
T_DATA_PERIOD                  1,666       -                -               Data period is half the clock period with duty-cycle distortion subtracted from it.
T_SETUP                        300         300              0               Specified by memory vendor.
T_HOLD                         300         0                300             Specified by memory vendor.
T_PACKAGE_SKEW                 20          20               20              PCB trace delays for DQS and its associated DQ bits are adjusted to account for package skew. The listed value represents dielectric constant variations.
T_JITTER                       0           0                0               Same DCM used to generate DQS and DQ.
T_CLOCK_SKEW-MAX               100         100              100             Clock skew between DQ bits within a byte.
T_PMCD_CLK_SKEW                150         150              150             Phase offset error between different clock outputs of the same PMCD.
T_PCB_LAYOUT_SKEW              50          50               50              Skew between data lines and the associated strobe on the board.
Total Uncertainties            -           770              770
Start and End of Valid Window  -           770              896
Final Window                   126         -                -               Final window equals 896 - 770.

Notes:
1. Skew between output flip-flops and output buffers in the same bank is considered to be minimal over voltage and temperature.
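The arithmetic behind Table 1 can be reproduced with a short script. This is a minimal sketch that uses only the picosecond values listed in the table; the function and dictionary names are illustrative and do not come from the reference design.

```python
# Values in picoseconds, copied from Table 1 (300 MHz interface).
DATA_PERIOD_PS = 1666

# Each entry is (uncertainty before DQS, uncertainty after DQS).
# The key names are illustrative labels for the Table 1 rows.
UNCERTAINTIES_PS = {
    "memory_dll_duty_cycle_dist": (150, 150),
    "setup": (300, 0),
    "hold": (0, 300),
    "package_skew": (20, 20),
    "jitter": (0, 0),
    "clock_skew_max": (100, 100),
    "pmcd_clk_skew": (150, 150),
    "pcb_layout_skew": (50, 50),
}


def write_window(data_period_ps, uncertainties):
    """Return (before, after, window_start, window_end, final_window) in ps."""
    before = sum(b for b, _ in uncertainties.values())
    after = sum(a for _, a in uncertainties.values())
    window_start = before                  # data is not yet stable before this point
    window_end = data_period_ps - after    # data may change after this point
    return before, after, window_start, window_end, window_end - window_start


if __name__ == "__main__":
    print(write_window(DATA_PERIOD_PS, UNCERTAINTIES_PS))
```

Changing a single entry, for example the PCB layout skew, shows how much of the 126 ps final window each contributor consumes.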

Controller to Write Datapath Interface

Table 2 lists the signals required from the controller to the write datapath.

Table 2: Controller to Write Datapath Signals

ctrl_wren (width 1)
  Description: Output from the controller to the write datapath. Write DQS and DQ generation begins when this signal is asserted.
  Notes: Asserted for two CLKdiv_0 cycles for a burst length of 4 and three CLKdiv_0 cycles for a burst length of 8. Asserted one CLKdiv_0 cycle earlier than the WRITE command for CAS latency values of 4 and 5. Figure 5 and Figure 6 show the timing relationship of this signal with respect to the WRITE command.

ctrl_wr_disable (width 1)
  Description: Output from the controller to the write datapath. Write DQS and DQ generation ends when this signal is deasserted.
  Notes: Asserted for one CLKdiv_0 cycle for a burst length of 4 and two CLKdiv_0 cycles for a burst length of 8. Asserted one CLKdiv_0 cycle earlier than the WRITE command for CAS latency values of 4 and 5. Figure 5 and Figure 6 show the timing relationship of this signal with respect to the WRITE command.

ctrl_odd_latency (width 1)
  Description: Output from the controller to the write datapath.
  Notes: Asserted when the selected CAS latency is an odd number (such as 5). Required for generation of write DQS and DQ after the correct write latency (the number of clock cycles after a write command is issued). (Write latency = CAS latency - 1.)
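The rules in Table 2 reduce to a few lines of arithmetic. The sketch below is an illustration only: the `WriteControl` type and `write_control` function are hypothetical names, and the model covers just the relationships stated in the table (write latency, odd-latency flag, and assertion lengths in CLKdiv_0 cycles), with additive latency left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteControl:
    write_latency: int           # in fast memory clock cycles
    ctrl_odd_latency: bool       # asserted when the CAS latency is odd
    ctrl_wren_cycles: int        # assertion length in CLKdiv_0 cycles
    ctrl_wr_disable_cycles: int  # assertion length in CLKdiv_0 cycles


def write_control(cas_latency: int, burst_length: int) -> WriteControl:
    """Derive the Table 2 quantities for a given CAS latency and burst length."""
    if burst_length not in (4, 8):
        raise ValueError("Table 2 only covers burst lengths of 4 and 8")
    return WriteControl(
        write_latency=cas_latency - 1,
        ctrl_odd_latency=(cas_latency % 2 == 1),
        ctrl_wren_cycles=2 if burst_length == 4 else 3,
        ctrl_wr_disable_cycles=1 if burst_length == 4 else 2,
    )


if __name__ == "__main__":
    # A CAS latency of 5 gives the write latency of 4 used in Figures 4 to 6.
    print(write_control(cas_latency=5, burst_length=4))
```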

[Figure 5: Write DQ Generation for a Write Latency of 4 and a Burst Length of 4. Waveforms: CLKdiv_0, clock forwarded to memory device, CLKdiv_90, command (WRITE, IDLE), control (CS_L), ctrl_wren, ctrl_wr_disable, user interface data FIFO output (D0, D1, D2, D3), strobe (DQS), data (DQ) at the OSERDES output (D0, D1, D2, D3). OSERDES inputs D1, D2, D3, D4 on successive cycles: X, X, D0, D1 then D2, D3, X, X. OSERDES inputs T1, T2, T3, T4 on successive cycles: 1, 1, 0, 0 then 0, 0, 1, 1.]

[Figure 6: Write DQS Generation for a Write Latency of 4 and a Burst Length of 4. Waveforms: CLKdiv_0, CLKfast_0, clock forwarded to memory device, CLKdiv_180, command (WRITE, IDLE), control (CS_L), ctrl_wren, ctrl_wr_disable, strobe (DQS) at the OSERDES output. OSERDES inputs D1, D2, D3, D4 on successive cycles: 0, 0, 0, 0 then 0, 1, 0, 1 then 0, 0, 0, 0. OSERDES inputs T1, T2, T3, T4 on successive cycles: 1, 1, 1, 0 then 0, 0, 0, 0 then 0, 1, 1, 1.]
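The OSERDES input sequences listed for Figures 5 and 6 can be replayed with a small behavioral model to see the serial DQ and DQS streams they produce. This is a sketch under two assumptions that are not stated in this note: the D1/T1 inputs are transmitted first, and a T value of 1 puts the output in the high-impedance state. Pipeline latency inside the OSERDES is ignored, and `serialize_4to1` is a hypothetical helper, not part of the reference design.

```python
from typing import List, Optional, Sequence, Tuple, Union

Bit = Union[int, str]


def serialize_4to1(
    d_words: Sequence[Tuple[Bit, Bit, Bit, Bit]],
    t_words: Sequence[Tuple[int, int, int, int]],
) -> List[Optional[Bit]]:
    """Flatten 4-bit parallel words into a serial stream, D1 first.

    A T bit of 1 is modeled as a high-impedance output, shown as None.
    """
    stream: List[Optional[Bit]] = []
    for d_word, t_word in zip(d_words, t_words):
        for bit, tristate in zip(d_word, t_word):
            stream.append(None if tristate else bit)
    return stream


if __name__ == "__main__":
    # Figure 5: DQ for a burst length of 4.
    dq = serialize_4to1(
        [("X", "X", "D0", "D1"), ("D2", "D3", "X", "X")],
        [(1, 1, 0, 0), (0, 0, 1, 1)],
    )
    print(dq)

    # Figure 6: DQS for a burst length of 4.
    dqs = serialize_4to1(
        [(0, 0, 0, 0), (0, 1, 0, 1), (0, 0, 0, 0)],
        [(1, 1, 1, 0), (0, 0, 0, 0), (0, 1, 1, 1)],
    )
    print(dqs)
```

In this model the DQ stream carries D0 to D3 back to back with the pin released on either side, and the DQS stream is driven low for one bit time before and after the toggling portion.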

Read Datapath

The read datapath comprises the read data capture and recapture stages. Both stages are implemented in the built-in ISERDES available in every Virtex-4 I/O. In the memory mode, ISERDES has three clock inputs: CLK, OCLK, and CLKDIV.

For the earlier version of this design (MIG1.6), these three clock inputs were provided as follows:

CLK: Read DQS routed on the BUFIO was provided as the CLK input of the ISERDES.
OCLK: The fast clock was provided as the OCLK input of the ISERDES.
CLKDIV: The CLKDIV input of the ISERDES was provided as a selection between CLKdiv_90 or its inverted version from a BUFGMUX. The BUFGMUX enabled selection of either the rising or falling edge of the divided clock during calibration, based on the number of IDELAY taps required. The CLKDIV edge that yielded the lower tap count was selected.

Also, for the earlier version of this design, the total number of taps required for data in the worst case was three-quarters of a fast clock period. This scheme required one additional DCM to invert the divided clock because the PMCD cannot invert clocks. The result of this clocking scheme was additional jitter on the CLKDIV input of the ISERDES compared to the OCLK input.

In the latest version of this design (MIG1.7), to avoid using the additional DCM and to reduce clock jitter, the divided clock is not input to the ISERDES. The OCLK and CLKDIV inputs of the ISERDES are clocked by the fast clock, which has the same frequency as the interface. In the worst case, the total number of IDELAY taps required to align read strobe (DQS) and read data (DQ) to the rising edge of the FPGA clock remains three-quarters of a fast clock period. The advantages of this design are the resource savings (one DCM and one BUFGMUX) and lower-jitter clocks.

For the latest version of this design, the clock inputs are as follows:

CLK: The read DQS routed using BUFIO provides the CLK input of the ISERDES as shown in Figure 7.
OCLK: The OCLK input of ISERDES is connected to the CLK input of OSERDES in hardware. In this design, the same fast clock is provided to the ISERDES OCLK input and the OSERDES CLK input. The clock phase used for OCLK is dictated by the phase required for write data.
CLKDIV: The CLKDIV input is also driven by the same fast clock.

[Figure 7: Read Data Capture Using ISERDES. DQ passes through an IDELAY (delay value determined during calibration) into the ISERDES; DQS passes through an IDELAY and the BUFIO to the ISERDES clock input. ISERDES outputs to the user interface FIFOs: Q1 (read data word 3), Q2 (read data word 2), Q3 (read data word 1), Q4 (read data word 0). Clock labels: CLK, OCLK, CLKDIV, CLKdiv_180.]

Read Timing Analysis

To capture read data without errors in the ISERDES, read data and strobe must be delayed to meet the setup and hold times of the flip-flops in the FPGA clock domain. Read data (DQ) and strobe (DQS) are received edge aligned at the FPGA. The differential DQS pair must be placed on a clock-capable I/O pair in order to access the BUFIO resource. The received read DQS is then routed through the BUFIO resource to the CLK input of the ISERDES of the associated data bits. The delay through the BUFIO and clock routing resources shifts the DQS to the right with respect to data. The total delay through the BUFIO and clock resource is 595 ps in a -11 speed grade device and 555 ps in a -12 speed grade device.

Table 3 lists the read timing analysis that is required to determine the data margin at 300 MHz.

Table 3: Read Timing Analysis at 300 MHz

Parameter            Value (ps)  Meaning
T_CLOCK              3,333       Clock period.
T_PHASE              1,667       Data period for DDR data.
T_SAMP_BUFIO         350         Sample window from the Virtex-4 FPGA data sheet for a -12 device. It includes setup and hold for an IOB FF, clock jitter, and 150 ps of tap uncertainty.
T_BUFIO_DCD          100         BUFIO clock resource duty-cycle distortion.
T_DQSQ + T_QHS       580         Worst-case memory uncertainties that include VT variations and skew between DQS and its associated DQs.
IDELAY Tap Jitter    348         Total tap jitter when using 29 taps. The worst-case jitter through each tap is 12 ps.
Total Uncertainties  1,378
Window               289         Worst-case window.

Notes:
1. T_SAMP_BUFIO is the sampling error over VT for a DDR input register in the IOB when using the BUFIO clocking resource and the IDELAY.
2. All the parameters listed are uncertainties to be considered when using the per bit calibration technique.
3. Parameters such as BUFIO skew, package_skew, pcb_layout_skew, and part of T_DQSQ and T_QHS are calibrated out with the per bit calibration technique. Inter-symbol interference, crosstalk, and contributors to dynamic skew are not considered in this analysis.

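The read margin in Table 3 follows from subtracting the listed uncertainties from the DDR data period. The sketch below repeats that calculation with the table's values; the function name and dictionary keys are illustrative only.

```python
# Values in picoseconds, copied from Table 3 (300 MHz interface).
T_PHASE_PS = 1667            # data period for DDR data
TAPS_USED = 29               # IDELAY taps assumed in the table
JITTER_PER_TAP_PS = 12       # worst-case jitter through each tap

FIXED_UNCERTAINTIES_PS = {
    "T_SAMP_BUFIO": 350,
    "T_BUFIO_DCD": 100,
    "T_DQSQ + T_QHS": 580,
}


def read_window(t_phase_ps, fixed, taps, jitter_per_tap_ps):
    """Return (tap_jitter, total_uncertainty, worst_case_window) in ps."""
    tap_jitter = taps * jitter_per_tap_ps
    total = sum(fixed.values()) + tap_jitter
    return tap_jitter, total, t_phase_ps - total


if __name__ == "__main__":
    print(read_window(T_PHASE_PS, FIXED_UNCERTAINTIES_PS, TAPS_USED, JITTER_PER_TAP_PS))
```

Because the tap jitter scales with the number of taps in use, a calibration result that needs fewer taps leaves a wider window than the worst case shown in the table.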
Per Bit Deskew Data Capture Technique

To ensure reliable data capture in the OCLK and CLKDIV domains in the ISERDES, a training sequence is required after memory initialization. The controller issues a WRITE command to write the following known data pattern: first rising data = FF, first falling data = 00, second rising data = AA, second falling data = 55. The controller then issues back-to-back read commands to read back the written data from this location. The DQ bus ISERDES outputs Q1, Q2, Q3, and Q4 are then compared with the known data pattern. The DQS is delayed more than DQ because of the propagation delay through the BUFIO and the clock resource. The DQS is delayed by two additional taps to push it further into the DQ valid window. The flow diagram of the calibration algorithm is shown in Figure 8.
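Because calibration is done per bit, it helps to see what the training pattern looks like on a single DQ line. The sketch below derives the expected four-beat sequence for one bit of a byte lane. It assumes the pattern is applied to each 8-bit byte lane and that the caller has already put the captured ISERDES outputs into time order; the function names are hypothetical.

```python
from typing import Sequence, Tuple

# Training pattern in time order: first rising, first falling,
# second rising, second falling (one byte lane).
TRAINING_PATTERN = (0xFF, 0x00, 0xAA, 0x55)


def expected_bits(dq_bit: int) -> Tuple[int, ...]:
    """Expected values on one DQ line (0 to 7) for the four beats."""
    if not 0 <= dq_bit <= 7:
        raise ValueError("dq_bit must be in the range 0 to 7")
    return tuple((word >> dq_bit) & 1 for word in TRAINING_PATTERN)


def pattern_matches(dq_bit: int, captured: Sequence[int]) -> bool:
    """Compare four captured beats, given in time order, with the pattern."""
    return tuple(captured) == expected_bits(dq_bit)


if __name__ == "__main__":
    for bit in range(8):
        print(bit, expected_bits(bit))
```

Even-numbered bits see 1, 0, 0, 1 and odd-numbered bits see 1, 0, 1, 0, so a capture that is shifted by one beat fails the comparison on every line.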

[Figure 8: Read Data and Strobe Delay Calibration Flow. Flowchart boxes and decisions, in reading order:
- ctrl_dummyread_start = 1.
- Delay DQS by 2 taps.
- Increment tap for DQS and DQ; decision: valid data pattern?
- Decision: valid data pattern within 11 taps? If not, invert clk_en to check for valid data on the adjacent clock cycle.
- Decision: valid data pattern for > 10 taps?
- Increment tap for DQS and DQ until an error in the data pattern detects the end of the data valid window.
- Decrement DQS and DQ taps by 17 or 10 taps (17 taps if the valid window is > 17 taps).
- Deskew each DQ bit (per bit deskew).
- Read FIFOs write-enable calibration.
- Completion flags: dqs_calib_done_out = 1, dp_dqs_dq_calib_done = 1, dp_dly_slct_done = 1.]
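The window search at the heart of Figure 8 can be illustrated with a simplified model. This is not the MIG calibration logic: it leaves out the initial two-tap DQS delay, the clk_en inversion, and the per bit deskew, and it assumes that the 17-tap or 10-tap decrement is applied from the tap at which the first error after the valid window is seen. The 64-tap limit and the `choose_tap` and `pattern_ok` names are also assumptions made for the example.

```python
from typing import Callable, Optional

IDELAY_TAPS = 64  # assumption: number of taps available to the sweep


def choose_tap(
    pattern_ok: Callable[[int], bool],
    max_taps: int = IDELAY_TAPS,
) -> Optional[int]:
    """Sweep a common DQS/DQ tap count and pick a tap inside the valid window.

    pattern_ok(tap) stands in for "the training pattern read back correctly
    at this tap setting". Returns None when no usable window is found.
    """
    tap = 0
    # Increment until the training pattern is first read correctly.
    while tap < max_taps and not pattern_ok(tap):
        tap += 1
    first_valid = tap
    # Keep incrementing until an error marks the end of the valid window.
    while tap < max_taps and pattern_ok(tap):
        tap += 1
    if tap == max_taps:
        return None  # no valid window, or its end was never observed
    window = tap - first_valid
    if window <= 10:
        return None  # the flow asks for a valid pattern for more than 10 taps
    back_off = 17 if window > 17 else 10
    return tap - back_off


if __name__ == "__main__":
    # Hypothetical hardware response: pattern valid for taps 5 through 24.
    print(choose_tap(lambda tap: 5 <= tap < 25))
```

With a 20-tap window the model backs off 17 taps from the first failing tap, and with a 12-tap window it backs off 10, which matches the two cases named in the flowchart.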

Figure 9 shows the read timing waveform for a burst length of 8. The read data, DQ, is first captured in the DQS domain and then transferred to the FPGA fast clock domain. The waveform shows a case where the DQS and DQ are aligned correctly to the FPGA clock domain, and the correct data sequence is available at the output of the ISERDES. For a burst length of 8, valid data is available every alternate clock cycle. The lower end of the frequency range for this design is limited by the number of available taps in the IDELAY block, the PCB trace delay, and the CAS latency of the memory device.

[Figure 9: Read Data and Strobe Capture Timing for Burst Length of 8. Waveforms: DQS at the FPGA; DQ at the FPGA (D0 to D7); DQS at the ISERDES, delayed by the BUFIO and clocking resource; DQ delayed by the calibration delay (D0 to D7); DQ captured in the DQS domain (D0, D2, D4, D6 and D1, D3, D5, D7); the same data in the FPGA fast clock domain; ISERDES outputs Q4, Q3, Q2, and Q1. The clk_en polarity is determined during calibration.]

Controller to Read Datapath Interface

Table 4 lists the control signals between the controller and the read datapath.

Table 4: Signals between Controller and Read Datapath

ctrl_dummyread_start (width 1)
  Description: Output from the controller to the read datapath. When this signal is asserted, the strobe and data calibration begin.
  Notes: This signal must be asserted when valid read data is available on the data bus. This signal is deasserted when the dp_dly_slct_done signal is asserted.

dp_dly_slct_done (width 1)
  Description: Output from the read datapath to the controller indicating the strobe and data calibration are complete.
  Notes: This signal is asserted when the data and strobe have been calibrated. Normal operation begins after this signal is asserted.

ctrl_den_div0 (width 1)
  Description: Output from the controller to the read datapath used as the write enable to the read data capture FIFOs.
  Notes: This signal is asserted for one CLKdiv_0 clock cycle for a burst length of 4 and two clock cycles for a burst length of 8. The CAS latency and additive latency values determine the timing relationship of this signal with the read state. Figure 10 shows the timing waveform for this signal with a CAS latency of 5 and an additive latency of 0 for a burst length of 4.

[Figure 10: Write-Enable Timing for CAS Latency of 5 and Burst Length of 4. Waveforms: CLKdiv_0; CK at the memory; command (READ); DQ at the memory device (D0 to D3); CS# at the memory; DQS at the memory device; DQS at the ISERDES CLK input (round trip + BUFIO + calibration delays); DQ at the ISERDES input (round trip + calibration delays); ctrl_den_div0 (input to the SRL16); srl_out (SRL16 output); parallel data D0 to D3 at the ISERDES output; Ctrl_dEn (write enable to the read data FIFOs).]

The ctrl_den signal is required to validate read data because DDR2 SDRAM devices do not provide a read-valid or read-enable signal along with read data. The controller generates this read-enable signal based on the CAS latency and the burst length. The read-enable signal is input to an SRL16 (LUT-based shift register). The number of register stages required to align the read-enable signal to the ISERDES read data output is determined during calibration. One read-enable signal is generated for each data byte. Figure 11 shows the read-enable logic block diagram.

[Figure 11: Read Data FIFO Write-Enable Logic. Labels: ctrl_den_div0, ctrl_den_dir_r, ctrl_den_dir_r1, FD, FD, SRL16, srl_out, FD, Ctrl_dEn. The number of register stages is selected during calibration.]

Reference Design

Figure 12 shows the hierarchy of the reference design. The mem_interface_top is the top-level module. The reference design for the DDR2 SDRAM interface is integrated with the MIG tool. This tool has been integrated with the Xilinx CORE Generator software. For the latest version of the design, download the IP update on the Xilinx website at: http://www.xilinx.com/xlnx/xil_sw_updates_home.jsp.

[Figure 12: Reference Design Hierarchy. Modules shown: mem_interface_top, infrastructure, idelay_ctrl, main, top, test_bench, iobs, user_interface, data_path, ddr2_controller, backend_rom, cmp_rd_data, infrastr_iobs, controller_iobs, datapath_iobs, backend_fifos, rd_data, data_write, tap_logic, addr_gen, data_gen_16, idelay_rd_en_io, v4_dm_iob, v4_dqs_iob, v4_dq_iob, rd_wr_addr_fifo, wr_data_fifo_16, rd_data_fifo, tap_ctrl, data_tap_inc, RAM_D.]
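The read-enable alignment of Figure 11, in which the read-enable passes through a shift register whose depth is chosen during calibration, can be pictured with a small behavioral model. This is a sketch only: `ShiftRegisterDelay` and its `stages` parameter are hypothetical names, the model delays its input by exactly `stages` clocks, and it does not reproduce the addressing of the SRL16 primitive or the surrounding FD registers.

```python
from collections import deque


class ShiftRegisterDelay:
    """Delay a one-bit signal by a selectable number of clock cycles (1 to 16)."""

    def __init__(self, stages: int) -> None:
        if not 1 <= stages <= 16:
            raise ValueError("stages must be between 1 and 16")
        # Index 0 holds the newest bit; the last element holds the oldest.
        self._bits = deque([0] * stages, maxlen=stages)

    def clock(self, d: int) -> int:
        """Apply one clock edge: return the oldest bit and shift in d."""
        q = self._bits[-1]
        self._bits.appendleft(d)  # maxlen drops the oldest bit automatically
        return q


if __name__ == "__main__":
    # A one-cycle read-enable pulse delayed by three stages.
    delay = ShiftRegisterDelay(stages=3)
    print([delay.clock(bit) for bit in [1, 0, 0, 0, 0]])
```

In such a model, calibration amounts to trying different `stages` values until the delayed enable lines up with the cycle in which the ISERDES presents valid read data.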

Reference Design Summary

Table 5 lists the maximum frequency by speed grade for a 72-bit interface.

Table 5: Maximum Frequency by Speed Grade for a 72-Bit Interface

Speed Grade  Maximum Frequency (MHz)
-10          230
-11          267
-12          300

Table 6 lists the reference design summary for a 72-bit interface.

Table 6: Reference Design Summary for a 72-Bit Interface

Device Utilization  Notes
6,714 slices        Includes the controller, synthesizable testbench, the user interface, and the physical layer.
6 BUFGs             Includes one BUFG for the 200 MHz reference clock for the IDELAY block.
9 BUFIOs            Equals the number of strobes in the interface.
1 DCM
1 PMCD
72 ISERDES          Equals the number of data bits in the interface.
99 OSERDES          Equals the sum of the data bits, strobes, and data mask bits.

Conclusion

This application note explains a technique for using ISERDES to capture data for high-performance memory interfaces. This design provides a high margin because the logic in the FPGA fabric (excluding the calibration logic) is clocked at half the frequency of the interface, eliminating critical paths.

Revision History

The following table shows the revision history for this document.

Date      Version  Revision
12/15/05  1.0      Initial Xilinx release.
12/20/05  1.1      Updated Table 1.
01/04/06  1.2      Updated link to reference design file.
02/02/06  1.3      Updated Table 4.
05/25/06  1.4      Updated Clocking Scheme, Read Datapath, and Per Bit Deskew Data Capture Technique sections, Figure 1, Figure 7, Table 3, and Table 6. Also updated the link to the reference design file.
03/12/07  2.0      Revised Summary. Revised Introduction. Revised Clocking Scheme text and Figure 1. Revised Write Timing Analysis text and Table 1. Revised Table 2. Revised Read Datapath text and Figure 7. Revised Read Timing Analysis and Table 3. Revised Per Bit Deskew Data Capture Technique text and Figure 8. Added new Figure 9 and explanatory text. Renumbered remaining figures. Old Figure 9 replaced with new figure, Figure 10. Old Figure 10 replaced with new figure, Figure 11. Old Figure 11 renumbered to Figure 12. Retitled old section "Reference Design Utilization" to Reference Design Summary. Retitled old Table 6 from "Resource Utilization for a 64-Bit Interface" to Reference Design Summary for a 72-Bit Interface. Revised text in Table 6. Revised Conclusion.
10/12/07  2.1      Figure 6: Corrected clock phase relationship between CLKdiv_0 and CLKdiv_180.
07/29/09  2.2      Revised headings in Table 1 to include picoseconds (ps) unit of measure in columns 2, 3, and 4.