Memory Interfaces Data Capture Using Direct Clocking Technique Author: Maria George


Application Note: Virtex-4 Family
XAPP701 (v1.3) September 13, 2005
Memory Interfaces Data Capture Using Direct Clocking Technique
Author: Maria George

Summary

This application note describes the direct-clocking data capture technique for memory interfaces in a Virtex-4 device. The direct-clocking scheme uses architectural features unique to the Virtex-4 family, such as the 64-tap absolute delay line provided in each I/O block (IOB).

Introduction

Most memory interfaces are source-synchronous interfaces in which the data and the clock/strobe transmitted from the external memory device are edge aligned. To capture this data in the Virtex-4 device, either the clock/strobe or the data is delayed. In the direct-clocking technique, the data is delayed so that it is center aligned with respect to the internal FPGA clock, and the internal FPGA clock captures the data. The clock/strobe transmitted from the memory is used to determine the delay value for the associated data bits. As a result, there are no restrictions on the number of data bits associated with a strobe. Because the strobe does not need to be distributed to the associated data bits, no additional clocking resources are required.

The Virtex-4 resource used by the clock/strobe and the data bits is a 64-tap absolute delay line, which can be implemented using the IDELAY and IDELAYCTRL primitives. Both the clock/strobe and the data bits are routed through the 64-tap absolute delay line. Although the strobe is not used to capture data, it is used to determine the number of taps required to center the data with respect to the internal FPGA clock. The design and implementation details of the direct-clocking scheme are explained in the following sections.
Strobe Edge Detection

The delay value for the data bits associated with a clock/strobe is the phase difference between the rising edge of the internal FPGA clock and the center of the clock/strobe pulse. This assumes that the clock/strobe and the data are edge aligned. To determine this phase difference, the clock/strobe is input through the 64-tap absolute delay line in the IOB and is sampled at incremental tap outputs using the internal FPGA clock. At least two edges (transitions) of the clock/strobe have to be detected to determine the center of the clock/strobe pulse. The difference between the number of taps required to detect the second transition (second-edge taps) and the number of taps required to detect the first transition (first-edge taps) is the clock/strobe pulse width. Half of this difference is the pulse center (pulse-center taps). The number of taps required from the rising edge of the internal FPGA clock to the center of the clock/strobe pulse is the sum of the first-edge taps and the pulse-center taps.

© 2004-2005 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and further disclaimers are as listed at http://www.xilinx.com/legal.htm. All other trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice.

NOTICE OF DISCLAIMER: Xilinx is providing this design, code, or information "as is." By providing the design, code, or information as one possible implementation of this feature, application, or standard, Xilinx makes no representation that this implementation is free from any claims of infringement. You are responsible for obtaining any rights you may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of the implementation, including but not limited to any warranties or representations that this implementation is free from claims of infringement and any implied warranties of merchantability or fitness for a particular purpose.

Table 1 describes the different types of taps.

Table 1: Tap Descriptions

| Taps | Description |
|---|---|
| First-edge taps | Number of taps required to detect first transition of clock/strobe |
| Second-edge taps | Number of taps required to detect second transition of clock/strobe |
| Second-edge taps - First-edge taps | Pulse width of clock/strobe |
| Pulse-center taps | Pulse width of clock/strobe divided by two |
| First-edge taps + Pulse-center taps | Number of taps required to center data with internal FPGA clock (data-delay taps) |

Figure 1 illustrates two scenarios of centering data with respect to the internal FPGA clock by delaying it by the data-delay taps value. Case 1 shows the falling edge of the clock/strobe as the first edge detected, which results in the delayed data being centered on the rising edge of the internal FPGA clock. Case 2 shows the rising edge of the clock/strobe as the first edge detected, which results in the delayed data being centered on the falling edge of the internal FPGA clock.

Figure 1: Case 1 and Case 2 - Clock/Strobe Center to Internal FPGA Clock Phase Detection (waveforms for each case mark the first and second detected edges, the first-edge taps, the second-edge taps, the data-delay taps, and Dummy_rd_en)
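The tap arithmetic in Table 1 can be written out as a short Python sketch. The function name and the use of integer division for the divide-by-two step are assumptions made for illustration; the application note does not say how an odd pulse width is rounded.

```python
def data_delay_taps(first_edge_taps: int, second_edge_taps: int) -> int:
    """Return the data-delay taps from the two recorded edge positions.

    Follows Table 1: pulse width = second-edge taps - first-edge taps,
    pulse center = pulse width / 2, data delay = first edge + pulse center.
    Rounding down on an odd pulse width is an assumption of this sketch.
    """
    if second_edge_taps <= first_edge_taps:
        raise ValueError("second edge must be detected after the first edge")
    pulse_width_taps = second_edge_taps - first_edge_taps
    pulse_center_taps = pulse_width_taps // 2
    return first_edge_taps + pulse_center_taps


# Illustrative numbers only: edges found at taps 10 and 34.
print(data_delay_taps(10, 34))
```

With the illustrative inputs above, the pulse width is 24 taps, the pulse center is 12 taps, and the data is delayed by 22 taps.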

Strobe Edge Detection Implementation

The delay value determination circuit is straightforward to implement in a Virtex-4 device because of the dedicated IDELAY and IDELAYCTRL circuits. The block diagram of the delay value determination scheme is shown in Figure 2.

Figure 2: Strobe Edge Detection (block diagram: DQS enters the IOB through the IDELAY 64-tap absolute delay line and is registered in a flip-flop; the edge detection and control logic produces the data delay tap count for the data IDELAY tap control logic; IDELAY increment/decrement logic drives DLYRST, DLYCE, and DLYINC)

A simple algorithm is used for detecting edges of the memory clock/strobe. The clock/strobe is input to the IDELAY block with an initial value of 0. The clock/strobe is delayed in one-tap increments until the first edge is detected, and the number of taps required to detect the first edge is recorded. The clock/strobe continues to be delayed in one-tap increments until the second edge is detected, and the number of taps required to detect the second edge is recorded. The pulse width is computed from the two recorded values. After the pulse width of the clock/strobe is determined in number of taps, the midpoint is obtained by dividing it by two. The sum of the midpoint and the number of taps required to detect the first edge is the number of taps required to delay the data.

The total number of taps available in the IDELAY block is 64. Therefore, for a frequency of 200 MHz and below, it is not possible to detect two edges. At the end of 64 taps, if only one edge is detected, the number of taps required to delay data is the sum of the number of taps required to detect the first edge and 16 taps (~1.25 ns with a tap resolution of ~80 ps). A quarter cycle of a 200-MHz clock/strobe is about 16 taps. Based on timing analysis, this value can also be used for lower frequencies, down to 110 MHz. For frequencies below 110 MHz, if no edges are detected at the end of 64 taps, the number of taps required to delay data is 32 taps (~2.5 ns with a tap resolution of ~80 ps). This value is sufficient to set the internal FPGA clock edge within the data window.

Only a small state machine is required for the first and second edge detection. This state machine is enabled only during a dummy read operation issued for data delay tap value determination. A dummy read operation, comprised of multiple back-to-back read commands, is issued to the external memory device before normal operation. The state machine controls the inputs to the IDELAY circuit, namely DLYRST, DLYCE, and DLYINC:

DLYRST - The delay line reset signal that resets the number of taps in the delay line to a value set by the IOBDELAY_VALUE attribute. This is set to "0" in this design.
DLYCE - The delay line enable signal that determines when the delay line increment/decrement signal is activated.

DLYINC - The delay line increment/decrement signal that increments or decrements the number of taps in the delay block.

Table 2 describes the operation of the delay line.

Table 2: Delay Block Operation

| Operation | DLYRST | DLYCE | DLYINC |
|---|---|---|---|
| Reset to configured value of tap count | 1 | X | X |
| Increment tap count | 0 | 1 | 1 |
| Decrement tap count | 0 | 1 | 0 |
| No change | 0 | 0 | X |

The state diagram to control these delay block inputs is shown in Figure 3. The four states in this state machine are DELAY_RST, IDLE, DELAY_INC, and DETECT_EDGE.

Figure 3: Strobe Edge Detection State Diagram (states: DELAY_RST resets the IDELAY taps to 0; IDLE holds the IDELAY in no-change mode; DELAY_INC increments the IDELAY by 1 tap; DETECT_EDGE detects a transition and increments the IDELAY by 1 tap)

DELAY_RST
This is the first state in the state machine, enabled with the start of the dummy read operation. In this state, the delay block is reset to "0" taps. This state is followed by multiple IDLE states.

IDLE
In this state, the delay block is maintained in the no-change operation. Every state other than IDLE is followed by multiple IDLE states to allow the tap output value to settle. An IDLE state is followed by another IDLE state, a DELAY_INC state, or a DETECT_EDGE state.

DELAY_INC
This state increments the tap of the delay block by one. This state is followed by multiple IDLE states.

DETECT_EDGE
In this state, the output of the delay block is compared with its previous value to detect an edge or transition, and the delay block tap is incremented by one. This state is followed by multiple IDLE states.

After the number of taps to delay the data is determined, the data IDELAY circuit is enabled and increments to this value. This is done by incrementing the data IDELAY circuit for the same number of clock cycles as the number of taps required. The block diagram of the read/write data path with the data IDELAY circuit is shown in Figure 4.

Figure 4: Read/Write Data Path (blocks shown: DQ through the IDELAY 64-tap absolute delay line into the input DDR flip-flops, followed by the rising-edge and falling-edge FIFOs and the user interface; rising and falling write data through the output DDR flip-flops to the OBUFT with 3-state control; clocks CLK0 and CLK270; the data IDELAY tap control logic takes the data delay tap count and fans DLYRST, DLYCE, and DLYINC out to eight DQ IOBs)
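The edge-detection sweep and its two fallback rules can be modeled in software as a minimal sketch. Everything here is illustrative: `find_data_delay` and `make_strobe` are hypothetical names, the strobe is modeled as an ideal 50% duty-cycle square wave, and the 78 ps tap size is an assumed stand-in for the "~80 ps" resolution quoted in the text. The sketch does not address what the hardware does if the computed delay exceeds the 64 available taps.

```python
from typing import Callable

MAX_TAPS = 64            # taps available in the IDELAY block (from the text)
ONE_EDGE_EXTRA_TAPS = 16  # added to the first edge when only one edge is found
NO_EDGE_TAPS = 32         # used when no edge is found in 64 taps


def find_data_delay(sample: Callable[[int], int]) -> int:
    """Sweep the delay line one tap at a time and return the data-delay taps.

    sample(tap) is assumed to return the strobe level (0 or 1) seen by the
    internal clock when the strobe is delayed by `tap` taps.
    """
    edges = []
    previous = sample(0)              # IDELAY starts at 0 taps
    for tap in range(1, MAX_TAPS):
        current = sample(tap)
        if current != previous:       # a transition marks an edge
            edges.append(tap)
            if len(edges) == 2:
                break
        previous = current

    if len(edges) == 2:               # normal case: center of the pulse
        first, second = edges
        return first + (second - first) // 2
    if len(edges) == 1:               # single edge: add roughly a quarter cycle
        return edges[0] + ONE_EDGE_EXTRA_TAPS
    return NO_EDGE_TAPS               # no edge detected


def make_strobe(period_ps: int, clock_phase_ps: int, tap_ps: int = 78):
    """Hypothetical ideal strobe: high for the first half of each period."""
    def sample(tap: int) -> int:
        t = (clock_phase_ps - tap * tap_ps) % period_ps
        return 1 if t < period_ps // 2 else 0
    return sample


print(find_data_delay(make_strobe(3750, 1000)))   # two edges within 64 taps
print(find_data_delay(make_strobe(6666, 3000)))   # only one edge within 64 taps
print(find_data_delay(make_strobe(20000, 6000)))  # no edge within 64 taps
```

In the first call the modeled strobe changes level at taps 13 and 37, so the result is 13 + 12 = 25 taps. The second and third calls exercise the one-edge and no-edge rules.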

Data Capture and Recapture

The delayed data is captured in the input DDR flip-flops using the internal FPGA clock as shown in Figure 4. The outputs of these flip-flops are then stored in two FIFOs: one for rising-edge data and the other for falling-edge data. These FIFOs are implemented using the LUT RAMs. The write enable for these FIFOs is provided by a read enable signal normalized for system parameters. The read enable signal also goes through the 64-tap delay line with the same number of tap delays as the data bits.

The DDR2 SDRAM devices do not provide a read valid or read enable signal along with read data. Therefore, the controller generates this read enable signal based on the CAS latency and the burst length. The read enable signal must be asserted during the read preamble and deasserted after the last rising edge of the strobe.

The read enable signal is normalized to make it system independent. The normalization is implemented with a loopback on the PCB. READ_EN_OUT is output from the FPGA, loops back on the PCB, and is input as READ_EN_IN. The trace delay of this loopback must equal the sum of the trace delays of the clock (CK/CK#) forwarded to the memory device and the strobe (DQS)/data (DQ). The trace delays of CK, DQSs, and DQs must be closely matched. This loopback signal is used to generate write enable signals to the read data capture FIFOs. For interfaces that span multiple banks, a loopback is recommended per bank in order to manage the fanout on this enable signal.

The first data word can be captured using either the rising edge or the falling edge of the internal FPGA clock. Therefore, additional logic is required for the write enable of the rising-edge FIFO. The circuit implementing the write enable logic for the read recapture FIFOs is shown in Figure 5. If the first data is captured on the rising edge of the FPGA clock, then the write enable to the rising-edge FIFO is the output of the first flip-flop. If not, it is the output of the second flip-flop. The timing diagram for the read data capture and write enable for the recapture FIFOs is shown in Figure 6.

Figure 5: Write Enable Logic for Read Recapture FIFOs (the delayed, normalized read enable passes through a chain of three flip-flops; a multiplexer controlled by "First Data on Rising Edge" selects the write enable for the rising-edge FIFO; a second output provides the write enable for the falling-edge FIFO)

Figure 6: Data Capture and Transfer to FIFOs (timing diagram: internal FPGA clock and the delayed, normalized read enable; for Case 1 and Case 2, the delayed data words 0 to 3, the IDDR SAME_EDGE_PIPELINED outputs carrying words 0, 2 and 1, 3, and the write enables for the rising-edge and falling-edge FIFOs)
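The selection of the rising-edge FIFO write enable described for Figure 5 can be illustrated with a small cycle-based Python model. This is a sketch under stated assumptions: the function name is hypothetical, the flip-flops are assumed to reset to 0 and share one clock, and only the rising-edge FIFO enable is modeled because that is the only selection rule the text spells out.

```python
from typing import List


def rising_fifo_write_enable(read_enable: List[int],
                             first_data_on_rising: bool) -> List[int]:
    """Return the rising-edge FIFO write enable after each clock edge.

    read_enable holds the delayed, normalized read enable sampled once per
    clock cycle. The enable is taken from the first flip-flop when the first
    data word is captured on the rising edge, otherwise from the second.
    """
    ff1 = 0  # first flip-flop output (assumed reset value)
    ff2 = 0  # second flip-flop output (assumed reset value)
    write_enable = []
    for level in read_enable:
        # Both registers update on the same clock edge:
        # ff2 takes the old ff1 value, ff1 takes the new input.
        ff2, ff1 = ff1, level
        write_enable.append(ff1 if first_data_on_rising else ff2)
    return write_enable


enable_in = [0, 1, 1, 0, 0]
print(rising_fifo_write_enable(enable_in, first_data_on_rising=True))
print(rising_fifo_write_enable(enable_in, first_data_on_rising=False))
```

Selecting the second flip-flop simply delays the write enable by one more clock cycle, which is the extra cycle needed when the first data word arrives on the falling edge.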

Read Timing Analysis

Read timing analysis with the direct-clocking technique is described in this section. Read data is captured directly in the FPGA clock domain, so the memory parameter used for the data valid window analysis is the access time (T_AC). The following is a brief description of each parameter used in this timing analysis.

External memory parameters considered for this timing analysis are:

T_AC - Access time of read data (DQ) with respect to the clock forwarded to the memory by the FPGA
T_MEM_DCD - Duty cycle distortion tolerance specified by the memory vendor

Read data (DQ) is captured using the FPGA clock and not the memory clock/strobe (DQS), so T_AC (access time of data with respect to clock) is considered for this analysis. The DQS-to-DQ memory parameters such as T_DQSQ and T_QHS are not considered in this analysis because T_AC overrides them.

FPGA parameters considered for this timing analysis are:

T_GLOBAL_CLOCK_TREE_SKEW - Skew on the global clock tree
T_JITTER - DCM clock output jitter
T_PACKAGE_SKEW - Package skew for a particular device/package
T_SETUP - Setup time of the I/O flip-flop
T_HOLD - Hold time of the I/O flip-flop

The delay on data bits associated with a DQS is computed by detecting the DQS edge. This detection is performed by capturing the DQS in an I/O flip-flop using the global clock. The final delay value for the data therefore already takes into account the setup and hold time of the I/O flip-flop. For a worst-case analysis, the inherent setup time and the inherent hold time of the I/O flip-flop are considered. PCB layout skew is also considered to account for the skew between data bits and the associated strobe.

Table 3 shows the read timing analysis at 267 MHz for a DDR2 interface. All the parameters are specified in picoseconds. T_DATA_PERIOD is half the clock period minus T_MEM_DCD. The sum of uncertainties before clock is the start of the valid data window (770 ps). The difference between T_DATA_PERIOD and the sum of uncertainties after clock is the end of the valid data window (967 ps). This results in a 197 ps margin at 267 MHz. There is sufficient margin because two taps with a 75 ps resolution fit in this data valid window.

Table 3: Read Timing Analysis at 267 MHz for a DDR2 Interface

| Parameter | Value (ps) | Uncertainty Before Clock (ps) | Uncertainty After Clock (ps) | Description |
|---|---|---|---|---|
| T_CLOCK | 3750 | | | Clock period. |
| T_MEM_DCD | 188 | 0 | 0 | Duty cycle distortion tolerance is subtracted from clock phase (equal to half clock period) to determine T_DATA_PERIOD. |
| T_DATA_PERIOD | 1687 | | | Data period is half the clock period with 10% duty cycle distortion subtracted from it. |
| T_AC | ±500 | 500 | 500 | Data output access time specified by memory vendor. |
| T_PACKAGE_SKEW | 0 | 0 | 0 | Package skew is not considered because PCB trace lengths are adjusted to compensate for this skew. |
| T_SETUP - Minimum | 100 | 100 | 0 | DQS edge detection is performed by registering it in the I/O flip-flop with a global clock. The final data delay value therefore already accounts for the setup and hold times of the I/O flip-flops. The inherent setup time of the flip-flop is considered for a worst-case analysis. |
| T_HOLD - Maximum | 50 | 0 | 50 | DQS edge detection is performed by registering it in the I/O flip-flop with a global clock. The final data delay value therefore already accounts for the setup and hold times of the I/O flip-flops. The inherent hold time of the flip-flop is considered for a worst-case analysis. |
| T_JITTER | 100 | 100 | 100 | Clock jitter that indirectly causes strobe and data jitter. |
| T_CLOCK_TREE_SKEW - Maximum | 50 | 50 | 50 | Small value considered for skew on the "global clock" line because DQS and associated DQ are placed close to each other. |
| T_PCB_LAYOUT_SKEW | 20 | 20 | 20 | Skew between data lines and associated strobe on the board. |
| Total uncertainties | 387 | 770 | 720 | |
| Window | 197 | 770 | 967 | |

Figure 7 illustrates the calculated data valid window.

Figure 7: Data Valid Window (time axis in ps: read clock/strobe leading edge at 0, data valid window from 770 to 967, trailing edge at 1687, with the leading edge margin and trailing edge margin marked)
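The arithmetic behind Table 3 can be reproduced with a short Python sketch. The function and dictionary names are hypothetical. One assumption is made explicit: the 50 ps hold time is counted as an after-clock uncertainty, which is what the 720 ps after-clock total requires (500 + 50 + 100 + 50 + 20).

```python
from typing import Dict, Tuple


def data_valid_window(t_clock_ps: int, t_mem_dcd_ps: int,
                      before_ps: Dict[str, int],
                      after_ps: Dict[str, int]) -> Tuple[int, int, int, int]:
    """Return (data_period, window_start, window_end, margin) in ps."""
    data_period = t_clock_ps // 2 - t_mem_dcd_ps       # half period minus DCD
    window_start = sum(before_ps.values())             # uncertainties before clock
    window_end = data_period - sum(after_ps.values())  # uncertainties after clock
    return data_period, window_start, window_end, window_end - window_start


# Values taken from Table 3 (267 MHz, all in picoseconds).
before = {"T_AC": 500, "T_SETUP": 100, "T_JITTER": 100,
          "T_CLOCK_TREE_SKEW": 50, "T_PCB_LAYOUT_SKEW": 20}
# Assumption: the hold time contributes after the clock edge.
after = {"T_AC": 500, "T_HOLD": 50, "T_JITTER": 100,
         "T_CLOCK_TREE_SKEW": 50, "T_PCB_LAYOUT_SKEW": 20}

period, start, end, margin = data_valid_window(3750, 188, before, after)
print(period, start, end, margin)
print(margin // 75)  # whole 75 ps taps that fit inside the window
```

With the Table 3 inputs, the sketch gives a 1687 ps data period, a window from 770 ps to 967 ps, and a 197 ps margin, which holds two 75 ps taps.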

Reference Design

The reference design for the Direct Clocking Data Capture Technique is integrated with the Memory Interface Generator (MIG) tool. This tool has been integrated with the Xilinx Core Generator. For the latest version of the design, download the IP Update on the Xilinx website at: http://www.xilinx.com/xlnx/xil_sw_updates_home.jsp

Conclusion

The Virtex-4 I/O architecture enhances the implementation of source-synchronous memory interfaces. The architectural features used in this application note and reference design include:

IDELAY block - Continuously calibrated delay elements with small tap resolution.
FIFO16 primitive - Block RAM used as FIFO with no additional CLB resources required for status flag generation.
High-speed differential global clocking resources - Provide better duty cycle. The number of global clock resources required in a design is reduced as a result of differential clocking.

Revision History

The following table shows the revision history for this document.

| Date | Version | Revision |
|---|---|---|
| 09/09/04 | 1.0 | Initial Xilinx release. |
| 11/01/04 | 1.1 | Revised description under Data Capture and Recapture section. Revised Figure 6. Reference design is updated on web. |
| 07/11/05 | 1.2 | Revised Table 3, Figure 6, and Figure 7. Revised Reference Design links. Added new Table 1. |
| 09/13/05 | 1.3 | Updated Read Timing Analysis and Reference Design sections. |