Memory Interfaces Data Capture Using Direct Clocking Technique Author: Maria George


Application Note: Virtex-4 Family
XAPP701 (v1.4) October 2, 2006

Summary

This application note describes the direct-clocking data capture technique for memory interfaces in a Virtex-4 device. The direct-clocking scheme uses architectural features unique to the Virtex-4 family, for example, the 64-tap absolute delay line provided in each I/O block (IOB).

Introduction

Most memory interfaces are source-synchronous interfaces in which the data and the clock/strobe transmitted from the external memory device are edge aligned. To capture this transmitted data in the Virtex-4 device, either the clock/strobe or the data is delayed. In the direct-clocking technique, the data is delayed so that it is center aligned with respect to the internal FPGA clock, and the internal FPGA clock captures the transmitted data. The clock/strobe transmitted from the memory is used only to determine the delay value for the associated data bits. As a result, there are no restrictions on the number of data bits associated with a strobe. Because the strobe does not need to be distributed to the associated data bits, no additional clocking resources are required.

The Virtex-4 resource used by the clock/strobe and the data bits is a 64-tap absolute delay line, which is implemented using the IDELAY and IDELAYCTRL primitives. Both the clock/strobe and the data bits are routed through the 64-tap absolute delay line. Although the strobe is not used to capture data, it is used to determine the number of taps required to center the data with respect to the internal FPGA clock. The design and implementation details of the direct-clocking scheme are explained in the following sections.
Strobe Edge Detection

The delay value for the data bits associated with a clock/strobe is the phase difference between the rising edge of the internal FPGA clock and the center of the clock/strobe pulse. The assumption is that the clock/strobe and the data are edge aligned. To determine this phase difference, the clock/strobe is input through the 64-tap absolute delay line in the IOB and is sampled at incremental tap outputs using the internal FPGA clock. At least two edges (transitions) of the clock/strobe have to be detected to determine the center of the clock/strobe pulse. The difference between the number of taps required to detect the second transition (second-edge taps) and the number of taps required to detect the first transition (first-edge taps) is the clock/strobe pulse width. Half of this difference is the distance from the first edge to the pulse center (pulse-center taps). The number of taps required from the rising edge of the internal FPGA clock to the center of the clock/strobe pulse is the sum of the first-edge taps and the pulse-center taps.

© 2004-2006 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and further disclaimers are as listed at http://www.xilinx.com/legal.htm. All other trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice. NOTICE OF DISCLAIMER: Xilinx is providing this design, code, or information "as is." By providing the design, code, or information as one possible implementation of this feature, application, or standard, Xilinx makes no representation that this implementation is free from any claims of infringement. You are responsible for obtaining any rights you may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of the implementation, including but not limited to any warranties or representations that this implementation is free from claims of infringement and any implied warranties of merchantability or fitness for a particular purpose.

Table 1 describes the different types of taps.

Table 1: Tap Descriptions

TAPS                                     DESCRIPTION
First-edge taps                          Number of taps required to detect first transition of clock/strobe
Second-edge taps                         Number of taps required to detect second transition of clock/strobe
Second-edge taps - First-edge taps       Pulse width of clock/strobe
Pulse-center taps                        Pulse width of clock/strobe divided by two
First-edge taps + Pulse-center taps      Number of taps required to center data with internal FPGA clock (data-delay taps)

Figure 1 illustrates two scenarios of centering data with respect to the internal FPGA clock by delaying it by the data-delay taps value. Case 1 shows the falling edge of the clock/strobe as the first edge being detected, which results in the delayed data being centered on the rising edge of the internal FPGA clock. Case 2 shows the rising edge of the clock/strobe as the first edge being detected, which results in the delayed data being centered on the falling edge of the internal FPGA clock.

[Figure 1: Case 1 and Case 2 - Clock/Strobe Center to Internal FPGA Clock Phase Detection. Waveforms shown for each case: Clock/Strobe (with first and second edges detected), Read Data, Delayed Read Data, Dummy_rd_en, and Internal FPGA Clock, annotated with First Edge Taps, Second Edge Taps, and Data Delay Taps.]
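The arithmetic in Table 1 can be illustrated with a short worked example. This is only a sketch: the function name, the example tap counts (10 and 42), and the choice to round the pulse center down to a whole tap are assumptions made for illustration and are not taken from the reference design. The ~75 ps tap resolution is the approximate figure quoted later in this note.

```python
TAP_PS = 75  # approximate tap resolution quoted in this note, treated as exact here


def data_delay_taps(first_edge_taps, second_edge_taps):
    """Return (pulse_width, pulse_center, data_delay) in taps, following Table 1."""
    if second_edge_taps <= first_edge_taps:
        raise ValueError("second edge must be detected after the first edge")
    pulse_width = second_edge_taps - first_edge_taps
    # Rounding down to a whole tap is an assumption of this sketch.
    pulse_center = pulse_width // 2
    return pulse_width, pulse_center, first_edge_taps + pulse_center


# Hypothetical measurement: first edge at tap 10, second edge at tap 42.
width, center, delay = data_delay_taps(10, 42)
print(f"pulse width  = {width} taps")
print(f"pulse center = {center} taps")
print(f"data delay   = {delay} taps (~{delay * TAP_PS} ps)")
```

With these example values the pulse width is 32 taps, the pulse center is 16 taps, and the data delay is 26 taps.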

Strobe Edge Detection Implementation

The delay value determination circuit is straightforward to implement in a Virtex-4 device because of the dedicated IDELAY and IDELAYCTRL circuits. The block diagram of the delay value determination scheme is shown in Figure 2.

[Figure 2: Strobe Edge Detection. Blocks shown: DQS input, IOB IDELAY (64-tap absolute delay line) with increment/decrement logic, capture flip-flop clocked by the FPGA clock, edge detection and control logic driving DLYRST, DLYCE, and DLYINC, data delay tap count, data IDELAY tap control logic, and the read DQ IDELAY with its increment/decrement logic.]

A simple algorithm is used for detecting the edges of the memory clock/strobe. The clock/strobe is input to the IDELAY block with an initial value of 0. The clock/strobe is delayed in one-tap increments until the first edge is detected, and the number of taps required to detect the first edge is recorded. The clock/strobe continues to be delayed in one-tap increments until the second edge is detected, and the number of taps required to detect the second edge is recorded. The pulse width is the difference between the two recorded values. After the pulse width of the clock/strobe is determined in number of taps, the midpoint is obtained by dividing it by two. The sum of the midpoint and the number of taps required to detect the first edge is the number of taps required to delay the data.

The total number of taps available in the IDELAY block is 64, which spans roughly 4.8 ns at a tap resolution of ~75 ps. Therefore, at frequencies of 200 MHz and below, it might not be possible to detect two edges. At the end of 64 taps, if only one edge is detected, the number of taps required to delay data is the difference between the number of taps required to detect the first edge and 16 taps (~1.25 ns with a tap resolution of ~75 ps). A quarter cycle of a 200-MHz clock/strobe is about 16 taps. Based on timing analysis, this value can also be used for lower frequencies, down to 110 MHz.
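The sweep described above can be modeled in software to see how the tap counts arise. The following is a behavioral sketch only, not the reference design: the strobe is assumed to be an ideal 50% duty-cycle square wave, `sample_strobe` and `find_data_delay` are hypothetical names, and clamping a negative one-edge result to zero is an assumption, since the note does not say how that case is handled. When no edge is found, the function returns `None` and leaves the choice of delay to the caller.

```python
def sample_strobe(tap, period_ps, phase_ps, tap_ps=75):
    """Strobe level seen at the FPGA clock edge when the strobe is delayed by `tap` taps.

    Delaying the strobe by d is modeled as sampling the undelayed strobe at time -d.
    """
    t = -tap * tap_ps - phase_ps
    return 1 if (t % period_ps) < period_ps / 2 else 0


def find_data_delay(period_ps, phase_ps, total_taps=64, one_edge_offset=16):
    """Sweep the delay line one tap at a time and derive the data-delay taps."""
    edges = []
    prev = sample_strobe(0, period_ps, phase_ps)
    for tap in range(1, total_taps):
        cur = sample_strobe(tap, period_ps, phase_ps)
        if cur != prev:          # transition between consecutive taps
            edges.append(tap)
            if len(edges) == 2:
                break
        prev = cur

    if len(edges) == 2:
        first, second = edges
        return first + (second - first) // 2
    if len(edges) == 1:
        # One edge only: first-edge taps minus 16 taps.
        # Clamping at zero is an assumption of this sketch.
        return max(edges[0] - one_edge_offset, 0)
    return None                  # no edge found within the delay line


print(find_data_delay(period_ps=3333, phase_ps=500))   # two edges found
print(find_data_delay(period_ps=5000, phase_ps=100))   # only one edge found
```

In this model, a 3333 ps period with a 500 ps phase offset yields edges at taps 16 and 38, while a 5000 ps period with a 100 ps offset yields a single edge at tap 33.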
For frequencies below 110 MHz, if no edges are detected at the end of 64 taps, the number of taps required to delay data is 32 taps (~2.5 ns with a tap resolution of ~75 ps). This value is sufficient to set the internal FPGA clock edge within the data window.

Only a small state machine is required for the first and second edge detection. This state machine is enabled only during a dummy read operation issued to determine the data delay tap value. A dummy read operation consisting of multiple back-to-back read commands is issued to the external memory device before normal operation. The state machine controls the inputs to the IDELAY circuit, namely DLYRST, DLYCE, and DLYINC:

DLYRST - The delay line reset signal that resets the number of taps in the delay line to a value set by the IOBDELAY_VALUE attribute. This is set to "0" in this design.
DLYCE - The delay line enable signal that determines when the delay line increment/decrement signal is activated.
DLYINC - The delay line increment/decrement signal that increments or decrements the number of taps in the delay block.

Table 2 describes the operation of the delay line.

Table 2: Delay Block Operation

Operation                                 DLYRST   DLYCE   DLYINC
Reset to configured value of tap count    1        X       X
Increment tap count                       0        1       1
Decrement tap count                       0        1       0
No change                                 0        0       X

The state diagram to control these delay block inputs is shown in Figure 3. The four states in this state machine are DELAY_RST, IDLE, DELAY_INC, and DETECT_EDGE.

[Figure 3: Strobe Edge Detection State Diagram. States shown: DELAY_RST (resets IDELAY taps to 0), IDLE (holds IDELAY in no-change mode), DELAY_INC (increments IDELAY by 1 tap), DETECT_EDGE (detects transition and increments IDELAY by 1 tap).]

DELAY_RST
This is the first state in the state machine, enabled with the start of the dummy read operation. In this state, the delay block is reset to "0" taps. This state is followed by multiple IDLE states.

IDLE
In this state, the delay block is maintained in the no-change operation. Every state other than IDLE is followed by multiple IDLE states to allow the tap output value to settle. An IDLE state is followed by another IDLE state, a DELAY_INC state, or a DETECT_EDGE state.

DELAY_INC
This state increments the tap of the delay block by one. This state is followed by multiple IDLE states.

DETECT_EDGE
In this state, the output of the delay block is compared with its previous value to detect an edge or transition, and the delay block tap is incremented by one. This state is followed by multiple IDLE states.

After the number of taps to delay the data is determined, the data IDELAY circuit is enabled and increments to this value. This is done by incrementing the data IDELAY circuit for the same number of clock cycles as the number of taps required. The block diagram of the read/write datapath with the data IDELAY circuit is shown in Figure 4.

[Figure 4: Read/Write Datapath. Blocks shown: DQ pad, IDELAY (64-tap absolute delay line) with increment/decrement logic, input DDR flip-flops clocked by FPGA clock CLK0, read data FIFOs for rising-edge and falling-edge data, user interface, data IDELAY tap control logic (DLYRST, DLYCE, DLYINC, fanned out to eight DQ IOBs) fed by the data delay tap count, and the write path with output DDR flip-flops clocked by FPGA clock CLK270, write data rising/falling, 3-state control, and OBUFT.]
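The control-signal behavior in Table 2, and the step of loading the data IDELAY by incrementing it for a number of clock cycles equal to the required taps, can be sketched as a small behavioral model. `IdelayModel` and `load_data_delay` are hypothetical names used for illustration; the model tracks only the tap count, and the wraparound at the ends of the 64-tap range is an assumption of this sketch rather than something stated in this note.

```python
class IdelayModel:
    """Tap-counter model of the delay block inputs described in Table 2."""

    NUM_TAPS = 64

    def __init__(self, reset_value=0):
        self.reset_value = reset_value  # plays the role of IOBDELAY_VALUE ("0" in this design)
        self.taps = reset_value

    def clock(self, dlyrst, dlyce, dlyinc):
        """Apply one clock cycle of control inputs and return the tap count."""
        if dlyrst:                       # reset to configured value (DLYCE, DLYINC ignored)
            self.taps = self.reset_value
        elif dlyce:                      # increment or decrement by one tap
            step = 1 if dlyinc else -1
            # Wraparound at the range limits is an assumption of this sketch.
            self.taps = (self.taps + step) % self.NUM_TAPS
        # dlyrst = 0 and dlyce = 0: no change
        return self.taps


def load_data_delay(idelay, data_delay_taps):
    """Reset the delay block, then increment once per clock for the required taps."""
    idelay.clock(dlyrst=1, dlyce=0, dlyinc=0)
    for _ in range(data_delay_taps):
        idelay.clock(dlyrst=0, dlyce=1, dlyinc=1)
    return idelay.clock(dlyrst=0, dlyce=0, dlyinc=0)  # hold


dq_delay = IdelayModel()
print(load_data_delay(dq_delay, 27))
```

Keeping the model to a tap counter makes the Table 2 priorities explicit: reset overrides enable, and the increment/decrement input matters only while the enable is asserted.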

Data Capture and Recapture

The delayed data is captured in the input DDR flip-flops using the internal FPGA clock, as shown in Figure 4. The outputs of these flip-flops are then stored in two FIFOs: one for rising-edge data and the other for falling-edge data. These FIFOs are implemented using the LUT RAMs. The write enable for these FIFOs is provided by a read enable signal generated by the controller and aligned to the captured read data based on pattern calibration.

DDR2 SDRAM devices do not provide a read valid or read enable signal along with read data. Therefore, the controller generates this read enable signal based on the CAS latency and the burst length. The read enable signal must be asserted during the read preamble and deasserted after the last rising edge of the strobe. The read enable signal must be aligned to the captured read data at the output of the IDDR flip-flops.

For read enable alignment, a known pattern is written to memory after data alignment to the FPGA clock is complete. The known pattern is then read back, and the read enable signal is delayed using shift registers until it is aligned with the captured read data. A read enable signal is generated per byte of data. The timing diagram for the read enable alignment is shown in Figure 5.

[Figure 5: Data Capture and Transfer to FIFOs. Waveforms shown: Internal FPGA Clock; Delayed, Normalized Read Enable; Case 1: Delayed Read Data (0, 1, 2, 3), IDDR SAME_EDGE_PIPELINED outputs (0, 2 and 1, 3), and a common Write Enable for the rising-edge and falling-edge FIFOs; Case 2: Delayed Read Data (0, 1, 2, 3), IDDR SAME_EDGE_PIPELINED outputs (0, 2 and 1, 3), and separate Write Enables for the rising-edge FIFO and the falling-edge FIFO.]
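The read enable alignment described above amounts to a search over shift-register depths. The sketch below models it on a list of captured words, one per FPGA clock cycle. The function name, the `max_shift` limit, the nominal start cycle, and the example pattern are all assumptions for illustration; the reference design performs this search in hardware per byte of data.

```python
def align_read_enable(captured, pattern, nominal_start, max_shift=8):
    """Return the number of extra shift-register stages that aligns the read enable.

    captured      - words seen at the IDDR outputs, one entry per clock cycle
    pattern       - known pattern previously written to memory
    nominal_start - cycle at which the undelayed read enable would assert
    """
    n = len(pattern)
    for shift in range(max_shift + 1):
        start = nominal_start + shift
        # The enable window is aligned when it frames exactly the known pattern.
        if captured[start:start + n] == pattern:
            return shift
    return None  # pattern not found within the allowed shift range


# Hypothetical capture: the pattern arrives three cycles after the nominal enable.
known = [0xA, 0x5, 0xC, 0x3]
stream = [0, 0, 0, 0, 0] + known + [0, 0]
print(align_read_enable(stream, known, nominal_start=2))
```

In this example the enable needs three additional shift stages before its window frames the known pattern.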

Read Timing Analysis

Read timing analysis with the direct-clocking technique is described in this section. Read data is captured directly in the FPGA clock domain; therefore, the memory parameter used for the data valid window analysis is the access time (T_AC). The following is a brief description of each parameter used in this timing analysis.

External memory parameters considered for this timing analysis are:

T_AC - Access time of read data (DQ) with respect to the clock forwarded to the memory by the FPGA
T_DCD - DCM output duty cycle distortion

Read data (DQ) is captured using the FPGA clock and not the memory clock/strobe (DQS); therefore, T_AC (access time of data with respect to clock) is considered for this analysis. The DQS-to-DQ memory parameters, such as T_DQSQ and T_QHS, are not considered in this analysis because T_AC overrides them.

FPGA parameters considered for this timing analysis are:

T_CLOCK_TREE_SKEW - Skew on the global clock tree for IOB flip-flops closely placed within a bank
T_PACKAGE_SKEW - Package skew for a particular device/package
T_SAMP - Sampling window specified in the Virtex-4 source synchronous data sheet
T_IDELAYPAT_JIT - Pattern jitter per IDELAY tap specified in the Virtex-4 data sheet

The delay on data bits associated with a DQS is computed by detecting the DQS edge. This detection is performed by capturing the DQS in an I/O flip-flop using the global clock. The final delay value for the data, therefore, already takes into account the setup and hold time of the I/O flip-flop. For a worst-case analysis, the inherent setup time and the inherent hold time of the I/O flip-flop are considered. PCB layout skew is also considered to account for the skew between data bits and the associated strobe.

Table 3 shows the read timing analysis at 205 MHz for a DDR2 interface. All the parameters are specified in picoseconds. T_DATA_PERIOD is half the clock period minus T_DCD.
The difference between T_DATA_PERIOD and the sum of uncertainties is the valid data window (43 ps). This results in a 43 ps margin at 205 MHz using a -11 Virtex-4 device.

Table 3: Read Timing Analysis at 205 MHz for a DDR2 Interface

Uncertainty Parameters         Value (ps)   Description
T_CLOCK                        4878         Clock period.
T_DCD                          150          DCM output duty cycle distortion is subtracted from clock phase (equal to half clock period) to determine T_DATA_PERIOD.
T_DATA_PERIOD                  2289         Data period is half the clock period with duty cycle distortion subtracted from it.
T_AC                           1000         Data output access time specified by memory vendor.
T_PACKAGE_SKEW                 20           A small value for package skew is considered because PCB trace lengths are adjusted to compensate for this skew.
T_SAMP                         500          This parameter is defined in the Virtex-4 source synchronous data sheet.
T_IDELAYPAT_JIT                576          At 205 MHz, the total number of taps in the worst case is 3/4 x clock_period = 48 taps. 48 x 12 = 576 ps of pattern jitter.
T_CLOCK_TREE_SKEW (maximum)    100          Small value considered for skew on the global clock line because DQS and associated DQ are placed close to each other.
T_PCB_LAYOUT_SKEW              50           Skew between data lines and associated strobe on the board.
Uncertainties                  2246
Window                         43
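The arithmetic behind Table 3 can be checked with a few lines of code. This is a sketch that simply re-adds the values listed in the table; the function and variable names are chosen for illustration only.

```python
def read_window_ps(clock_period_ps, t_dcd_ps, uncertainties_ps):
    """Return (data_period, total_uncertainty, window) in picoseconds."""
    data_period = clock_period_ps // 2 - t_dcd_ps   # half period minus duty cycle distortion
    total = sum(uncertainties_ps.values())
    return data_period, total, data_period - total


# Values copied from Table 3 (205 MHz DDR2 interface).
uncertainties = {
    "T_AC": 1000,
    "T_PACKAGE_SKEW": 20,
    "T_SAMP": 500,
    "T_IDELAYPAT_JIT": 48 * 12,      # 48 taps x 12 ps of pattern jitter per tap
    "T_CLOCK_TREE_SKEW": 100,
    "T_PCB_LAYOUT_SKEW": 50,
}

data_period, total, window = read_window_ps(4878, 150, uncertainties)
print(f"T_DATA_PERIOD = {data_period} ps")
print(f"Uncertainties = {total} ps")
print(f"Window        = {window} ps")
```

The result reproduces the table: a 2289 ps data period, 2246 ps of total uncertainty, and a 43 ps window.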

Reference Design

The reference design for the Direct Clocking Data Capture Technique is integrated with the Memory Interface Generator (MIG) tool. This tool has been integrated with the Xilinx Core Generator tool. For the latest version of the design, download the IP Update on the Xilinx website at: http://www.xilinx.com/xlnx/xil_sw_updates_home.jsp

Conclusion

The Virtex-4 I/O architecture enhances the implementation of source-synchronous memory interfaces. The architectural features used in this application note and reference design include:

IDELAY block - Continuously calibrated delay elements with small tap resolution.
FIFO16 primitive - Block RAM used as FIFO with no additional CLB resources required for status flag generation.
High-speed differential global clocking resources - Provide better duty cycle. The number of global clock resources required in a design is reduced as a result of differential clocking.

Revision History

The following table shows the revision history for this document.

Date        Version   Revision
09/09/04    1.0       Initial Xilinx release.
11/01/04    1.1       Revised description under Data Capture and Recapture section. Revised Figure 5. Reference design is updated on web.
07/11/05    1.2       Revised Table 3, Figure 5, and Figure 7. Revised Reference Design links. Added new Table 1.
09/13/05    1.3       Updated Read Timing Analysis and Reference Design sections.
10/2/06     1.4       Updated Strobe Edge Detection Implementation, Data Capture and Recapture, and Table 3. Removed Figure 7.