ISSCC 2006 / SESSION 18 / CLOCK AND DATA RECOVERY / 18.6

18.6 Data Recovery and Retiming for the Fully Buffered DIMM 4.8Gb/s Serial Links

Hamid Partovi¹, Wolfgang Walthes², Luca Ravezzi¹, Paul Lindt², Sivaraman Chokkalingam¹, Karthik Gopalakrishnan¹, Andreas Blum², Otto Schumacher², Claudio Andreotti², Michael Bruennert², Bruno Celli-Urbani², Dirk Friebe², Ivo Koren², Michael Verbeck², Ulrich Lange²

¹Infineon Technologies, San Jose, CA
²Infineon Technologies, Munich, Germany

The increasing demand for DRAM capacity and performance in computing, and especially servers, has led to the development of a new memory-interface standard, the fully buffered DIMM (FB-DIMM). FB-DIMMs can host up to 36 DRAMs whose communication to the host processor is facilitated by the advanced memory buffer (AMB). While DRAMs on a DIMM interact with their respective AMB using the conventional DDR2 standard, the AMB sends data to and receives data from the host processor or a neighboring FB-DIMM by means of differential point-to-point signaling. In this paper, the implementation details of data recovery and retiming of the AMB serial links are discussed.

The chip comprises 24 serial links, a core processing unit, and a DDR interface. To support an 800Mb/s DDR2 data rate, links must operate at 4.8Gb/s. FB-DIMMs are connected in a daisy-chain configuration, and as such, the serial links function as repeaters; they recover and retime data, process, and forward data to the next DIMM, starting from and ending at the host processor.

Figure 18.6.1 depicts the block diagram of a single high-speed lane including the CDR, electrical idle, the IQ-generator, a retiming FIFO, and the transmitter. The FB-DIMM protocol uses electrical idle (EI) as the primary mechanism to initialize, control state transitions, and to enter and exit the disable state. The AMB enters EI when both the differential-mode (DM) and the common-mode (CM) levels of the received data on at least two of three assigned links are low.
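The EI entry condition stated above (DM and CM both low on at least two of the three assigned links) can be pictured as a simple vote. The following is only a behavioral sketch: the names `LinkLevels`, `link_is_idle`, and `amb_enters_ei` are hypothetical, and the analog level detection is abstracted into booleans rather than modeled.

```python
from dataclasses import dataclass


@dataclass
class LinkLevels:
    """Detector outputs for one link (hypothetical abstraction of the analog circuit)."""
    dm_low: bool  # differential-mode level detected as low
    cm_low: bool  # common-mode level detected as low


def link_is_idle(link: LinkLevels) -> bool:
    # A link counts as idle only when BOTH its DM and CM levels are low.
    return link.dm_low and link.cm_low


def amb_enters_ei(links: list[LinkLevels], required: int = 2) -> bool:
    """Return True when at least `required` of the three assigned links are idle."""
    if len(links) != 3:
        raise ValueError("the entry condition is defined over three assigned links")
    return sum(link_is_idle(link) for link in links) >= required


if __name__ == "__main__":
    idle = LinkLevels(dm_low=True, cm_low=True)
    active = LinkLevels(dm_low=False, cm_low=False)
    print(amb_enters_ei([idle, idle, active]))    # two idle links
    print(amb_enters_ei([idle, active, active]))  # one idle link
```

The vote tolerates one link that disagrees with the other two, which is the practical effect of requiring two of three rather than all three.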
The key challenge with the EI-detection circuit is its required resolution and bandwidth. The EI circuit must detect the valid, but deteriorated, differential levels (±80mV) of serial data in the presence of considerable CM noise in both EI and active modes; and, with fast response time, it must determine whether the incoming data stream is valid or whether the preceding AMB is in the idle state. Figure 18.6.2 is a simplified schematic of the EI circuit illustrating only the differential level detection. CMFB biases the gates of the drain-coupled devices, Md+ and Md-, near Vt when DM=0. With the application of a differential data stream, the Md+ and Md- gates alternate above Vt. Acting as a wideband full-wave rectifier, the pair generates a current, Iint, which is in turn dc-averaged by the RC load to effect a voltage drop on Vint. Replica biasing produces VintR, to which Vint is compared in order to indicate entry into or exit from EI. As seen from the figure, though the instantaneous input voltage level in the active mode is frequently below that of EI, the circuit never makes a false transition, and achieves entry and exit detection times of 16ns and 8ns over PVT and mismatch, outperforming the specification of 60ns and 30ns [1].

A half-rate (2.4GHz) CML clock is distributed to pairs of lanes and is used to generate, by means of a polyphase filter (Fig. 18.6.3), quadrature clocks that drive two adjacent phase interpolators (PIs). Worst-case IQ error is 0.015UI, or 3ps, and duty-cycle error is less than 0.5%. Phase interpolation is achieved by quadrant-based phase-mixing with a resolution of 1/32 UI and a DNL better than 0.25 LSB with a ±3σ confidence level over PVT. The half-rate CML clock is also converted to CMOS levels for use by the high-speed digital circuits and the transmitter.

Much like a short EI detection time, fast acquisition of lock on exit from EI significantly improves system performance after reset and recovery.
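As a quick check of the timing figures quoted above, the sketch below converts the UI-relative numbers to picoseconds at 4.8Gb/s. It is plain arithmetic on values taken from the text; the function and key names are illustrative only.

```python
DATA_RATE_BPS = 4.8e9   # serial link rate from the text
PI_STEPS_PER_UI = 32    # phase-interpolator resolution of 1/32 UI


def ui_ps(data_rate_bps: float = DATA_RATE_BPS) -> float:
    """One unit interval (one bit time) in picoseconds."""
    return 1e12 / data_rate_bps


def link_timing() -> dict[str, float]:
    """Convert the quoted UI-relative figures into absolute units."""
    ui = ui_ps()
    lsb = ui / PI_STEPS_PER_UI
    return {
        "ui_ps": ui,                                    # about 208.3 ps
        "half_rate_clock_ghz": DATA_RATE_BPS / 2 / 1e9, # 2.4 GHz
        "iq_error_ps": 0.015 * ui,                      # about 3.1 ps, quoted as 3ps
        "pi_lsb_ps": lsb,                               # about 6.5 ps per PI step
        "dnl_bound_ps": 0.25 * lsb,                     # about 1.6 ps for 0.25 LSB
    }


if __name__ == "__main__":
    for name, value in link_timing().items():
        print(f"{name}: {value:.3f}")
```

The 0.015UI figure works out to roughly 3.1ps, consistent with the rounded 3ps in the text, and one PI step of 1/32 UI is about 6.5ps.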
The AMB uses a 1st-order tracking CDR with fast acquisition capability based on binary search (see Fig. 18.6.4). The algorithm is independent of the loop delay, and thus enables a very short acquisition time without exhibiting any limit-cycle oscillations. Except for the digital loop filter, which operates at the decimated frequency of 600MHz and comprises low-bandwidth tracking and fast acquisition modules, this architecture requires no additional high-speed components when compared to a generic CDR, and thus affords considerable savings in area and power. While the tracking filter receives the difference of the Up<7:0> and Dn<7:0> counts, the fast acquisition module separately integrates the Up<7:0> and Dn<7:0> counts using a pair of shallow dumped integrators (DIs).

The lock condition is reached in three successive steps; in each step, the first DI to cross the threshold indicates in which direction the recovered clock phase must be shifted. Adjustments are executed by trains of 8, 4, and 2 Up Acq or Dn Acq pulses sent to the PI, which, in turn, shifts the recovered clock by ±1/4UI (±8LSBs), ±1/8UI (±4LSBs), and ±1/16UI (±2LSBs). Though non-monotonic, the residual phase error is always less than the respective correction step, and is reduced to within 2LSBs at the end of the acquisition process. During each step, for the time the PI adjusts the clock phase, the loop is broken (i.e., both DIs are cleared and held in reset), and is reconnected once the interpolator has settled. This procedure eliminates any possibility of limit-cycle oscillations in the CDR behavior. Upon completion of the 3rd step, the FSM asserts the Lock Detect signal and enables the low-bandwidth tracking loop, which completes the final phase convergence. Contemporaneously, the FIFO and transmitter are enabled. Figure 18.6.5 includes the phase convergence process during fast acquisition for the full span of initial phase errors [-1/2UI, +1/2UI].
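The three-step acquisition can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the authors' implementation: the phase detector is idealized (no jitter, every sample votes the same way), the sign convention (negative error calls for an "up" shift), the integrator threshold, and the function names are all hypothetical. Only the step sizes of 8, 4, and 2 LSBs and the 1/32 UI resolution come from the text.

```python
STEPS_LSB = (8, 4, 2)   # pulse-train lengths per acquisition step (from the text)
LSB_PER_UI = 32         # PI resolution of 1/32 UI (from the text)


def first_integrator_to_cross(phase_error_lsb: float,
                              threshold: int = 16,
                              votes_per_cycle: int = 8) -> str:
    """Idealized pair of dumped integrators; returns 'up' or 'dn'.

    Assumption: with no jitter, all votes in a decimated cycle go to one side,
    chosen by the sign of the phase error (zero is treated as 'dn').
    Both integrators start from zero on every call, mimicking the clear
    and hold-in-reset that happens while the PI is moving.
    """
    up = dn = 0
    while up < threshold and dn < threshold:
        if phase_error_lsb < 0:
            up += votes_per_cycle
        else:
            dn += votes_per_cycle
    return "up" if up >= threshold else "dn"


def fast_acquire(initial_error_lsb: float) -> list[float]:
    """Return the phase error (in LSBs) before and after each of the three steps."""
    error = initial_error_lsb
    trace = [error]
    for step in STEPS_LSB:
        direction = first_integrator_to_cross(error)
        error += step if direction == "up" else -step
        trace.append(error)
    return trace


if __name__ == "__main__":
    half_ui = LSB_PER_UI // 2
    for start in range(-half_ui, half_ui + 1, 4):
        print(start, fast_acquire(start))
```

In this model an initial error of 1 LSB follows the trace 1, -7, -3, -1, which shows the non-monotonic convergence the text mentions: the first 8-LSB correction overshoots, and the later steps halve the bound each time. Across the full span of ±16 LSBs the residual error after each step is no larger than that step's size.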
The fast lock process completes in 520UIs, comfortably within the standard's requirement of 1428UIs [2].

A retiming FIFO receives the recovered clock and data. It interrupts the accumulation of jitter in the FB-DIMM daisy-chain by retiming the recovered data to the local PLL clock. Optionally, the AMB can bypass the FIFO and forward the recovered data to the transmitter without retiming. As thru-latency is one of the key performance parameters of the AMB, the FIFO is designed to operate at 2.4GHz with both writes and reads accomplished in half a period (1UI). The FIFO (Fig. 18.6.6) is implemented as a 2-entry, 8-deep, dual-port register-file. By integrating an insertion MUX onto its read bit lines, data from the DDR interface can selectively be inserted and forwarded to the link transmitter. Read and write operations are differential and utilize monotonic, dual-rail domino signaling. A pair of ring counters generates the write and read pointers. Write-to-Read pointer spacing is programmable to 2, 3, or 4UIs so that the lowest setting, based on the expected accumulated jitter, can be selected.

The AMB die, shown in Fig. 18.6.7, is fabricated in a 0.13µm, 1.5V CMOS technology and occupies 9.2 × 4.5mm². The measured input sensitivity, with a minimum eye-opening of 0.35UI, is 50mV p-p at a BER of 10^-12, and is better than 170mV p-p for an extrapolated BER of 10^-16, exceeding the standard's requirement of 170mV p-p at a BER of 10^-12 [1]. Limited only by the available chipset support, a cascade of up to 4 FB-DIMMs interoperates with a host processor without error.

References:
[1] FB-DIMM Draft Specification: High Speed Differential PTP Link at 1.5V, Dec. 2004.
[2] FB-DIMM Draft Specification: AMB, Jan. 2005.

ISSCC 2006 / February 7, 2006 / 4:15 PM

Figure 18.6.1: High-speed lane architecture.
Figure 18.6.2: Electrical idle detection circuit.
Figure 18.6.3: Polyphase IQ generator. [Figure labels: I Clk, Q Clk, ½ ± 1.5ps]
Figure 18.6.4: CDR with fast acquisition.
Figure 18.6.5: The CDR initial phase convergence. [Figure labels: 8, 4, 2]
Figure 18.6.6: Retiming FIFO with integrated insertion MUX.
Figure 18.6.7: AMB die micrograph. [Figure labels: DDR2 Interface, Digital Core, From the Host (Southbound), PLL, To the Host (Northbound), HS Lane]
