Clock Generation and Distribution for High-Performance Processors

Stefan Rusu
Senior Principal Engineer, Enterprise Microprocessor Division
Intel Corporation
stefan.rusu@intel.com

Outline
- Clock Distribution Trends
- Distribution Networks
- De-skew Circuits
- Jitter Reduction Techniques
- Clock Power Dissipation
- Future Directions
- Summary
SoC 2004 Stefan Rusu

Clock Definition and Parameters
The clock is a periodic synchronization signal used as a time reference for data transfers in synchronous digital systems.
- Skew (t_skew): spatial variation of the clock signal as distributed through the chip; global vs. local skew
- Clock jitter (t_jitter): temporal variation of the clock with respect to a reference edge; long-term vs. cycle-to-cycle jitter
- Duty cycle variation (t_high vs. t_low): 50/50 design target
[Figure: Ref Clk and End Clk waveforms annotated with t_skew, t_jitter, t_high and t_low]
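
The three parameters defined above can be computed from measured edge times. The following is a minimal sketch, not taken from the slides: the function names are hypothetical, times are assumed to be in picoseconds, and long-term jitter is modeled as the largest deviation of any edge from its ideal position (one common reading of the term).

```python
def skew_ps(arrival_times_ps):
    """Skew: spread in arrival time of the same clock edge at different chip locations."""
    return max(arrival_times_ps) - min(arrival_times_ps)


def cycle_to_cycle_jitter_ps(rising_edges_ps):
    """Largest change in period between two consecutive cycles at one location."""
    periods = [b - a for a, b in zip(rising_edges_ps, rising_edges_ps[1:])]
    return max(abs(q - p) for p, q in zip(periods, periods[1:]))


def long_term_jitter_ps(rising_edges_ps, nominal_period_ps):
    """Largest deviation of any edge from its ideal position (assumed definition)."""
    t0 = rising_edges_ps[0]
    return max(abs(t - (t0 + n * nominal_period_ps))
               for n, t in enumerate(rising_edges_ps))


def duty_cycle(t_high_ps, t_low_ps):
    """Fraction of the period spent high; the design target is 0.5."""
    return t_high_ps / (t_high_ps + t_low_ps)


# Hypothetical measurements, for illustration only.
arrivals = [0, 12, 5, 20]              # same edge seen at four locations
edges = [0, 500, 1004, 1498, 2000]     # successive rising edges at one location
print(skew_ps(arrivals), cycle_to_cycle_jitter_ps(edges),
      long_term_jitter_ps(edges, 500), duty_cycle(260, 240))
```

Skew compares different locations at one instant, while both jitter figures compare one location against itself over time.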

Processor Frequency Trend
[Figure: processor frequency in MHz (log scale, 10 to 10000) versus year (1987 to 2005), with points for the 386, 486, Pentium, Pentium Pro, Pentium II, Pentium III and Pentium 4]

Clock Skew Trend
[Figure: clock skew in ps (0 to 600) versus processor frequency in MHz (log scale, 100 to 10000)]
Source: ISSCC and JSSC papers

Relative Clock Skew
Clock skew accounts on average for ~5% of the cycle time.
[Figure: clock skew as a percentage of cycle time (0 to 10%) versus processor frequency in MHz (log scale, 100 to 10000)]
Source: ISSCC and JSSC papers
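
The ~5% figure relates absolute skew to clock frequency. A short sketch of that arithmetic follows; the function names and the example numbers are hypothetical and are not data points from the plot.

```python
def cycle_time_ps(freq_mhz):
    """Cycle time in picoseconds for a clock frequency given in MHz."""
    return 1e6 / freq_mhz


def relative_skew_percent(skew_ps, freq_mhz):
    """Clock skew expressed as a percentage of the cycle time."""
    return 100.0 * skew_ps / cycle_time_ps(freq_mhz)


def skew_budget_ps(freq_mhz, percent=5.0):
    """Absolute skew that corresponds to a given percentage of the cycle."""
    return cycle_time_ps(freq_mhz) * percent / 100.0


# Illustrative values only: 25 ps of skew at 2 GHz is 5% of a 500 ps cycle.
print(relative_skew_percent(25, 2000))
print(skew_budget_ps(1000))
```

Holding the percentage constant means the absolute skew budget shrinks in proportion to the cycle time as frequency rises.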

Sources of Clock Skew
With a perfectly balanced distribution, device mismatch is the largest contributor to the clock skew.
[Figure: bar chart of percentage contribution (0 to 60%) for temperature mismatch, load mismatch, supply mismatch and device mismatch (Le)]
Geannopoulos, ISSCC-1998

Clock Jitter Trend
[Figure: clock jitter in ps (0 to 500) versus processor frequency in MHz (log scale, 100 to 10000)]
Source: ISSCC and JSSC papers

Clock Distribution Networks
[Figure: distribution topologies: Tree, Mesh, Grid, H-Tree, X-Tree, Tapered H-Tree]
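
The H-tree is the balanced topology among these: every leaf tap sits at the same wire length from the root, so an ideal H-tree has zero nominal skew. The sketch below builds an idealized H-tree recursively and reports that length. The function name, the geometry and the dimensions are illustrative assumptions, not taken from the slides.

```python
def h_tree_leaves(x, y, half, levels, length=0.0, horizontal=True):
    """Return (x, y, wire_length_from_root) for each leaf of an idealized H-tree.

    Each level branches two ways, alternating horizontal and vertical segments.
    The segment length halves after every vertical branch, which draws one 'H'
    per pair of levels.
    """
    if levels == 0:
        return [(x, y, length)]
    leaves = []
    for sign in (-1, 1):
        nx = x + sign * half if horizontal else x
        ny = y if horizontal else y + sign * half
        next_half = half if horizontal else half / 2
        leaves += h_tree_leaves(nx, ny, next_half, levels - 1,
                                length + half, not horizontal)
    return leaves


# Four branching levels from the chip center give a 4 x 4 array of taps.
leaves = h_tree_leaves(0.0, 0.0, 4.0, 4)
print(len(leaves), {length for _, _, length in leaves})
```

All 16 taps report the same path length, which is the property that makes the H-tree a natural starting point before mismatch and load imbalance are considered.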

Inductance Effect
[Figure]
Xanthopoulos, ISSCC-2001

Itanium Processor Clock Hierarchy
[Figure: block diagram with labels CLKP, CLKN, VCC/2, Reference Clock, PLL, Main Clock, DSK, RCD, DLCLK and OTB, organized into global, regional and local distribution]
Rusu, ISSCC-2000

Local Clock Distribution
Local clock distribution enables flexible skew management to support:
- Intentional clock skew insertion for timing optimization
- Clock gating for power reduction
[Figure: regional clock grid feeding normal local clock buffers and an intentional skew buffer around a combinatorial block]
Rusu, ISSCC-2000

Itanium 2 Processor Clock Distribution
- First level: pseudo-differential, impedance-matched branching, balanced H-tree
- Second level: balanced, width- and length-tuned binary H-tree
- Second level clock buffers (SLCBs): adjustable delay buffers
- Gaters: all constant input loading, with load-tuned drive strength
- Each SLCB drives ~70 tap points of ~8 gaters each
[Figure: distribution from the primary driver through repeaters (5) and SLCBs (33) to the gaters]
Anderson, ISSCC-2002

Optical Skew Probing
- Clock edge generates infrared photon emission
- Emission peak indicates clock transition edge
[Figure: switching waveforms labeled Vin, Idn and Vout, with the resulting photon emission]
Tam, VLSI Symposium, 2003

Optical Probing Results
[Figure]
Tam, VLSI Symposium, 2003

130nm Itanium 2 Skew Profile
[Figure: relative delay in ps (-10 to 70) versus clock zone (1 to 21) for three cases: Default, Fuse Adjusted and SCAN Adjusted]
Tam, VLSI Symposium, 2003

Pentium 4 Processor Clock Network
2GHz triple-spine clock distribution (180nm)
[Figure: PLL and clock network]
Kurd, JSSC-2001

90nm Clock Distribution
Sub-10ps clock skew demonstrated in a 90nm processor using clock tree averaging
[Figure: skew in ps]
Bindal, ISSCC 2003

Pentium 4 Processor Clock Skew
- 130nm Pentium 4 Processor: 22ps
- 90nm Pentium 4 Processor: 7ps
The 90nm design has 3x lower clock skew than the 130nm design.
Schutz, ISSCC 2004

Alpha* Processors Clocking

Product      21064        21164        21264
Frequency    166MHz       300MHz       600MHz
Transistors  1.7M         9.3M         9.3M
Process      0.75um, 4ML  0.5um, 4ML   0.35um, 6ML
Power        25W          50W          72W
Clock load   2.75nF       3.75nF       2.8nF

[Figure: clock floorplan per product showing PLL, pre-driver and final drivers; clock skew plot with skew in ps (0 to 75) over the chip horizontal and vertical axes]
* Other names and brands may be claimed as the property of others
Gronowski, JSSC 1998

1.2GHz Alpha* Processor Clock
[Figure: clock generation diagram with labels PLL, DLL (three instances), GCLK, NCLK, L2LCLK and L2RCLK]
* Other names and brands may be claimed as the property of others
Xanthopoulos, ISSCC-2001

Power4* Clock Distribution
- Dual core, SOI process, 174M transistors
- Measured clock skew below 25ps
[Figure: PLL and clock distribution with labels Ref clk in, Ref clk out, Bypass, PLL out, Feedback and Global Clock Grid]
* Other names and brands may be claimed as the property of others
Restle, ISSCC-2002

Power4* - 3D Skew Visualization
[Figure: 3D plot of delay in ps (100 to 800) over chip X and Y position, showing buffer levels 1 to 4, sector buffers, tuned sector trees and the grid]
* Other names and brands may be claimed as the property of others
Restle, ISSCC-2002

Dual-Zone Clock Deskew
[Figure: block diagram with labels X Clk, FB Clk, Clk_Gen, Delay Line (two), Delay SR (two), Deskew Ctl, PD, CL, Left Spine, Right Spine and Core]
Geannopoulos, ISSCC-1998

Itanium Processor Clock Deskew
- Distributed array of deskew buffers to reduce process related skew
- 8 deskew clusters each holding up to 4 buffers
- 30 deskew zones
[Figure: die floorplan showing DSK and CDC locations; DSK = cluster of 4 deskew buffers, CDC = Central Deskew Controller]
Rusu, ISSCC-2000

Itanium Processor Deskew Buffer
- Step size = 8.5ps
- Deskew range = 170ps
- Small step size enables fine skew control over a wide range
- TAP read/write access to the Control Register enables faster timing debug and performance tuning
[Figure: deskew buffer schematic with Input, Output, Enable#, TAP I/F and a 20-bit Delay Control Register]
Rusu, ISSCC-2000
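
The quoted numbers are mutually consistent: 170ps / 8.5ps = 20 steps, which matches a 20-bit register if each bit enables one delay step. That one-bit-per-step (thermometer) encoding is an assumption made for this sketch and is not stated on the slide; the function names are hypothetical as well.

```python
STEP_PS = 8.5      # step size from the slide
RANGE_PS = 170.0   # deskew range from the slide
N_BITS = 20        # register width from the slide


def delay_for_setting(n_steps):
    """Added delay in ps when n_steps delay steps are enabled."""
    if not 0 <= n_steps <= N_BITS:
        raise ValueError("setting out of range")
    return n_steps * STEP_PS


def setting_for_delay(target_ps):
    """Nearest setting for a requested delay, clamped to the available range."""
    n = round(target_ps / STEP_PS)
    return max(0, min(N_BITS, n))


def thermometer_code(n_steps):
    """Assumed register image: one bit per enabled step, least significant bits first set."""
    return "0" * (N_BITS - n_steps) + "1" * n_steps


n = setting_for_delay(30.0)
print(n, delay_for_setting(n), thermometer_code(n))
```

With this step size the worst-case quantization error is half a step, 4.25ps, which bounds how closely any single zone can be aligned.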

Pentium 4 Processor Deskew
[Figure: logical diagram of the skew optimization circuit; phase detector network]
Kurd, JSSC-2001

Deskew Techniques Summary

Author        Source    Clock Zones  Skew Before  Skew After  Step Size
Geannopoulos  ISSCC-98  2            60ps         15ps        12ps
Rusu          ISSCC-00  30           110ps        28ps        8ps
Kurd          ISSCC-01  47           64ps         16ps        8ps
Stinson       ISSCC-03  23           60ps         7ps         7ps

- Clock deskew techniques compensate for device and interconnect within-die variations
- Deskew circuits cut clock skew to roughly a quarter of the original value, or less
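
The quarter figure can be checked directly against the table. The sketch below uses only the before/after values listed above; the variable and function names are hypothetical.

```python
# (skew before, skew after) in ps, as listed in the summary table.
DESKEW_RESULTS = {
    "Geannopoulos ISSCC-98": (60, 15),
    "Rusu ISSCC-00": (110, 28),
    "Kurd ISSCC-01": (64, 16),
    "Stinson ISSCC-03": (60, 7),
}


def residual_fraction(before_ps, after_ps):
    """Fraction of the original skew that remains after deskew."""
    return after_ps / before_ps


RESIDUALS = {name: residual_fraction(*pair) for name, pair in DESKEW_RESULTS.items()}

for name, frac in RESIDUALS.items():
    print(f"{name}: {frac:.1%} of original skew remains")
```

Three of the four designs land at about 25% of the original skew, and the most recent one at about 12%.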

Useful Clock Skew
- Use de-skew buffers to insert intentional skew to maximize the processor operating frequency
- Larger benefit achieved in early steppings
[Figure: two bar charts of frequency improvement in MHz per sample: initial stepping (0 to 300 MHz, samples 1 to 4) and subsequent stepping (0 to 40 MHz, samples 1 to 3)]
Tam, VLSI Symposium, 2003
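
Why intentional skew raises frequency can be seen with two unbalanced pipeline stages that share a middle flip-flop. Delaying the middle clock lends time from the short stage to the long one. This is a simplified sketch: it ignores setup time, clock-to-Q delay and hold constraints, and the stage delays are hypothetical.

```python
def min_cycle_ps(d1_ps, d2_ps, mid_skew_ps=0.0):
    """Minimum cycle time for stage 1 -> middle flop -> stage 2.

    The middle flop's clock is delayed by mid_skew_ps relative to the flops
    at both ends, so stage 1 gains that much time and stage 2 loses it.
    """
    return max(d1_ps - mid_skew_ps, d2_ps + mid_skew_ps)


def best_mid_skew_ps(d1_ps, d2_ps):
    """Skew that balances the two stages under the simplified model."""
    return (d1_ps - d2_ps) / 2.0


d1, d2 = 600.0, 400.0                      # illustrative stage delays
s = best_mid_skew_ps(d1, d2)
print(min_cycle_ps(d1, d2), s, min_cycle_ps(d1, d2, s))
```

In this toy case 100ps of inserted skew shortens the cycle from 600ps to 500ps. A real design would also have to check hold margins on the path that loses time.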

Pentium 4 Processor Jitter Reduction
- RC-filtered power supply for clock drivers reduces clock distribution jitter
- Example shown: 10% dip in core supply, 2% dip in filtered supply
[Figure: RC filter schematic with labels Vcc, R, C, I and filtered node Vcc - IR; plot of jitter in ps (-60 to 60) versus cycle number (0 to 50), with filter and without filter]
Kurd, JSSC-2001
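
A first-order model shows how an RC filter can turn a 10% dip into a 2% dip. The sketch assumes an ideal single-pole RC filter, a rectangular dip of fixed duration and a constant load current, none of which is stated on the slide; the 1ns dip duration and the function names are hypothetical.

```python
import math


def filtered_dip(core_dip, dip_duration, rc):
    """Peak dip seen after an ideal RC low-pass for a rectangular input dip.

    core_dip is a fraction of Vcc; dip_duration and rc share the same time unit.
    """
    return core_dip * (1.0 - math.exp(-dip_duration / rc))


def rc_for_target(core_dip, target_dip, dip_duration):
    """RC time constant that limits the filtered dip to target_dip."""
    return -dip_duration / math.log(1.0 - target_dip / core_dip)


# Illustrative: a 10% dip lasting 1 ns, to be reduced to 2% at the clock drivers.
rc_ns = rc_for_target(0.10, 0.02, 1.0)
print(round(rc_ns, 3), filtered_dip(0.10, 1.0, rc_ns))
```

Under these assumptions the time constant has to be about 4.5 times the dip duration. Longer droops pass through the filter with less attenuation.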

Alpha* Processor Voltage Regulator
- Voltage regulator ensures optimum DLL tracking
- Supply noise frequencies over 1MHz are attenuated by more than 15dB
[Figure: regulator schematic with labels 1.5V, 2.5V, LPF, amplifier (- +) and DLL; plot of PSRR in dB (0 to -60) versus frequency in Hz (1.0E+02 to 1.0E+10)]
* Other names and brands may be claimed as the property of others
Xanthopoulos, ISSCC-2001
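
For reference, 15dB of supply rejection corresponds to a voltage ratio of about 0.18. The conversion below is standard decibel arithmetic; the 50mV noise amplitude is a hypothetical input, not a figure from the slide.

```python
def attenuation_ratio(db):
    """Voltage ratio remaining after an attenuation of db decibels."""
    return 10.0 ** (-db / 20.0)


def residual_noise_mv(noise_mv, db):
    """Noise amplitude left on the regulated supply after db of rejection."""
    return noise_mv * attenuation_ratio(db)


print(round(attenuation_ratio(15.0), 4))      # fraction of the noise that remains
print(round(residual_noise_mv(50.0, 15.0), 2))
```

An attenuation of more than 15dB therefore leaves less than about 18% of the incoming supply noise amplitude at the DLL.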

On-Die Clock Jitter Detector
[Figure: block diagram with labels Internal Clock, clk, ref, delay elements (0.5 * DL), Array Phase Detector with phase bins, Post Process Circuitry + Registers, Counter (inc/dec) and Digital LPF (up/dn#)]
Kuppuswamy, VLSI Symposium 2001

Array Phase Detector
- 7 elements above and below center, with increasing positive and negative built-in offset away from center
- Phase offset created by progressively delaying data with respect to clock
[Figure: column of flip-flops (FF) driven by clk and ref]

Histogram Mode Operation
[Figure: Array Phase Detector, XOR Logic and Error Detection Logic; output is a histogram of jitter error count per bin]
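
A behavioral sketch of the array phase detector in histogram mode follows. It is one plausible reading of the block diagram, not the published circuit: each element is modeled as a comparator against its built-in offset, the XOR logic is taken to compare adjacent outputs to locate the edge, and the 5ps offset step is hypothetical.

```python
from collections import Counter

N_SIDE = 7             # elements above and below center (from the slide)
OFFSET_STEP_PS = 5.0   # hypothetical spacing between built-in offsets

# 15 built-in offsets: -35 ps ... 0 ... +35 ps
OFFSETS_PS = [k * OFFSET_STEP_PS for k in range(-N_SIDE, N_SIDE + 1)]


def sample_array(phase_error_ps):
    """Each element outputs 1 when the phase error exceeds its built-in offset."""
    return [1 if phase_error_ps > off else 0 for off in OFFSETS_PS]


def edge_bin(bits):
    """XOR adjacent outputs; the position of the 1 is the bin holding the edge.

    Returns None when the phase error falls outside the array range.
    """
    xors = [a ^ b for a, b in zip(bits, bits[1:])]
    return xors.index(1) if 1 in xors else None


def jitter_histogram(phase_errors_ps):
    """Count how often the edge lands in each bin over many cycles."""
    return Counter(edge_bin(sample_array(e)) for e in phase_errors_ps)


print(jitter_histogram([0.0, 7.0, -12.0, 0.0, 100.0]))
```

In this model the width of the populated bins, multiplied by the offset step, gives the measured jitter range. Graph mode would instead record the bin index cycle by cycle.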

Graph Mode Operation
[Figure: Array Phase Detector, XOR Logic and Error Detection Logic; output is jitter error (encoded bins) versus time]

Clock Power Breakdown Example
- 30% of the total power is attributed to clock
- Most of the clock power is used in the final clock buffers and flip-flops
[Figure: pie chart of total power: 1st Level 2.1%, 2nd Level 1.5%, 3rd Level 26.2%, Rest of chip 70.2%]
Anderson, ISSCC-2002

Clock Power Reduction
Clock Power = f * C * V^2
- Reduce clock frequency: multiple frequency domains; dual edge triggered flip-flops
- Reduce voltage swing: low swing clocks
- Reduce clock loading: clock gating; clock-on-demand flip-flop; optimized routing
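
Each of the three levers acts on one term of the formula. The sketch below evaluates the formula for a hypothetical baseline (1GHz, 1nF, 1.2V, none of which comes from the slides) and shows the ideal effect of each technique.

```python
def clock_power_w(freq_hz, cap_f, swing_v):
    """Clock power from the slide's formula: f * C * V^2."""
    return freq_hz * cap_f * swing_v ** 2


# Hypothetical baseline values, for illustration only.
F, C, V = 1e9, 1e-9, 1.2
base = clock_power_w(F, C, V)

half_freq = clock_power_w(F / 2, C, V)       # e.g. dual edge triggered flip-flops
half_swing = clock_power_w(F, C, V / 2)      # e.g. low swing clocks
gated = clock_power_w(F, 0.6 * C, V)         # e.g. 40% of the load gated off

for label, p in [("baseline", base), ("half frequency", half_freq),
                 ("half swing", half_swing), ("40% load gated", gated)]:
    print(f"{label}: {p:.3f} W ({p / base:.0%} of baseline)")
```

Because the swing enters squared, halving it is the strongest single lever in this idealized model (25% of baseline), ahead of halving the frequency (50%).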

Half Swing Clocking
- Requires four clock signals
- Two clock phases with a swing between Vdd and Vdd/2 drive the PMOS devices
- The other two phases, with a swing between Gnd and Vdd/2, drive the NMOS transistors
- Experimental clock power savings of 67% were demonstrated on a 0.5µm CMOS test chip with only 0.5ns speed degradation
- Requires additional area for the special clock drivers and suffers from skew problems between the four phases
Kojima, JSSC 1995

Clock-on-demand Flip-Flop
- Activates internal clock only when the input data will change the output; equivalent to single-bit clock gating
- Longer setup time and sensitive to hold time violations
Hamada, ISSCC 1999
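
The behavior can be modeled as a flip-flop that compares D with Q and pulses its internal clock only when they differ. This is a cycle-level sketch of that idea, not the published circuit; the function name and the data stream are hypothetical.

```python
def clock_on_demand(d_stream, q0=0):
    """Simulate one flip-flop; return (outputs, internal_clock_pulses).

    The internal clock fires only in cycles where D differs from the stored Q.
    """
    q = q0
    pulses = 0
    outputs = []
    for d in d_stream:
        if d != q:          # comparator: would this capture change the output?
            pulses += 1     # internal clock is activated for this cycle
            q = d
        outputs.append(q)
    return outputs, pulses


data = [0, 0, 1, 1, 1, 0, 0, 1]
outputs, pulses = clock_on_demand(data)
print(outputs, f"{pulses} internal pulses in {len(data)} cycles")
```

The output sequence matches an ordinary flip-flop, but the internal clock toggles in only 3 of 8 cycles here, so the saving grows as data activity falls.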

XScale Processor Clock Gating
Three hierarchical clock gating levels:
- Top level: stop clock
- Unit level: 83 enables
- Local clock buffers: 400 unique enables
[Figure: CLK SPINE (M5) and EGCLK (M6) feeding gated clocks such as GCLK_DA1, GCLK_DA2, GCLK_DA9, GCLK_DA10, GCLK_IA1, GCLK_IA2, GCLK_IA9, GCLK_IA10, GCLK_DC1, GCLK_IC1, GCLK_RF1 and GCLK_MA2, with enables such as DA_BNK1_EN#, IA_BNK1_EN# and GLB_MA_EN]
Clark, JSSC 11/2001
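
With hierarchical gating, a load sees the clock only when every level above it is enabled. The following is a minimal logical sketch of that rule; the function names and the example enable patterns are hypothetical and do not model the actual XScale enables.

```python
def clock_reaches_load(stop_clock, unit_enable, local_enable):
    """True when the clock propagates through all three gating levels."""
    return (not stop_clock) and unit_enable and local_enable


def active_fraction(stop_clock, loads):
    """Fraction of local clock buffers toggling.

    loads is a list of (unit_enable, local_enable) pairs, one per buffer.
    """
    if not loads:
        return 0.0
    active = sum(clock_reaches_load(stop_clock, u, l) for u, l in loads)
    return active / len(loads)


loads = [(True, True), (True, False), (False, True), (False, False)]
print(active_fraction(False, loads))   # only the first buffer toggles
print(active_fraction(True, loads))    # stop clock overrides everything
```

The coarse levels switch off large regions cheaply, while the many local enables capture fine-grained idleness.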

Dual Edge Triggered Flip-Flop
- Operates at half the clock frequency
- Requires tight control of the clock duty cycle
[Figure: transistor-level schematic with two first stages (X and Y) and a second stage driving Q; inverters Inv1 to Inv4 generate CLK1 to CLK4 from CLK]
Nedovic, ESSCIRC 2002
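
The duty cycle requirement follows from the fact that data is captured on both edges, so each logic stage must fit within the shorter clock phase. The sketch below ignores setup and clock-to-Q time, and the 500ps stage delay is hypothetical.

```python
def required_period_ps(stage_delay_ps, duty_cycle=0.5):
    """Minimum clock period for dual-edge clocking.

    Data launched on one edge is captured on the next opposite edge, so the
    stage delay must fit in the shorter of the high and low phases.
    """
    shorter_phase = min(duty_cycle, 1.0 - duty_cycle)
    return stage_delay_ps / shorter_phase


print(required_period_ps(500.0))         # ideal 50/50 clock
print(required_period_ps(500.0, 0.40))   # 40/60 clock
```

With a perfect 50/50 clock the period is twice the stage delay, which is the half-frequency operation noted above. A 40/60 clock stretches the required period by a further 25%.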

Rotary Clock Distribution
Transmission-line-based, self-regenerating rotary clock generator
Wood, ISSCC-2001

Standing Wave Oscillator
[Figure]
O'Mahony, ISSCC-2003

10GHz Clock Grid Test Chip
- Fabricated in a 0.18µm 1.8V 6M CMOS process
- Very low clock skew and power consumption
- Attractive alternative for 10GHz clocking and beyond

Optical Clock Distribution
- Board-level guided-wave H-tree distribution
- Monolithic silicon-based detection
- Couplers provide tolerance for horizontal and vertical misalignment of the flip-chip assembly
- Optical transmission is immune to process variations, power-grid noise and temperature
J.D. Meindl, Georgia Institute of Technology, 2000

Summary
- High-performance processors require a low-skew, low-jitter clock distribution network
- Clock distribution techniques are optimized to achieve the best skew and jitter with reduced area and power consumption
- Deskew techniques are demonstrated to cut the skew to about ¼ of its original value
- On-die supply filters are used to reduce jitter
- Intensive research focuses on novel clock distribution techniques