Power Distribution and Clock Design

Lecture 3: Power Distribution and Clock Design
R. Saleh, Dept. of ECE, University of British Columbia, res@ece.ubc.ca

Overview of Lecture
- Power distribution in the past was a fairly simple task.
- The goal of the power distribution system is to deliver the required current across the chip while maintaining the voltage levels necessary for proper operation of logic circuits.
- Interconnect effects have created problems of IR drop, Ldi/dt, and electromigration, so power distribution is now a complex task in deep submicron (DSM).
- Clock design is also a complex issue in DSM due to RC delay components in the interconnect and power dissipation.
- This lecture examines the issues of clock skew and IR drop, and how to manage them using circuit techniques.
- Reference: Power Grid and Clock Design, HJS Textbook, Chapter 11.

Design Issues of Power Distribution
Goal: get Vdd and Gnd to all gates in the circuit.
Design challenges:
- How many power and ground pins should we allocate?
- Which layers of metal should be used to route power/ground?
- How wide should we make the wires to minimize voltage drops and reliability problems?
- How do we maintain Vdd and Gnd within the noise budget?
- How do we verify the overall power distribution system?

Power Distribution Issues - IR Drop
[Figure: power grid with nodes n1-n8 fed from Vdd; interior nodes see less than Vdd]
- Narrowing line widths have increased metal line resistance.
- As current flows through the power grid, voltage drops occur (IR drops).
- The actual voltage supplied to gates is less than Vdd.
- IR drop impacts speed and functionality; it must stay within a 10% noise budget.
- Need to ensure this is not a problem near the end of the design at tapeout!
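As a rough illustration of how IR drop accumulates along a rail, the following sketch models a single power rail fed from one end, with equal-resistance segments between gate taps. The function names, the segment resistance, and the tap currents are all hypothetical values chosen for illustration; only the 10% noise budget comes from the slide.

```python
def rail_voltages(vdd, seg_res_ohm, tap_currents_a):
    """Return the voltage at each tap of a rail fed from one end only."""
    voltages = []
    v = vdd
    downstream = sum(tap_currents_a)  # current carried by the first segment
    for i_tap in tap_currents_a:
        v -= seg_res_ohm * downstream  # IR drop across this segment
        voltages.append(v)
        downstream -= i_tap            # this tap's current leaves the rail
    return voltages


def within_budget(vdd, voltages, budget=0.10):
    """True if the worst-case drop stays inside the noise budget."""
    return vdd - min(voltages) <= budget * vdd


# Hypothetical numbers: 1.2 V supply, 0.5 ohm per segment, four 10 mA taps.
v = rail_voltages(1.2, 0.5, [0.010] * 4)
print([round(x, 3) for x in v])  # the far-end tap sees the largest drop
print(within_budget(1.2, v))
```

The segment nearest the supply carries the current of every downstream tap, which is why the drop grows fastest near the source and why double-ended connections or wider trunks help.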

Power Grid Issues - Electromigration (EM)
[Figure: power grid with nodes n1-n8; annotation: current density < 10 mA/um^2]
- As large currents flow down narrow wires, the metal begins to migrate.
- Metal lines break over time due to metal fatigue.
- Mean time to failure is based on average/peak current density.
- Need to ensure that current density levels do not exceed the limits set by foundry design rules.
- Cu is nominally 10X better than Al, but we typically see 3X.
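A minimal sketch of the current-density check implied above, using the 10 mA/um^2 limit from the slide. The wire dimensions and branch current are hypothetical, and a real foundry rule deck would give separate average, RMS, and peak limits.

```python
J_LIMIT_MA_PER_UM2 = 10.0  # limit quoted on the slide


def current_density(i_ma, width_um, thickness_um):
    """Current density in mA/um^2 for a rectangular wire cross-section."""
    return i_ma / (width_um * thickness_um)


def min_width_um(i_ma, thickness_um, j_limit=J_LIMIT_MA_PER_UM2):
    """Smallest wire width that keeps the density at or below the limit."""
    return i_ma / (j_limit * thickness_um)


# Hypothetical branch: 20 mA through a wire 1.0 um wide and 0.5 um thick.
j = current_density(20.0, 1.0, 0.5)
print(j, j <= J_LIMIT_MA_PER_UM2)  # density and whether it passes
print(min_width_um(20.0, 0.5))     # width needed to meet the limit
```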

Power Routing Examples
How do we deliver power to two adjacent blocks while avoiding IR drop and EM problems?
[Figure: Block A and Block B fed by a single trunk versus multiple trunks]

Simple Routing Examples (cont'd)
[Figure: Block A and Block B with double-ended connections versus wider trunks]

Interleaved Power/Ground Routing
[Figure: interleaved Vdd/Gnd lines]

Power Grid Architecture
[Figure: power grid using Metal4/Metal5 connected by via arrays]


Power Grid Issues - Static IR Drop
Block placement and global power routing determine the IR drop on the chip.
Possible solutions:
- Rearrange blocks.
- Add more Vdd pins.
- Connect the bottom portion of the grid to the top portion.

Power Grid Issues - Static IR Drop (cont'd)
If we connect the bottom portion of the grid to the top portion, the IR drop is reduced significantly. However, this is only one part of the problem: we must also examine electromigration.

Case Study: IR and EM Tradeoff

Block Interaction Yields IR Drop

Effect of Ldi/dt
- In addition to IR drop, power system inductance is also an issue.
- Inductance may be due to a power pin or a power bump.
- Overall voltage drop is: $V_{drop} = IR + L\,\frac{di}{dt}$
- Simple example: drop across inductors = 2 x L x di/dt = 2 x 0.2nH x 20mA/100ps = 80mV (problematic if the supply is 1.2V).
- An actual power pad or bump may need to support thousands of inverters.
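The arithmetic of the simple example can be reproduced with a short sketch. The function name and the `n_inductors` parameter are illustrative; the factor of 2 is taken directly from the slide (presumably one inductor in the supply path and one in the ground path).

```python
def supply_drop(i_a, r_ohm, l_h, di_a, dt_s, n_inductors=1):
    """V_drop = I*R + n * L * di/dt, all quantities in SI units."""
    return i_a * r_ohm + n_inductors * l_h * di_a / dt_s


# Slide example: inductive term only, 0.2 nH, 20 mA switched in 100 ps.
drop = supply_drop(0.0, 0.0, 0.2e-9, 20e-3, 100e-12, n_inductors=2)
print(f"{drop * 1e3:.0f} mV, {100 * drop / 1.2:.1f}% of a 1.2 V supply")
```

This is the drop caused by a single 20 mA switching event; a pad or bump that feeds thousands of inverters sees a correspondingly larger di/dt.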

IR Drop and Ldi/dt are Dynamic Phenomena

On-chip Decoupling Capacitors
- On-chip decaps help to stabilize the power grid voltage.
- They are the first line of defense against noise, which can extend beyond 10GHz.
- Distribute decoupling capacitors (decaps) liberally throughout the design.
- Capacitors store up charge and can provide an instantaneous source of current for switching.
- Later, the decap charges back up to prepare for the next event.

Making a Decoupling Cap
- Decaps are basically NMOS transistors: the top plate is polysilicon, the bottom plate is the inverted channel, and the insulator is the gate oxide.
- Connect poly to Vdd and source/drain to Vss.
- Low-frequency capacitance is roughly $C_{OX} W L$.
- Since these are large capacitors to be used at high frequencies, a more accurate representation is needed.
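As a back-of-the-envelope sketch of the low-frequency estimate $C_{OX} W L$, the code below computes the oxide capacitance per unit area from the oxide thickness. The 2 nm oxide thickness and the device dimensions are assumptions for illustration, not values from the lecture.

```python
EPS_0 = 8.854e-12  # permittivity of free space, F/m
K_OX = 3.9         # relative permittivity of SiO2


def cox_ff_per_um2(tox_nm):
    """Gate-oxide capacitance per unit area in fF/um^2."""
    cox_f_per_m2 = K_OX * EPS_0 / (tox_nm * 1e-9)
    return cox_f_per_m2 * 1e3  # 1 F/m^2 equals 1e3 fF/um^2


def decap_ff(w_um, l_um, tox_nm):
    """Low-frequency decap estimate C_OX * W * L in fF."""
    return cox_ff_per_um2(tox_nm) * w_um * l_um


# Assumed device: W = 10 um, L = 1 um, tox = 2 nm.
print(round(decap_ff(10.0, 1.0, 2.0), 1), "fF")
```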

Standard Cell Decap Layout
Standard cell decaps typically have the following layout, since we have access to both P and N devices.
[Figure: decap cell layout between the Vdd and Vss rails]

Decap High-Frequency Response
- Channel resistance affects the response time.
- Finite transit time affects the capacitance value.
[Figure: NMOS decap cross-sections showing the gate, the n+ regions, and the charge distribution in the channel]

Use Fingers
- Example: with each division, resistance is reduced but so is capacitance.
- Question: what is the optimum number of fingers?
- Actually, PMOS is worse than NMOS, so one option is to use NMOS only.

How Much Decoupling Cap?
To estimate the required decap value, run SPICE on a patch of chip area with the power grid, part of a logic block, and a sprinkle of decaps.
The amount of decap depends on:
- Acceptable ripple on Vdd-Vss (typically a 10% noise budget)
- Switching activity of logic circuits (usually need 10X the switched cap)
- Current provided by the power grid (di/dt)
- Required frequency response (high-frequency operation)
- How much decap already exists (non-switching diffusion, gate, and wire caps)
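One common way to see why a 10% ripple budget leads to roughly 10X the switched capacitance is a charge-sharing argument. The sketch below assumes, pessimistically, that the decap alone supplies the switching charge and the grid supplies none during the event; the lecture does not state this derivation, so treat it as an illustrative assumption. With decap $C_d$ precharged to Vdd and switched capacitance $C_{sw}$ initially at 0, the voltage after sharing is $V_{dd} C_d / (C_d + C_{sw})$, so the fractional droop is $C_{sw} / (C_d + C_{sw})$.

```python
def droop_fraction(c_decap, c_switched):
    """Fractional supply droop after charge sharing (grid current ignored)."""
    return c_switched / (c_decap + c_switched)


def decap_for_ripple(c_switched, ripple):
    """Decap needed so that charge sharing alone stays within `ripple`."""
    return c_switched * (1.0 - ripple) / ripple


# 10% budget: the required decap is about 9x the switched capacitance.
print(decap_for_ripple(1.0, 0.10))
```

Because the power grid does supply current during switching, the decap actually required in practice can be considerably smaller than this bound.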

Decap Placement
Empty space is not necessarily the best place to fill with decap, since P&R is done with timing and power constraints in mind. One method would be to shift cells around so that decaps can be placed where they are needed.
Experiment:
- Choose 4 different configurations: all decap in the center; all decap in the corners; decap distributed evenly; decap near cells that violate the noise margin.
- Use an equal number of decaps for each configuration (equal area penalty).
- Artificially manipulate the capacitance of each cell until the 10% Vdd noise is eliminated.
- The best placement scheme is the one that requires the least amount of decoupling capacitance.

Noise Violation Configuration

Decap Configurations
[Figure: four decap placements: Center, Corner, Evenly Distributed, Noise Violation]

Where to Place Decaps?
[Figure: noise maps for the Center, Corner, Evenly Distributed, and Noise Violation placements]

Results
- Noise Violation configuration: although it requires the most decap to eliminate ALL violations, it requires the least to eliminate 99% of the violations.
- Decaps should be placed between the charge source and the destination.
- Total switching capacitance in the block is 350pF.
- The ratio between decoupling capacitance and switching capacitance seems to be between 1.5-2x.

Strategy              Total Decap
Center                684pF
Corner                586pF
Evenly Distributed    707pF
Noise Violations      733pF
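The ratio quoted above can be checked directly from the table. This sketch only divides the reported totals by the 350pF switching capacitance; the variable names are illustrative.

```python
SWITCHING_PF = 350.0  # total switching capacitance in the block (from the slide)

TOTAL_DECAP_PF = {
    "Center": 684.0,
    "Corner": 586.0,
    "Evenly Distributed": 707.0,
    "Noise Violations": 733.0,
}

# Decap-to-switching-capacitance ratio for each placement strategy.
ratios = {name: cap / SWITCHING_PF for name, cap in TOTAL_DECAP_PF.items()}

for name, ratio in ratios.items():
    print(f"{name:20s} {ratio:.2f}x")
```

The ratios run from about 1.7x to about 2.1x, consistent with the rough 1.5-2x range stated on the slide.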

Designing Power Distribution
- The floorplanner should be aware of IR + Ldi/dt drop and EM problems and design accordingly.
- This requires knowledge of the current distributions and voltage drop constraints of the blocks being placed.
- Provide an adequate number of Vdd and Gnd pins.
- May need to provide multiple Vdd islands for low power.
- Route the power distribution system according to the current demands of the blocks.
- Widen wires based on the expected current density in the branches.
- Distribute decoupling capacitors liberally throughout the design.
- Verify the full chip with IR/EM tools.

Clock and Flip-flop Design
- Clocks synchronize the operation of sequential logic circuits.
- Flip-flops and latches are used to gate signals through combinational logic on the clock edges.
- Critical parameters of flip-flops are the setup and hold times.
- Once we design the basic flops, we must build a clock network that gets the signal to the flops at roughly the same time.
- We will look at clock trees, H-trees, and clock grids.
- Overall examination of the issues of clock skew, jitter, power, and IR drop, and how to manage them using circuit techniques.

Clocked D Flip-flop
- Most widely used FF in IC design for temporary storage of data.
- May be edge-triggered (flip-flop) or level-sensitive (transparent latch).
[Figure: flip-flop and latch symbols with data input D, clock CK, and output Q]

D    Q(n+1)
0    0
1    1

Latch vs. Flip-flop
Latch (level-sensitive, transparent):
- When the clock is high, it passes the In value to Out.
- When the clock is low, it holds the value that In had when the clock fell.
Flip-flop (edge-triggered, non-transparent):
- On the rising edge of the clock (positive-edge triggered), it transfers the value of In to Out.
- It holds the value at all other times.
[Figure: In, Clk, and Out waveforms for a latch and for a flip-flop]
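The difference between the two elements can be seen in a small behavioral sketch. The class names and the sampled waveforms are illustrative, and the model ignores setup, hold, and propagation delays entirely.

```python
class Latch:
    """Level-sensitive: transparent while clk is high, holds while low."""

    def __init__(self):
        self.q = 0

    def step(self, clk, d):
        if clk:
            self.q = d
        return self.q


class FlipFlop:
    """Positive-edge triggered: samples d only on a 0 -> 1 clock transition."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, clk, d):
        if clk and not self._prev_clk:
            self.q = d
        self._prev_clk = clk
        return self.q


clk = [0, 1, 1, 0, 0, 1, 1, 0]
din = [1, 1, 0, 0, 0, 0, 1, 1]

latch, ff = Latch(), FlipFlop()
latch_out = [latch.step(c, d) for c, d in zip(clk, din)]
ff_out = [ff.step(c, d) for c, d in zip(clk, din)]
print(latch_out)  # follows din whenever clk is high
print(ff_out)     # changes only at rising clock edges
```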

Clocking Overhead
FFs and latches have setup and hold times that must be satisfied.
[Figure: timing diagrams. Flip-flop: Din arrival windows labelled "will work", "may work", and "won't work" around $T_{setup}$ and $T_{hold}$ at the clock edge, with Qout valid after $T_{setup} + T_{clk-q}$. Latch: Din, Clk with $T_{setup}$ and $T_{hold}$, and Qout valid after $T_{d-q}$]
- If Din arrives before the setup time and is stable after the hold time, the FF will work; if Din arrives after the hold time, it will fail; in between, it may or may not work.
- The FF delays the slowest signal by the setup + clk-q delay in the worst case.
- The latch has small setup and hold times, but it delays late-arriving signals by $T_{d-q}$.

Clock Definitions
- Duty cycle = % of time the clock is high over the clock period
- Edge rate = rise time of the clock edge from 10% to 90%
- Latency = total path delay from root clock to leaf clock (clock delay)
- Skew = difference in latency between any two clock branches (spatial variation)
- Jitter = variation in latency at any single leaf clock (temporal variation)
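To make the distinction between skew and jitter concrete, the following sketch computes both from per-cycle latency samples at two leaf clocks. The leaf names and latency values are made up, and measuring skew between mean latencies and jitter as peak-to-peak variation are assumptions; other conventions (for example RMS jitter) are also in use.

```python
from statistics import mean

# Hypothetical latency samples in ps, one value per clock cycle per leaf.
latency_ps = {
    "ff_a": [500, 504, 496],
    "ff_b": [520, 522, 518],
}

# Skew: spatial variation, the spread of mean latency across leaves.
mean_latency = {leaf: mean(samples) for leaf, samples in latency_ps.items()}
skew_ps = max(mean_latency.values()) - min(mean_latency.values())

# Jitter: temporal variation at one leaf, taken here as peak-to-peak.
jitter_ps = {leaf: max(s) - min(s) for leaf, s in latency_ps.items()}

print(skew_ps)
print(jitter_ps)
```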

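Clock Design Issues
The clock cycle depends on a number of factors:
$T_{cycle} = T_{Clk-Q} + T_{Logic} + T_{setup} + T_{skew} + T_{jitter}$
[Figure: two D flip-flops separated by a logic block, with $T_{Clk-Q}$, $T_{Logic}$, and $T_{setup}$ marked along the data path and $T_{skew}$ and $T_{jitter}$ marked on the clocks]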
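A minimal sketch of the cycle-time equation, summing the five terms and converting the result to a maximum clock frequency. All of the delay values are hypothetical.

```python
def min_cycle_ps(t_clk_q, t_logic, t_setup, t_skew, t_jitter):
    """T_cycle = T_clk-q + T_logic + T_setup + T_skew + T_jitter (all in ps)."""
    return t_clk_q + t_logic + t_setup + t_skew + t_jitter


def f_max_ghz(t_cycle_ps):
    """Maximum clock frequency in GHz for a cycle time in ps."""
    return 1000.0 / t_cycle_ps


# Hypothetical path: 100 ps clk-q, 700 ps logic, 50 ps setup,
# 100 ps skew, 50 ps jitter.
t = min_cycle_ps(100, 700, 50, 100, 50)
print(t, "ps ->", f_max_ghz(t), "GHz")
```

In this example skew and jitter together take 150 ps of a 1000 ps cycle, time that would otherwise be available for logic.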

Clock Design Goals
Meet design specs:
- Max skew
- Min/max latency (delay)
- Duty cycle (rise/fall)
- Max jitter
Verify resulting:
- Power consumption
- Area (gate count)

Tree and Grid
Tree:
- Minimal area cost
- Requires clock-tree management
- Use a large superbuffer to drive downstream buffers
- Balancing may be an issue
Grid:
- Greater area cost
- Easier skew control
- Increased power consumption
- Electromigration risk increased at drivers
- Severely restricts floorplan and routing

Classic H-Tree
- Place the clock root at the center of the chip and distribute it as an H-tree structure to all areas of the chip.
- The clock is delayed by an equal amount to every section of the chip.
- Local skew inside blocks is kept within tolerable limits.

Clock Skew Analysis
Clock skew causes two problems:
1. The cycle time gets longer by the skew: $T_{cycle} = T_d + T_{setup} + T_{clk-q} + T_{skew}$. This shows up as a SETUP time violation. Fix: shorten the critical path.
2. The part can get the wrong answer when $T_{skew} + T_{hold} > T_{clk-q}$ (flop-to-flop path with $T_d = 0$). This shows up as a HOLD time violation. Fix: insert buffers (delay elements).
[Figure: flop-logic-flop path with late and early clocks for the setup case; flop-to-flop path with early and late clocks for the hold case]
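The hold condition above can be sketched as a simple check that also reports how much delay padding would remove the violation. The slide states the condition for $T_d = 0$; adding the data-path delay `t_d` to the right-hand side is the usual generalization and is an assumption here, as are the function names and numbers.

```python
def hold_violation(t_clk_q, t_hold, t_skew, t_d=0.0):
    """True when T_skew + T_hold > T_clk-q + T_d (hold-time failure)."""
    return t_skew + t_hold > t_clk_q + t_d


def padding_needed(t_clk_q, t_hold, t_skew, t_d=0.0):
    """Extra data-path delay that brings the path to the hold boundary."""
    return max(0.0, t_skew + t_hold - t_clk_q - t_d)


# Hypothetical back-to-back flops: 80 ps clk-q, 30 ps hold, 100 ps skew.
print(hold_violation(80, 30, 100))          # violation with no logic delay
pad = padding_needed(80, 30, 100)
print(pad)                                  # buffer delay to insert, in ps
print(hold_violation(80, 30, 100, t_d=pad)) # no violation after padding
```

Unlike a setup violation, a hold violation cannot be fixed by slowing the clock, which is why the remedy is added delay in the data path.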

Overhead for a Clock
- CMOS FO4 delay is roughly 425ps/um x $L_{eff}$.
- For 0.13um, the FO4 delay is 40-50ps.
- For a 1GHz clock, this allows < 20 FO4 gate delays per cycle.
- Clock overhead (including margins for setup/hold): 2 FFs/latches cost about 2-3 FO4 delays, and skew costs approximately 2-3 FO4 delays.
- The overhead of the clock is roughly 4-6 FO4 delays, leaving 14-16 FO4 delays to work with for logic.
- Need to reduce skew and FF cost.
[Figure: clock period $T_{cycle}$ divided into skew, $T_{clk-q}$, $T_{logic}$, and $T_{setup}$]
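The FO4 budget above follows from simple division, as this sketch shows. It uses the 50ps upper end of the quoted FO4 range and the 4-6 FO4 overhead from the slide; the function names are illustrative.

```python
def fo4_per_cycle(f_ghz, fo4_ps):
    """Number of FO4 gate delays that fit in one clock period."""
    period_ps = 1000.0 / f_ghz
    return period_ps / fo4_ps


def logic_budget(total_fo4, overhead_min, overhead_max):
    """(worst, best) FO4 delays left for logic after clock overhead."""
    return total_fo4 - overhead_max, total_fo4 - overhead_min


total = fo4_per_cycle(1.0, 50.0)  # 1 GHz clock, 50 ps FO4 delay
print(total)
print(logic_budget(total, 4, 6))  # overhead: FF/latch cost plus skew
```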

Requirements in Flip-Flop Design
- Minimize FF overhead: small clk-q delay, setup time, and hold time.
- Minimize power: flops account for up to 20% of the total power of high-performance systems.
- High driving capability: typical flip-flop load in a 0.18µm CMOS ranges from 50fF to over 100fF, with typical values of 100-150fF in critical paths.
- Multiplexed or scan enabled.
- Crosstalk insensitivity: dynamic/high-impedance nodes are problematic.
- Small load on the clock to improve clock performance and reduce clock power: clocks can consume 40% of total chip power.

ITRS Jitter and Skew Trends

Sources of Clock Skew
Main sources:
1. Imbalance between different paths from the clock source to the FFs: interconnect length determines RC delays; capacitive coupling effects cause delay variations; buffer sizing; number of loads driven.
2. Process variations across the die: interconnect and devices have different statistical variations.
Secondary sources:
1. IR and Ldi/dt in the power supply
2. Temperature variations across the chip

Contributors to Clock Skew
[Figure from ISSCC 1998. Ref: Geannopoulos98]

Contributors to Clock Skew (cont'd)
Intra-die PVT variations:
- Process: transistors (TT, FF, FS, SF, SS); metal (width, thickness, etc. ~ RLC)
- Voltage: power grid variations ~ IR drop, Ldi/dt
- Temperature: correlated to power dissipation
Tree branches can't be perfectly balanced: drivers ~ wires ~ flip-flops

PVT Variability Study
[Figure: variation data from IBM and ITRS2005]

Spatial Variation Models
[Figure. Ref: Hashimoto05]

PVT Variations
[Figure: P, V, and T variation maps. Ref: IEEE D&T of Computers, Nov-Dec06; Fetzer]

Temperature Variations
Clock delay varies primarily due to variations in $V_T$ and mobility, and the temperature coefficient of the wires.

IR Drop Impacts on Clock Skew
[Figure: clock delay (latency) and skew for three cases]
- Ideal Vdd: low delay, low skew
- Conservative Vdd: high delay, low skew
- Actual IR drop impact: delay about 5-15% larger, skew about 25-30% larger

Reducing the Effects of IR Drop and Ldi/dt
- Stagger the firing of clock buffers (bad idea: increases skew).
- Use different power grid tap points for clock buffers (but this makes routing more complicated for automated tools).
- Use smaller buffers (but this degrades edge rates and increases delay).
- Make power busses wider (requires area, but should be done).
- Use more Vdd/Vss pins; adjust the locations of Vdd/Vss pins.
- Put in power straps where needed to deliver current.
- Place decoupling capacitors wherever there is free space.
- Integrate decoupling capacitors into buffer cells; these caps act as decoupling caps when they are not switching.

Power Dissipation in Clocks
Significant power dissipation can occur in clocks in high-performance designs:
- The clock switches on every cycle, so $P = CV^2 f$ (i.e., α = 1).
- Clock capacitance can be in the ~nF range, say 1nF = 1000pF.
- Assuming a power supply of 1.8V, CV = 1800pC of charge.
- If the clock switches every 2ns (500MHz), that's 0.9A.
- For $V_{DD}$ = 1.8V, P = IV = 0.9(1.8) = 1.6W in the clock circuit alone.
Much of the power (and the skew) occurs in the final drivers due to the sizing up of buffers to drive the flip-flops.
The key to reducing the power is to examine the equation $CV^2 f$ and reduce the terms wherever possible:
- $V_{DD}$ is usually given to us; we may not want to reduce the swing due to coupling noise, etc.
- Look more closely at C and f.
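The clock-power arithmetic above can be reproduced with a short sketch using the slide's own numbers (1nF, 1.8V, 500MHz). The function names are illustrative.

```python
def clock_charge_c(c_f, vdd_v):
    """Charge drawn from the supply per cycle, Q = C * V."""
    return c_f * vdd_v


def clock_current_a(c_f, vdd_v, f_hz):
    """Average supply current, I = C * V * f."""
    return clock_charge_c(c_f, vdd_v) * f_hz


def clock_power_w(c_f, vdd_v, f_hz):
    """Dynamic clock power with activity factor 1, P = C * V^2 * f."""
    return c_f * vdd_v ** 2 * f_hz


C, VDD, F = 1e-9, 1.8, 500e6
print(clock_charge_c(C, VDD) * 1e12, "pC")
print(clock_current_a(C, VDD, F), "A")
print(round(clock_power_w(C, VDD, F), 2), "W")
```

Because power scales with the square of the supply but only linearly with C and f, the remaining levers when the supply is fixed are the clock capacitance and the effective switching frequency.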

Clock Gating
- Most popular method for power reduction of clock signals and functional units.
- Gate off the clock to idle functional units.
- Needs logic to generate the disable signal: this increases the complexity of the control logic, consumes power, is timing critical (to avoid clock glitches at the AND gate output), and adds a gate delay on the clock signal.
- The gating AND gate can replace a buffer in the clock distribution tree.
- All clock trees should have the same type of gating, whether they are used or not, for balance.
[Figure: clock and disable signals driving an AND gate that clocks the FFs around a block of combinational logic]

Reducing Power in Clocking
Reduce overall capacitance (shielding vs. spacing):
- (a) Shielding (shield, clock, shield): higher total capacitance, less area
- (b) Spacing (Signal 1, clock, Signal 2): lower capacitance, more area
There is a tradeoff between the two approaches due to coupling noise: approach (a) is better for inductive noise; (b) is better for capacitive noise.

Clock Design Objectives
Now that we understand the role of the clock and some of the key issues, how do we design it?
- Minimize the clock skew (in the presence of IR drop)
- Minimize the clock delay (latency)
- Minimize the clock power (and area)
- Maximize noise immunity (due to coupling effects)
- Maximize the clock reliability (signal EM)
Problems that we will have to deal with:
- Routing the clock to all flip-flops on the chip
- Driving unbalanced loading, which will not be known until the chip is nearly completed
- On-chip process/temperature variations

Clock Verification
Clock verification is more complex in DSM. It must include the effects of:
- RC interconnect delays in clock skew analysis, along with PVT
- Signal integrity (capacitive coupling, inductance): spacing vs. shielding
- IR drop and Ldi/dt
- Signal electromigration
Clock jitter is difficult to verify: it is the time-domain variation of a given clock signal due to random noise, IR drop, temperature, etc.