Lecture 6. Clocked Elements

Computer Systems Laboratory, Stanford University
horowitz@stanford.edu
Copyright 2006 Mark Horowitz, Ron Ho
Some material taken from lecture notes by Vladimir Stojanovic and Ken Mai

Overview
- Readings (for next lecture on clocking)
  - Gronowski: Alpha clocking paper
  - Restle: clock grid paper
  - Harris: variations paper
- Today's topics
  - Latches and flops overview
  - Power and timing metrics
  - High-performance design and low-energy design
  - Examples

Why Are Clocked Elements Important?
- A graph from the first lecture, showing cycle time in FO4
[Figure: cycle time in FO4 (log scale, 10 to 100) versus year (1985 to 2005) for Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4, and Itanium; Alpha 21064, 21164, and 21264; Sparc, SuperSparc, and Sparc64; Mips; HP PA; PowerPC; AMD K6, K7, and x86-64]
- A clock cycle contains less and less time each generation

We Need Faster Clocked Elements
- Note that the previous graph invites us to extrapolate blindly
  - In just a few years, clock frequencies will go to infinity!!
- Reality says clock cycle times will level out
  - There is some disagreement on the actual limits
  - 6-8 FO4 per cycle is optimum (Hrishikesh, ISCA 02)
  - Yes, but including overhead, more like 18 (Srinivasan, ISM 02)
  - Today: 16 FO4/cycle in Pentium 4; 11 FO4/cycle in Sony Cell
- If a flop has an overhead of 3 FO4, this is 15%-30% overhead
  - We can fill less of the cycle time with work
  - Making a faster flop helps performance significantly
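The 15%-30% overhead figure can be checked with a few lines of arithmetic. The sketch below simply divides the slide's 3 FO4 flop overhead by the cycle times quoted above; the function name is an illustrative choice, not something from the lecture.

```python
def flop_overhead_fraction(flop_overhead_fo4: float, cycle_fo4: float) -> float:
    """Fraction of the clock cycle consumed by flop overhead (both in FO4)."""
    if cycle_fo4 <= 0:
        raise ValueError("cycle time must be positive")
    return flop_overhead_fo4 / cycle_fo4


# Cycle times quoted on the slide; 3 FO4 is the slide's example overhead.
for name, cycle in [("Srinivasan estimate", 18), ("Pentium 4", 16), ("Sony Cell", 11)]:
    print(f"{name}: {flop_overhead_fraction(3, cycle):.1%} of the cycle")
```

The shorter the cycle, the larger the share lost to the flop, which is why the overhead matters more each generation.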

We Need Lower Power Clocked Elements
- Recall that chip power density is climbing up
  - Rocket nozzle! Surface of the sun! Oh, my!
  - Because power is C*V^2*f and V is not scaling anymore (V != 10 L_gate)
- Part of this power is for clock distribution
  - Clocking requires 70% of core power in Power4 (2001)
  - Clocking requires 25% of core power in dual-core Itanium (2005)
- Almost all of this power is in driving the latches/flops at the ends
  - Only around 20-25% of clock power is in clock transmission
  - Saving power at the ends is better than saving power in the wires
- Making a more efficient flop helps power significantly

Timing Overhead, Illustrated
- A standard circuit view of a flop system
[Figure: flop, logic, flop pipeline stage; T_cycle spans T_cq, T_logic, and T_su]
- Flop timing overhead is the data-to-output (D-Q) delay
  T_data-to-Q = T_setup + T_clk-to-Q = T_su + T_cq
- Next look at some basic latch and flop designs and metrics
  - But first, an analogy
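As a rough illustration of the power slide, the sketch below evaluates the C*V^2*f relation and estimates how much of the core power lands at the clock endpoints. Applying the 20-25% transmission share to the Power4's 70% clock share is an assumption made for the example; the slide does not tie those two numbers to the same chip, and the function names are hypothetical.

```python
def dynamic_power(c_farads: float, v_volts: float, f_hertz: float) -> float:
    """Switching power from the slide's relation P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hertz


def endpoint_share_of_core_power(clock_share: float, transmission_share: float) -> float:
    """Share of core power spent driving latches/flops at the clock endpoints.

    clock_share: fraction of core power used for clocking.
    transmission_share: fraction of clock power spent in the clock wires.
    """
    return clock_share * (1.0 - transmission_share)


# Assumed pairing: Power4's 70% clock share with the 20-25% transmission range.
for transmission in (0.20, 0.25):
    share = endpoint_share_of_core_power(0.70, transmission)
    print(f"transmission {transmission:.0%}: endpoints use {share:.1%} of core power")
```

Under these assumed numbers more than half of the core power is spent at the latches and flops, which is the slide's argument for optimizing the endpoints first.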

An Analogy Of Timing
- Clocked datapaths are like streets with traffic lights
  - Cars moving down the street are data
  - Some cars speed, some cars drive normally, some crawl
  - Principal rule: cannot have two cars collide into each other
- As a car (data) comes to a light (flop or latch)
  - It passes through if the light is green and waits if it's red
- A clocked system has a master controller for the lights
  - All the lights turn red/green at the same time (as in Manhattan)
  - In some systems, the lights are green very briefly (flops)
  - In other systems, the lights stay green half the time (latches)
- Satisfy principal rule: ensure all cars only pass one block per light
  - The point of traffic lights is to slow down fast cars

The Analogy's Timing Failures
- A traffic light (flop or latch) can fail the principal rule in two ways
  - A car can be too slow to make it to the next block: Max path
    - The car hadn't reached the intersection when the light turned red
  - A car can race through more than one block: Min path
    - The intersection didn't stay clear just when the light went red
- Failures often arise because traffic light timing is not perfect
  - Differences between traffic lights (skew), or local variations (jitter)
- Fixing these failures
  - Max paths are fixable: slow down the master controller
  - Min paths are not: rebuild the street (or the light control wires)!
    - Cost of 3 months fab time and $1M in mask costs
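The two failure modes map onto the usual setup (max path) and hold (min path) inequalities for flop-based designs. The slides do not write these out, so the sketch below states them as an assumption, with hypothetical field names and picosecond values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One flop-to-flop path; all times in ps (illustrative values only)."""
    t_cq: float     # clock-to-output delay of the launching flop
    t_logic: float  # delay of the logic between the two flops
    t_su: float     # setup time of the capturing flop
    t_hold: float   # hold time of the capturing flop
    t_skew: float   # worst-case clock skew between the two flops


def max_path_ok(p: Path, t_cycle: float) -> bool:
    """Setup check: data must arrive one setup time before the next edge."""
    return p.t_cq + p.t_logic + p.t_su + p.t_skew <= t_cycle


def min_path_ok(p: Path) -> bool:
    """Hold check: data must not race through before the hold window ends."""
    return p.t_cq + p.t_logic >= p.t_hold + p.t_skew


slow = Path(t_cq=100, t_logic=900, t_su=50, t_hold=30, t_skew=50)
fast = Path(t_cq=40, t_logic=10, t_su=50, t_hold=60, t_skew=50)

print(max_path_ok(slow, t_cycle=1000))  # fails at this cycle time
print(max_path_ok(slow, t_cycle=1200))  # passes once the clock is slowed
print(min_path_ok(fast))                # fails regardless of cycle time
```

Note that `min_path_ok` takes no cycle time at all, which is the point of the slide: slowing the master controller cannot repair a min-path failure.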

By The Way
- Do you really need traffic lights?
  - No, if you can guarantee that all cars travel at the same rate
  - Wave-pipelining ensures all datapaths have the same delay
- Do all the lights need to switch at the same time?
  - No, and long blocks might benefit from different light timing
  - Intentional clock skew on chips helps for timing problems
- Do you really need a master controller for all the lights?
  - No, if you have cars (drivers) negotiate between themselves
  - Asynchronous circuits handshake between data items

A Note On Terminology
- Avoid saying a latch or flop is "open" or "closed"
  - There is ambiguity in these terms
  - Water flows if you open a valve
  - Current stops if you open a switch
- The common wording today is a little awkward, but clear
  - A timing element transfers data when it is "transparent"
  - It does not transfer data when it is "opaque"
- Other choices include "blocking" and "non-blocking"
  - The oddest I have ever seen was "Permissive" and "Prohibitive"
  - But don't use that

Latches and Flops
- Latch has soft timing: transparent when clock = 1
[Figure: latch symbol and clock waveform with transparent and opaque phases]
- Flop has hard timing: transparent when clock goes 0 -> 1
[Figure: flop symbol and clock waveform with the sample edge marked]

Latch Timing
- Setup and hold times are defined relative to the clock fall
  - Setup time: how long before the clock fall must the data arrive
  - Hold time: how long after the clock fall must the data not change
- Delay depends on arrival time of data relative to clock rise
  - On early data arrival, delay = T_cq
  - On late data arrival, delay = T_dq
[Figure: clock waveform with transparent and opaque phases, early and late data arrivals, and the setup and hold windows]
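The soft versus hard timing distinction on the "Latches and Flops" slide can be seen in a small behavioral model. This is a zero-delay sketch with hypothetical class names; it ignores setup, hold, and propagation delay entirely.

```python
class Latch:
    """Transparent-high latch: the output follows data while clk == 1."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, clk: int, d: int) -> int:
        if clk == 1:           # transparent phase
            self.q = d
        return self.q          # opaque phase holds the last value


class Flop:
    """Positive-edge flop: samples data only on a 0 -> 1 clock transition."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_clk = 0

    def step(self, clk: int, d: int) -> int:
        if self._prev_clk == 0 and clk == 1:   # the sample edge
            self.q = d
        self._prev_clk = clk
        return self.q


clk = [0, 1, 1, 1, 0, 0, 1, 1]
data = [1, 1, 0, 1, 0, 0, 0, 1]

latch, flop = Latch(), Flop()
latch_q = [latch.step(c, d) for c, d in zip(clk, data)]
flop_q = [flop.step(c, d) for c, d in zip(clk, data)]
print("latch:", latch_q)
print("flop: ", flop_q)
```

While the clock is high the latch output tracks every data change, whereas the flop output changes only at the rising edges.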

Flop Timing
- Setup and hold times are defined relative to the clock rise
  - Setup time: how long before the clock rise must the data arrive
  - Hold time: how long after the clock rise must the data not change
- Delay is always T_cq, as long as data hits the setup constraint
[Figure: clock waveform with the setup and hold windows around the rising edge]

Latch and Flop Timing
- Softness of latch timing edges allows time borrowing
  - Nominally a latch expects its data when the latch goes transparent
  - But the latch will accommodate late data
  - Until the data runs into the falling edge of the clock (going opaque)
  - Time borrowing works backwards ("slack forwarding") and forwards
- Flops generally do not allow time borrowing
- For some latches and flops, setup time is negative
  - The data can change just after the latch goes opaque
  - The latch can still see that data change
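Time borrowing can be sketched by combining the early and late arrival delays from the latch timing slide with the setup limit at the falling edge. Taking the later of the two output times is a simplifying assumption (real latches show a smooth transition between the regimes), and the function name and numbers below are hypothetical.

```python
from typing import Optional


def latch_output_time(t_data: float, t_rise: float, t_fall: float,
                      t_cq: float, t_dq: float, t_su: float) -> Optional[float]:
    """Time at which a transparent-high latch's output settles.

    Returns None if the data misses setup at the falling (closing) edge.
    A negative t_su lets data arrive slightly after the falling edge.
    """
    if t_data > t_fall - t_su:
        return None                       # data ran into the falling edge
    early = t_rise + t_cq                 # data was waiting: clock-limited
    late = t_data + t_dq                  # data arrived late: data-limited
    return max(early, late)


# Clock rises at 0 ps and falls at 500 ps (illustrative values).
print(latch_output_time(-50, 0, 500, t_cq=80, t_dq=60, t_su=20))   # early data
print(latch_output_time(200, 0, 500, t_cq=80, t_dq=60, t_su=20))   # borrows 200 ps
print(latch_output_time(490, 0, 500, t_cq=80, t_dq=60, t_su=20))   # misses setup
print(latch_output_time(490, 0, 500, t_cq=80, t_dq=60, t_su=-20))  # negative setup
```

A flop in the same position would have rejected any data arriving after its setup point before the rising edge, which is why flops generally do not allow time borrowing.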

Flop Delay
- What's the delay through a flop?
  - Data must get to the flop by a setup time T_su
  - Data must traverse the flop and take up T_cq
- So the time available to do logic is what's left
  T_logic = T_cycle - T_su - T_cq - T_skew
[Figure: flop, logic, flop pipeline stage; T_cycle spans T_cq, T_logic, and T_su]

Flop Clk-to-Q Delay Depends On Data-to-Clk
[Figure: Clk-Output delay [ps] (0 to 350) versus Data-Clk time [ps] (-200 to 200), with the setup and hold curves and the sampling window marked. Source: Stojanovic]
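The "Flop Delay" budget is easy to evaluate directly. The sketch below implements the slide's relation for T_logic; the FO4 values in the example are assumptions picked for illustration, not measurements from the lecture.

```python
def logic_time(t_cycle: float, t_su: float, t_cq: float, t_skew: float = 0.0) -> float:
    """Time left for logic: T_logic = T_cycle - T_su - T_cq - T_skew."""
    t_logic = t_cycle - t_su - t_cq - t_skew
    if t_logic < 0:
        raise ValueError("overhead exceeds the cycle time")
    return t_logic


# Illustrative numbers in FO4: a 16 FO4 cycle with 3 FO4 of flop overhead
# (1 setup + 2 clock-to-output) and 1 FO4 of skew.
print(logic_time(t_cycle=16, t_su=1, t_cq=2, t_skew=1))
```

With these assumed values only 12 of the 16 FO4 are available for useful work.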

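Examine The Setup Half Of That Graph
[Figure: clock-to-output delay T_cq and data-to-output delay T_dq versus data-to-clock delay, showing the constant cq region, the variable cq region, and the failure region; the optimum lies where the T_cq curve meets the 45-degree line]

What About Power?
- To fairly compare power of various flops or latches, include
  - External power to drive the clock input
  - External power to drive the data input
  - Internal power required to switch interior nodes
  - Internal power required to drive output load
[Figure: flop with clock, data, and load connections annotated with the power components; recoverable labels include P_CLK, P_INT, CLK, CLKb, and LOAD]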
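The optimum on the setup graph is the point that minimizes the data-to-output delay, T_dq = T_dc + T_cq, where T_dc is how far ahead of the clock the data arrives. The sketch below finds it from sampled points; the sample values are made up to mimic the shape of the curve and are not taken from the Stojanovic data.

```python
from typing import List, Tuple


def optimum_data_to_clock(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (t_dc, t_dq) for the sample with the smallest T_dq = T_dc + T_cq.

    samples: (data-to-clock delay, clock-to-output delay) pairs in ps.
    """
    t_dc, t_cq = min(samples, key=lambda s: s[0] + s[1])
    return t_dc, t_dc + t_cq


# Hypothetical characterization: T_cq is flat for early data, then rises
# sharply as the data edge approaches the clock edge.
samples = [(200, 100), (150, 100), (100, 100), (50, 105),
           (30, 115), (20, 130), (10, 160), (0, 230)]

t_dc_opt, t_dq_min = optimum_data_to_clock(samples)
print(f"optimum data-to-clock = {t_dc_opt} ps, minimum T_dq = {t_dq_min} ps")
```

At the minimum, moving the data 1 ps closer to the clock costs 1 ps of extra T_cq, which is the 45-degree tangent drawn on the slide.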

Simplest CMOS Latch
- Basic transparent high latch (Figure 11.2) is simply a passgate
[Figure: passgate clocked by clk and clk_b between data and q_b]
- Very simple and compact
- Stores data dynamically: subject to leakage and noise problems

Bad Things Can Happen To This Latch
- Various modes of failure
[Figure: failure modes of the passgate latch. Source: Chandrakasan Ch. 11]

Buffered Transparent High Latch
- Avoid input noise with a local input inverter
[Figure: input inverter and passgate clocked by clk and clk_b, from data to q]
- Still have problems with the storage node

Jam Latch
- Make the storage node static
[Figure: buffered latch with a weak feedback inverter on the storage node]
- Feedback inverter is very weak and loses in a fight
  - Burns power during the fight until the latch flips
  - Beware mixed process corners (SF, FS) for overpowering latch!

Tristate Latch
- Prevent the fight by shutting off the feedback device
[Figure: latch with a clocked (tristate) feedback inverter driven by clk and clk_b]
- Count on constant input drive during latch transparency
- Note that input inverter + passgate can be a tristate as well

Robust CMOS Latch
- Don't take the output from the storage node directly
[Figure: tristate latch with a separate output inverter driving q]
- This is a good, safe latch design
  - Not the fastest in the world (can trade off speed for DANGER)
  - Setup and hold times are close to 0

Flip-Flops Come In Several Flavors
- You can make a flip-flop out of two back-to-back latches
[Figure: M-S flop, two latches clocked by clk and clk_b]
- You can make it out of an edge-triggered element, plus SR latch
[Figure: edge flop, a pulse generator on clk driving the R and S inputs of an SR latch]

Flip-Flops Come In Several Flavors, cont.
- You can also make a flip-flop out of a pulsed (glitch) latch
[Figure: glitch latch, a latch clocked by a pulse generator on clk]
- All three flavors are (almost) functionally indistinguishable
  - Although they may react differently to clock skew
  - Will look at this in a later lecture

Flavor 1: Simplest Flop Is Master-Slave
- World's first LSI calculator chip (Tokyo Shibaura Electric)
  - Real old-school, but it's a master-slave
  - All tristates instead of passgates
[Figure: master-slave flop built from tristates. Source: Suzuki, JSSC 1973]

Flavor 1: Another Master-Slave
- PowerPC 603 flop
  - Note this is a negative edge-triggered flop (clks are reversed)
  - Faster than C2MOS, but at worse input noise (no tristate)
  - Master clk usually generated from slave (why not vice versa?)
[Figure: PowerPC 603 master-slave flop schematic. Source: Gerosa, JSSC 1994]

Flavor 1: Yet Another Master-Slave (With Scan)
- IBM Power4 latch with scan-ability (very common today)
[Figure: Power4 scannable master-slave latch. Source: Warnock, IBM JRD, 2002]

Flavor 2: True Edge-Triggered Flops
- DEC 21264 Alpha flop (Madden & Bowhill, 90, Matsui 94)
  - A sense-amp (pulse generator) followed by a capturing RS latch
  - Negative setup time available. Why?
[Figure: sense-amp pulse generator driving the S and R inputs of the capturing latch, annotated "delays not equal". Source: Madden & Bowhill, JSSC, 1990]

(Flavor 2) Improved Strong-Arm Flop
- Faster S-R stage
  - Sized for improved switching
  - Bigger drivers and smaller keepers
  - Symmetric Q and Q_b delays
- Reduce the hysteresis of the latch
  - Slam the node high/low strongly
  - Then in precharge, hold it weakly
- Makes it faster, but at what cost?
  - Coupling immunity
  - Yet another application of NFL: no free lunch
Source: Nikolic & Stojanovic, ISSCC 99

Flavor 3: Glitch Latch or Pulse Latch
- Just like a standard latch, only with a pulsed (glitched) clock
  - Looks and smells like a flip-flop
  - Pulse gives you negative setup time and a soft edge (latch-like)
- Generating the pulsed clock can be tricky over process corners
  - A clock chopper or single-shot
  - Do this locally at the pulse latch for each latch or small group
  - Distributing this pulse is dangerous: pulses disappear
  - So often combine this functionality into the latch itself
[Figure: clk waveform and the narrow pulse derived from it]

(Flavor 3) Glitch Latch
- AMD K6 latch, called a Hybrid Latch Flip-Flop (i.e., glitch latch)
[Figure: HLFF schematic with the pulse generator and second stage latch, and the waveform of the signal at node X. Source: Partovi, ISSCC 96]

(Flavor 3) Abstraction of the HLFF
[Figure: pulse generator producing an enable for the second stage latch, with the signal at node X shown for both input values]

(Flavor 3) A Semi-Dynamic Flip-Flop
- Sun UltraSPARC III pulsed latch
[Figure: semi-dynamic flip-flop schematic. Source: Klass, VLSI 98]
- Dynamic pulse generator, unlike the NAND3 in the HLFF
- Beware the 1-1 glitch

Delay Comparisons
- Driving a load of 14 inverters, in a 0.18μm technology
[Figure: bar chart "Min D-Q Delay Comparison", delay in FO4 (0.0 to 5.0) for MSL, C2MOS, HLFF, SDFF, SAFF, and M-SAFF; annotation: pulse latches are faster. Source: Stojanovic]

Energy Comparisons
- Driving a load of 14 inverters, in a 0.18μm technology
[Figure: stacked bar chart "Energy breakdown (50% activity)", energy in fJ (0 to 120) for MSL, C2MOS, HLFF, SDFF, SAFF, and M-SAFF, split into external clock, external data, internal clock, and internal non-clock energy; annotation: latches are lower energy. Source: Stojanovic]

What About Imperfect Clocks?
- Clocks do not always arrive on time
  - Clocks to different clocked elements arrive at different times: SKEW
  - One clock will arrive at different times from cycle to cycle: JITTER
- Clock performance is a function of the distribution grid (later)
[Figure: clk1 and clk2 waveforms showing t_skew between them and the +jitter and -jitter spread on an edge]
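The two kinds of clock imperfection can be told apart from edge timestamps. The sketch below treats skew as the mean arrival difference between two clock endpoints and jitter as the largest deviation of one clock's period from nominal; both metric definitions and all sample values are assumptions for illustration, since the slide defines the terms only qualitatively.

```python
from statistics import mean
from typing import Sequence


def skew(arrivals_clk1: Sequence[float], arrivals_clk2: Sequence[float]) -> float:
    """Mean arrival-time difference between two clock endpoints (ps)."""
    return mean(arrivals_clk2) - mean(arrivals_clk1)


def peak_period_jitter(edge_times: Sequence[float], nominal_period: float) -> float:
    """Largest deviation of a measured period from the nominal period (ps)."""
    periods = [b - a for a, b in zip(edge_times, edge_times[1:])]
    return max(abs(p - nominal_period) for p in periods)


# Hypothetical measurements: edge arrival offsets relative to the ideal edge.
clk1_offsets = [0, 2, -2]
clk2_offsets = [30, 28, 32]
print("skew:", skew(clk1_offsets, clk2_offsets), "ps")

# Hypothetical rising-edge timestamps of a single clock, nominal period 1000 ps.
edges = [0, 1000, 2010, 2995, 4000]
print("jitter:", peak_period_jitter(edges, 1000), "ps")
```

Skew is a fixed offset between locations that persists from cycle to cycle, while jitter changes every cycle at a single location.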

Do Clocked Elements Have Transparency?
- Change in D-Q delay < clock uncertainty
  - The clocked element absorbs some of the clock uncertainty
- There is a range of clk arrivals for which D-Q is constant
  - For that range the clocked element looks like a combinational gate
[Figure: D-Q delay [ps] (200 to 300) versus nominal arrival time [ps] (-30 to 60), with the clock uncertainty window t_CU marked on the flat part of the curve]

Clock Uncertainty Absorption
- Helps to mitigate the effects of clock skew (HLFF shown below)
- More skew-tolerant circuits will be discussed in a later lecture
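One way to put a number on this absorption is to compare the D-Q delay variation inside the uncertainty window with the width of the window. The coefficient used below, (t_CU - delta_DQ) / t_CU, is an assumed definition that does not appear in these slides, and the curve samples are hypothetical.

```python
from typing import List, Tuple


def dq_variation(samples: List[Tuple[float, float]], window: Tuple[float, float]) -> float:
    """Spread of D-Q delay over clock arrival times inside the window (ps)."""
    lo, hi = window
    delays = [dq for t, dq in samples if lo <= t <= hi]
    if not delays:
        raise ValueError("no samples inside the uncertainty window")
    return max(delays) - min(delays)


def absorption_coefficient(t_cu: float, delta_dq: float) -> float:
    """Assumed metric: fraction of the clock uncertainty hidden from D-Q delay."""
    return (t_cu - delta_dq) / t_cu


# Hypothetical (clock arrival time, D-Q delay) samples in ps.
curve = [(-30, 215), (-20, 212), (-10, 210), (0, 210), (10, 212),
         (20, 218), (30, 230), (40, 250), (50, 275), (60, 300)]

window = (-10, 20)                       # a 30 ps uncertainty window
delta = dq_variation(curve, window)
print(f"D-Q varies by {delta} ps over a {window[1] - window[0]} ps window")
print(f"absorption = {absorption_coefficient(30, delta):.2f}")
```

A coefficient of 1 would mean the element behaves like a combinational gate over the whole window; a hard-edged flop, whose D-Q delay shifts one for one with the clock, would score 0.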