Digital System Clocking: High-Performance and Low-Power Aspects


Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 8: State-of-the-Art Clocked Storage Elements in CMOS Technology
Wiley-Interscience and IEEE Press, January 2003
Nov. 14, 2003

Outline
- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

Transmission gate latches
(a) Simplest implementation
- Only 4 transistors
- Dynamic while the latch is opaque
- Susceptible to noise
(b) Basic static latch
- Pull-up/pull-down keeper
- Conflict at the storage node whenever new data is written
(c) Complete implementation
- Feedback turned off when writing to the latch
- No conflict
- Larger clock load

Transmission Gate Master-Slave Latch (MSL)
[Figure: MSL with unprotected input, built from a master transmission gate latch and a slave transmission gate latch (Gerosa et al. 1994), Copyright 1994 IEEE]

Transmission Gate MS Latch (continued)
- Protection from input noise
[Figure: MSL with input gate isolation (Markovic et al. 2001), Copyright 2001 IEEE]

Noise Robustness of MS Latch
Sources of noise affecting the latch state node:
1. Noise on input
2. Leakage
3. α-particle and cosmic rays
4. Unrelated signal coupling
5. Power supply ripple
[Figure: latch fed by a distant driver, with the five noise sources marked (Partovi in Chandrakasan et al. 2001), Copyright 2001 IEEE]

Clocked CMOS (C²MOS) Latch
[Figure: transmission gate latch with gate isolation (dynamic) next to the C²MOS latch (dynamic)]

Clocked CMOS (C²MOS) MS Latch
- State-keeping feedbacks outside the D-to-Q path
[Figure: C²MOS MS latch (Suzuki et al. 1973), Copyright 1973 IEEE]

MS Latches: Comparison
[Figure panels: delay and race immunity (R); energy per cycle]
C²MOS has larger clock transistors:
- Smaller delay and race immunity (80% of MSL)
- Higher energy consumption (1.4x more than MSL)

Flip-Flops

Hybrid Latch Flip-Flop (HLFF)
- Transparent to D only when Clk and Clk1 are both high
- Limited clock uncertainty absorption
- Small delay
- Small clock load
(Partovi et al. 1996), Copyright 1996 IEEE

Semidynamic Flip-Flop (SDFF)
- Dynamic-style first stage
- Fast, small clock load, logic embedding
- Consumes energy for evaluation whenever D=1
- Dynamic-to-static latch in second stage
- Static 1 hazard
(Klass 1998), Copyright 1998 IEEE

Static 1 hazard in SDFF
- If D=Q=1 in the previous cycle, a race between the clock and the first-stage dynamic node causes Q to falsely switch to 0, generating a glitch
- Also seen in HLFF

Sense-amplifier Flip-Flop (SAFF)
- When Clk=0, S and R are high, and the outputs are unchanged
- At the rising edge of Clk, the sense amplifier in the 1st stage generates a low pulse on either S or R, based on which of D and its complement is higher
- The other node (R or S) is driven high, preventing further changes
- The latch captures the low level of S or R and updates the output
- Both NAND gates must switch in sequence to change the outputs
Original design (Montanaro et al. 1996), Copyright 1996 IEEE
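The SAFF behaviour described above can be sketched as a small behavioural model. This is only an illustration under assumptions: the class name `Saff`, the active-low signal names `s_n` and `r_n`, and the zero-delay evaluation are not from the slides, and the sense amplifier is reduced to a comparison of D against its complement.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


class Saff:
    """Behavioural sketch of a sense-amplifier flip-flop (hypothetical model)."""

    def __init__(self, q: int = 0) -> None:
        self.q = q
        self.qb = 1 - q
        self.prev_clk = 0
        self.s_n = 1  # active-low set, precharged high
        self.r_n = 1  # active-low reset, precharged high

    def step(self, clk: int, d: int) -> int:
        if clk == 0:
            # First stage precharged: both nodes high, latch holds.
            self.s_n, self.r_n = 1, 1
        elif self.prev_clk == 0:
            # Rising edge: exactly one node pulses low, chosen by D.
            self.s_n, self.r_n = (0, 1) if d else (1, 0)
        # While clk stays high, s_n and r_n cannot change any further.
        self.prev_clk = clk
        # Cross-coupled NAND latch: two passes, because the second gate
        # can only switch after the first one has switched.
        for _ in range(2):
            self.q = nand(self.s_n, self.qb)
            self.qb = nand(self.r_n, self.q)
        return self.q


ff = Saff(q=0)
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (0, 1)]  # (clk, d) pairs
trace = [ff.step(clk, d) for clk, d in stimulus]
```

The two passes through the NAND pair mirror the remark that both gates must switch one after the other, which is the reason the slides give for the original SAFF being slow.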

SAFF: Evolution of 2nd Stage Latch
[Figure: three second-stage latch variants driven by S and R]
- All-n-MOS push-pull (Gieseke et al. 1991)
- Complementary push-pull (Oklobdzija and Stojanovic 2001)
- Complementary push-pull with gated keeper (Nikolic, Stojanovic, Oklobdzija, Jia, Chiu, Leung 1999)

Modified Sense Amplifier Flip-Flop (M-SAFF)
- Sense amplifier in the 1st stage generates a low pulse on either S or R, based on which of D and its complement is higher
- Symmetric latch in the 2nd stage: outputs are simultaneously pulled to Vdd and Gnd, so it is fast
- Large drive capability, so the stage can be small
- Keeper in the latch is active only when there is no change: no conflict
(Nikolic et al. 1999), Copyright 1999 IEEE

Flip-Flops and MS Latches: Delay Comparison
[Figure: CSE delay comparison in FO4 (axis 0.0 to 5.0) for MSL, C²MOS, HLFF, SDFF, SAFF and M-SAFF; 0.18 μm, high load]
- MS latches are slow: positive setup time, two latches in the critical path
- SAFF is slow: it waits for one output to switch the other
- Fastest structures are simple flip-flops with negative setup time

Flip-Flops: Timing Comparisons with Voltage Scaling
Delay comparison:
- Relative delay reduces with supply voltage due to reduction of body effect
Internal race immunity comparison:
- Small race immunity, usually not a concern in critical paths
(0.25 μm, light load)

Flip-Flops and MS Latches: Energy Comparison
[Figure: CSE energy breakdown in fJ (axis 0 to 120) into external clock, external data, internal clock and internal non-clock components for MSL, C²MOS, HLFF, SDFF, SAFF and M-SAFF; 0.18 μm, 50% activity, high load]
- In MS latches, internal nodes change only when the input changes
- SAFF, M-SAFF: very small clock load, small 2nd stage latch
- Most energy is consumed in HLFF and SDFF, which have a pulse generator and high internal switching activity

Flip-Flops and MS Latches: Energy Comparisons
[Figure: energy comparison, 0.25 μm, light load (Markovic et al. 2001), Copyright 2001 IEEE]

CSEs with local clock gating

Gated Transmission Gate MS Latch
- Concept: inhibit clock switching when new D = Q
- comp = D XNOR Q
- If comp=0 (D≠Q), the circuit works as MSL
- If comp=1 (D=Q), Clk=0 and Clk1=1: latches closed, no output change, no internal power
[Figure: gated MSL (Markovic et al. 2001), Copyright 2001 IEEE]
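The gating rule on this slide (compare the new data with the stored value and clock the latch only when they differ) can be sketched at the behavioural level. The function names `gated_msl_cycle` and `simulate` are hypothetical, and the model ignores all timing; it only counts how many cycles actually deliver a local clock.

```python
def gated_msl_cycle(d: int, q: int) -> tuple[int, bool]:
    """One cycle of a gated master-slave latch (behavioural sketch).

    Returns the next stored value and whether the local clock toggled.
    """
    comp = 1 if d == q else 0      # comp = D XNOR Q
    clock_enabled = comp == 0      # clock reaches the latches only if D != Q
    q_next = d if clock_enabled else q
    return q_next, clock_enabled


def simulate(data: list[int], q0: int = 0) -> tuple[list[int], int]:
    """Run a data sequence and count the cycles in which the clock was not gated."""
    q = q0
    outputs = []
    clocked_cycles = 0
    for d in data:
        q, enabled = gated_msl_cycle(d, q)
        outputs.append(q)
        clocked_cycles += int(enabled)
    return outputs, clocked_cycles


outputs, clocked_cycles = simulate([0, 1, 1, 1, 0, 0], q0=0)
```

In this run the output follows the data exactly, yet only two of six cycles toggle the local clock, which is the source of the energy saving at low switching activity.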

Gated TG MS Latch: Timing and Energy
[Figure panels: setup time (U) and hold time (H) comparison with MSL; energy comparison with MSL]
- Increased setup time in gated MSL due to inclusion of the comparator in the critical path: slower than conventional MSL
- Smaller energy per transition if the input switching activity is < 0.3
- For higher switching activity, the comparator and clock generator dominate the energy consumption

Data-transition look-ahead latch (DTLA-L)
[Figure: latch with pulse generator, clock control and data-transition look-ahead (DTLA) circuit producing P1]
- Pulsed latch in which the generation of clock pulses is gated with an XOR DTLA circuit
- If P1=0, the circuit operates as a conventional pulsed latch
- If D=Q, P1=1 and the latch clock stays at 0: no output change or energy consumption in the latch
- XOR circuit and clock control are in the critical path: large setup time and D-Q delay
(Nogawa and Ohtomo 1998), Copyright 1998 IEEE

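DTLA-L: Analysis of Energy Consumption
[Figure: DTLA-L with pulse generator, clock control (CC) and input capacitance C_in, next to DTLA-L without clock gating]
[Equations, partly illegible in the source. The recoverable form is
  E = (α/2)(E_{0→1} + E_{1→0}) + (1 − α) E_idle,  with E_idle = E_{0→0} = E_{1→1}.
The transition energies E_{0→1} and E_{1→0} each start from E_idle + E_CLK and add internal (E_int) and, where the output switches, external (E_ext) terms. E_idle contains the pulse generator energy divided by N (E_PG/N) and the input capacitance term E_Cin. The remaining terms cannot be read.]
- α: input switching activity
- Pulse generator shared among N DTLA-Ls

Energy comparison of DTLA-L and CMSL
[Figure: regions where E(DTLA-L) < E(CMSL) and where E(DTLA-L) > E(CMSL)]
- DTLA-L is more energy-efficient than CMSL when N>2 and α<0.25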
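The break-even argument on these slides can be illustrated with a simple activity-weighted model. The sketch below assumes the average energy per cycle is E(α) = (α/2)(E_rise + E_fall) + (1 − α)·E_idle, which is one reading of the slide equations. The function names and every numeric value are hypothetical and chosen only to show the calculation; they are not data from the slides.

```python
from typing import Optional, Tuple

Energies = Tuple[float, float, float]  # (e_rise, e_fall, e_idle), arbitrary units


def average_energy(alpha: float, energies: Energies) -> float:
    """Activity-weighted energy per cycle under the assumed model."""
    e_rise, e_fall, e_idle = energies
    return 0.5 * alpha * (e_rise + e_fall) + (1.0 - alpha) * e_idle


def break_even_activity(gated: Energies, plain: Energies) -> Optional[float]:
    """Activity at which both latches use the same energy, or None if no crossing in [0, 1]."""
    # E(alpha) = e_idle + alpha * (mean switching energy - e_idle): a straight line.
    slope_gated = 0.5 * (gated[0] + gated[1]) - gated[2]
    slope_plain = 0.5 * (plain[0] + plain[1]) - plain[2]
    if slope_gated == slope_plain:
        return None
    alpha = (plain[2] - gated[2]) / (slope_gated - slope_plain)
    return alpha if 0.0 <= alpha <= 1.0 else None


# Hypothetical numbers: the gated latch idles cheaply but pays more per transition.
gated_latch = (60.0, 60.0, 10.0)
plain_latch = (40.0, 40.0, 25.0)
crossover = break_even_activity(gated_latch, plain_latch)
```

With these made-up values the gated latch wins below the crossover activity and loses above it, which is the qualitative shape of the E(DTLA-L) versus E(CMSL) comparison.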

Clock-on-demand PL (COD-PL)
- Pulsed latch in which the generation of clock pulses is gated with an XNOR DTLA circuit
- If XNOR=0 (D≠Q), the latch clock pulse goes to 1 and returns to 0 after Q has changed to D
- If D=Q, XNOR=1 and the latch clock stays at 0: no output change or energy consumption in the latch
- The pulse generator includes clock control, so it cannot be shared among multiple PLs
[Figure: data-transition look-ahead XNOR feeding the pulse generator (Hamada et al. 1999), Copyright 1999 IEEE]

Energy-Efficient Pulse Generator in COD-PL
- Straightforward implementation with CMOS gates: C_int switches in each cycle, energy-inefficient
- Compound AND-NOR gate: energy-efficient
[Figure: both pulse generator implementations, with the XNOR input and the internal node C_int marked]

Impact of circuit sizing on the energy efficiency of COD-PL
- COD-PL is more effective in high-speed sizing due to large clock transistors
(Markovic et al. 2001), Copyright 2001 IEEE

Conditional capture flip-flop
- First stage: pulse generator with internal clock gating
- Outside the sampling window, S=R=1
- Inside the sampling window, S can switch low if D=1 and Q=0; R can switch low if D=0 and Q=1
- Otherwise S=R=1: no energy consumption
- Second stage: pass-gate implementation of the M-SAFF latch (Oklobdzija, Stojanovic)
- No setup time degradation due to clock gating
(Kong et al. 2000), Copyright 2000 IEEE

Comparison of latches and flip-flops with local clock gating: Timing
Delay comparison:
- Delay relatively constant with supply voltage
- Latches with clock gating have very large delay due to large setup time
Internal race immunity comparison:
- Generally R(FF) < R(MSL) < R(gated MSL)
- COD-PL has low race immunity due to wide clock pulse
(Markovic et al. 2001), Copyright 2001 IEEE

Comparison of latches and flip-flops with local clock gating: Energy, EDP
Energy comparison:
- Latches with gated clock consume less energy than MSL if α < 0.2 to 0.3
Energy-Delay Product comparison:
- α < 0.03: G-MSL best
- 0.03 < α < 0.23: DTLA-L best
- 0.23 < α: conventional MSL best
(Markovic et al. 2001), Copyright 2001 IEEE
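The EDP ranking above amounts to a lookup on the input switching activity α. A minimal sketch follows, using the thresholds quoted on the slide. The function name `best_edp_latch` is hypothetical, and the handling of the exact boundary values 0.03 and 0.23 is an assumption, since the slide does not say which side they belong to.

```python
def best_edp_latch(alpha: float) -> str:
    """Return the latch with the lowest energy-delay product for activity alpha.

    Thresholds follow the slide; boundary values are assigned to the
    higher-activity band as an assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("switching activity must lie in [0, 1]")
    if alpha < 0.03:
        return "G-MSL"
    if alpha < 0.23:
        return "DTLA-L"
    return "conventional MSL"


choices = {a: best_edp_latch(a) for a in (0.01, 0.10, 0.50)}
```

A selector like this could be applied per register in a design flow once activity estimates are available; that use is a suggestion, not something the slides describe.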

Low clock swing

N-only clocked latches
[Figure: (a) N-MSL, (b) N-FF, (c) N-PL, (d) N-PPL; a pulse generator drives (b)-(d)]
- Concept: bring the clock only to n-MOS transistors to allow reduced clock swing without conflict with partially turned-off p-MOS transistors
- Reduced clock swing reduces clocking energy with some penalty in performance
- The clock is always in the critical path, as its edge signals when to change the output
- Derived from: (a) conventional TG MSL, (b) pulsed latch, (c) conventional PL, (d) push-pull PL

Low clock swing CSEs comparison: energy and delay
- Full-swing clock: PL preferred for high speed, MSL preferred for low energy
- Low-swing clock: N-FF preferred for high speed, N-PPL preferred for low energy
[Figure: normalized energy per cycle versus data-to-Q delay (in FO3 inverter delays, axis 0 to 6) for (a) high-Vdd and (b) low-swing clocking; 130 nm technology, 50 fF load, max. input cap = 12.5 fF, data activity = 0.1]

Effect of clock noise on low-swing clock latch delay
[Figure: delay degradation (0% to 20%) versus noise on the low-swing clock (0% to 12%) for N-CL, N-PPL and N-FF]
- All latches fail for clock noise > 12% of clock voltage
- N-FF gives the best clock noise rejection

Dual-edge triggering

DET Latch-mux circuit (DET-LM)
- Pass-gate latches: one transparent when Clk=0, one transparent when Clk=1
- Pass-gate multiplexer that selects the output of the opaque latch
(Llopis and Sachdev 1996), Copyright 1996 IEEE
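The latch-mux principle (two complementary latches plus a multiplexer that always selects the opaque one) can be sketched as a zero-delay behavioural model. The class name `DetLatchMux` and the step-based stimulus are hypothetical; each step applies one clock level and one data value.

```python
class DetLatchMux:
    """Behavioural sketch of a dual-edge-triggered latch-mux element."""

    def __init__(self, init: int = 0) -> None:
        self.low_latch = init    # transparent while clk == 0
        self.high_latch = init   # transparent while clk == 1

    def step(self, clk: int, d: int) -> int:
        # The transparent latch tracks D; the other one holds its value.
        if clk == 0:
            self.low_latch = d
        else:
            self.high_latch = d
        # The multiplexer selects the opaque latch, which holds the value
        # D had just before the most recent clock edge.
        return self.high_latch if clk == 0 else self.low_latch


cell = DetLatchMux()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]  # (clk, d) pairs
trace = [cell.step(clk, d) for clk, d in stimulus]
```

In the trace the output changes at the rising edge (step 2), at the falling edge (step 4) and at the next rising edge (step 6), so data is captured on both clock edges.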

C²MOS Latch-mux (C²MOS-LM)
- C²MOS latches: one transparent when Clk=1, one transparent when Clk=0
- Multiplexer: two C²MOS inverters that propagate the output of the opaque latch
- Large clock transistors shared between the latches and the multiplexer
(Gago et al. 1993), Copyright 1993 IEEE

Pulsed-latch (DET-PL)
[Figure: (a) single-edge and (b) dual-edge triggered pulsed latch]
- Pulse generator: the latch is transparent to D only shortly after both edges of the clock, when both signals of pulse pair 1 or both signals of pulse pair 2 are high
- DET PL consumes a lot of energy for four clocked pass gates
- To improve speed, modified from the original design (Strollo et al. 1999), which implemented an n-MOS-only pass gate and a p-MOS-only keeper

DET Symmetric pulse generator flip-flop (SPGFF)
[Figure: 1st stage X, 1st stage Y, 2nd stage]
- Two pulse generators: X active at the rising edge of the clock, Y active at the falling edge
- X and Y alternately precharge and evaluate
- At any moment, one of X and Y keeps the value of data sampled at the most recent clock edge; the other is precharged high
- Pulses at X and Y have the same width as the clock
- Second stage is a simple NAND gate: no need for a latch

SET vs. DET: Delay comparison
[Figure: delay in FO4 (axis 0 to 6) for single-edge (SE) and double-edge (DE) versions of MSL/LM, C²MOS-LM, PL and SPGFF]
- Latch-MUXes have two equally critical paths, somewhat shorter than that of MSL
- DET PL is more complex, adding more capacitance to the critical path compared to SET PL
- SPGFF has a short domino-like critical path: fastest

SET vs. DET: Power consumption comparison
[Figure: power in μW (axis 0 to 180), with the non-clock component marked, for SE and DE versions of MSL/LM, PL, C²MOS and SPGFF; 0.18 μm, 500 MHz for SET, 250 MHz for DET, high load]
- LMs benefit from clever implementation of the latch-mux structure with clock transistor sharing
- DET PL adds extra high-activity capacitance compared to SET PL
- SPGFF power consumption is in the middle, mainly due to alternate switching of nodes X and Y

SET vs. DET: EDP comparison
[Figure: EDP in fJ/500 MHz (single edge) and fJ/250 MHz (double edge), axis 0 to 60, for MSL/LM, C²MOS, PL and SPGFF; 0.18 μm, 500 MHz for SET, 250 MHz for DET, high load]
- Latch-MUXes have similar or better EDP than their SET counterparts
- DET PL exhibits worse delay and energy compared to SET PL, due to more complex design
- SPGFF is fastest with moderate energy consumption: lowest EDP
- EDP(SPGFF) < EDP(LM) < EDP(PL)