Power-Optimal Pipelining in Deep Submicron Technology


ISLPED 2004, 8/10/2004. Power-Optimal Pipelining in Deep Submicron Technology. Seongmoo Heo and Krste Asanović, Computer Architecture Group, MIT CSAIL

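Traditional Pipelining. Goal: maximum performance. [Figure: pipeline timing diagram at supply Vdd; between successive clock edges (Clk), each stage is labeled with clk-Q delay, propagation delay, and setup time.]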

Pipelining as a Low-Power Tool. Goal: low power, fixed throughput. [Figure: pipeline timing diagram at supply Vdd; between successive clock edges (Clk), each stage is labeled with clk-Q delay, propagation delay, setup time, and time slack.] The time slack is traded for power through supply voltage scaling.

Pipelining as a Low-Power Tool (clock frequency fixed). [Figure: power versus delay. Pipelining adds flip-flop power overhead and creates time slack; supply voltage scaling then turns that slack into a power saving.]

Power-Optimal Pipelining. Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops. [Figure: power versus delay, marking too-shallow pipelining, too-deep pipelining, and the optimal pipelining point with its optimal power saving.]

Contribution. Pipelining is an old idea, but research has focused on its performance impact. The idea of using pipelining to lower power [Chandrakasan 92] has not been fully explored in deep submicron technology. This work provides analysis and circuit-level simulation of power-optimal pipelining for different regimes of Vth, activity factor, and clock gating.

Bottom-to-Top Approach. 1. Impact of pipelining on each power component. 2. Impact of pipelining on total power (with and without clock gating). [Figure: total power over time across active, inactive, and active periods, broken into switching, leakage, and idle components, shown for the clock-gated and the non-clock-gated case.] *Idle power = power consumed when the circuit is idle and not clock-gated.

Methodology. Target digital system: fixed throughput, highly parallel computation, logic-dominant. Process: BPTM (Berkeley Predictive Technology Model) 70 nm, with LVT (0.17/-0.2), MVT (0.19/-0.22), and HVT (0.21/-0.24). Hspice simulation at 100 °C, clock = 2 GHz. [Figure: test bench showing one pipeline stage, N FO4 inverters (N = 2 ~ 24) between TG flip-flops, with the baseline marked.]

Pipelining and Switching Power: Analytical Trend. Flip-flop overhead: O(1/N). Logic switching power: O(N^2), a quadratic reduction, because Vdd^2 ∝ N^2. [Figure: switching power versus number of FO4 per stage, N, marking the optimal FO4 and the optimal power saving.]
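The two asymptotes on this slide can be combined into a toy model to see where an optimum comes from. The sketch below assumes total switching power is simply a flip-flop overhead term a/N plus a logic term b*N^2; the coefficients a and b and the function names are hypothetical and are not fitted to the simulation results in the talk.

```python
def switching_power(n, a=1.0, b=0.002):
    """Toy model: flip-flop overhead a/n plus logic switching power b*n**2.

    a and b are hypothetical coefficients chosen only for illustration.
    """
    return a / n + b * n ** 2


def optimal_depth(a=1.0, b=0.002):
    """Closed-form minimizer of a/n + b*n**2.

    Setting dP/dn = -a/n**2 + 2*b*n = 0 gives n**3 = a / (2*b).
    """
    return (a / (2.0 * b)) ** (1.0 / 3.0)


if __name__ == "__main__":
    n_star = optimal_depth()
    # Brute-force scan over the simulated range N = 2..24 as a cross-check.
    best_n = min(range(2, 25), key=switching_power)
    print(f"closed-form N* = {n_star:.2f}, best integer N = {best_n}")
```

In this model the optimum moves to deeper pipelines (smaller N) as the flip-flop coefficient a shrinks relative to b.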

Pipelining and Leakage Power: Analytical Trend. Flip-flop overhead: O(1/N). Logic leakage power: O(N^α) with 1 < α < 2, a superlinear reduction, because Vdd * e^(η Vdd) ∝ N^α (DIBL effect). [Figure: leakage power versus number of FO4 per stage, N, marking the optimal FO4 and the optimal power saving.]
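To see why the exponent falls between 1 and 2, one can look at the local log-log slope of the expression on the slide. The sketch below assumes Vdd scales in proportion to N at fixed clock frequency (an assumption of this sketch), so the slope with respect to Vdd equals the slope with respect to N. The value of eta and the function names are hypothetical.

```python
import math


def leakage_power(vdd, eta=0.5):
    """Leakage trend Vdd * exp(eta * Vdd); eta is a hypothetical DIBL factor."""
    return vdd * math.exp(eta * vdd)


def local_exponent(vdd, eta=0.5, rel_step=1e-6):
    """Numerical d ln(P) / d ln(Vdd): the alpha in P ~ Vdd**alpha near vdd."""
    hi = vdd * (1.0 + rel_step)
    lo = vdd * (1.0 - rel_step)
    d_log_p = math.log(leakage_power(hi, eta)) - math.log(leakage_power(lo, eta))
    d_log_v = math.log(hi) - math.log(lo)
    return d_log_p / d_log_v


if __name__ == "__main__":
    for vdd in (0.4, 0.7, 1.0):
        # Analytically the slope is 1 + eta * vdd, so 1 < alpha < 2 when eta * vdd < 1.
        print(f"Vdd = {vdd:.1f} V -> alpha ~ {local_exponent(vdd):.3f}")
```

The linear Vdd factor contributes a slope of 1 and the exponential term adds eta * Vdd on top, which is the superlinear part attributed to DIBL.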

Pipelining and Idle Power: Analytical Trend. Clock gating is not always possible, because of increased control complexity or insufficient setup time for the clock enable signal. Idle power = leakage power + flip-flop switching power. Its scaling falls between leakage power scaling and flip-flop switching power scaling, depending on the leakage level.

Pipelining and Idle Power: Analytical Trend (continued). Leakage scale: flip-flop overhead O(1/N), right-hand side O(N^α) with 1 < α < 2. Flip-flop switching scale: flip-flop overhead O(1/N), right-hand side O(N), a linear reduction of flip-flop switching power, because (1/N) * Vdd^2 ∝ N. [Figure: two panels of relative idle power versus number of FO4 per stage, N, each marking the optimal FO4 and the optimal power saving.]
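One way to picture idle power sitting between the two scaling laws is to blend them. The sketch below assumes the right-hand side of the idle power curve is a weighted sum w*N^α + (1-w)*N, where w is the leakage share at a reference depth of N = 1. The weight, the value of alpha, and the function names are illustrative assumptions, and the O(1/N) overhead term is left out.

```python
def idle_power(n, leak_share, alpha=1.5):
    """Blend of leakage-like (n**alpha) and flip-flop-switching-like (n) scaling."""
    return leak_share * n ** alpha + (1.0 - leak_share) * n


def effective_exponent(n, leak_share, alpha=1.5):
    """Analytic d ln(P) / d ln(n) for the blend above.

    The result is a power-weighted average of alpha (leakage) and 1 (flip-flops).
    """
    leak = leak_share * n ** alpha
    flip_flop = (1.0 - leak_share) * n
    return (alpha * leak + flip_flop) / (leak + flip_flop)


if __name__ == "__main__":
    for share in (0.0, 0.25, 0.75, 1.0):
        print(f"leakage share {share:.2f} -> exponent {effective_exponent(8, share):.3f}")
```

With no leakage the exponent is 1 (the O(N) flip-flop switching scale), with leakage only it is alpha, and intermediate leakage levels land in between.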

Simulation Results: Power Components (fixed throughput @ 2 GHz)

Component   Right-hand-side curve       Saving*                 N*
Switching   O(N^2)                      79 (HVT) ~ 82 (LVT) %   6
Leakage     O(N^α), 1 < α < 2           70 (LVT) ~ 75 (HVT) %   6
Idle        O(N) or O(N^α), 1 < α < 2   55 (HVT) ~ 70 (LVT) %   8

N = number of FO4 inverters per stage. N* = optimal N. Saving* = optimal power saving by pipelining. (Not including flip-flop delay.)

Optimal Power Saving. With clock gating: optimal FO4 = 6. Without clock gating: optimal FO4 = 6 ~ 8. (2 GHz; flip-flop delay not included in optimal FO4.) [Figure: two panels of relative power versus activity factor, one with clock gating and one without; annotations label the switching, leakage, and idle power regions and the LVT curve.]

Discussion. LVT can be fast and power-efficient: it enables a lower Vdd. Flip-flop delay is more important than flip-flop power for power-optimal pipelining.

Limitations of This Work. Factors not considered: super-linear growth of flip-flops, additional memory, reduced glitches, parasitic wire capacitance. Each may affect the optimal logic depth and the optimal power saving.

Conclusion. Pipelining is an effective low-power tool when used to support voltage scaling in digital systems implementing highly parallel computation. Optimal logic depth: 6-8 FO4 (about 8-10 FO4 including flip-flop delay). Optimal power saving: 55-80%, depending on Vth, activity factor (AF), and clock gating. Insights: Pipelining is more effective with high AF. Pipelining is most effective at saving switching power. Pipelining is more effective with lower Vth, except when leakage power is dominant. Pipelining is more effective with clock gating (reduced flip-flop overhead).
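As a rough cross-check of these depths against the 2 GHz clock used in the simulations, the sketch below converts a total stage depth in FO4 into the FO4 delay it implies. It assumes the stage exactly fills the clock period after voltage scaling, with no slack left over; the function name is hypothetical.

```python
def implied_fo4_delay_ps(clock_ghz, total_fo4):
    """FO4 delay in picoseconds at which `total_fo4` gate delays fill one clock period."""
    period_ps = 1000.0 / clock_ghz  # 1 GHz corresponds to a 1000 ps period
    return period_ps / total_fo4


if __name__ == "__main__":
    # Total depth per stage including flip-flop delay, taken from the conclusion.
    for depth in (8, 10):
        delay = implied_fo4_delay_ps(2.0, depth)
        print(f"{depth} FO4 per stage -> {delay:.1f} ps per FO4")
```

Under that assumption, a shallower stage (fewer FO4) tolerates a slower FO4 delay, which is the slack that supply voltage scaling spends.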

Acknowledgments. Thanks to SCALE group members and anonymous reviewers. Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.

BACKUP SLIDES