Combining Dual-Supply, Dual-Threshold and Transistor Sizing for Power Reduction


Stephanie Augsburger (1), Borivoje Nikolić (2)
(1) Intel Corporation, Enterprise Processors Division, Santa Clara, CA, USA
(2) Department of EECS, University of California, Berkeley, CA, USA
stephanie.a.augsburger@intel.com

Abstract
Multiple supply voltages, multiple transistor thresholds and transistor sizing can all be used to reduce the power dissipation of digital blocks. This paper presents a framework for evaluating how effective each approach is, both on its own and in combination with the others. Results show that the advantages of multiple supplies, transistor sizing and multiple thresholds can be compounded to maximize power reduction. The order in which these techniques are applied determines the final savings in active and leakage power.

1. Introduction
This work considers the combined effectiveness of multiple threshold voltages, multiple supply voltages and transistor sizing for reducing both active and leakage power. The analysis includes design directives for achieving the optimal balance among these power reduction techniques.

Design techniques for low power consumption in modern VLSI are becoming increasingly important [1], [2]. As technology moves into deep submicron feature sizes, designs are becoming essentially power limited. Power dissipation due to leakage current is increasing at an exponential rate, and projections show that leakage power will become comparable to dynamic power dissipation in the next few years [3]. Supply voltage has not been scaled aggressively enough to keep power per unit area constant over technology generations [4].

Power consumption in CMOS circuits can be divided into several major components. The dominant source is the dynamic switching power needed to drive the capacitive loads on gates. Short-circuit power is dissipated when a short-circuit current flows during a switching event.
Other components include static power, which is generally zero for most logic families, and leakage power. A general formula for power consumption is given in (1) [5]:

$$P \approx \alpha \left( C_L V_{swing} + I_{SC}\, t_{SC} \right) V_{DD}\, f + \left( I_{DC} + I_{Leak} \right) V_{DD} \qquad (1)$$

Here α is the switching probability of a gate, C_L is the load capacitance, V_swing is the signal swing, I_SC and t_SC are the short-circuit current and its duration, I_DC is the static current, and f is the clock frequency. Leakage power is attributed to leakage current, which is related to threshold voltage as follows [5]:

$$I_{LEAK} = I_0 \, \frac{W}{W_0} \, 10^{-V_T / S} \qquad (2)$$

where W is the channel width, I_0 is the leakage current of a reference device of width W_0 extrapolated to V_T = 0, and S is the subthreshold slope. The typical value of S is 0.1V/decade, which means that leakage current increases by an order of magnitude for every 0.1V drop in threshold voltage.

In recent years, a number of power reduction techniques have emerged. These have focused primarily on the individual effects of multiple-supply, multiple-threshold and gate-sizing techniques. The basics of dual-supply design are introduced in [6], which reports a 10% to 20% power reduction for a clustered voltage scaling (CVS) dual-VDD design. Dual-supply design methodology, including layout issues, is covered in [7]. Reference [8] presents multiple-threshold assignment on a cell-by-cell basis and reports leakage reductions of 75% to 90%. A triple-threshold RISC processor was presented in [9]. Transistor sizing for power reduction is a common technique and has been covered thoroughly in the past [10]. In [11], multiple-supply, multiple-threshold and transistor sizing were examined individually from a theoretical standpoint, and rules of thumb for optimal supply voltages, threshold voltages and transistor sizes were derived from a series of equations.

Each of these techniques trades the same timing slack for potential energy savings. Used individually, the most effective technique is the one that achieves the largest savings for the same slack. Little work has been done on combining multiple-supply, multiple-threshold and transistor sizing techniques to compound power savings. Here, we examine these techniques when they are used in conjunction.
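Equations (1) and (2) can be evaluated directly. The minimal Python sketch below simply transcribes them. The function names and every argument value in the usage lines are hypothetical and were chosen only to show two consequences stated above. First, a 0.1V rise in VT with S = 0.1V/decade cuts leakage tenfold. Second, when V_swing = V_DD the switching term scales as V_DD², so moving a gate from 1.2V to 0.8V reduces its switching power to (0.8/1.2)² ≈ 0.44 of the original.

```python
import math


def total_power(alpha, c_load, v_swing, i_sc, t_sc, i_dc, i_leak, v_dd, f):
    """Eq. (1): activity-weighted switching + short-circuit power, plus static (DC + leakage) power.

    SI units throughout: farads, volts, amperes, seconds, hertz -> watts.
    """
    dynamic = alpha * (c_load * v_swing + i_sc * t_sc) * v_dd * f
    static = (i_dc + i_leak) * v_dd
    return dynamic + static


def leakage_current(i0, w, w0, v_t, s=0.1):
    """Eq. (2): I_LEAK = I0 * (W / W0) * 10**(-V_T / S), with S in V/decade."""
    return i0 * (w / w0) * 10 ** (-v_t / s)


if __name__ == "__main__":
    # Hypothetical device: raise VT by the ~100 mV HS/LL spread.
    i_hs = leakage_current(i0=1e-6, w=1.0, w0=1.0, v_t=0.3)
    i_ll = leakage_current(i0=1e-6, w=1.0, w0=1.0, v_t=0.4)
    print("LL/HS leakage ratio:", i_ll / i_hs)

    # Hypothetical gate: full-swing switching only (no short-circuit, DC or leakage terms).
    p_high = total_power(0.1, 10e-15, 1.2, 0.0, 0.0, 0.0, 0.0, 1.2, 1e9)
    p_low = total_power(0.1, 10e-15, 0.8, 0.0, 0.0, 0.0, 0.0, 0.8, 1e9)
    print("VDDL/VDDH switching-power ratio:", p_low / p_high)
```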

This work is restricted to a digital block with fixed throughput and constant latency. For a given delay, power and energy are minimized by selectively adjusting threshold and supply voltages and by transistor sizing. The target application for these results is general ASIC design, which is characterized by a limited number of paths that constitute the critical delay. The cycle slack available in the non-critical paths allows these power reduction techniques to be introduced without affecting overall system throughput.

To the best of our knowledge, no gate-level CAD tool currently supports joint evaluation of all three techniques. To facilitate this exploration, a design framework was constructed using MS Excel and Spectre simulations. Models of basic gates were derived through simulation, and a generic path-delay distribution was generated from these gates. Within this environment, a number of combinations of the three power reduction techniques were evaluated.

Section 2 describes the test setup, and Section 3 details the results of the various methods attempted. These results are analyzed in Section 4, and conclusions are presented in Section 5.

2. Test Setup

2.1. Delay and Power Models
This work is based on a linear delay model, in which gate delay is expressed as a linear function of the load capacitance. The dependence of delay on input slope is ignored in this early evaluation. A simplified library consists of basic logic gates, each designed in multiple sizes. These gates were simulated with Cadence Composer and Spectre in a general-purpose 0.13µm CMOS process. The technology provides two threshold voltages, high-speed (HS, low VT) and low-leakage (LL, high VT), with a spread of approximately 100mV. The baseline supply voltage was 1.2V, and the second supply, VDDL, was set to 0.8V in accordance with [11].
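The linear model can be built from a few simulated points. Given (load, delay) pairs from a simulation sweep, a least-squares line yields the slope (delay per unit load) and the y-intercept (intrinsic delay), and the same fit applies to active energy. This is a minimal Python sketch. `LinearModel` and `fit_linear` are illustrative names, and the sample numbers are hypothetical rather than the values plotted in Figure 1.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Linear gate model: value = intercept + slope * C_load (e.g. delay in ps, C_load in fF)."""
    slope: float
    intercept: float

    def __call__(self, c_load: float) -> float:
        return self.intercept + self.slope * c_load


def fit_linear(loads, values):
    """Ordinary least-squares fit of simulated values (delay or energy) against load capacitance."""
    n = len(loads)
    if n < 2:
        raise ValueError("need at least two simulated load points")
    mean_c = sum(loads) / n
    mean_v = sum(values) / n
    s_cc = sum((c - mean_c) ** 2 for c in loads)
    s_cv = sum((c - mean_c) * (v - mean_v) for c, v in zip(loads, values))
    slope = s_cv / s_cc  # requires at least two distinct load values
    return LinearModel(slope=slope, intercept=mean_v - slope * mean_c)


if __name__ == "__main__":
    # Hypothetical sweep for one gate/supply/threshold combination.
    loads_ff = [1.0, 2.0, 4.0, 8.0]
    delays_ps = [23.0, 26.0, 32.0, 44.0]
    nand2_delay = fit_linear(loads_ff, delays_ps)
    print(nand2_delay, nand2_delay(175.0))
```

One such model per gate, size, supply and threshold combination (and one each for delay and energy) is all that the path-level framework needs.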
Each gate/supply/threshold combination was simulated to determine active energy, leakage power and delay for several load capacitances. To complete the models, delay and active energy were plotted against load capacitance, and a linear fit was used to determine the slope and y-intercept of each. An example of the linear model calculation is shown for a 2-input NAND gate in Figure 1.

Figure 1. Linear delay and active-energy models for a 2-input NAND gate with VDD = 0.8V and HS VT

2.2. Level-Converting Flip-Flops
In multiple-VDD designs, level converters are required to provide proper interfacing when a gate on the lower supply drives a gate on the higher supply. In the CVS approach, the level-conversion function is embedded in a flip-flop, and all VDDL cells are clustered at the ends of each path [7]. This CVS method of multiple-supply design is adopted here.

Conventional level-converting flip-flops (LCFFs) [6] are standard master-slave latch pairs that rely on positive feedback in the cross-coupled inverter pair of the slave stage to restore the full logic swing. As a result, their clock-to-output delay increases, which lengthens the delay of the succeeding pipeline stage. In [12], LCFFs were designed that reduce the flip-flop delay from data input to output. The design best suited to this project, a pulsed latch with level conversion performed in a half-latch, was chosen as an alternative to the traditional LCFF. This design effectively eliminates the delay penalty at VDDL = 1.0V and drastically reduces it at VDDL = 0.8V. The setup time is slightly increased, while the clock-to-output delay is virtually unchanged. One drawback of using pulsed latches in general-purpose logic is their long hold time. In dual-supply designs, however, race conditions can be managed by lowering the supply voltage on short logic paths. The flip-flop timing and energy characteristics are shown in Table 1.
The downside of this design is slightly increased power consumption. In a dual-supply design, this power penalty is offset by the larger number of gates that can be placed on VDDL with this LCFF.

Table 1. LCFF normalized delay and energy
                  Delay   Energy
Baseline (FF)     1.00    1.00
LCFF @ 0.8V       1.25    2.80
LCFF @ 1.0V       0.94    1.29

2.3. Framework and Baseline
Because commercial CAD tool support is lacking, a framework was built in an MS Excel workbook in which individual gates can be combined to form paths. The delay, active energy and leakage power of each path are calculated from the gate models and the gate sizes. A baseline design with a lambda-shaped path-delay distribution was formed by randomly stringing together different combinations of gates, up to 12 gates per path, to form 500 paths. Initially, all gates were minimum size. Sizing was then used to bring each path to its maximum speed, approximating a synthesized design. The load of each path was set to 50× the input capacitance of a flip-flop, or 175fF, to approximate driving a local bus.

Figure 2 shows how sizing the initial unsized design to create the baseline changes the path-delay distribution. Active energy per transition, leakage power and maximum delay for the two designs are listed in Table 2.

Figure 2. Baseline and initial unsized design path-delay distributions

Table 2. Summary of baseline and unsized designs
                  Active Energy [pJ]   Leakage Power [µW]   Maximum Delay [ps]
Initial unsized   214.9                45.86                1142.6
Baseline          293.3                76.80                721.2

3. Results
Results are first presented for downsizing, dual-threshold and dual-supply, each applied independently to the baseline design. These techniques are then combined to explore their compounded effectiveness. Each method slows down non-critical paths in exchange for energy savings, by lowering the supply voltage, raising the threshold voltage or downsizing the gates.

3.1. Sizing
Gates off the critical paths were downsized where possible. Figure 3 shows the effect of sizing on the path-delay distribution, and Table 3 compares the normalized active energy and leakage power of the downsized design with the baseline design.

Figure 3. Effect of sizing on path-delay distribution

Table 3. Normalized active energy and leakage power with transistor sizing
          Active Energy   Leakage Power
Sizing    0.75            0.62

3.2. Dual Threshold Voltages
The dual-threshold technique was applied to both the baseline design and the sized design. Each cell was assigned to either LL or HS on a cell-by-cell basis, with the goal of maximizing the leakage reduction of each path while still meeting timing. Replacing high-speed cells with low-leakage cells reduced leakage power substantially in both the baseline and the sized design. The reduction was larger when dual-threshold was applied to the baseline design than when it was applied to the sized design. Figure 4 shows the path-delay distribution, and Table 4 the normalized power and energy. The small reduction in active energy with the dual-threshold design is primarily due to reduced gate channel capacitance in the off state and a small reduction in signal swing (VDD - VT) at intermediate nodes. The speed penalty for high-VT cells ranges from 25% to 50%.

Figure 4. Path-delay distribution with dual-VT

Table 4. Normalized active energy and leakage power with dual-VT
                   Active Energy   Leakage Power
Dual-VT            0.97            0.12
Sizing + dual-VT   0.74            0.31

3.3. Dual Supply
The second supply voltage was set to 0.8V using the rule of thumb in [11]. The CVS method [6] was used to determine the cluster of VDDL cells, with all VDDL cells grouped at the end of each path. The dual-supply technique was applied to both the baseline design and the sized design. The path-delay distributions for the dual-supply design (VDDL = 0.8V) are shown in Figure 5, and the normalized power and energy are listed in Table 5. Comparing Tables 3 and 5 shows that dual-supply is more effective than sizing for both active and leakage power reduction.

Figure 5. Path-delay distribution with dual-VDD, VDDL = 0.8V

Table 5. Normalized active energy and leakage power with dual-VDD
                    Active Energy   Leakage Power
Dual-VDD            0.66            0.33
Sizing + dual-VDD   0.65            0.52

3.4. Combination of Techniques
Dual supplies are the most effective technique for active power reduction. However, lowering the supply voltage has a large impact on delay, and the CVS method has its own limitations, so part of the timing slack remains unused. Applying the other two techniques to the paths with remaining slack should therefore yield additional energy savings. To determine whether the benefits of dual-supply, dual-threshold and sizing are cumulative, the techniques were applied in conjunction with each other. First, sizing was added to the dual-VDD design of Section 3.3 (dual-VDD applied to the baseline). The results are shown in Figure 6 and Table 6.
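The CVS-style supply assignment of Sections 3.3 and 3.4 can be sketched for a single path of the framework. This Python sketch assumes independent paths, as in the spreadsheet framework. The netlist-level CVS algorithm of [6] must ensure that a VDDL gate never drives a VDDH gate except through a level-converting flip-flop, and on an isolated path that constraint reduces to moving the longest possible suffix of gates to VDDL. `Gate`, `path_delay` and `cvs_assign` are hypothetical names. The optional LCFF penalty stands in for the flip-flop delay overhead of Table 1, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """Hypothetical linear gate model on each supply: delay_ps = d0 + k * C_load_fF."""
    d0_vddh: float
    k_vddh: float
    d0_vddl: float
    k_vddl: float
    c_in_ff: float  # input capacitance presented to the driving gate


def path_delay(gates, c_final_ff, n_low):
    """Delay of a path whose last n_low gates run on VDDL (CVS: VDDL cells clustered at the path end)."""
    first_low = len(gates) - n_low
    total = 0.0
    for i, g in enumerate(gates):
        # Each gate drives the next gate's input; the last gate drives the path load.
        c_load = gates[i + 1].c_in_ff if i + 1 < len(gates) else c_final_ff
        if i >= first_low:
            total += g.d0_vddl + g.k_vddl * c_load
        else:
            total += g.d0_vddh + g.k_vddh * c_load
    return total


def cvs_assign(gates, c_final_ff, t_target_ps, lcff_penalty_ps=0.0):
    """Largest number of trailing gates that can move to VDDL while the path meets t_target_ps.

    lcff_penalty_ps is extra flip-flop delay charged whenever the path ends on VDDL.
    Assumes every gate is slower on VDDL, so delay grows with n_low and the search can stop early.
    """
    best = 0
    for n_low in range(1, len(gates) + 1):
        if path_delay(gates, c_final_ff, n_low) + lcff_penalty_ps <= t_target_ps:
            best = n_low
        else:
            break
    return best


if __name__ == "__main__":
    inv = Gate(d0_vddh=10.0, k_vddh=1.0, d0_vddl=15.0, k_vddl=1.5, c_in_ff=2.0)
    path = [inv] * 4
    print(path_delay(path, 10.0, 0), cvs_assign(path, 10.0, t_target_ps=75.0))
```

Because each VDDL gate is slower, the VDDL suffix on a path grows only as far as the available slack allows. This is why part of the slack remains unused and can be handed to sizing or dual-VT afterwards.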
Comparing these results with those obtained when the baseline was first sized and dual-supply was then applied confirms that dual-supply is more effective than sizing for both active energy and leakage power. A second VT was then applied to this design, which produced a small additional decrease in leakage power. The effect of dual-threshold was smaller here because many more paths were critical after dual-supply and sizing had been applied, and these paths could not absorb the large delay penalty of high-VT cells.
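The cell-by-cell dual-VT assignment of Section 3.2 can be illustrated with a greedy heuristic. The sketch also shows why the technique gains little once dual-supply and sizing have consumed the slack. In this Python sketch, HS cells on one path are swapped to LL in order of leakage saved per picosecond of added delay, for as long as the path still meets its target. This is an assumed heuristic, not necessarily the procedure used in this work. It treats each gate's delay at its actual load as fixed, which assumes that a VT swap does not change input capacitance. `VtChoice` and `assign_dual_vt` are hypothetical names, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class VtChoice:
    """Hypothetical per-gate data at the gate's actual load: HS (low-VT) vs LL (high-VT)."""
    delay_hs_ps: float
    delay_ll_ps: float
    leak_hs_nw: float
    leak_ll_nw: float


def assign_dual_vt(gates, t_target_ps):
    """Greedy HS->LL swapping on one path while the path still meets t_target_ps.

    Gates are tried in order of leakage saved per picosecond of added delay, best first.
    Returns the per-gate assignment and the resulting path delay.
    """
    choice = ["HS"] * len(gates)
    delay = sum(g.delay_hs_ps for g in gates)

    def benefit(i):
        g = gates[i]
        extra = max(g.delay_ll_ps - g.delay_hs_ps, 1e-12)  # guard against a zero delay penalty
        return (g.leak_hs_nw - g.leak_ll_nw) / extra

    for i in sorted(range(len(gates)), key=benefit, reverse=True):
        extra = gates[i].delay_ll_ps - gates[i].delay_hs_ps
        if delay + extra <= t_target_ps:
            choice[i] = "LL"
            delay += extra
    return choice, delay


if __name__ == "__main__":
    path = [
        VtChoice(10.0, 15.0, 100.0, 10.0),
        VtChoice(20.0, 30.0, 50.0, 5.0),
        VtChoice(10.0, 12.0, 40.0, 6.0),
    ]
    print(assign_dual_vt(path, t_target_ps=48.0))  # some slack: two cells can go LL
    print(assign_dual_vt(path, t_target_ps=40.0))  # critical path: no cell can go LL
```

With the target equal to the all-HS delay (a path made critical by earlier techniques), no swap fits, which mirrors the reduced dual-VT benefit described above.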

Figure 6. Path-delay distribution combining dual-VDD with other techniques, VDDL = 0.8V

The second supply voltage was also added to the dual-threshold design of Section 3.2 (dual-VT applied to the baseline), and sizing was then used to reduce power consumption further (see Figure 7 and Table 6).

Figure 7. Path-delay distribution combining dual-VT with other techniques, VDDL = 0.8V

Table 6. Normalized active energy and leakage power with combinations of techniques
                              Active Energy   Leakage Power
Dual-VDD + sizing             0.55            0.25
Dual-VDD + sizing, dual-VT    0.55            0.23
Dual-VT + dual-VDD            0.81            0.12
Dual-VT + dual-VDD, sizing    0.69            0.10

4. Analysis of Results
There exists an optimal energy for a given block. In practice, however, dual-supply, dual-threshold and sizing would typically be applied sequentially, and different orders of application yield different results for leakage and active power. Overall power depends on many factors, including switching activity and the process, so the emphasis among the techniques should depend on the activity of the logic block. For high activity, dual-supply should be applied first, followed by transistor sizing and then dual-threshold. Dual-threshold should be added only if it does not impact the active power, which depends on the value of VDDL. If leakage power is the chief concern (low activity), dual-threshold takes precedence over the other techniques, followed by dual-supply and then transistor sizing.

In this analysis, the starting point was a logic block with all paths sized for maximum speed and built entirely from low-VT transistors. This represents over-design, but such over-design is common in today's designs. The relative power savings are observed to depend on the ratio of the internal capacitance of the block to its load capacitance, and the absolute savings depend on the type of logic block as well as its loading. Note that sizing cannot reduce the energy consumed in driving the load, whereas the second supply acts on that energy first, since CVS places the VDDL cells at the ends of the paths that drive the load. Therefore, in a block without substantial loading, dual-supply may not be superior to downsizing.

5. Conclusions and Future Work
The effects of dual-supply, dual-threshold and transistor sizing were evaluated on a typical logic block to derive a consistent design methodology for how and when each of these three common power reduction techniques should be used. The experiments show that the energy savings from these three basic techniques can be compounded through proper combination. Introducing the second supply is the most effective active-energy reduction technique, and the slack it leaves over can be consumed by downsizing for additional savings. The second threshold voltage should be used either as the first technique for low-activity blocks or as the last technique, consuming leftover slack, for high-activity blocks. The large potential power savings shown in this study should motivate EDA support for design environments that combine these techniques, particularly in the area of multiple supply voltages.
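The activity-dependent ordering of Section 4 can be checked against the numbers reported here. The Python sketch below combines the absolute baseline values of Table 2 with the normalized results of Table 6. It rests on two assumptions that the paper does not state: the clock period equals the baseline maximum delay (721.2 ps), and active power is α × (energy per transition) × f, in the spirit of (1). Under these assumptions, the dual-VDD-first combination has the lowest total power at ordinary activities. The dual-VT-first combination wins only below a crossover activity on the order of 10^-4, which is consistent with leading with dual-threshold in low-activity blocks. `total_power_uw`, `best_design` and `crossover_alpha` are hypothetical names.

```python
BASELINE_ACTIVE_ENERGY_PJ = 293.3  # Table 2, baseline active energy per transition
BASELINE_LEAKAGE_UW = 76.80        # Table 2, baseline leakage power
CLOCK_PERIOD_PS = 721.2            # assumption: clock period equals the baseline maximum delay

# Table 6: design -> (normalized active energy, normalized leakage power)
DESIGNS = {
    "Dual-VDD + sizing": (0.55, 0.25),
    "Dual-VDD + sizing, dual-VT": (0.55, 0.23),
    "Dual-VT + dual-VDD": (0.81, 0.12),
    "Dual-VT + dual-VDD, sizing": (0.69, 0.10),
}


def _active_w_per_alpha():
    """Baseline active power at alpha = 1, in watts."""
    f_hz = 1.0 / (CLOCK_PERIOD_PS * 1e-12)
    return BASELINE_ACTIVE_ENERGY_PJ * 1e-12 * f_hz


def total_power_uw(alpha, energy_norm, leakage_norm):
    """Activity-weighted active power plus leakage power, in microwatts."""
    active_w = alpha * energy_norm * _active_w_per_alpha()
    leakage_w = leakage_norm * BASELINE_LEAKAGE_UW * 1e-6
    return (active_w + leakage_w) * 1e6


def best_design(alpha):
    """Design from Table 6 with the lowest total power at switching activity alpha."""
    return min(DESIGNS, key=lambda name: total_power_uw(alpha, *DESIGNS[name]))


def crossover_alpha(name_a, name_b):
    """Activity at which two designs with different active energies consume equal total power."""
    (e_a, l_a), (e_b, l_b) = DESIGNS[name_a], DESIGNS[name_b]
    if e_a == e_b:
        raise ValueError("designs have equal active energy; no crossover in alpha")
    return (l_b - l_a) * BASELINE_LEAKAGE_UW * 1e-6 / ((e_a - e_b) * _active_w_per_alpha())


if __name__ == "__main__":
    for alpha in (0.0, 1e-5, 1e-3, 0.1):
        print(alpha, best_design(alpha))
    print(crossover_alpha("Dual-VDD + sizing, dual-VT", "Dual-VT + dual-VDD, sizing"))
```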

6. Acknowledgment
This work was completed while S. Augsburger was a graduate student at the University of California, Berkeley, where it was supported in part by the MARCO/DARPA Gigascale Silicon Research Center (http://www.gigascale.org). S. Augsburger was supported by an SRC Masters Scholarship. Their support is gratefully acknowledged.

7. References
[1] J. D. Meindl, "Low Power Microelectronics: Retrospect and Prospect," Proceedings of the IEEE, vol. 83, no. 4, p. 619, 1995.
[2] A. Chandrakasan, S. Sheng and R. Brodersen, "Low-Power CMOS Digital Design," IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 473-484, April 1992.
[3] V. De et al., "Techniques for Leakage Power Reduction," in Design of High-Performance Microprocessor Circuits, IEEE Press, NJ, 2001, pp. 46-62.
[4] J. Edmondson, "Impact of Physical Technology on Architecture," in Design of High-Performance Microprocessor Circuits, IEEE Press, NJ, 2001, pp. 3-24.
[5] T. Kuroda, "Low-Power CMOS Circuit Design by Means of Supply-Voltage and Threshold-Voltage Control," Ph.D. dissertation, University of Tokyo, December 1998.
[6] K. Usami and M. Horowitz, "Clustered Voltage Scaling for Low-Power Design," International Symposium on Low Power Design, pp. 3-8, April 1995.
[7] K. Usami and M. Igarashi, "Low-Power Design Methodology and Applications Utilizing Dual Supply Voltages," Proceedings of the Asia and South Pacific Design Automation Conference 2000, pp. 123-128, January 2000.
[8] N. Kato et al., "Random Modulation: Multi-Threshold-Voltage Design Methodology in Sub-2V Supply CMOS," IEICE Transactions on Electronics, vol. E83-C, no. 11, pp. 1747-1754, November 2000.
[9] T. Yamashita et al., "A 450MHz 64b RISC Processor Using Multiple Threshold Voltage CMOS," 2000 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 414-415, February 2000.
[10] J. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice Hall, NJ, 1996.
[11] M. Hamada, Y. Ootaguro and T. Kuroda, "Utilizing Surplus Timing for Power Reduction," Proceedings of the IEEE 2001 Custom Integrated Circuits Conference, pp. 89-92, May 2001.
[12] F. Ishihara and B. Nikolić, "Level-Converting Flip-Flops for Dual-Supply Systems," to be published, 2002.