Comparative study on low-power high-performance standard-cell flip-flops

S. Tahmasbi Oskuii, A. Alvandpour
Electronic Devices, Linköping University, Linköping, Sweden

ABSTRACT

This paper explores the energy-delay space of eight widely referenced flip-flops in a 0.13 µm CMOS technology. The main goal has been to find the smallest set of flip-flop topologies to be included in a high-performance flip-flop cell library covering a wide range of power-performance targets. Based on our comparison results, transmission-gate-based flip-flops show the best power-performance trade-off, with a total delay (clock-to-output + setup time) down to 105 ps. For higher performance, the pulse-triggered flip-flops are the fastest (80 ps) alternatives suitable for inclusion in a flip-flop cell library. However, pulse-triggered flip-flops consume significantly more power (about 2.5x) than other fast but fully dynamic flip-flops such as TSPC and dynamic TG-based flip-flops.

Keywords: flip-flops, latches, low-power, standard-cell, cell library, energy-delay space

1. INTRODUCTION

For high-performance VLSI chip design, the choice of back-end methodology has a significant impact on design time and design cost. Making every single gate from scratch is not necessarily the best method. Instead, a sufficient set of pre-designed standard cells can be used as building blocks for most of the functional blocks. Semiconductor manufacturers offer standard-cell libraries, which are also supported by CAD tools in automated design flows, including the final physical auto-placement and routing. However, the selection of standard cells, as well as their performance, is often limited.

Despite these performance limitations, standard-cell libraries can be useful even in the design of high-performance VLSI chips. Often, only a small portion of a chip contains performance-critical units, and the rest of the design can be maximally automated to reduce design time without degrading the targeted performance. In addition, the concept of a cell library can be extended to support even the full-custom part of the chip. Custom (in-house) cell libraries can be made and shared by the designers of the performance-critical units. This sharply decreases the number of cells to be created and verified, which reduces the total chip layout time significantly. Hence, development of an efficient cell library for high-performance chips is essential.

A cell library includes a number of cells with different functionalities, where each cell may be available in several sizes and with different driving capabilities. Two central categories of cells included in cell libraries are flip-flops and latches. These are extremely important circuit elements in any synchronous VLSI chip. Not only are they responsible for the correct timing, functionality, and performance of the chip, but their clocked devices also consume a significant portion of the total active power.

A universal flip-flop with the best performance, lowest power consumption, and highest robustness against noise would be an ideal component to include in cell libraries. However, it will be shown in this paper that increasing the performance of flip-flops generally involves significant power and robustness trade-offs. Therefore, a set of different latches and flip-flops with different performance levels is essential, so that the more power-consuming and noise-sensitive elements are used only in the small portion of the chip that contains performance-critical units. This avoids a global and unnecessary increase in power consumption as well as a degradation in robustness, which would lower the overall noise margin and require extra careful and time-consuming design.
Microelectronics: Design, Technology, and Packaging, edited by Derek Abbott, Kamran Eshraghian, Charles A. Musca, Dimitris Pavlidis, Neil Weste, Proceedings of SPIE Vol. 5274 (SPIE, Bellingham, WA, 2004), 0277-786X/04/$15, doi: 10.1117/12.530225

The goal of this work is to find a small set (ideally the smallest set) of flip-flop topologies to be included in a library covering a wide range of power-performance targets. Our strategy has been to first explore the capabilities of conventional and simpler transmission-gate (TG) based flip-flop topologies before including other types of flip-flops. Among the large number of flip-flops that have been proposed in the past [1-5], we have selected some of the most widely used and/or referenced topologies. Sec. 2 shows the eight flip-flops we have incorporated in our initial benchmark, including static and dynamic edge-triggered master-slave flip-flops as well as semi-dynamic pulsed flip-flops.

In contrast to many previously published results [3], [5], we have explored a wide power-performance space for each of the eight flip-flops. By sizing, we have identified the useful operating ranges of the flip-flops. The design-space exploration not only enables a true comparison, but also reveals potentially large overlaps in the operating ranges of the flip-flops. This in turn provides an opportunity to reduce the number of different circuit topologies in a flip-flop library.

Sec. 3 describes our simulation setup as well as the flip-flop parameters we have considered in our comparisons. In Sec. 4 we show the comparison results, including the energy-delay space for each flip-flop topology, followed by the conclusions in Sec. 5.

2. FLIP-FLOP TOPOLOGIES

As described in Sec. 1, many flip-flop topologies have been proposed in the past. For our comparative study, we have selected some of the most widely used and/or referenced topologies for our initial benchmark.

Four static master-slave flip-flops are included in our test bench. Figure 1 shows the classic transmission-gate-based flip-flop (TGMS) [1]. Another variation of TGMS is the flip-flop shown in Figure 2, which is derived from the PowerPC 603 master-slave flip-flop [6]. In the PowerPC 603 flip-flop, the interrupted feedback in the storage elements is based on C²MOS inverters. Figure 3 shows the third topology, a modified clocked-inverter topology (mc²mos) [7], where the dynamic master-slave C²MOS flip-flop is turned into a pseudo-static C²MOS flip-flop by adding C²MOS feedback at the outputs. The fourth master-slave flip-flop (Figure 4) is based on the traditional SR latch built from cross-coupled NAND/NOR gates [1], [4].

The next two flip-flops (Figures 5 and 6) are pulse-triggered latches. They are based on a single latch, which is transparent for a short time (during a pulse) at the edge of the clock. Figure 5 shows a hybrid-latch flip-flop element (HLFF) [8], and Figure 6 shows a semi-dynamic flip-flop (SDFF) [9]. Both pulse-triggered topologies require and include pulse generators.

Further, there are two fully dynamic flip-flops in our benchmark: the TSPC flip-flop [10] in Figure 7 and the dynamic transmission-gate flip-flop [1], [2] in Figure 8. These fast flip-flops (with floating nodes) are especially sensitive to noise and leakage currents. However, we have included their performance level as a reference for evaluating the other flip-flops.

Figure 1: TGMS flip-flop
Figure 2: PowerPC 603 flip-flop

Figure 3: Modified C²MOS flip-flop (mc²mos)
Figure 4: NAND-NOR flip-flop
Figure 5: Hybrid latch flip-flop (HLFF)
Figure 6: Semi-dynamic flip-flop (SDFF)
Figure 7: 9T TSPC flip-flop
Figure 8: Fully-dynamic TGMS flip-flop

3. SIMULATION SETUP

All the circuits are designed in a standard 0.13 µm CMOS technology. The supply voltage used for simulations is 1.2 V, and the operating temperature is 27 °C. The simulation conditions are shown in Figure 9. All of the flip-flops use identical and fixed input drivers (minimum-sized inverters) and are loaded equally by the input capacitances of four minimum-sized inverters.

Figure 9: The simulation test bench

3.1. Energy-delay space exploration

We have explored the energy-delay space of each flip-flop by sizing the internal devices while keeping the input drivers and the output loads fixed. The best delays achieved for a number of different target energy consumption points have been selected as the sub-optimal energy-delay points. Our timing and energy metrics are described as follows.

3.1.1. Delay and timing metrics

The performance of a flip-flop is defined by three important time windows and delays: clock-to-output delay, setup time, and hold time. For performance comparison, the setup and hold times require a clear definition. An edge-triggered flip-flop requires the input data to be stable for some time before the clock edge. This time window is referred to as the setup time of the flip-flop. The time after the clock edge for which the input has to remain stable is called the hold time.

The setup time can be defined and measured in different ways. The time window could be measured relative to the point at which the flip-flop fails to capture the data. However, this definition might be impractical: before the input reaches the failure limit, the flip-flop responds more slowly to the input data, which increases its delay. Throughout this paper, we use the following definition of setup and hold time, which was also used in [3]: setup time and hold time are the data-to-clock offsets that cause a 5% increase in the clock-to-output delay. This is illustrated in Figure 10.

Propagation delay, setup time, and hold time may differ for low-to-high and high-to-low input transitions. For the comparisons, we have chosen the worst-case values:

$$t_{\text{Clock-to-Output}} = \max\left(t_{\text{Clock-to-Output},LH},\; t_{\text{Clock-to-Output},HL}\right)$$
$$t_{\text{setup}} = \max\left(t_{\text{setup},LH},\; t_{\text{setup},HL}\right)$$
$$t_{\text{hold}} = \max\left(t_{\text{hold},LH},\; t_{\text{hold},HL}\right)$$

Figure 10: Setup time and hold time definitions
Figure 11: Flip-flops at the logic boundaries (labels: flip-flops A and B, combinational logic, t_logic, t_skew)
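The 5% definition of setup time in Sec. 3.1.1 can be illustrated with a short sketch. It assumes a sweep of data-to-clock offsets with the clock-to-output delay measured at each offset; the function names, the linear interpolation between samples, and the sample numbers are illustrative assumptions and are not taken from the paper's simulation flow.

```python
def setup_time_from_sweep(offsets_ps, clk_to_out_ps, rel_increase=0.05):
    """Return the data-to-clock offset at which the clock-to-output delay
    is rel_increase above its relaxed value (linear interpolation)."""
    # Sort from the most relaxed case (data arrives long before the clock
    # edge) toward the most aggressive one.
    sweep = sorted(zip(offsets_ps, clk_to_out_ps), reverse=True)
    nominal = sweep[0][1]  # delay measured with ample setup margin
    limit = nominal * (1.0 + rel_increase)
    for (off_a, d_a), (off_b, d_b) in zip(sweep, sweep[1:]):
        if d_b > limit:
            # The limit is crossed between these two samples.
            frac = (limit - d_a) / (d_b - d_a)
            return off_a + frac * (off_b - off_a)
    raise ValueError("clock-to-output never rises above the limit in this sweep")


def worst_case(low_to_high_ps, high_to_low_ps):
    """Worst case over the two input transitions, as in the max() relations."""
    return max(low_to_high_ps, high_to_low_ps)


# Hypothetical sweep for one input transition (all values in ps).
offsets = [100.0, 80.0, 60.0, 40.0, 20.0]
delays = [60.0, 60.0, 61.0, 65.0, 80.0]
t_setup_lh = setup_time_from_sweep(offsets, delays)
print(f"setup time (this transition): {t_setup_lh:.1f} ps")
```

The same routine could be applied to a sweep of data changes after the clock edge to extract a hold time under the same 5% criterion; the worst case over both transitions would then be taken with `worst_case`.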

Further, in a digital system (Figure 11) the following condition has to be satisfied:

$$\text{Latency} = t_{\text{Clock-to-Output}}(\text{Max}) + t_{\text{setup}} \tag{1}$$
$$\text{Latency} + t_{\text{logic}} + t_{\text{skew}} \le T \tag{2}$$

where T (the clock period) must be no smaller than the sum of the worst-case clock-to-output delay of flip-flop A, the setup time of flip-flop B, the maximum delay in the combinational logic, and the relative clock skew. Therefore, we have used the sum of the clock-to-output delay and the setup time as the delay imposed by the flip-flops.

3.1.2. Energy metrics

Two energy metrics have been used for the comparisons. The first is the energy-per-transition (EPT), which is the average energy consumption when a transition appears at the input of the flip-flop (average of high and low transitions). The second is the clock energy, which is the average energy consumption when the data activity is zero.

Since the flip-flops are intended for use in cell libraries, the input clock and the input data are both single-ended. All additional clock phases are generated inside the flip-flops. The power consumption of the clock and data drivers is included in the total power consumption of the flip-flops.

4. SIMULATION RESULTS AND COMPARISON

Figures 12-19 show the energy-delay space of each flip-flop. Each figure includes two sub-graphs:

1. The upper sub-graph shows the flip-flop energy per transition versus the total delay (clock-to-output + setup time). The energy consumed by the clocked devices is shown in black.
2. The lower sub-graph shows the total delay (clock-to-output + setup time) versus the total flip-flop energy per transition. The setup time and the clock-to-output delay are highlighted in white and black, respectively.

Figures 20-23 summarize the energy-delay space of all the flip-flops. As Figure 21 shows, the transmission-gate flip-flops TGMS and PowerPC 603 show the best power-performance trade-off among the fully static flip-flops. Further, they cover a relatively wide portion of the total energy-delay space. The pulse-triggered flip-flops HLFF and SDFF can support shorter delay targets. Figure 21 shows that HLFF and SDFF are faster mainly due to their shorter setup time. Based on this figure, the SDFF is the fastest flip-flop. However, the pulse-triggered flip-flops consume considerably more power (about 2x compared to the TGMS flip-flops). The TSPC and the dynamic TG-based flip-flops have comparable performance while consuming up to 50% of the energy needed for the SDFF. However, their internal floating nodes are sensitive to leakage currents and other sources of noise [11].

Figure 12: Energy-delay space for TGMS
Figure 13: Energy-delay space for PowerPC 603
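The timing condition of Eqs. (1) and (2) in Sec. 3.1.1 can be made concrete with a small sketch. The function names and all numeric values below are hypothetical and chosen only to show how the flip-flop latency enters the clock-period budget; they are not measurements from this study.

```python
def flip_flop_latency(t_clk_to_out_max_ps, t_setup_ps):
    """Eq. (1): delay imposed by the flip-flops at the logic boundaries."""
    return t_clk_to_out_max_ps + t_setup_ps


def min_clock_period(latency_ps, t_logic_ps, t_skew_ps):
    """Smallest clock period T that still satisfies Eq. (2)."""
    return latency_ps + t_logic_ps + t_skew_ps


def meets_timing(period_ps, latency_ps, t_logic_ps, t_skew_ps):
    """True when Latency + t_logic + t_skew fits within the period T."""
    return min_clock_period(latency_ps, t_logic_ps, t_skew_ps) <= period_ps


# Hypothetical pipeline stage (all values in ps).
latency = flip_flop_latency(t_clk_to_out_max_ps=60, t_setup_ps=50)
t_min = min_clock_period(latency, t_logic_ps=800, t_skew_ps=40)
print(f"latency = {latency} ps, minimum period = {t_min} ps")
print("1000 ps clock ok:", meets_timing(1000, latency, 800, 40))
print(" 900 ps clock ok:", meets_timing(900, latency, 800, 40))
```

In this reading, every picosecond removed from the flip-flop latency is a picosecond returned to the combinational logic at a fixed clock period, which is why the sum of clock-to-output delay and setup time is used as the delay metric.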

Figure 14: Energy-delay space for mc²mos
Figure 15: Energy-delay space for NANDNOR
Figure 16: Energy-delay space for HLFF
Figure 17: Energy-delay space for SDFF
Figure 18: Energy-delay space for TSPC
Figure 19: Energy-delay space for dynamic TGMS
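An energy-delay space such as those in Figures 12-19 is built from many differently sized versions of one topology, of which only the best delay at each energy level is of interest (Sec. 3.1). One common way to extract such points is a Pareto filter, sketched below. This is an illustrative substitute and not necessarily the selection procedure used by the authors; the function name and the sample points are assumptions.

```python
def energy_delay_frontier(points):
    """Keep the (delay_ps, energy_fj) points that are not dominated, i.e.
    for which no other point is at least as fast and uses less energy."""
    frontier = []
    best_energy = float("inf")
    # Walk from the fastest to the slowest design. A slower design is only
    # worth keeping if it uses strictly less energy than every faster one.
    for delay, energy in sorted(set(points)):
        if energy < best_energy:
            frontier.append((delay, energy))
            best_energy = energy
    return frontier


# Hypothetical sizing results for one flip-flop topology.
sized_designs = [(100, 30), (120, 20), (110, 35), (150, 18), (150, 25), (130, 20)]
for delay, energy in energy_delay_frontier(sized_designs):
    print(f"{delay} ps, {energy} fJ")
```

Overlaying the frontiers of several topologies on one plot would show where their useful operating ranges overlap, which is the observation used in this paper to argue for a smaller set of library topologies.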

Figure 20: Energy-per-transition versus clock-to-output delay
Figure 21: Energy-per-transition versus total delay
Figure 22: Clock energy versus clock-to-output delay
Figure 23: Clock energy versus total delay

Figures 20-23 can be used to identify the optimum flip-flop topology for different energy-delay targets. As an example, Table 1 compares the flip-flops at their minimum EPT × delay² points in Figure 21.
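To show how a combined metric of this kind ranks the topologies, the sketch below evaluates EPT × delay² from the overall delay and energy-per-transition values reported in Table 1. Reading the metric as a plain product is an assumption (the multiplication sign is not legible in the source), and the dictionary and function names are illustrative.

```python
# (overall delay [ps], energy-per-transition [fJ]) as reported in Table 1.
TABLE_1 = {
    "SDFF": (83.6, 46.8),
    "HLFF": (94.5, 34.7),
    "TGMS-dynamic": (98.4, 15.8),
    "TSPC": (103.8, 15.6),
    "PowerPC": (116.3, 18.9),
    "TGMS": (118.7, 18.8),
    "mc2mos": (152.8, 29.9),
    "NANDNOR": (197.5, 25.1),
}


def ept_delay_squared(delay_ps, ept_fj):
    """Assumed metric: energy-per-transition times overall delay squared."""
    return ept_fj * delay_ps ** 2


# Rank the flip-flops from the lowest (best) to the highest metric value.
ranking = sorted(TABLE_1, key=lambda name: ept_delay_squared(*TABLE_1[name]))
for name in ranking:
    delay_ps, ept_fj = TABLE_1[name]
    print(f"{name:13s} {ept_delay_squared(delay_ps, ept_fj):12.0f} fJ*ps^2")

# Energy ratio between the fastest flip-flop and a fully dynamic one.
sdff_vs_tspc = TABLE_1["SDFF"][1] / TABLE_1["TSPC"][1]
print(f"SDFF / TSPC energy-per-transition: {sdff_vs_tspc:.1f}x")
```

Because the delay enters squared, this metric weights speed more heavily than energy, yet the fully dynamic flip-flops still rank ahead of the pulse-triggered ones on these numbers because of their much lower energy per transition.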

Flip-flop      Overall     Clock-to-     Setup      Hold       Energy-per-       Clock
               delay [ps]  Output [ps]   time [ps]  time [ps]  transition [fJ]   energy [fJ]
SDFF            83.6       65.1          15.0        18.8      46.8              34.4
HLFF            94.5       64.4          26.9        15.6      34.7              21.7
TGMS-dynamic    98.4       49.8          46.1        -6.4      15.8               4.4
TSPC           103.8       59.7          41.1         3.9      15.6               6.7
PowerPC        116.3       60.2          53.1       -17.4      18.9               5.7
TGMS           118.7       63.3          52.2       -17.8      18.8               5.6
mc²mos         152.8       68.6          80.8       -31.7      29.9              10.6
NANDNOR        197.5       94.9          97.9       -30.8      25.1               7.5

Table 1: Performance parameters at minimum EPT × Latency²

5. CONCLUSION

In this paper, we have explored the energy-delay space of eight widely referenced flip-flops to be included in a high-performance flip-flop cell library covering a wide range of power-performance targets. All eight flip-flops have been designed in a standard 0.13 µm CMOS technology at 1.2 V. Based on our simulation results, we have shown that transmission-gate-based flip-flops (such as TGMS and PowerPC 603) exhibit the best power-performance trade-off, with a total delay (clock-to-output + setup time) down to 105 ps. For higher performance, the pulse-triggered semi-dynamic flip-flop SDFF (Figure 6) is the fastest (80 ps) alternative suitable for inclusion in a flip-flop cell library. However, pulse-triggered flip-flops consume significantly more power (about 2.5x) than fully dynamic flip-flops such as TSPC and dynamic TG-based flip-flops.

ACKNOWLEDGEMENTS

The authors would like to thank Dr. Ram Krishnamurthy and James Tschanz (Intel Corporation) and Prof. Christer Svensson (Linköping University) for useful technical discussions.

REFERENCES

1. Weste N. H. E., Eshraghian K., Principles of CMOS VLSI design, a systems perspective, second edition, Addison-Wesley, 1994.
2. Rabaey J. M., Chandrakasan A., Nikolic B., Digital integrated circuits, a design perspective, second edition, Prentice Hall, 2003.
3. Markovic D., Nikolic B., Brodersen R. W., Analysis and design of low-energy flip-flops, Proceedings of the International Symposium on Low Power Electronics and Design, 6-7 Aug. 2001, Pages: 52-55.
4. Uyemura J., Circuit Design for CMOS VLSI, Kluwer Academic Publishers, Norwell, Massachusetts, 1992.
5. Stojanovic V., Oklobdzija V. G., Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems, IEEE Journal of Solid-State Circuits, Volume: 34, Issue: 4, April 1999, Pages: 536-548.

6. Gerosa G., Gary S., Dietz C., Dac Pham, Hoover K., Alvarez J., Sanchez H., Ippolito P., Tai Ngo, Litch S., Eno J., Golab J., Vanderschaaf N., Kahle J., A 2.2 W, 80 MHz superscalar RISC microprocessor, IEEE Journal of Solid-State Circuits, Volume: 29, Issue: 12, Dec. 1994, Pages: 1440-1454.
7. Suzuki Y., Odagawa K., Abe T., Clocked CMOS calculator circuitry, IEEE Journal of Solid-State Circuits, Volume: 8, Issue: 6, Dec. 1973, Pages: 462-469.
8. Partovi H., Burd R., Salim U., Weber F., DiGregorio L., Draper D., Flow-through latch and edge-triggered flip-flop hybrid elements, IEEE International Solid-State Circuits Conference (43rd ISSCC), Digest of Technical Papers, 8-10 Feb. 1996, Pages: 138-139.
9. Klass F., Semi-dynamic and dynamic flip-flops with embedded logic, Digest of Technical Papers, 1998 Symposium on VLSI Circuits, Honolulu, HI, USA, 11-13 June 1998, Pages: 108-109.
10. Yuan J., Svensson C., High-speed CMOS circuit technique, IEEE Journal of Solid-State Circuits, Volume: 24, Issue: 1, Feb. 1989, Pages: 62-70.
11. Larsson P., Svensson C., Noise in digital dynamic CMOS circuits, IEEE Journal of Solid-State Circuits, Volume: 29, Issue: 6, June 1994, Pages: 655-662.