Energy Recovery Clocking Scheme and Flip-Flops for Ultra Low-Energy Applications

Matthew Cooke, Hamid Mahmoodi-Meimand, Kaushik Roy
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 4796, USA
+1-765-494-75
{cooke, mahoodi, kaushik}@ecn.purdue.edu

ABSTRACT
A significant fraction of the total power in highly synchronous systems is dissipated in clock networks, so low-power clocking schemes are a promising approach for future designs. We propose four novel energy recovery flip-flops that enable energy recovery from the clock network, resulting in significant energy savings. The proposed flip-flops operate with a single-phase sinusoidal clock, which can be generated with high efficiency. Based on simulation results using TSMC 0.25µm CMOS process technology at a frequency of 200MHz, the proposed flip-flops exhibit more than 80% delay reduction, power reduction of up to 46%, and area reduction of up to 77%, as compared to the conventional energy recovery flip-flop. We implemented 1024 proposed energy recovery flip-flops through an H-tree clock network driven by a resonant clock generator that generates a sinusoidal clock. Results show a power reduction of 90% on the clock-tree and total power savings of up to 83% as compared to the same implementation using the conventional square-wave clocking scheme and flip-flops.

Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - advanced technologies, microprocessors and microcomputers, VLSI.

General Terms: Theory and Design

Keywords: Adiabatic, Clock, Clock Tree, Energy Recovery, Flip-Flop

1. INTRODUCTION
Clock signals are synchronizing signals that provide timing references for computation and communication in synchronous digital systems.
The increasing demand for high-performance VLSI System-on-Chip (SOC) designs is addressed by increasing the clock frequency and integrating more components on a chip, enabled by the continuing scaling of process technology. With the continuing increase in the clock frequency and complexity of high-performance VLSI chips, the resulting increase in power consumption has become the major obstacle to the realization of high-performance designs. In addition to raising cooling costs, increased power consumption shortens battery lifetime in portable applications. The major fraction of the total power consumption in highly synchronous systems, such as microprocessors, is due to the clock network. In the Itanium(TM) microprocessor, more than 30% of the total chip power is due to the clock distribution network [1]. Thus, innovative clocking techniques that decrease the power consumption of clock networks are required for future designs.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED '03, August 25-27, 2003, Seoul, Korea. Copyright 2003 ACM 1-58113-682-X/03/0008 $5.00.

Energy recovery is a technique originally developed for low-power digital circuits [2]. Energy recovery circuits achieve low energy dissipation by restricting current to flow across devices with low voltage drop and by recycling the energy stored on their capacitors by using an AC-type (oscillating) supply voltage [2]. In this paper, we apply energy recovery techniques to the clock network, since the clock signal is typically the most capacitive signal.
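As a rough illustration of why a slowly varying supply dissipates less than an abrupt one, the sketch below compares the textbook loss for charging a capacitance through a resistance with a voltage step against the approximate loss for a slow linear ramp, following the adiabatic-charging argument of [2]. The component values and function names are hypothetical and are not taken from this paper, the ramp formula holds only when the ramp time is much longer than RC, and a sinusoidal clock is only approximated by a ramp here.

```python
def step_charging_loss(c_farads, v_volts):
    """Energy lost in the charging path when C is charged abruptly to V."""
    return 0.5 * c_farads * v_volts ** 2


def ramp_charging_loss(r_ohms, c_farads, v_volts, t_ramp_s):
    """Approximate energy lost when C is charged by a linear ramp of
    duration t_ramp_s. Valid only for t_ramp_s much larger than R*C."""
    return (r_ohms * c_farads / t_ramp_s) * c_farads * v_volts ** 2


# Hypothetical values chosen only for illustration (not from the paper).
R = 1.0          # ohms, lumped path resistance
C = 100e-12      # farads, lumped clock capacitance
VDD = 2.5        # volts
T_RAMP = 2.5e-9  # seconds, half of a 200 MHz period

abrupt = step_charging_loss(C, VDD)
gradual = ramp_charging_loss(R, C, VDD, T_RAMP)
print(f"step loss: {abrupt:.3e} J, ramp loss: {gradual:.3e} J")
```

With these assumed values the ramp time is 25 times RC, so the approximation is within its range of validity and the ramp loss is a small fraction of the step loss.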
The proposed energy recovery clocking scheme recycles the energy from this capacitance in each cycle of the clock. For efficient clock generation, we use a sinusoidal clock signal. The rest of the system is implemented using standard circuit styles with a constant supply voltage. For this technique to work effectively, however, energy recovery flip-flops that can operate with a sinusoidal clock are needed. A pass-gate energy recovery flip-flop that works with a four-phase sinusoidal clock has been proposed in [3]. Its main disadvantage is that its delay takes a major fraction of the total cycle time, which significantly reduces the time allowed for combinational logic evaluation. In addition, it requires four phases of the clock, adding considerable overhead to clock generation and routing. In this paper, we propose four high-performance and low-power energy recovery flip-flops that operate with a single-phase sinusoidal clock. The proposed flip-flops exhibit a significant reduction in delay, power, and area as compared to the conventional four-phase pass-gate energy recovery flip-flop. We integrated 1024 energy recovery flip-flops distributed across an area of 4mm × 4mm and clocked through an H-tree clocking network. A resonant clock-generator circuit was designed to generate a sinusoidal clock and drive the clock network and the flip-flops. For comparison, we implemented the same clock-tree using square-wave clocked flip-flops. In this case, the clock network is buffered by a chain of progressively sized inverters.

The remainder of this paper is organized as follows. In Section 2, the conventional four-phase pass-gate energy recovery flip-flop is reviewed and the proposed energy recovery flip-flops are described. In Section 3, extensive simulation results of individual flip-flops and their comparisons are presented. Section 4 covers system integration, clock generation, and the clock-tree implementation.
Finally, the conclusion of the paper appears in Section 5.

2. ENERGY RECOVERY FLIP-FLOPS
In this section, our proposed flip-flops, as well as the conventional energy recovery flip-flop, are presented and their operations are discussed. Figure 1 shows the schematic of a conventional Four-Phase Transmission-Gate (FPTG) energy recovery flip-flop [3]. FPTG is similar to the conventional Transmission-Gate Flip-Flop (TGFF) [4] except that it uses 4-transistor pass-gates designed to conduct during a short fraction of the clock period. The energy recovery clock is a four-phase sinusoidal clock (CLK0, CLK1, CLK2, and CLK3) as shown in Figure 2. FPTG is a master-slave flip-flop with the master controlled by CLK0 and CLK2 and the slave controlled by CLK1 and CLK3. The main disadvantage of this flip-flop is its long delay. As shown in Figure 2, the delay from D to Q (t_D-Q) takes roughly half the effective clock period (T_eff). In addition, the transistors required for the pass-gates are large, resulting in a large flip-flop area.

Another approach for energy recovery flip-flops is to locally generate square-wave clocks from a sinusoidal clock [3]. This technique has the advantage that existing square-wave flip-flops can be used with the energy recovery clock. However, extra energy is required to generate and possibly buffer the local square waves. Moreover, energy is not recovered from the gate capacitances associated with the clock inputs of the flip-flops.

Recovering energy from internal nodes of flip-flops in a quasi-adiabatic fashion would also be desirable. However, the storage elements of flip-flops cannot be energy recovering because we assume that they drive standard (non-adiabatic) logic. Due to the slow rising and falling transitions of energy recovery signals, applying energy recovery techniques to internal nodes driving the storage elements can result in considerable short-circuit power within the storage element. Taking these factors into consideration, we developed flip-flops that enable energy recovery from their clock input capacitance, while internal nodes and storage elements are powered by a regular (constant) supply. Employing our flip-flops in system designs enables energy recovery from clock distribution networks and from the clock input capacitances of flip-flops.
The first proposed energy recovery flip-flop, the Sense Amplifier Energy Recovery (SAER) flip-flop, is shown in Figure 3. This flip-flop, which is based on the sense amplifier flip-flop proposed in [4], is a dynamic flip-flop with precharge and evaluate phases of operation. In [5], this flip-flop is used to operate with a low-voltage-swing clock. We use it to operate with an energy recovery clock. When the clock voltage exceeds the threshold voltage of the clock transistor (MN1), evaluation occurs. At the onset of evaluation, the difference between the differential data inputs (D and DB) is amplified, and either SET or RESET switches low and is captured by the set/reset latch. The SET and RESET nodes are precharged high when the clock voltage falls below Vdd - Vtp, where Vtp is the threshold voltage of the precharging transistors (MP1 and MP2). Since the energy recovery clock has slow rising and falling transitions, the evaluation and precharge phases can overlap, which results in short-circuit current. To reduce this short-circuit current, the threshold voltages of the precharging transistors can be increased. In scaled dual-threshold-voltage (dual-Vt) CMOS technologies, high-Vt devices can be used for the precharging transistors. Since our 0.25µm process (with a 2.5V supply voltage) does not provide dual Vt, we emulated a high-Vt PMOS transistor by adding a constant voltage source of 0.386V in series with its gate terminal. This resulted in a 70% increase in the threshold voltage of the precharging transistors, which led to a 17% power reduction. Figure 4 shows typical simulated waveforms of this flip-flop designed in a 0.25µm CMOS technology. Although the SAER flip-flop is fast and uses fairly low power at high data switching activities, its main drawback is that either the SET or the RESET node is charged and discharged every cycle, regardless of the data activity.
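The overlap between evaluation and precharge can be estimated with a small calculation. The sketch below assumes an ideal sinusoidal clock swinging between 0 and Vdd and computes the fraction of each period during which the clock is both above the NMOS threshold (evaluation path on) and below Vdd - |Vtp| (precharge transistors on). The nominal threshold values of 0.5V and 0.55V are assumptions made for illustration and are not stated in the paper; only the 2.5V supply and the 0.386V shift come from the text.

```python
import math


def overlap_fraction(vdd, vtn, vtp_abs):
    """Fraction of one clock period with Vtn < v_clk < Vdd - |Vtp|,
    for v_clk(t) = Vdd/2 * (1 + sin(2*pi*f*t))."""
    low, high = vtn, vdd - vtp_abs
    if high <= low:
        return 0.0

    def to_sine(v):
        # Map a voltage level to the sine value that produces it, clamped.
        return max(-1.0, min(1.0, 2.0 * v / vdd - 1.0))

    # The clock crosses the window once while rising and once while falling.
    angle_per_crossing = math.asin(to_sine(high)) - math.asin(to_sine(low))
    return 2.0 * angle_per_crossing / (2.0 * math.pi)


VDD = 2.5            # volts, from the paper
VTN = 0.5            # volts, assumed NMOS threshold
VTP_NOMINAL = 0.55   # volts, assumed PMOS threshold magnitude
VTP_RAISED = VTP_NOMINAL + 0.386  # threshold shift used in the paper

print(overlap_fraction(VDD, VTN, VTP_NOMINAL))
print(overlap_fraction(VDD, VTN, VTP_RAISED))
```

Under these assumptions the raised threshold narrows the overlap window, which is consistent with the reported reduction in short-circuit power, although the sketch says nothing about the magnitude of the current itself.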
This leads to substantial power consumption at low data switching activities, where the data does not change frequently. We consider two approaches to address this problem. One approach is to use a static flip-flop, and the other is to employ conditional capturing [6].

Fig. 1: Conventional Four-Phase Transmission-Gate (FPTG) energy recovery flip-flop [3]
Fig. 2: Typical waveforms of FPTG flip-flop
Fig. 3: Sense Amplifier Energy Recovery (SAER) flip-flop
Fig. 4: Typical simulated waveforms of SAER flip-flop

Figure 5 shows the Static Differential Energy Recovery (SDER) flip-flop. This flip-flop is a static pulsed flip-flop similar to the Dual-rail Static Edge-Triggered Latch (DSETL) [7]. The energy recovery clock is applied to a minimum-sized inverter skewed for fast high-to-low transition. The clock signal and the inverter output (CLKB) are applied to transistors MN1 and MN2 (MN3 and MN4). The series combination of these transistors conducts for a short period of time during the rising transition of the clock, when both the CLK and CLKB signals have voltages above the threshold voltages of the NMOS transistors. Since the clock inverter is skewed for fast high-to-low transitions, the conducting period occurs only during the rising transition of the clock and not on the falling transition. In this way, an implicit conducting pulse is generated during each rising transition of the clock. A cascade of three inverters instead of one can give a slightly sharper falling edge for the inverted clock (CLKB). However, due to the slow rising nature of the energy recovery clock, a single inverter generates enough delay. Figure 6 shows typical simulated waveforms of the SDER flip-flop. In this flip-flop, when the state of the input data is the same as its state in the previous conduction phase, there are no internal transitions. Therefore, power consumption is minimized for low data switching activities.

The second approach for minimizing flip-flop power at low data switching activities is to use conditional capturing to eliminate redundant internal transitions. Figure 7 shows the Differential Conditional-Capturing Energy Recovery (DCCER) flip-flop. Similar to a dynamic flip-flop, the DCCER flip-flop operates in a precharge and evaluate fashion. However, instead of using the clock for precharging, small pull-up PMOS transistors (MP1 and MP2) are used for charging the precharge nodes (SET and RESET). The DCCER flip-flop uses a NAND-based set/reset latch for the storage mechanism. Conditional capturing is implemented by using feedback from the output to control transistors MN3 and MN4 in the evaluation paths. Therefore, if the state of the input data is the same as that of the output, SET and RESET are not discharged. Figure 8 shows typical simulated waveforms of the DCCER flip-flop.
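The effect of conditional capturing on internal activity can be illustrated with a purely behavioral model. The sketch below counts, for a stream of sampled data bits, how many clock cycles cause an internal node to discharge: every cycle for an unconditional precharge-and-evaluate flip-flop such as SAER, and only the cycles where the input differs from the stored output for a conditional-capturing flip-flop. The function name and bit stream are hypothetical, and the model ignores all electrical detail.

```python
def count_internal_discharges(data_bits, conditional, initial_q=0):
    """Count cycles in which a precharged internal node is discharged.

    conditional=False models a flip-flop that evaluates every cycle.
    conditional=True models conditional capturing: evaluation is blocked
    when the sampled input already equals the stored output.
    """
    q = initial_q
    discharges = 0
    for d in data_bits:
        if not conditional or d != q:
            discharges += 1
        q = d  # the flip-flop output follows the sampled data either way
    return discharges


# Hypothetical low-activity data stream: two transitions in eight cycles.
stream = [0, 0, 0, 1, 1, 1, 1, 0]
print(count_internal_discharges(stream, conditional=False))
print(count_internal_discharges(stream, conditional=True))
```

In this model the unconditional count equals the number of cycles, while the conditional count equals the number of data transitions, which is why the benefit shrinks as the switching activity rises.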
As can be seen in Figure 8, CLK is generally less than Vdd/2 during a significant part of the conducting window. Therefore, a fairly large transistor is used for MN1. Moreover, since there are four stacked transistors in the evaluation path, significant charge sharing may occur when three of them turn ON simultaneously. Having properly sized pull-up PMOS transistors (MP1 and MP2) instead of clock-controlled precharge transistors ensures a constant path to Vdd, which helps to reduce the effect of charge sharing. Another property of the circuit that helps reduce charge sharing is that the clock transistor (MN1), which is the largest transistor in the evaluation path, is placed at the bottom of the stack. Therefore, the diffusion capacitance of the source terminal of MN1 is grounded and does not contribute to the charge sharing.

Fig. 5: Static Differential Energy Recovery (SDER) flip-flop
Fig. 6: Typical simulated waveforms of SDER flip-flop
Fig. 7: Differential Conditional-Capturing Energy Recovery (DCCER) flip-flop
Fig. 8: Typical waveforms of DCCER flip-flop
Fig. 9: Single-ended Conditional Capturing Energy Recovery (SCCER) flip-flop
Fig. 10: Typical waveforms of SCCER flip-flop

Figure 9 shows a Single-ended Conditional Capturing Energy Recovery (SCCER) flip-flop. SCCER is a single-ended version of the DCCER flip-flop. The transistor MN3, controlled by the output, provides conditional capturing. The right-hand-side evaluation path is static and does not require conditional capturing. Placing MN3 above MN4 in the stack reduces charge sharing, because when charge sharing occurs the capacitance associated with MN3 is already charged and therefore does not contribute to it. Typical simulated waveforms of the SCCER flip-flop are shown in Figure 10.

3. SIMULATION RESULTS AND COMPARISONS
All the flip-flops were designed and laid out using TSMC 0.25µm process technology with a supply voltage of 2.5V. Netlists with parasitic capacitances were extracted from the layouts and simulated using HSPICE. The designs were optimized at a temperature of 25 C for a clock frequency of 200MHz. However, since the FPTG flip-flop is a dual-edge triggered flip-flop, it was designed to operate at a clock frequency of 100MHz. A load capacitance of 30fF was used for all outputs. Figure 11 illustrates our timing definitions for energy recovery flip-flops. Delay is measured between the 50% points of signal transitions. Setup time is the time from when data becomes stable to the rising transition of the clock. Hold time is the time from the rising transition of the clock to the earliest time that data may change after being sampled. Setup and hold times are measured with reference to the 50% point of the rising transition of the clock. The proposed flip-flops are compared with the FPTG flip-flop. For individual flip-flop simulations, an ideal sinusoidal clock was used. Figure 12(a) shows clock-to-output (CLK-Q) delay and data-to-output (D-Q) delay vs. setup time for all the flip-flops.
It is apparent that the delays of the FPTG flip-flop are much larger than those of the proposed flip-flops. Figure 12(b) gives a clearer view of the behavior of the proposed flip-flops in the minimum delay region. For any flip-flop, there is an optimum setup time that results in a minimum D-Q delay. This optimum setup time is near the minimum functional setup time and is used for comparisons of setup time. As shown in Figure 12, the CLK-Q delay becomes independent of setup time for long setup times. We use this value of CLK-Q delay for comparisons of CLK-Q delay. The SCCER flip-flop exhibits the smallest minimum D-Q delay, while the SAER flip-flop shows the smallest CLK-Q delay. The SDER flip-flop has the shortest setup time among the proposed flip-flops.

Fig. 12: Setup time vs. delay for (a) all flip-flops (b) proposed flip-flops (axes: Delay (ps) vs. Setup Time (ps))

Figure 13 shows the dependence of the CLK-Q and D-Q delays on clock frequency. The flip-flops are simulated from a frequency of 50MHz to their maximum frequency of operation. The flip-flops were not re-optimized for each frequency. Although all the proposed flip-flops fail at frequencies above 400MHz, they can easily be re-sized to operate at higher frequencies. The proposed flip-flops show a wider range of operating frequency, and their delays are much less dependent on the clock frequency as compared to the FPTG flip-flop.

Fig. 13: Delay vs. frequency for all flip-flops (axes: Delay (ps) vs. Frequency (MHz))

Figure 14 shows power as a function of data switching activity for the different flip-flops. The SAER flip-flop has the lowest power consumption at high switching activities; however, it has the highest power at low switching activities. The SDER and conditional capturing (DCCER and SCCER) flip-flops show lower power consumption at low switching activities but consume more power than the SAER flip-flop at high switching activities. This is because at high switching activities there is much less opportunity for energy savings from a static flip-flop or from conditional capturing.

Figure 11: Sample waveforms illustrating timing definitions
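The optimum setup time used for the comparisons in this section can be extracted from a setup-time sweep as sketched below. The sketch relies on the timing definitions above, under which the D-Q delay at a given setup time is the setup time plus the CLK-Q delay measured at that setup time. The sweep values are hypothetical and are not read from Figure 12.

```python
def optimum_setup_point(sweep):
    """Return (setup_time_ps, d_to_q_ps) that minimizes the D-Q delay.

    sweep: iterable of (setup_time_ps, clk_to_q_ps) pairs from simulation.
    The D-Q delay is taken as setup time plus CLK-Q delay at that setup time.
    """
    return min(
        ((setup, setup + clk_q) for setup, clk_q in sweep),
        key=lambda point: point[1],
    )


# Hypothetical sweep: CLK-Q is flat for long setup times and rises sharply
# as the data edge approaches the clock edge.
hypothetical_sweep = [
    (200, 80),
    (150, 82),
    (100, 90),
    (50, 150),
    (20, 260),
]

best_setup, min_d_to_q = optimum_setup_point(hypothetical_sweep)
print(best_setup, min_d_to_q)
```

The flat region of the sweep corresponds to the long-setup CLK-Q delay used for the CLK-Q comparisons, while the minimum of the summed curve gives the minimum D-Q delay.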

Table 1: Summary of numerical results of flip-flops at 50% data switching activity with a 200MHz sinusoidal clock

Flip-flop  Min D-Q     Setup      Hold       CLK-Q       Power*  PDP*    Norm.   Area    Transistor
           delay (ps)  time (ps)  time (ps)  delay (ps)  (µW)    (fJ)    PDP     (µm²)   count
FPTG†      2491        -135       74         2619.7      156.4   389.59  1       580.8   16
SDER       391.6       35         27         268.2       120.8   47.305  0.1214  133.9   14
SAER       434.8       355.3      -19        79.5        106.4   46.293  0.1188  139.2   18
DCCER      373.1       115.7      15         174.3       101.7   37.944  0.0974  136.9   18
SCCER      285.9       125        0          123.3       83.88   23.981  0.0616  196.8   17

* Power is for long setup time; the Power-Delay Product (PDP) is the product of this power and the minimum D-Q delay.
† As measured from the first phase clock (CLK0).

Fig. 14: Power vs. data switching activity at 200MHz (axes: Power (µW) vs. Data Switching Activity (%))

Table 1 summarizes the numerical results for the flip-flops. The proposed flip-flops exhibit more than 80% delay reduction, a power reduction of up to 46%, and an area reduction of up to 77%, as compared to the FPTG flip-flop.

4. ENERGY RECOVERY CLOCKING
In order to demonstrate the feasibility of energy recovery clocking, we integrated 1024 energy recovery flip-flops distributed across an area of 4mm × 4mm and clocked them by a single-phase sinusoidal clock through an H-tree clocking network. The flip-flops were grouped into registers of 32 flip-flops, and the registers were evenly spaced in this area. A common data input was used for all flip-flops to easily control the data switching activity of the system. The clock was distributed using an H-tree network on the metal-5 layer, which has the smallest parasitic capacitance to the substrate. The width of the clock-tree interconnects was selected to be the maximum (35µm in our 0.25µm process) to minimize parasitic resistances. A lumped π-type RC model for each interconnect of the clock-tree was extracted, and these models were then connected together to make a distributed RC model of the clock-tree, as shown in Figure 15.
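A lumped π-type model of one interconnect places the wire resistance in series and splits the wire capacitance equally between the two ends. The sketch below builds such segments and sums the capacitance seen by the clock source, including the clock input capacitance of the flip-flops. The per-unit-length parasitics, segment lengths, and per-flip-flop capacitance are placeholder assumptions, not values extracted from the layout described in the paper.

```python
def pi_segment(r_per_um, c_per_um, length_um):
    """Lumped pi model of a wire: (C/2 at near end, series R, C/2 at far end)."""
    r_total = r_per_um * length_um
    c_total = c_per_um * length_um
    return (c_total / 2.0, r_total, c_total / 2.0)


def total_clock_capacitance(segments, n_flipflops, c_clk_per_ff):
    """Sum wire capacitance of all segments plus flip-flop clock input load."""
    wire_cap = sum(c_near + c_far for c_near, _, c_far in segments)
    return wire_cap + n_flipflops * c_clk_per_ff


# Placeholder parasitics (assumed, for illustration only).
R_PER_UM = 0.001      # ohms per micrometer
C_PER_UM = 0.2e-15    # farads per micrometer
C_CLK_PER_FF = 5e-15  # farads of clock input capacitance per flip-flop

# A toy tree: one 2000 um trunk and two 1000 um branches.
tree = [pi_segment(R_PER_UM, C_PER_UM, length) for length in (2000, 1000, 1000)]
print(total_clock_capacitance(tree, n_flipflops=1024, c_clk_per_ff=C_CLK_PER_FF))
```

Splitting the capacitance between both ends keeps the total wire capacitance unchanged when segments are chained, which matters because this total enters the resonant frequency of the clock generator.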
The energy recovery clock generator drives the source node of the clock-tree shown in Figure 15, and each final node of the clock-tree (1 to 16) is connected to two 32-bit registers. The energy recovery clock generator is a single-phase resonant clock generator, as shown in Figure 16(a). Transistor M1 receives a reference pulse to pull the clock signal down to ground when the clock reaches its minimum, thereby maintaining the oscillation of the resonant circuit. This transistor is fairly large and is therefore driven by a chain of progressively sized inverters. The natural oscillation frequency of this resonant clock driver is determined by:

$$f = \frac{1}{2\pi\sqrt{LC}} \tag{1}$$

where C is the total capacitance connected to the clock-tree, including the parasitic capacitances of the clock-tree and the gate capacitances associated with the clock inputs of all flip-flops. For the clock generator to be efficient, it is important that the frequency of the REF signal be the same as the natural oscillation frequency of the resonant circuit. To find the value of C, the whole system, including the flip-flops, is first simulated with a given L and with the REF signal at zero. The clock signal shows a decaying oscillating waveform settling down to Vdd/2. The natural oscillation frequency is measured from this waveform, and the value of C is then calculated using Equation (1). This experiment is carried out for the system with each proposed flip-flop to determine the value of C. With C known, the value of L for a frequency of 200MHz can again be determined from Equation (1). The system consisting of the energy recovery clock generator, clock-tree, and flip-flops was simulated at a frequency of 200MHz (for all the proposed energy recovery flip-flops) with different data switching activities. Figure 17 shows a typical waveform of the generated energy recovery clock.
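The two-step use of Equation (1) described above can be sketched as follows: first solve for C from the ringing frequency observed with a trial inductor, then solve for the L that places the resonance at the target clock frequency. The trial inductance and the measured ringing frequency below are hypothetical stand-ins for simulation results, and the small frequency shift caused by damping is ignored, as it is in Equation (1).

```python
import math


def capacitance_from_ringing(l_henry, f_natural_hz):
    """Solve Equation (1) for C given a trial L and the measured ringing frequency."""
    return 1.0 / (l_henry * (2.0 * math.pi * f_natural_hz) ** 2)


def inductance_for_frequency(c_farads, f_target_hz):
    """Solve Equation (1) for L given the total C and the target clock frequency."""
    return 1.0 / (c_farads * (2.0 * math.pi * f_target_hz) ** 2)


def resonant_frequency(l_henry, c_farads):
    """Equation (1): natural oscillation frequency of the LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farads))


# Hypothetical inputs standing in for a simulation run with REF held at zero.
L_TRIAL = 10e-9        # henries, trial inductor
F_MEASURED = 150e6     # hertz, ringing frequency read from the waveform
F_TARGET = 200e6       # hertz, target clock frequency from the paper

c_total = capacitance_from_ringing(L_TRIAL, F_MEASURED)
l_final = inductance_for_frequency(c_total, F_TARGET)
print(f"C = {c_total:.3e} F, L = {l_final:.3e} H")
```

Because C depends on which flip-flop is attached to the tree, the calculation would be repeated for each flip-flop type, giving a different L per system.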
In order to compare with square-wave clocking, three flip-flops that operate with a square-wave clock were also designed. These flip-flops are the Hybrid Latch Flip-Flop (HLFF) [8] and the Conditional Capturing Flip-Flop (CCFF) [6], which are high-speed flip-flops, and the Transmission-Gate Flip-Flop (TGFF) [4], which is a low-power flip-flop. For square-wave clocking, the clock-tree is driven by a chain of progressively sized inverters as shown in Figure 16(b). The whole system of clock buffers, clock-tree, and flip-flops was simulated at a frequency of 200MHz with different data switching activities.

Fig. 15: Distributed RC model of clock-tree
Fig. 16: (a) Resonant energy recovery clock generator (b) Non-energy recovery clock driver
Fig. 17: Typical waveform of generated energy recovery clock signal
Fig. 18: Total power vs. switching activity at 200MHz (axes: Total power consumption (mW) vs. Data switching activity; curves: HLFF, CCFF, TGFF, SAER, SDER, DCCER, SCCER)
Fig. 19: Power breakdown at 200MHz for 0%, 25%, and 50% activity (bars: flip-flop power, clock tree power, generator power, in mW; systems: HLFF, CCFF, TGFF, SAER, SDER, DCCER, SCCER)

Figure 18 shows the results of this experiment. The system power is plotted vs. data switching activity for the systems with different flip-flops. Among the square-wave flip-flops, the TGFF system shows the lowest power consumption for all switching activities. The HLFF system has the highest power consumption at low switching activities, and the CCFF system shows the highest power consumption at high switching activities. Among the energy recovery flip-flops, the systems with conditional capturing flip-flops (DCCER and SCCER) exhibit the lowest power consumption at low switching activities (below 66%). For high switching activities, the system with SAER flip-flops has the minimum power consumption.
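The chain of progressively sized inverters used in the square-wave clock driver of Figure 16(b) is commonly sized with a fixed taper between stages. The sketch below picks the number of stages so that the per-stage taper is close to a target of 4, a widely used rule of thumb that is an assumption here and not a value reported in the paper. The capacitances are likewise hypothetical.

```python
import math


def tapered_chain(c_in, c_load, target_taper=4.0):
    """Return (relative_sizes, taper) for an inverter chain driving c_load.

    c_in is the input capacitance of the first (unit) inverter. Each stage is
    'taper' times larger than the previous one, and the last stage sees the
    same fan-out when driving c_load.
    """
    ratio = c_load / c_in
    stages = max(1, round(math.log(ratio) / math.log(target_taper)))
    taper = ratio ** (1.0 / stages)
    sizes = [taper ** i for i in range(stages)]
    return sizes, taper


# Hypothetical capacitances: unit inverter input vs. total clock load.
sizes, taper = tapered_chain(c_in=1.0, c_load=256.0)
print(sizes, taper)
```

Every stage in such a chain dissipates switching energy on each clock edge, which is part of the buffering overhead that the resonant generator avoids.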
The energy recovery systems show lower power consumption at all switching activities as compared to square-wave clocking, except for the energy recovery system with SDER flip-flops at switching activities above 66%. These results are similar to the comparisons of individual flip-flops shown in Figure 14. Figure 19 shows the power breakdown of the systems with different flip-flops at different switching activities. The energy recovery clocking scheme reduces the power due to clock distribution (clock-tree) by more than 90% compared to non-energy recovery (square-wave) clocking. The generator power overhead in the energy recovery scheme is very small, which indicates that the clock generator is very efficient. As compared to the HLFF system, the SCCER system shows power savings of 83%, 65%, and 49% at data switching activities of 0%, 25%, and 50%, respectively. When compared to the TGFF system (the lowest-power square-wave system), the SCCER system shows power savings of 75%, 50%, and 31% at data switching activities of 0%, 25%, and 50%, respectively.

5. CONCLUSIONS
We proposed four novel energy recovery flip-flops that enable energy recovery from the clock network, resulting in significant total energy savings compared to square-wave clocking. The proposed flip-flops operate with a single-phase sinusoidal clock, which can be generated with high efficiency. We implemented 1024 proposed energy recovery flip-flops through an H-tree clock network driven by a resonant clock generator that generates a sinusoidal clock. Results show a power reduction of 90% on the clock-tree and total power savings of up to 83% as compared to the same implementation using the conventional square-wave clocking scheme and flip-flops. The results demonstrate the feasibility and effectiveness of the energy recovery clocking scheme in reducing total power consumption.

6. REFERENCES
[1] S. D. Naffziger and G. Hammond, The implementation of the next generation 64b Itanium(TM) microprocessor, IEEE International Solid-State Circuits Conference, pp. 344-472, 2002.
[2] W. C. Athas, et al., Low-power digital systems based on adiabatic-switching principles, IEEE Trans. VLSI Systems, vol. 2, no. 4, pp. 398-406, Dec. 1994.
[3] B. Voss and M. Glesner, A low power sinusoidal clock, IEEE International Symposium on Circuits and Systems, pp. 108-111, May 2001.
[4] B. Nikolic, et al., Improved sense-amplifier-based flip-flop: design and measurements, IEEE Journal of Solid-State Circuits, vol. 35, pp. 876-884, Jun. 2000.
[5] H. Kawaguchi and T. Sakurai, A reduced clock-swing flip-flop (RCSFF) for 63% power reduction, IEEE Journal of Solid-State Circuits, vol. 33, pp. 807-811, May 1998.
[6] B. S. Kong, et al., Conditional-capture flip-flop for statistical power reduction, IEEE Journal of Solid-State Circuits, vol. 36, pp. 1263-1271, Aug. 2001.
[7] L. Ding, et al., A dual-rail static edge-triggered latch, IEEE International Symposium on Circuits and Systems, pp. 645-648, May 2001.
[8] H. Partovi, et al., Flow-through latch and edge-triggered flip-flop hybrid elements, IEEE International Solid-State Circuits Conference, pp. 138-139, Feb. 1996.