Energy Recovering ASIC Design


Conrad H. Ziesler, Joohee Kim, Marios C. Papaefthymiou
Advanced Computer Architecture Laboratory
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor, MI, USA
{cziesler, jooheek, marios}@eecs.umich.edu

Abstract

Dissipation in the clock tree and state elements of ASIC designs is often a significant fraction of total energy consumption. We propose a methodology for recovering most of this energy by using a novel energy recovering flip-flop and a novel single-phase resonant clock generator. Because our state element has near-zero energy consumption when the input data is not switching, it provides the savings of clock gating approaches without the additional complexity of implementing clock gating in the design. To complement this near-zero idle energy property of the flip-flop, our resonant clock generator can decide, on a per-cycle basis, whether or not the resonant clock needs to be replenished on the next cycle, thus automatically reducing energy consumption when most of the state elements are idling. ASICs designed with our methodology can achieve sub-CV² dissipations on the clock network at frequencies of 200-500MHz and operating voltages of 1.0-1.5V in a 0.25 μm process. To evaluate our methodology, we simulated a dual-mode (conventional and energy recovering) ASIC module to directly compare energy savings between the energy recovering and conventional clocking schemes. Our simulations demonstrate savings of over a factor of 4 for the energy recovering mode versus the conventional mode for low switching activities.

I. INTRODUCTION

A popular approach to low-energy, high-throughput VLSI system design is voltage-scaled static CMOS with aggressive pipelining. This approach is often combined with clock gating to reduce the dissipation of idle flip-flops and branches of the clock tree. In these systems, due to the large number of flip-flops and the loading of the clock tree, the dissipation of the clock tree and state elements (flip-flops) can often be a substantial fraction of total system dissipation.

We propose a novel design methodology for energy recovery utilizing our new PMOS energy recovering flip-flop (pterf) and a novel single-phase resonant clock generator. Our single-phase approach has several advantages over other energy recovering methodologies, including simple clock generation and distribution, no need for phase-balancing, skew tolerance, single inductor tuning, and low transistor count. Effectively, our energy recovering technique enables a substantial energy reduction with minimal designer effort.

Our new energy recovering flip-flop has several properties that make it well suited for deeply pipelined low-voltage static CMOS systems. First, our flip-flop exhibits near zero energy consumption while idle (i.e., when the data input switching activity is zero). This property eliminates the need for clock-gating logic, yielding savings similar to fine-grained clock gating at every flip-flop in the design. With constant input data, our flip-flop dissipation is a remarkable 4.9fJ/cycle at 500MHz/1.5V, or 1.8fJ/cycle at 200MHz/1.0V in a 0.25 μm process. Second, the energy consumption of our flip-flop for unit switching activity is 75fJ/cycle at 200MHz, making it competitive with other low-energy, high-speed flip-flops [1], [2], [3]. Third, our flip-flop is very compact, containing only 14 transistors in its minimal configuration, or 18 transistors with an embedded two-input XOR gate, for example. Furthermore, our flip-flop is capable of operating with several different energy recovering power-clock waveforms. An energy recovering power-clock enables the entire clock tree to operate at sub-CV² dissipation levels when driven by a resonant clock generator.

Fig. 1. Example system configuration and clock waveform for (a) energy recovery, and (b) no energy recovery.
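As a rough illustration of why a sinusoidal power-clock can dissipate less than a conventionally switched clock, consider a first-order model that is not taken from this paper: a clock load capacitance $C$ driven through an assumed lumped series resistance $R$, swinging between 0 and $V$ with period $T$. The symbols $R$, $C$, $V$, and $T$ are illustrative, and the sketch assumes $T \gg RC$ so that the capacitor voltage follows the driving waveform closely.

```latex
% Conventional rail-to-rail switching: each charge/discharge cycle
% dissipates the full stored energy twice over one half each.
E_{\mathrm{conv}} = C V^{2}

% Sinusoidal power-clock (assumed T >> RC):
v(t) = \frac{V}{2}\left(1 - \cos\frac{2\pi t}{T}\right),
\qquad
i(t) = C\,\frac{dv}{dt} = \frac{\pi C V}{T}\,\sin\frac{2\pi t}{T}

% Energy dissipated in R over one full cycle:
E_{\mathrm{sin}} = \int_{0}^{T} i^{2}(t)\,R\,dt
                 = \frac{\pi^{2} C^{2} V^{2} R}{T^{2}}\cdot\frac{T}{2}
                 = \frac{\pi^{2}}{2}\,\frac{RC}{T}\,C V^{2}

% Ratio to the conventional case:
\frac{E_{\mathrm{sin}}}{E_{\mathrm{conv}}} = \frac{\pi^{2}}{2}\,\frac{RC}{T}
```

Under these assumptions the sinusoidal case falls below $CV^2$ per cycle once $T$ exceeds roughly $4.9\,RC$, and the dissipation keeps shrinking as the period grows relative to $RC$. Losses in the generator itself are not captured by this sketch.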
The substantial reduction in dissipation achieved by our flip-flop is due to energy recovery. The basic premise of energy recovery is to recycle the charge stored in circuit capacitance using a power-clock signal in conjunction with an inductor or stepwise capacitor driver. Figure 1 contrasts a typical energy recovering synchronous system with a non-energy recovering one. Several energy recovering clock generator options are available for our flip-flop, such as blip [4], [5] and sinusoid [6]. More advanced energy recovering clock driving techniques, such as harmonic rail drivers, are discussed in [7]. Other recent advances in the energy recovery literature are the clock-powered AC-x series microprocessors [8] and the source coupled adiabatic logic family [9].

Our single-phase resonant clock generator has several features that make it well suited to drive systems of our pterf flip-flops. First, the power topology of our clock generator enables the large resonant currents to bypass the main power switch, allowing the switch to be much smaller than in topologies where the entire current is conducted by the switch. Second, the gate drive of the main switch uses an efficient dynamic circuit that optimizes the turn-on and turn-off rates to minimize dissipation. Third, a compact and fast control circuit enables the clock generator to decide, on a cycle by cycle basis, whether or not to replenish the resonating power-clock energy. This capability allows the clock generator to maintain a stable power-clock amplitude while consuming as little energy as possible.

To evaluate our energy recovering design methodology, we re-implemented an existing conventional ASIC design using our energy recovering flip-flop and resonant clock generator. To enable direct comparisons and to simplify testing, we combined a conventional static CMOS flip-flop with our pterf flip-flop and a multiplexer to select between conventional and energy recovering flip-flops. A conventional buffered clock tree was used for the static CMOS flip-flops, while a wide metal distribution network driven by the resonant power-clock generator was used for the pterf flip-flops. Our simulation results of this voltage-scaled system indicate greater than 4 fold savings for low switching activity (when the state elements dominate dissipation) and a 2% savings for high switching activity (when the combinational logic dominates dissipation).

The remainder of this paper is organized as follows: Section II describes the structure, operation, timing and energy characteristics of our energy recovering flip-flop. Section III describes the structure and operation of our resonant clock generator. Section IV explains our test ASIC implementation and gives our simulation results. We describe ongoing work in Section V.

II. PTERF FLIP-FLOP

This section describes our novel single-phase, energy recovering flip-flop. Key properties of our flip-flop include near-zero dissipation when the input data is held constant, low overall dissipation when the input is changing, low-voltage operation, compact layout, and a D-Q delay which is inversely proportional to frequency.

A. Structure

The energy recovering flip-flop we describe in this paper consists of an energy recovering dynamic buffer driving a pair of cross-coupled NOR gates as the static latch element. Figure 2 shows the schematic of the PMOS version, which latches on rising pulses of the power-clock node. The power-clock node supplies both power and timing information to the circuit, in contrast to conventional clock nodes, which supply only timing information. Note that correct operation depends on the ratioing of the pull-down NMOS and the pull-up cross coupled PMOS.

Fig. 2. Schematic of PMOS energy recovering flip-flop (pterf).

Fig. 3. Schematic of NMOS energy recovering flip-flop (nterf).

Fig. 4. Schematic of PMOS energy recovering flip-flop with embedded AND gate (pterfand).

Alternatively, we can derive the complement version using NMOS in place of PMOS devices, as seen in Figure 3. The resulting circuit latches on falling pulses of the power-clock, using cross coupled NAND gates as the state element. Another interesting option is to replace the two pull-down NFETs in pterf with a pull-down tree that computes some given logic function. Figure 4 shows a flip-flop with an embedded dual-rail AND gate. A low overhead reset may be obtained by adding an extra reset transistor.

B. Operation

As shown in Figure 5, the operation of pterf begins with the data input D changing at a suitable time before the rising edge of the power-clock. The two inverters buffer D and derive the complemented input, which is applied to the gates of the NMOS pull-down transistors. When the rising edge of the power-clock arrives, the cross coupled PMOS devices sense and latch the appropriate value of D onto the nodes X and Y. Since the cross coupled NOR gates form a simple set/reset latch, positive pulses on either X or Y will cause the latch to either set or reset, respectively. When D is not changing, either X or Y will remain low, with the other node oscillating in phase with the power-clock in an energy recovering manner, that is, transferring charge to and from the power-clock signal. This charge recycling probe operation is the key to the ultra low energy consumption at zero input switching activity. Changes in D that occur while the power-clock is low have no effect on the output,

since the transitions on X or Y are monotonically decreasing. Changes in D that occur while the power-clock is high have no effect on the output because of the ratioing between the PMOS and NMOS. This case is rare, however, and could only occur at low frequencies with a very short combinational logic path.

Fig. 5. Operational waveforms for pterf (voltage in V versus time in ns; upper panel shows Pclk, lower panel shows internal signals).

C. Timing and Energy

Figure 6 shows the timing definitions used for our timing analysis. The pterf flip-flop samples its input on the rising edge of the power-clock, so we define the midpoint crossing of the rising edge of the power-clock as the reference time and define the usual timing variables as indicated in Table I. In addition, we define two average energy numbers, the active and idle energy consumption, which refer to an input switching activity of 1 and 0 respectively.

TABLE I: TIMING AND ENERGY VARIABLES

(two energy variables)  Energy for each of two pairs of transition types
Ts                      Time before the reference edge at which D must be valid
Th                      Time after the reference edge at which D may change
Tq                      Time after the reference edge at which Q becomes valid
Tcycle                  Cycle time

Fig. 6. Timing diagram of pterf using sinusoidal clock.

Figure 7 shows the test circuit we used to measure the timing and energy values for pterf. The inputs and outputs are buffered and loaded with typical inverters. We measure the timing values as the midpoint crossings of the indicated signals. There are many different ways to account for the energy consumption of a flip-flop: considering dissipation of the input and output loads, the internal dissipation, whether or not to include complementary outputs, or assuming complementary inputs. There have been many debates as to which accounting scheme is most realistic. For simplicity, we measure the energy dissipation of the circuits within the dashed line, minus the energy required to drive the primary input.

Fig. 7. Circuit for timing simulations.

Fig. 8. pterf D-Q delay (Ts+Tq) in ps at 200MHz/1.0V, 333MHz/1.2V, and 500MHz/1.5V.

Figure 8 plots the total flip-flop delay as a function of operating frequency. By total flip-flop delay we mean the setup time plus the clock to output time. At 200MHz and 1.0V, pterf requires 1,28ps, while at 500MHz and 1.5V, it requires 57ps. At 500MHz, 1.8V (not shown on the graph), pterf requires only 46ps. Notice the trend that the flip-flop delay decreases with increasing frequency. This behavior is due to the sinusoidal shape of the energy recovering waveform. At all frequencies, pterf consumes roughly a quarter of the total clock period. Increasing the voltage reduces this fraction slightly. For comparison, one fan-out-of-four delay using 2-input NAND gates in this technology is 21ps at 1.5V and 53ps at 1.0V. So, pterf consumes roughly 2-3 fanouts of four out of the 9-10 available per cycle at these low voltages.

Figure 9 shows the energy consumption of pterf as a function of operating frequency for two different input data conditions, idle (never switching) and active (always switching). The idle energy consumption is near zero at all frequencies, with a dissipation of 1.7fJ at 200MHz and 4.89fJ at 500MHz. The active energy consumption was measured both at the minimum hold time (for the case of cascaded flip-flops), and at a nominal hold time (for the case of logic between flip-flops). For all frequencies, the lower point on the error-bar indicates the energy dissipation at the nominal hold time.

Fig. 9. Active and idle energy consumption of pterf (energy per operation in fJ) as a function of frequency. Active energy consumption varies with hold times.

III. POWER-CLOCK GENERATOR

The generator converts energy from the DC supplies into AC energy using a lumped inductor and an on-chip NMOS switch. The power-clock signal is distributed to all of the flip-flops in the ASIC core. By using a single sinusoidal waveform, the clock distribution problem is drastically simplified. Furthermore, the entire capacitance of the clock distribution wires resonates with the generator inductor, thus eliminating the dissipation in a conventional clock tree.

Fig. 10. Block diagram of power-clock generator.

The generator is composed of a control circuit, the large NMOS power transistor and associated drive circuitry, and a lumped inductor connected to a DC supply which is half that of the logic supply, as shown in Figure 10. Synchronization occurs with an input reference square-wave clock, which is fed into 3 delay lines that generate the appropriate timing signals for the control circuit. The controller compares the peak value of the power-clock with a reference voltage. For each cycle, it decides whether or not the inductor current needs replenishing. The output of the controller is a pulse which is buffered and inverted by two ratioed inverters before connecting to the gates of a PMOS pull-up and an NMOS pull-down. These two devices drive the gate terminal of the main NMOS power switch. The sizes of the transistors in the ratioed inverters are chosen so that the pull-up and pull-down are never on at the same time. In addition, the main switch is turned on slowly, but turned off quickly, thus minimizing dissipation due to the PMOS pull-up capacitance. The main NMOS power switch is turned on at the time when the voltage difference between the power-clock and ground is small, replenishing the current in the inductor from the DC supply.

Fig. 11. Single cycle controller.

The single cycle controller is built around a two-stage clocked comparator circuit connected to a set-reset latch, as shown in Figure 11. A low-to-high transition on d3 causes the difference between the power-clock and the reference voltage to be amplified by the cross coupled inverters. The result of this comparison toggles the set-reset latch. The phase difference between d1 and d2 is used to generate a pulse which is gated by the current state of the set-reset latch and fed to the output. Thus the controller efficiently implements single cycle feedback control.

Figure 12 shows typical operational waveforms of the clock generator. The g signal is the gate drive of the main power switch. Pclk is the power-clock signal. At 100ns, the load connected to the power-clock undergoes a step increase in dissipation. As a result, the clock generator, which was operating under a 2-on, 2-off periodicity, switches behavior to full-on. In this particular case, the power-clock generator circuits are running at 1.0V while the power-clock is being driven to 1.5V. At 1.0V and 333MHz, the power-clock generator has a two cycle control latency rather than the desired single cycle.

Fig. 12. Power-clock generator operation (voltage versus time in ns; traces include vdd and g).

IV. ASIC DESIGN EXAMPLE

To evaluate our new energy recovery ASIC design methodology, we implemented an existing ASIC design using our pterf flip-flop and resonant clock generator. The module implements a multilevel discrete wavelet transform, used as the first stage in a neural signal processing chip. It consists of two pipelined multipliers, several pipelined adders, a FIFO, and some control circuits. The original design was done as part of a VLSI course project and used a standard-cell synthesis, place-and-route synchronous design methodology, targeting 6MHz in a 0.18 μm technology at 2.5 volts. For fabrication reasons, we targeted a 0.25 μm technology available through MOSIS, choosing to aggressively voltage scale the design while trying to meet a 333MHz throughput goal.

Fig. 13. Schematic of conventional flip-flop used for comparisons.

We took the Verilog sources from the course project and reduced the bitwidth of the datapath in order to accommodate our limited HSPICE simulation facilities. We resynthesized the entire modified design for our custom 0.25 μm standard cell library, which includes our energy recovering flip-flop. Our custom library contains only a small set of gates and transistor sizes, so the synthesis results are by no means optimal. After synthesis, we manually replaced each flip-flop with a dual flip-flop multiplexer construct that includes a conventional flip-flop and our pterf flip-flop.
By changing a global select ff signal, we could switch the design between conventional and energy recovering state elements, thus affording direct energy consumption comparisons. A schematic of the conventional flip-flop we used is shown in Figure 13.

TABLE II: TEST ASIC MODULE, PRE-LAYOUT ESTIMATES

gates                          3,897
flip-flops                     387
pre-layout Tcritical @ 1.2V    1,974ps

TABLE III: TEST ASIC MODULE, POST-LAYOUT ESTIMATES

                          1.3V      1.4V      1.5V
Tcritical Logic           2,33ps    2,62ps    1,941ps
Tdq Energy Recovery ff    753ps     74ps      654ps
Tdq Conventional ff       817ps     731ps     63ps

Fig. 14. Layout plot of test ASIC module and power-clock generator.

Our final structural netlist was placed and routed using QPLACE and WROUTE. We placed Vdd, Ground, and Power-Clock distribution wires in repeated stripes on the top metal layer over the entire ASIC core. WROUTE then routed each gate connection to the nearest stripe. The ends of the stripes were connected by a wide distribution bus tied to several bond pads along with the clock generator. For a much larger design, the stripe ends would connect to the arms of a global H-tree network. As this design was only approximately 4,000 gates, an H-tree was unnecessary in this case. Figure 14 is a layout plot of the combined ASIC module and power-clock generator.

Table II summarizes the given test ASIC module. While the design is relatively small, its complexity is representative of much larger designs. Our test ASIC module is composed of 22 synthesizable Verilog files comprising over 7,000 lines of code.

Table III summarizes the post-layout timing estimates of the test ASIC module. These results are estimates derived from analysis of HSPICE simulation traces on netlists with post-layout extracted capacitances. At 1.3V, both the conventional and energy recovering flip-flops fail to meet timing for the target 3ns clock period, due to the long delay through the critical path in the logic.
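The timing check described above can be sketched as a simple per-cycle budget: the critical-path logic delay plus the total flip-flop delay (Tdq) must fit within the clock period. This is a simplified reading, not the authors' analysis flow. The function names are hypothetical, and the model ignores clock skew, design margin, and the extra delay of the test multiplexer. The worked values are the 1.5V energy recovery entries of Table III and the 3ns target period; the failing case uses made-up numbers purely for illustration.

```python
def timing_slack_ps(t_logic_ps: float, t_dq_ps: float, t_cycle_ps: float) -> float:
    """Return the time left in one clock period after logic and flip-flop delay."""
    return t_cycle_ps - (t_logic_ps + t_dq_ps)


def meets_timing(t_logic_ps: float, t_dq_ps: float, t_cycle_ps: float) -> bool:
    """A path meets timing (in this simplified model) if its slack is non-negative."""
    return timing_slack_ps(t_logic_ps, t_dq_ps, t_cycle_ps) >= 0.0


T_CYCLE_PS = 3000.0  # 3ns target clock period (333MHz)

# Table III, 1.5V column, energy recovery flip-flop.
slack_er_15v = timing_slack_ps(t_logic_ps=1941.0, t_dq_ps=654.0, t_cycle_ps=T_CYCLE_PS)
print(f"1.5V energy recovery slack: {slack_er_15v:.0f} ps")

# Hypothetical slower corner (illustrative values, not from the paper).
print("hypothetical slow corner meets timing:",
      meets_timing(t_logic_ps=2400.0, t_dq_ps=750.0, t_cycle_ps=T_CYCLE_PS))
```

In this model the positive slack at 1.5V is what leaves room for additional per-path delay, such as a selection multiplexer in front of each state element.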
The voltage-scaling limit for this design would therefore be 1.4V, as determined by the amount of pipelining in the design. However, since we added an additional mux to select between the energy recovering and standard flip-flops for testing purposes, the actual minimum voltage limit is 1.5V. Notice that the conventional flip-flop is faster than pterf at high voltages, but slower at low voltages. This trend arises because the conventional flip-flop speeds up with increasing voltage, while pterf delay is dominated by the rise time of the power-clock, so only slight reductions with increasing voltage are possible.

We simulated our ASIC module from a post-layout extracted netlist using HSPICE on a Sun Blade 1. The minimum voltage for correct operation was 1.5V in both the energy recovering and conventional modes. Simulations took approximately 12MB of RAM each and 6 hours worth of computing time. In each mode, we simulated the ASIC module for 2ns with two regions of switching activity. The period immediately after reset has several cycles with low switching activity, before the module begins its self-test mode. The period towards the end of the 2ns of simulated time is representative of typical high activity, as the ASIC module is undergoing a pseudo-random self-test. Our simulation measurements represent the total energy drawn from the DC supplies, per cycle, averaged over several cycles.

TABLE IV: ENERGY RECOVERING VS. CONVENTIONAL MODE @ 333MHZ, 1.5V

mode               Low activity    High activity    Average
Logic Only         0.90pJ          55.28pJ          28.10pJ
Energy Recovery    6.74pJ          68.47pJ          37.60pJ
Conventional       29.72pJ         78.28pJ          54.00pJ

Our primary findings are summarized in Table IV. We simulated the ASIC module in both conventional and energy recovering modes at a frequency of 333MHz and a 1.5V supply. In addition, we separate out the losses in the combinational logic. At low activities, the total dissipation represents primarily the losses due to the clock, as the logic is not switching. At higher activities, the total dissipation is dominated by the logic dissipation. Notice that the combined energy recovering mode with clock generator dissipates less than the conventional mode in all cases. The total energy recovery dissipation (including clock generator) is only 23% of the conventional case at low activity and is only 88% of the conventional case at typical activity.
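The relative dissipation figures quoted for Table IV can be checked with a few lines of arithmetic. The sketch below takes the energy recovery and conventional entries at low and high activity and computes their ratios; the dictionary layout and the function name are illustrative choices, not part of the original work.

```python
# Per-cycle energy drawn from the DC supplies at 333MHz, 1.5V (Table IV), in pJ.
ENERGY_PJ = {
    "energy_recovery": {"low": 6.74, "high": 68.47},
    "conventional": {"low": 29.72, "high": 78.28},
}


def relative_dissipation(activity: str) -> float:
    """Energy recovering dissipation as a fraction of the conventional mode."""
    return ENERGY_PJ["energy_recovery"][activity] / ENERGY_PJ["conventional"][activity]


for activity in ("low", "high"):
    ratio = relative_dissipation(activity)
    # The reciprocal of the ratio is the savings factor.
    print(f"{activity} activity: ratio {ratio:.3f}, savings factor {1.0 / ratio:.2f}")
```

The low-activity ratio comes out between 0.22 and 0.23, which corresponds to a savings factor above 4, and the high-activity ratio comes out between 0.87 and 0.88, in line with the percentages stated in the text.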
V. CONCLUSION

We have presented a novel single-phase energy recovering methodology for low-voltage ASIC design. Our methodology complements voltage-scaling approaches by enabling further energy savings once the minimum voltage for a given throughput has been reached. A key benefit of our methodology for ASIC designers is the minimal designer overhead needed to implement it. All steps in transforming an existing ASIC design to be energy recovering have been automated. In contrast, clock gating requires significant designer effort to implement correctly.

Our methodology uses a single-phase sinusoidal energy recovering flip-flop and a single-cycle feedback control resonant power-clock generator. These two components are specifically designed to complement each other for low energy consumption. First, the energy recovering flip-flop has near zero dissipation while the input data is constant using a sinusoidal waveform. This waveform is easily and efficiently distributed over the chip using wide top metal wires, since all of the charge stored in the wiring capacitance is recovered by the power-clock generator. The power-clock generator, in turn, has a topology that enables all of the resonant currents to bypass the main power switch, thus enabling it to drive large capacitive loads with low dissipation. In addition, the power-clock generator only enables its main power switch whenever the resonant system actually needs additional energy, thus effectively idling when the majority of the flip-flops are not switching.

We applied our methodology to an existing ASIC design, using a completely automated implementation of the energy recovery feature. We compared, in simulation using post-layout extracted parasitics, a conventional implementation with our energy recovering implementation. The results of this comparison debunk widely held misperceptions about energy recovery (a.k.a. adiabatic) circuits. First, energy recovery need not be slow to be efficient.
Our flip-flop and power-clock generator perform efficiently at speeds of 200-500MHz in a 0.25 μm process. Second, energy recovery complements, rather than competes with, voltage scaling. Our flip-flop works efficiently at voltages down to 1.0V while still running at 200MHz. And finally, energy recovery need not be complex and difficult to design. Our flip-flop contains only 14 transistors, while the clock generator uses less than 1. More importantly, our technique allows for full automation of implementing energy recovery with an existing ASIC design using simple custom tools in addition to industry standard synthesis, placement, and routing tools. We are currently awaiting fabrication to validate our energy recovering methodology on silicon.

VI. ACKNOWLEDGMENTS

This research was supported in part by the US Army Research Office under AASERT Grant No. DAAG55-97-1-25 and Grant No. DAAD19-99-1-34.

REFERENCES

[1] B.S. Kong, S.S. Kim, and Y.H. Jun, "Conditional-capture flip-flop for statistical power reduction," IEEE Journal of Solid-State Circuits, vol. 36, no. 8, pp. 1263-1271, Aug. 2001.
[2] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," IEEE Journal of Solid-State Circuits, vol. SC-34, no. 4, pp. 536-548, Apr. 1999.
[3] J. Yuan and C. Svensson, "New single-clock CMOS latches and flipflops with improved speed and power savings," IEEE Journal of Solid-State Circuits, vol. SC-32, no. 1, pp. 62-69, Jan. 1997.
[4] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and Y. Chou, "Low-power digital systems based on adiabatic-switching principles," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 2, no. 4, pp. 398-406, Dec. 1994.
[5] D. Maksimovic, V. G. Oklobdzija, B. Nikolic, and K. W. Current, "Clocked CMOS adiabatic logic with integrated single-phase power-clock supply," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 4, pp. 460-463, Aug. 2000.
[6] C. H. Ziesler, S. Kim, and M. C. Papaefthymiou, "A power-clock generator for true single-phase adiabatic logic," in Proceedings of International Symposium on Low-Power Electronics and Design, Aug. 2001, pp. 159-164.
[7] J. S. Moon, W. C. Athas, and P. A. Beerel, "Theory and practical implementation of harmonic resonant rail drivers," in Proceedings of the International Symposium of Low-Power Electronics and Design, Aug. 2001, pp. 153-158.
[8] W. Athas, N. Tzartzanis, W. Mao, L. Peterson, R. Lal, K. Chong, J.S. Moon, L. Svensson, and M. Bolotski, "The design and implementation of a low-power clock-powered microprocessor," IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1561-1570, Nov. 2000.
[9] S. Kim, C. H. Ziesler, and M. C. Papaefthymiou, "A true single-phase adiabatic multiplier," in Proceedings of 38th Design Automation Conference, June 2001, pp. 758-763.