Novel Pulsed-Latch Replacement Based on Time Borrowing and Spiral Clustering


Chih-Long Chang, Iris Hui-Ru Jiang, Yu-Ming Yang, Evan Yu-Wen Tsai, and Aki Sheng-Hua Chen
IRIS Lab, National Chiao Tung University (NCTU)

Outline
Introduction; Preliminaries; Feasible region; Algorithm; Experimental results; Conclusion

Clock Power Dominates!
Clock power is the major contributor to total chip power consumption, and a large portion of it is consumed by sequencing elements. Goal: minimize the sequencing overhead!
[Figure: power breakdown of an ASIC (clock power 27%), and a clock network distributing clk from the clock root to the D flip-flops around combinational logic. Source: Chen et al., "Using multi-bit flip-flop for clock power saving by DesignCompiler," SNUG 2010.]

Flip-Flops vs. Pulsed-Latches
Flip-flop (FF): the most common form of sequencing element, consisting of two cascaded latches (master and slave) triggered by a clock signal. FFs have high sequencing overhead in terms of delay, power, and area.
Pulsed-latch (PL): a latch synchronized by a pulse clock produced by a pulse generator (PG). A PL can be approximated as a fast, low-power, and small FF, which makes it promising for reducing power in high-performance circuits.
Migrating from an FF-based design to a PL-based counterpart reduces the sequencing overhead.
[Figure: a flip-flop built from master and slave latches, vs. a pulsed-latch whose PG uses a delay element to derive a pulse of width w from clk. PG: pulse generator; L: latch.]
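To make the pulse-clock idea concrete, here is a minimal discrete-time sketch. It assumes the common delay-based pulse generator that ANDs the clock with an inverted copy of itself delayed by w (our reading of the "Delay" block in the figure); the function names are hypothetical, and a real PG is a circuit whose pulse width depends on its delay chain.

```python
# Hypothetical discrete-time model: signals are lists of 0/1 samples.

def delayed(signal: list[int], delay: int) -> list[int]:
    """Shift a sampled signal later by `delay` steps, padding with 0."""
    return ([0] * delay + signal)[: len(signal)]

def pulse_generator(clk: list[int], w: int) -> list[int]:
    """Assumed PG: pulse = clk AND NOT(clk delayed by w).
    Yields a pulse of width w at every rising edge (if clk stays high >= w)."""
    clk_d = delayed(clk, w)
    return [c & (1 - d) for c, d in zip(clk, clk_d)]

def pulsed_latch(d_in: list[int], pulse: list[int], q0: int = 0) -> list[int]:
    """Level-sensitive latch: transparent while the pulse is 1, holds otherwise."""
    q, out = q0, []
    for d, p in zip(d_in, pulse):
        if p:
            q = d
        out.append(q)
    return out

if __name__ == "__main__":
    clk = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
    print(pulse_generator(clk, 2))
```

Because the latch is transparent only during the short pulse, it samples data near the rising edge much like an FF, which is why a PL can be treated as a small, fast FF.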

Prior Work
Generic PL: most previous works adopt the generic PL structure and flip-flop-like timing analysis.
Pulse distortion
1. Chuang et al. [DAC 10] propose a PL-aware analytical placer that controls pulse distortion by limiting the number of PLs and the total wirelength driven by each PG (no timing consideration).
Timing
2. Lee et al. [ICCAD 08], Lee et al. [ICCAD 09], and Paik et al. [ASPDAC 10] apply aggressive time-borrowing techniques (clock skew scheduling, pulse width allocation, retiming).
Power
3. Shibatani and Li [EETimes 06] propose a methodology.
4. Kim et al. [ASPDAC 11] generate clock gating functions of PGs.
5. Lin et al. [ISLPED 11] minimize the number of PGs without considering clock gating.
6. Chuang et al. [ICCAD 11] perform placement and clock network co-synthesis (based on 1 and 5).

Multi-bit Pulsed-Latches (1/2)
Generic PL structure: pulses can easily be distorted because the PG and its latches are placed apart.
Multi-bit pulsed-latches: the PG and latches are placed and hard-wired together in a compact and symmetric form, so pulse distortion and clock skew can be well controlled.
[Figure: a generic pulsed latch, where the PG drives separately placed latches (L) and the pulse waveform (time in ns) is distorted at the load; a multi-bit pulsed latch, where the PG and latches are hardwired together.]
References: Chuang et al., "Pulsed-latch-aware placement for timing-integrity optimization," DAC 2010. Farmer et al., "Pipeline array," US Patent 6856270 B1, 2005. Venkatraman et al., "A robust, fast pulsed flip-flop design," GLSVLSI 2008.

Multi-bit Pulsed-Latches (2/2)
Multi-bit pulsed-latches are more power efficient than single-bit pulsed-latches.
Bit number | Normalized power per bit
1 | 1.000
2 | 0.740
4 | 0.613
8 | 0.575
[Figure: a multi-bit pulsed latch with the PG and latches hardwired together.]
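The per-bit column is consistent with dividing each cell's total normalized power by its bit count. The sketch below assumes the total-power values of the 55-nm MBPL cells listed in the experimental settings (1.00, 1.48, 2.45, 4.60); the helper names are hypothetical, and clock gating is ignored.

```python
# Total normalized power of 1/2/4/8-bit MBPL cells (experimental settings).
MBPL_TOTAL_POWER = {1: 1.00, 2: 1.48, 4: 2.45, 8: 4.60}

def power_per_bit(total: dict[int, float]) -> dict[int, float]:
    """Normalize each cell's total power by its number of bits."""
    return {bits: p / bits for bits, p in total.items()}

def merged_saving(bits: int, total: dict[int, float] = MBPL_TOTAL_POWER) -> float:
    """Fraction of power saved by one `bits`-bit cell vs. `bits` single-bit
    PLs, ignoring clock gating."""
    return 1.0 - total[bits] / (bits * total[1])

if __name__ == "__main__":
    for bits, p in power_per_bit(MBPL_TOTAL_POWER).items():
        print(f"{bits}-bit: {p:.4f} per bit, saving {merged_saving(bits):.1%}")
```

For example, the 4-bit entry is 2.45 / 4 = 0.6125, shown rounded as 0.613 in the table.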

Do We Need Aggressive Time Borrowing?
Under flip-flop-like timing analysis, prior works use aggressive time-borrowing techniques. Various pulse widths, clock skew scheduling, and retiming may complicate timing closure and functional verification.
Latches have an intrinsic time-borrowing property, and STA tools are mature enough to handle time borrowing.
The amount of time borrowing offered by the pulse width is significant for high-performance circuits.
We can therefore rely only on the intrinsic time borrowing of latches to gain the flexibility to relocate pulsed-latches.

How About MBPL Replacement?
Based on the multi-bit pulsed-latch structure and the time borrowing offered by the pulse width, we apply post-placement pulsed-latch replacement to minimize power consumption subject to timing constraints.
[Figure: three panels with latches 1 to 4: generic pulsed latches without time borrowing, which may incur pulse distortion; an MBPL without time borrowing; and an MBPL with time borrowing, using the latches' feasible regions with time borrowing.]

Our Contributions
Clock gating patterns: since clock gating is widely used for clock power reduction, we incorporate clock gating into pulsed-latch replacement to gain the combined benefits of clock gating and pulsed-latches.
Spiral clustering: the spiral clustering method suits not only rectangular but also rectilinear layouts; the latter are common in modern IC designs due to macros.
Irregular feasible regions: we derive timing analysis formulae that consider time borrowing and reveal that the feasible regions can be very irregular. We adopt an efficient representation to manipulate them.


The Pulsed-Latch Migration Flow
We replace flip-flops by multi-bit pulsed-latches based on their timing slacks and the available amount of time borrowing.
Flow: flip-flop-based logic synthesis → placement → flip-flop-based timing analysis → post-placement MBPL replacement → placement legalization → pulsed-latch-based timing analysis → clock-gating-aware clock tree synthesis → meet timing? (Y: routing; N: iterate)

Problem Formulation
The Multi-Bit Pulsed-Latch Replacement problem:
Given: a multi-bit pulsed-latch library; the netlist and placement of a design; the timing slacks; and the clock gating patterns of the flip-flops.
Goal: replace flip-flops by multi-bit pulsed-latches with time borrowing to minimize the power of the pulsed-latches, subject to timing slack and placement density constraints.


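Timing Analysis: Flip-flops
[Figure: flip-flops i, j, k in sequence. The path from i (through its fanout gate t_fo(i)) to j (through its fanin gate t_fi(j)) has maximum delay D_ij and minimum delay d_ij; the path from j to k has D_jk and d_jk. Setup and hold are checked against clock edges one period T apart.]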

Timing Analysis: Pulsed-latches (1/2)
[Figure: pulsed-latches i, j, k with the same path delays D_ij, d_ij, D_jk, d_jk; the clock has period T and each pulse has width w.]
When we replace flip-flops with pulsed-latches, data can depart the launching latch at the rising edge of the clock but does not have to set up until the falling edge of the pulse at the receiving latch. If the maximum delay from i to j exceeds the clock period, the path can borrow time from the timing window from j to k.

Timing Analysis: Pulsed-latches (2/2)
[Figure: pulsed-latches i, j, k with setup and hold checks; clock period T, pulse width w.]
To guarantee successful time borrowing, this work allows time borrowing only between two adjacent timing windows.
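As a hedged summary of the three timing slides, the constraints can be written as below. This is our reading, not the paper's exact formulae: it assumes zero clock-to-Q/D-to-Q delays and zero setup/hold times, a clock period T, pulses of width w starting at each rising edge, and t_b as the time that window i-j borrows from window j-k. The largest borrowing, t_b = w, recovers the falling-edge setup deadline T + w, and hold constraints stay d_ij, d_jk ≥ w when borrowing.

```latex
\begin{aligned}
&\text{Flip-flops:} && D_{ij} \le T, && d_{ij} \ge 0\\
&\text{Pulsed-latches, no borrowing:} && D_{ij} \le T, && d_{ij} \ge w\\
&\text{Pulsed-latches, borrowing } t_b: && D_{ij} \le T + t_b,\;\; D_{jk} \le T - t_b, && 0 \le t_b \le w\\
&\Rightarrow && D_{ij} + D_{jk} \le 2T &&
\end{aligned}
```

The last line shows why restricting borrowing to adjacent windows is safe: each pair of adjacent windows shares a combined budget of 2T.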

Timing Slack Conversion
Flip-flop-based synthesis and placement have already considered the extra hold time margin w, so we focus on setup slacks.
[Figure: a path from pulsed-latch i through its fanout gate t_fo(i) and the fanin gate t_fi(j) to latch j, with maximum delay D_ij, minimum delay d_ij, and clock period T.]
We convert the timing slacks obtained by flip-flop-based timing analysis into pulsed-latch-based slacks without time borrowing, and distribute each setup slack equally to the fanin and fanout parts of the latches.
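The equal split can be sketched as follows. We assume (hypothetically) that each path i→j has one setup slack S_ij from flip-flop-based STA, half of which goes to the fanout part of launching latch i and half to the fanin part of capturing latch j. When a latch has several fanin or fanout paths, this sketch keeps the smallest share, a conservative choice that is our assumption rather than the paper's rule.

```python
def split_slacks(path_slacks: dict[tuple[str, str], float]):
    """Split each path slack S_ij equally between the fanout part of latch i
    and the fanin part of latch j.

    Returns (s_fo, s_fi): dicts mapping latch name -> slack share.
    Assumption: with several paths per latch, keep the minimum share."""
    s_fo: dict[str, float] = {}
    s_fi: dict[str, float] = {}
    for (i, j), slack in path_slacks.items():
        half = slack / 2.0
        s_fo[i] = min(s_fo.get(i, float("inf")), half)
        s_fi[j] = min(s_fi.get(j, float("inf")), half)
    return s_fo, s_fi

if __name__ == "__main__":
    # Slacks in ps for paths i->j, j->k, j->m (hypothetical values).
    print(split_slacks({("i", "j"): 40.0, ("j", "k"): 20.0, ("j", "m"): 10.0}))
```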

Slack vs. Wirelength
Based on Synopsys' Liberty library, the wire delays from latch i to its fanout gate t_fo(i) and from the fanin gate t_fi(j) to latch j can be approximated by piecewise-linear functions of the corresponding Manhattan distances; the coefficient α of this approximation (wire delay per unit distance) is calibrated by the delay table of the pulsed-latch library.
We incorporate time borrowing into the slack value to derive feasible regions.
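Below is a sketch of a piecewise-linear wire-delay model and its inverse, which turns a slack into the largest Manhattan distance it can absorb. The breakpoints are hypothetical placeholders, not values from the paper's library. With a single linear segment (delay = α·distance), the inverse reduces to slack/α, which is the radius of a fanin or fanout diamond.

```python
import bisect

# Hypothetical breakpoints: Manhattan distance (um) -> wire delay (ps).
# A real flow would fit these to the pulsed-latch library's delay tables.
DIST = [0.0, 50.0, 100.0, 200.0]
DELAY = [0.0, 5.0, 12.0, 30.0]

def wire_delay(dist: float) -> float:
    """Piecewise-linear interpolation; extrapolates with the last segment."""
    k = min(max(bisect.bisect_right(DIST, dist) - 1, 0), len(DIST) - 2)
    slope = (DELAY[k + 1] - DELAY[k]) / (DIST[k + 1] - DIST[k])
    return DELAY[k] + slope * (dist - DIST[k])

def max_distance(slack: float) -> float:
    """Largest distance whose wire delay fits in `slack`. The delay is
    monotonically increasing, so each segment can be inverted directly.
    A non-positive slack leaves no room for extra wire (returns 0)."""
    if slack <= 0:
        return 0.0
    k = min(max(bisect.bisect_right(DELAY, slack) - 1, 0), len(DELAY) - 2)
    slope = (DELAY[k + 1] - DELAY[k]) / (DIST[k + 1] - DIST[k])
    return DIST[k] + (slack - DELAY[k]) / slope

if __name__ == "__main__":
    print(wire_delay(75.0), max_distance(8.5))
```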

Feasible Region with Time Borrowing (1/3)
The fanin and fanout setup time slacks of pulsed-latch j define two diamonds, centered at its fanin gate t_fi(j) and fanout gate t_fo(j), with radii S_fi(j)/α and S_fo(j)/α, where α is the wire delay per unit Manhattan distance. The overlap area of the fanin and fanout diamonds is the initial feasible region without time borrowing.
[Figure: latches i, j, k with gates t_fo(i), t_fi(j), t_fo(j), t_fi(k); the fanin and fanout diamonds of j and their overlap.]

Feasible Region with Time Borrowing (2/3)
Let t_b (0 ≤ t_b ≤ w) be the amount of time that timing window i-j borrows from window j-k. When we borrow time t_b, the fanin diamond is expanded by t_b/α while the fanout diamond is shrunk by t_b/α (α: wire delay per unit length), so the overlap area slides horizontally or vertically.
[Figure: the feasible region without time borrowing, and the shifted feasible region with time borrowing t_b.]

Feasible Region with Time Borrowing (3/3)
As we keep borrowing (up to t_b = w), the fanin or fanout diamond reaches the middle lines of the boundaries of the fanin/fanout diamonds, and the overlap area is truncated. The entire feasible region is therefore irregular; in the worst case, it is an octagon.
[Figure: fanin and fanout diamonds and the entire feasible region with time borrowing.]
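One way to see why the entire region is irregular: a location p is feasible if some borrowing amount t_b in [0, w] satisfies both diamonds at once. Assuming a purely linear wire model (delay = α × Manhattan distance), this gives a closed-form interval for t_b, sketched below. The function name is hypothetical; the paper extracts the region geometrically with fences instead.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def borrow_interval(p, fanin, fanout, s_fi, s_fo, w, alpha):
    """Return the (lo, hi) range of borrowed time t_b that makes location p
    feasible for latch j, or None if p is outside the entire feasible region.

    Assumes wire delay = alpha * Manhattan distance, so the fanin diamond has
    radius (s_fi + t_b) / alpha and the fanout diamond (s_fo - t_b) / alpha,
    with 0 <= t_b <= w."""
    need = alpha * manhattan(p, fanin) - s_fi     # t_b the fanin side needs
    spare = s_fo - alpha * manhattan(p, fanout)   # t_b the fanout side can give
    lo, hi = max(0.0, need), min(w, spare)
    return (lo, hi) if lo <= hi else None

if __name__ == "__main__":
    # Fanin gate at (0, 0), fanout gate at (10, 0), slacks 4 and 8, w = 3.
    for x in range(11):
        print(x, borrow_interval((x, 0), (0, 0), (10, 0), 4.0, 8.0, 3.0, 1.0))
```

The union over all t_b is exactly the set of points with a non-empty interval; its shape changes with t_b on both sides, which is why it can become an octagon.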


Post-Placement Pulsed-Latch Replacement
Flow: feasible region extraction → spiral clustering → MBPL extraction with clock gating → any more FFs? (Y: repeat; N: done)
1. Extract feasible regions and represent them by four interval graphs.
2. Use spiral clustering to form multi-bit pulsed-latches.
3. Meanwhile, consider clock gating during MBPL extraction.
4. Relocate the newly formed multi-bit pulsed-latches.
5. Repeat steps 2-4 until all flip-flops are investigated.

Coordinate Transformation
To facilitate feasible region extraction, we adopt a simple and fast coordinate transformation: rotating the Cartesian coordinate system C by 45 degrees yields C', in which the fanin/fanout diamonds become squares. We name the four boundaries of a fanin/fanout diamond its right, bottom, left, and top boundaries.
Reference: Chang et al., "INTEGRA: Fast multi-bit flip-flop clustering for clock power saving based on interval graphs," ISPD 2011.
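A minimal sketch of the transformation, assuming the rotation x' = x + y, y' = y - x (a 45-degree rotation scaled by √2, which keeps integer coordinates; the slides do not state the exact sign or scaling). It relies on the identity |a| + |b| = max(|a + b|, |a - b|), so a Manhattan diamond in C becomes an axis-aligned square in C'.

```python
def rotate(p):
    """Map (x, y) to (x', y') = (x + y, y - x): 45-degree rotation, scaled."""
    x, y = p
    return (x + y, y - x)

def in_diamond(p, center, r):
    """Manhattan ball |x - cx| + |y - cy| <= r in the original coordinates."""
    return abs(p[0] - center[0]) + abs(p[1] - center[1]) <= r

def in_square(p, center, r):
    """The same set in rotated coordinates: a square of half-side r."""
    (u, v), (cu, cv) = rotate(p), rotate(center)
    return max(abs(u - cu), abs(v - cv)) <= r

if __name__ == "__main__":
    print(rotate((2, 3)), in_diamond((2, 3), (0, 0), 5), in_square((2, 3), (0, 0), 5))
```

In C', intersecting or expanding diamonds becomes interval arithmetic on the x' and y' axes, which is what makes the later interval-graph representation possible.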

Feasible Region Extraction
The fanin diamond expands while the fanout diamond shrinks with time borrowing, so the entire feasible region is irregular; in the worst case, it is an octagon. How do we extract it?
[Figure: fanin and fanout diamonds and the entire feasible region with time borrowing in x-y coordinates.]

Fence Finding (1/2)
If some boundary of the fanout diamond lies outside the corresponding boundary of the fanin diamond, a fence constrains how far the feasible region can slide.
[Figure: fanin and fanout diamonds, with fences at the right and bottom boundaries.]

Fence Finding (2/2)
The fences are determined by the pulse width and by the differences between the boundaries of the fanin and fanout diamonds. Given the initial feasible region, the entire feasible region with time borrowing can be extracted by finding eight fences.
[Figure: fanin and fanout diamonds with the eight fences in x-y coordinates.]
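Our reading of the eight fences is that the entire feasible region is the intersection of an axis-aligned box in (x, y) with a box in the rotated frame (x' = x + y, y' = y - x), i.e., an octagon with horizontal, vertical, and ±45-degree edges. Under that assumption, a region can be stored as four intervals (eight fence values), and a membership test looks like this. The class name is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fences:
    """Eight fences: (lo, hi) bounds on x, y, x' = x + y, and y' = y - x.
    Hypothetical representation of a (possibly octagonal) feasible region."""
    x: tuple[float, float]
    y: tuple[float, float]
    xp: tuple[float, float]
    yp: tuple[float, float]

    def contains(self, p) -> bool:
        x, y = p
        coords = (x, y, x + y, y - x)
        bounds = (self.x, self.y, self.xp, self.yp)
        return all(lo <= c <= hi for c, (lo, hi) in zip(coords, bounds))

if __name__ == "__main__":
    # A diamond of radius 2 at the origin, clipped by vertical fences x = +/-1.
    region = Fences(x=(-1, 1), y=(-2, 2), xp=(-2, 2), yp=(-2, 2))
    print(region.contains((1, 1)), region.contains((1.5, 0)))
```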

Four Interval Graphs
Using these eight fences, we can handle any irregular feasible region. Projecting all feasible regions onto the x'-, y'-, x-, and y-axes forms four interval graphs; for pulsed-latch j, [s_x'(j), e_x'(j)], [s_y'(j), e_y'(j)], [s_x(j), e_x(j)], and [s_y(j), e_y(j)] are its intervals. Sequences X', Y', X, and Y record the starting and ending coordinates of the x', y', x, and y intervals in ascending order. The feasible regions of two pulsed-latches overlap if and only if they overlap on all four interval graphs.
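The overlap criterion and the sorted endpoint sequences can be sketched as below. Each region is assumed to be a dict of four closed intervals keyed "xp", "yp", "x", "y" (hypothetical names); the overlap test simply applies the slide's four-projection criterion. Starts are ordered before ends at equal coordinates so that touching intervals count as overlapping.

```python
AXES = ("xp", "yp", "x", "y")

def overlaps(a, b):
    """Closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def regions_overlap(r1, r2):
    """Slide's criterion: overlap on all four projections x', y', x, y."""
    return all(overlaps(r1[ax], r2[ax]) for ax in AXES)

def endpoint_sequence(regions, axis):
    """Sorted (coordinate, kind, id) events for one axis, like X' or Y.
    kind 0 = start, 1 = end, so starts precede ends at equal coordinates."""
    events = []
    for rid, r in regions.items():
        lo, hi = r[axis]
        events.append((lo, 0, rid))
        events.append((hi, 1, rid))
    return sorted(events)

if __name__ == "__main__":
    regs = {
        "a": {"xp": (0, 4), "yp": (0, 4), "x": (0, 2), "y": (0, 2)},
        "b": {"xp": (3, 6), "yp": (1, 5), "x": (1, 3), "y": (1, 3)},
    }
    print(regions_overlap(regs["a"], regs["b"]), endpoint_sequence(regs, "xp"))
```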


Spiral Clustering and MBPL Extraction
Spiral clustering (physical perspective): find maximal cliques in the intersection graph of all feasible regions.
MBPL extraction with clock gating (logical perspective): from each maximal clique found, extract a subset with similar clock gating patterns to form a multi-bit pulsed-latch.
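On a single interval graph, maximal cliques can be found with one left-to-right sweep: whenever an end event follows at least one start event, the active set is a maximal clique. The sketch below shows that standard sweep; how the paper combines the four graphs is not detailed on the slide, so the greedy `refine` step (keep only members that overlap pairwise on all four axes) is our assumption and yields a clique that is not necessarily maximum.

```python
def maximal_cliques_1d(intervals):
    """Maximal cliques of an interval graph. intervals: id -> (lo, hi), closed."""
    events = []
    for k, (lo, hi) in intervals.items():
        events.append((lo, 0, k))  # starts sort before ends at ties
        events.append((hi, 1, k))
    events.sort()
    active, cliques, grew = set(), [], False
    for _, kind, k in events:
        if kind == 0:
            active.add(k)
            grew = True
        else:
            if grew:  # an end right after starts: active set is maximal
                cliques.append(frozenset(active))
                grew = False
            active.remove(k)
    return cliques

def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def refine(clique, regions, axes=("xp", "yp", "x", "y")):
    """Heuristic: greedily keep members overlapping all kept ones on every axis."""
    kept = []
    for k in sorted(clique):
        if all(all(overlaps(regions[k][ax], regions[m][ax]) for ax in axes)
               for m in kept):
            kept.append(k)
    return kept

if __name__ == "__main__":
    print(maximal_cliques_1d({"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (5, 7)}))
```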

One Way Clustering vs. Spiral Clustering
One-way clustering* clusters along the x' axis and leaves orphans around the end of X'. Spiral clustering finds cliques from the four corners toward the center.
[Figure: feasible regions in x-y coordinates clustered by each method.]
*Chang et al., "INTEGRA: Fast multi-bit flip-flop clustering for clock power saving based on interval graphs," ISPD 2011.

One Way Clustering vs. Spiral Clustering
[Figure: eight pulsed-latches PL1 to PL8 and their feasible regions on a 10 × 10 grid. One-way clustering yields {8}, {6, 7}, {2, 5}, {3}, {1, 4}: five clusters, two of them single-bit orphans. Spiral clustering yields {7, 8}, {5, 6}, {1, 4}, {2, 3}: four 2-bit clusters.]
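The corner-to-center idea can be illustrated with the hypothetical sketch below. It simplifies each feasible region to an axis-aligned rectangle (x0, y0, x1, y1), takes seeds in turn from the bottom-left, bottom-right, top-right, and top-left corners, and grows each cluster greedily with regions that overlap every current member. This is an illustration of the spiral order only, not the paper's interval-graph algorithm; trimming clusters to legal 1/2/4/8-bit sizes happens during MBPL extraction and is not shown.

```python
def boxes_overlap(a, b):
    """Closed rectangles (x0, y0, x1, y1) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Corner scores on (x0, y0, x1, y1); smaller means closer to that corner.
CORNERS = [
    lambda b: b[0] + b[1],     # bottom-left: small x0 + y0
    lambda b: b[1] - b[2],     # bottom-right: large x1, small y0
    lambda b: -(b[2] + b[3]),  # top-right: large x1 + y1
    lambda b: b[0] - b[3],     # top-left: small x0, large y1
]

def spiral_cluster(boxes, max_bits=8):
    """Cluster regions from the four corners toward the center (sketch)."""
    remaining = dict(boxes)
    clusters, turn = [], 0
    while remaining:
        score = CORNERS[turn % 4]
        order = sorted(remaining, key=lambda k: (score(remaining[k]), k))
        cluster = [order[0]]  # seed: the region closest to the current corner
        for k in order[1:]:
            if len(cluster) == max_bits:
                break
            if all(boxes_overlap(remaining[k], remaining[m]) for m in cluster):
                cluster.append(k)
        for k in cluster:
            del remaining[k]
        clusters.append(cluster)
        turn += 1
    return clusters
```

Alternating corners keeps the outermost regions from being left as orphans at the end of a single sweep direction, which is the failure mode of one-way clustering shown in the figure.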

Rectilinear Layout
Spiral clustering groups pulsed-latches starting from the corners, so it suits rectilinearly shaped layouts with many macros.
[Figure: a rectilinear layout shaped around a macro.]


Clock Gating Is Important!
Since the latches inside one MBPL cell share the pulse clock, their clock gating functions are ORed together. If we merge pulsed-latches with very different clock gating patterns, we may not reduce power consumption.
Effective power ratio = library ratio × pattern ratio
E.g., a library ratio of 0.74 and a pattern ratio of 1.5 give an effective power ratio of 1.11, which is worse than keeping separate PLs.
[Figure: two pulsed-latches in one feasible region with clock gating patterns 1001 and 1010; the merged 2-bit cell has pattern 1011.]
Normalized power: 1-bit cell 1.00; 2-bit cell 1.48.
To reduce power, our strategy is to extract, from a found maximal clique, a subset with a feasible bit number and the minimum effective power ratio.
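The effective power ratio and the subset extraction can be sketched as follows. We assume the pattern ratio is n × (active cycles of the ORed pattern) / (total active cycles of the n separate PLs); this reproduces the slide's example (1001 OR 1010 = 1011, so 2 × 3 / 4 = 1.5). The brute-force search and all names are hypothetical and only practical for small cliques.

```python
from itertools import combinations

# Per-bit library ratios: normalized power per bit of 1/2/4/8-bit MBPL cells.
LIB_PER_BIT = {1: 1.00, 2: 0.74, 4: 0.613, 8: 0.575}

def merged_pattern(patterns):
    """Latches sharing one PG: gating functions are ORed cycle by cycle."""
    return "".join("1" if "1" in cycle else "0" for cycle in zip(*patterns))

def effective_power_ratio(patterns, lib=LIB_PER_BIT):
    """Library ratio x pattern ratio (assumed definition, see lead-in)."""
    n = len(patterns)
    separate = sum(p.count("1") for p in patterns)
    if separate == 0:  # never clocked: merging cannot add activity
        return lib[n]
    pattern_ratio = n * merged_pattern(patterns).count("1") / separate
    return lib[n] * pattern_ratio

def best_subset(clique, patterns, sizes=(8, 4, 2)):
    """Legal-size subset of a clique with the smallest effective power ratio.
    Returns (ratio, ids), or None if no subset beats separate PLs (ratio < 1)."""
    best = None
    for n in sizes:
        for ids in combinations(sorted(clique), n):
            r = effective_power_ratio([patterns[k] for k in ids])
            if r < 1.0 and (best is None or r < best[0]):
                best = (r, ids)
    return best

if __name__ == "__main__":
    print(effective_power_ratio(["1001", "1010"]))
```

In the usage below, merging the two latches with identical patterns (ratio 0.74) beats merging all four (0.613 × 1.5 ≈ 0.92), even though the 4-bit cell has the better library ratio.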


MBPL Relocation
1. For each newly formed multi-bit pulsed-latch, find the point in its feasible region with the minimum wirelength.
2. Legalize it.
[Figure: the feasible region and its minimum-wirelength region in x-y coordinates.]
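A sketch of step 1 under two simplifying assumptions of ours: the feasible region is an axis-aligned rectangle, and wirelength is the sum of Manhattan distances from the cell to the pins it connects (star model). Both the objective and the constraints then separate into x and y, and in each axis a median clamped into the allowed range is optimal because the sum of absolute deviations is convex. Legalization (step 2) is not shown.

```python
import statistics

def best_coordinate(coords, lo, hi):
    """Minimize sum |c - t| over t in [lo, hi] by clamping a median."""
    return min(max(statistics.median_low(coords), lo), hi)

def relocate(pins, region):
    """Minimum-wirelength point of a rectangular region (x0, y0, x1, y1).
    pins: (x, y) locations of the fanin/fanout gates of the MBPL."""
    x0, y0, x1, y1 = region
    x = best_coordinate([p[0] for p in pins], x0, x1)
    y = best_coordinate([p[1] for p in pins], y0, y1)
    return x, y

if __name__ == "__main__":
    pins = [(0, 0), (10, 2), (4, 8)]
    print(relocate(pins, (0, 0, 10, 10)), relocate(pins, (6, 5, 9, 7)))
```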


Settings
We implemented our algorithm in C and ran it on an Intel Xeon 3.8 GHz CPU with 16 GB of memory under Ubuntu 10.04.
MBPL library: 1-/2-/4-/8-bit MBPL cells in 55-nm technology; pulse width w = 100 ps.
Bit number | Normalized power | Normalized area
1 | 1.00 | 1.00
2 | 1.48 | 1.92
4 | 2.45 | 3.85
8 | 4.60 | 7.58
Benchmark circuit | #FFs | #Bins | #Grids | Avg. activity
Industry1 | 120 | 6 × 6 | 600 × 600 | 0.25
Industry2 | 120 | 6 × 6 | 600 × 600 | 0.13
Industry3 | 60,000 | 100 × 300 | 2,000 × 3,000 | 0.69
Industry4 | 5,524 | 100 × 200 | 2,000 × 2,000 | 0.44
Industry5 | 953 | 30 × 160 | 600 × 1,600 | 0.25
Avg. activity is the average active rate of the clock gating functions.

One Way Clustering vs. Spiral Clustering: Results
Focus on the power reduction contributed by the MBPL library during spiral clustering. For each method, the fields are: power ratio w/o clock gating; pattern-aware power ratio; #sinks (1/2/4/8-bit PLs); runtime (s). In the Avg. row, the #sinks field is the average ratio of #sinks to #FFs.
Circuit | One way clustering | Spiral clustering with time borrowing (w = 100 ps)
Industry1 | 74.93%; 130.67%; 62 (18/37/7/0); < 0.01 | 69.34%; 140.38%; 49 (4/32/13/0); < 0.01
Industry2 | 75.78%; 101.22%; 64 (20/38/6/0); < 0.01 | 72.36%; 104.30%; 56 (14/31/11/0); < 0.01
Industry3 | 57.54%; 79.53%; 7,558 (10/35/46/7,467); 3.36 | 57.50%; 79.49%; 7,500 (0/0/0/7,500); 3.07
Industry4 | 62.98%; 96.61%; 1,520 (52/432/920/116); 0.41 | 60.84%; 99.33%; 1,233 (16/182/784/251); 0.39
Industry5 | 65.36%; 113.79%; 311 (27/123/152/9); 0.04 | 62.33%; 121.02%; 246 (9/62/145/30); 0.05
Avg. | 67.32%; 104.36%; 35.55%; - | 64.47%; 108.90%; 29.63%; -

w = 150 ps vs. w = 200 ps
As the pulse width increases, the average power ratio w/o clock gating improves further (63.66% at 150 ps and 63.50% at 200 ps, vs. 64.47% at 100 ps), while the average pattern-aware power ratio rises slightly (109.01% and 110.04%, vs. 108.90%). Both settings use spiral clustering with time borrowing; fields as above: power ratio w/o clock gating; pattern-aware power ratio; #sinks (1/2/4/8-bit PLs); runtime (s).
Circuit | w = 150 ps | w = 200 ps
Industry1 | 68.07%; 142.54%; 46 (4/26/16/0); < 0.01 | 67.64%; 144.35%; 45 (4/24/17/0); < 0.01
Industry2 | 70.22%; 101.35%; 51 (10/27/14/0); < 0.01 | 69.79%; 103.56%; 50 (10/25/15/0); < 0.01
Industry3 | 57.50%; 79.53%; 7,500 (0/0/0/7,500); 3.20 | 57.50%; 79.47%; 7,500 (0/0/0/7,500); 3.23
Industry4 | 60.52%; 99.68%; 1,184 (14/157/727/286); 0.41 | 60.46%; 99.95%; 1,170 (14/163/690/303); 0.40
Industry5 | 62.00%; 121.95%; 239 (7/55/145/32); 0.05 | 62.12%; 122.86%; 240 (7/63/135/35); 0.04
Avg. | 63.66%; 109.01%; 27.97%; - | 63.50%; 110.04%; 27.61%; -

Without vs. With Clock Gating (w = 100 ps)
Considering clock gating during spiral clustering lowers the average pattern-aware power ratio from 108.90% to 78.88%. Both settings use spiral clustering with time borrowing; fields: power ratio w/o clock gating; pattern-aware power ratio; #sinks (1/2/4/8-bit PLs); runtime (s).
Circuit | w/o clock gating | w/ clock gating
Industry1 | 69.34%; 140.38%; 49 (4/32/13/0); < 0.01 | 95.68%; 95.68%; 110 (104/4/2/0); < 0.01
Industry2 | 72.36%; 104.30%; 56 (14/31/11/0); < 0.01 | 78.38%; 78.38%; 70 (32/32/6/0); < 0.01
Industry3 | 57.50%; 79.49%; 7,500 (0/0/0/7,500); 3.07 | 63.59%; 68.78%; 15,033 (8,578/25/17/6,413); 5.20
Industry4 | 60.84%; 99.33%; 1,233 (16/182/784/251); 0.39 | 73.33%; 73.99%; 2,633 (1,584/328/621/100); 0.45
Industry5 | 62.33%; 121.02%; 246 (9/62/145/30); 0.05 | 77.46%; 77.59%; 535 (337/102/89/7); 0.05
Avg. | 64.47%; 108.90%; 29.63%; - | 77.69%; 78.88%; 55.77%; -


Conclusion
Derive timing properties: setup/hold time constraints with time borrowing. Using only intrinsic time borrowing is safer than skew scheduling, pulse width allocation, and retiming.
Reveal irregular feasible regions, possibly octagons, and represent them with a new structure: two pairs of interval graphs.
Propose spiral clustering: better clustering results than one-way clustering, and suitable for rectilinearly shaped layouts.
Consider clock gating for effective power reduction.
Our results show that with time borrowing, spiral clustering, and clock gating consideration, we can achieve very power-efficient results (an average pattern-aware power ratio of 78.88% at w = 100 ps).

Thank You!
Contact: Iris Hui-Ru Jiang, huiru.jiang@gmail.com

How About Loops?
To guarantee successful time borrowing, this work allows time borrowing only between two adjacent timing windows, so each pair of adjacent windows around a loop shares a total budget of 2T.
NCTU, ISPD'12

How About Multiple Fanouts?
Consider each fanout individually, then combine the results.
[Figure: a latch with one fanin and two fanouts, fanout1 and fanout2.]

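What We Have Already
[Figure: the fanin slack of latch i defines a feasible region F_r(i) bounded by lines of slope +1 and -1 around the fanin gate; L_fi(i) and L_fo(i) are the Manhattan distances from i to its fanin and fanout gates. An efficient coordinate transformation maps this region to an axis-aligned one.]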

Representation
Interval graphs and sequences form an efficient data structure.
[Figure: feasible regions of FF0 to FF7 on a 10 × 10 grid in (x', y') coordinates and their projections. x' intervals: [0,4], [1,3], [0,7], [1,9], [4,6], [0,9], [8,10], [2,8]; y' intervals: [0,10], [5,9], [1,2], [0,5], [2,7], [7,8], [4,9], [7,10]; each axis's endpoints are stored as a sorted sequence.]