ISPD 2017 Contest Clock-Aware FPGA Placement
1 ISPD 2017 Contest Clock-Aware FPGA Placement Stephen Yang, Chandra Mulpuri, Sainath Reddy, Meghraj Kalase, Srinivasan Dasasathyan, Mehrdad E. Dehkordi, Marvin Tom, Rajat Aggarwal
2 Acknowledgement Xilinx Vivado Management Team Support from Dr. Sudip Nag and Dr. Salil Raje Support from Xilinx Lab
3 Outline Background Top-5 Team Presentations Benchmarking Results Award Ceremony
4 Last Year: Routability-Driven FPGA Placement. First FPGA-related contest; latest FPGA architecture; Vivado: industrial flow for evaluation; academic benchmark format: bookshelf; focus: FPGA legalization rules and routing congestion.
5 This Year: Clock-Aware FPGA Placement. Continued effort on the FPGA placement problem; clock legalization: key constraint in FPGA placement; wirelength as the primary metric; reduced difficulty on routability, reduced runtime factor.
6 Contest Timelines
- Oct 2016: Problem definition and contest planning
- Nov 2016: Contest announcement
- Dec 12, 2016: Sample benchmarks ready
- Jan 15, 2017: Registration deadline
- Feb 3, 2017: Evaluation flow ready
- Feb 15, 2017: Alpha submission
- Mar 9, 2017: Final submission
- Mar 10-12, 2017: Benchmarking
- Mar 22, 2017: Announce winners at ISPD
7 Registration: 13 Teams

| Team | Affiliation | Region |
|---|---|---|
| VDAplacer | National Chiao Tung University | Asia |
| UTPlaceF2.0 | University of Texas at Austin | North America |
| WicilPlacer | University of Wisconsin-Madison | North America |
| RippleFPGA | Chinese University of Hong Kong | Asia |
| Uni-Placer | Ulsan National Institute of Science and Technology | Asia |
| CECA_Placer | Peking University | Asia |
| NTUfplace | National Taiwan University | Asia |
| GPlace | University of Guelph | North America |
| BMTIplacer | Beijing Microelectronics and Technology Institute | Asia |
| AggiePlace | Texas A&M University | North America |
| UFRGSPlace | Universidade Federal do Rio Grande do Sul | South America |
| POCA Tool | Politecnico di Torino, Torino, Italy | Europe |
| Kapees | Indian Institute of Technology, Guwahati | Asia |
8 Final Submission: 9 Teams

| Team | Affiliation | Region |
|---|---|---|
| VDAplacer | National Chiao Tung University | Asia |
| UTPlaceF2.0 | University of Texas at Austin | North America |
| WicilPlacer | University of Wisconsin-Madison | North America |
| RippleFPGA | Chinese University of Hong Kong | Asia |
| CECA_Placer | Peking University | Asia |
| NTUfplace | National Taiwan University | Asia |
| GPlace | University of Guelph | North America |
| BMTIplacer | Beijing Microelectronics and Technology Institute | Asia |
| UFRGSPlace | Universidade Federal do Rio Grande do Sul | South America |

Congratulations!
9 Target FPGA: Xilinx UltraScale VU095. 20nm technology; 1.2M logic cells.
10 Clock Routing Architecture
11 Clock Region Rule: 24 distinct clocks per region
12 Half Column Rule: 12 distinct clocks per half column
13 (Hidden) Benchmark Statistics

| Design | #LUTs | #FFs | #BRAMs | #DSPs | #I/O | #Clocks |
|---|---|---|---|---|---|---|
| Design1 | 215K (40%) | 236K (22%) | 170 (10%) | 75 (10%) | | |
| Design2 | 215K (40%) | 236K (22%) | 170 (10%) | 75 (10%) | | |
| Design3 | 242K (45%) | 270K (25%) | 255 (15%) | 112 (15%) | | |
| Design4 | 268K (50%) | 300K (28%) | 340 (20%) | 150 (20%) | | |
| Design5 | 295K (55%) | 325K (30%) | 425 (25%) | 187 (25%) | | |
| Design6 | 322K (60%) | 354K (33%) | 510 (30%) | 225 (30%) | | |
| Design7 | 350K (65%) | 384K (36%) | 595 (35%) | 262 (35%) | | |
| Design8 | 376K (70%) | 414K (38%) | 680 (40%) | 300 (40%) | | |
| Design9 | 392K (73%) | 431K (40%) | 765 (45%) | 337 (45%) | | |
| Design10 | 408K (76%) | 449K (42%) | 850 (50%) | 375 (50%) | | |
| Design11 | 424K (79%) | 450K (43%) | 900 (53%) | 397 (53%) | | |
| Design12 | 440K (82%) | 484K (45%) | 950 (56%) | 420 (56%) | | |
| Design13 | 456K (85%) | 503K (47%) | 1000 (59%) | 442 (59%) | | |

Largest: 1.0M instances, 57 clocks
14 Placer Evaluation Flow [Flow diagram: the design in bookshelf format goes to the contest placer, which writes a .pl file; Vivado loads the design (Xilinx DB), reads the placement, runs the clock and legality check, and routes the design to obtain the routed WL]
15 Evaluation Metrics and Ranking
- Score = Routed-WL * (1 + Runtime_Factor)
- Runtime factor: 20% runtime -> 1% QoR, bounded by +/- 2.5%
- Failures: Routing-Failures > Legalization-Failures > Placer-Failures
- Ranking per design: 1, 2, 3, ..., n
- Final ranking: sum of the rankings of each team
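The scoring rule on this slide can be sketched in a few lines. This is a minimal sketch, not the organizers' evaluation script: the slide does not say which runtime the 20% is measured against, so `reference_runtime` (for example, a median across teams) is an assumption, as is the sign convention that a slower placer gets a positive factor. All function names are hypothetical, and failure handling is not modeled.

```python
def runtime_factor(runtime, reference_runtime):
    """20% runtime difference maps to 1% QoR, clamped to +/- 2.5%.

    ASSUMPTION: the difference is taken relative to `reference_runtime`,
    and slower-than-reference yields a positive (penalizing) factor.
    """
    relative = (runtime - reference_runtime) / reference_runtime
    factor = relative / 0.20 * 0.01
    return max(-0.025, min(0.025, factor))


def score(routed_wl, runtime, reference_runtime):
    """Score = Routed-WL * (1 + Runtime_Factor); lower is better."""
    return routed_wl * (1.0 + runtime_factor(runtime, reference_runtime))


def sum_of_rankings(scores_by_design):
    """scores_by_design: {design: {team: score}}.

    Ranks teams 1..n per design (lowest score first) and sums the ranks.
    """
    totals = {}
    for team_scores in scores_by_design.values():
        ordered = sorted(team_scores, key=team_scores.get)
        for rank, team in enumerate(ordered, start=1):
            totals[team] = totals.get(team, 0) + rank
    return totals
```

For example, a placer that runs 20% slower than the reference has its routed wirelength scaled by 1.01, while one that runs ten times slower is capped at 1.025.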
16 Top-5 Team Presentation
17 Top-5 Teams (In Alphabetical Order) GPlace, University of Guelph, Ziad Abuowaimer NTUfplace, National Taiwan University, Yun-Chih Kuo RippleFPGA, Chinese University of Hong Kong, Gengjie Chen UTPlaceF2.0, University of Texas, Austin, Wuxi Li VDAplacer, National Chiao Tung University, Chen Chen
19 GPlace 2.0: Clock-Aware Placement Tool for UltraScale FPGAs. Ziad Abuowaimer, Shawki Areibi, Anthony Vannelli, Gary Grewal. University of Guelph, March 22, 2017
20 [Flowchart] Preplacement -> Global Placement (WL-Driven): Star+ Solver, Site & Clock Legalization -> Congestion Estimation: Adjust Global Routing Grid, NCTU-gr 2.0, LUT inflation -> Clock-Signals Partitioning: Clock-Loads Center of Gravity, Bbox of Center of Gravity, Clock-Loads Assignment -> Global Placement (Congestion-Driven): Star+ Solver, Site & Clock Legalization -> check: Overlap Bbox of Clock Signals <= 24 (YES/NO) -> placement.pl
21 Preplacement: Pin-Propagation Preplacement (similar to GPlace 1.0).
23 Global Placement (WL-Driven), Star+ Solver: analytical placement (Star+ and Jacobi). [Equations not recoverable from the transcription]
24 Global Placement (WL-Driven), FF Legalization (objective: WL minimization). Use bipartition legalization in three levels:
- First, partition the FPGA into clock regions and recursively bipartition FFs into those clock regions (Clock-Region Bipartition).
- Second, partition each clock region into half-columns and recursively bipartition FFs into those half-columns (Half-Column Bipartition).
- Third, partition each half-column into sites and recursively bipartition FFs into those sites (Site Bipartition).
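One level of such a recursive bipartition can be sketched as follows. This is a minimal one-dimensional sketch under stated assumptions, not GPlace's implementation: bins stand for clock regions, half-columns, or sites along a single axis; only site capacity is enforced (clock and control-set capacities are omitted); and the cut rule (a geometric midpoint, then clamping to the capacities on each side) is an assumption. The function name `bipartition_assign` is hypothetical.

```python
def bipartition_assign(cells, bins):
    """Recursively split `bins` in half and divide `cells` between the halves.

    cells: list of (name, x) pairs
    bins:  list of (bin_id, x_center, capacity), ordered by x_center
    Returns {name: bin_id}. Only site capacity is checked in this sketch.
    """
    if not cells:
        return {}
    total_capacity = sum(cap for _, _, cap in bins)
    if len(cells) > total_capacity:
        raise ValueError("more cells than capacity")
    if len(bins) == 1:
        return {name: bins[0][0] for name, _ in cells}

    mid = len(bins) // 2
    left_bins, right_bins = bins[:mid], bins[mid:]
    left_cap = sum(cap for _, _, cap in left_bins)
    right_cap = total_capacity - left_cap

    # Cells left of the geometric cut prefer the left half.
    cells = sorted(cells, key=lambda c: c[1])
    cut = (left_bins[-1][1] + right_bins[0][1]) / 2.0
    n_left = sum(1 for _, x in cells if x < cut)
    # Clamp so that neither half receives more cells than it can hold.
    n_left = max(len(cells) - right_cap, min(left_cap, n_left))

    assignment = bipartition_assign(cells[:n_left], left_bins)
    assignment.update(bipartition_assign(cells[n_left:], right_bins))
    return assignment
```

Running the same routine first over clock regions, then over the half-columns of each region, then over the sites of each half-column gives the three levels listed on the slide.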
25 FF Legalization, Clock-Region Bipartition: create a recursive bi-partitioning tree data structure for the 40 clock regions. Each node in the tree contains a site capacity and a clock capacity.
26 FF Legalization, Clock-Region Bipartition: one tree structure maintains the site and control-set capacity constraints (#Slices, #Groups, #Sub-groups, #FFs); a second tree structure maintains the clock-signal capacity constraints. [Figure: example trees with nodes RG0, CR0, CR1, CE0, CE1, CS0, CS1 and leaves holding 9 and 17 FFs]
27 FPGA-Clock-Region-Tree: a tree data structure that stores the number of clocks and the clock IDs at each node after FF legalization level 1.
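A tree of this kind might look like the sketch below. The class and field names are hypothetical, the uniform `sites_per_region` is an assumption made to keep the example short, and the slides do not show how GPlace actually lays out its nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class RegionNode:
    regions: List[int]                       # clock-region ids under this node
    site_capacity: int                       # total sites under this node
    clock_ids: Set[str] = field(default_factory=set)
    left: Optional["RegionNode"] = None
    right: Optional["RegionNode"] = None

    @property
    def num_clocks(self):
        return len(self.clock_ids)


def build_tree(regions, sites_per_region):
    """Build a balanced bipartition tree over the given clock-region ids."""
    node = RegionNode(list(regions), sites_per_region * len(regions))
    if len(regions) > 1:
        mid = len(regions) // 2
        node.left = build_tree(regions[:mid], sites_per_region)
        node.right = build_tree(regions[mid:], sites_per_region)
    return node


def record_clock(node, region, clock_id):
    """Add clock_id to every node on the path from the root to `region`'s leaf."""
    while node is not None:
        node.clock_ids.add(clock_id)
        if node.left is None:
            break
        node = node.left if region in node.left.regions else node.right
```

With 40 clock regions the tree has 40 leaves; after FF legalization places a flip-flop of some clock in a region, `record_clock` updates the clock count along the path so that later DSP and BRAM passes can consult it.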
28 FF Legalization, Half-Column Bipartition: create a recursive bi-partitioning tree data structure of the half-columns within each clock region (only 3 trees are actually needed, since there are 3 different patterns). Each node in the tree contains a site capacity and a clock capacity.
29 FF Legalization, Half-Column Bipartition: [Figure: a clock-capacity tree (RG0, CS0, CS1; 9 FFs, 17 FFs) and a site & control-set capacity tree (#Slices, #Groups, #Sub-groups, #FFs; RG0, CR0, CE0, CE1)]
30 FPGA-Half-Column-Tree: a tree data structure that stores the number of clocks and the clock IDs at each node after FF legalization level 2.
31 FF Legalization, Site Bipartition: create a recursive bi-partitioning tree data structure of the sites within each half-column. Each node in the tree contains a site capacity. [Figure: site & control-set capacity tree (#Slices, #Groups, #Sub-groups, #FFs)]
32 Global Placement (WL-Driven), DSP Legalization (similar to FF legalization but without control-set constraints). Use bipartition legalization in three levels:
- First, partition the FPGA into clock regions and recursively bipartition DSPs into those clock regions (use and update the FPGA-Clock-Region-Tree).
- Second, partition each clock region into half-columns and recursively bipartition DSPs into those half-columns (use and update the FPGA-Half-Column-Tree).
- Third, partition each half-column into sites and recursively bipartition DSPs into those sites.
33 Global Placement (WL-Driven), BRAM Legalization (similar to DSP legalization). Use bipartition legalization in three levels:
- First, partition the FPGA into clock regions and recursively bipartition BRAMs into those clock regions (use and update the FPGA-Clock-Region-Tree).
- Second, partition each clock region into half-columns and recursively bipartition BRAMs into those half-columns (use and update the FPGA-Half-Column-Tree).
- Third, partition each half-column into sites and recursively bipartition BRAMs into those sites.
34 Congestion Estimation: adjust the global routing grid capacity; run the NCTU-gr 2.0 global router to get the congestion estimate; inflate LUTs based on both the number of pins and the congestion value, with a ratio based on the congestion value. [Inflation formula not recoverable from the transcription]
35 Clock-Signals Partitioning: Clock-Loads Center of Gravity; Bbox of Center of Gravity; Clock-Loads Assignment.
36 Clock-Loads Center of Gravity: calculate the center of gravity of each clock signal from the positions of its clock loads (ignore the two global clock signals ControlSig0 and ControlSig1).
37 Bbox of Center of Gravity: find a bounding box that contains all center-of-gravity points.
38 Clock-Loads Assignment: assign each clock's loads to the closest corner, based on the distance from its center of gravity to that corner. Limit each partition to at most 20 different clocks.
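The three partitioning steps above can be sketched as follows. This is an illustration under assumptions, not GPlace's code: the slides do not say in which order clocks are processed or what happens when the nearest corner is already full, so the sketch walks clocks in input order and falls back to the next-nearest corner. Function names are hypothetical, and the two global control signals are expected to be excluded by the caller.

```python
import math


def center_of_gravity(points):
    """Mean position of a clock's loads."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def assign_clocks_to_corners(clock_loads, max_clocks=20):
    """clock_loads: {clock: [(x, y), ...]}. Returns {clock: corner_index}.

    Corners belong to the bounding box of all centers of gravity, indexed
    0..3 as (xmin, ymin), (xmax, ymin), (xmin, ymax), (xmax, ymax).
    """
    cogs = {clk: center_of_gravity(pts) for clk, pts in clock_loads.items()}
    xs = [c[0] for c in cogs.values()]
    ys = [c[1] for c in cogs.values()]
    corners = [(min(xs), min(ys)), (max(xs), min(ys)),
               (min(xs), max(ys)), (max(xs), max(ys))]

    counts = [0, 0, 0, 0]
    assignment = {}
    for clk, cog in cogs.items():
        # ASSUMPTION: try corners from nearest to farthest, skipping full ones.
        order = sorted(range(4), key=lambda i: math.dist(cog, corners[i]))
        for i in order:
            if counts[i] < max_clocks:
                assignment[clk] = i
                counts[i] += 1
                break
        else:
            raise ValueError("more clocks than the four partitions can hold")
    return assignment
```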
39 Clock-Loads Assignment: place each partition at the corresponding FPGA corner; place the inflated LUTs in the middle of the FPGA.
40 Global Placement (Congestion-Driven): similar to Global Placement (WL-Driven) but with inflated LUTs.
45 NTUfplace: Clock-Aware FPGA Placement. Yun-Chih Kuo, Chau-Chin Huang, Shih-Chun Chen, Chun-Han Chiang, Yao-Wen Chang, and Sy-Yen Kuo. National Taiwan University, Mar. 22, 2017
46 Outline: Introduction; Proposed Approach; Experimental Results; Demo
48 Analytical Placement Formulation. Given the chip region and block dimensions, determine (x, y) for all movable blocks:
$$\min\; W(x, y) \quad \text{s.t.}\quad D_b(x, y) \le M_b \ \ \text{for each bin } b$$
where $W(x, y)$ is the wirelength function, $D_b$ is the density for bin $b$, $M_b$ is the max density for bin $b$, and density $= A_{block} / A_{bin}$. Relax the constraints into the objective function (penalty):
$$\min\; W(x, y) + \lambda \sum_b \big(\max(D_b(x, y) - M_b,\, 0)\big)^2$$
Apply differentiable wirelength and density models; use the gradient method to solve the optimization problem; increase $\lambda$ gradually to meet the density constraints.
49 Differentiable Wirelength and Density Models. Log-sum-exp wirelength model [Naylor et al., 2001]: an effective smooth and differentiable function for HPWL approximation; this model approaches the exact HPWL as $\gamma \to 0$. Bell-shaped density model [Kahng et al., ICCAD 04]. [Figure: bell-shaped density function]
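As an illustration of the log-sum-exp model, the sketch below evaluates, for one net, the standard form $\gamma\,(\log\sum_k e^{x_k/\gamma} + \log\sum_k e^{-x_k/\gamma})$ in x plus the same expression in y, and compares it with the exact HPWL. The function names are hypothetical, and the max/min shift inside the logarithm is a standard numerical-stability trick rather than something shown on the slide.

```python
import math


def lse_span(coords, gamma):
    """Smooth approximation of max(coords) - min(coords)."""
    hi, lo = max(coords), min(coords)
    # Shift by hi / lo so every exponent is <= 0 and cannot overflow.
    smooth_max = hi + gamma * math.log(sum(math.exp((c - hi) / gamma) for c in coords))
    smooth_neg_min = -lo + gamma * math.log(sum(math.exp((lo - c) / gamma) for c in coords))
    return smooth_max + smooth_neg_min


def lse_wirelength(pins, gamma):
    """Log-sum-exp wirelength of one net; pins is a list of (x, y)."""
    return (lse_span([x for x, _ in pins], gamma)
            + lse_span([y for _, y in pins], gamma))


def hpwl(pins):
    """Exact half-perimeter wirelength of one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))
```

The approximation never falls below the HPWL, and shrinking `gamma` tightens it at the cost of a less smooth gradient.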
50 Multilevel Global Placement: cluster the blocks based on connectivity/size to reduce the problem size; compute an initial placement; iteratively decluster the clusters and further refine the placement. [Figure: clustering -> initial placement -> declustering & refinement; legend: clustered block, chip boundary]
52 Clock-Aware Multilevel Global Placement: cluster blocks with the clock constraint; compute an initial placement; decluster and refine. [Figure: clustering -> initial placement -> declustering & refinement; legend: clustered block, chip boundary, blocks within the same clock domain]
53 Mismatch between GP and LG: the analytical model for global placement gives continuous solutions, while legalization pulls blocks to discrete and scattered legal locations, so the displacement of blocks is large. [Figure: I/O block, DSP, CLB, and RAM columns]
54 Heterogeneous Cost Function: we can therefore solve this with the gradient method:
$$\min\; W(x, y) + \lambda_1 \sum_b \big(\max(D_b(x, y) - M_b,\, 0)\big)^2 + \lambda_2\, G(x)$$
where $G(x)$ is the cost of the complex-block-alignment function. [Figure: smoothed cost over DSP columns]
55 Clocking Resource Constraint: we formulate the clocking resource constraint in clock regions as a cost in the placement stages, so the constraint can be resolved by moving blocks out of resource-lacking regions. [Figure: clock region]
57 Experimental Results: we ran our program on an Intel Xeon E CPU with 32GB memory. [Table: Design, #nodes, #nets, Routed-WL, and Runtime for five clk_design benchmarks; the values did not survive the transcription]
59 Demo
60 Thank You!
62 CUHK - RippleFPGA Gengjie Chen, Chak-Wa Pui, Evangeline F. Y. Young, Bei Yu March 22, 2017
63 Outline Background Our Flow How We Handle Clock Rules Clock region Half column
64 Background: Heterogeneous FPGA. [Figure: columns of I/O, CLB, RAM, and DSP blocks with switch boxes]
65 Background: Configurable Logic Block (CLB). A CLB holds Basic Logic Elements BLE 0 to BLE 7, with LUT 0 to LUT 15 and FF 0 to FF 15. The upper half uses CK0, SR0, CE0/1 (for example, FF 0: CK0, SR0, CE0; FF 1: CK0, SR0, CE1); the lower half uses CK1, SR1, CE2/3 (for example, FF 14: CK1, SR1, CE2; FF 15: CK1, SR1, CE3).
67 Flows in Previous Work. Conventional flow (pack-place): packing, then placement (flat netlist -> LUT/FF -> BLE -> CLB -> placed design). Packing based on physical information (place-pack-place): Un/DoPack [ICCAD 06], HDPack [FPL 07], UTPlaceF [ICCAD 16], GPlace-pack [ICCAD 16]. Flat placement followed by legalization (place-pack): GPlace-flat [ICCAD 16].
68 Our Flow [Figure: flat netlist -> flat GP -> soft BLE packing -> BLE GP -> CLB physical packing (LG) -> two-level DP -> slot assignment in CLB -> placed design]
69 Our Flow. Features: stair-step flow which interleaves packing and placement; implicit CLB packing similar to ASIC LG (Tetris). Strengths: quick feedback; iteratively improves other metrics (congestion, timing, power, etc.); approximates analytical GP directly; smoothly controls packing density; easily embeds other metrics; easily considers some constraints (e.g., clock rules).
71 Clock Rules. Clock region: ~32x60 sites => global; a clock occupies a clock region if its bounding box (BB) does; <= 24 clocks in each. Half column: 2x30 sites => local; <= 12 clocks in each.
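The bounding-box rule for clock regions can be checked with a short routine like the one below. It is a sketch under assumptions: sites have integer coordinates, all clock regions are the same size (the slide says only "~32x60"), and the function names are hypothetical. Note that a clock is counted in every region its bounding box touches, even regions that hold none of its cells.

```python
def clock_region_demand(cells, region_w, region_h):
    """cells: list of (x, y, clock) with integer site coordinates.

    Returns {(col, row): set of clocks whose bounding box touches that region}.
    """
    boxes = {}
    for x, y, clk in cells:
        if clk in boxes:
            x0, y0, x1, y1 = boxes[clk]
            boxes[clk] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
        else:
            boxes[clk] = (x, y, x, y)

    demand = {}
    for clk, (x0, y0, x1, y1) in boxes.items():
        for col in range(x0 // region_w, x1 // region_w + 1):
            for row in range(y0 // region_h, y1 // region_h + 1):
                demand.setdefault((col, row), set()).add(clk)
    return demand


def overflowing_regions(demand, max_clocks=24):
    """Return {region: overflow} for regions holding more than max_clocks."""
    return {region: len(clocks) - max_clocks
            for region, clocks in demand.items() if len(clocks) > max_clocks}
```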
72 Clock Region: ~32x60 sites => global; <= 24 clocks in each. Solution: plan clock regions, then apply the plan to GP, LG, DP.
73 Clock Region Planning. Clock bounding box (CBB): restrict the movement of cells of the same clock to a bounding box. Shrinking: reduce overflow in clock regions iteratively until no overflow remains. Expanding: reduce cell density in CBBs iteratively until no further expansion is possible.
74 Clock Region Planning (example): assume 3x3 clock regions, <= 2 clocks in each clock region, and 4 clocks. [Figure sequence: the CBB of each clock is drawn over the region grid with per-region clock counts; one region overflows with #clk = 4 > 2]
79 Clock Region Planning, shrinking: reduce overflow in clock regions iteratively until no overflow remains. For the clock region with maximum overflow: calculate the total cell displacement caused by shrinking; select the CBB and direction with minimum displacement and shrink it. [Figure sequence: CBBs shrink step by step until every region is legal ("It's legal now!")]
83 Clock Region Planning, expanding: reduce cell density in CBBs iteratively until no further expansion is possible. For the unmarked CBB with maximum cell density: try expanding it; mark it if it cannot expand. [Figure sequence: CBBs expand step by step until all are marked ("It's exhausted now!")]
89 Clock Region: plan clock regions, then apply the plan to GP, LG, DP. GP: add box constraints (not implemented). LG/DP: only consider sites within the CBB.
90 Half Column: 2x30 sites => local; <= 12 clocks in each. Solution: resolve overflow after normal LG; forbid movement causing overflow in DP.
91 Half Column: resolve overflow after normal LG. For a half column with overflow: select the clock with the fewest cells; move its cells to neighboring overflow-free half columns with minimum displacement. [Figure sequence: cells of the selected clock move out until the half column is legal ("It's legal now!")]
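The overflow-resolution step can be illustrated with a simplified one-dimensional model. This is a sketch under several assumptions that are not in the slides: half columns sit in a single row, displacement is measured as the index distance between half columns, all cells of the selected clock move together to one target, and `site_capacity` counts cells per half column. The function name is hypothetical.

```python
from collections import Counter


def resolve_half_column_overflow(columns, max_clocks=12, site_capacity=60):
    """columns: list of half columns, each a list with one clock id per cell.

    Repeatedly evicts the clock with the fewest cells from an overflowing
    half column into the nearest half column that stays legal. Mutates and
    returns `columns`.
    """
    for i in range(len(columns)):
        while len(set(columns[i])) > max_clocks:
            counts = Counter(columns[i])
            # Fewest cells first; ties broken by name for determinism.
            victim = min(counts, key=lambda clk: (counts[clk], str(clk)))
            n = counts[victim]
            candidates = sorted((j for j in range(len(columns)) if j != i),
                                key=lambda j: abs(j - i))
            for j in candidates:
                fits = len(columns[j]) + n <= site_capacity
                legal = len(set(columns[j]) | {victim}) <= max_clocks
                if fits and legal:
                    columns[j].extend([victim] * n)
                    columns[i] = [c for c in columns[i] if c != victim]
                    break
            else:
                raise RuntimeError("no overflow-free half column can take the cells")
    return columns
```

A real legalizer would move cells individually to specific sites and measure displacement in site coordinates; the sketch only shows the selection rule (fewest cells first, nearest legal target).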
95 Summary: Background; Our Flow; How We Handle Clock Rules. Clock region: plan clock regions, then apply the plan to GP, LG, DP. Half column: resolve overflow after normal LG; forbid movement causing overflow in DP.
97 UTPlaceF 2.0: ISPD 2017 Clock-Aware FPGA Placement Contest. Wuxi Li, David Z. Pan. ECE Department, University of Texas at Austin
98 Team Introduction: Wuxi Li, Ph.D. student, UT-Austin; David Z. Pan, Professor, UT-Austin. UT Design Automation Lab
99 Outline: Original UTPlaceF Flow; Clock Constraints (Clock Region Constraint, Half Column Constraint); Clock Region Assignment; UTPlaceF 2.0 Flow
100 Original UTPlaceF Flow [Flowchart: Circuit Netlist. Wirelength-driven phase, Flat Initial Placement (FIP): Quadratic Programming + Rough Legalization until almost converged; Legalize DSP, RAM, I/O; Quadratic Programming + Rough Legalization until converged; FIP done. Routability-driven phase: Cell Inflation; Packing; Global Placement; Legalization; Detailed Placement; done]
101 Clock Region Constraint: the FPGA is divided into 5 by 8 clock regions; clock demand of each clock region <= 24.
102 Half Column Constraint: each clock region is divided into half column regions; clock demand of each half column region <= 12.
103 Clock Region Assignment Problem. Inputs: a roughly legalized placement. Outputs: a cell-to-clock-region assignment with minimized total cell movement, such that the capacity constraint is satisfied for each clock region and the clock demand is <= 24 for each clock region.
104 Problem Transformation
105 Algorithm Overview
106 Min-Cost-Max-Flow Based Assignment
107 UTPlaceF 2.0 Flow [Flowchart: Circuit Netlist. Wirelength-driven phase, Flat Initial Placement (FIP): Quadratic Programming + Rough Legalization until almost converged; Legalize DSP, RAM, I/O; Quadratic Programming + Clock Region Assign. + Rough Legalization until converged; FIP done. Routability & clock driven phase: Cell Inflation; Clock-Aware Packing; Clock Region Assign. + Global Placement; Clock Region Assign. + Half Column Assign. + Legalization; Clock-Aware Detailed Placement; done]
108 Thanks!
110 VDAplacer ISPD 2017 Contest Clock-Aware FPGA Placement Presenter: Chen Chen Advisor: Prof. Hung-Ming Chen Dept. of Electronic Engineering, National Chiao Tung University 2017/3/22 Department of Electronics Engineering, National Chiao Tung University VLSI Design Automation LAB 110
111 Outline. Problem Formulation: FPGA Packing Problem; Clock-Aware Heterogeneous Placement. Proposed Algorithm: Dynamic Packing with Physical Information; Global Placement; Placement Migration; Legalization and Detailed Placement.
113 FPGA Packing Problem. The FPGA packing problem is to cluster LUTs and FFs into groups to minimize the total number of blocks and block interconnections while satisfying the limitations of the FF control signals and the fracturable LUT constraints. A configurable logic block (CLB) contains 8 fracturable LUTs, 16 FFs, 2 clock inputs (CLK), 2 set/reset inputs (SR), and 4 clock enables (CE). The CEs are independent for {FF0, FF2, FF4, FF6}, {FF1, FF3, FF5, FF7}, {FF8, FF10, FF12, FF14}, and {FF9, FF11, FF13, FF15}. [Figure: a configurable logic block (CLB)]
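The control-set limits described above can be expressed as a small legality check. This is a sketch, not VDAplacer's code: the slide gives the four CE groups explicitly, while the split of the two CLK and two SR inputs between FF0-FF7 and FF8-FF15 is an assumption (one clock and one set/reset per half). The function name and the tuple layout are hypothetical.

```python
def clb_ffs_legal(ffs):
    """ffs: list of 16 slots, each None (empty) or a (clk, sr, ce) tuple.

    Returns True if the flip-flops respect the CLB control-set limits.
    """
    if len(ffs) != 16:
        raise ValueError("a CLB has exactly 16 FF slots")
    for start in (0, 8):  # ASSUMPTION: one CLK and one SR per half of the CLB
        half = [(i, ff) for i, ff in enumerate(ffs[start:start + 8], start=start)
                if ff is not None]
        if len({ff[0] for _, ff in half}) > 1:   # at most one clock per half
            return False
        if len({ff[1] for _, ff in half}) > 1:   # at most one set/reset per half
            return False
        for parity in (0, 1):                    # even and odd slots share a CE each
            if len({ff[2] for i, ff in half if i % 2 == parity}) > 1:
                return False
    return True
```

A packer would typically call such a check before adding a flip-flop to a partially filled CLB and reject the move if it returns False.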
114 FPGA Packing Problem. A fracturable LUT has three modes of operation: (1) as a single K-input LUT (K from 1 to 6); (2) as two 5-input (or fewer) LUTs with separate outputs but common inputs; (3) as two 3-input (or fewer) LUTs irrespective of common inputs. [Figure: modes (1), (2), (3)]
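The three modes translate into a simple pairing test for two LUTs that are candidates for the same fracturable LUT. This is a sketch of one reading of the slide: interpreting "common inputs" in mode (2) as "the union of both input sets has at most 5 signals" is an assumption, and the function name is hypothetical.

```python
def can_share_lut(inputs_a, inputs_b):
    """Return True if two LUTs may occupy one fracturable LUT.

    inputs_a, inputs_b: iterables of input signal names.
    """
    a, b = set(inputs_a), set(inputs_b)
    if len(a) > 6 or len(b) > 6:
        raise ValueError("a LUT has at most 6 inputs")
    # Mode (3): two LUTs with 3 or fewer inputs, regardless of sharing.
    if len(a) <= 3 and len(b) <= 3:
        return True
    # Mode (2): two LUTs with 5 or fewer inputs on common inputs.
    # ASSUMPTION: "common inputs" means the combined input set fits in 5 pins.
    if len(a) <= 5 and len(b) <= 5 and len(a | b) <= 5:
        return True
    # Otherwise each LUT needs a fracturable LUT of its own (mode (1)).
    return False
```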
115 Clock-Aware Heterogeneous Placement. The FPGA placement problem: given a heterogeneous FPGA and a circuit, determine the desired position for each movable block to minimize the routed wirelength such that each block is in its specified region without overlap among the blocks.
116 Clock-Aware Heterogeneous Placement: clock-aware placement constraints. The number of global clocks in each clock region is at most 24. Within each clock region, each half column has at most 12 clocks. Each clock should be constrained to a continuous rectangular area. [Figure: 5x8 clock regions; (14~18)x2 half columns]
118 Dynamic Packing with Physical Information. Apply the POLAR [1] framework. Increase the force of the anchor net in the initial placement stage and decrease it in the dynamic packing stage. Packing factors: number of clocks; number of control sets (C/R/CE); distance; number of common nets. [Flowchart: Initial Placement repeats these steps until the upper and lower bounds converge: solve the quadratic objective function using the B2B model and obtain the lower-bound HPWL placement using CG; obtain the upper-bound HPWL placement using Look-Ahead Legalization (LAL); density-aware global move; legalized locations serve as pseudo anchors added to the quadratic objective function. Dynamic Packing (annotated x5) runs the same placement steps followed by packing, until there is no more good packing. Then Global Placement]
[1] T. Lin, C. Chu, J. R. Shinnerl, I. Bustany, and I. Nedelchev. POLAR: Placement based on novel rough legalization and refinement. ICCAD '13.
119 Global Placement. Lower density around fixed nodes. HPWL-driven global placement: B2B wirelength model; lower-bound placement from solving the quadratic objective function; upper-bound placement from look-ahead legalization. Density-aware global move: move to the optimal region considering density and wirelength; move to a clock-valid location (after clock selection). Clock selection: 1. select an initial clock region for each clock; 2. expand each clock's area gradually, considering the number of uncovered nodes; 3. unpack CLBs that cannot find any valid location. [Flowchart: the lower-bound/upper-bound placement loop; while not converged, routing congestion estimation and congestion-driven packing; near convergence, Placement Migration]
120 Global Placement. Routing congestion estimation: apply NCTUgr for estimation. Congestion-driven packing: apply further packing in overlapped but routing-congestion-free areas; apply unpacking in routing-congested areas.
121 Placement Migration. To close the gap between global placement and legalization, modify the three-force balance system from Kraftwerk2 [2]. Hold force: preserves the integrity of the original placement result. Net force: models the wirelength of the netlist. Move force: perturbs the placement and smooths the transition from global placement to legalization. Obtain the move force by calculating the cell density gradient (the cells' surface model is obtained by Gaussian blurring); obtain a target step size for each cell. [Flowchart: Placement Migration repeats while there is density overflow, then Legalization & Detailed Placement]
[2] P. Spindler, U. Schlichtmann, and F. M. Johannes. Kraftwerk2: A fast force-directed quadratic placement approach using an accurate net model. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(8), Aug.
122 Legalization and Detailed Placement (1/2) Minimize displacement in legalization 1. Apply bipartite matching to each clock region for legalization 2. Select Clocks for every half column 3. Apply another bipartite matching to fit half column constraints. Legalization & Detailed Placement Legalization using bipartite matching Wirelength-driven detailed placement Placement Result 2017/3/22 Department of Electronics Engineering, National Chiao Tung University VLSI Design Automation LAB 122
123 Legalization and Detailed Placement (2/2) Detailed Placement Perform the Global Swap [3] to reduce the wirelength Identify a good swap pair or a space for each cell After swapping the cell would be in the position that gives the best wirelength while all other cells are treated as fixed Legalization & Detailed Placement Legalization using bipartite matching Wirelength-driven detailed placement Placement Result [3]: M. Pan, N. Viswanathan, and C. Chu. An efficient and effective detailed placement algorithm. In IEEE/ACM International Conference on Computer-Aided Design, pages 48 55, Nov /3/22 Department of Electronics Engineering, National Chiao Tung University VLSI Design Automation LAB 123
124 Thank you!
125 Benchmarking Results
126 Top-5 Results: Place/Route Completion
Designs     Placer-A  Placer-B  Placer-C  Placer-D  Placer-E
CLK-FPGA01  PASS      PASS      PASS      PASS      FAIL
CLK-FPGA02  PASS      PASS      PASS      PASS      PASS
CLK-FPGA03  PASS      PASS      PASS      PASS      FAIL
CLK-FPGA04  PASS      PASS      PASS      PASS      FAIL
CLK-FPGA05  PASS      PASS      PASS      PASS      FAIL
CLK-FPGA06  PASS      PASS      PASS      PASS      FAIL
CLK-FPGA07  PASS      PASS      PASS      PASS      PASS
CLK-FPGA08  PASS      PASS      PASS      PASS      PASS
CLK-FPGA09  PASS      PASS      PASS      PASS      PASS
CLK-FPGA10  PASS      PASS      PASS      PASS      FAIL
CLK-FPGA11  PASS      PASS      PASS      PASS      FAIL
CLK-FPGA12  PASS      PASS      PASS      PASS      PASS
CLK-FPGA13  PASS      PASS      PASS      PASS      PASS
127 Top-4 Placers: Total Routed Wirelength
[Table: one row per design (13 CLK-FPGA designs); columns Placer-A, Placer-B, Placer-C, Placer-D; numeric values missing from the transcription]
128 Total Routed Wirelength (Normalized)
[Table: one row per design (13 CLK-FPGA designs) plus Average; columns Placer-A, Placer-B, Placer-C, Placer-D; numeric values missing from the transcription]
129 Placer Runtime (seconds)
[Table: one row per design (13 CLK-FPGA designs); columns Fastest, 2nd, 3rd, 4th; numeric values missing from the transcription]
Less than 10 mins for the largest design!
130 Placer Runtime (Normalized)
[Table: one row per design (13 CLK-FPGA designs) plus Average; columns Fastest, 2nd-fastest, 3rd-fastest, 4th-fastest; numeric values missing from the transcription]
131 Final Results with Runtime Factor
[Table: one row per design (13 CLK-FPGA designs) plus Average; columns Placer-A, Placer-B, Placer-C; numeric values missing from the transcription]
132 Award Ceremony
133 Fifth Place goes to
134 5th place: GPlace 2.0: Clock-Aware Placement Tool for UltraScale FPGAs. Ziad Abuowaimer, Shawki Areibi, Anthony Vannelli, Gary Grewal. University of Guelph. March 22, 2017
135 Fourth Place goes to
136 4th place: VDAplacer, ISPD 2017 Contest Clock-Aware FPGA Placement. Presenter: Chen Chen. Advisor: Prof. Hung-Ming Chen. Dept. of Electronic Engineering, National Chiao Tung University, VLSI Design Automation LAB. 2017/3/22
137 Third Place goes to
138 3rd place (Fastest Placer): CUHK - RippleFPGA. Gengjie Chen, Chak-Wa Pui, Evangeline F. Y. Young, Bei Yu. March 22, 2017
139 Second Place goes to
140 2nd place: NTUfplace, Clock-Aware FPGA Placement. Yun-Chih Kuo, Chau-Chin Huang, Shih-Chun Chen, Chun-Han Chiang, Yao-Wen Chang, and Sy-Yen Kuo. National Taiwan University. Mar. 22, 2017
141 First Place goes to
142 1st place (two years in a row!): UTPlaceF 2.0, ISPD 2017 Clock-Aware FPGA Placement Contest. Wuxi Li, David Z. Pan. UT DA, ECE Department, University of Texas at Austin
143 Final Results with Runtime Factor
[Table: one row per design (13 CLK-FPGA designs) plus Average; columns UTPlaceF2.0, NTUfplace, RippleFPGA; numeric values missing from the transcription]
144 Congratulations!