DC Ultra. Concurrent Timing, Area, Power and Test Optimization. Overview

DATASHEET

DC Ultra
Concurrent Timing, Area, Power and Test Optimization

DC Ultra RTL synthesis solution enables users to meet today's design challenges with concurrent optimization of timing, area, power and test.

Overview

DC Ultra includes innovative topographical technology that enables a predictable flow resulting in faster time to results. Topographical technology provides timing and area prediction within 10% of the results seen post-layout, enabling designers to reduce costly iterations between synthesis and physical implementation. DC Ultra also includes a scalable infrastructure that delivers 2X faster runtime on quad-core platforms.

Key Benefits

- Concurrent optimization of timing, area, power and test
- Results correlate within 10% of physical implementation
- Removes timing bottlenecks by creating fast critical paths
- Gate-to-gate optimization for smaller area on new or legacy designs while maintaining timing Quality of Results (QoR)
- Cross-probing between RTL, schematic, and timing reports for fast debug
- Offers more flexibility for users to control optimization on specific areas of designs
- Enables higher efficiency with integrated static timing analysis, test synthesis and power synthesis
- Support for multi-voltage and multi-supply
- 2X faster runtime on quad-core compute servers

Figure 1: The industry's most comprehensive synthesis solution (diagram labels: DesignWare IP, DFTMAX, PrimeTime, DC Ultra, Correlated QoR, Power Compiler, Formality)

synopsys.com

Figure 2: Topographical technology in RTL synthesis (diagram labels: WLM, physical library, DC Ultra topographical technology; built for RTL designers; no need for wireload models; correlates to post-layout timing, area, and power; no change to synthesis use model)

Topographical Technology

Topographical technology delivers tight correlation to post-layout timing, area, test and power without the need for wireload models. It is designed for RTL designers and requires no physical design expertise or changes to the synthesis use model (Figure 2).

Prediction of layout timing and area in DC Ultra is achieved through the innovative topographical technology. It enables RTL designers to fix real design issues while still in synthesis and generate a better starting point for place and route, eliminating costly iterations. This significantly boosts RTL designers' productivity. Topographical technology shares technology with Galaxy implementation, minimizing iterations to speed up physical implementation.

Area Reduction Technologies

DC Ultra provides optimization technologies that monotonically reduce gate-to-gate area by an average of 10% while maintaining Quality of Results (QoR). These advanced optimizations operate on both new and legacy design netlists, with or without physical information and at all process nodes. Area reductions are achieved without re-synthesis and without affecting timing results for maximum productivity.

Cross-Probing

Cross-probing between the RTL source code and other design views such as schematic, timing reports and physical implementation gives designers the ability to quickly detect potential design issues and fix them at the source. Early visibility into potential design issues using multiple views accelerates the creation of high-quality RTL and constraints.

Figure 3: Cross-probing between RTL, schematic and timing view

Figure 4: Transformation of sum of products into a Carry Save Adder (CSA) tree (example: z <= a*b + c*d - e - f, with inputs a through f and output z; figure annotation: "Carry delay incurred 3x")

Advanced Arithmetic Optimization

For designs containing datapath, DC Ultra uses innovative datapath optimization algorithms to achieve better quality of results in terms of timing, area and power with fast runtimes. DC Ultra identifies arithmetic trees in your HDL and optimizes them using carry-save arithmetic techniques to minimize the performance and area impact of carry propagation (Figure 4). With DC Ultra, logic synthesis users can also take advantage of superior datapath synthesis capability to generate highly optimized implementations of DesignWare arithmetic components.

Figure 5: Through logic duplication, DC Ultra reduces the load on the critical path for significant timing improvements (before and after views; diagram labels: logic, critical, A, B)
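
The carry-save idea behind Figure 4 can be illustrated with a small software model. The sketch below is not DC Ultra's algorithm; it is a minimal Python illustration, assuming a 32-bit datapath (the `WIDTH` constant and all function names are hypothetical). It reduces the four terms of z <= a*b + c*d - e - f with 3:2 compressors, which produce a sum word and a carry word without propagating carries, so only the final addition needs a full carry-propagate adder. The two products are computed directly here for brevity.

```python
WIDTH = 32                      # assumed datapath width (illustrative only)
MASK = (1 << WIDTH) - 1


def csa(x, y, z):
    """3:2 compressor: returns (sum, carry) with x + y + z == sum + carry mod 2**WIDTH.

    Each output bit depends only on the three input bits in the same column,
    so no carry ripples across the word.
    """
    s = (x ^ y ^ z) & MASK
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK
    return s, c


def reduce_with_csa(operands):
    """Compress any number of operands down to two, then do one real addition."""
    ops = [v & MASK for v in operands]      # two's-complement encoding
    while len(ops) > 2:
        s, c = csa(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]
    return sum(ops) & MASK                  # the single carry-propagate add


def to_signed(v):
    """Interpret a WIDTH-bit word as a signed two's-complement value."""
    return v - (1 << WIDTH) if v >> (WIDTH - 1) else v


def sum_of_products(a, b, c, d, e, f):
    """Software model of z <= a*b + c*d - e - f using a CSA tree."""
    return to_signed(reduce_with_csa([a * b, c * d, -e, -f]))


if __name__ == "__main__":
    print(sum_of_products(3, 4, 5, 6, 7, 8))
```

With four terms, the model uses two compressor levels followed by one carry-propagate addition, instead of three chained carry-propagate operations.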

Powerful Critical Path Synthesis

DC Ultra employs various optimization algorithms throughout the synthesis process to deliver ultra-fast critical path timing. For example, immediately after the initial technology mapping, the design is not yet subjected to detailed gate-level optimization techniques. At this stage, DC Ultra performs aggressive timing-driven restructuring, mapping and gate-level optimization. As a result, the subsequent detailed gate-level optimizations benefit from a better overall timing-based structure.

Throughout gate-level optimization, additional strategies are applied to improve the delay of the critical paths in the design. One of the techniques is aggressive logic duplication to reduce the load seen by the critical path (Figure 5). DC Ultra looks at a larger subsection of the critical path during logic duplication and can replicate many gates to reduce the load of high fan-out nets, improving timing on critical paths through load isolation. DC Ultra will also automatically ungroup parts of the design on the critical path to achieve better area and timing. It can also buffer high fan-out nets to improve total negative slack.

The DC Ultra mapping algorithms also attempt to map groups of cells on critical timing paths to wide fan-in library cells, which can reduce the number of logic levels and cell instances. Thus, timing, area, and power are improved.

Figure 6: Retiming designs with registers (panels: existing circuit and circuit after retiming, each from inputs to outputs; annotated clock period = 10)

Figure 7: Retiming on combinational logic (panels: existing circuit and circuit after retiming, each from inputs to outputs; annotated 10 ns)

Register Retiming

Register retiming further improves QoR. For designs that already contain registers, it optimizes sequential logic by moving registers through logic boundaries to optimize timing with minimum area impact (Figure 6). The same functionality is preserved at I/O boundaries.
Register retiming can also insert pipeline registers into purely combinational circuits to meet performance requirements as well as reduce area (Figure 7). Register retiming can be used along with datapath optimization algorithms to get the fastest pipelines.
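
To make the effect of moving registers concrete, the following is a toy Python model rather than the retiming algorithm used by DC Ultra. It assumes a purely linear pipeline in which each stage is a list of gate delays and the achievable clock period is the slowest stage. All names and delay values are hypothetical. A register between two stages can be moved backward or forward across one gate, which shifts that gate's delay to the neighboring stage; a simple greedy loop keeps any move that lowers the period.

```python
def min_period(stages):
    """Clock period is bounded by the slowest register-to-register stage."""
    return max(sum(gates) for gates in stages)


def move_register(stages, i, direction):
    """Move the register between stage i and stage i+1 across one gate.

    "backward": the last gate of stage i ends up after the register (in stage i+1).
    "forward":  the first gate of stage i+1 ends up before the register (in stage i).
    The gate sequence, and therefore the logic function, is unchanged.
    """
    new = [list(s) for s in stages]
    if direction == "backward" and new[i]:
        new[i + 1].insert(0, new[i].pop())
    elif direction == "forward" and new[i + 1]:
        new[i].append(new[i + 1].pop(0))
    return new


def greedy_retime(stages):
    """Repeatedly apply the single register move that most reduces the period."""
    best = [list(s) for s in stages]
    while True:
        candidates = [
            move_register(best, i, d)
            for i in range(len(best) - 1)
            for d in ("backward", "forward")
        ]
        better = [c for c in candidates if min_period(c) < min_period(best)]
        if not better:
            return best
        best = min(better, key=min_period)


if __name__ == "__main__":
    pipeline = [[4.0, 5.0, 3.0], [2.0, 4.0]]   # hypothetical gate delays in ns
    retimed = greedy_retime(pipeline)
    print(min_period(pipeline), "->", min_period(retimed))
```

In this example the unbalanced stages (12.0 and 6.0) become two stages of 9.0 each, with the same gates in the same order and the same number of registers. Real designs are general graphs with fan-out and feedback, where retiming is considerably more involved than this greedy sketch.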

Figure 8: Synthesis runtime (bar chart titled "2X faster runtime on 4 cores"; series: single core runtime and multicore runtime; X-axis: designs labeled 350k, 368k, 473k, 650k, 838k, 880k, 1.2m and 2.7m; Y-axis: runtime in hours)

Better Control of Synthesis Cost-Function Priorities and Optimization Steps

DC Ultra provides finer control over optimization to meet aggressive timing requirements. DC Ultra has a default cost function that prioritizes design rule requirements over timing and area constraints. By setting the appropriate priority, designers can drive synthesis to achieve the best QoR for a design.

Compile directives in DC Ultra can be used to further control optimization. The compile directives allow the designer to change DC Ultra's standard behavior. For example, a designer may have a particular structure in mind and have instantiated the cells in the path. Although the overall structure should not change, it may be desirable for Design Compiler to perform sizing and local optimization for better timing. For this set of optimizations, the global structuring of the logic can be disabled while enabling gate sizing.

Infrastructure for Multicore

The advent of multicore processors in computer platforms has boosted the processing power available to designers. DC Ultra includes a scalable infrastructure to take advantage of multicore compute servers. Using an optimized scheme of distributed and multithreaded parallelization, DC Ultra delivers a 2X improvement in runtimes on quad-core platforms. The infrastructure delivers runtime benefits without deviating from the quality of results.

Figure 8 compares DC Ultra runtimes across multiple designs on single-core vs. quad-core machines. The X-axis shows the designs and the Y-axis shows the runtimes in hours. The blue bars represent DC Ultra runtimes on a single-core machine and the purple bars represent runtimes on quad-core machines for the same design. As seen in the figure, DC Ultra is, on average, 2X faster on quad-core compute servers.
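
One way to read the 2X-on-four-cores figure is through Amdahl's law. The datasheet does not state a scaling model, so the Python sketch below is only a back-of-the-envelope illustration; it assumes the runtime splits into a serial part and a perfectly parallel part, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallelizable fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def parallel_fraction_for(speedup, cores):
    """Invert Amdahl's law: fraction of runtime that must be parallel
    to observe `speedup` on `cores` cores."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    p = parallel_fraction_for(2.0, 4)
    print(f"parallel fraction implied by 2X on 4 cores: {p:.3f}")
    print(f"check: speedup on 4 cores = {amdahl_speedup(p, 4):.2f}")
```

Under this assumed model, a 2X average speedup on four cores corresponds to roughly two thirds of the single-core runtime being parallelized.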

Netlist Formats and Interfaces

DC Ultra supports all popular industry standard formats:

- Circuit Netlist: Verilog, SystemVerilog, and VHDL
- Command Script: dcsh, Tcl
- Interfaces: PLI, SDF, PDEF, SDC

Platforms:

- IBM AIX (32-/64-bit)
- Red Hat Linux (32-/64-bit)
- Sun Solaris (32-/64-bit)

Summary

DC Ultra includes comprehensive algorithms to optimize concurrently for timing, area, power and test. The topographical technology in DC Ultra ensures that results correlate to layout, eliminating costly iterations between synthesis and physical implementation. Optimization technologies that reduce gate-to-gate area by an average of 10% while maintaining timing Quality of Results (QoR) operate on both new and legacy design netlists. RTL cross-probing with multiple design views accelerates creation of high-quality RTL and constraints.

For more information about Synopsys products, support services or training, visit us on the web at www.synopsys.com, contact your local sales representative or call 650.584.5000.

© 2018 Synopsys, Inc. All rights reserved. Synopsys is a trademark of Synopsys, Inc. in the United States and other countries. A list of Synopsys trademarks is available at synopsys.com/copyright.html. All other names mentioned herein are trademarks or registered trademarks of their respective owners. 01/31/18.CS12320_DC Ultra DS.