An Algorithm to Silicon ESL Design Methodology
Nitin Chawla, Harvinder Singh & Pascal Urard
STMicroelectronics

SOC Design Challenges: Increased Complexity
[Table: one column per two-year step from 1992 to 2010, with technology nodes 0.7 µm, 0.5 µm, 0.35 µm, 0.25 µm, 0.18 µm, 0.13 µm, 90 nm, 65 nm, 45 nm and 32 nm. Rows: gate density; #Gates / Die (conservative numbers); #Gates per Designer per year; Men / Years per Die.]
Need to improve design productivity.
DAC 2009: User Track, Nitin Chawla, STMicroelectronics, 2009

SOC Design Challenges: System to Implementation Barrier
[Figure credited to J. Gerlach. Surviving label: "5. Understand specification".]

Need an ESL Vision: Extending High Level Synthesis
Single digital models for algorithmic and architecture exploration.
Executable specs. RF/analog coarse model. Chain analysis.
Encapsulate C/C++ IPs.
Refine spec.
C chain: Stream -> block1 -> block2 -> block3 -> block4 -> block5 -> Output
HLS and formal proof link the C chain to the RTL chain.
RTL chain: Stream -> block1 -> block2 -> block3 -> block4 -> block5 -> Output

Design Space Exploration: Beyond Standalone HLS
[Figure: area/power/parameters plotted over the design space, with the best manual solution marked.]
Raising the level of productivity: algorithmic + architectural design space exploration.
Current HLS allows better exploration but is limited to local minima.

ESL Design Flow: Step 1
[Flow diagram repeated from the "Need an ESL Vision" slide.]

Template Class for N-Stage DIF Streaming FFT
[Figure: the serial input feeds a chain of N stages, from the N-th stage down to the 1st stage, ending at the serial output. Each stage has a feedback buffer, a subtract path and a twiddle multiply (W).]
Buffer depth v per stage: N-th stage v = 2^(N-1); (N-1)-th stage v = 2^(N-2); n-th stage v = 2^(n-1).
N objects of the stage class.
Stage class template parameters: <stage number N>, <stage input precision>, <stage computation precision>, <stage output precision>.
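The stage-class idea can be sketched in plain C++. This is a minimal sketch under stated assumptions, not the authors' template: the names Stage, StreamingFft and fft_block are hypothetical, a single type parameter T stands in for the three per-stage precision parameters on the slide, and std::complex replaces whatever fixed-point and channel types the original used. Each stage keeps a feedback buffer of v = 2^(S-1) samples; the chain has a latency of 2^N - 1 samples and emits results in bit-reversed order.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// One radix-2 DIF stage with a feedback buffer of v = 2^(S-1) samples.
// T is an assumed stand-in for the slide's per-stage precision parameters.
template <int S, typename T = double>
class Stage {
    static constexpr std::size_t v = std::size_t{1} << (S - 1);
    std::array<std::complex<T>, v> buf_{};
    std::size_t count_ = 0;

public:
    // Consumes one sample and produces one sample (delayed by v samples).
    std::complex<T> step(std::complex<T> in) {
        const std::size_t phase = count_ % (2 * v);
        count_ = phase + 1;
        std::complex<T> out;
        if (phase < v) {
            // First half of a block: emit the stored difference, keep the input.
            out = buf_[phase];
            buf_[phase] = in;
        } else {
            // Second half: butterfly with the buffered sample.
            const std::size_t k = phase - v;
            const std::complex<T> a = buf_[k];
            const T angle = -std::acos(T(-1)) * T(k) / T(v);  // W_{2v}^k
            out = a + in;
            buf_[k] = (a - in) * std::polar(T(1), angle);
        }
        return out;
    }
};

// N stage objects chained from stage N down to stage 1.
template <int N, typename T = double>
class StreamingFft {
    Stage<N, T> head_;
    StreamingFft<N - 1, T> tail_;

public:
    std::complex<T> step(std::complex<T> in) { return tail_.step(head_.step(in)); }
};

template <typename T>
class StreamingFft<0, T> {
public:
    std::complex<T> step(std::complex<T> in) { return in; }
};

// Test helper: streams one block of 2^N samples, flushes the pipeline with
// zeros, and returns the spectrum reordered from bit-reversed to natural order.
template <int N>
std::vector<std::complex<double>> fft_block(const std::vector<std::complex<double>>& x) {
    constexpr std::size_t n = std::size_t{1} << N;
    StreamingFft<N> fft;
    std::vector<std::complex<double>> X(n);
    for (std::size_t t = 0; t < 2 * n - 1; ++t) {
        const std::complex<double> in =
            (t < n && t < x.size()) ? x[t] : std::complex<double>(0.0, 0.0);
        const std::complex<double> out = fft.step(in);
        if (t >= n - 1) {  // pipeline latency is 2^N - 1 samples
            const std::size_t p = t - (n - 1);
            std::size_t k = 0;
            for (int b = 0; b < N; ++b) k |= ((p >> b) & 1u) << (N - 1 - b);
            X[k] = out;
        }
    }
    return X;
}
```

Calling fft_block<3> on an 8-sample vector returns its 8-point DFT. Changing the template argument changes the number of stage objects, which is the scalability the slide's template class is after.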

ESL Design Flow: Step 2
[Flow diagram repeated from the "Need an ESL Vision" slide.]

Model Based Design
System Model creates the Executable Specification.
System Model is the center of the development process and enables:
  Specification capture.
  Block-level partitioning and assembly.
  Continuous test and verification.
  Block-level reuse.
  Common design environment.
Examples: Simulink (MathWorks), ADS/SystemVue (Agilent).

Single Source Model Based Design: Simulink S-function Encapsulation
HLS C++: void block(ac_channel<type_a> &input, ac_channel<type_b> &output)
Interface wrapper using MATLAB-supported native datatypes: void block_wrapper(double input[n], double output[n])
MATLAB environment, S-function structure definition: legacy_code('sfcn_cmex_generate', def); legacy_code('compile', def);
Simulink environment: Simulink source -> sfn_block (S-function Simulink block) -> Simulink sink
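The wrapper pattern on this slide can be illustrated as follows. It is a sketch under assumptions rather than the deck's code: Channel is a small FIFO standing in for ac_channel, type_a and type_b are hypothetical integer types carrying 8 fractional bits, the body of block is an invented gain-of-two, and the wrapper takes an explicit length n instead of the fixed-size arrays shown on the slide.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <deque>

// Minimal FIFO standing in for ac_channel<T> (assumption: only read, write
// and size are modelled here).
template <typename T>
class Channel {
    std::deque<T> fifo_;

public:
    void write(const T& value) { fifo_.push_back(value); }
    T read() {
        T value = fifo_.front();
        fifo_.pop_front();
        return value;
    }
    std::size_t size() const { return fifo_.size(); }
};

using type_a = std::int16_t;    // hypothetical input type, 8 fractional bits
using type_b = std::int32_t;    // hypothetical output type, 8 fractional bits
constexpr double kScale = 256.0;  // 2^8, matches the assumed fractional bits

// Hypothetical algorithm block: doubles every sample it finds on its input.
void block(Channel<type_a>& input, Channel<type_b>& output) {
    while (input.size() > 0) {
        output.write(2 * static_cast<type_b>(input.read()));
    }
}

// Wrapper with a native-double interface: quantize, run the block, convert back.
void block_wrapper(const double* input, double* output, std::size_t n) {
    Channel<type_a> in;
    Channel<type_b> out;
    for (std::size_t i = 0; i < n; ++i) {
        in.write(static_cast<type_a>(std::lround(input[i] * kScale)));
    }
    block(in, out);
    for (std::size_t i = 0; i < n; ++i) {
        output[i] = static_cast<double>(out.read()) / kScale;
    }
}
```

The point of the pattern is that block stays the single source for both HLS and system simulation; only the thin wrapper knows about doubles.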

Numerical Refinement for Noise Budgets: SQNR vs I/P Signal PAPR for an FFT
No noise modulation based on I/P signal PAPR.
Sharp fall in SQNR due to clipping for I/P PAPR < 6 dB.
PAPR: Peak to Average Power Ratio.
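The two quantities on this slide can be computed with a few lines of C++. The sketch below is illustrative only: the function names and the saturating fixed-point quantizer are assumptions, not the measurement setup behind the slide's plot. Inputs must be non-empty, and sqnr_db returns infinity when quantization is exact.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Peak to Average Power Ratio in dB of a real-valued, non-empty signal.
double papr_db(const std::vector<double>& x) {
    double peak = 0.0;
    double sum = 0.0;
    for (double s : x) {
        peak = std::max(peak, s * s);
        sum += s * s;
    }
    return 10.0 * std::log10(peak / (sum / static_cast<double>(x.size())));
}

// Round to a signed fixed-point grid with frac_bits fractional bits and
// saturate to total_bits (hypothetical quantizer used for illustration).
double quantize(double x, int frac_bits, int total_bits) {
    const double scale = std::ldexp(1.0, frac_bits);
    const double max_code = std::ldexp(1.0, total_bits - 1) - 1.0;
    const double min_code = -std::ldexp(1.0, total_bits - 1);
    const double code = std::min(max_code, std::max(min_code, std::round(x * scale)));
    return code / scale;
}

// Signal to Quantization Noise Ratio in dB after quantizing the reference.
double sqnr_db(const std::vector<double>& ref, int frac_bits, int total_bits) {
    double signal = 0.0;
    double noise = 0.0;
    for (double s : ref) {
        const double err = s - quantize(s, frac_bits, total_bits);
        signal += s * s;
        noise += err * err;
    }
    return 10.0 * std::log10(signal / noise);
}
```

With 4 fractional bits in an 8-bit word, a signal that stays inside the representable range keeps its SQNR above 30 dB in this model, while one sample that overshoots the range is saturated and pulls the SQNR down sharply, which is the clipping effect the slide refers to.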

ESL Design Flow: Step 3
[Flow diagram repeated from the "Need an ESL Vision" slide.]

HLS Explorations: Area, Throughput Tradeoff
1 sample / n cycles: II = n, Area ≈ a/n
1 sample / cycle: II = 1, Area = a
n samples / cycle: II = 1, unroll n (main loop)
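The three design points above follow a simple first-order model, sketched here in C++. The DesignPoint struct and the function names are hypothetical. The linear area model generalizes the slide's a/n and a figures; extending it to the unrolled case (area of about a times n) is an assumption, since the slide does not state that area.

```cpp
#include <cassert>

// A design point: loop unroll factor and initiation interval (II) in cycles.
struct DesignPoint {
    int unroll;
    int ii;
};

// Samples accepted per clock cycle.
constexpr double throughput(DesignPoint p) {
    return static_cast<double>(p.unroll) / p.ii;
}

// First-order area estimate, where a is the area of the II = 1, no-unroll design.
// Assumption: area scales linearly with throughput.
constexpr double area(DesignPoint p, double a) {
    return a * p.unroll / p.ii;
}

// Cycles needed to push a given number of samples through the main loop.
constexpr long cycles(DesignPoint p, long samples) {
    return (samples * p.ii + p.unroll - 1) / p.unroll;
}
```

For example, with a = 100 area units, the point {unroll 1, II 4} gives 0.25 samples per cycle at about 25 units, while {unroll 4, II 1} gives 4 samples per cycle at about 400 units under this model.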

Unfolded Architecture: Stage Implementation
Parallelism achieved by loop unrolling.
[Figure: M parallel inputs -> butterfly -> M parallel outputs, with a multi-banked buffer of M x (2^(n-1)/M).]
n-th stage of radix-2 SDF FFT unfolded by M.
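One way to picture the multi-banked buffer is a cyclic mapping of sample indices onto M banks, so that M consecutive samples can be accessed in the same cycle. The slide does not say which mapping the design uses; the interleaving below, along with the names BankAddr, bank_depth and map_sample, is an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Location of one sample inside the banked buffer.
struct BankAddr {
    std::size_t bank;
    std::size_t row;
};

// Depth of each bank for the n-th stage unfolded by m: 2^(n-1) / m.
constexpr std::size_t bank_depth(int n, std::size_t m) {
    return (std::size_t{1} << (n - 1)) / m;
}

// Assumed cyclic interleaving: consecutive samples go to consecutive banks.
constexpr BankAddr map_sample(std::size_t i, std::size_t m) {
    return BankAddr{i % m, i / m};
}
```

With n = 5 and M = 4, the 16-sample stage buffer becomes 4 banks of depth 4, and samples 4 to 7 land in banks 0 to 3 on the same row, so they can be read together.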

Multimillion Gate GS/s Frequency Domain Processor
[Block diagram. Blocks: Interleaver, FFT, Filter, Channel Shifter, Shifter, IFFT, Add block. Control inputs: frequency mask, channel shift value. Bus widths are labelled N and N/2.]
Unfolded by (4) systolic 2048-pt FFT/IFFT.
Interblock FIFO communication.

Physical Prototyping at ESL Level
In ASIC technologies of 65 nm and below, path delays are wire dominated.
Most signal processing applications use a lot of compiler-generated memory cuts.
Memory architecture choices at the ESL level are made simply on the basis of bandwidth and ports.
But memory cuts create routing blockages and wire detours.
In the end it is the silicon area post P&R that matters.

Physical Prototyping: Memory Architecture Exploration
4 RAMs (width 4X): skewed aspect ratio, narrow and deep routing channels, huge routing congestion.
(Width 4X) replaced with 4 (width X) RAMs: creation of new routing channels.
Memory area increases.
Core utilization improves by 3%.
Post-P&R area improves by 2%.

Design Productivity vs Manual RTL
[Chart: design productivity relative to manual RTL.]
Behavioral IP reuse further improves design productivity.

Conclusion
ESL synthesis can successfully build production-worthy, multi-million-gate complex application engines from untimed C/C++ algorithmic models.
Key benefits of ESL synthesis:
  Increased design productivity and faster time to market.
  Flexibility and scalability to try alternative architectures.
  Better QoR vs hand-coded design due to enhanced design space exploration.