Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow

Bradley R. Quinton*, Mark R. Greenstreet, Steven J.E. Wilton*
*Dept. of Electrical and Computer Engineering, Dept. of Computer Science
University of British Columbia, Vancouver, Canada

Motivation: General
Wire delay is increasing with respect to gate delay.
This can make inter-block interconnect the bottleneck to overall IC performance.
What is the best way to manage this problem?

Motivation: Specific
Sharing a single physical resource amongst many parts of the design requires a network that spans the entire die.
[Figure: multiplexed bus spanning the entire chip]

Past Work: Synchronous
Algorithms have been proposed to find the optimal repeater and register locations for synchronous interconnect.
However, these algorithms assume that a low-skew clock is available at any location on the die.
Creating this clock is difficult:
  on-die process variation
  power supply noise
  clock jitter
  placement blockages

Past Work: Asynchronous
Asynchronous design techniques provide a potential solution since they do not require a global clock.
However, the techniques proposed thus far require custom-designed circuits and manual design optimization.
This makes these techniques difficult to compare with synchronous techniques, and infeasible for many ASIC and SoC designs.

Goals of this Work
1) Develop an asynchronous design that is feasible using regular standard cells and off-the-shelf CAD tools.
2) Compare synchronous and asynchronous interconnect networks in terms of throughput, area, power and latency for a range of designs.

Asynchronous Interconnect Basic Structure
By coordinating transfers between the source and destination, asynchronous techniques avoid the requirement of a global clock.

Data Formats
Two broad categories:
1) Bundled-data: control signaling is separate from the data; requires delay-matching*
2) Delay-insensitive: control signaling is encoded with the data; no delay-matching* required
* Arbitrary delay-matching is not supported by most design tools.

Handshaking
Two commonly used handshaking protocols:
1) 2-phase: control signal transitions mark data transfers
2) 4-phase: control signal values mark data transfers
* Detecting transitions is harder than detecting values, but 4-phase requires more traversals of the interconnect.
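The difference in interconnect traversals between the two handshaking protocols can be illustrated with a small Python sketch. It is a minimal model and not taken from the slides: the signal names `req` and `ack` and the function names are assumptions, and each wire transition is counted as one traversal of the interconnect.

```python
def two_phase(transfers):
    """Return the (signal, level) events for 2-phase (transition) signaling."""
    req = ack = 0
    events = []
    for _ in range(transfers):
        req ^= 1                      # any transition on req marks new data
        events.append(("req", req))
        ack ^= 1                      # any transition on ack marks acceptance
        events.append(("ack", ack))
    return events


def four_phase(transfers):
    """Return the (signal, level) events for 4-phase (return-to-zero) signaling."""
    events = []
    for _ in range(transfers):
        events += [("req", 1), ("ack", 1),   # data valid, data accepted
                   ("req", 0), ("ack", 0)]   # both wires return to zero
    return events


if __name__ == "__main__":
    n = 3
    print("2-phase traversals:", len(two_phase(n)))
    print("4-phase traversals:", len(four_phase(n)))
```

In this model a 2-phase transfer costs two wire transitions and a 4-phase transfer costs four, which is the trade-off the footnote refers to.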

CAD Tool / IP Considerations
CAD tool limitations from the perspective of asynchronous interconnect design:
  delay-matching
  automated glitch avoidance
  inference from combinational loops
  path-based delay optimization
  automatic insertion of sequential cells*
  non-optimal sequential cells
* This is significant since it restricts asynchronous pipelines to occur only at network nodes.

Basic Design - Data Encoding
Many data encodings are possible for delay-insensitive circuits.
We choose dual-rail encoding to minimize the depth of the control decode.
Dual-rail encodings allow bit transitions to be detected with a simple XOR gate.
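One way to see why a single XOR gate suffices is level-encoded dual-rail (LEDR), a well-known 2-phase dual-rail code. The slides do not say which dual-rail variant is used, so LEDR is an assumption chosen for illustration, as are the function names below. Each bit uses a value rail and a parity rail; the XOR of the two rails gives the phase, and the phase flips on every new data item, so exactly one rail toggles per transfer.

```python
def ledr_encode(bit, phase):
    """Encode one bit as (value_rail, parity_rail) for the given phase (0 or 1)."""
    return bit, bit ^ phase


def ledr_phase(value_rail, parity_rail):
    """Recover the phase with a single XOR of the two rails."""
    return value_rail ^ parity_rail


def send(bits):
    """Encode a bit sequence, alternating the phase on every item."""
    phase = 0                      # assumption: rails start at (0, 0), phase 0
    words = []
    for bit in bits:
        phase ^= 1                 # each new item flips the phase
        words.append(ledr_encode(bit, phase))
    return words


if __name__ == "__main__":
    previous = (0, 0)
    for word in send([1, 1, 0, 0]):
        toggled = sum(a != b for a, b in zip(previous, word))
        print(word, "phase", ledr_phase(*word), "rails toggled:", toggled)
        previous = word
```

A receiver in this sketch only has to compare the XOR of the two rails against the phase it last saw to know that a new bit has arrived.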

Basic Design - Sequential Gates
We use a flip-flop based design to conform to standard IP and CAD tools.
2 flops/bit are required because the data is encoded.

Basic Design - Clock Generation
Clock generation must be done carefully in a flop-based design to avoid glitches.
A clock edge is generated if:
1) the code at the next stage equals the code at the current stage, and
2) the incoming code is different from the current code.
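The two conditions can be written as a predicate over the codes held by neighbouring stages. The sketch below is a behavioural model in Python, not the authors' circuit: `clock_edge`, `step`, and the use of integers as stand-ins for dual-rail code words are assumptions, the sink is assumed to be always ready, and gate-level timing and glitch behaviour are not modelled.

```python
def clock_edge(incoming, current, next_stage):
    """True when a stage should capture the incoming code."""
    downstream_done = next_stage == current   # condition 1: next stage holds our code
    new_data = incoming != current            # condition 2: incoming code has changed
    return downstream_done and new_data


def step(stages, source):
    """Advance every stage once, evaluating all conditions on the old state."""
    updated = list(stages)
    for i, current in enumerate(stages):
        incoming = source if i == 0 else stages[i - 1]
        # Assumption: the sink is always ready, so the last stage sees its own code.
        next_stage = current if i == len(stages) - 1 else stages[i + 1]
        if clock_edge(incoming, current, next_stage):
            updated[i] = incoming
    return updated


if __name__ == "__main__":
    stages = [0, 0, 0]
    for _ in range(3):
        stages = step(stages, source=1)
        print(stages)
```

In this model a new code ripples forward one stage per step, and a stage whose successor has not yet captured its code holds its value even when new data is waiting.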

Additional Optimization
To further increase the throughput of the design, we pre-calculate the acknowledgement signal.

Automatic Delay Optimization
CAD tools are designed to optimize delay based on paths between sequential elements.
This is possible in our design; however, it is necessary to explicitly define a large number of paths/clocks.
To avoid this, we made a circuit modification before delay optimization and corrected it before routing.

Automatic Delay Optimization
The circuit modification creates a virtual global clock to allow the repeater insertion tool to optimize the correct paths.
Enabling this automatic repeater insertion had a significant performance impact on the design. For the experiments on the largest die size:
  8856 cells were resized
  232 cells were inserted
  the path delay improved by 12.46 ns

Synchronous Interconnect Clock Constraints
Register pipelining was used for the synchronous design.
Registers are restricted to occur at network nodes.
The clock was modeled with 100 ps of clock uncertainty (jitter) and 100 ps of skew.

Experimental Framework
Target ICs
We created 9 ICs based on the TSMC 0.18 µm process.
3 core die sizes:
  3830 x 3830 µm (~1 million gates)
  8560 x 8560 µm (~5 million gates)
  12090 x 12090 µm (~10 million gates)
3 different block partitions:
  16 blocks
  64 blocks
  256 blocks
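The 9 target ICs are the cross product of the three die sizes and the three block partitions. A short Python sketch enumerates them. The per-block edge length it prints assumes the blocks are equal squares tiling the core in a grid, which the slides do not state, and the names used are illustrative only.

```python
from itertools import product
from math import isqrt

DIE_EDGES_UM = [3830, 8560, 12090]      # core die edge lengths from the slide
BLOCK_COUNTS = [16, 64, 256]            # block partitions from the slide


def configurations():
    """Return (die_edge_um, blocks, block_edge_um) for every target IC."""
    configs = []
    for die_edge, blocks in product(DIE_EDGES_UM, BLOCK_COUNTS):
        per_side = isqrt(blocks)        # assumption: square grid of equal blocks
        configs.append((die_edge, blocks, die_edge / per_side))
    return configs


if __name__ == "__main__":
    for die_edge, blocks, block_edge in configurations():
        print(f"{die_edge} um die, {blocks} blocks: {block_edge:.1f} um per block edge")
```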

[Figure: Block / Network Placement]

CAD Tool Flow
Completely automated design flow:
  Library: Artisan SAGE-X 0.18 µm
  Synthesis: Synopsys Design Compiler
  Simulation: Cadence Verilog-XL
  Place and route: Cadence SoC Encounter
  Static timing: Synopsys PrimeTime*
  Power: Synopsys PrimePower*
* Results measured from detailed, placed and routed designs.

Results
[Figure: Throughput - No Global Clock]
[Figure: Power - 350 MHz]
[Figure: Latency - 350 MHz]
[Figure: Area - 350 MHz]

Conclusion
It is feasible to implement an asynchronous interconnect network using standard cells and CAD tools.
For large, high-speed ICs, it is possible to achieve high throughput with asynchronous interconnect while avoiding a global clock for pipeline registers.
Asynchronous interconnect offers similar power but significantly higher area than synchronous alternatives.

Future Work
Use a 90 nm process, where a more significant difference between gate and wire delay is expected.
Investigate the effect of enhancing the placement tool to allow automatic insertion of asynchronous pipelines.
Create a new sequential standard cell for asynchronous pipelining.

End