Bubble Razor: An Architecture-Independent Approach to Timing-Error Detection and Correction

Matthew Fojtik, David Fick, Yejoong Kim, Nathaniel Pinckney, David Harris, David Blaauw, Dennis Sylvester
mfojtik@umich.edu
Electrical Engineering & Computer Science Department
The University of Michigan, Ann Arbor

Slide 2: Outline
- Issues with Prior Razor
- Bubble Razor Algorithm
- Circuitry and Implementation
- Area Overhead Tradeoffs
- Test Chip Results

Slide 3: Timing Margins
Margins for uncertainty:
- Process variation
- Temperature variation
- Voltage variation
- Aging effects
Associated costs:
- Lost performance
- Lost energy
- Tester time (tradeoff)
[Figure labels: clock; actual circuit delay; margin segments Data, Aging, Temperature, Process, Voltage; lost performance/energy]

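Slide 4: Eliminating Margins
- Always correct: tables, canaries
- Detect and correct: Razor style
[Figure labels: main DFF (D, CLK, Q); shadow latch (DCLK); Error. S. Das et al. [VLSI 2005]]
Technique comparison table. Column headers as extracted: Process, Ambient, Data; Global, Local, Global, Local; Slow, Fast, Slow, Fast. The column positions of the marks were lost in extraction:
- Table Lookup: X X
- Table & Sensors: X X X
- Canary Circuit: X X
- Razor Designs: X X X X X X X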

Slide 5: Speculation Window and Hold Time
- The speculation window is linked to the minimum delay constraint (hold time)
[Figure labels: DFF A, DFF B, CLK A, CLK B, Speculation Window]
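The link between the speculation window and hold time can be made concrete with a small check. This is only a sketch under assumptions: the constraint form (the shortest path delay must cover the speculation window plus the hold time), the function names, and the picosecond values are illustrative and do not come from the slides.

```python
def min_delay_slack_ps(min_path_delay_ps, speculation_window_ps, hold_time_ps=0.0):
    """Slack of the shortest path against the extended hold requirement.

    Assumed model: in a flip-flop based Razor design the shadow element
    samples one speculation window after the main clock edge, so new data
    launched at that edge must not arrive before the window has ended.
    """
    return min_path_delay_ps - (speculation_window_ps + hold_time_ps)


def padding_needed_ps(min_path_delay_ps, speculation_window_ps, hold_time_ps=0.0):
    """Delay (for example hold buffers) to add so the slack is non-negative."""
    slack = min_delay_slack_ps(min_path_delay_ps, speculation_window_ps, hold_time_ps)
    return max(0.0, -slack)


# Hypothetical numbers: a 120 ps short path, a 300 ps window, a 20 ps hold time.
print(padding_needed_ps(120.0, 300.0, 20.0))  # 200.0
```

Under this model a wider speculation window directly raises the amount of short-path padding, which is the cost the slide points at.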

Slide 6: Architectural Invasiveness
- Razor I style (S. Das et al. [VLSI 2005]): pipeline IF, ID, EX, MEM, WB; all flops reload previous values
- Razor II style (D. Blaauw et al. [ISSCC 2008]; K. Bowman et al. [ISSCC 2008]): pipeline IF, ID, EX, MEM, CHK, WB; check stage and architectural replay
- Requires designer effort: RTL written with Razor in mind

Slide 7: Fundamentals of Bubble Razor
- Two-phase latch timing
  - Automatically convert a flip-flop based design
- Time borrowing as correction mechanism
  - Does not modify the design architecture
  - Does not require reloading / replaying instructions
- Local correction (bubbles)
  - Breaks the requirement of stalling the entire chip at once

Slide 8: Two-Phase Latch Razor Timing
- Larger speculation window
- Minimum delay constraint is the same as in a conventional design
[Figure labels: LD A, LD B, CLK A, CLK B]

Slide 9: Time Borrowing as Error Correction
Bubble Razor: switch to latches, borrow time
- No hold time issues
- Architecture agnostic
- Push-button approach
- No metastability on the datapath
[Figure labels: DFF stages converted to latch (LD) pairs, each with a TD block; waveforms G (closed / open), D, Error]
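Time borrowing as a correction mechanism can be sketched as a classification of when data arrives relative to a latch's transparent phase. This is a simplified model under stated assumptions, not the circuit from the slides: `classify_arrival`, its status strings, and the time units are hypothetical.

```python
def classify_arrival(arrival, latch_opens, latch_closes):
    """Classify a data arrival relative to one latch's transparent phase.

    Returns (status, borrowed_time). Assumed model: the design is timed so
    that data should arrive before the latch opens. Data that arrives while
    the latch is open still passes through (time is borrowed from the next
    stage) and is flagged as a timing error. Data that arrives after the
    latch closes is missed.
    """
    if arrival <= latch_opens:
        return ("on_time", 0.0)
    if arrival <= latch_closes:
        return ("error_corrected_by_borrowing", arrival - latch_opens)
    return ("failure", None)


# Hypothetical latch that is open from 0.5 to 1.0 (in clock cycles).
print(classify_arrival(0.625, 0.5, 1.0))  # ('error_corrected_by_borrowing', 0.125)
```

The borrowed time is what the downstream stage must recover, which is why a flagged error leads to a stall further along the pipeline.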

Slide 10: Stalling Locally with Bubbles
Stalling the clock locally:
- With flops, all registers hold data
- With latches, half the registers hold bubbles
- Every latch stalls exactly once
- Communication only between neighbors
- Eventually it all resolves
[Animation over time steps 1 to 8, captions garbled in extraction: colored latch stages (Blue, Green, Purple, Red, Yellow) tell their neighbors to stall in turn; a stalled stage is not immediately overwritten because no new data exists]

Slides 11-14: Timing of Clock Waveforms
[Clock waveform figures over cycles 1 to 10. Annotations on slides 11-12:
1. Prevent losing inst3
2. Prevent losing inst2
3. Give time to recover
4. Prevent double sampling inst1
Other labels: Should Arrive, Timing violation. Labels on slide 14: Timing violation, Stall, Stall Neighbors]

Slide 15: The Required Circuitry
[Figure labels: four TD blocks, four CG blocks, B signals]

Slide 16: Error Detection and OR Circuitry
[Figure label: TD]

Slide 17: Clock Gate Control Logic
A cluster stalls and sends bubbles to all neighbors if:
- It was told to by a neighboring cluster
- It did not stall in the previous cycle
Sending bubbles to all neighbors is equivalent to sending them only to the other neighbors.
[Figure labels: B, CG]
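The stall rule above can be exercised with a small simulation. This is a sketch under assumptions rather than the chip's control logic: time advances in half-cycle steps, clusters form an undirected neighbor graph that is bipartite (neighbors always have opposite phases), "previous cycle" is taken to mean two half-cycle steps earlier, and the names `propagate_bubbles`, `neighbors`, and `first_stalled` are hypothetical.

```python
def propagate_bubbles(neighbors, first_stalled, max_steps=1000):
    """Return {cluster: [half-cycle steps at which it stalled]}.

    neighbors:     dict mapping each cluster to its neighboring clusters
    first_stalled: cluster that stalls at step 0 in response to an error
    """
    stall_steps = {cluster: [] for cluster in neighbors}
    stalling = {first_stalled}
    step = 0
    while stalling and step < max_steps:
        for cluster in stalling:
            stall_steps[cluster].append(step)
        # Every stalling cluster sends a bubble to all of its neighbors.
        told = {n for cluster in stalling for n in neighbors[cluster]}
        step += 1
        # A told cluster stalls unless it stalled in its previous cycle,
        # i.e. two half-cycle steps before the current one.
        stalling = {c for c in told if (step - 2) not in stall_steps[c]}
    return stall_steps


# Hypothetical loop of six latch clusters with alternating phases.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(propagate_bubbles(ring, 0))
# {0: [0], 1: [1], 2: [2], 3: [3], 4: [2], 5: [1]}
```

In this model the bubbles spread outward from the first stalled cluster, the senders ignore the bubbles that come back to them, and the two wavefronts cancel where they meet, so every cluster stalls exactly once.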

Slide 18: Clustering with hMETIS
- hMETIS is a widely used hypergraph partitioning program
- Clusters must only contain members with the same phase
  - Create two graphs and partition them independently
- Two latches are connected in the hMETIS graph if they are transitively connected in the circuit
  - Edge weight = number of latches that form the transitive connection
[Figure: example circuit and the resulting weighted graphs; node and edge numbers not recoverable]
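The graph construction described above might look like the following sketch. It rests on one reading of the slide, stated here as an assumption: two same-phase latches are "transitively connected" when they share a directly connected opposite-phase latch, and the edge weight counts those shared latches. The function `same_phase_edges` and the latch names are hypothetical, and the step that writes an input file for hMETIS is left out.

```python
from collections import defaultdict
from itertools import combinations


def same_phase_edges(adjacent):
    """Build weighted edges between same-phase latches.

    adjacent: dict mapping each latch to the set of opposite-phase latches
              directly connected to it (fan-in or fan-out).
    Returns {(latch_a, latch_b): weight}, where weight is the number of
    intermediate latches that connect latch_a and latch_b.
    """
    weights = defaultdict(int)
    for intermediate, linked in adjacent.items():
        # All latches linked to one intermediate latch share a phase.
        for a, b in combinations(sorted(linked), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical netlist: positive latches P1..P3, negative latches N1..N2.
adjacent = {
    "N1": {"P1", "P2"},
    "N2": {"P1", "P2", "P3"},
    "P1": {"N1", "N2"},
    "P2": {"N1", "N2"},
    "P3": {"N2"},
}
edges = same_phase_edges(adjacent)
positive_graph = {pair: w for pair, w in edges.items() if pair[0].startswith("P")}
negative_graph = {pair: w for pair, w in edges.items() if pair[0].startswith("N")}
print(positive_graph, negative_graph)
```

Splitting the result by phase gives the two graphs that are partitioned independently.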

Slide 19: Clustering Results
- Tradeoff between the sizes of OR gates
  - Combining errors
  - Combining bubbles
- 100 negative clusters
- 70 positive clusters

Slide 20: Two-Port Memory Boundary Approach
- Must fit edge-triggered memory into the stalling algorithm

Slide 21: Managing the Synthesis/APR Tools
- Want balanced pipelines, no time borrowing
  - Model Razor latches as flip-flops
- Dynamic OR always followed by a latch
  - Model the dynamic OR as static
  - Model the latch as a flip-flop (captures when the latch closes)
- Use regular ICG cells
  - Can use conventional clock tree synthesis
- Final design appears to be relatively normal
  - Flip-flop based design with clock gating
  - Everything is timing constrained
- Razorization process is entirely automated
  - Synthesis and netlist transformation scripts

Slide 22: Retiming and Number of Latches
- Retiming can increase the number of latches
- Results in area overhead

Slide 23: Area Overhead of Latch Transformation
[Figure not recovered]

Slide 24: Speculation Window Size
- Full clock phase (100%) minus the delay of the error propagation circuits
  - Maximum allowed by the technique
- Number / location of latches with error checking
  - Maximum slowdown that does not result in an unchecked error
[Figure label: Speculation Window]

Slide 25: Where Error Checking is Needed
- With a 30% speculation window: if circuit delay suddenly becomes 130% of its nominal value, all timing errors will be detected before the circuit fails
[Figure: latches A, B, C, D with leave and arrive times. Extracted values: nominal 50%, 20%, 15%; scaled 65%, 26%; each compared against 50% (">50?"); delay at worst 156%, delay at PoFF 91%]
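One reading of the slide's numbers (50% becomes 65%, 20% becomes 26%, each compared against 50%) is a simple per-latch test. The sketch below assumes that reading: arrival times are given as a percentage of the clock cycle, a uniform slowdown equal to the speculation percentage is applied, and a latch needs error checking if the slowed arrival exceeds half a cycle. The function name and default parameters are hypothetical.

```python
def needs_error_check(nominal_arrival_pct, speculation_pct=30.0, phase_pct=50.0):
    """Return True if a latch could see late data under the assumed slowdown.

    nominal_arrival_pct: nominal data arrival time, in percent of the cycle
    speculation_pct:     slowdown to tolerate, in percent (30 on the slide)
    phase_pct:           threshold, in percent of the cycle (50 on the slide)
    """
    slowed_arrival_pct = nominal_arrival_pct * (1.0 + speculation_pct / 100.0)
    return slowed_arrival_pct > phase_pct


# Values taken from the slide: 50% scales to 65%, 20% scales to 26%.
print(needs_error_check(50.0))  # True
print(needs_error_check(20.0))  # False
```

Latches whose paths stay under the threshold even when slowed can be left without error detection, which is where the area saving comes from.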

Slide 26: Path Distribution for Cortex-M3
[Figure legend: Flip Flops, All Latches, Negative Latches, Positive Latches]

Slide 27: Area Increase from Error Checking
[Figure annotations: 20% area overhead; 30% timing speculation]

Slide 28: Implementation on ARM Cortex-M3
[Figure not recovered]

Slide 29: Characterizing Throughput / Energy
Operating point set for worst-case operation:
- 85 °C
- 10% supply droop
- 2σ process
- 5% safety margin
Result: 200 MHz at 1.0 V

Slides 30-31: Gains from Bubble Razor
[Figures not recovered]

Slide 32: Bubble Razor Results
[Figure legend: Slow, Average, Fast]

Slide 33: Bubble Razor Results

Throughput:
  Operating point   Frequency   Throughput
  Worst case        200 MHz     8.5 FFT/ms
  First failure     333 MHz     14.2 FFT/ms
  Optimum           425 MHz     17.3 FFT/ms

Energy:
  Operating point   Supply      Energy
  Worst case        1.0 V       3.08 µJ/FFT
  First failure     0.775 V     1.42 µJ/FFT
  Optimum           0.725 V     1.18 µJ/FFT
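The relative gains implied by these measurements can be recomputed directly. The short script below uses only the throughput and energy values listed on this slide. The helper names are hypothetical, and treating the worst-case operating point as the baseline is an assumption.

```python
throughput_fft_per_ms = {"worst_case": 8.5, "first_failure": 14.2, "optimum": 17.3}
energy_uj_per_fft = {"worst_case": 3.08, "first_failure": 1.42, "optimum": 1.18}


def percent_increase(new, baseline):
    """Relative increase of new over baseline, in percent."""
    return 100.0 * (new / baseline - 1.0)


def percent_reduction(new, baseline):
    """Relative reduction of new below baseline, in percent."""
    return 100.0 * (1.0 - new / baseline)


throughput_gain = percent_increase(
    throughput_fft_per_ms["optimum"], throughput_fft_per_ms["worst_case"]
)
energy_saving = percent_reduction(
    energy_uj_per_fft["optimum"], energy_uj_per_fft["worst_case"]
)
print(f"throughput gain at optimum: {throughput_gain:.1f}%")  # 103.5%
print(f"energy reduction at optimum: {energy_saving:.1f}%")   # 61.7%
```

Both figures line up with the rounded headline numbers of a 100% throughput increase and a 60% energy improvement.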

Slide 34: Conclusion
- First Razor-style implementation on a complete, commercial processor (ARM Cortex-M3)
- Proposed a two-phase latch based Razor technique
- Novel local replay algorithm
- Demonstrated the automated nature of the technique
- Successfully implemented and fabricated in 45nm
- 60% energy efficiency or 100% throughput increase over worst-case margining