Soft Errors re-examined

Jamil R. Mazzawi, Founder and CEO
Optima Design Automation Ltd
www.optima-da.com
v1.2

Topics:
- Soft errors: definitions
- FIT rate
- The soft-error problem is growing in new nodes
- Logical masking and derating
- Mitigation techniques
- Flip-flop selection
- CosmicASIC

Soft errors
- Cosmic particles influence our chips
- Particles can flip the values in flops and memory bits

Measuring soft errors: FIT rate
- FIT: Failures In Time
- How many failures occur in 1 billion hours
- $\mathrm{FIT} = 10^9 / \mathrm{MTBF}$ (MTBF in hours)
- FIT of a system: $\mathrm{FIT}_{\text{system}} = \sum_i \mathrm{FIT}_i$, where $i$ ranges over all its components
- FIT for a server farm = sum of the FIT of all its servers, routers, etc.
- FIT for a chip = sum of the FIT of all flops, memory bits, combinational logic, etc.
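The two FIT relations above can be sketched in a few lines of Python. The function names and the three component values are illustrative assumptions, not figures from the slides.

```python
HOURS_PER_BILLION = 1e9  # FIT counts failures per 10^9 device-hours


def fit_from_mtbf(mtbf_hours: float) -> float:
    """Convert a mean time between failures (in hours) to a FIT rate."""
    return HOURS_PER_BILLION / mtbf_hours


def system_fit(component_fits: list[float]) -> float:
    """The FIT of a system is the sum of the FIT rates of its components."""
    return sum(component_fits)


# Hypothetical per-block FIT values, chosen only to show the additivity:
# flops, memory bits, combinational logic.
chip_fit = system_fit([40.0, 35.0, 20.0])
print(chip_fit)
print(fit_from_mtbf(1e9))  # an MTBF of 10^9 hours is exactly 1 FIT
```

Because FIT rates add while MTBF values do not, budgets are usually easier to split across components in FIT.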

Example: FIT requirement of a chip
- Server farm for bank XYZ, with 1000 servers
- Required MTBF(the farm) = 1 year
- MTBF(each server) = 1000 years
  - Includes power supply, fan, memory, the CPU chip, other chips
- MTBF(CPU chip) = 1200 years
- $\mathrm{FIT}(\text{CPU}) = \frac{10^9}{1200 \times 365.25 \times 24} = \frac{114077}{1200} \approx 95$
- Given: FIT(single flip-flop) = 0.01 (at NYC)
- Given: the chip has 300,000 flops
- FIT(all flops) = 3,000 > 95
- We have a problem
- Does not include:
  1. Derating factors
  2. Other components of the chip (e.g., memories)
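The arithmetic of this example can be replayed as a short script. All inputs are the numbers given on the slide; the variable names are ours, and the final ratio is an extra figure derived from those numbers rather than one stated in the talk.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours

servers = 1000
farm_mtbf_years = 1.0
# With identical, independent servers the failure rates add, so each server
# must last `servers` times longer than the farm as a whole.
server_mtbf_years = farm_mtbf_years * servers

cpu_mtbf_years = 1200  # allocation for the CPU chip, from the slide
cpu_fit_budget = 1e9 / (cpu_mtbf_years * HOURS_PER_YEAR)

flop_fit = 0.01        # FIT per flip-flop at NYC, from the slide
flop_count = 300_000
raw_flop_fit = flop_fit * flop_count

print(round(cpu_fit_budget))                    # 95
print(round(raw_flop_fit))                      # 3000
print(round(raw_flop_fit / cpu_fit_budget, 1))  # 31.6
```

Before any derating, the flops alone exceed the CPU budget by roughly a factor of 30.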

The problem is growing these days
- Newer technologies are more sensitive
- Smaller transistor dimensions => smaller critical charge => the electrical charge of the particles is relatively bigger than the critical charge
- Two effects that cancel each other:
  - Smaller area per transistor decreases the per-transistor FIT rate
  - More transistors per mm² increases the total FIT of the chip
  - Together, they have almost no influence on the FIT rate

Where is it important:
- Memories: solved problem!
  - Were the only area that needed protection in older nodes
  - Solution: ECC protection
- Flip-flops: hottest unsolved problem!
  - Flops must be protected in newer technology nodes
- Combinatorial logic: not a problem yet
  - Second-degree problem

Single Event Upset vs. soft error
- SEU: a particle caused a flip-flop or memory bit to flip its value
- Soft error: an SEU has propagated and caused a system failure seen outside
- Most SEUs do not convert to soft errors

Most SEUs do not convert to soft errors (Ilan Beer, IBM, HVC 2008)
- Definition: FIT rate with derating factors
  - FIT calculated taking into account vanishing SEUs
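One way to read this definition is as a weighted sum: each flop's raw FIT is scaled by the fraction of its SEUs that do not vanish. The sketch below assumes that reading; the function name and the derating values are hypothetical.

```python
def derated_fit(raw_fits, derating_factors):
    """Sum each flop's raw FIT weighted by its derating factor, i.e. the
    fraction of its SEUs that propagate to a visible failure."""
    return sum(fit * d for fit, d in zip(raw_fits, derating_factors))


raw = [0.01, 0.01, 0.01, 0.01]   # raw FIT per flop (slide value, at NYC)
derating = [1.0, 0.5, 0.1, 0.0]  # hypothetical: 1.0 = never masked, 0.0 = always masked

print(derated_fit(raw, derating))  # well below the raw sum of 0.04
```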

Common mitigation methods:
- TMR with majority voting
- DMR with C-element
- Soft-error detectors
- Soft-error detection with parity tree
- More...

Solution 1: TMR with majority voter
- TMR: Triple Modular Redundancy
- Extra area ~ +205%, extra power ~ +205%, FIT = 0 (-100%)
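A behavioural sketch of the majority voter, assuming three redundant copies of a stored word. It models only the voting logic, not the circuit or its timing.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


stored = 0b1011
copies = [stored, stored, stored]

copies[1] ^= 0b0100                  # an SEU flips one bit in one copy
print(majority(*copies) == stored)   # True: the single upset is outvoted

copies[2] ^= 0b0100                  # a second upset hits the same bit in another copy
print(majority(*copies) == stored)   # False: two matching upsets win the vote
```

The vote masks any single upset, which is why the slide quotes FIT = 0 under a single-event assumption.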

Solution 2: DMR with C-element
- DMR: Dual Modular Redundancy using an additional C-element
- Additional area and power > +100%, FIT reduced to 5%
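A C-element passes its inputs through only when they agree and otherwise holds its previous output. The class below is a minimal behavioural model of that rule (the class and method names are ours), fed by two redundant copies of a flop.

```python
class CElement:
    """Behavioural model: the output follows the inputs when they agree
    and keeps its previous value when they disagree."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


c = CElement(initial=1)
print(c.update(1, 1))  # both copies agree: output is 1
print(c.update(1, 0))  # an SEU flips one copy: output holds 1
print(c.update(0, 0))  # both copies agree on new data: output becomes 0
```

In this model an upset in one copy never reaches the output. The slide's residual 5% FIT comes from effects that this simple model does not capture.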

Solution 3: Soft-error detectors
- These techniques are usually used for detecting single-bit flips in pipeline storage elements.
- One simple method is to duplicate the critical node and connect the outputs to an XOR gate.
- Additional area and power are about 100%.
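The duplicate-and-XOR method can be sketched as a comparison between a register and its shadow copy. The function name is an assumption, and the sketch detects a mismatch without correcting it.

```python
def mismatch(primary: int, shadow: int) -> int:
    """XOR the register with its duplicate; any set bit marks a disagreement."""
    return primary ^ shadow


primary = shadow = 0b1010
print(mismatch(primary, shadow))       # 0: no error flagged

shadow ^= 0b0010                       # an SEU flips one bit of the duplicate
print(bin(mismatch(primary, shadow)))  # non-zero: error flagged
```

The detector cannot tell which copy was hit, so a recovery mechanism such as a pipeline flush or replay has to follow.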

Summary of different solutions

TMR (Triple Modular Redundancy)
- TMR with majority voting: triplicated storage elements with a majority voter at the output
- Three time-delayed storage nodes
- Extra area +200%, extra power +200%, FIT down to 0

DMR (Dual Modular Redundancy) and error detection
- C-element with a copied storage element: extra area +105%, extra power +100%
- Using the already existing scan design-for-testability: extra area +20%, extra power ~+15%
- Using a duplicated storage element with XOR: extra area +103%, extra power ~+100%
- FIT down by 95%

Parity tree
- Parity tree, or using a transient detector; used in pipelines and recoverable models
- Extra area: ---, extra power: ---
- FIT down to 0, with a performance penalty; not always possible

Etc.
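The parity-tree entry can be illustrated with a small sketch: a parity bit is stored alongside a group of flops and recomputed later for comparison. The function names are assumptions, and `reduce` evaluates the XOR as a chain rather than a balanced tree, which gives the same result.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """XOR-reduce a group of flop values to a single parity bit."""
    return reduce(xor, bits, 0)


def upset_detected(bits, stored_parity):
    """Compare freshly computed parity with the parity stored at write time."""
    return parity(bits) != stored_parity


flops = [1, 0, 1, 1]
stored = parity(flops)

flops[2] ^= 1                         # single upset
print(upset_detected(flops, stored))  # True

flops[0] ^= 1                         # second upset in the same group
print(upset_detected(flops, stored))  # False: an even number of flips cancels out
```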

Flip-flop selection is needed
- Hardening all flops is not viable
  - Silicon costs: 25%-35%
  - Influence on: unit cost, NRE cost and power
- Solution: apply these solutions selectively
  - Harden flops that are more sensitive to SEUs
- A flop sensitive to SEUs means: an SEU on the flop has a higher probability of converting to a soft error
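Selective hardening can be sketched as a greedy pass over the flops, largest contributors first. This is only an illustration under stated assumptions: the flop names and FIT values are hypothetical, the 5% residual is borrowed from the DMR slide, and the slides do not describe how any particular tool makes its selection.

```python
def select_flops_to_harden(derated_fits, budget, residual=0.05):
    """Greedily harden the largest FIT contributors until the total meets
    the budget. `residual` is the fraction of a flop's FIT that remains
    after hardening (assumed 5%, as quoted for DMR with a C-element)."""
    total = sum(derated_fits.values())
    chosen = []
    ranked = sorted(derated_fits.items(), key=lambda kv: kv[1], reverse=True)
    for name, fit in ranked:
        if total <= budget:
            break
        total -= fit * (1.0 - residual)
        chosen.append(name)
    return chosen, total


# Hypothetical derated FIT per flop.
fits = {"ctrl_state": 0.010, "pc": 0.008, "data": 0.004, "dbg": 0.0001}
chosen, total = select_flops_to_harden(fits, budget=0.010)
print(chosen)
print(total <= 0.010)
```

Here only two of the four flops need hardening, which is how selection avoids the 25%-35% cost of hardening every flop.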

Existing selection methods: error injection simulation
- Run a lot of simulations
- Each simulation injects a single error on a random flop, at a random cycle (simulating an SEU)
- If the test bench detects an error, this SEU is a soft error
- How many simulations to run?
  - Option 1: loop over all flops and all cycles
  - Option 2: select random flops and random cycles to inject errors on (lower accuracy)
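The campaign described above can be sketched on a toy design. Everything in it is an assumption made for illustration: a 4-bit counter whose top bit is the only observed output, plus a 4-bit debug register that nothing reads. A real flow would drive an RTL or gate-level simulator instead.

```python
import random

ACC_BITS = 4   # counter; only its top bit is observed
DBG_BITS = 4   # debug copy of the counter; never observed
NUM_FLOPS = ACC_BITS + DBG_BITS
CYCLES = 32


def simulate(inject=None):
    """Run the toy design; `inject` is an optional (flop, cycle) SEU."""
    acc, dbg = 0, 0
    trace = []
    for cycle in range(CYCLES):
        if inject is not None and inject[1] == cycle:
            flop = inject[0]
            if flop < ACC_BITS:
                acc ^= 1 << flop
            else:
                dbg ^= 1 << (flop - ACC_BITS)
        trace.append(acc >> (ACC_BITS - 1))   # test bench sees the counter MSB only
        dbg = acc                             # debug register samples the counter
        acc = (acc + 1) & ((1 << ACC_BITS) - 1)
    return trace


GOLDEN = simulate()


def is_soft_error(flop, cycle):
    """An SEU is a soft error if the observed trace differs from the golden run."""
    return simulate((flop, cycle)) != GOLDEN


def exhaustive_sensitivity():
    """Option 1: every flop at every cycle; returns per-flop sensitivity."""
    return [sum(is_soft_error(f, c) for c in range(CYCLES)) / CYCLES
            for f in range(NUM_FLOPS)]


def sampled_error_rate(samples, seed=0):
    """Option 2: random (flop, cycle) pairs; cheaper but less accurate."""
    rng = random.Random(seed)
    hits = sum(is_soft_error(rng.randrange(NUM_FLOPS), rng.randrange(CYCLES))
               for _ in range(samples))
    return hits / samples


per_flop = exhaustive_sensitivity()
print(per_flop)
print(sum(per_flop) / NUM_FLOPS, sampled_error_rate(50))
```

In this toy the debug flops are fully masked, the counter's top bit is always visible, and the lower counter bits fall in between because a late upset may not reach the output before the run ends. The cost of Option 1 grows as flops times cycles, which is what makes full campaigns on real designs so slow.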

Error injection simulation
- Benefits:
  - Almost the only available option now
- Drawbacks:
  - Time consuming: 2-4 weeks with a low sample rate
  - Compute-resource consuming: 2-4 weeks x 10-20 machines during peak project time
  - Internal/in-house solution: needs someone to develop and maintain it
  - Solution available only to big companies

Introducing: CosmicASIC
- 1000 times faster than existing solutions
- Plug-and-play solution
- 100% accuracy

Summary
- The soft-error problem is growing
- Mitigation techniques exist
  - But they can cost 25%-35% in silicon, NRE and power
- Flip-flop selection is a must
  - Solves the soft-error problem at a fraction of the cost
- CosmicASIC: flip-flop selection EDA tool
- Visit us at booth A03 in the exhibition area
- Or at: http://www.optima-da.com