Self-Test and Adaptation for Random Variations in Reliability

Kenneth M. Zick and John P. Hayes
University of Michigan, Ann Arbor, MI, USA
August 31, 2010

Motivation
- Physical variation is increasing dramatically: $\Delta P = \Delta P_{\mathrm{D2D}} + \Delta P_{\mathrm{sc}} + \Delta P_{\mathrm{rand}}$
- Transient faults are also increasing in prominence
- What about random variations in transient-fault reliability? Very little has been published.

Danger ahead
[Figure. Source: Borkar, IEEE Computer, 2005]

Transient faults
- Caused by radiation and other noise sources
- Random variations in vulnerability have become significant
- Monte Carlo simulation shows $Q_{\mathrm{CRIT}}$ variations for four flip-flop designs:
  - 3σ variation in $Q_{\mathrm{CRIT},1\to0}$ = 44% to 115% [Mostafa 09]
  - 3σ variation in $Q_{\mathrm{CRIT},0\to1}$ = 20% to 72%
- Huge variation in fault rate: $\text{Fault rate} \propto \text{Flux} \times \text{Area} \times e^{-Q_{\mathrm{CRIT}}/Q_{\mathrm{COLL}}}$
- Upsetability of individual cells in a chip? Nobody knows!
- Conventional wisdom: nothing you can do anyway
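The exponential dependence of fault rate on Q_CRIT is what turns a moderate charge variation into a large rate variation. The following is a minimal sketch of that relationship, not code from the talk: the charge values `Q_CRIT_NOMINAL` and `Q_COLL` are hypothetical numbers chosen only for illustration, and flux and area are normalized to 1.

```python
import math

def relative_fault_rate(q_crit, q_coll, flux=1.0, area=1.0):
    """Fault rate up to a constant factor: flux * area * exp(-q_crit / q_coll)."""
    return flux * area * math.exp(-q_crit / q_coll)

# Hypothetical charges in femtocoulombs (assumptions, not values from the talk).
Q_CRIT_NOMINAL = 10.0
Q_COLL = 5.0

def rate_ratio(fractional_change):
    """Fault rate of a cell whose Q_CRIT deviates by the given fraction,
    relative to a cell with the nominal Q_CRIT."""
    varied = Q_CRIT_NOMINAL * (1.0 + fractional_change)
    nominal_rate = relative_fault_rate(Q_CRIT_NOMINAL, Q_COLL)
    return relative_fault_rate(varied, Q_COLL) / nominal_rate

if __name__ == "__main__":
    for change in (-0.20, 0.0, 0.20):
        print(f"Q_CRIT change {change:+.0%}: relative fault rate {rate_ratio(change):.2f}x")
```

Because the ratio depends on Q_CRIT/Q_COLL, the same percentage change in Q_CRIT has a larger effect on cells whose nominal Q_CRIT is large relative to Q_COLL.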

Vision
- Need low-cost, fine-grained methods of introspection and self-optimization
- Physically-adaptive computing
- Goals: better parametric yield, fewer soft errors, improved power and energy efficiency, longer system lifetimes
- Applicable to FPGA-based systems as well as reconfigurable nanoarchitectures

Proposed approach to self-test
- Introspection: systems probe their own components to uncover random variations
- Generate synthetic noise on-chip
- Inject noise into components during self-test
- Use data to infer variations in reliability

Self-test for latch reliability
- Flip-flops hold temporary state information needed for computation
- Prone to single event transients (SETs) and single event upsets (SEUs)
- Can't be protected by ECC; need TMR or extra circuitry
- Virtex-5QV will include SET filters (up to 800 ps) and SEU-resistant latches, but most systems don't have them
- Proposal: inject synthetic noise via asynchronous set/reset
- Look for intra-slice variations in upsetability (SETs, SEUs)
[Figure: flip-flop with D, CLK, CE, S, and R inputs and Q output]

Proposed self-test configuration
[Figure: a noise emulator (pulse generator and built-in buffers) drives the async set and async reset inputs of a logic slice over global/regional interconnect, which also fans out to other slices. The slice holds four flip-flops under test with scan in (3:0) and scan out (3:0), connected to a processor core (on-chip or off-chip).]

Pulse generator
- Similar to a digital-to-time converter (DTC)
- High resolution is desired; linearity is less important here
- Some options:
  - Dual PLLs using the Vernier principle: 35 ps, but with overhead
  - Delay line such as IODELAY: 78 ps
  - Carry chain: ~50 ps
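As a reminder of how the Vernier principle yields fine resolution from coarse clocks: two clocks with slightly different periods drift apart by the period difference on every cycle, so the achievable time step is that difference. The sketch below is illustrative only; the two periods are hypothetical values picked so that their difference matches the 35 ps figure, and they are not the settings used in the talk.

```python
def vernier_resolution_ps(period_a_ps, period_b_ps):
    """Time step of a Vernier pair: the difference between the two clock periods."""
    return abs(period_a_ps - period_b_ps)

def edge_offset_ps(cycles, period_a_ps, period_b_ps):
    """Offset between the two clocks' edges after `cycles` cycles,
    assuming the edges start aligned and the offset has not yet wrapped around."""
    return cycles * vernier_resolution_ps(period_a_ps, period_b_ps)

# Hypothetical PLL output periods in picoseconds (assumptions for illustration).
PERIOD_A_PS = 5000
PERIOD_B_PS = 5035

if __name__ == "__main__":
    print("resolution:", vernier_resolution_ps(PERIOD_A_PS, PERIOD_B_PS), "ps")
    print("offset after 17 cycles:", edge_offset_ps(17, PERIOD_A_PS, PERIOD_B_PS), "ps")
```

Periods are kept as integer picoseconds so the arithmetic is exact.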

Portion of pulse generator
[Figure: carry-chain segment from carry in (trigger) to carry out, with four latches (D, Q, CLK) producing pulse(0) to pulse(3) and a MUX with select(5:0) driving out]

Experimental setup
- Two Virtex-5 LX110T FPGAs
- 1,024 flip-flops under test
- Pulse widths ~600 ps (calibrated to each slice)
- MicroBlaze processor
- Overhead: 64 KB of MicroBlaze memory in total
- Calibration: 4 B of data per slice. Execution time = 3 min.
- Characterization time: 5 seconds

Results: upsetability maps
[Figure: upsetability maps for Chip #1 and Chip #2]
Each map is a 32x32 array of flip-flops. Values shown are the ratios of latch upsets to the mean number of latch upsets for the associated slice, over 255 trials. The test case is master latches in the 1 state. Slices span four cells vertically.

Quantifying intra-slice variation
[Figure: coefficient of variation (σ/μ) for latch upsets within a slice in the tested noise environment, averaged over 256 slices]
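The two statistics used in these results, the per-latch upset ratio and the per-slice coefficient of variation, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: the upset counts are made-up numbers, the function names are hypothetical, and the population standard deviation is assumed because the slides do not say which estimator was used. Each slice is assumed to have at least one upset, so its mean is nonzero.

```python
from statistics import mean, pstdev

def upset_ratios(slice_counts):
    """Ratio of each latch's upset count to the mean count of its slice."""
    mu = mean(slice_counts)
    return [count / mu for count in slice_counts]

def coefficient_of_variation(slice_counts):
    """sigma / mu of the upset counts within one slice (population sigma assumed)."""
    return pstdev(slice_counts) / mean(slice_counts)

def average_cv(all_slices):
    """Coefficient of variation averaged over all slices."""
    return mean([coefficient_of_variation(counts) for counts in all_slices])

# Hypothetical upset counts for two four-latch slices (not measured data).
EXAMPLE_SLICES = [
    [120, 80, 100, 100],
    [50, 150, 100, 100],
]

if __name__ == "__main__":
    for counts in EXAMPLE_SLICES:
        print(counts, "ratios:", upset_ratios(counts),
              "CV:", round(coefficient_of_variation(counts), 3))
    print("average CV:", round(average_cv(EXAMPLE_SLICES), 3))
```

Normalizing by the slice mean removes slice-to-slice differences, such as a slightly different calibrated pulse width, so only variation among latches in the same slice remains.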

Upsets vs. location within slice
Distribution of 500,000 upsets:

Location in slice   Chip #1 upset distribution   Chip #2 upset distribution
D                   24.7%                        24.5%
C                   25.3%                        25.2%
B                   24.8%                        24.9%
A                   25.2%                        25.3%

- Any systematic bias is negligible compared to the large intra-slice variations
- Results are consistent with variations that are random by latch

What about LUT cells?
- Idea: inject noise into shift register LUTs
- Found a strong bias toward upsets in 1 bits
- Possible extension: search for marginal SRAM cells; validate against radiation data
[Figure: shift register LUT (32 bits from 64 SRAM cells) with in, addr, and clock inputs and an out output]

Adapting to latch variations
- Define the cost function to be the total raw upset rate
- Latch upsetability: m0, m1, s0, s1, characterized via self-test
- Signal probabilities SP, which can be characterized via logic simulation or capture and readback
- Cost of a state bit i placed at flip-flop j:
  $\mathrm{cost}_{ij} = (m1_j + s1_j)\,SP_i + (m0_j + s0_j)\,(1 - SP_i)$   (1)
- Estimated MTBF:
  $\mathrm{MTBF} = 1 \Big/ \sum_i \sum_j \mathrm{cost}_{ij}\,x_{ij}$   (2)
- Find configurations that maximize the MTBF
- Fault avoidance; a complement to error mitigation
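Equations (1) and (2) can be exercised on a single four-flip-flop slice with a small exhaustive search. The sketch below is one possible reading, not the authors' optimizer: it assumes x_ij is a binary indicator that equals 1 when state bit i is placed at flip-flop j, that each flip-flop holds at most one state bit, and that upsetabilities are rates in arbitrary units. All numeric values and function names are hypothetical.

```python
from itertools import permutations

def placement_cost(ff, sp):
    """Equation (1): raw upset rate of a state bit with signal probability sp
    when placed at flip-flop ff (a dict with keys m0, m1, s0, s1)."""
    return (ff["m1"] + ff["s1"]) * sp + (ff["m0"] + ff["s0"]) * (1.0 - sp)

def total_cost(assignment, flip_flops, signal_probs):
    """Denominator of equation (2); assignment[i] is the flip-flop index of bit i."""
    return sum(placement_cost(flip_flops[j], signal_probs[i])
               for i, j in enumerate(assignment))

def best_placement(flip_flops, signal_probs):
    """Try every one-to-one placement and keep the one with the lowest total cost.
    Exhaustive search is only practical for small groups such as one slice."""
    candidates = permutations(range(len(flip_flops)), len(signal_probs))
    return min(candidates, key=lambda a: total_cost(a, flip_flops, signal_probs))

# Hypothetical upsetabilities for four flip-flops and signal probabilities
# for four state bits (illustrative values, not measurements).
FLIP_FLOPS = [
    {"m0": 1.0, "m1": 4.0, "s0": 1.0, "s1": 4.0},  # weak when holding 1
    {"m0": 4.0, "m1": 1.0, "s0": 4.0, "s1": 1.0},  # weak when holding 0
    {"m0": 2.0, "m1": 2.0, "s0": 2.0, "s1": 2.0},
    {"m0": 3.0, "m1": 3.0, "s0": 3.0, "s1": 3.0},
]
SIGNAL_PROBS = [0.9, 0.1, 0.5, 0.5]

if __name__ == "__main__":
    default = tuple(range(len(SIGNAL_PROBS)))
    best = best_placement(FLIP_FLOPS, SIGNAL_PROBS)
    for name, placement in (("default", default), ("adapted", best)):
        cost = total_cost(placement, FLIP_FLOPS, SIGNAL_PROBS)
        print(name, placement, "cost:", round(cost, 2), "MTBF:", round(1.0 / cost, 4))
```

The adapted placement moves the bit that is usually 1 onto the flip-flop that is robust in the 1 state, and the bit that is usually 0 onto the one robust in the 0 state. For larger groups, a polynomial-time assignment solver would replace the exhaustive search.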

Intra-slice optimization
[Figure]

Potential for reconfiguration
[Figure]

Recent benchmark results
[Figure: improvements in MTBF for self-adaptation and assisted adaptation relative to the non-adaptive case, assuming uniformly random signal probabilities. Error bars show the standard deviation across 10 trials.]

Conclusions
- A wealth of physical information is out there waiting to be discovered and put to use
- Random variations can be significant and can be estimated via self-test
- Field-programmable systems have unique potential for self-test and self-optimization
- Much interesting research ahead!

Acknowledgments
- NASA GSRP Fellowship and NASA Langley Research Center
- National Science Foundation grant CCF-0702276
- Xilinx Inc. and Sun Microsystems
- Adaptive Hardware & Systems group at U. Michigan

Thank you!

Backup slides

Example of CMOS latch
[Figure: CMOS latch with D, set, reset, and clk inputs and Q output]