Single Event Effect Mitigation in Digital Integrated Circuits for Space

Topical Workshop on Electronics for Particle Physics, 21 September 2010, Aachen
Roland Weigand
European Space Agency, Data Systems Division, TEC-EDM Microelectronics Section
Tel. +31-71-565-3298, Fax +31-71-565-6791
Roland.Weigand[at]esa.int

Summary
- The Microelectronics Section at ESA: who we are
- Radiation sources in the space environment
- SEU radiation hardening approach
- SEE (SEU/SET) hardening of commercial bulk CMOS
  - Hardened standard cell library cells
  - Triple Modular Redundancy with clock skew (TMR, STMR)
  - Implications of TMR in the design flow
- SEE protection of memory blocks
  - Memory cell design
  - Error Correcting Codes, parity, TMR, scrubbing
- SEE in reprogrammable (SRAM) FPGA
  - Triple Modular Redundancy (combinatorial and sequential logic)
  - Dedicated rad-hard FPGA design
- Validation of SEE hardening
  - Simulation, emulation, structural/formal verification
  - Ground radiation testing

Who we are... (2)
- The Technical and Quality Management Directorate (TEC)
  http://www.esa.int/specials/space_engineering/semb9fvg3hf_0.html
- Inside TEC, 3 sections work on radiation effects:
  - The Space Environments and Effects Section (TEC-EES)
    Analysis of space environments and their effects on space systems
    http://space-env.esa.int/index.php/esa-estec-space-environment-tec-ees.html
  - The Radiation Effects and Analysis Techniques Section (TEC-QEC)
    Analysis at component level and radiation testing
    https://escies.org/readarticle?docid=227
  - The Microelectronics Section (TEC-EDM)
    Availability of appropriate technologies and development methods
    Availability of space-specific standard components and IP
    Development support to projects
    Analysis and mitigation of SEE at design level
    http://www.esa.int/tec/microelectronics/

Radiation Sources in Space
- The space radiation environment is dynamic and inhomogeneous
  - Dependency on satellite trajectory/orbit
  - Dependency on mission schedule
- Trapped radiation belts (Van Allen belts)
  - e- and p+ trapped in the Earth's magnetic field
  - Inhomogeneous: e.g. South Atlantic Anomaly
- Solar particles
  - Solar activity cycle of 11 years
  - High flux for several days during solar flares
  - Protons and heavy ions with a highly variable energy spectrum
  - Shielding by the Earth's magnetic field
- Galactic Cosmic Rays
  - Anticorrelated with solar activity (high flux during solar low)
  - Particles from protons to heavy ions
  - High energy, up to 10^20 eV
  - Flux ~ 4 particles / cm^2 / s

Radiation Hardening
- Dedicated processes for space are not affordable any more
  - SOI is sometimes used
  - Low SEU rates, latch-up free, some concerns on TID
  - SOI is less readily available, analog IP needs to be re-developed
- Total Ionising Dose (TID)
  - Most space missions are limited to a 100 krad dose; in 180 nm or below, TID protection might be limited to e.g. screening of (commercial) library cells and elimination of certain transistor types
  - Some long-duration, deep-space missions are in the Mrad domain, requiring mitigation e.g. by special transistor geometries (ELT), guard rings or derating
- Single Event Latch-Up (SEL)
  - Horizontal: mitigation in layout, e.g. guard rings
  - Vertical: thickness of the epitaxial layer, deep n-well
- Single Event Effects (SEE) by Transient and Upset (SET, SEU)
  - Spatial or temporal redundancy
  - Mitigation by design of library cells or in logic design (see below)

Single Event Transients (SET)
- Collision-induced carrier generation in PN junctions
  - Propagate as glitches in combinatorial logic
  - Latched into storage cells when arriving at the data input during a clock edge
  - Upset rate increases with the clock frequency
  - Seen already in the ERC32 processor (0.5 µm technology), definitely a concern in 0.18 µm and below
- Analysis of SET effects in simulation and radiation tests
  - SET pulse length and amplitude are the most important parameters
  - Specific test structures to catch and characterise the pulse
  - CNES contract with Atmel on SET effects in the 0.18 µm technology
- Mitigation of SET effects
  - Propagation of complementary logic levels ("Dual Stream")
  - Using stronger drivers and higher capacitive loads
  - Delay filtering on all flip-flop inputs (clock, data, reset)
  - STMR: triple skewed clocks in conjunction with the TMR flip-flop
    - Triplication of clock-like nets (including asynchronous resets)
    - See below

SEU hardening approach
1. Determine mission requirements
   - Fix reliability goal (FIT, # faults tolerated per time unit)
   - Determine radiation profile (orbit, solar cycles)
   - Shielding in the mechanical structure
   - For standard components: take worst possible requirements
2. Characterise target silicon technology
   - Simulation and ground radiation testing in accelerators
   - LET threshold and saturated cross-section
3. Calculate error rate per bit (flip-flop, memory) and per chip
   - CRÈME models for space SEE rates: https://creme-mc.isde.vanderbilt.edu/
   - Bit error rate to be multiplied with the number of bits in a chip
The reality is sometimes different
   - Requirements are unclear, radiation analysis is dropped or incomplete
   - Uncertainty leads to overprotection, causing huge design overhead
   - Projects may hit the ceiling of feasibility or affordability
   - Several silicon iterations and radiation validation required
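The per-chip calculation in step 3 can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are assumptions, not values from the talk, and the FIT conversion assumes that every upset counts as a failure (FIT being failures per 10^9 device hours).

```python
HOURS_PER_DAY = 24.0

def chip_upset_rate(upsets_per_bit_day, n_bits):
    """Upsets per chip per day: per-bit rate multiplied by the number of bits."""
    return upsets_per_bit_day * n_bits

def to_fit(upsets_per_day):
    """Convert an upset rate per day into FIT (events per 1e9 hours)."""
    return upsets_per_day / HOURS_PER_DAY * 1e9

# Hypothetical example: 1e-7 upsets/bit/day on a chip with one million bits
rate = chip_upset_rate(1e-7, 1_000_000)
print(f"{rate:.3f} upsets/day, about one upset every {1.0 / rate:.0f} days")
```

Comparing the result against the reliability goal of step 1 shows whether hardening is needed at all, and how much.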

SEU hardening of flip-flops and SRAM

Radiation hardened standard cell libraries
- Resistor Memory Cell
  H. T. Weaver, C. L. Axness, J. D. McBrayer, J. S. Browning, J. S. Fu, A. Ochoa, R. Koga, "An SEU Tolerant Memory Cell Derived from Fundamental Studies of SEU Mechanisms in SRAM," Nuclear Science, IEEE Transactions on, vol. 34, no. 6, pp. 1281-1286, Dec. 1987
- HIT = Heavy Ion Tolerant storage cell
  D. Bessot, R. Velazco, "Design of SEU-hardened CMOS memory cells: the HIT cell," RADECS, 1993
- DICE = Dual Interlocked storage CEll
  R. Velazco, D. Bessot, S. Duzellier, R. Ecoffet, R. Koga, "Two CMOS memory cells suitable for the design of SEU-tolerant VLSI circuits," Nuclear Science, IEEE Transactions on, vol. 41, no. 6, pp. 2229-2234, Dec. 1994
- Examples of hardened libraries around the world
  - ATMEL MH1RT (350 nm) and ATC18RHA (180 nm) technologies
    http://www.atmel.com
  - DARE (Design Against Radiation Effects) library for UMC 180 nm and 90 nm (development)
    http://microelectronics.esa.int/mpd2010/day1/mpd-imec-dare-30march2010.pdf
  - ST Microelectronics library for 65 nm under development
    http://microelectronics.esa.int/mpd2010/day2/dsm65nm.pdf
  - Ramon Chips library for 180 nm Tower Semiconductors (130 nm under development)
    http://nepp.nasa.gov/mapld_2008/presentations/i/05%20-%20ginosar_ran_mapld08_pres_1.pdf
  - Aeroflex (600, 250, 130, 90 nm)
    http://www.aeroflex.com/radhardasic
  - MRC Microelectronics on TSMC (0.35/0.25), UTMC/AMI, HP, NSC, Peregrine
    http://parts.jpl.nasa.gov/mrqw/mrqw_presentations/s4_alexander.ppt
  - HIREC/JAXA: Fujitsu 0.18, OKI 0.15 SOI (NSREC2005)

SEU/SET
- Trench capacitors: embedded DRAM cells can be used to minimise the area penalty (IBM patent)
- Transmission gates: the feedback path is cut off during write cycles to reduce the speed penalty (ST patent)

Glitch filtering of clock/reset trees
- C-element as glitch filter
  - Enhanced with weak keepers on the output node to prevent a floating state
  - Used to recombine a spatially redundant dual logic cone
  - Single logic cone with a spike delay filter (Mongolkachit, RADECS 2003)
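As a rough behavioural illustration of the C-element filter, the Python sketch below models the output as changing only when both inputs agree, and otherwise holding its previous value (the role of the keeper). Feeding the element with a signal and a delayed copy of itself suppresses pulses no longer than the delay. The discrete-time model, the function names and the delay expressed in samples are assumptions made for the example, not a description of the cited circuits.

```python
def c_element(a, b, prev):
    """Muller C-element: follow the inputs when they agree, else hold."""
    return a if a == b else prev

def spike_filter(samples, delay):
    """Filter a sampled signal using a C-element fed by the signal and by
    a copy delayed by `delay` samples (single cone with delay filter)."""
    out = samples[0]
    result = []
    for i, s in enumerate(samples):
        delayed = samples[i - delay] if i >= delay else samples[0]
        out = c_element(s, delayed, out)
        result.append(out)
    return result

# A one-sample glitch is removed; a real transition passes, delayed by 2 samples
print(spike_filter([0, 0, 0, 1, 0, 0, 0, 0], delay=2))
print(spike_filter([0, 0, 1, 1, 1, 1], delay=2))
```

In the dual-cone variant the second input would come from the redundant logic cone instead of a delay line; the same `c_element` function applies.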

DF-DICE, the SEU and SET hardened FF
http://www.isi.edu/~draper/papers/mwscas05_naseer.pdf

SEU protection by TMR
[Figure: flip-flop with voter]
- Hardened libraries are used at logic synthesis, like native commercial cell libraries
  - Speed and area (x2) penalty
- If no hardened library is available: Triple Modular Redundancy (TMR) flip-flops
  - Using standard flip-flops of the commercial library
  - Data input is fed to three flip-flops at the same time
  - Outputs of the flip-flops are majority voted (combinatorial logic, equivalent to the carry output of a full adder)
  - Area overhead on flip-flops is a factor of > 3, but little in combinatorial logic
  - Implemented in the RTL source code, by netlist editing or by the synthesis tool

STMR: TMR with triple skewed clock
By skewing the clocks, a glitch at D can be latched in at most one of the 3 FF.
[Figure: data input D fans out (D1, D2, D3) to FF1, FF2 and FF3, which are clocked by clk1, clk2 and clk3 from a triplicated clock tree with skewed clocks (skew δ ~ SET pulse length). Outputs Q1, Q2, Q3 feed a majority voter. A SET pulse is latched into FF1 only, so Q remains at the correct value.]
Q = (Q1 and Q2) or (Q2 and Q3) or (Q1 and Q3)
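The effect of the skew can be checked with a small Python model. It is a sketch under simplifying assumptions: ideal flip-flops without setup/hold windows, a SET that simply inverts D for its duration, and sampling instants t0, t0 + delta and t0 + 2*delta. All names and time values are illustrative.

```python
def majority(q1, q2, q3):
    """Majority voter: Q = (Q1 and Q2) or (Q2 and Q3) or (Q1 and Q3)."""
    return (q1 and q2) or (q2 and q3) or (q1 and q3)

def sample(d_correct, pulse_start, pulse_len, t_clk):
    """Value captured at t_clk; the SET inverts D in [start, start + len)."""
    in_pulse = pulse_start <= t_clk < pulse_start + pulse_len
    return (not d_correct) if in_pulse else d_correct

def stmr_capture(d_correct, pulse_start, pulse_len, delta, t0=0.0):
    """Sample D with three clocks skewed by delta and vote the result."""
    qs = [sample(d_correct, pulse_start, pulse_len, t0 + i * delta)
          for i in range(3)]
    return majority(*qs), qs

# SET shorter than the skew: only FF1 is hit and the vote stays correct
print(stmr_capture(True, pulse_start=-0.2, pulse_len=0.5, delta=1.0))
# No skew (plain TMR): all three flip-flops latch the glitch
print(stmr_capture(True, pulse_start=-0.2, pulse_len=0.5, delta=0.0))
```

The model also shows the limit of the scheme: a pulse longer than the skew can reach two flip-flops, which is why the skew is chosen close to the expected SET pulse length.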

STMR in the ASIC design flow
- TMR: increased complexity affects the design flow and results
  - Large netlist with higher cell and node count
  - Increased run-time or even crashes of EDA tools
- Design optimisation is less efficient
  - Synthesis tools are designed to remove redundancy
  - Normally, registers are not modified, but be careful with sequential optimisation (pipelining, retiming etc.)
- Timing issues
  - TMR voting and clock skewing reduce maximum speed
  - Increased area leads to higher interconnect delay
  - Clock skewing can be removed by hold-time fix
- Verification and test issues
  - TMR and formal verification (1 FF in RTL corresponds to 3 FF at gate level)
  - TMR (= redundancy) affects testability in scan testing
  - Implementation of protection has to be verified at netlist level

STMR insertion at RTL or gate level

STMR in VHDL
- Clock nets/ports are a vector of 3 bits
- Use the two-process method [6]

    -- One process per TMR domain:
    rx0 : process(clk)
    begin
      if rising_edge(clk(0)) then r0 <= d; end if;
    end process;
    rx1 : process(clk)
    begin
      if rising_edge(clk(1)) then r1 <= d; end if;
    end process;
    rx2 : process(clk)
    begin
      if rising_edge(clk(2)) then r2 <= d; end if;
    end process;
    -- Vote outputs
    r <= (r0 and r1) or (r0 and r2) or (r1 and r2);

- Synthesis with TMR in one go
- Disallow register merging
- Structural verification required

STMR at gate level
- Used mainly for third party IP
- Library and tool dependent
- Synthesise netlist without TMR
- Create HDL package with TMR equivalent macro-cells
- Edit netlist to triplicate clocks and asynchronous resets

    sed -e 's/clk\(.*\) std_logic/clk\1 std_logic_vector(2 downto 0) /'

- Edit netlist replacing every flip-flop by its TMR equivalent

    sed -e 's/dff1/dff1_tmr/'
    sed -e 's/dff2/dff2_tmr/'

- Resynthesise the edited netlist, linking with the TMR macro-cell package
- Disallow register merging
- Structural verification required
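The netlist editing step can also be scripted in Python. The sketch below mirrors the sed commands but uses word boundaries, so that a cell such as dff10 is left alone and running the script twice does not produce dff1_tmr_tmr. The cell names dff1/dff2 come from the slide; the port declaration syntax assumed by the clock pattern (`clk... : in std_logic`) and the function name are assumptions about the netlist style.

```python
import re

# Flip-flop cells and their TMR macro-cell equivalents (names from the slide)
TMR_CELLS = {"dff1": "dff1_tmr", "dff2": "dff2_tmr"}

def insert_tmr(netlist: str) -> str:
    """Replace flip-flops by TMR macro-cells and widen clock ports to 3 bits."""
    for cell, tmr_cell in TMR_CELLS.items():
        # \b prevents matches inside dff10 or an already converted dff1_tmr
        netlist = re.sub(rf"\b{re.escape(cell)}\b", tmr_cell, netlist)
    # Assumed port style: "clk : in std_logic"; std_logic_vector is not matched
    netlist = re.sub(r"\b(clk\w*\s*:\s*in\s+)std_logic\b",
                     r"\1std_logic_vector(2 downto 0)", netlist)
    return netlist

example = "clk : in std_logic;\nu1 : dff1 port map (clk, d, q);\nu2 : dff10 port map (clk, d, q2);"
print(insert_tmr(example))
```

As with the sed version, the edited netlist still has to be resynthesised against the TMR macro-cell package and checked structurally.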

Inserting triple skewed clock/reset trees
- Clock Tree Synthesis (CTS) optimises skew inside a single clock tree, but we need three coherent trees (not supported by CTS tools)
  - Need to control the insertion delay (X, X+δ, X+2δ), where δ is the clock skew
  - Compromise: insert three distinct trees with well adjusted CTS parameters
- Delay inserted at the origin of the clock trees
  - Instantiate delay buffers in the VHDL source code for simulation
  - Model at synthesis by set_ideal_latency and set_propagated_clock
  - Initial value for δ is speculative; control/adjustment in the backend process
- Triplicate also asynchronous reset trees
- Triplicate any logic in clock and asynchronous reset networks

We need to control the relative clock latency: X, X+δ, X+2δ (δ = clock skew)
- Coherent clock trees
- CTS did not achieve the goal
- Manual adjustment of delay elements required

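TMR Timing Issues
[Figure: two TMR register stages, each with FF1, FF2, FF3 (inputs d1a, d2a, d3a, outputs q1a, q2a, q3a) followed by a voter, separated by combinatorial logic and clocked by clk1, clk2, clk3. The delays t_prop, t_logic, t_voter and t_setup are marked along the path.]
Cycle time: T >= t_prop + t_logic + t_setup + t_voter + 2δ, where δ is the skew between adjacent clocks
TMR voters and clock skewing reduce the operating frequency.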
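The cycle-time budget can be made concrete with a short Python sketch. The symbols follow the slide (flip-flop propagation delay, logic delay, setup time, voter delay, clock skew delta between adjacent trees); the numerical values are invented for illustration only.

```python
def min_cycle_time(t_prop, t_logic, t_setup, t_voter=0.0, delta=0.0):
    """Minimum clock period: T >= t_prop + t_logic + t_setup + t_voter + 2*delta."""
    return t_prop + t_logic + t_setup + t_voter + 2.0 * delta

# Hypothetical delays in ns
plain = min_cycle_time(t_prop=0.3, t_logic=3.0, t_setup=0.2)
stmr = min_cycle_time(t_prop=0.3, t_logic=3.0, t_setup=0.2, t_voter=0.2, delta=0.5)
print(f"plain: {plain:.2f} ns ({1e3 / plain:.0f} MHz)")
print(f"STMR:  {stmr:.2f} ns ({1e3 / stmr:.0f} MHz)")
```

The 2*delta term appears because the worst-case path is launched by the latest clock and captured by the earliest one.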

Area and power overheads of hardened FF
- Voted TMR cells
  - Area overhead >~ factor 3
  - Power consumption ~ factor 3
- SEU hardened flip-flops
  - Area overhead factor 2 to 2.5
  - Power consumption factor 2 to 3
- Overhead only on flip-flops
  - Total overhead depends on the share of combinatorial and sequential logic
  - A = 3x flip-flops + 1x combinatorial

    Share of flip-flops   Area overhead
    25%                   1.5
    50%                   2
    75%                   2.5

Synthesis description of the DARE library

State toggle power increases ~ x3

Standard DFF:

    rise_power(li5x5) {
      index_1("0.016, 0.064, 0.128, 0.8, 1.07") ;
      index_2("0.03, 0.15, 0.75, 1.5, 3") ;
      values("0.260154 0.260608 0.259797 0.262227 0.265544",\
             "0.258697 0.259304 0.258465 0.262485 0.264274",\
             "0.258899 0.259535 0.258754 0.26171 0.264257",\
             "0.259817 0.260501 0.259856 0.262175 0.265157",\
             "0.259849 0.260833 0.260201 0.262509 0.265653");
    }

Hardened XDFF:

    rise_power(li5x5) {
      index_1("0.016, 0.064, 0.128, 0.8, 1.07") ;
      index_2("0.03, 0.15, 0.75, 1.5, 3") ;
      values("0.800729 0.800399 0.794199 0.79509 0.799814",\
             "0.789216 0.788791 0.7821 0.78533 0.78516",\
             "0.781962 0.781545 0.774982 0.777166 0.776802",\
             "0.770274 0.769896 0.763804 0.765198 0.769422",\
             "0.765816 0.76547 0.759478 0.760922 0.765386");
    }

Clock power increases ~ x2

Standard DFF:

    rise_power(i5) {
      index_1("0.03, 0.15, 0.75, 1.5, 3") ;
      values("0.09928 0.098241 0.111142 0.131959 0.180269");
    }

Hardened XDFF:

    rise_power(i5) {
      index_1("0.03, 0.15, 0.75, 1.5, 3") ;
      values("0.208006 0.207359 0.227548 0.26199 0.344905");
    }
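The relation A = 3x flip-flops + 1x combinatorial reproduces the overhead figures quoted for 25%, 50% and 75% flip-flop share. A minimal Python check, with a hypothetical function name and the flip-flop factor left as a parameter so that hardened cells (factor 2 to 2.5) can be compared:

```python
def area_overhead(ff_share, ff_factor=3.0):
    """Total area relative to the unprotected design.

    ff_share:  fraction of the original area taken by flip-flops (0..1)
    ff_factor: area multiplier applied to flip-flops only (3 for voted TMR)
    """
    return ff_factor * ff_share + (1.0 - ff_share)

for share in (0.25, 0.50, 0.75):
    print(f"{share:.0%} flip-flops -> overhead x{area_overhead(share)}")
```

This simple estimate counts only the triplicated flip-flops; the voters are why the slide quotes the flip-flop factor as slightly above 3.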

Hold violations with skewed clocks
[Figure: two TMR triplets FFA1..3 and FFB1..3 with voters, clocked by clk1, clk2, clk3; t_prop and t_setup marked.]
When the propagation delays (t_prop, voter) are smaller than the clock skew 2δ between clk1 and clk3, a hold violation occurs on the path FFA1 -> FFB3.
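The hold condition can be written as a slack calculation. In the Python sketch below, data launched by clk1 (latency X) must not reach a flip-flop clocked by clk3 (latency X + 2*delta) before that flip-flop has captured the previous value. The t_hold parameter is an addition for completeness and defaults to zero, which matches the simplified condition on the slide; the numbers are illustrative.

```python
def hold_slack(t_prop, t_voter, delta, t_hold=0.0):
    """Hold slack on the path FFA1 -> voter -> FFB3 (negative = violation)."""
    data_arrival = t_prop + t_voter          # earliest change at FFB3 input
    required = 2.0 * delta + t_hold          # clk3 edge arrives 2*delta after clk1
    return data_arrival - required

# Hypothetical values in ns: short path and 0.5 ns skew per clock tree
slack = hold_slack(t_prop=0.3, t_voter=0.2, delta=0.5)
print("violation" if slack < 0 else "ok", f"(slack {slack:+.2f} ns)")
```

A negative slack has to be fixed without cancelling the skew inside a triplet, which is the subject of the following slides.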

Wrong hold fix by EDA tool
[Figure: two TMR triplets FFA1..3 and FFB1..3 with voters, clocked by clk1, clk2, clk3; t_prop and t_setup marked.]
Automatic buffer insertion by the fix-hold step of the synthesis tool compensates the clock skew and spoils the SET protection.

Clock spread dilution by wrong hold fix
[Figure: STMR triplet. D fans out (D1, D2, D3) to FF1, FF2, FF3, clocked by clk1, clk2, clk3 from clock trees 1 to 3; Q1, Q2, Q3 feed the majority voter. A SET pulse is latched into FF1 only and Q remains at the correct value.]
[Plot: distribution of the effective skew [T(clk2) - T(d2)] - [T(clk1) - T(d1)], i.e. the difference between clock and data arrival in each TMR triplet.]
- Before hold-fix: well pronounced peak at δeff = δnominal
- Clock skew creates many hold violations
- After wrong hold-fix: two maxima (with and without delay insertion)
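The effective skew seen by a triplet depends on the data arrival times as well as on the clock arrival times. The Python sketch below evaluates the expression from the plot for one pair of flip-flops and shows how a buffer inserted on only one data branch cancels the skew. Arrival times are invented for the example.

```python
def effective_skew(t_clk1, t_d1, t_clk2, t_d2):
    """[T(clk2) - T(d2)] - [T(clk1) - T(d1)] for two flip-flops of a triplet."""
    return (t_clk2 - t_d2) - (t_clk1 - t_d1)

# Hypothetical arrival times in ns, nominal skew of 0.5 ns between clk1 and clk2
before = effective_skew(t_clk1=1.0, t_d1=0.4, t_clk2=1.5, t_d2=0.4)
# A hold-fix buffer of 0.5 ns inserted only in front of FF2 delays d2
after = effective_skew(t_clk1=1.0, t_d1=0.4, t_clk2=1.5, t_d2=0.9)
print(f"effective skew before: {before:.2f} ns, after wrong hold fix: {after:.2f} ns")
```

With the buffer in place, a single SET at the common data input reaches both flip-flops at the same point relative to their clocks, so the triplet behaves like plain TMR again.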

Correct hold fix
[Figure: two TMR triplets FFA1..3 and FFB1..3 with voters, clocked by clk1, clk2, clk3; t_prop and t_setup marked.]
Group the FF belonging to the same triplet and set dont_touch: the SET protection through clock skew is conserved.

Scan Path Insertion (wrong)
[Figure: two TMR triplets FFA1..3 and FFB1..3 on clk1, clk2, clk3. The scan chain runs through each triplet in turn (si1 -> FFA1, qa1 -> si2 of FFA2, qa2 -> si3 of FFA3, and likewise for FFB1..3), crossing the sub-clock domains.]
Scan path routing across sub-clock domains leads to hold violations.

Scan Path Insertion (right)
[Figure: one scan chain per clock. si1 -> FFA1 -> qa1 -> sib1 -> FFB1 -> qb1 on clk1; si2 -> FFA2 -> qa2 -> sib2 -> FFB2 -> qb2 on clk2; si3 -> FFA3 -> qa3 -> sib3 -> FFB3 -> qb3 on clk3.]
Better: one scan path per sub-clock domain.

Protection of SRAM blocks (parity)
- XOR parity bits
  - Employed for a long time, also in ground-based computers
- Error handling: correction/reload by HW state machine or software (reboot)
  - Loss of data, unless redundant data is available elsewhere in the system
    - Cache memories (duplicates in external memory): cache miss on parity error
    - Duplicated memories (e.g. a 3-port register file composed of two 2-port memories)
- Error detection while processing possibly corrupt data: normally no timing penalty
  - Only in the error case: copy correct data from the replica and repeat processing
[Figure: control state machine, ALU, and a register file built from two copies (RF 1/2, RF 2/2), each with parity (PAR), feeding the error detection logic.]
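A minimal Python sketch of XOR parity protection follows. It assumes one even-parity bit per word and uses hypothetical function names. It also shows the main limitation: parity detects an odd number of bit flips but cannot correct them, and an even number of flips goes unnoticed.

```python
def parity(word):
    """Even-parity bit: XOR of all bits of the word."""
    return bin(word).count("1") & 1

def store(word):
    """Return the word together with its parity bit, as written to memory."""
    return word, parity(word)

def check(stored):
    """True if the stored word still matches its parity bit."""
    word, p = stored
    return parity(word) == p

word, p = store(0b1011)
upset = (word ^ 0b0100, p)          # single bit flip: detected
double = (word ^ 0b0110, p)         # double bit flip: not detected
print(check((word, p)), check(upset), check(double))
```

On a detected error, the system must fetch a good copy from elsewhere (external memory for a cache, the replica for a duplicated register file), as listed above.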

Protection of SRAM blocks (EDAC)
- EDAC = Error Detection And Correction
- ECC (Error Correcting Codes)
  - Hamming codes to correct single bit flips per word
  - EDAC VHDL package from ESA: http://www.esa.int/tec/microelectronics/semh0x8l6ve_0.html
  - Reed Solomon for multiple bit upsets (MBU) in SDRAM
- Scrubbing required to prevent error accumulation
- Control state machine to rewrite corrected data
- Timing penalty: start processing with uncorrected data and abort processing (rewind pipeline) in case of error
- Example ACTEL core: www.actel.com/documents/edac_an.pdf
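To illustrate single-error correction, here is a textbook Hamming(7,4) code in Python. It is not the ESA EDAC package or the Actel core mentioned above, whose word sizes and interfaces differ; it only shows how a syndrome locates the flipped bit.

```python
def hamming74_encode(data):
    """Encode 4 data bits into 7 bits at positions 1..7: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(code):
    """Return (data bits, error position); position 0 means no error found."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3     # syndrome = 1-based index of the flipped bit
    if pos:
        c[pos - 1] ^= 1            # correct the single bit error
    return [c[2], c[4], c[5], c[6]], pos

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                        # inject an SEU at position 5
print(hamming74_correct(word))
```

Two upsets in the same word defeat this code, which is why scrubbing is needed to rewrite corrected data before errors accumulate.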

Protection of SRAM blocks (TMR)
- Triplicated memory (Xilinx)
- Scrubbing in background using the spare port of a dual-port memory
- Huge area overhead
- Also efficient against configuration upset
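A behavioural Python sketch of a triplicated memory with bitwise voting and background scrubbing is given below. The class and method names are hypothetical, and a real implementation would scrub one address at a time through the spare memory port rather than in a loop.

```python
def vote(a, b, c):
    """Bitwise majority of three words."""
    return (a & b) | (b & c) | (a & c)

class TmrMemory:
    """Three copies of a small memory, voted on every read."""

    def __init__(self, size):
        self.copies = [[0] * size for _ in range(3)]

    def write(self, addr, word):
        for copy in self.copies:
            copy[addr] = word

    def read(self, addr):
        return vote(*(copy[addr] for copy in self.copies))

    def scrub(self):
        """Rewrite every address with its voted value to remove latent upsets."""
        for addr in range(len(self.copies[0])):
            self.write(addr, self.read(addr))

mem = TmrMemory(4)
mem.write(1, 0xA5)
mem.copies[0][1] ^= 0x01        # first SEU, masked by the voter
mem.scrub()                     # repairs copy 0
mem.copies[1][1] ^= 0x01        # second SEU on the same bit in another copy
print(hex(mem.read(1)))
```

Without the scrub between the two upsets, two copies would agree on the wrong bit and the vote would fail.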

SEU in reprogrammable FPGA (RFPGA)
- Increasing interest for SRAM based RFPGA
  - Lower NRE cost than ASIC
  - In-flight reconfiguration capability
  - High performance and complexity allowing System-On-FPGA
- SEU in configuration memory
  - Affect not only user data or state (as in ASIC)
  - but alter the functionality of the circuit itself
  - turn the direction of I/O pins
- SEU mitigation for RFPGA
  - Configuration scrubbing or read-back and partial reconfiguration
  - Triplication of registers and combinatorial logic
  - Voting of logical feedback paths
  - Redundancy for user memory
  - Voting of the outputs
  - Triplication of I/Os

TMR for SRAM FPGA
[Figure, three panels:]
- Plain sequential and combinatorial logic
- Standard TMR with single voters: not for SRAM FPGA
- TMR for sequential and combinatorial logic and voters

SEU mitigation in reprogrammable FPGA
- SEE mitigation by design for commercial RFPGA
  - Functional Triple Modular Redundancy (FTMR): combinatorial and sequential triplication and voting implemented in VHDL source code
    http://microelectronics.esa.int/techno/reprofpga.htm
  - Xilinx TMR tool: triplication of combinatorial and sequential logic and IOs, and feedback voters
    http://www.xilinx.com/ise/optional_prod/tmrtool.htm
- SEE hard reprogrammable FPGA
  - Atmel AT40KEL and the ATF280 FPGA under CNES contract
    http://www.atmel.com/dyn/products/devices.asp?family_id=641#1477
  - Xilinx Virtex-5QV (SIRF = SEU Immune Reconfigurable FPGA)
    http://www.xilinx.com/products/virtex5qv/
  - Actel RT ProASIC3 flash based FPGA
    http://www.actel.com/products/milaero/rtpa3/default.aspx
  - JAXA/CNES/Atmel development, 450kG SRAM based FPGA on 150 nm SOI
    https://eeepitnl.tksc.jaxa.jp/jp/event/mews/22nd/data/16_12_1.pdf

Verification of SEE hardening
- TMR or hardened cells are larger and slower than soft FF
  - Redundancy removed by logic optimisation (synthesis and back-end)
  - TMR modified by timing optimisation
- Defects in redundant structures do not appear at simulation
  - TMR simulation works even if only two of the three FF are correct
- How do we know if the hardening concept was properly implemented?
- Structural and formal verification, timing analysis
  - Presence of triple FF, correct wiring of the three clock/reset domains
  - Parsing the netlist with scripts (grep)
  - Increasing complexity requires formal verification tools
- Fault simulation and injection
  - Functional impact of tolerated SEU
- Ground radiation testing
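A script-based structural check of the kind mentioned above could look like the Python sketch below. It assumes a purely hypothetical naming convention in which the three flip-flops of a triplet carry the suffixes _tmr0, _tmr1 and _tmr2; real netlists will need a pattern adapted to the library and the synthesis tool. The script only reports registers for which fewer than three copies survived optimisation.

```python
import re
from collections import defaultdict

def incomplete_triplets(netlist, pattern=r"(\w+)_tmr([012])\b"):
    """Return the base names of registers that lack one or more TMR copies."""
    found = defaultdict(set)
    for base, index in re.findall(pattern, netlist):
        found[base].add(int(index))
    return sorted(base for base, copies in found.items() if copies != {0, 1, 2})

netlist = """
dff state_tmr0 (.clk(clk[0]), .d(d), .q(q0));
dff state_tmr1 (.clk(clk[1]), .d(d), .q(q1));
dff state_tmr2 (.clk(clk[2]), .d(d), .q(q2));
dff count_tmr0 (.clk(clk[0]), .d(e), .q(c0));
dff count_tmr1 (.clk(clk[1]), .d(e), .q(c1));
"""
print(incomplete_triplets(netlist))
```

A name-based check like this says nothing about voter wiring or clock domain assignment, which is where dedicated structural and formal tools come in.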

InFault: Intelligent Fault Analysis
- C++ SW to recognise TMR in a netlist and validate its correctness
  Simon Schulz, Giovanni Beltrame, David Merodio Codinachs: Smart Behavioural Netlist Simulation for SEU Protection Verification
  http://microelectronics.esa.int/papers/simonschulzinfault.pdf
- Main algorithm steps
  - Netlist parser (Verilog, EDIF)
  - Creates an untimed graph representation of the logic
  - Detects TMR triplets and voters
  - Checks TMR and voting logic
  - Checks the (triplicated) clock and reset trees

SST: the SEU Simulation Tool
- TCL package to inject SEU into flip-flops during Modelsim simulation
  http://microelectronics.esa.int/asic/sst-functionaldescription1-3.pdf
  http://www.nebrija.es/~jmaestro/esa/sst.htm
[Figure: block diagram. The test bench drives the inputs of the VHDL design under test (DUT) in the Modelsim VHDL simulator. The SST fault injection management issues command requests and injects SEUs. A comparator checks the responses (behaviour with SEUs) against the gold behaviour and writes a log.]

FT-UNSHADES
Fault Tolerance University Of Sevilla Hardware Debugging System
- SEU injection into flip-flops based on FPGA partial reconfiguration
  http://microelectronics.esa.int/fiws/wfift_p9_ft-unshades.ppt (or .ppsx)
  http://walle.us.es/ftunshades/
[Figure: emulator block diagram. Test vector memories (2Mx102) and a control FPGA (Spartan II-50) supply stimuli and the system clock to the system FPGA (Virtex II 6000 or 8000), which holds a gold and a faulty copy of the DUT design and a comparator.]

FLIPPER
Injection platform for SRAM based FPGA
- Injection into Xilinx Virtex2 configuration RAM (Virtex4 in preparation)
  http://microelectronics.esa.int/fiws/wfift_p8_alderighi_flipper.pdf
  http://cosy.iasf-milano.inaf.it/flipper_index.htm

STAR / RORA: SEE protection of SRAM FPGA layout
- CAD tools to analyze and improve the layout of SRAM based FPGA
  http://www.cad.polito.it/research/fpga/field%20programmable%20gate%20array.html
  http://microelectronics.esa.int/fiws/wfift_p6_analysis_scu_mbu.pdf
  http://microelectronics.esa.int/fiws/wfift_p7_mitigation_of_scu_and_mcu.pdf
- STAR: STatic AnalyzeR
  - Identify bits which are sensitive in spite of full TMR, e.g. bits causing faults in two TMR domains
- RORA: Reliability-Oriented Routing Algorithm
  - Modify the layout (place and route) to fix the sensitive bits identified by STAR
- Supports all Xilinx devices, Spartan2 to Virtex4

SUSANNA / JONATHAN targeting Atmel FPGA
- Fault tolerance analysis of designs on Atmel AT40k and ATF280 FPGA
  http://microelectronics.esa.int/papers/susannajonathansemifinal.pdf
- SUSANNA: identifies the sensitive bits of the configuration bit stream. Does a bit flip modify the design?
- JONATHAN: correlates the sensitive bits with a given instance in the design and identifies the most sensitive modules in the design
- Improved place & route algorithms under development

Ground radiation testing
Radiation facilities in use by ESA: https://escies.org/readarticle?docid=230
- Co-60 at ESA/ESTEC, Netherlands (total dose)
- Californium-252 at ESA/ESTEC, Netherlands
- Paul Scherrer Institut (PSI), Switzerland: proton irradiation
- Louvain la Neuve (UCL), Belgium: heavy ions and protons
- Jyväskylä University, Finland: heavy ions and protons

Conclusion
- SEE mitigation requires a sound Radiation Hardening Approach
  - Identify dependability requirements and environmental conditions
  - Perform radiation analysis to define the hardening concept
- Is 100% protection of every element always necessary?
  - Determine the impact of an upset at system level
  - Sometimes, selective use of SEE protection is sufficient
- Implement, design, verify during design time, validate the final result
- ASIC libraries with hardened elements (flip-flops, buffers)
  - TMR allows using commercial cell libraries, but it is difficult to implement with commercial EDA tools
  - Hardened library cells are easier to use
- SRAM reprogrammable FPGA require different hardening concepts
- Thorough verification of the radiation hardening is required
  - Redundancy might be removed by EDA tools
  - Numerous tools exist to verify and validate the designs

Contact: Roland.Weigand [at] esa.int
http://www.esa.int/tec/microelectronics/
http://microelectronics.esa.int/papers/twepp2010-rw.pdf
Questions?