Soft Error Derating Computation in Sequential Circuits

Hossein Asadi
Northeastern University, ECE Dept.
Boston, MA 02115
gasadi@ece.neu.edu

Mehdi B. Tahoori
Northeastern University, ECE Dept.
Boston, MA 02115
mtahoori@ece.neu.edu

ABSTRACT
Soft error tolerant design becomes more crucial due to the exponential increase in the vulnerability of computer systems to soft errors. Accurate estimation of the soft error rate (SER), the probability of system failure due to soft errors, is a key factor in the design of cost-effective soft error resilient systems. We present a very fast and accurate approach based on enhanced static timing analysis and signal probabilities to estimate the probability of latching an incorrect value in the system bistables (timing derating). Experimental results and comparison with fault injections using timing-accurate Monte-Carlo simulations show that the accuracy of our approach is within 1% while being orders of magnitude faster.

Categories and Subject Descriptors
B.2.3 [Performance and Reliability]: Reliability, Testing, and Fault-Tolerance

1. INTRODUCTION
Improvements in device scaling, transistor density, and system speed of CMOS technology come at the expense of increased vulnerability of these systems to soft errors. Soft errors, also known as Single Event Upsets (SEUs), are the main reliability threat to digital systems. Soft Error Rate (SER) is defined as the system failure rate due to SEUs. Even if the SER per bit remains constant with technology scaling, the SER per chip will increase exponentially due to the increase in the number of transistors per chip, i.e. Moore's law. Recent studies show that the soft error vulnerability of combinational logic components will soon become comparable with that of sequential elements (SRAM cells, flip-flops, and latches) [5]. Accurate SER estimation is essential to develop efficient soft error tolerant schemes and to determine the contribution of design components to the overall system SER.

An erroneous system state occurs in the following scenario [4].
A particle strike causes a glitch at the output of the gate (Nominal FIT), this glitch propagates through the combinational logic to the flip-flop inputs (Logic Derating), and finally this erroneous glitch is captured in a flip-flop, i.e. the erroneous transient must have sufficient overlap with the latching window of the flip-flop (Timing Derating). Therefore, the error rate of a node in a digital circuit is computed as Nominal FIT × Logic Derating × Timing Derating.

Unlike logic derating estimation, which only requires static analysis, estimation of timing derating needs dynamic analysis of transient propagation. Specifically, for logic derating estimation based on fault injection, a sample of fault sites (e.g. gate outputs) is selected and, for each error site, a sample of input vectors is fault simulated. However, for timing derating estimation, a new dimension is added in which the erroneous transient pulse at the fault site has to be injected at a random time within the clock period. As a result, fault injection for timing derating estimation is orders of magnitude more tedious and less accurate than logic derating estimation.

This work focuses on estimation of the timing derating factor in sequential circuits in the SER estimation flow. We present an analytical technique for logic-timing derating estimation which eliminates the need for time-consuming (fault) simulations. The proposed technique is based on an enhanced static timing analysis method to compute all propagated waveforms from a struck gate (error site) to reachable flip-flops and calculate the probability of latching an incorrect value in a flip-flop. We also exploit a technique based on signal probability values to estimate the propagation probabilities of erroneous values (or transient pulses) from the error site to reachable latches and flip-flops. Algorithms for the estimation of the latching probabilities of erroneous transients are provided. We also analyze the dependency of the overall accuracy of the proposed method on the accuracy of signal probability values.

The rest of this paper is organized as follows. Sec.
2 reviews the previous work on SER estimation techniques. In Sec. 3, the proposed logic and timing derating estimation approach is presented. In Sec. 4, experimental results are presented. Finally, Sec. 5 concludes the paper.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICCAD '06, November 5-9, 2006, San Jose, CA
Copyright 2006 ACM 1-59593-389-1/06/0011...$5.00.

2. PREVIOUS WORK
Previous logic-timing derating estimation methods can be categorized into two groups. The first group uses fault injection based on random vector simulation approaches [3, 4, 6, 8]. Since the accuracy of such approaches depends on the ratio of the number of injected faults and simulated vectors to the total number of possible error sites and vector space, it is very hard to achieve reasonable accuracy using these
techniques. The execution time for logic derating estimation of a node in large circuits increases exponentially with the size of the circuit. Hence, logic derating estimation of larger circuits becomes intractable and very inaccurate using fault injection techniques.

The second group uses an approach based on Binary Decision Diagrams (BDDs) for SER estimation [7]. Although this approach might be able to achieve more accurate results compared to simulation-based methods, it still has exponential time complexity for large circuits, especially with reconvergent fanouts.

An analytical approach to accurately estimate static logic derating in combinational circuits was proposed in our previous work [1, 2]. The proposed method gives linear computational time complexity and computes the logic derating factor orders of magnitude faster than simulation-based methods.

3. TIMING-LOGIC DERATING ESTIMATION
If a particle with sufficient energy hits a particular gate and causes a bit flip at the output of this gate, we call this gate the error site. Based on structural paths from the error site to the reachable primary outputs and flip-flops, we can categorize nets (signal lines) and gates in the circuit as follows [1, 2]. An on-path signal is a net on a path from the error site to a reachable output. Also, an on-path gate is defined as a gate with at least one on-path input. An off-path signal is a net that is not on-path and is an input of an on-path gate.

Assume that a particle strike creates a full swing glitch with pulse width $w$ at time $t$ at the output of gate $g_i$, as shown in Fig. 1. Also, assume that there is only one path from this gate to flip-flop $j$. Depending on the value of other signals in the circuit, this erroneous transient may or may not propagate to the input of $j$. If it propagates, then a glitch with width $w'$ at time $t'$ will appear at the input of this flip-flop. $t'$ depends on the propagation delay along the path from $g_i$ to $j$, and $w'$ depends on the various rise and fall transition delays for the gates along this path.
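The dependence of $t'$ and $w'$ on the path delays can be illustrated with a small sketch. It assumes a simple model that the paper does not spell out: each gate has one delay for rising and one for falling output transitions, the leading and trailing edges of the glitch are delayed independently, and a glitch whose width collapses to zero is dropped. The `Gate` class, `propagate_glitch`, and all delay values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gate:
    rise_delay: float        # delay of a rising transition at the gate output
    fall_delay: float        # delay of a falling transition at the gate output
    inverting: bool = False  # True for NOT/NAND/NOR-like gates


def propagate_glitch(t: float, w: float, first_edge_rising: bool,
                     path: List[Gate]) -> Optional[Tuple[float, float]]:
    """Return (t', w') at the end of a single path, or None if the glitch vanishes."""
    lead, trail = t, t + w
    rising = first_edge_rising
    for gate in path:
        if gate.inverting:
            rising = not rising  # polarity of the leading edge at this gate's output
        lead += gate.rise_delay if rising else gate.fall_delay
        trail += gate.fall_delay if rising else gate.rise_delay
        if trail <= lead:        # assumed filtering rule: the pulse collapsed
            return None
    return lead, trail - lead


# Hypothetical two-gate path: a buffer-like gate followed by an inverting gate.
example_path = [Gate(1.0, 2.0), Gate(1.5, 1.0, inverting=True)]
arrival = propagate_glitch(0.0, 3.0, True, example_path)
```

In this model $t'$ accumulates the leading-edge delays, while $w'$ grows or shrinks by the difference between the trailing-edge and leading-edge delays at each gate.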
[Figure 1: Propagation of a transient through a unique path to a flip-flop. The SEU strikes gate A; the path continues through gates D (AND) and E (OR); the off-path signals are the outputs of gates B and C, with $SP_B = 0.2$ and $SP_C = 0.4$.]

For computing the propagation probability (PP) of the erroneous glitch, we use the estimation method presented in our previous work [1, 2]. Consider the example shown in Fig. 1, in which there is only one path from the error site to an output. As we traverse this path gate by gate, the propagation probability from an on-path input of a gate to its output depends on the type of the gate and the signal probability of the other off-path signals. In this example, the propagation probability for the glitch to propagate to the output of gate D (AND gate) is the product of the probability of the output of gate B being 1 and the propagation probability at its input ($1 \times 0.2 = 0.2$). Similarly, the propagation probability at the output of gate E (OR gate) is calculated as $0.2 \times (1 - SP_C) = 0.2 \times 0.6 = 0.12$. Note that we assume that the values of all signals other than on-path signals are stable, i.e. no other signal is making a transition. This assumption is used throughout the paper.

The latching probability (LP) is defined as the probability that an erroneous value is captured in a reachable flip-flop. Once the duration of the propagated erroneous glitch at the input of a flip-flop is obtained, LP can be calculated based on the setup ($S$) and hold ($H$) time of the flip-flop, the glitch width ($W$), and the clock period ($T$): $LP = \frac{S + H + W}{T}$. Figure 2 shows the latching window. The error propagation probability (EPP) is calculated as the product of the propagation probability and the latching probability, i.e. $EPP = PP \times LP$.

[Figure 2: Latching window. The overlap window is $W + S + H$ within the clock period $T$.]

In general, there can be multiple paths from gate $g_i$ (error site) to flip-flop $j$. In this case, there is at least one gate along the path in which the transient appears on at least two inputs of that gate. In this situation, the shape of the propagated erroneous waveform due to a simple glitch at the output of $g_i$ may not be a simple glitch.
The shape of the propagated waveform depends on the particular paths which propagate the transient and the relative propagation delays of these paths. Figure 3 shows an example in which there are multiple paths from the error site to the flip-flop. There are three possible propagation scenarios: 1) propagation through only the NAND gate, 2) propagation through only the OR gate, and 3) propagation through both paths. Even if we consider a simple gate delay model (the delay of each gate is shown inside the gate in this figure), there are five possible waveforms that can appear at the input of the flip-flop, plus one case of no propagation at all. The top waveform at the input of the flip-flop is due to propagation through only the NAND gate. If the output of the OR gate is 0, then the same waveform is propagated since the reconvergent gate is an XOR. If the output of the OR gate is 1, then the inverted waveform will be propagated (not shown). If the waveform is propagated through both paths, then the shape of the waveform is not a single glitch (middle waveform). Finally, if the glitch is propagated through only the OR gate, then the bottom waveform or its inverse will appear at the input of the flip-flop, depending on the output value of the NAND gate.

[Figure 3: Propagation of a transient through reconvergent paths. The SEU propagates through a NAND gate and an OR gate that reconverge at an XOR gate; gate delays are shown inside the gates, and the resulting waveforms (top, middle, bottom, and inverted variants) are drawn at the flip-flop input.]
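For the single-path case of Fig. 1, the PP, LP, and EPP computations described earlier can be sketched as follows. The signal probabilities are those of Fig. 1; the setup time, hold time, glitch width, and clock period are hypothetical values in arbitrary units, and clamping LP to 1 is an added assumption for glitches wider than the clock period. The function names are illustrative only.

```python
from typing import Sequence


def and_pp(pp_in: float, off_path_sps: Sequence[float]) -> float:
    """AND gate: the glitch passes only if every off-path input is 1."""
    pp = pp_in
    for sp in off_path_sps:
        pp *= sp
    return pp


def or_pp(pp_in: float, off_path_sps: Sequence[float]) -> float:
    """OR gate: the glitch passes only if every off-path input is 0."""
    pp = pp_in
    for sp in off_path_sps:
        pp *= 1.0 - sp
    return pp


def latching_probability(setup: float, hold: float,
                         width: float, period: float) -> float:
    """LP = (S + H + W) / T, clamped to 1 (the clamp is an assumption)."""
    return min(1.0, (setup + hold + width) / period)


# Fig. 1: PP at the error site is 1, SP_B = 0.2 feeds AND gate D,
# SP_C = 0.4 feeds OR gate E.
pp_d = and_pp(1.0, [0.2])
pp_e = or_pp(pp_d, [0.4])

# Hypothetical timing parameters (arbitrary units, not from the paper).
lp = latching_probability(setup=0.05, hold=0.05, width=0.10, period=1.0)
epp = pp_e * lp
```

With these numbers the sketch reproduces the propagation probabilities 0.2 and 0.12 quoted for gates D and E; the LP and EPP values depend entirely on the assumed timing parameters.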
The example of Fig. 3 shows that, depending upon the possible propagation paths from the error site to a reachable flip-flop, various waveforms can appear at the input of the flip-flop. For each propagation scenario, the error probability is the product of the propagation probability and the latching probability for that particular case. The overall EPP is calculated as follows:

$EPP_{g_i \to j} = 1 - \prod_{k \,\in\, \text{all propagated waveforms}} (1 - PP_k \times LP_k)$

In the following subsections, we explain how to compute all possible erroneous waveforms and their corresponding propagation probabilities.

3.1 Propagation Probability
For estimation of the propagation probability, we use an approach similar to [1, 2]. Here we explain how to perform static error propagation analysis. In Sec. 3.2, this is expanded for dynamic error propagation analysis, i.e. propagation of erroneous transients (glitches).

In a general network of logic gates in which there are reconvergent paths from an error site to a particular reachable flip-flop or primary output, the polarities of propagated erroneous values, with respect to the erroneous value at the error site, must be considered. Therefore, the propagation probability from the error site to the output of a reconvergent gate depends not only on the type of the gate and the signal probabilities of the off-path signals, but also on the polarities of the propagated error on the on-path signals. In the presence of errors, the status of each signal can be expressed with four values:

0: no error is propagated to this signal line and the signal has an error-free value of 0.
1: no error is propagated to this signal line and it has a logic value of 1.
a: the signal has an erroneous value with the same polarity as the original erroneous value at the error site (denoted by a).
ā: the signal has an erroneous value, but the erroneous value has the opposite polarity compared to the erroneous value at the error site (denoted by ā).

Based on this four-value logic, we can redefine propagation rules for each logic gate.
These probabilities, denoted by $P_a(U_i)$, $P_{\bar{a}}(U_i)$, $P_1(U_i)$, and $P_0(U_i)$, are explained as follows:

$P_a(U_i)$ ($P_{\bar{a}}(U_i)$) is the probability that the erroneous value is propagated from the error site to $U_i$ with an even (odd) number of inversions.
$P_1(U_i)$ ($P_0(U_i)$) is the probability of node $U_i$ being 1 (0). In this case, the error is masked and not propagated.

Note that $P_a(U_i) + P_{\bar{a}}(U_i) + P_1(U_i) + P_0(U_i) = 1$. Since the polarities of propagated errors are considered, propagation probabilities at the output of reconvergent gates are correctly calculated. The propagation computation rules for the elementary gates AND, OR, and NOT are shown in Table 1. Propagation rules for other logic gates can be derived accordingly. These propagation rules are applied to each gate reachable from the error site in level-by-level order. Therefore, all error propagation probabilities can be calculated in only one pass. More details can be found in [1, 2].

Table 1: Output propagation probability rules for elementary gates

GATE   RULE
AND    $P_1(out) = \prod_{i=1}^{n} P_1(X_i)$
       $P_a(out) = \prod_{i=1}^{n} [P_1(X_i) + P_a(X_i)] - P_1(out)$
       $P_{\bar{a}}(out) = \prod_{i=1}^{n} [P_1(X_i) + P_{\bar{a}}(X_i)] - P_1(out)$
       $P_0(out) = 1 - [P_1(out) + P_a(out) + P_{\bar{a}}(out)]$
OR     $P_0(out) = \prod_{i=1}^{n} P_0(X_i)$
       $P_a(out) = \prod_{i=1}^{n} [P_0(X_i) + P_a(X_i)] - P_0(out)$
       $P_{\bar{a}}(out) = \prod_{i=1}^{n} [P_0(X_i) + P_{\bar{a}}(X_i)] - P_0(out)$
       $P_1(out) = 1 - [P_0(out) + P_a(out) + P_{\bar{a}}(out)]$
NOT    $P_1(out) = P_0(input)$, $P_0(out) = P_1(input)$
       $P_a(out) = P_{\bar{a}}(input)$, $P_{\bar{a}}(out) = P_a(input)$

3.2 Latching Probability
The objective here is to compute all possible erroneous waveforms at the input of each reachable flip-flop $j$ due to a glitch (with a particular width $w$) at the output of gate $g_i$ (error site) caused by an SEU. Note that the initial transient pulse width can be determined based on the energy of the particle (the amount of injected charge), the type and size of the gate, and the technology parameters.

A glitch at the output of gate $g_i$ starting at time $t$ with pulse width $w$ can be expressed as two transition events on the error site, at times $t$ and $t + w$, respectively. Depending upon the polarity of the glitch, the first event is a rising (falling) and the second event is a falling (rising) transition.
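The four-valued propagation rules of Table 1 can be sketched as follows. The `Sig` tuple and the function names are hypothetical; the off-path signal probabilities in the example are those of Fig. 1, and the error site is modeled as carrying the value a with probability 1.

```python
from math import prod
from typing import NamedTuple, Sequence


class Sig(NamedTuple):
    p0: float     # P_0: error-free 0
    p1: float     # P_1: error-free 1
    pa: float     # P_a: error with the polarity of the error site
    pabar: float  # P_a-bar: error with the opposite polarity


def and_gate(xs: Sequence[Sig]) -> Sig:
    p1 = prod(x.p1 for x in xs)
    pa = prod(x.p1 + x.pa for x in xs) - p1
    pabar = prod(x.p1 + x.pabar for x in xs) - p1
    return Sig(1.0 - (p1 + pa + pabar), p1, pa, pabar)


def or_gate(xs: Sequence[Sig]) -> Sig:
    p0 = prod(x.p0 for x in xs)
    pa = prod(x.p0 + x.pa for x in xs) - p0
    pabar = prod(x.p0 + x.pabar for x in xs) - p0
    return Sig(p0, 1.0 - (p0 + pa + pabar), pa, pabar)


def not_gate(x: Sig) -> Sig:
    return Sig(p0=x.p1, p1=x.p0, pa=x.pabar, pabar=x.pa)


def error_free(sp: float) -> Sig:
    """Off-path signal with signal probability sp (probability of being 1)."""
    return Sig(p0=1.0 - sp, p1=sp, pa=0.0, pabar=0.0)


ERROR_SITE = Sig(p0=0.0, p1=0.0, pa=1.0, pabar=0.0)

# Fig. 1 example: AND gate D with SP_B = 0.2, then OR gate E with SP_C = 0.4.
d = and_gate([ERROR_SITE, error_free(0.2)])
e = or_gate([d, error_free(0.4)])
```

For the Fig. 1 path this yields $P_a = 0.2$ at gate D and $P_a = 0.12$ at gate E, matching the single-path computation, and the four probabilities at each node sum to 1.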
We use a modified version of static timing analysis in which we compute all events at the outputs of all on-path gates due to these two events at the error site. Each event is described as a pair of time and polarity (falling or rising). Since the error-free state of gate $g_i$ is a statistical variable, the erroneous transient could be either a positive or a negative glitch. Therefore, the injected glitch can be expressed by two events as follows. The first event can be either a falling or a rising transition. The second event has to be the opposite of the first event. This way, an erroneous transient can be described without specifying the error-free state of $g_i$.

We use notation similar to that of Sec. 3.1. We denote the first event of the glitch as a and the second event as ā (the opposite of the first event). So, we put the events (a, $t$) and (ā, $t + w$) at the output of gate $g_i$ to represent an erroneous transient with pulse width $w$. The events are propagated level by level, based on their distance from $g_i$. The level of each gate is defined as one plus the maximum level of its inputs, assuming that the level of $g_i$ is zero. The same propagation rules presented in Table 1 are used, starting from the error site and proceeding to all reachable flip-flops. However, we need to apply these propagation rules to timed events. The gates are processed in increasing order of their levels. The events at the output of each gate can be determined based on the events at its inputs, the type of the gate, and the gate delay model. This way, we can calculate the event list Event_List($g$) for each on-path gate $g$.

Once the event list at the input of each reachable flip-flop $j$ is calculated, we can generate all possible waveforms that can result from the propagation of [(a, $t$), (ā, $t + w$)] at $g_i$. A propagated waveform at the input of $j$ can be obtained from a series of a to ā events (or alternatively from ā to a events) in Event_List($j$). By enumerating all such series, all propagated waveforms will be calculated. As an
example, consider the following event list at a flip-flop input: {(a, $t_1$), (ā, $t_2$), (a, $t_3$), (ā, $t_4$)}, where $t_1 < t_2 < t_3 < t_4$. Possible waveforms include [(a, $t_1$), (ā, $t_2$)], [(a, $t_3$), (ā, $t_4$)], [(a, $t_1$), (ā, $t_4$)], [(ā, $t_2$), (a, $t_3$)], and [(a, $t_1$), (ā, $t_2$), (a, $t_3$), (ā, $t_4$)]. However, [(a, $t_1$), (ā, $t_2$), (a, $t_3$)] is not a valid waveform since it starts and ends with a events. Figure 4 shows an example of this approach to propagate all events from the error site to the reachable flip-flop.

Since all possible events will be considered in the event list of each gate, one could argue that the size of this list could be excessively large. We looked at the maximum size of the event lists for some of the simulated circuits in our experiments. Our results show that the maximum size of event lists for ISCAS'89 benchmark circuits varies from 13 (for s298) to 217 (for s35932). Therefore, the size of the event lists is tractable.

3.3 Algorithm
Algorithm 1 shows the overall procedure as explained in Sec. 3.2. For each gate (considered as an error site) in the circuit, all its structurally reachable flip-flops are extracted. The event list, as well as the probability of each event, is propagated from the error site to reachable flip-flops. Based on the event list at the input of each flip-flop, the possible waveforms are computed to obtain propagation and latching probabilities. Timing-logic derating due to SEUs at this error site is calculated based on these probabilities. The overall circuit derating can be obtained based on the derating of each individual gate. Note that the glitch pulse width must be specified as an input to this procedure.
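The waveform enumeration illustrated by the four-event example can be sketched as follows. The sketch rests on one reading of the validity rule, namely that a valid waveform is a time-ordered selection of events that alternates in polarity and starts and ends with opposite events (hence has even length). It is a brute-force enumeration meant only for short event lists, and the function name and event encoding are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Event = Tuple[str, float]  # (polarity, time), polarity is "a" or "a_bar"


def valid_waveforms(events: List[Event]) -> List[List[Event]]:
    """Enumerate even-length, polarity-alternating selections of timed events."""
    ordered = sorted(events, key=lambda ev: ev[1])
    waveforms = []
    for size in range(2, len(ordered) + 1, 2):  # even sizes only
        for subset in combinations(ordered, size):
            alternating = all(x[0] != y[0] for x, y in zip(subset, subset[1:]))
            if alternating:
                waveforms.append(list(subset))
    return waveforms


# The event list from the example, with t1..t4 set to 1..4 for illustration.
event_list = [("a", 1.0), ("a_bar", 2.0), ("a", 3.0), ("a_bar", 4.0)]
waveforms = valid_waveforms(event_list)
```

On this input the sketch returns the five waveforms listed in the example and rejects the three-event selection, since an odd-length alternating selection starts and ends with the same polarity.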
Algorithm 1: Timing Derating Computation

  w: Glitch-Width
  TD: Timing-Derating factor
  for each gate G_i do
    List(G_i) ← Extract on-path gates reachable from G_i
    List(G_i) ← Sort List(G_i) based on distance from G_i
    Event_List(G_i) ← Add_Event(a, time = t);
    Event_List(G_i) ← Add_Event(ā, time = t + w);
    for each gate G_j in List(G_i) do
      for each input (k) of gate G_j do
        Event_List(G_j).Add_event_list(k);
      end
      for each event E in Event_List(G_j) do
        Apply_propagation_rules(E); /* see Table 1 */
      end
    end
    TD(G_i) ← 1;
    for each flip-flop (j) in List(G_i) do
      TD_{G_i→j} ← 0;
      for each valid waveform (p_k) in Event_List(j) do
        PP_k ← Propagation_Probability(p_k);
        LP_k ← Latching_Probability(p_k);
        TD_{G_i→j} ← TD_{G_i→j} + PP_k × LP_k;
      end
      TD(G_i) ← TD(G_i) × (1 − TD_{G_i→j});
    end
    TD(G_i) ← 1 − TD(G_i);
  end

3.4 Electrical Masking Effect
The magnitude (height) of the erroneous glitch can be attenuated while propagating through logic stages. This is known as electrical masking and affects SER. To consider this effect, logic library cells can be pre-characterized for different particle charge values. For each library cell, an electrical attenuation factor lookup table can be obtained based on cell fanout capacitance and SEU pulse width and height. The logical masking factor needs to be multiplied by the attenuation factor when computing $P_a(U_i)$ and $P_{\bar{a}}(U_i)$. Just as the propagation probability table (Table 1) is used to obtain the propagation probabilities at the output of each gate from the corresponding values at the inputs of that gate, the attenuation lookup tables are used to compute the magnitudes of the propagated values at the output of the gate based on the magnitudes of the transients at the inputs of the gate and the fanout of the gate.

4. EXPERIMENTAL RESULTS
In order to verify the accuracy of the proposed technique, we have developed a fault injection engine based on Monte-Carlo (MC) simulations. For a given glitch width, we have injected glitches at the output of gates at different times during the clock period.
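The role of the random injection time can be illustrated on a single flip-flop. The sketch below is not the fault injection engine used in the experiments; it is a simplified check, under stated assumptions, that sampling the glitch arrival time uniformly over one clock period reproduces $LP = (S + H + W)/T$. It assumes the latching window sits mid-period and does not wrap around the period boundary; all names and numeric values are hypothetical.

```python
import random


def analytical_lp(setup: float, hold: float, width: float, period: float) -> float:
    return (setup + hold + width) / period


def monte_carlo_lp(setup: float, hold: float, width: float, period: float,
                   trials: int = 200_000, seed: int = 1) -> float:
    """Estimate the chance that a glitch overlaps the latching window."""
    rng = random.Random(seed)
    edge = period / 2.0                       # clock edge placed mid-period (arbitrary)
    win_start, win_end = edge - setup, edge + hold
    hits = 0
    for _ in range(trials):
        t = rng.uniform(0.0, period)          # glitch arrival time at the flip-flop input
        if t <= win_end and t + width >= win_start:
            hits += 1                         # glitch [t, t + width] overlaps the window
    return hits / trials


# Hypothetical timing parameters in arbitrary units.
lp_exact = analytical_lp(0.05, 0.05, 0.10, 1.0)
lp_mc = monte_carlo_lp(0.05, 0.05, 0.10, 1.0)
```

The estimate converges slowly, with an error that shrinks roughly as the inverse square root of the number of trials, which gives a feel for why simulation needs many injections per error site.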
The random variables are the struck gate and the time of the glitch. Timing-accurate logic simulation determines whether the injected glitch can be captured in any flip-flop. The MC simulation terminates when the accuracy of the estimated derating falls within a pre-defined confidence interval (in our experiments, the maximum variance is 5% and the confidence level is 99%).

The proposed approach was implemented and applied to the ISCAS'89 sequential benchmark circuits. All experiments have been performed on a DELL Precision 450 system equipped with 2 GB of main memory. Figure 5 shows the run time for both the Monte-Carlo simulation and our proposed approach, including the signal probability (SP) calculation time. Note that the Y-axis in this figure is logarithmic. On average, the proposed method is 31,000 times faster than the MC simulation approach. The run time of our approach for the largest ISCAS'89 circuits is only 5 minutes.

Figure 6 shows the accuracy of the proposed approach compared to the MC simulation method. The accuracy is compared using different variances (accuracies) of the SP values used in the proposed method. Note that the run time for SP estimation is exponentially related to the required accuracy of the values. However, these results confirm that the overall accuracy of the proposed method is not considerably sensitive to the accuracy of SP values. In other words, it is possible to use less accurate SP values (tractable for large circuits) to achieve a reasonably accurate SER. The results show that the accuracy of our presented approach is within 1% of the MC approach.

5. CONCLUSIONS
Soft errors due to single event upsets are the main reliability threat to digital systems. Estimation of the soft error rate in sequential circuits is very challenging since computing the probability of an erroneous system state requires dynamic analysis of transients. As a result, fault injection methods become completely intractable. In this paper, we have proposed a combined logic and timing derating estimation method for sequential circuits.
The proposed technique uses an enhanced static timing analysis to derive all possible erroneous waveforms propagated from a struck gate to reachable flip-flops and calculates the probability of latching an incorrect value in flip-flops. We also exploit a technique based on signal probability to estimate propagation probabilities. Experimental results and comparison with timing-accurate Monte-Carlo simulations show that our proposed technique is 4-5 orders of magnitude faster while the difference in accuracy is almost 1% on average.

[Figure 4: Example: Event propagation, generation of all possible propagated waveforms, and propagation probabilities. An SEU with glitch width 1 is injected at gate A; timed events with their four-valued probabilities are shown at gates A, D, G, and H, together with the resulting waveforms at H and their probabilities.]

[Figure 5: Execution times for the MC simulation approach, SP computation, and the proposed method (for an injected pulse width of 50 ps). Y-axis: time in seconds (logarithmic); X-axis: ISCAS'89 circuits s298 through s35932 and the average; series: SP time, our approach time, MC simulation time.]

[Figure 6: Comparison of the accuracy of the MC simulation with our approach using different SP variances (for an injected pulse width of 50 ps). Y-axis: derating factor; X-axis: ISCAS'89 circuits s27 through s35932 and the average; series: our approach with SP variance 0.02, 0.04, and 0.08 (confidence level 99%) and MC simulation with variance 0.02 (confidence level 99%).]

6. REFERENCES
[1] G. Asadi and M. B. Tahoori, "An Accurate SER Estimation Method Based on Propagation Probability," Proc. Design Automation and Test in Europe Conf., pp. 306-307, March 2005.
[2] G. Asadi and M. B. Tahoori, "An Analytical Approach for Soft Error Rate Estimation in Digital Circuits," Proc. Intl. Symp. on Circuits and Systems, pp. 2991-2994, May 2005.
[3] K. Mohanram and N. A. Touba, "Cost-Effective Approach for Reducing Soft Error Failure Rate in Logic Circuits," Proc. Intl. Test Conf., pp. 893-901, 2003.
[4] H. T. Nguyen and Y. Yagil, "A Systematic Approach to SER Estimation and Solutions," Proc. Intl. Reliability Physical Symp., pp. 60-70, 2003.
[5] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, "Modeling the Effect of Technology Trends on the Soft Error Rate of Combinatorial Logic," Proc. Intl. Conf. on Dependable Systems and Networks, pp. 389-398, 2002.
[6] M. Zhang and N. R. Shanbhag, "A Soft Error Rate Analysis (SERA) Methodology," Proc. Intl. Conf. on Computer-Aided Design, pp. 111-118, Nov. 2004.
[7] B. Zhang and M. Orshansky, "Symbolic Simulation of the Propagation and Filtering of Transient Faulty Pulses," Proc. SELSE Workshop, Urbana-Champaign, April 2005.
[8] Q. Zhou and K. Mohanram, "Cost-Effective Radiation Hardening Technique for Combinational Logic," Proc. Intl. Conf. on Computer-Aided Design, pp. 100-106, Nov. 2004.