Resilience and yield of flip-flops in future CMOS technologies under process variations and aging


Published in IET Circuits, Devices & Systems
Received on 24th November 2012
Revised on 4th July 2013
Accepted on 4th July 2013
ISSN 1751-858X

Christoph Werner, Benedikt Backs, Martin Wirnshofer, Doris Schmitt-Landsiedel
Lehrstuhl Technische Elektronik, Technical University Munich, Theresienstr. 90, Muenchen 80290, Germany
E-mail: christoph.j.werner@tum.de

Abstract: In this study, the failure rate of flip-flops in future 16 nm complementary metal-oxide-semiconductor (CMOS) technologies is investigated. Using transistor level Monte Carlo simulations, the authors studied the influence of process variations and long term aging on the yield. The statistical distribution of the switching time (clock-to-q delay) is shown to be highly asymmetric compared to a Gaussian distribution, leading to a drastically enhanced fraction of very slow or metastable samples. Moreover, the failure rates will rise additionally during the device lifetime because of aging effects. To improve the yield, the authors investigated several possible countermeasures including enhanced supply voltage or ensuring larger data-to-clock times as well as process and circuit optimisation.

1 Introduction

The influence of the ever increasing process variations in future CMOS technologies on the performance and the yield of digital CMOS circuits has received increasing attention in recent publications [1-3]. The impacts on static random access memory (SRAM) reliability as well as on the timing of digital circuits have been studied broadly in a large number of papers [4-8]. However, the consequences of increased process variations on the switching behaviour of flip-flops or clocked storage elements in digital circuits have been investigated in only a small number of publications, which so far have led to quite contradictory results.

Most authors [9, 10] use rather optimistic predictions for the future process stability (e.g. a 3σ value of 18% for the threshold voltage variations) taken from the older International Technology Roadmap for Semiconductors (ITRS) roadmap of 2007 [11], which lead them to results that seem quite manageable in future CMOS generations. In [9] the influence of process variations on the delay of five different flip-flop architectures was studied for a predictive 32 nm CMOS technology. From the ITRS prediction on the V_th variation they deduce mean delay variations of 15-56%. On the other hand, atomistic device simulations in [12] found process variations that were significantly higher than assumed in 2007. In accordance with that, the newer ITRS roadmaps publish an allowable threshold voltage variation (ATVV) of 42%, which would result in a 1σ value of 37 mV for the high performance 16 nm CMOS technology investigated in this paper.

In [13], it is shown that these process variations lead to a highly asymmetric distribution of delay variations in a CMOS D-flip-flop, leading to a 3× higher failure rate for slowly switching elements than a symmetric Gaussian distribution would predict. In [14], the impact of process variations in an industrial 65 nm CMOS process on delay and energy of five different flip-flop architectures is investigated, which results in 3σ values of 15-25%. In [15] four designs of D-flip-flops are studied experimentally in a 65 nm process technology. The authors found a very asymmetric distribution leading to an enhanced number of very slowly switching elements. In this paper, we will follow the newer assumptions and investigate in more detail the influence of process variations on the performance of two typical flip-flop architectures.

Investigations on long-term aging for six different types of flip-flops have been published in [16] for a 32 nm technology.
It is predicted that the mean values for setup, hold and switching times are delayed by less than 1 ps in nearly all cases after an aging period of 5 years, which seems small enough for all applications. However, the authors of [16] did not consider the effect of process variations, which will lead to enhanced failure rates in the aged samples, as we will show in Section 3 of this paper.

The rest of this paper is organised as follows: In the next section two typical flip-flop architectures are investigated: a transmission-gate master-slave flip-flop (TGMS), as used in PowerPC CPUs, in many application specific integrated circuit (ASIC) and system-on-chip (SOC) technologies and even in the newest 22 nm Intel processors [17], and the semi-dynamic flip-flop (SDFF), which is much faster but more power consuming, a variation of which is used in the ultra scalable processor architecture (USPARC) processor. Extensive Monte Carlo simulations are presented using predictive technology models (PTM) for a future 16 nm CMOS technology [18] and process variations as predicted by the ITRS roadmap 2009. Section 3 gives results on degraded devices after long term aging by bias-temperature-instability effects (NBTI and PBTI). In Section 4 we present possible countermeasures to reduce the failure rates, which include increased supply voltage, ensuring larger data-to-clock times, reduction of process variations and design improvements. Section 5 concludes the paper and gives an outlook to further work.

2 Influence of process variations

Fig. 1 shows the two flip-flop architectures studied in this work, a TGMS and an SDFF. These two architectures have been chosen since they represent two opposite corners in the energy-delay space, as shown in [19-23]. In Fig. 1a, we see the TGMS flip-flop [10], which has a classical master-slave structure with two statically back-coupled inverter stages. This leads to a rather slow but stable switching behaviour at low power consumption. The transistor dimensions were adapted to the 16 nm technology used here from an established design in 65 nm. The detailed values are given in Table 1. An additional optimisation step for our predictive 16 nm technology will be described in Section 4. More detailed optimisation procedures for various flip-flop architectures have been published in [9, 21, 22].

In Fig. 1b, the SDFF is shown [10]. It has a dynamic precharge phase for the internal node X during the clock signal low phase. During the clock high phase X is discharged if the data input D is high. This leads to a very fast switching of the output Q, but also to enhanced power consumption because of the permanent charge/discharge cycles even if the input remains constantly high. As a consequence the SDFF has a stable setup time near zero and its switching behaviour is more critical for rising than for falling data edges. However, the hold time required for the SDFF is somewhat larger than for the TGMS [24]. The flip-flop design was adopted from [10], resulting in the transistor dimensions given in Table 1. For the simulation, BSIM4 parameters for high performance planar 16 nm CMOS from the PTM in [18] have been used. The Monte Carlo simulations have been done with the circuit simulator HSPICE [25].

Fig. 1 Circuit diagram of the two flip-flops investigated: (a) TGMS, (b) semi-dynamic flip-flop (SDFF) [10]. For the TGMS the most degradation-sensitive PMOSFET at the data input is indicated, which will be optimised for higher yield in Section 4.

Table 1 Values for the transistor widths used in our simulations of the SDFF and TGMS flip-flops

SDFF                                            TGMS
Transistor  Width, nm   Transistor  Width, nm   Transistor  Width, nm   Transistor  Width, nm
M1          96          MN1         72          MP17        87          MN15        64
M2          432         MP1         139         MP19        65          MN16        48
M3          588         MN2         48          MP22        85          MN17        61
M4          660         MP2         96          MP25        87          MN22        64
M5          660         MN3         48          MP18        67          MN23        48
M6          204         MP3         48          MP15        80          MN24        56
M7          204         MN4         144         MP33        48          MN25        48
MPA         60          MP4         48          MP13        67          MN28        48
MPB         228         MN5         372         MP23        48          MN36        48
MNA         60          MP5         648         MP32        48          MN37        48
MNB         72          MN6         48          MP31        48          MN38        48
                        MP6         72          MP43        67          MN49        48

The length is taken as 16 nm for all MOSFETs.

The scatter plots of Fig. 2 show our results for both types of flip-flops. We plot the clock-to-q switching delay (c2q) for the rising against the falling edge for 3000 samples. In most cases we see a good correlation between falling and rising edge, especially for small values. However, in all cases there are a number of samples where either the falling or the rising data edge has an extremely slow switching time (metastability). This is due to a metastable behaviour of the circuit [10], which can lead to an unpredictable switching because of small noise sources (or numerical uncertainties in the simulation). For the TGMS the plots also contain a number of samples with c2q = 0 for the falling edge and one for the rising edge. A value of zero is displayed for failing samples, which remained at their old value and did not switch within 3 ns.

In Fig. 3, we compare the distribution function for different values of the threshold variations in a TGMS. We consider both the local variations of the threshold voltage within the circuit and the global variations between different circuit samples.
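The split into a global offset and per-transistor local offsets can be illustrated with a short sketch. This is not the authors' HSPICE setup: the local sigma uses the Pelgrom relation and the parameter value quoted in the following paragraph, the global sigma of 37 mV is one of the values studied, and the function names, the choice of transistors and the use of one shared global offset for all devices (rather than separate NMOS and PMOS offsets) are assumptions made only for illustration.

```python
import math
import random

A_PELGROM = 1e-9       # V*m, Pelgrom parameter quoted in the text
SIGMA_GLOBAL = 0.037   # V, one of the global sigma values studied (37 mV)
LENGTH = 16e-9         # m, gate length used for all MOSFETs (Table 1)


def local_sigma(width_m, length_m=LENGTH):
    """Pelgrom's law: sigma = A / sqrt(W * L)."""
    return A_PELGROM / math.sqrt(width_m * length_m)


def sample_vth_offsets(widths_m, rng):
    """Return one V_th offset (in V) per transistor for one circuit sample.

    Assumption: a single global offset shared by all devices in the sample,
    plus an independent Gaussian local offset per transistor.
    """
    shared = rng.gauss(0.0, SIGMA_GLOBAL)
    return {name: shared + rng.gauss(0.0, local_sigma(w))
            for name, w in widths_m.items()}


# Hypothetical subset of TGMS transistors, widths taken from Table 1
widths = {"MP43": 67e-9, "MN24": 56e-9, "MN23": 48e-9}
rng = random.Random(1)
offsets = sample_vth_offsets(widths, rng)
```

With these numbers the relation gives a local sigma of about 36 mV for a minimum-width (48 nm) device, which is of the same order as the 37 mV global sigma.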
The local variation, which is mainly because of the intrinsic random dopant fluctuations (RDF), cannot be reduced by process optimisation. It is described by Pelgrom's law $\sigma = A/(WL)^{0.5}$ with a Pelgrom parameter of $A = 10^{-9}$ V m, as reported from measurements in a recent technology [1]. Please note that in contrast to the local variations, which are quite consistent in literature, the global variations to be expected in future technologies differ widely between publications. As described in [1], global variations result from non-ideal manufacturing processes including patterning proximity effects and variations in the gate dielectric and in the metal gate. The 2007 ITRS roadmap [11] demanded a 3σ value of 18% for the global V_th variations, which would be a 1σ of 15 mV in our predictive technology, whereas in the 2009 edition a total ATVV of 42% is reported, which in our technology would result in a σ of 37 mV, and recent atomistic simulations predict 40 mV or even more [6].

In our work we only considered variations in the MOSFET threshold voltage V_th, both local and global. In a real 16 nm technology we also expect variations in other parameters, either correlated or uncorrelated to the V_th variations. Nevertheless, the threshold variations are considered to have the highest impact and thus, as a first step, the other variations are neglected in our investigations. We assumed a Gaussian distribution of V_th using a local variation given by Pelgrom's law and global variations with a number of different σ values. Future technology nodes will presumably use undoped FinFETs, which will result in a significant reduction of the doping induced local variations. However, because of line edge roughness (LER) of the fins and because of metal gate granularity (MGG), comparable values of σ_th are predicted by atomistic device simulations [26] (e.g. σ_th = 31.5 mV for L = 20 nm, σ_th = 41.5 mV for L = 14 nm, and σ_th = 51.6 mV for L = 10 nm), though these are considered as local variations, whereas in our study we focus more on the global variations.

Fig. 2 Correlations of c2q delay times for rising and for falling data edges of the two flip-flops investigated: (a) TGMS flip-flop where the data edge comes 50 ps before the clock edge, (b) SDFF with a data-to-clock delay of 0 ps. Zero delay times are displayed for failing samples which do not switch at all (HSPICE Monte Carlo simulations with 3000 samples).

Fig. 3 Probability distribution functions (giving the percentage of occurrences in each bin of 16 ps) for three different values of the global threshold variation (52, 37 and 22 mV) in a TGMS, for rising and falling transitions with a fixed data-to-clock delay of 100 ps. It is seen that the smallest variation has no fails and no metastabilities (at least for the 25 000 MC samples investigated), whereas the highest variation leads to fails and metastabilities in the percentage range.

In Fig. 3, we see a failure rate of 0.01% for σ_th(global) = 37 mV, whereas a value of 52 mV leads to fails in the percentage range. No fails or metastabilities were found at σ_th(global) = 22 mV for the 25 000 MC samples. A linear extrapolation of the distribution function to c2q = 100 ps, where the other curves show the first metastabilities, would predict fails below $10^{-8}$. Fig. 3 also shows that the width of the c2q delay distribution is a strong function of the threshold voltage variation and that the number of metastable or non-switching states increases for larger variations. Beyond a certain c2q (about 100-150 ps) we consider the flip-flops to be in their metastable state. Moreover, for σ(global) = 52 mV we even found some percent of totally failing flip-flops, which do not propagate the data input D to the output Q at all within 3 ns.

In Fig. 4, we show the distribution function for the c2q delay for a typical data-to-clock time (0 ps for SDFF, 100 ps for TGMS) as a result of 25 000 MC samples. We see a very asymmetric distribution: for the SDFF the PDF falls to a factor of 1E4 below the maximum level at μ + 9σ on the slow side, but already at μ − 2σ on the fast side. The distributions are quite similar for rising and falling data edges and show a 2-3× faster switching time for the SDFF compared to the TGMS.

Fig. 4 Comparison of distribution functions for TGMS flip-flop and SDFF (25 000 Monte Carlo samples, V_dd = 0.7 V, data-to-clock delay = 0 and 100 ps, respectively, virgin circuits, σ(global) = 37 mV). Both PDFs are strongly asymmetric, leading to a higher fraction of slow samples. Both distributions are very similar, showing a small number of metastable samples at a rather low level (a factor of 1E-4 below the maximum). The values are the percentage of occurrences in each bin of 4 ps.

In Section 4 we will define the total failure rate as the sum of total fails, which do not switch at all either at a rising or at a falling edge, plus the sum of metastabilities with a c2q delay above μ + 9σ. This is rather similar to [3], where a failure is assumed for a 10× enhanced switching time of the latch. In most publications [19-23] which compare different FF architectures, the sum of an optimum setup time plus the resultant c2q delay (D-to-Q delay) is considered as the key figure. As we will show in Section 4, the number of fails can be significantly reduced by adapting the data-to-clock delay. Nevertheless, since we want to estimate the percentage of very slow or failing flip-flops in one chip because of process variations, we have to use a fixed data-to-clock delay for all our samples with a sufficient margin to the nominal setup time (50 or 100 ps for TGMS and 0 ps for SDFF). So we plot our yield as a function of c2q delay and not of D-to-Q delay as in [19-23].

For the TGMS, we show in Fig. 5 the individual correlations between the threshold variation of the single MOSFETs and the flip-flop failure rate as defined above. We selected all the Monte Carlo samples with metastable or failing behaviour in a TGMS. We then examined how much the mean V_th value of each individual transistor in this failing subset deviated from the mean V_th of this MOSFET in the whole MC distribution. From this we could identify those MOSFETs whose deviations most strongly influence the FF fails. We found that the strongest correlation occurs for high absolute values of the PMOS threshold in the global variation and also for the transistors MP43 and MN24, which have to charge the master node at a falling data signal. As we will see in Section 4, an increased width of MP43 can help to reduce the failure rate. For the SDFF we found the sensitivities of the c2q delay on the threshold voltages to have rather similar values for all NMOSFETs, whereas for the PMOSFET threshold voltages there was a somewhat lower correlation.

Fig. 5 Sensitivity of the failure rate of a TGMS flip-flop on the single threshold voltage variations. The graph shows the mean deviations of the threshold voltages from the nominal value (in number of sigmas) of the 14 most sensitive MOSFETs in all samples that were found metastable or failing in a 100 sample Monte Carlo simulation. Since the PMOS threshold voltages have a negative value, the negative values for the PMOSFETs mean that an enhanced absolute value of PMOS V_th leads to a higher failure rate.

3 Long term aging

In today's CMOS technologies the threshold shift induced by long term aging is an increasing problem. To investigate this, we assume V_th shifts in those transistors that are permanently under NBTI or PBTI stress as long as a constant input signal is applied. In our estimations we assumed 50% of this V_th shift for those transistors that see a toggling clock signal permanently switching between 0 and V_dd, since the BTI aging under AC conditions is lower by about that factor than for DC [27]. For a worst-case consideration with minimum recovery we assume that the aged flip-flops see a constant input signal for a long time, which is changed for just one clock cycle and then written back again. Moreover, hot carrier instability will also occur at the NMOSFETs that see a switching input signal, but this effect can be considered to have a smaller impact. Usually circuits have to be designed to allow a maximum voltage shift of about 30 mV in the MOSFETs that experience the highest stress conditions without leading to circuit failures.
Because experimental data are lacking for our predictive 16 nm high-k technology, we assumed the same mean value of 30 mV for NBTI effects on PMOS and PBTI effects on NMOS transistors, which might be quite a good guess from today's experimental results [28, 29]. Moreover, future CMOS devices will experience a strong variability in the V_th shifts induced by long term aging. We used the statistical model for the bias temperature instability effect in aged MOSFETs published in [30], which is based on the most advanced physical models, and implemented it in SPICE to allow Monte Carlo simulations considering the variability of the virgin devices as well as the variations in the aging effects themselves. Our model describes the non-Gaussian distribution of the shifts well; this distribution can be described by a Gamma function, as shown in Fig. 6, and leads to an increased fraction of samples with large threshold voltage shift. The model can accurately describe the stress/recovery behaviour of V_th in MOSFETs in a 65 nm CMOS technology and will also allow predictions of aging induced failure rates in our 16 nm CMOS circuits. Details were given in [31].

In Fig. 7, we see the influence of long term degradation on the distribution function of the c2q delay for a TGMS with a data-to-clock time in the critical range of 50 ps. It is not clear today how large the degradation effects in future technologies will be, but it can be assumed that a maximum drift of 30 mV at the permanently stressed transistors, as used in our MC simulations, must still be allowed before circuit failures occur. The results show that the mean switching speed of the flip-flop is degraded by less than 10 ps and the 3σ value by 20 ps, which should not give rise to problems in the circuit behaviour, but the number of slowly switching samples with c2q delays of more than 100 ps is increased by a factor of 2-10 in the aged samples compared to the virgin circuit.
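Why a Gamma-distributed shift produces more samples with a large threshold shift than a Gaussian of equal mean and spread can be seen in a small numerical sketch. It does not reproduce the statistical BTI model of [30, 31]: only the 30 mV mean is taken from the text, while the Gamma shape parameter, the sample count and the 3-sigma comparison point are assumptions chosen for illustration.

```python
import random
import statistics

MEAN_DC = 0.030   # V, mean shift of permanently stressed devices (from the text)
SHAPE = 2.0       # hypothetical Gamma shape parameter, not given in the text
N = 100_000       # number of Monte Carlo draws (illustrative)


def gamma_shift(mean_v, shape, rng):
    """Draw one V_th shift from a Gamma distribution with the given mean."""
    return rng.gammavariate(shape, mean_v / shape)


rng = random.Random(0)
dc = [gamma_shift(MEAN_DC, SHAPE, rng) for _ in range(N)]
mean_dc = statistics.mean(dc)

# Gaussian reference with the same mean and standard deviation
std = MEAN_DC / SHAPE ** 0.5
gauss = [rng.gauss(MEAN_DC, std) for _ in range(N)]

# Fraction of samples whose shift exceeds mean + 3 sigma
threshold = MEAN_DC + 3.0 * std
tail_gamma = sum(x > threshold for x in dc) / N
tail_gauss = sum(x > threshold for x in gauss) / N
```

Under these assumptions the Gamma tail beyond the 3-sigma point is roughly ten times heavier than the Gaussian one, which is the qualitative effect the text attributes to the aging variability.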
Moreover, the number of failing samples, which do not switch at all, is increased by more than a factor of 10. As we will see in Section 4, these fails can be avoided by guaranteeing a higher data-to-clock time or by increasing the supply voltage by 50 mV.

A similar behaviour is seen for the SDFF in Fig. 8. Here we have an AC stress for most of the internal MOSFETs because of the dynamic charge/discharge architecture; only the input and output stages see a permanent stress. Again we used our statistical NBTI model from [31] with a mean threshold shift of 30 mV for all NMOS and PMOS transistors which are permanently under PBTI or NBTI stress and 15 mV for the MOSFETs under AC stress. We see only a small shift in the mean switching time, but a drastic increase in metastable samples. Here we also see a fraction of 0.1% of the samples that completely fail.

Fig. 6 Probability distribution function for the threshold voltage shift from a Monte Carlo simulation of 3000 samples after 10 000 s stress at V_GS = 1.5 V followed by recovery at V_GS = 0 V for 10, 1000 and 10 000 s. Each value gives the number of samples in a bin of ΔV_th = 12.8 mV. NBTI model from [31], parameters adapted for a 65 nm CMOS technology. The results can be fitted quite well by Gamma functions.

Fig. 7 Probability distribution functions for virgin and aged distributions of the switching delay in a TGMS with σ(global) = 37 mV (data-to-clock = 50 ps, V_dd = 0.70 V). The y-axis gives the percentage of samples falling into a bin of 5 ps. The number of slowly switching samples increased by a factor of 2-5, whereas the number of fails increased by more than a factor of 10.

Fig. 8 Probability distribution functions for virgin and aged distributions of SDFF (t_d2c = 0 ps, hold time = 150 ps, V_dd = 0.70 V, rising edge, σ(global) = 37 mV). The y-axis gives the percentage of samples falling into a bin of 5 ps. The mean value of c2q is enhanced by 10 ps by the aging, whereas 0.1% of metastables occurs after aging.

4 Countermeasures

Clustered voltage scaling (CVS) and adaptive voltage and frequency scaling (AVFS) have been discussed extensively as an approach to reduce leakage power as well as timing errors in combinational logic paths [8]. In this section we will study the possibility of additionally reducing the failures and metastabilities in the flip-flops by reducing the frequency and increasing the supply voltage. We will estimate how much yield can be regained when limited constraints in power and delay are accepted, that is, with somewhat enhanced supply voltage, by ensuring a longer data-to-clock time, or by process or design optimisation.

In Fig. 9, we show the failure rates of our two flip-flop architectures, as defined in Section 2, as a function of data-to-clock time. It can be seen that an increase from 50 to 100 ps for the TGMS can reduce the failure rate for the aged circuit by a factor of 30, whereas for the SDFF a low value is already achieved for the nominal setup time of 0 ps, and for longer data-to-clock times only marginal improvements are seen. This is true both for virgin and aged circuits. In Figs. 10 and 11, the influence of the supply voltage can be seen. An increase of V_dd from 0.70 to 0.75 V will result in a decrease of fails by a factor of 10 or more.

Fig. 9 Percentage of failures for both flip-flop types as a function of data-to-clock times for virgin and aged circuits. For the TGMS the failure rates can be drastically reduced for enhanced data-to-clock times, whereas for the SDFF only a small improvement is seen.
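The failure rate plotted in these figures follows the definition from Section 2: samples that do not switch at all plus metastable samples with a c2q delay above μ + 9σ. A minimal sketch of that bookkeeping is given below. The function name, the convention of encoding non-switching samples as a delay of 0 (as in the scatter plots), the choice to compute μ and σ over the switching samples only, and the synthetic delay list are all assumptions, not taken from the paper.

```python
import statistics


def failure_rate(c2q_delays, k_sigma=9.0):
    """Fraction of samples that never switch or are slower than mu + k*sigma.

    Assumptions: non-switching samples are encoded as a delay of 0, and
    mu and sigma are computed over the switching samples only.
    """
    switching = [d for d in c2q_delays if d > 0.0]
    n_fail = len(c2q_delays) - len(switching)
    mu = statistics.mean(switching)
    sigma = statistics.stdev(switching)
    n_meta = sum(d > mu + k_sigma * sigma for d in switching)
    return (n_fail + n_meta) / len(c2q_delays)


# Synthetic example: 997 regular samples near 30 ps, one metastable sample
# at 500 ps and two samples that never switch (delay 0).
bulk = [(28 + 2 * (i % 3)) * 1e-12 for i in range(997)]
delays = bulk + [500e-12] + [0.0, 0.0]
rate = failure_rate(delays)
```

In this synthetic data set the single 500 ps sample lies above the μ + 9σ limit and the two zero-delay samples count as total fails, so three of the 1000 samples are counted as failures.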

Fig. 10 Probability distribution functions for virgin and aged samples of SDFF (t_d2c = 0 ps). An enhancement of V_dd from 0.70 to 0.75 V can reduce the failure rate back to the values of the virgin samples at 0.70 V.

Fig. 11 Percentage of failures for a TGMS flip-flop as a function of data-to-clock times for virgin and aged circuits. An enhancement of V_dd from 0.70 to 0.75 V can reduce the failure rate by a factor of 10 back to the values of the virgin circuit.

Of course an increased V_dd will result in slightly higher power consumption as well as in somewhat increased long term aging. Nevertheless, we think that the advantage of higher yield will be worth the cost, especially when V_dd is increased only if necessary by use of adaptive voltage scaling (AVS) [8] and/or only in the blocks where fails might occur (CVS).

In today's CMOS technology flip-flops are optimised with regard to power consumption and timing delay, but without much attention to reliability [9]. Therefore, for the TGMS we also tried a simple design improvement of the circuit. In Fig. 12, we compare the failure rates for two designs, where the width of the most critical PMOSFET of the input stage (MP43 in Fig. 1a) was enhanced by 50% (from 4 L_min to 6 L_min), whereas all other components remain unchanged. We found that the stronger pull-up PMOSFET helps to reduce the failure rate for the case of a rising data signal in the aged circuit by factors of 3-10, without any drawback for the falling data signal.

Fig. 12 Percentage of failures for TGMS flip-flop with two different designs as a function of data-to-clock times for virgin and aged circuits. Whereas the failure rates are not changed for long d2c times in the virgin circuit, they can be reduced significantly for the aged circuit and for very short d2c times in the virgin circuit.
Whereas the TGMS failure rate can be significantly reduced by guaranteeing a higher setup time, the SDFF does not benefit from a data-to-clock time increased above 0 ps, neither for the virgin nor for the aged device. Since the SDFF is known to be sensitive to the hold time [24], we looked at this influence in Fig. 13. The failure rate for the virgin circuit increases significantly for hold times below 30 ps. However, the additional failures that occur after aging cannot be avoided by increasing the hold time. Thus, for the SDFF only an increased supply voltage can avoid failures in the aged device, as can be seen from Fig. 10.

Fig. 13 Percentage of failures for SDFF flip-flop as a function of hold times for virgin and aged circuits. The failure rates can be significantly reduced for longer hold times in the virgin circuit, but the additional number of failures that occur in the aged circuit remains the same for all hold times.

Finally, we also consider the influence of improved process technology, which could reduce the global variation of the threshold voltage. This has already been shown in Fig. 3 of Section 2. There it can be seen that the failure rate could be very low if it were possible to reduce the global threshold voltage variation down to a σ value of 22 mV. An extrapolation of the curves in Fig. 3 suggests a value of 10^-8 for metastable and failing flip-flops. Therefore, if it really were possible to achieve the ambitious goals of the 2007 ITRS roadmap, we would obtain sufficient yield, at least for the virgin circuits. However, the other options presented before may be achievable with less effort.

In Table 2, we summarise the effects of our different countermeasures on the yield of the two flip-flop architectures. We do that both for the unstressed circuits and for aged circuits, where the threshold voltage of those MOSFETs that experience the highest stress conditions has shifted by 30 mV. The results show that we can reduce the failure rate by factors of 5 to 10, which is about the same factor that we lose because of long-term aging.

Table 2 Relative number of fails for virgin and aged flip-flops with different working conditions

V_dd, V   Data-to-clock, ps   Fails for virgin circuit   Fails for aged circuit
(a) TGMS
0.70      50                  1 × 10^-3                  2 × 10^-2
0.75      50                  < 1 × 10^-4                2 × 10^-3
0.70      100                 2 × 10^-4                  1 × 10^-3
(b) SDFF (hold time, ps)
0.70      30                  1 × 10^-3                  7 × 10^-3
0.75      30                  1 × 10^-4                  6 × 10^-4
0.70      50                  1 × 10^-4                  2 × 10^-3

5 Conclusion

Monte Carlo transistor-level simulations of typical flip-flops in a future 16 nm CMOS technology have shown that a considerable number of metastable or failing samples must be expected. For typical process variations we found that 1 × 10^-4 up to 1 × 10^-3 of the flip-flops will fail because of metastable or completely non-switching behaviour. This is primarily because of a strongly asymmetric distribution function with a long tail of slow-switching flip-flops. In addition, a threshold voltage shift of only 30 mV because of long-term aging of the circuit (BTI) increases the number of fails by a factor of 10 to 20. In our study, we have also investigated a number of countermeasures in order to reduce the number of failures. These include an increased data-to-clock time or an increased hold time, which reduces the failure rate by a factor of 5 to 10, as well as an increase of the supply voltage by 50 mV, which leads to a 10× reduction of failures. Moreover, we showed how the flip-flop design can be optimised for reduced failure rates and increased yield.

6 Acknowledgment

The authors would like to acknowledge Sani Nassif (IBM) for inspiring discussions.

7 References

1 Kuhn, K.J., Giles, M.D., Becher, D., et al.: Process technology variation, IEEE Trans. Electron Devices, 2011, 58/8, pp. 2197–2208
2 Nassif, S.R., Mehta, N., Cao, Y.: A resilience roadmap. Proc. of DATE Conf., 2010, pp. 1011–1016
3 Nassif, S., Kleeberger, V.B., Schlichtmann, U.: Goldilocks Failures: not too soft, not too hard (IRPS, 2012), paper 2F1.1
4 Drapatz, S., Hofmann, K., Georgakos, G., Schmitt-Landsiedel, D.: Impact of fast-recovering NBTI degradation on stability of large-scale SRAM arrays. Proc. of ESSCIRC 2010, pp. 146–149
5 Eireiner, M., Henzler, S., Georgakos, G., Berthold, J., Schmitt-Landsiedel, D.: In-situ delay characterization and local supply voltage adjustment for compensation of local parametric variations, IEEE J. Solid-State Circuits, 2007, 42, pp. 1583–1592
6 Wang, X., Roy, G., Saxod, O., Bajolet, A., Juge, A., Asenov, A.: Simulation study of dominant statistical variability sources in 32-nm high-κ/metal gate CMOS, IEEE Electron Device Lett., 2012, 33/5, pp. 643–645
7 Wang, J., Calhoun, B.H.: Minimum supply voltage and yield estimation for large SRAMs under parametric variations, IEEE Trans. VLSI Syst., 2011, 19/11, pp. 2120–2125
8 Wirnshofer, M., Heiss, L., Georgakos, G., Schmitt-Landsiedel, D.: An energy-efficient supply voltage scheme using in-situ pre-error detection for on-the-fly voltage adaptation to PVT variations. Proc. 13th Int. Symp. on Integrated Circuits (ISIC), 2011, pp. 94–97
9 Moon, J., Aktan, M., Oklobdzija, V.: Clocked storage elements robust to process variations (ASICON, 2009), pp. 827–831
10 Stojanovic, V., Oklobdzija, V.G.: Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems, IEEE J. Solid-State Circuits, 1999, 34, pp. 536–548
11 International Technology Roadmap for Semiconductors (ITRS), Edition 2007. ITRS, Technical Report, 2007. Available at: http://www.itrs.net
12 Wang, X., Roy, S., Brown, A.R., Asenov, A.: Impact of STI on statistical variability and reliability of decananometer MOSFETs, IEEE Electron Device Lett., 2011, 32, pp. 479–481
13 Hassan, F., Vanderbauwhede, W., Rodríguez-Salazar, F.: Impact of random dopant fluctuations on the timing characteristics of flip-flops, IEEE Trans. VLSI Syst., 2012, 20/1, pp. 157–161
14 Lanuzza, M., De Rose, R., Frustaci, F., Perri, S., Corsonello, P.: Impact of process variations on flip-flops energy and timing characteristics. IEEE VLSI Symp., 2010, pp. 458–460
15 Sunagawa, H., Onodera, H.: Variation-tolerant design of D-flip-flops. SOC Conf. (SOCC), 2010, pp. 147–151
16 Rao, V.G., Mahmoodi, H.: Analysis of reliability of flip-flops under transistor aging effects in nano-scale CMOS technology. ICCD Conf. 2011, pp. 439–440
17 Hsu, S., Agarwal, A., Anders, M., et al.: A 280 mV-to-1.1 V 256b reconfigurable SIMD vector permutation engine with 2-dimensional shuffle in 22 nm CMOS. ISSCC 2012, pp. 178–180
18 Predictive Technology Models, Arizona State University Nanoscale Group; available at: http://www.eas.asu.edu/~ptm/
19 Alioto, M., Consoli, E., Palumbo, G.: Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: part I – methodology and design strategies, IEEE Trans. VLSI Syst., 2011, 19/5, pp. 725–736
20 Alioto, M., Consoli, E., Palumbo, G.: Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: part II – results and figures of merit, IEEE Trans. VLSI Syst., 2011, 19/5, pp. 737–750
21 Alioto, M., Consoli, E., Palumbo, G.: General strategies to design nanometer flip-flops in the energy-delay space, IEEE Trans. Circuits Syst., 2010, 57/7, pp. 1583–1596
22 Consoli, E., Palumbo, G., Pennisi, M.: Reconsidering high-speed design criteria for transmission-gate-based master-slave flip-flops, IEEE Trans. VLSI Syst., 2012, 20/2, pp. 284–295
23 Consoli, E., Alioto, M., Palumbo, G., Rabaey, J.: Conditional push-pull pulsed latches with 726 fJ·ps energy-delay product in 65 nm CMOS (ISSCC, 2012), pp. 482–484
24 Li, D., Chuang, P., Sachdev, M.: Comparative analysis and study of metastability on high-performance flip-flops (ISQED, 2010), pp. 853–860
25 Synopsys Circuit Simulator: available at: http://www.synopsys.com/Tools/Verification/AMSVerification/CircuitSimulation/HSPICE
26 Wang, X., Brown, A., Cheng, B., Asenov, A.: Statistical variability and reliability in nanoscale FinFETs. Proc. of IEDM, 2011, pp. 103–106
27 Reisinger, H., Grasser, T., Ermisch, U., et al.: Understanding and modeling AC BTI (IRPS, 2011), pp. 597–604
28 Rahman, A., Agostinelli, M., Bai, P., et al.: Reliability studies of a 32 nm system-on-chip (SoC) platform technology with 2nd generation high-k/metal gate transistors (IRPS, 2011), pp. 5D.3.1–5D.3.6
29 Chen, X., Sanavedam, S., Narayanan, V., et al.: A cost effective 32 nm high-k/metal gate CMOS technology for low power applications with single-metal/gate-first process. VLSI Symp., 2008, pp. 88–89
30 Wirth, G.I., da Silva, R., Kaczer, B.: Statistical model for MOSFET bias temperature instability component due to charge trapping, IEEE Trans. Electron Devices, 2011, 58/8, pp. 2743–2751
31 Yilmaz, C., Heiß, L., Werner, C., Schmitt-Landsiedel, D.: Modeling of NBTI-recovery effects in analog CMOS circuits. Int. Reliability Physics Symp. IRPS 2013, paper 2A-4. Details of the model were also presented at the Synopsys User Group 2012 in Germany (http://www.synopsys.com/news/pubs/snug/2012/germany/a4_werner_paper.pdf)