Built-In Proactive Tuning System for Circuit Aging Resilience


IEEE International Symposium on Defect and Fault Tolerance of VLSI Systems

Nimay Shah (1), Rupak Samanta (1), Ming Zhang (2), Jiang Hu (1), Duncan Walker (3)
(1) Dept. of ECE, Texas A&M University, College Station
(2) SoC Enabling Group, Intel
(3) Dept. of Computer Science, Texas A&M University, College Station
E-mail: nimay_shah@tamu.edu, rupak9@tamu.edu, ming.y.zhang@intel.com, jianghu@ece.tamu.edu, walker@cs.tamu.edu

Abstract

VLSI circuits in nanometer technologies experience significant aging effects, which manifest as performance degradation over operating time. Although this degradation can be compensated by over-design, over-design induces a large power overhead, which is undesirable in tightly power-constrained designs. Dynamic voltage scaling (DVS) is a more power-efficient approach. However, its coarse granularity makes it difficult to handle fine-grained variations in the aging effects. We propose a Built-In Proactive Tuning (BIPT) system that allows each circuit block to autonomously tune its performance according to its own degree of aging. The BIPT system is validated through SPICE simulations on benchmark circuits with consideration of the NBTI effect. The experimental results indicate that the proposed BIPT system consumes about 45% less power than over-design while maintaining the same performance. Compared to DVS, BIPT achieves the same aging resilience with about 30% less power dissipation.

1. Introduction

As VLSI technology scales into the nanometer regime, circuit aging effects such as NBTI (Negative Bias Temperature Instability) and HCI (Hot Carrier Injection) [10] become prominent. NBTI manifests itself as degradation of the PMOS threshold voltage [6, 10], whereas HCI results in a threshold voltage increase, mostly in NMOS transistors. When technology scales from 180nm to 65nm, the MTTF (Mean Time To Failure) of processors due to aging effects is reduced by about 76% [10].
That is, a chip that would previously have lasted for 10 years can now be expected to perform well for only about 2 years. It therefore becomes increasingly imperative to address the aging effect in chip design.

To handle the circuit aging problem, a common approach is to over-size transistors so that the aged performance still meets specifications [6]. This approach extends chip lifetime under aging. However, it inevitably increases circuit power dissipation and therefore hits another wall of nanometer integrated circuit design: the increasingly tight power constraint. Over-sized transistors usually imply unnecessarily large timing slack, and therefore wasteful power dissipation, during the initial lifetime of a circuit.

Alternatively, architectural approaches [9, 10] have been suggested for mitigating the aging problem. One technique is architectural-level adaptation [9, 10] such as DVS (Dynamic Voltage Scaling). For instance, a chip can operate at a relatively low supply voltage when new and switch to a higher supply voltage as it ages. Such adaptation avoids the wasted power of the over-sized transistor approach. However, this is a coarse-grained technique: the supply voltage level is usually fixed for major partitions of the chip, if not across the entire chip. In general, the aging effects vary among different components of a circuit. To ensure the performance of the entire chip, DVS must be performed according to the worst transistor aging. That is, strong aging of only 1% of the transistors may require a chip-level supply voltage increase even though the other 99% of the transistors have very minor aging-induced degradation.

1550-5774/08 $25.00 2008 IEEE, DOI 10.1109/DFT.2008.49

In this paper, we propose a Built-In Proactive Tuning (BIPT) system to mitigate the aging problem in a power-efficient manner. The system includes a canary circuit that generates predictive warning signals for performance degradation. In response to the warning signal, circuit speed is tuned through body bias so that the performance degradation is compensated. The proactive tuning is performed offline, either at power-on of the chip or periodically. Since aging is a slow change with a time constant of weeks or months, periodic tuning once every few days is sufficient to capture the change.

Offline tuning has the advantage of allowing relatively easy control over input vectors. When detecting performance degradation or circuit delay variation, one must consider the delay uncertainty due to different input vectors. Even if there is no warning signal for certain input vectors, there is still a risk of delay errors under other input vectors. Therefore, we include a Test Pattern Generator (TPG) in the system in order to obtain large input vector coverage. A TPG is usually part of Built-In Self Test (BIST) hardware; thus, if a chip already has a BIST circuit, the TPG does not cause extra overhead.

The proposed BIPT system has the following advantages. First, it can be applied at the circuit block level instead of at the chip level as in the architectural approaches [9, 10]. In other words, each block can be tuned according to its own degree of aging, and this finer-granularity control allows improved power efficiency. Second, its performance degradation detection is obtained from the actual operating circuit, as opposed to a replica circuit as in other adaptive design methods [12]. Since the detection is more direct, it is more reliable, and using a TPG further improves the reliability of the detection. Third, its proactive nature avoids the complex error correction schemes of retroactive systems [3, 4].
The retroactive systems rely on pipeline flush [4] or instruction replay [3] and are therefore restricted to processor designs. In contrast, our system can be applied to both processors and general sequential circuits.

Existing approaches have one or two of the above advantages but, to the best of our knowledge, none of them has all three. The work of [12] is a block-level adaptive body bias technique. However, its delay variation detection is obtained from replica circuits, which often differ from the actual operating circuits. The Razor-based techniques [3, 4] use direct variation detection, but they rely on complex error correction methods and are restricted to processor designs. Another retroactive method [7] mainly targets fast variations such as voltage variations and hence complements our work. Canary-circuit-based predictive detection is proposed in [8]. However, it is applied with online tuning, which suffers from delay uncertainty due to different input vectors. The recent work of [2] focuses only on aging detection rather than on an overall tuning system; the detection method in [2] can be easily adopted in our tuning system.

The BIPT system is validated through SPICE simulations on benchmark circuits with consideration of the NBTI effect. Even when the overhead due to the TPG, canary and control circuits is included, the proposed BIPT system consumes about 45% less power than the over-design approach while maintaining the same performance. Compared to DVS, BIPT achieves the same aging resilience with about 30% less power dissipation.

2. Built-In Proactive Tuning System

2.1. Overview

The Built-In Proactive Tuning (BIPT) system consists of the existing main circuit augmented with a Test Pattern Generator (TPG), body bias circuitry, a canary circuit and a control circuit. Figure 1 shows these blocks and the corresponding interface signals.

Figure 1. Overview of the proposed built-in proactive tuning system

At power-on or periodically, the BIPT system launches test vectors from the TPG and then tunes the circuit body voltage according to the observations from the canary circuit. The canary circuit predicts aging-induced performance degradation (more details in Section 2.2). A warning signal is generated by a canary flip-flop when the timing constraint becomes tight on the critical path where it is inserted; canary flip-flops are placed on only a few critical paths. The top-level warning signal is the OR of all the individual canary flip-flop warning signals.

A Linear Feedback Shift Register (LFSR) [1] serves as a pseudo-random test pattern generator that applies random patterns while the offline test is in progress. It is triggered by the preset signal from the control block. The control block monitors the status of all the blocks and issues control signals. PON is the power-on-reset signal, an active-high reset issued on start-up that triggers the offline test. Offline test is an active-high signal indicating that the offline test is in progress.

The most critical activity performed by the control block is monitoring the warning signal from the canary circuit. Based on this signal, it sets the body bias of selected gates on the critical paths of the main circuit via the bias level signal passed to the body bias block. This interface and the body bias block are modeled as in [12]. The body bias adapts to the circuit state: the circuit automatically selects from 4 available forward body bias options using a counter-decoder based scheme. This is described in detail in Section 2.3.
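The warning mechanism described above can be illustrated with a small behavioral model. The Python sketch below is not the authors' implementation: the class name `CanaryFF`, the setup time and all delay values are hypothetical, chosen only to show how a delayed copy of the data misses the clock edge before the main flip-flop does.

```python
from dataclasses import dataclass


@dataclass
class CanaryFF:
    """Behavioral model of a main FF plus a canary FF fed through a delay buffer."""
    guard_ps: float          # delay-buffer guard band (hypothetical value)
    setup_ps: float = 20.0   # flip-flop setup time (hypothetical value)

    def capture(self, old: int, new: int, arrival_ps: float, period_ps: float):
        """Return (main_q, canary_q, warning) for one capturing clock edge.

        A flip-flop latches the new value only if the data arrives at least
        one setup time before the edge; otherwise it keeps the old value.
        """
        main_q = new if arrival_ps + self.setup_ps <= period_ps else old
        late_arrival = arrival_ps + self.guard_ps
        canary_q = new if late_arrival + self.setup_ps <= period_ps else old
        return main_q, canary_q, main_q ^ canary_q  # XOR acts as comparator


def top_level_warning(warnings) -> int:
    """OR of all individual canary warnings."""
    return int(any(warnings))


ff = CanaryFF(guard_ps=50.0)
# Fresh circuit: both flip-flops capture the new value, so no warning.
fresh = ff.capture(old=0, new=1, arrival_ps=400.0, period_ps=480.0)
# Aged circuit: the path is 30 ps slower, so only the canary FF misses the edge.
aged = ff.capture(old=0, new=1, arrival_ps=430.0, period_ps=480.0)
```

In this model a warning can only appear when the captured value toggles, which is one way to see why the system drives the circuit with many test patterns rather than relying on whatever inputs happen to occur.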

2.2. Canary Circuit

The canary circuit [8] detects aging-induced performance degradation in a predictive manner. As shown in Figure 2, a canary circuit consists of two flip-flops: a main FF and a canary FF. The main FF receives the input directly, while the canary FF, which serves as the checker, receives the input through a delay buffer. The difference in arrival time at the two flip-flops serves as the guard band for error detection. The outputs of the two flip-flops are fed to an XOR gate that functions as a comparator: it outputs 1 when they differ, thereby predicting the occurrence of an error. Some advanced designs of canary circuits are proposed in [2, 13].

Figure 2. Canary circuit

The canary circuit is a typical-case design alternative to Razor [4]. However, in contrast to Razor, which delivers a delayed system clock to the checker part (shadow FF), the canary circuit delivers a delayed input signal to the checker part (canary FF). This simplifies clock tree synthesis and routing, as there is just one system clock. Also, the delay buffer placed before the canary flip-flop always has a positive delay, even if affected by aging, which lets the canary flip-flop recover from variation-induced effects by itself.

The canary circuit also predicts timing errors rather than detecting them afterwards. The predictive warning allows preventive measures to be taken before the timing violation actually occurs, so the system does not run into corrupt data states, except for errors that cannot be predicted, such as single event upset (SEU) errors. Aging-induced timing violations, however, can be predicted effectively by architectures such as the canary circuit [2].

2.3. Test Pattern Generator and Control Circuit

Figure 3 shows the gate-level implementation of the control circuit. The Finish signal going high indicates the completion of offline testing, PON is the power-on-reset signal, Warning is the timing error prediction signal from the canary circuit, and Preset is the active-low signal that sets the flip-flops in the LFSR to the high state on power-on-reset. The preset generation circuit is shown in the dotted box in Figure 3. The initial state of all the flip-flops in the LFSR on power-on-reset is 1, so the starting seed for the LFSR is all 1's. The LFSR shown in Figure 3 is a 12-bit LFSR; it implements a primitive polynomial to generate 4095 patterns (2^n - 1, n = 12) before returning to the initial state of all 1's. The outputs of the flip-flops in the LFSR are fed to a scan chain through a mux-D connection. These connections are omitted in Figure 3 for clarity.
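The pattern count quoted above can be checked with a short software model of a 12-bit LFSR. The text does not state which primitive polynomial the hardware uses, so the Python sketch below assumes x^12 + x^6 + x^4 + x + 1 (feedback taps at bit positions 12, 6, 4 and 1), a commonly tabulated choice; the function names are hypothetical.

```python
MASK_12 = 0xFFF  # twelve flip-flops


def lfsr12_step(state: int) -> int:
    """Advance a 12-bit Fibonacci LFSR by one clock.

    Assumed taps: bit positions 12, 6, 4 and 1 (1-indexed, bit 1 is the LSB).
    """
    feedback = ((state >> 11) ^ (state >> 5) ^ (state >> 3) ^ state) & 1
    return ((state << 1) | feedback) & MASK_12


def lfsr12_period(seed: int = MASK_12) -> int:
    """Count clocks until the register returns to the seed (all 1's by default)."""
    state = lfsr12_step(seed)
    clocks = 1
    while state != seed:
        state = lfsr12_step(state)
        clocks += 1
    return clocks


def all_ones(state: int) -> bool:
    """Software analogue of a 12-input AND over the flip-flop outputs."""
    return state == MASK_12


period = lfsr12_period()
```

With a primitive polynomial the register visits every non-zero state exactly once, so `period` equals 2^12 - 1 and the all-zero state is never reached from the all-1's seed.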

Figure 3. Test pattern generator and control circuit

The control circuit is triggered by the power-on-reset signal (PON), which remains high for one cycle on each power-on-reset of the chip. On each power-on-reset, the offline test signal triggers the offline testing. Generation of the offline test signal is shown in the box in Figure 3. The offline test signal is the input to the preset generation circuit, which presets the flip-flops in the LFSR to high, the initial seed for the test patterns in the LFSR.

The Finish signal is generated by the circuit shown in Figure 4(a). Its first stage consists of a 12-input AND gate and its second stage consists of a 2-input Muller-C gate. A Muller-C gate is an AND gate for events: it produces a high output when all the inputs are high and goes low only when all the inputs transition to the low state. A description of Muller-C gates can be found in [11]. As shown in Figure 4(a), the outputs of the flip-flops in the LFSR are connected to the 12-input AND gate. The output of this AND gate and PON feed the 2-input Muller-C gate to produce the finish signal.

On every power-on-reset, the flip-flops of the LFSR are preset to 1, so the output of the AND gate rises high. Since PON is active high, PON is low at startup and thus finish stays at 0 initially. PON stays high for one clock cycle and then goes low. When all the 4095 test patterns have been generated, the output of the AND gate goes high again and, since PON is also high, finish goes high, indicating the completion of offline testing. After finish goes high, at the next clock edge the output of the AND gate goes low because the pattern is no longer all ones. However, the finish signal stays high because the Muller-C gate holds its previous value until both inputs transition to the same value. In this case, although the output of the AND gate goes low, finish stays high since PON is still high.
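The hold behavior of the Muller-C gate is the key to the finish logic, and it can be sketched in a few lines of Python. This models only a generic 2-input C-element, not the full finish generator of Figure 4(a); the class name `MullerC` and the input sequence are illustrative assumptions.

```python
class MullerC:
    """2-input Muller C-element: the output follows the inputs only when they agree."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:        # both high -> 1, both low -> 0
            self.out = a
        return self.out   # inputs disagree: hold the previous value


gate = MullerC()
# (a, b) input pairs applied one after another.
sequence = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 0), (0, 0)]
trace = [gate.update(a, b) for a, b in sequence]
```

Once both inputs have been high, the output stays high while only one of them drops, which is the property the text relies on to keep finish asserted after the LFSR leaves the all-1's state.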
Possible timing violations on the critical paths, i.e., the paths affected by aging, are predicted by the warning signal from the canary flip-flops. The critical paths that are affected by aging need to be corrected by applying a suitable forward body bias. Since the body bias generation circuit takes some time to apply the correct bias to the devices on these critical paths, the LFSR needs to be stalled. In our approach, we stall the clock to the LFSR using the gated clock circuitry shown in Figure 4(b). The circuit in Figure 4(b) can stall the clock for one clock cycle, which is sufficient for us to change the body bias of the devices on the critical paths. If more time is needed, the clock can be stalled for a longer period by cascading the Muller-C gates in Figure 4(b).

Figure 4(a). Finish signal generator

Figure 4(b). Gated clock circuit

Figure 5. Generation of control signal to body bias block

Figure 5 shows the body bias generation logic. It consists of a 2-bit up-counter made up of two flip-flops. The four possible states of the counter translate into four possible body bias levels. Level 0 is the no-bias condition; levels 1-3 are in increasing order of forward body bias. As stated earlier, we deal with aging, which degrades circuit performance monotonically with time; thus, forward body bias or reduced reverse body bias is necessary to restore the circuit performance. The up-counter counts upward (increasing forward bias or reducing reverse bias) when a warning signal is generated by the canary circuit. It counts upward until it reaches the highest forward body bias state or least reverse body bias state (binary 11 in our case) and freezes in that state. We implement a four-state counter because a few forward body bias levels are sufficient for the circuits under consideration. However, a larger number of forward bias levels can be generated by adding extra flip-flops to the body bias generation circuit.

The outputs Q1 and Q2 of the counter go to a 2-to-4 decoder. The decoder outputs are inputs to the body bias circuitry, which is implemented as in [12], and enable the appropriate body bias option.
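The counter-decoder scheme above can be summarized as a saturating 2-bit counter followed by a one-hot decoder. The Python sketch below is a behavioral illustration under that reading, not a gate-level model of Figure 5; the class name `BodyBiasCounter` and the warning sequence are hypothetical.

```python
class BodyBiasCounter:
    """2-bit saturating up-counter selecting one of four body bias levels."""

    MAX_LEVEL = 3  # binary 11: highest forward body bias

    def __init__(self) -> None:
        self.level = 0  # level 0: no bias

    def clock(self, warning: int) -> int:
        """Advance one clock; count up on a warning and freeze at MAX_LEVEL."""
        if warning and self.level < self.MAX_LEVEL:
            self.level += 1
        return self.level

    def decode(self) -> list:
        """2-to-4 decoder: one-hot enable lines for bias levels 0 to 3."""
        return [1 if i == self.level else 0 for i in range(4)]


counter = BodyBiasCounter()
# One entry per clock: 1 means the canary circuit raised a warning.
warnings = [0, 1, 1, 0, 1, 1, 1]
levels = [counter.clock(w) for w in warnings]
enables = counter.decode()
```

Because aging only slows the circuit, the counter never needs to count down in this scheme; adding bias levels would correspond to widening the counter and the decoder.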

3. Experiment Setup and Results

The experiments for offline testing are performed on two ISCAS89 sequential benchmarks: s526 and s832. First, we augment these circuits with BIPT hardware. To do this, we determine the critical paths in these circuits using a static timing analyzer written in C. The gate libraries needed for the static timer are characterized in HSPICE for the 90nm model card from BPTM [http://www-device.eecs.berkeley.edu/~ptm/]. The flip-flops at the outputs of the critical paths are replaced by canary circuits, whose structure and operation are described in Section 2.2 and in detail in [8]. Once the canary circuits are determined, we trace the paths from the input of each canary FF in a breadth-first manner until we reach either a flip-flop or a primary input. We replace these flip-flops with mux-D scan flip-flops and add extra scan flip-flops for the primary inputs. Each scan flip-flop has two inputs: one is connected to the input of the original flip-flop and the other is connected to the output of the LFSR. A scan-enable signal selects between the two inputs. The Finish signal serves as the scan-enable for the scan flip-flops in our design; it could equally be a user-defined input.

The characteristics of the benchmarks before and after BIPT processing are shown in Table 1. Column 3 shows the number of flip-flops originally in the design and column 7 shows the number of these flip-flops replaced by canary flip-flops. Column 6 shows the number of mux-D scan flip-flops inserted in the design.

To validate the BIPT system on these benchmarks, we consider the effect of NBTI-induced PMOS threshold voltage (Vt) degradation in these circuits. Our simulations take into account both nominal Vt degradation and temporal variations in Vt degradation, using the models described in [5]. The other important task is to set the clock period for the simulations and thus the target performance for both benchmarks.
The clock period for the simulations is determined by applying a pre-defined Vdd to just the main circuit (without BIPT hardware). For this Vdd, we run simulations to find the nominal clock period at which no error occurs during offline testing. We add a safety margin of 15% to this period, and the result becomes the clock period for our simulations. For a Vdd of 1.15V, this final value is found to be 480ps (2.08 GHz) for s526 and 600ps (1.67 GHz) for s832. Since a circuit with BIPT hardware does not need to operate with high safety margins, we keep the clock period at 480ps (600ps) and, for this clock period, determine the minimum Vdd at which no error occurs during offline testing. For both benchmarks, Vdd is set to 0.925V for BIPT.

Table 1. Characteristics of the ISCAS '89 benchmarks under consideration, before and after BIPT processing

Benchmark  Gates  FFs (pre-BIPT)  Primary inputs  Primary outputs  Mux-D scan FFs (post-BIPT)  FFs replaced by canary FFs (post-BIPT)
s526       193    21              3               6                12                          4
s832       262    5               18              19               12                          2

To evaluate the effectiveness of the BIPT approach, we carry out two sets of simulations in HSPICE at 100 °C: (a) deterministic simulations and (b) statistical simulations.

The deterministic simulations are carried out for 0%, 5% and 10% NBTI-induced deterministic Vt degradation. We compare the total operating power consumed by the BIPT scheme against the over-designed case as the baseline. The power estimate for the BIPT system includes the power dissipated by the TPG, canary circuit and control circuit. The over-design implemented here is a conservative scaling of the Vdd level. In particular, the Vdd for the over-designed case is set such that the circuit incurs no timing violations and still meets the performance targets at 10% Vt degradation. This value is found to be 1.2V for both s526 at 2.08 GHz and s832 at 1.67 GHz. The BIPT scheme, on the other hand, allows typical-case circuit design and adapts to the degradation of the circuit during its lifetime. Thus, the operating voltage is kept at 0.925V for the BIPT simulations. Figure 6 plots the power consumed in the deterministic simulations for s526 and s832. From the simulation results, we observe that, on average, the BIPT scheme leads to power savings of 45% compared to the over-designed case.

Figure 6. Power consumption for deterministic simulations. [Bar chart: power (mW) versus Vt degradation (0%, 5%, 10%) for s526 and s832, over-designed versus BIPT.]

Figure 7. Power consumption for statistical simulations considering the temporal variations of NBTI effect. [Bar chart: power (mW) versus nominal Vt degradation (2%, 5%, 10%) for s526 and s832, DVS versus BIPT.]

For the statistical simulations, the lifetime Vt degradation includes a statistical component of NBTI degradation over and above the nominal Vt degradation; this component reflects the statistical variation in the underlying process causing Vt degradation [5]. We model the statistical Vt degradation as a Poisson random variable. We compare the power consumed by the BIPT scheme with that of the Dynamic Voltage Scaling (DVS) scheme, which thus serves as the baseline for the statistical simulations. The simulations are carried out for statistical Vt variation over 2%, 5% and 10% of nominal value.
The Vdd values for DVS are selected such that, in each nominal case, the circuit is guaranteed to work under the worst statistical variation. Thus, for a transistor whose Vt is degraded by 5% (the statistical variation component) over and above the 2% nominal degradation, Vdd is selected such that the circuit would still work without any timing violations if all transistors in the circuit were similarly affected. The operating voltages for the DVS scheme are found to be 1.15V, 1.2V and 1.25V for 2%, 5% and 10% degradation respectively. The operating voltage for the BIPT case remains at 0.925V. Figure 7 plots the power consumed in the statistical simulations for s526 and s832. From the experimental results, we observe that, on average, the BIPT scheme leads to power savings of 30% compared to the dynamic voltage scaling approach. The saving here is smaller than in the comparison with over-design because dynamic voltage scaling is itself an improvement over the over-designed approach. The power for the DVS methodology increases as Vt degradation increases because the supply voltage is set according to the most degraded transistor.

4. Conclusions

In this paper, we propose a Built-In Proactive Tuning system that allows VLSI circuits to autonomously compensate for aging-induced performance degradation. Due to its adaptive nature, BIPT is power-efficient and uses about 45% less power than over-design based aging compensation. Since it is a middle-grained approach, it achieves a 30% power reduction compared to the coarse-grained DVS method.

References

[1] M. Abramovici, M. A. Breuer and A. D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, New York, 1990.
[2] M. Agarwal, B. C. Paul, M. Zhang and S. Mitra, "Circuit Failure Prediction and Its Application to Transistor Aging," IEEE VLSI Test Symposium, 2007, pp. 277-286.
[3] K. A. Bowman, J. W. Tschanz, N. S. Kim, J. C. Lee, C. B. Wilkerson, S.-L. L. Lu, T. Karnik and V. K. De, "Energy-Efficient and Metastability-Immune Timing-Error Detection and Instruction-Replay-Based Recovery Circuits for Dynamic-Variation Tolerance," IEEE ISSCC, 2008, pp. 402-403.
[4] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner and T. Mudge, "A Self-Tuning DVS Processor Using Delay-Error Detection and Correction," IEEE Journal of Solid-State Circuits, Vol. 41, No. 4, April 2006, pp. 792-804.
[5] K. Kang, S. P. Park, K. Roy and M. A. Alam, "Estimation of Statistical Variation in Temporal NBTI Degradation and Its Impact on Lifetime Circuit Performance," Proceedings of the 2007 IEEE/ACM ICCAD, November 2007, pp. 730-734.
[6] B. C. Paul, K. Kang, H. Kufluoglu, M. A. Alam and K. Roy, "Negative Bias Temperature Instability: Estimation and Design for Improved Reliability of Nanoscale Circuits," IEEE Transactions on CAD of Integrated Circuits and Systems, Vol. 26, No. 4, April 2007, pp. 743-751.
[7] R. Samanta, G. Venkataraman, N. Shah and J. Hu, "Elastic Timing Scheme for Energy-Efficient and Robust Performance," IEEE ISQED, 2008, pp. 537-542.
[8] T. Sato and Y. Kunitake, "A Simple Flip-flop Circuit for Typical-Case Designs for DFM," IEEE ISQED, 2007, pp. 539-544.
[9] J. Srinivasan, S. V. Adve, P. Bose and J. A. Rivers, "The Case for Lifetime Reliability-Aware Microprocessors," ACM/IEEE ISCA, 2004, pp. 276-287.
[10] J. Srinivasan, S. V. Adve, P. Bose and J. A. Rivers, "Lifetime Reliability: Towards an Architectural Solution," IEEE Micro, Vol. 25, No. 3, May-June 2005, pp. 70-80.
[11] I. E. Sutherland, "Micropipelines," Communications of the ACM, Vol. 32, No. 6, June 1989, pp. 720-738.
[12] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan and V. De, "Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage," IEEE Journal of Solid-State Circuits, Vol. 37, No. 11, November 2002, pp. 1396-1402.
[13] M. Zhang, T. M. Mak, J. Tschanz, K. S. Kim, N. Seifert and D. Lu, "Design for Resilience to Soft Errors and Variations," Proceedings of the 13th IEEE IOLTS, July 2007, pp. 23-28.