Clock Gate Test Points


Narendra Devta-Prasanna and Arun Gunda
LSI Corporation, 5 McCarthy Blvd., Milpitas, CA 95035, USA
{narendra.devta-prasanna, arun.gunda}@lsi.com

Abstract
Clock gating is widely used in modern integrated circuits as a means of reducing dynamic power consumption. In this paper we present a comprehensive analysis of the impact of clock gating during test. We then propose a new type of test point called Clock Gate Test Points. Similar to classic test point techniques, clock gate test points help increase test coverage and reduce the number of test patterns, and thus test time. We also outline techniques for applying the proposed test points in a design. We present coverage improvement and test pattern reduction results obtained with the proposed method for several large industrial circuits. Our results show that, in many cases, the proposed method improves transition delay fault coverage by more than 2.% and reduces the number of test patterns by more than 5% for the same fault coverage. Furthermore, the proposed test points add very little area overhead and do not impact circuit performance.

1. Introduction
Test point insertion [1-8] is a well-known DFT method with several applications. In the BIST domain [1][2], test points have been studied extensively for improving the testability of hard-to-detect, random-pattern-resistant faults and for achieving higher fault coverage with fewer test patterns. For designs using scan-based deterministic testing [3-6], test points have been considered for several reasons, such as improving the testability of hard-to-detect faults and reducing the fill rate of test patterns in order to achieve better test compaction, and thus fewer test patterns and shorter test time.
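Fill rate, as used here, is the fraction of scan-cell bits in a pattern that ATPG must specify (care bits); the remaining don't-care bits are what compaction and compression exploit. The minimal Python sketch below assumes a simple string encoding ('0'/'1' for care bits, 'X' for don't-care) and a naive pairwise merge. It only illustrates why a lower fill rate makes patterns easier to compact; it is not the ATPG compaction algorithm used in the paper, and the function names are hypothetical.

```python
def fill_rate(pattern: str) -> float:
    """Fraction of care bits ('0'/'1') in a scan pattern; 'X' marks a don't-care bit."""
    if not pattern or set(pattern) - set("01X"):
        raise ValueError("pattern must be a non-empty string over '0', '1', 'X'")
    return sum(bit != "X" for bit in pattern) / len(pattern)


def compatible(p: str, q: str) -> bool:
    """Two patterns can be merged if no bit position holds conflicting care bits."""
    return len(p) == len(q) and all(a == b or "X" in (a, b) for a, b in zip(p, q))


def merge(p: str, q: str) -> str:
    """Merge two compatible patterns into one (static compaction of a pair)."""
    if not compatible(p, q):
        raise ValueError("patterns conflict")
    return "".join(b if a == "X" else a for a, b in zip(p, q))


# Two sparse patterns (low fill rate) collapse into a single pattern.
print(fill_rate("1XX0XXXX"))            # 0.25
print(merge("1XX0XXXX", "X1X0XXX1"))    # 11X0XXX1
```

The sparser the patterns, the more pairs pass the `compatible` check, which is the intuition behind using test points to lower the fill rate.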
The low fill rate of test patterns with test points can also be exploited to target higher compression ratios, since the efficiency of compression algorithms increases with the number of don't-care bits in the test patterns. Higher compression ratios ultimately result in test pattern and test time savings. Also, under very high compression, some hard-to-detect faults are untestable because not all the care bits needed to test them can be encoded by the compression scheme; test points can make such faults testable.

There are two basic types of test points used in industry: control test points and observe test points. Control test points make a hard-to-control circuit node easily controllable, and observe test points make a hard-to-observe node easily observable, by adding extra logic to the design. In addition, control test points add performance overhead to the design.

Clock gating is a widely used technique for reducing dynamic power consumption in modern integrated circuits. In most circuits manufactured in the latest technologies, a significant fraction of the scan cells are controlled by clock gates. Furthermore, the functional logic that operates the clock gates can be fairly complex and can have a substantial impact on test. In this paper, we present a comprehensive analysis of the two popular methods of handling clock gates during test with respect to test coverage, test pattern count and test power. We then propose a new type of test point, called Clock Gate Test Points, that reduces the fill rate of test patterns, similar to classic test points, and also improves transition fault coverage. As the name indicates, these test points are associated with the clock gates in a design and make it easier for the ATPG to control the clock gates during test. The proposed test points add very little area overhead and no performance overhead, which can be a critical factor in very high-speed designs.
It should also be noted that, similar to classic test points, clock gate test points can be used in BIST designs to achieve higher fault coverage with fewer test patterns. The rest of the paper is organized as follows. In Section 2 we discuss classic test point techniques. In Section 3 we discuss clock gate usage in modern industrial designs and provide data on its impact on test. The proposed test point method and its application are described in Section 4, along with experimental results for several large industrial circuits. Section 5 concludes the paper.

Paper 3.2 INTERNATIONAL TEST CONFERENCE 978--4244-727-9//$26. 2 IEEE

2. Background
In this section we discuss classic test point techniques that are well known in the industry.

2.1 Test Points
There are two basic types of test points: control test points and observe test points. Control test points provide a way to drive a hard-to-control or uncontrollable node to a desired value during test, which enables easy testing of faults in the output cone of that node. Observe test points provide a way to easily observe faults in circuitry whose fault effects are blocked from propagating to an observation point by some hard-to-control or uncontrollable logic. Figure 1(a) shows an example of unobservable and uncontrollable circuitry caused by hard-to-control logic.

Figure 1(a): Example of uncontrollable and unobservable circuitry due to hard-to-control circuitry
Figure 1(b): Example of control test point and observe test point insertion

Figure 1(b) illustrates one common implementation of the two types of test points. The scan_mode signal that controls the multiplexer of the control test point is present in all industrial designs; it remains high for the entire duration of test and is low during functional operation of the circuit. The observe scan cell and the control scan cell are part of the regular scan chains in the design.

Even though test points help reduce the number of test patterns and test time, there are several costs associated with their use. Both control and observe test points require adding extra logic to the circuit: each control test point requires one multiplexer and one scan flip-flop, and each observe test point requires one scan flip-flop.
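The behavior of the two test point types in Figure 1(b), and the per-test-point cost just listed, can be captured in a few lines. This is a minimal behavioral sketch in Python; the class and function names are hypothetical, and it assumes one dedicated scan cell per test point with no sharing.

```python
from dataclasses import dataclass


@dataclass
class ControlTestPoint:
    """Mux-based control test point: scan_mode selects the test-point scan cell."""
    tp_scan_cell: int = 0  # value shifted into the dedicated test-point scan cell

    def output(self, functional_value: int, scan_mode: int) -> int:
        # scan_mode is 1 for the whole test and 0 in functional operation
        return self.tp_scan_cell if scan_mode else functional_value


def observe_capture(node_value: int) -> int:
    """Observe test point: the hard-to-observe node is captured into its own scan cell."""
    return node_value


def tp_overhead(n_control: int, n_observe: int) -> dict:
    """Added cells per the text: control = 1 mux + 1 scan FF, observe = 1 scan FF."""
    return {"muxes": n_control, "scan_flip_flops": n_control + n_observe}


tp = ControlTestPoint(tp_scan_cell=1)
print(tp.output(functional_value=0, scan_mode=1))  # 1: node forced by the test point
print(tp.output(functional_value=0, scan_mode=0))  # 0: functional path unchanged
print(tp_overhead(n_control=10, n_observe=20))
```

Every added scan flip-flop also lengthens a scan chain, which is the shift-cycle cost the paper goes on to describe.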
The scan cells added by test points also increase the length of the scan chains, resulting in additional shift cycles and thus additional test data per pattern. In the case of control test points, the multiplexer is inserted in the functional path, which can potentially degrade circuit performance. It should also be noted that while observe test points enable easy testing of faults in hard-to-observe circuitry, for transition faults the additional fault coverage is obtained along paths that are not part of the functional logic, so such tests can be of inferior quality for screening delay defects. Recently, methods have been proposed to reduce the area overhead of control and observe test points by using existing scan cells in the design rather than adding new ones [7][8]. For observe test points, the area overhead can also be reduced by sharing one scan cell among several observe test points: instead of one scan cell per observe test point, several hard-to-observe nodes are observed through an OR tree at a common observation scan cell.

3. Clock Gates in Design
Clock gating is a commonly used technique in industrial designs for reducing dynamic power consumption during functional operation of the circuit. It works by selectively stopping the clock to portions of the circuit that are inactive during certain periods of time. This is achieved by gating the clock with a control signal that is low during periods of inactivity and high otherwise. Clock gates are usually inserted automatically by synthesis tools during the synthesis step, using the Integrated Clock Gate (ICG) cell, a custom cell for clock gating available in all standard technology libraries. It contains a level-sensitive latch, an OR gate and an AND gate, connected as shown in Figure 2.
It should be mentioned that several other implementations of the ICG cell are possible, though they are similar in functionality. The CP pin is connected to the input clock signal, and GCP is the gated clock output pin. The LD input is connected to the functional control signal that determines the clock gating. If LD = 1, the gated clock output is active; if LD = 0, the gated clock output is stopped from the cycle after LD goes low. The latch ensures that the gated clock output waveform always has the same shape as the input clock. The TE input is internally ORed with the LD input, as shown in the figure, and is provided to enable easy control of the clock gate cell during test, as discussed in the next section.

Figure 2: Integrated Clock Gate (ICG) cell (pins LD, TE, CP and GCP; internal latch with D, GN and Q)

3.1 Clock Gates during Test
One of the basic requirements for correct operation of the scan chains during test is that the clocks driving the scan cells must be completely controllable during scan chain shifting. When clock gates are used in a design, a gated clock output is active only when its LD input, its TE input, or both are high. Because the LD input pin is driven by functional logic, it cannot be guaranteed to be high during every cycle of scan shifting; therefore, the TE pin is used to take control of the clock gate cells during scan shifting.

Figure 3: Clock gate cell operation during test. (a) scan_mode scheme: TE is driven by scan_mode, which is 1 during scan shift and 1 during scan capture. (b) scan_enable scheme: TE is driven by scan_enable, which is 1 during scan shift and 0 during scan capture. In both cases LD is driven by combinational logic.

There are two commonly used schemes for controlling the TE pin in industrial designs. In the first scheme, referred to in this paper as the scan_mode scheme, the TE pins of all the clock gates in the design are connected to the scan_mode signal. As previously mentioned, the scan_mode signal is high for the entire duration of test, including scan shifting. In the second scheme, the TE pins of all the clock gates are connected to the scan_enable signal, which is also present in all scan designs. It is high during scan shifting and low at all other times. The second scheme is referred to as the scan_enable scheme in the rest of the paper. The two schemes are shown in Figures 3(a) and 3(b), respectively. Each scheme has several advantages and disadvantages, which are discussed below.
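The ICG behavior and the two TE-driving schemes can be illustrated with a cycle-level Python model. This is a sketch, not a gate-level model of any library cell: it assumes the latch is transparent while CP is low (the GN pin), evaluates each clock cycle as a low phase followed by a high phase, and uses hypothetical names (`IntegratedClockGate`, `te_value`, `gated_pulses`).

```python
class IntegratedClockGate:
    """Cycle-level sketch of the ICG in Figure 2: GCP = CP AND latch(LD OR TE)."""

    def __init__(self) -> None:
        self.latch_q = 0  # state of the level-sensitive latch

    def evaluate(self, cp: int, ld: int, te: int) -> int:
        if cp == 0:                    # latch is transparent while CP is low
            self.latch_q = ld | te     # internal OR of functional enable and TE
        return cp & self.latch_q       # internal AND produces the gated clock GCP


def te_value(scheme: str, phase: str) -> int:
    """TE drive: scan_mode is 1 in shift and capture; scan_enable is 1 only in shift."""
    if scheme == "scan_mode":
        return 1
    if scheme == "scan_enable":
        return 1 if phase == "shift" else 0
    raise ValueError(f"unknown scheme: {scheme}")


def gated_pulses(scheme: str, phases: list, ld_values: list) -> list:
    """Return 1 for each cycle whose clock pulse reaches the gated scan cells."""
    icg = IntegratedClockGate()
    pulses = []
    for phase, ld in zip(phases, ld_values):
        te = te_value(scheme, phase)
        icg.evaluate(cp=0, ld=ld, te=te)                 # low phase: latch samples enable
        pulses.append(icg.evaluate(cp=1, ld=ld, te=te))  # high phase: latch holds
    return pulses


phases = ["shift", "shift", "shift", "capture"]
ld = [0, 0, 0, 0]  # functional enable held low by the pattern
print(gated_pulses("scan_mode", phases, ld))    # [1, 1, 1, 1]
print(gated_pulses("scan_enable", phases, ld))  # [1, 1, 1, 0]
```

Both schemes keep the shift clock running; they differ only in the capture cycle, where the scan_enable scheme passes the pulse only if the pattern sets LD = 1.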
Fault coverage: The two schemes achieve different fault coverage. It is commonly understood that the scan_mode scheme results in lower fault coverage than the scan_enable scheme. This is because faults in the input cone of logic driving the LD pins of the clock gates cannot be tested: their fault effects are blocked at the internal OR gate of the clock gate cells, whose other input is connected to the scan_mode signal and is therefore always high during test. In the scan_enable scheme, however, these faults can be tested, because their fault effects can propagate through the clock gate cells while scan_enable is low during scan capture. We show later in this paper that the belief that the scan_mode scheme yields lower fault coverage holds only for stuck-at faults, not for transition faults. For transition faults, the scan_mode scheme can test some faults that cannot be tested with the scan_enable scheme. Therefore, the overall transition fault coverage of the two schemes depends on the characteristics of each design. It should also be noted that when the scan_mode scheme is used, observe test points can be added to the LD pin nets of all the clock gates in order to test the faults in the input cones of the LD pins. However, as stated earlier, observe test points add area overhead, and for delay testing the transition faults are then tested along nonfunctional paths, which can lower test quality.

Test power: Another important difference between the two schemes is test power. In the scan_mode scheme, all the clock gates in the design remain active for the entire duration of test. As a result, all the flip-flops in the design are clocked during scan capture, leading to higher switching activity and test power consumption than in functional mode.
The high switching activity can aggravate the clock stretching effect during delay testing and degrade test quality [9]. False failures can also occur, since good dies can fail the test due to higher than normal dynamic voltage droop on the power grid [10][11]. In the scan_enable scheme, however, only the clock gates whose LD pin is high during scan capture allow the gated clock output to be active. Because the don't-care bits in test patterns are randomly filled, many of the LD pins are likely to be low during scan capture, so not all the flip-flops in the circuit are clocked in a given test pattern. Recently, several ATPG techniques [12-16] have been proposed that deterministically set the LD pins of clock gates low during scan capture to reduce switching activity during test to a level that mimics the functional-mode power profile. These ATPG techniques cannot be used with the scan_mode scheme, since the clock gates can never be turned off during test: their TE pins are always high.

For the reasons stated above, most industrial designs use the scan_enable scheme to control the clock gates during test. The main drawback of this scheme is that it can increase the fill rate of test patterns, and thus the number of test patterns and test time, compared to the scan_mode scheme. This can be demonstrated using the circuit shown in Figure 4.

Figure 4: Impact of clock gates on fill rate of test patterns for scan_enable scheme (input pipeline stages 1 and 2, combinational logic under test (CUT), output pipeline, and a clock gating cell with TE connected to scan_enable)

Consider, for example, test generation for a transition delay fault in the combinational circuit marked in the figure. For the time being, ignore the clock gate cell and all the logic driving its LD input pin, and assume that the clock driving the scan cells in the input and output cones of the combinational logic under test (CUT) comes directly from a primary input. To launch a transition at the target node in the CUT and propagate the fault effect to a scan cell for observation, care bits are needed for some of the scan cells in up to two levels of pipeline stages driving the CUT, as shown in the figure. These care bits are determined by the ATPG and shifted in during test.
Now consider that clock gates are used in the design, that all the scan cells in the input and output cones of the CUT are driven by a gated clock as shown in the figure, and that the TE pin of the clock gate is connected to the scan_enable signal. In this case, in addition to the care bits needed to excite and propagate the transition fault, care bits are also needed to ensure that the launch and capture clock pulses are not suppressed by the clock gate cell but pass through to the scan cells in the input and output cones of the CUT. To satisfy this requirement, the ATPG must ensure that the LD pin of the clock gate cell is high during both the launch and capture cycles, by assigning appropriate care bits to some of the scan cells in up to two pipeline stages driving the LD input of the clock gate. If a cascaded clock gating scheme is used, in which the input clock of a clock gate cell is driven by the output of another clock gate cell, even more care bits are needed to ensure that the launch and capture clock pulses are not suppressed by any of the clock gates along the clock path. Note that if the scan_mode scheme is used, no additional care bits are needed to set the LD pin of the clock gate high, since the clock gate cells are always enabled during test.

3.2 Clock Gate Usage Statistics
Next, we present some statistics on the usage of clock gates in industrial designs, together with experimental data comparing the two schemes of handling clock gates during test.

Table 1: Summary of test case circuits
Ckt. | # of sim gates | # of scan cells | # of clock gates | % gated scan cells | % scan cells driving clock gates
E1 | 6 k | 36,225 | 588 | 67.2 | 4.62
E2 | .32 mil | 68,495 | 442 | 59.62 | 5.9
E3 | 2.73 mil | 2,359 | 853 | 57.84 | 5.4
E4 | 3.67 mil | 93,732 | 2,54 | 6.24 | 6.8
E5 | 4.62 mil | 97,499 | 939 | 22.94 | 2.66
E6 | 7.5 mil | 269,3 | 2,69 | 68.94 | 6.7
E7 | 22.36 mil | 625,45 | 5,99 | 29.77 | 3.54
E8 | 45.85 mil | 2,73,783 | 3,298 | 100 | 2.66
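The last two columns of Table 1 are simple ratios over the scan cells. Assuming a hypothetical netlist abstraction that maps each scan cell to the clock gate driving it (or None if ungated), and each clock gate to the set of scan cells in the input cone of its LD pin, they could be computed as in this Python sketch. The data structures and names are illustrative and are not taken from the authors' tools.

```python
from typing import Dict, Optional, Set, Tuple


def clock_gate_stats(gate_of_cell: Dict[str, Optional[str]],
                     ld_cone: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Return (% gated scan cells, % scan cells driving clock gate LD pins)."""
    n_cells = len(gate_of_cell)
    if n_cells == 0:
        raise ValueError("no scan cells")
    gated = sum(1 for gate in gate_of_cell.values() if gate is not None)
    # A scan cell counts once even if it feeds the LD pins of several clock gates.
    drivers = set().union(*ld_cone.values()) & set(gate_of_cell)
    return 100.0 * gated / n_cells, 100.0 * len(drivers) / n_cells


cells = {"ff_a": "cg1", "ff_b": "cg1", "ff_c": None, "ff_d": "cg2"}
cones = {"cg1": {"ff_c"}, "cg2": {"ff_c", "ff_a"}}
print(clock_gate_stats(cells, cones))  # (75.0, 50.0)
```

Deduplicating the LD-cone drivers matters because, as the example shows, one scan cell can feed the enable logic of several clock gates.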
Table 1 gives relevant information on the circuits studied in this paper. We used eight circuits, whose names are given in the first column of the table. E1, E2, E3, E4 and E5 are 40 nm circuits, E6 and E8 are 65 nm circuits, and E7 is a 90 nm design. All circuits except E7 and E8 are individual blocks of larger designs. The second and third columns list the number of simulation gates and the number of scan cells, respectively. The fourth column lists the number of clock gate cells in the design. The fifth column shows the percentage of scan cells in the design with gated clocks, and the last column shows the percentage of scan cells that are in the input cone of logic driving the LD pins of all the clock gates in the design.

All designs above except E8 have a single level of clock gating, i.e., there is at most one clock gate cell in the clock path between the clock source and any scan flip-flop. In E8, however, clock gates were also used to enable/disable the entire clock tree going to each hierarchical block in the design. For this reason, up to three levels of clock gating are present in the clock paths to the scan cells, and all the scan cells in the design have gated clocks. The table shows that a substantial number of scan cells in industrial designs have gated clocks. For the designs studied, the percentage of scan cells with gated clocks ranges from ~3% to ~7%. Also, a substantial number (~2.5% to 6%) of scan cells are involved in operating the clock gating in the design.

To better understand the impact of clock gates on ATPG test patterns when the scan_enable scheme is used, we collected and analyzed data on the number of scan cells driven by each clock gate cell and on the number of scan cells in the input cone of logic driving the LD pin of each clock gate cell. The results of our analyses for five designs are shown in Figures 5 and 6, respectively.

Figure 5: Statistics for clock gate influence cone (x-axis: number of scan cells driven by a clock gate; y-axis: percent of clock gates; designs E2, E4, E5, E7 and E8)

Figure 5 shows the statistics for the number of scan cells driven by each clock gate in a design. The y-axis is the number of clock gates, expressed as a percentage of the total number of clock gates in the design, and the x-axis is the number of scan cells driven by each clock gate cell. For all five designs, a large majority of the clock gates drive anywhere from to 5 scan cells, and only a handful of clock gates drive either more than 5 scan cells or less than scan cells.
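For cascaded clock gating, such as the up to three levels present in E8, a capture pulse reaches a scan cell only if every clock gate on its clock path is enabled. The following Python sketch (hypothetical helper names, one (LD, TE) pair per clock gating level) shows how the number of LD pins the ATPG must justify to 1 grows with the depth of the clock path under the scan_enable scheme, while the scan_mode scheme needs none.

```python
from typing import List, Tuple


def clock_reaches_cell(path: List[Tuple[int, int]]) -> bool:
    """path lists (LD, TE) for each clock gate from the clock source to the scan cell."""
    return all(ld | te for ld, te in path)


def ld_pins_to_justify(levels: int, te_during_capture: int) -> int:
    """LD pins that must be 1 for the capture pulse to pass every gating level."""
    return 0 if te_during_capture else levels


three_levels_enable = [(1, 0), (1, 0), (0, 0)]  # scan_enable capture, innermost LD low
print(clock_reaches_cell(three_levels_enable))     # False
print(clock_reaches_cell([(0, 1)] * 3))            # True: scan_mode scheme
print(ld_pins_to_justify(3, te_during_capture=0))  # 3
print(ld_pins_to_justify(3, te_during_capture=1))  # 0
```

For a transition test the same condition has to hold in both the launch and the capture cycle, so each required LD pin may cost care bits in two consecutive time frames.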
Based on Figure 5, in order to test the maximum number of faults in as few patterns as possible, the ATPG tool needs to turn on a large number of clock gate cells, since turning on only a few clock gates does not enable the clock to a large number of scan cells. Figure 6 shows the statistics for the number of scan cells driving the LD pins of clock gate cells in a design. The x-axis shows the number of scan cells in the input cone of logic of the LD pin, and the y-axis shows the number of clock gate cells as a percentage of the total number of clock gates in the design. For all the designs except E5, the logic controlling the clock gate cell consists of to scan cells for more than 5% of the clock gates. For the rest of the clock gates, however, the clock gate control logic can be fairly complex, with up to more than 5 scan cells driving their LD pins. For these clock gates to be turned on during test, in the worst case, the ATPG might need to assign care bits to all the scan cells driving the LD pin. Furthermore, for delay testing, care bits are also needed to ensure that the logic driving the LD pin reaches the necessary state after the launch clock pulse, so that the LD pin is also high during the capture clock cycle.

Figure 6: Statistics for clock gate control cone (x-axis: number of scan cells in the input cone of the LD pin; y-axis: percent of clock gates; designs E2, E4, E5, E7 and E8)

3.3 Impact of Clock Gate Handling on Test
In this section we present experimental results comparing the scan_mode and scan_enable schemes of handling clock gates during test.
Table 2: Stuck-at fault ATPG results (for each scheme: test coverage (%), # of patterns, ATPG run time)
Ckt. | scan_mode scheme | scan_enable scheme
E1 | 98.7 5,95,953 | 99.25 4,99,577
E2 | 99.4 6,588 4,2 | 99.27 5,249 3,27
E3 | 98.98 6,223,996 | 99.2 6,39,38
E4 | 98.92 7,589 27,4 | 99.28 7,555 2,784
E5 | 99.25 56,64 78,93 | 99.37 56,867 79,26
E6 | 98.36 4,238 3,778 | 99.54 4,3 2,76
E7 | 98.94 9,35 49,659 | 99.8 9,49 72,366
Avg. | 98.9 5,3 26,786 | 99.29 4,92 28,734

Table 2 shows the results of stuck-at fault ATPG for the test case designs. The first column shows the name of the circuit. The next three columns show the test coverage, test pattern count and ATPG run time for the scan_mode scheme, and the last three columns show the same information for the scan_enable scheme. The last row shows the average value of each column across all the designs. Overall, both methods result in similar pattern counts and ATPG run times, but the test coverage of the scan_mode scheme is lower than that of the scan_enable scheme. As mentioned earlier, this is because faults in the input cones of the LD pins become unobservable in the scan_mode scheme. On average, for the designs studied, the coverage of the scan_mode scheme is lower than that of the scan_enable scheme by 0.39%. For E6, the difference is 1.18%.

Table 3 shows the results of transition fault ATPG in the same format as Table 2. On average, both schemes achieve similar transition fault coverage, but the scan_enable scheme results in a 22% higher pattern count and 27% higher ATPG run time. However, the results for individual designs show that the TDF coverage of the two schemes can differ considerably. For designs E1, E2, E3 and E4, the scan_mode scheme achieves higher coverage, by .4%, 1.13%, 0.17% and 0.92% respectively, while for designs E5, E6 and E7 the scan_enable scheme achieves higher TDF coverage, by 0.22%, 3.22% and .5% respectively.

Table 3: Transition fault (TDF) ATPG results (for each scheme: test coverage (%), # of patterns, ATPG run time)
Ckt. | scan_mode scheme | scan_enable scheme
E1 | 88.32,222 3,43 | 87.8 2,659 3,457
E2 | 92.23 6,322 24,454 | 9. 5,958 2,398
E3 | 95.48 7,77 44,883 | 95.3 7,924 46,84
E4 | 89.85 29,9 42,52 | 88.93 37,8 6,977
E5 | 88.9 4,88 654,48 | 89.3 42,488 66,248
E6 | 9.2 2,66 9,346 | 93.24 24,495,543
E7 | 9.52 3,8 29,943 | 9.67 76,36 73,828
Avg. | 9.9 38,335 82,796 | 9.94 46,78 232,899

These results show that the conventional understanding that the scan_enable scheme results in higher coverage does not hold for transition fault testing.
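The stuck-at and TDF coverage differences discussed for Tables 2 and 3 can be recomputed from table entries that survive intact in this copy (the E6 and average rows of Table 2, and the E4 row of Table 3). The short Python check below does so, rounding to two decimals to match the tables' precision; the dictionary layout is just a convenient illustration.

```python
# Test coverage (%) per scheme, copied from intact Table 2 and Table 3 entries.
stuck_at = {"E6": (98.36, 99.54), "Avg.": (98.9, 99.29)}  # (scan_mode, scan_enable)
tdf = {"E4": (89.85, 88.93)}


def coverage_gap(scan_mode: float, scan_enable: float) -> float:
    """Positive when scan_enable achieves higher coverage than scan_mode."""
    return round(scan_enable - scan_mode, 2)


for ckt, (m, e) in stuck_at.items():
    print("stuck-at", ckt, coverage_gap(m, e))  # E6 1.18, Avg. 0.39
for ckt, (m, e) in tdf.items():
    print("TDF", ckt, coverage_gap(m, e))       # E4 -0.92 (scan_mode higher)
```

The sign flip between the stuck-at rows and the TDF row is exactly the effect the paper highlights: scan_enable wins on stuck-at coverage, but not necessarily on transition fault coverage.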
The scan_mode scheme can still reach high TDF coverage because, even though the faults in the logic driving the LD pin cannot be observed through the clock gate cell, they might be observable through some other fan-out that does not involve propagation through a clock gate cell. Moreover, our analysis shows that many transition delay faults that are ATPG untestable with the scan_enable scheme can be tested with the scan_mode scheme. This phenomenon also contributes to the higher achievable TDF coverage of the scan_mode scheme. In general, whether the scan_mode scheme or the scan_enable scheme achieves higher coverage for a given design depends on the characteristics and structure of the design.

Next, we demonstrate with an example how some transition faults that are untestable with the scan_enable scheme can be tested with the scan_mode scheme. Based on an extensive literature survey, this paper is the first work to demonstrate this effect.

Figure 7: Example of a transition fault untestable with scan_enable scheme but testable with scan_mode scheme

Consider the sequential circuit shown in Figure 7, consisting of four scan flip-flops and one clock gate cell. Assume that flip-flops FF1, FF2 and FF3 receive their clocks directly from the clock source, while FF4 is connected to the gated clock output of the clock gate cell, as shown in the figure. The TE pin of the clock gate is connected to the scan_enable signal. Y and Z are primary inputs of the circuit. Now consider the slow-to-fall transition delay fault at node g. The first two values shown next to each node represent the launch and capture cycle values assigned by the ATPG to activate the fault. For the moment, ignore the third set of values, shown in red.
The assignments required to launch the transition also force the LD pin of the clock gate to 0 during both the launch and capture cycles of the test. As a result, although the fault effect propagates to FF4, it cannot be captured, since FF4 does not receive any clock pulse during the test. Therefore, the slow-to-fall transition fault at node g is ATPG untestable. However, if the TE pin of the clock gate cell is connected to the scan_mode signal instead of the scan_enable signal, the clock gate is always enabled during test. In this case, even though the LD pin of the clock gate is 0 during test, the clock pulse propagates to FF4 and the fault effect is captured. The fault is therefore testable and no longer ATPG untestable. Note that even though the slow-to-fall transition fault at node g is untestable when the TE pin is connected to scan_enable (and is thus 0 during capture), it is important to test this fault, since a delay defect at this node can cause a functional failure. For instance, consider the circuit operating in functional mode with the scan_enable signal always low. The three values shown next to each node represent the state of the circuit in three consecutive clock cycles. Also assume that a defect at node g delays the high-to-low transition at this node by more than one clock cycle. This is plausible if the underlying mechanism causing the delay fault is a weak resistive bridge or a transistor stuck-open fault. While a high-to-low transition at node g during the second clock cycle does not result in any failure, when the third clock pulse arrives, the fault effect is captured in FF4, resulting in a functional failure.

Paper 3.2 INTERNATIONAL TEST CONFERENCE 6

Next, we present data comparing the impact of the two schemes on test power. Table 4 shows the average and peak weighted switching activity (WSA) of stuck-at and TDF test patterns. Columns two to five show the data for the scan_mode scheme, and the next four columns show the data for the scan_enable scheme. On average, patterns generated for the scan_mode scheme have both higher average and higher peak WSA than patterns for the scan_enable scheme. The average WSA of the scan_mode scheme is higher than that of the scan_enable scheme by 4.57% for stuck-at ATPG patterns and.8% for TDF patterns. In the case of E1, the difference is more than 7% for stuck-at tests. These results show that, in general, the scan_mode scheme results in higher test power consumption than the scan_enable scheme.

Table 4: WSA of test patterns
Ckt. | scan_mode scheme: Stuck-at ATPG (avg., peak), TDF ATPG (avg., peak) | scan_enable scheme: Stuck-at ATPG (avg., peak), TDF ATPG (avg., peak)
E1 2.39 28.4 5.47 27.4 4. 23.39.72 24.42
E2 2.24 3.3 8.46 29.66 3.67 25.83 4.8 28.6
E3 25. 28.75 9.45 28.77 2.59 25.77 7.8 26.3
E4 24.72 27.4 7.25 27.28 9.8 23.36 5.62 23.2
E5 23.89 3.62 8.62 32. 2.64 3.5 6.8 3.22
E6 6.26 9.7.98 22.9 3.4 2.4 2.59 22.45
E7 6.3 7.89.29 8.9 4.53 6.57.83 6.62
Avg. 2.25 26.9 6.7 26.49 6.68 23.77 4.26 24.46

Next, we present results of low capture power ATPG for the two schemes for both stuck-at and TDF testing. We used the low capture power feature of the TestKompress ATPG tool to generate test patterns with lower switching activity in order to reduce test power consumption. We configured the tool so that test coverage is not sacrificed: if no test pattern for a fault can satisfy the switching activity threshold, the test with the lowest switching activity is accepted. The results are shown in Table 5 in the same format as Table 4. Table 5 shows that when the scan_enable scheme is used, it is possible to reduce the switching activity, and thus the power consumption, during test. The average WSA was reduced by 3.83% for stuck-at tests and by.7% for TDF tests, and the peak WSA of the test patterns was reduced by 2.5% and 2.26% for stuck-at and TDF tests respectively. For the scan_mode scheme, however, no reduction in the switching activity of the test patterns was possible.

Table 5: WSA of low capture power test patterns
Ckt. | scan_mode scheme: Stuck-at ATPG (avg., peak), TDF ATPG (avg., peak) | scan_enable scheme: Stuck-at ATPG (avg., peak), TDF ATPG (avg., peak)
E1 2.8 27.62 5.47 27.52 2.2 22.2.46 2.66
E2 2.82 29.79 8.45 29.44 2.7 23.37 3.3 24.
E3 25.59 29.44 9.53 29.4 4.37 9.24 4.72 26.3
E4 24.7 27. 7.26 27.42 4.2 8.52 4.2 22.8
E5 23.95 3.69 8.62 3.56 6.8 3.22 6.3 26.85
E6 6.4 8.89.64 2.2 9.7 2.8 9.55 2.4
E7 6.6 7.98.34 8.2.72 3.5 9.77 4.8
Avg. 2.3 25.92 6.4 26.32 2.85 2.26 2.55 22.2

4. Clock Gate Test Points

We propose a new method of handling clock gates during test, called Clock Gate Test Points (CGTPs). They resemble classic test point techniques in that they simplify, for the ATPG tool, the task of enabling the clock gates during test.
Figure 8: Clock Gate Test Point

Figure 8 shows the implementation of the proposed method. The test point consists of an AND-OR combinational gate whose output is connected to the TE input of the clock gate. The inputs of the test point are connected as follows. One input of the AND gate is connected to the scan_mode signal, and the other input is connected to the output of one of the scan cells in the input cone of the clock gate LD pin. The output of the AND gate drives one input of the OR gate, whose other input is connected to the scan_enable signal. It is also possible to use a dedicated test point enable signal, which can be called cgtp_enable, instead of the scan_mode signal. The cgtp_enable signal is held at a constant value during test and can be controlled through a JTAG register; it provides the flexibility to enable or disable the test points during test.

The clock gate test point operates as follows. During functional operation, both scan_enable and scan_mode are low, so the TE pin of the clock gate is always low and the clock gate is controlled only through the LD pin. During test, when scan_enable is high, the TE pin of the clock gate is also high, which allows the scan chains to operate correctly during scan shifting. During scan capture, however, scan_enable is low and the clock gate can be enabled through two paths. The first is the functional path through the LD pin, which requires assigning appropriate care bits to some of the scan cells driving the LD pin. The second path (shown in red in Figure 8) is through the test point, and it requires only one care bit: a 1 in the scan cell connected to the test point. This short-cut path needs a single care bit, whereas the functional path through the LD pin can require many. The short-cut path therefore allows the ATPG to generate tests with fewer care bits and thus enables better test compaction. Note that the short-cut path must be timing closed at the functional speed of the circuit for correct operation. However, since the path originates from a scan cell that already has a functional path to the clock gate, this requirement is easy to satisfy. The area overhead of the proposed method is one AND-OR gate per test point, and it does not impose any overhead on circuit performance.
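The three ways of driving TE discussed in this paper can be compared with a small behavioral model. The sketch below is illustrative only, not the authors' implementation. The function names are hypothetical, the clock gating cell is reduced to its enable function (the internal latch that keeps real cells glitch-free is not modeled), and `tp_q` denotes the output of the scan cell selected to drive the test point.

```python
"""Behavioral sketch: TE connections for a clock gating cell during test.

Hypothetical model (not a tool or library API):
  * the gated clock pulses when LD or TE is high (latch timing ignored);
  * tp_q is the Q output of the scan cell that drives the test point.
"""


def te_scan_enable(scan_enable: bool, scan_mode: bool, tp_q: bool) -> bool:
    # scan_enable scheme: TE follows scan_enable, so it is low during capture.
    return scan_enable


def te_scan_mode(scan_enable: bool, scan_mode: bool, tp_q: bool) -> bool:
    # scan_mode scheme: TE follows scan_mode, so the gate is open throughout test.
    return scan_mode


def te_cgtp(scan_enable: bool, scan_mode: bool, tp_q: bool) -> bool:
    # Clock gate test point: AND-OR gate, TE = scan_enable OR (scan_mode AND Q).
    # A dedicated cgtp_enable signal could replace scan_mode here.
    return scan_enable or (scan_mode and tp_q)


def gated_clock_pulses(ld: bool, te: bool) -> bool:
    # The clock gating cell passes the clock when either enable is high.
    return ld or te


if __name__ == "__main__":
    # Capture cycle (scan_enable = 0, scan_mode = 1) with LD held at 0
    # by the functional logic, as in the Figure 7 example.
    for te_fn in (te_scan_enable, te_scan_mode, te_cgtp):
        for tp_q in (False, True):
            te = te_fn(scan_enable=False, scan_mode=True, tp_q=tp_q)
            print(f"{te_fn.__name__:15s} tp_q={int(tp_q)} pulse={gated_clock_pulses(False, te)}")
```

With LD at 0, the scan_enable connection never lets a capture pulse through and the scan_mode connection always does, whereas the test point does so exactly when the ATPG loads a 1 into the test-point scan cell. Loading a 0 instead reproduces the scan_enable behavior, which is how the proposed method keeps faults in the LD input cone observable while still allowing faults like the one in Figure 7 to be captured.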
With the proposed method, all faults in the logic driving the LD pin of the clock gate remain observable through the clock gate, since the ATPG can assign 0 to the scan cell driving the test point in order to make the TE pin of the clock gate low. Therefore, all faults that are testable with the scan_enable scheme remain testable. In addition, the proposed method allows faults that are untestable with the scan_enable scheme but testable with the scan_mode scheme to be tested as well. For example, if the proposed method is applied to the circuit shown in Figure 7 and the output of flip-flop FF3 is used to drive the clock gate test point, the slow-to-fall TDF fault at node g becomes testable. Hence, the transition fault coverage achieved with the proposed method can be higher than that of either existing scheme. When low capture power ATPG is used to reduce test power, clock gates can be deterministically turned off by ensuring that both the LD pin of the clock gate and the scan cell driving the test point are low. Compared to classic test point techniques, the proposed method has very low area overhead, does not add any scan cells to the design, and does not add any logic to functional paths that could impact circuit performance.

4.1 Selection Schemes for Clock Gate Test Points

In this section we describe various selection schemes for applying the proposed clock gate test points. The selection process consists of two steps: (1) identifying the clock gates in the design to which the proposed test points should be applied, and (2) identifying the scan cell for driving each test point. Several heuristic methods can be used for the selection process. First, we outline methods for selecting the clock gates to which the proposed test point technique can be applied.

Method-1: Since the area overhead of the proposed method is very small compared to the design, it is feasible to apply the test points to all the clock gates in the design.
For example, the E7 design contains 599 clock gates, so the area overhead is 599 AND-OR gates, which constitutes.2% of the gates in the design.

Method-2: We define two values for each clock gate in the design: input rank and output rank. The input rank is the minimum number of care bits required to set the LD pin of the clock gate to 1. If it is not feasible to determine the number of care bits, the input rank can instead be estimated by counting the number of scan cells in the input cone of the LD pin. The output rank is the number of scan cells driven by the clock gate. Based on these two values, several selection heuristics can be used. For example, we can sort the clock gates in decreasing order of the product of their input and output ranks and select the top N clock gates from this list. Alternatively, we can select the clock gates whose input and output ranks are both greater than certain thresholds.

Method-3: Order the clock gates based on the SCOAP controllability measure of the LD pin, then select the top N clock gates from this list.

Next, we outline some methods for selecting the scan cell that drives the test point.

Method-1: Randomly pick any scan cell in the input cone of the LD pin. To reduce the routing length of the short-cut path, the selection can be guided by factors such as physical proximity to the clock gate cell. In the absence of layout information, hierarchical proximity in the netlist between the scan cell and the clock gate can be considered instead.
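The clock gate selection heuristics above (Method-2 and Method-3) are simple rankings. The following is a hedged sketch under assumptions not stated in the paper. The class and function names are hypothetical, the SCOAP computation covers only AND/OR/NOT gates using the standard combinational controllability rules, and Method-3 is assumed to favor the LD pins that are hardest to set to 1 (highest CC1), since the paper does not state the sort direction.

```python
"""Sketch of clock-gate selection heuristics (Method-2 and Method-3).

All names and example values are hypothetical; in practice the ranks and the
LD-pin nets would come from netlist analysis.
"""
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ClockGate:
    name: str
    input_rank: int   # care bits (or scan cells in the LD input cone) to set LD = 1
    output_rank: int  # number of scan cells clocked by this gate
    ld_net: str       # net driving the LD pin


def select_by_rank_product(gates: Sequence[ClockGate], n: int) -> List[str]:
    # Method-2: sort by input_rank * output_rank (largest first), keep the top N.
    ranked = sorted(gates, key=lambda g: g.input_rank * g.output_rank, reverse=True)
    return [g.name for g in ranked[:n]]


def select_by_thresholds(gates: Sequence[ClockGate], min_in: int, min_out: int) -> List[str]:
    # Method-2 variant: keep gates whose two ranks both exceed their thresholds.
    return [g.name for g in gates if g.input_rank > min_in and g.output_rank > min_out]


def scoap_cc1(netlist: Sequence[Tuple[str, str, List[str]]], sources: Sequence[str]) -> Dict[str, int]:
    """SCOAP combinational 1-controllability (CC1) for AND/OR/NOT gates.

    netlist lists (output, gate_type, inputs) in topological order; sources are
    scan cell outputs and primary inputs, which get CC0 = CC1 = 1.
    """
    cc0 = {s: 1 for s in sources}
    cc1 = {s: 1 for s in sources}
    for out, kind, ins in netlist:
        if kind == "AND":
            cc0[out] = min(cc0[i] for i in ins) + 1
            cc1[out] = sum(cc1[i] for i in ins) + 1
        elif kind == "OR":
            cc0[out] = sum(cc0[i] for i in ins) + 1
            cc1[out] = min(cc1[i] for i in ins) + 1
        elif kind == "NOT":
            cc0[out] = cc1[ins[0]] + 1
            cc1[out] = cc0[ins[0]] + 1
        else:
            raise ValueError(f"unsupported gate type: {kind}")
    return cc1


def select_by_ld_controllability(gates: Sequence[ClockGate], cc1: Dict[str, int], n: int) -> List[str]:
    # Method-3 (direction assumed): hardest-to-enable LD pins first.
    ranked = sorted(gates, key=lambda g: cc1[g.ld_net], reverse=True)
    return [g.name for g in ranked[:n]]


# Hypothetical example: LD nets of three clock gates built from scan outputs q1..q4.
EXAMPLE_SOURCES = ["q1", "q2", "q3", "q4"]
EXAMPLE_NETLIST = [
    ("n1", "AND", ["q1", "q2"]),
    ("ld_a", "AND", ["n1", "q3"]),
    ("ld_b", "OR", ["q3", "q4"]),
]
EXAMPLE_GATES = [
    ClockGate("CG_A", input_rank=3, output_rank=40, ld_net="ld_a"),
    ClockGate("CG_B", input_rank=1, output_rank=200, ld_net="ld_b"),
    ClockGate("CG_C", input_rank=2, output_rank=10, ld_net="n1"),
]

if __name__ == "__main__":
    cc1 = scoap_cc1(EXAMPLE_NETLIST, EXAMPLE_SOURCES)
    print(select_by_rank_product(EXAMPLE_GATES, 2))
    print(select_by_thresholds(EXAMPLE_GATES, 2, 20))
    print(select_by_ld_controllability(EXAMPLE_GATES, cc1, 2))
```

The same `scoap_cc1` helper could also be evaluated on the D pins of candidate scan cells when picking the cell that drives each test point.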

Table 6: ATPG results for Clock Gate Test Points
Ckt. | # of clock gates | # of CGTPs | Stuck-at ATPG: cov. (%), # of pats, run time | TDF ATPG: cov. (%), # of pats, run time | Comparison to scan_enable scheme (TDF): gain, # of eff. pats, % reduction | Comparison to scan_mode scheme (TDF): gain, # of eff. pats, % reduction
E1 588 573 99.3 4,98,53 88.75,52,349.57 8,32 34.27.43 8,768 2.87
E2 442 442 99.3 6,43 2,96 9.73 6,344 8,975.63,264 29.4 -.5
E3 853 85 99. 6,34 8,684 95.65 7, 68,278.34,56 4.8.7,968 3.32
E4 2,54 2,43 99.2 7,78 3,477 9.79 38,94 46,288 2.86,648 69.8.94 4,4 5.84
E5 939 937 99.34 56,62 8,472 89.32 4,5 67,.9 2,92 5.65.4 4,576 25.77
E6 2,69,3 99.5 4,333 2,75 93.62 22,584 9,6.38,72 52.9 3.6 4,544 79.2
E7 5,99 5,59 99.4 9,373 68,76 9.76 33,859 47,644.9 23,36 69.32.24 9,648 36.98
Avg. 22,44 2,35 99.27 5,26 3,72 9.8 4,95 2,964.87 28,5 44.44.9 27,37 4.97

Method-2: Evaluate the SCOAP controllability measure of the D pin of all the scan cells in the input cone of the clock gate, select the scan cell that is easiest to control, and use it to drive the test point. The motivation for this approach is that during TDF pattern generation, only one care bit is needed to enable the clock gate during the launch cycle. To control the clock gate during the capture cycle, however, the number of care bits is determined by the requirement to capture the value 1 in the scan cell driving the test point.

Next, we present the results of our experiments using the proposed method. For all the industrial designs considered, we selected the clock gates whose input ranks are more than 3 and whose output ranks are more than for applying clock gate test points. The scan cells driving the test points were randomly selected from the set of scan cells in the input cone of the LD pin.

4.2 Experimental Results

Table 6 shows the ATPG results for the test case designs after the clock gate test points were inserted. The first column shows the name of the design and the second column shows the number of clock gates in the design.
The third column shows the number of clock gate test points in the design. The next three columns show the results of stuck-at fault ATPG (coverage, pattern count and ATPG run time in seconds), and columns seven to nine show the results for TDF ATPG in the same order. For stuck-at ATPG, the proposed method achieves fault coverage similar to that of the scan_enable scheme (shown in Table 2) for all the designs, with roughly the same pattern count and ATPG run time. Compared to the scan_mode scheme, the proposed test points achieve higher coverage with a similar test pattern count. For TDF ATPG, to facilitate comparison, columns 10 and 13 show the gain in TDF coverage of the proposed method with respect to the scan_enable and scan_mode schemes, respectively. On average, the proposed method results in about.9% higher test coverage than the existing schemes. To measure pattern count savings, columns 11 and 14 show the number of test patterns the proposed method requires to achieve the same fault coverage as the scan_enable and scan_mode schemes, respectively. Columns 12 and 15 show the corresponding percentage savings in the number of test patterns with respect to the two schemes. Compared to the scan_enable scheme, which is the most widely used scheme in the industry, the proposed method requires 44% fewer test patterns on average; for the E7 and E4 designs, the savings exceed 69%. Compared to the scan_mode scheme, the average savings are close to 4%, and for the E6 design they are close to 8%. Only for E2 was the TDF coverage of the proposed method lower than that of the scan_mode scheme, by.5%; compared to the scan_enable scheme, however, the coverage of the proposed method is higher by.63% and the pattern saving is about 3%.
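Our reading of the effective-pattern columns is: apply the proposed method's patterns in order and count how many are needed before cumulative coverage first reaches the final coverage of the reference scheme; the percentage reduction then compares that count with the reference pattern count. A minimal sketch of this bookkeeping, assuming per-pattern cumulative coverage curves are available (function names and curve values are hypothetical):

```python
"""Sketch: effective pattern count and % pattern reduction from coverage curves.

Assumes each curve lists cumulative fault coverage (%) after each pattern,
in application order. Function names and curve values are hypothetical.
"""
from typing import Optional, Sequence


def effective_pattern_count(curve: Sequence[float], target: float) -> Optional[int]:
    # Number of patterns after which cumulative coverage first reaches target.
    for count, coverage in enumerate(curve, start=1):
        if coverage >= target:
            return count
    return None  # the target coverage is never reached


def percent_reduction(reference_patterns: int, effective_patterns: int) -> float:
    # Savings relative to the reference scheme's full pattern count.
    return 100.0 * (reference_patterns - effective_patterns) / reference_patterns


REFERENCE_CURVE = [50.0, 65.0, 75.0, 80.0]  # e.g. scan_enable scheme (hypothetical)
CGTP_CURVE = [60.0, 78.0, 81.0, 82.0]       # e.g. with clock gate test points (hypothetical)

if __name__ == "__main__":
    target = REFERENCE_CURVE[-1]  # final coverage of the reference scheme
    eff = effective_pattern_count(CGTP_CURVE, target)
    print(eff, percent_reduction(len(REFERENCE_CURVE), eff))  # prints: 3 25.0
```

If the proposed method never reaches the reference coverage, the helper returns None and no reduction can be reported for that comparison.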
Figure 9: Coverage graph for E7 design (TDF coverage (%) versus number of patterns, for the scan_enable scheme and clock gate test points)

To demonstrate the effect of the lower fill rate of the proposed method and the resulting superior test compaction, Figure 9 shows the test coverage graphs of the proposed method and of the scan_enable scheme for the E7 design. Using the proposed method, the same coverage is achieved with less than 5% patterns compared to the scan_enable scheme. Table 7 shows the average and peak WSA of stuck-at and transition fault test patterns for regular ATPG as well as low capture power ATPG using the proposed method.

Table 7: WSA of test patterns with Clock Gate Test Points
Ckt. | Regular ATPG: Stuck-at ATPG (avg., peak), TDF ATPG (avg., peak) | Low capture power ATPG: Stuck-at ATPG (avg., peak), TDF ATPG (avg., peak)
E1 7.89 25.3 4.6 25.4 5.27 24.26 3.6 2.78
E2 7.76 27.3 7.5 28.32 4.3 23.95 6.52 26.52
E3 22.3 26.8 8. 27.24 6.28 2.62 5.93 26.59
E4 22.2 25.4 7.77 25.57 8.34 23.34 5.9 24.59
E5 22.92 3.63 7.93 3.5 2.69 29.59 7.25 28.62
E6 7.3 22.68 6.43 23.3.28 23.6.92 22.55
E7 5.24 7.4.25 7.34.28 4.62 9.2 3.94
Avg. 9.38 24.96 6.8 25.38 5.35 22.78 4.27 23.5

When regular ATPG is used, the average and peak WSA of the test patterns are higher than the corresponding numbers for the scan_enable scheme (shown in Table 4) but lower than those for the scan_mode scheme. The average WSA of the proposed method is 2.7% higher than that of the scan_enable scheme for stuck-at tests and.8% higher for TDF tests. However, when low capture power ATPG is used, both the average and the peak WSA of the test patterns can be reduced. For stuck-at tests, the average WSA was reduced by 4.%, and for TDF tests it was reduced by 2.3%. It should be noted that the switching activity of the test patterns for the proposed method can be reduced further by using more aggressive switching activity threshold settings in the ATPG tool.

5. Conclusion

We presented a comprehensive analysis of the usage and impact of clock gating during test. We then proposed a new type of test point called Clock Gate Test Points.
Similar to classic test point techniques, clock gate test points help reduce the fill rate of test patterns and thus reduce the number of test patterns needed to achieve the desired fault coverage. They can also be used to achieve higher transition fault coverage and thus improve test quality. The proposed method has very little area overhead and no impact on circuit performance. Experimental results for several large industrial circuits demonstrate the usefulness of the proposed method.

6. References

[1] M. Nakao, S. Kobayashi, K. Hatayama, K. Iijima and S. Terada, Low Overhead Test Point Insertion for Scan-based BIST, In Proc. International Test Conference, 1999
[2] N. Tamarapalli and J. Rajski, Constructive Multi-Phase Test Point Insertion for Scan-based BIST, In Proc. International Test Conference, 1996
[3] M. J. Geuzebroek, J. T. van der Linden and A. J. van de Goor, Test Point Insertion that Facilitates ATPG in Reducing Test Time and Data Volume, In Proc. International Test Conference, 2002
[4] S. Remersaro, J. Rajski, T. Rinderknecht, S. M. Reddy and I. Pomeranz, ATPG Heuristics Dependant Observation Point Insertion for Enhanced Compaction and Data Volume Reduction, In Proc. International Symposium on Defect and Fault Tolerance of VLSI Systems, 2008
[5] R. Sethuram, S. Wang, S. T. Chakradhar and M. L. Bushnell, Zero Cost Test Point Insertion Technique to Reduce Test Set Size and Test Generation Time for Structured ASICs, In Proc. Asian Test Symposium, 2006
[6] M. Yoshimura, T. Hosokawa and M. Ohta, A Test Point Insertion Method to Reduce the Number of Test Patterns, In Proc. Asian Test Symposium, 2002
[7] H. Ren, M. Kusko, V. Kravets and R. Yaari, Low Cost Test Point Insertion without Using Extra Registers for High Performance Designs, In Proc. International Test Conference, 2009
[8] J.-S. Yang, B. Nadeau-Dostie and N. A. Touba, Test Point Insertion Using Functional Flip-Flops to Drive Control Points, In Proc. International Test Conference, 2009
[9] J. Rearick, Too Much Delay Fault Coverage is a Bad Thing, In Proc. International Test Conference, 2001
[10] P. Girard, Low Power Testing of VLSI Circuits: Problems and Solutions, In Proc. International Symposium on Quality Electronic Design, 2000
[11] J. Saxena, K. M. Butler, V. B. Jayaram et al., A Case Study of IR-Drop in Structured At-Speed Testing, In Proc. International Test Conference, 2003
[12] S. Remersaro, X. Lin, S. M. Reddy, I. Pomeranz and J. Rajski, Low Shift and Capture Power Scan Tests, In Proc. International Conference on VLSI Design, 2007
[13] D. Czysz, M. Kassab, X. Lin, G. Mrugalski, J. Rajski and J. Tyszer, Low Power Scan Shift and Capture in the EDT Environment, In Proc. International Test Conference, 2008
[14] S. Remersaro, X. Lin, Z. Zhang, S. M. Reddy, I. Pomeranz and J. Rajski, Preferred Fill: A Scalable Method to Reduce Capture Power for Scan Based Designs, In Proc. International Test Conference, 2006
[15] X. Wen, Y. Yamashita, S. Morishima, S. Kajihara, L. T. Wang, K. K. Saluja and K. Kinoshita, Low Capture Power Test Generation for Scan Based At-Speed Testing, In Proc. International Test Conference, 2005