Reducing Power Supply Noise in Linear-Decompressor-Based Test Data Compression Environment for At-Speed Scan Testing

Meng-Fan Wu, Jiun-Lang Huang
Graduate Institute of Electronics Engineering, Dept. of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan

Xiaoqing Wen and Kohei Miyase
Dept. of Computer Science and Electronics, Kyushu Institute of Technology, Iizuka 820-8502, Japan

Paper 3. INTERNATIONAL TEST CONFERENCE -4244-423-/8/$2. c 28 IEEE

Abstract

Yield loss caused by excessive power supply noise has become a serious problem in at-speed scan testing. Although X-filling techniques are available to reduce the launch cycle switching activity, their performance may not be satisfactory in the linear-decompressor-based test compression environment. This work is the first to solve this problem by proposing a novel integrated ATPG scheme that efficiently and effectively performs compressible X-filling. Related theoretical principles are established, based on which the problem size is substantially reduced. The proposed scheme is validated on large benchmark circuits as well as an industry design in the embedded deterministic test (EDT) environment.

1. Introduction

While advanced IC fabrication technology empowers designers to realize more versatile systems on a chip, it also poses new test challenges, for example, timing-related defects and growing test data volume.

1.1. Power Supply Noise in At-Speed Scan Testing

Modern circuits are more prone to timing-related errors because of growing circuit complexity, escalating clock speed, and lowered power supply voltage. As a result, delay testing becomes a necessity to ensure high test quality [9]. Delay testing in general adopts the two-pattern test approach. The first pattern sets the circuit state and the second pattern activates the desired transition at the fault site. The fault is detected if the transition does not propagate to the target flip-flop(s) within the functional clock period.

Figure 1 depicts the timing diagram of the launch-on-capture (LOC) at-speed scan testing scheme. The interval between the rising edges of the two capture cycles, C1 and C2, corresponds to the functional clock cycle and is called the launch cycle hereafter. If the transition launched at C1 does not propagate to the target flip-flop(s) before C2, the chip under test is classified as faulty.

Figure 1. The launch-on-capture scheme. (Timing diagram of clk and scan enable: shift cycles, capture cycles C1 and C2, shift cycles; the C1-to-C2 interval is the launch cycle, i.e., the functional clock cycle.)

The LOC scheme suffers yield loss caused by power supply noise in the launch cycle. Conventional delay fault ATPGs neglect the impact of launch-induced switching activity. The generated patterns may cause excessive switching activity in the launch cycle, which leads to abnormally high power network IR-drop and results in extra gate propagation delays [25]. The extra delays may cause a timing-defect-free CUT to fail the delay fault test; this problem is referred to as power-supply-noise-induced yield loss. In [3,7], it is reported that, in a 130 nm ASIC design running at a 150 MHz clock frequency, some circuits pass the transition fault test only if the supply voltage is above 1.55 V; otherwise, they fail.

Previous works that aim at reducing the launch cycle power supply noise can be categorized into the architecture-based class and the pattern-based class. The partial capture scheme in [2] is architecture-based; the noise-aware ATPG techniques [, 3, 5, 4, 9, 22, 24] and the post-ATPG X-filling techniques [2, 5, 2, 23] are pattern-based. Pattern-based techniques are compatible with any existing design flow and need no circuit modification. Note that X-filling is very powerful whether it is used alone or integrated into ATPG. The reason is that most test patterns, even after compaction, are sparsely specified [2]; thus, one can assign the X-bits so as to reduce the incurred switching activity.

1.2. Test Pattern Compression

Test data compression has become a necessity as a result of the growing test data size in new technology generations. Figure 2 shows a compressor-decompressor architecture. The decompressor decompresses the test stimulus from the ATE and the compactor compacts the test response.

Figure 2. A test data compression architecture. (The ATE sends the compressed test stimulus over b channels, l bits in total; the on-chip decompressor, an n-bit linear finite-state machine followed by a phase shifter, drives c scan-chain channels, m bits in total, one scan slice per shift cycle; a compactor returns the compacted test response.)

Like X-filling, test pattern compression techniques assign the X-bits, in this case so that the patterns become compressible. [8] provides a thorough survey of test data compression techniques. Among the various test pattern compression techniques, we focus on the linear-decompressor-based scheme in this paper. Compared to the code-based and broadcast-scan-based schemes, it has the advantages of a high compression rate and very little hardware overhead. Furthermore, it has been widely adopted in industry designs [8].

1.3. Motivation

Since the effectiveness of X-filling-based launch noise reduction techniques depends heavily on the percentage of X-bits left for assignment, the power supply noise reduction is severely degraded if test pattern compression is performed first [3]. Similarly, applying launch noise reduction X-filling first sacrifices the data compression performance. To resolve this problem, there is a need for a launch noise reduction technique that is compatible with the utilized test pattern compression scheme.

Although several previous techniques [4, 7, 8,, 6] combine test data compression and test power reduction, they only consider the shift-in-induced switching activity and neglect the launch-induced switching activity. These techniques address the average power issue rather than the power supply noise issue.

1.4. Contribution

The major contribution of this paper is to present the first X-filling-based technique that generates test patterns with low launch cycle power supply noise in the linear-decompressor-based test pattern compression environment.
The proposed technique is based on the test pattern refinement flow in [24] and the JP-filling technique in [23]; it has the following advantages.

1. A fault list shuffling mechanism is introduced to help escape from local optima. This procedure is very effective for the large industry circuit.

2. An efficient compression-compatible JP-filling is developed to generate compressible patterns with low launch cycle supply noise. Theoretical principles are defined to speed up the assignment process.

The proposed technique has been validated with the ISCAS'89 and ITC'99 benchmark circuits and one industrial design. Without sacrificing the test set size and fault coverage, the proposed technique reduces the launch cycle WSA (weighted switching activity) by 26% for the three largest circuits and 7% for all the circuits.

1.5. Paper Organization

The paper is organized as follows. Section 2 gives the background of this work, including linear-decompressor-based test compression and pattern-based power supply noise reduction. Section 3 illustrates the proposed low-launch-cycle-noise pattern generation technique for linear-decompressor-based compression. Experimental results on benchmark and industrial circuits are shown in Section 5. Finally, we conclude this work in Section 6.

2. Background

2.1. Sequential Linear Decompression

Linear decompressors consist of only XOR gates, wires, and D flip-flops. A typical sequential linear decompressor is shown in Figure 3 [6, 2]; it consists of an n-bit linear finite-state machine, such as a linear feedback shift register (LFSR), cellular automaton, or ring generator [], and a phase shifter.

Figure 3. A typical sequential linear decompressor.

At each shift clock cycle, the linear finite-state machine receives b free variables from the ATE and outputs n signals which represent the current state. The phase shifter is combinational logic which consists of XOR gates and wires. It maps the n signals onto c signals, which are the inputs to the c scan chains.
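Because the decompressor contains only XOR gates, wires, and flip-flops, every bit it drives into a scan chain is an XOR of free variables. The following Python sketch illustrates this by simulating a generic LFSR-style decompressor symbolically: each cell holds a bitmask of the free variables it currently depends on. The tap positions, injection points, phase-shifter wiring, the order of operations within a shift cycle, and the names `decompressor_rows` and `expand` are all illustrative assumptions, not the ring generator used in the paper.

```python
def decompressor_rows(n, taps, inject_at, phase_shifter, slices):
    """Symbolically simulate a hypothetical LFSR-based linear decompressor.

    Each state cell is an int bitmask: bit k set means the cell depends on
    free variable y_(k+1). Returns (rows, number_of_free_variables), where
    rows[i] is the bitmask describing scan bit z_(i+1).
    """
    state = [1 << i for i in range(n)]      # initial state: y_1 .. y_n
    next_var = n
    rows = []
    for _ in range(slices):
        # LFSR step (assumed structure): feedback = XOR of tap cells, then shift.
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]
        # Inject one fresh free variable per ATE channel.
        for cell in inject_at:
            state[cell] ^= 1 << next_var
            next_var += 1
        # Phase shifter: each scan-chain input is an XOR of selected cells.
        for cells in phase_shifter:
            out = 0
            for c in cells:
                out ^= state[c]
            rows.append(out)
    return rows, next_var


def expand(rows, y):
    """Expand free-variable values (bitmask y) into scan bits: z_i = parity(row_i AND y)."""
    return [bin(r & y).count("1") & 1 for r in rows]
```

For example, `decompressor_rows(2, [1], [0], [[0], [0, 1]], 1)` describes a 2-cell register with one injection point and two scan chains; calling `expand` on the returned rows with a chosen assignment of the free variables yields the bits loaded into the chains.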
Assuming that the test pattern is m bits wide and the compressed test stimulus is l bits wide, we have

$$\frac{l}{b} = \frac{m}{c} = \text{number of scan slices}.$$

For the linear decompressor in Figure 3, there is a Boolean matrix $M_{m \times (l+n)}$ from which compressible test patterns are spanned. Let $Y$ be the vector of free variables from the ATE. If a test vector $V$ is compressible, the following linear system has at least one solution:

$$MY = V \tag{1}$$

Figure 4. An example sequential linear decompressor.

Figure 4 uses an 8-bit ring generator with 2 external input channels as an example. The ring generator implements the primitive polynomial $x^8 + x^6 + x^5 + x + 1$; its initial state is represented by the free variables $y_1 y_2 \ldots y_8$. Assuming that the test pattern is 16 bits wide, denoted by $V = z_1 z_2 \ldots z_{16}$, there are two scan slices: $(z_1 z_2 \ldots z_8)$ and $(z_9 z_{10} \ldots z_{16})$. The free variables injected into the ring generator are $(y_9, y_{10})$ in the first scan slice and $(y_{11}, y_{12})$ in the second scan slice. The linear system that describes this configuration is

$$M\,(y_1, y_2, \ldots, y_{12})^T = (z_1, z_2, \ldots, z_{16})^T,$$

where $M$ is the $16 \times 12$ Boolean matrix determined by the ring generator and the phase shifter.

Consider a partially specified test pattern $V$. Let $V_s$ denote the sub-vector of $V$ that consists of the specified bits, and $M_s$ the sub-matrix of $M$ that consists of the rows corresponding to the specified bits. $V$ is compressible if there exists a solution to the following linear system.

$$M_s Y = V_s \tag{2}$$

Another way to determine whether $V$ is compressible is to compute the rank of the corresponding augmented matrix $[M_s \mid V_s]$: $V$ is compressible if the rank of the augmented matrix is the same as that of the coefficient matrix $M_s$.

Continuing the above example, assume that ATPG specifies $z_6$, $z_9$, $z_{15}$, and $z_{16}$ and leaves the other bits unspecified. $V_s$ is then the vector of these four specified values, and the augmented matrix $[M_s \mid V_s]$ consists of the four corresponding rows of $M$ extended by $V_s$.

Figure 5. The iterative test refinement flow in [24].
Since the ranks of the augmented and coefficient matrices are the same, the test pattern is compressible. In general, multiple solutions exist for a sparsely specified test pattern; however, depending on which solution is used, the final fully specified pattern varies. For example, two different solutions $Y$ of (2) expand to two different fully specified test patterns.

2.2. Iterative Test Set Refinement

The work in [24] presents an iterative test pattern refinement flow (Figure 5) to reduce the WSA induced by test application. It uses a standard commercial ATPG to generate the initial test set. In each refinement iteration, patterns that are prone to excessive WSA are identified, and faults that are exclusively detected by these patterns are re-targeted by a power-supply-noise-aware ATPG. The refinement process continues until no further improvement is made. The noise-aware ATPG is slower than the standard one; therefore, using the latter to generate the initial test set substantially reduces the CPU time. Experimental results also show that most patterns in the initial test set remain throughout the refinement process.

2.3. JP-Filling

The work in [23] presents a post-ATPG X-filling technique that improves on [2] and [5]. Called JP-filling, this X-filling technique is both effective and scalable in minimizing launch cycle power supply noise. Given a partially specified test pattern, JP-filling aims at reducing the Hamming distance between the pattern itself and its output response. The result is reduced flip-flop switching activity in the launch cycle, which indirectly brings down the launch cycle WSA.

Figure 6. The JP-filling flow in [23].

Table 1. PPI-PPO classification in JP-filling

Type    PPI    PPO
A       X      0/1
B       0/1    X
C       X      X
D       0/1    0/1

Figure 7. A JP-filling example.

The JP-filling flow is shown in Figure 6. First, it performs 3-valued (0/1/X) logic simulation to derive the output response of the given partially specified pattern. Then, each PPI-PPO pair is classified as type A, B, C, or D according to Table 1. These pairs are processed in the order A, B, C. (Type D needs no further processing.) For each type A pair, JP-filling assigns the PPO value to the PPI. For each type B pair, it justifies the PPO with the specified PPI value. For each type C pair, the PPI and PPO are assigned either 0 or 1 according to probability. Note that all type A pairs are processed at the same time, as are all type B pairs. Type C pairs for which the difference between the 0-probability and the 1-probability is greater than a pre-defined threshold are also processed at the same time. Figure 7 gives a JP-filling example. The circled PPOs are the ones that become specified after event-driven simulation.

2.4. Problem Definition

Assume that the LOC at-speed testing scheme and the linear-decompressor-based test pattern compression technique are utilized. The goal is to generate test patterns that induce low WSA in the launch cycle and are compressible by the linear decompressor.
Meanwhile, compared to the standard compressible test pattern generation flow, the increase in test set size should be minimal. The proposed technique is called Compressible Supply Noise Reduced Test, abbreviated as CSNR-Test hereafter.

3. Compressible Supply Noise Reduced Test (CSNR-Test)

Before describing further details of CSNR-Test, we introduce the concept of implied and free X-bits and show how to determine whether an X-bit is free or implied.

3.1. Implied and Free X-Bits

Definition 1. An X-bit in a compressible pattern is 0 (1)-compressible if the resulting pattern is compressible after the X-bit is assigned 0 (1).

Definition 2. An X-bit in a compressible pattern is a free bit if it is both 0-compressible and 1-compressible.

Definition 3. An X-bit in a compressible pattern is an implied bit if it is either 0-compressible or 1-compressible, but not both.

Definition 4. The implied value of a 0-compressible implied bit is 0; the implied value of a 1-compressible implied bit is 1.

Definition 5. A partially specified pattern is a free pattern if all its X-bits are free bits.

The following example depicts the difference between free and implied bits. Consider the following linear system.

$$\begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix} \tag{3}$$

In this example, $z_3$ and $z_4$ have been specified, while $z_1$ and $z_2$ are X-bits. The corresponding linear equations are as follows, where $z_3$ and $z_4$ on the right-hand sides stand for their specified values.

$$y_1 \oplus y_2 \oplus y_3 = z_1 \tag{4}$$
$$y_1 \oplus y_3 = z_2 \tag{5}$$
$$y_1 \oplus y_4 = z_3 \tag{6}$$
$$y_2 \oplus y_3 \oplus y_4 = z_4 \tag{7}$$
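The four-variable system above, in which $z_1 = y_1 \oplus y_2 \oplus y_3$, $z_2 = y_1 \oplus y_3$, $z_3 = y_1 \oplus y_4$, and $z_4 = y_2 \oplus y_3 \oplus y_4$, is small enough to check mechanically against Definitions 1 to 3. The sketch below does so with a rank test over GF(2), encoding each row as an integer bitmask. The function names and the concrete values chosen for $z_3$ and $z_4$ are illustrative assumptions; only the row structure comes from the example.

```python
def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    basis = {}                          # leading-bit position -> reduced row
    for vec in rows:
        while vec:
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = vec
                break
            vec ^= basis[pivot]
    return len(basis)


def is_compressible(m_rows, specified):
    """specified maps bit index -> value; compare coefficient and augmented ranks."""
    coeff = [m_rows[i] for i in specified]
    augmented = [(m_rows[i] << 1) | v for i, v in specified.items()]
    return gf2_rank(coeff) == gf2_rank(augmented)


def classify(m_rows, specified, bit):
    """Return ("free", None) or ("implied", value) for an unspecified bit.

    Assumes the pattern described by `specified` is itself compressible.
    """
    ok = [is_compressible(m_rows, {**specified, bit: v}) for v in (0, 1)]
    if all(ok):
        return ("free", None)
    return ("implied", ok.index(True))


# Rows for z_1..z_4; bit k of each mask is the coefficient of y_(k+1).
EXAMPLE_M = [0b0111, 0b0101, 0b1001, 0b1110]
# Hypothetical specified values for z_3 and z_4 (indices 2 and 3).
EXAMPLE_SPEC = {2: 1, 3: 0}
```

With these inputs, `classify(EXAMPLE_M, EXAMPLE_SPEC, 0)` reports that $z_1$ is implied and `classify(EXAMPLE_M, EXAMPLE_SPEC, 1)` reports that $z_2$ is free. Changing the assumed values of $z_3$ and $z_4$ changes the implied value of $z_1$ but not which bits are free or implied, since that depends only on the rows.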

From (4), (6), and (7), one arrives at

$$z_1 = y_1 \oplus y_2 \oplus y_3 \tag{8}$$
$$\phantom{z_1} = (y_1 \oplus y_4) \oplus (y_2 \oplus y_3 \oplus y_4) \tag{9}$$
$$\phantom{z_1} = z_3 \oplus z_4 \tag{10}$$
$$\phantom{z_1} = \text{a constant, since } z_3 \text{ and } z_4 \text{ are specified} \tag{11}$$

which shows that the value of $z_1$ has been implied by the specified $z_3$ and $z_4$ even though we have not explicitly specified it. From the point of view of the matrix $M$, one can arrive at (11) because the row vector of $z_1$, the first row vector in $M$, can be spanned by those of $z_3$ and $z_4$. According to the definition, $z_1$ is an implied bit whose implied value is the constant in (11). On the contrary, we cannot find any linear combination of (6) and (7) to arrive at (5), i.e., the row vector of $z_2$ cannot be spanned by those of $z_3$ and $z_4$. $z_2$ can be assigned either 0 or 1. Thus, it is both 0-compressible and 1-compressible and, by definition, a free bit.

Theorem 1. Given a compressible test pattern $V$, if the row vector of an X-bit can be spanned by the row vectors of the specified bits, the X-bit is an implied bit; otherwise, it is a free bit.

Proof. If the row vector of an X-bit can be spanned by some of the row vectors of the specified bits, the value of this X-bit can be computed in the same way as we compute $z_1$ in the above example. Thus, this X-bit is an implied one. Now, let us prove the other half of the theorem. Since the pattern is compressible, we have

$$\mathrm{Rank}([M_s \mid V_s]) = \mathrm{Rank}(M_s) = r \tag{12}$$

Consider an X-bit whose row vector cannot be spanned by $M_s$. Add its row vector to $M_s$ and denote the resulting matrix by $M_s'$. The rank of $M_s'$ is $r + 1$. As a result, the corresponding augmented matrix also has a rank of $r + 1$ no matter whether this bit is assigned 0 or 1. Thus, by definition, this bit is a free bit.

Lemma 1. Given a compressible test pattern $V$, assigning an implied bit its implied value does not affect the compressibility of the test pattern.

Proof. Assume that $\mathrm{Rank}([M_s \mid V_s]) = r$ before the assignment. Since the assigned X-bit is an implied one, we have $\mathrm{Rank}(M_s') = r$, where $M_s'$ is as defined in Theorem 1. The way the implied value is computed ensures that $\mathrm{Rank}([M_s' \mid V_s']) = r$, where $V_s'$ is the resulting pattern after the assignment.
This proves the lemma.

One way to determine whether an X-bit is free or implied is as follows.

1. Compute the orthogonal basis of $M_s$ and denote by $B$ the set of row vectors in the orthogonal basis.

2. Check whether the row vector of the X-bit can be spanned by $B$. If so, the X-bit is implied; otherwise, it is free.

Note that a free bit may become an implied one after some other free bits are specified. Thus, the X-bit classification process should be executed after each assignment.

3.2. CSNR-Test Overview

Figure 8. The proposed CSNR-Test flow.

The CSNR-Test flow is depicted in Figure 8, which is a modified version of that in [24]. First, an ATPG that generates compressible at-speed test patterns, e.g., the EDT-Standard [2], is utilized to obtain the initial compressible test set. CSNR-Test then enters the test set refinement process to lower the launch cycle supply noise. In each refinement iteration, the set of patterns whose launch cycle WSA is greater than 99% of the maximum launch cycle WSA in the current test set is identified. These patterns form the set of high supply noise patterns, denoted by $P$, that is to be refined. The threshold is set to 99% so that the maximum launch cycle WSA is reduced by at least 1% in each iteration. Once identified, $P$ is removed from the test set and fault simulation is performed to identify the set of faults $F$ that are exclusively detected by $P$. A launch cycle noise-aware ATPG targets the faults in $F$. If the newly generated patterns improve the maximum launch cycle WSA, they are accepted; otherwise, they are rejected. In the latter case, CSNR-Test randomly shuffles the order of the faults in $F$ and re-enters the refinement process. These shuffles prevent CSNR-Test from being trapped in local optima.
If CSNR-Test fails to improve the maximum launch cycle WSA for 5 consecutive iterations, i.e., 5 continuous shuffles, it terminates the refinement process. In our experiments, if more continuous shuffles are allowed, substantial improvement is observed for the 3 largest circuits.
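The refinement loop described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `wsa`, `exclusive_faults`, and `atpg` are hypothetical callables standing in for the WSA estimator, fault simulation, and the noise-aware ATPG, and `max_shuffles` is a parameter for the allowed number of consecutive failed attempts.

```python
import random


def select_high_noise(patterns, wsa, ratio=0.99):
    """Patterns whose launch cycle WSA exceeds `ratio` times the current maximum."""
    peak = max(wsa(p) for p in patterns)
    return [p for p in patterns if wsa(p) > ratio * peak]


def refine(patterns, wsa, exclusive_faults, atpg, max_shuffles, rng):
    """Iteratively replace high-noise patterns; shuffle the fault order on failure."""
    patterns = list(patterns)
    failures = 0
    faults = None
    while failures < max_shuffles:
        peak = max(wsa(p) for p in patterns)
        high = select_high_noise(patterns, wsa)
        rest = [p for p in patterns if p not in high]
        if faults is None:
            # Faults detected only by the removed high-noise patterns.
            faults = exclusive_faults(high, rest)
        new_patterns = atpg(faults)
        candidate = rest + new_patterns
        if candidate and max(wsa(p) for p in candidate) < peak:
            # Accepted: the maximum WSA improved, start a fresh iteration.
            patterns, failures, faults = candidate, 0, None
        else:
            # Rejected: retry the same faults in a different order.
            rng.shuffle(faults)
            failures += 1
    return patterns
```

In a toy run where each pattern is represented by its WSA value, `refine([100, 99.5, 50], lambda p: p, lambda high, rest: ["f1", "f2"], lambda faults: [60], 3, random.Random(0))` replaces the two high-noise patterns and then stops once three consecutive attempts fail to lower the maximum further.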

Figure 9. The proposed CSNR ATPG flow.

3.3. Compressible Supply Noise Reduced (CSNR) ATPG

Figure 9 depicts the flow of the compressible supply noise reduced ATPG (CSNR ATPG) used in Figure 8. The flow is a modified version of the EDT-Standard. The grey blocks indicate the augmented steps, namely the new dynamic compaction constraint and the compressible JP-filling: the former ensures that the generated patterns still have enough X-bits left; the latter performs the compressible low-launch-noise assignments.

Let $s$ be the number of specified bits in test pattern $V$. CSNR ATPG stops incremental pattern generation for $V$ and starts generating the next pattern if

$$s \ge 0.5 \cdot \frac{b \cdot m}{c} \tag{13}$$

The following experiment explains why we choose this threshold value. Assume that the linear decompressor provides 10x compression. The demonstration circuit is the ISCAS'89 benchmark circuit s38417, whose test pattern width is 1,664. The experiment starts from a totally unspecified test pattern. Each time, we randomly select an unspecified bit and randomly assign 0 or 1 to it. If the resulting test pattern is still compressible, the assignment is accepted; otherwise, it is rejected and we invert the assignment so that the pattern remains compressible. The reason for inverting a rejected assignment is as follows. As the pattern is compressible before the rejected assignment, there exists a solution to (1) that sets the rejected bit to either 0 or 1. Thus, if we invert a rejected assignment, the resulting pattern must be compressible. At the end, we record which of the 1,664 assignments are accepted and which are rejected. This experiment is repeated 5 times; Figure 10 is the average acceptance rate vs. assignment index plot.
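The acceptance experiment can be reproduced in spirit with a short simulation. The sketch below is an illustration under assumptions: it uses a random GF(2) matrix (one nonzero bitmask row per scan bit) as a stand-in for the decompressor matrix of the benchmark circuit, and the names `reduce_row`, `acceptance_run`, and `average_acceptance` are hypothetical. An incremental XOR basis replaces a full rank computation after every assignment.

```python
import random


def reduce_row(vec, rhs, basis):
    """Reduce (row, value) against an echelon basis {pivot: (row, value)} over GF(2)."""
    while vec:
        pivot = vec.bit_length() - 1
        if pivot not in basis:
            break
        bvec, brhs = basis[pivot]
        vec ^= bvec
        rhs ^= brhs
    return vec, rhs


def acceptance_run(rows, rng):
    """Assign random values to bits in random order; return per-assignment accept flags."""
    order = list(range(len(rows)))
    rng.shuffle(order)
    basis, accepted = {}, []
    for i in order:
        value = rng.randint(0, 1)
        vec, rhs = reduce_row(rows[i], value, basis)
        if vec:
            # Free bit: either value keeps the pattern compressible.
            basis[vec.bit_length() - 1] = (vec, rhs)
            accepted.append(True)
        else:
            # Implied bit: accepted only if the value equals the implied one;
            # a rejected assignment is inverted, which adds nothing to the basis.
            accepted.append(rhs == 0)
    return accepted


def average_acceptance(m, num_vars, runs, seed=0):
    """Average acceptance rate per assignment index over several random runs."""
    rng = random.Random(seed)
    totals = [0] * m
    for _ in range(runs):
        rows = [rng.randrange(1, 1 << num_vars) for _ in range(m)]
        for k, ok in enumerate(acceptance_run(rows, rng)):
            totals[k] += ok
    return [t / runs for t in totals]
```

Calling `average_acceptance(1664, 166, runs)` mimics the dimensions quoted for the demonstration circuit; with a random matrix the exact curve will differ from the one measured on the real decompressor, so the output should be read only as a qualitative illustration.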
Figure 10. The acceptance ratio vs. assignment index plot. (Vertical axis: accept ratio in percent; horizontal axis: index of per-bit assignment.)

The curve is close to 100% at the beginning, when the pattern is so sparsely specified that most available X-bits are free ones. On the contrary, the curve eventually converges to 50% at the end, when almost all the X-bits left are implied ones. We are more interested in the inflection point that occurs at about assignment 164, around which the acceptance rate declines rapidly. Note that the inflection point index is close to the number of free variables from the ATE, i.e., $l = b \cdot m / c = 166$. If the compaction termination threshold is greater than 164, the subsequent JP-filling will not perform well because the number of free bits available for low-launch-noise assignments decreases quickly or is too low. The chosen threshold is half the inflection point index, 82 in this example, to avoid the fast declining part before the inflection point and leave more room for launch noise reduction. Therefore, in CSNR-ATPG, the threshold for general circuits is set to $0.5 \cdot b \cdot m / c$.

One side effect of the added dynamic compaction constraint is an increased test set size. However, because CSNR-ATPG only targets the faults that are exclusively detected by high-noise patterns, there is no significant test set size increase in the experimental results.

4. Compressible JP-Filling (CJP-Filling)

The compressible JP-filling (CJP-filling) is the core technology of CSNR-ATPG. It tightly integrates launch cycle noise reduction and test pattern compression.

4.1. A Naive Compressible JP-Filling Flow

A naive flow for compressible JP-filling is shown in Figure 11. The modifications made to the original flow are as follows.

The multi-bit assignments in JP-filling (for type A, B, and C PPI-PPO pairs) are replaced by single-bit assignments. The reason is to improve the acceptance rate and to facilitate the subsequent compressibility check procedure.
For type B pairs, we choose the one whose flip-flop has the largest weight and apply backtrace() to find the unspecified PPI that is most likely to justify the chosen PPO to its corresponding PPI value.

Figure 11. The naive CJP-filling flow.

A compressibility checker examines whether the test pattern is compressible after the single-bit assignment. If so, the assignment is accepted; otherwise, the assignment is rejected and the assigned bit is inverted. Note that the initial test pattern is compressible because it is generated by a compression-aware ATPG. Also, the process of inverting rejected assignments ensures that the test pattern is always compressible.

The above flow is inefficient because it assigns one X-bit at a time. According to the dynamic compaction constraint in (13), CJP-filling will call the time-consuming compressibility checker about $m - l/2$ times, which substantially slows it down.

4.2. The Proposed CJP-Filling Flow

Compared to the naive approach, the proposed CJP-filling substantially improves the CPU time by enabling multi-bit assignments (for type A and type C pairs) and by avoiding unnecessary assignments to the implied bits. Figure 12 depicts the proposed CJP-filling flow. The flow can be divided into two phases. The upper part constitutes Phase I, which keeps the pattern free. The lower part constitutes Phase II, which makes compressible launch noise reduction assignments. The loop is repeated until the pattern is fully specified.

4.3. Phase I

In Phase I, we first derive or update the orthogonal basis associated with the current pattern ("OB Update"), which comes from ATPG or Phase II.
Based on the updated orthogonal basis, the X-bits are classified as implied or free in X-Classification. All implied bits are assigned their respective implied values; these assignments must be compressible according to Lemma 1. Identifying and assigning the implied bits in Phase I prevents CJP-filling from making unnecessary or unacceptable assignments in Phase II. This substantially reduces the number of times the loop is executed and thus improves the CJP-filling efficiency.

4.4. Phase II

In Phase II, event-driven simulation is first performed to derive the output response of the current pattern. Then, the PPI-PPO pairs are classified and processed as follows.

Type A (X, 0/1). We first perform compatible-free-bit-set identification (CFBS Identification), which identifies a set of X-bits that can be arbitrarily assigned at the same time without affecting compressibility. Then, the original JP-filling method is used to assign these X-bits; the resulting pattern is guaranteed to be compressible.

Type B (0/1, X). The process is the same as in the naive flow. Because a single assignment is made and the test pattern is free, this assignment is compressible.

Type C (X, X). As for type A, a set of X-bits that can be concurrently assigned is first identified (CFBS Identification). Then, the original JP-filling method is used to assign these X-bits.

Note that Phase I and CFBS Identification guarantee that the row vector of each newly assigned X-bit in Phase II cannot be spanned by those of the previously and newly assigned bits (with itself excluded). Thus, the OB Update process in Phase I simply adds the row vectors of the bits newly assigned in Phase II to the orthogonal basis.

The following theorem establishes the foundation of the CFBS Identification algorithm.

Theorem 2.
A set of free bits can be randomly assigned at the same time and the resulting pattern is still compressible if, for any of the free bits in this set, its row vector cannot be spanned by the union of the orthogonal basis before assignment and the row vectors of the other free bits in this set.

Proof. Consider the set of free bits, and assume that its size is q and that $\mathrm{Rank}(M_s) = r$ before the assignment. The property of this set ensures that, after the concurrent random assignments, we have $\mathrm{Rank}(M_s) = r + q$ and, as a result, $\mathrm{Rank}([M_s \; V_s]) = r + q$. The ranks of the coefficient matrix and the augmented matrix therefore agree, which proves the theorem.

The CFBS Identification heuristic for type A and C pairs is as follows. These pairs are sorted in descending order according to the flip-flop weight, i.e., the fanout size. This way, flip-flops with larger weights are considered first, as they have more impact on launch noise reduction. The selection process is as follows.

1. Pick the first unprocessed pair in the list as the target pair.
2. If the row vector of the target pair's X-bit cannot be spanned by the current orthogonal basis, the target pair's X-bit is selected and its row vector is added to the orthogonal basis.
3. If not all pairs are processed, go to 1.

Figure 2. The proposed CJP-filling flow. (Flowchart: Phase I with OB Update, X-Classification, and assignment of implied values; Phase II with event-driven simulation, PPI-PPO classification, CFBS Identification, and assignments for type A, B, and C pairs.)

According to Theorem 2, the X-bits selected this way can be randomly assigned at the same time. Therefore, the JP-filling methods for type A and C pairs can be employed to concurrently target these selected X-bits.

CFBS Identification further improves the CJP-filling efficiency, for two reasons. First, it reduces the number of times the loop is executed because it enables multi-bit assignments. Second, the incurred CPU overhead is small because CFBS Identification implicitly classifies the bits of the processed pairs and performs OB Update for the selected bits: the bits that are not selected are implied bits, while the row vectors of the selected bits have already been added to the orthogonal basis. In other words, some of the operations in OB Update and X-Classification are performed in CFBS Identification with a better use model.

4.5. CJP-Filling Performance Analysis

The following theorem provides the foundation of the performance analysis.

Theorem 3.
Given a compressible test pattern V, the maximum number of times the CJP-filling loop is performed is the number of free variables from the ATE minus the size of the orthogonal basis before CJP-filling.

Proof. At the beginning, the rank of $M_s$ equals the size of the orthogonal basis before CJP-filling. At the end, the rank of $M_s$, i.e., $M$, will be smaller than or equal to the number of free variables. Thus, the theorem is proved if we show that the rank of $M_s$ increases in each loop iteration. This is true if type B pairs are processed because a single free-bit assignment is made. This is also true for type A and C pairs because, if q bits are concurrently assigned, the CFBS Identification process ensures that the rank of $M_s$ increases by q. This proves the theorem.

In the following, we compare the performance of the naive and proposed CJP-filling flows. First, the run time of one loop iteration is about the same for both flows. The reason is that the X-bit classification, the CFBS Identification, and the compressibility check are all based on Gaussian elimination. Second, the number of times the proposed CJP-filling loop is executed is less than the number of free variables (Theorem 3), while the naive CJP-filling executes its loop about m l/2 times. Finally, the speed-up factor is about m l/2 l = m l.5. Note that the first term is the decompressor's compression rate.

Following the example in Section 3.3, the number of X-bits left for CJP-filling is at least 664 ( (/2)(/)) =, 58 and the number of free variables is 664 (/2) 2 + 8 = 76. Thus, the speed-up is about, 58/76 9. Note that the analysis is for the CJP-filling process only.

4.6. Discussions

The CFBS Identification process still has much room for improvement. Currently, it focuses more on multiple assignments and less on launch noise reduction. As a result, the following problems arise and require further investigation. CFBS Identification favors the X-bits whose corresponding flip-flops have larger weights.
However, this does not guarantee that the sum of the selected bits' weights is the maximum. In addition, CFBS Identification does not consider the impact of the implied bits on launch noise.

Note that minimizing the launch cycle WSA could increase the defect level if the resulting WSA becomes less than that in the functional mode. This can be solved by modifying the refinement algorithm so that only the patterns that exceed the given launch cycle WSA upper bound are refined.
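The CFBS Identification selection steps of Section 4.4 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each X-bit's row vector over GF(2) is stored as an integer bitmask, that the basis is kept as a dictionary from pivot position to vector, and that a pair is a tuple of (name, weight, row vector). The names `reduce_vector`, `try_add`, and `cfbs_identification` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Assumed representation: pivot bit position -> basis vector (int bitmask over GF(2)).
Basis = Dict[int, int]


def reduce_vector(vec: int, basis: Basis) -> int:
    """Eliminate vec against the basis over GF(2); a result of 0 means vec is spanned."""
    while vec:
        pivot = vec.bit_length() - 1  # position of the highest set bit
        if pivot not in basis:
            return vec  # no basis vector can clear this bit: vec is independent
        vec ^= basis[pivot]  # XOR is addition over GF(2)
    return 0


def try_add(vec: int, basis: Basis) -> bool:
    """Add vec to the basis if it is not spanned; report whether it was added."""
    residual = reduce_vector(vec, basis)
    if residual == 0:
        return False
    basis[residual.bit_length() - 1] = residual
    return True


def cfbs_identification(pairs: List[Tuple[str, int, int]], basis: Basis) -> List[str]:
    """Greedy selection: visit pairs by decreasing weight, keep independent X-bits.

    The basis is updated in place, so rejected bits are exactly those whose
    row vectors are spanned by the basis plus the already selected bits.
    """
    selected = []
    for name, _weight, row in sorted(pairs, key=lambda p: p[1], reverse=True):
        if try_add(row, basis):
            selected.append(name)
    return selected


# Hypothetical example: the basis before selection holds one vector.
example_basis: Basis = {1: 0b0011}
example_pairs = [
    ("x_b", 7, 0b0100),
    ("x_a", 9, 0b0111),
    ("x_c", 5, 0b0011),
    ("x_d", 2, 0b1000),
]
print(cfbs_identification(example_pairs, example_basis))  # ['x_a', 'x_d']
```

In this example, `x_b` is rejected because its row vector equals the sum of the existing basis vector and the row vector of `x_a`, which has a larger weight and is visited first. This also shows the limitation noted above: the greedy order does not by itself guarantee that the total weight of the selected bits is the maximum.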

5. Experimental Results

To facilitate the experiments, we implemented an EDT-Standard test pattern generator. The experimental setup is as follows. The decompressor consists of the 2-to-8 ring generator in Figure 4 and a phase shifter that splits 8 signals into 2 signals. The total compression rate is x. The top 5 benchmark circuits from ISCAS'89, the top 5 benchmark circuits from ITC'99, and an industry circuit with 5k gates and,77 scan cells are used. The transition fault model is targeted.

5.1. EDT-Standard versus CSNR-Test

The experimental results are shown in Table 2. In the table, the second column is the scan architecture in the form of (# scan chains) × (# scan slices). For EDT-Standard and the proposed CSNR-Test, the number of test patterns, the achieved fault coverage, the maximum launch cycle WSA, and the CPU time (in seconds) are shown. (Note that the run time of EDT-Standard is excluded in column 11.) For CSNR-Test, the amount of WSA reduction compared to EDT-Standard is also shown. The results are summarized as follows.

The 3rd and 7th columns show that the test set size increases in some cases; the increase is within % of the initial test set, except for s38417. The reason is that CSNR ATPG must ensure that there are enough X-bits left for CJP-filling.

The 4th and 8th columns show that the fault coverage is preserved.

The 5th and 9th columns list the maximum launch cycle WSA, and the 10th column gives the reduction ratio achieved by CSNR-Test. For the 3 largest circuits, i.e., s38417, b17s, and the industry design, the reduction ratios are 27%, 24%, and 28%, respectively. The overall reduction ratio is 7% on average. Better reduction is achieved for larger circuits because there are more X-bits left for CJP-filling to reduce the launch WSA.

The 6th column lists the EDT-Standard run time; the 11th column lists the CPU time of the test pattern refinement process. It is interesting that the three biggest circuits require much more CPU time than the others.
(Recall that these circuits also have the most significant WSA reduction.) The explanation is that the initial test sets of these three circuits have much room for improvement, which requires more refinement iterations to arrive at the final solution.

Figure 3. Comparison of reduction ratio. (Bar chart: reduction ratio per circuit for CSNR-Test, CSNR-Test - CI, and CSNR-Test (Naive).)

Figure 4. Comparison of pattern size. (Bar chart: number of test patterns per circuit for CSNR-Test, CSNR-Test - CI, and CSNR-Test (Naive).)

5.2. Further Analysis

To validate the effectiveness of the key techniques, we compare the performance of the following three approaches.

CSNR-Test. The proposed technique.
CSNR-Test - CI. CSNR-Test with CFBS Identification removed; this disables multi-bit assignments.
CSNR-Test (Naive). CSNR-Test with the proposed CJP-filling replaced by the naive CJP-filling.

Figure 3 shows the reductions of the maximum launch WSA. It is interesting that CFBS Identification improves not only the CPU time (as mentioned above) but also the maximum WSA reduction. The likely reason is that CFBS Identification favors flip-flops with larger weights. Figure 4 compares the test set size, which is almost identical for the three approaches.

6. Conclusion

This paper presents the first work to overcome the problem of launch-induced yield loss in the linear-decompressor-based test data compression environment. To speed up the X-filling process, X-classification and CFBS Identification techniques are developed and integrated into the proposed CSNR-Test flow. The proposed ATPG flow has been validated on benchmark circuits and an industry circuit; the results show that CSNR-Test is a promising flow. In the future, we will (1) investigate techniques to improve CFBS Identification by making it noise-reduction-aware, and (2) include circuit-layout-related information in the cost function.

Table 2. Experimental Results for Transition Faults.

                       |            EDT-Standard             |                    CSNR-Test
Circuit   Scan arch.   | #Pat.  FC (%)  Max WSA  CPU (s)     | #Pat.  FC (%)  Max WSA  WSA red. (%)  CPU (s)*
s13207    2 × 35       | 55     88.6    684      57          | 539    88.6    5334     22%           96
s15850    2 × 3        | 34     82.7    6637     34          | 365    82.7    573      4%            8
s35932    2 × 89       | 92     9.4     969      2           | 74     9.4     7277     9%            44
s38417    2 × 84       | 76     97.7    976      926         | 899    97.7    4478     27%           38
s38584    2 × 74       | 54     8.      8497     3778        | 32     8.      676      3%            63
b15s      2 × 25       | 722    68.     6542     826         | 739    68.     5656     4%            53
b17s      2 × 73       | 225    76.3    446      68          | 288    76.3    954      24%           3223
b20s      2 × 27       | 669    74.8    55       5           | 67     74.8    9346     %             97
b21s      2 × 27       | 669    76.7    74       5           | 677    76.7    933      3%            46
b22s      2 × 39       | 9      79.5    53       5           | 5      79.5    3888     9%            363
industry  2 ×          | 2226   96.     3783     86          | 2264   96.     9939     28%           544

#Pat.: number of test patterns. FC: fault coverage. Max WSA: maximum launch cycle WSA.
*: The EDT-Standard run time is excluded.

Acknowledgment

The authors would like to acknowledge the Semiconductor Technology Academic Research Center (STARC), Japan, for providing the industry circuit.