Test Data Compression for System-on-a-Chip Using Golomb Codes 1


Anshuman Chandra and Krishnendu Chakrabarty
Department of Electrical and Computer Engineering
Duke University, Durham, NC 27708
{achandra, krish}@ee.duke.edu

1 This research was supported in part by the National Science Foundation under grant no. CCR-9875324, by a contract from Delphi Delco Electronics Systems, and by an equipment grant from Sun Microsystems.

ABSTRACT

We present a new test data compression method and decompression architecture based on Golomb codes. The proposed method is especially suitable for encoding precomputed test sets for embedded cores in a system-on-a-chip (SOC). The major advantages of Golomb coding include very high compression, analytically predictable compression results, and a low-cost and scalable on-chip decoder. In addition, the novel interleaving decompression architecture allows multiple cores in an SOC to be tested concurrently using a single ATE I/O channel. We demonstrate the effectiveness of the proposed approach by applying it to the ISCAS benchmark circuits and to two industrial production circuits. We also use analytical and experimental means to highlight the superiority of Golomb codes over run-length codes.

1 Introduction

Core-based system-on-a-chip (SOC) designs present a number of test challenges [1]. In order to effectively test these systems, each intellectual property (IP) core must be adequately exercised with a set of precomputed test patterns provided by the core vendor (Figure 1). However, the I/O channel capacity, speed and accuracy, and data memory of automatic test equipment (ATE) are limited. Thus, it is becoming increasingly difficult to apply the enormous volume of test data to the SOC (which can be as high as 25 Gbits for an industrial ASIC [2]) without increasing testing time and test cost substantially.

Figure 1: A conceptual architecture for SOC testing.

A reduction in test data volume not only reduces ATE memory requirements but also lowers testing time. The testing time of an SOC depends on the test data volume, the time required to transfer the data to the cores, and the rate at which the test data is transferred (measured by the cores' test data bandwidth and ATE channel capacity). Lower testing time increases production capacity and reduces test cost and time-to-market for the SOC. New techniques are therefore needed for decreasing test data volume in order to overcome memory bottlenecks and to reduce testing time.

Built-in self-test (BIST) has emerged as a useful approach for alleviating the above problems [3]. BIST reduces dependencies on expensive ATEs and it allows precomputed test sets to be embedded in test sequences generated by on-chip hardware [4, 5, 6]. However, BIST can be applied directly to SOC designs only if the embedded cores are BIST-ready. Since most IP cores that are currently available from core vendors are not BIST-ready, considerable redesign is necessary for incorporating BIST.

Test data compression offers a promising solution to the problem of reducing the test data volume for SOCs, especially if the IP cores in the system are not BIST-ready [7-10]. In this approach, a precomputed test set $T_D$ for an IP core is compressed (encoded) to a much smaller test set $T_E$, which is stored in ATE memory. An on-chip decoder is used for pattern decompression to obtain $T_D$ from $T_E$ during test application (Figure 2).

Figure 2: A conceptual architecture for testing a system-on-a-chip by storing the encoded test data $T_E$ in the ATE memory and decoding it using an on-chip decoder.

Test data compression using statistical coding of test sequences for synchronous sequential (non-scan) circuits was presented in [7, 8]. Statistical coding was successfully applied to test sets for full-scan circuits in [9]. While the method in [7, 8] is restricted to sequential circuits with a large number of flip-flops and relatively few primary inputs, the work presented in [9] does not conclusively demonstrate that statistical coding provides greater compression than standard ATPG methods for full-scan circuits [11, 12].

An alternative approach to test data compression is motivated by the fact that successive test patterns in a test sequence often differ in only a small number of bits. This was exploited in [10], where instead of compressing the test sequence $T_D$, a difference vector sequence $T_{diff}$ determined from $T_D$ was compressed using run-length coding. A test architecture employing difference vectors and based on cyclical scan registers (CSRs) is sketched in Figure 3. Note that existing registers on the SOC may be used as CSRs in order to reduce overhead [10].

Figure 3: Decompression architecture based on a cyclical scan register (CSR).

A drawback of the method described in [10] is that it relies on variable-to-fixed-length codes, which are less efficient than more general variable-to-variable-length codes [13, 14]. Instead of using a run-length code with a fixed block size $b$, we can achieve greater compression by using Golomb codes that map variable-length runs of 0s in a difference vector to variable-length codewords [13].

In this paper, we present a new test data compression and decompression method based on Golomb codes for testing SOCs using precomputed test sets. The proposed method is applicable to both full-scan and non-scan circuits. For full-scan circuits, the test patterns in a precomputed test set can be reordered to obtain a difference vector with very few 1s. For non-scan circuits, however, the order of pattern application must be preserved; therefore no reordering is possible. Nevertheless, we show that Golomb coding is effective for encoding $T_D$ for these circuits. An encoded test set $T_E$ derived using Golomb coding is considerably smaller than the original precomputed test set $T_D$. Furthermore, we show that $T_E$ is also much smaller than the smallest test sets that have been derived for the ISCAS benchmark circuits using ATPG compaction.

We derive upper and lower bounds on the amount of compression that can be achieved with Golomb and run-length codes for any given $T_{diff}$. These simple bounds provide useful guidelines to the designer and they reveal the inherent superiority of Golomb codes over run-length codes. We also design a low-cost decoder for decompressing Golomb-encoded test patterns. We implement the decoder using Synopsys Design Compiler [15] and show that the overhead due to the decoder is very small. In addition, the decoder is scalable and independent of the core under test and the precomputed test set. We then present a decompression architecture that allows multiple cores to be tested in parallel without requiring additional ATE I/O channels. This benefit is a direct consequence of the structure of the Golomb code.

The organization of the paper is as follows. In Section 2, we present the basic concept of Golomb coding and bounds on the amount of compression that can be achieved using Golomb and run-length codes. Section 3 presents encoding procedures and describes the decoder that is necessary for on-chip decompression. Section 4 presents the overall test architecture and a decompression method for an SOC with multiple cores. Experimental results for the ISCAS benchmarks and two industrial production circuits are reported in Section 5.

2 Golomb coding

In this section, we describe Golomb coding and analyze its effectiveness for test data compression. As discussed in Section 1, the first step in encoding a test set $T_D$ is to generate its difference vector test set $T_{diff}$. Let the (ordered) precomputed test set be $T_D = \{t_1, t_2, t_3, \ldots, t_n\}$. Its difference vector is then given by

$$T_{diff} = \{t_1,\; t_1 \oplus t_2,\; t_2 \oplus t_3,\; \ldots,\; t_{n-1} \oplus t_n\}$$

This assumes that the CSR starts in the all-0 state. Other starting states can be considered similarly.

The next step in the encoding procedure is to select the Golomb code parameter $m$, referred to as the group size. The choice of $m$ has received a lot of attention in the information theory literature; for certain distributions of the input data stream ($T_{diff}$ in our case), the group size $m$ can be optimally determined. For example, if the input data stream is random with 0-probability $p$, then $m$ should be chosen such that $p^m \approx 0.5$ [14]. However, since $T_{diff}$ does not satisfy the randomness assumption, the best value of $m$ for test data compression can only be determined through actual experimentation.

Once $m$ is determined, the runs of 0s in $T_{diff}$ are mapped to groups of size $m$ (each group corresponding to a run length). The number of such groups is determined by the length of the longest run of 0s in $T_{diff}$. The set of run-lengths $\{0, 1, 2, \ldots, m-1\}$ forms group $A_1$; the set $\{m, m+1, m+2, \ldots, 2m-1\}$, group $A_2$; etc. In general, the set of run-lengths $\{(k-1)m, (k-1)m+1, (k-1)m+2, \ldots, km-1\}$ comprises group $A_k$ [15]. To each group $A_k$, we assign a group prefix of $(k-1)$ 1s followed by a 0. We denote this by $1^{(k-1)}0$. If $m$ is chosen to be a power of 2, i.e., $m = 2^N$, each group contains $2^N$ members and a $\log_2 m$-bit sequence (tail) uniquely identifies each member within the group. Thus, the final codeword for a run-length $L$ that belongs to group $A_k$ is composed of two parts: a group prefix and a tail. The prefix is $1^{(k-1)}0$ and the tail is a sequence of $\log_2 m$ bits. The codeword for a run of length $L$ therefore has $\lfloor L/m \rfloor + 1 + \log_2 m$ bits. The encoding process is illustrated in Figure 4 for $m = 4$.

We now analyze the effectiveness of Golomb coding for a given $T_{diff}$. We derive upper and lower bounds on compression for any given $m = 2^N$. The patterns in $T_{diff}$ can be considered as a single stream of data as shown in Figure 5. Let there be $n$ bits and $r$ 1s in $T_{diff}$. Also, without loss of generality, let the sequence always end with a 1. Therefore $T_{diff}$ will contain $r$ runs of 0s. Let these runs be of length $l_1, l_2, l_3, \ldots, l_r$ respectively. Thus, $T_{diff}$ can be represented by the sequence $0^{l_1} 1\, 0^{l_2} 1\, 0^{l_3} 1 \cdots 0^{l_r} 1$ such that $(l_1 + l_2 + l_3 + \cdots + l_r) + r = n$. The following theorem provides a bound on $G$, the size of the encoded sequence. The proof is omitted for conciseness.

Theorem 1: Let the total number of bits in the difference vector set $T_{diff}$ be $n$ and the total number of 1s be $r$. Then the size $G$ (in bits) of the encoded test data is bounded as follows:

$$\frac{n}{m} + r\log_2 m \;\le\; G \;\le\; \frac{n}{m} + r\log_2 m + r\left(1 - \frac{1}{m}\right)$$

The following corollary shows that Theorem 1 provides tight bounds on $G$, especially if the number of 1s in $T_{diff}$ is small. The proof of the corollary follows from Theorem 1.

Corollary 1: Consider any difference vector set $T_{diff}$ with $r$ 1s. Let $G_{max}$ ($G_{min}$) be the upper (lower) bound on the size of the encoded test set, as predicted by Theorem 1. Then

$$\frac{r}{2} \;\le\; G_{max} - G_{min} \;\le\; r$$
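The encoding rule and the bounds of Theorem 1 can be illustrated with a short sketch. The Python below is only an illustration written for this text, not the authors' implementation; the function names (`run_lengths`, `golomb_encode_run`, `golomb_encode`, `golomb_bounds`) are hypothetical, and the code assumes that $m$ is a power of 2 with $m \ge 2$ and that the bit stream ends with a 1, as in the analysis above.

```python
from math import log2


def run_lengths(bits):
    """Return the lengths of the runs of 0s in `bits`; each run is terminated by a 1."""
    if not bits.endswith("1"):
        raise ValueError("the stream is assumed to end with a 1 (Section 2)")
    return [len(run) for run in bits[:-1].split("1")]


def golomb_encode_run(length, m):
    """Encode one run length; m is assumed to be a power of 2 with m >= 2."""
    tail_bits = int(log2(m))
    k, offset = divmod(length, m)           # the run lies in group A_(k+1)
    prefix = "1" * k + "0"                  # k ones followed by a 0
    tail = format(offset, "0{}b".format(tail_bits))
    return prefix + tail


def golomb_encode(bits, m):
    """Concatenate the codewords of all runs of 0s in `bits`."""
    return "".join(golomb_encode_run(l, m) for l in run_lengths(bits))


def golomb_bounds(n, r, m):
    """Lower and upper bounds on the encoded size G from Theorem 1."""
    lower = n / m + r * log2(m)
    upper = lower + r * (1 - 1 / m)
    return lower, upper


# The 13-bit example sequence of Figure 4 and the 42-bit stream of Figure 5.
fig4 = golomb_encode("0010000001001", 4)
t_diff = "0001000" "0011000" "0100001" "0000001" "0010000" "0001001"
encoded = golomb_encode(t_diff, 4)
lower, upper = golomb_bounds(len(t_diff), t_diff.count("1"), 4)
print(fig4, len(encoded), lower, upper)
```

For the Figure 5 stream ($n = 42$, $r = 9$, $m = 4$) the sketch yields 32 encoded bits, which lies between the Theorem 1 bounds of 28.5 and 35.25.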

Group   Run-length   Group prefix   Tail   Codeword
A_1     0            0              00     000
        1                           01     001
        2                           10     010
        3                           11     011
A_2     4            10             00     1000
        5                           01     1001
        6                           10     1010
        7                           11     1011
A_3     8            110            00     11000
        9                           01     11001
        10                          10     11010
        11                          11     11011

Encoded sequence corresponding to 0010000001001 is 0101010010.

Figure 4: An example of Golomb coding for $m = 4$.

T_diff patterns: 0001000 0011000 0100001 0000001 0010000 0001001
Runs:            0001 000001 1 00001 00001 0000001 001 00000001 001
Run lengths:     l_1 = 3, l_2 = 5, l_3 = 0, l_4 = 4, l_5 = 4, l_6 = 6, l_7 = 2, l_8 = 7, l_9 = 2
Parameters:      m = 4, r = 9, n = 42
T_E:             011 1001 000 1000 1000 1010 010 1011 010
Number of encoded bits = 32

Figure 5: $T_{diff}$ and its encoded form $T_E$.

Corollary 1 illustrates an interesting property of Golomb codes, namely, if the number of 1s in $T_{diff}$ is small, Golomb coding provides almost the same amount of compression for different $n$-bit sequences with $r$ 1s. The value of $G$ lies between the values of $G_{max}$ and $G_{min}$ derived above, and this variation can be at most $r$.

As an illustration of these bounds, consider a hypothetical example where $n = 256$ and $r = 30$. The upper and lower bounds for various values of $m$ are shown in Figure 6(b) and the corresponding graph is plotted in Figure 6(a). We note that both bounds on $G$ follow a bathtub curve, and the best value of $m$ depends on $T_{diff}$ and therefore needs to be determined experimentally. These bounds are obtained from the parameters $n$ and $r$ and they do not depend on the distribution of 1s in $T_{diff}$. They can be used as predictors for the effectiveness of Golomb coding for a particular $T_D$.

(a) Plot of $G_{min}$ and $G_{max}$ versus group size $m$.

(b)
Group size m   G_min   G_max
2              158     173
4              124     146
8              122     148
16             136     164
32             158     187

Figure 6: An example illustrating the variation of the lower and upper bounds with $m$ for $n = 256$ and $r = 30$.

We next present upper and lower bounds on the compression achieved by run-length coding.

Theorem 2: Let the total number of bits in test set $T_{diff}$ be $n$ and the total number of 1s be $r$. In addition, suppose block size $b$ is used for run-length coding. The size $RL$ (in bits) of the encoded test data is bounded as follows:

$$\frac{bn}{2^b - 1} \;\le\; RL \;\le\; \frac{bn}{2^b - 1} + \frac{br(2^b - 2)}{2^b - 1} \;\le\; \frac{bn}{2^b - 1} + br$$

We can now compare the efficiency of Golomb coding ($m = 4$) and run-length coding for block size $b = 3$. For run-length coding, a lower bound from Theorem 2 is given by $RL_{min} = 3n/7 = 0.428n$. An upper bound for Golomb coding from Theorem 1 is given by $G_{max} = n/4 + 11r/4$. If we make the realistic assumption (based on experimental data) that $r \approx 0.05n$, we get $G_{max} \approx 0.39n$, which is smaller than $RL_{min}$. In fact, as $r$ becomes smaller relative to $n$, $G_{max} \rightarrow 0.25n$. Therefore, we note that as long as $r$ is sufficiently small compared to $n$, the best compression that can be achieved with run-length coding is less than the worst compression with Golomb coding. This provides an analytical justification for the use of Golomb codes instead of run-length codes.

3 Test data compression/decompression

In this section, we describe the test data compression procedure, the decompression architecture, and the design of the on-chip decoder. Additional practical issues related to the decompression architecture are discussed in the following section. We show that the decoder is simple and scalable, and independent of both the core under test and the precomputed test set. Moreover, due to its small size, it does not introduce significant hardware overhead.

The encoding procedure for a block of data using Golomb codes was outlined in Section 2. Let $T_D$ be the test set with $p$ patterns and $n$ primary inputs and let $T_{diff}$ be the corresponding difference vector test set. A straightforward algorithm is used for generating $T_{diff}$. For full-scan cores, reordering of the test patterns is allowed; therefore the patterns can be arranged such that the runs of 0s are long in $T_{diff}$. The problem of determining the best ordering is equivalent to the NP-Complete Traveling Salesman problem. Therefore, a greedy algorithm is used to generate $T_{diff}$. Let every pattern in $T_D$ correspond to a node in a complete directed graph $G$ and let the weight $w_{ij}$ of the edge from $t_i$ to $t_j$ equal the number of 0s in the difference vector obtained from $t_i \oplus t_j$. Starting from the first pattern $t_1$, we choose the next pattern that is at the least distance from $t_1$. (The distance between two nodes is given by $n - w_{ij}$.) We continue this process until all the patterns are covered, i.e., all nodes in $G$ are visited. The same procedure can be used to generate $T_{diff}$ for non-scan cores by disabling the reordering step.

For test cubes, the don't-cares have to be mapped to 0s or 1s before they can be compressed. The don't-cares are therefore assigned binary values such that $w_{ij}$ is maximum for the edge between $t_i$ and $t_j$.
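The greedy ordering and the generation of $T_{diff}$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: patterns are fully specified bit strings (don't-care assignment is not modeled), "continuing the process" is read as always moving to the unvisited pattern nearest to the most recently chosen one, and the names `hamming`, `greedy_order`, and `difference_vectors` are hypothetical. The distance $n - w_{ij}$ is simply the Hamming distance between $t_i$ and $t_j$.

```python
def hamming(a, b):
    """Number of positions in which two equal-length patterns differ (n - w_ij)."""
    return sum(x != y for x, y in zip(a, b))


def greedy_order(patterns):
    """Nearest-neighbour ordering that starts from the first pattern t_1."""
    order = [patterns[0]]
    remaining = list(patterns[1:])
    while remaining:
        last = order[-1]
        nearest = min(remaining, key=lambda t: hamming(last, t))
        remaining.remove(nearest)
        order.append(nearest)
    return order


def difference_vectors(patterns):
    """T_diff = {t_1, t_1 xor t_2, ...}, assuming the CSR starts in the all-0 state."""
    previous = "0" * len(patterns[0])
    diffs = []
    for t in patterns:
        diffs.append("".join("1" if x != y else "0" for x, y in zip(previous, t)))
        previous = t
    return diffs


# A made-up four-pattern test set, used only to exercise the functions.
test_set = ["1100", "0011", "1101", "0111"]
ordered = greedy_order(test_set)
ones_before = sum(d.count("1") for d in difference_vectors(test_set))
ones_after = sum(d.count("1") for d in difference_vectors(ordered))
print(ordered, ones_before, ones_after)
```

On this toy test set the reordering lowers the number of 1s in $T_{diff}$ from 11 to 6, which lengthens the runs of 0s that the Golomb code exploits. For non-scan cores the call to `greedy_order` would simply be skipped.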

3.1 Pattern decompression

The decoder decompresses the encoded test set $T_E$ and outputs $T_{diff}$. The exclusive-or gate and the CSR are used to generate the test patterns from the difference vectors. The decoder can be efficiently implemented by a $\log_2 m$-bit counter and a finite-state machine (FSM). The block diagram of the decoder is shown in Figure 7. The signal bit_in is the input to the FSM and an enable (en) signal is used to input the bit whenever the decoder is ready. The signal inc is used to increment the counter and rs indicates that the counter has finished counting. The signal out is the decoder output and v indicates when the output is valid.

Figure 7: Block diagram of the decoder used for decompression (an FSM with signals bit_in, en, out, v, and clk, connected through inc and rs to an $i$-bit counter, $i = \log_2 m$).

The operation of the decoder is as follows. Whenever the input is 1, the counter counts up to $m$. The signal en is low while the counter is busy counting and enables the input at the end of $m$ cycles to accept another bit. The decoder outputs $m$ 0s during this operation and makes the valid signal v high. When the input is 0, the FSM starts decoding the tail of the input codeword. Depending on the tail bits, the number of 0s output is different. The en and v signals are used to synchronize the input and output operation of the decoder. The state diagram corresponding to the decoder for $m = 4$ is shown in Figure 8. The states S0 to S3 and S4 to S8 correspond to the prefix and tail decoding respectively.

Figure 8: The decode FSM state diagram (transitions are labeled bit_in, rs / en, out, inc, v).

We also synthesized the FSM using Synopsys Design Compiler to assess the hardware overhead of the decoder. The synthesized circuit contains only 4 flip-flops and 34 combinational gates. For any circuit whose test set is compressed using $m = 4$, the synthesized logic is the only additional hardware required other than the $\log_2 m$-bit counter. Thus the decoder is independent of not only the core under test but also its precomputed test set. The extra logic required for decompression is very small and can be implemented very easily. This is in contrast to a run-length decoder, which is not scalable and becomes increasingly complex for higher values of the block length $b$.

4 Decompression architecture

In this section, we present a decompression architecture for testing SOC designs when Golomb coding is used for test data compression. We describe the application of Golomb codes to non-scan and full-scan circuits and we present a new technique for testing several cores simultaneously using a single ATE I/O channel.

4.1 Application to sequential (non-scan) cores

For sequential cores, a boundary scan register is required at the functional inputs for decompression. This register is usually available for cores that are wrapped. In addition, a two-input exclusive-or gate is required to translate the difference vectors to the patterns of $T_D$. Figure 9(a) shows the overall test architecture for the sequential core. The encoded data is fed bit-wise to the decoder, which produces a sequence of difference vectors. The decompression hardware then translates the difference vectors into the test patterns, which are applied to the core. If an existing boundary-scan register is used to decompress the test data, the decoder and a small amount of synchronizing logic are the only additional logic required.

4.2 Application to full-scan cores

Most cores in use today contain one or more internal scan chains. However, since the scan chains are used for capturing test responses, they cannot be used for decompression. An additional cyclical scan register (CSR), with length equal to the length of the internal scan chain, is required to generate the test patterns. Figure 9(b) shows the decompression architecture for full-scan cores.

Figure 9: (a) Decompression architecture using boundary scan register. (b) CSR used to feed the internal scan chain.

As discussed in [10], there are a number of ways in which the various scan chains in an SOC can be configured to test the cores in the system. If an SOC contains both non-scan and full-scan cores, the boundary-scan register associated with a non-scan core $C_1$ can first be used to decompress and apply test patterns to $C_1$, and then it can be used to decompress the test patterns and feed the internal scan of a full-scan core $C_2$. Similarly, the internal scan of a core can be used to decompress and feed the test patterns to the internal scan of the core under test if the length of the internal scan chain being used for decompression is smaller than or equal to the internal scan chain being fed. If the scan chain is smaller, extra scan elements can be added to make the lengths of the two scan chains equal. In this way, the proposed scheme provides the core integrator with flexibility in configuring the various scan chains to minimize hardware overhead.

4.3 Application to multiple cores

We now highlight another important advantage of Golomb coding. In addition to reducing testing time and the size of the test data to be stored in the ATE memory, Golomb coding also allows multiple cores to be tested simultaneously using a single ATE I/O channel. In this way, the I/O channel capacity of the ATE can be increased. This is a direct consequence of the structure of the Golomb code, and such a design is not possible for variable-to-fixed-length (run-length) coding.

As discussed in Section 2, when Golomb coding is applied to a block of data containing a run of 0s followed by a single 1, the codeword contains two parts: a prefix and a tail. For a given code parameter $m$ (group size), the length of the tail ($\log_2 m$) is independent of the run-length. Note further that every 1 in the prefix corresponds to $m$ 0s in the decoded difference vector. Thus the prefix consists of a string of 1s followed by a 0, and the 0 can be used to identify the beginning of the tail.

The FSM in the decoder runs the counter for $m$ decode cycles whenever a 1 is received and starts decoding the tail as soon as a 0 is received. The tail decoding takes at most $m$ cycles. During prefix decoding, the FSM has to wait for $m$ cycles before the next bit of the prefix can be decoded. Therefore, we can use interleaving to test $m$ cores together, such that the decoder corresponding to each core is fed with encoded prefix data after every $m$ cycles. (This can also be used to feed multiple scan chains in parallel as long as the capture cycles of the scan chains are synchronized.) Whenever the tail is to be decoded (identified by a 0 in the encoded bit stream), the respective decoder is fed with the entire tail of $\log_2 m$ bits in a single burst of $\log_2 m$ cycles.

Core 1:                   1101111100
Core 2:                   1110011100
Final encoded test data:  11110111001111110000

Figure 10: Composite encoded test data for two cores with group size $m = 2$.

This interleaving scheme is based on the use of a demultiplexer and it works as follows.
First, the encoded test data for $m$ cores is combined to generate a composite bit stream $T_C$ that is stored in the ATE. Next, $T_C$ is fed to the demultiplexer, and a small FSM with only $i = \log_2 m$ states is used to detect the beginning of each tail. An $i$-bit counter is used to select the outputs to the decoders of the various cores.

We now outline how $T_C$ is generated from the different encoded test data. $T_C$ is obtained by interleaving the prefix parts of the compressed test sets of each core, but the tails are included unchanged in $T_C$. An example is shown in Figure 10, where compressed data for two cores (generated using group size $m = 2$) have been interleaved to obtain the final encoded test set to be applied through the decompression scheme for multiple cores.

Every scan chain has its dedicated decoder. This decoder receives either a 1 or the tail of the compressed data corresponding to the various cores connected to the scan chain. The $i$-bit counter connected to the select lines of the demultiplexer selects a decoder after every $m$ clock cycles. If the FSM detects that a portion of the tail has arrived, the 0 that is used to identify the tail is passed to the decoder and then the counter is stopped for $\log_2 m$ (tail length) cycles so that the test data is transferred continuously to the appropriate core.

The tail decoding takes at most $m$ cycles. This is because the number of states traversed by the decode FSM depends on the bits of $T_E$ that it receives; see Figure 8. This number can be at most $m$. In order to make the prefix and tail decoding cycles equal, two additional states must be added to the FSM state diagram as shown in Figure 11. This ensures that the decoder works in synchronization with the demultiplexer. Moreover, the tail bits may now not be passed on to the decoder as a single block. Thus, the interleaving of test data to generate $T_C$ changes slightly. The additional states do not increase the number of flip-flops in the decoder.

Figure 11: Modified state diagram of the decode FSM to make the tail and prefix decode cycles equal (with additional states).
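The generation of $T_C$ and its separation by the demultiplexer can be modeled in software. The sketch below is an illustration only and covers the basic scheme, in which each tail is sent as a single burst; it does not model the modified FSM of Figure 11 or any cycle timing. The function names are hypothetical, $m$ is assumed to be a power of 2 with $m \ge 2$, and `deinterleave` is given the per-core encoded lengths as an extra, assumed parameter so that it knows when a core's data is exhausted.

```python
def interleave(streams, m):
    """Build T_C: one prefix bit per core in turn; a 0 is followed by that core's tail."""
    tail_bits = m.bit_length() - 1          # log2(m) for a power of 2
    position = [0] * len(streams)
    composite = []
    while any(p < len(s) for p, s in zip(position, streams)):
        for core, stream in enumerate(streams):
            p = position[core]
            if p >= len(stream):
                continue                    # this core has no data left
            take = 1 if stream[p] == "1" else 1 + tail_bits
            composite.append(stream[p:p + take])
            position[core] = p + take
    return "".join(composite)


def deinterleave(composite, m, lengths):
    """Route the bits of T_C back to the per-core encoded streams."""
    tail_bits = m.bit_length() - 1
    parts = [[] for _ in lengths]
    received = [0] * len(lengths)
    i = 0
    while i < len(composite) and any(g < n for g, n in zip(received, lengths)):
        for core in range(len(lengths)):
            if received[core] >= lengths[core] or i >= len(composite):
                continue
            take = 1 if composite[i] == "1" else 1 + tail_bits
            parts[core].append(composite[i:i + take])
            received[core] += take
            i += take
    return ["".join(p) for p in parts]


def golomb_decode(bits, m):
    """Recover the run lengths: each 1 adds m zeros, a 0 is followed by the tail."""
    tail_bits = m.bit_length() - 1
    runs, run, i = [], 0, 0
    while i < len(bits):
        if bits[i] == "1":
            run += m
            i += 1
        else:
            runs.append(run + int(bits[i + 1:i + 1 + tail_bits], 2))
            run = 0
            i += 1 + tail_bits
    return runs


# The two encoded streams of Figure 10 (group size m = 2).
cores = ["1101111100", "1110011100"]
t_c = interleave(cores, 2)
recovered = deinterleave(t_c, 2, [len(c) for c in cores])
print(t_c, recovered, [golomb_decode(c, 2) for c in recovered])
```

For the two streams of Figure 10 the sketch reproduces the composite stream 11110111001111110000, and separating it again returns the original per-core data.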

5 Experimental results

In this section, we experimentally evaluate the proposed test data compression/decompression method for the ISCAS benchmark circuits and for two industrial circuits. We considered both full-scan and non-scan sequential circuits in our experiments. The test set for each full-scan circuit was reordered to increase compression; no reordering was done for the non-scan circuits. The amount of compression obtained was computed as follows:

Compression (percent) = (Total no. of bits in the original test set - G) / (Total no. of bits in the original test set) × 100,

where G is the number of bits in the encoded test set.

The first set of experimental data that we present is based on the use of partially-specified test sets (test cubes). The system integrator can determine the best Golomb code parameter and encode the test cubes if they are provided by the core vendor. Alternatively, the core vendor can encode the test set for the core and provide the encoded test set along with the value of m to the core user, who can then use m to design the decoder. In a third possible scenario, the core vendor can encode the test set and provide it to the core user without disclosing the value of m used for encoding. The encoded test set then serves as an encryption of the test data for IP protection, and m serves as the "secret key". In this case, however, the core vendor must also design the decoder for the core and provide it to the core user.

Table 1 presents the experimental results for the ISCAS benchmark circuits with test cubes obtained from the Mintest ATPG program with dynamic compaction [9]. We carried out our experiments using a Sun Ultra 10 workstation with a 333 MHz processor and 256 MB of DRAM. The table lists the sizes of the precomputed (original) test sets, the amount of compression achieved for several values of m, and the size of the smallest encoded test set. As is evident from Table 1, the best value of m depends on the test set. Not only do we achieve very high test data compression with a suitable choice of m, but we also observe that in a majority of cases (e.g., for all but one of the ISCAS 89 circuits), the encoded test set is smaller than the smallest test sets that have been derived for these circuits using ATPG compaction [11]. (These cases are marked with † in Table 1.) Hence ATPG compaction may not always be necessary for saving memory and reducing testing time. This comparison is essential in order to show that storing the encoded test set in ATE memory is more efficient than simply applying ATPG compaction to test cubes and storing the resulting compact test sets. For example, the effectiveness of statistical coding for full-scan circuits was not completely established in [9], since no comparison was drawn with ATPG compaction in that work.

| Circuit | m = 2 | m = 4 | m = 8 | m = 16 | m = 32 | No. of bits in original test set | Best compression (percent) | No. of bits G in encoded test set | No. of bits for Mintest |
|---|---|---|---|---|---|---|---|---|---|
| c1355 † | 34.86 | 45.70 | 44.58 | 37.63 | 28.00 | 4838 | 45.70 | 2627 | 3444 |
| c1908 † | 30.17 | 37.30 | 32.63 | 21.93 | 8.06 | 4587 | 37.30 | 2876 | 3498 |
| c2670 † | 38.64 | 53.34 | 56.08 | 53.02 | 47.08 | 20271 | 56.08 | 8903 | 10252 |
| c3540 | 23.48 | 24.52 | 13.90 | -2.40 | -21.36 | 6450 | 24.52 | 4868 | 4200 |
| c5315 | 31.33 | 39.02 | 35.08 | 25.46 | 13.26 | 15486 | 39.02 | 9443 | 6586 |
| c7552 | 15.50 | 9.80 | -6.99 | -29.51 | -54.17 | 25254 | 15.50 | 21338 | 15111 |
| s641 † | 21.79 | 21.58 | 10.32 | -6.83 | -26.63 | 1404 | 21.79 | 1098 | 1134 |
| s713 † | 22.43 | 23.07 | 11.96 | -4.70 | -24.07 | 1404 | 23.07 | 1080 | 1134 |
| s1196 † | 32.80 | 42.22 | 40.06 | 31.67 | 20.21 | 4448 | 42.22 | 2570 | 3616 |
| s1238 † | 34.21 | 44.79 | 43.62 | 36.26 | 25.94 | 4864 | 44.79 | 2685 | 3872 |
| s5378 † | 32.11 | 40.70 | 37.60 | 28.72 | 17.19 | 23754 | 40.70 | 14085 | 20758 |
| s9234 † | 33.44 | 43.34 | 41.53 | 33.47 | 22.33 | 39273 | 43.34 | 22250 | 25935 |
| s13207 † | 44.78 | 65.03 | 72.97 | 74.78 | 73.44 | 165200 | 74.78 | 41658 | 163100 |
| s15850 † | 35.37 | 47.11 | 46.79 | 40.45 | 31.07 | 76986 | 47.11 | 40717 | 57434 |
| s35932 * | 49.83 | 74.68 | 87.04 | 93.15 | 96.14 | 4007299 | 98.51 | 59573 | 19393 |
| s38417 † | 33.55 | 44.12 | 42.38 | 35.22 | 25.20 | 164736 | 44.12 | 92054 | 113152 |
| s38584 † | 35.65 | 47.71 | 47.67 | 41.65 | 32.71 | 199104 | 47.71 | 104111 | 161040 |

* The test set used is obtained from the Atalanta ATPG program [16]. (The Mintest test set with dynamic compaction is almost fully compacted.) The maximum compression was obtained for group size m = 512.
† The encoded test set is smaller than the Mintest test set.

Table 1: Experimental results on Golomb coding for the combinational and full-scan ISCAS benchmark circuits with test patterns generated using Mintest [11]. Columns m = 2 to m = 32 give the percentage compression for that group size.

We next present results on Golomb coding for non-scan circuits. For this set of experiments, we used HITEC [17] to generate test sequences (cubes) for some of the ISCAS 89 benchmark circuits (including the three largest ones).

| ISCAS circuit | Number of 1s (r) | Lower bound G_min | Size of encoded test set G | Upper bound G_max |
|---|---|---|---|---|
| c1355 | 572 | 2353 | 2627 | 2782 |
| c1908 | 700 | 2456 | 2876 | 3071 |
| c2670 | 1728 | 7717 | 8903 | 9229 |
| c3540 | 1303 | 4218 | 4868 | 5195 |
| c5315 | 2206 | 8283 | 9443 | 9938 |
| c7552 | 6475 | 19102 | 21338 | 22339 |
| s641 | 296 | 998 | 1098 | 1146 |
| s713 | 290 | 931 | 1080 | 1148 |
| s1196 | 589 | 2290 | 2570 | 2731 |
| s1238 | 599 | 2414 | 2685 | 2863 |
| s5378 | 3239 | 12416 | 14085 | 14845 |
| s9234 | 5039 | 19896 | 22250 | 23675 |
| s13207 | 6716 | 37189 | 41658 | 43485 |
| s15850 | 8702 | 36650 | 40717 | 43177 |
| s35932 | 5340 | 55886 | 59573 | 61216 |
| s38417 | 20165 | 81514 | 92054 | 96637 |
| s38584 | 23320 | 96416 | 104111 | 113906 |

Table 2: Comparison of G (obtained experimentally) with the theoretical bounds G_min and G_max
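The choice of the best group size m described above can be illustrated with a small software sketch. It is not the authors' tool: it assumes the common form of the Golomb code for a power-of-two m, in which a run of l 0s terminated by a 1 is encoded as floor(l/m) 1s, a closing 0, and a log2(m)-bit tail holding l mod m. The function names `golomb_encode`, `compression_percent` and `best_group_size` are hypothetical, and the handling of a trailing run with no closing 1 is an assumption.

```python
def golomb_encode(bits: str, m: int) -> str:
    """Golomb-encode a 0/1 string as runs of 0s, each terminated by a 1."""
    if m < 2 or m & (m - 1):
        raise ValueError("group size m must be a power of two >= 2")
    tail_len = m.bit_length() - 1  # log2(m) tail bits per codeword

    def codeword(run: int) -> str:
        # Prefix: run // m ones closed by a 0; tail: run % m in binary.
        return "1" * (run // m) + "0" + format(run % m, f"0{tail_len}b")

    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        elif b == "1":
            out.append(codeword(run))
            run = 0
        else:
            raise ValueError("input must contain only 0 and 1")
    if run:
        # Assumption: a trailing run with no closing 1 is encoded like any other run.
        out.append(codeword(run))
    return "".join(out)


def compression_percent(original_bits: int, encoded_bits: int) -> float:
    """Percentage compression as defined at the start of this section."""
    return (original_bits - encoded_bits) / original_bits * 100


def best_group_size(bits: str, candidates=(2, 4, 8, 16, 32)):
    """Try each candidate m and return (m, encoded size) for the smallest result."""
    sizes = {m: len(golomb_encode(bits, m)) for m in candidates}
    best_m = min(sizes, key=sizes.get)
    return best_m, sizes[best_m]


sample = ("0" * 15 + "1") * 4  # toy input: four runs of fifteen 0s
m, size = best_group_size(sample)
print(m, size, round(compression_percent(len(sample), size), 2))
```

Sweeping a handful of power-of-two values is cheap, which is why the best m can be found by exhaustive trial rather than by analysis of the run-length distribution.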

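In the scenarios where the core user receives the encoded test set together with m, decoding is the inverse of the run-based encoding. The following is a software sketch only, not the on-chip decoder; it assumes the same codeword layout as above (a prefix of 1s closed by a 0, then a log2(m)-bit tail), and `golomb_decode` is a hypothetical name.

```python
def golomb_decode(code: str, m: int) -> str:
    """Decode a Golomb-coded bit string back into runs of 0s, each ending in a 1."""
    if m < 2 or m & (m - 1):
        raise ValueError("group size m must be a power of two >= 2")
    tail_len = m.bit_length() - 1  # log2(m) tail bits per codeword
    out, pos = [], 0
    while pos < len(code):
        quotient = 0
        while pos < len(code) and code[pos] == "1":
            quotient += 1  # each 1 in the prefix stands for m zeros
            pos += 1
        pos += 1  # skip the 0 that closes the prefix
        tail = code[pos:pos + tail_len]
        if len(tail) < tail_len:
            raise ValueError("truncated codeword")
        pos += tail_len
        run = quotient * m + int(tail, 2)
        out.append("0" * run + "1")
    return "".join(out)


print(golomb_decode("0111001", 4))
```

Decoding needs only m and the bit stream, which is consistent with the decoder being independent of the core under test and of the precomputed test set.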
For the non-scan ISCAS 89 circuits, we determined the size of the encoded test set in each case. Table 3(a) shows the compression achieved for these circuits. We also applied Golomb coding to two non-scan industrial circuits. These production circuits are microcontrollers, whose test data were provided to us by Delphi Delco Electronics Systems. The first circuit, CKT1, contains 16.8K gates, 145 flip-flops, and 35 latches. The second (smaller) circuit, CKT2, contains 6.8K gates, 88 flip-flops, and 32 latches. The test sequences for these circuits were fully specified, and they were derived using functional methods targeted at single stuck-at faults in their subcircuits. The results on Golomb coding for these circuits are presented in Table 3(b) and Table 3(c). We achieved significant compression (over 80% on average) in all cases. Thus the results show that the compression scheme is very effective for non-scan circuits as well.

(a)

| ISCAS 89 circuit (non-scan) | m = 2 | m = 4 | m = 8 | m = 16 | m = 32 | m = 64 | Size of original test set (bits) | Best compression (percent) | Size of encoded test set (bits) |
|---|---|---|---|---|---|---|---|---|---|
| s953 | 41.78 | 58.56 | 63.78 | 62.58 | 57.87 | 51.62 | 1168 | 63.78 | 423 |
| s5378 | 41.07 | 57.68 | 62.24 | 60.79 | - | - | 169995 | 62.24 | 64176 |
| s13207 | 49.24 | 73.36 | 85.21 | 90.81 | 93.30 | 94.21 | 42284 | 94.21 | 2491 |
| s35932 | 48.10 | 71.41 | 82.31 | 87.03 | 88.58 | 88.74 | 147070 | 88.74 | 16554 |
| s15850 | 48.32 | 71.81 | 82.97 | 87.91 | 89.62 | 89.68 | 430353 | 89.68 | 46872 |
| s38417 | 47.05 | 69.08 | 78.06 | 81.79 | 82.11 | 80.34 | 22624 | 82.11 | 4046 |

(b)

| Test sequence for CKT1 | m = 2 | m = 4 | m = 8 | m = 16 | m = 32 | m = 64 | Size of original test set (bits) | Best compression (percent) | Size of encoded test set (bits) |
|---|---|---|---|---|---|---|---|---|---|
| TS1 | 46.71 | 68.53 | 78.12 | 81.56 | 81.71 | 80.38 | 25130 | 81.71 | 4595 |
| TS2 | 47.77 | 70.68 | 81.14 | 85.61 | 86.75 | 86.38 | 23230 | 86.75 | 3078 |
| TS3 | 46.06 | 66.90 | 75.98 | 79.02 | 78.40 | 76.30 | 5660 | 79.02 | 1187 |
| TS4 | 48.17 | 71.25 | 82.12 | 86.85 | 88.23 | 88.12 | 18830 | 88.23 | 2216 |
| TS5 | 47.95 | 70.71 | 81.20 | 85.80 | 86.84 | 86.47 | 21550 | 86.84 | 2835 |
| TS6 | 47.05 | 69.05 | 79.05 | 82.88 | 83.19 | 82.03 | 18800 | 83.19 | 3160 |

(c)

| Test sequence for CKT2 | m = 2 | m = 4 | m = 8 | m = 16 | m = 32 | m = 64 | Size of original test set (bits) | Best compression (percent) | Size of encoded test set (bits) |
|---|---|---|---|---|---|---|---|---|---|
| TS1 | 46.89 | 68.81 | 78.44 | 82.12 | 82.66 | 81.34 | 11079 | 82.66 | 1921 |
| TS2 | 37.60 | 51.70 | 54.27 | 50.85 | 43.16 | - | 234 | 54.27 | 107 |
| TS3 | 45.94 | 67.20 | 76.29 | 79.71 | 79.55 | - | 14562 | 79.71 | 2954 |
| TS4 | 46.89 | 68.81 | 78.44 | 82.12 | 82.66 | 81.34 | 11079 | 82.66 | 1921 |

Table 3: Experimental results for (a) ISCAS 89 benchmark circuits, (b) various test sequences for industrial non-scan circuit CKT1, and (c) various test sequences for industrial non-scan circuit CKT2. Columns m = 2 to m = 64 give the percentage compression for that group size.

We next revisit the lower and upper bounds derived in Section 2 for test data compression using Golomb codes. In Table 2, we list these bounds and the actual encoded test set size obtained for the ISCAS circuits. Table 2 shows the number of 1s in T_diff, the size of the encoded test set, and the lower and upper bounds corresponding to each circuit. The experimental results are consistent with the theoretically predicted bounds.

An analytical comparison between run-length coding and Golomb coding was presented in Section 2. Here we present experimental results to reinforce that comparison. Table 4 compares the amount of compression obtained with run-length coding for b = 3 with that obtained with Golomb coding for the large ISCAS benchmark circuits. Golomb codes give better compression in all cases. For example, the compression is almost 20 percentage points higher for s13207. While run-length coding may yield slightly better compression for higher values of b, the complexity of the run-length decoder increases considerably with an increase in b.

| ISCAS circuit (full-scan) | Size of original test set (bits) | Percentage compression (Golomb coding), G | Percentage compression (run-length coding), RL | Difference G - RL |
|---|---|---|---|---|
| s5378 | 23754 | 40.70 | 35.57 | 5.13 |
| s9234 | 39273 | 43.34 | 40.08 | 3.26 |
| s13207 | 165200 | 74.78 | 55.50 | 19.28 |
| s15850 | 76986 | 47.11 | 42.10 | 5.01 |
| s35932 | 4007299 | 98.51 | 62.32 | 36.19 |
| s38417 | 164736 | 44.12 | 37.16 | 6.96 |
| s38584 | 199104 | 47.71 | 42.40 | 5.31 |

Table 4: Comparison between the compression obtained with Golomb coding and run-length coding

If the precomputed test set is already compacted using ATPG methods, then the compression obtained using Golomb codes is considerably less. Nevertheless, we have seen that a significant amount of compression is often achieved if Golomb coding is applied to an ATPG-compacted test set. Table 5 lists the compression achieved for some ISCAS benchmark circuits with test sets derived using SIS [19]. We also present results for I99C1, a combinational benchmark circuit extracted from an industrial design and presented at ITC-99. The corresponding results achieved with run-length coding (block size b = 3) are also shown, and are seen to be significantly less.
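The consistency between the measured sizes and the bounds of Table 2 can be checked numerically. Section 2 is not reproduced here, so the bound expressions in this sketch are an assumption: they follow from each run of l 0s costing floor(l/m) + 1 + log2(m) bits, and they reproduce the G_min and G_max entries of Table 2 for the rows used below. Here n is the number of bits in the original test set, r is the number of 1s, and m is the best group size read off Table 1; `golomb_size_bounds` is a hypothetical name.

```python
from math import log2


def golomb_size_bounds(n: int, r: int, m: int):
    """Assumed bounds on the encoded size G for n input bits containing r ones."""
    lower = n / m + r * log2(m)              # every run length is a multiple of m
    upper = (n - r) / m + r * (1 + log2(m))  # no zeros are lost to rounding down
    return lower, upper


# circuit: (n, r, best m from Table 1, measured G from Table 2)
rows = {
    "c1355": (4838, 572, 4, 2627),
    "s13207": (165200, 6716, 16, 41658),
    "s38584": (199104, 23320, 4, 104111),
}

for name, (n, r, m, g) in rows.items():
    lo, hi = golomb_size_bounds(n, r, m)
    print(name, int(lo), g, int(hi), lo <= g <= hi)
```

The gap between the two bounds is r(1 - 1/m) bits, so the bounds are tight when the test set contains few 1s relative to its length.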

We were unable to directly compare our results with [10], since the test sets used in [10] are no longer available. However, we note that Golomb coding indirectly outperforms [10], since the encoded test set is much smaller and the compression is significantly higher for Golomb-coded test sets in all cases.

| Circuit | Size of original test set (bits) | Size of encoded test set (Golomb, bits) | Compression (Golomb, percent) | Size of encoded test set (run-length, bits) | Compression (run-length, percent) | Size of encoded test set [10] | Compression [10] |
|---|---|---|---|---|---|---|---|
| s13207 | 529900 | 278207 (m = 4) | 47.49 | 323088 | 39.03 | 313666 | 158 |
| s15850 | 400205 | 223356 (m = 4) | 44.18 | 254838 | 36.32 | 260532 | 161 |
| s38417 | 3076736 | 1321185 (m = 8) | 57.05 | 1708227 | 44.48 | 1673680 | 162 |
| s38584 | 1742160 | 1237049 (m = 4) | 28.99 | 1346118 | 22.73 | - | - |
| I99C1 | 23168 | 8719 (m = 8) | 62.36 | 11475 | 50.47 | - | - |

Table 5: Comparison between Golomb and run-length coding for fully specified test sets

6 Conclusions

We have presented a new test vector compression method and decompression architecture for testing embedded cores in an SOC. The proposed method is based on variable-to-variable-length Golomb codes. We have shown that Golomb codes can be used for efficient compression of test data for SOCs and to save ATE memory and testing time. Golomb coding is inherently superior to run-length coding; we have demonstrated this analytically and through experimental results. The on-chip decoder is small and easy to implement. In addition, it is scalable and independent of the core under test and the precomputed test set. We have also presented a novel decompression architecture for testing multiple cores simultaneously. This reduces the testing time of an SOC further and increases the ATE I/O channel capacity considerably. The novel decompression architecture is a direct consequence of the structure of the Golomb codes.

Experimental results for the ISCAS benchmarks show that the compression technique is very efficient for combinational and full-scan circuits. Significant compression is achieved not only for test cubes, but also for compacted fully-specified test sets. The results show that ATPG compaction may not always be necessary for saving ATE memory and reducing testing time. We also achieved substantial compression for two non-scan industrial circuits and for the non-scan ISCAS 89 circuits using HITEC test sets. These results show that Golomb coding is also attractive for compressing (ordered) test sequences of non-scan circuits.

Acknowledgements

The authors thank Dr. Mark Hansen of Delphi Delco Electronics Systems for providing test sequences for the industrial circuits, Dr. Andrej Morosov of University of Potsdam for generating test sets using SIS, and Dr. Scott Davidson of Sun Microsystems for providing the test set for I99C1.

References

[1] Y. Zorian, E. J. Marinissen and S. Dey, "Testing embedded-core based system chips", Proc. International Test Conference, pp. 130-143, 1998.
[2] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan and J. Rajski, "Logic BIST for large industrial designs", Proc. International Test Conference, pp. 358-367, 1999.
[3] B. T. Murray and J. P. Hayes, "Testing ICs: Getting to the core of the problem", Computer, vol. 29, pp. 32-38, November 1996.
[4] C.-A. Chen and S. K. Gupta, "Efficient BIST TPG design and test set compaction via input reduction", IEEE Transactions on Computer-Aided Design, vol. 17, pp. 692-705, August 1998.
[5] K. Chakrabarty and B. T. Murray, "Design of built-in test generator circuits using width compression", IEEE Transactions on Computer-Aided Design, vol. 17, pp. 1044-1051, October 1998.
[6] K. Chakrabarty, B. T. Murray and V. Iyengar, "Built-in pattern generation for high-performance circuits using twisted-ring counters", Proc. IEEE VLSI Test Symposium, pp. 22-27, 1999.
[7] V. Iyengar, K. Chakrabarty and B. T. Murray, "Built-in self testing of sequential circuits using precomputed test sets", Proc. IEEE VLSI Test Symposium, pp. 418-423, 1998.
[8] V. Iyengar, K. Chakrabarty and B. T. Murray, "Deterministic built-in pattern generation for sequential circuits", Journal of Electronic Testing: Theory and Applications, vol. 15, pp. 97-114, August/October 1999.
[9] A. Jas, J. Ghosh-Dastidar and N. A. Touba, "Scan vector compression/decompression using statistical coding", Proc. IEEE VLSI Test Symposium, pp. 114-120, 1999.
[10] A. Jas and N. A. Touba, "Test vector decompression via cyclical scan chains and its application to testing core-based design", Proc. International Test Conference, pp. 458-464, 1998.
[11] I. Hamzaoglu and J. H. Patel, "Test set compaction algorithms for combinational circuits", Proc. International Conference on CAD, pp. 283-289, 1998.
[12] S. Kajihara, I. Pomeranz, K. Kinoshita and S. M. Reddy, "On compacting test sets by addition and removal of vectors", Proc. VLSI Test Symposium, pp. 202-207, 1994.
[13] S. W. Golomb, "Run-length encoding", IEEE Transactions on Information Theory, vol. IT-12, pp. 399-401, 1966.
[14] H. Kobayashi and L. R. Bahl, "Image data compression by predictive coding, Part I: Prediction algorithm", IBM Journal of Research & Development, vol. 18, pp. 164, 1974.
[15] Synopsys Inc., Design Compiler reference manual, 1992.
[16] H. K. Lee and D. S. Ha, "On the generation of test patterns for combinational circuits", Tech. report no. 12_93, Department of Electrical Engineering, Virginia Tech.
[17] The University of Illinois, www.crhc.uiuc.edu/igate.
[18] Y. Zorian, "Test requirements for embedded core-based systems and IEEE P1500", Proc. International Test Conference, pp. 191-199, 1997.
[19] E. M. Sentovich et al., "SIS: A system for sequential circuit synthesis", Technical report UCB/ERL M92/41, Electronics Research Laboratory, University of California, Berkeley, CA, 1992.