A Parallel Multilevel-Huffman Decompression Scheme for IP Cores with Multiple Scan Chains


X. Kavousianos (1), E. Kalligeros (2) and D. Nikolos (2)
(1) Computer Science Dept., University of Ioannina, 45110 Ioannina, Greece
(2) Computer Engineering & Informatics Dept., University of Patras, 26500 Patras, Greece

Abstract

Various efficient compression methods have been proposed for tackling the problem of increased test-data volume of contemporary, core-based Systems-on-Chip (SoCs). However, many of them cannot exploit the test-application-time advantage that cores with multiple scan chains offer, since they are not able to perform parallel decompression of the encoded data. To eliminate this problem, we present a new, low-overhead decompression scheme that can generate clusters of test bits in parallel. The test data are encoded using a recently proposed and very effective compression method called multilevel Huffman. Thus, apart from the significantly reduced test-application times, the proposed approach offers high compression ratios, as well as increased probability of detection of unmodeled faults, since the majority of the unspecified bits of the test sets are replaced by pseudorandom data. The time/space advantages of the proposed approach are validated by thorough experiments.

1 Introduction

In order to meet tight time-to-market constraints, contemporary systems embed pre-designed and pre-verified modules called IP (Intellectual Property) cores. The structure of IP cores is often hidden from the system integrator and, as a result, neither fault simulation nor test pattern generation can be performed for them. IP cores are delivered along with a pre-computed test set and, if they are not BIST-ready, proper test structures should be incorporated in the system.

Various methods have been proposed to cope with testing of IP cores. Some of them embed the pre-computed test vectors in longer pseudorandom sequences generated on chip [], [], [2]. The major drawback of such methods is their long test application time. Thus, many techniques, using
various compression codes, have been proposed for direct test-data encoding. Golomb codes were proposed in [2], [3], [7], alternating run-length codes in [4], FDR codes in [5], [4], statistical codes in [6], [8], [], a nine-coded technique in [9], and combinations of codes in [5], [8]. An additional advantage of these approaches is that they can operate on fully compacted test sets, thus allowing further test-time reductions. Some methods use dictionaries but impose high hardware overhead due to the large embedded RAMs required.

Usually, test sets, even if they are dynamically and/or statically compacted, include large numbers of 'x' values. Traditionally, ATPG tools fill these 'x' values randomly with logic 0 or 1, so as to increase the fortuitous detection of unmodeled faults. Compression methods, on the contrary, in order to achieve high compression ratios, replace these 'x' values with the logic values 0 and/or 1, depending on the characteristics of the implemented code. Therefore, compression methods may adversely affect the coverage of unmodeled faults. The authors of [9] suggest that, if possible, at least a portion of a test set's 'x' values should be set randomly. This is why in [] and [2] LFSRs are used for generating whole clusters of test data.

Another common problem of many compression methods is their inability to exploit the test-application-time advantages that a core with multiple scan chains offers. In other words, if parallel decompression is not possible, a serial-in parallel-out register must be used for spreading the decoded data in the scan chains and, as a result, no test-time savings due to the multiple scan chains are possible. The solution of using more than one decoder is too expensive and thus inapplicable. For that reason, in this paper we propose a test-data compression scheme that can generate whole clusters of decoded data in parallel. It will be shown that the proposed scheme manages to exploit most of the offered parallelism with low hardware overhead (comparable to that of
single-scan-chain schemes). The test data are compressed using the recently proposed and very effective multilevel Huffman encoding method of []. Thus, the proposed approach offers high compression ratios as well as increased probability of detection of unmodeled faults, since most of the test sets' 'x' bits are replaced by pseudorandom data.

The paper is organized as follows: since it is not straightforward, in Section 2 we explain how the multilevel Huffman approach can be applied to multiple-scan-chain cores. The required structures are also introduced. In Section 3 the proposed decompressor is discussed in detail, while in Section 4 the proposed technique is evaluated with experimental results and comparisons. Section 5 concludes the paper.

2 Compression Method

Consider a core with $N_{sc}$ balanced scan chains of $W_{sc}$ scan cells each, as shown in Figure 1.

[Figure 1: Scan chains, clusters and blocks]

Each test cube (test vector with 'x' values) is partitioned into $W_{sc}$ consecutive slices of $N_{sc}$ bits, according to the scan-chain structure of the core. In other words, a test slice consists of the test bits contained in the scan cells of a vertical cross-section of the scan chains. $W_{sc}$ scan cycles are required for loading the scan chains. In case of a not perfectly balanced scan structure (scan chains are not equally sized), the short test slices are padded with 'x' values. Each test slice is partitioned into clusters of size $CS$. If $N_{sc}$ is not divided exactly by $CS$, then the last cluster of all slices is shorter than the others. In other words, each test cube is partitioned into $W_{sc} \cdot \lceil N_{sc}/CS \rceil$ test clusters.

The proposed encoding scheme is based on pseudorandom bit generation and multilevel Huffman coding. According to the multilevel Huffman approach, the same Huffman code is used for encoding different sets of information (three in our case, as will be explained later). As pseudorandom generator we use a small LFSR and a phase shifter, which can produce pseudorandom clusters of size $CS$. The phase shifter is initially designed as proposed in [6]. However, since we want to be able to choose among different sequences of pseudorandom clusters for the same time period, we add an extra input to each XOR tree [9]. This extra input is driven, through a multiplexer, by various cells of the LFSR. For every different cell, a different sequence of pseudorandom clusters is generated at the outputs of the phase shifter. The pseudorandom generator is shown in Figure 2.

[Figure 2: Pseudorandom Generator. Blocks: LFSR, multiplexer controlled by the Cell Selection Address, and Phase Shifter with its XOR trees]

At first, the LFSR is set to a random initial state and is left to evolve for a number of cycles equal to the total number of test clusters. Then all the test clusters are compared against the corresponding pseudorandom clusters of the cluster sequences generated when each LFSR cell's output is fed to the phase shifter's extra input. If a test cluster is compatible with a pseudorandom cluster belonging to the sequence of cell i, a hit for cell i occurs. A designer-defined number of LFSR cells with the largest hit ratios
are selected in order to feed the extra input of the phase shifter. The multiplexer selection address of each chosen cell is encoded using Huffman coding, i.e., each Huffman codeword is used for enabling an LFSR cell to drive the extra input of the phase shifter. We call this type of encoding Cell encoding. Since many test clusters have a large number of 'x' values, they are compatible with pseudorandom clusters generated when different LFSR cells feed the phase shifter's extra input. The proposed method associates each cluster with the cell that skews the cell-occurrence probabilities the most. Test clusters that are not compatible with any pseudorandom cluster are labeled as failed and a single Huffman codeword is used for distinguishing them from the rest. We note that both the normal and inverted outputs of the LFSR cells are considered during the aforementioned cell-selection procedure.

If consecutive test clusters can be generated by using the same LFSR cell, we encode them with only one codeword, which succeeds the Cell-encoding codeword and indicates the number of consecutive clusters (cluster-group length) that will be generated. In order to keep the cost low, the available lengths are chosen from a predetermined list of distinct lengths (group-length quantization). These distinct lengths are equal to the powers of 2 in the interval [1, max_length), where max_length is the maximum number of consecutive test clusters which are compatible with pseudorandom ones. We call this type of Huffman encoding Length encoding. A cluster group with a length not included in the list is partitioned into smaller groups of proper length. A Cell-encoding codeword is always followed by a Length-encoding codeword, if the encoded cluster is not a failed one.

All failed clusters are partitioned into blocks of size $BS$, and the blocks with the highest probabilities of occurrence are encoded using a selective Huffman code as proposed in [8]. We call this encoding Block encoding. Some blocks remain unencoded (failed blocks) and are embedded in the
compressed test set. As in the case of failed clusters, a single Huffman codeword is associated with each failed block, contrary to [8] where an extra bit is used in front of all codewords.

The advantage of the proposed compression method is that the same Huffman decompressor can be used for implementing the three different decodings (Cell, Length and Block). Note that one codeword is used for each cell. As the same codewords are used for all three encodings, the number of selected cells is equal to the number of list lengths in Length encoding and to the number of unique blocks encoded by Block encoding. The Huffman tree is constructed by summing the corresponding occurrence probabilities of the three cases, so that a single Huffman code covering all three of them is generated. Thus the same codeword, depending on the mode of the decoding process, corresponds to three different kinds of information: an LFSR cell (normal and/or inverted), a cluster-group length, or a block of data.

The first codeword in the encoded-data stream is always considered a Cell codeword. If it does not indicate a failed cluster, then the next codeword determines the length of the cluster group. If, on the other hand, it corresponds to a failed cluster, the next $CS/BS$ codewords are processed as Block codewords. Each of the Block codewords may indicate a failed block or a Huffman-encoded block. In the former case the actual block of data is embedded in the encoded-data stream, while in the latter case the block of data is generated by the decompressor. This sequence is iteratively repeated, always starting from a Cell-encoding codeword.

Example. Consider a core with 48 scan chains and a test set of 768 bits. For its encoding we use 4 LFSR cells and thus 4 cluster-group-list lengths and 4 encodable data blocks. Let each cluster be 24 bits wide (2 clusters per slice) and each block 4 bits wide (6 blocks per cluster). Figure 3 presents the selected cells, the available list lengths and the most frequently occurring data blocks, sorted in descending order according to
their occurrence frequency. Each line of the table (i.e., the respective case for all three encodings) corresponds to a single codeword in the final encoded stream. Note that there are 13 groups of clusters matched by LFSR-cell sequences and 3 failed clusters, which are partitioned into 18 blocks. Overall, there are 47 occurrences of encodable data and 5 unique codewords that will be used for encoding them. The occurrence volumes in each line of the table are summed and divided by the total number of occurrences (47), generating the probability of occurrence of each distinct codeword, as shown in Figure 3.

The encoded stream in Figure 3 corresponds to the data stored in the ATE. Initially ($t_0$) the first slice is undefined. The first codeword corresponds to cell A, and the next codeword indicates the group length, which is 2 according to the table. Therefore the first slice is filled by using cell A (Cluster 1 in $t_1$ and Cluster 2 in $t_2$). The next codeword indicates that the next cluster is a failed one. According to the proposed compression scheme, this cluster is partitioned into 6 blocks. The next codeword indicates that the first block is a failed one as well; therefore the actual data are not encoded and follow the codeword in the code stream. The codeword for the second block corresponds to an encoded block. This is repeated until all 6 blocks have been processed. The size of the encoded data is 111 bits.

Figure 3: Proposed Encoding Example

Cell encoding      Length encoding       Block encoding       Probability of
Cell    Occur.     Length   Occur.       Block       Occur.   the codeword
A       6          1        7            data block  8        (6+7+8)/47
B       4          2        3            Fail        4        (4+3+4)/47
Fail    3          4        2            data block  3        (3+2+3)/47
C       2          8        1            data block  2        (2+1+2)/47
D       1          -        -            data block  1        (1+1)/47
SUM     16                  13                       18       Total sum = 47

[Figure 3 also shows the Huffman tree with leaves c1 to c5, the encoded stream, and the filling of Slice 1 and Slice 2 over the time steps $t_0$ to $t_4$.]

3 Decompression Architecture

The block diagram of the proposed decompression architecture is shown in Figure 4. The Input Buffer receives the encoded data from the ATE channels (ATE_DATA) with the frequency of the ATE clock (ATE_CLK), shifts them into the Huffman FSM unit with the
frequency of the system clock, and activates the signal Empty to notify the ATE to send the next test data. When the Huffman FSM recognizes a codeword, it informs the Input Buffer to stop sending test data and, assuming that the implemented code consists of N codewords, it places on the bus CodeIndex a binary value between 0 and N-1, which is used as the Cell Selection Address (Figure 2). It also asserts Valid Code. Units Block and Cluster Group Length, which are combinational blocks (or lookup tables), return respectively the encoded block and the group length that correspond to CodeIndex. As shown in Section 2, a failed cluster is partitioned into blocks and each block is either Huffman encoded or not encoded (embedded as is into the compressed data). These two cases are distinguished by the signal Select Huffman/ATE and the selected data are driven through multiplexing logic to the Source Select unit. This unit receives pseudorandom clusters and blocks of failed clusters and, depending on the Select Cluster/Block signal, it constructs the slice that will enter the scan chains.

[Figure 4: Decompression Architecture. Units: Input Buffer, Huffman FSM, Controller, R register, Block, Cluster Group Length, Pseudorandom Generator and Source Select. Signals: ATE_DATA, ATE_CLK, ATE_SYNC, Empty, Sync, Valid Code, CodeIndex, Cell Selection Address, Select Huffman/ATE, Select Cluster/Block, LFSR_en and R_en.]

The controller is a finite state machine which synchronizes the operation of all units. Its state diagram is shown in Figure 5 (the most important signals are presented). Initially the controller waits for the first codeword to be received (Valid Code asserted). If this codeword indicates a non-failed cluster (this is determined by the value of bus CodeIndex), the controller asserts R_en so as to store the cell address in the R register and proceeds to the _LENGTH state. Otherwise, it reaches the _FAILED_CLUSTER state. At the _LENGTH state the controller waits for the next codeword and upon reception stores the data returned by the Cluster Group Length unit
(length of the group) and sets Select Cluster/Block to Cluster, so that pseudorandom data enter the Source Select unit. Then it proceeds to the LOAD_CLUSTER_GROUP state, where it remains for a number of clock cycles equal to the length of the group. During these cycles the LFSR evolves (LFSR_en asserted) and the produced pseudorandom clusters are loaded into the Source Select unit. Every time a whole test slice is ready, it is loaded into the scan chains. After the end of these cycles the state machine returns to the _CHANNEL state for the next iteration.

In the _FAILED_CLUSTER state the controller waits for the next codeword. If this codeword corresponds to an encoded block, the controller sets Select Huffman/ATE to Huffman and Select Cluster/Block to Block, in order to drive the output of the Block unit (which is the decoded data block) into the Source Select unit, and proceeds to the CLUSTER_DONE? state. On the other hand, if the received codeword corresponds to a failed block, the controller proceeds to the _FAILED_BLOCK state, sets Select Huffman/ATE to ATE and Select Cluster/Block to Block, and asserts Sync to enable the ATE to send the data block. Then it samples ATE_CLK and, when the data from the ATE are available, they are driven directly to the Source Select unit (the Input Buffer is bypassed). From the _FAILED_BLOCK state the controller proceeds to the CLUSTER_DONE? state. If all blocks of the failed cluster have been handled, the LFSR is left to evolve for one clock cycle (LFSR_en asserted) and the next state is _CHANNEL. Otherwise the controller proceeds to the _FAILED_CLUSTER state in order to process the next block.

[Figure 5: Controller State Diagram. States: _CHANNEL, _LENGTH, LOAD_CLUSTER_GROUP, _FAILED_CLUSTER, _FAILED_BLOCK and CLUSTER_DONE?. Transitions are labeled with the inputs Valid Code, Group_Done, ATE_CLK and the outputs Sync, R_en, Select Huffman/ATE, Select Cluster/Block, LFSR_en.]

The Source Select unit is shown in Figure 6a. It receives cluster data produced by the Pseudorandom Generator (encoded clusters, Cluster bus), as well as block data either from the Block unit (Huffman-encoded blocks, Block bus) or from the ATE (failed blocks). The received data are stored in a buffer (Scan Buffer) of size equal to that of a test slice ($N_{sc}$). This buffer consists of $N_{sc}/BS$ registers of size $BS$ each, grouped into $k = N_{sc}/CS$ groups of $w = CS/BS$ registers. All Buffer Groups are loaded in a round-robin fashion (Buffer Group i is loaded after Buffer Group i-1). When Select Cluster/Block selects the Cluster bus (of width $CS$), that bus loads, through the Mux unit, all registers of a group simultaneously (in a single clock cycle), while when it selects the Block bus (of width $BS$), that bus is driven to every register (w clock cycles are needed for loading a whole group). This operation is handled by the controller through the use of w enable signals $en_{iw}, \ldots, en_{(i+1)w-1}$, one for each register in the group (Buffer Group i is shown in
Figure 6b). In total, $k \cdot w$ enable signals are generated for the whole Scan Buffer. In order for a cluster to be loaded into Buffer Group i by the Pseudorandom Generator, all w enable signals of this group are activated. When a failed cluster is loaded into Buffer Group i, the registers of group i are enabled one after the other, until all the blocks of the failed cluster are loaded into the corresponding registers (the enable signals are one-hot encoded). When the whole Scan Buffer is full, the scan chains are loaded. The Scan Buffer can be avoided if the core is equipped with a separate scan enable or clock signal for each scan chain. Then the scan chains can be loaded directly, without the intervention of the buffer, using the enable signals for driving the scan enables or for gating the clock of each scan chain.

[Figure 6: Source Select unit. (a) Mux unit and Scan Buffer with Buffer Groups 0 to k-1 feeding the scan chains; (b) Buffer Group i with registers $reg_{iw}$ to $reg_{(i+1)w-1}$ and enable signals $en_{iw}$ to $en_{(i+1)w-1}$.]

As will be shown, the encoding efficiency of the proposed method depends mainly on the number of selected cells, which determine the number of codewords of the Huffman code. The same decompressor can be used for two or more cores by changing only the units Block and Cluster Group Length, as well as the multiplexer in the Pseudorandom Generator, which occupy only a small portion of the total area. Moreover, if the Block and Cluster Group Length units are implemented as lookup tables, they need to be loaded with the specific data of each core only at the beginning of the test session. Therefore, the decompressor can be easily reused for different cores with almost zero area penalty. In [] it was shown that the compression ratio reduction in the case of utilizing the same decompressor for multiple cores, due to the use of the same codewords, is only marginal. This is easily
explained if we take into account that, for the same number of cells (same number of Huffman codewords) and relatively skewed frequencies of occurrence, the Huffman trees are not much different and thus the encoding, if not optimal, will be very close to the optimal one. Note that, regardless of the fact that the same Huffman FSM unit is utilized, the selected cells, list lengths, encoded blocks, cluster size and block size do not have to be the same for different cores.

Let us now calculate the test application time. Suppose that $|D|$ and $|E|$ are the sizes in bits of the uncompressed and compressed test sets respectively. The compression ratio is given by the formula $CR = (|D| - |E|)/|D|$. Let $f_{ATE}$, $f_{SYS}$ be the ATE and system clock frequencies respectively, with $f_{SYS} = m \cdot f_{ATE}$, and $N_{ch}$ be the number of channels available for downloading the test data from the ATE. Also, let $G_i$ be the number of occurrences of cluster groups with length $L_i$, and $F_c$, $F_b$ be the numbers of failed clusters and failed blocks respectively. The test application time of the uncompressed test set is $t_D = |D|/(N_{ch} \cdot f_{ATE})$ and the reduction is given by the formula $t_{red} = (t_D - t_E)/t_D$, where $t_E$ is the test application time of the compressed test set. $t_E$ consists of four parts:

$t_1$: The time required for downloading the data (codewords and failed blocks) from the ATE to the core: $t_1 = |E|/(N_{ch} \cdot f_{ATE})$.

$t_2$: The time for the serialization of codewords by the Input Buffer (note that the compressed test set also contains failed blocks, which do not require serialization): $t_2 = (|E| - F_b \cdot BS)/f_{SYS}$.

$t_3$: The time required for loading the scan chains with pseudorandom sequences of length equal to the number of the decoded cluster groups (each cluster requires a single clock cycle so as to be loaded): $t_3 = \sum_i G_i L_i / f_{SYS}$.

$t_4$: The time required for loading the scan chains with failed clusters. Each cluster is partitioned into $CS/BS$ blocks and a single clock cycle is required for loading each block (either from the Block unit or the ATE). We note that the time required for downloading the failed blocks from the ATE has been taken into account in $t_1$. Thus: $t_4 = F_c \cdot CS/(BS \cdot f_{SYS})$.

The total time required for the compressed test set is $t_E = t_1 + t_2 + t_3 + t_4$ and it can be easily proven that

$$t_{red} = CR - \frac{N_{ch}}{|D| \cdot m} \left( |E| + F_c \cdot \frac{CS}{BS} - F_b \cdot BS + \sum_i G_i L_i \right)$$

4 Evaluation and comparisons

Table 1: Compression Results
For each number of scan chains, the three columns correspond to 8, 16 and 24 cells.

                  2 scan chains        4 scan chains        8 scan chains        scan chains
Core    Mintest   8      16     24     8      16     24     8      16     24     8      16     24     Red (%)
s5378   23754     9597   942    9247   972    947    926    973    9427   9338   29     9697   952    6,
s9234   39273     6595   656    5787   6746   62     5722   795    633    586    6995   6358   5923   6,
s13207  165200    2865   2258   94     2363   8973   8543   239    884    838    952    8593   853    89,
s15850  76986     2844   243    963    275    9754   9326   2687   9763   933    292    262    9329   74,9
s38417  164736    65372  63725  62227  63569  6585   5978   6342   6593   5926   6384   64     5876   64,4
s38584  199104    6277   689    5975   62449  59699  578    62989  6776   5838   6242   6584   5858   7,

The proposed compression method was implemented in the C programming language. We conducted our experiments on a Pentium PC for the largest ISCAS '89 benchmark circuits, using the dynamically compacted test sets generated by
the Mintest ATPG program [7]. The same test sets were also used in [2]-[6], [8], [13], [14], [17]-[19]. The run time of the compression method is a few seconds for each benchmark circuit. In the experiments that we will present, a primitive-polynomial LFSR with internal XOR gates and size 2 was used. Each XOR tree of the phase shifter consists of 3 gates. Block size ($BS$) was considered equal to $N_{ch}$ and ranged from 5 to , while cluster-size values ($CS$) ranged from 2 to 5. Note that normal and inverted LFSR-cell outputs can be selected as different cells.

In Table 1 the compression results of the proposed method for 2, 4, 8 and scan chains, and 8, 16 and 24 cells are presented. For each cell-volume case, various cluster and block sizes were examined; among them the best results are reported. Column 2 shows the sizes of the original Mintest test sets. It is obvious that the compression improves as the number of cells increases. The last column presents the reduction achieved by the best results of the proposed method (boldfaced) over Mintest.

Table 2: Compression improvement (%) vs. other methods

Circuit  [2]   [3]   [4]   [5]   [6]   [8]   [13]  [14]  [17]  [18]  [19]
s5378    -     -     29    25    93    33    35    9     367   58    2
s9234    293   3     273   29    24    26    478   26    343   236   5
s13207   564   483   444   42    334   522   35    395   522   372   258
s15850   526   368   266   257   28    262   232   26    383   232   27
s38417   362   356   96    372   236   3     3     96    2     5     4
s38584   445   357   253   257   23    9     -2    27    33    228   8

In Table 2 we present the comparisons of the proposed method against other compression techniques in the literature that have reported results for the Mintest test sets. It can be seen that the proposed approach performs better than all the other methods, except for the case of s38584 of [13], which provides a marginally better result. Compared to the single-scan-chain Multilevel Huffman approach of [], the compression results are similar and therefore are not included. We note that no comparisons are provided against the approach of [2], which also exploits LFSR-generated pseudorandom sequences, since its ATPG-synergy requirement renders it
unsuitable for IP cores of unknown structure.

Figure 7. TAT reduction

To assess the test application time (TAT) improvements of our method, we performed two sets of experiments for the boldfaced cases of Table 1. In the first one we study the reduction of the test application time achieved against the case in which the test set is downloaded uncompacted (UNC) to the core, using the same number of channels. Figure 7 presents the average (UNC:AVG), minimum (UNC:MIN) and maximum (UNC:MAX) improvement for various values of $m = f_{SYS}/f_{ATE}$ for all benchmarks. It is obvious that as $m$ increases, the test-time gain becomes greater.

In the second set of experiments we compare the test application time of the proposed method against the single-scan-chain Multilevel Huffman approach of [11]. Since [11] considers only one channel for downloading data from the ATE, we recalculated its test application time for the channel volumes used in this paper (an input buffer is appended in [11] too). The best results of the proposed method and [11] have been compared, and the average ([11]:AVG), minimum ([11]:MIN) and maximum ([11]:MAX) improvement for various values
of $m$ for all benchmarks, are shown in Figure 7. It is obvious that the test application time gain is very high in all cases (4%-86%). However, although the test-time gain attributed to the parallel loading of multiple scan chains is constant, the serialization of the decoder input data is carried out faster as $m$ increases, and thus the test-time reduction drops.

For calculating the hardware overhead of the proposed technique, we synthesized three different decompressors using Leonardo Spectrum (Mentor tools) for 8, 6 and 24 cells, assuming ATE channels, 4 scan chains, = 2 bits and = bits. The and Cluster Group Length units were implemented as combinational circuits. The resulting area overhead is 377, 473 and 582 gate equivalents respectively (a gate equivalent corresponds to a 2-input NAND gate). In this overhead we have not considered the Scan Buffer, which can be avoided and is not considered in the other methods either. The hardware overhead, in gate equivalents, for the most efficient methods in the literature is: 46 for [9], 32 for [4], 36-296 for [6], 25-37 for [2] (as reported in [6]) and 23-432 for [], while the hardware overhead of [8], although not reported directly, is greater than that of [6]. As can be seen, the hardware overhead of the proposed decompressor is comparable to that of the other techniques, even though none of them exploits the advantages of multiple scan chains (i.e., they perform serial decoding, which is a simpler and less hardware-intensive case).

The hardware overhead of the proposed method can be reduced if the same decompressor is used for testing, one after the other, several cores of a chip. Units Huffman FSM, Controller, R, Source Select, as well as the LFSR and the phase shifter, need to be implemented only once on the chip. On the other hand, units, Cluster Group Length and the multiplexer of the Pseudorandom Generator must be implemented for every core under test. The area occupied by the latter units is equal to 77%, 4% and 97% of the total area of the decompressor for 8, 6 and 24 cells
respectively. Therefore, only a small amount of hardware has to be added for each additional core. The use of the same Huffman FSM unit for several cores implies that the codewords, which correspond to LFSR cells, list lengths and data blocks, are the same for each core, while the actual cells, list lengths and data blocks can be different. As shown in [11], in such a case the compression ratio suffers only a marginal decrease.

5 Conclusion

A test-data compression method that can exploit the existence of multiple scan chains in a core in order to reduce the test-application time has been presented. Multilevel Huffman coding, properly adapted to the multiple-scan-chains case, is used for compressing the test data, while a low-overhead decompressor capable of generating whole clusters of test bits in parallel is also introduced. The proposed method offers reduced test-application times, high compression ratios and increased probability of detection of unmodeled faults, since most of the test sets' 'x' bits are replaced by pseudorandom values.

References

[1] K. Chakrabarty et al., Deterministic Built-In Test Pattern Generation for High-Performance Circuits Using Twisted-Ring Counters, IEEE Trans. on VLSI Systems, pp. 633-636, Oct. 2
[2] A. Chandra, K. Chakrabarty, System-on-a-Chip Test-Data Compression and Decompression Architectures Based on Golomb Codes, IEEE Trans. on CAD, vol. 2, no. 3, pp. 355-368, 2
[3] A. Chandra, K. Chakrabarty, Test Data Compression and Decompression Based on Internal Scan Chains and Golomb Coding, IEEE Trans. on CAD, vol. 2, pp. 75-72, June 2002
[4] A. Chandra, K. Chakrabarty, A Unified Approach to Reduce SOC Test Data Volume, Scan Power and Testing Time, IEEE Trans. on CAD, vol. 22, no. 3, pp. 352-363, 2003
[5] A. Chandra, K. Chakrabarty, Test Data Compression and Test Resource Partitioning for System-On-A-Chip Using Frequency-Directed Run-Length (FDR) Codes, IEEE Trans. on Computers, vol. 52, no. 8, pp. 76-88, 2003
[6] P. T. Gonciari, B. M. Al-Hashimi, N. Nicolici, Variable-Length Input Huffman Coding for System-On-A-Chip Test, IEEE Trans. on
CAD, vol. 22, no. 6, pp. 783-796, 2003
[7] I. Hamzaoglu, J. H. Patel, Test Set Compaction Algorithms for Combinational Circuits, IEEE Trans. on CAD, vol. 9, no. 8, pp. 957-963
[8] A. Jas, J. Ghosh-Dastidar, M. E. Ng and N. A. Touba, An Efficient Test Vector Compression Scheme Using Selective Huffman Coding, IEEE Trans. on CAD, vol. 22, no. 6, pp. 797-86, 2003
[9] E. Kalligeros et al., Multiphase BIST: A New Reseeding Technique for High Test-Data Compression, IEEE Trans. on CAD, vol. 23, no. , pp. 429-446
[10] D. Kaseridis et al., An Efficient Test Set Embedding Scheme with Reduced Test Data Storage and Test Sequence Length Requirements for Scan-based Testing, Inf. Papers Dig. of IEEE ETS, pp. 47-5, 2005
[11] X. Kavousianos et al., Efficient Test-Data Compression for IP Cores Using Multilevel Huffman Coding, DATE 06, to appear
[12] L. Lei, K. Chakrabarty, Test Set Embedding for Deterministic BIST Using A Reconfigurable Interconnection Network, IEEE Trans. on CAD, vol. 23, pp. 289-35, Dec. 2004
[13] Lei Li et al., Efficient space/time compression to reduce test data volume and testing time for IP cores, Proc. of Int. Conf. on VLSI Design, pp. 53-58, 2005
[14] A. El-Maleh, R. Al-Abaji, Extended Frequency-Directed Run-Length Code with Improved Application to System-On-A-Chip Test Data Compression, Proc. of IEEE ICECS, vol. 2, pp. 449-452, 2002
[15] M. Nourani, M. H. Tehranipour, RL-Huffman Encoding for Test Compression and Power Reduction in Scan Applications, ACM Trans. on Design Automation of Electronic Systems, vol. , no. , pp. 9-5, 2004
[16] J. Rajski, N. Tamarapalli, and J. Tyszer, Automated synthesis of phase shifters for built-in self-test applications, IEEE Trans. Computer-Aided Design, vol. 9, pp. 75-88, Oct. 2
[17] P. Rosinger et al., Simultaneous Reduction in Volume of Test Data and Power Dissipation for Systems-On-A-Chip, Electr. Letters, vol. 37, no. 24, pp. 434-436, 2
[18] M. Tehranipour et al., Mixed RL-Huffman Encoding for Power Reduction and Data Compression in Scan Test, Proc. of ISCAS, vol. 2, pp. II-68-4, 2004
[19] M. Tehranipour, M. Nourani, K. Chakrabarty, Nine-Coded Compression Technique for Testing Embedded Cores in
SoCs, IEEE Trans. on VLSI Systems, vol. 3, pp. 79-73, June 2005
[20] E. H. Volkerink, A. Khoche, S. Mitra, Packet-Based Input Test Data Compression Techniques, Proc. ITC, pp. 54-63, 2