Optimal Selective Count Compatible Runlength Encoding for SOC Test Data Compression

J Electron Test (2016) 32:735–747
DOI 10.1007/s10836-016-5617-x

Harpreet Vohra (1), Amardeep Singh (2)

Received: 12 June 2016 / Accepted: 2 September 2016 / Published online: 26 October 2016
© Springer Science+Business Media New York 2016

Responsible Editor: N. A. Touba

* Harpreet Vohra, hvohra@thapar.edu
Amardeep Singh, Amardeep_dhiman@yahoo.com
(1) Electronics and Communication Engineering Department, Thapar University, Patiala, India
(2) Computer Science and Engineering Department, Punjabi University, Patiala, Punjab, India

Abstract Test data volume has increased multi-fold due to the need for quality assurance of the various parts of a circuit design at deep submicron level. Storing this enormous test data requires a large memory, which increases not only the cost of the ATE but also the test application time. This paper presents an optimal selective count compatible run length (OSCCPRL) encoding scheme that aims at maximum compression in order to reduce the test cost. OSCCPRL is a hybrid technique that combines the benefits of two other techniques proposed here: 10 Coded run length (10C) and Selective CCPRL (SCCPRL). These techniques improve on the 9C and CCPRL techniques. In OSCCPRL, the entire data is segmented into blocks and further compressed using inter block and intra block level merging techniques. The SCCPRL technique is used for encoding the compatible blocks, while 10C is used for encoding at sub block (half block length) level. If no compatibility is found at block or sub block level, the unique pattern is held as such in the encoded data along with the necessary categorization bits. The decompression architecture is described, and it is shown how the addition of just a few FSM states achieves better test data compression than previous schemes. The simulation results for various ISCAS benchmark circuits show that the proposed OSCCPRL technique provides an average compression efficiency of around 80 %.

Keywords Test data compression · Block merging · Compatibility · Test application time · Code-based testing

1 Introduction

The integration of billions of transistors has made it feasible to build a complete System On Chip (SOC). SOCs provide improved performance, low power, smaller size and shorter time to market by incorporating a heterogeneous mix of digital/analog logic blocks and embedded memories. Recent developments in semiconductor technology have brought in the implementation of 3D structures [21], which provide the benefit of vertical interconnects in place of long horizontal wires and allow much more complex systems to be made in less space. However, all these advancements have brought new challenges for design and test engineers [13]. Defects arising from imperfections in the manufacturing of transistors/interconnects at deep submicron level, misalignment during the stacking of dies in a 3D SOC, etc. lead to a sharp increase in the number of faults. For quality assurance against these faults, a huge number of test vectors (test data) needs to be applied. This test data can either be generated on chip (with the help of BIST or storage in available on-chip memory) or saved externally in Automatic Test Equipment (ATE) memory. The unavailability of BIST on all the embedded cores and insufficient on-chip memory (dedicated to holding the test data) make the latter approach more promising. High test volume affects the cost associated with the installation of the ATE and the application of the test data. ATE cost is dominated

by the need for high-end workstations for the generation and application of the test data. Likewise, the major test application cost parameter is the testing time, which involves the time needed to transport the test data between the ATE and the SOC periphery and to apply it to the various IP cores/circuits under test [8]. Of these, the test time spent applying test data to the various cores is handled by the test scheduling mechanism, which allows the cores to be tested concurrently. The time needed for test data transport is directly related to the bandwidth and I/O channel capacity available between the ATE and the system on chip. Due to pin limitations the available test bandwidth is small, which requires the test data to be sent serially from the ATE to the circuit under test. In order to reduce the cost associated with both test data transport and the ATE memory requirement, the test data should be compressed as much as possible.

An SOC test model is presented in Fig. 1. During test mode, the compressed data, already saved in the ATE source, is shifted to the circuit under test (IP cores) at the slow ATE clock. On-chip decompression is required to retrieve the original test data set. This test data can be applied to the various scan chains at the frequency of the circuit under test. Similarly, the resulting test responses can be compressed/compacted and taken out of the SOC to the ATE sink, where they are compared with the pre-saved golden/expected responses. The output of the comparator identifies whether the circuit under test is working correctly or not.

The existing test data compression techniques can be broadly classified into three types [26]: linear decompression based schemes, broadcast scan based schemes and code based schemes. Of these, linear decompression based schemes encode the test data by solving a series of linear equations.
The decompression is done using linear operations based on linear feedback shift registers (LFSRs) [12,25], combinational linear XOR networks [1,31], ring counters [19,20] and twisted-ring counters [33]. Broadcast scan based schemes use a single tester channel value to drive multiple internal scan chains with only combinational logic [18,28]. However, implementing these techniques needs structural changes in the design, which are normally not permissible. Therefore, test engineers have to work on compressing the test sets provided by the vendors.

A careful analysis of the test data of various ISCAS 85 and 89 benchmark circuits shows the following features: a huge number of don't care bits, continuous runs of zeroes and ones, runs of the same pattern sequence repeated multiple times, compatible or inverse compatible sequences, etc. Many researchers have exploited one or more of these features and come up with code based compression techniques. These techniques can be broadly classified into run length coding techniques and statistical coding techniques. Run-length-based coding techniques compress the test data by encoding the runs of zeroes or ones. For example, the Golomb code [2], the frequency directed run-length (FDR) code [3] and the alternating run-length (ALT-FDR) code [4] compress the data by encoding the run lengths of 0s and 1s. Similarly, blocks of equal length that show compatibility can be merged together to reduce the test data. Examples such as block merging [6], block merging with 8C [29], 9C [24], the pattern run-length (PRL) code [22], the extended frequency-directed run-length (EFDR) code [7], the multi-dimensional pattern run-length code (MDPRC) [27], the internal pattern run-length (IPR) code [14], the 2^n pattern run-length (2^n-PRL) code [5] and count compatible run length encoding (CCPRL) [32] improve the compression efficiency.
Other code based test data compression schemes are statistical codes, which encode based on the frequency of occurrence of the patterns [9–11,16,23]. Another technique, based on efficient utilization of ATE Vector Repeat, is presented in [30].

In this paper, three new techniques are proposed to increase the compression of test data. The first and second techniques improve the compression of test data by modifying the previously proposed 9C and CCPRL techniques, while the third is a hybrid technique that presents an optimal selective count compatible run length based coding (OSCCPRL). The proposed OSCCPRL technique achieves a higher compression ratio for precomputed test sets (independent of the structural details of the circuit under test), resulting in minimization of test application time and memory requirement for the test data. The decompression logic is quite simple and doesn't require on-chip memory, unlike various dictionary based techniques [15].

Fig. 1 SOC Test Model

The organization of the paper is as follows. Section 2 presents the SOC test data properties and an overview of previous techniques. Section 2.1 describes the properties of SOC test data and how different code based compression techniques use them. Section 2.2 gives an overview of the Nine Coded and Count compatible runlength encoding techniques. Section 3 presents the proposed techniques, first highlighting the shortcomings of the previous techniques that motivate the proposed work. Sections 3.1 and 3.2 describe the proposed 10C and selective CCPRL techniques; section 3.3 presents the proposed OSCCPRL encoding technique. Section 4 describes the decompression architecture of OSCCPRL. Section 5 contains simulation results for test data compression on different ISCAS circuits. Finally, section 6 draws the conclusions.

2 SOC Test Data and Compression Techniques

2.1 Test Data Properties

An electronic circuit can have a variety of manufacturing defects that result in different faults, such as stuck-at faults, bridging faults, cross point faults, transition faults, etc. According to the fault diagnosis, test vectors are calculated that may consist of three values: 0, 1 and X (don't care). This test data, being a stream of bit vectors, may contain different combinations: a continuous run of zeroes or ones, zeroes or ones sandwiched between don't care values, or runs of data bits consisting of patterns of 0s, 1s and don't cares. To improve the compression efficiency, the don't care values can be filled with either 0 or 1 [14,23].
Examples showing the effectiveness of test data compression obtained by using the test data properties are given in Table 1. As shown in case 1 of Table 1, an attempt is made to lengthen the runs of 0s/1s by filling the don't care bits with 0s or 1s respectively. Such all zeroes and all ones combinations can be compressed with fewer bits by using codes like Golomb, FDR, etc. Similarly, two or more blocks can be merged to form a representative (retained) block if they are compatible or inverse compatible with each other. In some cases, the retained pattern itself may hold combinations like 0U, 1U, U0, U1, UU (compatibility) or UŪ (inverse compatibility) at sub block level (as shown in cases 2 and 3). Encoding at sub block level can be done with a much reduced number of bits by using codes like 9C, BM, BM-8C, etc. Leftover don't care bits may either be filled as per minimum transition fill, to avoid unnecessary switching in the scan-in chains, or left intact, as they may sometimes help find additional faults.

2.2 Overview of Nine Coded and Count Compatible Runlength Encoding Techniques

The Nine (9) Coded and Count compatible runlength encoding techniques are test independent techniques that fill the don't care (X) bits with appropriate values to minimize the test data volume. The various test data properties are then encoded using fewer bits. A brief summary of these techniques is given below.

2.2.1 Nine (9) Coded Compression Technique

The 9C (fixed) technique [24] partitions the complete test set into blocks of fixed length. Each block is partitioned into two halves, which are then checked for different properties based on their bit pattern. For example, the data in the blocks can have one of the following combinations: 00/11/01/10/0U/1U/U0/U1 and UU (complete block being unique).
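The half-block classification just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the fill order (try all zeroes first, then all ones) is an assumption made here for simplicity.

```python
def fits(bits, value):
    """True if every symbol in `bits` is `value` or a don't-care 'X'."""
    return all(c in (value, "X") for c in bits)


def classify_9c(block):
    """Return the 9C symbol (00, 11, 01, 10, 0U, 1U, U0, U1 or UU) of a block.

    Assumption: don't-care bits are filled greedily, preferring an
    all-zero fill, then an all-one fill, for the block and for each half.
    """
    if len(block) % 2:
        raise ValueError("this sketch expects an even block length")
    # Prefer the two shortest 9C codewords when the whole block allows it.
    if fits(block, "0"):
        return "00"
    if fits(block, "1"):
        return "11"
    half = len(block) // 2

    def kind(bits):
        if fits(bits, "0"):
            return "0"
        if fits(bits, "1"):
            return "1"
        return "U"  # half block must be sent as-is

    return kind(block[:half]) + kind(block[half:])


# Blocks taken from Table 1 (K = 12).
for blk in ("0000XX0XX000", "1X11X1X11XX1", "10111X000X00", "111X11101011"):
    print(blk, classify_9c(blk))
```

A real encoder would pick the fill that minimizes the total codeword length rather than use a fixed preference order.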
Similarly, in variable length 9 Coding [24], the best block length is chosen by running multiple investigations on the test data with different block length values so that maximum compression can be achieved. In other words, instead of finding a global best block length as in fixed 9C, local best block lengths are found and encoding is done accordingly. However, changing the block length for different pattern sequences necessitates either sending the value of the block length each time or maintaining an associated dictionary of block lengths so that the data can be accurately decoded on chip. All this makes the decoder design a little more complex.

2.2.2 Count Compatible Pattern Run Length Encoding Scheme

The CCPRL code works by selecting a pattern block and comparing it with its consecutive blocks for compatibility. The don't care bits are filled appropriately to make the two blocks compatible so as to form a merged pattern. This merged pattern is compared with the succeeding pattern blocks until no further merging can be done. The encoded stream produced consists of a block code (binary conversion of the length of the pattern block), a pattern code (the merged/retained block sequence), a count code (the number of matches found between the retained pattern and the adjacent blocks) and a compatible code (relation bits that signify the compatible/inverse compatible relation between the retained and successive blocks) [32].
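The block merging step described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the published CCPRL encoder: the helper names are hypothetical, and the relation bit convention (0 for compatible, 1 for inverse compatible) is borrowed from the SCCPRL description later in the paper.

```python
def merge(a, b):
    """Merge two equal-length blocks over {0, 1, X}; None if they conflict."""
    out = []
    for x, y in zip(a, b):
        if x == "X":
            out.append(y)
        elif y == "X" or x == y:
            out.append(x)
        else:
            return None  # a 0 faces a 1: the blocks are not compatible
    return "".join(out)


def invert(block):
    """Complement the 0/1 bits; don't-care bits stay 'X'."""
    return block.translate(str.maketrans("01", "10"))


def merge_run(blocks):
    """Merge blocks[0] with as many following blocks as possible.

    Returns (retained_pattern, relation_bits). Assumed convention:
    relation bit '0' = compatible, '1' = inverse compatible.
    """
    retained, relations = blocks[0], []
    for nxt in blocks[1:]:
        merged, bit = merge(retained, nxt), "0"
        if merged is None:
            merged, bit = merge(retained, invert(nxt)), "1"
        if merged is None:
            break  # first block that fits neither way ends the run
        retained = merged
        relations.append(bit)
    return retained, "".join(relations)


# Two block pairs from case 2 of Table 1.
print(merge_run(["10XX0111XX00", "XXXX0111X000"]))  # compatible pair
print(merge_run(["1010101010XX", "010101010X01"]))  # inverse compatible pair
```

The count code is then simply the length of the returned relation string.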

Table 1 Examples of test data properties and don't care bits utilization (pattern blocks with K = 12)

Case | Description | Pattern blocks | Action taken | Codeword at block level | Codeword at sub block level
1 | Runs of 0s and Xs / runs of 1s and Xs | 0000XX0XX000 | All Xs changed to 0 | All zeroes | 00
1 | | 1X11X1X11XX1 | All Xs changed to 1 | All ones | 11
2 | Compatible/inverse compatible patterns at block and sub block levels | 10XX0111XX00, XXXX0111X000 | X at tenth bit position of first block changed to 0; Xs at first and second bit positions of second block changed to 1 and 0 | 10XX0111X000 retained pattern block (compatibility at block) | No compatibility at sub block level (unique, UU)
2 | | 1010101010XX, 010101010X01 | Xs at eleventh and twelfth bit positions of first pattern changed to 1 and 0 respectively; X at tenth position in second block changed to 0 | 101010101010 retained pattern block (Inv_Compatibility at block) | 101010 retained pattern sub block (compatibility at sub block)
2 | | 00X00011XXX1, XXXXXX111111 | Xs at third, tenth and eleventh bit positions of first pattern changed to 0, 1 and 1; Xs at first to sixth positions in second block changed to 0 | 000000111111 retained pattern block (Inv_Compatibility at block) | 01: all zeroes sub block followed by all ones sub block
3 | Combinations of zeroes, ones or unique pattern | 00000010101 | 000000 followed by unique 101010 | 00000010101 (retained unique pattern block) | 0U: all zeroes sub block followed by unique sub block
3 | | 10111X000X00 | X at tenth bit position of second block changed to 0 to make block 10111X followed by 000000 | 10111X000X00 (retained unique pattern block) | U0: unique sub block followed by all zeroes sub block
3 | | 111X11101011 | X at fourth bit position of first block changed to 1 to make 111111 followed by 101011 | 111X11101011 (retained unique pattern block) | 1U: all ones sub block followed by unique sub block

3 Proposed Test Compression Techniques

The objective of an optimal test data compression technique is to encode the test data to the minimum length, thereby improving the compression efficiency. Reduced test data in turn minimizes the ATE memory requirement. At the same time, the encoding should not increase the on-chip decoder area overhead beyond acceptable limits. Three approaches are presented here. Two of them, 10 Coded compression and Selective Count Compatible Run Length encoding (SCCPRL), address the shortcomings (as discussed in section 2) of 9C and CCPRL respectively, while the third, the Optimal selective count compatible run length encoding scheme (OSCCPRL), is a hybrid technique that merges the benefits of the two and improves the compression further.

3.1 Ten Coded Compression Technique

Fixed and variable length 9C [24] techniques offer good compression efficiencies, but the lack of compatibility checks between adjacent blocks limits the compression efficiency they can attain. In other words, better compression could be achieved if the compatibility between different blocks were also checked. The 10 coded (10C) compression technique addresses this limitation of the 9C compression technique by the addition of prefix and sub-prefix bits.
In this technique, the test data, after being segmented into equal length blocks, is encoded on the basis of ten different intra level cases that can broadly be classified as eight special cases (00/11/01/10/0U/1U/U0/U1) and three unique cases (UU/UŪ/UV). Among the unique cases, both the U and V notations represent unique bits. The UU and UŪ notations represent cases in which the sub blocks carrying unique bits hold a compatible/inverse compatible relation with respect to each other. The UV notation signifies that the two sub blocks carry unique bits and are incompatible. The encoding scheme is presented in Table 2.

Table 2 Encoding scheme of 10 coded compression technique

Input Block (K = 8) | Symbol | Prefix | Sub-Prefix | Tail | Decoder Input for 10C | No. of bits in Code word (10C) | Decoder Input for 9C [34] | No. of bits in Code word (9C)
0000_0000 | 00 | 0 | NA | 000 | 0000 | 4 | 0 | 1
1111_1111 | 11 | 0 | NA | 111 | 0111 | 4 | 10 | 2
0000_1111 | 01 | 0 | NA | 001 | 0001 | 4 | 11000 | 5
1111_0000 | 10 | 0 | NA | 010 | 0010 | 4 | 11001 | 5
0000_UUUU | 0U | 0 | NA | 011 | 0011_UUUU | 8 | 11010_UUUU | 9
1111_UUUU | 1U | 0 | NA | 100 | 0100_UUUU | 8 | 11011_UUUU | 9
UUUU_0000 | U0 | 0 | NA | 101 | 0101_UUUU | 8 | 11100_UUUU | 9
UUUU_1111 | U1 | 0 | NA | 110 | 0110_UUUU | 8 | 11101_UUUU | 9
UUUU_UUUU | UU | 1 | 1 | 0 | 110_UUUU | 7 | 1111_UUUU_UUUU | 12
UUUU_ŪŪŪŪ | UŪ | 1 | 1 | 1 | 111_UUUU | 7 | 1111_UUUU_ŪŪŪŪ | 12
UUUU_VVVV | UV | 1 | 0 | NA | 10_UUUU_VVVV | 10 | 1111_UUUU_VVVV | 12

As shown in column 3 of Table 2, the prefix bit is set to 0 to specify the occurrence of one of the eight special cases, or to 1 to signify the unique cases. Among the unique cases, the sub-prefix bit is set to 1 if the sub blocks hold compatibility; otherwise it is set to 0. Comparing the total number of bits in the encoded codewords for the 10C and 9C techniques (given in columns 7 and 9 respectively), 10C offers an advantage of 1 bit for the 01, 10, U0, U1, 0U and 1U cases (this being true for all block lengths except block length = 10 bits, where both 10C and 9C will give the same codeword length), 1 + (block length/2) bits for the UU and UŪ cases, and 2 bits for the UV case. At the same time, 10C carries a penalty of 3 extra bits for the all zeroes case and 2 extra bits for the all ones case.
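The codeword lengths in Table 2 follow a simple pattern, which the sketch below reproduces for an arbitrary even block length K. It is an illustrative helper only: the function names are hypothetical, and the label "UI" is used here as an ASCII stand-in for the inverse compatible case UŪ.

```python
FULL = ("00", "11", "01", "10")          # both halves are runs
HALF_UNIQUE = ("0U", "1U", "U0", "U1")   # one half must be sent as-is


def bits_9c(case, k):
    """Codeword length of a 9C-encoded block of length k (per Table 2)."""
    if case in FULL:
        return {"00": 1, "11": 2, "01": 5, "10": 5}[case]
    if case in HALF_UNIQUE:
        return 5 + k // 2          # 5-bit code + one unique half
    return 4 + k                   # '1111' + the whole block (UU, UI, UV)


def bits_10c(case, k):
    """Codeword length of a 10C-encoded block of length k (per Table 2)."""
    if case in FULL:
        return 4                   # prefix 0 + 3-bit tail
    if case in HALF_UNIQUE:
        return 4 + k // 2          # prefix 0 + tail + one unique half
    if case in ("UU", "UI"):
        return 3 + k // 2          # prefix 1, sub-prefix 1, tail bit, one half
    return 2 + k                   # UV: prefix 1, sub-prefix 0, whole block


# The six cases of the K = 10 example in Table 3.
cases = ["01", "U0", "10", "00", "UU", "UI"]
print(sum(bits_9c(c, 10) for c in cases))   # 49
print(sum(bits_10c(c, 10) for c in cases))  # 37
```

The totals agree with the 49 and 37 bits reported for that example, which gives some confidence that the length rules were read off Table 2 correctly.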
Based on the statistics of the frequency of occurrence of different combinations of test patterns, the penalty of the additional 3 and 2 bits in the all zeroes and all ones cases respectively can be offset if the unique cases are more numerous. For example, the random sequence 00XX0111X110101X00X0111X1X00X000XX000XX010101101011010101010, when segmented into blocks of length 10 bits, results in the patterns given in column 2 of Table 3. Encoding the data blocks with the 9C and 10C run length techniques results in 49 and 37 bits respectively. Hence a saving of 12 bits can be achieved by using the 10C technique. The various combinations and corresponding code words are presented in Table 3.

Table 3 Example to demonstrate 10C encoding compression efficiency over 9C

No. | Input block combinations (K = 10) | 9C case | 9C codeword | 9C no. of bits | 10C case | 10C codeword | 10C no. of bits
1 | 00xx0_111x1 | 01 | 11000 | 5 | 01 | 0001 | 4
2 | 10101_x00x0 | U0 | 11100_10101 | 10 | U0 | 0101_10101 | 9
3 | 111x1_x00x0 | 10 | 11001 | 5 | 10 | 0010 | 4
4 | 00xx0_00xx0 | 00 | 0 | 1 | 00 | 0000 | 4
5 | 10101_10101 | UU | 1111_10101_10101 | 14 | UU | 110_10101 | 8
6 | 10101_01010 | UU | 1111_10101_01010 | 14 | UŪ | 111_10101 | 8
Total no. of bits | | | | 49 | | | 37

3.2 Selective CCPRL

In CCPRL, if the pattern turns out to be unique, a count code still needs to be added to signify zero block matches. In such cases, the extra count code bits lead to inefficient test data compression. As the name suggests, the Selective CCPRL technique selects whether or not to encode as per the CCPRL scheme. This selection between the inter block merging cases and unique cases is done using an additional bit called the detect bit, which is added as a prefix to the code word. The detect bit is set to 1 if merging is possible; otherwise it is set to 0. The resulting coding scheme consists of encoding prefixes and tails. The prefixes are the block code, which is the binary conversion of the pattern length and is added at the beginning, and the detect bit, which represents whether merging is possible. The tails are the retained pattern code, which presents the retained pattern sequence with length equal to the decimal value of the block code; the count code, which represents the count of blocks compatible with the retained pattern; and the compatibility code, which signifies the relation of each block to the retained pattern block and is set to 0 if compatible and 1 if inverse compatible. For unique cases, the retained pattern just needs to be preceded by the detect bit (equal to 0). This removes the need to send a redundant count code (all zeroes, to represent 0 block matches). The encoding of the test data in the case of inter block merging is done in a way very similar to CCPRL.

An example showing the effectiveness of SCCPRL over CCPRL for the sequence 010101010101101010101010110110111110 is provided in Fig. 2. The pattern length K is taken to be 6 (fixed), and the lengths of the block code and count code are taken to be 3 bits each. The example shown in Fig. 2 employs a fixed block size, so the block code (110) needs to be sent just once at the beginning of the encoded output. Note that if the same encoding were done without the detect bit, an additional count code (000) would have to follow the retained patterns to signify unique cases. To increase the compression even further, variable block sizes can be used. This is done as follows: for a block code of R bits, all block sizes ranging from 3 to 2^R − 1 are tried to find the optimum compression.
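The fixed block size SCCPRL encoding described above can be sketched as follows. This is a hedged illustration rather than the authors' encoder: the function names are hypothetical, the input length is assumed to be a multiple of K, the block code is assumed to be sent once at the start, and underscores are inserted only to make the fields readable.

```python
def merge(a, b):
    """Merge two equal-length blocks over {0, 1, X}; None if they conflict."""
    out = []
    for x, y in zip(a, b):
        if x == "X":
            out.append(y)
        elif y == "X" or x == y:
            out.append(x)
        else:
            return None
    return "".join(out)


def invert(block):
    return block.translate(str.maketrans("01", "10"))


def sccprl_encode(data, k, r=3):
    """Encode `data` with fixed block size k; r = width of block/count code."""
    if len(data) % k:
        raise ValueError("sketch assumes len(data) is a multiple of k")
    blocks = [data[i:i + k] for i in range(0, len(data), k)]
    fields = [format(k, f"0{r}b")]            # block code, sent once
    i = 0
    while i < len(blocks):
        retained, relations = blocks[i], []
        j = i + 1
        # The count code has r bits, so at most 2**r - 1 blocks can be merged.
        while j < len(blocks) and len(relations) < 2 ** r - 1:
            merged, bit = merge(retained, blocks[j]), "0"
            if merged is None:
                merged, bit = merge(retained, invert(blocks[j])), "1"
            if merged is None:
                break
            retained = merged
            relations.append(bit)
            j += 1
        if relations:   # detect = 1: pattern, count code, compatibility code
            count = format(len(relations), f"0{r}b")
            fields.append("1" + retained + count + "".join(relations))
        else:           # detect = 0: unique pattern, no count code needed
            fields.append("0" + retained)
        i = j
    return "_".join(fields)


code = sccprl_encode("010101010101101010101010110110111110", k=6)
print(code, len(code.replace("_", "")), "bits")
```

For the 36-bit example sequence this sketch yields 30 encoded bits; since Fig. 2 is not reproduced here, that number should be read as the output of this sketch rather than as the figure's value.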
For example, for a block code of 4 bits, iterations are run for block sizes equal to 3, 4, 5, 6, ..., 15, and the corresponding compression efficiencies are calculated as per Eq. 1 and compared.

\[ \text{Compression efficiency} = \frac{\text{uncompressed test length} - \text{compressed test length}}{\text{uncompressed test length}} \times 100 \qquad (1) \]

The best block size is chosen and the code word is generated accordingly. The minimum block size of 3 is chosen because below 3 bits the test data would be expanded instead of compressed [3,17,22]. In each iteration, the search continues until no more compatibility is found, and a code word is generated accordingly. Leftover bit sequences are carried to the next round of investigation. To keep the decoder design simple, the number of bits in the block code and the count code are chosen to be the same.

3.3 Optimal Selective Count Compatible Run Length Code

The OSCCPRL technique is an amalgamation of the good features of the 10C and SCCPRL techniques. As in SCCPRL, inter block merging in OSCCPRL may involve multiple blocks (contrary to the 10C technique, where inter block merging is limited to only two blocks). Similarly, if the pattern block is found to be unique, intra block level merging is investigated for the presence of the eight special combinations (unlike SCCPRL, where the unique block needs to be retained as such). The encoding scheme of OSCCPRL uses two extra bits, namely detect and prefix. The significance and status of the two bits for the various cases are shown in Table 4. The detect bit is used to differentiate between inter and intra block level merging, similar to the SCCPRL technique. Among the intra level cases, the prefix bit selects between the occurrence of one of the eight special cases and the unique case. The OSCCPRL technique works on the principle of achieving maximum compression by opting for either inter or intra block merging on the basis of the compression achieved by each.
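Eq. 1 and the block size search range can be written down directly. The sketch below is only a small numerical aid with hypothetical function names; the sample lengths are the CCPRL values (T_D, T_E1) listed in Table 5.

```python
def compression_efficiency(t_d, t_e):
    """Eq. 1: percentage saved, for t_d uncompressed and t_e compressed bits."""
    return (t_d - t_e) / t_d * 100


def candidate_block_sizes(r):
    """Block sizes tried for an r-bit block code: 3 .. 2**r - 1."""
    return range(3, 2 ** r)


def best_block_size(lengths):
    """Pick the block size with the highest efficiency.

    `lengths` maps block size -> (uncompressed bits, compressed bits).
    """
    return max(lengths, key=lambda k: compression_efficiency(*lengths[k]))


print(list(candidate_block_sizes(4)))  # 3, 4, ..., 15

# CCPRL entries of Table 5: K -> (T_D, T_E1)
ccprl = {7: (7, 13), 6: (6, 12), 5: (10, 12), 4: (8, 11), 3: (12, 12)}
for k, (t_d, t_e) in ccprl.items():
    print(k, round(compression_efficiency(t_d, t_e), 2))
print(best_block_size(ccprl))
```

A negative efficiency means the encoded round is longer than the data it replaces.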
This technique works as follows. Based on the value R (the number of bits used to represent the block code), inter block level investigations are done using different block sizes. Encoding is done similar to SCCPRL (as explained in section 3.2) using the detect bit, block code, pattern code, count code and compatibility code. Based on the analysis of the compression efficiency achieved in each case, the optimum block size and its associated compression efficiency are selected. Similarly, the pattern block of fixed size is selected and considered for intra block checks (to explore the possibility of occurrence of one of the eight special cases listed under sub cases 1–8 of Table 4). The encoded code word so formed consists of the detect bit followed by the prefix and the tail code word respectively. Using the codeword length, the intra block level compression efficiency is also calculated. Based on the comparison of the compression efficiencies obtained in each case, one of the two is selected. In the rare cases when both inter block and intra block level merging fail, the unique pattern is sent as such, with the detect and prefix bits set to 0 and 1 respectively. As is evident from Table 4, the number of redundant bits that need to be sent for a unique pattern is always two, irrespective of the size of the pattern code.

Fig. 2 The code word using SCCPRL

Table 4 Encoding scheme of Optimal Selective Count Compatible Run Length encoding technique

Cases | Sub Case (i) | Input Block (K = 12) | Case | Detect Bit | Prefix | Tail Code word (Ci) | Decoder Input
A (Intra Block) | 1 | 000000_000000 | 00 | 0 | 0 | 000 | 00_000
A (Intra Block) | 2 | 111111_111111 | 11 | 0 | 0 | 111 | 00_111
A (Intra Block) | 3 | 000000_111111 | 01 | 0 | 0 | 001 | 00_001
A (Intra Block) | 4 | 111111_000000 | 10 | 0 | 0 | 010 | 00_010
A (Intra Block) | 5 | 000000_UUUUUU | 0U | 0 | 0 | 011 | 00_011_UUUU
A (Intra Block) | 6 | 111111_UUUUUU | 1U | 0 | 0 | 100 | 00_100_UUUU
A (Intra Block) | 7 | UUUUUU_000000 | U0 | 0 | 0 | 101 | 00_101_UUUU
A (Intra Block) | 8 | UUUUUU_111111 | U1 | 0 | 0 | 110 | 00_110_UUUU
A (Intra Block) | 9 | UUUUUU_VVVVVV | UV | 0 | 1 | UUUUUU_VVVVVV | 01_UUUUUU_VVVVVV
B (Inter Block) | 10 | Bi/B(i+1) | COMP/INV. COMP | 1 | NA | Selective CCPRL | 1_Pattern Code_Count code_Compatible Code
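The intra block part of Table 4 (sub cases 1 to 9) can be sketched as a small encoder. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, don't care bits are filled greedily (zeroes before ones), a block of odd length is split with the shorter half first, and the unique half is emitted at its full K/2 length.

```python
# 3-bit tail codes for the eight special cases, as listed in Table 4.
TAIL = {"00": "000", "11": "111", "01": "001", "10": "010",
        "0U": "011", "1U": "100", "U0": "101", "U1": "110"}


def fits(bits, value):
    """True if every symbol is `value` or a don't-care 'X'."""
    return all(c in (value, "X") for c in bits)


def encode_intra(block):
    """Intra-block OSCCPRL codeword: detect bit, prefix bit, then tail."""
    half = len(block) // 2
    left, right = block[:half], block[half:]
    if fits(block, "0"):
        case = "00"
    elif fits(block, "1"):
        case = "11"
    else:
        kinds = ["0" if fits(h, "0") else "1" if fits(h, "1") else "U"
                 for h in (left, right)]
        case = "".join(kinds)
    if case == "UU":
        # Unique block: detect = 0, prefix = 1, block sent unchanged.
        return "01" + block
    code = "00" + TAIL[case]      # detect = 0, prefix = 0, 3-bit tail
    if case[0] == "U":
        code += left              # the unique half follows the tail
    elif case[1] == "U":
        code += right
    return code


print(encode_intra("000000111111"))  # 00001
print(encode_intra("00XX000"))       # 00000, the 5-bit intra entry of Table 5
print(encode_intra("10111X000X00"))  # 00101 followed by the unique half
```

A full OSCCPRL encoder would compare this codeword length with the inter block (SCCPRL) result for the same round and keep whichever gives the higher compression efficiency.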
An example with the random test sequence 00XX00000XX1110110101011100000XXXX11111XXX111010101011101011000 (63 bits), showing the effectiveness of the OSCCPRL technique over CCPRL and SCCPRL, is presented in Table 5. In all three techniques, the block code and count code are taken to be 3 bits each. Since the length of the block code is 3 bits, the block lengths (K) for the investigation can range from 3 to 2^3 − 1 (3, 4, 5, 6 and 7), as shown in rows 2–6 of Table 5. The current round (column 2) considers the blocks that can be merged, and the leftover bits are moved to the next round. The compression efficiency is calculated using Eq. 1 in each case. As is evident from the compression efficiency results in Table 5, if a block length of 3 bits is employed in CCPRL, the T_E1 and T_D values are the same, which means no compression is achieved at all. However, the fact that there is no increase in the test data length makes it the best choice of the lot (3, 4, 5, 6 and 7). SCCPRL results are better than CCPRL only when no inter block merging is possible at all. Results obtained for the intra case with block length K = 7 are presented in row 7. As is evident from the compression efficiency analysis, intra block encoding for OSCCPRL turns out to be the better option for the current round. Note that the compatibility check at sub block level used in the 10C technique has been intentionally avoided, since its inclusion would require an extra sub-prefix bit, which nullifies the advantage offered by SCCPRL.

4 Decompression Architecture

The decompression architecture for the optimal selective count compatible run length encoding is shown in Fig. 3. At the top level the decompressor has four incoming signals (ATEclk, CUTclk, data_in and Ack) and one outgoing signal (scan_out). The encoded data stream is received from the ATE through data_in at the ATE clock frequency (ATEclk), decoded, and fed to the circuit under test (CUT) at the circuit frequency (CUTclk) through the scan_out line.
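Before looking at the hardware blocks, the decoding behaviour can be modelled in software. The sketch below is a behavioural model only, written under several assumptions that are not stated in this form in the paper: the block code is sent once at the start, K is even, the block code and count code both have R bits, and codewords follow Table 4 (detect = 1 for inter block merging, detect = 0 with prefix = 0 for the eight special cases, detect = 0 with prefix = 1 for a unique block). Clock domains, counters and control signals are not modelled, and the function name is hypothetical.

```python
# Tail code -> case symbol, the inverse of the tail column of Table 4.
CASES = {"000": "00", "111": "11", "001": "01", "010": "10",
         "011": "0U", "100": "1U", "101": "U0", "110": "U1"}


def invert(block):
    return block.translate(str.maketrans("01", "10"))


def osccprl_decode(stream, r=3):
    """Decode a bit string; returns the reconstructed test data."""
    pos = 0

    def take(n):
        nonlocal pos
        chunk = stream[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated stream")
        pos += n
        return chunk

    k = int(take(r), 2)            # block code: pattern length K
    half = k // 2
    out = []
    while pos < len(stream):
        detect = take(1)
        if detect == "1":          # inter block merging (SCCPRL codeword)
            pattern = take(k)
            count = int(take(r), 2)
            out.append(pattern)    # the retained block itself
            for rel in take(count):
                out.append(pattern if rel == "0" else invert(pattern))
        elif take(1) == "1":       # prefix = 1: unique block sent as-is
            out.append(take(k))
        else:                      # prefix = 0: one of the eight special cases
            for sym in CASES[take(3)]:
                out.append(take(half) if sym == "U" else sym * half)
    return "".join(out)


stream = ("110"                      # K = 6
          + "1" + "010101" + "011" + "011"   # merged run of four blocks
          + "00" + "001"                     # special case 01
          + "01" + "110110")                 # unique block
print(osccprl_decode(stream))
```

Each branch of the loop corresponds to one of the three counter cases described in section 4.1.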
The major blocks of the decompression architecture are the FSM, the module R/module C counter, the K bit counter, the K bit shift register and the buffer. The FSM generates the appropriate control signals based on the bits received from data_in. Counter1 (module R/module C) is used to count the block code and the count code, while the K bit counter is used for receiving the pattern code. The K bit multiplexer fills in the appropriate bits to deliver the test data, based on the select lines received from the FSM. Finally, the scan_en signal controls when the test data is released to the circuit under test. The description of the various blocks of the decoder is as follows:

Table 5 Compression ratio using CCPRL, SCCPRL and OSCCPRL

Codewords and lengths:

K | Current round and length (T_D) | Codeword and length using CCPRL (T_E1) | Codeword and length using SCCPRL (T_E2) | Codeword and length using OSCCPRL (T_E3)
7 | 00XX000 (7 bits) | 111_00XX000_000 (13 bits) | 0_111_00XX000 (11 bits) | 0_111_00XX000 (11 bits)
6 | 00XX00 (6 bits) | 110_00XX00_000 (12 bits) | 0_110_00XX00 (10 bits) | 0_110_00XX00 (10 bits)
5 | 00XX00000X (10 bits) | 101_00000_001_0 (12 bits) | 1_101_00000_001_0 (13 bits) | 1_101_00000_001_0 (13 bits)
4 | 00XX0000 (8 bits) | 100_0000_001_0 (11 bits) | 1_100_0000_001_0 (12 bits) | 1_100_0000_001_0 (12 bits)
3 | 00XX00000XX1 (12 bits) | 011_000_011_001 (12 bits) | 1_011_000_011_001 (13 bits) | 1_011_000_011_001 (13 bits)
Intra-block merging with K = 7 | 00XX000 (7 bits) | NA | NA | 00000 (5 bits)

Compression and next round:

K | [(T_D − T_E1)/T_D]*100 (CCPRL) | [(T_D − T_E2)/T_D]*100 (SCCPRL) | [(T_D − T_E3)/T_D]*100 (OSCCPRL) | Next round and length
7 | −85.7 % | −57.14 % | −57.14 % | 00XX1110110101011100000XXXX11111XXX111010101011101011000 (56 bits)
6 | −100 % | −66.7 % | −66.7 % | 000XX1110110101011100000XXXX11111XXX111010101011101011000 (57 bits)
5 | −20 % | −30 % | −30 % | X1110110101011100000XXXX11111XXX111010101011101011000 (53 bits)
4 | −37.5 % | −50 % | −50 % | 0XX1110110101011100000XXXX11111XXX111010101011101011000 (55 bits)
3 | 0 % | 76.9 % | 76.9 % | 110110101011100000XXXX11111XXX111010101011101011000 (51 bits)
Intra-block merging with K = 7 | NA | NA | 28.57 % | 00XX1110110101011100000XXXX11111XXX111010101011101011000 (56 bits)

Fig. 3 Decompressor Architecture

4.1 Counters

The decompression circuit consists of two counters: counter1, a module R/module C counter, and counter2, a K/n bit counter. The function of the first counter is to count and hold the value of the retained pattern length (module R) at the start of the decoding process, and the count code (module C) during the later part, when merging is being done at the block level. As described in section 3.2, the R and C values define the retained pattern length and the number of compatible pattern blocks. Counter2 performs three functions: reception of the pattern block of K bit length, of the sub block of K/2 bit length, or of the compatibility bits, equal in number to the C value. The working of the counters can be explained with reference to the selective CCPRL case and the eight special cases as follows:

4.1.1 Case 1 (Inter Block Merging)

On initialization of the reception of encoded data from the ATE, as signified by the ACK signal, the initial R bits are loaded into counter1 using the inc and flag1 signals. For holding the block code, flag1 is set to 1. On completion of the R bits, the counter asserts a 1 on the done1 line. On receiving done1 = 1, the FSM sets flag2[1:0] = 00 to make counter2 work as a K bit counter. The shift and dec2 signals help shift the data to the K bit scratch register through the pattern_in line. On completion of the counting/resetting of the counter, the done2 signal is set to 1. Next, the FSM generates the control signal flag1 = 0 to make counter1 work as a module C counter. Once the counter starts its task it maintains done1 = 0. On completion of module C, the counter asserts done1 = 1 again. A high on done1 makes counter2 work as an n bit counter to receive the compatibility bits. This is done by setting the flag2[1:0] lines equal to 01. Corresponding to the compatibility bits, the FSM generates the appropriate select signals for the MUX, which feeds either the retained pattern or its inverse to the buffer. The data on the K bit buffer is then shifted out to the circuit under test using the scan_en signal generated by the FSM. From here the data is sent to the circuit under test at CUTclk.

4.1.2 Case 2 (Intra Block Merging: Special Eight Cases)

During the occurrence of one of the 0U, 1U, U0, U1 combinations, as decided by the FSM, counter2 and the scratch register are made to work till K/2. This is done by setting the flag2[1:0] lines = 10. The rest of the data packet formation and transfer to the CUT is done in much the same way as explained earlier, using the select lines sel[1:0] and scan_en. The occurrence of one of the remaining four cases (00, 01, 10, 11) deactivates the counters. The FSM sets the sel[1:0] lines for appropriate data packing.

4.1.3 Case 3 (Unique Case)

On reception of detect = 1, the FSM sets flag2[1:0] to 11, and hence counter2 and the scratch register are made to work for K bits. Here, however, the select lines allow the received pattern to be fed directly to the CUT through the K bit buffer.

4.2 FSM

The FSM of the decoder is shown in Fig. 4. It starts in state S0, in which the initial fixed block code value is loaded into counter1. On receiving the complete count value, the counter generates the done1 signal. From here, if the data_in value is equal to 1, the FSM generates the control signals for SCCPRL at block level. On the other hand, receiving a 0 on data_in makes the FSM generate an output corresponding to the occurrence of one of the eight special cases (described earlier). Depending upon the various states, the FSM

generates the select lines for the output buffer and the various counters. The leaf nodes C0-C9 represent the different encoding cases.

Fig. 4 FSM of Decompressor Architecture

5 Simulation Results

In order to validate the efficiency of the proposed techniques, comparisons are made with previous techniques using different benchmark circuits. The test sets generated by the Mintest ATPG for six large ISCAS'89 benchmark circuits are taken as input and fed to the various compression algorithms. Figure 5 shows the frequency of occurrence of inter-block merging and unique cases for the test vector files of the various benchmark circuits. A fixed block size of 8 has been taken for all the benchmark circuits just as an example and may not give the best efficiency. It may be observed that if compatible (inter-block) cases always occur more frequently than unique cases, the overhead of the extra detect bit for the inter-block cases may reduce the compression efficiency. Hence, to take advantage of SCCPRL, unique cases should be more frequent. On analysing the unique cases further, blocks may be found to contain the special eight cases. Figure 6 provides the frequency of occurrence of the special eight cases within the unique pattern blocks. If these eight special cases are encoded with fewer bits, better compression can be achieved; this is what OSCCPRL does.

Fig. 5 Frequency of occurrence of inter block merging and unique case for different benchmark circuits (for fixed block size = 8)

The results corresponding to the different encoding schemes and the proposed OSCCPRL are presented in Table 6. Let T_D and T_E denote the uncompressed and compressed test data lengths, respectively. The compression efficiency percentage is obtained using Eq. 1:

$$\text{Compression efficiency (\%)} = \frac{T_D - T_E}{T_D} \times 100 \qquad (1)$$
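The detect-bit trade-off noted earlier in this section can be made concrete with a small break-even calculation. This is a minimal sketch under stated assumptions, not the authors' analysis: it assumes, as the Table 5 codewords suggest, that relative to CCPRL each unique block saves the count code but pays one detect bit, while each merged codeword only pays the detect bit. The function name and the block counts used below are hypothetical.

```python
def sccprl_net_saving(unique_blocks, merged_codewords, c_bits=3):
    """Bits saved by SCCPRL relative to CCPRL for one test set.

    Assumed accounting (inferred from Table 5, not stated as a formula
    in the source):
      * unique block:    count code (c_bits) dropped, 1 detect bit added
      * merged codeword: 1 detect bit added
    A negative result means SCCPRL produces more bits than CCPRL.
    """
    saved_on_unique = unique_blocks * (c_bits - 1)
    lost_on_merged = merged_codewords * 1
    return saved_on_unique - lost_on_merged


# Hypothetical block counts, for illustration only.
for unique, merged in [(10, 5), (10, 20), (5, 20)]:
    print(f"unique={unique}, merged={merged}: "
          f"net saving = {sccprl_net_saving(unique, merged)} bits")
```

With a 3-bit count code, the break-even point under these assumptions is two merged codewords per unique block; beyond that ratio the detect-bit overhead outweighs the saving.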
The comparisons are made between the proposed techniques and other schemes, namely Golomb [2], FDR [3], EFDR [7], 9C [24], BM [6], CCPRL [32], 2^n-PRL [5] and BM-8C [29], in terms of the compression efficiency in percent. The results for 10C are presented in column 10 of Table 6. The best compression results are presented here; they were obtained for block sizes (K) of 16, 12, 14, 12, 18 and 10 bits for the benchmark circuits s5378, s9234, s13207, s38417, s15850 and s38584, respectively. It may be noted that 10C provides a benefit of only 1-2 %, and only in some of the benchmark circuits. Based upon the statistics, it could be concluded that the fall in efficiency occurs for the benchmarks with a larger number of don't-care bits. Filling such don't-care bits can help in increasing the runs of cases such as all zeroes. All zeroes being encoded using only one bit in 9C gives it the advantage of better compression. At the same time, this result can be improved if a variable-length 10C approach (similar to the approach used in 9C) is used. Due to shortage of space, the discussion and results for variable 10C are not included here.

The results of SCCPRL are presented in column 11. The block count length (R) is chosen to be 6, 5, 6, 5, 4, 6 (in bits) for the circuits s5378, s9234, s13207, s15850, s38417 and s38584, respectively. However, the count code is kept at 3 bits in all cases. Row 8 of Table 6

represents the improvement obtained by applying SCCPRL in comparison to the various techniques mentioned in the respective columns. As can be seen, SCCPRL shows very little advantage over CCPRL. The effectiveness of the SCCPRL technique in improving the compression efficiency is more pronounced if the frequency of occurrence of the unique cases is high. In other words, the overhead of the extra detect bit in cases of inter-block merging can be offset by the advantage provided by the reduction of redundant bits. The same is done in OSCCPRL by selecting intra-block merging over inter-block merging.

Fig. 6 Frequency of occurrence of special eight cases for different benchmark circuits

Row 9 of Table 6 represents the improvement obtained by applying OSCCPRL in comparison to the various techniques mentioned in the respective columns. The block count length (R) is chosen to be 6, 5, 6, 5, 5, 6 (in bits) for the circuits s5378, s9234, s13207, s15850, s38417 and s38584, respectively. The fixed block_size values (chosen for categorization of intra-block merging or unique cases) that gave the best results are 16, 20, 48, 20, 12, 24 for the circuits s5378, s9234, s13207, s15850, s38417 and s38584, respectively. However, the count code is kept at 3 bits in all cases, since the frequency of occurrence of merging of blocks is not substantial beyond seven.

Comparison of the Decoder Area

The hardware overhead of the OSCCPRL decoder (modeled using Verilog HDL and synthesized using Encounter RTL Compiler from Cadence with a 1.8 V, TSMC 180 nm CMOS standard cell library) is presented in Table 7, together with the overhead of various other compression techniques. The full-scan ISCAS'89 benchmark circuits are synthesized with a single scan chain. It can be concluded that the area overhead is comparable to the recently proposed BM-8C [29] and CCPRL [32] techniques, but it is larger than that of 9C [24] and FDR [3].
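The average and improvement rows of Table 6 can be cross-checked with a few lines of arithmetic. The sketch below uses the per-circuit efficiencies reported in Table 6 for CCPRL, SCCPRL and OSCCPRL; reading the improvement rows as differences between average efficiencies (in percentage points) is an interpretation inferred from the reported numbers, and the variable names are illustrative.

```python
# Compression efficiency (%) per circuit, in the order
# s5378, s9234, s13207, s15850, s38417, s38584 (values from Table 6).
TABLE6 = {
    "CCPRL":   [61.08, 62.95, 90.06, 76.32, 64.61, 75.38],
    "SCCPRL":  [60.00, 52.19, 92.10, 79.34, 70.15, 77.00],
    "OSCCPRL": [71.15, 75.93, 94.14, 85.91, 73.23, 85.62],
}


def average(values):
    """Arithmetic mean of a list of efficiencies."""
    return sum(values) / len(values)


averages = {name: average(values) for name, values in TABLE6.items()}

# Improvement read as a difference of averages, in percentage points
# (an assumption about how the improvement rows were derived).
improvement_sccprl = averages["SCCPRL"] - averages["CCPRL"]
improvement_osccprl = averages["OSCCPRL"] - averages["CCPRL"]

for name, value in averages.items():
    print(f"{name}: average efficiency {value:.2f} %")
print(f"SCCPRL over CCPRL:  {improvement_sccprl:.2f} points")
print(f"OSCCPRL over CCPRL: {improvement_osccprl:.2f} points")
```

Under this reading, SCCPRL and CCPRL are nearly tied on average, while OSCCPRL gains roughly nine points over CCPRL, in line with the rows reported in Table 6.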
Table 6 Comparison of compression ratio of OSCCPRL with other test-independent compression techniques in terms of compression efficiency %

| Benchmark circuit | Golomb [2] | FDR [3] | EFDR [7] | 9C [24] | BM [6] | CCPRL [32] | 2^n-PRL [5] | BM-8C [29] | 10C | SCCPRL | OSCCPRL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| s5378 | 37.11 | 48.02 | 53.67 | 51.64 | 54.98 | 61.08 | 54.94 | 58.56 | 52.40 | 60 | 71.15 |
| s9234 | 42.25 | 43.6 | 48.66 | 50.94 | 51.19 | 62.95 | 57.72 | 57.49 | 51.30 | 52.19 | 75.93 |
| s13207 | 79.74 | 81.3 | 82.49 | 82.31 | 84.89 | 90.06 | 88.1 | 87.52 | 80.64 | 92.1 | 94.14 |
| s15850 | 62.82 | 66.22 | 68.66 | 66.38 | 69.49 | 76.32 | 64.29 | 73.69 | 64.72 | 79.34 | 85.91 |
| s38417 | 28.37 | 43.26 | 62.02 | 60.63 | 59.39 | 64.61 | 58.33 | 59.92 | 58.45 | 70.15 | 73.23 |
| s38584 | 57.17 | 60.93 | 64.28 | 65.53 | 66.86 | 75.38 | 72.44 | 71.66 | 64.75 | 77 | 85.62 |
| Average | 51.24 | 57.22 | 63.29 | 62.38 | 64.46 | 71.73 | 65.97 | 68.14 | 62.04 | 71.80 | 80.99 |
| % Improvement in SCCPRL | 20.55 | 14.57 | 8.51 | 9.42 | 7.34 | 0.06 | 5.83 | 3.66 | | | |
| % Improvement in OSCCPRL | 29.75 | 23.77 | 17.7 | 18.62 | 16.53 | 9.26 | 15.03 | 12.86 | 12.86 | 9.19 | |

If the available silicon area is limited, the decision to employ the OSCCPRL technique should therefore be made after weighing the trade-off between the decoder area and the needed test data compression. At the deep submicron level, the decoder being just a fraction of the overall larger SOC design, such overhead may sometimes be acceptable.

Test Application Time Comparison

As mentioned in Section 1, the compressed test bits are fed to the on-chip decompressor at the ATE clock frequency (f_ATE). The FSM recognizes the codewords and generates the actual test data, which is further applied to the circuit under test at the circuit frequency (f_CUT). For synchronization f_ATE is selected to be an integral

multiple of the f_CUT. Let the frequency ratio be α = f_CUT/f_ATE. Assume the compressed test data has N codewords C_1, ..., C_N, and each codeword C_i has a length of W_i (i = 1, 2, ..., N). Let α_max = max_{2≤i≤N} (H_{i-1}/W_i), where H_{i-1} is the length of the decompressed test data for the codeword C_{i-1}. If α ≥ α_max, the minimum TAT can be calculated as [7, 14]:

$$\mathrm{TAT}_{\min} = \sum_{i=1}^{N} W_i + \frac{H_N}{\alpha} \qquad (2)$$

Table 7 Comparison of decompression area overhead for OSCCPRL with other compression techniques

| Benchmark circuit | FDR [3] | BM-8C [29] | 9C [24] | CCPRL [32] | OSCCPRL |
|---|---|---|---|---|---|
| s5378 | 7.8 | 12.8 | 8.2 | 9.6 | 10.7 |
| s9234 | 5.9 | 9.7 | 6.2 | 7.3 | 8.3 |
| s13207 | 3.5 | 5.8 | 3.7 | 3.5 | 4.2 |
| s15850 | 3.6 | 5.9 | 3.8 | 3.7 | 3.9 |
| s38417 | 1.4 | 2.3 | 1.5 | 1.8 | 2.2 |
| s38584 | 1.5 | 2.5 | 1.6 | 1.9 | 2.5 |

Table 8 Comparison of test application time achieved for OSCCPRL with other compression techniques in terms of f_ATE

| Circuit | α | FDR | EFDR | BM | BM-8C | OSCCPRL |
|---|---|---|---|---|---|---|
| s5378 | 2 | 24,933 | 17,075 | 16,018 | 15,088 | 13,760 |
| | 4 | 16,803 | 13,172 | 12,239 | 11,191 | 9,194 |
| | 6 | 15,259 | 12,096 | 11,183 | 10,348 | 7,821 |
| | 8 | 14,039 | 11,652 | 10,899 | 10,089 | 7,390 |
| s9234 | 2 | 42,039 | 26,129 | 26,336 | 24,281 | 21,753 |
| | 4 | 29,206 | 21,424 | 20,828 | 18,410 | 14,008 |
| | 6 | 26,675 | 20,557 | 19,762 | 17,278 | 11,852 |
| | 8 | 24,086 | 20,318 | 19,436 | 16,921 | 11,030 |
| s13207 | 2 | 116,101 | 88,487 | 88,045 | 87,319 | 84,339 |
| | 4 | 70,361 | 52,711 | 50,784 | 49,730 | 44,585 |
| | 6 | 57,089 | 41,898 | 39,177 | 38,138 | 31,454 |
| | 8 | 48,358 | 36,946 | 33,768 | 32,326 | 25,243 |
| s15850 | 2 | 65,020 | 46,076 | 46,076 | 44,110 | 41,080 |
| | 4 | 42,270 | 32,517 | 32,084 | 29,553 | 24,317 |
| | 6 | 36,732 | 28,798 | 28,216 | 25,522 | 19,071 |
| | 8 | 32,362 | 27,172 | 26,518 | 23,673 | 16,710 |
| s38417 | 2 | 186,261 | 104,569 | 109,180 | 106,725 | 93,895 |
| | 4 | 123,700 | 75,614 | 80,273 | 79,074 | 61,539 |
| | 6 | 113,451 | 68,212 | 73,286 | 71,564 | 52,008 |
| | 8 | 110,521 | 65,509 | 70,202 | 69,069 | 48,719 |
| s38584 | 2 | 179,530 | 119,849 | 118,844 | 113,821 | 105,343 |
| | 4 | 118,628 | 86,320 | 83,255 | 76,161 | 62,212 |
| | 6 | 104,630 | 78,066 | 73,953 | 66,048 | 48,632 |
| | 8 | 93,260 | 74,955 | 70,692 | 61,908 | 42,518 |

If α < α_max, the ATE will be stalled for several cycles to wait for the CUT to apply the decompressed test data, which occurs when the time consumed for
ATE to send the codeword C_i to the FSM is shorter than the time consumed for the CUT to apply the decompressed test data of the previous codeword C_{i-1}. Then, the total TAT is calculated as [29]:

$$\mathrm{TAT} = \mathrm{TAT}_{\min} + \sum_{i=2}^{N} \frac{\max\{H_{i-1} - W_i\,\alpha,\ 0\}}{\alpha} \qquad (3)$$

The TAT values calculated as per Eqs. 2 and 3 for different benchmark circuits and α values are presented in Table 8. Results are compared with the TAT obtained for other compression techniques, namely FDR [3], EFDR [7], BM [6] and BM-8C [29]. As evident from the table, the proposed OSCCPRL offers a much reduced test application time.

6 Conclusion

Test data compression is a very promising technique to reduce the test data volume and address the challenges of test application time. This paper proposed three different techniques for test data compression, namely 10 Coded, Selective CCPRL and Optimal Selective Count Compatible Run Length (OSCCPRL). In the OSCCPRL technique, the pattern blocks are treated for compression in both the vertical (inter-block) and horizontal (intra-block) directions. Encoding at the inter- and intra-block levels is done as per SCCPRL and 10C, respectively. It improves the test data compression efficiency without the need for any structural information about the circuit under test. As per the simulation results of applying OSCCPRL to various ISCAS'89 benchmark circuits, the compression efficiency is increased by 20-50 % in comparison to the previously proposed techniques. The decompressor architecture area overhead is found to be comparable to that of the earlier techniques. As evident from the experimental results, the test application time too is reduced by 27-30 % in this scheme.

Acknowledgment The authors would like to thank Dr. Nur A. Touba of the University of Texas at Austin, Texas, USA for providing the test vectors for various benchmark circuits.

References

1. Bayraktaroglu I, Orailoglu A
(2003) Decompression hardware determination for test volume and time reduction through unified test pattern compaction and compression. In: Proceedings IEEE VLSI test symposium (VTS), pp 113-118
2. Chandra A, Chakrabarty K (2001) System-on-a-chip data compression and decompression architecture based on Golomb codes. IEEE Trans Comput Aided Des Integr Circuits Syst 20(3):355-368
3. Chandra A, Chakrabarty K (2003) Test data compression and test resource partitioning for system-on-a-chip using frequency-directed run-length (FDR) codes. IEEE Trans Comput 52(8):1076-1088

4. Chandra A, Chakrabarty K (2003) A unified approach to reduce SoC test data volume, scan power and testing time. IEEE Trans Comput Aided Des Integr Circuits Syst 22(3):352-363
5. Chang CH, Lee LJ, Tseng WD, Lin RB (2012) 2^n pattern run-length for test data compression. IEEE Trans Comput Aided Des Integr Circuits Syst 31(4):644-648
6. El-Maleh AH (2008) Efficient test compression technique based on block merging. IET Comput Digit Tech 2(5):327-335
7. El-Maleh AH (2008) Test data compression for system-on-a-chip using extended frequency-directed run-length code. IET Comput Digit Tech 2(3):155-163
8. Gonciari PT, Al-Hashimi BM, Nicolici N (2002) Improving compression ratio, area overhead, and test application time for system-on-a-chip test data compression/decompression. In: Proceedings IEEE design automation and test in Europe conference and exhibition (DATE), pp 604-611
9. Gonciari PT, Hashimi BA, Nicolici N (2003) Variable-length input Huffman coding for system-on-a-chip test. IEEE Trans Computer-Aided Design 22:783-796
10. Haiying Y, Kun G, Xun S, Zijian J (2016) Power efficient test data compression method for SoC using alternating statistical run-length coding. J Electron Test 32:59-68
11. Jas A, Ghosh DJ, Ng M-E, Touba NA (2003) An efficient test vector compression scheme using selective Huffman coding. IEEE Trans Comput Aided Des 22:797-806
12. Krishna C, Touba NA (2002) Reducing test data volume using LFSR reseeding with seed compression. In: Proceedings IEEE international test conference (ITC), pp 321-330
13. Lee H-HS, Chakrabarty K (2009) Test challenges for 3D integrated circuits. IEEE Des Test Comput 26(5):26-35
14. Lee L-J, Tseng W-D, Lin R-B (2011) An internal pattern run-length methodology for slice encoding. ETRI J 33(3):374-381
15. Li L, Chakrabarty K (2003) Test data compression using dictionaries with selective entries and fixed-length indices. ACM Trans Des Autom Electron Syst 8(4):470-490
16.
Mehta US, Dasgupta KS, Devashrayee NM (2010) Modified selective Huffman coding for optimization of test data compression, test application time and area overhead. J Electron Test 26(6):679-688
17. Mehta US, Dasgupta KS, Devashrayee N (2010) Hamming distance based reordering and columnwise bit stuffing with difference vector: a better scheme for test data compression with run length based codes. In: Proceedings 23rd international conference on VLSI design, pp 33-38
18. Miyase K, Kajihara S, Reddy SM (2004) Multiple scan tree design with test vector modification. In: Proceedings IEEE Asian test symposium (ATS), pp 76-81
19. Mrugalski G, Rajski J, Tyszer J (2004) Ring generators - new devices for embedded test applications. IEEE Trans Comput Aided Des Integr Circuits Syst 23(9):1306-1320
20. Rajski J, Tyszer J, Kassab M, Mukherjee N (2004) Embedded deterministic test. IEEE Trans Comput Aided Des Integr Circuits Syst 23(5):776-792
21. Ramm P, Armin K, Josef W, Maaike M, Taklo V (2010) 3D system-on-chip technologies for More than Moore systems. Microsyst Technol 6:1051-1055
22. Ruan X, Katti R (2006) An efficient data-independent technique for compressing test vectors in systems-on-a-chip. In: Proceedings IEEE Computer Society annual symposium on emerging VLSI technologies and architectures (ISVLSI), pp 153-158
23. Sivanantham S, Padmavathy M, Gopakumar G, Mallick PS, Perinbam JRP (2014) Enhancement of test data compression with multistage encoding. Integr VLSI J 47:499-509
24. Tehranipoor M, Nourani M, Chakrabarty K (2005) Nine-coded compression technique for testing embedded cores in SoCs. IEEE Trans Very Large Scale Integr (VLSI) Syst 13(6):719-731
25. Tenentes V, Kavousianos X, Kalligeros E (2010) Single and variable-state-skip LFSRs: bridging the gap between test data compression and test set embedding for IP cores. IEEE Trans Comput Aided Des Integr Circuits Syst 29(2):1640-1644
26. Touba NA (2006) Survey of test vector compression techniques.
IEEE Des Test Comput 23(4):294-303
27. Tseng W-D, Lee L-J (2010) Test data compression using multidimensional pattern run-length codes. J Electron Test 226:393-400
28. Wang L, Wen X, Furukawa H, Hsu F, Lin S, Tsai S, Abdel-Hafez KS, Wu S (2004) Virtual Scan: a new compressed scan technology for test cost reduction. In: Proceedings IEEE international test conference (ITC), pp 916-925
29. Wu T, Liu H, Liu PJ (2013) Efficient test compression technique for SoC based on block merging and eight coding. J Electron Test 29:849-859. doi:10.1007/s10836-013-5415-7
30. Yang JS, Lee J, Touba NA (2014) Utilizing ATE vector repeat with linear decompressor for test vector compression. IEEE Trans Comput Aided Des Integr Circuits Syst 33(8):1219-1230
31. Yi M, Liang H, Zhang L, Zhan W (2010) A novel x-ploiting strategy for improving performance of test data compression. IEEE Trans VLSI Syst 18(2):324-329
32. Yuan H, Mei J, Song H, Guo K (2014) Test data compression for system-on-a-chip using count compatible pattern run-length coding. J Electron Test 30:237-242
33. Zhou B, Y-Zheng Y, Li Z, Zhang J, Wu X, Ke R (2010) A test set embedding approach based on twisted-ring counter with few seeds. Integr VLSI J 43:81-100

Harpreet Vohra received her Master's degree in VLSI Design. She joined Thapar University in 2006 and is working as Assistant Professor in the Electronics and Communication Engineering Department. Her areas of interest include low power VLSI design and test. She has guided around 14 M.Tech. theses.

Amardeep Singh received his BTech degree in Electronics and Communication from MIT and his MTech degree in Computer Science and Engineering from Punjabi University, Patiala. He received his PhD degree in VLSI Testing using DNA and Quantum Computing Algorithms in 2006 from Thapar University, Patiala. He now holds an academic position as Professor in the Computer Engineering Department, Punjabi University, Patiala, India.
His main interest areas include nanocomputing, embedded systems, and nonconventional algorithms.