CacheCompress: A Novel Approach for Test Data Compression with Cache for IP Cores


Hao Fang (方昊)
fanghao@mprc.pku.edu.cn
Rizhao, ICDFN 07, 20/08/2007
To appear in ICCAD 07

Sections Introduction Our proposed methods Experimentation Conclusion

Test Challenges
  Large test data size
    CUT (Circuit-Under-Test)
      Present circuits: 100~1000M bits
      Next decade: 10G~100G bits
    ATE (Automatic Test Equipment)
      Limited memory
      Cannot store all the data in the ATE
  Long test time
    Directly proportional to $$
    $1~2 for 1 minute
  Relatively small number of pins
  [Figure: the ATE sends the stimulus to the CUT and receives the response]

Test Data Compression
  [Figure: The Automatic Test Equipment (ATE) contains a clock generator, input and output channels, and a memory holding the pre-computed stimulus and pre-computed test data. On the chip, an on-chip decoder (research hot spot) turns the pre-computed stimulus into the original stimulus for the circuit-under-test (CUT); an on-chip compactor (well researched) turns the original response into the compacted response.]
  Stimulus (hot research spot)
    Software compression
    Hardware decompression (on-chip decoder)
  Response
    Hardware on-chip compactor (well researched)

Test Data Compression Architecture Based on Scan Slices
  [Figure: Codewords from the ATE (one codeword for slice i, several codewords for slice i+1) enter the on-chip decoder. The decoder drives scan chains chain 1, chain 2, ..., chain c of the circuit-under-test, one slice at a time (slice i-1, slice i-2, slice i-3, ..., slice k, slice k-1). The chain outputs feed the on-chip compactor.]
  Narrow input (codeword): data after compression
  Broad output (slice): original test data to be compressed

Hardware On-chip Decoder Features
  Vs. conventional software compression (gzip, rar)
  Small area overhead
    Cannot use overly complex methods
    Cannot save all the former data
  Continuous-flow input data (codeword)
    Receives one codeword every cycle
    Difficult to stop the ATE clock
    Usually every slice needs at least one codeword
  Original stimulus content (slice)
    Few 0s or 1s
    Many don't-care bits (X)
  Information lossless
    In contrast to some image algorithms with very high compression ratio
  [Figure: pre-computed stimulus -> codeword (narrow input) -> on-chip decoder -> slice (broad output) -> original stimulus]

Compression Methods
  LFSR based: linear expansion
  Dictionary based: on-chip memory
  Selective Encoding: encode specified bits
  Others
  Hybrid
  ...

Dictionary Methods
  Basic method
    Contains all the slices in a static memory
    Loads the test data into the on-chip memory at the beginning
    Sends a shorter memory address instead of a larger slice
    Needs a very large memory
  Advanced method: Dictionary with Correction (ITC 04)
    Needs a smaller memory, but still very large
  [Figure labels: ROM; index, memory address, RAM, correction, Chain 1, Chain 2, ..., Chain n]

Selective Encoding (ITC 05)
  For every slice
    Count # of 0s & # of 1s in the slice
    Determine the default bit & value (majority) and the specified bit & value (minority)
    Fill Xs with the default value
    Only encode the specified bits
  Classified by # of specified bits
    0: N-type (Non-specified)
    1: S-type (Single-specified)
    >1: M-type (Multiple-specified)

  slice      # 0   # 1   type
  0000xxxx   4     0     N-type
  00xx001x   4     1     S-type
  01101110   3     5     M-type
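
The classification rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `classify_slice`, the string representation of a slice, and the tie-break (default 0 when the counts are equal) are assumptions.

```python
def classify_slice(scan_slice):
    """Classify one scan slice given as a string of '0', '1' and 'x'."""
    bits = scan_slice.lower()
    zeros = bits.count("0")
    ones = bits.count("1")
    # Majority value becomes the default. Assumption: a tie picks default 0.
    default = "1" if ones > zeros else "0"
    minority = "0" if default == "1" else "1"
    # Specified bits are the care bits that differ from the default.
    specified = [i for i, b in enumerate(bits) if b == minority]
    if len(specified) == 0:
        kind = "N"
    elif len(specified) == 1:
        kind = "S"
    else:
        kind = "M"
    # Don't-care bits are filled with the default value.
    filled = "".join(default if b == "x" else b for b in bits)
    return default, specified, kind, filled


for s in ("0000xxxx", "00xx001x", "01101110"):
    print(s, classify_slice(s))
```

Running it on the three slices from the slide gives N-type, S-type and M-type respectively.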

Encode Slices with Two Code Types
  Example parameters: slice C=15, G=4; codeword W=6
  [Figure: slice bits 0-14 divided into Group 0, Group 1, Group 2, Group 3; codeword bits 0-5 divided into control-code and data-code]
  Single-mode code
    Only one codeword for N-type, S-type
    Composition
      The default value
      Index of the specified bit (index=n means no specified bit)
  Group-mode code (M-type)
    First codeword: group index
    Subsequent codewords: group content

Selective Encoding Decoder Structure
  [Figure: block diagram of the decoder. The W-bit codeword enters the Controller, which controls a Single Sub-decoder (default value, index) and a Group Sub-decoder (group index, G-bit group content). A Scan Slice Selector assembles the C-bit scan slice, which is then shifted.]

Selective Coding -- Example

  Slice       Control code  Data code  Interpretation                                        Decompressed slice
  xx00 01xx   00            0101       Start new slice, default=0, index=5, set bit 5 to 1   0000 0100
  xxxx 1xx1   01            1000       Start new slice, default=1, index=8 (no bit set)      1111 1111
  1100 0001   00            0111       Start new slice, default=0, index=7, set bit 7 to 1   0000 0001
              11            0000       Enter group-mode, group index=0                       0000 0001
              11            1100       Group data is 1100                                    1100 0001

  [Figure labels: group pointer, overwrite]
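
To make the example concrete, here is a minimal decoder sketch in Python that replays the five codewords. It covers only the control codes that appear in this example (00, 01 and 11); the function name, the 8-bit slice width, the 4-bit group width, and the reading of the group index as a group number are assumptions made for illustration.

```python
def decode_selective(codewords, slice_width=8, group_width=4):
    """Replay (control, data) codewords and return the decompressed slices."""
    slices = []
    current = None        # slice being built, as a list of '0'/'1'
    pending_group = None  # group index waiting for its content codeword
    for control, data in codewords:
        if control in ("00", "01"):
            # Single-mode: start a new slice filled with the default value.
            if current is not None:
                slices.append("".join(current))
            default = control[1]
            current = [default] * slice_width
            index = int(data, 2)
            if index < slice_width:  # index == slice_width means "no bit set"
                current[index] = "1" if default == "0" else "0"
            pending_group = None
        elif control == "11":
            if current is None:
                raise ValueError("group-mode codeword before any slice")
            if pending_group is None:
                # First group-mode codeword carries the group index.
                pending_group = int(data, 2)
            else:
                # Next codeword carries the content and overwrites the group.
                start = pending_group * group_width
                for offset, bit in enumerate(data):
                    if start + offset < slice_width:
                        current[start + offset] = bit
                pending_group = None
        else:
            raise ValueError("control code not covered by this sketch")
    if current is not None:
        slices.append("".join(current))
    return slices


example = [("00", "0101"), ("01", "1000"), ("00", "0111"),
           ("11", "0000"), ("11", "1100")]
print(decode_selective(example))
```

The third slice needs three codewords (one single-mode, two group-mode), which is the cost that the following slides set out to reduce.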

Sections Introduction Our proposed methods Experimentation Conclusion

Disadvantages of Dictionary Code
  Stores entire slices
    Memory width
  Stores as many slices as possible
    A great many words
    Memory height
  Contents are constant (static)
    Needs an additional initialization step
    Must store large initialization data on the ATE

Improvement in CacheCompress
  Store only part of a slice (memory width)
    A default compression method
      Single-mode code of selective encoding
      Many simple slices can be compressed
    Use memory to compress the remaining slices
  Cache-like (memory height)
    Updated during testing
    Dynamic, MRU (most recently used)
    Eliminates memory initialization

Disadvantages of Selective Encoding
  Many specified bits in one slice
    E.g.: 1010 1000 1001 100
    Codeword: 00 0010 11 0000 11 1010 10 0010 11 1001 11 100x
    Causes many group codes
  No specified bits in the slice (N-type)
    E.g.: 0000 0000 0000 000
    Codeword: 00 1111
    A whole codeword is spent on one slice
    In fact a single bit is enough, so bandwidth is wasted

Improving Selective Coding
  Key idea
    Use the wasted bandwidth to write the dictionary in advance for future reading
  Many specified bits in the slice
    A much wider segment instead of a group
    Read the segment from the dictionary
    Only one codeword for several groups
  No specified bits in the slice (N-type)
    One bit is for the default value
    The wasted bits are used for writing the dictionary

Compression Parameters & Slice Division
  Given parameters
    C: # of bits per slice
    W: word width
  Segment (read and write word)
    [C/W] groups
  [Figure: slice bits 0-15 divided into Segment 0 and Segment 1]

CacheCompress Decoder Structure
  [Figure: block diagram of the decoder. The W-bit codeword enters the Controller, which controls a Single Sub-decoder (default value, index), a Read Sub-decoder and a Write Sub-decoder (default value). The dictionary memory is driven by an address and a write enable (we) and holds S*G-bit segment content. A Scan Slice Selector assembles the C-bit scan slice, which is then shifted.]

An Example (C=15, W=8)

  Original scan slice   Code       Decompressed slice   Shift   Type     Details
  00xxx0xx xxxxxxxx     110 0010   000000000000000              Write    default=0, write address=2
  xxxxxx00 0xxxxxxx     110 0101   000000000000000              Write    default=0, write content=0101
  11xxx011 xxxxxxxx     011 0101   111110111111111              Single   default=1, specified bit index=5
  11111xxx xxxxxxxx     111 0011   111111111111111              Write    default=1, write content=0011
                                                                         Do write: Mem[2]=01010011
  01010011 1111111      011 1111   111111111111111              Single   default=1, no specified index
                        100 0010   010100111111111              Read     seg index=0, read address=2, content=01010011

  Dictionary after the write: Mem[2]=01010011
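
The example can be replayed with a small Python sketch of the decoder. The slides do not spell out the control-code format, so the decoding below is an assumption inferred from this one example: the first two control bits select the mode (01 single, 11 write, 10 read), the third bit is the default value for single and write codewords and the segment index for read codewords. The function name and data layout are also illustrative.

```python
def decode_cachecompress(codewords, slice_width=15, word_width=8):
    """Replay (control, data) codewords; return (slices, dictionary memory)."""
    memory = {}           # dictionary: address -> word (string of bits)
    slices = []
    current = None        # slice being built
    write_address = None  # address announced by the first write codeword
    write_buffer = ""     # content nibbles collected so far
    for control, data in codewords:
        mode, flag = control[:2], control[2]
        if mode in ("01", "11"):
            # Single and write codewords both start a slice of default bits.
            if current is not None:
                slices.append("".join(current))
            current = [flag] * slice_width
            if mode == "01":
                index = int(data, 2)
                if index < slice_width:  # otherwise: no specified bit
                    current[index] = "0" if flag == "1" else "1"
            elif write_address is None:
                write_address = int(data, 2)
            else:
                write_buffer += data
                if len(write_buffer) == word_width:
                    memory[write_address] = write_buffer  # "do write"
                    write_address, write_buffer = None, ""
        elif mode == "10":
            if current is None:
                raise ValueError("read codeword before any slice")
            # Read: overwrite one segment of the current slice from memory.
            word = memory[int(data, 2)]
            start = int(flag) * word_width
            for offset, bit in enumerate(word):
                if start + offset < slice_width:
                    current[start + offset] = bit
        else:
            raise ValueError("control code not covered by this sketch")
    if current is not None:
        slices.append("".join(current))
    return slices, memory


example = [("110", "0010"), ("110", "0101"), ("011", "0101"),
           ("111", "0011"), ("011", "1111"), ("100", "0010")]
slices, memory = decode_cachecompress(example)
print(slices)
print(memory)
```

Under these assumptions the three N-type slices cost nothing extra: their otherwise idle data bits carry the address and the two halves of the word 01010011, which the last slice then reads back with a single codeword.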

Encoding Algorithm
  S-type
    One single-mode codeword
  N-type
    One single-mode codeword
    Record for possible future writes
  M-type
    One single-mode codeword for the default value
    Encode each specified segment (specified bits > 1)
      Search for a suitable word in the dictionary
      If not found
        Use the LRU replacement policy to overwrite one entry
        Convert single-mode codewords for N-type slices to write-mode
        A read-after-write race should be avoided
      Encode as a read-mode codeword
    For other uncovered bits
      Encode as single-mode codewords for correction
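
The dictionary search and LRU replacement step can be sketched as follows. This is a software model for illustration only: the class name `SegmentCache`, the compatibility test (a stored word matches when it agrees on every specified bit), and filling don't-cares with a default value on a miss are assumptions, not details taken from the slides.

```python
from collections import OrderedDict


def compatible(segment, word):
    """True if the word agrees with every specified (non-'x') segment bit."""
    return all(s == "x" or s == w for s, w in zip(segment, word))


class SegmentCache:
    """Encoder-side model of the dictionary with LRU replacement."""

    def __init__(self, num_words):
        self.num_words = num_words
        self.entries = OrderedDict()  # address -> word, least recent first

    def lookup(self, segment):
        """Return the address of a compatible word, or None on a miss."""
        hit = None
        for address, word in self.entries.items():
            if compatible(segment, word):
                hit = address
                break
        if hit is not None:
            self.entries.move_to_end(hit)  # mark as most recently used
        return hit

    def insert(self, segment, default="0"):
        """Store a segment after a miss and return the address written."""
        word = segment.replace("x", default)
        if len(self.entries) < self.num_words:
            address = len(self.entries)  # next free entry
        else:
            address, _ = self.entries.popitem(last=False)  # evict LRU entry
        self.entries[address] = word
        return address


cache = SegmentCache(num_words=2)
print(cache.insert("0101xx11"))  # stored as 01010011
print(cache.lookup("01x1001x"))  # compatible with the stored word
```

In the scheme described above, a hit would be encoded as a read-mode codeword, while a miss additionally requires earlier N-type codewords to be converted to write-mode so that the word is in memory before it is read.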

Sections Introduction Our proposed methods Experimentation Conclusion

Hardware Implementation

  Gate count or reg count      C=255, W=64   C=1023, W=64   C=1023, W=128
  Reuse memory       gates     1050          3472           3752
                     regs      330           1097           1163
  Not reuse memory   gates     3974          6396           16284
                     regs      2378          3145           9355

  Hardware implementation
    Verilog HDL (parameterized C, W)
    Passes simulation in VCS
  Hardware overhead
    Design Compiler under the TSMC13 process
    About 3-4 gates per scan chain (memory reuse)
    1% area overhead
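
The "3-4 gates per scan chain" figure can be checked against the memory-reuse gate counts. The short Python sketch below assumes that C equals the number of scan chains (one slice bit per chain); the names are illustrative.

```python
# Gate counts with memory reuse, keyed by (C, W), taken from the table above.
REUSE_GATES = {(255, 64): 1050, (1023, 64): 3472, (1023, 128): 3752}


def gates_per_chain(table):
    """Divide each gate count by C, assumed to be the number of scan chains."""
    return {config: gates / config[0] for config, gates in table.items()}


for (c, w), ratio in gates_per_chain(REUSE_GATES).items():
    print(f"C={c}, W={w}: {ratio:.2f} gates per chain")
```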

Benchmark Circuits

  Circuit   Cells    Scan cells   Vectors   TD            X ratio (%)
  CCT1      94757    11675        634       7,401,950     96.11
  CCT2      183743   20652        2396      49,482,192    96.45
  CCT3      248518   29057        2355      68,429,235    98.17
  CCT4      293265   31239        2439      76,191,921    98.41
  CCT5      286205   33465        3658      122,414,970   99.09
  CCT6      520342   67569        6385      431,428,065   99.42

  Industrial SoCs from MPRC
    All silicon proven
  Test patterns
    Generated by Synopsys TetraMAX
    After dynamic compaction

Compression Result

                  SE [ITC 05]              DC [ITC 04]                               CacheCompress
  Circuit  C      TE           Cycle       TE           Cycle     Size        W     TE          Cycle     Size   Hit rate (%)
  CCT1     255    1,053,680    106,002     893,776      29,798    339,660     64    778,393     71,937    2048   84.47
  CCT2     1023   3,035,352    255,342     3,963,168    52,712    2,856,216   64    2,345,096   182,788   2048   83.02
  CCT3     1023   4,024,536    337,733     3,935,184    70,650    2,432,694   64    3,305,809   256,648   2048   81.90
  CCT4     1023   5,338,008    447,273     4,613,730    78,048    2,950,332   64    3,917,108   303,755   2048   87.68
  CCT5     1023   5,123,880    430,648     5,074,080    124,372   2,418,372   64    4,165,993   324,119   2048   85.56
  CCT6     1023   12,096,432   1,014,421   14,786,513   434,180   4,947,228   128   9,931,935   770,380   8192   83.61

  Test data size: 30% reduced in all cases
  Testing time: 27% faster than SE
  Memory size: 100-1000+ times smaller than DC
  Average hit rate: 84%
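
The testing-time, memory-size and hit-rate summaries can be recomputed from the table with a few lines of Python. The variable and function names are illustrative; the numbers are copied from the Cycle, Size and Hit rate columns above.

```python
# circuit: (SE cycles, DC memory size, CacheCompress cycles,
#           CacheCompress memory size, CacheCompress hit rate in %)
RESULTS = {
    "CCT1": (106002, 339660, 71937, 2048, 84.47),
    "CCT2": (255342, 2856216, 182788, 2048, 83.02),
    "CCT3": (337733, 2432694, 256648, 2048, 81.90),
    "CCT4": (447273, 2950332, 303755, 2048, 87.68),
    "CCT5": (430648, 2418372, 324119, 2048, 85.56),
    "CCT6": (1014421, 4947228, 770380, 8192, 83.61),
}


def summarize(results):
    """Return (average cycle reduction vs. SE in %, memory ratios, mean hit rate)."""
    cycle_cuts = [100.0 * (1 - cc / se) for se, _, cc, _, _ in results.values()]
    memory_ratios = [dc / cc_mem for _, dc, _, cc_mem, _ in results.values()]
    hit_rates = [hit for *_, hit in results.values()]
    return (sum(cycle_cuts) / len(cycle_cuts),
            memory_ratios,
            sum(hit_rates) / len(hit_rates))


avg_cut, ratios, avg_hit = summarize(RESULTS)
print(f"average cycle reduction vs. SE: {avg_cut:.1f}%")
print(f"DC memory / CacheCompress memory: {min(ratios):.0f}x to {max(ratios):.0f}x")
print(f"average hit rate: {avg_hit:.2f}%")
```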

Ratio Improvement Analysis
  Codeword for N-type -> write-mode
    Makes full use of the codeword bandwidth
  Word reuse
    84% hit rate means every word will be read 7 times on average
    Data is provided only the first time
    The subsequent 7 complex segments benefit from reading

Sections Introduction Our proposed methods Experimentation Conclusion

Conclusion
  Contribution
    Combines another technique with dictionary coding
    Dynamic, cache-like
    Eliminates memory initialization
  Result
    Much smaller dictionary
    Higher compression ratio