Efficient Parallel Scan Test Technique for Cores on AMBA-based SoC


JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.14, NO.3, JUNE, 2014
http://dx.doi.org/10.5573/jsts.2014.14.3.345

Jaehoon Song, Jihun Jung, Dooyoung Kim, and Sungju Park

Manuscript received Feb. 13, 2014; accepted Apr. 17, 2014
Dept. of Computer Science & Engineering, Hanyang University ERICA Campus, Ansan-si, Gyunggi-do, Korea
E-mail : {jhsong, bbatte, dykim, parksj}@mslab.hanyang.ac.kr
This research was supported in part by the National Research Foundation of Korea (NRF) grant (MEST) (No. NRF-2013R1A1A2059326).

Abstract: Today's System-on-a-Chip (SoC) is designed with reusable IP cores to meet short time-to-market requirements. However, the increasing cost of testing has become a major burden in manufacturing a highly integrated SoC. In this paper, an efficient parallel scan test technique is introduced to minimize the test application time. Multiple scan enable signals are adopted in the scan architecture so that the test patterns scheduled for concurrent scan test can be applied in the optimal test application time. Experimental results show that testing times are considerably reduced with little area overhead.

Index Terms: AMBA, system-on-a-chip, scan test, IEEE 1500, parallel test, test time

I. INTRODUCTION

As deep-submicron process technology advances, it has become possible to design and manufacture a System-on-a-Chip (SoC) composed of various intellectual property (IP) cores while meeting short time-to-market requirements. Although design time can be reduced by reusing IP cores, testing time increases significantly because of the high complexity of the SoC. Improving test quality while keeping testing costs low has become crucial for survival in the emerging silicon market. Modular test techniques can be used to test the IP cores embedded in an SoC effectively.
To apply test patterns to an SoC and observe the responses, a Test Access Mechanism (TAM) and test wrappers have to be provided to establish test paths [1-4]. Automatic Test Equipment (ATE) is used as the source and sink of test patterns, and either IEEE 1500 or a customized interface is adopted as the test wrapper [5]. There is no standard specification for designing the TAM: a dedicated TAM may be adopted [6, 7], or a functional bus can be reused as the TAM in test mode [8-10].

In sequential test, the cores are tested one at a time and, regardless of its characteristics, each core is given the full TAM width of the SoC. In parallel test, several cores are tested simultaneously. The TAM is partitioned into several sub-TAMs, and each core is assigned to a sub-TAM so that the parallel test schedule, and hence the optimal test time, can be realized [6-13]. A hierarchical and scalable TAM, which provides flexible access for multi-core chips, has also been developed [14]. To reduce test cost, multi-site testing methods are also used at the ATE level.

The testing time for each core depends on the length of its scan chains. In both sequential and parallel testing, the scan chains of each core have to be reconfigured according to the TAM width, so the chains have to be well balanced to minimize scan testing time [11]. In general, as the TAM width assigned to a core increases, the length of its scan chains decreases, which reduces testing time. When scheduling scan test patterns, parallel scan test uses otherwise unused TAM width to provide more scan chains and thus to minimize scan chain length.
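To make the relation between TAM width and test time concrete, the sketch below uses the scan test time estimate that is common in wrapper design work, T = (1 + max(s_i, s_o)) × p + min(s_i, s_o), where s_i and s_o are the longest scan-in and scan-out chain lengths and p is the number of patterns. This formula is an assumption brought in for illustration and is not stated in this paper; the flip-flop and pattern counts are hypothetical, and the chains are assumed to be perfectly balanced.

```python
import math


def scan_test_time(scan_in_len: int, scan_out_len: int, patterns: int) -> int:
    """Estimate scan test cycles: shift and capture per pattern, plus the final shift-out."""
    longest = max(scan_in_len, scan_out_len)
    shortest = min(scan_in_len, scan_out_len)
    return (1 + longest) * patterns + shortest


def balanced_chain_length(flip_flops: int, tam_width: int) -> int:
    """Length of the longest chain when flip-flops are spread evenly over tam_width chains."""
    return math.ceil(flip_flops / tam_width)


if __name__ == "__main__":
    flip_flops, patterns = 1200, 100  # hypothetical core, not taken from the paper
    for width in (4, 8, 16, 32):
        length = balanced_chain_length(flip_flops, width)
        cycles = scan_test_time(length, length, patterns)
        print(f"TAM width {width:2d}: chain length {length:3d}, test time {cycles} cycles")
```

Under these assumptions, doubling the TAM width roughly halves the chain length and therefore the test time, which is the effect parallel scheduling exploits when it hands unused TAM wires to a core.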

Multiple scan enable signals have been adopted to improve delay fault coverage [15, 16], but their use for the optimal application of scheduled parallel test patterns has not been addressed. In this paper, an efficient parallel scan test technique is introduced to apply optimally scheduled scan test patterns to cores embedded in an AMBA-based SoC, where the functional AMBA bus is reused as the scan test channel.

The paper is organized as follows. Section 2 gives a brief description of test interfaces for AMBA-based SoCs. Section 3 describes the IEEE 1500 wrapped core design. Our AMBA-based parallel scan test technique is explained in Section 4. Experimental results are given in Section 5, followed by concluding remarks in Section 6.

II. TEST INTERFACES FOR AMBA-BASED SOC

A conventional AMBA-based system comprises an Advanced High-performance Bus (AHB) and an Advanced Peripheral Bus (APB), as shown in Fig. 1 [17]. The AHB and APB have separate read and write data buses for on-chip transactions. The AHB interfaces high-speed cores such as a microprocessor, and the APB interfaces peripherals that have low bandwidth and do not require the high performance of a pipelined bus interface [8]. An AHB-APB bridge must be adopted to reconcile the different speeds and protocols of the AHB and APB. The TIC IP core in Fig. 1 is a test interface controller for the AMBA-based system that performs basic read/write transactions as an AMBA bus master [17]. To access memory modules outside an AMBA-based SoC, the unidirectional 32-bit address pins and bidirectional 32-bit data pins of the External Bus Interface (EBI) are generally used. By using the functional buses of the EBI and AMBA as test buses, no additional Test Access Mechanism is required.

Fig. 1. Example of AMBA system with the TIC.
In general, scan test patterns are applied and observed simultaneously. When the TIC is used as the test controller, however, the bidirectional TBUS is shared by READ and WRITE operations, so scan-in and scan-out cannot be performed simultaneously. As shown in Fig. 2, a MUX is therefore adopted in test mode to use the EBI address pins as scan-in channels and the data pins as scan-out channels [8]. This paper also adopts the structural test architecture of Fig. 2, where each core is tested through multiple scan chains.

Fig. 2. Enhanced TIC and EBI structure.

III. IEEE 1500 WRAPPED CORES

The IEEE 1500, an industry standard for core test, provides a test interface between core developers and users. Although the core test wrapper, which consists of the Wrapper Instruction Register (WIR), Wrapper Bypass Register (WBY) and Wrapper Boundary Register (WBR), is standardized, the test controller and TAM are defined by users. The TAM takes test patterns from the external test device and transfers them to the core test wrappers, and conversely transfers test responses to the external test device. Two test access terminals are defined: the mandatory Wrapper Serial Port (WSP) and the optional Wrapper Parallel Port (WPP). Unlike the IEEE 1149.1, in which TAP control is defined as part of the standard, the test control logic of the IEEE 1500 is supposed to be customized by SoC integrators. In general, however, the IEEE 1149.1 TAP controller becomes the main control source for the embedded core test. This paper follows the IEEE 1500 standard in testing cores.

The WSP for core test uses the Test Data Input (TDI) and Test Data Output (TDO) of IEEE 1149.1, as shown in Fig. 3. The wrapper registers for each core, shaded in Fig. 3, are located at the user register of IEEE 1149.1 and are chosen by the user-specified instruction PROGRAM_WIR.

Fig. 3. Data registers of IEEE 1149.1.

AMBA is used as the WPP test path in parallel scan test, as shown in Fig. 4, where the patterns are applied through HWDATA and the responses are observed through HRDATA. The core input wrapper and output wrapper are connected to HWDATA and HRDATA respectively according to the TAM width determined by test scheduling, and the test paths are established by executing the WP_INTEST instruction. The clock operating each core wrapper is generated from Test Core Select (SN in Fig. 4), the Stop signal used to change the test target (Stop in Fig. 4), and WRCK.

Fig. 4. Wrapper connection for parallel testing.

The JTAG TAP controller [18] is mainly used as the test controller, and one of its control signals, SelectWIR, is shown in Fig. 5 for each TAP state. After the PROGRAM_WIR instruction is loaded into the JTAG instruction register, SelectWIR becomes high. After any JTAG instruction other than PROGRAM_WIR is loaded, or when the TAP state moves to Run-Test/Idle, SelectWIR becomes low. That is, while SelectWIR is high, the instruction register chain of the test target core is connected to the TDI-TDO path; otherwise the data register chain is connected.

Fig. 5. Change in SelectWIR signal for each TAP state.

IV. AMBA BASED PARALLEL SCAN TEST TECHNIQUE

1. Parallel Test Scheduling

The goal of test scheduling is to minimize testing time under the constraints of the TAM.
Efficient test scheduling is especially crucial in parallel core test. This paper adopts TR-ARCHITECT [3], a parallel test scheduler that optimizes the SoC test wrappers and TAM simultaneously and consists of four stages: CreateStartSolution, OptimizeBottomUp, OptimizeTopDown, and Reshuffle. As shown in Fig. 6, it takes the SoC information and user-specified values (core type, schedule type, TAM type, and SoC TAM width) and then optimizes the SoC test architecture. Applying the test schedule determines the TAM width allocated to each core, the cores tested simultaneously, the test order of the cores, the wrapper architecture optimized for each core, and the SoC testing time.

Given a TAM width of w, the WrapperDesign(m,w) function determines an optimized wrapper architecture that achieves minimal test time for core m. The internal scan chains are evenly partitioned over the TAM wires, and input/output wrapper cells are allocated. In this paper the WrapperDesign function implements this TAM chain partition by adopting the technique in [12].
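The even partition of internal scan chains over TAM wires can be sketched with a largest-first greedy assignment. This is a common balancing heuristic used here for illustration only; it is not claimed to be the exact procedure of [12] or of WrapperDesign, it ignores the wrapper input/output cells, and the function names and chain lengths are hypothetical.

```python
import heapq


def partition_scan_chains(chain_lengths, tam_width):
    """Assign internal scan chains to tam_width wrapper chains, longest chain first.

    Each chain goes to the wrapper chain that is currently shortest, which keeps
    the wrapper chains balanced and the longest one (the test-time bottleneck) short.
    """
    # Min-heap of (current total length, wrapper chain index).
    heap = [(0, index) for index in range(tam_width)]
    heapq.heapify(heap)
    groups = [[] for _ in range(tam_width)]
    for length in sorted(chain_lengths, reverse=True):
        total, index = heapq.heappop(heap)
        groups[index].append(length)
        heapq.heappush(heap, (total + length, index))
    return groups


def longest_wrapper_chain(groups):
    """Shift length that determines the scan time of the wrapped core."""
    return max(sum(group) for group in groups)


if __name__ == "__main__":
    internal_chains = [50, 40, 30, 30, 20, 10]  # hypothetical internal chain lengths
    groups = partition_scan_chains(internal_chains, tam_width=3)
    print(groups, longest_wrapper_chain(groups))
```

With the hypothetical lengths above, three TAM wires yield three wrapper chains of length 60 each, so no wire sits idle while another is still shifting.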

Fig. 6. Execution flow of TR-ARCHITECT parallel test scheduler.

After wrapper cells are designed for each core, the TestTime(r,w) function determines the total test time for the cores connected to TAM r with a width of w by summing the individual test times.

Observation: If the scan chain lengths of all cores in an SoC are the same, a single Scan Enable (SE) signal may be sufficient to apply scan test patterns concurrently.

Justification: If the scan chain lengths of the cores tested concurrently differ, a single SE cannot provide scan shift and capture operations for different cores at the same time. For concurrent scan test, either multiple SEs must be used, or the scan shift must be paused and resumed with a single SE. Scan test time with a single SE can be estimated by the following equation:

$$\mathrm{Test\_Time\_Single\_SE} = \mathrm{Parallel\_Test\_Time} + \sum(\mathrm{Core\_Test\_Patterns}) \times \mathrm{pause\_resume\_cycles} \qquad (1)$$

The scan test time expected by the test scheduler is denoted Parallel_Test_Time. If a single SE is adopted, whenever the scan shift operation of one concurrently tested core finishes, the other concurrently tested cores must pause their shift operations. Because the scan chain lengths of the concurrently tested cores are not the same, extra test time, denoted pause_resume_cycles, is needed for each test pattern of each core. Therefore Σ(Core_Test_Patterns) × pause_resume_cycles additional cycles are needed with a single SE.

Fig. 7 shows a parallel test schedule obtained for three TAMs (TAM1, TAM2, TAM3) and five target cores (CoreA, CoreB, CoreC, CoreD, CoreE). If the scan chain lengths of the target cores tested concurrently differ, the test time of 930 scheduled for TAM1 cannot be achieved with a single global Scan Enable.

Fig. 7. Result of test scheduling.
Suppose the multiple scan chains in each core are balanced to lengths of {20, 25, 40, 50, 100}, with {34, 15, 20, 45, 24} test patterns for the five cores respectively. First the three cores {CoreA, CoreB, CoreC} are tested concurrently, followed by {CoreA, CoreB, CoreD} and {CoreA, CoreD, CoreE}. If a single SE is used, then after CoreA, whose chain length is 20, finishes scan shifting, CoreB and CoreC, whose chain lengths are 25 and 40 respectively, must stop scan shifting so that CoreA can enter capture mode. After the capture state, all three cores resume scan shift. Whenever any core needs to be in the capture state, the other concurrently tested cores whose scan chain lengths differ must pause and then resume scan shift. By Eq. (1), with pause_resume_cycles = 2 for example, (34+15+20+45+24) × 2 = 276 cycles are needed in addition to the 930 clock cycles for concurrent test with a single SE.

P22810, a large industrial design in the ITC'02 benchmarks [19], includes 28 cores, of which 21 are designed with scan chains. The average scan chain length per core ranges from 26 to 370, the number of scan chains from 1 to 21, and the number of test patterns from 1 to 785. The scan chains of different cores are unlikely to be balanced against one another; hence, with a single SE, more than 10,000 extra test cycles are needed in addition to the scheduled parallel test time to apply about 3,000 total test patterns.
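The arithmetic of Eq. (1) for this five-core example can be reproduced in a few lines. The function and variable names below are hypothetical; the scheduled time of 930 cycles, the pattern counts, and the pause/resume cost of 2 cycles are the values quoted in the text.

```python
def single_se_test_time(parallel_test_time, patterns_per_core, pause_resume_cycles):
    """Evaluate Eq. (1): scheduled time plus one pause/resume penalty per pattern per core."""
    extra_cycles = sum(patterns_per_core) * pause_resume_cycles
    return parallel_test_time + extra_cycles, extra_cycles


if __name__ == "__main__":
    patterns = [34, 15, 20, 45, 24]  # CoreA .. CoreE
    total, extra = single_se_test_time(930, patterns, pause_resume_cycles=2)
    print(extra, total)  # 276 1206
```

With a dedicated SE per sub-TAM the penalty term disappears, and the scheduled 930 cycles become achievable.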

2. Core Selection Technique in Realizing Parallel Test

Since more than one core is tested concurrently in parallel testing, the target cores to be tested must be chosen. This paper proposes a core selection scheme using the TIC and a Test Mode Core Selector (TMCS), as shown in Fig. 8. The TMCS module generates, through the TIC, a signal notifying each core that it has become a target. The TMCS consists of AND gates and C_n flip-flops, one for each target core. Each flip-flop is mapped one-to-one to a core, and a value of one indicates that the corresponding core is a target. The flip-flops are updated with the HWDATA value on HCLK only when the EN_TMCS signal is active. While HCLK and HWDATA are AMBA signals, the EN_TMCS signal is generated by the AHB decoder, as shown in Fig. 9. When the device address of the TMCS is loaded onto the AMBA bus, the AHB decoder activates EN_TMCS. The original function of the AHB decoder is to decode the bus address to generate the slave selection signals HSELx, as shown in Fig. 9(a); this decoder is slightly modified to take the TMCS signal and generate EN_TMCS, as in Fig. 9(b). If modifying the decoder is prohibited, one of the HSELx signals, which are not used in test mode, can serve as EN_TMCS by assigning one slave address to the TMCS.

Three steps are needed to select a test target core. Each step takes one clock cycle, so three clock cycles are required.

STEP 1: The TMCS address is loaded onto the AMBA bus under TIC control.
STEP 2: The TIC generates a Write transaction, which changes the TMCS flip-flop contents to activate the selection signal of the test target core.
STEP 3: After core selection, an address other than that of the TMCS is loaded onto the AMBA bus so that test vectors can be applied with Write transactions.
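The selection steps above can be sketched as a small behavioural model of the TMCS register. It is an illustration rather than the circuit of Fig. 8: the class and method names are hypothetical, bit i of HWDATA is assumed to map to core i, and EN_TMCS is abstracted to a boolean so that its actual polarity does not matter here.

```python
class TestModeCoreSelector:
    """Behavioural model of the TMCS: one select flip-flop per core."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.select = [0] * num_cores  # 1 means the core is a current test target

    def clock(self, en_tmcs_active, hwdata):
        """Model one HCLK edge: latch HWDATA bits only while EN_TMCS is active."""
        if en_tmcs_active:
            self.select = [(hwdata >> bit) & 1 for bit in range(self.num_cores)]
        return list(self.select)

    def selected_cores(self):
        return [core for core, bit in enumerate(self.select) if bit]


if __name__ == "__main__":
    tmcs = TestModeCoreSelector(num_cores=5)
    # STEP 1 and STEP 2: the TMCS address is on the bus and the TIC writes the select word.
    tmcs.clock(en_tmcs_active=True, hwdata=0b00111)
    # STEP 3: a non-TMCS address is on the bus, so test data does not disturb the selection.
    tmcs.clock(en_tmcs_active=False, hwdata=0b11111)
    print(tmcs.selected_cores())  # [0, 1, 2]
```

Gating the update with EN_TMCS is what lets the same HWDATA wires carry both the selection word and the subsequent test vectors.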
This section has described the technique for selecting target cores for parallel test.

Fig. 8. Select cores in test mode.

Fig. 9. Generation of EN_TMCS signal using AHB decoder.

3. Stop Generator to Temporarily Stop Current Test

In our technique the TIC is used both to apply/observe the test patterns and to select the target core, but these two processes cannot be performed simultaneously. To avoid disturbing the current test process when the target core is changed, the current test must be temporarily stopped. This paper introduces Stop Generator logic, which provides a STOP signal (active low) that stops the current test process for 3 clock cycles. The logic and timing diagram of the Stop Generator are shown in Figs. 10 and 11 respectively. Initially the three flip-flops in the Stop Generator hold the value 1, and this value becomes 0 when the TMCS address is applied (TREQA=1, TREQB=1, EN_TMCS=0) to change the target core. The STOP signal stays at 0 until all the flip-flops again hold the logic value 1.

Fig. 10. Generation logic of the STOP signal.

Fig. 11. Timing diagram of the STOP generator.

4. AMBA Architecture as the TAM for Parallel Testing

When the AMBA is used as the TAM, it must provide test paths for applying and observing test patterns while preserving the AMBA standard. To aid understanding, consider a 32-bit-wide bus and assume that the TIC is in test mode. The test schedule shown in Fig. 7 is applied to the cores connected to this bus. The ScanTestMode signal is generated by the WP_INTEST instruction. The structure shown in Fig. 12 is used as the path for applying test patterns on the AMBA bus. In test mode the TIC acts as the AHB master and generates Write transactions (HWDATA) for the test patterns loaded on the TBUS. Since the TIC owns the bus in test mode, the HWDATA connected to the TIC is broadcast through the Write Data MUX. No modification of the HWDATA architecture is needed, and cores that belong to the same TAM according to the test schedule are connected to the same bits of HWDATA.

Fig. 12. Application of the test vectors in parallel.

The path used to observe the test responses is shown in Fig. 13. Whenever test patterns are shifted in, the responses are automatically shifted out to HRDATA; hence, unlike the test application step, no additional Read transaction through the TIC is required to observe the test responses. A minor modification is needed to reconfigure HRDATA into a 32-bit HRDATA using a MUX. The reconfigured HRDATA is connected to the extended input/output ports through EBIDATAOUT and EBIEXTADDROUT when the ScanTestMode signal is activated.

Fig. 13. Observation path of the test vectors.

In the AMBA bus standard, the current slave state is reported to the master through the HRESP[1:0] and HREADY signals when a Write transaction is activated. So that the master TIC can keep performing the Write transactions used to apply and observe test patterns, HREADY=1 and HRESP[1:0]=00 must both be returned while the ScanTestMode signal is active, as shown in Fig. 14. HREADY=1 signals the end of a bus transfer so that the next transfer can start, and HRESP[1:0] reports the transfer status, where the value 00 indicates a successful transfer.

Fig. 14. AMBA response signal of the parallel testing.

So far only cores connected to the AHB have been considered, but cores connected to the APB, mostly for low-power operation, have to go through the AHB-APB bridge. The bridge, which requires 2 extra HCLK cycles to hold the data, adds complexity to controlling the parallel scan test. We propose a new bypass technique using a MUX, shown in Fig. 15, which does not require any extra clock cycles.

Fig. 15. Test path between AHB to APB.

5. Scan Control Signal for Parallel Test

In this paper scan design is used for the structural testing of cores embedded in an SoC, so a scan control signal known as Scan Enable has to be adopted. To achieve the test application time expected by the parallel test scheduling algorithm, this paper uses a dedicated Scan Enable signal for each sub-TAM. Multiple Scan Enable signals are therefore used, corresponding to the number of target cores being tested concurrently.

With a single scan enable (SE) signal, the scan architecture cannot implement the parallel test schedule effectively, resulting in a lengthy test application time. A structure for parallel scan test with a single SE is shown in Fig. 16. Since the target cores being tested concurrently do not all have the same scan chain length, they cannot all be in the same scan state; some cores need to be in the capture state while others are still shifting. Therefore, if only a single SE is available, scan shift operations on the cores in the shift state must be stopped until capture has completely finished for the cores in the capture state.

Fig. 16. Concurrent scan testing with a single SE.
Double capture is necessary for delay testing; thus, 1 or 2 clock cycles are assumed for stopping the cores in scan shift mode. Simple experiments were performed to analyze the difference in test application time between a single Scan Enable and multiple Scan Enable signals. For the ITC'02 benchmarks, a TAM of 32-bit width is used for TR-ARCHITECT parallel scheduling. Table 1 shows that the test application time increases by up to 24.35% under the 1-clock assumption and by up to 66.61% under the 2-clock assumption. A structure for parallel scan test with multiple SEs is shown in Fig. 17. Rather than adding external or internal signals, functional inputs are reused as Scan Enable, so no extra pin is needed to implement multiple SE signals in scan test mode, as shown in Fig. 18.
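The increase ratios quoted from Table 1 follow directly from the cycle counts. The short check below recomputes them for the u226 row of Table 1 (10665 cycles with multiple SEs, 13262 and 17769 cycles with a single SE under the 1-cycle and 2-cycle assumptions); the helper name is hypothetical.

```python
def single_se_increase(multi_se_cycles, single_se_cycles):
    """Return the extra cycles of a single SE and the increase as a percentage."""
    extra = single_se_cycles - multi_se_cycles
    return extra, round(100.0 * extra / multi_se_cycles, 2)


if __name__ == "__main__":
    # u226 row of Table 1
    print(single_se_increase(10665, 13262))  # 1-cycle pause assumption
    print(single_se_increase(10665, 17769))  # 2-cycle pause assumption
```

The ratio is measured against the multiple-SE time, so SoCs with many short patterns, such as u226, are penalized most by a single SE.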

Fig. 17. Concurrent scan testing with multiple SEs.

Fig. 18. Reusing functional input as Scan Enable.

Table 1. Increase in concurrent test application time with using a single SE compared to multiple SEs
SoC | Multiple SEs | # of SEs | Single SE, 1 cycle | Single SE, 2 cycle | Increase, 1 cycle | Ratio (%), 1 cycle | Increase, 2 cycle | Ratio (%), 2 cycle
u226 | 10665 | 4 | 13262 | 17769 | 2597 | 24.35 | 7104 | 66.61
d281 | 4084 | 4 | 4504 | 4740 | 420 | 10.28 | 656 | 16.06
h953 | 119357 | 3 | 120103 | 120847 | 746 | 0.63 | 1490 | 1.25
g1023 | 16855 | 5 | 18969 | 21095 | 2114 | 12.54 | 4240 | 25.16
f2126 | 335334 | 3 | 335967 | 336602 | 633 | 0.19 | 1268 | 0.38
q12710 | 2222349 | 4 | 2224426 | 2226503 | 2077 | 0.09 | 4154 | 0.19
t512505 | 5268868 | 3 | 5276109 | 5283344 | 7241 | 0.14 | 14476 | 0.27
a586710 | 22475033 | 3 | 24600198 | 26725361 | 2125165 | 9.46 | 4250328 | 18.91

V. EXPERIMENTAL RESULTS

Our parallel scan technique is compared with sequential scan on an AMBA-based SoC. Cores from the ITC'02 test benchmarks [19] are connected to an AMBA bus whose width W_max ranges from 16 to 64. The results for sequential scan are taken from [8], where the TAM using the AMBA bus is not partitioned; instead the whole TAM width is allocated to each core. In the parallel scan test, the TAM is partitioned according to our modified TR-ARCHITECT [12] test scheduling algorithm. The test application times for hard and soft cores are shown in Table 2. The scan structures of soft cores can be reconfigured to achieve optimal test application time. The proposed multiple SEs are configured to compensate for the inefficiency of a single SE, and in general more scan test time is needed for hard cores. Hence, our scheme gives better results for hard cores than for soft cores, as the experimental results verify.

More experiments were performed on a real AMBA-based SoC, shown in Fig. 19; detailed information about its soft cores is given in Table 3.
Test patterns are generated using a commercial CAD tool, and the SoC circuit was synthesized with 0.25 um process libraries. In Table 3 the area is expressed in 2-input NAND gate equivalents, and columns 5, 6, and 7 give the numbers of Primary Inputs, Primary Outputs, and D Flip-Flops. The number of test patterns and the fault coverage are shown in columns 8 and 9 respectively. As with the ITC'02 benchmarks above, a wider TAM yields a larger reduction in test time for parallel test relative to sequential test; see Table 4.

Table 2. Test application times of sequential and parallel with multiple SEs (soft/hard cores)
SoC | W_max | Soft cores: Seq. | Soft cores: Parallel | Soft cores: Red.% | Hard cores: Seq. | Hard cores: Parallel | Hard cores: Red.%
u226 | 16 | 70.3k | 18.7k | 73.4 | 73.2k | 18.7k | 74.5
u226 | 32 | 59.4k | 10.7k | 82.1 | 60.9k | 10.7k | 82.5
d281 | 16 | 15.5k | 8k | 48.2 | 19.3k | 8.2k | 57.7
d281 | 32 | 9.7k | 4.2k | 56.9 | 14.5k | 4.1k | 71.7
h953 | 16 | 82k | 72.5k | 11.5 | 244.5k | 119.4k | 51.2
h953 | 32 | 44.8k | 36.4k | 18.8 | 239.9k | 119.4k | 50.2
g1023 | 16 | 59.9k | 31.6k | 47.2 | 89.8k | 34.5k | 61.6
g1023 | 32 | 36.6k | 16.3k | 55.5 | 72k | 16.9k | 76.6
f2126 | 16 | 338.6k | 324.7k | 4.1 | 592.7k | 372.1k | 37.2
f2126 | 32 | 172.4k | 163.1k | 5.4 | 580k | 335.3k | 42.2
q12710 | 16 | 2M | 1.5M | 24.1 | 7.3M | 2.2M | 69.4
q12710 | 32 | 1M | 766.3k | 25.3 | 6.8M | 2.2M | 67.1
t512505 | 16 | 11M | 10.3M | 6.7 | 23.2M | 10.5M | 54.6
t512505 | 32 | 5.5M | 5.1M | 7.4 | 17.8M | 5.3M | 70.3
a586710 | 16 | 66.5M | 41.9M | 37 | 73.6M | 41.5M | 43.6
a586710 | 32 | 40.7M | 21.1M | 48.3 | 51.3M | 22.5M | 56.2

Fig. 19. AMBA based SoC.

Table 3. Core characteristics of AMBA based SoC
Bus | Core name | Area (no scan) | Area (scan design) | PI | PO | FFs | # of test patterns | Fault coverage (%)
AHB | Leon3 Processor | 41901 | 46303 | 252 | 148 | 1166 | 386 | 99.75
AHB | SDRAM Controller | 3701 | 4115 | 93 | 119 | 212 | 68 | 99.45
AHB | AHB-PCI Bridge | 6364 | 7055 | 40 | 145 | 275 | 79 | 99.92
AHB | Ethernet MAC | 32737 | 35580 | 109 | 243 | 1339 | 485 | 99.99
APB | UART | 9308 | 10523 | 69 | 32 | 524 | 231 | 98.99
APB | GPIO | 4922 | 5107 | 78 | 104 | 96 | 13 | 100.00
APB | RTC | 7566 | 9067 | 47 | 32 | 340 | 130 | 99.99

Table 4. Test application time of sequential and parallel scan test in AMBA based SoC
W_max | Sequential test | Parallel test | Reduction ratio (%)
16 | 114406 | 99455 | 13.07
32 | 61467 | 50122 | 18.46
64 | 35236 | 25467 | 27.72

Although the test application time is reduced in the parallel test, it requires more area overhead than the sequential technique. Standard wrappers are connected to all inputs and outputs in the parallel scan design, whereas the sequential scan in [8] does not connect any test harness to the core outputs. As shown in Table 5, the area overhead needed to realize the parallel scan design is relatively small: the test harness grows by 25.4% in total.

Table 5. Area overhead for AMBA based SoC
Bus | Core name | Test harness for sequential | Test harness for parallel | Increase ratio (%)
AHB | Leon3 Processor | 3975 | 4648 | 16.92
AHB | SDRAM Controller | 2041 | 2643 | 29.45
AHB | AHB-PCI Bridge | 1891 | 2355 | 24.49
AHB | Ethernet MAC | 3027 | 4136 | 36.62
APB | UART | 1271 | 1459 | 14.72
APB | GPIO | 1785 | 2323 | 30.09
APB | RTC | 988 | 1224 | 23.82
Total sum and ratio | | 14978 | 18788 | 25.4

VI. CONCLUSIONS

An efficient parallel scan test technique has been introduced for an AMBA-based SoC. With only a single Scan Enable, the test application time expected by the parallel test schedule cannot be achieved. Multiple Scan Enable signals are therefore used to realize the parallel test schedule. Experiments on hard and soft cores show a significant reduction in test application time with a small area overhead while conforming to the AMBA standards.
Although this technique has been applied to AMBA-based SoCs, it can generally be applied to other types of buses to reduce test cost.

REFERENCES

[1] S. Narayanan, R. Gupta, and M. A. Breuer, "Optimal configuring of multiple scan chains," Computers, IEEE Transactions on, vol. 42, pp. 1121-1131, 1993.
[2] Z. Shanrui, C. Minsu, N. Park, and F. Lombardi, "Cost-driven optimization of fault coverage in combined Built-In Self-Test/Automated Test Equipment testing," in Instrumentation and Measurement Technology Conference, 2004. IMTC 04. Proceedings of the 21st IEEE, 2004, pp. 2021-2026 Vol.3.
[3] S. K. Goel and E. J. Marinissen, "Effective and efficient test architecture design for SOCs," in Test Conference, 2002. Proceedings. International, 2002, pp. 529-538.
[4] Y. Zorian, E. J. Marinissen, and S. Dey, "Testing embedded-core based system chips," in Test Conference, 1998. Proceedings., International, 1998, pp. 130-143.
[5] "IEEE std 1500 Standard for Embedded Core Test," http://grouper.ieee.org/groups/1500/.
[6] J. Aerts and E. J. Marinissen, "Scan chain design for test time reduction in core-based ICs," in Test Conference, 1998. Proceedings., International, 1998, pp. 448-457.
[7] P. Varma and S. Bhatia, "A structured test re-use methodology for core-based system chips," in Test Conference, 1998. Proceedings., International, 1998, pp. 294-302.
[8] S. Jaehoon, M. Piljae, Y. Hyunbean, and P. Sungju, "Design of Test Access Mechanism for AMBA-Based System-on-a-Chip," in VLSI Test Symposium, 2007. 25th IEEE, 2007, pp. 375-380.
[9] L. Chih-Yi and L. Hsing-Chung, "Bus-oriented DFT design for embedded cores," in Circuits and Systems, 2004. Proceedings. The 2004 IEEE Asia-Pacific Conference on, 2004, pp. 561-563 vol.1.
[10] C. Feige, J. T. Pierick, C. Wouters, R. Tangelder, and H. G. Kerkhoff, "Integration of the Scan-Test Method into an Architecture Specific Core-Test Approach," J. Electron. Test., vol. 14, pp. 125-131, 1999.
[11] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design. New York: Computer Science Press, 1991.
[12] E. J. Marinissen, S. K. Goel, and M. Lousberg, "Wrapper design for embedded core test," in Test Conference, 2000. Proceedings. International, 2000, pp. 911-920.
[13] K.-w. Eom, D.-k. Han, Y. Lee, H.-s. Kim, and S. Kang, "Efficient Multi-site Testing Using ATE Channel Sharing," Journal of Semiconductor Technology and Science, vol. 13, pp. 259-262, 2013.
[14] D. K. Bhavsar and S. J. Poehlman, "Test access and the testability features of the Poulson multi-core Intel Itanium processor," in Test Conference (ITC), 2011 IEEE International, 2011, pp. 1-8.
[15] X. Dong, Z. Yang, K. Chakrabarty, and H. Fujiwara, "A Reconfigurable Scan Architecture With Weighted Scan-Enable Signals for Deterministic BIST," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 27, pp. 999-1012, 2008.
[16] Z. Jin-yi, H. Xu-hui, C. Wan-lin, and W. Han-yi, "Improved delay fault coverage in SoC using controllable multi-scan-enable," in Solid-State and Integrated Circuit Technology (ICSICT), 2010 10th IEEE International Conference on, 2010, pp. 1973-1975.
[17] ARM, "AMBA specification (rev. 2.0)," May 1999.
[18] "IEEE Standard Test Access Port and Boundary-Scan Architecture," IEEE std 1149.1.
[19] E. J. Marinissen, V. Iyengar, and K. Chakrabarty. (2002). ITC'02 SOC Test Benchmarks. Available: http://itc02socbenchm.pratt.duke.edu/

Jaehoon Song received the B.S., M.S., and Ph.D.
degrees in computer science and engineering from Hanyang University, Gyunggi-do, Korea, in 2000, 2002, and 2009, respectively. From 2009 to 2013, he worked for TranSono Inc., Seoul, Korea. In 2003, he worked for the System-on-a-Chip (SoC) Design Center at Seoul National University in Korea, where he was on the development staff in charge of platform-based design. His main research interests are in Design-for-Testability (DfT), signal integrity, and low-power design. Mr. Song is a member of the Institute of Electronics Engineers of Korea and the Korea Information Science Society. He received the Best Paper Award from the Korea Test Association at the Korea Test Conference in 2007.

Jihun Jung received the B.S. degree in computer science and engineering from Hanyang University, Gyunggi-do, Korea, in 2010. Since 2010, he has been working toward the M.S. and Ph.D. degrees in computer science and engineering at the same university. His interests include Design for Testability, Memory Test, Memory ECC, 3D SIC, Aging Monitoring, and NoC Design.

Dooyoung Kim received the B.S. and M.S. degrees in computer science and engineering from Hanyang University, Gyunggi-do, Korea, in 2004 and 2006, respectively. From 2006 to 2012, he was with LG Electronics in South Korea as a research engineer in charge of ASIC front-end design. Since 2012, he has been working toward the Ph.D. degree in computer science and engineering at Hanyang University. His interests include Design for Testability, Low Power Test, Test Cost Reduction, 3D SIC, and Reliability.

Sungju Park received the B.S. degree in electronics from Hanyang University, Korea, in 1983 and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Massachusetts at Amherst in 1988 and 1992, respectively. From 1983 to 1986, he was with the Gold Star Company in Korea. From 1992 to 1995, he worked for IBM Microelectronics, USA, as a member of the development staff. Since then, he has been a Professor in the Department of Computer Science and Engineering at Hanyang University, Korea. His research interests lie in the area of VLSI testing, including scan design, built-in self test, test pattern generation, fault simulation, and synthesis of test.