Design of an Infrastructural IP Dependability Manager for a Dependable Reconfigurable Many-Core Processor

Hans G. Kerkhoff and Xiao Zhang
Testable Design and Test of Integrated Systems (TDT) Group
Centre of Telecommunication and Information Technology (CTIT), Enschede, the Netherlands
h.g.kerkhoff@utwente.nl and x.zhang@utwente.nl

Abstract

Reconfigurable many-core processors have many advantages over conventionally designed devices, such as low power consumption and very high flexibility. For an increasing number of safety-critical applications, these processors must offer ultra-high dependability. This paper discusses the design and verification of an infrastructural IP, the Dependability Manager, which handles the most essential dependability issues. Several additional innovative approaches to dependability have been incorporated, such as the NoC, wrapper and Network Interface design. The Dependability Manager design has been verified on an FPGA and is being processed in UMC CMOS technology as part of a many-core processor.

Keywords: dependability, availability, reliability, many-core processors, reconfiguration, DfX, SoC, BIST

1 Introduction

Advances in digital processors are often tied to many-core processors, which use more than one processor IP (dual, quad, etc.) in a processor SoC. To cope with the huge data communication requirements between these cores, the cores are often interconnected by a Network-on-Chip (NoC). If the cores are identical, they are often referred to as tiles. At the same time, these highly complex SoCs are increasingly used in safety-critical applications, for example in the automotive, medical and military arenas. This demands ultra-dependable processor SoCs [1]. Because many tasks have to be performed to accomplish this goal, the design of a dedicated Dependability Manager (DM) is nowadays considered a promising approach. As the DM is not related to a functional task of the SoC, it is referred to as an Infrastructural IP (IIP). This paper deals with the design and verification of a DM for a Reconfigurable Fabric Device (RFD) as being developed within the European CRISP project¹.

The paper is organised as follows. First, the global architecture of the RFD is briefly discussed. It shows a very high regularity in terms of the tiles, interconnected by a NoC. The tile is a reconfigurable pipelined Xentium processor core from Recore Systems with its associated local memories. This high regularity suggests periodic structural testing of these tiles, which is the starting point of our dependability approach. Repair is accomplished via run-time mapping of the application onto the remaining fault-free reconfigurable Xentiums. Next, the environment of the DM in the SoC is explained in more detail, including Network Interfaces (NI), Xentium tile and DM wrappers, and the NoC. The central part of the paper discusses the functional blocks in the DM and their interaction: the test-pattern generator (TPG), the test-response evaluator (TRE) and the controller (FSM). Simulation results, as well as FPGA hardware tests, are shown. Finally, some conclusions are provided.

¹ This research is conducted within the FP7 Cutting edge Reconfigurable ICs for Stream Processing (CRISP) project (ICT-215881) supported by the European Commission.

2 The Dependable Reconfigurable Fabric Device

For many applications, such as beam-forming, the ability to change the functionality of the processing elements in a SoC in real time is an advantage, because it lets the application cope with requirements that change with the actual circumstances. A possible set-up of such a SoC is shown in Figure 1. It consists of many reconfigurable processing tiles, each being a Xentium processor core and its associated local memories, interconnected by a high-performance (wormhole) NoC. The configuration of the individual tiles is taken care of by a General Purpose Device (GPD), which can be on-chip (e.g. an ARM9-based IP) or off-chip. As the RFD is meant to be used for safety-critical applications, the dependability has to be very high. The high degree of regularity, as well as the NoC communication, provides new and innovative ways to guarantee dependability.

Figure 1: Basic setup of a Reconfigurable Fabric Device (RFD) including 64 Xentium tiles [2].

In our case, two attributes [1] are of key importance:
- on-chip detection of stuck-at faults in the tiles and NoC occurring during the device's lifetime, relating to reliability (0.9783 over 15 years), and subsequent repair;
- fast recovery time (10 ms), being the time from the occurrence of a fault up to repair and re-initialization, resulting in a very high availability.

The central point of focus in this paper is the bottom-left IIP in Figure 1, the Dependability Manager [1, 2]. It receives its commands over the NoC from the GPD, which includes a dependability API. As a first step, the NoC is functionally tested by the GPD. The hardware TPG then generates test patterns for the Xentium core, which are distributed via the NoC over three Xentium cores chosen by the GPD. Subsequently, the three test responses are sent via the NoC to the TRE, which compares the results and raises a flag in case of a fault. In the latter case, the GPD starts a run-time remapping operation (software), thereby omitting faulty tiles and/or NoC segments.

3 The Dependability Manager in the RFD SoC

Because the DM communicates via the NoC with the Xentium tiles as well as the GPD, special measures have to be taken. An important condition of our approach is the fault-free behaviour of the NoC. This is taken care of via software running on the GPD, which verifies the functional behaviour of the NoC; this will not be discussed further in this paper. The first subsection discusses the environment of the DM in relation to the tiles, while the second deals with the NI in more detail.

3.1 Environment of the Dependability Manager

Figure 2 shows the most essential parts in the communication between the DM and the Xentium tiles. The NoC is a dedicated design of the packet-switched wormhole type, capable of multicasting and running at 200 MHz. The multicasting is required for providing the test vectors at multiple Xentium locations. The NoC has routers at each crossing, which determine the actual routing of the packet. More detailed information can be found in references [3, 4]. Each scan-based Xentium core has a specially designed wrapper, which is used during normal SoC final testing as well as during its lifetime for accomplishing the dependability scenario. The associated Xentium memories are locally BISTed, and their result is finally OR-ed with the final scan result (OK-NOK). The design of the wrappers will be the subject of another paper. The Xentium network interface (NI) has been designed by Recore Systems and will not be treated here either. The Dependability Manager Network Interface (DM-NI), shown in the left-hand IIP, has been specially designed for this purpose and is discussed in detail in the next subsection. The TPG can generate 32-bit test vectors on demand, which are subsequently multicast to three chosen Xentium tiles. The control part (FSM) also sets the Xentium wrappers for the dependability scenario. Because of bandwidth requirements, the test responses are routed to the TRE via two channels. The DM can be configured by the GPD via the NoC, and via a Multi-Channel Port (MCP) in case the GPD is off-chip.

Figure 2: Essential parts of the DM communication in the RFD. The DM wrappers have been omitted.

3.2 Network Interface of the DM

As shown in Figure 2, the network interface (DM-NI) is an essential part of the DM-IIP. It takes care of the bidirectional communication between the TPG, TRE and FSM on one side and the NoC on the other. The basic scheme of the DM-NI, divided into a sending and a receiving part, is shown in Figure 3. Data arrives from the NoC on the In links and departs to the NoC on the Out links. Because of our bandwidth requirements, two virtual channel handlers of four virtual channels each are required.

Figure 3: Simplified scheme of the DM network interface.

The data from the three Xentium (X) tile responses, for instance, are buffered in the response handlers and then separated into Xentium response scan data and memory BIST data via the Xentium wrapper status. This data is subsequently handled by the TRE. When the GPD activates the DM via the NoC, the configuration data is routed towards the DM configuration input. In the lower right part of the NI (Figure 3), the test vectors generated by the TPG (data) are loaded into the flit generator. In the subsequent multiplexer, the target Xentium tiles or their internal addresses are selected, and the data is finally multicast over the NoC via the Send arbiter.

Figure 4 shows a Modelsim simulation to illustrate the communication in the NI. For the sake of simplicity, only a part will be discussed. Box a consists of the In links and Out links. Box b includes the DM configuration and status. The three Xentium responses and the TPG data (white line) are shown in box c. The last box, d, shows NI data and control lines. In (1), Figure 4, the GPD addresses the NI via the NoC. As a result, the NoC Out link and connection are configured (2), as well as the DM (3). In (4) and (5), the Xentium wrappers and Xentiums are first configured for testing and their status is subsequently read. Responses are shown in (6). In (7), the commands for test-vector generation (TPG) are given in the DM, and generation starts in (8). This TPG data is subsequently put on the NoC in (9). The next section discusses the DM parts in more detail.

4 The Dependability Manager in Detail

This section provides detailed information on the Dependability Manager. As Figure 5 shows, it consists of three main blocks: the Test-Pattern Generator (TPG), the Test-Response Evaluator (TRE), and the local controller based on finite state machines (FSM).
For completeness, note that the embedded memories, which are part of the Xentium tiles, are locally BISTed; hence no TPG or TRE is required for them. The TPG and TRE concern only the Xentium cores. The combined network interface (NI) has been described previously. First the TPG is discussed, next the TRE and finally the FSM. The section also includes actual hardware tests in addition to Modelsim simulations.

Figure 4: Communication via the NI between NoC and TPG, TRE and FSM.

Figure 5: Detailed structure of the Dependability Manager.

4.1 The Test-Pattern Generator (TPG)

The TPG is an essential part of our dependability concept [3]. If a periodic structural test finds no stuck-at fault in a scan-based Xentium core, the core is labelled correct for use in the application. The fault coverage is hence the obvious measure of how effective the dependability approach is. To build a generic TPG, a compiler was developed which accepts deterministic test vectors and automatically generates the VHDL code of a hardware implementation that reproduces these deterministic vectors as closely as possible. First, the compiler is briefly discussed, then its verification by means of Modelsim simulation. Although not shown here, an actual circuit simulation confirmed its unique characteristics in a 90 nm CMOS process [6].

As the architecture, and hence the logic-gate level implementation, of the Xentium core was continuously evolving, a very flexible and fast implementation path for the TPG was needed. As a result, a TPG compiler was developed, in the style of DBIST [3]. The TPG is based on a chosen architecture with a number of changeable parameters. An example is shown in Figure 6. It consists of a programmable Fibonacci LFSR, seeding hardware, and a phase shifter [5]. Bit-flipping is an advanced module of the compiler. The deterministic patterns are currently determined by Synopsys TetraMAX from the VHDL-synthesized Xentium, which has 32 scan chains of length 413. Scan-chain ordering in the layout phase has been taken into account. The result is synthesizable VHDL code for the TPG, with the unique feature that it can pause and resume scan-test vectors almost instantaneously, depending on the NoC traffic load. This will be detailed in another publication.
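The LFSR-plus-phase-shifter structure described above can be illustrated with a small software model. This is a minimal sketch, not the VHDL produced by the TPG compiler: the names `FibonacciLFSR`, `phase_shift` and `generate_pattern`, the feedback taps, the seed and the phase-shifter XOR network are all assumptions chosen for illustration, and the seeding hardware, bit-flipping and pause/resume features are left out. Only the figures of 32 scan chains and a chain length of 413 come from the text.

```python
from typing import List, Sequence


class FibonacciLFSR:
    """Fibonacci (external-XOR) LFSR. `taps` are 1-based bit positions."""

    def __init__(self, width: int, taps: Sequence[int], seed: int) -> None:
        if seed == 0 or seed >> width:
            raise ValueError("seed must be non-zero and fit in `width` bits")
        self.width = width
        self.taps = tuple(taps)
        self.state = seed

    def step(self) -> int:
        """Advance one clock cycle and return the new state."""
        feedback = 0
        for t in self.taps:
            feedback ^= (self.state >> (t - 1)) & 1
        mask = (1 << self.width) - 1
        # Shift left by one and feed the XOR of the tap bits into bit 0.
        self.state = ((self.state << 1) | feedback) & mask
        return self.state


def phase_shift(state: int, xor_taps: Sequence[Sequence[int]]) -> int:
    """Output bit i is the XOR of the (0-based) LFSR bits in xor_taps[i]."""
    word = 0
    for i, bits in enumerate(xor_taps):
        bit = 0
        for b in bits:
            bit ^= (state >> b) & 1
        word |= bit << i
    return word


def generate_pattern(lfsr: FibonacciLFSR,
                     xor_taps: Sequence[Sequence[int]],
                     chain_length: int) -> List[int]:
    """One scan pattern: one word per shift cycle, one bit per scan chain."""
    return [phase_shift(lfsr.step(), xor_taps) for _ in range(chain_length)]


# Illustrative use: 32 chains of length 413 (figures from the text).
# Taps, seed and XOR network below are hypothetical placeholders.
N_CHAINS, CHAIN_LEN = 32, 413
demo_lfsr = FibonacciLFSR(32, (32, 22, 2, 1), seed=0xACE1ACE1)
demo_taps = [(i, (i + 7) % 32, (i + 19) % 32) for i in range(N_CHAINS)]
demo_pattern = generate_pattern(demo_lfsr, demo_taps, CHAIN_LEN)
print(len(demo_pattern), "words of", N_CHAINS, "bits")
```

Each generated word carries one bit for each of the 32 scan chains, so one word per shift cycle matches the 32-bit test vectors that the DM places on the NoC. A phase shifter of this kind is a common way to reduce the correlation between adjacent chains that a plain shift register would introduce.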
Figure 6: Example architecture of the TPG.

To show the correct operation of the generated VHDL code, Figure 7 shows the Modelsim simulation of the generation of four test vectors [5]. The parameter used is a test-pattern length of 413 (scan flip-flops). Although only four scan patterns are shown, 1002 patterns are actually generated. Pause/resume options have not been used in this example. Of particular interest are the last two signals, being the generated scan-test (sc) output followed by the generated primary inputs (pi). The two scan vectors (412) and (413) are the last, followed by zeros only. Next the PIs are provided; note that they were all zero while the scan vectors were generated. In slot (419), the LFSR is initialized, and during the next clock cycle the first seed is loaded into the LFSR. Then the first scan-test vector is generated (1), and the next (2). Automatic comparison of the TetraMAX outputs with the TPG result vectors verified that they are identical. Many other interesting experiments were carried out, relating the TPG to silicon area used, power dissipation and number of vectors, pause/resume cycles and TetraMAX care-bit distribution; however, they will not be discussed here.

Figure 7: Four test vectors generated from compiler implementation.

4.2 The Test-Response Evaluator (TRE)

The evaluation of the test responses that the Xentium tiles return for the TPG vectors is also handled within the Dependability Manager IIP. Because many Xentium tiles are present, the responses of three Xentium cores can be compared with each other, on the assumption that identical faults will not occur at three locations simultaneously [4, 7]. This greatly reduces the area that would otherwise be required for evaluation. The basic design of the TRE is essentially a 3-input 32-bit comparator, preceded by three buffer FIFOs, a crossbar and a dedicated controller unit; it has already been published in reference [4]. At that stage, however, the TRE was still considered a separate IIP requiring its own network interface (NI). As a consequence, the TRE was later adapted, simulated in QuestaSim, implemented on a Xilinx Virtex-4 board and subsequently tested.

The simulation results of the new TRE are shown and explained in Figure 8. The first three signals (boxed in black at top left) are the clock and resets. After that (first arrow, top left), the 32-bit results from the Xentiums are shown serially. At the middle arrow, labelled a, a fault has been injected. As a result, the signal full_pass at the arrow labelled b indicates that an error has occurred during comparison, and hence that a Xentium core has failed. The arrow labelled c indicates that the buffers in the TRE are full, and hence that no new data can be read in. The bottom signal, full_fail_pointer, marked with the bidirectional arrow, indicates during which test vector the comparison noticed a difference. The simulations showed that the circuit could operate beyond the required 200 MHz.

Figure 8: Simulation of the TRE functionality.

4.3 The Control Part of the DM (FSM)

The DM accepts commands from the dependability software running on the GPD (Figure 2) via the DM configuration register. The DM can carry out tests as specified, e.g. which tiles are involved, and update the register to report the test results to the GPD. The internal control in the DM is carried out by a finite state machine (FSM), which was designed using the StateCAD software of Xilinx. The design was extensively verified by simulation for several dependability and debugging scenarios, including emulated faults in the Xentium core.
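The three-way comparison performed by the TRE (Section 4.2) can be sketched in software as follows. This is an illustrative model under stated assumptions, not the published TRE design: the names `CompareResult` and `compare_responses` are hypothetical, the buffer FIFOs and crossbar are not modelled, and identifying the deviating stream by majority is an assumption added here for illustration. The fields `passed` and `fail_index` loosely mirror the full_pass and full_fail_pointer signals named in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

WORD_MASK = 0xFFFFFFFF  # responses are compared as 32-bit words


@dataclass
class CompareResult:
    passed: bool               # True if all three streams agreed throughout
    fail_index: Optional[int]  # index of the first mismatching word, if any
    suspect: Optional[int]     # stream (0..2) outvoted by the other two, if any


def compare_responses(a: Iterable[int],
                      b: Iterable[int],
                      c: Iterable[int]) -> CompareResult:
    """Compare three response streams word by word; stop at first mismatch.

    Streams are assumed to have equal length; zip() stops at the shortest.
    """
    for i, (x, y, z) in enumerate(zip(a, b, c)):
        x, y, z = x & WORD_MASK, y & WORD_MASK, z & WORD_MASK
        if x == y == z:
            continue
        if y == z:
            suspect = 0
        elif x == z:
            suspect = 1
        elif x == y:
            suspect = 2
        else:
            suspect = None  # all three differ: no majority to rely on
        return CompareResult(False, i, suspect)
    return CompareResult(True, None, None)


# Illustrative use: inject a single-bit fault into the third stream.
golden = [0x1000 + i for i in range(8)]
faulty = list(golden)
faulty[5] ^= 0x10
print(compare_responses(golden, golden, faulty))
```

Comparing the tiles against each other removes the need to store or compress expected responses in the DM, which is the area saving the text refers to; the cost is the assumption that three tiles do not fail identically at the same time.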

4.4 Hardware Test Verification of the DM

The complete DM, comprising the TPG, TRE, FSM and NI, was synthesised and implemented on a Xilinx Virtex-4 board for carrying out hardware tests. The total space required was 13%. Synopsys synthesis resulted in around 78k equivalent logic gates. For DM hardware test evaluation purposes, RS232 data communication between the FPGA and a PC was used in combination with a GUI developed in Visual Basic. A maximum test frequency of 212 MHz was used. As an example, Figure 9 shows the GUI of the set-up used to test the TRE part, including some test results. The status block refers to failures in buffers (full) or the data streams (Xentium test responses). The control block provides TPG options, where faulty data responses can be generated automatically. Most interesting is the communication viewer. It shows the three 32-bit data streams, the control commands (yellow/grey boxes) and the measured responses from the TRE (Virtex-4). The first result shows a fault in the first data stream, while the second occurs in the third data stream. It illustrates the correct operation of the TRE [8].

Figure 9: The TRE part hardware verification by means of an FPGA [8].

After completing the full verification on FPGA, the next step will be the processing in a UMC CMOS process. The current chip layout shows a total silicon area of 0.24 mm² for the DM in 90 nm technology.

4.5 Debugging and Dependability of the DM

The DM is equipped with scan cells, wrappers and dedicated pins for test and debug. During production test, the DM is scan tested in a conventional way. For prototype evaluation, 21 pins are available for direct-pin debugging, in which case the DM is considered a stand-alone IIP. In the current version of the DM, no additional means have been incorporated to increase the hardware dependability of the IIP itself. However, it is possible to include fault-tolerant comparators and TPG hardware. During its lifetime, the current DM can be internally tested periodically; means have been included to take over its function in the case of failure, by software or external hardware. In both cases, however, this comes at the cost of significantly decreased availability, especially in the software approach.

5 Conclusions

In this paper we have discussed the design and verification of an infrastructural IP, the Dependability Manager (DM). It is the essential part for enhancing the dependability of our many-core Reconfigurable Fabric Device (RFD). It can be controlled by an internal or external General Purpose Device via a Network-on-Chip.

6 Acknowledgements

The authors would like to acknowledge the discussions with the partners of the CRISP consortium, especially Bart Vermeulen, and Mark Westmijze for the network interface development.

7 References

[1] S. Sakai, M. Goshima and H. Irie, Ultra Dependable Processor, IEICE Trans. Electronics, vol. E91-C, no. 9, pp. 1386-1393 (2008).
[2] X. Zhang and H.G. Kerkhoff, Design of a Highly Dependable Beam Forming Chip, in Proc. Euromicro on Digital System Design (DSD09), Patras, Greece, pp. 729-735 (2009).
[3] O.J. Kuiken, X. Zhang and H.G. Kerkhoff, Built-In Self-Diagnostics for a NoC-Based Reconfigurable IC for Dependable Beamforming Applications, in Proc. IEEE Intern. Symp. on Defect and Fault Tolerance in VLSI Systems (DFT08), Cambridge, USA, pp. 45-53 (2008).
[4] H.G. Kerkhoff, O. Kuiken and X. Zhang, Increasing SoC Dependability via Known Good Tile NoC Testing, IEEE Intern. Conf. on Dependable Systems and Networks (DSN08), Anchorage, USA (2008).
[5] M. Duiven and F. van der Ende, Design of a Generic 90nm Test-Pattern Generator, Technical Report University of Twente, 69200912, July, 78 pages (2009).
[6] T. Bruintjes and T. Jongsma, A Xentium TPG at Transistor Level, Technical Report University of Twente, 69200914, July, 42 pages (2009).
[7] H.G. Kerkhoff and J.J.M. Huijts, Testing of a highly Reconfigurable Processor Core for Dependable Data Streaming Applications, in Proc. Symposium on Electronic Design Test and Applications (DELTA08), Hong Kong, China, pp. 38-44 (2008).
[8] W. van den Beld and J. Huiting, Simulation and Implementation of a Test Response Evaluator on a FPGA, Technical Report University of Twente, 69200918, July, 71 pages (2009).