Advanced Training Course on FPGA Design and VHDL for Hardware Simulation and Synthesis. 26 October - 20 November, 2009

2065-28
Advanced Training Course on FPGA Design and VHDL for Hardware Simulation and Synthesis
26 October - 20 November, 2009
Starting to make an FPGA Project
Alexander Kluge, PH ESE FE Division, CERN, 385, rte Meyrin, CH-1211 Geneva 23, Switzerland

Starting to make an FPGA project

FPGA specifications
How to make an FPGA?
What should it do? How should it do it?
Systems / requirements: define a detailed implementation scheme/architecture.
The specification needs to be worked out before one even thinks about the FPGA type or code.
Specification:
understand user needs
define the specification of the system together with the user/customer
re-discuss, re-negotiate
understand
Task of the designer: to understand and translate specifications

FPGA specifications
Customer/boss says: "I need a system which can calculate the value each 25 ns."
What you might understand is: the calculation needs to be finished within 25 ns.
What he means is: a new value needs to be processed every 25 ns. How long it takes to present the result does not matter.
First case: might be impossible, maybe not.
Second case: processors in parallel or in a pipeline.

Adder
Example: add 16 16-bit values in 25 ns
[Figure: inputs data0 ... data15, each data_int(15 downto 0), feed one adder that produces sum(19 downto 0)]

Adder
533 logic elements (6%), 278 pins (74%)
29.7 MHz => 33.6 ns
33.6 ns > 25 ns -> too slow
Ask boss to buy a faster, more expensive FPGA
Work (manually) on FPGA placing & routing
Help the synthesizer to make a faster adder
Ask whether you have understood the specification
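The two checks behind this slide can be written down as a short Python sketch: does the sum of 16 values of 16 bits fit into the 20-bit output, and does the reported maximum clock frequency meet the 25 ns requirement? The figures are the ones quoted on the slide. The helper name period_ns and the assumption that the inputs are unsigned are illustrative, not taken from the slides.

```python
# Numbers quoted on the slide; names and the unsigned-input assumption are illustrative.
N_INPUTS = 16
INPUT_BITS = 16
SUM_BITS = 20
REQUIRED_PERIOD_NS = 25.0


def period_ns(fmax_mhz: float) -> float:
    """Clock period in ns for a maximum clock frequency given in MHz."""
    return 1000.0 / fmax_mhz


# Largest possible sum of 16 unsigned 16-bit values.
max_sum = N_INPUTS * (2**INPUT_BITS - 1)
fits_in_sum = max_sum < 2**SUM_BITS

# Timing of the non-pipelined adder as reported by the synthesis tool.
t_adder = period_ns(29.7)
meets_timing = t_adder <= REQUIRED_PERIOD_NS

print(f"max sum = {max_sum}, fits in {SUM_BITS} bits: {fits_in_sum}")
print(f"period = {t_adder:.1f} ns, meets {REQUIRED_PERIOD_NS} ns: {meets_timing}")
```

The width check passes and the timing check fails, which is the situation the slide describes.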

Pipeline architecture

Adder with pipeline
Example: add 16 16-bit values every 25 ns
[Figure: inputs data0 ... data15, each data_int(15 downto 0), pass through adders followed by registers (reg) into a final adder that produces sum(19 downto 0)]

Adder with pipeline
Adder without pipeline: 533 logic elements (6%), 278 pins (74%), 29.7 MHz => 33.6 ns
Adder with pipeline: 526 logic elements (6%), 278 pins (74%), 45.4 MHz => 22 ns
22 ns < 25 ns: fast enough, and less logic
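A behavioural Python sketch can show why pipelining helps: the result of one input set takes several clocks to appear (latency), but a new input set is accepted on every clock (throughput). The exact register placement of the slide's circuit cannot be recovered from the transcript, so the layout here, four adder levels with a register after each, is an assumption, and the class and function names are hypothetical.

```python
from typing import List, Optional

STAGES = 4  # assumed layout: log2(16) adder levels, one register after each


def tree_level(values: List[int]) -> List[int]:
    """One adder level: add neighbouring pairs."""
    return [values[i] + values[i + 1] for i in range(0, len(values), 2)]


class PipelinedAdder:
    """Adder tree with a register after every level (behavioural model)."""

    def __init__(self) -> None:
        # regs[k] holds the output of level k captured at the previous clock edge.
        self.regs: List[Optional[List[int]]] = [None] * STAGES

    def clock(self, inputs: Optional[List[int]]) -> Optional[int]:
        """Apply one clock edge; return the value now held in the last register."""
        new_regs: List[Optional[List[int]]] = []
        stage_in = inputs
        for k in range(STAGES):
            new_regs.append(tree_level(stage_in) if stage_in is not None else None)
            stage_in = self.regs[k]  # the next level sees the OLD register content
        self.regs = new_regs
        out = self.regs[-1]
        return out[0] if out is not None else None


adder = PipelinedAdder()
frames = [[n] * 16 for n in (1, 2, 3)]  # three consecutive input sets
results = [adder.clock(f) for f in frames + [None] * STAGES]
print(results)
```

In this model the first sum appears on the fourth clock and the following sums on each clock after it. The clock period itself (22 ns on the slide) is a synthesis result and is not modelled here.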

FPGA specifications
re-discuss, re-negotiate
understand
Task of the designer: to understand and translate specifications

Readout Processors

Read-out processors
Specification
Challenge: many parallel inputs, 25 ns interval, short processing time
Storage during trigger decision time
Data reduction/encoding (zero suppression)
Pipelining, buffering (FIFO, dual port RAM)

Pixel detector
What do we need to know?

Silicon Sensor
Position resolution: 10 µm
Light material: 1 % X0 or 2 mm
Dec. 11, 2007, A. Kluge

Silicon Sensors
[Figure: sensor cross-section with labels V ext, n-bulk, p+]
Dec. 11, 2007, P. Riedler, A. Kluge

Silicon Pixel sensors

Silicon Pixel Wafers
Silicon sensor: 72.72 mm x 13.92 mm, 200 µm thin
160 x 256 pixels of 425 µm x 50 µm

Pixel read out chip
Time resolution: 25 ns
Repetition frequency: 40 MHz
Storage time: > 3.2 µs

Pixel chip

Pixel detector
[Figure labels: 1 sensor, 1 sensor, 10 readout chips]
Image: INFN (Padova), Sept 3-7, 2007, A. Kluge

Pixel detector
00001000000000000000000000
00000000000000000100000000
00000000001000000100000100
00000000000000000000000000

Pixel detector
Full detector: 120 x 2560 x 32 bits @ 10 MHz (100 ns) = ~100 Gbit/s
Separate read-out for each detector module
Each detector module (10 chips): 1 x 2560 x 32 bits @ 10 MHz
00001000000000000000000000000000
00000000000000000100000000000000
00000000001000000100000100000000

Data funnel
Data generator
Data preprocessor
Data processor
Data merging

Data funnel
Data generator (Read-out ASIC): 1200 x 256 x 32 bits @ 10 MHz (100 ns) = ~100 Gbit/s
Data preprocessor (Read-out controller ASIC): 120 x 2560 x 32 bits @ 10 MHz (100 ns) = ~100 Gbit/s
Data processor (Link receiver FPGA): 60 x 2 x 2560 x 32 bits @ 10 MHz (100 ns) = 60 x 1.6 Gbit/s
Data merging (Router FPGA): 20 x 6 x 2560 x 32 bits * 0.02 @ 10 MHz (100 ns) = 20 x 10 kbit/s

Pixel detector
Data generator: 2560 x 32 bits
00001000000000000000000000
00000000000000000100000000
00000000001000000100000100
00000000000000000000000000

Pixel detector
What is the strategy?
00001000000000000000000000
00000000000000000100000000
00000000001000000100000100
00000000000000000000000000
Somebody counts out values all the time and you have to find out whether they can be divided by three. What do you do in real life?
Include serial and dpm

Pixel detector
[Block diagram, blocks: channel1-5, serializer, de-serializer, FIFO, zero suppress & address decoder, dual port memory, channel multiplexer]

Pixel detector
[Block diagram, blocks: serializer, de-serializer, FIFO, zero suppress & address decoder, dual port memory]

Pixel detector data processing
0 0 0 0 0 0 0 0 0 0 0 0 0
check if any hits
if no hits -> load new value from FIFO
if 1 hit only -> decode the hit & request new value from FIFO
if more than one hit -> decode the hits
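The three cases listed on this slide amount to a decision on the number of set bits in the 32-bit line. A minimal Python sketch of that decision follows; the function name classify_line and the returned strings are illustrative only.

```python
def classify_line(word: int) -> str:
    """Return the action listed on the slide for one 32-bit pixel line."""
    hits = bin(word & 0xFFFFFFFF).count("1")  # number of hit pixels in the line
    if hits == 0:
        return "load new value from FIFO"
    if hits == 1:
        return "decode the hit and request new value from FIFO"
    return "decode the hits"


for line in (0x00000000, 0x00000020, 0x00000420):
    print(f"{line:032b} -> {classify_line(line)}")
```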

Pixel detector data processing
Bit position: 31 .. 11 10 9 8 7 6 5 4 3 2 1 0
Value: 0 0 1 0 0 0 0 1 0 0 0 0 0
How to decode the address?
This line has two hits; the state machine must send two hits into the dual port memory:
row address, hit position = 5
row address, hit position = 11

Pixel detector data processing
Bit position: 31 .. 11 10 9 8 7 6 5 4 3 2 1 0
Value: 0 0 1 0 0 0 0 1 0 0 0 0 0
Do we know enough to start the project?
How do we encode the address?
row address, hit position = 5
row address, hit position = 11

Pixel detector data processing
[Block diagram: read FIFO, control, shift register (parallelload, shiftenable, serialout) holding 0 0 1 0 0 0 0 1 0 0 0 0 0, counter (cntenable), writeenable, dual port memory]

Position decoder shift register

Position decoder shift register VHDL code

state machine with case statement

Shift register is a parallel load register

Position decoder shift register
Bit position: 31 .. 11 10 9 8 7 6 5 4 3 2 1 0
Value: 0 0 0 0 0 0 0 0 1 1 0 1 0
Example data word: "00001000001000001100000000011010"
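The shift register and counter scheme can be modelled in a few lines of Python, applied to the example data word on this slide. This is a behavioural sketch, not the VHDL from the course. It assumes the word is shifted out least significant bit first, one bit per clock, and the function name is hypothetical.

```python
def shift_register_decode(word: int, width: int = 32):
    """Shift the word out one bit per clock and record the positions of the hits.

    Assumption: the least significant bit leaves the register first.
    """
    shift_reg = word  # parallel load from the FIFO
    positions = []
    cycles = 0
    for position_count in range(width):
        serial_out = shift_reg & 1
        if serial_out:
            positions.append(position_count)  # write enable to the dual port memory
        shift_reg >>= 1  # shift enable
        cycles += 1      # count enable
    return positions, cycles


word = int("00001000001000001100000000011010", 2)
positions, cycles = shift_register_decode(word)
print(positions, cycles)
```

In this model every line costs 32 clocks regardless of how many hits it contains.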

Position decoder shift register
Shift register & counter (if then)
Result in an FPGA from 2002 (Altera EP20k200FC484-3):
81 out of 8320 logic elements
44 registers
11% (41/376) of pins
10.6 ns (94.5 MHz): position_count -> position_count
tco: 8.0 ns: data_word_reg -> data_word
tsu: 7.0 ns: new_value_available -> data_encode

Position decoder shift register
Shift register & counter (case)
Result in an FPGA from 2002 (Altera EP20k200FC484-3):
50 out of 8320 logic elements (with case statement)
44 registers
11% (41/376) of pins
9.1 ns (109.9 MHz): position_count -> data_encode
tco: 7.0 ns: data_word_reg -> data_word
tsu: 6.3 ns: new_value_available -> data_encode

Position decoder shift register
Task fulfilled?
Few logic cells
Timing constraints fulfilled
User requirements fulfilled?
Processing per 32-bit line takes: 32 bits * 25 ns = 800 ns
A new 32-bit line (one out of 2560) arrives every 100 ns
Decoding time for all lines is: 2560 * 800 ns => 2 ms
Within 2 ms => 20480 data lines arrive
Input FIFO would need to be at least 20k * 32 bits deep
During 2 ms no other trigger acquisition can take place
dead time => max trigger rate: 488 Hz
User requirements not fulfilled
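The arithmetic on this slide can be reproduced step by step. The following Python sketch uses only the numbers given above; the variable names are illustrative.

```python
CLOCK_NS = 25           # one shift per clock
BITS_PER_LINE = 32
LINES = 2560            # lines per detector module
LINE_INTERVAL_NS = 100  # a new line arrives every 100 ns

time_per_line_ns = BITS_PER_LINE * CLOCK_NS          # time to shift out one line
decode_time_ns = LINES * time_per_line_ns            # time to decode all lines
lines_arriving = decode_time_ns // LINE_INTERVAL_NS  # lines arriving meanwhile
max_trigger_rate_hz = 1e9 / decode_time_ns           # one acquisition per decode time

print(f"per line: {time_per_line_ns} ns")
print(f"all lines: {decode_time_ns / 1e6:.3f} ms")
print(f"lines arriving during decoding: {lines_arriving}")
print(f"max trigger rate: {max_trigger_rate_hz:.0f} Hz")
```

The exact decoding time is 2.048 ms, which the slide rounds to 2 ms; the 488 Hz figure follows from the unrounded value.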

Position decoder priority encoder
Bit position: 31 .. 11 10 9 8 7 6 5 4 3 2 1 0
Value: 0 0 1 0 0 0 0 1 0 0 0 0 0
How to decode the address?
This line has two hits; the state machine must send two hits into the dual port memory:
row address, hit position = 5
row address, hit position = 11

Position decoder priority encoder
[Block diagram: read FIFO, mux (sel), control, register (load) holding 0 0 1 0 0 0 0 1 0 0 1 0 1 (bits 31 .. 0), priority encoder (output 10), address decoder (output 1 1 0 1 1 1 1 1 1 1 1 1 1, bits 31 .. 0), writeenable, dual port memory]
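The loop suggested by this block diagram can be sketched in Python: the priority encoder reports the position of one hit, the address decoder turns that position into a mask with a single zero, and the masked word is loaded back into the register until no hits remain. This is a behavioural model under assumptions: the highest set bit is taken to have priority (the diagram shows the encoder reporting 10 for the pictured word), and the function names are hypothetical.

```python
def priority_encode(word: int) -> int:
    """Index of the highest set bit (assumes word != 0)."""
    return word.bit_length() - 1


def decode_line(word: int, width: int = 32):
    """Decode all hits of one line, one hit per clock."""
    all_ones = (1 << width) - 1
    register = word & all_ones  # load from the FIFO
    positions = []
    cycles = 0
    while register:
        pos = priority_encode(register)
        positions.append(pos)          # write enable to the dual port memory
        mask = all_ones & ~(1 << pos)  # address decoder: all ones except bit `pos`
        register &= mask               # masked word goes back into the register
        cycles += 1
    return positions, cycles


print(decode_line(int("00001000001000001100000000011010", 2)))
```

Unlike the shift register version, the number of clocks here equals the number of hits in the line, so an empty line costs no decoding clocks in this model.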

Position decoder priority encoder
Priority encoder
Result in an FPGA from 2002 (Altera EP20k200FC484-3):
172 (out of 8320) logic elements -> more logic cells
33 registers
addressdecoder: 16, prior32: 54
11% (41/376) of pins
20.8 ns (48.0 MHz): data_encode -> state_encoding -> slower state machine, but faster processing
tco: 17.1 ns: data_encode -> data_word
tsu: 14.9 ns: new_value -> state_encoding

Position decoder priority encoder
Task fulfilled?
Many logic cells
FPGA timing constraints fulfilled
User requirements fulfilled?
Processing per 32-bit line takes: (number of hits per line) * 25 ns = ?
A new 32-bit line (one out of 2560) arrives every 100 ns
Decoding time for all lines is: 2560 * ? ns => ? ms
Within ? ms => ? data lines arrive
Input FIFO would need to be at least ? * 32 bits deep
During ? ms no other trigger acquisition can take place
dead time => max trigger rate: ? Hz
User requirements fulfilled?

Position decoder priority encoder
Task fulfilled?
Physics simulation: max 2% of all pixels will be hit in one acquisition
User requirements fulfilled?
Processing per 32-bit line takes: (number of hits per line) * 25 ns = (32 * 0.02) * 25 ns = 16 ns < 25 ns
A new 32-bit line (one out of 2560) arrives every 100 ns
One line with up to 4 hits can be decoded before the next line arrives
Input FIFO of 1000 * 32 bits implemented to buffer statistical fluctuations or calibration sequences
Dead time defined by transmission of the data stream: 2560 lines, one every 100 ns => 256 µs => 3900 Hz
dead time => max trigger rate: 3900 Hz
User requirements fulfilled: yes
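The same budget can be worked through for the priority encoder, using the 2% occupancy from the physics simulation. This is a Python sketch with the slide's numbers; the variable names are illustrative.

```python
OCCUPANCY = 0.02        # max fraction of pixels hit (physics simulation)
BITS_PER_LINE = 32
CLOCK_NS = 25           # one hit decoded per clock
LINES = 2560
LINE_INTERVAL_NS = 100  # a new line arrives every 100 ns

mean_hits_per_line = BITS_PER_LINE * OCCUPANCY  # average hits per line
mean_decode_ns = mean_hits_per_line * CLOCK_NS  # average decode time per line
hits_per_interval = LINE_INTERVAL_NS // CLOCK_NS  # hits decodable before the next line
readout_us = LINES * LINE_INTERVAL_NS / 1000    # transmission time of one acquisition
max_trigger_rate_hz = 1e6 / readout_us

print(f"mean hits per line: {mean_hits_per_line:.2f}")
print(f"mean decode time: {mean_decode_ns:.0f} ns")
print(f"hits decodable per line interval: {hits_per_interval}")
print(f"readout time: {readout_us:.0f} us, max trigger rate: {max_trigger_rate_hz:.0f} Hz")
```

The average decode time stays below the 25 ns clock period, and the dead time is set by the 256 µs transmission rather than by the decoder. The unrounded rate is about 3906 Hz, quoted as 3900 Hz on the slide.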

Position decoder priority encoder
Priority encoder
Result in an FPGA from 2002 (Altera EP20k200FC484-3):
172 (out of 8320) logic elements -> more logic cells
20.8 ns (48.0 MHz): data_encode -> state_encoding -> slower state machine, but faster processing
Slower and more logic can mean more elegant and effective

Position decoder priority encoder
User requirements fulfilled: yes
Can we do better?
Can we do it faster or with less logic?
Do we know something which the synthesizer does not know?

Position decoder priority encoder
Knowledge of the implementation in the target technology is important
Knowledge of what the synthesizer is doing is important

Processor board with optical inputs
[Figure labels: 12 channels; parallel optical receiver module; 12 closely packed G-link deserializer ASICs]