ALICE Week Technical Board, 17.11.99: TPC Intelligent Readout Architecture. Volker Lindenstruth, Universität Heidelberg


What's new?
- TPC occupancy is much higher than originally assumed.
- New trigger detector, the TRD: for the first time, selective TPC readout becomes relevant.
- New readout/L3 architecture: no intermediate buses and buffer memories; use PCI and local memory instead.
- New dead-time / throttling architecture.

TRD/TPC Overall Timeline (figure, time axis 0-5 ms): event and TRD pretrigger; TEC drift with data sampling and linear fit; track segment processing after the end of the TEC drift; track matching; TRD trigger at L1; trigger at TPC (gate opens); data shipping off the detector.

TPC L3 trigger and processing (figure):
- Front-end/trigger at ~2 kHz; TPC intelligent readout; TRD trigger; L0; global trigger.
- Select regions of interest; tracking of e+/e- candidates inside the TPC.
- L1: trigger and readout of the TPC (144 links, ~60 MB/evt); L2; other trigger detectors.
- TRD L0pre; ship zero-suppressed TPC data, sector-parallel; ship TRD e+/e- track seeds; conical zero-suppressed readout.
- On-line data reduction (tracking, reconstruction, partial readout, data compression); verify the e+/e- track hypothesis plus RoIs from track segments and space points; reject event; ship to DAQ.

Architecture from the TP (figure): event rate 10^4 Hz Pb-Pb, 10^5 Hz p-p. Trigger data and the L0 trigger feed the detectors (TPC, ITS, PID, PHOS, TRIG), which are read out over DDLs into LDCs at 2500 MB/s Pb+Pb (20 MB/s p+p); a switch connects the LDCs to GDCs and on to PDS at 1250 MB/s Pb+Pb (20 MB/s p+p); STL, EDM and BUSY close the control loop. L1 trigger: 10^3 Hz Pb-Pb, 10^4 Hz p-p, 1.5-2 µs. L2 trigger: 50 Hz central + 1 kHz dimuon for Pb-Pb, 550 Hz p-p, 10-100 µs.

Some technology trends (capacity vs. speed/latency):
- Logic: capacity 2x in 3 years; speed 2x in 3 years
- DRAM: capacity 4x in 3 years; speed 2x in 15 years
- Disk: capacity 4x in 3 years; speed 2x in 10 years

DRAM capacity has grown 1000:1 while its cycle time improved only about 2:1:
  Year   Size     Cycle time
  1980   64 Kb    250 ns
  1983   256 Kb   220 ns
  1986   1 Mb     190 ns
  1989   4 Mb     165 ns
  1992   16 Mb    145 ns
  1995   64 Mb    120 ns
  ...

Processor-DRAM memory gap (figure, after Dave Patterson, UC Berkeley): relative performance 1980-2000 on a logarithmic scale. Microprocessor performance grows ~60%/yr (2x per 1.5 years, Moore's Law), DRAM performance only ~6%/yr (2x per 15 years), so the processor-memory performance gap grows by about 50% per year.

Testing the uniformity of memory:

    // Vary the size of the array, to determine the size of the cache or the
    // amount of memory covered by TLB entries.
    for (size = SIZE_MIN; size <= SIZE_MAX; size *= 2) {
        // Vary the stride at which we access elements,
        // to determine the line size and the associativity.
        for (stride = 1; stride <= size; stride *= 2) {
            // Do the following test multiple times so that the granularity of the
            // timer is better and the start-up effects are reduced.
            sec = 0;
            iter = 0;
            limit = size - stride + 1;
            iterations = ITERATIONS;
            do {
                sec0 = get_seconds();
                // The main loop: does a read and a write at various memory locations.
                for (i = iterations; i; i--)
                    for (index = 0; index < limit; index += stride)
                        *(array + index) += 1;
                sec += (get_seconds() - sec0);
                iter += iterations;
                iterations *= 2;
            } while (sec < 1);
        }
    }

(Inset figure: the access pattern - elements spaced "stride" apart across an array of "size" addresses, repeated every iteration.)
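
The loop above references get_seconds(), array and the SIZE_MIN/SIZE_MAX/ITERATIONS constants without defining them. A minimal way to complete it into a compilable probe, assuming a gettimeofday()-based timer and purely illustrative constant values (the actual values used for the measurements are not given on the slide), is:

    #include <sys/time.h>

    /* Illustrative constants only; the values used for the measurements are not given. */
    #define SIZE_MIN   1024              /* smallest array size tested (in ints)       */
    #define SIZE_MAX   (8*1024*1024)     /* largest array size tested (in ints)        */
    #define ITERATIONS 16                /* initial repeat count, doubled until >= 1 s */

    static int array[SIZE_MAX];          /* the memory region being probed */

    /* Wall-clock time in seconds, as assumed for get_seconds() on the slide. */
    static double get_seconds(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec * 1e-6;
    }

    int main(void)
    {
        long size, stride, limit, index, i, iter, iterations;
        double sec, sec0;

        /* ... the nested size/stride loops from the slide go here unchanged ... */
        return 0;
    }

Plotting the average time per access (measured time divided by the number of accesses), one curve per array size, as a function of stride gives the plots shown on the following slides.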

360 MHz Pentium MMX (measured memory-access plot): access times range from 2.7 ns (one 360 MHz clock cycle) through 95 ns up to 190 ns, with transitions at 32 bytes and at 4096 bytes. L1 instruction cache: 16 KB; L1 data cache: 16 KB (4-way associative, 16-byte line); L2 cache: 512 KB (unified); MMU: 32 I / 64 D TLB entries (4-way associative).

360 MHz Pentium MMX (figure): the same measurement repeated with the L2 cache off and with all caches off.

Comparison of two supercomputers:
- HP V-Class (PA-8x00): L1 instruction cache 512 KB; L1 data cache 1024 KB (4-way associative, 16-byte line); MMU: 160-entry fully associative TLB.
- SUN E10k (UltraSparc II): L1 instruction cache 16 KB; L1 data cache 16 KB (write-through, non-allocate, direct mapped, 32-byte line); L2 cache 512 KB (unified); MMU: 2 x 64-entry fully associative TLB.

LogP (figure: processors P, each with memory M, attach through network interface cards to an interconnection network; sending and receiving each cost the overhead o, consecutive messages are separated by the gap g, and a message spends the latency L in the network; the volume of messages in flight is limited by L/g, the aggregate throughput).
- L: time a packet travels in the network from sender to receiver
- o: overhead to send or receive a message
- g: shortest time between two sent or received messages
- P: number of processors
Culler et al., "LogP: Towards a Realistic Model of Parallel Computation", PPoPP, May 1993.
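
As an illustration of how these parameters combine (this worked example is not on the slide): a single small message costs o + L + o end to end, a sender can inject a new message only every max(o, g), and at most roughly L/g messages per node are in flight at once. A minimal sketch, with made-up parameter values:

    #include <stdio.h>

    /* LogP parameters, times in microseconds.  The values in main() are
     * placeholders for illustration only; they are not measurements. */
    typedef struct {
        double L;   /* latency: time a small message spends in the network */
        double o;   /* overhead: CPU time to send or receive one message   */
        double g;   /* gap: minimum interval between consecutive messages  */
        int    P;   /* number of processors                                */
    } logp_t;

    /* One-way delivery time of a single small message:
     * send overhead + network latency + receive overhead. */
    static double logp_one_way(const logp_t *m)
    {
        return m->o + m->L + m->o;
    }

    /* Arrival time of the last of k back-to-back messages from one sender:
     * a new message can be injected only every max(o, g). */
    static double logp_burst(const logp_t *m, int k)
    {
        double inject = (m->o > m->g) ? m->o : m->g;
        return (k - 1) * inject + logp_one_way(m);
    }

    int main(void)
    {
        logp_t net = { 10.0, 5.0, 2.0, 2 };   /* illustrative values only */
        printf("one-way message          : %.1f us\n", logp_one_way(&net));
        printf("burst of 100 messages    : %.1f us\n", logp_burst(&net, 100));
        printf("messages in flight (L/g) : %.1f\n", net.L / net.g);
        return 0;
    }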

2-node Ethernet cluster (figure: Gigabit Ethernet, Gigabit Ethernet with carrier extension, Fast Ethernet 100 Mb/s; source: Intel).
Test: SUN Gigabit Ethernet card IP 2.0, two SUN Ultra 450 servers, one card each. The sender produces a TCP data stream with large data buffers; the receiver simply throws the data away (a minimal sketch of such a discard receiver follows below).
Result: processor utilization is 40% on the sender and 60% on the receiver, at a throughput of only about 160 Mbit/s. The net throughput increases if the receiver is a twin-processor machine.
Why is the TCP/IP Gigabit Ethernet performance so much worse than what is theoretically possible? Note: CMS implemented their own proprietary network API for Gigabit Ethernet and Myrinet.
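
For illustration, the receiver side of such a test can be as simple as the sketch below: a TCP server that reads into a large buffer, discards the data and reports the achieved rate. The port number, buffer size and the omission of error handling are arbitrary choices for this sketch; it is not the code used for the measurement.

    /* Discard receiver: accepts one TCP connection, reads the stream into a
     * large buffer, throws the data away and reports the throughput. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define PORT  5001              /* arbitrary port for the sketch */
    #define BUFSZ (256 * 1024)      /* large receive buffer          */

    int main(void)
    {
        static char buf[BUFSZ];
        struct sockaddr_in addr;
        int srv = socket(AF_INET, SOCK_STREAM, 0);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(PORT);
        bind(srv, (struct sockaddr *)&addr, sizeof(addr));
        listen(srv, 1);

        int conn = accept(srv, NULL, NULL);

        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        double total = 0;
        ssize_t n;
        while ((n = read(conn, buf, sizeof(buf))) > 0)
            total += n;             /* data is simply discarded */
        gettimeofday(&t1, NULL);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) * 1e-6;
        printf("%.1f MB in %.2f s -> %.1f Mbit/s\n",
               total / 1e6, sec, total * 8 / sec / 1e6);
        close(conn);
        close(srv);
        return 0;
    }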

First Conclusions - Outlook
- Memory bandwidth is the limiting and determining factor; moving data requires significant memory bandwidth.
- The number of TPC data links dropped from 528 to 180. The aggregate data rate per link is ~34 MB/s at 100 Hz (roughly 60 MB/event x 100 Hz spread over 180 links).
- The TPC has the highest processing requirements, and the majority of the TPC computation can be done on a per-sector basis. Keep the number of processors that work on one sector in parallel to a minimum; today this number is 5 due to the TPC granularity. Try to get the data of one sector directly into one processor.
- Selective readout of TPC sectors can reduce the data-rate requirement by a factor of at least 2-5.
- The overall complexity of the L3 processor can be reduced by using PCI-based receiver modules that deliver the data straight into host memory, eliminating the need for VME crates to combine the data from multiple TPC links.
- DATE already uses a GSM paradigm as its memory pool, so no software changes are needed.

Receiver card architecture (figure): push readout. An optical receiver feeds a data FIFO and a multi-event buffer; an FPGA and a 66/64 host bridge push the data, steered by pointers, directly into host memory.
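
The "pointers" in the figure suggest a descriptor-style push scheme. The sketch below is a hypothetical illustration of that idea only; the structure names and fields, and the reading of "66/64" as a 64-bit/66-MHz PCI-style host bridge, are assumptions and not the actual receiver-card interface. The host posts pointers to free buffers in its memory, and the card fills them with event data and hands them back.

    #include <stdint.h>

    #define RING_SLOTS 64           /* illustrative ring size */

    /* One descriptor: a free host buffer handed to the card, filled by the card. */
    struct rcv_descriptor {
        uint64_t host_addr;         /* physical address of a free host buffer  */
        uint32_t max_len;           /* size of that buffer in bytes            */
        uint32_t filled_len;        /* written by the card: actual event size  */
        uint32_t event_id;          /* written by the card: event identifier   */
        uint32_t owner;             /* 0 = owned by host, 1 = owned by card    */
    };

    struct rcv_ring {
        struct rcv_descriptor slot[RING_SLOTS];
        uint32_t host_tail;         /* next slot the host will refill          */
    };

    /* Host side: hand a freshly allocated buffer to the card. */
    static void post_free_buffer(struct rcv_ring *r, uint64_t phys, uint32_t len)
    {
        struct rcv_descriptor *d = &r->slot[r->host_tail % RING_SLOTS];
        d->host_addr  = phys;
        d->max_len    = len;
        d->filled_len = 0;
        d->owner      = 1;          /* pass ownership to the card */
        r->host_tail++;
    }

    /* Host side: check whether the card has pushed an event into a slot. */
    static int harvest_event(const struct rcv_ring *r, uint32_t slot_index,
                             uint32_t *event_id, uint32_t *len)
    {
        const struct rcv_descriptor *d = &r->slot[slot_index % RING_SLOTS];
        if (d->owner != 0 || d->filled_len == 0)
            return 0;               /* not filled yet */
        *event_id = d->event_id;
        *len      = d->filled_len;
        return 1;                   /* event data is already in host memory */
    }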

Readout of one TPC sector (figure: optical links run from the cave to receiver boards (RcvBd) mounted in a receiver processor in the counting house, which connects to the L3 network; the arrangement is repeated x2x18, i.e. for all 36 sectors).
- Each TPC sector is read out by four optical links, which are fed by a small derandomizing buffer in the TPC front-end.
- The optical receiver modules mount directly in a commercial off-the-shelf (COTS) receiver computer in the counting house.
- The COTS receiver processor performs any necessary hit-level functionality on the data in case of L3 processing.
- The receiver processor can also perform lossless compression and simply forward the data to DAQ, implementing the TP baseline functionality (a minimal illustration of such a compression step follows below).
- The receiver processor is much less expensive than any crate-based solution.
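
As a minimal illustration of the lossless-compression role mentioned above (this is an example of the idea only, not the actual ALICE compression scheme): runs of zero-valued ADC samples can be replaced by a marker and a run length, while non-zero samples are kept verbatim.

    #include <stddef.h>
    #include <stdint.h>

    /* Encode n ADC samples into out.  A run of zero samples is replaced by the
     * pair {0, run_length}; non-zero samples are copied verbatim.  Returns the
     * number of output words; decoding simply reverses the rule. */
    static size_t rle_encode(const uint16_t *adc, size_t n, uint16_t *out)
    {
        size_t i = 0, w = 0;
        while (i < n) {
            if (adc[i] == 0) {                  /* start of a zero run */
                uint16_t run = 0;
                while (i < n && adc[i] == 0 && run < UINT16_MAX) { i++; run++; }
                out[w++] = 0;                   /* marker */
                out[w++] = run;                 /* run length */
            } else {
                out[w++] = adc[i++];            /* literal non-zero sample */
            }
        }
        return w;
    }

Note that isolated zeros expand to two words each, so in the worst case the output can be twice the input size; the caller must size the output buffer accordingly.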

Overall TPC intelligent readout architecture (figure): the Inner Tracking System, muon tracking chambers, particle identification and photon spectrometer are read out over DDLs into LDC nodes behind a switch; the 36 TPC sectors are read out into LDC/L3 nodes, which feed an L3 matrix of GDC/L3 nodes and finally the computer center and the permanent data storage (PDS). The trigger detectors (micro channel plate, zero-degree calorimeter, muon trigger chambers, transition radiation detector) provide the trigger data for the L0, L1 and L2 trigger decisions; the EDM and the detector-busy signals close the loop.
- Each TPC sector forms an independent sector cluster.
- The sector clusters merge through a cluster interconnect/network into a global processing cluster.
- The aggregate throughput of this network can be scaled up to beyond 5 GB/s at any point in time, allowing a fall-back to simple lossless binary readout.
- All nodes in the cluster are generic COTS processors, which are acquired at the latest possible time.
- All processing elements can be replaced and upgraded at any point in time.
- The network is commercial.
- The resulting multiprocessor cluster is generic and can also be used as an off-line farm.

Dead Time / Flow Control. TPC front-end buffer: 8 black events; TPC receiver buffer: > 100 events (figure: event-receipt daisy chain through the receiver boards, with high- and low-water marks in the receiver buffers that control XOFF).

Scenario I: TPC dead time is determined centrally.
- For every TPC trigger a counter is incremented.
- For every completely received event, the last receiver module produces a message (a single-bit pulse), which is forwarded through all nodes after they, too, have received the event.
- The event-receipt pulse decrements the counter.
- When the counter reaches 7, TPC dead time is asserted (there could be another event already in the queue); see the sketch below.

Scenario II: TPC dead time is determined centrally based on rates, assuming worst-case event sizes.
- Overflow protection for the front-end buffers: assert TPC BUSY if 7 events arrive within 50 ms (assuming 120 MB/event and 1 Gbit/s links).
- Overflow protection for the receiver buffers: ~100 events in 1 second, or a high-water mark in any receiver buffer (preferred way).

In either scenario there is no need for reverse flow control on the optical link and no need for dead-time signalling at the TPC front-end.
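
A sketch of the Scenario I bookkeeping, written in C for illustration (the names are mine, and in practice this logic would sit in trigger/receiver hardware or firmware): a counter of outstanding events that asserts BUSY one event below the front-end buffer depth of 8.

    #include <stdbool.h>

    #define TPC_BUFFER_DEPTH 8                        /* front-end holds 8 black events    */
    #define BUSY_THRESHOLD   (TPC_BUFFER_DEPTH - 1)   /* keep room for one in-flight event */

    static int  outstanding_events = 0;   /* triggers sent minus events fully received */
    static bool tpc_busy = false;

    /* Called for every TPC trigger that is issued. */
    void on_tpc_trigger(void)
    {
        outstanding_events++;
        if (outstanding_events >= BUSY_THRESHOLD)
            tpc_busy = true;              /* assert TPC dead time */
    }

    /* Called when the event-receipt pulse has traversed the whole daisy chain,
     * i.e. every receiver module holds the complete event. */
    void on_event_receipt_pulse(void)
    {
        if (outstanding_events > 0)
            outstanding_events--;
        if (outstanding_events < BUSY_THRESHOLD)
            tpc_busy = false;             /* release TPC dead time */
    }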

Summary
- Memory bandwidth is a very important factor in designing high-performance multiprocessor systems; it needs to be studied in detail.
- Do not move data if it is not required: moving data costs money (except for some granularity effects).
- The overall complexity can be reduced by using PCI-based receiver modules that deliver the data straight into host memory, eliminating the need for VME.
- General-purpose COTS processors are less expensive than any crate solution.
- An FPGA-based receiver card prototype has been built; the NT driver is completed and the Linux driver is almost completed.
- The DDL is already planned as a PCI version.
- No reverse flow control is required for the DDL.
- The DDL URD should be revised by the collaboration ASAP.
- No dead time or throttling needs to be implemented at the front-end; two scenarios exist for how to implement it for the TPC at the back-end without additional cost.