CMOS Design of Focal Plane Programmable Array Processors

Angel Rodríguez-Vázquez, Servando Espejo, Rafael Domínguez-Castro, Ricardo Carmona and Gustavo Liñán

Instituto de Microelectrónica de Sevilla, Edificio CICA-CNM, Avda. Reina Mercedes s/n, 41012 Sevilla, Spain
Phone: +34 95 505 6666; Fax: +34 95 505 6686

Abstract: While digital processors can solve problems in most application areas, in some fields their capabilities are very limited. A typical example is vision: simple animals outperform supercomputers at basic vision tasks. The limitations of conventional digital systems in this field can be overcome by following a fundamentally different approach based on architectures closer to nature's solutions. Retinas, the front end of biological vision systems, obtain their high processing power from parallelism, and consist of concurrent spatial distributions (over the focal-plane area) of photoreceptors and basic analog processors with local connectivity and moderate accuracy. This can be implemented using an architecture with the following main components: a) parallel processing through an array of locally connected analog processors; b) a means of storing, locally and pixel by pixel, the intermediate computation results; and c) stored on-chip programmability. When implemented as a mixed-signal VLSI chip, the resulting devices are capable of image processing at rates of trillions of operations per second with very small size and low power consumption. This paper reviews the latest results on this type of chips and systems, and outlines the envisaged roadmap for these computers.

1. Introduction

Conventional vision machines use a CCD camera for parallel acquisition of the input image, and serial transmission of a digitized version of the input data to a separate computer. This results in huge data rates which conventional computers cannot analyze in real time. For instance, a 3-color, 512 × 512 camera delivers about F × 10^6 bytes/second, where F is the frame rate.
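The data-rate figure quoted above can be reproduced with a short calculation. The sketch below assumes a three-color 512 × 512 sensor and one byte per color sample; the byte depth is an assumption made here, and the function name is hypothetical.

```python
# Raw output rate of a camera streaming uncompressed frames.
# Assumption (not stated in the text): one byte per color sample.

def camera_data_rate(width, height, channels, frame_rate, bytes_per_sample=1):
    """Return the raw data rate in bytes per second."""
    return width * height * channels * bytes_per_sample * frame_rate


if __name__ == "__main__":
    # Bytes per frame for a 3-color 512 x 512 image: 786432, i.e. close to 10^6.
    print(camera_data_rate(512, 512, 3, frame_rate=1))
    # Illustrative frame rates only; none of them is taken from the paper.
    for fps in (25, 50, 1000):
        rate = camera_data_rate(512, 512, 3, fps)
        print(f"F = {fps:4d} frames/s -> {rate / 1e6:.1f} MB/s")
```

Each frame carries a little under 10^6 bytes, so the serial link must sustain roughly F megabytes per second before any processing starts.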
Conventional computers and DSPs are able to manage such a huge rate for auto-focus, image stabilization, control of the luminance/chrominance, etc. However, executing the spatial-temporal operations of image processing in real time requires much more sophisticated digital processors. Consequently, conventional vision machines with real-time capabilities are bulky, expensive and extremely power-hungry. This is in contrast to living beings, where even very tiny and power-efficient brains can analyze complex time-varying scenes in real time. One of the keys to this high efficiency is the processing front end of natural vision systems: the retina [1].

This work has been supported by the EU under contract IST-1999-19007, the Spanish CICYT under contract TIC99-0826 and the ONR under contract NICOP N68171-98-C-9004.

This contrast between the performance of artificial and natural vision systems is due, among other things, to the inherent parallelism of the processing realized by the latter. Such parallelism is observed already in the retina [2]. It contains photoreceptor cells of two different types, called cones (about 6 million in the whole retina) and rods (about 120 million), which perform logarithmic three-color imaging over a light-intensity range of around ten decades. It also contains processing cells, called horizontal, bipolar, amacrine and ganglion cells, which perform non-linear spatial-temporal processing operations on the incoming flow of images through a sequence of layers. Among many other tasks, such processing serves to extract important features from the raw sensory data and, thus, to reduce the amount of information transmitted for subsequent processing [3][1].

Inspired by the efficiency of natural vision systems, universities and companies have focused their efforts on the development of new generations of devices capable of overcoming the drawbacks of traditional ones by incorporating distributed parallel processing, and by making this processing act concurrently with the acquisition of the signal. One possible strategy is flip-chip bonding of separate sensing and processing devices; another is to incorporate the sensory and the processing circuitry on the same semiconductor substrate. Silicon retinas, smart-pixel chips and focal-plane array processors are members of this latter class of vision chips [4][5][6]. Their development is expected to have a significant impact in quite diverse scenarios. However, industrial applications demand chips capable of flexible operation, with programmable features and standard interfacing to conventional equipment. A powerful methodological framework for the systematic development of these types of chips is the paradigm of analogic cellular computing [7][8].

2. Description of the Architecture

Fig. 1 shows a conceptual architecture of programmable focal-plane processing systems. Each processing element performs the functions of sensing (photoreceptor), analog processing (essentially based on local convolutions), logic processing (Boolean gate) and storing (gray-scale and black-and-white). The convolution parameters and the logic gate can be programmed in a spatially invariant form (the same parameter values for all processors). This programmability, combined with the internal pixel-wise storage capability, allows the realization of complex image processing algorithms. The on-chip incorporation of some additional circuitry around the processor array provides easy digital control of the processing algorithms, execution steps and data interchange.

3. Examples of Chip Implementations

During the last few years several cellular programmable array processing chips have been designed. In particular, those of significant array size whose operation has actually been demonstrated experimentally are found in [9]-[14]. Table 1 presents a summary of some features of these chips. The last row of this table refers to a new prototype, ACE16K, recently submitted to foundry.
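The processing element of Section 2 (sensing, a programmable local template, a Boolean gate and per-pixel storage) might be modeled in software as follows. This is only a minimal sketch: the function names, the 3 × 3 neighborhood, the zero boundary value and the Laplacian-like template are illustrative assumptions, not taken from any of the chips discussed.

```python
# Software model of a focal-plane array: every pixel-cell applies the SAME
# 3x3 template to its neighborhood (spatially invariant programming), may
# combine binary images through a Boolean gate, and keeps results locally.
# All names and values below are hypothetical.

def apply_template(image, template, boundary=0.0):
    """Weighted sum of each pixel's 3x3 neighborhood with a shared template."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    value = image[rr][cc] if inside else boundary
                    acc += template[dr + 1][dc + 1] * value
            out[r][c] = acc
    return out


def threshold(image, level=0.0):
    """Convert a gray-scale image into a binary (black-and-white) one."""
    return [[value > level for value in row] for row in image]


def logic_and(a, b):
    """Pixel-wise Boolean gate applied to two binary images."""
    return [[x and y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Illustrative template: a discrete Laplacian (assumed, not from the paper).
LAPLACIAN = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]

# "Sensed" image: a bright 2x2 square on a dark background.
sensed = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]

# Local memories, one gray-scale and two binary planes per pixel.
memory = {}
memory["gray"] = apply_template(sensed, LAPLACIAN)
memory["edges"] = threshold(memory["gray"])
memory["masked"] = logic_and(memory["edges"], threshold(sensed, 0.5))
```

An algorithm is then a sequence of such steps, each one selecting a template or gate and the memory planes it reads and writes; on a chip all cells would execute a step concurrently rather than in nested loops.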

Fig. 1: Conceptual architecture of programmable focal-plane processing systems.

Speed is expressed in terms of analog operations per second. The equivalent number of digital multiply/add operations per second can be calculated by assuming 10 time steps per time constant. This is the default needed when the A template is full and analog input or output values are present. It means 10 × 20 = 200 equivalent multiply/add operations per time constant, so that with 4096 cell processors and a time constant of about 280 ns [14], the equivalent speed is about 3 TeraOPS. The data in this table reveal a trade-off between speed and accuracy that is common to any analog integrated circuit.

Of these chips, those reported in [11], [14] and ACE16K have embedded distributed optical sensors; i.e., they are true focal-plane array processors. On the other hand, only ACE16K and the chip reported in [14] are capable of operating with gray-scale inputs and producing gray-scale outputs while having all the functional features stated in the Introduction. The chip in [14] has served as a vehicle to demonstrate the concept of true VLSI analog chips with robust, controlled and predictable response. From there, the basic challenges were to increase the size and to improve the I/O performance [15]. The new ACE16K prototype follows this trend.

The integration of multiple sensors per pixel within the array computer probably defines the dominant medium- and long-term scenario for systems based on these chips [16]. The multiple sensors should be adaptive and capture different modalities, spectra, sensitivity and dynamics. Their control parameters should be set by underlying programmed calculations. Hence, the multi-sensor image acquisition depends, pixel by pixel, on the actual changing scene to be analyzed.

Table 1: Summary and comparison of recent chip implementations

Columns: Reference; Technology (CMOS, µm); Design Style (a); Array Size (cells); Die Size (mm²); Cell Density (cells/mm²); Speed: XPS (b), XPS/cell, XPS/mm², XPS/mW; Stored Program; Analog Resolution (eq. bits); Optical Sensors; Electrical Input (c); Electrical Output (c); Embedded Image Memory; Digital External Control

[9] . MS 32 32 7 3.3T.3G 9.3G --- 6-7 A B
[10] (d) .7 A 2 2 25 7 2.5G 3M.52G 82M 6-7 A B
[11] .8 MS [12] .5 BD [13] .8 A 2 22 48 48 4 4 3 28.3T.3G 8.25G.2G 6-7 B B.4 295 7.65T 3.76G.T 25G 2 B B 26 6.37T.89G 3G.24G 4 A A
[14] (e) .5 MS 64 64 87 8.4T 98M 7.93G.33G 7-8 A + B A + B
-- (f) .35 MS 28 28 3 8.64T (g) M (g) 8G (g) --- 7-8 D D

a. MS: Mixed-Signal, A: Analog, BD: Basically Digital.
b. XPS: Analog Operations Per Second, an equivalent measurement indicating the number of analog arithmetic operations such as addition, subtraction, multiplication and division.
c. A: Analog, B: Binary, D: Digital (digitized gray-scale).
d. The convolvers in this chip have vertical and horizontal interconnections, but no diagonal ones.
e. Additional functionalities of this design include: local evolution-enabling mask, global binary gates for fast evaluation of binary output images, cyclic spatial boundary conditions.
f. Design presently in foundry. This chip has some additional functionalities: full digital interface (control and data), synchronous address-event output for sparse binary images, local data-transfer and evolution-enabling masks, selectable linear-logarithmic photoreception.
g. Preliminary data from simulations.
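As a check on the equivalent-speed estimate given earlier in this section, the arithmetic can be reproduced in a few lines. The figures used (10 time steps per time constant, 20 multiply/add operations per step, 64 × 64 = 4096 cells, a 280 ns time constant) are the reading of the text assumed here; the function and parameter names are hypothetical.

```python
# Equivalent digital speed of an analog cell array.
# Assumed reading of the text: 10 time steps per time constant and
# 20 multiply/add operations per step, i.e. 200 operations per time constant.

def equivalent_ops_per_second(cells, steps_per_tau, ops_per_step, tau_seconds):
    """Multiply/add operations per second delivered by the whole array."""
    ops_per_tau = steps_per_tau * ops_per_step
    return cells * ops_per_tau / tau_seconds


speed = equivalent_ops_per_second(
    cells=64 * 64,        # 4096 cell processors
    steps_per_tau=10,     # assumed default number of time steps
    ops_per_step=20,      # assumed multiply/adds per step for a full template
    tau_seconds=280e-9,   # assumed time constant of about 280 ns
)
print(f"{speed / 1e12:.2f} TeraOPS")  # prints "2.93 TeraOPS"
```

With these inputs the result is consistent with the quoted figure of about 3 TeraOPS; a different number of time steps or a different time constant scales the estimate proportionally.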

4. Application Algorithms: Some Examples

Fig. 2 illustrates two application examples, namely non-linear impulse-noise removal and real-time image segmentation, taken from those demonstrated by the chip reported in [14]. Further applications and results are described on the DICTAM project web page, http://www.imse.cnm.es/~dictam.

Fig. 2: Application examples: a) non-linear salt-and-pepper noise removal, b) real-time image segmentation.

5. Prospects for Future Developments and Applications

The exploitation of higher-resolution technologies will certainly allow the production of programmable focal-plane array processors with array sizes in the range of 256 × 256 and beyond. Even with present resolutions (128 × 128), the application scope and possible tasks for this type of system include key areas such as image segmentation, pattern recognition, object classification, object counting, motion detection and estimation, activity detection, attention triggering and orientation, high-speed search of relevant sectors in large images, image fusion, path finding, real-time spatio-temporal linear/nonlinear image filtering, artificial vision tasks, early vision, image processing front ends, tracking, surveillance, real-time video compression, intelligent toys, quality control systems, multimedia applications, teleconferencing, videophony, defense systems and medical imaging.

References

[1] F. Werblin, A. Jacobs and J. Teeters, The Computational Eye. IEEE Spectrum, Vol. 33, pp. 30-37, May 1996.
[2] F. Werblin, T. Roska and L.O. Chua, The Analogic Cellular Neural Network as a Bionic Eye. Int. J. of Circuit Theory and Applications, Vol. 23, pp. 541-549, 1995.
[3] M.M. Gupta, G.K. Knopf (Eds.), Neuro-Vision Systems, Principles and Applications. IEEE Press, 1994. ISBN: 0-7803-1042-X.
[4] A. Rodríguez-Vázquez et al., Current-Mode Techniques for the Implementation of Continuous-Time and Discrete-Time Cellular Neural Networks. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, Vol. 40, pp. 132-146, March 1993.
[5] C. Koch, H. Li (Eds.), Vision Chips, Implementing Vision Algorithms with Analog VLSI Circuits. IEEE Press, 1995. ISBN: 0-8186-6492-4.
[6] B.J. Sheu, J. Choi, Neural Information Processing and VLSI. Kluwer Academic Publishers, 1995. ISBN: 0-7923-9547-6.
[7] L.O. Chua and T. Roska, The CNN Paradigm. IEEE Trans. Circuits & Systems-I, Vol. 40, pp. 147-156, March 1993.
[8] T. Roska and L.O. Chua, The CNN Universal Machine: An Analogic Array Computer. IEEE Trans. Circuits & Systems-I, Vol. 40, pp. 163-173, March 1993.
[9] S. Espejo et al., A CNN Universal Chip in CMOS Technology. International Journal of Circuit Theory and Applications, Vol. 24, pp. 93-109, Jan.-Feb. 1996.
[10] P. Kinget and M. Steyaert, Analog VLSI Integration of Massive Parallel Processing Systems. Kluwer Academic Publishers, 1997. ISBN: 0-7923-9823-8.
[11] R. Domínguez-Castro et al., A 0.8 µm CMOS 2-D Programmable Mixed-Signal Focal-Plane Array Processor with On-Chip Binary Imaging and Instructions Storage. IEEE J. Solid-State Circuits, Vol. 32, No. 7, pp. 1013-1026, July 1997.
[12] A. Paasio, V. Porra, A CNN Universal Machine with 295 cells/mm². Proc. of the 1997 Int. Symposium on Nonlinear Theory and its Applications (NOLTA '97), Honolulu, USA, 1997, pp. 22-224.
[13] J. Cruz and L. Chua, A 16 × 16 Cellular Neural Network Universal Chip. Analog Integrated Circuits and Signal Processing, Vol. 15, pp. 226-238, March 1998.
[14] G. Liñán et al., A 0.5 µm CMOS 10^6 Transistors Analog Programmable Array Processor for Real-Time Image Processing. Proc. of the 1999 European Solid-State Circuits Conference, pp. 358-361, September 1999.
[15] A. Rodríguez-Vázquez et al., MOST-Based Design and Scaling of Synaptic Interconnections in VLSI Analog Array Processing Chips. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, Vol. 23, pp. 239-266, Kluwer Academic Publishers, November/December 1999.
[16] T. Roska, Computer-Sensors: Spatio-Temporal Computers for Analog Array Signals, Dynamically Integrated with Sensors. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, Vol. 23, pp. 221-238, Kluwer Academic Publishers, November/December 1999.