PARALLEL PROCESSOR ARRAY FOR HIGH SPEED PATH PLANNING


S.E. Kemeny, T.J. Shaw, R.H. Nixon, E.R. Fossum
Jet Propulsion Laboratory/California Institute of Technology
4800 Oak Grove Dr., Pasadena, CA 91109
IEEE 1992 Custom Integrated Circuits Conference

ABSTRACT

The first integration of a 24 x 25 array of processors for high speed optimal path planning is reported. Based on programmed terrain costs (traversal time), the IC determines, in parallel, the fastest routes from a selected starting point(s) to all other points on a given terrain. The chip has been successfully tested at a 7 MHz clock frequency, with typical path determination requiring 230 µs, resulting in a four order of magnitude speed-up over current software-based shortest-route techniques.

INTRODUCTION

For a given terrain to be traversed, it is computationally intensive to determine the fastest route between two points, and for defense or civilian emergency dispatching applications, computation time is critical. This paper reports the integration of a 24 x 25 random access array of digital processors which are programmed to model a given terrain and determine the fastest (lowest cost) path between any points on the terrain at very high speed (milliseconds for arrays up to 512 x 512). The primary purpose of this research chip is to demonstrate high speed path planning capability for tactical mobility analysis in battlefield scenarios. However, such high speed automated path planning will find utility in a variety of settings such as autonomous vehicle navigation, intelligent vehicle highway systems, evacuation and rescue planning, and police and transportation dispatching.

Currently, the only tools available to assist in path planning are implemented in software. These approaches can be slow, with best path determination typically requiring seconds to minutes for terrain sizes varying from 64 x 64 to 512 x 512 pixels [1]. Through the VLSI implementation of a fine grain parallel architecture, in which every terrain pixel is represented by a corresponding processor, the inherent parallelism of the problem can be exploited and extremely fast path determination can be realized. In such an architecture, the only processor communication required is between nearest neighbors, so that processor communication overhead is virtually eliminated. This is in contrast to conventional parallel computers, where even with proper parallel decomposition of the problem, processor communication overhead is often a severe speed bottleneck. In this paper, the first parallel processor IC for route planning over complex terrain is reported.

ARRAY ARCHITECTURE AND OPERATION

The path planner architecture, shown schematically in Fig. 1, consists of a 24 x 25 array of unit cells (processors) which communicate with their nearest neighbors and are randomly accessed by 5-bit row and column decoders located adjacent to the array. The IC is implemented in a single-poly, double-metal 2 µm CMOS n-well process, utilizing a full custom layout. The overall chip area is 9.2 mm x 7.9 mm. A photograph of the chip is shown in Fig. 2.

Fig. 1. Block diagram of IC architecture (row decoder, column decoder, and array of unit cells).

Fig. 2. IC photograph.

In order to determine the fastest routes from a selected starting point(s) to all other points on a given terrain, each unit cell corresponds to a terrain pixel which has been preprogrammed with the cost (i.e. delay) of traversing that pixel. Operation begins with the selection of a path origination pixel(s), which sends out a signal to its north, south, east and west neighbors. Each neighbor delays the signal by a preset time (programmable cost), after which it broadcasts a signal to each of its four neighbors. One of 256 costs (delays) can be selected. When a signal is received, the incoming signal direction is stored and further inputs to the cell are disabled. This results in a signal wavefront propagating radially outward from the originating pixel that is then distorted by the varying delays encountered in the array. When signal propagation through the entire array is complete, any destination node may be queried, and the minimum path between it and the origination node is found by retracing the direction stored in each unit cell.

Thus, determination of the fastest paths through a complex terrain (modelled by 256 cost levels) is realized. This is in contrast to the simpler task of maze solving or wire routing, in which the processors would be programmed with binary costs, i.e. the pixel is either blocked or open. A 4 x 4 array of such binary processors, and later a 4 x 8 array which used the discharge of a capacitor to provide an additional cost (blocked, not blocked and slow), have been previously reported. In addition to the lack of available cost levels, another drawback of this approach is the uncontrolled cost nonuniformity associated with varying capacitor discharge times across the array. In the approach reported here, the all-digital implementation leads to perfect cost uniformity across the array.

In addition to finding the fastest paths from one origination pixel to all possible destinations, multiple starting pixels can be selected, with signal propagation emanating from each source and stopping at the boundary between signal wavefronts. This feature is useful in battlefield scenarios where an analyst can model the progress of different forces across the terrain. In addition, when any destination node is queried, the minimum path between it and the nearest source pixel is displayed, which provides valuable information for rescue operations.

Unit Cell

In order to implement signal propagation and path retracing in the array, each unit cell must perform two main functions: programmable delay and storage of the incoming direction. The former is implemented with a programmable counter and the latter with a set of static latches. A block diagram of the unit cell is shown in Fig. 3; the cell occupies 296 µm x 330 µm. Signal propagation through the array, controlled by variable unit cell delays, is implemented by presetting an 8-bit ripple down-counter in each cell to one of its 256 possible values. When triggered by an incoming signal, the counter decrements down to zero, which in turn triggers the broadcast of an outgoing signal to each of its nearest neighbors.
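The propagation and retracing procedure described above can be modelled in software. The following is a minimal Python sketch, not the chip's implementation: it replaces the parallel hardware with an event queue ordered by clock cycle, which makes it behave much like a Dijkstra-style shortest-path search. The names `propagate`, `retrace`, `STEP` and `OPPOSITE` are hypothetical, and the sketch assumes that a source pixel broadcasts at cycle 0 without delay and that ties between simultaneous arrivals are broken arbitrarily.

```python
import heapq

# Grid offset (row, col) of the neighbor lying in each direction.
STEP = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def propagate(cost, sources):
    """Simulate the signal wavefront; return (arrival, came_from) dicts."""
    rows, cols = len(cost), len(cost[0])
    arrival = {}     # cell -> clock cycle of the first incoming signal
    came_from = {}   # cell -> side the first signal arrived on ("" for a source)
    events = [(0, r, c, "") for r, c in sources]
    heapq.heapify(events)
    while events:
        t, r, c, side = heapq.heappop(events)
        if (r, c) in arrival:
            continue  # inputs are disabled once a direction has been stored
        arrival[(r, c)] = t
        came_from[(r, c)] = side
        # Assumption: a source broadcasts at once; other cells count down first.
        fire = t if side == "" else t + cost[r][c]
        for direction, (dr, dc) in STEP.items():
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in arrival:
                heapq.heappush(events, (fire, nr, nc, OPPOSITE[direction]))
    return arrival, came_from


def retrace(came_from, destination):
    """Follow the stored directions from a destination back to its source."""
    path = [destination]
    while came_from[path[-1]] != "":
        r, c = path[-1]
        dr, dc = STEP[came_from[(r, c)]]
        path.append((r + dr, c + dc))
    return path[::-1]
```

Passing several coordinates in `sources` models the multiple-origin case: each cell stores the direction of whichever wavefront reaches it first, so `retrace` leads back to the nearest source in terms of delay.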

Fig. 3. Block diagram of unit cell processor (blocks include delay cost and conditional river blocking).

Each counter stage is based on a static latch configuration, shown in Fig. 4. To eliminate the need for an 8-input NAND gate, nine (rather than eight) counter stages are utilized to achieve the 256-level delay resolution. In order to implement the path retracing function in the array, four static cross-coupled latches are used to store the incoming signal direction. They can be read out at any time, even during signal propagation through the array. A four-input NOR gate is triggered if any of the latches are set, which in turn disables the static latches from receiving further input.

Fig. 4. Circuit schematic of one counter stage (labelled signals: data in, set, reset, VDD).

Another unit cell function is the conditional blocking of signal propagation in any direction to model impassable terrain such as rivers and canyons. This is important because the current (and next generation) resolution of digitized map data results in single pixels which contain both rivers and other features such as roads. In this case, the unit cell is assigned the cost of a road and the outgoing signal is blocked from crossing the river, resulting in signal propagation along a road adjacent to a river. Such conditional blocking is accomplished with another set of four static latches which are preprogrammed to either block or transmit the signal emanating from the counter.

In order to minimize the unit cell size, each of the three functional blocks (storage of incoming direction, programmable delay, and conditional signal blocking) accesses four common data lines when enabled. The enable circuitry and row/column decoder found in each unit cell are implemented primarily with NOR logic.

EXPERIMENTAL RESULTS

The path planner chip was interfaced to a laboratory PC through a wire-wrap board and a plug-in digital interface card. The entire chip (address memory, counter, river blocking, control logic and I/O) is completely functional. It was found that the latches require 160 ns to settle, implying a terrain programming time for the 600 pixel array of less than 300 µs. A separate counter test circuit was successfully clocked at 8.33 MHz, limited by the test station.

The array can be operated in two modes: single step and continuous. In the former, the chip is clocked via the PC, and the actual signal propagation on the chip can be monitored on the PC screen. An example of signal propagation through the array is given by the sequence of photos in Fig. 5. In the continuous mode, a function generator supplies a square wave input to a nonoverlapping clock generator located on the wire-wrap board, which in turn clocks the counter. The chip was tested at frequencies up to 7 MHz in this mode, resulting in typical path determination times of under 250 µs. For a typical terrain cost map, signal propagation through the array required 2550 clock cycles, so that the entire signal propagation phase required only 360 µs at a 7 MHz clock rate. Fig. 6 displays the original map with a typical lowest cost path shown in white.
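The timing figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a plausibility check: the factor of three writes per cell is an assumption (one write for each of the three functional blocks that share the data lines), not something the measurements state directly.

```python
CELLS = 24 * 25          # 600 unit cells in the array
LATCH_SETTLE_NS = 160    # measured latch settling time
BLOCKS_PER_CELL = 3      # assumption: one write per functional block

# Upper estimate of the time needed to program the whole terrain.
programming_ns = CELLS * BLOCKS_PER_CELL * LATCH_SETTLE_NS
print(f"programming time: {programming_ns / 1000:.0f} us")

CLOCK_MHZ = 7
PROPAGATION_CYCLES = 2550

# Cycles divided by MHz gives microseconds directly.
propagation_us = PROPAGATION_CYCLES / CLOCK_MHZ
print(f"propagation time: {propagation_us:.0f} us")
```

Under that assumption the programming estimate comes to 288 µs, consistent with the stated bound of less than 300 µs, and 2550 cycles at 7 MHz comes to about 364 µs, in line with the quoted 360 µs.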

Fig. 5. Signal propagation through array, shown in white on map background (black indicates road): a) after 450 clock cycles, b) after 500 clock cycles, c) after 750 clock cycles, and d) after 1250 clock cycles.
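Snapshots of this kind can be mimicked in software by stepping a model of the array one clock cycle at a time, much like the single step mode. The sketch below is illustrative only: `UnitCell` and `run` are hypothetical names, the counter is modelled as a plain integer rather than a ripple counter, and a source is assumed to broadcast before the first clock edge.

```python
STEP = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


class UnitCell:
    def __init__(self, cost, blocked=()):
        self.count = cost            # remaining delay, in clock cycles
        self.blocked = set(blocked)  # outgoing directions that are suppressed
        self.incoming = None         # side on which the first signal arrived
        self.fired = False

    def receive(self, side):
        if self.incoming is None:    # later inputs are ignored
            self.incoming = side

    def clock(self):
        """Count down one cycle; return the directions broadcast, if any."""
        if self.incoming is None or self.fired:
            return []
        self.count -= 1
        if self.count > 0:
            return []
        self.fired = True
        return [d for d in STEP if d not in self.blocked]


def run(cells, sources, cycles):
    """Step the array for `cycles` clocks; return the set of cells reached."""
    rows, cols = len(cells), len(cells[0])

    def deliver(r, c, directions):
        for d in directions:
            nr, nc = r + STEP[d][0], c + STEP[d][1]
            if 0 <= nr < rows and 0 <= nc < cols:
                cells[nr][nc].receive(OPPOSITE[d])

    for r, c in sources:             # assumption: a source broadcasts at once
        cells[r][c].incoming = "origin"
        cells[r][c].fired = True
        deliver(r, c, [d for d in STEP if d not in cells[r][c].blocked])
    for _ in range(cycles):
        # Clock every cell first, then deliver, so all cells see the same edge.
        outputs = [(r, c, cells[r][c].clock())
                   for r in range(rows) for c in range(cols)]
        for r, c, directions in outputs:
            deliver(r, c, directions)
    return {(r, c) for r in range(rows) for c in range(cols)
            if cells[r][c].incoming is not None}
```

Calling `run` with increasing values of `cycles` on freshly built grids gives the set of cells the wavefront has reached at each point, and a cell constructed with, for example, `blocked={"E"}` never passes the signal to its eastern neighbor, which models the conditional river blocking.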

Fig. 6. A typical lowest cost path found by chip.

CONCLUSION

In summary, the first single-chip fine grain parallel processor array to perform path planning over complex terrain has been demonstrated. The 24 x 25 array of digital processors has been operated at frequencies up to 7 MHz, providing best (fastest) route determination in under a millisecond. This corresponds to a four order of magnitude speed-up over current software approaches. Full functionality of this first generation research chip paves the way for the implementation of large arrays (e.g. 1024 x 1024) and chips with increased functionality. Both these avenues are currently being pursued. A summary of the chip characteristics is given in Table I.

Table I. IC Characteristics

Chip Architecture:                   24 x 25 digital processor array
Maximum Clock Frequency:             7 MHz
Equivalent Operations per second:    6 billion
Origination Nodes:                   one or multiple
Cost Dynamic Range:                  256:1
Process:                             2 µm CMOS
Unit Cell (Processor) Size:          296 µm x 330 µm
IC Size:                             7.9 mm x 9.2 mm

ACKNOWLEDGMENTS

The authors gratefully acknowledge useful technical discussions with H. Langenbacher, B. Minch, T. Brown, D. Kerns, S. Eberhardt, and A. Thakoor during the course of this work. Special thanks go to D. Kerns for his effort on a previous analog path planner design and to B. Minch, who wrote the original software interface for operation of the path planner chip. The research described in this paper was performed at the Center for Space Microelectronics Technology, Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration and was jointly sponsored by the ASAS Program Office, the Defense Advanced Research Projects Agency, and the Naval Surface Warfare Center.

REFERENCES

[1] T. Kreitzberg, T. Barragy, and N. Bryant, "Tactical Movement Analyzer: A Battlefield Mobility Tool," Proc. Joint Service Data Fusion Symposium, Laurel, MD, 1990.
[2] C.R. Carroll, "A Neural Processor for Maze Solving," in C. Mead and M. Ismail, Eds., Analog VLSI Implementation of Neural Systems, Kluwer Academic Publishers, Boston (1989).