Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan
1 Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan Virginia Polytechnic Institute and State University
2 Reverse-engineer the brain (National Academy of Engineering Top 5 Grand Challenges). Question: How are the neurons connected? [Figure labels: neuron; dendrites (receiver); axon (wires); axon terminal (transmitter); action potentials (spikes). Cited from Sciseek.com.]
3 Reverse-engineer the brain (National Academy of Engineering Top 5 Grand Challenges). Multi-Electrode Array (MEA): neurons are grown on an MEA chip. [Figure: spikes of neurons A, B, C over time, forming a spike train stream.]
4 Reverse-engineer the brain (National Academy of Engineering Top 5 Grand Challenges). Find repeating patterns, then infer network connectivity.
5 Fast data mining of spike train streams on Graphics Processing Units (GPUs). MEA chip: Multi-Electrode Array (MEA). GPU chip: NVIDIA GTX280 graphics card.
6 Fast data mining of spike train streams on Graphics Processing Units (GPUs). Two key algorithmic strategies address the scalability problem on the GPU: a hybrid mining approach and a two-pass elimination approach.
7 Event stream data: a sequence of neuron firings $(E_1, t_1), (E_2, t_2), \ldots, (E_n, t_n)$. [Figure: firings of neurons A, B, C, D plotted against time; for example, an event of type D occurred at t = 5 and an event of type A occurred at t = 6.]
8 Pattern or episode: events of given types (A, B, C, D in the figure) linked by inter-event constraints. Occurrences are counted non-overlapped. [Figure: neurons A to D against time.] In the example, the episode appears twice in the event stream.
9 Data mining problem: find all possible episodes (patterns) that occur more than X times in the event sequence. Challenge: combinatorial explosion, that is, a large number of episodes to count as the episode size (length) grows. [Figure: example episodes of increasing size over the event types A, B, C, D.]
10 Mining algorithm (a level-wise procedure to control combinatorial explosion):
Generate an initial list of candidate size-1 episodes.
Repeat until there are no more candidate episodes:
  Count: occurrences of size-m candidate episodes (the computational bottleneck).
  Prune: retain only frequent episodes.
  Candidate generation: size-(m+1) candidate episodes from size-m frequent episodes.
Output all the frequent episodes.
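The level-wise loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it ignores inter-event time constraints, counts occurrences with a simple greedy non-overlapped scan, and uses an Apriori-style join for candidate generation. The names `count_occurrences`, `generate_candidates`, and `mine` are hypothetical, and `min_count` is assumed to be non-negative so the loop terminates.

```python
from typing import Dict, List, Sequence, Tuple

Episode = Tuple[str, ...]


def count_occurrences(stream: Sequence[str], episode: Episode) -> int:
    """Greedy non-overlapped count of an episode, ignoring timing constraints."""
    count, pos = 0, 0
    for symbol in stream:
        if symbol == episode[pos]:
            pos += 1
            if pos == len(episode):      # one full occurrence seen
                count, pos = count + 1, 0
    return count


def generate_candidates(frequent: List[Episode]) -> List[Episode]:
    """Assumed join rule: extend a by the last event of b when a[1:] == b[:-1]."""
    return sorted({a + b[-1:] for a in frequent for b in frequent
                   if a[1:] == b[:-1]})


def mine(stream: Sequence[str], min_count: int) -> Dict[Episode, int]:
    """Level-wise mining: count, prune, generate, until no candidates remain."""
    candidates: List[Episode] = sorted({(s,) for s in stream})  # size-1 episodes
    result: Dict[Episode, int] = {}
    while candidates:
        counts = {ep: count_occurrences(stream, ep) for ep in candidates}
        frequent = [ep for ep in candidates if counts[ep] > min_count]
        result.update((ep, counts[ep]) for ep in frequent)
        candidates = generate_candidates(frequent)
    return result


if __name__ == "__main__":
    print(mine(list("ABABAB"), min_count=1))
```

The counting step is called once per candidate at every level, which is why it dominates the running time and is the natural target for GPU parallelization.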
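11 Counting algorithm (for one episode). The episode A, B, C, D is tracked with one accepting step per event type: Accept_A(), Accept_B(), Accept_C(), Accept_D(). Event stream (event type followed by timestamp): A1, A2, B4, A5, C10, B12, C13, D17. [Figure: events held at each step: A1, A2, A5 for A; B4, B12 for B; C10, C13 for C; D17 for D.]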
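A serial counter of this kind can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: each level of the episode keeps a list of accepted timestamps, and an event is accepted at a level only if some event at the previous level lies within the inter-event interval. The function name `count_non_overlapped` is hypothetical, and the intervals used in the example (lower bound 0, upper bounds 5, 10, 5) are assumptions borrowed from the upper bounds shown on the PreElim slide.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, float]        # (event type, timestamp)
Interval = Tuple[float, float]   # (low, high]: low < gap <= high


def count_non_overlapped(events: Sequence[Event],
                         episode: Sequence[str],
                         gaps: Sequence[Interval]) -> int:
    """Count non-overlapped occurrences of an episode with inter-event intervals."""
    assert len(gaps) == len(episode) - 1
    if len(episode) == 1:
        return sum(1 for etype, _ in events if etype == episode[0])
    last = len(episode) - 1
    accepted: List[List[float]] = [[] for _ in episode]  # timestamps per level
    count = 0
    for etype, t in events:
        # Visit levels from last to first so a single event cannot fill
        # two consecutive levels of the same partial occurrence.
        for level in range(last, -1, -1):
            if episode[level] != etype:
                continue
            if level == 0:
                accepted[0].append(t)
                continue
            low, high = gaps[level - 1]
            if any(low < t - prev <= high for prev in accepted[level - 1]):
                if level == last:
                    count += 1
                    accepted = [[] for _ in episode]  # reset: non-overlapped
                    break
                accepted[level].append(t)
    return count


if __name__ == "__main__":
    stream = [("A", 1), ("A", 2), ("B", 4), ("A", 5),
              ("C", 10), ("B", 12), ("C", 13), ("D", 17)]
    # Assumed intervals: (0, 5], (0, 10], (0, 5]
    print(count_non_overlapped(stream, "ABCD", [(0, 5), (0, 10), (0, 5)]))
```

The per-level lists and the nested interval checks are what make this routine heavy on memory and branching when it is run as a GPU kernel.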
12 Goal: find an efficient counting algorithm on the GPU to count the occurrences of N size-m episodes in an event stream, and address the scalability problem on the GPU's massively parallel execution architecture.
13 One episode per GPU thread (PTPE): each thread counts one episode, a simple extension of serial counting. N episodes map to N GPU threads, and the event stream resides in global memory. [Figure labels: GPU, MP, SP, SM, global memory.] Efficient when the number of episodes is larger than the number of GPU cores.
14 Problem: with not enough episodes (threads), some GPU cores will be idle. Solution: increase the level of parallelism with multiple threads per episode (MTPE). The event stream is split into M event segments, so N episodes map to N × M GPU threads.
15 Problem with simple count merge.
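The merge problem can be illustrated with a small Python sketch. It assumes the simplest possible merge, summing independent per-segment counts, and ignores timing constraints; the slides do not spell out the corrected merge, so only the failure mode is shown. The function names are hypothetical.

```python
from typing import Sequence


def count_serial(stream: Sequence[str], episode: Sequence[str]) -> int:
    """Greedy non-overlapped count over the whole stream (no time constraints)."""
    count, pos = 0, 0
    for symbol in stream:
        if symbol == episode[pos]:
            pos += 1
            if pos == len(episode):
                count, pos = count + 1, 0
    return count


def count_segmented_naive(stream: Sequence[str], episode: Sequence[str],
                          num_segments: int) -> int:
    """Split the stream into segments, count each independently, sum the counts."""
    seg_len = max(1, -(-len(stream) // num_segments))  # ceiling division
    return sum(count_serial(stream[i:i + seg_len], episode)
               for i in range(0, len(stream), seg_len))


if __name__ == "__main__":
    stream = list("ABCABC")
    print(count_serial(stream, "ABC"))              # whole-stream count
    print(count_segmented_naive(stream, "ABC", 2))  # boundaries fall between occurrences
    print(count_segmented_naive(stream, "ABC", 3))  # boundaries cut through occurrences
```

Whenever a segment boundary falls inside an occurrence, the naive sum loses it, so a correct merge has to account for occurrences that span adjacent segments.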
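16 Choose the right algorithm with respect to the number of episodes N. Define a switching threshold, the crossover point (CP): if N < CP, use MTPE; otherwise use PTPE. $CP = MP \times B_{MP} \times T_B \times f(\text{size})$, where $MP$ is the number of multiprocessors, $B_{MP}$ is the number of blocks per multiprocessor, and $T_B$ is the number of threads per block. The product $MP \times B_{MP} \times T_B$ is the GPU computing capacity, and $f(\text{size})$ is a performance penalty factor.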
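The switching rule can be written down directly. In this sketch only the 30 multiprocessors come from the hardware slide; the blocks-per-multiprocessor, threads-per-block, and penalty values are hypothetical placeholders, and the branch assignment (MTPE below the crossover point, PTPE otherwise) follows the reading that PTPE is efficient when episodes are plentiful.

```python
def crossover_point(num_mp: int, blocks_per_mp: int,
                    threads_per_block: int, penalty: float) -> float:
    """CP = MP x B_MP x T_B x f(size), with f(size) passed in as `penalty`."""
    return num_mp * blocks_per_mp * threads_per_block * penalty


def choose_kernel(num_episodes: int, cp: float) -> str:
    """Few episodes: split each across threads (MTPE). Many: one thread each (PTPE)."""
    return "MTPE" if num_episodes < cp else "PTPE"


if __name__ == "__main__":
    # 30 MPs as on the GTX 280; the other three values are illustrative only.
    cp = crossover_point(num_mp=30, blocks_per_mp=8,
                         threads_per_block=64, penalty=0.5)
    for n in (100, 10_000):
        print(n, choose_kernel(n, cp))
```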
17 Problem: the original counting algorithm is too complex for a GPU kernel function. [Figure repeated from slide 11: the Accept_A(), Accept_B(), Accept_C(), Accept_D() steps applied to the event stream A1, A2, B4, A5, C10, B12, C13, D17.]
18 Problem: the original counting algorithm is too complex for a GPU kernel function. Running the Accept_A() to Accept_D() steps with their per-step event lists (A1, A2, A5; B4, B12; C10, C13; D17) on the GPU leads to large shared memory usage, large register file usage, and a large number of branching instructions.
19 Solution: the PreElim algorithm. Less constrained counting with a simple kernel function that checks the upper bound only. Episode: A, B, C, D with upper-bound-only inter-event constraints (,5], (,10], (,5]. [Figure: the Accept_A() to Accept_D() steps applied to the event stream A1, A2, B4, A5, C10, B12, C13, D17.]
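One way to read "upper bound only" is sketched below: each level keeps a single timestamp (the most recent accepted event) instead of a list, and only the upper end of each inter-event interval is checked. This is an assumption about how the relaxed counter works, not code from the talk; `count_pre_elim` is a hypothetical name. Because the constraints are relaxed, the result is intended as an over-estimate of the exact count, which is what makes it usable as a cheap filter.

```python
import math
from typing import Sequence, Tuple

Event = Tuple[str, float]  # (event type, timestamp)


def count_pre_elim(events: Sequence[Event],
                   episode: Sequence[str],
                   upper_bounds: Sequence[float]) -> int:
    """Relaxed count: one timestamp per level, upper-bound check only."""
    assert len(upper_bounds) == len(episode) - 1
    last = len(episode) - 1
    latest = [-math.inf] * len(episode)  # most recent accepted time per level
    count = 0
    for etype, t in events:
        # Last level first, so one event cannot advance two levels at once.
        for level in range(last, -1, -1):
            if episode[level] != etype:
                continue
            if level > 0 and t - latest[level - 1] > upper_bounds[level - 1]:
                continue  # too far from the previous level's latest event
            if level == last:
                count += 1
                latest = [-math.inf] * len(episode)  # reset after an occurrence
                break
            latest[level] = t
    return count


if __name__ == "__main__":
    stream = [("A", 1), ("A", 2), ("B", 4), ("A", 5),
              ("C", 10), ("B", 12), ("C", 13), ("D", 17)]
    print(count_pre_elim(stream, "ABCD", [5, 10, 5]))
```

Storing one value per level instead of a growing list is what keeps the kernel's memory footprint and branching small.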
20 A simpler kernel function. Resource usage (shared memory, registers, local memory): PreElim uses 4 x episode size of shared memory, 13 registers, and 0 local memory. Normal counting uses 44 x episode size of shared memory.
21 Solution: two-pass elimination approach. Pass 1: less constrained counting, with one thread per episode over the event stream. Pass 2: normal counting, with threads only for the fewer episodes that remain, over the same event stream.
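The control flow of the two passes can be sketched independently of the counters themselves. In the sketch below, `cheap_count` stands for the less constrained first-pass counter and `exact_count` for normal counting; both are placeholders supplied by the caller, and the scheme is only sound under the assumption that the cheap count never falls below the exact count. All names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def two_pass(episodes: Iterable[Hashable],
             cheap_count: Callable[[Hashable], int],
             exact_count: Callable[[Hashable], int],
             threshold: int) -> Tuple[Dict[Hashable, int], List[Hashable]]:
    """Return (frequent episodes with exact counts, first-pass survivors)."""
    # Pass 1: cheap, relaxed counting eliminates most candidates.
    survivors = [ep for ep in episodes if cheap_count(ep) > threshold]
    # Pass 2: expensive exact counting runs only on the survivors.
    frequent = {}
    for ep in survivors:
        c = exact_count(ep)
        if c > threshold:
            frequent[ep] = c
    return frequent, survivors


if __name__ == "__main__":
    # Stub counters with made-up numbers, only to exercise the control flow.
    relaxed = {"e1": 5, "e2": 1, "e3": 4}
    exact = {"e1": 3, "e2": 0, "e3": 1}
    print(two_pass(relaxed, relaxed.get, exact.get, threshold=2))
```

The benefit comes from the second pass launching far fewer threads of the complex kernel, while the first pass runs the simple kernel over every candidate.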
22 A simpler kernel function. Compile-time difference (shared memory, registers, local memory): PreElim uses 4 x episode size of shared memory, 13 registers, and 0 local memory; normal counting uses 44 x episode size of shared memory. Run-time difference (local memory loads and stores, divergent branches): Two Pass 24,770,310 and 12,258,590; Hybrid 210,773,785 and 14,161,399.
23 Hardware. Computer (custom-built): Intel Core2 at 2.33 GHz, 4 GB memory. Graphics card (Nvidia GTX 280 GPU): 240 cores (30 MPs * 8 cores) at 1.3 GHz, 1 GB global memory, 16K shared memory for each MP.
24 Datasets. Synthetic (Sym26): 60 seconds with 50,000 events. Real (culture growing for 5 weeks): Day 33: (... events); Day 34: (... events); Day 35: (... events).
25 PTPE vs MTPE: crossover points.
26 Performance of the hybrid approach. [Figure: time (ms) versus episode size for PTPE, MTPE, and Hybrid, with crossover points and episode numbers marked; Sym26 dataset.]
27 Crossover point estimation: $f(\text{size}) = \frac{a}{\text{size}} + b$ is a better fit. A least-squares fit is performed.
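A least-squares fit of this kind can be sketched in plain Python. The sketch assumes the penalty model reads f(size) = a / size + b, which is linear in x = 1 / size, so the closed-form simple linear regression applies. The function name `fit_penalty` is hypothetical, and the data in the example are synthetic, generated from known coefficients purely to check that the fit recovers them; they are not measurements from the talk.

```python
from typing import Sequence, Tuple


def fit_penalty(sizes: Sequence[float],
                penalties: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimate of (a, b) in f(size) = a / size + b."""
    xs = [1.0 / s for s in sizes]  # the model is linear in 1 / size
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(penalties) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, penalties))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


if __name__ == "__main__":
    # Synthetic check: points generated from a = 2.0, b = 0.1 (made-up values).
    sizes = [1, 2, 3, 4, 5]
    penalties = [2.0 / s + 0.1 for s in sizes]
    print(fit_penalty(sizes, penalties))
```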
28 Two-pass approach vs hybrid approach: 99.9% fewer episodes.
29 Performance of the two-pass approach. [Figure, two panels versus episode size: time (ms) for One Pass and Two Pass; episode count (0K to 200K) for Total # and First Pass Cull.]
30 Percentage of episodes eliminated by each pass. [Figure: percentage eliminated (91% to 100%) by the first pass and the second pass versus support; episode size = 4.]
31 GPU vs CPU: the GPU is always faster than the CPU, with a 5x to 15x speedup. Fair comparison: the two-pass algorithm is used and maximum threading is applied for both.
32 Massive parallelism is required for conquering a near-exponential search space. GPUs are far more accessible than high-performance clusters. Frequent episode mining is not data parallel, so the algorithm was redesigned. The result is a framework for real-time and interactive analysis of spike train experimental data.
33 A fast temporal data mining framework on GPUs: a commoditized system with a massively parallel execution architecture. Two programming strategies: a hybrid approach that increases the level of parallelism (data segmentation + map-reduce), and a two-pass elimination approach that decreases algorithm complexity (task decomposition).
34 Questions.
35 Parallel execution via pthreads, optimized for CPU execution: minimize disk access, cache performance. Implements the two-pass approach: PreElim, a simpler and quicker state machine, followed by the full state machine, which is slower but is required to eliminate all unsupported episodes. [Figure: an event stream (... A B D E F Z G ...) over event types A to H, with example episodes ACE, ACDE, AEF, EFG.]
36 Level-wise candidate generation: N-size frequent episodes => (N+1)-size candidates. [Figure: episodes over the event types A, B, C, D.]
Proceedings of the 2(X)0 IEEE International Conference on Robotics & Automation San Francisco, CA April 2000 1ms Column Parallel Vision System and It's Application of High Speed Target Tracking Y. Nakabo,
More informationBuild Applications Tailored for Remote Signal Monitoring with the Signal Hound BB60C
Application Note Build Applications Tailored for Remote Signal Monitoring with the Signal Hound BB60C By Justin Crooks and Bruce Devine, Signal Hound July 21, 2015 Introduction The Signal Hound BB60C Spectrum
More informationHardware Implementation of Viterbi Decoder for Wireless Applications
Hardware Implementation of Viterbi Decoder for Wireless Applications Bhupendra Singh 1, Sanjeev Agarwal 2 and Tarun Varma 3 Deptt. of Electronics and Communication Engineering, 1 Amity School of Engineering
More informationOddCI: On-Demand Distributed Computing Infrastructure
OddCI: On-Demand Distributed Computing Infrastructure Rostand Costa Francisco Brasileiro Guido Lemos Filho Dênio Mariz Sousa MTAGS 2nd Workshop on Many-Task Computing on Grids and Supercomputers Co-located
More informationPerformance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques
Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques Madhavi Anupoju 1, M. Sunil Prakash 2 1 M.Tech (VLSI) Student, Department of Electronics & Communication Engineering, MVGR
More informationDoubletalk Detection
ELEN-E4810 Digital Signal Processing Fall 2004 Doubletalk Detection Adam Dolin David Klaver Abstract: When processing a particular voice signal it is often assumed that the signal contains only one speaker,
More informationFPGA Digital Signal Processing. Derek Kozel July 15, 2017
FPGA Digital Signal Processing Derek Kozel July 15, 2017 table of contents 1. Field Programmable Gate Arrays (FPGAs) 2. FPGA Programming Options 3. Common DSP Elements 4. RF Network on Chip 5. Applications
More informationIEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH GHEVC: An Efficient HEVC Decoder for Graphics Processing Units
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH 2017 459 GHEVC: An Efficient HEVC Decoder for Graphics Processing Units Diego F. de Souza, Student Member, IEEE, Aleksandar Ilic, Member, IEEE, Nuno
More information