Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan

Size: px
Start display at page:

Download "Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan"

Transcription

1 Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan Virginia Polytechnic Institute and State University

2 Reverse-engineer the brain National Academy of Engineering Top 5 Grand Challenges Neuron Axon Terminal (transmitter) Cited from Sciseek.com Dendrites (receiver) Axon (wires) Question: How are the neurons connected? Action Potentials (Spikes) 2

3 Reverse-engineer the brain National Academy of Engineering Top 5 Grand Challenges Multi-Electrode Array (MEA) Neurons grown on MEA Chip A B C A B C time Spike Train Stream 3

4 Reverse-engineer the brain National Academy of Engineering Top 5 Grand Challenges Find Repeating Patterns Infer Network Connectivity 4

5 Fast data mining of spike train stream on Graphics Processing Units (GPUs) MEA Chip GPU Chip Multi-Electrode Array (MEA) NVIDIA GTX280 Graphics Card 5

6 Fast data mining of spike train stream on Graphics Processing Units (GPUs) Two key algorithmic strategies to address scalability problem on GPU A hybrid mining approach A two-pass elimination approach 6

7 Event stream data: sequence of neurons firing ( E 1,t 1 ),( E 2,t 2 ),...,( E n,t ) n Event of Type A occurred at t = 6 Neuron A B 1 1 C D Time Event of Type D occurred at t = 5 7

8 Pattern or Episode Inter-event constraint Occurrences (Non-overlapped) A Neurons B C D Time 1 Episode appears twice in the event stream. 8

9 Data mining problem: Find all possible episodes / patterns which occur more than X-times in the event sequence. Challenge: Combinatorial Explosion: large number of episodes to count Episode Size/Length: A A B A B C A B C D B B A A C B A C B D A C B A C B C A A C D B A D B C A D C B 9

10 Mining Algorithm (A level wise procedure to control combinatorial explosion) Generate an initial list of candidate size-1 episodes Repeat until - no more candidate episodes Count: Occurrences of size-m candidate episodes Prune: Retain only frequent episodes Candidate Generation: size-(m+1) candidate episodes from N-size frequent episodes Output all the frequent episodes Computational bottleneck 10

11 Counting Algorithm (for one episode) Episode: Accept_A() Accept_B() Accept_C() Accept_D() A 1 B 4 C 10 D 17 A 2 B 12 C 13 A A 1 A 2 B 4 A 5 C 10 B 12 C 13 D 17 Event Stream 11

12 Find an efficient counting algorithm on GPU to count the occurrences of N size-m episodes in an event stream. Address scalability problem on GPU s massive parallel execution architecture. 12

13 One episode per GPU thread (PTPE) Each thread counts one episode Simple extension of serial counting GPU MP MP MP N Episodes N GPU Threads SP SP SP SM SM SM Event Stream Global Memory Efficient when the number of episode is larger than the number of GPU cores. 13

14 Not enough episodes/thread, some GPU cores will be idle. Solution: Increase the level of parallelism. Multiple Thread per Episode (MTPE) N Episodes NM N GPU Threads Event Stream M Event Segments 14

15 Problem with simple count merge. 15

16 Choose the right algorithm with respect to the number of episodes N. Define a switching threshold - Crossover point (CP) No If N < CP Yes Use PTPE Use MTPE GPU computing capacity CP = MP B MP T B f (size) MP : Number of multi - processors B MP : Block per multi - processor T B : Thread per block Performance Penalty Factor 16

17 Problem: Original counting algorithm is too complex for a GPU kernel function. Episode: Accept_A() Accept_B() Accept_C() Accept_D() A 1 B 4 C 10 D 17 A 2 B 12 C 13 A A 1 A 2 B 4 A 5 C 10 B 12 C 13 D 17 Event Stream 17

18 Problem: Original counting algorithm is too complex for a GPU kernel function. Accept_A() Accept B() Accept_C() Accept_D() SP MP SP MP SP MP A 1 B 4 C 10 D 17 A 2 B 12 C 13 A 5 SM SM SM Global Memory Large shared memory usage Large register file usage Large number of branching instructions 18

19 Solution: PreElim algorithm Less constrained counting Simple kernel function Upper bound only Episode: A (,5] B (,10] C (,5] D Accept_A() Accept_B() Accept_C() Accept_D() A 12 5 B 4 C D 17 B A 1 A 2 B 4 A 5 C 10 B 12 C 13 D 17 Event Stream 19

20 A simpler kernel function Shared Memory Register Local Memory PreElim 4 x Episode Size 13 0 Normal Counting 44 x Episode Size

21 Solution: Two-pass elimination approach PASS 1: Less Constrained Counting PASS 2: Normal Counting Episodes Threads Fewer Episodes Threads Event Stream Event Stream 21

22 A simpler kernel function Compile Time Difference Shared Memory Register Local Memory PreElim 4 x Episode Size 13 0 Normal Counting 44 x Episode Size Run Time Difference Local Memory Load and Store Divergent Branching Two Pass 24,770,310 12,258,590 Hybrid 210,773,785 14,161,399 22

23 Hardware Computer (custom-built) Intel Core2 2.33GHz 4GB memory Graphics Card (Nvidia GTX 280 GPU) 240 cores (30 MPs * 8 1.3GHz 1GB global memory 16K shared memory for each MP 23

24 Datasets Synthetic (Sym26) 60 seconds with 50,000 events Real (Culture growing for 5 weeks) Day 33: ( events) Day 34: ( events) Day 35: ( events) 24

25 PTPE vs MTPE Crossover points 25

26 Performance of the Hybrid Approach 1200 PTPE PTPE MTPE MTPE Hybrid Time (ms) Crossover points Episode Size Episode Number: Sym26 dataset, Support =

27 Crossover Point Estimation f (size) = a is a better fit. size + b A least square fit is performed. 27

28 Two-pass approach vs Hybrid approach 99.9% fewer episodes 28

29 Performance of the Two-pass approach One Pass Two Pass Total # First Pass Cull 160K 200K 120K 160K Time (ms) 80K Episode # 120K 80K 40K 40K 0K One Pass Two Pass Episode Size 0K Total # First Pass Cull Episode Size dataset, Support =

30 Percentage of episodes eliminated by each pass 100% 99% 98% 97% 96% 95% 94% 93% 92% 91% First Pass Second Pass Support dataset, episode size = 4 30

31 GPU vs CPU GPU is always faster than CPU 5x - 15x speedup Fair comparison Two-pass algorithm used Maximum threading for both 31

32 Massive parallelism is required for conquering near exponential search space GPU s far more accessible than high performance clusters Frequent episode mining Not data parallel Redesigned algorithm Framework for real-time and interactive analysis of spike train experimental data 32

33 A fast temporal data mining framework on GPUs Commoditized system Massive parallel execution architecture Two programming strategies A hybrid approach Increase level of parallelism (data segmentation + map-reduce) Two-pass elimination approach Decrease algorithm complexity (Task decomposition) 33

34 Questions. 34

35 Parallel Execution via pthreads Optimized for CPU execution Minimize disk access Cache performance Implements Two-Pass Approach PreElim Simpler/ Quicker state machine Full State Machine Slower but is required to eliminate all unsupported episodes... A B D E F Z G... A B C D E F G H ACE ACDE AEF EFG

36 A B C D Level-wise N-size frequent episodes => (N+1)-size candidates A B C D A B C D

PRACE Autumn School GPU Programming

PRACE Autumn School GPU Programming PRACE Autumn School 2010 GPU Programming October 25-29, 2010 PRACE Autumn School, Oct 2010 1 Outline GPU Programming Track Tuesday 26th GPGPU: General-purpose GPU Programming CUDA Architecture, Threading

More information

Scalability of MB-level Parallelism for H.264 Decoding

Scalability of MB-level Parallelism for H.264 Decoding Scalability of Macroblock-level Parallelism for H.264 Decoding Mauricio Alvarez Mesa 1, Alex Ramírez 1,2, Mateo Valero 1,2, Arnaldo Azevedo 3, Cor Meenderinck 3, Ben Juurlink 3 1 Universitat Politècnica

More information

High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures 46 H. Y. SU, M. WEN, J. REN, N. WU, J. CHAI, C.Y. ZHANG, HIGH-EFFICIENT PARALLEL CAVLC ENCODER High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures Huayou SU, Mei WEN, Ju REN,

More information

Prime Num Generator - Maker Faire 2014

Prime Num Generator - Maker Faire 2014 Prime Num Generator - Maker Faire 2014 Experimenting with math in hardware Stanley Ng, Altera Synopsis The Prime Number Generator ( PNG ) counts from 1 to some number (273 million, on a Cyclone V C5 device)

More information

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015 Optimization of Multi-Channel BCH Error Decoding for Common Cases Russell Dill Master's Thesis Defense April 20, 2015 Bose-Chaudhuri-Hocquenghem (BCH) BCH is an Error Correcting Code (ECC) and is used

More information

Transparent low-overhead checkpoint for GPU-accelerated clusters

Transparent low-overhead checkpoint for GPU-accelerated clusters Transparent low-overhead checkpoint for GPU-accelerated clusters Leonardo BAUTISTA GOMEZ 1,3, Akira NUKADA 1, Naoya MARUYAMA 1, Franck CAPPELLO 3,4, Satoshi MATSUOKA 1,2 1 Tokyo Institute of Technology,

More information

High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities IBM Corporation

High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities IBM Corporation High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities Introduction About Myself What to expect out of this lecture Understand the current trend in the IC Design

More information

Highly Parallel HEVC Decoding for Heterogeneous Systems with CPU and GPU

Highly Parallel HEVC Decoding for Heterogeneous Systems with CPU and GPU 2017. This manuscript version (accecpted manuscript) is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/. Highly Parallel HEVC Decoding for Heterogeneous

More information

Communication Avoiding Successive Band Reduction

Communication Avoiding Successive Band Reduction Communication Avoiding Successive Band Reduction Grey Ballard, James Demmel, Nicholas Knight UC Berkeley PPoPP 12 Research supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by

More information

Understanding Compression Technologies for HD and Megapixel Surveillance

Understanding Compression Technologies for HD and Megapixel Surveillance When the security industry began the transition from using VHS tapes to hard disks for video surveillance storage, the question of how to compress and store video became a top consideration for video surveillance

More information

Research Article Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

Research Article Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation e Scientific World Journal, Article ID 716020, 19 pages http://dx.doi.org/10.1155/2014/716020 Research Article Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation Huayou

More information

Embedded System Design

Embedded System Design Embedded System Design p. 1/2 Embedded System Design Prof. Stephen A. Edwards sedwards@cs.columbia.edu Spring 2007 Spot the Computer Embedded System Design p. 2/2 Embedded System Design p. 3/2 Hidden Computers

More information

Milestone Leverages Intel Processors with Intel Quick Sync Video to Create Breakthrough Capabilities for Video Surveillance and Monitoring

Milestone Leverages Intel Processors with Intel Quick Sync Video to Create Breakthrough Capabilities for Video Surveillance and Monitoring white paper Milestone Leverages Intel Processors with Intel Quick Sync Video to Create Breakthrough Capabilities for Video Surveillance and Monitoring Executive Summary Milestone Systems, the world s leading

More information

Gated Driver Tree Based Power Optimized Multi-Bit Flip-Flops

Gated Driver Tree Based Power Optimized Multi-Bit Flip-Flops International Journal of Emerging Engineering Research and Technology Volume 2, Issue 4, July 2014, PP 250-254 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Gated Driver Tree Based Power Optimized Multi-Bit

More information

Implementation of an MPEG Codec on the Tilera TM 64 Processor

Implementation of an MPEG Codec on the Tilera TM 64 Processor 1 Implementation of an MPEG Codec on the Tilera TM 64 Processor Whitney Flohr Supervisor: Mark Franklin, Ed Richter Department of Electrical and Systems Engineering Washington University in St. Louis Fall

More information

Lossless Compression Algorithms for Direct- Write Lithography Systems

Lossless Compression Algorithms for Direct- Write Lithography Systems Lossless Compression Algorithms for Direct- Write Lithography Systems Hsin-I Liu Video and Image Processing Lab Department of Electrical Engineering and Computer Science University of California at Berkeley

More information

ECEN689: Special Topics in High-Speed Links Circuits and Systems Spring 2011

ECEN689: Special Topics in High-Speed Links Circuits and Systems Spring 2011 ECEN689: Special Topics in High-Speed Links Circuits and Systems Spring 2011 Lecture 9: TX Multiplexer Circuits Sam Palermo Analog & Mixed-Signal Center Texas A&M University Announcements & Agenda Next

More information

HEVC Real-time Decoding

HEVC Real-time Decoding HEVC Real-time Decoding Benjamin Bross a, Mauricio Alvarez-Mesa a,b, Valeri George a, Chi-Ching Chi a,b, Tobias Mayer a, Ben Juurlink b, and Thomas Schierl a a Image Processing Department, Fraunhofer Institute

More information

Outline. 1 Reiteration. 2 Dynamic scheduling - Tomasulo. 3 Superscalar, VLIW. 4 Speculation. 5 ILP limitations. 6 What we have done so far.

Outline. 1 Reiteration. 2 Dynamic scheduling - Tomasulo. 3 Superscalar, VLIW. 4 Speculation. 5 ILP limitations. 6 What we have done so far. Outline 1 Reiteration Lecture 5: EIT090 Computer Architecture 2 Dynamic scheduling - Tomasulo Anders Ardö 3 Superscalar, VLIW EIT Electrical and Information Technology, Lund University Sept. 30, 2009 4

More information

Masters of Science in COMPUTER ENGINEERING

Masters of Science in COMPUTER ENGINEERING PICSEL: Measuring User-Perceived Performance to Control Dynamic Frequency Scaling IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Masters of Science in COMPUTER ENGINEERING By Jack Cosgrove

More information

Processor time 9 Used memory 9. Lost video frames 11 Storage buffer 11 Received rate 11

Processor time 9 Used memory 9. Lost video frames 11 Storage buffer 11 Received rate 11 Processor time 9 Used memory 9 Lost video frames 11 Storage buffer 11 Received rate 11 2 3 After you ve completed the installation and configuration, run AXIS Installation Verifier from the main menu icon

More information

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 29 Minimizing Switched Capacitance-III. (Refer

More information

High Performance Carry Chains for FPGAs

High Performance Carry Chains for FPGAs High Performance Carry Chains for FPGAs Matthew M. Hosler Department of Electrical and Computer Engineering Northwestern University Abstract Carry chains are an important consideration for most computations,

More information

Universal Parallel Computing Research Center The Center for New Music and Audio Technologies University of California, Berkeley

Universal Parallel Computing Research Center The Center for New Music and Audio Technologies University of California, Berkeley Eric Battenberg and David Wessel Universal Parallel Computing Research Center The Center for New Music and Audio Technologies University of California, Berkeley Microsoft Parallel Applications Workshop

More information

MMI: A General Narrow Interface for Memory Devices

MMI: A General Narrow Interface for Memory Devices MMI: A General Narrow Interface for Devices Judy Chen Eric Linstadt Rambus Inc. Session 106 August 12, 2009 August 2009 1 What is MMI? WLAN BT GPS NOR S/M Baseband Processor Apps/Media Processor NAND M

More information

Design Project: Designing a Viterbi Decoder (PART I)

Design Project: Designing a Viterbi Decoder (PART I) Digital Integrated Circuits A Design Perspective 2/e Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolić Chapters 6 and 11 Design Project: Designing a Viterbi Decoder (PART I) 1. Designing a Viterbi

More information

Sharif University of Technology. SoC: Introduction

Sharif University of Technology. SoC: Introduction SoC Design Lecture 1: Introduction Shaahin Hessabi Department of Computer Engineering System-on-Chip System: a set of related parts that act as a whole to achieve a given goal. A system is a set of interacting

More information

ECE 4220 Real Time Embedded Systems Final Project Spectrum Analyzer

ECE 4220 Real Time Embedded Systems Final Project Spectrum Analyzer ECE 4220 Real Time Embedded Systems Final Project Spectrum Analyzer by: Matt Mazzola 12222670 Abstract The design of a spectrum analyzer on an embedded device is presented. The device achieves minimum

More information

Frame Interpolation and Motion Blur for Film Production and Presentation GTC Conference, San Jose

Frame Interpolation and Motion Blur for Film Production and Presentation GTC Conference, San Jose Frame Interpolation and Motion Blur for Film Production and Presentation 2013 GTC Conference, San Jose Keith Slavin, isovideo LLC (slides 20 to 22 by Chad Fogg) 1 What we have today 24 frames/sec is too

More information

Impact of Intermittent Faults on Nanocomputing Devices

Impact of Intermittent Faults on Nanocomputing Devices Impact of Intermittent Faults on Nanocomputing Devices Cristian Constantinescu June 28th, 2007 Dependable Systems and Networks Outline Fault classes Permanent faults Transient faults Intermittent faults

More information

VVD: VCR operations for Video on Demand

VVD: VCR operations for Video on Demand VVD: VCR operations for Video on Demand Ravi T. Rao, Charles B. Owen* Michigan State University, 3 1 1 5 Engineering Building, East Lansing, MI 48823 ABSTRACT Current Video on Demand (VoD) systems do not

More information

Controlling Peak Power During Scan Testing

Controlling Peak Power During Scan Testing Controlling Peak Power During Scan Testing Ranganathan Sankaralingam and Nur A. Touba Computer Engineering Research Center Department of Electrical and Computer Engineering University of Texas, Austin,

More information

GPU Acceleration of a Production Molecular Docking Code

GPU Acceleration of a Production Molecular Docking Code GPU Acceleration of a Production Molecular Docking Code Bharat Sukhwani Martin Herbordt Computer Architecture and Automated Design Laboratory Department of Electrical and Computer Engineering Boston University

More information

Discovery of frequent episodes in event sequences

Discovery of frequent episodes in event sequences Discovery of frequent episodes in event sequences Andres Kauts, Kait Kasak University of Tartu 2009 MTAT.03.249 Combinatorial Data Mining Algorithms What is sequential data mining Sequencial data mining

More information

Performance Driven Reliable Link Design for Network on Chips

Performance Driven Reliable Link Design for Network on Chips Performance Driven Reliable Link Design for Network on Chips Rutuparna Tamhankar Srinivasan Murali Prof. Giovanni De Micheli Stanford University Outline Introduction Objective Logic design and implementation

More information

Fooling the Masses with Performance Results: Old Classics & Some New Ideas

Fooling the Masses with Performance Results: Old Classics & Some New Ideas Fooling the Masses with Performance Results: Old Classics & Some New Ideas Gerhard Wellein (1,2), Georg Hager (2) (1) Department for Computer Science (2) Erlangen Regional Computing Center Friedrich-Alexander-Universität

More information

MULTI-CORE SOFTWARE ARCHITECTURE FOR THE SCALABLE HEVC DECODER. Wassim Hamidouche, Mickael Raulet and Olivier Déforges

MULTI-CORE SOFTWARE ARCHITECTURE FOR THE SCALABLE HEVC DECODER. Wassim Hamidouche, Mickael Raulet and Olivier Déforges 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) MULTI-CORE SOFTWARE ARCHITECTURE FOR THE SCALABLE HEVC DECODER Wassim Hamidouche, Mickael Raulet and Olivier Déforges

More information

Power Efficient Architectures to Accelerate Deep Convolutional Neural Networks for edge computing and IoT

Power Efficient Architectures to Accelerate Deep Convolutional Neural Networks for edge computing and IoT Power Efficient Architectures to Accelerate Deep Convolutional Neural Networks for edge computing and IoT Giuseppe Desoli ST Central Labs STMicroelectronics Artificial Intelligence is Everywhere 2 Analysis,

More information

A parallel HEVC encoder scheme based on Multi-core platform Shu Jun1,2,3,a, Hu Dong1,2,3,b

A parallel HEVC encoder scheme based on Multi-core platform Shu Jun1,2,3,a, Hu Dong1,2,3,b 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) A parallel HEVC encoder scheme based on Multi-core platform Shu Jun1,2,3,a, Hu Dong1,2,3,b 1 Education Ministry

More information

New-Generation Scalable Motion Processing from Mobile to 4K and Beyond

New-Generation Scalable Motion Processing from Mobile to 4K and Beyond Mobile to 4K and Beyond White Paper Today s broadcast video content is being viewed on the widest range of display devices ever known, from small phone screens and legacy SD TV sets to enormous 4K and

More information

Figure 1: Feature Vector Sequence Generator block diagram.

Figure 1: Feature Vector Sequence Generator block diagram. 1 Introduction Figure 1: Feature Vector Sequence Generator block diagram. We propose designing a simple isolated word speech recognition system in Verilog. Our design is naturally divided into two modules.

More information

J. Maillard, J. Silva. Laboratoire de Physique Corpusculaire, College de France. Paris, France

J. Maillard, J. Silva. Laboratoire de Physique Corpusculaire, College de France. Paris, France Track Parallelisation in GEANT Detector Simulations? J. Maillard, J. Silva Laboratoire de Physique Corpusculaire, College de France Paris, France Track parallelisation of GEANT-based detector simulations,

More information

RedEye Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision

RedEye Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision Robert LiKamWa Yunhui Hou Yuan Gao Mia Polansky Lin Zhong roblkw@rice.edu houyh@rice.edu yg18@rice.edu mia.polansky@rice.edu lzhong@rice.edu

More information

NVCP recommended settings for TSW incl GSync 5. Screen Settings in TSW - Graphics settings 6. TSW Settings explained and recommendations 7

NVCP recommended settings for TSW incl GSync 5. Screen Settings in TSW - Graphics settings 6. TSW Settings explained and recommendations 7 Setting Up TSW with a single nvidia card, using nvidia Control Panel (NVCP) PLUS (optional) nvidia Inspector (NVI). Single Standard and GSync Monitor settings. Setting up DSR in TSW This is a guide to

More information

Designing for High Speed-Performance in CPLDs and FPGAs

Designing for High Speed-Performance in CPLDs and FPGAs Designing for High Speed-Performance in CPLDs and FPGAs Zeljko Zilic, Guy Lemieux, Kelvin Loveless, Stephen Brown, and Zvonko Vranesic Department of Electrical and Computer Engineering University of Toronto,

More information

1) New Paths to New Machine Learning Science. 2) How an Unruly Mob Almost Stole. Jeff Howbert University of Washington

1) New Paths to New Machine Learning Science. 2) How an Unruly Mob Almost Stole. Jeff Howbert University of Washington 1) New Paths to New Machine Learning Science 2) How an Unruly Mob Almost Stole the Grand Prize at the Last Moment Jeff Howbert University of Washington February 4, 2014 Netflix Viewing Recommendations

More information

EyeFace SDK v Technical Sheet

EyeFace SDK v Technical Sheet EyeFace SDK v4.5.0 Technical Sheet Copyright 2015, All rights reserved. All attempts have been made to make the information in this document complete and accurate. Eyedea Recognition, Ltd. is not responsible

More information

Long and Fast Up/Down Counters Pushpinder Kaur CHOUHAN 6 th Jan, 2003

Long and Fast Up/Down Counters Pushpinder Kaur CHOUHAN 6 th Jan, 2003 1 Introduction Long and Fast Up/Down Counters Pushpinder Kaur CHOUHAN 6 th Jan, 2003 Circuits for counting both forward and backward events are frequently used in computers and other digital systems. Digital

More information

This paper is a preprint of a paper accepted by Electronics Letters and is subject to Institution of Engineering and Technology Copyright.

This paper is a preprint of a paper accepted by Electronics Letters and is subject to Institution of Engineering and Technology Copyright. This paper is a preprint of a paper accepted by Electronics Letters and is subject to Institution of Engineering and Technology Copyright. The final version is published and available at IET Digital Library

More information

On the Characterization of Distributed Virtual Environment Systems

On the Characterization of Distributed Virtual Environment Systems On the Characterization of Distributed Virtual Environment Systems P. Morillo, J. M. Orduña, M. Fernández and J. Duato Departamento de Informática. Universidad de Valencia. SPAIN DISCA. Universidad Politécnica

More information

Video-on-Demand. Nick Caggiano Walter Phillips

Video-on-Demand. Nick Caggiano Walter Phillips Video-on-Demand Nick Caggiano Walter Phillips Video-on-Demand What is Video-on-Demand? Storage, transmission, and display of archived video files in a networked environment Most popularly used to watch

More information

CPS311 Lecture: Sequential Circuits

CPS311 Lecture: Sequential Circuits CPS311 Lecture: Sequential Circuits Last revised August 4, 2015 Objectives: 1. To introduce asynchronous and synchronous flip-flops (latches and pulsetriggered, plus asynchronous preset/clear) 2. To introduce

More information

Distributed Cluster Processing to Evaluate Interlaced Run-Length Compression Schemes

Distributed Cluster Processing to Evaluate Interlaced Run-Length Compression Schemes Distributed Cluster Processing to Evaluate Interlaced Run-Length Compression Schemes Ankit Arora Sachin Bagga Rajbir Singh Cheema M.Tech (IT) M.Tech (CSE) M.Tech (CSE) Guru Nanak Dev University Asr. Thapar

More information

IEEE Santa Clara ComSoc/CAS Weekend Workshop Event-based analog sensing

IEEE Santa Clara ComSoc/CAS Weekend Workshop Event-based analog sensing IEEE Santa Clara ComSoc/CAS Weekend Workshop Event-based analog sensing Theodore Yu theodore.yu@ti.com Texas Instruments Kilby Labs, Silicon Valley Labs September 29, 2012 1 Living in an analog world The

More information

AN EFFECTIVE CACHE FOR THE ANYWHERE PIXEL ROUTER

AN EFFECTIVE CACHE FOR THE ANYWHERE PIXEL ROUTER University of Kentucky UKnowledge Theses and Dissertations--Electrical and Computer Engineering Electrical and Computer Engineering 2007 AN EFFECTIVE CACHE FOR THE ANYWHERE PIXEL ROUTER Vijai Raghunathan

More information

Multicore Design Considerations

Multicore Design Considerations Multicore Design Considerations Multicore: The Forefront of Computing Technology We re not going to have faster processors. Instead, making software run faster in the future will mean using parallel programming

More information

Optimizing the Startup Time of Embedded Systems: A Case Study of Digital TV

Optimizing the Startup Time of Embedded Systems: A Case Study of Digital TV 2242 IEEE Transactions on Consumer Electronics, Vol. 55, No. 4, NOVEMBER 2009 Optimizing the Startup Time of Embedded Systems: A Case Study of Digital TV Heeseung Jo, Hwanju Kim, Jinkyu Jeong, Joonwon

More information

SPATIAL LIGHT MODULATORS

SPATIAL LIGHT MODULATORS SPATIAL LIGHT MODULATORS Reflective XY Series Phase and Amplitude 512x512 A spatial light modulator (SLM) is an electrically programmable device that modulates light according to a fixed spatial (pixel)

More information

Data Converters and DSPs Getting Closer to Sensors

Data Converters and DSPs Getting Closer to Sensors Data Converters and DSPs Getting Closer to Sensors As the data converters used in military applications must operate faster and at greater resolution, the digital domain is moving closer to the antenna/sensor

More information

REAL-TIME H.264 ENCODING BY THREAD-LEVEL PARALLELISM: GAINS AND PITFALLS

REAL-TIME H.264 ENCODING BY THREAD-LEVEL PARALLELISM: GAINS AND PITFALLS REAL-TIME H.264 ENCODING BY THREAD-LEVEL ARALLELISM: GAINS AND ITFALLS Guy Amit and Adi inhas Corporate Technology Group, Intel Corp 94 Em Hamoshavot Rd, etah Tikva 49527, O Box 10097 Israel {guy.amit,

More information

8 DIGITAL SIGNAL PROCESSOR IN OPTICAL TOMOGRAPHY SYSTEM

8 DIGITAL SIGNAL PROCESSOR IN OPTICAL TOMOGRAPHY SYSTEM Recent Development in Instrumentation System 99 8 DIGITAL SIGNAL PROCESSOR IN OPTICAL TOMOGRAPHY SYSTEM Siti Zarina Mohd Muji Ruzairi Abdul Rahim Chiam Kok Thiam 8.1 INTRODUCTION Optical tomography involves

More information

Optical clock distribution for a more efficient use of DRAMs

Optical clock distribution for a more efficient use of DRAMs Optical clock distribution for a more efficient use of DRAMs D. Litaize, M.P.Y. Desmulliez*, J. Collet**, P. Foulk* Institut de Recherche en Informatique de Toulouse (IRIT), Universite Paul Sabatier, 31062

More information

Amon: Advanced Mesh-Like Optical NoC

Amon: Advanced Mesh-Like Optical NoC Amon: Advanced Mesh-Like Optical NoC Sebastian Werner, Javier Navaridas and Mikel Luján Advanced Processor Technologies Group School of Computer Science The University of Manchester Bottleneck: On-chip

More information

Using Embedded Dynamic Random Access Memory to Reduce Energy Consumption of Magnetic Recording Read Channel

Using Embedded Dynamic Random Access Memory to Reduce Energy Consumption of Magnetic Recording Read Channel IEEE TRANSACTIONS ON MAGNETICS, VOL. 46, NO. 1, JANUARY 2010 87 Using Embedded Dynamic Random Access Memory to Reduce Energy Consumption of Magnetic Recording Read Channel Ningde Xie 1, Tong Zhang 1, and

More information

VLSI Digital Signal Processing

VLSI Digital Signal Processing VLSI Digital Signal Processing EEC 28 Lecture Bevan M. Baas Tuesday, January 8, 29 Today Administrative items Syllabus and course overview My background Digital signal processing overview Read Programmable

More information

Discovering Sequential Association Rules with Constraints and Time Lags in Multiple Sequences

Discovering Sequential Association Rules with Constraints and Time Lags in Multiple Sequences Discovering Sequential Association Rules with Constraints and Time Lags in Multiple Sequences Sherri K. Harms, 1 Jitender Deogun, 2 Tsegaye Tadesse 3 1 Department of Computer Science and Information Systems

More information

Hybrid Discrete-Continuous Computer Architectures for Post-Moore s-law Era

Hybrid Discrete-Continuous Computer Architectures for Post-Moore s-law Era Hybrid Discrete-Continuous Computer Architectures for Post-Moore s-law Era Keynote at the Bi annual HiPEAC Compu6ng Systems Week Mee6ng Barcelona, Spain October 19 th 2010 Prof. Simha Sethumadhavan Columbia

More information

MindMouse. This project is written in C++ and uses the following Libraries: LibSvm, kissfft, BOOST File System, and Emotiv Research Edition SDK.

MindMouse. This project is written in C++ and uses the following Libraries: LibSvm, kissfft, BOOST File System, and Emotiv Research Edition SDK. Andrew Robbins MindMouse Project Description: MindMouse is an application that interfaces the user s mind with the computer s mouse functionality. The hardware that is required for MindMouse is the Emotiv

More information

Adaptive Key Frame Selection for Efficient Video Coding

Adaptive Key Frame Selection for Efficient Video Coding Adaptive Key Frame Selection for Efficient Video Coding Jaebum Jun, Sunyoung Lee, Zanming He, Myungjung Lee, and Euee S. Jang Digital Media Lab., Hanyang University 17 Haengdang-dong, Seongdong-gu, Seoul,

More information

FPGA Based Implementation of Convolutional Encoder- Viterbi Decoder Using Multiple Booting Technique

FPGA Based Implementation of Convolutional Encoder- Viterbi Decoder Using Multiple Booting Technique FPGA Based Implementation of Convolutional Encoder- Viterbi Decoder Using Multiple Booting Technique Dr. Dhafir A. Alneema (1) Yahya Taher Qassim (2) Lecturer Assistant Lecturer Computer Engineering Dept.

More information

Automatic optimization of image capture on mobile devices by human and non-human agents

Automatic optimization of image capture on mobile devices by human and non-human agents Automatic optimization of image capture on mobile devices by human and non-human agents 1.1 Abstract Sophie Lebrecht, Mark Desnoyer, Nick Dufour, Zhihao Li, Nicole A. Halmi, David L. Sheinberg, Michael

More information

An Improved Hardware Implementation of the Grain-128a Stream Cipher

An Improved Hardware Implementation of the Grain-128a Stream Cipher An Improved Hardware Implementation of the Grain-128a Stream Cipher Shohreh Sharif Mansouri and Elena Dubrova Department of Electronic Systems Royal Institute of Technology (KTH), Stockholm Email:{shsm,dubrova}@kth.se

More information

Upgrading a FIR Compiler v3.1.x Design to v3.2.x

Upgrading a FIR Compiler v3.1.x Design to v3.2.x Upgrading a FIR Compiler v3.1.x Design to v3.2.x May 2005, ver. 1.0 Application Note 387 Introduction This application note is intended for designers who have an FPGA design that uses the Altera FIR Compiler

More information

On the Rules of Low-Power Design

On the Rules of Low-Power Design On the Rules of Low-Power Design (and How to Break Them) Prof. Todd Austin Advanced Computer Architecture Lab University of Michigan austin@umich.edu Once upon a time 1 Rules of Low-Power Design P = acv

More information

A video signal processor for motioncompensated field-rate upconversion in consumer television

A video signal processor for motioncompensated field-rate upconversion in consumer television A video signal processor for motioncompensated field-rate upconversion in consumer television B. De Loore, P. Lippens, P. Eeckhout, H. Huijgen, A. Löning, B. McSweeney, M. Verstraelen, B. Pham, G. de Haan,

More information

EE5780 Advanced VLSI CAD

EE5780 Advanced VLSI CAD EE5780 Advanced VLSI CAD Lecture 11 SRAM and Yield Analysis Zhuo Feng 11.1 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports Outline Serial Access Memories 11.2 Memory

More information

DC Ultra. Concurrent Timing, Area, Power and Test Optimization. Overview

DC Ultra. Concurrent Timing, Area, Power and Test Optimization. Overview DATASHEET DC Ultra Concurrent Timing, Area, Power and Test Optimization DC Ultra RTL synthesis solution enables users to meet today s design challenges with concurrent optimization of timing, area, power

More information

ADVANCES in semiconductor technology are contributing

ADVANCES in semiconductor technology are contributing 292 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 3, MARCH 2006 Test Infrastructure Design for Mixed-Signal SOCs With Wrapped Analog Cores Anuja Sehgal, Student Member,

More information

Chapter 3 Unit Combinational

Chapter 3 Unit Combinational EE 200: Digital Logic Circuit Design Dr Radwan E Abdel-Aal, COE Logic and Computer Design Fundamentals Chapter 3 Unit Combinational 5 Registers Logic and Design Counters Part Implementation Technology

More information

140 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 2, FEBRUARY 2004

140 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 2, FEBRUARY 2004 140 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 2, FEBRUARY 2004 Leakage Current Reduction in CMOS VLSI Circuits by Input Vector Control Afshin Abdollahi, Farzan Fallah,

More information

Performance and Energy Consumption Analysis of the X265 Video Encoder

Performance and Energy Consumption Analysis of the X265 Video Encoder Performance and Energy Consumption Analysis of the X265 Video Encoder Dieison Silveira 1,3, Marcelo Porto 2 and Sergio Bampi 1 1 Federal University of Rio Grande do Sul - INF-UFRGS - Graduate Program in

More information

nmos transistor Basics of VLSI Design and Test Solution: CMOS pmos transistor CMOS Inverter First-Order DC Analysis CMOS Inverter: Transient Response

nmos transistor Basics of VLSI Design and Test Solution: CMOS pmos transistor CMOS Inverter First-Order DC Analysis CMOS Inverter: Transient Response nmos transistor asics of VLSI Design and Test If the gate is high, the switch is on If the gate is low, the switch is off Mohammad Tehranipoor Drain ECE495/695: Introduction to Hardware Security & Trust

More information

Achieving Faster Time to Tapeout with In-Design, Signoff-Quality Metal Fill

Achieving Faster Time to Tapeout with In-Design, Signoff-Quality Metal Fill White Paper Achieving Faster Time to Tapeout with In-Design, Signoff-Quality Metal Fill May 2009 Author David Pemberton- Smith Implementation Group, Synopsys, Inc. Executive Summary Many semiconductor

More information

3/5/2017. A Register Stores a Set of Bits. ECE 120: Introduction to Computing. Add an Input to Control Changing a Register s Bits

3/5/2017. A Register Stores a Set of Bits. ECE 120: Introduction to Computing. Add an Input to Control Changing a Register s Bits University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering ECE 120: Introduction to Computing Registers A Register Stores a Set of Bits Most of our representations use sets

More information

Spatial Light Modulators XY Series

Spatial Light Modulators XY Series Spatial Light Modulators XY Series Phase and Amplitude 512x512 and 256x256 A spatial light modulator (SLM) is an electrically programmable device that modulates light according to a fixed spatial (pixel)

More information

Data Representation. signals can vary continuously across an infinite range of values e.g., frequencies on an old-fashioned radio with a dial

Data Representation. signals can vary continuously across an infinite range of values e.g., frequencies on an old-fashioned radio with a dial Data Representation 1 Analog vs. Digital there are two ways data can be stored electronically 1. analog signals represent data in a way that is analogous to real life signals can vary continuously across

More information

Compressed-Sensing-Enabled Video Streaming for Wireless Multimedia Sensor Networks Abstract:

Compressed-Sensing-Enabled Video Streaming for Wireless Multimedia Sensor Networks Abstract: Compressed-Sensing-Enabled Video Streaming for Wireless Multimedia Sensor Networks Abstract: This article1 presents the design of a networked system for joint compression, rate control and error correction

More information

Hardware Implementation for the HEVC Fractional Motion Estimation Targeting Real-Time and Low-Energy

Hardware Implementation for the HEVC Fractional Motion Estimation Targeting Real-Time and Low-Energy Hardware Implementation for the HEVC Fractional Motion Estimation Targeting Real-Time and Low-Energy Vladimir Afonso 1-2, Henrique Maich 1, Luan Audibert 1, Bruno Zatt 1, Marcelo Porto 1, Luciano Agostini

More information

Reconfigurable Neural Net Chip with 32K Connections

Reconfigurable Neural Net Chip with 32K Connections Reconfigurable Neural Net Chip with 32K Connections H.P. Graf, R. Janow, D. Henderson, and R. Lee AT&T Bell Laboratories, Room 4G320, Holmdel, NJ 07733 Abstract We describe a CMOS neural net chip with

More information

Application of A Disk Migration Module in Virtual Machine live Migration

Application of A Disk Migration Module in Virtual Machine live Migration 2010 3rd International Conference on Computer and Electrical Engineering (ICCEE 2010) IPCSIT vol. 53 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V53.No.2.61 Application of A Disk Migration

More information

Efficient implementation of a spectrum scanner on a software-defined radio platform

Efficient implementation of a spectrum scanner on a software-defined radio platform Efficient implementation of a spectrum scanner on a software-defined radio platform François Quitin, Riccardo Pace Université libre de Bruxelles (ULB), Belgium 1 Context and objectives Regulators need

More information

WiBench: An Open Source Kernel Suite for Benchmarking Wireless Systems

WiBench: An Open Source Kernel Suite for Benchmarking Wireless Systems 1 WiBench: An Open Source Kernel Suite for Benchmarking Wireless Systems Qi Zheng*, Yajing Chen*, Ronald Dreslinski*, Chaitali Chakrabarti +, Achilleas Anastasopoulos*, Scott Mahlke*, Trevor Mudge* *,

More information

1ms Column Parallel Vision System and It's Application of High Speed Target Tracking

1ms Column Parallel Vision System and It's Application of High Speed Target Tracking Proceedings of the 2(X)0 IEEE International Conference on Robotics & Automation San Francisco, CA April 2000 1ms Column Parallel Vision System and It's Application of High Speed Target Tracking Y. Nakabo,

More information

Build Applications Tailored for Remote Signal Monitoring with the Signal Hound BB60C

Build Applications Tailored for Remote Signal Monitoring with the Signal Hound BB60C Application Note Build Applications Tailored for Remote Signal Monitoring with the Signal Hound BB60C By Justin Crooks and Bruce Devine, Signal Hound July 21, 2015 Introduction The Signal Hound BB60C Spectrum

More information

Hardware Implementation of Viterbi Decoder for Wireless Applications

Hardware Implementation of Viterbi Decoder for Wireless Applications Hardware Implementation of Viterbi Decoder for Wireless Applications Bhupendra Singh 1, Sanjeev Agarwal 2 and Tarun Varma 3 Deptt. of Electronics and Communication Engineering, 1 Amity School of Engineering

More information

OddCI: On-Demand Distributed Computing Infrastructure

OddCI: On-Demand Distributed Computing Infrastructure OddCI: On-Demand Distributed Computing Infrastructure Rostand Costa Francisco Brasileiro Guido Lemos Filho Dênio Mariz Sousa MTAGS 2nd Workshop on Many-Task Computing on Grids and Supercomputers Co-located

More information

Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques

Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques Performance Evolution of 16 Bit Processor in FPGA using State Encoding Techniques Madhavi Anupoju 1, M. Sunil Prakash 2 1 M.Tech (VLSI) Student, Department of Electronics & Communication Engineering, MVGR

More information

Doubletalk Detection

Doubletalk Detection ELEN-E4810 Digital Signal Processing Fall 2004 Doubletalk Detection Adam Dolin David Klaver Abstract: When processing a particular voice signal it is often assumed that the signal contains only one speaker,

More information

FPGA Digital Signal Processing. Derek Kozel July 15, 2017

FPGA Digital Signal Processing. Derek Kozel July 15, 2017 FPGA Digital Signal Processing Derek Kozel July 15, 2017 table of contents 1. Field Programmable Gate Arrays (FPGAs) 2. FPGA Programming Options 3. Common DSP Elements 4. RF Network on Chip 5. Applications

More information

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH GHEVC: An Efficient HEVC Decoder for Graphics Processing Units

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH GHEVC: An Efficient HEVC Decoder for Graphics Processing Units IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH 2017 459 GHEVC: An Efficient HEVC Decoder for Graphics Processing Units Diego F. de Souza, Student Member, IEEE, Aleksandar Ilic, Member, IEEE, Nuno

More information