EIE: Efficient Inference Engine on Compressed Deep Neural Network

Size: px
Start display at page:

Download "EIE: Efficient Inference Engine on Compressed Deep Neural Network"

Transcription

1 EIE: Efficient Inference Engine on Compressed Deep Neural Network Song Han*, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, Bill Dally Stanford University June 20, 2016

2 Deep Learning on Mobile Phones Drones Robots Glasses Self Driving Cars Battery Constrained! 2

3 Deep Learning on Mobile: Difficulty? Model Size! Accurate Prediction => Large Model => More Memory Reference => High Power Operation Energy [pj] Relative Cost 32 bit int ADD bit float ADD bit Register File bit int MULT bit float MULT bit SRAM Cache bit DRAM Memory Relative Energy Cost = 100 3

4 Our Past Work: Deep Compression Problem 1: DNN Model Size too Large Solution 1: Deep Compression Smaller Size 90% zeros in weights 4-bit weight Accuracy No loss of accuracy / Improved accuracy On-chip State-of-the-art DNN fit on-chip SRAM 8

5 Our Past Work: Deep Compression Network Pruning [1]: 10x fewer weights 60M weights 6M weights Weight Sharing [2]: only 4-bits per remaining weight 32 bit 4 bit [1]. Han et al. NIPS 2015 [2]. Han et al. ICLR 2016, best paper award 10

6 Deep Compression Results Network Original Size Compressed Size Compression Ratio Original Accuracy Compressed Accuracy AlexNet 240MB 6.9MB 35x 80.27% 80.30% VGGNet 550MB 11.3MB 49x 88.68% 89.09% GoogleNet 28MB 2.8MB 10x 88.90% 88.92% SqueezeNet 4.8MB 0.47MB 10x 80.32% 80.35% No loss of accuracy on ImageNet dataset. Weights fits on-chip SRAM, taking 120x less energy than DRAM. 11

7 EIE: First Accelerator for Compressed Sparse Neural Network SpMat Act_0 Act_1 Ptr_Even Arithm Ptr_Odd SpMat Problem 2: Irregular Computation Pattern Solution 2: EIE accelerator Sparse Matrix 90% static sparsity in the weights, 10x less computation, 5x less memory footprint Sparse Vector 70% dynamic sparsity in the activation 3x less computation Weight Sharing 4bits weights 8x less memory footprint Fully fits in SRAM 120x less energy than DRAM Savings are multiplicative: 5x3x8x120=14,400 theoretical energy improvement.

8 Distributed Storage and Processing logically ~a 0 a 1 0 a PE0 w 0,0 w 0,1 0 w 0,3 PE1 0 0 w 1,2 0 PE2 0 w 2,1 0 w 2,3 PE w 4,2 w 4,3 w 5, B w 6,3 A 0 w 7,1 0 0 = 0 b 0 b 1 b 2 b 3 b 4 b 5 b 6 b 7 1 C A ReLU ) 0 ~ b b 0 b 1 0 b 3 0 b 5 b C A physically Virtual Weight W 0,0 W 0,1 W 4,2 W 0,3 W 4,3 Relative Index Column Pointer

9 PE Architecture PE PE PE PE PE PE PE PE Central Control Act Value Act Index Act Queue Act Index Act Value Encoded Weight Act SRAM Leading NZero Detect PE PE PE PE PE PE PE PE Even Ptr SRAM Bank Odd Ptr SRAM Bank Pointer Read Col Start/ End Addr Sparse Matrix SRAM Sparse Matrix Access Regs Relative Index Weight Decoder Address Accum Absolute Address Arithmetic Unit Bypass Dest Act Regs Src Act Regs Act R/W ReLU SRAM Regs Comb

10 Benchmark CPU: Intel Core-i7 5930k GPU: NVIDIA TitanX Mobile GPU: NVIDIA Jetson TK1 Layer Size Weight Density Activation Density FLOP % AlexNet % 35.1% 3% AlexNet % 35.3% 3% AlexNet % 37.5% 10% VGG % 18.3% 1% VGG % 37.5% 2% VGG % 41.1% 9% Description AlexNet for image classification VGG-16 for image classification NeuralTalk-We % 100% 10% RNN and LSTM for NeuralTalk-Wd % 100% 11% image NeuralTalk-LSTM % 100% 11% caption 38

11 Scalability Speedup PE 2PEs 4PEs 8PEs 16PEs 32PEs 64PEs 128PEs 256PEs Alex-6 Alex-7 Alex-8 VGG-6 VGG-7 VGG-8 NT-We NT-Wd NT-LSTM Figure 11. System scalability. It measures the speedups with different numbers of PEs. The speedup is near-linear. #PEs ~ Speedup 64PEs: 64x 128PEs: 124x 256PEs: 210x 39

12 Load Balancing Load Balance 100% 80% 60% 40% 20% 0% FIFO=1 FIFO=2 FIFO=4 FIFO=8 FIFO=16 FIFO=32 FIFO=64 FIFO=128 FIFO=256 Alex-6 Alex-7 Alex-8 VGG-6 VGG-7 VGG-8 NT-We NT-Wd NT-LSTM 8. Load efficiency improves as FIFO size increases. When FIFO deepth 8, the marginal gain quickly diminishes. So we choose FIFO de Imbalanced non-zeros among PEs degrades system utilization. This load imbalance could be solved by FIFO. With FIFO depth=16, ALU utilization is > 90%. 40

13 Result of EIE Technology 45 nm # PEs 64 on-chip SRAM Max Model Size Static Sparsity Dynamic Sparsity Quantization ALU Width Area MxV Throughput Power 8 MB 84 Million 10x 3x 4-bit 16-bit 40.8 mm^2 81,967 layers/s 586 mw 1. Post layout result 2. Throughput measured on AlexNet FC-7 41

14 Energy Breakdown memory register clock network combinational Act_queue SpmatRead ActRW PtrRead ArithmUnit 11% 9% 13% 12% 1% 20% 20% 59% 54% 42

15 Prediction Accuracy 90% Multiply Energy (pj) Prediction Accuracy 4.0 Accuracy 68% 45% 23% 0% Figure b Float 32b Int 16b Int 8b Int Arithmetic Precision Prediction accuracy and multiplier energy with different Mixed Precision: 4 bit index (virtual weight) 16 bit real weight, 16 bit fixed point ALU Mul Energy (pj) 43

16 FC Layer: Speedup on EIE Speedup 1000x 100x 10x 1x 0.1x igure 6. CPU Dense (Baseline) CPU Compressed GPU Dense GPU Compressed mgpu Dense mgpu Compressed EIE 1018x 507x 618x 248x 210x 94x 135x 189x 115x 92x 98x 56x 63x 60x 25x 24x 34x 33x 48x 21x 22x 25x 14x 14x 16x 15x 15x 9x 8x 10x 9x 10x 9x 9x 5x 5x 2x 3x 2x 3x 3x 2x 3x 3x 1x 1x 1x 1x 2x 1.1x 1x 1x 1.0x 1x 1.0x 1x 1x 1x 1x 1x 1x 1x 1x 0.6x 0.5x 0.3x 0.5x 0.5x 0.5x 0.6x Alex-6 Alex-7 Alex-8 VGG-6 VGG-7 VGG-8 NT-We NT-Wd NT-LSTM Geo Mean Speedups of GPU, mobile GPU and EIE compared with CPU running uncompressed DNN model. There is no batching in all cases. Energy Efficiency x Compared to CPU 61,533xand GPU: 34,522x 14,826x 10000x re 7. CPU Dense (Baseline) CPU Compressed GPU Dense GPU Compressed mgpu Dense mgpu Compressed EIE 189x and 13x faster 119,797x 76,784x 11,828x 9,485x 10,904x 8,053x 1000x 78x 101x 102x 100x 59x 61x 26x 37x 37x 39x 17x 20x 9x12x 18x 25x 25x 36x 5x 7x 7x 10x 10x 10x 14x 15x 23x 14x 20x 8x 5x 6x 6x 6x 6x 3x 4x 5x 6x 7x 10x 2x 1x 10x 15x Baseline: 13x 14x 1x 1x 7x 1x 1x 1x 5x 1x 8x 1x 7x 1x 7x 1x 9x 1x Intel Core i7 5930K: MKL CBLAS GEMV, MKL SPBLAS CSRMV NVIDIA GeForce GTX Titan X: cublas GEMV, cusparse CSRMV Alex-6 Alex-7 Alex-8 VGG-6 VGG-7 VGG-8 NT-We NT-Wd NT-LSTM Geo Mean Energy efficiency of GPU, mobile GPU and EIE compared with CPU running uncompressed DNN model. There is no batching in all ca NVIDIA Tegra K1: cublas GEMV, cusparse CSRMV er. We placed and routed the PE using the Synopsys IC piler (ICC). We used Cacti [25] to get SRAM area and 44 gy numbers. We annotated the toggle rate from the RTL Table III BENCHMARK FROM STATE-OF-THE-ART DNN MODELS Layer Size Weight% Act% FLOP% Description 9216, 24,207

17 FC Layer: Energy Efficiency on EIE Energy Efficiency re x 10000x 1000x 100x 10x 1x CPU Dense (Baseline) CPU Compressed GPU Dense GPU Compressed mgpu Dense mgpu Compressed EIE 34,522x 61,533x 14,826x 119,797x 76,784x 11,828x 9,485x 10,904x 8,053x 78x 101x 102x 59x 61x 26x 37x 37x 39x 17x 20x 9x12x 18x 25x 25x 36x 5x 7x 7x 10x 10x 10x 14x 15x 23x 14x 20x 8x 5x 6x 6x 6x 6x 3x 4x 5x 6x 7x 2x 1x 10x 15x 13x 14x 1x 1x 7x 1x 1x 1x 5x 1x 8x 1x 7x 1x 7x 1x 9x Alex-6 Alex-7 Alex-8 VGG-6 VGG-7 VGG-8 NT-We NT-Wd NT-LSTM Geo Mean Energy efficiency of GPU, mobile GPU and EIE compared with CPU running uncompressed DNN model. There is no batching in all cas Table III BENCHMARK FROM STATE-OF-THE-ART DNN MODELS er. We placed and routed the PE using the Synopsys IC Compared to CPU and GPU: piler (ICC). We used Cacti [25] to get SRAM area and gy numbers. 24,000x We annotated and 3,400x the toggle more rate energy from the RTL efficient 9216, Alex-6 lation to the gate-level netlist, which was dumped to , ching activity interchange format (SAIF), and estimated Alex-7 ower Baseline: 4096 using Prime-Time PX. 4096, Alex-8 omparison Intel Baseline. Core i7 We 5930K: compare reported EIE withby three pcm-power dift off-the-shelf NVIDIA computing GeForce units: GTX CPU, Titan GPU X: and reported mobile by nvidia-smi VGG-6 utility , 4096 utility. 4096, NVIDIA Tegra K1: measured with power-meter, VGG-7 60% AP+DRAM power CPU. We use Intel Core i k CPU, a Haswell-E , processor, that has been used in NVIDIA Digits Deep VGG ning Dev Box as a CPU baseline. To run the benchmark Layer Size Weight% Act% FLOP% Description NT-We 4096, 9% 35.1% 3% 9% 35.3% 3% 25% 37.5% 10% 24,207x Compressed AlexNet [1] fo large scale ima classification VGG-16 [3] fo classification a 4% 18.3% 1% Compressed 4% 37.5% 2% large scale ima 23% 41.1% 9% object detectio 10% 100% 10% Compressed

18 Comparison: Throughput MxV Throughput (Layers/s) Core-i7 5930k (22nm CPU) TitanX (28nm GPU) Tegra K1 A-Eye Da-DianNao (28nm mgpu) (28nm FPGA) (28nm ASIC True-North (28nm ASIC) EIE (45nm ASIC) 64PEs EIE (28nm ASIC) 256PEs 46

19 Comparison: Area Efficiency Area Efficiency (Layers/s/mm^2) Core-i7 5930k (22nm CPU) TitanX (28nm GPU) Tegra K1 A-Eye Da-DianNao (28nm mgpu) (28nm FPGA) (28nm ASIC True-North (28nm ASIC) EIE (45nm ASIC) 64PEs EIE (28nm ASIC) 256PEs 47

20 Comparison: Energy Efficiency Energy Efficiency (Layers/J) Core-i7 5930k (22nm CPU) TitanX (28nm GPU) Tegra K1 A-Eye Da-DianNao (28nm mgpu) (28nm FPGA) (28nm ASIC True-North (28nm ASIC) EIE (45nm ASIC) 64PEs EIE (28nm ASIC) 256PEs 48

21 Where are the savings from? Four factors for energy saving: 10 static weight sparsity; less work to do; less bricks to carry. 3 dynamic activation sparsity; carry only good bricks; ignore broken bricks. Weight sharing with only 4-bits per weight; lighter bricks to carry. DRAM => SRAM, no need to go off-chip; carry bricks from San Francisco to Seoul => Incheon to Seoul. 49

22 Conclusion EIE: first accelerator for compressed, sparse neural network. Compression => Acceleration, no loss accuracy. Distributed storage/computation to parallelize/load balance across PEs. 13x faster and 3,400x more energy efficient than GPU. 2.9x faster and 19x more energy efficient than past ASIC. 50

23 Beyond EIE: a Multi-Dimension Sparse Recipe for Deep Learning Faster Speed: EIE accelerator sparsity Smaller Size: Deep Compression, SqueezeNet++ Higher Accuracy: DSD regularization [1]. Han et al. Learning both Weights and Connections for Efficient Neural Networks, NIPS 2015 [2]. Han et al. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, Deep Learning Symposium 2015, ICLR 2016 (best paper award) [3]. Han et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, ISCA 2016 [4]. Han et al. DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow, arxiv 2016 [5]. Iandola, Han,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, arxiv 16 [6]. Yao, Han, et.al, Hardware-friendly convolutional neural network with even-number filter size, ICLR workshop

24 Backup Slides 52

25 Sparsity: Pruning AlexNet & VGGNet CONV: 3x FC: 10x Han et al. Learning both Weights and Connections for Efficient Neural Networks, NIPS 2015

26 Retrain to Fully Recover Accuracy Accuracy Loss L2 regularization w/o retrain L1 regularization w/o retrain L1 regularization w/ retrain L2 regularization w/ retrain L2 regularization w/ iterative prune and retrain 0.5% 0.0% -0.5% -1.0% -1.5% -2.0% -2.5% -3.0% -3.5% -4.0% -4.5% 40% 50% 60% 70% 80% 90% 100% Parametes Pruned Away Han et al. Learning both Weights and Connections for Efficient Neural Networks, NIPS 2015

27 Weight Sharing: Accuracy with # Bits #CONV bits / #FC bits Top-1 Error Top-5 Error Top-1 Error Top-5 Error Increase Increase 32bits / 32bits 42.78% 19.73% bits / 5 bits 42.78% 19.70% 0.00% -0.03% 8 bits / 4 bits 42.79% 19.73% 0.01% 0.00% 4 bits / 2 bits 44.77% 22.33% 1.99% 2.60% Han et al. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding ICLR

28 Deep Compression Result on Major Convnets Network Top-1 Error Top-5 Error Parameters Compress Rate LeNet Ref 1.64% KB LeNet Compressed 1.58% - 27 KB 40 LeNet-5 Ref 0.80% KB LeNet-5 Compressed 0.74% - 44 KB 39 AlexNet Ref 42.78% 19.73% 240 MB AlexNet Compressed 42.78% 19.70% 6.9 MB 35 VGG-16 Ref 31.50% 11.32% 552 MB VGG-16 Compressed 31.17% 10.91% 11.3 MB 49 SqueezeNet Ref 42.5% 19.7% 4.8 MB SqueezeNet Compressed 42.5% 19.7% 0.47MB 10 GoogLeNet Ref 31.30% 11.10% 28 MB GoogLeNet Compressed 31.26% 11.08% 2.8 MB 10 SqueezeNet and GoogleNet: just Pruning and Quantization gives 10x compression. Inception Model is really efficient for classification. But it can still achieve an order of magnitude smaller with Deep Compression. Fits in SRAM cache. Han et al. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding ICLR

29 Pruning NeuralTalk and LSTM Original: a basketball player in a white uniform is playing with a ball Pruned 90%: a basketball player in a white uniform is playing with a basketball Original : a brown dog is running through a grassy field Pruned 90%: a brown dog is running through a grassy area Original : a man is riding a surfboard on a wave Pruned 90%: a man in a wetsuit is riding a wave on a beach Original : a soccer player in red is running in the field Pruned 95%: a man in a red shirt and black and white black shirt is running through a field Han et al. Learning both Weights and Connections for Efficient Neural Networks, NIPS 2015 poster

30 With Sparsity Constraint, DSD Training Improves Accuracy (Baseline: NeuralTalk) Baseline: a boy is swimming in a pool. Baseline: a group of people are Sparse: a small black dog is jumping standing in front of a building. into a pool. Sparse: a group of people are standing DSD: a black and white dog is swimming in front of a building. in a pool. DSD: a group of people are walking in a park. Baseline: two girls in bathing suits are playing in the water. Sparse: two children are playing in the sand. DSD: two children are playing in the sand. Baseline: a man in a red shirt and jeans is riding a bicycle down a street. Sparse: a man in a red shirt and a woman in a wheelchair. DSD: a man and a woman are riding on a street. Baseline: a group of people sit on a bench in front of a building. Sparse: a group of people are standing in front of a building. DSD: a group of people are standing in a fountain. Baseline: a man in a black jacket and a black jacket is smiling. Sparse: a man and a woman are standing in front of a mountain. DSD: a man in a black jacket is standing next to a man in a black shirt. Baseline: a group of football players in red uniforms. Sparse: a group of football players in a field. DSD: a group of football players in red and white uniforms. Han et al. DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow, arxiv 2016 Baseline: a dog runs through the grass. Sparse: a dog runs through the grass. DSD: a white and brown dog is running through the grass.

RedEye Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision

RedEye Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision Robert LiKamWa Yunhui Hou Yuan Gao Mia Polansky Lin Zhong roblkw@rice.edu houyh@rice.edu yg18@rice.edu mia.polansky@rice.edu lzhong@rice.edu

More information

TODAY computer vision technologies are used with great

TODAY computer vision technologies are used with great ARXIV PREPRINT 1 Origami: A 803 GOp/s/W Convolutional Network Accelerator Lukas Cavigelli, Student Member, IEEE, and Luca Benini, Fellow, IEEE arxiv:1512.04295v2 [cs.cv] 19 Jan 2016 Abstract An ever increasing

More information

Generalized Pattern Matching Micro-Engine

Generalized Pattern Matching Micro-Engine Generalized Pattern Matching Micro-Engine Yuanwei Fang*, Raihan Rasool, Dilip Vasudevan*, Andrew A. Chien* University of Chicago * Argonne National Laboratory King Faisal University Big Data Applications

More information

Lossless Compression Algorithms for Direct- Write Lithography Systems

Lossless Compression Algorithms for Direct- Write Lithography Systems Lossless Compression Algorithms for Direct- Write Lithography Systems Hsin-I Liu Video and Image Processing Lab Department of Electrical Engineering and Computer Science University of California at Berkeley

More information

Power Efficient Architectures to Accelerate Deep Convolutional Neural Networks for edge computing and IoT

Power Efficient Architectures to Accelerate Deep Convolutional Neural Networks for edge computing and IoT Power Efficient Architectures to Accelerate Deep Convolutional Neural Networks for edge computing and IoT Giuseppe Desoli ST Central Labs STMicroelectronics Artificial Intelligence is Everywhere 2 Analysis,

More information

Discriminative and Generative Models for Image-Language Understanding. Svetlana Lazebnik

Discriminative and Generative Models for Image-Language Understanding. Svetlana Lazebnik Discriminative and Generative Models for Image-Language Understanding Svetlana Lazebnik Image-language understanding Robot, take the pan off the stove! Discriminative image-language tasks Image-sentence

More information

Implementation of an MPEG Codec on the Tilera TM 64 Processor

Implementation of an MPEG Codec on the Tilera TM 64 Processor 1 Implementation of an MPEG Codec on the Tilera TM 64 Processor Whitney Flohr Supervisor: Mark Franklin, Ed Richter Department of Electrical and Systems Engineering Washington University in St. Louis Fall

More information

Scalability of MB-level Parallelism for H.264 Decoding

Scalability of MB-level Parallelism for H.264 Decoding Scalability of Macroblock-level Parallelism for H.264 Decoding Mauricio Alvarez Mesa 1, Alex Ramírez 1,2, Mateo Valero 1,2, Arnaldo Azevedo 3, Cor Meenderinck 3, Ben Juurlink 3 1 Universitat Politècnica

More information

A Low-Power 0.7-V H p Video Decoder

A Low-Power 0.7-V H p Video Decoder A Low-Power 0.7-V H.264 720p Video Decoder D. Finchelstein, V. Sze, M.E. Sinangil, Y. Koken, A.P. Chandrakasan A-SSCC 2008 Outline Motivation for low-power video decoders Low-power techniques pipelining

More information

Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability

Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Eploiting Numerical Precision Variability Alberto Delmás Lascorz, Sayeh Sharify, Patrick Judd & Andreas Moshovos

More information

Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan

Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan Virginia Polytechnic Institute and State University Reverse-engineer the brain National

More information

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015

Optimization of Multi-Channel BCH Error Decoding for Common Cases. Russell Dill Master's Thesis Defense April 20, 2015 Optimization of Multi-Channel BCH Error Decoding for Common Cases Russell Dill Master's Thesis Defense April 20, 2015 Bose-Chaudhuri-Hocquenghem (BCH) BCH is an Error Correcting Code (ECC) and is used

More information

EE5780 Advanced VLSI CAD

EE5780 Advanced VLSI CAD EE5780 Advanced VLSI CAD Lecture 11 SRAM and Yield Analysis Zhuo Feng 11.1 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports Outline Serial Access Memories 11.2 Memory

More information

This paper is a preprint of a paper accepted by Electronics Letters and is subject to Institution of Engineering and Technology Copyright.

This paper is a preprint of a paper accepted by Electronics Letters and is subject to Institution of Engineering and Technology Copyright. This paper is a preprint of a paper accepted by Electronics Letters and is subject to Institution of Engineering and Technology Copyright. The final version is published and available at IET Digital Library

More information

Hardware Implementation of Block GC3 Lossless Compression Algorithm for Direct-Write Lithography Systems

Hardware Implementation of Block GC3 Lossless Compression Algorithm for Direct-Write Lithography Systems Hardware Implementation of Block GC3 Lossless Compression Algorithm for Direct-Write Lithography Systems Hsin-I Liu, Brian Richards, Avideh Zakhor, and Borivoje Nikolic Dept. of Electrical Engineering

More information

ESE534: Computer Organization. Today. Image Processing. Retiming Demand. Preclass 2. Preclass 2. Retiming Demand. Day 21: April 14, 2014 Retiming

ESE534: Computer Organization. Today. Image Processing. Retiming Demand. Preclass 2. Preclass 2. Retiming Demand. Day 21: April 14, 2014 Retiming ESE534: Computer Organization Today Retiming Demand Folded Computation Day 21: April 14, 2014 Retiming Logical Pipelining Physical Pipelining Retiming Supply Technology Structures Hierarchy 1 2 Image Processing

More information

ESE534: Computer Organization. Previously. Today. Previously. Today. Preclass 1. Instruction Space Modeling

ESE534: Computer Organization. Previously. Today. Previously. Today. Preclass 1. Instruction Space Modeling ESE534: Computer Organization Previously Instruction Space Modeling Day 15: March 24, 2014 Empirical Comparisons Previously Programmable compute blocks LUTs, ALUs, PLAs Today What if we just built a custom

More information

On the Rules of Low-Power Design

On the Rules of Low-Power Design On the Rules of Low-Power Design (and How to Break Them) Prof. Todd Austin Advanced Computer Architecture Lab University of Michigan austin@umich.edu Once upon a time 1 Rules of Low-Power Design P = acv

More information

Sensor Development for the imote2 Smart Sensor Platform

Sensor Development for the imote2 Smart Sensor Platform Sensor Development for the imote2 Smart Sensor Platform March 7, 2008 2008 Introduction Aging infrastructure requires cost effective and timely inspection and maintenance practices The condition of a structure

More information

CacheCompress A Novel Approach for Test Data Compression with cache for IP cores

CacheCompress A Novel Approach for Test Data Compression with cache for IP cores CacheCompress A Novel Approach for Test Data Compression with cache for IP cores Hao Fang ( 方昊 ) fanghao@mprc.pku.edu.cn Rizhao, ICDFN 07 20/08/2007 To be appeared in ICCAD 07 Sections Introduction Our

More information

An Introduction to Deep Image Aesthetics

An Introduction to Deep Image Aesthetics Seminar in Laboratory of Visual Intelligence and Pattern Analysis (VIPA) An Introduction to Deep Image Aesthetics Yongcheng Jing College of Computer Science and Technology Zhejiang University Zhenchuan

More information

Tutorial Outline. Typical Memory Hierarchy

Tutorial Outline. Typical Memory Hierarchy Tutorial Outline 8:30-8:45 8:45-9:05 9:05-9:30 9:30-10:30 10:30-10:50 10:50-12:15 12:15-1:30 1:30-2:30 2:30-3:30 3:30-3:50 3:50-4:30 4:30-4:45 Introduction and motivation Sources of power in CMOS designs

More information

Hardware Implementation of Block GC3 Lossless Compression Algorithm for Direct-Write Lithography Systems

Hardware Implementation of Block GC3 Lossless Compression Algorithm for Direct-Write Lithography Systems Hardware Implementation of Block GC3 Lossless Compression Algorithm for Direct-Write Lithography Systems Hsin-I Liu, Brian Richards, Avideh Zakhor, and Borivoje Nikolic Dept. of Electrical Engineering

More information

Further Details Contact: A. Vinay , , #301, 303 & 304,3rdFloor, AVR Buildings, Opp to SV Music College, Balaji

Further Details Contact: A. Vinay , , #301, 303 & 304,3rdFloor, AVR Buildings, Opp to SV Music College, Balaji S.NO 2018-2019 B.TECH VLSI IEEE TITLES TITLES FRONTEND 1. Approximate Quaternary Addition with the Fast Carry Chains of FPGAs 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. A Low-Power

More information

CS 7643: Deep Learning

CS 7643: Deep Learning CS 7643: Deep Learning Topics: Stride, padding Pooling layers Fully-connected layers as convolutions Backprop in conv layers Dhruv Batra Georgia Tech Invited Talks Sumit Chopra on CNNs for Pixel Labeling

More information

A CYCLES/MB H.264/AVC MOTION COMPENSATION ARCHITECTURE FOR QUAD-HD APPLICATIONS

A CYCLES/MB H.264/AVC MOTION COMPENSATION ARCHITECTURE FOR QUAD-HD APPLICATIONS 9th European Signal Processing Conference (EUSIPCO 2) Barcelona, Spain, August 29 - September 2, 2 A 6-65 CYCLES/MB H.264/AVC MOTION COMPENSATION ARCHITECTURE FOR QUAD-HD APPLICATIONS Jinjia Zhou, Dajiang

More information

A video signal processor for motioncompensated field-rate upconversion in consumer television

A video signal processor for motioncompensated field-rate upconversion in consumer television A video signal processor for motioncompensated field-rate upconversion in consumer television B. De Loore, P. Lippens, P. Eeckhout, H. Huijgen, A. Löning, B. McSweeney, M. Verstraelen, B. Pham, G. de Haan,

More information

Sequential Logic. Introduction to Computer Yung-Yu Chuang

Sequential Logic. Introduction to Computer Yung-Yu Chuang Sequential Logic Introduction to Computer Yung-Yu Chuang with slides by Sedgewick & Wayne (introcs.cs.princeton.edu), Nisan & Schocken (www.nand2tetris.org) and Harris & Harris (DDCA) Review of Combinational

More information

FOIL it! Find One mismatch between Image and Language caption

FOIL it! Find One mismatch between Image and Language caption FOIL it! Find One mismatch between Image and Language caption ACL, Vancouver, 31st July, 2017 Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi

More information

High Performance Carry Chains for FPGAs

High Performance Carry Chains for FPGAs High Performance Carry Chains for FPGAs Matthew M. Hosler Department of Electrical and Computer Engineering Northwestern University Abstract Carry chains are an important consideration for most computations,

More information

Layout Decompression Chip for Maskless Lithography

Layout Decompression Chip for Maskless Lithography Layout Decompression Chip for Maskless Lithography Borivoje Nikolić, Ben Wild, Vito Dai, Yashesh Shroff, Benjamin Warlick, Avideh Zakhor, William G. Oldham Department of Electrical Engineering and Computer

More information

Deep learning for music data processing

Deep learning for music data processing Deep learning for music data processing A personal (re)view of the state-of-the-art Jordi Pons www.jordipons.me Music Technology Group, DTIC, Universitat Pompeu Fabra, Barcelona. 31st January 2017 Jordi

More information

Slide Set Overview. Special Topics in Advanced Digital System Design. Embedded System Design. Embedded System Design. What does a digital camera do?

Slide Set Overview. Special Topics in Advanced Digital System Design. Embedded System Design. Embedded System Design. What does a digital camera do? Slide Set Overview Special Topics in Advanced Digital System Design by Dr. Lesley Shannon Email: lshannon@ensc.sfu.ca Course Website: http://www.ensc.sfu.ca/~lshannon/ Simon Fraser University Slide Set:

More information

Use of Low Power DET Address Pointer Circuit for FIFO Memory Design

Use of Low Power DET Address Pointer Circuit for FIFO Memory Design International Journal of Education and Science Research Review Use of Low Power DET Address Pointer Circuit for FIFO Memory Design Harpreet M.Tech Scholar PPIMT Hisar Supriya Bhutani Assistant Professor

More information

AN EFFECTIVE CACHE FOR THE ANYWHERE PIXEL ROUTER

AN EFFECTIVE CACHE FOR THE ANYWHERE PIXEL ROUTER University of Kentucky UKnowledge Theses and Dissertations--Electrical and Computer Engineering Electrical and Computer Engineering 2007 AN EFFECTIVE CACHE FOR THE ANYWHERE PIXEL ROUTER Vijai Raghunathan

More information

Advanced Video Processing for Future Multimedia Communication Systems

Advanced Video Processing for Future Multimedia Communication Systems Advanced Video Processing for Future Multimedia Communication Systems André Kaup Friedrich-Alexander University Erlangen-Nürnberg Future Multimedia Communication Systems Trend in video to make communication

More information

A High-Performance Parallel CAVLC Encoder on a Fine-Grained Many-core System

A High-Performance Parallel CAVLC Encoder on a Fine-Grained Many-core System A High-Performance Parallel CAVLC Encoder on a Fine-Grained Many-core System Zhibin Xiao and Bevan M. Baas VLSI Computation Lab, ECE Department University of California, Davis Outline Introduction to H.264

More information

Hardware Design I Chap. 5 Memory elements

Hardware Design I Chap. 5 Memory elements Hardware Design I Chap. 5 Memory elements E-mail: shimada@is.naist.jp Why memory is required? To hold data which will be processed with designed hardware (for storage) Main memory, cache, register, and

More information

Day 21: Retiming Requirements. ESE534: Computer Organization. Relative Sizes. Today. State. State Size

Day 21: Retiming Requirements. ESE534: Computer Organization. Relative Sizes. Today. State. State Size ESE534: Computer Organization Day 22: November 16, 2016 Retiming 1 Day 21: Retiming Requirements Retiming requirement depends on parallelism and performance Even with a given amount of parallelism Will

More information

Scene Classification with Inception-7. Christian Szegedy with Julian Ibarz and Vincent Vanhoucke

Scene Classification with Inception-7. Christian Szegedy with Julian Ibarz and Vincent Vanhoucke Scene Classification with Inception-7 Christian Szegedy with Julian Ibarz and Vincent Vanhoucke Julian Ibarz Vincent Vanhoucke Task Classification of images into 10 different classes: Bedroom Bridge Church

More information

Register Transfer Level (RTL) Design Cont.

Register Transfer Level (RTL) Design Cont. CSE4: Components and Design Techniques for Digital Systems Register Transfer Level (RTL) Design Cont. Tajana Simunic Rosing Where we are now What we are covering today: RTL design examples, RTL critical

More information

A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification

A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification INTERSPEECH 17 August, 17, Stockholm, Sweden A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification Yun Wang and Florian Metze Language

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September ISSN

International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September ISSN International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September-2014 917 The Power Optimization of Linear Feedback Shift Register Using Fault Coverage Circuits K.YARRAYYA1, K CHITAMBARA

More information

Automatic Piano Music Transcription

Automatic Piano Music Transcription Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening

More information

An Efficient Reduction of Area in Multistandard Transform Core

An Efficient Reduction of Area in Multistandard Transform Core An Efficient Reduction of Area in Multistandard Transform Core A. Shanmuga Priya 1, Dr. T. K. Shanthi 2 1 PG scholar, Applied Electronics, Department of ECE, 2 Assosiate Professor, Department of ECE Thanthai

More information

IEEE Santa Clara ComSoc/CAS Weekend Workshop Event-based analog sensing

IEEE Santa Clara ComSoc/CAS Weekend Workshop Event-based analog sensing IEEE Santa Clara ComSoc/CAS Weekend Workshop Event-based analog sensing Theodore Yu theodore.yu@ti.com Texas Instruments Kilby Labs, Silicon Valley Labs September 29, 2012 1 Living in an analog world The

More information

International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013

International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013 International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013 Design and Implementation of an Enhanced LUT System in Security Based Computation dama.dhanalakshmi 1, K.Annapurna

More information

ALONG with the progressive device scaling, semiconductor

ALONG with the progressive device scaling, semiconductor IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 57, NO. 4, APRIL 2010 285 LUT Optimization for Memory-Based Computation Pramod Kumar Meher, Senior Member, IEEE Abstract Recently, we

More information

Multimedia Communications. Video compression

Multimedia Communications. Video compression Multimedia Communications Video compression Video compression Of all the different sources of data, video produces the largest amount of data There are some differences in our perception with regard to

More information

Data Storage and Manipulation

Data Storage and Manipulation Data Storage and Manipulation Data Storage Bits and Their Storage: Gates and Flip-Flops, Other Storage Techniques, Hexadecimal notation Main Memory: Memory Organization, Measuring Memory Capacity Mass

More information

OL_H264e HDTV H.264/AVC Baseline Video Encoder Rev 1.0. General Description. Applications. Features

OL_H264e HDTV H.264/AVC Baseline Video Encoder Rev 1.0. General Description. Applications. Features OL_H264e HDTV H.264/AVC Baseline Video Encoder Rev 1.0 General Description Applications Features The OL_H264e core is a hardware implementation of the H.264 baseline video compression algorithm. The core

More information

Predicting Aesthetic Radar Map Using a Hierarchical Multi-task Network

Predicting Aesthetic Radar Map Using a Hierarchical Multi-task Network Predicting Aesthetic Radar Map Using a Hierarchical Multi-task Network Xin Jin 1,2,LeWu 1, Xinghui Zhou 1, Geng Zhao 1, Xiaokun Zhang 1, Xiaodong Li 1, and Shiming Ge 3(B) 1 Department of Cyber Security,

More information

Reconfigurable Architectures. Greg Stitt ECE Department University of Florida

Reconfigurable Architectures. Greg Stitt ECE Department University of Florida Reconfigurable Architectures Greg Stitt ECE Department University of Florida How can hardware be reconfigurable? Problem: Can t change fabricated chip ASICs are fixed Solution: Create components that can

More information

Joint Image and Text Representation for Aesthetics Analysis

Joint Image and Text Representation for Aesthetics Analysis Joint Image and Text Representation for Aesthetics Analysis Ye Zhou 1, Xin Lu 2, Junping Zhang 1, James Z. Wang 3 1 Fudan University, China 2 Adobe Systems Inc., USA 3 The Pennsylvania State University,

More information

Predicting the immediate future with Recurrent Neural Networks: Pre-training and Applications

Predicting the immediate future with Recurrent Neural Networks: Pre-training and Applications Predicting the immediate future with Recurrent Neural Networks: Pre-training and Applications Introduction Brandon Richardson December 16, 2011 Research preformed from the last 5 years has shown that the

More information

On the HPDP from architecture to a device. Final Presentation Days ESTEC, May 9 th 2017

On the HPDP from architecture to a device. Final Presentation Days ESTEC, May 9 th 2017 On the HPDP from architecture to a device Final Presentation Days ESTEC, May 9 th 2017 Outline Introduction HPDP Architecture Top Design Comparisons Target applications Design Flow Operating Environment

More information

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics

VLSI Design: 3) Explain the various MOSFET Capacitances & their significance. 4) Draw a CMOS Inverter. Explain its transfer characteristics 1) Explain why & how a MOSFET works VLSI Design: 2) Draw Vds-Ids curve for a MOSFET. Now, show how this curve changes (a) with increasing Vgs (b) with increasing transistor width (c) considering Channel

More information

Hardware Implementation of Viterbi Decoder for Wireless Applications

Hardware Implementation of Viterbi Decoder for Wireless Applications Hardware Implementation of Viterbi Decoder for Wireless Applications Bhupendra Singh 1, Sanjeev Agarwal 2 and Tarun Varma 3 Deptt. of Electronics and Communication Engineering, 1 Amity School of Engineering

More information

Singer Traits Identification using Deep Neural Network

Singer Traits Identification using Deep Neural Network Singer Traits Identification using Deep Neural Network Zhengshan Shi Center for Computer Research in Music and Acoustics Stanford University kittyshi@stanford.edu Abstract The author investigates automatic

More information

SoC IC Basics. COE838: Systems on Chip Design

SoC IC Basics. COE838: Systems on Chip Design SoC IC Basics COE838: Systems on Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University Overview SoC

More information

A HIGH THROUGHPUT CABAC ALGORITHM USING SYNTAX ELEMENT PARTITIONING. Vivienne Sze Anantha P. Chandrakasan 2009 ICIP Cairo, Egypt

A HIGH THROUGHPUT CABAC ALGORITHM USING SYNTAX ELEMENT PARTITIONING. Vivienne Sze Anantha P. Chandrakasan 2009 ICIP Cairo, Egypt A HIGH THROUGHPUT CABAC ALGORITHM USING SYNTAX ELEMENT PARTITIONING Vivienne Sze Anantha P. Chandrakasan 2009 ICIP Cairo, Egypt Motivation High demand for video on mobile devices Compressionto reduce storage

More information

FPGA-BASED IMPLEMENTATION OF A REAL-TIME 5000-WORD CONTINUOUS SPEECH RECOGNIZER

FPGA-BASED IMPLEMENTATION OF A REAL-TIME 5000-WORD CONTINUOUS SPEECH RECOGNIZER FPGA-BASED IMPLEMENTATION OF A REAL-TIME 5000-WORD CONTINUOUS SPEECH RECOGNIZER Young-kyu Choi, Kisun You, and Wonyong Sung School of Electrical Engineering, Seoul National University San 56-1, Shillim-dong,

More information

High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities IBM Corporation

High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities IBM Corporation High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities Introduction About Myself What to expect out of this lecture Understand the current trend in the IC Design

More information

Reconfigurable Neural Net Chip with 32K Connections

Reconfigurable Neural Net Chip with 32K Connections Reconfigurable Neural Net Chip with 32K Connections H.P. Graf, R. Janow, D. Henderson, and R. Lee AT&T Bell Laboratories, Room 4G320, Holmdel, NJ 07733 Abstract We describe a CMOS neural net chip with

More information

LOW POWER DIGITAL EQUALIZATION FOR HIGH SPEED SERDES. Masum Hossain University of Alberta

LOW POWER DIGITAL EQUALIZATION FOR HIGH SPEED SERDES. Masum Hossain University of Alberta LOW POWER DIGITAL EQUALIZATION FOR HIGH SPEED SERDES Masum Hossain University of Alberta 0 Outline Why ADC-Based receiver? Challenges in ADC-based receiver ADC-DSP based Receiver Reducing impact of Quantization

More information

MPEG has been established as an international standard

MPEG has been established as an international standard 1100 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 9, NO. 7, OCTOBER 1999 Fast Extraction of Spatially Reduced Image Sequences from MPEG-2 Compressed Video Junehwa Song, Member,

More information

Motion Video Compression

Motion Video Compression 7 Motion Video Compression 7.1 Motion video Motion video contains massive amounts of redundant information. This is because each image has redundant information and also because there are very few changes

More information

Investigation of Look-Up Table Based FPGAs Using Various IDCT Architectures

Investigation of Look-Up Table Based FPGAs Using Various IDCT Architectures Investigation of Look-Up Table Based FPGAs Using Various IDCT Architectures Jörn Gause Abstract This paper presents an investigation of Look-Up Table (LUT) based Field Programmable Gate Arrays (FPGAs)

More information

Decoder Hardware Architecture for HEVC

Decoder Hardware Architecture for HEVC Decoder Hardware Architecture for HEVC The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher Tikekar, Mehul,

More information

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath Objectives Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath In the previous chapters we have studied how to develop a specification from a given application, and

More information

CMOS Technology for Increasing Efficiency of Clock Gating Techniques Using Tri-State Buffer

CMOS Technology for Increasing Efficiency of Clock Gating Techniques Using Tri-State Buffer Engineering and Physical Sciences CMOS Technology for Increasing Efficiency of Clock Gating Techniques Using Tri-State Buffer Maan HAMEED *, Asem KHMAG, Fakhrul ZAMAN and Abdurrahman RAMLI Department of

More information

SEMICONDUCTOR TECHNOLOGY -CMOS-

SEMICONDUCTOR TECHNOLOGY -CMOS- SEMICONDUCTOR TECHNOLOGY -CMOS- Fire Tom Wada What is semiconductor and LSIs Huge number of transistors can be integrated in a small Si chip. The size of the chip is roughly the size of nails. Currently,

More information

An FPGA Platform for Demonstrating Embedded Vision Systems. Ariana Eisenstein

An FPGA Platform for Demonstrating Embedded Vision Systems. Ariana Eisenstein An FPGA Platform for Demonstrating Embedded Vision Systems by Ariana Eisenstein B.S., Massachusetts Institute of Technology (2015) Submitted to the Department of Electrical Engineering and Computer Science

More information

11. Sequential Elements

11. Sequential Elements 11. Sequential Elements Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October 11, 2017 ECE Department, University of Texas at Austin

More information

PROF. TAJANA SIMUNIC ROSING. Midterm. Problem Max. Points Points Total 150 INSTRUCTIONS:

PROF. TAJANA SIMUNIC ROSING. Midterm. Problem Max. Points Points Total 150 INSTRUCTIONS: CSE 237A FALL 2006 PROF. TAJANA SIMUNIC ROSING Midterm NAME: ID: Solutions Problem Max. Points Points 1 20 2 20 3 30 4 25 5 25 6 30 Total 150 INSTRUCTIONS: 1. There are 6 problems on 11 pages worth a total

More information

SRAM Based Random Number Generator For Non-Repeating Pattern Generation

SRAM Based Random Number Generator For Non-Repeating Pattern Generation Applied Mechanics and Materials Online: 2014-06-18 ISSN: 1662-7482, Vol. 573, pp 181-186 doi:10.4028/www.scientific.net/amm.573.181 2014 Trans Tech Publications, Switzerland SRAM Based Random Number Generator

More information

Sharif University of Technology. SoC: Introduction

Sharif University of Technology. SoC: Introduction SoC Design Lecture 1: Introduction Shaahin Hessabi Department of Computer Engineering System-on-Chip System: a set of related parts that act as a whole to achieve a given goal. A system is a set of interacting

More information

OL_H264MCLD Multi-Channel HDTV H.264/AVC Limited Baseline Video Decoder V1.0. General Description. Applications. Features

OL_H264MCLD Multi-Channel HDTV H.264/AVC Limited Baseline Video Decoder V1.0. General Description. Applications. Features OL_H264MCLD Multi-Channel HDTV H.264/AVC Limited Baseline Video Decoder V1.0 General Description Applications Features The OL_H264MCLD core is a hardware implementation of the H.264 baseline video compression

More information

Communication Avoiding Successive Band Reduction

Communication Avoiding Successive Band Reduction Communication Avoiding Successive Band Reduction Grey Ballard, James Demmel, Nicholas Knight UC Berkeley PPoPP 12 Research supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by

More information

Frame Processing Time Deviations in Video Processors

Frame Processing Time Deviations in Video Processors Tensilica White Paper Frame Processing Time Deviations in Video Processors May, 2008 1 Executive Summary Chips are increasingly made with processor designs licensed as semiconductor IP (intellectual property).

More information

TKK S ASIC-PIIRIEN SUUNNITTELU

TKK S ASIC-PIIRIEN SUUNNITTELU Design TKK S-88.134 ASIC-PIIRIEN SUUNNITTELU Design Flow 3.2.2005 RTL Design 10.2.2005 Implementation 7.4.2005 Contents 1. Terminology 2. RTL to Parts flow 3. Logic synthesis 4. Static Timing Analysis

More information

Reconfigurable FPGA Implementation of FIR Filter using Modified DA Method

Reconfigurable FPGA Implementation of FIR Filter using Modified DA Method Reconfigurable FPGA Implementation of FIR Filter using Modified DA Method M. Backia Lakshmi 1, D. Sellathambi 2 1 PG Student, Department of Electronics and Communication Engineering, Parisutham Institute

More information

Low Power Estimation on Test Compression Technique for SoC based Design

Low Power Estimation on Test Compression Technique for SoC based Design Indian Journal of Science and Technology, Vol 8(4), DOI: 0.7485/ijst/205/v8i4/6848, July 205 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Low Estimation on Test Compression Technique for SoC based

More information

A High Performance VLSI Architecture with Half Pel and Quarter Pel Interpolation for A Single Frame

A High Performance VLSI Architecture with Half Pel and Quarter Pel Interpolation for A Single Frame I J C T A, 9(34) 2016, pp. 673-680 International Science Press A High Performance VLSI Architecture with Half Pel and Quarter Pel Interpolation for A Single Frame K. Priyadarshini 1 and D. Jackuline Moni

More information

Slide Set 9. for ENCM 501 in Winter Steve Norman, PhD, PEng

Slide Set 9. for ENCM 501 in Winter Steve Norman, PhD, PEng Slide Set 9 for ENCM 501 in Winter 2018 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 501 Winter 2018 Slide Set 9 slide

More information

Contents Slide Set 6. Introduction to Chapter 7 of the textbook. Outline of Slide Set 6. An outline of the first part of Chapter 7

Contents Slide Set 6. Introduction to Chapter 7 of the textbook. Outline of Slide Set 6. An outline of the first part of Chapter 7 CM 69 W4 Section Slide Set 6 slide 2/9 Contents Slide Set 6 for CM 69 Winter 24 Lecture Section Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary

More information

LSTM Neural Style Transfer in Music Using Computational Musicology

LSTM Neural Style Transfer in Music Using Computational Musicology LSTM Neural Style Transfer in Music Using Computational Musicology Jett Oristaglio Dartmouth College, June 4 2017 1. Introduction In the 2016 paper A Neural Algorithm of Artistic Style, Gatys et al. discovered

More information

Automatic optimization of image capture on mobile devices by human and non-human agents

Automatic optimization of image capture on mobile devices by human and non-human agents Automatic optimization of image capture on mobile devices by human and non-human agents 1.1 Abstract Sophie Lebrecht, Mark Desnoyer, Nick Dufour, Zhihao Li, Nicole A. Halmi, David L. Sheinberg, Michael

More information

Low Power Approach of Clock Gating in Synchronous System like FIFO: A Novel Clock Gating Approach and Comparative Analysis

Low Power Approach of Clock Gating in Synchronous System like FIFO: A Novel Clock Gating Approach and Comparative Analysis Low Power Approach of Clock Gating in Synchronous System like FIFO: A Novel Clock Gating Approach and Comparative Analysis Abstract- A new technique of clock is presented to reduce dynamic power consumption.

More information

The Multistandard Full Hd Video-Codec Engine On Low Power Devices

The Multistandard Full Hd Video-Codec Engine On Low Power Devices The Multistandard Full Hd Video-Codec Engine On Low Power Devices B.Susma (M. Tech). Embedded Systems. Aurora s Technological & Research Institute. Hyderabad. B.Srinivas Asst. professor. ECE, Aurora s

More information

THE USE OF forward error correction (FEC) in optical networks

THE USE OF forward error correction (FEC) in optical networks IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 8, AUGUST 2005 461 A High-Speed Low-Complexity Reed Solomon Decoder for Optical Communications Hanho Lee, Member, IEEE Abstract

More information

LUT Design Using OMS Technique for Memory Based Realization of FIR Filter

LUT Design Using OMS Technique for Memory Based Realization of FIR Filter International Journal of Emerging Engineering Research and Technology Volume. 2, Issue 6, September 2014, PP 72-80 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) LUT Design Using OMS Technique for Memory

More information

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Low Power VLSI Circuits and Systems Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 29 Minimizing Switched Capacitance-III. (Refer

More information

FPGA Based Implementation of Convolutional Encoder- Viterbi Decoder Using Multiple Booting Technique

FPGA Based Implementation of Convolutional Encoder- Viterbi Decoder Using Multiple Booting Technique FPGA Based Implementation of Convolutional Encoder- Viterbi Decoder Using Multiple Booting Technique Dr. Dhafir A. Alneema (1) Yahya Taher Qassim (2) Lecturer Assistant Lecturer Computer Engineering Dept.

More information

SEMICONDUCTOR TECHNOLOGY -CMOS-

SEMICONDUCTOR TECHNOLOGY -CMOS- SEMICONDUCTOR TECHNOLOGY -CMOS- Fire Tom Wada 2011/12/19 1 What is semiconductor and LSIs Huge number of transistors can be integrated in a small Si chip. The size of the chip is roughly the size of nails.

More information

Syrah. Flux All 1rights reserved

Syrah. Flux All 1rights reserved Flux 2009. All 1rights reserved - The Creative adaptive-dynamics processor Thank you for using. We hope that you will get good use of the information found in this manual, and to help you getting acquainted

More information

Design of Memory Based Implementation Using LUT Multiplier

Design of Memory Based Implementation Using LUT Multiplier Design of Memory Based Implementation Using LUT Multiplier Charan Kumar.k 1, S. Vikrama Narasimha Reddy 2, Neelima Koppala 3 1,2 M.Tech(VLSI) Student, 3 Assistant Professor, ECE Department, Sree Vidyanikethan

More information

Innovative Fast Timing Design

Innovative Fast Timing Design Innovative Fast Timing Design Solution through Simultaneous Processing of Logic Synthesis and Placement A new design methodology is now available that offers the advantages of enhanced logical design efficiency

More information

Read-only memory (ROM) Digital logic: ALUs Sequential logic circuits. Don't cares. Bus

Read-only memory (ROM) Digital logic: ALUs Sequential logic circuits. Don't cares. Bus Digital logic: ALUs Sequential logic circuits CS207, Fall 2004 October 11, 13, and 15, 2004 1 Read-only memory (ROM) A form of memory Contents fixed when circuit is created n input lines for 2 n addressable

More information

arxiv: v1 [cs.lg] 15 Jun 2016

arxiv: v1 [cs.lg] 15 Jun 2016 Deep Learning for Music arxiv:1606.04930v1 [cs.lg] 15 Jun 2016 Allen Huang Department of Management Science and Engineering Stanford University allenh@cs.stanford.edu Abstract Raymond Wu Department of

More information