PRACE Autumn School GPU Programming

Size: px
Start display at page:

Download "PRACE Autumn School GPU Programming"

Transcription

1 PRACE Autumn School 2010 GPU Programming October 25-29, 2010 PRACE Autumn School, Oct

2 Outline GPU Programming Track Tuesday 26th GPGPU: General-purpose GPU Programming CUDA Architecture, Threading and Memory model CUDA Programming, Runtimes and Environments Hands-on Lab 1: CUDA Environment Setup, Compilation and Execution Examples Wednesday 27th CUDA Optimizations. Debugging and Profiling GPU Multiprocessing. Deploying Multi-GPU Applications The GPU on Heterogeneous and High-Performance Computing Hands-on Lab 2: Advanced Tools and Exercises. HPC Codes and Performance Evaluation PRACE Autumn School, Oct

3 Instructors Manuel Ujaldón Associate Professor, Computer Architecture Department, University of Malaga Nacho Navarro Associate Professor, Computer Architecture Department, Universitat Politecnica de Catalunya (UPC), Researcher at Barcelona Supercomputing Center (BSC), Visiting Research Professor at University of Illinois (UIUC) Javier Cabezas Ph.D. student at the Computer Architecture Department, UPC. Researcher at the Barcelona Supercomputing Center. Visiting PhD. Student at UIUC. PRACE Autumn School, Oct

4 CUDA is Popular PRACE Autumn School, Oct

5 PUMPS Summer School Programming and Tuning Massively Parallel Systems Summer School (PUMPS) Teachers: Wen-mei W. Hwu, University of Illinois David B. Kirk, NVIDIA PRACE Autumn School, Oct

6 BSC named first CUDA Research Center in Spain The Barcelona Supercomputing Center (BSC) has been named by NVIDIA as a 2010 CUDA Research Center, the first in Spain. The CUDA Research Center Program recognizes and fosters collaboration with research groups at universities and research institutes that are expanding the frontier of massively parallel computing. Institutions identified as CUDA Research Centers are doing worldchanging research by leveraging CUDA and NVIDIA GPUs. PRACE Autumn School, Oct

7 Hands-on Labs Labs will be done at the AC GPU Cluster at NCSA AC.NCSA.UIUC.EDU Experimental system available for exploring GPU computing PRACE Autumn School, Oct

8 HP xw9400 workstation 2216 AMD Opteron 2.4 GHz dual socket dual core 8 GB DDR2 Infiniband QDR Tesla S1070 1U GPU Computing Server 1.3 GHz Tesla T10 processors 4x4 GB GDDR3 SDRAM Cluster Servers: 32 (128 CPUS) Accelerator Units: 32 (128 GPUS, 128 TF SP, 10 TF DP) Compute Node PRACE Autumn School, Oct

9 Course Wiki Course Material Hands-on Lab Info on Textbooks Links to interesting educational material Register and log in to get access to the content PRACE Autumn School, Oct

10 PRACE Autumn School 2010 GPU Programming Nacho Navarro Associate Professor Universitat Politecnica Catalunya / Barcelona Supercomputing Center Visiting Research Professor, UIUC, CSL PRACE Autumn School, Oct

11 Outline Multicore: Dual/Quad, Cell, GPU, FPGA,? Current and future systems Graphics beyond games Programmability experiences and trends Supercomputing anywhere Acknowledgements: Prof. Wen-mei Hwu, UIUC, David Kirk, NVIDIA, NCSA Summer School PRACE Autumn School, Oct

12 Current Trend: Multi-core Processors Cache Cache Core Core Core C1 C3 C2 Cache C4 C1 C2 C1 C2 C1 C2 C3 C4 C3 C4 Cache C1 C2 C1 C2 C3 C4 C3 C4 C3 C4 Past trend: increasing number of transistors on a chip and increasing clock speed Heat is an unmanageable problem, Intel Processors > 100 Watts We will not see the dramatic increases in clock speeds in the future. However, # transistors on a chip will continue to increase. Intel Core 2 Duo Do we have some free space? put more cores What s left over? Put cache memory PRACE Autumn School, Oct

13 Multicores: Just Cores? How many cores? Intel/AMD cores IBM Cell 8-16 SPU NVIDIA 480 cores Multicore is Hardware and Software together (challenge and inspire each other) More transistors, worse reliability Error / fault (detection / correction / recovery) Dynamic reconfiguration Memory Memory wall due to bandwidth (scalability?) Memory wall due to power (interconnect needs power) Memory size grows but data always grows more and more On-chip locality, communication PRACE Autumn School, Oct

14 IBM, SONY, TOSHIBA Cell BE Heterogeneous Mickey mouse Power PC 8 SPU Local memory, local address space Lot of memory copies: DMA s Always short of memory space Cannot host all data Software cache Two unrelated thread schedulers Reliability: if all cores are fine, IBM supercomputer; if SPE error, sell it as PS3 PRACE Autumn School, Oct

15 NVIDIA GPU PRACE Autumn School, Oct

16 GPU: How Many cores? (240 in chunks of 16 way MP) PRACE Autumn School, Oct

17 Is GPU driving the parallelism revolution? 1 Based on slide 7 of S. Green, GPU Physics, SIGGRAPH 2007 GPGPU Course. PRACE Autumn School, Oct

18 GPU performance in recent history Performance of NVIDIA GPUs over time Fermi Peak GFLOPS CUDA Memory Bandwidth (GB/s) PRACE Autumn School, Oct

19 CPU vs. GPU, approaching each other PRACE Autumn School, Oct

20 ILP vs. Massive Data Parallelism PRACE Autumn School, Oct

21 PRACE Autumn School, Oct

22 PRACE Autumn School, Oct

23 Graphics and Games: Nvidia purchased AGEIA PhysX middleware. PRACE Autumn School, Oct

24 Massive Parallelism PRACE Autumn School, Oct

25 GPU: Supercomputing at Home PRACE Autumn School, Oct

26 PRACE Autumn School, Oct

27 PRACE Autumn School, Oct

28 CUDA: Widely Adopted Parallel Programming Model PRACE Autumn School, Oct

29 PRACE Autumn School, Oct

30 PRACE Autumn School, Oct

31 Performance of Advanced MRI Reconstruction Wen-mei Hwu, IMPACT, UIUC PRACE Autumn School, Oct

32 GPU Speedup GPU gives us 100x (after one month of understanding the architecture) to massive parallel algorithms Faster is not just Faster 2-3X faster is just faster Do a little more, wait a little less Doesn t change how you work 5-10x faster is significant Worth upgrading Worth re-writing (parts of) the application 100x+ faster is fundamentally different Worth considering a new platform Worth re-architecting the application Makes new applications possible Drives time to discovery and creates fundamental changes in Science PRACE Autumn School, Oct

33 PRACE Autumn School, Oct

34 CUDA Features (Threading) Physical partitioning in SM Virtual partitioning Problem is divided into a grid of Thread Blocks (TBs) Each Thread Block is composed by <= 512 threads Threads are very lightweight Scheduling of threads on physical cores is performed by the HW (in groups called warps ) New warps are scheduled on memory stalls (hides latency) Many TBs can be executed on the same SM (1024 threads max), depending on the used (memory) resources SIMD: Divergent branches significantly reduce the performance PRACE Autumn School, Oct

35 CUDA Features (Memory) Global memory (up to 4GB per card) Very slow ( cycles) Texture memory (64KB per card) <cache> Read-only Useful for some kinds of access patterns Constant memory (64KB per card) <cache> Read-only 2 cycles (when all threads in a warp read the Shared memory (16KB per SM) 8 banks (4 bytes stride) 2 cycles if no bank conflict (consecutive accesses) Register memory registers/sm (16 per thread if 1024 threads, 32 if 512 threads) PRACE Autumn School, Oct

36 Data Movements and Kernel Launch PRACE Autumn School, Oct

37 Oil and Gas Prospection PRACE Autumn School, Oct

38 RTM on GPU : Experience on Mapping Forward Stencil + Hessian (GPU) Boundary Conditions (GPU) Shot insertion (GPU) Receivers (GPU) For synthetic traces Write to disk (CPU) Backward Stencil (GPU) Boundary Conditions (GPU) Receivers shots insertions (GPU) Read from disk Correlation PRACE Autumn School, Oct

39 RTM Port to GPU Timeline Three months progress for a new CUDA developer PRACE Autumn School, Oct

40 RTM kernel on GPUs Current Results Three months progress for a new CUDA developer PRACE Autumn School, Oct

41 RTM on GPU: Kernel bottlenecks Naïve: uses global memory only Store all the matrices in the global memory Unroll the loops and create as many TB as necessary Bottleneck: global accesses are very slow Shared memory: Use shared memory to store the values of the previous time step Drawback: divergent branches to load the ghost area Bottleneck: Shared memory usage Bad useful/total reads ratio due to the big stencil 2D sliding window: Proposed by Paulius Micikevicius (NVIDIA Total) Store the Y (geophysical) stencil dimension in registers Only store the ZX plane in shared memory Better useful/total reads ratio Slide the plane to the end of the cube Bottleneck: Registers usage PRACE Autumn School, Oct

42 Benchmarks and Lessons Learned App. Archit. Bottleneck Simult. T Kernel X App X H.264 Registers, global memory latency 3, LBM Shared memory capacity 3, RC5-72 Registers 3, FEM Global memory bandwidth 4, RPES Instruction issue rate 4, PNS Global memory capacity 2, LINPACK Global memory bandwidth, CPU-GPU data transfer 12, TRACF Shared memory capacity 4, FDTD Global memory bandwidth 1, MRI-Q Instruction issue rate 8, [HKR HotChips-2007] PRACE Autumn School, Oct

Scalability of MB-level Parallelism for H.264 Decoding

Scalability of MB-level Parallelism for H.264 Decoding Scalability of Macroblock-level Parallelism for H.264 Decoding Mauricio Alvarez Mesa 1, Alex Ramírez 1,2, Mateo Valero 1,2, Arnaldo Azevedo 3, Cor Meenderinck 3, Ben Juurlink 3 1 Universitat Politècnica

More information

Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan

Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan Virginia Polytechnic Institute and State University Reverse-engineer the brain National

More information

Transparent low-overhead checkpoint for GPU-accelerated clusters

Transparent low-overhead checkpoint for GPU-accelerated clusters Transparent low-overhead checkpoint for GPU-accelerated clusters Leonardo BAUTISTA GOMEZ 1,3, Akira NUKADA 1, Naoya MARUYAMA 1, Franck CAPPELLO 3,4, Satoshi MATSUOKA 1,2 1 Tokyo Institute of Technology,

More information

GPU Acceleration of a Production Molecular Docking Code

GPU Acceleration of a Production Molecular Docking Code GPU Acceleration of a Production Molecular Docking Code Bharat Sukhwani Martin Herbordt Computer Architecture and Automated Design Laboratory Department of Electrical and Computer Engineering Boston University

More information

Amdahl s Law in the Multicore Era

Amdahl s Law in the Multicore Era Amdahl s Law in the Multicore Era Mark D. Hill and Michael R. Marty University of Wisconsin Madison August 2008 @ Semiahmoo Workshop IBM s Dr. Thomas Puzak: Everyone knows Amdahl s Law 2008 Multifacet

More information

Outline. 1 Reiteration. 2 Dynamic scheduling - Tomasulo. 3 Superscalar, VLIW. 4 Speculation. 5 ILP limitations. 6 What we have done so far.

Outline. 1 Reiteration. 2 Dynamic scheduling - Tomasulo. 3 Superscalar, VLIW. 4 Speculation. 5 ILP limitations. 6 What we have done so far. Outline 1 Reiteration Lecture 5: EIT090 Computer Architecture 2 Dynamic scheduling - Tomasulo Anders Ardö 3 Superscalar, VLIW EIT Electrical and Information Technology, Lund University Sept. 30, 2009 4

More information

High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities IBM Corporation

High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities IBM Corporation High Performance Microprocessor Design and Automation: Overview, Challenges and Opportunities Introduction About Myself What to expect out of this lecture Understand the current trend in the IC Design

More information

Instruction Level Parallelism Part III

Instruction Level Parallelism Part III Course on: Advanced Computer Architectures Instruction Level Parallelism Part III Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Outline of Part III Dynamic Scheduling

More information

Fooling the Masses with Performance Results: Old Classics & Some New Ideas

Fooling the Masses with Performance Results: Old Classics & Some New Ideas Fooling the Masses with Performance Results: Old Classics & Some New Ideas Gerhard Wellein (1,2), Georg Hager (2) (1) Department for Computer Science (2) Erlangen Regional Computing Center Friedrich-Alexander-Universität

More information

High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures 46 H. Y. SU, M. WEN, J. REN, N. WU, J. CHAI, C.Y. ZHANG, HIGH-EFFICIENT PARALLEL CAVLC ENCODER High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures Huayou SU, Mei WEN, Ju REN,

More information

Instruction Level Parallelism Part III

Instruction Level Parallelism Part III Course on: Advanced Computer Architectures Instruction Level Parallelism Part III Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Outline of Part III Tomasulo Dynamic Scheduling

More information

Profiling techniques for parallel applications

Profiling techniques for parallel applications Profiling techniques for parallel applications Analyzing program performance with HPCToolkit 03/10/2016 PRACE Autumn School 2016 2 Introduction Focus of this session Profiling of parallel applications

More information

North America, Inc. AFFICHER. a true cloud digital signage system. Copyright PDC Co.,Ltd. All Rights Reserved.

North America, Inc. AFFICHER. a true cloud digital signage system. Copyright PDC Co.,Ltd. All Rights Reserved. AFFICHER a true cloud digital signage system AFFICHER INTRODUCTION AFFICHER (Sign in French) is a HIGH-END full function turnkey cloud based digital signage system for you to manage your screens. The AFFICHER

More information

Profiling techniques for parallel applications

Profiling techniques for parallel applications Profiling techniques for parallel applications Analyzing program performance with HPCToolkit 17/04/2014 PRACE Spring School 2014 2 Introduction Thomas Ponweiser Johannes Kepler University Linz (JKU) Involved

More information

USING FUSION SYSTEM ARCHITECTURE FOR BROADCAST VIDEO. Edward Callway AMD

USING FUSION SYSTEM ARCHITECTURE FOR BROADCAST VIDEO. Edward Callway AMD USING FUSION SYSTEM ARCHITECTURE FOR BROADCAST VIDEO Edward Callway AMD USING PC COMPONENTS FOR BROADCAST VIDEO Video processing from pure analog to digital compute PC Design for video Parallel GPU computing

More information

GPU s for High Performance Signal Processing in Infrared Camera System

GPU s for High Performance Signal Processing in Infrared Camera System GPU s for High Performance Signal Processing in Infrared Camera System Stefan Olsson, PhD Senior Company Specialist-Video Processing Project Manager at FLIR 2015-05-28 Instruments Automation/Process Monitoring

More information

Models NVIDIA NVS 315 1GB Graphics

Models NVIDIA NVS 315 1GB Graphics Overview Models NVIDIA NVS 315 1GB Graphics E1U66AA Introduction The NVIDIA NVS 315 graphics board is a PCI Express low profile form factor graphics add-in card targeted as an active low cost graphics

More information

Highly Parallel HEVC Decoding for Heterogeneous Systems with CPU and GPU

Highly Parallel HEVC Decoding for Heterogeneous Systems with CPU and GPU 2017. This manuscript version (accecpted manuscript) is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/. Highly Parallel HEVC Decoding for Heterogeneous

More information

Universal Parallel Computing Research Center The Center for New Music and Audio Technologies University of California, Berkeley

Universal Parallel Computing Research Center The Center for New Music and Audio Technologies University of California, Berkeley Eric Battenberg and David Wessel Universal Parallel Computing Research Center The Center for New Music and Audio Technologies University of California, Berkeley Microsoft Parallel Applications Workshop

More information

Instruction Level Parallelism and Its. (Part II) ECE 154B

Instruction Level Parallelism and Its. (Part II) ECE 154B Instruction Level Parallelism and Its Exploitation (Part II) ECE 154B Dmitri Strukov ILP techniques not covered last week this week next week Scoreboard Technique Review Allow for out of order execution

More information

Research Article Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

Research Article Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation e Scientific World Journal, Article ID 716020, 19 pages http://dx.doi.org/10.1155/2014/716020 Research Article Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation Huayou

More information

M598. Radeon E8860 (Adelaar) Video & Graphics PMC. Aitech

M598. Radeon E8860 (Adelaar) Video & Graphics PMC.   Aitech Single Width PMC PCI-X 64-bit @ 133 MHz Host Interface AMD Radeon E8860 (Adelaar) GPU 6 Independent Graphics Heads 2 GB GDDR5 Analog Inputs Analog and Digital Outputs Full Switching Capabilities Capture

More information

A Highly Scalable Parallel Implementation of H.264

A Highly Scalable Parallel Implementation of H.264 A Highly Scalable Parallel Implementation of H.264 Arnaldo Azevedo 1, Ben Juurlink 1, Cor Meenderinck 1, Andrei Terechko 2, Jan Hoogerbrugge 3, Mauricio Alvarez 4, Alex Ramirez 4,5, Mateo Valero 4,5 1

More information

OddCI: On-Demand Distributed Computing Infrastructure

OddCI: On-Demand Distributed Computing Infrastructure OddCI: On-Demand Distributed Computing Infrastructure Rostand Costa Francisco Brasileiro Guido Lemos Filho Dênio Mariz Sousa MTAGS 2nd Workshop on Many-Task Computing on Grids and Supercomputers Co-located

More information

Milestone Solution Partner IT Infrastructure Components Certification Report

Milestone Solution Partner IT Infrastructure Components Certification Report Milestone Solution Partner IT Infrastructure Components Certification Report Infortrend Technologies 5000 Series NVR 12-15-2015 Table of Contents Executive Summary:... 4 Introduction... 4 Certified Products...

More information

Implementation of an MPEG Codec on the Tilera TM 64 Processor

Implementation of an MPEG Codec on the Tilera TM 64 Processor 1 Implementation of an MPEG Codec on the Tilera TM 64 Processor Whitney Flohr Supervisor: Mark Franklin, Ed Richter Department of Electrical and Systems Engineering Washington University in St. Louis Fall

More information

Impact of Intermittent Faults on Nanocomputing Devices

Impact of Intermittent Faults on Nanocomputing Devices Impact of Intermittent Faults on Nanocomputing Devices Cristian Constantinescu June 28th, 2007 Dependable Systems and Networks Outline Fault classes Permanent faults Transient faults Intermittent faults

More information

Day 21: Retiming Requirements. ESE534: Computer Organization. Relative Sizes. Today. State. State Size

Day 21: Retiming Requirements. ESE534: Computer Organization. Relative Sizes. Today. State. State Size ESE534: Computer Organization Day 22: November 16, 2016 Retiming 1 Day 21: Retiming Requirements Retiming requirement depends on parallelism and performance Even with a given amount of parallelism Will

More information

Out of order execution allows

Out of order execution allows Out of order execution allows Letter A B C D E Answer Requires extra stages in the pipeline The processor to exploit parallelism between instructions. Is used mostly in handheld computers A, B, and C A

More information

DC Ultra. Concurrent Timing, Area, Power and Test Optimization. Overview

DC Ultra. Concurrent Timing, Area, Power and Test Optimization. Overview DATASHEET DC Ultra Concurrent Timing, Area, Power and Test Optimization DC Ultra RTL synthesis solution enables users to meet today s design challenges with concurrent optimization of timing, area, power

More information

Benchmark Mar_26_2018

Benchmark Mar_26_2018 Benchmark Mar_26_2018 Referens Tensorflow Official Benchmarks (May 2017, GitHub sour): https://www.tensorflow.org/performan/benchmarks IBM Power9 benchmark results (Nov 2017, 1.4.0): https://developer.ibm.com/linuxonpower/perfcol/perfcol-mldl/

More information

Tools to Debug Dead Boards

Tools to Debug Dead Boards Tools to Debug Dead Boards Hardware Prototype Bring-up Ryan Jones Senior Application Engineer Corelis 1 Boundary-Scan Without Boundaries click to start the show Webinar Outline What is a Dead Board? Prototype

More information

Milestone Leverages Intel Processors with Intel Quick Sync Video to Create Breakthrough Capabilities for Video Surveillance and Monitoring

Milestone Leverages Intel Processors with Intel Quick Sync Video to Create Breakthrough Capabilities for Video Surveillance and Monitoring white paper Milestone Leverages Intel Processors with Intel Quick Sync Video to Create Breakthrough Capabilities for Video Surveillance and Monitoring Executive Summary Milestone Systems, the world s leading

More information

8088 Corruption. Motion Video on a 1981 IBM PC with CGA

8088 Corruption. Motion Video on a 1981 IBM PC with CGA 8088 Corruption Motion Video on a 1981 IBM PC with CGA Introduction 8088 Corruption plays video that: Is Full-motion (30fps) Is Full-screen In Color With synchronized audio on a 1981 IBM PC with CGA (and

More information

ESE534: Computer Organization. Today. Image Processing. Retiming Demand. Preclass 2. Preclass 2. Retiming Demand. Day 21: April 14, 2014 Retiming

ESE534: Computer Organization. Today. Image Processing. Retiming Demand. Preclass 2. Preclass 2. Retiming Demand. Day 21: April 14, 2014 Retiming ESE534: Computer Organization Today Retiming Demand Folded Computation Day 21: April 14, 2014 Retiming Logical Pipelining Physical Pipelining Retiming Supply Technology Structures Hierarchy 1 2 Image Processing

More information

Video Output and Graphics Acceleration

Video Output and Graphics Acceleration Video Output and Graphics Acceleration Overview Frame Buffer and Line Drawing Engine Prof. Kris Pister TAs: Vincent Lee, Ian Juch, Albert Magyar Version 1.5 In this project, you will use SDRAM to implement

More information

Multicore Design Considerations

Multicore Design Considerations Multicore Design Considerations Multicore: The Forefront of Computing Technology We re not going to have faster processors. Instead, making software run faster in the future will mean using parallel programming

More information

WiBench: An Open Source Kernel Suite for Benchmarking Wireless Systems

WiBench: An Open Source Kernel Suite for Benchmarking Wireless Systems 1 WiBench: An Open Source Kernel Suite for Benchmarking Wireless Systems Qi Zheng*, Yajing Chen*, Ronald Dreslinski*, Chaitali Chakrabarti +, Achilleas Anastasopoulos*, Scott Mahlke*, Trevor Mudge* *,

More information

Communication Avoiding Successive Band Reduction

Communication Avoiding Successive Band Reduction Communication Avoiding Successive Band Reduction Grey Ballard, James Demmel, Nicholas Knight UC Berkeley PPoPP 12 Research supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by

More information

CREATE. CONTROL. CONNECT.

CREATE. CONTROL. CONNECT. CREATE. CONTROL. CONNECT. CREATE. CONTROL. CONNECT. DYVI offers unprecedented creativity, simple and secure operations along with technical reliability all in a costeffective, tailored and highly reliable

More information

Create. Control. Connect.

Create. Control. Connect. Create. Control. Connect. Create. Control. Connect. Control live broadcasting wherever you are The DYVI production suite is a whole new approach to live content creation. Taking advantage of the latest

More information

Distributed Cluster Processing to Evaluate Interlaced Run-Length Compression Schemes

Distributed Cluster Processing to Evaluate Interlaced Run-Length Compression Schemes Distributed Cluster Processing to Evaluate Interlaced Run-Length Compression Schemes Ankit Arora Sachin Bagga Rajbir Singh Cheema M.Tech (IT) M.Tech (CSE) M.Tech (CSE) Guru Nanak Dev University Asr. Thapar

More information

LIVE PRODUCTION SWITCHER. Think differently about what you can do with a production switcher

LIVE PRODUCTION SWITCHER. Think differently about what you can do with a production switcher LIVE PRODUCTION SWITCHER Think differently about what you can do with a production switcher BRILLIANTLY SIMPLE, CREATIVE CONTROL The DYVI live production switcher goes far beyond the traditional limits

More information

Build Applications Tailored for Remote Signal Monitoring with the Signal Hound BB60C

Build Applications Tailored for Remote Signal Monitoring with the Signal Hound BB60C Application Note Build Applications Tailored for Remote Signal Monitoring with the Signal Hound BB60C By Justin Crooks and Bruce Devine, Signal Hound July 21, 2015 Introduction The Signal Hound BB60C Spectrum

More information

EEC 581 Computer Architecture. Instruction Level Parallelism (3.4 & 3.5 Dynamic Scheduling)

EEC 581 Computer Architecture. Instruction Level Parallelism (3.4 & 3.5 Dynamic Scheduling) 1 EEC 581 Computer Architecture Instruction Level Parallelism (3.4 & 3.5 Dynamic Scheduling) Chansu Yu Electrical and Computer Engineering Cleveland State University Overview of Chap. 3 (again) Pipelined

More information

ESE (ESE534): Computer Organization. Last Time. Today. Last Time. Align Data / Balance Paths. Retiming in the Large

ESE (ESE534): Computer Organization. Last Time. Today. Last Time. Align Data / Balance Paths. Retiming in the Large ESE680-002 (ESE534): Computer Organization Day 20: March 28, 2007 Retiming 2: Structures and Balance Last Time Saw how to formulate and automate retiming: start with network calculate minimum achievable

More information

Sharif University of Technology. SoC: Introduction

Sharif University of Technology. SoC: Introduction SoC Design Lecture 1: Introduction Shaahin Hessabi Department of Computer Engineering System-on-Chip System: a set of related parts that act as a whole to achieve a given goal. A system is a set of interacting

More information

Slide Set 8. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng

Slide Set 8. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng Slide Set 8 for ENCM 501 in Winter Term, 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2017 ENCM 501 W17 Lectures: Slide

More information

Lossless Compression Algorithms for Direct- Write Lithography Systems

Lossless Compression Algorithms for Direct- Write Lithography Systems Lossless Compression Algorithms for Direct- Write Lithography Systems Hsin-I Liu Video and Image Processing Lab Department of Electrical Engineering and Computer Science University of California at Berkeley

More information

Nutaq. PicoDigitizer-125. Up to 64 Channels, 125 MSPS ADCs, FPGA-based DAQ Solution With Up to 32 Channels, 1000 MSPS DACs PRODUCT SHEET. nutaq.

Nutaq. PicoDigitizer-125. Up to 64 Channels, 125 MSPS ADCs, FPGA-based DAQ Solution With Up to 32 Channels, 1000 MSPS DACs PRODUCT SHEET. nutaq. Nutaq Up to 64 Channels, 125 MSPS ADCs, FPGA-based DAQ Solution With Up to 32 Channels, 1000 MSPS DACs PRODUCT SHEET QUEBEC I MONTREAL I N E W YO R K I nutaq.com Nutaq The PicoDigitizer 125-Series is a

More information

EN2911X: Reconfigurable Computing Topic 01: Programmable Logic. Prof. Sherief Reda School of Engineering, Brown University Fall 2014

EN2911X: Reconfigurable Computing Topic 01: Programmable Logic. Prof. Sherief Reda School of Engineering, Brown University Fall 2014 EN2911X: Reconfigurable Computing Topic 01: Programmable Logic Prof. Sherief Reda School of Engineering, Brown University Fall 2014 1 Contents 1. Architecture of modern FPGAs Programmable interconnect

More information

Mauricio Álvarez-Mesa ; Chi Ching Chi ; Ben Juurlink ; Valeri George ; Thomas Schierl Parallel video decoding in the emerging HEVC standard

Mauricio Álvarez-Mesa ; Chi Ching Chi ; Ben Juurlink ; Valeri George ; Thomas Schierl Parallel video decoding in the emerging HEVC standard Mauricio Álvarez-Mesa ; Chi Ching Chi ; Ben Juurlink ; Valeri George ; Thomas Schierl Parallel video decoding in the emerging HEVC standard Conference object, Postprint version This version is available

More information

THE Collider Detector at Fermilab (CDF) [1] is a general

THE Collider Detector at Fermilab (CDF) [1] is a general The Level-3 Trigger at the CDF Experiment at Tevatron Run II Y.S. Chung 1, G. De Lentdecker 1, S. Demers 1,B.Y.Han 1, B. Kilminster 1,J.Lee 1, K. McFarland 1, A. Vaiciulis 1, F. Azfar 2,T.Huffman 2,T.Akimoto

More information

Image Acquisition Technology

Image Acquisition Technology Image Choosing the Right Image Acquisition Technology A Machine Vision White Paper 1 Today, machine vision is used to ensure the quality of everything from tiny computer chips to massive space vehicles.

More information

Alain Legault Hardent. Create Higher Resolution Displays With VESA Display Stream Compression

Alain Legault Hardent. Create Higher Resolution Displays With VESA Display Stream Compression Alain Legault Hardent Create Higher Resolution Displays With VESA Display Stream Compression What Is VESA? 2 Why Is VESA Needed? Video In Processor TX Port RX Port Display Module To Display Mobile application

More information

Parallelization of Multimedia Applications by Compiler on Multicores for Consumer Electronics

Parallelization of Multimedia Applications by Compiler on Multicores for Consumer Electronics Vol. 0 No. 0 1959 TV MPEG2 MP3 JPEG 2000 OSCAR API VLIW 4 FR1000 SH-4A 4 RP1 FR1000 4 1 4 3.27 RP1 4 1 4 3.31 Parallelization of Multimedia Applications by Compiler on Multicores for Consumer Electronics

More information

Hardware Design I Chap. 5 Memory elements

Hardware Design I Chap. 5 Memory elements Hardware Design I Chap. 5 Memory elements E-mail: shimada@is.naist.jp Why memory is required? To hold data which will be processed with designed hardware (for storage) Main memory, cache, register, and

More information

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH GHEVC: An Efficient HEVC Decoder for Graphics Processing Units

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH GHEVC: An Efficient HEVC Decoder for Graphics Processing Units IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 3, MARCH 2017 459 GHEVC: An Efficient HEVC Decoder for Graphics Processing Units Diego F. de Souza, Student Member, IEEE, Aleksandar Ilic, Member, IEEE, Nuno

More information

Hybrid Discrete-Continuous Computer Architectures for Post-Moore s-law Era

Hybrid Discrete-Continuous Computer Architectures for Post-Moore s-law Era Hybrid Discrete-Continuous Computer Architectures for Post-Moore s-law Era Keynote at the Bi annual HiPEAC Compu6ng Systems Week Mee6ng Barcelona, Spain October 19 th 2010 Prof. Simha Sethumadhavan Columbia

More information

High Performance Raster Scan Displays

High Performance Raster Scan Displays High Performance Raster Scan Displays Item Type text; Proceedings Authors Fowler, Jon F. Publisher International Foundation for Telemetering Journal International Telemetering Conference Proceedings Rights

More information

The DM7 and the Future of High

The DM7 and the Future of High The DM7 and the Future of High Performance Computing in Space 15 th Annual CubeSat Developers Workshop April 30, 2018 Presented By: Aaron Zucherman Graduate Research Assistant DM Student Team Lead MSU

More information

1ms Column Parallel Vision System and It's Application of High Speed Target Tracking

1ms Column Parallel Vision System and It's Application of High Speed Target Tracking Proceedings of the 2(X)0 IEEE International Conference on Robotics & Automation San Francisco, CA April 2000 1ms Column Parallel Vision System and It's Application of High Speed Target Tracking Y. Nakabo,

More information

Practical De-embedding for Gigabit fixture. Ben Chia Senior Signal Integrity Consultant 5/17/2011

Practical De-embedding for Gigabit fixture. Ben Chia Senior Signal Integrity Consultant 5/17/2011 Practical De-embedding for Gigabit fixture Ben Chia Senior Signal Integrity Consultant 5/17/2011 Topics Why De-Embedding/Embedding? De-embedding in Time Domain De-embedding in Frequency Domain De-embedding

More information

Achieving Timing Closure in ALTERA FPGAs

Achieving Timing Closure in ALTERA FPGAs Achieving Timing Closure in ALTERA FPGAs Course Description This course provides all necessary theoretical and practical know-how to write system timing constraints for variety designs in ALTERA FPGAs.

More information

Benchtop Portability with ATE Performance

Benchtop Portability with ATE Performance Benchtop Portability with ATE Performance Features: Configurable for simultaneous test of multiple connectivity standard Air cooled, 100 W power consumption 4 RF source and receive ports supporting up

More information

THE BaBar High Energy Physics (HEP) detector [1] is

THE BaBar High Energy Physics (HEP) detector [1] is IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 53, NO. 3, JUNE 2006 1299 BaBar Simulation Production A Millennium of Work in Under a Year D. A. Smith, F. Blanc, C. Bozzi, and A. Khan, Member, IEEE Abstract

More information

INFORMATION SYSTEMS. Written examination. Wednesday 12 November 2003

INFORMATION SYSTEMS. Written examination. Wednesday 12 November 2003 Victorian Certificate of Education 2003 SUPERVISOR TO ATTACH PROCESSING LABEL HERE INFORMATION SYSTEMS Written examination Wednesday 12 November 2003 Reading time: 11.45 am to 12.00 noon (15 minutes) Writing

More information

Optimizing the Startup Time of Embedded Systems: A Case Study of Digital TV

Optimizing the Startup Time of Embedded Systems: A Case Study of Digital TV 2242 IEEE Transactions on Consumer Electronics, Vol. 55, No. 4, NOVEMBER 2009 Optimizing the Startup Time of Embedded Systems: A Case Study of Digital TV Heeseung Jo, Hwanju Kim, Jinkyu Jeong, Joonwon

More information

Go BEARS~ What are Machine Structures? Lecture #15 Intro to Synchronous Digital Systems, State Elements I C

Go BEARS~ What are Machine Structures? Lecture #15 Intro to Synchronous Digital Systems, State Elements I C CS6C L5 Intro to SDS, State Elements I () inst.eecs.berkeley.edu/~cs6c CS6C : Machine Structures Lecture #5 Intro to Synchronous Digital Systems, State Elements I 28-7-6 Go BEARS~ Albert Chae, Instructor

More information

Vector IRAM Memory Performance for Image Access Patterns Richard M. Fromm Report No. UCB/CSD-99-1067 October 1999 Computer Science Division (EECS) University of California Berkeley, California 94720 Vector

More information

Digital Integrated Circuits EECS 312. Review. Remember the ENIAC? IC ENIAC. Trend for one company. First microprocessor

Digital Integrated Circuits EECS 312. Review. Remember the ENIAC? IC ENIAC. Trend for one company. First microprocessor 14 12 10 8 6 IBM ES9000 Bipolar Fujitsu VP2000 IBM 3090S Pulsar 4 IBM 3090 IBM RY6 CDC Cyber 205 IBM 4381 IBM RY4 2 IBM 3081 Apache Fujitsu M380 IBM 370 Merced IBM 360 IBM 3033 Vacuum Pentium II(DSIP)

More information

Explorer Edition FUZZY LOGIC DEVELOPMENT TOOL FOR ST6

Explorer Edition FUZZY LOGIC DEVELOPMENT TOOL FOR ST6 fuzzytech ST6 Explorer Edition FUZZY LOGIC DEVELOPMENT TOOL FOR ST6 DESIGN: System: up to 4 inputs and one output Variables: up to 7 labels per input/output Rules: up to 125 rules ON-LINE OPTIMISATION:

More information

100Gb/s Single-lane SERDES Discussion. Phil Sun, Credo Semiconductor IEEE New Ethernet Applications Ad Hoc May 24, 2017

100Gb/s Single-lane SERDES Discussion. Phil Sun, Credo Semiconductor IEEE New Ethernet Applications Ad Hoc May 24, 2017 100Gb/s Single-lane SERDES Discussion Phil Sun, Credo Semiconductor IEEE 802.3 New Ethernet Applications Ad Hoc May 24, 2017 Introduction This contribution tries to share thoughts on 100Gb/s single-lane

More information

ECEN689: Special Topics in High-Speed Links Circuits and Systems Spring 2011

ECEN689: Special Topics in High-Speed Links Circuits and Systems Spring 2011 ECEN689: Special Topics in High-Speed Links Circuits and Systems Spring 2011 Lecture 9: TX Multiplexer Circuits Sam Palermo Analog & Mixed-Signal Center Texas A&M University Announcements & Agenda Next

More information

Amon: Advanced Mesh-Like Optical NoC

Amon: Advanced Mesh-Like Optical NoC Amon: Advanced Mesh-Like Optical NoC Sebastian Werner, Javier Navaridas and Mikel Luján Advanced Processor Technologies Group School of Computer Science The University of Manchester Bottleneck: On-chip

More information

On the Rules of Low-Power Design

On the Rules of Low-Power Design On the Rules of Low-Power Design (and How to Break Them) Prof. Todd Austin Advanced Computer Architecture Lab University of Michigan austin@umich.edu Once upon a time 1 Rules of Low-Power Design P = acv

More information

Digital Integrated Circuits EECS 312

Digital Integrated Circuits EECS 312 14 12 10 8 6 Fujitsu VP2000 IBM 3090S Pulsar 4 IBM 3090 IBM RY6 CDC Cyber 205 IBM 4381 IBM RY4 2 IBM 3081 Apache Fujitsu M380 IBM 370 Merced IBM 360 IBM 3033 Vacuum Pentium II(DSIP) 0 1950 1960 1970 1980

More information

Epiphan Frame Grabber User Guide

Epiphan Frame Grabber User Guide Epiphan Frame Grabber User Guide VGA2USB VGA2USB LR DVI2USB VGA2USB HR DVI2USB Solo VGA2USB Pro DVI2USB Duo KVM2USB www.epiphan.com 1 February 2009 Version 3.20.2 (Windows) 3.16.14 (Mac OS X) Thank you

More information

ni.com Digital Signal Processing for Every Application

ni.com Digital Signal Processing for Every Application Digital Signal Processing for Every Application Digital Signal Processing is Everywhere High-Volume Image Processing Production Test Structural Sound Health and Vibration Monitoring RF WiMAX, and Microwave

More information

Frame Processing Time Deviations in Video Processors

Frame Processing Time Deviations in Video Processors Tensilica White Paper Frame Processing Time Deviations in Video Processors May, 2008 1 Executive Summary Chips are increasingly made with processor designs licensed as semiconductor IP (intellectual property).

More information

Telephony Training Systems

Telephony Training Systems Telephony Training Systems LabVolt Series Datasheet Festo Didactic en 120 V - 60 Hz 07/2018 Table of Contents General Description 2 Topic Coverage 6 Features & Benefits 6 List of Available Training Systems

More information

CS152 Computer Architecture and Engineering Lecture 17 Advanced Pipelining: Tomasulo Algorithm

CS152 Computer Architecture and Engineering Lecture 17 Advanced Pipelining: Tomasulo Algorithm CS152 Computer Architecture and Engineering Lecture 17 Advanced Pipelining: Tomasulo Algorithm 2003-10-23 Dave Patterson (www.cs.berkeley.edu/~patterson) www-inst.eecs.berkeley.edu/~cs152/ CS 152 L17 Adv.

More information

SCOREBOARDS ADDENDUM NO. 2 PROJECT NO PAGE 1 OF 5 MATTOON ILLINOIS April 12, 2018 ADDENDUM NO. 2

SCOREBOARDS ADDENDUM NO. 2 PROJECT NO PAGE 1 OF 5 MATTOON ILLINOIS April 12, 2018 ADDENDUM NO. 2 PROJECT NO. 2018-005 PAGE 1 OF 5 ADDENDUM NO. 2 The following shall be added to and become part of the Specifications for the above referenced project. ITEM NO. 1 Section 00113. Advertisement for Bids,

More information

Solutions to Embedded System Design Challenges Part II

Solutions to Embedded System Design Challenges Part II Solutions to Embedded System Design Challenges Part II Time-Saving Tips to Improve Productivity In Embedded System Design, Validation and Debug Hi, my name is Mike Juliana. Welcome to today s elearning.

More information

Computer and Machine Vision

Computer and Machine Vision Computer and Machine Vision Lecture Week 3 Part-1 January 27, 2014 Sam Siewert Outline of Week 3 Processing Images and Moving Pictures High Level View and Computer Architecture for it Linux Platforms for

More information

On the Characterization of Distributed Virtual Environment Systems

On the Characterization of Distributed Virtual Environment Systems On the Characterization of Distributed Virtual Environment Systems P. Morillo, J. M. Orduña, M. Fernández and J. Duato Departamento de Informática. Universidad de Valencia. SPAIN DISCA. Universidad Politécnica

More information

SCode V3.5.1 (SP-601 and MP-6010) Digital Video Network Surveillance System

SCode V3.5.1 (SP-601 and MP-6010) Digital Video Network Surveillance System V3.5.1 (SP-601 and MP-6010) Digital Video Network Surveillance System Core Technologies Image Compression MPEG4. It supports high compression rate with good image quality and reduces the requirement of

More information

2 MHz Lock-In Amplifier

2 MHz Lock-In Amplifier 2 MHz Lock-In Amplifier SR865 2 MHz dual phase lock-in amplifier SR865 2 MHz Lock-In Amplifier 1 mhz to 2 MHz frequency range Dual reference mode Low-noise current and voltage inputs Touchscreen data display

More information

Using SignalTap II in the Quartus II Software

Using SignalTap II in the Quartus II Software White Paper Using SignalTap II in the Quartus II Software Introduction The SignalTap II embedded logic analyzer, available exclusively in the Altera Quartus II software version 2.1, helps reduce verification

More information

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)

Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise

More information

QuickSpecs. NVIDIA Graphics SUPPORTED SOLUTIONS. NVIDIA Graphics. Overview QUADRO NVIDIA QUADRO K2200 L2K02AA NVIDIA QUADRO M2000M (12GB)

QuickSpecs. NVIDIA Graphics SUPPORTED SOLUTIONS. NVIDIA Graphics. Overview QUADRO NVIDIA QUADRO K2200 L2K02AA NVIDIA QUADRO M2000M (12GB) Overview SUPPORTED SOLUTIONS Category Part number QUADRO NVIDIA QUADRO K420 NVIDIA QUADRO K620 NVIDIA QUADRO K1200 NVIDIA QUADRO K2200 NVIDIA QUADRO M2000 NVIDIA QUADRO M4000 NVIDIA QUADRO M5000 NVIDIA

More information

Masters of Science in COMPUTER ENGINEERING

Masters of Science in COMPUTER ENGINEERING PICSEL: Measuring User-Perceived Performance to Control Dynamic Frequency Scaling IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Masters of Science in COMPUTER ENGINEERING By Jack Cosgrove

More information

Slide Set 9. for ENCM 501 in Winter Steve Norman, PhD, PEng

Slide Set 9. for ENCM 501 in Winter Steve Norman, PhD, PEng Slide Set 9 for ENCM 501 in Winter 2018 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 501 Winter 2018 Slide Set 9 slide

More information

Telephony Training Systems

Telephony Training Systems Telephony Training Systems LabVolt Series Datasheet Festo Didactic en 240 V - 50 Hz 04/2018 Table of Contents General Description 2 Topic Coverage 6 Features & Benefits 6 List of Available Training Systems

More information

Critical C-RAN Technologies Speaker: Lin Wang

Critical C-RAN Technologies Speaker: Lin Wang Critical C-RAN Technologies Speaker: Lin Wang Research Advisor: Biswanath Mukherjee Three key technologies to realize C-RAN Function split solutions for fronthaul design Goal: reduce the fronthaul bandwidth

More information

3/5/2017. A Register Stores a Set of Bits. ECE 120: Introduction to Computing. Add an Input to Control Changing a Register s Bits

3/5/2017. A Register Stores a Set of Bits. ECE 120: Introduction to Computing. Add an Input to Control Changing a Register s Bits University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering ECE 120: Introduction to Computing Registers A Register Stores a Set of Bits Most of our representations use sets

More information

5620 SAM SERVICE AWARE MANAGER 14.0 R7. Planning Guide

5620 SAM SERVICE AWARE MANAGER 14.0 R7. Planning Guide 5620 SAM SERVICE AWARE MANAGER 14.0 R7 Planning Guide 3HE-10698-AAAE-TQZZA December 2016 5620 SAM Legal notice Nokia is a registered trademark of Nokia Corporation. Other products and company names mentioned

More information

Lab2: Cache Memories. Dimitar Nikolov

Lab2: Cache Memories. Dimitar Nikolov Lab2: Cache Memories Dimitar Nikolov Goal Understand how cache memories work Learn how different cache-mappings impact CPU time Leran how different cache-sizes impact CPU time Lund University / Electrical

More information

The AuroraScience Project

The AuroraScience Project The AuroraScience Project F. S. Schifano 1 1 University of Ferrara and INFN-Ferrara November 25-26, 2009 F. S. Schifano (Univ. and INFN of Ferrara) The AuroraScience Project November 25-26, 2009 1 / 24

More information

SIGGRAPH 2013 Shaping the Future of Visual Computing

SIGGRAPH 2013 Shaping the Future of Visual Computing SIGGRAPH 2013 Shaping the Future of Visual Computing High Performance Graphics for 4K & Ultra High Resolution Displays Doug Traill, Senior Solutions Architect QuadroSVS@nvidia.com Things I want you to

More information