Parallel Computing. Chapter 3


Chapter 3 Parallel Computing As we discussed in the Processor module, computer speed has made great progress over the past few decades: indeed, a 20-million-fold increase during a fifty-year period. This is mainly because more and more transistors have been integrated into a single silicon chip: from a few, to tens (SSI), to hundreds (MSI), to thousands (LSI), and now to billions (VLSI).

Moore's law This phenomenon is nicely summarized by Moore's law: the number of transistors placed on a chip doubles about every eighteen months. For example, the Intel 8086, a processor chip made by Intel in 1978, contained 29,000 transistors and ran at 5 MHz; the Intel Core 2 Duo, introduced in 2006, contained 291 million transistors and ran at 2.93 GHz. Thus, during those 28 years, the number of transistors went up by about 10,034 times, i.e., doubled roughly once every two years.
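The doubling period implied by those two data points can be checked with a few lines of Python; the transistor counts and dates are the ones quoted above.

```python
import math

# Transistor counts from the text: Intel 8086 (1978) vs. Core 2 Duo (2006).
t_8086 = 29_000
t_core2 = 291_000_000
years = 2006 - 1978

growth = t_core2 / t_8086            # about 10,034x growth
doublings = math.log2(growth)        # about 13.3 doublings in 28 years
months_per_doubling = years * 12 / doublings
print(f"{growth:.0f}x growth, one doubling every {months_per_doubling:.1f} months")
```

The result comes out a little over two years per doubling, consistent with the rounded figure in the text.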

A picture is worth how many words? More importantly, this increase in transistor count directly leads to an increase in computer speed. In this case, the clock speed went up by 586 times during this period (from 5 MHz to 2.93 GHz). The following chart shows the increase of computer speed corresponding to that of the integration count.

Not just the speed... Moreover, besides processing speed, other capabilities of many digital electronic devices are also strongly tied to Moore's law: memory capacity, sensors, and even the number and size of pixels in digital cameras. As a result, all of these technologies have been improving at this stunning exponential rate as well. Since Moore's law describes a driving force of technological and social change over the past thirty or so years, it has been used to guide long-term planning and to set targets for research and development.

A dead end? Unfortunately, this era of steady and rapid growth of single-processor performance, sustained over 30 years, is essentially over, because: By doubling the transistor count every eighteen months, we have to make the wires a factor of about √2 thinner every eighteen months. This has to come to an end at some point, since we cannot make the wires infinitely thin. Although each transistor produces only a tiny bit of heat, when you put billions of them into a tiny space, the amounts do add up, with heat density approaching that at the surface of the Sun. We have also essentially dug out all the benefits of a complicated single-processor architecture.

What to do? Fortunately, Moore's law is not completely out the window yet; it is predicted to continue for another five years or so. These extra transistors will no longer be used to build a bigger single processor, but to increase the number of independent processors on a single chip. We then try to speed up the whole computation by letting those independent processors work on the data in parallel. An analogy: in the old days, we could only cook one thing at a time on an old-fashioned stove.

Nowadays, with a contemporary stove, we can cook many different dishes in parallel, i.e., at the same time, which certainly saves time. Similarly, we could cut up a big problem into many smaller ones, and run them in parallel on multiple processors. Could we?

They are happening everywhere... Indeed, we can find many examples of parallelism in our work and life: multiple galaxies running in the Universe, multiple lanes on I-93, multiple gas pumps at most gas stations, etc.

It is difficult... That all sounds good, but it is not that easy. In the cooking example, a good chef knows that she will not always cook everything at the same time. To cook the dish of, e.g., Pepper, Onions and Pork, she has to fry the pepper and the pork first, which can be done at the same time; then fry the onion, which is then mixed with the partially fried pepper and pork. In the multiple-lane case, although cars in different lanes can go forward in parallel, cars in the same lane have to go forward in turn. It is the same with computing in parallel: you have to figure out which parts can be done in parallel, and which have to be done in sequence.

An example We have been using computers to do course registration for quite a few years now. When adding somebody to a class, a program has to make sure, among other things, that the total number of students added to the class is no more than the cap of that class, 25 for ours. If we run the course adds sequentially, i.e., one by one, this is what the program will do to add another student into this class: if the current number < 25 then add this student Thus, before we add in another student, we always check the cap.
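The sequential check-then-add rule can be written as a small Python sketch; the cap of 25 is from the example, and the variable names are illustrative.

```python
CAP = 25          # class cap from the example
enrolled = 24     # current number of students

def add_student():
    """Sequential add: check the cap, then add, with nothing in between."""
    global enrolled
    if enrolled < CAP:     # the check
        enrolled += 1      # the add
        return True
    return False

first = add_student()    # succeeds: 24 < 25, enrolled becomes 25
second = add_student()   # fails: the cap has been reached
```

Run one request at a time, the check always sees an up-to-date count, so the cap can never be exceeded.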

The parallel case Since the above add consists of two steps, a check and then an add, when we try to serve multiple requests at the same time, we might get into trouble, since we don't know in what order the steps will be interleaved. For example, suppose 24 students have signed up for this course, and two more students come to add the course. What is to happen?

This will. If we do the adds in parallel, and the two steps of the two adds happen to be arranged as follows:

Time   Request 1                      Request 2
t1     Check the number (still 24)
t2                                    Check the number (still 24)
t3     Add in student (now 25)
t4                                    Add in student (now 26)

Thus, as the above chart shows, we will add more students than the cap allows.
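The bad interleaving above can be replayed deterministically in Python: instead of using threads, we simply execute the four steps in exactly the order t1..t4, which is enough to show how two interleaved check-then-add sequences break the cap. The class and method names are illustrative.

```python
CAP = 25

class ClassRoster:
    """Unsafe roster: the check and the add are separate steps."""
    def __init__(self, enrolled):
        self.enrolled = enrolled

    def check(self):
        # Step 1: the cap check, performed separately from the add.
        return self.enrolled < CAP

    def add(self):
        # Step 2: the add, with no re-check.
        self.enrolled += 1

roster = ClassRoster(enrolled=24)

# Replay the timeline: both requests pass the check
# before either performs its add.
ok1 = roster.check()   # t1: request 1 sees 24 < 25
ok2 = roster.check()   # t2: request 2 still sees 24 < 25
if ok1:
    roster.add()       # t3: now 25
if ok2:
    roster.add()       # t4: now 26 -- one over the cap
```

The final count is 26: both requests passed the check against the same stale value, so the cap was violated even though each request, taken alone, behaved correctly.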

Software is really hard Although we have been building parallel computer hardware for a long time, since the late 1960s, programming it is really difficult, as we have to take care of the communication and coordination issues among the multiple processors; just like in a conference call, we want to make sure that only one person speaks at a time. In other words, the difficulty lies in the software part, even though we can come up with lots of cheap hardware parts.
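A standard coordination tool is a lock (mutual exclusion), which enforces exactly the "one person speaks at a time" rule: while one request holds the lock, its check and add form one indivisible step. A minimal Python sketch, with illustrative names, not the course's own code:

```python
import threading

CAP = 25

class SafeRoster:
    """Roster whose check-then-add is protected by a lock."""
    def __init__(self, enrolled):
        self.enrolled = enrolled
        self.lock = threading.Lock()

    def try_add(self):
        # Holding the lock makes check-then-add one indivisible step,
        # so no other request can sneak in between the two.
        with self.lock:
            if self.enrolled < CAP:
                self.enrolled += 1
                return True
            return False

roster = SafeRoster(enrolled=24)
threads = [threading.Thread(target=roster.try_add) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one of the two requests succeeds; enrollment stays at the cap.
```

Whichever request acquires the lock first gets the last seat; the other re-checks the count after the first add and is turned away, so the cap holds in every interleaving.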

How fast could it be? The natural expectation for the speed-up from parallelization would be linear: if you put in a two-lane highway, then two cars can go through the toll booth at the same time, and if you put in a four-lane one, then four cars can pay tolls in parallel. That is why we often put in multiple toll booths, e.g., at Exit 11 on I-93. On the other hand, this does not happen with parallel computing: very few parallel algorithms achieve linear speed-up. Most of them show near-linear speed-up for small numbers of processing elements, which degrades to a constant value for large numbers of processing elements.

Here is the limit The potential speed-up of a parallel algorithm on a parallel computer is given by Amdahl's law, established in the 1960s by Gene Amdahl. When a big problem is cut into a bunch of smaller ones, some of which can run in parallel while the others have to run in sequence, it is the latter that decides the overall speed-up available from parallelization. This relationship is given by the equation: S = 1 / (1 - P), where S is the maximum speed-up of the program, as a factor of its original sequential runtime, and P is the fraction that can be run in parallel.

An example If we cut the problem into ten pieces, nine of which can run in parallel while one piece can't, we have P = 90%; then Amdahl's law tells us that S = 1 / (1 - 0.9) = 1 / 0.1 = 10. In other words, at most we can speed it up 10 times, no matter how many processors we throw in. This result thus puts an upper limit on the usefulness of adding more parallel execution units. One way to put it: the bearing of a child takes nine months, no matter how many women are assigned.
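The arithmetic above is easy to tabulate. The sketch below also includes the finite-processor form of Amdahl's law, 1 / ((1 - P) + P/N); that general form is standard but is an addition not derived in the text.

```python
def amdahl_speedup(p, n=None):
    """Amdahl's law speed-up for parallel fraction p.

    With n processors: 1 / ((1 - p) + p / n).
    As n goes to infinity, this tends to the upper limit 1 / (1 - p).
    """
    if n is None:
        return 1.0 / (1.0 - p)          # the limit, infinitely many processors
    return 1.0 / ((1.0 - p) + p / n)    # the general finite-processor form

# Nine of ten pieces parallelizable: p = 0.9
limit = amdahl_speedup(0.9)            # upper limit: 10x, as in the example
with_ten = amdahl_speedup(0.9, n=10)   # with only 10 processors: about 5.3x
```

Note that with ten processors we get only about half the theoretical limit; the sequential 10% dominates long before the processor count grows large.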

Discussion topics Do some further research on Amdahl's law, and share with us your findings in layman's language. What are some of the successful applications of this multi-processing idea in parallel computing? Give some details: What is it? Why do we do it in parallel? What are the benefits, as compared with sequential computing? In your life, study, and/or work, have you ever applied the multi-processing strategy, i.e., doing multiple things at one time? If yes, give us some examples: What was the problem? How did you cut it into smaller problems? Could all of the smaller ones be run in parallel? If not, how did you coordinate them?